I have the following shell script that allows me to start my rails app, let's say it's called start-app.sh:
#!/bin/bash
cd /var/www/project/current
. /home/user/.rvm/environments/ruby-2.3.3
RAILS_SERVE_STATIC_FILES=true RAILS_ENV=production nohup bundle exec rails s -e production -p 4445 > /var/www/project/log/production.log 2>&1 &
the file above have permissions of:
-rwxr-xr-x 1 user user 410 Mar 21 10:00 start-app.sh*
if i want to check the process I do the following:
ps aux | grep -v grep | grep ":4445"
it'd give me the following output:
user 2960 0.0 7.0 975160 144408 ? Sl 10:37 0:07 puma 3.12.0 (tcp://0.0.0.0:4445) [20180809094218]
P.S: the reason i grep ":4445" is because i have few processes running on different ports. (for different projects)
now coming to monit, i used apt-get to install it, and the latest version from repo is 5.16, as i'm running on Ubuntu 16.04, also note that monit is running as root, that's why i specified the gid uid in the following. (because the start script is used to be executed from "user" and not "root")
Here's the configuration for monit:
set daemon 20 # check services at 20 seconds interval
set logfile /var/log/monit.log
set idfile /var/lib/monit/id
set statefile /var/lib/monit/state
set eventqueue
basedir /var/lib/monit/events # set the base directory where events will be stored
slots 100 # optionally limit the queue size
set mailserver xx.com port xxx
username "[email protected]" password "xxxxxx"
using tlsv12
with timeout 20 seconds
set alert [email protected]
set mail-format {
from: [email protected]
subject: monit alert -- $EVENT $SERVICE
message: $EVENT Service $SERVICE
Date: $DATE
Action: $ACTION
Host: $HOST
Description: $DESCRIPTION
}
set limits {
programOutput: 51200 B
sendExpectBuffer: 25600 B
fileContentBuffer: 51200 B
networktimeout: 10 s
}
check system $HOST
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if cpu usage > 90% for 10 cycles then alert
if memory usage > 85% then alert
if swap usage > 35% then alert
check process nginx with pidfile /var/run/nginx.pid
start program = "/bin/systemctl start nginx"
stop program = "/bin/systemctl stop nginx"
check process redis
matching "redis"
start program = "/bin/systemctl start redis"
stop program = "/bin/systemctl stop redis"
check process myapp
matching ":4445"
start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
include /etc/monit/conf.d/*
include /etc/monit/conf-enabled/*
Now monit, is detecting and alerting me when the process goes down (if i kill it manually) and when it's manually recovered, but it won't start that shell script automatically.. and according to /var/log/monit.log, it's showing the following:
[UTC Aug 13 10:16:41] info : Starting Monit 5.16 daemon
[UTC Aug 13 10:16:41] info : 'production-server' Monit 5.16 started
[UTC Aug 13 10:16:43] error : 'myapp' process is not running
[UTC Aug 13 10:16:46] info : 'myapp' trying to restart
[UTC Aug 13 10:16:46] info : 'myapp' start: /bin/bash
[UTC Aug 13 10:17:17] error : 'myapp' failed to start (exit status 0) -- no output
So far what I see when monit tries to execute the script is that it tries to load it (i can see it for less than 3 seconds using ps aux | grep -v grep | grep ":4445", but this output is different from the above output i showed up, it shows the content of the shell script being executed and specifically this one:
blablalba... nohup bundle exec rails s -e production -p 4445
and then it disappears. then it tries to re-execute the shell.. again and again...
What am I missing, and what is wrong with my configuration? note that I can't change anything in the start-app.sh because it's on production and working 100%. (i just want to monitor it)
Edit: To my understanding and experience, it seems to be a Environment Variable issue or path issue, but i'm not sure how to solve it, it doesn't make any sense to put the env variables inside monit .. what if someone else wanted to edit that shell script or add something new? i hope you get my point
See Question&Answers more detail:
os