I opened up a question for this problem and did not get a thorough enough answer to solve the issue (most likely due to a lack of rigor in explaining my issues which is what I am attempting to correct): Zombie process in python multiprocessing daemon
I am trying to implement a python daemon that uses a pool of workers to executes commands using Popen
. I have borrowed the basic daemon from http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/
I have only changed the init
, daemonize
(or equally the start
) and stop
methods. Here are the changes to the init
method:
def __init__(self, pidfile):
#, stdin='/dev/null', stdout='STDOUT', stderr='STDOUT'):
#self.stdin = stdin
#self.stdout = stdout
#self.stderr = stderr
self.pidfile = pidfile
self.pool = Pool(processes=4)
I am not setting stdin, stdout and stderr so that I can debug the code with print statements. Also, I have tried moving this pool around to a few places but this is the only place that does not produce exceptions.
Here are the changes to the daemonize
method:
def daemonize(self):
...
# redirect standard file descriptors
#sys.stdout.flush()
#sys.stderr.flush()
#si = open(self.stdin, 'r')
#so = open(self.stdout, 'a+')
#se = open(self.stderr, 'a+', 0)
#os.dup2(si.fileno(), sys.stdin.fileno())
#os.dup2(so.fileno(), sys.stdout.fileno())
#os.dup2(se.fileno(), sys.stderr.fileno())
print self.pool
...
Same thing, I am not redirecting io so that I can debug. The print here is used so that I can check the pools location.
And the stop
method changes:
def stop(self):
...
# Try killing the daemon process
try:
print self.pool
print "closing pool"
self.pool.close()
print "joining pool"
self.pool.join()
print "set pool to None"
self.pool = None
while 1:
print "kill process"
os.kill(pid, SIGTERM)
...
Here the idea is that I not only need to kill the process but also clean up the pool. The self.pool = None
is just a random attempt to solve the issues which didn't work. At first I thought this was a problem with zombie children which was occurring when I had the self.pool.close()
and self.pool.join()
inside the while loop with the os.kill(pid, SIGTERM)
. This is before I decided to start looking at the pool location via the print self.pool
. After doing this, I believe the pools are not the same when the daemon starts and when it stops. Here is some output:
me@pc:~/pyCode/jobQueue$ sudo ./jobQueue.py start
<multiprocessing.pool.Pool object at 0x1c543d0>
me@pc:~/pyCode/jobQueue$ sudo ./jobQueue.py stop
<multiprocessing.pool.Pool object at 0x1fb7450>
closing pool
joining pool
set pool to None
kill process
kill process
... [ stuck in infinite loop]
The different locations of the objects suggest to me that they are not the same pool and that one of them is probably the zombie?
After CTRL+C
, here is what I get from ps aux|grep jobQueue
:
root 21161 0.0 0.0 50384 5220 ? Ss 22:59 0:00 /usr/bin/python ./jobQueue.py start
root 21162 0.0 0.0 0 0 ? Z 22:59 0:00 [jobQueue.py] <defunct>
me 21320 0.0 0.0 7624 940 pts/0 S+ 23:00 0:00 grep --color=auto jobQueue
I have tried moving the self.pool = Pool(processes=4)
to a number of different places. If it is moved to the start()' or
daemonize()methods,
print self.pool` will throw an exception saying that it is NoneType. In addition, the location seems to change the number of zombie process that will pop up.
Currently, I have not added the functionality to run anything via the workers. My problem seems completely related to setting up the pool of workers correctly. I would appreciate any information that leads to solving this issue or advice about creating a daemon service that uses a pool of workers to execute a series of commands using Popen
. Since I haven't gotten that far, I do not know what challenges I face ahead. I am thinking I might just need to write my own pool but if there is a nice trick to make the pool work here, it would be amazing.
See Question&Answers more detail:
os