Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
831 views
in Technique[技术] by (71.8m points)

linux - Why do I have to `wait()` for child processes?

Even though the linux man page for wait 1 explains very well that you need to wait() for child processes for them no to turn into zombies, it does not tell why at all.

I planned my program (which is my first multithreaded one, so excuse my naivity) around a for(;;)ever loop that starts child processes which get exec()ed away and are sure to terminate on their own.

I cannot use wait(NULL) because that makes parallel computation impossible, therefore I'll probably have to add a process table that stores the child pids and have to use waitpid - not immideately, but after some time has passed - which is a problem, because the running time of the children varies from few microseconds to several minutes. If I use waitpid too early, my parent process will get blocked, when I use it too late, I get overwhelmed by zombies and cannot fork() anymore, which is not only bad for my process, but can cause unexpected problems on the whole system.

I'll probably have to program some logic of using some maximum number of children and block the parent when that number is reached - but that should be not necessary because most of the children terminate quickly. The other solution that I can think of (creating a two-tiered parent process that spawns concurrent children which in turn concurrently spawn and wait for grandchildren) is too complicated for me right now. Possibly I could also find a non-blocking function to check for the children and use waitpid only when they have terminated.

Nevertheless the question:

Why does Linux keep zombies at all? Why do I have to wait for my children? Is this to enforce discipline on parent processes? In decades of using Linux I have never got anything useful out of zombie processes, I don't quite get the usefulness of zombies as a "feature".

If the answer is that parent processes need to have a way to find out what happened to their children, then for god's sake there is no reason to count zombies as normal processes and forbid the creation of non-zombie processes just because there are too many zombies. On the system I'm currently developing for I can only spawn 400 to 500 processes before everything grinds to halt (it's a badly maintained CentOS system running on the cheapest VServer I could find - but still 400 zombies are less than a few kB of information)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I'll probably have to add a process table that stores the child pids and have to use waitpid - not immideately, but after some time has passed - which is a problem, because the running time of the children varies from few microseconds to several minutes. If I use waitpid too early, my parent process will get blocked

Check out the documentation for waitpid. You can tell waitpid to NOT block (i.e., return immediately if there are no children to reap) using the WNOHANG option. Moreover, you don't need to give waitpid a PID. You can specify -1, and it will wait for any child. So calling waitpid as below fits your no-blocking constraint and no-saving-pids constraint:

waitpid( -1, &status, WNOHANG );

If you really don't want to properly handle process creation, then you can give the reaping responsibility to init by forking twice, reaping the child, and giving the exec to the grandchild:

pid_t temp_pid, child_pid;
temp_pid = fork();
if( temp_pid == 0 ){
    child_pid = fork();
    if( child_pid == 0 ){
        // exec()
        error( EXIT_FAILURE, errno, "failed to exec :(" );
    } else if( child_pid < 0 ){
        error( EXIT_FAILURE, errno, "failed to fork :(" );
    }
    exit( EXIT_SUCCESS );
} else if( temp_pid < 0 ){
    error( EXIT_FAILURE, errno, "failed to fork :(" );
} else {
    wait( temp_pid );
}

In the above code snippet, the child process forks its own child, immediately exists, and then is immediately reaped by the parent. The grandchild is orphaned, adopted by init, and will be reaped automatically.

Why does Linux keep zombies at all? Why do I have to wait for my children? Is this to enforce discipline on parent processes? In decades of using Linux I have never got anything useful out of zombie processes, I don't quite get the usefulness of zombies as a "feature". If the answer is that parent processes need to have a way to find out what happened to their children, then for god's sake there is no reason to count zombies as normal processes and forbid the creation of non-zombie processes just because there are too many zombies.

How else do you propose one may efficiently retrieve the exit code of a process? The problem is that the mapping of PID <=> exit code (et al.) must be one to one. If the kernel released the PID of a process as soon as it exits, reaped or not, and then a new process inherits that same PID and exits, how would you handle storing two codes for one PID? How would an interested process retrieve the exit code for the first process? Don't assume that no one cares about exit codes simply because you don't. What you consider to be a nuisance/bug is widely considered useful and clean.

On the system I'm currently developing for I can only spawn 400 to 500 processes before everything grinds to halt (it's a badly maintained CentOS system running on the cheapest VServer I could find - but still 400 zombies are less than a few kB of information)

Something about making a widely accepted kernel behavior a scapegoat for what are clearly frustrations over a badly-maintained/cheap system doesn't seem right.

Typically, your maximum number of processes is limited only by your memory. You can see your limit with:

cat /proc/sys/kernel/threads-max

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...