You don't indicate how many processes there may be, but no resource is unlimited, and you should cap their number or you'll see a rapid degradation of performance as you reach saturation.
This matters even more when going out on the network, since you may be annoying a server (and things will also stop getting faster quite soon). Perhaps run up to a few tens of processes at a time?
One option then is to limit the number of parallel downloads using Parallel::ForkManager. It has a way to return data to the parent, so a child can report failure. Its run_on_finish method can then check each batch for such a flag (of failure) and set a variable that controls further forking.
```perl
use warnings;
use strict;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(2);  # only 2 for a manageable demo

my $stop_forking;

# The sub gets 6 parameters, but only the first (pid) is always defined.
# The last one is what a child process may have passed.
$pm->run_on_finish(
    sub { $stop_forking = 1 if defined $_[-1] }
);

for my $i (0..9) {
    last if $stop_forking;

    $pm->start and next;    # forks

    my $ret = run_job($i);  # child process

    # Pass data to the parent under a condition (must be a reference)
    if ($ret eq 'FAIL') { $pm->finish(0, \$ret) }  # child exits
    else                { $pm->finish }
}
$pm->wait_all_children;

sub run_job {
    my ($i) = @_;
    sleep 2;
    print "Child: job $i exiting\n";
    return ($i == 3 ? 'FAIL' : 1);
}
```
This stops forking after the batch of jobs within which $i == 3. Add prints for diagnostics.
The "callback" run_on_finish runs only once a whole batch completes.† The anonymous sub in it always receives 6 arguments, but only the first one, the child pid, is always defined. The last one carries data possibly passed by the child, and when that happens we set the flag. A child returns data by passing a reference to the finish method. To merely indicate a condition we can pass a reference to anything; I use \$ret as an example of passing actual data.
See the documentation for more, but this does what you ask. For yet far more, see Forks::Super.
If you wish to fork as you do, I'd first put a little sleep in there, so you don't bombard the server with too many requests. Your children can talk to the parent using socketpair: the failed child writes, while all others simply close their socket. The parent keeps checking, for example with can_read from IO::Select. There is an example in perlipc. Since you only need children to write to the parent, a pipe would suffice as well.
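Here is a minimal sketch of the pipe variant, with the same kind of toy job loop as above (the job numbers and the "job 3 fails" condition are only placeholders):

```perl
use warnings;
use strict;
use IO::Select;

# One-way pipe: children write, parent reads
pipe my $reader, my $writer or die "Can't open a pipe: $!";
my $sel = IO::Select->new($reader);

my @pids;
for my $i (0..4) {
    # Non-blocking check: has any child reported failure yet?
    last if $sel->can_read(0);

    my $pid = fork // die "Can't fork: $!";
    if ($pid == 0) {                                 # child
        close $reader;
        print $writer "job $i failed\n" if $i == 3;  # stand-in for a failed job
        close $writer;                               # others just close their end
        exit;
    }
    push @pids, $pid;
    sleep 1;  # a little pause between forks
}

close $writer;               # parent's copy of the write end
waitpid $_, 0 for @pids;
print while <$reader>;       # see what, if anything, was reported
```

Since the parent only polls between forks, a very slow child's report may arrive a batch late; the little sleep makes that unlikely in this demo.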
You can also do it with a signal. The child that fails sends (say) a SIGUSR1 to the parent, which the parent traps, setting a global variable that controls further forks. This is simpler since the parent only traps that one signal and doesn't care where it comes from. See perlipc and the sigtrap pragma.
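A sketch of that, again with a placeholder failure condition:

```perl
use warnings;
use strict;

my $stop_forking;
$SIG{USR1} = sub { $stop_forking = 1 };  # parent traps SIGUSR1

my @pids;
for my $i (0..9) {
    last if $stop_forking;

    my $pid = fork // die "Can't fork: $!";
    if ($pid == 0) {                             # child
        my $ret = ($i == 3 ? 'FAIL' : 'ok');     # stand-in for real work
        kill 'USR1', getppid if $ret eq 'FAIL';  # tell the parent to stop
        exit;
    }
    push @pids, $pid;
    sleep 1;  # a little pause between forks
}
waitpid $_, 0 for @pids;
```

Note that the signal may interrupt the parent's sleep; here that's harmless, as the flag is checked at the top of the loop either way.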
You can also use a file, much like you do, which is probably simplest, since here you don't care about race issues (whether children's writes overlap) but only about an empty file showing up.
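For instance (the file name and the failure condition are arbitrary placeholders):

```perl
use warnings;
use strict;

my $flag_file = 'stop_forking.flag';  # any agreed-upon name
unlink $flag_file;                    # start clean

my @pids;
for my $i (0..9) {
    last if -e $flag_file;            # a child has signaled failure

    my $pid = fork // die "Can't fork: $!";
    if ($pid == 0) {                  # child
        if ($i == 3) {                # stand-in for a failed job
            open my $fh, '>', $flag_file or die "Can't create flag file: $!";
            close $fh;                # an empty file is enough
        }
        exit;
    }
    push @pids, $pid;
    sleep 1;  # a little pause between forks
}
waitpid $_, 0 for @pids;
unlink $flag_file;
```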
However, in all these you'd also want to limit the number of parallel processes.
Finally, there are also modules that help with running external commands, for example IPC::Run.
† To run the callback right as each child exits use reap_finished_children. See this post.