The problem reported by Winston Chang that you cite appears to have been fixed in R 2.15.3. There was a bug in mccollect
that occurred when assigning the worker results to the result list:
if (is.raw(r)) res[[which(pid == pids)]] <- unserialize(r)
This fails if unserialize(r) returns a NULL, since assigning a NULL to a list in this way deletes the corresponding element of the list. This was changed in R 2.15.3 to:
if (is.raw(r)) # unserialize(r) might be null
    res[which(pid == pids)] <- list(unserialize(r))
which is a safe way to assign an unknown value to a list.
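The difference between the two assignment forms is easy to see on an ordinary list. This is only a small illustration with placeholder values, not the actual mccollect code:

```r
# A placeholder result list standing in for three worker results.
res <- list("a", "b", "c")

# Old form: assigning NULL with [[<- deletes element 2, so the list shrinks
# and the later elements shift down by one position.
broken <- res
broken[[2]] <- NULL
length(broken)       # 2

# New form: wrapping the value in list() and assigning with [<- stores the
# NULL as element 2 and keeps the list at its original length.
fixed <- res
fixed[2] <- list(NULL)
length(fixed)        # 3
is.null(fixed[[2]])  # TRUE
```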
So if you're using R <= 2.15.2, the solution is to upgrade to R >= 2.15.3. If you have a problem using R >= 2.15.3, then presumably it's a different problem than the one reported by Winston Chang.
I also read over the issues discussed in the R-help thread started by Elizabeth Purdom. Without a specific test case, my guess is that the problem is not due to a bug in mclapply because I can reproduce the same symptoms with the following function:
work <- function(i, poison) {
  if (i == poison) quit(save='no')
  i
}
If a worker started by mclapply dies while executing a task for any reason (receiving a signal, seg faulting, exiting), mclapply will return a NULL for all of the tasks that were assigned to that worker:
> library(parallel)
> mclapply(1:4, work, 3, mc.cores=2)
[[1]]
NULL
[[2]]
[1] 2
[[3]]
NULL
[[4]]
[1] 4
In this case, NULLs were returned for tasks 1 and 3 due to prescheduling, which assigned both tasks to the same worker, even though only task 3 actually failed.
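To see why task 1 is lost along with task 3, here is a rough sketch of how the tasks could be divided among the workers. It assumes a round-robin split, which matches the output above but is my assumption rather than a quote of the mclapply source; preschedule and lost_tasks are hypothetical helper names:

```r
# Hypothetical helper: the task indices handed to each worker, assuming a
# round-robin split (worker w gets tasks w, w + n_workers, ...).
preschedule <- function(n_tasks, n_workers) {
  n_workers <- min(n_workers, n_tasks)  # never more workers than tasks
  lapply(seq_len(n_workers), function(w) seq(w, n_tasks, by = n_workers))
}

# Hypothetical helper: every task whose result is lost when the worker
# that was running `failed_task` dies.
lost_tasks <- function(failed_task, n_tasks, n_workers) {
  for (tasks in preschedule(n_tasks, n_workers)) {
    if (failed_task %in% tasks) return(tasks)
  }
  integer(0)
}

preschedule(4, 2)    # worker 1 gets tasks 1 and 3, worker 2 gets 2 and 4
lost_tasks(3, 4, 2)  # tasks 1 and 3
```

Under this assumption, the more tasks each worker is given, the more NULLs a single dead worker leaves behind.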
If a worker dies when using a function such as parLapply or clusterApply, an error is reported:
> cl <- makePSOCKcluster(3)
> parLapply(cl, 1:4, work, 3)
Error in unserialize(node$con) : error reading from connection
I've seen many such reports, and I think they tend to come from large programs that use lots of packages, which makes them hard to turn into reproducible test cases.
Of course, in this example, you'll also get an error when using lapply, although the error won't be hidden as it is with mclapply. If the problem doesn't seem to happen when using lapply, it may be because the problem rarely occurs, so it only happens in very large runs that are executed in parallel using mclapply. But it is also possible that the error occurs, not because the tasks are executed in parallel, but because they are executed by forked processes. For example, various graphics operations will fail when executed in a forked process.
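One way to investigate is to find the tasks that came back as NULL and rerun only those with lapply, where the failure isn't hidden. The following is just a sketch: null_tasks and rerun_null_tasks are hypothetical helpers, and it assumes that your function never legitimately returns NULL, since such a result can't be told apart from a dead worker.

```r
# Hypothetical helper: positions of the NULL results in an mclapply result.
null_tasks <- function(res) {
  which(vapply(res, is.null, logical(1)))
}

# Hypothetical helper: rerun only the NULL tasks serially with lapply.
# Caution: if FUN kills the R process (as work() does with quit()), the
# rerun takes down your session instead of a forked worker.
rerun_null_tasks <- function(X, FUN, res, ...) {
  idx <- null_tasks(res)
  # Assign with [<- so that a NULL result can't delete a list element.
  res[idx] <- lapply(X[idx], FUN, ...)
  res
}

# Example using the result shown earlier, with a harmless replacement function.
res <- list(NULL, 2L, NULL, 4L)
null_tasks(res)
rerun_null_tasks(1:4, function(i) i, res)
```

If the rerun succeeds serially, that points toward a problem specific to forked processes rather than to the task itself.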