Writing to and reading from pipes connected to a sub-process is tricky because of the default buffering in both directions. It's extremely easy to get a deadlock where one process or the other (parent or child) is reading from an empty buffer, writing into a full buffer, or doing a blocking read on a buffer that's awaiting data before the system libraries flush it.
For more modest amounts of data, the Popen.communicate() method might be sufficient. However, for data that exceeds its buffering, you'd probably get stalled processes (similar to what you're already seeing?).
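For the modest-data case, a minimal sketch of the communicate() approach might look like the following. The helper name run_with_input and the upper-casing child are made up for illustration; the child is just a Python one-liner so the example does not depend on any external program.

```python
import subprocess
import sys

# Hypothetical child program: reads all of stdin and echoes it upper-cased.
CHILD_CODE = "import sys; sys.stdout.write(sys.stdin.read().upper())"

def run_with_input(data):
    """Send `data` (bytes) to the child's stdin and collect its stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # communicate() writes the input, closes stdin, reads stdout until EOF,
    # and waits for the child to exit.
    out, _ = proc.communicate(data)
    return out, proc.returncode
```

Calling run_with_input(b"hello\n") should hand back the child's output together with its exit status, without the parent ever touching the pipe file descriptors directly.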
You might want to look into the fcntl module and making one or the other (or both) of your file descriptors non-blocking. In that case, of course, you'll have to wrap all reads and/or writes to those file descriptors in exception handling that deals with the "EWOULDBLOCK" events. (I don't remember the exact Python exception that's raised for these.)
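As a rough sketch of the non-blocking idea (Unix only), the helpers below set O_NONBLOCK on a descriptor and turn the "would block" condition into a None return. The function names set_nonblocking and read_available are hypothetical. The sketch assumes os.read() signals this case with an OSError whose errno is EAGAIN or EWOULDBLOCK; on Python 3 that arrives as BlockingIOError, which is a subclass of OSError.

```python
import errno
import fcntl
import os

def set_nonblocking(fd):
    """Add O_NONBLOCK to the descriptor's existing status flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def read_available(fd, size=4096):
    """Return bytes read, b"" at EOF, or None if no data is ready yet."""
    try:
        return os.read(fd, size)
    except OSError as exc:
        # EAGAIN and EWOULDBLOCK both mean "nothing to read right now".
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise
```

With a Popen object you would pass proc.stdout.fileno() to both helpers, and the caller still has to decide what to do when None comes back (sleep, do other work, or wait with select).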
A completely different approach would be for your parent to use the select module and os.fork(), and for the child process to execve() the target program after directly handling any file dup()ing. (Basically you'd be re-implementing parts of Popen(), but with different parent file descriptor (PIPE) handling.)
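A minimal sketch of that fork/execve/select approach, for Unix only, could look like this. The function name run_child is made up, argv[0] is assumed to be an absolute path to the target program, and the parent's descriptors 0 to 2 are assumed to be open. Error handling is deliberately thin.

```python
import os
import select

def run_child(argv, data):
    """Feed `data` to argv's stdin and collect its stdout without deadlocking."""
    in_r, in_w = os.pipe()    # parent writes in_w, child reads in_r as stdin
    out_r, out_w = os.pipe()  # child writes out_w as stdout, parent reads out_r
    pid = os.fork()
    if pid == 0:
        # Child: wire the pipe ends onto stdin/stdout, then exec the target.
        try:
            os.dup2(in_r, 0)
            os.dup2(out_w, 1)
            for fd in (in_r, in_w, out_r, out_w):
                os.close(fd)
            os.execve(argv[0], argv, os.environ)
        finally:
            os._exit(127)  # only reached if the setup or the exec failed
    # Parent: close the ends that now belong to the child.
    os.close(in_r)
    os.close(out_w)
    chunks, pending = [], data
    readers, writers = [out_r], [in_w]
    if not pending:
        os.close(in_w)
        writers = []
    while readers or writers:
        rlist, wlist, _ = select.select(readers, writers, [])
        if wlist:
            try:
                # Writing at most PIPE_BUF bytes will not block once select
                # has reported the pipe as writable.
                n = os.write(in_w, pending[:select.PIPE_BUF])
                pending = pending[n:]
            except OSError:
                pending = b""  # child closed its stdin early
            if not pending:
                os.close(in_w)  # signals EOF to the child
                writers = []
        if rlist:
            chunk = os.read(out_r, 4096)
            if chunk:
                chunks.append(chunk)
            else:
                os.close(out_r)  # EOF: child closed its stdout
                readers = []
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), status
```

Because the parent only reads or writes when select reports the descriptor as ready, it never sits in a blocking write while the child is itself blocked writing its output. Something like run_child(["/bin/cat"], payload) should work for payloads well beyond the size of a pipe buffer.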
Incidentally, .communicate(), at least in the Python 2.5 and 2.6 standard libraries, will only handle about 64K of remote data (on Linux and FreeBSD). This number may vary based on various factors (possibly including the build options used to compile your Python interpreter, or the version of libc being linked to it). It is NOT simply limited by available memory (despite J.F. Sebastian's assertion to the contrary) but is limited to a much smaller value.