Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

php - Stream run-time generated gzip file with proc_open

I'm trying to stream a tar.gz without buffering anything in memory or saving data to disk. I need to gzip a bunch of PDF files (~100 KB per file).

Everything seems to work fine if small text files (10-20 bytes) are sent through the script: the user downloads a readable tar.gz file. But when sending real data (PDF files generated at run time), the script blocks and stops.

Below is a snippet of the code. Why does the script block when writing to stdin after a couple of iterations of the loop? It stops at this point, waiting for something.

Every step is logged to a file; the message logged just before writing to stdin is the last one in the log.

$proc = proc_open('gzip - -c', [
    0   => ['pipe', 'r'],
    1   => ['pipe', 'w'],
    2   => ['pipe', 'w']
], $pipes);

stream_set_read_buffer($pipes[1], 0);
stream_set_read_buffer($pipes[2], 0);

stream_set_blocking($pipes[1], false);
stream_set_blocking($pipes[2], false);

while(true){
    log_step('file stream');
    // fetching data from database and generating PDF file as tar stream (string)

    log_step('stdin: '.strlen($tar_string));
    fwrite($pipes[0], $tar_string); // <--- in the second iteration the script blocks/stops here!
    log_step('stdin done!');
    
    if($output = stream_get_contents($pipes[1])){
        log_step('output: '.strlen($output));
        echo $output;
    }
}

Output log file

2021-01-26 10:28:29 file stream
2021-01-26 10:28:29 stdin: 116224
2021-01-26 10:28:29 stdin done!
2021-01-26 10:28:29 output: 32768
2021-01-26 10:28:29 file stream
2021-01-26 10:28:29 stdin: 116736

Full code

$proc = proc_open('gzip - -c', [
    0   => ['pipe', 'r'],
    1   => ['pipe', 'w'],
    2   => ['pipe', 'w']
], $pipes);
stream_set_read_buffer($pipes[1], 0);
stream_set_read_buffer($pipes[2], 0);
stream_set_blocking($pipes[1], false);
stream_set_blocking($pipes[2], false);

//  get data from database
while($row = $result->fetch()){
    //  generate PDF

    $filename = $pdf['name'];
    $filesize = strlen($pdf['data']);

    $header = pack(
        'a100a8a8a8a12A12a8a1a100a255',
        $filename,
        sprintf('%6s ',     ''),
        sprintf('%6s ',     ''),
        sprintf('%6s ',     ''),
        sprintf('%11s ',    decoct($filesize)), // tar stores the size as octal digits
        sprintf('%11s',     ''),
        sprintf('%8s ',     ' '),
        0,
        '',
        ''
    );
    
    $checksum = 0;
    for($i=0; $i<512; $i++){
        $checksum += ord($header[$i]); // square brackets: the {} offset syntax was removed in PHP 8
    }
    
    $checksum_data = pack(
        'a8',
        sprintf('%6s ',     decoct($checksum))
    );
    
    for($i=0, $j=148; $i<8; $i++, $j++){
        $header[$j] = $checksum_data[$i];
    }
    
    fwrite($pipes[0], $header.$pdf['data'].pack(
        'a'.(512 * ceil($filesize / 512) - $filesize),
        ''
    ));
    
    if($output = stream_get_contents($pipes[1])){
        echo $output;
    }
}

fwrite($pipes[0], pack('a512', ''));
fclose($pipes[0]);

while(true){
    if($output = stream_get_contents($pipes[1])){
        echo $output;
    }
    
    if(!proc_get_status($proc)['running']){
        foreach($pipes as $pipe){
            if(is_resource($pipe)){
                fclose($pipe);
            }
        }
        proc_close($proc);
        
        break;
    }
}
Question from: https://stackoverflow.com/questions/65883806/stream-run-time-generated-gzip-file-with-proc-open


1 Reply


The reason your script doesn’t progress is that it is attempting to write more data into the pipe than the gzip process is able to handle at once. The situation looks roughly like this:

  1. Your script writes 116736 bytes into the pipe.
  2. The gzip process reads some of it from its standard input, compresses it, and outputs compressed data on its standard output.
  3. The PHP process is blocked until the gzip process reads the rest of the input it wrote to the pipe.
  4. The gzip process is blocked until the PHP process reads the compressed output it wrote to standard output.

And so your script finds itself in a deadlock.
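
The blocking in steps 3 and 4 comes from the finite capacity of the kernel buffer behind each pipe. The sketch below illustrates that limit under some assumptions: it runs on a Unix-like system, and it uses a stream_socket_pair as a stand-in for the pipe to gzip (the socket pair and the 8 MiB payload are illustrative choices, not part of the original script). A non-blocking write to an endpoint that nobody reads from accepts only part of the data; a blocking write would wait at that point instead.

```php
<?php
// Stand-in for the pipe between PHP and gzip: a connected pair of
// Unix sockets where nothing ever reads from the second endpoint.
$pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
if ($pair === false) {
    throw new Exception('stream_socket_pair failed');
}
[$writer, $reader] = $pair;

// In non-blocking mode fwrite() returns once the kernel buffer is full
// instead of waiting for the other side to drain it.
stream_set_blocking($writer, false);

$payload = str_repeat('x', 8 * 1024 * 1024); // 8 MiB, more than the buffer holds
$written = fwrite($writer, $payload);

// The accepted byte count depends on the operating system's buffer size.
printf("requested %d bytes, accepted %d\n", strlen($payload), (int) $written);

fclose($writer);
fclose($reader);
```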

The root of the problem is that, in blocking mode, PHP’s fwrite keeps trying until the entire buffer has been written to the stream, so it cannot return while gzip is stalled waiting for its own output to be read. This can be worked around by also enabling non-blocking mode on the standard input pipe and tracking how much input has actually been written. For example:

$proc = proc_open('gzip -c -', [
    0 => ['pipe', 'r'],
    1 => ['pipe', 'w'],
], $pipes);

stream_set_read_buffer($pipes[1], 0);

stream_set_blocking($pipes[0], false);
stream_set_blocking($pipes[1], false);

$tar_string = '';
for (;;) {
    if ($tar_string === '') {
        if (/* more input available */)
            $tar_string = /* read more input */;
        else {
            $tar_string = null;
            fclose($pipes[0]);
        }
    }

    if ($tar_string !== null) {
        $written = fwrite($pipes[0], $tar_string);
        if ($written === false)
            throw new Exception('write error');
        $tar_string = substr($tar_string, $written);
    }

    /* THIS IS JUST SOME DUMB DEMONSTRATIVE CODE, DO NOT COPY-PASTE */

    for (;;) {
        $outbuf = fread($pipes[1], 69420);
        if ($outbuf === false)
            throw new Exception('read error');
        if ($outbuf === '')
            break; // nothing available right now, or end of stream
        echo $outbuf;
    }
    
    if (feof($pipes[1]))
        break;
}

The above will superficially work. A big downside is that it performs extremely poorly: when the gzip process is ready neither to read nor to write any data, the script keeps busy-looping uselessly and takes CPU time away from the gzip process, which actually needs it.

Avoiding the busy loop takes two things:

  • a call such as poll or select, which signals when a stream is ready to be read from or written into and otherwise gives up CPU time to other processes that may need it; PHP exposes this as stream_select;
  • I/O primitives that return immediately after a partial read or write instead of trying to process the entire buffer; fread and fwrite behave this way on non-blocking streams.

Combining the two makes the proc_open approach workable, at the cost of a noticeably more complicated loop.
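
For reference, PHP’s built-in stream_select wraps select and can take the place of the busy loop. The following is a minimal sketch rather than a definitive implementation. It assumes a Unix-like system with gzip on the PATH (stream_select does not work on proc_open pipes under Windows), and pump_through, its parameters and the $emit callback are hypothetical names introduced only for illustration.

```php
<?php
// Hypothetical helper (name and signature are illustrative): pushes $chunks
// through $command and hands every piece of output to $emit as it arrives.
function pump_through(string $command, iterable $chunks, callable $emit): void
{
    $proc = proc_open($command, [0 => ['pipe', 'r'], 1 => ['pipe', 'w']], $pipes);
    if (!is_resource($proc)) {
        throw new Exception('proc_open failed');
    }
    [$stdin, $stdout] = $pipes;
    stream_set_blocking($stdin, false);
    stream_set_blocking($stdout, false);

    // Wrap the input in a generator so arrays and iterators are consumed alike.
    $source = (function () use ($chunks) { yield from $chunks; })();
    $pending = '';

    while (true) {
        // Refill the pending input; close stdin once the source is exhausted.
        while ($stdin !== null && $pending === '') {
            if ($source->valid()) {
                $pending = (string) $source->current();
                $source->next();
            } else {
                fclose($stdin);
                $stdin = null;
            }
        }

        $read = [$stdout];
        $write = $stdin !== null ? [$stdin] : null;
        $except = null;
        // Sleep until the command can take input or has output ready.
        if (stream_select($read, $write, $except, null) === false) {
            throw new Exception('stream_select failed');
        }

        if ($write) {
            $written = fwrite($stdin, $pending);
            if ($written === false) {
                throw new Exception('write error');
            }
            $pending = (string) substr($pending, $written); // keep the unwritten rest
        }

        if ($read) {
            $out = fread($stdout, 65536);
            if ($out === false) {
                throw new Exception('read error');
            }
            if ($out !== '') {
                $emit($out);
            } elseif (feof($stdout)) {
                break; // the command closed its stdout: all output delivered
            }
        }
    }

    fclose($stdout);
    proc_close($proc);
}
```

A caller could pass 'gzip -c' as the command, a generator that yields one tar-formatted string per PDF as the chunks, and a callback that echoes each piece, so that neither the input nor the output is ever held in memory as a whole.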

There is, however, a much better solution for this problem that avoids proc_open entirely, and instead implements gzip compression using the zlib extension, like this:

$zctx = deflate_init(ZLIB_ENCODING_GZIP);
if ($zctx === false)
    throw new Exception('deflate_init failed');

while (/* more data available */) {
    $input = /* get more data */;
    $data = deflate_add($zctx, $input, ZLIB_NO_FLUSH);
    if ($data === false)
        throw new Exception('deflate_add failed');
    echo $data;
}

$data = deflate_add($zctx, '', ZLIB_FINISH);
if ($data === false)
    throw new Exception('deflate_add failed');
echo $data;

unset($zctx); // free compressor resources

deflate_init and deflate_add are available since PHP 7, assuming that the zlib extension was enabled while building PHP. Calling a library is preferable to invoking a subprocess (in any language, in fact) as it is much more lightweight: putting everything in the same process avoids memory and context-switching overheads.
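
The zlib loop above can also be wrapped so that the placeholders become an ordinary iterable. This is a sketch under assumptions: gzip_stream and its parameter are hypothetical names, and the caller is assumed to supply the tar-formatted strings in order.

```php
<?php
// Hypothetical helper (name and parameter are illustrative): lazily
// gzip-compresses a sequence of strings, yielding output as zlib produces it.
function gzip_stream(iterable $chunks): Generator
{
    $zctx = deflate_init(ZLIB_ENCODING_GZIP);
    if ($zctx === false) {
        throw new Exception('deflate_init failed');
    }

    foreach ($chunks as $chunk) {
        $data = deflate_add($zctx, $chunk, ZLIB_NO_FLUSH);
        if ($data === false) {
            throw new Exception('deflate_add failed');
        }
        if ($data !== '') {
            yield $data; // zlib may buffer input and emit nothing for a while
        }
    }

    // Flush whatever zlib still holds and write the gzip trailer.
    $data = deflate_add($zctx, '', ZLIB_FINISH);
    if ($data === false) {
        throw new Exception('deflate_add failed');
    }
    yield $data;
}
```

A caller would iterate over gzip_stream(...) and echo each piece, optionally calling flush() after every echo so the client receives data while later files are still being generated.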

