Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
369 views
in Technique[技术] by (71.8m points)

c# - Capturing binary output from Process.StandardOutput

In C# (.NET 4.0 running under Mono 2.8 on SuSE) I would like to run an external batch command and capture its ouput in binary form. The external tool I use is called 'samtools' (samtools.sourceforge.net) and among other things it can return records from an indexed binary file format called BAM.

I use Process.Start to run the external command, and I know that I can capture its output by redirecting Process.StandardOutput. The problem is, that's a text stream with an encoding, so it doesn't give me access to the raw bytes of the output. The almost-working solution I found is to access the underlying stream.

Here's my code:

        Process cmdProcess = new Process();
        ProcessStartInfo cmdStartInfo = new ProcessStartInfo();
        cmdStartInfo.FileName = "samtools";

        cmdStartInfo.RedirectStandardError = true;
        cmdStartInfo.RedirectStandardOutput = true;
        cmdStartInfo.RedirectStandardInput = false;
        cmdStartInfo.UseShellExecute = false;
        cmdStartInfo.CreateNoWindow = true;

        cmdStartInfo.Arguments = "view -u " + BamFileName + " " + chromosome + ":" + start + "-" + end;

        cmdProcess.EnableRaisingEvents = true;
        cmdProcess.StartInfo = cmdStartInfo;
        cmdProcess.Start();

        // Prepare to read each alignment (binary)
        var br = new BinaryReader(cmdProcess.StandardOutput.BaseStream);

        while (!cmdProcess.StandardOutput.EndOfStream)
        {
            // Consume the initial, undocumented BAM data 
            br.ReadBytes(23);

// ... more parsing follows

But when I run this, the first 23bytes that I read are not the first 23 bytes in the ouput, but rather somewhere several hundred or thousand bytes downstream. I assume that StreamReader does some buffering and so the underlying stream is already advanced say 4K into the output. The underlying stream does not support seeking back to the start.

And I'm stuck here. Does anyone have a working solution for running an external command and capturing its stdout in binary form? The ouput may be very large so I would like to stream it.

Any help appreciated.

By the way, my current workaround is to have samtools return the records in text format, then parse those, but this is pretty slow and I'm hoping to speed things up by using the binary format directly.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Using StandardOutput.BaseStream is the correct approach, but you must not use any other property or method of cmdProcess.StandardOutput. For example, accessing cmdProcess.StandardOutput.EndOfStream will cause the StreamReader for StandardOutput to read part of the stream, removing the data you want to access.

Instead, simply read and parse the data from br (assuming you know how to parse the data, and won't read past the end of stream, or are willing to catch an EndOfStreamException). Alternatively, if you don't know how big the data is, use Stream.CopyTo to copy the entire standard output stream to a new file or memory stream.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...