Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
277 views
in Technique[技术] by (71.8m points)

c# - How to split large files efficiently

I'd like to know how I can split a large file without using too many system resources. I'm currently using this code:

public static void SplitFile(string inputFile, int chunkSize, string path)
{
    byte[] buffer = new byte[chunkSize];

    using (Stream input = File.OpenRead(inputFile))
    {
        int index = 0;
        while (input.Position < input.Length)
        {
            using (Stream output = File.Create(path + "" + index))
            {
                int chunkBytesRead = 0;
                while (chunkBytesRead < chunkSize)
                {
                    int bytesRead = input.Read(buffer, 
                                               chunkBytesRead, 
                                               chunkSize - chunkBytesRead);

                    if (bytesRead == 0)
                    {
                        break;
                    }
                    chunkBytesRead += bytesRead;
                }
                output.Write(buffer, 0, chunkBytesRead);
            }
            index++;
        }
    }
}

The operation takes 52.370 seconds to split a 1.6GB file into 14mb files. I'm not concerned about how long the operation takes, I'm more concerned about the system resource used as this app will be deployed to a shared hosting environment. Currently this operation max's out my systems HDD IO usage at 100%, and slows my system down considerably. CPU usage is low; RAM ramps up a bit, but seems fine.

Is there a way I can restrict this operation from using too many resources?

Thanks

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It seems odd to assemble each output file in memory; I suspect you should be running an inner buffer (maybe 20k or something) and calling Write more frequently.

Ultimately, if you need IO, you need IO. If you want to be courteous to a shared hosting environment you could add deliberate pauses - maybe short pauses within the inner loop, and a longer pause (maybe 1s) in the outer loop. This won't affect your overall timing much, but may help other processes get some IO.

Example of a buffer for the inner-loop:

public static void SplitFile(string inputFile, int chunkSize, string path)
{
    const int BUFFER_SIZE = 20 * 1024;
    byte[] buffer = new byte[BUFFER_SIZE];

    using (Stream input = File.OpenRead(inputFile))
    {
        int index = 0;
        while (input.Position < input.Length)
        {
            using (Stream output = File.Create(path + "" + index))
            {
                int remaining = chunkSize, bytesRead;
                while (remaining > 0 && (bytesRead = input.Read(buffer, 0,
                        Math.Min(remaining, BUFFER_SIZE))) > 0)
                {
                    output.Write(buffer, 0, bytesRead);
                    remaining -= bytesRead;
                }
            }
            index++;
            Thread.Sleep(500); // experimental; perhaps try it
        }
    }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...