Welcome to the OGeek Q&A Community for programmers and developers: Open, Learning and Share


0 votes
566 views
in Technique by (71.8m points)

c# - FileHelpers throws OutOfMemoryException when parsing large csv file

I'm trying to parse a very large CSV file with FileHelpers (http://www.filehelpers.net/). The file is 1 GB zipped and about 20 GB unzipped.

        string fileName = @"c:\myfile.csv.gz";
        using (var fileStream = File.OpenRead(fileName))
        {
            using (GZipStream gzipStream = new GZipStream(fileStream, CompressionMode.Decompress, false))
            {
                using (TextReader textReader = new StreamReader(gzipStream))
                {
                    var engine = new FileHelperEngine<CSVItem>();
                    CSVItem[] items = engine.ReadStream(textReader);                        
                }
            }
        }

FileHelpers then throws an OutOfMemoryException.

    Test failed: Exception of type 'System.OutOfMemoryException' was thrown.
    System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
       at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
       at System.Text.StringBuilder.Append(Char value, Int32 repeatCount)
       at System.Text.StringBuilder.Append(Char value)
       at FileHelpers.StringHelper.ExtractQuotedString(LineInfo line, Char quoteChar, Boolean allowMultiline)
       at FileHelpers.DelimitedField.ExtractFieldString(LineInfo line)
       at FileHelpers.FieldBase.ExtractValue(LineInfo line)
       at FileHelpers.RecordInfo.StringToRecord(LineInfo line)
       at FileHelpers.FileHelperEngine`1.ReadStream(TextReader reader, Int32 maxRecords, DataTable dt)
       at FileHelpers.FileHelperEngine`1.ReadStream(TextReader reader)

Is it possible to parse a file this big with FileHelpers? If not, can anyone recommend an approach for parsing files this big? Thanks.



1 Reply

0 votes
by (71.8m points)

You must process the file record by record, like this:

  string fileName = @"c:\myfile.csv.gz";
  using (var fileStream = File.OpenRead(fileName))
  {
      using (GZipStream gzipStream = new GZipStream(fileStream, CompressionMode.Decompress, false))
      {
          using (TextReader textReader = new StreamReader(gzipStream))
          {
              var engine = new FileHelperAsyncEngine<CSVItem>();
              using (engine.BeginReadStream(textReader))
              {
                  foreach (var record in engine)
                  {
                      // Work with each item
                  }
              }
          }
      }
  }

If you use this async approach, you will only use the memory for one record at a time, and it will be much faster.
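For reference, both snippets assume a `CSVItem` class decorated with FileHelpers mapping attributes; it is never shown in the question. A minimal sketch (the delimiter, field names, and date format here are hypothetical and must be adjusted to the actual file) might look like:

```csharp
using System;
using FileHelpers;

// Hypothetical record layout for the CSV rows.
[DelimitedRecord(",")]
[IgnoreFirst(1)] // skip the header row, if the file has one
public class CSVItem
{
    public int Id;

    // Handles values that may be wrapped in double quotes.
    [FieldQuoted('"', QuoteMode.OptionalForBoth)]
    public string Name;

    // Parses dates in a fixed format instead of the current culture's.
    [FieldConverter(ConverterKind.Date, "yyyy-MM-dd")]
    public DateTime Timestamp;
}
```

Keeping fields like `Name` quoted-optional matters here: the stack trace shows the crash inside `ExtractQuotedString`, which suggests an unbalanced quote caused FileHelpers to keep accumulating characters across lines; checking the quoting configuration against the real data is worthwhile regardless of the streaming fix.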


