Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

linq - Reading a file line by line in C#

I am trying to read some text files, where each line needs to be processed. At the moment I am just using a StreamReader, and then reading each line individually.

I am wondering whether there is a more efficient way (in terms of LoC and readability) to do this using LINQ without compromising operational efficiency. The examples I have seen involve loading the whole file into memory and then processing it. In this case, however, I don't believe that would be very efficient. In the first example the files can get up to about 50k, and in the second example not all lines of the file need to be read (sizes are typically < 10k).

You could argue that nowadays it doesn't really matter for these small files; however, I believe that sort of approach leads to inefficient code.

First example:

// Open file
using(var file = System.IO.File.OpenText(_LstFilename))
{
    // Read file
    while (!file.EndOfStream)
    {
        String line = file.ReadLine();

        // Ignore empty lines
        if (line.Length > 0)
        {
            // Create addon
            T addon = new T();
            addon.Load(line, _BaseDir);

            // Add to collection
            collection.Add(addon);
        }
    }
}

Second example:

// Open file
using (var file = System.IO.File.OpenText(datFile))
{
    // Compile regexs
    Regex nameRegex = new Regex("IDENTIFY (.*)");

    while (!file.EndOfStream)
    {
        String line = file.ReadLine();

        // Check name
        Match m = nameRegex.Match(line);
        if (m.Success)
        {
            _Name = m.Groups[1].Value;

            // Remove me when other values are read
            break;
        }
    }
}

1 Reply

You can write a LINQ-based line reader pretty easily using an iterator block:

static IEnumerable<SomeType> ReadFrom(string file) {
    string line;
    using(var reader = File.OpenText(file)) {
        while((line = reader.ReadLine()) != null) {
            SomeType newRecord = ParseLine(line); // ParseLine: placeholder for your own parsing code
            yield return newRecord;
        }
    }
}

or to make Jon happy:

static IEnumerable<string> ReadFrom(string file) {
    string line;
    using(var reader = File.OpenText(file)) {
        while((line = reader.ReadLine()) != null) {
            yield return line;
        }
    }
}
...
var typedSequence = from line in ReadFrom(path)
                    let record = ParseLine(line)
                    where record.Active // for example
                    select record.Key;

Then you have ReadFrom(...) as a lazily evaluated sequence without buffering, which is perfect for Where etc.
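As a rough sketch of how the two loops from the question might look on top of this iterator: the class name LineQueries and the helpers NonEmptyLines and FindName are made up for illustration, and ReadFrom is the string version shown above. The regex is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class LineQueries
{
    // Same iterator as in the answer: yields one line at a time, nothing is buffered.
    public static IEnumerable<string> ReadFrom(string file)
    {
        string line;
        using (var reader = File.OpenText(file))
        {
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    // First example: skip empty lines; the caller still does Load/Add per line.
    public static IEnumerable<string> NonEmptyLines(string file)
    {
        return ReadFrom(file).Where(line => line.Length > 0);
    }

    // Second example: return the first IDENTIFY value, or null if there is none.
    // FirstOrDefault stops pulling lines once a match is found, like the break.
    public static string FindName(string file)
    {
        var nameRegex = new Regex("IDENTIFY (.*)");
        return ReadFrom(file)
            .Select(line => nameRegex.Match(line))
            .Where(m => m.Success)
            .Select(m => m.Groups[1].Value)
            .FirstOrDefault();
    }
}
```

With something like this, the first loop body reduces to a foreach over NonEmptyLines(_LstFilename), and the second becomes _Name = FindName(datFile). In both cases the file is closed when enumeration ends, because disposing the enumerator runs the using block inside ReadFrom.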

Note that if you use OrderBy or the standard GroupBy, it will have to buffer the data in memory; if you need grouping and aggregation, "PushLINQ" has some fancy code to allow you to perform aggregations on the data but discard it (no buffering). Jon's explanation is here.
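To make the buffering point concrete, here is a small sketch contrasting a streaming operator with a buffering one, plus a hand-rolled single-pass grouped count. This is not PushLINQ's API, just a plain dictionary loop standing in for the same idea; the class name StreamingAggregates and the "first word is the key" rule are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StreamingAggregates
{
    // Streams: Count pulls one item at a time and keeps only a running total.
    public static int CountNonEmpty(IEnumerable<string> lines)
    {
        return lines.Count(line => line.Length > 0);
    }

    // Buffers: OrderBy has to read every item before it can yield the first one.
    public static IEnumerable<string> Sorted(IEnumerable<string> lines)
    {
        return lines.OrderBy(line => line, StringComparer.Ordinal);
    }

    // Single-pass grouped count: memory grows with the number of distinct keys,
    // not with the number of lines. The key here is the text before the first space.
    public static Dictionary<string, int> CountByFirstWord(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            if (line.Length == 0) continue;
            string key = line.Split(' ')[0];
            int current;
            counts.TryGetValue(key, out current); // current stays 0 for a new key
            counts[key] = current + 1;
        }
        return counts;
    }
}
```

Any of these can be fed directly from ReadFrom(path); only Sorted ends up holding the whole file's lines in memory at once.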

