Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c# - Consuming an IEnumerable multiple times in one pass

Is it possible to write a higher-order function that causes an IEnumerable to be consumed multiple times but in only one pass and without reading all the data into memory? [See Edit below for a clarification of what I'm looking for.]

For example, in the code below the enumerable is mynums (onto which I've tagged a .Trace() in order to see how many times we enumerate it). The goal is to figure out whether it has any numbers greater than 5, as well as the sum of all of the numbers. Both_TwoPass computes both results, but it enumerates the sequence twice. In contrast, Both_NonStream enumerates it only once, but at the expense of reading it all into memory. In principle it is possible to carry out both of these tasks in a single pass and in a streaming fashion, as shown by Any5Sum, but that is a specific solution. Is it possible to write a function with the same signature as Both_* that is the best of both worlds?

(It seems to me that this should be possible using threads. Is there a better solution using, say, async?)

Edit

Below is a clarification of what I'm looking for. I've included a very down-to-earth description of each property in square brackets.

I'm looking for a function Both with the following characteristics:

  1. It has signature (S1, S2) Both<T, S1, S2>(this IEnumerable<T> tt, Func<IEnumerable<T>, S1>, Func<IEnumerable<T>, S2>) (and produces the "right" output!)
  2. It only iterates the first argument, tt, once. [What I mean by this is that when passed mynums (as defined below) it only outputs mynums: 0 1 2 ... once. This precludes function Both_TwoPass.]
  3. It processes the data from the first argument, tt, in a streaming fashion. [What I mean by this is that, for example, there is not enough memory to hold all the items from tt simultaneously, which precludes function Both_NonStream.]

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp
{
    static class Extensions
    {
        public static IEnumerable<T> Trace<T>(this IEnumerable<T> tt, string msg = "")
        {
            Console.Write(msg);
            try
            {
                foreach (T t in tt)
                {
                    Console.Write(" {0}", t);
                    yield return t;
                }
            }
            finally
            {
                Console.WriteLine('.');
            }
        }

        public static (S1, S2) Both_TwoPass<T, S1, S2>(this IEnumerable<T> tt, Func<IEnumerable<T>, S1> f1, Func<IEnumerable<T>, S2> f2)
        {
            return (f1(tt), f2(tt));
        }

        public static (S1, S2) Both_NonStream<T, S1, S2>(this IEnumerable<T> tt, Func<IEnumerable<T>, S1> f1, Func<IEnumerable<T>, S2> f2)
        {
            var tt2 = tt.ToList();
            return (f1(tt2), f2(tt2));
        }

        public static (bool, int) Any5Sum(this IEnumerable<int> ii)
        {
            int sum = 0;
            bool any5 = false;
            foreach (int i in ii)
            {
                sum += i;
                any5 |= i > 5; // or: if (!any5) any5 = i > 5;
            }
            return (any5, sum);
        }

    }
    class Program
    {
        static void Main()
        {
            var mynums = Enumerable.Range(0, 10).Trace("mynums:");
            Console.WriteLine("TwoPass: (any > 5, sum) = {0}", mynums.Both_TwoPass(tt => tt.Any(k => k > 5), tt => tt.Sum()));
            Console.WriteLine("NonStream: (any > 5, sum) = {0}", mynums.Both_NonStream(tt => tt.Any(k => k > 5), tt => tt.Sum()));
            Console.WriteLine("Manual: (any > 5, sum) = {0}", mynums.Any5Sum());
        }
    }
}

1 Reply


With the computation written as return (f1(tt), f2(tt)), f1 runs to completion before f2 starts, so there is no way to avoid iterating the enumerable more than once. You're basically saying: compute Item1, then compute Item2.

You have to change the model from (Func<IEnumerable<T>, S1>, Func<IEnumerable<T>, S2>) to either (Func<T, S1>, Func<T, S2>) or Func<IEnumerable<T>, (S1, S2)> to be able to run both computations side by side in one pass.
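A per-element function carries no state between items, so in practice the first option needs a seed and a step function for each computation, as in a fold. The following is a minimal sketch under that assumption; AggregateBoth and its parameter names are hypothetical and not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FoldExtensions
{
    // Hypothetical helper: runs two independent folds over the source
    // in a single pass, keeping only the two accumulators in memory.
    public static (A1, A2) AggregateBoth<T, A1, A2>(
        this IEnumerable<T> source,
        A1 seed1, Func<A1, T, A1> step1,
        A2 seed2, Func<A2, T, A2> step2)
    {
        A1 acc1 = seed1;
        A2 acc2 = seed2;
        foreach (T item in source)
        {
            // Each item is handed to both step functions before moving on.
            acc1 = step1(acc1, item);
            acc2 = step2(acc2, item);
        }
        return (acc1, acc2);
    }
}
```

With the question's data this could be called as Enumerable.Range(0, 10).AggregateBoth(false, (any, x) => any || x > 5, 0, (sum, x) => sum + x), which yields (True, 45). The trade-off is that existing LINQ operators such as Any and Sum cannot be passed in directly; each has to be re-expressed as a seed plus a step.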

Your implementation of Any5Sum is basically the second approach (Func<IEnumerable<T>, (S1, S2)>). But there's already a built-in method for that.

Try this:

Console.WriteLine("Aggregate: (any > 5, sum) = {0}",
    mynums
        .Aggregate<int, (bool any5, int sum)>(
            (false, 0),
            (a, x) => (a.any5 | x > 5, a.sum + x)));
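If the original signature has to be kept, the thread-based idea raised in the question can be sketched roughly as follows. This is an illustration under stated assumptions, not a recommended implementation: Both_Threaded, Offer, and the capacity parameter are hypothetical names, each consumer is assumed to enumerate its input at most once, and the queues are left undisposed for brevity.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ThreadedExtensions
{
    // Hypothetical sketch: each consumer runs on its own task and reads from a
    // bounded queue, while the calling thread enumerates the source exactly once.
    public static (S1, S2) Both_Threaded<T, S1, S2>(
        this IEnumerable<T> tt,
        Func<IEnumerable<T>, S1> f1,
        Func<IEnumerable<T>, S2> f2,
        int capacity = 16)
    {
        // Bounded queues keep at most `capacity` items buffered per consumer.
        var q1 = new BlockingCollection<T>(capacity);
        var q2 = new BlockingCollection<T>(capacity);

        Task<S1> t1 = Task.Run(() => f1(q1.GetConsumingEnumerable()));
        Task<S2> t2 = Task.Run(() => f2(q2.GetConsumingEnumerable()));

        try
        {
            foreach (T item in tt)
            {
                Offer(q1, t1, item);
                Offer(q2, t2, item);
                // Stop reading the source once neither consumer needs more items.
                if (t1.IsCompleted && t2.IsCompleted) break;
            }
        }
        finally
        {
            // Signal end of input so consumers such as Sum() can finish.
            q1.CompleteAdding();
            q2.CompleteAdding();
        }
        // Result rethrows (wrapped in AggregateException) if a consumer failed.
        return (t1.Result, t2.Result);
    }

    // Adds an item unless the consumer has already finished, for example
    // because Any() returned early and stopped draining its queue.
    private static void Offer<T>(BlockingCollection<T> queue, Task consumer, T item)
    {
        while (!consumer.IsCompleted)
        {
            if (queue.TryAdd(item, 50)) return; // wait up to 50 ms for free space
        }
    }
}
```

Called as mynums.Both_Threaded(tt => tt.Any(k => k > 5), tt => tt.Sum()), this enumerates mynums once while buffering only a bounded number of items. The cost is two extra tasks and per-item synchronization, so for simple aggregates the Aggregate call above is likely the better fit.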
