Is it possible to write a higher-order function that lets an IEnumerable be consumed by several functions, but in only one pass and without reading all the data into memory? [See the Edit below for a clarification of what I'm looking for.]
For example, in the code below the enumerable is mynums (onto which I've tagged a .Trace() in order to see how many times we enumerate it). The goal is to figure out whether it has any numbers greater than 5, as well as the sum of all of the numbers. The function Both_TwoPass applies both computations, but it enumerates mynums twice. In contrast, Both_NonStream enumerates it only once, but at the expense of reading it into memory. In principle it is possible to carry out both tasks in a single pass and in a streaming fashion, as shown by Any5Sum, but that is a special-purpose solution. Is it possible to write a function with the same signature as Both_* that gets the best of both worlds?
(It seems to me that this should be possible using threads. Is there a better solution using, say, async?)
Edit
Below is a clarification of what I'm looking for. I've included a down-to-earth description of each property in square brackets.
I'm looking for a function Both with the following characteristics:
- It has the signature
  (S1, S2) Both<T, S1, S2>(this IEnumerable<T> tt, Func<IEnumerable<T>, S1> f1, Func<IEnumerable<T>, S2> f2)
  (and produces the "right" output!).
- It iterates the first argument, tt, only once. [What I mean by this is that when passed mynums (as defined below) it outputs mynums: 0 1 2 ... only once. This rules out Both_TwoPass.]
- It processes the data from the first argument, tt, in a streaming fashion. [What I mean by this is that, for example, there may not be enough memory to hold all the items from tt simultaneously, which rules out Both_NonStream.]
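A minimal sketch of the thread-based idea, under stated assumptions: it uses .NET's BlockingCollection<T> (System.Collections.Concurrent) and Task.Run, and the class name ThreadedBothExtensions, the helpers Consume and TryPut, and the per-consumer buffer size Capacity = 16 are hypothetical choices, not an existing API. The calling thread enumerates tt exactly once and copies each item into two bounded queues, while f1 and f2 run concurrently on thread-pool threads, each pulling from its own queue, so only a bounded number of items is buffered at any time. It assumes f1 and f2 enumerate their argument at most once (a second pass would see no data) and are safe to run on another thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThreadedBothExtensions
{
    // Hypothetical bound on how many items may be buffered per consumer.
    const int Capacity = 16;

    // Sketch: tt is enumerated once, on the calling thread, and each item is
    // copied into two bounded queues; f1 and f2 run concurrently on thread-pool
    // threads, each pulling from its own queue.
    public static (S1, S2) Both<T, S1, S2>(this IEnumerable<T> tt,
        Func<IEnumerable<T>, S1> f1, Func<IEnumerable<T>, S2> f2)
    {
        var q1 = new BlockingCollection<T>(Capacity);
        var q2 = new BlockingCollection<T>(Capacity);
        // Each source is cancelled when its consumer returns (or throws), so the
        // producer never blocks forever on a queue that nobody reads any more.
        var done1 = new CancellationTokenSource();
        var done2 = new CancellationTokenSource();

        Task<S1> t1 = Task.Run(() => Consume(f1, q1, done1));
        Task<S2> t2 = Task.Run(() => Consume(f2, q2, done2));

        bool live1 = true, live2 = true;
        try
        {
            foreach (T t in tt)   // the only enumeration of tt
            {
                if (live1) live1 = TryPut(q1, t, done1.Token);
                if (live2) live2 = TryPut(q2, t, done2.Token);
                if (!live1 && !live2) break;   // both consumers finished early
            }
        }
        finally
        {
            // Lets GetConsumingEnumerable() end once each queue is drained.
            q1.CompleteAdding();
            q2.CompleteAdding();
        }
        // GetAwaiter().GetResult() rethrows a consumer's exception unwrapped.
        return (t1.GetAwaiter().GetResult(), t2.GetAwaiter().GetResult());
    }

    // Runs one consumer over its queue and signals when it is done.
    static S Consume<T, S>(Func<IEnumerable<T>, S> f, BlockingCollection<T> q,
        CancellationTokenSource done)
    {
        try { return f(q.GetConsumingEnumerable()); }
        finally { done.Cancel(); }
    }

    // Returns false once the consumer has finished (its token is cancelled).
    static bool TryPut<T>(BlockingCollection<T> q, T item, CancellationToken token)
    {
        try { q.Add(item, token); return true; }
        catch (OperationCanceledException) { return false; }
    }
}
```

With mynums from the program below, mynums.Both(tt => tt.Any(k => k > 5), tt => tt.Sum()) should print the mynums trace only once and return (True, 45). When Any stops early, its queue is abandoned via the cancellation token instead of blocking the producer. The price is two extra threads and per-item synchronization, which for a cheap source may well cost more than a second pass.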
using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp
{
    static class Extensions
    {
        // Passes the items of tt through unchanged, printing each one as it is
        // enumerated, so we can see how many times (and how far) tt is iterated.
        public static IEnumerable<T> Trace<T>(this IEnumerable<T> tt, string msg = "")
        {
            Console.Write(msg);
            try
            {
                foreach (T t in tt)
                {
                    Console.Write(" {0}", t);
                    yield return t;
                }
            }
            finally
            {
                // Runs when enumeration completes or is abandoned early (e.g. by Any).
                Console.WriteLine('.');
            }
        }

        // Streams, but enumerates tt twice.
        public static (S1, S2) Both_TwoPass<T, S1, S2>(this IEnumerable<T> tt, Func<IEnumerable<T>, S1> f1, Func<IEnumerable<T>, S2> f2)
        {
            return (f1(tt), f2(tt));
        }

        // Enumerates tt once, but buffers all of it in memory.
        public static (S1, S2) Both_NonStream<T, S1, S2>(this IEnumerable<T> tt, Func<IEnumerable<T>, S1> f1, Func<IEnumerable<T>, S2> f2)
        {
            var tt2 = tt.ToList();
            return (f1(tt2), f2(tt2));
        }

        // Single pass and streaming, but specific to these two computations.
        public static (bool, int) Any5Sum(this IEnumerable<int> ii)
        {
            int sum = 0;
            bool any5 = false;
            foreach (int i in ii)
            {
                sum += i;
                any5 |= i > 5; // or: if (!any5) any5 = i > 5;
            }
            return (any5, sum);
        }
    }

    class Program
    {
        static void Main()
        {
            var mynums = Enumerable.Range(0, 10).Trace("mynums:");
            Console.WriteLine("TwoPass: (any > 5, sum) = {0}", mynums.Both_TwoPass(tt => tt.Any(k => k > 5), tt => tt.Sum()));
            Console.WriteLine("NonStream: (any > 5, sum) = {0}", mynums.Both_NonStream(tt => tt.Any(k => k > 5), tt => tt.Sum()));
            Console.WriteLine("Manual: (any > 5, sum) = {0}", mynums.Any5Sum());
        }
    }
}
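For comparison, Any5Sum can be generalized into a single-pass, constant-memory helper if the signature is allowed to change: instead of Func<IEnumerable<T>, S>, each computation is supplied as a seed and a step function, in the style of LINQ's Enumerable.Aggregate. This is only a sketch, and FoldExtensions and Aggregate2 are hypothetical names. It needs no threads, but it cannot stop early the way Any does, and it does not satisfy the Both signature asked for above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FoldExtensions
{
    // Hypothetical helper: runs two folds side by side over a single
    // enumeration of tt, keeping only the two accumulators in memory.
    public static (S1, S2) Aggregate2<T, S1, S2>(
        this IEnumerable<T> tt,
        S1 seed1, Func<S1, T, S1> step1,
        S2 seed2, Func<S2, T, S2> step2)
    {
        S1 acc1 = seed1;
        S2 acc2 = seed2;
        foreach (T t in tt)   // the only enumeration of tt
        {
            acc1 = step1(acc1, t);
            acc2 = step2(acc2, t);
        }
        return (acc1, acc2);
    }
}
```

For example, mynums.Aggregate2(false, (any, k) => any || k > 5, 0, (sum, k) => sum + k) enumerates mynums once and returns (True, 45), the same result as Any5Sum.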