Lazy IO has the problem that releasing whatever resource you have acquired is somewhat unpredictable, as it depends on how your program consumes the data -- its "demand pattern". Once your program drops the last reference to the resource, the GC will eventually run and release that resource.
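To see how the demand pattern controls a resource's lifetime, here is a minimal sketch (the file list is hypothetical): because only the first line of each file is ever demanded, the rest of each file is never read, and each handle stays open until the GC happens to finalize it.

    -- Each readFile opens a descriptor, but the lazy result means the
    -- handle is closed only once the string is fully consumed -- or, here,
    -- never by the program itself, since we demand just the first line.
    -- Over enough files this can exhaust the process's descriptor limit.
    firstLines :: [FilePath] -> IO [String]
    firstLines = mapM (fmap (takeWhile (/= '\n')) . readFile)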
Lazy streams are a very convenient style to program in. This is why shell pipes are so fun and popular.
However, if resources are constrained (as in high-performance scenarios, or production environments that expect to scale to the limits of the machine), relying on the GC to clean up can be an insufficient guarantee. Sometimes you have to release resources eagerly in order to improve scalability.
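One way to get eager release -- sketched here with a strict ByteString read, which trades incrementality for a deterministic handle lifetime -- is to do all of the consumption inside an explicit bracket such as withFile:

    import System.IO
    import qualified Data.ByteString as B

    -- The handle is closed as soon as this block exits, regardless of what
    -- the caller does with the result; the cost is that the whole file is
    -- resident in memory at once.
    countBytes :: FilePath -> IO Int
    countBytes path = withFile path ReadMode $ \h -> do
        s <- B.hGetContents h   -- strict read: consume everything now
        return (B.length s)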
So what are the alternatives to lazy IO that don't mean giving up on incremental processing (since reading everything in one batch would consume too many resources)? Well, we have foldl-based processing, aka iteratees or enumerators, introduced by Oleg Kiselyov in the late 2000s and since popularized by a number of networking-based projects.
Instead of processing data as lazy streams, or in one huge batch, we abstract over chunk-based strict processing, with guaranteed finalization of the resource once the last chunk is read. That's the essence of iteratee-based programming, and one that offers very strong resource guarantees.
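To make that concrete, here is a minimal hand-rolled sketch of the idea (not Oleg's actual iteratee library): a strict left fold over fixed-size chunks, with withFile guaranteeing the handle is closed once the last chunk has been read.

    {-# LANGUAGE BangPatterns #-}

    import qualified Data.ByteString as B
    import System.IO

    -- Thread an accumulator strictly through 32K chunks: constant space,
    -- and the handle's lifetime is exactly the duration of the fold.
    foldChunks :: (a -> B.ByteString -> a) -> a -> FilePath -> IO a
    foldChunks step acc0 path = withFile path ReadMode (go acc0)
      where
        go !acc h = do
            chunk <- B.hGetSome h 32768
            if B.null chunk
                then return acc              -- EOF: withFile closes h
                else go (step acc chunk) h

    -- Example use: count the bytes in a file without holding it in memory.
    byteCount :: FilePath -> IO Int
    byteCount = foldChunks (\n c -> n + B.length c) 0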
The downside of iteratee-based IO is that it has a somewhat awkward programming model (roughly analogous to event-based programming, versus nice thread-based control). It is definitely an advanced technique, in any programming language. And for the vast majority of programming problems, lazy IO is entirely satisfactory. However, if you will be opening many files, or talking on many sockets, or otherwise using many simultaneous resources, an iteratee (or enumerator) approach might make sense.
He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back into you…