Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
169 views
in Technique[技术] by (71.8m points)

java - Java8 : how to aggregate objects from a stream?

Edit

IMHO : I think it is not a duplicate because the two questions are trying to solve the problem in different ways and especially because they provide totally different technological skills (and finally, because I ask myself these two questions).

Question

How to aggregate items from an ordered stream, preferably in an intermediate operation ?

Context

Following my other question : Java8 stream lines and aggregate with action on terminal line

I've got a very large file of the form :

MASTER_REF1
    SUBREF1
    SUBREF2
    SUBREF3
MASTER_REF2
MASTER_REF3
    SUBREF1
    ...

Where SUBREF (if any) is applicable to MASTER_REF and both are complex objects (you can imagine it somewhat like JSON).

On first look I tried to group the lines with an operation returning null while agregating and a value when a group of line could be found (a "group" of lines ends if line.charAt(0)!=' ').

This code is hard to read and requires a .filter(Objects::nonNull).

I think one could achieve this using a .collect(groupingBy(...)) or a .reduce(...) but those are terminal operations which is :

  • not required in my case : lines are ordered and should be grouped by their position and groups of line are to be transformed afterwards (map+filter+...+foreach);
  • nor a good idea : I'm talking of a huge data file that is way bigger than the total amount of RAM+SWAP ... a terminal operation would saturate availiable resources (as said, by design I need to keep groups in memory because are to be transformed afterwards)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

As I already noted in the answer to the previous question, it's possible to use some third-party libraries which provide partial reduction operations. One of such libraries is StreamEx which I develop by myself.

In StreamEx library the partial reduction operation is the intermediate stream operation which combines several input elements while some condition is met. Usually the condition is specified via BiPredicate applied to the pair of adjacent stream elements which returns true when elements should be combined together. The simplest way to combine elements is to make a List via StreamEx.groupRuns() method like this:

Stream<List<String>> records = StreamEx.of(Files.lines(path))
    .groupRuns((line1, line2) -> !line2.startsWith("MASTER"));

Here we start a new record when the second of two adjacent lines starts with "MASTER" (as in your example). Otherwise we continue the previous record.

Note that such stream is still lazy. In sequential processing at most one intermediate List<String> is created at a time. Parallel processing is also supported, though turning the Files.lines stream into parallel mode rarely improves the performance (at least prior to Java-9).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...