algorithm - In Java, how do I efficiently and elegantly stream a tree node's descendants?

Question

Welcome To Ask or Share your Answers For Others

algorithm - In Java, how do I efficiently and elegantly stream a tree node's descendants?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

algorithm - In Java, how do I efficiently and elegantly stream a tree node's descendants?

Assume we have a collection of objects that are identified by unique Strings, along with a class Tree that defines a hierarchy on them. That class is implemented using a Map from nodes (represented by their IDs) to Collections of their respective children's IDs.

class Tree {
  private Map<String, Collection<String>> edges;

  // ...

  public Stream<String> descendants(String node) {
    // To be defined.
  }
}

I would like to enable streaming a node's descendants. A simple solution is this:

private Stream<String> children(String node) {
    return edges.getOrDefault(node, Collections.emptyList()).stream();
}

public Stream<String> descendants(String node) {
    return Stream.concat(
        Stream.of(node),
        children(node).flatMap(this::descendants)
    );
}

Before continuing, I would like to make the following assertions about this solution. (Am I correct about these?)

Walking the Stream returned from descendants consumes resources (time and memory) – relative to the size of the tree – in the same order of complexity as hand-coding the recursion would. In particular, the intermediate objects representing the iteration state (Streams, Spliterators, ...) form a stack and therefore the memory requirement at any given time is in the same order of complexity as the tree's depth.
As I understand this, as soon as I perform a terminating operation on the Stream returned from descendants, the root-level call to flatMap will cause all contained Streams – one for each (recursive) call to descendants – to be realized immediately. Thus, the resulting Stream is only lazy on the first level of recursion, but not beyond. (Edited according to Tagir Valeevs answer.)

If I understood these points correctly, my question is this: How can I define descendants so that the resulting Stream is lazy?

I would like the solution to be as elegant as possible, in the sense that I prefer a solution which leaves the iteration state implicit. (To clarify what I mean by that: I know that I could write a Spliterator that walks the tree while maintaining an explicit stack of Spliterators on each level. I would like to avoid that.)

(Is there possibly a way in Java to formulate this as a producer-consumer workflow, like one could use in languages like Julia and Go?)

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-16T22:32:20+0000

To me, your solution is already as elegant as possible and the limited laziness of it not your fault. The simplest solution is to wait until it gets fixed by the JRE developers. It has been done with Java?10.

However, if this limited laziness of today’s implementation really is a concern, it’s perhaps time to solve this in a general way. Well, it is about implementing a Spliterator, but not specific to your task. Instead, it’s a re-implementation of the flatmap operation serving all cases where the limited laziness of the original implementation matters:

public class FlatMappingSpliterator<E,S> extends Spliterators.AbstractSpliterator<E>
implements Consumer<S> {

    static final boolean USE_ORIGINAL_IMPL
        = Boolean.getBoolean("stream.flatmap.usestandard");

    public static <T,R> Stream<R> flatMap(
        Stream<T> in, Function<? super T,? extends Stream<? extends R>> mapper) {

        if(USE_ORIGINAL_IMPL)
            return in.flatMap(mapper);

        Objects.requireNonNull(in);
        Objects.requireNonNull(mapper);
        return StreamSupport.stream(
            new FlatMappingSpliterator<>(sp(in), mapper), in.isParallel()
        ).onClose(in::close);
    }

    final Spliterator<S> src;
    final Function<? super S, ? extends Stream<? extends E>> f;
    Stream<? extends E> currStream;
    Spliterator<E> curr;

    private FlatMappingSpliterator(
        Spliterator<S> src, Function<? super S, ? extends Stream<? extends E>> f) {
        // actually, the mapping function can change the size to anything,
        // but it seems, with the current stream implementation, we are
        // better off with an estimate being wrong by magnitudes than with
        // reporting unknown size
        super(src.estimateSize()+100, src.characteristics()&ORDERED);
        this.src = src;
        this.f = f;
    }

    private void closeCurr() {
        try { currStream.close(); } finally { currStream=null; curr=null; }
    }

    public void accept(S s) {
        curr=sp(currStream=f.apply(s));
    }

    @Override
    public boolean tryAdvance(Consumer<? super E> action) {
        do {
            if(curr!=null) {
                if(curr.tryAdvance(action))
                    return true;
                closeCurr();
            }
        } while(src.tryAdvance(this));
        return false;
    }

    @Override
    public void forEachRemaining(Consumer<? super E> action) {
        if(curr!=null) {
            curr.forEachRemaining(action);
            closeCurr();
        }
        src.forEachRemaining(s->{
            try(Stream<? extends E> str=f.apply(s)) {
                if(str!=null) str.spliterator().forEachRemaining(action);
            }
        });
    }

    @SuppressWarnings("unchecked")
    private static <X> Spliterator<X> sp(Stream<? extends X> str) {
        return str!=null? ((Stream<X>)str).spliterator(): null;
    }

    @Override
    public Spliterator<E> trySplit() {
        Spliterator<S> split = src.trySplit();
        if(split==null) {
            Spliterator<E> prefix = curr;
            while(prefix==null && src.tryAdvance(s->curr=sp(f.apply(s))))
                prefix=curr;
            curr=null;
            return prefix;
        }
        FlatMappingSpliterator<E,S> prefix=new FlatMappingSpliterator<>(split, f);
        if(curr!=null) {
            prefix.curr=curr;
            curr=null;
        }
        return prefix;
    }
}

All you need for using it, is to add a import static of the flatMap method to your code and change expressions of the form stream.flatmap(function) to flatmap(stream, function).

I.e. in your code

public Stream<String> descendants(String node) {
    return Stream.concat(
        Stream.of(node),
        flatMap(children(node), this::descendants)
    );
}

then you have full lazy behavior. I tested it even with infinite streams…

Note that I added a toggle to allow turning back to the original implementation, e.g. when specifying??? -Dstream.flatmap.usestandard=true on the command line.

Categories

algorithm - In Java, how do I efficiently and elegantly stream a tree node's descendants?

algorithm - In Java, how do I efficiently and elegantly stream a tree node's descendants?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags