Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
101 views
in Technique[技术] by (71.8m points)

java - What do the Stream reduce() requirements exactly entail?

When using the reduce() operation on a parallel stream, the OCP exam book states that there are certain principles the reduce() arguments must adhere to. Those principles are the following:

  1. The identity must be defined such that for all elements in the stream u, combiner.apply(identity, u) is equal to u.
  2. The accumulator operator op must be associative and stateless such that (a op b) op c is equal to a op (b op c).
  3. The combiner operator must also be associative and stateless and compatible with the identity, such that for all of u and t combiner.apply(u, accumulator.apply(identity, t)) is equal to accumulator.apply(u,t) .

The book gives two examples to illustrate these principles, please see the code below:

example for associative:

System.out.println(
        Arrays.asList(1, 2, 3, 4, 5, 6)
                .parallelStream()
                .reduce(0, (a, b) -> (a - b)));

What the book says about this:

It may output -21, 3, or some other value as the accumulator function violates the associativity property.

example for the identity requirement:

System.out.println(
        Arrays.asList("w", "o", "l", "f")
                .parallelStream()
                .reduce("X", String::concat));

What the book says about this:

You can see other problems if we use an identity parameter that is not truly an identity value. It can output XwXoXlXf. As part of the parallel process, the identity is applied to multiple elements in the stream, resulting in very unexpected data.

I don't understand those examples. With the accumulator example the accumulator starts with 0 - 1 which is -1, then -1 - 2 which is -3, then -6 etc all the way to -21. I understand that, because the generated arraylist isn't synchronized the results maybe be unpredictable because of the possibility of race conditions etc, but why isn't the accumulator associative? Wouldn't (a+b) cause unpredictable results too? I really don't see what's wrong with the accumulator being used in the example and why it's not associative, but then again I still don't exactly understand what "associative principle" means.

I don't understand the identity example either. I understand that the result could indeed be XwXoXlXf if 4 separate threads were to start accumulating with the identity at the same time, but what does that have to do with the identity parameter itself? What exactly would be a proper identity to use then?

I was wondering if anyone could enlighten me a bit more on these principles.

Thank you

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

why isn't the accumulator associative?

It's not associative since the order of subtraction operations determines the final result.

If you run a serial Stream, you'll get the expected result of:

0 - 1 - 2 - 3 - 4 - 5 - 6 = -21

On the other hand, for parallel Streams, the work is split to multiple threads. For example, if reduce is executed in parallel on 6 threads, and then the intermediate results are combined, you can get a different result:

0 - 1   0 - 2   0 - 3      0 - 4     0 - 5    0 - 6
  -1     -2      -3         -4        -5        -6

  -1 - (-2)         -3 - (-4)          -5 - (-6)
      1                 1                  1
           1   -   1
               0            -     1

                        -1

Or, to make a long example short:

(1 - 2) - 3 = -4
1 - (2 - 3) =  2

Therefore subtraction is not associative.

On the other hand, a+b doesn't cause the same problem, since addition is an associative operator (i.e. (a+b)+c == a+(b+c)).

The problem with the identity example is that when reduce is executed in parallel on multiple threads, "X" is appended to the starts of each intermediate result.

What exactly would be a proper identity to use then?

If you change the identity value to "" :

System.out.println(Arrays.asList("w","o","l","f"))
.parallelStream()
.reduce("", String::concat));

you'll get "wolf" instead of "XwXoXlXf".


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...