Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
171 views
in Technique[技术] by (71.8m points)

java - Is this a bug in Files.lines(), or am I misunderstanding something about parallel streams?

Environment: Ubuntu x86_64 (14.10), Oracle JDK 1.8u25

I try and use a parallel stream of Files.lines() but I want to .skip() the first line (it's a CSV file with a header). Therefore I try and do this:

try (
    final Stream<String> stream = Files.lines(thePath, StandardCharsets.UTF_8)
        .skip(1L).parallel();
) {
    // etc
}

But then one column failed to parse to an int...

So I tried some simple code. The file is question is dead simple:

$ cat info.csv 
startDate;treeDepth;nrMatchers;nrLines;nrChars;nrCodePoints;nrNodes
1422758875023;34;54;151;4375;4375;27486
$

And the code is equally simple:

public static void main(final String... args)
{
    final Path path = Paths.get("/home/fge/tmp/dd/info.csv");
    Files.lines(path, StandardCharsets.UTF_8).skip(1L).parallel()
        .forEach(System.out::println);
}

And I systematically get the following result (OK, I have only run it something around 20 times):

startDate;treeDepth;nrMatchers;nrLines;nrChars;nrCodePoints;nrNodes

What am I missing here?


EDIT It seems like the problem, or misunderstanding, is much more rooted than that (the two examples below were cooked up by a fellow on FreeNode's ##java):

public static void main(final String... args)
{
    new BufferedReader(new StringReader("Hello
World")).lines()
        .skip(1L).parallel()
        .forEach(System.out::println);

    final Iterator<String> iter
        = Arrays.asList("Hello", "World").iterator();
    final Spliterator<String> spliterator
        = Spliterators.spliteratorUnknownSize(iter, Spliterator.ORDERED);
    final Stream<String> s
        = StreamSupport.stream(spliterator, true);

    s.skip(1L).forEach(System.out::println);
}

This prints:

Hello
Hello

Uh.

@Holger suggested that this happens for any stream which is ORDERED and not SIZED with this other sample:

Stream.of("Hello", "World")
    .filter(x -> true)
    .parallel()
    .skip(1L)
    .forEach(System.out::println);

Also, it stems from all the discussion which already took place that the problem (if it is one?) is with .forEach() (as @SotiriosDelimanolis first pointed out).

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Since the current state of the issue is quite the opposite of the earlier statements made here, it should be noted, that there is now an explicit statement by Brian Goetz about the back-propagation of the unordered characteristic past a skip operation is considered a bug. It’s also stated that it is now considered to have no back-propagation of the ordered-ness of a terminal operation at all.

There is also a related bug report, JDK-8129120 whose status is “fixed in Java?9” and it’s backported to Java?8, update 60

I did some tests with jdk1.8.0_60 and it seems that the implementation now indeed exhibits the more intuitive behavior.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...