scala - call of distinct and map together throws NPE in spark library

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am unsure whether this is a bug, but if you do something like this:

// d: spark.RDD[String]
d.distinct().map(x => d.filter(_.equals(x)))

you get a Java NullPointerException (NPE). However, if you call collect() immediately after distinct(), everything works.

I am using Spark 0.6.1.

1 Reply

深蓝 · Answer 1 · 2021-10-16T23:34:36+0000

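Spark does not support nested RDDs or user-defined functions that refer to other RDDs, hence the NullPointerException; see this thread on the spark-users mailing list. In your example, the function passed to map() runs inside tasks on the workers, and d cannot be used from there to launch further RDD operations such as filter().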

It looks like your current code is trying to group the elements of d by value; you can do this efficiently with the groupBy() RDD method:

scala> val d = sc.parallelize(Seq("Hello", "World", "Hello"))
d: spark.RDD[java.lang.String] = spark.ParallelCollection@55c0c66a

scala> d.groupBy(x => x).collect()
res6: Array[(java.lang.String, Seq[java.lang.String])] = Array((World,ArrayBuffer(World)), (Hello,ArrayBuffer(Hello, Hello)))
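
If you really do need a separate RDD per distinct value, the collect() workaround from the question can be sketched as follows. This is only a sketch, assuming it runs in the Spark shell where sc is already defined; the names distinctValues and perValue are made up for illustration. After collect(), the map runs over a local array on the driver, so each d.filter(...) is an ordinary top-level RDD operation rather than a nested one.

```scala
// Assumption: `sc` is the SparkContext provided by the Spark shell.
val d = sc.parallelize(Seq("Hello", "World", "Hello"))

// collect() returns the distinct values to the driver as a local Array.
val distinctValues = d.distinct().collect()

// This map is a plain Scala collection operation executed on the driver,
// so referring to `d` inside it is allowed. Each filter defines a new RDD.
val perValue = distinctValues.map(x => (x, d.filter(_ == x))).toMap

// Each value now maps to an RDD holding only the matching elements.
val helloCount = perValue("Hello").count()
```

Note that this defines one filtered RDD per distinct value, and each one rescans d when it is evaluated, so it is only reasonable when the number of distinct values is small. For plain grouping, groupBy() as shown above is likely the better choice.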
