Spark does not support nested RDDs or user-defined functions that refer to other RDDs, hence the NullPointerException; see this thread on the spark-users
mailing list.
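
For illustration, here is a minimal sketch of the unsupported pattern and a common workaround, typed into spark-shell (where sc is predefined); the names other and lookup are hypothetical, and d stands in for the RDD from your code:

scala> val other = sc.parallelize(Seq("Hello"))

scala> // Fails: the closure passed to map() captures the RDD `other`,
scala> // and RDD methods cannot be called from inside another RDD's closure:
scala> d.map(x => other.filter(_ == x).count()).collect()
java.lang.NullPointerException
  ...

scala> // Works: collect the small RDD into a plain local collection on the
scala> // driver first, then reference that collection in the closure:
scala> val lookup = other.collect().toSet
scala> d.filter(x => lookup.contains(x)).collect()
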
It looks like your current code is trying to group the elements of d
by value; you can do this efficiently with the groupBy()
RDD method:
scala> val d = sc.parallelize(Seq("Hello", "World", "Hello"))
d: spark.RDD[java.lang.String] = spark.ParallelCollection@55c0c66a
scala> d.groupBy(x => x).collect()
res6: Array[(java.lang.String, Seq[java.lang.String])] = Array((World,ArrayBuffer(World)), (Hello,ArrayBuffer(Hello, Hello)))
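
A side note that goes beyond the original thread: if you only need a count per value rather than the grouped elements themselves, mapping to (value, 1) pairs and reducing by key avoids building the per-key ArrayBuffers that groupBy() produces:

scala> d.map(x => (x, 1)).reduceByKey(_ + _).collect()

For the data above this yields (Hello,2) and (World,1); the ordering of the pairs may vary across runs.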