In Java (not Scala!) with Spark 3.0.1, I have a JavaRDD instance, neighborIdsRDD, whose type is JavaRDD<Tuple2<Object, long[]>>.
The part of my code that generates the JavaRDD is the following:
GraphOps<String, String> graphOps = new GraphOps<>(graph, stringTag, stringTag);
JavaRDD<Tuple2<Object, long[]>> neighborIdsRDD = graphOps.collectNeighborIds(EdgeDirection.Either()).toJavaRDD();
I had to obtain a JavaRDD using toJavaRDD() because collectNeighborIds returns an org.apache.spark.graphx.VertexRDD<long[]> object (see the VertexRDD doc).
However, the rest of my application needs a Spark Dataset<Row> built from the result of collectNeighborIds.
What are the options, and which is the best one, for converting a JavaRDD<Tuple2<Object, long[]>> into a Dataset<Row>?
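One option, sketched here under stated assumptions rather than as the definitive approach, is to map each tuple to a Row and pass an explicit schema to SparkSession.createDataFrame(JavaRDD<Row>, StructType). The class name NeighborIdsToDataFrame and the column names id and neighbors are hypothetical. The sketch assumes the vertex id, typed Object in the Java signature because of Scala erasure, is a boxed java.lang.Long at runtime (GraphX's VertexId is a Scala Long). A primitive long[] is passed as-is, because Spark accepts Java arrays as the external value of an array column.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import scala.Tuple2;

public class NeighborIdsToDataFrame {

    // One row per vertex: the vertex id and the array of its neighbor ids.
    // Column names are illustrative assumptions.
    static final StructType SCHEMA = DataTypes.createStructType(Arrays.asList(
            DataTypes.createStructField("id", DataTypes.LongType, false),
            DataTypes.createStructField("neighbors",
                    DataTypes.createArrayType(DataTypes.LongType, false), true)));

    static Dataset<Row> toDataFrame(SparkSession spark, JavaRDD<Tuple2<Object, long[]>> rdd) {
        // Assumes t._1() is a java.lang.Long at runtime; long[] maps to array<bigint>.
        JavaRDD<Row> rows = rdd.map(t -> RowFactory.create(t._1(), t._2()));
        return spark.createDataFrame(rows, SCHEMA);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("neighbor-ids").getOrCreate();
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());

        // Sample data mirroring the neighbor lists printed in the question.
        List<Tuple2<Object, long[]>> sample = Arrays.asList(
                new Tuple2<Object, long[]>(4L, new long[]{3}),
                new Tuple2<Object, long[]>(1L, new long[]{2, 3}),
                new Tuple2<Object, long[]>(5L, new long[]{3, 2}),
                new Tuple2<Object, long[]>(2L, new long[]{1, 3, 5}),
                new Tuple2<Object, long[]>(3L, new long[]{1, 2, 5, 4}));

        Dataset<Row> df = toDataFrame(spark, jsc.parallelize(sample));
        df.printSchema();
        df.show();
        spark.stop();
    }
}
```

In the real application, the sample RDD would be replaced by neighborIdsRDD. An explicit schema keeps the column names and types under your control instead of relying on reflection.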
Update based on the comments:
I adjusted the code as follows:
GraphOps<String, String> graphOps = new GraphOps<>(graph, stringTag, stringTag);
JavaRDD<Tuple2<Object, long[]>> neighborIdsRDD = graphOps.collectNeighborIds(EdgeDirection.Either()).toJavaRDD();
System.out.println("VertexRDD neighborIdsRDD is:");
// collect() once instead of on every iteration; each call launches a separate Spark job
List<Tuple2<Object, long[]>> neighborIds = neighborIdsRDD.collect();
for (Tuple2<Object, long[]> t : neighborIds) {
    System.out.println(t._1() + " -- " + Arrays.toString(t._2()));
}
Dataset<Row> dr = spark_session.createDataFrame(neighborIdsRDD.rdd(), Tuple2.class);
System.out.println("converted Dataset<Row> is:");
dr.show();
but I get a Dataset with no columns, as follows:
VertexRDD neighborIdsRDD is:
4 -- [3]
1 -- [2, 3]
5 -- [3, 2]
2 -- [1, 3, 5]
3 -- [1, 2, 5, 4]
converted Dataset<Row> is:
++
||
++
||
||
||
||
||
++
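The most likely reason for the column-less result is that createDataFrame(rdd, Class) infers the schema from JavaBean getter/setter pairs. scala.Tuple2 exposes _1() and _2() instead of getX() accessors, so no bean properties are found and each of the five elements becomes an empty row. Below is a sketch of the bean-based variant of the same call. The bean class VertexNeighbors and the wrapper class NeighborIdsBeanToDataFrame are hypothetical, and the cast assumes the vertex id is a java.lang.Long at runtime.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class NeighborIdsBeanToDataFrame {

    // Hypothetical JavaBean: public, Serializable, no-arg constructor, getter/setter pairs.
    // Spark derives one column per bean property (id: bigint, neighbors: array<bigint>).
    public static class VertexNeighbors implements Serializable {
        private long id;
        private long[] neighbors;

        public VertexNeighbors() {}

        public VertexNeighbors(long id, long[] neighbors) {
            this.id = id;
            this.neighbors = neighbors;
        }

        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public long[] getNeighbors() { return neighbors; }
        public void setNeighbors(long[] neighbors) { this.neighbors = neighbors; }
    }

    static Dataset<Row> toDataFrame(SparkSession spark, JavaRDD<Tuple2<Object, long[]>> rdd) {
        // Assumes the erased vertex id is a boxed Long; it is unboxed for the bean constructor.
        JavaRDD<VertexNeighbors> beans =
                rdd.map(t -> new VertexNeighbors((Long) t._1(), t._2()));
        return spark.createDataFrame(beans, VertexNeighbors.class);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("neighbor-ids-bean").getOrCreate();
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        List<Tuple2<Object, long[]>> sample = Arrays.asList(
                new Tuple2<Object, long[]>(1L, new long[]{2, 3}),
                new Tuple2<Object, long[]>(2L, new long[]{1, 3, 5}));
        toDataFrame(spark, jsc.parallelize(sample)).show();
        spark.stop();
    }
}
```

The bean approach avoids writing a StructType by hand, at the cost of an extra class whose property names become the column names.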