scala> val p=sc.textFile("file:///c:/_home/so-posts.xml", 8) //i've 8 cores
p: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[56] at textFile at <console>:21
scala> p.partitions.size
res33: Int = 729
I was expecting 8 to be printed and I see 729 tasks in Spark UI
EDIT:
After calling repartition()
as suggested by @zero323
scala> p1 = p.repartition(8)
scala> p1.partitions.size
res60: Int = 8
scala> p1.count
I still see 729 tasks in the Spark UI even though the spark-shell prints 8.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…