I am trying to learn Spark using Python (PySpark). I want to know how the function mapPartitions works, that is, what input it takes and what output it returns. I couldn't find a proper example online. Let's say I have an RDD containing lists, such as the one below:
[ [1, 2, 3], [3, 2, 4], [5, 2, 7] ]
I want to remove the element 2 from every list. How would I achieve that using mapPartitions?
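To make the question concrete, here is a minimal sketch of the kind of function mapPartitions expects, as far as I understand it: the function is called once per partition, receives an iterator over that partition's elements, and should return an iterable of results. The name remove_twos and the two-partition split are made up for illustration, and plain Python lists stand in for partitions so the snippet runs without a Spark cluster.

```python
def remove_twos(partition):
    """Receive an iterator over one partition's elements (each a list)
    and yield every list with all occurrences of 2 removed."""
    for row in partition:
        yield [x for x in row if x != 2]


# Stand-in for an RDD split into two partitions. In Spark the runtime
# decides the split; this layout is only an assumption for the demo.
partitions = [
    [[1, 2, 3], [3, 2, 4]],
    [[5, 2, 7]],
]

# Apply the function once per partition, then flatten the results,
# which mimics what mapPartitions followed by collect() would return.
result = [row for part in partitions for row in remove_twos(iter(part))]
print(result)

# With a real SparkContext `sc` (assumed, not created here):
# rdd = sc.parallelize([[1, 2, 3], [3, 2, 4], [5, 2, 7]])
# rdd.mapPartitions(remove_twos).collect()
```

Since this transformation is element-wise, a plain map with the same list comprehension would presumably give the same result. mapPartitions seems to be mainly useful when there is per-partition setup work to share across elements.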