This should be possible by first indexing the RDD. The transformation zipWithIndex
provides a stable indexing, numbering each element in its original order.
Given: rdd = (a,b,c)
val withIndex = rdd.zipWithIndex // ((a,0),(b,1),(c,2))
To look up an element by index, this form is not useful. First, we need to use the index as the key:
val indexKey = withIndex.map{case (k,v) => (v,k)} //((0,a),(1,b),(2,c))
Now it's possible to use the lookup
action from PairRDDFunctions to find an element by key:
val b = indexKey.lookup(1) // Seq(b)
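Put together, the steps above could look something like the following minimal sketch. The object name LookupByIndex, the helper elementAt, and the local[*] master are my own choices for illustration, not anything Spark prescribes. Note that zipWithIndex produces Long indices and that lookup returns a Seq of every value stored under the key, so headOption is used here to get at most one element.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LookupByIndex {
  // Hypothetical helper: returns the element at `index`, or None if out of range.
  def elementAt[T: ClassTag](rdd: RDD[T], index: Long): Option[T] = {
    // zipWithIndex yields (element, index); swap so the index becomes the key
    val indexKey = rdd.zipWithIndex().map { case (elem, i) => (i, elem) }
    // lookup returns all values for the key; indices are unique, so at most one
    indexKey.lookup(index).headOption
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying this out on one machine
    val conf = new SparkConf().setAppName("lookup-by-index").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq("a", "b", "c"))
      println(elementAt(rdd, 1L)) // Some(b)
      println(elementAt(rdd, 5L)) // None
    } finally {
      sc.stop()
    }
  }
}
```

This helper rebuilds the indexed RDD on every call, which is fine for a one-off lookup but wasteful if you call it repeatedly.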
If you expect to use lookup
often on the same RDD, I'd recommend caching the indexKey
RDD to improve performance.
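As a rough sketch of what that might look like: build the index-keyed RDD once, cache it, and reuse it for every lookup. The names CachedIndexLookup, buildIndex, and numPartitions are made up for this example. The partitionBy step is an optional extra on my part: the Spark API docs note that lookup is done efficiently when the RDD has a known partitioner, because only the partition the key maps to is searched. Whether that pays off depends on your data size and how many lookups you do.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CachedIndexLookup {
  // Hypothetical helper: index once, partition by key, and cache for reuse.
  def buildIndex[T: ClassTag](rdd: RDD[T], numPartitions: Int): RDD[(Long, T)] =
    rdd.zipWithIndex()
      .map { case (elem, i) => (i, elem) }
      // Optional: a known partitioner lets lookup scan a single partition
      .partitionBy(new HashPartitioner(numPartitions))
      .cache()

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying this out on one machine
    val conf = new SparkConf().setAppName("cached-index-lookup").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val indexKey = buildIndex(sc.parallelize(Seq("a", "b", "c")), numPartitions = 2)
      // The first action materializes the cache; later lookups reuse it
      println(indexKey.lookup(1L).headOption) // Some(b)
      println(indexKey.lookup(2L).headOption) // Some(c)
    } finally {
      sc.stop()
    }
  }
}
```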
How to do this using the Java API is an exercise left for the reader.