Partitioners work by assigning a key to a partition. To build such a partitioner you would need prior knowledge of the key distribution, or would have to look at all the keys. This is why Spark does not provide one.
In general you do not need such a partitioner. In fact, I cannot come up with a use case that would require equal-size partitions. What would you do if the number of elements is odd?
Anyway, let us say you have an RDD keyed by sequential Ints, and you know how many there are in total. Then you could write a custom Partitioner like this:
import org.apache.spark.Partitioner

class ExactPartitioner[V](
    partitions: Int,
    elements: Int)
  extends Partitioner {

  // Partitioner also requires numPartitions to be implemented.
  def numPartitions: Int = partitions

  def getPartition(key: Any): Int = {
    val k = key.asInstanceOf[Int]
    // `k` is assumed to go continuously from 0 to elements - 1.
    // Multiply as Long to avoid Int overflow for large inputs.
    (k.toLong * partitions / elements).toInt
  }
}
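For illustration, here is a hypothetical usage sketch. The SparkContext setup, the RDD contents, and the choice of 10 elements over 5 partitions are all assumptions, not something prescribed by the partitioner itself:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example: 10 sequential Int keys split into 5 equal partitions.
val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("demo"))
val data = sc.parallelize(0 until 10).map(k => (k, s"value-$k"))
val partitioned = data.partitionBy(new ExactPartitioner(5, 10))

// The mapping k * partitions / elements sends consecutive keys to the
// same partition: keys 0-1 -> partition 0, keys 2-3 -> partition 1, etc.
```

With these numbers each partition receives exactly elements / partitions = 2 consecutive keys, which is the "equal-size" behavior the question asks for, as long as the keys really are 0 to elements - 1 with no gaps.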