I'm trying to define a function that returns the Cartesian product of a given list with itself; however, I need to filter out the elements in which both tuples of the pair are the same.
For example:
Take the Cartesian product of rdd with itself and filter out the results ((1,0),(1,0)), ((2,0),(2,0)) and ((3,0),(3,0)):
rdd = sc.parallelize([(1,0), (2,0), (3,0)])
def get_cart(rdd):
    a = sorted(rdd.cartesian(rdd).collect())
    aRDD = sc.parallelize(a)
    return aRDD
I'm expecting to get the output:
[((1, 0), (2, 0)), ((1, 0), (3, 0)), ((2, 0), (1, 0)), ((2, 0), (3, 0)), ((3, 0), (1, 0)), ((3, 0), (2, 0))]
Instead I'm getting:
[((1, 0), (1, 0)),
((1, 0), (2, 0)),
((1, 0), (3, 0)),
((2, 0), (1, 0)),
((2, 0), (2, 0)),
((2, 0), (3, 0)),
((3, 0), (1, 0)),
((3, 0), (2, 0)),
((3, 0), (3, 0))]
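For reference, the difference between the two outputs is just the pairs whose two tuples are equal, so one possible approach is to drop them with a predicate. Here is a minimal sketch of that idea in plain Python using itertools.product so it runs without a Spark context; the names is_distinct_pair and get_cart_local are hypothetical and not part of my code above. In PySpark, I assume the same predicate could be passed to RDD.filter after RDD.cartesian, as noted in the final comment.

```python
from itertools import product


def is_distinct_pair(pair):
    """Return True when the two tuples in the pair differ (hypothetical helper)."""
    left, right = pair
    return left != right


def get_cart_local(items):
    """Local stand-in for get_cart: self-product with identical pairs removed."""
    return sorted(p for p in product(items, repeat=2) if is_distinct_pair(p))


result = get_cart_local([(1, 0), (2, 0), (3, 0)])
print(result)

# Assumed PySpark equivalent (needs an active SparkContext `sc`):
#   rdd.cartesian(rdd).filter(is_distinct_pair)
```

Filtering on the RDD before collecting would also avoid the collect() and re-parallelize round trip, although sorting would then need to happen on the RDD itself or after collecting.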
question from:
https://stackoverflow.com/questions/65866091/how-can-i-get-a-cartesian-product-filtering-out-the-pair-of-tuples-with-repeated