I am learning Spark, but I can't understand the function combineByKey.
>>> data = sc.parallelize([("A",1),("A",2),("B",1),("B",2),("C",1)] )
>>> data.combineByKey(lambda v : str(v)+"_", lambda c, v : c+"@"+str(v), lambda c1, c2 : c1+c2).collect()
The output is:
[('A', '1_2_'), ('C', '1_'), ('B', '1_2_')]
First, I am very confused: where is the @ from the second step, lambda c, v : c+"@"+str(v)? I can't find @ anywhere in the result.
Second, I read the function description for combineByKey, but I am confused about the algorithm flow.
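To make the flow concrete, here is a minimal plain-Python sketch of the three steps as the combineByKey description lays them out: create a combiner for the first value of a key seen in a partition, merge further values of that key within the same partition, then merge the per-partition combiners. It is only an imitation, not Spark's implementation; the helper combine_by_key and both partition layouts are hypothetical, and the real layout depends on how sc.parallelize slices the data. Under those assumptions, the @ would only show up when two values of the same key sit in the same partition.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Plain-Python imitation of the combineByKey flow (hypothetical helper, not Spark code)."""
    # Step 1: inside each partition, build one combiner per key.
    per_partition = []
    for part in partitions:
        combiners = {}
        for key, value in part:
            if key in combiners:
                # Key already seen in this partition: the second lambda runs.
                combiners[key] = merge_value(combiners[key], value)
            else:
                # First time this key appears in this partition: the first lambda runs.
                combiners[key] = create_combiner(value)
        per_partition.append(combiners)

    # Step 2: merge combiners of the same key across partitions (third lambda).
    result = {}
    for combiners in per_partition:
        for key, comb in combiners.items():
            result[key] = merge_combiners(result[key], comb) if key in result else comb
    return sorted(result.items())


create = lambda v: str(v) + "_"
merge_v = lambda c, v: c + "@" + str(v)
merge_c = lambda c1, c2: c1 + c2

# Assumed layout 1: every element lands in its own partition.
spread = [[("A", 1)], [("A", 2)], [("B", 1)], [("B", 2)], [("C", 1)]]
# Assumed layout 2: all elements share a single partition.
single = [[("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1)]]

spread_result = combine_by_key(spread, create, merge_v, merge_c)
single_result = combine_by_key(single, create, merge_v, merge_c)
print(spread_result)  # [('A', '1_2_'), ('B', '1_2_'), ('C', '1_')]
print(single_result)  # [('A', '1_@2'), ('B', '1_@2'), ('C', '1_')]
```

In the first layout the second lambda is never called, which matches the output above with no @ in it. One way to check this against a real RDD might be to inspect data.glom().collect() or to pass an explicit number of slices to sc.parallelize.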