Below is my data:
val keysWithValuesList = Array("foo=A", "foo=A", "foo=A", "foo=A", "foo=B", "bar=C", "bar=D", "bar=D")
Now I want the two outputs below, but without using combineByKey or aggregateByKey:
1) Array[(String, Int)] = Array((foo,5), (bar,3))
2) Array((foo,Set(B, A)), (bar,Set(C, D)))
Below is my attempt:
scala> val keysWithValuesList = Array("foo=A", "foo=A", "foo=A", "foo=A", "foo=B", "bar=C",
| "bar=D", "bar=D")
scala> val sample=keysWithValuesList.map(_.split("=")).map(p=>(p(0),(p(1))))
sample: Array[(String, String)] = Array((foo,A), (foo,A), (foo,A), (foo,A), (foo,B), (bar,C), (bar,D), (bar,D))
Now, when I type the variable name followed by Tab to see the methods available on the mapped RDD, I only see the options below, none of which meets my requirement:
scala> sample.
apply asInstanceOf clone isInstanceOf length toString update
So how can I achieve this?
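For reference, here is a minimal sketch of one possible approach, not a definitive answer. It assumes sample is a plain local Scala Array, which is what the tab-completion list above suggests, since the data was never passed through sc.parallelize. Under that assumption the standard collection method groupBy is enough. The object name GroupSketch and the value names counts and distinctValues are made up for illustration. If the data were a real pair RDD, reduceByKey and groupByKey would probably be the closest equivalents, but that is not shown here.

```scala
// Sketch only: plain Scala collections, no Spark required.
// GroupSketch, counts and distinctValues are illustrative names.
object GroupSketch {
  val keysWithValuesList: Array[String] =
    Array("foo=A", "foo=A", "foo=A", "foo=A", "foo=B", "bar=C", "bar=D", "bar=D")

  // Split each "key=value" string into a (key, value) pair.
  val sample: Array[(String, String)] =
    keysWithValuesList.map(_.split("=")).map(p => (p(0), p(1)))

  // 1) Number of values per key: group the pairs by key, then count each group.
  val counts: Map[String, Int] =
    sample.groupBy(_._1).map { case (key, pairs) => (key, pairs.length) }

  // 2) Distinct values per key: group by key, keep the values, collapse to a Set.
  val distinctValues: Map[String, Set[String]] =
    sample.groupBy(_._1).map { case (key, pairs) => (key, pairs.map(_._2).toSet) }

  def main(args: Array[String]): Unit = {
    // toArray yields Array[(String, Int)] and Array[(String, Set[String])];
    // the order of keys in a Map is not guaranteed.
    println(counts.toArray.mkString(", "))
    println(distinctValues.toArray.mkString(", "))
  }
}
```

Calling counts.toArray and distinctValues.toArray gives the two Array shapes asked for, although the order of foo and bar may differ from the listing in the question.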