Getting average across keys in Spark

How to calculate averages across keys in Spark?


1 Reply

We can calculate averages across keys in Spark using either combineByKey or foldByKey. A foldByKey walkthrough follows, with a combineByKey sketch after it.

foldByKey

foldByKey(zeroValue)((accumulator, value) => { /* combine accumulator and value */ })

Input Data:

employee,department,salary
e1,d1,100
e2,d1,500
e5,d2,200
e6,d1,300
e7,d3,200
e7,d3,500

The 1 paired with each salary is a count. Because foldByKey is a fold, the zero value must have the same type as the values being folded, here (Int, Int) for (salary sum, count).

// Assume "employees.csv" (illustrative path) holds the input above; drop the header row,
// since .toInt would fail on the "salary" column name.
val data = sc.textFile("employees.csv").filter(!_.startsWith("employee,"))
// Map each record to (department, (salary, 1)); the 1 carries the count.
val depSalary = data.map(_.split(',')).map(x => (x(1), (x(2).toInt, 1)))

// Zero value (sum = 0, count = 0); per key, add up the salaries and the counts.
val zero = (0, 0)
val depSalarySumCount = depSalary.foldByKey(zero)((acc, v) => (acc._1 + v._1, acc._2 + v._2))

// Average = sum / count (integer division; use toDouble for a fractional result).
val result = depSalarySumCount.map { case (dept, (sum, count)) => (dept, sum / count) }
result.collect
// With the sample data: (d1,300), (d2,200), (d3,350), in no particular order.
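
combineByKey

combineByKey was named above but not shown; here is a minimal sketch under the same assumptions (the data RDD and illustrative path from the foldByKey example). Unlike a fold, the accumulator type may differ from the value type, so the salaries can stay plain Ints:

val depSalaries = data.map(_.split(',')).map(x => (x(1), x(2).toInt))

val sumCount = depSalaries.combineByKey(
  (salary: Int) => (salary, 1),                                    // createCombiner: first value seen for a key
  (acc: (Int, Int), salary: Int) => (acc._1 + salary, acc._2 + 1), // mergeValue: fold a value into the accumulator
  (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2)     // mergeCombiners: merge accumulators across partitions
)

val avgByDep = sumCount.map { case (dept, (sum, count)) => (dept, sum.toDouble / count) }
avgByDep.collect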
