Welcome to the OGeek Q&A Community for programmers and developers - Open, Learning and Share

0 votes
3.8k views
in Technique by (71.8m points)

How to maintain a mutable state while iterating through a DataFrame in Spark Scala

I am new to Spark and not sure of the best way to proceed.

I need to maintain a mutable, distributed state that all executors can read from and write to while looping through a DataFrame, and I would like to use Redis as that mutable distributed state.

My DataFrame has a composite key (idA, idB) plus some other columns. I want to partition by idA and use either foreachPartition or mapPartitions while maintaining a count of idB values in Redis.

In Redis I will build a map from idB to a count, and during the iteration I will check whether I can still process an idB, based on the business rules for its idA and on whether maxCount has been reached in Redis.
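The per-partition counting described above can be sketched roughly as follows. This is a minimal sketch, not a tested solution: it assumes the Jedis client, a Redis hash per idA (key name `counts:<idA>`), string-typed key columns, and a placeholder business rule `canProcess` with an assumed `maxCount` parameter. Redis's HINCRBY is atomic, which is what makes concurrent updates from many executors safe.

```scala
import org.apache.spark.sql.{DataFrame, Row}
import redis.clients.jedis.Jedis

// Placeholder business rule (an assumption): an idB may still be
// processed while its running count has not exceeded maxCount.
def canProcess(currentCount: Long, maxCount: Long): Boolean =
  currentCount <= maxCount

def countInRedis(df: DataFrame, maxCount: Long): Unit = {
  // Partition by idA so all rows for one idA land on one executor.
  df.repartition(df("idA")).foreachPartition { rows: Iterator[Row] =>
    // One connection per partition, created on the executor, so the
    // client is never serialized from the driver.
    val jedis = new Jedis("localhost", 6379) // host/port are assumptions
    try {
      rows.foreach { row =>
        val idA = row.getAs[String]("idA")
        val idB = row.getAs[String]("idB")
        // Atomic increment-and-read of the idB count under this idA.
        val count = jedis.hincrBy(s"counts:$idA", idB, 1L)
        if (canProcess(count, maxCount)) {
          // ... process the row ...
        }
      }
    } finally jedis.close()
  }
}
```

A connection pool (JedisPool) per executor would be the more usual choice in production, but a per-partition connection keeps the sketch simple.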

What is the best way to accomplish this?

Can I call spark.read.format("org.apache.spark.sql.redis").options(...).load() and df.write.format("org.apache.spark.sql.redis").save() while I am iterating through the DataFrame?
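For reference, this is roughly what the spark-redis DataFrame read/write calls look like when used normally. One caveat worth noting: these go through the SparkSession, which only exists on the driver, so they run as batch jobs outside the iteration rather than from inside foreachPartition/mapPartitions. The table name and column names below are assumptions for illustration.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("redis-counts")
  .config("spark.redis.host", "localhost") // assumed host/port
  .config("spark.redis.port", "6379")
  .getOrCreate()

// Batch read of a Redis "table" into a DataFrame (driver side).
val counts = spark.read
  .format("org.apache.spark.sql.redis")
  .option("table", "counts")       // key prefix; an assumption
  .option("key.column", "idB")
  .load()

// Batch write back, e.g. after an aggregation (also driver side).
counts.write
  .format("org.apache.spark.sql.redis")
  .option("table", "counts")
  .option("key.column", "idB")
  .mode(SaveMode.Overwrite)
  .save()
```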

Has anyone done something similar, or have any suggestions on how to accomplish this?


1 Reply

0 votes
by (71.8m points)
Waiting for an expert to answer this.
