I have a doubt about broadcasting a DataFrame.
Copies of the broadcasted DataFrame are sent to each executor.
So when does Spark evict these copies from each executor?
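For example, something like this (a minimal sketch; the paths and table names are just placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("BroadcastQuestion").getOrCreate()

// Hypothetical inputs, just to show the pattern.
val facts = spark.read.parquet("/data/facts")
val dims  = spark.read.parquet("/data/dims")

// Ask Spark to ship a full copy of `dims` to every executor
// instead of shuffling both sides of the join.
val joined = facts.join(broadcast(dims), Seq("id"))
joined.count()
```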
I find this topic functionally easy to understand, but the manuals are harder to follow technically, and there are always improvements in the offing.
My take:
There is a ContextCleaner running on the driver for every Spark application. It is created and started as soon as the SparkContext starts, and it handles more than broadcasts: its keepCleaning thread, which runs for the lifetime of the application, cleans up RDD, shuffle, broadcast, and accumulator state. Objects are registered for cleanup through methods such as registerShuffleForCleanup (broadcast variables go through registerBroadcastForCleanup), which track them with weak references. When no alive root object points to a registered object any more, the JVM garbage collector enqueues its weak reference and the object becomes eligible for clean-up and eviction; keepCleaning then removes its state from the executors.

Because that depends on a GC actually happening, a context-cleaner-periodic-gc task asynchronously requests a standard JVM garbage collection at a fixed interval (spark.cleaner.periodicGC.interval, default 30 minutes). These periodic runs start when the ContextCleaner starts and stop when it terminates. Spark relies on the standard Java GC throughout; a sketch of the whole lifecycle follows below.
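To make that concrete, here is a minimal sketch of the broadcast lifecycle. The variable names and the 15min interval are my own choices for illustration, not anything prescribed by the docs:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("BroadcastCleanupSketch")
  // Interval at which the context-cleaner-periodic-gc task triggers a
  // driver-side JVM GC (default 30min); a shorter interval lets
  // unreferenced broadcast state be noticed and evicted sooner.
  .config("spark.cleaner.periodicGC.interval", "15min")
  .getOrCreate()
val sc = spark.sparkContext

// Broadcast a small lookup table; each executor gets its own copy.
val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))
sc.parallelize(1 to 4).map(k => lookup.value.getOrElse(k, "?")).collect()

// Option 1: evict eagerly instead of waiting for the ContextCleaner.
lookup.unpersist() // drop executor copies; Spark re-ships them if `lookup` is used again
// lookup.destroy() // drop every copy, driver included; `lookup` is unusable afterwards

// Option 2: do nothing. Once no live reference to `lookup` remains, a
// driver GC collects the weak reference the ContextCleaner registered,
// and its keepCleaning loop asks the executors to drop their blocks
// asynchronously.
```

As I understand it, a DataFrame broadcast through a join hint gives you no handle to unpersist, so its per-executor copies are evicted through exactly this ContextCleaner path once the broadcast relation is no longer referenced.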
This is a good reference alongside the official Spark docs: https://mallikarjuna_g.gitbooks.io/spark/content/spark-service-contextcleaner.html