Spark Transformation - Why is it lazy and what is the advantage?

1 Reply

深蓝 · Answer 1 · 2021-10-16T22:34:16+0000

Spark does not run a transformation when you call it. It adds the transformation to a DAG (directed acyclic graph) of computation, and that DAG is executed only when the driver requests some data, that is, when an action is called.
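To make the "record now, run on demand" idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's implementation: `LazyDataset`, its `_plan` tuple, and `traced_double` are hypothetical names made up for illustration, and the plan here is a simple linear chain rather than a real DAG.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them only on an action."""

    def __init__(self, source, plan=()):
        self._source = source  # any iterable holding the input data
        self._plan = plan      # tuple of (kind, function) steps recorded so far

    def map(self, fn):
        # Transformation: returns a new dataset with a longer plan, computes nothing.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only extends the plan.
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    def collect(self):
        # Action: only now is the recorded plan executed, element by element.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._plan:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []

def traced_double(x):
    calls.append(x)  # side effect used only to observe when work happens
    return x * 2

ds = LazyDataset(range(5)).map(traced_double).filter(lambda x: x > 4)
calls_before_action = list(calls)  # still empty: nothing has run yet
result = ds.collect()              # triggers execution of the whole plan
```

Building `ds` only grows the recorded plan; `traced_double` is first called inside `collect()`, which plays the role of a Spark action in this sketch.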

One advantage is that Spark can make many optimization decisions once it has had a chance to look at the DAG in its entirety. This would not be possible if it executed each transformation as soon as it received it.

For example, what would it mean to execute every transformation eagerly? It would mean materializing that many intermediate datasets in memory. This is clearly inefficient; for one, it increases your GC costs. You are not really interested in those intermediate results as such; they are just convenient abstractions for you while writing the program. So instead, you tell Spark which eventual answer you are interested in, and it figures out the best way to get there.
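The cost of intermediate datasets can be illustrated with a small plain-Python analogy. This is only a sketch under stated assumptions: Python generators stand in for lazy evaluation, and the function names `eager_pipeline` and `lazy_pipeline` are hypothetical. Spark's own mechanism differs, although it similarly pipelines narrow transformations within a stage instead of storing each intermediate result.

```python
def eager_pipeline(data):
    # Each step builds a full intermediate list before the next step starts.
    squared = [x * x for x in data]
    evens = [x for x in squared if x % 2 == 0]
    # Report how many elements each intermediate list had to hold.
    return sum(evens), [len(squared), len(evens)]

def lazy_pipeline(data):
    # Generators only describe the steps; elements flow through one at a time,
    # so no intermediate list is ever built.
    squared = (x * x for x in data)
    evens = (x for x in squared if x % 2 == 0)
    return sum(evens)  # the "action" that pulls data through the pipeline

data = range(10)
eager_total, intermediate_sizes = eager_pipeline(data)
lazy_total = lazy_pipeline(data)
```

Both versions produce the same total, but the eager one allocates two throwaway lists along the way, which is the kind of garbage the answer above refers to.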
