Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
821 views
in Technique[技术] by (71.8m points)

the spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead is used to store what kind of data?

I wondered that :

  • spark use the spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead to store what kind of data?
  • And in which case i should boost the value of spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead?
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In YARN terminology, executors and application masters run inside “containers”. Spark offers yarn specific properties so you can run your application :

  • spark.yarn.executor.memoryOverhead is the amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
  • spark.yarn.driver.memoryOverhead is the amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode with the memory properties as the executor's memoryOverhead.

So it's not about storing data, it's just the resources needed for YARN to run properly.

In some cases,


e.g if you enable dynamicAllocation you might want to set these properties explicitly along with the maximum number of executor (spark.dynamicAllocation.maxExecutors) that can be created during the process which can easily overwhelm YARN by asking for thousands of executors and thus loosing the already running executors.

Increasing the target number of executors happens in response to backlogged tasks waiting to be scheduled. If the scheduler queue is not drained in N seconds, then new executors are added. If the queue persists for another M seconds, then more executors are added and so on. The number added in each round increases exponentially from the previous round until an upper bound has been reached. The upper bound is based both on a configured property and on the current number of running and pending tasks, as described above.

This can lead into an exponential increase of the number of executors in some cases which can break the YARN resource manager. In my case :

16/03/31 07:15:44 INFO ExecutorAllocationManager: Requesting 8000 new executors because tasks are backlogged (new desired total will be 40000)


This doesn't cover all the use case which one can use those property, but it gives a general idea about how it's been used.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...