Spark using python: How to resolve Stage x contains a task of very large size (xxx KB). The maximum recommended task size is 100 KB

Question

posted Oct 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

I've just created a Python list from range(1, 100000).

Using the SparkContext, I did the following steps:

a = sc.parallelize([i for i in range(1, 100000)])
b = sc.parallelize([i for i in range(1, 100000)])

c = a.zip(b)

# c now holds [(1, 1), (2, 2), ...]

sum = sc.accumulator(0)

# add each difference (y - x) to the accumulator
c.foreach(lambda xy: sum.add(xy[1] - xy[0]))

This gives the following warning:

WARN TaskSetManager: Stage 3 contains a task of very large size (4644 KB). The maximum recommended task size is 100 KB.

How can I resolve this warning? Is there any way to reduce the task size? Also, will it affect the running time on big data?

question from:https://stackoverflow.com/questions/28878654/spark-using-python-how-to-resolve-stage-x-contains-a-task-of-very-large-size-x

1 Reply

深蓝 · Answer 1 · 2021-10-06T05:12:20+0000

Expanding on @leo9r's comment: instead of building a Python list with range, consider using sc.range (https://spark.apache.org/docs/1.6.0/api/python/pyspark.html#pyspark.SparkContext.range).
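A minimal sketch of that suggestion, assuming an already running SparkContext is passed in as sc. The helper name sum_of_differences and its parameters are made up for illustration; sc.range, zip, foreach and sc.accumulator are the same calls that appear in the question and in this answer.

```python
def sum_of_differences(sc, start=1, end=100000):
    """Hypothetical helper: the question's computation, built on sc.range."""
    # No Python list is built on the driver; the RDDs are created
    # directly from the range bounds.
    a = sc.range(start, end)
    b = sc.range(start, end)

    # Identical arguments give both RDDs the same partitioning,
    # which zip requires.
    c = a.zip(b)

    total = sc.accumulator(0)
    # Python 3 has no tuple-unpacking lambdas, so index into the pair.
    c.foreach(lambda pair: total.add(pair[1] - pair[0]))
    return total.value
```

With a real context this could be called as sum_of_differences(SparkContext.getOrCreate()). Since both RDDs hold the same numbers, every difference is 0 and so is the result.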

That way you avoid transferring a huge list from your driver to the executors.
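To get a feel for the size difference, the sketch below compares the pickled size of the materialized list from the question with the pickled size of just the range bounds. It uses plain pickle as a stand-in and is not Spark's actual task serialization, so the numbers only indicate the order of magnitude. The variable names are illustrative.

```python
import pickle

# Materialized list, as in the question: every element is part of the payload.
values = [i for i in range(1, 100000)]
list_payload = pickle.dumps(values)

# Only the bounds (start, end, step): a handful of bytes, whatever the length.
bounds_payload = pickle.dumps((1, 100000, 1))

print("list payload:   %d bytes" % len(list_payload))
print("bounds payload: %d bytes" % len(bounds_payload))
```

The list payload grows with the number of elements, while the bounds payload stays the same size no matter how long the range is.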

Of course, RDDs like these are usually used for testing purposes only, so you would not want to broadcast them.
