I've just created a Python list from range(1, 100000).
Using SparkContext, I did the following steps:
a = sc.parallelize([i for i in range(1, 100000)])
b = sc.parallelize([i for i in range(1, 100000)])
c = a.zip(b)
>>> [(1, 1), (2, 2), -----]
total = sc.accumulator(0)  # named total so the built-in sum is not shadowed
c.foreach(lambda xy: total.add(xy[1] - xy[0]))  # accumulate (y - x) for each pair
This gives the following warning:
WARN TaskSetManager: Stage 3 contains a task of very large size (4644 KB). The maximum recommended task size is 100 KB.
How can I resolve this warning? Is there any way to reduce the task size? Also, will it affect the time complexity on big data?
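To make the size question concrete, here is a rough sketch of where the bytes likely come from. As far as I understand, a list passed to parallelize is shipped from the driver inside the tasks, so each task carries roughly its slice of the data. The snippet uses plain pickle as a stand-in (an assumption, Spark's own serialization differs) and a hypothetical helper, slice_payload_sizes, to show the per-slice payload shrinking as the number of slices grows. It also shows that a range object is tiny compared with the materialized list.

```python
import pickle


def slice_payload_sizes(data, num_slices):
    """Split data into num_slices roughly equal chunks and return
    the pickled size of each chunk in bytes.

    Hypothetical helper for illustration only; it mimics slicing a
    local collection, not Spark's real task serialization.
    """
    n = len(data)
    sizes = []
    for i in range(num_slices):
        start = i * n // num_slices
        end = (i + 1) * n // num_slices
        sizes.append(len(pickle.dumps(data[start:end])))
    return sizes


# Same data as in the question, materialized as a list.
data = list(range(1, 100000))

few = slice_payload_sizes(data, 2)    # two large chunks
many = slice_payload_sizes(data, 64)  # many small chunks

# A range object only stores start, stop and step, so its pickled
# form stays small no matter how many elements it describes.
range_size = len(pickle.dumps(range(1, 100000)))

print("largest chunk with 2 slices: ", max(few), "bytes")
print("largest chunk with 64 slices:", max(many), "bytes")
print("pickled range object:        ", range_size, "bytes")
```

In PySpark the corresponding knob would be the numSlices argument of sc.parallelize, for example sc.parallelize(data, numSlices=64). Whether that is enough to get under the 100 KB recommendation depends on the data and the cluster.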
question from:
https://stackoverflow.com/questions/28878654/spark-using-python-how-to-resolve-stage-x-contains-a-task-of-very-large-size-x