Apache Spark Word2Vec throws org.apache.spark.shuffle.FetchFailedException: requirement failed

I am running Spark's Word2Vec on about 2 GB of data with the following configuration, using PySpark 3.0.0:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("word2vec") \
    .master("local[*]") \
    .config("spark.driver.memory", "32g") \
    .config("spark.sql.execution.arrow.pyspark.enabled", "true") \
    .getOrCreate()

The code I run is simply the following:

from pyspark.sql import functions as f
from pyspark.ml.feature import Word2Vec

sentence = sample_corpus.withColumn('sentences', f.split(sample_corpus.sentences, ',')).select('sentences')
word2vec = Word2Vec(vectorSize=300, inputCol="sentences", outputCol="result", minCount=10)
model = word2vec.fit(sentence)
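
For reference, sample_corpus is essentially a DataFrame with a single string column "sentences" holding comma-separated tokens. A toy illustration of the schema (the real data is about 2 GB):

sample_corpus = spark.createDataFrame(
    [("the,quick,brown,fox",), ("jumps,over,the,lazy,dog",)],
    ["sentences"],
)
sample_corpus.printSchema()
# root
#  |-- sentences: string (nullable = true)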

However, it throws the following error during the computation, which does not give much useful information for debugging:

FetchFailed(BlockManagerId(driver, fedora, 34105, None), shuffleId=2, mapIndex=0, mapId=73, reduceId=0, message=
org.apache.spark.shuffle.FetchFailedException: requirement failed
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:748)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:663)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:70)
    at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:155)
    at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:110)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:106)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalArgumentException: requirement failed
    at scala.Predef$.require(Predef.scala:268)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator$SuccessFetchResult.<init>(ShuffleBlockFetcherIterator.scala:981)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:422)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:536)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:171)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
    ... 14 more

)

My machine has 64 GB of memory. I wonder what the cause is here and how I can fix it.
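
One workaround I am considering, though I am not sure it addresses the root cause, is to spread the shuffle over more, smaller partitions before fitting. Would something like this help (400 is an arbitrary guess, not a confirmed fix)?

# Guess at a workaround: raise shuffle parallelism so each shuffle block is smaller.
spark.conf.set("spark.sql.shuffle.partitions", "400")  # default is 200

sentence = sentence.repartition(400)  # also split the input into more partitions
model = word2vec.fit(sentence)

Thank you.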

question from: https://stackoverflow.com/questions/65602060/apache-spark-word2vec-throws-org-apache-spark-shuffle-fetchfailedexception-requ


1 Reply

Waiting for answers
