scala - Spark map/Filter throws java.io.IOException: Too many bytes before newline: 2147483648

Question

posted Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I have a simple 7 GB file in which each line contains two columns separated by |. I created an RDD from this file, but when I apply a map or filter transformation to this RDD I get a "Too many bytes" exception.

Below is sample data from my file:

    116010100000000007|33448
    116010100000000014|13520
    116010100000000021|97132
    116010100000000049|82891
    116010100000000049|82890
    116010100000000056|93014
    116010100000000063|43434

Here is the code:

    val input = sparkContext.textFile("hdfsfilePath")

    input.filter(x => x.split("|")(1).toInt > 15000).saveAsTextFile("hdfs://output file path")

Below is the exception I am getting:

    java.io.IOException: Too many bytes before newline: 2147483648
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:249)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
        at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:136)

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:16:06+0000

The issue was in my Scala code, where I split each line on the pipe delimiter. I changed the code and now it works. Below is the changed code:

    val input = sparkContext.textFile("hdfsfilePath")

    // split(Char) splits on the literal '|' character, not on a regex
    input.filter(x => x.split('|')(1).toInt > 15000).saveAsTextFile("hdfs://output file path")
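The corrected filter still assumes that every line has an integer second column: a blank line would throw ArrayIndexOutOfBoundsException and a non-numeric value would throw NumberFormatException, failing the Spark task. Below is a minimal defensive sketch, assuming malformed lines should simply be dropped. `PipeFilter` and `keep` are hypothetical names, not part of Spark, and the Spark usage in the comment assumes the same `sparkContext` as above.

```scala
import scala.util.Try

// Hypothetical helper: decides whether a "key|value" line should be kept.
object PipeFilter {
  // Returns true only if the line has a second column that parses as an Int
  // greater than the threshold. Malformed lines (missing column, non-numeric
  // value) return false instead of throwing.
  def keep(line: String, threshold: Int = 15000): Boolean = {
    val fields = line.split('|') // literal pipe, not a regex
    fields.length >= 2 && Try(fields(1).trim.toInt).toOption.exists(_ > threshold)
  }

  // Usage inside Spark (assumes an existing SparkContext named sparkContext):
  //   sparkContext.textFile("hdfsfilePath")
  //     .filter(line => PipeFilter.keep(line))
  //     .saveAsTextFile("hdfs://output file path")
}
```

Silently dropping bad rows is a design choice; if malformed input should be surfaced, log or count those lines instead.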

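Instead of "|", I needed to use either '|' or "\\|" in the split method. String.split(String) treats its argument as a regular expression, and | is the regex alternation operator, so "|" matches the empty string at every position and splits the line into single characters; x.split("|")(1) therefore returned the second character of the line rather than the second column. The Char overload split('|') splits on the literal pipe, and "\\|" escapes it (a single backslash, "\|", is not a valid escape sequence in a Scala string literal).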
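To see the difference concretely, here is a small standalone Scala sketch (no Spark needed) that splits one of the sample lines from the question in four ways. `PipeSplitDemo` is a hypothetical name, and `Pattern.quote` is an extra option not mentioned in the answer. With `split("|")`, element 1 is the character "1", so `"1".toInt > 15000` is false and the original filter would keep no lines.

```scala
import java.util.regex.Pattern

// Compares ways of splitting one sample line from the question on '|'.
object PipeSplitDemo {
  val line = "116010100000000007|33448"

  // String.split(String) compiles its argument as a regex. "|" is an empty
  // alternation that matches between every character, so each character
  // (including the pipe itself) becomes its own element.
  def regexPipe: Array[String] = line.split("|")

  // StringOps.split(Char) splits on the literal character.
  def charPipe: Array[String] = line.split('|')

  // Escaped metacharacter: the Scala literal "\\|" is the regex \|
  def escapedPipe: Array[String] = line.split("\\|")

  // Pattern.quote wraps the delimiter so it is matched literally.
  def quotedPipe: Array[String] = line.split(Pattern.quote("|"))

  def main(args: Array[String]): Unit = {
    println(regexPipe.take(3).mkString(","))  // 1,1,6
    println(charPipe.mkString(","))           // 116010100000000007,33448
    println(escapedPipe.mkString(","))        // 116010100000000007,33448
    println(quotedPipe.mkString(","))         // 116010100000000007,33448
  }
}
```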
