I am trying to read a schema file (a plain text file) and apply it to my CSV file, which has no header. Since I already have a schema file, I don't want to use the inferSchema option, which adds overhead.
My input schema file looks like this:
"num IntegerType","letter StringType"
I am trying the code below to build a schema from that file:
val schema_file = spark.read.textFile("D:\\Users\\Documents\\schemaFile.txt")
val struct_type = schema_file
  .flatMap(x => x.split(","))
  .map(b => (
    b.split(" ")(0).stripPrefix("\""),
    b.split(" ")(1).stripSuffix("\"").asInstanceOf[org.apache.spark.sql.types.DataType]))
  .foreach(x => println(x))
I am getting the following error:
Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for org.apache.spark.sql.types.DataType
- field (class: "org.apache.spark.sql.types.DataType", name: "_2")
- root class: "scala.Tuple2"
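The encoder error occurs because a Dataset cannot hold `org.apache.spark.sql.types.DataType` values; the schema has to be built on the driver, outside any Dataset. Since Spark 2.3, `.schema()` also accepts a DDL string, so one option is to turn a schema-file line into DDL with plain string handling. A minimal sketch, assuming the file's type names map onto DDL keywords (the `ddlTypes` map below covers only the two types in the sample file and would need extending):

```scala
// Assumed mapping from the type names used in the schema file to Spark DDL
// keywords; only the two types from the sample file are covered.
val ddlTypes = Map("IntegerType" -> "INT", "StringType" -> "STRING")

// Turn one schema-file line, e.g. "num IntegerType","letter StringType",
// into a DDL string: num INT, letter STRING
def toDdl(line: String): String =
  line.split(",")
    .map(_.trim.stripPrefix("\"").stripSuffix("\""))
    .map { field =>
      val Array(name, tpe) = field.split(" ")
      s"$name ${ddlTypes(tpe)}"
    }
    .mkString(", ")
```

The resulting string can then be passed directly, e.g. `.schema("num INT, letter STRING")`, when reading the CSV (Spark 2.3+).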
I then want to use this as the schema when reading with spark.read.csv, as below, and write the result out as an ORC file:
val df = spark.read
  .format("org.apache.spark.csv")
  .option("header", false)
  .option("nullValue", "NULL")
  .option("delimiter", "|")
  .schema(schema_file)
  .csv("D:\\Users\\sampleFile.txt")
df.write.format("orc").save("D:\\Users\\ORC")
I need help converting the schema text file into a usable schema and writing my input CSV file out as ORC.