In Spark, it is possible to set some Hadoop configuration settings, e.g.:
System.setProperty("spark.hadoop.dfs.replication", "1")
This works: the replication factor is set to 1.
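Spark's configuration documentation describes the rule this relies on: a property of the form `spark.hadoop.abc.def=xyz` is added to the Hadoop configuration as `abc.def=xyz`. The stdlib-only sketch below models that prefix rule so the mapping is explicit. `SparkHadoopPrefix` and `hadoopPropsFrom` are hypothetical names used for illustration, not Spark's actual implementation.

```scala
object SparkHadoopPrefix {
  val Prefix = "spark.hadoop."

  // Hypothetical helper modelling the documented rule: every "spark.hadoop.X"
  // entry becomes the Hadoop property "X"; all other entries are dropped.
  def hadoopPropsFrom(sparkProps: Map[String, String]): Map[String, String] =
    sparkProps.collect {
      case (key, value) if key.startsWith(Prefix) => key.stripPrefix(Prefix) -> value
    }

  def main(args: Array[String]): Unit = {
    val props = Map(
      "spark.hadoop.dfs.replication" -> "1",
      "spark.hadoop.textinputformat.record.delimiter" -> "\n\n",
      "spark.app.name" -> "demo" // not a spark.hadoop.* key, so it is not forwarded
    )
    hadoopPropsFrom(props).foreach { case (k, v) =>
      val shown = v.replace("\n", "\\n") // make the newline delimiter visible
      println(s"$k = $shown")
    }
  }
}
```

Under this rule the delimiter key is forwarded exactly like `dfs.replication`. As far as I know the copy happens when the `SparkContext` builds its Hadoop configuration, so the system property has to be set before the context is created, and it only has an effect if the input format used for reading actually consults that key.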
Since setting dfs.replication this way works, I assumed the same pattern (prepending "spark.hadoop." to a regular Hadoop configuration property) would also work for textinputformat.record.delimiter:
System.setProperty("spark.hadoop.textinputformat.record.delimiter", "\n\n")
However, Spark seems to ignore this setting.
Am I setting textinputformat.record.delimiter in the correct way?
Is there a simpler way to set textinputformat.record.delimiter? I would like to avoid writing my own InputFormat, since I really only need records delimited by two newlines.
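One approach that avoids both the system property and a custom InputFormat is to pass a per-read Hadoop `Configuration` to `SparkContext.newAPIHadoopFile`. This sketch assumes Spark running on Hadoop 2.x or later, where the new-API `org.apache.hadoop.mapreduce.lib.input.TextInputFormat` reads `textinputformat.record.delimiter` from the configuration it is given. The names `BlankLineRecords` and `readParagraphs` are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BlankLineRecords {
  // Reads `path` as records separated by "\n\n" (hypothetical helper).
  def readParagraphs(sc: SparkContext, path: String): RDD[String] = {
    // Copy the context's Hadoop configuration so the custom delimiter
    // applies only to this read, not to every other textFile call.
    val conf = new Configuration(sc.hadoopConfiguration)
    conf.set("textinputformat.record.delimiter", "\n\n")
    sc.newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
      // Hadoop reuses Text instances, so convert to String before collecting or caching.
      .map { case (_, text) => text.toString }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("paragraphs"))
    try {
      val file = Files.createTempFile("paragraphs", ".txt")
      Files.write(file, "a\nb\n\nc\nd\n\ne".getBytes(StandardCharsets.UTF_8))
      readParagraphs(sc, file.toString).collect().foreach(r => println(s"[$r]"))
    } finally sc.stop()
  }
}
```

Copying `sc.hadoopConfiguration` rather than mutating it keeps the delimiter change local to this one RDD.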