I have created a PySpark application that reads a JSON file into a DataFrame using a defined schema. Code sample below:
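from pyspark.sql.types import StructType, StructField, StringType, LongType

# Explicit schema: both fields are nullable
schema = StructType([
    StructField("domain", StringType(), True),
    StructField("timestamp", LongType(), True),
])
df = sqlContext.read.json(file, schema)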
I need a way to define this schema in some kind of config or ini file and read it in the main PySpark application.
This would let me modify the schema if the JSON changes in the future, without changing the main PySpark code.
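One possible approach, as a minimal sketch rather than a definitive answer: keep the column names and Spark SQL types in an ini file and turn them into a DDL schema string, which `DataFrameReader.schema` accepts in Spark 2.3 and later. The file name `schema.ini`, the `[schema]` section name, and the helper `load_schema_ddl` are all hypothetical names chosen for illustration.

```python
import configparser


def load_schema_ddl(ini_path, section="schema"):
    """Build a Spark DDL schema string from an ini section.

    Assumed (hypothetical) file layout:

        [schema]
        domain = STRING
        timestamp = BIGINT
    """
    parser = configparser.ConfigParser()
    # Keep column names case-sensitive (configparser lowercases keys by default).
    parser.optionxform = str
    with open(ini_path) as f:
        parser.read_file(f)
    # Backticks quote each column name so names like "timestamp" are safe.
    return ", ".join(
        f"`{name}` {sql_type}" for name, sql_type in parser[section].items()
    )


# Usage in the main application (assumes Spark 2.3+ and the names above):
# schema_ddl = load_schema_ddl("schema.ini")
# df = sqlContext.read.schema(schema_ddl).json(file)
```

With this layout, adding or changing a field only means editing the ini file. If nested or non-nullable fields are needed, storing the schema as JSON and rebuilding it with `StructType.fromJson` may be a better fit than a flat ini section.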