I recently published a post about how to parse the XML that arrives as plain text inside the "body" field of an EventHub message: "Reading schema of streaming Dataframe in Spark Structured Streaming".
As described in that post, in batch mode I have used the https://github.com/databricks/spark-xml library, more specifically: https://github.com/databricks/spark-xml#parsing-nested-xml.
What I would really like to know is whether the XML schema can be inferred automatically as the data arrives, instead of having to declare it first.
I tried first loading some sample data, extracting its schema, and then using that schema for the stream. Is there a more efficient way?
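The sample-then-infer approach can be sketched in plain Python to show what "extracting the schema" from a few messages involves. This is not spark-xml: it is a simplified, hypothetical stand-in (`infer_type`, `merge_types`, `element_schema` and `infer_schema` are made-up names) that walks a few sample XML strings and merges the per-sample schemas, widening `long` to `double` and any other conflict to `string`. The `_` attribute prefix and the `_VALUE` key are modeled on spark-xml's default `attributePrefix` and `valueTag` options. In Spark itself, the README section linked above pairs a `schema_of_xml` helper (which infers a schema from a Dataset of XML strings) with `from_xml`, which is the same idea applied to a sampled batch.

```python
import xml.etree.ElementTree as ET


def infer_type(text):
    """Guess a primitive type name for a text value (simplified, hypothetical rules)."""
    t = (text or "").strip()
    if t == "":
        return "string"
    if t.lower() in ("true", "false"):
        return "boolean"
    try:
        int(t)
        return "long"
    except ValueError:
        pass
    try:
        float(t)
        return "double"
    except ValueError:
        return "string"


def merge_types(a, b):
    """Widen two inferred types so that both samples fit the result."""
    if a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            merged[key] = merge_types(merged[key], value) if key in merged else value
        return merged
    if isinstance(a, dict) or isinstance(b, dict):
        # Struct vs. primitive conflict: fall back to a plain string.
        return "string"
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"


def element_schema(elem):
    """Infer a schema for one element: attributes get a '_' prefix, children recurse."""
    fields = {}
    for name, value in elem.attrib.items():
        fields["_" + name] = infer_type(value)
    for child in elem:
        child_schema = element_schema(child)
        if child.tag in fields:
            # Repeated tags are merged here; a real reader would build an array type.
            fields[child.tag] = merge_types(fields[child.tag], child_schema)
        else:
            fields[child.tag] = child_schema
    text = (elem.text or "").strip()
    if not fields:
        return infer_type(text)
    if text:
        fields["_VALUE"] = infer_type(text)
    return fields


def infer_schema(samples):
    """Infer one merged schema from a list of XML strings (e.g. sampled 'body' values)."""
    schema = {}
    for xml_text in samples:
        root_schema = element_schema(ET.fromstring(xml_text))
        if not isinstance(root_schema, dict):
            root_schema = {"_VALUE": root_schema}
        schema = merge_types(schema, root_schema)
    return schema


if __name__ == "__main__":
    # Hypothetical sample bodies standing in for a few captured EventHub messages.
    samples = [
        '<event id="1"><temp>21</temp><device><name>a</name></device></event>',
        '<event id="2"><temp>21.5</temp><device><name>b</name><ok>true</ok></device></event>',
    ]
    print(infer_schema(samples))
```

Running it prints `{'_id': 'long', 'temp': 'double', 'device': {'name': 'string', 'ok': 'boolean'}}`: `temp` is widened to `double` and `device.ok` appears only because the second sample contains it, which is exactly why the quality of the inferred schema depends on how representative the sample is.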
Many thanks!