Explode for array object works like cross join.
So if you have 3 columns with each containing array with 2 elements, applying explode on all the columns will give you 8 rows.
You can't map one object from array to another.
Actually you can by using posexplode
which gives you index
for each element. which you can use to join based on condition. However, that' tricky when you have multiple columns and the array size is different for each column.
Solution
- Use
posexplode
if you have less column to explode and array size is same. for your case this is not going to work. So
- Store XML as Complex Data Type : Store your whole XML as a complex data type (not just array), I am talking about creating a
struct
based on your xml.
If you don't have much complex xml, you can achieve this. However xmlSerde
is not as good as JSONserde
when it comes to converting file to complex data type.
So in your case best solution would be.
- Convert your XML to JSON. You can use
NiFi
or some other technology for that.
- Create Hive table using
JSONserde
and load this file.
- Create a view as per your requirement.
JSON for Your XML
{"Version":"1.1","StoreId":"16695","Bskt":[{"TillNo":"4","BsktNo":"1753","DateTime":"2017-10-31T11:19:34.000+11:00","OpID":"50056","Itm":[{"ItmSeq":"1","GTIN":"29559","ItmDsc":"CHOCALATE","ItmProm":{"PromCD":"CM"}},{"ItmSeq":"2","GTIN":"59653","ItmDsc":"CORNFLAKES"},{"ItmSeq":"3","GTIN":"42260","ItmDsc":"MILKCHOCOLATE162GM","ItmProm":{"PromCD":"MTSRO","OfferID":"11766"}}]},{"TillNo":"5","BsktNo":"1947","DateTime":"2017-10-31T16:24:59.000+11:00","OpID":"50063","Itm":[{"ItmSeq":"1","GTIN":"24064","ItmDsc":"TOMATOES2KG","ItmProm":{"PromCD":"INSTORE"}},{"ItmSeq":"2","GTIN":"81287","ItmDsc":"ROTHMANSBLUE","ItmProm":{"PromCD":"TF"}}]}]}
JsonSerde
might give your error if you have tabs or other white spaces in your file. So it's always best to remove them.
Hive Table
create external table temp.test_json
(
Version string,
StoreId string,
Bskt array<struct<
BsktNo:string,
DateTime:string,
OpID:string,
TillNo:string,
Itm:array<struct<
GTIN:string,
ItmDsc:string,
ItmSeq:string,
ItmProm:struct<
OfferID:string,
PromCD:string
>
>
>
>
>
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
location '/tmp/test_json/table/';
Create View
SELECT Version,
StoreId,
basket.bsktno,
basket.tillno,
basket.`datetime`,
item.itmseq,
item.itmdsc,
item.gtin,
item.itmprom.offerid,
item.itmprom.promcd
FROM temp.test_json
lateral view explode(bskt) b AS basket
lateral view explode(basket.itm) i AS item