Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
1.1k views
in Technique[技术] by (71.8m points)

apache pig - How do I parse JSON in Pig?

I have a lot of gzip'd log files in S3 that contain three types of log lines: b, c, and i. Types i and c are both single-level JSON:

{"this":"that","test":"4"}

Type b is deeply nested JSON. I came across a gist that talks about compiling a jar to make this work, but since my Java skills are less than stellar, I didn't really know what to do from there.

{"this":{"foo":"bar","baz":{"test":"me"},"total":"5"}}

Since the fields in types i and c are not always in the same order, specifying everything in the generate regex is difficult. Is handling JSON (in a gzip'd file) possible with Pig? I am using whichever version of Pig comes built into an Amazon Elastic MapReduce instance.

This boils down to two questions: 1) Can I parse JSON with Pig (and if so, how)? 2) If I can parse JSON (from a gzip'd logfile), can I parse nested JSON objects?


1 Reply

0 votes
by (71.8m points)

Pig 0.10 and later come with the built-in JsonLoader and JsonStorage functions, so no custom jar is needed for basic JSON parsing.

See the Pig documentation for JSON load/store.
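
As a rough sketch of how this might look for the two sample lines in the question: JsonLoader takes a schema string, and a nested JSON object is declared as a tuple. The S3 paths and alias names below are hypothetical, and the sketch assumes each path holds only one line type. I have not checked how the JsonLoader in your EMR Pig version handles keys that arrive in a different order than the schema, so it is worth testing on a small sample first.

```pig
-- Hypothetical S3 paths; Pig normally decompresses .gz input based on the file extension.
-- Assumption: each path contains only one type of log line.

-- Flat lines such as {"this":"that","test":"4"}
flat_lines = LOAD 's3://my-bucket/logs/flat/*.gz'
             USING JsonLoader('this:chararray, test:chararray');

-- Nested lines such as {"this":{"foo":"bar","baz":{"test":"me"},"total":"5"}}
-- Nested JSON objects are declared as tuples in the schema string.
nested_lines = LOAD 's3://my-bucket/logs/nested/*.gz'
               USING JsonLoader('this:(foo:chararray, baz:(test:chararray), total:chararray)');

-- Pull values out of the nested tuples with dot notation.
picked = FOREACH nested_lines GENERATE
             this.foo      AS foo,
             this.baz.test AS test_value,
             this.total    AS total;

DUMP picked;
```

If the b, c, and i lines are mixed in the same files, one option is to load each line as a plain chararray first and split by type before applying a JSON parser, since a single schema will not fit all three shapes.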

