Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
2.2k views
in Technique[技术] by (71.8m points)

apache pig - Pig : result of json loader empty

I'm using cdh5 quickstart vm and I have a file like this(not full here):

{"user_id": "kim95",
 "type": "Book",
 "title": "Modern Database Systems: The Object Model, Interoperability, and
Beyond.",
 "year": "1995",
 "publisher": "ACM Press and Addison-Wesley",
 "authors": {},
 "source": "DBLP"
}
{"user_id": "marshallo79",
 "type": "Book",
 "title": "Inequalities: Theory of Majorization and Its Application.",
 "year": "1979",
 "publisher": "Academic Press",
 "authors": {("Albert W. Marshall"), ("Ingram Olkin")},
 "source": "DBLP"
}

and I used this script:

books = load 'data/book-seded.json'
        using JsonLoader('t1:tuple(user_id:
chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,source:chararray,authors:bag{T:tuple(author:chararray)})');

STORE books INTO 'book-no-seded.tsv';

the script works , but the generated file is empty, do you have any idea?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Finally , only this schema worked : If I add or remove a space different from this configuration then i gonna have an error( i also added "name" for tuples and specified "null" when it was empty, and changed the order between authors and source, but even without this congiguration it will still be wrong)

{"user_id": "kim95", "type": "Book","title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [{"name":null"}], "source": "DBLP"}
{"user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press", "authors": [{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}], "source": "DBLP"}

And the working script is this one :

books = load 'data/book-seded-workings-reduced.json'
        using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

STORE books INTO 'book-table.csv';  //whether .tsv or .csv

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...