I have streamed and saved about 250k tweets into MongoDB. Here I am retrieving them based on a keyword present in the tweet:
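// Connect to the local MongoDB instance and select the tweet collection
Mongo mongo = new Mongo("localhost", 27017);
DB db = mongo.getDB("TwitterData");
DBCollection collection = db.getCollection("publicTweets");
// Return only the "tweet" field and leave out "_id"
BasicDBObject fields = new BasicDBObject().append("tweet", 1).append("_id", 0);
// Match tweets whose text contains "autobiography"
BasicDBObject query = new BasicDBObject("tweet", new BasicDBObject("$regex", "autobiography"));
DBCursor cur = collection.find(query, fields);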
What I would like to do is use map-reduce: categorize each tweet by keyword in the map function, then pass it to the reduce function to count the number of tweets under each category, much like what you can see here. In that example, he is counting the number of pages, which is a simple number. I want to do something like:
"if (this.tweet.contains('kword1')) " +
"category = 'kword1 tweets'; " +
"else if (this.tweet.contains('kword2')) " +
"category = 'kword2 tweets'; "
and then use the reduce function to get the count, just like in the sample program.
I know that the syntax is incorrect, but that's pretty much what I would like to do. Is there any way of achieving it? Thanks!
PS: Oh, and I'm coding in Java. So the Java syntax would be highly appreciated. Thank you!
The output of the code posted is something like this:
{ "tweet" : "An autobiography is a book that reveals nothing bad about its writer except his memory."}
{ "tweet" : "I refuse to read anything that's not real the only thing I've read since biff books is Jordan's autobiography #lol"}
{ "tweet" : "well we've had the 2012 publication of Ashley's Good Books, I predict 2013 will be seeing an autobiography ;)"}
This, of course, is for all tweets containing the word "autobiography". What I'd like is to do this matching in the map function, categorize each match as an "autobiography tweet" (and likewise for other keywords), and then send it to the reduce function to count everything and return the number of tweets containing each word.
Something like:
{"_id" : "Autobiography Tweets" , "value" : { "publicTweets" : 3.0}}
{"_id" : "Biography Tweets" , "value" : { "publicTweets" : 15.0}}
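To make the intent concrete, here is a rough sketch of what the map and reduce functions might look like as JavaScript strings held in Java, together with a plain-Java mirror of the same categorization so the logic can be checked without a database. The class name `TweetCategories` and its members are hypothetical, and the driver call shown in the comment assumes the legacy `DBCollection.mapReduce` overload with `MapReduceCommand.OutputType.INLINE`. Note that "autobiography" has to be tested before "biography", since the latter is a substring of the former.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
class TweetCategories {

    // JavaScript map function: emits one count per tweet under its category.
    static final String MAP_FUNCTION =
        "function() {" +
        "  if (typeof this.tweet !== 'string') return;" +
        "  var text = this.tweet.toLowerCase();" +
        "  var category;" +
        "  if (text.indexOf('autobiography') !== -1) category = 'Autobiography Tweets';" +
        "  else if (text.indexOf('biography') !== -1) category = 'Biography Tweets';" +
        "  else return;" +
        "  emit(category, { publicTweets: 1 });" +
        "}";

    // JavaScript reduce function: sums the counts emitted for one category.
    static final String REDUCE_FUNCTION =
        "function(key, values) {" +
        "  var total = 0;" +
        "  for (var i = 0; i < values.length; i++) total += values[i].publicTweets;" +
        "  return { publicTweets: total };" +
        "}";

    // Assumed usage with the legacy Java driver from the question (not run here):
    //   MapReduceOutput out = collection.mapReduce(MAP_FUNCTION, REDUCE_FUNCTION,
    //           null, MapReduceCommand.OutputType.INLINE, null);
    //   for (DBObject row : out.results()) System.out.println(row);

    // Plain-Java mirror of the map step: returns the category, or null if none matches.
    static String categorize(String tweet) {
        if (tweet == null) return null;
        String text = tweet.toLowerCase(Locale.ROOT);
        if (text.contains("autobiography")) return "Autobiography Tweets";
        if (text.contains("biography")) return "Biography Tweets";
        return null;
    }

    // Plain-Java mirror of the reduce step: counts tweets per category.
    static Map<String, Integer> countByCategory(List<String> tweets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String tweet : tweets) {
            String category = categorize(tweet);
            if (category != null) counts.merge(category, 1, Integer::sum);
        }
        return counts;
    }
}
```

Each tweet lands in at most one category here, which matches the if/else-if idea above; a tweet that should count toward several keywords would need one `emit` per matching keyword instead.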