Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
172 views
in Technique[技术] by (71.8m points)

Aggregating results from SPARQL query

I'm querying a dataset of tweets:

SELECT * WHERE {
  ?tweet smo:tweeted_at ?date ;
         smo:has_hashtag ?hashtag ;
         smo:tweeted_by ?account ;
         smo:english_tweet true .
  FILTER ( ?date >= "20130722"^^xsd:date && ?date < "20130723"^^xsd:date )
}

If a tweet has multiple hashtags, there is one row in the result set per hashtag. Is there any way for me to aggregate the hashtags into an array instead?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can GROUP BY by the variables that identify the tweet and then use GROUP_CONCAT to concatenate the hashtags into something like an array, but it will still be a string that you'll need to parse afterward. For instance, given data like

@prefix smo: <http://example.org/> .
@prefix : <http://example.org/> .

:tweet1 smo:tweeted_at "1" ;
        smo:has_hashtag "tag1", "tag2", "tag3" ;
        smo:tweeted_by "user1" ;
        smo:english_tweet true .

:tweet2 smo:tweeted_at "2" ;
        smo:has_hashtag "tag2", "tag3", "tag4" ;
        smo:tweeted_by "user2" ;
        smo:english_tweet true .

you can use a query like

prefix smo: <http://example.org/>

select ?tweet ?date ?account (group_concat(?hashtag) as ?hashtags) where {
  ?tweet smo:tweeted_at ?date ;
         smo:has_hashtag ?hashtag ;
         smo:tweeted_by ?account ;
         smo:english_tweet true .
}
group by ?tweet ?date ?account

to get results like:

--------------------------------------------------
| tweet      | date | account | hashtags         |
==================================================
| smo:tweet2 | "2"  | "user2" | "tag4 tag3 tag2" |
| smo:tweet1 | "1"  | "user1" | "tag3 tag2 tag1" |
--------------------------------------------------

You can specify the delimiter used in the group concatenation, so if there is some character that cannot appear in hashtags, you can use it as a delimiter. For instance, supposing that | can't appear in hashtags, you can use:

(group_concat(?hashtag;separator="|") as ?hashtags)

instead and get

--------------------------------------------------
| tweet      | date | account | hashtags         |
==================================================
| smo:tweet2 | "2"  | "user2" | "tag4|tag3|tag2" |
| smo:tweet1 | "1"  | "user1" | "tag3|tag2|tag1" |
--------------------------------------------------

If you're working in a language that has some literal array syntax, you might even be able to replicate that:

(concat('[',group_concat(?hashtag;separator=","),']') as ?hashtags)
----------------------------------------------------
| tweet      | date | account | hashtags           |
====================================================
| smo:tweet2 | "2"  | "user2" | "[tag4,tag3,tag2]" |
| smo:tweet1 | "1"  | "user1" | "[tag3,tag2,tag1]" |
----------------------------------------------------

Now, it doesn't affect the data here, but group_concat will actually include duplicates in the concatenation if they're present in the data. E.g., from the following (where I'm just providing data with values for the sake of the example):

prefix : <http://example.org/>

select ?tweet (concat('[',group_concat(?hashtag;separator=','),']') as ?hashtags)
where {
  values (?tweet ?hashtag) { 
    (:tweet1 "tag1") (:tweet1 "tag1") (:tweet1 "tag2") (:tweet1 "tag3")
    (:tweet2 "tag2") (:tweet2 "tag3") (:tweet2 "tag4")
  }
}
group by ?tweet

we get results including [tag1,tag1,tag2,tag3], i.e., the duplicate value of ?hashtag is included:

-------------------------------------
| tweet   | hashtags                |
=====================================
| :tweet2 | "[tag2,tag3,tag4]"      |
| :tweet1 | "[tag1,tag1,tag2,tag3]" |
-------------------------------------

We can avoid this by using group_concat(distinct ?hashtag;...):

prefix : <http://example.org/>

select ?tweet (concat('[',group_concat(distinct ?hashtag;separator=','),']') as ?hashtags)
where {
  values (?tweet ?hashtag) { 
    (:tweet1 "tag1") (:tweet1 "tag1") (:tweet1 "tag2") (:tweet1 "tag3")
    (:tweet2 "tag2") (:tweet2 "tag3") (:tweet2 "tag4")
  }
}
group by ?tweet
--------------------------------
| tweet   | hashtags           |
================================
| :tweet2 | "[tag2,tag3,tag4]" |
| :tweet1 | "[tag1,tag2,tag3]" |
--------------------------------

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...