Handling a fixed number of categories
Given data like
@prefix : <http://example.org/books/> .
:book1 a :Book, :Cat1 .
:book2 a :Book, :Cat1, :Cat3 .
:book3 a :Book, :Cat1, :Cat2 .
you can use a query like
prefix : <http://example.org/books/>
select ?individual
       (if(bound(?cat1),1,0) as ?Cat1)
       (if(bound(?cat2),1,0) as ?Cat2)
       (if(bound(?cat3),1,0) as ?Cat3)
where {
  ?individual a :Book .
  OPTIONAL { ?individual a :Cat1 . bind( ?individual as ?cat1 ) }
  OPTIONAL { ?individual a :Cat2 . bind( ?individual as ?cat2 ) }
  OPTIONAL { ?individual a :Cat3 . bind( ?individual as ?cat3 ) }
}
order by ?individual
in which each ?catN variable is bound only when the corresponding triple is present (the particular value it is bound to doesn't matter), to get results like these:
$ arq --data data.n3 --query matrix.sparql
-----------------------------------
| individual | Cat1 | Cat2 | Cat3 |
===================================
| :book1     | 1    | 0    | 0    |
| :book2     | 1    | 0    | 1    |
| :book3     | 1    | 1    | 0    |
-----------------------------------
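The OPTIONAL/bound pattern above amounts to one membership test per category. As a rough illustration outside SPARQL, here is a minimal Python sketch of the same logic. It assumes the example data is available as plain (subject, type) pairs rather than an RDF graph, and the name `membership_rows` is made up for this example.

```python
# (subject, type) pairs standing in for the "a" triples of the example data.
TYPES = {
    ("book1", "Book"), ("book1", "Cat1"),
    ("book2", "Book"), ("book2", "Cat1"), ("book2", "Cat3"),
    ("book3", "Book"), ("book3", "Cat1"), ("book3", "Cat2"),
}

def membership_rows(types, categories):
    """Return one (book, flags) row per book, with a 1/0 flag per category."""
    books = sorted(s for (s, t) in types if t == "Book")
    # Each flag plays the role of if(bound(?catN),1,0) in the query.
    return [(b, [1 if (b, c) in types else 0 for c in categories]) for b in books]

rows = membership_rows(TYPES, ["Cat1", "Cat2", "Cat3"])
for book, flags in rows:
    print(book, *flags)
```

As in the query, the list of categories is fixed up front, so adding a category means adding a column by hand.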
Handling an arbitrary number of categories
Here's a solution that seems to work in Jena, though I'm not sure that the specific results are guaranteed. (Update: Based on this answers.semanticweb.com question and answer, it seems that this behavior is not guaranteed by the SPARQL specification.) If we add a little more data, namely which things are categories and which are books, e.g.,
@prefix : <http://example.org/books/> .
:book1 a :Book, :Cat1 .
:book2 a :Book, :Cat1, :Cat3 .
:book3 a :Book, :Cat1, :Cat2 .
:Cat1 a :Category .
:Cat2 a :Category .
:Cat3 a :Category .
then we can run a subquery that selects all the categories in order, and then for each book computes a string indicating whether or not the book is in each category.
prefix : <http://example.org/books/>
select ?book (group_concat(?isCat) as ?matrix) where {
  {
    select ?category where {
      ?category a :Category
    }
    order by ?category
  }
  ?book a :Book .
  OPTIONAL { bind( 1 as ?isCat ) ?book a ?category . }
  OPTIONAL { bind( 0 as ?isCat ) FILTER NOT EXISTS { ?book a ?category } }
}
group by ?book
order by ?book
This has the output:
$ arq --data data.n3 --query matrix2.query
--------------------
| book   | matrix  |
====================
| :book1 | "1 0 0" |
| :book2 | "1 0 1" |
| :book3 | "1 1 0" |
--------------------
which is much closer to the output in the question, and handles an arbitrary number of categories. However, it depends on the values of ?category being processed in the same order for each ?book, and I'm not sure whether that's guaranteed or not.
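To make the ordering dependency concrete, here is a small Python sketch of what the query is hoping the engine does: sort the categories once, then walk that same list for every book. It is only an illustration under assumptions (plain (subject, type) pairs instead of an RDF graph, and a made-up function name `matrix_rows`), not a description of how Jena evaluates the query.

```python
# (subject, type) pairs standing in for the "a" triples of the extended data.
TYPES = {
    ("book1", "Book"), ("book1", "Cat1"),
    ("book2", "Book"), ("book2", "Cat1"), ("book2", "Cat3"),
    ("book3", "Book"), ("book3", "Cat1"), ("book3", "Cat2"),
    ("Cat1", "Category"), ("Cat2", "Category"), ("Cat3", "Category"),
}

def matrix_rows(types):
    """Return the sorted categories and a 'flag flag ...' string per book."""
    # Counterpart of the subquery with "order by ?category".
    categories = sorted(s for (s, t) in types if t == "Category")
    books = sorted(s for (s, t) in types if t == "Book")
    rows = {}
    for book in books:
        # Reusing the same sorted list for every book keeps the columns aligned.
        flags = ["1" if (book, c) in types else "0" for c in categories]
        # Joining with " " mirrors group_concat's default separator.
        rows[book] = " ".join(flags)
    return categories, rows

categories, rows = matrix_rows(TYPES)
for book, row in rows.items():
    print(book, row)
```

If the flags for different books were collected in different category orders, the strings would still have the right length but the columns would no longer mean the same thing, which is exactly the risk with the SPARQL version.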
We can even use this approach to generate a header row for the table. Again, this depends on the ?category values being processed in the same order for each ?book, which might not be guaranteed, but seems to work in Jena. To get a category header, all we need to do is create a row where ?book is unbound and the value of ?isCat is the category itself:
prefix : <http://example.org/books/>
select ?book (group_concat(?isCat) as ?matrix) where {
  {
    select ?category where {
      ?category a :Category
    }
    order by ?category
  }
  # This generates the header row where ?isCat is just
  # the category, so the group_concat gives headers.
  {
    bind(?category as ?isCat)
  }
  UNION
  # This is the table as before
  {
    ?book a :Book .
    OPTIONAL { bind( 1 as ?isCat ) ?book a ?category . }
    OPTIONAL { bind( 0 as ?isCat ) FILTER NOT EXISTS { ?book a ?category } }
  }
}
group by ?book
order by ?book
We get this output:
--------------------------------------------------------------------------------------------------------
| book   | matrix                                                                                      |
========================================================================================================
|        | "http://example.org/books/Cat1 http://example.org/books/Cat2 http://example.org/books/Cat3" |
| :book1 | "1 0 0"                                                                                     |
| :book2 | "1 0 1"                                                                                     |
| :book3 | "1 1 0"                                                                                     |
--------------------------------------------------------------------------------------------------------
Using some string manipulation, you could shorten the URIs used for the categories, or widen the array entries to get correct alignment. One possibility is this:
prefix : <http://example.org/books/>
select ?book (group_concat(?isCat) as ?categories) where {
  {
    select ?category
           (strafter(str(?category),"http://example.org/books/") as ?name)
    where {
      ?category a :Category
    }
    order by ?category
  }
  {
    bind(?name as ?isCat)
  }
  UNION
  {
    ?book a :Book .
    # The string manipulation here takes the name of the category (which should
    # be at least two characters), trims off the first character (string indexing
    # in XPath functions starts at 1), and replaces each remaining character with
    # a space. The resulting spaces are concatenated with "1" or "0" depending on
    # whether the book is a member of the category. The resulting string has the
    # same width as the category name, and makes for a nice table.
    OPTIONAL { bind( concat(replace(substr(?name,2),"."," "),"1") as ?isCat ) ?book a ?category . }
    OPTIONAL { bind( concat(replace(substr(?name,2),"."," "),"0") as ?isCat ) FILTER NOT EXISTS { ?book a ?category } }
  }
}
group by ?book
order by ?book
which produces this output:
$ arq --data data.n3 --query matrix3.query
-----------------------------
| book   | categories       |
=============================
|        | "Cat1 Cat2 Cat3" |
| :book1 | "   1    0    0" |
| :book2 | "   1    0    1" |
| :book3 | "   1    1    0" |
-----------------------------
which is almost exactly what you had in the question.
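The padding trick in the last query is easy to check in isolation. Here is a minimal Python sketch of the same string manipulation; the helper name `pad_flag` is made up for this example, and Python's `re.sub` stands in for the SPARQL replace function.

```python
import re

def pad_flag(name, flag):
    """Blank out every character of name after the first, then append the flag."""
    # name[1:] mirrors substr(?name, 2); the pattern "." matches each remaining character.
    return re.sub(".", " ", name[1:]) + flag

names = ["Cat1", "Cat2", "Cat3"]
header = " ".join(names)
# Flags for :book2, which is in Cat1 and Cat3 but not Cat2.
row = " ".join(pad_flag(n, f) for n, f in zip(names, ["1", "0", "1"]))

print(repr(header))
print(repr(row))
```

Because each padded entry is exactly as wide as its category name, the header and every data row come out the same length, and the flags line up under the last character of each name.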