Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
268 views
in Technique[技术] by (71.8m points)

Is it possible to get the position of an element in an RDF Collection in SPARQL?

Suppose that I have the following Turtle declaration:

@prefix : <http://example.org#> .

:ls :list (:a :b :c)

Is there a way to get the positions of the elements in the collection?

For example, with this query:

PREFIX :     <http://example.org#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 

SELECT ?elem WHERE {
 ?x :list ?ls .
 ?ls rdf:rest*/rdf:first ?elem .
}

I get:

--------
| elem |
========
| :a   |
| :b   |
| :c   |
--------

But I would like a query to obtain:

--------------
| elem | pos |
==============
| :a   |  0  |
| :b   |  1  |
| :c   |  2  |
--------------

Is it possible?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

A Pure SPARQL 1.1 Solution

I've extended the data to make the problem a little harder. Let's add a duplicate element to the list, e.g., an additional :a at the end:

@prefix : <http://example.org#> .

:ls :list (:a :b :c :a) .

Then we can use a query like this to extract each list node (and its element) along with the position of the node in the list. The idea is that we can match all the individual nodes in the list with a pattern like [] :list/rdf:rest* ?node. The position of each node, though, is the number of intermediate nodes between the head of the list and ?node. We can match each of those intermediate nodes by breaking the pattern down into

[] :list/rdf:rest* ?mid . ?mid rdf:rest* :node .

Then if we group by ?node, the number of distinct ?mid bindings is the position of ?node in the list. Thus we can use the following query (which also grabs the element (the rdf:first) associated with each node) to get the positions of elements in the list:

prefix : <https://stackoverflow.com/q/17523804/1281433/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?element (count(?mid)-1 as ?position) where { 
  [] :list/rdf:rest* ?mid . ?mid rdf:rest* ?node .
  ?node rdf:first ?element .
}
group by ?node ?element
----------------------
| element | position |
======================
| :a      | 0        |
| :b      | 1        |
| :c      | 2        |
| :a      | 3        |
----------------------

This works because the structure of an RDF list is a linked list like this (where ?head is the beginning of the list (the object of :list), and is another binding of ?mid because of the pattern [] :list/rdf:rest* ?mid):

graphical representation of RDF list

Comparison with Jena ARQ Extensions

The asker of the question also posted an answer that uses Jena's ARQ extensions for working with RDF lists. The solution posted in that answer is

PREFIX :     <http://example.org#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX list: <http://jena.hpl.hp.com/ARQ/list#>

SELECT ?elem ?pos WHERE {
 ?x :list ?ls .
 ?ls list:index (?pos ?elem).
}

This answer depends on using Jena's ARQ and enabling the extensions, but it is more concise and transparent. What isn't obvious is whether one has an obviously preferable performance. As it turns out, for small lists, the difference isn't particularly significant, but for larger lists, the ARQ extensions have much better performance. The runtime for the pure SPARQL query quickly becomes prohibitively long, while there's almost no difference in the version using the ARQ extensions.

-------------------------------------------
| num elements | pure SPARQL | list:index |
===========================================
|      50      |    1.1s     |    0.8s    |
|     100      |    1.5s     |    0.8s    |
|     150      |    2.5s     |    0.8s    |
|     200      |    4.8s     |    0.8s    |
|     250      |    9.7s     |    0.8s    |
-------------------------------------------

These specific values will obviously differ depending on your setup, but the general trend should be observable anywhere. Since things could change in the future, here's the particular version of ARQ I'm using:

$ arq --version
Jena:       VERSION: 2.10.0
Jena:       BUILD_DATE: 2013-02-20T12:04:26+0000
ARQ:        VERSION: 2.10.0
ARQ:        BUILD_DATE: 2013-02-20T12:04:26+0000

As such, if I knew that I had to process lists of non-trivial sizes and that I had ARQ available, I'd use the extension.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...