You can do this, but a couple of things should be noted first:
- Two classes may have many superclasses in common, not necessarily just one. This means that there may not be a unique most specific common superclass.
- If some class C is a superclass of A and B, then every superclass of C is also a superclass of A and B.
A class D might be a superclass of C by multiple paths, which can cause difficulties if you're trying to compute path lengths. For example:
Computer Hardware
    Monitors
        Flatscreen Monitors
            Dell Flatscreen Monitors *
    Dell Hardware
        Dell Flatscreen Monitors *
In this hierarchy, Dell Flatscreen Monitors is a subclass of Computer Hardware by a path of length 2 (DFM → DH → CH) and by a path of length 3 (DFM → FM → M → CH). That's fine, but if you're computing a length from DFM to another subclass of CH, which of those should you use?
- There might not be any common superclasses in the data. This is also a perfectly legal situation. In OWL, every class is a subclass of owl:Thing, but that doesn't hold for RDF in general, and you probably won't even get that result from DBpedia because there's no OWL reasoner attached.
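To make the multiple-path and no-common-superclass points concrete, here is a small Python sketch over the toy hierarchy above. It is only an illustration under stated assumptions: the hierarchy is a plain dict mapping each class to its direct superclasses (nothing is queried from DBpedia), path_lengths and superclasses are hypothetical helper names, "Coffee Mugs" is a made-up unrelated class, and the recursion assumes the hierarchy has no cycles.

```python
# Toy hierarchy from the example above: each class maps to its direct superclasses.
# The class names are plain strings; nothing here is read from DBpedia.
HIERARCHY = {
    "Computer Hardware": [],
    "Monitors": ["Computer Hardware"],
    "Flatscreen Monitors": ["Monitors"],
    "Dell Hardware": ["Computer Hardware"],
    "Dell Flatscreen Monitors": ["Flatscreen Monitors", "Dell Hardware"],
}


def path_lengths(sub, sup, hierarchy=HIERARCHY):
    """Sorted lengths (in subClassOf edges) of every path from sub up to sup.

    Assumes the hierarchy is acyclic; otherwise the recursion would not terminate.
    """
    if sub == sup:
        return [0]
    lengths = []
    for parent in hierarchy.get(sub, []):
        lengths.extend(1 + n for n in path_lengths(parent, sup, hierarchy))
    return sorted(lengths)


def superclasses(cls, hierarchy=HIERARCHY):
    """All strict superclasses of cls, i.e. everything reachable via subClassOf+."""
    seen = set()
    stack = list(hierarchy.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(hierarchy.get(current, []))
    return seen


# Two different path lengths between the same pair of classes.
print(path_lengths("Dell Flatscreen Monitors", "Computer Hardware"))  # [2, 3]

# Common superclasses are an intersection, which may be empty.
# "Coffee Mugs" is a hypothetical class with no superclasses in the data.
print(superclasses("Dell Flatscreen Monitors") & superclasses("Monitors"))  # {'Computer Hardware'}
print(superclasses("Dell Flatscreen Monitors") & superclasses("Coffee Mugs"))  # set()
```

The "which length should you use?" question shows up directly as the two entries in [2, 3].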
Assuming that you can work out the details you need to address those issues, this isn't too hard. In my opinion, it's easiest to build up this query step by step. First, a query like the following gets the superclasses of a class and the length of the path to each of them. This presumes that there is a unique path from the subclass to the superclass. If there are multiple paths, the count is no longer a path length: per the SPARQL 1.1 spec, * and + property paths don't produce duplicate bindings, so count(?mid) is the number of distinct classes lying on any path from the subclass up to (but not including) the superclass, which is at least as large as the longest path. I'm not sure how you could get around this.
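prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

# count(?mid) is the path length only when ?sub reaches ?super by a single path
select ?sub ?super (count(?mid) as ?length) where {
  values ?sub { dbpedia-owl:Person }
  ?sub rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf+ ?super .
}
group by ?sub ?super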
sub                                  super                                  length
http://dbpedia.org/ontology/Person   http://dbpedia.org/ontology/Agent      1
http://dbpedia.org/ontology/Person   http://www.w3.org/2002/07/owl#Thing    2
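To see what count(?mid) actually measures, the sketch below emulates the query's two triple patterns in Python over hand-built hierarchies: a single chain shaped like the Person result above (assumed, not fetched from the endpoint) and the Dell example from earlier. It assumes SPARQL 1.1's rule that * and + paths yield distinct bindings; ancestors and count_mid are hypothetical helper names.

```python
# Emulates the query's two patterns over an in-memory hierarchy:
#   ?sub rdfs:subClassOf* ?mid .   ?mid rdfs:subClassOf+ ?super .
# In SPARQL 1.1, * and + paths produce distinct bindings, so count(?mid)
# is the size of a set of classes, not a count of paths.


def ancestors(cls, hierarchy, reflexive=False):
    """Classes reachable via subClassOf+ (or subClassOf* when reflexive=True)."""
    seen = {cls} if reflexive else set()
    stack = list(hierarchy.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(hierarchy.get(current, []))
    return seen


def count_mid(sub, sup, hierarchy):
    """Value of count(?mid) for one (?sub, ?super) group."""
    mids = {m for m in ancestors(sub, hierarchy, reflexive=True)
            if sup in ancestors(m, hierarchy)}
    return len(mids)


# A single chain, shaped like the Person result above (assumed, not fetched).
CHAIN = {"Person": ["Agent"], "Agent": ["owl:Thing"], "owl:Thing": []}
print(count_mid("Person", "Agent", CHAIN))      # 1
print(count_mid("Person", "owl:Thing", CHAIN))  # 2

# The Dell example: two paths, of lengths 2 and 3, up to Computer Hardware.
DELL = {
    "Computer Hardware": [],
    "Monitors": ["Computer Hardware"],
    "Flatscreen Monitors": ["Monitors"],
    "Dell Hardware": ["Computer Hardware"],
    "Dell Flatscreen Monitors": ["Flatscreen Monitors", "Dell Hardware"],
}
print(count_mid("Dell Flatscreen Monitors", "Computer Hardware", DELL))  # 4
```

On a single chain the count matches the path length, but for the Dell classes it is 4 (DFM, FM, M, DH): neither 2 nor 3, and not their sum either.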
Now the trick is to use this approach for both classes and then join the results on the superclasses they have in common, using a query like this:
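prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select *
{
  values (?a ?b) { (dbpedia-owl:Person dbpedia-owl:SportsTeam) }
  # superclasses of ?a, with the length to each
  { select ?a ?super (count(?mid) as ?aLength) {
      ?a rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?a ?super
  }
  # superclasses of ?b; the shared ?super variable joins the two subqueries
  { select ?b ?super (count(?mid) as ?bLength) {
      ?b rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?b ?super
  }
}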
That query still finds the path lengths for all the common superclasses, not just the most specific ones, and it doesn't yet add the length from ?a to ?super and the length from ?b to ?super to get the full path length. That's just a bit of arithmetic, though. You can order these results by the total length and then limit to just one result, so that you get the shortest one. As I pointed out, there might not be a unique most specific common superclass, but the result with the shortest length will be one of the most specific common superclasses.
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?a ?b ?super (?aLength + ?bLength as ?length)
{
  values (?a ?b) { (dbpedia-owl:Person dbpedia-owl:SportsTeam) }
  { select ?a ?super (count(?mid) as ?aLength) {
      ?a rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?a ?super
  }
  { select ?b ?super (count(?mid) as ?bLength) {
      ?b rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?b ?super
  }
}
# shortest total length first; keep only one row
order by ?length
limit 1
a        b            super   length
Person   SportsTeam   Agent   3
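The join, sum, and ordering can be emulated the same way. This Python sketch uses hypothetical helpers and a hand-built ontology fragment shaped like the DBpedia classes above (assumed, not fetched). It joins the two sets of superclasses on ?super and adds the two counts. Rather than keeping a single row as limit 1 does, it returns every superclass that ties for the minimum, which exposes the non-uniqueness mentioned at the start. X, Y, P, Q, Top and Unrelated are made-up class names.

```python
def ancestors(cls, hierarchy, reflexive=False):
    """Classes reachable via subClassOf+ (or subClassOf* when reflexive=True)."""
    seen = {cls} if reflexive else set()
    stack = list(hierarchy.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(hierarchy.get(current, []))
    return seen


def count_mid(sub, sup, hierarchy):
    """count(?mid): distinct ?mid with sub subClassOf* ?mid and ?mid subClassOf+ sup."""
    return len({m for m in ancestors(sub, hierarchy, reflexive=True)
                if sup in ancestors(m, hierarchy)})


def closest_common_superclasses(a, b, hierarchy):
    """Join on shared ?super, add ?aLength + ?bLength, keep every row with the minimum."""
    common = ancestors(a, hierarchy) & ancestors(b, hierarchy)
    rows = sorted((count_mid(a, s, hierarchy) + count_mid(b, s, hierarchy), s)
                  for s in common)
    if not rows:
        return []  # no common superclass at all
    best = rows[0][0]
    return [(s, n) for n, s in rows if n == best]


# Hypothetical fragment shaped like the DBpedia result above.
ONT = {
    "Person": ["Agent"],
    "SportsTeam": ["Organisation"],
    "Organisation": ["Agent"],
    "Agent": ["owl:Thing"],
    "owl:Thing": [],
}
print(closest_common_superclasses("Person", "SportsTeam", ONT))  # [('Agent', 3)]

# Hypothetical diamond: X and Y share two incomparable superclasses, P and Q.
DIAMOND = {"X": ["P", "Q"], "Y": ["P", "Q"], "P": ["Top"], "Q": ["Top"], "Top": []}
print(closest_common_superclasses("X", "Y", DIAMOND))  # [('P', 2), ('Q', 2)]
```

With limit 1 the diamond case would silently return only one of P or Q. Keeping all tied rows makes it visible that there is no unique most specific common superclass.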