I'm struggling to get the correct combination of an XPath expression and the namespace specification required by package XML (argument namespaces) for an XML document that has an explicit xmlns namespace defined on the top element.
UPDATE
Thanks to har07 I was able to put it together:
Once you query the namespaces, the first entry of ns has no name yet, and that's the problem:
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))
> ns
                                             omegahat                          r
    "http://something.org"  "http://www.omegahat.org" "http://www.r-project.org"
So we'll just assign a name that serves as a prefix (this can be any valid R name):
names(ns)[1] <- "xmlns"
Now all we have to do is use that default namespace prefix everywhere in our XPath expressions:
getNodeSet(doc, "/xmlns:doc//xmlns:b[@omegahat:status='foo']", ns)
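For a self-contained check of the steps above, here is a minimal sketch. It assumes the XML package is installed, uses a shortened inline copy of the example document, and picks the prefix d (an arbitrary choice for illustration; xmlns as used above serves the same purpose):

```r
library(XML)

# Shortened inline copy of the example document (default namespace on <doc>)
xml_text <- '<?xml version="1.0"?>
<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">
  <a>
    <b><c><b/></c></b>
    <b omegahat:status="foo"><r:d><a status="xyz"/></r:d></b>
  </a>
</doc>'
doc <- xmlParse(xml_text, asText = TRUE)

# Collect prefix -> URI pairs declared on the root element
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))

# Give the unnamed default namespace an arbitrary prefix ("d" is an assumption)
names(ns)[is.na(names(ns)) | names(ns) == ""] <- "d"

# Every element in the default namespace needs that prefix in the XPath
hits <- getNodeSet(doc, "/d:doc//d:b[@omegahat:status='foo']", namespaces = ns)
length(hits)
```

The prefix only has to agree between the ns vector and the XPath expression; it does not need to appear in the document itself.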
Those interested in alternative solutions based on name() and namespace-uri() (amongst others) might find this post helpful.
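As a rough illustration of that kind of alternative, and assuming local-name() is an acceptable stand-in for the name()/namespace-uri() variants mentioned, the element test can be moved into a predicate so that no prefix is needed for the default namespace. The inline document is a shortened copy of the example; only the omegahat prefix has to be declared:

```r
library(XML)

# Shortened inline copy of the example document (default namespace on <doc>)
xml_text <- '<?xml version="1.0"?>
<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org">
  <a>
    <b><c><b/></c></b>
    <b omegahat:status="foo"/>
  </a>
</doc>'
doc <- xmlParse(xml_text, asText = TRUE)

# Match any element whose local name is "b", regardless of its namespace,
# then filter on the namespaced attribute
by_local_name <- getNodeSet(
  doc,
  "//*[local-name()='b'][@omegahat:status='foo']",
  namespaces = c(omegahat = "http://www.omegahat.org")
)
length(by_local_name)
```

The trade-off is that local-name() ignores the element's namespace entirely, so it would also match a b element from some other namespace if the document contained one.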
Just for the sake of reference: this was the trial-and-error code before we came to the solution:
Consider the example from ?xmlParse:
require("XML")
doc <- xmlParse(system.file("exampleData", "tagnames.xml", package = "XML"))
> doc
<?xml version="1.0"?>
<doc>
<!-- A comment -->
<a xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">
<b>
<c>
<b/>
</c>
</b>
<b omegahat:status="foo">
<r:d>
<a status="xyz"/>
<a/>
<a status="1"/>
</r:d>
</b>
</a>
</doc>
nsDefs <- xmlNamespaceDefinitions(getNodeSet(doc, "/doc/a")[[1]])
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))
getNodeSet(doc, "/doc//b[@omegahat:status='foo']", ns)[[1]]
In my document, however, the namespaces are already defined in the <doc> tag, so I adapted the example XML code accordingly:
# Inner double quotes must be escaped inside R string literals
xml_source <- c(
"<?xml version=\"1.0\"?>",
"<doc xmlns:omegahat=\"http://www.omegahat.org\" xmlns:r=\"http://www.r-project.org\">",
"<!-- A comment -->",
"<a>",
"<b>",
"<c>",
"<b/>",
"</c>",
"</b>",
"<b omegahat:status=\"foo\">",
"<r:d>",
"<a status=\"xyz\"/>",
"<a/>",
"<a status=\"1\"/>",
"</r:d>",
"</b>",
"</a>",
"</doc>"
)
write(xml_source, file="exampleData_2.xml")
doc <- xmlParse("exampleData_2.xml")
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))
getNodeSet(doc, "/doc", namespaces = ns)
getNodeSet(doc, "/doc//b[@omegahat:status='foo']", namespaces = ns)[[1]]
Everything still works fine. However, my XML code additionally has an explicit definition of the default namespace (xmlns):
# Same document, now with a default namespace declared on <doc>
xml_source <- c(
"<?xml version=\"1.0\"?>",
"<doc xmlns=\"http://something.org\" xmlns:omegahat=\"http://www.omegahat.org\" xmlns:r=\"http://www.r-project.org\">",
"<!-- A comment -->",
"<a>",
"<b>",
"<c>",
"<b/>",
"</c>",
"</b>",
"<b omegahat:status=\"foo\">",
"<r:d>",
"<a status=\"xyz\"/>",
"<a/>",
"<a status=\"1\"/>",
"</r:d>",
"</b>",
"</a>",
"</doc>"
)
write(xml_source, file="exampleData_3.xml")
doc <- xmlParse("exampleData_3.xml")
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))
What used to work fails now:
> getNodeSet(doc, "/doc", namespaces = ns)
list()
attr(,"class")
[1] "XMLNodeSet"
Warning message:
using http://something.org as prefix for default namespace http://something.org
> getNodeSet(doc, "/xmlns:doc", namespaces = ns)
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces, :
error evaluating xpath expression /xmlns:doc
In addition: Warning message:
using http://something.org as prefix for default namespace http://something.org
getNodeSet(doc, "/xmlns:doc",
namespaces = matchNamespaces(doc, namespaces="xmlns", nsDefs = nsDefs)
)
This seems to get me closer:
> getNodeSet(doc, "/xmlns:doc",
+ namespaces = matchNamespaces(doc, namespaces="xmlns", nsDefs = nsDefs)
+ )[[1]]
<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">
<!-- A comment -->
<a>
<b>
<c>
<b/>
</c>
</b>
<b omegahat:status="foo">
<r:d>
<a status="xyz"/>
<a/>
<a status="1"/>
</r:d>
</b>
</a>
</doc>
attr(,"class")
[1] "XMLNodeSet"
Yet now I don't know how to proceed in order to get to the child nodes:
> getNodeSet(doc, "/xmlns:doc//b[@omegahat:status='foo']", ns)[[1]]
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces, :
error evaluating xpath expression /xmlns:doc//b[@omegahat:status='foo']
In addition: Warning message:
using http://something.org as prefix for default namespace http://something.org
> getNodeSet(doc, "/xmlns:doc//b[@omegahat:status='foo']",
+ namespaces = c(
+ matchNamespaces(doc, namespaces="xmlns", nsDefs = nsDefs),
+ matchNamespaces(doc, namespaces="omegahat", nsDefs = nsDefs)
+ )
+ )
list()
attr(,"class")
[1] "XMLNodeSet"
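The empty result is consistent with XPath 1.0, where an unprefixed name test such as b only matches elements that are in no namespace; it never picks up the document's default namespace. A small sketch of the contrast, assuming the XML package is installed and using a hypothetical prefix d on a shortened inline copy of the document:

```r
library(XML)

# Shortened inline copy of the example document (default namespace on <doc>)
xml_text <- '<?xml version="1.0"?>
<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org">
  <a>
    <b><c><b/></c></b>
    <b omegahat:status="foo"/>
  </a>
</doc>'
doc <- xmlParse(xml_text, asText = TRUE)

# Explicit prefix -> URI mapping; "d" is an arbitrary, hypothetical prefix
ns <- c(d = "http://something.org", omegahat = "http://www.omegahat.org")

# Unprefixed step "b": looks for <b> in no namespace, so nothing matches
unprefixed <- getNodeSet(doc, "/d:doc//b[@omegahat:status='foo']", namespaces = ns)

# Prefixed step "d:b": looks for <b> in the default namespace of the document
prefixed <- getNodeSet(doc, "/d:doc//d:b[@omegahat:status='foo']", namespaces = ns)

c(unprefixed = length(unprefixed), prefixed = length(prefixed))
```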