I have debugged and profiled my test case and Xalan/JAXP in general. I managed to identify the major problem in
org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName()
Profiling shows that every one of the 10k test XPath evaluations led to the classloader trying to look up the DTMManager
instance in some sort of default configuration. This configuration is not cached in memory but accessed every time. Furthermore, this access seems to be protected by a lock on ObjectFactory.class
itself. When the access fails (which it does by default), the configuration is loaded from the xalan.jar file's
META-INF/services/org.apache.xml.dtm.DTMManager
configuration file. Every time!
Fortunately, this behaviour can be overridden by specifying a JVM parameter like this:
-Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault
or
-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault
The above works because it lets you bypass the expensive work in lookUpFactoryClassName()
if the factory class name is the default anyway:
// Code from com.sun.org.apache.xml.internal.dtm.ObjectFactory
static String lookUpFactoryClassName(String factoryId,
                                     String propertiesFilename,
                                     String fallbackClassName) {
    SecuritySupport ss = SecuritySupport.getInstance();
    try {
        String systemProp = ss.getSystemProperty(factoryId);
        if (systemProp != null) {
            // Return early from the method
            return systemProp;
        }
    } catch (SecurityException se) {
    }
    // [...] "Heavy" operations later
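If you cannot change the JVM start-up parameters, the same system property can probably be set from code instead. The following is only a minimal sketch: the class and method names (DtmManagerProperty, ensureDtmManagerProperty) are made up for illustration, and it assumes that the call happens before the first XPath evaluation and that you use Apache Xalan (for the JDK-internal copy, swap in the com.sun.org.apache.xml.internal names shown above).

```java
public class DtmManagerProperty {

    // Property name and value are taken from the JVM parameter shown above
    // (Apache Xalan variant). Everything else in this class is hypothetical.
    static final String FACTORY_ID = "org.apache.xml.dtm.DTMManager";
    static final String DEFAULT_IMPL = "org.apache.xml.dtm.ref.DTMManagerDefault";

    /**
     * Sets the system property only if it has not been configured already
     * (for example via -D on the command line) and returns the effective value.
     */
    static String ensureDtmManagerProperty() {
        String current = System.getProperty(FACTORY_ID);
        if (current == null) {
            System.setProperty(FACTORY_ID, DEFAULT_IMPL);
            current = DEFAULT_IMPL;
        }
        return current;
    }

    public static void main(String[] args) {
        // Call this once, early in application start-up.
        System.out.println(FACTORY_ID + "=" + ensureDtmManagerProperty());
    }
}
```

I have not benchmarked this variant separately. It should behave like the -D parameter only because lookUpFactoryClassName() reads the system property on every call, as the excerpt above shows.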
So here's a performance improvement overview for 10k consecutive XPath evaluations of //SomeNodeName
against a 90k XML file (measured with System.nanoTime()):
measured library : Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen 1.1.3
--------------------------------------------------------------------------------
without optimisation : 10400ms | 4717ms | | 25500ms
reusing XPathFactory : 5995ms | 2829ms | |
reusing XPath : 5900ms | 2890ms | |
reusing XPathExpression : 5800ms | 2915ms | 16000ms | 25000ms
adding the JVM param : 1163ms | 761ms | n/a |
Note that the benchmark was a very primitive one. It may well be that your own benchmark will show that Saxon outperforms Xalan.
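For reference, the "without optimisation" and "reusing XPathExpression" rows correspond roughly to the two loops below. This is not my original benchmark code, just a simplified sketch using the standard JAXP API: the class name, the method names and the tiny in-memory document are placeholders, and the absolute timings will differ from the table above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathReuseBenchmark {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** "Without optimisation": a new factory and a new XPath for every evaluation. */
    static int evaluateNaively(Document doc, String expr, int runs) throws Exception {
        int matches = 0;
        for (int i = 0; i < runs; i++) {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            matches = nodes.getLength();
        }
        return matches;
    }

    /** "Reusing XPathExpression": compile once, evaluate many times. */
    static int evaluateReusingExpression(Document doc, String expr, int runs) throws Exception {
        XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expr);
        int matches = 0;
        for (int i = 0; i < runs; i++) {
            NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
            matches = nodes.getLength();
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder document; the measurements above used a 90k XML file.
        Document doc = parse("<root><SomeNodeName/><other><SomeNodeName/></other></root>");
        int runs = 10_000;

        long t0 = System.nanoTime();
        evaluateNaively(doc, "//SomeNodeName", runs);
        long t1 = System.nanoTime();
        evaluateReusingExpression(doc, "//SomeNodeName", runs);
        long t2 = System.nanoTime();

        System.out.printf("without optimisation    : %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("reusing XPathExpression : %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```

Keep in mind that a compiled XPathExpression is not thread-safe, so reuse it per thread rather than sharing one instance globally.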
I have filed this as a bug with the Xalan guys at Apache:
https://issues.apache.org/jira/browse/XALANJ-2540