xml - Recursive call to a duplicated Bash script, making it unable to access the assets

posted Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

Edit: This post is now addressed in a new question, as the problem has to be presented slightly differently. It's here: How can I efficiently run XSLT transformations for a large number of files in parallel?

I'm stuck in my attempts to parallelize a process, and after spending a fair amount of time on it I'd like to ask for some help.

Basically, I have a lot of XML files to transform with a specific XSLT stylesheet. But the stylesheet calls a (very slow) API to fetch additional data, and processing the whole batch of XML files in one go would take a (very) long time.

Therefore I split all the files from the original "input" folder into subfolders, each containing around 5000 XML files, and I copied the following Bash script into each subfolder too:

for f in *.xml
do
  java -jar ../../saxon9he.jar -xsl:../../some-xslt-sheet.xsl -s:"$f"
done

And I call the script for each folder from the "root" folder, which contains the "input" folder, the Saxon library and the XSLT stylesheet:

find input -type d -exec sh {}/script.sh \;

But I get this error:

Unable to access jarfile ../../saxon9he.jar

I suppose it comes from the fact that I'm operating from the "root" folder, while the scripts being called are lower in the directory tree. I could solve the problem (if I'm correct) by copying all the assets into each subfolder, but I find that this would make my current approach even clumsier.

Thanks to anyone who might have an idea and can help me understand this!

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:13:22+0000

Firstly, you really don't want to initialize a new Java VM to run each transformation: this is typically going to take much longer than running the actual transformation. To put this in perspective, for "typical" transformations you will often see Java initialization time 3 seconds, stylesheet compilation time 300ms, transformation time 10ms. So if you can find a way to do it that only initializes Java and compiles the stylesheet once, your total time for 10K documents is going to be 2 minutes rather than 10 hours.
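The orders of magnitude above can be checked with a few lines of shell arithmetic. This is only a sketch: the timings are the round figures quoted in this answer, and the variable names are made up for illustration.

```shell
# Timings from the figures quoted above, in milliseconds (illustrative values).
jvm_start_ms=3000
compile_ms=300
transform_ms=10
docs=10000

# One JVM start and one stylesheet compilation per document.
per_file_total_s=$(( docs * (jvm_start_ms + compile_ms + transform_ms) / 1000 ))

# One JVM start and one compilation for the whole batch.
single_jvm_total_s=$(( (jvm_start_ms + compile_ms + docs * transform_ms) / 1000 ))

echo "one JVM per file:  ${per_file_total_s} s"
echo "one JVM in total:  ${single_jvm_total_s} s"
```

With these inputs the first total is 33100 seconds (a little over nine hours) and the second is 103 seconds, which matches the "hours versus minutes" gap described above.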

There are various ways to achieve this, but they all involve using something other than a shell script to control the process. The simplest, in my view, is to control it from XSLT itself, by using the collection() function to access all the files in the directory. This has an added bonus, if you're using Saxon-EE, that the files will be processed (parsed) in parallel using all the cores on your machine, which can speed things up by another factor of 4 or so. You just need to add an entry point to the stylesheet, something like:

<xsl:template name="main">
  <xsl:for-each select="collection('file:///my/dir?select=*.xml;recurse=yes')!saxon:discard-document(.)">
    <xsl:result-document href="....">
      <xsl:apply-templates/>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template>

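The saxon:discard-document call is optional, but because it makes documents eligible for garbage collection, it means that you are less likely to run out of memory. If you use it, the saxon prefix has to be bound to the namespace http://saxon.sf.net/ in the stylesheet.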
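A stylesheet with a named entry point like this is started without a source document, by naming the initial template on the command line. The following is a minimal sketch, assuming the jar and stylesheet names from the question and Saxon's -it option; the run_batch function and the DRY_RUN switch are hypothetical helpers added for illustration.

```shell
# Start a single JVM that runs the "main" template of the stylesheet.
# $1 = path to the Saxon jar, $2 = path to the stylesheet.
# DRY_RUN is a hypothetical switch: when set to 1, the command is printed
# instead of executed, which is handy for checking paths first.
run_batch() {
  set -- java -jar "$1" -xsl:"$2" -it:main
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage from the "root" folder of the question:
# run_batch saxon9he.jar some-xslt-sheet.xsl
```

Because the file list comes from collection() inside the stylesheet, there is no -s option here and no per-file loop in the shell.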

Another approach to writing the control loop is to use a specialized shell such as xmlsh.
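On the original error itself: a relative path such as ../../saxon9he.jar is resolved against the directory the command is run from, not against the location of script.sh, so calling the scripts from the "root" folder makes the path point outside it. A minimal sketch of one way around this, assuming the folder layout from the question, is to change into each subfolder before running its script. The function name run_in_each_subdir is hypothetical, and this only addresses the path error, not the JVM start-up cost discussed above.

```shell
# Run a script from inside each immediate subfolder of a directory, in parallel.
# $1 = parent folder (e.g. input), $2 = script file name present in each subfolder.
run_in_each_subdir() {
  for dir in "$1"/*/; do
    [ -f "$dir$2" ] || continue   # skip folders that do not contain the script
    # Subshell: the cd is local to this job, so ../../ resolves from the subfolder.
    ( cd "$dir" && sh "./$2" ) &
  done
  wait                            # block until every background job has finished
}

# Usage from the "root" folder of the question:
# run_in_each_subdir input script.sh
```

Running the jobs in the background keeps the per-folder parallelism the question was aiming for; dropping the & would run the folders one after another instead.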
