It's not trivial. In the context of the nodes you selected (the td), to get everything between two elements, you need to perform an intersection of these two sets:
- Set A: All the nodes preceding the first h3: //h3[1]/preceding::node()
- Set B: All the nodes following the first h2: //h2[1]/following::node()
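XPath 1.0 has no intersection operator, so to perform an intersection you can use the Kaysian method (named after Michael Kay, who proposed it). The basic formula is:
A[count(.|B) = count(B)]
A node from A is kept only if adding it to B leaves the size of B unchanged, which can only happen when the node is already a member of B.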
Applying it to your sets, as defined above, where A = //h3[1]/preceding::node() and B = //h2[1]/following::node(), we have:
//h3[1]/preceding::node()[ count( . | //h2[1]/following::node()) = count(//h2[1]/following::node()) ]
This selects all elements and text nodes, starting with the first <br> after the </h2> tag and ending with the whitespace text node after the last <br>, just before the next <h3> tag.
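If the count() trick looks opaque, its filtering logic can be sketched outside XPath with plain Python sets. This is only an illustration of the idea, not a way to run the query: the node labels below are hypothetical stand-ins for the contents of your td, and the flat list ignores nesting (real preceding:: and following:: axes also exclude ancestors and descendants).

```python
# Hypothetical flat, document-ordered list of node labels (an assumption,
# standing in for the children of the td in the question).
nodes = ["h2", "br#1", "text#1", "br#2", "text#2", "h3", "text#3"]


def preceding(label):
    """Stand-in for label/preceding::node(): everything before `label`."""
    return set(nodes[:nodes.index(label)])


def following(label):
    """Stand-in for label/following::node(): everything after `label`."""
    return set(nodes[nodes.index(label) + 1:])


def kaysian_intersection(a, b):
    """Mirror A[count(.|B) = count(B)]: keep each node of `a` whose
    union with `b` is no larger than `b` itself."""
    return {n for n in a if len({n} | b) == len(b)}


A = preceding("h3")   # Set A: nodes before the h3
B = following("h2")   # Set B: nodes after the h2

# Restore document order for display, since Python sets are unordered.
between = sorted(kaysian_intersection(A, B), key=nodes.index)
print(between)  # ['br#1', 'text#1', 'br#2', 'text#2']
```

The result is the same as an ordinary set intersection (A & B in Python); the count-based test is only needed because XPath 1.0 offers union (|) but no intersection operator.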
You can easily select just the text nodes between h2 and h3 by replacing node() with text() in the expression. This one will return all text nodes (including whitespace and line breaks) between the two headers:
//h3[1]/preceding::text()[ count( . | //h2[1]/following::text()) = count(//h2[1]/following::text()) ]