The cause:
parseBodyFragment()
as well as all other parse()
-methods use a HTML parser by default. And those add always the HTML-Shell (<html>…</html>
, <head>…</head>
etc.).
The Solution:
Just don't use a HTML-parser, use a XML-parser instead ;-)
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
Replace that single line and your problem is solved.
Example:
final String html = "<p><b>This <i>is</i></b> <i>my sentence</i> of text.</p>";
Document docHtml = Jsoup.parse(html);
Document docXml = Jsoup.parse(html, "", Parser.xmlParser());
System.out.println("******* HTML *******
" + docHtml);
System.out.println();
System.out.println("******* XML *******
" + docXml);
Output:
******* HTML *******
<html>
<head></head>
<body>
<p><b>This <i>is</i></b> <i>my sentence</i> of text.</p>
</body>
</html>
******* XML *******
<p><b>This <i>is</i></b> <i>my sentence</i> of text.</p>
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…