I noticed something strange while using Nokogiri recently. All of the HTML I had been parsing came back wrapped in opening and closing <html> and <body> tags, preceded by a DOCTYPE:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
How can I prevent Nokogiri from doing this?
That is, when I do:
doc = Nokogiri::HTML("<div>some content</div>")
doc.to_s
or:
doc.to_html
I get the wrapped markup rather than the original:
<html blah><body><div>some content</div></body></html>