Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

ruby - How do I validate XHTML with nokogiri?

I've found a few posts alluding to the fact that you can validate XHTML against its DTD using the nokogiri gem. Whilst I've managed to use it to parse XHTML successfully (looking for 'a' tags etc.), I'm struggling to validate documents.

For me, this:

require 'nokogiri'
require 'net/http'
doc = Nokogiri::XML(Net::HTTP.get(URI.parse("http://www.w3.org")))
puts doc.validate

results in a whole heap of:

[
#<Nokogiri::XML::SyntaxError: No declaration for element html>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute xmlns of element html>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute lang of element html>,  
#<Nokogiri::XML::SyntaxError: No declaration for attribute lang of element html>,
#<Nokogiri::XML::SyntaxError: No declaration for element head>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute profile of element head>,
[repeat for every tag in the document.]
]

So I'm assuming that's not the right approach. I can't find any good examples; can anyone suggest what I'm doing wrong?

I'm running Ruby 1.8.6 on Mac OS X 10.5.8. Nokogiri reports:

nokogiri: 1.3.3
warnings: []

libxml: 
  compiled: 2.6.23
  loaded: 2.6.23
  binding: extension

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…

1 Reply

It's not just you. What you're doing is supposed to be the right way to do it, but I've never had any luck with it. As far as I can tell, there's a disconnect somewhere between Nokogiri and libxml that keeps it from loading SYSTEM DTDs or recognizing PUBLIC DTDs. It does work if you define the DTD within the XML file itself, but good luck doing that with the XHTML DTDs.

The best thing I can recommend is to use the schemas for XHTML instead:

require 'nokogiri'
require 'open-uri'

# URI.open needs Ruby 2.5+; on older Rubies (such as 1.8.6) call Kernel#open instead.
doc = Nokogiri::XML(URI.open('http://www.w3.org'))
xsd = Nokogiri::XML::Schema(URI.open('http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd'))

# true/false validation
xsd.valid?(doc)    # => true

# list of validation errors (empty when the document is valid)
xsd.validate(doc)  # => []
