Python has a native HTML parser; however, the Tidy wrapper Nick suggested would probably be a solid choice as well. Tidy is a very common library (written in C, isn't it?).
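
For what it's worth, here is a minimal sketch of the built-in route, assuming Python 3 and the standard library's `html.parser` module. The names `LinkCollector` and `extract_links` are made up for illustration, and pulling out link targets is just an example task, not something from the original question.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href of every <a> tag found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `extract_links(page_source)` on a string of markup returns the link targets in document order. The built-in parser is fairly tolerant of sloppy markup but does not repair it, so if the input needs cleaning up first, Tidy is probably the better fit.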