It seems like every question on stackoverflow where the asker is using regex to grab some information from HTML will inevitably have an "answer" that says not to use regex to parse HTML.
Why not? I'm aware that there are quote-unquote "real" HTML parsers out there like Beautiful Soup, and I'm sure they're powerful and useful, but if you're just doing something simple, quick, or dirty, then why bother using something so complicated when a few regex statements will work just fine?
Moreover, is there just something fundamental that I don't understand about regex that makes them a bad choice for parsing in general?
Question&Answers:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…