Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.5k views
in Technique[技术] by (71.8m points)

html - Regular expression for nested tags (innermost to make it easier)

I researched this quite a bit, but couldn't find a working example how to match nested html tags with attributes. I know it is possible to match balanced/nested innermost tags without attributes (for example a regex for and would be #<div[^>]*>(?:(?> [^<]+ ) |<(?!div[^>]*>))*?</div>#x).

However, I would like to see a regex pattern that finds an html tag pair with attributes.

Example: It basically should match

<div class="aaa"> **<div class="aaa">** <div> <div> </div> **</div>** </div>

and not

<div class="aaa"> **<div class="aaa">** <div> <div> **</div>** </div> </div>

Anybody has some ideas?

For testing purposes we could use: http://www.lumadis.be/regex/test_regex.php


PS. Steven mentioned a solution in his blog (actually in a comment), but it doesn't work

http://blog.stevenlevithan.com/archives/match-innermost-html-element

$regex = '/<div[^>]+?ids*=s*"MyID"[^>]*>(?:((?:[^<]++|<(?!/?div[^>]*>))+)|(<div[^>]*>(?>(?1)|(?2))*</div>))?</div>/i';
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

RegEx match open tags except XHTML self-contained tags

And indeed, it is absolutely impossible. HTML has something unique, something magical, which is immune to RegEx.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...