I researched this quite a bit, but couldn't find a working example of how to match nested HTML tags with attributes. I know it is possible to match balanced/nested innermost tags without attributes (for example, a regex for <div> and </div> would be #<div[^>]*>(?:(?> [^<]+ ) |<(?!div[^>]*>))*?</div>#x).
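To make the innermost-match idea concrete, here is a minimal sketch in Python rather than PCRE. It is an adaptation, not the exact pattern above: the atomic group is replaced by a single-character alternation so it runs on any Python 3 `re`, and `\b` is added after `div`. The `sample` string is made up for illustration.

```python
import re

# Opening <div ...>, then the shortest run of text that contains no
# further opening <div ...> tag, up to the first closing </div>.
INNERMOST_DIV = re.compile(
    r"<div\b[^>]*>"            # opening tag, attributes allowed
    r"(?:[^<]|<(?!div\b))*?"   # any char, or a '<' that does not start a new <div
    r"</div>",                 # first closing tag after that
    re.IGNORECASE,
)

# Hypothetical input: an outer div wrapping one innermost div.
sample = '<div class="aaa"><div id="x">inner <b>text</b></div></div>'

match = INNERMOST_DIV.search(sample)
innermost = match.group(0) if match else None
print(innermost)  # <div id="x">inner <b>text</b></div>
```

The outer `<div class="aaa">` cannot match because the negative lookahead refuses to step over the nested opening tag, so the search moves on to the inner element.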
However, I would like to see a regex pattern that finds an HTML tag pair with attributes, together with its correctly balanced closing tag.
Example: It basically should match
<div class="aaa"> **<div class="aaa">** <div> <div> </div> **</div>** </div>
and not
<div class="aaa"> **<div class="aaa">** <div> <div> **</div>** </div> </div>
Does anybody have any ideas?
For testing purposes we could use: http://www.lumadis.be/regex/test_regex.php
PS. Steven mentioned a solution on his blog (actually in a comment), but it doesn't work for me:
http://blog.stevenlevithan.com/archives/match-innermost-html-element
$regex = '/<div[^>]+?id\s*=\s*"MyID"[^>]*>(?:((?:[^<]++|<(?!\/?div[^>]*>))+)|(<div[^>]*>(?>(?1)|(?2))*<\/div>))?<\/div>/i';
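For comparison, the balancing that the recursive `(?1)`/`(?2)` calls are meant to perform can be sketched without recursion. Python's built-in `re` has no recursive subpatterns, so this sketch swaps in explicit depth counting over div tags. It is not a fix for the pattern above; `find_balanced_div`, `MYID_OPENER`, and the sample markup are hypothetical names and data, and it assumes div tags do not appear inside comments, scripts, or attribute values.

```python
import re

# Matches any opening or closing div tag; group 1 is "/" for a closing tag.
DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)

def find_balanced_div(html, opener_pattern):
    """Return the text from the first opening tag matching opener_pattern
    through its balancing </div>, or None if there is no balanced match."""
    start = re.search(opener_pattern, html, re.IGNORECASE)
    if start is None:
        return None
    depth = 0
    # Scan div tags from the opener onward; the opener itself raises depth to 1.
    for tag in DIV_TAG.finditer(html, start.start()):
        depth += -1 if tag.group(1) else 1
        if depth == 0:
            return html[start.start():tag.end()]
    return None  # ran out of tags before the opener was closed

# Assumed opener: a div whose id attribute is "MyID".
MYID_OPENER = r'<div\b[^>]*\bid\s*=\s*"MyID"[^>]*>'

html = '<div class="aaa"><div id="MyID"><div><div>x</div></div></div></div>'
result = find_balanced_div(html, MYID_OPENER)
print(result)  # <div id="MyID"><div><div>x</div></div></div>
```

The attribute test lives only in the opener pattern, while the counting loop treats every div the same, which is roughly the split the recursive pattern tries to express in one expression.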