`std::regex_match` fills the `std::smatch` object with a single match. That match holds all the capturing group submatches, and their number depends on how many groups there are in the pattern.
Here, you only get 1 match that contains two submatches: 1) the whole match, `matches[0]`, and 2) the capture group 1 value, `matches[1]`.
To obtain the contents of the capturing group, you need to access the second element of the `std::smatch` object: `matches[1].str()` or `matches.str(1)`.
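A minimal sketch of reading the submatches, assuming an input shaped like the one in the question. `capture_group_1` and `submatch_count` are hypothetical helper names used only for illustration; they are not part of the standard library.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by group 1 when the whole
// input matches the pattern, or an empty string otherwise.
std::string capture_group_1(const std::string& input, const std::regex& re) {
    std::smatch matches;  // match_results over std::string iterators
    if (std::regex_match(input, matches, re)) {
        // matches[0] is the whole match, matches[1] is capturing group 1.
        // matches.str(1) would return the same text as matches[1].str().
        return matches[1].str();
    }
    return "";
}

// Hypothetical helper: number of submatches regex_match produced, i.e.
// 1 (the whole match) + the number of capturing groups, or 0 on failure.
std::size_t submatch_count(const std::string& input, const std::regex& re) {
    std::smatch matches;
    std::regex_match(input, matches, re);
    return matches.size();
}
```

With the pattern `"<tag1.*>(.*)</tag1>"` and the input `"<tag1>hello</tag1>"`, `capture_group_1` should give `"hello"` and `submatch_count` should give 2.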
Note that when you write `"<tag1.*>(.*)</tag1>\1"` in an ordinary C++ string literal, the `\1` is not parsed as a backreference, but as a char with octal code 1. Even if you defined a backreference (as `"<tag1.*>(.*)</tag1>\\1"`), you would require the whole text captured with capturing group 1 to be repeated after `</tag1>`, which is definitely not what you want.

Actually, I doubt this regex is any good. At the very least, you need to replace `.*` with `[\s\S]*?` (written `"[\\s\\S]*?"` in an ordinary string literal) so that the pattern also matches across line breaks and stops at the first closing tag. Even so, parsing HTML with regex is a fragile approach.
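A rough sketch of both points above, assuming ordinary `std::regex` (default ECMAScript grammar). It shows that `"\1"` is a single control character, that `"\\1"` and the raw literal `R"(\1)"` spell a real backreference, and what the `[\s\S]*?` variant extracts. The names `octal_escape`, `escaped_backref`, `raw_backref`, `matches_with_backref` and `tag1_contents` are hypothetical. `tag1_contents` also swaps `std::regex_match` for `std::sregex_iterator` (search semantics) so it can collect several elements. This change is an assumption, not part of the original question.

```cpp
#include <regex>
#include <string>
#include <vector>

// In an ordinary string literal, "\1" is an octal escape: one char with code 1,
// so the regex engine never sees a backslash.
const std::string octal_escape = "\1";

// A real backreference needs the backslash to reach std::regex: escape it
// ("\\1") or use a raw string literal (R"(\1)"). Both spell the same pattern.
const std::string escaped_backref = "<tag1.*>(.*)</tag1>\\1";
const std::string raw_backref = R"(<tag1.*>(.*)</tag1>\1)";

// With the backreference, regex_match only succeeds if the text captured by
// group 1 is repeated right after </tag1>.
bool matches_with_backref(const std::string& input) {
    static const std::regex re(raw_backref);
    return std::regex_match(input, re);
}

// Hypothetical helper: collects the contents of every <tag1 ...>...</tag1>
// element, with each .* replaced by [\s\S]*?. [\s\S] also matches line breaks,
// and the lazy *? stops at the first </tag1> instead of the last one.
// Nested or malformed tags will still break it.
std::vector<std::string> tag1_contents(const std::string& html) {
    static const std::regex re(R"(<tag1[\s\S]*?>([\s\S]*?)</tag1>)");
    std::vector<std::string> result;
    for (std::sregex_iterator it(html.begin(), html.end(), re), end; it != end; ++it) {
        result.push_back(it->str(1));  // capture group 1 of this match
    }
    return result;
}
```

For the hypothetical input `"<tag1 id=\"a\">one\ntwo</tag1> <tag1>three</tag1>"`, `tag1_contents` should return `"one\ntwo"` and `"three"`. `matches_with_backref("<tag1>ab</tag1>")` is false because `ab` is not repeated after the closing tag.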