Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


How to get only a given captured group with <regex> in C++

I want to extract a tag's inner content. From the following string:

<tag1 val=123>Hello</tag1>

I just want to get

Hello

What I do:

string s = "<tag1 val=123>Hello</tag1>";
regex re("<tag1.*>(.*)</tag1>");
smatch matches;
bool b = regex_match(s, matches, re);

But it returns two matches:

<tag1 val=123>Hello</tag1>
Hello

And when I try to get only the 1st captured group like this:

"<tag1.*>(.*)</tag1>\1"

I get zero matches.

Please advise.

question from:https://stackoverflow.com/questions/66053236/c-regex-error-message-when-running


1 Reply


regex_match returns only a single match, which carries all the capturing-group submatches (how many there are depends on the number of groups in the pattern).

Here you get one match that contains two submatches: 1) the whole match, 2) the value of capture group 1.

To obtain the contents of the capturing group, access the second element of the smatch object: matches[1].str() or matches.str(1).

Note that when you write "<tag1.*>(.*)</tag1>\1" in an ordinary C++ string literal, the \1 is not parsed as a backreference: the compiler turns it into a char with octal code 1 before the regex engine ever sees it. Even if you defined a real backreference (as "<tag1.*>(.*)</tag1>\\1"), the pattern would require the whole text captured by group 1 to be repeated after </tag1>, which is definitely not what you want. I also doubt this regex is any good as written: at the very least, replace each .* with [\s\S]*? (written "[\\s\\S]*?" in an ordinary string literal) so the match is lazy and can span line breaks. Even then, parsing HTML with regex remains a fragile approach.

