Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

How do I make part of a regex match optional?

This is an example string:

123456#p654321

Currently, I am using this pattern to capture 123456 and 654321 into two different groups:

([0-9].*)#p([0-9].*)

Occasionally, though, the #p654321 part of the string will be missing, and then I only want to capture the first group. I tried to make the second group "optional" by appending ? to it, which works, but only as long as there is still a #p at the end of the remaining string.
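
To make the behaviour described above concrete, here is a minimal Python sketch. Python's re module is an assumption, since the question does not name a regex engine. The helper name captured and the ATTEMPT pattern are illustrative only: the question does not show exactly where the ? was appended, so the second pattern is just one plausible reading of that attempt.

```python
import re

# The pattern from the question.
ORIGINAL = re.compile(r"([0-9].*)#p([0-9].*)")
# Assumed form of the attempted fix: "?" appended to the second group only.
ATTEMPT = re.compile(r"([0-9].*)#p([0-9].*)?")


def captured(pattern, text):
    """Return the tuple of captured groups, or None if the whole text does not match."""
    m = pattern.fullmatch(text)
    return m.groups() if m else None


print(captured(ORIGINAL, "123456#p654321"))  # ('123456', '654321')
print(captured(ORIGINAL, "123456"))          # None: "#p" is still required
print(captured(ATTEMPT, "123456#p"))         # ('123456', None)
print(captured(ATTEMPT, "123456"))           # None: "#p" is still required
```

In both patterns the literal #p is mandatory, which is why a bare 123456 never matches.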

What would be the best way to solve this problem?

question from:https://stackoverflow.com/questions/66066650/r-using-tidyrs-extract-and-regex-to-extract-values-from-structured-character


1 Reply

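The #p sits outside any optional group, which makes it a required part of the match. You are also using the dot character (.) improperly: in most regex flavors, dot matches any character (by default, any character except a newline), so [0-9].* means "one digit followed by anything" rather than "a run of digits". Change the pattern to: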

([0-9]*)(?:#p([0-9]*))?

The (?:...) syntax creates a non-capturing group. Here it wraps the #p together with the second capturing group, so only the digits you're interested in are captured. The trailing ? then makes that whole #p part optional.
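
As a quick check of the pattern above, here is a minimal sketch using Python's re module (the choice of Python and the helper name split_ids are assumptions made for illustration, not part of the original answer):

```python
import re

# "#p" and the second run of digits sit inside one optional non-capturing group.
PATTERN = re.compile(r"([0-9]*)(?:#p([0-9]*))?")


def split_ids(text):
    """Return (first, second); second is None when the "#p..." part is absent."""
    m = PATTERN.fullmatch(text)
    return m.groups() if m else None


print(split_ids("123456#p654321"))  # ('123456', '654321')
print(split_ids("123456"))          # ('123456', None)
```

Because the non-capturing group itself is not numbered, the digits after #p are still group 2, and that group is simply unset (None in Python) when the optional part does not take part in the match.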

Also, most regex flavors have a \d shorthand character class for digits. So you could simplify even further:

(\d*)(?:#p(\d*))?

As another person has pointed out, the * operator could potentially match zero digits. To prevent this, use the + operator instead:

(\d+)(?:#p(\d+))?
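
A minimal sketch of this final variant, written with the \d shorthand and the + quantifier, again assuming Python's re module (the helper name parse is hypothetical):

```python
import re

# Raw string, so that "\d" reaches the regex engine unchanged.
STRICT = re.compile(r"(\d+)(?:#p(\d+))?")


def parse(text):
    """Return (first, second) for a full match, or None if the text does not match."""
    m = STRICT.fullmatch(text)
    return m.groups() if m else None


print(parse("123456#p654321"))  # ('123456', '654321')
print(parse("123456"))          # ('123456', None)
print(parse(""))                # None: at least one digit is required
print(parse("123456#p"))        # None: "#p" must be followed by digits
```

One caveat for Python specifically: with str patterns, \d also matches non-ASCII Unicode digits. Passing re.ASCII as a flag, or writing [0-9] explicitly, restricts the match to 0 through 9.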
