Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - regexp - find numbers in a string in any order

I need a regexp that finds the strings containing all the required digits, each of them only once.

For example:

a <- c("12","13","112","123","113","1123","23","212","223","213","2123","312","323","313","3123","1223","1213","12123","2313","23123","13123")

I want to get:

"123" "213" "312"

That is, the digits 1, 2 and 3, each appearing only once, in any order and at any position in the string.

I tried a lot of things, and this seemed to be the closest, although it is still very far from what I want:

grep('[1:3][1:3][1:3]', a, value=TRUE)
[1] "113"   "313"   "2313"  "13123"

1 Reply

"What I exactly need is to find all 3-digit numbers containing the digits 1, 2 AND 3"

Then you can safely use

grep('^[123]{3}$', a, value=TRUE)
##=> [1] "112" "123" "113" "212" "223" "213" "312" "323" "313"

The regex matches:

  • ^ - start of string
  • [123]{3} - exactly 3 characters, each of which is 1, 2 or 3
  • $ - end of string.

Also, if you only need unique values, use unique.
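
A small sketch of why the anchors matter and how this differs from the attempt in the question. Inside a character class, [1:3] is not a range: it matches the literal characters 1, : and 3, whereas a range is written [1-3] (the same as [123] here). The vector b and the result names are made up for illustration and are not part of the question.

```r
# Hypothetical sample input, not the vector from the question
b <- c("123", "1123", "1:3", "223", "223")

# [1:3] matches the literal characters '1', ':' and '3'; it is not a range
colon_class <- grepl('^[1:3]{3}$', b)

# [1-3] is a real range and is equivalent to [123]
range_class <- grepl('^[1-3]{3}$', b)

# Without ^ and $, a longer string such as "1123" matches as well
unanchored <- grep('[123]{3}', b, value=TRUE)

# unique() drops the duplicated "223" from the anchored result
anchored_unique <- unique(grep('^[123]{3}$', b, value=TRUE))

print(colon_class)
print(range_class)
print(unanchored)
print(anchored_unique)
```

Only "1:3" satisfies the [1:3] version, which is why that pattern returned unexpected strings in the question.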

If you need to disallow the same digit appearing more than once, you need a Perl-based regex:

grep('^(?!.*(.).*\\1)[123]{3}$', a, value=TRUE, perl=TRUE)
## => [1] "123" "213" "312"

Note the double-escaped back-reference: inside an R string literal, the regex \1 has to be written as \\1. The (?!.*(.).*\1) negative look-ahead checks that the string has no repeated characters, with the help of a capturing group (.) and a back-reference that forces the same captured text to appear again later in the string. If the same character is found twice, there will be no match.

The (?!.*(.).*\1) is a negative look-ahead. It only asserts the absence of some pattern after the current regex engine position, i.e. it succeeds if the pattern inside it cannot be matched and fails otherwise. Thus, it does not "consume" characters, and the regex engine stays at the same location in the input string. In this regex, that location is the beginning of the string (^).

So, right at the beginning of the string, the regex engine tries to match .* (any character but a newline, 0 or more repetitions), then captures 1 character with (.) into group 1, again matches 0 or more characters with .*, and then tries to match the same text as in group 1 with \1. Thus, if the string is 121, there will be no match: the look-ahead finds two 1s and fails.
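
To see the look-ahead on its own, here is a minimal sketch that applies it to a few strings and then checks the full pattern against the vector from the question. The last step is a different technique that is not part of the answer (sorting the characters of each string and comparing with "123"); it is included only as a cross-check, and the variable names are illustrative.

```r
# The vector from the question
a <- c("12","13","112","123","113","1123","23","212","223","213","2123",
       "312","323","313","3123","1223","1213","12123","2313","23123","13123")

# Look-ahead alone: TRUE when no character occurs twice in the string.
# The back-reference is written \\1 because of R string escaping.
no_repeat <- grepl('^(?!.*(.).*\\1)', c("121", "123", "1223"), perl=TRUE)

# Full pattern: three characters from 1, 2, 3 with no repeats
regex_hits <- grep('^(?!.*(.).*\\1)[123]{3}$', a, value=TRUE, perl=TRUE)

# Cross-check without a regex (an alternative, not the answer's method):
# sort the characters of each string and compare with "123"
sorted <- vapply(strsplit(a, ""),
                 function(ch) paste(sort(ch), collapse=""),
                 character(1))
sort_hits <- a[sorted == "123"]

print(no_repeat)
print(regex_hits)
print(identical(regex_hits, sort_hits))
```

The sorting variant is easier to read for a fixed set of digits, while the look-ahead keeps everything in a single grep call.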

