r - regex match substring unless another substring matches


posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)


I'm trying to dig deeper into regexes and want to match a condition unless some other substring is also found in the same string. I know I can use two grepl calls (as shown below), but I want a single regex that tests for this condition, as I'm pushing my understanding.

Let's say I want to match the words "dog" and "man" using "(dog.*man|man.*dog)", but not if the string contains the substring "park". I figured I could use (*SKIP)(*FAIL) to negate "park", but this does not cause the string to fail (shown below).

How can I express the logic "contains "dog" and "man" but not "park"" with one regex?
What is wrong with my understanding of (*SKIP)(*FAIL)|?

The code:

x <- c(
    "The dog and the man play in the park.",
    "The man plays with the dog.",
    "That is the man's hat.",
    "Man I love that dog!",
    "I'm dog tired",
    "The dog park is no place for man.",
    "Park next to this dog's man."
)

# Could do this but want one regex
grepl("(dog.*man|man.*dog)", x, ignore.case=TRUE) & !grepl("park", x, ignore.case=TRUE)

# Thought this would work, it does not
grepl("park(*SKIP)(*FAIL)|(dog.*man|man.*dog)", x, ignore.case=TRUE, perl=TRUE)


1 Reply

深蓝 · Answer 1 · 2021-10-23T21:38:08+0000

You can use an anchored look-ahead solution (this requires a Perl-style regex, hence perl=TRUE):

grepl("^(?!.*park)(?=.*dog.*man|.*man.*dog)", x, ignore.case=TRUE, perl=TRUE)


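^ - anchors the pattern at the start of the string, so each look-ahead scans the whole string
(?!.*park) - negative look-ahead: fail the match if park appears anywhere in the string
(?=.*dog.*man|.*man.*dog) - positive look-ahead: require both dog and man, in either order; the match fails if either one is missing.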
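
As a quick check, the following sketch runs the look-ahead pattern on the question's sample vector and compares it with the two-grepl baseline and with the (*SKIP)(*FAIL) attempt. The variable names two_step, one_regex and skip_fail are only illustrative. As far as I understand PCRE, park(*SKIP)(*FAIL) only throws away match attempts that pass through "park"; the engine then keeps scanning, and the second alternative is free to match "dog ... man" elsewhere in the string. It excludes a region of the string, not the string as a whole, which is why it cannot act as a "does not contain" test here.

```r
x <- c(
    "The dog and the man play in the park.",
    "The man plays with the dog.",
    "That is the man's hat.",
    "Man I love that dog!",
    "I'm dog tired",
    "The dog park is no place for man.",
    "Park next to this dog's man."
)

# Baseline from the question: two separate tests combined with &
two_step <- grepl("(dog.*man|man.*dog)", x, ignore.case = TRUE) &
    !grepl("park", x, ignore.case = TRUE)

# Single anchored look-ahead pattern from the answer
one_regex <- grepl("^(?!.*park)(?=.*dog.*man|.*man.*dog)", x,
                   ignore.case = TRUE, perl = TRUE)

# The (*SKIP)(*FAIL) attempt from the question, kept for contrast
skip_fail <- grepl("park(*SKIP)(*FAIL)|(dog.*man|man.*dog)", x,
                   ignore.case = TRUE, perl = TRUE)

# Do the one-regex and two-step approaches agree on every string?
print(identical(two_step, one_regex))

# Strings accepted by the look-ahead pattern
print(x[one_regex])

# Strings that the (*SKIP)(*FAIL) attempt accepts but should not
print(x[skip_fail & !one_regex])
```

With the sample data, the look-ahead pattern keeps only "The man plays with the dog." and "Man I love that dog!", while the (*SKIP)(*FAIL) version should still report a match for the strings that contain "park" alongside both "dog" and "man".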

Another version (more scalable) uses three look-aheads, one per condition:

^(?!.*park)(?=.*dog)(?=.*man)
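
Because each condition gets its own look-ahead, the pattern can be assembled from word lists. The sketch below is one way to do that; build_pattern and lookaheads are hypothetical helpers, not part of the original answer or of base R, and they assume the words contain no regex metacharacters (escape them first if they might).

```r
# Hypothetical helper: wrap each word in a look-ahead.
# `open` is "(?=.*" for required words or "(?!.*" for forbidden ones.
lookaheads <- function(words, open) {
    # paste0() recycles zero-length input to "", which would yield "(?!.*)",
    # so an empty word list is handled explicitly.
    if (length(words) == 0) return("")
    paste0(open, words, ")", collapse = "")
}

# Hypothetical helper: one anchored pattern from words that must appear
# (must_have) and words that must not appear (must_not).
# Assumes the words are plain literals without regex metacharacters.
build_pattern <- function(must_have, must_not = character(0)) {
    paste0("^", lookaheads(must_not, "(?!.*"), lookaheads(must_have, "(?=.*"))
}

x <- c(
    "The dog and the man play in the park.",
    "The man plays with the dog.",
    "That is the man's hat.",
    "Man I love that dog!",
    "I'm dog tired",
    "The dog park is no place for man.",
    "Park next to this dog's man."
)

pattern <- build_pattern(must_have = c("dog", "man"), must_not = "park")
print(pattern)

# Look-aheads need the PCRE engine, hence perl = TRUE
print(x[grepl(pattern, x, ignore.case = TRUE, perl = TRUE)])
```

Adding a fourth condition then only means adding a word to one of the vectors, with no need to enumerate every ordering as in (dog.*man|man.*dog).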
