Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - How to make string replacements case-insensitive

I'm working on speech transcriptions with phonologically reduced forms:

reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")

I need to replace these forms with contiguous strings of the same letters but without the whitespaces:

reduced_replacements <- setNames(c("innit", "dunno", "dunnit", "wanna", "gonna", "gotta"),            # new forms
                                 c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta"))   # old forms
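Since each new form is just the old form with its whitespace removed, the lookup vector does not have to be typed out twice. A minimal base-R sketch that computes it from reduced instead (names hold the old forms, values the new ones, which is the c(pattern = replacement) orientation str_replace_all expects for a named vector):

```r
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")

# Each new form is the old form with all whitespace deleted, so compute it.
# Names = old forms (patterns), values = new forms (replacements).
reduced_replacements <- setNames(gsub("\\s+", "", reduced), reduced)
reduced_replacements
```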

The problem is that the reduced forms may vary in terms of case. That is, the replacement needs to be case-insensitive. I've tried to make the regex pattern case-insensitive by including (?i):

# pattern:
reduced_pattern <- paste0("(?i)\\b(", paste0(reduced, collapse = "|"), ")\\b")

But apparently that does not do the trick:

# test:
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
         "will be great in n it, ", "it matters Dun n it",
         "Looks awesome. Dun n it?", "Gon na be terrific!")
library(stringr)
ifelse(grepl(reduced_pattern, tst, perl = TRUE),
       str_replace_all(tst[grepl(reduced_pattern, tst)], reduced_replacements),
       tst)
[1] "Wan na go ? well dunno. come on" "i do n't know really"            "it matters Dun n it"            
[4] "Looks awesome. Dun n it?"        "Gon na be terrific!"             "Wan na go ? well dunno. come on"

None of the capitalized reduced forms get replaced. How can this be done efficiently, i.e., without enumerating the upper-case forms in reduced and reduced_replacements and without converting everything to lower case?

The correct result would be:

[1] "Wanna go ? well dunno. come on" "i do n't know really"            "will be great innit, "         
[4] "it matters Dunnit"              "Looks awesome. Dunnit?"          "Gonna be terrific!"
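A small diagnostic sketch of where the case-sensitivity comes from, assuming stringr is installed: the combined (?i) pattern does match the capitalized form, but str_replace_all with a named vector uses each lower-case name as its own pattern, without the (?i) flag, so "dun n it" never matches "Dun n it".

```r
library(stringr)

reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
reduced_pattern <- paste0("(?i)\\b(", paste0(reduced, collapse = "|"), ")\\b")
reduced_replacements <- setNames(gsub("\\s+", "", reduced), reduced)

x <- "it matters Dun n it"

# The inline (?i) flag works: the combined pattern matches the capitalized form.
grepl(reduced_pattern, x, perl = TRUE)
## [1] TRUE

# The named vector supplies each old form as a separate, case-sensitive pattern,
# so nothing is replaced here.
str_replace_all(x, reduced_replacements)
## [1] "it matters Dun n it"
```

Separately, in the test above ifelse() recycles the shorter vector returned by str_replace_all(tst[...]), which is why some elements of its output appear shifted or repeated.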
Question from: https://stackoverflow.com/questions/65857710/how-to-make-string-replacements-case-insensitive

He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Reply

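You can use stringr::str_replace_all with a function as the replacement argument. The function receives each matched substring, so it can simply delete the whitespace inside the match; since the match is never swapped for a fixed lower-case string, its original capitalization is preserved.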

See an R demo:

library(stringr)
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
         "will be great in n it, ", "it matters Dun n it",
         "Looks awesome. Dun n it?", "Gon na be terrific!")
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
# (?i) makes the whole alternation case-insensitive; \\b anchors at word boundaries
reduced_pattern <- paste0("(?i)\\b(?:", paste0(reduced, collapse = "|"), ")\\b")
# The replacement function gets each match and strips its whitespace, keeping its case
str_replace_all(tst, reduced_pattern, function(x) str_replace_all(x, "\\s+", ""))
## => [1] "Wanna go ? well dunno. come on" "i do n't know really"          
##    [3] "will be great innit, "          "it matters Dunnit"             
##    [5] "Looks awesome. Dunnit?"         "Gonna be terrific!"  
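If you would rather avoid the stringr dependency, the same idea (rewrite only the matched text, so its case survives) can be sketched in base R with gregexpr() and the regmatches<- replacement function. This is an alternative technique, not the one used above; it passes ignore.case = TRUE instead of the inline (?i) flag.

```r
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
         "will be great in n it, ", "it matters Dun n it",
         "Looks awesome. Dun n it?", "Gon na be terrific!")
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
reduced_pattern <- paste0("\\b(?:", paste0(reduced, collapse = "|"), ")\\b")

# Locate every match, ignoring case (PCRE engine via perl = TRUE).
matches <- gregexpr(reduced_pattern, tst, ignore.case = TRUE, perl = TRUE)

# Pull out the matched substrings, delete their whitespace, and write them back
# in place; text outside the matches is left untouched.
out <- tst
regmatches(out, matches) <- lapply(regmatches(out, matches),
                                   function(m) gsub("\\s+", "", m))
out
```

Elements without a match get character(0) from regmatches(), which lapply() passes through unchanged, so they are returned as they were.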

...