I'm working on speech transcriptions with phonologically reduced forms:
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
I need to replace these forms with the same letters written contiguously, i.e., without the whitespace:
reduced_replacements <- setNames(c("innit", "dunno", "dunnit", "wanna", "gonna", "gotta"), # new forms
c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")) # old forms
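Because each new form is simply the old form with its spaces removed, the lookup table could also be derived from reduced instead of being typed out by hand. A minimal base-R sketch (the name derived_replacements is only illustrative):

```r
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")

# Old forms become the names (patterns), space-free versions become the values
derived_replacements <- setNames(gsub(" ", "", reduced, fixed = TRUE), reduced)

derived_replacements[["dun n it"]]  # "dunnit"
```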
The problem is that the reduced forms may vary in case, so the replacement needs to be case-insensitive. I've tried to make the regex pattern case-insensitive by including (?i):
# pattern:
reduced_pattern <- paste0("(?i)\\b(", paste0(reduced, collapse = "|"), ")\\b")
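With perl = TRUE, the inline (?i) flag is honoured by PCRE, so a function that receives reduced_pattern directly does match capitalized forms. A quick check, assuming the backslashes are doubled so that \\b reaches the regex engine as a word boundary rather than as R's backspace escape:

```r
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
reduced_pattern <- paste0("(?i)\\b(", paste0(reduced, collapse = "|"), ")\\b")

# grepl() receives reduced_pattern itself, so the (?i) flag applies here
matches <- grepl(reduced_pattern, c("Dun n it", "dun n it", "DUN N IT", "done"), perl = TRUE)
matches  # TRUE TRUE TRUE FALSE
```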
But apparently that does not do the trick:
# test:
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
"will be great in n it, ", "it matters Dun n it",
"Looks awesome. Dun n it?", "Gon na be terrific!")
library(stringr)
ifelse(grepl(reduced_pattern, tst, perl = T),
str_replace_all(tst[grepl(reduced_pattern, tst)], reduced_replacements),
tst)
[1] "Wan na go ? well dunno. come on" "i do n't know really" "it matters Dun n it"
[4] "Looks awesome. Dun n it?" "Gon na be terrific!" "Wan na go ? well dunno. come on"
None of the capitalized reduced forms get replaced. How can this be achieved efficiently, i.e., other than by enumerating the upper-case forms in reduced and reduced_replacements, or by converting everything to lower case with tolower?
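In the attempt above, (?i) only reaches grepl(); str_replace_all() takes its patterns from the names of reduced_replacements, which carry no flag, so matching there stays case-sensitive. Separately, tst[grepl(reduced_pattern, tst)] has fewer elements than tst (five against six), so ifelse() recycles it; that would explain why the third string is missing from the output and the first one appears twice. The sketch below (ci_replacements is an illustrative name) moves the flag into the names, which makes the lookup case-insensitive, but since the stored values are lower case, "Wan na" becomes "wanna" rather than "Wanna":

```r
library(stringr)

reduced_replacements <- setNames(c("innit", "dunno", "dunnit", "wanna", "gonna", "gotta"),
                                 c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta"))
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
         "will be great in n it, ", "it matters Dun n it",
         "Looks awesome. Dun n it?", "Gon na be terrific!")

# str_replace_all() uses the names as patterns, so the flag has to live there
ci_replacements <- setNames(reduced_replacements,
                            paste0("(?i)\\b", names(reduced_replacements), "\\b"))
ci_result <- str_replace_all(tst, ci_replacements)
ci_result
# Capitalized forms now match, but come out lower case ("wanna", "dunnit", "gonna")

# The subset handed to ifelse() has 5 elements for 6 inputs, hence the recycling
reduced_pattern <- paste0("(?i)\\b(", paste0(names(reduced_replacements), collapse = "|"), ")\\b")
n_subset <- length(tst[grepl(reduced_pattern, tst, perl = TRUE)])
n_subset  # 5
```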
The correct result would be:
[1] "Wanna go ? well dunno. come on" "i do n't know really" "will be great innit, "
[4] "it matters Dunnit" "Looks awesome. Dunnit?" "Gonna be terrific!"
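Since every new form is just the old form without its spaces, one option that also keeps the original capitalization is to match case-insensitively with reduced_pattern and delete the spaces inside each match, rather than looking up a lower-case replacement. A base-R sketch using gregexpr() and `regmatches<-`, assuming the pattern is built with \\b so that PCRE sees a word boundary (result is an illustrative name):

```r
reduced <- c("in n it", "du n no", "dun n it", "wan na", "gon na", "got ta")
tst <- c("Wan na go ? well du n no. come on", "i do n't know really",
         "will be great in n it, ", "it matters Dun n it",
         "Looks awesome. Dun n it?", "Gon na be terrific!")

reduced_pattern <- paste0("(?i)\\b(", paste0(reduced, collapse = "|"), ")\\b")

result <- tst  # work on a copy, because regmatches<- modifies its target
# Locate every reduced form, ignoring case (perl = TRUE so PCRE honours (?i))
m <- gregexpr(reduced_pattern, result, perl = TRUE)
# Remove the spaces inside each match; strings without a match get character(0) and stay unchanged
regmatches(result, m) <- lapply(regmatches(result, m), gsub,
                                pattern = " ", replacement = "", fixed = TRUE)
result
```

Recent stringr versions also accept a function as the replacement, so str_replace_all(tst, reduced_pattern, function(m) str_remove_all(m, " ")) should behave the same way there, though that depends on the installed version.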
question from:
https://stackoverflow.com/questions/65857710/how-to-make-string-replacements-case-insensitive