Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

text processing - R: how to use str_replace_all( ) without regular expression

I have some textual data which contain "[surname]", "[female name]" and "[male name]". For example,

c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") 

I hope to delete them for analysis and expect to get

"I am . I am ten years old", "My father is ", "I went to school today"

But when I run the code below, the result is mangled. I suspect str_replace_all treats [ ] as regular-expression syntax, but I am not entirely sure why.

> str_replace_all(c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") , "[surname]", '')

[1] "I  [fl ]. I  t y old" "My fth i [l ][]"      "I wt to chool tody"  

Does anyone know how to solve it? Thank you in advance

question from:https://stackoverflow.com/questions/65904906/r-how-to-use-str-replace-all-without-regular-expression


1 Reply

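Your pattern "[surname]" is read as a regex character class, so every s, u, r, n, a, m and e is removed. Use stringi::stri_replace_all_fixed, which matches each pattern as a literal string: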

library(stringi)
data <- c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") 
remove_us <- c("[female name]","[male name]","[surname]")
stri_replace_all_fixed(data, remove_us, "", vectorize_all=FALSE)

Results

[1] "I am . I am ten years old" "My father is "             "I went to school today"   
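
If you would rather avoid an extra package, the same literal matching is available in base R through the fixed = TRUE argument of gsub. The sketch below is only one way to do it: strip_literals is a hypothetical helper name, not part of any library, and it simply applies one literal replacement after another. stringr offers the same idea through its fixed() wrapper, e.g. str_replace_all(data, fixed("[surname]"), "").

```r
# Base R sketch: treat each pattern as a literal string via fixed = TRUE.
data <- c("I am [female name]. I am ten years old",
          "My father is [male name][surname]",
          "I went to school today")
remove_us <- c("[female name]", "[male name]", "[surname]")

# Hypothetical helper: fold over the patterns, removing each one literally
# from the whole character vector before moving on to the next pattern.
strip_literals <- function(x, patterns) {
  Reduce(function(acc, p) gsub(p, "", acc, fixed = TRUE), patterns, x)
}

cleaned <- strip_literals(data, remove_us)
print(cleaned)
```

Because no regex engine is involved, the square brackets need no escaping at all.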

However, it is simpler with gsub:

gsub('\\[[^][]*]', '', data)

--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  [^][]*                   any character except: ']', '[' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  ]                        ']'
--------------------------------------------------------------------------------
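
To see why the call in the question mangled the text, it may help to compare the unescaped pattern with an escaped one. This is a small illustrative sketch in base R; the variable names are made up for the example.

```r
x <- "I went to school today"

# Unescaped: "[surname]" is a character class, so each of the letters
# s, u, r, n, a, m, e is deleted wherever it occurs.
as_class <- gsub("[surname]", "", x)
print(as_class)    # "I wt to chool tody", as in the question's output

# Escaped brackets match the literal token "[surname]" instead.
y <- "My father is [male name][surname]"
as_literal <- gsub("\\[surname\\]", "", y)
print(as_literal)  # "My father is [male name]"
```

Note the doubled backslashes: R string literals need "\\[" to pass a single "\[" to the regex engine.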
