Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Removing certain characters from a string in R

I have a string in R that contains a large number of words. When I print the string, I get a large amount of text that includes passages like the following:

> docs

....

\u009cYes yes for ever for ever the boys cried in their ringing voices with softened faces

....

So I'm wondering how to remove these \u009c-style characters (all of them; some have slightly different numbers) from the string. I've tried gsub(), but it didn't remove them.

question from:https://stackoverflow.com/questions/15170250/removing-certain-characters-from-a-string-in-r


1 Reply


This should work:

gsub('\u009c', '', '\u009cYes yes for ever for ever the boys ')
[1] "Yes yes for ever for ever the boys "

Here 009c is the character's Unicode code point in hexadecimal. Write all four hexadecimal digits so that the characters following the escape are not read as part of it. If you have several such characters to remove, one solution is to separate them with a pipe:

gsub('\u009c|\u00F0', '', '\u009cYes yes \u00F0for ever for ever the boys and the girls')

[1] "Yes yes for ever for ever the boys and the girls"

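If the stray characters vary a lot, listing every code point in an alternation gets tedious. Here is a minimal sketch of a range-based variant. It assumes the unwanted characters all fall in the C1 control block (U+0080 to U+009F), which is where \u009c lives. The sample string and the variable names are made up for illustration, and a character outside that block, such as \u00F0, would still need to be listed separately.

```r
# Hypothetical sample input: \u009c and \u0085 are both C1 control characters.
docs <- "\u009cYes yes \u0085for ever for ever the boys"

# Assumption: every unwanted character lies in U+0080..U+009F.
# A bracket expression with a range removes all of them in one pass.
cleaned <- gsub("[\u0080-\u009f]", "", docs, perl = TRUE)

print(cleaned)
```

Check the result on a sample of your own data first, since the range is only a guess about which characters are present.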