Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


R - Convert accented characters into ASCII characters

What is the optimal way to remove German (or French) accents from a vector of 16 million strings?

e.g., 'Sjögren's syndrome' into 'Sjogren's syndrome'

Conversion of a single character into a single character is preferred over transliteration such as

ä => ae, ö => oe, ü => ue.

For example, regular expressions would be one option, but is there something better (an R package for this)?

gsub('ü','u',gsub('ö','o',"Sjögren's syndrome ( über) "))

There are Stack Overflow solutions for non-R platforms, but not a good one for R.


1 Reply


Use iconv to convert to ASCII with transliteration (if supported):

iconv(c("über","Sjögren's"),to="ASCII//TRANSLIT")
[1] "uber"      "Sjogren's"
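
The result of //TRANSLIT is platform dependent, so on some systems the output may differ from the one shown above. As a rough fallback, here is a minimal base-R sketch of the one-character-to-one-character replacement the question asks for, using chartr with an explicit mapping. The function name strip_accents and the particular set of mapped characters are assumptions made for illustration, not part of the original answer; extend the mapping to whatever characters the data actually contains.

```r
# Hypothetical helper (not from the original answer): replace each accented
# character with a single unaccented one, using base R's chartr().
# \u escapes are used so the script does not depend on the file encoding.
strip_accents <- function(x) {
  # ä ö ü Ä Ö Ü é è ê ë à â ç î ï ô ù û (illustrative, incomplete list)
  old <- paste0("\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc",
                "\u00e9\u00e8\u00ea\u00eb\u00e0\u00e2",
                "\u00e7\u00ee\u00ef\u00f4\u00f9\u00fb")
  new <- "aouAOUeeeeaaciiouu"
  # chartr() maps the i-th character of old to the i-th character of new,
  # so both strings must contain the same number of characters.
  stopifnot(nchar(old) == nchar(new))
  chartr(old, new, x)
}

x <- c("Sj\u00f6gren's syndrome", "\u00fcber", "plain ascii")
strip_accents(x)
```

chartr is vectorized, so it can be applied to the whole vector in one call, and characters that are not in the mapping are left unchanged.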
