Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
543 views
in Technique[技术] by (71.8m points)

regex - Confused with the locale settings in R

Just now I answered this Removing characters after a EURO symbol in R question. But it's not working for me where the r code works for others who are on Ubuntu.

This is my code.

x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
euro <- "u20AC"
gsub(paste(euro , "(\S+)|."), "\1", x)
# "" 

I think this is all about changing the locale settings, I don't know how to do that.

I'm running rstudio on Windows 8.

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

loaded via a namespace (and not attached):
[1] tools_3.2.0

@Anada's answer is good but we need to add that encoding parameter for every time when we use unicodes in regex. Is there any way to modify the default encoding to utf-8 on Windows?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Seems to be a problem with encoding.

Consider:

x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
gsub(paste(euro , "(\S+)|."), "\1", x)
# [1] ""
gsub(paste(euro , "(\S+)|."), "\1", `Encoding<-`(x, "UTF8"))
# [1] "15,896.80"

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...