Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Best practice: Should I try to change to UTF-8 as locale or is it safe to leave it as is?

I am trying to set my default encoding to UTF-8, so far without success:

a <- "Hallo"
b <- "??fd"  # "??" stands for two non-ASCII characters (e.g. umlauts)
print(Encoding(a))
# [1] "unknown"
print(Encoding(b))
# [1] "latin1"

options(encoding = "UTF-8")
a <- "Hallo"
b <- "??fd"
print(Encoding(a))
# [1] "unknown"
print(Encoding(b))
# [1] "latin1"

old_locale <- Sys.getlocale()
Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252")
a <- "Hallo"
b <- "??fd"
print(Encoding(a))
# [1] "unknown"
print(Encoding(b))
# [1] "latin1"

Sys.getlocale()
# [1] "LC_COLLATE=German_Switzerland.1252;
# LC_CTYPE=German_Switzerland.1252;
# LC_MONETARY=German_Switzerland.1252;
# LC_NUMERIC=C;LC_TIME=German_Switzerland.1252"

I found the questions "R Encoding for files" and "How to use Sys.setlocale()", but as you can see, they don't seem to work in my case and I don't understand why.

I also tried Sys.setlocale(category = "LC_ALL", locale = "en_US.UTF-8") but got

Warning message:
In Sys.setlocale(category = "LC_ALL", locale = "en_US.UTF-8") :
  OS reports request to set locale to "en_US.UTF-8" cannot be honored

In cmd the command systeminfo & pause gives

Systemgebietsschema (system locale): de-ch;Deutsch (Schweiz)
Eingabegebietsschema (input locale): de-ch;Deutsch (Schweiz)

Edit:

  • I fear that the "unknown" encoding could lead to mistakes that I am not aware of, and
  • I thought it would be good to use the newer standard UTF-8 to avoid problems like the one I had.
  • Last but not least, I would like to get reproducible results: a colleague is working on a Mac (with fewer encoding issues)...

Edit2: What is the experience with this issue? Is there any best practice?


1 Reply


This is not a perfect answer, but it is a good workaround. As Roland pointed out, changing the locale might be dangerous, so leave it as is. If you run into trouble with a file, search it for non-UTF-8 characters, as described here for RStudio. From what I have seen, most editors have such a feature.
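
If no editor with such a feature is at hand, the same check can be sketched in base R. This is only a minimal sketch, assuming R >= 3.3.0 (for validUTF8()); the helper name find_non_utf8_lines and the demo file are hypothetical and not part of the linked instructions.

```r
# Hypothetical helper (not from the original answer): return the numbers of
# the lines in `path` whose bytes are not valid UTF-8.
find_non_utf8_lines <- function(path) {
  # "native.enc" means the bytes are read as-is, without re-encoding.
  con <- file(path, open = "r", encoding = "native.enc")
  on.exit(close(con), add = TRUE)
  lines <- readLines(con, warn = FALSE)
  which(!validUTF8(lines))
}

# Demo file: line 2 contains the single byte 0xE4 (a-umlaut in Latin-1 /
# CP1252), which is not a valid UTF-8 sequence on its own.
demo_file <- tempfile(fileext = ".R")
writeBin(
  c(charToRaw("a <- \"Hallo\"\nb <- \""), as.raw(0xE4), charToRaw("fd\"\n")),
  demo_file
)

bad_lines <- find_non_utf8_lines(demo_file)
print(bad_lines)
```

The check works on the raw bytes, so it does not depend on the session locale. To list lines containing any non-ASCII characters instead, tools::showNonASCIIfile() is a built-in alternative.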

Furthermore, this answer gives more insight into what you can do when you source() a file.
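
As a rough illustration of that case (not necessarily what the linked answer does): source() has an encoding argument that declares how the file is encoded, so the locale itself can stay untouched. The script, the variable names, and the locale guard below are assumptions made for this sketch; the guard is there because the session locale must be able to represent the characters in the file.

```r
# Create a demo script whose string literal is stored as UTF-8 bytes.
script <- tempfile(fileext = ".R")
con <- file(script, open = "wb")
writeLines("greeting <- \"Gr\u00fc\u00dfe\"", con, useBytes = TRUE)
close(con)

# Only run the demo if the session locale can represent the umlauts
# (UTF-8 or a Latin-1-like locale such as Windows code page 1252).
loc <- l10n_info()
if (isTRUE(loc[["UTF-8"]]) || isTRUE(loc[["Latin-1"]])) {
  script_env <- new.env()
  # Declare the file's encoding instead of changing the locale.
  source(script, local = script_env, encoding = "UTF-8")
  print(script_env$greeting)
  print(Encoding(script_env$greeting))
}
```

Declaring the encoding per file keeps the session locale as it is, which matches the advice above to leave the locale alone.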

For a way to deal with locales when collation plays a crucial part, see here.
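
One possible way to get the same sort order on Windows and on a Mac, sketched here under the assumption that plain C-locale (byte-wise) ordering is acceptable, is either to switch only LC_COLLATE temporarily or to use the radix sort method, which always collates as in the C locale. The helper name sort_c is hypothetical, and the linked answer may take a different route.

```r
# Hypothetical helper: sort with C-locale collation, then restore the
# previous LC_COLLATE setting even if sort() fails.
sort_c <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(invisible(Sys.setlocale("LC_COLLATE", old)), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

words <- c("banana", "Apple", "apple", "Banana")
collate_before <- Sys.getlocale("LC_COLLATE")

sorted_c <- sort_c(words)
# method = "radix" uses C-locale ordering without touching the locale.
sorted_radix <- sort(words, method = "radix")

print(sorted_c)      # "Apple" "Banana" "apple" "banana"
print(sorted_radix)  # same order as sorted_c
```

Both variants place uppercase before lowercase letters, which is rarely what a human reader expects, but the result no longer depends on the machine's locale.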

