utf-8 - How to identify/delete non-UTF-8 characters in R


1 Reply

深蓝 · Answer 1 · 2021-10-17T00:07:01+0000

Another solution uses iconv and its sub argument, which takes a character string. If sub is not NA (here it is set to ''), it replaces any non-convertible bytes in the input.

x <- "fa\xE7ile"
Encoding(x) <- "UTF-8"
iconv(x, "UTF-8", "UTF-8", sub = '')  ## replace any non-UTF-8 bytes with ''
## [1] "faile"
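
The question also asks how to identify the offending values, not only how to delete them. The following is a minimal sketch of one way to do that, not part of the original answer: it assumes base R 3.3.0 or later (for validUTF8()) and an iconv implementation that honours sub, and the sample vector and variable names are made up for illustration.

```r
# Hypothetical sample vector: the second element holds a lone Latin-1 byte (0xE7),
# which is not a valid UTF-8 sequence on its own.
x <- c("facile", "fa\xE7ile")

# validUTF8() reports, per element, whether the bytes form valid UTF-8.
valid <- validUTF8(x)

# Positions of the elements that contain invalid bytes.
bad_idx <- which(!valid)

# sub = "byte" keeps the element but shows each non-convertible byte as <xx>,
# which makes it easy to see what would be lost.
marked <- iconv(x, "UTF-8", "UTF-8", sub = "byte")

# sub = "" removes the non-convertible bytes, as in the answer above.
cleaned <- iconv(x, "UTF-8", "UTF-8", sub = "")
```

Inspecting marked before settling on cleaned is a cheap way to check whether the bytes being dropped are really garbage or just text in a different encoding.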

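Note that if we instead declare the correct source encoding (latin1 here), the byte 0xE7 is converted to its UTF-8 equivalent rather than dropped:

x <- "fa\xE7ile"
Encoding(x) <- "latin1"
xx <- iconv(x, "latin1", "UTF-8", sub = '')
xx
## [1] "façile"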
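
To make the difference between the two calls concrete, here is a small side-by-side sketch. It assumes the input really is Latin-1, which iconv cannot verify for you. The variable names dropped and converted are illustrative only.

```r
# Byte 0xE7 is "ç" in Latin-1 but is not valid UTF-8 on its own.
x <- "fa\xE7ile"

# Treating the input as UTF-8 and substituting '' silently loses the character.
dropped <- iconv(x, "UTF-8", "UTF-8", sub = "")

# Declaring the real source encoding converts the byte to a proper UTF-8 character.
converted <- iconv(x, "latin1", "UTF-8")

n_dropped   <- nchar(dropped)    # one character shorter than the intended word
n_converted <- nchar(converted)  # full length, with the character preserved

# The converted string is valid UTF-8 and matches the intended text.
is_valid   <- validUTF8(converted)
is_faccile <- converted == "fa\u00e7ile"
```

Dropping bytes is the fallback for when the source encoding is unknown. When it is known, converting preserves the text.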
