Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
480 views
in Technique[技术] by (71.8m points)

python - How to remove accents from values in columns?

How do I change the special characters to the usual alphabet letters? This is my dataframe:

In [56]: cities
Out[56]:

Table Code  Country         Year        City        Value       
240         ?land Islands   2014.0      MARIEHAMN   11437.0 1
240         ?land Islands   2010.0      MARIEHAMN   5829.5  1
240         Albania         2011.0      Durr?s      113249.0
240         Albania         2011.0      TIRANA      418495.0
240         Albania         2011.0      Durr?s      56511.0 

I want it to look like this:

In [56]: cities
Out[56]:

Table Code  Country         Year        City        Value       
240         Aland Islands   2014.0      MARIEHAMN   11437.0 1
240         Aland Islands   2010.0      MARIEHAMN   5829.5  1
240         Albania         2011.0      Durres      113249.0
240         Albania         2011.0      TIRANA      418495.0
240         Albania         2011.0      Durres      56511.0 
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The pandas method is to use the vectorised str.normalize combined with str.decode and str.encode:

In [60]:
df['Country'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')

Out[60]:
0    Aland Islands
1    Aland Islands
2          Albania
3          Albania
4          Albania
Name: Country, dtype: object

So to do this for all str dtypes:

In [64]:
cols = df.select_dtypes(include=[np.object]).columns
df[cols] = df[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
df

Out[64]:
   Table Code        Country    Year       City      Value
0         240  Aland Islands  2014.0  MARIEHAMN  11437.0 1
1         240  Aland Islands  2010.0  MARIEHAMN  5829.5  1
2         240        Albania  2011.0     Durres   113249.0
3         240        Albania  2011.0     TIRANA   418495.0
4         240        Albania  2011.0     Durres    56511.0

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...