Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pandas - Sort frequency distribution of string using rank (Python)

I need to sort the frequency distribution of a string variable (education) by predetermined ranks. The code I wrote is below. However, the output is still sorted alphabetically, and I don't know what went wrong.

education_rank = {' Bachelors':12, ' HS-grad':8, ' 11th':6, ' Masters':14, ' 9th':5, ' Some-college':11, ' Assoc-acdm':10, ' Assoc-voc':9, ' 7th-8th':4, ' Doctorate':15, ' Prof-school':13, ' 5th-6th':3, ' 10th':16, ' 1st-4th':2, ' Preschool':1, ' 12th':7}

fd_education = pd.value_counts(adult_data.education)
print(fd_education)
    
fd_education = fd_education.sort_index(level='education_rank')
print(fd_education)


question from:https://stackoverflow.com/questions/65644356/sort-frequency-distribution-of-string-using-rank-python


1 Reply


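Try this method -

  1. Sort education_rank as a Series by rank and take its index to get the labels in ranked order
  2. Reindex the value_counts Series with those labels
  3. Drop the NaN rows left by ranked labels that have no count (here '12th')

import pandas as pd

# Your predefined rankings
education_rank = {'Bachelors':12, 'HS-grad':8, '11th':6, 'Masters':14, '12th':77}

# Your frequency output from value_counts()
fd_education = pd.Series({'Bachelors':500, 'HS-grad':809, '11th':23, 'Masters':65})

# reindex tolerates labels with no count ('12th'); plain [] indexing raises KeyError on pandas 1.0 and later.
# The NaN it introduces makes the values float, so cast back to int after dropna.
ranked_labels = pd.Series(education_rank).sort_values().index
print(fd_education.reindex(ranked_labels).dropna().astype('int64'))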
11th          23
HS-grad      809
Bachelors    500
Masters       65
dtype: int64
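
As an alternative, if your pandas is 1.1 or later, sort_index accepts a key function, so the ranks can be applied to the index directly. This is only a sketch on the same toy data as above; the name by_rank is mine, and it assumes every label in the counts has an entry in education_rank.

```python
import pandas as pd

# Same toy data as in the answer above
education_rank = {'Bachelors': 12, 'HS-grad': 8, '11th': 6, 'Masters': 14, '12th': 77}
fd_education = pd.Series({'Bachelors': 500, 'HS-grad': 809, '11th': 23, 'Masters': 65})

# key receives the whole index and must return sort keys of the same length;
# here each label is mapped to its rank (requires pandas >= 1.1)
by_rank = fd_education.sort_index(key=lambda idx: idx.map(education_rank))
print(by_rank)
```

With the real data, make sure the dictionary keys match the labels exactly, including the leading spaces used in the question.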

Explanation -

The issue is that you are passing the name of your dictionary as level, whereas level expects the name (or position) of an index level of the Series object. The purpose of level is to help with multi-index situations: it tells sort_index which of the index levels to sort on. You can't use it to supply a custom order as a list or dict.

If the index name you provided does not exist on a single-level index, it just falls back to sorting in alphabetical order. Check this example -
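
To make the role of level concrete, here is a minimal sketch with a two-level index. The 'sex' level and the pairing of values are made up purely for illustration and are not from your data.

```python
import pandas as pd

# Hypothetical two-level index (sex, education), for illustration only
idx = pd.MultiIndex.from_tuples(
    [('Male', 'Masters'), ('Female', 'HS-grad'), ('Male', '11th'), ('Female', 'Bachelors')],
    names=['sex', 'education'],
)
counts = pd.Series([65, 809, 23, 500], index=idx)

# level picks WHICH index level to sort on; the order within it is still the default one
by_education = counts.sort_index(level='education')
print(by_education)
```

Here level only selects the 'education' level, and the labels in it are still sorted alphabetically, not by any rank.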

#Your predefined rankings
education_rank = {'Bachelors':12, 'HS-grad':8, '11th':6, 'Masters':14, '12th':77}

#Your frequency output from value_counts()
fd_education = pd.Series({'Bachelors':500, 'HS-grad':809, '11th':23, 'Masters':65})
    
fd_education = fd_education.sort_index(level='hello')  # <---- 'hello' is not an index level name
print(fd_education)
11th          23
Bachelors    500
HS-grad      809
Masters       65
dtype: int64

See the pandas documentation for Series.sort_index for more details.

