pandas - Sort frequency distribution of string using rank (Python)

Question

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have to sort the frequency distribution of a string variable (education) using predetermined ranks, and the code I wrote is below. However, the output is still sorted alphabetically (please find image attached), and I don't know what went wrong.

education_rank = {' Bachelors':12, ' HS-grad':8, ' 11th':6, ' Masters':14, ' 9th':5, ' Some-college':11, ' Assoc-acdm':10, ' Assoc-voc':9, ' 7th-8th':4, ' Doctorate':15, ' Prof-school':13, ' 5th-6th':3, ' 10th':16, ' 1st-4th':2, ' Preschool':1, ' 12th':7}

fd_education = pd.value_counts(adult_data.education)
print(fd_education)
    
fd_education = fd_education.sort_index(level='education_rank')
print(fd_education)

question from:https://stackoverflow.com/questions/65644356/sort-frequency-distribution-of-string-using-rank-python

1 Reply

深蓝 · Answer 1 · 2021-10-06T18:45:51+0000

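Try this method:

1. Sort education_rank as a Series to get its labels in rank order
2. Use those labels to fetch rows from the value_counts series
3. Drop the NaN rows left by any label that has no count

import pandas as pd

#Your predefined rankings
education_rank = {'Bachelors':12, 'HS-grad':8, '11th':6, 'Masters':14, '12th':77}

#Your frequency output from value_counts()
fd_education = pd.Series({'Bachelors':500, 'HS-grad':809, '11th':23, 'Masters':65})

#reindex tolerates labels with no count (such as '12th') and fills them with NaN;
#plain [...] indexing raises KeyError on missing labels in pandas 1.0+.
#Drop the NaN rows, then cast back to int because NaN forces a float dtype.
fd_education.reindex(pd.Series(education_rank).sort_values().index).dropna().astype('int64')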

11th          23
HS-grad      809
Bachelors    500
Masters       65
dtype: int64
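As an alternative to reordering by label, the same result can be sketched with the key argument of sort_index, which maps each index label to a sort value. This assumes pandas 1.1 or later (where key was added) and reuses the sample data from above; the name sorted_fd is only illustrative.

```python
import pandas as pd

# Same sample data as in the answer above.
education_rank = {'Bachelors': 12, 'HS-grad': 8, '11th': 6, 'Masters': 14, '12th': 77}
fd_education = pd.Series({'Bachelors': 500, 'HS-grad': 809, '11th': 23, 'Masters': 65})

# key receives the whole Index and must return something of the same length;
# mapping through the dict turns each label into its rank.
sorted_fd = fd_education.sort_index(key=lambda idx: idx.map(education_rank))
print(sorted_fd)
```

Because only labels already present in the series are looked up, there are no missing rows to drop and the counts keep their integer dtype. A label that is absent from education_rank would map to NaN and be placed last by default.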

Explanation -

The issue is that you are passing the name of a dictionary to level, which expects the name (or position) of an index level of the series. The purpose of level is to help with MultiIndex situations: it selects which of the index levels to sort on. You can't use it to supply a custom ordering as a list or dict.

On a single-level index like this one, a level name that does not exist has no effect, so sort_index just falls back to its default alphabetical sort. Check this example -

import pandas as pd

#Your predefined rankings
education_rank = {'Bachelors':12, 'HS-grad':8, '11th':6, 'Masters':14, '12th':77}

#Your frequency output from value_counts()
fd_education = pd.Series({'Bachelors':500, 'HS-grad':809, '11th':23, 'Masters':65})

fd_education = fd_education.sort_index(level='hello') #<---- 'hello' is not an index level name
print(fd_education)

11th          23
Bachelors    500
HS-grad      809
Masters       65
dtype: int64
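For contrast, here is a rough sketch of the situation level is meant for. The two-level index, the 'sex' level, and all the counts are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical counts indexed by two levels, 'sex' and 'education'.
idx = pd.MultiIndex.from_tuples(
    [("Male", "Masters"), ("Female", "HS-grad"), ("Male", "11th"), ("Female", "Bachelors")],
    names=["sex", "education"],
)
counts = pd.Series([40, 300, 15, 200], index=idx)

# level names the index level that drives the sort;
# the ordering within that level is still plain alphabetical.
by_education = counts.sort_index(level="education")
print(by_education)
```

Even here level only chooses which level to sort on. It does not accept a ranking, so a custom order still has to come from reordering the labels or from a sort key.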

Do read the sort_index documentation for more details.
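One thing to watch when applying this to the original data: the keys in the question's education_rank start with a space (' Bachelors'), while the sample keys in this answer do not. The sketch below assumes the values in adult_data.education carry the same leading space, strips it on both sides, and then uses an ordered categorical so the rank order sticks to the column. The adult_data rows here are a made-up stand-in, and only a few of the question's rankings are repeated.

```python
import pandas as pd

# A few rankings copied from the question, leading spaces included.
education_rank = {' Bachelors': 12, ' HS-grad': 8, ' 11th': 6, ' Masters': 14}

# Hypothetical stand-in for adult_data; these rows are invented.
adult_data = pd.DataFrame(
    {"education": [" HS-grad", " Bachelors", " HS-grad", " 11th", " Masters", " HS-grad"]}
)

# Strip whitespace on both sides so lookups do not silently miss.
rank = {label.strip(): r for label, r in education_rank.items()}
education = adult_data["education"].str.strip()

# Ordered categorical: categories listed from lowest to highest rank.
ordered = pd.CategoricalDtype(sorted(rank, key=rank.get), ordered=True)

# sort_index on a categorical index follows the category order, not the alphabet.
fd_education = education.astype(ordered).value_counts().sort_index()
print(fd_education)
```

With the dtype set once, later sorts and group-bys on the column follow the same rank order without passing the dictionary around again.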
