Build a mask from each category's percentage of occurrences, i.e.:
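import numpy as np
import pandas as pd

series = df['column'].value_counts()          # pd.value_counts() is deprecated in recent pandas
mask = (series / series.sum() * 100).lt(1)    # True for categories under 1% of all rows
# To replace the values in df['column'], use np.where, i.e.:
df['column'] = np.where(df['column'].isin(series[mask].index), 'Other', df['column'])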
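A minimal, self-contained sketch of the column replacement, assuming a small hypothetical DataFrame (the column name "os" and its data are made up for illustration) and the same 1% threshold:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 rows, where 'BlackBerry' is a single row (0.5% of the total)
df = pd.DataFrame({"os": ["Windows"] * 120 + ["iOS"] * 79 + ["BlackBerry"] * 1})

# Count each category and flag those below 1% of all rows
series = df["os"].value_counts()
mask = (series / series.sum() * 100).lt(1)

# Replace every rare category in the column with 'Other'
df["os"] = np.where(df["os"].isin(series[mask].index), "Other", df["os"])

print(df["os"].value_counts())
```

Only 'BlackBerry' falls under 1% here, so it is the only value rewritten to 'Other'.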
To fold the rare categories into a single 'Other' row holding their combined count:
new = series[~mask].copy()          # keep the common categories
new['Other'] = series[mask].sum()   # add one row with the total of the rare ones
Windows 26083
iOS 19711
Android 13077
Macintosh 5799
Other 832
Name: 1, dtype: int64
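A runnable sketch of the summing step. The counts are reconstructed from the outputs in this answer (the counts listed here plus the percentages under Explanation, which all come from a total of 65502); the index labels and name=1 mirror that output:

```python
import pandas as pd

# Counts reconstructed from the printed outputs in this answer
series = pd.Series(
    [26083, 19711, 13077, 5799, 347, 285, 167, 22, 11],
    index=["Windows", "iOS", "Android", "Macintosh", "Chrome OS",
           "Linux", "Windows Phone", "(not set)", "BlackBerry"],
    name=1,
)

# Categories holding less than 1% of the total
mask = (series / series.sum() * 100).lt(1)

# Keep the common categories and fold the rare ones into a single 'Other' row
new = series[~mask].copy()
new["Other"] = series[mask].sum()
print(new)
```

The five rare categories (347 + 285 + 167 + 22 + 11) add up to the 832 shown for 'Other'.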
If you only want to relabel the index (each rare category stays as its own 'Other' row), then:
series.index = np.where(series.index.isin(series[mask].index),'Other',series.index)
Windows 26083
iOS 19711
Android 13077
Macintosh 5799
Other 347
Other 285
Other 167
Other 22
Other 11
Name: 1, dtype: int64
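Relabelling leaves several rows with the same 'Other' label. One way to merge them afterwards (a sketch, using counts reconstructed from the outputs in this answer) is to group on the index and sum, which arrives at the same result as the summing approach:

```python
import numpy as np
import pandas as pd

# Counts reconstructed from the printed outputs in this answer
series = pd.Series(
    [26083, 19711, 13077, 5799, 347, 285, 167, 22, 11],
    index=["Windows", "iOS", "Android", "Macintosh", "Chrome OS",
           "Linux", "Windows Phone", "(not set)", "BlackBerry"],
    name=1,
)
mask = (series / series.sum() * 100).lt(1)

# Relabel the rare categories; this leaves five rows labelled 'Other'
series.index = np.where(series.index.isin(series[mask].index), "Other", series.index)

# Merge the duplicate labels by summing them; sort=False keeps first-seen order
collapsed = series.groupby(level=0, sort=False).sum()
print(collapsed)
```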
Explanation
series / series.sum() * 100  # gives each category's percentage of the total, i.e.:
Windows 39.820158
iOS 30.092211
Android 19.964276
Macintosh 8.853165
Chrome OS 0.529755
Linux 0.435101
Windows Phone 0.254954
(not set) 0.033587
BlackBerry 0.016793
Name: 1, dtype: float64
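pandas can also produce these shares directly: value_counts(normalize=True) returns proportions that sum to 1, so multiplying by 100 gives the same percentages. A sketch with a small hypothetical column:

```python
import pandas as pd

# Hypothetical column: three 'Windows' values and one 'Linux'
s = pd.Series(["Windows", "Windows", "Windows", "Linux"])

counts = s.value_counts()
manual = counts / counts.sum() * 100           # percentage, computed as above
direct = s.value_counts(normalize=True) * 100  # same values via normalize=True

print(direct)
```

Either form can feed the .lt(1) mask; the manual version keeps the raw counts around for the summing step.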
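.lt(1) is equivalent to < 1 (less than 1), so it gives you a boolean mask that is True for every category under 1%. Use that mask to pick out the index labels of the rare categories, then assign 'Other' to them (or sum them into a single 'Other' row).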
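A small sketch of the mask itself, assuming hypothetical percentages shaped like the output above:

```python
import pandas as pd

# Hypothetical percentages, in the same shape as the output above
pct = pd.Series({"Windows": 39.8, "Linux": 0.4, "BlackBerry": 0.02})

mask = pct.lt(1)                # element-wise pct < 1
assert mask.equals(pct < 1)     # .lt(1) and < 1 give the same boolean Series

rare = pct[mask].index          # labels of the categories under 1%
print(list(rare))               # ['Linux', 'BlackBerry']
```

The method form is handy for chaining, as in (series / series.sum() * 100).lt(1).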