python - Cleaning messy observations while keeping information

Question

posted Feb 17, 2021 in Technique by 深蓝 (71.8m points)

I'm practising my importing and cleaning skills and have hit a snag. I've been importing from the Wikipedia page referenced in the code below. The import works and I have been able to drop the NaNs. However, some observations have a year appended in parentheses (for example, 13.7 (2016)). Because of this they are read in as strings, and even if they were parsed as numbers they would contain false information.

I want to get rid of the year observations which are in the parentheses but preserve the data observation itself.

At present here is my code:

import numpy as np
import pandas as pd

# Declare the markers that should be read as missing values
missing_values = ['?', np.nan]
# Read every table on the page into a list of DataFrames
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_firearm-related_death_rate', na_values=missing_values)
# Select the table of interest and drop the unneeded columns
df = dfs[3]
df = df.drop(columns=['Year', 'Undetermined', 'Sources and notes'])

print(df)
# pd.to_numeric(df['Guns per 100 inhabitants'])

Any help appreciated!

1 Reply

深蓝 · Answer 1 · 2021-02-16T17:34:24+0000

Bit of a workaround, but you could clean it up by splitting the string by a space and then taking the first entry.

df['Guns per 100 inhabitants (clean)'] = np.array([float(s.split(' ')[0]) for s in df['Guns per 100 inhabitants']])

I tried it out with your example and there are still some errors: one entry is formatted '6.2-19.4', which float() cannot parse, and some entries are already floats rather than strings, so s.split(' ') raises an AttributeError. But I think this solves the year-in-parentheses issue.
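One way to get around both of those remaining errors is to convert every entry to text first and then pull out the leading number with a regular expression. The following is only a minimal sketch, not a tested fix for the actual table: the sample Series `raw` and the helper `clean_numeric` are hypothetical names made up for illustration, and keeping the first number of a range such as '6.2-19.4' is an assumption about what you want (you may prefer to treat ranges as missing instead).

```python
import numpy as np
import pandas as pd

# Hypothetical sample that mimics the formats mentioned in this thread
raw = pd.Series(['13.7 (2016)', '6.2-19.4', 12.0, np.nan, '?', '120.5'])

def clean_numeric(series):
    """Keep the leading number of each entry and discard whatever follows it."""
    # Convert everything to text so entries that are already floats can be matched too
    text = series.astype(str)
    # Capture digits with an optional decimal part at the start of the entry;
    # entries with no leading number (such as '?') become missing
    leading = text.str.extract(r'^\s*(\d+(?:\.\d+)?)', expand=False)
    # Turn the captured text into floats; anything unparseable becomes NaN
    return pd.to_numeric(leading, errors='coerce')

cleaned = clean_numeric(raw)
print(cleaned)

# Assumed usage on the real table:
# df['Guns per 100 inhabitants'] = clean_numeric(df['Guns per 100 inhabitants'])
```

Because missing entries come back as NaN rather than raising, the column ends up numeric and dropna() can be applied afterwards if needed.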
