python - DataFrame.drop_duplicates and DataFrame.drop not removing rows

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have read a CSV into a pandas DataFrame with five columns. Certain rows have duplicate values only in the second column. I want to remove these rows from the DataFrame, but neither drop nor drop_duplicates is working.

Here is my implementation:

import pandas as pd

# Read the CSV
df = pd.read_csv(data_path, header=0, names=['a', 'b', 'c', 'd', 'e'])

print(df.b)

dropRows = []
# Sanitize the data to get rid of duplicates
for indx, val in enumerate(df.b):  # for all the values
    if indx == 0:  # skip the first index
        continue

    if val == df.b[indx-1]:  # this is a duplicate rtc value
        dropRows.append(indx)

print(dropRows)

df.drop(dropRows)  # this doesn't work
df.drop_duplicates('b')  # this doesn't work either

print(df.b)

When I print the series df.b before and after, they are the same length and I can still see the duplicates. Is there something wrong in my implementation?

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:52:29+0000

As mentioned in the comments, drop and drop_duplicates return a new DataFrame and leave the original unchanged unless they are called with inplace=True. Any one of these options would work:

df = df.drop(dropRows)
df = df.drop_duplicates('b')
df.drop(dropRows, inplace=True)
df.drop_duplicates('b', inplace=True)
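To make the difference concrete, here is a minimal sketch. The small DataFrame is made-up sample data standing in for the CSV from the question, and the variable names unchanged_len, deduped and dropped are only for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [10, 10, 20, 30, 30],
    "c": list("vwxyz"),
})

# Calling drop_duplicates without using its result leaves df unchanged.
df.drop_duplicates(subset="b")
unchanged_len = len(df)

# Assigning the returned DataFrame keeps the first row for each value of 'b'.
deduped = df.drop_duplicates(subset="b")

# drop behaves the same way: the rows are only gone in the returned object.
dropped = df.drop([1, 4])

print(unchanged_len)
print(deduped)
print(dropped)
```

Note that the loop in the question only flags a row when it matches the row directly before it, while drop_duplicates('b') removes every repeated value of b regardless of position, so the two approaches can give different results on data where duplicates are not adjacent.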
