a = [['John', 'Mary', 'John'], [10,22,50]])
df1 = pd.DataFrame(a, columns=['Name', 'Count'])
Given a data frame like this I want to compare all similar string values of "Name" against the "Count" value to determine the highest. I'm not sure how to do this in a dataframe in Python.
Ex: In the case above the Answer would be:
- Name Count
- Mary 22
- John 50
The lower value John 10 has been dropped (I only want to see the highest value of "Count" based on the same value for "Name").
In SQL it would be something like a Select Case query (wherein I select the Case where Name == Name & Count > Count recursively to determine the highest number. Or a For loop for each name, but as I understand loops in DataFrames is a bad idea due to the nature of the object.
Is there a way to do this with a DF in Python? I could create a new data frame with each variable (one with Only John and then get the highest value (df.value()[:1] or similar. But as I have many hundreds of unique entries that seems like a terrible solution. :D
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…