Problem
Given data in a Pandas DataFrame like the following:
Name Amount
---------------
Alice 100
Bob 50
Charlie 200
Alice 30
Charlie 10
I want to select all rows where the Name is one of several values in a collection, e.g. {Alice, Bob}:
Name Amount
---------------
Alice 100
Bob 50
Alice 30
Question
What is an efficient way to do this in Pandas?
Options as I see them
- Loop through rows, handling the logic with Python
- Select and combine many statements like the following
  pd.concat(df[df.Name == specific_name] for specific_name in names) # something like this
- Perform some sort of join
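To make the three options concrete, here is a rough sketch of what I have in mind for each. The variable names (`names`, `looped`, `concatenated`, `joined`) are just placeholders for illustration, and I am not claiming any of these is the right way to do it.

```python
import pandas as pd

# The example data from above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Alice", "Charlie"],
    "Amount": [100, 50, 200, 30, 10],
})
names = ["Alice", "Bob"]

# Option 1: loop through the values in Python and build a boolean mask
mask = [name in names for name in df["Name"]]
looped = df.loc[mask]

# Option 2: one boolean selection per name, concatenated,
# then sorted by index to restore the original row order
concatenated = pd.concat([df[df["Name"] == n] for n in names]).sort_index()

# Option 3: inner join against a one-column frame of the wanted names
# (note: merge returns a fresh index, so the original row labels are lost)
joined = df.merge(pd.DataFrame({"Name": names}), on="Name", how="inner")
```

All three select the same rows for this small example; what I don't know is how they compare once the data gets large.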
What are the performance trade-offs here? When is one solution better than the others? What solutions am I missing?
While the example above uses strings, my actual job matches on 10-100 integers over millions of rows, so fast NumPy operations may be relevant.
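A scaled-down sketch of the integer case, in case it helps frame answers. The column name `key`, the value range, and the `wanted` array are made up for illustration. The two membership tests shown (`Series.isin` and `np.isin`) are candidates I am aware of, not something I have benchmarked.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one million rows with an integer column named "key"
rng = np.random.default_rng(0)
df = pd.DataFrame({"key": rng.integers(0, 1000, size=1_000_000)})

# Hypothetical collection of 50 integers to match against
wanted = np.arange(0, 100, 2)

# Candidate 1: pandas membership test, returns a boolean Series
mask_pandas = df["key"].isin(wanted)

# Candidate 2: NumPy membership test on the underlying array
mask_numpy = np.isin(df["key"].to_numpy(), wanted)

# Either mask can be used to select the matching rows
subset = df[mask_numpy]
```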