Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Efficiently select rows that match one of several values in Pandas DataFrame

Problem

Given data in a Pandas DataFrame like the following:

Name     Amount
---------------
Alice       100
Bob          50
Charlie     200
Alice        30
Charlie      10

I want to select all rows where the Name is one of several values in a collection, for example {Alice, Bob}:

Name     Amount
---------------
Alice       100
Bob          50
Alice        30

Question

What is an efficient way to do this in Pandas?

Options as I see them

  1. Loop through rows, handling the logic with Python
  2. Select and merge many statements like the following

    pd.concat(df[df.Name == specific_name] for specific_name in names)  # something like this
    
  3. Perform some sort of join
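
For concreteness, the three options above might be sketched roughly as follows. This is only an illustration on the sample data: the variable names (`names`, `wanted_df`, `by_loop`, `by_concat`, `by_join`) are made up here, and `pd.concat` stands in for the "merge many statements" idea.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Alice", "Charlie"],
    "Amount": [100, 50, 200, 30, 10],
})
names = ["Alice", "Bob"]  # hypothetical collection of wanted values

# Option 1: loop through the rows in Python and build a boolean list.
wanted = set(names)
keep = [name in wanted for name in df["Name"]]
by_loop = df[keep]

# Option 2: one selection per value, concatenated, then put back in row order.
by_concat = pd.concat(df[df["Name"] == n] for n in names).sort_index()

# Option 3: inner join against a one-column frame of the wanted values.
# Note that merge returns a fresh 0..n-1 index rather than the original one.
wanted_df = pd.DataFrame({"Name": names})
by_join = df.merge(wanted_df, on="Name", how="inner")
```

All three select the same rows on this sample; they differ in how much work happens in Python versus inside pandas, and in whether the original index is kept.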

What are the performance trade-offs here? When is one solution better than the others? What solutions am I missing?

While the example above uses strings, my actual job matches on 10-100 integers over millions of rows, so fast NumPy operations may be relevant.

1 Reply

You can use the Series method isin, which returns a boolean mask marking the rows whose value is in the given collection:

In [11]: df['Name'].isin(['Alice', 'Bob'])
Out[11]: 
0     True
1     True
2    False
3     True
4    False
Name: Name, dtype: bool

In [12]: df[df.Name.isin(['Alice', 'Bob'])]
Out[12]: 
    Name  Amount
0  Alice     100
1    Bob      50
3  Alice      30
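
The question also mentions matching 10-100 integers over millions of rows. A minimal sketch of that case is below, assuming a synthetic frame with made-up column names (`key`, `amount`) and a made-up set of 50 wanted integers; it is not the asker's actual data. It builds the same mask once with `Series.isin` and once with `numpy.isin` on the underlying array.

```python
import numpy as np
import pandas as pd

# Synthetic data: one million rows with integer keys in [0, 1000).
rng = np.random.default_rng(0)
n_rows = 1_000_000
df_int = pd.DataFrame({
    "key": rng.integers(0, 1_000, size=n_rows),
    "amount": rng.integers(0, 500, size=n_rows),
})

# Hypothetical collection of 50 wanted integer values.
wanted = np.arange(0, 1_000, 20)

# Same membership test, expressed through pandas and through NumPy.
mask_pandas = df_int["key"].isin(wanted)
mask_numpy = np.isin(df_int["key"].to_numpy(), wanted)

# Either mask can be used to index the frame.
selected = df_int[mask_pandas]
```

Both calls are vectorized, so no Python-level loop runs over the rows. Which one is faster depends on the data and library versions, so timing them on your own frame (for example with `%timeit` in IPython) is the reliable way to compare.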
