Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Filter rows of DataFrame by list of tuples

Let's assume I have the following DataFrame:

import pandas as pd

dic = {'a': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
       'b': [1, 1, 1, 1, 2, 2, 1, 1, 2, 2],
       'c': ['f', 'f', 'f', 'e', 'f', 'f', 'f', 'e', 'f', 'f'],
       'd': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]}
df = pd.DataFrame(dic)

df
Out[10]: 
   a  b  c    d
0  1  1  f   10
1  1  1  f   20
2  2  1  f   30
3  2  1  e   40
4  2  2  f   50
5  2  2  f   60
6  3  1  f   70
7  3  1  e   80
8  3  2  f   90
9  3  2  f  100 

Next, I want to take the values of columns a and b from the rows where c == 'e', and use those (a, b) pairs to select every row of df that shares one of them (this should keep rows 2, 3, 6 and 7). The idea is to create a list of tuples and index df by that list:

list_tup = list(df.loc[df['c'] == 'e', ['a','b']].to_records(index=False))
df_new = df.set_index(['a', 'b']).sort_index()

df_new
Out[13]: 
     c    d
a b        
1 1  f   10
  1  f   20
2 1  f   30
  1  e   40
  2  f   50
  2  f   60
3 1  f   70
  1  e   80
  2  f   90
  2  f  100

list_tup
Out[14]: [(2, 1), (3, 1)]

df.loc[list_tup]

This results in a TypeError: unhashable type: 'writeable void-scalar', which I don't understand. Any suggestions? I'm pretty new to Python and pandas, so I assume I'm missing something fundamental.

Question from: https://stackoverflow.com/questions/65849246/filter-rows-of-dataframe-by-list-of-tuples

1 Reply

I believe it's better to use groupby().transform() with boolean indexing in this use case:

valids = (df['c'].eq('e')                # check whether `c` is 'e'
            .groupby([df['a'], df['b']]) # group by `a` and `b`
            .transform('any')            # True if any row in the group is True;
                                         # the result is broadcast to every row in the group
         )

# filter with boolean indexing
df[valids]

Output:

   a  b  c   d
2  2  1  f  30
3  2  1  e  40
6  3  1  f  70
7  3  1  e  80

A similar idea uses groupby().filter(), which is more readable but can be slightly slower:

df.groupby(['a','b']).filter(lambda x: x['c'].eq('e').any())
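
As for the TypeError itself: the likely cause is that to_records() yields NumPy record scalars rather than plain Python tuples, and those are not hashable even though they print like tuples. Note also that the question indexes df rather than the re-indexed df_new. If you do want to stay with the list-of-tuples idea, the following is a minimal sketch, assuming plain tuples built with itertuples() and a membership test on a MultiIndex built from columns a and b. The names pairs, mask and filtered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'b': [1, 1, 1, 1, 2, 2, 1, 1, 2, 2],
    'c': ['f', 'f', 'f', 'e', 'f', 'f', 'f', 'e', 'f', 'f'],
    'd': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# The (a, b) pairs of the rows where c == 'e'.
pairs = df.loc[df['c'] == 'e', ['a', 'b']]

# name=None makes itertuples() yield plain, hashable tuples
# instead of namedtuples or NumPy records.
list_tup = list(pairs.itertuples(index=False, name=None))

# Build a MultiIndex from columns a and b and test each row's
# (a, b) pair for membership in list_tup.
mask = pd.MultiIndex.from_frame(df[['a', 'b']]).isin(list_tup)

# Boolean indexing keeps the original row labels.
filtered = df[mask]
print(filtered)
```

This keeps df's original integer index, so the result lines up with the groupby() versions above.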
