python - Filter rows of DataFrame by list of tuples

Question

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

Let's assume I have the following DataFrame:

import pandas as pd

dic = {'a': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
       'b': [1, 1, 1, 1, 2, 2, 1, 1, 2, 2],
       'c': ['f', 'f', 'f', 'e', 'f', 'f', 'f', 'e', 'f', 'f'],
       'd': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]}
df = pd.DataFrame(dic)

df
Out[10]: 
   a  b  c    d
0  1  1  f   10
1  1  1  f   20
2  2  1  f   30
3  2  1  e   40
4  2  2  f   50
5  2  2  f   60
6  3  1  f   70
7  3  1  e   80
8  3  2  f   90
9  3  2  f  100

I want to take the values of columns a and b where c == 'e' and use those (a, b) pairs to select the matching rows of df (which should return rows 2, 3, 6 and 7). The idea is to create a list of tuples and index df by that list:

list_tup = list(df.loc[df['c'] == 'e', ['a','b']].to_records(index=False))
df_new = df.set_index(['a', 'b']).sort_index()

df_new
Out[13]: 
     c    d
a b        
1 1  f   10
  1  f   20
2 1  f   30
  1  e   40
  2  f   50
  2  f   60
3 1  f   70
  1  e   80
  2  f   90
  2  f  100

list_tup
Out[14]: [(2, 1), (3, 1)]

df_new.loc[list_tup]

This results in a TypeError: unhashable type: 'writeable void-scalar', which I don't understand. Any suggestions? I'm pretty new to Python and pandas, so I assume I'm missing something fundamental.

Question from: https://stackoverflow.com/questions/65849246/filter-rows-of-dataframe-by-list-of-tuples

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:29:33+0000

I believe it's better to use groupby().transform() together with boolean indexing in this use case:

valids = (df['c'].eq('e')                 # check whether `c` is 'e'
            .groupby([df['a'], df['b']])  # group by `a` and `b`
            .transform('any')             # True for every row of a group
                                          # that contains at least one 'e'
         )

# filter with boolean indexing
df[valids]

Output:

   a  b  c   d
2  2  1  f  30
3  2  1  e  40
6  3  1  f  70
7  3  1  e  80
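
To see why the mask selects whole groups, it may help to look at the intermediate steps. The following is a minimal, self-contained sketch that rebuilds the DataFrame from the question; the names `is_e` and `steps` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'b': [1, 1, 1, 1, 2, 2, 1, 1, 2, 2],
    'c': ['f', 'f', 'f', 'e', 'f', 'f', 'f', 'e', 'f', 'f'],
    'd': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Step 1: row-wise test, True only where c is 'e'
is_e = df['c'].eq('e')

# Step 2: broadcast "does this (a, b) group contain any True?" back to every row
valids = is_e.groupby([df['a'], df['b']]).transform('any')

# Side-by-side view of both masks (illustrative column names)
steps = df.assign(is_e=is_e, valids=valids)
print(steps)
```

`is_e` marks only the two rows where c is 'e', while `valids` extends that mark to the other rows sharing the same (a, b) pair, which is what makes plain boolean indexing sufficient.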

A similar idea uses groupby().filter(), which is more readable but can be slightly slower:

df.groupby(['a','b']).filter(lambda x: x['c'].eq('e').any())
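
As for the error in the question: as far as I can tell, it comes from to_records(), whose elements are numpy record objects rather than plain Python tuples, and those are not hashable. If you would rather keep the list-of-tuples idea, one possible sketch is to build real tuples with itertuples(index=False, name=None) and test membership with MultiIndex.isin. The names `keys`, `mask` and `result` are assumptions made for this example, not part of the original question or answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'b': [1, 1, 1, 1, 2, 2, 1, 1, 2, 2],
    'c': ['f', 'f', 'f', 'e', 'f', 'f', 'f', 'e', 'f', 'f'],
    'd': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Unique (a, b) pairs that have at least one row with c == 'e'
keys = df.loc[df['c'] == 'e', ['a', 'b']].drop_duplicates()

# Plain Python tuples (hashable), instead of the numpy records from to_records()
list_tup = list(keys.itertuples(index=False, name=None))

# Build a MultiIndex from the key columns and test each row's (a, b) pair
mask = pd.MultiIndex.from_frame(df[['a', 'b']]).isin(list_tup)
result = df[mask]
print(result)
```

This keeps the original integer index of df, so the selected rows carry the same labels as in the groupby-based output.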
