python - Efficient conditional selection with masks in very large dataframe

Question

Posted Jan 24, 2021 in Technique by 深蓝

I have a DataFrame with about 2 million rows like this:

                    dt   num
0  2019-05-12 10:17:00   135
1  2018-01-16 21:32:00     5
2  2017-11-30 22:29:00   135
3  2017-10-05 16:59:00    19
4  2017-08-07 05:26:00     5
5  2017-06-12 17:47:00    18

For each distinct value in column 'num', I need to find the corresponding minimum value of column 'dt'.

I am doing it with a list comprehension that applies a boolean mask for each value and then takes the minimum:

[(num_i, df[df.num == num_i].dt.min()) for num_i in set(df.num)]

It works, but it takes a very long time. Is there a faster way to solve it?

Oops... thanks to all! (@It_is_Chris, @papke, @paul-brennan). I was thinking of making a time comparison, but the suggested solution (groupby) solves it in seconds versus close to one hour.
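For reference, here is a minimal sketch of the groupby approach mentioned above, run on the six sample rows from the question. The DataFrame construction is an assumption that mirrors the printed table. The sketch also checks that groupby returns the same (num, min dt) pairs as the original list comprehension.

```python
import pandas as pd

# Sample rows from the question (the real frame has ~2 million rows).
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2019-05-12 10:17:00",
        "2018-01-16 21:32:00",
        "2017-11-30 22:29:00",
        "2017-10-05 16:59:00",
        "2017-08-07 05:26:00",
        "2017-06-12 17:47:00",
    ]),
    "num": [135, 5, 135, 19, 5, 18],
})

# Original approach: one boolean mask (a full scan of the frame) per distinct value.
slow = sorted((num_i, df[df.num == num_i].dt.min()) for num_i in set(df.num))

# groupby approach: all minima are computed in a single grouped pass.
mins = df.groupby("num")["dt"].min()  # Series indexed by num, sorted by key
fast = list(mins.items())             # same (num, min dt) tuple format

print(mins)
```

The comprehension scans every row once per distinct value, so its cost grows with (number of distinct values) × (number of rows). groupby avoids the repeated scans, which is consistent with the drop from about an hour to seconds reported above.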

深蓝 · Answer 1 · 2021-01-24T02:45:59+0000

@It_is_Chris was exactly right. If you have more cores available, you can also parallelize the job with the groupby-apply trick:

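from multiprocessing import Pool, cpu_count

import pandas as pd

def applyParallel(dfGrouped, func):
    # Send each group (a sub-DataFrame) to a worker process; func must be
    # picklable (defined at module level) and return a DataFrame or Series.
    with Pool(cpu_count()) as p:
        ret_list = p.map(func, [group for name, group in dfGrouped])
    # Stitch the per-group results back together.
    return pd.concat(ret_list)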

Then pass df.groupby(df['num']) as dfGrouped and define func to do whatever per-group work you need.
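Below is a minimal usage sketch on the sample rows from the question. min_dt is a hypothetical per-group function written for illustration. With the spawn and forkserver start methods, worker processes re-import the main module, so the driver code sits behind an if __name__ == "__main__": guard. min_dt is defined at module level so that it can be pickled.

```python
from multiprocessing import Pool, cpu_count

import pandas as pd


def applyParallel(dfGrouped, func):
    # Run func on each group in a separate worker process, then concatenate.
    with Pool(cpu_count()) as p:
        ret_list = p.map(func, [group for name, group in dfGrouped])
    return pd.concat(ret_list)


def min_dt(group):
    # Hypothetical per-group function: a one-row DataFrame holding the
    # group's key and its earliest timestamp.
    return pd.DataFrame({"num": [group["num"].iloc[0]], "dt": [group["dt"].min()]})


if __name__ == "__main__":
    # Sample rows from the question.
    df = pd.DataFrame({
        "dt": pd.to_datetime([
            "2019-05-12 10:17:00",
            "2018-01-16 21:32:00",
            "2017-11-30 22:29:00",
            "2017-10-05 16:59:00",
            "2017-08-07 05:26:00",
            "2017-06-12 17:47:00",
        ]),
        "num": [135, 5, 135, 19, 5, 18],
    })
    # Pool.map preserves input order, so results follow groupby's sorted keys.
    result = applyParallel(df.groupby(df["num"]), min_dt).reset_index(drop=True)
    print(result)
```

For a plain minimum, the built-in groupby min is already vectorized. The cost of pickling every group to a worker will likely outweigh any gain, so the pool is more useful when func does expensive per-group work.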
