Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
508 views
in Technique[技术] by (71.8m points)

pandas - How to group by a df in Python by a column with the difference between the max value of one column and the min of another column?

I have a data frame which looks like this:

student_id session_id reading_level_id st_week end_week
1 3334 3 3 3
1 3335 2 4 4
2 3335 2 2 2
2 3336 2 2 3
2 3337 2 3 3
2 3339 2 3 4

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Using the data you shared, a simpler solution is possible:

Group by student_id, and pass False argument to the as_index parameter (this works for a dataframe, and returns a dataframe);

Next, use a named aggregation to get the `max week for end week and the min week for st_week for each group

Get the difference between max_wk and end_wk

Finally, keep only the required columns

(
    df.groupby("student_id", as_index=False)
    .agg(max_wk=("end_week", "max"), min_wk=("st_week", "min"))
    .assign(Diff=lambda x: x["max_wk"] - x["min_wk"])
    .loc[:, ["student_id", "Diff"]]
)

    student_id  Diff
0          1    1
1          2    2

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...