I have the following dataframe:
   obj_id   data_date   value
0       4  2011-11-01   59500
1       2  2011-10-01   35200
2       4  2010-07-31   24860
3       1  2009-07-28   15860
4       2  2008-10-15  200200
I want to get a subset of this data so that I only have the most recent (largest 'data_date') 'value' for each 'obj_id'.
I've hacked together a solution, but it feels dirty. I was wondering if anyone has a better way. I'm sure I must be missing some easy way to do it through pandas.
My method is essentially to group, sort, retrieve, and recombine as follows:
from pandas import DataFrame

row_arr = []
for grp, grp_df in df.groupby('obj_id'):
    # sort each group newest-first and keep only its first row
    row_arr.append(grp_df.sort_values('data_date', ascending=False)[:1].values[0])
df_new = DataFrame(row_arr, columns=['obj_id', 'data_date', 'value'])
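For comparison, here is a minimal sketch of two loop-free ways this might be done, assuming a reasonably recent pandas and that 'data_date' is (or can be converted to) a datetime column. The sample frame is rebuilt from the table above. The names latest_idx and df_alt are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "obj_id": [4, 2, 4, 1, 2],
    "data_date": pd.to_datetime(
        ["2011-11-01", "2011-10-01", "2010-07-31", "2009-07-28", "2008-10-15"]
    ),
    "value": [59500, 35200, 24860, 15860, 200200],
})

# Option 1: for each obj_id, find the index label of the row with the
# largest data_date, then select those rows from the original frame.
latest_idx = df.groupby("obj_id")["data_date"].idxmax()
df_new = df.loc[latest_idx].reset_index(drop=True)

# Option 2: sort by date, then keep only the last (newest) row per obj_id.
df_alt = (
    df.sort_values("data_date")
      .drop_duplicates("obj_id", keep="last")
      .sort_values("obj_id")
      .reset_index(drop=True)
)

print(df_new)
```

Both variants keep one full row per 'obj_id'. If two rows of the same object share the latest date, each option picks just one of them, so ties may need separate handling.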