I have a complicated problem with pandas. I would like to calculate a cumulative sum of value ordered by the timestamp start_date. There is also an end_date: if it is later than 1970 it is taken into account, otherwise it gets subtracted from the sum.
Sample data:
import pandas as pd

# IDX_TOTAL is included so the frame matches the printout and the groupby further down
df = pd.DataFrame({
    'start_date': ['2014-09-18 14:46:58.563', '2015-04-18 07:10:31.365', '2014-09-18 14:46:58.563', '2014-12-18 08:41:32.466', '2015-04-18 08:00:00.000'],
    'end_date': ['2015-04-18 07:10:31.364', '1970-01-01 00:00:00.000', '1970-01-01 00:00:00.000', '2015-04-18 07:10:31.518', '1970-01-01 00:00:00.000'],
    'value': [2300, 2300, 2300, 2300, 2300],
    'IDX': [1, 1, 2, 2, 3],
    'IDX_TOTAL': [1, 1, 1, 1, 1],
})
                start_date                 end_date   value  IDX  IDX_TOTAL
0  2014-09-18 14:46:58.563  2015-04-18 07:10:31.364  2300.0    1          1
1  2015-04-18 07:10:31.365  1970-01-01 00:00:00.000  2300.0    1          1
2  2014-09-18 14:46:58.563  1970-01-01 00:00:00.000  2300.0    2          1
3  2014-12-18 08:41:32.466  2015-04-18 07:10:31.518  2300.0    2          1
4  2015-04-18 08:00:00.000  1970-01-01 00:00:00.000  2300.0    3          1
What I have tried:
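df["start_date"] = pd.to_datetime(df["start_date"])
df.sort_values("start_date", inplace=True)
# Copy of start_date to group on, so the original column stays available
df["start_date_2"] = df["start_date"]
# Last value per IDX_TOTAL and month, then a running sum (.iloc[-1] because x[-1] is a label lookup and raises KeyError)
df.groupby(['IDX_TOTAL', pd.Grouper(key='start_date_2', freq='M')])['value'].apply(lambda x: x.iloc[-1]).cumsum()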
What I would expect:
IDX_TOTAL  start_date         value
1          2014-09-18 14:46  4600.0
           2014-12-18 08:41  4600.0
           2015-04-18 07:10  4600.0
           2015-04-18 08:00  6900.0
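To make the expected numbers concrete, here is a minimal sketch of one reading that reproduces them. It rests on assumptions that are not confirmed anywhere in the problem statement: each new row for an IDX replaces the previous row for that IDX, so the total only changes by the difference between the new and the previous value, and end_date is not used at all. The helper columns delta and running are hypothetical names, and the IDX_TOTAL values are copied from the printed frame.

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": ["2014-09-18 14:46:58.563", "2015-04-18 07:10:31.365",
                   "2014-09-18 14:46:58.563", "2014-12-18 08:41:32.466",
                   "2015-04-18 08:00:00.000"],
    "end_date": ["2015-04-18 07:10:31.364", "1970-01-01 00:00:00.000",
                 "1970-01-01 00:00:00.000", "2015-04-18 07:10:31.518",
                 "1970-01-01 00:00:00.000"],
    "value": [2300, 2300, 2300, 2300, 2300],
    "IDX": [1, 1, 2, 2, 3],
    "IDX_TOTAL": [1, 1, 1, 1, 1],  # taken from the printed frame
})

df["start_date"] = pd.to_datetime(df["start_date"])
# Stable sort keeps the original order of rows that share a timestamp
df = df.sort_values("start_date", kind="stable")

# ASSUMPTION: a new row for an IDX replaces the previous row of the same IDX,
# so it changes the total by (new value - previous value); end_date is ignored.
previous = df.groupby("IDX")["value"].shift(fill_value=0)
df["delta"] = df["value"] - previous

# Running total of those changes within each IDX_TOTAL
df["running"] = df["delta"].groupby(df["IDX_TOTAL"]).cumsum()

# One row per (IDX_TOTAL, start_date): keep the total after all rows at that timestamp
out = df.groupby(["IDX_TOTAL", "start_date"])["running"].last()
print(out)
```

If the end_date is supposed to remove a value from the total at the moment it ends, this sketch is not sufficient, because it never looks at that column.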