I have a list of dataframes (each with a DatetimeIndex); the minimum spacing between two rows in each dataframe is 15 minutes. I would like to combine all the dataframes into one, grouped by day, using the mean, median, geometric mean and other methods. The problem is that some days contain no data in any of the dataframes. Some methods, like mean, ignore such days, but other methods raise an error. My question is: how can I remove such days before applying the method?
Data:
[ col1 col2 col3 col4
date
2020-02-03 08:00:00+00:00 3.616141 3.362717 1.627347 2.242732
2020-02-03 08:15:00+00:00 4.043727 3.749407 1.790467 2.272293
2020-02-03 08:30:00+00:00 3.872196 3.595969 1.729359 2.221447
... ... ... ... ...
2020-12-25 08:45:00+00:00 6.645853 1.352785 0.081961 4.112518
2020-12-25 09:30:00+00:00 6.066697 1.068805 0.058980 3.991505
[2204 rows x 6 columns],
...]
Data after aggregation with mean:
col1 col2 col3 col4
date
2020-02-02 00:00:00+00:00 4.636509 0.842644 0.069093 1.393849
2020-02-03 00:00:00+00:00 6.649390 1.077993 0.081713 1.798794
2020-02-04 00:00:00+00:00 5.765083 1.113354 0.097113 1.668112
2020-02-05 00:00:00+00:00 NaN NaN NaN NaN
2020-02-06 00:00:00+00:00 NaN NaN NaN NaN
... ... ... ... ...
As you can see, both days 02/05 and 02/06 have no data.
My code to aggregate with gstd, which raises an error:
import numpy as np
import pandas as pd
from scipy.stats import gstd
cols = ["col1", "col2", "col3", "col4"]
joined = pd.concat(df.reset_index() for df in datalist)
joined = joined.replace({np.nan:1, 0:1})
joined[cols] = joined[cols].mask(joined[cols] < 0, 1)
df = joined.set_index('date').groupby(pd.Grouper(freq='D'))
std = df.apply(gstd)
#std = df.agg(gstd)
The error:
ValueError: Degrees of freedom <= 0 for slice
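One way to approach this, as a rough sketch rather than a definitive answer: `pd.Grouper(freq='D')` creates a bin for every calendar day in the range, including days with no rows, whereas grouping by the index floored to the day only creates groups for days that actually occur. Since `gstd` uses `ddof=1` by default, days with a single row are likely to hit the same degrees-of-freedom problem, so the sketch below also drops those. The names `sample`, `geo_std` and `min_rows` are made up for illustration, and `geo_std` is a NumPy stand-in for `scipy.stats.gstd` so the example only depends on pandas and NumPy.

```python
import numpy as np
import pandas as pd


def geo_std(values, ddof=1):
    """Stand-in for scipy.stats.gstd: exp of the std of the logs (values must be > 0)."""
    logs = np.log(np.asarray(values, dtype=float))
    return float(np.exp(np.std(logs, ddof=ddof)))


# Hypothetical sample: 02-03 has three rows, 02-04 has one row,
# 02-05 has no rows at all, 02-06 has two rows.
idx = pd.to_datetime(
    [
        "2020-02-03 08:00", "2020-02-03 08:15", "2020-02-03 08:30",
        "2020-02-04 08:00",
        "2020-02-06 08:00", "2020-02-06 08:15",
    ],
    utc=True,
)
sample = pd.DataFrame(
    {
        "col1": [1.0, 2.0, 4.0, 5.0, 3.0, 3.0],
        "col2": [2.0, 2.0, 8.0, 1.0, 1.0, 4.0],
    },
    index=idx,
)
sample.index.name = "date"

# Group by the day each row falls on; days without rows never become groups.
day = sample.index.floor("D")

# Keep only rows belonging to days with enough observations for ddof=1.
min_rows = 2
kept = sample.groupby(day).filter(lambda g: len(g) >= min_rows)

# Aggregate the remaining days column by column.
daily_gstd = kept.groupby(kept.index.floor("D")).agg(geo_std)
print(daily_gstd)
```

With the real data, the same idea would apply to `joined.set_index('date')` in place of `sample`, and `gstd` could be passed to `agg` instead of the stand-in. For methods that simply return NaN on empty days (like mean), calling `.dropna(how="all")` on the aggregated result is a simpler alternative.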
question from:
https://stackoverflow.com/questions/65834408/removing-empty-rows-before-aggregation