Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Removing empty rows before aggregation

I have a list of dataframes, each with a DatetimeIndex; the minimum time between two rows in each dataframe is 15 minutes. I would like to combine all the dataframes into one, grouped by day, using the mean, median, geometric mean and other methods. The problem is that some days contain no data in any of the dataframes. Some methods, like mean, ignore that, but other methods raise an error. My question is: how can I remove such days before applying the method?

Data:

[                               col1      col2      col3      col4
 date
 2020-02-03 08:00:00+00:00  3.616141  3.362717  1.627347  2.242732
 2020-02-03 08:15:00+00:00  4.043727  3.749407  1.790467  2.272293
 2020-02-03 08:30:00+00:00  3.872196  3.595969  1.729359  2.221447
 ...                             ...       ...       ...       ...
 2020-12-25 08:45:00+00:00  6.645853  1.352785  0.081961  4.112518
 2020-12-25 09:30:00+00:00  6.066697  1.068805  0.058980  3.991505
 
 [2204 rows x 6 columns],
...]

Data after aggregation with mean:

                               col1      col2      col3      col4
date
2020-02-02 00:00:00+00:00  4.636509  0.842644  0.069093  1.393849
2020-02-03 00:00:00+00:00  6.649390  1.077993  0.081713  1.798794
2020-02-04 00:00:00+00:00  5.765083  1.113354  0.097113  1.668112
2020-02-05 00:00:00+00:00       NaN       NaN       NaN       NaN
2020-02-06 00:00:00+00:00       NaN       NaN       NaN       NaN
...                             ...       ...       ...       ...

As you can see, both days 02/05 and 02/06 have no data.
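
The empty days appear because pd.Grouper(freq='D') creates one bin for every calendar day between the first and last timestamp, whether or not any rows fall into it. A minimal sketch with made-up data (the frame `toy` and its values are illustrative only, not the data above) shows how to spot those bins:

```python
import pandas as pd

# Toy data: readings on 3 Feb and 6 Feb only; 4 Feb and 5 Feb have no rows.
idx = pd.to_datetime(
    ["2020-02-03 08:00", "2020-02-03 08:15", "2020-02-06 08:00", "2020-02-06 08:15"],
    utc=True,
)
toy = pd.DataFrame({"col1": [1.0, 2.0, 3.0, 4.0]}, index=idx)
toy.index.name = "date"

grouped = toy.groupby(pd.Grouper(freq="D"))

# size() counts the rows in each daily bin, including bins that are empty.
sizes = grouped.size()
empty_days = sizes[sizes == 0].index

# mean() returns NaN for the empty bins instead of raising.
daily_mean = grouped.mean()
```

Here `empty_days` holds the two days without rows, which are the same days that show up as NaN in `daily_mean`.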

My code to aggregate with gstd, which raises an error:

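import numpy as np
import pandas as pd
from scipy.stats import gstd

cols = ["col1", "col2", "col3", "col4"]
# datalist is the list of dataframes described above
joined = pd.concat(df.reset_index() for df in datalist)
# gstd needs strictly positive values, so NaN, 0 and negatives are replaced with 1
joined = joined.replace({np.nan: 1, 0: 1})
joined[cols] = joined[cols].mask(joined[cols] < 0, 1)

# one group per calendar day, including days that have no rows
df = joined.set_index('date').groupby(pd.Grouper(freq='D'))

std = df.apply(gstd)
#std = df.agg(gstd)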

The error:

ValueError: Degrees of freedom <= 0 for slice


1 Reply


Have you tried dropna()?

df.dropna()

This drops every row that contains at least one null value.
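
Note that dropna() only helps for methods such as mean that return NaN for an empty day; it can be called on the aggregated result. In the question's code `df` is a GroupBy object, and the error is raised during the aggregation itself, so the empty days have to be excluded before the method runs. One possible approach, sketched below under some assumptions, is to group on the normalized timestamp instead of pd.Grouper, so that only days with rows become groups, and to keep only days with at least two rows, since a standard deviation with ddof=1 needs two observations. The data is made up, and `geo_std` is a hypothetical NumPy stand-in that uses the same formula as scipy.stats.gstd, exp(std(log(x))) with ddof=1, so the sketch runs without SciPy; gstd itself can be passed to agg in its place.

```python
import numpy as np
import pandas as pd


def geo_std(values, ddof=1):
    """Geometric standard deviation, exp(std(log(x))); stand-in for gstd."""
    logs = np.log(np.asarray(values, dtype=float))
    return float(np.exp(np.std(logs, ddof=ddof)))


# Illustrative data: two rows on 3 Feb, one row on 4 Feb,
# no rows on 5-6 Feb, two rows on 7 Feb.
idx = pd.to_datetime(
    [
        "2020-02-03 08:00", "2020-02-03 08:15",
        "2020-02-04 08:00",
        "2020-02-07 08:00", "2020-02-07 08:15",
    ],
    utc=True,
)
joined = pd.DataFrame(
    {"col1": [1.0, 2.0, 3.0, 4.0, 8.0], "col2": [2.0, 2.5, 1.0, 5.0, 6.0]},
    index=idx,
)
joined.index.name = "date"

# normalize() maps each timestamp to midnight of its day, so grouping on it
# only creates groups for days that actually contain rows.
days = joined.index.normalize()

# Keep the rows of days with at least 2 observations (required for ddof=1).
kept = joined.groupby(days).filter(lambda g: len(g) >= 2)

# Aggregate per day; every remaining group is safe for geo_std.
result = kept.groupby(kept.index.normalize()).agg(geo_std)
```

The result has one row per remaining day (3 Feb and 7 Feb here). If the full calendar is needed afterwards, `result.asfreq('D')` should put the missing days back as NaN rows.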

