Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
929 views
in Technique[技术] by (71.8m points)

python - pandas fill missing dates in time series

I have a dataframe which has aggregated data for some days. I want to add in the missing days

I was following another post, Add missing dates to pandas dataframe, unfortunately, it overwrote my results (maybe functionality was changed slightly?)... the code is below

import random
import datetime as dt
import numpy as np
import pandas as pd

def generate_row(year, month, day):
    while True:
        date = dt.datetime(year=year, month=month, day=day)
        data = np.random.random(size=4)
        yield [date] + list(data)

# days I have data for
dates = [(2000, 1, 1), (2000, 1, 2), (2000, 2, 4)]
generators = [generate_row(*date) for date in dates]

# get 5 data points for each
data = [next(generator) for generator in generators for _ in range(5)]

df = pd.DataFrame(data, columns=['date'] + ['f'+str(i) for i in range(1,5)])

# df
groupby_day = df.groupby(pd.PeriodIndex(data=df.date, freq='D'))
results = groupby_day.sum()

idx = pd.date_range(min(df.date), max(df.date))
results.reindex(idx, fill_value=0)

Results before filling in missing date indices
enter image description here

Results after
enter image description here

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You need to use period_range rather than date_range:

In [11]: idx = pd.period_range(min(df.date), max(df.date))
    ...: results.reindex(idx, fill_value=0)
    ...:
Out[11]:
                  f1        f2        f3        f4
2000-01-01  2.049157  1.962635  2.756154  2.224751
2000-01-02  2.675899  2.587217  1.540823  1.606150
2000-01-03  0.000000  0.000000  0.000000  0.000000
2000-01-04  0.000000  0.000000  0.000000  0.000000
2000-01-05  0.000000  0.000000  0.000000  0.000000
2000-01-06  0.000000  0.000000  0.000000  0.000000
2000-01-07  0.000000  0.000000  0.000000  0.000000
2000-01-08  0.000000  0.000000  0.000000  0.000000
2000-01-09  0.000000  0.000000  0.000000  0.000000
2000-01-10  0.000000  0.000000  0.000000  0.000000
2000-01-11  0.000000  0.000000  0.000000  0.000000
2000-01-12  0.000000  0.000000  0.000000  0.000000
2000-01-13  0.000000  0.000000  0.000000  0.000000
2000-01-14  0.000000  0.000000  0.000000  0.000000
2000-01-15  0.000000  0.000000  0.000000  0.000000
2000-01-16  0.000000  0.000000  0.000000  0.000000
2000-01-17  0.000000  0.000000  0.000000  0.000000
2000-01-18  0.000000  0.000000  0.000000  0.000000
2000-01-19  0.000000  0.000000  0.000000  0.000000
2000-01-20  0.000000  0.000000  0.000000  0.000000
2000-01-21  0.000000  0.000000  0.000000  0.000000
2000-01-22  0.000000  0.000000  0.000000  0.000000
2000-01-23  0.000000  0.000000  0.000000  0.000000
2000-01-24  0.000000  0.000000  0.000000  0.000000
2000-01-25  0.000000  0.000000  0.000000  0.000000
2000-01-26  0.000000  0.000000  0.000000  0.000000
2000-01-27  0.000000  0.000000  0.000000  0.000000
2000-01-28  0.000000  0.000000  0.000000  0.000000
2000-01-29  0.000000  0.000000  0.000000  0.000000
2000-01-30  0.000000  0.000000  0.000000  0.000000
2000-01-31  0.000000  0.000000  0.000000  0.000000
2000-02-01  0.000000  0.000000  0.000000  0.000000
2000-02-02  0.000000  0.000000  0.000000  0.000000
2000-02-03  0.000000  0.000000  0.000000  0.000000
2000-02-04  1.856158  2.892620  2.986166  2.793448

This is because your groupby uses PeriodIndex, rather than datetime:

df.groupby(pd.PeriodIndex(data=df.date, freq='D'))

You could have instead used a pd.Grouper:

df.groupby(pd.Grouper(key="date", freq='D'))

which would have give a datetime index.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...