python - Fill in missing hours in a pandas dataframe

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have a dataframe that contains hourly data:

area     date         hour      output
H1       2018-07-01   07:00:00  150
H1       2018-07-01   08:00:00  150
H1       2018-07-01   09:00:00  100
H1       2018-07-01   11:00:00  150
H2       2018-07-01   09:00:00  100
H2       2018-07-01   10:00:00   50
H2       2018-07-01   11:00:00   50
H2       2018-07-01   12:00:00  150

However, the data only contains rows for the hours when there was output. How can I fill in the missing hours for each area with an output of 0? For example, add two rows for H1:

area     date         hour      output
H1       2018-07-01   10:00:00  0
H1       2018-07-01   12:00:00  0

I can assume that the min and max hour across all areas are the beginning and end of the sample period (in this case 07:00:00 and 12:00:00).

Right now, I'm creating a dataframe containing all the hours from 7:00 to 12:00 for each area and then doing a merge of my data with that dataframe, and then filling the NaN with 0s. This is very slow as my data set can have millions of rows.
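For reference, the merge-based approach described above might look roughly like the sketch below. It is a reconstruction under assumptions, not the exact code in use: the helper column `ts` and the names `hours`, `scaffold` and `merged` are made up for illustration, and the sample data is copied from the table in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "area": ["H1"] * 4 + ["H2"] * 4,
    "date": ["2018-07-01"] * 8,
    "hour": ["07:00:00", "08:00:00", "09:00:00", "11:00:00",
             "09:00:00", "10:00:00", "11:00:00", "12:00:00"],
    "output": [150, 150, 100, 150, 100, 50, 50, 150],
})

# One timestamp per row ("ts" is a hypothetical helper column)
df["ts"] = pd.to_datetime(df["date"] + " " + df["hour"])

# Every hour of the sample period, shared by all areas
# (a Timedelta is used for the frequency to avoid version-specific aliases)
hours = pd.date_range(df["ts"].min(), df["ts"].max(), freq=pd.Timedelta(hours=1))

# Scaffold dataframe with one row per (area, hour) pair
scaffold = pd.MultiIndex.from_product(
    [df["area"].unique(), hours], names=["area", "ts"]
).to_frame(index=False)

# Left-merge the real data onto the scaffold; unmatched hours become NaN, then 0
merged = scaffold.merge(df[["area", "ts", "output"]], on=["area", "ts"], how="left")
merged["output"] = merged["output"].fillna(0).astype(int)
```

With the sample data this yields 12 rows (2 areas x 6 hours), four of which have output 0.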

Is there any better way of doing this?

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:32:00+0000

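You can build an hourly date range from the minimum to the maximum timestamp, merge it with your existing dataframe, and fill the missing values with 0. Note that the merge is on the timestamp only, not per area, so only hours missing from the whole dataframe are added, and the filled row gets area 0.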

df

  area                 date      hour  output
0   H1  2018-07-01 07:00:00  07:00:00     150
1   H1  2018-07-01 08:00:00  08:00:00     150
2   H1  2018-07-01 09:00:00  09:00:00     100
6   H2  2018-07-01 11:00:00  11:00:00      50
7   H2  2018-07-01 12:00:00  12:00:00     150

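import pandas as pd
# Assumes 'date' and 'hour' are strings such as '2018-07-01' and '07:00:00', as in the question
df['date'] = pd.to_datetime(df['date'] + ' ' + df['hour'])  # one timestamp per row, used as the merge key
hours = pd.DataFrame({'date': pd.date_range(df['date'].min(), df['date'].max(), freq='H')})  # every hour in the range
df = hours.merge(df, on='date', how='outer').fillna(0)  # hours with no data get 0 in every column
df['hour'] = df['date'].dt.strftime('%H:%M:%S')
df['date'] = df['date'].dt.strftime('%d-%m-%Y')
df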

Out:

         date area      hour  output
0  01-07-2018   H1  07:00:00   150.0
1  01-07-2018   H1  08:00:00   150.0
2  01-07-2018   H1  09:00:00   100.0
3  01-07-2018    0  10:00:00     0.0
4  01-07-2018   H2  11:00:00    50.0
5  01-07-2018   H2  12:00:00   150.0
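In the output above the filled row has area 0, because the hours are filled for the dataframe as a whole rather than for each area. If every area should get every hour, one possible variant is to reindex on an (area, hour) MultiIndex instead of merging. This is only a sketch under assumptions: the names `ts`, `hours`, `full_index` and `filled` are illustrative, the sample data is taken from the question, and whether it is faster on millions of rows would need to be measured.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "area": ["H1"] * 4 + ["H2"] * 4,
    "date": ["2018-07-01"] * 8,
    "hour": ["07:00:00", "08:00:00", "09:00:00", "11:00:00",
             "09:00:00", "10:00:00", "11:00:00", "12:00:00"],
    "output": [150, 150, 100, 150, 100, 50, 50, 150],
})

# Combine the date and hour strings into one timestamp per row
ts = pd.to_datetime(df["date"] + " " + df["hour"])

# Every hour between the overall min and max (the assumed sample period)
hours = pd.date_range(ts.min(), ts.max(), freq=pd.Timedelta(hours=1))

# Target index: every area paired with every hour
full_index = pd.MultiIndex.from_product(
    [df["area"].unique(), hours], names=["area", "ts"]
)

# Reindex the output series onto the full index; missing pairs get 0
# (this requires each (area, timestamp) pair to occur at most once)
filled = (
    df.assign(ts=ts)
      .set_index(["area", "ts"])["output"]
      .reindex(full_index, fill_value=0)
      .reset_index()
)

# Restore the original column layout
filled["date"] = filled["ts"].dt.strftime("%Y-%m-%d")
filled["hour"] = filled["ts"].dt.strftime("%H:%M:%S")
filled = filled[["area", "date", "hour", "output"]]
```

For the sample data, `filled` has 12 rows, and the H1 rows with output 0 are the 10:00:00 and 12:00:00 hours asked for in the question. Passing `fill_value=0` to `reindex` keeps `output` as an integer column, whereas a merge followed by `fillna(0)` converts it to float.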
