I have a dataframe that contains hourly data:
area date hour output
H1 2018-07-01 07:00:00 150
H1 2018-07-01 08:00:00 150
H1 2018-07-01 09:00:00 100
H1 2018-07-01 11:00:00 150
H2 2018-07-01 09:00:00 100
H2 2018-07-01 10:00:00 50
H2 2018-07-01 11:00:00 50
H2 2018-07-01 12:00:00 150
However, the data only contains rows for the hours when there was output. How can I fill in the missing hours for each area with an output of 0? For example, for H1 this would add two rows:
area date hour output
H1 2018-07-01 10:00:00 0
H1 2018-07-01 12:00:00 0
I can assume that the min and max hours across all areas mark the beginning and end of the sample period (in this case 07:00:00 and 12:00:00).
Right now, I create a dataframe containing every hour from 07:00 to 12:00 for each area, merge my data with it, and then fill the resulting NaN values with 0. This is very slow because my data set can have millions of rows.
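For reference, the merge-based approach described above looks roughly like the sketch below. The combined timestamp column `ts` and the variable names `grid` and `filled` are assumptions made for illustration, and the sample frame is rebuilt from the table above.

```python
import pandas as pd

# Sample data from the table above.
df = pd.DataFrame({
    "area": ["H1"] * 4 + ["H2"] * 4,
    "date": ["2018-07-01"] * 8,
    "hour": ["07:00:00", "08:00:00", "09:00:00", "11:00:00",
             "09:00:00", "10:00:00", "11:00:00", "12:00:00"],
    "output": [150, 150, 100, 150, 100, 50, 50, 150],
})

# Combine date and hour into one timestamp column (hypothetical name "ts").
df["ts"] = pd.to_datetime(df["date"] + " " + df["hour"])

# Every hour of the sample period, from the overall min to the overall max.
hours = pd.date_range(df["ts"].min(), df["ts"].max(), freq=pd.Timedelta(hours=1))

# One row per (area, hour) combination.
grid = pd.MultiIndex.from_product(
    [df["area"].unique(), hours], names=["area", "ts"]
).to_frame(index=False)

# Left-merge the real data onto the grid; hours without data become NaN.
filled = grid.merge(df[["area", "ts", "output"]], on=["area", "ts"], how="left")

# Replace the NaN values with 0 and restore the integer dtype.
filled["output"] = filled["output"].fillna(0).astype(int)
```

On the sample data this yields 12 rows (2 areas times 6 hours), including the two extra H1 rows at 10:00 and 12:00.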
Is there any better way of doing this?
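One alternative that might be worth timing is to reindex against the full (area, hour) index instead of merging. The sketch below has not been benchmarked on large data, so whether it is actually faster is an open question. It assumes each (area, timestamp) pair appears at most once, and the names `ts`, `full_index` and `filled` are hypothetical.

```python
import pandas as pd

# Sample data from the table above.
df = pd.DataFrame({
    "area": ["H1"] * 4 + ["H2"] * 4,
    "date": ["2018-07-01"] * 8,
    "hour": ["07:00:00", "08:00:00", "09:00:00", "11:00:00",
             "09:00:00", "10:00:00", "11:00:00", "12:00:00"],
    "output": [150, 150, 100, 150, 100, 50, 50, 150],
})

# Combine date and hour into one timestamp column (hypothetical name "ts").
df["ts"] = pd.to_datetime(df["date"] + " " + df["hour"])

# Full index: every area crossed with every hour of the sample period.
hours = pd.date_range(df["ts"].min(), df["ts"].max(), freq=pd.Timedelta(hours=1))
full_index = pd.MultiIndex.from_product(
    [df["area"].unique(), hours], names=["area", "ts"]
)

# Reindex the output series onto the full index, filling missing hours with 0.
# This requires the (area, ts) pairs in the data to be unique.
filled = (
    df.set_index(["area", "ts"])["output"]
      .reindex(full_index, fill_value=0)
      .reset_index()
)
```

Because `fill_value=0` is supplied during the reindex, no NaN values are created and the `output` column keeps its integer dtype.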