Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

datetime - How to insert new rows in pandas based on a time-difference criterion

I have the following DataFrame:

  Matricule Startdate   Starthour   Enddate     Endhour
0   5357    2019-01-08  14:21:06    2019-01-08  14:34:42
1   5357    2019-01-08  15:29:23    2019-01-08  15:33:43
2   5357    2019-01-08  19:51:11    2019-01-08  20:02:48
3   5357    2019-03-08  20:05:49    2019-03-08  21:04:52
4   aaaa    2019-01-08  14:17:51    2019-01-08  14:32:10
5   aaaa    2019-01-08  18:21:16    2019-01-08  18:39:26

I am trying to build a table in which a new row is inserted between two consecutive rows whenever the gap between the end (arrival) time of the first row and the start (departure) time of the second is greater than 30 minutes. The inserted row has the same properties as the previous row. Here is an example of the expected result:

  Matricule Startdate   Starthour   Enddate     Endhour
0   5357    2019-01-08  14:21:06    2019-01-08  14:34:42
1   5357    2019-01-08  14:34:42    2019-01-08  15:04:42
2   5357    2019-01-08  15:29:23    2019-01-08  15:33:43
3   5357    2019-01-08  15:33:43    2019-01-08  16:03:43
4   5357    2019-01-08  19:51:11    2019-01-08  20:02:48
5   5357    2019-03-08  20:05:49    2019-03-08  21:04:52
6   aaaa    2019-01-08  14:17:51    2019-01-08  14:32:10
7   aaaa    2019-01-08  14:32:10    2019-01-08  15:02:10
8   aaaa    2019-01-08  18:21:16    2019-01-08  18:39:26

Question from: https://stackoverflow.com/questions/65901875/how-to-insert-new-line-in-pandas-on-hour-differences-criteria
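As a minimal sketch of the rule itself for one pair of consecutive records, using only the standard library: the 30-minute length of the inserted row is an assumption inferred from the expected output (the question does not state it), and the helper name `filler_for` is hypothetical.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(minutes=30)  # threshold stated in the question
FILLER = timedelta(minutes=30)   # assumed length of an inserted row (inferred from the example)


def filler_for(prev_end, next_start):
    """Hypothetical helper: return (start, end) of the row to insert, or None."""
    if next_start - prev_end > MIN_GAP:
        # The inserted row begins exactly when the previous record ends.
        return prev_end, prev_end + FILLER
    return None


fmt = "%Y-%m-%d %H:%M:%S"
prev_end = datetime.strptime("2019-01-08 14:34:42", fmt)    # end of row 0
next_start = datetime.strptime("2019-01-08 15:29:23", fmt)  # start of row 1
print(filler_for(prev_end, next_start))
```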

1 Reply

First, combine each date column with its hour column into a single datetime column:

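import pandas as pd

# Combine each date column with its hour column into one datetime column.
# .astype(str) makes the concatenation work whether the parts are strings or date/time objects.
df['start'] = pd.to_datetime(df['Startdate'].astype(str) + " " + df['Starthour'].astype(str))
df['end'] = pd.to_datetime(df['Enddate'].astype(str) + " " + df['Endhour'].astype(str))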
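If you want the parse to be explicit, `pd.to_datetime` accepts a `format` string; with it, a value that does not match raises an error instead of being guessed. A minimal sketch on two of the sample rows, assuming the dates are year-month-day (the question does not say; read as year-day-month, 2019-01-08 and 2019-03-08 would be 1 August and 3 August):

```python
import pandas as pd

df = pd.DataFrame({
    "Startdate": ["2019-01-08", "2019-03-08"],
    "Starthour": ["14:21:06", "20:05:49"],
})

# Assumption: dates are year-month-day. If yours are year-day-month,
# use "%Y-%d-%m %H:%M:%S" instead.
df["start"] = pd.to_datetime(
    df["Startdate"].astype(str) + " " + df["Starthour"].astype(str),
    format="%Y-%m-%d %H:%M:%S",
)
print(df["start"].tolist())
```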

Next, sort by Matricule and start time, then compute the gap between each record's end and the next record's start:

df = df.sort_values(['Matricule','start'])
df['gap_to_next'] = (df['start'].shift(-1) - df['end'])
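To see why the next step is needed, here is the gap computation on three records of the sample, typed in by hand (a sketch, not the full data). The second gap compares the last 5357 record with the first aaaa record and comes out negative:

```python
import pandas as pd

# Three records from the question, typed in by hand.
df = pd.DataFrame({
    "Matricule": ["5357", "5357", "aaaa"],
    "start": pd.to_datetime(["2019-01-08 14:21:06", "2019-01-08 15:29:23", "2019-01-08 14:17:51"]),
    "end": pd.to_datetime(["2019-01-08 14:34:42", "2019-01-08 15:33:43", "2019-01-08 14:32:10"]),
})

df = df.sort_values(["Matricule", "start"])
# shift(-1) moves each row's *next* start up one position, so this is
# "next record's start minus this record's end".
df["gap_to_next"] = df["start"].shift(-1) - df["end"]
print(df[["Matricule", "gap_to_next"]])
```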

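The shift also pairs the last record of each Matricule with the first record of the next Matricule, which is meaningless, so clear the gap on those rows: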

cut = df['Matricule'] != df['Matricule'].shift(-1)
# NaT is the missing value for timedeltas; comparisons with it evaluate to False.
df.loc[cut, 'gap_to_next'] = pd.NaT
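An alternative (a swap, not what this answer does): shifting within each Matricule group with `groupby(...).shift(-1)` leaves NaT at the end of every group on its own, which makes the `cut` step unnecessary. A sketch on three hand-typed records of the sample:

```python
import pandas as pd

df = pd.DataFrame({
    "Matricule": ["5357", "5357", "aaaa"],
    "start": pd.to_datetime(["2019-01-08 14:21:06", "2019-01-08 15:29:23", "2019-01-08 14:17:51"]),
    "end": pd.to_datetime(["2019-01-08 14:34:42", "2019-01-08 15:33:43", "2019-01-08 14:32:10"]),
})
df = df.sort_values(["Matricule", "start"])

# Shifting inside each Matricule group leaves NaT on the last row of every
# group, so no record is ever compared with a different Matricule.
next_start = df.groupby("Matricule")["start"].shift(-1)
df["gap_to_next"] = next_start - df["end"]
print(df[["Matricule", "gap_to_next"]])
```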

Define a boolean Series that marks the rows after which a new row should be inserted. Besides your 30-minute threshold, I also required the gap to be less than one day, because your sample seems to imply it: no row is inserted after the 5357 record ending 2019-01-08 20:02:48, whose next record is on 2019-03-08. Adjust as needed:

should_insert_next = ( (df['gap_to_next'] > pd.Timedelta(30, 'min')) & (df['gap_to_next'] < pd.Timedelta(24, 'hr')) )
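A quick check of the mask on the gaps the 5357 records produce (computed by hand from the question's table, assuming year-month-day dates) plus a NaT, which is what the last record of each Matricule gets. Comparisons with NaT evaluate to False, so those rows never receive an inserted row:

```python
import pandas as pd

gaps = pd.Series(
    [
        pd.Timedelta(minutes=54, seconds=41),           # 14:34:42 -> 15:29:23
        pd.Timedelta(hours=4, minutes=17, seconds=28),  # 15:33:43 -> 19:51:11
        pd.Timedelta(days=59, minutes=3, seconds=1),    # 2019-01-08 20:02:48 -> 2019-03-08 20:05:49
        pd.NaT,                                         # last 5357 record
    ],
    dtype="timedelta64[ns]",
)

# Same condition as the answer: more than 30 minutes but less than one day.
should_insert_next = (gaps > pd.Timedelta(minutes=30)) & (gaps < pd.Timedelta(hours=24))
print(should_insert_next.tolist())
```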

Make a copy of only those rows:

new_rows = df[should_insert_next].copy()

Using those rows as templates, set the times of the inserted rows. Judging by your example, each new row starts where its template ends and lasts 30 minutes:

new_rows['start'] = new_rows['end']
new_rows['end'] = new_rows['start'] + pd.Timedelta(30, 'min')
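The two assignments have to run in this order, because `end` is derived from the already-updated `start`. A one-row sketch using the first 5357 record of the sample as the template:

```python
import pandas as pd

# One template row: the first 5357 record of the question.
new_rows = pd.DataFrame({
    "Matricule": ["5357"],
    "start": pd.to_datetime(["2019-01-08 14:21:06"]),
    "end": pd.to_datetime(["2019-01-08 14:34:42"]),
})

# Overwrite start first; end is then computed from the *new* start.
new_rows["start"] = new_rows["end"]
new_rows["end"] = new_rows["start"] + pd.Timedelta(minutes=30)
print(new_rows)
```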

Then rebuild the date and hour columns of the new rows from the new timestamps. This writes them as strings; if your original date and hour columns held another type, add a step afterwards to convert them back:

new_rows['Startdate'] = new_rows['start'].dt.strftime("%Y-%m-%d")
new_rows['Enddate'] = new_rows['end'].dt.strftime("%Y-%m-%d")
new_rows['Starthour'] = new_rows['start'].dt.strftime("%H:%M:%S")
new_rows['Endhour'] = new_rows['end'].dt.strftime("%H:%M:%S")
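One way to handle the non-string case, assuming the original columns held `datetime.date` and `datetime.time` objects (the question does not say): the `.dt.date` and `.dt.time` accessors return those types directly. This sketch builds both variants side by side for one inserted row:

```python
import pandas as pd

new_rows = pd.DataFrame({
    "start": pd.to_datetime(["2019-01-08 14:34:42"]),
    "end": pd.to_datetime(["2019-01-08 15:04:42"]),
})

# Variant 1: string columns, as in the question's sample.
as_strings = pd.DataFrame({
    "Startdate": new_rows["start"].dt.strftime("%Y-%m-%d"),
    "Starthour": new_rows["start"].dt.strftime("%H:%M:%S"),
    "Enddate": new_rows["end"].dt.strftime("%Y-%m-%d"),
    "Endhour": new_rows["end"].dt.strftime("%H:%M:%S"),
})

# Variant 2 (assumption): datetime.date / datetime.time object columns.
as_objects = pd.DataFrame({
    "Startdate": new_rows["start"].dt.date,
    "Starthour": new_rows["start"].dt.time,
    "Enddate": new_rows["end"].dt.date,
    "Endhour": new_rows["end"].dt.time,
})
print(as_strings)
print(as_objects)
```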

Finally, concatenate the original and new rows, re-sort them, drop the helper columns, and reset the index:

final = pd.concat([df, new_rows])
final = final.sort_values(['Matricule','start'])
final = final.drop(columns=['gap_to_next','start','end'])
final = final.reset_index(drop=True)
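`pd.concat` keeps the original index labels, so the combined frame has duplicate labels until `reset_index(drop=True)` renumbers it. A small sketch with two hand-typed records and one filler row shows the filler landing between them after the sort:

```python
import pandas as pd

df = pd.DataFrame({
    "Matricule": ["5357", "5357"],
    "start": pd.to_datetime(["2019-01-08 14:21:06", "2019-01-08 15:29:23"]),
    "end": pd.to_datetime(["2019-01-08 14:34:42", "2019-01-08 15:33:43"]),
})
new_rows = pd.DataFrame({
    "Matricule": ["5357"],
    "start": pd.to_datetime(["2019-01-08 14:34:42"]),
    "end": pd.to_datetime(["2019-01-08 15:04:42"]),
})

stacked = pd.concat([df, new_rows])  # index labels are now 0, 1, 0
# Sorting by start puts each filler row right after the record it was copied from.
final = stacked.sort_values(["Matricule", "start"]).reset_index(drop=True)
print(final)
```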

That gave:

print(final)
  Matricule   Startdate Starthour     Enddate   Endhour
0      5357  2019-01-08  14:21:06  2019-01-08  14:34:42
1      5357  2019-01-08  14:34:42  2019-01-08  15:04:42
2      5357  2019-01-08  15:29:23  2019-01-08  15:33:43
3      5357  2019-01-08  15:33:43  2019-01-08  16:03:43
4      5357  2019-01-08  19:51:11  2019-01-08  20:02:48
5      5357  2019-03-08  20:05:49  2019-03-08  21:04:52
6      aaaa  2019-01-08  14:17:51  2019-01-08  14:32:10
7      aaaa  2019-01-08  14:32:10  2019-01-08  15:02:10
8      aaaa  2019-01-08  18:21:16  2019-01-08  18:39:26
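For reuse, the steps above can be wrapped into one function. This is a sketch, not part of the original answer: the name `insert_filler_rows` and its parameters are hypothetical, it uses a `groupby` shift instead of the separate `cut` step, and it assumes year-month-day date strings and string Matricules as in the question's sample. On that sample it should reproduce the table above.

```python
import pandas as pd


def insert_filler_rows(df, min_gap=pd.Timedelta(minutes=30),
                       max_gap=pd.Timedelta(hours=24),
                       filler=pd.Timedelta(minutes=30)):
    """Hypothetical helper: insert a filler row after each record whose gap
    to the next record of the same Matricule is strictly between min_gap
    and max_gap. Each filler starts when that record ends and lasts `filler`.
    """
    fmt = "%Y-%m-%d %H:%M:%S"  # assumption: year-month-day dates
    out = df.copy()
    out["start"] = pd.to_datetime(
        out["Startdate"].astype(str) + " " + out["Starthour"].astype(str), format=fmt)
    out["end"] = pd.to_datetime(
        out["Enddate"].astype(str) + " " + out["Endhour"].astype(str), format=fmt)
    out = out.sort_values(["Matricule", "start"])

    # Gap to the next record of the same Matricule; NaT for the last one.
    gap = out.groupby("Matricule")["start"].shift(-1) - out["end"]
    new_rows = out[(gap > min_gap) & (gap < max_gap)].copy()

    # Filler rows start where their template ends.
    new_rows["start"] = new_rows["end"]
    new_rows["end"] = new_rows["start"] + filler
    # Rebuild the string columns (assumption: the originals are strings).
    new_rows["Startdate"] = new_rows["start"].dt.strftime("%Y-%m-%d")
    new_rows["Starthour"] = new_rows["start"].dt.strftime("%H:%M:%S")
    new_rows["Enddate"] = new_rows["end"].dt.strftime("%Y-%m-%d")
    new_rows["Endhour"] = new_rows["end"].dt.strftime("%H:%M:%S")

    final = pd.concat([out, new_rows]).sort_values(["Matricule", "start"])
    return final.drop(columns=["start", "end"]).reset_index(drop=True)


# The question's sample, with every column stored as a string.
df = pd.DataFrame({
    "Matricule": ["5357"] * 4 + ["aaaa"] * 2,
    "Startdate": ["2019-01-08", "2019-01-08", "2019-01-08", "2019-03-08", "2019-01-08", "2019-01-08"],
    "Starthour": ["14:21:06", "15:29:23", "19:51:11", "20:05:49", "14:17:51", "18:21:16"],
    "Enddate": ["2019-01-08", "2019-01-08", "2019-01-08", "2019-03-08", "2019-01-08", "2019-01-08"],
    "Endhour": ["14:34:42", "15:33:43", "20:02:48", "21:04:52", "14:32:10", "18:39:26"],
})
print(insert_filler_rows(df))
```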

...