- Aside from the fact that the
.strptime
is being accessed from the incorrect namespace, it should be datetime.datetime.strptime
, the implementation will result in an error (TypeError: strptime() argument 1 must be str, not Series
), even with the correct function call.
pandas
already has built-in vectorized datetime features.
- In this case, the hours and minutes in the
'total_time
column, represent an absolute amount of time, so the column should be converted with pandas.to_timedelta
.
- This function takes a
string
in the form 'hh:mm:ss'
, so df.tt + ':00'
is used to get the 'total_time'
('tt'
) column into the correct format.
import pandas as pd
# test dataframe
df = pd.DataFrame({'user': ['n1', 'n2', 'n3', '', 'n1', 'n2'],
'ts': ['09:30', '09:30', '09:30', '', '14:00', '10:00'],
'te': ['14:00', '10:00', '12:00', '', '18:00', '18:00'],
'ls': ['12:00', '', '', '', '', '13:00'],
'le': ['13:00', '', '', '', '', '14:00'],
'tt': ['03:00', '00:30', '02:30', '00:00', '04:00', '07:00']})
# display(df)
user ts te ls le tt
0 n1 09:30 14:00 12:00 13:00 03:00
1 n2 09:30 10:00 00:30
2 n3 09:30 12:00 02:30
3 00:00
4 n1 14:00 18:00 04:00
5 n2 10:00 18:00 13:00 14:00 07:00
# fill blanks with nan
df = df.replace('', np.nan)
# drop all rows that are all nan, except total_time
df = df.dropna(axis=0, how='all', subset=df.columns[:-2])
# display(df)
user ts te ls le tt
0 n1 09:30 14:00 12:00 13:00 03:00
1 n2 09:30 10:00 NaN NaN 00:30
2 n3 09:30 12:00 NaN NaN 02:30
4 n1 14:00 18:00 NaN NaN 04:00
5 n2 10:00 18:00 13:00 14:00 07:00
# convert total_time (tt) to a timedelta
df.tt = pd.to_timedelta(df.tt + ':00')
# groupby user and sum the tt column
total_hours = df.groupby('user').tt.sum().reset_index(name='total_hours')
# display(total_hours)
user total_hours
0 n1 0 days 07:00:00
1 n2 0 days 07:30:00
2 n3 0 days 02:30:00
# to have total_hours as a float
total_hours.total_hours = total_hours.total_hours / pd.to_timedelta(1, 'h')
# display(total_hours)
user total_hours
0 n1 7.0
1 n2 7.5
2 n3 2.5
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…