Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

pandas - How to find the start time and end time of an event in Python?

I have a data frame with two columns: Event, and Time (a datetime):

Sample data

 Event   Time
    0   2020-02-12 11:00:00
    0   2020-02-12 11:30:00
    2   2020-02-12 12:00:00
    1   2020-02-12 12:30:00
    0   2020-02-12 13:00:00
    0   2020-02-12 13:30:00
    0   2020-02-12 14:00:00
    1   2020-02-12 14:30:00
    0   2020-02-12 15:00:00
    0   2020-02-12 15:30:00

I want to find the start time and end time of each event:

Desired Data

 Event  EventStartTime       EventEndTime
     0  2020-02-12 11:00:00  2020-02-12 12:00:00
     2  2020-02-12 12:00:00  2020-02-12 12:30:00
     1  2020-02-12 12:30:00  2020-02-12 13:00:00
     0  2020-02-12 13:00:00  2020-02-12 14:30:00
     1  2020-02-12 14:30:00  2020-02-12 15:00:00

Note: EventEndTime is the time at which the event value changes, for example from 1 to 0 or to any other value.
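
For anyone who wants to reproduce this, the sample data could be built roughly as follows. This is only a sketch: the frame name df and the use of pd.date_range are assumptions, and it relies on the timestamps being exactly 30 minutes apart as in the sample.

```python
import pandas as pd

# Sample data from the question. The name `df` is an assumption.
df = pd.DataFrame({
    "Event": [0, 0, 2, 1, 0, 0, 0, 1, 0, 0],
    # Ten timestamps, 30 minutes apart, starting at 11:00
    "Time": pd.date_range("2020-02-12 11:00:00", periods=10, freq="30min"),
})
```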

1 Reply

Here is a method that gets the result without a for loop. I assume the input data has been read into a DataFrame called df:

import pandas as pd

# Initialize the output DataFrame with each row's event and timestamp
dfout = pd.DataFrame()
dfout['Event'] = df['Event']
dfout['EventStartTime'] = df['Time']

Now I create a column called 'change' that holds the difference from the previous row's event. Any non-zero value means the event changed.

dfout['change'] = df['Event'].diff()

This is how dfout looks now:

   Event       EventStartTime  change
0      0  2020-02-12 11:00:00     NaN
1      0  2020-02-12 11:30:00     0.0
2      2  2020-02-12 12:00:00     2.0
3      1  2020-02-12 12:30:00    -1.0
4      0  2020-02-12 13:00:00    -1.0
5      0  2020-02-12 13:30:00     0.0
6      0  2020-02-12 14:00:00     0.0
7      1  2020-02-12 14:30:00     1.0
8      0  2020-02-12 15:00:00    -1.0
9      0  2020-02-12 15:30:00     0.0
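
Note that diff() only works when the Event values are numeric. If the events were labels instead, one possible alternative is to compare each value with the previous one using shift(). The sketch below uses made-up string labels (not from the question) purely to illustrate the idea:

```python
import pandas as pd

# Hypothetical string labels, stored with object dtype for this illustration
events = pd.Series(["idle", "idle", "run", "stop", "idle"], dtype=object)

# True wherever the value differs from the previous row. The first row is
# compared against a missing value, so it is flagged as a change too.
changed = events.ne(events.shift())
```

The resulting boolean Series can be used directly as a row filter, in the same way as the 'change' column.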

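Next, I remove the rows where the event did not change. The first row is kept because its change is NaN, and NaN != 0 evaluates to True:

# .copy() avoids a SettingWithCopyWarning when a column is added later
dfout = dfout.loc[dfout['change'] != 0, :].copy()

This leaves only the rows where a new event starts.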

Next, the event end time of the current event is the start time of the next event.

dfout['EventEndTime'] = dfout['EventStartTime'].shift(-1)

The dataframe looks like this:

   Event       EventStartTime  change         EventEndTime
0      0  2020-02-12 11:00:00     NaN  2020-02-12 12:00:00
2      2  2020-02-12 12:00:00     2.0  2020-02-12 12:30:00
3      1  2020-02-12 12:30:00    -1.0  2020-02-12 13:00:00
4      0  2020-02-12 13:00:00    -1.0  2020-02-12 14:30:00
7      1  2020-02-12 14:30:00     1.0  2020-02-12 15:00:00
8      0  2020-02-12 15:00:00    -1.0                  NaN

You may choose to remove the 'change' column, and the last row as well if it is not needed, since the final event has no known end time.
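
Putting the steps together, including that cleanup, might look like the sketch below. The helper name event_spans and its drop_open parameter are hypothetical, and it assumes numeric Event values and rows already sorted by Time.

```python
import pandas as pd

def event_spans(df, drop_open=True):
    """Hypothetical helper: one row per run of identical Event values."""
    # Keep rows where Event differs from the previous row. The first row
    # has diff() == NaN, which is not equal to 0, so it is kept as well.
    starts = df.loc[df["Event"].diff().ne(0), ["Event", "Time"]]
    out = starts.rename(columns={"Time": "EventStartTime"}).reset_index(drop=True)
    # Each run ends where the next run starts.
    out["EventEndTime"] = out["EventStartTime"].shift(-1)
    if drop_open:
        # The last run has no following change, so its end time is unknown.
        out = out.iloc[:-1]
    return out

# Sample data from the question
df = pd.DataFrame({
    "Event": [0, 0, 2, 1, 0, 0, 0, 1, 0, 0],
    "Time": pd.date_range("2020-02-12 11:00:00", periods=10, freq="30min"),
})
result = event_spans(df)
```

On the sample data this should give the five rows listed under "Desired Data" in the question. Passing drop_open=False keeps the final, still-open event with a missing end time.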

