python - Fill NA Values in pandas Series with a stop

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm analyzing a time series, and based on certain criteria, I can pick out rows that are either the start or the end of the events. At this point, my series looks something like this (I've left out some repetitive values for brevity):

The Setup

import numpy as np
import pandas
from pandas import Timestamp

datadict = {'event': {
  Timestamp('2010-01-01 00:20:00', tz=None): 'event start',
  Timestamp('2010-01-01 00:30:00', tz=None): '--',
  Timestamp('2010-01-01 00:40:00', tz=None): '--',
  Timestamp('2010-01-01 00:50:00', tz=None): '--',
  Timestamp('2010-01-01 01:00:00', tz=None): '--',
  Timestamp('2010-01-01 01:10:00', tz=None): 'event end',
  Timestamp('2010-01-01 01:20:00', tz=None): '--',
  Timestamp('2010-01-01 02:20:00', tz=None): '--',
  Timestamp('2010-01-01 02:30:00', tz=None): 'event start',
  Timestamp('2010-01-01 02:40:00', tz=None): '--',
  Timestamp('2010-01-01 02:50:00', tz=None): '--',
  Timestamp('2010-01-01 03:00:00', tz=None): '--',
  Timestamp('2010-01-01 03:10:00', tz=None): '--',
  Timestamp('2010-01-01 03:20:00', tz=None): '--',
  Timestamp('2010-01-01 03:30:00', tz=None): 'event end',
}}
data = pandas.DataFrame.from_dict(datadict)

                           event
2010-01-01 00:20:00  event start
2010-01-01 00:30:00           --
2010-01-01 00:40:00           --
2010-01-01 00:50:00           --
2010-01-01 01:00:00           --
2010-01-01 01:10:00    event end
2010-01-01 01:20:00           --
2010-01-01 02:20:00           --
2010-01-01 02:30:00  event start
2010-01-01 02:40:00           --
2010-01-01 02:50:00           --
2010-01-01 03:00:00           --
2010-01-01 03:10:00           --
2010-01-01 03:20:00           --
2010-01-01 03:30:00    event end

Here's what I would like to achieve (ideally without `for` loops)

                           event  event number
2010-01-01 00:20:00  event start  1
2010-01-01 00:30:00           --  1
2010-01-01 00:40:00           --  1
2010-01-01 00:50:00           --  1
2010-01-01 01:00:00           --  1
2010-01-01 01:10:00    event end  1
2010-01-01 01:20:00           --  NA
2010-01-01 02:20:00           --  NA
2010-01-01 02:30:00  event start  2
2010-01-01 02:40:00           --  2
2010-01-01 02:50:00           --  2
2010-01-01 03:00:00           --  2
2010-01-01 03:10:00           --  2
2010-01-01 03:20:00           --  2
2010-01-01 03:30:00    event end  2
2010-01-01 03:40:00           --  NA
2010-01-01 03:50:00           --  NA

Here's what I've tried

With some optimistic assumptions about the quality of my data, I can get Event Numbers like this:

table = data[data.event != '--'].reset_index()
table['event number'] = 1 + np.floor(table.index / 2)
table = table.set_index('index')

                           event  event number
index                                         
2010-01-01 00:20:00  event start             1
2010-01-01 01:10:00    event end             1
2010-01-01 02:30:00  event start             2
2010-01-01 03:30:00    event end             2

I can then join that to my original dataframe and forward-fill the missing values:

data2 = data.join(table[['event number']])
# Series.ffill() replaces fillna(method='ffill'), which is deprecated in pandas 2.x
data2['filled'] = data2['event number'].ffill()

                           event  event number  filled
2010-01-01 00:20:00  event start             1       1
2010-01-01 00:30:00           --           NaN       1
2010-01-01 00:40:00           --           NaN       1
2010-01-01 00:50:00           --           NaN       1
2010-01-01 01:00:00           --           NaN       1
2010-01-01 01:10:00    event end             1       1
2010-01-01 01:20:00           --           NaN       1 # <- d'oh
2010-01-01 02:20:00           --           NaN       1 # <- d'oh 
2010-01-01 02:30:00  event start             2       2
2010-01-01 02:40:00           --           NaN       2
2010-01-01 02:50:00           --           NaN       2
2010-01-01 03:00:00           --           NaN       2
2010-01-01 03:10:00           --           NaN       2
2010-01-01 03:20:00           --           NaN       2
2010-01-01 03:30:00    event end             2       2

The Problem

As you can see, the time between events (01:20 through 02:20) is being associated with event #1.

The Question

Is there any way to skip over these sections without looping?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:23+0000

You can achieve this by comparing the cumulative count of 'event start' rows with the cumulative count of 'event end' rows. First, number every row by how many events have started so far:

>>> data['event number'] = (data.event == 'event start').cumsum()
>>> data
                           event  event number
2010-01-01 00:20:00  event start             1
2010-01-01 00:30:00           --             1
2010-01-01 00:40:00           --             1
2010-01-01 00:50:00           --             1
2010-01-01 01:00:00           --             1
2010-01-01 01:10:00    event end             1
2010-01-01 01:20:00           --             1
2010-01-01 02:20:00           --             1
2010-01-01 02:30:00  event start             2
2010-01-01 02:40:00           --             2
2010-01-01 02:50:00           --             2
2010-01-01 03:00:00           --             2
2010-01-01 03:10:00           --             2
2010-01-01 03:20:00           --             2
2010-01-01 03:30:00    event end             2

Now you just need to set the value to NaN wherever no event is in progress. Those are exactly the rows where the cumulative count of 'event start' equals the cumulative count of 'event end', with the latter shifted down by one row so that each 'event end' row stays inside its own event:

>>> idx = data['event number'] == (data.event.shift(1) == 'event end').cumsum()
>>> data.loc[idx, 'event number'] = np.nan
>>> data
                           event  event number
2010-01-01 00:20:00  event start             1
2010-01-01 00:30:00           --             1
2010-01-01 00:40:00           --             1
2010-01-01 00:50:00           --             1
2010-01-01 01:00:00           --             1
2010-01-01 01:10:00    event end             1
2010-01-01 01:20:00           --           NaN
2010-01-01 02:20:00           --           NaN
2010-01-01 02:30:00  event start             2
2010-01-01 02:40:00           --             2
2010-01-01 02:50:00           --             2
2010-01-01 03:00:00           --             2
2010-01-01 03:10:00           --             2
2010-01-01 03:20:00           --             2
2010-01-01 03:30:00    event end             2

[15 rows x 2 columns]
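
The two steps above can be folded into one helper, sketched below. The function name `number_events` and its `start` and `end` parameters are hypothetical, and the sample series is a shortened stand-in for the data in the question. The sketch assumes that start and end markers alternate cleanly and that events never nest or overlap.

```python
import numpy as np
import pandas as pd


def number_events(events: pd.Series,
                  start: str = "event start",
                  end: str = "event end") -> pd.Series:
    """Label each row with its event number; rows outside any event get NaN.

    Hypothetical helper: assumes start/end markers alternate and never nest.
    """
    # Number of events that have started up to and including each row.
    started = (events == start).cumsum()
    # Number of events that ended strictly before each row. Shifting by one
    # row keeps the 'event end' row itself inside its event.
    ended = (events.shift(1) == end).cumsum()
    # When both counters agree, no event is open on that row, so mask it.
    return started.where(started != ended).astype("float64")


# Shortened, made-up sample in the same shape as the question's data.
index = pd.date_range("2010-01-01 00:20", periods=8, freq="10min")
events = pd.Series(
    ["event start", "--", "event end", "--", "--",
     "event start", "event end", "--"],
    index=index,
    name="event",
)
numbers = number_events(events)
print(pd.DataFrame({"event": events, "event number": numbers}))
```

Because the mask is built with `where` rather than an in-place `.loc` assignment, the input series is left untouched. Rows before the first 'event start' also come out as NaN, since both counters are still zero there.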
