Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Pandas: How to find a particular pattern in a dataframe column?

I'd like to find a particular pattern in a pandas dataframe column, and return the corresponding index values in order to subset the dataframe.

Here's a sample dataframe with a possible pattern:

Snippet to produce dataframe:

import pandas as pd
import numpy as np

Observations = 10
Columns = 2
np.random.seed(123)
df = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
                  columns = ['ColA','ColB'])
# pd.datetime was removed from pandas; a date string works directly
datelist = pd.date_range('2017-07-07', periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])

pattern = [100,90,105]
print(df)

Dataframe:

            ColA  ColB
Dates                 
2017-07-07   103    92
2017-07-08    92    96
2017-07-09   107   109
2017-07-10   100    91
2017-07-11    90   107
2017-07-12   105    99
2017-07-13    90   104
2017-07-14    90   105
2017-07-15   109   104
2017-07-16    94    90

Here, the pattern of interest occurs in ColA on the dates 2017-07-10 to 2017-07-12, and those rows are what I'd like to end up with:

Desired output:

2017-07-10   100    91
2017-07-11    90   107
2017-07-12   105    99

If the same pattern occurs several times, I would like to subset the dataframe the same way and also count how many times the pattern occurs, but I hope that is more straightforward once the first step is sorted out.

Thank you for any suggestions!



1 Reply


A list comprehension can slide a window over the column and collect the start date of each match:

[df.index[i - len(pattern)]  # Start date of the matching window
 for i in range(len(pattern), len(df) + 1)  # Every window of len(pattern) consecutive rows, including the last
 if all(df['ColA'].iloc[i - len(pattern):i] == pattern)]  # Keep the windows that equal the pattern

# [Timestamp('2017-07-10 00:00:00')]
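
The list above gives only the start date of each match. To get the rows themselves and the number of occurrences, as the question asks, one option is to compare every window against the pattern with NumPy and then slice the dataframe by position. The following is a minimal sketch, not the answer author's method: `find_pattern` is a hypothetical helper name, the dataframe is rebuilt from the values printed in the question rather than from the random seed, and `sliding_window_view` assumes NumPy 1.20 or later.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Rebuild the sample dataframe from the values printed in the question.
df = pd.DataFrame(
    {"ColA": [103, 92, 107, 100, 90, 105, 90, 90, 109, 94],
     "ColB": [92, 96, 109, 91, 107, 99, 104, 105, 104, 90]},
    index=pd.date_range("2017-07-07", periods=10, name="Dates"),
)
pattern = [100, 90, 105]


def find_pattern(series, pattern):
    """Return the positional start of every window of `series` equal to `pattern`.

    Hypothetical helper for illustration; not part of pandas.
    """
    values = series.to_numpy()
    if len(pattern) == 0 or len(values) < len(pattern):
        return np.array([], dtype=int)
    # Each row of `windows` is one run of len(pattern) consecutive values.
    windows = sliding_window_view(values, len(pattern))
    return np.flatnonzero((windows == np.asarray(pattern)).all(axis=1))


starts = find_pattern(df["ColA"], pattern)
count = len(starts)  # number of times the pattern occurs

# One sub-frame per occurrence, stacked; overlapping matches would repeat rows.
matches = [df.iloc[s:s + len(pattern)] for s in starts]
subset = pd.concat(matches) if matches else df.iloc[0:0]

print(count)
print(subset)
```

Returning positions rather than dates keeps the slicing simple with `iloc`; the matching dates are still available as `df.index[starts]`.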
