I am trying to find a line that tracks the upper limit of the local maxima in an ingestion chart, so that spikes rising, say, 30% above this line can be flagged as significant.
I have been using a simple rolling mean and a rolling standard deviation to follow the data, but I was hoping there is a way to more accurately compute a rolling average of all the local maxima (or minima) in the chart.
Here is some sample CSV data:
date,value
2021-01-05 05:00:00,54637.71111111111
2021-01-05 05:00:30,52017.84444444443
2021-01-05 05:01:00,51685.55555555555
2021-01-05 05:01:30,53948.222222222226
2021-01-05 05:02:00,53216.35555555554
2021-01-05 05:02:30,54714.77777777779
2021-01-05 05:03:00,54358.22222222222
2021-01-05 05:03:30,52332.86666666666
2021-01-05 05:04:00,51980.86666666667
2021-01-05 05:04:30,54679.244444444455
2021-01-05 05:05:00,54697.488888888874
2021-01-05 05:05:30,55256.11111111111
2021-01-05 05:06:00,59757.31111111112
2021-01-05 05:06:30,55843.88888888888
2021-01-05 05:07:00,51912.755555555545
2021-01-05 05:07:30,51175.24444444443
2021-01-05 05:08:00,51193.73333333334
2021-01-05 05:08:30,51743.73333333333
2021-01-05 05:09:00,50394.53333333334
2021-01-05 05:09:30,50070.73333333333
2021-01-05 05:10:00,50664.26666666667
2021-01-05 05:10:30,51443.22222222222
2021-01-05 05:11:00,50453.06666666667
2021-01-05 05:11:30,49595.77777777777
2021-01-05 05:12:00,50391.22222222221
2021-01-05 05:12:30,49115.022222222215
2021-01-05 05:13:00,50099.73333333333
2021-01-05 05:13:30,51361.71111111111
2021-01-05 05:14:00,50181.77777777777
2021-01-05 05:14:30,49647.866666666654
2021-01-05 05:15:00,49812.22222222222
After loading the data:
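import pandas as pd

df = pd.read_csv("file.csv", parse_dates=["date"])
# 5-sample rolling mean and standard deviation
df['mean'] = df['value'].rolling(5).mean()
df['std'] = df['value'].rolling(5).std()
# Band of one standard deviation around the rolling mean
df['upper'] = df['mean'] + df['std']
df['lower'] = df['mean'] - df['std']
# Ingestion values cannot be negative, so floor the lower band at 0
df.loc[df['lower'] < 0, 'lower'] = 0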
This should work for many kinds of data, but in my case it is ingestion metrics for servers.
The upper and lower standard deviation bands are not, from my perspective, an accurate representation of a line that encapsulates the limits.
I'm not sure if others have ideas, but I'm not a statistician, so some of the terminology in pandas has been hard to work with.
EDIT: Local max is a relative term that depends on scope. I treat a point as a local max if both of its adjacent points are lower than it.
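To make that definition concrete, here is a minimal sketch of how it could be expressed in pandas, followed by a rolling average taken over the local maxima only. The window of 3 maxima, the forward fill, and the names `is_local_max` and `upper_line` are assumptions made for illustration, not something from my current code.

```python
import pandas as pd

# First 15 values of the sample data above, rounded to one decimal
values = pd.Series([
    54637.7, 52017.8, 51685.6, 53948.2, 53216.4,
    54714.8, 54358.2, 52332.9, 51980.9, 54679.2,
    54697.5, 55256.1, 59757.3, 55843.9, 51912.8,
])

# A point is a local max when it is strictly greater than both neighbours.
# The first and last points have only one neighbour, so they never qualify.
is_local_max = (values > values.shift(1)) & (values > values.shift(-1))

# Keep only the local maxima and average the most recent 3 of them
# (the window size is an assumption), then forward-fill so that every
# row after the first local max has a value for the line.
local_max = values.where(is_local_max).dropna()
upper_line = (
    local_max.rolling(3, min_periods=1).mean()
    .reindex(values.index)
    .ffill()
)

print(values[is_local_max])
print(upper_line)
```

The rows before the first local max stay NaN, because there is nothing to average yet.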
In the image below you will see several differently colored lines, all computed with the pandas rolling function. The light blue line is the mean plus 2 * std dev. The pink line is the rolling max.
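For reference, the two lines described above can be computed roughly as follows. This is a sketch only: the window of 5 matches the snippet earlier in the question, but the window used for the plot may differ, and the names `mean_plus_2std` and `rolling_max` are chosen here for illustration.

```python
import pandas as pd

# First 15 values of the sample data above, rounded to one decimal
values = pd.Series([
    54637.7, 52017.8, 51685.6, 53948.2, 53216.4,
    54714.8, 54358.2, 52332.9, 51980.9, 54679.2,
    54697.5, 55256.1, 59757.3, 55843.9, 51912.8,
])

window = 5  # assumed window size
roll = values.rolling(window)

# Light blue line: rolling mean plus two rolling standard deviations
mean_plus_2std = roll.mean() + 2 * roll.std()

# Pink line: highest value seen within each window
rolling_max = roll.max()

print(pd.DataFrame({"mean+2std": mean_plus_2std, "rolling_max": rolling_max}))
```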
My hope was to create a line that better represents the local maxima above the mean, as I don't think the 2 * std dev line is accurate enough.
My desired end state is to use this upper line, which better fits the upper limits, as the top half of an operational range.
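To illustrate how such an upper line would be used, here is a rough sketch of the 30% spike check mentioned at the top. The values are made up for illustration (the sample data above has no spike that large), and a shifted rolling max stands in for the upper line I am still looking for. The names `upper` and `is_spike` are hypothetical.

```python
import pandas as pd

# Made-up values for illustration only: steady traffic with one large spike
values = pd.Series([100.0, 104.0, 98.0, 103.0, 99.0, 150.0, 101.0, 97.0])

# Stand-in for the upper line: rolling max over the previous 3 points.
# shift(1) keeps the current point out of its own threshold.
upper = values.rolling(3).max().shift(1)

# Flag points that are more than 30% above the upper line.
# Rows where the upper line is still NaN compare as False.
is_spike = values > 1.3 * upper

print(values[is_spike])
```

Any better-fitting upper line could be dropped in place of `upper` without changing the check itself.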
question from:
https://stackoverflow.com/questions/65601147/pandas-rolling-average-local-maximums