Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

(automated) outlier correction of univariate time series data in python

I have a dataset of several thousand time series. The data consists of monthly sales of different products (between 2016-2020); see the two examples below. Many of the time series (products) have outliers, which are due to additional demand from one-time projects/promotions. Unfortunately, I do not have any data whatsoever about when or for which products this was the case. In the second example this would be the case for the two peaks of June 2016 and July 2018 (I apologise for the unreadable x-axis).

My goal is to eventually provide forecasts for each of the products. I expect to achieve better results if I first apply outlier corrections to such peaks before fitting forecasting models. Due to the sheer volume of products, it is not feasible for me to manually analyze/process each product. I'm looking for an automated procedure that could identify and correct these outliers, preferably in Python.

Frankly, I'm a bit overwhelmed by the topic. I would greatly appreciate a list of steps/models/statistical tests/... that I should execute in sequence to solve this problem.

Some additional information that might help:

  • Products may or may not be seasonal or have a trend, but I do not know which.
  • I plan to train forecasting models on 2016-2018 data and, after applying the outlier corrections, create forecasts for all of 2019 for each product (to calculate forecast accuracy).
  • I have information about the product hierarchy (about 100 product groups), but I'm not sure if/how I should use this for the outlier detection. Note: I'm interested in forecasts at product level rather than aggregate level.
  • I saw the term 'stationarity'; I do not know whether/how I should take this into account or not
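The evaluation setup described in the list above (train on 2016-2018, forecast 2019, measure accuracy) could be sketched roughly as follows. This is only an illustration, not part of the original question: the function names `seasonal_naive_forecast` and `holdout_mae` are hypothetical, the seasonal-naive baseline is a stand-in for whatever forecasting model is eventually used, and the toy series is made up. It assumes 36 training months followed by the hold-out months.

```python
def seasonal_naive_forecast(train, horizon, season=12):
    """Baseline forecast: repeat the last full season of the training data."""
    last_season = train[-season:]
    return [last_season[h % season] for h in range(horizon)]


def holdout_mae(series, train_len=36, season=12):
    """Mean absolute error of the baseline on the months after train_len."""
    train, test = series[:train_len], series[train_len:]
    forecast = seasonal_naive_forecast(train, len(test), season)
    return sum(abs(f - a) for f, a in zip(forecast, test)) / len(test)


# Made-up series: 48 months with a steady upward trend (assumption, not real data).
toy = [float(i) for i in range(48)]
print(holdout_mae(toy))  # 12.0: each forecast lags the actual value by 12
```

If outlier correction is added to this setup, it would probably make sense to apply it to the training months only and to score the forecasts against the uncorrected 2019 actuals, so that the accuracy figure is not flattered by edits to the test data.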

Thanks a lot for help/insight on this matter

[Plots: example1, example2]

actual values (first value 1-2016, last value 12-2019):

ex1: [4.0, 11.0, 8.0, 4.0, 4.0, 9.0, 8.0, 5.0, 7.0, 10.0, 11.0, 3.0, 7.0, 5.0, 9.0, 3.0, 6.0, 5.0, 9.0, 1.0, 10.0, 9.0, 5.0, 2.0, 9.0, 1.0, 3.0, 8.0, 4.0, 4.0, 5.0, 5.0, 5.0, 8.0, 7.0, 5.0, 2.0, 8.0, 8.0, 4.0, 6.0, 8.0, 5.0, 4.0, 4.0, 7.0, 6.0, 4.0]

ex2: [8000.0, 8200.0, 16400.0, 13900.0, 13000.0, 15400.0, 44900.0, 5200.0, 12800.0, 17300.0, 9900.0, 12800.0, 13500.0, 17300.0, 11100.0, 15100.0, 15900.0, 20100.0, 14800.0, 6200.0, 8600.0, 12400.0, 15800.0, 14100.0, 18100.0, 26100.0, 19400.0, 14800.0, 15400.0, 48000.0, 13400.0, 11200.0, 14500.0, 12200.0, 16900.0, 4300.0, 8000.0, 11500.0, 11200.0, 17900.0, 7200.0, 19200.0, 18500.0, 6200.0, 6000.0, 11700.0, 14000.0, 7900.0, 13800.0]
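One possible starting point for automatically flagging and correcting peaks like those in ex2 is a Hampel-style filter: compare each point with the median of a window around it and replace it when it lies more than a few scaled median absolute deviations (MAD) away. The sketch below is an assumption about what might work here, not a method taken from the original post. The function name `hampel_correct`, the window of 6 months on either side, and the threshold of 3 are illustrative choices that would need tuning, and a plain rolling median does not account for seasonality or trend.

```python
from statistics import median


def hampel_correct(values, half_window=6, n_sigmas=3.0):
    """Replace points far from the local median with that median.

    Returns the corrected series and the indices that were flagged.
    """
    k = 1.4826  # scales the MAD to a standard deviation for normal data
    corrected = list(values)
    flagged = []
    for i, x in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = median([abs(v - med) for v in window])
        # Skip windows with zero spread to avoid flagging everything.
        if mad > 0 and abs(x - med) > n_sigmas * k * mad:
            corrected[i] = med
            flagged.append(i)
    return corrected, flagged


ex2 = [8000.0, 8200.0, 16400.0, 13900.0, 13000.0, 15400.0, 44900.0, 5200.0,
       12800.0, 17300.0, 9900.0, 12800.0, 13500.0, 17300.0, 11100.0, 15100.0,
       15900.0, 20100.0, 14800.0, 6200.0, 8600.0, 12400.0, 15800.0, 14100.0,
       18100.0, 26100.0, 19400.0, 14800.0, 15400.0, 48000.0, 13400.0, 11200.0,
       14500.0, 12200.0, 16900.0, 4300.0, 8000.0, 11500.0, 11200.0, 17900.0,
       7200.0, 19200.0, 18500.0, 6200.0, 6000.0, 11700.0, 14000.0, 7900.0,
       13800.0]

corrected, flagged = hampel_correct(ex2)
for i in flagged:
    print(i, ex2[i], "->", corrected[i])
```

With these settings the two large peaks in ex2 (44900.0 at position 6 and 48000.0 at position 29, counting from 0) are flagged and replaced by their local medians, 13000.0 and 14800.0. Because the function only needs a list of numbers, it can be looped over every product without manual inspection; for strongly seasonal products, running it on the residuals of a seasonal decomposition rather than on the raw values may be a better fit.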

question from:https://stackoverflow.com/questions/65661297/automated-outlier-correction-of-univariate-time-series-data-in-python
