Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
198 views
in Technique[技术] by (71.8m points)

Rolling median algorithm in C

I am currently working on an algorithm to implement a rolling median filter (analogous to a rolling mean filter) in C. From my search of the literature, there appear to be two reasonably efficient ways to do it. The first is to sort the initial window of values, then perform a binary search to insert the new value and remove the existing one at each iteration.

The second (from Hardle and Steiger, 1995, JRSS-C, Algorithm 296) builds a double-ended heap structure, with a maxheap on one end, a minheap on the other, and the median in the middle. This yields a linear-time algorithm instead of one that is O(n log n).

Here is my problem: implementing the former is doable, but I need to run this on millions of time series, so efficiency matters a lot. The latter is proving very difficult to implement. I found code in the Trunmed.c file of the code for the stats package of R, but it is rather indecipherable.

Does anyone know of a well-written C implementation for the linear time rolling median algorithm?

Edit: Link to Trunmed.c code http://google.com/codesearch/p?hl=en&sa=N&cd=1&ct=rc#mYw3h_Lb_e0/R-2.2.0/src/library/stats/src/Trunmed.c

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I have looked at R's src/library/stats/src/Trunmed.c a few times as I wanted something similar too in a standalone C++ class / C subroutine. Note that this are actually two implementations in one, see src/library/stats/man/runmed.Rd (the source of the help file) which says

details{
  Apart from the end values, the result code{y = runmed(x, k)} simply has
  code{y[j] = median(x[(j-k2):(j+k2)])} (k = 2*k2+1), computed very
  efficiently.

  The two algorithms are internally entirely different:
  describe{
    item{"Turlach"}{is the H?rdle-Steiger
      algorithm (see Ref.) as implemented by Berwin Turlach.
      A tree algorithm is used, ensuring performance eqn{O(n log
        k)}{O(n * log(k))} where code{n <- length(x)} which is
      asymptotically optimal.}
    item{"Stuetzle"}{is the (older) Stuetzle-Friedman implementation
      which makes use of median emph{updating} when one observation
      enters and one leaves the smoothing window.  While this performs as
      eqn{O(n imes k)}{O(n * k)} which is slower asymptotically, it is
      considerably faster for small eqn{k} or eqn{n}.}
  }
}

It would be nice to see this re-used in a more standalone fashion. Are you volunteering? I can help with some of the R bits.

Edit 1: Besides the link to the older version of Trunmed.c above, here are current SVN copies of

Edit 2: Ryan Tibshirani has some C and Fortran code on fast median binning which may be a suitable starting point for a windowed approach.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...