python - how to calculate running median efficiently

Question

Welcome To Ask or Share your Answers For Others

python - how to calculate running median efficiently

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - how to calculate running median efficiently

I borrowed some code trying to implement a function to calculate the running median for a ton of data. The current one is too slow for me (The tricky part is that I need to exclude all zeros from the running box). Below is the code:

from itertools import islice
from collections import deque
from bisect import bisect_left,insort

def median(s):
    sp = [nz for nz in s if nz!=0]
    print sp
    Mnow = len(sp)
    if Mnow == 0:
        return 0
    else:
        return np.median(sp)

def RunningMedian(seq, M):
    seq = iter(seq)
    s = []

    # Set up list s (to be sorted) and load deque with first window of seq
    s = [item for item in islice(seq,M)]
    d = deque(s)

    # Sort it in increasing order and extract the median ("center" of the sorted window)
    s.sort()
    medians = [median(s)]
    for item in seq:
        old = d.popleft()          # pop oldest from left
        d.append(item)             # push newest in from right
        del s[bisect_left(s, old)] # locate insertion point and then remove old 
        insort(s, item)            # insert newest such that new sort is not required        
        medians.append(median(s))
    return medians

It works well, the only drawback is that it is too slow. Any one could help me to improve the code to be more efficient?

After I explored all the possibilities, the following simple code can calculate comparably efficiently:

def RunningMedian(x,N):
    idx = np.arange(N) + np.arange(len(x)-N+1)[:,None]
    b = [row[row>0] for row in x[idx]]
    return np.array(map(np.median,b))
    #return np.array([np.median(c) for c in b])  # This also works

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T20:04:52+0000

One approach is below:

def RunningMedian(x,N):
    idx = np.arange(N) + np.arange(len(x)-N+1)[:,None]
    b = [row[row>0] for row in x[idx]]
    return np.array(map(np.median,b))
    #return np.array([np.median(c) for c in b])  # This also works

I found a much faster one (tens of thousand times faster), copied as below:

from collections import deque
from bisect import insort, bisect_left
from itertools import islice
def running_median_insort(seq, window_size):
    """Contributed by Peter Otten"""
    seq = iter(seq)
    d = deque()
    s = []
    result = []
    for item in islice(seq, window_size):
        d.append(item)
        insort(s, item)
        result.append(s[len(d)//2])
    m = window_size // 2
    for item in seq:
        old = d.popleft()
        d.append(item)
        del s[bisect_left(s, old)]
        insort(s, item)
        result.append(s[m])
    return result

Take a look at the link: running_median

Categories

python - how to calculate running median efficiently

python - how to calculate running median efficiently

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags