Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
126 views
in Technique[技术] by (71.8m points)

python - How can I optimize a functional mergesort for large lists

I am trying to learn functional programming and algorithms at the same time, and Ive implemented a merge sort in Haskell. Then I converted the style into python and run a test on a learning platform, but I get return that it takes too long time to sort a list on a 1000 integers.

Is there a way i can optimize my python code and still keep my functional style or do I have to solve the problem iteratively?

Thanks in advance.

So here is the code I made in Haskell first.

merge :: Ord a => [a] -> [a] -> [a]
merge [] xs = xs
merge ys [] = ys
merge (x:xs) (y:ys)
  | (x <= y) = x : (merge xs (y:ys))
  | otherwise = y : (merge (x:xs) ys)

halve :: [a] -> ([a] , [a])
halve [x] = ([x], [])
halve xs = (take n xs , drop n xs)
        where n = length xs `div` 2

msort :: Ord a => [a] -> [a]
msort [x] = [x]
msort [] = []
msort xs = merge (msort n) (msort m)
    where (n,m) = halve xs

Then I made this code in python based on the Haskell style.

import sys
sys.setrecursionlimit(1002) #This is because the recursion will go 1002 times deep when I have a list on 1000 numbers.
    
def merge(xs,ys):
    if len(xs) == 0:
        return ys
    elif len(ys) == 0:
        return xs
    else:
        if xs[0] <= ys[0]:
            return [xs[0]] + merge(xs[1:], ys)
        else:
            return [ys[0]] + merge(xs, ys[1:])

def halve(xs):
    return (xs[:len(xs)//2],xs[len(xs)//2:])

def msort(xss):
    if len(xss) <= 1:
        return xss
    else:
        xs,ys = halve(xss)
        return merge(msort(xs), msort(ys))

Is there a smarter way I can optimize the python version and still have a functional style?

question from:https://stackoverflow.com/questions/65845242/how-can-i-optimize-a-functional-mergesort-for-large-lists

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Haskell lists are lazy. [x] ++ xs first produces the x, and then it produces all the elements in xs.

In e.g. Lisp the lists are singly-linked lists and appending them copies the first list, so prepending a singleton is an O(1) operation.

In Python though the appending copies the second list (as confirmed by @chepner in the comments), i.e. [x] + xs will copy the whole list xs and thus is an O(n) operation (where n is the length of xs).

This means that both your [xs[0]] + merge(xs[1:], ys) and [ys[0]] + merge(xs, ys[1:]) lead to quadratic behavior which you observe as the dramatic slowdown you describe.

Python's equivalent to Haskell's lazy lists is not lists, it's generators, which produce their elements one by one on each yield. Thus the rewrite could look something like

def merge(xs,ys):
    if len(xs) == 0:
        return ys
    elif len(ys) == 0:
        return xs
    else:
        a = (x for x in xs)     # or maybe iter(xs)
        b = (y for y in ys)     # or maybe iter(ys)
        list( merge_gen(a,b))

Now what's left is to re-implement your merge logic as merge_gen which expects two generators (or should that be iterators? do find out) as its input and generates the ordered stream of elements which it gets by pulling them one by one from the two sources as needed. The resulting stream of elements is converted back to list, as expected by the function's caller. No redundant copying will be performed.

If I've made some obvious Python errors, please treat the above as a pseudocode.


Your other option is to pre-allocate a second list of the same length and copy the elements between the two lists back and forth while merging, using indices to reference the elements of the arrays and mutating the contents to store the results.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...