To retrieve k random numbers from an array of undetermined size we use a technique called reservoir sampling. Can anybody briefly highlight how it happens with a sample code?
I actually did not realize there was a name for this, so I proved and implemented this from scratch:
import random def random_subset( iterator, K ): result = [] N = 0 for item in iterator: N += 1 if len( result ) < K: result.append( item ) else: s = int(random.random() * N) if s < K: result[ s ] = item return result
From: http://web.archive.org/web/20141026071430/http://propersubset.com:80/2010/04/choosing-random-elements.html
With a proof near the end.
1.4m articles
1.4m replys
5 comments
57.0k users