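Here's a vectorized approach using the rand + argsort/argpartition trick from here. Each row gets 50 random floats; the column indices of the 6 smallest floats form a random 6-element subset of 0..49 with no repeats, and adding 1 shifts it to 1..50 -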
import numpy as np
np.random.rand(rows, 50).argpartition(6,axis=1)[:,:6]+1
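The one-liner can be wrapped in a small reusable function. This is a sketch that assumes NumPy 1.17+ for the newer `np.random.default_rng` Generator API (swapped in for `np.random.rand`; the trick itself is unchanged). The name `unique_rows` and its parameters `n`, `k`, `seed` are hypothetical. It partitions at `kth=k-1`, which is enough to put the k smallest keys (unordered) in the first k columns. The answer's `kth=6` also works for k=6, but it would be out of range if k ever equals n.

```python
import numpy as np


def unique_rows(rows, n=50, k=6, seed=None):
    """Return a (rows, k) int array; each row holds k distinct values from 1..n.

    Hypothetical helper wrapping the rand + argpartition trick.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    rng = np.random.default_rng(seed)
    # One row of n uniform random keys per output row.
    keys = rng.random((rows, n))
    # Partition around position k-1: the column indices of the k smallest
    # keys end up (in arbitrary order) in the first k slots of each row.
    idx = keys.argpartition(k - 1, axis=1)[:, :k]
    # Column indices run 0..n-1; shift them to 1..n.
    return idx + 1


print(unique_rows(10, seed=0))
```

Because the indices come from a permutation of each row's columns, repeats within a row are impossible by construction, so no rejection step is needed.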
Sample run -
In [41]: rows = 10
In [42]: np.random.rand(rows, 50).argpartition(6,axis=1)[:,:6]+1
Out[42]:
array([[ 1, 9, 3, 26, 14, 44],
[32, 20, 27, 13, 25, 45],
[40, 12, 47, 16, 10, 29],
[ 6, 36, 32, 16, 18, 4],
[42, 46, 24, 9, 1, 31],
[15, 25, 47, 42, 34, 24],
[ 7, 16, 49, 31, 40, 20],
[28, 17, 47, 36, 8, 44],
[ 7, 42, 14, 4, 17, 35],
[39, 19, 37, 7, 8, 36]])
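The argsort flavour of the trick picks the same subsets; it just sorts every row fully (O(n log n) per row) instead of partitioning it (O(n) on average), which is why argpartition is usually the better choice here. A small check, assuming a seeded `default_rng` so both methods see the same random matrix; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
rows, n, k = 10, 50, 6
keys = rng.random((rows, n))

# argsort: full sort per row; first k indices are the k smallest keys, in order.
via_sort = keys.argsort(axis=1)[:, :k] + 1
# argpartition: only guarantees the k smallest keys come first, unordered.
via_part = keys.argpartition(k - 1, axis=1)[:, :k] + 1

# Same set in every row, possibly in a different order.
same = np.array_equal(np.sort(via_sort, axis=1), np.sort(via_part, axis=1))
print(same)
```

Only the order within each row differs, so if the order of the 6 picks matters to you, the argpartition output is already random in order only up to the partition algorithm; sorting or shuffling afterwards is an option.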
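To check the randomness, count how often each number appears (np.bincount counts occurrences of each value; [1:] drops the bin for 0, which never occurs). With 1,000,000 rows of 6 picks each, every one of the 50 numbers should show up about 6,000,000/50 = 120,000 times -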
In [56]: rows = 1000000
In [57]: out = np.random.rand(rows, 50).argpartition(6,axis=1)[:,:6]+1
In [58]: np.bincount(out.ravel())[1:]
Out[58]:
array([120048, 120026, 119942, 119838, 119885, 119669, 119965, 119491,
120280, 120108, 120293, 119399, 119917, 119974, 120195, 119796,
119887, 119505, 120235, 119857, 119499, 120560, 119891, 119693,
120081, 120369, 120011, 119714, 120218, 120581, 120111, 119867,
119791, 120265, 120457, 120048, 119813, 119702, 120266, 120445,
120016, 120190, 119576, 119737, 120153, 120215, 120144, 120196,
120218, 119863])
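The bincount check can be made a bit more explicit: confirm that no row contains a repeat, and compare each count to its expected value rows*k/n. This is a sketch on a smaller, seeded run using the Generator API; the 5% tolerance is an arbitrary illustrative threshold, not a formal statistical test.

```python
import numpy as np

rng = np.random.default_rng(0)
rows, n, k = 100_000, 50, 6
out = rng.random((rows, n)).argpartition(k - 1, axis=1)[:, :k] + 1

# 1) No repeats within a row: after sorting, consecutive entries must differ.
no_repeats = bool((np.diff(np.sort(out, axis=1), axis=1) > 0).all())

# 2) Each value 1..n should appear about rows*k/n times.
counts = np.bincount(out.ravel(), minlength=n + 1)[1:]  # drop the unused 0 bin
expected = rows * k / n
max_rel_dev = np.abs(counts - expected).max() / expected

print(no_repeats, counts.sum() == rows * k, round(float(max_rel_dev), 4))
```

Each count has a standard deviation of roughly sqrt(rows * p * (1 - p)) with p = k/n, i.e. about 100 on an expected 12,000 here, so a deviation of a few percent would already be a red flag.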
Timings on one million rows of data -
In [43]: rows = 1000000
In [44]: %timeit np.random.rand(rows, 50).argpartition(6,axis=1)[:,:6]+1
1 loop, best of 3: 1.07 s per loop
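For context on that timing, here is a sketch of the non-vectorized baseline the trick replaces: one `Generator.choice(n, size=k, replace=False)` call per row. The function names are hypothetical, and no timings are claimed here; the snippet times both on a smaller input so the two can be compared on your own machine.

```python
import timeit

import numpy as np


def loop_choice(rows, n=50, k=6, seed=None):
    # Baseline: one Python-level call per row, sampling without replacement.
    rng = np.random.default_rng(seed)
    return np.array([rng.choice(n, size=k, replace=False) + 1 for _ in range(rows)])


def vectorized(rows, n=50, k=6, seed=None):
    # The rand + argpartition trick, using a Generator for the random keys.
    rng = np.random.default_rng(seed)
    return rng.random((rows, n)).argpartition(k - 1, axis=1)[:, :k] + 1


rows = 10_000
for fn in (loop_choice, vectorized):
    t = timeit.timeit(lambda: fn(rows), number=3) / 3
    print(f"{fn.__name__}: {t:.4f} s per call")
```

The loop pays Python call overhead per row, while the vectorized version does one large allocation and one partition pass, which is where its speed comes from; the trade-off is memory, since it materializes a rows x 50 float array.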