I want to find the index of the n'th occurrence of an item in a list. e.g.,
x=[False,True,True,False,True,False,True,False,False,False,True,False,True]
What is the index of the n'th true? If I wanted the fifth occurrence (4th if zero-indexed), the answer is 10.
I've come up with:
indargs = [ i for i,a in enumerate(x) if a ]
indargs[n]
Note that x.index
returns the first occurrence or the first occurrence after some point, and therefore as far as I can tell is not a solution.
There is also a solution in numpy for cases similar to the above, e.g. using cumsum
and where
, but I'd like to know if there's a numpy-free way to solve the problem.
I'm concerned about performance since I first encountered this while implemented a Sieve of Eratosthenes for a Project Euler problem, but this is a more general question that I have encountered in other situations.
EDIT: I've gotten a lot of great answers, so I decided to do some performance tests. Below are timeit
execution times in seconds for lists with len
nelements searching for the 4000'th/1000'th True. The lists are random True/False. Source code linked below; it's a touch messy. I used short / modified versions of the posters' names to describe the functions except listcomp
, which is the simple list comprehension above.
True Test (100'th True in a list containing True/False)
nelements eyquem_occur eyquem_occurrence graddy taymon listcomp hettinger26 hettinger
3000: 0.007824 0.031117 0.002144 0.007694 0.026908 0.003563 0.003563
10000: 0.018424 0.103049 0.002233 0.018063 0.088245 0.003610 0.003769
50000: 0.078383 0.515265 0.002140 0.078074 0.442630 0.003719 0.003608
100000: 0.152804 1.054196 0.002129 0.152691 0.903827 0.003741 0.003769
200000: 0.303084 2.123534 0.002212 0.301918 1.837870 0.003522 0.003601
True Test (1000'th True in a list containing True/False)
nelements eyquem_occur eyquem_occurrence graddy taymon listcomp hettinger26 hettinger
3000: 0.038461 0.031358 0.024167 0.039277 0.026640 0.035283 0.034482
10000: 0.049063 0.103241 0.024120 0.049383 0.088688 0.035515 0.034700
50000: 0.108860 0.516037 0.023956 0.109546 0.442078 0.035269 0.035373
100000: 0.183568 1.049817 0.024228 0.184406 0.906709 0.035135 0.036027
200000: 0.333501 2.141629 0.024239 0.333908 1.826397 0.034879 0.036551
True Test (20000'th True in a list containing True/False)
nelements eyquem_occur eyquem_occurrence graddy taymon listcomp hettinger26 hettinger
3000: 0.004520 0.004439 0.036853 0.004458 0.026900 0.053460 0.053734
10000: 0.014925 0.014715 0.126084 0.014864 0.088470 0.177792 0.177716
50000: 0.766154 0.515107 0.499068 0.781289 0.443654 0.707134 0.711072
100000: 0.837363 1.051426 0.501842 0.862350 0.903189 0.707552 0.706808
200000: 0.991740 2.124445 0.498408 1.008187 1.839797 0.715844 0.709063
Number Test (750'th 0 in a list containing 0-9)
nelements eyquem_occur eyquem_occurrence graddy taymon listcomp hettinger26 hettinger
3000: 0.026996 0.026887 0.015494 0.030343 0.022417 0.026557 0.026236
10000: 0.037887 0.089267 0.015839 0.040519 0.074941 0.026525 0.027057
50000: 0.097777 0.445236 0.015396 0.101242 0.371496 0.025945 0.026156
100000: 0.173794 0.905993 0.015409 0.176317 0.762155 0.026215 0.026871
200000: 0.324930 1.847375 0.015506 0.327957 1.536012 0.027390 0.026657
Hettinger's itertools solution is almost always the best. taymon's and graddy's solutions are next best for most situations, though the list comprehension approach can be better for short arrays when you want the n'th instance such that n is high or lists in which there are fewer than n occurrences. If there's a chance that there are fewer than n occurrences, the initial count
check saves time. Also, graddy's is more efficient when searching for numbers instead of True/False... not clear why that is. eyquem's solutions are essentially equivalent to others with slightly more or less overhead; eyquem_occur is approximately the same as taymon's solution, while eyquem_occurrence is similar to listcomp.
See Question&Answers more detail:
os