I have an array of unsigned integers stored on the GPU with CUDA (typically 1,000,000 elements). I would like to count the occurrences of every number in the array. There are only a few distinct numbers (about 10), but these numbers can range from 1 to 1,000,000. About 9/10 of the numbers are 0, and I don't need their count. The result looks something like this:
58458 -> 1000 occurrences
15 -> 412 occurrences
I have an implementation using atomicAdds, but it is too slow (many threads write to the same address). Does anyone know of a fast, efficient method?
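The question does not show the kernel, so the following is only a guess at what an atomicAdd-based counter typically looks like, not the asker's actual code. It assumes a dense `counts` array with one unsigned int bin per possible value (1,000,001 bins, about 4 MB). The names `countOccurrences`, `countOnDevice` and `check` are hypothetical. Zeros are skipped because their count is not needed.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical sketch of a naive atomicAdd counter (assumed, not the asker's code).
// counts has numBins entries, one per possible value, zeroed before launch.
__global__ void countOccurrences(const unsigned int* data, size_t n,
                                 unsigned int* counts, unsigned int numBins)
{
    // Grid-stride loop so any grid size covers all n elements.
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n;
         i += (size_t)blockDim.x * gridDim.x) {
        unsigned int v = data[i];
        // Skip zeros (count not needed) and guard against out-of-range values.
        if (v != 0 && v < numBins) {
            // Every element with the same value hits the same word: contention.
            atomicAdd(&counts[v], 1u);
        }
    }
}

// Abort on any CUDA error; a real program would print cudaGetErrorString(err).
static void check(cudaError_t err)
{
    assert(err == cudaSuccess);
    (void)err;
}

// Hypothetical host helper: uploads the data, runs the kernel and
// returns the dense counts array (index = value, entry = occurrences).
std::vector<unsigned int> countOnDevice(const std::vector<unsigned int>& hostData,
                                        unsigned int maxValue)
{
    size_t n = hostData.size();
    size_t numBins = (size_t)maxValue + 1;
    std::vector<unsigned int> hostCounts(numBins, 0);

    unsigned int* dData = nullptr;
    unsigned int* dCounts = nullptr;
    check(cudaMalloc(&dData, n * sizeof(unsigned int)));
    check(cudaMalloc(&dCounts, numBins * sizeof(unsigned int)));
    check(cudaMemcpy(dData, hostData.data(), n * sizeof(unsigned int),
                     cudaMemcpyHostToDevice));
    check(cudaMemset(dCounts, 0, numBins * sizeof(unsigned int)));

    countOccurrences<<<256, 256>>>(dData, n, dCounts, (unsigned int)numBins);
    check(cudaGetLastError());

    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    check(cudaMemcpy(hostCounts.data(), dCounts, numBins * sizeof(unsigned int),
                     cudaMemcpyDeviceToHost));
    check(cudaFree(dData));
    check(cudaFree(dCounts));
    return hostCounts;
}
```

With only about 10 distinct non-zero values, each of the roughly 100,000 non-zero elements increments one of about 10 words in `counts`, so those atomics serialize on a handful of addresses. This is the contention the question describes. The dense layout also means the whole 1,000,001-entry array has to be copied back or scanned just to find the few non-empty bins.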