The reason you are observing a difference in performance is because Thrust is implementing the sort with different algorithms depending on the arguments provided to thrust::sort
.
In case 1., Thrust can prove that the sort can be implemented in linear time with a radix sort. This is because the type of the data to sort is a built-in numeric type (int
), and the comparison function is the built-in less than operation -- Thrust recognizes that thrust::less<int>
will produce the equivalent result as x < y
.
In case 2., Thrust knows nothing about your user-provided less<int>
, and has to use a more conservative algorithm based on a comparison sort which has different asymptotic complexity, even though in truth your less<int>
is equivalent to thrust::less<int>
.
In general, user-defined comparison operators can't be used with more restrictive, faster sorts which manipulate the binary representation of data such as radix sort. In these cases, Thrust falls back on a more general, but slower sort.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…