I am attempting to sort a some arrays lexicographically by rows. The integer case works perfectly:
>>> arr = np.random.choice(10, size=(5, 3))
>>> arr
array([[1, 0, 2],
[8, 0, 8],
[1, 8, 4],
[1, 3, 9],
[6, 1, 8]])
>>> np.ndarray(arr.shape[0], dtype=[('', arr.dtype, arr.shape[1])], buffer=arr).sort()
>>> arr
array([[1, 0, 2],
[1, 3, 9],
[1, 8, 4],
[6, 1, 8],
[8, 0, 8]])
I can also do the sorting with
np.ndarray(arr.shape[0], dtype=[('', arr.dtype)] * arr.shape[1], buffer=arr).sort()
In both cases, the results are the same. However, that is not the case for object arrays:
>>> selection = np.array(list(string.ascii_lowercase), dtype=object)
>>> arr = np.random.choice(selection, size=(5, 3))
>>> arr
array([['t', 'p', 'g'],
['n', 's', 'd'],
['g', 'g', 'n'],
['g', 'h', 'o'],
['f', 'j', 'x']], dtype=object)
>>> np.ndarray(arr.shape[0], dtype=[('', arr.dtype, arr.shape[1])], buffer=arr).sort()
>>> arr
array([['t', 'p', 'g'],
['n', 's', 'd'],
['g', 'h', 'o'],
['g', 'g', 'n'],
['f', 'j', 'x']], dtype=object)
>>> np.ndarray(arr.shape[0], dtype=[('', arr.dtype)] * arr.shape[1], buffer=arr).sort()
>>> arr
array([['f', 'j', 'x'],
['g', 'g', 'n'],
['g', 'h', 'o'],
['n', 's', 'd'],
['t', 'p', 'g']], dtype=object)
Clearly only the case with dtype=[('', arr.dtype)] * arr.shape[1]
is working properly. Why is that? What is different about dtype=[('', arr.dtype, arr.shape[1])]
? The sort is clearly doing something, but the order appears to be nonsensical at first glance. Is it using pointers as the sort keys?
For what it's worth, np.searchsorted
appears to be doing the same sort of comparison as np.sort
, as expected.
See Question&Answers more detail:
os