Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
150 views
in Technique[技术] by (71.8m points)

python - Numpy: Sorting a multidimensional array by a multidimensional array

Forgive me if this is redundant or super basic. I'm coming to Python/Numpy from R and having a hard time flipping things around in my head.

I have a n dimensional array which I want to sort using another n dimensional array of index values. I know I could wrap this in a loop but it seems like there should be a really concise Numpyonic way of beating this into submission. Here's my example code to set up the problem where n=2:

a1 = random.standard_normal(size=[2,5]) 
index = array([[0,1,2,4,3] , [0,1,2,3,4] ]) 

so now I have a 2 x 5 array of random numbers and a 2 x 5 index. I've read the help for take() about 10 times now but my brain is not groking it, obviously.

I thought this might get me there:

take(a1, index)

array([[ 0.29589188, -0.71279375, -0.18154864, -1.12184984,  0.25698875],
       [ 0.29589188, -0.71279375, -0.18154864,  0.25698875, -1.12184984]])

but that's clearly reordering only the first element (I presume because of flattening).

Any tips on how I get from where I am to a solution that sorts element 0 of a1 by element 0 of the index ... element n?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I can't think of how to work this in N dimensions yet, but here is the 2D version:

>>> a = np.random.standard_normal(size=(2,5))
>>> a
array([[ 0.72322499, -0.05376714, -0.28316358,  1.43025844, -0.90814293],
       [ 0.7459107 ,  0.43020728,  0.05411805, -0.32813465,  2.38829386]])
>>> i = np.array([[0,1,2,4,3],[0,1,2,3,4]]) 
>>> a[np.arange(a.shape[0])[:,np.newaxis],i]
array([[ 0.72322499, -0.05376714, -0.28316358, -0.90814293,  1.43025844],
       [ 0.7459107 ,  0.43020728,  0.05411805, -0.32813465,  2.38829386]])

Here is the N-dimensional version:

>>> a[list(np.ogrid[[slice(x) for x in a.shape]][:-1])+[i]]

Here's how it works:

Ok, let's start with a 3 dimensional array for illustration.

>>> import numpy as np
>>> a = np.arange(24).reshape((2,3,4))
>>> a
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

You can access elements of this array by specifying the index along each axis as follows:

>>> a[0,1,2]
6

This is equivalent to a[0][1][2] which is how you would access the same element if we were dealing with a list instead of an array.

Numpy allows you to get even fancier when slicing arrays:

>>> a[[0,1],[1,1],[2,2]]
array([ 6, 18])
>>> a[[0,1],[1,2],[2,2]]
array([ 6, 22])

These examples would be equivalent to [a[0][1][2],a[1][1][2]] and [a[0][1][2],a[1][2][2]] if we were dealing with lists.

You can even leave out repeated indices and numpy will figure out what you want. For example, the above examples could be equivalently written:

>>> a[[0,1],1,2]
array([ 6, 18])
>>> a[[0,1],[1,2],2]
array([ 6, 22])

The shape of the array (or list) you slice with in each dimension only affects the shape of the returned array. In other words, numpy doesn't care that you are trying to index your array with an array of shape (2,3,4) when it's pulling values, except that it will feed you back an array of shape (2,3,4). For example:

>>> a[[[0,0],[0,0]],[[0,0],[0,0]],[[0,0],[0,0]]]
array([[0, 0],
       [0, 0]])

In this case, we're grabbing the same element, a[0,0,0] over and over again, but numpy is returning an array with the same shape as we passed in.

Ok, onto your problem. What you want is to index the array along the last axis with the numbers in your index array. So, for the example in your question you would like [[a[0,0],a[0,1],a[0,2],a[0,4],a[0,3]],a[1,0],a[1,1],...

The fact that your index array is multidimensional, like I said earlier, doesn't tell numpy anything about where you want to pull these indices from; it just specifies the shape of the output array. So, in your example, you need to tell numpy that the first 5 values are to be pulled from a[0] and the latter 5 from a[1]. Easy!

>>> a[[[0]*5,[1]*5],index]

It gets complicated in N dimensions, but let's do it for the 3 dimensional array a I defined way above. Suppose we have the following index array:

>>> i = np.array(range(4)[::-1]*6).reshape(a.shape)
>>> i
array([[[3, 2, 1, 0],
        [3, 2, 1, 0],
        [3, 2, 1, 0]],

       [[3, 2, 1, 0],
        [3, 2, 1, 0],
        [3, 2, 1, 0]]])

So, these values are all for indices along the last axis. We need to tell numpy what indices along the first and second axes these numbers are to be taken from; i.e. we need to tell numpy that the indices for the first axis are:

i1 = [[[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]],

      [[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]]

and the indices for the second axis are:

i2 = [[[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]],

      [[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]]]

Then we can just do:

>>> a[i1,i2,i]
array([[[ 3,  2,  1,  0],
        [ 7,  6,  5,  4],
        [11, 10,  9,  8]],

       [[15, 14, 13, 12],
        [19, 18, 17, 16],
        [23, 22, 21, 20]]])

The handy numpy function which generates i1 and i2 is called np.mgrid. I use np.ogrid in my answer which is equivalent in this case because of the numpy magic I talked about earlier.

Hope that helps!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...