Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Convert a numpy array of lists to a numpy array

I have some data stored as a numpy array with dtype=object, and I would like to extract one column of lists and convert it to a regular numpy array. It seems like a simple problem, but the only way I've found to solve it is to recast the column as a list of lists and then recast that as a numpy array. Is there a more Pythonic approach?

import numpy as np

arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)
arr = arr[:, 1]

print(arr)
# [['a', 'b', 'c'] ['a', 'b', 'c']]

type(arr)
# numpy.ndarray
type(arr[0])
# list

arr.shape
# (2,)

Recasting the array as dtype=str raises a ValueError since it is trying to convert each list to a string.

arr.astype(str)
# ValueError: setting an array element with a sequence

It is possible to rebuild the entire array as a list of lists and then cast it as a numpy array, but this seems like a roundabout way.

arr_2 = np.array(list(arr))

type(arr_2)
# numpy.ndarray
type(arr_2[0])
# numpy.ndarray

arr_2.shape
# (2, 3)

Is there a better way to do this?


1 Reply


Going by way of lists is faster than going by way of vstack. In the timings below, arr is the original 2-D object array, so arr[:,1] is the column of lists:

In [1617]: timeit np.array(arr[:,1].tolist())
...
100000 loops, best of 3: 11.5 μs per loop
In [1618]: timeit np.vstack(arr[:,1])
...
10000 loops, best of 3: 54.1 μs per loop

vstack is effectively doing:

np.concatenate([np.atleast_2d(a) for a in arr[:,1]],axis=0)
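
As a quick check of that equivalence, the sketch below rebuilds the example array from the question and compares np.vstack with the explicit concatenate expression. The names col, via_vstack and via_concat are only illustrative, not part of the original answer.

```python
import numpy as np

# Same data as in the question; `arr` is the original 2-D object array.
arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)
col = arr[:, 1]  # 1-D object array whose elements are Python lists

via_vstack = np.vstack(col)

# Promote each list to a 2-D row array, then join the rows along axis 0.
via_concat = np.concatenate([np.atleast_2d(a) for a in col], axis=0)

print(via_vstack.shape, via_concat.shape)
print(np.array_equal(via_vstack, via_concat))
```

The extra per-row array construction is a plausible reason the vstack route is slower than building one array from a list of lists.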

Some alternatives:

In [1627]: timeit np.array([a for a in arr[:,1]])
100000 loops, best of 3: 18.6 μs per loop
In [1629]: timeit np.stack(arr[:,1],axis=0)
10000 loops, best of 3: 60.2 μs per loop
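
To rerun this comparison outside IPython, something like the following could be used. It is a rough sketch built on the standard timeit module; the candidates dictionary and the repeat count are arbitrary choices, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np

# Same data as in the question; `arr` is the original 2-D object array.
arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)

# The four conversions timed in the answer (labels are illustrative).
candidates = {
    "tolist": lambda: np.array(arr[:, 1].tolist()),
    "list comprehension": lambda: np.array([a for a in arr[:, 1]]),
    "vstack": lambda: np.vstack(arr[:, 1]),
    "stack": lambda: np.stack(arr[:, 1], axis=0),
}

reference = candidates["tolist"]()
n_calls = 2000  # arbitrary; increase for steadier numbers

for name, func in candidates.items():
    # Sanity check: every route must produce the same (2, 3) array.
    assert np.array_equal(func(), reference)
    seconds = timeit.timeit(func, number=n_calls)
    print(f"{name:>20}: {seconds / n_calls * 1e6:.1f} µs per call")
```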

Keep in mind that the object array just contains pointers to the lists, which live elsewhere in memory. While the 2-D nature of arr makes it easy to select the 2nd column, arr[:,1] is effectively a list of lists, and most operations on it treat it as such. Things like reshape don't cross that object boundary.
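
A small illustration of that object boundary, assuming the same example data as the question. The names inner, col, reshaped and dense are hypothetical, chosen only for this sketch.

```python
import numpy as np

inner = ['a', 'b', 'c']
arr = np.array([[1, inner], [2, ['a', 'b', 'c']]], dtype=object)
col = arr[:, 1]

# The array stores a reference to the very same list object, not a copy.
same_object = col[0] is inner
print(same_object)

# reshape only rearranges the two references; it cannot look inside the lists.
reshaped = col.reshape(2, 1)
print(reshaped.shape, reshaped.dtype)

# Asking for (2, 3) fails because the array holds 2 elements, not 6.
try:
    col.reshape(2, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)

# Going through a list of lists builds a genuine 2-D array of strings.
dense = np.array(col.tolist())
print(dense.shape, dense.dtype)
```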



...