Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Function application over numpy's matrix row/column

I am using NumPy to store data in matrices. Coming from an R background, I am used to an extremely simple way to apply a function over the rows, the columns, or both of a matrix.

Is there something similar for the Python/NumPy combination? Writing my own little implementation is not a problem, but it seems to me that most of the versions I come up with will be significantly less efficient and more memory-intensive than any existing implementation.

I would like to avoid copying from the numpy matrix to a local variable etc., is that possible?

The functions I am trying to implement are mainly simple comparisons (e.g. how many elements of a certain column are smaller than number x or how many of them have absolute value larger than y).


1 Reply


Almost all numpy functions operate on whole arrays, and/or can be told to operate on a particular axis (row or column).

As long as you can define your function in terms of numpy functions acting on numpy arrays or array slices, your function will automatically operate on whole arrays, rows or columns.
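
As a rough illustration of the two points above, here is a minimal sketch of the kind of counting the question describes, done per column and per row with the axis argument. The sample matrix and the thresholds x and y are made up for the example.

```python
import numpy as np

# Hypothetical sample data: 3 rows, 4 columns.
a = np.array([[ 1, -5,  3,  0],
              [-2,  4, -6,  7],
              [ 8, -1,  2, -9]])

x = 3  # threshold for "smaller than" (assumed value)
y = 4  # threshold for "absolute value larger than" (assumed value)

# A comparison yields a boolean array; summing it counts the True entries.
smaller_per_column = (a < x).sum(axis=0)            # one count per column
smaller_per_row = (a < x).sum(axis=1)               # one count per row
large_abs_per_column = (np.abs(a) > y).sum(axis=0)  # one count per column
large_abs_total = (np.abs(a) > y).sum()             # no axis: whole matrix

print(smaller_per_column)    # [2 2 2 2]
print(smaller_per_row)       # [3 2 3]
print(large_abs_per_column)  # [1 1 1 2]
print(large_abs_total)       # 5
```

Here axis=0 collapses the rows and gives one result per column, while axis=1 collapses the columns and gives one result per row.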

It may be more helpful to ask about how to implement a particular function to get more concrete advice.


Numpy provides np.vectorize and np.frompyfunc to turn Python functions which operate on numbers into functions that operate on numpy arrays.

For example,

import numpy as np

def myfunc(a, b):
    # Return the larger of two scalars.
    if a > b:
        return a
    else:
        return b

vecfunc = np.vectorize(myfunc)
result = vecfunc([[1, 2, 3], [5, 6, 9]], [7, 4, 5])
print(result)
# [[7 4 5]
#  [7 6 9]]

(The elements of the first array get replaced by the corresponding element of the second array when the second is bigger.)

But don't get too excited; np.vectorize and np.frompyfunc are just syntactic sugar. They don't actually make your code any faster. If your underlying Python function is operating on one value at a time, then np.vectorize will feed it one item at a time, and the whole operation is going to be pretty slow (compared to using a numpy function which calls some underlying C or Fortran implementation).
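
For this particular function a built-in equivalent already exists. The sketch below restates the scalar function so that it runs on its own, and checks that np.maximum gives the same result as the np.vectorize wrapper. The variable names are only illustrative.

```python
import numpy as np

def myfunc(a, b):
    # Scalar version: return the larger of two numbers.
    return a if a > b else b

first = np.array([[1, 2, 3], [5, 6, 9]])
second = np.array([7, 4, 5])

via_vectorize = np.vectorize(myfunc)(first, second)  # calls myfunc once per element
via_maximum = np.maximum(first, second)              # compiled element-wise maximum

print(np.array_equal(via_vectorize, via_maximum))
# True
```

Both calls broadcast the second array across the rows of the first; only np.maximum avoids the per-element Python call.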


To count how many elements of column x are smaller than a number y, you could use an expression such as:

(array['x']<y).sum()

For example:

import numpy as np
array = np.arange(6).view([('x', int), ('y', int)])
print(array)
# [(0, 1) (2, 3) (4, 5)]

print(array['x'])
# [0 2 4]

print(array['x']<3)
# [ True  True False]

print((array['x']<3).sum())
# 2
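
The same count also works on a plain 2-D array, which may be closer to what the question has in mind. This is a minimal sketch, assuming the data sits in an ordinary integer array (the name matrix is hypothetical). Selecting a column with basic slicing gives a view onto the original data rather than a copy.

```python
import numpy as np

# Hypothetical plain 2-D array holding the same values as the example above.
matrix = np.arange(6).reshape(3, 2)

first_column = matrix[:, 0]  # basic slicing returns a view, not a copy

print(np.shares_memory(first_column, matrix))
# True

print((first_column < 3).sum())
# 2
```

The comparison itself still creates a temporary boolean array, one element per entry in the column, but the column data is never copied.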
