Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
640 views
in Technique[技术] by (71.8m points)

python - Finding the top n values in a row of a scipy sparse matrix

I have a scipy sparse matrix in CSR format. It's 72665x72665 so it's impractical to convert this matrix to a dense matrix to perform operations on (the dense representation of this matrix is like 40 gigs). The matrix is symmetric, and has about 82 million non-zero entries (~1.5%).

What I would like to be able to do is, for each row, I want to get the indices of the largest N values. If this were a numpy array, I would use np.argpartition to do it like so:

    for row in matrix:
        top_n_idx = np.argpartition(row,-n)[-n:]

Is there something similar to this I can do for a sparse matrix?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Improve from @Paul Panzer's solution. Now it can handle the case when any row has less than n values.

def top_n_idx_sparse(matrix, n):
    '''Return index of top n values in each row of a sparse matrix'''
    top_n_idx = []
    for le, ri in zip(matrix.indptr[:-1], matrix.indptr[1:]):
        n_row_pick = min(n, ri - le)
        top_n_idx.append(matrix.indices[le + np.argpartition(matrix.data[le:ri], -n_row_pick)[-n_row_pick:]])
    return top_n_idx

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...