Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
165 views
in Technique[技术] by (71.8m points)

r - Computing time complexity of the sparse matrix (2)

I have a data set (D) of (nxd) where n=number of rows and d= number of dimensions, I create a similarity matrix (S)(nxn) by comparing each row of the data set (D) and then convert it into a sparse matrix (tx3) where t is the number of non-zero elements of the symmetric similarity matrix (S)

The time complexity of creating the similarity matrix is o(n^2d) where d is some constant operation. The time complexity of converting a sparse matrix is theta(n^2)

My question is: While creating the similarity matrix if I perform a check that "if the similarity value is "zero" then proceed (continue) else put it into the sparse matrix". Assuming this can I say that the cost of computing the sparse matrix from the dataset (D) is O(n^2 d).

For Example:

Creating Similarity Matrix:

for i in range(0,n):
    for j in range(0,n):
        find similarity_value of D[i] and D[j]
        insert into similarity_matrix: S[i,j]= similarity_value

The above runs in O(n^2 d)
    n^2 for the loops
    d   for finding the similarity between D[i] and D[j]

Sparse Matrix creation form Simiarity matrix

for i in range(0,n):
    for j in range(0,n):
        if S[i,j]==0:
            continue
        else
            insert into sparse_matrix [i, j, S[i,j]]

The above runs in O(n^2)
    n^2 for the loops

Performing both the operation would require O(n^2 d) +O(n^2) if done one after another.

Since we require only the sparse_matrix, we create the sparse matrix directly without creating the similarity matrix.

Creating Sparse matrix directly without creating the similarity matrix:

for i in range(0,n):
    for j in range(0,n):
        find similarity_val of D[i] and D[j]
        if similarity_val==0:
            continue
        else
            insert into sparse_matrix [i,j,similarity_val]

My question is:

Wouldn't the above run in only O(n^2 d), since I am directly inserting into sparse matrix
    n^2  for the two loops
    d    for finding the similarity_val of D[i] and D[j]

Please let me know if I am missing something or my understanding of something is wrong.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

For every (i, j) pair (there are n^2 in total), you reach the inner part of the loop where you find the similarity and then conditionally add elements to your sparse matrix. Finding the similarity takes "d" operations (because you need to loop through each of your dimensions) and conditionally adding an element takes a constant number of operations (either 1 comparison operation in the case where the value is 0 and 1 comparison operation plus one insertion operation in the case where the value is non-zero). Since you need to do "d" plus a constant number of operations each time you reach the inside of this double loop, in total you perform O(n^2 d) operations.

Note that this asymptotic operation count will not change if you limit the inner loop to values of j that are no less than i (aka replace for j in range(0, n) with for j in range(i, n)). This is because you will reach the inside of the loop n*(n+1)/2 times and perform "d" plus a constant number of operations, which is still O(n^2 d) total computation.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...