I have a data set D of size n x d, where n is the number of rows and d is the number of dimensions. I create an n x n similarity matrix S by comparing every pair of rows of D, and then convert it into a sparse matrix of size t x 3, where t is the number of non-zero elements of the symmetric similarity matrix S.
The time complexity of creating the similarity matrix is O(n^2 d), where d is the cost of comparing one pair of rows.
The time complexity of converting the similarity matrix into a sparse matrix is Theta(n^2).
My question is:
While creating the similarity matrix, suppose I check each value as it is computed: if the similarity value is zero I skip it (continue), otherwise I insert it into the sparse matrix. Under that assumption, can I say that the cost of computing the sparse matrix directly from the data set D is O(n^2 d)?
For Example:
Creating Similarity Matrix:
for i in range(0,n):
    for j in range(0,n):
        find similarity_value of D[i] and D[j]
        insert into similarity_matrix: S[i,j] = similarity_value
The above runs in O(n^2 d)
n^2 for the loops
d for finding the similarity between D[i] and D[j]
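To make the pseudocode above concrete, here is a minimal Python sketch. The `similarity` function is a placeholder I made up for illustration (a plain dot product); any measure that touches each of the d coordinates once would give the same bound. The names `build_similarity_matrix` and the sample `D` are likewise hypothetical.

```python
def similarity(a, b):
    # Hypothetical O(d) similarity: dot product of two rows of length d.
    return sum(x * y for x, y in zip(a, b))

def build_similarity_matrix(D):
    n = len(D)
    # Dense n x n matrix: Theta(n^2) memory.
    S = [[0] * n for _ in range(n)]
    for i in range(n):          # n iterations
        for j in range(n):      # n iterations
            S[i][j] = similarity(D[i], D[j])  # O(d) work per pair
    return S

# Tiny sample data set: n = 3 rows, d = 2 dimensions.
D = [[1, 0], [0, 1], [1, 1]]
S = build_similarity_matrix(D)
```

The inner statement runs n^2 times and costs O(d) each time, which is where the O(n^2 d) total comes from.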
Sparse matrix creation from the similarity matrix:
for i in range(0,n):
    for j in range(0,n):
        if S[i,j] == 0:
            continue
        else:
            insert into sparse_matrix [i, j, S[i,j]]
The above runs in O(n^2)
n^2 for the loops
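A minimal Python sketch of this conversion step, assuming the sparse matrix is stored as a list of (row, column, value) triples. The function name `dense_to_triples` and the sample matrix are made up for illustration.

```python
def dense_to_triples(S):
    n = len(S)
    triples = []
    for i in range(n):
        for j in range(n):
            if S[i][j] == 0:
                continue  # skip zero entries
            # list.append is amortized O(1), so each cell costs O(1).
            triples.append((i, j, S[i][j]))
    return triples

# Sample symmetric similarity matrix with two zero entries.
S = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 2]]
triples = dense_to_triples(S)
```

Every one of the n^2 cells is visited exactly once regardless of how many are zero, so the scan is Theta(n^2) even when t is small.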
Performing both operations one after another would require O(n^2 d) + O(n^2).
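As far as I can tell, the sum of the two passes simplifies as follows, assuming d is at least 1 (every row has at least one dimension):

```latex
O(n^2 d) + O(n^2) = O(n^2 d + n^2) = O\big(n^2 (d + 1)\big) = O(n^2 d) \quad \text{for } d \ge 1
```

So the second pass is already dominated by the first in asymptotic terms.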
Since we require only the sparse_matrix, we create the sparse matrix directly without creating the similarity matrix.
Creating Sparse matrix directly without creating the similarity matrix:
for i in range(0,n):
    for j in range(0,n):
        find similarity_val of D[i] and D[j]
        if similarity_val == 0:
            continue
        else:
            insert into sparse_matrix [i, j, similarity_val]
My question is:
Wouldn't the above run in only O(n^2 d), since I am inserting directly into the sparse matrix?
n^2 for the two loops
d for finding the similarity_val of D[i] and D[j]
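Here is a minimal Python sketch of the direct approach, under the same assumptions as the pseudocode: a hypothetical O(d) `similarity` function (a dot product, chosen only for illustration) and a list of (row, column, value) triples as the sparse representation. The name `build_sparse_directly` is also made up.

```python
def similarity(a, b):
    # Hypothetical O(d) similarity: dot product of two rows of length d.
    return sum(x * y for x, y in zip(a, b))

def build_sparse_directly(D):
    n = len(D)
    triples = []
    for i in range(n):
        for j in range(n):
            val = similarity(D[i], D[j])  # O(d) per pair
            if val == 0:
                continue                   # zero entries are never stored
            triples.append((i, j, val))   # amortized O(1)
    return triples

# Tiny sample data set: n = 3 rows, d = 2 dimensions.
D = [[1, 0], [0, 1], [1, 1]]
sparse = build_sparse_directly(D)
```

Each of the n^2 pairs costs O(d) for the similarity plus O(1) for the check and the insert, so the loop is O(n^2 d). What the single pass avoids is the Theta(n^2) memory for the dense matrix S, not a change in the asymptotic running time.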
Please let me know if I am missing something or my understanding of something is wrong.