pyspark - Need the entire sparse matrix in memory to do PCA?

Posted Mar 6, 2021 in Technique by 深蓝 (71.8m points)

The problem is from a recommendations project.

The data has ~300K users and ~200K items.

The user-item ratings matrix would be sparse and huge, much larger than can fit in RAM.
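To put the size concern in numbers, here is a rough back-of-the-envelope estimate. The user and item counts are the approximate ones given above; the 8-byte value type, the 4-byte indices, and the "1 in 1000 cells rated" density are assumptions made only for illustration, not figures from the project.

```python
# Rough memory estimate for the user-item ratings matrix.
n_users = 300_000            # approximate, from the question
n_items = 200_000            # approximate, from the question
bytes_per_value = 8          # assumption: float64 ratings
bytes_per_index = 4          # assumption: int32 indices
cells_per_rating = 1000      # assumption: 1 in 1000 cells holds a rating

# Dense storage keeps every cell, rated or not.
dense_bytes = n_users * n_items * bytes_per_value

# CSR storage keeps one value and one column index per stored rating,
# plus one row pointer per row (and one extra at the end).
nnz = n_users * n_items // cells_per_rating
csr_bytes = nnz * (bytes_per_value + bytes_per_index) + (n_users + 1) * bytes_per_index

print(f"dense: {dense_bytes / 1e9:.0f} GB")
print(f"CSR:   {csr_bytes / 1e9:.2f} GB")
```

Under these assumptions the dense form needs about 480 GB, while the CSR form stays under 1 GB, so whether the data fits in RAM depends mostly on the real density and on never densifying the matrix.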

I first want to get latent representations of the users with PCA, and then run similarity analyses on those latent vectors using something like approximate nearest neighbors.
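A minimal sketch of that pipeline on a small synthetic matrix might look like the following. It is not a definitive solution: `TruncatedSVD` stands in for PCA because it accepts a scipy sparse matrix without densifying it (note that, unlike PCA, it does not mean-center the data), and an exact brute-force `NearestNeighbors` search stands in for an approximate index. The matrix sizes, the number of ratings, and the 20 components are made-up values.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for the real ratings (all sizes here are assumptions).
rng = np.random.default_rng(0)
n_users, n_items, n_ratings = 1000, 500, 20_000
rows = rng.integers(0, n_users, size=n_ratings)
cols = rng.integers(0, n_items, size=n_ratings)
vals = rng.integers(1, 6, size=n_ratings).astype(np.float64)  # ratings 1..5

# Build the matrix directly in sparse form; duplicate (row, col) pairs are summed.
ratings = sparse.csr_matrix((vals, (rows, cols)), shape=(n_users, n_items))

# TruncatedSVD works on the sparse matrix as-is and returns a dense
# (n_users, n_components) array of latent user vectors.
svd = TruncatedSVD(n_components=20, random_state=0)
user_vectors = svd.fit_transform(ratings)

# Exact cosine neighbours; at full scale an approximate index would
# replace this step.
nn = NearestNeighbors(n_neighbors=6, metric="cosine", algorithm="brute")
nn.fit(user_vectors)
distances, neighbors = nn.kneighbors(user_vectors[:5])

print(user_vectors.shape, neighbors.shape)
```

Only the latent vectors are dense here, and their size grows with the number of users times the number of components rather than with the number of items.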

How can I approach this problem?

I have the options of using PySpark and/or sklearn.

Asked by candide; translated from Stack Overflow.

1 Reply

...
