Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
419 views
in Technique by (71.8m points)

pyspark - Need the entire sparse matrix in memory to do PCA?

The problem comes from a recommendation project.

The data has ~300K users and ~200K items.

The user-item ratings matrix would be sparse and huge, much larger than can fit in RAM.

I first want to get latent representations of the users with PCA, and then run similarity analyses on those latent vectors using something like approximate nearest neighbors.

How can I approach this problem?

I have the options of using PySpark and/or sklearn.

  asked by candide, translated from SO


1 Reply

0 votes
by (71.8m points)
Waiting for an expert to reply.
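
A minimal sketch of one possible approach to the question, assuming SciPy and scikit-learn are available. It keeps the ratings in a SciPy CSR matrix, which stores only the non-zero entries, and never builds the dense user-item matrix. Two substitutions are worth naming plainly: `TruncatedSVD` is used in place of PCA (it accepts sparse input directly but does not mean-center the data, so the result is not exactly PCA), and the exact `NearestNeighbors` search stands in for an approximate nearest neighbor library. The toy sizes, the random ratings, and the choice of 32 components are illustrative assumptions, not values from the question.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

# Toy stand-in for the real data (assumption: ratings arrive as
# (user, item, rating) triples; sizes are scaled down from ~300K x ~200K).
rng = np.random.default_rng(0)
n_users, n_items, n_ratings = 3000, 2000, 60000
users = rng.integers(0, n_users, size=n_ratings)
items = rng.integers(0, n_items, size=n_ratings)
ratings = rng.integers(1, 6, size=n_ratings).astype(np.float32)

# CSR keeps only the non-zero entries. Repeated (user, item) pairs in this
# random toy data are summed during the COO -> CSR conversion.
R = sparse.coo_matrix(
    (ratings, (users, items)), shape=(n_users, n_items)
).tocsr()

# TruncatedSVD works on the sparse matrix without densifying it.
# Unlike PCA, it does not subtract the column means first.
svd = TruncatedSVD(n_components=32, algorithm="randomized", random_state=0)
user_vecs = svd.fit_transform(R)  # dense array of shape (n_users, 32)

# Exact cosine search on the small latent vectors; an approximate
# nearest neighbor index could replace this step at full scale.
nn = NearestNeighbors(n_neighbors=6, metric="cosine").fit(user_vecs)
dist, idx = nn.kneighbors(user_vecs[:5])

print(user_vecs.shape, idx.shape)
```

The latent vectors are dense but small (users × components), so they are far cheaper to hold in memory than the full ratings matrix. On the PySpark side, MLlib's distributed `RowMatrix` offers SVD and principal component routines that may be worth checking if even the sparse matrix does not fit on one machine.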
