My question is equivalent to the R-related post Create Sparse Matrix from a data frame, except that I would like to do the same thing in Spark (preferably in Scala).
A sample of the data in the data.txt file from which the sparse matrix is being created:
UserID MovieID Rating
2 1 1
3 2 1
4 2 1
6 2 1
7 2 1
So in the end, the columns are the movie IDs and the rows are the user IDs:
1 2 3 4 5 6 7
1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0
3 0 1 0 0 0 0 0
4 0 1 0 0 0 0 0
5 0 0 0 0 0 0 0
6 0 1 0 0 0 0 0
7 0 1 0 0 0 0 0
I've actually started by applying a map RDD transformation to the data.txt file (with the header removed) to convert the values to Int, but then ... I could not find a function for creating the sparse matrix.
import org.apache.spark.mllib.recommendation.Rating

val data = sc.textFile("/data/data.txt")
// The sample above is whitespace-separated, so split on "\\s+" rather than ','
val ratings = data.map(_.split("\\s+") match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toInt)
})
...?
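For what it's worth, one way this could work (a sketch, not a verified solution) is to map each `Rating` to a `MatrixEntry` and wrap the result in MLlib's distributed `CoordinateMatrix`, which stores only the non-zero entries. The local-mode `SparkContext` setup and the inlined sample rows below are for illustration only:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.mllib.recommendation.Rating

// Hypothetical local setup; in a real job sc would already exist
val conf = new SparkConf().setAppName("sparse-ratings").setMaster("local[*]")
val sc = new SparkContext(conf)

// Same shape as the data.txt sample (whitespace-separated, header removed)
val raw = sc.parallelize(Seq("2 1 1", "3 2 1", "4 2 1", "6 2 1", "7 2 1"))
val ratings = raw.map(_.split("\\s+") match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toInt)
})

// Each Rating becomes one non-zero entry of a distributed sparse matrix:
// row index = user ID, column index = movie ID, value = rating
val entries = ratings.map { case Rating(u, m, r) =>
  MatrixEntry(u.toLong, m.toLong, r.toDouble)
}
val sparseMat = new CoordinateMatrix(entries)
```

`CoordinateMatrix` sizes itself as (max index + 1) in each dimension, so with the sample above it reports 8 rows and 3 columns; if you need a row-oriented form (e.g. for row-wise operations), `sparseMat.toIndexedRowMatrix()` converts it.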