I have an RDD of words, and then another RDD whose records contain a string; if a word in that string matches a word from the first RDD, it should be removed from the string.
val wordList = sc.textFile("wordList.txt").map(x => x.split(',')).map(x => x(0))
Sample of wordList:
res15: Array[String] = Array(basetting, choosinesses, concavenesses, crabbinesses, cupidinously, falliblenesses, fleecinesses, hackishes, immaterialnesses, impiousnesses)
Then I have my other RDD:
val filterWord = posts.map(x => (x._1, x._2.split(" ").filter(x => x != (wordList)))
Sample of filterWord:
res16: Array[(String, Array[String])] = Array((6,Array(how, sweet, is, it, that, we, have)), (2,Array("")), (2,Array(will, this, question, cause, an, error)), (2,Array("")), (4,Array(how, do, we, create, a, new, tag, in)), (7,Array("")), (2,Array(test, after, clr, on)), (2,Array("")), (2,Array(testing, a, long, tag)), (2,Array("")))
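The symptom can be reproduced without Spark. In `x != (wordList)` each token, a `String`, is compared with the `wordList` object itself (an `RDD[String]` in the question), not with the words inside it, so the two are never equal: `!=` keeps every token and `==` drops every token. The minimal sketch below assumes a plain `Seq[String]` as a stand-in for the RDD, with sample words chosen only for illustration. It spells the comparison as `equals`, which is what `!=` calls for a non-null `String`, to avoid compiler warnings or errors about comparing unrelated types.

```scala
object ComparisonDemo {
  // Hypothetical stand-in for the question's wordList RDD: a plain Seq.
  val wordList: Seq[String] = Seq("how", "is", "tag")

  // One post body, split into tokens as in the question.
  val tokens: Array[String] = "how sweet is it that we have".split(" ")

  // For a non-null token t, `t != wordList` behaves like `!t.equals(wordList)`:
  // a single String is compared with the whole collection object, never with
  // its elements, so the two are never equal and every token is kept.
  val keptWithNotEquals: Array[String] = tokens.filter(t => !t.equals(wordList))

  // The `==` version is the same test negated, so it rejects every token.
  val keptWithEquals: Array[String] = tokens.filter(t => t.equals(wordList))

  def main(args: Array[String]): Unit = {
    println(keptWithNotEquals.mkString(", ")) // every token survives
    println(keptWithEquals.length)            // no token survives
  }
}
```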
I need filterWord to contain only the words that are not in wordList, but this doesn't seem to work: it does not filter out any of the words in wordList, and if I change != to == it filters out everything.
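One common way to get the intended behaviour is to test each token for membership in a `Set` of the listed words instead of comparing it with the collection. In Spark an RDD cannot be referenced inside another RDD's transformation, so the word list would typically be collected to the driver with `wordList.collect().toSet` and shared with the executors via `sc.broadcast`, with the closure reading `.value`; that Spark form appears only as a comment below. The runnable part is a plain-Scala sketch under these assumptions: the sample posts and words are illustrative, `removeListedWords` is a hypothetical helper name, and the word list is small enough to fit in memory (a very large list would call for a join-based approach instead, which this sketch does not cover).

```scala
object StopWordFilter {
  // Remove every token of `text` that appears in `stopWords`.
  // Set#contains looks the token up among the set's elements, which is the
  // element-wise test that `x != wordList` never performed.
  def removeListedWords(text: String, stopWords: Set[String]): Array[String] =
    text.split(" ").filterNot(stopWords.contains)

  // Illustrative data shaped like the question's (id, text) pairs.
  val wordList: Set[String] = Set("how", "we", "a", "tag")
  val posts: Seq[(String, String)] = Seq(
    ("6", "how sweet is it that we have"),
    ("4", "how do we create a new tag in")
  )

  val filterWord: Seq[(String, Array[String])] =
    posts.map { case (id, text) => (id, removeListedWords(text, wordList)) }

  // In Spark (not run here), with wordList being the question's RDD[String],
  // the same idea would look roughly like:
  //   val stopWords = sc.broadcast(wordList.collect().toSet)
  //   val filterWord = posts.map { case (id, text) =>
  //     (id, text.split(" ").filterNot(stopWords.value.contains)) }

  def main(args: Array[String]): Unit =
    filterWord.foreach { case (id, words) => println(s"$id: ${words.mkString(", ")}") }
}
```

Using a `Set` also keeps each lookup cheap, whereas scanning a `Seq` for every token would grow with the size of the word list.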