I am looking for an algorithm, a hint, or any source code that can solve the following problem.
I have a folder that contains many text files. I read them and store all the text in a string. Now I want to determine whether each word also appears in any of the other files. (I know this is not clear, so let me give an example.)
For example, I have three documents:
Doc A => "brown fox jump"
Doc B => "dog not jump"
Doc C => "fox jump dog"
Let's say my program reads the first document, and the first word is "brown". The program checks whether this word also appears in any other document; here the answer is 0 (none). Then it checks the second word, "fox", and reports that it also appears in Doc C, and so on.
Next it reads Doc B and checks whether "dog" appears in any other document; the answer is Doc C, and so on.
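One common approach (a suggestion, not something stated in the question) is an inverted index: a map from each word to the set of documents that contain it. Build it once, then for every word in a document, subtract the current document from that word's set. The sketch below assumes the files are already loaded into a dict of document name to text, splits on whitespace only, and is case-sensitive; the function names `build_inverted_index` and `other_docs_per_word` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.split():  # assumption: whitespace tokenization
            index[word].add(name)
    return index


def other_docs_per_word(docs):
    """For each document, pair every word with the OTHER documents containing it."""
    index = build_inverted_index(docs)
    result = {}
    for name, text in docs.items():
        # Remove the current document so only other documents are reported.
        result[name] = [(word, sorted(index[word] - {name})) for word in text.split()]
    return result


# The three example documents from the question.
docs = {
    "Doc A": "brown fox jump",
    "Doc B": "dog not jump",
    "Doc C": "fox jump dog",
}

for name, hits in other_docs_per_word(docs).items():
    for word, others in hits:
        # Print 0 when the word appears in no other document, as in the question.
        print(f"{name}: {word!r} -> {others if others else 0}")
```

To fill `docs` from the folder, something like `{p.name: p.read_text() for p in pathlib.Path(folder).glob("*.txt")}` should work. The index costs one pass over all words, after which each check is a dictionary lookup instead of a rescan of every other file.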
Any advice or pseudocode?
Hint: this is related to inverse document frequency (IDF). I know what IDF is.
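Since IDF came up: the count the question asks for is closely tied to document frequency (df), the number of documents containing a word. For any word in the current document, "number of other documents containing it" is df − 1. A minimal sketch follows, assuming the plain unsmoothed form idf = log(N / df) with the natural logarithm; other IDF variants (for example smoothed ones) exist, and the names `document_frequency` and `idf` here are hypothetical.

```python
import math
from collections import defaultdict


def document_frequency(docs):
    """Count, for each word, how many documents contain it at least once."""
    df = defaultdict(int)
    for text in docs.values():
        for word in set(text.split()):  # set(): count each document once per word
            df[word] += 1
    return dict(df)


def idf(docs):
    """Unsmoothed inverse document frequency: log(N / df), natural log."""
    n = len(docs)
    return {word: math.log(n / count) for word, count in document_frequency(docs).items()}


docs = {
    "Doc A": "brown fox jump",
    "Doc B": "dog not jump",
    "Doc C": "fox jump dog",
}

df = document_frequency(docs)
print(df["fox"] - 1)      # other documents containing "fox"
print(idf(docs)["jump"])  # "jump" is in every document, so its IDF is 0
```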