algorithm - How to count words in java

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am looking for an algorithm, a hint, or any source code that can solve the following problem.

I have a folder that contains many text files. I read them and store all the text in a String. Now I want to find out, for each word, whether it also appears in any of the other files. (I know that's not clear, so let me give an example.)

For example, I have three documents: Doc A => "brown fox jump", Doc B => "dog not jump", Doc C => "fox jump dog".

Let's say my program reads the first document, and the first word is "brown". My program should now check whether this word also appears in any other document. Here the answer would be 0. Then it checks the second word, "fox", and reports that yes, it appears in Doc C, and so on. Next it reads Doc B and checks whether "dog" appears in any other document. The answer would be Doc C, and so on.

Any advice or pseudo code?

Hint: It is also called inverse document frequency (idf). I know what idf is.

1 Reply

深蓝 · Answer 1 · 2021-10-17T03:07:51+0000

As GregS said, use a HashMap. I'm not posting any code, because I think this is homework and I want to give you the opportunity to write it on your own, but the outline is:

Open a new document.
For every word, check whether it is already in your HashMap. If it isn't, create a new key in the HashMap for this word and store the document's filename as its value. If it is, just add the document's filename to the existing value.

For example, if you have:

DocA: Brown fox jump
DocB: Fox jump dog

You would open DocA and traverse its contents. 'brown' is not in your HashMap, so you would add a new element with key 'brown' and value 'DocA'. The same goes for 'fox' and 'jump'. Then you would open DocB. 'fox' is already in your HashMap, so you would add DocB to its value (the value would then be 'DocA DocB'). Using an ArrayList (in Java) as the value would probably help.
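For reference, the outline above could be sketched in Java roughly as follows. This is only one possible reading of it, not a definitive implementation. The class name `InvertedIndexSketch` and the helpers `buildIndex` and `otherDocs` are made up for illustration. Lowercasing the text and splitting on whitespace are assumptions, and the documents are passed in as in-memory strings (the three from the question) instead of being read from a folder.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a sketch of the HashMap outline, not a full solution.
class InvertedIndexSketch {

    // Builds a map from each word to the names of the documents containing it.
    // docs maps a document name (e.g. a filename) to its full text.
    static Map<String, List<String>> buildIndex(Map<String, String> docs) {
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // Assumption: words are case-insensitive and separated by whitespace.
            for (String word : doc.getValue().toLowerCase().trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip the empty token produced by an empty document
                }
                // Create the list the first time a word is seen.
                List<String> names = index.computeIfAbsent(word, k -> new ArrayList<>());
                // Record each document at most once per word.
                if (!names.contains(doc.getKey())) {
                    names.add(doc.getKey());
                }
            }
        }
        return index;
    }

    // Returns the documents, other than `current`, that contain `word`.
    static List<String> otherDocs(Map<String, List<String>> index, String word, String current) {
        List<String> result =
                new ArrayList<>(index.getOrDefault(word.toLowerCase(), new ArrayList<>()));
        result.remove(current);
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the documents in insertion order.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("DocA", "brown fox jump");
        docs.put("DocB", "dog not jump");
        docs.put("DocC", "fox jump dog");

        Map<String, List<String>> index = buildIndex(docs);
        System.out.println(otherDocs(index, "brown", "DocA")); // []
        System.out.println(otherDocs(index, "fox", "DocA"));   // [DocC]
        System.out.println(otherDocs(index, "dog", "DocB"));   // [DocC]
    }
}
```

The size of the list stored for a word is the number of documents it appears in, which is the document frequency you would need if you go on to compute idf.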
