Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

compare - Perl program to find matching words in a paragraph

I have two text files.

The first one has a list of words, like the following:

File1.txt

Laura
Samuel
Gerry
Peter
Maggie

The second one contains paragraphs. For example:

File2.txt

Laura
is
about
to
meet
Gerry
and
is
planning
to
take
Peter
along

I want the program to find the words that appear in both files and print MATCH beside each matching word, either in File2.txt itself or in a third output file.

The desired output should look like this:

Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along

I have tried the following code, but I am not getting the desired output.

use warnings;
use strict;

use Data::Dumper;

my $result = { };

my $first_file  = shift || 'File1.txt';
my $second_file = shift || 'File2.txt';
my $output      = 'output2.txt';

open my $a_fh, '<', $first_file  or die "$first_file: $!";
open my $b_fh, '<', $second_file or die "$second_file: $!";

open( OUTPUT, '>', $output ) or die "Cannot create $output: $!\n";

while ( <$a_fh> ) {
    chomp;
    next if /^$/;
    $result->{$_}++;
}

while ( <$b_fh> ) {

    chomp;

    next if /^$/;

    if ( $result->{$_} ) {
        delete $result->{$_};
        $result->{ join " |" => $_, "MATCH" }++;
    }
    else {
        $result->{$_}++;
    }
}

{
    $Data::Dumper::Sortkeys = 0;
    print OUTPUT Dumper $result;
}

But the output that I am getting is like this.

Laura  | MATCH
Samuel | MATCH
take
Maggie | MATCH
Laura
about
to
Gerry
meet
Gerry | MATCH
and
is
Maggie |MATCH
planning
to
Peter |MATCH
take
Peter |MATCH

The output is not in a paragraph format, nor is it printing MATCH for all matches.

Please advise.


1 Reply


Here is one approach, which also handles multiple paragraph files. I populate an array @files with the files to check, read the word list into a hash, and then iterate over each paragraph file one line at a time. Each line is split into words, punctuation is stripped, and every word is printed on its own line. If the word exists in the hash, it is printed with " | MATCH" appended.

Paragraph file 1:

Laura is about to meet Gerry, and is planning to take Peter along.

But Peter and Sarah have other plans.

Paragraph file 2:

Blah Peter has lost it.

The code:

use warnings;
use strict;

my @files = ('file.txt', 'file2.txt');

open my $word_fh, '<', 'wordlist.txt' or die $!;

my %words_to_match = map {chomp $_; $_ => 0} <$word_fh>;

close $word_fh;

check($_) for @files;

sub check {
    my $file = shift;

    open my $fh, '<', $file or die $!;

    while (<$fh>){
        chomp;
        my @words_in_line = split;

        for my $word (@words_in_line){
            $word =~ s/[.,;:!]//g;
            $word .= ' | MATCH' if exists $words_to_match{$word};
            print "    $word\n";
        }
        print "\n";
    }
}

Output:

Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
But
Peter | MATCH
and
Sarah
have
other
plans

Blah
Peter | MATCH
has
lost
it
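
To tie this back to the layout in the question (one word per line in File2.txt), the same hash lookup can be sketched as a small function that returns the annotated words in input order. This is only a minimal sketch: the names mark_matches, @wordlist and @paragraph are hypothetical, and the data is inlined so that it runs without any files.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a hash ref of known words and a list of lines,
# and returns every word in its original order, tagging the known ones.
sub mark_matches {
    my ($known, @lines) = @_;
    my @out;
    for my $line (@lines) {
        for my $word (split ' ', $line) {
            # Strip the same punctuation as the answer's code, on a copy.
            (my $bare = $word) =~ s/[.,;:!]//g;
            push @out, exists $known->{$bare} ? "$bare | MATCH" : $bare;
        }
    }
    return @out;
}

# Inlined stand-ins for File1.txt and File2.txt from the question.
my @wordlist  = qw(Laura Samuel Gerry Peter Maggie);
my @paragraph = qw(Laura is about to meet Gerry and is planning to take Peter along);

my %known = map { $_ => 1 } @wordlist;
print "$_\n" for mark_matches(\%known, @paragraph);
```

Because the paragraph is streamed and each word is printed as soon as it is checked, the order of the input is preserved. Storing the paragraph words as hash keys, as the code in the question does, loses that order, since Perl hashes are unordered.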

If you want to write the result to a file instead, open a file handle for writing and change the print statements inside the while loop to print $wfh ....
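
As a rough illustration of that change, the sketch below passes the output handle into the routine and prints to it. The helper name write_matches is hypothetical, the input is an in-memory string standing in for a paragraph file, and a temporary file from the core File::Temp module stands in for the real output path.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: reads lines from $in_fh and writes one word per line
# to $out_fh, appending " | MATCH" to words found in %$known.
sub write_matches {
    my ($known, $in_fh, $out_fh) = @_;
    while (my $line = <$in_fh>) {
        for my $word (split ' ', $line) {
            (my $bare = $word) =~ s/[.,;:!]//g;
            $bare .= ' | MATCH' if exists $known->{$bare};
            print {$out_fh} "$bare\n";
        }
    }
    return;
}

my %known = map { $_ => 1 } qw(Laura Samuel Gerry Peter Maggie);

# In-memory input standing in for a paragraph file.
my $text = "Laura is about to meet Gerry, and is planning to take Peter along.\n";
open my $in_fh, '<', \$text or die "Cannot open in-memory input: $!";

# A temporary file stands in for the real output file (an assumption).
my ($wfh, $out_path) = tempfile(UNLINK => 1);
write_matches(\%known, $in_fh, $wfh);
close $wfh or die "Cannot close $out_path: $!";
```

In a real script, the tempfile call would be replaced by something like open my $wfh, '>', 'output.txt' or die "output.txt: $!"; where the file name is whatever output path you choose.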

