Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Perl - Regexp to manipulate .csv

I've got a function in Perl that reads the most recently modified .csv in a folder and parses its values into variables.

I'm running into some problems with the regular expressions. My .csv looks like this:

Title is: "NAME_NAME_NAME"
"Period end","Duration","Sample","Corner","Line","PDP OUT TOTAL","PDP OUT OK","PDP OUT NOK","PDP OUT OK Rate"
"04/12/2014 11:00:00","3600","1","GPRS_OUT","ARG - NAME 1","536","536","0","100%"
"04/12/2014 11:00:00","3600","1","GPRS_OUT","USA - NAME 2","1850","1438","412","77.72%"
"04/12/2014 11:00:00","3600","1","GPRS_OUT","AUS - NAME 3","8","6","2","75%"


.(ignore this dot, you will understand later)

So far, I've had some help to parse the values into some variables, by:

use Text::ParseWords;   # provides parse_line

open my $file, "<", $newest_file
        or die qq(Cannot open file "$newest_file" for reading: $!);
while ( my $line = <$file> ) {

    my ($date_time, $duration, $sample, $corner, $country_name, $pdp_in_total, $pdp_in_ok, $pdp_in_not_ok, $pdp_in_ok_rate) 
            = parse_line ',', 0, $line;

    my ($date, $time) = split /\s+/, $date_time;
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;

    print "$date, $time, $country, $name, $pdp_in_total, $pdp_in_ok_rate";
}

The problems are:

  1. I don't know how to ignore the first AND second lines (the title and the column names from the .csv);
  2. The file sometimes comes with 2-5 empty lines at the end, as shown in my sample (ignore the dot at the end of it; it doesn't exist in the file).

How can I do this?


1 Reply


When a CSV file has column headers and you want to parse the data into variables, the simplest choice is Text::CSV. The code below reads each data row into the hash reference $row, keyed by column name (if you prefer a plain hash, dereference it with my %data = %$row).

use strict;
use warnings;
use Text::CSV;
use feature 'say';

my $csv = Text::CSV->new({
        binary  => 1,
        eol => $/,
    });
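# The sample data is read from the internal DATA file handle here;
# use your own opened file handle instead.
# Read and discard the title line ("Title is: ...")
my $title = <DATA>;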

# Set the headers using the header line
$csv->column_names( $csv->getline(*DATA) );

while (my $row = $csv->getline_hr(*DATA)) {
    # you can now access the variables via their header names, e.g.:
    if (defined $row->{Duration}) {  # this will skip the blank lines
        say $row->{Duration};
    }
}

__DATA__
Title is: "NAME_NAME_NAME"    
"Period end","Duration","Sample","Corner","Line","PDP IN TOTAL","PDP IN OK","PDP IN NOT OK","PDP IN OK Rate"
"04/12/2014 10:00:00","3600","1","GRPS_INB","CHN - Name 1","1198","1195","3","99.74%"
"04/12/2014 10:00:00","3600","1","GRPS_INB","ARG - Name 2","1198","1069","129","89.23%"
"04/12/2014 10:00:00","3600","1","GRPS_INB","NLD - Name 3","813","798","15","98.15%"

If we print one of the $row variables with Data::Dumper, it shows the structure we are getting back from Text::CSV:

$VAR1 = {
          'PDP IN TOTAL' => '1198',
          'PDP IN NOT OK' => '3',
          'PDP IN OK' => '1195',
          'Period end' => '04/12/2014 10:00:00',
          'Line' => 'CHN - Name 1',
          'Duration' => '3600',
          'Sample' => '1',
          'PDP IN OK Rate' => '99.74%',
          'Corner' => 'GRPS_INB'
        };
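
The question's code also splits "Period end" into a date and a time, and "Line" into a country and a name. With the hash reference shown above, that step could look roughly like the sketch below. It is not part of the original answer: split_row_fields is a hypothetical helper name, and it assumes the "Line" column always has the form "COUNTRY - NAME".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): derives the date,
# time, country and name from one row hash as returned by getline_hr.
sub split_row_fields {
    my ($row) = @_;

    # "04/12/2014 10:00:00" -> date and time, split on the first whitespace run
    my ( $date, $time ) = split /\s+/, $row->{'Period end'}, 2;

    # "CHN - Name 1" -> country code and name (assumed "COUNTRY - NAME" layout)
    my ( $country, $name ) = $row->{'Line'} =~ /^(.+?)\s+-\s+(.*)$/;

    return {
        date    => $date,
        time    => $time,
        country => $country,
        name    => $name,
    };
}
```

Inside the while loop it would be called as my $parts = split_row_fields($row); after the defined check, so blank lines never reach it.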

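If you would rather keep the Text::ParseWords approach from the question, both problems can probably be handled with a couple of guards around the read loop. This is only a minimal sketch under stated assumptions: read_data_rows is a hypothetical helper name, the title and column-name lines are assumed to always be the first two lines of the file, and the file handle is assumed to be freshly opened.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: takes a freshly opened file handle and returns
# an array reference holding one array reference of fields per data row.
sub read_data_rows {
    my ($fh) = @_;

    # Problem 1: read and discard the title line and the column-name line.
    my $title  = <$fh>;
    my $header = <$fh>;

    my @rows;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;        # strip the line ending (LF or CRLF)
        next if $line !~ /\S/;       # Problem 2: skip empty or blank lines
        push @rows, [ parse_line( ',', 0, $line ) ];
    }
    return \@rows;
}
```

Each element of the returned list holds the nine fields in file order, so they can be unpacked into the same variables the question uses, for example my ($date_time, $duration) = @{ $rows->[0] };.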