There is no need to start so many new shell subprocesses to do such basic operations. ls, fgrep, grep and echo all have equivalents in Perl, and calling echo for each line of text is an especially poor way of copying one file to another.
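As a rough illustration of those equivalents, here is a minimal sketch that uses glob in place of ls, index in place of fgrep, and print to a file handle in place of echo. The helper name copy_matching_lines, the file names, and the sample data are all made up for the example and have nothing to do with your real files. It also assumes the directory path contains no spaces, since glob would split on them.

```perl
#!/usr/bin/perl
use strict;
use warnings 'all';

use File::Temp 'tempdir';

# Hypothetical helper: copy every line of the *.tsv files in $dir that
# contains the literal string $wanted into $out_file, and return the
# number of lines copied
sub copy_matching_lines {
    my ( $dir, $wanted, $out_file ) = @_;

    open my $out_fh, '>', $out_file
        or die qq{Unable to open "$out_file" for output: $!};

    my $copied = 0;

    # glob does the job of `ls`
    for my $file ( sort glob "$dir/*.tsv" ) {

        open my $in_fh, '<', $file
            or die qq{Unable to open "$file" for input: $!};

        while ( my $line = <$in_fh> ) {

            # index does the job of `fgrep` (fixed string search);
            # a regex match such as $line =~ /pattern/ would replace `grep`
            next if index( $line, $wanted ) < 0;

            # print to a file handle does the job of `echo ... >> file`
            print $out_fh $line;
            ++$copied;
        }
    }

    close $out_fh or die qq{Unable to close "$out_file": $!};

    return $copied;
}

# Small demonstration on throwaway files (made-up sample data)
my $dir = tempdir( CLEANUP => 1 );

for my $name ( qw/ a b / ) {
    open my $fh, '>', "$dir/$name.tsv" or die $!;
    print $fh "$name\tmissense\n", "$name\tintronic\n";
    close $fh or die $!;
}

my $n = copy_matching_lines( $dir, 'missense', "$dir/out.txt" );
print "Copied $n lines\n";
```

Everything happens inside a single Perl process, so no shell is started for any file or line.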
I suspect that your problem is caused by the line

    my $cmd5 = `echo "$currentguiline" >> $FinalVariants`;

which will append each element of @currentlines to the end of the file. So the first time you run your program the file will contain a single copy of the result, but every subsequent run will add more data to the end of the file and it will keep growing.
I hate to offer a hack to get things working, but it would take me ages to understand what your program is doing behind all the confusion and write a proper concise version. You can fix it temporarily by adding the line

    unlink $FinalVariants or die $! if -e $FinalVariants;

before the foreach ( @tsvfiles ) { ... } loop. This will delete any existing file and ensure that a new version is created for each execution of your program. The -e test stops the program from dying when there is no file to delete yet.
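To see why appending makes the file grow from run to run, here is a small sketch that compares opening a file with '>>' (append) and with '>' (truncate). The helper names run_once and line_count and the sample lines are invented for the illustration. Each call to run_once stands in for one execution of your program.

```perl
#!/usr/bin/perl
use strict;
use warnings 'all';

use File::Temp 'tempdir';

# Hypothetical helper standing in for one run of the program: write
# @lines to $file using the given open mode ('>' or '>>')
sub run_once {
    my ( $mode, $file, @lines ) = @_;

    open my $fh, $mode, $file
        or die qq{Unable to open "$file": $!};

    print $fh "$_\n" for @lines;

    close $fh or die qq{Unable to close "$file": $!};
}

# Return the number of lines currently in $file
sub line_count {
    my ($file) = @_;

    open my $fh, '<', $file
        or die qq{Unable to open "$file": $!};

    my @lines = <$fh>;

    return scalar @lines;
}

my $dir     = tempdir( CLEANUP => 1 );
my @results = ( 'variant 1', 'variant 2' );    # made-up sample data

# Appending: two runs leave two copies of the results
run_once( '>>', "$dir/append.txt", @results ) for 1 .. 2;

# Truncating: two runs leave a single copy
run_once( '>', "$dir/truncate.txt", @results ) for 1 .. 2;

printf "append: %d lines, truncate: %d lines\n",
    line_count("$dir/append.txt"), line_count("$dir/truncate.txt");
```

Opening the output file once with '>' before the loop has the same effect as deleting it first, without the separate unlink step.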
Okay, I've studied your code carefully and I think this will do what you intend. Without any data or even file name samples I've been unable to test it beyond making sure that it compiles, so it will be a miracle if it works first time, but I believe it's the best chance you have of getting a coherent solution.
Note that there's a problem with $refreverse, which you use in your own code but never declare or define. There's no way that the code you show will create the problem you describe, because it dies during compilation with the error message

    Global symbol "$refreverse" requires explicit package name

I've guessed that it's right after $ref_forward, at index 68.

Please report back about how well this functions.
    #!/usr/bin/perl

    use strict;
    use warnings 'all';

    my $home           = "/data";
    my $tsv_directory  = "$home/test_all_runs/$ARGV[0]";
    my $final_variants = "$tsv_directory/final_variant_file.txt";

    # Indices of the columns to copy to the output file
    my @wanted = ( 9, 10, 21, 67, 68, 70, 77, 78, 81, 83, 84, 88, 92, 98, 100 );

    # Opening with '>' truncates the file, so every run starts afresh
    open my $out_fh, '>', $final_variants
        or die qq{Unable to open "$final_variants" for output: $!};

    my @tsv_files = glob "$tsv_directory/FOCUS*.tsv";

    for my $tsv_file ( @tsv_files ) {

        print "The current VCF is ############# $tsv_file\n";

        $tsv_file =~ m|([^/]+)-oncogene\.tsv$| or die "Can't extract Sample ID";
        my $sample_id = $1;
        print "The sample ID is ############## $sample_id\n";

        open my $in_fh, '<', $tsv_file
            or die qq{Unable to open "$tsv_file" for input: $!};

        while ( <$in_fh> ) {

            # Skip comment lines and unwanted variant types
            next if /^#/;
            next if /(?:CNV|intronic|synonymous|utr_3|utr_5)/;

            my @fields = split;

            # Skip reference and missing genotypes
            next if $fields[70] eq '0/0' or $fields[70] eq './.';

            my $current_line = join " ", @fields[@wanted];
            print $out_fh $current_line, "\n";
        }
    }

    close $out_fh or die qq{Unable to close "$final_variants": $!};