Perl script miscounting because of empty lines

The script below captures the second column and sums its values. The only minor issue I have is that the file has empty lines at the end (that is how the values are exported), and because of these empty lines the script miscounts. Any ideas, please? Thanks.
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while( my $line = <$file>) {
$line =~ m/\s+(\d+)/; #regexpr to catch second column values
$sum_column_b += $1;
}
print $sum_column_b, "\n";

I think the main issue has been established: you are using $1 without tying it to a successful regex match, so values get added when they should not be. This is an alternative solution:
$sum_column_b += $1 if $line =~ m/\s+(\d+)/;
Typically, you should never use $1 unless you check that the regex you expect it to come from succeeded. Use either something like this:
if ($line =~ /(\d+)/) {
$sum += $1;
}
Or use direct assignment to a variable:
my ($num) = $line =~ /(\d+)/;
$sum += $num;
Note that you need to use list context by adding parentheses around the variable, or the regex will simply return 1 for success. Also note that, like Borodin says, this will give an undefined value when the match fails, and you must add code to check for that.
This can be handy when capturing several values:
my @nums = $line =~ /(\d+)/g;
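To make the list-context variant concrete, here is a minimal sketch that also adds the definedness check mentioned above. The sub name sum_second_column and the sample lines are made up for illustration; the regex is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the first integer that follows whitespace on
# each line. Lines without such a value (for example the trailing empty
# lines from the question) are skipped instead of being counted.
sub sum_second_column {
    my @lines = @_;
    my $sum   = 0;
    for my $line (@lines) {
        # List context: $num receives the captured text, or undef on no match.
        my ($num) = $line =~ /\s+(\d+)/;
        next unless defined $num;
        $sum += $num;
    }
    return $sum;
}

# Two data lines followed by two empty lines: prints 30.
print sum_second_column("a 10\n", "b 20\n", "\n", "\n"), "\n";
```

Because $1 is never read, a stale capture from an earlier line cannot leak into the sum.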

The main problem is that if the regex does not match, then $1 will hold the value it received in the previous successful match. So every empty line will cause the previous line to be counted again.
An improvement would be:
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while( my $line = <$file>) {
next if $line =~ /^\s*$/; # skip "empty" lines
# ... maybe skip other known invalid lines
if ($line =~ m/\s+(\d+)/) { #regexpr to catch second column values
$sum_column_b += $1;
} else {
warn "problematic line '$line'\n"; # report invalid lines
}
}
print $sum_column_b, "\n";
The else-block is of course optional but can help noticing invalid data.

Try putting this line just after the while line:
next if ( $line =~ /^$/ );
Basically, loop around to the next line if the current line has no content.

#!/usr/bin/perl
use warnings;
use strict;
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while (my $line = <$file>) {
next if $line =~ m/^\s*$/; # skip lines that are empty or whitespace-only
if ($line =~ m/\s+(\d+)/) {
$sum_column_b += $1;
}
}
print "$sum_column_b\n";

Related

Perl script - Confusing error

When I run this code, I am purely trying to get all the lines containing the word "that" in them. Sounds easy enough. But when I run it, I get a list of matches that contain the word "that" but only at the end of the line. I don't know why it's coming out like this and I have been going crazy trying to solve it. I am currently getting an output of 268 total matches, and the output I need is only 13. Please advise!
#!/usr/bin/perl -w
#Usage: conc.shift.pl textfile word
open (FH, "$ARGV[0]") || die "cannot open";
@array = (1,2,3,4,5);
$count = 0;
while($line = <FH>) {
chomp $line;
shift @array;
push(@array, $line);
$count++;
if ($line =~ /that/)
{
$output = join(" ",@array);
print "$output \n";
}
}
print "Total matches: $count\n";
Don't you want to increment your $count variable only if the line contains "that", i.e.:
if ($line =~ /that/) {
$count++;
instead of incrementing the counter before checking if $line contains "that", as you have it:
$count++;
if ($line =~ /that/) {
Similarly, I suspect that your push() and join() calls, for stashing a matching line in @array, should also be within the if block, only executed if the line contains "that".
Hope this helps!
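As a rough sketch of the counting fix, the sub below counts only the lines that match. It is an illustration under assumptions, not the asker's script: the name concordance and the sample lines are invented, and unlike the suggestion above it keeps updating the window on every line, on the assumption that the five-line window is meant as context around each hit.

```perl
use strict;
use warnings;

# Hypothetical helper: return one joined context string per line that
# contains $word. The window holds at most the five most recent lines.
sub concordance {
    my ($word, @lines) = @_;
    my (@window, @hits);
    for my $line (@lines) {
        chomp $line;
        push @window, $line;
        shift @window if @window > 5;    # drop the oldest context line
        # Only matching lines produce output and are counted.
        push @hits, join(" ", @window) if $line =~ /\Q$word\E/;
    }
    return @hits;
}

my @hits = concordance("that", "one\n", "not that\n", "three\n");
print "$_\n" for @hits;
print "Total matches: ", scalar(@hits), "\n";
```

The total is taken from the number of hits, so it can no longer drift away from what was printed.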

Perl: printing original file with changes

I wrote this code and it mostly works. It should find lines in which there's no string like 'SID' and add a pipe | at the beginning of each such line. But the way I wrote it, only the lines that were changed and have a pipe are output. What I actually want is to leave the file as it is and just add the pipes to the lines which match. Thank you.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $fh;
open $fh, '<', 'file1.csv';
my $out = 'file2.csv';
open(FILE, '>', $out);
my $myline = "";
while (my $line = <$fh>) {
chomp $line;
unless ($line =~ m/^SID/) {
$line =~ m/^(.*)$/;
$myline = "\|$1";
}
print FILE $myline . "\n";
}
close $fh;
close FILE;
my file example:
SID,bla
foo bar <- my code adds the pipe to the beginning of this line
output should be like this:
SID,bla
| foo bar
but in my case I only print $myline, I know:
| foo bar
The line
$line =~ m/^(.*)$/
is misguided: all it does is put the contents of $line into $1, so the following statement
$myline = "\|$1"
may as well be
$myline = "|$line"
(The pipe | doesn't need escaping unless it is part of a regular expression.)
Since you are printing $myline at the end of your loop you are never seeing the contents of unmodified lines.
You can fix that by printing $line or $myline according to which one contains the required output, like this
while (my $line = <$fh>) {
chomp $line;
if ($line =~ m/^SID/) {
print "$line\n";
}
else {
my $myline = "|$line";
print "$myline\n";
}
}
or, much more simply, by dropping the intermediate variable and using the default $_ for the input lines, like this
while (<$fh>) {
print '|' unless /^SID/;
print;
}
Note that I have also removed the chomp as it just means you have to put the newline back on the end of the string when you print it.
Instead of creating a new variable $myline, use the one you already have:
while (my $line =<$fh>) {
$line = '|' . $line if $line !~ /^SID/;
print FILE $line;
}
Also, you can use lexical filehandle for the output file as well. Moreover, you should check the return value of open:
open my $OUT, '>', $out or die $!;
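The answers above boil down to "change the line in place if needed, then always print it". A minimal sketch of that idea as a pure function, so it can be tried without any files (the name mark_non_sid is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: return every input line, with a pipe prepended to
# those that do not start with "SID". Newlines are left untouched.
sub mark_non_sid {
    return map { /^SID/ ? $_ : "|$_" } @_;
}

print mark_non_sid("SID,bla\n", "foo bar\n");
```

In a real script the list would come from reading the input handle, and the result would be printed to the output handle opened as shown above.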

Ignore lines in a file till match and process lines after that

I am looping over the lines in a file, and once a particular line has matched, I want to process the lines after the current (matched) line. I can do it like this:
open my $fh, '<', "abc" or die "Cannot open!!";
while (my $line = <$fh>){
next if($line !~ m/Important Lines below this Line/);
last;
}
while (my $line = <$fh>){
print $line;
}
Is there a better way to do this (code needs to be a part of a bigger perl script) ?
I'd use the flip-flop operator:
while(<DATA>) {
next if 1 .. /Important/;
print $_;
}
__DATA__
skip
skip
Important Lines below this Line
keep
keep
output:
keep
keep
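For use inside a bigger script, the same flip-flop can be wrapped in a sub. This is only a sketch: the name lines_after_marker is invented, and it reads from an in-memory filehandle so that it is self-contained. A constant operand of the scalar .. operator is compared against $. (the current input line number), so 1 .. COND is true from line 1 up to and including the first line where COND holds.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that follow the first line
# matching $marker (a qr// pattern).
sub lines_after_marker {
    my ($text, $marker) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @kept;
    while (my $line = <$fh>) {
        # Skip from line 1 ($. == 1) through the marker line, inclusive.
        next if 1 .. $line =~ $marker;
        push @kept, $line;
    }
    close $fh;
    return @kept;
}

print lines_after_marker(
    "skip\nskip\nImportant Lines below this Line\nkeep\nkeep\n",
    qr/Important/,
);
```

If the marker can appear on the very first line, the two-dot form still works, because it tests the right operand on the same line on which the left one became true.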

Nested foreach loop not working

It should be a simple nested foreach loop but it's not working and really starting to annoy me that I can't figure this out! Still a perl beginner but I thought I understood this by now. Can someone explain to me where I'm going wrong? The idea is simple: 2 files, 1 small, 1 large with info I want in the small one. Both have unique id's in them. Compare and match the id's and output a new small file with the added info in the small file.
I have 2 pieces of code: 1 without stricts and 1 with, and both are not working. I know to use stricts, but I'm still curious as to why the one without stricts isn't working either.
WITHOUT STRICTS:
if ($#ARGV != 2){
print "input_file1 input_file2 output_file\n";
exit;
}
$inputfile1=$ARGV[0];
$inputfile2=$ARGV[1];
$outputfile1=$ARGV[2];
open(INFILE1,$inputfile1) || die "No inputfile :$!\n";
open(INFILE2,$inputfile2) || die "No inputfile :$!\n";
open(OUTFILE_1,">$outputfile1") || die "No outputfile :$!\n";
$i = 0;
$j = 0;
@infile1=<INFILE1>;
@infile2=<INFILE2>;
foreach ( @infile1 ){
@elements = split(";",$infile1[$i]);
$id1 = $elements[3];
print "1. $id1\n";
$lat = $elements[5];
$lon = $elements[6];
$lat =~ s/,/./;
$lon =~ s/,/./;
print "2. $lat\n";
print "3. $lon\n";
foreach ( @infile2 ){
@loopelements = split(";",$infile2[$j]);
$id2 = $loopelements[4];
print "4. $id2\n";
if ($id1 == $id2){
print OUTFILE_1 "$loopelements[0];$loopelements[1];$loopelements[2];$loopelements[3];$loopelements[4];$lat,$lon\n";
};
$j = $j+1;
};
@elements = join(";",@elements); # add ';' to all elements
#print "$i\r";
$i = $i+1;
}
close(INFILE1);
close(INFILE2);
close(OUTFILE_1);
Without stricts, the problem is that the second loop does not start, if I'm not mistaken.
WITH STRICTS:
use strict;
use warnings;
my $inputfile1 = shift || die "Give input!\n";
my $inputfile2 = shift || die "Give more input!\n";
my $outputfile = shift || die "Give output!\n";
open my $INFILE1, '<', $inputfile1 or die "In use/Not found :$!\n";
open my $INFILE2, '<', $inputfile2 or die "In use/Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "In use/Not found :$!\n";
my $i = 0;
my $j = 0;
foreach ( my $infile1 = <$INFILE1> ){
my @elements = split(";",$infile1[$i]);
my $id1 = $elements[3];
print "1: $id1\n";
my $lat = $elements[5];
my $lon = $elements[6];
$lat =~ s/,/./;
$lon =~ s/,/./;
print "2: $lat\n";
print "3: $lon\n";
foreach ( my $infile2 = <$INFILE2> ){
my @loopelements = split(";",$infile2[$j]);
my $id2 = $loopelements[4];
print "4: $id2\n";
if ($id1 == $id2){
print $OUTFILE "$loopelements[0];$loopelements[1];$loopelements[2];$loopelements[3];$loopelements[4];$lat,$lon\n";
};
$j = $j+1;
};
#@elements = join(";",@elements); # add ';' to all elements
#print "$i\r";
$i = $i+1;
}
close($INFILE1);
close($INFILE2);
close($OUTFILE);
The error with stricts:
Global symbol "@infile1" requires explicit package name at Z:\Data-Content\Data\test\jan\bestemming_zonder_acco\add_latlon_dest_test.pl line 16.
Global symbol "@infile2" requires explicit package name at Z:\Data-Content\Data\test\jan\bestemming_zonder_acco\add_latlon_dest_test.pl line 31.
Your 'strict' implementation gives you errors because of confusion about the sigils (the $ and @ characters), which indicate whether a variable is a scalar or an array. In the loop statement you read each line of the file into a scalar called $infile1, but in the following line you try to access an element of the array @infile1. These two variables are unrelated, and as Perl tells you, the latter is not declared.
Another problem with your 'strict' implementation is that you are reading the file inside the loop. For nested loops this means that file 2 is read to the end during the first iteration of the outer loop, so in all succeeding iterations the inner loop cannot read any lines.
I missed the foreach/while issue pointed out by stevenl: even after fixing the stricture issues you will be left with foreach loops that run only one iteration.
I'm not sure what your problem with the non-strict script is.
But I wouldn't use a nested loop at all for processing two files. I would un-nest the loops, so it roughly looked like this:
my %cord;
while ( my $line = <$INFILE1> ) {
my @elements = split /;/, $line;
$cord{ $elements[3] } = "$elements[5],$elements[6]";
}
while ( my $line = <$INFILE2> ) {
my @elements = split /;/, $line;
if ( exists $cord{ $elements[4] } ) {
print $OUTFILE "....;$cord{ $elements[4] }\n";
}
}
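To show the un-nested approach end to end, here is a self-contained sketch that works on in-memory records instead of files. The sub name add_coordinates and the sample records are invented; the field positions (id in column 3 and lat/lon in columns 5 and 6 of the first file, id in column 4 of the second) and the output layout are taken from the question. Note that a hash lookup compares ids as strings, whereas the question used ==, so ids such as "042" and "42" would no longer match.

```perl
use strict;
use warnings;

# Hypothetical helper: $first and $second are array refs of ';'-separated
# records. Returns the records of $second whose id occurs in $first, cut
# to their first five fields with "lat,lon" appended.
sub add_coordinates {
    my ($first, $second) = @_;
    my %coord;
    for my $line (@$first) {
        chomp(my $rec = $line);              # work on a copy of the record
        my @f = split /;/, $rec;
        next unless defined $f[6];           # skip records that are too short
        my ($lat, $lon) = @f[5, 6];
        s/,/./ for $lat, $lon;               # decimal comma to decimal point
        $coord{ $f[3] } = "$lat,$lon";
    }
    my @out;
    for my $line (@$second) {
        chomp(my $rec = $line);
        my @f = split /;/, $rec;
        next unless defined $f[4] && exists $coord{ $f[4] };
        push @out, join(";", @f[0 .. 4], $coord{ $f[4] });
    }
    return @out;
}

my @merged = add_coordinates(
    [ "a;b;c;42;e;51,5;4,25\n" ],
    [ "x0;x1;x2;x3;42;rest\n", "y0;y1;y2;y3;7;rest\n" ],
);
print "$_\n" for @merged;
```

Each file is walked exactly once, so the running time grows with the sum of the two file sizes rather than their product.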
I can't see exactly where the problem with the non-strict version is. What is the problem that you are encountering?
The problem with the strict version is particularly in these 2 lines:
foreach ( my $infile1 = <$INFILE1> ){
my @elements = split(";",$infile1[$i]);
You have a scalar $infile1 in the first line, but you are treating it as an array in the next line. Also, change the foreach to a while (see below).
A few comments.
For the non-strict version, you could have collapsed the loop to a C-style for loop as:
for (my $i = 0; $i < @infile1; $i++) {
...
}
That can be made simpler to read if you go without the array indexes altogether:
foreach my $infile1 (@infile1) {
my @elements = split ';', $infile1;
...
}
But with the larger file, it might take time to slurp the entire file into the array at the beginning. So it might be better to iterate through the file as you go:
while (my $infile = <$INFILE1>) {
...
}
Note the last point should be how the strict version looks. You need a while loop rather than a foreach loop, because assigning <$INFILE1> to a scalar means it will return the next line only, which evaluates to true as long as there is another line in the file. (Thus, the foreach would only ever get the first line to loop over.)
You don't reset $j before the inner foreach loop runs. Therefore, the second time your inner loop runs, you are trying to access elements that are past the end of the array. This mistake exists in both the strict and non-strict version.
You should not be using $i and $j at all; the point of foreach is that it automatically gets each element for you. Here is an example of correctly using foreach in the inner loop:
foreach my $line ( @infile2 ){
@loopelements = split(";",$line);
#...now do stuff as before
}
This puts each element of @infile2 into the variable $line in succession, until you have gone through the whole array.

Cleanest Perl parser for Makefile-like continuation lines

A perl script I'm writing needs to parse a file that has continuation lines like a Makefile. i.e. lines that begin with whitespace are part of the previous line.
I wrote the code below but don't feel like it is very clean or perl-ish (heck, it doesn't even use "redo"!)
There are many edge cases: EOF at odd places, single-line files, files that start or end with a blank line (or non-blank line, or continuation line), empty files. All my test cases (and code) are here: http://whatexit.org/tal/flatten.tar
Can you write cleaner, perl-ish, code that passes all my tests?
#!/usr/bin/perl -w
use strict;
sub process_file_with_continuations {
my $processref = shift @_;
my $nextline;
my $line = <ARGV>;
$line = '' unless defined $line;
chomp $line;
while (defined($nextline = <ARGV>)) {
chomp $nextline;
next if $nextline =~ /^\s*#/; # skip comments
$nextline =~ s/\s+$//g; # remove trailing whitespace
if (eof()) { # Handle EOF
$nextline =~ s/^\s+/ /;
if ($nextline =~ /^\s+/) { # indented line
&$processref($line . $nextline);
}
else {
&$processref($line);
&$processref($nextline) if $nextline ne '';
}
$line = '';
}
elsif ($nextline eq '') { # blank line
&$processref($line);
$line = '';
}
elsif ($nextline =~ /^\s+/) { # indented line
$nextline =~ s/^\s+/ /;
$line .= $nextline;
}
else { # non-indented line
&$processref($line) unless $line eq '';
$line = $nextline;
}
}
&$processref($line) unless $line eq '';
}
sub process_one_line {
my $line = shift @_;
print "$line\n";
}
process_file_with_continuations \&process_one_line;
How about slurping the whole file into memory and processing it using regular expressions? That is much more 'perlish'. This passes your tests and is much smaller and neater:
#!/usr/bin/perl
use strict;
use warnings;
$/ = undef; # we want no input record separator.
my $file = <>; # slurp whole file
$file =~ s/^\n//; # Remove newline at start of file
$file =~ s/\s+\n/\n/g; # Remove trailing whitespace.
$file =~ s/\n\s*#[^\n]+//g; # Remove comments.
$file =~ s/\n\s+/ /g; # Merge continuations
# Done
print $file;
If you don't mind loading the entire file in memory, then the code below passes the tests.
It stores the lines in an array, adding each line either to the previous one (continuation) or at the end of the array (other).
#!/usr/bin/perl
use strict;
use warnings;
my @out;
while( <>)
{ chomp;
s{#.*}{}; # suppress comments
next unless( m{\S}); # skip blank lines
if( s{^\s+}{ }) # does the line start with spaces?
{ $out[-1] .= $_; } # yes, continuation, add to last line
else
{ push @out, $_; } # no, add as new line
}
$, = "\n"; # set output field separator
$\ = "\n"; # set output record separator
print @out;
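The array-based approach can also be written as a sub that takes lines and returns the merged result, which makes the edge cases easier to try out. This is a sketch only and has not been run against the asker's test tarball. The name join_continuations is invented, and it makes two assumptions: only comment-only lines are dropped (as in the question's code), and a continuation line with nothing before it is kept as an ordinary line instead of being appended to a non-existent previous element.

```perl
use strict;
use warnings;

# Hypothetical helper: merge indented continuation lines into the
# preceding logical line and return the logical lines without newlines.
sub join_continuations {
    my @out;
    for my $raw (@_) {
        my $line = $raw;                 # copy, so the caller's data is untouched
        chomp $line;
        next if $line =~ /^\s*#/;        # skip comment-only lines
        $line =~ s/\s+$//;               # remove trailing whitespace
        next if $line eq '';             # skip blank lines
        if (@out && $line =~ s/^\s+/ /) {
            $out[-1] .= $line;           # continuation: append to previous line
        }
        else {
            $line =~ s/^\s+//;           # indented line with no parent
            push @out, $line;
        }
    }
    return @out;
}

print "$_\n" for join_continuations("a\n", "  b\n", "# c\n", "\n", "d  \n", "\te\n");
```

Whether a blank line should end a logical line, as it does in the question's code, depends on the test suite; here blank lines are simply ignored, as in the answer above.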