When comparing two files, how do I skip (ignore) blank lines? - perl

I'm comparing line against line of two text files, ref.txt (reference) and log.txt. But there may be an arbitrary number of blank lines in either file that I'd like to ignore; how can I accomplish this?
ref.txt
one
two
three
end
log.txt
one
two
three
end
There would be no incorrect log lines in the output, in other words log.txt matches with ref.txt.
What I'd like to accomplish, in pseudocode:
while (traversing both files at same time) {
if ($l is blank line || $r is blank line) {
if ($l is blank line)
skip to next non-blank line
if ($r is blank line)
skip to next non-blank line
}
#continue with line by line comparison...
}
My current code:
use strict;
use warnings;
my $logPath = $ARGV[0];
my $refLogPath = $ARGV[1];
my $boolRef = 1; #true==1; set to 0 when a line differs
my $r; #ref log line
my $l; #log line
open INLOG, $logPath or die $!;
open INREF, $refLogPath or die $!;
while (defined($l = <INLOG>) and defined($r = <INREF>)) {
#code for skipping blank lines?
if ($l ne $r) {
print $l, "\n"; #Output incorrect line in log file
$boolRef = 0; #false==0
}
}

If you are on a Linux platform, use:
diff -B ref.txt log.txt
The -B option causes changes that just insert or delete blank lines to be ignored.

You can skip blank lines by testing each line against this regular expression:
next if $line =~ /^\s*$/;
This matches lines that are empty or contain only whitespace (including the trailing newline), which is what makes up a blank line.

This way seems the most "perl-like" to me. No fancy loops or anything, just slurp the files and grep out the blank lines.
use warnings;
$f1 = "path/file/1";
$f2 = "path/file/2";
open(IN1, "<$f1") or die "Cannot open file: $f1 ($!)\n";
open(IN2, "<$f2") or die "Cannot open file: $f2 ($!)\n";
chomp(@lines1 = <IN1>); # slurp the files
chomp(@lines2 = <IN2>);
@l1 = grep(!/^\s*$/, @lines1); # get the files without empty lines
@l2 = grep(!/^\s*$/, @lines2);
# something like this to print the non-matching lines
for $i (0 .. $#l1) {
print "[$f1 $i]: $l1[$i]\n[$f2 $i]: $l2[$i]\n" if($l1[$i] ne $l2[$i]);
}

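You can loop to find the next non-blank line in each file, each time:
while (1) {
    # read ahead past blank (whitespace-only) lines in each file
    while (defined($l = <INLOG>) and $l =~ /^\s*$/) {}
    while (defined($r = <INREF>) and $r =~ /^\s*$/) {}
    # Perl leaves a loop with 'last'; there is no 'break'
    last if !defined($l) or !defined($r);
    if ($l ne $r) {
        print $l, "\n";
        $boolRef = 0;
    }
}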

man diff
diff -B ref.txt log.txt

# line skipping code
while (defined($l=<INLOG>) && $l =~ /^$/ ) {} # no-op loop exits with $l that has length
while (defined($r=<INREF>) && $r =~ /^$/ ) {} # no-op loop exits with $r that has length
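The skip-then-compare idea from the answers above can be put together roughly as follows. This is only a sketch: next_nonblank and mismatched_lines are made-up helper names, the demo reads from in-memory strings instead of ref.txt and log.txt, and extra non-blank lines left over at the end of one file are not reported.

```perl
use strict;
use warnings;

# Return the next line that contains a non-whitespace character,
# or undef at end of file. (Hypothetical helper, not from the post.)
sub next_nonblank {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        next if $line =~ /^\s*$/;    # skip blank lines
        chomp $line;
        return $line;
    }
    return;
}

# Compare two open filehandles and return the log lines that differ.
sub mismatched_lines {
    my ($log_fh, $ref_fh) = @_;
    my @bad;
    while (1) {
        my $l = next_nonblank($log_fh);
        my $r = next_nonblank($ref_fh);
        last unless defined $l and defined $r;
        push @bad, $l if $l ne $r;
    }
    return @bad;
}

# Demo on in-memory "files" so the sketch runs without any files on disk.
my $ref_text = "one\n\ntwo\nthree\n\n\nend\n";
my $log_text = "one\ntwo\n\n\nthree\nend\n";
open my $ref_fh, '<', \$ref_text or die $!;
open my $log_fh, '<', \$log_text or die $!;

my @bad = mismatched_lines($log_fh, $ref_fh);
print "Mismatch: $_\n" for @bad;
print "log matches ref\n" unless @bad;
```

To use it on real files, open $logPath and $refLogPath with the three-argument open and pass those handles instead.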

prepend the next read in line with information from the previous read in line

Development Platform: Ubuntu 17.10 mainly command line work
Tools: perl 5.26 and postgresql 9.6
goal: Convert a file so I can \COPY it into postgresql
Information: Line delimiter is the # sign
Database table columns: id work composer artist conductor orchestra album_title
problem: append next info line to current id line
In the following, how do I preserve the 'mfmm01#' so that on the next line iteration I can prepend it to that next line? As this is my first post, please let me know if the code example is too much or too little.
I am going from this:
Le Nozze di Figaro, K. 492 / The Marriage of Figaro: Le Nozze di
Figaro, K. 492 / The Marriage of Figaro: Cinque ... dieci ... venti /
Five ...Ten ...Twenty
Eventually to this:
mfmm01#Cinque dieci venti#Mozart####Entertaining Made Simple Merlot,
Filet Mignon, Mozart
After running the script I have the following:
mfmm01#
How do I have to preserve the 'mfmm01#' so upon the next line iteration I can prepend it to that next line?
#!/usr/bin/perl
# use clauses for File, cwd, etc
# usage statement
# Variables - append _orig to input_file
my $id = $ARGV[1];
my $input_file = $ARGV[0];
my $album_title = $ARGV[2];
my $output_file = 'output_file';
my $input_file_orig = $input_file;
$input_file_orig .= '_orig';
##############################################
# Ensure that the provided input file exists #
##############################################
##########################################################
# Read all file lines into an array #
##########################################################
###########################################################
# Modify each line to meet the following specs: #
# id#work#composer#artist#conductor#orchestra#album_title #
###########################################################
for my $line (@lines) {
$line =~ s/[\n\r\t]+//g;
######################################################
# Ignore lines with num:num, lines that begin with $ #
# and empty string lines                             #
######################################################
if ( $line =~ /[0-9]:/m ) {
next;
} elsif ( $line =~ /^\$/m ) {
next;
} else {
if ( $line =~ /^\s*$/m ) {
next;
}
}
########################################################
# If line is a number followed by a space, prepend id #
# and replace space with the # character #
########################################################
if ( $line =~ /^\d\d\s/m ) {
$id_num = $line;
$id_num =~ s/(\d\d)\s/$id$1#/g;
} else {
if ( $line =~ /^\d\s/m ) {
$id_num = $line;
$id_num =~ s/(\d)\s/$id$1#/g;
# print ("\$line after removing space: \"$line\"\n");
}
}
####################################################
# If line begins with an alphabetic character then #
# prepend id_num and append album_title #
####################################################
if ( $line =~ /Sold/m ) {
next;
}
if ( $line =~ /^[A-Z]/m ) {
###################################################
# At this point $line exists but $id_num is empty #
# I thought $id_num would live through the next #
# line read #
###################################################
$prepend_line =~ s/($line)/$id_num$1/g;
print("$prepend_line");
$append_line =~ s/($prepend_line)/$1#Mozart###$album_title/g;
open my $ofh, '>>', $output_file or die $!;
print $ofh "$append_line\n";
close $ofh or die $!;
print("\$append_line: $append_line\n");
}
}
1;
I will fix this by capturing the value to a file. Upon the next line read I will extract the value and prepend it to the string. Thank you for looking at this. I don't know why I didn't think of this before.
Thank you;
Sherman
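For what it's worth, a file should not be needed to carry the value over: a variable declared before the loop keeps its value from one iteration to the next. Below is a rough sketch of that idea. It assumes the track number sits on its own line directly before the title line, and build_rows and the sample data are invented for illustration, not taken from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "track number" lines followed by title lines
# into id#work#composer#artist#conductor#orchestra#album_title rows.
sub build_rows {
    my ($id, $album_title, @lines) = @_;
    my @rows;
    my $id_num = '';    # declared outside the loop, so it survives iterations
    for my $line (@lines) {
        $line =~ s/[\n\r\t]+//g;
        next if $line =~ /^\s*$/;    # skip empty lines
        if ($line =~ /^(\d{1,2})\s*$/) {
            # remember the prefix for the next line
            $id_num = sprintf '%s%02d#', $id, $1;
            next;
        }
        if ($line =~ /^[A-Z]/ && $id_num ne '') {
            push @rows, "$id_num$line#Mozart####$album_title";
            $id_num = '';            # the prefix has been used
        }
    }
    return @rows;
}

my @rows = build_rows(
    'mfmm', 'Entertaining Made Simple',
    "01\n", "Cinque dieci venti\n",
);
print "$_\n" for @rows;
```

The artist, conductor and orchestra columns are left empty here, which is why four # characters follow the composer.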

How to get a comment printed for each line of text that matches within a file?

I am trying to match a keyword/text/line given in a file called expressions.txt from all files matching *main_log. When a match is found I want to print the comment for each line that matches.
Is there any better way to get this printed?
expression.txt
Hello World ! # I want to print this comments#
Bye* #I want this to print when Bye Is match with main_log#
:::
:::
Below Is the code I used :
{
    open( my $kw, '<', 'expressions.txt' ) or die $!;
    my @keywords = <$kw>;
    chomp( @keywords ); # remove newlines at the end of keywords
    # get list of files in current directory
    my @files = grep { -f } ( <*main_log>, <*Project>, <*properties> );
    # loop over each file to search keywords in
    foreach my $file ( @files ) {
        open( my $fh, '<', $file ) or die $!;
        my @content = <$fh>;
        close( $fh );
        my $l = 0;
        foreach my $kw ( @keywords ) {
            my $search = quotemeta( $kw ); # otherwise keyword is used as regex, not literally
            #$kw =~ m/\[(.*)\]/;
            $kw =~ m/\((.*)\)/;
            my $temp = $1;
            print "$temp\n";
            foreach ( @content ) { # go through every line for this keyword
                $l++;
                printf 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_ if /$search/;
            }
        }
    }
I tried this code to print the comments mentioned within parentheses (...), but it does not print the way I want, as described below.
If expression.txt contains
Hello World ! # I want to print this comments#
and the string Hello World ! is found in my file called main_log, then only Hello World ! should be matched against main_log, but # I want to print this comments# should be printed as a comment so the user can understand the keyword.
These keywords can be of any length and contain any characters.
It worked fine, but I have one remaining doubt about printing the required output into a file. I have used the command perl -w Test.pl > my_output.txt at the command prompt, but I am not sure how to do the same from inside the Perl script itself.
open( my $kw, '<', 'expressions.txt') or die $!;
my @keywords = <$kw>;
chomp(@keywords); # remove newlines at the end of keywords
# post-processing your keywords file
my $kwhashref = {
    map {
        /^(.*?)(#.*?#)*$/;
        defined($2) ? ($1 => $2) : ( $1 => undef )
    } @keywords
};
# get list of files in current directory
my @files = grep { -f } (<*main_log>,<*Project>,<*properties>);
# loop over each file to search keywords in
foreach my $file (@files) {
    open(my $fh, '<', $file) or die $!;
    my @content = <$fh>;
    close($fh);
    my $l = 0;
    #foreach my $kw (@keywords) {
    foreach my $kw (keys %$kwhashref) {
        my $search = quotemeta($kw); # otherwise keyword is used as regex, not literally
        #$kw =~ m/\[(.*)\]/;
        #$kw =~ m/\#(.*)\#/;
        #my $temp = $1;
        #print "$temp\n";
        foreach (@content) { # go through every line for this keyword
            $l++;
            if (/$search/)
            {
                # only print if comment defined
                print $kwhashref->{$kw}."\n" if defined($kwhashref->{$kw}) ;
                printf 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_
                #printf '$output';
            }
        }
    }
}
Your example code has mismatched braces { ... } and won't compile.
If you were to add another closing brace to the end of your code then it would compile, but the line
$kw =~ m/\((.*)\)/;
will never succeed, since there are no parentheses anywhere in expressions.txt. If a match does not succeed, $1 retains its value from the most recent successful regex match.
You are also trying to search the lines from the files against the whole of each line retrieved from expressions.txt, when you should be splitting those lines into keywords and their corresponding comments.
This seems to be a follow-up to this answer to another question of yours. What I tried to suggest in the last paragraph would start after the first three lines of your code:
# post-processing your keywords file
my $kwhashref = {
    map {
        /^(.*?)(#.*?#)*$/;
        defined($2) ? ($1 => $2) : ( $1 => undef )
    } @keywords
};
Now you have the keywords in a hashref, with the actual keywords to search for as keys and the comments as values, if they exist (using your #comment# at the end of line syntax here).
Your keyword loop now has to use keys %$kwhashref, and you can additionally print the comment in the inner loop, converted as shown in the answer I linked. The additional print:
print $kwhashref->{$kw}."\n" if defined($kwhashref->{$kw}); # only print if comment defined
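Putting the pieces together, including the follow-up about writing the output from inside the script, might look like the sketch below. It is not the code from either answer: parse_keywords and report_matches are invented names, the regex additionally trims the whitespace between keyword and comment, the line counter restarts for each keyword, and the demo writes to an in-memory string so it runs without touching the disk.

```perl
use strict;
use warnings;

# Split "keyword # comment#" lines into a keyword => comment hash.
# (parse_keywords is a hypothetical name.)
sub parse_keywords {
    my @lines = @_;
    my %comment_for;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;    # ignore empty lines
        my ($kw, $comment) = $line =~ /^(.*?)\s*(#.*#)?\s*$/;
        $comment_for{$kw} = $comment;  # undef when the line has no comment
    }
    return %comment_for;
}

# Print the comment and the location for every line containing a keyword.
sub report_matches {
    my ($out_fh, $comment_for, $file, @content) = @_;
    for my $kw (sort keys %$comment_for) {
        my $search  = quotemeta $kw;   # match the keyword literally
        my $line_no = 0;               # restart numbering for each keyword
        for my $text (@content) {
            $line_no++;
            next unless $text =~ /$search/;
            print {$out_fh} "$comment_for->{$kw}\n"
                if defined $comment_for->{$kw};
            printf {$out_fh} "Found keyword %s in file %s, line %d:%s\n",
                $kw, $file, $line_no, $text;
        }
    }
}

my %comment_for = parse_keywords(
    "Hello World ! # I want to print this comments#\n",
    "Bye* #I want this to print when Bye Is match with main_log#\n",
);

# To write to disk from inside the script, open a path such as
# 'my_output.txt' here instead of the in-memory string \$report.
my $report = '';
open my $out_fh, '>', \$report or die "Cannot open output: $!";
report_matches($out_fh, \%comment_for, 'demo_main_log',
    'say Hello World ! now', 'nothing here');
close $out_fh or die $!;
print $report;
```

Opening the output handle once and passing it around keeps the shell redirection (> my_output.txt) optional rather than required.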

zcat working in command line but not in perl script

Here is a part of my script:
foreach $i ( @contact_list ) {
    print "$i\n";
    $e = "zcat $file_list2| grep $i";
    print "$e\n";
    $f = qx($e);
    print "$f";
}
$e prints properly but $f gives a blank line even when $file_list2 has a match for $i.
Can anyone tell me why?
It is better to use Perl's grep instead of a pipe:
@lines = `zcat $file_list2`; # move output of zcat to array
die('zcat error') if ($?); # will exit script with error if zcat is problem
# chomp(@lines); # this will remove "\n" from each line
foreach $i ( @contact_list ) {
    print "$i\n";
    @ar = grep (/$i/, @lines);
    print @ar;
    # print join("\n",@ar)."\n"; # in case of using chomp
}
The best solution is not to call zcat at all, but to use the zlib library:
http://perldoc.perl.org/IO/Zlib.html
use IO::Zlib;
# ....
# place your definition of $file_list2 and @contact_list here.
# ...
$fh = IO::Zlib->new;
$fh->open($file_list2, "rb")
    or die("Cannot open $file_list2");
@lines = <$fh>;
$fh->close;
#chomp(@lines); #remove "\n" symbols from lines
foreach $i ( @contact_list ) {
    print "$i\n";
    @ar = grep (/$i/, @lines);
    print (@ar);
    # print join("\n",@ar)."\n"; #in case of using chomp
}
Your question leaves us guessing about many things, but a better overall approach would seem to be opening the file just once, and processing each line in Perl itself.
open(F, "zcat $file_list |") or die "$0: could not zcat: $!\n";
LINE:
while (<F>) {
    ######## FIXME: this could be optimized a great deal still
    foreach my $i (@contact_list) {
        if (m/$i/) {
            print $_;
            next LINE;
        }
    }
}
close (F);
If you want to squeeze out more from the inner loop, compile the regexes from @contact_list into a separate array before the loop, or perhaps combine them into a single regex if all you care about is whether one of them matched. If, on the other hand, you want to print all matches for one pattern only at the end when you know what they are, collect matches into one array per search expression, then loop them and print when you have grepped the whole set of input files.
Your problem is not reproducible without information about what's in $i, but I can guess that it contains some shell metacharacter which causes it to be processed by the shell before the grep runs.
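The "combine them into a single regex" suggestion could look something like this. It is a sketch under assumptions: the contacts are treated as literal strings (hence quotemeta), build_matcher and matching_lines are made-up names, and the sample addresses are invented. The lines are given inline so the example runs without a .gz file.

```perl
use strict;
use warnings;

# Build one compiled regex that matches any of the contacts literally.
# (build_matcher is a hypothetical helper.)
sub build_matcher {
    my @contacts = @_;
    die "no contacts given\n" unless @contacts;
    my $alternation = join '|', map { quotemeta($_) } @contacts;
    return qr/$alternation/;
}

# Return the lines that mention at least one contact.
sub matching_lines {
    my ($matcher, @lines) = @_;
    return grep { $_ =~ $matcher } @lines;
}

my $matcher = build_matcher('alice@example.com', 'bob+list@example.com');
my @lines = (
    "to: alice\@example.com\n",
    "to: carol\@example.com\n",
    "cc: bob+list\@example.com\n",
);
print matching_lines($matcher, @lines);

# With a real file, the list form of a pipe open avoids the shell entirely,
# so metacharacters in the file name are not interpreted, e.g.:
#   open my $z, '-|', 'zcat', $file_list2 or die "could not zcat: $!";
#   print grep { $_ =~ $matcher } <$z>;
```

Because the contacts never reach the shell here, characters such as +, * or | in them cannot change what the command does.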

How to execute a command for every line in file?

open my $directory, '<', 'abc.txt' or die "Error $! opening abc.txt";
chomp(my @values = <$directory>);
There is a file named abc.txt with the following contents:
abcde
abc
bckl
drfg
efgt
eghui
webnmferg
With the above lines, I am reading the contents of abc.txt into an array.
The intention is to create a loop that runs a command on every line of abc.txt.
Any suggestions for creating the loop?
open my $directory_fh, '<', 'abc.txt' or die "Error $! opening abc.txt";
while (<$directory_fh>) {
    chomp; # Remove final \n if any
    print $_; # Do whatever you want here
}
close $directory_fh;
I prefer to suffix all filehandles with _fh to make them more obvious.
while (<$directory_fh>) loops through all lines of the file.
You might need/want to remove a final \r if the file might have Windows/MS-DOS format.
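For the Windows/MS-DOS case, a substitution can drop both line-ending styles in one go. A small sketch; strip_eol is a made-up helper name and the data is read from an in-memory string:

```perl
use strict;
use warnings;

# Remove a trailing LF or CRLF. Unlike a plain chomp (with the default
# input record separator), this also removes the \r of a DOS line ending.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

my $data = "abcde\r\nabc\nbckl";
open my $fh, '<', \$data or die $!;
while (my $line = <$fh>) {
    print strip_eol($line), "\n";
}
close $fh;
```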
create a loop to run a command on all the lines of file abc.txt
foreach my $line (@lines){
    # assuming $cmd contains the command you want to execute
    my $output = `$cmd $line`;
    print "Executed $cmd on $line, output: $output\n";
}
Edit: As per Sebastian's feedback
my $i = 0;
while ($i <= $#lines){
    my $output = `$cmd $lines[$i]`;
    print "Executed $cmd on $lines[$i], output: $output\n";
    $i++; # advance to the next line, otherwise the loop never ends
}
Or, if you are OK with destroying the array:
while (@lines){
    my $line = shift @lines;
    my $output = `$cmd $line`;
    print "Executed $cmd on $line, output: $output\n";
}
If you wanted safe code that didn't refer to the array twice, you could use splice in a list assignment.
while (my ($line) = splice(@lines, 0, 1)) {
    my $output = `$cmd $line`;
    print "Executed $cmd on $line, output: $output\n";
}
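One caveat with the backtick versions is that each line passes through the shell, so a line containing ; or $ can change what gets executed. A possible alternative, sketched here under the assumption of a Unix-like system, is the list form of a pipe open. run_with_arg is an invented helper, and the demo uses the running perl ($^X) as a stand-in for the real command.

```perl
use strict;
use warnings;

# Run a command with one extra argument and capture its output,
# bypassing the shell so metacharacters in $arg are not interpreted.
# $cmd is an array reference: the program plus any fixed arguments.
sub run_with_arg {
    my ($cmd, $arg) = @_;
    open my $ph, '-|', @$cmd, $arg or die "Cannot run $cmd->[0]: $!";
    local $/;                        # slurp all of the output at once
    my $output = <$ph>;
    close $ph or warn "$cmd->[0] exited with status $?\n";
    return defined $output ? $output : '';
}

# Stand-in command: upper-case the argument using the current perl.
my @cmd = ($^X, '-e', 'print uc $ARGV[0]');
for my $line ('abcde', 'abc', 'bckl; echo oops') {
    my $output = run_with_arg(\@cmd, $line);
    print "Executed on '$line', output: $output\n";
}
```

The third sample line shows the point: the "; echo oops" part is handed over as plain text instead of being run as a second command.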

Cleanest Perl parser for Makefile-like continuation lines

A perl script I'm writing needs to parse a file that has continuation lines like a Makefile. i.e. lines that begin with whitespace are part of the previous line.
I wrote the code below but don't feel like it is very clean or perl-ish (heck, it doesn't even use "redo"!)
There are many edge cases: EOF at odd places, single-line files, files that start or end with a blank line (or non-blank line, or continuation line), empty files. All my test cases (and code) are here: http://whatexit.org/tal/flatten.tar
Can you write cleaner, perl-ish, code that passes all my tests?
#!/usr/bin/perl -w
use strict;
sub process_file_with_continuations {
    my $processref = shift @_;
    my $nextline;
    my $line = <ARGV>;
    $line = '' unless defined $line;
    chomp $line;
    while (defined($nextline = <ARGV>)) {
        chomp $nextline;
        next if $nextline =~ /^\s*#/; # skip comments
        $nextline =~ s/\s+$//g; # remove trailing whitespace
        if (eof()) { # Handle EOF
            $nextline =~ s/^\s+/ /;
            if ($nextline =~ /^\s+/) { # indented line
                &$processref($line . $nextline);
            }
            else {
                &$processref($line);
                &$processref($nextline) if $nextline ne '';
            }
            $line = '';
        }
        elsif ($nextline eq '') { # blank line
            &$processref($line);
            $line = '';
        }
        elsif ($nextline =~ /^\s+/) { # indented line
            $nextline =~ s/^\s+/ /;
            $line .= $nextline;
        }
        else { # non-indented line
            &$processref($line) unless $line eq '';
            $line = $nextline;
        }
    }
    &$processref($line) unless $line eq '';
}
sub process_one_line {
    my $line = shift @_;
    print "$line\n";
}
process_file_with_continuations \&process_one_line;
How about slurping the whole file into memory and processing it using regular expressions? Much more 'perlish'. This passes your tests and is much smaller and neater:
#!/usr/bin/perl
use strict;
use warnings;
$/ = undef; # we want no input record separator.
my $file = <>; # slurp whole file
$file =~ s/^\n//; # Remove newline at start of file
$file =~ s/\s+\n/\n/g; # Remove trailing whitespace.
$file =~ s/\n\s*#[^\n]+//g; # Remove comments.
$file =~ s/\n\s+/ /g; # Merge continuations
# Done
print $file;
If you don't mind loading the entire file in memory, then the code below passes the tests.
It stores the lines in an array, adding each line either to the previous one (continuation) or at the end of the array (other).
#!/usr/bin/perl
use strict;
use warnings;
my @out;
while( <>)
{ chomp;
  s{#.*}{}; # suppress comments
  next unless( m{\S}); # skip blank lines
  if( s{^\s+}{ }) # does the line start with spaces?
    { $out[-1] .= $_; } # yes, continuation, add to last line
  else
    { push @out, $_; } # no, add as new line
}
$, = "\n"; # set output field separator
$\ = "\n"; # set output record separator
print @out;
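The same array-based idea can be wrapped in a function so it is easy to try on a few lines without a file. This is a sketch along the lines of the answer above, not a drop-in replacement: flatten_lines is a made-up name, it has not been run against the test suite from the question, and it treats a continuation line with nothing before it as an ordinary line.

```perl
use strict;
use warnings;

# Join continuation lines (those starting with whitespace) onto the
# previous logical line; comments and blank lines are dropped.
sub flatten_lines {
    my @out;
    for my $raw (@_) {
        my $line = $raw;             # copy so the caller's data is untouched
        chomp $line;
        $line =~ s/#.*//;            # strip comments
        $line =~ s/\s+$//;           # strip trailing whitespace
        next unless $line =~ /\S/;   # skip blank lines
        if ($line =~ s/^\s+/ / && @out) {
            $out[-1] .= $line;       # continuation of the previous line
        }
        else {
            $line =~ s/^\s+//;       # continuation with nothing before it
            push @out, $line;
        }
    }
    return @out;
}

my @input = (
    "target: a.o\n", "    b.o  \n", "# comment\n",
    "\n", "other: c.o\n", "\td.o\n",
);
print "$_\n" for flatten_lines(@input);
```

Returning a list instead of printing directly also makes it straightforward to hand each logical line to a callback, as the original process_file_with_continuations does.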