Combining two one line Perl commands into a script - perl

I am trying to combine the following two one line perl codes into a single perl script that carries out both on a line of a file before progressing to the next line. Note that this is not my own original code, it was very thoughtfully provided here: Adding a blank line between unrelated data entries
1
perl -pae 'print $/ if (defined $x && $x ne $F[0]); $x = $F[0];' DF-data2pfa.csv >DF-data2pfb.txt
2
perl -pae 'print $/ if (defined $x && $x ne $F[3]); $x = $F[3];' DF-data2pfb.txt >DF-data2pfc.txt
The script does exactly what I want it to (compares the F[0] field of a line in my dataset to the F[0] of the previous line and adds a blank line between those entries if they are different), except I realized that I need it to look at F[0] and F[3] on a single line and compare both to the previous line. Much to my embarrassment I tried just running one after another, and did not realize that this was adding an extra blank line every time the script encountered the blank line added by the previous script, which is unacceptable to the program I am trying to input this data to.
So I tried using the Deparse tool to convert both into script format and then use an elsif statement to add the second to the first. This became messy. Also, I'm not sure how to achieve the effect of the -pae switches in a script. I'm not sure the e is necessary in the script, but first printing each line and then splitting it into an array (with -p and -a) seems like an integral part of this code, and I'm not sure how to achieve that here.
Here's what I got:
while (defined($_ = <ARGV>)) {
our(@F) = split(' ', $_, 0);
$x = $F[0];
$y = $F[3];
if defined $x and $x ne $F[0];
elsif defined $y and $y ne $F[3];
print $/
}
continue {
die "-p destination: $!\n" unless print $_;
}
I'm also open to not using the deparse module if that's unnecessary here. Thanks for any help/explanations you can provide!

It's getting a bit wordy for a one-liner, but you could do this:
perl -pae 'print $/ if ((defined $x && $x ne $F[0]) && (defined $y && $y ne $F[3])); $x = $F[0]; $y = $F[3]' DF-data2pfa.csv >DF-data2pfb.txt
or as a script
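open my $fh, "<", "input_file_name" or die "Cannot open input file: $!";
open my $out, ">", "output_file_name" or die "Cannot open output file: $!";
my ($x, $y);
while (<$fh>) {
my @F = split(' ', $_);
if ( ( defined($x) && $x ne $F[0] ) && (defined($y) && $y ne $F[3]) ) {
print $out $/; # $/ is the input record separator, a newline by default
}
$x = $F[0];
$y = $F[3];
print $out $_;
}
close $out or die "Cannot close output file: $!";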
I'm not sure that I'm reading your requirements correctly. If you need to print an extra line when either $F[0] or $F[3] differs from the previous row, then the conditional would be:
( ( defined($x) && $x ne $F[0] ) || (defined($y) && $y ne $F[3]) )
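As a minimal sketch of the "either field changed" variant, the logic can be wrapped in a small sub that is easy to try on sample data. The name add_separators and the sample rows are hypothetical, and the sketch assumes every line has at least four whitespace-separated fields. Because both fields are checked in a single pass, a line where both change still gets only one blank line in front of it.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the input lines with one blank line inserted
# wherever field 0 or field 3 differs from the previous line.
# Assumes every line has at least four whitespace-separated fields.
sub add_separators {
    my @lines = @_;
    my ($x, $y);    # fields 0 and 3 of the previous line
    my @out;
    for my $line (@lines) {
        my @F = split ' ', $line;
        my $changed = defined $x && ( $x ne $F[0] || $y ne $F[3] );
        push @out, "\n" if $changed;
        ($x, $y) = @F[0, 3];
        push @out, $line;
    }
    return @out;
}

# Made-up sample rows, only for illustration.
my @sample = (
    "a 1 2 p\n",
    "a 1 2 p\n",
    "a 1 2 q\n",
    "b 1 2 r\n",
);
print add_separators(@sample);
```

In a real script the lines would come from a filehandle rather than an array; the array form just keeps the sketch self-contained.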

I'm not 100% sure what you are doing and so this script may not be exactly what you want, but it hopefully can get you started. It uses the strict and warnings pragmas which will help you prevent certain errors.
#!/usr/bin/env perl
use strict;
use warnings;
my ($x, $y, @F);
while ( <> ) {
@F = split ' ';
if ( defined $x and $x ne $F[0] ) {
print $/;
} elsif ( defined $y and $y ne $F[3] ) {
print $/;
}
$x = $F[0];
$y = $F[3];
print;
}
This implicitly uses the $_ variable (while implicitly sets it, split implicitly uses it). It also shows how your conditional statements should look; when not used in postfix style, the conditions NEED parentheses. I have dropped the continue block; in practice I've never needed to use one. It is a remnant of the deparse, and its print can go at the end of the while loop instead (print can implicitly use $_ too). Finally, the <> operator is the magic open/read operator: it reads the files named in @ARGV sequentially, or STDIN if none are given.
If you need more help just ping.

Related

perl cookbook fixstyle2 perplexed by & 1

This is from the Perl Cookbook again. I know what this program does and I understand most of it, but the code below escapes me.
It is basically an if/else, but what does ( $i++ & 1 ) mean?
#!/usr/bin/perl -w
# fixstyle2 - like fixstyle but faster for many many matches
use strict;
my $verbose = (@ARGV && $ARGV[0] eq '-v' && shift);
my %change = ();
while (<DATA>) {
chomp;
my ($in, $out) = split /\s*=>\s*/;
next unless $in && $out;
$change{$in} = $out;
}
if (@ARGV) {
$^I = ".orig";
} else {
warn "$0: Reading from stdin\n" if -t STDIN;
}
while (<>) {
my $i = 0;
s/^(\s+)// && print $1; # emit leading whitespace
for (split /(\s+)/, $_, -1) { # preserve trailing whitespace
print( ($i++ & 1) ? $_ : ($change{$_} || $_));
}
}
__END__
analysed => analyzed
$i++ returns the value of $i and increments $i afterwards. & is the "bitwise and" operator, so it takes the aforementioned value of $i and checks its last bit (as 1 in binary is 00..01).
As $i is incremented by 1 in each iteration, its last bit flips between 0 and 1 at each step, so the expression just tells even positions from odd ones. Because split /(\s+)/ returns words and the whitespace between them alternately, the even positions are words (looked up in %change) and the odd positions are whitespace (printed unchanged).
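To see the alternation concretely, here is a small sketch that labels each piece returned by the same split call. The sub name classify and the sample string are made up for illustration; only the split pattern and the ($i++ & 1) test come from the program above.

```perl
use strict;
use warnings;

# Hypothetical helper: tags each element of the split as a word or a separator.
sub classify {
    my ($text) = @_;
    my $i = 0;
    my @tagged;
    for my $piece (split /(\s+)/, $text, -1) {
        # $i++ yields the old value of $i; "& 1" keeps only its lowest bit,
        # which is 0 for even positions and 1 for odd positions.
        push @tagged, ( ($i++ & 1) ? "sep" : "word" ) . ":[$piece]";
    }
    return @tagged;
}

print "$_\n" for classify("foo  bar baz");
```

The words land on positions 0, 2, 4 and the captured whitespace on positions 1, 3, which is why the original code only consults %change when the test is false.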

zcat working in command line but not in perl script

Here is a part of my script:
foreach $i ( @contact_list ) {
print "$i\n";
$e = "zcat $file_list2| grep $i";
print "$e\n";
$f = qx($e);
print "$f";
}
$e prints properly but $f gives a blank line even when $file_list2 has a match for $i.
Can anyone tell me why?
It is usually better to use Perl's grep than a shell pipe:
@lines = `zcat $file_list2`; # move output of zcat to array
die('zcat error') if ($?); # will exit script with error if zcat is problem
# chomp(@lines) # this will remove "\n" from each line
foreach $i ( @contact_list ) {
print "$i\n";
@ar = grep (/$i/, @lines);
print @ar;
# print join("\n",@ar)."\n"; # in case of using chomp
}
A better solution is to avoid calling zcat at all and use the IO::Zlib module:
http://perldoc.perl.org/IO/Zlib.html
use IO::Zlib;
# ....
# place your definition of $file_list2 and @contact_list here.
# ...
$fh = new IO::Zlib; $fh->open($file_list2, "rb")
or die("Cannot open $file_list2");
@lines = <$fh>;
$fh->close;
#chomp(@lines); #remove "\n" symbols from lines
foreach $i ( @contact_list ) {
print "$i\n";
@ar = grep (/$i/, @lines);
print (@ar);
# print join("\n",@ar)."\n"; #in case of using chomp
}
Your question leaves us guessing about many things, but a better overall approach would seem to be opening the file just once, and processing each line in Perl itself.
open(F, "zcat $file_list |") or die "$0: could not zcat: $!\n";
LINE:
while (<F>) {
######## FIXME: this could be optimized a great deal still
foreach my $i (@contact_list) {
if (m/$i/) {
print $_;
next LINE;
}
}
}
close (F);
If you want to squeeze out more from the inner loop, compile the regexes from @contact_list into a separate array before the loop, or perhaps combine them into a single regex if all you care about is whether one of them matched. If, on the other hand, you want to print all matches for one pattern only at the end when you know what they are, collect matches into one array per search expression, then loop over them and print when you have grepped the whole set of input files.
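The "single regex" idea can be sketched as follows. The sub name matching_lines and the sample data are hypothetical; the sketch assumes the entries of @contact_list are valid regex fragments and that the list is not empty. In-memory lines stand in for the zcat output so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines that match at least one pattern.
# Assumes a non-empty list of valid regex fragments.
sub matching_lines {
    my ($patterns, $lines) = @_;
    # Wrap each pattern in a non-capturing group, join them into one
    # alternation, and compile the result once.
    my $alternation = join '|', map { "(?:$_)" } @$patterns;
    my $combined    = qr/$alternation/;
    return grep { $_ =~ $combined } @$lines;
}

# Made-up data, only for illustration.
my @contact_list = ('alice', 'bob\d+');
my @lines = ("alice 1\n", "carol 2\n", "bob42 3\n");
print matching_lines(\@contact_list, \@lines);
```

If the entries are meant as literal strings rather than regexes, passing each through quotemeta before joining would be the safer choice.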
Your problem is not reproducible without information about what's in $i, but I can guess that it contains some shell metacharacter which causes it to be processed by the shell before the grep runs.

perl script miscounting because of empty lines

The script below captures the second column and sums its values. The only minor issue I have is that the file has empty lines at the end (it's how the values are being exported), and because of these empty lines the script is miscounting. Any ideas please? Thanks.
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while( my $line = <$file>) {
$line =~ m/\s+(\d+)/; #regexpr to catch second column values
$sum_column_b += $1;
}
print $sum_column_b, "\n";
I think the main issue has been established, you are using $1 when it is not conditionally tied to the regex match, which causes you to add values when you should not. This is an alternative solution:
$sum_column_b += $1 if $line =~ m/\s+(\d+)/;
Typically, you should never use $1 unless you check that the regex you expect it to come from succeeded. Use either something like this:
if ($line =~ /(\d+)/) {
$sum += $1;
}
Or use direct assignment to a variable:
my ($num) = $line =~ /(\d+)/;
$sum += $num;
Note that you need to use list context by adding parentheses around the variable, or the regex will simply return 1 for success. Also note that, like Borodin says, this will give an undefined value when the match fails, and you must add code to check for that.
This can be handy when capturing several values:
my @nums = $line =~ /(\d+)/g;
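A minimal sketch of the list-context assignment together with the check for a failed match might look like this. The sub name sum_second_column and the sample lines are made up; the regex is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: sums the number that follows the first run of
# whitespace on each line, ignoring lines where the pattern fails.
sub sum_second_column {
    my @lines = @_;
    my $sum = 0;
    for my $line (@lines) {
        # List context: $num gets the capture, or undef if the match failed.
        my ($num) = $line =~ /\s+(\d+)/;
        next unless defined $num;
        $sum += $num;
    }
    return $sum;
}

# Two data lines followed by two empty lines, as in the question's file.
print sum_second_column("a 10\n", "b 20\n", "\n", "\n"), "\n";
```

Because $num is a fresh lexical on every pass, an empty line can never reuse a value captured from an earlier line.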
The main problem is that if the regex does not match, then $1 will hold the value it received in the previous successful match. So every empty line will cause the previous line to be counted again.
An improvement would be:
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while( my $line = <$file>) {
next if $line =~ /^\s*$/; # skip "empty" lines
# ... maybe skip other known invalid lines
if ($line =~ m/\s+(\d+)/) { #regexpr to catch second column values
$sum_column_b += $1;
} else {
warn "problematic line '$line'\n"; # report invalid lines
}
}
print $sum_column_b, "\n";
The else-block is of course optional but can help noticing invalid data.
Try putting this line just after the while line:
next if ( $line =~ /^$/ );
Basically, loop around to the next line if the current line has no content.
#!/usr/bin/perl
use warnings;
use strict;
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while (my $line = <$file>) {
next if ($line =~ m/^\s*$/); # skip lines that are empty or only whitespace
if ($line =~ m/\s+(\d+)/) {
$sum_column_b += $1;
}
}
print "$sum_column_b\n";

What does dot-equals mean in Perl?

What does ".=" mean in Perl (dot-equals)? Example code below (in the while clause):
if( my $file = shift @ARGV ) {
$parser->parse( Source => {SystemId => $file} );
} else {
my $input = "";
while( <STDIN> ) { $input .= $_; }
$parser->parse( Source => {String => $input} );
}
exit;
Thanks for any insight.
The period . is the concatenation operator. The equal sign to the right means that this is an assignment operator, like in C.
For example:
$input .= $_;
Does the same as
$input = $input . $_;
However, there's also some perl magic in this, for example this removes the need to initialize a variable to avoid "uninitialized" warnings. Try the difference:
perl -we 'my $x; $x = $x + 1' # Use of uninitialized value in addition ...
perl -we 'my $x; $x += 1' # no warning
This means that the line in your code:
my $input = "";
Is quite redundant. Albeit some people might find it comforting.
For pretty much any binary operator X, $a X= $b is equivalent to $a = $a X $b. The dot . is a string concatenation operator; thus, $a .= $b means "stick $b at the end of $a".
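A short sketch of the X= pattern with a few operators; the variable names are arbitrary and chosen only for illustration.

```perl
use strict;
use warnings;

my $s = "foo";
$s .= "bar";       # same as $s = $s . "bar"

my $n = 5;
$n *= 3;           # same as $n = $n * 3

my $log;           # deliberately left undefined
$log .= "line\n";  # .= on an undefined value is exempt from the warning

print "$s $n $log";
```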
In your code, you start with an empty $input, then repeatedly read a line and append it to $input until there are no lines left. You end up with the entire input as the contents of $input, built up one line at a time.
It should be equivalent to the loopless
local $/;
$input = <STDIN>;
(This undefines the input record separator $/, so the single read runs to the end of the input instead of stopping at a line ending.)
EDIT: Changed according to TLP's comment.
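To check that the line-by-line loop and the one-shot read give the same result, here is a small sketch. It uses an in-memory filehandle in place of STDIN so that it runs without any input; the variable names and sample text are made up.

```perl
use strict;
use warnings;

my $data = "first\nsecond\nthird\n";

# Read line by line and append, as in the question's loop.
# An in-memory filehandle stands in for STDIN here.
open my $in1, '<', \$data or die "Cannot open in-memory handle: $!";
my $looped = '';
while (<$in1>) { $looped .= $_ }
close $in1;

# Read everything at once with the record separator undefined.
# The do block limits the effect of local $/ to this one read.
open my $in2, '<', \$data or die "Cannot open in-memory handle: $!";
my $slurped = do { local $/; <$in2> };
close $in2;

print $looped eq $slurped ? "same\n" : "different\n";
```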
You have found the string concatenation operator.
Let's try it :
my $string = "foo";
$string .= "bar";
print $string;
foobar
This appends to the $input variable: each line coming in via STDIN is concatenated onto the end of $input.

While and foreach mixed loop issue

#!C:\Perl\bin\perl.exe
use strict;
use warnings;
my $numArgs = $#ARGV + 1;
print "thanks, you gave me $numArgs command-line arguments.\n";
while (my $line = <DATA> ) {
foreach my $argnum (0 .. $#ARGV) {
if ($line =~ /$ARGV[$argnum]/)
{
print $line;
}
}
}
__DATA__
A
B
Hello World :-)
Hello World !
When I pass one arg, it works well.
For example, test.pl A, test.pl B, or test.pl Hello.
When I pass two args, it works only some of the time.
Successful: test.pl A B, test.pl A Hello, or test.pl B Hello.
Failed: test.pl Hello World.
It outputs duplicate lines:
D:\learning\perl>t.pl Hello World
thanks, you gave me 2 command-line arguments.
Hello World :-)
Hello World :-)
Hello World !
Hello World !
D:\learning\perl>
How to fix it? Thank you for reading and replies.
[update]
I don't want to print duplicate lines.
I don't see the problem, your script processes the __DATA__ and tests all input words against it: since "Hello" and "World" match twice each, it prints 4 rows.
If you don't want it to write multiple lines, just add last; after the print statement.
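A sketch of the last approach, wrapped in a sub so it can be tried without command-line arguments. The name lines_matching_any is hypothetical, and the array reference of patterns stands in for @ARGV.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each line at most once, even when several
# patterns match it.
sub lines_matching_any {
    my ($patterns, $lines) = @_;
    my @found;
    for my $line (@$lines) {
        for my $pattern (@$patterns) {
            if ($line =~ /$pattern/) {
                push @found, $line;
                last;    # stop testing further patterns for this line
            }
        }
    }
    return @found;
}

# The same data as the __DATA__ section of the question.
my @data = ("A\n", "B\n", "Hello World :-)\n", "Hello World !\n");
print lines_matching_any([ 'Hello', 'World' ], \@data);
```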
The reason you're getting the duplicate output is because the regex $line =~ /Hello/ matches both "Hello World" lines and $line =~ /World/ also matches both "Hello World" lines. To prevent that, you'll need to add something to remember which lines from the __DATA__ section have already been printed so that you can skip printing them if they match another argument.
Also, some very minor stylistic cleanup:
#!C:\Perl\bin\perl.exe
use strict;
use warnings;
my $numArgs = @ARGV;
print "thanks, you gave me $numArgs command-line arguments.\n";
while (my $line = <DATA> ) {
foreach my $arg (@ARGV) {
if ($line =~ /$arg/)
{
print $line;
}
}
}
__DATA__
A
B
Hello World :-)
Hello World !
Using an array in scalar context returns its size, so $size = @arr is preferred over $size = $#arr + 1.
If you're not going to use a counter for anything other than indexing through an array (for $i (0..$#arr) { $elem = $arr[$i]; ... }), then it's simpler and more straightforward to just loop over the array instead (for $elem (@arr) { ... }).
Your foreach loop could also be replaced with a grep statement, but I'll leave that as an exercise for the reader.
Assuming you want to print each line from DATA only once if one or more patterns match, you can use grep. Note the use of \Q to quote regex metacharacters in the command line arguments and the use of the @patterns array to precompile the patterns.
Read if grep { $line =~ $_ } @patterns out loud: If $line matches one or more patterns ;-)
#!/usr/bin/perl
use strict; use warnings;
printf "Thanks, you gave me %d command line arguments.\n", scalar @ARGV;
my @patterns = map { qr/\Q$_/ } @ARGV;
while ( my $line = <DATA> ) {
print $line if grep { $line =~ $_ } @patterns;
}
__DATA__
A
B
Hello World :-)
Hello World !
Here are some comments on your script to help you learn:
my $numArgs = $#ARGV + 1;
print "thanks, you gave me $numArgs command-line arguments.\n";
The command line arguments are in @ARGV (please do read the documentation). In scalar context, @ARGV evaluates to the number of elements in that array. Therefore, you can simply use:
printf "Thanks, you gave me %d command line arguments.\n", scalar @ARGV;
Further, you can iterate directly over the elements of @ARGV in your foreach loop instead of indexed access.
while (my $line = <DATA> ) {
foreach my $arg ( @ARGV ) {
if ( $line =~ /$arg/ ) {
print $line;
}
}
}
Now, what happens to your program if I pass ( to it on the command line? Or, even World? What should happen?
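One way to explore the ( case without touching the command line is the sketch below. The variable $arg stands in for a command-line argument and the sample line is made up. It compares interpolating the argument as-is with quoting it through \Q...\E.

```perl
use strict;
use warnings;

my $arg  = '(';            # stands in for a command-line argument
my $line = "f(x) = 1\n";   # made-up sample line

# Interpolated as-is, the argument is treated as regex syntax. A lone "("
# is not a valid pattern, so the match dies and eval returns undef.
my $raw_ok = eval { my $matched = $line =~ /$arg/; 1 };

# \Q...\E escapes metacharacters, so the argument is matched literally.
my $quoted_match = $line =~ /\Q$arg\E/ ? 1 : 0;

print "raw pattern compiled: ", ($raw_ok ? "yes" : "no"), "\n";
print "quoted pattern matched: $quoted_match\n";
```

Whether arguments should be treated as regexes or as literal text is a design decision for the script; quoting is the safer default when users are expected to type plain words.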