How to pass both variables and files to a perl -p -e command - perl

I have a command line written in perl that executes in Solaris (maybe this is irrelevant as it is UNIX-like) which inserts a "wait" string every 6 lines
perl -pe 'print "wait\n" if ($. % 6 == 0);' file
However, I want to replace that 6 by a parameter (ARGV[0]), resulting in something like this:
perl -pe 'print "wait\n" if ($. % ARGV[0] == 0);' file 6
It goes well, giving me the right output, until it finishes reading the file and treats "6" as the next file (even when it understood it as ARGV[0] before).
Is there any way to use the -p option and specify which parameters are files and which ones are not?
Edited: I thought there was a problem with using the -f option but as #ThisSuitIsBlackNot pointed out, I was using it wrongly.

-p, as a superset of -n, wraps the code with a while (<>) { } loop, which reads from the files named on the command line. You need to extract the argument before entering the loop.
perl -e'$n = shift; while (<>) { print "wait\n" if $. % $n == 0; print }' 6 file
or
perl -pe'BEGIN { $n = shift } print "wait\n" if $. % $n == 0' 6 file
Alternatively, you could also use an env var.
N=6 perl -pe'print "wait\n" if $. % $ENV{N} == 0' file

Related

Get value of autosplit delimiter?

If I run a script with perl -Fsomething, is that something value saved anywhere in the Perl environment where the script can find it? I'd like to write a script that by default reuses the input delimiter (if it's a string and not a regular expression) as the output delimiter.
Looking at the source, I don't think the delimiter is saved anywhere. When you run
perl -F, -an
the lexer actually generates the code
LINE: while (<>) {our #F=split(q\0,\0);
and parses it. At this point, any information about the delimiter is lost.
Your best option is to split by hand:
perl -ne'BEGIN { $F="," } #F=split(/$F/); print join($F, #F)' foo.csv
or to pass the delimiter as an argument to your script:
F=,; perl -F$F -sane'print join($F, #F)' -- -F=$F foo.csv
or to pass the delimiter as an environment variable:
export F=,; perl -F$F -ane'print join($ENV{F}, #F)' foo.csv
As #ThisSuitIsBlackNot says it looks like the delimiter is not saved anywhere.
This is how the perl.c stores the -F parameter
case 'F':
PL_minus_a = TRUE;
PL_minus_F = TRUE;
PL_minus_n = TRUE;
PL_splitstr = ++s;
while (*s && !isSPACE(*s)) ++s;
PL_splitstr = savepvn(PL_splitstr, s - PL_splitstr);
return s;
And then the lexer generates the code
LINE: while (<>) {our #F=split(q\0,\0);
However this is of course compiled, and if you run it with B::Deparse you can see what is stored.
$ perl -MO=Deparse -F/e/ -e ''
LINE: while (defined($_ = <ARGV>)) {
our(#F) = split(/e/, $_, 0);
}
-e syntax OK
Being perl there is always a way, however ugly. (And this is some of the ugliest code I have written in a while):
use B::Deparse;
use Capture::Tiny qw/capture_stdout/;
BEGIN {
my $f_var;
}
unless ($f_var) {
$stdout = capture_stdout {
my $sub = B::Deparse::compile();
&{$sub}; # Have to capture stdout, since I won't bother to setup compile to return the text, instead of printing
};
my (undef, $split_line, undef) = split(/\n/, $stdout, 3);
($f_var) = $split_line =~ /our\(\#F\) = split\((.*)\, \$\_\, 0\);/;
print $f_var,"\n";
}
Output:
$ perl -Fe/\\\(\\[\\\<\\{\"e testy.pl
m#e/\(\[\<\{"e#
You could possible traverse the bytecode instead, since the start probably will be identical every time until you reach the pattern.

Why does the following oneliner skip the first line?

Here is the example:
$cat test.tsv
AAAATTTTCCCCGGGG foo
GGGGCCCCTTTTAAAA bar
$perl -wne 'while(<STDIN>){ print $_;}' <test.tsv
GGGGCCCCTTTTAAAA bar
This should work like cat and not like tail -n +2. What is happening here? And what the correct way?
The use of the -n option creates this (taking from man perlrun):
while (<STDIN>) {
while(<STDIN>){ print $_;} #< your code
}
This shows two while(<STDIN>) instances. They both take all available inputs from STDIN, breaking at newlines.
When you run with a test.tsv which is at least two lines long, the first (outer) use of while(<STDIN>) takes the first line, and the second (inner) one takes the second line - so your print statement is first passed the second line.
If you had more than two lines in test.tsv then the inner loop would print out all lines from the second line onwards.
The correct way to make this work is simply to rely on the -n option you pass to perl:
perl -wne 'print $_;' < test.tsv
Because the -n switch implicitly puts your code inside a loop which goes through the file line by line. Remove the 'n' from the list of switches, or (even better) remove your loop from the code, leave only the print command there.
nbokor#nbokor:~/tmp$ perl -wne 'print $_;' <test.csv
AAAATTTTCCCCGGGG foo
GGGGCCCCTTTTAAAA bar
Remove -n command line option. It duplicates while(<STDIN>){ ... }.
$perl -MO=Deparse -wne 'while(<STDIN>){ print $_;}'
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
while (defined($_ = <STDIN>)) {
print $_;
}
}
-e syntax OK

Summing a column of numbers in a text file using Perl

Ok, so I'm very new to Perl. I have a text file and in the file there are 4 columns of data(date, time, size of files, files). I need to create a small script that can open the file and get the average size of the files. I've read so much online, but I still can't figure out how to do it. This is what I have so far, but I'm not sure if I'm even close to doing this correctly.
#!/usr/bin/perl
open FILE, "files.txt";
##array = File;
while(FILE){
#chomp;
($date, $time, $numbers, $type) = split(/ /,<FILE>);
$total += $numbers;
}
print"the total is $total\n";
This is how the data looks in the file. These are just a few of them. I need to get the numbers in the third column.
12/02/2002 12:16 AM 86016 a2p.exe
10/10/2004 11:33 AM 393 avgfsznew.pl
11/01/2003 04:42 PM 38124 c2ph.bat
Your program is reasonably close to working. With these changes it will do exactly what you want
Always use use strict and use warnings at the start of your program, and declare all of your variables using my. That will help you by finding many simple errors that you may otherwise overlook
Use lexical file handles, the three-parameter form of open, and always check the return status of any open call
Declare the $total variable outside the loop. Declaring it inside the loop means it will be created and destroyed each time around the loop and it won't be able to accumulate a total
Declare a $count variable in the same way. You will need it to calculate the average
Using while (FILE) {...} just tests that FILE is true. You need to read from it instead, so you must use the readline operator like <FILE>
You want the default call to split (without any parameters) which will return all the non-space fields in $_ as a list
You need to add a variable in the assignment to allow for athe AM or PM field in each line
Here is a modification of your code that works fine
use strict;
use warnings;
open my $fh, '<', "files.txt" or die $!;
my $total = 0;
my $count = 0;
while (<$fh>) {
my ($date, $time, $ampm, $numbers, $type) = split;
$total += $numbers;
$count += 1;
}
print "The total is $total\n";
print "The count is $count\n";
print "The average is ", $total / $count, "\n";
output
The total is 124533
The count is 3
The average is 41511
It's tempting to use Perl's awk-like auto-split option. There are 5 columns; three containing date and time information, then the size and then the name.
The first version of the script that I wrote is also the most verbose:
perl -n -a -e '$total += $F[3]; $num++; END { printf "%12.2f\n", $total / ($num + 0.0); }'
The -a (auto-split) option splits a line up on white space into the array #F. Combined with the -n option (which makes Perl run in a loop that reads the file name arguments in turn, or standard input, without printing each line), the code adds $F[3] (the fourth column, counting from 0) to $total, which is automagically initialized to zero on first use. It also counts the lines in $num. The END block is executed when all the input is read; it uses printf() to format the value. The + 0.0 ensures that the arithmetic is done in floating point, not integer arithmetic. This is very similar to the awk script:
awk '{ total += $4 } END { print total / NR }'
First drafts of programs are seldom optimal — or, at least, I'm not that good a programmer. Revisions help.
Perl was designed, in part, as an awk killer. There is still a program a2p distributed with Perl for converting awk scripts to Perl (and there's also s2p for converting sed scripts to Perl). And Perl does have an automatic (built-in) variable that keeps track of the number of lines read. It has several names. The tersest is $.; the mnemonic name $NR is available if you use English; in the script; so is $INPUT_LINE_NUMBER. So, using $num is not necessary. It also turns out that Perl does a floating point division anyway, so the + 0.0 part was unnecessary. This leads to the next versions:
perl -MEnglish -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $NR; }'
or:
perl -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $.; }'
You can tune the print format to suit your whims and fancies. This is essentially the script I'd use in the long term; it is fairly clear without being long-winded in any way. The script could be split over multiple lines if you desired. It is a simple enough task that the legibility of the one-line is not a problem, IMNSHO. And the beauty of this is that you don't have to futz around with split and arrays and read loops on your own; Perl does most of that for you. (Granted, it does blow up on empty input; that fix is trivial; see below.)
Recommended version
perl -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $. if $.; }'
The if $. tests whether the number of lines read is zero or not; the printf and division are omitted if $. is zero so the script outputs nothing when given no input.
There is a noble (or ignoble) game called 'Code Golf' that was much played in the early days of Stack Overflow, but Code Golf questions are no longer considered good questions. The object of Code Golf is to write a program that does a particular task in as few characters as possible. You can play Code Golf with this and compress it still further if you're not too worried about the format of the output and you're using at least Perl 5.10:
perl -Mv5.10 -n -a -e '$total += $F[3]; END { say $total / $. if $.; }'
And, clearly, there are a lot of unnecessary spaces and letters in there:
perl -Mv5.10 -nae '$t+=$F[3];END{say$t/$.if$.}'
That is not, however, as clear as the recommended version.
#!/usr/bin/perl
use warnings;
use strict;
open my $file, "<", "files.txt";
my ($total, $cnt);
while(<$file>){
$total += (split(/\s+/, $_))[3];
$cnt++;
}
close $file;
print "number of files: $cnt\n";
print "total size: $total\n";
printf "avg: %.2f\n", $total/$cnt;
Or you can use awk:
awk '{t+=$4} END{print t/NR}' files.txt
Try doing this :
#!/usr/bin/perl -l
use strict; use warnings;
open my $file, '<', "my_file" or die "open error [$!]";
my ($total, $count);
while (<$file>){
chomp;
next if /^$/;
my ($date, $time, $x, $numbers, $type) = split;
$total += $numbers;
$count++;
}
print "the average is " . $total/$count . " and the total is $total";
close $file;
It is as simple as this:
perl -F -lane '$a+=$F[3];END{print "The average size is ".$a/$.}' your_file
tested below:
> cat temp
12/02/2002 12:16 AM 86016 a2p.exe
10/10/2004 11:33 AM 393 avgfsznew.pl
11/01/2003 04:42 PM 38124 c2ph.bat
Now the execution:
> perl -F -lane '$a+=$F[3];END{print "The average size is ".$a/$.}' temp
The average size is 41511
>
explanation:
-F -a says store the line in an array format.with the default separator as space or tab.
so nopw $F[3] has you size of the file.
sum up all the sizes in the 4th column untill all the lines are processed.
END will be executed after processing all the lines in the file.
so $. at the end will gives the number of lines.
so $a/$. will give the average.
This solution opens the file and loops through each line of the file. It then splits the file into the five variables in the line by splitting on 1 or more spaces.
open the file for reading, "<", and if it fails, raise an error or die "..."
my ($total, $cnt) are our column total and number of files added count
while(<FILE>) { ... } loops through each line of the file using the file handle and stores the line in $_
chomp removes the input record separator in $_. In unix, the default separator is a newline \n
split(/\s+/, $_) Splits the current line represented by$_, with the delimiter \s+. \s represents a space, the + afterward means "1 or more". So, we split the next line on 1 or more spaces.
Next we update $total and $cnt
#!/usr/bin/perl
open FILE, "<", "files.txt" or die "Error opening file: $!";
my ($total, $cnt);
while(<FILE>){
chomp;
my ($date, $time, $am_pm, $numbers, $type) = split(/\s+/, $_);
$total += $numbers;
$cnt++;
}
close FILE;
print"the total is $total and count of $cnt\n";`

How to compress 4 consecutive blank lines into one single line in Perl

I'm writing a Perl script to read a log so that to re-write the file into a new log by removing empty lines in case of seeing any consecutive blank lines of 4 or more. In other words, I'll have to compress any 4 consecutive blank lines (or more lines) into one single line; but any case of 1, 2 or 3 lines in the file will have to remain the format. I have tried to get the solution online but the only I can find is
perl -00 -pe ''
or
perl -00pe0
Also, I see the example in vim like this to delete blocks of 4 empty lines :%s/^\n\{4}// which match what I'm looking for but it was in vim not Perl. Can anyone help in this? Thanks.
To collapse 4+ consecutive Unix-style EOLs to a single newline:
$ perl -0777 -pi.bak -e 's|\n{4,}|\n|g' file.txt
An alternative flavor using look-behind:
$ perl -0777 -pi.bak -e 's|(?<=\n)\n{3,}||g' file.txt
use strict;
use warnings;
my $cnt = 0;
sub flush_ws {
$cnt = 1 if ($cnt >= 4);
while ($cnt > 0) {print "\n"; $cnt--; }
}
while (<>) {
if (/^$/) {
$cnt++;
} else {
flush_ws();
print $_;
}
}
flush_ws();
Your -0 hint is a good one since you can use -0777 to slurp the whole file in -p mode. Read more about these guys in perlrun So this oneliner should do the trick:
$ perl -0777 -pe 's/\n{5,}/\n\n/g'
If there are up to four new lines in a row, nothing happens. Five newlines or more (four empty lines or more) are replaced by two newlines (one empty line). Note the /g switch here to replace not only the first match.
Deparsed code:
BEGIN { $/ = undef; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
s/\n{5,}/\n\n/g;
}
continue {
die "-p destination: $!\n" unless print $_;
}
HTH! :)
One way using GNU awk, setting the record separator to NUL:
awk 'BEGIN { RS="\0" } { gsub(/\n{5,}/,"\n")}1' file.txt
This assumes that you're definition of empty excludes whitespace
This will do what you need
perl -ne 'if (/\S/) {$n = 1 if $n >= 4; print "\n" x $n, $_; $n = 0} else {$n++}' myfile

Perl: extract rows from 1 to n (Windows)

I want to extract rows 1 to n from my .csv file. Using this
perl -ne 'if ($. == 3) {print;exit}' infile.txt
I can extract only one row. How to put a range of rows into this script?
If you have only a single range and a single, possibly concatenated input stream, you can use:
#!/usr/bin/perl -n
if (my $seqno = 1 .. 3) {
print;
exit if $seqno =~ /E/;
}
But if you want it to apply to each input file, you need to catch the end of each file:
#!/usr/bin/perl -n
print if my $seqno = 1 .. 3;
close ARGV if eof || $seqno =~ /E/;
And if you want to be kind to people who forget args, add a nice warning in a BEGIN or INIT clause:
#!/usr/bin/perl -n
BEGIN { warn "$0: reading from stdin\n" if #ARGV == 0 && -t }
print if my $seqno = 1 .. 3;
close ARGV if eof || $seqno =~ /E/;
Notable points include:
You can use -n or -p on the #! line. You could also put some (but not all) other command line switches there, like ‑l or ‑a.
Numeric literals as
operands to the scalar flip‐flop
operator are each compared against
readline counter, so a scalar 1 ..
3 is really ($. == 1) .. ($. ==
3).
Calling eof with neither an argument nor empty parens means the last file read in the magic ARGV list of files. This contrasts with eof(), which is the end of the entire <ARGV> iteration.
A flip‐flop operator’s final sequence number is returned with a "E0" appended to it.
The -t operator, which calls libc’s isatty(3), default to the STDIN handle — unlike any of the other filetest operators.
A BEGIN{} block happens during compilation, so if you try to decompile this script with ‑MO=Deparse to see what it really does, that check will execute. With an INIT{}, it will not.
Doing just that will reveal that the implicit input loop as a label called LINE that you perhaps might in other circumstances use to your advantage.
HTH
What's wrong with:
head -3 infile.txt
If you really must use Perl then this works:
perl -ne 'if ($. <= 3) {print} else {exit}' infile.txt
You can use the range operator:
perl -ne 'if (1 .. 3) { print } else { last }' infile.txt