why does perl while (<>) fail to count or print the first line - perl

I want to count the lines in a file and print a string which depends on the line number. But my while loop misses the first line. I believe the while (<>) construct is necessary to increment the $n variable; anyway, is not this construct pretty standard in perl?
How do I get the while loop to print the first line? Or should I not be using while?
> printf '%s\n%s\n' dog cat
dog
cat
> printf '%s\n%s\n' dog cat | perl -n -e 'use strict; use warnings; print; '
dog
cat
> printf '%s\n%s\n' dog cat | perl -n -e 'use strict; use warnings; while (<>) { print; } '
cat
>
> printf '%s\n%s\n' dog cat | perl -n -e 'use strict; use warnings; my $n=0; while (<>) { $n++; print "$n:"; print; } '
1:cat

The man perlrun shows:
-n causes Perl to assume the following loop around your program, which makes it iterate over filename
arguments somewhat like sed -n or awk:
LINE:
while (<>) {
... # your program goes here
}
Note that the lines are not printed by default. See "-p" to have lines printed. If a file named by an
argument cannot be opened for some reason, Perl warns you about it and moves on to the next file.
Also note that "<>" passes command line arguments to "open" in perlfunc, which doesn't necessarily
interpret them as file names. See perlop for possible security implications.
...
...
"BEGIN" and "END" blocks may be used to capture control before or after the implicit program loop, just as in awk.
So in fact you are running this script:
LINE:
while (<>) {
# your program starts here
use strict;
use warnings;
my $n=0;
while (<>) {
$n++;
print "$n:";
print;
}
# end
}
Solution: just remove the -n.
printf '%s\n%s\n' dog cat | perl -e 'use strict; use warnings; my $n=0; while (<>) { $n++; print "$n:"; print; }'
Will print:
1:dog
2:cat
or
printf '%s\n%s\n' dog cat | perl -ne 'print ++$n, ":$_"'
with the same result
or
printf '%s\n%s\n' dog cat | perl -pe '++$n;s/^/$n:/'
but ikegami's solution
printf "one\ntwo\n" | perl -ne 'print "$.:$_"'
is the best, because $. already holds the current input line number.

There's a way to figure out what your one-liner is actually doing. The B::Deparse module has a way to show you how perl interpreted your source code. It's actually from the O (capital letter O, not zero) namespace that you can load with -M (ikegami explains this on Perlmonks):
$ perl -MO=Deparse -ne 'while(<>){print}' foo bar
LINE: while (defined($_ = readline ARGV)) {
while (defined($_ = readline ARGV)) {
print $_;
}
}
-e syntax OK
Heh, googling for the module link shows I wrote about this for The Effective Perler. Same example. I guess I'm not that original.
If you can't change the command line, perhaps because it's in the middle of a big script or something, you can set options in PERL5OPT. Then those options last for just the session. I hate changing the original scripts because it seems that no matter how careful I am, I mess up something (how many times has my brain told me "hey dummy, you know what a git branch is, so you should have used that first"):
$ export PERL5OPT='-MO=Deparse'

Related

Why does the following oneliner skip the first line?

Here is the example:
$cat test.tsv
AAAATTTTCCCCGGGG foo
GGGGCCCCTTTTAAAA bar
$perl -wne 'while(<STDIN>){ print $_;}' <test.tsv
GGGGCCCCTTTTAAAA bar
This should work like cat and not like tail -n +2. What is happening here? And what the correct way?
The -n option wraps your code in a loop (taken from man perlrun):
while (<>) {
while(<STDIN>){ print $_;} #< your code
}
With no filename arguments, <> reads from STDIN, so this amounts to two nested loops that both consume lines from STDIN.
When you run this with a test.tsv that is at least two lines long, the outer loop takes the first line and the inner loop takes the second, so the first line your print statement sees is the second line of the file.
If you had more than two lines in test.tsv then the inner loop would print out all lines from the second line onwards.
The correct way to make this work is simply to rely on the -n option you pass to perl:
perl -wne 'print $_;' < test.tsv
This happens because the -n switch implicitly wraps your code in a loop that goes through the file line by line. Either remove the 'n' from the list of switches or (even better) remove your own loop and leave only the print command.
nbokor@nbokor:~/tmp$ perl -wne 'print $_;' <test.csv
AAAATTTTCCCCGGGG foo
GGGGCCCCTTTTAAAA bar
Remove -n command line option. It duplicates while(<STDIN>){ ... }.
$perl -MO=Deparse -wne 'while(<STDIN>){ print $_;}'
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
while (defined($_ = <STDIN>)) {
print $_;
}
}
-e syntax OK

perl eval without block

I am learning perl eval. I understand how to use eval BLOCK, but I have come across the code below. What is the code below doing?
while(<>) {
eval;
warn $@ if $@;
}
while(<>) {
This reads a line of input and places it in the variable $_. The <> operator reads from the files named in @ARGV if you called your script with arguments, and from STDIN (standard input) otherwise.
The diamond operator is documented in perlop.
eval;
This evaluates the line that was read, since not specifying what to evaluate looks at $_.
warn $@ if $@;
This line prints the error message that eval left in $@, if there is one.
Perl's eval() builtin can take either a BLOCK or an EXPR. If it is given an EXPR that EXPR will be evaluated as a string which contains Perl code to be executed.
For example:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
eval { say "Hello, Block World!"; };
eval 'say "Hello, String World!";';
This code executes both say()s as you would expect.
$ ./evals.pl
Hello, Block World!
Hello, String World!
In general the string version of eval() is considered dangerous, especially if you interpolate into that string variables that come from outside your control. For example:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
my $name = $ARGV[0] // 'World';
eval "say 'Hello, $name';";
This code is safe if called as so:
$ ./evals.pl Kaoru
Hello, Kaoru
$ ./evals.pl
Hello, World
But would be very dangerous if the user called it as:
$ ./evals.pl "Kaoru'; system 'rm -rf /"
On the other hand, in string eval()'s favour, it can be very useful as the opposite of Data::Dumper::Dumper() for turning dumped Perl code back into Perl-internal data structures. For example:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $hashref = { a => 1, b => 2, c => 3 };
print Dumper $hashref;
my $VAR1;
my $hashref_copy = eval Dumper $hashref;
say $hashref_copy->{b};
Which, as you would expect, outputs:
$ ./evals.pl
$VAR1 = {
'c' => 3,
'b' => 2,
'a' => 1
};
2
See perldoc -f eval or http://perldoc.perl.org/functions/eval.html for more details.
As of Perl 5.16, there is also evalbytes(), which treats the string as a byte string rather than a character string. See perldoc perlunicode or http://perldoc.perl.org/perlunicode.html for more details on the difference between character strings and byte strings.
The code which you asked about explicitly:
while(<>) {
eval;
warn $@ if $@;
}
Is reading in each line of either STDIN or the files specified in @ARGV, and evaluating each line of input as a line of Perl code. If that Perl code fails to compile, or throws an exception via die(), the error is warned to STDERR. perldoc -f eval has the full details of how and why eval() might set $@.
As an example of the code being called:
$ echo 'print "foo\\n";' | ./evals.pl
foo
$ echo 'print 1 + 1, "\\n";' | ./evals.pl
2
$ echo 'dfsdfsdaf' | ./evals.pl
Bareword "dfsdfsdaf" not allowed while "strict subs" in use at (eval 1) line 1, <> line 1.
$ echo 'die "dead";' | ./evals.pl
dead at (eval 1) line 1, <> line 1.

perl command line equivalent of php -E

php -R '$count++' -E 'print "$count\n";' < somefile
will print the number of lines in 'somefile' (not that I would actually do this).
I'm looking to emulate the -E switch in a perl command.
perl -ne '$count++' -???? 'print "$count\n"' somefile
Is it possible?
TIMTOWTDI
You can use the Eskimo Kiss operator:
perl -nwE '}{ say $.' somefile
This operator is less magical than one thinks, as seen if we deparse the one-liner:
$ perl -MO=Deparse -nwE '}{say $.' somefile
BEGIN { $^W = 1; }
BEGIN {
$^H{'feature_unicode'} = q(1);
$^H{'feature_say'} = q(1);
$^H{'feature_state'} = q(1);
$^H{'feature_switch'} = q(1);
}
LINE: while (defined($_ = <ARGV>)) {
();
}
{
say $.;
}
-e syntax OK
It simply tacks on an extra set of curly braces, making the following code wind up outside the implicit while loop.
Or you can check for end of file.
perl -nwE 'eof and say $.' somefile
With multiple files, you get a cumulative sum printed for each of them.
perl -nwE 'eof and say $.' somefile somefile somefile
10
20
30
You can close the file handle to get a non-cumulative count:
perl -nwE 'if (eof) { say $.; close ARGV }' somefile somefile somefile
10
10
10
You can use an END { ... } block to add code that should be executed after the loop:
perl -ne '$count++; END { print "$count\n"; }' somefile
You can also easily put it in its own -e argument, if you want it more separated:
perl -ne '$count++;' -e 'END { print "$count\n"; }' somefile
See also:
perlmod - BEGIN, UNITCHECK, CHECK, INIT and END at perldoc.perl.org.
This should be what you're looking for:
perl -nle 'END { print $. }' notes.txt

variable for field separator in perl

In awk I can write: awk -F: 'BEGIN {OFS = FS} ...'
In Perl, what's the equivalent of FS? I'd like to write
perl -F: -lane 'BEGIN {$, = [what?]} ...'
update with an example:
echo a:b:c:d | awk -F: 'BEGIN {OFS = FS} {$2 = 42; print}'
echo a:b:c:d | perl -F: -ane 'BEGIN {$, = ":"} $F[1] = 42; print @F'
Both output a:42:c:d
I would prefer not to hard-code the : in the Perl BEGIN block, but refer to wherever the -F option saves its argument.
To sum up, what I'm looking for does not exist:
there's no variable that holds the argument for -F, and more importantly
Perl's "FS" is fundamentally a different data type (regular expression) than the "OFS" (string) -- it does not make sense to join a list of strings using a regex.
Note that the same holds true in awk: FS is a string but acts as regex:
echo a:b,c:d | awk -F'[:,]' 'BEGIN {OFS=FS} {$2=42; print}'
outputs "a[:,]42[:,]c[:,]d"
Thanks for the insight and workarounds though.
You can use perl's -s (similar to awk's -v) to pass a "FS" variable, but the split becomes manual:
echo a:b:c:d | perl -sne '
BEGIN {$, = $FS}
@F = split $FS;
$F[1] = 42;
print @F;
' -- -FS=":"
If you know the exact length of input, you could do this:
echo a:b:c:d | perl -F'(:)' -ane '$, = $F[1]; @F = @F[0,2,4,6]; $F[1] = 42; print @F'
If the input is of variable length, you'll need something more sophisticated than @F[0,2,4,6].
EDIT: -F seems to simply provide input to an automatic split() call, which takes a complete RE as an expression. You may be able to find something more suitable by reading the perldoc entries for split, perlre, and perlvar.
You can sort of cheat it, because perl is actually using the split function with your -F argument, and you can tell split to preserve what it splits on by including capturing parens in the regex:
$ echo a:b:c:d | perl -F'(:)' -ane 'print join("/", @F);'
a/:/b/:/c/:/d
You can see what perl's doing with some of these "magic" command-line arguments by using -MO=Deparse, like this:
$ perl -MO=Deparse -F'(:)' -ane 'print join("/", @F);'
LINE: while (defined($_ = <ARGV>)) {
our(@F) = split(/(:)/, $_, 0);
print join('/', @F);
}
-e syntax OK
You'd have to change your @F subscripts to double what they'd normally be ($F[2] = 42).
Darnit...
The best I can do is:
echo a:b:c:d | perl -ne '$v=":";@F = split("$v"); $F[1] = 42; print join("$v", @F) . "\n";'
You don't need the -F: this way, and you're only stating the colon once. I was hoping there was some way of setting variables on the command line like you can with Awk's -v switch.
For one liners, Perl is usually not as clean as Awk, but I remember using Awk before I knew of Perl and writing 1000+ line Awk scripts.
Trying things like this made people think Awk was either named after the sound someone made when they tried to decipher such a script, or stood for AWKward.
There is no input field separator variable in Perl. You're basically emulating awk by using the -a and -F flags. If you really don't want to hard-code the value, then why not just use an environment variable?
$ export SPLIT=":"
$ perl -F$SPLIT -lane 'BEGIN { $, = $ENV{SPLIT}; } ...'

How to put 'perl -pne' functionality in a perl script

So at the command line I can conveniently do something like this:
perl -pne 's/from/to/' in > out
And if I need to repeat this and/or I have several other perl -pne transformations, I can put them in, say, a .bat file in Windows. That's a rather roundabout way of doing it, of course. I should just write one perl script that has all those regex transformations.
So how do you write it? If I have a shell script containing these lines:
perl -pne 's/from1/to1/' in > temp
perl -pne 's/from2/to2/' -i temp
perl -pne 's/from3/to3/' -i temp
perl -pne 's/from4/to4/' -i temp
perl -pne 's/from5/to5/' temp > out
How can I just put these all into one perl script?
-e accepts an arbitrarily complex program, so just join your substitution operations.
perl -pe 's/from1/to1/; s/from2/to2/; s/from3/to3/; s/from4/to4/; s/from5/to5/' in > out
If you really want a Perl program that handles input and looping explicitly, deparse the one-liner to see the generated code and work from there.
> perl -MO=Deparse -pe 's/from1/to1/; s/from2/to2/; s/from3/to3/; s/from4/to4/; s/from5/to5/'
LINE: while (defined($_ = <ARGV>)) {
s/from1/to1/;
s/from2/to2/;
s/from3/to3/;
s/from4/to4/;
s/from5/to5/;
}
continue {
print $_;
}
-e syntax OK
Related answer to the question you didn't quite ask: the perl special variable $^I, used together with @ARGV, gives the in-place editing behavior of -i. As with the -p option, Deparse will show the generated code:
perl -MO=Deparse -pi.bak -le 's/foo/bar/'
BEGIN { $^I = ".bak"; }
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
s/foo/bar/;
}
continue {
print $_;
}