Perl eating line one on -n command-line option flag

I've started playing around with perl and I'm trying to figure out what is wrong with telling perl to use a loop if I also provide a loop?
It looks like perl is getting confused with the same open file descriptors but what I don't get is why does it eat the first line?
perl -ne 'while (<>) { print $_; }'
Of course, in this simple example I can just run perl -ne '{print $_}' to get the same result.
But what I want to know is what goes wrong with the double loop: why does the first line disappear when another while (<>) { } is wrapped around the code?
$ perl -ne '{print $_}'
hello
hello
world
world
^C
$ perl -ne 'while (<>) { print $_; }'
hello
world
world
^C
Update: According to the answers, Perl first waits in the outer loop (the one added by -n) for input on STDIN. When a line arrives, it is assigned to the default variable $_ and control passes to the inner loop, which again waits for input on STDIN. When the next line arrives, it overwrites the previous contents of $_, and only then does printing begin.

You can check for yourself the code generated by a one-liner using -MO=Deparse.
First:
$ perl -MO=Deparse -ne 'print $_;' file
LINE: while (defined($_ = <ARGV>)) {
print $_;
}
-e syntax OK
Second:
$ perl -MO=Deparse -ne 'while (<>) { print $_; }' file
LINE: while (defined($_ = <ARGV>)) {
while (defined($_ = <ARGV>)) {
print $_;
}
}
-e syntax OK
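Now it is easy to see what is wrong in the second case: the outer while reads the first line of the file into $_, the inner while immediately overwrites it with the next line, and so the first line is never printed.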

The -n flag wraps your code inside a while (<>) { ... } construct.
So in your second example, the code that is actually executed is
while (<>) # reads a line from STDIN, places it in $_
{
# you don't do anything with the contents of $_ here
while (<>) # reads a line from STDIN, places it in $_, overwriting the previous value
{
print $_; # prints the contents of $_
}
}
Which means the line that was read by the first <> is just lost.

Related

What does exactly perl -pi -e do?

I would like to know what is the equivalent code that Perl runs when executed with the options perl -pi -e?
On some SO question I can read this:
while (<>) {
... # your script goes here
} continue {
print;
}
But this example does not show the part where the file is saved.
How does Perl determine the EOL? Does it touch the file when no changes occurred? For example, if I have an old Mac file (\r only), how does it deal with s/^foo/bar/gm?
I tried to use the Perl debugger but it doesn't really help. So I am just trying to guess:
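#!/usr/bin/env perl
use strict;
use warnings;

# perl -pi -e PATTERN <files>...
my $pattern = shift;
foreach my $file (@ARGV) { process($file); }

sub process {
    my ($file) = @_;
    return unless -f $file;
    open my $fh, '<', $file or die "$file: $!";
    # Guess the line ending from the first 1024 bytes
    my $extract = '';
    read $fh, $extract, 1024;
    seek $fh, 0, 0;
    local $/ = "\n";
    if    ($extract =~ /\r\n/)    { $/ = "\r\n" }
    elsif ($extract =~ /\r[^\n]/) { $/ = "\r" }
    my ($out, $changes) = ('', 0);
    while (<$fh>) {
        my $before = $_;
        eval $pattern;
        $changes = 1 if $_ ne $before;
        $out .= $_;
    }
    close $fh;
    # Only rewrite the file if something changed
    return unless $changes;
    open $fh, '>', $file or die "$file: $!";
    print $fh $out;
    close $fh;
}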
You can inspect the code actually used by Perl with the core module B::Deparse. This compiler backend module is activated with the option -MO=Deparse.
$ perl -MO=Deparse -p -i -e 's/X/U/' ./*.txt
BEGIN { $^I = ""; }
LINE: while (defined($_ = <ARGV>)) {
s/X/U/;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK
Thus perl loops over the lines of the given files, executes the code with $_ set to each line, and prints the resulting $_.
The magic variable $^I is set to an empty string. This turns on in-place editing without a backup file. In-place editing is explained in perldoc perlrun. There is no check whether the file is unchanged, so the modification time of the edited file is always updated. Apparently the modification time of the backup file is the same as that of the original file.
Using the -0 flag you can set the input record separator, for example to "\r" (octal 015) for your old Mac files.
$ perl -e "print qq{aa\raa\raa}" > t.txt
$ perl -015 -p -i.ori -e 's/a/b/' t.txt
$ cat t.txt
ba
$ perl -MO=Deparse -015 -p -i.ori -e 's/a/b/' t.txt
BEGIN { $^I = ".ori"; }
BEGIN { $/ = "\r"; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
s/a/b/;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK
From the perlrun documentation:
-p assumes an input loop around your script. Lines are printed.
-i files processed by the <> construct are to be edited in place.
-e may be used to enter a single line of script. Multiple -e commands may be given to build up a multiline script.

Why is '$_' the same as $ARGV in a Perl one-liner?

I ran into this problem while trying to print single quotes in a Perl one-liner. I eventually figured out you have to escape them with '\''. Here's some code to illustrate my question.
Let's start with printing a text file.
perl -ne 'chomp; print "$_\n"' shortlist.txt
red
orange
yellow
green
blue
Now let's print the name of the file instead for each line.
perl -ne 'chomp; print "$ARGV\n"' shortlist.txt
shortlist.txt
shortlist.txt
shortlist.txt
shortlist.txt
shortlist.txt
Then we can add single quotes around each line.
perl -ne 'chomp; print "'$_'\n"' shortlist.txt
shortlist.txt
shortlist.txt
shortlist.txt
shortlist.txt
shortlist.txt
Wait that didn't work. Let's try again.
perl -ne 'chomp; print "'\''$_'\''\n"' shortlist.txt
'red'
'orange'
'yellow'
'green'
'blue'
So I got it working now. But I'm still confused on why '$_' evaluates to the program name. Maybe this is something easy but can someone explain or link to some documentation?
edit: I'm running Perl 5.8.8 on Red Hat 5
To your shell, 'chomp; print "'$_'\n"' results in a string that's the concatenation of
chomp; print " (the first sequence inside single quotes),
the value of its variable $_, and
\n" (the second sequence inside single quotes).
In bash, $_ "... expands to the last argument to the previous command, after expansion. ...". Since this happens to be shortlist.txt, the following is passed to perl:
chomp; print "shortlist.txt\n"
For example,
$ echo foo
foo
$ echo 'chomp; print "'$_'\n"'
chomp; print "foo\n"
Note that the above mechanism shouldn't be used to pass values to a Perl one-liner. You shouldn't be generating Perl code from the shell. See How can I process options using Perl in -n or -p mode? for how to provide arguments to a one-liner.
You use single quotes in one-liners to protect your Perl code from being evaluated by the shell. In this command:
perl -ne 'chomp; print "'$_'\n"' shortlist.txt
you close the single quotes before $_, so the shell expands $_ to the last argument to the previous command. In your case, this happened to be the name of your input file, but the output would be different if you ran a different command first:
$ echo foo
$ perl -ne 'chomp; print "'$_'\n"' shortlist.txt
foo
foo
foo
foo
foo
I try to avoid quotes in one liners for just this reason. I use generalized quoting when I can:
% perl -ne 'chomp; print qq($_\n)'
Although I can avoid even that with the -l switch, which chomps each input line and adds the newline back on output for free:
% perl -nle 'print $_'
If I don't understand a one-liner, I use -MO=Deparse to see what Perl thinks it is. The first two are what you expect:
% perl -MO=Deparse -ne 'chomp; print "$_\n"' shortlist.txt
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
print "$_\n";
}
-e syntax OK
% perl -MO=Deparse -ne 'chomp; print "$ARGV\n"' shortlist.txt
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
print "$ARGV\n";
}
-e syntax OK
You see something funny in the one where you saw the problem. The variable has disappeared before perl ever saw it and there's a constant string in its place:
% perl -MO=Deparse -ne 'chomp; print "'$_'\n"' shortlist.txt
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
print "shortlist.txt\n";
}
-e syntax OK
Your fix is curious too because Deparse puts the variable name in braces to separate it from the old package specifier ':
% perl -MO=Deparse -ne 'chomp; print "'\''$_'\''\n"' shortlist.txt
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
print "'${_}'\n";
}
-e syntax OK

How to add blank line after every grep result using Perl?

How to add a blank line after every grep result?
For example, grep -o "xyz" may give something like -
file1:xyz
file2:xyz
file2:xyz2
file3:xyz
I want the output to be like this -
file1:xyz

file2:xyz
file2:xyz2

file3:xyz
I would like to do something like
grep "xyz" | perl (code to add a new line after every grep result)
This is the direct answer to your question:
grep 'xyz' | perl -pe 's/$/\n/'
But this is better:
perl -ne 'print "$_\n" if /xyz/'
EDIT
Ok, after your edit, you want (almost) this:
grep 'xyz' * | perl -pe 'print "\n" if /^([^:]+):/ && ! $seen{$1}++'
If you don’t like the blank line at the beginning, make it:
grep 'xyz' * | perl -pe 'print "\n" if /^([^:]+):/ && ! $seen{$1}++ && $. > 1'
NOTE: This won’t work right on filenames with colons in them. :)
If you want to use perl, you could do something like
grep "xyz" | perl -p -e 's/(.*)/$1\n/'
If you want to use sed (where I seem to have gotten better results), you could do something like
grep "xyz" | sed 's/.*/\0\n/g'
This prints a newline after every single line of grep output:
grep "xyz" | perl -pe 'print "\n"'
This prints a newline in between results from different files. (Answering the question as I read it.)
grep 'xyz' * | perl -pe '/(.*?):/; if ($f ne $1) {print "\n"; $f=$1}'
Use a state machine to determine when to print a blank line:
#!/usr/bin/env perl
use strict;
use warnings;
# state variable to determine when to print a blank line
my $prev_file = '';
# change DATA to the appropriate input file handle
while( my $line = <DATA> ){
# did the state change?
if( my ( $file ) = $line =~ m{ \A ([^:]*) \: .*? xyz }msx ){
# blank lines between states
print "\n" if $file ne $prev_file && length $prev_file;
# set the new state
$prev_file = $file;
}
# print every line
print $line;
}
__DATA__
file1:xyz
file2:xyz
file2:xyz2
file3:xyz

How to mimic -l inside script

Is there a simple way to mimic the effect of the -l command-line switch within perl scripts? (Of course, I can always chomp each line and then append "\n" to each line I print, but the point is to avoid having to do this.)
No. You can get the automatic appending of "\n" by using $\, but you have to add the chomp yourself.
Here's how -l works.
$ perl -MO=Deparse -ne 'print $_'
LINE: while (defined($_ = <ARGV>)) {
print $_;
}
$ perl -MO=Deparse -lne 'print $_'
BEGIN { $/ = "\n"; $\ = "\n"; } # -l added this line
LINE: while (defined($_ = <ARGV>)) {
chomp $_; # -l added this line
print $_;
}
(The comments are mine.) Notice that -l added a literal chomp $_ at the beginning of the loop generated by -n (and it only does that if you use -n or -p). There's no variable you can set to mimic that behaviour.
It's a little-known fact that -l, -n, and -p work by wrapping boilerplate text around the code you supply before it's compiled.
Yes, try using this at the beginning of your script after the shebang and strictures:
$/ = $\ = "\n"; # set the input and output record separators ($\ is like ORS in awk)
and use this in the loop:
chomp;
print;
Or like this:
use strict; use warnings;
use English qw/-no_match_vars/;
$OUTPUT_RECORD_SEPARATOR = "\n";
while (<>) {
chomp;
print;
}
For better clarity, I do not recommend using the following shebang line =)
#!/usr/bin/perl -l
See perldoc perlvar
You can add it to your shebang line:
#!/usr/bin/perl -l

Is __LINE__ constant-folded in this Perl one-liner?

In exploring an alternative answer to sarathi's current file line number question, I wrote this one-liner with the expectation that it would print the first line of all files provided:
$ perl -ne 'print "$ARGV : $_" if __LINE__ == 1;' *txt
This did not work as expected; all lines were printed.
Running the one-liner through -MO=Deparse shows that the conditional is not present. I assume this is because it has been constant-folded at compile time:
$ perl -MO=Deparse -ne 'print "$ARGV : $_" if __LINE__ == 1;' *txt
LINE: while (defined($_ = <ARGV>)) {
print "$ARGV : $_";
}
-e syntax OK
But why?
Run under Perl 5.8.8.
__LINE__ corresponds to the line number in the Perl source, not in the input file.
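__LINE__ is the source line number, i.e. the program line number. A one-liner's code is all on source line 1, so __LINE__ == 1 is a compile-time constant that is always true, and the condition is folded away.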
$. will give you the input file line number.
If you want to print the first line of each file, you can try this (close ARGV if eof resets $. at the end of each file):
perl -lne 'print $_ if $. == 1; close ARGV if eof' *.txt