Reading stdin in perl requires line feeds around input. How to avoid? - perl

MSG_OUT="<B><I>Skipping<N> all libraries and fonts...<N>"
perl -ne '%ES=("B","[1m","I","[3m","N","[m","O","[9m","R","[7m","U","[4m"); while (<>) { s/(<([BINORSU])>)/\e$ES{$2}/g; print; }'
This perl one-liner swaps a token for an escape sequence.
It works as intended but only if the input is surrounded with line feeds.
i.e.
echo "\x0a${MSG_OUT}\x0a" | perl -ne '.... etc.
How do I avoid this issue when reading from stdin?

-n wraps your code in while (<>) { ... }* (cf perldoc perlrun). Thus, your one-liner is equivalent to:
perl -e '
while(<>) {
%ES=("B","[1m","I","[3m","N","[m","O","[9m","R","[7m","U","[4m");
while (<>) { s/(<([BINORSU])>)/\e$ES{$2}/g; print; }
}
'
[Line breaks added for readability. They can be removed if you so desire.]
See the double while (<>) { ... }? That's your issue: the first while (the one added by -n) reads a line, then the second while (the one you wrote) reads a second line, does your s/// (on the second line), and prints this second line updated. Thus, you need a blank line before the actual line you want to process.
To fix the issue, either remove the inner while(<>), or remove the -n flag. For instance:
perl -e '
%ES=("B","[1m","I","[3m","N","[m","O","[9m","R","[7m","U","[4m");
while (<>) { s/(<([BINORSU])>)/\e$ES{$2}/g; print; }
'
Or,
perl -ne '
BEGIN { %ES=("B","[1m","I","[3m","N","[m","O","[9m","R","[7m","U","[4m") };
s/(<([BINORSU])>)/\e$ES{$2}/g; print;
'
Note that instead of using -n and print, you can use -p, which is the same as -n with an extra print** at the end:
perl -pe '
BEGIN { %ES=("B","[1m","I","[3m","N","[m","O","[9m","R","[7m","U","[4m") };
s/(<([BINORSU])>)/\e$ES{$2}/g;
'
* For completness, note that -n adds the label LINE before the while loop (LINE: while(<>) { ... }), although that doesn't matter in your case.
** The print added by -p is actually in a continue block after the while, although, once again, this doesn't matter in your case.

Related

why does perl while (<>) fail to count or print the first line

I want to count the lines in a file and print a string which depends on the line number. But my while loop misses the first line. I believe the while (<>) construct is necessary to increment the $n variable; anyway, is not this construct pretty standard in perl?
How do I get the while loop to print the first line? Or should I not be using while?
> printf '%s\n%s\n' dog cat
dog
cat
> printf '%s\n%s\n' dog cat | perl -n -e 'use strict; use warnings; print; '
dog
cat
> printf '%s\n%s\n' dog cat | perl -n -e 'use strict; use warnings; while (<>) { print; } '
cat
>
> printf '%s\n%s\n' dog cat | perl -n -e 'use strict; use warnings; my $n=0; while (<>) { $n++; print "$n:"; print; } '
1:cat
The man perlrun shows:
-n causes Perl to assume the following loop around your program, which makes it iterate over filename
arguments somewhat like sed -n or awk:
LINE:
while (<>) {
... # your program goes here
}
Note that the lines are not printed by default. See "-p" to have lines printed. If a file named by an
argument cannot be opened for some reason, Perl warns you about it and moves on to the next file.
Also note that "<>" passes command line arguments to "open" in perlfunc, which doesn't necessarily
interpret them as file names. See perlop for possible security implications.
...
...
"BEGIN" and "END" blocks may be used to capture control before or after the implicit program loop, just as in awk.
So, in fact you running this script
LINE:
while (<>) {
# your progrem start
use strict;
use warnings;
my $n=0;
while (<>) {
$n++;
print "$n:";
print;
}
# end
}
Solution, just remove the -n.
printf '%s\n%s\n' dog cat | perl -e 'use strict; use warnings; my $n=0; while (<>) { $n++; print "$n:"; print; }'
Will print:
1:dog
2:cat
or
printf '%s\n%s\n' dog cat | perl -ne 'print ++$n, ":$_"'
with the same result
or
printf '%s\n%s\n' dog cat | perl -pe '++$n;s/^/$n:/'
but the ikegami's solution
printf "one\ntwo\n" | perl -ne 'print "$.:$_"'
is the BEST
There's a way to figure out what your one-liner is actually doing. The B::Deparse module has a way to show you how perl interpreted your source code. It's actually from the O (capital letter O, not zero) namespace that you can load with -M (ikegami explains this on Perlmonks):
$ perl -MO=Deparse -ne 'while(<>){print}' foo bar
LINE: while (defined($_ = readline ARGV)) {
while (defined($_ = readline ARGV)) {
print $_;
}
-e syntax OK
Heh, googling for the module link shows I wrote about this for The Effective Perler. Same example. I guess I'm not that original.
If you can't change the command line, perhaps because it's in the middle of a big script or something, you can set options in PERL5OPT. Then those options last for just the session. I hate changing the original scripts because it seems that no matter how careful I am, I mess up something (how many times has my brain told me "hey dummy, you know what a git branch is, so you should have used that first"):
$ export PERL5OPT='-MO=Deparse'

Print each line of a file

I have a file test.txt that reads as follows:
one
two
three
Now, I want to print each line of this file as follows:
.one (one)
.two (two)
.three (three)
I try this in Perl:
#ARGV = ("test.txt");
while (<>) {
print (".$_ \($_\)");
}
This doesn't seem to work and this is what I get:
.one
(one
).two
(two
).three
(three
)
Can some help me figure out what's going wrong?
Update :
Thanks to Aureliano Guedes for the suggestion.
This 1-liner seems to work :
perl -pe 's/([^\s]+)/.$1 ($1)/'
$_ will include the newline, e.g. one\n, so print ".$_ \($_\)" becomes something like print ".one\n (one\n).
Use chomp to get rid of them, or use s/\s+\z// to remove all trailing whitespace.
while (<>) {
chomp;
print ".$_ ($_)\n";
}
(But add a \n to print the newline that you do want.)
Besides the correct answer already given, you can do this in a oneliner:
perl -pe 's/(.+)/.$1 ($1)/'
Or if you prefer a while loop:
while (<>) {
s/(.+)/.$1 ($1)/;
print;
}
This simply modifies your current line to your desired output and prints it then.
Another Perl one-liner without using regex.
perl -ple ' $_=".$_ ($_)" '
with the given inputs
$ cat test.txt
one
two
three
$ perl -ple ' $_=".$_ ($_)" ' test.txt
.one (one)
.two (two)
.three (three)
$

Get value of autosplit delimiter?

If I run a script with perl -Fsomething, is that something value saved anywhere in the Perl environment where the script can find it? I'd like to write a script that by default reuses the input delimiter (if it's a string and not a regular expression) as the output delimiter.
Looking at the source, I don't think the delimiter is saved anywhere. When you run
perl -F, -an
the lexer actually generates the code
LINE: while (<>) {our #F=split(q\0,\0);
and parses it. At this point, any information about the delimiter is lost.
Your best option is to split by hand:
perl -ne'BEGIN { $F="," } #F=split(/$F/); print join($F, #F)' foo.csv
or to pass the delimiter as an argument to your script:
F=,; perl -F$F -sane'print join($F, #F)' -- -F=$F foo.csv
or to pass the delimiter as an environment variable:
export F=,; perl -F$F -ane'print join($ENV{F}, #F)' foo.csv
As #ThisSuitIsBlackNot says it looks like the delimiter is not saved anywhere.
This is how the perl.c stores the -F parameter
case 'F':
PL_minus_a = TRUE;
PL_minus_F = TRUE;
PL_minus_n = TRUE;
PL_splitstr = ++s;
while (*s && !isSPACE(*s)) ++s;
PL_splitstr = savepvn(PL_splitstr, s - PL_splitstr);
return s;
And then the lexer generates the code
LINE: while (<>) {our #F=split(q\0,\0);
However this is of course compiled, and if you run it with B::Deparse you can see what is stored.
$ perl -MO=Deparse -F/e/ -e ''
LINE: while (defined($_ = <ARGV>)) {
our(#F) = split(/e/, $_, 0);
}
-e syntax OK
Being perl there is always a way, however ugly. (And this is some of the ugliest code I have written in a while):
use B::Deparse;
use Capture::Tiny qw/capture_stdout/;
BEGIN {
my $f_var;
}
unless ($f_var) {
$stdout = capture_stdout {
my $sub = B::Deparse::compile();
&{$sub}; # Have to capture stdout, since I won't bother to setup compile to return the text, instead of printing
};
my (undef, $split_line, undef) = split(/\n/, $stdout, 3);
($f_var) = $split_line =~ /our\(\#F\) = split\((.*)\, \$\_\, 0\);/;
print $f_var,"\n";
}
Output:
$ perl -Fe/\\\(\\[\\\<\\{\"e testy.pl
m#e/\(\[\<\{"e#
You could possible traverse the bytecode instead, since the start probably will be identical every time until you reach the pattern.

Why does the following oneliner skip the first line?

Here is the example:
$cat test.tsv
AAAATTTTCCCCGGGG foo
GGGGCCCCTTTTAAAA bar
$perl -wne 'while(<STDIN>){ print $_;}' <test.tsv
GGGGCCCCTTTTAAAA bar
This should work like cat and not like tail -n +2. What is happening here? And what the correct way?
The use of the -n option creates this (taking from man perlrun):
while (<STDIN>) {
while(<STDIN>){ print $_;} #< your code
}
This shows two while(<STDIN>) instances. They both take all available inputs from STDIN, breaking at newlines.
When you run with a test.tsv which is at least two lines long, the first (outer) use of while(<STDIN>) takes the first line, and the second (inner) one takes the second line - so your print statement is first passed the second line.
If you had more than two lines in test.tsv then the inner loop would print out all lines from the second line onwards.
The correct way to make this work is simply to rely on the -n option you pass to perl:
perl -wne 'print $_;' < test.tsv
Because the -n switch implicitly puts your code inside a loop which goes through the file line by line. Remove the 'n' from the list of switches, or (even better) remove your loop from the code, leave only the print command there.
nbokor#nbokor:~/tmp$ perl -wne 'print $_;' <test.csv
AAAATTTTCCCCGGGG foo
GGGGCCCCTTTTAAAA bar
Remove -n command line option. It duplicates while(<STDIN>){ ... }.
$perl -MO=Deparse -wne 'while(<STDIN>){ print $_;}'
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
while (defined($_ = <STDIN>)) {
print $_;
}
}
-e syntax OK

How to compress 4 consecutive blank lines into one single line in Perl

I'm writing a Perl script to read a log so that to re-write the file into a new log by removing empty lines in case of seeing any consecutive blank lines of 4 or more. In other words, I'll have to compress any 4 consecutive blank lines (or more lines) into one single line; but any case of 1, 2 or 3 lines in the file will have to remain the format. I have tried to get the solution online but the only I can find is
perl -00 -pe ''
or
perl -00pe0
Also, I see the example in vim like this to delete blocks of 4 empty lines :%s/^\n\{4}// which match what I'm looking for but it was in vim not Perl. Can anyone help in this? Thanks.
To collapse 4+ consecutive Unix-style EOLs to a single newline:
$ perl -0777 -pi.bak -e 's|\n{4,}|\n|g' file.txt
An alternative flavor using look-behind:
$ perl -0777 -pi.bak -e 's|(?<=\n)\n{3,}||g' file.txt
use strict;
use warnings;
my $cnt = 0;
sub flush_ws {
$cnt = 1 if ($cnt >= 4);
while ($cnt > 0) {print "\n"; $cnt--; }
}
while (<>) {
if (/^$/) {
$cnt++;
} else {
flush_ws();
print $_;
}
}
flush_ws();
Your -0 hint is a good one since you can use -0777 to slurp the whole file in -p mode. Read more about these guys in perlrun So this oneliner should do the trick:
$ perl -0777 -pe 's/\n{5,}/\n\n/g'
If there are up to four new lines in a row, nothing happens. Five newlines or more (four empty lines or more) are replaced by two newlines (one empty line). Note the /g switch here to replace not only the first match.
Deparsed code:
BEGIN { $/ = undef; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
s/\n{5,}/\n\n/g;
}
continue {
die "-p destination: $!\n" unless print $_;
}
HTH! :)
One way using GNU awk, setting the record separator to NUL:
awk 'BEGIN { RS="\0" } { gsub(/\n{5,}/,"\n")}1' file.txt
This assumes that you're definition of empty excludes whitespace
This will do what you need
perl -ne 'if (/\S/) {$n = 1 if $n >= 4; print "\n" x $n, $_; $n = 0} else {$n++}' myfile