How to put 'perl -pne' functionality in a perl script - perl

So at the command line I can conveniently do something like this:
perl -pne 's/from/to/' in > out
And if I need to repeat this and/or I have several other perl -pne transformations, I can put them in, say, a .bat file in Windows. That's a rather roundabout way of doing it, of course. I should just write one perl script that has all those regex transformations.
So how do you write it? If I have a shell script containing these lines:
perl -pne 's/from1/to1/' in > temp
perl -pne 's/from2/to2/' -i temp
perl -pne 's/from3/to3/' -i temp
perl -pne 's/from4/to4/' -i temp
perl -pne 's/from5/to5/' temp > out
How can I just put these all into one perl script?

-e accepts an arbitrarily complex program, so just join your substitution operations:
perl -pe 's/from1/to1/; s/from2/to2/; s/from3/to3/; s/from4/to4/; s/from5/to5/' in > out
If you really want a Perl program that handles input and looping explicitly, deparse the one-liner to see the generated code and work from there.
> perl -MO=Deparse -pe 's/from1/to1/; s/from2/to2/; s/from3/to3/; s/from4/to4/; s/from5/to5/'
LINE: while (defined($_ = <ARGV>)) {
s/from1/to1/;
s/from2/to2/;
s/from3/to3/;
s/from4/to4/;
s/from5/to5/;
}
continue {
print $_;
}
-e syntax OK
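As a rough sketch of where that deparsed loop might lead, the script below applies the same five substitutions with an explicit read loop. The file name transform.pl and the helper name transform are made up for illustration; the from/to patterns are just the placeholders from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply each substitution in order to one line and return the result.
# (transform is a hypothetical helper name, not something Perl provides.)
sub transform {
    my ($line) = @_;
    $line =~ s/from1/to1/;
    $line =~ s/from2/to2/;
    $line =~ s/from3/to3/;
    $line =~ s/from4/to4/;
    $line =~ s/from5/to5/;
    return $line;
}

# Explicit version of the loop that -p generates: read every line from the
# files named on the command line (or STDIN if there are none) and print it.
while (my $line = <>) {
    print transform($line);
}
```

Saved as transform.pl, it would be run as: perl transform.pl in > out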

Related answer to the question you didn't quite ask: the Perl special variable $^I, used together with @ARGV, gives the in-place editing behavior of -i. As with the -p option, Deparse will show the generated code:
perl -MO=Deparse -pi.bak -le 's/foo/bar/'
BEGIN { $^I = ".bak"; }
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
s/foo/bar/;
}
continue {
print $_;
}
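Inside a script, the same effect can be had by setting $^I and @ARGV yourself before reading from <>. This is only a sketch: edit_in_place is a hypothetical helper name, and the s/foo/bar/ substitution is the placeholder from the one-liner above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rewrite the given files in place, keeping a backup copy with $suffix
# appended to the name. (edit_in_place is a hypothetical helper name.)
sub edit_in_place {
    my ($suffix, @files) = @_;
    return unless @files;        # with no files, <> would read STDIN instead

    local $^I   = $suffix;       # what -i.bak sets when $suffix is ".bak"
    local @ARGV = @files;        # <> works through the names in @ARGV

    while (my $line = <>) {
        $line =~ s/foo/bar/;
        print $line;             # print goes to the file being rewritten
    }
    return;
}

edit_in_place('.bak', @ARGV);
```

Using local keeps the in-place behavior confined to the helper, so later reads and prints in the same script are not affected.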

Related

why does perl while (<>) fail to count or print the first line

I want to count the lines in a file and print a string which depends on the line number. But my while loop misses the first line. I believe the while (<>) construct is necessary to increment the $n variable; anyway, is not this construct pretty standard in perl?
How do I get the while loop to print the first line? Or should I not be using while?
> printf '%s\n%s\n' dog cat
dog
cat
> printf '%s\n%s\n' dog cat | perl -n -e 'use strict; use warnings; print; '
dog
cat
> printf '%s\n%s\n' dog cat | perl -n -e 'use strict; use warnings; while (<>) { print; } '
cat
>
> printf '%s\n%s\n' dog cat | perl -n -e 'use strict; use warnings; my $n=0; while (<>) { $n++; print "$n:"; print; } '
1:cat
The man perlrun shows:
-n causes Perl to assume the following loop around your program, which makes it iterate over filename
arguments somewhat like sed -n or awk:
LINE:
while (<>) {
... # your program goes here
}
Note that the lines are not printed by default. See "-p" to have lines printed. If a file named by an
argument cannot be opened for some reason, Perl warns you about it and moves on to the next file.
Also note that "<>" passes command line arguments to "open" in perlfunc, which doesn't necessarily
interpret them as file names. See perlop for possible security implications.
...
...
"BEGIN" and "END" blocks may be used to capture control before or after the implicit program loop, just as in awk.
So in fact you are running this script:
LINE:
while (<>) {
# your program starts here
use strict;
use warnings;
my $n=0;
while (<>) {
$n++;
print "$n:";
print;
}
# end
}
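The lost first line can be reproduced without any command-line switches. The sketch below reads an in-memory string through one filehandle with two nested loops; nested_read is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;

# Read $text through one filehandle with two nested loops and return the
# lines that reach the inner body. (nested_read is a hypothetical name.)
sub nested_read {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @seen;
    while (my $outer = <$fh>) {        # stands in for the loop -n adds
        while (my $inner = <$fh>) {    # stands in for the loop in the one-liner
            push @seen, $inner;
        }
    }
    close $fh;
    return @seen;
}

print nested_read("dog\ncat\n");       # prints "cat"; "dog" went to $outer
```

The outer read consumes "dog" before the inner loop ever runs, which is why only "cat" is seen.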
Solution: just remove the -n.
printf '%s\n%s\n' dog cat | perl -e 'use strict; use warnings; my $n=0; while (<>) { $n++; print "$n:"; print; }'
Will print:
1:dog
2:cat
or
printf '%s\n%s\n' dog cat | perl -ne 'print ++$n, ":$_"'
with the same result
or
printf '%s\n%s\n' dog cat | perl -pe '++$n;s/^/$n:/'
but ikegami's solution
printf "one\ntwo\n" | perl -ne 'print "$.:$_"'
is the best.
There's a way to figure out what your one-liner is actually doing. The B::Deparse module has a way to show you how perl interpreted your source code. It's actually from the O (capital letter O, not zero) namespace that you can load with -M (ikegami explains this on Perlmonks):
$ perl -MO=Deparse -ne 'while(<>){print}' foo bar
LINE: while (defined($_ = readline ARGV)) {
while (defined($_ = readline ARGV)) {
print $_;
}
}
-e syntax OK
Heh, googling for the module link shows I wrote about this for The Effective Perler. Same example. I guess I'm not that original.
If you can't change the command line, perhaps because it's in the middle of a big script or something, you can set options in PERL5OPT. Then those options last for just the session. I hate changing the original scripts because it seems that no matter how careful I am, I mess up something (how many times has my brain told me "hey dummy, you know what a git branch is, so you should have used that first"):
$ export PERL5OPT='-MO=Deparse'

How to compress 4 consecutive blank lines into one single line in Perl

I'm writing a Perl script that reads a log and writes it to a new log, collapsing any run of 4 or more consecutive blank lines into a single blank line. Runs of 1, 2 or 3 blank lines must be left as they are. I have looked for a solution online, but the only thing I can find is
perl -00 -pe ''
or
perl -00pe0
I also found this vim example for deleting blocks of 4 empty lines, :%s/^\n\{4}//, which matches what I'm looking for, but it is for vim, not Perl. Can anyone help with this? Thanks.
To collapse 4+ consecutive Unix-style EOLs to a single newline:
$ perl -0777 -pi.bak -e 's|\n{4,}|\n|g' file.txt
An alternative flavor using look-behind:
$ perl -0777 -pi.bak -e 's|(?<=\n)\n{3,}||g' file.txt
use strict;
use warnings;
my $cnt = 0;
sub flush_ws {
$cnt = 1 if ($cnt >= 4);
while ($cnt > 0) {print "\n"; $cnt--; }
}
while (<>) {
if (/^$/) {
$cnt++;
} else {
flush_ws();
print $_;
}
}
flush_ws();
Your -0 hint is a good one, since you can use -0777 to slurp the whole file in -p mode. Read more about these switches in perlrun. So this one-liner should do the trick:
$ perl -0777 -pe 's/\n{5,}/\n\n/g'
If there are up to four new lines in a row, nothing happens. Five newlines or more (four empty lines or more) are replaced by two newlines (one empty line). Note the /g switch here to replace not only the first match.
Deparsed code:
BEGIN { $/ = undef; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
s/\n{5,}/\n\n/g;
}
continue {
die "-p destination: $!\n" unless print $_;
}
HTH! :)
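For a standalone script, a minimal sketch of the same idea might look like the following. Undefining $/ plays the role of -0777; collapse_blank_lines is a hypothetical helper name, and the regex is the one from the one-liner above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every run of five or more newlines (four or more empty lines)
# with two newlines (one empty line). (Hypothetical helper name.)
sub collapse_blank_lines {
    my ($text) = @_;
    $text =~ s/\n{5,}/\n\n/g;
    return $text;
}

{
    local $/;                            # undef $/ means "read a whole file"
    while (defined(my $text = <>)) {     # one iteration per input file
        print collapse_blank_lines($text);
    }
}
```

It would be run as: perl collapse.pl old.log > new.log (both file names are placeholders).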
One way using GNU awk, setting the record separator to NUL:
awk 'BEGIN { RS="\0" } { gsub(/\n{5,}/,"\n")}1' file.txt
This assumes that your definition of empty excludes whitespace.
This will do what you need
perl -ne 'if (/\S/) {$n = 1 if $n >= 4; print "\n" x $n, $_; $n = 0} else {$n++}' myfile

Execute Unix command in a Perl script

How can I make the following external command within backticks work with variables instead?
Or something similar?
sed -i.bak -e '10,16d;17d' $docname; (this works)
I.e., sed -i.bak -e '$line_number,$line_end_number;$last_line' $docname;
my $result =
qx/sed -i.bak -e "$line_number,${line_end_number}d;${last_line}d" $docname/;
The line is split only to avoid the horizontal scroll bar on SO; otherwise it would be on one line.
Or, since it is not clear that there's any output to capture:
system "sed -i.back '$line_number,${line_end_number}d;${last_line}d' $docname";
Or you could split that up into arguments yourself:
system "sed", "-i.back", "$line_number,${line_end_number}d;${last_line}d", "$docname";
This tends to be safer since the shell doesn't get a chance to interfere with the interpretation of the arguments.
@args = ("command", "arg1", "arg2");
system(@args) == 0 or die "system @args failed: $?";
More in the manual:
perldoc -f system
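As a sketch of how the list form and the status check might fit together for the sed command in the question: run_checked is a hypothetical helper, and the line numbers and file name are placeholder values, not anything from the original post.

```perl
use strict;
use warnings;

# Run a command given as a list and return its exit code. With more than
# one element in the list, no shell is involved.
# (run_checked is a hypothetical helper name.)
sub run_checked {
    my @cmd = @_;
    system(@cmd);
    die "could not run $cmd[0]: $!\n" if $? == -1;
    die "$cmd[0] was killed by signal " . ($? & 127) . "\n" if $? & 127;
    return $? >> 8;
}

# Placeholder values standing in for the question's variables.
my ($line_number, $line_end_number, $last_line) = (10, 16, 17);
my $docname = 'somefile.txt';

my @sed = ('sed', '-i.bak', '-e',
           "$line_number,${line_end_number}d;${last_line}d", $docname);
print "would run: @sed\n";
# my $exit = run_checked(@sed);   # uncomment to actually edit $docname
```

Building the command as a list first also makes it easy to print it for inspection before running anything.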
I think you should read up on using qq for strings.
You probably want something like this:
use strict;
use warnings;
my $line_number = qq|10|;
my $line_end_number = qq|16d|;
my $last_line = qq|17d|;
my $doc_name = qq|somefile.bak|;
my $sed_command = qq|sed -i.bak -e '$line_number,$line_end_number;$last_line' $doc_name;|;
print $sed_command;
qx|$sed_command|;

perl command line equivalent of php -E

php -R '$count++' -E 'print "$count\n";' < somefile
will print the number of lines in 'somefile' (not that I would actually do this).
I'm looking to emulate the -E switch in a perl command.
perl -ne '$count++' -???? 'print "$count\n"' somefile
Is it possible?
TIMTOWTDI
You can use the Eskimo Kiss operator:
perl -nwE '}{ say $.' somefile
This operator is less magical than one thinks, as seen if we deparse the one-liner:
$ perl -MO=Deparse -nwE '}{say $.' somefile
BEGIN { $^W = 1; }
BEGIN {
$^H{'feature_unicode'} = q(1);
$^H{'feature_say'} = q(1);
$^H{'feature_state'} = q(1);
$^H{'feature_switch'} = q(1);
}
LINE: while (defined($_ = <ARGV>)) {
();
}
{
say $.;
}
-e syntax OK
It simply tacks on an extra set of curly braces, making the following code wind up outside the implicit while loop.
Or you can check for end of file.
perl -nwE 'eof and say $.' somefile
With multiple files, you get a cumulative sum printed for each of them.
perl -nwE 'eof and say $.' somefile somefile somefile
10
20
30
You can close the file handle to get a non-cumulative count:
perl -nwE 'if (eof) { say $.; close ARGV }' somefile somefile somefile
10
10
10
You can use an END { ... } block to add code that should be executed after the loop:
perl -ne '$count++; END { print "$count\n"; }' somefile
You can also easily put it in its own -e argument, if you want it more separated:
perl -ne '$count++;' -e 'END { print "$count\n"; }' somefile
See also:
perlmod - BEGIN, UNITCHECK, CHECK, INIT and END at perldoc.perl.org.
This should be what you're looking for:
perl -nle 'END { print $. }' notes.txt
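In a script rather than a one-liner, code placed after the loop simply runs after the loop. The sketch below combines that with the eof / close ARGV idea from above to get a per-file count; line_counts is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a [name, line count] pair for each non-empty file.
# (line_counts is a hypothetical helper name.)
sub line_counts {
    my @files = @_;
    return unless @files;            # with no files, <> would read STDIN
    local @ARGV = @files;
    my @counts;
    while (my $line = <>) {
        next unless eof;             # true on the last line of each file
        push @counts, [$ARGV, $.];   # $ARGV is the name of the current file
        close ARGV;                  # resets $. so counts are per file
    }
    return @counts;
}

# This runs once the loop inside line_counts has finished, much like END.
printf "%s: %d\n", @$_ for line_counts(@ARGV);
```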

perl query using -pie

This works:
perl -pi -e 's/abc/cba/g' hellofile
But this does not:
perl -pie 's/cba/abc/g' hellofile
In other words -pi -e works but -pie does not. Why?
The -i flag takes an optional argument (which, if present, must be immediately after it, not in a separate command-line argument) that specifies the suffix to append to the name of the input file for the purposes of creating a backup. Writing perl -pie 's/cba/abc/g' hellofile causes the e to be taken as this suffix, and as the e isn't interpreted as the normal -e option, Perl tries to run the script located in s/cba/abc/g, which probably doesn't exist.
Because -i takes an optional extension for backup files, e.g. -i.bak, and therefore additional flags cannot follow directly after -i.
From perldoc perlrun
-i[extension]
specifies that files processed by the <> construct are to be edited
in-place. It does this by renaming the input file, opening the output
file by the original name, and selecting that output file as the
default for print() statements. The extension, if supplied, is used to
modify the name of the old file to make a backup copy, following these
rules:
If no extension is supplied, no backup is made and the current file is
overwritten.
If the extension doesn't contain a * , then it is appended to the end
of the current filename as a suffix. If the extension does contain one
or more * characters, then each * is replaced with the current
filename. In Perl terms, you could think of this as:
perl already tells you why :) Try-It-To-See
$ perl -pie " s/abc/cba/g " NUL
Can't open perl script " s/abc/cba/g ": No such file or directory
If you use B::Deparse you can see how perl compiles your code
$ perl -MO=Deparse -pi -e " s/abc/cba/g " NUL
BEGIN { $^I = ""; }
LINE: while (defined($_ = <ARGV>)) {
s/abc/cba/g;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK
If you look up $^I in perlvar you can learn about the -i switch :)
$ perldoc -v "$^I"
$INPLACE_EDIT
$^I The current value of the inplace-edit extension. Use "undef" to
disable inplace editing.
Mnemonic: value of -i switch.
Now if we revisit the first part, add an extra -e, then add Deparse, the -i switch is explained
$ perl -pie -e " s/abc/cba/g " NUL
Can't do inplace edit: NUL is not a regular file.
$ perl -MO=Deparse -pie -e " s/abc/cba/g " NUL
BEGIN { $^I = "e"; }
LINE: while (defined($_ = <ARGV>)) {
s/abc/cba/g;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK
Could it really be that e in -pie is taken as extension? I guess so
$ perl -MO=Deparse -pilogicus -e " s/abc/cba/g " NUL
BEGIN { $^I = "logicus"; }
LINE: while (defined($_ = <ARGV>)) {
s/abc/cba/g;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK
When in doubt, Deparse or Deparse,-p
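To make the suffix rules from the perlrun excerpt concrete, here is a small sketch that computes the backup name those rules describe. backup_name is a hypothetical helper written for this illustration, not perl's own code.

```perl
use strict;
use warnings;

# Work out the backup name described by the perlrun rules quoted earlier.
# Returns undef when no backup would be made.
# (backup_name is a hypothetical helper name.)
sub backup_name {
    my ($file, $extension) = @_;
    return if !defined $extension || $extension eq '';
    if ($extension =~ /\*/) {
        # Each * in the extension is replaced with the file name.
        (my $backup = $extension) =~ s/\*/$file/g;
        return $backup;
    }
    return $file . $extension;       # plain suffix: append it
}

print backup_name('hellofile', 'e'), "\n";        # hellofilee
print backup_name('hellofile', '.bak'), "\n";     # hellofile.bak
print backup_name('hellofile', 'orig_*'), "\n";   # orig_hellofile
```

So with -pie -e, the original would end up saved under the name hellofilee.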