How to run shell command in Perl, like Raku? - perl

I have excellent code in Raku:
#!/usr/bin/env perl6
CONTROL {
when CX::Warn {
note $_;
die
}
}
use fatal;
role KeyRequired {
method AT-KEY (\key) {
die "Key {key} not found" unless self.EXISTS-KEY(key);
nextsame
}
}
sub execute ($cmd) {
put $cmd;
my $proc = shell $cmd, :err, :out;
if $proc.exitcode != 0 {
put 'exit code = ' ~ $proc.exitcode;
put 'stderr ' ~ $proc.err.slurp;
put 'stdout ' ~ $proc.out.slurp;
die
}
}
execute "ls *.p6"
I say "excellent" because the Raku version runs a command, returns an exit code, and prints stdout/stderr if needed, and all in an easily-read and easily-understood manner.
Reading through the Perl5 manual for IPC::Run https://metacpan.org/pod/IPC::Run I've come across what appears to be the best Perl5 way of doing this, but I find the methods used there to be much less easily readable and understood than the Raku way of doing things.
Reading through the manual for IPC::Run the best that I can find is:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use feature 'say';
use autodie qw(:all);
use IPC::Run qw(run timeout);
sub execute {
my $cmd = shift;
my @cat = ('cat', __FILE__); # Raku doesn't need to split the string into an array
run \@cat, \undef, \my $out, \my $err, timeout( 10 ) or die "cat: $?";
if ($out ne '') {
say "\$out = $out";
}
if ($err ne '') {
say "\$err = $err";
}
}
execute("cat " . __FILE__);
execute("cat __Fle"); #intentionally wrong to produce an error
How can I re-write the Perl5 so that it is as easily read and used as the Raku code?

You've unfairly loaded the Perl 5 example with a lot of extra fluff, and you haven't handled many things in the Raku code. For instance, you output the results in Raku despite what's in the variables, but test the variables in Perl 5.
Your Perl 5 would look more like this:
use v5.30;
use IPC::Run qw(run timeout);
sub execute {
my @command = @_;
run \@command, \undef, \my $out, \my $err, timeout( 10 )
or die "$command[0]: $?";
say "\$out = $out";
say "\$err = $err";
}
execute("cat", __FILE__);
ikegami offered this version in his pastebin link:
sub execute {
my ($command) = @_;
if (! run $command, \undef, \my $out, \my $err, timeout( 10 ) ) {
say "exit code = $?";
say "stderr $err";
say "stdout $out";
die "Died";
}
}
There's an interesting thing to note in both of those cases. You are assuming an error if the exit code is not zero (and Raku assumes that, which is why you have to worry about not sinking the result). However, many useful programs don't follow that convention. For instance, git merge-base --is-ancestor uses exit value 1 to mean "not an ancestor" and all exit values higher than 1 to mean an error. The command-line grep is similar. sendmail had exit code 75 to mean that something didn't work out, but it would try again later.
Raku, having an opinion on that, ignores this sort of thing and does not allow you to tell the Proc which exit values it should accept as successful exits. Perl 5 is not so opinionated. Using or die or ! ... is really saying "exit code is not zero", but that's not really a good enough description. In many cases you get away with it, but at least Perl 5 isn't deciding for you. If you expanded the Raku example to check the literal value and decide if that's successful, it will look messy.
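As a rough illustration of deciding for yourself which exit values count as success, here is a minimal sketch using only core Perl. The helper name run_accepting is hypothetical, and the child process is simply the running perl ($^X) asked to exit with status 1, standing in for a tool like grep.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form (no shell involved) and
# die unless its exit code is one the caller listed as acceptable.
sub run_accepting {
    my ($ok_codes, @command) = @_;
    my $status = system(@command);
    die "could not start '$command[0]': $!\n" if $status == -1;
    die "'$command[0]' was killed by signal " . ($status & 127) . "\n"
        if $status & 127;
    my $exit = $status >> 8;
    die "'$command[0]' exited with unexpected code $exit\n"
        unless grep { $_ == $exit } @$ok_codes;
    return $exit;
}

# Treat both 0 and 1 as normal outcomes for this command.
my $code = run_accepting([0, 1], $^X, '-e', 'exit 1');
print "exit code $code was accepted\n";
```

The same idea carries over to IPC::Run or IPC::Run3: inspect $? after the call instead of relying on a bare "or die".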
But, notice that Raku's shell documentation notes that it's unsafe and that you should use run instead.
For what it's worth, I don't find Raku's interprocess communication all that trustworthy. In many cases, I think its IPC design was neglected. See, for instance, Does changing Perl 6's $*OUT change standard output for child processes? . I have several other IPC questions spread out in bug reports and on Stack Overflow, and almost none of them received a satisfactory answer. Mostly, I think that's because nobody thought about it that much. Granted, Raku is developed by a small team and it's a big project, but when it comes to production programming, that's no factor.
Some more Raku shell weirdness:
Which shell does Perl 6's shell() use?

Related

How to implement assert in Perl?

When trying to implement C's assert() macro in Perl, there is some fundamental problem. Consider this code first:
sub assert($$) {
my ($assertion, $failure_msg) = @_;
die $failure_msg unless $assertion;
}
# ...
assert($boolean, $message);
While this works, it's not like C: In C I'd write assert($foo <= $bar), but with this implementation I'd have to write assert($foo <= $bar, '$foo <= $bar'), i.e. repeat the condition as string.
Now I wonder how to implement this efficiently. The easy variant seems to pass the string to assert() and use eval to evaluate the string, but you can't access the variables when evaluating eval. Even if it would work, it would be quite inefficient as the condition is parsed and evaluated each time.
When passing the expression, I have no idea how to make a string from it, especially as it's evaluated already.
Another variant using assert(sub { $condition }) where it's likely easier to make a string from the code ref, is considered too ugly.
The construct assert(sub { (eval $_[0], $_[0]) }->("condition")); with
sub assert($)
{
die "Assertion failed: $_[1]\n" unless $_[0];
}
would do, but is ugly to call.
The solution I am looking for is to write the condition to check only once, while being able to reproduce the original (non-evaluated) condition and efficiently evaluate the condition.
So what are more elegant solutions? Obviously solutions would be easier if Perl had a macro or comparable syntax mechanism that allows transforming the input before compiling or evaluating.
Use B::Deparse?
#!/usr/bin/perl
use strict;
use warnings;
use B::Deparse;
my $deparser = B::Deparse->new();
sub assert(&) {
my($condfunc) = @_;
my @caller = caller();
unless ($condfunc->()) {
my $src = $deparser->coderef2text($condfunc);
$src =~ s/^\s*use\s.*$//mg;
$src =~ s/^\s+(.+?)/$1/mg;
$src =~ s/(.+?)\s+$/$1/mg;
$src =~ s/[\r\n]+/ /mg;
$src =~ s/^\{\s*(.+?)\s*\}$/$1/g;
$src =~ s/;$//mg;
die "Assertion failed: $src at $caller[1] line $caller[2].\n";
}
}
my $var;
assert { 1 };
#assert { 0 };
assert { defined($var) };
exit 0;
Test output:
$ perl dummy.pl
Assertion failed: defined $var at dummy.pl line 26.
There are a load of assertion modules on CPAN. These are open source, so it's pretty easy to peek at them and see how they're done.
Carp::Assert is a low-magic implementation. It has links to a few more complicated assertion modules in its documentation, one of which is my module PerlX::Assert.
Use caller and extract the line of source code that made the assertion?
sub assert {
my ($condition, $msg) = @_;
return if $condition;
if (!$msg) {
my ($pkg, $file, $line) = caller(0);
open my $fh, "<", $file or die "Cannot open $file: $!";
my @lines = <$fh>;
close $fh;
$msg = "$file:$line: " . $lines[$line - 1];
}
die "Assertion failed: $msg";
}
assert(2 + 2 == 5);
Output:
Assertion failed: assert.pl:14: assert(2 + 2 == 5);
If you use Carp::croak instead of die, Perl will also report stack trace information and identify where the failing assertion was called.
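A minimal sketch of the Carp variant follows. The package name My::Assert and the helper assert_true are hypothetical; the helper lives in its own package so that croak attributes the failure to the file and line of the calling code rather than to the helper itself.

```perl
use strict;
use warnings;

{
    package My::Assert;
    use Carp qw(croak);

    # Hypothetical helper: returns true when the condition holds, otherwise
    # dies with a message that points at the caller's file and line.
    sub assert_true {
        my ($condition, $message) = @_;
        return 1 if $condition;
        croak "Assertion failed: " . ($message // 'no message given');
    }
}

# Trap the failure so the example can show the generated message.
my $ok = eval { My::Assert::assert_true(2 + 2 == 5, '2 + 2 == 5'); 1 };
my $error = $ok ? '' : $@;
print "caught: $error" unless $ok;
```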
One approach to any kind of "assertions" is to use a testing framework. It isn't as clean-cut as C's assert but then it is incomparably more flexible and manageable, while tests can still be freely embedded in code much like assert statements are.
A few very simple examples
use warnings;
use strict;
use feature 'say';
use Test::More 'no_plan';
Test::More->builder->output('/dev/null');
say "A few examples of tests, scattered around code\n";
like('may be', qr/(?:\w+\s+)?be/, 'regex');
cmp_ok('a', 'eq', 'a ', 'string equality');
my ($x, $y) = (1.7, 13);
cmp_ok($x, '==', $y, '$x == $y');
say "\n'eval' expression in a string so we can see the failing code\n";
my $expr = '$x**2 == $y';
ok(eval $expr, 'Quadratic') || diag explain $expr;
# ok(eval $expr, $expr);
with output
A few examples of tests, scattered around code
# Failed test 'string equality'
# at assertion.pl line 19.
# got: 'a'
# expected: 'a '
# Failed test '$x == $y'
# at assertion.pl line 20.
# got: 1.7
# expected: 13
'eval' expression in a string so we can see the failing code
# Failed test 'Quadratic'
# at assertion.pl line 26.
# $x**2 == $y
# Looks like you failed 3 tests of 4.
This is just a scattershot of examples, where the last one answers the question directly.
The module Test::More brings together a number of tools; there are many options in how to use it and how to manipulate output. See Test::Harness, and Test::Builder (used above), and a number of tutorials and SO posts.
I don't know how the above eval counts toward "elegant" but it does move you from singular and individually cared for C-style assert statements toward a more easily manageable system.
Good assertions are meant and planned as systemic tests and code documentation but by their nature lack formal structure (and so may still end up scattered and ad-hoc). When done this way they come with a framework and can be managed and tuned with many tools, and as a suite.

How to test the exit status from IPC::Run3

I'm trying to test the Perl module IPC::Run3 but having difficulty in checking whether a command is failed or successful.
I know that IPC::Run3 issues an exit code if something is wrong with its arguments, but what about if the arguments are ok but the command does not exist? How can I test the following example?
Having a subroutine to call Run3
sub runRun3 {
my $cmd = shift;
my ($stdout, $stderr);
run3($cmd, \undef, \$stdout, \$stderr);
# if( $? == -1 ) {
if (! $stdout and ! $stderr) {
die "Something is wrong";
} else {
print "OK \n";
}
}
when command $cmds[0] below is executed (the ls command of *nix systems) it prints OK as expected, but with command $cmds[1] it just says No such file or directory at ./testrun3.pl line 18.
With a test to the exit code I want it to print Something is wrong instead.
#!/usr/bin/perl
use warnings;
use strict;
use IPC::Run3;
my @cmds = qw(ls silly);
runRun3($cmds[0]);
runRun3($cmds[1]);
Or what would be the best alternative to IPC::Run3 in cases like this? This is just an oversimplification of the process, but eventually I would like to capture STDERR and STDOUT for more complex situations.
Thanks.
A few points to go through.
First, for the direct question, the IPC::Run3 documentation tells us that
run3 throws an exception if the wrapped system call returned -1 or anything went wrong with run3's processing of filehandles. Otherwise it returns true. It leaves $? intact for inspection of exit and wait status.
The error you ask about is of that kind and you need to eval the call to catch that exception
use warnings 'all';
use strict;
use feature 'say';
use IPC::Run3;
my ($stdout, $stderr);
my @cmd = ("ls", "-l");
eval { run3 \@cmd, \undef, \$stdout, \$stderr };
if ( $@ ) { print "Error: $@"; }
elsif ( $? & 0x7F ) { say "Killed by signal ".( $? & 0x7F ); }
elsif ( $? >> 8 ) { say "Exited with error ".( $? >> 8 ); }
else { say "Completed successfully"; }
You can now print your own messages inside the if ($@) { } block when errors happen where the underlying system fails to execute, such as when a non-existing program is called.
Here $@ relates to eval while $? relates to system. So if run3 didn't have a problem and $@ is false, we next check the status of system itself, thus $?. From the docs:
Note that a true return value from run3 doesn't mean that the command
had a successful exit code. Hence you should always check $?.
For the variables $@ and $? see General Variables in perlvar, and the system and eval pages.
A minimal version of this is to drop eval (and the $@ check) and expect the program to die if run3 had problems, which should be rare, and to check (and print) the value of $?.
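To make the $? checks reusable, the decoding can be pulled into a small function. This is only a sketch with a hypothetical name, describe_status, and it uses core system with the running perl ($^X) as the child so that it does not depend on IPC::Run3 being installed; the same function can be handed $? after a run3 call.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a wait status such as $? into a readable string.
sub describe_status {
    my ($status) = @_;
    return "failed to execute"                    if $status == -1;
    return "killed by signal " . ($status & 0x7F) if $status & 0x7F;
    return "exited with code " . ($status >> 8);
}

# Run a child that exits with status 3, then decode $?.
system($^X, '-e', 'exit 3');
print describe_status($?), "\n";
```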
A note on the run3 interface. With \@cmd it expects @cmd to contain a command broken into words, the first element being the program and the rest its arguments. There is a difference between writing a command in a string, supported by the $cmd interface, and in an array. See system for an explanation.
Which alternative would suit you best depends on your exact needs. Here are some options. Perhaps first try IPC::System::Simple (but no STDERR on the platter). For cleanly capturing all kinds of output Capture::Tiny is great. On the other end there is IPC::Run for far more power.

Making an IRC bot - how can I let people !eval perl/javascript code?

I'm working on a bot in Perl (based on POE) and so far so good, but I can't figure out how can I add a !js or !perl command to evaluate respective code and return one line of output to be printed into the channel. I found App::EvalServer but I don't get how to use it.
Thanks for any help!
The App::EvalServer module comes with a binary to run as a standalone application. You do not put it in your program but rather run it on its own. It opens a port where you can hand it code as a JSON string. This does not sound like a good idea to me either.
There is another module you might want to look at called Safe. I suggest you read through the complete documentation as well as the one to Opcode (linked in the doc) before you do anything with this. YOU CAN DO SERIOUS DAMAGE IF YOU EVALUATE ARBITRARY CODE! Never forget that.
UPDATE:
Here's an example of how to capture the output of print or say from your evaled code. You can use open with a variable to make printed output always go to that variable. If you switch back afterwards you can work with the captured output in your var. This is called an in-memory file.
use strict; use warnings;
use feature 'say';
use Safe;
# Put our STDOUT into a variable
my $printBuffer;
open(my $buffer, '>', \$printBuffer);
# Everything we say and print will go into $printBuffer until we change it back
my $stdout = select($buffer);
# Create a new Safe
my $compartment = new Safe;
$compartment->permit(qw(print)); # for testing
# This is where the external code comes in:
my $external_code = qq~print "Hello World!\n"~;
# Execute the code
my $ret = $compartment->reval($external_code, 1);
# Go back to STDOUT
select($stdout);
printf "The return value of the reval is: %d\n", $ret;
say "The reval's output is:";
say $printBuffer;
# Now you can do whatever you want with your output
$printBuffer =~ s/World/Earth/;
say "After I change it:";
say $printBuffer;
Disclaimer: Use this code at your own risk!
Update 2: After a lengthy discussion in chat, here's what we came up with. It implements a kind of timeout to stop the execution if the reval is taking too long, e.g. because of an infinite loop.
#!/usr/bin/perl
use warnings;
use strict;
use Safe;
use Benchmark qw(:hireswallclock);
my ($t0, $t1); # Benchmark
my $timedOut = 0;
my $userError = 0;
my $printBuffer;
open (my $buffer, '>', \$printBuffer);
my $stdout = select($buffer);
my $cpmt = new Safe;
$cpmt->permit_only(qw(:default :base_io sleep));
eval
{
local $SIG{'ALRM'} = sub { $timedOut = 1; die "alarm\n"};
$t0 = Benchmark->new;
alarm 2;
$cpmt->reval('print "bla\n"; die "In the user-code!";');
# $cpmt->reval('print "bla\n"; sleep 50;');
alarm 0;
$t1 = Benchmark->new;
if ($@)
{
$userError = "The user-code died! $@\n";
}
};
select($stdout);
if ($timedOut)
{
print "Timeout!\n";
my $td = timediff($t1, $t0);
print timestr($td), "\n";
print $printBuffer;
}
else
{
print "There was no timeout...\n";
if ($userError)
{
print "There was an error with your code!\n";
print $userError;
print "But here's your output anyway:\n";
print $printBuffer;
}
else
{
print $printBuffer;
}
}
Take a look at perl eval(), you can pass it variables/strings and it will evaluate it as if it's perl code. Likewise in javascript, there's also an eval() function that performs similarly.
However, DO NOT EVALUATE ARBITRARY CODE in either perl or javascript unless you can run it in a completely closed environment (and even then, it's still a bad idea). Lots of people spend lots of time preventing just this from happening. So that's how you'd do it, but you don't want to do it, really at all.

how to source a shell script [environment variables] in perl script without forking a subshell?

I want to call "env.sh" from "my_perl.pl" without forking a subshell. I tried with backticks and system like this --> system (. env.sh) [dot space env.sh], however it won't work.
Child environments cannot change parent environments. Your best bet is to parse env.sh from inside the Perl code and set the variables in %ENV:
#!/usr/bin/perl
use strict;
use warnings;
sub source {
my $name = shift;
open my $fh, "<", $name
or die "could not open $name: $!";
while (<$fh>) {
chomp;
my ($k, $v) = split /=/, $_, 2;
$v =~ s/^(['"])(.*)\1/$2/; #' fix highlighter
$v =~ s/\$([a-zA-Z]\w*)/$ENV{$1}/g;
$v =~ s/`(.*?)`/`$1`/ge; #dangerous
$ENV{$k} = $v;
}
}
source "env.sh";
for my $k (qw/foo bar baz quux/) {
print "$k => $ENV{$k}\n";
}
Given
foo=5
bar=10
baz="$foo$bar"
quux=`date +%Y%m%d`
it prints
foo => 5
bar => 10
baz => 510
quux => 20110726
The code can only handle simple files (for instance, it doesn't handle if statements or foo=$(date)). If you need something more complex, then writing a wrapper for your Perl script that sources env.sh first is the right way to go (it is also probably the right way to go in the first place).
Another reason to source env.sh before executing the Perl script is that setting the environment variables in Perl may happen too late for modules that are expecting to see them.
In the file foo:
#!/bin/bash
source env.sh
exec foo.real
where foo.real is your Perl script.
You can use arbitrarily complex shell scripts by executing them with the relevant shell, dumping their environment to standard output in the same process, and parsing that in perl. Feeding the output into something other than %ENV or filtering for specific values of interest is prudent so you don't change things like PATH that may have interesting side effects elsewhere. I've discarded standard output and error from the spawned shell script although they could be redirected to temporary files and used for diagnostic output in the perl script.
foo.pl:
#!/usr/bin/perl
open SOURCE, "bash -c '. foo.sh >& /dev/null; env'|" or
die "Can't fork: $!";
while(<SOURCE>) {
if (/^(BAR|BAZ)=(.*)/) {
$ENV{$1} = ${2} ;
}
}
close SOURCE;
print $ENV{'BAR'} . "\n";
foo.sh:
export BAR=baz
Try this (unix code sample):
cd /tmp
vi s
#!/bin/bash
export blah=test
vi t
#!/usr/bin/perl
if ($ARGV[0]) {
print "ENV second call is : $ENV{blah}\n";
} else {
print "ENV first call is : $ENV{blah}\n";
exec(". /tmp/s; /tmp/t 1");
}
chmod 777 s t
./t
ENV first call is :
ENV second call is : test
The trick is using the exec to source your bash script first and then calling your perl script again with an argument so you know that you are being called for a second time.

How can I translate a shell script to Perl?

I have a shell script, pretty big one. Now my boss says I must rewrite it in Perl.
Is there any way to write a Perl script and use the existing shell code as is in my Perl script. Something similar to Inline::C.
Is there something like Inline::Shell? I had a look at inline module, but it supports only languages.
I'll answer seriously. I do not know of any program to translate a shell script into Perl, and I doubt any interpreter module would provide the performance benefits. So I'll give an outline of how I would go about it.
Now, you want to reuse your code as much as possible. In that case, I suggest selecting pieces of that code, write a Perl version of that, and then call the Perl script from the main script. That will enable you to do the conversion in small steps, assert that the converted part is working, and improve gradually your Perl knowledge.
As you can call outside programs from a Perl script, you can even replace some bigger logic with Perl, and call smaller shell scripts (or other commands) from Perl to do something you don't feel comfortable yet to convert. So you'll have a shell script calling a perl script calling another shell script. And, in fact, I did exactly that with my own very first Perl script.
Of course, it's important to select well what to convert. I'll explain, below, how many patterns common in shell scripts are written in Perl, so that you can identify them inside your script, and create replacements by as much cut&paste as possible.
First, both Perl scripts and Shell scripts are code+functions. Ie, anything which is not a function declaration is executed in the order it is encountered. You don't need to declare functions before use, though. That means the general layout of the script can be preserved, though the ability to keep things in memory (like a whole file, or a processed form of it) makes it possible to simplify tasks.
A Perl script, in Unix, starts with something like this:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
#other libraries
(rest of the code)
The first line, obviously, points to the command to be used to run the script, just like normal shells do. The following two "use" lines make the language more strict, which should decrease the number of bugs you encounter because you don't know the language well (or plain did something wrong). The third use line imports the "Dumper" function from the Data::Dumper module. It's useful for debugging purposes. If you want to know the value of an array or hash table, just print Dumper(whatever).
Note also that comments are just like shell's, lines starting with "#".
Now, you call external programs and pipe to or pipe from them. For example:
open THIS, "cat $ARGV[0] |";
That will run cat, passing "$ARGV[0]", which would be $1 on shell -- the first argument passed to it. The result of that will be piped into your Perl script through "THIS", which you can use to read that from it, as I'll show later.
You can use "|" at the beginning or end of line, to indicate the mode "pipe to" or "pipe from", and specify a command to be run, and you can also use ">" or ">>" at the beginning, to open a file for writing with or without truncation, "<" to explicitly indicate opening a file for reading (the default), or "+<" and "+>" for read and write. Notice that the latter will truncate the file first.
Another syntax for "open", which will avoid problems with files with such characters in their names, is having the opening mode as a second argument:
open THIS, "-|", "cat $ARGV[0]";
This will do the same thing. The mode "-|" stands for "pipe from" and "|-" stands for "pipe to". The rest of the modes can be used as they were (>, >>, <, +>, +<). While there is more than this to open, it should suffice for most things.
But you should avoid calling external programs as much as possible. You could open the file directly, by doing open THIS, "$ARGV[0]";, for example, and have much better performance.
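A minimal sketch of opening a file directly, written with the three-argument form and a lexical filehandle rather than a bareword like THIS. The scratch file created with File::Temp is only there to make the example self-contained; in a real script $path would be something like $ARGV[0].

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the example can run anywhere.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\nsecond line\n";
close $tmp_fh or die "Could not close $path: $!";

# Three-argument open: the mode is separate from the name, so unusual
# characters in $path cannot change how the file is opened.
open my $in, '<', $path or die "Could not open $path: $!";
my @lines = <$in>;
close $in;

print "read ", scalar(@lines), " lines\n";
```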
So, what external programs you could cut out? Well, almost everything. But let's stay with the basics: cat, grep, cut, head, tail, uniq, wc, sort.
CAT
Well, there isn't much to be said about this one. Just remember that, if possible, read the file only once and keep it in memory. If the file is huge you won't do that, of course, but there are almost always ways to avoid reading a file more than once.
Anyway, the basic syntax for cat would be:
my $filename = "whatever";
open FILE, "$filename" or die "Could not open $filename!\n";
while(<FILE>) {
print $_;
}
close FILE;
This opens a file, prints all its contents ("while(<FILE>)" will loop until EOF, assigning each line to "$_"), and closes it again.
If I wanted to direct the output to another file, I could do this:
my $filename = "whatever";
my $anotherfile = "another";
open (FILE, "$filename") || die "Could not open $filename!\n";
open OUT, ">", "$anotherfile" or die "Could not open $anotherfile for writing!\n";
while(<FILE>) {
print OUT $_;
}
close FILE;
This will print the line to the file indicated by "OUT". You can use STDIN, STDOUT and STDERR in the appropriate places as well, without having to open them first. In fact, "print" defaults to STDOUT, and "die" defaults to "STDERR".
Notice also the "or die ..." and "|| die ...". The operators or and || mean it will only execute the following command if the first returns false (which means empty string, undef, 0, and the like). The die command stops the script with an error message.
The main difference between "or" and "||" is priority. If "or" was replaced by "||" in the examples above, it would not work as expected, because the line would be interpreted as:
open FILE, ("$filename" || die "Could not open $filename!\n");
Which is not at all what is expected. As "or" has a lower priority, it works. In the line where "||" is used, the parameters to open are passed between parentheses, making it possible to use "||".
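The difference can be observed without touching any files. In this sketch the sub record is a hypothetical stand-in for open: it always returns false, so a correctly attached "die" should always fire.

```perl
use strict;
use warnings;

# Hypothetical stand-in for open(): takes a list and always reports failure.
sub record { return 0 }

# "||" binds to the last argument only: "b" is true, so die is never
# reached even though record() itself returned false.
my $r1 = eval { record "a", "b" || die "never reached\n"; 1 };

# "or" has very low precedence, so it applies to the whole call.
my $r2 = eval { record "a", "b" or die "record failed\n"; 1 };
my $err = $r2 ? '' : $@;

print "with ||: ", ($r1 ? "no error raised" : "died"), "\n";
print "with or: ", ($r2 ? "no error raised" : "died: $err");
```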
Also, there is something which is pretty much what cat does:
while(<>) {
print $_;
}
That will print all files in the command line, or anything passed through STDIN.
GREP
So, how would our "grep" script work? I'll assume "grep -E", because that's easier in Perl than simple grep. Anyway:
my $pattern = $ARGV[0];
shift @ARGV;
while(<>) {
print $_ if /$pattern/o;
}
The "o" flag on /$pattern/ instructs Perl to compile that pattern only once, thus gaining you speed. Note the style "something if cond". It means it will only execute "something" if the condition is true. Finally, "/$pattern/", alone, is the same as "$_ =~ m/$pattern/", which means compare $_ with the regex pattern indicated. If you want plain substring matching instead (what grep -F does), with no regex metacharacters interpreted, you could write:
print $_ if index($_, $pattern) >= 0;
CUT
Usually, you do better using regex groups to get the exact string than cut. What you would do with "sed", for instance. Anyway, here are two ways of reproducing cut:
while(<>) {
my @array = split ",";
print $array[3], "\n";
}
That will get you the fourth column of every line, using "," as separator. Note @array and $array[3]. The @ sigil means "array" should be treated as, well, an array. It will receive a list composed of each column in the currently processed line. Next, the $ sigil means $array[3] is a scalar value. It will return the column you are asking for.
This is not a good implementation, though, as "split" will scan the whole string. I once reduced a process from 30 minutes to 2 seconds just by not using split -- the lines were rather large, though. Anyway, the following has a superior performance if the lines are expected to be big, and the columns you want are low:
while(<>) {
my ($column) = /^(?:[^,]*,){3}([^,]*),/;
print $column, "\n";
}
This leverages regular expressions to get the desired information, and only that.
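Another option, offered as a sketch rather than a benchmarked recommendation, is to keep split but give it a LIMIT argument so it stops splitting after the fields you need. The helper name fourth_column is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the fourth comma-separated field of a line.
# The LIMIT of 5 makes split produce at most five fields, so everything
# after the fourth separator is left as one unsplit remainder.
sub fourth_column {
    my ($line) = @_;
    chomp $line;
    return (split /,/, $line, 5)[3];
}

print fourth_column("a,b,c,d,e,f,g\n"), "\n";
```

Unlike the regex shown earlier, this also works when the fourth column is the last one on the line.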
If you want positional columns, you can use:
while(<>) {
print substr($_, 5, 10), "\n";
}
Which will print 10 characters starting from the sixth (again, 0 means the first character).
HEAD
This one is pretty simple:
my $printlines = abs(shift);
my $lines = 0;
my $current;
while(<>) {
if($ARGV ne $current) {
$lines = 0;
$current = $ARGV;
}
print "$_" if $lines < $printlines;
$lines++;
}
Things to note here. I use "ne" to compare strings. Now, $ARGV will always point to the current file, being read, so I keep track of them to restart my counting once I'm reading a new file. Also note the more traditional syntax for "if", right along with the post-fixed one.
I also use a simplified syntax to get the number of lines to be printed. When you use "shift" by itself it will assume "shift @ARGV". Also, note that shift, besides modifying @ARGV, will return the element that was shifted out of it.
As with a shell, there is no distinction between a number and a string -- you just use it. Even things like "2"+"2" will work. In fact, Perl is even more lenient, cheerfully treating anything non-number as a 0, so you might want to be careful there.
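A short sketch of that leniency. Under "use warnings" Perl does complain about non-numeric strings, so the warning is switched off locally here to show the raw behavior.

```perl
use strict;
use warnings;

my $sum = "2" + "2";             # both strings look numeric: 4

my ($partial, $word);
{
    # Without this, "use warnings" reports "Argument ... isn't numeric".
    no warnings 'numeric';
    $partial = "3 apples" + 2;   # the leading digits are used: 5
    $word    = "apples" + 2;     # no leading number counts as 0: 2
}

print "$sum $partial $word\n";
```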
The head script above is very inefficient, though, as it reads each file to the end, not only the required lines. Let's improve it, and see a couple of important keywords in the process:
my $printlines = abs(shift);
my @files;
if(scalar(@ARGV) == 0) {
@files = ("-");
} else {
@files = @ARGV;
}
for my $file (@files) {
next unless -f $file && -r $file;
open FILE, "<", $file or next;
my $lines = 0;
while(<FILE>) {
last if $lines == $printlines;
print "$_";
$lines++;
}
close FILE;
}
The keywords "next" and "last" are very useful. First, "next" will tell Perl to go back to the loop condition, getting the next element if applicable. Here we use it to skip a file unless it is truly a file (not a directory) and readable. It will also skip if we couldn't open the file even then.
Then "last" is used to immediately jump out of a loop. We use it to stop reading the file once we have reached the required number of lines. It's true we read one line too many, but having "last" in that position shows clearly that the lines after it won't be executed.
There is also "redo", which will go back to the beginning of the loop, but without reevaluating the condition nor getting the next element.
TAIL
I'll do a little trick here.
my $skiplines = abs(shift);
my @lines;
my $current = "";
while(<>) {
if($ARGV ne $current) {
print @lines;
undef @lines;
$current = $ARGV;
}
push @lines, $_;
shift @lines if $#lines == $skiplines;
}
print @lines;
Ok, I'm combining "push", which appends a value to an array, with "shift", which takes something from the beginning of an array. If you want a stack, you can use push/pop or shift/unshift. Mix them, and you have a queue. I keep my queue with at most $skiplines elements by checking $#lines, which gives me the index of the last element in the array. You could also get the number of elements in @lines with scalar(@lines).
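The queue idea in isolation, as a small sketch. The helper name last_n is hypothetical; it keeps only the most recent $n items of whatever list it is given.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last $n items of a list by treating an
# array as a bounded queue.
sub last_n {
    my ($n, @items) = @_;
    my @queue;
    for my $item (@items) {
        push @queue, $item;            # add at the end
        shift @queue if @queue > $n;   # drop the oldest from the front
    }
    return @queue;
}

print join(",", last_n(3, 1 .. 10)), "\n";
```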
UNIQ
Now, uniq only eliminates repeated consecutive lines, which should be easy with what you have seen so far. So I'll eliminate all of them:
my $current = "";
my %lines;
while(<>) {
if($ARGV ne $current) {
undef %lines;
$current = $ARGV;
}
print $_ unless defined($lines{$_});
$lines{$_} = "";
}
Now here I'm keeping the whole file in memory, inside %lines. The use of the % sigil indicates this is a hash table. I'm using the lines as keys, and storing nothing as value -- as I have no interest in the values. I check whether the key exists with "defined($lines{$_})", which will test if the value associated with that key is defined or not; the keyword "unless" works just like "if", but with the opposite effect, so it only prints a line if the line is NOT defined.
Note, too, the syntax $lines{$_} = "" as a way to store something in a hash table. Note the use of {} for hash table, as opposed to [] for arrays.
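For comparison, here is a sketch of the behavior the real uniq has, dropping a line only when it repeats the line directly before it. The helper name uniq_consecutive is hypothetical, and it works on a list rather than on <> to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: remove only consecutive duplicates, as uniq does.
sub uniq_consecutive {
    my @lines = @_;
    my @kept;
    my $previous;
    for my $line (@lines) {
        next if defined $previous && $line eq $previous;
        push @kept, $line;
        $previous = $line;
    }
    return @kept;
}

print join(" ", uniq_consecutive(qw(a a b a a c))), "\n";
```

Only one previous line has to be remembered, so nothing like %lines needs to be held in memory.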
WC
This will actually use a lot of stuff we have seen:
my $current;
my %lines;
my %words;
my %chars;
while(<>) {
$lines{"$ARGV"}++;
$chars{"$ARGV"} += length($_);
$words{"$ARGV"} += scalar(grep {$_ ne ""} split /\s/);
}
for my $file (keys %lines) {
print "$lines{$file} $words{$file} $chars{$file} $file\n";
}
Three new things. Two are the "+=" operator, which should be obvious, and the "for" expression. Basically, a "for" will assign each element of the array to the variable indicated. The "my" is there to declare the variable, though it's unneeded if declared previously. I could have an @array variable inside those parentheses. The "keys %lines" expression will return as an array the keys (the filenames) which exist for the hash table "%lines". The rest should be obvious.
The third thing, which I actually added only revising the answer, is the "grep". The format here is:
grep { code } array
It will run "code" for each element of the array, passing the element as "$_". Then grep will return all elements for which the code evaluates to "true" (not 0, not "", etc). This avoids counting empty strings resulting from consecutive spaces.
Similar to "grep" there is "map", which I won't demonstrate here. Instead of filtering, it will return an array formed by the results of "code" for each element.
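For completeness, a tiny sketch of "map" next to "grep"; the word list is made up for the example.

```perl
use strict;
use warnings;

my @words = ("apple", "", "kiwi", "fig");

# grep filters: keep only the non-empty strings.
my @non_empty = grep { $_ ne "" } @words;

# map transforms: build a new list holding the length of each kept string.
my @lengths = map { length($_) } @non_empty;

print "@lengths\n";
```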
SORT
Finally, sort. This one is easy too:
my @lines;
my $current = "";
while(<>) {
if($ARGV ne $current) {
print sort @lines;
undef @lines;
$current = $ARGV;
}
push @lines, $_;
}
print sort @lines;
Here, "sort" will sort the array. Note that sort can receive a function to define the sorting criteria. For instance, if I wanted to sort numbers I could do this:
my @lines;
my $current = "";
while(<>) {
if($ARGV ne $current) {
print sort {$a <=> $b} @lines;
undef @lines;
$current = $ARGV;
}
push @lines, $_;
}
print sort {$a <=> $b} @lines;
Here "$a" and "$b" receive the elements to be compared. "<=>" returns -1, 0 or 1 depending on whether the number is less than, equal to or greater than the other. For strings, "cmp" does the same thing.
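A short sketch of how the comparison function changes the result, using made-up values and hypothetical helper names. The default sort compares as strings, so "10" comes before "9":

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the three sort variants discussed above.
sub sort_as_strings { my @s = sort @_;               return @s; }  # default: string order
sub sort_as_numbers { my @s = sort { $a <=> $b } @_; return @s; }  # numeric, ascending
sub sort_descending { my @s = sort { $b <=> $a } @_; return @s; }  # swap $a and $b to reverse

my @values = (10, 9, 100, 1);

my @by_string  = sort_as_strings(@values);
my @by_number  = sort_as_numbers(@values);
my @descending = sort_descending(@values);

print "string order:  @by_string\n";
print "numeric order: @by_number\n";
print "descending:    @descending\n";
```

Swapping $a and $b inside the block is the usual way to reverse the order without a separate call to "reverse".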
HANDLING FILES, DIRECTORIES & OTHER STUFF
As for the rest, basic mathematical expressions should be easy to understand. You can test certain conditions about files this way:
for my $file (@ARGV) {
print "$file is a file\n" if -f "$file";
print "$file is a directory\n" if -d "$file";
print "I can read $file\n" if -r "$file";
print "I can write to $file\n" if -w "$file";
}
I'm not trying to be exhaustive here; there are many other such tests. I can also do "glob" patterns, like shell's "*" and "?", like this:
for my $file (glob("*")) {
print $file;
print "*" if -x "$file" && ! -d "$file";
print "/" if -d "$file";
print "\t";
}
If you combine that with "chdir", you can emulate "find" as well:
sub list_dir($$) {
my ($dir, $prefix) = @_;
my $newprefix = $prefix;
if ($prefix eq "") {
$newprefix = $dir;
} else {
$newprefix .= "/$dir";
}
chdir $dir;
for my $file (glob("*")) {
print "$prefix/" if $prefix ne "";
print "$dir/$file\n";
list_dir($file, $newprefix) if -d "$file";
}
chdir "..";
}
list_dir(".", "");
Here we see, finally, a function. A function is declared with the syntax:
sub name (params) { code }
Strictly speaking, "(params)" is optional. The declared parameter I used, "($$)", means I'm receiving two scalar parameters. I could have "@" or "%" in there as well. The array "@_" has all the parameters passed. The line "my ($dir, $prefix) = @_" is just a simple way of assigning the first two elements of that array to the variables $dir and $prefix.
This function does not return anything (it's a procedure, really), but you can have functions which return values just by adding "return something;" to it, and have it return "something".
The rest of it should be pretty obvious.
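For completeness, here is a minimal sketch of a function that does return a value. The name join_path is hypothetical and is not used anywhere in the answer; it only mirrors the prefix handling in list_dir:

```perl
use strict;
use warnings;

# Hypothetical example: join a directory and a file name into one path.
sub join_path {
    my ($dir, $file) = @_;        # first two arguments, as in list_dir
    return $file if $dir eq "";   # nothing to prepend
    return "$dir/$file";
}

print join_path("", "notes.txt"), "\n";
print join_path("docs", "notes.txt"), "\n";
```

The caller uses the result like any other expression, here by passing it straight to print.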
MIXING EVERYTHING
Now I'll present a more involved example. I'll show some bad code to explain what's wrong with it, and then show better code.
For this first example, I have two files: names.txt, which has names and phone numbers, and systems.txt, with systems and the name of the person responsible for them. Here they are:
names.txt
John Doe, (555) 1234-4321
Jane Doe, (555) 5555-5555
The Boss, (666) 5555-5555
systems.txt
Sales, Jane Doe
Inventory, John Doe
Payment, That Guy
I want, then, to print the first file, with the system appended to the name of the person, if that person is responsible for that system. The first version might look like this:
#!/usr/bin/perl
use strict;
use warnings;
open FILE, "names.txt";
while(<FILE>) {
my ($name) = /^([^,]*),/;
my $system = get_system($name);
print $_ . ", $system\n";
}
close FILE;
sub get_system($) {
my ($name) = @_;
my $system = "";
open FILE, "systems.txt";
while(<FILE>) {
next unless /$name/o;
($system) = /([^,]*)/;
}
close FILE;
return $system;
}
This code won't work, though. Perl will complain that the function was used too early for the prototype to be checked, but that's just a warning. It will give an error on line 8 (the first while loop), complaining about a readline on a closed filehandle. What happened here is that "FILE" is global, so the function get_system is changing it. Let's rewrite it, fixing both things:
#!/usr/bin/perl
use strict;
use warnings;
sub get_system($) {
my ($name) = @_;
my $system = "";
open my $filehandle, "systems.txt";
while(<$filehandle>) {
next unless /$name/o;
($system) = /([^,]*)/;
}
close $filehandle;
return $system;
}
open FILE, "names.txt";
while(<FILE>) {
my ($name) = /^([^,]*),/;
my $system = get_system($name);
print $_ . ", $system\n";
}
close FILE;
This won't give any errors or warnings, nor will it work. It returns just the systems, but not the names and phone numbers! What happened? Well, what happened is that we are making a reference to "$_" after calling get_system, but, by reading the file, get_system is overwriting the value of $_!
To avoid that, we'll make $_ local inside get_system. This will give it a local scope, and the original value will then be restored once returned from get_system:
#!/usr/bin/perl
use strict;
use warnings;
sub get_system($) {
my ($name) = @_;
my $system = "";
local $_;
open my $filehandle, "systems.txt";
while(<$filehandle>) {
next unless /$name/o;
($system) = /([^,]*)/;
}
close $filehandle;
return $system;
}
open FILE, "names.txt";
while(<FILE>) {
my ($name) = /^([^,]*),/;
my $system = get_system($name);
print $_ . ", $system\n";
}
close FILE;
And that still doesn't work! It prints a newline between the name and the system. Well, Perl reads the line including any newline it might have. There is a neat command which will remove newlines from strings, "chomp", which we'll use to fix this problem. And since not every name has a system, we might, as well, avoid printing the comma when that happens:
#!/usr/bin/perl
use strict;
use warnings;
sub get_system($) {
my ($name) = @_;
my $system = "";
local $_;
open my $filehandle, "systems.txt";
while(<$filehandle>) {
next unless /$name/o;
($system) = /([^,]*)/;
}
close $filehandle;
return $system;
}
open FILE, "names.txt";
while(<FILE>) {
my ($name) = /^([^,]*),/;
my $system = get_system($name);
chomp;
print $_;
print ", $system" if $system ne "";
print "\n";
}
close FILE;
That works, but it also happens to be horribly inefficient. We read the whole systems file for every line in the names file. To avoid that, we'll read all data from systems once, and then use that to process names.
Now, sometimes a file is so big you can't read it into memory. When that happens, you should try to read into memory any other file needed to process it, so that you can do everything in a single pass for each file. Anyway, here is the first optimized version of it:
#!/usr/bin/perl
use strict;
use warnings;
our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
my ($system, $name) = /([^,]*),(.*)/;
$systems{$name} = $system;
}
close SYSTEMS;
open NAMES, "names.txt";
while(<NAMES>) {
my ($name) = /^([^,]*),/;
chomp;
print $_;
print ", $systems{$name}" if defined $systems{$name};
print "\n";
}
close NAMES;
Unfortunately, it doesn't work. No system ever appears! What has happened? Well, let's look into what "%systems" contains, by using Data::Dumper:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
my ($system, $name) = /([^,]*),(.*)/;
$systems{$name} = $system;
}
close SYSTEMS;
print Dumper(%systems);
open NAMES, "names.txt";
while(<NAMES>) {
my ($name) = /^([^,]*),/;
chomp;
print $_;
print ", $systems{$name}" if defined $systems{$name};
print "\n";
}
close NAMES;
The output will be something like this:
$VAR1 = ' Jane Doe';
$VAR2 = 'Sales';
$VAR3 = ' That Guy';
$VAR4 = 'Payment';
$VAR5 = ' John Doe';
$VAR6 = 'Inventory';
John Doe, (555) 1234-4321
Jane Doe, (555) 5555-5555
The Boss, (666) 5555-5555
Those $VAR1/$VAR2/etc are how Dumper displays a hash table. The odd numbers are the keys, and the succeeding even numbers are the values. Now we can see that each name in %systems has a leading space! Silly regex mistake, let's fix it:
#!/usr/bin/perl
use strict;
use warnings;
our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
my ($system, $name) = /^\s*([^,]*?)\s*,\s*(.*?)\s*$/;
$systems{$name} = $system;
}
close SYSTEMS;
open NAMES, "names.txt";
while(<NAMES>) {
my ($name) = /^\s*([^,]*?)\s*,/;
chomp;
print $_;
print ", $systems{$name}" if defined $systems{$name};
print "\n";
}
close NAMES;
So, here, we are aggressively removing any spaces from the beginning or end of name and system. There are other ways to form that regex, but that's beside the point. There is still one problem with this script, which you'll have seen if your "names.txt" and/or "systems.txt" files have an empty line at the end. The warnings look like this:
Use of uninitialized value in hash element at ./exemplo3e.pl line 10, <SYSTEMS> line 4.
Use of uninitialized value in hash element at ./exemplo3e.pl line 10, <SYSTEMS> line 4.
John Doe, (555) 1234-4321, Inventory
Jane Doe, (555) 5555-5555, Sales
The Boss, (666) 5555-5555
Use of uninitialized value in hash element at ./exemplo3e.pl line 19, <NAMES> line 4.
What happened here is that nothing went into the "$name" variable when the empty line was processed. There are many ways around that, but I choose the following:
#!/usr/bin/perl
use strict;
use warnings;
our %systems;
open SYSTEMS, "systems.txt" or die "Could not open systems.txt!";
while(<SYSTEMS>) {
my ($system, $name) = /^\s*([^,]+?)\s*,\s*(.+?)\s*$/;
$systems{$name} = $system if defined $name;
}
close SYSTEMS;
open NAMES, "names.txt" or die "Could not open names.txt!";
while(<NAMES>) {
my ($name) = /^\s*([^,]+?)\s*,/;
chomp;
print $_;
print ", $systems{$name}" if defined($name) && defined($systems{$name});
print "\n";
}
close NAMES;
The regular expressions now require at least one character for name and system, and we test to see if "$name" is defined before we use it.
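The same logic can also be sketched with lexical filehandles and the three-argument form of open, which avoids the global FILE problem seen earlier. This is only one possible arrangement, not the answer's own code: the function names read_systems and annotate_names are hypothetical, and the two files are replaced by in-memory strings so the sketch runs without names.txt or systems.txt on disk:

```perl
use strict;
use warnings;

# Build a name => system lookup from any readable filehandle.
sub read_systems {
    my ($fh) = @_;
    my %systems;
    while (my $line = <$fh>) {
        # same pattern as above: system, comma, name, surrounding spaces trimmed
        my ($system, $name) = $line =~ /^\s*([^,]+?)\s*,\s*(.+?)\s*$/;
        $systems{$name} = $system if defined $name;
    }
    return %systems;
}

# Return each line of names with ", system" appended when one is known.
sub annotate_names {
    my ($fh, %systems) = @_;
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        my ($name) = $line =~ /^\s*([^,]+?)\s*,/;
        $line .= ", $systems{$name}" if defined($name) && defined($systems{$name});
        push @out, "$line\n";
    }
    return @out;
}

# In-memory stand-ins for systems.txt and names.txt (an assumption made
# so that the sketch is self-contained), each ending with an empty line.
my $systems_data = "Sales, Jane Doe\nInventory, John Doe\nPayment, That Guy\n\n";
my $names_data   = "John Doe, (555) 1234-4321\nJane Doe, (555) 5555-5555\nThe Boss, (666) 5555-5555\n\n";

open my $systems_fh, '<', \$systems_data or die "Could not open systems data: $!";
my %systems = read_systems($systems_fh);
close $systems_fh;

open my $names_fh, '<', \$names_data or die "Could not open names data: $!";
print annotate_names($names_fh, %systems);
close $names_fh;
```

Using "my $line" in the while loops also sidesteps the $_ clobbering shown earlier, since nothing in these functions touches the caller's $_. To read the real files, the reference to a string would be replaced by a file name, for example open my $fh, '<', "systems.txt".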
CONCLUSION
Well, then, these are the basic tools to translate a shell script. You can do MUCH more with Perl, but that was not your question, and it wouldn't fit here anyway.
Just as a basic overview of some important topics,
A Perl script that might be attacked by hackers needs to be run with the -T option, so that Perl will complain about any vulnerable input which has not been properly handled.
There are libraries, called modules, for database access, XML and related formats, Telnet, HTTP & other protocols. In fact, there are myriads of modules which can be found at CPAN.
As mentioned by someone else, if you make use of AWK or SED, you can translate those into Perl with A2P and S2P.
Perl can be written in an Object Oriented way.
There are multiple versions of Perl. As of this writing, the stable one is 5.8.8 and there is a 5.10.0 available. There is also a Perl 6 in development, but experience has taught everyone not to wait too eagerly for it.
There is a free, good, hands-on, hard & fast book about Perl called Learning Perl The Hard Way. Its style is similar to this very answer. It might be a good place to go from here.
I hope this helped.
DISCLAIMER
I'm NOT trying to teach Perl, and you will need to have at least some reference material. There are guidelines to good Perl habits, such as using "use strict;" and "use warnings;" at the beginning of the script, to make it less lenient of badly written code, or using STDOUT and STDERR on the print lines, to indicate the correct output pipe.
This is stuff I agree with, but I decided it would detract from the basic goal of showing patterns for common shell script utilities.
I don't know what's in your shell script, but don't forget there are tools like
a2p - awk-to-perl
s2p - sed-to-perl
and perhaps more. Worth taking a look around.
You may find that due to Perl's power/features, it's not such a big job, in that you may have been jumping through hoops with various bash features and utility programs to do something that comes out of Perl natively.
Like any migration project, it's useful to have some canned regression tests to run with both solutions, so if you don't have those, I'd generate those first.
I'm surprised no-one has yet mentioned the Shell module that is included with core Perl, which lets you execute external commands using function-call syntax. For example (adapted from the synopsis):
use Shell qw(cat ps cp);
$passwd = cat '</etc/passwd';
@pslines = ps '-ww';
cp "/etc/passwd", "/tmp/passwd";
Provided you use parens, you can even call other programs in the $PATH that you didn't mention on the use line, e.g.:
gcc('-o', 'foo', 'foo.c');
Note that Shell gathers up the subprocess's STDOUT and returns it as a string or array. This simplifies scripting, but it is not the most efficient way to go and may cause trouble if you rely on a command's output being unbuffered.
The module docs mention some shortcomings, such as that shell internal commands (e.g. cd) cannot be called using the same syntax. In fact they recommend that the module not be used for production systems! But it could certainly be a helpful crutch to lean on until you get your code ported across to "proper" Perl.
The inline shell thingy is called system. If you have user-defined functions you're trying to expose to Perl, you're out of luck. However, you can run short bits of shell using the same environment as your running Perl program. You can also gradually replace parts of the shell script with Perl. Start writing a module that replicates the shell script functionality and insert Perly bits into the shell script until you eventually have mostly Perl.
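A minimal sketch of running "short bits of shell" from Perl, using the built-in system and backticks. The helper names run_command and capture_output are hypothetical, and the perl binary itself ($^X) is used as the external command only because it is guaranteed to exist wherever this runs:

```perl
use strict;
use warnings;

# Hypothetical helper: run a command (list form, no shell involved) and
# return its exit code, or -1 if it could not be started at all.
sub run_command {
    my @command = @_;
    my $status = system(@command);
    return -1 if $status == -1;
    return $status >> 8;           # the exit code is in the high byte
}

# Hypothetical helper: run a command line through the shell and return
# whatever it wrote to STDOUT.
sub capture_output {
    my ($command_line) = @_;
    my $output = `$command_line`;
    return $output;
}

my $exit_code = run_command($^X, '-e', 'exit 3');
print "exit code: $exit_code\n";

my $answer = capture_output(qq{"$^X" -e "print 6*7"});
print "captured: $answer\n";
```

With system, the child's output goes straight to the script's own STDOUT; backticks are the choice when the script needs the output as a string.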
There's no shell-to-Perl translator. There was a long running joke about a csh-to-Perl translator that you could email your script to, but that was really just Tom Christiansen translating it for you to show you how cool Perl was back in the early 90s. Randal Schwartz uploaded a sh-to-Perl translator, but you have to check the upload date: it was April Fool's day. His script merely wrapped everything in system.
Whatever you do, don't lose the original shell script. :)
I agree that learning Perl and trying to write Perl instead of shell is for the greater good. I did the transfer once with the help of the "Replace" function of Notepad++.
However, I had a similar problem to the one initially asked while I was trying to create a Perl wrapper around a shell script (that could execute it).
I came up with the following code, which works in my case.
It might help.
#!perl
use strict;
use Data::Dumper;
use Cwd;
#Variables read from shell
our %VAR;
open SH, "<$ARGV[0]" or die "Error while trying to read $ARGV[0] ($!)\n";
my @SH=<SH>;
close SH;
sh2perl(@SH);
#Subroutine to execute shell from Perl (read from array)
sub sh2perl {
#Variables
my %case; #To store data from conditional block of "case"
my %if; #To store data from conditional block of "if"
foreach my $line (@_) {
#Remove blanks at the beginning and EOL character
$line=~s/^\s*//;
chomp $line;
#Comments and blank lines
if ($line=~/^(#.*|\s*)$/) {
#Do nothing
}
#Conditional block - Case
elsif ($line=~/case.*in/..$line=~/esac/) {
if ($line=~/case\s*(.*?)\s*\in/) {
$case{'var'}=transform($1);
} elsif ($line=~/esac/) {
delete $case{'curr_pattern'};
#Run conditional block
my $case;
map { $case=$_ if $case{'var'}=~/$_/ } @{$case{'list_patterns'}};
$case ? sh2perl(@{$case{'patterns'}->{$case}}) : sh2perl(@{$case{'patterns'}->{"*"}});
} elsif ($line=~/^\s*(.*?)\s*\)/) {
$case{'curr_pattern'}=$1;
push(@{$case{'list_patterns'}}, $case{'curr_pattern'}) unless ($line=~m%\*\)%)
} else {
push(@{$case{'patterns'}->{ $case{'curr_pattern'} }}, $line);
}
}
#Conditional block - if
elsif ($line=~/^if/..$line=~/^fi/) {
if ($line=~/if\s*\[\s*(.*\S)\s*\];/) {
$if{'condition'}=transform($1);
$if{'curr_cond'}="TRUE";
} elsif ($line=~/fi/) {
delete $if{'curr_cond'};
#Run conditional block
$if{'condition'} ? sh2perl(@{$if{'TRUE'}}) : sh2perl(@{$if{'FALSE'}});
} elsif ($line=~/^else/) {
$if{'curr_cond'}="FALSE";
} else {
push(@{$if{ $if{'curr_cond'} }}, $line);
}
}
#echo
elsif($line=~/^echo\s+"?(.*?[^"])"?\s*$/) {
my $str=$1;
#echo with redirection
if ($str=~m%[>\|]%) {
eval { system(transform($line)) };
if ($@) { warn "Error while evaluating $line: $@\n"; }
#print new line
} elsif ($line=~/^echo ""$/) {
print "\n";
#default
} else {
print transform($str),"\n";
}
}
#cd
elsif($line=~/^\s*cd\s+(.*)/) {
chdir $1;
}
#export
elsif($line=~/^export\s+((\w+).*)/) {
my ($var,$exported)=($2,$1);
if ($exported=~/^(\w+)\s*=\s*(.*)/) {
while($exported=~/(\w+)\s*=\s*"?(.*?\S)"?\s*(;(?:\s*export\s+)?|$)/g) { $VAR{$1}=transform($2); }
}
# export($var,$VAR{$var});
$ENV{$var}=$VAR{$var};
print "Exported variable $var = $VAR{$var}\n";
}
#Variable assignment
elsif ($line=~/^(\w+)\s*=\s*(.*)$/) {
$1 eq "" or $VAR{$1}=""; #Empty variable
while($line=~/(\w+)\s*=\s*"?(.*?\S)"?\s*(;|$)/g) {
$VAR{$1}=transform($2);
}
}
#Source
elsif ($line=~/^source\s*(.*\.sh)/) {
open SOURCE, "<$1" or die "Error while trying to open $1 ($!)\n";
my @SOURCE=<SOURCE>;
close SOURCE;
sh2perl(@SOURCE);
}
#Default (assuming running command)
else {
eval { map { system(transform($_)) } split(";",$line); };
if ($@) { warn "Error while doing system on \"$line\": $@\n"; }
}
}
}
sub transform {
my $src=$_[0];
#Variables $1 and similar
$src=~s/\$(\d+)/$ARGV[$1-1]/ge;
#Commands stored in variables "$(<cmd>)"
eval {
while ($src=~m%\$\((.*)\)%g) {
my ($cmd,$new_cmd)=($1,$1);
my $curr_dir=getcwd;
$new_cmd=~s/pwd/echo $curr_dir/g;
$src=~s%\$\($cmd\)%`$new_cmd`%e;
chomp $src;
}
};
if ($@) { warn "Wrong assessment for variable $_[0]:\n=> $@\n"; return "ERROR"; }
#Other variables
$src=~s/\$(\w+)/$VAR{$1}/g;
#Backsticks
$src=~s/`(.*)`/`$1`/e;
#Conditions
$src=~s/"(.*?)"\s*==\s*"(.*?)"/"$1" eq "$2" ? 1 : 0/e;
$src=~s/"(.*?)"\s*!=\s*"(.*?)"/"$1" ne "$2" ? 1 : 0/e;
$src=~s/(\S+)\s*==\s*(\S+)/$1 == $2 ? 1 : 0/e;
$src=~s/(\S+)\s*!=\s*(\S+)/$1 != $2 ? 1 : 0/e;
#Return Result
return $src;
}
You could start your "Perl" script with:
#!/bin/bash
Then, assuming bash was installed at that location, perl would automatically invoke the bash interpreter to run it.
Edit: Or maybe the OS would intercept the call and stop it getting to Perl. I'm finding it hard to track down the documentation on how this actually works. Comments to documentation would be welcomed.