What are some elegant features or uses of Perl? - perl

What? Perl Beautiful? Elegant? He must be joking!
It's true, there's some ugly Perl out there. And by some, I mean lots. We've all seen it.
Well duh, it's symbol soup. Isn't it?
Yes there are symbols. Just like 'math' has 'symbols'. It's just that we programmers are more familiar with the standard mathematical symbols. We grew to accept the symbols from our mother languages, whether that be ASM, C, or Pascal. Perl just decided to have a few more.
Well, I think we should get rid of all the unnecessary symbols. Makes the code look better.
The language for doing so already exists. It's called Lisp. (and soon, perl 6.)
Okay, smart guy. Truth is, I can already invent my own symbols. They're called functions and methods. Besides, we don't want to reinvent APL.
Oh, fake alter ego, you are so funny! It's really true, Perl can be quite beautiful. It can be quite ugly, as well. With Perl, TIMTOWTDI.
So, what are your favorite elegant bits of Perl code?

Perl facilitates the use of lists/hashes to implement named parameters, which I consider very elegant and a tremendous aid to self-documenting code.
my $result = $obj->method(
flux_capacitance => 23,
general_state => 'confusion',
attitude_flags => ATTITUDE_PLEASANT | ATTITUDE_HELPFUL,
);
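On the receiving side, a common idiom is to copy the key/value list into a hash, with defaults listed first so the caller's values override them. A minimal sketch, assuming a hypothetical `describe` function and made-up defaults (neither comes from the answer above):

```perl
use strict;
use warnings;

# Hypothetical function: accepts named parameters as a flat key/value list.
sub describe {
    my %args = (
        flux_capacitance => 0,          # assumed default
        general_state    => 'unknown',  # assumed default
        @_,                             # caller's values override the defaults
    );
    return "flux=$args{flux_capacitance} state=$args{general_state}";
}

print describe( general_state => 'confusion', flux_capacitance => 23 ), "\n";
print describe(), "\n";
```

Because the parameters are named, the caller can pass them in any order and leave out the ones that have sensible defaults.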

My favourite pieces of elegant Perl code aren't necessarily elegant at all. They're meta-elegant, and allow you to get rid of all those bad habits that many Perl developers have slipped into. It would take me hours or days to show them all in the detail they deserve, but as a short list they include:
autobox, which turns Perl's primitives into first-class objects.
autodie, which causes built-ins to throw exceptions on failure (removing most needs for the or die... construct). See also my autodie blog and video.
Moose, which provides an elegant, extensible, and correct way of writing classes in Perl.
MooseX::Declare, which provides syntactic awesomeness when using Moose.
Perl::Critic, your personal, automatic, extensible and knowledgeable code reviewer. See also this Perl-tip.
Devel::NYTProf, which provides me the most detailed and usable profiling information I've seen in any programming language. See also Tim Bunce's Blog.
PAR, the Perl Archiver, for bundling distributions and even turning whole programs into stand-alone executable files. See also this Perl-tip.
Perl 5.10, which provides some stunning regexp improvements, smart-match, the switch statement, defined-or, and state variables.
Padre, the only Perl editor that integrates the best bits of the above, is cross-platform, and is completely free and open source.
If you're too lazy to follow links, I recently did a talk at Linux.conf.au about most of the above. If you missed it, there's a video of it on-line (ogg theora). If you're too lazy to watch videos, I'm doing a greatly expanded version of the talk as a tutorial at OSCON this year (entitled doing Perl right).
All the best,
Paul

I'm surprised no one mentioned the Schwartzian Transform.
my @sorted =
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, expensive_func($_) ] }
@elements;
And in the absence of a slurp operator,
my $file = do { local $/; readline $fh };
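To see the transform run end to end, here is a small sketch in which `expensive_func` is only a stand-in (string length) and the word list is made up. The point of the idiom is that the key is computed once per element, not once per comparison:

```perl
use strict;
use warnings;

# Stand-in for an expensive computation; here it is just the string length.
sub expensive_func { return length $_[0] }

sub sort_by_cost {
    my @elements = @_;
    return
        map  { $_->[0] }                     # 3. unwrap the original value
        sort { $a->[1] <=> $b->[1] }         # 2. sort on the cached key
        map  { [ $_, expensive_func($_) ] }  # 1. pair each value with its key
        @elements;
}

print join( ' ', sort_by_cost(qw(pear fig banana)) ), "\n";   # fig pear banana
```

The pipeline reads bottom to top: decorate, sort, undecorate.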

Have a list of files the user wants your program to process? Don't want to accidentally process a program, folder, or nonexistent file? Try this:
@files = grep { -T } @files;
And, like magic, you've weeded out all the inappropriate entries. Don't want to ignore them silently? Add this line before the last one:
warn "Not a file: $_" foreach grep { !-T } @files;
Prints a nice warning message for every file that it can't process to standard error. The same thing without using grep would look like this:
my @good;
foreach(@files) {
if(-T) {
push @good, $_;
} else {
warn "Not a file: $_";
}
}
grep (and map) can be used to make code shorter while still keeping it very readable.

The "or die" construct:
open my $fh, "<", $filename
or die "could not open $filename: $!";
The use of qr// to create grammars:
#!/usr/local/ActivePerl-5.10/bin/perl
use strict;
use warnings;
use feature ':5.10';
my $non_zero = qr{[1-9]};
my $zero = qr{0};
my $decimal = qr{[.]};
my $digit = qr{$non_zero+ | $zero}x;
my $non_zero_natural = qr{$non_zero+ $digit*}x;
my $natural = qr{$non_zero_natural | $zero}x;
my $integer = qr{-? $non_zero_natural | $zero}x;
my $real = qr{$integer (?: $decimal $digit)?}x;
my %number_types = (
natural => qr/^$natural$/,
integer => qr/^$integer$/,
real => qr/^$real$/
);
for my $n (0, 3.14, -5, 300, "4ever", "-0", "1.2.3") {
my @types = grep { $n =~ $number_types{$_} } keys %number_types;
if (@types) {
say "$n is of type", @types == 1 ? " ": "s ", "@types";
} else {
say "$n is not a number";
}
}
Anonymous subroutines used to factor out duplicate code:
my $body = sub {
#some amount of work
};
$body->();
$body->() while $continue;
instead of
#some amount of work
while ($continue) {
#some amount of work again
}
Hash based dispatch tables:
my %dispatch = (
foo => \&foo,
bar => \&bar,
baz => \&baz
);
while (my $name = iterator()) {
die "$name not implemented" unless exists $dispatch{$name};
$dispatch{$name}->();
}
instead of
while (my $name = iterator()) {
if ($name eq "foo") {
foo();
} elsif ($name eq "bar") {
bar();
} elsif ($name eq "baz") {
baz();
} else {
die "$name not implemented";
}
}

Three-line classes with constructors, getter/setters and type validation:
{
package Point;
use Moose;
has ['x', 'y'] => (isa => 'Num', is => 'rw');
}
package main;
my $point = Point->new( x => '8', y => '9' );
$point->x(25);

A favorite example of mine is Perl's implementation of a factorial calculator. In Perl 5, it looks like so:
use List::Util qw/reduce/;
sub factorial {
reduce { $a * $b } 1 .. $_[0];
}
This returns undef (which is false) if the argument is less than 1 or is a non-numeric string, and the factorial if a positive number is passed in (rounding down if it is a fraction).
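If an argument of 0 should give 1 instead of a false value, one option (a variation, not part of the answer above) is to seed the list with the multiplicative identity so that reduce never sees an empty list:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Seeding the list with 1 makes reduce return 1 for an empty range,
# so factorial(0) is 1 rather than undef.
sub factorial {
    my ($n) = @_;
    return reduce { $a * $b } 1, 1 .. $n;
}

print factorial(5), "\n";   # 120
print factorial(0), "\n";   # 1
```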
And looking forward to Perl 6, it looks like this:
sub factorial {
[*] 1..$^x
}
And also ( from the blog in the link above ) you can even implement this as an operator:
sub postfix:<!>(Int $x) {
[*] 1..($x || 1)
}
and then use it in your code like so:
my $fact5 = 5!;

If you have a comma separated list of flags, and want a lookup table for them, all you have to do is:
my %lookup = map { $_ => 1 } split /,/, $flags;
Now you can simply test for which flags you need like so:
if ( $lookup{FLAG} ) {
print "Ayup, got that flag!";
}

I am surprised no one has mentioned this. It's a masterpiece in my opinion:
#!/usr/bin/perl
$==$';
$;||$.| $|;$_
='*$ ( ^#(%_+&~~;# ~~/.~~
;_);;.);;#) ;~~~~;_,.~~,.* +,./|~
~;_);#-, .;.); ~ ~,./##-__);#-);~~,.*+,.
/|);;;~~#-~~~~;.~~,. /.);;.,./#~~#-;.;#~~#-;;
;;,.*+,./.);;#;./#,./ |~~~~;#-(#-__#-__&$#%^';$__
='`'&'&';$___="````" |"$[`$["|'`%",';$~=("$___$__-$[``$__"|
"$___"| ("$___$__-$[.%")).("'`"|"'$["|"'#").
'/.*?&([^&]*)&.*/$'.++$=.("/``"|"/$[`"|"/#'").(";`/[\\`\\`$__]//`;"
|";$[/[\\$[\\`$__]//`;"|";#/[\\\$\\.$__]//'").'#:=("#-","/.",
"~~",";#",";;",";.",",.",");","()","*+","__","-(","/#",".%","/|",
";_");#:{#:}=$%..$#:;'.('`'|"$["|'#')."/(..)(..)/".("```"|"``$["|
'#("').'(($:{$'.$=.'}<<'.(++$=+$=).')|($:{$'.$=.'}))/'.("```;"|
"``$[;"|"%'#;").("````'$__"|"%$[``"|"%&!,").${$[};`$~$__>&$=`;$_=
'*$(^#(%_+&#-__~~;#~~#-;.;;,.(),./.,./|,.-();;#~~#-);;;,.;_~~#-,./.,
./#,./#~~#-);;;,.(),.;.~~#-,.,.,.;_,./#,.-();;#~~#-,.;_,./|~~#-,.
,.);););#-#-__~~;#~~#-,.,.,.;_);~~~~#-);;;,.(),.*+);;# ~~#-,
./|,.*+,.,.);;;);*+~~#-,.*+,.;;,.;.,./.~~#-,.,.,.;_) ;~~~
~#-,.;;,.;.,./#,./.);*+,.;.,.;;#-__~~;#~~#-,.;;,.* +);;
#);#-,./#,./.);*+~~#-~~.%~~.%~~#-;;__,. /.);;##- __#-
__ ~~;;);/#;#.%;#/.;#-(#-__~~;;;.;_ ;#.%~~~~ ;;()
,.;.,./#,. /#,.;_~~#- ););,.;_ );~~,./ #,.
;;;./#,./| ~~~~;#-(#- __,.,.,. ;_);~~~ ~#
-~~());; #);#-,./#, .*+);;; ~~#-~~
);~~);~~ *+~~#-);-( ~~#-#-_ _~~#-
~~#-);; #,./#,.;., .;.);# -~~#-;
#/.;#-( ~~#-#-__ ~~#-~~ #-);#
-);~~, .*+,./ |);;;~ ~#-~~
;;;.; _~~#-# -__);. %;#-(
#-__# -__~~;# ~~#-;; ;#,.
;_,.. %);#-,./#, .*+,
..%, .;.,./|) ;;;)
;;#~ ~#-,.*+,. ,.~~
#-); *+,.;_);;.~ ~););
~~,.; .~~#-);~~,.;., ./.,.;
;,.*+ ,./|,.); ~~#- );;;,.(
),.*+); ;#~~/|#-
__~~;#~~ $';$;;

I absolutely love Black Perl (link to version rewritten to compile under Perl 5). It compiles, but as far as I can tell it doesn't actually do anything.
That's what you get for a language written by a linguist from a pragmatic perspective rather than from a theoretical perspective.
Moving on from that, you can think about the Perl that people complain about as pidgin Perl (perfectly useful, but not expressive, and beware of trying to express anything complex in it), and the stuff that @pjf is talking about as "proper" Perl, the language of Shakespeare, Hemingway, Hume and so on. [edit: err, though easier to read than Hume and less dated than Shakespeare.] [re-edit and hopefully less alcoholic than Hemingway]

Adding to the love of map and grep, we can write a simple command-line parser.
my %opts = map { $_ => 1 } grep { /^-/ } @ARGV;
If we want, we can set each flag to its index in @ARGV:
my %opts = map { $ARGV[$_] => $_ } grep { $ARGV[$_] =~ /^-/ } 0 .. $#ARGV;
That way, if a flag has an argument, we can get the argument like this:
if( defined( $opts{-e} ) ) {
my $arg = $ARGV[ $opts{-e} + 1 ]; # the argument is the element after the flag
# do -e stuff for $arg
}
Of course, some people will cry that we're reinventing the wheel and we should use getopt or some variant thereof, but honestly, this was a fairly easy wheel to reinvent. Plus, I don't like getopt.
If you don't like how long some of those lines are, you can always use intermediate variables or just convenient line breaks (hey, Python fanatics? You hear that? We can put one line of code across two lines and it still works!) to make it look better:
my %opts = map { $ARGV[$_] => $_ }
grep { $ARGV[$_] =~ /^-/ } 0 .. $#ARGV;
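Here is the same idea run against a fixed argument list, so it can be tried without a command line. This is only a sketch: the `flag_positions` name and the sample arguments are made up for illustration, and a flag's argument is taken to be the element right after the flag.

```perl
use strict;
use warnings;

# Map each flag (anything starting with '-') to its index in the argument list.
sub flag_positions {
    my @args = @_;
    return map { $args[$_] => $_ } grep { $args[$_] =~ /^-/ } 0 .. $#args;
}

my @args = ( '-v', '-e', 'print 1', 'file.txt' );   # stand-in for @ARGV
my %opts = flag_positions(@args);

if ( defined $opts{'-e'} ) {
    my $arg = $args[ $opts{'-e'} + 1 ];   # the element right after the flag
    print "-e argument: $arg\n";          # -e argument: print 1
}
```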

This file parsing mechanism is compact and easy to customize (skip blank lines, skip lines starting with X, etc..).
open(H_CONFIG, "< $file_name") or die("Error opening file: $file_name! ($!)");
while (<H_CONFIG>)
{
chomp; # remove the trailing newline
next if $_ =~ /^\s*$/; # skip lines that are blank
next if $_ =~ /^\s*#/; # skip lines starting with comments
# do something with the line
}
I use this type of construct in diverse build situations - where I need to either pre or post process payload files (S-records, etc..) or C-files or gather directory information for a 'smart build'.

My favourite elegant Perl feature is that it uses different operators for numerical values and string values.
my $string = 1 . 2;
my $number = "1" + "2";
my $unambiguous = 1 . "2";
Compare this to other dynamic languages such as JavaScript, where "+" is used for concatenation and addition.
var string = "1" + "2";
var number = 1 + 2;
var ambiguous = 1 + "2";
Or to dynamic languages such as Python and Ruby that require explicit type conversion between strings and numerical values.
string = "1" + "2"
number = 1 + 2
throws_exception = 1 + "2"
In my opinion Perl gets this so right and the other languages get it so wrong.

Poorer typists like me who get cramps hitting the shift key too often and have an almost irrational fear of using a semicolon started writing our Perl code in python formatted files. :)
e.g.
>>> k = 5
>>> reduce(lambda i,j: i*j, range(1,k+1),1)
120
>>> k = 0
>>> reduce(lambda i,j: i*j, range(1,k+1),1)
1

Related

How can I determine if an element exists in an array (perl)

I'm looping through an array, and I want to test if an element is found in another array.
In pseudo-code, what I'm trying to do is this:
foreach $term (@array1) {
if ($term is found in @array2) {
#do something here
}
}
I've got the "foreach" and the "do something here" parts down-pat ... but everything I've tried for the "if term is found in array" test does NOT work ...
I've tried grep:
if grep {/$term/} @array2 { #do something }
# this test always succeeds for values of $term that ARE NOT in @array2
if (grep(/$term/, @array2)) { #do something }
# this test likewise succeeds for values NOT IN the array
I've tried a couple different flavors of "converting the array to a hash" which many previous posts have indicated are so simple and easy ... and none of them have worked.
I am a long-time low-level user of perl, I understand just the basics of perl, do not understand all the fancy obfuscated code that comprises 99% of the solutions I read on the interwebs ... I would really, truly, honestly appreciate any answers that are explicit in the code and provide a step-by-step explanation of what the code is doing ...
... I seriously don't grok $_ and any other kind or type of hidden, understood, or implied value, variable, or function. I would really appreciate it if any examples or samples have all variables and functions named with clear terms ($term as opposed to $_) ... and describe with comments what the code is doing so I, in all my mentally deficient glory, may hope to possibly understand it some day. Please. :-)
...
I have an existing script which uses 'grep' somewhat successfully:
$rc=grep(/$term/, @array);
if ($rc eq 0) { #something happens here }
but I applied that EXACT same code to my new script and it simply does NOT succeed properly ... i.e., it "succeeds" (rc = zero) when it tests a value of $term that I know is NOT present in the array being tested. I just don't get it.
The ONLY difference in my 'grep' approach between 'old' script and 'new' script is how I built the array ... in old script, I built array by reading in from a file:
@array=`cat file`;
whereas in new script I put the array inside the script itself (coz it's small) ... like this:
@array=("element1","element2","element3","element4");
How can that result in different output of the grep function? They're both bog-standard arrays! I don't get it!!!! :-(
########################################################################
addendum ... some clarifications or examples of my actual code:
########################################################################
The term I'm trying to match/find/grep is a word element, for example "word123".
This exercise was just intended to be a quick-n-dirty script to find some important info from a file full of junk, so I skip all the niceties (use strict, warnings, modules, subroutines) by choice ... this doesn't have to be elegant, just simple.
The term I'm searching for is stored in a variable which is instantiated via split:
foreach $line(#array1) {
chomp($line); # habit
# every line has multiple elements that I want to capture
($term1,$term2,$term3,$term4)=split(/\t/,$line);
# if a particular one of those terms is found in my other array 'array2'
if (grep(/$term2/, @array2) {
# then I'm storing a different element from the line into a 3rd array which eventually will be outputted
push(@known, $term1) unless $seen{$term1}++;
}
}
see that grep up there? It ain't workin right ... it is succeeding for all values of $term2 even if it is definitely NOT in array2 ... array1 is a file of a couple thousand lines. The element I'm calling $term2 here is a discrete term that may be in multiple lines, but is never repeated (or part of a larger string) within any given line. Array2 is about a couple dozen elements that I need to "filter in" for my output.
...
I just tried one of the below suggestions:
if (grep $_ eq $term2, @array2)
And this grep failed for all values of $term2 ... I'm getting an all or nothing response from grep ... so I guess I need to stop using grep. Try one of those hash solutions ... but I really could use more explanation and clarification on those.
This is in perlfaq. A quick way to do it is
my %seen;
$seen{$_}++ for @array1;
for my $item (@array2) {
if ($seen{$item}) {
# item from @array2 is also in @array1, do something
}
}
If letter case is not important, you can set the keys with $seen{ lc($_) } and check with if ($seen{ lc($item) }).
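Since the question asks for code without $_ and with every step spelled out, the hash approach can be sketched like this. The sub name and the sample data are made up for illustration. Unlike grep with /$term/, which also succeeds when the term is only part of an element, a hash lookup matches whole elements only.

```perl
use strict;
use warnings;

# Return the terms from @$candidates that also occur in @$wanted (exact match).
sub terms_in_both {
    my ($candidates, $wanted) = @_;

    # Step 1: record every wanted term as a hash key.
    my %is_wanted;
    foreach my $wanted_term (@$wanted) {
        $is_wanted{$wanted_term} = 1;
    }

    # Step 2: keep each candidate whose exact text is a key in the hash.
    my @found;
    foreach my $term (@$candidates) {
        if ( exists $is_wanted{$term} ) {
            push @found, $term;
        }
    }
    return @found;
}

my @array1 = ( 'word123', 'word9', 'other' );
my @array2 = ( 'word123', 'other', 'word1' );
print join( ', ', terms_in_both( \@array1, \@array2 ) ), "\n";   # word123, other
```

Note that 'word1' in the second list does not cause a false hit on 'word123', which is what a regex match on the term would do.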
ETA:
With the changed question: If the task is to match single words in @array2 against whole lines in @array1, the task is more complicated. Trying to split the lines and match against hash keys will likely be unsafe, because of punctuation and other such things. So, a regex solution will likely be the safest.
Unless @array2 is very large, you might do something like this:
my $rx = join "|", @array2;
for my $line (@array1) {
if ($line =~ /\b(?:$rx)\b/) { # group the alternatives; word boundaries avoid partial matches
# do something
}
}
If @array2 contains meta characters, such as *?+|, you have to make sure they are escaped, in which case you'd do something like:
my $rx = join "|", map quotemeta, @array2;
# etc
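A complete version of the regex approach might look like the sketch below. The `build_word_regex` name and the sample lines are assumptions for illustration. The alternatives are wrapped in a non-capturing group so that the word boundaries apply to every alternative, not just the first and last.

```perl
use strict;
use warnings;

# Build one pattern that matches any of the given words as a whole word.
sub build_word_regex {
    my @words = @_;
    my $alternatives = join '|', map { quotemeta($_) } @words;
    return qr/\b(?:$alternatives)\b/;   # group so \b applies to every alternative
}

my $rx = build_word_regex( 'word123', 'foo' );
for my $line ( "id\tword123\tx", "id\tword1234\tx", "a foo b" ) {
    print "match: $line\n" if $line =~ $rx;
}
```

Compiling the pattern once with qr// also avoids rebuilding it for every line.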
You could use the (infamous) "smart match" operator, provided you are on 5.10 or later:
#!/usr/bin/perl
use strict;
use warnings;
my @array1 = qw/a b c d e f g h/;
my @array2 = qw/a c e g z/;
print "a in \@array1\n" if 'a' ~~ @array1;
print "z in \@array1\n" if 'z' ~~ @array1;
print "z in \@array2\n" if 'z' ~~ @array2;
The example is very simple, but you can use an RE if you need to as well.
I should add that not everyone likes ~~ because there are some ambiguities and, um, "undocumented features" (it has been flagged as experimental since Perl 5.18, so newer perls warn about it). Should be OK for this though.
This should work.
#!/usr/bin/perl
use strict;
use warnings;
my @array1 = qw/a b c d e f g h/;
my @array2 = qw/a c e g z/;
for my $term (@array1) {
if (grep $_ eq $term, @array2) {
print "$term found.\n";
}
}
Output:
a found.
c found.
e found.
g found.
#!/usr/bin/perl
@ar = ( '1','2','3','4','5','6','10' );
@arr = ( '1','2','3','4','5','6','7','8','9' ) ;
foreach $var ( @arr ){
print "$var not found\n " if ( ! ( grep /$var/, @ar )) ;
}
Pattern matching against the interpolated array is a compact way of testing for an element, though it also succeeds when $element is only part of an element. This would do the trick. Cheers!
print "$element found in the array\n" if ("@array" =~ m/$element/);
Your 'actual code' shouldn't even compile:
if (grep(/$term2/, @array2) {
should be:
if (grep (/$term2/, @array2)) {
You have unbalanced parentheses in your code. You may also find it easier to use grep with a callback (code reference) that operates on its arguments (the array.) It helps keep the parentheses from blurring together. This is optional, though. It would be:
if (grep {/$term2/} @array2) {
You may want to use strict; and use warnings; to catch issues like this.
The example below might be helpful, it tries to see if any element in @array_sp is present in @my_array:
#! /usr/bin/perl -w
@my_array = qw(20001 20003);
@array_sp = qw(20001 20002 20004);
print "@array_sp\n";
foreach $case(@my_array){
if("@array_sp" =~ m/$case/){
print "My God!\n";
}
}
Using pattern matching can solve this. Hope it helps
-QC
1. grep with eq , then
if (grep {$_ eq $term2} @array2) {
print "$term2 exists in the array";
}
2. grep with regex , then
if (grep {/$term2/} @array2) {
print "element with pattern $term2 exists in the array";
}

Converting code to perl sub, but not sure I'm doing it right

I'm working from a question I posted earlier (here), and trying to convert the answer to a sub so I can use it multiple times. Not sure that it's done right though. Can anyone provide a better or cleaner sub?
I have a good deal of experience programming, but my primary language is PHP. It's frustrating to know how to execute in one language, but not be able to do it in another.
sub search_for_key
{
my ($args) = @_;
foreach $row(@{$args->{search_ary}}){
print "@$row[0] : @$row[1]\n";
}
my $thiskey = NULL;
my @result = map { $args->{search_ary}[$_][0] } # Get the 0th column...
grep { @$args->{search_in} =~ /$args->{search_ary}[$_][1]/ } # ... of rows where the
0 .. $#array; # first row matches
$thiskey = @result;
print "\nReturning: " . $thiskey . "\n";
return $thiskey;
}
search_for_key({
'search_ary' => $ref_cam_make,
'search_in' => 'Canon EOS Rebel XSi'
});
---Edit---
From the answers so far, I've cobbled together the function below. I'm new to Perl, so I don't really understand much of the syntax. All I know is that it throws an error (Not an ARRAY reference at line 26.) about that grep line.
Since I seem to not have given enough info, I will also mention that:
I am calling this function like this (which may or may not be correct):
search_for_key({
'search_ary' => $ref_cam_make,
'search_in' => 'Canon EOS Rebel XSi'
});
And $ref_cam_make is an array I collect from a database table like this:
$ref_cam_make = $sth->fetchall_arrayref;
And it is in the structure like this (if I understood how to make the associative fetch work properly, I would like to use it like that instead of by numeric keys):
Reference Array
Associative
row[1][cam_make_id]: 13, row[1][name]: Sony
Numeric
row[1][0]: 13, row[1][1]: Sony
row[0][0]: 19, row[0][1]: Canon
row[2][0]: 25, row[2][1]: HP
sub search_for_key
{
my ($args) = @_;
foreach my $row(@{$args->{search_ary}}){
print "@$row[0] : @$row[1]\n";
}
print grep { $args->{search_in} =~ @$args->{search_ary}[$_][1] } @$args->{search_ary};
}
You are moving in the direction of a 2D array, where the [0] element is some sort of ID number and the [1] element is the camera make. Although reasonable in a quick-and-dirty way, such approaches quickly lead to unreadable code. Your project will be easier to maintain and evolve if you work with richer, more declarative data structures.
The example below uses hash references to represent the camera brands. An even nicer approach is to use objects. When you're ready to take that step, look into Moose.
use strict;
use warnings;
demo_search_feature();
sub demo_search_feature {
my @camera_brands = (
{ make => 'Canon', id => 19 },
{ make => 'Sony', id => 13 },
{ make => 'HP', id => 25 },
);
my @test_searches = (
"Sony's Cyber-shot DSC-S600",
"Canon cameras",
"Sony HPX-32",
);
for my $ts (@test_searches){
print $ts, "\n";
my @hits = find_hits($ts, \@camera_brands);
print ' => ', cb_stringify($_), "\n" for @hits;
}
}
sub cb_stringify {
my $cb = shift;
return sprintf 'id=%d make=%s', $cb->{id}, $cb->{make};
}
sub find_hits {
my ($search, $camera_brands) = @_;
return grep { $search =~ $_->{make} } @$camera_brands;
}
This whole sub is really confusing, and I'm a fairly regular perl user. Here are some blanket suggestions.
Do not create your own undef ever -- use undef then return at the bottom return $var // 'NULL'.
Do not ever do this: foreach $row, because foreach my $row is less prone to create problems. Localizing variables is good.
Do not needlessly concatenate, for it offends the style god: not this, print "\nReturning: " . $thiskey . "\n";, but print "\nReturning: $thiskey\n";, or if you don't need the first \n: say "Returning: $thiskey"; (5.10 and later)
grepping over 0 .. $#array; is categorically lame, just grep over the array: grep {} @{$foo[0]}, and with that code being so complex you almost certainly don't want grep (though I don't understand what you're doing to be honest.). Check out perldoc -q first -- in short grep doesn't stop until the end.
Lastly, do not assign an array to a scalar: $thiskey = @result; is an implicit $thiskey = scalar @result; (see perldoc -q scalar for more info). What you probably want is to return the array reference. Something like this (which eliminates $thiskey):
printf "\nReturning: %s\n", join ', ', @result;
@result ? \@result : 'NULL';
If you're intending to return whether a match is found, this code should work (inefficiently). If you're intending to return the key, though, it won't -- the scalar value of @result (which is what you're getting when you say $thiskey = @result;) is the number of items in the list, not the first entry.
$thiskey = @result; should probably be changed to $thiskey = $result[0];, if you want mostly-equivalent functionality to the code you based this off of. Note that it won't account for multiple matches anymore, though, unless you return @result in its entirety, which kinda makes more sense anyway.
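The two return styles discussed above can be sketched side by side. The sub names are hypothetical, the rows are assumed to be [id, make] pairs like the fetchall_arrayref result in the question, and index() is swapped in for the regex match so that metacharacters in a make cannot cause surprises. List::Util's first stops at the first hit, while grep scans the whole list.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Rows are [id, make] pairs, as in the question's fetchall_arrayref result.
my $rows = [ [ 19, 'Canon' ], [ 13, 'Sony' ], [ 25, 'HP' ] ];

# Return the id of the first row whose make occurs in $text, or undef.
sub first_matching_id {
    my ($rows, $text) = @_;
    my $row = first { index( $text, $_->[1] ) >= 0 } @$rows;
    return defined $row ? $row->[0] : undef;
}

# Return the ids of all rows whose make occurs in $text.
sub all_matching_ids {
    my ($rows, $text) = @_;
    return map { $_->[0] } grep { index( $text, $_->[1] ) >= 0 } @$rows;
}

print first_matching_id( $rows, 'Canon EOS Rebel XSi' ), "\n";        # 19
print join( ', ', all_matching_ids( $rows, 'Sony HPX-32' ) ), "\n";   # 13, 25
```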

Simple Perl Script: Two questions

I have a small program:
#!/user/bin/perl
use strict;
system ("clear");
my($option, $path);
do
{
print "\tEnter the number of your chosen option:\n";
print "\n";
print "\tOption\t\tCommand\n";
print "\t======\t\t=======\n";
print "\t1\t\tDate\n";
print "\t2\t\tDirectory Listing\n";
print "\t3\t\tCalendar\n";
print "\t4\t\tVi Editor\n";
print "\t5\t\tCalculator\n";
print "\t6\t\tExit\n\n";
chomp($option=<STDIN>);
SWITCH:
{
($option =="1") and do
{
system(date);
last;
};
($option =="2") and do
{
print "Enter the path:"; ############################
chomp($path=<STDIN>); #This is giving me an error#
system(ls $path); ############################
last;
};
($option =="3") and do
{
system(cal);
last;
};
($option =="4") and do
{
system(vi);
last;
};
($option =="5") and do
{
system(bc);
last;
};
}
}while ($option!=6);
print "Goodbye!\n";
sleep 2;
First question: Can anyone help me how to write the proper command to create a directory listing in case 2.
Second Question: Why do I get a loop if I use
$date = `date`;
print "$date";
instead of
system(date);
You should be able to solve a lot of your problems by remembering to put quotes around literal arguments to system():
system("date");
system("ls $path");
and the same for most other places you call system() (your first call to system("clear") is correct).
It is a quirk of Perl that calling something like system(cal) works at all, because the unquoted cal is treated as a "bareword" by Perl, which happens to be roughly equivalent to a string when passed to a function such as system(). Relying on this behaviour would be terribly bad practice, and so you should always quote literal strings.
You could read the path like:
chomp($path=<STDIN>);
system("ls $path");
Not sure why you'd get the loop for $date =date;print "$date";. But I don't think there's a date function unless you're using a package for it. You can show a time like:
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;
$year += 1900;
$mon += 1;
printf "%04d-%02d-%02d %02d:%02d:%02d",
$year, $mon, $mday,
$hour, $min, $sec;
On most unix systems perl resides in /usr/bin, without the e in user, so you might consider double-checking the first line of your script.
Your immediate problems were caused by quoting issues and the lack of use warnings in your script.
It's also worth noting that menu-driven scripts like yours are ideal candidates for dispatch tables. A dispatch table is a technique for defining actions as data. The actions are Perl subroutines. The data is usually a set of key-value pairs that end up getting stored in a hash.
The keys to the hash are the choices made by the user (menu items 1-6 in your case).
The values in the hash are called code references. There are two ways to set up these code references: (1) Directly in the dispatch table, using anonymous subroutines; or (2) using the \&foo syntax, which would create a reference to a subroutine named foo.
The handy thing about this approach is that your menu() method can be reused -- simply with a different dispatch table and a different usage message.
This example is so small that the benefit of reuse might not seem compelling, but the general technique of having data -- in the form of a dispatch table -- control program behavior is powerful in many contexts.
# Always use both of these.
use strict;
use warnings;
sub dispatch_table {
return
1 => sub { system 'date' },
2 => \&ls_path,
3 => sub { system 'cal' },
4 => sub { system 'vi' },
5 => sub { system 'bc' },
6 => sub { print "Goodbye!\n"; sleep 2 },
;
}
sub ls_path {
print "\nEnter the path: ";
chomp(my $path=<STDIN>);
# Note quoting. To be super robust, you would
# need to escape apostrophes in the path.
system "ls '$path'";
}
sub usage_message {
return "Choose wisely:
Option Command
====== =======
1 Date
2 Directory Listing
3 Calendar
4 Vi Editor
5 Calculator
6 Exit
";
}
sub menu {
system 'clear';
my %dt = dispatch_table();
my $option;
print usage_message();
while (1){
print "> ";
chomp($option = <STDIN>);
last if exists $dt{$option};
}
$dt{$option}->();
}
menu();
I can not reproduce your loop with:
$date =date;print "$date";
I doubt that is exactly how you coded it since I get a compile error with use strict;. If you can show a reduced code example which still illustrates the problem, we could help debug it further.
If you are trying to capture the output of an external command into a variable, you could use backticks or qx:
my $date = qx(date);
print "$date";
On a side note, whenever I see a series of print statements, I think here-doc:
print <<"EOF";
Enter the number of your chosen option:
Option Command
====== =======
1 Date
2 Directory Listing
etc...
EOF
A little easier to read and maintain, no?
Finally, it is also a good idea to use warnings;.
The first couple of suggestions I have are these. First, as others have already suggested, use warnings is strongly encouraged. Older Perl interpreters may require you to use the older form #!/usr/bin/perl -w as the first line of your Perl script. Second, there is a Switch module available, to make the switch statement look less ugly. I've also shown usage of subroutines to clean up the appearance of the program.
I've attached an alternative version of your script with some potential suggestions. Note it uses a slightly different alternative for switch. If available, I'd recommend using the Switch module. It includes a different way of printing the time, and of course fixes your problem with the system calls.
I hope that helps.
#!/usr/bin/perl
use strict;
use warnings; # otherwise /usr/bin/perl -w in first line
sub menu() {
print <<EOM;
Enter the number of your chosen option:
Option Command
====== =======
1 Date
2 Directory Listing
3 Calendar
4 Vi Editor
5 Calculator
6 Exit
EOM
}
sub showtime() {
my $time = localtime;
print $time,"\n";
}
sub listdir() {
my $path;
print "Enter the path: ";
chomp($path = <STDIN>);
system("ls $path");
print "\n";
}
system("clear");
my $option;
do {
menu();
chomp($option = <STDIN>);
# SWITCH:
for ($option) {
/1/ and do {
showtime();
};
/2/ and do {
listdir();
};
/3/ and do {
system("cal");
};
/4/ and do {
system("vi");
};
/5/ and do {
system("bc");
};
last;
}
} while ($option != 6);
print "Goodbye!\n";
sleep 2;

How can I translate a shell script to Perl?

I have a shell script, pretty big one. Now my boss says I must rewrite it in Perl.
Is there any way to write a Perl script and use the existing shell code as is in my Perl script. Something similar to Inline::C.
Is there something like Inline::Shell? I had a look at inline module, but it supports only languages.
I'll answer seriously. I do not know of any program to translate a shell script into Perl, and I doubt any interpreter module would provide the performance benefits. So I'll give an outline of how I would go about it.
Now, you want to reuse your code as much as possible. In that case, I suggest selecting pieces of that code, write a Perl version of that, and then call the Perl script from the main script. That will enable you to do the conversion in small steps, assert that the converted part is working, and improve gradually your Perl knowledge.
As you can call outside programs from a Perl script, you can even replace some bigger logic with Perl, and call smaller shell scripts (or other commands) from Perl to do something you don't feel comfortable yet to convert. So you'll have a shell script calling a perl script calling another shell script. And, in fact, I did exactly that with my own very first Perl script.
Of course, it's important to select well what to convert. I'll explain, below, how many patterns common in shell scripts are written in Perl, so that you can identify them inside your script, and create replacements by as much cut&paste as possible.
First, both Perl scripts and Shell scripts are code+functions. Ie, anything which is not a function declaration is executed in the order it is encountered. You don't need to declare functions before use, though. That means the general layout of the script can be preserved, though the ability to keep things in memory (like a whole file, or a processed form of it) makes it possible to simplify tasks.
A Perl script, in Unix, starts with something like this:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
#other libraries
(rest of the code)
The first line, obviously, points to the command used to run the script, just like normal shells do. The following two "use" lines make the language more strict, which should decrease the number of bugs you encounter because you don't know the language well (or plain did something wrong). The third use line imports the "Dumper" function of the "Data::Dumper" module. It's useful for debugging purposes. If you want to know the value of an array or hash table, just print Dumper(whatever).
Note also that comments are just like shell's, lines starting with "#".
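Here is a small sketch of Dumper in action. The hash and array contents are made up for the example; passing a reference (\%hash or \@array) is my own preference here, since it keeps each structure in one piece in the output.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so the dump looks the same on every run.
$Data::Dumper::Sortkeys = 1;

my %phone   = ("John Doe" => "1234-4321", "Jane Doe" => "5555-5555");
my @columns = ("name", "phone");

# Dumper returns a string, so it can be printed or stored.
my $hash_dump  = Dumper(\%phone);
my $array_dump = Dumper(\@columns);

print $hash_dump, $array_dump;
```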
Now, you call external programs and pipe to or pipe from them. For example:
open THIS, "cat $ARGV[0] |";
That will run cat, passing "$ARGV[0]", which would be $1 on shell -- the first argument passed to it. The result of that will be piped into your Perl script through "THIS", which you can use to read that from it, as I'll show later.
You can use "|" at the beginning or end of line, to indicate the mode "pipe to" or "pipe from", and specify a command to be run, and you can also use ">" or ">>" at the beginning, to open a file for writing with or without truncation, "<" to explicitly indicate opening a file for reading (the default), or "+<" and "+>" for read and write. Notice that the latter will truncate the file first.
Another syntax for "open", which will avoid problems with files with such characters in their names, is having the opening mode as a second argument:
open THIS, "-|", "cat $ARGV[0]";
This will do the same thing. The mode "-|" stands for "pipe from" and "|-" stands for "pipe to". The rest of the modes can be used as they were (>, >>, <, +>, +<). While there is more than this to open, it should suffice for most things.
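To make the modes a bit more concrete, here is a sketch that goes through write, append, read and "pipe from" on a scratch file. It assumes a Unix-like system with "cat" available; the lexical filehandles ($out, $in, ...) and the use of File::Temp are my choices for the example, not something the rest of this answer relies on.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file to play with; it is removed when the script ends.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# ">" truncates the file and writes to it.
open(my $out, ">", $path) or die "Could not write $path: $!\n";
print $out "first\n";
close $out;

# ">>" appends to whatever is already there.
open(my $app, ">>", $path) or die "Could not append to $path: $!\n";
print $app "second\n";
close $app;

# "<" reads.
open(my $in, "<", $path) or die "Could not read $path: $!\n";
my @from_file = <$in>;
close $in;

# "-|" reads from a command. With the command given as a list, no shell
# is involved, so odd characters in $path cannot be misinterpreted.
open(my $pipe, "-|", "cat", $path) or die "Could not run cat: $!\n";
my @from_pipe = <$pipe>;
close $pipe;

print "file: ", @from_file;
print "pipe: ", @from_pipe;
```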
But you should avoid calling external programs as much as possible. You could open the file directly, by doing open THIS, "$ARGV[0]";, for example, and have much better performance.
So, what external programs could you cut out? Well, almost everything. But let's stay with the basics: cat, grep, cut, head, tail, uniq, wc, sort.
CAT
Well, there isn't much to be said about this one. Just remember that, if possible, read the file only once and keep it in memory. If the file is huge you won't do that, of course, but there are almost always ways to avoid reading a file more than once.
Anyway, the basic syntax for cat would be:
my $filename = "whatever";
open FILE, "$filename" or die "Could not open $filename!\n";
while(<FILE>) {
print $_;
}
close FILE;
This opens a file, prints all its contents ("while(<FILE>)" will loop until EOF, assigning each line to "$_"), and closes it again.
If I wanted to direct the output to another file, I could do this:
my $filename = "whatever";
my $anotherfile = "another";
open (FILE, "$filename") || die "Could not open $filename!\n";
open OUT, ">", "$anotherfile" or die "Could not open $anotherfile for writing!\n";
while(<FILE>) {
print OUT $_;
}
close FILE;
close OUT;
This will print the line to the file indicated by "OUT". You can use STDIN, STDOUT and STDERR in the appropriate places as well, without having to open them first. In fact, "print" defaults to STDOUT, and "die" defaults to "STDERR".
Notice also the "or die ..." and "|| die ...". The operators or and || mean it will only execute the following command if the first returns false (which means empty string, null reference, 0, and the like). The die command stops the script with an error message.
The main difference between "or" and "||" is priority. If "or" was replaced by "||" in the examples above, it would not work as expected, because the line would be interpreted as:
open FILE, ("$filename" || die "Could not open $filename!\n");
Which is not at all what is expected. As "or" has a lower priority, it works. In the line where "||" is used, the parameters to open are passed between parentheses, making it possible to use "||".
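The same priority rule can be seen without touching any file. In this sketch, "show" is a hypothetical helper standing in for any function called without parentheses, such as open:

```perl
use strict;
use warnings;

# A stand-in for any function called without parentheses.
sub show { return "got: " . join(",", @_) }

# "||" binds tighter than the call, so this is show("" || "default").
my $with_pipes = show "" || "default";

# "or" binds looser than both the call and the assignment, so this is
# ($with_or = show("")) or die ...
my $with_or;
$with_or = show "" or die "never reached, show returned a true value\n";

print "$with_pipes\n";
print "$with_or\n";
```

The first call receives "default" as its argument, while the second receives the empty string and the "or" part only looks at what show returned.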
Also, there is something which does pretty much what cat does:
while(<>) {
print $_;
}
That will print all files in the command line, or anything passed through STDIN.
GREP
So, how would our "grep" script work? I'll assume "grep -E", because that's easier in Perl than simple grep. Anyway:
my $pattern = $ARGV[0];
shift @ARGV;
while(<>) {
print $_ if /$pattern/o;
}
The "o" after the pattern instructs Perl to compile that pattern only once, thus gaining you speed. Note the style "something if cond". It means it will only execute "something" if the condition is true. Finally, "/$pattern/", alone, is the same as "$_ =~ m/$pattern/", which means compare $_ with the regex pattern indicated. If you want plain substring matching instead (as with "grep -F"), you could write:
print $_ if index($_, $pattern) >= 0;
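If you'd rather have the two flavors as reusable functions, something like the following would do. The function names are mine, and using qr// and \Q...\E is just one way of doing it:

```perl
use strict;
use warnings;

# Return the lines matching $pattern, treated as a regular expression.
sub grep_regex {
    my ($pattern, @lines) = @_;
    my $re = qr/$pattern/;              # compiled once, up front
    return grep { /$re/ } @lines;
}

# Return the lines containing $text literally.
sub grep_fixed {
    my ($text, @lines) = @_;
    return grep { /\Q$text\E/ } @lines; # \Q...\E disables metacharacters
}

my @lines      = ("a.c\n", "abc\n", "xyz\n");
my @by_regex   = grep_regex('a.c', @lines);   # "." matches any character
my @by_literal = grep_fixed('a.c', @lines);   # "." is just a dot

print "regex:   ", @by_regex;
print "literal: ", @by_literal;
```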
CUT
Usually, you do better using regex groups to get the exact string than cut. What you would do with "sed", for instance. Anyway, here are two ways of reproducing cut:
while(<>) {
my @array = split ",";
print $array[3], "\n";
}
That will get you the fourth column of every line, using "," as separator. Note @array and $array[3]. The @ sigil means "array" should be treated as, well, an array. It will receive an array composed of each column in the currently processed line. Next, the $ sigil means array[3] is a scalar value. It will return the column you are asking for.
This is not a good implementation, though, as "split" will scan the whole string. I once reduced a process from 30 minutes to 2 seconds just by not using split -- the lines were rather large, though. Anyway, the following has superior performance if the lines are expected to be big, and the columns you want are low:
while(<>) {
my ($column) = /^(?:[^,]*,){3}([^,]*),/;
print $column, "\n";
}
This leverages regular expressions to get the desired information, and only that.
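A middle ground, if you prefer to keep using split, is to give it a limit so it stops scanning once it has what you need. This is a sketch only; "nth_column" is a name I made up, and I have not measured it against the regex above.

```perl
use strict;
use warnings;

# Return column $n (0-based) of a comma-separated line.
# The limit makes split stop after $n + 2 pieces, so the rest of a
# long line is left as one unsplit tail.
sub nth_column {
    my ($line, $n) = @_;
    chomp $line;
    my @fields = split /,/, $line, $n + 2;
    return $fields[$n];
}

print nth_column("a,b,c,d,e,f\n", 3), "\n";
```

Unlike the regex above, this also works when the wanted column is the last one on the line, since no trailing comma is required.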
If you want positional columns, you can use:
while(<>) {
print substr($_, 5, 10), "\n";
}
Which will print 10 characters starting from the sixth (again, 0 means the first character).
HEAD
This one is pretty simple:
my $printlines = abs(shift);
my $lines = 0;
my $current = "";
while(<>) {
if($ARGV ne $current) {
$lines = 0;
$current = $ARGV;
}
print "$_" if $lines < $printlines;
$lines++;
}
Things to note here. I use "ne" to compare strings. Now, $ARGV will always point to the current file, being read, so I keep track of them to restart my counting once I'm reading a new file. Also note the more traditional syntax for "if", right along with the post-fixed one.
I also use a simplified syntax to get the number of lines to be printed. When you use "shift" by itself it will assume "shift @ARGV". Also, note that shift, besides modifying @ARGV, will return the element that was shifted out of it.
As with a shell, there is no distinction between a number and a string -- you just use it. Even things like "2"+"2" will work. In fact, Perl is even more lenient, cheerfully treating anything non-number as a 0, so you might want to be careful there.
This script is very inefficient, though, as it reads the WHOLE file, not only the required lines. Let's improve it, and see a couple of important keywords in the process:
my $printlines = abs(shift);
my @files;
if(scalar(@ARGV) == 0) {
@files = ("-");
} else {
@files = @ARGV;
}
for my $file (@files) {
next unless -f $file && -r $file;
open FILE, "<", $file or next;
my $lines = 0;
while(<FILE>) {
last if $lines == $printlines;
print "$_";
$lines++;
}
close FILE;
}
The keywords "next" and "last" are very useful. First, "next" will tell Perl to go back to the loop condition, getting the next element if applicable. Here we use it to skip a file unless it is truly a file (not a directory) and readable. It will also skip if we couldn't open the file even then.
Then "last" is used to immediately jump out of a loop. We use it to stop reading the file once we have reached the required number of lines. It's true we read one line too many, but having "last" in that position shows clearly that the lines after it won't be executed.
There is also "redo", which will go back to the beginning of the loop, but without reevaluating the condition nor getting the next element.
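If reading one line too many bothers you, the count can be checked before each read. Here is a sketch of that, with "last" used for the end-of-file case; the function name and the in-memory file are assumptions made for the example.

```perl
use strict;
use warnings;

# Return up to $n lines from an open filehandle, reading no more than needed.
sub first_lines {
    my ($fh, $n) = @_;
    my @lines;
    while (@lines < $n) {
        my $line = <$fh>;
        last unless defined $line;   # end of file: leave the loop
        push @lines, $line;
    }
    return @lines;
}

# An in-memory filehandle stands in for a real file here.
my $text = "one\ntwo\nthree\nfour\n";
open(my $fh, "<", \$text) or die "Could not open in-memory file: $!\n";
my @head = first_lines($fh, 2);
my $next = <$fh>;                    # the third line is still unread
close $fh;

print @head;
```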
TAIL
I'll do a little trick here.
my $skiplines = abs(shift);
my @lines;
my $current = "";
while(<>) {
if($ARGV ne $current) {
print @lines;
undef @lines;
$current = $ARGV;
}
push @lines, $_;
shift @lines if $#lines == $skiplines;
}
print @lines;
Ok, I'm combining "push", which appends a value to an array, with "shift", which takes something from the beginning of an array. If you want a stack, you can use push/pop or shift/unshift. Mix them, and you have a queue. I keep my queue with at most $skiplines elements with $#lines, which will give me the index of the last element in the array. You could also get the number of elements in @lines with scalar(@lines).
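The queue trick on its own, away from file reading, looks roughly like this ("last_n" is just a name I picked for the example):

```perl
use strict;
use warnings;

# Keep only the last $n items seen, using push + shift as a queue.
sub last_n {
    my ($n, @items) = @_;
    my @queue;
    for my $item (@items) {
        push @queue, $item;              # add at the end
        shift @queue if @queue > $n;     # drop the oldest from the front
    }
    return @queue;
}

my @tail = last_n(3, 1 .. 10);
print "@tail\n";

# push and pop work on the same end, which gives a stack instead.
my @stack = (1, 2);
push @stack, 3;
my $top = pop @stack;
print "top of stack: $top\n";
```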
UNIQ
Now, uniq only eliminates repeated consecutive lines, which should be easy with what you have seen so far. So I'll eliminate all of them:
my $current = "";
my %lines;
while(<>) {
if($ARGV ne $current) {
undef %lines;
$current = $ARGV;
}
print $_ unless defined($lines{$_});
$lines{$_} = "";
}
Now here I'm keeping the whole file in memory, inside %lines. The use of the % sigil indicates this is a hash table. I'm using the lines as keys, and storing nothing as value -- as I have no interest in the values. I check whether the key exists with "defined($lines{$_})", which will test if the value associated with that key is defined or not; the keyword "unless" works just like "if", but with the opposite effect, so it only prints a line if the line is NOT defined.
Note, too, the syntax $lines{$_} = "" as a way to store something in a hash table. Note the use of {} for hash table, as opposed to [] for arrays.
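For the record, there is a shorter idiom for the same job that you will often meet in other people's code, plus "exists", which asks whether a key is present regardless of its value. This is a sketch with made-up data, not a replacement for the script above:

```perl
use strict;
use warnings;

# Return the items with duplicates removed, keeping first occurrences.
sub unique {
    my @items = @_;
    my %seen;
    # $seen{$_}++ is false the first time a key is met, true afterwards.
    return grep { !$seen{$_}++ } @items;
}

my @lines = ("a\n", "b\n", "a\n", "c\n", "b\n");
my @once  = unique(@lines);
print @once;

# exists only cares about the key; defined looks at the value.
my %flags = (empty => "", no_value => undef);
print "key 'no_value' exists\n"      if exists $flags{no_value};
print "but its value is undefined\n" unless defined $flags{no_value};
```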
WC
This will actually use a lot of stuff we have seen:
my $current;
my %lines;
my %words;
my %chars;
while(<>) {
$lines{"$ARGV"}++;
$chars{"$ARGV"} += length($_);
$words{"$ARGV"} += scalar(grep {$_ ne ""} split /\s/);
}
for my $file (keys %lines) {
print "$lines{$file} $words{$file} $chars{$file} $file\n";
}
Three new things. Two are the "+=" operator, which should be obvious, and the "for" expression. Basically, a "for" will assign each element of the array to the variable indicated. The "my" is there to declare the variable, though it's unneeded if declared previously. I could have an @array variable inside those parentheses. The "keys %lines" expression will return as an array the keys (the filenames) which exist for the hash table "%lines". The rest should be obvious.
The third thing, which I actually added only revising the answer, is the "grep". The format here is:
grep { code } array
It will run "code" for each element of the array, passing the element as "$_". Then grep will return all elements for which the code evaluates to "true" (not 0, not "", etc). This avoids counting empty strings resulting from consecutive spaces.
Similar to "grep" there is "map", which I won't demonstrate here. Instead of filtering, it will return an array formed by the results of "code" for each element.
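In case it helps, here is a short sketch of "map" next to "grep" (the word list is made up):

```perl
use strict;
use warnings;

my @words = ("perl", "shell", "awk");

# map: transform every element.
my @lengths = map { length($_) } @words;

# grep: keep only some elements.
my @long = grep { length($_) > 3 } @words;

# Both can be chained; read it from right to left.
my @shouted = map { uc($_) } grep { length($_) > 3 } @words;

print "@lengths\n";
print "@long\n";
print "@shouted\n";
```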
SORT
Finally, sort. This one is easy too:
my @lines;
my $current = "";
while(<>) {
if($ARGV ne $current) {
print sort @lines;
undef @lines;
$current = $ARGV;
}
push @lines, $_;
}
print sort @lines;
Here, "sort" will sort the array. Note that sort can receive a function to define the sorting criteria. For instance, if I wanted to sort numbers I could do this:
my @lines;
my $current = "";
while(<>) {
if($ARGV ne $current) {
print sort {$a <=> $b} @lines;
undef @lines;
$current = $ARGV;
}
push @lines, $_;
}
print sort {$a <=> $b} @lines;
Here "$a" and "$b" receive the elements to be compared. "<=>" returns -1, 0 or 1 depending on whether the number is less than, equal to or greater than the other. For strings, "cmp" does the same thing.
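The sorting block can be as elaborate as you need. As a sketch, this sorts "name,number" lines by the number, highest first, and falls back to alphabetical order on ties. The data and the "age" helper are invented for the example.

```perl
use strict;
use warnings;

my @lines = ("John Doe,30\n", "Jane Doe,25\n", "The Boss,30\n");

# Extract the number after the comma (the second column).
sub age {
    my ($line) = @_;
    my ($age) = $line =~ /,(\d+)/;
    return $age;
}

# Highest number first; "or" moves on to cmp only when <=> returns 0.
my @sorted = sort { age($b) <=> age($a) or $a cmp $b } @lines;

print @sorted;
```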
HANDLING FILES, DIRECTORIES & OTHER STUFF
As for the rest, basic mathematical expressions should be easy to understand. You can test certain conditions about files this way:
for my $file (@ARGV) {
print "$file is a file\n" if -f "$file";
print "$file is a directory\n" if -d "$file";
print "I can read $file\n" if -r "$file";
print "I can write to $file\n" if -w "$file";
}
I'm not trying to be exhaustive here, there are many other such tests. I can also do "glob" patterns, like shell's "*" and "?", like this:
for my $file (glob("*")) {
print $file;
print "*" if -x "$file" && ! -d "$file";
print "/" if -d "$file";
print "\t";
}
If you combine that with "chdir", you can emulate "find" as well:
sub list_dir($$) {
my ($dir, $prefix) = @_;
my $newprefix = $prefix;
if ($prefix eq "") {
$newprefix = $dir;
} else {
$newprefix .= "/$dir";
}
chdir $dir;
for my $file (glob("*")) {
print "$prefix/" if $prefix ne "";
print "$dir/$file\n";
list_dir($file, $newprefix) if -d "$file";
}
chdir "..";
}
list_dir(".", "");
Here we see, finally, a function. A function is declared with the syntax:
sub name (params) { code }
Strictly speaking, "(params)" is optional. The declared parameter I used, "($$)", means I'm receiving two scalar parameters. I could have "@" or "%" in there as well. The array "@_" has all the parameters passed. The line "my ($dir, $prefix) = @_" is just a simple way of assigning the first two elements of that array to the variables $dir and $prefix.
This function does not return anything (it's a procedure, really), but you can have functions which return values just by adding "return something;" to it, and have it return "something".
The rest of it should be pretty obvious.
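For completeness, a sketch of functions that do return something, one returning a single value and one returning a list (the function names and the path are made up):

```perl
use strict;
use warnings;

# Return a single value.
sub full_path {
    my ($dir, $file) = @_;
    return "$dir/$file";
}

# Return a list; the caller can unpack it into several variables.
sub split_path {
    my ($path) = @_;
    my ($dir, $file) = $path =~ m{^(.*)/([^/]*)$};
    return ($dir, $file);
}

my $path = full_path("/tmp", "notes.txt");
my ($dir, $file) = split_path($path);
print "$path -> $dir + $file\n";
```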
MIXING EVERYTHING
Now I'll present a more involved example. I'll show some bad code to explain what's wrong with it, and then show better code.
For this first example, I have two files: names.txt, which has names and phone numbers, and systems.txt, which has systems and the name of the person responsible for each. Here they are:
names.txt
John Doe, (555) 1234-4321
Jane Doe, (555) 5555-5555
The Boss, (666) 5555-5555
systems.txt
Sales, Jane Doe
Inventory, John Doe
Payment, That Guy
I want, then, to print the first file, with the system appended to the name of the person, if that person is responsible for that system. The first version might look like this:
#!/usr/bin/perl
use strict;
use warnings;
open FILE, "names.txt";
while(<FILE>) {
my ($name) = /^([^,]*),/;
my $system = get_system($name);
print $_ . ", $system\n";
}
close FILE;
sub get_system($) {
my ($name) = @_;
my $system = "";
open FILE, "systems.txt";
while(<FILE>) {
next unless /$name/;
($system) = /([^,]*)/;
}
close FILE;
return $system;
}
This code won't work, though. Perl will complain that the function was used too early for the prototype to be checked, but that's just a warning. It will give an error on line 8 (the first while loop), complaining about a readline on a closed filehandle. What happened here is that "FILE" is global, so the function get_system is changing it. Let's rewrite it, fixing both things:
#!/usr/bin/perl
use strict;
use warnings;
sub get_system($) {
my ($name) = @_;
my $system = "";
open my $filehandle, "systems.txt";
while(<$filehandle>) {
next unless /$name/;
($system) = /([^,]*)/;
}
close $filehandle;
return $system;
}
open FILE, "names.txt";
while(<FILE>) {
my ($name) = /^([^,]*),/;
my $system = get_system($name);
print $_ . ", $system\n";
}
close FILE;
This won't give any errors or warnings, nor will it work. It returns just the systems, but not the names and phone numbers! What happened? Well, what happened is that we are making a reference to "$_" after calling get_system, but, by reading the file, get_system is overwriting the value of $_!
To avoid that, we'll make $_ local inside get_system. This will give it a local scope, and the original value will then be restored once returned from get_system:
#!/usr/bin/perl
use strict;
use warnings;
sub get_system($) {
my ($name) = @_;
my $system = "";
local $_;
open my $filehandle, "systems.txt";
while(<$filehandle>) {
next unless /$name/;
($system) = /([^,]*)/;
}
close $filehandle;
return $system;
}
open FILE, "names.txt";
while(<FILE>) {
my ($name) = /^([^,]*),/;
my $system = get_system($name);
print $_ . ", $system\n";
}
close FILE;
And that still doesn't work! It prints a newline between the name and the system. Well, Perl reads the line including any newline it might have. There is a neat command which will remove newlines from strings, "chomp", which we'll use to fix this problem. And since not every name has a system, we might, as well, avoid printing the comma when that happens:
#!/usr/bin/perl
use strict;
use warnings;
sub get_system($) {
my ($name) = @_;
my $system = "";
local $_;
open my $filehandle, "systems.txt";
while(<$filehandle>) {
next unless /$name/;
($system) = /([^,]*)/;
}
close $filehandle;
return $system;
}
open FILE, "names.txt";
while(<FILE>) {
my ($name) = /^([^,]*),/;
my $system = get_system($name);
chomp;
print $_;
print ", $system" if $system ne "";
print "\n";
}
close FILE;
That works, but it also happens to be horribly inefficient. We read the whole systems file for every line in the names file. To avoid that, we'll read all data from systems once, and then use that to process names.
Now, sometimes a file is so big you can't read it into memory. When that happens, you should try to read into memory any other file needed to process it, so that you can do everything in a single pass for each file. Anyway, here is the first optimized version of it:
#!/usr/bin/perl
use strict;
use warnings;
our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
my ($system, $name) = /([^,]*),(.*)/;
$systems{$name} = $system;
}
close SYSTEMS;
open NAMES, "names.txt";
while(<NAMES>) {
my ($name) = /^([^,]*),/;
chomp;
print $_;
print ", $systems{$name}" if defined $systems{$name};
print "\n";
}
close NAMES;
Unfortunately, it doesn't work. No system ever appears! What has happened? Well, let's look into what "%systems" contains, by using Data::Dumper:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
my ($system, $name) = /([^,]*),(.*)/;
$systems{$name} = $system;
}
close SYSTEMS;
print Dumper(%systems);
open NAMES, "names.txt";
while(<NAMES>) {
my ($name) = /^([^,]*),/;
chomp;
print $_;
print ", $systems{$name}" if defined $systems{$name};
print "\n";
}
close NAMES;
The output will be something like this:
$VAR1 = ' Jane Doe';
$VAR2 = 'Sales';
$VAR3 = ' That Guy';
$VAR4 = 'Payment';
$VAR5 = ' John Doe';
$VAR6 = 'Inventory';
John Doe, (555) 1234-4321
Jane Doe, (555) 5555-5555
The Boss, (666) 5555-5555
Those $VAR1/$VAR2/etc are how Dumper displays a hash table. The odd numbers are the keys, and the succeeding even numbers are the values. Now we can see that each name in %systems has a preceding space! Silly regex mistake, let's fix it:
#!/usr/bin/perl
use strict;
use warnings;
our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
my ($system, $name) = /^\s*([^,]*?)\s*,\s*(.*?)\s*$/;
$systems{$name} = $system;
}
close SYSTEMS;
open NAMES, "names.txt";
while(<NAMES>) {
my ($name) = /^\s*([^,]*?)\s*,/;
chomp;
print $_;
print ", $systems{$name}" if defined $systems{$name};
print "\n";
}
close NAMES;
So, here, we are aggressively removing any spaces from the beginning or end of name and system. There are other ways to form that regex, but that's beside the point. There is still one problem with this script, which you'll have seen if your "names.txt" and/or "systems.txt" files have an empty line at the end. The warnings look like this:
Use of uninitialized value in hash element at ./exemplo3e.pl line 10, <SYSTEMS> line 4.
Use of uninitialized value in hash element at ./exemplo3e.pl line 10, <SYSTEMS> line 4.
John Doe, (555) 1234-4321, Inventory
Jane Doe, (555) 5555-5555, Sales
The Boss, (666) 5555-5555
Use of uninitialized value in hash element at ./exemplo3e.pl line 19, <NAMES> line 4.
What happened here is that nothing went into the "$name" variable when the empty line was processed. There are many ways around that, but I choose the following:
#!/usr/bin/perl
use strict;
use warnings;
our %systems;
open SYSTEMS, "systems.txt" or die "Could not open systems.txt!";
while(<SYSTEMS>) {
my ($system, $name) = /^\s*([^,]+?)\s*,\s*(.+?)\s*$/;
$systems{$name} = $system if defined $name;
}
close SYSTEMS;
open NAMES, "names.txt" or die "Could not open names.txt!";
while(<NAMES>) {
my ($name) = /^\s*([^,]+?)\s*,/;
chomp;
print $_;
print ", $systems{$name}" if defined($name) && defined($systems{$name});
print "\n";
}
close NAMES;
The regular expressions now require at least one character for name and system, and we test to see if "$name" is defined before we use it.
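One of those other ways, sketched here under my own assumptions, is to split on the comma instead of matching the whole line, and to read into a lexical variable so $_ is never touched. The "load_systems" function and the in-memory data (standing in for systems.txt) are made up for the example.

```perl
use strict;
use warnings;

# Build a name => system lookup from "System, Name" lines.
sub load_systems {
    my ($fh) = @_;
    my %systems;
    while (my $line = <$fh>) {          # a lexical $line leaves $_ alone
        chomp $line;
        my ($system, $name) = split /\s*,\s*/, $line, 2;
        next unless defined $name && $name ne "";   # skip empty/short lines
        $system =~ s/^\s+//;            # trim what split did not remove
        $name   =~ s/\s+$//;
        $systems{$name} = $system;
    }
    return %systems;
}

# Sample data, including an empty line at the end.
my $data = "Sales, Jane Doe\nInventory, John Doe\nPayment, That Guy\n\n";
open(my $fh, "<", \$data) or die "Could not open in-memory data: $!\n";
my %systems = load_systems($fh);
close $fh;

print "$_ => $systems{$_}\n" for sort keys %systems;
```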
CONCLUSION
Well, then, these are the basic tools to translate a shell script. You can do MUCH more with Perl, but that was not your question, and it wouldn't fit here anyway.
Just as a basic overview of some important topics,
A Perl script that might be attacked by hackers needs to be run with the -T option, so that Perl will complain about any vulnerable input which has not been properly handled.
There are libraries, called modules, for database access, XML and the like, Telnet, HTTP & other protocols. In fact, there are myriads of modules which can be found at CPAN.
As mentioned by someone else, if you make use of AWK or SED, you can translate those into Perl with A2P and S2P.
Perl can be written in an Object Oriented way.
There are multiple versions of Perl. As of this writing, the stable one is 5.8.8 and there is a 5.10.0 available. There is also a Perl 6 in development, but experience has taught everyone not to wait too eagerly for it.
There is a free, good, hands-on, hard & fast book about Perl called Learning Perl The Hard Way. Its style is similar to this very answer. It might be a good place to go from here.
I hope this helped.
DISCLAIMER
I'm NOT trying to teach Perl, and you will need to have at least some reference material. There are guidelines to good Perl habits, such as using "use strict;" and "use warnings;" at the beginning of the script, to make it less lenient of badly written code, or using STDOUT and STDERR on the print lines, to indicate the correct output pipe.
This is stuff I agree with, but I decided it would detract from the basic goal of showing patterns for common shell script utilities.
I don't know what's in your shell script, but don't forget there are tools like
a2p - awk-to-perl
s2p - sed-to-perl
and perhaps more. Worth taking a look around.
You may find that due to Perl's power/features, it's not such a big job, in that you may have been jumping through hoops with various bash features and utility programs to do something that comes out of Perl natively.
Like any migration project, it's useful to have some canned regression tests to run with both solutions, so if you don't have those, I'd generate those first.
I'm surprised no-one has yet mentioned the Shell module that is included with core Perl, which lets you execute external commands using function-call syntax. For example (adapted from the synopsis):
use Shell qw(cat ps cp);
$passwd = cat '</etc/passwd';
@pslines = ps '-ww';
cp "/etc/passwd", "/tmp/passwd";
Provided you use parens, you can even call other programs in the $PATH that you didn't mention on the use line, e.g.:
gcc('-o', 'foo', 'foo.c');
Note that Shell gathers up the subprocess's STDOUT and returns it as a string or array. This simplifies scripting, but it is not the most efficient way to go and may cause trouble if you rely on a command's output being unbuffered.
The module docs mention some shortcomings, such as that shell internal commands (e.g. cd) cannot be called using the same syntax. In fact they recommend that the module not be used for production systems! But it could certainly be a helpful crutch to lean on until you get your code ported across to "proper" Perl.
The inline shell thingy is called system. If you have user-defined functions you're trying to expose to Perl, you're out of luck. However, you can run short bits of shell using the same environment as your running Perl program. You can also gradually replace parts of the shell script with Perl. Start writing a module that replicates the shell script functionality and insert Perly bits into the shell script until you eventually have mostly Perl.
There's no shell-to-Perl translator. There was a long running joke about a csh-to-Perl translator that you could email your script to, but that was really just Tom Christiansen translating it for you to show you how cool Perl was back in the early 90s. Randal Schwartz uploaded a sh-to-Perl translator, but you have to check the upload date: it was April Fool's day. His script merely wrapped everything in system.
Whatever you do, don't lose the original shell script. :)
I agree that learning Perl and trying to write Perl instead of shell is for the greater good. I did the transfer once with the help of the "Replace" function of Notepad++.
However, I had a similar problem to the one initially asked while I was trying to create a Perl wrapper around a shell script (that could execute it).
I came up with the following code, which works in my case.
It might help.
#!perl
use strict;
use Data::Dumper;
use Cwd;
#Variables read from shell
our %VAR;
open SH, "<$ARGV[0]" or die "Error while trying to read $ARGV[0] ($!)\n";
my @SH=<SH>;
close SH;
sh2perl(@SH);
#Subroutine to execute shell from Perl (read from array)
sub sh2perl {
#Variables
my %case; #To store data from conditional block of "case"
my %if; #To store data from conditional block of "if"
foreach my $line (@_) {
#Remove blanks at the beginning and EOL character
$line=~s/^\s*//;
chomp $line;
#Comments and blank lines
if ($line=~/^(#.*|\s*)$/) {
#Do nothing
}
#Conditional block - Case
elsif ($line=~/case.*in/..$line=~/esac/) {
if ($line=~/case\s*(.*?)\s*in/) {
$case{'var'}=transform($1);
} elsif ($line=~/esac/) {
delete $case{'curr_pattern'};
#Run conditional block
my $case;
map { $case=$_ if $case{'var'}=~/$_/ } @{$case{'list_patterns'}};
$case ? sh2perl(@{$case{'patterns'}->{$case}}) : sh2perl(@{$case{'patterns'}->{"*"}});
} elsif ($line=~/^\s*(.*?)\s*\)/) {
$case{'curr_pattern'}=$1;
push(@{$case{'list_patterns'}}, $case{'curr_pattern'}) unless ($line=~m%\*\)%)
} else {
push(@{$case{'patterns'}->{ $case{'curr_pattern'} }}, $line);
}
}
#Conditional block - if
elsif ($line=~/^if/..$line=~/^fi/) {
if ($line=~/if\s*\[\s*(.*\S)\s*\];/) {
$if{'condition'}=transform($1);
$if{'curr_cond'}="TRUE";
} elsif ($line=~/fi/) {
delete $if{'curr_cond'};
#Run conditional block
$if{'condition'} ? sh2perl(@{$if{'TRUE'}}) : sh2perl(@{$if{'FALSE'}});
} elsif ($line=~/^else/) {
$if{'curr_cond'}="FALSE";
} else {
push(@{$if{ $if{'curr_cond'} }}, $line);
}
}
#echo
elsif($line=~/^echo\s+"?(.*?[^"])"?\s*$/) {
my $str=$1;
#echo with redirection
if ($str=~m%[>\|]%) {
eval { system(transform($line)) };
if ($@) { warn "Error while evaluating $line: $@\n"; }
#print new line
} elsif ($line=~/^echo ""$/) {
print "\n";
#default
} else {
print transform($str),"\n";
}
}
#cd
elsif($line=~/^\s*cd\s+(.*)/) {
chdir $1;
}
#export
elsif($line=~/^export\s+((\w+).*)/) {
my ($var,$exported)=($2,$1);
if ($exported=~/^(\w+)\s*=\s*(.*)/) {
while($exported=~/(\w+)\s*=\s*"?(.*?\S)"?\s*(;(?:\s*export\s+)?|$)/g) { $VAR{$1}=transform($2); }
}
# export($var,$VAR{$var});
$ENV{$var}=$VAR{$var};
print "Exported variable $var = $VAR{$var}\n";
}
#Variable assignment
elsif ($line=~/^(\w+)\s*=\s*(.*)$/) {
$1 eq "" or $VAR{$1}=""; #Empty variable
while($line=~/(\w+)\s*=\s*"?(.*?\S)"?\s*(;|$)/g) {
$VAR{$1}=transform($2);
}
}
#Source
elsif ($line=~/^source\s*(.*\.sh)/) {
open SOURCE, "<$1" or die "Error while trying to open $1 ($!)\n";
my @SOURCE=<SOURCE>;
close SOURCE;
sh2perl(@SOURCE);
}
#Default (assuming running command)
else {
eval { map { system(transform($_)) } split(";",$line); };
if ($@) { warn "Error while doing system on \"$line\": $@\n"; }
}
}
}
sub transform {
my $src=$_[0];
#Variables $1 and similar
$src=~s/\$(\d+)/$ARGV[$1-1]/ge;
#Commands stored in variables "$(<cmd>)"
eval {
while ($src=~m%\$\((.*)\)%g) {
my ($cmd,$new_cmd)=($1,$1);
my $curr_dir=getcwd;
$new_cmd=~s/pwd/echo $curr_dir/g;
$src=~s%\$\($cmd\)%`$new_cmd`%e;
chomp $src;
}
};
if ($@) { warn "Wrong assessment for variable $_[0]:\n=> $@\n"; return "ERROR"; }
#Other variables
$src=~s/\$(\w+)/$VAR{$1}/g;
#Backsticks
$src=~s/`(.*)`/`$1`/e;
#Conditions
$src=~s/"(.*?)"\s*==\s*"(.*?)"/"$1" eq "$2" ? 1 : 0/e;
$src=~s/"(.*?)"\s*!=\s*"(.*?)"/"$1" ne "$2" ? 1 : 0/e;
$src=~s/(\S+)\s*==\s*(\S+)/$1 == $2 ? 1 : 0/e;
$src=~s/(\S+)\s*!=\s*(\S+)/$1 != $2 ? 1 : 0/e;
#Return Result
return $src;
}
You could start your "Perl" script with:
#!/bin/bash
Then, assuming bash was installed at that location, perl would automatically invoke the bash interpreter to run it.
Edit: Or maybe the OS would intercept the call and stop it getting to Perl. I'm finding it hard to track down the documentation on how this actually works. Comments to documentation would be welcomed.

What should I use instead of printf in Perl?

I need to use some string replacement in Perl to ease translations, i.e. replace many
print "Outputting " . $n . " numbers";
by something like
printf ("Outputting %d numbers", $n);
However, I'd like to replace printf with something easier to parse for humans, like this:
printX ("Outputting {num} numbers", { num => $n });
or generally something more Perly.
Can you recommend something (from CPAN or not) you like and use?
What about simply:
print "Outputting $n numbers";
That's very Perly. If you don't need any kind of fancy formatting, string interpolation is definitely the way to go.
Most Templating modules on CPAN will probably do what you want. Here's an example using Template Toolkit...
use Template;
my $tt = Template->new;
$tt->process( \"Outputting [% num %] numbers\n", { num => 100 } );
And you can mimic your required example with something like this...
sub printX {
use Template;
my $tt = Template->new( START_TAG => '{', END_TAG => '}' );
$tt->process( \( $_[0] . "\n" ), $_[1] );
}
and you've got...
printX 'Outputting {num} numbers' => { num => 100 };
The print builtin is very convenient for most situations. Besides variable interpolation:
print "Outputting $n numbers"; # These two lines
print "Outputting ${n} numbers"; # are equivalent
Remember that print can take multiple arguments, so there is no need to concatenate them first into a single string if you need to print the result of a subroutine call:
print "Output data: ", Dumper($data);
However, for outputting numbers other than simple integers, you'll probably want the formatting convenience of printf. Outputting other data types is easy with print, though.
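A few of the formats I mean, as a quick sketch (the values are arbitrary; sprintf takes the same format strings as printf but returns the string instead of printing it):

```perl
use strict;
use warnings;

my $price = 3.14159;
my $count = 42;

my $two_decimals = sprintf("%.2f", $price);              # fixed decimals
my $padded       = sprintf("%5d", $count);               # right-aligned, width 5
my $zero_padded  = sprintf("%05d", $count);              # zero-filled
my $row          = sprintf("%-10s|%8.3f", "pi", $price); # left-aligned text

print "$two_decimals\n$padded\n$zero_padded\n$row\n";
```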
You can use join to conveniently output arrays:
print join ', ', @array;
And combine with map and keys to output hashes:
print join ', ', map {"$_ : $hash{$_}"} keys %hash;
Use the qq operator if you want to output quotes around the data:
print join ', ', map {qq("$_" : "$hash{$_}")} keys %hash;
If you're looking to ease translations you should consider using one of the L10n/i18n CPAN modules that are available. Specifically, a good overview of why your approach will end up falling short is written up as part of the Locale::Maketext docs.
Another great module that pairs nicely with Locale::Maketext is Locale::Maketext::Lexicon. This allows you to use more standard localization formats such as gettext's .po/.mo files which have GUI tools to help translators work through all the text that needs translating. Locale::Maketext::Lexicon also comes with a helper script (xgettext.pl) that helps keep your localization files up-to-date with your templates or modules that have text that need translating. I've had very good results with this kind of setup in the past.
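To give a rough idea of what plain Locale::Maketext looks like, here is a minimal sketch. The MyApp::L10N namespace and the French phrase are invented for illustration, and the lexicons are kept inline rather than in .po files to keep it short.

```perl
use strict;
use warnings;

# Hypothetical project namespace; any name would do.
package MyApp::L10N;
use parent 'Locale::Maketext';

package MyApp::L10N::en;
use parent -norequire, 'MyApp::L10N';
# _AUTO => 1 lets the English phrases act as their own lexicon entries.
our %Lexicon = (_AUTO => 1);

package MyApp::L10N::fr;
use parent -norequire, 'MyApp::L10N';
# Illustrative translation only.
our %Lexicon = (
    'Outputting [_1] numbers' => 'Affichage de [_1] nombres',
);

package main;

my $en = MyApp::L10N->get_handle('en') or die "No English handle\n";
my $fr = MyApp::L10N->get_handle('fr') or die "No French handle\n";

# [_1] is replaced by the first argument after the phrase.
my $english = $en->maketext('Outputting [_1] numbers', 3);
my $french  = $fr->maketext('Outputting [_1] numbers', 3);

print "$english\n$french\n";
```

The calling code only ever contains the English phrase; which lexicon gets consulted depends on the handle.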
It seems you want to have a different way of parsing strings. I would advise you not to do this. The only one who is seeing the syntax with the %d in it is the developer and he will exactly understand what is meant. The printf syntax is powerful because of the options like padding, decimals etc.
I think you want to use more a replace method. It is perlish to do s/{num}/$n/.
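Generalized a little, so that any {name} is looked up in a hash, the substitution could be sketched like this ("fill" is a made-up name, and leaving unknown placeholders untouched is my own choice):

```perl
use strict;
use warnings;

# Replace each {name} in $template with the matching value from %$vars.
# Names that are not in %$vars are left as they were.
sub fill {
    my ($template, $vars) = @_;
    $template =~ s/\{(\w+)\}/exists $vars->{$1} ? $vars->{$1} : "{$1}"/ge;
    return $template;
}

my $n = 5;
print fill("Outputting {num} numbers\n", { num => $n });
```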
In light of your comment about being for translators I suggest writing a perl script that strips all printf() and tabulates them in an easier more translator friendly manner.
Something like this:
while(<>)
{
#regex for stripping printf
#print in tabulated form
}
If you print out the line number too you can easily write another program to replace the translated text.
This solution wouldn't take you any longer than re-factoring away from printf() and it's reusable.
I would definitely stick with printf(), it's standard across many languages.
It has almost become a standard for string output. Like i is for for loops.
Generally, the answer from Draegtun is great, but if you need something smaller (i.e. less memory) and not as powerful, you can easily do it using this function:
sub printX {
my ( $format, $vars ) = @_;
my @sorted_keys = sort { length($b) <=> length($a) } keys %{ $vars };
my $re = join '|', map { "\Q$_\E" } @sorted_keys;
$format =~ s/ \{ \s* ($re) \s* \} /$vars->{$1}/xg;
print $format;
}
well, perl has printf function...
wait, do you want something like python's string formatting with dict?
>>> print '%(key)s' % {'key': 'value'}
value
mmm, I don't know if something like that exists in perl...
at least not this "easy"...
maybe Text::Sprintf::Named can be your friend