Why is this map function so complicated? - perl

When I run the below script I get
$VAR1 = 'ssh -o Compression=yes -o ConnectTimeout=333 remoteIp \'mbuffer -q -s 128k -m mbufferSize -4 -I mbufferPort|zfs recv recvOpt dstDataSet\'';
which leads me to think, that all $shellQuote does is converting an array to a string and adding a ' in the beginning and end. Plus adding a | between two arrays. But the purpose of the map function can't I figure out.
The script is a super simplified version of this in order to figure out what exactly $shellQuote does.
Question
$shellQuote looks very complicated. Does it do anything else I am missing?
#!/usr/bin/perl
use Data::Dumper;
use warnings;
use strict;
my $shellQuote = sub {
my #return;
for my $group (#_){
my #args = #$group;
for (#args){
s/'/'"'"'/g;
}
push #return, join ' ', map {/^[-\/#=_0-9a-z]+$/i ? $_ : qq{'$_'}} #args;
}
return join '|', #return;
};
sub buildRemoteRefArray {
my $remote = shift;
my #sshCmdArray = (qw(ssh -o Compression=yes -o), 'ConnectTimeout=' . '333');
if ($remote){
return [#sshCmdArray, $remote, $shellQuote->(#_)];
}
return #_;
};
my #recvCmd = buildRemoteRefArray('remoteIp', ['mbuffer', (qw(-q -s 128k -m)), 'mbufferSize', '-4', '-I', 'mbufferPort'], ['zfs', 'recv', 'recvOpt', 'dstDataSet']);
my $cmd = $shellQuote->(#recvCmd);
print Dumper $cmd;

The map function, by which I assume you mean
map {/^[-\/#=_0-9a-z]+$/i ? $_ : qq{'$_'}} #args
checks each argument to see if it is a legal shell token or not. Legal shell tokens are passed through; anything with a suspicious character gets enclosed on '' quotes.
Bear in mind that your example has two calls to $shellQuote, not just one; you're printing:
print Dumper($shellQuote->(
[
qw(ssh -o Compression=yes -o),
'ConnectTimeout=' . '333',
'remoteIp',
$shellQuote->(
[
'mbuffer',
(qw(-q -s 128k -m)),
'mbufferSize',
'-4',
'-I',
'mbufferPort',
],
[
'zfs',
'recv',
'recvOpt',
'dstDataSet',
],
),
]
));
Where I've indented the arguments to each shell command one step further than the command for clarity of the structure of the list. So your '' quotes are coming from the outer $shellQuote, which is recognizing that the inner $shellQuote has put spaces into its result; the | is comming from the inner $shellQuote, which is using them to combine the the two array refs passed to it.
Breaking the map function down, map { expr } #args means 'evaluation expr for each element of #args and make a list of the results.
/^[-\/#=_0-9a-z]+$/i ? $_ : qq{'$_'} is a ternary expression (Googleable term). $_ is the current element of #args, and /re/i is true if and only if $_ matches the given regular expression (Googleable term) (case insensitive). The whole expression means 'if the current element of #args contains only the listed characters (ASCII letters, ASCII digits, and the characters -, /, #, and =), return it as-is; otherwise return it wrapped in single quotes'.
The for loop, before that, replaces each ' in each element of #args with '"'"', which is a particular way of embedding a single quote into a single-quoted string in sh.

Ignore your code for a second and look at this one as it's a bit clearer.
# shell_quote_arg("foo bar") => 'foo bar'
sub shell_quote_arg {
my ($s) = #_;
return $s if $s !~ /[^-\/#=_0-9a-z]/i;
$s =~ s/'/'"'"'/g; # '
return qq{'$s'}
}
# shell_quote("echo", "foo bar") => echo 'foo bar'
sub shell_quote {
return join ' ', map { shell_quote_arg($_) } #_;
}
my $remote_shell_cmd1 = shell_quote('mbuffer', 'arg1a', 'arg1b');
my $remote_shell_cmd2 = shell_quote('zfs', 'arg2a', 'arg2b');
my $remote_shell_cmd = join(' | ', $remote_shell_cmd1, $remote_shell_cmd2);
my $local_shell_cmd = shell_quote('ssh', $host, $remote_shell_cmd);
My shell_quote is used to build a shell command from a program name and argument. For example,
shell_quote('zfs', 'recv', 'recvOpt', 'dstDataSet')
returns
zfs recv recvOpt dstDataSet
So why not just use join(' ', 'zfs', 'recv', 'recvOpt', 'dstDataSet')? Because characters such as spaces, $ and ' have special meaning to the shell. shell_quote needs to do extra work if these are present. For example,
shell_quote('echo', q{He's got $100})
returns
echo 'He'"'"'s got $100' # When run, uses echo to output: He's got $100
The shellQuote you showed does the same thing as my shell_quote, but it also does the join('|', ...) you see in my code.
By the way, notice that shellQuote is called twice. The first time, it's used to build the command to execute on the remote machine, as the following does:
my $remote_shell_cmd1 = shell_quote('mbuffer', 'arg1a', 'arg1b');
my $remote_shell_cmd2 = shell_quote('zfs', 'arg2a', 'arg2b');
my $remote_shell_cmd = join(' | ', $remote_shell_cmd1, $remote_shell_cmd2);
The second time, it's used to build the command to execute on the local machine, as the following does:
my $local_shell_cmd = shell_quote('ssh', $host, $remote_shell_cmd);

Related

How to get array of hash arguments using Getopt::Long lib in perl?

I want to take arguments as an array of hashes by using Getopt::Long in my script.
Consider the following command line example:
perl testing.pl --systems id=sys_1 ip_address=127.0.0.1 id=sys_2 ip_address=127.0.0.2
For the sake of simplicity, I'm using two systems and only two sub arguments of each system, i.e., id and ip_address. Ideally, the number of systems is dynamic; it may contain 1, 2 or more and so with the number of arguments of each system.
My script should handle these arguments in such a way that it will store in #systems array and each element will be a hash containing id and ip_address.
Is there any way in Getopt::Long to achieve this without parsing it myself?
Following is pseudocode for what I'm trying to achieve:
testing.pl
use Getopt::Long;
my #systems;
GetOptions('systems=s' => \#systems);
foreach (#systems) {
print $_->{id},' ', $_->{ip_address};
}
Here is an attempt, there might be more elegant solutions:
GetOptions('systems=s{1,}' => \my #temp );
my #systems;
while (#temp) {
my $value1 = shift #temp;
$value1 =~ s/^(\w+)=//; my $key1 = $1;
my $value2 = shift #temp;
$value2 =~ s/^(\w+)=//; my $key2 = $1;
push #systems, { $key1 => $value1, $key2 => $value2 };
}
for (#systems) {
print $_->{id},' ', $_->{ip_address}, "\n";
}
Output:
sys_1 127.0.0.1
sys_2 127.0.0.2
I actually think this is a design problem, more than a problem with GetOpt - the notion of supporting multiple, paired arguments passed as command line arguments I think is something that you'd be far better off avoiding.
There's a reason that GetOpt doesn't really support it - it's not a scalable solution really.
How about instead just reading the values from STDIN?:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my %systems = do { local $/; <DATA> } =~ m/id=(\w+) ip_address=([\d\.]+)/mg;
print Dumper \%systems;
And then you'd be able to invoke your script as:
perl testing.pl <filename_with_args>
Or similar.
And if you really must:
my %systems = "#ARGV" =~ m/id=(\w+) ip_address=([\d\.]+)/g;
Both of the above work for multiple parameters.
However, your comment on another post:
I can't because I'm fetching parameters from database and converting them into command line and then passing it to the script using system command $cmd_lines_args = '--system --id sys_1 --ip_address 127.0.0.0.1'; system("perl testing.pl $cmd_lines_args"); $cmd_lines_args I'll generate dynamically using for loop by reading from database
.. that makes this an XY Problem.
Don't do it like that:
open ( my $script, '|-', "perl testing.pl" );
print {$script} "id=some_id ip_address=10.9.8.7\n";
print {$script} "id=sys2 ip_address=10.9.8.7\n";
etc.
What you are describing,
--systems id=sys_1 ip_address=127.0.0.1 id=sys_2 ip_address=127.0.0.2
appears to be one option that takes a variable number of arguments that are pairs, and come in multiples of two. Getopt::Long's "Options with multiple values" lets you do the following:
GetOptions('systems=s{2,4}' => \#systems);
This lets you specify 2, 3 or 4 arguments, but it does not have syntax for "any even number of arguments" (to cover an arbitrary number of pairs beyond two), and you still have to unpack the id=sys_1 manually then. You can write a user-defined subroutine that handles the processing of --systems' arguments (but does not take into account missing id=...s):
my $system;
my %systems;
GetOptions('systems=s{,}' => sub {
my $option = shift;
my $pair = shift;
my ($key, $value) = split /=/, $pair;
$system = $value if $key eq 'id';
$systems{$system} = $value if $key eq 'ip_address';
});
I would however prefer one of the following schemes:
--system sys_1 127.0.0.1 --system sys_2 127.0.0.1
--system sys_1=127.0.0.1 --system sys_2=127.0.0.1
They're achieved with the following:
GetOptions('system=s{2}', \#systems);
GetOptions('system=s%', \#systems);
I would just parse the --systems arg and quote the "hashes" like this:
perl testing.pl --systems "id=s1 ip_address=127.0.0.1 id=s2 ip_address=127.0.0.2"
Parse like perhaps so:
my($systems,#systems);
GetOptions('systems=s' => \$systems);
for(split/\s+/,$systems){
my($key,$val)=split/=/,$_,2;
push #systems, {} if !#systems or exists$systems[-1]{$key};
$systems[-1]{$key}=$val;
}
print "$_->{id} $_->{ip_address}\n" for #systems;
Prints:
sys_1 127.0.0.1
sys_2 127.0.0.2

perl parse command line arguments using shift command

I have a question regarding parsing command line arguments and the use of the shift command in Perl.
I wanted to use this line to launch my Perl script
/home/scripts/test.pl -a --test1 -b /path/to/file/file.txt
So I want to parse the command line arguments. This is part of my script where I do that
if ($arg eq "-a") {
$main::john = shift(#arguments);
} elsif ($arg eq "-b") {
$main::doe = shift(#arguments);
}
I want to use then these arguments in a $command variable that will be executed afterwards
my $var1=$john;
my $var2=$doe;
my $command = "/path/to/tool/tool --in $line --out $outputdir $var1 $var2";
&execute($command);
Now here are two problems that I encounter:
It should not be obligatory to specify -a & -b at the command line. But what happens now is that when I don't specify -a, I get the message that I'm using an uninitialized value at the line where the variable is defined
Second problem: $var2 will now equal $doe so it will be in this case /path/to/file/file.txt. However I want $var2 to be equal to --text /path/to/file/file.txt. Where should I specify this --text. It cannot be standardly in the $command, because then it will give a problem when I don't specify -b. Should I do it when I define $doe, but how then?
You should build your command string according to the contents of the variables
Like this
my $var1 = $john;
my $var2 = $doe;
my $command = "/path/to/tool/tool --in $line --out $outputdir";
$command .= " $var1" if defined $var1;
$command .= " --text $var2" if defined $var2;
execute($command);
Also
Don't use ampersands & when you are calling Perl subroutine. That hasn't been good practice for eighteen years now
Don't use package variables like $main:xxx. Lexical variables (declared with my) are almost all that is necessary
As Alnitak says in the comment you should really be using the Getopt::Long module to avoid introducing errors into your command-line parsing
GetOpt::Long might be an option: http://search.cpan.org/~jv/Getopt-Long-2.48/lib/Getopt/Long.pm
Regarding your sample:
You didn't say what should happen if -a or -b are missing, but defaults may solve your problem:
# Use 'join' as default if $var1 is not set
my $var1 = $john // 'john';
# Use an empty value as default if $var2 is not set
my $var2 = $doe // '';
Regarding the --text prefix:
Do you want to set it always?
my $command = "/path/to/tool/tool --in $line --out $outputdir $var1 --text $var2";
Or do you want to set it if -b = $var2 has been set?
# Prefix
my $var2 = "--text $john";
# Prefix with default
my $var2 = defined $john ? "--text $john" : '';
# Same, but long format
my $var2 = ''; # Set default
if ($john) {
$var2 = "--text $john";
}

Perl confusion over grep {//} and eval {grep //} syntax

Kindly shed some light on these two ways of grep'ping in Perl as how they differ from each other
eval {grep /pattern/, ....};
and the normal one,
grep {/pattern/} ....
First of all, there are 2 independent differences between your alternatives, and they have different purposes. Wrapping the grep in eval allows you to catch errors that are normally fatal (like a syntax error in the regular expression). Putting a block after the grep keyword lets you use a matching rule that is more complex than a single expression.
Here are the 4 combinations that can be made out of your 2 examples:
#y = grep /pattern/, #x; # grep EXPR, no eval
#y = grep { /pattern/ } #x; # grep BLOCK, no eval
eval { #y = grep /pattern/, #x }; # grep EXPR inside eval BLOCK
eval { #y = grep { /pattern/ } #x }; # grep BLOCK inside eval BLOCK
Now we can look in more detail at 2 separate questions: what do you gain from the eval, and what do you gain from using the grep BLOCK syntax? In the simple cases shown above, you gain nothing from either one.
When you want to do a grep where the matching condition is more complicated than a simple regexp, grep BLOCK gives you more flexibility in how you express the condition. You can put multiple statements in the block and use temporary variables. For example this grep within a grep:
# Note: not the most efficient method for finding an intersection of arrays.
my #a = qw/A E I O U/;
my #b = qw/A B D O P Q R/;
my #intersection = grep { my $x = $_; grep { $_ eq $x } #b } #a;
print "#intersection\n";
In the above example, we needed a temporary $x to hold the value being tested by the outer grep so it could be compared to $_ in the inner grep. The inner grep could have been written without a BLOCK as grep $_ eq $x, #b but I think having using the same syntax for both looks better.
The eval block would be useful if you were looking for matches of a regexp that is determined at runtime, and you don't want your program to abort when the regexp is invalid. For example:
#x = qw/foo bar baz quux xyzzy/;
do {
print STDERR 'Enter pattern: ';
$pat = <STDIN>;
chomp $pat;
eval {
#y = grep /$pat/, #x;
};
} while($#);
print "result: #y\n";
We ask the user for a pattern and print the list of matches from #x. If the pattern is not a valid regexp, the eval catches the error and puts it into $#, and the program keeps running (The "Invalid" message is printed and the loop continues so the user can try again.) When a valid regexp is entered, there is no error so $# is false the "result" line is printed. Sample run:
Enter pattern: z$
result: baz
Enter pattern: ^(?!....)
result: foo bar baz
Enter pattern: ([^z])\1
result: foo quux
Enter pattern: [xyz
Invalid pattern
Enter pattern: [xyz]
result: baz quux xyzzy
Enter pattern: ^C
Note that eval doesn't catch syntax errors in a fixed regexp. Those are compiled when the script is compiled, so if you have a simple script like
perl -ne 'print if eval { /[xyz/ } or eval { /^ba/ }'
it fails immediately. The evals don't help. Compare to
perl -ne '$x = "[xyz"; $y = "^ba"; print if eval { /$x/ } or eval { /$y/ }'
which is the same thing but with regexps built from variables - this one runs and prints matches for /^ba/. The first eval always returns false (and sets $# which doesn't matter if you don't look at it).

Perl: How to use command line special characters (newline, tab) in $LIST_SEPARATOR ($")

I would like to use the value of a variable (fixed by a command line option for instance) as a list separator, enabling that value to be a special character (newline, tabulation, etc.).
Unfortunately the naïve approach does not work due to the fact that the two following print statement behave differentely :
my #tab = ("a","b","c");
# block 1 gives expected result:
# a
# b
# c
{
local $" = "\n"; #" let us please the color syntax engine
print "#tab";
}
# block 2 gives unwanted result:
# a\nb\nc
{
use Getopt::Long;
my $s;
GetOptions('separator=s' => \$s);
local $" = "$s"; #" let us please the color syntax engine
print "#tab";
}
Any idea I can correct the block 2 so that I get the wanted result (the one produced by block 1) ?
It actually does work the same if you assign the same string. Perl's
"\n"
creates a one character string consisting of a newline. With my shell (bash), you'd use
'
'
to do the same.
$ perl a.pl --separator='
'
a
b
ca
b
c
You didn't do this. You passed a string consisting of the two characters \ and n to Perl instead.
If you your program to convert two chars \n into a newline, you'll need to tell it to do so.
my #tab = qw( a b c );
sub handle_escapes {
my ($s) = #_;
$s =~ s/\\([\\a-z])/
$1 eq '\\' ? '\\' :
$1 eq 'n' ? "\n" :
do { warn("Unrecognised escape \\$1"); "\\$1" }
/seg;
return $s;
}
{
my $s = '\n'; #" let us please the color syntax engine
local $" = handle_escapes($s);
print "#tab";
}
{
use Getopt::Long;
my $s;
GetOptions('separator=s' => \$s);
local $" = handle_escapes($s); #" let us please the color syntax engine
print "#tab";
}
$ perl a.pl --separator='\n'
a
b
ca
b
c

Perl - How to create commands that users can input in console?

I'm just starting in Perl and I'm quite enjoying it. I'm writing some basic functions, but what I really want to be able to do is to use those functions intelligently using console commands. For example, say I have a function adding two numbers. I'd want to be able to type in console "add 2, 4" and read the first word, then pass the two numbers as parameters in an "add" function. Essentially, I'm asking for help in creating some basic scripting using Perl ^^'.
I have some vague ideas about how I might do this in VB, but Perl, I have no idea where I'd start, or what functions would be useful to me. Is there something like VB.net's "Split" function where you can break down the contents of a scalar into an array? Is there a simple way to analyse one word at a time in a scalar, or iterate through a scalar until you hit a separator, for example?
I hope you can help, any suggestions are appreciated! Bear in mind, I'm no expert, I started Perl all of a few weeks ago, and I've only been doing VB.net half a year.
Thank you!
Edit: If you're not sure what to suggest and you know any simple/intuitive resources that might be of help, that would also be appreciated.
Its rather easy to make a script which dispatches to a command by name. Here is a simple example:
#!/usr/bin/env perl
use strict;
use warnings;
# take the command name off the #ARGV stack
my $command_name = shift;
# get a reference to the subroutine by name
my $command = __PACKAGE__->can($command_name) || die "Unknown command: $command_name\n";
# execute the command, using the rest of #ARGV as arguments
# and print the return with a trailing newline
print $command->(#ARGV);
print "\n";
sub add {
my ($x, $y) = #_;
return $x + $y;
}
sub subtract {
my ($x, $y) = #_;
return $x - $y;
}
This script (say its named myscript.pl) can be called like
$ ./myscript.pl add 2 3
or
$ ./myscript.pl subtract 2 3
Once you have played with that for a while, you might want to take it further and use a framework for this kind of thing. There are several available, like App::Cmd or you can take the logic shown above and modularize as you see fit.
You want to parse command line arguments. A space serves as the delimiter, so just do a ./add.pl 2 3 Something like this:
$num1=$ARGV[0];
$num2=$ARGV[1];
print $num1 + $num2;
will print 5
Here is a short implementation of a simple scripting language.
Each statement is exactly one line long, and has the following structure:
Statement = [<Var> =] <Command> [<Arg> ...]
# This is a regular grammar, so we don't need a complicated parser.
Tokens are seperated by whitespace. A command may take any number of arguments. These can either be the contents of variables $var, a string "foo", or a number (int or float).
As these are Perl scalars, there is no visible difference between strings and numbers.
Here is the preamble of the script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
strict and warnings are essential when learning Perl, else too much weird stuff would be possible. The use 5.010 is a minimum version, it also defines the say builtin (like a print but appends a newline).
Now we declare two global variables: The %env hash (table or dict) associates variable names with their values. %functions holds our builtin functions. The values are anonymous functions.
my %env;
my %functions = (
add => sub { $_[0] + $_[1] },
mul => sub { $_[0] * $_[1] },
say => sub { say $_[0] },
bye => sub { exit 0 },
);
Now comes our read-eval-loop (we don't print by default). The readline operator <> will read from the file specified as the first command line argument, or from STDIN if no filename is provided.
while (<>) {
next if /^\s*\#/; # jump comment lines
# parse the line. We get a destination $var, a $command, and any number of #args
my ($var, $command, #args) = parse($_);
# Execute the anonymous sub specified by $command with the #args
my $value = $functions{ $command }->(#args);
# Store the return value if a destination $var was specified
$env{ $var } = $value if defined $var;
}
That was fairly trivial. Now comes some parsing code. Perl “binds” regexes to strings with the =~ operator. Regexes may look like /foo/ or m/foo/. The /x flags allows us to include whitespace in our regex that doesn't match actual whitespace. The /g flag matches globally. This also enables the \G assertion. This is where the last successful match ended. The /c flag is important for this m//gc style parsing to consume one match at a time, and to prevent the position of the regex engine in out string to being reset.
sub parse {
my ($line) = #_; # get the $line, which is a argument
my ($var, $command, #args); # declare variables to be filled
# Test if this statement has a variable declaration
if ($line =~ m/\G\s* \$(\w+) \s*=\s* /xgc) {
$var = $1; # assign first capture if successful
}
# Parse the function of this statement.
if ($line =~ m/\G\s* (\w+) \s*/xgc) {
$command = $1;
# Test if the specified function exists in our %functions
if (not exists $functions{$command}) {
die "The command $command is not known\n";
}
} else {
die "Command required\n"; # Throw fatal exception on parse error.
}
# As long as our matches haven't consumed the whole string...
while (pos($line) < length($line)) {
# Try to match variables
if ($line =~ m/\G \$(\w+) \s*/xgc) {
die "The variable $1 does not exist\n" if not exists $env{$1};
push #args, $env{$1};
}
# Try to match strings
elsif ($line =~ m/\G "([^"]+)" \s*/xgc) {
push #args, $1;
}
# Try to match ints or floats
elsif ($line =~ m/\G (\d+ (?:\.\d+)? ) \s*/xgc) {
push #args, 0+$1;
}
# Throw error if nothing matched
else {
die "Didn't understand that line\n";
}
}
# return our -- now filled -- vars.
return $var, $command, #args;
}
Perl arrays can be handled like linked list: shift removes and returns the first element (pop does the same to the last element). push adds an element to the end, unshift to the beginning.
Out little programming language can execute simple programs like:
#!my_little_language
$a = mul 2 20
$b = add 0 2
$answer = add $a $b
say $answer
bye
If (1) our perl script is saved in my_little_language, set to be executable, and is in the system PATH, and (2) the above file in our little language saved as meaning_of_life.mll, and also set to be executable, then
$ ./meaning_of_life
should be able to run it.
Output is obviously 42. Note that our language doesn't yet have string manipulation or simple assignment to variables. Also, it would be nice to be able to call functions with the return value of other functions directly. This requires some sort of parens, or precedence mechanism. Also, the language requires better error reporting for batch processing (which it already supports).