How can I pass command-line arguments to a Perl program?

I'm working on a Perl script. How can I pass command line parameters to it?
Example:
script.pl "string1" "string2"

Depends on what you want to do. If you want to use the two arguments as input files, you can just pass them in and then use <> to read their contents.
If they have a different meaning, you can use Getopt::Std and Getopt::Long to process them easily. Getopt::Std supports only single-character switches, and Getopt::Long is much more flexible. From the Getopt::Long documentation:
use Getopt::Long;
my $data = "file.dat";
my $length = 24;
my $verbose;
my $result = GetOptions ("length=i" => \$length,    # numeric
                         "file=s"   => \$data,      # string
                         "verbose"  => \$verbose);  # flag
Alternatively, @ARGV is a special variable that contains all the command-line arguments. $ARGV[0] is the first (i.e. "string1" in your case) and $ARGV[1] is the second argument. You don't need a special module to access @ARGV.
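Getopt::Std is mentioned above but not shown. Here is a minimal sketch of it; the option letters -v and -f and the sample arguments are made up for illustration, and @ARGV is localized to a fake command line so the snippet runs on its own:

```perl
use strict;
use warnings;
use Getopt::Std;

# Fake command line for the demo: script.pl -v -f data.txt string1
local @ARGV = ('-v', '-f', 'data.txt', 'string1');

my %opts;
# 'v' is a plain switch; 'f:' takes a value (the colon means "needs an argument").
# getopts() returns false if it meets an unknown option.
getopts('vf:', \%opts) or die "usage: $0 [-v] [-f file] args...\n";

print "verbose is on\n" if $opts{v};
print "file: $opts{f}\n" if defined $opts{f};
# Option processing stops at the first non-option, which stays in @ARGV.
print "left in \@ARGV: @ARGV\n";
```

Only single-letter options are possible here, which is the main reason to reach for Getopt::Long instead.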

You pass them in just like you're thinking, and in your script, you get them from the array @ARGV. Like so:
my $numArgs = $#ARGV + 1;
print "thanks, you gave me $numArgs command-line arguments.\n";
foreach my $argnum (0 .. $#ARGV) {
print "$ARGV[$argnum]\n";
}
From here.

foreach my $arg (@ARGV) {
print $arg, "\n";
}
will print each argument.

Yet another option is to use perl -s, e.g.:
#!/usr/bin/perl -s
print "value of -x: $x\n";
print "value of -name: $name\n";
Then call it like this:
% ./myprog -x -name=Jeff
value of -x: 1
value of -name: Jeff
Or see the original article for more details:

Alternatively, a sexier, more Perlish way:
my ($src, $dest) = @ARGV;
"Assumes" two values are passed. Extra code can verify the assumption is safe.

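One way to write that extra check, sketched with a hypothetical helper `two_args` (the name and the usage text are illustrative, not from any module):

```perl
use strict;
use warnings;

# Hypothetical helper: return exactly two positional arguments,
# or die with a usage message. SRC and DEST are illustrative names.
sub two_args {
    my @args = @_;
    die "usage: $0 SRC DEST\n" unless @args == 2;
    return @args;
}

# In a real script: my ($src, $dest) = two_args(@ARGV);
my ($src, $dest) = two_args('string1', 'string2');
print "src=$src dest=$dest\n";

# With the wrong number of arguments the check fires, instead of
# silently leaving $dest undefined.
eval { two_args('only-one'); 1 } or print "error: $@";
```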
You can access them directly by assigning the special variable @ARGV to a list of variables.
So, for example:
( $st, $prod, $ar, $file, $chart, $e, $max, $flag, $id) = @ARGV;
perl tmp.pl 1 2 3 4 5

If the arguments are filenames to be read from, use the diamond (<>) operator to get at their contents:
while (my $line = <>) {
process_line($line);
}
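A self-contained sketch of the diamond operator: it creates two throwaway files with File::Temp (only so there is something to read; normally the names come from the command line), puts their names in @ARGV, and reads them with <>. The special variable $ARGV holds the name of the file currently being read:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two throwaway files, created only so the example has something to read.
my @files;
for my $content ("first\nsecond\n", "third\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);   # removed at program exit
    print {$fh} $content;
    close $fh or die "close: $!";
    push @files, $name;
}

# Normally these names would arrive on the command line.
@ARGV = @files;

my $count = 0;
while (my $line = <>) {        # reads every file named in @ARGV, in order
    chomp $line;
    $count++;
    print "$ARGV: $line\n";    # $ARGV is the file currently being read
}
print "read $count lines\n";
```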
If the arguments are options/switches, use Getopt::Std or Getopt::Long, as already shown by slavy13.myopenid.com.
On the off chance that they're something else, you can access them either by walking through @ARGV explicitly or with shift:
while (defined(my $arg = shift)) {
print "Found argument $arg\n";
}
(Note that doing this with shift will only work if you are outside of all subs. Within a sub, it will retrieve the list of arguments passed to the sub rather than those passed to the program.)
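Here is that kind of loop run against a made-up argument list. The `defined` test is worth having because an argument of "0" (or an empty string) is false in Perl, so a plain `while (my $arg = shift)` would stop early:

```perl
use strict;
use warnings;

# Fake command line for the demo: script.pl a 0 b
local @ARGV = ('a', '0', 'b');

my @seen;
# Outside any sub, a bare shift would also default to @ARGV;
# it is spelled out here for clarity.
while (defined(my $arg = shift @ARGV)) {
    push @seen, $arg;
    print "Found argument $arg\n";
}
```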

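my $output_file;
if ((scalar(@ARGV) == 2) && ($ARGV[0] eq "-i"))
{
    # Arguments have no trailing newline, so chomp is unnecessary here
    # (and chomp returns a count of removed characters, not the string).
    $output_file = $ARGV[1];
}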

If you just want some values, you can just use the @ARGV array. But if you are looking for something more powerful in order to do command-line option processing, you should use Getopt::Long.

Related

Expression in backticks with Perl `'cmd'.join ...`

I would like to send the remaining @ARGV to foo. I currently do this:
my $cmd = 'foo '.join ' ', @ARGV;
my $out = `$cmd`;
Is it possible to do it in one line? For instance, with a non-existent e option:
my $out = qx/'foo'.join ' ', @ARGV/e;
In a more general case I might want to do this:
my $out = qx/'foo'.join(' ', keys %hash)/e;
The builtin readpipe function is what is at the back end of backticks/qx() calls, so you can use that directly:
my $out = readpipe('foo ' . join ' ', @ARGV);
You don't need to assemble the command prior to running it. The qx() operator (aliased by the backticks) interpolates.
perl -e 'print `echo @ARGV`' foo bar
or in your script:
my $out = `foo @ARGV`;
What "optional" says about qx and interpolation is right: Beware that double interpolation might bite you and it's prone to security issues!
Regarding your update: Try
perl -e '%h = (foo=>1,bar=>2); print `echo @{[keys %h]}`'
That constructs an anonymous array ref and immediately dereferences it. Hashes don't interpolate, but this array context allows arbitrary Perl code producing a list. Also, I'm pretty sure the compiler recognizes this idiom and optimizes away the array ref creation and dereference.
But that is really ugly, nearly unreadable from my point of view. I'd rather recommend:
my @keys = keys %hash;
my $cmd = "foo @keys";
my $out = `$cmd`;
Hint: storing the command in a dedicated variable makes logging the executed commands easier, which is really desirable.
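To see what the @{[ ... ]} idiom produces without running any shell command, this sketch only builds the command strings. The keys are sorted here (an addition, not in the answer above) so the output is predictable:

```perl
use strict;
use warnings;

my %h = (foo => 1, bar => 2);

# @{[ LIST ]} builds an anonymous array from any list expression and
# dereferences it on the spot, so the list interpolates like an array.
my $cmd = "echo @{[ sort keys %h ]}";

# The more readable alternative: copy the list into a named array first.
my @keys = sort keys %h;
my $cmd2 = "echo @keys";

print "$cmd\n";    # echo bar foo
print "$cmd2\n";   # echo bar foo
```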
Sure
my $out = capture_this_command( 'foo', @ARGV );
sub capture_this_command {
use Capture::Tiny qw/ capture /;
## local %ENV;
## delete @ENV{'PATH', 'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};
## $ENV{'PATH'} = '/bin:/usr/bin';
my @cmd = @_;
my( $stdout, $stderr, $exit ) = capture {
system { $cmd[0] } @cmd;
};
if( $exit ){
die "got the exit( $exit ) and stderr: $stderr\n ";
} elsif( $stderr ){
warn "got stderr: $stderr\n ";
}
return $stdout;
}
update:
qx// behaves like a double-quoted string: it interpolates, so everything perlintro/perlsyn/perlop say about that applies. But also remember that qx// calls your shell (to see which one you have, run perl -V:sh), and shells have their own interpolation.
So you could write my $out = qx/foo @ARGV/; but it's subject to interpolation, first by Perl, then by whatever shell you're invoking.
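If the goal is just to hand @ARGV to a program safely, a list-form pipe open sidesteps the shell entirely, so there is no second round of interpolation. This is a sketch for Unix-like systems; it uses the running perl ($^X) as a stand-in for foo so the example does not depend on foo existing, and the arguments are made up:

```perl
use strict;
use warnings;

# Made-up arguments, including characters a shell would otherwise act on.
my @args = ('hello world', '$HOME', '; echo injected');

# List-form pipe open: perl runs the program directly (no shell), and each
# element of the list becomes exactly one argument. $^X is the running perl,
# used here in place of the "foo" from the question; "--" ends its switches.
open(my $fh, '-|', $^X, '-e', 'print join("|", @ARGV)', '--', @args)
    or die "Cannot start command: $!";
my $out = do { local $/; <$fh> };
close($fh) or warn "command exited with status $?\n";

print "$out\n";    # hello world|$HOME|; echo injected
```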

How to add -v switch without needing a switch for the filename using GetOpt?

I want to be able to run my script in one of two ways:
perl script.pl file
perl script.pl -v file
I know how to do
perl script.pl -v -f file
But I want to do it without needing the -f for file.
After calling GetOptions, the remaining items in @ARGV are your positional parameters. You just need to use $ARGV[0] for "file".
use Getopt::Long;
my $verbose = 0;
my %opts = ( 'verbose' => \$verbose );
GetOptions(\%opts, 'verbose|v') or die "Incorrect options";
my $file = $ARGV[0];
die "You must provide a filename" unless length $file;
You can use Getopt::Long's argument callback:
use Getopt::Long;
my $file;
GetOptions(
'v' => \my $v,
'<>' => sub { $file = shift },
);
print "\$v: $v\n";
print "\$file: $file\n";
The command perl script.pl -v foo.txt outputs:
$v: 1
$file: foo.txt
Getopt::Long parses (by default) the items in @ARGV and removes them one by one as it processes @ARGV. After GetOptions finishes, the first item remaining in @ARGV will be the file name:
use warnings;
use strict;
use Getopt::Long;
my $verbose;
GetOptions (
'v' => \$verbose,
) or die qq(Invalid arguments passed);
my $file = shift; #Assuming a single file. Could be multiple
if ( $verbose ) {
print "Do something verbosely\n";
}
else {
print "Do it the normal way...\n";
}
Nothing special is needed. You let GetOptions handle the -v parameter if it exists, and you let @ARGV contain all of the parameters that are left after GetOptions finishes executing.
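To try the parsing without touching the real command line, GetOptionsFromArray (available in reasonably recent versions of Getopt::Long) works on any array. The helper `parse_cmdline` below is hypothetical; it accepts -v/-verbose and treats the first leftover argument as the file name:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse a list of arguments the way GetOptions
# would parse @ARGV, then treat the first leftover item as the file name.
sub parse_cmdline {
    my @args = @_;
    my $verbose = 0;
    # Options are removed from @args; positional arguments stay behind.
    GetOptionsFromArray(\@args, 'verbose|v' => \$verbose)
        or die "Invalid arguments passed\n";
    die "You must provide a filename\n" unless @args;
    return ($verbose, $args[0]);
}

for my $cmdline (['-v', 'file.txt'], ['file.txt'], ['-verb', 'file.txt']) {
    my ($verbose, $file) = parse_cmdline(@$cmdline);
    print "@$cmdline => verbose=$verbose file=$file\n";
}
```

The third case relies on Getopt::Long's default abbreviation matching, so -verb is taken as -verbose.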
By the way, you could do this:
GetOptions (
'verbose' => \$verbose,
) or die qq(Invalid arguments passed);
And you could use:
perl script.pl -v file
or
perl script.pl -verbose file
or
perl script.pl -verb file
Because, by default, GetOptions auto-abbreviates option names (the auto_abbrev configuration setting) and figures out which option the user is attempting to pass.
I highly recommend you look at the documentation and play around with it a bit. There will be a lot of stuff that won't quite make sense to you, but this is probably one of the earliest modules that new Perl programmers start to use, and it is full of all sorts of neat stuff.
And keep going back and reread the documentation as your skills develop because you'll find new stuff in this module as your understanding of Perl increases.

Bug in Perl or I don't understand something about regexp matching and perl variables?

For a long time, I thought that parameters in Perl subs are passed by value. Now I've hit something that I don't understand:
use strict;
use warnings;
use Data::Dumper;
sub p {
print STDERR "Before match: " . Data::Dumper->Dump([[@_]]) . "\n";
"1" =~ /1/;
print STDERR "After match: " . Data::Dumper->Dump([[@_]]) . "\n";
}
my $line = "jojo.tsv.bz2";
if ($line =~ /\.([a-z0-9]+)(?:\.(bz2|gz|7z|zip))?$/i) {
p($1, $2 || 'none');
p([$1, $2 || 'none']);
}
On the first invocation of p(), after the regexp match is executed, the values in @_ become undef. On the second invocation everything is OK (values passed in an array ref are not affected).
This was tested with Perl versions 5.8.8 (CentOS 5.6) and 5.12.3 (Fedora 14).
The question is: how can a regexp match destroy the contents of @_ when it was built using $1, $2, etc. (other values, if you add them, are not affected)?
The perlsub man page says:
The array @_ is a local array, but its elements are aliases for the actual scalar parameters.
So when you pass $1 to a subroutine, inside that subroutine $_[0] is an alias for $1, not a copy of $1. Therefore it gets modified by the regexp match in your p.
In general, the start of every Perl subroutine should look something like this:
my @args = @_;
...or this:
my ($arg1, $arg2) = @_;
...or this:
my $arg = shift;
And any capturing regexp should be used like this:
my ($match1, $match2) = $str =~ /my(funky)(regexp)/;
Without such disciplines, you are likely to be driven mad by subtle bugs.
As suggested, copying the args in every sub is a good idea (if only to document what they are by giving them a non-punctuation name).
However, it's also a good idea to never pass global variables; pass "$1", "$2", not $1, $2. (This applies to things like $DBI::errstr too.)
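A small sketch of the difference: the sub `aliased` reads `$_[0]` after a match of its own, while `copied` copies its argument first (both names are made up for this demo). Passing "$1" instead of $1 also protects the value, because the quotes create a copy:

```perl
use strict;
use warnings;

# Reads its argument through the @_ alias after doing a match of its own.
sub aliased {
    "1" =~ /1/;    # successful match with no groups: $1 is now undef here
    return defined $_[0] ? $_[0] : 'undef';
}

# Copies its argument into a lexical before any further matching.
sub copied {
    my ($ext) = @_;
    "1" =~ /1/;
    return defined $ext ? $ext : 'undef';
}

my ($a1, $c1, $q1);
if ("jojo.tsv" =~ /\.(\w+)$/) {
    $a1 = aliased($1);      # $_[0] is $1 itself, so it sees the inner match
    $c1 = copied($1);       # value was copied before the inner match
    $q1 = aliased("$1");    # "$1" passes a fresh string, not the variable
}
print "aliased=$a1 copied=$c1 quoted=$q1\n";
```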
I am not quite sure why this happens but I would say you should use something like
my $arg1 = shift;
my $arg2 = shift;
and use $arg1 and $arg2 in your sub.
Using the Perl debugger, you will see that @_ looks different in the two sub calls:
1st call, before match:
x @_
0  'tsv'
1  'bz2'
After match:
x @_
0  undef
1  undef
I think this was overwritten by the match.
2nd call, before match:
x @_
0  ARRAY(0xc2b6e0)
   0  'tsv'
   1  'bz2'
After match:
x @_
0  ARRAY(0xc2b6e0)
   0  'tsv'
   1  'bz2'
So maybe this wasn't overwritten because of the different structure(?).
Hope this helps a little.

Perl assignment with a dummy placeholder

In other languages I've used like Erlang and Python, if I am splitting a string and don't care about one of the fields, I can use an underscore placeholder. I tried this in Perl:
(_,$id) = split('=',$fields[1]);
But I get the following error:
Can't modify constant item in list assignment at ./generate_datasets.pl line 17, near ");"
Execution of ./generate_datasets.pl aborted due to compilation errors.
Does Perl have a similar pattern that I could use instead of creating useless temporary variables?
undef serves the same purpose in Perl.
(undef, $something, $otherthing) = split(' ', $str);
You don't even need placeholders if you use slices:
use warnings;
use strict;
my ($id) = (split /=/, 'foo=id123')[1];
print "$id\n";
__END__
id123
You can assign to (undef).
(undef, my $id) = split(/=/, $fields[1]);
You can even use my (undef).
my (undef, $id) = split(/=/, $fields[1]);
You could also use a list slice.
my $id = ( split(/=/, $fields[1]) )[1];
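All three forms side by side, using made-up input in @fields:

```perl
use strict;
use warnings;

my @fields = ('name=foo', 'key=id123');   # made-up input for the demo

# 1. undef as a placeholder in the list being assigned to
(undef, my $id1) = split /=/, $fields[1];
# 2. the same thing inside my(...)
my (undef, $id2) = split /=/, $fields[1];
# 3. a list slice: pick element 1 and skip the placeholder entirely
my $id3 = (split /=/, $fields[1])[1];

print "$id1 $id2 $id3\n";   # id123 id123 id123
```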
And just to explain why you get the particular error that you see...
_ is an internal Perl filehandle that can be used with stat and the file test operators to mean "the same file as in the previous stat call". That way Perl reuses the cached stat data structure and doesn't make another stat call.
if (-x $file and -r _) { ... }
This filehandle is a constant value and can't be assigned to. It is stored in the same typeglob as $_ and @_.
See perldoc -f stat.
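A short sketch of the _ filehandle in use. The throwaway file from File::Temp exists only so there is a known file to stat:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway file so the example has something known to stat.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close $fh or die "close: $!";

my $size;
# -e performs a stat(); -r _ and -s _ reuse that cached result
# instead of asking the operating system again.
if (-e $file and -r _) {
    $size = -s _;
    print "$file is readable, $size bytes\n";
}
```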

Perl and Environment Variables

Some of the environment variables which we use in Unix are as below (just an example):
VAR1=variable1
VAR2=variable2
VAR3=variable3
# and so on
Now, I have a Perl script (let's call it test.pl) which reads a tab-delimited text file (let's call it test.txt) and pushes its contents column-wise into separate arrays. The first column of test.txt contains information like the following (the strings in the first column are delimited by /, but I do not know how many / a string will contain or at what position the environment variable will appear):
$VAR1/$VAR2/$VAR3
$VAR3/some_string/SOME_OTHER_STRING/and_so_on/$VAR2
$VAR2/$VAR1/some_string/some_string_2/some_string_3/some_string_n/$VAR2
An extract of the script is below:
use strict;
my $input0 = shift or die "must provide test.txt as the argument 0\n";
open(IN0,"<",$input0) || die "Cannot open $input0 for reading: $!";
my @first_column;
while (<IN0>)
{
chomp;
my @cols = split(/\t/);
my $first_col = `eval $cols[0]`; #### but this does not work
# here goes the push stmt to populate the array
### more code here
}
close(IN0);
Question: How can I access environment variables in such a situation so that the array is populated as below:
$first_column[0] = variable1/variable2/variable3
$first_column[1] = variable3/some_string/SOME_OTHER_STRING/and_so_on/variable2
$first_column[2] = variable2/variable1/some_string/some_string_2/some_string_3/some_string_n/variable2
I think you are looking for a way to process configuration files. I like Config::Std for that purpose although there are many others on CPAN.
Here is a way of processing just the contents of $cols[0] to show in an explicit way what you need to do with it:
#!/usr/bin/perl
use strict; use warnings;
# You should not type this. I am assuming the
# environment variables are defined in the environment.
# They are here for testing.
@ENV{qw(VAR1 VAR2 VAR3)} = qw(variable1 variable2 variable3);
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
chomp $line;
my @components = split qr{/}, $line;
for my $c ( @components ) {
if ( my ($var) = $c =~ m{^\$(\w+)\z} ) {
if ( exists $ENV{$var} ) {
$c = $ENV{$var};
}
}
}
print join('/', @components), "\n";
}
__DATA__
$VAR1/$VAR2/$VAR3
$VAR3/some_string/SOME_OTHER_STRING/and_so_on/$VAR2
$VAR2/$VAR1/some_string/some_string_2/some_string_3/some_string_n/$VAR2
Instead of the split/join, you can use s/// to replace patterns that look like variables with the corresponding values in %ENV. For illustration, I put a second column in the __DATA__ section which is supposed to stand for a description of the path, and turned each line into a hashref. Note that I factored out the actual substitution into eval_path so you can try alternatives without messing with the main loop:
#!/usr/bin/perl
use strict; use warnings;
# You should not type this. I am assuming the
# environment variables are defined in the environment.
# They are here for testing.
@ENV{qw(VAR1 VAR2 VAR3)} = qw(variable1 variable2 variable3);
my @config;
while ( my $config = <DATA> ) {
last unless $config =~ /\S/;
chomp $config;
my @cols = split /\t/, $config;
$cols[0] = eval_path( $cols[0] );
push @config, { $cols[1] => $cols[0] };
}
use YAML;
print Dump \@config;
sub eval_path {
my ($path) = @_;
$path =~ s{\$(\w+)}{ exists $ENV{$1} ? $ENV{$1} : $1 }ge;
return $path;
}
__DATA__
$VAR1/$VAR2/$VAR3 Home sweet home
$VAR3/some_string/SOME_OTHER_STRING/and_so_on/$VAR2 Man oh man
$VAR2/$VAR1/some_string/some_string_2/some_string_3/some_string_n/$VAR2 Can't think of any other witty remarks ;-)
Output:
---
- Home sweet home: variable1/variable2/variable3
- Man oh man: variable3/some_string/SOME_OTHER_STRING/and_so_on/variable2
- Can't think of any other witty remarks ;-): variable2/variable1/some_string/some_string_2/some_string_3/some_string_n/variable2
I think you just want to do this:
my @cols = map { s/(\$(\w+))/ $ENV{$2} || $1 /ge; $_ } split /\t/;
After splitting, this takes each sequence of '$' followed by word characters and checks whether there is an environment variable named by the word portion; if there isn't, the text is left as it is.
The e switch on a substitution allows you to execute code for the replacement value.
If any environment variable might have the value '0', it's better to use the defined-or operator, which was introduced in Perl 5.10:
my @cols = map { s|(\$(\w+))| $ENV{$2} // $1 |ge; $_ } split /\t/;
(Here // is the defined-or operator, not a C-style comment.)
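The difference between || and // only shows when a variable is set to a false value. This sketch uses a made-up variable DEMO_ZERO set to "0" (it needs Perl 5.10 or later for //):

```perl
use strict;
use warnings;

# Hypothetical variable, set only for this demo.
$ENV{DEMO_ZERO} = '0';

my $line = '$DEMO_ZERO/bin';

# || treats "0" as false and falls back to the original text;
# // only falls back when the variable is undefined.
(my $with_or  = $line) =~ s{(\$(\w+))}{ $ENV{$2} || $1 }ge;
(my $with_dor = $line) =~ s{(\$(\w+))}{ $ENV{$2} // $1 }ge;

print "$with_or\n";     # $DEMO_ZERO/bin
print "$with_dor\n";    # 0/bin
```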
If you want to allow full shell expansions, one option is to let the shell do the expansion for you, perhaps via echo:
$ cat input
$FOO
bar
${FOO//cat/dog}
$ FOO=cat perl -wpe '$_ = qx"echo $_"' input
cat
bar
dog
If you cannot trust the contents of the environment variable, this introduces a security risk, as invoking qx on a string may cause the shell to invoke commands embedded in the string. As a result, this scriptlet will not run under taint mode (-T).
Perl keeps its environment variables in the %ENV hash. If a column holds a bare variable name (such as VAR1), you can look it up like so:
my $first_col = $ENV{$cols[0]};