Difference between "printf" and "print sprintf" - perl

The following two simple perl programs have different behaviors:
#file1
printf @ARGV;
#file2
$tmp = sprintf @ARGV;
print $tmp;
$> perl file1 "hi %04d %.2f" 5 7.12345
#output: hi 0005 7.12
$> perl file2 "hi %04d %.2f" 5 7.12345
#output: 3
Why the difference? I had thought the two programs were equivalent. Is there a way to make file2 (using "sprintf") behave like file1?

The builtin sprintf function has a prototype:
$ perl -e 'print prototype("CORE::sprintf")'
$@
It treats the first argument as a scalar. Since you provided the argument @ARGV, it was coerced into a scalar by passing the number of elements in @ARGV instead.
Since the printf function has to support the syntax printf HANDLE TEMPLATE,LIST as well as printf TEMPLATE,LIST, it cannot support a prototype. So it always treats its arguments as a flat list, and uses the first element in the list as the template.
One way to make the second script work correctly would be to call it like
$tmp = sprintf shift @ARGV, @ARGV
Another difference between printf and print sprintf is that print sprintf appends $\ to the output, while printf does not (thanks, ysth).
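As a minimal sketch of the scalar-context behavior described above: the array @args below is a stand-in for @ARGV (an assumption, so the example runs without command-line arguments), and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for @ARGV so the sketch runs without command-line arguments.
my @args = ('hi %04d %.2f', 5, 7.12345);

# sprintf's prototype ($@) puts the array in scalar context,
# so the format string becomes the element count.
my $count_as_format = sprintf @args;

# Split the template off explicitly, then pass the remaining values.
my ($template, @values) = @args;
my $formatted = sprintf $template, @values;

print "$count_as_format\n";   # prints 3
print "$formatted\n";         # prints hi 0005 7.12
```

Copying the template into its own scalar leaves the original array untouched, unlike the shift form, which consumes the first element.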

@ARGV contains the arguments passed to the script in list form. printf takes that list, uses its first element as the format, and fills in the remaining elements.
In the second example you pass the array to sprintf, which evaluates it in scalar context. That means the length of the array is used as the format, and that is what ends up in $tmp. Hence you get 3 as output.

From the perl docs (jaypal said it already)
Unlike printf, sprintf does not do what you probably mean when you pass it an array as your first argument. The array is given scalar context, and instead of using the 0th element of the array as the format, Perl will use the count of elements in the array as the format, which is almost never useful.

Related

Aggregate very big numbers by reading a file in perl

There is a file which contains more than 20 million records. I need to use perl to aggregate the numbers and print the TOTAL on the last line. The numbers that I am supposed to aggregate are very big numbers and they could be positive or negative. I am using bignum module of perl to aggregate the numbers. However, it is not showing the correct results. Please advise.
sample.txt (in reality, this file contains more than 20 million records):
12345678910111213.00
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
Expected output (my perl one liner is showing incorrect TOTAL on the last line):
12345678910111213.00
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
TOTAL=<<total_should_be_printed>>
The perl one liner I am using:
perl -Mbignum -ne 'BEGIN{my $sum=0;} s/\r?\n$//; $sum=$sum+$_; print "$_\n"; END{print "TOTAL=$sum"."\n";}' sample.txt
The perl one-liner is showing the TOTAL as 40648913273951600.00 and this is INCORRECT.
EDIT: Following one-liner is showing 40648913273951631.2469 as answer. Now it is really getting weird......
perl -Mbignum -e 'my $num1=Math::BigFloat->new("12345678910111213.00"); my $num2=Math::BigFloat->new("14151617181920212.12345"); my $num3=Math::BigFloat->new("23242526272829301.54321"); my $num4=Math::BigFloat->new("32333435363738394.23456"); my $num5=Math::BigFloat->new("-41424344454647489.65432"); my $sum=$num1+$num2+$num3+$num4+$num5; print $sum."\n";'
Please verify calculation based on Math::BigFloat module.
use strict;
use warnings;
use feature 'say';
use Math::BigFloat;
my $sum = Math::BigFloat->new(0);
$sum->precision(-5);
while( <DATA> ) {
my $x = Math::BigFloat->new($_);
$sum->badd($x);
say $x;
}
say "\nSUM: $sum";
exit 0;
__DATA__
12345678910111213.00
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
Output
12345678910111213
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
SUM: 40648913273951631.24690
The main job of the bignum pragma is to turn literal numbers into Math::BigInt or Math::BigFloat objects. Once assigned to a variable, that variable will also be an object, and any arithmetic operations carried out using it will be done using the operator overloading of those classes.
Since you are reading values from a file, they won't automatically be converted into Math::BigInt values. So you need something else to be the object, in this case $sum. By initialising to the literal 0 value as you have done, $sum becomes an object. Unfortunately you declare my $sum within the scope of the BEGIN block. Outside of this scope, $sum refers to a different package variable, which hasn't been initialised into an object.
So you need to declare the variable outside of the BEGIN, or add a literal zero to it to coerce it into an object:
perl -Mbignum -lne' $sum += 0+$_; END {print $sum}'
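A script-form sketch of the same idea, assuming the records arrive as strings: each one is turned into a Math::BigFloat object explicitly, so the sum never passes through a native floating-point number. The helper name exact_total is hypothetical.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper: sum numeric strings exactly with Math::BigFloat.
sub exact_total {
    my @records = @_;
    my $total = Math::BigFloat->new(0);
    for my $record (@records) {
        $record =~ s/\r?\n\z//;                       # strip LF or CRLF line endings
        $total->badd(Math::BigFloat->new($record));   # exact in-place addition
    }
    return $total;
}

# The sample records from the question.
my @sample = qw(
    12345678910111213.00
    14151617181920212.12345
    23242526272829301.54321
    32333435363738394.23456
    -41424344454647489.65432
);

my $total = exact_total(@sample);
print "TOTAL=$total\n";
```

For a 20-million-line file, the same loop body can run inside while (<$fh>) so the records are never all held in memory at once.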

Filehandle stored in hash variable reading as GLOB [duplicate]

Code
$ cat test1
hello
i am
lazer
nananana
$ cat 1.pl
use strict;
use warnings;
my @fh;
open $fh[0], '<', 'test1', or die $!;
my @res1 = <$fh[0]>; # Way1: why does this not work as expected?
print @res1."\n";
my $fh2 = $fh[0];
my @res2 = <$fh2>; # Way2: this works!
print @res2."\n";
Run
$ perl 1.pl
1
5
$
I am not sure why Way1 does not work as expected while Way2 does. Aren't those two methods the same? What is happening here?
Because of the dual nature of the <> operator (i.e. is it glob or readline?), the rules are that to behave as readline, you can only have a bareword or a simple scalar inside the brackets. So you'll have to either assign the array element to a simple scalar (as in your example), or use the readline function directly.
Because from perlop:
If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means <$x> is always a readline() from an indirect handle, but <$hash{key}> is always a glob().
You can spell the <> operator as readline instead to avoid problems with this magic.
Anything more complex than a bareword (interpreted as a file handle) or a simple scalar $var is interpreted as an argument to the glob() function. Only barewords and simple scalars are treated as file handles to be iterated by the <...> operator.
Basically the rules are:
<bareword> ~~ readline bareword
<$scalar> ~~ readline $scalar
<$array[0]> ~~ glob "$array[0]"
<anything else> ~~ glob ...
It's because <$fh[0]> is parsed as glob($fh[0]).
Use readline instead:
my @res1 = readline($fh[0]);
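Both workarounds can be tried side by side. This is a sketch that assumes an in-memory file (opened on a reference to $text) instead of the test1 file from the question, so it runs without anything on disk.

```perl
use strict;
use warnings;

# In-memory file so the sketch needs nothing on disk (an assumption for the demo).
my $text = "hello\ni am\nlazer\n";

my @fh;
open $fh[0], '<', \$text or die "cannot open in-memory file: $!";

# readline() accepts any expression that yields a filehandle.
my @via_readline = readline($fh[0]);

# Rewind, then read again through a simple scalar copy of the handle.
seek($fh[0], 0, 0) or die "cannot rewind: $!";
my $copy = $fh[0];
my @via_scalar = <$copy>;

print scalar(@via_readline), "\n";   # prints 3
print scalar(@via_scalar), "\n";     # prints 3
```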

Using Perl command line arguments with both while(<>) and $ARGV[1], possibly using shift?

I basically want to do something like
while(<>){
my ($one, $two, $three) = split;
if ($one > $ARGV[1]){
#some commands
}
}
Where I would invoke it like
./script.pl text.txt 50
But obviously I don't want the while loop to try to read anything from 50.
Any ideas on the cleanest way to do this? For example, could I shift the command line arguments somehow?
<> reads from the ARGV filehandle, which you can think of as a concatenation of all the files named in @ARGV. The ARGV filehandle won't be initialized until <> is called, so it is safe to manipulate @ARGV before your while loop.
my $val = pop @ARGV; # take last argument
# my ($val) = splice @ARGV, 1, 1; # alternative: take 2nd argument
...
while (<>) { # now ARGV fh uses whatever is currently in @ARGV
my ($one,$two,$three) = split;
if ($one > $val) { ... }
}
Also note that if @ARGV is empty, the <> operator will read from standard input. So long as you empty the @ARGV array before you try to read from <>, something like this will also work:
./script.pl 50 < text.txt
The typical way to do this is to assign @ARGV to a list of variables, so the script documents what you expect to be passed:
my ($file, $count) = @ARGV;
Of course, this doesn't actually check the validity of what - if anything - is actually passed. If you want that, there are many options processing modules to choose from. Many like Getopt::Long but I prefer Getopt::Lucid. YMMV.
As with most things Perl, Gabor Szabo has a great page about @ARGV. You may find this quote from it useful:
How to extract the command line arguments from @ARGV
@ARGV is just a regular array in Perl. The only difference from arrays that you create, is that it does not need to be declared and it is populated by Perl when your script starts.
Aside from these issues, you can handle it as a regular array. You can go over the elements using foreach, or access them one by one using an index: $ARGV[0].
You can also use shift, unshift, pop or push on this array.
Indeed, not only can you fetch the content of @ARGV, you can also change it.
If you expect a single value on the command line you can check what it was, or whether it was provided at all, by looking at $ARGV[0]. If you expect two values you will also check $ARGV[1].
I recommend you rearrange the order of the arguments
./script.pl 50 text.txt
Putting the file name(s) last is a more common practice, and it simplifies the needed code to the following:
my $limit = shift(@ARGV);
while (<>) {
my #fields = split;
if ($fields[0] > $limit) {
...
}
}
The trick is to remove everything but the file names from @ARGV before while (<>).
This practice has the additional advantage that the following will simply read from STDIN:
./script.pl 50
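The shift-then-read pattern can be exercised end to end. In this sketch the input file is created with File::Temp and @ARGV is assigned by hand to simulate ./script.pl 50 FILE; the sample data and variable names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small input file so the sketch is self-contained (demo data is made up).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "10 a b\n60 c d\n75 e f\n";
close $out or die "cannot close $path: $!";

# Simulate the invocation: ./script.pl 50 FILE
@ARGV = (50, $path);

my $limit = shift @ARGV;   # @ARGV now holds only file names

my @matches;
while (<>) {               # reads the files left in @ARGV
    my ($first) = split;
    push @matches, $first if $first > $limit;
}

print "@matches\n";        # prints 60 75
```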

Command line argument variable @ARGV

I have a task to convert script from Perl to PowerShell.
I read about the command line argument array of Perl: @ARGV. I understand that when the script is executed, any arguments passed to it are captured in this special array variable. We can read @ARGV and assign values to scalar variables using:
($var1,$var2) = @ARGV;
I need to understand what the statement below is doing:
($var1,$var2,@ARGV) = @ARGV;
In my script, I have an if condition on values in @ARGV, and based on those values, a respective subroutine is called.
As I understand it, if @ARGV holds more than two values, the @ARGV on the left side in the parentheses rewrites @ARGV with the remaining values. Is that right?
It chops off the first two arguments from @ARGV and puts them in $var1 and $var2.
Personally I would have written it as:
$var1 = shift @ARGV;
$var2 = shift @ARGV;
But it is a matter of taste.
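A small sketch of the list assignment in question, with @args standing in for @ARGV (an assumption, so it runs without command-line arguments):

```perl
use strict;
use warnings;

# @args stands in for @ARGV so the sketch runs without command-line arguments.
my @args = qw(alpha beta gamma delta);

my ($var1, $var2);
# The first two values go to the scalars; the array on the left
# receives whatever is left over.
($var1, $var2, @args) = @args;

print "$var1 $var2\n";   # prints alpha beta
print "@args\n";         # prints gamma delta
```

If fewer than two values are present, the unfilled scalars are left undef and the array ends up empty.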

Why can't I use a typeglob in the diamond operator in Perl?

Usually a bareword filehandle, or a variable that holds a filehandle, can be placed inside the <> operator to read from the file, but NOT a filehandle extracted from a typeglob, as the last line below shows. Why doesn't it work, given that the last case also references a filehandle?
open FILE, 'file.txt';
my $myfile = *FILE{IO};
print <$myfile>;
print <*FILE{IO}>; # this line doesn't work.
<> is, among other things, a shortcut for readline(), and it accepts only simple scalars or a bareword, i.e. <FILE>. For more complex expressions you have to be more explicit,
print readline *FILE{IO};
otherwise it will be interpreted as glob()
perl -MO=Deparse -e 'print <*FILE{IO}>;'
use File::Glob ();
print glob('*FILE{IO}');
In perlop, it says:
If what's within the angle brackets is neither a filehandle nor a
simple scalar variable containing a filehandle name, typeglob, or
typeglob reference, it is interpreted as a filename pattern to be
globbed ...
Since we want to be able to say things like:
foreach (<*.c>) {
# Do something for each file that matches *.c
}
it is not possible for perl to interpret the '*' as meaning a typeglob.
As noted in the other answer, you can work around this using readline, or you can assign the typeglob to a scalar first (as your example shows).