There is a file which contains more than 20 million records. I need to use perl to aggregate the numbers and print the TOTAL on the last line. The numbers that I am supposed to aggregate are very big numbers and they could be positive or negative. I am using bignum module of perl to aggregate the numbers. However, it is not showing the correct results. Please advise.
sample.txt (in reality, this file contains more than 20 million records):
12345678910111213.00
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
Expected output (my perl one liner is showing incorrect TOTAL on the last line):
12345678910111213.00
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
TOTAL=<<total_should_be_printed>>
The perl one liner I am using:
perl -Mbignum -ne 'BEGIN{my $sum=0;} s/\r?\n$//; $sum=$sum+$_; print "$_\n"; END{print "TOTAL=$sum"."\n";}' sample.txt
The perl one-liner is showing the TOTAL as 40648913273951600.00 and this is INCORRECT.
EDIT: Following one-liner is showing 40648913273951631.2469 as answer. Now it is really getting weird......
perl -Mbignum -e 'my $num1=Math::BigFloat->new("12345678910111213.00"); my $num2=Math::BigFloat->new("14151617181920212.12345"); my $num3=Math::BigFloat->new("23242526272829301.54321"); my $num4=Math::BigFloat->new("32333435363738394.23456"); my $num5=Math::BigFloat->new("-41424344454647489.65432"); my $sum=$num1+$num2+$num3+$num4+$num5; print $sum."\n";'
Please verify calculation based on Math::BigFloat module.
use strict;
use warnings;
use feature 'say';
use Math::BigFloat;
my $sum = Math::BigFloat->new(0);
$sum->precision(-5);
while( <DATA> ) {
my $x = Math::BigFloat->new($_);
$sum->badd($x);
say $x;
}
say "\nSUM: $sum";
exit 0;
__DATA__
12345678910111213.00
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
Output
12345678910111213
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
SUM: 40648913273951631.24690
The main job of the bignum pragma is to turn literal numbers into Math::BigInt or Math::BigFloat objects. Once such a literal is assigned to a variable, that variable holds an object, and any arithmetic carried out with it goes through the modules' operator overloading.
Since you are reading values from a file, they won't automatically be converted into Math::BigInt values. So you need something else to be the object, in this case $sum. By initialising to the literal 0 value as you have done, $sum becomes an object. Unfortunately you declare my $sum within the scope of the BEGIN block. Outside of this scope, $sum refers to a different package variable, which hasn't been initialised into an object.
So you need to declare the variable outside of the BEGIN, or add a literal zero to it to coerce it into an object:
perl -Mbignum -lne' $sum += 0+$_; END {print $sum}'
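To make the role of the object concrete, here is a minimal sketch (not the asker's code; the variable names are illustrative) that adds the first two sample values twice: once into a plain scalar and once into a Math::BigFloat accumulator. It assumes only the core Math::BigFloat module.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Two values as they would arrive from the file: plain strings.
my @inputs = ("12345678910111213.00", "14151617181920212.12345");

# Plain scalar accumulator: the strings are converted to native doubles,
# which cannot hold this many significant digits.
my $native = 0;
$native += $_ for @inputs;

# Math::BigFloat accumulator: the overloaded += converts each string
# to an object and adds it exactly.
my $exact = Math::BigFloat->new(0);
$exact += $_ for @inputs;

printf "native: %.5f\n", $native;
print  "exact:  $exact\n";
```

Only the second accumulator keeps the fractional digits, because one operand being an object is enough to route the addition through the overloaded operator.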
Perl substitute all Numbers to Alphabet
abc4xyz5u
to
abcdxyzeu
I tried this, but it does not work:
echo 'abc4xyz5u' | perl -pe'@n=1..9;@a=a..j;@h{@n}=@a;s#$n[$_]#$h{$&}#g for 0..$#n'
I know y/[1-9]/[a-j]/, but I want to use a substitute.
Your issue is within
s#$n[$_]#$h{$&}#g for 0..$#n
You expect $_ to be your input (so that s### is applied on it), but also $n[$_] to use the $_ from the for loop (0 to $#n). If you were to add a print, you'd notice that $_'s value within this loop is 0 to $#n, rather than your input.
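A small sketch of that aliasing, written as a script rather than a one-liner (the array name @seen is made up for illustration): the for statement modifier temporarily makes $_ the loop value and restores the outer $_ afterwards.

```perl
use strict;
use warnings;

$_ = 'abc4xyz5u';            # stands in for the line that -p reads

my @seen;
push @seen, $_ for 0 .. 2;   # inside the modifier, $_ is the loop value

print "inside the loop: @seen\n";
print "after the loop:  $_\n";
```

So any s### placed under the modifier operates on the loop index, not on the input line.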
What you could do instead to fix it is something like:
$r=$_; $r=~s#$n[$_]#$h{$&}#g for 0..$#n; $_=$r
But that's much more complicated than it has to be. I would instead do:
s#([1-9])#$h{$1}#g
Or, without using %h (since, let's face it, a hash with 0 => a, 1 => b etc. should be an array):
perl -pe '@a="a".."j"; s#([1-9])#$a[$1-1]#g'
Or, without requiring an array at all (I'll let you decide if you find it easier or harder to read; personally I'm fine with it),
perl -pe 's/([1-9])/chr(ord("a")+$1-1)/ge'
I would suggest writing it properly as a Perl script.
The one-liner you mentioned is a little hard to understand.
use strict;
use warnings;
my @alphabets = ("a".."z");
my $input = $ARGV[0];
$input =~ s/(\d)/$alphabets[$1 - 1]/g;
print $input;
Run -
perl substitute.pl abc4xyz5u
Output -
abcdxyzeu
I am searching for each digit in the string and replacing it with the letter at the same position in the 'alphabets' array (remember that arrays start from index 0, hence 'position - 1').
The following two simple perl programs have different behaviors:
#file1
printf @ARGV;
#file2
$tmp = sprintf @ARGV;
print $tmp;
$> perl file1 "hi %04d %.2f" 5 7.12345
#output: hi 0005 7.12
$> perl file2 "hi %04d %.2f" 5 7.12345
#output: 3
Why the difference? I had thought the two programs were equivalent. I wonder if there is a way to make file2 (using "sprintf") behave like file1.
The builtin sprintf function has a prototype:
$ perl -e 'print prototype("CORE::sprintf")'
$@
It treats the first argument as a scalar. Since you provided the argument @ARGV, it was coerced into a scalar by passing the number of elements in @ARGV instead.
Since the printf function has to support the syntax printf HANDLE TEMPLATE,LIST as well as printf TEMPLATE,LIST, it cannot support a prototype. So it always treats its arguments as a flat list, and uses the first element in the list as the template.
One way to make the second script work correctly would be to call it like
$tmp = sprintf shift @ARGV, @ARGV
Another difference between printf and sprintf is that print sprintf appends $\ to the output, while printf does not (thanks, ysth).
@ARGV contains the arguments passed to the script in list form. printf takes that list and uses its first element as the format for the rest.
In the second example you are using sprintf with the array and assigning the result to a scalar. sprintf evaluates the array in scalar context, so the array's length becomes the format string and is what ends up in $tmp. Hence you get 3 as output.
From the perl docs (jaypal said it already)
Unlike printf, sprintf does not do what you probably mean when you pass it an array as your first argument. The array is given scalar context, and instead of using the 0th element of the array as the format, Perl will use the count of elements in the array as the format, which is almost never useful.
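The behaviour quoted above can be checked with a short sketch. The array @args stands in for @ARGV, and the variable names are illustrative only.

```perl
use strict;
use warnings;

my @args = ("hi %04d %.2f", 5, 7.12345);   # stands in for @ARGV

# sprintf's prototype puts @args in scalar context,
# so the element count becomes the format.
my $count_as_format = sprintf(@args);

# Splitting the format off first gives sprintf the list it expects.
my ($format, @values) = @args;
my $formatted = sprintf($format, @values);

print "$count_as_format\n";
print "$formatted\n";
```

Copying the arguments into ($format, @values) also leaves the original array untouched, unlike shift.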
I get the following warning:
"Use of uninitialized value in concatenation (.) or string at C:\tools\test.pl line 17, DATA line 1."
But the next line of __DATA__ will be processed without any warning and get these:
test1b.txt:test test1c.txt:test :test
Stranger still, when I add the line print "$line:".$'."\n"; the warning disappears.
Anybody have some clues?
#!/usr/bin/perl -w
use strict;
my $pattern='test';
my $output='$&';
while(<DATA>)
{
chomp;
my $line=$_;
chomp($line);
$line=~/$pattern/;
#print "$line:".$&."\n"; #why does uncommenting this line make the following line run without a warning?
my $result="$line:".eval($output)."\n";
print $result;
}
__DATA__
test1a.txt
test1b.txt
test1c.txt
Perl considers $&, $', and $` to be expensive, so it won't actually populate them in a program that doesn't use them. From the perlvar manpage:
The use of this variable [$&] anywhere in a program imposes a considerable
performance penalty on all regular expression matches. To avoid this
penalty, you can extract the same substring by using @-. Starting
with Perl 5.10, you can use the /p match flag and the ${^MATCH}
variable to do the same thing for particular match operations.
However, when you only use them inside a string that you pass to eval, Perl can't tell that you're using them, so it won't populate them, so they'll be undefined.
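As a sketch of the /p alternative that the perlvar excerpt mentions (assuming Perl 5.10 or later; the helper name matched_part is hypothetical), the matched text can be read from ${^MATCH} without referring to $& at all:

```perl
use strict;
use warnings;

# Return the text matched by $pattern in $line, or undef when nothing matches.
sub matched_part {
    my ($line, $pattern) = @_;
    return $line =~ /$pattern/p ? ${^MATCH} : undef;
}

for my $line (qw(test1a.txt test1b.txt other.txt)) {
    my $match = matched_part($line, 'test');
    print "$line:", (defined $match ? $match : '(no match)'), "\n";
}
```

Checking whether the match succeeded before reading the match variable also avoids picking up a stale value from an earlier successful match.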
I have written some code, and I am not sure what the error is. I am getting the error:
Use of uninitialized value in concatenation (.) or string at mksmksmks.pl line 63
My code is as follows:
for(my $j = 0; $j < $num2; $j++) {
print {$out} "$destination[$j]|$IP_one_1[$j]|$IP_one_2[$j]|$reached[$j]|$IP_two_1[$j]|$IP_two_2[$j]\n";
}
What it means is that one of the elements of either @destination, @IP_one_1, @IP_one_2, or @reached has not been defined (has not been assigned a value), or has been assigned a value of undef. You either need to detect (and prevent) undefined values at the source, or expect and deal with them later on. Since you have warnings enabled (which is a good thing), Perl is reminding you that your code is trying to concatenate a string where one of the values being concatenated is undefined.
Consider the following example:
perl -wE 'my @x = (); $x[0] = "Hello "; $x[2] = "world!"; say "@x"'
In this example, $x[0] has a value, and $x[2] has a value, but $x[1] does not. When we interpolate @x into a double-quoted construct, it is expanded as [element 0 (Hello )]<space>[element 1 (undef)]<space>[element 2 (world!)]. The undef element interpolates as an empty string, and spews a warning. And of course by default array interpolation injects a space character between each element. So in the above example we see Hello <interpolation-space>(undef upgrades to empty string here)<interpolation-space>world!
One place to investigate is whether one or more of the arrays has a different size than the others: for example, @IP_one_2 may have fewer elements than the rest, or $num2 may be greater than the number of elements in one of the arrays.
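One way to track such a gap down is to list, for each array, the indices that hold no defined value. The following is only a sketch: the helper undefined_indices and the sample data are made up for illustration, while $num2 and the array names follow the question.

```perl
use strict;
use warnings;

# Return the indices below $count at which @$array holds no defined value.
sub undefined_indices {
    my ($array, $count) = @_;
    return grep { !defined $array->[$_] } 0 .. $count - 1;
}

my @destination = ('10.0.0.1', '10.0.0.2', '10.0.0.3');   # made-up data
my @reached     = ('yes', undef);                          # shorter, with a gap
my $num2        = 3;

print "destination: @{[ undefined_indices(\@destination, $num2) ]}\n";
print "reached:     @{[ undefined_indices(\@reached, $num2) ]}\n";
```

An index reported here is either past the end of that array or holds an explicit undef; both produce the same warning in the print statement.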
Place the following near the top of your script and run it again:
use diagnostics;
When I run the following one-liner under warnings and diagnostics:
$ perl -wMdiagnostics -e '$a=$a; print "$a\n"'
I get the following output, and you will get something similar if you add use diagnostics;... a very helpful tool when you're first learning Perl's warnings.
Use of uninitialized value $a in concatenation (.) or string at -e
line 1 (#1)
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.
To help you figure out what was undefined, perl will try to tell you
the name of the variable (if any) that was undefined. In some cases
it cannot do this, so it also tells you what operation you used the
undefined value in. Note, however, that perl optimizes your program
and the operation displayed in the warning may not necessarily appear
literally in your program. For example, "that $foo" is usually
optimized into "that " . $foo, and the warning will refer to the
concatenation (.) operator, even though there is no . in
your program.
Maybe my example will be useful to someone. Suppose variable $x is initialized from a database. It may contain an undefined value, and this is normal. We need to display its value on the console. As responsible programmers, we decided to use "use warnings FATAL => "all";". In this case, the script will fail.
perl -e 'use strict; use warnings FATAL => "all"; my $x; print("x=$x\n"); print("DONE\n");'
Returns:
Use of uninitialized value $x in concatenation (.) or string at -e line 1.
In this case, you can use
if(defined($x)){...}else{...}
But this is not pretty if you just want to print a value.
perl -e 'use strict; use warnings FATAL => "all"; my $x; print("x=".($x//"null")."\n"); print("DONE\n");'
Prints:
x=null
DONE
Because the expression $x//"null" checks whether what comes before // is defined and if it is not defined returns what comes after //.
If you use eq "" it won't give any warning message.
But if you use eq " " (here you can see a space), then it will give this warning message:
Use of uninitialized value in concatenation (.) or string ....
A common 'Perlism' is generating a list as something to loop over in this form:
for($str=~/./g) { print "the next character from \"$str\"=$_\n"; }
In this case the global match regex returns a list that is one character in turn from the string $str, and assigns that value to $_
Instead of a regex, split can be used in the same way or 'a'..'z', map, etc.
I am investigating unpack to generate a field by field interpretation of a string. I have always found unpack to be less straightforward to the way my brain works, and I have never really dug that deeply into it.
As a simple case, I want to generate a list that is one character in each element from a string using unpack (yes -- I know I can do it with split(//,$str) and /./g but I really want to see if unpack can be used this way...)
Obviously, I can use a field list for unpack that is unpack("A1" x length($str), $str) but is there some other way that kinda looks like globbing? ie, can I call unpack(some_format,$str) either in list context or in a loop such that unpack will return the next group of characters in the format group until $str is exhausted?
I have read The Perl 5.12 Pack pod and the Perl 5.12 pack tutorial and the PerlMonks tutorial
Here is the sample code:
#!/usr/bin/perl
use warnings;
use strict;
my $str=join('',('a'..'z', 'A'..'Z')); #the alphabet...
$str=~s/(.{1,3})/$1 /g; #...in groups of three
print "str=$str\n\n";
for ($str=~/./g) {
print "regex: = $_\n";
}
for(split(//,$str)) {
print "split: \$_=$_\n";
}
for(unpack("A1" x length($str), $str)) {
print "unpack: \$_=$_\n";
}
pack and unpack templates can use parentheses to group things much like regexps can. The group can be followed by a repeat count. * as a repeat count means "repeat until you run out of things to pack/unpack".
for(unpack("(A1)*", $str)) {
print "unpack: \$_=$_\n";
}
You'd have to run a benchmark to find out which of these is the fastest.
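The same group-with-repeat template also works for wider fields. Here is a short sketch (the sample strings and variable names are illustrative) that also shows a difference between the "A" and "a" formats that matters when the string contains spaces, as the sample $str does:

```perl
use strict;
use warnings;

my $str = 'abcdefgh';

my @triples = unpack '(A3)*', $str;    # groups of three; the last may be shorter
my @trimmed = unpack '(A1)*', 'a b';   # "A" strips trailing spaces from each field
my @exact   = unpack '(a1)*', 'a b';   # "a" returns each character unchanged

print join('|', @triples), "\n";
print join('|', @trimmed), "\n";
print join('|', @exact), "\n";
```

With "A1" a lone space comes back as an empty string, so "(a1)*" is the closer equivalent of split(//, $str).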