convert scientific notation to decimal (not integer) in bash/perl - perl

I have a tab delimited file with several columns (9 columns) that looks like this:
1:21468 1 21468 2.8628817609765984 0.09640845515631684 0.05034710996552612 1.0 0.012377712911711025 54.0
However in column 5 I sometimes have scientific numbers like:
8.159959468796783E-4
8.465114165595303E-4
8.703354859736187E-5
9.05132870067004E-4
I need to have all numbers in column 5 in decimal notation. From the example above:
0.0008159959468796783
0.0008465114165595303
0.00008703354859736187
0.000905132870067004
And I need to change these numbers without changing the rest of the numbers in column 5 or the rest of the file.
I know there is a similar post, Convert scientific notation to decimal in multiple fields. But in that case there was an if statement unrelated to the type of number present in the field, and the conversion applied to all numbers in that column. So I'm having trouble adapting the information there to my specific case. Can someone help me figure this out?
Thank you!

The easiest (and fastest) way to convert a number in scientific notation to regular notation in Perl:
my $num = '0.12345678E5';
$num *= 1;
print "$num\n";
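One caveat is worth checking before relying on this for the data in the question. The sketch below assumes Perl's default number-to-string conversion and uses illustrative variable names: multiplying by 1 produces a floating-point number, and very small values are printed back in exponent form with roughly 15 significant digits.

```perl
use strict;
use warnings;

# Numifying works nicely for moderate magnitudes...
my $moderate = '0.12345678E5' * 1;
print "$moderate\n";          # 12345.678

# ...but a very small value is stringified in exponent form again,
# and the trailing digits of the original text are rounded away.
my $tiny = '8.703354859736187E-5' * 1;
print "$tiny\n";

# sprintf with %f forces fixed notation, at the cost of choosing a precision.
my $fixed = sprintf '%.12f', $tiny;
print "$fixed\n";             # 0.000087033549
```

If every digit of the input must survive, a string-based conversion such as the ones in the other answers is the safer route.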

As Jim already proposed, one way to do this is to simply treat the number as a string and do the translation yourself. This way you're able to fully maintain your significant digits.
The following demonstrates a function for doing just that. It takes in a number that's potentially in scientific notation, and it returns the decimal representation. Works with both positive and negative exponents:
use warnings;
use strict;
while (<DATA>) {
my ($num, $expected) = split;
my $dec = sn_to_dec($num);
print $dec . ' - ' . ($dec eq $expected ? 'good' : 'bad') . "\n";
}
sub sn_to_dec {
my $num = shift;
if ($num =~ /^([+-]?)(\d*)(\.?)(\d*)[Ee]([-+]?\d+)$/) {
my ($sign, $int, $period, $dec, $exp) = ($1, $2, $3, $4, $5);
if ($exp < 0) {
my $len = 1 - $exp;
$int = ('0' x ($len - length $int)) . $int if $len > length $int;
substr $int, $exp, 0, '.';
return $sign.$int.$dec;
} elsif ($exp > 0) {
$dec .= '0' x ($exp - length $dec) if $exp > length $dec;
substr $dec, $exp, 0, '.' if $exp < length $dec;
return $sign.$int.$dec;
} else {
return $sign.$int.$period.$dec;
}
}
return $num;
}
__DATA__
8.159959468796783E-4 0.0008159959468796783
8.465114165595303E-4 0.0008465114165595303
8.703354859736187E-5 0.00008703354859736187
9.05132870067004E-4 0.000905132870067004
9.05132870067004E+4 90513.2870067004
9.05132870067004E+16 90513287006700400
9.05132870067004E+0 9.05132870067004

If you do this the simple way, by parsing as floating point and then using printf to force it to print as a decimal, you may end up with slightly different results because you're at the upper limit of precision available in double-precision format.
What you should do is split each line into fields, then examine field 5 (index 4 if the fields are stored in a zero-based array) with something like this.
my ($u, $d, $exp) = $field[4] =~ /(\d)\.(\d+)[Ee]([-+]\d+)/;
If the field is in scientific notation this will give you
$u the digit before the decimal
$d the digits after the decimal
$exp the exponent
(if it's not you'll get back undefined values and can just skip the reformatting step)
Using that information you can reassemble the digits with the correct number of leading zeros and decimal point. If the exponent is positive you have to reassemble the digits but then insert the decimal point in the right place.
Once you've reformatted the value the way you want, reassemble the entire line (using, say, join) and write it out.
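The steps above can be sketched roughly as follows. This is only one possible reading of the approach: the helper names expand_sci and convert_line are made up for illustration, the sample line is abbreviated from the question, and the pattern assumes a single digit before the decimal point as in the regex above.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite d.dddE+n / d.dddE-n as plain decimal text,
# leaving anything that is not in that form untouched.
sub expand_sci {
    my ($field) = @_;
    my ($sign, $u, $d, $exp) = $field =~ /^([-+]?)(\d)\.(\d+)[Ee]([-+]?\d+)$/
        or return $field;
    my $digits = $u . $d;
    $exp += 0;                               # numify '+4', '-5', ...
    if ($exp < 0) {
        # "0." then (-$exp - 1) zeros, then all the digits.
        return $sign . '0.' . ('0' x (-$exp - 1)) . $digits;
    }
    # Non-negative exponent: pad on the right if needed, then split.
    my $int_len = $exp + 1;
    $digits .= '0' x ($int_len - length($digits)) if $int_len > length($digits);
    my $frac = substr($digits, $int_len);
    return $sign . substr($digits, 0, $int_len) . (length($frac) ? ".$frac" : '');
}

# Hypothetical helper: convert only column 5 of one tab-delimited line.
sub convert_line {
    my ($line) = @_;
    my @field = split /\t/, $line, -1;       # -1 keeps trailing empty fields
    $field[4] = expand_sci($field[4]) if defined $field[4];
    return join "\t", @field;
}

# Abbreviated sample line; column 5 holds the value in scientific notation.
print convert_line("1:21468\t1\t21468\t2.86\t8.703354859736187E-5\t0.05\t1.0\t0.01\t54.0"), "\n";
```

To process a whole file, one would typically chomp each line read from <> and print convert_line($line) followed by a newline. Because the digits are moved around as text, no precision is lost.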

Related

Remove upfront zeros from floating point lower than 1 in Perl

I would like to normalize the variable from ie. 00000000.1, to 0.1 using Perl
my $number = 000000.1;
$number =~ s/^0+(\.\d+)/0$1/;
Is there any other solution to normalize floats lower than 1 by removing upfront zeros than using regex?
When I try to put those kind of numbers into an example function below
test(00000000.1, 0000000.025);
sub test {
my ($a, $b) = @_;
print $a, "\n";
print $b, "\n";
print $a + $b, "\n";
}
I get
01
021
22
which is not what is expected.
A number with leading zeros is interpreted as octal, e.g. 000000.1 is 01. I presume you have a string as input, e.g. my $number = "000000.1". With this your regex is:
my $number = "000000.1";
$number =~ s/^0+(?=0\.\d+)//;
print $number;
Output:
0.1
Explanation of regex:
^0+ -- 1+ 0 digits
(?=0\.\d+) -- positive lookahead for 0. followed by digits
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
Simplest way: force it to be treated as a number, and it will drop the leading zeros since they are meaningless for decimal numbers.
my $str = '000.1';
...
my $num = 0 + $str;
An example,† to run from the command-line:
perl -wE'$n = shift; $n = 0 + $n; say $n' 000.1
Prints 0.1
Another, more "proper" way is to format that string ('000.1' and such) using sprintf. Then you do need to make a choice about precision, but that is often a good idea anyway
my $num = sprintf "%f", $str; # default precision
Or, if you know how many decimal places you want to keep
my $num = sprintf "%.3f", $str;
† The example in the question is really invalid. An unquoted string of digits which starts with a zero (077, rather than '077') would be treated as an octal number except that the decimal point (in 000.1) renders that moot as octals can't be fractional; so, Perl being Perl, it is tortured into a number somehow, but possibly yielding unintended values.
I am not sure how one could get an actual input like that. If 000.1 is read from a file or from the command-line or from STDIN ... it will be a string, an equivalent of assigning '000.1'
See Scalar value constructors in perldata, and for far more detail, perlnumber.
As others have noted, in Perl, leading zeros produce octal numbers; 10 is just a decimal number ten but 010 is equal to decimal eight. So yeah, the numbers should be in quotes for the problem to make any sense.
But the other answers don’t explain why the printed results look funny. Contrary to Peter Thoeny’s comment and zdim’s answer, there is nothing ‘invalid’ about the numbers. True, octals can’t be floating point, but Perl does not strip the . to turn 0000000.025 into 025. What happens is this:
Perl reads the run of zeros and recognises it as an octal number.
Perl reads the dot and parses it as the concatenation operator.
Perl reads 025 and again recognises it as an octal number.
Perl coerces the operands to strings, i.e. the decimal value of the numbers in string form; 0000000 is, of course, '0' and 025 is '21'.
Perl concatenates the two strings and returns the result, i.e. '021'.
And without error.
(As an exercise, you can check something like 010.025 which, for the same reason, turns into '821'.)
This is why $a and $b are each printed with a leading zero. Also note that, to evaluate $a + $b, Perl coerces the strings to numbers, but since leading zeros in strings do not produce octals, '01' + '021' is the same as '1' + '21', returning 22.
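A small sketch of the behaviour described above, using made-up variable names. It contrasts the unquoted literal (parsed as two octal numbers joined by the concatenation operator) with the quoted string (an ordinary decimal):

```perl
use strict;
use warnings;

# Unquoted: two octal literals joined by the concatenation operator.
my $parsed = 010.025;            # octal 010 is 8, octal 025 is 21
print "$parsed\n";               # 821

# Quoted: an ordinary decimal string; numifying it drops the leading zeros.
my $from_string = 0 + '000000.1';
print "$from_string\n";          # 0.1

# Leading zeros inside strings never mean octal.
my $sum = '01' + '021';
print "$sum\n";                  # 22
```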

Perl script to convert a binary number to a decimal number

I have to write a Perl script that converts a binary number, specified as an
argument, to a decimal number. In the question there's a hint to use the reverse function.
We have to assume that the binary number is in this format
EDIT: This is what I've progressed to (note this is code from my textbook that I've messed with):
#!/usr/bin/perl
# dec2.pl: Converts decimal number to binary
#
die("No arguments\n") if ( $#ARGV == -1 ) ;
foreach $number (@ARGV) {
$original_number = $number ;
until ($number == 0 ) {
$bit = $number % 2 ;
unshift (@bit_arr, $bit) ;
$number = int($number / 2 );
}
$binary_number = join ("", @bit_arr) ;
print reverse ("The decimal number of $binary_number is $original_number\n");
$#bit_arr = -1;
}
When executed:
>./binary.pl 8
The decimal number of 1000 is 8
I don't know how to word it to make the program know to add up all of the 1's in the number that is inputted.
You could just use sprintf to do the converting for you...
sprintf("%d", 0b010101); # Binary string 010101 -> Decimal 21
sprintf("%b", 21); # Decimal 21 -> Binary string 10101 (use "%06b" to get 010101)
Of course, you can also just eval a binary string with 0b in front to indicate binary:
my $binary_string = '010101';
my $decimal = eval("0b$binary_string"); # 21
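As an alternative to eval, Perl's built-in oct also accepts strings with a 0b prefix, which avoids evaluating input as code. A minimal sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $binary_string = '010101';

# oct() understands the 0b prefix, so no code is evaluated.
my $decimal = oct("0b$binary_string");
print "$decimal\n";              # 21

# Going back, %b formats a number as binary digits; 06 pads to six places.
my $round_trip = sprintf '%06b', $decimal;
print "$round_trip\n";           # 010101
```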
You don't have to use reverse, but it makes it easy to think about the problem with respect to exponents and array indices.
use strict;
use warnings;
my $str = '111110100';
my @bits = reverse(split(//, $str));
my $sum = 0;
for my $i (0 .. $#bits) {
next unless $bits[$i];
$sum += 2 ** $i;
}
First of all, you are supposed to convert from binary to decimal, not the other way around, which means you take an input like $binary = '1011001';.
The first thing you need to do is obtain the individual bits (a0, a1, etc) from that. We're talking about splitting the string into its individual digits.
for my $bit (split(//, $binary)) {
...
}
That should be a great starting point. With that, you have all that you need to apply the following refactoring of the formula you posted:
n = ( ( ( ... )*2 + a2 )*2 + a1 )*2 + a0
[I have no idea why reverse would be recommended. It's possible to use it, but it's suboptimal.]
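For illustration, the refactored formula can be evaluated from the leftmost bit onwards, doubling the running total and adding each bit in turn. The function name bin_to_dec is hypothetical, and the sketch assumes the input contains only the characters 0 and 1:

```perl
use strict;
use warnings;

# Evaluate ((( ... )*2 + a2)*2 + a1)*2 + a0 from the most significant bit.
sub bin_to_dec {
    my ($binary) = @_;
    my $n = 0;
    for my $bit (split //, $binary) {
        $n = $n * 2 + $bit;
    }
    return $n;
}

print bin_to_dec('1011001'), "\n";   # 89
```

No reverse is needed because the bits are consumed in the order they are written.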

perl-how to treat a string as a binary number?

Read a file that contains an address and a data, like below:
#0, 12345678
#1, 5a5a5a5a
...
My aim is to read the address and the data. Consider the data I read is in hex format, and then I need to unpack them to binary number.
So 12345678 would become 00010010001101000101011001111000
Then, I need to further unpack the transferred binary number to another level.
So it becomes, 00000000000000010000000000010000000000000001000100000001000000000000000100000001000000010001000000000001000100010001000000000000
The way I did it is shown below:
while(<STDIN>) {
if (/\#(\S+)\s+(\S+)/) {
$addr = $1;
$data = $2;
$mem{$addr} = ${data};
}
}
foreach $key (sort {$a <=> $b} (keys %mem)) {
my $str = unpack ('B*', pack ('H*',$mem{$key}));
my $str2 = unpack ('B*', pack ('H*', $str));
printf ("#%x ", $key);
printf ("%s",$str2);
printf ("\n");
}
It works, however, my next step is to do some numeric operation on the transferred bits.
Such as bitwise or and shifting. I tried << and | operator, both are for numbers, not strings. So I don't know how to solve this.
Please leave your comments if you have better ideas. Thanks.
You can employ Bit::Vector module from metaCPAN
use strict;
use warnings;
use Bit::Vector;
my $str = "1111000011011001010101000111001100010000001111001010101000111010001011";
printf "orig str: %72s\n", $str;
#only 72 bits for better view
my $vec = Bit::Vector->new_Bin(72,$str);
printf "vec : %72s\n", $vec->to_Bin();
$vec->Move_Left(2);
printf "left 2 : %72s\n", $vec->to_Bin();
$vec->Move_Right(4);
printf "right 4 : %72s\n", $vec->to_Bin();
prints:
orig str: 1111000011011001010101000111001100010000001111001010101000111010001011
vec : 001111000011011001010101000111001100010000001111001010101000111010001011
left 2 : 111100001101100101010100011100110001000000111100101010100011101000101100
right 4 : 000011110000110110010101010001110011000100000011110010101010001110100010
If you need do some math with arbitrary precision, you can also use Math::BigInt or use bigint (http://perldoc.perl.org/bigint.html)
Hex and binary are text representation of numbers. Shifting and bit manipulations are numerical operations. You want a number, not text.
my $hex = '5a5a5a5a';
my $num = hex($hex); # Convert to number.
$num >>= 1; # Manipulate the number.
$hex = sprintf('%08X', $num); # Convert back to hex.
In a comment, you mention you want to deal with 256 bit numbers. The native numbers don't support that, but you can use Math::BigInt.
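A minimal sketch of what that might look like with Math::BigInt, which ships with Perl. The input values and variable names are illustrative only:

```perl
use strict;
use warnings;
use Math::BigInt;

# A 256-bit value written as 64 hex digits (illustrative input).
my $wide = Math::BigInt->new('0x' . ('f' x 64));

# brsft shifts right; copy first because the methods modify in place.
my $shifted = $wide->copy->brsft(4);
print $shifted->as_hex, "\n";

# bior is bitwise OR.
my $small = Math::BigInt->new('0x5a5a5a5a');
my $ored  = $small->copy->bior(Math::BigInt->new('0x0f'));
print $ored->as_hex, "\n";       # 0x5a5a5a5f
```

Note that as_hex returns the value with a 0x prefix and without leading zeros, so fixed-width output needs padding.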
My final solution was to forget about treating them as numbers and just treat them as strings. I use substr and string concatenation instead of shifting. Then for the OR operation, I just add the corresponding bits of the two strings: if the sum is 0 the result is 0, otherwise it is 1.
It may not be the best way to solve this problem. But that's the way I finally used.
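The string-based approach described above might look something like the following. This is a guess at the idea, not the poster's actual code; the function names are hypothetical, and the bit strings are assumed to be fixed width and, for the OR, of equal length.

```perl
use strict;
use warnings;

# Shift a fixed-width bit string left by $k, filling with zeros on the right.
sub shift_left {
    my ($bits, $k) = @_;
    $k = length($bits) if $k > length($bits);
    return substr($bits, $k) . ('0' x $k);
}

# Shift right by $k, filling with zeros on the left.
sub shift_right {
    my ($bits, $k) = @_;
    $k = length($bits) if $k > length($bits);
    return ('0' x $k) . substr($bits, 0, length($bits) - $k);
}

# OR two bit strings of equal length: add each pair of bits,
# giving 0 when the sum is 0 and 1 otherwise.
sub or_bits {
    my ($x, $y) = @_;
    my @x = split //, $x;
    my @y = split //, $y;
    return join '', map { ($x[$_] + $y[$_]) ? 1 : 0 } 0 .. $#x;
}

print shift_left('00101100', 2), "\n";    # 10110000
print or_bits('0101', '0011'), "\n";      # 0111
```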

Perfect matching is not working

I have a problem with pattern matching. I want to get the sums of the positive and negative integers in a file. I also want to count the unique dates in the file.
My File:
Hello -12, 3.4 and 32. Where did you
go on 01/01/2013 ? On 01/01/2013, we
went home. -4 plus 5 makes 1.
03/02/2013
Results I should be getting:
-16 //the sum of negative integers.
38 //the sum of positive integers.
2 //total number of unique dates :)
My code is:
$sum=0;
$summ=0;
while (<>) {
foreach ($_=~ /-\d+/g)
{
$sum+=$_;
}
foreach ($poz=~ /^\d+?$/g) {
$summ+=$poz;
}
foreach (/\d{2}\/\d{2}\/\d{4}/) {
$count++;
}
}
print "$sum\n";
print "$summ\n";
print "$count\n";
The output I am getting is:
-16
0
2
I can not get the value of the sum of positive numbers. Could you please help me?
First of all, always use use strict; use warnings;. It would have found your first error: The use of $poz without ever giving it a value. Twice!
A positive integer is a sequence
Not preceded by -.
Not preceded by a digit.
Not preceded by ..
Not preceded by /.
Consists of digits
Not followed by . plus digits. (Well, you might consider 4.0 an integer, but I doubt it.)
Not followed by a digit.
Not followed by /.
(?<![\-\d./])\d+(?![\d/])(?!\.\d)
A negative integer is a sequence
Consists of - followed by digits
Not followed by . plus digits. (Well, you might consider 4.0 an integer, but I doubt it.)
Not followed by a digit.
-\d+(?!\d)(?!\.\d)
So,
use strict;
use warnings;
my $sum_p = 0;
my $sum_n = 0;
my $dates = 0;
while (<>) {
$sum_p += $_ for /(?<![\-\d.\/])\d+(?![\d\/])(?!\.\d)/g;
$sum_n += $_ for /-\d+(?!\d)(?!\.\d)/g;
++$dates while /\d{2}\/\d{2}\/\d{4}/g;
}
print "$sum_p\n";
print "$sum_n\n";
print "$dates\n";
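Note that the loop above counts every date occurrence, so a date that appears twice is counted twice. If the number of unique dates is wanted, as the expected output in the question suggests, one option is to collect the dates as hash keys. The sketch below uses the sample text from the question as in-memory lines; the variable names are illustrative.

```perl
use strict;
use warnings;

# Sample input from the question, one element per line.
my @lines = (
    "Hello -12, 3.4 and 32. Where did you",
    "go on 01/01/2013 ? On 01/01/2013, we",
    "went home. -4 plus 5 makes 1.",
    "03/02/2013",
);

my ($sum_n, $sum_p, %seen) = (0, 0);
for (@lines) {
    $sum_n += $_ for /-\d+(?!\d)(?!\.\d)/g;
    $sum_p += $_ for /(?<![\-\d.\/])\d+(?![\d\/])(?!\.\d)/g;
    $seen{$_}++ for m{\d{2}/\d{2}/\d{4}}g;   # hash keys deduplicate the dates
}
my $unique_dates = keys %seen;

print "$sum_n\n";          # -16
print "$sum_p\n";          # 38
print "$unique_dates\n";   # 2
```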

Perl - Remove trailing zeroes without exponential value

I am trying to remove trailing zeroes from decimal numbers.
For eg: If the input number is 0.0002340000, I would like the output to be 0.000234
I am using sprintf("%g",$number), which works for the most part, except that it sometimes converts the number into an exponential value with E-. How can I have it always display as a full decimal number?
Numbers don't have trailing zeroes. Trailing zeroes can only occur once you represent the number in decimal, a string. So the first step is to convert the number to a string if it's not already.
my $s = sprintf("%.10f", $n);
(The solution is supposed to work with the OP's inputs, and his inputs appear to have 10 decimal places. If you want more digits to appear, use the number of decimal places you want to appear instead of 10. I thought this was obvious. If you want to be ridiculous like @asjo, use 324 decimal places for the doubles if you want to make sure not to lose any precision you didn't already lose.)
Then you can delete the trailing zeroes.
$s =~ s/0+\z// if $s =~ /\./;
$s =~ s/\.\z//;
or
$s =~ s/\..*?\K0+\z//;
$s =~ s/\.\z//;
or
$s =~ s/\.(?:|.*[^0]\K)0*\z//;
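Put together, the two steps could be wrapped in a small function. The name strip_zeros and the precision parameter are hypothetical, added only to show the pieces working together:

```perl
use strict;
use warnings;

# Format with a fixed number of decimal places, then trim trailing zeros
# and a dangling decimal point.
sub strip_zeros {
    my ($n, $places) = @_;
    my $s = sprintf "%.${places}f", $n;
    $s =~ s/0+\z// if $s =~ /\./;
    $s =~ s/\.\z//;
    return $s;
}

print strip_zeros(0.0002340000, 10), "\n";   # 0.000234
print strip_zeros(100, 10), "\n";            # 100
```

The guard on the first substitution matters when the precision is 0: without a decimal point in the string, stripping zeros would turn 100 into 1.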
To avoid scientific notation for numbers use the format conversion %f instead of %g.
A lazy way could be simply: $number=~s/0+$// (substitute trailing zeroes by nothing).
The solution is easier than you might think.
Instead of using %g use %f and it will result in the behavior you are looking for. %f will always output your floating decimal in "fixed decimal notation".
What does the documentation say about %g vs %f?
As you may notice in the below table %g will result in either the same as %f or %e (when appropriate).
If you want to force the use of fixed decimal notation, use the appropriate format identifier, which in this case is %f.
sprintf - perldoc.perl.org
%% a percent sign
%c a character with the given number
%s a string
%d a signed integer, in decimal
%u an unsigned integer, in decimal
%o an unsigned integer, in octal
%x an unsigned integer, in hexadecimal
%e a floating-point number, in scientific notation
%f a floating-point number, in fixed decimal notation
%g a floating-point number, in %e or %f notation
What about TIMTOWTDI; aren't we writing perl?
Yes, as always there is more than one way to do it.
If you'd just like to trim the trailing decimal-point zeros from a string you could use a regular expression such as the below.
$number = "123000.321000";
$number =~ s/(\.\d+?)0+$/$1/;
$number # is now "123000.321"
Remember that floating point values in Perl don't have trailing decimals, unless you are dealing with a string. With that said, a string is not a number, even though it can explicitly and implicitly be converted to one.
The simplest way is probably to multiply by 1.
Original:
my $num = sprintf("%.10f", 0.000234000001234);
print($num);
#output
0.0002340000
With multiplying:
my $num = sprintf("%.10f", 0.000234000001234) * 1;
print($num);
#output
0.000234
The whole point of the %g format is to use a fixed point notation when it is reasonable and to use exponential notation when fixed point is not reasonable. So, you need to know the range of values you'll be dealing with.
Clearly, you could write a regular expression to post-process the string from sprintf(), removing the trailing zeroes:
my $str = sprintf("%g", $number);
$str =~ s/0+$//;
If you always want a fixed point number, use '%f', possibly with number of decimal places that you want; you might still need to remove trailing zeroes. If you always want exponential notation, use '%e'.
An easy way:
You could cheat a little. Add 1 to avoid the number breaking into scientific notation. Then manipulate the number as a string (thereby making perl convert it into a string).
$n++;
$n =~ s/^(\d+)(?=\.)/$1 - 1/e;
print $n;
A "proper" way:
For a more "proper" solution, counting the number of decimal places to use with %f would be optimal. It turned out getting the correct number of decimal points is trickier than one would think. Here's an attempt:
use strict;
use warnings;
use v5.10;
my $n = 0.000000234;
say "Original: $n";
my $l = getlen($n);
printf "New : %.${l}f\n", $n;
sub getlen {
my $num = shift;
my $dec = 0;
return 0 unless $num; # no 0 values allowed
return 0 if $num >= 1; # values above 1 don't need this computation
while ($num < 1) {
$num *= 10;
$dec++;
}
$num =~ s/\d+\.?//; # must have \.? to accommodate e.g. 0.01
$dec += length $num;
return $dec;
}
Output:
Original: 2.34e-007
New : 0.000000234
The value can have trailing zeroes only if it is a string.
You can add 0 to it. It will be converted to a numerical value, which does not show any trailing zeroes.