Decimal and binary conversion with big numbers - perl

I have to convert big numbers in Perl from decimal to binary and the other way around.
An example number of that length:
Dec: 76982379919017706648824420266
Bin: 111110001011111001010101000010011001000010101111001110000000000000000000000000000000000000000000
I found two functions:
sub dec2bin {
my $str = unpack("B32", pack("N", shift));
$str =~ s/^0+(?=\d)//; # otherwise you'll get leading zeros
return $str;
}
sub bin2dec {
return unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
}
But, both of them seem to stop working with big numbers.
Output of
bin2dec(111110001011111001010101000010011001000010101111001110000000000000000000000000000000000000000000)
is 1543163
and output of
dec2bin(76982379919017706422040262422)
is 11111111111111111111111111111111
Is there a proper way of doing it with such big numbers?

You can use Math::BigInt. Note that the input to these functions should be strings; a numeric literal this large would lose precision before the function ever sees it.
use Math::BigInt;
sub bin2dec {
my $bin = shift;
return Math::BigInt->new("0b$bin");
}
sub dec2bin {
my $dec = shift;
my $i = Math::BigInt->new($dec);
return substr($i->as_bin(), 2);
}
print "Dec: " . bin2dec("111110001011111001010101000010011001000010101111001110000000000000000000000000000000000000000000") . "\n";
print "Bin: " . dec2bin("76982379919017706648824420266") . "\n";
Output is:
Dec: 76982379919017710405206147072
Bin: 111110001011111001010101000010011001000010101111001101001001010101100110001100111001011110101010
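As a quick sanity check on the two functions above, the sketch below converts a decimal string to binary and back and compares the result with the input. It only uses Math::BigInt->new, as_bin and bstr; the helper names mirror the ones in the answer, and the round-trip check itself is an illustration added here, not part of the original answer.

```perl
use strict;
use warnings;
use Math::BigInt;

# Decimal string -> binary string (without the "0b" prefix).
sub dec2bin {
    my ($dec) = @_;
    return substr(Math::BigInt->new($dec)->as_bin(), 2);
}

# Binary string -> decimal string.
sub bin2dec {
    my ($bin) = @_;
    return Math::BigInt->new("0b$bin")->bstr();
}

# Round trip: the value must survive dec -> bin -> dec unchanged.
my $dec  = '76982379919017706648824420266';
my $back = bin2dec(dec2bin($dec));
print $back eq $dec ? "round trip ok\n" : "round trip FAILED\n";
```

Passing the numbers as quoted strings is what keeps them exact; an unquoted literal of this size would already be a floating-point approximation.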

Perl provides built-in bignum facilities. Turn them on with use bignum;. Your conversion functions would look like this:
use bignum;
my ($b_orig, $d_orig, $b, $d);
$d_orig = 76982379919017706648824420266;
$b_orig = '111110001011111001010101000010011001000010101111001110000000000000000000000000000000000000000000';
print ("dec($b_orig) [orig] = $d_orig;\n");
print ("dec($b_orig) [comp] = " . Math::BigInt->from_bin($b_orig) . ";\n");
print ("bin($d_orig) [orig] = $b_orig;\n");
print ("bin($d_orig) [comp] = ".substr(Math::BigInt->new($d_orig)->as_bin(), 2).";\n");
Caveat
There is no correspondence between the binary and the decimal number that you provide. I have not checked whether this is a flaw of the bigint library or not.

Perl's bigint provides transparent support for big integers:
perl -Mbigint -E 'say oct "0b111110001011111001010101000010011001000010101111001110000000000000000000000000000000000000000000"'
76982379919017710405206147072
You do not need to write your own conversion routine. oct will convert for you.

Related

How to separate the Hex 8 bits into two 4 bits

I am using like this,
$a = "002c459f";
$b = $a%10000;
$c = int($a/10000);
print $b; #prints 0
print $c; #prints 2
I want
$b=459f;
$c=002c;
Can anyone suggest how will I get this?
If you had used warnings, you would have gotten a warning message indicating a problem.
Since your 8-digit input is already formatted as a simple hex string, you can just use substr:
use warnings;
use strict;
my $x = '002c459f';
my $y = substr $x, 0, 4;
my $z = substr $x, 4, 4;
print "z=$z, y=$y\n";
Output:
z=459f, y=002c
It is good practice to also use strict. I changed your variable names since $a and $b are special variables in Perl.
You should always use use strict; use warnings;! It would have told you that 002c459f isn't a number. (It's the hex representation of a number.) As such, you can't use division before first converting it into a number. You also used the wrong divisor (10000 instead of 0x10000).
my $a_num = hex($a_hex);
my $b_num = $a_num % 0x10000; # More common: my $b_num = $a_num & 0xFFFF;
my $c_num = int( $a_num / 0x10000 ); # More common: my $c_num = $a_num >> 16
my $b_hex = sprintf("%04x", $b_num);
my $c_hex = sprintf("%04x", $c_num);
But if you have exactly eight characters, you can use the following instead:
my ($c, $b) = unpack('a4 a4', $a);
Note: You should avoid using $a and $b as it may interfere with sort and some subs.
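The "more common" bit-operator forms mentioned in the comments above can be sketched as a complete script. The variable names ($hex, $high, $low) are chosen here for illustration only.

```perl
use strict;
use warnings;

my $hex = '002c459f';        # eight hex digits, i.e. a 32-bit value
my $num = hex($hex);         # convert the hex string to a number

my $low  = $num & 0xFFFF;    # keep the lower 16 bits
my $high = $num >> 16;       # shift the upper 16 bits down

# Format each half back as a zero-padded, four-digit hex string.
printf "%04x %04x\n", $high, $low;    # prints "002c 459f"
```

Masking and shifting avoid floating-point division entirely, which is why they are usually preferred over % and int(.../...) for this kind of split.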
The input data is a hex string, so a regular expression can be used to split it into an array of 4-character pieces.
At this point you can use the results as strings, or you can use hex() to convert each hex string into Perl's internal numeric representation.
use strict;
use warnings;
use feature 'say';
my $a = "002c459f"; # value is a string
my($b,$c) = $a =~ /([\da-f]{4})/gi;
say "0x$b 0x$c\tstrings"; # values are strings
$b = hex($b); # convert to digit
$c = hex($c); # convert to digit
printf "0x%04x 0x%04x\tdigits\n", $b, $c;
Output
0x002c 0x459f strings
0x002c 0x459f digits

Perl division gives different result to JavaScript

In Perl I'm dividing two numbers, so:
838041641/908376077
This gives a Perl answer of:
0.922571236978976
But if you put it in a calculator (or use JavaScript) and the answer is:
0.9225712369789765
I need it to match the JavaScript answer. Is there any way to achieve this?
Another example is:
838041641/152508066 = 5.49506437908668 (calculator = 5.495064379086677)
It seems you want to round to 16 significant digits. You can use %g for that.
$ perl -le'printf "%.16g\n", 838041641/908376077'
0.9225712369789765
$ perl -le'printf "%.16g\n", 838041641/152508066'
5.495064379086677
Do note that %g uses scientific notation for very large and very small numbers. If that's not acceptable, we could use %e to always get scientific notation, and use string manipulations to correct the format without introducing numerical errors.
sub round_to_16_sig_digits {
my ($n) = @_;
my $str = sprintf("%.15e", $n);
$str =~ /^(-?)(\d)\.(\d+)e(.*)/saa or die "!!!";
my $sign = $1;
$str = $2 . $3;
my $e = $4;
my $decimals = 15 - $e;
if ($decimals > 0) {
$str = ("0" x ($decimals-15)) . $str if $decimals > 15;
substr($str, -$decimals, 1, ".");
$str =~ s/^0+\./0./;
$str =~ s/\.?0*\z//;
} else {
$str .= "0" x (-$decimals);
}
return $sign . $str;
}
For example,
say round_to_16_sig_digits($_)
for
838041641/908376077,
838041641/152508066;
outputs
0.9225712369789765
5.495064379086677
Note that doubles (used by Perl and JavaScript) have a little less than 16 digits of precision (log10(2^53)=15.95...), so JavaScript is inventing information that doesn't actually exist, which is why Perl rounds to 15 digits of precision.
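To see the difference between the 15 digits Perl prints by default and the 16 digits requested above, the two precisions can be compared side by side. This is a small sketch; sig_digits is a hypothetical helper name, and it relies only on sprintf's %.*g form.

```perl
use strict;
use warnings;

# Format $n with the given number of significant digits.
sub sig_digits {
    my ($n, $digits) = @_;
    return sprintf '%.*g', $digits, $n;
}

my $q = 838041641 / 908376077;
print sig_digits($q, 15), "\n";    # 0.922571236978976  (what print shows)
print sig_digits($q, 16), "\n";    # 0.9225712369789765 (matches JavaScript)
```

The underlying double is the same in both cases; only the number of digits shown changes.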

Perl printf to use commas as thousands-separator

Using awk, I can print a number with commas as thousands separators.
(with an export LC_ALL=en_US.UTF-8 beforehand).
awk 'BEGIN{printf("%\047d\n", 24500)}'
24,500
I expected the same format to work with Perl, but it does not:
perl -e 'printf("%\047d\n", 24500)'
%'d
The Perl Cookbook offers this solution:
sub commify {
my $text = reverse $_[0];
$text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
return scalar reverse $text;
}
However I am assuming that since the printf option works in awk, it should also work in Perl.
The apostrophe format modifier is a POSIX extension; it is not part of the standard C printf.
The documentation for Perl's printf has this to say about such extensions
Perl does its own "sprintf" formatting: it emulates the C
function sprintf(3), but doesn't use it except for
floating-point numbers, and even then only standard modifiers
are allowed. Non-standard extensions in your local sprintf(3)
are therefore unavailable from Perl.
The Number::Format module will do this for you, and it takes its default settings from the locale, so is as portable as it can be
use strict;
use warnings 'all';
use v5.10.1;
use Number::Format 'format_number';
say format_number(24500);
output
24,500
A more perl-ish solution:
$a = 12345678; # no comment
$b = reverse $a; # $b = '87654321';
@c = unpack("(A3)*", $b); # @c = ('876', '543', '21');
$d = join ',', @c; # $d = '876,543,21';
$e = reverse $d; # $e = '12,345,678';
print $e;
outputs 12,345,678.
I realize this question was from almost 4 years ago, but since it comes up in searches, I'll add an elegant native Perl solution I came up with. I was originally searching for a way to do it with sprintf, but everything I've found indicates that it can't be done. Then since everyone is rolling their own, I thought I'd give it a go, and this is my solution.
$num = 12345678912345; # however many digits you want
while($num =~ s/(\d+)(\d\d\d)/$1\,$2/){};
print $num;
Results in:
12,345,678,912,345
Explanation:
The Regex does a maximal digit search for all leading digits. The minimum number of digits in a row it'll act on is 4 (1 plus 3). Then it adds a comma between the two. Next loop if there are still 4 digits at the end (before the comma), it'll add another comma and so on until the pattern doesn't match.
If you need something safe for use with more than 3 digits after the decimal, use this modification: (Note: This won't work if your number has no decimal)
while($num =~ s/(\d+)(\d\d\d)([.,])/$1\,$2$3/){};
This ensures that it only matches a run of digits that ends in a comma (added on a previous loop) or a decimal point.
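One way to avoid choosing between the two loops above is to split off the sign and the fractional part first and run the loop on the integer digits only. The following is a sketch under that assumption; commify is a name picked here, and anything that does not look like a plain decimal number is returned unchanged.

```perl
use strict;
use warnings;

sub commify {
    my ($num) = @_;

    # Separate optional sign, integer digits, and optional fraction.
    my ($sign, $int, $frac) = $num =~ /^([-+]?)(\d+)(\.\d*)?$/
        or return $num;              # not a plain number: leave it alone
    $frac //= '';

    # Same idea as the loops above, applied to the integer part only.
    1 while $int =~ s/^(\d+)(\d{3})/$1,$2/;

    return $sign . $int . $frac;
}

print commify('12345678'), "\n";      # 12,345,678
print commify('-1234.5678'), "\n";    # -1,234.5678
```

Because the fraction never reaches the substitution, it works with or without a decimal point and with any number of decimals.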
Most of these answers assume that the format is universal. It isn't. CLDR uses Unicode information to figure it out. There's a long thread in How to properly localize numbers?.
CPAN has the CLDR::Number module:
#!perl
use v5.10;
use CLDR::Number;
use open qw(:std :utf8);
my $locale = $ARGV[0] // 'en';
my @numbers = qw(
123
12345
1234.56
-90120
);
my $cldr = CLDR::Number->new( locale => $locale );
my $decf = $cldr->decimal_formatter;
foreach my $n ( @numbers ) {
say $decf->format($n);
}
Here are a few runs:
$ perl comma.pl
123
12,345
1,234.56
-90,120
$ perl comma.pl es
123
12.345
1234,56
-90.120
$ perl comma.pl bn
১২৩
১২,৩৪৫
১,২৩৪.৫৬
-৯০,১২০
It seems heavyweight, but the output is correct and you don't have to allow the user to change the locale you want to use. However, when it's time to change the locale, you are ready to go. I also prefer this to Number::Format because I can use a locale that's different from my local settings for my terminal or session, or even use multiple locales:
#!perl
use v5.10;
use CLDR::Number;
use open qw(:std :utf8);
my @locales = qw( en pt bn );
my @numbers = qw(
123
12345
1234.56
-90120
);
my @formatters = map {
my $cldr = CLDR::Number->new( locale => $_ );
my $decf = $cldr->decimal_formatter;
[ $_, $cldr, $decf ];
} @locales;
printf "%10s %10s %10s\n" . '=' x 32 . "\n", @locales;
foreach my $n ( @numbers ) {
printf "%10s %10s %10s\n",
map { $_->[-1]->format($n) } @formatters;
}
The output has three locales at once:
en pt bn
================================
123 123 ১২৩
12,345 12.345 ১২,৩৪৫
1,234.56 1.234,56 ১,২৩৪.৫৬
-90,120 -90.120 -৯০,১২০
Here's an elegant Perl solution I've been using for over 20 years :)
1 while $text =~ s/(.*\d)(\d\d\d)/$1\.$2/g;
And if you then want two decimal places:
$text = sprintf("%0.2f", $text);
One-liner: use a little loop with a regex:
while ($number =~ s/^(\d+)(\d{3})/$1,$2/) {}
Example:
use strict;
use warnings;
my @numbers = (12321, 12.12, 122222.3334, '1234abc', '1.1', '1222333444555,666.77');
for(@numbers) {
my $number = $_;
while ($number =~ s/^(\d+)(\d{3})/$1,$2/) {}
print "$_ -> $number\n";
}
Output:
12321 -> 12,321
12.12 -> 12.12
122222.3334 -> 122,222.3334
1234abc -> 1,234abc
1.1 -> 1.1
1222333444555,666.77 -> 1,222,333,444,555,666.77
Pattern:
(\d+)(\d{3})
-> Take all numbers but the last 3 in group 1
-> Take the remaining 3 numbers in group2 on the beginning of $number
-> Followed is ignored
Substitution
$1,$2
-> Put a separator sign (,) between group 1 and 2
-> The rest remains unchanged
So if you have 12345.67, the digits the regex uses are 12345. The '.' and everything after it is ignored.
1. run (12345.67):
-> matches: 12345
-> group 1: 12,
group 2: 345
-> substitute 12,345
-> result: 12,345.67
2. run (12,345.67):
-> does not match!
-> while breaks.
Starting from @Laura's answer, I tweaked the pure Perl, regex-only solution to work for numbers with decimals too:
while ($formatted_number =~ s/^(-?\d+)(\d{3}(?:,\d{3})*(?:\.\d+)*)$/$1,$2/) {};
Of course this assumes a "," as thousands separator and a "." as decimal separator, but it should be trivial to use variables to account for that for your given locale(s).
I used the following, but it does not work as of perl v5.26.1:
sub format_int
{
my $num = shift;
return reverse(join(",",unpack("(A3)*", reverse int($num))));
}
The form that worked for me was:
sub format_int
{
my $num = shift;
return scalar reverse(join(",",unpack("(A3)*", reverse int($num))));
}
But to use negative numbers the code must be:
sub format_int
{
my $val = shift;
if ( $val >= 0 ) {
return scalar reverse join ",", unpack( "(A3)*", reverse int($val) );
} else {
return "-" . scalar reverse join ",", unpack( "(A3)*", reverse int(-$val) );
}
}
Did somebody say Perl?
perl -pe '1while s/(\d+)(\d{3})/$1,$2/'
This works for any integer.
# turning above answer into a function
sub format_float
# returns number with commas..... and 2 digit decimal
# so format_float(12345.667) returns "12,345.67"
{
my $num = shift;
return reverse(join(",",unpack("(A3)*", reverse int($num)))) . sprintf(".%02d",int(100*(.005+($num - int($num)))));
}
sub format_int
# returns number with commas.....
# so format_int(12345.667) returns "12,345"
{
my $num = shift;
return scalar reverse(join(",",unpack("(A3)*", reverse int($num)))); # scalar: reverse the string even when called in list context
}
I wanted to print numbers in a currency format. If it turned out even, I still wanted a .00 at the end. I used the previous example (ty) and diddled with it a bit more to get this.
sub format_number {
my $num = shift;
my $result;
my $formatted_num = "";
my @temp_array = ();
my $mantissa = "";
if ( $num =~ /\./ ) {
$num = sprintf("%0.02f",$num);
($num,$mantissa) = split(/\./,$num);
$formatted_num = reverse $num;
@temp_array = unpack("(A3)*" , $formatted_num);
$formatted_num = reverse (join ',', @temp_array);
$result = $formatted_num . '.'. $mantissa;
} else {
$formatted_num = reverse $num;
@temp_array = unpack("(A3)*" , $formatted_num);
$formatted_num = reverse (join ',', @temp_array);
$result = $formatted_num . '.00';
}
return $result;
}
# Example call
# ...
printf("some amount = %s\n",format_number $some_amount);
I didn't have the Number library on my default mac OS X perl, and I didn't want to mess with that version or go off installing my own perl on this machine. I guess I would have used the formatter module otherwise.
I still don't actually like the solution all that much, but it does work.
This is good for money, just keep adding lines if you handle hundreds of millions.
sub commify{
my $var = $_[0];
#print "COMMIFY got $var\n"; #DEBUG
$var =~ s/(^\d{1,3})(\d{3})(\.\d\d)$/$1,$2$3/;
$var =~ s/(^\d{1,3})(\d{3})(\d{3})(\.\d\d)$/$1,$2,$3$4/;
$var =~ s/(^\d{1,3})(\d{3})(\d{3})(\d{3})(\.\d\d)$/$1,$2,$3,$4$5/;
$var =~ s/(^\d{1,3})(\d{3})(\d{3})(\d{3})(\d{3})(\.\d\d)$/$1,$2,$3,$4,$5$6/;
#print "COMMIFY made $var\n"; #DEBUG
return $var;
}
A solution that produces a localized output:
# First part - Localization
my ( $thousands_sep, $decimal_point, $negative_sign );
BEGIN {
my ( $l );
use POSIX qw(locale_h);
$l = localeconv();
$thousands_sep = $l->{ 'thousands_sep' };
$decimal_point = $l->{ 'decimal_point' };
$negative_sign = $l->{ 'negative_sign' };
}
# Second part - Number transformation
sub readable_number {
my $val = shift;
#my $thousands_sep = ".";
#my $decimal_point = ",";
#my $negative_sign = "-";
sub _readable_int {
my $val = shift;
# a pinch of PERL magic
return scalar reverse join $thousands_sep, unpack( "(A3)*", reverse $val );
}
my ( $i, $d, $r );
$i = int( $val );
if ( $val >= 0 ) {
$r = _readable_int( $i );
} else {
$r = $negative_sign . _readable_int( -$i );
}
# If there is decimal part append it to the integer result
if ( $val != $i ) {
( undef, $d ) = ( $val =~ /(\d*)\.(\d*)/ );
$r = $r . $decimal_point . $d;
}
return $r;
}
The first part gets the symbols used in the current locale to be used on the second part.
The BEGIN block is used to calculate the symbols only once at the beginning.
If for some reason you cannot use the POSIX locale, you can omit the first part and uncomment the variables in the second part to hardcode the symbols to be used ($thousands_sep, $decimal_point and $negative_sign).
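One detail worth guarding against: depending on the active locale, localeconv() may report no thousands separator at all (the plain "C" locale has none). The sketch below keeps the grouping logic in a small function and falls back to a comma in that case. The name group_digits and the comma fallback are assumptions made for this illustration, not part of the answer above.

```perl
use strict;
use warnings;
use POSIX qw(localeconv);

# Insert $sep between groups of three digits, counting from the right.
sub group_digits {
    my ($digits, $sep) = @_;
    # Reverse the digits so unpack can cut groups starting at the low end.
    my @groups = unpack '(A3)*', scalar reverse $digits;
    # The separator is reversed too, so it reads correctly after the
    # final reverse even if it is longer than one character.
    my $joined = join scalar(reverse $sep), @groups;
    return scalar reverse $joined;
}

my $lc  = localeconv();
my $sep = $lc->{thousands_sep};
# Assumption: use a comma when the locale defines no separator.
$sep = ',' unless defined $sep && length $sep;

print group_digits('1234567', $sep), "\n";
```

Keeping the separator as a parameter also makes the function easy to test without depending on the machine's locale settings.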

Converting to unicode characters in Perl?

I want to convert the text ( Hindi ) to Unicode in Perl. I have searched in CPAN. But, I could not find the exact module/way which I am looking for. Basically, I am looking for something like this.
My Input is:
इस परीक्षण के लिए है
My expected output is:
\u0907\u0938\u0020\u092a\u0930\u0940\u0915\u094d\u0937\u0923\u0020\u0915\u0947\u0020\u0932\u093f\u090f\u0020\u0939\u0948
How to achieve this in Perl?
Give me some suggestions.
Try this
use utf8;
my $str = 'इस परीक्षण के लिए है';
for my $c (split //, $str) {
printf("\\u%04x", ord($c));
}
print "\n";
You don't really need any module to do that. ord for extracting the character code and printf for formatting it as 4-digit, zero-padded hex is more than enough:
use utf8;
my $str = 'इस परीक्षण के लिए है';
(my $u_encoded = $str) =~ s/(.)/sprintf "\\u%04x", ord($1)/sge;
# \u0907\u0938\u0020\u092a\u0930\u0940\u0915\u094d\u0937\u0923\u0020\u0915\u0947\u0020\u0932\u093f\u090f\u0020\u0939\u0948
Because I left a few comments on how the other answers might fall short of the expectations of various tools, I'd like to share a solution that encodes characters outside of the Basic Multilingual Plane as pairs of two escapes: "😃" would become \ud83d\ude03.
This is done by:
Encoding the string as UTF-16, without a byte order mark. We explicitly choose an endianness. Here, we arbitrarily use the big-endian form. This produces a string of octets (“bytes”), where two octets form one UTF-16 code unit, and two or four octets represent a Unicode code point.
This is done for convenience and performance; we could just as well determine the numeric values of the UTF-16 code units ourselves.
Unpacking the resulting binary string into 16-bit integers which represent each UTF-16 code unit. We have to respect the correct endianness, so we use the n* pattern for unpack (i.e. 16-bit big-endian unsigned integer).
Formatting each code unit as an \uxxxx escape.
As a Perl subroutine, this would look like
use strict;
use warnings;
use Encode ();
sub unicode_escape {
my ($str) = @_;
my $UTF_16BE_octets = Encode::encode("UTF-16BE", $str);
my @code_units = unpack "n*", $UTF_16BE_octets;
return join '', map { sprintf "\\u%04x", $_ } @code_units;
}
Test cases:
use Test::More tests => 3;
use utf8;
is unicode_escape(''), '',
'empty string is empty string';
is unicode_escape("\N{SMILING FACE WITH OPEN MOUTH}"), '\ud83d\ude03',
'non-BMP code points are escaped as surrogate halves';
my $input = 'इस परीक्षण के लिए है';
my $output = '\u0907\u0938\u0020\u092a\u0930\u0940\u0915\u094d\u0937\u0923\u0020\u0915\u0947\u0020\u0932\u093f\u090f\u0020\u0939\u0948';
is unicode_escape($input), $output,
'ordinary BMP code points each have a single escape';
If you only want a simple converter, you can use the following filter
perl -CSDA -nle 'printf "\\u%*v04x\n", "\\u",$_'
#or
perl -CSDA -nlE 'printf "\\u%04x",$_ for unpack "U*"'
like:
echo "इस परीक्षण के लिए है" | perl -CSDA -ne 'printf "\\u%*v04x\n", "\\u",$_'
#or
perl -CSDA -ne 'printf "\\u%*v04x\n", "\\u",$_' <<< "इस परीक्षण के लिए है"
prints:
\u0907\u0938\u0020\u092a\u0930\u0940\u0915\u094d\u0937\u0923\u0020\u0915\u0947\u0020\u0932\u093f\u090f\u0020\u0939\u0948\u000a
Unicode with surrogate pairs.
use strict;
use warnings;
use utf8;
use open qw(:std :utf8);
my $str = "if( \N{U+1F42A}+\N{U+1F410} == \N{U+1F41B} ){ \N{U+1F602} = \N{U+1F52B} } # ορισμός ";
print "$str\n";
for my $ch (unpack "U*", $str) {
if( $ch > 0xffff ) {
my $h = ($ch - 0x10000) / 0x400 + 0xD800;
my $l = ($ch - 0x10000) % 0x400 + 0xDC00;
printf "\\u%04x\\u%04x", $h, $l;
}
else {
printf "\\u%04x", $ch;
}
}
print "\n";
prints
if( 🐪+🐐 == 🐛 ){ 😂 = 🔫 } # ορισμός
\u0069\u0066\u0028\u0020\ud83d\udc2a\u002b\ud83d\udc10\u0020\u003d\u003d\u0020\ud83d\udc1b\u0020\u0029\u007b\u0020\ud83d\ude02\u0020\u003d\u0020\ud83d\udd2b\u0020\u007d\u0020\u0023\u0020\u03bf\u03c1\u03b9\u03c3\u03bc\u03cc\u03c2\u0020
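The surrogate arithmetic used in the answers above can be checked in isolation. This sketch uses integer shift and mask operations rather than division and modulo; surrogate_pair is a hypothetical helper name, and it assumes the code point is above U+FFFF.

```perl
use strict;
use warnings;

# Split a code point above U+FFFF into its UTF-16 surrogate pair.
sub surrogate_pair {
    my ($cp) = @_;
    my $v    = $cp - 0x10000;              # 20-bit offset into the upper planes
    my $high = 0xD800 + ($v >> 10);        # top 10 bits
    my $low  = 0xDC00 + ($v & 0x3FF);      # bottom 10 bits
    return ($high, $low);
}

# U+1F603 SMILING FACE WITH OPEN MOUTH
printf "\\u%04x\\u%04x\n", surrogate_pair(0x1F603);    # \ud83d\ude03
```

For U+1F603 the offset is 0xF603, whose top ten bits are 0x3D and bottom ten bits are 0x203, giving 0xD83D and 0xDE03.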

How to generate XOR checksum for a string using perl

I am trying to generate a checksum for an NMEA (GPS protocol) word using Perl.
A sample NMEA word is a string of characters as shown below
$GPGLL,5300.97914,N,00259.98174,E,125926,A*28
The checksum is calculated by taking XOR of all the characters between $ and *. In this sentence the checksum is the character representation of the hexadecimal value 28.
I tried the following:
my $NMEA_word = 'GPGLL,5300.97914,N,00259.98174,E,125926,A';
my $uff = unpack( '%8A*', $NMEA_word );
print "Hexadecimal number: ", uc(sprintf("%x\n", $uff)), "\n";
But it doesn't seem to give a correct value. Please suggest what shall be rectified
my $uff;
$uff ^= $_ for unpack 'C*', 'GPGLL,5300.97914,N,00259.98174,E,125926,A';
printf "Hexadecimal number: \U%x\n", $uff;
__END__
Hexadecimal number: 28
More functionally,
use List::Util 'reduce';
sub checksum {
sprintf '%02X', ord reduce { our $a ^ our $b } split //, shift;
}
print checksum('GPGLL,5300.97914,N,00259.98174,E,125926,A'), "\n";
The unpack facility to generate a checksum adds the field values together, whereas you want them XORed.
This program will do what you ask.
use strict;
use warnings;
my $NMEA_word = 'GPGLL,5300.97914,N,00259.98174,E,125926,A';
printf "Hexadecimal number: %s\n", checksum($NMEA_word);
sub checksum {
my ($string) = @_;
my $v = 0;
$v ^= $_ for unpack 'C*', $string;
sprintf '%02X', $v;
}
output
Hexadecimal number: 28
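The same XOR loop can also be used the other way round, to validate a complete sentence that already carries its checksum. The following is a minimal sketch: nmea_checksum and nmea_is_valid are names invented here, and the regex assumes the simple form shown in the question ($, payload, *, two hex digits).

```perl
use strict;
use warnings;

# XOR all bytes of the payload and format the result as two hex digits.
sub nmea_checksum {
    my ($payload) = @_;
    my $sum = 0;
    $sum ^= $_ for unpack 'C*', $payload;
    return sprintf '%02X', $sum;
}

# True if the checksum after '*' matches the payload between '$' and '*'.
sub nmea_is_valid {
    my ($sentence) = @_;
    my ($payload, $given) = $sentence =~ /^\$([^*]*)\*([0-9A-Fa-f]{2})\s*$/
        or return 0;                 # not in the expected $...*hh form
    return nmea_checksum($payload) eq uc $given;
}

my $sentence = '$GPGLL,5300.97914,N,00259.98174,E,125926,A*28';
print nmea_is_valid($sentence) ? "valid\n" : "invalid\n";    # valid
```

Single quotes matter for the sample sentence: in a double-quoted string Perl would try to interpolate $GPGLL as a variable.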