Perl mantissa differs from other doubles

I'm trying to parse the float 13.8518009935297 with four routines.
The first routine is my own, the second is Mac OS X libc's
strtod, the third is GMP's mpf_get_d(), and the fourth is
Perl's numeric.c:Perl_my_atof2().
I use this snippet to print the mantissa:
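#include <stdint.h>
#include <stdio.h>
/* Field order assumes a little-endian machine whose compiler allocates
   bit-fields starting from the low bits (as GCC and Clang do there). */
union ieee_double {
    struct {
        uint32_t fracl;     /* low 32 bits of the fraction */
        uint32_t frach:20;  /* high 20 bits of the fraction */
        uint32_t exp:11;    /* biased exponent */
        uint32_t sign:1;
    } s;
    double d;
    uint64_t l;
};
union ieee_double l0;
l0.d = ....;
printf("... 0x%x 0x%x\n", (unsigned)l0.s.frach, (unsigned)l0.s.fracl);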
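A more portable way to look at the same fields is to copy the double into a uint64_t with memcpy and pull the fields out with shifts and masks, so the result no longer depends on bit-field layout or byte order. This is a minimal sketch assuming IEEE 754 binary64 doubles; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: the fields of an IEEE 754 binary64 double. */
struct double_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, biased by 1023 */
    uint32_t frach;     /* upper 20 bits of the 52-bit fraction */
    uint32_t fracl;     /* lower 32 bits of the fraction */
};

/* Split a double into its fields using memcpy and shifts, so the result
   does not depend on bit-field layout or byte order. */
struct double_fields split_double(double d) {
    uint64_t bits;
    struct double_fields f;

    assert(sizeof bits == sizeof d);   /* assumes a 64-bit double */
    memcpy(&bits, &d, sizeof bits);    /* well-defined type punning */

    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7ff);
    f.frach    = (uint32_t)((bits >> 32) & 0xfffff);
    f.fracl    = (uint32_t)(bits & 0xffffffffu);
    return f;
}

/* Convenience wrapper: convert a decimal string with strtod, then split it. */
struct double_fields split_decimal(const char *s) {
    return split_double(strtod(s, NULL));
}
```

For "13.8518009935297", a correctly rounding strtod should give exponent 1026 with frach 0xbb41f and fracl 0x4283d21c, the same values as the strtod row below.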
The return values for the four functions are:
my-func : 0xbb41f 0x4283d21b
strtod : 0xbb41f 0x4283d21c
GMP : 0xbb41f 0x4283d21b
perl : 0xbb41f 0x4283d232
The difference between the first three functions is rounding.
However, Perl's mantissa is quite far off.
If I print all four doubles back to strings, I get the
same decimal number each time, so the numbers seem to be equal.
My question:
The differences between my-func, strtod and GMP are due to rounding. But
why is Perl's mantissa so far off, and why does it still end up as the
same decimal number when converted back? The low words differ by 22
(0x232 - 0x21c), so I would expect that to show up in the decimal
fraction. How can I explain this?
Append:
Sorry, I think I figured out the problem:
$r = rand(25);
$t = $p->tokenize_str("$r");
tokenize_str() is my implementation of the conversion from string to double.
However, Perl stringifies "$r" as 13.8518009935297, which is already
truncated. The actual value of $r is different, so when I finally compare
the binary representations of $t and $r, they diverge.
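The effect in the append can be reproduced in C. Perl stringifies a double with 15 significant digits by default (like "%.15g"), which is not enough to tell neighbouring doubles apart, while 17 significant digits always round-trip an IEEE 754 double. A minimal sketch using the 22-ulp gap from the question; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Move a positive, finite double up by n units in the last place by
   incrementing its bit pattern (assumes IEEE 754 binary64). */
double add_ulps(double d, uint64_t n) {
    uint64_t bits;
    assert(d > 0);
    memcpy(&bits, &d, sizeof bits);
    bits += n;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Returns 1 if printing d with `digits` significant digits ("%.*g") and
   reading the text back with strtod gives exactly the same double. */
int round_trips(double d, int digits) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, d);
    return strtod(buf, NULL) == d;
}
```

Two doubles 22 ulps apart print as the same 15-digit string "13.8518009935297", but as different 17-digit strings, so comparing the stringified value with the original is not a reliable test.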

Here is some Perl code to answer your question (it assumes a little-endian host with 32-bit unsigned ints, so the low word of the double comes first):
perl -le '($fracl, $frach) = unpack("II", pack "d", 0 + "13.8518009935297");
print sprintf("%d %d 0x%05x 0x%08x", ($frach >> 31) & 1, ($frach >> 20) & 0x7ff, $frach & 0xfffff, $fracl)'
-> 0 1026 0xbb41f 0x4283d21c
Perl gives the same result as strtod. The difference was caused by the mistake you identified in your append.

Related

Writing a custom base64 encoding function in perl

I'm trying to learn Perl by writing a custom base64 encoding function, but so far without success. This is what I have come up with; it doesn't work, and I have no idea how to proceed.
sub base64($) {
    # Split string into single bits
    my $bitstring = unpack("B*", $_[0]);
    # Pack bits in pieces of six bits at a time
    my @splitsixs = unpack("(A6)*", $bitstring);
    my @enc = ("A".."Z", "a".."z", "0".."9", "+", "/");
    # For each piece of six bits, convert them to integer, and take the corresponding place in @enc.
    my @s = map { $enc[pack("B6", $_)] } @splitsixs;
    join "", @s;
}
Can someone explain to me what I am doing wrong in this conversion? (Please leave aside for now the fact that I'm not considering padding.)
I finally made it! I was erroneously trying to index elements of @enc directly with packed bytes, when I should have converted them into integers first.
You can see this in the lines below.
Here is the entire function, padding included, in the hope that it might be useful to others.
sub base64($) {
    # Split string into single bits
    my $bitstring = unpack("B*", $_[0]);
    # Pack bits in pieces of six bits at a time
    my @sixs = unpack("(A6)*", $bitstring);
    # Compute the amount of zero padding necessary to obtain a 6-aligned bitstring
    my $padding = ((6 - (length $sixs[-1]) % 6) % 6);
    $sixs[-1] = join "", ($sixs[-1], "0" x $padding);
    # Array of mapping from pieces to encodings
    my @enc = ("A".."Z", "a".."z", "0".."9", "+", "/");
    # Unpack bit strings into integers
    @sixs = map { unpack("c", pack("b6", join "", reverse(split "", $_))) } @sixs;
    # For each integer take the corresponding place in @enc.
    my @s = map { $enc[$_] } @sixs;
    # Concatenate string adding necessary padding
    join "", (@s, "=" x ($padding / 2));
}
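For comparison, here is the same encoding sketched in C. Instead of building a string of '0'/'1' characters, it treats three bytes at a time as one 24-bit integer and slices it into four 6-bit indices, which avoids the bit-string-to-integer step that tripped up the Perl version. The padding rule matches what the Perl code computes: one leftover byte gives two characters plus "==", two leftover bytes give three characters plus "=". The function and table names are hypothetical; in real Perl code, the core MIME::Base64 module's encode_base64 already does this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same alphabet and order as the Perl @enc array. */
static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Base64-encode len bytes of in into out and NUL-terminate it.
   out must have room for 4 * ((len + 2) / 3) + 1 characters. */
void base64_encode(const unsigned char *in, size_t len, char *out) {
    size_t i;

    assert(in != NULL || len == 0);
    for (i = 0; i + 3 <= len; i += 3) {
        /* three bytes -> one 24-bit group -> four 6-bit indices */
        unsigned long g = ((unsigned long)in[i] << 16)
                        | ((unsigned long)in[i + 1] << 8)
                        | (unsigned long)in[i + 2];
        *out++ = B64_ALPHABET[(g >> 18) & 63];
        *out++ = B64_ALPHABET[(g >> 12) & 63];
        *out++ = B64_ALPHABET[(g >> 6) & 63];
        *out++ = B64_ALPHABET[g & 63];
    }
    if (len - i == 1) {
        /* 8 leftover bits, zero-padded to 12: two characters plus "==" */
        unsigned long g = (unsigned long)in[i] << 16;
        *out++ = B64_ALPHABET[(g >> 18) & 63];
        *out++ = B64_ALPHABET[(g >> 12) & 63];
        *out++ = '=';
        *out++ = '=';
    } else if (len - i == 2) {
        /* 16 leftover bits, zero-padded to 18: three characters plus "=" */
        unsigned long g = ((unsigned long)in[i] << 16)
                        | ((unsigned long)in[i + 1] << 8);
        *out++ = B64_ALPHABET[(g >> 18) & 63];
        *out++ = B64_ALPHABET[(g >> 12) & 63];
        *out++ = B64_ALPHABET[(g >> 6) & 63];
        *out++ = '=';
    }
    *out = '\0';
}
```

For example, "Man" encodes to "TWFu", "Ma" to "TWE=" and "M" to "TQ==".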

How does Perl store integers in-memory?

say pack "A*", "asdf"; # Prints "asdf"
say pack "s", 0x41 * 256 + 0x42; # Prints "BA" (0x41 = 'A', 0x42 = 'B')
The first line makes sense: you're taking an ASCII encoded string, packing it into a string as an ASCII string. In the second line, the packed form is "\x42\x41" because short integers are little-endian on my machine.
However, I can't shake the feeling that somehow, I should be able to treat the packed string from the second line as a number, since that's how (I assume) Perl stores numbers, as little-endian sequence of bytes. Is there a way to do so without unpacking it? I'm trying to get the correct mental model for the thing that pack() returns.
For instance, in C, I can do this:
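#include <stdio.h>
int main(void) {
    char c[2];
    short *x = (short *)c;  /* cast needed; formally breaks strict aliasing and alignment rules */
    c[0] = 0x42;
    c[1] = 0x41;
    printf("%d\n", *x); // Prints 16706 == 0x41 * 256 + 0x42 (on a little-endian machine)
    return 0;
}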
If you're really interested in how Perl stores data internally, I'd recommend PerlGuts Illustrated. But usually, you don't have to care about stuff like that because Perl doesn't give you access to such low-level details. These internals are only important if you're writing XS extensions in C.
If you want to "cast" a two-byte string to a C short, you can use the unpack function like this:
$ perl -le 'print unpack("s", "BA")'
16706
However, I can't shake the feeling that somehow, I should be able to treat the packed string from the second line as a number,
You need to unpack it first.
To be able to use it as a number in C, you need
char* packed = "\x42\x41";
int16_t int16;
memcpy(&int16, packed, sizeof(int16_t));
To be able to use it as a number in Perl, you need
my $packed = "\x42\x41";
my $num = unpack('s', $packed);
which is basically
use Inline C => <<'__EOI__';
SV* unpack_s(SV* sv) {
STRLEN len;
char* buf;
int16_t int16;
SvGETMAGIC(sv);
buf = SvPVbyte(sv, len);
if (len != sizeof(int16_t))
croak("usage");
Copy(buf, &int16, 1, int16_t);
return newSViv(int16);
}
__EOI__
my $packed = "\x42\x41";
my $num = unpack_s($packed);
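The role of byte order here can be made explicit in C. unpack("s", ...) reads in the host's native order (like the memcpy version above), whereas Perl's "v" and "n" templates always read an unsigned 16-bit value in little-endian and big-endian order respectively. A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Native byte order, like unpack("s", $str): copy the first two bytes. */
int16_t read_native_s16(const char *buf) {
    int16_t v;
    assert(buf != NULL);
    memcpy(&v, buf, sizeof v);
    return v;
}

/* Fixed little-endian order, like unpack("v", $str). */
uint16_t read_le_u16(const char *buf) {
    const unsigned char *p = (const unsigned char *)buf;
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Fixed big-endian ("network") order, like unpack("n", $str). */
uint16_t read_be_u16(const char *buf) {
    const unsigned char *p = (const unsigned char *)buf;
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For the string "BA", the little-endian reading is 16706 and the big-endian reading is 16961; read_native_s16 returns whichever one matches the host.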
since that's how (I assume) perl stores numbers, as little-endian sequence of bytes.
Perl stores numbers in one of the following three fields of a scalar:
IV, a signed integer of size perl -V:ivsize (in bytes).
UV, an unsigned integer of size perl -V:uvsize (in bytes). (ivsize=uvsize)
NV, a floating-point number of size perl -V:nvsize (in bytes).
In all cases, native endianness is used.
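"Native endianness" means the bytes of an IV sit in memory in whatever order the CPU uses, which is also what pack("q", ...) returns on a perl built with 64-bit integers. A small C sketch (hypothetical helper names) that shows the byte order of a 64-bit integer on the current host:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Copy the in-memory bytes of a native 64-bit integer into out[0..7]. */
void native_bytes_i64(int64_t v, unsigned char out[8]) {
    assert(sizeof v == 8);
    memcpy(out, &v, sizeof v);
}
```

On a little-endian host, the value 0x4142 is stored as 0x42 0x41 followed by six zero bytes; on a big-endian host the order is reversed.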
I'm trying to get the correct mental model for the thing that pack() returns.
pack is used to construct "binary data" for interfacing with external APIs.
I see pack as a serialization function. It takes as input Perl values, and outputs a serialized form. The fact the output serialized form happens to be a Perl bytestring is more of an implementation detail than a core functionality.
As such, all you're really expected to do with the resulting string is feed it to unpack, though the serialized form is convenient for moving data between processes, hosts, or planets.
If you want to read the bytes as a number without unpack, consider vec, which reads 16-bit and wider elements in big-endian order:
say vec "BA", 0, 16; # prints 16961
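With a width of 16 or more bits, vec groups the string into BITS/8-byte chunks and reads each chunk in big-endian order, which is why "BA" gives 16961 (0x4241) here while unpack("s") gives 16706 on a little-endian host. A C sketch of the 16-bit case; the function name is hypothetical, and it simply returns 0 for an element that does not fit entirely inside the string (Perl's exact behaviour at the end of the string is not modelled):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of vec($str, $index, 16): the index-th 16-bit element of the
   string, read in big-endian order. Returns 0 if the element does not lie
   completely inside the string. */
unsigned vec16(const char *s, size_t len, size_t index) {
    const unsigned char *p = (const unsigned char *)s;
    size_t off = index * 2;

    assert(s != NULL || len == 0);
    if (off + 2 > len)
        return 0;
    return ((unsigned)p[off] << 8) | p[off + 1];
}
```

For example, vec16("ABCD", 4, 1) reads the bytes "CD" and returns 17220 (0x4344).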
To take a closer look at the string's internal representation, take a look at Devel::Peek, though you're not going to see anything surprising with a pure ASCII string.
use Devel::Peek;
Dump "BA";
SV = PV(0xb42f80) at 0xb56300
REFCNT = 1
FLAGS = (POK,READONLY,pPOK)
PV = 0xb60cc0 "BA"\0
CUR = 2
LEN = 16

implementation of sha-256 in perl

I'm trying hard to implement the SHA-256 algorithm, and I have problems with padding the message. For SHA-256 you have to append one bit to the end of the message, which I have done so far with $message .= (chr 0x80);
The next step should be to fill the empty space (512-bit block) with 0's.
I calculated it with this formula: l+1+k=448-l and then appended the zeros to the message.
Now my problem: in the last 64-bit block I have to append the binary representation of the message length and fill the rest with 0's again. Since Perl handles data types by itself, there is no "byte" data type. How can I figure out which value I should append?
please see also the official specification:
http://csrc.nist.gov/publications/fips/fips180-3/fips180-3_final.pdf
If at all possible, pull something off the shelf. You do not want to roll your own SHA-256 implementation because to get official blessing, you would have to have it certified.
That said, the specification is
5.1.1 SHA-1, SHA-224 and SHA-256
Suppose that the length of the message, M, is l bits. Append the bit 1 to the end of the message, followed by k zero bits, where k is the smallest, non-negative solution to the equation
l + 1 + k ≡ 448 mod 512
Then append the 64-bit block that is equal to the number l expressed using a binary representation. For example, the (8-bit ASCII) message “abc” has length 8 × 3 = 24, so the message is padded with a one bit, then 448 - (24 + 1) = 423 zero bits, and then the message length, to become the 512-bit padded message
01100001 01100010 01100011 1 00…00 00…011000
(“a”, “b”, “c”, then the single 1 bit, then 423 zero bits, then the 64-bit length field holding l = 24)
The length of the padded message should now be a multiple of 512 bits.
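Written out, the smallest non-negative k comes from reducing modulo 512. The first case below is the "abc" example from the specification; the second uses a 56-byte message (l = 448), chosen here only for illustration, where the 1 bit and the length field no longer fit in the first block:

```latex
\begin{aligned}
k &= \bigl(448 - (l + 1)\bigr) \bmod 512, \qquad 0 \le k < 512 \\
l = 24:\quad k &= (448 - 25) \bmod 512 = 423, \qquad 24 + 1 + 423 + 64 = 512 \\
l = 448:\quad k &= (448 - 449) \bmod 512 = 511, \qquad 448 + 1 + 511 + 64 = 1024
\end{aligned}
```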
You might be tempted to use vec because it allows you to address single bits, but you would have to work around funky addressing.
If bits is 4 or less, the string is broken into bytes, then the bits of each byte are broken into 8/BITS groups. Bits of a byte are numbered in a little-endian-ish way, as in 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80. For example, breaking the single input byte chr(0x36) into two groups gives a list (0x6, 0x3); breaking it into 4 groups gives (0x2, 0x1, 0x3, 0x0).
Instead, a pack template of B* specifies
A bit string (descending bit order inside each byte).
and N
An unsigned long (32-bit) in "network" (big-endian) order.
The latter is useful for assembling the message length. Although pack has a Q parameter for quad, the result is in the native order (on Perl 5.10 and later, the > modifier, as in Q>, forces big-endian order).
Start with a bit of prep work
our($UPPER32BITS,$LOWER32BITS);
BEGIN {
use Config;
die "$0: $^X not configured for 64-bit ints"
unless $Config{use64bitint};
# create non-portable 64-bit masks as constants
no warnings "portable";
*UPPER32BITS = \0xffff_ffff_0000_0000;
*LOWER32BITS = \0x0000_0000_ffff_ffff;
}
Then you can define pad_message as
sub pad_message {
    use bytes;
    my($msg) = @_;
    my $l = bytes::length($msg) * 8;
    # smallest non-negative k with l + 1 + k = 448 (mod 512);
    # Perl's % gives a non-negative result for a positive right operand
    my $k = (448 - ($l + 1)) % 512;
    # append 1 bit followed by $k zero bits
    $msg .= pack "B*", 1 . 0 x $k;
    # add big-endian length
    $msg .= pack "NN", (($l & $UPPER32BITS) >> 32), ($l & $LOWER32BITS);
    die "$0: bad length: ", bytes::length $msg
        if (bytes::length($msg) * 8) % 512;
    $msg;
}
Suppose the code prints the padded message with
my $padded = pad_message "abc";
# break into multiple lines for readability
for (unpack("H*", $padded) =~ /(.{64})/g) {
print $_, "\n";
}
Then the output is
6162638000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000018
which matches the specification.
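For comparison, here is the same padding rule in C, working on whole bytes as pad_message does. The padded size is the smallest multiple of 64 bytes that leaves room for the message, the 0x80 byte and the 8-byte length, which also handles messages of 56 to 63 bytes that spill into a second block. A sketch with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Padded size in bytes: the smallest multiple of 64 that leaves room for
   the message, the 0x80 byte and the 8-byte length field. */
size_t sha256_padded_len(size_t len) {
    return ((len + 8) / 64 + 1) * 64;
}

/* Pad a whole-byte message as in FIPS 180 section 5.1.1: append 0x80
   (the 1 bit plus seven of the k zero bits), zero bytes, and finally
   the message length in bits as a 64-bit big-endian integer.
   out must hold sha256_padded_len(len) bytes; returns that size. */
size_t sha256_pad(const unsigned char *msg, size_t len, unsigned char *out) {
    size_t total = sha256_padded_len(len);
    uint64_t bit_len = (uint64_t)len * 8;
    int i;

    assert(msg != NULL || len == 0);
    if (len > 0)
        memcpy(out, msg, len);
    out[len] = 0x80;
    memset(out + len + 1, 0, total - len - 1 - 8);
    for (i = 0; i < 8; i++)        /* big-endian, like pack "NN" */
        out[total - 1 - i] = (unsigned char)(bit_len >> (8 * i));
    return total;
}
```

For "abc" this yields 64 bytes ending in 0x18, as in the hex dump above; a 56-byte message yields 128 bytes.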
First of all I hope you do this just as an exercise -- there is a Digest module in core that already computes SHA-256 just fine.
Note that $message .= (chr 0x80); appends one byte, not one bit. If you really need bitwise manipulation, take a look at the vec function.
To get the binary representation of an integer, use pack. For a 64-bit big-endian value, do something like
$message .= pack 'Q>', $bit_length;  # $bit_length: 8 * the length of the original, unpadded message
Note that the 'Q' format is only available on perls built with 64-bit integers, and the > modifier needs Perl 5.10 or later; otherwise, concatenate four 0-bytes with a 32-bit big-endian value (pack format N).

format specifier for long double (I want to truncate the 0's after decimal)

I have a 15-digit floating-point number and I need to truncate the trailing zeros after the decimal point. Is there a format specifier for that?
%Lg is probably what you want: see http://developer.apple.com/library/ios/#DOCUMENTATION/System/Conceptual/ManPages_iPhoneOS/man3/printf.3.html.
Unfortunately in C there is no format specifier that seems to meet all of your requirements. %Lg is the closest, but as you noted it switches to scientific notation at its discretion (when the exponent is less than -4 or greater than or equal to the precision). %Lf won't work by itself because it won't remove the trailing zeroes.
What you're going to have to do is print the fixed format number to a buffer and then manually remove the zeroes with string editing (which can STILL be tricky if you have rounding errors and numbers like 123.100000009781).
Is this what you want:
#include <iostream>
#include <iomanip>
int main()
{
double doubleValue = 78998.9878000000000;
std::cout << std::setprecision(15) << doubleValue << std::endl;
}
Output:
78998.9878
Note that trailing zeros after the decimal point are truncated!
Online Demo : http://www.ideone.com/vRFlQ
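You could print the format specifier as a string, filling in the appropriate number of digits if you can determine how many:
char fmt[32];
sprintf(fmt, "%%.%dlf", digits);  /* e.g. builds "%.4lf" */
printf(fmt, number);
or strip the trailing 0 characters afterwards:
#include <stdio.h>
#include <string.h>
/* Strip trailing '0' characters (and a dangling '.') from a fixed-point string. */
void trim_zeros(char *s) {
    size_t i = strlen(s);
    if (strchr(s, '.') == NULL)
        return;                 /* no fraction: keep "100" intact */
    while (i > 0 && s[i - 1] == '0')
        i--;
    if (i > 0 && s[i - 1] == '.')
        i--;
    s[i] = '\0';
}
char buf[64];
sprintf(buf, "%.15lf", 2.123);
trim_zeros(buf);
printf("%s", buf);              /* prints 2.123 */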
%.15g: the 15 is the maximum number of significant digits in the string (not the number of decimal places). For example:
1.012345678900000 => 1.0123456789
12.012345678900000 => 12.0123456789
123.012345678900000 => 123.0123456789
1234.012345678900000 => 1234.0123456789
12345.012345678900000 => 12345.0123456789
123456.012345678900000 => 123456.012345679
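Since the question is about long double, the same idea uses the L length modifier ("%.15Lg"). This sketch (hypothetical helper names) also shows the caveat from the earlier answer: %g drops trailing zeros, but switches to exponent notation for very small or very large magnitudes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a long double with at most `digits` significant digits, like
   printf's "%.15Lg" when digits is 15. Trailing zeros are dropped, but
   %g switches to exponent notation when the decimal exponent is below -4
   or at least `digits`. */
void format_Lg(char *buf, size_t size, long double x, int digits) {
    assert(buf != NULL && size > 0 && digits > 0);
    snprintf(buf, size, "%.*Lg", digits, x);
}

/* 1 if the formatted string used exponent notation, e.g. "1e-05". */
int used_exponent(const char *s) {
    return strchr(s, 'e') != NULL;
}
```

With 15 digits, 78998.9878L prints as "78998.9878", but 0.00001L prints as "1e-05" and 1e20L as "1e+20", so %Lg alone only works if your values stay in a moderate range.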

0, 0e0, 0.0, -0, +0, 000 all mean the same thing to Perl, why?

Just puzzling to me.
Related, but different question:
What does “0 but true” mean in Perl?
Perl doesn't distinguish kinds of numbers. Looking at all of those with a non-CS/programmer eye, they all mean the same thing to me as well: zero. (This is one of the foundations of Perl: it tries to work like people, not like computers. "If it looks like a duck....")
So, if you use them as numbers, they're all the same thing. If you use them as strings, they differ. This does lead to situations where you may need to force one interpretation ("0 but true"; see also "nancy typing"), but by and large it "does the right thing" automatically.
I don't understand, what else should they mean?
You give integer, scientific, floating point, signed integers and octal notations of zero. Why should they differ?
0==0 as everyone, including Larry Wall, knows.
Perl interprets every scalar value as both a string and (potentially) a number. All of those string representations of zero convert to the numeric value 0, according to Perl's conversion rules:
"0", "0.0", "-0", "+0", "000" => Simplest case of straight string to numeric conversion.
"0e0" => Scientific notation for 0 × 10^0, which is also 0; the whole string is a valid number. (In general, numeric context converts only the leading valid numeric characters: "1984abcdef2112" is interpreted numerically as 1984, with an "isn't numeric" warning under use warnings.)
"0 but true" is a special string that evaluates numerically to 0 (and is exempt from that warning), but is true in boolean context, because boolean context tests the string and only "" and "0" are false strings. "0e0" behaves the same way: numerically 0, but true as a boolean.
Perl works in contexts. In string context, they are all different. In numeric context, they are all zero.
print "same string\n" if '0' eq '0.0';              # not printed
print "same number\n" if 0 == 0.0;                  # printed
'0 but true' in boolean context is true:
print "boolean context\n" if '0 but true';          # printed
print "string context\n" if '0 but true' eq '0';    # not printed
print "numeric context\n" if '0 but true' == 0;     # printed
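The "leading numeric characters" rule from the earlier answer resembles C's strtod, which converts the longest numeric prefix and reports where it stopped. This is only an analogy, not Perl's own code: strtod differs in details (for example, it accepts hexadecimal strings such as "0x10", which Perl numifies as 0). The helper names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Convert the longest numeric prefix of s, similar to Perl's numeric
   context; *rest (if not NULL) receives the unconverted remainder. */
double numeric_prefix(const char *s, const char **rest) {
    char *end;
    double v;

    assert(s != NULL);
    v = strtod(s, &end);
    if (rest != NULL)
        *rest = end;
    return v;
}

/* 1 if the entire string was consumed as a number, roughly analogous to
   Perl's check for whether a string "looks like a number". */
int is_whole_number_string(const char *s) {
    const char *rest;
    numeric_prefix(s, &rest);
    return rest != s && *rest == '\0';
}
```

All of "0", "0e0", "0.0", "-0", "+0" and "000" are consumed completely and compare equal to 0.0, while "1984abcdef2112" converts to 1984 and leaves "abcdef2112" unconverted.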