What's the difference between &<< and << operators in Swift? - swift

What's the difference between &<< and << operators in Swift? It seems like they return the same results:
print(2 << 3) // 16
print(2 &<< 3) // 16

The FixedWithInteger protocol defines the “masking left shift operator” &<< as
Returns the result of shifting a value’s binary representation the
specified number of digits to the left, masking the shift amount to
the type’s bit width.
Use the masking left shift operator (&<<) when you need to perform a
shift and are sure that the shift amount is in the range
0..<lhs.bitWidth. Before shifting, the masking left shift operator
masks the shift to this range. The shift is performed using this
masked value.
So the results can differ if the shift amount is larger than or equal to the bit width of the left operand: Example:
print(2 << 70) // 0
print(2 &<< 70) // 128
Here the shift amount (70) is larger than the bitwith of Int (64), so that 2 << 70 evaluates to zero.
In the second line, the number is shifted to the left by 70 % 64 = 6 bits.
There is also a similar “masking right shift operator” &>>. Example:
let x = Int8.min // -128 = 0b10000000
print(Int8.min >> 8) // -1 = 0b11111111
print(Int8.min &>> 8) // -128 = 0b10000000
Here the first result is -1 because shifting a signed integer to the right fills empty positions on the left with the sign bit. The second result is -128 because the shift amount is zero: 8 % 8 = 0.
Naming and intended usage is also described in SR-6749:
The goal of the operator, however, is never to have this wrapping
behavior happen — it's what you use when you have static knowledge
that your shift amount is <= the bitWidth. By masking, you and the
compiler can agree that there's no branch needed, so you get slightly
faster code without zero branching and zero risk of undefined behavior
(which a negative or too large shift would be).
The docs are confusing because they give an example I don't think
anyone would ever intentionally write—relying on the wrapping behavior
to use some out of bounds value for the shift amount.
So using masked shift operators can increase the performance. Many examples can be found in the source code of the Swift standard library, for example in UTF8.swift.

Related

How to use fast division operation without support of division instruction, for a constant divisor? [duplicate]

I've been reading about div and mul assembly operations, and I decided to see them in action by writing a simple program in C:
File division.c
#include <stdlib.h>
#include <stdio.h>
int main()
{
size_t i = 9;
size_t j = i / 5;
printf("%zu\n",j);
return 0;
}
And then generating assembly language code with:
gcc -S division.c -O0 -masm=intel
But looking at generated division.s file, it doesn't contain any div operations! Instead, it does some kind of black magic with bit shifting and magic numbers. Here's a code snippet that computes i/5:
mov rax, QWORD PTR [rbp-16] ; Move i (=9) to RAX
movabs rdx, -3689348814741910323 ; Move some magic number to RDX (?)
mul rdx ; Multiply 9 by magic number
mov rax, rdx ; Take only the upper 64 bits of the result
shr rax, 2 ; Shift these bits 2 places to the right (?)
mov QWORD PTR [rbp-8], rax ; Magically, RAX contains 9/5=1 now,
; so we can assign it to j
What's going on here? Why doesn't GCC use div at all? How does it generate this magic number and why does everything work?
Integer division is one of the slowest arithmetic operations you can perform on a modern processor, with latency up to the dozens of cycles and bad throughput. (For x86, see Agner Fog's instruction tables and microarch guide).
If you know the divisor ahead of time, you can avoid the division by replacing it with a set of other operations (multiplications, additions, and shifts) which have the equivalent effect. Even if several operations are needed, it's often still a heck of a lot faster than the integer division itself.
Implementing the C / operator this way instead of with a multi-instruction sequence involving div is just GCC's default way of doing division by constants. It doesn't require optimizing across operations and doesn't change anything even for debugging. (Using -Os for small code size does get GCC to use div, though.) Using a multiplicative inverse instead of division is like using lea instead of mul and add
As a result, you only tend to see div or idiv in the output if the divisor isn't known at compile-time.
For information on how the compiler generates these sequences, as well as code to let you generate them for yourself (almost certainly unnecessary unless you're working with a braindead compiler), see libdivide.
Dividing by 5 is the same as multiplying 1/5, which is again the same as multiplying by 4/5 and shifting right 2 bits. The value concerned is CCCCCCCCCCCCCCCD in hex, which is the binary representation of 4/5 if put after a hexadecimal point (i.e. the binary for four fifths is 0.110011001100 recurring - see below for why). I think you can take it from here! You might want to check out fixed point arithmetic (though note it's rounded to an integer at the end).
As to why, multiplication is faster than division, and when the divisor is fixed, this is a faster route.
See Reciprocal Multiplication, a tutorial for a detailed writeup about how it works, explaining in terms of fixed-point. It shows how the algorithm for finding the reciprocal works, and how to handle signed division and modulo.
Let's consider for a minute why 0.CCCCCCCC... (hex) or 0.110011001100... binary is 4/5. Divide the binary representation by 4 (shift right 2 places), and we'll get 0.001100110011... which by trivial inspection can be added the original to get 0.111111111111..., which is obviously equal to 1, the same way 0.9999999... in decimal is equal to one. Therefore, we know that x + x/4 = 1, so 5x/4 = 1, x=4/5. This is then represented as CCCCCCCCCCCCD in hex for rounding (as the binary digit beyond the last one present would be a 1).
In general multiplication is much faster than division. So if we can get away with multiplying by the reciprocal instead we can significantly speed up division by a constant
A wrinkle is that we cannot represent the reciprocal exactly (unless the division was by a power of two but in that case we can usually just convert the division to a bit shift). So to ensure correct answers we have to be careful that the error in our reciprocal does not cause errors in our final result.
-3689348814741910323 is 0xCCCCCCCCCCCCCCCD which is a value of just over 4/5 expressed in 0.64 fixed point.
When we multiply a 64 bit integer by a 0.64 fixed point number we get a 64.64 result. We truncate the value to a 64-bit integer (effectively rounding it towards zero) and then perform a further shift which divides by four and again truncates By looking at the bit level it is clear that we can treat both truncations as a single truncation.
This clearly gives us at least an approximation of division by 5 but does it give us an exact answer correctly rounded towards zero?
To get an exact answer the error needs to be small enough not to push the answer over a rounding boundary.
The exact answer to a division by 5 will always have a fractional part of 0, 1/5, 2/5, 3/5 or 4/5 . Therefore a positive error of less than 1/5 in the multiplied and shifted result will never push the result over a rounding boundary.
The error in our constant is (1/5) * 2-64. The value of i is less than 264 so the error after multiplying is less than 1/5. After the division by 4 the error is less than (1/5) * 2−2.
(1/5) * 2−2 < 1/5 so the answer will always be equal to doing an exact division and rounding towards zero.
Unfortunately this doesn't work for all divisors.
If we try to represent 4/7 as a 0.64 fixed point number with rounding away from zero we end up with an error of (6/7) * 2-64. After multiplying by an i value of just under 264 we end up with an error just under 6/7 and after dividing by four we end up with an error of just under 1.5/7 which is greater than 1/7.
So to implement divison by 7 correctly we need to multiply by a 0.65 fixed point number. We can implement that by multiplying by the lower 64 bits of our fixed point number, then adding the original number (this may overflow into the carry bit) then doing a rotate through carry.
Here is link to a document of an algorithm that produces the values and code I see with Visual Studio (in most cases) and that I assume is still used in GCC for division of a variable integer by a constant integer.
http://gmplib.org/~tege/divcnst-pldi94.pdf
In the article, a uword has N bits, a udword has 2N bits, n = numerator = dividend, d = denominator = divisor, ℓ is initially set to ceil(log2(d)), shpre is pre-shift (used before multiply) = e = number of trailing zero bits in d, shpost is post-shift (used after multiply), prec is precision = N - e = N - shpre. The goal is to optimize calculation of n/d using a pre-shift, multiply, and post-shift.
Scroll down to figure 6.2, which defines how a udword multiplier (max size is N+1 bits), is generated, but doesn't clearly explain the process. I'll explain this below.
Figure 4.2 and figure 6.2 show how the multiplier can be reduced to a N bit or less multiplier for most divisors. Equation 4.5 explains how the formula used to deal with N+1 bit multipliers in figure 4.1 and 4.2 was derived.
In the case of modern X86 and other processors, multiply time is fixed, so pre-shift doesn't help on these processors, but it still helps to reduce the multiplier from N+1 bits to N bits. I don't know if GCC or Visual Studio have eliminated pre-shift for X86 targets.
Going back to Figure 6.2. The numerator (dividend) for mlow and mhigh can be larger than a udword only when denominator (divisor) > 2^(N-1) (when ℓ == N => mlow = 2^(2N)), in this case the optimized replacement for n/d is a compare (if n>=d, q = 1, else q = 0), so no multiplier is generated. The initial values of mlow and mhigh will be N+1 bits, and two udword/uword divides can be used to produce each N+1 bit value (mlow or mhigh). Using X86 in 64 bit mode as an example:
; upper 8 bytes of dividend = 2^(ℓ) = (upper part of 2^(N+ℓ))
; lower 8 bytes of dividend for mlow = 0
; lower 8 bytes of dividend for mhigh = 2^(N+ℓ-prec) = 2^(ℓ+shpre) = 2^(ℓ+e)
dividend dq 2 dup(?) ;16 byte dividend
divisor dq 1 dup(?) ; 8 byte divisor
; ...
mov rcx,divisor
mov rdx,0
mov rax,dividend+8 ;upper 8 bytes of dividend
div rcx ;after div, rax == 1
mov rax,dividend ;lower 8 bytes of dividend
div rcx
mov rdx,1 ;rdx:rax = N+1 bit value = 65 bit value
You can test this with GCC. You're already seen how j = i/5 is handled. Take a look at how j = i/7 is handled (which should be the N+1 bit multiplier case).
On most current processors, multiply has a fixed timing, so a pre-shift is not needed. For X86, the end result is a two instruction sequence for most divisors, and a five instruction sequence for divisors like 7 (in order to emulate a N+1 bit multiplier as shown in equation 4.5 and figure 4.2 of the pdf file). Example X86-64 code:
; rbx = dividend, rax = 64 bit (or less) multiplier, rcx = post shift count
; two instruction sequence for most divisors:
mul rbx ;rdx = upper 64 bits of product
shr rdx,cl ;rdx = quotient
;
; five instruction sequence for divisors like 7
; to emulate 65 bit multiplier (rbx = lower 64 bits of multiplier)
mul rbx ;rdx = upper 64 bits of product
sub rbx,rdx ;rbx -= rdx
shr rbx,1 ;rbx >>= 1
add rdx,rbx ;rdx = upper 64 bits of corrected product
shr rdx,cl ;rdx = quotient
; ...
To explain the 5 instruction sequence, a simple 3 instruction sequence could overflow. Let u64() mean upper 64 bits (all that is needed for quotient)
mul rbx ;rdx = u64(dvnd*mplr)
add rdx,rbx ;rdx = u64(dvnd*(2^64 + mplr)), could overflow
shr rdx,cl
To handle this case, cl = post_shift-1. rax = multiplier - 2^64, rbx = dividend. u64() is upper 64 bits. Note that rax = rax<<1 - rax. Quotient is:
u64( ( rbx * (2^64 + rax) )>>(cl+1) )
u64( ( rbx * (2^64 + rax<<1 - rax) )>>(cl+1) )
u64( ( (rbx * 2^64) + (rbx * rax)<<1 - (rbx * rax) )>>(cl+1) )
u64( ( (rbx * 2^64) - (rbx * rax) + (rbx * rax)<<1 )>>(cl+1) )
u64( ( ((rbx * 2^64) - (rbx * rax))>>1) + (rbx*rax) )>>(cl ) )
mul rbx ; (rbx*rax)
sub rbx,rdx ; (rbx*2^64)-(rbx*rax)
shr rbx,1 ;( (rbx*2^64)-(rbx*rax))>>1
add rdx,rbx ;( ((rbx*2^64)-(rbx*rax))>>1)+(rbx*rax)
shr rdx,cl ;((((rbx*2^64)-(rbx*rax))>>1)+(rbx*rax))>>cl
I will answer from a slightly different angle: Because it is allowed to do it.
C and C++ are defined against an abstract machine. The compiler transforms this program in terms of the abstract machine to concrete machine following the as-if rule.
The compiler is allowed to make ANY changes as long as it doesn't change the observable behaviour as specified by the abstract machine. There is no reasonable expectation that the compiler will transform your code in the most straightforward way possible (even when a lot of C programmer assume that). Usually, it does this because the compiler wants to optimize the performance compared to the straightforward approach (as discussed in the other answers at length).
If under any circumstances the compiler "optimizes" a correct program to something that has a different observable behaviour, that is a compiler bug.
Any undefined behaviour in our code (signed integer overflow is a classical example) and this contract is void.

How to do bitwise operation decently?

I'm doing analysis on binary data. Suppose I have two uint8 data values:
a = uint8(0xAB);
b = uint8(0xCD);
I want to take the lower two bits from a, and whole content from b, to make a 10 bit value. In C-style, it should be like:
(a[2:1] << 8) | b
I tried bitget:
bitget(a,2:-1:1)
But this just gave me separate [1, 1] logical type values, which is not a scalar, and cannot be used in the bitshift operation later.
My current solution is:
Make a|b (a or b):
temp1 = bitor(bitshift(uint16(a), 8), uint16(b));
Left shift six bits to get rid of the higher six bits from a:
temp2 = bitshift(temp1, 6);
Right shift six bits to get rid of lower zeros from the previous result:
temp3 = bitshift(temp2, -6);
Putting all these on one line:
result = bitshift(bitshift(bitor(bitshift(uint16(a), 8), uint16(b)), 6), -6);
This is doesn't seem efficient, right? I only want to get (a[2:1] << 8) | b, and it takes a long expression to get the value.
Please let me know if there's well-known solution for this problem.
Since you are using Octave, you can make use of bitpack and bitunpack:
octave> a = bitunpack (uint8 (0xAB))
a =
1 1 0 1 0 1 0 1
octave> B = bitunpack (uint8 (0xCD))
B =
1 0 1 1 0 0 1 1
Once you have them in this form, it's dead easy to do what you want:
octave> [B A(1:2)]
ans =
1 0 1 1 0 0 1 1 1 1
Then simply pad with zeros accordingly and pack it back into an integer:
octave> postpad ([B A(1:2)], 16, false)
ans =
1 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0
octave> bitpack (ans, "uint16")
ans = 973
That or is equivalent to an addition when dealing with integers
result = bitshift(bi2de(bitget(a,1:2)),8) + b;
e.g
a = 01010111
b = 10010010
result = 00000011 100010010
= a[2]*2^9 + a[1]*2^8 + b
an alternative method could be
result = mod(a,2^x)*2^y + b;
where the x is the number of bits you want to extract from a and y is the number of bits of a and b, in your case:
result = mod(a,4)*256 + b;
an extra alternative solution close to the C solution:
result = bitor(bitshift(bitand(a,3), 8), b);
I think it is important to explain exactly what "(a[2:1] << 8) | b" is doing.
In assembly, referencing individual bits is a single operation. Assume all operations take the exact same time and "efficient" a[2:1] starts looking extremely inefficient.
The convenience statement actually does (a & 0x03).
If your compiler actually converts a uint8 to a uint16 based on how much it was shifted, this is not a 'free' operation, per se. Effectively, what your compiler will do is first clear the "memory" to the size of uint16 and then copy "a" into the location. This requires an extra step (clearing the "memory" (register)) that wouldn't normally be needed.
This means your statement actually is (uint16(a & 0x03) << 8) | uint16(b)
Now yes, because you're doing a power of two shift, you could just move a into AH, move b into AL, and AH by 0x03 and move it all out but that's a compiler optimization and not what your C code said to do.
The point is that directly translating that statement into matlab yields
bitor(bitshift(uint16(bitand(a,3)),8),uint16(b))
But, it should be noted that while it is not as TERSE as (a[2:1] << 8) | b, the number of "high level operations" is the same.
Note that all scripting languages are going to be very slow upon initiating each instruction, but will complete said instruction rapidly. The terse nature of Python isn't because "terse is better" but to create simple structures that the language can recognize so it can easily go into vectorized operations mode and start executing code very quickly.
The point here is that you have an "overhead" cost for calling bitand; but when operating on an array it will use SSE and that "overhead" is only paid once. The JIT (just in time) compiler, which optimizes script languages by reducing overhead calls and creating temporary machine code for currently executing sections of code MAY be able to recognize that the type checks for a chain of bitwise operations need only occur on the initial inputs, hence further reducing runtime.
Very high level languages are quite different (and frustrating) from high level languages such as C. You are giving up a large amount of control over code execution for ease of code production; whether matlab actually has implemented uint8 or if it is actually using a double and truncating it, you do not know. A bitwise operation on a native uint8 is extremely fast, but to convert from float to uint8, perform bitwise operation, and convert back is slow. (Historically, Matlab used doubles for everything and only rounded according to what 'type' you specified)
Even now, octave 4.0.3 has a compiled bitshift function that, for bitshift(ones('uint32'),-32) results in it wrapping back to 1. BRILLIANT! VHLL place you at the mercy of the language, it isn't about how terse or how verbose you write the code, it's how the blasted language decides to interpret it and execute machine level code. So instead of shifting, uint32(floor(ones / (2^32))) is actually FASTER and more accurate.

How can I extract a specific bit from a 16-bit register using math ONLY?

I have a 16-bit WORD and I want to read the status of a specific bit or several bits.
I've tried a method that divides the word by the bit that I want, converts the result to two values - an integer and to a real, and compares the two. if they are not equal, then it it equates to false. This appears to only work if i am looking for a bit that the last 'TRUE' bit in the word. If there are any successive TRUE bits, it fails. Perhaps I just haven't done it right. I don't have the ability to use code, just basic math, boolean operations, and type conversion. Any ideas? I hope this isn't a dumb question but i have a feeling it is.
eg:
WORD 0010000100100100 = 9348
I want to know the value of bit 2. how can i determine it from 9348?
There are many ways, depending on what operations you can use. It appears you don't have much to choose from. But this should work, using just integer division and multiplication, and a test for equality.
(psuedocode):
x = 9348 (binary 0010000100100100, bit 0 = 0, bit 1 = 0, bit 2 = 1, ...)
x = x / 4 (now x is 1000010010010000
y = (x / 2) * 2 (y is 0000010010010000)
if (x == y) {
(bit 2 must have been 0)
} else {
(bit 2 must have been 1)
}
Every time you divide by 2, you move the bits to the left one position (in your big endian representation). Every time you multiply by 2, you move the bits to the right one position. Odd numbers will have 1 in the least significant position. Even numbers will have 0 in the least significant position. If you divide an odd number by 2 in integer math, and then multiply by 2, you loose the odd bit if there was one. So the idea above is to first move the bit you want to know about into the least significant position. Then, divide by 2 and then multiply by two. If the result is the same as what you had before, then there must have been a 0 in the bit you care about. If the result is not the same as what you had before, then there must have been a 1 in the bit you care about.
Having explained the idea, we can simplify to
((x / 8) * 2) <> (x / 4)
which will resolve to true if the bit was set, and false if the bit was not set.
AND the word with a mask [1].
In your example, you're interested in the second bit, so the mask (in binary) is
00000010. (Which is 2 in decimal.)
In binary, your word 9348 is 0010010010000100 [2]
0010010010000100 (your word)
AND 0000000000000010 (mask)
----------------
0000000000000000 (result of ANDing your word and the mask)
Because the value is equal to zero, the bit is not set. If it were different to zero, the bit was set.
This technique works for extracting one bit at a time. You can however use it repeatedly with different masks if you're interested in extracting multiple bits.
[1] For more information on masking techniques see http://en.wikipedia.org/wiki/Mask_(computing)
[2] See http://www.binaryhexconverter.com/decimal-to-binary-converter
The nth bit is equal to the word divided by 2^n mod 2
I think you'll have to test each bit, 0 through 15 inclusive.
You could try 9348 AND 4 (equivalent of 1<<2 - index of the bit you wanted)
9348 AND 4
should give 4 if bit is set, 0 if not.
So here is what I have come up with: 3 solutions. One is Hatchet's as proposed above, and his answer helped me immensely with actually understanding HOW this works, which is of utmost importance to me! The proposed AND masking solutions could have worked if my system supports bitwise operators, but it apparently does not.
Original technique:
( ( ( INT ( TAG / BIT ) ) / 2 ) - ( INT ( ( INT ( TAG / BIT ) ) / 2 ) ) <> 0 )
Explanation:
in the first part of the equation, integer division is performed on TAG/BIT, then REAL division by 2. In the second part, integer division is performed TAG/BIT, then integer division again by 2. The difference between these two results is compared to 0. If the difference is not 0, then the formula resolves to TRUE, which means the specified bit is also TRUE.
eg: 9348/4 = 2337 w/ integer division. Then 2337/2 = 1168.5 w/ REAL division but 1168 w/ integer division. 1168.5-1168 <> 0, so the result is TRUE.
My modified technique:
( INT ( TAG / BIT ) / 2 ) <> ( INT ( INT ( TAG / BIT ) / 2 ) )
Explanation:
effectively the same as above, but instead of subtracting the two results and comparing them to 0, I am just comparing the two results themselves. If they are not equal, the formula resolves to TRUE, which means the specified bit is also TRUE.
eg: 9348/4 = 2337 w/ integer division. Then 2337/2 = 1168.5 w/ REAL division but 1168 w/ integer division. 1168.5 <> 1168, so the result is TRUE.
Hatchet's technique as it applies to my system:
( INT ( TAG / BIT )) <> ( INT ( INT ( TAG / BIT ) / 2 ) * 2 )
Explanation:
in the first part of the equation, integer division is performed on TAG/BIT. In the second part, integer division is performed TAG/BIT, then integer division again by 2, then multiplication by 2. The two results are compared. If they are not equal, the formula resolves to TRUE, which means the specified bit is also TRUE.
eg: 9348/4 = 2337. Then 2337/2 = 1168 w/ integer division. Then 1168x2=2336. 2337 <> 2336 so the result is TRUE. As Hatchet stated, this method 'drops the odd bit'.
Note - 9348/4 = 2337 w/ both REAL and integer division, but it is important that these parts of the formula use integer division and not REAL division (12164/32 = 380 w/ integer division and 380.125 w/ REAL division)
I feel it important to note for any future readers that the BIT value in the equations above is not the bit number, but the actual value of the resulting decimal if the bit in the desired position was the only TRUE bit in the binary string (bit 2 = 4 (2^2), bit 6 = 64 (2^6))
This explanation may be a bit too verbatim for some, but may be perfect for others :)
Please feel free to comment/critique/correct me if necessary!
I just needed to resolve an integer status code to a bit state in order to interface with some hardware. Here's a method that works for me:
private bool resolveBitState(int value, int bitNumber)
{
return (value & (1 << (bitNumber - 1))) != 0;
}
I like it, because it's non-iterative, requires no cast operations and essentially translates directly to machine code operations like Shift, And and Comparison, which probably means it's really optimal.
To explain in a little more detail, I'm comparing the bitwise value to a mask for the bit I am interested in (value & mask) using an AND operation. If the bitwise AND operation result is zero, then the bit is not set (return false). If the AND operation result is not zero, then the bit is set (return true). The result of the AND operation is either zero or the value of the bit (1, 2, 4, 8, 16, 32...). Hence the boolean evaluation comparing the AND operation result and 0. The mask is created by taking the number 1 and shifting it left (bit wise), by the appropriate number of binary places (1 << n). The number of places is the number of the bit targeted minus 1. If it's bit #1, I want to shift the 1 left by 0 and if it's #2, I want to shift it left 1 place, etc.
I'm surprised no one rates my solution. It think it's most logical and succinct... and works.

How do I round up a float whose mantissa is ending in 5?

I want to round up a floating-point number. For example, sprintf '%.4f', 0.12345 returns 0.1235 and sprintf '%.4f', 0.12325 returns 0.1232. For the second example, I want it to print out 0.1233, not 0.1232.
sprintf in Perl is not good enough. Math::BigFloat can do it, but it’s a little over-kill.
Does anyone know if there is other effective way to round up, or whether there is any other module in Perl?
The exact way sprintf will round a number is dependent on how the system libraries round numbers. If you need to control exactly how a number is rounded you will need to use a library that implements your desired rounding explicitly, or write a function to round as desired.
For example to round a positive number to an arbitrary precision, where 5 rounds up this would work.
sub round {
my $numer = shift;
my $precision = shift;
return int($numer * 10 ** $precision + 0.5) * 10 ** -$precision;
}
This however doesn't round correctly for negative numbers, -0.12324 incorrectly rounds to -0.1231. A solution where 5 should round up (that is towards positive infinity) would be to use floor instead of int.
use POSIX qw(floor);
sub round {
my $numer = shift;
my $precision = shift;
return floor($numer * 10 ** $precision + 0.5) * 10 ** -$precision;
}
If instead 5 should round to the largest absolute value (that is round away from 0) then you can add a simple check for negative numbers to round them in the correct direction.
sub round {
my $numer = shift;
my $precision = shift;
my $direction = $numer >= 0 ? 0.5 : -0.5;
return int($numer * 10 ** $precision + $direction) * 10 ** -$precision;
}
There are more complex rules for rounding used in some circumstances where the bias from rounding 5 in one direction for all numbers is unacceptable and any (decimal) rounding of floating point numbers is subject to possible errors due to imprecision in their binary format.
Perl's sprintf can only be as good as the data it's fed.
There is no floating point number equal exact to either 0.12345 or 0.12325:
The floating point number closest to the 0.12345 is exactly 2223877495995551/2**54 ≈ 0.123450000000000004, a value slightly greater than the input.
The floating point number closest to the 0.12325 is exactly 4440549232587309/2**55 ≈ 0.123249999999999998, a value slightly less than the input.
Note that the 5th decimal digit of the latter one is 4 not 5.
When rounded to 4 decimal places under the usual rules, they must come out as 0.1235 and 0.1232 respectively.
You say that using Math::BigFloat is a little overkill, but it seems that what you really want is decimal arithmetic so the overkill is inescapable.
This did work for me:
print sprintf '%.*f', 4, 0.12325
Your code does work for me too. Did you test that code on its own?
Could you tell us your perl version too (perl -v)?

hash function providing unique uint from an integer coordinate pair

The problem in general:
I have a big 2d point space, sparsely populated with dots.
Think of it as a big white canvas sprinkled with black dots.
I have to iterate over and search through these dots a lot.
The Canvas (point space) can be huge, bordering on the limits
of int and its size is unknown before setting points in there.
That brought me to the idea of hashing:
Ideal:
I need a hash function taking a 2D point, returning a unique uint32.
So that no collisions can occur. You can assume that the number of
dots on the Canvas is easily countable by uint32.
IMPORTANT: It is impossible to know the size of the canvas beforehand
(it may even change),
so things like
canvaswidth * y + x
are sadly out of the question.
I also tried a very naive
abs(x) + abs(y)
but that produces too many collisions.
Compromise:
A hash function that provides keys with a very low probability of collision.
Cantor's enumeration of pairs
n = ((x + y)*(x + y + 1)/2) + y
might be interesting, as it's closest to your original canvaswidth * y + x but will work for any x or y. But for a real world int32 hash, rather than a mapping of pairs of integers to integers, you're probably better off with a bit manipulation such as Bob Jenkin's mix and calling that with x,y and a salt.
a hash function that is GUARANTEED collision-free is not a hash function :)
Instead of using a hash function, you could consider using binary space partition trees (BSPs) or XY-trees (closely related).
If you want to hash two uint32's into one uint32, do not use things like Y & 0xFFFF because that discards half of the bits. Do something like
(x * 0x1f1f1f1f) ^ y
(you need to transform one of the variables first to make sure the hash function is not commutative)
Like Emil, but handles 16-bit overflows in x in a way that produces fewer collisions, and takes fewer instructions to compute:
hash = ( y << 16 ) ^ x;
You can recursively divide your XY plane into cells, then divide these cells into sub-cells, etc.
Gustavo Niemeyer invented in 2008 his Geohash geocoding system.
Amazon's open source Geo Library computes the hash for any longitude-latitude coordinate. The resulting Geohash value is a 63 bit number. The probability of collision depends of the hash's resolution: if two objects are closer than the intrinsic resolution, the calculated hash will be identical.
Read more:
https://en.wikipedia.org/wiki/Geohash
https://aws.amazon.com/fr/blogs/mobile/geo-library-for-amazon-dynamodb-part-1-table-structure/
https://github.com/awslabs/dynamodb-geo
Your "ideal" is impossible.
You want a mapping (x, y) -> i where x, y, and i are all 32-bit quantities, which is guaranteed not to generate duplicate values of i.
Here's why: suppose there is a function hash() so that hash(x, y) gives different integer values. There are 2^32 (about 4 billion) values for x, and 2^32 values of y. So hash(x, y) has 2^64 (about 16 million trillion) possible results. But there are only 2^32 possible values in a 32-bit int, so the result of hash() won't fit in a 32-bit int.
See also http://en.wikipedia.org/wiki/Counting_argument
Generally, you should always design your data structures to deal with collisions. (Unless your hashes are very long (at least 128 bit), very good (use cryptographic hash functions), and you're feeling lucky).
Perhaps?
hash = ((y & 0xFFFF) << 16) | (x & 0xFFFF);
Works as long as x and y can be stored as 16 bit integers. No idea about how many collisions this causes for larger integers, though. One idea might be to still use this scheme but combine it with a compression scheme, such as taking the modulus of 2^16.
If you can do a = ((y & 0xffff) << 16) | (x & 0xffff) then you could afterward apply a reversible 32-bit mix to a, such as Thomas Wang's
uint32_t hash( uint32_t a)
a = (a ^ 61) ^ (a >> 16);
a = a + (a << 3);
a = a ^ (a >> 4);
a = a * 0x27d4eb2d;
a = a ^ (a >> 15);
return a;
}
That way you get a random-looking result rather than high bits from one dimension and low bits from the other.
You can do
a >= b ? a * a + a + b : a + b * b
taken from here.
That works for points in positive plane. If your coordinates can be in negative axis too, then you will have to do:
A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;
But to restrict the output to uint you will have to keep an upper bound for your inputs. and if so, then it turns out that you know the bounds. In other words in programming its impractical to write a function without having an idea on the integer type your inputs and output can be and if so there definitely will be a lower bound and upper bound for every integer type.
public uint GetHashCode(whatever a, whatever b)
{
if (a > ushort.MaxValue || b > ushort.MaxValue ||
a < ushort.MinValue || b < ushort.MinValue)
{
throw new ArgumentOutOfRangeException();
}
return (uint)(a * short.MaxValue + b); //very good space/speed efficiency
//or whatever your function is.
}
If you want output to be strictly uint for unknown range of inputs, then there will be reasonable amount of collisions depending upon that range. What I would suggest is to have a function that can overflow but unchecked. Emil's solution is great, in C#:
return unchecked((uint)((a & 0xffff) << 16 | (b & 0xffff)));
See Mapping two integers to one, in a unique and deterministic way for a plethora of options..
According to your use case, it might be possible to use a Quadtree and replace points with the string of branch names. It is actually a sparse representation for points and will need a custom Quadtree structure that extends the canvas by adding branches when you add points off the canvas but it avoids collisions and you'll have benefits like quick nearest neighbor searches.
If you're already using languages or platforms that all objects (even primitive ones like integers) has built-in hash functions implemented (Java platform Languages like Java, .NET platform languages like C#. And others like Python, Ruby, etc ).
You may use built-in hashing values as a building block and add your "hashing flavor" in to the mix. Like:
// C# code snippet
public class SomeVerySimplePoint {
public int X;
public int Y;
public override int GetHashCode() {
return ( Y.GetHashCode() << 16 ) ^ X.GetHashCode();
}
}
And also having test cases like "predefined million point set" running against each possible hash generating algorithm comparison for different aspects like, computation time, memory required, key collision count, and edge cases (too big or too small values) may be handy.
the Fibonacci hash works very well for integer pairs
multiplier 0x9E3779B9
other word sizes 1/phi = (sqrt(5)-1)/2 * 2^w round to odd
a1 + a2*multiplier
this will give very different values for close together pairs
I do not know about the result with all pairs