What is the purpose of Float.addProduct in Swift? - swift

What is the purpose of Float.addProduct in Swift? Are the operators Float + Float and Float * Float actually implemented using this as a base function, as in
let zero:Float = 0
func + (a:inout Float, b:Float) {
a.addProduct(1,b)
}
func * (a:inout Float, b:Float) {
0.addProduct(a,b)
}
Are + and * actually implemented using the same processor instruction that is being called with addProduct?

Float.addProduct performs an operation known as "Fused Multiply and Add", or FMA. This computes a*b + c. Importantly, though, it does so without rounding the result of a*b to a representable floating point number. As a result, FMA can result in significantly lower error than directly computing a*b+c, when properly used. It's the sort of thing where, when you need it, you'll know it.
No, regular addition and multiplication are generally not implemented through the operation this function performs; regular addition and multiplication are already available, and FMA can be significantly slower than regular multiplication and addition on some platforms.

Related

Why sometimes Apple Accelerate framework is slow?

I am playing with C and Swift 3.0 code using vecLib and Accelerate framework from Apple as dynamic lib + my code in C lang based project and Swift playground.
And in situation with calling Apple's wrapper from framework of SIMD instruction with 1 or < 4 elements computation function like vvcospif() from framework is slower than simple standart cos(x * PI) when functions calls from loop near 1.000 times as example.
I know about difference between vvcospif() and cos(), I should use exactly vvcospif() for x * PI.
Example in playground, you can just copy code and run it:
import Cocoa
import Accelerate
func cosine_interpolate(alpha: Float, a: Float, b: Float) -> Float {
let ft: Float = alpha * 3.1415927;
let f: Float = (1 - cos(ft)) * 0.5;
return a + f*(b - a);
}
var start: Date = NSDate() as Date
var interp: Float;
for index in 0..<1000 {
interp = cosine_interpolate(alpha: 0.25, a: 1.0, b: 0.75)
}
var end = NSDate();
var timeInterval: Double = end.timeIntervalSince(start);
print("cosine_interpolate in \(timeInterval) seconds")
func fast_cosine_interpolate(alpha: Float, a: Float, b: Float) -> Float {
var x: Float = alpha
var count: Int32 = 1
var result: Float = 0
vvcospif(&result, &x, &count)
let SINSIN_HALF_X: Float = (1 - result) * 0.5;
return a + SINSIN_HALF_X * (b - a);
}
start = NSDate() as Date
for index in 0..<1000 {
interp = fast_cosine_interpolate(alpha: 0.25, a: 1.0, b: 0.75)
}
end = NSDate();
timeInterval = end.timeIntervalSince(start);
print("fast_cosine_interpolate in \(timeInterval) seconds")
My question is:
Why vvcospif() is slow in this example?
May be because vvcospif() it is wrapper under Objective-C runtime and converting data structures / copying of memory from Intel SIMD -> Objective-C -> Swift runtime is slower then tiny cos()?
I also have performance issue with C code +
#include <Accelerate/Accelerate.h>
vvcospif(resultVector, inputVector, &count);
when inputVector and resultVector is small arrays with 1 or 2 elements or just float variable, and calls in loop with ~ 1.000.000 times.
cos(x * PI) computation time near 20 ms.
and
vvcospif(x) with processing one float or float array[2] - computation time near 80 ms! Where is Acceleration? :)
Yes, in Xcode I use compiler -O -whole-module-optimization optimisation with whole module opt. enabled.
For a more detailed discussion with examples, see "Introduction to Fast Bezier (and Trying the Accelerate.framework)".
The first, fundamental problem is that non-inlined function calls are extremely expensive. You don't want function calls if you can possibly help it in performance-critical code. Within a module, the compiler can often inline functions for you, and parts of stdlib can be inlined for you. But when you start crossing module barriers, Swift generally cannot optimize out the call.
The point of SIMD functions is that you set up all your data in the right format, and then call them just one time. That way the cost of the function call is made up by the SIMD optimized code you're calling.
But remember, you don't have to call into Accelerate to get SIMD optimizations. The compiler is perfectly capable of noticing you've written a loop and turning it into an inline SIMD algorithm itself (and it does this all the time). So for many simple problems, the compiler is going to win anyway. Think about it: if calling vvcospif with a count of 1 were faster than calling cos, wouldn't they just implement cos that way?
I haven't played with your code much, but if you want to improve its performance with Accelerate, you want to think about how to arrange all your input data so you can call vvcospif one time with a large N. It's quite possible in that case that it will be much faster that a loop (since cos is not trivial).
If you want an example of Accelerate in practice, and how you need to organize your data, see PinchText. This code is computing offsets for a page full of a few thousand glyphs based on up to 10 touches in real-time, with animations (see PinchText.mov for what the result looks like). In particular look at adjustViewPositions:count:forTouchPoint:. Notice how count is large, and the data is transformed step by step with no loops. Even throwing in a (very expensive) ObjC method call into that method doesn't matter very much because it's only made one time. Getting rid of function calls in loops is a huge part of performance programming.

How to determine if CGFloat is Float or Double [duplicate]

This question already has answers here:
Should conditional compilation be used to cope with difference in CGFloat on different architectures?
(3 answers)
Closed 6 years ago.
Quartz uses CGFloat for its graphics. CGFloat is either Float or Double, depending on the processor.
The Accelerate framework has different variations of the same function.
For example dgetrf_ for Double's and sgetrf_ for Float's.
I have to make these two work together. Either I can use Double's everywhere and convert them to CGFloat every time I use quartz, or I can (try to) determine the actual type of CGFloat and use the appropriate Accelerate function.
Mixing CGFloat's and Double types all over my code base is not very appealing and converting thousands or millions of values to CGFloat every time doesn't strike me as very efficient either.
At this moment I would go with the second option. (Or shouldn't I?)
My question is: how do I know the actual type of CGFloat?
if ??? //pseudo-code: CGFloat is Double
{
dgetrf_(...)
}
else
{
sgetrf_(...)
}
Documentation on Swift Floating-Point Numbers:
Floating-point types can represent a much wider range of values than
integer types, and can store numbers that are much larger or smaller
than can be stored in an Int. Swift provides two signed floating-point
number types:
Double represents a 64-bit floating-point number.
Float represents a 32-bit floating-point number.
You can test using the sizeof function:
if sizeof(CGFloat) == sizeof(Double) {
// CGFloat is a Double
} else {
// CGFloat is a Float
}
Probably the easiest way to deal with this is to use conditional compilation to define a wrapper which will call the proper version:
import Accelerate
func getrf_(__m: UnsafeMutablePointer<__CLPK_integer>,
__n: UnsafeMutablePointer<__CLPK_integer>,
__a: UnsafeMutablePointer<CGFloat>,
__lda: UnsafeMutablePointer<__CLPK_integer>,
__ipiv: UnsafeMutablePointer<__CLPK_integer>,
__info: UnsafeMutablePointer<__CLPK_integer>) -> Int32 {
#if __LP64__ // CGFloat is Double on 64 bit archetecture
return dgetrf_(__m, __n, UnsafeMutablePointer<__CLPK_doublereal>(__a), __lda, __ipiv, __info)
#else
return sgetrf_(__m, __n, UnsafeMutablePointer<__CLPK_real>(__a), __lda, __ipiv, __info)
#endif
}
There is a CGFLOAT_IS_DOUBLE macro defined in Core Graphics. You can use it in Swift for direct comparison:
if CGFLOAT_IS_DOUBLE == 1 {
print("Double")
} else {
print("Float")
}
Of course, direct size comparison is also possible:
if sizeof(CGFloat) == sizeof(Double) {
}
However, since there are overloaded functions for all Float, Double and CGFloat, there is rarely a reason to inspect the size of the type.

Division returns something different in functions

Paste the following code into a playground:
5.0 / 100
func test(anything: Float) -> Float {
return anything / 100
}
test(5.0)
The first line should return 0.05 as expected. The function test returns 0.0500000007450581. Why?
It has nothing to do with functions. Your first example is using type Double which represents floating point numbers more precisely by using 64 bits. If you were to change your second example to:
func test(anything: Double) -> Double {
return anything / 100
}
test(5.0)
You would get the result you expect. Float uses only 32 bits of data, thus it provides a less precise representation of the number. Also, floating point numbers are stored as binary values and frequently are only an approximation of the base 10 representation. That is why 0.05 is showing up as 0.0500000007450581 when stored as a Float.

Double numbers and bitxor [duplicate]

This question already has answers here:
How to use Bitxor for Double Numbers?
(2 answers)
Closed 9 years ago.
I have two matrices a = [120.23, 255.23669877,...] and b = [125.000083, 800.0101010,...] with double numbers in [0, 999]. I want to use bitxor for a and b. I can not use bitxor with round like this:
result = bitxor(round(a(1,j),round(b(1,j))
Because the decimal parts 0.23 and 0.000083 ,... are very important to me. I thought maybe I could do a = a*10^k and b = b*10^k and use bitxor and after that result/10^k (because I want my result's range to also be [0, 999]. But I do not know the maximum length of the number after the decimal point. Does k = 16 support the max range of double numbers in Matlab? Does bitxor support two 19-digit numbers? Is there a better solution?
This is not really an answer, but a very long comment with embedded code. I don't have a current matlab installation, and in any case don't know enough to answer the question in that context. Instead, I've written a Java program that I think may do what you are asking for. It uses two Java classes, BigInteger and BigDecimal. BigInteger is an extended integer format. BigDecimal is the combination of a BigInteger and a decimal scale.
Conversion from double to BigDecimal is exact. Conversion in the opposite direction may require rounding.
The function xor in my program converts each of its operands to BigDecimal. It finds a number of decimal digits to move the decimal point by to make both operands integers. After scaling, it converts to BigInteger, does the actual xor, and converts back to BigDecimal undoing the scaling.
The main point of this is for you to look at the results, and see whether they are what you want, and would be useful to you if you could do the same thing in Matlab. Explaining any ways in which the results are not what you want may help clarify your requirements for the Matlab experts.
Here is some test output. The top and bottom rows of each block are in decimal. The middle row is the scaled integer versions of the inputs, in hex.
Testing operands 1.100000000000000088817841970012523233890533447265625, 2
2f0a689f1b94a78f11d31b7ab806d40b1014d3f6d59 xor 558749db77f70029c77506823d22bd0000000000000 = 7a8d21446c63a7a6d6a61df88524690b1014d3f6d59
1.1 xor 2.0 = 2.8657425494106605
Testing operands 100, 200.0004999999999881765688769519329071044921875
2cd76fe086b93ce2f768a00b22a00000000000 xor 59aeee72a26b59f6380fcf078b92c4478e8a13 = 7579819224d26514cf676f0ca932c4478e8a13
100.0 xor 200.0005 = 261.9771865509636
Testing operands 120.3250000000000028421709430404007434844970703125, 120.75
d2c39898113a28d484dd867220659fbb45005915 xor d3822c338b76bab08df9fee485d1b00000000000 = 141b4ab9a4c926409247896a5b42fbb45005915
120.325 xor 120.75 = 0.7174277813579485
Testing operands 120.2300000000000039790393202565610408782958984375, 120.0000830000000036079654819332063198089599609375
d298ff20fbed5fd091d87e56002df79fc7007cb7 xor d231e5f39e1db18654cb8c43d579692616a16a1f = a91ad365f0ee56c513f215d5549eb9d1a116a8
120.23 xor 120.000083 = 0.37711627930683345
Here is the Java program:
import java.math.BigDecimal;
import java.math.BigInteger;
public class Test {
public static double xor(double a, double b) {
BigDecimal ad = new BigDecimal(a);
BigDecimal bd = new BigDecimal(b);
/*
* Shifting the decimal point right by scale will make both operands
* integers.
*/
int scale = Math.max(ad.scale(), bd.scale());
/*
* Scale both operands by, in effect, multiplying by the same power of 10.
*/
BigDecimal aScaled = ad.movePointRight(scale);
BigDecimal bScaled = bd.movePointRight(scale);
/*
* Convert the operands to integers, treating any rounding as an error.
*/
BigInteger aInt = aScaled.toBigIntegerExact();
BigInteger bInt = bScaled.toBigIntegerExact();
BigInteger resultInt = aInt.xor(bInt);
System.out.println(aInt.toString(16) + " xor " + bInt.toString(16) + " = "
+ resultInt.toString(16));
/*
* Undo the decimal point shift, in effect dividing by the same power of 10
* as was used to scale to integers.
*/
BigDecimal result = new BigDecimal(resultInt, scale);
return result.doubleValue();
}
public static void test(double a, double b) {
System.out.println("Testing operands " + new BigDecimal(a) + ", " + new BigDecimal(b));
double result = xor(a, b);
System.out.println(a + " xor " + b + " = " + result);
System.out.println();
}
public static void main(String arg[])
{
test(1.1, 2.0);
test(100, 200.0005);
test(120.325, 120.75);
test(120.23, 120.000083);
}
}
"But I do not know the max length of number after point ..."
In double precision floating-point you have 15–17 significant decimal digits. If you give bitxor double inputs these must be less than intmax('uint64'): 1.844674407370955e+19. The largest double, realmax (= 1.797693134862316e+308), is much bigger than this, so you can't represent everything in the the way you're using. For example, this means that your value of 800.0101010*10^17 won't work.
If your range is [0, 999], one option is to solve for the largest fractional exponent k and use that: log(double(intmax('uint64'))/999)/log(10) (= 16.266354234268810).

hash function providing unique uint from an integer coordinate pair

The problem in general:
I have a big 2d point space, sparsely populated with dots.
Think of it as a big white canvas sprinkled with black dots.
I have to iterate over and search through these dots a lot.
The Canvas (point space) can be huge, bordering on the limits
of int and its size is unknown before setting points in there.
That brought me to the idea of hashing:
Ideal:
I need a hash function taking a 2D point, returning a unique uint32.
So that no collisions can occur. You can assume that the number of
dots on the Canvas is easily countable by uint32.
IMPORTANT: It is impossible to know the size of the canvas beforehand
(it may even change),
so things like
canvaswidth * y + x
are sadly out of the question.
I also tried a very naive
abs(x) + abs(y)
but that produces too many collisions.
Compromise:
A hash function that provides keys with a very low probability of collision.
Cantor's enumeration of pairs
n = ((x + y)*(x + y + 1)/2) + y
might be interesting, as it's closest to your original canvaswidth * y + x but will work for any x or y. But for a real world int32 hash, rather than a mapping of pairs of integers to integers, you're probably better off with a bit manipulation such as Bob Jenkin's mix and calling that with x,y and a salt.
a hash function that is GUARANTEED collision-free is not a hash function :)
Instead of using a hash function, you could consider using binary space partition trees (BSPs) or XY-trees (closely related).
If you want to hash two uint32's into one uint32, do not use things like Y & 0xFFFF because that discards half of the bits. Do something like
(x * 0x1f1f1f1f) ^ y
(you need to transform one of the variables first to make sure the hash function is not commutative)
Like Emil, but handles 16-bit overflows in x in a way that produces fewer collisions, and takes fewer instructions to compute:
hash = ( y << 16 ) ^ x;
You can recursively divide your XY plane into cells, then divide these cells into sub-cells, etc.
Gustavo Niemeyer invented in 2008 his Geohash geocoding system.
Amazon's open source Geo Library computes the hash for any longitude-latitude coordinate. The resulting Geohash value is a 63 bit number. The probability of collision depends of the hash's resolution: if two objects are closer than the intrinsic resolution, the calculated hash will be identical.
Read more:
https://en.wikipedia.org/wiki/Geohash
https://aws.amazon.com/fr/blogs/mobile/geo-library-for-amazon-dynamodb-part-1-table-structure/
https://github.com/awslabs/dynamodb-geo
Your "ideal" is impossible.
You want a mapping (x, y) -> i where x, y, and i are all 32-bit quantities, which is guaranteed not to generate duplicate values of i.
Here's why: suppose there is a function hash() so that hash(x, y) gives different integer values. There are 2^32 (about 4 billion) values for x, and 2^32 values of y. So hash(x, y) has 2^64 (about 16 million trillion) possible results. But there are only 2^32 possible values in a 32-bit int, so the result of hash() won't fit in a 32-bit int.
See also http://en.wikipedia.org/wiki/Counting_argument
Generally, you should always design your data structures to deal with collisions. (Unless your hashes are very long (at least 128 bit), very good (use cryptographic hash functions), and you're feeling lucky).
Perhaps?
hash = ((y & 0xFFFF) << 16) | (x & 0xFFFF);
Works as long as x and y can be stored as 16 bit integers. No idea about how many collisions this causes for larger integers, though. One idea might be to still use this scheme but combine it with a compression scheme, such as taking the modulus of 2^16.
If you can do a = ((y & 0xffff) << 16) | (x & 0xffff) then you could afterward apply a reversible 32-bit mix to a, such as Thomas Wang's
uint32_t hash( uint32_t a)
a = (a ^ 61) ^ (a >> 16);
a = a + (a << 3);
a = a ^ (a >> 4);
a = a * 0x27d4eb2d;
a = a ^ (a >> 15);
return a;
}
That way you get a random-looking result rather than high bits from one dimension and low bits from the other.
You can do
a >= b ? a * a + a + b : a + b * b
taken from here.
That works for points in positive plane. If your coordinates can be in negative axis too, then you will have to do:
A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;
But to restrict the output to uint you will have to keep an upper bound for your inputs. and if so, then it turns out that you know the bounds. In other words in programming its impractical to write a function without having an idea on the integer type your inputs and output can be and if so there definitely will be a lower bound and upper bound for every integer type.
public uint GetHashCode(whatever a, whatever b)
{
if (a > ushort.MaxValue || b > ushort.MaxValue ||
a < ushort.MinValue || b < ushort.MinValue)
{
throw new ArgumentOutOfRangeException();
}
return (uint)(a * short.MaxValue + b); //very good space/speed efficiency
//or whatever your function is.
}
If you want output to be strictly uint for unknown range of inputs, then there will be reasonable amount of collisions depending upon that range. What I would suggest is to have a function that can overflow but unchecked. Emil's solution is great, in C#:
return unchecked((uint)((a & 0xffff) << 16 | (b & 0xffff)));
See Mapping two integers to one, in a unique and deterministic way for a plethora of options..
According to your use case, it might be possible to use a Quadtree and replace points with the string of branch names. It is actually a sparse representation for points and will need a custom Quadtree structure that extends the canvas by adding branches when you add points off the canvas but it avoids collisions and you'll have benefits like quick nearest neighbor searches.
If you're already using languages or platforms that all objects (even primitive ones like integers) has built-in hash functions implemented (Java platform Languages like Java, .NET platform languages like C#. And others like Python, Ruby, etc ).
You may use built-in hashing values as a building block and add your "hashing flavor" in to the mix. Like:
// C# code snippet
public class SomeVerySimplePoint {
public int X;
public int Y;
public override int GetHashCode() {
return ( Y.GetHashCode() << 16 ) ^ X.GetHashCode();
}
}
And also having test cases like "predefined million point set" running against each possible hash generating algorithm comparison for different aspects like, computation time, memory required, key collision count, and edge cases (too big or too small values) may be handy.
the Fibonacci hash works very well for integer pairs
multiplier 0x9E3779B9
other word sizes 1/phi = (sqrt(5)-1)/2 * 2^w round to odd
a1 + a2*multiplier
this will give very different values for close together pairs
I do not know about the result with all pairs