There is no simd_packed_float3 type in Swift.
Why is this a problem?
Consider this Metal struct:
struct Test{
packed_float3 x;
float y;
};
First, you can't calculate a buffer pointer to the memory of y, because you can't write this:
MemoryLayout<simd_packed_float3>.size
(I'm not sure stride makes sense for packed types, but for simd types it always returns the same value as size on my devices.)
You can't use MemoryLayout<simd_float3>.size either, because it returns 16, not 12, on the architectures available to me for testing.
Second, to write the packed_float3 value of x into the buffer, you have to write three consecutive floats rather than a single simd value. Again, simd_float3 is not usable, because it writes 0 into the fourth word, corrupting the memory of the next member of the struct (y).
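A minimal Swift sketch of the byte layout the Metal struct above expects, written without relying on Swift's own struct layout. It assumes Swift 5.1 or later, where SIMD3<Float> is the standard-library spelling of simd_float3, and it writes into a plain byte array as a stand-in for the Metal buffer. encodeTest is a hypothetical helper name; the same storeBytes calls could target MTLBuffer.contents() directly.

```swift
// Byte layout of the Metal struct Test { packed_float3 x; float y; }.
let packedFloat3Size = 3 * MemoryLayout<Float>.stride   // 12 bytes: x has no padding
let yOffset = packedFloat3Size                            // y starts right after x
let testSize = yOffset + MemoryLayout<Float>.stride       // 16 bytes in total

// SIMD3<Float> (simd_float3) is padded to 16 bytes, so it cannot stand in for packed_float3.
let simdFloat3Size = MemoryLayout<SIMD3<Float>>.size

// Hypothetical helper: writes x one component at a time, then y, at explicit byte offsets.
func encodeTest(x: SIMD3<Float>, y: Float) -> [UInt8] {
    var bytes = [UInt8](repeating: 0, count: testSize)
    bytes.withUnsafeMutableBytes { raw in
        for i in 0..<3 {
            raw.storeBytes(of: x[i], toByteOffset: i * MemoryLayout<Float>.stride, as: Float.self)
        }
        raw.storeBytes(of: y, toByteOffset: yOffset, as: Float.self)
    }
    return bytes
}
```

Writing the three components individually is what keeps the fourth word (y) intact, since no padding word is ever stored.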
So I've done this:
struct Float_3{
var x: Float
var y: Float
var z: Float
}
typealias simd_packed_float3 = Float_3
It seems to work, but I'm not sure it isn't a nasty hack...
What problems might I run into with this approach, and how can I be sure it won't break on a device I don't have?
You can define a packed struct in your bridging header:
struct __attribute__((packed)) PackedFloat3 {
float x;
float y;
float z;
};
MemoryLayout<PackedFloat3>.size == 12
MemoryLayout<PackedFloat3>.stride == 12
By the way, simd_float3 is 16 bytes everywhere; simd types have stricter alignment requirements.
You can also typedef it to packed_float3 under #ifndef __METAL_VERSION__ (Metal already defines packed_float3), so that you have the same spelling in Swift and MSL.
The reason to do this in the bridging header instead of in Swift is that you can use the same structs, with the same spelling, in both shaders and Swift.
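Whichever way the packed type is defined, one way to address the "device I don't have" concern is to check the layout assumptions once at startup, so that a mismatch fails loudly instead of silently corrupting y. This is a sketch using the question's Float_3 struct; checkPackedLayout is a hypothetical name. Swift does not formally document its struct layout, which is why verifying it at runtime is worthwhile; a struct declared in a C bridging header, as above, follows C layout rules instead.

```swift
struct Float_3 {
    var x: Float
    var y: Float
    var z: Float
}

// Returns true when Float_3 matches Metal's packed_float3: 12 bytes, no padding, 4-byte alignment.
func checkPackedLayout() -> Bool {
    return MemoryLayout<Float_3>.size == 12
        && MemoryLayout<Float_3>.stride == 12
        && MemoryLayout<Float_3>.alignment == 4
        && MemoryLayout<Float_3>.offset(of: \Float_3.y) == 4
        && MemoryLayout<Float_3>.offset(of: \Float_3.z) == 8
}

// Call once at startup, e.g. before creating any Metal buffers.
precondition(checkPackedLayout(), "Float_3 no longer matches packed_float3")
```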
I'm answering my own question based on the answers I received on the Swift forums.
It turns out that someone on the Metal team at Apple had already thought of this problem and created the MTLPacked types precisely for types that would otherwise have irregular sizes:
MTLPackedFloat3
MTLPackedFloat4x3
I have a Uniforms struct defined in Swift as:
struct Uniforms {
var t = Float(0.0)
var arr = [0.2, 0.2, 0.2, 0.2, 0.2]
}
However, I cannot allocate a proper MTLBuffer for it, because MemoryLayout<Uniforms>.stride returns 16. This contradicts the statement in the Swift documentation that Array is a value type; MemoryLayout in fact treats it like a reference type.
Long story short: how can I pass a Uniforms structure that contains an array to a shader? (I pass it in the constant address space; that part works fine.) Do I need to pass the array through a separate [[buffer(n)]] argument and copy the array's memory into it? Are there easier options?
Since Swift makes no guarantees about struct layout, it would be dangerous to copy the contents of such a struct into a Metal buffer directly (also, as written, the array contains Doubles, which are not supported by Metal currently anyway). There are a few different approaches that could work, depending on the shape of the real problem.
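To see why MemoryLayout reports 16 for Uniforms: MemoryLayout measures only a value's inline storage. An Array is a value type, but its inline part is a single reference to a heap buffer that holds the elements, so it is pointer-sized no matter how many elements it contains. A small sketch; UniformsWithFloats is a hypothetical variant of the question's struct that uses Float elements, and the byte counts in the comments assume a 64-bit platform.

```swift
struct UniformsWithFloats {
    var t: Float = 0
    var arr: [Float] = [0.2, 0.2, 0.2, 0.2, 0.2]
}

// The array's inline storage is one pointer, independent of its element count.
let arrayInlineSize = MemoryLayout<[Float]>.size
// Float (4 bytes) + 4 bytes of padding + pointer (8 bytes) = 16 on a 64-bit platform.
let uniformsStride = MemoryLayout<UniformsWithFloats>.stride

// The element bytes live in the array's heap buffer; these are what must reach the GPU.
let u = UniformsWithFloats()
let elementByteCount = u.arr.withUnsafeBytes { $0.count }   // 5 elements * 4 bytes
```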
If you know the maximum number of elements in the array, you could add a struct member indicating the actual count, and make the last element of the struct expected by your shader a fixed-length array:
#define MAX_VALUE_COUNT 1024
struct ShaderUniforms {
float t;
uint32_t valueCount;
float values[MAX_VALUE_COUNT];
};
Then, in Swift, you could allocate a Metal buffer of the maximum size (4104 bytes, in this contrived case) and copy however many array elements you need into the buffer (preceded, of course, by the other struct members).
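A sketch of that packing step, writing each member at the byte offset the C struct above implies (t at 0, valueCount at 4, values at 8) instead of relying on a Swift struct's layout. It fills a byte array as a stand-in for the Metal buffer's contents; packUniforms and the offset constants are hypothetical names.

```swift
// Mirrors the shader's ShaderUniforms: float t; uint32_t valueCount; float values[MAX_VALUE_COUNT];
let maxValueCount = 1024
let tOffset = 0
let countOffset = 4
let valuesOffset = 8
let uniformsByteCount = valuesOffset + maxValueCount * MemoryLayout<Float>.stride  // 4104

// Hypothetical helper: packs the uniforms into bytes that could be copied into an MTLBuffer.
func packUniforms(t: Float, values: [Float]) -> [UInt8] {
    precondition(values.count <= maxValueCount, "too many values for the shader's fixed-size array")
    var bytes = [UInt8](repeating: 0, count: uniformsByteCount)
    bytes.withUnsafeMutableBytes { raw in
        raw.storeBytes(of: t, toByteOffset: tOffset, as: Float.self)
        raw.storeBytes(of: UInt32(values.count), toByteOffset: countOffset, as: UInt32.self)
        // Copy only the elements in use; the rest of values[] stays zero.
        values.withUnsafeBytes { src in
            UnsafeMutableRawBufferPointer(rebasing: raw[valuesOffset...]).copyMemory(from: src)
        }
    }
    return bytes
}
```

The shader then reads valueCount to know how many entries of values are meaningful.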
Alternately, yes, you could use a separate buffer parameter of pointer type (e.g., constant float *values [[buffer(1)]]). That would allow you to have a value count that isn't bounded by anything explicitly coded into the shader.
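For the separate-buffer approach, the array's elements are already contiguous Float values, so their bytes can be handed over as-is. With Metal you would typically pass them to MTLDevice's makeBuffer(bytes:length:options:). This is a stdlib-only sketch that just extracts those bytes and their length, assuming the shader declares constant float *values [[buffer(1)]]; valuesBufferBytes is a hypothetical name, and the element count still has to be passed to the shader separately.

```swift
// Hypothetical helper: returns exactly the bytes a `constant float *values` buffer expects.
func valuesBufferBytes(_ values: [Float]) -> [UInt8] {
    // Length in bytes = element count * size of one Float; there is no padding between elements.
    let length = values.count * MemoryLayout<Float>.stride
    return values.withUnsafeBytes { raw in
        precondition(raw.count == length)
        return Array(raw)
    }
}
```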
I am experimenting with C and Swift 3.0 code that uses Apple's vecLib and Accelerate frameworks (as a dynamic library), both in a C-based project and in a Swift playground.
When I call an Accelerate wrapper for SIMD instructions, such as vvcospif(), on just 1 to 3 elements, it is slower than a simple standard cos(x * PI), for example when the function is called about 1,000 times in a loop.
I know the difference between vvcospif() and cos(); I specifically need vvcospif(), which computes cos(x * PI).
Here is a playground example; you can just copy the code and run it:
import Cocoa
import Accelerate
func cosine_interpolate(alpha: Float, a: Float, b: Float) -> Float {
let ft: Float = alpha * 3.1415927;
let f: Float = (1 - cos(ft)) * 0.5;
return a + f*(b - a);
}
var start = Date()
var interp: Float = 0
for _ in 0..<1000 {
interp = cosine_interpolate(alpha: 0.25, a: 1.0, b: 0.75)
}
var end = Date()
var timeInterval: Double = end.timeIntervalSince(start)
print("cosine_interpolate in \(timeInterval) seconds")
func fast_cosine_interpolate(alpha: Float, a: Float, b: Float) -> Float {
var x: Float = alpha
var count: Int32 = 1
var result: Float = 0
vvcospif(&result, &x, &count)
let SINSIN_HALF_X: Float = (1 - result) * 0.5;
return a + SINSIN_HALF_X * (b - a);
}
start = Date()
for _ in 0..<1000 {
interp = fast_cosine_interpolate(alpha: 0.25, a: 1.0, b: 0.75)
}
end = Date()
timeInterval = end.timeIntervalSince(start)
print("fast_cosine_interpolate in \(timeInterval) seconds")
My question is:
Why is vvcospif() slow in this example?
Maybe it's because vvcospif() is a wrapper over the Objective-C runtime, and converting data structures / copying memory from Intel SIMD -> Objective-C -> Swift runtime is slower than the tiny cos()?
I also have a performance issue in C code that uses
#include <Accelerate/Accelerate.h>
vvcospif(resultVector, inputVector, &count);
when inputVector and resultVector are small arrays with 1 or 2 elements (or just a single float variable) and the call runs in a loop about 1,000,000 times:
cos(x * PI) takes about 20 ms,
while vvcospif() processing one float or a float array[2] takes about 80 ms! Where is the acceleration? :)
Yes, in Xcode I compile with -O and whole-module optimization (-whole-module-optimization) enabled.
For a more detailed discussion with examples, see "Introduction to Fast Bezier (and Trying the Accelerate.framework)".
The first, fundamental problem is that non-inlined function calls are extremely expensive. You don't want function calls if you can possibly help it in performance-critical code. Within a module, the compiler can often inline functions for you, and parts of stdlib can be inlined for you. But when you start crossing module barriers, Swift generally cannot optimize out the call.
The point of SIMD functions is that you set up all your data in the right format, and then call them just one time. That way the cost of the function call is made up by the SIMD optimized code you're calling.
But remember, you don't have to call into Accelerate to get SIMD optimizations. The compiler is perfectly capable of noticing you've written a loop and turning it into an inline SIMD algorithm itself (and it does this all the time). So for many simple problems, the compiler is going to win anyway. Think about it: if calling vvcospif with a count of 1 were faster than calling cos, wouldn't they just implement cos that way?
I haven't played with your code much, but if you want to improve its performance with Accelerate, think about how to arrange all your input data so that you can call vvcospif one time with a large N. In that case it's quite possible that it will be much faster than a loop (since cos is not trivial).
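A minimal sketch of that batched pattern, assuming all the alpha values can be collected up front: one vvcospif call covers the whole array instead of one call per element. cosineInterpolateBatch is a hypothetical name. The #if canImport(Accelerate) fallback exists only so the sketch also compiles where Accelerate is unavailable (e.g. Linux); its scalar loop is not the Accelerate path.

```swift
import Foundation
#if canImport(Accelerate)
import Accelerate
#endif

// Hypothetical helper: cosine interpolation for many alphas with a single vectorized call.
func cosineInterpolateBatch(alphas: [Float], a: Float, b: Float) -> [Float] {
    var cosines = [Float](repeating: 0, count: alphas.count)
    #if canImport(Accelerate)
    var n = Int32(alphas.count)
    // One call for all N inputs: cosines[i] = cos(pi * alphas[i]).
    vvcospif(&cosines, alphas, &n)
    #else
    // Scalar fallback for platforms without Accelerate.
    for i in alphas.indices { cosines[i] = cos(Float.pi * alphas[i]) }
    #endif
    // Same formula as cosine_interpolate: a + (1 - cos(pi * alpha)) * 0.5 * (b - a).
    return cosines.map { c in a + (1 - c) * 0.5 * (b - a) }
}
```

The per-call overhead is paid once for the whole array, which is where a SIMD routine can pay off.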
If you want an example of Accelerate in practice, and how you need to organize your data, see PinchText. This code is computing offsets for a page full of a few thousand glyphs based on up to 10 touches in real-time, with animations (see PinchText.mov for what the result looks like). In particular look at adjustViewPositions:count:forTouchPoint:. Notice how count is large, and the data is transformed step by step with no loops. Even throwing in a (very expensive) ObjC method call into that method doesn't matter very much because it's only made one time. Getting rid of function calls in loops is a huge part of performance programming.
Quartz uses CGFloat for its graphics. CGFloat is either Float or Double, depending on the processor.
The Accelerate framework has different variations of the same function.
For example, dgetrf_ for Doubles and sgetrf_ for Floats.
I have to make these two work together. Either I can use Doubles everywhere and convert them to CGFloat every time I use Quartz, or I can (try to) determine the actual type of CGFloat and use the appropriate Accelerate function.
Mixing CGFloats and Doubles all over my code base is not very appealing, and converting thousands or millions of values to CGFloat every time doesn't strike me as very efficient either.
At this moment I would go with the second option. (Or shouldn't I?)
My question is: how do I know the actual type of CGFloat?
if ??? //pseudo-code: CGFloat is Double
{
dgetrf_(...)
}
else
{
sgetrf_(...)
}
Documentation on Swift Floating-Point Numbers:
Floating-point types can represent a much wider range of values than integer types, and can store numbers that are much larger or smaller than can be stored in an Int. Swift provides two signed floating-point number types:
Double represents a 64-bit floating-point number.
Float represents a 32-bit floating-point number.
You can test by comparing sizes with MemoryLayout (the sizeof function was removed in Swift 3):
if MemoryLayout<CGFloat>.size == MemoryLayout<Double>.size {
// CGFloat is a Double
} else {
// CGFloat is a Float
}
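Another option is to ask CGFloat which type it wraps. This sketch assumes a toolchain where CGFloat exposes the NativeType typealias (Double on 64-bit platforms, Float on 32-bit ones), as current Apple SDKs and swift-corelibs-foundation do; cgFloatIsDouble is a hypothetical name.

```swift
import Foundation

// CGFloat.NativeType names the underlying type: Double on 64-bit, Float on 32-bit.
let cgFloatIsDouble = CGFloat.NativeType.self == Double.self

if cgFloatIsDouble {
    print("CGFloat is a Double")
} else {
    print("CGFloat is a Float")
}
```

Because this is a runtime check, both branches must still compile, which is fine for calling either dgetrf_ or sgetrf_ after converting pointers.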
Probably the easiest way to deal with this is to use conditional compilation to define a wrapper which will call the proper version:
import Accelerate
func getrf_(__m: UnsafeMutablePointer<__CLPK_integer>,
__n: UnsafeMutablePointer<__CLPK_integer>,
__a: UnsafeMutablePointer<CGFloat>,
__lda: UnsafeMutablePointer<__CLPK_integer>,
__ipiv: UnsafeMutablePointer<__CLPK_integer>,
__info: UnsafeMutablePointer<__CLPK_integer>) -> Int32 {
// Number of matrix elements the LAPACK routine may access (lda * n).
let count = Int(__lda.pointee) * Int(__n.pointee)
#if arch(x86_64) || arch(arm64) // CGFloat is Double on 64-bit architectures
return __a.withMemoryRebound(to: __CLPK_doublereal.self, capacity: count) {
dgetrf_(__m, __n, $0, __lda, __ipiv, __info)
}
#else
return __a.withMemoryRebound(to: __CLPK_real.self, capacity: count) {
sgetrf_(__m, __n, $0, __lda, __ipiv, __info)
}
#endif
}
There is a CGFLOAT_IS_DOUBLE macro defined in Core Graphics. You can use it in Swift for direct comparison:
if CGFLOAT_IS_DOUBLE == 1 {
print("Double")
} else {
print("Float")
}
Of course, a direct size comparison is also possible:
if MemoryLayout<CGFloat>.size == MemoryLayout<Double>.size {
// CGFloat is a Double
}
However, since there are overloaded functions for all Float, Double and CGFloat, there is rarely a reason to inspect the size of the type.
I've looked at various posts on this topic but still have had no luck. Is there a simple way to divide a Double (or Float) by an Int? Here is a simple playground example that returns the error "Double is not convertible to UInt8".
var score:Double = 3.00
var length:Int = 2 // taken from an array's length, so it is an integer, not a decimal or float
var result:Double = (score / length )
Convert the Int to a Double: var result: Double = score / Double(length)
This creates a new Double from the Int (using initializer syntax) before the division is computed.
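The fix from the answer above as a complete playground snippet, using the question's values:

```swift
let score: Double = 3.0
let length: Int = 2   // e.g. the count of some array

// Swift never converts numeric types implicitly, so convert the Int explicitly first.
let result: Double = score / Double(length)
print(result)  // 1.5
```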
Swift doesn't implicitly convert between numeric types, so you cannot mix different types in one arithmetic expression.
You need to convert the operands to the same type before you can divide them.
The easiest way I see to do that is to make the Int a Double.
For an integer literal, you can simply add ".0" to the end so that it is inferred as a Double; for an Int variable such as length, convert it with Double(length).
Also, FYI:
Floats are pretty rarely used, so unless you need them for something specific, it's simpler to stick with Double, Swift's default floating-point type.
I have a CGFloat property, and sometimes I get a return value of type Float64 and sometimes of type Float32. Can I safely store both in a CGFloat?
From the headers:
// CGBase.h
typedef float CGFloat;
// MacTypes.h
typedef float Float32;
typedef double Float64;
So CGFloat and Float32 are both floats, while Float64 is a double, so storing a Float64 in a CGFloat would lose precision.
(Edit to clarify: this is for 32-bit systems such as the iPhone. If you are building for 64-bit, CGFloat is defined as a double.)
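A stdlib-only sketch of the precision loss described above, using Float32 and Float64 (standard-library aliases for Float and Double) directly, so the result does not depend on CGFloat's size on the current platform:

```swift
// Narrowing: a Float64 stored in a 32-bit type is rounded to the nearest Float32.
let wide: Float64 = 0.1
let narrowed = Float32(wide)
let lostPrecision = Float64(narrowed) != wide   // true: 0.1 has no exact Float32 representation

// Widening is always exact: every Float32 value is representable as a Float64.
let f: Float32 = 0.1
let widenedExact = Float32(Float64(f)) == f     // true
```

This is why storing a Float32 in a 64-bit CGFloat is safe, while storing a Float64 in a 32-bit CGFloat may not be.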
It's best practice to always try and store scalar values in the same type as you received them because the precision of scalar types changes with the hardware.
CGFloat isn't guaranteed to be the same size on all current and future hardware. If you substitute another type for it or use it to store another type, your code may break somewhere down the road.
You might gain or lose precision when a new iPhone/iPad comes out or the code might break if you try to port it to Macs.