Determine number of threads for element-wise array addition in Metal

Determine number of threads for element-wise array addition in Metal - swift

In this example there are two large 1D arrays of size n. The arrays are added together element-wise to calculate a 1D results array using the Accelerate vDSP.add() function and a Metal GPU compute kernel adder().
// Size of each array
private let n = 5_000_000
// Create two random arrays of size n
private var array1 = (1...n).map{ _ in Float.random(in: 1...10) }
private var array2 = (1...n).map{ _ in Float.random(in: 1...10) }
// Add two arrays using Accelerate vDSP
addAccel(array1, array2)
// Add two arrays using Metal on the GPU
addMetal(array1, array2)
The Accelerate code is shown below:
import Accelerate
func addAccel(_ arr1: [Float], _ arr2: [Float]) {
let tic = DispatchTime.now().uptimeNanoseconds
// Add two arrays and store results
let y = vDSP.add(arr1, arr2)
// Print out elapsed time
let toc = DispatchTime.now().uptimeNanoseconds
let elapsed = Float(toc - tic) / 1_000_000_000
print("\nAccelerate vDSP elapsed time is \(elapsed) s")
// Print out some results
for i in 0..<3 {
let a1 = String(format: "%.4f", arr1[i])
let a2 = String(format: "%.4f", arr2[i])
let y = String(format: "%.4f", y[i])
print("\(a1) + \(a2) = \(y)")
}
}
The Metal code is shown below:
import MetalKit
private func setupMetal(arr1: [Float], arr2: [Float]) -> (MTLCommandBuffer?, MTLBuffer?) {
// Get the Metal GPU device
let device = MTLCreateSystemDefaultDevice()
// Queue for sending commands to the GPU
let commandQueue = device?.makeCommandQueue()
// Get our Metal GPU function
let gpuFunctionLibrary = device?.makeDefaultLibrary()
let adderGpuFunction = gpuFunctionLibrary?.makeFunction(name: "adder")
var adderComputePipelineState: MTLComputePipelineState!
do {
adderComputePipelineState = try device?.makeComputePipelineState(function: adderGpuFunction!)
} catch {
print(error)
}
// Create the buffers to be sent to the GPU from our arrays
let count = arr1.count
let arr1Buff = device?.makeBuffer(bytes: arr1,
length: MemoryLayout<Float>.size * count,
options: .storageModeShared)
let arr2Buff = device?.makeBuffer(bytes: arr2,
length: MemoryLayout<Float>.size * count,
options: .storageModeShared)
let resultBuff = device?.makeBuffer(length: MemoryLayout<Float>.size * count,
options: .storageModeShared)
// Create a buffer to be sent to the command queue
let commandBuffer = commandQueue?.makeCommandBuffer()
// Create an encoder to set values on the compute function
let commandEncoder = commandBuffer?.makeComputeCommandEncoder()
commandEncoder?.setComputePipelineState(adderComputePipelineState)
// Set the parameters of our GPU function
commandEncoder?.setBuffer(arr1Buff, offset: 0, index: 0)
commandEncoder?.setBuffer(arr2Buff, offset: 0, index: 1)
commandEncoder?.setBuffer(resultBuff, offset: 0, index: 2)
// Figure out how many threads we need to use for our operation
let threadsPerGrid = MTLSize(width: count, height: 1, depth: 1)
let maxThreadsPerThreadgroup = adderComputePipelineState.maxTotalThreadsPerThreadgroup
let threadsPerThreadgroup = MTLSize(width: maxThreadsPerThreadgroup, height: 1, depth: 1)
commandEncoder?.dispatchThreads(threadsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
// Tell the encoder that it is done encoding. Now we can send this off to the GPU.
commandEncoder?.endEncoding()
return (commandBuffer, resultBuff)
}
func addMetal(_ arr1: [Float], _ arr2: [Float]) {
let (commandBuffer, resultBuff) = setupMetal(arr1: arr1, arr2: arr2)
let tic = DispatchTime.now().uptimeNanoseconds
// Push this command to the command queue for processing
commandBuffer?.commit()
// Wait until the GPU function completes before working with any of the data
commandBuffer?.waitUntilCompleted()
// Get the pointer to the beginning of our data
let count = arr1.count
var resultBufferPointer = resultBuff?.contents().bindMemory(to: Float.self, capacity: MemoryLayout<Float>.size * count)
// Print out elapsed time
let toc = DispatchTime.now().uptimeNanoseconds
let elapsed = Float(toc - tic) / 1_000_000_000
print("\nMetal GPU elapsed time is \(elapsed) s")
// Print out the results
for i in 0..<3 {
let a1 = String(format: "%.4f", arr1[i])
let a2 = String(format: "%.4f", arr2[i])
let y = String(format: "%.4f", Float(resultBufferPointer!.pointee))
print("\(a1) + \(a2) = \(y)")
resultBufferPointer = resultBufferPointer?.advanced(by: 1)
}
}
#include <metal_stdlib>
using namespace metal;
kernel void adder(
constant float *array1 [[ buffer(0) ]],
constant float *array2 [[ buffer(1) ]],
device float *result [[ buffer(2) ]],
uint index [[ thread_position_in_grid ]])
{
result[index] = array1[index] + array2[index];
}
Results from running the above code on a 2019 MacBook Pro are given below. Specs for the laptop are 2.6 GHz 6-Core Intel Core i7, 32 GB 2667 MHz DDR4, Intel UHD Graphics 630 1536 MB, and AMD Radeon Pro 5500M.
Accelerate vDSP elapsed time is 0.004532601 s
7.8964 + 6.3815 = 14.2779
9.3661 + 8.9641 = 18.3301
4.5389 + 8.5737 = 13.1126
Metal GPU elapsed time is 0.012219718 s
7.8964 + 6.3815 = 14.2779
9.3661 + 8.9641 = 18.3301
4.5389 + 8.5737 = 13.1126
Based on the elapsed times, the Accelerate function is faster than the Metal compute function. I think this is because I did not properly define the threads. How do I determine the optimum number of threads per grid and threads per thread group for this example?
// Figure out how many threads we need to use for our operation
let threadsPerGrid = MTLSize(width: count, height: 1, depth: 1)
let maxThreadsPerThreadgroup = adderComputePipelineState.maxTotalThreadsPerThreadgroup
let threadsPerThreadgroup = MTLSize(width: maxThreadsPerThreadgroup, height: 1, depth: 1)
commandEncoder?.dispatchThreads(threadsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)

For metal you are measuring time in both computation and data transfer from GPU to CPU and also creating array on CPU.
You should use addcompletedhandler for gpu compution time

Related

Matching Torch STFT with Accelerate

Im trying to re-implement Torch's STFT code in Swift with Accelerate / vDSP, to produce a Log Mel Spectrogram by post processing the STFT so I can use the Mel Spectrogram as an input for a CoreML port of OpenAI's Whisper
Pytorch's native STFT / Mel code produces this Spectrogram (its clipped due to importing raw float 32s into Photoshop lol)
and mine:
Obviously the two things to notice are the values, and the lifted frequency components.
The STFT Docs here https://pytorch.org/docs/stable/generated/torch.stft.html
X[ω,m]=
k=0
∑
win_length-1

window[k] input[m×hop_length+k] * exp(−j * (2π⋅ωk) /win_length)
I believe Im properly handling window[k] input[m×hop_length+k] but I'm a bit lost as to how to calculate the exponent and what -J is referring to in the documentation, and how to convert the final exponential in vDSP. Also, if its a sum, how do I get the 200 elements I need!?
My Log Mel Spectrogram
My code follows:
func processData(audio: [Int16]) -> [Float]
{
assert(self.sampleCount == audio.count)
var audioFloat:[Float] = [Float](repeating: 0, count: audio.count)
vDSP.convertElements(of: audio, to: &audioFloat)
vDSP.divide(audioFloat, 32768.0, result: &audioFloat)
// Up to this point, Python and swift are numerically identical
// insert numFFT/2 samples before and numFFT/2 after so we have a extra numFFT amount to process
// TODO: Is this stricly necessary?
audioFloat.insert(contentsOf: [Float](repeating: 0, count: self.numFFT/2), at: 0)
audioFloat.append(contentsOf: [Float](repeating: 0, count: self.numFFT/2))
// Split Complex arrays holding the FFT results
var allSampleReal = [[Float]](repeating: [Float](repeating: 0, count: self.numFFT/2), count: self.melSampleCount)
var allSampleImaginary = [[Float]](repeating: [Float](repeating: 0, count: self.numFFT/2), count: self.melSampleCount)
// Step 2 - we need to create 200 x 3000 matrix of STFTs - note we appear to want to output complex numbers (?)
for (m) in 0 ..< self.melSampleCount
{
// Slice numFFTs every hop count (barf) and make a mel spectrum out of it
// audioFrame ends up holding split complex numbers
var audioFrame = Array<Float>( audioFloat[ (m * self.hopCount) ..< ( (m * self.hopCount) + self.numFFT) ] )
// Copy of audioFrame original samples
let audioFrameOriginal = audioFrame
assert(audioFrame.count == self.numFFT)
// Split Complex arrays holding a single FFT result of our Audio Frame, which gets appended to the allSample Split Complex arrays
var sampleReal:[Float] = [Float](repeating: 0, count: self.numFFT/2)
var sampleImaginary:[Float] = [Float](repeating: 0, count: self.numFFT/2)
sampleReal.withUnsafeMutableBytes { unsafeReal in
sampleImaginary.withUnsafeMutableBytes { unsafeImaginary in
vDSP.multiply(audioFrame,
hanningWindow,
result: &audioFrame)
var complexSignal = DSPSplitComplex(realp: unsafeReal.bindMemory(to: Float.self).baseAddress!,
imagp: unsafeImaginary.bindMemory(to: Float.self).baseAddress!)
audioFrame.withUnsafeBytes { unsafeAudioBytes in
vDSP.convert(interleavedComplexVector: [DSPComplex](unsafeAudioBytes.bindMemory(to: DSPComplex.self)),
toSplitComplexVector: &complexSignal)
}
// Step 3 - creating the FFT
self.fft.forward(input: complexSignal, output: &complexSignal)
}
}
// We need to match: https://pytorch.org/docs/stable/generated/torch.stft.html
// At this point, I'm unsure how to continue?
// let twoπ = Float.pi * 2
// let freqstep:Float = Float(16000 / (self.numFFT/2))
//
// var w:Float = 0.0
// for (k) in 0 ..< self.numFFT/2
// {
// let j:Float = sampleImaginary[k]
// let sample = audioFrame[k]
//
// let exponent = -j * ( (twoπ * freqstep * Float(k) ) / Float((self.numFFT/2)))
//
// w += powf(sample, exponent)
// }
allSampleReal[m] = sampleReal
allSampleImaginary[m] = sampleImaginary
}
// We now have allSample Split Complex holding 3000 200 dimensional real and imaginary FFT results
// We create flattened 3000 x 200 array of DSPSplitComplex values
var flattnedReal:[Float] = allSampleReal.flatMap { $0 }
var flattnedImaginary:[Float] = allSampleImaginary.flatMap { $0 }

what causes the iteration of an array to conclude prematurely in metal?

What I'm trying to do
I'm testing out metals capability to work with loops. Since I can't define new constants in metal, I'm passing a uint into the buffer and use it to iterate over an array filled with integers. It looks like this in swift.
let array1: [Int] = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
The problem(s)
However when reading the result array buffer in Swift after completing the loop in metal, it seems like not every element has been allocated.
#include <metal_stdlib>
using namespace metal;
kernel void shader(constant int *arr [[ buffer(0) ]],
device int *resultArray [[ buffer(1) ]],
constant uint &iter [[ buffer(2) ]]) // value of 12
{
for (uint i = 0; i < iter; i++){
resultArray[i] = arr[i];
}
}
out
1
2
3
4
5
6
0
0
0
0
0
0
Similarly, using the iterator to set allocate each element of resultArray, yields strange results
for (uint i = 0; i < iter; i++){
resultArray[i] = i;
}
out
4294967296
12884901890
21474836484
30064771078
38654705672
47244640266
0
0
0
0
0
0
Multiplication seems to work
for (uint i = 0; i < iter; i++){
resultArray[i] = arr[i] * i;
}
out
0
4
12
24
40
60
0
0
0
0
0
0
Addition does not
for (uint i = 0; i < iter; i++){
resultArray[i] = arr[i] + i;
}
out
4294967297
12884901892
21474836487
30064771082
38654705677
47244640272
0
0
0
0
0
0
When however, I set iter to a value of for example 24 or higher, it at least iterated over the whole arrays of size 12.
for (uint i = 0; i < iter; i++){ // iter now value of 100
resultArray[i] = arr[i] * iter;
}
100
200
300
400
500
600
100
200
300
400
500
600
What is going on here?
MCVE
yes, it's a lot of code to get a simple loop running in metal, please bare with me
main.swift
import MetalKit
let array1: [Int] = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
func gpuProcess(arr1: [Int]) {
let size = arr1.count // value of 12
// GPU we want to use
let device = MTLCreateSystemDefaultDevice()
// Fifo queue for sending commands to the gpu
let commandQueue = device?.makeCommandQueue()
// The library for getting our metal functions
let gpuFunctionLibrary = device?.makeDefaultLibrary()
// Grab gpu function
let additionGPUFunction = gpuFunctionLibrary?.makeFunction(name: "shader")
var additionComputePipelineState: MTLComputePipelineState!
do {
additionComputePipelineState = try device?.makeComputePipelineState(function: additionGPUFunction!)
} catch {
print(error)
}
// Create buffers to be sent to the gpu from our array
let arr1Buff = device?.makeBuffer(bytes: arr1,
length: MemoryLayout<Int>.size * size ,
options: .storageModeShared)
let resultBuff = device?.makeBuffer(length: MemoryLayout<Int>.size * size,
options: .storageModeShared)
// Create the buffer to be sent to the command queue
let commandBuffer = commandQueue?.makeCommandBuffer()
// Create an encoder to set values on the compute function
let commandEncoder = commandBuffer?.makeComputeCommandEncoder()
commandEncoder?.setComputePipelineState(additionComputePipelineState)
// Set the parameters of our gpu function
commandEncoder?.setBuffer(arr1Buff, offset: 0, index: 0)
commandEncoder?.setBuffer(resultBuff, offset: 0, index: 1)
// Set parameters for our iterator
var count = size
commandEncoder?.setBytes(&count, length: MemoryLayout.size(ofValue: count), index: 2)
// Figure out how many threads we need to use for our operation
let threadsPerGrid = MTLSize(width: 1, height: 1, depth: 1)
let maxThreadsPerThreadgroup = additionComputePipelineState.maxTotalThreadsPerThreadgroup // 1024
let threadsPerThreadgroup = MTLSize(width: maxThreadsPerThreadgroup, height: 1, depth: 1)
commandEncoder?.dispatchThreads(threadsPerGrid,
threadsPerThreadgroup: threadsPerThreadgroup)
// Tell encoder that it is done encoding. Now we can send this off to the gpu.
commandEncoder?.endEncoding()
// Push this command to the command queue for processing
commandBuffer?.commit()
// Wait until the gpu function completes before working with any of the data
commandBuffer?.waitUntilCompleted()
// Get the pointer to the beginning of our data
var resultBufferPointer = resultBuff?.contents().bindMemory(to: Int.self,
capacity: MemoryLayout<Int>.size * size)
// Print out all of our new added together array information
for _ in 0..<size {
print("\(Int(resultBufferPointer!.pointee) as Any)")
resultBufferPointer = resultBufferPointer?.advanced(by: 1)
}
}
// Call function
gpuProcess(arr1: array1)
compute.metal
#include <metal_stdlib>
using namespace metal;
kernel void shader(constant int *arr [[ buffer(0) ]],
device int *resultArray [[ buffer(1) ]],
constant uint &iter [[ buffer(2) ]]) // value of 12
{
for (uint i = 0; i < iter; i++){
resultArray[i] = arr[i] * iter;
}
}

You are using 64 bit Int in Swift and 32 bit integers in MSL. Your GPU threads are also overlapping their work. Instead, use Int32 in Swift and make each thread process their own piece of data. Like this
import MetalKit
let array1: [Int32] = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
func gpuProcess(arr1: [Int32]) {
let size = arr1.count // value of 12
// GPU we want to use
let device = MTLCreateSystemDefaultDevice()
// Fifo queue for sending commands to the gpu
let commandQueue = device?.makeCommandQueue()
// The library for getting our metal functions
let gpuFunctionLibrary = device?.makeDefaultLibrary()
// Grab gpu function
let additionGPUFunction = gpuFunctionLibrary?.makeFunction(name: "shader")
var additionComputePipelineState: MTLComputePipelineState!
do {
additionComputePipelineState = try device?.makeComputePipelineState(function: additionGPUFunction!)
} catch {
print(error)
}
// Create buffers to be sent to the gpu from our array
let arr1Buff = device?.makeBuffer(bytes: arr1,
length: MemoryLayout<Int32>.stride * size ,
options: .storageModeShared)
let resultBuff = device?.makeBuffer(length: MemoryLayout<Int32>.stride * size,
options: .storageModeShared)
// Create the buffer to be sent to the command queue
let commandBuffer = commandQueue?.makeCommandBuffer()
// Create an encoder to set values on the compute function
let commandEncoder = commandBuffer?.makeComputeCommandEncoder()
commandEncoder?.setComputePipelineState(additionComputePipelineState)
// Set the parameters of our gpu function
commandEncoder?.setBuffer(arr1Buff, offset: 0, index: 0)
commandEncoder?.setBuffer(resultBuff, offset: 0, index: 1)
// Set parameters for our iterator
var count = size
commandEncoder?.setBytes(&count, length: MemoryLayout.size(ofValue: count), index: 2)
// Figure out how many threads we need to use for our operation
let threadsPerGrid = MTLSize(width: 1, height: 1, depth: 1)
let maxThreadsPerThreadgroup = additionComputePipelineState.maxTotalThreadsPerThreadgroup // 1024
let threadsPerThreadgroup = MTLSize(width: maxThreadsPerThreadgroup, height: 1, depth: 1)
commandEncoder?.dispatchThreads(threadsPerGrid,
threadsPerThreadgroup: threadsPerThreadgroup)
// Tell encoder that it is done encoding. Now we can send this off to the gpu.
commandEncoder?.endEncoding()
// Push this command to the command queue for processing
commandBuffer?.commit()
// Wait until the gpu function completes before working with any of the data
commandBuffer?.waitUntilCompleted()
// Get the pointer to the beginning of our data
var resultBufferPointer = resultBuff?.contents().bindMemory(to: Int32.self,
capacity: MemoryLayout<Int32>.stride * size)
// Print out all of our new added together array information
for _ in 0..<size {
print("\(Int32(resultBufferPointer!.pointee) as Any)")
resultBufferPointer = resultBufferPointer?.advanced(by: 1)
}
}
// Call function
gpuProcess(arr1: array1)
Kernel:
#include <metal_stdlib>
using namespace metal;
kernel void shader(constant int *arr [[ buffer(0) ]],
device int *resultArray [[ buffer(1) ]],
constant uint &iter [[ buffer(2) ]],
uint gid [[ thread_position_in_grid ]], // this is thread index in grid, since you have height and depth of a dispatch set to 1 in CPU code, you can use 1D `int` here.
)
{
// Early out if gid is out of array boudns
if(gid >= iter)
{
return;
}
// Each thread processes it's own data
resultArray[gid] = arr[gid] * iter;
}
For more information on how to use Metal for compute refer to developer docs and for the information about attributes such as thread_position_in_grid refer to Metal Shading Language specification.

memory issue: realtime eye localization

I'm currently working on a iOS-application, which should be able to detect the localization. I've created an tflite which comprises some CNN layers. In order to use the tflite in XCode/Swift I've created a helper class in which the tflite calculates the output. Whenever I run the predict-function once, it works. But apparently the predict function doesn't work in real-time camera-thread.
After about 7 seconds, XCode is throwing the following error:
Thread 1: EXC_BAD_ACCESS (code=1, address=0x123424001). This error must be evoken by looping through the image. Since I need each pixel value I'm using the solution suggested by Firebase.But apparently this solution is not waterproofed. Can anybody help to resolve this memory issue?
func creatInputForCNN(resizedImage: UIImage?) -> Data{
// In this section of the code I loop through the image (150,200,3)
// in order to fetch each pixel value (RGB).
let image: CGImage = resizedImage.cgImage!
guard let context = CGContext(
data: nil,
width: image.width, height: image.height,
bitsPerComponent: 8, bytesPerRow: image.width * 4,
space: CGColorSpaceCreateDeviceRGB(),
bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
) else {return nil}
context.draw(image, in: CGRect(x: 0, y: 0, width: image.width, height: image.height))
guard let imageData = context.data else {return nil}
let size_w = 150
let size_H = 200
var inputData:Data?
inputData = Data()
for row in 0 ..< size_H{
for col in 0 ..< size_w {
let offset = 4 * (row * context.width + col)
// (Ignore offset 0, the unused alpha channel)
let red = imageData.load(fromByteOffset: offset+1, as: UInt8.self)
let green = imageData.load(fromByteOffset: offset+2, as: UInt8.self)
let blue = imageData.load(fromByteOffset: offset+3, as: UInt8.self)
// Normalize channel values to [0.0, 1.0]. This requirement varies
// by model. For example, some models might require values to be
// normalized to the range [-1.0, 1.0] instead, and others might
// require fixed-point values or the original bytes.
var normalizedRed:Float32 = Float32(red) / 255
var normalizedGreen:Float32 = (Float32(green) / 255
var normalizedBlue:Float32 = Float32(blue) / 255
// Append normalized values to Data object in RGB order.
let elementSize = MemoryLayout.size(ofValue: normalizedRed)
var bytes = [UInt8](repeating: 0, count: elementSize)
memcpy(&bytes, &normalizedRed, elementSize)
inputData!.append(&bytes, count: elementSize)
memcpy(&bytes, &normalizedGreen, elementSize)
inputData!.append(&bytes, count: elementSize)
memcpy(&bytes, &normalizedBlue, elementSize)
inputData!.append(&bytes, count: elementSize)
return inputData
}
This is the code

Perform normalization using Accelerate framework

I need to perform simple math operation on Data that contains RGB pixels data. Currently Im doing this like so:
let imageMean: Float = 127.5
let imageStd: Float = 127.5
let rgbData: Data // Some data containing RGB pixels
let floats = (0..<rgbData.count).map {
(Float(rgbData[$0]) - imageMean) / imageStd
}
return Data(bytes: floats, count: floats.count * MemoryLayout<Float>.size)
This works, but it's too slow. I was hoping I could use the Accelerate framework to calculate this faster, but have no idea how to do this. I reserved some space so that it's not allocated every time this function starts, like so:
inputBufferDataNormalized = malloc(width * height * 3) // 3 channels RGB
I tried few functions, like vDSP_vasm, but I couldn't make it work. Can someone direct me to how to use it? Basically I need to replace this map function, because it takes too long time. And probably it would be great to use pre-allocated space all the time.

Following up on my comment on your other related question. You can use SIMD to parallelize the operation, but you'd need to split the original array into chunks.
This is a simplified example that assumes that the array is exactly divisible by 64, for example, an array of 1024 elements:
let arr: [Float] = (0 ..< 1024).map { _ in Float.random(in: 0...1) }
let imageMean: Float = 127.5
let imageStd: Float = 127.5
var chunks = [SIMD64<Float>]()
chunks.reserveCapacity(arr.count / 64)
for i in stride(from: 0, to: arr.count, by: 64) {
let v = SIMD64.init(arr[i ..< i+64])
chunks.append((v - imageMean) / imageStd) // same calculation using SIMD
}
You can now access each chunk with a subscript:
var results: [Float] = []
results.reserveCapacity(arr.count)
for chunk in chunks {
for i in chunk.indices {
results.append(chunk[i])
}
}
Of course, you'd need to deal with a remainder if the array isn't exactly divisible by 64.

I have found a way to do this using Accelerate. First I reserve space for converted buffer like so
var inputBufferDataRawFloat = [Float](repeating: 0, count: width * height * 3)
Then I can use it like so:
let rawBytes = [UInt8](rgbData)
vDSP_vfltu8(rawBytes, 1, &inputBufferDataRawFloat, 1, vDSP_Length(rawBytes.count))
vDSP.add(inputBufferDataRawScalars.mean, inputBufferDataRawFloat, result: &inputBufferDataRawFloat)
vDSP.multiply(inputBufferDataRawScalars.std, inputBufferDataRawFloat, result: &inputBufferDataRawFloat)
return Data(bytes: inputBufferDataRawFloat, count: inputBufferDataRawFloat.count * MemoryLayout<Float>.size)
Works very fast. Maybe there is better function in Accelerate, if anyone know of it, please let me know. It need to perform function (A[n] + B) * C (or to be exact (A[n] - B) / C but the first one could be converted to this).

Generate random values based on a seed Swift 3

lets say you were making a game and this game has procedurally generated terrain from a seed that the user inputted in his world-creation menu
and this seed generates a set of values that only change if the seed changes
for instance lets say you want to get a random set of integers to generate in a for-loop and pull randomly from an array the set of integers stay the same every time and pull the same items from the array in the exact same order every time you run the for-loop until you change the seed
how would one achieve this in swift
in my code here i can get the terrain to spawn in but it does not generate the same every time which is what i am trying to get it to do
override func didMove(to view: SKView) {
let tile1 = SKTileDefinition(texture: SKTexture(imageNamed: "stone") ,size: CGSize(width: 64, height: 64))
let tile2 = SKTileDefinition(texture: SKTexture(imageNamed: "water") ,size: CGSize(width: 64, height: 64))
let tile3 = SKTileDefinition(texture: SKTexture(imageNamed: "sand") ,size: CGSize(width: 64, height: 64))
let tile4 = SKTileDefinition(texture: SKTexture(imageNamed: "grass") ,size: CGSize(width: 64, height: 64))
let tileGroup1 = SKTileGroup(tileDefinition: tile1)
let tileGroup2 = SKTileGroup(tileDefinition: tile2)
let tileGroup3 = SKTileGroup(tileDefinition: tile3)
let tileGroup4 = SKTileGroup(tileDefinition: tile4)
let tileGroup5 = SKTileGroup(tileDefinition: tile4)
let tileGroup6 = SKTileGroup(tileDefinition: tile4)
let tileSet = SKTileSet(tileGroups: [tileGroup1,tileGroup2,tileGroup3,tileGroup4,tileGroup5,tileGroup6])
let columns = 5
let rows = 5
let tileSize = CGSize(width: 64, height: 64)
//this is another GKNoise class called Noise
let noise = Noise()
let noiseMap = GKNoiseMap(noise, size: vector_double2(10.0,10.0), origin: vector_double2(0.0,0.0), sampleCount: vector_int2(100), seamless: true)
//this is another SKTileMapNode Class called TileMap and a class func called tileMapNodes
let tileMap = TileMap.tileMapNodes(tileSet: tileSet, columns: columns, rows: rows, tileSize: tileSize, from: noiseMap, tileTypeNoiseMapThresholds: [(-1.0 as NSNumber),(+1.0 as NSNumber)])
tileMapNode = tileMap.first!
let seed = Int
for column in 0 ..< tileMapNode.numberOfColumns {
for row in 0 ..< tileMapNode.numberOfRows {
let rand = Int(arc4random_uniform(UInt32(tileSet.tileGroups.count)))
print(rand)
let tile = tileMapNode.tileSet.tileGroups[rand]
tileMapNode.setTileGroup(tile, forColumn: column, row: row)
}
}

If you want to generate a random value based on a specific seed use: srand48 and drand48:
srand allow you to specify the seed:
srand48(230)
Then drand48 will give you a double between 0 and 1:
let doubleValue = drand48()
So in your case, if you want an Int between 0 and 9, you can do something like:
srand48(230)
print("Random number: \(1 + Int((drand48() * 8) + 0.5))")
print("Random number: \(1 + Int((drand48() * 8) + 0.5))")
print("Random number: \(1 + Int((drand48() * 8) + 0.5))")
It will give you the 3 pseudo random numbers until you change the seed
Here is the drand48 applied to your new question:
let seed = srand48(230)
for column in 0 ..< tileMapNode.numberOfColumns {
for row in 0 ..< tileMapNode.numberOfRows {
let rand = Int(Double(tileSet.tileGroups.count) * drand48() - 0.5)
print(rand)
let tile = tileMapNode.tileSet.tileGroups[rand]
tileMapNode.setTileGroup(tile, forColumn: column, row: row)
}
}

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Determine number of threads for element-wise array addition in Metal - swift

For metal you are measuring time in both computation and data transfer from GPU to CPU and also creating array on CPU. You should use addcompletedhandler for gpu compution time

Related

Matching Torch STFT with Accelerate

what causes the iteration of an array to conclude prematurely in metal?

memory issue: realtime eye localization

Perform normalization using Accelerate framework

Generate random values based on a seed Swift 3

Categories

Resources