As part of a hobby project, I'm working on a 2D game engine that will draw each pixel every frame, using a color from a palette.
I am looking for a way to do that while maintaining a reasonable frame rate (60fps being the minimum).
Without any game logic in place, I am updating the values of my pixels with some value from the palette.
I'm currently taking the mod of an index, to (hopefully) prevent the compiler from doing some loop-optimisation it could do with a fixed value.
Below is my (very naive?) implementation of updating the bytes in the pixel array.
On an iPhone 12 Pro, each run of updating all pixel values takes on average 43 ms, while on a simulator running on an M1 Mac, it takes 15 ms. Both are unacceptable, as that would leave no headroom for any additional game logic (which would take far more operations than taking the mod of an Int).
I was planning to look into Metal and set up a surface, but clearly the bottleneck here is the CPU, so if I can optimize this code, I could go for a higher-level framework.
Any suggestions on a performant way to write this many bytes much, much faster (parallelisation is not an option)?
Instruments shows that most of the time is being spent in Swift's IndexingIterator.next() function. Maybe there is a way to reduce the time spent there; there is quite a substantial subtree inside it.
struct BGRA
{
    let blue: UInt8
    let green: UInt8
    let red: UInt8
    let alpha: UInt8
}
let BGRAPallet =
[
    BGRA(blue: 124, green: 124, red: 124, alpha: 0xff),
    BGRA(blue: 252, green: 0, red: 0, alpha: 0xff),
    // ... 62 more values in my code, omitting here for brevity
]
private func test()
{
    let screenWidth: Int = 256
    let screenHeight: Int = 240
    let pixelBufferPtr = UnsafeMutableBufferPointer<BGRA>.allocate(capacity: screenWidth * screenHeight)
    let runCount = 1000
    let start = Date.now
    for _ in 0 ..< runCount
    {
        for index in 0 ..< pixelBufferPtr.count
        {
            pixelBufferPtr[index] = BGRAPallet[index % BGRAPallet.count]
        }
    }
    let elapsed = Date.now.timeIntervalSince(start)
    print("Average time per run: \(Int(elapsed * 1000) / runCount) ms")
}
First of all, I don't believe you're testing an optimized build for two reasons:
You say “This was measured with optimization set to Fastest [-O3].” But the Swift compiler doesn't recognize -O3 as a command-line flag; that flag belongs to the C/C++ compiler. For Swift the flags are -Onone, -O, -Osize, and -Ounchecked.
I ran your code on my M1 Max MacBook Pro in Debug configuration and it reported 15ms. Then I ran it in Release configuration and it reported 0ms. I had to increase the screen size to 2560x2400 (100x the pixels) to get it to report a time of 3ms.
Now, looking at your code, here are some things that stand out:
You're picking a color using BGRAPalette[index % BGRAPalette.count]. Since your palette size is 64, you can say BGRAPalette[index & 0b0011_1111] for the same result. I expected Swift to optimize that for me, but apparently it didn't, because making that change reduced the reported time to 2ms.
Indexing into BGRAPalette incurs a bounds check. You can avoid the bounds check by grabbing an UnsafeBufferPointer for the palette. Adding this optimization reduced the reported time to 1ms.
Here's my version:
public struct BGRA {
    let blue: UInt8
    let green: UInt8
    let red: UInt8
    let alpha: UInt8
}

func rng() -> UInt8 { UInt8.random(in: .min ... .max) }

let BGRAPalette = (0 ..< 64).map { _ in
    BGRA(blue: rng(), green: rng(), red: rng(), alpha: rng())
}

public func test() {
    let screenWidth: Int = 2560
    let screenHeight: Int = 2400
    let pixelCount = screenWidth * screenHeight
    let pixelBuffer = UnsafeMutableBufferPointer<BGRA>.allocate(capacity: pixelCount)
    let runCount = 1000
    let start = SuspendingClock().now
    BGRAPalette.withUnsafeBufferPointer { paletteBuffer in
        for _ in 0 ..< runCount {
            for index in 0 ..< pixelCount {
                pixelBuffer[index] = paletteBuffer[index & 0b0011_1111]
            }
        }
    }
    let end = SuspendingClock().now
    let elapsed = end - start
    let msElapsed = elapsed.components.seconds * 1000 + elapsed.components.attoseconds / 1_000_000_000_000_000
    print("Average time per run: \(msElapsed / Int64(runCount)) ms")
    // return pixelBuffer
}

@main
struct MyMain {
    static func main() {
        test()
    }
}
In addition to the two optimizations I described, I removed the dependency on Foundation (so I could paste the code into Compiler Explorer) and corrected the spelling of 'palette'.
But realistically, even this probably isn't a particularly good test of your fill rate. You didn't say what kind of game you want to write, but given your screen size of 256x240, it's likely to use a tile-based map and sprites. If so, you shouldn't copy a pixel at a time. You can write a blitter that copies blocks of pixels at a time, using CPU instructions that operate on more than 32 bits at once. ARM64 has 128-bit (16-byte) registers.
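To make the blitter idea concrete, here is a hedged sketch (hypothetical `fill` helper, assuming a 4-bytes-per-pixel buffer whose pixel count is a multiple of 4): each 16-byte store writes four pixels at once.

```swift
struct BGRA { var blue, green, red, alpha: UInt8 }

// Hypothetical blitter sketch: fill a pixel buffer 16 bytes (4 pixels) per
// store by writing SIMD4<UInt32> values through a raw buffer pointer.
// Assumes the pixel count is a multiple of 4. Swift 5.7+ permits unaligned
// storeBytes of trivial types.
func fill(_ pixels: UnsafeMutableBufferPointer<BGRA>, with color: BGRA) {
    // Pack the 4 channel bytes into one little-endian UInt32 pixel value.
    let packed = UInt32(color.blue)
        | UInt32(color.green) << 8
        | UInt32(color.red) << 16
        | UInt32(color.alpha) << 24
    let wide = SIMD4<UInt32>(repeating: packed)
    let raw = UnsafeMutableRawBufferPointer(pixels)
    for offset in stride(from: 0, to: raw.count, by: 16) {
        raw.storeBytes(of: wide, toByteOffset: offset, as: SIMD4<UInt32>.self)
    }
}
```

A real blitter would copy tile rows from a source bitmap rather than a solid color, but the wide-store structure is the same.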
But even more realistically, you should learn to use the GPU for your blitting. Not only is it faster for this sort of thing, it's probably more power-efficient too: even though you're lighting up more of the chip, you're lighting it up for shorter intervals.
Okay, so I gave this a shot. You can probably move to using SIMD (single instruction, multiple data). This would probably speed up the whole process by 10 to 20 percent or so.
Here is the code link (same code will be pasted below for brevity): https://codecatch.net/post/4b9683bf-8e35-4bf5-a1a9-801ab2e73805
I made two versions just in case your system's architecture doesn't support simd_uint4. Let me know if this is what you were looking for.
import simd

private func test() {
    let screenWidth: Int = 256
    let screenHeight: Int = 240
    let pixelBufferPtr = UnsafeMutableBufferPointer<BGRA>.allocate(capacity: screenWidth * screenHeight)
    let runCount = 1000
    let start = Date.now
    for _ in 0 ..< runCount {
        var index = 0
        var palletIndex = 0
        let palletCount = BGRAPallet.count
        let rawBuffer = UnsafeMutableRawPointer(pixelBufferPtr.baseAddress!)
        while index < pixelBufferPtr.count {
            let bgra = BGRAPallet[palletIndex]
            // Pack the 4 channel bytes into one UInt32, then replicate it into
            // all 4 lanes so a single 16-byte store writes 4 identical pixels.
            // (Storing one channel per 32-bit lane would smear a single pixel's
            // channels across four pixels.)
            let packed = UInt32(bgra.blue)
                | UInt32(bgra.green) << 8
                | UInt32(bgra.red) << 16
                | UInt32(bgra.alpha) << 24
            let bgraVector = simd_uint4(repeating: packed)
            rawBuffer.storeBytes(of: bgraVector, toByteOffset: index * MemoryLayout<BGRA>.stride, as: simd_uint4.self)
            palletIndex += 1
            if palletIndex == palletCount {
                palletIndex = 0
            }
            index += 4 // 4 pixels per store; 256 * 240 is a multiple of 4
        }
    }
    let elapsed = Date.now.timeIntervalSince(start)
    print("Average time per run: \(Int(elapsed * 1000) / runCount) ms")
}
Related
I need to perform a simple math operation on Data that contains RGB pixel data. Currently I'm doing this like so:
let imageMean: Float = 127.5
let imageStd: Float = 127.5
let rgbData: Data // Some data containing RGB pixels
let floats = (0..<rgbData.count).map {
    (Float(rgbData[$0]) - imageMean) / imageStd
}
return Data(bytes: floats, count: floats.count * MemoryLayout<Float>.size)
This works, but it's too slow. I was hoping I could use the Accelerate framework to calculate this faster, but have no idea how to do this. I reserved some space so that it's not allocated every time this function starts, like so:
inputBufferDataNormalized = malloc(width * height * 3) // 3 channels RGB
I tried a few functions, like vDSP_vasm, but I couldn't make them work. Can someone show me how to use one? Basically I need to replace this map function, because it takes too long. And it would probably be good to use the pre-allocated space every time.
Following up on my comment on your other related question. You can use SIMD to parallelize the operation, but you'd need to split the original array into chunks.
This is a simplified example that assumes that the array is exactly divisible by 64, for example, an array of 1024 elements:
let arr: [Float] = (0 ..< 1024).map { _ in Float.random(in: 0...1) }
let imageMean: Float = 127.5
let imageStd: Float = 127.5
var chunks = [SIMD64<Float>]()
chunks.reserveCapacity(arr.count / 64)
for i in stride(from: 0, to: arr.count, by: 64) {
    let v = SIMD64.init(arr[i ..< i+64])
    chunks.append((v - imageMean) / imageStd) // same calculation using SIMD
}
You can now access each chunk with a subscript:
var results: [Float] = []
results.reserveCapacity(arr.count)
for chunk in chunks {
    for i in chunk.indices {
        results.append(chunk[i])
    }
}
Of course, you'd need to deal with a remainder if the array isn't exactly divisible by 64.
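Following up on that remainder point, here is a hedged sketch (hypothetical `normalize` helper) that processes full 64-element chunks with SIMD64 and finishes the tail with a plain scalar loop:

```swift
// Normalize an arbitrary-length array: SIMD64 chunks for the bulk,
// a scalar loop for the remaining (< 64) elements.
func normalize(_ arr: [Float], mean: Float, std: Float) -> [Float] {
    var result = [Float]()
    result.reserveCapacity(arr.count)
    let chunkEnd = arr.count - arr.count % 64 // end of the last full 64-element chunk
    var i = 0
    while i < chunkEnd {
        let v = SIMD64<Float>(arr[i ..< i + 64])
        let n = (v - mean) / std // elementwise across all 64 lanes
        for lane in n.indices { result.append(n[lane]) }
        i += 64
    }
    while i < arr.count { // remainder
        result.append((arr[i] - mean) / std)
        i += 1
    }
    return result
}
```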
I have found a way to do this using Accelerate. First I reserve space for the converted buffer like so:
var inputBufferDataRawFloat = [Float](repeating: 0, count: width * height * 3)
Then I can use it like so:
let rawBytes = [UInt8](rgbData)
vDSP_vfltu8(rawBytes, 1, &inputBufferDataRawFloat, 1, vDSP_Length(rawBytes.count))
vDSP.add(inputBufferDataRawScalars.mean, inputBufferDataRawFloat, result: &inputBufferDataRawFloat)
vDSP.multiply(inputBufferDataRawScalars.std, inputBufferDataRawFloat, result: &inputBufferDataRawFloat)
return Data(bytes: inputBufferDataRawFloat, count: inputBufferDataRawFloat.count * MemoryLayout<Float>.size)
Works very fast. Maybe there is a better function in Accelerate; if anyone knows of one, please let me know. It needs to perform the function (A[n] + B) * C (or, to be exact, (A[n] - B) / C, but the first form can be derived from it).
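For what it's worth, (A[n] - B) / C is the affine form A[n] * (1/C) + (-B/C), which is exactly the single-pass shape that Accelerate's vDSP_vsmsa (D[n] = A[n]*B + C, with scalar B and C) computes, so one call could replace the add + multiply pair. A plain-Swift sanity check of that algebra (no Accelerate, so it runs anywhere):

```swift
let imageMean: Float = 127.5
let imageStd: Float = 127.5

// Fold (a - mean) / std into one multiply-add: a * scale + offset
let scale = 1 / imageStd          // plays the role of the scalar B in vDSP_vsmsa
let offset = -imageMean / imageStd // plays the role of the scalar C in vDSP_vsmsa

func twoPass(_ a: Float) -> Float { (a - imageMean) / imageStd }
func onePass(_ a: Float) -> Float { a * scale + offset }
```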
A math question. I am trying to animate objects sequentially, but I can't figure out a formula that will allow me to set the delay smoothly. If I have, let's say, 2 objects in my array, I want them to animate almost normally with an i * 0.25 delay, but if I have 25 objects I want them to animate rather quickly. Yes, I could set a manual ratio by switching on the count, but I think there should be a nice formula for this.
for (i, object) in objects.enumerated() {
    object.animate(withDelay: (Double(i) * 0.25) / Double(objects.count))
}
Your best bet is to choose a total animation time that stays the same EVERY time, no matter the number of objects.
let animateTime: Double = 2 // 2 secs
let animateTimePerObject = animateTime / Double(objects.count)
for (i, object) in objects.enumerated() {
    object.animate(withDelay: Double(i) * animateTimePerObject)
}
Say there are 10 objects and you want to animate over 2 seconds. This sets animateTimePerObject = 2/10 = 0.2. Each item is delayed by i (whatever position it is at) times the animate time per object. So, in order: 0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8.
Same could be done with 2 objects.
OR you could use a log function, which allows for growth but at a slower rate. Here are some functions you could look at using.
Add this function to create a custom log functionality
func logC(val: Double, forBase base: Double) -> Double {
    return log(val) / log(base)
}
for (i, object) in objects.enumerated() {
    let delay = i == 0 ? 0.25 : logC(val: Double(i) * 10, forBase: 10) * 0.25
    object.animate(withDelay: delay)
}
This will slow down your 0.25*i rate to a much slower one.
0 -> 0.25
1 -> log(20, base: 10) * 0.25 = 1.3 * 0.25 = 0.325
...
25 -> log(250, base: 10) * 0.25 = 2.3979 * 0.25 ≈ 0.6
where it would have been
0 -> .25
1 -> .25 * 2 = .5
25 -> .25 * 25 = 6.25
You can play with the log function as you like; these are just some ideas, since the question doesn't pin down exactly what kind of curve you are looking for.
NOTE: There may be slight syntax issues in there with the Doubles and Ints, but you can adjust! :)
Comparing Log and Sqrt:
func logC(val: Double, forBase base: Double) -> Double {
    return log(val) / log(base)
}
for i in 0..<25 {
    let delay = i == 0 ? 0.25 : pow(logC(val: Double(i) * 10, forBase: 10) * 0.25, log(1 / Double(i))) * 0.45
    let delay2 = i == 0 ? 0.25 : sqrt(Double(i)) * 0.5
    print(delay, delay2)
}
0.25 0.25
0.45 0.5
0.9801911408397829 0.7071067811865476
1.3443747821649137 0.8660254037844386
1.5999258430124579 1.0
1.7853405889097305 1.118033988749895
1.9234257236285595 1.224744871391589
2.0282300761096543 1.3228756555322954
2.1088308307833894 1.4142135623730951
2.1713433790123178 1.5
2.2200343505615683 1.5811388300841898
2.2579686175608598 1.6583123951777
2.2874024254699274 1.7320508075688772
2.3100316733059247 1.8027756377319946
2.32715403828525 1.8708286933869707
2.33977794890637 1.9364916731037085
2.348697701417663 2.0
2.3545463958925756 2.0615528128088303
2.357833976756366 2.1213203435596424
2.358975047645847 2.179449471770337
2.35830952737025 2.23606797749979
2.3561182050020992 2.29128784747792
2.35263460234384 2.345207879911715
2.348054124507179 2.3979157616563596
2.3425411926260447 2.449489742783178
You can go with the function below, which depends on the object count as you specified earlier. If the array has more objects, each animation will be executed with less delay, but the first item's delay will nonetheless be longer than the later ones:
for (i, object) in objects.enumerated() {
    object.animate(withDelay: ((1 / (Double(i + 1) * 0.5)) * 0.25) / Double(objects.count))
}
There are a lot of parentheses, but I hope they increase readability. I also applied i + 1 so you won't have a division-by-zero problem for the first item.
With this formula, the delay should diminish gradually and smoothly when your array has a large number of objects.
Note:
If you think the delay is too long when there are not many elements in the array (which lowers objects.count), try replacing objects.count with (2 * objects.count).
Also, if you think the reverse (the delay is too short) when there are a lot of elements in the array (which raises objects.count), try replacing objects.count with (objects.count / 2).
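For reference, here is a compile-checked version of that formula as a standalone function (hypothetical name), with the Int-to-Double conversions spelled out; it lets you eyeball how the delay shrinks both with the item's position and with the array size:

```swift
// Delay for item i in an array of `count` objects:
// (1 / ((i + 1) * 0.5)) * 0.25, scaled down by the number of objects.
func delay(forIndex i: Int, count: Int) -> Double {
    ((1 / (Double(i + 1) * 0.5)) * 0.25) / Double(count)
}
```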
I am trying to code an RSI (which has been a good way for me to learn API data fetching and algorithms already).
The API I am fetching data from comes from a reputable exchange so I know the values my algorithm is analyzing are correct, that's a good start.
The issue I'm having is that the results of my calculations are completely off from what I can read on that particular exchange, which also provides an RSI indicator (I assume they analyze their own data, so the same data as I have).
I used the exact same API to translate the Ichimoku indicator into code, and that time everything was correct! I believe my RSI calculations must be wrong somehow, but I've checked and re-checked many times.
I also have a "literal" version of the code where every step is calculated like an Excel sheet. It's pretty stupid as code, but it validates the logic of the calculation, and its results are the same as those of the following code.
Here is my code to calculate the RSI :
let period = 14

// Upward Movements and Downward Movements
var upwardMovements: [Double] = []
var downwardMovements: [Double] = []
for idx in 0..<15 {
    let diff = items[idx + 1].close - items[idx].close
    upwardMovements.append(max(diff, 0))
    downwardMovements.append(max(-diff, 0))
}
// Average Upward Movements and Average Downward Movements
let averageUpwardMovement1 = upwardMovements[0..<period].reduce(0, +) / Double(period)
let averageDownwardMovement1 = downwardMovements[0..<period].reduce(0, +) / Double(period)
let averageUpwardMovement2 = (averageUpwardMovement1 * Double(period - 1) + upwardMovements[period]) / Double(period)
let averageDownwardMovement2 = (averageDownwardMovement1 * Double(period - 1) + downwardMovements[period]) / Double(period)
// Relative Strength
let relativeStrength1 = averageUpwardMovement1 / averageDownwardMovement1
let relativeStrength2 = averageUpwardMovement2 / averageDownwardMovement2
// Relative Strength Index
let rSI1 = 100 - (100 / (relativeStrength1 + 1))
let rSI2 = 100 - (100 / (relativeStrength2 + 1))
// Relative Strength Index Average
let relativeStrengthAverage = (rSI1 + rSI2) / 2
BitcoinRelativeStrengthIndex.bitcoinRSI = relativeStrengthAverage
Readings at 3:23pm this afternoon give 73.93 for my algorithm and 18.74 on the exchange. The markets are crashing right now, and across the different exchanges whose RSIs I can access, they all display an RSI below 20, so my calculations are off.
Do you guys have any idea?
I am answering this 2 years later, but hopefully it helps someone.
RSI gets more precise the more data points you feed into it. For a default RSI period of 14, you should have at least 200 previous data points. The more, the better!
Let's suppose you have an array of close candle prices for a given market. The following function will return RSI values for each candle. You should always ignore the earliest data points, since they are not precise enough until the number of candles processed well exceeds the period (14, or whatever your periods number is).
func computeRSI(on prices: [Double], periods: Int = 14, minimumPoints: Int = 200) -> [Double] {
    precondition(periods > 1 && minimumPoints > periods && prices.count >= minimumPoints)
    return Array(unsafeUninitializedCapacity: prices.count) { (buffer, count) in
        buffer.initialize(repeating: 50)
        var (previousPrice, gain, loss) = (prices[0], 0.0, 0.0)
        for p in stride(from: 1, through: periods, by: 1) {
            let price = prices[p]
            let value = price - previousPrice
            if value > 0 {
                gain += value
            } else {
                loss -= value
            }
            previousPrice = price
        }
        let (numPeriods, numPeriodsMinusOne) = (Double(periods), Double(periods &- 1))
        var avg = (gain: gain / numPeriods, loss: loss / numPeriods)
        buffer[periods] = (avg.loss > .zero) ? 100 - 100 / (1 + avg.gain / avg.loss) : 100
        for p in stride(from: periods &+ 1, to: prices.count, by: 1) {
            let price = prices[p]
            avg.gain *= numPeriodsMinusOne
            avg.loss *= numPeriodsMinusOne
            let value = price - previousPrice
            if value > 0 {
                avg.gain += value
            } else {
                avg.loss -= value
            }
            avg.gain /= numPeriods
            avg.loss /= numPeriods
            if avg.loss > .zero {
                buffer[p] = 100 - 100 / (1 + avg.gain / avg.loss)
            } else {
                buffer[p] = 100
            }
            previousPrice = price
        }
        count = prices.count
    }
}
Please note that the code is deliberately imperative, to reduce the number of operations/loops and get maximum compiler optimization. You might be able to squeeze out more performance using the Accelerate framework, though. It also handles the edge case where a periods range contains only gains or only losses.
If you want a running RSI calculation, just store the last average gain and average loss and perform the same smoothing step for each new price.
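A minimal sketch of that running variant (hypothetical type; note it carries the smoothed average gain and loss forward, since those, rather than the previous RSI value itself, are what the Wilder recurrence needs):

```swift
struct RunningRSI {
    let periods: Double
    var avgGain: Double   // seeded from a computeRSI-style warm-up pass
    var avgLoss: Double
    var lastPrice: Double

    // Feed one new close price; returns the updated RSI.
    mutating func update(price: Double) -> Double {
        let change = price - lastPrice
        lastPrice = price
        // Wilder smoothing: new = (old * (n - 1) + current) / n
        avgGain = (avgGain * (periods - 1) + max(change, 0)) / periods
        avgLoss = (avgLoss * (periods - 1) + max(-change, 0)) / periods
        guard avgLoss > 0 else { return 100 } // all-gains edge case
        return 100 - 100 / (1 + avgGain / avgLoss)
    }
}
```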
Alright, so I'm building a SpriteKit game where I need to repeatedly delay an enemy spawn function.
I've managed to generate random Doubles using this function:
func randomDelay() -> Double {
    let randLower: UInt32 = 1
    let randUpper: UInt32 = 50
    let randDelayTime = arc4random_uniform(randUpper - randLower) + randLower
    let randDelayTimer = Double(randDelayTime) / 10
    //let randDelay = randDelayTimer * Double(NSEC_PER_SEC)
    //let newTime = dispatch_time(DISPATCH_TIME_NOW, Int64(randDelay))
    println(randDelayTimer)
    return randDelayTimer
}
I know this works because I printed randDelayTimer out several times and it does in fact generate random Doubles.
Here is where I try to spawn my enemies after the random delay repeatedly:
runAction(SKAction.repeatActionForever(
    SKAction.sequence([
        SKAction.waitForDuration(NSTimeInterval(randomDelay())),
        SKAction.runBlock({ self.spawnEnemy1() })
    ])
))
This works, but only the first time it is run. Whatever random Double is generated by randomDelay() the FIRST TIME the sequence is cycled through is applied to every enemy that is spawned after that. For example if when the game is run, randomDelay() = 3.5, a 3.5 sec delay is repeated forever.
I know that randomDelay() is only being run that first time also because nothing is printed to the console after the first time.
How can I fix this?
You should use the following static method on SKAction:
class func waitForDuration(_ sec: NSTimeInterval,
withRange durationRange: NSTimeInterval) -> SKAction
From the SKAction documentation:
Each time the action is executed, the action computes a new random value for the duration. The duration may vary in either direction by up to half of the value of the durationRange parameter.
In your case, with a minimum duration of 1 second and a maximum duration of 50 seconds, you'd use:
// A mean wait of 25.5 seconds. The wait can vary by 24.5 seconds above and below the mean.
let waitAction = SKAction.waitForDuration(25.5, withRange: 49)
It's a little inconvenient to manually calculate the mean and range from the maximum and minimum waiting time, so you could use an extension on SKAction to do the work for you:
extension SKAction {
    static func waitForDuration(minimum min: NSTimeInterval, maximum max: NSTimeInterval) -> SKAction {
        return SKAction.waitForDuration((max + min) / 2, withRange: abs(max - min))
    }
}
Usage:
let waitAction = SKAction.waitForDuration(minimum: 1, maximum: 50)
My first question here at Stack Overflow... I hope my question is specific enough.
I have an array in Swift with measurements at certain dates. Like:
var myArray: [(day: Int, mW: Double)] = []
myArray.append((day: 0, mW: 31.98))
myArray.append((day: 1, mW: 31.89))
myArray.append((day: 2, mW: 31.77))
myArray.append((day: 4, mW: 31.58))
myArray.append((day: 6, mW: 31.46))
Some days are missing, I just didn't take a measurement... All measurements should be on a line, more or less. So I thought about linear regression. I found the Accelerate framework, but the documentation is missing and I can't find examples.
For the missing measurements I would like to have a function, with as input a missing day and as output a best guess, based on the other measurements.
func bG(day: Int) -> Double {
    return // return best guess for measurement
}
Thanks for helping out.
Jan
My answer doesn't specifically use the Accelerate framework; however, I thought the question was interesting and thought I'd give it a stab. From what I gather, you're basically looking to create a line of best fit and interpolate or extrapolate more values of mW from it. To do that I used the least squares method, detailed here: http://hotmath.com/hotmath_help/topics/line-of-best-fit.html, and implemented it in Playgrounds using Swift:
// The typealias allows us to use '$X.day' and '$X.mW',
// instead of '$X.0' and '$X.1' in the following closures.
typealias PointTuple = (day: Double, mW: Double)
// The days are the values on the x-axis.
// mW is the value on the y-axis.
let points: [PointTuple] = [(0.0, 31.98),
(1.0, 31.89),
(2.0, 31.77),
(4.0, 31.58),
(6.0, 31.46)]
// When using reduce, $0 is the current total.
let meanDays = points.reduce(0) { $0 + $1.day } / Double(points.count)
let meanMW = points.reduce(0) { $0 + $1.mW } / Double(points.count)
let a = points.reduce(0) { $0 + ($1.day - meanDays) * ($1.mW - meanMW) }
let b = points.reduce(0) { $0 + pow($1.day - meanDays, 2) }
// The equation of a straight line is: y = mx + c
// Where m is the gradient and c is the y intercept.
let m = a / b
let c = meanMW - m * meanDays
In the code above, a and b refer to the following sums from the website: a = Σ(xᵢ − x̄)(yᵢ − ȳ) and b = Σ(xᵢ − x̄)², giving the slope m = a / b.
Now you can create the function which uses the line of best fit to interpolate/extrapolate mW:
func bG(day: Double) -> Double {
    return m * day + c
}
And use it like so:
bG(3) // 31.70
bG(5) // 31.52
bG(7) // 31.35
If you want to do fast linear regressions in Swift, I suggest using the Upsurge framework. It provides a number of simple functions that wrap the Accelerate library, so you get the benefits of SIMD on either iOS or OS X without having to worry about the complexity of vDSP calls.
To do a linear regression with base Upsurge functions is simply:
let meanx = mean(x)
let meany = mean(y)
let meanxy = mean(x * y)
let meanx_sqr = measq(x)
let slope = (meanx * meany - meanxy) / (meanx * meanx - meanx_sqr)
let intercept = meany - slope * meanx
This is essentially what is implemented in the linregress function.
You can use it with an array of [Double], other classes such as RealArray (comes with Upsurge) or your own objects if they can expose contiguous memory.
So a script to meet your needs would look like:
#!/usr/bin/env cato
import Upsurge
typealias PointTuple = (day: Double, mW: Double)
var myArray:[PointTuple] = []
myArray.append((0, 31.98))
myArray.append((1, 31.89))
myArray.append((2, 31.77))
myArray.append((4, 31.58))
myArray.append((6, 31.46))
let x = myArray.map { $0.day }
let y = myArray.map { $0.mW }
let (slope, intercept) = Upsurge.linregress(x, y)
func bG(day: Double) -> Double {
    return slope * day + intercept
}
(I left in the appends rather than using literals as you are likely programmatically adding to your array if it is of significant length)
And full disclosure: I contributed the linregress code. I hope to also add the coefficient of determination at some point in the future.
To estimate the values between different points, you can also use SKKeyframeSequence from SpriteKit:
https://developer.apple.com/documentation/spritekit/skinterpolationmode/spline
import SpriteKit
let sequence = SKKeyframeSequence(keyframeValues: [0, 20, 40, 60, 80, 100], times: [64, 128, 256, 512, 1024, 2048])
sequence.interpolationMode = .spline // .linear, .step
let estimatedValue = sequence.sample(atTime: CGFloat(1500)) as! Double // 1500 is the value you want to estimate
print(estimatedValue)