Cocoa Touch Programming. KVO/KVC in the inner loop is super slow. How do I speed things up? - iphone

I've become a huge fan of KVO/KVC. I love the way it keeps my MVC architecture clean. However, I am not in love with the huge performance hit I incur when I use KVO within the inner rendering loop of the 3D rendering app I'm designing, where messages fire 60 times per second for each object under observation - potentially hundreds of objects.
What are the tips and tricks for speeding up KVO? Specifically, I am observing a scalar value - not an object - so perhaps the wrapping/unwrapping is killing me. I am also setting up and tearing down observation
[foo addObserver:bar forKeyPath:@"fooKey" options:0 context:NULL];
[foo removeObserver:bar forKeyPath:@"fooKey"];
within the inner loop. Perhaps I'm taking a hit for that.
I really, really, want to keep the huge flexibility KVO provides me. Any speed freaks out there who can lend a hand?
Cheers,
Doug

Objective-C's message dispatch and other features are tuned and pretty fast for what they provide, but they still don't approach the potential of tuned C for computational tasks:
NSNumber *a = [NSNumber numberWithInteger:(b.integerValue + c.integerValue)];
is way slower than:
NSInteger a = b + c;
and nobody actually does math on objects in Objective-C for that reason (well that and the syntax is awful).
The power of Objective-C is that you have a nice expressive message based object system where you can throw away the expensive bits and use pure C when you need to. KVO is one of the expensive bits. I love KVO, I use it all the time. It is computationally expensive, especially when you have lots of observed objects.
An inner loop is that small bit of code you run over and over; anything done there will be done over and over. It is the place where you should be eliminating OOP features if need be, where you should not be allocating memory, and where you should consider replacing method calls with static inline functions. Even if you somehow manage to get acceptable performance in your rendering loop, it will be much lower than if you got all that expensive notification and dispatch logic out of there.
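To make that concrete, here is a rough sketch of the kind of hot loop you end up with once the OOP machinery is stripped out (the names scaled and updatePositions:count:factor: are made up for illustration):
static inline float scaled(float value, float factor) {
    return value * factor;
}

- (void)updatePositions:(float *)positions count:(NSUInteger)count factor:(float)factor {
    for (NSUInteger i = 0; i < count; i++) {
        // plain C arithmetic through an inlined function: no objc_msgSend, no allocation
        positions[i] = scaled(positions[i], factor);
    }
}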
If you really want to try to keep it going with KVO here are a few things you can try to make things go faster:
Switch from automatic to manual KVO in your objects. This may allow you to eliminate spurious notifications (a sketch follows this list).
Aggregate updates: If intermediate values over some time interval are not relevant, and you can defer for some amount of time (like the next animation frame), don't post the change immediately; mark that a change needs to be posted and wait for the relevant timer to go off. You may get to avoid a bunch of short-lived intermediary updates. You might also use some sort of proxy to aggregate related changes across multiple objects.
Merge observable properties: If one type of object has a large number of properties that might change, you may be better off exposing a single "hasChanges" property to observe and having the observer query the individual properties.
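Here is a minimal sketch of the manual-notification idea, assuming a hypothetical Emitter class with a scalar position property. Returning NO from +automaticallyNotifiesObserversForKey: puts you in charge of when (or whether) a change is posted, so unchanged values cost nothing:
#import <Foundation/Foundation.h>

@interface Emitter : NSObject
@property (nonatomic, assign) double position;
@end

@implementation Emitter

@synthesize position = _position;

+ (BOOL)automaticallyNotifiesObserversForKey:(NSString *)key {
    if ([key isEqualToString:@"position"]) {
        return NO;   // we post change notifications ourselves
    }
    return [super automaticallyNotifiesObserversForKey:key];
}

- (void)setPosition:(double)position {
    if (position == _position) {
        return;      // unchanged value: no notification at all
    }
    [self willChangeValueForKey:@"position"];
    _position = position;
    [self didChangeValueForKey:@"position"];
}

@end
The same idea extends to the aggregation suggestion: set a dirty flag in the setter and bracket a single willChange/didChange pair per frame from your render timer instead.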

Related

Is GameObject.FindObjectOfType() less expensive in performance than GameObject.Find() in Unity?

I'm putting the finishing touches on my indie game in Unity, and I've been thinking about improving my game's performance. I was wondering if GameObject.FindObjectOfType() is less expensive than GameObject.Find()? Thanks!
If you really care about performance, try to use neither of them - or at least minimise their use.
Both methods loop through the list of GameObjects to find and return the object, and because of that looping they are pretty heavy on performance. So if you do use them, never call them in the Update() method; call them in Start() or another method that doesn't get called often, and store the result.
To be honest, I don't know which one is faster. If I had to guess, I would say GameObject.Find(), since it only checks the name, whereas FindObjectOfType() checks the components.
Even so, I would consider using FindObjectOfType(), because Find() takes a string and you might want to avoid that because of typos (unless you store it in a single class and just reference the variable).

Single function handling events or a function for each event?

What to prefer in which situation: Single function handling events or a function for each event?
Here is a basic code example:
Option 1
enum Notification {
    case A
    case B
    case C
}
protocol One {
    func consumer(consumer: Consumer, didReceiveNotification notification: Notification)
}
or
Option 2
protocol Two {
    func consumerDidReceiveA(consumer: Consumer)
    func consumerDidReceiveB(consumer: Consumer)
    func consumerDidReceiveC(consumer: Consumer)
}
Background
Apple uses both options. E.g., for NSStreamDelegate we have the first option, while in CoreBluetooth (e.g. CBCentralManagerDelegate) we see option two.
One big difference I see is that Swift does not support optional protocol methods nicely (via extensions or the @objc keyword).
What would you prefer? What's the (dis)advantages?
In terms of achieving the loosest form of coupling and highest degree of cohesion, naturally the choice would sway towards individual events, not this kind of multi-event bundle of responsibilities.
Yet there are a lot of practical concerns that might move you towards favoring the opposite, coarser way of dealing with events instead of a separate function per granular event.
Here are some possible ones (not listed in any specific order).
Boilerplate
While it's not the biggest thing to worry about, writing a bunch of functions tends to take a bit more effort than writing a bunch of if/else statements or switch cases within one. More important than this, however, is the code needed to connect/disconnect event-handling slots to event-handling signals. Avoiding the need to write that subscription/unsubscription kind of code for every single teeny event handled can save considerably on the amount of code to maintain.
Performance
It might seem counter-intuitive that performance can favor the coarser multi-event handler. After all, the granular event-handler requires less branching (one dynamic dispatch to get to the precise event handler), while the coarser one requires twice as much (one dynamic dispatch to get to a coarse event-handling site, and another local series of branches to get to the precise event-handling code).
Yet the cost of dynamic dispatch leans heavily on branch prediction. If you're branching into coarser event handlers, then you're branching more often into the same set of instructions, and that can be an optimization. Having two sets of more predictable branches can often produce better results than one less predictable branch.
Moreover, coarser event-handling typically implies fewer aggregates, fewer lists of functions to call on the side of those triggering events. And that can translate to reduced memory usage and improved locality of reference.
On the flip side, to branch into coarser event handlers often means branching more often. For example, some site might only be interested in push kind of input events, not resize events. If we lump all these together into a coarse event handler and without some filtering mechanism on top, then typically we would have to pay the cost of dynamic dispatch even for a resize event that isn't even handled for a particular site.
Yet I've found that branching needlessly into the same coarse functions is often better than I expected (most likely because the branch predictor succeeds), as opposed to branching into a wide variety of disparate functions only as needed.
So there's a balancing act here, and even performance doesn't clearly side with one strategy over the other. It still varies case-by-case.
Nevertheless, lacking measurements and very detailed data about the critical code paths, it's typically safer from a performance perspective to err on the side of these coarser multi-event handlers. After all, even if that proves to be the wrong decision from a performance standpoint, it's easier to optimize from coarse to fine (we can even do so very non-intrusively by keeping the coarse and using fine-grained event-handling in cases that benefit most from it) than vice versa.
Event Subscription/Unsubscription
This can likewise swing one way or the other, but in my experience (from team settings), most of the human errors associated with event handling do not occur within the event-handling code, but outside it. The most common source of errors I see relates to failing to subscribe to events and, most commonly, failing to unsubscribe when the events are no longer of interest.
When events are handled at a coarser level, there's typically less of that error-prone subscription/unsubscription code involved (this relates to the boilerplate concerns above, but this is an unusual kind of boilerplate in that it can be quite error-prone and not merely tedious to write).
This is also very case-by-case. In the systems I've often been involved in, there was a common need for entities to continue to exist that unsubscribed from certain events prematurely. Those premature cases often required the code to unsubscribe from events to be written manually, as they could not be tied to an entity's lifetime. That may have pointed more to design issues elsewhere, but in that scenario, the number of mistakes made team-wide went down with coarser event handling.
Type Safety
While not shown in the examples here, coarser event handling typically brings a need to squeeze more disparate types of data through more generic parameters. In an extreme scenario, like in C, that might translate to squeezing more data through void pointers and more dangerous pointer type casts. With that, compile-time type safety is obliterated and we could start seeing a whole new source of human error.
In higher-level languages, this might translate to more down casts or things of that sort when we cannot model the signature of a delegate to perfectly fit the parameters passed in when an event is triggered.
I've found typically that this isn't the biggest source of confusions and bugs provided that there is at least some form of runtime type safety when casting or unboxing these parameters. But it is a con on the side of coarser event-handling.
Intellectual Overhead
This might vary per individual but I tend to look at systems from a very administrative/overview kind of standpoint and specifically with respect to control flow. It's because I tend to work in lower-level portions of the system, including things like proprietary UI toolkits.
In those cases, when a button is pushed, what functions are called? It turns into a mystery in a large-scale codebase composed of hundreds of thousands of little functions without tracing into the code actually invoked when a button is pushed and seeing each and every function that is called.
That's an inevitability with an event-driven paradigm and something I never became 100% comfortable about, but I find it alleviates some of that explosive complexity that I perceive in my personal mental model (something resembling a very complex graph) when there's less code decentralization. With coarser event handlers comes fewer, more centralized functions to branch into throughout a system on such a button push, and that helps me increase my familiarity when there are fewer but bigger functions involved in my mental graph.
There is a very simple practical benefit here: if you want to find out when a specific entity responds to a series of events, you can simply put a breakpoint on this one coarse event-handling site (while still being able to drill down to a specific event for that specific entity by putting a breakpoint in a local branch of code).
Of course, I might be an exception there working in these low-level systems that everyone uses. It seems a lot of people are comfortable with the idea of just subscribing to a button push event in their code without worrying about all the other subscribers to the same event.
From my kind of holistic control flow view of the system, it helps me to absorb the complexity more easily when there are fewer but coarser event-handling sites in the codebase even though I normally otherwise find monolithic functions to be a burden. Especially in a debugging context where I face a concern like, "What caused this to happen?", that combined with the event-handling concern of "What functions are actually going to be called when this happens?" can really multiply the complexity. With fewer potential target sites where events are handled, the latter concern is mitigated.
Conclusion
So these are some factors that might sway you to choose one design strategy over another. I find myself somewhere in the middle. I generally don't choose a design as coarse as, say, wndproc on Windows, which associates a single, ultra-coarse event handler with every single window event imaginable. Yet I might favor designing at a coarser event-handling level than some, just to alleviate this kind of mental complexity, reduce code decentralization, and possibly improve performance (always with a profiler in hand).
And then there are times when I choose to design at a very granular level when the complexity isn't that great (typically when the package triggering events isn't that central), when performance isn't a concern or performance actually favors this route, and for the improved type safety. It's all case-by-case.

Why does a callee return an autoreleased object instead of returning a retained object and letting the caller release it?

For instance:
we always write it like way 1:
-(NSObject*)giveMeAnObject
{
    return [[[NSObject alloc] init] autorelease];
}
-(void)useObject
{
    NSObject *object = [self giveMeAnObject];
    // use that object
}
but why don't we write like this way 2:
-(NSObject*)giveMeAnObject
{
    return [[NSObject alloc] init];
}
-(void)useObject
{
    NSObject *object = [self giveMeAnObject];
    // use that object
    [object release];
}
The Cocoa SDK does things like way 1; I think that's why we all use way 1 - it has become a coding convention.
But I just think that if the convention were way 2, we could gain a little performance improvement.
So is there any reason other than coding convention that we use way 1 instead of way 2?
Returning an autoreleased object is a form of abstraction -- a convenience for the developer, who then does not have to think as much about the reference counting of the returned object -- and it consequently results in fewer bugs in specific categories (although you could also say autorelease pools introduce new categories of bugs or complexities). It can really simplify the client's code, although yes, there can be performance penalties. It can also be used as an abstracted optimization when no reference-count operation needs to be made - consider when the callee already holds an instance of what is returned and no retain or copy needs to be made. Finally, although chaining statements can be overused, this practice is also convenient for chaining.
Also, the static analyzer that detects reference-count errors is relatively new compared to some of these libraries and programs. Autorelease pools preceded ARC and static analysis of Objective-C objects' reference counts by many years -- ensuring your reference counting is correct is much simpler now with these tools, which are able to detect many of the bugs.
Of course with ARC, a lot of that changes if you favored the simplicity of returning autoreleased objects -- with ARC, you can return fewer autoreleased objects without the chance of introducing the bugs autorelease pools abstracted.
Using uniform ownership semantics also simplifies programs. If an abstract selector or collection of selectors always return using the same semantics, then it can really simplify some generic forms. For example -- if a set of selectors passed to performSelector: had different ownership-on-return semantics, then it would add a lot of complexity. So uniform ownership on return can really simplify some of the more 'generic' implementations.
Performance: Reference count operations (retain/release) can be rather weighty -- especially if you are used to working at lower levels. However, the current autorelease pool implementations are very fast. They have been updated recently, and are faster than they used to be. The compiler and runtime use several special shortcuts to keep these costs low. It's also a good idea to keep your autorelease pool sizes down -- particularly on mobile devices. Creating an autorelease pool is very fast. In practice, you may see from zero to a few percent increase from the autorelease operations themselves (i.e. it consumes much less time than objc_msgSend and variants). Some tests even ran slightly faster. This isn't an optimization many people will gain much from. It wouldn't qualify as low-hanging fruit under typical circumstances, and it's actually relatively difficult to measure the effects and locality of such changes in real programs -- based on some testing I did after bbum mentioned the changes below. So the tests were limited in scope, but it appears to be better/faster in MRC and ARC.
So a lot of this comes down to the level of responsibility you want to assume when you are performing your own reference counting operations. For most people, it shouldn't really change how they write. I think localizing memory issues, more deterministic object destruction, and more predictable heap sizes are the primary reasons one might favor returning an 'owning' (+1) reference if you are running on modern systems. Even then, the runtime and compiler work to reduce this for you in many cases (see bbum's answer, +1). Even though autorelease pools are approximately as fast, I don't intend to use them more than I do now -- so there are still justifiable reasons to minimize using them, but the reasons are lessening.
Have you measured the performance benefit? Do you have a quantifiable case where autorelease vs. CF style caller-must-release has a measurable performance impact?
If not, moot point. If so, I'd bet that there is a systemic architecture problem that far eclipses autorelease vs. not.
Regardless, if you adopt ARC, the "cost" of autorelease is minimized. The compiler and runtime can actually optimize away the autorelease across method calls.
There are three main reasons:
It imposes the concept of taking ownership.
It eliminates issues with dangling pointers.
It returns the object created in the local autorelease pool and so increases performance.

Class design for weapons in a game?

I enjoy making games and now for the first time try myself out on mobile devices. There, performance is of course a much bigger issue than on a nice PC and I find myself particularly struggling with weapon (or rather projectile) class design.
They need to be updated a lot, get destroyed/created a lot and generally require much updating.
At the moment I do it the obvious way, I create a projectile object each time I fire and destroy it on impact. Every frame all active projectiles get checked for collision with other objects.
Both steps seem like they could definitely need improvement. Are there common ways to handle such objects effectively?
In general I am looking for advice on how to do clean and performant class design, my googling skills were weak on this one so far.
I will gladly take any advice on this subject.
When you have lots of objects being created and destroyed in a short timespan, a common approach is to have a pool of already-allocated instances that you simply reinitialise. Only if the pool is empty do you allocate new instances. Apple does this with MapKit and table views, among others. Studying those interfaces will probably serve you well.
I don't think this is about class design. Your classes are fine; it's the algorithms that need work.
They need to be updated a lot, get destroyed/created a lot and generally require much updating.
Instead of destroying every projectile, consider putting it into a dead-projectile list. Then, when you need to create a new one, instead of allocating a fresh object, pull one from the dead list and reinitialise it. This is often quicker as you save on memory management calls.
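A minimal Objective-C sketch of that dead-list idea (the Projectile class and its -reset method are hypothetical stand-ins for your real projectile type, and this assumes manual reference counting):
#import <Foundation/Foundation.h>

@interface Projectile : NSObject     // hypothetical; your real projectile class
- (void)reset;                        // clears state so the instance can be reused
@end

@interface ProjectilePool : NSObject {
    NSMutableArray *_deadList;        // projectiles waiting to be reused
}
- (Projectile *)dequeueProjectile;
- (void)recycleProjectile:(Projectile *)projectile;
@end

@implementation ProjectilePool

- (id)init {
    if ((self = [super init])) {
        _deadList = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)dealloc {
    [_deadList release];
    [super dealloc];
}

- (Projectile *)dequeueProjectile {
    Projectile *p = [_deadList lastObject];
    if (p) {
        [[p retain] autorelease];                     // keep it alive once it leaves the array
        [_deadList removeLastObject];
    } else {
        p = [[[Projectile alloc] init] autorelease];  // pool empty: allocate a fresh one
    }
    [p reset];                                        // restore to a just-fired state
    return p;
}

- (void)recycleProjectile:(Projectile *)projectile {
    [_deadList addObject:projectile];                 // the array retains it until the next dequeue
}

@end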
As for updating, you need to update everything that changes - there's no way around that really.
Every frame all active projectiles get checked for collision with other objects.
Firstly - if you check every object against every other then each pair of objects gets compared twice. You can get away with half that number of checks by only comparing the objects that come later in the update list.
# Bad
for obj1 in all_objects:
    for obj2 in all_objects:
        if obj1 hit obj2:
            resolve_collision
# Good
for obj1 in all_objects:
    for obj2 in all_objects_after_obj1:
        if obj1 hit obj2:
            resolve_collision
How to implement 'all_objects_after_obj1' is language specific, but if you have an array or other random access structure holding your objects, you can just start the indexing from 1 after obj1.
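For example, with an NSArray in Objective-C it might look like this (GameObject, -collidesWith: and -resolveCollisionBetween:and: are hypothetical names standing in for whatever your game uses):
NSUInteger count = [allObjects count];
for (NSUInteger i = 0; i < count; i++) {
    GameObject *a = [allObjects objectAtIndex:i];
    // start one past i so every pair is tested exactly once
    for (NSUInteger j = i + 1; j < count; j++) {
        GameObject *b = [allObjects objectAtIndex:j];
        if ([a collidesWith:b]) {
            [self resolveCollisionBetween:a and:b];
        }
    }
}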
Secondly, the hit check itself can be slow. Make sure you're not performing complex mathematics to check the collision when a simpler option would do. And if the world is big, a spatial database scheme can help, eg. a grid map or quadtree, to cut down the number of objects to check potential collisions against. But that is often awkward and a lot of work for little gain in a small game.
Both steps seem like they could definitely need improvement.
They only 'seem'? Profile the app and see where the slow parts are. It's rarely a good idea to guess at performance, because modern languages and hardware can be surprising.
As Jim wrote, you can create a pool of objects and manage them. If you are looking for a specific design pattern, there is Flyweight. Hope it helps.

Avoiding autoreleased objects, good practice or overkill?

I am just curious about the following: when I am writing code, I always try to manage the memory myself by sticking to non-autoreleased objects. I know that this means objects are not hanging around in the pool, but I was just curious whether doing it this way in general is good practice or simply overkill.
// User Managed Memory
NSSet *buttonSizes = [[NSSet alloc] initWithObjects:@"Go", @"Going", @"Gone", nil];
[barItemLeft setPossibleTitles:buttonSizes];
[barItemRight setPossibleTitles:buttonSizes];
[buttonSizes release];
// Autoreleased Memory
NSSet *buttonSizes = [NSSet setWithObjects:@"Go", @"Going", @"Gone", nil];
[barItemLeft setPossibleTitles:buttonSizes];
[barItemRight setPossibleTitles:buttonSizes];
Total overkill. If you read the relevant documentation carefully, it does not say that you should avoid autoreleasing objects. It says that you should avoid using autorelease pools when you are in tight, object-rich loops. In those cases, you should be managing your memory explicitly (with retain and release) to ensure that objects are created and destroyed in an amortized manner.
The argument that the iPhone is a memory-constrained environment is true, but a complete red herring. Objective-C, with the Foundation and Cocoa frameworks (though they weren't called that at the time), ran just fine on a NeXTcube, which had 16MB of RAM (expandable to 64). Even the iPhone 3, which is pretty much EOL at this point, has 128MB.
edit
Since an iPhone app is a runloop-based application, a new autorelease pool is going to be created and destroyed every time it spins. This is clearly defined in the documentation. As such, the only reasons you'd have to create your own autorelease pool are:
Spawning a background thread, where you must create your own pool (since new threads, by default, do not have a base ARPool)
Performing some operation that will generate a significant number of autoreleased objects. In this case, creating and draining a pool would be to help ensure the limited lifetime of the temporary objects.
However, in the second case, you're recommended to explicitly manage memory as much as possible. This is so that your operation doesn't come to a screeching halt when it attempts to drain a pool with a few thousand objects in it. If you manage your memory manually, you can release these objects gradually as they are no longer needed, instead of saving them up for a single all-at-once release. You can also help amortize the all-at-once release by nesting ARPools.
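For example, a rough sketch of that second case under manual reference counting (the import loop and -importRecordAtIndex: are hypothetical); draining a nested pool every batch keeps the temporaries from piling up into one giant release:
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
for (NSUInteger i = 0; i < recordCount; i++) {
    [self importRecordAtIndex:i];              // creates autoreleased temporaries
    if (i % 1000 == 999) {
        [pool drain];                          // release this batch now, not all at once
        pool = [[NSAutoreleasePool alloc] init];
    }
}
[pool drain];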
At the end of the day, though, just do what feels natural, and then (and only then) optimize it if you have concrete evidence that you need to do so.
edit #2
OK, it turns out that there is a recommendation to avoid using autorelease. BUT that recommendation is in the "Allocate Memory Wisely" section of the "Tuning for Performance and Responsiveness" area of the iOS Application Programming Guide. In other words, avoid it if you're measuring a performance problem. But seriously: autorelease has been around for ~20 years and did just fine on computers far slower and more constrained than the iDevices of today.
Unless you see performance issues (or memory issues), I wouldn't worry about using the autorelease pools. Furthermore, autorelease can in theory be more performant than explicit release. The really dangerous places to use autorelease are large loops (for example, when importing a large data set). In those cases you can run out of the limited iPhone memory.
I'd recommend you only focus on removing autorelease usage once you've finished your application and can identify memory management issues using the built-in Instruments tools.
When possible I like to use the autorelease methods. It eliminates the chance that I forget to release them or somebody comes in and changes the flow of the code so that the explicit release is bypassed. I vote that self-managing the releases is overkill.
It depends on whether or not an autorelease pool is in place. For instance, if you're using Objective-C objects in an AudioUnit's render callback (which is called from a separate thread without an autorelease pool), your autoreleased objects will leak. In this case it's important that you release them manually, or wrap them in an autorelease pool.
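For illustration, a rough sketch of wrapping such a render callback in its own pool (the callback follows the AURenderCallback prototype; the body is hypothetical):
#import <Foundation/Foundation.h>
#import <AudioToolbox/AudioToolbox.h>

static OSStatus MyRenderCallback(void *inRefCon,
                                 AudioUnitRenderActionFlags *ioActionFlags,
                                 const AudioTimeStamp *inTimeStamp,
                                 UInt32 inBusNumber,
                                 UInt32 inNumberFrames,
                                 AudioBufferList *ioData) {
    // The render thread has no autorelease pool of its own, so create one
    // around any Objective-C work that might autorelease.
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    // ... fill ioData, possibly touching Objective-C objects ...
    [pool drain];
    return noErr;
}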
In general though, I think it's up to you and your coding style
There's a balance. It depends a lot on the situation. In that particular case, assuming the barItem* variables are long-lived ivars, avoiding autorelease is pretty close to 100% pointless, because the set will be sticking around regardless of whether you release it now or two seconds from now.
In fact, in most cases, it makes little difference whether objects are released now or on the next iteration of the runloop, because the runloop is going so fast that it's actually capable of creating smooth animation. The iPhone is a particularly memory-starved platform, so it's good not to leave things alive too long. But at the same time, be realistic: One autoreleased NSNumber in a method that's called every couple of seconds isn't even going to make a dent in your app's profile. Even 100 thousand-character NSStrings will only use about 0.065% of the system RAM on an iPhone 3GS. That only becomes significant when you build up a ton of them in a very short time. So if there's no problem, don't sweat it.
I think there is only one reasonable answer: usually autorelease is not a performance issue. It is good to keep in mind that it could be problematic in tight loops, but unless a performance tool like Instruments shows a memory spike you have to get rid of, use it as you like.
Premature optimization is a great way of spending time for nothing. Maybe in the end you'll know that your solution is elegant, but it could be that the easier solution performs just as well.