google::dense_hash_map vs std::tr1::unordered_map? - iphone

I'm working on a mobile game for several platforms (Android, iOS, and maybe even some kind of console in the future).
I'm trying to decide whether to use tr1::unordered_map or google::dense_hash_map to retrieve Textures from a Resource Manager (for later binding using OpenGL). Usually this can happen quite a few times per second (N per frame, where my Game is running at ~60 fps)
Considerations are:
Performance (memory and cpu wise)
Any ideas or suggestions are welcome.

Go with the STL for standard containers. They have predictable behavior and can be used seamlessly with STL algorithms and iterators. The STL also gives you some performance guarantees.
This should also guarantee portability. Most compilers have the new standard implemented.

In a C++ project I developed, I was wondering something similar: which one was best, tr1::unordered_map, boost::unordered_map or std::map? I ended up declaring a typedef, controllable at compilation:
#ifdef UnorderedMapBoost
typedef boost::unordered_map<cell_key, Cell> cell_map;
#else
#ifdef UnorderedMapTR1
typedef std::tr1::unordered_map<cell_key, Cell> cell_map;
#else
typedef std::map<cell_key, Cell> cell_map;
#endif // #ifdef UnorderedMapTR1
#endif // #ifdef UnorderedMapBoost
I could then control at compile-time which one to use, and profiled it. In my case, the portability ended up being more important, so I normally use std::map.
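For the texture lookup in the question, the same compile-time switch might look like the sketch below. It assumes a C++11 toolchain (so std::unordered_map instead of the tr1 version), and the TEXTURE_MAP_ORDERED macro, the TextureCache class and its methods are hypothetical names chosen for illustration, not part of any library.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical switch: define TEXTURE_MAP_ORDERED to fall back to std::map.
#ifdef TEXTURE_MAP_ORDERED
typedef std::map<std::string, unsigned int> texture_map;
#else
typedef std::unordered_map<std::string, unsigned int> texture_map;
#endif

// Hypothetical resource manager: maps a texture name to an OpenGL texture id.
class TextureCache {
public:
    void add(const std::string& name, unsigned int glId) { textures_[name] = glId; }

    // Returns the stored id, or 0 when the name is unknown
    // (glGenTextures never hands out 0 as a texture name).
    unsigned int find(const std::string& name) const {
        texture_map::const_iterator it = textures_.find(name);
        return it == textures_.end() ? 0u : it->second;
    }

private:
    texture_map textures_;
};
```

Because the rest of the code only sees texture_map, swapping the container for a profiling run is a one-line change in the build settings.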


Can I trigger a breakpoint on OpenGL errors in Xcode 4?

Is there any way I can set a symbolic breakpoint that will trigger when any OpenGL function call sets any state other than GL_NO_ERROR? Initial evidence suggests opengl_error_break is intended to serve just that purpose, but it doesn't break.
Based on Lars' approach, you can track these errors automatically. It relies on some preprocessor magic and generated stub functions.
I wrote a small Python script which processes the OpenGL header (I used the Mac OS X one in the example, but it should also work with the one of iOS).
The Python script generates two files, a header to include in your project everywhere where you call OpenGL like this (you can name the header however you want):
#include "gl_debug_overwrites.h"
The header contains macros and function declarations after this scheme:
#define glGenLists _gl_debug_error_glGenLists
GLuint _gl_debug_error_glGenLists(GLsizei range);
The script also produces a source file in the same stream which you should save separately, compile and link with your project.
This will then wrap all gl* functions in another function which is prefixed with _gl_debug_error_ which then checks for errors similar to this:
GLuint _gl_debug_error_glGenLists(GLsizei range) {
    GLuint var = glGenLists(range);
    CHECK_GL_ERROR(); // report any error raised by the call above
    return var;
}
Wrap your OpenGL calls to call glGetError after every call in debug mode. Within the wrapper method create a conditional breakpoint and check if the return value of glGetError is something different than GL_NO_ERROR.
Add this macro to your project (from OolongEngine project):
#define CHECK_GL_ERROR() ({ GLenum __error = glGetError(); if(__error) printf("OpenGL error 0x%04X in %s\n", __error, __FUNCTION__); (__error ? NO : YES); })
Search for all your OpenGL calls manually or with an appropriate RegEx. Then you have two options, shown here using the glViewport() call as an example:
1. Replace the call with glViewport(...); CHECK_GL_ERROR()
2. Replace the call with glDebugViewport(...); and implement glDebugViewport as shown in (1).
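As a rough illustration of the wrapper idea, the sketch below routes every checked call through one function that can carry the breakpoint. To stay buildable without the OpenGL headers it uses stand-ins: fakeGetError and fakeViewport play the roles of glGetError and glViewport, and GL_CHECKED and onGLError are hypothetical names. In a real project you would substitute the real calls and types.

```cpp
#include <cassert>
#include <cstdio>

// Stand-ins so the sketch builds without OpenGL headers. In a real project
// these would be GLenum, glGetError() and an actual gl* call.
typedef unsigned int FakeGLenum;
static const FakeGLenum kFakeNoError = 0;            // numeric value of GL_NO_ERROR
static const FakeGLenum kFakeInvalidValue = 0x0501;  // numeric value of GL_INVALID_VALUE
static FakeGLenum g_pendingError = kFakeNoError;

static FakeGLenum fakeGetError() {
    FakeGLenum e = g_pendingError;
    g_pendingError = kFakeNoError;  // like glGetError, reading clears the flag
    return e;
}

static void fakeViewport(int x, int y, int width, int height) {
    (void)x; (void)y;
    if (width < 0 || height < 0) g_pendingError = kFakeInvalidValue;
}

static int g_errorCount = 0;

// Set the debugger breakpoint on this function: it runs only when a
// checked call left an error behind.
static void onGLError(FakeGLenum error, const char* call) {
    ++g_errorCount;
    std::fprintf(stderr, "OpenGL error 0x%04X after %s\n", error, call);
}

// Runs the call, then queries the error state once.
#define GL_CHECKED(call)                                          \
    do {                                                          \
        call;                                                     \
        FakeGLenum gl_checked_err = fakeGetError();               \
        if (gl_checked_err != kFakeNoError)                       \
            onGLError(gl_checked_err, #call);                     \
    } while (0)
```

A call site would then read GL_CHECKED(fakeViewport(0, 0, 320, 480)); and a plain, unconditional breakpoint on onGLError fires only when something went wrong.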
I think that what could get you out of the problem is to capture OpenGL ES Frames (scroll down to "Capture OpenGL ES Frames"), which is now supported by Xcode. At least this is how I am debugging my OpenGL Games.
By capturing the frames when you know an error is happening you could identify the issue in the OpenGL Stack without too much effort.
Hope it helps!

NSLog on the device: is that a problem, or must I remove it?

I have read this post: what happens to NSLog info when running on a device?
...but I am wondering whether NSLog is a problem when distributing the app, such as filling up the memory or something. I am only interested in seeing it when I test the consistency of my input data to the database.
The reason is that I use NSLog to check when I load the data into my database in the simulator. I could remove it when I upload, but it would be good if I did not need to.
You should remove it. For example if you log contents of a UITableViewCell (in -tableView:cellForRowAtIndexPath:), it can make a big difference in performance, especially on slower hardware.
Use a macro to keep NSLog output in Debug mode, but remove it from Release mode. An example can be found on the following site:
I use a set of macros in my pch file that are quite handy for this.
See for details.
#ifdef DEBUG
#define DLog(...) NSLog(@"%s %@", __PRETTY_FUNCTION__, [NSString stringWithFormat:__VA_ARGS__])
#define ALog(...) [[NSAssertionHandler currentHandler] handleFailureInFunction:[NSString stringWithCString:__PRETTY_FUNCTION__ encoding:NSUTF8StringEncoding] file:[NSString stringWithCString:__FILE__ encoding:NSUTF8StringEncoding] lineNumber:__LINE__ description:__VA_ARGS__]
#else
#define DLog(...) do { } while (0)
#define ALog(...) NSLog(@"%s %@", __PRETTY_FUNCTION__, [NSString stringWithFormat:__VA_ARGS__])
#endif
#define ZAssert(condition, ...) do { if (!(condition)) { ALog(__VA_ARGS__); }} while(0)
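The same debug-only pattern can be sketched in plain C++ for code shared across platforms. This is only an illustration: it keys off the standard NDEBUG macro rather than Xcode's DEBUG, and DLOG and kLoggingEnabled are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>

#ifndef NDEBUG
static const bool kLoggingEnabled = true;
// Debug build: print the calling function, then the formatted message.
#define DLOG(...)                                   \
    do {                                            \
        std::fprintf(stderr, "%s: ", __func__);     \
        std::fprintf(stderr, __VA_ARGS__);          \
        std::fputc('\n', stderr);                   \
    } while (0)
#else
static const bool kLoggingEnabled = false;
// Release build: expands to nothing, so the arguments are never evaluated.
#define DLOG(...) do { } while (0)
#endif
```

Since the release version discards its arguments entirely, avoid putting expressions with side effects inside a DLOG call.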
The log on the device only keeps around an hour of data, even Apple's apps log quite a few things there. So it is generally acceptable to have meaningful output in the log.
But since logging is a disk operation you might find that excessive logging slows down your app since the writing blocks the main (= UI) thread for a short while for every NSLog.
Because of this you should only log things that give more information if errors are happening in your app to facilitate finding what is going wrong.
Consider switching to CocoaLumberjack. It is a drop-in replacement for NSLog and has significantly more useful functionality and better performance.
It is similar in concept to other popular logging frameworks such as log4j, yet is designed specifically for Objective-C, and takes advantage of features such as multi-threading, Grand Central Dispatch (if available), lockless atomic operations, and the dynamic nature of the Objective-C runtime.
The only real reason to remove your NSLog entries is to save the memory consumed by running unnecessary code, but unless you're consistently adding data, it shouldn't be too much of an issue.
Furthermore, if a user has an issue such as an app crash, the NSLog output can be submitted so the developer can read it and work out the reason for the crash. If you're loading the log with a great deal of unnecessary data, it can be troublesome later when you need to go through it to find and fix a user's issue.
Ultimately, I wouldn't worry too much about removing them. That's my 2 cents anyhow.

Genetic Algorithm optimization - using the -O3 flag

Working on a problem that requires a GA. I have all of that working and spent quite a bit of time trimming the fat and optimizing the code before resorting to compiler optimization. Because the GA runs as a result of user input it has to find a solution within a reasonable period of time, otherwise the UI stalls and it just won't play well at all. I got this binary GA solving a 27 variable problem in about 0.1s on an iPhone 3GS.
In order to achieve this level of performance the entire GA was coded in C, not Objective-C.
In a search for a further reduction of run time I was considering the idea of using the "-O3" optimization switch only for the solver module. I tried it and it cut run time nearly in half.
Should I be concerned about any gotchas when setting optimization to "-O3"? Keep in mind that I am doing this at the file level and not for the entire project.
The -O3 flag will make the code work the same way as before (only faster), as long as you don't do any tricky stuff that is unsafe or otherwise dependent on what the compiler does to it.
Also, as suggested in the comments, it might be a good idea to let the computation run in a separate thread to prevent the UI from locking up. That also gives you the flexibility to make the computation more expensive, or to display a progress bar, or whatever.
Tricky stuff
Optimisation will produce unexpected results if you try to access stuff on the stack directly, or move the stack pointer somewhere else, or if you do something inherently illegal, like forget to initialise a variable (some compilers (MinGW) will set them to 0).
int main() {
    int array[10];
    array[-2] = 13; // out-of-bounds write: some negative index might hit the return address
    return 0;
}
Some other tricky stuff involves the optimiser stuffing up by itself. Here's an example of when -O3 completely breaks the code.

Force the iPhone to simulate doing CPU-intensive tasks?

For a normal app, you'd never want to do this.
But ... I'm making an educational app to show people exactly what happens with the different threading models on different iPhone hardware and OS level. OS 4 has radically changed the different models (IME: lots of existing code DOES NOT WORK when run on OS 4).
I'm writing an interactive test app that lets you fire off threads for different models (selector main thread, selector background, nsoperationqueue, etc), and see what happens to the GUI + main app while it happens.
But one of the common use-cases I want to reproduce is: "Thread that does a background download then does a CPU-intensive parse of the results". We see this a lot in real-world apps.
It's not entirely trivial; the manner of "being busy" matters.
So ... how can I simulate this? I'm looking for something that is guaranteed not to be thrown-away by an optimizing compiler (either now, or with a better compiler), and is enough to force a thread to run at max CPU for about 5 seconds.
NB: in my real-world apps, I've noticed there are some strange things that happen when an iPhone thread gets busy - e.g. background threads will starve the main thread EVEN WHEN set at lower priority. Although this is clearly a bug in Apple's thread scheduler, I'd like to make a busy loop that demonstrates this - and/or an alternate busy loop that shows what happens when you DON'T trigger that behaviour in the scheduler.
For instance, the following can have different effects:
for( int i=0; i<1000; i++ )
    for( int k=0; k<1000; k++ )
        CC_MD5( cStr, strlen(cStr), result );
versus:
for( int i=0; i<1000000; i++ )
    CC_MD5( cStr, strlen(cStr), result );
...sometimes, at least, the compiler seems to optimize the latter (and I have no idea what the compiler voodoo is for that - some builds it showed no difference, some it did :()
25 threads, on a first gen iPhone, doing a million MD5's each ... and there's almost no perceptible effect on the GUI.
Whereas 5 threads parsing XML using the bundled SAX-based parser will usually grind the GUI to a halt.
It seems that MD5 hashing doesn't trigger the problems in the iPhone's buggy thread-scheduler :(. I'm going to investigate mem allocations instead, see if that has a different effect.
You can avoid the compiler optimising things away by making sure the compiler can't easily infer what you're trying to do at compile time.
For example, this:
for( int i=0; i<1000000; i++ )
CC_MD5( cStr, strlen(cStr), result );
has an invariant input, so the compiler could realise that it's going to get the same result every time, or that the result isn't being used, so it doesn't need to calculate it.
You could avoid both these problems like so:
for( int i=0; i<1000000; i++ ) {
    CC_MD5( cStr, strlen(cStr), result );
    sprintf(cStr, "%02x%02x", result[0], result[1]); // feed the result back in as the next input
}
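To keep a thread busy for a fixed wall-clock time rather than a fixed iteration count, the same feedback trick can be combined with a clock check and a volatile store. The sketch below is an assumption-laden stand-in: it swaps CC_MD5 for a simple FNV-1a hash so it builds anywhere, and burnCpu, fnv1a and g_sink are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// FNV-1a, used here only as a cheap, portable stand-in for CC_MD5.
static std::uint32_t fnv1a(const unsigned char* data, std::size_t len) {
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

// Volatile sink: stores to it are observable, so the work feeding it
// cannot simply be discarded by the optimizer.
static volatile std::uint32_t g_sink = 0;

// Spins for roughly `duration` and returns the number of outer rounds run.
static unsigned long burnCpu(std::chrono::milliseconds duration) {
    unsigned char buf[16] = {1, 2, 3, 4};
    unsigned long rounds = 0;
    const std::chrono::steady_clock::time_point end =
        std::chrono::steady_clock::now() + duration;
    while (std::chrono::steady_clock::now() < end) {
        for (int i = 0; i < 1000; ++i) {
            std::uint32_t h = fnv1a(buf, sizeof buf);
            // Feed each result back into the input so it is never invariant.
            buf[i % sizeof buf] ^= static_cast<unsigned char>(h);
        }
        g_sink = fnv1a(buf, sizeof buf);
        ++rounds;
    }
    return rounds;
}
```

Calling burnCpu(std::chrono::milliseconds(5000)) on each test thread would approximate the five seconds asked for in the question. Note that this load is pure arithmetic, which by the question's own findings may not trigger the scheduler behaviour that XML parsing does.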
If you're seeing the problem with SAX, then I'd start with getting the threads in your simulation app doing SAX and check you do see the same problems outside of your main app.
If the problem is not related to pure processor power or memory allocations, other areas you could look at would be disk I/O (depending where your xml input is coming from), mutexes or calling selectors/delegates.
Good luck, and do report back how you get on!
Apple actually provides sample code that does something similar to what you are looking for, with the intent of highlighting the performance differences between using LibXML (SAX) and CocoaXML. The focus is not on CPU performance, but assuming you can actually measure processor utilization, you could likely just scale up (repeat within your XML) the dataset that the sample downloads.

Is Objective C fast enough for DSP/audio programming

I've been making some progress with audio programming for iPhone. Now I'm doing some performance tuning, trying to see if I can squeeze more out of this little machine. Running Shark, I see that a significant part of my cpu power (16%) is getting eaten up by objc_msgSend. I understand I can speed this up somewhat by storing pointers to functions (IMP) rather than calling them using [object message] notation. But if I'm going to go through all this trouble, I wonder if I might just be better off using C++.
Any thoughts on this?
Objective C is absolutely fast enough for DSP/audio programming, because Objective C is a superset of C. You don't need to (and shouldn't) make everything a message. Where performance is critical, use plain C function calls (or use inline assembly, if there are hardware features you can leverage that way). Where performance isn't critical, and your application can benefit from the features of message indirection, use the square brackets.
The Accelerate framework on OS X, for example, is a great high-performance Objective C library. It only uses standard C99 function calls, and you can call them from Objective C code without any wrapping or indirection.
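As a small illustration of keeping messaging out of the hot path, the inner loop can live in a plain function that is called once per buffer rather than once per sample. The applyGain function below is a hypothetical example, not an API from any framework.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical DSP kernel: one ordinary function call per buffer, with no
// dynamic dispatch inside the per-sample loop.
static void applyGain(float* samples, std::size_t count, float gain) {
    for (std::size_t i = 0; i < count; ++i) {
        samples[i] *= gain;
    }
}
```

An Objective-C object can still own the buffer and expose a method for the UI layer; that method would simply forward to a function like this one, so the message send happens once per buffer.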
The problem with Objective-C and functions like DSP is not speed per se but rather the uncertainty of when the inevitable bottlenecks will occur.
All languages have bottlenecks, but in statically linked languages like C++ you can better predict when and where in the code they will occur. With Objective-C's runtime coupling, the time it takes to find the appropriate object and send a message is not necessarily slow, but it is variable and unpredictable. Objective-C's flexibility in UI, data management and reuse works against it in the case of tightly timed tasks.
Most audio processing in the Apple API is done in C or C++ because of the need to nail down the time it takes code to execute. However, it's easy to mix Objective-C, C and C++ in the same app. This allows you to pick the best language for the immediate task at hand.
Real Time Rendering
Definitely not. The Objective-C runtime and its libraries are simply not designed for the demands of real-time audio rendering. The fact is, it's virtually impossible to guarantee that using the ObjC runtime or libraries such as Foundation (or even CoreFoundation) will not result in your renderer missing its deadline.
The common case is a lock -- even a simple heap allocation (malloc, new/new[], [[NSObject alloc] init]) will likely require a lock.
To use ObjC is to utilize libraries and a runtime which assume locks are acceptable at any point within their execution. The lock can suspend execution of your render thread (e.g. during your render callback) while waiting to acquire the lock. Then you can miss your render deadline because your render thread is held up, ultimately resulting in dropouts/glitches.
Ask a pro audio plugin developer: they will tell you that blocking within the realtime render domain is forbidden. You cannot e.g. run to the filesystem or create heap allocations because you have no practical upper bound regarding the time it will take to finish.
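One common way to hand data to a render thread without locks or heap allocation is a fixed-size single-producer, single-consumer ring buffer. The sketch below is a minimal illustration under stated assumptions: SpscRing is a hypothetical class, it is only safe with exactly one producer thread and one consumer thread, and it assumes std::atomic<std::size_t> is lock-free on the target (which is_lock_free() can confirm).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Fixed-capacity queue: all storage is allocated up front, and neither
// push nor pop ever blocks or touches the heap.
template <std::size_t Capacity>
class SpscRing {
public:
    SpscRing() : head_(0), tail_(0) {}

    // Producer thread only. Returns false when the queue is full.
    bool push(float value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % kSlots;
        if (next == head_.load(std::memory_order_acquire)) return false;
        data_[tail] = value;
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer (render) thread only. Returns false when the queue is empty.
    bool pop(float& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = data_[head];
        head_.store((head + 1) % kSlots, std::memory_order_release);
        return true;
    }

private:
    static const std::size_t kSlots = Capacity + 1;  // one slot always stays empty
    float data_[kSlots];
    std::atomic<std::size_t> head_;  // next slot to read
    std::atomic<std::size_t> tail_;  // next slot to write
};
```

When the queue is empty the render callback gets false back immediately and can output silence or repeat the last value, instead of waiting on another thread.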
Here's a nice introduction:
Offline Rendering
Yes, it would be acceptably fast in most scenarios for high level messaging. At the lower levels, I recommend against using ObjC because it would be wasteful -- it could take many, many times longer to render if ObjC messaging were used at that level (compared to a C or C++ implementation).
See also: Will my iPhone app take a performance hit if I use Objective-C for low level code?
objc_msgSend is just a utility.
The cost of sending a message is not just the cost of sending the message.
It is the cost of doing everything that the message initiates.
(Just like the true cost of a function call is its inclusive cost, including I/O if there is any.)
What you need to know is where are the time-dominant messages coming from and going to and why.
Stack samples will tell you which routines / methods are being called so often that you should figure out how to call them more efficiently.
You may find that you're calling them more than you have to.
Especially if you find that many of the calls are for creating and deleting data structures, you can probably find better ways to do that.