Force the iPhone to simulate doing CPU-intensive tasks?

For a normal app, you'd never want to do this.
But ... I'm making an educational app to show people exactly what happens with the different threading models on different iPhone hardware and OS versions. OS 4 has radically changed the threading models (in my experience: lots of existing code DOES NOT WORK when run on OS 4).
I'm writing an interactive test app that lets you fire off threads for the different models (selector on main thread, selector in background, NSOperationQueue, etc.), and see what happens to the GUI + main app while they run.
But one of the common use-cases I want to reproduce is: "Thread that does a background download, then does a CPU-intensive parse of the results". We see this a lot in real-world apps.
It's not entirely trivial; the manner of "being busy" matters.
So ... how can I simulate this? I'm looking for something that is guaranteed not to be thrown away by an optimizing compiler (either now, or with a better compiler), and is enough to force a thread to run at max CPU for about 5 seconds.
NB: in my real-world apps, I've noticed there are some strange things that happen when an iPhone thread gets busy - e.g. background threads will starve the main thread EVEN WHEN set at lower priority. Although this is clearly a bug in Apple's thread scheduler, I'd like to write a busy-loop that demonstrates this - and/or an alternate one that shows what happens when you DON'T trigger that behaviour in the scheduler.
UPDATE:
For instance, the following can have different effects:
for( int i=0; i<1000; i++ )
    for( int k=0; k<1000; k++ )
        CC_MD5( cStr, strlen(cStr), result );

for( int i=0; i<1000000; i++ )
    CC_MD5( cStr, strlen(cStr), result );
...sometimes, at least, the compiler seems to optimize away the latter (and I have no idea what compiler voodoo decides that - some builds showed no difference, some did :()
UPDATE 2:
25 threads, on a first gen iPhone, doing a million MD5's each ... and there's almost no perceptible effect on the GUI.
Whereas 5 threads parsing XML using the bundled SAX-based parser will usually grind the GUI to a halt.
It seems that MD5 hashing doesn't trigger the problems in the iPhone's buggy thread-scheduler :(. I'm going to investigate memory allocations instead, and see if they have a different effect.

You can avoid the compiler optimising things away by making sure the compiler can't easily infer what you're trying to do at compile time.
For example, this:
for( int i=0; i<1000000; i++ )
    CC_MD5( cStr, strlen(cStr), result );
has an invariant input, so the compiler could realise that it's going to get the same result every time. Or that the result isn't getting used, so it doesn't need to calculate it at all.
You could avoid both these problems like so:
for( int i=0; i<1000000; i++ )
{
    CC_MD5( cStr, strlen(cStr), result );
    // feed part of the digest back into the input, so each iteration's
    // input differs and the result is visibly used
    sprintf(cStr, "%02x%02x", result[0], result[1]);
}
If you're seeing the problem with SAX, then I'd start with getting the threads in your simulation app doing SAX and check you do see the same problems outside of your main app.
If the problem is not related to pure processor power or memory allocations, other areas you could look at would be disk I/O (depending on where your XML input is coming from), mutexes, or calling selectors/delegates.
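For what it's worth, here's a sketch of a busy worker that burns CPU and churns the allocator at the same time (the buffer sizes and iteration count are made up - you'd tune them until the effect on the GUI shows up):

#include <CommonCrypto/CommonDigest.h>
#include <stdlib.h>
#include <string.h>

// Hash into a freshly malloc'd buffer each iteration, so the work hits
// both the CPU and the heap, and neither can easily be optimised away.
void busyWithAllocations(int iterations)
{
    unsigned char result[CC_MD5_DIGEST_LENGTH];
    static volatile unsigned char sink;
    for (int i = 0; i < iterations; i++)
    {
        size_t size = 64 + (size_t)(i % 4096);  // vary the size each time
        char *buf = (char *)malloc(size);
        memset(buf, i & 0xFF, size);            // touch every byte
        CC_MD5(buf, (CC_LONG)size, result);
        sink = result[0];                       // use the digest so it isn't dead code
        free(buf);
    }
}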
Good luck, and do report back how you get on!

Apple actually provides sample code at developer.apple.com that does something similar to what you are looking for, with the intent of highlighting the performance differences between using LibXML (SAX) and CocoaXML. The focus is not on CPU performance, but assuming you can actually measure processor utilization, you could likely just scale up (repeat within your XML) the dataset that the sample downloads.

Related

Using all CPUs on CentOS to run a C++ program

Hi, and thanks for the help. I have been running a program that has many functions with for loops that iterate over 10000 times. I have been using omp_set_num_threads() to use all the CPUs of my CentOS machine. This seems to work fine for all the functions in my program except one. The function it doesn't work on is something like this:
void move_check() // check whether each "molecule obj" is in the space under consideration
{
    for( int i = 0 ; i < NM ; ++i ) // NM = number of "molecules"
    {
        int bounds = molecule_obj.at(i).check(dom_x, dom_y, dom_z);
        // returns the status of the molecule;
        // molecule is a class that I have created, and molecule_obj is an object of that class
        if( bounds == 1 )
        {
            molecule_obj.erase( molecule_obj.begin() + i );
            i -= 1;
            NM -= 1;
        }
    }
}
Can I use a pragma for this? If not, what other alternatives do I have?
As the above function is the one that seems to be consuming the most time, I would like to utilize all the CPUs to execute it. How do I go about doing that?
Thanks a lot.
Yes, in principle you can use an OpenMP pragma; just add
#pragma omp parallel for
before the for loop.
You have to make sure that it is safe to use the methods of your molecule_obj in parallel. I cannot know whether they are, but I assume the following:
molecule_obj.at(i).check(dom_x,dom_y,dom_z); really works on one specific molecule and does not depend on any of the others. If so, that part is fine.
But the erase call most likely is not safe, depending on how you store the entries. With something like std::list or std::vector, erasing an element during the parallel loop would lead to invalidated iterators, wrong trip counts, etc.
I suggest that you replace the removal of the entry inside the loop with just marking it for removal; you can add a flag to each molecule for that.
Then, after the parallel loop completes, you can go over the list once serially and actually remove the marked entries, e.g. as sketched below.
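Something like this (just a sketch: it assumes you add a bool out_of_bounds flag to your molecule class, that molecule_obj is a std::vector<molecule>, and that check() doesn't mutate any shared state):

#include <algorithm>
#include <vector>

// Predicate for the serial sweep below.
static bool is_out_of_bounds(const molecule &m) { return m.out_of_bounds; }

void move_check()
{
    // Parallel phase: each iteration reads the shared bounds and writes
    // only its own molecule's flag, so there is no data race.
    #pragma omp parallel for
    for (int i = 0; i < NM; ++i)
        molecule_obj.at(i).out_of_bounds =
            (molecule_obj.at(i).check(dom_x, dom_y, dom_z) == 1);

    // Serial phase: remove all marked molecules in one pass
    // (the erase-remove idiom), then fix up the count.
    molecule_obj.erase(
        std::remove_if(molecule_obj.begin(), molecule_obj.end(),
                       is_out_of_bounds),
        molecule_obj.end());
    NM = (int)molecule_obj.size();
}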
Just another suggestion: make a clearer distinction between one molecule and the list of molecules. It would improve the understandability of your code.

Volatile vars and multi-core thread synchronization!

I have several threads executing concurrently and checking a value of a field in their own object. The field is set by the launch thread like this:
for (i = 0; i < ThreadCount; i++)
{
    ThreadContext[i].MyField = 1;
}
Within each thread then I check the value of this value:
if (MyField == 1)
{
    ... // do something
}
However, I noticed that on a 4-core CPU, some of the (4) running threads need several milliseconds, or even longer, to see the changed MyField. MyField is a single char field. What appears to be happening is that when the memory bus is maxed out by the first thread that detects the change, all the other threads may stall for almost the entire duration of the first one's run (assuming there is enough memory pressure). Only when the first thread eases up on memory (and does more work in registers) do the other threads get to see the new value.
I checked the asm and there is no compiler optimization in the way here; the accesses go directly to memory. How can this be fixed?
Thanks!
Jam
I got feedback from Intel: Yes, that's how it works (no easy fix).

Genetic Algorithm optimization - using the -O3 flag

Working on a problem that requires a GA. I have all of that working, and I spent quite a bit of time trimming the fat and optimizing the code before resorting to compiler optimization. Because the GA runs as a result of user input, it has to find a solution within a reasonable period of time, otherwise the UI stalls and it just won't play well at all. I got this binary GA solving a 27-variable problem in about 0.1s on an iPhone 3GS.
In order to achieve this level of performance the entire GA was coded in C, not Objective-C.
In a search for a further reduction of run time I was considering the idea of using the "-O3" optimization switch only for the solver module. I tried it and it cut run time nearly in half.
Should I be concerned about any gotcha's by setting optimization to "-O3"? Keep in mind that I am doing this at the file level and not for the entire project.
The -O3 flag will make the code work in the same way as before (only faster), as long as you don't do any tricky stuff that is unsafe or otherwise dependent on what the compiler does to it.
Also, as suggested in the comments, it might be a good idea to let the computation run in a separate thread to prevent the UI from locking up. That also gives you the flexibility to make the computation more expensive, or to display a progress bar, or whatever.
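For example, with plain pthreads (just a sketch - solve_ga and ga_params are stand-ins for whatever your solver module actually exposes):

#include <pthread.h>

// Stand-ins for the real solver interface.
typedef struct { int num_vars; /* ... */ } ga_params;
void solve_ga(ga_params *p);

static void *solver_thread_main(void *ctx)
{
    solve_ga((ga_params *)ctx);  // the heavy work happens off the UI thread
    return NULL;
}

void start_solver_async(ga_params *params)
{
    pthread_t tid;
    pthread_create(&tid, NULL, solver_thread_main, params);
    pthread_detach(tid);  // no join needed; report back to the UI however the app already does
}

The params object has to outlive the thread, and any progress or completion callback into the UI should be marshalled back to the main thread.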
Tricky stuff
Optimisation will produce unexpected results if you try to access stuff on the stack directly, or move the stack pointer somewhere else, or if you do something inherently illegal, like forgetting to initialise a variable (some compilers, e.g. MinGW, happen to set them to 0).
E.g.
int main() {
    int array[10];
    array[-2] = 13; // some negative value might get the return address
}
Some other tricky stuff involves the optimiser stuffing up by itself. Here's an example of when -O3 completely breaks the code.

Is Objective C fast enough for DSP/audio programming

I've been making some progress with audio programming for iPhone. Now I'm doing some performance tuning, trying to see if I can squeeze more out of this little machine. Running Shark, I see that a significant part of my CPU time (16%) is getting eaten up by objc_msgSend. I understand I can speed this up somewhat by storing pointers to functions (IMP) rather than calling them using [object message] notation. But if I'm going to go through all this trouble, I wonder if I might just be better off using C++.
Any thoughts on this?
Objective C is absolutely fast enough for DSP/audio programming, because Objective C is a superset of C. You don't need to (and shouldn't) make everything a message. Where performance is critical, use plain C function calls (or use inline assembly, if there are hardware features you can leverage that way). Where performance isn't critical, and your application can benefit from the features of message indirection, use the square brackets.
The Accelerate framework on OS X, for example, is a great high-performance library that you can use from Objective C. It exposes only standard C99 function calls, and you can call them from Objective C code without any wrapping or indirection.
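If you do want to keep messaging in the outer layers but strip dispatch out of an inner loop, caching the IMP (as the question mentions) could look roughly like this - a sketch using the plain C runtime API, where processSample: is a made-up method and the function-pointer cast must match the method's real signature:

#include <objc/runtime.h>

// Cast matching a hypothetical '- (float)processSample:(float)x' method.
typedef float (*ProcessFn)(id, SEL, float);

void render(id filter, float *buf, int n)
{
    // Look up the implementation once, outside the hot loop.
    SEL sel = sel_registerName("processSample:");
    ProcessFn process = (ProcessFn)class_getMethodImplementation(
        object_getClass(filter), sel);

    for (int i = 0; i < n; i++)
        buf[i] = process(filter, sel, buf[i]);  // no objc_msgSend per sample
}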
The problem with Objective-C for work like DSP is not speed per se, but rather the uncertainty of when the inevitable bottlenecks will occur.
All languages have bottlenecks, but in statically bound languages like C++ you can better predict when and where in the code they will occur. In the case of Objective-C's runtime coupling, the time it takes to find the appropriate object and the time it takes to send a message are not necessarily slow, but they are variable and unpredictable. Objective-C's flexibility in UI, data management, and reuse works against it in the case of tightly timed tasks.
Most audio processing in the Apple API is done in C or C++ because of the need to nail down the time it takes code to execute. However, it's easy to mix Objective-C, C, and C++ in the same app. This allows you to pick the best language for the immediate task at hand.
Is Objective C fast enough for DSP/audio programming
Real Time Rendering
Definitely not. The Objective-C runtime and its libraries are simply not designed for the demands of real-time audio rendering. The fact is, it's virtually impossible to guarantee that using the ObjC runtime or libraries such as Foundation (or even CoreFoundation) will not result in your renderer missing its deadline.
The common case is a lock -- even a simple heap allocation (malloc, new/new[], [[NSObject alloc] init]) will likely require a lock.
To use ObjC is to utilize libraries and a runtime which assume locks are acceptable at any point within their execution. The lock can suspend execution of your render thread (e.g. during your render callback) while waiting to acquire the lock. Then you can miss your render deadline because your render thread is held up, ultimately resulting in dropouts/glitches.
Ask a pro audio plugin developer: they will tell you that blocking within the realtime render domain is forbidden. You cannot, e.g., touch the filesystem or make heap allocations, because you have no practical upper bound on the time they will take to finish.
Here's a nice introduction: http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing
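The usual workaround is to preallocate everything before the stream starts, so the render callback never touches the heap or a lock. A minimal sketch (the names are illustrative, not from any Apple API):

#include <stdlib.h>
#include <string.h>

// All scratch memory is allocated once, on the setup thread.
typedef struct {
    float *scratch;
    int    max_frames;
} RenderState;

RenderState *render_state_create(int max_frames)
{
    RenderState *s = (RenderState *)malloc(sizeof(RenderState));
    s->scratch = (float *)calloc((size_t)max_frames, sizeof(float));
    s->max_frames = max_frames;
    return s;
}

// Real-time safe: no malloc, no locks, no Objective-C messaging.
void render_callback(RenderState *s, float *out, int frames)
{
    // ... do the DSP into s->scratch, then copy out ...
    memcpy(out, s->scratch, (size_t)frames * sizeof(float));
}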
Offline Rendering
Yes, it would be acceptably fast in most scenarios for high-level messaging. At the lower levels, I recommend against using ObjC because it would be wasteful -- it could take many, many times longer to render if ObjC messaging is used at that level (compared to a C or C++ implementation).
See also: Will my iPhone app take a performance hit if I use Objective-C for low level code?
objc_msgSend is just a utility.
The cost of sending a message is not just the cost of sending the message.
It is the cost of doing everything that the message initiates.
(Just like the true cost of a function call is its inclusive cost, including I/O if there is any.)
What you need to know is where are the time-dominant messages coming from and going to and why.
Stack samples will tell you which routines / methods are being called so often that you should figure out how to call them more efficiently.
You may find that you're calling them more than you have to.
Especially if you find that many of the calls are for creating and deleting data structures, you can probably find better ways to do that.

iPhone OS memory problem - how to debug?

I have a pretty weird problem in my iPhone app which is, I think, related to memory getting corrupted:
At one point, I need to sort an array, which I do with -[sortArrayUsingFunction].
The result is not correct unless I either allocate some memory with something like void *test = malloc(2 * sizeof( int )) before the method call or have, e.g., a call to NSLog() (which is never invoked) in the sorting function.
In other words: the sorting only works if I slightly increase the memory that was used before calling the sorting function. I think this is because at some point, memory gets corrupted.
How do you debug something like this?
It sounds like some of your code is using already-released objects. A lot of help with debugging this kind of error is provided in Apple's great Mac OS X Debugging Magic tech note, especially the Foundation part.
For your case I'd disable autorelease pools (setting the environment variable NSEnableAutoreleasePool=NO) or use the zombie feature (NSZombieEnabled=YES) to find places where you send messages to released objects.
Try running your program in the simulator under Valgrind:
http://valgrind.org/
And how to use it under the simulator:
http://landonf.bikemonkey.org/code/iphone/iPhone_Simulator_Valgrind.20081224.html
You may have to change the VALGRIND path in the code example depending on where it gets installed.
Such things can be a challenge to debug. There are tools for detecting out-of-bounds accesses and the like on other platforms, so I presume there is something similar for the iPhone; however, I don't know of any.
Perhaps you should store two copies of the array, and compare them for differences. Print out the differences. The nature of the "junk" that was introduced to one of the arrays might give a hint as to where it came from.
Also just go through the code that runs before this point, and re-read it (or better yet, get someone else to read it). You might spot a bug.