Hi, and thanks for the help. I have been running a program with many functions whose for loops iterate more than 10,000 times. I have been using omp_set_num_threads() to use all the CPUs of my CentOS machine. This seems to work fine for every function in my program except one. The function it doesn't work on looks like this:
void move_check() // check whether each "molecule obj" is in the space under consideration
{
    for (int i = 0; i < NM; ++i) // NM = number of "molecules"
    {
        int bounds;
        bounds = molecule_obj.at(i).check(dom_x, dom_y, dom_z);
        // returns the status of the molecule;
        // molecule is a class I have created and molecule_obj is an object of that class
        if (bounds == 1)
        {
            molecule_obj.erase(molecule_obj.begin() + i);
            i -= 1;
            NM -= 1;
        }
    }
}
Can I use the pragma for this? If not, what other alternatives do I have?
As this function is the one that seems to be consuming the most time, I would like to use all the CPUs to execute it. How do I go about doing that?
Thanks a lot.
Yes, in principle you can use an OpenMP pragma. Just add
#pragma omp parallel for
before the for loop.
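Applied to a loop, the placement looks like this (just a sketch; as the caveats below explain, it is not yet safe for your loop as written, because of the erase):

#pragma omp parallel for
for (int i = 0; i < NM; ++i)
{
    // valid only if each iteration is independent of all the others
    molecule_obj.at(i).check(dom_x, dom_y, dom_z);
}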
You have to make sure that it is safe to use the methods of your molecule_obj in parallel. I cannot know whether they are, but I assume the following:
molecule_obj.at(i).check(dom_x,dom_y,dom_z); really works on one specific molecule and does not depend on any of the others. If so, that part is fine.
But the erase call most likely is not safe, depending on how you store the entries. If you use something like std::vector or std::list, erasing an element in the middle of the loop leads to invalidated iterators, wrong trip counts, and so on.
I suggest that you replace the removal of the entry inside the loop with just marking it for removal; you could add a special flag to each molecule for that.
Then, after the parallel loop completes, you can go over the list once serially and actually remove the marked entries.
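A minimal sketch of that mark-then-remove approach, assuming molecule_obj is a std::vector<molecule> (the question doesn't say) and that check() is safe to call concurrently on distinct molecules:

#include <vector>

// Assumed context from the question:
// std::vector<molecule> molecule_obj; int NM; double dom_x, dom_y, dom_z;

void move_check()
{
    std::vector<char> out_of_bounds(molecule_obj.size(), 0);

    // Parallel phase: each iteration reads one molecule and writes only its own flag.
    #pragma omp parallel for
    for (int i = 0; i < (int)molecule_obj.size(); ++i)
        out_of_bounds[i] = (molecule_obj.at(i).check(dom_x, dom_y, dom_z) == 1);

    // Serial phase: compact the vector, keeping only the in-bounds molecules.
    std::size_t dst = 0;
    for (std::size_t src = 0; src < molecule_obj.size(); ++src)
        if (!out_of_bounds[src])
            molecule_obj[dst++] = molecule_obj[src];
    molecule_obj.resize(dst);

    NM = (int)molecule_obj.size();
}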
One more suggestion: make a clearer distinction between one molecule and the collection of molecules. It would make your code easier to understand.
I'm trying to code an app in OMNeT++ that gets the queue length from the node where it is invoked and sends it to another node.
The plan is to modify the UdpBasicApp.cc file invoked in a router and make it get the length of the queue of the DropTailQueue module.
Searching online I found that this is the right method...
cModule *mod = getModuleByPath("router3.eth[*].mac.queue");
queueing::PacketQueue *queue = check_and_cast<queueing::PacketQueue*>(mod);
int c = queue->getNumPackets();
EV << c;
...since the DropTailQueue extends the PacketQueue module.
I put a print at the end to see if there was something wrong.
When I run the simulation, using the modified UdpBasicApp module, c is always 0.
I highly doubt that the queue is always empty, but I don't know how to verify that suspicion.
If it's an error, why is it always 0?
My guess is that you are querying a different queue than you assume. You should not use patterns (i.e. *) in your module path, because a pattern may match multiple eth modules, and it is unspecified which one will be returned.
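For example, spelling the path out with an explicit index removes the ambiguity (eth[0] here is just a placeholder - use whichever interface actually carries the traffic):

cModule *mod = getModuleByPath("router3.eth[0].mac.queue"); // no wildcard
queueing::PacketQueue *queue = check_and_cast<queueing::PacketQueue*>(mod);
EV << queue->getNumPackets();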
I've been working on some Perl libraries for data mining. The libraries are full of nested loops for gathering and processing information. I'm working with strict mode and I always declare my variables with my outside of the first loop. For instance:
# Pretty useless code for clarity purposes:
my $flag = 1;
my ($v1, $v2);
while ($flag) {
    for $v1 (1 .. 1000) {
        # Lots and lots of code...
        $v2 = $v1 * 2;
    }
}
From what I've read here, declaring them outside of the loop is better performance-wise. However, maintaining my code is becoming increasingly difficult, because the declarations of some variables end up pretty far away from where they are actually used.
Something like this would be easier to maintain:
my $flag = 1;
while ($flag) {
    for my $v1 (1 .. 1000) {
        # Lots and lots of code...
        my $v2 = $v1 * 2;
    }
}
I don't have much experience with Perl, since I come from working mostly with C++. At some point I would like to open-source most of my libraries, so I would like them to be as pleasing to the Perl gurus as possible.
From a professional Perl developer's point of view, what is the most appropriate choice between these options?
The general rule is to declare every variable as late as possible.
If the value of a variable doesn't need to be kept across iterations of a loop then declare it inside the loop, or as the loop control variable for a for loop.
If it needs to remain static across the loop iterations (like your $flag) then declare it immediately before the loop.
Yes, there is a minimal speed cost to be paid if you discard and reallocate a variable every time a block is executed, but programming and maintenance costs are by far the more important kind of efficiency and should always be put first.
You shouldn't be optimising your code before it has been made to work and found to be running too slowly; and even then, moving declarations to the top of the file is a long way down the list of compromises that are likely to make a useful difference.
Optimize for readability. This means declaring variables in the smallest possible scope. Ideally, I can see the variable declaration and all usages of that variable at the same time. We can only keep a very limited amount of context in our heads, so declaring variables near their use makes it easier to understand, write, and debug code.
Which variant performs better is difficult to estimate and difficult to measure, as the effect will be rather small. But if performance is roughly equivalent, we might as well use the more readable variant.
I personally often try to write code in single-assignment form, where variables aren't reassigned and mutators like push @array, $elem are avoided. This ensures that the name of a variable and its value are always interchangeable, which makes it easier to reason about the code. It also implies that each variable declaration is an initialization, which removes a whole class of errors.
You should declare variables when you're ready to define them, unless you need to access the result in a larger scope. Even then, passing the value back explicitly will be easier to follow.
The particular example you have given (declaration of a loop variable) probably does not carry a performance penalty. As the link you quoted says, the reason for a performance difference boils down to whether the variable is initialised inside the loop. In the case of a for loop, it will be initialised either way.
I almost always declare the variables in the innermost scope possible. It reduces the chances of making mistakes. I would only alter that if performance became a problem in a specific loop.
By that I mean, is it acceptable to add to or subtract from semaphores? The example I have is the following:
semaphore secureTarget = 7;
semaphore allClearAlert = 0;
semaphore bellAlert = 0;
Archer:
    start();
    wait(secureTarget);
    wait(allClearAlert);
    fireAtTarget();
    signal(secureTarget);
    wait(secureTarget - 7);
    signal(bellAlert);
    end();
Boy:
    start();
    signal(allClearAlert);
    wait(bellAlert);
    end();
Does that seem acceptable? If it helps, the initial question I'm trying to answer is:
An archery club has seven targets. Archers in the club must compete
to secure a target. Once an archer secures her target she must wait
until the all-clear has been sounded before she can fire. Once an
archer finishes firing she leaves her target. The last archer to finish
sounds the bell that signifies that all have finished firing. Only then
is it safe for the small boy who collects the arrows to venture forth.
When all the arrows have been collected the boy gets out of the firing
lines and sounds the all-clear to the archers.
Semaphores can only be modified through the signal() and wait() operations; you can't explicitly add to or subtract from the underlying value as you describe. I can't give the solution explicitly - looking at your history, I think I'm doing the same coursework for the same module and I don't want to be done for plagiarism - but you may find the Little Book of Semaphores useful.
EDIT: you don't have to just use semaphores. You can use other types of shared data, as long as you use a mutex semaphore to control concurrent access to those variables.
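As a generic illustration of that idea (deliberately not a solution to the archery exercise), here is the usual shape of it, sketched with C++20 semaphores; a plain counter is the shared data, a binary semaphore is the mutex, and all the names are made up for the example:

#include <semaphore>

std::binary_semaphore     mutex(1);   // guards the shared counter
std::counting_semaphore<> allDone(0); // signalled exactly once, by the last worker
int remaining = 7;                    // plain shared data, only touched under mutex

void worker()
{
    // ... per-thread work ...
    mutex.acquire();
    --remaining;
    bool last = (remaining == 0);
    mutex.release();
    if (last)
        allDone.release(); // only the last thread to finish signals
}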
For some current projects, I'm working with several data structures that are pretty large (in the area of 10K elements). To be able to access this data in lists, I need to use loops and iterators, which can be a pain when the problem area is in the latter half of the list.
So I find myself spending a lot of time with my finger on the F8 button in Eclipse's debugger, stepping through each iteration of a loop. This gets worse when I have to step through that particular section several times to get an idea of why the code is behaving a particular way.
If one has a general idea of how many times a loop will execute before the problem area is hit, is there a way to set a breakpoint that lets the loop run up to that point and then pauses?
Use conditional breakpoints.
http://wiki.eclipse.org/FAQ_How_do_I_set_a_conditional_breakpoint%3F
I believe there's a better way to do this, but you can create a trivial block of code in the loop that only executes at a certain iteration, and put the breakpoint inside of it.
if (loopIndex == 1000) {
    int number = 14; // Break here
}
Using this as an example:
for (int i = 0; i < 10000; i++) {
    System.out.println(i);
}
Set a breakpoint on the print line, then right-click on it and select Breakpoint Properties.... From here you can set a condition to trigger the breakpoint, similar to the condition you would write in an if statement. If you want to trigger the breakpoint when i equals 6000, check the Conditional box and enter: i == 6000
For a normal app, you'd never want to do this.
But ... I'm making an educational app to show people exactly what happens with the different threading models on different iPhone hardware and OS level. OS 4 has radically changed the different models (IME: lots of existing code DOES NOT WORK when run on OS 4).
I'm writing an interactive test app that lets you fire off threads for different models (selector main thread, selector background, nsoperationqueue, etc), and see what happens to the GUI + main app while it happens.
But one of the common use cases I want to reproduce is: "thread does a background download, then does a CPU-intensive parse of the results". We see this a lot in real-world apps.
It's not entirely trivial; the manner of "being busy" matters.
So ... how can I simulate this? I'm looking for something that is guaranteed not to be thrown-away by an optimizing compiler (either now, or with a better compiler), and is enough to force a thread to run at max CPU for about 5 seconds.
NB: in my real-world apps, I've noticed there are some strange things that happen when an iPhone thread gets busy - e.g. background threads will starve the main thread EVEN WHEN set at lower priority. Although this is clearly a bug in Apple's thread scheduler, I'd like to make a busy loop that demonstrates this - and/or an alternate one that shows what happens when you DON'T trigger that behaviour in the scheduler.
UPDATE:
For instance, the following can have different effects:
for( int i=0; i<1000; i++ )
    for( int k=0; k<1000; k++ )
        CC_MD5( cStr, strlen(cStr), result );

for( int i=0; i<1000000; i++ )
    CC_MD5( cStr, strlen(cStr), result );
...sometimes, at least, the compiler seems to optimize the latter away (and I have no idea what compiler voodoo is behind it - some builds showed no difference, some did :()
UPDATE 2:
25 threads, on a first gen iPhone, doing a million MD5's each ... and there's almost no perceptible effect on the GUI.
Whereas 5 threads parsing XML using the bundled SAX-based parser will usually grind the GUI to a halt.
It seems that MD5 hashing doesn't trigger the problems in the iPhone's buggy thread scheduler :(. I'm going to investigate memory allocations instead and see if that has a different effect.
You can avoid the compiler optimising things away by making sure the compiler can't easily infer what you're trying to do at compile time.
For example, this:
for( int i=0; i<1000000; i++ )
    CC_MD5( cStr, strlen(cStr), result );
has an invariant input, so the compiler could realise that it's going to get the same result every time. Or it could see that the result is never used, so the call doesn't need to happen at all.
You could avoid both these problems like so:
for( int i=0; i<1000000; i++ )
{
    CC_MD5( cStr, strlen(cStr), result );
    sprintf(cStr, "%02x%02x", result[0], result[1]);
}
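If you'd rather not pay for the string round-trip, another option (my suggestion, not part of the original answer) is to fold each digest into a volatile sink, so the compiler must treat every iteration's output as observed:

volatile unsigned char sink = 0;
for( int i = 0; i < 1000000; i++ )
{
    CC_MD5( cStr, strlen(cStr), result );
    sink ^= result[0]; // volatile write: the result is observably used each pass
}

Note this only defeats the "result is unused" elimination; the input is still invariant, so hoisting would remain possible if the compiler could prove CC_MD5 is pure.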
If you're seeing the problem with SAX, then I'd start by getting the threads in your simulation app doing SAX and checking that you see the same problems outside of your main app.
If the problem is not related to pure processor power or memory allocations, other areas you could look at would be disk I/O (depending where your xml input is coming from), mutexes or calling selectors/delegates.
Good luck, and do report back how you get on!
Apple actually provides sample code at developer.apple.com that does something similar to what you are looking for, with the intent of highlighting the performance differences between using LibXML (SAX) and CocoaXML. The focus is not on CPU performance, but assuming you can actually measure processor utilization, you could likely just scale up (repeat within your XML) the dataset that the sample downloads.