Issue while using VLfeat's vl_kdforest_query_with_array - multicore

I am trying to use VLFeat's vl_kdforest_query and vl_kdforest_query_with_array. The documentation is sparse: all it says is that vl_kdforest_query_with_array "can" make use of multiple cores. Does anyone know whether this function already uses multiple cores, or whether there is a way to make it do so? Thanks!
(I know one option is to go through the code, but just wanted to know if anyone already has this information.)

This function uses parallelization via OpenMP as you can see here:
#ifdef _OPENMP
#pragma omp for
#endif
for(qi = 0 ; qi < (signed)numQueries; ++ qi) {
/* ... */
}
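To illustrate the pattern in that snippet, here is a minimal sketch of an OpenMP loop guarded by `_OPENMP`. It is not VLFeat's code: the function name `sum_of_squares` is hypothetical. Note that a bare `#pragma omp for` only splits work among threads inside an enclosing `parallel` region, so the sketch uses the combined `parallel for` form. The pragma only takes effect when the code is compiled with OpenMP enabled (for example `-fopenmp` with GCC); otherwise the loop simply runs serially.

```cpp
#include <vector>

// Hypothetical example, not taken from VLFeat.
// Sums the squares of all values; iterations are shared among threads
// when OpenMP is enabled, and run serially otherwise.
double sum_of_squares(const std::vector<double>& values) {
    double total = 0.0;
    const long n = static_cast<long>(values.size());
#ifdef _OPENMP
#pragma omp parallel for reduction(+:total)
#endif
    for (long i = 0; i < n; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

Each iteration here is independent of the others, which is also what makes a loop over separate queries a natural candidate for this kind of parallelization.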

Related

Can I use OpenACC to system call Python function?

I want to parallelize a Python loop on the GPU, but I don't want to use pyCUDA, because I would need to do a lot of things myself. I am looking for something like OpenACC in C++, but for Python, to implement simple parallelization, and there seems to be no such thing. So I am thinking of just using OpenACC in C++ and then making a system call to a Python script, as in the code below. Will this work? Or is there a simple alternative without using pyCUDA?
void foo(float*parameters){
// system call to a Python function with parameters as input
}
#pragma acc parallel loop
for ( int i=0; i<n; ++i) {
foo(parameters[i]);
//call on the device
}
No, this won't work. You can't execute a host system call from the device.
For OpenACC device code, you can only call routines having the OpenACC "routine" directive, or a CUDA "device" routine.
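As a rough sketch of what is allowed instead, the per-element work can be written as a native function marked with the OpenACC `routine` directive. The body of `foo` and the wrapper `apply_foo` below are hypothetical stand-ins for whatever the Python function would have computed. Without an OpenACC compiler the pragmas are ignored and the loop runs on the host.

```cpp
#include <vector>

// Hypothetical per-element computation, standing in for the Python function.
// "routine seq" tells an OpenACC compiler to build a device version of foo.
#pragma acc routine seq
float foo(float p) {
    return 2.0f * p + 1.0f;
}

// Applies foo to every element; the loop is offloaded when OpenACC is enabled.
std::vector<float> apply_foo(const std::vector<float>& parameters) {
    std::vector<float> out(parameters.size());
    const float* in = parameters.data();
    float* res = out.data();
    const int n = static_cast<int>(parameters.size());
    #pragma acc parallel loop copyin(in[0:n]) copyout(res[0:n])
    for (int i = 0; i < n; ++i) {
        res[i] = foo(in[i]);  // device-callable because of the routine directive
    }
    return out;
}
```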

Using all CPUs on CentOS to run a C++ program

Hi, and thanks for the help. I have been running a program that has many functions with for loops that iterate over 10000 times. I have been using "#pragma omp_set_num_threads();" to use all the CPUs of my CentOS machine. This seems to work fine for all the functions I have in my program except one. The function it doesn't work on is something like this:
void move_check()//function to check if "molecule obj" is in space under consid
{
for(int i = 0 ; i < NM ; ++i)//NM-no of "molecules"
{
int bounds;
bounds = molecule_obj.at(i).check(dom_x,dom_y,dom_z);
////returns status of molecule
////molecule is a class that i have created and molecule_obj is an obj of that class.
if(bounds==1)
{
molecule_obj.erase(molecule_obj.begin()+i);
i -= 1;
NM -= 1;
}
}
}
Can I use a pragma for this? If not, what other alternative do I have?
As the above function is the one that seems to consume the most time, I would like to utilize all the CPUs to execute it. How do I go about doing that?
Thanks a lot.
Yes, in principle you can use an OpenMP pragma, just add
#pragma omp parallel for
before the for loop.
You have to make sure that it is safe to use the methods of your molecule_obj in parallel. I cannot know whether they are, but I assume the following:
molecule_obj.at(i).check(dom_x,dom_y,dom_z); really works with one specific molecule and does not depend on any others. That is fine then.
But the erase call most likely is not safe, depending on how you store the entries. If you use something like std::list or std::vector, erasing an element during the loop would lead to invalid iterators, wrong trip counts, etc.
I suggest that you replace the removal of the entry in the loop by just marking it for removal. You can add a special flag to each molecule for that.
Then, after the parallel loop has completed, you can go over the list once in serial and actually remove the marked entries.
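The mark-then-remove approach could look roughly like the sketch below. The `Molecule` struct, its `out_of_bounds` flag, and the box test inside `check` are assumptions made for illustration; the real class from the question is not shown. The parallel pass only writes to the flag of its own element, and the serial pass uses the standard erase-remove idiom.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the asker's molecule class.
struct Molecule {
    double x, y, z;
    bool out_of_bounds;  // removal flag, set during the parallel pass

    Molecule(double x_, double y_, double z_)
        : x(x_), y(y_), z(z_), out_of_bounds(false) {}

    // Assumed semantics: returns 1 if the molecule lies outside the box
    // [0,dx] x [0,dy] x [0,dz], and 0 otherwise.
    int check(double dx, double dy, double dz) const {
        return (x < 0 || x > dx || y < 0 || y > dy || z < 0 || z > dz) ? 1 : 0;
    }
};

void move_check(std::vector<Molecule>& molecules,
                double dom_x, double dom_y, double dom_z) {
    const long n = static_cast<long>(molecules.size());

    // Pass 1 (parallel): each iteration touches only its own element.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        molecules[i].out_of_bounds =
            (molecules[i].check(dom_x, dom_y, dom_z) == 1);
    }

    // Pass 2 (serial): remove all marked molecules in one sweep.
    molecules.erase(
        std::remove_if(molecules.begin(), molecules.end(),
                       [](const Molecule& m) { return m.out_of_bounds; }),
        molecules.end());
}
```

A side benefit of this layout is that the serial sweep is a single linear pass, whereas erasing from the middle of a std::vector inside the loop shifts the remaining elements on every removal.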
Just another suggestion: make a clearer distinction between one molecule and the list of molecules. It would improve the understandability of your code.

Implementing SPI library in Arduino (how do classes work?)

I am currently trying to self learn Arduino/C programming/Assembly. I am working on a project which requires a lot of data collection, and by research I discovered a chip called the "23K256" from Microchip (see here: http://www.microchip.com/wwwproducts/Devices.aspx?dDocName=en539039). Moreover, I have also discovered that an Arduino library taking advantage of this chip exists (see here: http://playground.arduino.cc/Main/SpiRAM). I downloaded the "spiRAM3a.zip" file, which I believe is the one most up-to-date. Note that I have only recently downloaded the Arduino software and thus have the latest version installed (I believe it's 1.0.6). Also note that I'm using Arduino Uno, although I will eventually need to use Arduino Mega (I just want this working on ANYTHING at this point). With this library is some code that exemplifies its use to read and write to the 23K256 (the file name is "SpiRAM_Example" included in the package I downloaded), effectively increasing the SRAM on Arduino available. Here is the actual, exact code:
#include <SPI.h>
#include <SpiRAM.h>
#define SS_PIN 10
byte clock = 0;
SpiRAM SpiRam(0, SS_PIN);
void setup() {
Serial.begin(9600);
}
void loop()
{
char data_to_chip[17] = "Testing 90123456";
char data_from_chip[17] = " ";
int i = 0;
// Write some data to RAM
SpiRam.write_stream(0, data_to_chip, 16);
delay(100);
// Read it back to a different buffer
SpiRam.read_stream(0, data_from_chip, 16);
// Write it to the serial port
for (i = 0; i < 16; i++) {
Serial.print(data_from_chip[i]);
}
Serial.print("\n");
delay(1000); // wait for a second
}
My problem is that when I compile the code, to test my configuration and try to learn its use, I surprisingly get an error. This is what I get:
SpiRAM_Example:7: error: 'SpiRAM' does not name a type
SpiRAM_Example.ino: In function 'void loop()':
SpiRAM_Example:20: 'SpiRAM' was not declared in this scope
So it's basically telling me that there's something wrong with the SpiRAM SpiRam(0, SS_PIN); line of code. My question is, why? Am I misunderstanding something very fundamental about how classes work? I feel like I must be missing a step, because I highly doubt an incorrect piece of code would be published on Arduino's website. How can I get this code to compile, or at least be able to simply use this library? Should I post the code for the library itself ("SpiRAM.h"), which was included in the package I downloaded?
I would really appreciate any help I can get, and sincerely apologize if this is a really dumb question. I think this is the first time I've worked with classes.
Did you download Attach:spiRAM3a.zip or the original? I installed this library and your code. It compiles on the IDE 1.05.

google::dense_hash_map vs std::tr1::unordered_map?

I'm working on a mobile game for several platforms (Android, iOS, and maybe even some kind of console in the future).
I'm trying to decide whether to use tr1::unordered_map or google::dense_hash_map to retrieve Textures from a Resource Manager (for later binding using OpenGL). Usually this can happen quite a few times per second (N per frame, where my Game is running at ~60 fps)
Considerations are:
Performance (memory and cpu wise)
Portability
Any ideas or suggestions are welcome.
http://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/
http://attractivechaos.wordpress.com/2008/08/28/comparison-of-hash-table-libraries/
Go with the STL for standard containers. They have predictable behavior and can be used seamlessly with STL algorithms and iterators. You're also given some performance guarantees by the STL.
This should also guarantee portability. Most compilers have the new standard implemented.
In a C++ project I developed, I was wondering something similar: which one was best, tr1::unordered_map, boost::unordered_map or std::map? I ended up declaring a typedef, controllable at compilation:
#ifdef UnorderedMapBoost
typedef boost::unordered_map<cell_key, Cell> cell_map;
#else
#ifdef UnorderedMapTR1
typedef std::tr1::unordered_map<cell_key, Cell> cell_map;
#else
typedef std::map<cell_key, Cell> cell_map;
#endif // #ifdef UnorderedMapTR1
#endif // #ifdef UnorderedMapBoost
I could then control at compile-time which one to use, and profiled it. In my case, the portability ended up being more important, so I normally use std::map.
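For the profiling step, a small harness along the following lines can be reused with each candidate map type. This is only a sketch under assumptions: `count_hits`, the "texture_N" key scheme, and the int payload are hypothetical, and it uses std::unordered_map (the C++11 standardization of the TR1 container) rather than the tr1 or Google headers. The measured time includes building the key strings, so it is only meaningful for comparing containers against each other.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical helper: fills a map with n string keys, looks each one up,
// and returns the number of successful lookups. The time spent in the
// lookup loop (in milliseconds) is written to elapsed_ms.
template <typename Map>
std::size_t count_hits(std::size_t n, double& elapsed_ms) {
    Map textures;
    for (std::size_t i = 0; i < n; ++i) {
        textures["texture_" + std::to_string(i)] = static_cast<int>(i);
    }

    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (textures.find("texture_" + std::to_string(i)) != textures.end()) {
            ++hits;
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    elapsed_ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    return hits;
}
```

Calling count_hits<std::map<std::string, int> > and count_hits<std::unordered_map<std::string, int> > with the same n gives two timings to compare on the target platform.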

How to increment a value using a C-Preprocessor in Objective-C? [duplicate]

Possible Duplicate:
Can you make an incrementing compiler constant?
Example: I try to do this:
static NSInteger stepNum = 1;
#define METHODNAME(i) -(void)step##i
#define STEP METHODNAME(stepNum++)
@implementation Test
STEP {
// do stuff...
[self nextFrame:@selector(step2) afterDelay:1];
}
STEP {
// do stuff...
[self nextFrame:@selector(step3) afterDelay:1];
}
STEP {
// do stuff...
[self nextFrame:@selector(step4) afterDelay:1];
}
}
// ...
When building, Xcode complains that it can't increment stepNum. This seems logical to me, because at this point the code is not "alive": the preprocessor's substitution happens before the source code is actually compiled. Is there another, easy way to have a variable incremented on every use of the STEP macro?
It seems to me that the fundamental problem is having these numbered variables, which are really just a poor man's array. An array would be the idiomatic way to do this in Objective-C.
Not a chance that will work. For a start, stepNum isn't a preprocessor macro so the preprocessor just thinks of it as a load of characters. Explicitly naming the steps would definitely be a good thing. Your macro doesn't save much typing and obfuscates the code, even if you could get it to work.
Anyway, this is the wrong way to do what you want. You actually seem to be reinventing program control flow.
ETA: It occurs to me that my answer to your other question might help here. You manually number all of your methods, but then put all their selectors in an array. You then iterate through the array and the order is determined by the order you put them in the array.