Can I use OpenACC to make a system call to a Python function? - pycuda

I want to parallelize a Python loop on the GPU, but I don't want to use pyCUDA, because I would need to do a lot of things myself. I am looking for something like OpenACC, as in C++, for Python to implement simple parallelization, but there seems to be no such thing. So I am thinking of just using OpenACC in C++ and then making a system call to a Python script, as in the code below. Will this work? Or is there a simple alternative that avoids pyCUDA?
void foo(float *parameters) {
    // system call to a Python function, with parameters as input
}

#pragma acc parallel loop
for (int i = 0; i < n; ++i) {
    foo(parameters[i]);   // called on the device
}

No, this won't work. You can't execute a host system call from device code.
In OpenACC device code, you can only call routines that carry the OpenACC "routine" directive, or CUDA "device" routines.
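For illustration, a minimal sketch of what OpenACC does allow: a routine compiled for the device with the "routine" directive (the names and the trivial math in foo are just placeholders, not your actual code):

#include <cstdio>

// Compiled for the device via the "routine" directive: it may only contain
// device-callable code, so no system(), no fork/exec, no Python interpreter.
#pragma acc routine seq
float foo(float x)
{
    return 2.0f * x;   // placeholder computation
}

int main()
{
    const int n = 1000;
    float parameters[n];
    for (int i = 0; i < n; ++i) parameters[i] = static_cast<float>(i);

    #pragma acc parallel loop copy(parameters[0:n])
    for (int i = 0; i < n; ++i) {
        parameters[i] = foo(parameters[i]);   // runs on the device
    }

    std::printf("parameters[10] = %f\n", parameters[10]);
    return 0;
}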

Related

wrap function without dlsym

How to write a shared library that:
wraps a system function (say malloc),
internally uses the real version of wrapped functions (e.g., malloc defined in libc), AND
can be linked from client code without giving --wrap=malloc every time it is used?
I learned from several posts that I can wrap system functions with the --wrap option of ld; something like this:
#include <stddef.h>

void *__real_malloc(size_t sz);   /* resolved by the linker when --wrap=malloc is given */

void *__wrap_malloc(size_t sz) {
    return __real_malloc(sz);
}
and get a shared library with:
gcc -O0 -g -Wl,--wrap=malloc -shared -fPIC m.c -o libwrapmalloc.so
But when client code links against this library, it needs to pass --wrap=malloc every time. I want to hide this from the client code, as the library I am working on actually wraps tons of system functions.
An approach I was using was to define malloc myself and find the real malloc in libc using dlopen and dlsym. This was nearly what I needed, but as someone pointed out in "Function interposition in Linux without dlsym", dlsym and dlopen internally call memory-allocation functions (calloc, as I witnessed), so we cannot easily override calloc/malloc with this approach.
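For reference, the dlsym-based version looked roughly like the sketch below (built as a shared library, e.g. g++ -shared -fPIC ... -ldl; this is a sketch only, with no error handling or bounds checks, and the static bootstrap buffer is one common workaround for the allocations dlsym itself can trigger once calloc is wrapped as well):

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <cstddef>

static void *(*real_malloc)(std::size_t) = nullptr;
static char bootstrap[4096];          // hands out memory while dlsym is still resolving
static std::size_t bootstrap_used = 0;

extern "C" void *malloc(std::size_t sz)
{
    static bool resolving = false;
    if (real_malloc == nullptr) {
        if (resolving) {              // re-entered from an allocation inside dlsym()
            void *p = bootstrap + bootstrap_used;
            bootstrap_used += (sz + 15) & ~static_cast<std::size_t>(15);
            return p;                 // no overflow check: sketch only
        }
        resolving = true;
        real_malloc = reinterpret_cast<void *(*)(std::size_t)>(
            dlsym(RTLD_NEXT, "malloc"));   // the next (libc) definition
        resolving = false;
    }
    return real_malloc(sz);           // delegate to the real malloc
}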
I recently learned about --wrap and thought it was neat, but I just do not want to ask clients to pass tons of --wrap=xxxx arguments every time they build their executables...
I want to have a situation in which malloc in the client code calls malloc defined in my shared library whereas malloc in my shared library calls malloc in libc.
If this is impossible, I would at least like to reduce the burden on clients of having to give lots of --wrap=... arguments correctly.

Using all CPUs on CentOS to run a C++ program

Hi, and thanks for the help. I have been running a program that has many functions with for loops that iterate over 10000 times. I have been using "#pragma omp_set_num_threads();" to use all the CPUs of my CentOS machine. This seems to work fine for all the functions I have in my program except one. The function it doesn't work on is something like this:
void move_check() // function to check if each "molecule obj" is in the space under consideration
{
    for (int i = 0; i < NM; ++i) // NM - number of "molecules"
    {
        int bounds;
        bounds = molecule_obj.at(i).check(dom_x, dom_y, dom_z);
        // returns the status of the molecule;
        // molecule is a class that I have created and molecule_obj is an object of that class
        if (bounds == 1)
        {
            molecule_obj.erase(molecule_obj.begin() + i);
            i -= 1;
            NM -= 1;
        }
    }
}
Can I use a pragma for this? If not, what other alternative do I have?
As the above function is the one that seems to be consuming the most time, I would like to utilize all the CPUs to execute it. How do I go about doing that?
Thanks a lot.
Yes, in principle you can use an OpenMP pragma, just add
#pragma omp parallel for
before the for loop.
You have to make sure that it is safe to use the methods of your molecule_obj in parallel. I cannot know whether they are, but I assume the following:
molecule_obj.at(i).check(dom_x,dom_y,dom_z); really works on one specific molecule and does not depend on any others. If so, that part is fine.
But the erase call most likely is not safe, depending on how you store the entries. If you use something like std::list or std::vector, erasing an element during the loop leads to invalidated iterators, wrong trip counts, etc.
I suggest that you replace the removal of the entry inside the loop by just marking it for removal; you can add a special flag to each molecule for that.
Then, after the parallel loop completes, go over the list once serially and actually remove the marked entries.
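A rough sketch of that two-phase idea, assuming molecule_obj is a std::vector<Molecule>, that check() is safe to call concurrently on different molecules, and reusing the names from your snippet (everything else is guesswork):

#include <cstddef>
#include <vector>

void move_check()
{
    // Phase 1: check every molecule in parallel, but only mark it.
    std::vector<char> out_of_bounds(molecule_obj.size(), 0);

    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(molecule_obj.size()); ++i) {
        if (molecule_obj.at(i).check(dom_x, dom_y, dom_z) == 1)
            out_of_bounds[i] = 1;          // no erase inside the parallel loop
    }

    // Phase 2: one serial sweep that keeps everything still in bounds.
    std::vector<Molecule> kept;
    kept.reserve(molecule_obj.size());
    for (std::size_t i = 0; i < molecule_obj.size(); ++i)
        if (!out_of_bounds[i])
            kept.push_back(molecule_obj[i]);

    molecule_obj.swap(kept);
    NM = static_cast<int>(molecule_obj.size());
}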
Just another suggestion: make a clearer distinction between one molecule and the list of molecules. It would improve the understandability of your code.

Implementing SPI library in Arduino (how do classes work?)

I am currently trying to teach myself Arduino/C programming/assembly. I am working on a project which requires a lot of data collection, and through research I discovered a chip called the "23K256" from Microchip (see here: http://www.microchip.com/wwwproducts/Devices.aspx?dDocName=en539039). I also discovered that an Arduino library taking advantage of this chip exists (see here: http://playground.arduino.cc/Main/SpiRAM). I downloaded the "spiRAM3a.zip" file, which I believe is the most up-to-date one. Note that I have only recently downloaded the Arduino software and thus have the latest version installed (I believe it's 1.0.6). Also note that I'm using an Arduino Uno, although I will eventually need to use an Arduino Mega (I just want this working on ANYTHING at this point). Included with the library is some code (the file is named "SpiRAM_Example") that exemplifies its use to read from and write to the 23K256, effectively increasing the SRAM available to the Arduino. Here is the actual, exact code:
#include <SPI.h>
#include <SpiRAM.h>

#define SS_PIN 10

byte clock = 0;
SpiRAM SpiRam(0, SS_PIN);

void setup() {
  Serial.begin(9600);
}

void loop()
{
  char data_to_chip[17] = "Testing 90123456";
  char data_from_chip[17] = " ";
  int i = 0;

  // Write some data to RAM
  SpiRam.write_stream(0, data_to_chip, 16);
  delay(100);

  // Read it back to a different buffer
  SpiRam.read_stream(0, data_from_chip, 16);

  // Write it to the serial port
  for (i = 0; i < 16; i++) {
    Serial.print(data_from_chip[i]);
  }
  Serial.print("\n");

  delay(1000); // wait for a second
}
My problem is that when I compile the code, to test my configuration and try to learn its use, I surprisingly get an error. This is what I get:
SpiRAM_Example:7: error: 'SpiRAM' does not name a type
SpiRAM_Example.ino: In function 'void loop()':
SpiRAM_Example:20: 'SpiRAM' was not declared in this scope
So it's basically telling me that there's something wrong with the SpiRAM SpiRam(0, SS_PIN); line of code. My question is, why? Am I misunderstanding something very fundamental about how classes work? I feel like I must be missing something, because I highly doubt an incorrect piece of code would be published on Arduino's website. How can I get this code to compile, or at least be able to simply use this library? Should I post the code for the library itself ("SpiRAM.h"), which was included in the package I downloaded?
I would really appreciate any help I can get, and I sincerely apologize if this is a really dumb question. I think this is the first time I've worked with classes.
Did you download Attach:spiRAM3a.zip or the original? I installed it along with your code. It compiles on IDE 1.0.5.

Pause/Resume embedded python interpreter

Is there any possibility to pause/resume the work of embedded python interpreter in place, where I need? For example:
C++ pseudo-code part:
main()
{
    script = "python_script.py";
    ...
    RunScript(script);          // -- python script runs till the command 'stop'
    while (true)
    {
        // ... read values from some variables in the python script
        // ... do some work ...
        // ... write new values to some other variables in the python script
        ResumeScript(script);   // -- python script resumes its work where it
                                //    was stopped, not from the beginning!
    }
    ...
}
Python script pseudo-code part:
# ... do some init-work
while True:
    # ... do some work
    stop    # - here the script stops and the C++ function RunScript()
            #   returns control to the C++ part
    # ... after calling the C++ function ResumeScript()
    #     the work continues from this line
Is this possible to do with Python/C API?
Thanks
I too have recently been searching for a way to manually "drive" an embedded language and I came across this question and figured I'd share a potential workaround.
I would implement the "blocking" behavior either through a socket or some kind of messaging system. Instead of actually stopping the whole Python interpreter, just have it block while it is waiting for C++ to do its evaluations.
C++ will start the embedded runtime, then enter a loop of some sort that waits for Python to "throw the signal" that it's ready. For instance: C++ listens on port 5000 and starts Python; Python does its work and connects to port 5000 on localhost; C++ sees the connection, grabs the data from Python, performs its own work on it, then shuffles the result back over the socket to Python; Python receives the data and leaves its blocking loop.
I still need a way to fully pause the virtual runtime, but in your case you could achieve the same thing with a socket and some blocking behavior that uses the socket to coordinate the two pieces of code.
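A rough sketch of the C++ side of that handshake (POSIX sockets, port 5000 as above; error handling and the code that actually starts the embedded interpreter are left out, and all names here are made up for illustration):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <iostream>

int main()
{
    // Listen on 127.0.0.1:5000; the Python script connects here when it wants to "pause".
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(5000);
    bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
    listen(listener, 1);

    // ... start the embedded interpreter / python_script.py in another thread here ...

    while (true) {
        int conn = accept(listener, nullptr, nullptr);   // Python is now blocked, waiting

        char buf[256] = {0};
        read(conn, buf, sizeof buf - 1);                  // values sent by the script
        std::cout << "from python: " << buf << "\n";

        // ... do some work, compute new values for the script ...

        const char* reply = "new values";
        write(conn, reply, std::strlen(reply));           // script reads this and resumes
        close(conn);
    }
}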
Good luck :)
EDIT: You may be able to use the "injection" functionality from the answer linked below to completely stop Python. Just modify it to inject a wait loop, perhaps:
Stopping embedded Python

Combining GStreamer, AnyEvent and EV (perl)

I'm trying to use GStreamer within an existing perl application that uses AnyEvent with the EV event loop. It is not a Glib application. I have loaded EV::Glib to get the Glib main loop to use EV. I freely admit that, with respect to Glib, I am pretty ignorant. I think that I have all the bits that I need but I am struggling (failing) to get them to work together.
If I use a standalone perl program to build a GStreamer pipeline and then put it into PLAYING state then it all simply works. I do not need to do anything with a Glib mainloop or otherwise tickle the GStreamer bus.
When I build the same pipeline in my existing application, within the context of an AnyEvent event handler, it fails to run the pipeline. I have played around with various ways of trying to exercise it, including calling $pipeline->get_bus->poll(). If I call ...->poll() repeatedly in the original event handler (that is, the handler does not return) then it works, but this is clearly not a valid solution. Calling ...->poll() in an AnyEvent timer callback does not run the pipeline.
My best guess at the moment is that EV::Glib enables some level of integration but does not actually run the necessary bits of the main loop. What am I missing?
I came here with a similar question regarding EV::Glib usage, but ended up having no issues using it.. So maybe I'm missing what you're trying to do here.
Here is the simple script I knocked up to test how EV::Glib works:
use EV::Glib;
use Gtk2 '-init';

my $t = EV::timer 1, 1, sub { print "I am here!\n" };
Glib::Timeout->add(1000, sub { print "I am also here!\n" });

my $window = Gtk2::Window->new('toplevel');
$window->signal_connect(delete_event => sub { EV::unloop });

my $button = Gtk2::Button->new('Action');
$button->signal_connect(clicked => sub {
    print("Hello Gtk2-EV-Perl\n");
});

$window->add($button);
$window->show_all;

EV::loop;
With this the signal handler on the button will work, and so too will both the timer events. So the EV loop will correctly drive the entire thing.
The main issue I can see is in the documentation: "This [module] makes Glib compatible to EV. Calls into the Glib main loop are more or less equivalent to calls to EV::loop (but not vice versa, you have to use the Glib mainloop functions)."
What this means is that if you're hooking up an EV::loop event, it won't equate to a Glib::MainLoop event and so might not 'tickle' (or 'be tickled by') your GStreamer pipeline. Maybe that is the issue you're experiencing, especially if you're using AnyEvent and its generic callbacks, which are likely translating to EV::loop calls instead of Glib::MainLoop calls.
This is all just a guess though -- I've never used GStreamer myself, and I certainly don't know what you're trying to achieve without seeing more code. But I think my halfassed conclusion is pretty sound advice regardless: If you're using something specific to Glib, you should probably be hooking up events to it using Glib.
EV::Glib embeds Glib into EV - you (and everybody else) have to use EV to make it work. Chances are that GStreamer doesn't know about this and disrespectfully calls Glib mainloop functions internally, which doesn't work.
Fortunately there is another module that does just the opposite: Glib::EV. That module makes Glib use EV internally. When using it, you can/should use the glib mainloop functions (you can use EV watchers, but you cannot call glib functions from the EV callbacks, as glib doesn't support that).
It might be better suited to your application: programs that use Glib will "just work", since the EV usage is completely internal.
Another possible issue is that perl modules are dynamically linked, and only "accidentally" get the same library. For all of this to work, you need to ensure that all perl modules link against the same shared glib library.