CUDA cuLaunchHostFunc equivalent in Python libs - callback

Using Python to drive CUDA, I want to schedule a Python host function asynchronously in a stream, so that it runs after a kernel and a memory copy have taken place.
Is there an equivalent to the CUDA C++ function CUresult cuLaunchHostFunc(CUstream hStream, CUhostFn fn, void* userData) in one of the Python libs (PyCUDA, Numba, ...)?
The driver API function is documented in the CUDA docs.

Is there an equivalent to the CUDA C++ function CUresult cuLaunchHostFunc(CUstream hStream, CUhostFn fn, void* userData) in one of the Python libs (PyCUDA, Numba, ...)?
Not in either of those two. None of the driver-API-based frameworks for CUDA that I am aware of (PyCUDA, Numba, JCUDA) exposes cuLaunchHostFunc.
I want to schedule a Python host function asynchronously in a stream, so that it runs after a kernel and a memory copy have taken place
Nothing in the native CUDA driver API could ever support that. TensorFlow and PyTorch both have elaborate execution pipelining and callback mechanisms at the Python level which might get you something functionally similar to what you envisage. But it won't be done at the CUDA level; it will be at a higher level of abstraction.
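If something functionally similar is enough, one common workaround is to record an event on the stream and have a host thread wait on it before calling the Python function. Below is a minimal sketch of that idea using PyCUDA's Event API; the launch_host_callback helper is my own invention, not a PyCUDA facility, and unlike cuLaunchHostFunc it does not block work queued on the stream afterwards.

import threading
import pycuda.autoinit  # creates and activates a context on import
import pycuda.driver as cuda

def launch_host_callback(stream, callback, *args):
    context = pycuda.autoinit.context
    event = cuda.Event()
    event.record(stream)  # fires once all work already queued on `stream` is done

    def waiter():
        context.push()           # make the context current in this thread too
        try:
            event.synchronize()  # blocks this host thread, not the GPU
            callback(*args)      # the kernel and copy have completed by now
        finally:
            context.pop()

    threading.Thread(target=waiter, daemon=True).start()

Usage, after queuing the kernel launch and the async copy on `stream`:

launch_host_callback(stream, lambda: print("GPU work finished"))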

Related

AArch64 - GNU ld - multiple linker scripts (for kernel and userland)

I have started a bare-metal application for AArch64. The bare-metal application should implement a simple kernel (for memory/device management and exception handling) and a userland which can make syscalls to output something over the UART via printf(), for example. Currently I'm working on the kernel at EL1. The intent is to put kernel and userland in a single ELF binary, because I haven't implemented a filesystem driver and ELF support yet.
The kernel should reside at address 0xC0000000 and the main application (userland) at 0x40000000, for example; I will change these addresses later. Is it possible to pass two linker scripts to GNU ld? I realize that I must use different sections for kernel and userland.
Or in another question:
Is my intent even possible? Okay, it's maybe a generic question, but I currently didn't find a similar question here.
The LD manual (https://man7.org/linux/man-pages/man1/ld.1.html) says:
Multiple -T options accumulate.
Just use it like this: -T script1.ld -T script2.ld
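To illustrate the layout from the question, here is a rough sketch of what the accumulated scripts could describe (the section names are invented, and a real script would also need an ENTRY point and the usual .data/.bss output sections):

SECTIONS
{
    /* userland at 0x40000000 */
    . = 0x40000000;
    .user.text : { *(.user.text*) }

    /* kernel at 0xC0000000 */
    . = 0xC0000000;
    .kernel.text : { *(.kernel.text*) }
}

The two -T scripts would each contribute their part of this, e.g.: ld -T user.ld -T kernel.ld -o image.elf kernel.o user.o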

Take kernel dump on-demand from user-space without kernel debugging (Windows)

What would be the simplest and most portable way (in the sense of only having to copy a few files to the target machine, as procdump does) to generate a kernel dump that has handle information?
procdump has the -mk option which generates a limited dump file pertaining to the specified process. It is reported in WinDbg as:
Mini Kernel Dump File: Only registers and stack trace are available
Most of the commands I try (!handle, !process 0 0) fail to read the data.
It seems that, officially, WinDbg and kd would generate such dumps (which would require kernel debugging).
A weird solution I found is using livekd with -ml: Generate live dump using native support (Windows 8.1 and above only). livekd still looks for kd.exe, but does not use it :) so I can trick it with an empty file, and it does not require kernel debugging. Any idea how that works?
LiveKD uses the undocumented NtSystemDebugControl API to capture the memory dump. While you can easily find information about that API online, the easiest thing to do is just use LiveKD.

Debugging Linux LKM: how to force probe()

When you insert an LKM with insmod, it does not seem to execute the defined probe() function. What do I need to do to trigger it?
Background: I am trying to create a driver for the MAX14830 for an old kernel (2.6.39). I cannot use the available one (max310x.c) because of the old kernel: no regmap support, etc. In the source tree of the old kernel there is a max3107 driver (the same thing, but for 1 serial port, while the 14830 has 4). Both drivers use probe functions for initialization, as the SoC communicates with the MAX chip over SPI. I want to develop the driver as an LKM first.
What could be my problem?

Virtualization CPU Emulation

I have a question about CPU virtualization in a virtual machine. I am not able to understand the difference between on-the-fly translation to native code and trap-and-emulate.
As far as I understand, in the first case, if I emulate binary code from a different platform, the code is converted to the equivalent x86 instructions if I have an x86 CPU. In the trap-and-emulate method, the virtual machine receives the ISA call from the guest OS and translates it to the equivalent ISA call for the host OS.
Why do we need to translate from ISA to ISA? Suppose I am running an Ubuntu guest on a Windows host. Is the Ubuntu ISA call different from the Windows ISA call? I understand that the guest is not able to access the system ISA on the host; only the monitor can do that. But why is there a need for conversion to the host ISA? Does the ISA also depend on the operating system?
"On-the-fly to native" translation (often called JIT compilation/translation) is used when running code from one ISA on another ISA, such as running M68K code on an x86 CPU.
It's in no way virtualization, but emulation.
Trap-and-emulate is a way to run "privileged" code in an unprivileged environment (example: running a kernel as an application).
The way it works is that you start executing the privileged code, and once it tries to execute a privileged instruction (lidt in x86 for example), the host OS will issue a trap. In the handler for that trap, you could emulate that specific privileged instruction, and then let the guest kernel continue executing.
The advantage of this is that you will reach close to native speeds for CPU emulation.
However, emulating the ISA is only a "small" part of emulating a complete system. Emulating/virtualizing the MMU is much more complex to get right, and to get running fast.
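To make the trap-and-emulate idea concrete, here is a toy Python sketch (purely illustrative: a real monitor relies on hardware traps and runs guest code natively, not through an interpreter, and the instruction set here is invented). Unprivileged instructions run straight through; a privileged one raises a "trap" that the monitor handles before the guest resumes:

# Toy model of trap-and-emulate.
class Trap(Exception):
    """Raised when the guest hits a privileged instruction."""

def execute_directly(insn, state):
    op, *args = insn
    if op == "add":                      # unprivileged: runs at "native" speed
        reg, val = args
        state[reg] = state.get(reg, 0) + val
    elif op == "lidt":                   # privileged: cannot run unprivileged
        raise Trap()

def emulate_privileged(insn, state):
    op, *args = insn
    if op == "lidt":                     # the monitor emulates the effect instead
        state["idt_base"] = args[0]

def run_guest(instructions, state):
    for insn in instructions:
        try:
            execute_directly(insn, state)
        except Trap:
            emulate_privileged(insn, state)  # then the guest simply resumes

state = {}
run_guest([("add", "r0", 1), ("lidt", 0x1000), ("add", "r0", 2)], state)
print(state)  # {'r0': 3, 'idt_base': 4096}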

Can NLTK be used in a Postgres Python Stored Procedure

Has anyone used NLTK within a Postgres Python stored procedure or trigger, or does anyone even know if it's possible?
You can use pretty much any Python library in a PL/Python stored procedure or trigger.
See the PL/Python documentation.
Concepts
The crucial point to understand is that PL/Python is CPython (in PostgreSQL up to and including 9.3, anyway); it uses exactly the same interpreter that the normal standalone Python does, it just loads it as a library into the PostgreSQL backend. With a few limitations (outlined below), if it works with CPython it works with PL/Python.
If you have multiple Python interpreters installed on your system - versions, distributions, 32-bit vs 64-bit etc - you might need to make sure you're installing extensions and libraries into the right one when running distutils scripts, etc, but that's about it.
Since you can load any library available to the system Python there's no reason to think NLTK would be a problem unless you know it requires things like threading that aren't really recommended in a PostgreSQL backend. (Sure enough, I tried it and it "just worked", see below).
One possible concern is that the startup overhead of something like NLTK might be quite big; you probably want to preload PL/Python in the postmaster and import the module in your setup code so it's ready when backends start. Understand that the postmaster is the parent process that all the other backends fork() from, so if the postmaster preloads something it's available to the backends with greatly reduced overheads. Test performance either way.
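A sketch of the preload side in postgresql.conf (an assumption on my part: the exact library name varies with your PostgreSQL and Python versions, e.g. plpython2 vs plpython3, and the module imports themselves still belong in your own setup code):

shared_preload_libraries = 'plpython3'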
Security
Because you can load arbitrary C libraries via PL/Python and because the Python interpreter has no real security model, plpythonu is an "untrusted" language. Scripts have full and unrestricted access to the system as the postgres user and can fairly simply bypass access controls in PostgreSQL. For obvious security reasons this means that PL/Python functions and triggers may only be created by the superuser, though it's quite reasonable to GRANT normal users the ability to run carefully written functions that were installed by the superuser.
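For example (a sketch using the nltk_word_tokenize function from the test below; the webapp role is invented, and the REVOKE is needed because functions are executable by PUBLIC by default):

regress=# REVOKE ALL ON FUNCTION nltk_word_tokenize(text) FROM PUBLIC;
regress=# GRANT EXECUTE ON FUNCTION nltk_word_tokenize(text) TO webapp;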
The upside is that you can do pretty much anything you can do in normal Python, keeping in mind that the Python interpreter's lifetime is that of the database connection (session). Threading isn't recommended, but most other things are fine.
PL/Python functions must be written with careful input sanitization, must set search_path when using the SPI to run queries, etc. This is discussed more in the manual.
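For instance, inside a PL/Python function body, a sketch of both points (the users table, username parameter, and public schema are invented for illustration):

# pin the schema before running anything via SPI
plpy.execute("SET search_path = public")
# use a prepared plan with parameters rather than interpolating strings
plan = plpy.prepare("SELECT id FROM users WHERE name = $1", ["text"])
rows = plpy.execute(plan, [username])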
Limitations
Long-running or potentially problematic things like DNS lookups, HTTP connections to remote systems, SMTP mail delivery, etc. should generally be done from a helper script using LISTEN and NOTIFY rather than in an in-backend job, in order to preserve PostgreSQL's performance and avoid hampering VACUUM with lots of long transactions (see the sketch after this list). You can do these things in the backend, it just isn't a great idea.
You should avoid creating threads within the PostgreSQL backend.
Don't attempt to load any Python library that'll load the libpq C library. This could cause all sorts of exciting problems with the backend. When talking to PostgreSQL from PL/Python use the SPI routines not a regular client library.
Don't do very long-running things in the backend, you'll cause vacuum problems.
Don't load anything that might load a different version of an already loaded native C library - say a different libcrypto, libssl, etc.
Don't write directly to files in the PostgreSQL data directory, ever.
PL/Python functions run as the postgres system user on the OS, so they don't have access to things like the user's home directory or files on the client side of the connection.
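As for the LISTEN/NOTIFY helper pattern mentioned above, here is a minimal sketch (assumptions: psycopg2 as the client library, an invented slow_jobs channel, and a made-up handle_job worker). The in-backend side just runs NOTIFY slow_jobs, 'payload' from a trigger; the external process does the slow work:

import select
import psycopg2
import psycopg2.extensions

def handle_job(payload):
    # hypothetical worker: do the slow HTTP/SMTP/DNS work out here,
    # outside the PostgreSQL backend
    print("got job:", payload)

conn = psycopg2.connect("dbname=regress user=postgres")
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute("LISTEN slow_jobs;")

while True:
    # block for up to 60 seconds waiting for a notification
    if select.select([conn], [], [], 60) != ([], [], []):
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            handle_job(notify.payload)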
Test result
$ yum install python-nltk
$ psql -U postgres regress
regress=# CREATE LANGUAGE plpythonu;
regress=# CREATE OR REPLACE FUNCTION nltk_word_tokenize(word text) RETURNS text[] AS $$
import nltk
return nltk.word_tokenize(word)
$$ LANGUAGE plpythonu;
regress=# SELECT nltk_word_tokenize('This is a test, it''s going to work fine');
nltk_word_tokenize
-----------------------------------------------
{This,is,a,test,",",it,'s,going,to,work,fine}
(1 row)
So, as I said: try it. So long as the Python interpreter PostgreSQL is using for plpython has nltk and its dependencies installed, it will work fine.
Note
PL/Python is CPython, but I'd love to see a PyPy-based alternative that can run untrusted code using PyPy's sandbox features.