Dynamic Array in Metal(API)? - swift

I am writing an object recognition program. I am using Metal Api.The problem is that i need array list or dynamic array, but there is no dynamic array in Metal. Is there a way to declare one or to implement your own?

There is no way to do dynamic memory allocation inside the Metal kernels (shaders). I'd just define more buffers on the CPU side and pass it to the shader (instead of creating dynamic arrays inside the shaders). Just make sure to change the 'storage mode' to 'private' for the buffers you want to use just in the shader for the intermediate calculations. The 'private' mode mean the buffer is just on the GPU and CPU does not have access to it (and can reduce overhead).

Related

ebpf-tc: how to keep unique information inside a ebpf instance when same program is attached to multiple interface

When we pin a MAP, we can able to share information from userspace to ebpf but it is system wide.
But if i want to share different value to different instance from tc ingress/egress (array map with size of 1)
Is there any way to pass argument ?
Map (unpinned unique per instance) - update from userspace
Any other way to communicate from userspace to kernel (while attaching or after)
Really appreciate your help.
Pinning a map doesn't make it system wide. Every map is always accessible system wide, pinning just adds a reference to the file system to make it easier to find and to make sure a map isn't removed even when not in use by any program.
Is there any way to pass argument ?
No, once a program is loaded, only the kernel can pass arguments(contexts) to a program, userspace can only use maps to communicate with eBPF programs.
Map (unpinned unique per instance) - update from userspace
Any userspace program with the right permissions can update any map as long as you can obtain a file descriptor to the map. Map FDs can be obtained:
By creating a new map
By opening a map pin
By using its unique ID (directly or by looping over all loaded maps)
By IPC from another process that already has the FD
Any other way to communicate from userspace to kernel (while attaching or after)
Maps are it. You can rewrite the program before loading it into the kernel by setting constants at specific locations, but not at attach time. One way which might be interesting is that on newer kernels, global data is supported. Which allow you to change the value of variables defined in the global scope from userspace. In this case the global data is packed in a array map with a single key/value.

How ti get the size of eBPF map?

I'm new to eBPF, I want to insert elements to a BPF_ARRAY, so is there any way to do like C++ push_back() size() function?
Yes and no. You can change elements in an array map (through the system call bpf(BPF_MAP_UPDATE_ELEM, ...)), but you cannot dynamically resize your map at this time (see also this answer), although this might be possible in the future as some work in that direction has been presented before.
So for now you have a fixed-size array that you can update either from your BPF program, or from user space. For the user space side, in C++, you probably want to use libbpf to update your map. You have a low-level wrapper around the syscall in bpf.h, bpf_map_update_elem(), as well as a higher-level function that works on map objects managed by the library, in libbpf.h: bpf_map__update_elem().
To answer to the question from your title: Yes you can obtain the map size, again through the bpf() system call. Libbpf provides a wrapper: bpf_obj_get_info_by_fd(), which fills a struct bpf_map_info from which you can retrieve the size of the map.

Behaviour of passing struct as a parameter to a CUDA kernel

I'm relatively new to CUDA programming, so I want to clarify the behaviour of a struct when I pass it into a kernel. I've defined the following struct to somewhat imitate the behavior of a 3D array that knows its own size:
struct protoarray {
size_t dim1;
size_t dim2;
size_t dim3;
float* data;
};
I create two variables of type protoarray, dynamically allocate space to data via malloc and cudaMalloc on the host and device side, and update dim1, dim2 and dim3 to reflect the size of array I want this struct to represent. I read in this thread that the struct should be passed via copy. So this is what I do in my kernel
__global__ void kernel(curandState_t *state, protoarray arr_device){
const size_t dim1 = arr_device.dim1;
const size_t dim2 = arr_device.dim2;
for(size_t j(0); j < dim2; j++){
for(size_t i(0); i < dim1; i++){
// Do something
}
}
}
The struct is passed by copy, so all its contents are copied into shared memory of each block. This is where I'm getting bizarre behaviour, which I'm hoping you could help me with. Suppose I had set arr_device.dim1 = 2 on the host side. While debugging inside the kernel and setting a breakpoint at one of the for loops, checking the value of arr_device.dim1 yields something like 16776576, nowhere large enough to cause overflow, but this value copies correctly into dim1 as 2, which means that the for loops execute as I intended them to. As a side question, is using size_t which is essential unsigned long long int bad practice, seeing as the GPU's are made of 32bit cores?
Generally, how safe is it to pass struct and class into kernels as arguments, is bad practice that should be avoided at all cost? I imagine that passing pointers to classes to kernels is difficult in case they contain members which point to dynamically allocated memory, and that they should be very lightweight if I want to pass them by value.
This is a partial answer, since without a proper program to look into, it is difficult/impossible to guess why you would see an invalid value in your arr_device.dim1.
The struct is passed by copy, so all its contents are copied into shared memory of each block.
Incorrect. Kernel arguments are stored in constant memory, which is device-global and not block-specific. They are not stored shared memory (which is block-specific).
When a thread runs, it typically reads arguments from constant memory into registers (and again, not shared memory).
Generally, how safe is it to pass struct and class into kernels as arguments
My personal rule of thumb on this matter is: If the struct/class...
is trivially-copyable; and
all its members of the struct/class are defined both for the host and the device side, or at least - designed with GPU use in mind;
then it should be safe to pass to a kernel.
passing struct and class into kernels as arguments [ - ] is [it] bad practice that should be avoided at all cost?
No. But remember that most C++ libraries only provide host-side code; and were not written with a mind of being used on a GPU. So I'd be wary of using non-trivial classes without a lot of scrutiny.
I imagine that passing pointers to classes to kernels is difficult in case they contain members which point to dynamically allocated memory
Yes, this can be problematic. However - if you used cuda::memory::managed::allocate(), cuda::memory::managed::make_unique() or cudaMallocManaged() - then this should "just work", i.e. the relevant memory pages will be fetched to the GPU or the CPU as necessary when accessed. See:
Unified Memory in CUDA for beginners
Beyond GPU Memory Limits with Unified Memory on Pascal
and that they should be very lightweight if I want to pass [objects to kernels] by value.
Yes, because each and every thread has to read each argument from constant memory before it can use that argument. And while constant memory allows this to happen relatively quickly, it's still a bunch of overhead that you want to minimize.
Also remember that you can't pass anything to kernels by (C++) reference; it's all "by-value" - the object itself or a pointer to it.

Usage of VAR RETAIN PERSISTENT

I'm using WAGO PLC PFC200 in my home automation project. I've plenty of POUs, each for one room. Each room implements IRoom interface and uses base POU for common logic like turning off all lights.
For lights management, I'm using
FbEvaluateShortLongPress from WagoAppBuilding to handle short and long press of buttons on the wall (it could also be a function block from OSCAT library)
FbLatchingRelay from WagoAppBuilding as a toggle for PLC digital output
I want to save the state of FbLatchingRelay in case of e.g.: power drop. I want all lights which were turned off before power drop to be turned on when the power comes back.
I've solved it by declaring a FbLatchingRelay in the VAR RETAIN PERSISTENT area in my POU. But then after reading here that:
If you declare a local variable in a function block as RETAIN, CODESYS stores the complete instance of this function block in the Retain range (all data of the function block); however, only the declared RETAIN variable is treated as such.
I decided to change that in order not to waste RETAIN memory for a bunch of variables which are in POU but are not needed to be stored as RETAIN.
So right now I have something like that:
the VAR RETAIN PERSISTENT area is only declared in my main program
it stores structures for each room (each POU) only with needed data - FbLatchingRelay POU and few other variables
while initializing the room (POU) I'm passing those structures to my rooms using VAR_IN_OUT
each room (POU) uses then this data
PLC_PRG:
VAR RETAIN PERSISTENT
BathroomPersistentData: BathroomData;
END_VAR
Bathroom(PersistData := BathroomPersistentData, xMainLightSwitch := DI1_13, xMirrorLightSwitch := DI2_3, xMirrorLightSwitchActuator => DO2_1, xMainLightSwitchActuator => DO1_11);
Bathroom POU:
VAR_IN_OUT
PersistData: BathroomData;
END_VAR
Is this a good approach? What do you think? It complicates project a bit but I'm not wasting RETAIN memory for things which should not be there (whole POUs).
Yes, this is how my organization handles retain vars. This also lends itself to supporting “save to disk” solutions for other FB demands (not so much for your light states).
On the other hand, did you run out of memory by the original way? Sometimes I find we worry about things that never happen. Yes it is “wasteful” for the whole FB instance to be put in retain memory, but if your FBs are small and your device has plenty of retain memory - then nothing to worry about until later.

Loading Variables into Functions from Structs in Matlab

Say I have a project which is comprised of:
A main script that handles all of the running of my simulation
Several smaller functions
A couple of structs containing the data
Within the script I will be accessing the functions many times within for loops (some over a thousand times within the minute long simulation). Each function is also looking for data contained with a struct files as part of their calculations, which are usually parameters that are fixed over the course of the simulation, however need to be varied manually between runs to observe the effects.
As typically these functions form the bulk of the runtime I'm trying to save time, as my simulation can't quite run at real-time as it stands (the ultimate goal), and I lose alot of time passing variables/parameters around functions. So I've had three ideas to try and do this:
Load the structs in the main simulation, then pass each variable in turn to the function in the form of a large argument (the current solution).
Load the structs every time the function is called.
Define the structs as global variables.
In terms of both the efficiency of the system (most relevent as the project develops), and possibly as I'm no expert programmer from a "good practise" perspective what is the best solution for this? Is there another option that I have not considered?
As mentioned above in the comments - the 1st item is best one.
Have you used the profiler to find out where you code takes most of its time?
profile on
% run your code
profile viewer
Note: if you are modifying your input struct in your child functions -> this will take more time, but if you are just referencing them then that should not be a problem.
Matlab does what's known as a "lazy copy" when passing arguments between functions. This means that it passes a pointer to the data to the function, rather than creating a new instance of that data, which is very efficient memory- and speed-wise. However, if you make any alteration to that data inside the subroutine, then it has to make a new instance of that argument so as to not overwrite the argument's value in the main function. Your response to matlabgui indicates you're doing just that. So, the subroutine may be making an entire new struct every time it's called, even though it's only modifying a small part of that struct's values.
If your subroutine is altering a small part of the array, then your best bet is to just pass that small part to it, then assign your outputs. For instance,
[modified_array] = somesubroutine(struct.original_array);
struct.original_array=modified_array;
You can also do this in just one line. Conceptually, the less data you pass to the subroutine, the smaller the memory footprint is. I'd also recommend reading up on in-place operations, as it relates to this.
Also, as a general rule, don't use global variables in Matlab. I have not personally experienced, nor read of an instance in which they were genuinely faster.