pybind11: return std::vector reference/pointer only to Python

I have written several C++ functions to process large LIDAR point cloud data. Most of these functions produce large std::vector arrays which I need to pass to Python. In Python, references to these arrays would be passed to PyOpenGL functions (void pointer parameters). I neither manipulate these arrays in Python nor send them back to C++.
py::dict points_to_mesh(<parameters>) {
    Mesh mesh = generateMesh(<parameters>);
    auto d1 = py::dict(
        "width"_a=mesh.width,
        "varray"_a=py::array_t<float>(py::cast(mesh.vn, py::return_value_policy::reference)),
        "iarray"_a=py::array_t<unsigned int>(py::cast(mesh.indices, py::return_value_policy::reference)));
    return d1;
}
The above code works and I get NumPy arrays in Python, but I'm not sure about the approach: is it moving the vector data or copying the actual data? If it's possible to get the vector as a NumPy array in Python with no copying involved, then I'm OK with it. If not, how do I pass just the reference/pointer back to Python?
I have found this article and changed the code.
template <typename T>
py::array array_from_vector(std::vector<T>& m) {
    if (m.empty()) return py::array_t<T>();
    std::vector<T>* ptr = new std::vector<T>(std::move(m));
    auto capsule = py::capsule(ptr, [](void* p) {
        delete reinterpret_cast<std::vector<T>*>(p);
    });
    return py::array_t<T>(
        {ptr->size(), ptr->at(0).size()},           // shape of array
        {ptr->at(0).size() * sizeof(T), sizeof(T)}, // c-style contiguous strides
        capsule);
}
The code compiles fine but when I use it
"varray"_a=py::array_from_vector(mesh.vn),
I'm getting an error stating:
error: no matching function for call to 'pybind11::array_t<float>::array_t(<brace-enclosed initializer list>, <brace-enclosed initializer list>, pybind11::capsule&)'
23 | return py::array_t<T>(
| ^~~~~~~~~~~
24 | {ptr->size(), ptr->at(0).size()}, // shape of array
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
25 | {ptr->at(0).size()*sizeof(T), sizeof(T)}, // c-style contiguous strides
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
26 | capsule);
| ~~~~~~~~
Another concern involves ownership and lifetime of the data. I'm generating data in plain C++ functions and passing it back to Python. Is it possible to meet these requirements without using classes?
Cheers :-)
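For reference, here is a minimal sketch of a working 1-D variant of the capsule approach, assuming pybind11's array_t(shape, strides, data, base) constructor (the _1d name is just for illustration). The "no matching function" error above likely comes from passing the capsule where the constructor expects the data pointer, and, for a scalar T such as float, at(0).size() does not exist. The py::cast route, by contrast, normally produces a copy.

template <typename T>
py::array_t<T> array_from_vector_1d(std::vector<T>&& v) {
    // Move the vector to the heap; the capsule deletes it when the
    // NumPy array is garbage collected, so Python owns the lifetime.
    auto* ptr = new std::vector<T>(std::move(v));
    auto capsule = py::capsule(ptr, [](void* p) {
        delete reinterpret_cast<std::vector<T>*>(p);
    });
    return py::array_t<T>(
        {ptr->size()},   // shape (1-D)
        {sizeof(T)},     // stride in bytes
        ptr->data(),     // data pointer -- the argument missing above
        capsule);        // base object that keeps the buffer alive
}

In this sketch no element is copied: the array aliases the vector's buffer, and the capsule ties the vector's lifetime to the array, which also addresses the ownership question without introducing a wrapper class.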

Related

How to access a collection of heterogeneous functions randomly?

I am implementing an evolutionary algorithm where I have a numerical genetic encoding (0–n), in which each number from 0 to n represents a function. I have implemented a numpy version where it is possible to do the following. The actual implementation is a bit more complicated, but this snippet captures the core functionality.
import numpy as np

n = 3
max_ops = 10

# Generate randomly generated args and OPs
for i in range(number_of_iterations):
    args = np.random.randint(min_val_arg, max_val_arg,
                             size=(arg_count, arg_shape[0], arg_shape[1]))
    gene_of_operations = np.random.randint(0, n, size=(max_ops))

# A collection of OP encodings and operations. Doesn't need to be a dict.
dict_of_n_OPs = {
    0: np.add,
    1: np.multiply,
    2: np.diff,
}

# @njit
def execute_genome(gene_of_operations, args, dict_of_n_OPs):
    result = 0
    for op, arg in zip(gene_of_operations, args):
        # look up the function for this encoding and apply it
        result += dict_of_n_OPs[op](arg)
    return result

## executing the gene
results = execute_genome(gene_of_operations, args, dict_of_n_OPs)
print(results)
Now, the njit decorator expects a statically typed function, and heterogeneously typed collections such as my dict_of_n_OPs are not supported. I have tried rendering it as a numpy array, a numba.typed.Dict, and a numba.typed.List, but discovered that none of them supports heterogeneous types.
What would be a numba-compliant approach that allows executing different functions based on a numerical encoding such as '00201', where number 0 executes function 0?
Is the only way an n-line if/else statement for n unique operations/functions?
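One numba-compliant pattern is exactly that if/elif chain, but hidden inside a compiled helper: numba lowers an integer-keyed chain to a cheap branch, so it performs better than it reads. A minimal sketch, with scalar args for brevity and illustrative operations standing in for the real ones:

import numpy as np
from numba import njit

@njit
def apply_op(op_code, acc, arg):
    # Integer-coded dispatch: each branch plays the role of one dict entry.
    if op_code == 0:
        return acc + arg   # encoding 0 -> add
    elif op_code == 1:
        return acc * arg   # encoding 1 -> multiply
    else:
        return acc - arg   # encoding 2 -> a stand-in third op

@njit
def execute_genome(gene_of_operations, args):
    result = 0.0
    for i in range(gene_of_operations.shape[0]):
        result = apply_op(gene_of_operations[i], result, args[i])
    return result

gene = np.random.randint(0, 3, size=10)
args = np.random.rand(10)
print(execute_genome(gene, args))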

Fast iteration over unicode string Cython

I have the following Cython function.
 01: 
+02: cdef int count_char_in_x(unicode x, Py_UCS4 c):
 03:     cdef:
+04:         int count = 0
 05:         Py_UCS4 x_k
 06: 
+07:     for x_k in x:  ## Yellow
+08:         if x_k == c:
+09:             count += 1
 10: 
+11:     return count
Line 07 is not properly optimized. The annotated HTML expands it to:
+07: for x_k in x: ## Yellow
if (unlikely(__pyx_v_x == Py_None)) {
PyErr_SetString(PyExc_TypeError, "'NoneType' is not iterable");
__PYX_ERR(0, 8, __pyx_L1_error)
}
__Pyx_INCREF(__pyx_v_x);
__pyx_t_1 = __pyx_v_x;
__pyx_t_6 = __Pyx_init_unicode_iteration(__pyx_t_1, (&__pyx_t_3), (&__pyx_t_4), (&__pyx_t_5)); if (unlikely(__pyx_t_6 == ((int)-1))) __PYX_ERR(0, 8, __pyx_L1_error)
for (__pyx_t_7 = 0; __pyx_t_7 < __pyx_t_3; __pyx_t_7++) {
__pyx_t_2 = __pyx_t_7;
__pyx_v_x_k = __Pyx_PyUnicode_READ(__pyx_t_5, __pyx_t_4, __pyx_t_2);
Any tips on how this could be improved?
I think it is possible to write a cdef/cpdef function that completely avoids the Python None type check at runtime. Any idea how this could be done?
The generated C code looks pretty good to me. The loop overall is an int-indexed C for loop (i.e. it's not relying on calling the Python methods __iter__ and __next__).
__Pyx_PyUnicode_READ is translated pretty directly to PyUnicode_READ (depending slightly on the Python version you're using). PyUnicode_READ is a C macro which is as close to a direct array access as you can get.
This is probably as good as it's getting. You might get a small improvement by using bytes rather than unicode (provided you're dealing with ASCII characters). You might just consider whether it's really worth reimplementing unicode.count.
If it were a regular def function you could declare x as unicode not None to remove the None check before the loop; that might make a small difference. However, as @ead points out, that isn't supported for cdef functions. It's likely the cost of a def function call will be slightly larger than the cost of a None check, but you should time it if you care.
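For completeness, a sketch of that def variant with the not None clause (untimed; the function name is illustrative):

def count_char_in_x_py(unicode x not None, Py_UCS4 c):
    # "not None" rejects None at the call boundary, so the loop
    # compiles without the NoneType check shown in the annotation.
    cdef int count = 0
    cdef Py_UCS4 x_k
    for x_k in x:
        if x_k == c:
            count += 1
    return count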

SystemVerilog to Specman E

What is the equivalent syntax in Specman E for $readmemh(file, array) and similar system tasks and functions in SystemVerilog?
I am working on converting existing SystemVerilog code into Specman E. I have converted and implemented most of the concepts except for a few system methods like the one below. Please help me implement methods like this in Specman E.
$readmemh(file_s,data_2d_i);//For converting SV code into Specman E
In the vr_ad package there is an equivalent method. Assuming you have a vr_ad_mem object called data_2d_i, you can e.g. call
data_2d_i.readmemh(file_s, 0, 1000, 0, 1000);
to read addresses 0..1000 from that file into memory.
Example:
import vr_ad/e/vr_ad_top;

extend sys {
    mem: vr_ad_mem;
    keep mem.addressing_width_in_bytes == 1;
    keep mem.size == 1000;

    run() is also {
        var data_2d_l: list of byte;

        -- read first 16 bytes of mem-file and store the result in a list
        mem.readmemh("mem.txt", 0, 15, 0, 15);
        data_2d_l = mem.fetch(0, 16);
        print data_2d_l;
    };
};

Can operations on a numpy.memmap be deferred?

Consider this example:
import numpy as np
a = np.array(1)
np.save("a.npy", a)
a = np.load("a.npy", mmap_mode='r')
print(type(a))
b = a + 2
print(type(b))
which outputs
<class 'numpy.core.memmap.memmap'>
<class 'numpy.int32'>
So it seems that b is not a memmap any more, and I assume this forces numpy to read the whole a.npy, defeating the purpose of the memmap. Hence my question: can operations on memmaps be deferred until access time?
I believe subclassing ndarray or memmap could work, but don't feel confident enough about my Python skills to try it.
Here is an extended example showing my problem:
import numpy as np

# create 8 GB file
# np.save("memmap.npy", np.empty([1000000000]))

# I want to print the first value using f and memmaps
def f(value):
    print(value[1])

# this is fast: f receives a memmap
a = np.load("memmap.npy", mmap_mode='r')
print("a = ")
f(a)

# this is slow: b has to be read completely and converted into an array
b = np.load("memmap.npy", mmap_mode='r')
print("b + 1 = ")
f(b + 1)
Here's a simple example of an ndarray subclass that defers operations on it until a specific element is requested by indexing.
I'm including this to show that it can be done, but it will almost certainly fail in novel and unexpected ways and require substantial work to make it usable.
For a very specific case it may be easier than redesigning your code to solve the problem in a better way.
I'd recommend reading over these examples from the docs to help understand how it works.
import numpy as np

class Defered(np.ndarray):
    """
    An array class that defers calculations applied to it, only
    performing them when an index is requested
    """
    def __new__(cls, arr):
        arr = np.asanyarray(arr).view(cls)
        arr.toApply = []
        return arr

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        ## Convert all arguments to ndarray, otherwise arguments
        #  of type Defered will cause infinite recursion;
        #  also store self as None, to be replaced later on
        newinputs = []
        for i in inputs:
            if i is self:
                newinputs.append(None)
            elif isinstance(i, np.ndarray):
                newinputs.append(i.view(np.ndarray))
            else:
                newinputs.append(i)

        ## Store function to apply and necessary arguments
        self.toApply.append((ufunc, method, newinputs, kwargs))
        return self

    def __getitem__(self, idx):
        ## Get index and convert to regular array
        sub = self.view(np.ndarray).__getitem__(idx)

        ## Apply stored actions
        for ufunc, method, inputs, kwargs in self.toApply:
            inputs = [i if i is not None else sub for i in inputs]
            sub = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)

        return sub
This will fail if modifications are made to it that don't go through numpy's universal functions. For instance, percentile and median aren't based on ufuncs and would end up loading the entire array. Likewise, if you pass it to a function that iterates over the array, or index a substantial part of it, the entire array will be loaded.
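A quick usage sketch (assuming the memmap.npy file from the question exists): operations queue up on the Defered instance and only run once an element is indexed.

a = np.load("memmap.npy", mmap_mode='r')
d = Defered(a)
d = d + 1     # queued; nothing is read from disk yet
print(d[1])   # only now is that single element computed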
This is just how Python works. By default numpy operations return a new array, so b never exists as a memmap - it is created when + is called on a.
There are a couple of ways to work around this. The simplest is to do all operations in place:
a += 1
This requires opening the memory-mapped array for reading and writing:
a = np.load("a.npy", mmap_mode='r+')
Of course this isn't any good if you don't want to overwrite your original array.
In this case you need to specify that b should be memmapped.
b = np.memmap("b.npy", mode='w+', dtype=a.dtype, shape=a.shape)
Assignment can then be done using the out keyword provided by numpy ufuncs:
np.add(a, 2, out=b)
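Putting those pieces together, a minimal end-to-end sketch (file names are illustrative):

import numpy as np

np.save("a.npy", np.arange(10.0))    # some existing data
a = np.load("a.npy", mmap_mode='r')  # read-only memmap
b = np.memmap("b.npy", mode='w+', dtype=a.dtype, shape=a.shape)
np.add(a, 2, out=b)                  # result is written through b's mapping
b.flush()                            # push the pages to disk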

objective-c variable length array global scope

Is it possible to declare a variable-length array with global scope in Objective-C?
I'm making a game with a world class, which initializes the world map as a three-dimensional integer array. While it's only a two-dimensional side scroller, the third dimension states which kind of block goes at the coordinate given by the first two dimensions.
After the initialization function, a method nextFrame: is scheduled (I'm using cocos2d and the CCDirector schedule method). I was wondering how to pass the int[][][] map array from the initialization function to the nextFrame: method.
I tried using a global (static keyword) declaration, but got an error saying that global arrays cannot be variable length.
The actual line of code I'm referring to is:
int map[xmax][ymax][3];
where xmax and ymax are the farthest x and y coordinates in the list of coordinates that defines the stage.
I'd like to somehow pass the array to nextFrame:, which is scheduled in
[self schedule:@selector(nextFrame:)];
I realize I can use NSMutableArray, but NSMutableArray is kind of a headache for 3-dimensional lists of integers (I have to use wrapper numbers for everything...). Is there any way to do this with plain integer arrays?
You can't have a statically allocated global array with dynamic dimensions in C (of which Objective-C is a clean superset). But you can use a global array of any length or size (up to available memory) at runtime by means of a global pointer, malloc, and array-indexing arithmetic.
static int *map = NULL;
...
map = malloc(dim1 * dim2 * dim3 * sizeof(int)); // in some initialization method
if (map == NULL) { /* handle error */ }         // before first array access
...
myElement = map[index3 + dim3 * (index2 + dim2 * index1)]; // some macro might be suitable here
Or you could write Objective-C getter and setter methods that check the pointer and the array bounds on every access, since a method can return plain C data types.
Another option, if you know the maximum dimensions you want available and are willing to use (waste) that amount of memory, is to statically allocate the maximum-size array and throw an exception if the program tries to set up something larger than your allowed max.
"I tried using global (static keyword) declaration, but got an error saying that global arrays cannot be variable length"
But global array pointers can point to arrays of variable length.
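A compact sketch of that pointer-plus-arithmetic pattern in plain C (dimension and function names are illustrative):

#include <stdlib.h>

static int *map = NULL;
static int dimX, dimY, dimZ;

/* Flatten (x, y, z) into the single malloc'd block. */
#define IDX(x, y, z) ((z) + dimZ * ((y) + dimY * (x)))

static int initMap(int xmax, int ymax, int zmax) {
    dimX = xmax; dimY = ymax; dimZ = zmax;
    map = malloc((size_t)dimX * dimY * dimZ * sizeof *map);
    return map != NULL;   /* caller handles allocation failure */
}

/* usage: map[IDX(x, y, 0)] = blockType; */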