Numba apply mapping to ndarray

This is a very simple question, but I have looked and haven't found a clear answer. I would like to efficiently apply a mapping to an input ndarray, remapping each element in the array and returning the modified array.
A simple version using numpy would be something like:
import numpy as np

def remap(a, mapping):
    return np.vectorize(mapping.__getitem__)(a)
I am pretty sure I can't use Numba's vectorize, but was hoping to be able to use guvectorize. However, it looks like I can't pass a Numba TypedDict into a guvectorize function either. Any suggestions welcome.

If you know the number of dimensions of the target array, then you just need a simple loop. AFAIK, guvectorize does not support this use-case yet. However, you can solve it using a simple reshape.
import numba as nb

@nb.njit
def remap(a, mapping):
    # Linearize the array (can cause a copy of `a`)
    b = a.reshape(-1)
    for i in range(b.size):
        b[i] = mapping[b[i]]
    return b.reshape(a.shape)
Note that the first reshape can cause a copy of a if a is not stored contiguously in memory and NumPy cannot find a unified stride for the 1D view. Thus, the returned array may be a copy of a. The second reshape is guaranteed not to copy the array. If you always want to return a copy of the input array (no mutation), like most NumPy functions do, then you can use flatten instead of the first reshape call.
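For completeness, here is a minimal sketch of that flatten-based variant, plus a small usage example with a numba.typed.Dict; the name remap_copy and the sample data are illustrative, not from the original answer:

import numba as nb
import numpy as np
from numba import types
from numba.typed import Dict

@nb.njit
def remap_copy(a, mapping):
    # flatten() always returns a copy, so the input array is never mutated
    b = a.flatten()
    for i in range(b.size):
        b[i] = mapping[b[i]]
    return b.reshape(a.shape)

# Illustrative usage: remap the values 1 -> 10 and 2 -> 20
mapping = Dict.empty(key_type=types.int64, value_type=types.int64)
mapping[1] = 10
mapping[2] = 20
a = np.array([[1, 2], [2, 1]], dtype=np.int64)
print(remap_copy(a, mapping))  # [[10 20] [20 10]]; `a` itself is unchanged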

Related

How to remove brackets from a python array?

I have an array like this:
>>>Opt
array([[array([[0.5]])]], dtype=object)
How to remove these brackets and get the value of 0.5 as a single floating point?
I have tried
>>>np.array(Opt)
array([[array([[0.5]])]], dtype=object)
>>>Opt.ravel()
array([[array([[0.5]])]], dtype=object)
>>>Opt.flatten()
array([[array([[0.5]])]], dtype=object)
None of these works. Is it because of the data type "object"?
The array you show is nested: a 1x1 object array whose single element is itself a 1x1 array. So in this instance the basic way to get the number is to index through all four levels:
import numpy as np
the_array = np.array([[np.array([[0.5]])]], dtype=object)
print(the_array[0][0][0][0])
Output:
0.5
I don't know what you want to do with this array; depending on your use case there may be better approaches to your problem.
dtype=object means that you defined an array of pointers to Python objects; this determines both how memory is managed when allocating space for the array and which operations are permitted on the elements.
I found that the best way to do it is using item().
Opt.item().item()
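For illustration, a minimal sketch of what the two item() calls do; the array is rebuilt here by hand so that it matches the shape shown in the question:

import numpy as np

# Build a 1x1 object array whose single element is a 1x1 float array,
# matching the repr shown in the question
Opt = np.empty((1, 1), dtype=object)
Opt[0, 0] = np.array([[0.5]])

inner = Opt.item()    # unwraps the outer 1x1 object array, returning the inner 1x1 array
value = inner.item()  # unwraps the inner array, returning the plain Python float
print(value)          # 0.5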

Matlab GPU use with functions that take arguments of different dimensions

I am trying to use parallel computing with the GPU in Matlab, and I would like to apply a function to a large array (to avoid the use of a for loop, which is quite slow). I have read Matlab's documentation, and I can use arrayfun, but only if I want to do elementwise operations. Maybe I am confused, but I would appreciate it if someone could help me use it. As an example of what I want to do, imagine that I would like to perform the following operation,
$X_t = B Z_t + Q\varepsilon_t$
where $X_t$ is 2x1, $B$ is 2x5, and $Z_t$ is 5x1, with $Q$ 2x2. I define a function,
function X = propose(Z,B,Q)
    X = B*Z + Q*rand(2,1);
end
Now, suppose that I have an array $Z_p$ which is 5x1000. To each of the 1000 columns I would like to apply the previous function, for given matrices $B$ and $Q$, to get an array $X_p$ which is 2x1000.
Given the documentation for arrayfun, I cannot do this,
X = arrayfun(@propose, Zp, B, Q)
So, is there any possibility to do it?
Thanks!
PS: Yes, I know that in this simple example I can just do the multiplication without a for loop, but the application I have in mind is more complicated and I cannot do it. I just put this example as an illustration.

Return parameter passing in Julia

If a Julia function returns an array, is the reference returned or a copy?
function pass(A::Matrix)
    return A
end
A real example is reshape:
reshape(A, dims)
Create an array with the same data as the given array, but with different dimensions. An implementation for a particular type of array may choose whether the data is copied or shared.
How does the implementation determine whether data is copied or shared?
The pass function above returns the array by reference; see http://julia.readthedocs.org/en/latest/manual/arrays/ .
There is a bit more to the reshape example.
For ordinary dense arrays, the reshaped array is a new array object that shares the same underlying data. But keep in mind that there are plenty of specialized array types. The docs warn you not to rely on this because, for example, a future implementation of immutable fixed-size arrays could use a different reshape mechanism.

Vectorizing scalars/vector division

If for example I have:
Q1=4;
Q2=5;
PG=2:60
A1=Q1./sqrt(PG);
A2=Q2./sqrt(PG);
plot(PG,A1)
plot(PG,A2)
can I do something like this?
Q=[Q1,Q2];
A=Q./sqrt(PG);
plot(PG,A(1))
plot(PG,A(2))
or something else to avoid the separate A1 and A2?
A = bsxfun(@rdivide, [Q1;Q2], sqrt(PG)) will do (note the semicolon, not comma, between Q1 and Q2), but if the code in the question is your use case and you ever want anyone else to read and understand the code, I'd advise against using it.
You have to address the rows of A using A(1,:) and A(2,:) (no matter how you get to A), but you probably want to plot(PG,A) anyway.
[edit after first comment:]
rdivide is simply the name of the function usually denoted ./ in MATLAB code, applicable to arrays of the same size or to a scalar and an array. bsxfun will simply apply a two-argument function to the other two arguments supplied to it in the way it considers best-fitting (to simplify a bit). arrayfun does something similar: it applies a function to all elements of one array. To apply it here, one would need a function with PG hard-coded inside.

In-Place Quicksort in matlab

I wrote a small quicksort implementation in matlab to sort some custom data. Because I am sorting a cell array, need the indexes of the sort order, and do not want to restructure the cell array itself, I need my own implementation (maybe there is one available that works, but I did not find it).
My current implementation works by partitioning into a left and a right array and then passing these arrays to the recursive call. Because I do not know the sizes of left and right in advance, I just grow them inside a loop, which I know is horribly slow in matlab.
I know you can do an in place quicksort, but I was warned about never modifying the content of variables passed into a function, because call by reference is not implemented the way one would expect in matlab (or so I was told). Is this correct? Would an in-place quicksort work as expected in matlab or is there something I need to take care of? What other hints would you have for implementing this kind of thing?
Implementing a sort on complex data in user M-code is probably going to be a loss in terms of performance due to the overhead of M-level operations compared to Matlab's builtins. Try to reframe the operation in terms of Matlab's existing vectorized functions.
Based on your comment, it sounds like you're sorting on a single-value key that's inside the structs in the cells. You can probably get a good speedup by extracting the sort key to a primitive numeric array and calling the builtin sort on that.
%// An example cell array of structs that I think looks like your input
c = num2cell(struct('foo',{'a','b','c','d'}, 'bar',{6 1 3 2}))
%// Let's say the "bar" field is what you want to sort on.
key = cellfun(@(s)s.bar, c) %// Extract the sort key using cellfun
[sortedKey,ix] = sort(key) %// Sort on just the key using the fast numeric sort() builtin
sortedC = c(ix); %// ix is a reordering index into c; apply the sort using a single indexing operation
reordering = cellfun(@(s)s.foo, sortedC) %// for human readability of results
If you're sorting on multiple field values, extract all the m key values from the n cells to an n-by-m array, with columns in descending order of precedence, and use sortrows on it.
%// Multi-key sort
keyCols = {'bar','baz'};
key = NaN(numel(c), numel(keyCols));
for i = 1:numel(keyCols)
    keyCol = keyCols{i};
    key(:,i) = cellfun(@(s)s.(keyCol), c);
end
[sortedKey,ix] = sortrows(key);
sortedC = c(ix);
reordering = cellfun(@(s)s.foo, sortedC)
One of the keys to performance in Matlab is to get your data in primitive arrays, and use vectorized operations on those primitive arrays. Matlab code that looks like C++ STL code with algorithms and references to comparison functions and the like will often be slow; even if your code is good in O(n) complexity terms, the fixed cost of user-level M-code operations, especially on non-primitives, can be a killer.
Also, if your structs are homogeneous (that is, they all have the same set of fields), you can store them directly in a struct array instead of a cell array of structs, and it will be more compact. If you can do a more extensive redesign, rearranging your data structures to be "planar-organized" - where you have a struct of arrays, reading across the ith element of all the fields as a record, instead of an array of structs of scalar fields - could be a good efficiency win. Either of these reorganizations would make constructing the sort key array cheaper.
In this post, I only explain MATLAB's function-calling convention; I am not discussing the quicksort algorithm implementation.
When calling functions, MATLAB passes built-in data types by-value, and any changes made to such arguments are not visible outside the function.
function y = myFunc(x)
    x = x .* 2; %# pass-by-value, changes only visible inside function
    y = x;
end
This could be inefficient for large data especially if they are not modified inside the functions. Therefore MATLAB internally implements a copy-on-write mechanism: for example when a vector is copied, only some meta-data is copied, while the data itself is shared between the two copies of the vector. And it is only when one of them is modified, that the data is actually duplicated.
function y = myFunc(x)
    %# x was never changed, thus passed-by-reference avoiding making a copy
    y = x .* 2;
end
Note that for cell arrays and structures, only the cells/fields that are modified are actually copied (this is because cells/fields are internally stored separately), which makes copying more efficient for such data structures. For more information, read this blog post.
In addition, versions R2007 and upward (I think) detect in-place operations on data and optimize such cases.
function x = myFunc(x)
    x = x .* 2;
end
Obviously, when calling such a function, the LHS must be the same as the RHS (x = myFunc(x);). Also, in order to take advantage of this optimization, in-place functions must be called from inside another function.
In MEX-functions, although it is possible to change input variables without making copies, it is not officially supported and might yield unexpected results...
For user-defined types (OOP), MATLAB introduced the concept of value object vs. handle object supporting reference semantics.