In-Place Quicksort in MATLAB

I wrote a small quicksort implementation in MATLAB to sort some custom data. Because I am sorting a cell array, need the indices of the sort order, and do not want to restructure the cell array itself, I need my own implementation (maybe there is one available that works, but I did not find it).
My current implementation works by partitioning into a left and a right array and then passing these arrays to the recursive call. Because I do not know the sizes of left and right in advance, I just grow them inside a loop, which I know is horribly slow in MATLAB.
I know you can do an in-place quicksort, but I was warned never to modify the contents of variables passed into a function, because call by reference is not implemented the way one would expect in MATLAB (or so I was told). Is this correct? Would an in-place quicksort work as expected in MATLAB, or is there something I need to take care of? What other hints would you have for implementing this kind of thing?

Implementing a sort on complex data in user M-code is probably going to be a loss in terms of performance due to the overhead of M-level operations compared to Matlab's builtins. Try to reframe the operation in terms of Matlab's existing vectorized functions.
Based on your comment, it sounds like you're sorting on a single-value key that's inside the structs in the cells. You can probably get a good speedup by extracting the sort key to a primitive numeric array and calling the builtin sort on that.
%// An example cell array of structs that I think looks like your input
c = num2cell(struct('foo',{'a','b','c','d'}, 'bar',{6 1 3 2}))
%// Let's say the "bar" field is what you want to sort on.
key = cellfun(@(s)s.bar, c) %// Extract the sort key using cellfun
[sortedKey,ix] = sort(key) %// Sort on just the key using fast numeric sort() builtin
sortedC = c(ix); %// ix is a reordering index into c; apply the sort using a single indexing operation
reordering = cellfun(@(s)s.foo, sortedC) %// for human readability of results
If you're sorting on multiple field values, extract all the m key values from the n cells to an n-by-m array, with columns in descending order of precedence, and use sortrows on it.
%// Multi-key sort
keyCols = {'bar','baz'};
key = NaN(numel(c), numel(keyCols));
for i = 1:numel(keyCols)
keyCol = keyCols{i};
key(:,i) = cellfun(@(s)s.(keyCol), c);
end
[sortedKey,ix] = sortrows(key);
sortedC = c(ix);
reordering = cellfun(@(s)s.foo, sortedC)
One of the keys to performance in Matlab is to get your data in primitive arrays, and use vectorized operations on those primitive arrays. Matlab code that looks like C++ STL code with algorithms and references to comparison functions and the like will often be slow; even if your code is good in O(n) complexity terms, the fixed cost of user-level M-code operations, especially on non-primitives, can be a killer.
Also, if your structs are homogeneous (that is, they all have the same set of fields), you can store them directly in a struct array instead of a cell array of structs, and it will be more compact. If you can do more extensive redesign, rearranging your data structures to be "planar-organized" - where you have a struct of arrays, reading across the ith element of all the fields as a record, instead of an array of structs of scalar fields - could be a good efficiency win. Either of these reorganizations would make constructing the sort key array cheaper.
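For illustration, here is a minimal sketch of the two layouts, reusing the example fields from above (the planar variable name is just made up for this sketch):
%// Record-organized: one struct per record, stored in a struct array
recs = struct('foo', {'a','b','c','d'}, 'bar', {6 1 3 2});
%// Planar-organized: one struct whose fields are parallel arrays
planar.foo = {'a','b','c','d'};
planar.bar = [6 1 3 2];
%// With the planar layout the sort key is already a primitive numeric array
[~, ix] = sort(planar.bar);
planar.foo = planar.foo(ix);
planar.bar = planar.bar(ix);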

In this post I only explain MATLAB's function-calling conventions; I am not discussing the quicksort algorithm itself.
When calling functions, MATLAB passes built-in data types by-value, and any changes made to such arguments are not visible outside the function.
function y = myFunc(x)
x = x .* 2; %# pass-by-value, changes only visible inside function
y = x;
end
This could be inefficient for large data, especially if it is not modified inside the function. Therefore MATLAB internally implements a copy-on-write mechanism: when a vector is copied, for example, only some metadata is copied, while the data itself is shared between the two copies of the vector. Only when one of them is modified is the data actually duplicated.
function y = myFunc(x)
%# x was never changed, thus passed-by-reference avoiding making a copy
y = x .* 2;
end
Note that for cell arrays and structures, only the cells/fields that are modified are copied (this is because cells/fields are internally stored separately), which makes copying more efficient for such data structures. For more information, read this blog post.
In addition, versions R2007 and upward (I think) detect in-place operations on data and optimize such cases.
function x = myFunc(x)
x = x.*2;
end
Obviously, when calling such a function, the variable on the LHS must be the same as the one passed on the RHS (x = myFunc(x);). Also, in order to take advantage of this optimization, in-place functions must be called from inside another function.
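A rough sketch of that calling pattern, with myFunc as defined above available as a local function or on the path (the exact conditions for the optimization depend on the MATLAB version):
function caller
x = rand(5000); %# large array we want to scale without a temporary copy
x = myFunc(x); %# same variable on LHS and RHS, called from inside a function
end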
In MEX-functions, although it is possible to change input variables without making copies, it is not officially supported and might yield unexpected results...
For user-defined types (OOP), MATLAB introduced the concept of value objects vs. handle objects, with handle objects supporting reference semantics.
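As a minimal sketch of the difference, using two made-up classes (each classdef would live in its own file):
classdef ValuePoint %# value class (the default)
    properties
        x = 0;
    end
end

classdef HandlePoint < handle %# handle class, with reference semantics
    properties
        x = 0;
    end
end

%# usage:
p = ValuePoint; q = p; q.x = 1; %# p.x is still 0, because q received an independent copy
h = HandlePoint; g = h; g.x = 1; %# h.x is now 1, because g and h refer to the same object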

Related

Simulink: Use Enumeration As Index

I feel like this is something that'd be absurdly easy in C# but is impossible in Simulink. I am trying to use an enumerated value as an array index. The trick is: I have an array that is sized for the number of elements in the enumeration, but their values are non-contiguous. So, with the enumeration and Simulink code below, I want A(Example.value999) to read the value at A(4). Obviously, it will instead read A(999). Any way to get the behavior I'm looking for?
classdef Example < Simulink.IntEnumType
enumeration
value1 (1)
value2 (2)
value13 (13)
value999 (999)
end
end
// Below in Simulink; reputation is not good enough to post images.
A = Data Store Memory
A.InitialValue = uint16(zeros(1, length(enumeration('Example'))))
// Do a Data Store Read with Indexing enabled; Index Option = Index vector (dialog)
A(Example.value999)
After a weekend of experimentation, I came up with a working solution: using a Simulink Function to call a MATLAB function that searches for the correct index using the "find" command. In my particular instance, I was assigning the data to Data Store Memory, so I was able to just pass the enumeration index and a new value to these blocks, but you could just as easily have a single input block that spits out the requested index. (My reputation is still too low to post pictures, so hopefully my textual descriptions will suffice.)
Data Store Memory 'A': Data type = uint16, Dimensions = length(enumeration('Example'))
Simulink Function: SetValueA(ExampleEnum, NewValue)
--> MATLAB Function: SetA_Val(ExampleEnum, NewValue)
--> function SetA_Val(ExampleEnum, NewValue)
global A;
if(isa(ExampleEnum, 'Example'))
A(find(enumeration('Example') == ExampleEnum, 1)) = NewValue;
end
From there, you use the Function Caller blocks in Simulink with the "Function prototype" filled in with "SetValueA(ExampleEnum,NewValue)" anywhere you wish to set this data. The logic would get more complicated if you wished to use vectors and write multiple values at once, but this is at least a starting point. It should just be a matter of modifying the Simulink and MATLAB functions to allow vector inputs and looping through those inputs in the MATLAB function.
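A hypothetical sketch of that vector-input extension (names mirror the blocks above; adjust to your model):
function SetA_Vals(ExampleEnums, NewValues)
global A;
for k = 1:numel(ExampleEnums)
    if(isa(ExampleEnums(k), 'Example'))
        A(find(enumeration('Example') == ExampleEnums(k), 1)) = NewValues(k);
    end
end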
EDIT 1
Slight update: If your MATLAB function is set up such that you cannot use variable-length vectors in it, just replace the "find" function with the "ismember" function. Using a scalar in ismember always returns a scalar, and the MATLAB compiler won't complain about it.
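For instance, a sketch of the ismember variant (hypothetical names mirroring the blocks above; the second output of ismember gives the index, and with a scalar first input both outputs stay scalar, assuming ismember on the enumeration is acceptable in your code-generation context):
function SetA_Val(ExampleEnum, NewValue)
global A;
if(isa(ExampleEnum, 'Example'))
    [~, idx] = ismember(ExampleEnum, enumeration('Example')); % scalar, fixed-size index
    if(idx > 0)
        A(idx) = NewValue;
    end
end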

Does MATLAB have any set-like datatype?

I am looking for a way to compare finite sequential data with non-deterministic ordering in MATLAB. Basically, what I want is an array, but without imposing an order on the contained elements. If I have the objects
a = [x y z];
and
b = [x z y];
I'd want isequal(a, b) to return true. With arrays, this is not the case. The easy fix would be to sort the entries before comparing them. Unfortunately, in my case the elements are complex objects which cannot easily be mapped to have an unambiguous numerical relationship to each other. Another approach would be not to use isequal, but rather a custom comparison function which asserts matching lengths and then simply checks if each element from the first array is contained in the second one. However, in my case the arrays are non-trivially nested inside the structs I am trying to compare via isequal, and it would be quite complicated to write a custom comparison function for the encapsulating structs. Other than this ordering problem, the inbuilt isequal function covers all of my needs, as it correctly handles arbitrarily nested structs with arbitrary fields, so I would really like to avoid writing a complicated custom function for that.
Is there any datatype in MATLAB which allows for the described behavior? Or is there a way to easily build such a custom type? In Java, I could simply write a wrapper class with a custom implementation for the equals method, but there seems to be no such mechanism in MATLAB?
I've found a way to solve my problem elegantly. Contrary to my previously stated belief, MATLAB actually does allow for class-specific overriding of isequal.
classdef CustomType
properties
value
end
methods
function self = CustomType(value)
self.value = value;
end
function equal = isequal(self, other)
if not(isa(other, 'CustomType'))
equal = false;
return;
end
% implement custom comparison rules here
end
end
end
So, given a Set class written along the lines of the skeleton above, I can simply assign the fields in question like this and don't have to change anything else in my code:
a = Set([x y z]); % custom type
...
b = Set([x z y]);
...
isequal(a, b); % true
In my use case, I don't even need the uniqueness property of sets. So I only have to perform an order-independent comparison and don't need to waste performance on enforcing properties I don't require. Furthermore, by using a dedicated type, I can differentiate explicitly between fields which have order (i.e. regular arrays) and those which don't, at the moment of assignment.
Another solution might be to shadow the built-in isequal and make it apply custom comparison rules when its arguments are of a specific type. However, this would slow down all comparisons in the whole program and make for bad encapsulation. I feel like using a custom type with an overridden isequal is the way to solve this kind of problem. But I still think that sets (and other types of commonly used containers) should be included in the basic repertoire of MATLAB.
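For completeness, here is a rough sketch of how the custom comparison rules could look for such a Set type. It assumes the elements live in an ordinary array property (adjust the indexing if you store a cell array) and simply pairs off matching elements, so duplicates must match in count as well:
classdef Set
    properties
        elements
    end
    methods
        function self = Set(elements)
            self.elements = elements;
        end
        function equal = isequal(self, other)
            % Order-independent comparison: same number of elements, and
            % every element of self pairs off with a distinct element of other.
            if not(isa(other, 'Set')) || numel(self.elements) ~= numel(other.elements)
                equal = false;
                return;
            end
            matched = false(1, numel(other.elements));
            for i = 1:numel(self.elements)
                hit = false;
                for j = find(~matched)
                    if isequal(self.elements(i), other.elements(j))
                        matched(j) = true;
                        hit = true;
                        break;
                    end
                end
                if ~hit
                    equal = false;
                    return;
                end
            end
            equal = true;
        end
    end
end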

Matlab: Randomly select from "slowly varying" index set

I would like to find or implement a Matlab data structure that allows me to efficiently do the following three things:
Retrieve an element uniformly at random.
Add a new element.
Delete an element. (If it helps, this element was just "retrieved" out of the structure, so I can use both its location and its value to delete it).
Since I don't need duplicates, this structure is mathematically equivalent to a set. Also, my elements are always integers in the range 1 to 2500; it is not unusual for the set to be this entire range.
What is such a data structure? I've thought of using something like containers.Map or java.util.HashSet, but I don't know how to satisfy the first requirement in this case, because I don't know how to efficiently retrieve the nth key of such a structure. An ordinary array can achieve the first requirement of course, but it is a bad choice for the second and third requirements because of inefficient resizing.
For some context for why I'm looking to do this, in some current code I spent about 1/4 of the runtime doing:
find(x>0,Inf)
and then randomly retrieving an element from this vector. Yet this vector changes very little, and in a very predictable manner, in each iteration of my program. So I would prefer to carry around a data structure and update it as I go rather than recomputing it every time.
If you're familiar with Haskell, one way to implement the operations I'm looking to support would be
randomSelect set = fmap (\n -> elemAt n set) $ randomRIO (0,size set-1)
along with insert and delete, from Data.Set. But I have other reasons not to use Haskell in this project, and I don't know how to implement the backend of Data.Set myself.
Frequently, the best way to decrease time complexity is to increase space complexity. Given that your sets are going to be rather small, we can probably afford to use a little extra space.
To contain the set itself, you can use a preallocated array:
maxSize = 2500;
theSet = zeros(1, maxSize); % set elements
setCount = 0; % number of set elements
You can then have an auxiliary array to check for set membership:
isMember = zeros(1, maxSize);
To insert a new element newval into the set, add it to the end of theSet and increment the count (assuming there's room):
if ~isMember(newval)
assert(setCount < maxSize, 'Too many elements in set.');
setCount = setCount + 1;
theSet(setCount) = newval;
isMember(newval) = 1;
else
% tried to add duplicate element... do something here
end
To delete an element by index delidx, swap the element to be deleted and the last element and decrement the count:
assert(delidx <= setCount, 'Tried to remove element beyond end of set.');
isMember(theSet(delidx)) = 0;
theSet(delidx) = theSet(setCount);
setCount = setCount - 1;
Getting a random element of the set is then simple, just:
randidx = randi(setCount);
randelem = theSet(randidx);
All operations are O(1) and the only real disadvantage is that we have to carry along two arrays of size maxSize. Because of that you probably don't want to put these operations in functions, as you'd end up creating new arrays on every function call. You'd be better off putting them inline or, better yet, wrapping them in a nice class.
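As a sketch of that last suggestion, the three operations can be wrapped in a handle class so that the arrays live inside one object and are not copied on every call (the class and method names here are made up for illustration):
classdef RandomSet < handle
    properties (Access = private)
        theSet      % preallocated element storage
        isMember    % membership flag for every possible value
        setCount    % current number of elements
    end
    methods
        function obj = RandomSet(maxSize)
            obj.theSet = zeros(1, maxSize);
            obj.isMember = false(1, maxSize);
            obj.setCount = 0;
        end
        function insert(obj, newval)
            if ~obj.isMember(newval)
                assert(obj.setCount < numel(obj.theSet), 'Too many elements in set.');
                obj.setCount = obj.setCount + 1;
                obj.theSet(obj.setCount) = newval;
                obj.isMember(newval) = true;
            end
        end
        function removeAt(obj, delidx)
            assert(delidx <= obj.setCount, 'Tried to remove element beyond end of set.');
            obj.isMember(obj.theSet(delidx)) = false;
            obj.theSet(delidx) = obj.theSet(obj.setCount);
            obj.setCount = obj.setCount - 1;
        end
        function [randelem, randidx] = randomElement(obj)
            randidx = randi(obj.setCount); % uniform over current elements
            randelem = obj.theSet(randidx);
        end
    end
end
Typical usage would then be s = RandomSet(2500); s.insert(42); [elem, idx] = s.randomElement(); s.removeAt(idx);. Note that method calls do carry some overhead of their own, so for the tightest loops the inlined version above may still be faster.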

Replace values in an array in matlab without changing the original array

My question is: given an array A, how can you produce another array identical to A except with all negative values changed to 0 (without changing the values in A)?
My way to do this is:
B = A;
B(B<0)=0
Is there any one-line command to do this and also not requiring to create another copy of A?
While this particular problem does happen to have a one-liner solution, e.g. as pointed out by Luis and Ian's suggestions, in general if you want a copy of a matrix with some operation performed on it, then the way to do it is exactly how you did it. Matlab doesn't let you index into the result of an expression (there is no chained indexing or compound expression syntax), so you generally have no choice but to assign to a temporary variable in this manner.
However, if it makes you feel better, B=A is efficient as it will not result in any new allocated memory, unless / until B or A change later on. In other words, before the B(B<0)=0 step, B is simply a reference to A and takes no extra memory. This is just how matlab works under the hood to ensure no memory is wasted on simple aliases.
PS. There is nothing efficient about one-liners per se; in fact, you should avoid them if they lead to obscure code. It's better to have things defined over multiple lines if it makes the logic and intent of the algorithm clearer.
e.g, this is also a valid one-liner that solves your problem:
B = subsasgn(A, substruct('()',{A<0}), 0)
This is in fact the literal answer to your question (i.e. this is pretty much code that matlab will call under the hood for your commands). But is this clearer, more elegant code just because it's a one-liner? No, right?
Try
B = A.*(A>=0)
Explanation:
A>=0 - create a logical matrix where each element is 1 if the corresponding element of A is >= 0, and 0 otherwise
A.*(A>=0) - multiply element-wise
B = A.*(A>=0) - Assign the above to B.

Vectorize matlab code to map nearest values in two arrays

I have two lists of timestamps and I'm trying to create a map between them that uses the imu_ts as the true time and tries to find the nearest vicon_ts value to it. The output is a 3xd matrix where the first row is the imu_ts index, the third row is the unix time at that index, and the second row is the index of the closest vicon_ts value above the timestamp in the same column.
Here's my code so far and it works, but it's really slow. I'm not sure how to vectorize it.
function tmap = sync_times(imu_ts, vicon_ts)
tstart = max(vicon_ts(1), imu_ts(1));
tstop = min(vicon_ts(end), imu_ts(end));
%trim imu data to the overlapping time range
tmap(1,:) = find(imu_ts >= tstart & imu_ts <= tstop);
tmap(3,:) = imu_ts(tmap(1,:));%Use imu_ts as ground truth
%Find nearest indices in vicon data and map
vic_t = 1;
for i = 1:size(tmap,2)
%
while(vicon_ts(vic_t) < tmap(3,i))
vic_t = vic_t + 1;
end
tmap(2,i) = vic_t;
end
The timestamps are already sorted in ascending order, so this is essentially an O(n) operation, but because it is written as an explicit loop it runs slowly in MATLAB. Any vectorized ways to do the same thing?
Edit
It appears to be running faster than I expected or first measured, so this is no longer a critical issue. But I would be interested to see if there are any good solutions to this problem.
Have a look at knnsearch in MATLAB. Use cityblock distance, and also apply an additional correction: the matched point in vicon_ts should not be below its neighbour in imu_ts; if it is, take the next index. This is required because cityblock takes absolute distance, so the nearest neighbour can fall on either side of the timestamp. Another option (and preferred) is to write your own custom distance function.
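A rough sketch of that idea, assuming the Statistics and Machine Learning Toolbox's knnsearch (its default Euclidean distance is equivalent to cityblock in 1-D) and that imu_ts has already been trimmed to the overlapping range as in the question:
vt = vicon_ts(:); % force column vectors
it = imu_ts(:);
idx = knnsearch(vt, it); % nearest vicon index for every imu timestamp
below = vt(idx) < it; % matches that fall below the imu timestamp
idx(below) = idx(below) + 1; % bump those up to the next vicon timestamp above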
I believe that your current method is sound, and I would not try and vectorize any further. Vectorization can actually be harmful when you are trying to optimize some inner loops, especially when you know more about the context of your data (e.g. it is sorted) than the Mathworks engineers can know.
Things that I typically look for when I need to optimize some piece of code like this are:
All arrays are pre-allocated (this is the biggest driver of performance)
Fast inner loops use simple code (Matlab does pretty effective JIT on basic commands, but must interpret others.)
Take advantage of any special features of your data, e.g. use algorithms that exploit sortedness and early-exit conditions in loops.
You're already doing all this. I recommend no change.
A good start might be to get rid of the while; try something like:
for i = 1:size(tmap,2)
C = vicon_ts - tmap(3,i);
C(C < 0) = Inf; % ignore vicon timestamps below the imu timestamp
[~, tmap(2,i)] = min(C); % index of the closest vicon timestamp at or above it
end