Accumulating votes in MATLAB - matlab

First, a little background to my problem:
I am building an object recognition system using a geometric hashing technique. My hash table is indexed by the affine co-ordinates of points in a model determined by a basis triplet (allowing an affine invariant representation of any learned object). Each hash table entry is a structure :
entry = struct('ModelName', modelName, 'BasisTriplet', [a; b; c])];
Now, an arbitrary basis triplet is extracted from image points then the affine co-ordinates of all other points are calculated relative to this basis and used as indices to the hash table. For each entry that exists in this hash bin, a vote is cast for the modelName and basis triplet.
After checking all points, the models and their corresponding basis triplets with a sufficiently high number of votes are taken as candidates for an object and a further verification step is performed.
I am unsure however what is the most efficient method of casting these votes. Currently I am using a dynamic cell array, each time a new model and basis triplet pair is voted for, an additional row is added to the array. Otherwise the vote count of an existing candidate is incremented.
for keylist = 1:length(keylist)
% Where keylist is an array of indicies to the relevant keys to look up
% xkeys is the n by 2 array of all of the keys in the hash table
% Obtain this hash bin
bin = hashTable(xkeys(keylist(i), 1), xkeys(keylist(i), 2));
% Vote for every entry in the bin
for entry = 1:length(bin)
% Find the index of this model/basis in the voting accumulator
indAcc = find( strcmp(bin.ModelName, v_models(:, 1)) & myIsEqual(v_basisTriplets, bin.BasisTriplet) );
if isempty(indAcc)
% If entries do not exist yet, Add new entries
v_models = [v_models; {bin.ModelName, 1}];
v_basisTriplets = cat(3, v_basisTriplets, bin.BasisTriplet);
else
% Otherwise increment the count
v_models(indAcc, 2) = v_models(indAcc, 2)+1;
end
end
end
There is a separate 3D array (v_basisTriplets) in which the 2D basis array is concatenated and indexed along the 3rd dimension. I did have these basis triplets in the cell array also, however I had difficulty searching this cell array for a 2D array. The myIsEqual function just searches through the third dimension and checks if the 2D array at each index is equal, returning a 1D vector of which arrays are equal for use in the find.
function ind = myIsEqual(vec3D, A)
ind = zeros(size(vec3D, 3), 1);
for i = 1:size(vec3D, 3)
ind(i) = isequal(vec3D(:, :, i), A);
end
This is most certainly not the most efficient way. Immediately I can see that it would be more efficient to initialize the arrays to store the votes beforehand. However however is there a better way in general of going about this? I need to try and find the most efficient and elegant way of voting as there are usually hundreds of points to check and time is valuable.
Thanks

If you are only considering time efficiency, consider using a 4d matrix.
The dimensions would be:
Model
coordinateA
coordinateB
coordinateC
Depending on the ratio between this matrix size and the amount of points that you check, consider using a sparse matrix.
Note that especially if you can't use a sparse array, this method can be rather memory inefficient and may therefore be infeasible.

Related

Build a matrix starting from instances of structure fields in MATLAB

I'm really sorry to bother so I hope it is not a silly or repetitive question.
I have been scraping a website, saving the results as a collection in MongoDB, exporting it as a JSON file and importing it in MATLAB.
At the end of the story I obtained a struct object organised
like this one in the picture.
What I'm interested in are the two last cell arrays (which can be easily converted to string arrays with string()). The first cell array is a collection of keys (think unique products) and the second cell array is a collection of values (think prices), like a dictionary. Each field is an instance of possible values for a set of this keys (think daily prices). My goal is to build a matrix made like this:
KEYS VALUES_OF_FIELD_1 VALUES_OF_FIELD2 ... VALUES_OF_FIELDn
A x x x
B x z NaN
C z x y
D NaN y x
E y x z
The main problem is that, as shown in the image and as I tried to explain in the example matrix, I don't always have a value for all the keys in every field (as you can see sometimes they are 321, other times 319 or 320 or 317) and so the key is missing from the first array. In that case I should fill the missing value with a NaN. The keys can be ordered alphabetically and are all unique.
What would you think would be the best and most scalable way to approach this problem in MATLAB?
Thank you very much for your time, I hope I explained myself clearly.
EDIT:
Both arrays are made of strings in my case, so types are not a problem (I've modified the example). The main problem is that, since the keys vary in each field, firstly I have to find all the (unique) keys in the structure, to build the rows, and then for each column (field) I have to fill the values putting NaN where the key is missing.
One thing to remember you can't simply use both strings and number in one matrix. So, if you combine them together they can be either all strings or all numbers. I think all strings will work for you.
Before make a matrix make sure that all the cells have same element.
new_matrix = horzcat(keys,values1,...valuesn);
This will provide a matrix for each row (according to your image). Now you can use a for loop to get matrices for all the rows.
For now, I've solved it by considering the longest array of keys in the structure as the complete set of keys, let's call it keys_set.
Then I've created for each field in the structure a Map object in this way:
for i=1:length(structure)
structure(i).myMap = containers.Map(structure(i).key_field, structure(i).value_field);
end
Then I've built my matrix (M) by checking every map against the keys_set array:
for i=1:length(keys_set)
for j=1:length(structure)
if isKey(structure(j).myMap,char(keys_set(i)))
M(i,j) = string(structure(j).myMap(char(keys_set(i))));
else
M(i,j) = string('MISSING');
end
end
end
This works, but it would be ideal to also be able to check that keys_set is really complete.
EDIT: I've solved my problem by using this function and building the correct set of all the possible keys:
%% Finding the maximum number of keys in all the fields
maxnk = length(structure(1).key_field);
for i=2:length(structure)
if length(structure(i).key_field) > maxnk
maxnk = length(structure(i).key_field);
end
end
%% Initialiting the matrix containing all the possibile set of keys
keys_set=string(zeros(maxnk,length(structure)));
%% Filling the matrix by putting "0" if the dimension is smaller
for i=1:length(structure)
d = length(string(structure(i).key_field));
if d == maxnk
keys_set(:,i) = string(structure(i).key_field);
else
clear tmp
tmp = [string(structure(i).key_field); string(zeros(maxnk-d,1))];
keys_set(:,i) = tmp;
end
end
%% Merging without duplication and removing the "0" element
keys_set = union_several(keys_set);
keys_set = keys_set(keys_set ~= string(0));

How to perform operations along a certain dimension of an array?

I have a 3D array containing five 3-by-4 slices, defined as follows:
rng(3372061);
M = randi(100,3,4,5);
I'd like to collect some statistics about the array:
The maximum value in every column.
The mean value in every row.
The standard deviation within each slice.
This is quite straightforward using loops,
sz = size(M);
colMax = zeros(1,4,5);
rowMean = zeros(3,1,5);
sliceSTD = zeros(1,1,5);
for indS = 1:sz(3)
sl = M(:,:,indS);
sliceSTD(indS) = std(sl(1:sz(1)*sz(2)));
for indC = 1:sz(1)
rowMean(indC,1,indS) = mean(sl(indC,:));
end
for indR = 1:sz(2)
colMax(1,indR,indS) = max(sl(:,indR));
end
end
But I'm not sure that this is the best way to approach the problem.
A common pattern I noticed in the documentation of max, mean and std is that they allow to specify an additional dim input. For instance, in max:
M = max(A,[],dim) returns the largest elements along dimension dim. For example, if A is a matrix, then max(A,[],2) is a column vector containing the maximum value of each row.
How can I use this syntax to simplify my code?
Many functions in MATLAB allow the specification of a "dimension to operate over" when it matters for the result of the computation (several common examples are: min, max, sum, prod, mean, std, size, median, prctile, bounds) - which is especially important for multidimensional inputs. When the dim input is not specified, MATLAB has a way of choosing the dimension on its own, as explained in the documentation; for example in max:
If A is a vector, then max(A) returns the maximum of A.
If A is a matrix, then max(A) is a row vector containing the maximum value of each column.
If A is a multidimensional array, then max(A) operates along the first array dimension whose size does not equal 1, treating the elements as vectors. The size of this dimension becomes 1 while the sizes of all other dimensions remain the same. If A is an empty array whose first dimension has zero length, then max(A) returns an empty array with the same size as A.
Then, using the ...,dim) syntax we can rewrite the code as follows:
rng(3372061);
M = randi(100,3,4,5);
colMax = max(M,[],1);
rowMean = mean(M,2);
sliceSTD = std(reshape(M,1,[],5),0,2); % we use `reshape` to turn each slice into a vector
This has several advantages:
The code is easier to understand.
The code is potentially more robust, being able to handle inputs beyond those it was initially designed for.
The code is likely faster.
In conclusion: it is always a good idea to read the documentation of functions you're using, and experiment with different syntaxes, so as not to miss similar opportunities to make your code more succinct.

how to find mean of columns in nested structure in MATLAB

I've organized some data into a nested structure that includes several subjects, 4-5 trials per subject, then identifying data like height, joint torque over a gait cycle, etc. So, for example:
subject(2).trial(4).torque
gives a matrix of joint torques for the 4th trial of subject 2, where the torque matrix columns represent degrees of freedom (hip, knee, etc.) and the rows represent time increments from 0 through 100% of a stride. What I want to do is take the mean of 5 trials for each degree of freedom and use that to represent the subject (for that degree of freedom). When I try to do it like this for the 1st degree of freedom:
for i = 2:24
numTrialsThisSubject = size(subject(i).trial, 2);
subject(i).torque = mean(subject(i).trial(1:numTrialsThisSubject).torque(:,1), 2);
end
I get this error:
??? Scalar index required for this type of multi-level indexing.
I know I can use a nested for loop to loop through the trials, store them in a temp matrix, then take the mean of the temp columns, but I'd like to avoid creating another variable for the temp matrix if I can. Is this possible?
You can use a combination of deal() and cell2mat().
Try this (use the built-in debugger to run through the code to see how it works):
for subject_k = 2:24
% create temporary cell array for holding the matrices:
temp_torques = cell(length(subject(subject_k).trial), 1);
% deal the matrices from all the trials (copy to temp_torques):
[temp_torques{:}] = deal(subject(subject_k).trial.torque);
% convert to a matrix and concatenate all matrices over rows:
temp_torques = cell2mat(temp_torques);
% calculate mean of degree of freedom number 1 for all trials:
subject(subject_k).torque = mean(temp_torques(:,1));
end
Notice that I use subject_k for the subject counter variable. Be careful with using i and j in MATLAB as names of variables, as they are already defined as 0 + 1.000i (complex number).
As mentioned above in my comment, adding another loop and temp variable turned out to be the simplest execution.

What's an appropriate data structure for a matrix with random variable entries?

I'm currently working in an area that is related to simulation and trying to design a data structure that can include random variables within matrices. To motivate this let me say I have the following matrix:
[a b; c d]
I want to find a data structure that will allow for a, b, c, d to either be real numbers or random variables. As an example, let's say that a = 1, b = -1, c = 2 but let d be a normally distributed random variable with mean 0 and standard deviation 1.
The data structure that I have in mind will give no value to d. However, I also want to be able to design a function that can take in the structure, simulate a uniform(0,1), obtain a value for d using an inverse CDF and then spit out an actual matrix.
I have several ideas to do this (all related to the MATLAB icdf function) but would like to know how more experienced programmers would do this. In this application, it's important that the structure is as "lean" as possible since I will be working with very very large matrices and memory will be an issue.
EDIT #1:
Thank you all for the feedback. I have decided to use a cell structure and store random variables as function handles. To save some processing time for large scale applications, I have decided to reference the location of the random variables to save time during the "evaluation" part.
One solution is to create your matrix initially as a cell array containing both numeric values and function handles to functions designed to generate a value for that entry. For your example, you could do the following:
generatorMatrix = {1 -1; 2 #randn};
Then you could create a function that takes a matrix of the above form, evaluates the cells containing function handles, then combines the results with the numeric cell entries to create a numeric matrix to use for further calculations:
function numMatrix = create_matrix(generatorMatrix)
index = cellfun(#(c) isa(c,'function_handle'),... %# Find function handles
generatorMatrix);
generatorMatrix(index) = cellfun(#feval,... %# Evaluate functions
generatorMatrix(index),...
'UniformOutput',false);
numMatrix = cell2mat(generatorMatrix); %# Change from cell to numeric matrix
end
Some additional things you can do would be to use anonymous functions to do more complicated things with built-in functions or create cell entries of varying size. This is illustrated by the following sample matrix, which can be used to create a matrix with the first row containing a 5 followed by 9 ones and the other 9 rows containing a 1 followed by 9 numbers drawn from a uniform distribution between 5 and 10:
generatorMatrix = {5 ones(1,9); ones(9,1) #() 5*rand(9)+5};
And each time this matrix is passed to create_matrix it will create a new 10-by-10 matrix where the 9-by-9 submatrix will contain a different set of random values.
An alternative solution...
If your matrix can be easily broken into blocks of submatrices (as in the second example above) then using a cell array to store numeric values and function handles may be your best option.
However, if the random values are single elements scattered sparsely throughout the entire matrix, then a variation similar to what user57368 suggested may work better. You could store your matrix data in three parts: a numeric matrix with placeholders (such as NaN) where the randomly-generated values will go, an index vector containing linear indices of the positions of the randomly-generated values, and a cell array of the same length as the index vector containing function handles for the functions to be used to generate the random values. To make things easier, you can even store these three pieces of data in a structure.
As an example, the following defines a 3-by-3 matrix with 3 random values stored in indices 2, 4, and 9 and drawn respectively from a normal distribution, a uniform distribution from 5 to 10, and an exponential distribution:
matData = struct('numMatrix',[1 nan 3; nan 2 4; 0 5 nan],...
'randIndex',[2 4 9],...
'randFcns',{{#randn , #() 5*rand+5 , #() -log(rand)/2}});
And you can define a new create_matrix function to easily create a matrix from this data:
function numMatrix = create_matrix(matData)
numMatrix = matData.numMatrix;
numMatrix(matData.randIndex) = cellfun(#feval,matData.randFcns);
end
If you were using NumPy, then masked arrays would be the obvious place to start, but I don't know of any equivalent in MATLAB. Cell arrays might not be compact enough, and if you did use a cell array, then you would have to come up with an efficient way to find the non-real entries and replace them with a sample from the right distribution.
Try using a regular or sparse matrix to hold the real values, and leave it at zero wherever you want a random variable. Then alongside that store a sparse matrix of the same shape whose non-zero entries correspond to the random variables in your matrix. If you want, the value of the entry in the second matrix can be used to indicate which distribution (ie. 1 for uniform, 2 for normal, etc.).
Whenever you want to get a purely real matrix to work with, you iterate over the non-zero values in the second matrix to convert them to samples, and then add that matrix to your first.

Compact MATLAB matrix indexing notation

I've got an n-by-k sized matrix, containing k numbers per row. I want to use these k numbers as indexes into a k-dimensional matrix. Is there any compact way of doing so in MATLAB or must I use a for loop?
This is what I want to do (in MATLAB pseudo code), but in a more MATLAB-ish way:
for row=1:1:n
finalTable(row) = kDimensionalMatrix(indexmatrix(row, 1),...
indexmatrix(row, 2),...,indexmatrix(row, k))
end
If you want to avoid having to use a for loop, this is probably the cleanest way to do it:
indexCell = num2cell(indexmatrix, 1);
linearIndexMatrix = sub2ind(size(kDimensionalMatrix), indexCell{:});
finalTable = kDimensionalMatrix(linearIndexMatrix);
The first line puts each column of indexmatrix into separate cells of a cell array using num2cell. This allows us to pass all k columns as a comma-separated list into sub2ind, a function that converts subscripted indices (row, column, etc.) into linear indices (each matrix element is numbered from 1 to N, N being the total number of elements in the matrix). The last line uses these linear indices to replace your for loop. A good discussion about matrix indexing (subscript, linear, and logical) can be found here.
Some more food for thought...
The tendency to shy away from for loops in favor of vectorized solutions is something many MATLAB users (myself included) have become accustomed to. However, newer versions of MATLAB handle looping much more efficiently. As discussed in this answer to another SO question, using for loops can sometimes result in faster-running code than you would get with a vectorized solution.
I'm certainly NOT saying you shouldn't try to vectorize your code anymore, only that every problem is unique. Vectorizing will often be more efficient, but not always. For your problem, the execution speed of for loops versus vectorized code will probably depend on how big the values n and k are.
To treat the elements of the vector indexmatrix(row, :) as separate subscripts, you need the elements as a cell array. So, you could do something like this
subsCell = num2cell( indexmatrix( row, : ) );
finalTable( row ) = kDimensionalMatrix( subsCell{:} );
To expand subsCell as a comma-separated-list, unfortunately you do need the two separate lines. However, this code is independent of k.
Convert your sub-indices into linear indices in a hacky way
ksz = size(kDimensionalMatrix);
cksz = cumprod([ 1 ksz(1:end-1)] );
lidx = ( indexmatrix - 1 ) * cksz' + 1; #'
% lindx is now (n)x1 linear indices into kDimensionalMatrix, one index per row of indexmatrix
% access all n values:
selectedValues = kDimensionalMatrix( lindx );
Cheers!