how to store and retrieve multiple decision trees in matlab - matlab

I have a for loop that generates a single decision tree each time and later on in the program I need to apply all the decision trees to the testing data (the decision trees are NOT combined in an ensemble). I tried to store them in an array of structures but when I am applying them to the test data I have the following error:
(Undefined function 'predict' for input arguments of type 'struct'.).
I know that the tree generated is an object but how you can store and retrieve multiple objects in MATLAB?

Usually the arrays of objects would do (the only thing is that the class needs a default constructor in order to allocate the space).
So, the safest way is to use a cell array. The code would be something along the lines of:
%'"N" is the size of your problem'
dtrees = cell(1,N);
for k = 1:N
%'Create the decision tree "obj"'
%'...'
dtrees{k} = obj;
end;
%'...'
%'Later iterate in the cell array'
for k = 1:N
obj = dtrees{k};
%'Now do stuff with "obj"'
%'...'
end;

You use cell arrays.
http://uk.mathworks.com/help/matlab/matlab_prog/create-a-cell-array.html
they can store pretty much whatever.

Related

How to loop through multiple structures and perform the same operations [Matlab]

I am trying to loop through multiple structures at once, extract variables of interest, and combine them into a single cell array. The problem: all of the variables have the same names. I have a working pseudocode--here it is:
Let's say I load i structures in my workspace. Now I want to loop through each structure, and extract time and position data from each structure.
First, I load my structures. Something like...
data_1
data_2
data_3
Then, I create appropriately sized cell arrays.
time{i,:} = zeros(size(structures));
position{i,:} = zeros(size(structures));
Finally, I loop through my structures to extract cell arrays and create a single array.
for i = 1:size(structures)
time_i= data_i.numbers.time;
position_i= data_i.numbers.position;
time {i,:} = time_i;
position{i,:} = position_i;
end
I want to end with a cell array containing a concatenation of all the variables in a single cell structure.
Could you please help convert my pseudo code/ideas into a script, or point me to resources that might help?
Thanks!
You're likely going to be better off loading your data internal to the loop and storing it into a cell or structure rather than trying to deal with iteratively named variables in your workspace. eval is, in nearly all cases, a significant code smell, not least of which because MATLAB's JIT compiler ignores eval statements so you get none of the engine's optimizations. eval statements are also difficult to parse, debug, and maintain.
An example of a stronger approach:
for ii = 1:nfiles
tmp = load(thefilenames{ii}); % Or use output of dir
trialstr = sprintf('trial_%u', ii); % Generate trial string
data.(trialstr).time = tmp.numbers.time;
data.(trialstr).position = tmp.numbers.position;
end
Which leaves you with a final data structure of:
data
trial_n
time
position
Which is far easier to iterate through later.
My final script for anyone interested:
for i = 1:4 %for 4 structures that I am looping through
eval(['time_',num2str(i),'= data_',num2str(i),'.numbers.time;']);
eval(['position_',num2str(i),'= data_',num2str(i),'.numbers.position;']);
%concatenate data into a single cell array here
time{i} = {eval(['time_',num2str(i)])};
position{i} = {eval(['position_',num2str(i)])};
end
...
eval(['time_',num2str(i),'= data_',num2str(i),'.numbers.time;'])
eval(['position_',num2str(i),'= data_',num2str(i),'.numbers.position;'])
...

error in array of struct matlab

I want to train data on various labels using svm and want svm model as array of struct. I am doing like this but getting the error:
Subscripted assignment between dissimilar structures.
Please help me out
model = repmat(struct(),size);
for i=1:size
model(i) = svmtrain(train_data,labels(:,i),'Options', options);
end
A structure array in MATLAB can only contain structures with identical fields. By first creating an array of empty structures1 and then trying to fill it with SVMStruct structures, you try to create a mixed array with some empty structures, and some SVMStruct structures. This is not possible, thus MATLAB complains about "dissimilar structures" (not-equal structures: empty vs. SVMStruct).
To allocate an array of structs, you will have to specify all fields during initialization and fill them all with initial values - which is rather inconvenient in this case. A very simple alternative is to drop this initialization, and run your loop the other way around2,3:
for ii=sizeOfLabels:-1:1
model(ii) = svmtrain(train_data,labels(:,ii),'Options', options);
end
That way, for ii=sizeOfLabels, e.g. ii=100, MATLAB will call model(100)=..., while model doesn't exist yet. It will then allocate all space needed for 100 SVMStructs and fill the first 99 instances with empty values. That way you pre-allocate the memory, without having to worry about initializing the values.
1Note: if e.g. size=5, calling repmat(struct(),size) will create a 5-by-5 matrix of empty structs. To create a 1-by-5 array of structs, call repmat(struct(),1,size).
2Don't use size as a variable name, as this is a function. If you do that, you can't use the size function anymore.
3i and j denote the imaginary unit in MATLAB. Using them as a variable slows the code down and is error-prone. Use e.g. k or ii for loops instead.

Preallocating a structure with variables of different names, size and type

I am going to use an algorithm in a for loop as an iteration loop. I know that some of the calculations can be done once and the their results can be used as necessary inputs for the algorithm in the for loop so in this way there is no need to calculate the same things in each iteration. To do so, I calculate them once and put them in a structure.
I use structure since I have many variables to be kept to be used in the for loop and their name and size are different. I put them in the structure with the same name for example:
out.A = A;
out.myvector = myvector;
out.s = s;
out.Hx_l = Hx_l;
and so on. some of them are matrices, some of them cubes or variables with a forth dimension and some of them are cells.
Is there any way to preallocate this kind of structure?
You could initialise the structure the following way:
out = struct('A',[],'myvector',[],'s',[],'Hx',[]);
When you assign the variables afterwards, the structure fields will be already created. Usually the contents are not initialised upfront. Quoting Loren:
How important is it to initialize the contents of the struct. Of
course it depends on your specifics, but since each field is its own
MATLAB array, there is not necessarily a need to initialize them all
up front. The key however is to try to not grow either the struct
itself or any of its contents incrementally.

Matlab: Matrix of ClassificationKNN class objects

For classification I'm building a number of models for a classifier in MATLAB. I use the class ClassificationKNN for this.
I would very much like to store multiple models (or objects of this class) inside a matrix.
Normally you could access and create matrices inside a matrix with the curly braces ({}).
My loop looks like this:
models = []
for i = 1:length(x)
models = [models, {ClassificationKNN.fit(x,y)}]
end
Unfortunately this returns a matrix models of size (1,3) but all cells are empty which means the models are lost...
How can I make sure every model is stored in a matrix? I need to do this because I need all models later in my calculations and the position in the matrix is important...
Any ideas?
You want a cell array of models, right? It sure looks that way, if that will work try this:
models = {}
for ii = 1:length(x)
models = [models, {ClassificationKNN.fit(x,y)}]
end
Also, you loop through calling ClassificationKNN.fit(x,y) with the same arguments every time, is this just a test, or pseudo-code for an example. Like the comment says, it's best to preallocate like:
models = cell(length(x),1);
for ii = 1:length(x)
models{ii} = ClassificationKNN.fit(x,y);
end
But, either way is likely fine.
Thanks to macduffs post I finally figured out what was going on. Whilest reading his proposition I realised that that indeed should be the correct way if getting a cell array of objects.
After trying it, the array again seemed empty when opening it in the variable editor. I tried calling the first cell in the array to see if it was indeed empty and it was not. It returned the object I had stored in it. This means the question was answered.
I then reverted back to my own method to see if that worked as well and it did. When calling a cell it also returned an object.
Bottom line:
Do not trust the variable editor ^^.

In-Place Quicksort in matlab

I wrote a small quicksort implementation in matlab to sort some custom data. Because I am sorting a cell-array and I need the indexes of the sort-order and do not want to restructure the cell-array itself I need my own implementation (maybe there is one available that works, but I did not find it).
My current implementation works by partitioning into a left and right array and then passing these arrays to the recursive call. Because I do not know the size of left and and right I just grow them inside a loop which I know is horribly slow in matlab.
I know you can do an in place quicksort, but I was warned about never modifying the content of variables passed into a function, because call by reference is not implemented the way one would expect in matlab (or so I was told). Is this correct? Would an in-place quicksort work as expected in matlab or is there something I need to take care of? What other hints would you have for implementing this kind of thing?
Implementing a sort on complex data in user M-code is probably going to be a loss in terms of performance due to the overhead of M-level operations compared to Matlab's builtins. Try to reframe the operation in terms of Matlab's existing vectorized functions.
Based on your comment, it sounds like you're sorting on a single-value key that's inside the structs in the cells. You can probably get a good speedup by extracting the sort key to a primitive numeric array and calling the builtin sort on that.
%// An example cell array of structs that I think looks like your input
c = num2cell(struct('foo',{'a','b','c','d'}, 'bar',{6 1 3 2}))
%// Let's say the "bar" field is what you want to sort on.
key = cellfun(#(s)s.bar, c) %// Extract the sort key using cellfun
[sortedKey,ix] = sort(key) %// Sort on just the key using fast numeric sort() builtin
sortedC = c(ix); %// ix is a reordering index in to c; apply the sort using a single indexing operation
reordering = cellfun(#(s)s.foo, sortedC) %// for human readability of results
If you're sorting on multiple field values, extract all the m key values from the n cells to an n-by-m array, with columns in descending order of precedence, and use sortrows on it.
%// Multi-key sort
keyCols = {'bar','baz'};
key = NaN(numel(c), numel(keyCols));
for i = 1:numel(keyCols)
keyCol = keyCols{i};
key(:,i) = cellfun(#(s)s.(keyCol), c);
end
[sortedKey,ix] = sortrows(key);
sortedC = c(ix);
reordering = cellfun(#(s)s.foo, sortedC)
One of the keys to performance in Matlab is to get your data in primitive arrays, and use vectorized operations on those primitive arrays. Matlab code that looks like C++ STL code with algorithms and references to comparison functions and the like will often be slow; even if your code is good in O(n) complexity terms, the fixed cost of user-level M-code operations, especially on non-primitives, can be a killer.
Also, if your structs are homogeneous (that is, they all have the same set of fields), you can store them directly in a struct array instead of a cell array of structs, and it will be more compact. If you can do more extensive redesign, rearranging your data structures to be "planar-organized" - where you have a struct of arrays, reading across the ith elemnt of all the fields as a record, instead of an array of structs of scalar fields - could be a good efficiency win. Either of these reorganizations would make constructing the sort key array cheaper.
In this post, I only explain MATLAB function-calling convention, and am not discussing the quick-sort algorithm implementation.
When calling functions, MATLAB passes built-in data types by-value, and any changes made to such arguments are not visible outside the function.
function y = myFunc(x)
x = x .* 2; %# pass-by-value, changes only visible inside function
y = x;
end
This could be inefficient for large data especially if they are not modified inside the functions. Therefore MATLAB internally implements a copy-on-write mechanism: for example when a vector is copied, only some meta-data is copied, while the data itself is shared between the two copies of the vector. And it is only when one of them is modified, that the data is actually duplicated.
function y = myFunc(x)
%# x was never changed, thus passed-by-reference avoiding making a copy
y = x .* 2;
end
Note that for cell-arrays and structures, only the cells/fields modified are passed-by-value (this is because cells/fields are internally stored separately), which makes copying more efficient for such data structures. For more information, read this blog post.
In addition, versions R2007 and upward (I think) detects in-place operations on data and optimizes such cases.
function x = myFunc(x)
x = x.*2;
end
Obviously when calling such function, the LHS must be the same as the RHS (x = myFunc(x);). Also in order to take advantage of this optimization, in-place functions must be called from inside another function.
In MEX-functions, although it is possible to change input variables without making copies, it is not officially supported and might yield unexpected results...
For user-defined types (OOP), MATLAB introduced the concept of value object vs. handle object supporting reference semantics.