How to access the variable range for leaf node in classregtree of matlab? - matlab

I used classregtree to fit a tree to my data set in order to classify the data. All of predictors and the response are quantitative. I want to save the range of each variable on terminal nodes, because I am gonna use those ranges in another function.
So is there any way that I can have access to those ranges? I can see the variable ranges in view(tree) plot but I need to save them in like a matrix to use them.

I am not totally sure that this is what you were asking for but this gives you the split criterions for all trees
B = TreeBagger(nTrees,M,tag, 'Method', 'classification','OOBPred','on');
view(B.Trees{1:B.NTrees})
where M is your trainig data set and tag are the classes.

Related

How to use binning method for identifying the incoming point belongs to which bin?

I have small query. I have two data sets. In one data sets for example I did binning and calculated the mean value and std value along with group binning. Now in I have second data sets of same parameters say X. I would like identify this X data sets belong to which bin groups of my previous data sets using matlab.
Could you give some example how to identify the incoming data points belongs to which bin group...??
I used following binning which is available in matlab :
binEdges = linspace(botEdge, topEdge, numBins+1);
[h,whichBin] = histc(x, binEdges);
Well... you already have your bin edges. Anything inside specific edges is in that bin.
If you know that the data is inside the ranges you defined then, for each new data
newdatabin=find(newdata>binedges,1,'last'); %this is the bin number where the new data goes in
h(newdatabin)=h(newdatabin)+1; %add one!
Also consider using histcounts if your MATLAB version is new enough.

Splitting non-continuous sized matrix in vectors

I'm writing an piece of software within Matlab. Here, the user can define a dimension say 3.
This dimension is subsequently the number of iterations of a for loop. Within this loop, I construct a matrix to store the results which are generated during every iteration. So, the data of every iteration is stored in a row of a matrix.
Therefore, the size of the matrix depends on the size of the loop and thus the user input.
Now, I want to separate each row of this matrix (cl_matrix) and create separate vectors for every row automatically. How would one go on about? I am stuck here...
So far I have:
Angle = [1 7 15];
for i = 1:length(Angle)
%% do some calculations here %%
cl_matrix(i,:) = A.data(:,7);
end
I want to automate this based on the length of Angle:
length(Angle)
cl_1 = cl_matrix(1,:);
cl_7 = cl_matrix(2,:);
cl_15= cl_matrix(3,:);
Thanks!
The only way to dynamically generate in the workspace variables variables whos name is built by aggregating string and numeric values (as in your question) is to use the eval function.
Nevertheless, eval is only one character far from "evil", seductive as it is and dangerous as it is as well.
A possible compromise between directly working with the cl_matrix and generating the set of array cl_1, cl_7 and cl_15 could be creating a structure whos fields are dynamically generated.
You can actually generate a struct whos field are cl_1, cl_7 and cl_15 this way:
cl_struct.(['cl_' num2str(Angle(i))])=cl_matrix(i,:)
(you might notice the field name, e. g. cl_1, is generated in the same way you could generate it by using eval).
Using this approach offers a remarkable advantage with respect to the generation of the arrays by using eval: you can access to the field od the struct (that is to their content) even not knowing their names.
In the following you can find a modified version of your script in which this approach has been implemented.
The script generate two structs:
the first one, cl_struct_same_length is used to store the rows of the cl_matrix
thesecond one, cl_struct_different_length is used to store arrays of different length
In the script there are examples on how to access to the fileds (that is the arrays) to perform some calculations (in the example, to evaluate the mean of each of then).
You can access to the struct fields by using the functions:
getfield to get the values stored in it
fieldnames to get the names (dynamically generated) of the field
Updated script
Angle = [1 7 15];
for i = 1:length(Angle)
% do some calculations here %%
% % % cl_matrix(i,:) = A.data(:,7);
% Populate cl_matrix
cl_matrix(i,:) = randi(10,1,10)*Angle(i);
% Create a struct with dinamic filed names
cl_struct_same_length.(['cl_' num2str(Angle(i))])=cl_matrix(i,:)
cl_struct_different_length.(['cl_' num2str(Angle(i))])=randi(10,1,Angle(i))
end
% Use "fieldnames" to get the names of the dinamically generated struct's field
cl_fields=fieldnames(cl_struct_same_length)
% Loop through the struct's fileds to perform some calculation on the
% stored values
for i=1:length(cl_fields)
cl_means(i)=mean(cl_struct_same_length.(cl_fields{i}))
end
% Assign the value stored in a struct's field to a variable
row_2_of_cl_matrix=getfield(cl_struct_different_length,(['cl_' num2str(Angle(2))]))
Hope this helps.

Cannot get clustering output Mahout

I am running kmeans in Mahout and as an output I get folders clusters-x, clusters-x-final and clusteredPoints.
If I understood well, clusters-x are centroid locations in each of iterations, clusters-x-final are final centroid locations, and clusteredPoints should be the points being clustered with cluster id and weight which represents probability of belonging to cluster (depending on the distance between point and its centroid). On the other hand, clusters-x and clusters-x-final contain clusters centroids, number of elements, features values of centroid and the radius of the cluster (distance between centroid and its farthest point.
How do I examine this outputs?
I used cluster dumper successfully for clusters-x and clusters-x-final from terminal, but when I used it clusteredPoints, I got an empty file? What seems to be the problem?
And how can I get this values from code? I mean, the centroid values and points belonging to clusters?
FOr clusteredPoint I used IntWritable as key, and WeightedPropertyVectorWritable for value, in a while loop, but it passes the loop like there are no elements in clusteredPoints?
This is even more strange because the file that I get with clusterDumper is empty?
What could be the problem?
Any help would be greatly appreciated!
I believe your interpretation of the data is correct (I've only been working with Mahout for ~3 weeks, so someone more seasoned should probably weigh in on this).
As far as linking points back to the input that created them I've used NamedVector, where the name is the key for the vector. When you read one of the generated points files (clusteredPoints) you can convert each row (point vector) back into a NamedVector and retrieve the name using .getName().
Update in response to comment
When you initially read your data into Mahout, you convert it into a collection of vectors with which you then write to a file (points) for use in the clustering algorithms later. Mahout gives you several Vector types which you can use, but they also give you access to a Vector wrapper class called NamedVector which will allow you to identify each vector.
For example, you could create each NamedVector as follows:
NamedVector nVec = new NamedVector(
new SequentialAccessSparseVector(vectorDimensions),
vectorName
);
Then you write your collection of NamedVectors to file with something like:
SequenceFile.Writer writer = new SequenceFile.Writer(...);
VectorWritable writable = new VectorWritable();
// the next two lines will be in a loop, but I'm omitting it for clarity
writable.set(nVec);
writer.append(new Text(nVec.getName()), nVec);
You can now use this file as input to one of the clustering algorithms.
After having run one of the clustering algorithms with your points file, it will have generated yet another points file, but it will be in a directory named clusteredPoints.
You can then read in this points file and extract the name you associated to each vector. It'll look something like this:
IntWritable clusterId = new IntWritable();
WeightedPropertyVectorWritable vector = new WeightedPropertyVectorWritable();
while (reader.next(clusterId, vector))
{
NamedVector nVec = (NamedVector)vector.getVector();
// you now have access to the original name using nVec.getName()
}
check the parameter named "clusterClassificationThreshold".
clusterClassificationThreshold should be 0.
You can check this http://mail-archives.apache.org/mod_mbox/mahout-user/201211.mbox/%3C50B62629.5020700#windwardsolutions.com%3E

MATLAB Saving and Loading Feature Vectors

I am trying to load feature vectors into classifiers such as a k-nearest neighbors classifier.
I have my code for GLCM, so I get contrast, correlation, energy, homogeneity in numbers (feature vectors).
My question is, how can I save every set of feature vectors from all the training images? I have seen somewhere that people had a .set file to load into classifiers (may be it is a special case for the particular classifier toolbox).
load 'mydata.set';
for example.
I suppose it does not have to be a .set file.
I'd just need a way to store all the feature vectors from all the training images in a separate file that can be loaded.
I've google,
and I found this that may be useful
but I am not entirely sure.
Thanks for your time and help in advance.
Regards.
If you arrange your feature vectors as the columns of an array called X, then just issue the command
save('some_description.mat','X');
Alternatively, if you want the save file to be readable, say in ASCII, then just use this instead:
save('some_description.txt', 'X', '-ASCII');
Later, when you want to re-use the data, just say
var = {'X'}; % <-- You can modify this if you want to load multiple variables.
load('some_description.mat', var{:});
load('some_description.txt', var{:}); % <-- Use this if you saved to .txt file.
Then the variable named 'X' will be loaded into the workspace and its columns will be the same feature vectors you computed before.
You will want to replace the some_description part of each file name above and instead use something that allows you to easily identify which data set's feature vectors are saved in the file (if you have multiple data sets). Your array of feature vectors may also be called something besides X, so you can change the name accordingly.

MATLAB/Simulink - programmatically supply multiple external inputs

I have the following Simulink model:
I would like to externally provide inputs u[k] and y[k], i.e., I will be running simulations via MATLAB command line. I found previously that I could set the [LoadExternalInput and ExternalInput][3] options, and they default to the vector [t u].
But my u[k] and y[k] are vectors, and it looks like the ExternalInput can only specify one vector. So each row of [t u] is the value of the entire vector u at time t.
The sizes of u[k] and y[k] in my model here are not necessarily known ahead of time. Is there a way to pass in these vectors (as structs, perhaps)?
From Importing Data to Root-Level Input Ports I've found that I could do something like
sim('myModel', 'LoadExternalInput', 'on', 'ExternalInput', 'u, y');
where u and y are structures with fields time, signals.values, and signals.dimensions; each row of signals.values is a vector corresponding to a n element of time. signals.dimensions is the dimension of signals.values. I have to manually set the In1 ports to expect the same dimension as u.signals.values (well, I can of course do it programatically before hand..):
Note You must set the Port dimensions parameter of the Inport or the Trigger block to be the same value as the dimensions field of the corresponding input structure. If the values differ, an error message is displayed when you try to simulate the model.
(from "Importing Data Structures to a Root-Level Input Port")
What's the point of setting signals.dimensions if I have to set the dimension on the In1 block manually anyhow? Anyway that might have to be how I do this: just examine u and y before running the simulation, then setting the Inblock properties (programatically, of course) to expect vectors of that length.
I am still hoping there is a more elegant solution for this.