In MATLAB how to get the result of a classification tree in a matrix?

In MATLAB how to get the result of a classification tree in a matrix? - matlab

I made a Classification Tree, code:
mytree=ClassificationTree.fit(MyData,MyLables);
mytree.view('mode','graph');
My data has two classes and I want to get the result of prediction as a matrix that can show me every data row is belongs to which as an example.
data row predicted class
1 2
2 1
. .
. .
. .
how can i make this matrix?
---------------------Edited----------------------
I found that with this function I can predict my data:
label = predict(Mdl,MyData([1:50],:));
but this labels are belong to which rows?

The first column, i.e. 'data row', is simply a vector starting from 1 to number of rows of X (which is obviously also the same as number of values in Y). The second column, i.e. 'predicted class', is the same as the variable MyLables. Hence:
ReqResult = [(1:numel(Y)).' Y];
%Assuming Y is a column vector (order = nx1).
%If Y is a row vector then take the transpose of Y as well.
Warning:
If you're using ≥ R2014a, you should use fitctree instead of ClassificationTree.fit because as mentioned in the documentation:
ClassificationTree.fit will be removed in a future release. Use fitctree instead.

Related

MatLab: Create matrix row element if row elements of two previous matrices are identical

Sorry for the title. I could not think of something better.
I have the following problem.
I have two four-column matrices build up like this:
Property | X | Y | Z
The two matrices have different sizes, since matrix 1 has a large amount of additional rows compared to matrix 2.
What I want to do is the following:
I need to create a third matrix that only features those rows (of the large matrix) that are identical in columns X, Y and Z to rows in matrix2(the property column is always different).
I tried an if-statement but it did not really work out due to my programming syntax. Has somebody a tip?
Thank you!
I tried something like this: (in this case A is the larger matrix and I want its property column for X,Y,Z-positions that are identical to another matrix B.. I am terrible with the MatLab-syntax..
if (A(:,2) == B(:,2) and (A(:,3) == B(:,3) and (A(:,4) == B(:,4))
newArray(:,1) = A(:,1);
end

Use ismember with the 'rows' option to find the desired rows, and then use that as an index to build the result:
ind = ismember(A(:,2:4), B(:,2:4), 'rows');
C = A(ind,:);
I have assumed that a row of A is selected if its last three columns match those of any row of B.

Matlab calculate outliers from data and time they occur

In Matlab I have a large matrix A. The first column of the matrix contains a time in seconds. The second to 13th column contain results from a calculation. For each column (except the first) I calculated the whisker by:
quantile(A,[.75])-1.5*(quantile(A,[.75])-quantile(A,[.25]))
Now I would like to now how many outliers (= values below whisker) there are in each column, and when they occur. This will give me the ability to calculate how much the outliers are spread over time.
I prefer to create a loop which gives me 12 martices containing two columns. The second column should contain the values of the outliers (= values of cells below whisker) without any zero's in between, and the first column should contain the time at which a outlier occurs (chronologically).
How can I create this?
regards,
Vincent

let,
A =
0.6260 0.7690 0.1209 0.5523 0.0495
0.6609 0.5814 0.8627 0.6299 0.4896
0.7298 0.9283 0.4843 0.0320 0.1925
0.8908 0.5801 0.8449 0.6147 0.1231
0.9823 0.0170 0.2094 0.3624 0.2055
for second column:
B = quantile(A(:,2),[.75])-1.5*(quantile(A(:,2),[.75])-quantile(A(:,2),[.25]))
Then,
index = find(A(:,2) < B)
value_outliner = A(index,2)
outliner_time = A(index,1)

No need to loop: use matrix operations and logical indexing instead.
Assuming you have a matrix A and outlier threshold thr is a 1x12 vector with the threshold for each column:
vals = A(:,2:13);
outliers = bsxfun(#lt, vals, thr); #% #lt is 'less than' function handle
#% outliers is a Nx12 logical matrix with true(1) where the value < threshold
#% and false(0) otherwise.
To get the time when these outliers occurred (for a given column, let's say column 2 of the data portion of the original matrix):
t = A(outliers(:,2), 1);
#% ^____________ logical index of rows where outliers occurred in that column
You can also easily get the number of outliers in each column (or row) by summing:
num_outliers = sum(outliers,1);

replace all numbers in a matrix

There are two matrices; the first one is my input matrix
and the second one ("renaming matrix") is used to replace the values of the first one
That is, looking at the renaming matrix; 701 must be replaced by 1,...,717 must be replaced by 10,etc.. such that the input matrix becomes as such
The ? values are defined but i didn't put them. The second column of the input matrix is already sorted(ascending order from top down) but the values are not consecutive(no "710": see first pic).
The question is how to get the output matrix(last pic) from the first two.

Looks to me like it's screaming for a sparse matrix solution. In matlab you can create a sparse matrix with the following command:
SM = sparse( ri, ci, val );
where ri is the row index of the non-zero elements, ci is the corresponding column index, and val is the values.
Let's call your input matrix IM and your lookup matrix LUM, then we construct the sparse matrix:
nr = size(LUM, 1);
SM = sparse( ones(nr, 1), LUM(:, 1), LUM(:, 2) );
Now we can get your result in a single line:
newMatrix = reshape(SM(1, IM), size(IM));
almost magic.
I didn't have a chance to check this tonight - but if it doesn't work exactly as described, it should be really really close...

If the values in the first column all appear in the second column, and if all you want is replace the values in the second column by 1..n and change the values in the first column accordingly, you can do all of this with a simple call to ismember:
%# define "inputMatrix" here as the first array in your post
[~,newFirstColumn] = ismember(inputMatrix(:,1),inputMatrix(:,2));
To create your output, you'd then write
outputMatrix = [newFirstColumn,(1:length(newFirstColumn))'];

If M is the original matrix and R is the renaming matrix, here's how you do it
N = M;
for n = 1:size(M,1)
N(find(M==R(n,1))) = R(n,2);
end
Note that in this case you're creating a new matrix N with the renamed values. You don't have to do that if you like.

How to master the random generator in MATLAB for application to bootstrap artificial neural network models

I am working with hydrological time series data and I am attempting to construct Bootstrap Artificial Neural Network models. In order to provide an uncertainty assessment using confidence intervals, one must make sure when resampling/Bootstrapping the original time series data set, that every value in the original time series is held back at least twice within all bootstrap samples in order to calculate the variance and confidence intervals at that point in time.
To give some background:
I am using a hydrological time series that contains Standard Precipitation Index values at monthly time steps, this time series spans 429 (rows) x 1 (column), let's call this time series vector X. All elements/values of X are normalized and standardized between 0 and 1.
Time series X is then trained against some Target values (same length and conditions as X) in a Neural Network to produce new estimates of the Target values, we'll call this output vector, O (same length and conditions as X).
I am now to take X and resample it ii =1:1:200 times (i.e. Bootstrap size = 200) for length(429) with replacement. Let's call the matrix where all the bootstrap samples are placed, M. I use B = randsample(X, length(X), true) and fill M using a for loop such that M(:,ii) = B. Note: I also make sure to include rng('shuffle') after my randsample statement to keep the RNG moving to new states in hopes that it will provide more random results.
Now I am to test how "well" my data was resampled for use in creating confidence intervals.
My procedure is as follow:
Generate a for loop to create M using above procedure
Create a new variable Xc, this will hold all values of X that were not resampled in bootstrap sample ii for ii = 1:1:200
For j=1:1:length(X) fill 'Xc' using the Xc(j,ii) = setdiff(X, M(:,ii)), if element j exists in M(:,ii) fill Xc(j,ii) with NaN.
Xc is now a matrix the same size and dimensions as M. Count the amount of NaN values in each row of Xc and place in vector CI.
If any row in CI is > [Bootstrap sample size, for this case (200) - 1], then no confidence interval can be created at this point.
When I run this I find that the values chosen from my set X are almost always repeated, i.e. the same values of X are used to generate all the samples in M. It's roughly the same ~200 data points in my original time series that are always chosen to create the new bootstrap samples.
How can I effectively alter my program or use any specific functions that will allow me to avoid the negative solution in (5)?
Here is an example of my code, but please keep in mind the variables used in the script may differ from my text in here.
Thank you for the help and please see the code below.
for ii = 1:1:Blen % for loop to create 'how many bootstraps we desire'
B = randsample(Xtrain, wtrain, true); % bootstrap resamples of data series 'X' for 'how many elements' with replacement
rng('shuffle');
M(:,ii) = B; % creates a matrix of all bootstrap resamples with respect to the amount created by the for loop
[C,IA] = setdiff(Xtrain,B); % creates a vector containing all elements of 'Xtrain' that were not included in bootstrap sample 'ii' and the location of each element
[IAc] = setdiff(k,IA); % creates a vector containing locations of elements of 'Xtrain' used in bootstrap sample 'ii' --> ***IA + IAc = wtrain***
for j = 1:1:wtrain % for loop that counts each row of vector
if ismember(j,IA)== 1 % if the count variable is equal to a value of 'IA'
XC(j,ii) = Xtrain(j,1); % place variable in matrix for sample 'ii' in position 'j' if statement above is true
else
XC(j,ii) = NaN; % hold position with a NaN value to state that this value has been used in bootstrap sample 'ii'
end
dum1(:,ii) = wtrain - sum(isnan(XC(:,ii))); % dummy variable to permit transposing of 'IAs' limited by 'isnan' --> used to calculate amt of elements in IA
dum2(:,ii) = sum(isnan(XC(:,ii))); % dummy variable to permit transposing of 'IAsc' limited by 'isnan'
IAs = transpose(dum1) ; % variable counting amount of elements not resampled in 'M' at set 'i', ***i.e. counts 'IA' for each resample set 'i'
IAsc = transpose(dum2) ; % variable counting amount of elements resampled in 'M' at set 'i', ***i.e. counts 'IAc' for each resample set 'i'
chk = isnan(XC); % returns 1 in position of NaN and 0 in position of actual value
chks = sum(chk,2); % counts how many NaNs are in each row for length of time training set
chks_cnt = sum(chks(:)<(Blen-1)); % counts how many values of the original time series that can be provided a confidence interval, should = wtrain to provide complete CIs
end
end

This doesn't appear to be a problem with randsample, but rather a problem in your other code somewhere. randsample does the right thing. For example:
x = (1:10)';
nSamples = 10;
for iter = 1:100;
data(:,iter) = randsample(x,nSamples ,true);
end;
hist(data(:)) %this is approximately uniform
randsample samples quite randomly...

what the mean of the following MATLAB code

please help me to understand this code:
x(:,i) = mean( (y(:,((i-1)*j+1):i*j)), 2 )';
i can't find it in my book. thanks.

The code you posted can be made more readable using temporary variables:
a = (i-1)*j+1;
b = i*j;
val = y(:,a:b);
x(:,i) = mean( val, 2 )'; %# =mean( val' )
What exactly you do not understand? For meaning of mean , : and ' consult matlab help.

It would help if you said exactly what you don't understand, but here are a few tips:
if you have something like a(r,c), that means matrix a, row r, column c (always in this order). In other words, you should have two elements inside the brackets separated by a comma where the first represents the row, the second the column.
If you have : by itself in one of the sides of the comma, that means "all". Thus, if you had a(r,:), then you would have matrix a, row r, all columns.
If : is not alone in one of the sides of the comma, then it will mean "to". So if you have a(r, z:y), that means matrix a, row r, columns z to y.
Mean = average. The format of the function in Matlab is M = mean(A,dim). A will be the matrix you take the average (or mean) of, M will be the place where the results are going to go. If dim = 1, you will get a row vector with each element being the average of a column. If dim = 2 (as it is in your case), then you should get a column vector, with each element being the average of a row. Be careful, though, because at the end of your code you have ', which means transpose. That means that your column vector will be transformed into a row vector.
OK, so your code:
x(:,i) = mean( (y(:,((i-1)*j+1):i*j)), 2 )';
Start with the bit inside, that is
y(:,((i-1)*j+1):i*j)
So that is saying
matrix y(r,c)
where
r (row) is :, that is, all rows
c (column) is ((i-1)j+1):ij, that is, columns going from (i-1)j+1 until ij
Your code will then get the matrix resulting from that, which I called y(r,c), and will do the following:
mean( (y(r,c), 2 )
so get the result from above and take the mean (average) of each row. As your code has the ' afterwards, that is, you have:
mean( (y(r,c), 2 )'
then it will get the column vector and transform into a row vector. Each element of this row will be the average of a row of y(r,c).
Finally:
x(:,i) = mean( (y(r,c), 2 )';
means that the result of the above will be put in column i of matrix x.
Shouldn't this be x(i,:) instead?

The i-th column of the array x is the average of the i-th group of j columns of the array y.
For example, if i is 1 and j is 3, the 1st column of x is the average of the first three columns of y.