Training a Decision Tree in MATLAB over binary training data

I want to train a decision tree in MATLAB for binary data. Here is a sample of data I use.
traindata <87*239> [array of data with 239 features]
1 0 1 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 0 0 0 1 1 0 ... [till 239]
1 1 1 0 0 0 1 0 0 0 1 0 1 0 1 1 0 0 1 0 0 0 1 0 1 ... [till 239]
....
The data comes from a form that has only yes/no options. The outcome of the form is also binary: it indicates whether or not a patient has a certain medical disorder. We used a classification tree, but the classifier shows us double-valued thresholds. For example, it branches at the first node on whether the value of x137 is greater than 0.75. Since 0.75 never occurs in our data and has no yes/no meaning, we want a decision tree that suits our data better: one that is trained on boolean variables rather than doubles, that understands the data is not continuous, and that, instead of the representation above, shows x137 as yes or no (1 or 0). Can someone help me with this? If a boolean decision tree is not applicable, I would also appreciate a way to map our data to double variables and features. I am currently using classregtree in MATLAB with <87*237> as training data and <87*1> as results.

classregtree has an optional input parameter categorical. Using this option, you can pass in a vector indicating which of your input variables are categorical (in your case, this vector would be 1x239, all ones). The decision tree should then contain yes/no decisions rather than numerical thresholds.

From the help of classregtree:
t = classregtree(X,y) creates a decision tree t for predicting the response y as a function of the predictors in the columns of X. X is an n-by-m matrix of predictor values. If y is a vector of n response values, classregtree performs regression. If y is a categorical variable, character array, or cell array of strings, classregtree performs classification.
What's the type of y in your case? It seems that classregtree is doing regression in your case but you want classification. So, y should be a categorical variable.
EDIT: To make your y categorical, you can try "nominal(y)".
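The two suggestions above (the categorical option and a categorical response) can be combined roughly as follows. This is only a sketch: the data is synthetic, the response is simply copied from column 137 for illustration, and it assumes a Statistics Toolbox release in which classregtree and nominal are still available. The predictors are marked categorical by column index, and 'method','classification' is passed explicitly.

```matlab
% Synthetic stand-in for the real data (assumption): 87 forms, 239 yes/no answers.
rng(0);
X = double(rand(87, 239) > 0.5);
y = X(:, 137);                      % illustrative response, not real outcomes

% Treat every column as an unordered categorical predictor and make the
% response categorical so that classification, not regression, is performed.
t = classregtree(X, nominal(y), ...
    'categorical', 1:size(X, 2), ...
    'method', 'classification');

yhat = eval(t, X);                  % predicted class labels for the training rows
view(t);                            % inspect the splits graphically
```

With the predictors declared categorical, the splits should be expressed in terms of category membership rather than thresholds such as 0.75.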

Related

Matlab: Covariance Matrix from matrix of combinations using E(X) and E(X^2)

I have a set of independent binary random variables (say A,B,C) which take a positive value with some probability and zero otherwise, for which I have generated a matrix of 0s and 1s of all possible combinations of these variables with at least a 1 i.e.
A B C
1 0 0
0 1 0
0 0 1
1 1 0
etc.
I know the values and probabilities of A,B,C so I can calculate E(X) and E(X^2) for each. I want to treat each combination in the above matrix as a new random variable equal to the product of the random variables which are present in that combination (show a 1 in the matrix). For example, random variable Row4 = A*B.
I have created a matrix of the same size as the one above, which shows the relevant E(X)s instead of the 1s, and 1s instead of the 0s. This allows me to easily calculate the vector of expected values of the new random variables (one per combination) as the product of each row. I have also generated a similar matrix which shows E(X^2) instead of E(X), and another one which shows prob(X>0) instead of E(X).
I'm looking for a Matlab script that computes the Covariance matrix of these new variables i.e. taking each row as a random variable. I presume it will have to use the formula:
Cov(X,Y)=E(XY)-E(X)E(Y)
For example, for rows (1 1 0) and (1 0 1):
Cov(X,Y)=E[(AB)(AC)]-E(X)E(Y)
=E[(A^2)BC]-E(X)E(Y)
=E(A^2)E(B)E(C)-E(X)E(Y)
These values I already have from the matrices I've mentioned above. For each Covariance, I'm just unsure how to know which two variables appear in both rows, because for those I will have to select E(X^2) instead of E(X).
Alternatively, the above can be written as:
Cov(X,Y)=E(X)E(Y)*[1/prob(A>0)-1]
But the problem remains as the probabilities in the denominator will only be the ones of the variables which are shared between two combinations.
Any advice on how to automate the computation of the Covariance matrix in Matlab would be greatly appreciated.
I'm pretty sure this is not the most efficient way to do it, but it's a start:
Let r1, ..., rn be the combinations of the random variables, and let R be the matrix:
   A B C
r1 1 0 0
r2 0 1 0
r3 0 0 1
r4 1 1 0
Suppose you have the vectors E1, E2 and ER:
E1 = [E(A) E(B) E(C) ...]
E2 = [E(A²) E(B²) E(C²) ...]
ER = [E(r1) E(r2) E(r3) ...]
To compute E(r1*r2), which the covariance formula needs, you can:
1) Extract rows 1 and 2 from R
v1 = R(1,:)
v2 = R(2,:)
2) Sum both vectors into vs
vs = v1 + v2
3) Loop over vs: a 2 means the variable appears in both rows, so its value from E2 has to be used; a 1 means it appears in only one row, so use its value from E1; a 0 means the variable is not used.
4) Multiply the selected values inside the loop to get E(r1*r2); the covariance is then E(r1*r2) - ER(1)*ER(2).
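The steps above could be sketched for all pairs of rows as follows. The numeric values in E1 and E2 are made up for illustration (A is 2 with probability 0.5, B is 1 with probability 0.2, C is 3 with probability 0.1), and the variable names Epq and Cov are assumptions. Logical indexing replaces the inner loop over vs, and prod of an empty selection is 1, so unused variables drop out.

```matlab
R  = [1 0 0; 0 1 0; 0 0 1; 1 1 0];   % combinations, one row per new variable
E1 = [1.0 0.2 0.3];                  % hypothetical E(A), E(B), E(C)
E2 = [2.0 0.2 0.9];                  % hypothetical E(A^2), E(B^2), E(C^2)

n  = size(R, 1);
ER = zeros(n, 1);
for k = 1:n
    ER(k) = prod(E1(R(k, :) == 1));  % E(rk) = product of E(X) over its variables
end

Cov = zeros(n);
for p = 1:n
    for q = 1:n
        vs  = R(p, :) + R(q, :);     % 2 = shared, 1 = in one row only, 0 = unused
        Epq = prod(E2(vs == 2)) * prod(E1(vs == 1));   % E(rp*rq)
        Cov(p, q) = Epq - ER(p) * ER(q);
    end
end
```

The diagonal of Cov holds the variances of the new variables, and rows that share no variable get covariance zero, as expected for independent factors.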

Neural Networks for integer values

I have approximately 5000 integer vectors (=SIZE) that look like:
[1 0 4 2 0 1 3 ...]
They all have the same length N=32 and their values range from 0 to 4, or more generally [0 MAX].
I created a NN that takes these vectors as inputs and outputs a binary array corresponding to one of the desired outputs (number of possible outputs = M):
for instance [0 1 0 0 ...0] => 2nd output. array_length = M
I used a Multi Layer Perceptron in Neuroph with those integer values but it did not converge.
So I am guessing the problem is either the integer values or the use of an MLP with 3 layers: input, hidden and output.
Can you advise me on the network structure? Which type of NN is suitable? Should I remodel the input and output to simplify the learning process? I have been thinking about Gray encoding for the integer inputs.

Matlab Function block to define State-Space model (controller)

Hello everyone
I have a state-space model (controller) as below:
A =[ *M* ]; B =[0 0 2 0 0 0 0];
C =[0;2;0]; D =[0 2 0 0 0 0 0 ; 2 0 2 0 0 0 0 ; 0 0 0 *M* 0 2 0]
Controller =ss(A,B,C,D)
This controller has 7 inputs and 3 outputs.
I don't want to use the Simulink State-Space block to define this controller.
As in "How to change variables in time in Simulink?", the variable M in my controller can change with time, and I want to drive it with a variable signal as in that linked question.
How can I use user-defined blocks to write this variable state-space controller?
Which user-defined blocks can be used for programming this, and how?
Need help
Thanks
You can probably use the Matrix Concatenate block to create your D matrix from your M signal, muxed with other constants (0 and 2) to create a vector, which you can then concatenate with 2 other constant vectors to create the matrix. Matrices B and C are constant, and A, which also depends on M, can be built from the M signal in the same way, so you can then just construct your state-space system from scratch using these 4 matrices, using simple Add, Multiply and Subtract blocks.
Another alternative is to use a MATLAB Function block, taking M as an input, but I don't know if state-space objects are allowed as a data type for the function output. I guess you'd need to compute the state-space output at each time step based on the state-space input. Not sure how you do that with a MATLAB Function block.
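One possible reading of the MATLAB Function block idea, sketched under assumptions: rather than returning a state-space object, the block computes the state derivative and the output at each time step from the current input u, state x and signal M. The function name controllerStep and the wiring (xdot fed into an Integrator block whose output is fed back as x) are hypothetical and not taken from the question.

```matlab
function [y, xdot] = controllerStep(u, x, M)
% Hypothetical MATLAB Function block body.
% u: 7-by-1 input vector, x: scalar state, M: scalar time-varying signal.
A = M;                          % A = [M] from the question
B = [0 0 2 0 0 0 0];
C = [0; 2; 0];
D = [0 2 0 0 0 0 0;
     2 0 2 0 0 0 0;
     0 0 0 M 0 2 0];
xdot = A*x + B*u;               % state equation, integrated outside the block
y    = C*x + D*u;               % output equation, 3-by-1
end
```

Because M enters as an ordinary input signal, it can be driven by any time-varying source without rebuilding the model.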

Is there an array creation function for full arrays that has the same signature as sparse matrix constructor?

I'd like to accumulate indexed elements in a matrix, like the table and tapply functions in R.
I found that sparse(i,j,s,m,n) fits my need perfectly.
As the documentation says: "Any elements of s that have duplicate values of i and j are added together."
But I have to convert the resulting sparse matrix to a full one using full():
a = a + full(sparse(i,j,s,m,n));
Is this an efficient way to do so?
By the way, is there anything like the following, regardless of whether duplicate i,j pairs are added?
a = setelements(a, i,j,s);
and
vector = getelement(a,i,j);
where i and j have the same meaning as in the sparse() function.
And what if a is a multidimensional array? sparse() only deals with matrices.
Do I have to set the entries page by page with outer loops?
Take a look at accumarray. For example,
ii = [1 2 2 3 3];
jj = [3 2 2 2 2];
s = [10 20 30 40 50];
a = accumarray([ii(:) jj(:)],s(:));
gives
a =
0 0 10
0 50 0
0 90 0
Note that each row of the first argument ([ii(:) jj(:)]) defines an N-dimensional index into the output array (N is 2 in this example).
accumarray is very flexible. It works for N-dimensional arrays, lets you specify size of the result (it may be larger than inferred from the supplied indices), and can even apply an arbitrary function (different from sum) to each set of values defined by the same index.
As a more general example, with the above data,
a = accumarray([ii(:) jj(:)],s(:),[4 4],@max)
gives
a =
0 0 10 0
0 30 0 0
0 50 0 0
0 0 0 0
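For the multidimensional case raised in the question, a small sketch with made-up 3-D indices may help. Each row of subs is an (i,j,k) index, values with identical indices are summed, and the result is added to an existing full array without any page-by-page loop. The read-back with sub2ind is one possible interpretation of the hypothetical getelement from the question.

```matlab
subs = [1 1 1;      % duplicate index: 1 and 2 are summed into a(1,1,1)
        1 1 1;
        2 1 2;
        2 2 2];
vals = [1; 2; 3; 4];

a = zeros(2, 2, 2);                         % existing full array
a = a + accumarray(subs, vals, size(a));    % accumulate directly, no sparse/full

% Read elements back by (i,j,k) index via linear indexing.
v = a(sub2ind(size(a), subs(:,1), subs(:,2), subs(:,3)));
```

Passing size(a) as the third argument keeps the accumulated array the same size as a even when the largest indices do not occur in subs.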

I need help with numeric comparison in MATLAB

I have a matrix called targets (1x4000); columns 1 to 2000 contain the double value 0 and columns 2001 to 4000 contain the double value 1.
a)
I want to create a matrix called targets_1 in which each entry is 1 wherever targets is 0 (and 0 otherwise), so that I end up with a matrix whose columns 1 to 2000 have the value 1 and whose columns 2001 to 4000 have the value 0.
b)
Same situation as above, but this time an entry should be 1 where targets is 1 and 0 where it is 0, so that my new matrix targets_2 has the value 0 in columns 1 to 2000 and the value 1 in columns 2001 to 4000.
I know how to use the strcmp function to do such checks with strings, but the problem is that my original matrix is double, and I don't know whether there is a function like
setosaCmp = strcmp('setosa',species);
that works with doubles (numbers). Any help would be appreciated.
Your question isn't very clear. It sounds like the following would satisfy your description:
targets_1 = 1 - targets;
targets_2 = targets;
or, with an explicit numeric comparison:
targets_1 = double(targets == 0);
targets_2 = targets;
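For numeric data, the counterpart of strcmp is the element-wise == operator, which returns a logical array that can be converted back to double. A minimal sketch, assuming targets is laid out as described in the question:

```matlab
% 1x4000 row vector: 2000 zeros followed by 2000 ones, as in the question.
targets = [zeros(1, 2000), ones(1, 2000)];

% == compares every element with the scalar and yields a logical array;
% double() converts true/false to 1/0.
targets_1 = double(targets == 0);   % 1 where targets is 0
targets_2 = double(targets == 1);   % 1 where targets is 1
```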
I'm basing this answer purely on the fact that you've mentioned setosaCmp = strcmp('setosa', species);. From this I'm guessing that
1) You have Statistics Toolbox, as setosa is a species of iris from the Fisher Iris dataset widely used in Statistics Toolbox demos, and
2) You have a variable containing class labels, and you'd like to construct some class indicator variables (i.e. a new variable for each class label, each of which is 1 when the item is in that class, and 0 when it's not).
Is that right? If not, please ignore me.
If I'm right, then I think the command you're looking for is dummyvar from Statistics Toolbox. Try this:
>> classLabels = [1, 2, 1, 2, 3, 1, 3];
>> dummyvar(classLabels)
ans =
1 0 0
0 1 0
1 0 0
0 1 0
0 0 1
1 0 0
0 0 1