Regress categorical variables in Matlab

I have a cell array with 12 columns and 20000 rows, which I call Atotal:
Atotal= [ATY1;ATY2;ATY3;ATY4;ATY5;ATY6;ATY7;ATY8;ATY9;ATY10;ATY11;ATY12;ATY13;ATY14;ATY15;ATY16;ATY17];
Atotal={ 972 1 0 0 0 0 0 21 60 118 60110 2001
973 0 0 1 0 0 0 15 46 1496 60110 2001
980 0 0 0 0 1 0 4 68 142 40502 2001
994 1 0 0 0 0 0 13 33 86 81101 2001
995 0 0 0 1 0 0 9 55 183 31201 2001
1024 1 0 0 0 0 0 10 26 3 80803 2001}
I get my dependent and independent variables from there:
Y1=cell2mat(Atotal(:,2));
X1=cell2mat(Atotal(:,3));
And then I regress them. Considering that my dependent variable Y1 is binary and my independent variable X1 is also categorical, I use the following code, though I'm still not sure it is the correct one:
mdl1 = fitlm(X1,Y1,'CategoricalVars',logical([1]));
Then I add more dummies and try the same code:
X2=cell2mat(Atotal(:,4));
X3=cell2mat(Atotal(:,5));
X4=cell2mat(Atotal(:,6));
X5=cell2mat(Atotal(:,7));
mdl2 = fitlm(X1,X2,X3,X4,X5,Y1,'CategoricalVars',logical([1,2,3,4,5]));
But now it gives me a lot of errors:
Error using internal.stats.parseArgs (line 42)
Parameter name must be text.
Error in LinearModel.fit (line 849)
[intercept,predictorVars,responseVar,weights,exclude, ...
Error in fitlm (line 117)
model = LinearModel.fit(X,varargin{:});
Could someone help me? Thank you

I think there are two problems with your code.
The first problem is that fitlm expects the following arguments:
mdl = fitlm(X,y,modelspec)
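which basically means that you have to collect your predictor variables into one matrix and use it as the first argument. In your call, fitlm takes X1 as the predictors and X2 as the response, and the remaining numeric arguments end up where it expects text parameter names, hence the "Parameter name must be text" error. So you should do the following: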
X = [X1, X2, X3, X4, X5];
fitlm(X, Y1, ...)
The second problem is that for the CategoricalVars argument fitlm expects either a logical vector (a vector which is one where the variable is categorical, and zero where continuous) or a numeric index vector. So the correct usage is:
X = [X1, X2, X3, X4, X5];
fitlm(X, Y1, 'CategoricalVars',logical([1,1,1,1,1]))
or
X = [X1, X2, X3, X4, X5];
fitlm(X, Y1, 'CategoricalVars', [1,2,3,4,5])
The above code snippets should work properly.
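As a small sketch (using only the six sample rows shown in the question, so the values are purely illustrative), the five separate cell2mat calls can be replaced by a single call on columns 3 to 7, which directly yields the one predictor matrix fitlm expects. Only core MATLAB is used here; the fitlm line is left as a comment because it needs the Statistics and Machine Learning Toolbox.

```matlab
% Six sample rows from the question, stored as a cell array of scalars
Atotal = num2cell([ 972 1 0 0 0 0 0 21 60  118 60110 2001
                    973 0 0 1 0 0 0 15 46 1496 60110 2001
                    980 0 0 0 0 1 0  4 68  142 40502 2001
                    994 1 0 0 0 0 0 13 33   86 81101 2001
                    995 0 0 0 1 0 0  9 55  183 31201 2001
                   1024 1 0 0 0 0 0 10 26    3 80803 2001]);

Y1 = cell2mat(Atotal(:, 2));     % binary response (column 2)
X  = cell2mat(Atotal(:, 3:7));   % columns 3..7 as one n-by-5 predictor matrix

% With the toolbox installed, the regression would then be:
% mdl2 = fitlm(X, Y1, 'CategoricalVars', 1:5);
```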
However you could consider declaring your categorical variables as categorical (if you have Matlab R2013b or above). In this case you would do the following:
X1 = categorical(cell2mat(Atotal(:,3)));
X2 = categorical(cell2mat(Atotal(:,4)));
X3 = categorical(cell2mat(Atotal(:,5)));
X4 = categorical(cell2mat(Atotal(:,6)));
X5 = categorical(cell2mat(Atotal(:,7)));
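tbl = table(X1, X2, X3, X4, X5, Y1);
fitlm(tbl, 'Y1 ~ X1 + X2 + X3 + X4 + X5')
Because fitlm only accepts a numeric matrix as its predictor argument, the categorical variables are collected in a table (together with the response) instead of being concatenated into X. The advantage of this approach is that Matlab knows that your Xi variables are categorical, and they will be treated accordingly, so you do not have to specify the CategoricalVars argument every time you want to run a regression.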
Finally, the Matlab documentation of the fitlm function is really good with a lot of examples, so check that out too.
Note: as others have mentioned in the comments, you should also consider running a logit regression as your response variable is binary. In this case you would estimate your model the following way:
X = [X1, X2, X3, X4, X5];
fitglm(X, Y1, 'Distribution', 'binomial', 'Link', 'logit')
However, if you do this, be sure to understand what a logistic model is, what its assumptions are and how its coefficients are interpreted.

Related

fmincon with lower bound fails, even though solution is at initial point

I'm trying to minimize a non-linear objective function (my actual function is much more complicated than this one, but I found that even this simple function illustrates the point), where I know that the minimum is attained at the initial point x0:
fun = @(x) x(1)^2+x(2)^2;
x0 = [0 0];
lb1 = [0 0];
lb2 = [-1 -1];
[xc1 fvalc1] = fmincon(fun, x0, [],[],[],[], lb1, [Inf Inf])
Which outputs:
>> xc1 = 1.0e-03 * [0.6457 0.6457]
>> fvalc1 = 8.3378e-07
However, using either a different lower bound or fminsearch instead works correctly:
[xc2 fvalc2] = fmincon(fun, x0, [],[],[],[], lb2, [Inf Inf])
>> xc2 = [0 0]
>> fvalc2 = 0
[xs fvals] = fminsearch(fun, x0)
>> xs = [0 0]
>> fvals = 0
What goes wrong in the first fmincon call?
We can diagnose this using the fourth output argument, output, as described in the documentation:
[xc1, fvalc1, ~, output] = fmincon(fun, x0, [],[],[],[], lb1, [Inf Inf])
The value output.stepsize is the final step size taken in the iterative solving process. In this case:
output.stepsize
>> ans = 6.586e-4
The estimated minimum was at x = [6.457e-4, 6.457e-4] and the lower bounds you've permitted are [0 0], so the solver is not permitted to take another step! Another step would give x = [-1.29e-5, -1.29e-5], which is outside the bounds.
When you allow the lower bounds to be [-1, -1] the solver can over-shoot the minimum and approach it from all directions.
Moreover, we can use the options input to get even better insight!
options.Display = 'iter';
[xc1, fvalc1, ~, output] = fmincon(fun, x0, [],[],[],[], lb1, [Inf Inf], [], options);
Printed to the command window we see this:
Your initial point x0 is not between bounds lb and ub; FMINCON
shifted x0 to strictly satisfy the bounds.
                                            First-order      Norm of
 Iter F-count            f(x)  Feasibility   optimality         step
    0       3    1.960200e+00    0.000e+00    9.900e-01
    1       6    1.220345e-02    0.000e+00    8.437e-01    1.290e+00
    2       9    4.489374e-02    0.000e+00    4.489e-02    1.014e-01
    3      12    1.172900e-02    0.000e+00    1.173e-02    1.036e-01
    4      15    3.453565e-03    0.000e+00    3.454e-03    4.953e-02
    5      18    1.435780e-03    0.000e+00    1.436e-03    2.088e-02
    6      21    4.659097e-04    0.000e+00    4.659e-04    1.631e-02
    7      24    2.379407e-04    0.000e+00    2.379e-04    6.160e-03
    8      27    6.048934e-05    0.000e+00    6.049e-05    7.648e-03
    9      30    1.613884e-05    0.000e+00    1.614e-05    3.760e-03
   10      33    5.096660e-06    0.000e+00    5.097e-06    1.760e-03
   11      36    2.470360e-06    0.000e+00    2.470e-06    6.858e-04
   12      39    8.337765e-07    0.000e+00    8.338e-07    6.586e-04
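So fmincon considers your x0 invalid: it lies exactly on the lower bound, so the default interior-point algorithm shifts it to strictly satisfy the bounds (f(x) = 1.9602 = 2*0.99^2 at iteration 0, presumably x = [0.99 0.99]) and then works its way back towards [0 0]. This is why the solver doesn't return the result after 1 iteration with lower bounds of [0 0].
fminsearch also works for the same reason: it imposes no bounds, so there is no lower bound for the solution to sit on.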
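This is easy to check without the Optimization Toolbox, since fminsearch is part of core MATLAB. A minimal sketch, reusing fun and x0 from the question: because f(x0) = 0 is already the global minimum and fminsearch imposes no bounds, no trial point can have a lower value, so the returned point stays at x0.

```matlab
fun = @(x) x(1)^2 + x(2)^2;   % objective from the question
x0  = [0 0];                  % starting point, already the minimizer

% Unconstrained Nelder-Mead search: no bound forces x0 to be moved
[xs, fvals] = fminsearch(fun, x0);
disp(xs)      % stays at the starting point
disp(fvals)
```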

transform categorical predictors to numerical variable matlab

I am new to matlab.
I have a categorical input predictor(X) and the set of past results (Y, binary).
I would like to convert it to a numeric variable using the following method:
For each category, calculate the average of Y and replace the value with that average.
For example:
X Y X'
1 1 1
2 0 0
3 1 0.5
1 1 1
2 0 0
3 0 0.5
Please help.
You are looking for accumarray with the mean function, using Y as vals and X as subs (X must contain positive integers):
Xprime = accumarray( X, Y, [], @mean );
Xprime = Xprime( X );
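A runnable sketch with the example data from the question. The second part is an addition not in the answer: it assumes the category labels might not be positive integers (the values 10, 20, 35 are hypothetical) and maps them to 1..k with unique before calling accumarray.

```matlab
% Example data from the question: X holds category codes, Y the binary outcome
X = [1; 2; 3; 1; 2; 3];
Y = [1; 0; 1; 1; 0; 0];

% Average of Y within each category; accumarray needs positive integer subs
catMeans = accumarray(X, Y, [], @mean);   % one entry per category

% Replace every observation's category by that category's average
Xprime = catMeans(X);

% Hypothetical labels that are not usable as subs directly:
% map them to 1..k with unique and use the index vector instead
Xraw = [10; 20; 35; 10; 20; 35];
[~, ~, idx] = unique(Xraw);
Xprime2 = accumarray(idx, Y, [], @mean);
Xprime2 = Xprime2(idx);
```

With this data, catMeans is [1; 0; 0.5] and Xprime is [1; 0; 0.5; 1; 0; 0.5], matching the X' column in the question.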

Matlab calculate 3D similarity transformation. fitgeotrans for 3D

How can I calculate a similarity transformation between 4 points in 3D in MATLAB?
I can calculate the transformation matrix from
T*X = Xp,
but because of small errors in the point coordinates it gives me a general affine matrix. How can I fit that matrix to a similarity one? I need something like fitgeotrans, but in 3D.
Thanks
If I am interpreting your question correctly, you seek to find all coefficients in a 3D transformation matrix that will best warp one point to another. All you really have to do is put this problem into a linear system and solve. Recall that warping one point to another in 3D is simply:
A*s = t
s = (x,y,z) is the source point, t = (x',y',z') is the target point and A would be the 3 x 3 transformation matrix that is formatted such that:
$$
A = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{bmatrix}
$$
Writing out the actual system of equations of A*s = t, we get:
a00*x + a01*y + a02*z = x'
a10*x + a11*y + a12*z = y'
a20*x + a21*y + a22*z = z'
The coefficients in A are what we need to solve for. Re-writing this in matrix form, we get:
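$$
\begin{bmatrix}
x & y & z & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & x & y & z & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & x & y & z
\end{bmatrix}
\begin{bmatrix} a_{00} \\ a_{01} \\ a_{02} \\ a_{10} \\ a_{11} \\ a_{12} \\ a_{20} \\ a_{21} \\ a_{22} \end{bmatrix}
=
\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
$$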
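Given that you have four points, you would simply concatenate the rows of the matrix on the left side and of the vector on the right side:
$$
\begin{bmatrix}
x_1 & y_1 & z_1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & x_1 & y_1 & z_1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & x_1 & y_1 & z_1 \\
x_2 & y_2 & z_2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & x_2 & y_2 & z_2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & x_2 & y_2 & z_2 \\
x_3 & y_3 & z_3 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & x_3 & y_3 & z_3 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & x_3 & y_3 & z_3 \\
x_4 & y_4 & z_4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & x_4 & y_4 & z_4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & x_4 & y_4 & z_4
\end{bmatrix}
\begin{bmatrix} a_{00} \\ a_{01} \\ a_{02} \\ a_{10} \\ a_{11} \\ a_{12} \\ a_{20} \\ a_{21} \\ a_{22} \end{bmatrix}
=
\begin{bmatrix} x_1' \\ y_1' \\ z_1' \\ x_2' \\ y_2' \\ z_2' \\ x_3' \\ y_3' \\ z_3' \\ x_4' \\ y_4' \\ z_4' \end{bmatrix}
$$
$$
S\,a = T
$$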
S would now be a matrix that contains your four source points in the format shown above, a is now a vector of the transformation coefficients in the matrix you want to solve (ordered in row-major format), and T would be a vector of target points in the format shown above.
To solve for the parameters, you simply have to use the mldivide operator \ in MATLAB, which computes the least-squares estimate for you. Since S is 12 x 9, it has no inverse; assuming S has full column rank, the least-squares solution is
$$
a = (S^\top S)^{-1} S^\top T
$$
and S \ T computes it without forming that inverse explicitly. As such, simply build your matrix like above, then use the \ operator to solve for the transformation parameters. When you're done, reshape a into a 3 x 3 matrix. Therefore:
S = ... ; %// Enter in your source points here like above
T = ... ; %// Enter in your target points in a right hand side vector like above
a = S \ T;
similarity_matrix = reshape(a, 3, 3).';
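A minimal runnable sketch of the procedure described above. The source points and the "true" matrix Mtrue (a scaled rotation about z) are hypothetical values chosen only to generate exact targets; kron(eye(3), [x y z]) is used as a compact way to build each 3 x 9 block of S.

```matlab
% Hypothetical source points, one 3-D point per column (rank 3)
src = [1 0 0 1
       0 1 0 1
       0 0 1 1];

% Hypothetical linear map used only to generate the targets
ct = cos(0.3);  st = sin(0.3);
Mtrue = 2 * [ct -st 0; st ct 0; 0 0 1];
dst = Mtrue * src;

% Build S (3*nPts x 9) and T (3*nPts x 1) in the layout described above
nPts = size(src, 2);
S = zeros(3 * nPts, 9);
for k = 1:nPts
    rows = 3 * (k - 1) + (1:3);
    % kron(eye(3), [x y z]) = [x y z 0 0 0 0 0 0; 0 0 0 x y z 0 0 0; 0 0 0 0 0 0 x y z]
    S(rows, :) = kron(eye(3), src(:, k).');
end
T = dst(:);                 % [x1'; y1'; z1'; x2'; ...]

a = S \ T;                  % least-squares solution of S*a = T
A = reshape(a, 3, 3).';     % a is row-major, so transpose after reshape
```

Because the targets here are exactly linear in the sources, A reproduces Mtrue. With noisy points the same code returns the least-squares linear map, which in general is not an exact similarity.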
With regard to the errors caused by small perturbations in each of the coordinates, the more points you have, the better. Using 4 will certainly give you a solution, but in my opinion it isn't enough to mitigate those errors.
Minor Note: This (more or less) is what fitgeotrans does under the hood. It computes the best homography given a bunch of source and target points, and determines this using least squares.
Hope this answered your question!
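Note that the least-squares fit above returns a general 3 x 3 linear map without translation, so nothing forces the result to be a similarity. If a true similarity (uniform scale, rotation and translation) is required, one commonly used alternative, swapped in here and not what the answers describe, is Umeyama's closed-form SVD solution. A minimal sketch with hypothetical points and a known transform used only for checking:

```matlab
% Hypothetical source points (one per column) and a known similarity
src = [0 1 0 0
       0 0 1 0
       0 0 0 1];
th = pi / 6;
Rtrue = [cos(th) -sin(th) 0; sin(th) cos(th) 0; 0 0 1];
dst = bsxfun(@plus, 1.5 * Rtrue * src, [1; 2; 3]);   % scale 1.5, then shift

n   = size(src, 2);
muS = mean(src, 2);                   % centroids
muD = mean(dst, 2);
S0  = bsxfun(@minus, src, muS);       % centred points
D0  = bsxfun(@minus, dst, muD);

C = D0 * S0.' / n;                    % 3x3 cross-covariance
[U, Sig, V] = svd(C);
E = eye(3);
if det(U) * det(V) < 0                % rule out a reflection
    E(3, 3) = -1;
end
R = U * E * V.';                      % rotation
scale = trace(Sig * E) / (sum(S0(:).^2) / n);
t = muD - scale * R * muS;            % translation

% Largest deviation of scale*R*src + t from dst
resid = max(max(abs(bsxfun(@plus, scale * R * src, t) - dst)));
```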
The answer by @rayryeng is correct, given that you have a set of up to 3 points in a 3-dimensional space. If you need to transform m points in n-dimensional space (m > n), then you first need to add m-n coordinates to these m points so that they exist in m-dimensional space (i.e. the S matrix in @rayryeng's answer becomes a square matrix). The procedure described by @rayryeng will then give you the exact transformation of the points; you then just need to keep only the coordinates of the transformed points in the original n-dimensional space.
As an example, say you want to transform the points:
(2 -2 2) -> (-3 5 -4)
(2 3 0) -> (3 4 4)
(-4 -2 5) -> (-4 -1 -2)
(-3 4 1) -> (4 0 5)
(5 -4 0) -> (-3 -2 -3)
Notice that you have m=5 points which are n=3-dimensional. So you need to add coordinates to these points such that they are n=m=5-dimensional, and then apply the procedure described by #rayryeng.
I have implemented a function that does that (find it below). You just need to organize the points such that each of the source-points is a column in a matrix u, and each of the target points is a column in a matrix v. The matrices u and v are going to be, thus, 3 by 5 each.
WARNING:
the matrix A in the function may require A LOT of memory for moderately many points nP, because it has nP^4 elements.
To overcome this, for square matrices u and v, you can simply use T=v*inv(u) or T=v/u in MATLAB notation.
The code may run very slowly...
In MATLAB:
u = [2 2 -4 -3 5;-2 3 -2 4 -4;2 0 5 1 0]; % setting the set of source points
v = [-3 3 -4 4 -3;5 4 -1 0 -2;-4 4 -2 5 -3]; % setting the set of target points
T = findLinearTransformation(u,v); % calculating the transformation
You can verify that T is correct by:
I = eye(5);
uu = [u;I((3+1):5,1:5)]; % filling-up the matrix of source points so that you have 5-d points
w = T*uu; % calculating target points
w = w(1:3,1:5); % recovering the 3-d points
w - v % w should match v ... notice that the error between w and v is really small
The function that calculates the transformation matrix:
function [T,A] = findLinearTransformation(u,v)
% finds a matrix T (nP x nP) such that T * u(:,i) = v(:,i)
% u(:,i) and v(:,i) are n-dim column vectors; u and v must have the same number of columns (nP)
%
if any(size(u) ~= size(v))
    error('findLinearTransform:u','u and v must be the same shape and size n-dim vectors');
end
[n,nP] = size(u); % n -> dimensionality; nP -> number of points to be transformed
if nP > n % if the number of points to be transformed exceeds the dimensionality of the points
    I = eye(nP);
    u = [u;I((n+1):nP,1:nP)]; % then fill up the points to be transformed with rows of the identity matrix
    v = [v;I((n+1):nP,1:nP)]; % as well as the transformed points
    [n,nP] = size(u);
end
A = zeros(nP*n,n*n); % row i holds the equation for one component of one transformed point
for k = 1:nP
    for i = ((k-1)*n+1):(k*n)
        % place u(:,k)' in the block of columns belonging to output component mod(i-1,n)+1
        A(i,mod((((i-1)*n+1):(i*n))-1,n*n) + 1) = u(:,k)';
    end
end
v = v(:);
T = reshape(A\v, n, n).'; % unknowns are stored row-major, hence the transpose
end
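The WARNING above notes that for square u and v, T = v/u avoids building the large matrix A. A minimal sketch with the example points from this answer: pad both point sets with identity rows (as findLinearTransformation does) and use mrdivide directly. Here the padded source matrix is square and invertible, so the transformation is exact up to round-off.

```matlab
u = [2 2 -4 -3 5; -2 3 -2 4 -4; 2 0 5 1 0];     % source points (columns)
v = [-3 3 -4 4 -3; 5 4 -1 0 -2; -4 4 -2 5 -3];  % target points (columns)

[n, nP] = size(u);
I  = eye(nP);
uu = [u; I((n+1):nP, :)];    % pad to nP-dimensional points, as in the function
vv = [v; I((n+1):nP, :)];

T = vv / uu;                 % solves T*uu = vv without forming the nP^4-element A
w = T * uu;
err = w(1:n, :) - v;         % should be at round-off level
```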

Create the following matrix by typing one command. Do not type all individual elements explicitly

I want to create the following matrix in MATLAB:
M= [ 0 0 1 10 20
0 0 3 8 26
0 0 5 6 32
0 0 0 0 0]
but I don't want to input all elements manually.
I tried M (1:3,3:5)=[x;y;z]
where
x is the linspace of 1 to 5
y is the linspace of 10 to 6
z is the linspace of 20 to 32
but it doesn't work (the last row of zeros is missing). How can I create M in a smart way?
I'm assuming that this is some programming assignment, because otherwise I don't see a reason why this needs to be done in a single line. I'm also assuming that concatenating a vector of zeros is unwanted. Having said that here are several suggestions:
Suppose that your vectors are defined like so:
x = 1:2:5;
y = 10:-2:6;
z = 20:6:32;
The "cleanest" way (under the assumption of no-zero-vector-concatenation) is probably:
M = subsasgn(zeros(4,5),substruct('()',{1:3,3:5}),[x',y',z']);
Alternatively, if using external functions, you can use the insertrows submission on FEX:
M = insertrows(insertrows([(x)',(y)',(z)'],0,4)',0,[0,0])';
With two commands, and assuming that M doesn't exist, you can do:
M(2:4,3:5)=([fliplr(x)',fliplr(y)',fliplr(z)']);
M = flipud(M);
If you are looking for one-liner that requires no other input(s) and creates the linspace values internally, here's one -
M(1:4,3:5)=[bsxfun(@plus,[1 10 20],bsxfun(@times,[2 -2 6],[0:2]'));zeros(1,3)]
Output -
M =
0 0 1 10 20
0 0 3 8 26
0 0 5 6 32
0 0 0 0 0
Strange question, but here you go, in one command. This assumes M is previously non-existent, and that x, y and z are defined as x = 1:2:5; y = 10:-2:6; z = 20:6:36;:
M(1:4,3:5) = [[x;y;z].'; zeros(1,3)];
You can of course avoid x, y and z by defining their values on the fly:
M(1:4,3:5) = [[1:2:5; 10:-2:6; 20:6:36].'; zeros(1,3)];
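For comparison, the attempt from the question fails for two reasons: [x;y;z] stacks x, y and z as rows rather than columns, and assigning only rows 1:3 of a non-existent M creates a 3-row matrix. A sketch outside the one-command constraint that preallocates M and transposes the block, together with a check that the subsasgn one-liner gives the same matrix:

```matlab
x = linspace(1, 5, 3);     % [1 3 5]
y = linspace(10, 6, 3);    % [10 8 6]
z = linspace(20, 32, 3);   % [20 26 32]

% Two commands: preallocate the 4x5 zero matrix, then fill the block
M = zeros(4, 5);
M(1:3, 3:5) = [x; y; z].'; % transpose so x, y, z become columns

% The single-command subsasgn version for comparison
M1 = subsasgn(zeros(4, 5), substruct('()', {1:3, 3:5}), [x', y', z']);
```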

Linear Programming solvable by MATLAB

I want to solve a linear programming problem in MATLAB. For this purpose, I am following this link: Linear Programming.
Here, a sample problem is given:
Find x that minimizes
f(x) = -5x1 - 4x2 - 6x3,
subject to
x1 – x2 + x3 ≤ 20
3x1 + 2x2 + 4x3 ≤ 42
3x1 + 2x2 ≤ 30
0 ≤ x1, 0 ≤ x2, 0 ≤ x3.
First, enter the coefficients
f = [-5; -4; -6];
A = [1 -1 1
3 2 4
3 2 0];
b = [20; 42; 30];
lb = zeros(3,1);
Next, call a linear programming routine.
[x,fval,exitflag,output,lambda] = linprog(f,A,b,[],[],lb);
My question is: what is meant by this line?
lb = zeros(3,1);
Without this line, MATLAB reports all my problems as infeasible. Can you help me with this?
This line is not needed for ALL linear problems. Here you deal with a problem that has constraints on the minimal values of the solution:
0 ≤ x1, 0 ≤ x2, 0 ≤ x3
You have to include these constraints in the parameters of your problem. The way to do so is by specifying lower bounds on the solution, which is the 6th argument of linprog (after f, A, b, Aeq and beq).
Without this line, the domain on which you search for a solution is not bounded, and exitflag has the value -3 after calling the function, which is precisely the error code for unbounded problems.
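The unboundedness can be checked by hand in core MATLAB. In this sketch the direction d = [-5; 5; 1] is hand-picked for the illustration and is not from the question: x = 0 satisfies A*x <= b because b >= 0, and along d no constraint value increases (A*d <= 0) while the objective decreases (f'*d < 0). Hence x = t*d stays feasible for every t >= 0 while the objective goes to minus infinity, but only because x1 = -5t is negative, which is exactly what lb = zeros(3,1) forbids.

```matlab
f = [-5; -4; -6];
A = [1 -1 1
     3  2 4
     3  2 0];
b = [20; 42; 30];

d = [-5; 5; 1];            % hand-picked direction (x1 negative)
disp(A * d)                % [-9; -1; -5]: no constraint gets closer to its bound
disp(f' * d)               % -1: objective drops by 1 per unit step

% Points x = t*d satisfy A*x <= b while the objective is -t
for t = [1 10 100 1000]
    x = t * d;
    fprintf('t = %5d  feasible = %d  f''*x = %g\n', t, all(A * x <= b), f' * x);
end
```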