Is this a correct vector implementation of gradient descent for multiple theta values? - MATLAB

This is my MATLAB code to predict [1;1;1] given [1;0;1]:
m = 1;
alpha = .00001;
x = [1;0;1;0;0;0;0;0;0];
y = [1;1;1;0;0;0;0;0;0];
theta1 = [4.7300;3.2800;1.4600;0;0;0;4.7300;3.2800;1.4600];
theta1 = theta1 - (alpha/m .* (x .* theta1-y)' * x)'
theta1 = reshape(theta1(1:9) , 3 , 3)
sigmoid(theta1 * [1; 0; 1])
x = [1;0;1;0;0;0;0;0;0];
y = [1; 1; 1;0;0;0;0;0;0];
theta2 = [8.892;6.167;2.745;8.892;6.167;2.745;8.892;6.167;2.745];
theta2 = theta2 - (alpha/m .* (x .* theta2-y)' * x)'
theta2 = reshape(theta2(1:9) , 3 , 3)
sigmoid(theta2 * [1; 0; 1])
x = [1;0;1;0;0;0;0;0;0];
y = [1; 1; 1;0;0;0;0;0;0];
theta3 = [9.446;6.55;2.916;9.351;6.485;2.886;8.836;6.127;2.727];
theta3 = theta3 - (alpha/m .* (x .* theta3-y)' * x)'
theta3 = reshape(theta3(1:9) , 3 , 3)
sigmoid(theta3 * [1; 0; 1])
I'm computing theta1, theta2, and theta3 individually, but I think they should be linked between each computation?
Though gradient descent appears to be working:
sigmoid(theta1 * [1; 0; 1]) =
0.9999
0.9986
0.9488
sigmoid(theta2 * [1; 0; 1]) =
1.0000
1.0000
0.9959
sigmoid(theta3 * [1; 0; 1]) =
1.0000
1.0000
0.9965
This shows that for each theta value (a layer in the network) the prediction is moving closer to [1;1;1].
Update: sigmoid function:
function g = sigmoid(z)
g = 1.0 ./ (1.0 + exp(-z));
end
Update2:
After an extended discussion with user davidhigh, who provided key insights, I have made the following changes:
x = [1;0;1];
y = [1;1;1];
theta1 =
4.7300 3.2800 1.4600
0 0 0
4.7300 3.2800 1.4600
theta2 =
8.8920 8.8920 8.8920
6.1670 6.1670 6.1670
2.7450 2.7450 2.7450
theta3 =
9.4460 6.5500 2.9160
9.3510 6.4850 2.8860
8.8360 6.1270 2.7270
The crux of my issue was that I wasn't feeding the output of each layer into the next layer. Once I made this change, I get a better result:
z1 = sigmoid(theta1 * x)
z1 =
0.9980
0.5000
0.9980
z2 = sigmoid(theta2 * z1)
z2 =
1.0000
1.0000
0.9989
z3 = sigmoid(theta3 * z2)
z3 =
1.0000
1.0000
1.0000
z3 is the predicted value, which is now exactly [1;1;1], whereas previously it was only approximately [1;1;1].
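The corrected forward pass is small enough to sketch outside MATLAB too; the following is just an illustrative Python/NumPy translation of the snippet above (same weights from Update2, same computation), not part of the original code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weight matrices from Update2
theta1 = np.array([[4.73, 3.28, 1.46],
                   [0.00, 0.00, 0.00],
                   [4.73, 3.28, 1.46]])
theta2 = np.array([[8.892, 8.892, 8.892],
                   [6.167, 6.167, 6.167],
                   [2.745, 2.745, 2.745]])
theta3 = np.array([[9.446, 6.550, 2.916],
                   [9.351, 6.485, 2.886],
                   [8.836, 6.127, 2.727]])

x = np.array([1.0, 0.0, 1.0])
z1 = sigmoid(theta1 @ x)   # layer 1 output feeds layer 2
z2 = sigmoid(theta2 @ z1)  # layer 2 output feeds layer 3
z3 = sigmoid(theta3 @ z2)  # final prediction, approximately [1, 1, 1]
```

The key point is the same as in the MATLAB fix: each `sigmoid(theta * ...)` output becomes the input of the next layer.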

Related

how to find all the possible intersections between two vectors with such organization in matlab

I have a matrix y and a vector x, and I need to find all the possible vectors that result from mapping each value in x into each vector in y.
That is difficult to understand, so let's explain it with an example. I have the vector x = [0.7 + 0.7i; 0.7 - 0.7i] and the matrix y = [1 0; 2 0; 1 2]. The resulting matrix is supposed to be Z = [0.7+0.7i 0; 0.7-0.7i 0; 0 0.7+0.7i; 0 0.7-0.7i; 0.7+0.7i 0.7+0.7i; 0.7+0.7i 0.7-0.7i; 0.7-0.7i 0.7-0.7i; 0.7-0.7i 0.7+0.7i], which is equivalent to Z = [x_1 0; x_2 0; 0 x_1; 0 x_2; x_1 x_1; x_1 x_2; x_2 x_2; x_2 x_1]. That means each value in x is mapped into a row of Z according to the index values in y.
Here is my attempt:
clear all; clc;
y = [];
G = 2;
v = 1:G;
for i = 1:G
    x = nchoosek(v, i);
    m = zeros(size(x,1), G-i);
    y = [y; x m]; % create the matrix y
end
x = [0.7 + 0.7i; 0.7 - 0.7i];
Z = []; s = zeros(G,1);
for k = 1:size(x,1)
    for i = 1:size(y,1)
        n = y(i,:);
        n = n(n ~= 0);
        s(n) = x(k);
        Z = [Z s];
        s = zeros(G,1);
    end
end
The problem in my code is that the matrix Z shows the inverse: it takes the input x_1 from x and then maps it into all possible positions from y. For example, the matrix Z starts with [x_1 0; 0 x_1; x_1 x_1 ...], but it should be the other way around, taking each value in x and mapping it as shown in the example above: [x_1 0; x_2 0; x_3 0 ...]. The second issue is that when y contains more than one non-zero value, my code cannot get all possible vectors; it can only get [x_1 x_1; x_2 x_2], but not the other possibilities such as [x_1 x_2; x_2 x_1], and so on.
How can I solve that issue?
UPDATE
Here is the updated question with a clearer description. I have the vector x and the matrix y, and I need to fill the matrix z following the indices taken from each row of matrix y. For example, if the first row in matrix y is [1 0] or [0 1], then I will take all possible values from x and put them in z in the column named by that row, which is 1 in this case. The same goes for row 2 of matrix y, which is [2 0] or [0 2]; it means the second column of z will be filled with all possible values from x.
Then both columns of z can be filled, which corresponds to the case [1 2] in y, so it will take the first value from x and pair it with all other possible values from x, and so on. The rows of z should not be repeated.
The matrix Z is exactly as shown in AboAmmar's answer below, but using if-else with a longer vector x and a bigger matrix y will get a little complicated.
As you describe it, there are 4 distinct cases for each row of y and the corresponding output:
[0 1] or [1 0] => [x 0]
[0 2] or [2 0] => [0 x]
[1 2] => [x1 x1; x1 x2; x2 x2; x2 x1]
[2 1] => [x1 x1; x2 x1; x2 x2; x1 x2]
These don't seem to follow any obvious rule. So, the easiest (but not smartest) solution is to use if-else and select the suitable case from the above. We don't have all the information about the possible indices, or whether rows like [1 1] and [2 2] might happen, so the following solution is by no means exhaustive; surprising errors might happen if other inputs are fed into the y matrix.
y = [];
G = 2;
v = 1:G;
for i = 1:G
    x = nchoosek(v, i);
    m = zeros(size(x,1), G-i);
    y = [y; x m]; % create the matrix y
end
Z = [];
x = [0.7 + 0.7i; 0.7 - 0.7i]
for i = 1:size(y,1)
    r = y(i,:);
    if ismember(r, [1 0; 0 1], 'rows')
        Z(end+1:end+2,:) = [x [0; 0]];
    elseif ismember(r, [2 0; 0 2], 'rows')
        Z(end+1:end+2,:) = [[0; 0] x];
    elseif ismember(r, [1 2], 'rows')
        Z(end+1:end+4,:) = [x(1) x(1); x(1) x(2); x(2) x(2); x(2) x(1)];
    elseif ismember(r, [2 1], 'rows')
        Z(end+1:end+4,:) = [x(1) x(1); x(2) x(1); x(2) x(2); x(1) x(2)];
    end
end
Z =
0.7000 + 0.7000i 0.0000 + 0.0000i
0.7000 - 0.7000i 0.0000 + 0.0000i
0.0000 + 0.0000i 0.7000 + 0.7000i
0.0000 + 0.0000i 0.7000 - 0.7000i
0.7000 + 0.7000i 0.7000 + 0.7000i
0.7000 + 0.7000i 0.7000 - 0.7000i
0.7000 - 0.7000i 0.7000 - 0.7000i
0.7000 - 0.7000i 0.7000 + 0.7000i
Your code is valid if the rows of y have a fixed number of non-zero entries, for example if each vector in y has one non-zero value and the rest are zeros, or two non-zero values, etc. So you can run your code for each length separately and then build the matrix Z by combining the resulting matrices.
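One way to avoid the if-else entirely is to enumerate the assignments generically. This is only a sketch, under two assumptions: each row of y lists the 1-based columns of Z to fill (0 meaning "unused slot"), and every ordered choice of values from x, with repetition, is wanted for those columns. The helper name `expand` is my own; Python is used here for illustration:

```python
from itertools import product

def expand(x, y):
    """For each row of y, fill the listed columns of a zero row with every
    ordered choice (with repetition) of values from x.

    Rows of y hold 1-based column indices; 0 means "unused slot"."""
    G = len(y[0])
    Z = []
    for row in y:
        cols = [c - 1 for c in row if c != 0]   # 1-based -> 0-based columns
        for values in product(x, repeat=len(cols)):
            z = [0] * G
            for c, v in zip(cols, values):
                z[c] = v
            Z.append(z)
    return Z

x = [0.7 + 0.7j, 0.7 - 0.7j]
y = [[1, 0], [2, 0], [1, 2]]
Z = expand(x, y)   # 2 + 2 + 4 = 8 rows, matching the desired output
```

Because the row length and the number of non-zero indices are never hard-coded, the same function handles longer x vectors and bigger y matrices without new branches.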

How to find transformation matrix from the output with Gaussian noise?

For the input and output given below, the matrix A can be found by the pseudoinverse or mrdivide in MATLAB. Similarly, I would now like to know how to determine A if my output signal matrix Y contains additive zero-mean, uncorrelated, Gaussian noise.
x1 = [1 1 1]';
x2 = [0 1 1]';
x3 = [0 0 1]';
x4 = [1 0 1]';
y1 = [1 2 0]';
y2 = [-1 0 3]';
y3 = [3 1 1]';
y4 = [5 3 -2]';
X = [x1 x2 x3 x4];
Y = [y1 y2 y3 y4];
A = Y/X
Also, I have modelled the unknown noisy output as below:
y1_n = y1 + sqrt(var(y1))*randn(size(y1));
y2_n = y2 + sqrt(var(y2))*randn(size(y2));
y3_n = y3 + sqrt(var(y3))*randn(size(y3));
y4_n = y4 + sqrt(var(y4))*randn(size(y4));
Y = [y1_n y2_n y3_n y4_n];
The statement A = Y/X solves the linear system of equations A*X = Y. If the system is overdetermined, as in your case, the solution given is the least squares solution. Thus, if you have additive, zero mean, uncorrelated, Gaussian noise on Y, then A = Y/X will give you the best possible, unbiased, estimate of A.
Note that the noise you add to your Y matrix is quite large, hence the estimate of A is far away from the ideal. If you add less noise, the estimate will be closer:
x1 = [1 1 1]';
x2 = [0 1 1]';
x3 = [0 0 1]';
x4 = [1 0 1]';
X = [x1 x2 x3 x4];
y1 = [1 2 0]';
y2 = [-1 0 3]';
y3 = [3 1 1]';
y4 = [5 3 -2]';
Y = [y1 y2 y3 y4];
for n = [1,0.1,0.01,0]
Y_n = Y + n*randn(size(Y));
A = Y_n/X;
fprintf('n = %f, A = \n',n)
disp(A)
end
Output:
n = 1.000000, A =
2.9728 -5.5407 2.8011
2.6563 -1.3166 0.6596
-3.3366 1.1349 1.5342
n = 0.100000, A =
2.0011 -4.0256 2.9402
1.9223 -1.0029 1.0921
-3.1383 1.9874 1.0913
n = 0.010000, A =
1.9903 -3.9912 2.9987
1.9941 -1.0001 1.0108
-3.0015 2.0001 1.0032
n = 0.000000, A =
2.0000 -4.0000 3.0000
2.0000 -1.0000 1.0000
-3.0000 2.0000 1.0000
Of course, if you make X and Y larger by adding more vectors, you'll get a better estimate too, and will be able to compensate for noisier data.
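The same least-squares estimate is easy to reproduce outside MATLAB as well. As an illustrative sketch, here is the noise-free case in Python/NumPy, where `np.linalg.lstsq` plays the role of mrdivide (the transposes appear because lstsq solves systems of the form X' * A' = Y'):

```python
import numpy as np

X = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [1, 1, 1, 1]], dtype=float)   # columns are x1..x4
Y = np.array([[1, -1, 3, 5],
              [2, 0, 1, 3],
              [0, 3, 1, -2]], dtype=float)  # columns are y1..y4

# Least-squares solution of A*X = Y, i.e. MATLAB's A = Y/X
A = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T

# Noisy variant, mirroring the loop above: the estimate approaches
# the true A as the noise level n shrinks.
rng = np.random.default_rng(0)
for n in [1, 0.1, 0.01, 0]:
    Y_n = Y + n * rng.standard_normal(Y.shape)
    A_n = np.linalg.lstsq(X.T, Y_n.T, rcond=None)[0].T
```

With this noise-free data the least-squares solution is exact, recovering the A shown in the n = 0 case above.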

Creating a matrix of 2D cosines waves with coefficients and variable number of entries

After I posted this question yesterday, I realized that I want to create similar matrices of different n x n dimensions with each entry of the form
a * cos(j * x + k * y)
where a is a vector of coefficients, and j, x, k, and y are indices from 0 to n - 1.
If, for instance, n = 4,
>> n = 4;
>> x = 0:(n-1);
>> y = 0:(n-1);
>> [x,y] = meshgrid(x,y)
x =
0 1 2 3
0 1 2 3
0 1 2 3
0 1 2 3
y =
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
The resultant matrix would have 16 entries, which could be computed by the function:
f = @(x, y,a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3)...
    a0*cos(0*x + 0*y) + a1*cos(0*x + 1*y) + ...
    a2*cos(0*x + 2*y) + a3*cos(0*x + 3*y) + ...
    b0*cos(1*x + 0*y) + b1*cos(1*x + 1*y) + ...
    b2*cos(1*x + 2*y) + b3*cos(1*x + 3*y) + ...
    c0*cos(2*x + 0*y) + c1*cos(2*x + 1*y) + ...
    c2*cos(2*x + 2*y) + c3*cos(2*x + 3*y) + ...
    d0*cos(3*x + 0*y) + d1*cos(3*x + 1*y) + ...
    d2*cos(3*x + 2*y) + d3*cos(3*x + 3*y)
Of course, aside from the need to furnish the coefficients in front of the cosines, typing all these cosine expressions is not doable if I want to generate a 256 x 256 matrix, for example...
I played with for-loops but didn't get what I am after, and got an error regarding the number of independent indexing loops within a function.
EDIT: I edited my initial answer, adding the idea given in Guille's comment. (Hadn't seen that in the first place...) Please see the updated code.
Smee again. You can combine anonymous functions / function handles like this:
f = @(x) sin(x);
g = @(x) cos(x);
h = @(x) f(x) + g(x);
Nevertheless, I guess, it's necessary to encapsulate the setup of your function (handle) f into some "real" MATLAB function, see the following code:
function f = setupF(n, a)
    % Possibly, add some checks, e.g. for numel(a) == n^2, and so on.
    % Initialize function handle.
    f = @(x, y) 0;
    ind = 0;
    % Iteratively add cosine parts.
    for ii = 0:(n-1)
        for jj = 0:(n-1)
            ind = ind + 1;
            g = @(x, y) a(ind) * cos(ii * x + jj * y);
            f = @(x, y) f(x, y) + g(x, y);
        end
    end
end
Here comes a test script:
% Set up parameters.
n = 3;
a = reshape(1:n^2, n, n);
% Set up f(x, y) by function.
f = setupF(n, a);
% Set up f explicitly, as g(x, y).
g = @(x, y) ...
a(1) * cos(0*x + 0*y) + ...
a(2) * cos(0*x + 1*y) + ...
a(3) * cos(0*x + 2*y) + ...
a(4) * cos(1*x + 0*y) + ...
a(5) * cos(1*x + 1*y) + ...
a(6) * cos(1*x + 2*y) + ...
a(7) * cos(2*x + 0*y) + ...
a(8) * cos(2*x + 1*y) + ...
a(9) * cos(2*x + 2*y);
% Set up f(x, y) by vectorization, as h(x, y).
I = 0:(n-1);
J = 0:(n-1);
[I, J] = meshgrid(I, J);
h = @(x, y, n, a) sum(reshape(a .* cos(x * I + y * J), n^2, 1));
h = @(x, y, n, a) arrayfun(@(x, y) h(x, y, n, a), x, y);
% Set up test data.
x = linspace(0, 2*pi, 5);
y = linspace(0, 2*pi, 5);
[X, Y] = meshgrid(x, y);
% Compare outputs.
fRet = f(X, Y)
gRet = g(X, Y)
hRet = h(X, Y, n, a)
And, the output:
fRet =
45.0000 -18.0000 15.0000 -18.0000 45.0000
-6.0000 -5.0000 -2.0000 5.0000 -6.0000
15.0000 -6.0000 5.0000 -6.0000 15.0000
-6.0000 5.0000 -2.0000 -5.0000 -6.0000
45.0000 -18.0000 15.0000 -18.0000 45.0000
gRet =
45.0000 -18.0000 15.0000 -18.0000 45.0000
-6.0000 -5.0000 -2.0000 5.0000 -6.0000
15.0000 -6.0000 5.0000 -6.0000 15.0000
-6.0000 5.0000 -2.0000 -5.0000 -6.0000
45.0000 -18.0000 15.0000 -18.0000 45.0000
hRet =
45.0000 -18.0000 15.0000 -18.0000 45.0000
-6.0000 -5.0000 -2.0000 5.0000 -6.0000
15.0000 -6.0000 5.0000 -6.0000 15.0000
-6.0000 5.0000 -2.0000 -5.0000 -6.0000
45.0000 -18.0000 15.0000 -18.0000 45.0000
And, of course, the "vectorization" approach wins in terms of performance.
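For what it's worth, the vectorized construction also translates directly to Python/NumPy. The helper name `setup_f` and the column-major reshape below are my own choices, picked to mirror the MATLAB code above (where a = reshape(1:n^2, n, n) makes a(k+1, j+1) multiply cos(j*x + k*y)):

```python
import numpy as np

def setup_f(n, a):
    """Build f(x, y) = sum over j, k in 0..n-1 of a[k, j] * cos(j*x + k*y).

    a is (n, n); a[k, j] multiplies cos(j*x + k*y), matching the
    column-major layout of MATLAB's reshape(1:n^2, n, n)."""
    j = np.arange(n)   # x-frequency index
    k = np.arange(n)   # y-frequency index

    def f(x, y):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # phase[k, j, m] = j*x[m] + k*y[m] for the m-th evaluation point
        phase = (j[None, :, None] * x.ravel()[None, None, :]
                 + k[:, None, None] * y.ravel()[None, None, :])
        # contract the coefficient array against the cosine stack
        out = np.tensordot(a, np.cos(phase), axes=([0, 1], [0, 1]))
        return out.reshape(np.shape(x))

    return f

n = 3
a = np.arange(1, n * n + 1).reshape(n, n, order="F")  # MATLAB reshape(1:9, 3, 3)
f = setup_f(n, a)

X, Y = np.meshgrid(np.linspace(0, 2 * np.pi, 5), np.linspace(0, 2 * np.pi, 5))
fRet = f(X, Y)   # same grid of values as fRet/gRet/hRet above
```

Because the phase array is built by broadcasting, no per-entry cosine expressions or anonymous-function chains are needed, even for n = 256.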

writing optimization constraints in MATLAB involving function calls

I am trying to solve the following optimization problem:
I am calculating θ1, θ2 and θ3 for every value of phi, discretized between 30 deg. and 150 deg., using the function below:
function thetas = inverse_kinematics_1(l1,l2,l3,phi)
    x = 100;
    y = 0;
    x1 = x - (l3*cos(phi));
    y1 = y - (l3*sin(phi));
    a = sqrt(x1^2 + y1^2);
    y2 = -y1/a;
    x2 = -x1/a;
    gamma = atan2(y2,x2);
    c = (-x1^2 - y1^2 - l1^2 + l2^2)/(2*l1*a);
    d = acos(c);
    theta1 = gamma + d;
    if theta1 < 0
        theta1 = theta1 + 2*pi;
    end
    e = (y1 - l1*sin(theta1))/l2;
    f = (x1 - l1*cos(theta1))/l2;
    theta2 = atan2(e,f) - theta1;
    if theta2 < 0
        theta2 = theta2 + 2*pi;
    end
    theta3 = phi - (theta1 + theta2);
    if theta3 < 0
        theta3 = theta3 + 2*pi;
    end
    thetas = [theta1,theta2,theta3].*180/pi;
end
How can I write the constraints in this situation?
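Before writing formal constraints, it can help to see where the inverse kinematics even has a solution over the discretized phi range. The sketch below is an illustrative Python/NumPy translation of the function above (the name `inverse_kinematics` is mine, and this is not a full optimization setup); unreachable poses show up as NaN from the acos:

```python
import numpy as np

def inverse_kinematics(l1, l2, l3, phi):
    """Python translation of inverse_kinematics_1 (phi in radians,
    returned joint angles in degrees), for the target point (100, 0)."""
    x, y = 100.0, 0.0
    x1 = x - l3 * np.cos(phi)
    y1 = y - l3 * np.sin(phi)
    a = np.hypot(x1, y1)
    gamma = np.arctan2(-y1 / a, -x1 / a)
    c = (-x1**2 - y1**2 - l1**2 + l2**2) / (2 * l1 * a)
    d = np.arccos(c)            # NaN if the pose is unreachable (|c| > 1)
    theta1 = gamma + d
    if theta1 < 0:
        theta1 += 2 * np.pi
    e = (y1 - l1 * np.sin(theta1)) / l2
    f = (x1 - l1 * np.cos(theta1)) / l2
    theta2 = np.arctan2(e, f) - theta1
    if theta2 < 0:
        theta2 += 2 * np.pi
    theta3 = phi - (theta1 + theta2)
    if theta3 < 0:
        theta3 += 2 * np.pi
    return np.degrees([theta1, theta2, theta3])

# Evaluate over the discretized phi range; a constraint function could then,
# e.g., reject NaN rows or bound the joint angles. Link lengths are made up.
phis = np.radians(np.linspace(30, 150, 13))
with np.errstate(invalid="ignore"):
    sols = np.array([inverse_kinematics(60, 60, 40, p) for p in phis])
```

A useful sanity check on the translation is forward kinematics: for a reachable phi, the three joint angles must place the end effector back at (100, 0) with orientation phi.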

XOR with Neural Networks (Matlab)

So, I'm hoping this is a real dumb thing I'm doing, and that there's an easy answer. I'm trying to train a 2x3x1 neural network to do the XOR problem. It wasn't working, so I decided to dig in to see what was happening. Finally, I decided to assign the weights myself. These were the weights I came up with:
theta1 = [11 0 -5; 0 12 -7;18 17 -20];
theta2 = [14 13 -28 -6];
(in MATLAB notation). I deliberately tried to make sure no two weights were the same (barring the zeros).
And my code, really simple in MATLAB, is:
function layer2 = xornn(iters)
    if nargin < 1
        iters = 50
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
    theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = 1:iters
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1)
        delta2 = T - layer2;
        delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
        % remove the bias from delta1; there's no real point in a delta on the bias.
        delta1 = delta1(1:3,:);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        theta1 = theta1 - 0.1 * theta1d;
        theta2 = theta2 - 0.1 * theta2d;
    end
end
I believe that's right. I tested various parameters (of the thetas) with the finite differences method to see if they were right, and they seemed to be.
But, when I run it, it eventually just all boils down to returning all zeros. If I do xornn(1) (for 1 iteration) I get
0.0027 0.9966 0.9904 0.0008
But, if I do xornn(35)
0.0026 0.9949 0.9572 0.0007
(It's started a descent in the wrong direction) and by the time I get to xornn(45) I get
0.0018 0.0975 0.0000 0.0003
If I run it for 10,000 iterations, it just returns all 0's.
What is going on? Must I add regularization? I would have thought such a simple network wouldn't need it. But, regardless, why does it move away from an obvious good solution that I have hand fed it?
Thanks!
AAARRGGHHH! The solution was simply a matter of changing
theta1 = theta1 - 0.1 * theta1d;
theta2 = theta2 - 0.1 * theta2d;
to
theta1 = theta1 + 0.1 * theta1d;
theta2 = theta2 + 0.1 * theta2d;
sigh
Now, though, I need to figure out how I was computing the negative derivative when what I thought I was computing was the ... Never mind. I'll post here anyway, just in case it helps someone else.
So, z is the sum of inputs to the sigmoid, and y is the output of the sigmoid.
C     = -(T*log(y) + (1-T)*log(1-y))
dC/dy = -((T/y) - (1-T)/(1-y))
      = -((T(1-y) - y(1-T))/(y(1-y)))
      = -((T - Ty - y + Ty)/(y(1-y)))
      = -((T - y)/(y(1-y)))
      = (y - T)/(y(1-y))       % This is the source of all my woes.
dy/dz = y(1-y)
dC/dz = ((y - T)/(y(1-y))) * y(1-y)
      = y - T
So, the problem is that I was accidentally computing T-y, because I forgot about the negative sign in front of the cost function. Then I was subtracting what I thought was the gradient, but which was in fact the negative gradient. And there, that's the problem.
Once I did that:
function layer2 = xornn(iters)
    if nargin < 1
        iters = 50
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
    theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = 1:iters
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1)
        delta2 = T - layer2;
        delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
        % remove the bias from delta1; there's no real point in a delta on the bias.
        delta1 = delta1(1:3,:);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        theta1 = theta1 + 0.1 * theta1d;
        theta2 = theta2 + 0.1 * theta2d;
    end
end
xornn(50) returns 0.0028 0.9972 0.9948 0.0009 and
xornn(10000) returns 0.0016 0.9989 0.9993 0.0005
Phew! Maybe this will help someone else debug their version.
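For anyone replaying this outside MATLAB, here is an illustrative Python/NumPy transcription of the corrected version (same hand-picked weights, same '+' update; since delta2 = T - layer2 already carries the minus sign from dC/dz = y - T, adding the products is gradient descent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xornn(iters=50):
    T = np.array([[0.0, 1.0, 1.0, 0.0]])
    X = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.0, 1.0, 0.0, 1.0],
                  [1.0, 1.0, 1.0, 1.0]])      # last row is the bias input
    theta1 = np.array([[11.0, 0.0, -5.0],
                       [0.0, 12.0, -7.0],
                       [18.0, 17.0, -20.0]])
    theta2 = np.array([[14.0, 13.0, -28.0, -6.0]])
    for _ in range(iters):
        layer1 = np.vstack([sigmoid(theta1 @ X), np.ones((1, 4))])
        layer2 = sigmoid(theta2 @ layer1)
        delta2 = T - layer2                    # -(dC/dz) at the output
        delta1 = layer1 * (1 - layer1) * (theta2.T @ delta2)
        delta1 = delta1[:3, :]                 # no delta for the bias row
        theta1 = theta1 + 0.1 * (delta1 @ X.T)
        theta2 = theta2 + 0.1 * (delta2 @ layer1.T)
    return layer2
```

With the '+' sign the network converges toward [0 1 1 0], just like the MATLAB runs above; flipping it back to '-' reproduces the collapse to all zeros.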