Is this a correct vector implementation of gradient descent for multiple theta values? - MATLAB

This is my MATLAB code to predict [1;1;1] given [1;0;1]:
m = 1;
alpha = .00001;
x = [1;0;1;0;0;0;0;0;0];
y = [1;1;1;0;0;0;0;0;0];
theta1 = [4.7300;3.2800;1.4600;0;0;0;4.7300;3.2800;1.4600];
theta1 = theta1 - (alpha/m .* (x .* theta1-y)' * x)'
theta1 = reshape(theta1(1:9) , 3 , 3)
sigmoid(theta1 * [1; 0; 1])
x = [1;0;1;0;0;0;0;0;0];
y = [1; 1; 1;0;0;0;0;0;0];
theta2 = [8.892;6.167;2.745;8.892;6.167;2.745;8.892;6.167;2.745];
theta2 = theta2 - (alpha/m .* (x .* theta2-y)' * x)'
theta2 = reshape(theta2(1:9) , 3 , 3)
sigmoid(theta2 * [1; 0; 1])
x = [1;0;1;0;0;0;0;0;0];
y = [1; 1; 1;0;0;0;0;0;0];
theta3 = [9.446;6.55;2.916;9.351;6.485;2.886;8.836;6.127;2.727];
theta3 = theta3 - (alpha/m .* (x .* theta3-y)' * x)'
theta3 = reshape(theta3(1:9) , 3 , 3)
sigmoid(theta3 * [1; 0; 1])
I'm computing theta1, theta2, and theta3 individually, but I think they should be linked between each computation?
Though gradient descent appears to be working:
sigmoid(theta1 * [1; 0; 1]) =
0.9999
0.9986
0.9488
sigmoid(theta2 * [1; 0; 1]) =
1.0000
1.0000
0.9959
sigmoid(theta3 * [1; 0; 1]) =
1.0000
1.0000
0.9965
This shows that for each theta value (a layer in the network) the prediction is moving closer to [1;1;1].
Update: sigmoid function:
function g = sigmoid(z)
g = 1.0 ./ (1.0 + exp(-z));
end
Update2:
After an extended discussion with user davidhigh, who provided key insights, I have made the following changes:
x = [1;0;1];
y = [1;1;1];
theta1 =
4.7300 3.2800 1.4600
0 0 0
4.7300 3.2800 1.4600
theta2 =
8.8920 8.8920 8.8920
6.1670 6.1670 6.1670
2.7450 2.7450 2.7450
theta3 =
9.4460 6.5500 2.9160
9.3510 6.4850 2.8860
8.8360 6.1270 2.7270
The crux of my issue was that I wasn't feeding the output of each layer into the next layer. Once I made this change, I get a better result:
z1 = sigmoid(theta1 * x)
z1 =
0.9980
0.5000
0.9980
z2 = sigmoid(theta2 * z1)
z2 =
1.0000
1.0000
0.9989
z3 = sigmoid(theta3 * z2)
z3 =
1.0000
1.0000
1.0000
z3 is the predicted value, which is now exactly [1;1;1], whereas previously it was only approximately [1;1;1].
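The corrected forward pass is small enough to sketch outside MATLAB too; the following is just an illustrative Python/NumPy translation of the snippet above (same weights from Update2, same computation), not part of the original code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weight matrices from Update2
theta1 = np.array([[4.73, 3.28, 1.46],
                   [0.00, 0.00, 0.00],
                   [4.73, 3.28, 1.46]])
theta2 = np.array([[8.892, 8.892, 8.892],
                   [6.167, 6.167, 6.167],
                   [2.745, 2.745, 2.745]])
theta3 = np.array([[9.446, 6.550, 2.916],
                   [9.351, 6.485, 2.886],
                   [8.836, 6.127, 2.727]])

x = np.array([1.0, 0.0, 1.0])
z1 = sigmoid(theta1 @ x)   # layer 1 output feeds layer 2
z2 = sigmoid(theta2 @ z1)  # layer 2 output feeds layer 3
z3 = sigmoid(theta3 @ z2)  # final prediction, approximately [1, 1, 1]
```

The key point is the same as in the MATLAB fix: each `sigmoid(theta * ...)` output becomes the input of the next layer.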

Related

how to find all the possible intersections between two vectors with such organization in matlab

I have a matrix y and a vector x, and I need to find all the possible vectors that result from mapping each value in x into each vector in y.
That is difficult to understand, so let's explain it with an example. I have the vector x = [0.7 + 0.7i; 0.7 - 0.7i] and the matrix y = [1 0; 2 0; 1 2]. The resulting matrix is supposed to be Z = [0.7+0.7i 0; 0.7-0.7i 0; 0 0.7+0.7i; 0 0.7-0.7i; 0.7+0.7i 0.7+0.7i; 0.7+0.7i 0.7-0.7i; 0.7-0.7i 0.7-0.7i; 0.7-0.7i 0.7+0.7i], which is equivalent to Z = [x_1 0; x_2 0; 0 x_1; 0 x_2; x_1 x_1; x_1 x_2; x_2 x_2; x_2 x_1]. That means each value in x is mapped into a row of Z according to the index values in y.
Here is my attempt:
clear all; clc;
y = [];
G = 2;
v = 1:G;
for i = 1:G
    x = nchoosek(v, i);
    m = zeros(size(x,1), G-i);
    y = [y; x m]; % create the matrix y
end
x = [0.7 + 0.7i; 0.7 - 0.7i];
Z = []; s = zeros(G,1);
for k = 1:size(x,1)
    for i = 1:size(y,1)
        n = y(i,:);
        n = n(n ~= 0);
        s(n) = x(k);
        Z = [Z s];
        s = zeros(G,1);
    end
end
The problem in my code is that the matrix Z shows the inverse: it takes the input x_1 from x and then maps it into all possible positions from y. For example, the matrix Z starts with [x_1 0; 0 x_1; x_1 x_1 ...], but it should be the other way around, taking each value in x and mapping it as shown in the example above: [x_1 0; x_2 0; x_3 0 ...]. The second issue is that when y contains more than one non-zero value, my code cannot get all possible vectors; it can only get [x_1 x_1; x_2 x_2], but not the other possibilities such as [x_1 x_2; x_2 x_1], and so on.
How can I solve that issue?
UPDATE
Here is the updated question with a clearer description. I have the vector x and the matrix y, and I need to fill the matrix z following the indices taken from each row of matrix y. For example, if the first row in matrix y is [1 0] or [0 1], then I will take all possible values from x and put them in z in the column named by that row, which is 1 in this case. The same goes for row 2 of matrix y, which is [2 0] or [0 2]; it means the second column of z will be filled with all possible values from x.
Then both columns of z can be filled, which corresponds to the case [1 2] in y, so it will take the first value from x and pair it with all other possible values from x, and so on. The rows of z should not be repeated.
The matrix Z is exactly as shown in AboAmmar's answer below, but using if-else with a longer vector x and a bigger matrix y will get a little complicated.
As you describe it, there are 4 distinct cases for each row of y and the corresponding output:
[0 1] or [1 0] => [x 0]
[0 2] or [2 0] => [0 x]
[1 2] => [x1 x1; x1 x2; x2 x2; x2 x1]
[2 1] => [x1 x1; x2 x1; x2 x2; x1 x2]
These don't seem to follow any obvious rule. So, the easiest (but not smartest) solution is to use if-else and select the suitable case from the above. We don't have all the information about the possible indices, or whether rows like [1 1] and [2 2] might happen, so the following solution is by no means exhaustive; surprising errors might happen if other inputs are fed into the y matrix.
y = [];
G = 2;
v = 1:G;
for i = 1:G
    x = nchoosek(v, i);
    m = zeros(size(x,1), G-i);
    y = [y; x m]; % create the matrix y
end
Z = [];
x = [0.7 + 0.7i; 0.7 - 0.7i]
for i = 1:size(y,1)
    r = y(i,:);
    if ismember(r, [1 0; 0 1], 'rows')
        Z(end+1:end+2,:) = [x [0; 0]];
    elseif ismember(r, [2 0; 0 2], 'rows')
        Z(end+1:end+2,:) = [[0; 0] x];
    elseif ismember(r, [1 2], 'rows')
        Z(end+1:end+4,:) = [x(1) x(1); x(1) x(2); x(2) x(2); x(2) x(1)];
    elseif ismember(r, [2 1], 'rows')
        Z(end+1:end+4,:) = [x(1) x(1); x(2) x(1); x(2) x(2); x(1) x(2)];
    end
end
Z =
0.7000 + 0.7000i 0.0000 + 0.0000i
0.7000 - 0.7000i 0.0000 + 0.0000i
0.0000 + 0.0000i 0.7000 + 0.7000i
0.0000 + 0.0000i 0.7000 - 0.7000i
0.7000 + 0.7000i 0.7000 + 0.7000i
0.7000 + 0.7000i 0.7000 - 0.7000i
0.7000 - 0.7000i 0.7000 - 0.7000i
0.7000 - 0.7000i 0.7000 + 0.7000i
Your code is valid if the rows of y have a fixed number of non-zero entries, for example if each vector in y has one non-zero value and the rest are zeros, or two non-zero values, etc. So you can run your code for each length separately and then build the matrix Z by combining the resulting matrices.
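One way to avoid the if-else entirely is to enumerate the assignments generically. This is only a sketch, under two assumptions: each row of y lists the 1-based columns of Z to fill (0 meaning "unused slot"), and every ordered choice of values from x, with repetition, is wanted for those columns. The helper name `expand` is my own; Python is used here for illustration:

```python
from itertools import product

def expand(x, y):
    """For each row of y, fill the listed columns of a zero row with every
    ordered choice (with repetition) of values from x.

    Rows of y hold 1-based column indices; 0 means "unused slot"."""
    G = len(y[0])
    Z = []
    for row in y:
        cols = [c - 1 for c in row if c != 0]   # 1-based -> 0-based columns
        for values in product(x, repeat=len(cols)):
            z = [0] * G
            for c, v in zip(cols, values):
                z[c] = v
            Z.append(z)
    return Z

x = [0.7 + 0.7j, 0.7 - 0.7j]
y = [[1, 0], [2, 0], [1, 2]]
Z = expand(x, y)   # 2 + 2 + 4 = 8 rows, matching the desired output
```

Because the row length and the number of non-zero indices are never hard-coded, the same function handles longer x vectors and bigger y matrices without new branches.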

How to find transformation matrix from the output with Gaussian noise?

For the input and output given below, the matrix A can be found by the pseudoinverse or mrdivide in MATLAB. Similarly, I would now like to know how to determine A if my output signal matrix Y contains additive zero-mean, uncorrelated, Gaussian noise.
x1 = [1 1 1]';
x2 = [0 1 1]';
x3 = [0 0 1]';
x4 = [1 0 1]';
y1 = [1 2 0]';
y2 = [-1 0 3]';
y3 = [3 1 1]';
y4 = [5 3 -2]';
X = [x1 x2 x3 x4];
Y = [y1 y2 y3 y4];
A = Y/X
Also, I have modelled the unknown noisy output as below:
y1_n = y1 + sqrt(var(y1))*randn(size(y1));
y2_n = y2 + sqrt(var(y2))*randn(size(y2));
y3_n = y3 + sqrt(var(y3))*randn(size(y3));
y4_n = y4 + sqrt(var(y4))*randn(size(y4));
Y = [y1_n y2_n y3_n y4_n];
The statement A = Y/X solves the linear system of equations A*X = Y. If the system is overdetermined, as in your case, the solution given is the least squares solution. Thus, if you have additive, zero mean, uncorrelated, Gaussian noise on Y, then A = Y/X will give you the best possible, unbiased, estimate of A.
Note that the noise you add to your Y matrix is quite large, hence the estimate of A is far away from the ideal. If you add less noise, the estimate will be closer:
x1 = [1 1 1]';
x2 = [0 1 1]';
x3 = [0 0 1]';
x4 = [1 0 1]';
X = [x1 x2 x3 x4];
y1 = [1 2 0]';
y2 = [-1 0 3]';
y3 = [3 1 1]';
y4 = [5 3 -2]';
Y = [y1 y2 y3 y4];
for n = [1,0.1,0.01,0]
Y_n = Y + n*randn(size(Y));
A = Y_n/X;
fprintf('n = %f, A = \n',n)
disp(A)
end
Output:
n = 1.000000, A =
2.9728 -5.5407 2.8011
2.6563 -1.3166 0.6596
-3.3366 1.1349 1.5342
n = 0.100000, A =
2.0011 -4.0256 2.9402
1.9223 -1.0029 1.0921
-3.1383 1.9874 1.0913
n = 0.010000, A =
1.9903 -3.9912 2.9987
1.9941 -1.0001 1.0108
-3.0015 2.0001 1.0032
n = 0.000000, A =
2.0000 -4.0000 3.0000
2.0000 -1.0000 1.0000
-3.0000 2.0000 1.0000
Of course, if you make X and Y larger by adding more vectors, you'll get a better estimate too, and will be able to compensate for noisier data.
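The same least-squares estimate is easy to reproduce outside MATLAB as well. As an illustrative sketch, here is the noise-free case in Python/NumPy, where `np.linalg.lstsq` plays the role of mrdivide (the transposes appear because lstsq solves systems of the form X' * A' = Y'):

```python
import numpy as np

X = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [1, 1, 1, 1]], dtype=float)   # columns are x1..x4
Y = np.array([[1, -1, 3, 5],
              [2, 0, 1, 3],
              [0, 3, 1, -2]], dtype=float)  # columns are y1..y4

# Least-squares solution of A*X = Y, i.e. MATLAB's A = Y/X
A = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T

# Noisy variant, mirroring the loop above: the estimate approaches
# the true A as the noise level n shrinks.
rng = np.random.default_rng(0)
for n in [1, 0.1, 0.01, 0]:
    Y_n = Y + n * rng.standard_normal(Y.shape)
    A_n = np.linalg.lstsq(X.T, Y_n.T, rcond=None)[0].T
```

With this noise-free data the least-squares solution is exact, recovering the A shown in the n = 0 case above.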

Creating a matrix of 2D cosines waves with coefficients and variable number of entries

After I posted this question yesterday, I realized that I want to create similar matrices of different n x n dimensions with each entry of the form
a * cos(j * x + k * y)
where a is a vector of coefficients, and j, x, k, and y are indices from 0 to n - 1.
If, for instance, n = 4,
>> n = 4;
>> x = 0:(n-1);
>> y = 0:(n-1);
>> [x,y] = meshgrid(x,y)
x =
0 1 2 3
0 1 2 3
0 1 2 3
0 1 2 3
y =
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
The resultant matrix would have 16 entries, which could be computed by the function:
f = @(x, y,a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3)...
    a0*cos(0*x + 0*y) + a1*cos(0*x + 1*y) + ...
    a2*cos(0*x + 2*y) + a3*cos(0*x + 3*y) + ...
    b0*cos(1*x + 0*y) + b1*cos(1*x + 1*y) + ...
    b2*cos(1*x + 2*y) + b3*cos(1*x + 3*y) + ...
    c0*cos(2*x + 0*y) + c1*cos(2*x + 1*y) + ...
    c2*cos(2*x + 2*y) + c3*cos(2*x + 3*y) + ...
    d0*cos(3*x + 0*y) + d1*cos(3*x + 1*y) + ...
    d2*cos(3*x + 2*y) + d3*cos(3*x + 3*y)
Of course, aside from the need to furnish the coefficients in front of the cosines, typing all these cosine expressions is not doable if I want to generate a 256 x 256 matrix, for example...
I played with for-loops but didn't get what I am after, and got an error regarding the number of independent indexing loops within a function.
EDIT: I edited my initial answer, adding the idea given in Guille's comment. (Hadn't seen that in the first place...) Please see the updated code.
Smee again. You can combine anonymous functions / function handles like this:
f = @(x) sin(x);
g = @(x) cos(x);
h = @(x) f(x) + g(x);
Nevertheless, I guess, it's necessary to encapsulate the setup of your function (handle) f into some "real" MATLAB function, see the following code:
function f = setupF(n, a)
    % Possibly, add some checks, e.g. for numel(a) == n^2, and so on.
    % Initialize function handle.
    f = @(x, y) 0;
    ind = 0;
    % Iteratively add cosine parts.
    for ii = 0:(n-1)
        for jj = 0:(n-1)
            ind = ind + 1;
            g = @(x, y) a(ind) * cos(ii * x + jj * y);
            f = @(x, y) f(x, y) + g(x, y);
        end
    end
end
Here comes a test script:
% Set up parameters.
n = 3;
a = reshape(1:n^2, n, n);
% Set up f(x, y) by function.
f = setupF(n, a);
% Set up f explicitly, as g(x, y).
g = @(x, y) ...
a(1) * cos(0*x + 0*y) + ...
a(2) * cos(0*x + 1*y) + ...
a(3) * cos(0*x + 2*y) + ...
a(4) * cos(1*x + 0*y) + ...
a(5) * cos(1*x + 1*y) + ...
a(6) * cos(1*x + 2*y) + ...
a(7) * cos(2*x + 0*y) + ...
a(8) * cos(2*x + 1*y) + ...
a(9) * cos(2*x + 2*y);
% Set up f(x, y) by vectorization, as h(x, y).
I = 0:(n-1);
J = 0:(n-1);
[I, J] = meshgrid(I, J);
h = @(x, y, n, a) sum(reshape(a .* cos(x * I + y * J), n^2, 1));
h = @(x, y, n, a) arrayfun(@(x, y) h(x, y, n, a), x, y);
% Set up test data.
x = linspace(0, 2*pi, 5);
y = linspace(0, 2*pi, 5);
[X, Y] = meshgrid(x, y);
% Compare outputs.
fRet = f(X, Y)
gRet = g(X, Y)
hRet = h(X, Y, n, a)
And, the output:
fRet =
45.0000 -18.0000 15.0000 -18.0000 45.0000
-6.0000 -5.0000 -2.0000 5.0000 -6.0000
15.0000 -6.0000 5.0000 -6.0000 15.0000
-6.0000 5.0000 -2.0000 -5.0000 -6.0000
45.0000 -18.0000 15.0000 -18.0000 45.0000
gRet =
45.0000 -18.0000 15.0000 -18.0000 45.0000
-6.0000 -5.0000 -2.0000 5.0000 -6.0000
15.0000 -6.0000 5.0000 -6.0000 15.0000
-6.0000 5.0000 -2.0000 -5.0000 -6.0000
45.0000 -18.0000 15.0000 -18.0000 45.0000
hRet =
45.0000 -18.0000 15.0000 -18.0000 45.0000
-6.0000 -5.0000 -2.0000 5.0000 -6.0000
15.0000 -6.0000 5.0000 -6.0000 15.0000
-6.0000 5.0000 -2.0000 -5.0000 -6.0000
45.0000 -18.0000 15.0000 -18.0000 45.0000
And, of course, the "vectorization" approach wins in terms of performance.
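For what it's worth, the vectorized construction also translates directly to Python/NumPy. The helper name `setup_f` and the column-major reshape below are my own choices, picked to mirror the MATLAB code above (where a = reshape(1:n^2, n, n) makes a(k+1, j+1) multiply cos(j*x + k*y)):

```python
import numpy as np

def setup_f(n, a):
    """Build f(x, y) = sum over j, k in 0..n-1 of a[k, j] * cos(j*x + k*y).

    a is (n, n); a[k, j] multiplies cos(j*x + k*y), matching the
    column-major layout of MATLAB's reshape(1:n^2, n, n)."""
    j = np.arange(n)   # x-frequency index
    k = np.arange(n)   # y-frequency index

    def f(x, y):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # phase[k, j, m] = j*x[m] + k*y[m] for the m-th evaluation point
        phase = (j[None, :, None] * x.ravel()[None, None, :]
                 + k[:, None, None] * y.ravel()[None, None, :])
        # contract the coefficient array against the cosine stack
        out = np.tensordot(a, np.cos(phase), axes=([0, 1], [0, 1]))
        return out.reshape(np.shape(x))

    return f

n = 3
a = np.arange(1, n * n + 1).reshape(n, n, order="F")  # MATLAB reshape(1:9, 3, 3)
f = setup_f(n, a)

X, Y = np.meshgrid(np.linspace(0, 2 * np.pi, 5), np.linspace(0, 2 * np.pi, 5))
fRet = f(X, Y)   # same grid of values as fRet/gRet/hRet above
```

Because the phase array is built by broadcasting, no per-entry cosine expressions or anonymous-function chains are needed, even for n = 256.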

writing optimization constraints in MATLAB involving function calls

I am trying to solve the following optimization problem:
I am calculating θ1, θ2 and θ3 for every value of phi, discretized between 30 deg. and 150 deg., using the function below:
function thetas = inverse_kinematics_1(l1,l2,l3,phi)
    x = 100;
    y = 0;
    x1 = x - (l3*cos(phi));
    y1 = y - (l3*sin(phi));
    a = sqrt(x1^2 + y1^2);
    y2 = -y1/a;
    x2 = -x1/a;
    gamma = atan2(y2,x2);
    c = (-x1^2 - y1^2 - l1^2 + l2^2)/(2*l1*a);
    d = acos(c);
    theta1 = gamma + d;
    if theta1 < 0
        theta1 = theta1 + 2*pi;
    end
    e = (y1 - l1*sin(theta1))/l2;
    f = (x1 - l1*cos(theta1))/l2;
    theta2 = atan2(e,f) - theta1;
    if theta2 < 0
        theta2 = theta2 + 2*pi;
    end
    theta3 = phi - (theta1 + theta2);
    if theta3 < 0
        theta3 = theta3 + 2*pi;
    end
    thetas = [theta1,theta2,theta3].*180/pi;
end
How can I write the constraints in this situation?
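Before writing formal constraints, it can help to see where the inverse kinematics even has a solution over the discretized phi range. The sketch below is an illustrative Python/NumPy translation of the function above (the name `inverse_kinematics` is mine, and this is not a full optimization setup); unreachable poses show up as NaN from the acos:

```python
import numpy as np

def inverse_kinematics(l1, l2, l3, phi):
    """Python translation of inverse_kinematics_1 (phi in radians,
    returned joint angles in degrees), for the target point (100, 0)."""
    x, y = 100.0, 0.0
    x1 = x - l3 * np.cos(phi)
    y1 = y - l3 * np.sin(phi)
    a = np.hypot(x1, y1)
    gamma = np.arctan2(-y1 / a, -x1 / a)
    c = (-x1**2 - y1**2 - l1**2 + l2**2) / (2 * l1 * a)
    d = np.arccos(c)            # NaN if the pose is unreachable (|c| > 1)
    theta1 = gamma + d
    if theta1 < 0:
        theta1 += 2 * np.pi
    e = (y1 - l1 * np.sin(theta1)) / l2
    f = (x1 - l1 * np.cos(theta1)) / l2
    theta2 = np.arctan2(e, f) - theta1
    if theta2 < 0:
        theta2 += 2 * np.pi
    theta3 = phi - (theta1 + theta2)
    if theta3 < 0:
        theta3 += 2 * np.pi
    return np.degrees([theta1, theta2, theta3])

# Evaluate over the discretized phi range; a constraint function could then,
# e.g., reject NaN rows or bound the joint angles. Link lengths are made up.
phis = np.radians(np.linspace(30, 150, 13))
with np.errstate(invalid="ignore"):
    sols = np.array([inverse_kinematics(60, 60, 40, p) for p in phis])
```

A useful sanity check on the translation is forward kinematics: for a reachable phi, the three joint angles must place the end effector back at (100, 0) with orientation phi.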

XOR with Neural Networks (Matlab)

So, I'm hoping this is a real dumb thing I'm doing, and that there's an easy answer. I'm trying to train a 2x3x1 neural network to do the XOR problem. It wasn't working, so I decided to dig in to see what was happening. Finally, I decided to assign the weights myself. These were the weights I came up with:
theta1 = [11 0 -5; 0 12 -7;18 17 -20];
theta2 = [14 13 -28 -6];
(in MATLAB notation). I deliberately tried to make sure no two weights were the same (barring the zeros).
And my code, really simple in MATLAB, is:
function layer2 = xornn(iters)
    if nargin < 1
        iters = 50
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
    theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = 1:iters
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1)
        delta2 = T - layer2;
        delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
        % remove the bias from delta1; there's no real point in a delta on the bias.
        delta1 = delta1(1:3,:);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        theta1 = theta1 - 0.1 * theta1d;
        theta2 = theta2 - 0.1 * theta2d;
    end
end
I believe that's right. I tested various parameters (of the thetas) with the finite differences method to see if they were right, and they seemed to be.
But, when I run it, it eventually just all boils down to returning all zeros. If I do xornn(1) (for 1 iteration) I get
0.0027 0.9966 0.9904 0.0008
But, if I do xornn(35)
0.0026 0.9949 0.9572 0.0007
(It's started a descent in the wrong direction) and by the time I get to xornn(45) I get
0.0018 0.0975 0.0000 0.0003
If I run it for 10,000 iterations, it just returns all 0's.
What is going on? Must I add regularization? I would have thought such a simple network wouldn't need it. But, regardless, why does it move away from an obvious good solution that I have hand fed it?
Thanks!
AAARRGGHHH! The solution was simply a matter of changing
theta1 = theta1 - 0.1 * theta1d;
theta2 = theta2 - 0.1 * theta2d;
to
theta1 = theta1 + 0.1 * theta1d;
theta2 = theta2 + 0.1 * theta2d;
sigh
Now, though, I need to figure out how I was computing the negative derivative when what I thought I was computing was the ... Never mind. I'll post here anyway, just in case it helps someone else.
So, z is the sum of inputs to the sigmoid, and y is the output of the sigmoid.
C     = -(T*log(y) + (1-T)*log(1-y))
dC/dy = -((T/y) - (1-T)/(1-y))
      = -((T(1-y) - y(1-T))/(y(1-y)))
      = -((T - Ty - y + Ty)/(y(1-y)))
      = -((T - y)/(y(1-y)))
      = (y - T)/(y(1-y))       % This is the source of all my woes.
dy/dz = y(1-y)
dC/dz = ((y - T)/(y(1-y))) * y(1-y)
      = y - T
So, the problem is that I was accidentally computing T-y, because I forgot about the negative sign in front of the cost function. Then I was subtracting what I thought was the gradient, but which was in fact the negative gradient. And there, that's the problem.
Once I did that:
function layer2 = xornn(iters)
    if nargin < 1
        iters = 50
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
    theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = 1:iters
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1)
        delta2 = T - layer2;
        delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
        % remove the bias from delta1; there's no real point in a delta on the bias.
        delta1 = delta1(1:3,:);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        theta1 = theta1 + 0.1 * theta1d;
        theta2 = theta2 + 0.1 * theta2d;
    end
end
xornn(50) returns 0.0028 0.9972 0.9948 0.0009 and
xornn(10000) returns 0.0016 0.9989 0.9993 0.0005
Phew! Maybe this will help someone else debug their version.
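For anyone replaying this outside MATLAB, here is an illustrative Python/NumPy transcription of the corrected version (same hand-picked weights, same '+' update; since delta2 = T - layer2 already carries the minus sign from dC/dz = y - T, adding the products is gradient descent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xornn(iters=50):
    T = np.array([[0.0, 1.0, 1.0, 0.0]])
    X = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.0, 1.0, 0.0, 1.0],
                  [1.0, 1.0, 1.0, 1.0]])      # last row is the bias input
    theta1 = np.array([[11.0, 0.0, -5.0],
                       [0.0, 12.0, -7.0],
                       [18.0, 17.0, -20.0]])
    theta2 = np.array([[14.0, 13.0, -28.0, -6.0]])
    for _ in range(iters):
        layer1 = np.vstack([sigmoid(theta1 @ X), np.ones((1, 4))])
        layer2 = sigmoid(theta2 @ layer1)
        delta2 = T - layer2                    # -(dC/dz) at the output
        delta1 = layer1 * (1 - layer1) * (theta2.T @ delta2)
        delta1 = delta1[:3, :]                 # no delta for the bias row
        theta1 = theta1 + 0.1 * (delta1 @ X.T)
        theta2 = theta2 + 0.1 * (delta2 @ layer1.T)
    return layer2
```

With the '+' sign the network converges toward [0 1 1 0], just like the MATLAB runs above; flipping it back to '-' reproduces the collapse to all zeros.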