XOR with Neural Networks (Matlab)

So, I'm hoping this is a real dumb thing I'm doing, and there's an easy answer. I'm trying to train a 2x3x1 neural network to do the XOR problem. It wasn't working, so I decided to dig in to see what was happening. Finally, I decided to assign the weights myself. These are the weight matrices I came up with:
theta1 = [11 0 -5; 0 12 -7;18 17 -20];
theta2 = [14 13 -28 -6];
(In Matlab notation). I deliberately tried to make no two weights the same (barring the zeros).
And my code, really simple in Matlab, is:
function layer2 = xornn(iters)
    if nargin < 1
        iters = 50
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
    theta1 = [11 0 -5; 0 12 -7;18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = [1:iters]
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1)
        delta2 = T - layer2;
        delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
        % remove the bias from delta 1. There's no real point in a delta on the bias.
        delta1 = delta1(1:3,:);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        theta1 = theta1 - 0.1 * theta1d;
        theta2 = theta2 - 0.1 * theta2d;
    end
end
I believe that's right. I tested various parameters (of the thetas) with the finite differences method to see if they were right, and they seemed to be.
But, when I run it, it eventually just all boils down to returning all zeros. If I do xornn(1) (for 1 iteration) I get
0.0027 0.9966 0.9904 0.0008
But, if I do xornn(35)
0.0026 0.9949 0.9572 0.0007
(It's started a descent in the wrong direction) and by the time I get to xornn(45) I get
0.0018 0.0975 0.0000 0.0003
If I run it for 10,000 iterations, it just returns all 0's.
What is going on? Must I add regularization? I would have thought such a simple network wouldn't need it. But, regardless, why does it move away from an obvious good solution that I have hand fed it?
Thanks!

AAARRGGHHH! The solution was simply a matter of changing
theta1 = theta1 - 0.1 * theta1d;
theta2 = theta2 - 0.1 * theta2d;
to
theta1 = theta1 + 0.1 * theta1d;
theta2 = theta2 + 0.1 * theta2d;
sigh
Now tho, I need to figure out how I'm computing the negative derivative somehow when what I thought I was computing was the ... Never mind. I'll post here anyway, just in case it helps someone else.
So, z is the sum of inputs to the sigmoid, y is the output of the sigmoid, and C is the cost:
C = -(T * Log[y] + (1-T) * Log[1-y])
dC/dy = -((T/y) - (1-T)/(1-y))
= -((T(1-y)-y(1-T))/(y(1-y)))
= -((T-Ty-y+Ty)/(y(1-y)))
= -((T-y)/(y(1-y)))
= ((y-T)/(y(1-y))) # This is the source of all my woes.
dy/dz = y(1-y)
dC/dz = ((y-T)/(y(1-y))) * y(1-y)
= (y-T)
So, the problem is that I was accidentally computing T-y, because I forgot about the negative sign in front of the cost function. Then I was subtracting what I thought was the gradient, but was in fact the negative gradient. And there, that's the problem.
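For anyone who wants to double-check the sign, here is a minimal sketch of a finite-difference check of dC/dz = y - T. The test values in z and the step size h are arbitrary choices for illustration, not something from my network.

```matlab
% Compare a central finite difference of the cost against the
% analytic result dC/dz = y - T derived above.
sigmoid = @(z) 1 ./ (1 + exp(-z));
cost = @(z, T) -(T .* log(sigmoid(z)) + (1 - T) .* log(1 - sigmoid(z)));

z = [-2 -0.5 0.3 1.7];   % arbitrary pre-activation values (assumption)
T = [0 1 1 0];           % XOR targets
h = 1e-6;                % finite-difference step (assumption)

numeric  = (cost(z + h, T) - cost(z - h, T)) ./ (2 * h);
analytic = sigmoid(z) - T;            % y - T, not T - y

maxGap = max(abs(numeric - analytic));
disp(maxGap);                         % should be tiny if the sign is right
```

If analytic is replaced by T - sigmoid(z), the gap becomes large, which is exactly the mistake described above.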
Once I did that:
function layer2 = xornn(iters)
    if nargin < 1
        iters = 50;
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
    theta1 = [11 0 -5; 0 12 -7;18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = [1:iters]
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1);
        % T - layer2 is the negative of dC/dz, so the updates below add it.
        delta2 = T - layer2;
        delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
        % remove the bias from delta 1. There's no real point in a delta on the bias.
        delta1 = delta1(1:3,:);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        theta1 = theta1 + 0.1 * theta1d;
        theta2 = theta2 + 0.1 * theta2d;
    end
end
xornn(50) returns 0.0028 0.9972 0.9948 0.0009 and
xornn(10000) returns 0.0016 0.9989 0.9993 0.0005
Phew! Maybe this will help someone else in debugging their version..
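An equivalent way to write the same update, sketched here as a script rather than a function, is to keep the usual "subtract the gradient" form and flip the sign of the output delta instead. The names rate and output are just illustrative; the arithmetic is the same as in the corrected function, only with both signs flipped.

```matlab
% Same network and starting weights as above, written with
% delta2 = y - T (the true dC/dz) and a conventional descent step.
sigmoid = @(z) 1 ./ (1 + exp(-z));
T = [0 1 1 0];
X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
theta2 = [14 13 -28 -6];
rate = 0.1;                                   % learning rate

for iter = 1:50
    layer1 = [sigmoid(theta1 * X); ones(1, 4)];   % hidden layer plus bias row
    layer2 = sigmoid(theta2 * layer1);            % network output
    delta2 = layer2 - T;                          % dC/dz at the output
    delta1 = layer1 .* (1 - layer1) .* (theta2' * delta2);
    delta1 = delta1(1:3, :);                      % drop the bias row
    theta2 = theta2 - rate * (delta2 * layer1');  % descend along the gradient
    theta1 = theta1 - rate * (delta1 * X');
end

% Forward pass with the final weights.
output = sigmoid(theta2 * [sigmoid(theta1 * X); ones(1, 4)]);
disp(output);
```

Keeping the delta equal to the real gradient makes it harder to lose track of the sign later on.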

Related

how to find all the possible intersections between two vectors with such organization in matlab

I have a matrix y and a vector x, and I need to find all the possible vectors that result from mapping each value in x into each row of y.
That is hard to follow in the abstract, so let me explain it with an example:
I have the vector x = [0.7 + 0.7i; 0.7-0.7i]; and the matrix y = [1 0; 2 0; 1 2]; the resulting matrix is supposed to look like this: Z = [0.7 + 0.7i 0; 0.7-0.7i 0; 0 0.7 + 0.7i; 0 0.7-0.7i; 0.7 + 0.7i 0.7 + 0.7i; 0.7 + 0.7i 0.7-0.7i; 0.7 - 0.7i 0.7-0.7i ; 0.7 - 0.7i 0.7+0.7i]; . That is equivalent to Z = [x_1 0; x_2 0; 0 x_1; 0 x_2; x_1 x_1; x_1 x_2; x_2 x_2; x_2 x_1];. In other words, each value in x is mapped into the rows of Z according to the index values in y.
Here is my attempt:
clear all; clc;
y = [];
G = 2;
v = 1 : G;
for i = 1: G
    x=nchoosek(v,i);
    m = zeros(size(x,1),G-i);
    y =[y ; x m]; % create the matrix y
end
x = [0.7 + 0.7i; 0.7-0.7i];
Z = []; s = zeros(G,1);
for k=1:size(x,1)
    for i=1:size(y,1)
        n=y(i,:);
        n=n(n ~= 0);
        s(n)=x(k);
        Z=[Z s];
        s = zeros(G,1);
    end
end
The problem with my code is that the matrix Z comes out in the inverse order: it takes the input x_1 from x and then maps it into all possible rows of y. For example, Z starts with [x_1 0; 0 x_1; x_1 x_1 ….], whereas it should take each value in x in turn and map it as shown in the example above, [x_1 0; x_2 0; x_3 0 …..]. The second issue is that when a row of y contains more than one non-zero value, my code cannot produce all possible vectors: it only gets [x_1 x_1; x_2 x_2]; and misses the other possibilities, [x_1 x_2; x_2 x_1] and so on.
How can I solve these issues?
UPDATE
Here is the updated question with a clearer description. I have the vector x and the matrix y, and I need to fill the matrix z following the indices taken from each row of y. For example, if the first row of y is [1 0] or [0 1], then I take all possible values from x and put them in z at the column given by the row in y, which is 1 in this case. The same goes for row 2 of y, which is [2 0] or [0 2]: the second column of z is filled with all possible values in x.
Then both columns of z can be filled, which corresponds to the case [1 2] in y: it takes the first value from x and pairs it with all possible values from x, and so on. The rows in z should not be repeated.
The matrix Z is exactly as shown in AboAmmar's answer below, but using the if-based loop with a longer vector x and a bigger matrix y would be a bit complicated.
As you describe it, there are 4 distinct cases for each row of y and the corresponding output:
[0 1] or [1 0] => [x 0]
[0 2] or [2 0] => [0 x]
[1 2] => [x1 x1; x1 x2; x2 x2; x2 x1]
[2 1] => [x1 x1; x2 x1; x2 x2; x1 x2]
These don't seem to follow any obvious rule. So, the easiest (but not smartest) solution is to use if-else and select the suitable case from the above. We don't have all the information about the possible indices, or if rows like [1 1] and [2 2] might happen, so the following solution is by no means exhaustive; surprising errors might happen if other inputs are fed into y matrix.
y = [];
G = 2;
v = 1 : G;
for i = 1: G
    x = nchoosek(v,i);
    m = zeros(size(x,1),G-i);
    y = [y ; x m]; % create the matrix y
end
Z = [];
x = [0.7 + 0.7i; 0.7-0.7i]
for i = 1:size(y,1)
    r = y(i,:);
    if ismember(r, [1 0; 0 1], 'rows')
        Z(end+1:end+2,:) = [x [0; 0]];
    elseif ismember(r, [2 0; 0 2], 'rows')
        Z(end+1:end+2,:) = [[0; 0] x];
    elseif ismember(r, [1 2], 'rows')
        Z(end+1:end+4,:) = [x(1) x(1); x(1) x(2); x(2) x(2); x(2) x(1)];
    elseif ismember(r, [2 1], 'rows')
        Z(end+1:end+4,:) = [x(1) x(1); x(2) x(1); x(2) x(2); x(1) x(2)];
    end
end
Z =
0.7000 + 0.7000i 0.0000 + 0.0000i
0.7000 - 0.7000i 0.0000 + 0.0000i
0.0000 + 0.0000i 0.7000 + 0.7000i
0.0000 + 0.0000i 0.7000 - 0.7000i
0.7000 + 0.7000i 0.7000 + 0.7000i
0.7000 + 0.7000i 0.7000 - 0.7000i
0.7000 - 0.7000i 0.7000 - 0.7000i
0.7000 - 0.7000i 0.7000 + 0.7000i
Your code is valid if the rows of y have a fixed length, for example if every row of y has one non-zero value and the rest zeros, or every row has two non-zero values, etc.
So you can run your code for each length separately and then build the matrix Z by combining the resulting matrices.
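One possible way to handle every row length in a single loop is sketched below. It assumes that the non-zero entries of a row of y are the column positions of Z to fill, and that every combination of values from x (with repetition) is wanted for those positions; that is an interpretation of the question, not something the asker confirmed. The variable names are illustrative, the row order within each block differs from the Z shown in the question, and the `./` between a column and a row relies on implicit expansion (R2016b or later).

```matlab
x = [0.7 + 0.7i; 0.7 - 0.7i];
y = [1 0; 2 0; 1 2];

n = numel(x);
G = size(y, 2);
Z = zeros(0, G);

for r = 1:size(y, 1)
    cols = y(r, y(r, :) ~= 0);      % columns of Z to fill for this row
    k = numel(cols);                % how many positions are filled

    % Enumerate all n^k index tuples by counting in base n.
    % idx is n^k-by-k; each row holds k indices into x.
    idx = 1 + mod(floor((0:n^k - 1)' ./ n.^(0:k-1)), n);

    block = zeros(size(idx, 1), G);
    block(:, cols) = x(idx);        % place the chosen x values
    Z = [Z; block];                 % append this row's combinations
end

disp(Z);
```

For the example above this gives 2 + 2 + 4 = 8 rows, one block per row of y.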

Is this correct vector implementation of gradient descent for multiple theta values?

This is my matlab code to predict [1;1;1] given [1;0;1] :
m = 1;
alpha = .00001;
x = [1;0;1;0;0;0;0;0;0];
y = [1;1;1;0;0;0;0;0;0];
theta1 = [4.7300;3.2800;1.4600;0;0;0;4.7300;3.2800;1.4600];
theta1 = theta1 - (alpha/m .* (x .* theta1-y)' * x)'
theta1 = reshape(theta1(1:9) , 3 , 3)
sigmoid(theta1 * [1; 0; 1])
x = [1;0;1;0;0;0;0;0;0];
y = [1; 1; 1;0;0;0;0;0;0];
theta2 = [8.892;6.167;2.745;8.892;6.167;2.745;8.892;6.167;2.745];
theta2 = theta2 - (alpha/m .* (x .* theta2-y)' * x)'
theta2 = reshape(theta2(1:9) , 3 , 3)
sigmoid(theta2 * [1; 0; 1])
x = [1;0;1;0;0;0;0;0;0];
y = [1; 1; 1;0;0;0;0;0;0];
theta3 = [9.446;6.55;2.916;9.351;6.485;2.886;8.836;6.127;2.727];
theta3 = theta3 - (alpha/m .* (x .* theta3-y)' * x)'
theta3 = reshape(theta3(1:9) , 3 , 3)
sigmoid(theta3 * [1; 0; 1])
I'm computing theta1, theta2 and theta3 individually, but I think they should be linked between computations?
Gradient descent does appear to be working, though:
sigmoid(theta1 * [1; 0; 1]) =
0.9999
0.9986
0.9488
sigmoid(theta2 * [1; 0; 1]) =
1.0000
1.0000
0.9959
sigmoid(theta3 * [1; 0; 1]) =
1.0000
1.0000
0.9965
This shows for each theta value (layer in the network) the prediction is moving closer to [1;1;1]
Update : sigmoid function :
function g = sigmoid(z)
g = 1.0 ./ (1.0 + exp(-z));
end
Update2 :
After an extended discussion with user davidhigh, who provided key insights, I have made the following changes:
x = [1;0;1];
y = [1;1;1];
theta1 =
4.7300 3.2800 1.4600
0 0 0
4.7300 3.2800 1.4600
theta2 =
8.8920 8.8920 8.8920
6.1670 6.1670 6.1670
2.7450 2.7450 2.7450
theta3 =
9.4460 6.5500 2.9160
9.3510 6.4850 2.8860
8.8360 6.1270 2.7270
The crux of my issue is that I wasn't feeding the output of each layer into the next layer. Once I made this change I get a better result:
z1 = sigmoid(theta1 * x)
z1 =
0.9980
0.5000
0.9980
z2 = sigmoid(theta2 * z1)
z2 =
1.0000
1.0000
0.9989
z3 = sigmoid(theta3 * z2)
z3 =
1.0000
1.0000
1.0000
z3 is the predicted value, which is now correctly [1;1;1], whereas previously it was only approximately [1;1;1].
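The chaining of layers can also be sketched as a loop over the weight matrices, so that each layer's output becomes the next layer's input. The cell array thetas and the variable a are names introduced here for illustration; the numbers are the theta1, theta2 and theta3 values listed above.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Weight matrices of the three layers, in order.
thetas = { [4.73 3.28 1.46; 0 0 0; 4.73 3.28 1.46], ...
           [8.892 8.892 8.892; 6.167 6.167 6.167; 2.745 2.745 2.745], ...
           [9.446 6.55 2.916; 9.351 6.485 2.886; 8.836 6.127 2.727] };

a = [1; 0; 1];                       % network input
for k = 1:numel(thetas)
    a = sigmoid(thetas{k} * a);      % output of layer k feeds layer k+1
end

disp(a);                             % final prediction (z3 above)
```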

How to write a varying matrix in matlab?

I have this system of equations for x_n, 1 ≤ n ≤ 30:
−(2 + α)x_1 + x_2 = b_1,
x_{j−1} − (2 + α)x_j + x_{j+1} = b_j, for 2 ≤ j ≤ 29,
x_29 − (2 + α)x_30 = b_30.
α = 1
We assume that the membrane is held at the end points (i.e. x_0 = 0 and x_31 = 0). There is no weight on the membrane, so all b_j = 0 for j = 1 ... 30 except for j = 6, where a load is applied: b_6 = 2.
I want to calculate the LU factorization of the system.
I do not know how to implement the left side of the system in Matlab.
The right side I made like this:
b=[0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]';
How do I do the left side?
Thanks
It's unclear why you have included the entire linear system if you are only interested in the LU factorization of A? Regardless, here is some code which generates your A matrix as described above, and solves the linear system and shows the LU factorization.
% equation A*X = b
b = zeros(30,1);
b(6) = 2;
alpha = 1;
A = zeros(30, 30);
A(1, 1) = -(2 + alpha);
A(1, 2) = 1;
for i = 2:29
    A(i, i-1) = 1;
    A(i, i) = -(2 + alpha);
    A(i, i+1) = 1;
end
A(30, 29) = 1;
A(30, 30) = -(2 + alpha);
You can then get the LU factorization using lu(A) or solve the linear system of equations using linsolve(A,b).
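As a rough sketch of that last step, the tridiagonal matrix can also be assembled without a loop using diag, and the factors from lu can be used to solve the system by two triangular solves. The names offDiag, residual and factorError are introduced here only for illustration.

```matlab
n = 30;
alpha = 1;

% Tridiagonal A: -(2 + alpha) on the diagonal, 1 on both off-diagonals.
offDiag = ones(n - 1, 1);
A = -(2 + alpha) * eye(n) + diag(offDiag, 1) + diag(offDiag, -1);

b = zeros(n, 1);
b(6) = 2;                      % single load at j = 6

[L, U, P] = lu(A);             % factorization with P*A = L*U
x = U \ (L \ (P * b));         % forward then back substitution

residual = norm(A * x - b);        % should be close to zero
factorError = norm(P * A - L * U); % should be close to zero
disp([residual factorError]);
```

The three-output form of lu returns the permutation matrix separately, so L is truly lower triangular.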

Solving a linear system of equation with two variables in MATLAB

It might seem a simple question. I need it, though. Let's assume we have two equations:
2 * y + x + 1 = 0
and
y - 2 * x = 0
I would like to find their angle bisectors, which can be calculated from this equation:
|x + 2 * y + 1| / sqrt(2^2 + 1^2) = |-2 * x + y| / sqrt(1^2 + 2^2)
To make a long story short, since both denominators are equal, we only need to solve the two equations below:
2 * y + x + 1 = -2 *x + y
and
2 * y + x + 1 = 2 *x - y
However, using MATLAB's solve function:
syms x y
eqn1 = 2 * y + x + 1 == -2 *x + y ;
eqn2 = 2 * y + x + 1 == 2 *x - y ;
[x, y] = solve (eqn1 , eqn2, x, y) ;
Will give me:
x = -1/5 and y = -2/5
But I am looking for the resulting line equations, which are:
y = -3 * x - 1 and 3 * y = x - 1
So, does anyone know how I can get the above line equations instead of the resulting point? Thanks,
The following should solve both equations with y on the left-hand-side:
y1 = solve(eqn1,y)
y2 = solve(eqn2,y)
Result:
y1 =
- 3*x - 1
y2 =
x/3 - 1/3
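As a sanity check that needs no symbolic tools, the sketch below samples points on each of the two resulting lines and confirms they are equidistant from the two original lines. The helper names d1, d2, gapA and gapB and the sample range are assumptions made for illustration.

```matlab
% Distance from (x, y) to each of the two original lines.
d1 = @(x, y) abs(x + 2*y + 1) / sqrt(1^2 + 2^2);   % line 2*y + x + 1 = 0
d2 = @(x, y) abs(-2*x + y)    / sqrt(2^2 + 1^2);   % line y - 2*x = 0

xs = linspace(-2, 2, 9);       % arbitrary sample points
ya = -3*xs - 1;                % first bisector,  y1 above
yb = xs/3 - 1/3;               % second bisector, y2 above

gapA = max(abs(d1(xs, ya) - d2(xs, ya)));
gapB = max(abs(d1(xs, yb) - d2(xs, yb)));
disp([gapA gapB]);             % both should be zero up to rounding
```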
As an aside, if all you need is the intersection point of the two original lines, it would be much faster to treat it as a linear system Ax=b and solve it numerically rather than using MATLAB's symbolic tools:
A = [1 2; -2 1];
b = [-1; 0];
x = A\b
Result:
x =
-0.2000
-0.4000

Plot discrete points and some circles that enclose them in matlab

I'm trying to plot some eigenvalues along with their Gershgorin circles in matlab, but don't seem to be able to find the syntax to get the discrete points (the eigenvalues) to show up. This is what I've tried:
clear all ;
m = [ 1 -1 0 0 ;
-1 2 -1 0 ;
0 -1 2 1 ;
0 0 -1 1 ]
e = eig( m ) ;
n = 30 ;
z1 = zeros( n + 1, 1 ) ;
z2 = zeros( n + 1, 1 ) ;
for i = [ 1 : n + 1 ]
z1(i) = 2 + 2 * exp(j * 2 * pi * (i - 1)/ 30) ;
z2(i) = 1 + exp(j * 2 * pi * (i - 1)/ 30) ;
end
h = plot( real(e(1)), imag(e(1)), real(e(2)), imag(e(2)), real(e(3)), imag(e(3)), real(e(4)), imag(e(4)), real(z1), imag(z1), real(z2), imag(z2) )
set(h(1),'LineWidth',2) ;
set(h(2),'LineWidth',2) ;
set(h(3),'LineWidth',2) ;
set(h(4),'LineWidth',2) ;
Which produces a plot in which I can see the circles, but not the points:
If I use the same set command on h(5) or h(6) it does make the circle plots show up thicker as I would have expected.
Well, the points do not show up because plot connects data with lines and draws no markers by default, so a single point leaves nothing visible. It's fine if you use scatter.
I modified your code a bit, which is why this is an answer and not a comment haha.
1) I vectorized your for-loop, which is quite a bit faster on my computer. BTW, using i as a loop index is risky, especially when dealing with complex numbers. The safe way to go is either to use something else, or
2) use 1j or 1i to represent the imaginary unit. That's also faster.
Anyhow, here is the code with bigger points:
clear
clc
close all
m = [ 1 -1 0 0 ;
-1 2 -1 0 ;
0 -1 2 1 ;
0 0 -1 1 ]
e = eig( m ) ;
n = 30 ;
%// see below for vectorized version
% for k = [ 1 : n + 1 ]
% z1(k) = 2 + 2 * exp(1j * 2 * pi * (k - 1)/ 30) ;
% z2(k) = 1 + exp(1j * 2 * pi * (k - 1)/ 30) ;
% end
%// vectorized loop with 1j as imaginary unit.
z1 = 2 + 2 * exp(1j * 2 * pi * ((1:n+1) - 1)/ 30) ;
z2 = 1 + exp(1j * 2 * pi * ((1:n+1) - 1)/ 30) ;
%// plot the circles and then use scatter for the points.
plot(real(z1), imag(z1), real(z2), imag(z2));
hold on
scatter(real(e),imag(e))
hold off
which gives the following:
You can of course customize the scatter plot as you wish. Hope that helps!
Try this:
h = plot( real(e), imag(e), 'x', real(z1), imag(z1), real(z2), imag(z2) )
More info in the plot documentation.
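The circles in the question were hard-coded (centre 2 with radius 2, centre 1 with radius 1). As a rough sketch, the Gershgorin discs can instead be computed from the matrix itself: the centres are the diagonal entries and the radii are the sums of absolute off-diagonal entries in each row. The names c, r, circles and inDisc are introduced here for illustration, and the `c + r .* exp(...)` line relies on implicit expansion (R2016b or later).

```matlab
m = [ 1 -1  0  0 ;
     -1  2 -1  0 ;
      0 -1  2  1 ;
      0  0 -1  1 ];
e = eig(m);

c = diag(m);                        % disc centres
r = sum(abs(m), 2) - abs(c);        % disc radii (off-diagonal row sums)

t = linspace(0, 2*pi, 200);         % angles around each circle
circles = c + r .* exp(1j * t);     % one row per disc, 4-by-200

plot(real(circles).', imag(circles).');        % one line per circle
hold on
plot(real(e), imag(e), 'x', 'LineWidth', 2);   % eigenvalues as markers
hold off
axis equal

% Each eigenvalue should lie in at least one disc.
inDisc = abs(e - c.') <= r.' + 1e-12;          % entry (i,j): eigenvalue i in disc j
disp(all(any(inDisc, 2)));
```

Rows with the same centre and radius produce overlapping circles, which is why the question only needed two.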