Related
I'm having trouble on an easy exercise about an artificial neural network with 2 features, a hidden layer of 5 neurons and two possible outputs (0 or 1).
My X matrix is a 51x2 matrix, and y is a 51x1 vector.
I know I'm not supposed to do the while E>1 but I wanted to see if eventually my error would be lower than 1
I'd like to know what I am doing wrong. My error doesn't seem to lower (around 1.5 no matter how much iterations I'm doing). Do you see in the code where I am doing a mistake? I'm supposed to use gradient descent.
function [E, v,w] = costFunction(X, y,alpha1,alpha2)
[m n] = size(X);
E = 1;
v = 2*rand(5,3)-1;
w = 2*rand(2,6)-1;
grad_v=zeros(size(v));
grad_w=zeros(size(w));
K = 2;
E = 2;
while E> 1
a1 = [ones(m,1) X];
z2 = a1 * v';
a2 = sigmoid(z2);
a2 = [ones(size(a2,1),1),a2];
z3 = a2 * w';
h = sigmoid(z3);
cost = sum((-y.*log(h)) - ((1-y).*log(1-h)),2);
E = (1/m)*sum(cost);
Delta1=0;
Delta2=0;
for t = 1:m
a1 = [1;X(t,:)'];
z2 = v * a1;
a2 = sigmoid(z2);
a2 = [1;a2];
z3 = w * a2;
a3 = sigmoid(z3);
d3 = a3 - y(t,:)';
d2 = (w(:,2:end)'*d3).*sigmoidGradient(z2);
Delta2 += (d3*a2');
Delta1 += (d2*a1');
end
grad_v = (1/m) * Delta1;
grad_w = (1/m) * Delta2;
v -= alpha1 * grad_v;
w -= alpha2 * grad_w;
end
end
I have a problem with the equation shown below. I want to enter a vector in t2 and find the roots of the equation from different values in t2.
t2=[10:10:100]
syms x
p = x^3 + 3*x - t2;
R = solve(p,x)
R1 = vpa(R)
Easy! Don't use syms and use the general formula:
t2 = [10:10:100];
%p = x^3 + 3*x - t2;
a = 1;
b = 0;
c = 3;
d = -t2;
D0 = b*b - 3*a*c;
D1 = 2*b^3 - 9*a*b*c + 27*a^2*d;
C = ((D1+sqrt(D1.^2 - 4*D0.^3))/2).^(1/3);
C1 = C*1;
C2 = C*(-1-sqrt(3)*1i)/2;
C3 = C*(-1+sqrt(3)*1i)/2;
f = -1/(3*a);
x1 = f*(b + C1 + D0./C1);
x2 = f*(b + C2 + D0./C2);
x3 = f*(b + C3 + D0./C3);
Since b = 0, you can simplify this a bit:
% ... polynomial is the same
D0 = -3*a*c;
D1 = 27*a^2*d;
% ... the different C's are the same
f = -1/(3*a);
x1 = f*(C1 + D0./C1);
x2 = f*(C2 + D0./C2);
x3 = f*(C3 + D0./C3);
Trial>> syms x p
Trial>> EQUS = p == x^3 + 3*x - t2
It is unknown that you want to solve an equation or a system.
Suppose that you want to solve a system.
Trial>> solx = solve(Eqns,x)
But, I do not think you can find roots.
You can solve one equation.
Trial>> solx = solve(Eqns(3),x)
As far as I know, Maple can do this batter.
In general, loops should be avoided, but this is the only solution that hit my brain right now.
t2=[10:10:100];
pp=repmat([1,0,3,0],[10,1]);
pp(:,4)=-t2;
for i=1:10
R(i,:) =roots(pp(i,:))';
end
Is it possible to further optimise the following script by adopting meshgrid, reshape, linear indexing, and/or any other good optimisation techniques?
clear all
% tic;
N1 = 5000;
Nt = 2500;
u = randn(Nt,2);
A = zeros(2,2,N1);
B = zeros(2,2,N1);
C = zeros(2,2,N1);
D = zeros(2,2,N1);
X = zeros(2,1,N1);
Y = zeros(2,Nt,N1);
U = zeros(2,Nt,N1);
tic;
parfor i=1:N1
A(:,:,i) = [unifrnd(0.25,0.75),unifrnd(0.15,0.45);unifrnd(0.4,1.2),unifrnd(0.25,0.75)];
B(:,:,i) = [unifrnd(1,3),unifrnd(2.5,7.5);unifrnd(-22.5,-7.5),unifrnd(-1.95,-0.65)];
C(:,:,i) = [unifrnd(-1,1),unifrnd(-19.5,-6.5);unifrnd(0.5,1.5),unifrnd(-22.5,-7.5)];
D(:,:,i) = [unifrnd(0.1,0.3),unifrnd(-1,1);unifrnd(1,3),unifrnd(-1,1)];
U(:,:,i) = u';
X(:,:,i) = zeros(2,1);
Y(:,:,i) = zeros(2,Nt);
end
toc;
A1 = gpuArray(A);
B1 = gpuArray(B);
C1 = gpuArray(C);
D1 = gpuArray(D);
X3 = gpuArray(X);
U3 = gpuArray(U);
AX = gpuArray(zeros(2,Nt,N1));
X3update = zeros(2,Nt,N1);
X3update = gpuArray(X3update);
%%
tic;
DU = pagefun(#mtimes, D1, U3(:,1:Nt,:));
BU = pagefun(#mtimes, B1, U3(:,1:Nt,:));
for j = 1:Nt
X3update = pagefun(#mtimes,A1,X3) + BU(:,j,:);
AX(:,j,:) = X3;
X3 = X3update;
end
CX = pagefun(#mtimes, C1, AX(:,1:Nt,:));
Y3 = CX + DU;
toc;
Apologies in advance as this is quite a specific question. However, I am trying to increase the computational efficiency of state-space simulations, so quite widely applicable.
Thanks!
I'm not familiar with the parallel toolbox, but couldn't you at least speed up the generation of the uniform random numbers with:
A = [unifrnd(0.25, 0.75, [1 1 N1]), unifrnd(0.15, 0.45, [1 1 N1]); unifrnd(0.4, 1.2, [1 1 N1]), unifrnd(0.25, 0.75, [1 1 N1])];
and so forth?
I am attempting to create a Abeles matrix formalism model to analyse some experimental data - I have attached a wiki link to this for reference so that you can see what I an attempting to achieve.
The crux of my issue is that I am unable to multiply four sets of matrices against each other, as in: A[1]*B[1]*C[1]*D[1], A[2]*B[2]*C[2]*D[2], ..., A[n]*B[n]*C[n]*D[n]. I then need to store the results as individual matrices of their own - each matrix represents the corresponding momentum transfer value from Qmin:Qstep:Qmax.
Also, when I attempt to carry out the final step; R = abs((ABCD(2,1)./ABCD(1,1)).^2) I end up with a single value rather than a value of R for each Q value.
Due to the size of the code a simple for loop isn't a realistic option.
my 'test' code is:
%import data fid = fopen('run_22208_09.dat');
%A = textscan(fid,'%f%f%f',270,'headerlines',0,'delimiter',',');
NQ = size(A{1,1});
NQ = NQ(1);
Qmin = A{1,1}(1);
Qmax = A{1,1}(NQ);
Qstep = A{1,1}(2) - A{1,1}(1);
fclose('all');
s0 = 2e-6;
s1 = 10e-6;
s2 = 6e-6;
s3 = 4e-6;
s4 = 8e-6;
sn = 12e-6;
r1 = 2;
r2 = 10;
r3 = 3;
r4 = 7;
t1 = 10;
t2 = 45;
t3 = 5;
t4 = 20;
Q=Qmin:Qstep:Qmax;
k = 2.*Q;
k1 = (((k).^2) - 4.*pi.*(s1 - s0)).^0.5;
k2 = (((k).^2) - 4.*pi.*(s2 - s0)).^0.5;
k3 = (((k).^2) - 4.*pi.*(s3 - s0)).^0.5;
k4 = (((k).^2) - 4.*pi.*(s4 - s0)).^0.5;
kn = (((k).^2) - 4.*pi.*(sn - s0)).^0.5;
layer1 = ((k1 - k2)./(k1 + k2)).*(exp(-2.*k1.*k2.*(r1.^2)));
beta1 = (sqrt(-1)).*k1.*t1;
layer2 = ((k2 - k3)./(k2 + k3)).*(exp(-2.*k2.*k3.*(r2.^2)));
beta2 = (sqrt(-1)).*k2.*t2;
layer3 = ((k3 - k4)./(k3 + k4)).*(exp(-2.*k3.*k4.*(r3.^2)));
beta3 = (sqrt(-1)).*k3.*t3;
layer4 = ((k4 - kn)./(k4 + kn)).*(exp(-2.*k4.*kn.*(r4.^2)));
beta4 = (sqrt(-1)).*k4.*t4;
%general matrix
C1 = [exp(beta1),layer1.*(exp(beta1));layer1.*exp(-beta1),exp(-beta1)]
C2 = [exp(beta2),layer2.*(exp(beta2));layer2.*exp(-beta2),exp(-beta2)];
C3 = [exp(beta3),layer3.*(exp(beta3));layer3.*exp(-beta3),exp(-beta3)];
C4 = [exp(beta4),layer4.*(exp(beta4));layer4.*exp(-beta4),exp(-beta4)];
% CA = bsxfun(#times,C1,C2)
% CB = bsxfun(#times,CA,C3);
% C = bsxfun(#times,CB,C4)
% R = abs((C(2,1)./C(1,1)).^2)
For element-wise multiplications of arrays, you write
M = C1 .* C2 .* C3 .* C4;
I asked a question a few days before but I guess it was a little too complicated and I don't expect to get any answer.
My problem is that I need to use ANN for classification. I've read that much better cost function (or loss function as some books specify) is the cross-entropy, that is J(w) = -1/m * sum_i( yi*ln(hw(xi)) + (1-yi)*ln(1 - hw(xi)) ); i indicates the no. data from training matrix X. I tried to apply it in MATLAB but I find it really difficult. There are couple things I don't know:
should I sum each outputs given all training data (i = 1, ... N, where N is number of inputs for training)
is the gradient calculated correctly
is the numerical gradient (gradAapprox) calculated correctly.
I have following MATLAB codes. I realise I may ask for trivial thing but anyway I hope someone can give me some clues how to find the problem. I suspect the problem is to calculate gradients.
Many thanks.
Main script:
close all
clear all
L = #(x) (1 + exp(-x)).^(-1);
NN = #(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
% theta = [10 -30 -30];
x = [0 0; 0 1; 1 0; 1 1];
y = [0.9 0.1 0.1 0.1]';
theta0 = 2*rand(9,1)-1;
options = optimset('gradObj','on','Display','iter');
thetaVec = fminunc(#costFunction,theta0,options,x,y);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
NN(x,theta)'
Cost function:
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
persistent index;
% 1 x x
% 1 x x
% 1 x x
% x = 1 x x
% 1 x x
% 1 x x
% 1 x x
m = size(x,1);
if isempty(index) || index > size(x,1)
index = 1;
end
L = #(x) (1 + exp(-x)).^(-1);
NN = #(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
Dew = cell(2,1);
DewApprox = cell(2,1);
% Forward propagation
a0 = x(index,:)';
z1 = theta{1}*[1;a0];
a1 = L(z1);
z2 = theta{2}*[1;a1];
a2 = L(z2);
% Back propagation
d2 = 1/m*(a2 - y(index))*L(z2)*(1-L(z2));
Dew{2} = [1;a1]*d2;
d1 = [1;a1].*(1 - [1;a1]).*theta{2}'*d2;
Dew{1} = [1;a0]*d1(2:end)';
% NNRes = NN(x,theta)';
% jVal = -1/m*sum(NNRes-y)*NNRes*(1-NNRes);
jVal = -1/m*(a2 - y(index))*a2*(1-a2);
gradVal = [Dew{1}(:);Dew{2}(:)];
gradApprox = CalcGradApprox(0.0001);
index = index + 1;
function output = CalcGradApprox(epsilon)
output = zeros(size(gradVal));
for n=1:length(thetaVec)
thetaVecMin = thetaVec;
thetaVecMax = thetaVec;
thetaVecMin(n) = thetaVec(n) - epsilon;
thetaVecMax(n) = thetaVec(n) + epsilon;
thetaMin = cell(2,1);
thetaMax = cell(2,1);
thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
a2min = NN(x(index,:),thetaMin)';
a2max = NN(x(index,:),thetaMax)';
jValMin = -1/m*(a2min-y(index))*a2min*(1-a2min);
jValMax = -1/m*(a2max-y(index))*a2max*(1-a2max);
output(n) = (jValMax - jValMin)/2/epsilon;
end
end
end
EDIT:
Below I present the correct version of my costFunction for those who may be interested.
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
m = size(x,1);
L = #(x) (1 + exp(-x)).^(-1);
NN = #(x,theta) L(theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')]);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
Delta = cell(2,1);
Delta{1} = zeros(size(theta{1}));
Delta{2} = zeros(size(theta{2}));
D = cell(2,1);
D{1} = zeros(size(theta{1}));
D{2} = zeros(size(theta{2}));
jVal = 0;
for in = 1:size(x,1)
% Forward propagation
a1 = [1;x(in,:)']; % added bias to a0
z2 = theta{1}*a1;
a2 = [1;L(z2)]; % added bias to a1
z3 = theta{2}*a2;
a3 = L(z3);
% Back propagation
d3 = a3 - y(in);
d2 = theta{2}'*d3.*a2.*(1 - a2);
Delta{2} = Delta{2} + d3*a2';
Delta{1} = Delta{1} + d2(2:end)*a1';
jVal = jVal + sum( y(in)*log(a3) + (1-y(in))*log(1-a3) );
end
D{1} = 1/m*Delta{1};
D{2} = 1/m*Delta{2};
jVal = -1/m*jVal;
gradVal = [D{1}(:);D{2}(:)];
gradApprox = CalcGradApprox(x(in,:),0.0001);
% Nested function to calculate gradApprox
function output = CalcGradApprox(x,epsilon)
output = zeros(size(thetaVec));
for n=1:length(thetaVec)
thetaVecMin = thetaVec;
thetaVecMax = thetaVec;
thetaVecMin(n) = thetaVec(n) - epsilon;
thetaVecMax(n) = thetaVec(n) + epsilon;
thetaMin = cell(2,1);
thetaMax = cell(2,1);
thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
a3min = NN(x,thetaMin)';
a3max = NN(x,thetaMax)';
jValMin = 0;
jValMax = 0;
for inn=1:size(x,1)
jValMin = jValMin + sum( y(inn)*log(a3min) + (1-y(inn))*log(1-a3min) );
jValMax = jValMax + sum( y(inn)*log(a3max) + (1-y(inn))*log(1-a3max) );
end
jValMin = 1/m*jValMin;
jValMax = 1/m*jValMax;
output(n) = (jValMax - jValMin)/2/epsilon;
end
end
end
I've only had a quick eyeball over your code. Here are some pointers.
Q1
should I sum each outputs given all training data (i = 1, ... N, where
N is number of inputs for training)
If you are talking in relation to the cost function, it is normal to sum and normalise by the number of training examples in order to provide comparison between.
I can't tell from the code whether you have a vectorised implementation which will change the answer. Note that the sum function will only sum up a single dimension at a time - meaning if you have a (M by N) array, sum will result in a 1 by N array.
The cost function should have a scalar output.
Q2
is the gradient calculated correctly
The gradient is not calculated correctly - specifically the deltas look wrong. Try following Andrew Ng's notes [PDF] they are very good.
Q3
is the numerical gradient (gradAapprox) calculated correctly.
This line looks a bit suspect. Does this make more sense?
output(n) = (jValMax - jValMin)/(2*epsilon);
EDIT: I actually can't make heads or tails of your gradient approximation. You should only use forward propagation and small tweaks in the parameters to compute the gradient. Good luck!