Octave backpropagation implementation issues - matlab

I wrote a code to implement steepest descent backpropagation with which I am having issues. I am using the Machine CPU dataset and have scaled the inputs and outputs into range [0 1]
The codes in matlab/octave is as follows:
steepest descent backpropagation
%SGD = Steepest Gradient Decent
function weights = nnSGDTrain (X, y, nhid_units, gamma, max_epoch, X_test, y_test)
iput_units = columns (X);
oput_units = columns (y);
n = rows (X);
W2 = rand (nhid_units + 1, oput_units);
W1 = rand (iput_units + 1, nhid_units);
train_rmse = zeros (1, max_epoch);
test_rmse = zeros (1, max_epoch);
for (epoch = 1:max_epoch)
delW2 = zeros (nhid_units + 1, oput_units)';
delW1 = zeros (iput_units + 1, nhid_units)';
for (i = 1:rows(X))
o1 = sigmoid ([X(i,:), 1] * W1); %1xn+1 * n+1xk = 1xk
o2 = sigmoid ([o1, 1] * W2); %1xk+1 * k+1xm = 1xm
D2 = o2 .* (1 - o2);
D1 = o1 .* (1 - o1);
e = (y_test(i,:) - o2)';
delta2 = diag (D2) * e; %mxm * mx1 = mx1
delta1 = diag (D1) * W2(1:(end-1),:) * delta2; %kxm * mx1 = kx1
delW2 = delW2 + (delta2 * [o1 1]); %mx1 * 1xk+1 = mxk+1 %already transposed
delW1 = delW1 + (delta1 * [X(i, :) 1]); %kx1 * 1xn+1 = k*n+1 %already transposed
end
delW2 = gamma .* delW2 ./ n;
delW1 = gamma .* delW1 ./ n;
W2 = W2 + delW2';
W1 = W1 + delW1';
[dummy train_rmse(epoch)] = nnPredict (X, y, nhid_units, [W1(:);W2(:)]);
[dummy test_rmse(epoch)] = nnPredict (X_test, y_test, nhid_units, [W1(:);W2(:)]);
printf ('Epoch: %d\tTrain Error: %f\tTest Error: %f\n', epoch, train_rmse(epoch), test_rmse(epoch));
fflush (stdout);
end
weights = [W1(:);W2(:)];
% plot (1:max_epoch, test_rmse, 1);
% hold on;
plot (1:max_epoch, train_rmse(1:end), 2);
% hold off;
end
predict
%Now SFNN Only
function [o1 rmse] = nnPredict (X, y, nhid_units, weights)
iput_units = columns (X);
oput_units = columns (y);
n = rows (X);
W1 = reshape (weights(1:((iput_units + 1) * nhid_units),1), iput_units + 1, nhid_units);
W2 = reshape (weights((((iput_units + 1) * nhid_units) + 1):end,1), nhid_units + 1, oput_units);
o1 = sigmoid ([X ones(n,1)] * W1); %nxiput_units+1 * iput_units+1xnhid_units = nxnhid_units
o2 = sigmoid ([o1 ones(n,1)] * W2); %nxnhid_units+1 * nhid_units+1xoput_units = nxoput_units
rmse = RMSE (y, o2);
end
RMSE function
function rmse = RMSE (a1, a2)
rmse = sqrt (sum (sum ((a1 - a2).^2))/rows(a1));
end
I have also trained the same dataset using the R RSNNS package mlp and the RMSE for train set (first 100 examples) are around 0.03 . But in my implementation I cannot achieve lower RMSE than 0.14 . And sometimes the errors grow for some higher learning rates, and no learning rate gets me lower RMSE than 0.14. Also a paper i referred report the RMSE in for the train set is around 0.03
I wanted to know where is the problem i the code. I have followed Raul Rojas book and confirmed that things are okay.

In backprobagation code the line
e = (y_test(i,:) - o2)';
is not correct, because the o2 is the output from the train set and i am finding the difference from one example from the test set y_test. The line should have been as below:
e = (y(i,:) - o2)';
which correctly finds the difference between the predicted output by the current model and the target output of the corresponding example.
This took me 3 days to find this one, I am fortunate enough to find this freaking bug which stopped me from going into further modifications.

Related

What type of (probably syntactic) mistake am I making when using a generic function on an array for plotting?

I am trying to plot a geodesic on a 3D surface (tractrix) with Matlab. This worked for me in the past when I didn't need to parametrize the surface (see here). However, the tractrix called for parameterization, chain rule differentiation, and collection of u,v,x,y and f(x,y) values.
After many mistakes I think that I'm getting the right values for x = f1(u,v) and y = f2(u,v) describing a spiral right at the base of the surface:
What I can't understand is why the z value or height of the 3D plot of the curve is consistently zero, when I'm applying the same mathematical formula that allowed me to plot the surface in the first place, ie. f = #(x,y) a.* (y - tanh(y)) .
Here is the code, which runs without any errors on Octave. I'm typing a special note in upper case on the crucial calls. Also note that I have restricted the number of geodesic lines to 1 to decrease the execution time.
a = 0.3;
u = 0:0.1:(2 * pi);
v = 0:0.1:5;
[X,Y] = meshgrid(u,v);
% NOTE THAT THESE FORMULAS RESULT IN A SUCCESSFUL PLOT OF THE SURFACE:
x = a.* cos(X) ./ cosh(Y);
y = a.* sin(X) ./ cosh(Y);
z = a.* (Y - tanh(Y));
h = surf(x,y,z);
zlim([0, 1.2]);
set(h,'edgecolor','none')
colormap summer
hold on
% THESE ARE THE GENERIC FUNCTIONS (f) WHICH DON'T SEEM TO WORK AT THE END:
f = #(x,y) a.* (y - tanh(y));
f1 = #(u,v) a.* cos(u) ./ cosh(v);
f2 = #(u,v) a.* sin(u) ./ cosh(v);
dfdu = #(u,v) ((f(f1(u,v)+eps, f2(u,v)) - f(f1(u,v) - eps, f2(u,v)))/(2 * eps) .*
(f1(u+eps,v)-f1(u-eps,v))/(2*eps) +
(f(f1(u,v), f2(u,v)+eps) - f(f1(u,v), f2(u,v)-eps))/(2 * eps) .*
(f2(u+eps,v)-f2(u-eps,v))/(2*eps));
dfdv = #(u,v) ((f(f1(u,v)+eps, f2(u,v)) - f(f1(u,v) - eps, f2(u,v)))/(2 * eps) .*
(f1(u,v+eps)-f1(u,v-eps))/(2*eps) +
(f(f1(u,v), f2(u,v)+eps) - f(f1(u,v), f2(u,v)-eps))/(2 * eps) .*
(f2(u,v+eps)-f2(u,v-eps))/(2*eps));
% Normal vector to the surface:
N = #(u,v) [- dfdu(u,v), - dfdv(u,v), 1]; % Normal vec to surface # any pt.
% Some colors to draw the lines:
C = {'k','r','g','y','m','c'};
for s = 1:1 % No. of lines to be plotted.
% Starting points:
u0 = [0, u(length(u))];
v0 = [0, v(length(v))];
du0 = 0.001;
dv0 = 0.001;
step_size = 0.00005; % Will determine the progression rate from pt to pt.
eta = step_size / sqrt(du0^2 + dv0^2); % Normalization.
eps = 0.0001; % Epsilon
max_num_iter = 100000; % Number of dots in each line.
% Semi-empty vectors to collect results:
U = [[u0(s), u0(s) + eta*du0], zeros(1,max_num_iter - 2)];
V = [[v0(s), v0(s) + eta*dv0], zeros(1,max_num_iter - 2)];
for i = 2:(max_num_iter - 1) % Creating the geodesic:
ut = U(i);
vt = V(i);
xt = f1(ut,vt);
yt = f2(ut,vt);
ft = f(xt,yt);
utm1 = U(i - 1);
vtm1 = V(i - 1);
xtm1 = f1(utm1,vtm1);
ytm1 = f2(utm1,vtm1);
ftm1 = f(xtm1,ytm1);
usymp = ut + (ut - utm1);
vsymp = vt + (vt - vtm1);
xsymp = f1(usymp,vsymp);
ysymp = f2(usymp,vsymp);
fsymp = ft + (ft - ftm1);
df = fsymp - f(xsymp,ysymp); % Is the surface changing? How much?
n = N(ut,vt); % Normal vector at point t
gamma = df * n(3); % Scalar x change f x z value of N
xtp1 = xsymp - gamma * n(1); % Gamma to modulate incre. x & y.
ytp1 = ysymp - gamma * n(2);
U(i + 1) = usymp - gamma * n(1);;
V(i + 1) = vsymp - gamma * n(2);;
end
% THE PROBLEM! f(f1(U,V),f2(U,V)) below YIELDS ALL ZEROS!!! The expected values are between 0 and 1.2.
P = [f1(U,V); f2(U,V); f(f1(U,V),f2(U,V))]; % Compiling results into a matrix.
units = 35; % Determines speed (smaller, faster)
packet = floor(size(P,2)/units);
P = P(:,1: packet * units);
for k = 1:packet:(packet * units)
hold on
plot3(P(1, k:(k+packet-1)), P(2,(k:(k+packet-1))), P(3,(k:(k+packet-1))),
'.', 'MarkerSize', 5,'color',C{s})
drawnow
end
end
The answer is to Cris Luengo's credit, who noticed that the upper-case assigned to the variable Y, used for the calculation of the height of the curve z, was indeed in the parametrization space u,v as intended, and not in the manifold x,y! I don't use Matlab/Octave other than for occasional simulations, and I was trying every other syntactical permutation I could think of without realizing that f fed directly from v (as intended). I changed now the names of the different variables to make it cleaner.
Here is the revised code:
a = 0.3;
u = 0:0.1:(3 * pi);
v = 0:0.1:5;
[U,V] = meshgrid(u,v);
x = a.* cos(U) ./ cosh(V);
y = a.* sin(U) ./ cosh(V);
z = a.* (V - tanh(V));
h = surf(x,y,z);
zlim([0, 1.2]);
set(h,'edgecolor','none')
colormap summer
hold on
f = #(x,y) a.* (y - tanh(y));
f1 = #(u,v) a.* cos(u) ./ cosh(v);
f2 = #(u,v) a.* sin(u) ./ cosh(v);
dfdu = #(u,v) ((f(f1(u,v)+eps, f2(u,v)) - f(f1(u,v) - eps, f2(u,v)))/(2 * eps) .*
(f1(u+eps,v)-f1(u-eps,v))/(2*eps) +
(f(f1(u,v), f2(u,v)+eps) - f(f1(u,v), f2(u,v)-eps))/(2 * eps) .*
(f2(u+eps,v)-f2(u-eps,v))/(2*eps));
dfdv = #(u,v) ((f(f1(u,v)+eps, f2(u,v)) - f(f1(u,v) - eps, f2(u,v)))/(2 * eps) .*
(f1(u,v+eps)-f1(u,v-eps))/(2*eps) +
(f(f1(u,v), f2(u,v)+eps) - f(f1(u,v), f2(u,v)-eps))/(2 * eps) .*
(f2(u,v+eps)-f2(u,v-eps))/(2*eps));
% Normal vector to the surface:
N = #(u,v) [- dfdu(u,v), - dfdv(u,v), 1]; % Normal vec to surface # any pt.
% Some colors to draw the lines:
C = {'y','r','k','m','w',[0.8 0.8 1]}; % Color scheme
for s = 1:6 % No. of lines to be plotted.
% Starting points:
u0 = [0, -pi/2, 2*pi, 4*pi/3, pi/4, pi];
v0 = [0, 0, 0, 0, 0, 0];
du0 = [0, -0.0001, 0.001, - 0.001, 0.001, -0.01];
dv0 = [0.1, 0.01, 0.001, 0.001, 0.0005, 0.01];
step_size = 0.00005; % Will determine the progression rate from pt to pt.
eta = step_size / sqrt(du0(s)^2 + dv0(s)^2); % Normalization.
eps = 0.0001; % Epsilon
max_num_iter = 180000; % Number of dots in each line.
% Semi-empty vectors to collect results:
Uc = [[u0(s), u0(s) + eta*du0(s)], zeros(1,max_num_iter - 2)];
Vc = [[v0(s), v0(s) + eta*dv0(s)], zeros(1,max_num_iter - 2)];
for i = 2:(max_num_iter - 1) % Creating the geodesic:
ut = Uc(i);
vt = Vc(i);
xt = f1(ut,vt);
yt = f2(ut,vt);
ft = f(xt,yt);
utm1 = Uc(i - 1);
vtm1 = Vc(i - 1);
xtm1 = f1(utm1,vtm1);
ytm1 = f2(utm1,vtm1);
ftm1 = f(xtm1,ytm1);
usymp = ut + (ut - utm1);
vsymp = vt + (vt - vtm1);
xsymp = f1(usymp,vsymp);
ysymp = f2(usymp,vsymp);
fsymp = ft + (ft - ftm1);
df = fsymp - f(xsymp,ysymp); % Is the surface changing? How much?
n = N(ut,vt); % Normal vector at point t
gamma = df * n(3); % Scalar x change f x z value of N
xtp1 = xsymp - gamma * n(1); % Gamma to modulate incre. x & y.
ytp1 = ysymp - gamma * n(2);
Uc(i + 1) = usymp - gamma * n(1);;
Vc(i + 1) = vsymp - gamma * n(2);;
end
x = f1(Uc,Vc);
y = f2(Uc,Vc);
P = [x; y; f(Uc,Vc)]; % Compiling results into a matrix.
units = 35; % Determines speed (smaller, faster)
packet = floor(size(P,2)/units);
P = P(:,1: packet * units);
for k = 1:packet:(packet * units)
hold on
plot3(P(1, k:(k+packet-1)), P(2,(k:(k+packet-1))), P(3,(k:(k+packet-1))),
'.', 'MarkerSize', 5,'color',C{s})
drawnow
end
end

expectation maximization algorithm matlab out of memory error

I am implementing Expectation Maximization algorithm in matlab. Algorithm is operating on 214096 x 2 data matrix and While computing probabilities, there is multiplication of ( 214096 x 2 ) * (2 x 2) * ( 2 x 214096 ) matrices, which is resulting in error of out of memory in matlab. Is there a way to fix this problem?
Equation
Matlab Code:
enter image description here D = size(X,2); % dimension
N = size(X,1); % number of samples
K = 4; % number of Gaussian Mixture components ( Also number of clusters )
% Initialization
p = [0.2, 0.3, 0.2, 0.3]; % arbitrary pi, probabilities of clusters, apriori probability of cluster
[idx,mu] = kmeans(X,K); % initial means of the components, theta is mu and variance
% compute the covariance of the components
sigma = zeros(D,D,K);
for k = 1:K
tempmat = X(idx==k,:);
sigma(:,:,k) = cov(tempmat); % Sigma j
sigma_det(k) = det(sigma(:,:,k));
end
% calculate x-mu
for k=1: K
check=length( X(idx == k,1))
for lidx = 1: length( X(idx == k,1))
cidx = find( idx == k) ;
Xmu(cidx(lidx),:) = X(cidx(lidx),:) - mu(k,:); %( x-mu ) calculation on cluster level
end
end
% compute P(Cj|x; theta(t)), and take log to simplified calculation
%Eq 14.14 denominator
denom = 0;
for k=1:K
calc_sigma_1_2 = sigma_det(k)^(-1/2);
calc_x_mu = Xmu(idx == k,:);
calc_sigma_inv = inv(sigma(:,:,k));
calc_x_mu_tran = calc_x_mu.';
factor = calc_sigma_1_2 * exp (-1/2 * calc_x_mu * calc_sigma_inv * calc_x_mu_tran ) * p(k);
denom = denom + factor;
end
for k =1:K
calc_sigma_1_2 = sigma_det(k)^(-1/2);
calc_x_mu = Xmu(idx == k,:);
calc_sigma_inv = inv(sigma(:,:,k));
calc_x_mu_tran = calc_x_mu.';
factor = calc_sigma_1_2 * exp (-1/2 * calc_x_mu_tran * calc_sigma_inv * calc_x_mu ) * p(k);
pdf(k) = factor/denom;
end
%%%% Equation 14.14 ends
It seems that you tried to apply vector based equation by simply substituting vector for matrix, this is not how it works
(x - mu).' * Inv(sigma) * (x-mu)
is supposed to be mahalanobis norm of (x-mu), and you want to obtain this value per each row of matrix X, thus
(X - mu).' * Inv(sigma) =: A <- this is ok, this results in N x d matrix
and now you have to do point-wise multiplication of A with (X - mu), not a dot product, and finally sum over second axis (columns), this way you end up with N element vector, each containing a mahalanobis norm of corresponding row from X.

heterogeneous class recognition with ANN / MLP

I have put together a classifying 3 layer artificial neural network that appears to work on other datasets. Playing around some artificial datasets that I made, I was unable to correctly predict between two classes when one class was positive in one feature or another feature.
Clearly class1 is can be identified by asking if either feature 1 or feature 2 is equal to 1 but I can't get the algorithm to predict the dataset correctly (there are 20 examples following this pattern in the dataset).
Can ANN/MLPs recognize this type of pattern? If so, what am I missing? If not, are there other methods that can predict this type of pattern (maybe SVM)?
I used Octave as that was what was used in the online course offered from coursera. I have listed most of the code here although it is structured slightly differently when I run it. As you can see I do use bias units on the first and second layers and I have also varied the number of hidden units in the second layer from 1-5 with no improvement over random guessing.
% Load dataset
y = [1; 1; 2; 2]
X = [1, 0; 0, 1; 0, 0; 0, 0]
m = size(X, 1);
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), num_labels, (hidden_layer_size + 1));
% Randomly initialize weight parameters
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];
% Add bias units to layers and feedforward
Xbias = [ones(m,1), X];
L2bias = [ones(m,1), sigmoid(Xbias*Theta1')];
L3 = sigmoid(L2bias * Theta2');
% Create class matrix Y
Y = zeros(m, num_labels);
for r = 1:m;
Y(r, y(r)) = 1;
end
% Set cost function
J = (sum(sum(Y.*log(L3) + (1-Y).*log(1-L3))))/-m + lambda*(sum(sum((Theta1(:,2:columns(Theta1))).^2)) + sum(sum((Theta2(:,2:columns(Theta2))).^2)))/2/m;
% Initialize weight gradient matrices
D2 = zeros(rows(Theta2),columns(Theta2));
D1 = zeros(rows(Theta1),columns(Theta1));
% Calculate gradient with backpropagation
for t = 1:m;
a1 = [1 X(t,:)]';
z2 = Theta1*a1;
a2 = [1; sigmoid(z2)];
z3 = Theta2*a2;
a3 = sigmoid(z3);
d3 = a3 - Y(t,:)';
d2 = (Theta2'*d3)(2:end).*sigmoidGradient(z2);
D2 = D2 + d3*a2';
D1 = D1 + d2*a1';
end
Theta2_grad = D2/m;
Theta1_grad = D1/m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda*Theta2(:,2:end)/m;
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda*Theta1(:,2:end)/m;
% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
% Compute cost (Feed forward)
[J,grad] = nnCostFunction(initial_nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda);
% Create "short hand" for the cost function to be minimized using fmincg
costFunction = #(p) nnCostFunction(p, input_layer_size, hidden_layer_size, num_labels, X, y, lambda);
% Train the neural network using fmincg
options = optimset('MaxIter', 1000);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), num_labels, (hidden_layer_size + 1));
NN can recognize any pattern. Universal Approximation Theorem proves that (as well as many others).
The most obvious reason I can think of is lack of bias neuron. Althouh for more valuable answers you have to include your code.

Neural Networks: Sigmoid Activation Function for continuous output variable

Okay, so I am in the middle of Andrew Ng's machine learning course on coursera and would like to adapt the neural network which was completed as part of assignment 4.
In particular, the neural network which I had completed correctly as part of the assignment was as follows:
Sigmoid activation function: g(z) = 1/(1+e^(-z))
10 output units, each which could take 0 or 1
1 hidden layer
Back-propagation method used to minimize cost function
Cost function:
where L=number of layers, s_l = number of units in layer l, m = number of training examples, K = number of output units
Now I want to adjust the exercise so that there is one continuous output unit that takes any value between [0,1] and I am trying to work out what needs to change, so far I have
Replaced the data with my own, i.e.,such that the output is continuous variable between 0 and 1
Updated references to the number of output units
Updated the cost function in the back-propagation algorithm to:
where a_3 is the value of the output unit determined from forward propagation.
I am certain that something else must change as the gradient checking method shows the gradient determined by back-propagation and that by the numerical approximation no longer match up. I did not change the sigmoid gradient; it is left at f(z)*(1-f(z)) where f(z) is the sigmoid function 1/(1+e^(-z))) nor did I update the numerical approximation of the derivative formula; simply (J(theta+e) - J(theta-e))/(2e).
Can anyone advise of what other steps would be required?
Coded in Matlab as follows:
% FORWARD PROPAGATION
% input layer
a1 = [ones(m,1),X];
% hidden layer
z2 = a1*Theta1';
a2 = sigmoid(z2);
a2 = [ones(m,1),a2];
% output layer
z3 = a2*Theta2';
a3 = sigmoid(z3);
% BACKWARD PROPAGATION
delta3 = a3 - y;
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);
Theta1_grad = (delta2'*a1)/m;
Theta2_grad = (delta3'*a2)/m;
% COST FUNCTION
J = 1/(2 * m) * sum( (a3-y).^2 );
% Implement regularization with the cost function and gradients.
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + Theta1(:,2:end)*lambda/m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + Theta2(:,2:end)*lambda/m;
J = J + lambda/(2*m)*( sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
I have since realised that this question is similar to that asked by #Mikhail Erofeev on StackOverflow, however in this case I wish the continuous variable to be between 0 and 1 and therefore use a sigmoid function.
First, your cost function should be:
J = 1/m * sum( (a3-y).^2 );
I think your Theta2_grad = (delta3'*a2)/m;is expected to match the numerical approximation after changed to delta3 = 1/2 * (a3 - y);).
Check this slide for more details.
EDIT:
In case there is some minor discrepancy between our codes, I pasted my code below for your reference. The code has already been compared with numerical approximation function checkNNGradients(lambda);, the Relative Difference is less than 1e-4 (not meets the 1e-11 requirement by Dr.Andrew Ng though)
function [J grad] = nnCostFunctionRegression(nn_params, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, ...
X, y, lambda)
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
m = size(X, 1);
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
X = [ones(m, 1) X];
z1 = sigmoid(X * Theta1');
zs = z1;
z1 = [ones(m, 1) z1];
z2 = z1 * Theta2';
ht = sigmoid(z2);
y_recode = zeros(length(y),num_labels);
for i=1:length(y)
y_recode(i,y(i))=1;
end
y = y_recode;
regularization=lambda/2/m*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));
J=1/(m)*sum(sum((ht - y).^2))+regularization;
delta_3 = 1/2*(ht - y);
delta_2 = delta_3 * Theta2(:,2:end) .* sigmoidGradient(X * Theta1');
delta_cap2 = delta_3' * z1;
delta_cap1 = delta_2' * X;
Theta1_grad = ((1/m) * delta_cap1)+ ((lambda/m) * (Theta1));
Theta2_grad = ((1/m) * delta_cap2)+ ((lambda/m) * (Theta2));
Theta1_grad(:,1) = Theta1_grad(:,1)-((lambda/m) * (Theta1(:,1)));
Theta2_grad(:,1) = Theta2_grad(:,1)-((lambda/m) * (Theta2(:,1)));
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
If you want to have continuous output try not to use sigmoid activation when computing target value.
a1 = [ones(m, 1) X];
a2 = sigmoid(X * Theta1');
a2 = [ones(m, 1) z1];
a3 = z1 * Theta2';
ht = a3;
Normalize input before using it in nnCostFunction. Everything else remains same.

Recomposing vector input to algorithm from matrix output

I've written some code to implement an algorithm that takes as input a vector q of real numbers, and returns as an output a complex matrix R. The Matlab code below produces a plot showing the input vector q and the output matrix R.
Given only the complex matrix output R, I would like to obtain the input vector q. Can I do this using least-squares optimization? Since there is a recursive running sum in the code (rs_r and rs_i), the calculation for a column of the output matrix is dependent on the calculation of the previous column.
Perhaps a non-linear optimization can be set up to recompose the input vector q from the output matrix R?
Looking at this in another way, I've used an algorithm to compute a matrix R. I want to run the algorithm "in reverse," to get the input vector q from the output matrix R.
If there is no way to recompose the starting values from the output, thereby treating the problem as a "black box," then perhaps the mathematics of the model itself can be used in the optimization? The program evaluates the following equation:
The Utilde(tau, omega) is the output matrix R. The tau (time) variable comprises the columns of the response matrix R, whereas the omega (frequency) variable comprises the rows of the response matrix R. The integration is performed as a recursive running sum from tau = 0 up to the current tau timestep.
Here are the plots created by the program posted below:
Here is the full program code:
N = 1001;
q = zeros(N, 1); % here is the input
q(1:200) = 55;
q(201:300) = 120;
q(301:400) = 70;
q(401:600) = 40;
q(601:800) = 100;
q(801:1001) = 70;
dt = 0.0042;
fs = 1 / dt;
wSize = 101;
Glim = 20;
ginv = 0;
R = get_response(N, q, dt, wSize, Glim, ginv); % R is output matrix
rows = wSize;
cols = N;
figure; plot(q); title('q value input as vector');
ylim([0 200]); xlim([0 1001])
figure; imagesc(abs(R)); title('Matrix output of algorithm')
colorbar
Here is the function that performs the calculation:
function response = get_response(N, Q, dt, wSize, Glim, ginv)
fs = 1 / dt;
Npad = wSize - 1;
N1 = wSize + Npad;
N2 = floor(N1 / 2 + 1);
f = (fs/2)*linspace(0,1,N2);
omega = 2 * pi .* f';
omegah = 2 * pi * f(end);
sigma2 = exp(-(0.23*Glim + 1.63));
sign = 1;
if(ginv == 1)
sign = -1;
end
ratio = omega ./ omegah;
rs_r = zeros(N2, 1);
rs_i = zeros(N2, 1);
termr = zeros(N2, 1);
termi = zeros(N2, 1);
termr_sub1 = zeros(N2, 1);
termi_sub1 = zeros(N2, 1);
response = zeros(N2, N);
% cycle over cols of matrix
for ti = 1:N
term0 = omega ./ (2 .* Q(ti));
gamma = 1 / (pi * Q(ti));
% calculate for the real part
if(ti == 1)
Lambda = ones(N2, 1);
termr_sub1(1) = 0;
termr_sub1(2:end) = term0(2:end) .* (ratio(2:end).^-gamma);
else
termr(1) = 0;
termr(2:end) = term0(2:end) .* (ratio(2:end).^-gamma);
rs_r = rs_r - dt.*(termr + termr_sub1);
termr_sub1 = termr;
Beta = exp( -1 .* -0.5 .* rs_r );
Lambda = (Beta + sigma2) ./ (Beta.^2 + sigma2); % vector
end
% calculate for the complex part
if(ginv == 1)
termi(1) = 0;
termi(2:end) = (ratio(2:end).^(sign .* gamma) - 1) .* omega(2:end);
else
termi = (ratio.^(sign .* gamma) - 1) .* omega;
end
rs_i = rs_i - dt.*(termi + termi_sub1);
termi_sub1 = termi;
integrand = exp( 1i .* -0.5 .* rs_i );
if(ginv == 1)
response(:,ti) = Lambda .* integrand;
else
response(:,ti) = (1 ./ Lambda) .* integrand;
end
end % ti loop
No, you cannot do so unless you know the "model" itself for this process. If you intend to treat the process as a complete black box, then it is impossible in general, although in any specific instance, anything can happen.
Even if you DO know the underlying process, then it may still not work, as any least squares estimator is dependent on the starting values, so if you do not have a good guess there, it may converge to the wrong set of parameters.
It turns out that by using the mathematics of the model, the input can be estimated. This is not true in general, but for my problem it seems to work. The cumulative integral is eliminated by a partial derivative.
N = 1001;
q = zeros(N, 1);
q(1:200) = 55;
q(201:300) = 120;
q(301:400) = 70;
q(401:600) = 40;
q(601:800) = 100;
q(801:1001) = 70;
dt = 0.0042;
fs = 1 / dt;
wSize = 101;
Glim = 20;
ginv = 0;
R = get_response(N, q, dt, wSize, Glim, ginv);
rows = wSize;
cols = N;
cut_val = 200;
imagLogR = imag(log(R));
Mderiv = zeros(rows, cols-2);
for k = 1:rows
val = deriv_3pt(imagLogR(k,:), dt);
val(val > cut_val) = 0;
Mderiv(k,:) = val(1:end-1);
end
disp('Running iteration');
q0 = 10;
q1 = 500;
NN = cols - 2;
qout = zeros(NN, 1);
for k = 1:NN
data = Mderiv(:,k);
qout(k) = fminbnd(#(q) curve_fit_to_get_q(q, dt, rows, data),q0,q1);
end
figure; plot(q); title('q value input as vector');
ylim([0 200]); xlim([0 1001])
figure;
plot(qout); title('Reconstructed q')
ylim([0 200]); xlim([0 1001])
Here are the supporting functions:
function output = deriv_3pt(x, dt)
% Function to compute dx/dt using the 3pt symmetrical rule
% dt is the timestep
N = length(x);
N0 = N - 1;
output = zeros(N0, 1);
denom = 2 * dt;
for k = 2:N0
output(k - 1) = (x(k+1) - x(k-1)) / denom;
end
function sse = curve_fit_to_get_q(q, dt, rows, data)
fs = 1 / dt;
N2 = rows;
f = (fs/2)*linspace(0,1,N2); % vector for frequency along cols
omega = 2 * pi .* f';
omegah = 2 * pi * f(end);
ratio = omega ./ omegah;
gamma = 1 / (pi * q);
termi = ((ratio.^(gamma)) - 1) .* omega;
Error_Vector = termi - data;
sse = sum(Error_Vector.^2);