MATLAB: convert subwindow into vector

% obtain small windows of the image (say 16x16 windows and possibly using the crop function)
[rows columns] = size(B);
blockSizeR = 25; % Rows in block.
blockSizeC = 25; % Columns in block.
wholeBlockRows = floor(rows / blockSizeR);
wholeBlockCols = floor(columns / blockSizeC);
image3d = zeros(wholeBlockRows, wholeBlockCols);
sliceNumber = 1;
dataStruct = [];
for row = 1 : blockSizeR : rows
for col = 1 : blockSizeC : columns
row1 = row;
row2 = row1 + blockSizeR - 1;
col1 = col;
col2 = col1 + blockSizeC - 1;
oneBlock = B(row1:row2, col1:col2);
subplot(4, 4, sliceNumber);
imshow(oneBlock);
caption = sprintf('Block #%d of 16', sliceNumber);
title(caption);
drawnow;
dataStruct = [dataStruct, oneBlock(:)];
sliceNumber = sliceNumber + 1;
end
end
I am trying to extract sixteen 25x25 subwindows from a 100x100 pixel image, then convert each subwindow into a 125-element column vector, but my data structure for appending all these vectors seems to be 625 x 16 instead of 125 x 16.
The subwindows seem to be displayed fine in the figure. Any clues as to where I went wrong would be much appreciated.

You can use mat2cell and cellfun:
dataStruct = mat2cell( B, blockSizeR * ones( 1, wholeBlockRows ), ...
blockSizeC * ones( 1, wholeBlockCols ) );
dataStruct = cellfun( @(x) x(:), dataStruct, 'uni', 0 );
dataStruct = [dataStruct{:}];
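As a sanity check on the expected size: a 25x25 block holds 25*25 = 625 elements, so each column vector has 625 entries and a 625 x 16 result is the right shape for 16 blocks. Below is a minimal sketch of the mat2cell approach, assuming a synthetic 100x100 matrix in place of the real image B (the names blocks and vectors are illustrative and not from the question). It also transposes the cell array before concatenating, because {:} walks a cell array column by column, while the nested loops in the question visit the blocks row by row.

```matlab
% Synthetic 100x100 stand-in for the image B (assumption for illustration).
B = reshape(1:100*100, 100, 100);
blockSizeR = 25;
blockSizeC = 25;
wholeBlockRows = floor(size(B, 1) / blockSizeR);
wholeBlockCols = floor(size(B, 2) / blockSizeC);

% Split into a 4x4 cell array of 25x25 blocks.
blocks = mat2cell(B, blockSizeR * ones(1, wholeBlockRows), ...
                     blockSizeC * ones(1, wholeBlockCols));

% Transpose so that {:} visits the blocks row by row, like the nested loops.
blocks = blocks.';
vectors = cellfun(@(x) x(:), blocks, 'UniformOutput', false);
dataStruct = [vectors{:}];

disp(size(dataStruct))   % 625-by-16: one 625-element column per block
```

If the column-by-column block order is acceptable, the transpose can simply be dropped.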

Related

Plotting numbers in a Cell array

I want to just plot the data, which is all real numbers, that is stored in a Cell Array. My cell array is 1-100 1-dimensional, but I am confused on how to actually apply the plot() function with the hold on function.
Here is my code:
% Initialize arrays for storing data
C = cell(1,100); % Store output vector from floww()
D = cell(1,6); % User inputted initial point
I1 = cell(1,100);
I2 = cell(1,100);
I3 = cell(1,100);
%Declare alpha and beta variables detailed in Theorem 1 of paper
a1 = 0; a2 = 2; a3 = 4; a4 = 6;
b1 = 2; b2 = 3; b3 = 7; b4 = 10;
% Declare the \lambda_i, i=1,..., 6, variables
L = cell(1,6);
L1 = abs((b2 - b3)/(a2 - a3));
L2 = abs((b1 - b3)/(a1 - a3));
L3 = abs((b1 - b2)/(a1 - a2));
L4 = abs((b1 - b4)/(a1 - a4));
L5 = abs((b2 - b4)/(a2 - a4));
L6 = abs((b3 - b4)/(a3 - a4));
L{1,1} = L1;
L{1,2} = L2;
L{1,3} = L3;
L{1,4} = L4;
L{1,5} = L5;
L{1,6} = L6;
% Create function handle for floww()
F = @floww;
for j = 1:6
D{1,j} = input('Input in1 through in6: ');
end
% Iterate through floww()
k = [0:5:100];
for i = 1: 100
C{1,i} = F(D{1,1}, D{1,2}, D{1,3}, D{1,4}, D{1,5}, D{1,6},L); % Output from floww() is a 6-by-1 vector
for j = 1:6
D{1,j} = C{1,i}(j,1); % Reassign input values to put back into floww()
end
% First integrals as described in the paper
I1{1,i} = 2*(C{1,i}(1,1)).^2 + 2*(C{1,i}(2,1)).^2 + 2*(C{1,i}(3,1)).^2 + 2*(C{1,i}(4,1)).^2 + 2*(C{1,i}(5,1)).^2 + 2*(C{1,i}(6,1)).^2;
I2{1,i} = (-C{1,i}(3,1))*(-C{1,i}(6,1)) - (C{1,i}(2,1))*(-C{1,i}(5,1)) + (-C{1,i}(1,1))*(-C{1,i}(4,1));
I3{1,i} = 2*L1*(C{1,i}(1,1)).^2 + 2*L2*(C{1,i}(2,1)).^2 + 2*L3*(C{1,i}(3,1)).^2 + 2*L4*(C{1,i}(4,1)).^2 + 2*L5*(C{1,i}(5,1)).^2 + 2*L6*(C{1,i}(6,1)).^2;
plot(k, I1{1,i});
hold on;
end
% This function will solve the linear system
% Bx^(n+1) = x detailed in the research notes
function [out1] = floww(in1, in2, in3, in4, in5, in6, L)
% A_ij = (lambda_i - lambda_j)
% Declare relevant A_ij values
A32 = L{1,3} - L{1,2};
A65 = L{1,6} - L{1,5};
A13 = L{1,1} - L{1,3};
A46 = L{1,4} - L{1,6};
A21 = L{1,2} - L{1,1};
A54 = L{1,5} - L{1,4};
A35 = L{1,3} - L{1,5};
A62 = L{1,6} - L{1,2};
A43 = L{1,4} - L{1,3};
A16 = L{1,1} - L{1,6};
A24 = L{1,2} - L{1,4};
A51 = L{1,5} - L{1,1};
% Declare del(T)
delT = 1;
% Declare the 6-by-6 coefficient matrix B
B = [1, -A32*(delT/2)*in3, -A32*(delT/2)*in2, 0, -A65*(delT/2)*in6, -A65*(delT/2)*in5;
-A13*(delT/2)*in3, 1, -A13*(delT/2)*in1, -A46*(delT/2)*in6, 0, A46*(delT/2)*in4;
-A21*(delT/2)*in2, -A21*(delT/2)*in1, 1, -A54*(delT/2)*in5, -A54*(delT/2)*in4, 0;
0, -A62*(delT/2)*in6, -A35*(delT/2)*in5, 1, -A35*(delT/2)*in3, -A62*(delT/2)*in2;
-A16*(delT/2)*in6, 0, -A43*(delT/2)*in4, -A43*(delT/2)*in3, 1, -A16*(delT/2)*in1;
-A51*(delT/2)*in5, -A24*(delT/2)*in4, 0, -A24*(delT/2)*in2, -A51*(delT/2)*in1, 1];
% Declare input vector
N = [in1; in2; in3; in4; in5; in6];
% Solve the system Bx = N for x where x
% denotes the X_i^(n+1) vector in research notes
x = B\N;
% Assign output variables
out1 = x;
%disp(x);
%disp(out1(2,1));
end
The plotting takes place in the for-loop with plot(k, I1{1,i});. The figure it produces is neither what I expect nor what I want.
Can someone please explain to me what I am doing wrong and/or how to get what I want?
You need to stop using cell arrays for numeric data, and indexed variable names when an array would be way simpler.
I've edited your code, below, to plot the I1 array.
To make it work, I changed almost all cell arrays to numeric arrays and simplified a bunch of the indexing. Note initialisation is now with zeros instead of cell, therefore indexing with parentheses () not curly braces {}.
I didn't change the structure too much, because your comments indicated you were following some literature with this layout.
For the plotting, you were trying to plot single points during the loop - to do that you have no line (the points are distinct), so need to specify a marker like plot(x,y,'o'). However, what I've done is just plot after the loop - since you're storing the resulting I1 array anyway.
% Initialize arrays for storing data
C = cell(1,100); % Store output vector from floww()
D = zeros(1,6); % User inputted initial point
I1 = zeros(1,100);
I2 = zeros(1,100);
I3 = zeros(1,100);
%Declare alpha and beta variables detailed in Theorem 1 of paper
a1 = 0; a2 = 2; a3 = 4; a4 = 6;
b1 = 2; b2 = 3; b3 = 7; b4 = 10;
% Declare the \lambda_i, i=1,..., 6, variables
L = zeros(1,6);
L(1) = abs((b2 - b3)/(a2 - a3));
L(2) = abs((b1 - b3)/(a1 - a3));
L(3) = abs((b1 - b2)/(a1 - a2));
L(4) = abs((b1 - b4)/(a1 - a4));
L(5) = abs((b2 - b4)/(a2 - a4));
L(6) = abs((b3 - b4)/(a3 - a4));
for j = 1:6
D(j) = input('Input in1 through in6: ');
end
% Iterate through floww()
for i = 1:100
C{i} = floww(D(1), D(2), D(3), D(4), D(5), D(6), L); % Output from floww() is a 6-by-1 vector
for j = 1:6
D(j) = C{i}(j,1); % Reassign input values to put back into floww()
end
% First integrals as described in the paper
I1(i) = 2*(C{i}(1,1)).^2 + 2*(C{i}(2,1)).^2 + 2*(C{i}(3,1)).^2 + 2*(C{i}(4,1)).^2 + 2*(C{i}(5,1)).^2 + 2*(C{i}(6,1)).^2;
I2(i) = (-C{i}(3,1))*(-C{i}(6,1)) - (C{i}(2,1))*(-C{i}(5,1)) + (-C{i}(1,1))*(-C{i}(4,1));
I3(i) = 2*L(1)*(C{i}(1,1)).^2 + 2*L(2)*(C{i}(2,1)).^2 + 2*L(3)*(C{i}(3,1)).^2 + 2*L(4)*(C{i}(4,1)).^2 + 2*L(5)*(C{i}(5,1)).^2 + 2*L(6)*(C{i}(6,1)).^2;
end
plot(1:100, I1);
% This function will solve the linear system
% Bx^(n+1) = x detailed in the research notes
function [out1] = floww(in1, in2, in3, in4, in5, in6, L)
% A_ij = (lambda_i - lambda_j)
% Declare relevant A_ij values
A32 = L(3) - L(2);
A65 = L(6) - L(5);
A13 = L(1) - L(3);
A46 = L(4) - L(6);
A21 = L(2) - L(1);
A54 = L(5) - L(4);
A35 = L(3) - L(5);
A62 = L(6) - L(2);
A43 = L(4) - L(3);
A16 = L(1) - L(6);
A24 = L(2) - L(4);
A51 = L(5) - L(1);
% Declare del(T)
delT = 1;
% Declare the 6-by-6 coefficient matrix B
B = [1, -A32*(delT/2)*in3, -A32*(delT/2)*in2, 0, -A65*(delT/2)*in6, -A65*(delT/2)*in5;
-A13*(delT/2)*in3, 1, -A13*(delT/2)*in1, -A46*(delT/2)*in6, 0, A46*(delT/2)*in4;
-A21*(delT/2)*in2, -A21*(delT/2)*in1, 1, -A54*(delT/2)*in5, -A54*(delT/2)*in4, 0;
0, -A62*(delT/2)*in6, -A35*(delT/2)*in5, 1, -A35*(delT/2)*in3, -A62*(delT/2)*in2;
-A16*(delT/2)*in6, 0, -A43*(delT/2)*in4, -A43*(delT/2)*in3, 1, -A16*(delT/2)*in1;
-A51*(delT/2)*in5, -A24*(delT/2)*in4, 0, -A24*(delT/2)*in2, -A51*(delT/2)*in1, 1];
% Declare input vector
N = [in1; in2; in3; in4; in5; in6];
% Solve the system Bx = N for x where x
% denotes the X_i^(n+1) vector in research notes
x = B\N;
% Assign output variables
out1 = x;
end
Output with in1..6 = 1 .. 6:
Note: you could simplify this code a lot if you embraced arrays over clunky variable names. The below achieves the exact same result for the body of your script, but is much more flexible and maintainable:
See how much simpler your integral expressions become!
% Initialize arrays for storing data
C = cell(1,100); % Store output vector from floww()
D = zeros(1,6); % User inputted initial point
I1 = zeros(1,100);
I2 = zeros(1,100);
I3 = zeros(1,100);
%Declare alpha and beta variables detailed in Theorem 1 of paper
a = [0, 2, 4, 6];
b = [2, 3, 7, 10];
% Declare the \lambda_i, i=1,..., 6, variables
L = abs( ( b([2 1 1 1 2 3]) - b([3 3 2 4 4 4]) ) ./ ...
( a([2 1 1 1 2 3]) - a([3 3 2 4 4 4]) ) );
for j = 1:6
D(j) = input('Input in1 through in6: ');
end
% Iterate through floww()
k = 1:100;
for i = k
C{i} = floww(D(1), D(2), D(3), D(4), D(5), D(6), L); % Output from floww() is a 6-by-1 vector
D = C{i}; % Reassign input values to put back into floww()
% First integrals as described in the paper
I1(i) = 2*sum(D.^2);
I2(i) = sum( D(1:3).*D(4:6) );
I3(i) = 2*sum((L.').*D.^2);
end
plot( k, I1 );
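To check that the vectorised expressions agree with the element-wise ones from the question, the two forms can be compared on a small test vector. This is only a sketch: the state vector D below is a hypothetical stand-in for an output of floww(), and I3 is written as 2*sum(L_j * D_j^2), which is what the element-wise expression in the question computes.

```matlab
a = [0, 2, 4, 6];
b = [2, 3, 7, 10];

% Vectorised lambda values, using index vectors for the (i,j) pairs.
p = [2 1 1 1 2 3];
q = [3 3 2 4 4 4];
L = abs((b(p) - b(q)) ./ (a(p) - a(q)));

% Element-wise lambda values, written out as in the question.
Lref = [abs((b(2)-b(3))/(a(2)-a(3))), abs((b(1)-b(3))/(a(1)-a(3))), ...
        abs((b(1)-b(2))/(a(1)-a(2))), abs((b(1)-b(4))/(a(1)-a(4))), ...
        abs((b(2)-b(4))/(a(2)-a(4))), abs((b(3)-b(4))/(a(3)-a(4)))];

% Hypothetical 6-by-1 state vector (not an actual floww() output).
D = (1:6).';

I1 = 2*sum(D.^2);
I2 = sum(D(1:3) .* D(4:6));
I3 = 2*sum((L.') .* D.^2);

% The same integrals written term by term.
I1ref = 2*D(1)^2 + 2*D(2)^2 + 2*D(3)^2 + 2*D(4)^2 + 2*D(5)^2 + 2*D(6)^2;
I2ref = (-D(3))*(-D(6)) - D(2)*(-D(5)) + (-D(1))*(-D(4));
I3ref = 2*L(1)*D(1)^2 + 2*L(2)*D(2)^2 + 2*L(3)*D(3)^2 ...
      + 2*L(4)*D(4)^2 + 2*L(5)*D(5)^2 + 2*L(6)*D(6)^2;

fprintf('max |L - Lref| = %g\n', max(abs(L - Lref)));
fprintf('I1 diff = %g, I2 diff = %g, I3 diff = %g\n', ...
        I1 - I1ref, I2 - I2ref, I3 - I3ref);
```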
Edit:
You can simplify the floww function by using a couple of things:
A can be declared really easily as a single matrix.
Notice delT/2 is a factor in almost every element, factor it out!
The only non-zero elements where delT/2 isn't a factor are the diagonal of ones... add this in using eye instead.
Input the in1..6 variables as a vector. You already have the vector when you call floww - makes no sense breaking it up.
With the input as a vector, we can use utility functions like hankel to do some neat indexing. This one is a stretch for a beginner, but I include it as a demo.
Code:
% In code body, call floww with an array input
C{i} = floww(D, L);
% ...
function [out1] = floww(D, L)
% A_ij = (lambda_i - lambda_j)
% Declare A_ij values in a matrix
A = L.' - L;
% Declare del(T)
delT = 1;
% Declare the 6-by-6 coefficient matrix B
% Factored out (delT/2) and the D coefficients
B = eye(6,6) - (delT/2) * D( hankel( [4 3 2 1 6 5], [5 4 3 2 1 6] ) ) .*...
[ 0, A(3,2), A(3,2), 0, A(6,5), A(6,5);
A(1,3), 0, A(1,3), A(4,6), 0, -A(4,6);
A(2,1), A(2,1), 0, A(5,4), A(5,4), 0;
0, A(6,2), A(3,5), 0, A(3,5), A(6,2);
A(1,6), 0, A(4,3), A(4,3), 0, A(1,6);
A(5,1), A(2,4), 0, A(2,4), A(5,1), 0];
% Solve the system Bx = N for x where x
% denotes the X_i^(n+1) vector in research notes
out1 = B\D(:);
end
You see when we simplify things like this, code is easier to read. For instance, it looks to me (without knowing the literature at all) like you've got a sign error in your B(2,6) element, it's the opposite sign to all other elements...
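For readers unfamiliar with hankel, it can help to look at the index matrix on its own. The sketch below prints it, uses it to index a recognisable test vector, and builds the pairwise difference matrix. The values in D are hypothetical; L holds the lambda values that follow from the a and b values in the question.

```matlab
% hankel(c, r) builds a matrix with first column c and last row r,
% constant along each anti-diagonal.
H = hankel([4 3 2 1 6 5], [5 4 3 2 1 6]);
disp(H)

% Hypothetical input vector, chosen so each entry is easy to recognise.
D = [10 20 30 40 50 60];
Dmat = D(H);          % Dmat(i,j) equals D(H(i,j)), so Dmat is 6-by-6
disp(Dmat)

% Pairwise differences: A(i,j) = L(i) - L(j) for a row vector L.
L = [2 1.25 0.5 4/3 1.75 1.5];
A = L.' - L;          % relies on implicit expansion (R2016b or newer)
disp(A(3,2))          % the A32 entry, L(3) - L(2)
```

Row i of H lists which input multiplies each entry in row i of B. The diagonal and the other structurally zero positions are multiplied by 0 in the A-based matrix, so the index that lands there does not matter.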

Matlab: iterate through image blocks

I would like to divide an image into 8 by 6 blocks and then from each block would like to get the average of red, green and blue values then store the average values from each block into an array. Say that if I have image divided into 4 blocks the result array would be:
A = [average_red, average_green, average_blue,average_red, ...
average_green, average_blue,average_red, average_green, ...
average_blue,average_red, average_green, average_blue,...
average_red, average_green, average_blue,]
The loop I have created looks very complicated, takes a long time to run, and I'm not even sure whether it works properly, as I have no clue how to check. Is there any simpler way to implement this?
Here is the loop:
[rows, columns, ~] = size(img);
rBlock = 6;
cBlock = 8;
NumberOfBlocks = rBlock * cBlock;
bRow = ceil(rows/rBlock);
bCol = ceil(columns/cBlock);
row = bRow;
col = bCol;
r = zeros(row*col,1);
g = zeros(row*col,1);
b = zeros(row*col,1);
n = 1;
cl = 1;
rw = 1;
for x = 1:NumberOfBlocks
for i = cl : col
for j = rw : row
% some code
end
end
%some code
if i == columns && j ~= rows
cl = 1;
rw = j - (bRow -1);
col = (col - col) + bCol;
row = row + bRaw;
elseif a == columns && c == rows
display('done');
else
cl = i + 1;
rw = j - (bRow -1);
col = col + col;
row = row + row;
end
end
Because there are only 48 blocks, you can use a simple for loop that iterates over the blocks. (I think it's going to be fast enough.)
Here is my code:
%Build test image
img = double(imresize(imread('peppers.png'), [200, 300]));
[rows, columns, ~] = size(img);
rBlock = 6;
cBlock = 8;
NumberOfBlocks = rBlock * cBlock;
bRow = ceil(rows/rBlock);
bCol = ceil(columns/cBlock);
idx = 1;
A = zeros(1, rBlock*cBlock*3);
for y = 0:rBlock-1
for x = 0:cBlock-1
%Block (y,x) boundaries: (x0,y0) to (x1,y1)
x0 = x*bCol+1;
y0 = y*bRow+1;
x1 = min(x0+bCol-1, columns); %Limit x1 to columns
y1 = min(y0+bRow-1, rows); %Limit y1 to rows
redMean = mean2(img(y0:y1, x0:x1, 1)); %Mean of red pixel in block (y,x)
greenMean = mean2(img(y0:y1, x0:x1, 2)); %Mean of green pixel in block (y,x)
blueMean = mean2(img(y0:y1, x0:x1, 3)); %Mean of blue pixel in block (y,x)
%Fill 3 elements of array A.
A(idx) = redMean;
A(idx+1) = greenMean;
A(idx+2) = blueMean;
%Advance index by 3.
idx = idx + 3;
end
end
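One way to check a block-averaging loop like this is to run it on a synthetic image whose block means are known in advance. The sketch below is an assumption-laden test harness, not part of the answer: it builds a 12x16 image in which every 2x2 block is constant, uses mean over a reshaped block instead of mean2 so that no toolbox is needed, and compares the result with the expected interleaved red, green, blue values.

```matlab
rBlock = 6;
cBlock = 8;
rows = 12;        % synthetic size: each block is exactly 2x2 pixels
columns = 16;

% Block index (1..48, counted row by row) for every pixel.
[cc, rr] = meshgrid(1:columns, 1:rows);
blockId = (ceil(rr/2) - 1)*cBlock + ceil(cc/2);

% Channels are offset so that red, green and blue means are distinguishable.
img = cat(3, blockId, blockId + 100, blockId + 200);

bRow = ceil(rows/rBlock);
bCol = ceil(columns/cBlock);
A = zeros(1, rBlock*cBlock*3);
idx = 1;
for y = 0:rBlock-1
    for x = 0:cBlock-1
        y0 = y*bRow + 1;
        x0 = x*bCol + 1;
        y1 = min(y0 + bRow - 1, rows);
        x1 = min(x0 + bCol - 1, columns);
        block = img(y0:y1, x0:x1, :);
        % One column per channel, then average down each column.
        A(idx:idx+2) = mean(reshape(block, [], 3), 1);
        idx = idx + 3;
    end
end

% Expected layout: [r1 g1 b1 r2 g2 b2 ...] with block k averaging to k.
expected = reshape([1:48; 101:148; 201:248], 1, []);
disp(isequal(A, expected))
```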

I have segmented an image into 16*16 blocks. Now I want to convert each block into an image

I have an image divided into 16x16 blocks, where each block is an array. How can I save each block as an image?
My code is:
I=imread(image);
img=rgb2gray(I);
[col, row] = find(img<250);
imout = I(min(col):max(col), min(row):max(row));
imshow(imout);
[rows columns numberOfBands]=size(imout);
blockSizeR = 16;
blockSizeC = 16;
wholeBlockRows = floor(rows / blockSizeR);
wholeBlockCols = floor(columns / blockSizeC);
blockNumber=1;
for row = 1 : blockSizeR : rows
for col = 1 : blockSizeC : columns
row1 = row;
row2 = row1 + blockSizeR - 1;
row2 = min(rows, row2);
col1 = col;
col2 = col1 + blockSizeC - 1;
col2 = min(columns, col2);
block=imout(row1:row2, col1:col2);
subplot(16,16,blockNumber);
imshow(block);
blockNumber = blockNumber + 1;
end
end
To create an image file from an array, you should use the built-in imwrite function.
Here is a piece of code that does pretty much the same as what you already do, plus saving it into files:
% ...
% imout contains the image to split
n = 16;
% Divide into nxn images
s1 = [n*ones(1,floor(size(imout,1)/n)) mod(size(imout,1),n)];
s2 = [n*ones(1,floor(size(imout,2)/n)) mod(size(imout,2),n)];
C = mat2cell(imout, s1, s2);
% Save into files
outpref = '/your/path/img_';
outsuff = '.png';
[I, J] = meshgrid(1:numel(s1), 1:numel(s2));
cellfun(@(x,i,j) imwrite(x, [outpref num2str(i) '_' num2str(j) outsuff]), C, num2cell(I'), num2cell(J'));
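One edge case worth guarding against: when an image dimension is an exact multiple of n, the mod term is 0, so mat2cell produces an empty trailing block, which is unlikely to be something imwrite can save. The sketch below drops zero-sized entries before splitting and writes the files with an explicit loop. The test image and the output folder under tempdir are assumptions made for illustration.

```matlab
% Hypothetical stand-in for imout: 32 rows (a multiple of 16) by 40 columns.
imout = uint8(randi([0 255], 32, 40));
n = 16;

s1 = [n*ones(1, floor(size(imout,1)/n)), mod(size(imout,1), n)];
s2 = [n*ones(1, floor(size(imout,2)/n)), mod(size(imout,2), n)];
s1(s1 == 0) = [];   % drop an empty trailing block, if any
s2(s2 == 0) = [];
C = mat2cell(imout, s1, s2);

% Hypothetical output folder under the system temp directory.
outDir = fullfile(tempdir, 'blocks_demo');
if ~exist(outDir, 'dir')
    mkdir(outDir);
end

% Write one PNG per block, named by block row and block column.
for i = 1:size(C, 1)
    for j = 1:size(C, 2)
        imwrite(C{i,j}, fullfile(outDir, sprintf('img_%d_%d.png', i, j)));
    end
end
```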

For loop to subtract values between matrices

I have a matrix with 1 column of data in it. The column has 1556480 data points. Call the matrix Vmatrix. I have another matrix with 1520 values. Call this Vmean_matrix. Is it possible to write a for loop that subtracts the first value in Vmean_matrix from the first 1024 values in Vmatrix, the second value in Vmean_matrix from values 1025 - 2048 in Vmatrix, and so on?
Reshape Vmatrix into a 1024-row matrix, reshape Vmean_matrix into a single row, and subtract with bsxfun:
result = bsxfun(@minus, reshape(Vmatrix, 1024, []), Vmean_matrix(:).'); % 1024 rows
result = result(:); % linearize if needed
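On MATLAB R2016b or newer, implicit expansion lets the same subtraction be written without bsxfun. A minimal sketch with small, made-up sizes (a block length of 4 instead of 1024, and 3 means instead of 1520) so the result can be inspected by eye:

```matlab
blockLen = 4;                          % 1024 in the question
Vmean_matrix = [10; 20; 30];           % 1520 values in the question
Vmatrix = (1:blockLen*numel(Vmean_matrix)).';

% Each column of the reshaped matrix is one block; the row vector of means
% is expanded down the rows, so column k has Vmean_matrix(k) subtracted.
result = reshape(Vmatrix, blockLen, []) - Vmean_matrix(:).';
result = result(:);                    % back to a single column

disp(result.')
```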
This may be a way:
% // Vmatrix = ...
% // Vmean_matrix = ...
len = length(Vmean_matrix);
sub = [];
for ii = 0 : len - 1
sub = [sub; Vmatrix( ii*1024+1 : (ii+1)*1024 ) - Vmean_matrix(ii+1)];
end
Or, to make it faster, you can preallocate the output and write it like this:
% // Vmatrix = ...
% // Vmean_matrix = ...
len = length(Vmean_matrix);
sub = zeros(length(Vmatrix), 1);
for ii = 0 : len - 1
sub( ii*1024+1 : (ii+1)*1024 ) = Vmatrix( ii*1024+1 : (ii+1)*1024 ) - Vmean_matrix(ii+1);
end

Application of Neural Network in MATLAB

I asked a question a few days ago, but I guess it was a little too complicated and I don't expect to get any answer.
My problem is that I need to use an ANN for classification. I've read that a much better cost function (or loss function, as some books call it) is the cross-entropy, that is J(w) = -1/m * sum_i( yi*ln(hw(xi)) + (1-yi)*ln(1 - hw(xi)) ), where i indexes the training examples in the training matrix X. I tried to apply it in MATLAB but I find it really difficult. There are a couple of things I don't know:
should I sum the outputs over all training data (i = 1, ... N, where N is the number of inputs for training)
is the gradient calculated correctly
is the numerical gradient (gradApprox) calculated correctly.
I have the following MATLAB code. I realise I may be asking about something trivial, but I hope someone can give me some clues on how to find the problem. I suspect the problem is in the gradient calculation.
Many thanks.
Main script:
close all
clear all
L = @(x) (1 + exp(-x)).^(-1);
NN = @(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
% theta = [10 -30 -30];
x = [0 0; 0 1; 1 0; 1 1];
y = [0.9 0.1 0.1 0.1]';
theta0 = 2*rand(9,1)-1;
options = optimset('gradObj','on','Display','iter');
thetaVec = fminunc(@costFunction,theta0,options,x,y);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
NN(x,theta)'
Cost function:
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
persistent index;
% 1 x x
% 1 x x
% 1 x x
% x = 1 x x
% 1 x x
% 1 x x
% 1 x x
m = size(x,1);
if isempty(index) || index > size(x,1)
index = 1;
end
L = @(x) (1 + exp(-x)).^(-1);
NN = @(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
Dew = cell(2,1);
DewApprox = cell(2,1);
% Forward propagation
a0 = x(index,:)';
z1 = theta{1}*[1;a0];
a1 = L(z1);
z2 = theta{2}*[1;a1];
a2 = L(z2);
% Back propagation
d2 = 1/m*(a2 - y(index))*L(z2)*(1-L(z2));
Dew{2} = [1;a1]*d2;
d1 = [1;a1].*(1 - [1;a1]).*theta{2}'*d2;
Dew{1} = [1;a0]*d1(2:end)';
% NNRes = NN(x,theta)';
% jVal = -1/m*sum(NNRes-y)*NNRes*(1-NNRes);
jVal = -1/m*(a2 - y(index))*a2*(1-a2);
gradVal = [Dew{1}(:);Dew{2}(:)];
gradApprox = CalcGradApprox(0.0001);
index = index + 1;
function output = CalcGradApprox(epsilon)
output = zeros(size(gradVal));
for n=1:length(thetaVec)
thetaVecMin = thetaVec;
thetaVecMax = thetaVec;
thetaVecMin(n) = thetaVec(n) - epsilon;
thetaVecMax(n) = thetaVec(n) + epsilon;
thetaMin = cell(2,1);
thetaMax = cell(2,1);
thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
a2min = NN(x(index,:),thetaMin)';
a2max = NN(x(index,:),thetaMax)';
jValMin = -1/m*(a2min-y(index))*a2min*(1-a2min);
jValMax = -1/m*(a2max-y(index))*a2max*(1-a2max);
output(n) = (jValMax - jValMin)/2/epsilon;
end
end
end
EDIT:
Below I present the correct version of my costFunction for those who may be interested.
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
m = size(x,1);
L = @(x) (1 + exp(-x)).^(-1);
NN = @(x,theta) L(theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')]);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
Delta = cell(2,1);
Delta{1} = zeros(size(theta{1}));
Delta{2} = zeros(size(theta{2}));
D = cell(2,1);
D{1} = zeros(size(theta{1}));
D{2} = zeros(size(theta{2}));
jVal = 0;
for in = 1:size(x,1)
% Forward propagation
a1 = [1;x(in,:)']; % added bias to a0
z2 = theta{1}*a1;
a2 = [1;L(z2)]; % added bias to a1
z3 = theta{2}*a2;
a3 = L(z3);
% Back propagation
d3 = a3 - y(in);
d2 = theta{2}'*d3.*a2.*(1 - a2);
Delta{2} = Delta{2} + d3*a2';
Delta{1} = Delta{1} + d2(2:end)*a1';
jVal = jVal + sum( y(in)*log(a3) + (1-y(in))*log(1-a3) );
end
D{1} = 1/m*Delta{1};
D{2} = 1/m*Delta{2};
jVal = -1/m*jVal;
gradVal = [D{1}(:);D{2}(:)];
gradApprox = CalcGradApprox(x,0.0001); % pass all training rows, not only the last one
% Nested function to calculate gradApprox
function output = CalcGradApprox(x,epsilon)
output = zeros(size(thetaVec));
for n=1:length(thetaVec)
thetaVecMin = thetaVec;
thetaVecMax = thetaVec;
thetaVecMin(n) = thetaVec(n) - epsilon;
thetaVecMax(n) = thetaVec(n) + epsilon;
thetaMin = cell(2,1);
thetaMax = cell(2,1);
thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
a3min = NN(x,thetaMin)'; % m-by-1 outputs, one per training row
a3max = NN(x,thetaMax)';
jValMin = 0;
jValMax = 0;
for inn=1:size(x,1)
jValMin = jValMin + y(inn)*log(a3min(inn)) + (1-y(inn))*log(1-a3min(inn));
jValMax = jValMax + y(inn)*log(a3max(inn)) + (1-y(inn))*log(1-a3max(inn));
end
jValMin = -1/m*jValMin; % same sign and scaling as jVal
jValMax = -1/m*jValMax;
output(n) = (jValMax - jValMin)/2/epsilon;
end
end
end
I've only had a quick eyeball over your code. Here are some pointers.
Q1
should I sum each outputs given all training data (i = 1, ... N, where
N is number of inputs for training)
If you are talking about the cost function, it is normal to sum over the training examples and normalise by their number, so that costs remain comparable across training sets of different sizes.
I can't tell from the code whether you have a vectorised implementation which will change the answer. Note that the sum function will only sum up a single dimension at a time - meaning if you have a (M by N) array, sum will result in a 1 by N array.
The cost function should have a scalar output.
Q2
is the gradient calculated correctly
The gradient is not calculated correctly; specifically, the deltas look wrong. Try following Andrew Ng's notes (PDF); they are very good.
Q3
is the numerical gradient (gradApprox) calculated correctly.
This line looks a bit suspect. Does this make more sense?
output(n) = (jValMax - jValMin)/(2*epsilon);
EDIT: I actually can't make heads or tails of your gradient approximation. You should only use forward propagation and small tweaks in the parameters to compute the gradient. Good luck!
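A gradient check along those lines can be sketched as follows, assuming the 2-2-1 sigmoid network and the cross-entropy cost from the question. The helper names (sigmoid, forward, cost, T1, T2) are illustrative and not from the original code. The numerical gradient uses only forward passes with one parameter nudged by plus and minus epsilon; the analytic gradient is a vectorised form of backpropagation for this particular network. Note that in MATLAB `/2/epsilon` and `/(2*epsilon)` evaluate to the same value, since division is applied left to right.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

x = [0 0; 0 1; 1 0; 1 1];
y = [0.9 0.1 0.1 0.1]';
m = size(x, 1);

% Forward pass for all rows at once; t is the 9-by-1 parameter vector.
forward = @(t) sigmoid([ones(m,1), sigmoid([ones(m,1), x] * reshape(t(1:6), [2 3])')] ...
                       * reshape(t(7:9), [1 3])');
% Cross-entropy cost, a scalar.
cost = @(t) -1/m * sum(y .* log(forward(t)) + (1 - y) .* log(1 - forward(t)));

thetaVec = 2*rand(9,1) - 1;
epsilon = 1e-4;

% Numerical gradient by central differences.
gradApprox = zeros(size(thetaVec));
for n = 1:numel(thetaVec)
    step = zeros(size(thetaVec));
    step(n) = epsilon;
    gradApprox(n) = (cost(thetaVec + step) - cost(thetaVec - step)) / (2*epsilon);
end

% Analytic gradient by backpropagation, vectorised over the m examples.
T1 = reshape(thetaVec(1:6), [2 3]);
T2 = reshape(thetaVec(7:9), [1 3]);
a1 = [ones(m,1), x];
a2 = [ones(m,1), sigmoid(a1 * T1')];
a3 = sigmoid(a2 * T2');
d3 = a3 - y;                                        % output-layer delta
d2 = (d3 * T2(2:3)) .* a2(:,2:3) .* (1 - a2(:,2:3)); % hidden-layer delta
gradVal = [reshape(d2' * a1 / m, [], 1); reshape(d3' * a2 / m, [], 1)];

fprintf('max abs difference: %g\n', max(abs(gradVal - gradApprox)));
```

If the two gradients agree to several decimal places, the backpropagation code is very likely consistent with the cost function; a large difference points to a bug in one of them.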