Neural Network implementation in octave/matlab - matlab

I'm trying to make a simple neural network formed by three layers to resolve a binary classification problem. The first two layers have eight neurons (+ the bias units). I'm using fminunc. This is my cost function:
1 function [jVal, gradient] = cost2(thetaVec, X, y)
2 Theta1 = reshape(thetaVec(1:72),8, 9); % my weights used for
3 Theta2 = reshape(thetaVec(73:81),1, 9); %forward propagation
4 Delta1 = 0; %Delta is divided in Delta1 and Delta2 for simplicity but
5 Delta2 = 0; %they're combined to eventually calculate the gradient
6 jVal = 0; %the value of the costfunction
7 m = length(y);
8 for i = 1:m
9 a1 = X(i, :); %X size: 3x9, a1 size: 1x9
10 z2 = Theta1 * a1';
11 a2 = 1 ./(1 + exp(-z2)); %a2 size: 8x1
12 a2 = [ones(columns(a2), 1) a2']; % bias unit added to a2: a2 size: 1x9
13 z3 = Theta2 * a2';
14 a3 = 1 ./(1 + exp(-z3)); %a3 = h(x(i)) size: 1x1
15 jVal += (-1/m) * (y(i) * log(a3) + (1 - y(i)) * log(1 - a3));
16 delta3 = a3 - y(i); %delta3 size: 1x1
17 delta2 = Theta2' * delta3 .* a2 .* (1 - a2); %delta2 size: 9x9
18 Delta2 += delta3 * a2'; %I use Delta1 and Delta2 as accumulators
19 Delta1 += delta2 * a1'; %size Delta2: 9x1, size Delta1: 9x1
20 endfor
21 jVal = jVal/m; %avarage of jVal
22 Delta = [Delta1;Delta2]; %Deltas are combined. Size Delta: 18x1
23 gradient = (1/m) * Delta;% size gradient: 18x1
24 endfunction
My main:
%the values of the vector from which I derive my weights are chosen randomly
INIT_EPSILON = 0.1; %between thi interval
Theta1 = rand(8, 9) * (2*INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(1, 9) * (2*INIT_EPSILON) - INIT_EPSILON;
thetaVec = [ Theta1(:); Theta2(:)];
options = optimset('GradObj', 'on', 'MaxIter', 10000);
[optTheta, functionVal, exitFlag] = fminunc(#(t) cost2(t, X, y), thetaVec, options)
gradient should be a matrix 9x9, instead it is 18x1, so I can't use fminunc. Actually, I tried to modify the backpropagation part in my cost function several times to obtain a gradient 9x9 (in particular I used to change delta2). However, it never worked, the output was:
optTheta = %a vector of various values
functionVal = 0.71681 %or a similar value
exitFlag = 1
So, even if the exitflag was 1 it didn't converged. Where am I doing wrong?

You currently have the following code:
delta3 = a3 - y(i); % (1×1)
delta2 = Theta2' * delta3 .* a2 .* (1 - a2); % (9×1).*(1×9) = (9×9)
Delta2 += delta3 * a2'; % (9×9)* (9×1) = (9×1)
Delta1 += delta2 * a1'; % (9×9)* (n×1) = (9×1)
I think instead it should be something like:
delta3 = a3 - y(i); % (1×1)
delta2 = Theta2 * delta3 .* a2 .* (1 - a2); % (1×9).*(1×9) = (1×9)
Delta2 += delta3 * a2'; % (9×9)* (9×1) = (9×1)
Delta1 += delta2.' * a1; % (9×1)* (1×9) = (9×9)
And then you discard the gradients of the bias in Delta1 at each step, ending up with an (8×9) matrix as your ongoing Delta1 component. (you may need to transpose Delta1 first, I haven't followed your transpositions closely).
Finally, the whole point of the vertical concatenation step at the end is to "implode" your matrices back into a single-column long vector form, so that they follow they same "specification" as the input "thetaVec", therefore you'd take your (8×9) and (1×9) Delta1 and Delta2 objects, and 'implode' them, i.e.:
[Delta1(:) ; Delta2(:)]
UPDATE
Continuing here the discussion in the comments above.
Consider what Delta1 and Delta2 are. This is the total error (over all observations) corresponding to the elements in Theta1 and Theta2 respectively. In other words, Delta1 should have the same size as Theta1, and Delta2 should have the same size as Theta2. Now that you have included the matrix sizes in your code, you can see instantly that this is not the case.
Furthermore, since each iteration adds to these matrices, the result of each iteration should be a Delta1 and Delta2 of the right size, such that you add the errors contributed at each iteration, to get the total error (per parameter) over all iterations.
Also, consider what delta2 and delta3 are. These are also errors, but they do not refer to errors in the parameters, they refer to errors in nodes. In other words, they show the contribution / responsibility of each node in a layer to the final error. Therefore their size needs to be a vector with the same number of elements as there are nodes in the respective layer. You can already see that it makes no sense for delta2 to have a size of 9x9!
So the logic of the algorithm is this. Take the contribution of the nodes to the error, and backpropagate it, both to the previous nodes, and to the the parameters between them.
E.g. multiply each delta3 (in this case there's only one node) with each node in layer 2, to find how errors were distributed over each parameter element in Theta2.
Similarly, to obtain delta2, we considered the error in delta3, and distributed it to each node in layer 2, by following the parameter multiplication backwards. Then on top of that, we multiplied by the gradient (since this defines to what extent / rate that error would have been propagated forwards).
Now that we've dealt with layer 3 and its interaction with layer 2, we move on to 2 and its interaction with 1. So, similarly, multiply each delta2 (there are 9 nodes in layer 2, so delta2 should have 9 elements) with each node in layer 1. This gives a 9x9 matrix reflecting how each node in layer 1, through parameters Theta1, gave rise to the errors for each node of layer 2. However, because we do not care about the contributions to the 'bias' node of layer 2, we remove this part from Delta1, which leaves us with an 8x9 matrix, exactly the same size as Theta1.
In theory you could have also backpropagated delta2 to find a delta1, but since we have no use for it, we skip this step. After all, what we really care about is the error in the parameters, not in the nodes. (i.e. we only care about the errors in the nodes at each step because we need them to get the errors in the parameters from the layer before).

Related

My matlab neural network backpropagation algorithm seems buggy

Here is my code. I think it is wrong because the difference between this computed gradient and my numerical estimate is too significant. It doesn't seem to be due to wrongly inverting matrices, etc.
For context, Y is the output layer, X is the input layer, and there is only 1 hidden layer. Theta1 is the weights for the first input layer and Theta2 is the weights for the hidden layer.
for t = 1:m
% do fw prop again...
a1 = [1 X(i,:)];
a2 = [1 sigmoid(a1 * Theta1')];
a3 = sigmoid(a2 * Theta2');
delta_3 = a3' - Y(:, t);
delta_2 = Theta2' * delta_3 .* a2' .* (1 - a2)';
delta_2 = delta_2(2:end,:);
Theta1_grad = Theta1_grad + delta_2 * [1 X(i, :)];
Theta2_grad = Theta2_grad + delta_3 * [1 sigmoid([1 X(i,:)] * Theta1')];
end
grad = [Theta1_grad(:) ; Theta2_grad(:)];

Code wont produce the value of a definite integral in MATLAB

I've had problems with my code as I've tried to make an integral compute, but it will not for the power, P2.
I've tried using anonymous function handles to use the integral() function on MATLAB as well as just using int(), but it will still not compute. Are the values too small for MATLAB to integrate or am I just missing something small?
Any help or advice would be appreciated to push me in the right direction. Thanks!
The problem in the code is in the bottom of the section labelled "Power Calculations". My integral also gets quite messy if that makes a difference.
%%%%%%%%%%% Parameters %%%%%%%%%%%%
n0 = 1; %air
n1 = 1.4; %layer 1
n2 = 2.62; %layer 2
n3 = 3.5; %silicon
L0 = 650*10^(-9); %centre wavelength
L1 = 200*10^(-9): 10*10^(-9): 2200*10^(-9); %lambda from 200nm to 2200nm
x = ((pi./2).*(L0./L1)); %layer phase thickness
r01 = ((n0 - n1)./(n0 + n1)); %reflection coefficient 01
r12 = ((n1 - n2)./(n1 + n2)); %reflection coefficient 12
r23 = ((n2 - n3)./(n2 + n3)); %reflection coefficient 23
t01 = ((2.*n0)./(n0 + n1)); %transmission coefficient 01
t12 = ((2.*n1)./(n1 + n2)); %transmission coefficient 12
t23 = ((2.*n2)./(n2 + n3)); %transmission coefficient 23
Q1 = [1 r01; r01 1]; %Matrix Q1
Q2 = [1 r12; r12 1]; %Matrix Q2
Q3 = [1 r23; r23 1]; %Matrix Q3
%%%%%%%%%%%% Graph of L vs R %%%%%%%%%%%
R = zeros(size(x));
for i = 1:length(x)
P = [exp(j.*x(i)) 0; 0 exp(-j.*x(i))]; %General Matrix P
T = ((1./(t01.*t12.*t23)).*(Q1*P*Q2*P*Q3)); %Transmission
T11 = T(1,1); %T11 value
T21 = T(2,1); %T21 value
R(i) = ((abs(T21./T11))^2).*100; %Percent reflectivity
end
plot(L1,R)
title('Percent Reflectance vs. wavelength for 2 Layers')
xlabel('Wavelength (m)')
ylabel('Reflectance (%)')
%%%%%%%%%%% Power Calculation %%%%%%%%%%
syms L; %General lamda
y = ((pi./2).*(L0./L)); %Layer phase thickness with variable Lamda
P1 = [exp(j.*y) 0; 0 exp(-j.*y)]; %Matrix P with variable Lambda
T1 = ((1./(t01.*t12.*t23)).*(Q1*P1*Q2*P1*Q3)); %Transmittivity matrix T1
I = ((6.16^(15))./((L.^(5)).*exp(2484./L) - 1)); %Blackbody Irradiance
Tf11 = T1(1,1); %New T11 section of matrix with variable Lambda
Tf2 = (((abs(1./Tf11))^2).*(n3./n0)); %final transmittivity
P1 = Tf2.*I; %Power before integration
L_initial = 200*10^(-9); %Initial wavelength
L_final = 2200*10^(-9); %Final wavelength
P2 = int(P1, L, L_initial, L_final) %Power production
I've refactored your code
to make it easier to read
to improve code reuse
to improve performance
to make it easier to understand
Why do you use so many unnecessary parentheses?!
Anyway, there's a few problems I saw in your code.
You used i as a loop variable, and j as the imaginary unit. It was OK for this one instance, but just barely so. In the future it's better to use 1i or 1j for the imaginary unit, and/or m or ii or something other than i or j as the loop index variable. You're helping yourself and your colleagues; it's just less confusing that way.
Towards the end, you used the variable name P1 twice in a row, and in two different ways. Although it works here, it's confusing! Took me a while to unravel why a matrix-producing function was producing scalars instead...
But by far the biggest problem in your code is the numerical problems with the blackbody irradiance computation. The term
L⁵ · exp(2484/L) - 1
for λ₀ = 200 · 10⁻⁹ m will require computing the quantity
exp(1.242 · 10¹⁰)
which, needless to say, is rather difficult for a computer :) Actually, the problem with your computation is two-fold. First, the exponentiation is definitely out of range of 64 bit IEEE-754 double precision, and will therefore result in ∞. Second, the parentheses are wrong; Planck's law should read
C/L⁵ · 1/(exp(D) - 1)
with C and D the constants (involving Planck's constant, speed of light, and Boltzmann constant), which you've presumably precomputed (I didn't check the values. I do know choice of units can mess these up, so better check).
So, aside from the silly parentheses error, I suspect the main problem is that you simply forgot to rescale λ to nm. Changing everything in the blackbody equation to nm and correcting those parentheses gives the code
I = 6.16^(15) / ( (L*1e+9)^5 * (exp(2484/(L*1e+9)) - 1) );
With this, I got a finite value for the integral of
P2 = 1.052916498836486e-010
But, again, you'd better double-check everything.
Note that I used quadgk(), because it's one of the better ones available on R2010a (which I'm stuck with), but you can just as easily replace this with integral() available on anything newer than R2012a.
Here's the code I ended up with:
function pwr = my_fcn()
% Parameters
n0 = 1; % air
n1 = 1.4; % layer 1
n2 = 2.62; % layer 2
n3 = 3.5; % silicon
L0 = 650e-9; % centre wavelength
% Reflection coefficients
r01 = (n0 - n1)/(n0 + n1);
r12 = (n1 - n2)/(n1 + n2);
r23 = (n2 - n3)/(n2 + n3);
% Transmission coefficients
t01 = (2*n0) / (n0 + n1);
t12 = (2*n1) / (n1 + n2);
t23 = (2*n2) / (n2 + n3);
% Quality factors
Q1 = [1 r01; r01 1];
Q2 = [1 r12; r12 1];
Q3 = [1 r23; r23 1];
% Initial & Final wavelengths
L_initial = 200e-9;
L_final = 2200e-9;
% plot reflectivity for selected lambda range
plot_reflectivity(L_initial, L_final, 1000);
% Compute power production
pwr = quadgk(#power_production, L_initial, L_final);
% Helper functions
% ========================================
% Graph of lambda vs reflectivity
function plot_reflectivity(L_initial, L_final, N)
L = linspace(L_initial, L_final, N);
R = zeros(size(L));
for ii = 1:numel(L)
% Transmission
T = transmittivity(L(ii));
% Percent reflectivity
R(ii) = 100 * abs(T(2,1)/T(1,1))^2 ;
end
plot(L, R)
title('Percent Reflectance vs. wavelength for 2 Layers')
xlabel('Wavelength (m)')
ylabel('Reflectance (%)')
end
% Compute transmittivity matrix for a single wavelength
function T = transmittivity(L)
% Layer phase thickness with variable Lamda
y = pi/2 * L0/L;
% Matrix P with variable Lambda
P1 = [exp(+1j*y) 0
0 exp(-1j*y)];
% Transmittivity matrix T1
T = 1/(t01*t12*t23) * Q1*P1*Q2*P1*Q3;
end
% Power for a specific wavelength. Note that this function
% accepts vector-valued wavelengths; needed for quadgk()
function pwr = power_production(L)
pwr = zeros(size(L));
for ii = 1:numel(L)
% Transmittivity matrix
T1 = transmittivity(L(ii));
% Blackbody Irradiance
I = 6.16^(15) / ( (L(ii)*1e+9)^5 * (exp(2484/(L(ii)*1e+9)) - 1) );
% final transmittivity
Tf2 = abs(1/T1(1))^2 * n3/n0;
% Power before integration
pwr(ii) = Tf2 * I;
end
end
end

Matlab/CUDA: ocean wave simulation

I've studied "Simulating Ocean Water" article by Jerry Tessendorf and tried to program the Statistical Wave Model but I didn't get correct result and I don't understand why.
In my program I tried only to create a wave height field at time t = 0 without any further changes in time. After execution of my program I got not what I was expecting:
Here's my source code:
clear all; close all; clc;
rng(11); % setting seed for random numbers
meshSize = 64; % field size
windDir = [1, 0]; % ||windDir|| = 1
patchSize = 64;
A = 1e+4;
g = 9.81; % gravitational constant
windSpeed = 1e+2;
x1 = linspace(-10, 10, meshSize+1); x = x1(1:meshSize);
y1 = linspace(-10, 10, meshSize+1); y = y1(1:meshSize);
[X,Y] = meshgrid(x, y);
H0 = zeros(size(X)); % height field at time t = 0
for i = 1:meshSize
for j = 1:meshSize
kx = 2.0 * pi / patchSize * (-meshSize / 2.0 + x(i)); % = 2*pi*n / Lx
ky = 2.0 * pi / patchSize * (-meshSize / 2.0 + y(j)); % = 2*pi*m / Ly
P = phillips(kx, ky, windDir, windSpeed, A, g); % phillips spectrum
H0(i,j) = 1/sqrt(2) * (randn(1) + 1i * randn(1)) * sqrt(P);
end
end
H0 = H0 + conj(H0);
surf(X,Y,abs(ifft(H0)));
axis([-10 10 -10 10 -10 10]);
And the phillips function:
function P = phillips(kx, ky, windDir, windSpeed, A, g)
k_sq = kx^2 + ky^2;
L = windSpeed^2 / g;
k = [kx, ky] / sqrt(k_sq);
wk = k(1) * windDir(1) + k(2) * windDir(2);
P = A / k_sq^2 * exp(-1.0 / (k_sq * L^2)) * wk^2;
end
Is there any matlab ocean simulation source code which could help me to understand my mistakes? Fast google search didn't get any results.
Here's a "correct" result I got from "CUDA FFT Ocean Simulation". I didn't achieve this behavior in Matlab yet but I've ploted "surf" in matlab using data from "CUDA FFT Ocean Simulation". Here's what it looks like:
I've made an experiment and got an interesting result:
I've taken generated h0 from "CUDA FFT Ocean Simulation". So I have to do ifft to transform from frequency domain to spatial domain to plot the graph. I've done it for the same h0 using matlab ifft and using cufftExecC2C from CUDA library. Here's the result:
CUDA ifft:
Matlab ifft:
Either I don't understand some aspects of realization of cufftExecC2C or cufftExecC2C and matlab ifft are different algorithms with different results.
By the way parameters for generating such surface are:
meshSize = 32
A = 1e-7
patchSize = 80
windSpeed = 10
Well that was definitely a funny exercise. This is a completely rewritten answer since you found the issues you were asking about by yourself.
Instead of deleting my answer, there is still merit in posting to help you vectorize and/or explain a few bits of code.
I completely rewrote the GUI I gave in my former answer in order to incorporate your changes and add a couple of options. It started to grew arms and legs so I won't put the listing here but you can find the full file there:
ocean_simulator.m.
This is completely self contained and it includes all the calculating functions I vectorized and list separately below.
The GUI will allow you to play with the parameters, animate the waves, export GIF file (and a few other options like the "preset", but they are not too ironed out yet). A few examples of what you can achieve:
Basic
This is what you get with the quick default settings, and a couple of rendering options. This uses a small grid size and a fast time step, so it runs pretty quickly on any machine.
I am quite limited at home (Pentium E2200 32bit), so I could only practice with limited settings. The gui will run even with the settings maxed but it will become to slow to really enjoy.
However, with a quick run of ocean_simulator at work (I7 64 bit, 8 cores, 16GB ram, 2xSSD in Raid), it makes it much more fun! Here are a few examples:
Although done on a much better machine, I didn't use any parallel functionality nor any GPU calculations, so Matlab was only using a portion of these specs, which means it could probably run just as good on any 64bit system with decent RAM
Windy lake
This is a rather flat water surface like a lake. Even high winds do not produce high amplitude waves (but still a lot of mini wavelets). If you're a wind surfer looking at that from your window on top of the hill, your heart is going to skip a beat and your next move is to call Dave "Man! gear up. Meet you in five on the water!"
Swell
This is you looking from the bridge of your boat on the morning, after having battled with the storm all night. The storm has dissipated and the long large waves are the last witness of what was definitely a shaky night (people with sailing experience will know ...).
T-Storm
And this was what you were up to the night before...
second gif done at home, hence the lack of detail ... sorry
To the bottom:
Finally, the gui will let you add a patch around the water domain. In the gui it is transparent so you could add objects underwater or a nice ocean bottom. Unfortunately, the GIF format cannot include an alpha channel so no transparency here (but if you export in a video then you should be ok).
Moreover, the export to GIF degrade the image, the joint between the domain border and the water surface is flawless if you run that in Matlab. In some case it also make Matlab degrade the rendering of the lighting, so this is definitely not the best option for export, but it allows more things to play within matlab.
Now onto the code:
Instead of listing the full GUI, which would be super long (this post is long enough already), I will just list here the re-written version of your code, and explain the changes.
You should notice a massive increase of speed execution (orders of magnitude), thanks to the remaining vectorization, but mostly for two reasons:
(i) A lot of calculations were repeated. Caching values and reusing them is much faster than recalculating full matrices in loops (during the animation part).
(ii) Note how I defined the surface graphic object. It is defined only once (empty even), then all the further calls (in the loop) only update the underlying ZData of the surface object (instead of re-creating a surface object at each iteration.
Here goes:
%% // clear workspace
clear all; close all; clc;
%% // Default parameters
param.meshsize = 128 ; %// main grid size
param.patchsize = 200 ;
param.windSpeed = 100 ; %// what unit ? [m/s] ??
param.winddir = 90 ; %// Azimuth
param.rng = 13 ; %// setting seed for random numbers
param.A = 1e-7 ; %// Scaling factor
param.g = 9.81 ; %// gravitational constant
param.xLim = [-10 10] ; %// domain limits X
param.yLim = [-10 10] ; %// domain limits Y
param.zLim = [-1e-4 1e-4]*2 ;
gridSize = param.meshsize * [1 1] ;
%% // Define the grid X-Y domain
x = linspace( param.xLim(1) , param.xLim(2) , param.meshsize ) ;
y = linspace( param.yLim(1) , param.yLim(2) , param.meshsize ) ;
[X,Y] = meshgrid(x, y);
%% // get the grid parameters which remain constants (not time dependent)
[H0, W, Grid_Sign] = initialize_wave( param ) ;
%% // calculate wave at t0
t0 = 0 ;
Z = calc_wave( H0 , W , t0 , Grid_Sign ) ;
%% // populate the display panel
h.fig = figure('Color','w') ;
h.ax = handle(axes) ; %// create an empty axes that fills the figure
h.surf = handle( surf( NaN(2) ) ) ; %// create an empty "surface" object
%% // Display the initial wave surface
set( h.surf , 'XData',X , 'YData',Y , 'ZData',Z )
set( h.ax , 'XLim',param.xLim , 'YLim',param.yLim , 'ZLim',param.zLim )
%% // Change some rendering options
axis off %// make the axis grid and border invisible
shading interp %// improve shading (remove "faceted" effect)
blue = linspace(0.4, 1.0, 25).' ; cmap = [blue*0, blue*0, blue]; %'// create blue colormap
colormap(cmap)
%// configure lighting
h.light_handle = lightangle(-45,30) ; %// add a light source
set(h.surf,'FaceLighting','phong','AmbientStrength',.3,'DiffuseStrength',.8,'SpecularStrength',.9,'SpecularExponent',25,'BackFaceLighting','unlit')
%% // Animate
view(75,55) %// no need to reset the view inside the loop ;)
timeStep = 1./25 ;
nSteps = 2000 ;
for time = (1:nSteps)*timeStep
%// update wave surface
Z = calc_wave( H0,W,time,Grid_Sign ) ;
h.surf.ZData = Z ;
pause(0.001);
end
%% // This block of code is only if you want to generate a GIF file
%// be carefull on how many frames you put there, the size of the GIF can
%// quickly grow out of proportion ;)
nFrame = 55 ;
gifFileName = 'MyDancingWaves.gif' ;
view(-70,40)
clear im
f = getframe;
[im,map] = rgb2ind(f.cdata,256,'nodither');
im(1,1,1,20) = 0;
iframe = 0 ;
for time = (1:nFrame)*.5
%// update wave surface
Z = calc_wave( H0,W,time,Grid_Sign ) ;
h.surf.ZData = Z ;
pause(0.001);
f = getframe;
iframe= iframe+1 ;
im(:,:,1,iframe) = rgb2ind(f.cdata,map,'nodither');
end
imwrite(im,map,gifFileName,'DelayTime',0,'LoopCount',inf)
disp([num2str(nFrame) ' frames written in file: ' gifFileName])
You'll notice that I changed a few things, but I can assure you the calculations are exactly the same. This code calls a few subfunctions but they are all vectorized so if you want you can just copy/paste them here and run everything inline.
The first function called is initialize_wave.m
Everything calculated here will be constant later (it does not vary with time when you later animate the waves), so it made sense to put that into a block on it's own.
function [H0, W, Grid_Sign] = initialize_wave( param )
% function [H0, W, Grid_Sign] = initialize_wave( param )
%
% This function return the wave height coefficients H0 and W for the
% parameters given in input. These coefficients are constants for a given
% set of input parameters.
% Third output parameter is optional (easy to recalculate anyway)
rng(param.rng); %// setting seed for random numbers
gridSize = param.meshsize * [1 1] ;
meshLim = pi * param.meshsize / param.patchsize ;
N = linspace(-meshLim , meshLim , param.meshsize ) ;
M = linspace(-meshLim , meshLim , param.meshsize ) ;
[Kx,Ky] = meshgrid(N,M) ;
K = sqrt(Kx.^2 + Ky.^2); %// ||K||
W = sqrt(K .* param.g); %// deep water frequencies (empirical parameter)
[windx , windy] = pol2cart( deg2rad(param.winddir) , 1) ;
P = phillips(Kx, Ky, [windx , windy], param.windSpeed, param.A, param.g) ;
H0 = 1/sqrt(2) .* (randn(gridSize) + 1i .* randn(gridSize)) .* sqrt(P); % height field at time t = 0
if nargout == 3
Grid_Sign = signGrid( param.meshsize ) ;
end
Note that the initial winDir parameter is now expressed with a single scalar value representing the "azimuth" (in degrees) of the wind (anything from 0 to 360). It is later translated to its X and Y components thanks to the function pol2cart.
[windx , windy] = pol2cart( deg2rad(param.winddir) , 1) ;
This insure that the norm is always 1.
The function calls your problematic phillips.m separately, but as said before it works even fully vectorized so you can copy it back inline if you like. (don't worry I checked the results against your versions => strictly identical). Note that this function does not output complex numbers so there was no need to compare the imaginary parts.
function P = phillips(Kx, Ky, windDir, windSpeed, A, g)
%// The function now accept scalar, vector or full 2D grid matrix as input
K_sq = Kx.^2 + Ky.^2;
L = windSpeed.^2 ./ g;
k_norm = sqrt(K_sq) ;
WK = Kx./k_norm * windDir(1) + Ky./k_norm * windDir(2);
P = A ./ K_sq.^2 .* exp(-1.0 ./ (K_sq * L^2)) .* WK.^2 ;
P( K_sq==0 | WK<0 ) = 0 ;
end
The next function called by the main program is calc_wave.m. This function finishes the calculations of the wave field for a given time. It is definitely worth having that on its own because this is the mimimun set of calculations which will have to be repeated for each given time when you want to animate the waves.
function Z = calc_wave( H0,W,time,Grid_Sign )
% Z = calc_wave( H0,W,time,Grid_Sign )
%
% This function calculate the wave height based on the wave coefficients H0
% and W, for a given "time". Default time=0 if not supplied.
% Fourth output parameter is optional (easy to recalculate anyway)
% recalculate the grid sign if not supplied in input
if nargin < 4
Grid_Sign = signGrid( param.meshsize ) ;
end
% Assign time=0 if not specified in input
if nargin < 3 ; time = 0 ; end
wt = exp(1i .* W .* time ) ;
Ht = H0 .* wt + conj(rot90(H0,2)) .* conj(wt) ;
Z = real( ifft2(Ht) .* Grid_Sign ) ;
end
The last 3 lines of calculations require a bit of explanation as they received the biggest changes (all for the same result but a much better speed).
Your original line:
Ht = H0 .* exp(1i .* W .* (t * timeStep)) + conj(flip(flip(H0,1),2)) .* exp(-1i .* W .* (t * timeStep));
recalculate the same thing too many times to be efficient:
(t * timeStep) is calculated twice on the line, at each loop, while it is easy to get the proper time value for each line when time is initialised at the beginning of the loop for time = (1:nSteps)*timeStep.
Also note that exp(-1i .* W .* time) is the same than conj(exp(1i .* W .* time)). Instead of doing 2*m*n multiplications to calculate them each, it is faster to calculate one once, then use the conj() operation which is much faster.
So your single line would become:
wt = exp(1i .* W .* time ) ;
Ht = H0 .* wt + conj(flip(flip(H0,1),2)) .* conj(wt) ;
Last minor touch, flip(flip(H0,1),2)) can be replaced by rot90(H0,2) (also marginally faster).
Note that because the function calc_wave is going to be repeated extensively, it is definitely worth reducing the number of calculations (as we did above), but also by sending it the Grid_Sign parameter (instead of letting the function recalculate it every iteration). This is why:
Your mysterious function signCor(ifft2(Ht),meshSize)), simply reverse the sign of every other element of Ht. There is a faster way of achieving that: simply multiply Ht by a matrix the same size (Grid_Sign) which is a matrix of alternated +1 -1 ... and so on.
so signCor(ifft2(Ht),meshSize) becomes ifft2(Ht) .* Grid_Sign.
Since Grid_Sign is only dependent on the matrix size, it does not change for each time in the loop, you only calculate it once (before the loop) then use it as it is for every other iteration. It is calculated as follow (vectorized, so you can also put it inline in your code):
function sgn = signGrid(n)
% return a matrix the size of n with alternate sign for every indice
% ex: sgn = signGrid(3) ;
% sgn =
% -1 1 -1
% 1 -1 1
% -1 1 -1
[x,y] = meshgrid(1:n,1:n) ;
sgn = ones( n ) ;
sgn(mod(x+y,2)==0) = -1 ;
end
Lastly, you will notice a difference in how the grids [Kx,Ky] are defined between your version and this one. They do produce slightly different result, it's just a matter of choice.
To explain with a simple example, let's consider a small meshsize=5. Your way of doing things will split that into 5 values, equally spaced, like so:
Kx(first line)=[-1.5 -0.5 0.5 1.5 2.5] * 2 * pi / patchSize
while my way of producing the grid will produce equally spaced values, but also centered on the domain limits, like so:
Kx(first line)=[-2.50 -1.25 0.0 1.25 2.50] * 2 * pi / patchSize
It seems to respect more your comment % = 2*pi*n / Lx, -N/2 <= n < N/2 on the line where you define it.
I tend to prefer symmetric solutions (plus it is also slightly faster but it is only calculated once so it is not a big deal), so I used my vectorized way, but it is purely a matter of choice, you can definitely keep your way, it only ever so slightly "offset" the whole result matrix, but it doesn't perturbate the calculations per se.
last remains of the first answer
Side programming notes:
I detect you come from the C/C++ world or family. In Matlab you do not need to define decimal number with a coma (like 2.0, you used that for most of your numbers). Unless specifically defined otherwise, Matlab by default cast any number to double, which is a 64 bit floating point type. So writing 2 * pi is enough to get the maximum precision (Matlab won't cast pi as an integer ;-)), you do not need to write 2.0 * pi. Although it will still work if you don't want to change your habits.
Also, (one of the great benefit of Matlab), adding . before an operator usually mean "element-wise" operation. You can add (.+), substract (.-), multiply (.*), divide (./) full matrix element wise this way. This is how I got rid of all the loops in your code. This also work for the power operator: A.^2 will return a matrix the same size as A with every element squared.
Here's the working program.
First of all - source code:
clear all; close all; clc;
rng(13); % setting seed for random numbers
meshSize = 128; % field size
windDir = [0.1,1];
patchSize = 200;
A = 1e-7;
g = 9.81; % gravitational constant
windSpeed = 100;
timeStep = 1/25;
x1 = linspace(-10, 10, meshSize+1); x = x1(1:meshSize);
y1 = linspace(-10, 10, meshSize+1); y = y1(1:meshSize);
[X,Y] = meshgrid(x,y); % wave field
i = 1:meshSize; j = 1:meshSize; % indecies
[I,J] = meshgrid(i,j); % field of indecies
Kx = 2.0 * pi / patchSize * (-meshSize / 2.0 + I); % = 2*pi*n / Lx, -N/2 <= n < N/2
Ky = 2.0 * pi / patchSize * (-meshSize / 2.0 + J); % = 2*pi*m / Ly, -M/2 <= m < M/2
K = sqrt(Kx.^2 + Ky.^2); % ||K||
W = sqrt(K .* g); % deep water frequencies (empirical parameter)
P = zeros(size(X)); % Cant compute P without loops
for i = 1:meshSize
for j = 1:meshSize
P(i,j) = phillips(Kx(i,j), Ky(i,j), windDir, windSpeed, A, g); % phillips spectrum
end
end
H0 = 1/sqrt(2) .* (randn(size(X)) + 1i .* randn(size(X))) .* sqrt(P); % height field at time t = 0
rotate3d on;
for t = 1:10000 % 10000 * timeStep (sec)
Ht = H0 .* exp(1i .* W .* (t * timeStep)) + ...
conj(flip(flip(H0,1),2)) .* exp(-1i .* W .* (t * timeStep));
[az,el] = view;
surf(X,Y,real(signCor(ifft2(Ht),meshSize)));
axis([-10 10 -10 10 -1e-4 1e-4]); view(az,el);
blue = linspace(0.4, 1.0, 25)'; map = [blue*0, blue*0, blue];
%shading interp; % improve shading (remove "faceted" effect)
colormap(map);
pause(1/60);
end
phillips.m: (I've tried to vectorize the computation of Phillips spectrum but I faced with a difficulty which I'll show further)
function P = phillips(kx, ky, windDir, windSpeed, A, g)
k_sq = kx^2 + ky^2;
if k_sq == 0
P = 0;
else
L = windSpeed^2 / g;
k = [kx, ky] / sqrt(k_sq);
wk = k(1) * windDir(1) + k(2) * windDir(2);
P = A / k_sq^2 * exp(-1.0 / (k_sq * L^2)) * wk^2;
if wk < 0
P = 0;
end
end
end
signCor.m: (This function is an absolutely mystery for me... I've copied it from "CUDA FFT Ocean Simulation" realization. Simulation works much worse without it. And again I don't know how to vectorize this function.)
function H = signCor(H1, meshSize)
H = H1;
for i = 1:meshSize
for j = 1:meshSize
if mod(i+j,2) == 0
sign = -1; % works fine if we change signs vice versa
else
sign = 1;
end
H(i,j) = H1(i,j) * sign;
end
end
end
The biggest mistake that I've done is that I used ifft instead of using ifft2, that's why CUDA ifft and Matlab ifft didn't match.
My second mistake was in this lines of code:
kx = 2.0 * pi / patchSize * (-meshSize / 2.0 + x(i)); % = 2*pi*n / Lx
ky = 2.0 * pi / patchSize * (-meshSize / 2.0 + y(j)); % = 2*pi*m / Ly
I should've write:
kx = 2.0 * pi / patchSize * (-meshSize / 2.0 + i); % = 2*pi*n / Lx
ky = 2.0 * pi / patchSize * (-meshSize / 2.0 + j); % = 2*pi*m / Ly
I've played a bit with parameters A, meshSize, patchSize and I came to the conclusion that:
Somehow plausible parameter of wave amplitude is A * (patchSize / meshSize), where A is nothing but a scaling factor.
For 'calm' patchSize / meshSize <= 0.5.
For 'tsunami' patchSize / meshSize >= 3.0.
Difficulty with a vectorization of Phillips spectrum:
I have 2 functions:
% non-vectorized spectrum
function P = phillips1(kx, ky, windDir, windSpeed, A, g)
k_sq = kx^2 + ky^2;
if k_sq == 0
P = 0;
else
L = windSpeed^2 / g;
k = [kx, ky] / sqrt(k_sq);
wk = k(1) * windDir(1) + k(2) * windDir(2);
P = A / k_sq^2 * exp(-1.0 / (k_sq * L^2)) * wk^2;
if wk < 0
P = 0;
end
end
end
% vectorized spectrum
function P = phillips2(Kx, Ky, windDir, windSpeed, A, g)
K_sq = Kx .^ 2 + Ky .^ 2;
L = -g^2 / windSpeed^4;
WK = (Kx ./ K_sq) .* windDir(1) + (Ky ./ K_sq) .* windDir(2);
P = (A ./ (K_sq .^ 2)) .* ( exp(L ./ K_sq) .* (WK .^ 2) );
P(K_sq == 0) = 0;
P(WK < 0) = 0;
P(isinf(P)) = 0;
end
After I compute P1 using phillips1 and P2 using phillips2 I plot their difference:
subplot(2,1,1); surf(X,Y,real(P2-P1)); title('Difference in real part');
subplot(2,1,2); surf(X,Y,imag(P2-P1)); title('Difference in imaginary part');
It perfectly illustrates that there's a huge difference between this 2 spectrums in real part.

Neural Network not fitting XOR

I created an Octave script for training a neural network with 1 hidden layer using backpropagation but it can not seem to fit an XOR function.
x Input 4x2 matrix [0 0; 0 1; 1 0; 1 1]
y Output 4x1 matrix [0; 1; 1; 0]
theta Hidden / output layer weights
z Weighted sums
a Activation function applied to weighted sums
m Sample count (4 here)
My weights are initialized as follows
epsilon_init = 0.12;
theta1 = rand(hiddenCount, inputCount + 1) * 2 * epsilon_init * epsilon_init;
theta2 = rand(outputCount, hiddenCount + 1) * 2 * epsilon_init * epsilon_init;
Feed forward
a1 = x;
a1_with_bias = [ones(m, 1) a1];
z2 = a1_with_bias * theta1';
a2 = sigmoid(z2);
a2_with_bias = [ones(size(a2, 1), 1) a2];
z3 = a2_with_bias * theta2';
a3 = sigmoid(z3);
Then I compute the logistic cost function
j = -sum((y .* log(a3) + (1 - y) .* log(1 - a3))(:)) / m;
Back propagation
delta2 = (a3 - y);
gradient2 = delta2' * a2_with_bias / m;
delta1 = (delta2 * theta2(:, 2:end)) .* sigmoidGradient(z2);
gradient1 = delta1' * a1_with_bias / m;
The gradients were verified to be correct using gradient checking.
I then use these gradients to find the optimal values for theta using gradient descent, though using Octave's fminunc function yields the same results. The cost function converges to ln(2) (or 0.5 for a squared errors cost function) because the network outputs 0.5 for all four inputs no matter how many hidden units I use.
Does anyone know where my mistake is?
Start with a larger range when initialising weights, including negative values. It is difficult for your code to "cross-over" between positive and negative weights, and you probably meant to put * 2 * epsilon_init - epsilon_init; when instead you put * 2 * epsilon_init * epsilon_init;. Fixing that may well fix your code.
As a rule of thumb, I would do something like this:
theta1 = ( 0.5 * sqrt ( 6 / ( inputCount + hiddenCount) ) *
randn( hiddenCount, inputCount + 1 ) );
theta2 = ( 0.5 * sqrt ( 6 / ( hiddenCount + outputCount ) ) *
randn( outputCount, hiddenCount + 1 ) );
The multiplier is just some advice I picked up on a course, I think that it is backed by a research paper that compared a few different approaches.
In addition, you may need a lot of iterations to learn XOR if you run basic gradient descent. I suggest running for at least 10000 before declaring that learning isn't working. The fminunc function should do better than that.
I ran your code with 2 hidden neurons, basic gradient descent and the above initialisations, and it learned XOR correctly. I also tried adding momentum terms, and the learning was faster and more reliable, so I suggest you take a look at that next.
You need at least 3 neurons in the hidden layer and correct the initialization as the first answer suggest.
If the sigmoidGradient(z2) means a2.*(1-a2) all the rest of the code seems ok to me.
Best reggards,

Neural Networks: Sigmoid Activation Function for continuous output variable

Okay, so I am in the middle of Andrew Ng's machine learning course on coursera and would like to adapt the neural network which was completed as part of assignment 4.
In particular, the neural network which I had completed correctly as part of the assignment was as follows:
Sigmoid activation function: g(z) = 1/(1+e^(-z))
10 output units, each which could take 0 or 1
1 hidden layer
Back-propagation method used to minimize cost function
Cost function:
where L=number of layers, s_l = number of units in layer l, m = number of training examples, K = number of output units
Now I want to adjust the exercise so that there is one continuous output unit that takes any value between [0,1] and I am trying to work out what needs to change, so far I have
Replaced the data with my own, i.e.,such that the output is continuous variable between 0 and 1
Updated references to the number of output units
Updated the cost function in the back-propagation algorithm to:
where a_3 is the value of the output unit determined from forward propagation.
I am certain that something else must change as the gradient checking method shows the gradient determined by back-propagation and that by the numerical approximation no longer match up. I did not change the sigmoid gradient; it is left at f(z)*(1-f(z)) where f(z) is the sigmoid function 1/(1+e^(-z))) nor did I update the numerical approximation of the derivative formula; simply (J(theta+e) - J(theta-e))/(2e).
Can anyone advise of what other steps would be required?
Coded in Matlab as follows:
% FORWARD PROPAGATION
% input layer
a1 = [ones(m,1),X];
% hidden layer
z2 = a1*Theta1';
a2 = sigmoid(z2);
a2 = [ones(m,1),a2];
% output layer
z3 = a2*Theta2';
a3 = sigmoid(z3);
% BACKWARD PROPAGATION
delta3 = a3 - y;
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);
Theta1_grad = (delta2'*a1)/m;
Theta2_grad = (delta3'*a2)/m;
% COST FUNCTION
J = 1/(2 * m) * sum( (a3-y).^2 );
% Implement regularization with the cost function and gradients.
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + Theta1(:,2:end)*lambda/m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + Theta2(:,2:end)*lambda/m;
J = J + lambda/(2*m)*( sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
I have since realised that this question is similar to that asked by #Mikhail Erofeev on StackOverflow, however in this case I wish the continuous variable to be between 0 and 1 and therefore use a sigmoid function.
First, your cost function should be:
J = 1/m * sum( (a3-y).^2 );
I think your Theta2_grad = (delta3'*a2)/m;is expected to match the numerical approximation after changed to delta3 = 1/2 * (a3 - y);).
Check this slide for more details.
EDIT:
In case there is some minor discrepancy between our codes, I pasted my code below for your reference. The code has already been compared with numerical approximation function checkNNGradients(lambda);, the Relative Difference is less than 1e-4 (not meets the 1e-11 requirement by Dr.Andrew Ng though)
function [J grad] = nnCostFunctionRegression(nn_params, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, ...
X, y, lambda)
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
m = size(X, 1);
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
X = [ones(m, 1) X];
z1 = sigmoid(X * Theta1');
zs = z1;
z1 = [ones(m, 1) z1];
z2 = z1 * Theta2';
ht = sigmoid(z2);
y_recode = zeros(length(y),num_labels);
for i=1:length(y)
y_recode(i,y(i))=1;
end
y = y_recode;
regularization=lambda/2/m*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));
J=1/(m)*sum(sum((ht - y).^2))+regularization;
delta_3 = 1/2*(ht - y);
delta_2 = delta_3 * Theta2(:,2:end) .* sigmoidGradient(X * Theta1');
delta_cap2 = delta_3' * z1;
delta_cap1 = delta_2' * X;
Theta1_grad = ((1/m) * delta_cap1)+ ((lambda/m) * (Theta1));
Theta2_grad = ((1/m) * delta_cap2)+ ((lambda/m) * (Theta2));
Theta1_grad(:,1) = Theta1_grad(:,1)-((lambda/m) * (Theta1(:,1)));
Theta2_grad(:,1) = Theta2_grad(:,1)-((lambda/m) * (Theta2(:,1)));
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
If you want to have continuous output try not to use sigmoid activation when computing target value.
a1 = [ones(m, 1) X];
a2 = sigmoid(X * Theta1');
a2 = [ones(m, 1) z1];
a3 = z1 * Theta2';
ht = a3;
Normalize input before using it in nnCostFunction. Everything else remains same.