I am trying to reproduce the results reported in "Reducing the Dimensionality of Data with Neural Networks" for autoencoding the Olivetti face dataset, using an adapted version of the MNIST digits MATLAB code, but I am having some difficulty. No matter how much I tweak the number of epochs, the learning rates, or the momentum, the stacked RBMs enter the fine-tuning stage with a large error and consequently fail to improve much during fine-tuning. I am also experiencing a similar problem on another real-valued dataset.
For the first layer I am using an RBM with a smaller learning rate (as described in the paper) and with
negdata = poshidstates*vishid' + repmat(visbiases,numcases,1);
I'm fairly confident I am following the instructions found in the supporting material but I cannot achieve the correct errors.
Is there something I am missing? See the code I'm using for real-valued visible unit RBMs below, and for the whole deep training. The rest of the code can be found here.
rbmvislinear.m:
epsilonw = 0.001; % Learning rate for weights
epsilonvb = 0.001; % Learning rate for biases of visible units
epsilonhb = 0.001; % Learning rate for biases of hidden units
weightcost = 0.0002;
initialmomentum = 0.5;
finalmomentum = 0.9;
[numcases numdims numbatches]=size(batchdata);
if restart ==1,
restart=0;
epoch=1;
% Initializing symmetric weights and biases.
vishid = 0.1*randn(numdims, numhid);
hidbiases = zeros(1,numhid);
visbiases = zeros(1,numdims);
poshidprobs = zeros(numcases,numhid);
neghidprobs = zeros(numcases,numhid);
posprods = zeros(numdims,numhid);
negprods = zeros(numdims,numhid);
vishidinc = zeros(numdims,numhid);
hidbiasinc = zeros(1,numhid);
visbiasinc = zeros(1,numdims);
sigmainc = zeros(1,numhid);
batchposhidprobs=zeros(numcases,numhid,numbatches);
end
for epoch = epoch:maxepoch,
fprintf(1,'epoch %d\r',epoch);
errsum=0;
for batch = 1:numbatches,
if (mod(batch,100)==0)
fprintf(1,' %d ',batch);
end
%%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
data = batchdata(:,:,batch);
poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1)));
batchposhidprobs(:,:,batch)=poshidprobs;
posprods = data' * poshidprobs;
poshidact = sum(poshidprobs);
posvisact = sum(data);
%%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
poshidstates = poshidprobs > rand(numcases,numhid);
%%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
negdata = poshidstates*vishid' + repmat(visbiases,numcases,1);% + randn(numcases,numdims) if not using mean
neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1)));
negprods = negdata'*neghidprobs;
neghidact = sum(neghidprobs);
negvisact = sum(negdata);
%%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
err= sum(sum( (data-negdata).^2 ));
errsum = err + errsum;
if epoch>5,
momentum=finalmomentum;
else
momentum=initialmomentum;
end;
%%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
vishidinc = momentum*vishidinc + ...
epsilonw*( (posprods-negprods)/numcases - weightcost*vishid);
visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact);
hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact);
vishid = vishid + vishidinc;
visbiases = visbiases + visbiasinc;
hidbiases = hidbiases + hidbiasinc;
%%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
fprintf(1, '\nepoch %4i error %f \n', epoch, errsum);
end
dofacedeepauto.m:
clear all
close all
maxepoch=200; %In the Science paper we use maxepoch=50, but it works just fine.
numhid=2000; numpen=1000; numpen2=500; numopen=30;
fprintf(1,'Pretraining a deep autoencoder. \n');
fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch);
load fdata
%makeFaceData;
[numcases numdims numbatches]=size(batchdata);
fprintf(1,'Pretraining Layer 1 with RBM: %d-%d \n',numdims,numhid);
restart=1;
rbmvislinear;
hidrecbiases=hidbiases;
save mnistvh vishid hidrecbiases visbiases;
maxepoch=50;
fprintf(1,'\nPretraining Layer 2 with RBM: %d-%d \n',numhid,numpen);
batchdata=batchposhidprobs;
numhid=numpen;
restart=1;
rbm;
hidpen=vishid; penrecbiases=hidbiases; hidgenbiases=visbiases;
save mnisthp hidpen penrecbiases hidgenbiases;
fprintf(1,'\nPretraining Layer 3 with RBM: %d-%d \n',numpen,numpen2);
batchdata=batchposhidprobs;
numhid=numpen2;
restart=1;
rbm;
hidpen2=vishid; penrecbiases2=hidbiases; hidgenbiases2=visbiases;
save mnisthp2 hidpen2 penrecbiases2 hidgenbiases2;
fprintf(1,'\nPretraining Layer 4 with RBM: %d-%d \n',numpen2,numopen);
batchdata=batchposhidprobs;
numhid=numopen;
restart=1;
rbmhidlinear;
hidtop=vishid; toprecbiases=hidbiases; topgenbiases=visbiases;
save mnistpo hidtop toprecbiases topgenbiases;
backpropface;
Thanks for your time
Silly me, I had forgotten to change the back-propagation fine-tuning script (backprop.m). The output layer (where the faces get reconstructed) has to be changed to use real-valued (linear) units, i.e.
dataout = w7probs*w8;
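To make the change concrete, here is a minimal sketch of a linear output layer with a squared-error loss and the matching gradient for the last weight matrix. The sizes and the random data are made up for illustration, and the squared-error loss is my assumption about what pairs with linear units; only the names w7probs, w8 and dataout come from the line above.

```matlab
% Hypothetical sizes, chosen only for illustration
N = 4; numhid = 5; numdims = 3;
w7probs = [rand(N, numhid) ones(N, 1)];   % last hidden activations plus a bias column
w8 = 0.1 * randn(numhid + 1, numdims);    % output weights, last row holds the biases
data = randn(N, numdims);                 % real-valued targets (stand-in for the faces)

dataout = w7probs * w8;                   % linear output: no logistic squashing
f = 0.5 / N * sum(sum((dataout - data).^2));   % assumed loss: mean squared error per case
Ix8 = (dataout - data) / N;               % derivative of f with respect to dataout
dw8 = w7probs' * Ix8;                     % derivative of f with respect to w8
fprintf('loss %f, gradient size %dx%d\n', f, size(dw8, 1), size(dw8, 2));
```

If the script still computes a cross-entropy error for the output, that part would presumably need the same treatment, since cross-entropy assumes outputs in (0,1).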
Related
Currently, I'm working on a simple two-layer NN (25 inputs with a sigmoid layer, 199 outputs with softmax) written from scratch for debugging purposes. Specifically, I want to track some values.
My inputs are batches, or generally speaking matrices of dimension (rows x 25), in order to fit the input layer structure. Regarding my weight matrices: all rows but the last hold the weights w_ij. The last row holds the biases.
The forward method seems to work correctly but I think I have a wrong backpropagation.
My backpropagation code snippet:
%Error gradient for the softmax output
error = single(output) - single(targets);
%Error for the input layer - W21 includes w_ij
error_out_to_input = error*(W21.');
gradient_outputLayer = single(zeros(26,199));
gradient_outputLayer = single(first_layerout_zerofilled.')*single(error);
biasGrad = single(sum(error,1));
gradient_outputLayer(26:26,:) = single(biasGrad);
%InputLayer
%derivative of sigmoid o(1-o)
%1
grad = single(1);
%1-o
grad = single(grad) - single(first_layerout_zerofilled);
%o(1-o)
grad = single(first_layerout_zerofilled) .* single((grad));
%final error
grad = single(grad) .* single(error_out_to_input);
gradient_inputLayer = single(zeros(26,25));
gradient_inputLayer = single(inputs.')*single(grad);
biasGrad = single(sum(grad,1));
gradient_inputLayer(26:26,:) = single(biasGrad);
%Update
W1 = W1-gradient_inputLayer * learning_rate;
W2 = W2-gradient_outputLayer * learning_rate;
This is not a question of efficiency. I just want to be sure that my backpropagation calculates the correct gradients. I hope someone can review it.
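One way to check gradients like these without relying on a reviewer is a finite-difference gradient check. The sketch below is not the code from the question: it uses a tiny hypothetical network (3 inputs, 4 sigmoid units, 5 softmax outputs), a summed cross-entropy loss, and biases stored in the last row of each weight matrix, and compares the analytic gradients with numerical ones. All names and sizes are assumptions.

```matlab
% Tiny hypothetical network: 3 inputs -> 4 sigmoid units -> 5 softmax outputs
nIn = 3; nHid = 4; nOut = 5; nRows = 6;
X = randn(nRows, nIn);
labels = randi(nOut, nRows, 1);
T = zeros(nRows, nOut);
T(sub2ind(size(T), (1:nRows)', labels)) = 1;      % one-hot targets
W1 = 0.1 * randn(nIn + 1, nHid);                  % last row = biases
W2 = 0.1 * randn(nHid + 1, nOut);                 % last row = biases

sigm = @(Z) 1 ./ (1 + exp(-Z));
addBias = @(A) [A ones(size(A, 1), 1)];
% Row-wise softmax (no overflow protection; fine for the small values used here)
softmaxRows = @(Z) bsxfun(@rdivide, exp(Z), sum(exp(Z), 2));
loss = @(A, B) -sum(sum(T .* log(softmaxRows(addBias(sigm(addBias(X) * A)) * B))));

% Analytic gradients by backpropagation
H = sigm(addBias(X) * W1);
Hb = addBias(H);
Y = softmaxRows(Hb * W2);
delta2 = Y - T;                                   % softmax + cross-entropy error
gW2 = Hb' * delta2;                               % bias row = column sums of delta2
delta1 = (delta2 * W2(1:nHid, :)') .* H .* (1 - H);   % bias row of W2 is excluded
gW1 = addBias(X)' * delta1;

% Numerical gradients by central differences
h = 1e-5;
numW1 = zeros(size(W1));
for k = 1:numel(W1)
    Wp = W1; Wp(k) = Wp(k) + h;
    Wm = W1; Wm(k) = Wm(k) - h;
    numW1(k) = (loss(Wp, W2) - loss(Wm, W2)) / (2 * h);
end
numW2 = zeros(size(W2));
for k = 1:numel(W2)
    Wp = W2; Wp(k) = Wp(k) + h;
    Wm = W2; Wm(k) = Wm(k) - h;
    numW2(k) = (loss(W1, Wp) - loss(W1, Wm)) / (2 * h);
end
fprintf('max abs diff W1: %g, W2: %g\n', ...
    max(abs(gW1(:) - numW1(:))), max(abs(gW2(:) - numW2(:))));
```

If the two differences are tiny, the analytic gradients agree with the loss. Running the same comparison in double precision rather than single is advisable, because the finite differences are sensitive to rounding.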
I am trying to train a convolutional neural network, using sparse autoencoders to compute the filters for the convolution layer. I am using the UFLDL code to construct the patches and to train the CNN. My code is the following:
%%======================================================================
imageDim = 30; % image dimension
imageChannels = 3; % number of channels (rgb, so 3)
patchDim = 10; % patch dimension
numPatches = 100000; % number of patches
visibleSize = patchDim * patchDim * imageChannels; % number of input units
outputSize = visibleSize; % number of output units
hiddenSize = 400; % number of hidden units
epsilon = 0.1; % epsilon for ZCA whitening
poolDim = 10; % dimension of pooling region
optTheta = zeros(2*hiddenSize*visibleSize+hiddenSize+visibleSize, 1);
ZCAWhite = zeros(visibleSize, visibleSize);
meanPatch = zeros(visibleSize, 1);
load patches_16_1
%%======================================================================
% Display and check to see that the features look good
W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
displayColorNetwork( (W*ZCAWhite));
stepSize = 100;
assert(mod(hiddenSize, stepSize) == 0, 'stepSize should divide hiddenSize');
load train.mat % loads numTrainImages, trainImages, trainLabels
load train.mat % loads numTestImages, testImages, testLabels
% size 30x30x3x8862
numTestImages = 8862;
numTrainImages = 8862;
pooledFeaturesTrain = zeros(hiddenSize, numTrainImages, floor((imageDim - patchDim + 1) / poolDim), floor((imageDim - patchDim + 1) / poolDim) );
pooledFeaturesTest = zeros(hiddenSize, numTestImages, ...
floor((imageDim - patchDim + 1) / poolDim), ...
floor((imageDim - patchDim + 1) / poolDim) );
tic();
testImages = trainImages;
for convPart = 1:(hiddenSize / stepSize)
featureStart = (convPart - 1) * stepSize + 1;
featureEnd = convPart * stepSize;
fprintf('Step %d: features %d to %d\n', convPart, featureStart, featureEnd);
Wt = W(featureStart:featureEnd, :);
bt = b(featureStart:featureEnd);
fprintf('Convolving and pooling train images\n');
convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
trainImages, Wt, bt, ZCAWhite, meanPatch);
pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
toc();
clear convolvedFeaturesThis pooledFeaturesThis;
fprintf('Convolving and pooling test images\n');
convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
testImages, Wt, bt, ZCAWhite, meanPatch);
pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
pooledFeaturesTest(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
toc();
clear convolvedFeaturesThis pooledFeaturesThis;
end
I have problems computing the convolution and pooling layers. The line pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis; fails with a "subscripted assignment dimension mismatch" error. The patches were computed normally.
I am trying to understand what exactly the convPart variable does and what pooledFeaturesThis holds. I also notice that the mismatch occurs in this line: pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
where I get the message that the dimensions do not match. The size of pooledFeaturesThis is 100x3x2x2, whereas the size of pooledFeaturesTrain is 400x8862x2x2. What exactly does pooledFeaturesTrain represent? Is the 2x2 the result for every filter? cnnConvolve can be found here:
EDIT: I have changed my code a little and it works now. However, I am still a little concerned about my understanding of the code.
Ok so in this line you are setting the pooling region.
poolDim = 10; % dimension of pooling region
This means that for each kernel in each layer you take the image and pool over an area of 10x10 pixels. From your code it looks like you are applying a mean function, which means that it takes a patch, computes the mean, and outputs that to the next layer; in other words, it takes the image from, say, 100x100 to 10x10. In your network you are repeating convolution+pooling until you get down to a 2x2 image, based on this output (by the way, this is not generally good practice in my experience).
400x8862x2x2
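To illustrate the 100x100 to 10x10 step, here is a minimal sketch of non-overlapping mean pooling on a made-up image. It is not the cnnPool function from the question; the image contents and variable names are assumptions chosen so the result is easy to check by hand.

```matlab
% Made-up 100x100 "image" whose entries are simply 1..10000 in column order
img = reshape(1:100*100, 100, 100);
poolDim = 10;                              % side of each pooling region
outDim = floor(size(img, 1) / poolDim);    % 10 regions per side
pooled = zeros(outDim, outDim);
for r = 1:outDim
    for c = 1:outDim
        rows = (r - 1) * poolDim + (1:poolDim);
        cols = (c - 1) * poolDim + (1:poolDim);
        block = img(rows, cols);           % one 10x10 region
        pooled(r, c) = mean(block(:));     % mean pooling
    end
end
disp(size(pooled));
```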
Anyways back to your code. Notice that at the beginning of your training you do the following initialization:
pooledFeaturesTrain = zeros(hiddenSize, numTrainImages, floor((imageDim - patchDim + 1) / poolDim), floor((imageDim - patchDim + 1) / poolDim) );
So your error is quite simple and correct - the size of the matrix which holds the output of the convolution+pooling is not the size of the matrix you initialized.
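A quick way to see the mismatch is to compute the size the preallocated slice expects and compare it with what actually comes back. The sketch below uses the numbers from the question and a zero-filled stand-in for pooledFeaturesThis with the reported size; expectedSize and actualSize are names I made up.

```matlab
imageDim = 30; patchDim = 10; poolDim = 10;
stepSize = 100; numTrainImages = 8862;

convDim = imageDim - patchDim + 1;         % side of each convolved feature map: 21
poolOut = floor(convDim / poolDim);        % pooled side: 2
expectedSize = [stepSize, numTrainImages, poolOut, poolOut];

% Stand-in with the size reported in the question (not real features)
pooledFeaturesThis = zeros(100, 3, 2, 2);
actualSize = size(pooledFeaturesThis);

if ~isequal(actualSize, expectedSize)
    fprintf('expected %s but got %s\n', mat2str(expectedSize), mat2str(actualSize));
end
```

Here only the second dimension disagrees (3 instead of 8862), which hints that the function saw 3 images rather than 8862. One possible cause is that the image array is laid out differently from what cnnConvolve expects, but that is a guess that would need checking against size(trainImages).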
The question now is how to fix it. I suppose a lazy man's way to fix it is to take out the initialization. That will drastically slow down your code, and it is not guaranteed to work if you have more than one layer.
I suggest you instead make pooledFeaturesTrain a cell array of 3-dimensional arrays. So instead of this
pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
you'd do something more along the lines of this:
pooledFeaturesTrain{n}(:, :, :) = pooledFeaturesThis;
where n is the current layer.
CNNs aren't as easy as they're cracked up to be, and even when they don't crash, getting them to train well is a feat. I highly suggest reading up on the theory of CNNs; it will make coding and debugging much easier.
Good luck with it ! :)
I tried to implement the Nguyen-Widrow weight initialization in MATLAB 2014a to compare its performance against the HARD RANDOM weight initialization technique.
a = -1;
b = 1;
% WIDROW weights for Layer Input to Hidden Layer 1
sum_sq_wts = 0;
for k=1:30
iw(:,:) = zeros(num_input, nodes_hidden_layer);
for i=1:num_input
for j=1:nodes_hidden_layer
iw(i,j)=(b-a)*rand(1,1) + a;
sum_sq_wts = sum_sq_wts + (iw(i,j)*iw(i,j));
end
end
norm = sqrt(sum_sq_wts);
beta = 0.7*nodes_hidden_layer.^(1/num_input);
for i=1:num_input
for j=1:nodes_hidden_layer
iw(i,j) = beta*iw(i,j)/norm;
end
end
IW{k}=iw';
end
% WIDROW weights for Hidden Layer 1 to output Layer
sum_sq_wts = 0;
for k=1:30
lw(:,:) = zeros(nodes_hidden_layer,1);
for i=1:nodes_hidden_layer
for j=1:1
iw(i,j)=(b-a)*rand(1,1) + a;
sum_sq_wts = sum_sq_wts + iw(i,j)*iw(i,j);
end
end
norm = sqrt(sum_sq_wts);
beta = 0.7*nodes_hidden_layer.^(1/num_input);
for i=1:nodes_hidden_layer
for j=1:1
lw(i,j) = beta*lw(i,j)/norm;
end
end
LW{k}=lw';
end
WidNgu{1,1} = IW;
WidNgu{1,2} = LW;
I am generating 30 different sets of Widrow weights in the above code. The problem is that a neural network trained from these weights performs worse than one trained from a random set of weights. The problem I used for training was a simple function approximation problem.
One more interesting thing I observe is that the first weight set generated by the above code at times performs better than the random weight approach, but the remaining 29 sets I created always perform poorly.
Where have I gone wrong in this?
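Two things stand out in the code as posted: sum_sq_wts is set to zero only once before each for k loop, so the norm keeps growing from one set to the next, and the second block fills iw(i,j) while it scales and stores lw. For comparison, here is a minimal sketch of the Nguyen-Widrow scheme as I understand it, where each hidden neuron's weight vector is rescaled to length beta and the norm is recomputed for every set. The sizes are made up, and the bias initialization is left out.

```matlab
% Hypothetical sizes for illustration
num_input = 2; nodes_hidden_layer = 10; numSets = 30;
a = -1; b = 1;
beta = 0.7 * nodes_hidden_layer^(1 / num_input);

IW = cell(1, numSets);
for k = 1:numSets
    % One row per hidden neuron, uniform in [a, b]
    iw = (b - a) * rand(nodes_hidden_layer, num_input) + a;
    % Norm of each neuron's weight vector, recomputed for every set
    rowNorm = sqrt(sum(iw.^2, 2));
    % Rescale each row to length beta
    iw = beta * bsxfun(@rdivide, iw, rowNorm);
    IW{k} = iw;
end
```

The sketch also avoids naming a variable norm, since that shadows the built-in function of the same name.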
I am attempting to fit some UV/Vis absorbance spectra to published reference standards. In general, the absorbance one obtains from the spectrometer is equal to a linear combination of the concentration of each absorber multiplied by the cross-section of absorption for each molecule at each wavelength (and multiplied by the pathlength of the spectrometer).
That said, not all spectrometers are precise in the x axis (wavelength), so some adjustment may be necessary to fit one's experimental data to the reference standards.
In this script, I am adjusting the index of my wavelength and spectral intensity to see if integer steps of my spectra result in a better fit to the reference standards (each step is 0.08 nm). Of course, I need to save the output of the fit parameters; however, since each fit has a different set of dimensions, I'm having difficulty just throwing them into a structure(k) (commented out in the following code snippet).
If anyone has a tip or hint, I'd be very appreciative. The relevant portion of my sample code follows:
for i = -15:15
lengthy = length(wavelengthy)
if i >= 0
xvalue = (1:lengthy - abs(i))
yvalue = (1+abs(i):lengthy)
else
xvalue = (1+abs(i):lengthy)
yvalue = (1: lengthy - abs(i))
end
Phi = @(k,wavelengthy) ( O3standard(yvalue) .* k(1) + Cl2standard(yvalue) .* k(2) + ClOstandard(yvalue) .* k(3) + OClOstandard(yvalue) .* k(4));
[khat, resnorm, residual, exitflag, output, lambda, jacobian] = lsqcurvefit(Phi,k0,wavelengthy(xvalue),workingspectra(yvalue), lowerbound, upperbound, options);
%parameters.khat(k,:) = khat;
%parameters.jacobian(k,:) = jacobian;
%parameters.exitflag(k,:) = exitflag;
%parameters.output(k,:) = output;
%parameters.residuals(k,:) = residual
%concentrations(:,k) = khat./pathlength
k=k+1
end
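One possible way to keep outputs whose sizes change from fit to fit is a struct array with one element per shift, since each element's fields can hold arrays of any size. The sketch below replaces the lsqcurvefit call with random stand-in values, so khat, residual and jacobian here are placeholders, and results, shifts, idx and nPoints are names I made up.

```matlab
shifts = -15:15;
nPoints = 100;                                      % hypothetical spectrum length
results = struct('shift', cell(1, numel(shifts)));  % preallocated 1x31 struct array

for idx = 1:numel(shifts)
    s = shifts(idx);
    n = nPoints - abs(s);              % number of points left after shifting
    % Stand-ins for the lsqcurvefit outputs; their sizes vary with the shift
    khat = rand(1, 4);
    residual = rand(n, 1);
    jacobian = rand(n, 4);

    results(idx).shift = s;
    results(idx).khat = khat;
    results(idx).residual = residual;  % different length for each idx is fine
    results(idx).jacobian = jacobian;
end

% Fields with a fixed size can still be collected into a matrix afterwards
allKhat = vertcat(results.khat);       % 31x4
```

Using a separate counter such as idx also avoids reusing k both as the loop bookkeeping variable and as the parameter name inside Phi.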
Can someone tell me how to implement the Harmonic Product Spectrum in MATLAB to find the fundamental frequency of a note in the presence of harmonics? I know I'm supposed to downsample my signal a number of times (after taking the FFT, of course) and then multiply the downsampled versions with the original signal.
Say my fft signal is "FFT1"
then the code would roughly be like
hps1 = downsample(FFT1,2);
hps2 = downsample(FFT1,3);
hps = FFT1.*hps1.*hps2;
Is this code correct? I want to know whether I've downsampled properly. Also, since each variable has a different length, multiplying them results in a matrix dimension error. I really need some quick help as it's for a project. Really desperate.
Thanx in advance....
OK, you can't do "hps = FFT1.*hps1.*hps2;" because each downsampled vector has a different size.
Here is an example of a very simple Harmonic Product Spectrum (HPS) using decimation (downsampling) for 5 harmonics. I only tested it on sinusoidal signals, and in my tests it gets very close to the fundamental frequency.
This code only shows the main steps of the algorithm; you will very likely need to improve it!
Source:
%[x,fs] = wavread('ederwander_IN_250Hz.wav');
CorrectFactor = 0.986;
threshold = 0.2;
%F0 start test
f = 250;
fs = 44100;
signal= 0.9*sin(2*pi*f/fs*(0:9999));
x=signal';
framed = x(1:4096);
windowed = framed .* hann(length(framed));
FFT = fft(windowed, 4096);
FFT = FFT(1 : size(FFT,1) / 2);
FFT = abs(FFT);
hps1 = downsample(FFT,1);
hps2 = downsample(FFT,2);
hps3 = downsample(FFT,3);
hps4 = downsample(FFT,4);
hps5 = downsample(FFT,5);
y = [];
for i=1:length(hps5)
Product = hps1(i) * hps2(i) * hps3(i) * hps4(i) * hps5(i);
y(i) = [Product];
end
[m,n]=findpeaks(y, 'SORTSTR', 'descend');
Maximum = n(1);
%try fix octave error
if (y(n(1)) * 0.5) > (y(n(2))) %& ( ( m(2) / m(1) ) > threshold )
Maximum = n(length(n));
end
F0 = ( (Maximum / 4096) * fs ) * CorrectFactor
plot(y)
HPS often makes an octave error, reporting the pitch one octave up. I changed the code a bit to address that, see above :-)
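The length mismatch from the question can also be handled without an element-by-element loop by truncating every downsampled spectrum to a common length before multiplying. This is only a sketch under assumptions: the test signal (250 Hz plus two harmonics), the use of 3 harmonics, and the hand-written Hann window (to avoid needing a toolbox) are my choices, and plain indexing spec(1:h:end) stands in for downsample(spec, h).

```matlab
fs = 44100; f0 = 250; N = 4096;
t = (0:N-1)' / fs;
% Made-up test tone: fundamental plus second and third harmonics
x = sin(2*pi*f0*t) + 0.5*sin(2*pi*2*f0*t) + 0.25*sin(2*pi*3*f0*t);

w = 0.5 - 0.5*cos(2*pi*(0:N-1)' / (N-1));   % Hann window written out by hand
spec = abs(fft(x .* w));
spec = spec(1:N/2);                          % keep the positive frequencies

numHarm = 3;
len = floor(numel(spec) / numHarm);          % common length for all factors
hps = ones(len, 1);
for h = 1:numHarm
    d = spec(1:h:end);                       % keep every h-th bin
    hps = hps .* d(1:len);                   % truncate, then multiply
end

[~, idx] = max(hps);
F0 = (idx - 1) * fs / N;                     % bin index is 1-based, so subtract 1
fprintf('estimated F0: %.1f Hz\n', F0);
```

The estimate is limited by the bin spacing fs/N (about 10.8 Hz here), so a longer frame or interpolation around the peak would be needed for a finer result.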