Unable to figure out nInputPlane in SpatialConvolution in torch? - neural-network

The documentation for SpatialConvolution defines it as
module = nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH], [padW], [padH])
nInputPlane: The number of expected input planes in the image given into forward().
nOutputPlane: The number of output planes the convolution layer will produce.
I don't have any experience with Torch, but I think I have used a similar function in Keras:
Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 256, 256))
which takes as input the shape of the image, i.e. 256x256 in RGB.
I have seen SpatialConvolution used in Torch as below, but I am unable to figure out what the nInputPlane and nOutputPlane parameters correspond to:
local convLayer = nn.SpatialConvolutionMM(384, 384, 1, 1, 1, 1, 0, 0)
In the code above, what do these 384, 384 represent?

In case I'm not speaking common language you can refer to this.
nInputPlane is the number of planes (channels) coming into the convolution, and nOutputPlane is the number of planes coming out of it. If you have an RGB image, nInputPlane = 3 (assuming your tensor is set up correctly). nOutputPlane can be any number of planes that you want the spatial convolution to produce, but of course make sure the next layer's input plane count equals this layer's nOutputPlane. So in nn.SpatialConvolutionMM(384, 384, 1, 1, 1, 1, 0, 0), the first 384 is the number of input planes and the second 384 is the number of output planes, with a 1x1 kernel.
If that isn't clear I'd recommend the 60-minute blitz.

Related

Feature extraction from AlexNet fc7 layer in MATLAB

I have this AlexNet model in MATLAB:
net = alexnet;
layers = net.Layers;
layers(end-2) = fullyConnectedLayer(numClasses);
layers(end) = classificationLayer;
I'm using it to learn features from sequences of frames from videos of different classes. So I need to extract the learned features from the 'fc7' layer of this model, save these features as a vector, and pass it to an LSTM layer.
The training process of this model for transfer learning works fine.
I divided my data set into x_train and x_test sets using splitEachLabel() on my imageDatastore(), and used augmentedImageSource() to resize all the images for the network. Everything works so far.
But when I try to use the snippet of code shown below to resize images from my imageDatastore so they can be read by the activations() function, in order to save the features as a vector, I get an error:
imageSize = [227 227 3];
auimds = augmentedImageSource(imageSize, imds, 'ColorPreprocessing', 'gray2rgb');
Function activations:
layer = 'fc7';
fclayer = activations(mynet, auimds, layer,'OutputAs','columns');
The error:
Error using SeriesNetwork>iDataDispatcher (line 1113)
For an image input layer, the input data for predict must be a single image, a 4D array of images, or an imageDatastore with the correct size.
Error in SeriesNetwork/activations (line 791)
dispatcher = iDataDispatcher( X, miniBatchSize, precision, ...
Someone help me, please!
Thanks for the support!
Did you check the input size of that layer? The error you are getting is related to the input size the network expects. Can you check your mynet structure and its fc7 layer's input size in your MATLAB workspace?
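A minimal sketch of that check (assuming mynet is the trained network variable you pass to activations):
inputSize = mynet.Layers(1).InputSize        % size the image input layer expects, e.g. [227 227 3]
fc7Idx = find(strcmp({mynet.Layers.Name}, 'fc7'));
mynet.Layers(fc7Idx)                         % inspect the fc7 layer itself
Since the error message complains about "an imageDatastore with the correct size", one common cause is images coming out of your datastore that do not match this expected size or channel count.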

Feed Forward - Neural Networks Keras

For the input to the feed-forward neural network that I have implemented in Keras, I just wanted to check that my understanding is correct.
[[ 25.26000023 26.37000084 24.67000008 23.30999947]
[ 26.37000084 24.67000008 23.30999947 21.36000061]
[ 24.67000008 23.30999947 21.36000061 19.77000046]...]
So the data above is a time window of 4 inputs per row. My input layer is
model.add(Dense(4, input_dim=4, activation='sigmoid'))
model.fit(trainX, trainY, nb_epoch=10000,verbose=2,batch_size=4)
and batch_size is 4. In theory, when I call the fit function, will it go over all these inputs in each nb_epoch? And does the batch_size need to be 4 in order for this time window to work?
Thanks John
and batch_size is 4, in theory when I call the fit function will the function go over all these inputs in each nb_epoch?
Yes, each epoch is one iteration over all training samples.
and does the batch_size need to be 4 in order for this time window to work?
No, these are completely unrelated things. A batch is simply a subset of your training data which is used to compute an approximation of the true gradient of the cost function. The bigger the batch, the closer you get to the true gradient (and to original gradient descent), but the slower training becomes. The closer you get to a batch size of 1, the more stochastic and noisy the approximation becomes (and the closer you get to stochastic gradient descent). The fact that you matched batch_size to the data dimensionality is just a coincidence and has no meaning.
Let me put this in a more general setting: what you do in gradient descent with an additive loss function (which neural nets usually use) is move against the gradient, which is
grad_theta 1/N SUM_{i=1}^{N} loss(x_i, pred(x_i), y_i | theta)
    = 1/N SUM_{i=1}^{N} grad_theta loss(x_i, pred(x_i), y_i | theta)
where loss is some loss function over your pred (prediction) as compared to y_i.
And the rough idea of the batch-based scenario is that you do not need to go over all examples; instead you use some strict subset, like batch = {(x_1, y_1), (x_5, y_5), (x_89, y_89), ...}, and an approximation of the gradient of the form
1/|batch| SUM_{(x_i, y_i) in batch} grad_theta loss(x_i, pred(x_i), y_i | theta)
As you can see, this is not related in any way to the space where the x_i live, so there is no connection to the dimensionality of your data.
Let me explain this with an example:
When you have 32 training examples and you call model.fit with a batch_size of 4, the neural network will be presented with 4 examples at a time, but one epoch will still be defined as one complete pass over all 32 examples. So the network will go through 4 examples at a time, and will, in principle, run the forward pass (and the backward pass) 32 / 4 = 8 times per epoch.
In the extreme case where batch_size is 1, that is plain old stochastic gradient descent. When batch_size is greater than 1 but smaller than the full training set, it is usually called mini-batch gradient descent; using the whole training set at once is batch gradient descent.
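To make the point that a batch is just a subset used to approximate the gradient more concrete, here is a tiny, framework-agnostic MATLAB/Octave sketch of mini-batch gradient descent (the toy linear model, learning rate and sizes are made up purely for illustration):
N = 100; X = randn(N, 4); y = X * [1; -2; 0.5; 3] + 0.1*randn(N, 1);   % toy data
theta = zeros(4, 1); lr = 0.01; batch_size = 4;
for epoch = 1:50
    order = randperm(N);                          % shuffle once per epoch
    for s = 1:batch_size:N
        idx = order(s:s+batch_size-1);            % one mini-batch of 4 samples
        err = X(idx,:) * theta - y(idx);
        grad = X(idx,:)' * err / batch_size;      % gradient estimate from the batch only
        theta = theta - lr * grad;                % parameter update
    end
end
Nothing in the update depends on the fact that each x_i happens to have 4 features; batch_size only controls how many samples go into each gradient estimate.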

What exactly is returned from PCA in MatLab?

I = double(image1Cropped);
X = reshape(I,size(I,1)*size(I,2),3 );
coeff1 = pca(X);
What exactly is happening in the above 3 lines of code?
Why convert the image to double before passing it to reshape?
What is the purpose of reshape?
What is returned from pca(X)?
Could I use coeff1 to compare images (for example, comparing faces)?
From PCA, the principal components are returned, of course.
Check the documentation or any online course to understand what a PCA is.
As PCA is a mathematical tool, it needs floating-point data to work. That is why there is a double in the first line: it converts the data (most likely uint8) into floating-point values.
reshape turns your image into a huge matrix of size size(I,1)*size(I,2)-by-3, so every row X(ii,:) has length 3.
My guess here is that the image is an RGB image, and that this code tries to get the "principal colours" of the image. What the code does is transform your data into points of 3 values each, Red, Green and Blue (as opposed to the usual spatial coordinates), and then take the principal components of that point cloud. The principal components will be the 3 principal colours (combinations of R, G and B) present in the image.
If you search "PCA of an RGB image" on Google you will find lots of information of how/why to do this.
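To make that concrete, here is a minimal sketch of the same pipeline using pca's extra outputs (variable names follow the question; score and latent are standard outputs of pca):
I = double(image1Cropped);               % uint8 -> double so pca can work with it
X = reshape(I, [], 3);                   % one row per pixel: [R G B]
[coeff, score, latent] = pca(X);         % coeff is 3x3: each column is a principal colour direction
explained = 100 * latent / sum(latent)   % share of colour variance along each direction
Here score contains every pixel expressed in the new colour coordinate system, and coeff is the same thing as coeff1 in the question.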

Can't get FFT to work in Octave

I have been working on a (potentially super simple) assignment in which one of the steps is to compute a Fourier transform. I have followed a guide from my university to transform a wave sound, which can be found here (channels: 2, samples: 17600, sample frequency: 16 kHz).
Looking at the graph, it seems to work:
[y,fs,wmode,fidx]=readwav('piano.wav','r',-1,0);   % read the raw audio (voicebox)
left=y(:,1);                                       % take the left channel
amountOfSlices = 6;
samplesPerSlice = fix(length(left) / amountOfSlices);
frames=enframe(left, samplesPerSlice);             % split the signal into 6 frames
frames=transpose(frames);                          % one frame per column
fftdata=rfft(frames);                              % real FFT of each frame
fftdata=fftdata.*conj(fftdata);                    % power spectrum
plot(fftdata);
Next, I created a code file with the tutorial code as a basis with the addition of accepting parameters (which are needed for the assignment, but have been left out for brevity).
samplerate = 512;
% Read the file with raw unscaled audio data from begin to end
[multiData,fs,wmode,fidx]=readwav(filename,'r',-1,0);
disp(sprintf('Number of channels: %d', fidx(5)))
disp(sprintf('Number of samples: %d', fidx(4)))
disp(sprintf('Sample frequency: %d Hz', fs))
% Extract the left channel of the data
leftData = multiData(:, 1);
% Slice the left channel into pieces of the size of 'samplerate'
samplesPerSlice = samplerate
% Splits the leftData vector up into frames of length equal to sample rate
slicedLeftData = enframe(leftData, samplerate)';
% Apply the real data fast fourrier transformation on each data slice
fftdata=rfft(slicedLeftData);
fftdata=fftdata.*conj(fftdata);
plot(fftdata);
Do you have any idea what I'm doing wrong here?
My actual question is: why isn't the second plot in the frequency domain from 0 to 16,000 Hz? What am I doing wrong?
I guess by "what's wrong here" you mean the multiple colours? If so, it looks like the fftdata plotted in both images is a matrix (you want a vector, correct?). The .* performs the operation element-wise, so the result keeps the matrix shape. Check the dimensions of your ingoing arguments.
Furthermore, remember the Nyquist theorem: you can never resolve frequencies greater than half your sample frequency. With a sample frequency of 16 kHz, you may have data points beyond 8 kHz, but they will not contain any information, so there is no need to include them in the frequency-domain plot.
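As a rough sketch of how to get a proper frequency axis (this assumes the voicebox rfft used above, which keeps only the non-negative-frequency half of the spectrum, and a hypothetical vector frame holding one frame of your sliced data):
N = length(frame);                        % samples in this frame
spectrum = rfft(frame);                   % about N/2 + 1 frequency bins
power = spectrum .* conj(spectrum);       % power spectrum of this frame
f = (0:length(spectrum)-1) * fs / N;      % frequency axis from 0 up to fs/2 (the Nyquist limit)
plot(f, power);
With fs = 16000, the axis tops out at 8000 Hz, which is exactly the Nyquist limit described above.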

column to block using sliding window in matlab

Using im2col with the 'sliding' option in MATLAB, I have converted the input image blocks into columns, and then with col2im I do the inverse process, but the output is not the same as the input image. How can I recover the input image? Can anyone please help me?
Here is the code
in=imread('tire.tif');
[mm nn]=size(in);
m=8;n=8;
figure,imshow(in);
i1=im2col(in,[8 8],'sliding');
i2 = reshape( sum(i1),mm-m+1,nn-n+1);
out=col2im(i2,[m n],[mm nn],'sliding');
figure,imshow(out,[]);
thanks in advance...
You didn't specify exactly what the problem is, but I see a few potential sources:
You shouldn't expect the output to be exactly the same as the input, since you're replacing each pixel value with the sum of pixels in an 8-by-8 neighborhood. Also, you will get a shrinkage of the resulting image by 7 pixels in each direction (i.e. [m-1 n-1]) since the 'sliding' option of IM2COL does not pad the array with zeroes to create neighborhoods for pixels near the edges.
These two lines are redundant:
i2 = reshape( sum(i1),mm-m+1,nn-n+1);
out=col2im(i2,[m n],[mm nn],'sliding');
You only need one or the other, not both:
%# Use this:
out = reshape(sum(i1),mm-m+1,nn-n+1);
%# OR this:
out = col2im(sum(i1),[m n],[mm nn],'sliding');
Image data in MATLAB is typically of type 'uint8', meaning each pixel is represented as an unsigned 8-bit integer spanning the range 0 to 255. Assuming this is what in is, when you perform your sum operation you will implicitly end up converting it to type 'double' (since an unsigned 8-bit integer will likely not be big enough to hold the sum totals). When image pixel values are represented with a double type, the pixel values are expected to span the range 0 to 1, so you will want to scale your resulting image by its maximum value to get it to display properly:
out = out./max(out(:));
Lastly, check what kind of input image you are using. For your code, you are essentially assuming in is 2-D (i.e. a grayscale intensity image). If it is a truecolor (i.e. RGB) image, the third dimension is going to cause you some trouble, and you will have to either process each color plane separately and recombine them or convert the RGB image to grayscale. If it is an indexed image (with an associated color map), you will not be able to do the sort of processing you describe above without first converting it to a grayscale representation.
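Putting those points together, a minimal version of your example (assuming tire.tif is a 2-D grayscale image, which it is in the standard MATLAB demos) could look like this:
in = double(imread('tire.tif'));         %# work in double to avoid integer overflow in the sums
[mm, nn] = size(in);
m = 8; n = 8;
i1 = im2col(in, [m n], 'sliding');       %# each column holds one 8-by-8 neighborhood
out = reshape(sum(i1), mm-m+1, nn-n+1);  %# sum of each neighborhood; note the [m-1 n-1] shrinkage
figure, imshow(out./max(out(:)));        %# rescale to [0,1] so it displays properly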
Why are you expecting the output to be the same?
i2 is the result of performing a SUM over each pixel neighborhood (essentially a low-pass filter), which is the final blurry image that you see, i.e. you are NOT doing an inverse process with the COL2IM call.
The i1 obtained from the 'sliding' option also contains the columns you would get from the 'distinct' option; you just need to filter them out. Now, this may not be the best way to code it up, but it works. Assume that mm is a multiple of m and nn is a multiple of n; if this is not the case, you'll have to zero-pad accordingly.
in=imread('tire.tif');
[mm nn]=size(in);
m=8;n=8;
i1 = im2col(in,[m,n],'sliding');
inSel = [];
for k=0:mm/m-1
    inSel = [inSel 1:n:nn+(nn-n+1)*n*k];
end
out = col2im(i1(:,inSel),[m,n],[mm,nn],'distinct');
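To check that this actually inverts the im2col call, you can compare the result against the original image:
isequal(out, in)    % should return 1 (true) if the selected columns reassemble the image exactly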