I am trying to transcode a MATLAB CNN to a PyTorch-based CNN, but I am not getting the same results; in fact, the PyTorch version is not learning anything. The input is black-and-white images of size 64x64 with a batch size of 64. How can I port the MATLAB code below to PyTorch?
MATLAB CNN:
boxSize = 64;
layers = [imageInputLayer([boxSize boxSize 1])
convolution2dLayer(3,16,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,32,'Padding', 1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,64,'Padding',1)
batchNormalizationLayer
reluLayer
fullyConnectedLayer(Nclasses)
softmaxLayer
classificationLayer];
Did you try Small Matlab and Octave to Python (SMOP)? It may work for this.
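Another route that stays on the MATLAB side: once the network above has been trained with trainNetwork, you can export it to ONNX and load the .onnx file from Python (for example with onnxruntime) to check its outputs against your PyTorch re-implementation layer by layer. A minimal sketch, assuming the ONNX converter support package is installed and trainedNet is the network returned by trainNetwork (the variable name is just for illustration):
% Export the trained MATLAB network to the ONNX format so it can be
% loaded from Python and compared against the PyTorch version.
% Requires the "Deep Learning Toolbox Converter for ONNX Model Format"
% support package.
exportONNXNetwork(trainedNet, 'matlab_cnn.onnx');
Also, if the PyTorch version ends with an explicit softmax layer and is trained with nn.CrossEntropyLoss, note that this loss already applies log-softmax internally and expects raw logits; doubling up the softmax is a common reason such a port appears not to learn.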
Related
I am new to the Deep Learning Toolbox and I am teaching myself convolutional neural networks (CNNs). My dataset consists of 1000 RGB images of size 100x40, so Xdata = 1x1x1000 of data type double.
Out of these, I used the first 700 for training, so Xtrain = 1x1x700 of data type Image.
I am getting this error:
Error using trainNetwork (line 150)
Invalid training data. X must be a 4-D array of images, an ImageDatastore, or a table.
I cannot understand how to use the table data structure, and what is the proper way to input the data into the CNN? Is it not possible to input an RGB image directly as an image data type, or do I need to convert each channel and feed three 2-D matrices?
Please help.
imageSize = [100 40];
dropoutProb = 0.1;
numF = 8;
layers = [
imageInputLayer(imageSize)
convolution2dLayer(3,numF,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(3,'Stride',2,'Padding','same')
convolution2dLayer(3,2*numF,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(3,'Stride',2,'Padding','same')
convolution2dLayer(3,4*numF,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(3,'Stride',2,'Padding','same')
convolution2dLayer(3,4*numF,'Padding','same')
batchNormalizationLayer
reluLayer
convolution2dLayer(3,4*numF,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer([1 13])
dropoutLayer(dropoutProb)
fullyConnectedLayer(numClasses)
softmaxLayer
classificationLayer];
miniBatchSize = 50;
validationFrequency = floor(numel(Ytrain)/miniBatchSize);
options = trainingOptions('adam', ...
'InitialLearnRate',3e-4, ...
'MaxEpochs',25, ...
'MiniBatchSize',miniBatchSize, ...
'Shuffle','every-epoch', ...
'Plots','training-progress', ...
'Verbose',false, ...
'ValidationData',{XValidation,YValidation}, ...
'ValidationFrequency',validationFrequency, ...
'LearnRateSchedule','piecewise', ...
'LearnRateDropFactor',0.1, ...
'LearnRateDropPeriod',20);
trainedNet = trainNetwork(Xtrain,layers,options);
The input dimensions are wrong. The 4D array should be of shape:
[height, width, number_of_channels, number_of_images]
So in your case you'd need the train image dimensions to be:
[100, 40, 3, 700]
And test image dimensions to be:
[100, 40, 3, 300]
You also have a dropout layer right before the only fully connected layer; should there be an additional fully connected layer before it? As it stands, you are randomly throwing away part of your max-pooling output, which can be done but is quite aggressive.
trainNetwork() can also take other inputs if you don't specifically want to use a 4-D array. I prefer an augmentedImageDatastore built from an imageDatastore: it is a very easy way to augment your images, which you definitely should be doing if you aren't already. If not, consider changing your image data type from double to uint8; three uint8 channels are enough to represent a typical input image completely, and it should speed up your training.
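For example, here is a minimal sketch of both suggestions, assuming the 700 training images are held in a cell array XtrainCell of 100x40x3 arrays and the labels in a categorical vector Ytrain (those variable names are placeholders):
% Stack the images into the 4-D array [height, width, channels, count]
% that trainNetwork expects.
Xtrain4D = cat(4, XtrainCell{:});   % 100x40x3x700
% Optionally wrap it in an augmentedImageDatastore to get on-the-fly
% augmentation (random flips and small translations) during training.
augmenter = imageDataAugmenter('RandXReflection', true, ...
    'RandXTranslation', [-3 3], 'RandYTranslation', [-3 3]);
augTrain = augmentedImageDatastore([100 40 3], Xtrain4D, Ytrain, ...
    'DataAugmentation', augmenter);
trainedNet = trainNetwork(augTrain, layers, options);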
I'm new to machine learning and am now trying to train a CNN on MNIST.
I have the 60k PNG training set of MNIST, but the imageInputLayer() can only zero-center the images; it can't rescale them.
What should I do to scale the image input to [0, 1]?
What I mean is: I want to add an image normalization layer to the layer array when using trainNetwork() to train a CNN.
In the MATLAB documentation: https://cn.mathworks.com/help/nnet/examples/create-simple-deep-learning-network-for-classification.html
there is demo code:
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
'nndatasets','DigitDataset');
digitData = imageDatastore(digitDatasetPath, ...
'IncludeSubfolders',true,'LabelSource','foldernames');
[trainDigitData,testDigitData] = splitEachLabel(digitData, ...
trainingNumFiles,'randomize');
layers = [imageInputLayer([28 28 1])
convolution2dLayer(5,20)
reluLayer
maxPooling2dLayer(2,'Stride',2)
fullyConnectedLayer(10)
softmaxLayer
classificationLayer()];
options = trainingOptions('sgdm','MaxEpochs',15, ...
'InitialLearnRate',0.0001);
convnet = trainNetwork(trainDigitData,layers,options);
As we all know, PNG image data is integer-valued from 0 to 255,
but at the same time the input of a CNN needs to be normalized to [0, 1].
The only normalization option on imageInputLayer() is zero-center (subtracting the data mean); there is no rescaling option (data/255).
So how can I add a normalization layer to the layer array?
Or is there no need to normalize the training set?
Maybe you mean this? This is how I preprocess the imdb data, and it works for my images. The data is stored in (dim1, dim2, sample_size) form:
% Rescale the whole data array to [0, 1].
imdb.images.data = (imdb.images.data - min(imdb.images.data(:))) / (max(imdb.images.data(:)) - min(imdb.images.data(:)));
% Then zero-center each image by subtracting the mean image.
imageMean = mean(imdb.images.data,3);
for j = 1:size(imdb.images.data,3)
    imdb.images.data(:,:,j) = imdb.images.data(:,:,j) - imageMean;
end
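If you feed the images through an imageDatastore as in the demo code above, another option is to override the datastore's ReadFcn so every image is rescaled to [0, 1] before it reaches the network; a minimal sketch, using the digitData datastore from the demo:
% im2double converts uint8 pixel values to double and divides by 255,
% so each image read from disk is already scaled to [0, 1].
digitData.ReadFcn = @(filename) im2double(imread(filename));
The zero-center step of imageInputLayer is then applied on top of the [0, 1] data. Newer MATLAB releases also offer a 'rescale-zero-one' choice for the layer's 'Normalization' option, if your version supports it.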
I am doing SVM training with several images. This is my first project with SVM. I am extracting features with HOG, and I label a location 1 if it is on the horizon line and 0 if it is on the background. I have 74 images for training and 7 images for testing. Unfortunately, I can't get above 50 percent accuracy. I have changed the image sizes and played with the cell sizes in the feature extraction, but it does not change much. What else can I try? And what is the ideal dataset size; how many images should I use for training and testing? For example, in one image it predicts everything correctly, and in the next image everything wrong.
This is how I am calculating accuracy:
%%%%% Evaluation
% Testing Data
hfsTest = vertcat(dataset.HorizonFeatsTest{:});
bfsTest = vertcat(dataset.BgFeatsTest{:});
test_data = [hfsTest;bfsTest];
% Labels
hlabelTest = ones(size(hfsTest,1),1);
blabelTest = zeros(size(bfsTest,1),1);
test_label = [hlabelTest;blabelTest];
Predict_label = vertcat(results.predicted_label{:});
acc = numel(find(Predict_label==test_label))/length(test_label);
disp(['Accuracy ', num2str(acc)]);
%done
% Training Data
hfs = vertcat(dataset.HorizonFeats{:});
bfs = vertcat(dataset.BgFeats{:});
train_data = [hfs;bfs];
% Labels
hlabel = ones(size(hfs,1),1);
blabel = zeros(size(bfs,1),1);
train_label = [hlabel;blabel];
%%%
% do training ...
svmModel = svmtrain(train_data, train_label,'BoxConstraint',2e-1);
and I have used Predict_label_image = svmclassify(svmModel, image_feats); for testing.
You need to do a lot of tuning. In the documentation you have all the hyperparameters you can play with. I would start with an RBF kernel and try [0.01, 0.1, 1, 10] for BoxConstraint.
I'm afraid you can't expect an SVM to work well if you don't try different hyperparameter configurations.
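A minimal sketch of that kind of search, written with the current fitcsvm interface (svmtrain/svmclassify are the older functions) and reusing train_data, train_label, test_data and test_label from the code above; ideally you would score each setting with cross-validation on the training set rather than on the 7 test images:
% Try an RBF kernel with a small grid of BoxConstraint values and keep
% the model that scores best.
bestAcc = 0;
for C = [0.01 0.1 1 10]
    model = fitcsvm(train_data, train_label, ...
        'KernelFunction', 'rbf', 'KernelScale', 'auto', 'BoxConstraint', C);
    predicted = predict(model, test_data);
    acc = mean(predicted == test_label);
    fprintf('BoxConstraint = %.2f, accuracy = %.3f\n', C, acc);
    if acc > bestAcc
        bestAcc = acc;
        bestModel = model;
    end
end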
I want to use learning vector quantization (LVQ) to classify the F_CK data, which has 7 classes.
When I use an MLP, the error is about 15%, but when I use LVQ, the error is about 75%.
I see that the LVQ network classifies only one class well and misclassifies the other classes.
My code:
data = load('F_CK+');
x = data.X';
y_data = data.Y';
t = ind2vec(y_data);
net = lvqnet(4,0.1,'learnlv2');
net.divideFcn = 'dividerand';
net.divideMode = 'sample';
net.divideParam.trainRatio = 85/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 0/100;
net.trainParam.epochs = 15;
net = train(net, x, t);
y = net(x);
classes = vec2ind(y);
figure, plotconfusion(t,y);
[Confusion matrix of my result on the F_CK data]
Can anyone help me understand why this network classifies only one class, and what my mistake is?
Dataset links:
https://dl.dropboxusercontent.com/u/100069389/File/Stackoverflow/F_CK.rar
https://mega.nz/#!J8ES1DRS!NZwDsD0FFojeZiI-OpORzxGLbMp9rx0XKsfOvGDOaR0
I don't know what my mistake was, but I did two things that improved the classification accuracy:
1. Normalize the data between -1 and 1.
2. Increase the number of subclasses/LVQ neurons to 64 to cover all of the image classes.
As far as I remember, an LVQ network should be more accurate than an MLP, but my accuracy with LVQ only increased to about 80%.
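Applied to the code above, those two changes look roughly like this (mapminmax rescales each row, i.e. each feature, to [-1, 1] by default; 64 is the larger number of LVQ hidden neurons):
% 1. Rescale every feature (row of x) to the range [-1, 1].
x = mapminmax(data.X');
% 2. Use more competitive (hidden) neurons so each of the 7 classes
%    gets several subclasses.
net = lvqnet(64, 0.1, 'learnlv2');
net.trainParam.epochs = 15;
net = train(net, x, t);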
I am working in stereo vision for the first time. I am trying to rectify the stereo images. The following is the result.
I can't understand why the image is getting cropped.
The following is my code:
% Read in the stereo pair of images.
I1 = imread('sceneReconstructionLeft.jpg');
I2 = imread('sceneReconstructionRight.jpg');
% Rectify the images.
[J1, J2] = rectifyStereoImages(I1, I2, stereoParams);
% Display the images before rectification.
figure;
imshow(stereoAnaglyph(I1, I2), 'InitialMagnification', 50);
title('Before Rectification');
% Display the images after rectification.
figure;
imshow(stereoAnaglyph(J1, J2), 'InitialMagnification', 50);
title('After Rectification');
I am trying to follow this guide:
http://www.mathworks.com/help/vision/examples/stereo-calibration-and-scene-reconstruction.html
The images I used:
Try doing the following:
[J1, J2] = rectifyStereoImages(I1, I2, stereoParams, 'OutputView', 'Full');
This way you will see the entire images.
By default, rectifyStereoImages crops the output images to only contain the overlap between the two frames. In this case the overlap is very small compared to the disparity.
What is happening here is that the baseline (distance between the cameras) is too wide, and the distance to the objects is too short. This results in a very large disparity, which will be hard to compute reliably. I suggest that you either move the cameras closer together, or move the cameras further away from the objects of interest, or both.