My original pictures are grayscale, 200x200x3.
I have downscaled them to 50x50x3.
They are mug shots of 100 different people. I took copies of 30 of them, corrupted the copies, and put them back into the same image matrix, which brings the total to 130 pictures.
Afterwards, I created a 130x7500 array in which each picture is a row. Then I split that matrix into a training and a test data set, and classified the split data using MATLAB's decision tree tool and KNN tool. Now my question is: how do I do the same thing using the neural network tool?
I also have a 130x1 classification (label) matrix.
If I want to do the same thing using the neural network tool, what should I do?
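For reference, here is a minimal sketch of what that could look like with patternnet from the Neural Network Toolbox, assuming X is the 130x7500 feature matrix and y is the 130x1 label vector (both names are placeholders):

inputs  = X';                          % patternnet expects one sample per column
targets = full(ind2vec(y'));           % one-hot encode the class labels

net = patternnet(20);                  % one hidden layer with 20 neurons (illustrative)
net.divideParam.trainRatio = 0.70;     % let the toolbox split train/val/test
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;

net = train(net, inputs, targets);
predicted = vec2ind(net(inputs));      % predicted class index for each picture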
Related
I am trying to classify a set of images using a transfer learning approach. All of the tutorials I came across use AlexNet for fine-tuning and transfer learning. However, I am trying to use a less complicated model, like the CIFAR-10 network. I went through this MATLAB tutorial. At the point where they start transfer learning, they use MATLAB's sample data. They load the ground truth data into a table:
%Load the ground truth data
data = load('stopSignsAndCars.mat', 'stopSignsAndCars');
stopSignsAndCars = data.stopSignsAndCars;
The table contains the image filenames and ROI labels for stop signs, car fronts, and car rears. Each ROI label is a bounding box around an object of interest within an image. They then keep only the labels for stop signs.
What should I do if I want to use my own images with this network, following an approach similar to the one used for AlexNet?
net = alexnet;
layersTransfer = net.Layers(1:end-3);    % drop the final classification layers
readAllDir = uigetdir('', 'Select All directory to read scaled image files');
AllSample = imageDatastore(readAllDir, 'IncludeSubfolders', true, 'LabelSource', 'foldernames');
By a similar approach, I mean building an imageDatastore for all of the labelled images and using it to train the network.
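For what it's worth, here is a hedged sketch of how the rest of that workflow could look with your own imageDatastore (the split ratio, layer choices, and training options below are illustrative, not taken from the tutorial):

[trainDS, valDS] = splitEachLabel(AllSample, 0.8, 'randomized');
numClasses = numel(categories(trainDS.Labels));

% Re-attach new final layers sized for your own classes
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

% AlexNet expects 227x227x3 input, so resize the images on the fly
augTrain = augmentedImageDatastore([227 227 3], trainDS);
augVal   = augmentedImageDatastore([227 227 3], valDS);

options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 10, ...
    'ValidationData', augVal);

netTransfer = trainNetwork(augTrain, layers, options);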
Recently, I've been playing with MATLAB's RCNN deep learning example here. In this example, MATLAB has designed a basic 15-layer CNN with an input size of 32x32. They use the CIFAR10 dataset, whose training images are also 32x32, to pre-train this CNN. Later they use a small dataset of stop signs to fine-tune this CNN to detect stop signs. This small dataset contains only 41 images, so they use these 41 images to fine-tune the CNN, i.e. to train an RCNN network. This is how they detect a stop sign:
As you can see, the bounding box almost covers the whole stop sign, except for a small part at the top.
Playing with the code, I decided to fine-tune the same network, pre-trained on the CIFAR10 dataset, on the PASCAL VOC dataset but only for the "aeroplane" class.
These are some results I get:
As you can see, the detected bounding boxes barely cover the whole airplane, so the precision ends up being 0 when I evaluate them later. I understand that in the original RCNN paper mentioned in the MATLAB example, the input size is 227x227 and their CNN has 25 layers. Could this be why the detections are not accurate? How does the input size of a CNN affect the end result?
Almost surely, yes!
When you pass an image through a net, the net tries to reduce the data taken from the image until only the most relevant information remains; during this process, the input shrinks again and again. If, for example, you feed the net an image smaller than the expected input, most of the data from the image may be lost during the pass through the net.
In your case, one possible reason for your results is that the net "looks for" features at a limited resolution, and the large airplanes may simply have too high a resolution for it.
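To see concretely how much detail the smaller input throws away, here is a small illustrative snippet (the file name and box are placeholders) that crops one proposal the way an R-CNN-style detector would and resizes it to the 32x32 CIFAR10 input versus a 227x227 input:

img  = imread('aeroplane_example.jpg');   % placeholder image
bbox = [120 80 300 160];                  % hypothetical proposal [x y w h]

proposal = imcrop(img, bbox);
small = imresize(proposal, [32 32]);      % what the 15-layer CIFAR10 net sees
large = imresize(proposal, [227 227]);    % what a 227x227 net would see

montage({proposal, large, small});        % compare the loss of detail side by side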
I am trying to classify cataract images. First I crop out the pupil area and save it in another folder, then I run the wavelet transform on these cropped images using the wavedec function with the sym8 filter, and take the approximation coefficients as the feature vector. The final step is sending these feature vectors to a neural network. After trying the neural network on my data set of 51 images with hidden layer sizes of 10, 20, 50, and 70, only 90% of the images are classified correctly, and I want to increase this percentage.
So, any suggestions on what to use other than wavelets?
Why, when I increased the dataset size to 83 images, did the neural network give worse results, such as 70% or 60%?
Should I stop using a neural network and start looking for another classifier?
(Two example images from the dataset were attached to the question.)
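For reference, a rough sketch of the described pipeline (the folder name, decomposition level, hidden layer size, and labels are all placeholders), assuming every cropped pupil image has the same size and is vectorized before the 1-D wavelet decomposition:

files = dir(fullfile('cropped_pupils', '*.png'));   % hypothetical folder of cropped images
level = 4;                                          % illustrative decomposition level

features = [];
for k = 1:numel(files)
    img = im2double(imread(fullfile('cropped_pupils', files(k).name)));
    sig = img(:)';                                  % vectorize the cropped image

    [c, l] = wavedec(sig, level, 'sym8');           % 1-D wavelet decomposition, sym8 filter
    a = appcoef(c, l, 'sym8', level);               % approximation coefficients

    features = [features; a];                       % one feature row per image
end

labels = randi(2, 1, numel(files));                 % placeholder labels (1 = cataract, 2 = normal)

net = patternnet(20);                               % illustrative hidden layer size
net = train(net, features', full(ind2vec(labels)));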
So here is the setup: I have a set of images (labeled train and test) and I want to train a conv net that tells me whether or not a specific object is present in an image.
To do this, I followed the TensorFlow tutorial on MNIST and trained a simple conv net on 128x128 images cropped to the area of interest (the object). The architecture is as follows: three successive blocks, each consisting of 2 conv layers and 1 max-pool down-sampling layer, followed by one fully connected softmax layer (with two classes, 0 and 1, for whether the object is present or not).
I implemented it using TensorFlow, and it works quite well, but since I have enough computing power I was wondering how I could increase the complexity of the classifier:
- adding more layers?
- adding more channels at each layer? (currently 32, 64, 128, and 1024 for the fully connected layer)
- anything else?
But the most important part is that I now want to detect this same object in larger images (roughly 600x600, whereas the size of the object should be around 100x100).
I was wondering how I could use the previously trained "small" network, built for small images, to pre-train a larger network on the large images. One option could be to classify patches with a sliding window of size 128x128 and scan the whole image, but if possible I would like to try training a whole network on the large images.
Any suggestions on how to proceed? Or an article / resource tackling this kind of problem? (I am really new to deep learning, so sorry if this is a stupid question...)
Thanks!
I suggest that you continue reading about the field overall. Useful search keywords include CNN, image classification, neural net, AlexNet, GoogleNet, and ResNet. These will return many articles, online classes and lectures, and other materials to help you learn about classification with neural nets.
Don't just add layers or filters: the complexity of the topology (net design) must be fitted to the task; a net that's too complex will over-fit the training data. The one you've been using is probably LeNet; the three I cite above are for the ImageNet image classification contest.
Since you are working on images, I would suggest you use a pretrained image classification network (like VGG, AlexNet, etc.) and fine-tune this network with your 128x128 image data. In my experience, unless you have a very large data set, a fine-tuned network will give higher accuracy and also save training time. After building a good image classifier on your data set, you can use any popular algorithm to generate region proposals from the image. Then take all the region proposals and pass them to the classification network one by one, and check whether the network classifies each proposal as positive or negative. If a proposal is classified as positive, then your object is most probably present in that region; otherwise it is not. If there are a lot of region proposals in which the object is present according to the classifier, you can use a non-maximum suppression algorithm to reduce the number of positive proposals.
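As a concrete illustration of that scan-and-suppress idea, here is a sketch in MATLAB (which the other snippets on this page use); classifyPatch is a placeholder for whatever classifier you trained, and the window size, stride, and threshold are illustrative:

img  = imread('large_scene.jpg');            % hypothetical 600x600 input image
win  = 128;                                  % window size matching the classifier input
step = 32;                                   % stride of the sliding window

boxes  = zeros(0, 4);
scores = zeros(0, 1);

for y = 1:step:size(img, 1) - win + 1
    for x = 1:step:size(img, 2) - win + 1
        patch = img(y:y+win-1, x:x+win-1, :);
        score = classifyPatch(patch);        % placeholder: returns P(object present)
        if score > 0.5
            boxes(end+1, :)  = [x y win win]; %#ok<AGROW>
            scores(end+1, 1) = score;         %#ok<AGROW>
        end
    end
end

% Collapse overlapping positive windows into a few detections
[picked, pickedScores] = selectStrongestBbox(boxes, scores);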
I have read some books but still cannot figure out how I should organize the network. For example, I have a PGM image of size 120*100. What should the input look like (e.g. a one-dimensional array of size 120*100)? And how many nodes should I use?
It's typically best to organize your input image as a 2D matrix. The reason is that the layers at the lower levels of the neural networks used in machine perception tasks are typically locally connected. For example, each neuron of the first layer of such a neural net will only process the pixels of a small NxN patch of the input image. This naturally leads to a 2D structure which can be more easily described with 2D matrices.
For a detailed explanation I'll refer you to the DeepFace paper, which describes the state of the art in face recognition systems.
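As a hedged illustration of that 2D layout (the layer sizes below are arbitrary, not taken from this answer or from DeepFace), a small convolutional stack over a 120x100 grayscale input could look like this in MATLAB:

layers = [
    imageInputLayer([120 100 1])          % the image stays a 2D matrix (single channel)
    convolution2dLayer(5, 16)             % each neuron only sees a 5x5 patch of the input
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(10)               % e.g. 10 output classes (illustrative)
    softmaxLayer
    classificationLayer];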
A 120*100 one-dimensional vector is fine. The locations of the pixel values in that vector do not matter, because all nodes are fully connected to the nodes in the next layer anyway. But you must be consistent with their locations between training, validation, and testing.
The most successful approach so far has been to use a convolutional neural network with 2D input, just as #benoitsteiner stated. For a far simpler example, I'd refer you to LeNet-5, a small neural network developed for MNIST hand-written digit recognition. It is used in EBLearn for face recognition with quite good results.