I am training a CNN. Many authors mention randomly cropping images from the center of the original image for a factor-of-2048 data augmentation. Can anyone please elaborate on what this means?
I believe you are referring to the data augmentation scheme from "ImageNet Classification with Deep Convolutional Neural Networks" (the AlexNet paper). The 2048x aspect of their scheme goes as follows:
First, all images are rescaled down to 256x256.
Then for each image they take random 224x224 sized crops.
For each random 224x224 crop, they additionally augment by taking horizontal reflections of these 224x224 patches.
So my guess as to how they get to the 2048x data augmentation factor:
There are 32*32 = 1024 possible 224x224 crops of a 256x256 image counted this way: 256 - 224 = 32, so there are 32 possible horizontal offsets and 32 possible vertical offsets for the crops. (Strictly speaking, offsets 0 through 32 give 33 positions per axis, but the paper's 2048 figure counts 32.)
Taking the horizontal reflection of each crop doubles this count.
1024 * 2 = 2048.
The center-crop aspect of your question stems from the fact that the original images are not all the same size. What the authors did was rescale each rectangular image so that its shorter side was 256 pixels, and then take the 256x256 center crop, thereby rescaling the entire dataset to 256x256. Once all the images are 256x256, they can perform the (up to) 2048x data augmentation scheme above.
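Here is a minimal MATLAB sketch of that pipeline (resize the shorter side to 256, center-crop, then draw one random 224x224 crop with an optional flip). The example image and the 0-31 offset range are my assumptions, the latter chosen to match the 32x32 count above:

img = imread('peppers.png');                      % example RGB image shipped with MATLAB

% 1) Rescale so the shorter side is 256, then take the 256x256 center crop.
scale  = 256 / min(size(img,1), size(img,2));
img    = imresize(img, scale);
r0     = floor((size(img,1) - 256) / 2) + 1;
c0     = floor((size(img,2) - 256) / 2) + 1;
img256 = img(r0:r0+255, c0:c0+255, :);

% 2) Take one random 224x224 crop (32 possible offsets per axis).
dr = randi([0 31]);                               % random vertical offset
dc = randi([0 31]);                               % random horizontal offset
patch = img256(dr+1:dr+224, dc+1:dc+224, :);

% 3) A horizontal reflection doubles the number of distinct patches.
if rand < 0.5
    patch = flip(patch, 2);
end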
I am trying to use different low-resolution images for my work. Recently, I was reading the paper "Low Resolution Convolutional Neural Network for Automatic Target Recognition", which does not mention how the low-resolution images were made.
"Resolution adaptation for feature computation: To show the influence of resolution on the performances of these image representations, we focus on seven specific resolutions ranging from 200 × 200 to 10 × 10 pixels."
Here are the example images from the paper.
Can anyone please help me implement this method in MATLAB?
Currently, I am making the low-resolution images this way:
img = im2double(imread('cameraman.tif'));
conv_mat = ones(6) / 36;                    % 6x6 averaging kernel
img_low = convn(img, conv_mat, 'same');     % each pixel becomes its 6x6 neighborhood average
figure, imshow(img), title('Original');
figure, imshow(img_low), title('Low Resolution');
You have a good start there. The convolution makes each pixel contain the average of its 6x6 neighborhood. All that is left is to keep only one pixel from each 6x6 neighborhood; that pixel then carries the average of the discarded information:
img = im2double(imread('cameraman.tif'));
conv_mat = ones(6) / 36;                    % 6x6 averaging kernel
img_low = convn(img, conv_mat, 'same');     % local 6x6 averages
img_low = img_low(3:6:end, 3:6:end);        % keep one pixel per 6x6 block
figure, imshow(img), title('Original');
figure, imshow(img_low), title('Low Resolution');
The 3:6:end simply indicates which rows and columns to keep. I start the subsampling at 3 to avoid the border pixels, which were averaged with the zero padding that convn adds outside the image.
Judging from the images you posted, they used this averaging method. Alternatives are to take the max in each neighborhood (as is done in the max-pooling layers of a convolutional neural network), or to subsample without any filtering (this introduces aliasing, so I don't recommend it).
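If you want to try the max alternative, here is a minimal sketch using blockproc from the Image Processing Toolbox; the 6x6 block size mirrors the averaging example above (this is my illustration, not the paper's method):

img = im2double(imread('cameraman.tif'));
img_low = blockproc(img, [6 6], @(b) max(b.data(:)));  % max over each 6x6 block
figure, imshow(img_low), title('Max-pooled Low Resolution');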
I have a binary image with 4 blobs. Three of them have an aspect ratio of more than 1, and one has an aspect ratio of 1. Now I want to remove the blobs whose aspect ratio is more than 1 from the binary image. How could I do this? Can someone please provide code?
Here is a link to the binary image. I want to remove the 3 blobs that have an aspect ratio of more than 1 and keep only the triangle shape.
https://www.dropbox.com/s/mngjlcsin46fgim/demo.png?dl=0
You can use regionprops for that, for example:
s=regionprops(bw,'BoundingBox','PixelIdxList');
where bw is your binary image.
Each element's s(i).BoundingBox is an [x, y, width, height] vector.
You can loop over s:
for i = 1:numel(s)
    ar(i) = s(i).BoundingBox(3) / s(i).BoundingBox(4);
end
and check whether the width/height ratio ar (or however you define aspect ratio) is above 1 (because of noise I'd use a threshold of ar > 1.2). Then, for those indices, you can use the pixel lists s(i).PixelIdxList:
bw(vertcat(s(ar > 1.2).PixelIdxList)) = 0;
to zero out those pixels. (The vertcat is needed because s(ar > 1.2).PixelIdxList expands to a comma-separated list when several blobs match.)
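Putting it together, a minimal end-to-end sketch, assuming demo.png loads as (or thresholds to) a logical image:

bw = imread('demo.png') > 0;                           % binarize, in case it loads as uint8
s  = regionprops(bw, 'BoundingBox', 'PixelIdxList');
ar = arrayfun(@(r) r.BoundingBox(3) / r.BoundingBox(4), s);
bw(vertcat(s(ar > 1.2).PixelIdxList)) = 0;             % remove the elongated blobs
figure, imshow(bw), title('Triangle only');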
I am trying to train my own network in Caffe, similar to the ImageNet model, but I am confused about the crop layer. As far as I understand the crop layer in the ImageNet model, during training it takes random 227x227 crops and trains the network, but during testing it takes the center 227x227 crop. Don't we lose information from the image when we take the center 227x227 crop of a 256x256 image? And second, how can we define the number of crops to be taken during training?
Also, I trained the same network (same number of layers, same convolution sizes; the FC neurons obviously differ), first taking a 227x227 crop of the 256x256 image, and then taking a 255x255 crop of the 256x256 image. My intuition says the model with the 255x255 crop should give the better result, but I am getting higher accuracy with the 227x227 crop. Can anyone explain the intuition behind this, or am I doing something wrong?
Your observations are not specific to Caffe.
The cropped images during training and testing need to be the same size (227x227 in your case) because the downstream network layers (convolutions and especially the fully connected layers) need inputs of a fixed size. Random crops are done during training because you want data augmentation. During testing, however, you want to test against a standard dataset; otherwise the accuracy reported during testing would depend on a shifting test set.
The crops are made dynamically at each iteration. All images in a training batch are randomly cropped. I hope this answers your second question.
Your intuition is backwards: with the smaller crop (227x227), you have more data augmentation. Data augmentation essentially creates "new" training samples out of nothing, which is vital to prevent overfitting during training. With the bigger crop (255x255), you should expect better training accuracy but lower test accuracy, since the data is more likely to be overfitted.
Of course, cropping can be overdone: crop too much and you lose too much information from the image. For image categorization, the ideal crop size is one that does not alter the category of the image (i.e., only background is cropped away).
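To make that intuition concrete, a quick back-of-the-envelope count of the distinct crop positions (ignoring reflections):

n227 = (256 - 227 + 1)^2   % 900 possible positions for a 227x227 crop
n255 = (256 - 255 + 1)^2   % 4 possible positions for a 255x255 crop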
I have images of different sizes (e.g. 50x100, 20x90). At the moment the input to the Convolutional Neural Network (CNN) is 28x28, so I just use MATLAB's imresize function to resize to 28x28, but I think that will increase the noise in the image. Is there any other way to first make the images the same size and then resize them to 28x28?
I need to know how many slices a segmented image contains in image segmentation using MATLAB, and I also want to know the size of a single slice.
It's hard to know how to answer without more details about the images you are working with. You can load the image with:
IM = imread('your_image.tif');
Once you have your image loaded, you can figure out the size simply by:
size(IM)
For example, if you loaded an RGB image that was 512x512 pixels, the size() output would be [512 512 3], indicating that you have 3 planes, each 512 by 512 pixels.
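If by "slices" you mean these image planes, here is a small sketch (the variable names are mine):

IM = imread('your_image.tif');
[rows, cols, nSlices] = size(IM);   % nSlices is 1 for grayscale, 3 for RGB
sliceSize = [rows cols];            % size of a single slice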