Why does VGG-16 take input size 512 * 7 * 7?

According to https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py
I don't understand why VGG models use 512 * 7 * 7 as the input size of the first fully-connected layer.
The last layers of the feature block are
nn.Conv2d(512, 512, kernel_size=3, padding=1),
nn.ReLU(True),
nn.MaxPool2d(kernel_size=2, stride=2, dilation=1)
The code from the link above:
class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

To understand this, you have to know how the convolution operator works in CNNs.
nn.Conv2d(512, 512, kernel_size=3, padding=1) means that the input to that convolution has 512 channels and that the output after the convolution will also have 512 channels. The input is convolved with a kernel of size 3x3 that moves as a sliding window. Finally, padding=1 means that before applying the convolution, we symmetrically add zeros to the edges of the input matrix.
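As a minimal sketch (assuming PyTorch is installed), you can see that a 3x3 kernel with padding=1 leaves the width and height unchanged, while the layer sets only the channel count:
import torch
import torch.nn as nn

conv = nn.Conv2d(512, 512, kernel_size=3, padding=1)
x = torch.randn(1, 512, 7, 7)  # (batch, channels, height, width)
print(conv(x).shape)           # torch.Size([1, 512, 7, 7]) -- spatial size preserved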
In the example you gave, you can think of 512 as the depth, while 7x7 is the width and height obtained by applying several convolutions. Imagine that we have an image with some width and height and we feed it to a convolution; the resulting size will be
owidth = floor(((width + 2*padW - kW) / dW) + 1)
oheight = floor(((height + 2*padH - kH) / dH) + 1)
where height and width are the original sizes, padW and padH are the horizontal and vertical padding, kW and kH are the kernel sizes, and dW and dH are the horizontal and vertical strides, i.e. the number of pixels the kernel moves at each step (if dW=1, the kernel starts at pixel (0,0) and then moves to (1,0)).
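Written out as a small Python helper (just a sketch to make the formula concrete):
def conv_output_size(size, k, pad=0, stride=1):
    # floor(((size + 2*pad - k) / stride) + 1)
    return (size + 2 * pad - k) // stride + 1

print(conv_output_size(224, 3, pad=1))  # 224 -- a 3x3 kernel with padding=1 preserves size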
Usually the first convolution operator in a CNN looks like nn.Conv2d(3, D, kernel_size=3, padding=1), because the original image has 3 input channels (RGB). Assuming that the input image has a size of 256x256x3 pixels, if we apply the operator as defined before, the resulting image has the same width and height as the input, but its depth is now D. Similarly, if we define the convolution as c = nn.Conv2d(3, 15, kernel_size=25, padding=0, stride=5), with kernel_size=25, no padding in the input image, and stride=5 (dW=dH=5, meaning the kernel moves 5 pixels each time: from (0,0) it moves to (5,0), and when it reaches the end of the image on the x-axis it moves to (0,5) -> (5,5) -> (10,5) until it reaches the end again), the resulting output image will have a size of 47x47x15.
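Both cases can be verified with a few lines of PyTorch (same assumptions as the sketch above):
import torch
import torch.nn as nn

c = nn.Conv2d(3, 15, kernel_size=25, padding=0, stride=5)
out = c(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 15, 47, 47]) -- floor((256 - 25) / 5) + 1 = 47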

The VGG neural net has two sections of layers: the "feature" layer and the "classifier" layer. The input to the feature layer is always an image of size 224 x 224 pixels.
The feature section has 5 nn.MaxPool2d(kernel_size=2, stride=2) layers. See line 76 of the referenced source code: each 'M' character in the configurations sets up one MaxPool2d layer.
A MaxPool2d layer with these specific parameters halves the width and height of the tensor. So we have 224 --> 112 --> 56 --> 28 --> 14 --> 7, which means that the output of the feature section is a tensor of 512 channels * 7 * 7. This is the input to the "classifier" layer.
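This is easy to confirm with torchvision (a sketch; it constructs an untrained model, so no weights are downloaded):
import torch
from torchvision.models import vgg16

model = vgg16()
out = model.features(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 512, 7, 7]) -- flattened, this is 512 * 7 * 7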

Related

Automatically convert pixels to millimeters in Mathematica

I can get the drop contour through a GetDropProfile command.
However, I can't find the conversion factor from pixels to millimetres. The contour of the drop is obtained point by point from left to right, so the first ordered pair in the list gives the coordinates of the leftmost pixel and the last ordered pair gives the rightmost pixel. Since they are opposite each other they have the same y, so the difference in x between these two points is the diameter of the drop. How can I automate this conversion from pixels to millimetres, view the graph in millimetres, and smooth the contour of the discrete curve, automatically choosing how many points to the right and left to take?
Below are the image of the drop and the contour obtained, in pixels.
As posted here, assuming the axes are in millimetres, the scale can be obtained from the x-axis ticks, which can be sampled from row 33 from the bottom. As the code below shows, the left- and rightmost ticks occupy one pixel each, coloured RGB {0.4, 0.4, 0.4}. So there are 427 pixels per 80 mm.
img = Import["https://i.stack.imgur.com/GIuYq.png"];
{wd, ht} = ImageDimensions[img];
data = ImageData[img];
(* View the left- and rightmost pixel data *)
Take[data[[-33]], 20]
Take[data[[-33]], -20]
p1 = LengthWhile[data[[-33]], # == {1., 1., 1.} &];
p2 = LengthWhile[Reverse[data[[-33]]], # == {1., 1., 1.} &];
(* distance in pixels between the outermost ticks *)
p120 = wd - p1 - p2 - 1
(* 427 *)
(* Showing the sampled row in the graphic *)
data[[-33]] = ConstantArray[{1, 0, 0}, wd];
Graphics[Raster[Reverse[data]]]
You might ask about smoothing the curve at https://mathematica.stackexchange.com.

Cropping layer in my keras model has zero dimensions

I created a simple model with Keras to understand the cropping layer:
def other_model():
    x = keras.Input(shape=(64, 64, 3))
    conv = keras.layers.Conv2D(5, 2)(x)
    crop = keras.layers.Cropping2D(cropping=32)(conv)
    model = keras.Model(x, crop)
    model.summary()
    return model
But I get the following summary
Layer (type)                 Output Shape          Param #
==========================================================
input_12 (InputLayer)        (None, 64, 64, 3)     0
conv2d_21 (Conv2D)           (None, 63, 63, 5)     65
cropping2d_13 (Cropping2D)   (None, 0, 0, 5)       0
==========================================================
Total params: 65
Trainable params: 65
Non-trainable params: 0
Why are the 1st and the 2nd dimensions of Cropping2D equal to zero?
They are supposed to be 32
You can just choose the number of pixels that will be cut off at every side of your image. You chose it greater than or equal to half the size of the image, so it didn't work.
It is a bit unclear in the documentation, but if you give a single integer value (cropping=32) as parameter, it crops off 32 pixels on each side of the image.
If you have an image with 64x64 pixels and cropping=32, the target size therefore will be 0x0 pixels...
If you want to have a target size of 32x32 pixels, you have to give cropping=16
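A minimal sketch of the arithmetic (assuming standalone Keras; with TensorFlow, use from tensorflow import keras). It crops the 64x64 input directly for clarity, since the Conv2D in the question already shrinks the map to 63x63 before the crop:
import keras

x = keras.Input(shape=(64, 64, 3))
# cropping=16 removes 16 pixels from each side: 64 - 16 - 16 = 32
crop = keras.layers.Cropping2D(cropping=16)(x)
model = keras.Model(x, crop)
model.summary()  # cropping2d output shape: (None, 32, 32, 3)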

How do I compute the convolution input shape/size?

If I have the output shape, filter shape, strides and padding,
filter shape: [kernel_height, kernel_width, output_depth, input_depth]
output shape: [batch, height, width, depth]
strides=[1,1,1,1]
padding='VALID'
Can I get the input shape?
For example ,
filter shape: [3, 3, 1, 1]
output shape: [1, 1, 1, 1]
Can I compute the fixed input shape [1,3,3,1] and How ?
Do you have code to compute the shape? I'd rather not write it myself.
It's batch, height + kernel_height - 1, width + kernel_width - 1, input_depth
batch at the beginning is somewhat obvious, and so is input_depth at the end. To understand height + kernel_height - 1, consider how the kernel is applied. If your input image were, say, 10 by 10 and you applied a 3 by 3 kernel, horizontally you would apply it at positions 0, 1, ..., 7, a total of 8 different positions; similar reasoning applies to how the kernel moves vertically, resulting in an output map of size 8x8. If you generalize this, you will see that the size of the output map is width - kernel + 1 by height - kernel + 1. So if you have the size of the output map, to get the size of the input you need to invert the operation, which results in width + kernel - 1 by height + kernel - 1.
This is all only valid for padding type "VALID". If the type were "SAME", the output would be padded to match the dimensions of the input, and as such the input shape would be batch, height, width, input_depth.
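As a sketch, here is a hypothetical helper (not part of TensorFlow) that inverts the shape computation for stride 1:
def conv_input_shape(output_shape, filter_shape, padding="VALID"):
    # output_shape: [batch, height, width, depth]
    # filter_shape: [kernel_height, kernel_width, output_depth, input_depth]
    batch, h, w, _ = output_shape
    kh, kw, _, in_depth = filter_shape
    if padding == "VALID":  # stride 1: input = output + kernel - 1
        return [batch, h + kh - 1, w + kw - 1, in_depth]
    return [batch, h, w, in_depth]  # "SAME" keeps the spatial dimensions

print(conv_input_shape([1, 1, 1, 1], [3, 3, 1, 1]))  # [1, 3, 3, 1]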

How to crop rows and columns of pixels out in MATLAB

I have an image with a white border around it, and I need to get rid of the border. There are 20 rows of white pixels above the image, 5 columns of white to the left, 5 columns of white to the right, and 5 rows of white below the image. I want to crop the image exactly out of that border; how do I do this in MATLAB? Thanks for any help you can give!
(The image is a tiff, which is why I can't use an online service for this, they won't let me upload .tiff)
What you need is the built-in MATLAB function imcrop. To use it, specify something like
B = imcrop(A,[xmin ymin width height]);
if A is your original image. First find the dimensions of your image. Say it's 800 by 600. Removing 5 columns on each side and 20 rows on top plus 5 on the bottom leaves a 790 by 575 image, so those numbers (roughly) will be your width and height in the function above. Your x and y would be something like 6 and 21, respectively, the first pixels past the border (note that imcrop includes both endpoints of the rectangle, so you may need to adjust by one pixel).
You can use imcrop for this if you have the Image Processing Toolbox, or you can make a new image by indexing:
I2 = I(21:end-5, 6:end-5)
For 3 dimensions, you can use:
I2 = I(21:end-5,6:end-5,:)
For example as per your comment:
I = rand(153,1510,3);
size(I); % 153 1510 3
I2 = I(21:end-5,6:end-5,:);
size(I2); % 128 1500 3
newIm = oldIm(21:size(oldIm,1)-5, 6:size(oldIm,2)-5);

size difference in an image after downscale/upscale operation using imresize

I resized an image with a scale of 0.25, then upscaled it with a scale of 4.
imageReduced = imresize(imageOriginal, 0.25, 'nearest');
imageGenerated = imresize(imageReduced, 4, 'nearest');
I want to calculate the mean square error between imageOriginal and imageGenerated, so they must have the same height and width. But after the downscale and upscale operations the image size changes slightly because of rounding.
For example;
size of imageOriginal is 4811 x 6449 and
size of imageGenerated is 4812 x 6452
How can I make downscale and upscale operations to make imageGenerated same size with imageOriginal to calculate mean square error between them?
imresize supports resizing to a fixed number of rows and columns: imresize(img, [rows, cols]). You can use this variant for the second resize.
imageReduced = imresize(imageOriginal, 0.25, 'nearest');
imageGenerated = imresize(imageReduced, [size(imageOriginal,1) size(imageOriginal,2)], 'nearest');