How do I compute the convolution input shape/size? - neural-network

If I have the output shape, filter shape, strides and padding,
filter shape: [kernel_height, kernel_width, output_depth, input_depth]
output shape: [batch, height, width, depth]
strides=[1,1,1,1]
padding='VALID'
Can I get the input shape?
For example,
filter shape: [3, 3, 1, 1]
output shape: [1, 1, 1, 1]
Can I compute the fixed input shape [1, 3, 3, 1], and how?
Do you have code to compute the shape? I'd rather not write it myself if I don't have to.

It's batch, height + kernel_height - 1, width + kernel_width - 1, input_depth
batch at the beginning is obvious, and so is input_depth at the end. To understand height + kernel_height - 1, consider how the kernel is applied. If your input image were, say, 10 by 10 and you applied a 3 by 3 kernel, horizontally you would apply it at positions 0, 1, ..., 7, a total of 8 different positions; the same reasoning applies to how the kernel moves vertically, which results in an output map of size 8x8. If you generalize this, you will see that the size of the output map is width - kernel + 1, height - kernel + 1. So if you have the size of the output map and want the size of the input, you invert the operation, which gives width + kernel - 1, height + kernel - 1.
This is all only valid for padding type "VALID". If the type were "SAME", the input would be padded so that the output matches the dimensions of the input, and as such the input shape would be batch, height, width, input_depth
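A small helper based on the formula above (my own sketch, not existing library code; it assumes strides of 1 as in the question, since with larger strides the input size is not uniquely determined by the output size):

def conv_input_shape(output_shape, filter_shape, padding="VALID"):
    # output_shape: [batch, out_height, out_width, out_depth]
    # filter_shape: [kernel_height, kernel_width, output_depth, input_depth]
    batch, out_h, out_w, _ = output_shape
    k_h, k_w, _, in_depth = filter_shape
    if padding == "VALID":
        return [batch, out_h + k_h - 1, out_w + k_w - 1, in_depth]
    if padding == "SAME":
        return [batch, out_h, out_w, in_depth]
    raise ValueError("unknown padding: " + padding)

print(conv_input_shape([1, 1, 1, 1], [3, 3, 1, 1]))  # -> [1, 3, 3, 1]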

How do I fill Matrix4 with translation, skew and scale values in Flutter?

Suppose, I have these values for a container of height 200 and width 300:
scaleX = 0.9198
scaleY = 0.9198
skewX = -0.3923
skewY = 0.3923
translateX = 150
translateY = 150
Now, how do I fill these values into Matrix4 correctly?
I tried doing this:
Matrix4(
0.9198, 0, 0, 0, //
0, 0.9198, 0, 0, //
0, 0, 1, 0, //
150, 150, 0, 1,
)
which is,
Matrix4(
scaleX, 0, 0, 0, //
0, scaleY, 0, 0, //
0, 0, 1, 0, //
translateX, translateY, 0, 1,
)
But I am not sure where to put skewX and skewY values in this matrix. Please help me with this.
Skew Values
This is a bit of a nuanced topic, as it could be interpreted in a couple of different ways. There are specific cells of a matrix that are associated with specific names, as identified in your question: translate x, translate y, scale x, and scale y. In this context, you most likely mean the values from a matrix that are called skew x and skew y (also sometimes known as shear x and shear y), which refer to indices 4 and 1 (zero-based, column-major order). They're called these names because, when put into an identity matrix by themselves, they perform that operation (translate, scale, or skew), but it gets more complicated when multiple values are combined.
On the other hand, this could also be interpreted as a series of operations (e.g. scale by (0.9198, 0.9198, 1), then skew by (-0.3923, 0.3923), then translate by (150, 150, 0)), and then it's a series of matrix multiplications that would ultimately result in a similar-looking, but numerically different matrix. I'll assume you don't mean this for this question. You can read more about it here though.
You can consult the Flutter Matrix4 documentation, which also provides implementation notes for Matrix4.skewX and Matrix4.skewY. The skews are stored at (zero-based) indices 4 and 1, as the tangent of the skew angle.
Matrix4(
scaleX, skewY, 0, 0, // skewY could also be tan(ySkewAngle)
skewX, scaleY, 0, 0, // skewX could also be tan(xSkewAngle)
0, 0, 1, 0, //
translateX, translateY, 0, 1,
)
Note for those who aren't familiar with Flutter's data structures: values are stored in column-major order, which means each row in the code above is actually a column, so if you were to write the matrix as a conventional transformation matrix it would be transposed.
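To illustrate the difference mentioned above between plugging the values straight into cells and composing scale, then skew, then translate, here is a small NumPy sketch (not Flutter code; it uses 2-D homogeneous 3x3 matrices in conventional row-major layout, i.e. transposed relative to the Matrix4 constructor order):

import numpy as np

sx, sy = 0.9198, 0.9198
kx, ky = -0.3923, 0.3923          # skew entries (already tangents of the skew angles)
tx, ty = 150.0, 150.0

# Values written directly into the matrix cells:
direct = np.array([
    [sx, kx, tx],
    [ky, sy, ty],
    [0,  0,  1 ],
])

# The same values interpreted as a sequence of operations: scale, then skew, then translate.
S = np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]])
K = np.array([[1, kx, 0], [ky, 1, 0], [0, 0, 1]])
T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]])
composed = T @ K @ S

print(direct)
print(composed)  # similar-looking, but the skew entries are scaled, so numerically different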
More information:
Transformation Matrices: https://en.wikipedia.org/wiki/Transformation_matrix
How matrices are used with CSS Transforms: How do I use the matrix transform and other transform CSS properties?

How to determine which pixels cairo draws when the line width is even?

I use a cr (from Gtk's widget draw event) and I want to draw a rectangle (or a line) using an even line width (2, 4, 6, etc.) without any «context transformations». According to this, the line will be "around the path", and according to this, the line width is "the diameter of a pen that is circular".
But for a rectangle, will more of the line fall outside the path or inside? And for a line, will it be up, down, left or right?
I understand that with an odd line width, "around the path" means 1 pixel in the center and the rest equally on both sides.
But with an even line width, say 2, will 1 pixel fall inside the path and 1 outside?
Is there a stable way to determine which pixels are affected, or is it random?
The workaround of stroking everything twice, first with line width 1 and then with the remaining (odd) width, is painful and time consuming.
But in a rectangle will it be less outside and more inside or the opposite? And in a line will be up,down, left or right?
The line will have the same width on either side of the path. If this does not align with the pixel grid, you get some anti-aliased result.
I understand that in an odd line width, "around the path" means 1 in the center and the rest are equally around. But in an even line width, as when the line width is 2, will be 1 pixel inside the path or outside?
With a line width of 3, 1.5 pixels will be drawn on either side of the path.
With a line width of 4, 2 pixels are drawn on either side of the path.
Perhaps the following example makes this clearer. This is written in Lua and uses LGI as cairo bindings for Lua, but this maps directly to the C API:
local cairo = require("lgi").cairo
s = cairo.ImageSurface(cairo.Format.RGB24, 100, 30)
cr = cairo.Context(s)
cr:set_source_rgb(1, 1, 1)
cr:paint()
cr:set_source_rgb(0, 0, 0)
cr:set_line_width(2)
cr:rectangle(5, 10, 5, 5)
cr:stroke()
cr:set_line_width(6)
cr:rectangle(15, 10, 14, 14)
cr:stroke()
cr:set_line_width(7)
cr:rectangle(40.5, 10.5, 14, 14)
cr:stroke()
cr:set_line_width(7)
cr:rectangle(70, 10, 14, 14)
cr:stroke()
s:write_to_png("out.png")
The resulting image (not reproduced here) shows the four stroked rectangles described below.
The first rectangle has a line width of 2. It is drawn with integer coordinates, so that there is e.g. a line from (5, 10) to (10, 10) (the top line). Half the line width is drawn on either side of the line (and the line join extends it at the corners), so this line corresponds to a "filled rectangle" from (4, 9) to (11, 11).
The last rectangle has a line width of 7 and is also drawn with integer coordinates. Its left line goes from (70, 10) to (70, 24). Since half the line width is on either side of the line, the "filled rectangle" goes from (66.5, 6.5) to (73.5, 27.5). These numbers are not integers and you can see in the result that some anti-aliasing was applied.
In contrast, the second to last rectangle has its position shifted by 0.5. This causes the "filled rectangle" for its "top line" to end up on the pixel grid again.
See also this FAQ entry: https://www.cairographics.org/FAQ/#sharp_lines
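Putting that rule into practice, here is a minimal pycairo sketch (Python bindings rather than the Lua above) that produces crisp strokes for both an even and an odd line width:

import cairo

surface = cairo.ImageSurface(cairo.FORMAT_RGB24, 100, 30)
cr = cairo.Context(surface)
cr.set_source_rgb(1, 1, 1)
cr.paint()
cr.set_source_rgb(0, 0, 0)

# Even line width: keep the path on integer coordinates so that
# width / 2 pixels fall on each side and land exactly on the pixel grid.
cr.set_line_width(2)
cr.rectangle(5, 10, 5, 5)
cr.stroke()

# Odd line width: shift the path by 0.5 so the extra half pixel
# on each side still ends up on the pixel grid.
cr.set_line_width(7)
cr.rectangle(40.5, 10.5, 14, 14)
cr.stroke()

surface.write_to_png("out.png")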

Why does VGG-16 take an input size of 512 * 7 * 7?

According to https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py
I don't understand why VGG models use 512 * 7 * 7 as the input size of the first fully-connected layer.
The last layers of the feature extractor are
nn.Conv2d(512, 512, kernel_size=3, padding=1),
nn.ReLU(True),
nn.MaxPool2d(kernel_size=2, stride=2, dilation=1)
Code from the above link:
class VGG(nn.Module):

    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
To understand this you have to know how the convolution operator works for CNNs.
nn.Conv2d(512, 512, kernel_size=3, padding=1) means that the input to that convolution has 512 channels and that the output after the convolution will also have 512 channels. The input is convolved with a kernel of size 3x3 that moves as a sliding window. Finally, padding=1 means that before applying the convolution, we symmetrically add zeroes to the edges of the input matrix.
In your example, you can think of 512 as the depth, while 7x7 is the width and height obtained by applying several convolutions and poolings. Imagine that we have an image with some width and height and we feed it to a convolution; the resulting size will be
owidth = floor(((width + 2*padW - kW) / dW) + 1)
oheight = floor(((height + 2*padH - kH) / dH) + 1)
where width and height are the original sizes, padW and padH are the horizontal and vertical padding, kW and kH are the kernel sizes, and dW and dH are the horizontal and vertical strides, i.e. the number of pixels the kernel moves at each step (if dW=1 the kernel is first at pixel (0,0), then moves to (1,0), and so on).
Usually the first convolution operator in a CNN looks like nn.Conv2d(3, D, kernel_size=3, padding=1), because the original image has 3 input channels (RGB). Assuming the input image has a size of 256x256x3 pixels, applying that operator produces an output with the same width and height as the input but with depth D. Similarly, if we define the convolution as c = nn.Conv2d(3, 15, kernel_size=25, padding=0, stride=5), i.e. kernel_size=25, no padding and stride=5 (dW=dH=5, which means the kernel moves 5 pixels each time: from (0,0) to (5,0) and so on until it reaches the end of the image on the x-axis, then to (0,5) -> (5,5) -> (10,5) and so on), the resulting output will have a size of 47x47x15.
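A tiny helper to evaluate that formula (my own sketch, not from the linked source):

import math

def conv_output_size(size, kernel, padding=0, stride=1):
    # owidth = floor(((width + 2*padW - kW) / dW) + 1), the formula above
    return math.floor((size + 2 * padding - kernel) / stride) + 1

print(conv_output_size(256, kernel=25, padding=0, stride=5))  # 47
print(conv_output_size(224, kernel=3, padding=1, stride=1))   # 224: a VGG 3x3 conv keeps the size
print(conv_output_size(224, kernel=2, padding=0, stride=2))   # 112: MaxPool2d(2, stride=2) halves it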
The VGG network has two sections of layers: the "features" section and the "classifier" section. The input to the features section is always an image of size 224 x 224 pixels.
The features section contains 5 nn.MaxPool2d(kernel_size=2, stride=2) pooling layers. See line 76 of the referenced source code: each 'M' character in the configurations sets up one MaxPool2d layer.
A MaxPool2d layer with these parameters halves the spatial size of the tensor. So we have 224 --> 112 --> 56 --> 28 --> 14 --> 7, which means that the output of the features section is a tensor with 512 channels of size 7 x 7. This is the input to the "classifier" section.
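You can verify that shape directly with a quick sketch (uses torchvision; the exact constructor arguments may differ slightly between versions):

import torch
from torchvision import models

vgg = models.vgg16()                      # untrained weights are fine for a shape check
x = torch.randn(1, 3, 224, 224)           # one 224x224 RGB image
features = vgg.features(x)
print(features.shape)                     # torch.Size([1, 512, 7, 7])
print(features.flatten(1).shape)          # torch.Size([1, 25088]), i.e. 512 * 7 * 7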

GtkPaned widget for arbitrary geometries?

I would like my application window to be divided into rectangles with sides perpendicular to the window borders. The number of rectangles would normally be quite large, and the user should be able to resize the rectangles.
Is there a Gtk widget which would allow for that? GtkPaned comes close - by embedding several GtkPaned widgets one can get such rectangle divisions, but not all of them are possible - one obvious constraint is that there must be an edge which spans the whole window either horizontally or vertically. The simplest arrangement I know of which doesn't have this property, and so can't be built with GtkPaned, is: one square in the middle and four rectangles of the same size each, around the square.
Is there a widget which allows for such arbitrary resizable rectangle arrangements in Gtk?
If you don't have to have the dividing lines draggable, then use a GtkGrid:
grid = Gtk.Grid()
grid.attach(widget1, 0, 0, 3, 1)
grid.attach(widget2, 3, 0, 1, 3)
grid.attach(widget3, 0, 1, 1, 3)
grid.attach(widget4, 1, 3, 3, 1)
grid.attach(widget5, 1, 1, 2, 2)
# 1112
# 3552
# 3552
# 3444
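For completeness, here is a minimal runnable PyGObject (GTK 3) sketch of the same layout, with placeholder buttons standing in for widget1 ... widget5:

import gi
gi.require_version("Gtk", "3.0")
from gi.repository import Gtk

win = Gtk.Window(title="Grid layout")
grid = Gtk.Grid(column_homogeneous=True, row_homogeneous=True)
win.add(grid)

# Placeholder children; in a real application these would be your own widgets.
widget1, widget2, widget3, widget4, widget5 = [Gtk.Button(label=str(i)) for i in range(1, 6)]

grid.attach(widget1, 0, 0, 3, 1)
grid.attach(widget2, 3, 0, 1, 3)
grid.attach(widget3, 0, 1, 1, 3)
grid.attach(widget4, 1, 3, 3, 1)
grid.attach(widget5, 1, 1, 2, 2)

win.connect("destroy", Gtk.main_quit)
win.show_all()
Gtk.main()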

Matlab gradient equivalent in OpenCV

I am trying to migrate some code from Matlab to OpenCV and need an exact replica of the gradient function. I have tried the cv::Sobel function, but for some reason the values in the resulting cv::Mat are not the same as the values in the Matlab version. I need the X and Y gradients in separate matrices for further calculations.
Any workaround that could achieve this would be great.
cv::Sobel with its default 3x3 kernel also smooths in the direction perpendicular to the derivative, so its values do not match Matlab's gradient. What we want is the plain central difference
(f(i+1,j) - f(i-1,j)) / 2
So we need to apply
Mat kernelx = (Mat_<float>(1,3)<<-0.5, 0, 0.5);
Mat kernely = (Mat_<float>(3,1)<<-0.5, 0, 0.5);
filter2D(src, fx, -1, kernelx);
filter2D(src, fy, -1, kernely);
Matlab treats border pixels differently from inner pixels, so the code above is wrong at the border values. One can use BORDER_CONSTANT to extend the border with a constant value, but unfortunately OpenCV fixes that constant to -1 and it cannot be changed to 0 (which is what we want).
So as for the border values, I do not have a very neat answer; you may just have to compute the first derivative at the borders by hand...
You have to call Sobel 2 times, with arguments:
xorder = 1, yorder = 0
and
xorder = 0, yorder = 1
You have to select the appropriate kernel size.
See the documentation.
It might still be that the Matlab implementation is different; ideally you should find out which kernel was used there...
Edit:
If you need to specify your own kernel, you can use the more generic filter2D. Your destination depth will be CV_16S (16-bit signed).
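In OpenCV's Python bindings the two calls would look roughly like this (the input file name is only a placeholder):

import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # placeholder input image
dx = cv2.Sobel(img, cv2.CV_16S, 1, 0, ksize=3)         # xorder = 1, yorder = 0
dy = cv2.Sobel(img, cv2.CV_16S, 0, 1, ksize=3)         # xorder = 0, yorder = 1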
Matlab computes the gradient differently for interior rows and border rows (the same is true for the columns of course). At the borders, it is a simple forward difference gradY(1) = row(2) - row(1). The gradient for interior rows is computed by the central difference gradY(2) = (row(3) - row(1)) / 2.
I think you cannot achieve the same result by running a single convolution filter over the whole matrix in OpenCV. Use cv::Sobel() with ksize = 1 (note that this uses the kernel [-1 0 1], so the result is twice Matlab's central difference), then treat the borders separately (either manually or by applying a [1 -1] filter).
Pei's answer is partly correct. Matlab uses these calculations for the borders:
G(:,1) = A(:,2) - A(:,1);
G(:,N) = A(:,N) - A(:,N-1);
so I used the following OpenCV code to compute the complete gradient:
// Interior pixels: central difference (f(i+1) - f(i-1)) / 2
static cv::Mat kernelx = (cv::Mat_<double>(1, 3) << -0.5, 0, 0.5);
static cv::Mat kernely = (cv::Mat_<double>(3, 1) << -0.5, 0, 0.5);
cv::Mat fx, fy;
cv::filter2D(Image, fx, -1, kernelx, cv::Point(-1, -1), 0, cv::BORDER_REPLICATE);
cv::filter2D(Image, fy, -1, kernely, cv::Point(-1, -1), 0, cv::BORDER_REPLICATE);
// With BORDER_REPLICATE the filter produces (f(1) - f(0)) / 2 at the borders,
// so doubling the first/last column and row gives Matlab's one-sided differences.
fx.col(fx.cols - 1) *= 2;
fx.col(0) *= 2;
fy.row(fy.rows - 1) *= 2;
fy.row(0) *= 2;
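For reference, here is the same approach in Python (a sketch, not from the original answer), checked against numpy.gradient, which uses the same interior/border scheme as Matlab's gradient:

import cv2
import numpy as np

img = np.random.rand(5, 7)                       # any 2-D double image will do

kernelx = np.array([[-0.5, 0.0, 0.5]])           # 1x3 horizontal kernel
kernely = np.array([[-0.5], [0.0], [0.5]])       # 3x1 vertical kernel

fx = cv2.filter2D(img, -1, kernelx, borderType=cv2.BORDER_REPLICATE)
fy = cv2.filter2D(img, -1, kernely, borderType=cv2.BORDER_REPLICATE)

# With a replicated border the filter gives (f(1) - f(0)) / 2 at the edges,
# so doubling the first and last column/row yields the one-sided differences.
fx[:, 0] *= 2
fx[:, -1] *= 2
fy[0, :] *= 2
fy[-1, :] *= 2

gy, gx = np.gradient(img)                        # same scheme as Matlab's gradient
print(np.allclose(fx, gx), np.allclose(fy, gy))  # True True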
Jorrit's answer is partly correct.
In some cases the directional derivative is negative. Matlab keeps these negative values, but if the destination cv::Mat has an unsigned depth (such as CV_8U), OpenCV clips them to 0, which is another reason to use a signed depth like CV_16S or CV_32F.