ResNet18 first layer output dimensions - neural-network

I am looking at the model implementation in PyTorch. The first layer is a convolutional layer with filter size = 7, stride = 2, pad = 3. The standard input size to the network is 224x224x3. Based on these numbers, the output dimensions would be (224 + 3*2 - 7)/2 + 1 = 112.5, which is not an integer. Does the original implementation contain non-integer dimensions? I see that the network has adaptive pooling before the FC layer, so variable input dimensions aren't a problem (I tested this by varying the input size). Am I doing something wrong, or why would the authors choose a non-integer dimension when designing ResNet?

The dimensions always have to be integers. From nn.Conv2d - Shape:
H_out = ⌊(H_in + 2 × padding − dilation × (kernel_size − 1) − 1) / stride + 1⌋
The brackets that are only closed towards the bottom, ⌊ ⌋, denote the floor operation (round down). The calculation becomes:
import math
math.floor((224 + 3*2 - 7)/2 + 1) # => 112
# Or using the integer division (two slashes //)
(224 + 3*2 - 7) // 2 + 1 # => 112
Using integer division has the same effect, since it always rounds down to the nearest integer.
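You can also verify this empirically by running a dummy input through the layer (a quick sketch; assumes PyTorch is installed):
import torch
import torch.nn as nn

# ResNet18's first layer: 3 -> 64 channels, 7x7 kernel, stride 2, padding 3
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

x = torch.randn(1, 3, 224, 224)  # a standard 224x224 RGB input
print(conv1(x).shape)            # torch.Size([1, 64, 112, 112])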

Related

Convolution using 'valid' in Matlab's conv() function

Here is the example of convolution I was given:
[worked example image omitted: x zero-padded with two 0s on each side, convolved with the length-3 kernel h]
I have two questions here:
Why is the vector x padded with two 0s on each side? The length of the kernel h is 3; if x were padded with just one 0 on each side, the middle element of the kernel would always remain within the range of x. So why not one 0 on each side?
Explain the following output to me:
>> x = [1, 2, 1, 3];
>> h = [2, 0, 1];
>> y = conv(x, h, 'valid')
y =
3 8
>>
What is 'valid' doing here, in the context of the mathematics shown above on the vectors x and h?
I can't speak to the amount of zero padding that is proper .... That being said, any zero padding is making up data that is not there. This isn't necessarily wrong, but you should be aware that the values computed from it may be biased. Sometimes you care about this, sometimes you don't. Introducing 1 zero (in this case) would keep the middle kernel value always on real data, but why should that be the stopping criterion? Importantly, adding 2 zeros still leaves one product of values that are actually present in both the data and the kernel (x[0]*h[0] and x[3]*h[2], using 0-based indexing). Adding a 3rd zero (or more) would just yield zeros in the output, since 3 is the length of the kernel. In other words, zero padding will always yield an output that is partially (but not completely) based on the actual data, for any zero padding from n = 1 to n = length(h) - 1 (in this case either 1 or 2).
Even though zero padding with length 2 or 1 still involves multiplications with real data, some output values are summed over "fake" data (terms multiplied with a padded zero). In this case MATLAB gives you 3 options for how the result is returned. First, you can get the full convolution, which includes values that are biased because they include products with 0 values that aren't really in the data. Alternatively, you can get same, which means the length of the output is the length of the data: y = [4 3 8 1]. This corresponds to padding with 1 zero. Note that for longer kernels you could technically get other lengths between full and same; MATLAB just doesn't return those for you.
Finally, and probably most important to understand out of all of this, you have the valid option. In your example, only 2 samples of the output are computed from summations involving only real data (i.e. from multiplying samples of the kernel with samples of x, and not with zeros). More specifically:
y[2] = h[2]*x[0] + h[1]*x[1] + h[0]*x[2] = 1*1 + 0*2 + 2*1 = 3 // 0-based indexing as in the example
y[3] = h[2]*x[1] + h[1]*x[2] + h[0]*x[3] = 1*2 + 0*1 + 2*3 = 8
Note that none of the other y values are computed from only h and x; they all involve a padded zero, which is not necessarily indicative of the real data. For example:
y[4] = h[2]*x[2] + h[1]*x[3] + h[0]*0 <= padded zero
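For what it's worth, you can reproduce all three modes in Python as well (an illustrative sketch; numpy.convolve supports the same 'full', 'same' and 'valid' options):
import numpy as np

x = [1, 2, 1, 3]
h = [2, 0, 1]

print(np.convolve(x, h, 'full'))   # [2 4 3 8 1 3] - includes padded-zero terms
print(np.convolve(x, h, 'same'))   # [4 3 8 1]     - output length equals len(x)
print(np.convolve(x, h, 'valid'))  # [3 8]         - only fully-overlapping sums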

Quantizing an image in matlab

So I'm trying to figure out why my code doesn't seem to display the image properly uniformly quantized into 4 levels.
Q1 = uint8(zeros(ROWS, COLS, CHANNELS));
for band = 1 : CHANNELS,
    for x = 1 : ROWS,
        for y = 1 : COLS,
            Q1(ROWS,COLS,CHANNELS) = uint8(double(I1(ROWS,COLS,CHANNELS) / 2^4)*2^4);
        end
    end
end
No5 = figure;
imshow(Q1);
title('Part D: K = 4');
It is because you are not quantizing. You divide a double by 16, then multiply by 16 again, then convert it to uint8. The right way to quantize is to divide by 16, throw away any decimals (floor), then multiply by 16:
Q1 = uint8(floor(I1 / 16) * 16);
In the code snippet above I assume I1 is a double. Convert it to double if it's not: I1 = double(I1).
Note that you don't need the loops, MATLAB will apply the operation to each element in the matrix.
Note also that if I1 is an integer type, you can do something like this:
Q1 = (uint8(I1) / 16) * 16;
but this is actually equivalent to replacing the floor by round in the first example. This means you get an uneven distribution of values: 0-7 are mapped to 0, 8-23 are mapped to 16, etc. and 248-255 are all mapped to 255 (not a multiple of 16!). That is, 8 numbers are mapped to 0, and 8 are mapped to 255, instead of mapping 16 numbers to each possible multiple of 16 as the floor code does.
The 16 in the code above means that there will be 256/16=16 different grey levels in the output. If you want a different number, say n, use 256/n instead of 16.
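If you want to convince yourself of the even distribution, here is an illustrative check in Python/NumPy (a sketch; img here is just every possible 8-bit value rather than a real image):
import numpy as np

img = np.arange(256, dtype=np.uint8)            # every possible 8-bit value

q = (np.floor(img / 16) * 16).astype(np.uint8)  # floor-based quantization

levels, counts = np.unique(q, return_counts=True)
print(levels)  # [  0  16  32 ... 224 240] - 16 grey levels, all multiples of 16
print(counts)  # [16 16 ... 16] - exactly 16 input values map to each level
Replacing the floor with rounding reproduces the uneven mapping described above.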
It's because you are using ROWS, COLS, CHANNELS as your indices; they should be x, y, band. Also, the final multiplication by 2^4 has to come after the uint8 cast, otherwise no rounding ever takes place.
In practice you should avoid for loops in MATLAB, since matrix operations are much faster. Replace your code with
Q1 = uint8(double(I1 / 2^4)) * 2^4;
No5 = figure;
imshow(Q1);
title('Part D: K = 4');

Encog RBF C#, Total number of RBF neurons must be some integer to the power of 'dimensions'

I constantly get the error Total number of RBF neurons must be some integer to the power of 'dimensions' when using the method SetRBFCentersAndWidthsEqualSpacing in C#.
Can someone who is familiar with RBF networks in Encog check line 232 in RBFNetwork.cs? I think there is maybe a bug, or I am missing something:
var expectedSideLength = (int) Math.Pow(totalNumHiddenNeurons, 1.0d / dimensions);
double cmp = Math.Pow(totalNumHiddenNeurons, 1.0d / dimensions);
if (expectedSideLength != cmp) // -> error
These two variables will almost never be equal, because the (int) cast truncates the number while Math.Pow works in floating point, so the root of an exact power can come back as, e.g., 2.999.... It's a coincidence that it works for the XOR example; it won't work with a different dimension, like 19 for example.
This is how I create RBF network:
dataSet is a VersatileMLDataSet:
RBFNetwork n = new RBFNetwork(dataSet.CalculatedInputSize, dataSet.Count, 1, RBFEnum.Gaussian);
n.SetRBFCentersAndWidthsEqualSpacing(0, 1, RBFEnum.Gaussian, 2.0/(dataSet.CalculatedInputSize * dataSet.CalculatedInputSize), true);
My dataset has 19 attributes (dimensions) and 731 records.
The number of hidden neurons must be an integer raised to the power of the number of input dimensions. So if you have 3 input attributes and a window size of 2, the hidden neurons would be some integer (say 3) raised to the power of 6 (3 x 2), i.e. 729. This limits the number of input attributes and the window size, as the number of hidden neurons gets very large very quickly.
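To see why the exact comparison on line 232 is fragile, and what a robust integer-root check could look like, here is a sketch in Python (the same idea carries over to the C# code: round instead of truncate, then verify in integer arithmetic):
import math

def is_perfect_power(total, dims):
    # Is total == side**dims for some integer side?
    side = round(total ** (1.0 / dims))
    # verify in exact integer arithmetic, trying neighbours to absorb float error
    return any((side + d) ** dims == total for d in (-1, 0, 1) if side + d > 0)

total, dims = 729, 6                # 729 == 3**6
root = math.pow(total, 1.0 / dims)  # may come back as 2.999... or 3.000...01
print(int(root) == root)            # the fragile pattern from RBFNetwork.cs
print(is_perfect_power(729, 6))     # True
print(is_perfect_power(731, 19))    # False - 731 hidden neurons, 19 dimensions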

Calculating the Local Ternary Pattern of an image?

I am calculating the Local Ternary Pattern of an image. My code is given below. Am I going in the right direction or not?
function [ I3 ] = LTP(I2)
m = size(I2,1);
n = size(I2,2);
for i = 2 : m-1
    for j = 2 : n-1
        J0 = I2(i,j);
        I3(i-1,j-1) = I2(i-1,j-1) > J0;
    end
end
I2 is the image LTP is applied to.
This isn't quite correct. Here's an example of LTP given a 3 x 3 image patch and a threshold t:
[figure: worked LTP example on a 3 x 3 patch with centre intensity 34 and t = 5 - source: hindawi.com]
A pixel in the window is assigned 0 when its intensity lies between c - t and c + t, where c is the intensity of the centre pixel. Therefore, because the intensity at the centre of this window is 34 (with t = 5), the range is [29, 39]. Any values above 39 get assigned 1 and any values below 29 get assigned -1. Once you determine the ternary codes, you split the code into an upper and a lower pattern. Basically, for the upper pattern, any value that was assigned -1 becomes 0; for the lower pattern, any value that was assigned -1 becomes 1, and any value that was 1 in the original window becomes 0. The final pattern is read starting from the east location with respect to the centre (row 2, column 3), then going around counter-clockwise. Therefore, you should modify your function so that it outputs both the upper and lower patterns of your image.
Let's write a corrected version of your code. Bear in mind that I will not give an optimized version; let's get a basic algorithm working, and it'll be up to you to optimize it from there. As such, change your code to something like the below, bearing in mind all of the stuff I talked about above. BTW, your function is not defined properly: you can't use spaces in your function or variable names, as each space-separated word would be interpreted as a separate variable or function, and that's not what you want. Assuming your neighbourhood size is 3 x 3 and your image is grayscale, try something like this:
function [ ltp_upper, ltp_lower ] = LTP(im, t)
    %// Get the dimensions
    rows = size(im,1);
    cols = size(im,2);

    %// Reordering vector - Essentially for getting binary strings
    reorder_vector = [8 7 4 1 2 3 6 9];

    %// For the upper and lower LTP patterns
    ltp_upper = zeros(size(im));
    ltp_lower = zeros(size(im));

    %// For each pixel in our image, ignoring the borders...
    for row = 2 : rows - 1
        for col = 2 : cols - 1
            %// Get centre - cast to double so cen - t cannot saturate
            %// for uint8 images
            cen = double(im(row,col));

            %// Get neighbourhood - cast to double for better precision
            pixels = double(im(row-1:row+1,col-1:col+1));

            %// Get ranges and determine LTP
            out_LTP = zeros(3, 3);
            low = cen - t;
            high = cen + t;
            out_LTP(pixels < low) = -1;
            out_LTP(pixels > high) = 1;
            out_LTP(pixels >= low & pixels <= high) = 0;

            %// Get upper and lower patterns
            upper = out_LTP;
            upper(upper == -1) = 0;
            upper = upper(reorder_vector);

            lower = out_LTP;
            lower(lower == 1) = 0;
            lower(lower == -1) = 1;
            lower = lower(reorder_vector);

            %// Convert to a binary character string, then use bin2dec
            %// to get the decimal representation
            upper_bitstring = char(48 + upper);
            ltp_upper(row,col) = bin2dec(upper_bitstring);

            lower_bitstring = char(48 + lower);
            ltp_lower(row,col) = bin2dec(lower_bitstring);
        end
    end
Let's go through this code slowly. First, I get the dimensions of the image so I can iterate over each pixel, and I'm assuming that the image is grayscale. Once I do this, I allocate space to store the upper and lower LTP patterns per pixel, as we need to output these to the user. I have decided to ignore the border pixels: wherever the window around a pixel would go out of bounds, that location is skipped.
Now, for each pixel within the valid borders of the image, we extract its neighbourhood. I convert the values to double precision to allow for negative differences, as well as for better precision. I then calculate the low and high ranges and create an LTP pattern following the guidelines we talked about above.
Once I calculate the LTP pattern, I create two versions of it, upper and lower. For the upper pattern, any value of -1 is mapped to 0; for the lower pattern, any value of -1 is mapped to 1 and any value that was 1 in the original window is mapped to 0. After this, I extract the bits in the order that I laid out - starting from the east, going counter-clockwise. That's the purpose of reorder_vector: it extracts exactly those locations, which now become a 1D vector.
This 1D vector is important, as we need to convert it into a character string so that we can use bin2dec to get the decimal representation. These numbers for the upper and lower LTPs are the final output, and we place them in the corresponding positions of both output variables.
This code is untested, so it'll be up to you to debug this if it doesn't work to your specifications.
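If you'd like to sanity-check the logic outside of MATLAB, here is a rough NumPy translation of the same algorithm (an untested sketch; the first neighbour, east, becomes the most significant bit, matching reorder_vector and bin2dec above):
import numpy as np

def ltp(im, t):
    im = im.astype(np.float64)
    rows, cols = im.shape
    # Neighbour offsets: east first, then counter-clockwise,
    # same order as reorder_vector in the MATLAB code
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    upper = np.zeros((rows, cols), dtype=np.int64)
    lower = np.zeros((rows, cols), dtype=np.int64)
    cen = im[1:-1, 1:-1]  # centre pixels; borders are ignored
    for bit, (dr, dc) in enumerate(offsets):
        nb = im[1+dr : rows-1+dr, 1+dc : cols-1+dc]  # shifted neighbours
        upper[1:-1, 1:-1] |= (nb > cen + t).astype(np.int64) << (7 - bit)
        lower[1:-1, 1:-1] |= (nb < cen - t).astype(np.int64) << (7 - bit)
    return upper, lower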
Good luck!

Matlab im2col function

This is a bit more of a general question, but, no matter how many times I read the description of MATLAB's im2col function, I cannot fully understand it. I need it for computational efficiency, because MATLAB is awful with nested for loops. Here's what I'm attempting to do, but using nested for loops:
%{
[TRIMMED] = TM_FILTER(IMAGE, FILTER_SIZE, PERCENT)
Takes a 2-D array and returns the array, filtered with a
square trimmed mean filter with length/width equal to FILTER_SIZE
and percent equal to PERCENT.
%}
function [trimmed] = tm_filter(image, filter_size, percent)
if rem(filter_size, 2) == 0 %make sure filter has a center pixel
    error('filter size must be odd numbered'); %error out if the size is even
end
if percent > 100 || percent < 0
    error('Percentage must be in [0, 100]');
end
[rows, columns] = size(image); %figure out pixels needed
n = (filter_size - 1)/2; %n is the pixel distance from the center pixel to the boundaries
padded = padarray(image, [n, n], 128); %pad the boundaries so the center pixel always has a full neighborhood
for i = 1+n : rows %rows from first non-padded entry to last non-padded entry
    for j = 1+n : columns %columns from first non-padded entry to last non-padded entry
        subimage = padded(i-n:i+n, j-n:j+n); %neighborhood the same size as the filter
        average = trimmean(trimmean(subimage, percent), percent); %trimmed mean of the neighborhood, as the trimmed mean of the vector of column trimmed means
        trimmed(i-n, j-n) = average; %store the averaged pixel in the new array
    end
end
trimmed = uint8(trimmed); %convert image to gray levels 0-255
Here is the code you want: note the entire nested loop was replaced with a single statement.
%{
[TRIMMED] = TM_FILTER(IMAGE, FILTER_SIZE, PERCENT)
Takes a 2-D array and returns the array, filtered with a
square trimmed mean filter with length/width equal to FILTER_SIZE
and percent equal to PERCENT.
%}
function [trimmed] = tm_filter(image, filter_size, percent)
if rem(filter_size, 2) == 0 %make sure filter has a center pixel
    error('filter size must be odd numbered'); %error out if the size is even
end
if percent > 100 || percent < 0
    error('Percentage must be in [0, 100]');
end
%im2col needs the block size as a 2-element vector; reshape the row of
%trimmed means back into the (smaller) output image
trimmed = uint8(reshape(trimmean(im2col(image, [filter_size filter_size]), percent), ...
                        size(image) - filter_size + 1));
Explanation:
The im2col function turns each region of filter_size into a column. Your trimmean function can then operate on all of the regions (columns) in a single operation - much more efficient than extracting each shape in turn. Also note this requires only a single application of trimmean; in your original you first apply it to the columns, then again to the row of results, which causes a more severe trim than I think you intended (excluding 50% the first time, then 50% again, feels like excluding 75%; not exactly true, but you get my point). Also, you will find that changing the order of operations (rows then columns vs. columns then rows) changes the result, because the filter is nonlinear.
For example
im = reshape(1:9, [3 3]);
disp(im2col(im, [2 2]))
results in
1 2 4 5
2 3 5 6
4 5 7 8
5 6 8 9
since you took each of the 4 possible blocks of 2x2 from this matrix:
1 4 7
2 5 8
3 6 9
and turned them into columns
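If you are more at home in Python, here is an illustrative NumPy equivalent of that example (a sketch using sliding_window_view; MATLAB's column-major ordering has to be reproduced explicitly):
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

im = np.arange(1, 10).reshape(3, 3, order='F')  # same matrix as reshape(1:9,[3 3])

win = sliding_window_view(im, (2, 2))           # all 2x2 blocks, shape (2, 2, 2, 2)
# im2col flattens each block column-major and orders the blocks column-major:
cols = np.stack([win[i, j].flatten(order='F')
                 for j in range(win.shape[1])
                 for i in range(win.shape[0])], axis=1)
print(cols)
# [[1 2 4 5]
#  [2 3 5 6]
#  [4 5 7 8]
#  [5 6 8 9]]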
Note - with this technique (applied to the unpadded image) you do lose some pixels at the edge; your method added padding so that every pixel (even ones on the edge) has a complete neighborhood, and as such the filter returns an image that is the same size as the original (but it's not clear what the effect of padding/filtering will be near the margin, and especially in the corner: there, almost 75% of the pixels are fixed at 128, and that is likely to dominate the behavior).
Why im2col? Why not nlfilter?
trimmed = nlfilter( image, [filter_size filter_size], ...
    @(x) trimmean( trimmean(x, percent), percent ) );
Are you sure you process the entire image?
i and j only go up to rows and columns respectively. However, when you update trimmed you access i-n and j-n. What about the last n rows and columns?
Why do you apply trimmean twice for each block? Isn't it more appropriate to process the block at once, as in trimmean( x(:), percent)?
I believe the results of trimmean( trimmean(x, percent), percent) will be different from those of trimmean( x(:), percent). Have you given it a thought?
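To make that last point concrete, here is an illustrative numeric check in Python (scipy.stats.trim_mean cuts the given proportion from each tail, while MATLAB's trimmean(x, percent) trims percent/2 from each end, so this is a sketch of the effect rather than an exact MATLAB reproduction):
import numpy as np
from scipy.stats import trim_mean

x = np.array([[1.0, 2.0,   3.0],
              [4.0, 5.0, 100.0],
              [7.0, 8.0,   9.0]])

# column-wise trimmed means, then the trimmed mean of those
two_pass = trim_mean(trim_mean(x, 0.25, axis=0), 0.25)
# trimmed mean over all 9 values at once
one_pass = trim_mean(x.ravel(), 0.25)

print(two_pass, one_pass)  # 15.44... vs 5.4 - with only 3 values per column,
                           # nothing gets trimmed and the outlier 100 leaks through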
A small remark: it is best not to use i and j as variable names in MATLAB, since they shadow the imaginary unit.