Getting the output shape of a deconvolution layer using tf.nn.conv2d_transpose in TensorFlow

According to this paper, the output shape is N + H − 1, where N is the input height or width and H is the kernel height or width. This is the obvious inverse process of convolution. This tutorial gives a formula to calculate the output shape of convolution, which is (W−F+2P)/S+1, where W is the input size, F the filter size, P the padding size, and S the stride. But in TensorFlow, there are test cases like:
strides = [1, 2, 2, 1]
# Input, output: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 12, 8, 2]
# Filter: [kernel_height, kernel_width, output_depth, input_depth]
f_shape = [3, 3, 2, 3]
So we use y_shape, f_shape and x_shape, according to the formula (W−F+2P)/S+1, to calculate the padding size P. From (12 − 3 + 2P) / 2 + 1 = 6, we get P = 0.5, which is not an integer. How does deconvolution work in TensorFlow?

For deconvolution,
output_size = strides * (input_size - 1) + kernel_size - 2*padding
where strides, input_size, kernel_size, and padding are all integers, and padding is zero for 'VALID'.
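That formula is easy to sanity-check in plain Python (a minimal sketch; the helper name is mine, not a TensorFlow API; for 'SAME' the documented rule is simply strides * input_size):

def deconv_output_size(input_size, kernel_size, strides, padding):
    # Transposed-convolution output size, per the formula above.
    if padding == 'VALID':
        return strides * (input_size - 1) + kernel_size  # padding = 0
    elif padding == 'SAME':
        return strides * input_size

print(deconv_output_size(6, 3, 2, 'SAME'))   # 12, matches y_shape in the question
print(deconv_output_size(6, 3, 2, 'VALID'))  # 13, matches the 'VALID' test case below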

The formula for the output size from the tutorial assumes that the padding P is the same before and after the image (left & right or top & bottom).
Then, the number of places in which you put the kernel is:
W (size of the image) - F (size of the kernel) + P (additional padding before) + P (additional padding after), plus 1 for the starting position; stepping with stride S, this gives the (W−F+2P)/S+1 above.
But TensorFlow also handles the situation where you need to pad more pixels to one of the sides than to the other, so that the kernel fits correctly. You can read more about the strategies for choosing the padding ("SAME" and "VALID") in the docs. The test you're talking about uses the method "VALID".
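For reference, the documented output-size rules for the two modes of forward convolution can be written as a small helper (the function name is mine, not a TensorFlow API):

import math

def conv_output_size(in_size, filter_size, stride, padding):
    # Output-size rules as documented by TensorFlow.
    if padding == 'SAME':
        return int(math.ceil(in_size / float(stride)))
    elif padding == 'VALID':
        return int(math.ceil((in_size - filter_size + 1) / float(stride)))

print(conv_output_size(12, 3, 2, 'SAME'))   # 6: forward conv maps the 12-pixel deconv output back to 6
print(conv_output_size(13, 3, 2, 'VALID'))  # 6: used in the 'VALID' example further below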

This discussion is really helpful. I'll just add some additional information.
padding='SAME' can also let the bottom and right sides get the one additional padded pixel. According to the TensorFlow documentation, the test case below
strides = [1, 2, 2, 1]
# Input, output: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 12, 8, 2]
# Filter: [kernel_height, kernel_width, output_depth, input_depth]
f_shape = [3, 3, 2, 3]
is using padding='SAME'. We can interpret padding='SAME' as:
(W−F+pad_along_height)/S+1 = out_height,
(W−F+pad_along_width)/S+1 = out_width.
So (12 - 3 + pad_along_height) / 2 + 1 = 6, and we get pad_along_height = 1. Then pad_top = pad_along_height // 2 = 0 (integer division) and pad_bottom = pad_along_height - pad_top = 1.
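As a quick sanity check of that split, following the formulas in the TensorFlow documentation (values from the test case above):

in_height, filter_height, stride = 12, 3, 2
out_height = -(-in_height // stride)                # ceil(12 / 2) = 6
pad_along_height = max((out_height - 1) * stride
                       + filter_height - in_height, 0)  # = 1
pad_top = pad_along_height // 2                     # = 0 (integer division)
pad_bottom = pad_along_height - pad_top             # = 1
print(out_height, pad_along_height, pad_top, pad_bottom)  # 6 1 0 1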
As for padding='VALID', as the name suggests, padding is used only when it is needed: we start by assuming no padding at all, and zero padding is added only where the kernel would reach outside the original input image region. For example, in the test case below,
strides = [1, 2, 2, 1]
# Input, output: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 13, 9, 2]
# Filter: [kernel_height, kernel_width, output_depth, input_depth]
f_shape = [3, 3, 2, 3]
The output shape of conv2d is
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
           = ceil(float(13 - 3 + 1) / float(2)) = ceil(11/2) = 6
           = (W−F)/S + 1.
Because (W−F)/S + 1 = (13−3)/2 + 1 = 6 comes out as an integer, we don't need to add any zero pixels around the border of the image, and pad_top and pad_left in the padding='VALID' section of the TensorFlow documentation are all 0.
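To confirm, here is a quick shape check of this 'VALID' test case with the TF 1.x API (a sketch; the inputs are random, only the shapes matter):

import tensorflow as tf

x = tf.random_normal([2, 6, 4, 3])
f = tf.random_normal([3, 3, 2, 3])  # [height, width, output_depth, input_depth]
y = tf.nn.conv2d_transpose(x, f, output_shape=[2, 13, 9, 2],
                           strides=[1, 2, 2, 1], padding='VALID')
print(y.shape)  # (2, 13, 9, 2)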

Numpy equivalent to MATLAB's hist

For some reason NumPy's hist always returns one fewer bin than MATLAB's hist:
for example in MATLAB:
x = [1,2,2,2,1,4,4,2,3,3,3,3];
[Rep,Val] = hist(x,unique(x));
gives:
Rep = [2 4 4 2]
Val = [1 2 3 4]
but in Numpy:
import numpy as np
x = np.array([1,2,2,2,1,4,4,2,3,3,3,3])
Rep, Val = np.histogram(x,np.unique(x))
gives:
>>>Rep
array([2, 4, 6])
>>>Val
array([1, 2, 3, 4])
How can I get results identical to MATLAB's?
Based on dilayapici's answer on this post, a general solution (applied to your example) for running Python's np.histogram the same way as MATLAB's hist is the following:
import numpy as np

x = np.array([1,2,2,2,1,4,4,2,3,3,3,3])
# Convert the bin centers given in MATLAB to the bin edges needed in Python.
numBins = len(np.unique(x))
bins = np.linspace(np.amin(x), np.amax(x), numBins)
# Append '+inf' as the last edge so the final bin collects the maximum values.
bins = np.concatenate((bins, [np.inf]))
Rep, Val = np.histogram(x, bins)
Output:
Rep
array([2, 4, 4, 2], dtype=int64)
First, I want to explain this problem.
In Python it works like this:
np.unique(x) = [1, 2, 3, 4], so:
The first bin is equal to [1, 2) (including 1, but excluding 2) and therefore ==> Rep[0]=2
The second bin is equal to [2, 3) (including 2, but excluding 3) and therefore ==> Rep[1]=4
The last bin is equal to [3, 4], which includes both 3 and 4, because np.histogram's last bin is closed on both sides. Therefore ==> Rep[2] = 6
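That inclusive last edge is easy to verify directly (a minimal check with the values from the question):

import numpy as np

x = np.array([1,2,2,2,1,4,4,2,3,3,3,3])
# With edges [1, 2, 3, 4], the last bin [3, 4] is closed on both sides,
# so the four 3s and the two 4s are counted together.
Rep, Val = np.histogram(x, np.unique(x))
print(Rep)  # [2 4 6]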
In MATLAB, the hist() function works like this:
The first bin is equal to [1, 2) (including 1, but excluding 2) and therefore ==> Rep[0]=2
The second bin is equal to [2, 3) (including 2, but excluding 3) and therefore ==> Rep[1]=4
The third bin is equal to [3, 4) (including 3, but excluding 4) and therefore ==> Rep[2]=4
The last bin is equal to [4, ∞) and therefore ==> Rep[3]=2
Now, if you want the same result in MATLAB as Python produces, you have to use a different MATLAB function: the histogram() function, where we can decide the number of bins.
x = [1,2,2,2,1,4,4,2,3,3,3,3];
nbins = 3;
h = histogram(x, nbins);
h.Values
You can see h.Values equals [2,4,6].
I hope I could help :)

Find the largest index of the minimum in Matlab

I have an array of positive numbers and there are some duplicates. I want to find the largest index of the minimum value.
For example, if a=[2, 3, 1, 1, 4, 1, 3, 2, 1, 5, 5] then [v, i] = min(a) returns i=3; however, I want i=9.
Using find and min.
A = [2, 3, 1, 1, 4, 1, 3, 2, 1, 5, 5];
minA = min(A);
maxIndex = max(find(A==minA));
min gets the minimum value, find returns the indices of the values that meet the condition A==minA, and max returns the largest of those indices.
Here's a different idea, which only requires one function, sort:
[~,y] = sort(a,'descend');
i = y(end)
i =
     9
MATLAB's sort keeps equal elements in their original order, so after a descending sort the minimum value with the largest original index ends up last.
You can use imregionalmin as well, with time complexity O(n):
largestMinIndex = find(imregionalmin(A),1,'last');

What does tf.nn.conv2d do in tensorflow?

I was looking at the TensorFlow docs for tf.nn.conv2d here. But I can't understand what it does or what it is trying to achieve. It says in the docs,
#1 : Flattens the filter to a 2-D matrix with shape
[filter_height * filter_width * in_channels, output_channels].
Now what does that do? Is that element-wise multiplication or just plain matrix multiplication? I also could not understand the other two points mentioned in the docs. I have written them below :
# 2: Extracts image patches from the input tensor to form a virtual tensor of shape
[batch, out_height, out_width, filter_height * filter_width * in_channels].
# 3: For each patch, right-multiplies the filter matrix and the image patch vector.
It would be really helpful if anyone could give an example, a piece of code (extremely helpful) maybe and explain what is going on there and why the operation is like this.
I've tried coding a small portion and printing out the shape of the operation. Still, I can't understand.
I tried something like this:
op = tf.shape(tf.nn.conv2d(tf.random_normal([1,10,10,10]),
                           tf.random_normal([2,10,10,10]),
                           strides=[1, 2, 2, 1], padding='SAME'))
with tf.Session() as sess:
    result = sess.run(op)
    print(result)
I understand bits and pieces of convolutional neural networks. I studied them here. But the implementation on tensorflow is not what I expected. So it raised the question.
EDIT:
So, I implemented a much simpler piece of code. But I can't figure out what's going on. I mean, how do the results come out like this? It would be extremely helpful if anyone could tell me what process yields this output.
input = tf.Variable(tf.random_normal([1,2,2,1]))
filter = tf.Variable(tf.random_normal([1,1,1,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    print("input")
    print(input.eval())
    print("filter")
    print(filter.eval())
    print("result")
    result = sess.run(op)
    print(result)
output
input
[[[[ 1.60314465]
[-0.55022103]]
[[ 0.00595062]
[-0.69889867]]]]
filter
[[[[-0.59594476]]]]
result
[[[[-0.95538563]
[ 0.32790133]]
[[-0.00354624]
[ 0.41650501]]]]
Ok I think this is about the simplest way to explain it all.
Your example is 1 image, size 2x2, with 1 channel. You have 1 filter, with size 1x1, and 1 channel (size is height x width x channels x number of filters).
For this simple case the resulting 2x2, 1 channel image (size 1x2x2x1, number of images x height x width x channels) is the result of multiplying the filter value by each pixel of the image.
Now let's try more channels:
input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([1,1,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
Here the 3x3 image and the 1x1 filter each have 5 channels. The resulting image will be 3x3 with 1 channel (size 1x3x3x1), where the value of each pixel is the dot product across channels of the filter with the corresponding pixel in the input image.
Now with a 3x3 filter
input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
Here we get a 1x1 image, with 1 channel (size 1x1x1x1). The value is the sum of nine 5-element dot products. But you could just call this a 45-element dot product.
Now with a bigger image
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
The output is a 3x3 1-channel image (size 1x3x3x1).
Each of these values is a sum of nine 5-element dot products.
Each output is made by centering the filter on one of the 9 center pixels of the input image, so that none of the filter sticks out. The xs below represent the filter centers for each output pixel.
.....
.xxx.
.xxx.
.xxx.
.....
Now with "SAME" padding:
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
This gives a 5x5 output image (size 1x5x5x1). This is done by centering the filter at each position on the image.
Any of the 5-element dot products where the filter sticks out past the edge of the image get a value of zero.
So the corners are sums of only four of the 5-element dot products.
Now with multiple filters.
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
This still gives a 5x5 output image, but with 7 channels (size 1x5x5x7), where each channel is produced by one of the filters in the set.
Now with strides 2,2:
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')
Now the result still has 7 channels, but is only 3x3 (size 1x3x3x7).
This is because instead of centering the filters at every point on the image, the filters are centered at every other point on the image, taking steps (strides) of width 2. The x's below represent the filter center for each output pixel, on the input image.
x.x.x
.....
x.x.x
.....
x.x.x
And of course the first dimension of the input is the number of images so you can apply it over a batch of 10 images, for example:
input = tf.Variable(tf.random_normal([10,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')
This performs the same operation for each image independently, giving a stack of 10 images as the result (size 10x3x3x7).
2D convolution is computed in a similar way to 1D convolution: you slide your kernel over the input, calculate the element-wise multiplications, and sum them up. But instead of your kernel/input being an array, here they are matrices.
In the most basic example there is no padding and stride=1. Let's assume your input and kernel are (the 4x4 input and 3x3 kernel used in the code below):
input:            kernel:
4 3 1 0           1 0 1
2 1 0 1           2 1 0
1 2 4 1           0 0 1
3 1 0 2
When you use your kernel you will receive the following output:
14  6
 6 12
which is calculated in the following way:
14 = 4 * 1 + 3 * 0 + 1 * 1 + 2 * 2 + 1 * 1 + 0 * 0 + 1 * 0 + 2 * 0 + 4 * 1
6 = 3 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 0 * 1 + 1 * 0 + 2 * 0 + 4 * 0 + 1 * 1
6 = 2 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 2 * 1 + 4 * 0 + 3 * 0 + 1 * 0 + 0 * 1
12 = 1 * 1 + 0 * 0 + 1 * 1 + 2 * 2 + 4 * 1 + 1 * 0 + 1 * 0 + 0 * 0 + 2 * 1
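If it helps, the same sliding-window arithmetic can be checked in plain NumPy before bringing in TF (a small sketch; stride 1, no padding):

import numpy as np

i = np.array([[4, 3, 1, 0],
              [2, 1, 0, 1],
              [1, 2, 4, 1],
              [3, 1, 0, 2]])
k = np.array([[1, 0, 1],
              [2, 1, 0],
              [0, 0, 1]])
# Element-wise multiply each 3x3 patch by the kernel and sum the products.
out = np.array([[(i[r:r+3, c:c+3] * k).sum() for c in range(2)]
                for r in range(2)])
print(out)  # [[14  6]
            #  [ 6 12]]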
TF's conv2d function calculates convolutions in batches and uses a slightly different format. For the input it is [batch, in_height, in_width, in_channels]; for the kernel it is [filter_height, filter_width, in_channels, out_channels]. So we need to provide the data in the correct format:
import tensorflow as tf

k = tf.constant([
    [1, 0, 1],
    [2, 1, 0],
    [0, 0, 1]
], dtype=tf.float32, name='k')
i = tf.constant([
    [4, 3, 1, 0],
    [2, 1, 0, 1],
    [1, 2, 4, 1],
    [3, 1, 0, 2]
], dtype=tf.float32, name='i')
kernel = tf.reshape(k, [3, 3, 1, 1], name='kernel')
image = tf.reshape(i, [1, 4, 4, 1], name='image')
Afterwards the convolution is computed with:
res = tf.squeeze(tf.nn.conv2d(image, kernel, [1, 1, 1, 1], "VALID"))
# VALID means no padding
with tf.Session() as sess:
    print(sess.run(res))
And will be equivalent to the one we calculated by hand.
For examples with padding/strides, take a look here.
Just to add to the other answers, you should think of the parameters in
filter = tf.Variable(tf.random_normal([3,3,5,7]))
as '5' corresponding to the number of channels in each filter. Each filter is a 3D cube, with a depth of 5. Your filter depth must match your input image's depth. The last parameter, 7, should be thought of as the number of filters in the batch. Just forget about this being 4D, and instead imagine that you have a set, or a batch, of 7 filters. What you do is create 7 filter cubes with dimensions (3,3,5).
It is a lot easier to visualize in the Fourier domain since convolution becomes point-wise multiplication. For an input image of dimensions (100,100,3) you can rewrite the filter dimensions as
filter = tf.Variable(tf.random_normal([100,100,3,7]))
In order to obtain one of the 7 output feature maps, we simply perform the point-wise multiplication of the filter cube with the image cube, then sum the results across the channels/depth dimension (here it's 3), collapsing to a 2D (100,100) feature map. Do this with each filter cube, and you get 7 2D feature maps.
I tried to implement conv2d myself (for my studies). Well, I wrote this:
def conv(ix, w):
    # filter shape: [filter_height, filter_width, in_channels, out_channels]
    # flatten filters
    filter_height = int(w.shape[0])
    filter_width = int(w.shape[1])
    in_channels = int(w.shape[2])
    out_channels = int(w.shape[3])
    ix_height = int(ix.shape[1])
    ix_width = int(ix.shape[2])
    ix_channels = int(ix.shape[3])
    filter_shape = [filter_height, filter_width, in_channels, out_channels]
    flat_w = tf.reshape(w, [filter_height * filter_width * in_channels, out_channels])
    # Extract one flattened patch per output position (step #2 from the docs).
    patches = tf.extract_image_patches(
        ix,
        ksizes=[1, filter_height, filter_width, 1],
        strides=[1, 1, 1, 1],
        rates=[1, 1, 1, 1],
        padding='SAME'
    )
    patches_reshaped = tf.reshape(patches, [-1, ix_height, ix_width, filter_height * filter_width * ix_channels])
    feature_maps = []
    # One patch-vector dot product per output channel (step #3 from the docs).
    for i in range(out_channels):
        feature_map = tf.reduce_sum(tf.multiply(flat_w[:, i], patches_reshaped), axis=3, keep_dims=True)
        feature_maps.append(feature_map)
    features = tf.concat(feature_maps, axis=3)
    return features
I hope I did it properly. I checked it on MNIST and got very close results (but this implementation is slower). I hope this helps you.
In addition to the other answers, the conv2d operation runs C++ (CPU) or CUDA (GPU) kernels that flatten and reshape the data in a certain way and then use BLAS gemm or cuBLAS (CUDA) matrix multiplication.
It performs convolution over the picture when you are doing, for example, image classification; this function has all the parameters needed for that. You can basically choose the filter dimensions, strides, and padding. Before using it, you need to understand the concepts of convolution.
This explanation complements:
Keras Conv2d own filters
I had some doubts about the filter parameter in keras.conv2d, because when I was learning I thought I was supposed to set my own filter design. But this parameter tells how many filters to use, and Keras itself will try to find the best filter weights.
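For instance (a hedged sketch using the TF 2.x Keras API), you only choose how many filters and their spatial size; the weight values themselves are learned:

import tensorflow as tf

layer = tf.keras.layers.Conv2D(filters=7, kernel_size=(3, 3))
layer.build(input_shape=(None, 5, 5, 5))
# Kernel shape: height x width x in_channels x number_of_filters.
print(layer.kernel.shape)  # (3, 3, 5, 7)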

Sequence in MATLAB

Write a single MATLAB expression to generate a vector that contains first 100 terms of the following sequence: 2, -4, 8, -16, 32, …
My attempt :
n = -1
for i = 1:100
    n = n * 2
    disp(n)
end
The problem is that the values of n are not displayed as a single (1 x 100) vector, and the alternating positive and negative terms are not produced. How can I do that?
You have a geometric series where r = -2.
To produce 2, -4, 8, -16, 32, type this:
>> -(-2).^[1:5]
ans =
     2    -4     8   -16    32
You can change the value of 5 accordingly (1:100 gives the first 100 terms).
Though there are better methods, as mentioned in the answer by @lakesh, I will point out the mistakes in your code.
By typing n = n * 2, how can n become a vector?
Also, by doing n = n * 2 starting from n = -1, you are going to generate -2, -4, -8, -16, ...
Therefore, the correct code should be:
n = -1;
for i = 2:101 % 1 extra term since the first term has to be discarded later
    n(i) = -n(i-1) * 2;
end
disp(n)
You can discard the first element of n to get the exact series you want:
n(1) = [];

How to get a regular sampled matrix in Scilab

I'm trying to program a function (or, even better, find one that already exists) in Scilab that computes regularly timed samples of values.
I.e.: I have a vector 'values' which contains the value of a signal at different times. These times are in the vector 'times'. So at time times(N), the signal has value values(N).
At the moment the times are not regular, so the variables 'times' and 'values' can look like:
times = [0, 2, 6, 8, 14]
values= [5, 9, 10, 1, 6]
This means that the signal had value 5 from second 0 to second 2, value 9 from second 2 to second 6, etc.
Therefore, if I want to calculate the signal's average value, I cannot just average the vector 'values': for example, the signal can keep the same value for a long time, yet that stretch contributes only one entry to the vector.
One option is to weight each value by its deltaT to calculate the mean, but I will also need to perform other calculations: averages, etc.
Another option is to create a function that, given a deltaT, samples the times and values vectors to produce an equally spaced time vector and the corresponding values. For example, with deltaT=2 and the previous vectors,
[sampledTime, sampledValues] = regularSample(times, values, 2)
sampledTime = [0, 2, 4, 6, 8, 10, 12, 14]
sampledValues = [5, 9, 9, 10, 1, 1, 1, 6]
This is easy if deltaT is small enough to fit exactly with all the times. If the deltaT is bigger, then the average of values or some approximation must be done...
Is there anything already done in Scilab?
How can this function be programmed?
Thanks a lot!
PS: I don't know if this is the correct forum to post scilab questions, so any pointer would also be useful.
If you'd like to implement it yourself, you can use a weighted sum.
times = [0, 2, 6, 8, 14]
values = [5, 9, 10, 1, 6]
weightedSum = 0
highestIndex = length(times)
for i = 1:(highestIndex-1)
    // Get the amount of time a certain value contributed
    deltaTime = times(i+1) - times(i);
    // Add the weighted amount to the total weighted sum
    weightedSum = weightedSum + deltaTime * values(i);
end
totalTimeDelta = times($) - times(1);
average = weightedSum / totalTimeDelta
printf("Result is %f", average)
Or, if you want functionally the same but less readable code:
timeDeltas = diff(times)
sum(timeDeltas.*values(1:$-1))/sum(timeDeltas)
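As for the regularSample function from the question, I don't know of a built-in, but here is a minimal sketch of the sample-and-hold idea in Python/NumPy (the function name comes from the question; a Scilab port would use the same last-time-before-t lookup):

import numpy as np

def regularSample(times, values, deltaT):
    # Sample the piecewise-constant signal on a regular time grid.
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    sampledTime = np.arange(times[0], times[-1] + deltaT, deltaT)
    # For each sample instant, take the value at the last original time <= t.
    idx = np.searchsorted(times, sampledTime, side='right') - 1
    return sampledTime, values[idx]

t, v = regularSample([0, 2, 6, 8, 14], [5, 9, 10, 1, 6], 2)
print(t)  # [ 0.  2.  4.  6.  8. 10. 12. 14.]
print(v)  # [ 5.  9.  9. 10.  1.  1.  1.  6.]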