Is there a way to filter an image with a VARIABLE filter (pixels of another image) in MATLAB without using nested for loops?

I have an image U, and when I want to convolve it with a fixed filter such as:
0 1 0
1 -4 1
0 1 0
I use the imfilter function with a constant 2D array and there is no problem. But then I have the following operation:
u(i,j) = v(i-1,j)^2 * u(i-1,j) + v(i+1,j)^2 * u(i+1,j) + v(i, j+1)^2 * u(i,j+1) + v(i,j-1)^2 * u(i,j-1)
(a simplified version of my filter). In other words, the filter to be applied over image U depends on the pixel values of image V at the same location where the filter is applied. Is there a way to implement such an operation in MATLAB, WITHOUT using nested for loops over each pixel?

You can solve it using im2col and col2im as follows:
% Filtering A using B
mA = randn(10, 10);
mB = randn(10, 10);
mACol = im2col(mA, [3, 3], 'sliding');
mBCol = im2col(mB, [3, 3], 'sliding');
mAColFilt = sum(mACol .* (mBCol .^ 2));
mAFilt = col2im(mAColFilt, [3, 3], [10, 10]);
I skipped the step of building the correct coefficients (in your case, zero out a few of them and square the rest; here I simply squared all of them).
Pay attention that the filtered image is smaller (by the boundaries of the filter).
You should pad it as required.
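For the exact neighborhood in the question, a minimal loop-free sketch using shifted index ranges (assuming U and V are same-sized 2D arrays; only the interior of the result is written, and the border still needs padding as noted above):
% Vectorized form of
% u(i,j) = v(i-1,j)^2*u(i-1,j) + v(i+1,j)^2*u(i+1,j)
%        + v(i,j+1)^2*u(i,j+1) + v(i,j-1)^2*u(i,j-1)
U = randn(10, 10);
V = randn(10, 10);
W = V .^ 2;                  % per-pixel weights taken from V
Unew = zeros(size(U));
Unew(2:end-1, 2:end-1) = ...
    W(1:end-2, 2:end-1) .* U(1:end-2, 2:end-1) + ...  % v(i-1,j)^2 * u(i-1,j)
    W(3:end,   2:end-1) .* U(3:end,   2:end-1) + ...  % v(i+1,j)^2 * u(i+1,j)
    W(2:end-1, 1:end-2) .* U(2:end-1, 1:end-2) + ...  % v(i,j-1)^2 * u(i,j-1)
    W(2:end-1, 3:end)   .* U(2:end-1, 3:end);         % v(i,j+1)^2 * u(i,j+1)
Each shifted block is the whole image moved by one pixel, so the four products line up with the four neighbor terms of the formula.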

Related

MATLAB vector operation. How to get previous element in vector to compute next element?

I have a vector A, say
A = [1, 0, 0, 0]
I want to perform an operation on this vector to get the next element. For example, say
A(i) = A(i - 1) * 5 [for i >= 2]
This can easily be achieved via a loop, but I want to achieve it using a vector operation. So far I have tried
A = [1, 0, 0, 0]
A(2:4) = A(1:3) * 5
But the content of A after this operation comes out as
A = [1 5 0 0]
The targeted answer should be
A = [1 5 25 125]
Please mention the necessary changes to achieve the target.
[Note: please do not simply treat the above example as elements that are powers of 5; consider the general recurrence A(i) = A(i - 1) * 5.]
How about this:
A(1)*5.^[0:numel(A)-1]
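If the recurrence really has to stay in the form A(i) = A(i - 1) * m(i), a hedged sketch using cumprod also avoids the loop and allows the multiplier to vary per step:
% cumprod turns a running product into a single vector operation
A = [1, 0, 0, 0];
m = 5 * ones(1, numel(A));   % per-step multipliers (all 5 here)
m(1) = A(1);                 % seed with the first element
A = cumprod(m)               % -> [1 5 25 125]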

Resize a boolean map

I have a boolean map, i.e. an array containing only zeros and ones. It has dimensions 512x512 and I need to resize it to 256x256.
If I use MATLAB's imresize, the values will be interpolated and I will no longer have only 0 and 1 but also other values, which I don't want.
How can I do this?
Thanks
Some possible approaches:
Discard even-indexed entries:
map_resize = map(1:2:end, 1:2:end);
Discard odd-indexed entries:
map_resize = map(2:2:end, 2:2:end);
For each 2×2 block compute the mean and then round to 0 or 1:
map = randi([0 1], 6, 6); % example input
sz = size(map);
map_resize = col2im(mean(im2col(map, [2 2], 'distinct'), 1), [1 1], sz/2) >= .5;
___ = imresize(___, method) specifies the interpolation method used.
By default, imresize uses bicubic interpolation.
If I'm not mistaken, 'nearest' should work in this case.
https://www.mathworks.com/help/images/ref/imresize.html#inputarg_method
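For reference, a minimal sketch of that call, assuming map is a logical 512x512 array (imresize accepts logical/binary images in the Image Processing Toolbox):
map = rand(512, 512) > 0.5;                       % example logical input
map_resize = imresize(map, [256 256], 'nearest');
unique(map_resize)                                % -> only 0 and 1
'nearest' copies the nearest input pixel instead of interpolating, so no new values are introduced.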

Will Matlab matrix change if reshaped once then reshaped back to the original size?

Basically I have an original 128 x 128 x 3 matrix which describes an RGB image (128 * 128 points, each point being a 1x3 vector containing the red, green, and blue intensities respectively), and I also have 16 points chosen from them (this is the easy part). Now I want to do pairwise distance calculations between the 128 x 128 x 3 matrix and the 16 points.
But the problem is that the MATLAB function for that, pdist2, only takes two matrices of sizes M1 x N and M2 x N and nothing else, so I plan to convert the 128 x 128 x 3 matrix to a (128 * 128) x 3 one. Later on, after some calculations with the newly transformed matrix, I need to convert it back to the original size in order to display the image and check the results. But I'm not sure whether the elements will remain in place or be shuffled around. Please help me, thank you very much!
From the documentation:
The data type and number of elements in B are the same as the data type and number of elements in A. The elements in B preserve their column-wise ordering from A.
If you need the result to be the same size as the original, just store the size before the initial transformation and use this as the input to reshape after performing your manipulation of the data.
% Store the original size
originalSize = size(data);
% Reshape it to your new 2D array
data = reshape(data, [], 3);
% Do stuff
% Reshape it back to its original size
data = reshape(data, originalSize);
In the 2D version, the elements won't technically be in the same position as they were in the 3D matrix because, well, it's 2D, not 3D. But if you reshape it back to 3D (without moving elements around), the element ordering will be the same as in the original 3D matrix.
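As a sketch of the asker's actual pdist2 use case (pdist2 is in the Statistics and Machine Learning Toolbox; the 16 reference points here are made-up data):
img = rand(128, 128, 3);          % example RGB-like data
pts = rand(16, 3);                % 16 hypothetical reference colors
originalSize = size(img);
X = reshape(img, [], 3);          % (128*128) x 3
D = pdist2(X, pts);               % (128*128) x 16 pairwise distances
img2 = reshape(X, originalSize);  % back to 128 x 128 x 3
isequal(img, img2)                % -> 1, element order preserved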
Update
You can easily check this for yourself.
R = rand([10, 20, 3]);
isequal(R, reshape(reshape(R, [], 3), size(R)))
The reason for this is that reshape does not actually change the underlying data, only the way in which it is accessed. We can easily check this by using format debug to see where the data is stored.
We can also use a little anonymous function I wrote to see where in memory a given variable is stored.
format debug;
memoryLocation = @(x) regexp(evalc('disp(x)'), '(?<=pr\s*=\s*)[a-z0-9]*', 'match')
Ok so let's create a matrix and check where MATLAB stored it in memory
A = rand(10);
memoryLocation(A)
% 7fa58f2ed9c0
Now let's reshape it and check the memory location again to see if it is in a different place (i.e. the order or values were modified)
B = reshape(A, [], 10);
memoryLocation(B)
% 7fa58f2ed9c0
As you can see, the memory location hasn't changed meaning that the ordering of elements has to be the same otherwise MATLAB would have needed to make a copy in memory.
Underlying data representation and why Suever's answer is correct:
The underlying array data in MATLAB is essentially an array of double-precision floating-point values. In C++ it would be a:
double *array_data;
MATLAB stores data in column-major format. If an array has n_rows rows, the element A_{i,j} (i.e. ith row, jth column, zero-indexed) would be given by:
array_data[i + j * n_rows]
When you call the reshape function, what changes are the variables n_rows, n_cols, etc. It doesn't change array_data.
Example (no need to touch array data to resize array):
array_data = [1, 2, 3, 4, 5, 6];
With n_rows = 2 and column-major format, this would be:
A = [1, 3, 5
     2, 4, 6]
A11 = array_data[0 + 0 * 2] = array_data[0] = 1
A21 = array_data[1 + 0 * 2] = array_data[1] = 2
A12 = array_data[0 + 1 * 2] = array_data[2] = 3
A22 = array_data[1 + 1 * 2] = array_data[3] = 4
A13 = array_data[0 + 2 * 2] = array_data[4] = 5
A23 = array_data[1 + 2 * 2] = array_data[5] = 6
With n_rows = 3 and the same underlying array_data, you would have:
A = [1, 4
     2, 5
     3, 6]
A11 = array_data[0 + 0 * 3] = array_data[0] = 1
A21 = array_data[1 + 0 * 3] = array_data[1] = 2
A31 = array_data[2 + 0 * 3] = array_data[2] = 3
A12 = array_data[0 + 1 * 3] = array_data[3] = 4
A22 = array_data[1 + 1 * 3] = array_data[4] = 5
A32 = array_data[2 + 1 * 3] = array_data[5] = 6
reshape just changes n_rows, n_cols, etc.
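A quick way to see this column-major layout from within MATLAB is the linear view A(:), which exposes array_data directly:
A = [1, 3, 5; 2, 4, 6];
A(:)'               % -> 1 2 3 4 5 6 (column-major order)
reshape(A, 3, 2)    % -> [1 4; 2 5; 3 6]: same data, new n_rows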

Print the value of a multidimensional array with the output as compatible matlab code

For matrices with 2 or fewer dimensions, the command is mat2str.
For instance:
>> mat2str(ones(2,2))
ans =
[1 1;1 1]
However, as the help states, this does not work for higher dimensions:
>> mat2str(rand(2,2,2))
Error using mat2str (line 49)
Input matrix must be 2-D.
How can I output matrices with more than 2 dimensions in a code-compatible way, without resorting to custom for loops?
This isn't directly possible because there is no built-in character to represent concatenation in the third dimension (an analog to the comma and semicolon in 2D). One potential workaround for this would be to perform mat2str on all "slices" in the third dimension and wrap them in a call to cat which, when executed, would concatenate all of the 2D matrices in the third dimension to recreate your input matrix.
M = reshape(1:8, [2 2 2]);
arrays = arrayfun(@(k) mat2str(M(:,:,k)), 1:size(M, 3), 'uni', 0);
result = ['cat(3', sprintf(', %s', arrays{:}), ')'];
result =
'cat(3, [1 3;2 4], [5 7;6 8])'
isequal(eval(result), M)
1
UPDATE
After thinking about this some more, a more elegant solution is to flatten the input matrix, run mat2str on that, and then, in the string used to recreate the data, use reshape combined with the original dimensions. This will work for data of any dimension.
result = sprintf('reshape(%s, %s);', mat2str(M(:)), mat2str(size(M)));
So for the following 4D input
M = randi([0 9], 1, 2, 3, 4);
result = sprintf('reshape(%s, %s);', mat2str(M(:)), mat2str(size(M)));
'reshape([6;9;4;6;5;2;6;1;7;2;1;7;2;1;6;2;2;8;3;1;1;3;8;5], [1 2 3 4]);'
Now if we reconstruct the data using this generated string, we can ensure that we get the correct data back.
Mnew = eval(result);
size(Mnew)
1 2 3 4
isequal(Mnew, M)
1
By specifying both the class and precision inputs to mat2str, we can even better approximate the input data including floating point numbers.
M = rand(1,2,3,4,5);
result = sprintf('reshape(%s, %s);', mat2str(M(:),64,'class'), mat2str(size(M)));
isequal(eval(result), M)
1
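If you need this often, the whole approach fits in a small wrapper; the name mat2strND is made up here, not a built-in:
% Hypothetical helper around the reshape trick above
mat2strND = @(M) sprintf('reshape(%s, %s)', mat2str(M(:)), mat2str(size(M)));
M = randi(9, 2, 3, 4);           % integer data round-trips exactly
isequal(eval(mat2strND(M)), M)   % -> 1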

What does tf.nn.conv2d do in tensorflow?

I was looking at the TensorFlow docs about tf.nn.conv2d here, but I can't understand what it does or what it is trying to achieve. The docs say:
#1 : Flattens the filter to a 2-D matrix with shape
[filter_height * filter_width * in_channels, output_channels].
Now what does that do? Is that element-wise multiplication or just plain matrix multiplication? I also could not understand the other two points mentioned in the docs. I have written them below:
# 2: Extracts image patches from the input tensor to form a virtual tensor of shape
[batch, out_height, out_width, filter_height * filter_width * in_channels].
# 3: For each patch, right-multiplies the filter matrix and the image patch vector.
It would be really helpful if anyone could give an example, a piece of code (extremely helpful) maybe and explain what is going on there and why the operation is like this.
I've tried coding a small portion and printing out the shape of the operation. Still, I can't understand.
I tried something like this:
op = tf.shape(tf.nn.conv2d(tf.random_normal([1,10,10,10]),
                           tf.random_normal([2,10,10,10]),
                           strides=[1, 2, 2, 1], padding='SAME'))
with tf.Session() as sess:
    result = sess.run(op)
    print(result)
I understand bits and pieces of convolutional neural networks. I studied them here. But the implementation on tensorflow is not what I expected. So it raised the question.
EDIT:
So, I implemented a much simpler piece of code. But I can't figure out what's going on, i.e. how the results come out like this. It would be extremely helpful if anyone could tell me what process yields this output.
input = tf.Variable(tf.random_normal([1,2,2,1]))
filter = tf.Variable(tf.random_normal([1,1,1,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    print("input")
    print(input.eval())
    print("filter")
    print(filter.eval())
    print("result")
    result = sess.run(op)
    print(result)
output
input
[[[[ 1.60314465]
   [-0.55022103]]

  [[ 0.00595062]
   [-0.69889867]]]]
filter
[[[[-0.59594476]]]]
result
[[[[-0.95538563]
   [ 0.32790133]]

  [[-0.00354624]
   [ 0.41650501]]]]
Ok I think this is about the simplest way to explain it all.
Your example is 1 image, size 2x2, with 1 channel. You have 1 filter, with size 1x1, and 1 channel (size is height x width x channels x number of filters).
For this simple case the resulting 2x2, 1-channel image (size 1x2x2x1 = number of images x height x width x channels) is the result of multiplying the filter value by each pixel of the image.
Now let's try more channels:
input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([1,1,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
Here the 3x3 image and the 1x1 filter each have 5 channels. The resulting image will be 3x3 with 1 channel (size 1x3x3x1), where the value of each pixel is the dot product across channels of the filter with the corresponding pixel in the input image.
Now with a 3x3 filter
input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
Here we get a 1x1 image, with 1 channel (size 1x1x1x1). The value is the sum of the 9, 5-element dot products. But you could just call this a 45-element dot product.
Now with a bigger image
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
The output is a 3x3 1-channel image (size 1x3x3x1).
Each of these values is a sum of 9, 5-element dot products.
Each output is made by centering the filter on one of the 9 center pixels of the input image, so that none of the filter sticks out. The xs below represent the filter centers for each output pixel.
.....
.xxx.
.xxx.
.xxx.
.....
Now with "SAME" padding:
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
This gives a 5x5 output image (size 1x5x5x1). This is done by centering the filter at each position on the image.
Any of the 5-element dot products where the filter sticks out past the edge of the image get a value of zero.
So the corners are only sums of 4, 5-element dot products.
Now with multiple filters.
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
This still gives a 5x5 output image, but with 7 channels (size 1x5x5x7), where each channel is produced by one of the filters in the set.
Now with strides 2,2:
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')
Now the result still has 7 channels, but is only 3x3 (size 1x3x3x7).
This is because instead of centering the filters at every point on the image, the filters are centered at every other point on the image, taking steps (strides) of width 2. The x's below represent the filter center for each output pixel, on the input image.
x.x.x
.....
x.x.x
.....
x.x.x
And of course the first dimension of the input is the number of images so you can apply it over a batch of 10 images, for example:
input = tf.Variable(tf.random_normal([10,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')
This performs the same operation for each image independently, giving a stack of 10 images as the result (size 10x3x3x7).
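As a quick sanity check of the shapes claimed above (a sketch against the same TF 1.x API used throughout this thread):
import tensorflow as tf
input = tf.Variable(tf.random_normal([10, 5, 5, 5]))
filter = tf.Variable(tf.random_normal([3, 3, 5, 7]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')
print(op.shape)   # -> (10, 3, 3, 7), known statically before running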
2D convolution is computed in a similar way to 1D convolution: you slide your kernel over the input, calculate the element-wise multiplications and sum them up. But instead of your kernel/input being an array, here they are matrices.
In the most basic example there is no padding and stride=1. Let's assume your input and kernel are the ones defined in the code further below:
kernel:
1 0 1
2 1 0
0 0 1
input:
4 3 1 0
2 1 0 1
1 2 4 1
3 1 0 2
When you apply your kernel you will receive the following output:
14  6
 6 12
which is calculated in the following way:
14 = 4 * 1 + 3 * 0 + 1 * 1 + 2 * 2 + 1 * 1 + 0 * 0 + 1 * 0 + 2 * 0 + 4 * 1
6 = 3 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 0 * 1 + 1 * 0 + 2 * 0 + 4 * 0 + 1 * 1
6 = 2 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 2 * 1 + 4 * 0 + 3 * 0 + 1 * 0 + 0 * 1
12 = 1 * 1 + 0 * 0 + 1 * 1 + 2 * 2 + 4 * 1 + 1 * 0 + 1 * 0 + 0 * 0 + 2 * 1
TF's conv2d function calculates convolutions in batches and uses a slightly different format. For the input it is [batch, in_height, in_width, in_channels]; for the kernel it is [filter_height, filter_width, in_channels, out_channels]. So we need to provide the data in the correct format:
import tensorflow as tf

k = tf.constant([
    [1, 0, 1],
    [2, 1, 0],
    [0, 0, 1]
], dtype=tf.float32, name='k')
i = tf.constant([
    [4, 3, 1, 0],
    [2, 1, 0, 1],
    [1, 2, 4, 1],
    [3, 1, 0, 2]
], dtype=tf.float32, name='i')
kernel = tf.reshape(k, [3, 3, 1, 1], name='kernel')
image = tf.reshape(i, [1, 4, 4, 1], name='image')
Afterwards the convolution is computed with:
res = tf.squeeze(tf.nn.conv2d(image, kernel, [1, 1, 1, 1], "VALID"))
# VALID means no padding
with tf.Session() as sess:
    print(sess.run(res))
And it will be equivalent to the one we calculated by hand.
For examples with padding/strides, take a look here.
Just to add to the other answers, you should think of the parameters in
filter = tf.Variable(tf.random_normal([3,3,5,7]))
as follows: the '5' corresponds to the number of channels in each filter. Each filter is a 3D cube, with a depth of 5. Your filter depth must match your input image's depth. The last parameter, 7, should be thought of as the number of filters in the batch. Just forget about this being 4D, and instead imagine that you have a set or batch of 7 filters. What you do is create 7 filter cubes with dimensions (3,3,5).
It is a lot easier to visualize in the Fourier domain since convolution becomes point-wise multiplication. For an input image of dimensions (100,100,3) you can rewrite the filter dimensions as
filter = tf.Variable(tf.random_normal([100,100,3,7]))
In order to obtain one of the 7 output feature maps, we simply perform the point-wise multiplication of the filter cube with the image cube, then we sum the results across the channels/depth dimension (here it's 3), collapsing to a 2d (100,100) feature map. Do this with each filter cube, and you get 7 2D feature maps.
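To see that each filter cube really is summed across the input channels, here is a small sketch (TF 1.x, as elsewhere in this thread) comparing a full conv2d with per-channel convolutions that are added up afterwards:
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 5, 5, 3).astype(np.float32)   # one 5x5 3-channel image
w = np.random.rand(3, 3, 3, 1).astype(np.float32)   # one 3x3x3 filter cube

full = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID')
# convolve each channel separately, then sum the three 2D results
parts = [tf.nn.conv2d(x[:, :, :, c:c+1], w[:, :, c:c+1, :],
                      strides=[1, 1, 1, 1], padding='VALID')
         for c in range(3)]
summed = tf.add_n(parts)

with tf.Session() as sess:
    a, b = sess.run([full, summed])
    print(np.allclose(a, b))   # -> True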
I tried to implement conv2d myself (for studying purposes). Well, here is what I wrote:
def conv(ix, w):
    # filter shape: [filter_height, filter_width, in_channels, out_channels]
    # flatten filters
    filter_height = int(w.shape[0])
    filter_width = int(w.shape[1])
    in_channels = int(w.shape[2])
    out_channels = int(w.shape[3])
    ix_height = int(ix.shape[1])
    ix_width = int(ix.shape[2])
    ix_channels = int(ix.shape[3])
    filter_shape = [filter_height, filter_width, in_channels, out_channels]
    flat_w = tf.reshape(w, [filter_height * filter_width * in_channels, out_channels])
    patches = tf.extract_image_patches(
        ix,
        ksizes=[1, filter_height, filter_width, 1],
        strides=[1, 1, 1, 1],
        rates=[1, 1, 1, 1],
        padding='SAME'
    )
    patches_reshaped = tf.reshape(patches, [-1, ix_height, ix_width, filter_height * filter_width * ix_channels])
    feature_maps = []
    for i in range(out_channels):
        feature_map = tf.reduce_sum(tf.multiply(flat_w[:, i], patches_reshaped), axis=3, keep_dims=True)
        feature_maps.append(feature_map)
    features = tf.concat(feature_maps, axis=3)
    return features
Hope I did it properly. I checked it on MNIST and got very close results (though this implementation is slower). I hope this helps you.
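A hedged way to check an implementation like this is to compare it against the built-in on random data (same TF 1.x API; assumes the conv function above is in scope):
import numpy as np
import tensorflow as tf

ix = tf.constant(np.random.rand(2, 8, 8, 3).astype(np.float32))
w = tf.constant(np.random.rand(3, 3, 3, 4).astype(np.float32))
with tf.Session() as sess:
    mine, ref = sess.run([conv(ix, w),
                          tf.nn.conv2d(ix, w, [1, 1, 1, 1], 'SAME')])
    print(np.allclose(mine, ref, atol=1e-5))   # expected True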
In addition to the other answers: the conv2d operation runs in C++ (on CPU) or CUDA (on GPU) kernels that flatten and reshape the data in a certain way and then use gemm-style BLAS or cuBLAS matrix multiplication.
It performs convolution across the picture when you are doing, for example, image classification; this function has all the parameters needed for that. You can basically choose the filter dimensions, strides, and padding. Before using it, you need to understand the concepts of convolution.
This explanation complements:
Keras Conv2d own filters
I had some doubts about the filters parameter in keras.Conv2D, because when I first learned about convolution I thought I was supposed to set my own filter design. But this parameter only tells the layer how many filters to use, and Keras itself will try to find the best filter weights during training.