Incorrect results with vDSP_conv()

I am getting inconsistent results when performing convolution with vDSP_conv() from Accelerate, compared to the MATLAB implementation. There have been a couple of Stack Overflow posts about weird results when using this function to calculate convolution; however, as far as I can tell, I am using the framework correctly and have incorporated the suggestions from those posts. Here is my code:
public func conv(x: [Float], k: [Float]) -> [Float] {
    let resultSize = x.count + k.count - 1
    var result = [Float](count: resultSize, repeatedValue: 0)
    // Point at the last filter element; the stride of -1 walks the filter backwards
    let kEnd = UnsafePointer<Float>(k).advancedBy(k.count - 1)
    let xPad: [Float] = [Float](count: (2*k.count)+1, repeatedValue: 0.0)
    let xPadded = x + xPad
    vDSP_conv(xPadded, 1, kEnd, -1, &result, 1, vDSP_Length(resultSize), vDSP_Length(k.count))
    return result
}
As far as I can tell, I am doing the correct zero padding as specified in the Accelerate framework documentation here
I defined two test arrays A: [Float] = [0, 0, 1, 0, 0] and B: [Float] = [1, 0, 0].
In MATLAB, when I run conv(A, B), I get [0, 0, 1, 0, 0, 0, 0].
However, when I run the above conv() based on vDSP, I get [1, 0, 0, 0, 0, 0, 0].
What is wrong with my implementation? I have gone over this a number of times and looked through all the SO posts that I could find, and still haven't been able to account for this inconsistency.
Beyond that, is there a more efficient method to zero-pad the array than what I have here? In order to keep x immutable, I created the new xPadded array, but there is undoubtedly a more efficient method of performing this padding.
** EDIT **
As suggested by Martin R, I padded k.count - 1 zeros at both the beginning and the end of the array, as shown below.
public func conv(x: [Float], k: [Float]) -> [Float] {
    let resultSize = x.count + k.count - 1
    var result = [Float](count: resultSize, repeatedValue: 0)
    let kEnd = UnsafePointer<Float>(k).advancedBy(k.count - 1)
    let xPad: [Float] = [Float](count: k.count-1, repeatedValue: 0.0)
    let xPadded = xPad + x + xPad
    vDSP_conv(xPadded, 1, kEnd, -1, &result, 1, vDSP_Length(resultSize), vDSP_Length(k.count))
    return result
}
Using this code, conv(A, B) still returns [1, 0, 0, 0, 0, 0, 0].
I am calling the function as shown below:
let A: [Float] = [0, 0, 1, 0, 0]
let B: [Float] = [1, 0, 0]
let C: [Float] = conv(A, k: B)

For two arrays A and B of length m and n,
the vDSP_conv() function from the Accelerate framework computes a new array of length m - n + 1.
This corresponds to the result of the MATLAB function conv() with the shape
parameter set to "valid":
Only those parts of the convolution that are computed without the zero-padded edges. ...
To get the same result as the "full" convolution from MATLAB,
you have to zero-pad the A array with n-1 elements at the beginning and at the end; this gives a result array of length m + n - 1.
Applied to your function:
let xPad = Repeat(count: k.count - 1, repeatedValue: Float(0.0))
let xPadded = xPad + x + xPad
Using Repeat() might be slightly more performant because it
creates a sequence and not an array. But ultimately, a new array
has to be created as an argument to the vDSP_conv() function,
so there is not much room for improvement.
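To make the "valid" versus "full" relationship concrete, here is a minimal pure-Python sketch. It is not vDSP itself; conv_valid and conv_full are hypothetical helper names, and conv_valid only mimics what a "valid"-style routine with a reversed filter is assumed to compute.

```python
def conv_valid(x, k):
    """'valid' convolution: only positions where k fully overlaps x."""
    m, n = len(x), len(k)
    # The output has m - n + 1 samples; the filter is applied reversed,
    # which is what turns a sliding dot product into a convolution.
    return [sum(x[i + j] * k[n - 1 - j] for j in range(n))
            for i in range(m - n + 1)]


def conv_full(x, k):
    """'full' convolution: pad n - 1 zeros on both sides, then run 'valid'."""
    pad = [0.0] * (len(k) - 1)
    return conv_valid(pad + x + pad, k)


a = [0.0, 0.0, 1.0, 0.0, 0.0]
b = [1.0, 0.0, 0.0]
print(conv_full(a, b))  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

With the test arrays from the question, the padded version reproduces the MATLAB result, while conv_valid(a, b) on the unpadded input gives only the three samples [1.0, 0.0, 0.0].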

Some clarifications for the next poor soul who stumbles into this:
Apple provides some sample code on how to use vDSP_conv but it's pretty useless. In fact, it was confusing me because a comment in that code says that the input buffer needs to be padded without specifying where the actual input samples should be placed:
The SignalLength defined below is used to allocate space, and it is the filter length rounded up to a multiple of four elements and added to the result length.
SignalLength = (FilterLength+3 & -4u) + ResultLength;
So the above formula gives you a larger length than xPad + x + xPad, where xPad has k.count - 1 elements.
The important thing is where in that padded buffer you copy your input (signal) samples: it needs to be at k.count - 1.
So, the above accepted solution works. But if you trust that comment in Apple's example (which BTW doesn't show up in the official docs) then you can do a compromise: use their formula (the SignalLength above) to calculate and allocate the padded buffer (it will be a bit larger) and use the k.count - 1 (i.e. filter length - 1) as the starting offset for your signal (x in this case). I did this and the results now match ippsConvolve_32f and Matlab.
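As a rough illustration of that compromise, the sketch below builds the padded buffer in plain Python. The function name padded_signal is made up for this example; the only things taken from the discussion above are the SignalLength formula and the starting offset of filter length - 1.

```python
def padded_signal(x, filter_len):
    """Zero-filled buffer sized by the SignalLength formula, with x at offset filter_len - 1."""
    result_len = len(x) + filter_len - 1
    # In the C expression (FilterLength+3 & -4u), '+' binds tighter than '&',
    # so it rounds the filter length up to a multiple of four.
    rounded_filter_len = (filter_len + 3) & ~3
    signal_len = rounded_filter_len + result_len
    buf = [0.0] * signal_len
    offset = filter_len - 1
    buf[offset:offset + len(x)] = x  # copy the input samples into place
    return buf


buf = padded_signal([0.0, 0.0, 1.0, 0.0, 0.0], 3)
print(len(buf), buf.index(1.0))  # 11 4
```

For a 5-sample signal and a 3-tap filter this allocates 11 samples instead of the minimal 9, and the extra zeros simply sit at the end of the buffer.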
(Sorry, this should have been a comment but I don't have enough reputation for that).

@MartinR I figured out why my code doesn't work with arrays. I was writing this code in a project that used Surge as a linked framework. Surge overloads the + operator for [Float] and [Double] arrays so that it becomes element-wise addition of array elements. So when I was doing x + xPad, it wasn't extending the array as expected; it was simply returning x, as xPad only contained zeros. However, Surge had not overloaded the + operator for sequences, so using Repeat() successfully extended the array. Thanks for your help - I never would have thought to try sequences!

Related

MATLAB vector operation. How to get previous element in vector to compute next element?

I have a vector A, say
A = [1, 0, 0, 0]
I want to perform an operation on this vector to get the next element. For example, say
A(i) = A(i - 1) * 5 [for i >= 2]
This can be easily achieved via a loop. But I want to achieve it by using vector operation. So far I have tried
A = [1, 0, 0, 0]
A(2:4) = A(1:3) * 5
But the content in A after this operation is coming as
A = [1 5 0 0]
The targeted answer should be
A = [1 5 25 125]
Please, mention the necessary changes to be made to achieve the target.
[Note: Please do not simply consider the above example as the elements which are power of 5 but consider A(i) = A(i - 1) * 5.]

how about that:
A(1)*5.^[0:numel(A)-1]
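The reason the slice assignment in the question fails is that the right-hand side is evaluated once, from the old values, whereas the recurrence needs each new value in turn. A small Python sketch of both the loop and the closed form (the function names are only illustrative):

```python
def by_loop(first, ratio, count):
    """Apply A(i) = A(i - 1) * ratio step by step."""
    values = [first]
    for _ in range(count - 1):
        values.append(values[-1] * ratio)
    return values


def by_closed_form(first, ratio, count):
    """Same sequence without a loop-carried dependency: first * ratio^i."""
    return [first * ratio ** i for i in range(count)]


print(by_loop(1, 5, 4))         # [1, 5, 25, 125]
print(by_closed_form(1, 5, 4))  # [1, 5, 25, 125]
```

The closed form only works because this particular recurrence has one; a general recurrence may still need a loop.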

Quantizing an image in matlab

So I'm trying to figure out why my code doesn't seem to display an image that is properly uniformly quantized into 4 levels.
Q1 = uint8(zeros(ROWS, COLS, CHANNELS));
for band = 1 : CHANNELS,
    for x = 1 : ROWS,
        for y = 1 : COLS,
            Q1(ROWS,COLS,CHANNELS) = uint8(double(I1(ROWS,COLS,CHANNELS) / 2^4)*2^4);
        end
    end
end
No5 = figure;
imshow(Q1);
title('Part D: K = 4');

It is because you are not quantizing. You divide a double by 16, then multiply again by 16, then convert it to uint8. The right way to quantize is to divide by 16, throw away any decimals, then multiply by 16:
Q1 = uint8(floor(I1 / 16) * 16);
In the code snippet above, I assume I1 is a double. Convert it to double if it's not: I1=double(I1).
Note that you don't need the loops, MATLAB will apply the operation to each element in the matrix.
Note also that if I1 is an integer type, you can do something like this:
Q1 = (uint8(I1) / 16) * 16;
but this is actually equivalent to replacing the floor by round in the first example. This means you get an uneven distribution of values: 0-7 are mapped to 0, 8-23 are mapped to 16, etc. and 248-255 are all mapped to 255 (not a multiple of 16!). That is, 8 numbers are mapped to 0, and 8 are mapped to 255, instead of mapping 16 numbers to each possible multiple of 16 as the floor code does.
The 16 in the code above means that there will be 256/16=16 different grey levels in the output. If you want a different number, say n, use 256/n instead of 16.
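The difference between the floor version and the integer-division version can be checked on plain integers. This is a Python sketch, not MATLAB: quantize_round imitates uint8 arithmetic under the assumption that integer division rounds halves up and that results saturate at 255.

```python
def quantize_floor(v, step=16):
    """Divide, throw away the remainder, multiply back."""
    return (v // step) * step


def quantize_round(v, step=16):
    """Divide with round-half-up, multiply back, saturate at 255 (uint8-like)."""
    return min(255, ((v + step // 2) // step) * step)


floor_levels = sorted({quantize_floor(v) for v in range(256)})
round_levels = sorted({quantize_round(v) for v in range(256)})
print(len(floor_levels), floor_levels[-1])  # 16 240
print(len(round_levels), round_levels[-1])  # 17 255
```

The floor version yields 16 evenly used levels, while the rounding version yields 17 levels with half-width bins at both ends, matching the description above.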

It's because you are using ROWS, COLS, CHANNELS as your index; it should be x, y, band. Also, the final multiplication by 2^4 has to come after the uint8 cast, otherwise no rounding ever takes place.
In practice you should avoid the for loops in Matlab since matrix operations are much faster. Replace your code with
Q1=uint8(double(I1/2^4))*2^4
No5 = figure;
imshow(Q1);
title('Part D: K = 4');

What does tf.nn.conv2d do in tensorflow?

I was looking at the docs of tensorflow about tf.nn.conv2d here. But I can't understand what it does or what it is trying to achieve. It says on the docs,
#1 : Flattens the filter to a 2-D matrix with shape
[filter_height * filter_width * in_channels, output_channels].
Now what does that do? Is that element-wise multiplication or just plain matrix multiplication? I also could not understand the other two points mentioned in the docs. I have written them below :
# 2: Extracts image patches from the input tensor to form a virtual tensor of shape
[batch, out_height, out_width, filter_height * filter_width * in_channels].
# 3: For each patch, right-multiplies the filter matrix and the image patch vector.
It would be really helpful if anyone could give an example, a piece of code (extremely helpful) maybe and explain what is going on there and why the operation is like this.
I've tried coding a small portion and printing out the shape of the operation. Still, I can't understand.
I tried something like this:
op = tf.shape(tf.nn.conv2d(tf.random_normal([1,10,10,10]),
                           tf.random_normal([2,10,10,10]),
                           strides=[1, 2, 2, 1], padding='SAME'))
with tf.Session() as sess:
    result = sess.run(op)
    print(result)
I understand bits and pieces of convolutional neural networks. I studied them here. But the implementation on tensorflow is not what I expected. So it raised the question.
EDIT:
So, I implemented a much simpler code. But I can't figure out what's going on. I mean how the results are like this. It would be extremely helpful if anyone could tell me what process yields this output.
input = tf.Variable(tf.random_normal([1,2,2,1]))
filter = tf.Variable(tf.random_normal([1,1,1,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    print("input")
    print(input.eval())
    print("filter")
    print(filter.eval())
    print("result")
    result = sess.run(op)
    print(result)
output
input
[[[[ 1.60314465]
[-0.55022103]]
[[ 0.00595062]
[-0.69889867]]]]
filter
[[[[-0.59594476]]]]
result
[[[[-0.95538563]
[ 0.32790133]]
[[-0.00354624]
[ 0.41650501]]]]

Ok I think this is about the simplest way to explain it all.
Your example is 1 image, size 2x2, with 1 channel. You have 1 filter, with size 1x1, and 1 channel (size is height x width x channels x number of filters).
For this simple case the resulting 2x2, 1 channel image (size 1x2x2x1, number of images x height x width x channels) is the result of multiplying the filter value by each pixel of the image.
Now let's try more channels:
input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([1,1,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
Here the 3x3 image and the 1x1 filter each have 5 channels. The resulting image will be 3x3 with 1 channel (size 1x3x3x1), where the value of each pixel is the dot product across channels of the filter with the corresponding pixel in the input image.
Now with a 3x3 filter
input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
Here we get a 1x1 image, with 1 channel (size 1x1x1x1). The value is the sum of the 9, 5-element dot products. But you could just call this a 45-element dot product.
Now with a bigger image
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
The output is a 3x3 1-channel image (size 1x3x3x1).
Each of these values is a sum of 9, 5-element dot products.
Each output is made by centering the filter on one of the 9 center pixels of the input image, so that none of the filter sticks out. The xs below represent the filter centers for each output pixel.
.....
.xxx.
.xxx.
.xxx.
.....
Now with "SAME" padding:
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
This gives a 5x5 output image (size 1x5x5x1). This is done by centering the filter at each position on the image.
Any of the 5-element dot products where the filter sticks out past the edge of the image get a value of zero.
So the corners are only sums of 4, 5-element dot products.
Now with multiple filters.
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
This still gives a 5x5 output image, but with 7 channels (size 1x5x5x7), where each channel is produced by one of the filters in the set.
Now with strides 2,2:
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')
Now the result still has 7 channels, but is only 3x3 (size 1x3x3x7).
This is because instead of centering the filters at every point on the image, the filters are centered at every other point on the image, taking steps (strides) of width 2. The x's below represent the filter center for each output pixel, on the input image.
x.x.x
.....
x.x.x
.....
x.x.x
And of course the first dimension of the input is the number of images so you can apply it over a batch of 10 images, for example:
input = tf.Variable(tf.random_normal([10,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')
This performs the same operation, for each image independently, giving a stack of 10 images as the result (size 10x3x3x7)
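The output sizes quoted in the examples above follow the usual rules for the two padding modes. Here is a small Python sketch of those rules for one spatial dimension; conv_output_size is a hypothetical helper, not a TensorFlow function.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension for 'VALID' or 'SAME' padding."""
    if padding == "VALID":
        # The filter must fit entirely inside the input.
        return math.ceil((in_size - filter_size + 1) / stride)
    if padding == "SAME":
        # The input is zero-padded so that only the stride shrinks the output.
        return math.ceil(in_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


print(conv_output_size(5, 3, 1, "VALID"))  # 3
print(conv_output_size(5, 3, 1, "SAME"))   # 5
print(conv_output_size(5, 3, 2, "SAME"))   # 3
```

The batch size and the number of filters pass through unchanged, so a [10,5,5,5] input with a [3,3,5,7] filter, stride 2 and 'SAME' padding gives 10x3x3x7.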

2D convolution is computed in a similar way one would calculate 1D convolution: you slide your kernel over the input, calculate the element-wise multiplications and sum them up. But instead of your kernel/input being an array, here they are matrices.
In the most basic example there is no padding and stride=1. Let's assume your input and kernel are:
input:
4 3 1 0
2 1 0 1
1 2 4 1
3 1 0 2
kernel:
1 0 1
2 1 0
0 0 1
When you apply your kernel you will receive the following 2x2 output:
14 6
6 12
which is calculated in the following way:
14 = 4 * 1 + 3 * 0 + 1 * 1 + 2 * 2 + 1 * 1 + 0 * 0 + 1 * 0 + 2 * 0 + 4 * 1
6 = 3 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 0 * 1 + 1 * 0 + 2 * 0 + 4 * 0 + 1 * 1
6 = 2 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 2 * 1 + 4 * 0 + 3 * 0 + 1 * 0 + 0 * 1
12 = 1 * 1 + 0 * 0 + 1 * 1 + 2 * 2 + 4 * 1 + 1 * 0 + 1 * 0 + 0 * 0 + 2 * 1
TF's conv2d function calculates convolutions in batches and uses a slightly different format. For an input it is [batch, in_height, in_width, in_channels] for the kernel it is [filter_height, filter_width, in_channels, out_channels]. So we need to provide the data in the correct format:
import tensorflow as tf
k = tf.constant([
[1, 0, 1],
[2, 1, 0],
[0, 0, 1]
], dtype=tf.float32, name='k')
i = tf.constant([
[4, 3, 1, 0],
[2, 1, 0, 1],
[1, 2, 4, 1],
[3, 1, 0, 2]
], dtype=tf.float32, name='i')
kernel = tf.reshape(k, [3, 3, 1, 1], name='kernel')
image = tf.reshape(i, [1, 4, 4, 1], name='image')
Afterwards the convolution is computed with:
res = tf.squeeze(tf.nn.conv2d(image, kernel, [1, 1, 1, 1], "VALID"))
# VALID means no padding
with tf.Session() as sess:
    print(sess.run(res))
And will be equivalent to the one we calculated by hand.
For examples with padding/strides, take a look here.

Just to add to the other answers, you should think of the parameters in
filter = tf.Variable(tf.random_normal([3,3,5,7]))
as '5' corresponding to the number of channels in each filter. Each filter is a 3d cube, with a depth of 5. Your filter depth must correspond to your input image's depth. The last parameter, 7, should be thought of as the number of filters in the batch. Just forget about this being 4D, and instead imagine that you have a set or a batch of 7 filters. What you do is create 7 filter cubes with dimensions (3,3,5).
It is a lot easier to visualize in the Fourier domain since convolution becomes point-wise multiplication. For an input image of dimensions (100,100,3) you can rewrite the filter dimensions as
filter = tf.Variable(tf.random_normal([100,100,3,7]))
In order to obtain one of the 7 output feature maps, we simply perform the point-wise multiplication of the filter cube with the image cube, then we sum the results across the channels/depth dimension (here it's 3), collapsing to a 2d (100,100) feature map. Do this with each filter cube, and you get 7 2D feature maps.

I tried to implement conv2d (for my studying). Well, I wrote that:
def conv(ix, w):
    # filter shape: [filter_height, filter_width, in_channels, out_channels]
    # flatten filters
    filter_height = int(w.shape[0])
    filter_width = int(w.shape[1])
    in_channels = int(w.shape[2])
    out_channels = int(w.shape[3])
    ix_height = int(ix.shape[1])
    ix_width = int(ix.shape[2])
    ix_channels = int(ix.shape[3])
    filter_shape = [filter_height, filter_width, in_channels, out_channels]
    flat_w = tf.reshape(w, [filter_height * filter_width * in_channels, out_channels])
    patches = tf.extract_image_patches(
        ix,
        ksizes=[1, filter_height, filter_width, 1],
        strides=[1, 1, 1, 1],
        rates=[1, 1, 1, 1],
        padding='SAME'
    )
    patches_reshaped = tf.reshape(patches, [-1, ix_height, ix_width, filter_height * filter_width * ix_channels])
    feature_maps = []
    for i in range(out_channels):
        feature_map = tf.reduce_sum(tf.multiply(flat_w[:, i], patches_reshaped), axis=3, keep_dims=True)
        feature_maps.append(feature_map)
    features = tf.concat(feature_maps, axis=3)
    return features
Hope I did it properly. Checked on MNIST, had very close results (but this implementation is slower). I hope this helps you.

In addition to the other answers: the conv2d operation is implemented in C++ (CPU) or CUDA (GPU), which requires flattening and reshaping the data in a certain way and then using a GEMM matrix multiplication from BLAS or cuBLAS (CUDA).
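The flatten-and-multiply idea (often called im2col followed by a matrix multiplication) can be sketched in plain Python. This is not how TensorFlow is implemented internally; it is a simplified single-channel, single-filter illustration, with im2col and matmul as hypothetical helpers, reusing the 4x4 input and 3x3 kernel from the worked example above.

```python
def im2col(image, kh, kw):
    """Extract every kh x kw patch (no padding, stride 1) as one flat row."""
    rows = []
    for r in range(len(image) - kh + 1):
        for c in range(len(image[0]) - kw + 1):
            rows.append([image[r + a][c + b]
                         for a in range(kh) for b in range(kw)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


image = [[4, 3, 1, 0],
         [2, 1, 0, 1],
         [1, 2, 4, 1],
         [3, 1, 0, 2]]
kernel = [[1, 0, 1],
          [2, 1, 0],
          [0, 0, 1]]

patches = im2col(image, 3, 3)                       # shape [4, 9]
flat_kernel = [[v] for row in kernel for v in row]  # shape [9, 1]
print(matmul(patches, flat_kernel))                 # [[14], [6], [6], [12]]
```

Each row of the patch matrix is one image patch and each column of the flattened filter matrix is one filter, so adding more filters just adds columns to flat_kernel.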

It performs a convolution over the picture, for example when you are doing image classification. This function has all the parameters needed to do that.
You can basically choose the filter dimensions, the strides and the padding. Before using it, you need to understand the concepts of convolution.

This explanation complements the question:
Keras Conv2d own filters
I had some doubts about the filters parameter in keras.conv2d, because when I learned about convolution I was supposed to design my own filters. But this parameter only tells Keras how many filters to use, and Keras itself will try to find the best filter weights.

Sequence in MATLAB

Write a single MATLAB expression to generate a vector that contains first 100 terms of the following sequence: 2, -4, 8, -16, 32, …
My attempt :
n = -1
for i = 1:100
n = n * 2
disp(n)
end
The problem is that the values of n are not collected in a single (1 x 100) vector, and the alternating positive and negative terms are not produced either. How can I do that?

You have a geometric series with ratio r = -2.
To produce 2, -4, 8, -16, 32, type this:
>>-(-2).^[1:5]
2, -4, 8, -16, 32
You can change the value of 5 accordingly.
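For comparison, the same closed form written as a Python list comprehension (alternating_powers is just an illustrative name):

```python
def alternating_powers(count):
    """First `count` terms of 2, -4, 8, -16, ...: the n-th term is -(-2)^n."""
    return [-((-2) ** n) for n in range(1, count + 1)]


print(alternating_powers(5))         # [2, -4, 8, -16, 32]
print(len(alternating_powers(100)))  # 100
```

Python integers do not overflow, so all 100 terms are exact here; in MATLAB the later terms are doubles and therefore only approximate beyond 2^53.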

Though there are better methods, as mentioned in the answer by @lakesh, I will point out the mistakes in your code.
With n = n * 2, n is overwritten on every iteration, so it never becomes a vector.
Also, starting from -1, n = n * 2 generates -2, -4, -8, -16, ..., without alternating signs.
Therefore, the correct code should be:
n = -1;
for i = 2:101 % 1 extra term since the first term has to be discarded later
    n(i) = -n(i-1) * 2;
end
You can then discard the first element of n to get the exact series you want:
n(1) = [];
disp(n)

Create symbolic matrix with function elements

I'm trying to create an n-by-m matrix with elements that are functions of other symbolic variables (in this case the time t) with the following code:
syms t x(t) L
N = [ 0, 0, ...
0, 0;
0, 0, ...
0, 0;
1 - 3*(x/L)^2 + 2*(x/L)^3, -x + 2*x^2/L - x^3/(L^2), ...
3*(x/L)^2 - 2*(x/L)^3, x^2/L - x^3/(L^2)];
The problem I have is that MATLAB converts the matrix N into a function, i.e. N(t). When I try to access a specific member
N(1, 1)
or submatrix
N(1, 3:4)
MATLAB throws the following error:
Symbolic function expected 1 inputs and received 2.
I understand the error message, but it's not what I was expecting from the code. I don't want a symbolic matrix depending on t, and I don't understand MATLAB's behaviour in this case (for example, why isn't N also a function of L or whatever). A solution is to create a zero symbolic matrix with
N = sym(zeros(3, 4));
and manually fill the elements
N(3, 1) = 1 - 3*(x/L)^2 + 2*(x/L)^3;
N(3, 2) = -x + 2*x^2/L - x^3/(L^2);
N(3, 3) = 3*(x/L)^2 - 2*(x/L)^3;
N(3, 4) = x^2/L - x^3/(L^2);
But as you can see, this approach results in a lot of unnecessary code. So, what is wrong with my first approach?

When you define x(t) it ends up as a symbolic function (symfun) instead of a symbolic object due to its dependency on t. This dependency is then carried over to your matrix N, making it a symbolic function dependent on t (which explains why it is only dependent on t and not L).
>> syms t x(t) L
>> N = ...
>> whos
Name Size Bytes Class Attributes
L 1x1 112 sym
t 1x1 112 sym
x 1x1 112 symfun
N 1x1 112 symfun
You can avoid the automatic conversion to symfun with the workaround you describe above, or you can define it explicitly when you create your matrix N, like this:
>> N = sym(char([ 0, 0, ...
0, 0;
0, 0, ...
0, 0;
1 - 3*(x/L)^2 + 2*(x/L)^3, -x + 2*x^2/L - x^3/(L^2), ...
3*(x/L)^2 - 2*(x/L)^3, x^2/L - x^3/(L^2)]));
The trick here is the combined use of the sym() and char() functions. If you only use sym() without turning the matrix into a string it won't work.
That being said, I personally find your second approach where you manually fill the elements to be more clear and easier to read.
