2-D convolution as a matrix-matrix multiplication

I know that, in the 1-D case, the convolution between two vectors a and b can be computed as conv(a, b), but also as the product T_a * b, where T_a is the corresponding Toeplitz matrix for a.
Is it possible to extend this idea to 2D?
Given a = [5 1 3; 1 1 2; 2 1 3] and b = [4 3; 1 2], is it possible to convert a into a Toeplitz matrix and compute the matrix-matrix product between T_a and b, as in the 1-D case?

Yes, it is possible, and you should also use a doubly block circulant matrix (which is a special case of a Toeplitz matrix). I will give you an example with a small kernel and input, but it is possible to construct a Toeplitz matrix for any kernel. So you have a 2-D input x and a 2-D kernel k, and you want to calculate the convolution x * k. Also let's assume that k is already flipped, that x is of size n×n, and that k is m×m.
You unroll k into a sparse matrix of size (n-m+1)^2 × n^2, and unroll x into a long vector of size n^2 × 1. You multiply this sparse matrix with the vector and convert the resulting vector (which has size (n-m+1)^2 × 1) into an (n-m+1) × (n-m+1) square matrix.
I am pretty sure this is hard to understand just from reading. So here is an example for 2×2 kernel and 3×3 input.
For a 3×3 input x (flattened row-major into a 9×1 vector) and a 2×2 flipped kernel k = [k11 k12; k21 k22], the constructed sparse matrix is the 4×9 doubly blocked matrix

[k11 k12  0  k21 k22  0   0   0   0 ]
[ 0  k11 k12  0  k21 k22  0   0   0 ]
[ 0   0   0  k11 k12  0  k21 k22  0 ]
[ 0   0   0   0  k11 k12  0  k21 k22]

Multiplying it with the flattened x gives a 4×1 vector, which reshapes into the 2×2 convolution result.
And this is the same result you would have got by doing a sliding window of k over x.
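A minimal NumPy sketch of this construction (my illustration, not from the original answer): it builds the 4×9 matrix above for the question's a and b and checks the result against scipy's convolve2d.

import numpy as np
from scipy.signal import convolve2d

a = np.array([[5, 1, 3], [1, 1, 2], [2, 1, 3]])
b = np.array([[4, 3], [1, 2]])

k = np.flipud(np.fliplr(b))   # the "already flipped" kernel the answer assumes
n, m = a.shape[0], k.shape[0]
out = n - m + 1               # output is (n-m+1) x (n-m+1)

# Place each kernel entry at the column of the input pixel it multiplies.
T = np.zeros((out * out, n * n))
for i in range(out):
    for j in range(out):
        for p in range(m):
            for q in range(m):
                T[i * out + j, (i + p) * n + (j + q)] = k[p, q]

result = (T @ a.flatten()).reshape(out, out)
print(np.allclose(result, convolve2d(a, b, mode='valid')))  # True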

1- Define Input and Filter
Let I be the input signal and F be the filter or kernel.
2- Calculate the final output size
If I is m1 x n1 and F is m2 x n2, the size of the full output will be (m1 + m2 - 1) x (n1 + n2 - 1).
3- Zero-pad the filter matrix
Zero-pad the filter to make it the same size as the output.
4- Create a Toeplitz matrix for each row of the zero-padded filter
5- Create a doubly blocked Toeplitz matrix
Now all these small Toeplitz matrices should be arranged in a big doubly blocked Toeplitz matrix.
6- Convert the input matrix to a column vector
7- Multiply the doubly blocked Toeplitz matrix by the vectorized input signal
This multiplication gives the convolution result.
8- Last step: reshape the result to a matrix form
For more details and Python code, take a look at my GitHub repository:
Step by step explanation of 2D convolution implemented as matrix multiplication using toeplitz matrices in python
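A compact Python sketch of these eight steps (an illustration under the assumptions above, not the code from the linked repository; scipy.linalg.toeplitz builds each block):

import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import convolve2d

def conv2d_via_toeplitz(I, F):
    m1, n1 = I.shape
    m2, n2 = F.shape
    out_h, out_w = m1 + m2 - 1, n1 + n2 - 1        # step 2: full output size

    F_pad = np.zeros((out_h, out_w))               # step 3: zero-pad the filter
    F_pad[:m2, :n2] = F

    # Step 4: one Toeplitz block per zero-padded filter row; each block
    # performs a 1-D full convolution of an input row with that filter row.
    H = [toeplitz(F_pad[r], np.r_[F_pad[r, 0], np.zeros(n1 - 1)])
         for r in range(out_h)]

    # Step 5: arrange the blocks into a doubly blocked Toeplitz matrix.
    W = np.zeros((out_h * out_w, m1 * n1))
    for i in range(out_h):
        for j in range(m1):
            if i >= j:
                W[i * out_w:(i + 1) * out_w, j * n1:(j + 1) * n1] = H[i - j]

    # Steps 6-8: vectorize the input, multiply, reshape the result.
    return (W @ I.flatten()).reshape(out_h, out_w)

I = np.array([[5, 1, 3], [1, 1, 2], [2, 1, 3]])
F = np.array([[4, 3], [1, 2]])
print(np.allclose(conv2d_via_toeplitz(I, F), convolve2d(I, F, mode='full')))  # True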

If you unravel k into an m**2 vector and unroll X, you would then get:
an m**2 vector for k
a ((n-m)**2, m**2) matrix for unrolled_X
where unrolled_X could be obtained by the following Python code:
from numpy import zeros

def unroll_matrix(X, m):
    flat_X = X.flatten()
    n = X.shape[0]
    unrolled_X = zeros(((n - m) ** 2, m ** 2))
    skipped = 0
    for i in range(n ** 2):
        # keep only positions whose m x m patch fits inside the window
        if (i % n) < n - m and (i // n) < n - m:
            for j in range(m):
                for l in range(m):
                    unrolled_X[i - skipped, j * m + l] = flat_X[i + j * n + l]
        else:
            skipped += 1
    return unrolled_X
Unrolling X rather than k yields a more compact representation (smaller matrices) than the other way around - but you need to unroll every X. You might prefer unrolling k depending on what you want to do.
Here, unrolled_X is not sparse, whereas unrolled_k would be sparse, but of size ((n-m+1)^2, n^2), as @Salvador Dali mentioned.
Unrolling k could be done like this:
from scipy.sparse import lil_matrix
from numpy import zeros

def unroll_kernel(kernel, n, sparse=True):
    m = kernel.shape[0]
    if sparse:
        unrolled_K = lil_matrix(((n - m) ** 2, n ** 2))
    else:
        unrolled_K = zeros(((n - m) ** 2, n ** 2))
    skipped = 0
    for i in range(n ** 2):
        # same patch-position test as in unroll_matrix above
        if (i % n) < n - m and (i // n) < n - m:
            for j in range(m):
                for l in range(m):
                    unrolled_K[i - skipped, i + j * n + l] = kernel[j, l]
        else:
            skipped += 1
    return unrolled_K
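As a quick sanity check (my addition, using the two functions above): unrolling X and unrolling k give the same output vector, one as unrolled_X @ k_vec and the other as unrolled_K @ x_vec.

import numpy as np

X = np.arange(25.0).reshape(5, 5)
k = np.array([[1.0, 2.0], [3.0, 4.0]])

out_a = unroll_matrix(X, 2) @ k.flatten()   # dense patch matrix times kernel vector
out_b = unroll_kernel(k, 5) @ X.flatten()   # sparse kernel matrix times input vector
print(np.allclose(out_a, out_b))            # True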

The code shown above doesn't produce the unrolled matrix of the right dimensions. The dimensions should be ((n-k+1)*(m-k+1), k*k), where k is the filter dimension, n is the number of rows of the input matrix, and m is the number of columns.
from numpy import zeros

def unfold_matrix(X, k):
    n, m = X.shape[0:2]
    xx = zeros(((n - k + 1) * (m - k + 1), k ** 2))
    row_num = 0
    for i in range(n - k + 1):
        for j in range(m - k + 1):
            # collect a k x k block of elements and convert it to a row
            xx[row_num, :] = X[i:i + k, j:j + k].flatten()
            row_num += 1
    return xx
For more details, see my blog post:
http://www.telesens.co/2018/04/09/initializing-weights-for-the-convolutional-and-fully-connected-layers/
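As a quick check (my addition, assuming unfold_matrix as defined above): multiplying the unfolded input by the flattened k x k weight matrix gives the valid cross-correlation, which is what convolutional layers actually compute; flip the kernel first if you want a true convolution.

import numpy as np
from scipy.signal import correlate2d

X = np.arange(16.0).reshape(4, 4)
w = np.array([[1.0, 0.0], [0.0, -1.0]])

out = (unfold_matrix(X, 2) @ w.flatten()).reshape(3, 3)
print(np.allclose(out, correlate2d(X, w, mode='valid')))  # True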

Related

De-concatenate a column vector to get back its original square matrix in MATLAB

I had to convert an n x n matrix to an n^2 x 1 column vector for ease of some operations. Now that the operations are done, how do I return to the n x n form from the n^2 x 1 vector?
It is supposed to be opposite of this: concatenation
Thanks!
You can use the reshape() function:
% M is your n^2 x 1 column vector, A is the n x n matrix that you want to recover
A = reshape(M, [n n])
If your n x n matrix is 3x3, then:
A = reshape(M, [3 3])
For more info: http://www.mathworks.com/help/matlab/ref/reshape.html
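For comparison, a rough NumPy equivalent (my addition; note MATLAB vectorizes column-major, so order='F' matches A(:)):

import numpy as np

n = 3
A_orig = np.arange(9).reshape(n, n)
M = A_orig.flatten(order='F')        # analogue of M = A(:)
A = M.reshape((n, n), order='F')     # analogue of A = reshape(M, [n n])
print(np.array_equal(A, A_orig))     # True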

"Flatten" a 3D Matrix with L2 Norm Reduction

I have an n x m x d matrix A (i.e., A is like d stacked n x m matrices). I would like to convert this into one n x m matrix B where each element B(i,j) is a function of A(i,j,1), ..., A(i,j,d), more specifically the L2 norm of these values:
B(i,j) = sqrt[A(i,j,1)^2 + ... + A(i,j,d)^2]
Meaning I would like to condense or "flatten" the information in matrix A. How can I achieve this without resorting to a nested for-loop?
Do element-wise squaring and sum along the third dimension to produce an n x m matrix, then apply the square root, for a vectorized implementation like so -
B = sqrt(sum(A.^2,3))
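For comparison, a NumPy equivalent of the same vectorized idea (my addition):

import numpy as np

A = np.random.rand(4, 5, 3)          # an n x m x d example
B = np.sqrt((A ** 2).sum(axis=2))    # or simply: np.linalg.norm(A, axis=2)
print(B.shape)                       # (4, 5)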

How to sum a sub-tensor of a high-dimensional tensor in Matlab?

We are given a D-dimensional tensor, represented as a vector of size n^D.
The vector represents a D-dimensional distribution of a random variable X \in {1, 2, ..., n}^D. That is, the (i_1, i_2, ..., i_D) entry of the tensor is the probability that X_1 = i_1, X_2 = i_2, ..., X_D = i_D.
I need to compute, for each dimension d and value i \in [n], the marginal distribution P(X_d = i).
That is, each P(X_d = i) is the sum of n^(D-1) entries of the vector.
For example, if D=2 and n=4, we have a vector x of size (16,1) and the probability of the first dimension being equal to 1 is
P(X_1 = 1) = x(1) + x(2) + x(3) + x(4)
The probability of the second dimension being equal to 3 is
P(X_2 = 3) = x(3) + x(7) + x(11) + x(15)
I'm writing Matlab code that needs to compute these marginal distributions, but I'm not familiar enough with Matlab to do it in a simple way (it is doable using some ugly recursion, but there has to be a better option).
To calculate P(X_k=z) for a D-dimensional matrix you can use
xD = reshape(x, n*ones(1,D));
B = permute(xD, [k setdiff(1:D, k)]);
P = sum(B(z,:));
It first reshapes x into a D-dimensional array. Then it brings the dimension of interest, k, to the front, selects the z-th slice along it, and sums over the remaining elements.
Mohsen Nosratinia's answer would be my first option. As an alternative, it can be done without reshaping or permuting dimensions, which can result in faster code:
k = 2; %// chosen dimension
z = 3; %// chosen value (along d-th dimension)
result = sum(x(mod(floor((0:end-1)/n^(k-1)), n)==z-1));
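For comparison, a NumPy sketch mirroring the reshape/permute answer (my addition; it assumes x is laid out column-major like MATLAB's reshape, hence order='F', and keeps k and z 1-based to match the MATLAB code):

import numpy as np

def marginal(x, n, D, k, z):
    xD = x.reshape((n,) * D, order='F')               # back to a D-dim tensor
    axes = tuple(d for d in range(D) if d != k - 1)   # sum out all other dims
    return xD.sum(axis=axes)[z - 1]                   # P(X_k = z)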

How to compute cosine similarity using two matrices

I have two matrices, A (dimensions M x N) and B (N x P). In fact, they are collections of vectors - row vectors in A, column vectors in B. I want to get cosine similarity scores for every pair a and b, where a is a vector (row) from matrix A and b is a vector (column) from matrix B.
I have started by multiplying the matrices, which results in matrix C (dimensions M x P).
C = A*B
However, to obtain cosine similarity scores, I need to divide each value C(i,j) by the norm of the two corresponding vectors. Could you suggest the easiest way to do this in Matlab?
The simplest solution would be computing the norms first using element-wise multiplication and summation along the desired dimensions:
normA = sqrt(sum(A .^ 2, 2));
normB = sqrt(sum(B .^ 2, 1));
normA and normB are now a column vector and row vector, respectively. To divide corresponding elements in A * B by normA and normB, use bsxfun like so:
C = bsxfun(@rdivide, bsxfun(@rdivide, A * B, normA), normB);
You can use SciPy to compute it very easily.
from scipy.spatial import distance
cosine_sim = 1 - distance.cdist(A, B.T, 'cosine')
All you need to do is pass your two matrices to the formula above (note cdist treats rows as vectors, hence the transpose of B), and SciPy will return a NumPy array.
Refer to the docs here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html

Solve matrix equation in matlab

I have an equation of the type c = Ax + By, where c, x and y are vectors of dimension, say, 50,000 x 1, and A and B are matrices of dimension 50,000 x 50,000.
Is there any way in Matlab to find matrices A and B when c, x and y are known?
I have about 100,000 samples of c, x, and y. A and B remain the same for all.
Let X be the collection of all 100,000 x's you have (such that the i-th column of X equals the vector x_i).
In the same manner we can define Y and C as 2-D collections of the y's and c's respectively.
What you wish to solve is for A and B such that
C = AX + BY
You have 2 * 50,000^2 unknowns (all entries of A and B) and numel(C) equations.
So, if the number of data vectors you have is 100,000, you have a single solution (up to linearly dependent samples). If you have more than 100,000 samples, you may seek a least-squares solution.
Re-writing:
C = [A B] * [X ; Y] ==> [X' Y'] * [A';B'] = C'
So, I suppose
[A' ; B'] = pinv( [X' Y'] ) * C'
In matlab:
ABt = pinv( [X' Y'] ) * C';
A = ABt(1:50000,:)';
B = ABt(50001:end,:)';
Correct me if I'm wrong...
EDIT:
It seems like there is quite a fuss around dimensionality here. So, I'll try and make it as clear as possible.
Model: There are two (unknown) matrices A and B, each of size 50,000x50,000 (total 5e9 unknowns).
An observation is a triplet of vectors (x, y, c); each such vector has 50,000 elements (a total of 150,000 observed values per sample). The underlying assumption is that an observation is generated by the model c = Ax + By.
The task: given n observations (that is n triplets of vectors { (x_i, y_i, c_i) }_i=1..n) the task is to uncover A and B.
Now, each sample (x_i, y_i, c_i) induces 50,000 equations of the form c_i = Ax_i + By_i in the unknowns A and B. If the number of samples n is greater than 100,000, there are more than 50,000 * 100,000 (= 5e9) equations and the system is over-constrained.
To write the system in a matrix form I proposed to stack all observations into matrices:
A matrix X of size 50,000 x n with its i-th column equals to observed x_i
A matrix Y of size 50,000 x n with its i-th column equals to observed y_i
A matrix C of size 50,000 x n with its i-th column equals to observed c_i
With these matrices we can write the model as:
C = A*X + B*Y
I hope this clears things up a bit.
Thank you #Dan and #woodchips for your interest and enlightening comments.
EDIT (2):
Submitting the following code to Octave. In this example, instead of dimension 50,000 I work with only 2, and instead of n = 100,000 observations I settled for n = 100:
n = 100;
A = rand(2,2);
B = rand(2,2);
X = rand(2,n);
Y = rand(2,n);
C = A*X + B*Y + .001*randn(size(X)); % adding noise to observations
ABt = pinv( [ X' Y'] ) * C';
Checking the difference between ground truth model (A and B) and recovered ABt:
ABt - [A' ; B']
Yields
ans =
5.8457e-05 3.0483e-04
1.1023e-04 6.1842e-05
-1.2277e-04 -3.2866e-04
-3.1930e-05 -5.2149e-05
Which is close enough to zero (remember, the observations were noisy and the solution is a least-squares one).
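For comparison, a NumPy version of the same experiment (my addition; np.linalg.lstsq plays the role of pinv):

import numpy as np

n = 100
A, B = np.random.rand(2, 2), np.random.rand(2, 2)
X, Y = np.random.rand(2, n), np.random.rand(2, n)
C = A @ X + B @ Y + 0.001 * np.random.randn(2, n)   # noisy observations

# Solve [X' Y'] * [A'; B'] = C' in the least-squares sense.
ABt, *_ = np.linalg.lstsq(np.hstack([X.T, Y.T]), C.T, rcond=None)
print(np.abs(ABt - np.vstack([A.T, B.T])).max())    # small, as in the Octave run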