I'm trying to convert a convolution layer to a fully-connected layer.
For example, a convolution of a 3×3 input with a 2×2 kernel is equivalent to a vector-matrix multiplication.
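To make the example concrete, here is a sketch of the standard construction (assuming row-major flattening, stride 1, and no padding): with the flattened input $x \in \mathbb{R}^9$ and the flattened output $y \in \mathbb{R}^4$, the convolution is $y = Bx$ with

$$B = \begin{pmatrix} k_{11} & k_{12} & 0 & k_{21} & k_{22} & 0 & 0 & 0 & 0 \\ 0 & k_{11} & k_{12} & 0 & k_{21} & k_{22} & 0 & 0 & 0 \\ 0 & 0 & 0 & k_{11} & k_{12} & 0 & k_{21} & k_{22} & 0 \\ 0 & 0 & 0 & 0 & k_{11} & k_{12} & 0 & k_{21} & k_{22} \end{pmatrix}$$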
Is there a function in PyTorch to get the matrix B?
I can only partially answer your question:
In your example above, you write the kernel as a matrix and the input as a vector. If you are fine with writing the input as a matrix instead, you can use torch.nn.Unfold, whose documentation shows explicitly how to compute a convolution with it:
import torch

# Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape)
inp = torch.randn(1, 3, 10, 12)
w = torch.randn(2, 3, 4, 5)
inp_unf = torch.nn.functional.unfold(inp, (4, 5))
out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
out = out_unf.view(1, 2, 7, 8)
(torch.nn.functional.conv2d(inp, w) - out).abs().max()
# tensor(1.9073e-06)
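For shape bookkeeping in the snippet above: unfold extracts one column of C*kH*kW values per output location, so

print(inp_unf.shape)                  # torch.Size([1, 60, 56]): 60 = 3*4*5, 56 = 7*8
print(inp_unf.transpose(1, 2).shape)  # torch.Size([1, 56, 60]): the "input as matrix"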
If, however, you need to calculate the matrix for the kernel (the smaller matrix), you can use the following function, which is based on Warren Weckesser's answer:
import numpy as np
from scipy import linalg

def toeplitz_1_ch(kernel, input_size):
    # shapes
    k_h, k_w = kernel.shape
    i_h, i_w = input_size
    o_h, o_w = i_h - k_h + 1, i_w - k_w + 1

    # construct 1d conv toeplitz matrices for each row of the kernel
    toeplitz = []
    for r in range(k_h):
        toeplitz.append(linalg.toeplitz(c=(kernel[r, 0], *np.zeros(i_w - k_w)),
                                        r=(*kernel[r], *np.zeros(i_w - k_w))))

    # construct toeplitz matrix of toeplitz matrices (just for padding=0)
    h_blocks, w_blocks = o_h, i_h
    h_block, w_block = toeplitz[0].shape
    W_conv = np.zeros((h_blocks, h_block, w_blocks, w_block))
    for i, B in enumerate(toeplitz):
        for j in range(o_h):
            W_conv[j, :, i + j, :] = B
    W_conv.shape = (h_blocks * h_block, w_blocks * w_block)
    return W_conv
This is in NumPy rather than PyTorch. It handles padding = 0, but it can easily be adjusted to other paddings by changing h_blocks and w_blocks and the indexing line W_conv[j, :, i+j, :].
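As a quick sanity check (a minimal sketch with arbitrary shapes; note that conv2d actually computes cross-correlation, which is also what this Toeplitz construction encodes), the matrix reproduces PyTorch's single-channel convolution with padding 0:

import torch
import torch.nn.functional as F

k = np.random.randn(3, 3)
x = np.random.randn(5, 6)

W = toeplitz_1_ch(k, x.shape)           # shape (o_h*o_w, i_h*i_w) = (12, 30)
out = W.dot(x.flatten()).reshape(3, 4)  # (i_h-k_h+1, i_w-k_w+1)
ref = F.conv2d(torch.tensor(x).view(1, 1, 5, 6),
               torch.tensor(k).view(1, 1, 3, 3)).numpy()
print(np.abs(out - ref).max())          # ~1e-15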
Update: Multiple output channels are just multiples of these matrices, as each output channel has its own kernel. Multiple input channels also have their own kernels (and their own matrices), over which you sum after the convolution. This can be implemented as follows:
def conv2d_toeplitz(kernel, input):
    """Compute 2d convolution over multiple channels via toeplitz matrix.

    Args:
        kernel: shape=(n_out, n_in, H_k, W_k)
        input: shape=(n_in, H_i, W_i)
    """
    kernel_size = kernel.shape
    input_size = input.shape
    # output spatial size shrinks by the kernel's H_k and W_k (kernel_size[2], kernel_size[3])
    output_size = (kernel_size[0], input_size[1] - (kernel_size[2] - 1), input_size[2] - (kernel_size[3] - 1))
    output = np.zeros(output_size)

    for i, ks in enumerate(kernel):   # loop over output channel
        for j, k in enumerate(ks):    # loop over input channel
            T_k = toeplitz_1_ch(k, input_size[1:])
            output[i] += T_k.dot(input[j].flatten()).reshape(output_size[1:])  # sum over input channels
    return output
To check the correctness:
k = np.random.randn(4*3*3*3).reshape((4,3,3,3))
i = np.random.randn(3,7,9)
out = conv2d_toeplitz(k, i)
# check correctness of convolution via toeplitz matrix
print(np.sum((out - F.conv2d(torch.tensor(i).view(1,3,7,9), torch.tensor(k)).numpy())**2))
>>> 1.0063523219807736e-28
Update 2:
It is also possible to do this without looping in one matrix:
def toeplitz_mult_ch(kernel, input_size):
    """Compute toeplitz matrix for 2d conv with multiple in and out channels.

    Args:
        kernel: shape=(n_out, n_in, H_k, W_k)
        input_size: (n_in, H_i, W_i)
    """
    kernel_size = kernel.shape
    # output spatial size shrinks by the kernel's H_k and W_k (kernel_size[2], kernel_size[3])
    output_size = (kernel_size[0], input_size[1] - (kernel_size[2] - 1), input_size[2] - (kernel_size[3] - 1))
    T = np.zeros((output_size[0], int(np.prod(output_size[1:])), input_size[0], int(np.prod(input_size[1:]))))

    for i, ks in enumerate(kernel):   # loop over output channel
        for j, k in enumerate(ks):    # loop over input channel
            T_k = toeplitz_1_ch(k, input_size[1:])
            T[i, :, j, :] = T_k

    T.shape = (np.prod(output_size), np.prod(input_size))
    return T
The input has to be flattened and the output reshaped after multiplication.
Checking for correctness (using the same i and k as above):
T = toeplitz_mult_ch(k, i.shape)
out = T.dot(i.flatten()).reshape((1,4,5,7))
# check correctness of convolution via toeplitz matrix
print(np.sum((out - F.conv2d(torch.tensor(i).view(1,3,7,9), torch.tensor(k)).numpy())**2))
>>> 1.5486060830252635e-28
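Since the original goal was to turn the convolution into a fully-connected layer: T can be dropped straight into a torch.nn.Linear layer (a minimal sketch; the Linear layer then reproduces the convolution on flattened inputs):

fc = torch.nn.Linear(T.shape[1], T.shape[0], bias=False)
with torch.no_grad():
    fc.weight.copy_(torch.tensor(T))  # Linear computes x @ weight.T, i.e. T @ x here
out_fc = fc(torch.tensor(i.flatten(), dtype=torch.float32)).reshape(4, 5, 7)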
You can use my code for convolution with circular padding:
import numpy as np
import scipy.linalg as linalg
def toeplitz_1d(k, x_size):
    k_size = k.size
    r = *k[(k_size // 2):], *np.zeros(x_size - k_size), *k[:(k_size // 2)]
    c = *np.flip(k)[(k_size // 2):], *np.zeros(x_size - k_size), *np.flip(k)[:(k_size // 2)]
    t = linalg.toeplitz(c=c, r=r)
    return t
def toeplitz_2d(k, x_size):
    k_h, k_w = k.shape
    i_h, i_w = x_size

    ks = np.zeros((i_w, i_h * i_w))
    for i in range(k_h):
        ks[:, i*i_w:(i+1)*i_w] = toeplitz_1d(k[i], i_w)
    ks = np.roll(ks, -i_w, 1)

    t = np.zeros((i_h * i_w, i_h * i_w))
    for i in range(i_h):
        # each block of ks covers i_w output rows
        t[i*i_w:(i+1)*i_w, :] = ks
        ks = np.roll(ks, i_w, 1)
    return t
def toeplitz_3d(k, x_size):
    k_oc, k_ic, k_h, k_w = k.shape
    i_c, i_h, i_w = x_size

    t = np.zeros((k_oc * i_h * i_w, i_c * i_h * i_w))
    for o in range(k_oc):
        for i in range(k_ic):
            t[(o * (i_h * i_w)):((o+1) * (i_h * i_w)),
              (i * (i_h * i_w)):((i+1) * (i_h * i_w))] = toeplitz_2d(k[o, i], (i_h, i_w))
    return t
if __name__ == "__main__":
    import torch

    k = np.random.randint(50, size=(3, 2, 3, 3))
    x = np.random.randint(50, size=(2, 5, 5))

    t = toeplitz_3d(k, x.shape)
    y = t.dot(x.flatten()).reshape(3, 5, 5)

    xx = torch.nn.functional.pad(torch.from_numpy(x.reshape(1, 2, 5, 5)), pad=(1, 1, 1, 1), mode='circular')
    yy = torch.conv2d(xx, torch.from_numpy(k))

    err = ((y - yy.numpy()) ** 2).sum()
    print(err)
While the other answers are correct, there is a faster way. In your example, you give an input of size 3×3 with a kernel of size 2×2, and multiplying the resulting circulant matrix by the input image takes 9 x 9 x 4 operations, or 324 in total. Here is a method that does it with 4 x 4 x 4, or 64 operations in total. We will use PyTorch, but this could be done in NumPy as well.
Assume an image input of shape (batch, channels, height, width):
import torch

def get_kernel_inputs(image, kernel):
    out = torch.empty(image.size()[0], 0, 1, kernel.size()[-2] * kernel.size()[-1])
    for k in range(image.size()[-2] - kernel.size()[-2] + 1):
        for l in range(image.size()[-1] - kernel.size()[-1] + 1):
            # each sliding window becomes one row of length kH*kW
            out = torch.cat([out, image[:, :, k:k + kernel.size()[-2], l:l + kernel.size()[-1]]
                                  .reshape(image.size()[0], -1, 1, kernel.size()[-1] * kernel.size()[-2])], dim=1)
    return out
Now let's test what size of output this gives:
img = torch.rand(1, 1, 3, 3)
kernel = torch.rand(2, 2)
kernelized_img = get_kernel_inputs(img, kernel)
print(kernelized_img.size())
This yields a size of:
torch.Size([1, 4, 1, 4])
So there are 16 values stored in the above tensor. Now let's matrix multiply:
print(torch.matmul(kernelized_img, kernel.view(4)))
This is 16 x 4 multiplications.
Finally, let's test that this is, in fact, giving out the correct value by using the Torch Conv2d module:
import torch.nn as nn

mm = nn.Conv2d(1, 1, (2, 2), bias=False)

# use Conv2d's randomly initialized weight as the test kernel
with torch.no_grad():
    kernel_test = mm.weight

print("Control", mm(img))
print("Test", torch.matmul(kernelized_img, kernel_test.view(4)).view(1, 1, 2, 2))
Control tensor([[[[-0.0089, 0.0178],
[-0.1419, 0.2720]]]], grad_fn=<ThnnConv2DBackward>)
Test tensor([[[[-0.0089, 0.0178],
[-0.1419, 0.2720]]]], grad_fn=<ViewBackward>)
All we are doing differently in the above is reshaping the image instead of the kernel (this is essentially the im2col trick that torch.nn.functional.unfold implements). Setting the image height and width equal (i) and the kernel height and width equal (k), the method above needs far fewer multiplications than the Toeplitz method as i grows relative to k; in the 3×3/2×2 example above, that is 64 operations versus 324.
Edit Addition:
The above implementation only worked on single-channel inputs. For this definition to work on multi-channel inputs and outputs, plus handle batches, you can do the following:
def get_kernel_inputs(image, kernel):
    out = torch.empty(image.size()[0], image.size()[1], 0, kernel.size()[-2] * kernel.size()[-1])
    out_size = [image.size()[-2] - kernel.size()[-2] + 1, image.size()[-1] - kernel.size()[-1] + 1]

    for k in range(out_size[0]):
        for l in range(out_size[1]):
            out = torch.cat([out, image[:, :, k:k + kernel.size()[-2], l:l + kernel.size()[-1]]
                                  .reshape(image.size()[0], -1, 1, kernel.size()[-1] * kernel.size()[-2])], dim=2)

    preout = out.permute(0, 2, 1, 3).reshape(image.size()[0], -1,
                                             image.size()[1] * kernel.size()[-2] * kernel.size()[-1])
    kernel1 = kernel.view(kernel.size()[0], -1)
    out = torch.matmul(preout, kernel1.T).permute(0, 2, 1).reshape(image.size()[0], kernel.size()[0],
                                                                   out_size[0], out_size[1])
    return out
images = torch.rand(5, 3, 32, 32)
mm = nn.Conv2d(3, 32, (3, 3), bias=False)

# Set the kernel to Conv2d's initialization for testing
with torch.no_grad():
    kernel = mm.weight

print(get_kernel_inputs(images, kernel))
print(mm(images))
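To compare the two printed outputs numerically rather than by eye, one could add a check along these lines (hypothetical, not part of the original test):

print(torch.allclose(get_kernel_inputs(images, kernel), mm(images), atol=1e-6))  # expected: True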
I am trying to pass three cell2mat arrays for I, V, H through this function and plot the resulting parameter from the nlinfit model below. But when the code is run, it plots nothing and only stores one value. Any help is appreciated :)
function [Icp] = Fraunhofer_Function(I,V,H)
    V1 = @(b,I)(b(1).*sign(I).*real(sqrt(I.^2 - (sign(I).*( (b(2)+b(3)/2) )).^2)) + b(4));

    Vthresx = find(V<=1e-3 & V>=0);
    Ithresvec = max(I(Vthresx));
    Voffsetx = find(I<=0.1e-3 & I>=-.1e-3);
    Voffset = max(V(Voffsetx));
    Rn = (max(V)-min(V))/(max(I)-min(I));

    beta1 = [Rn; Ithresvec; -Ithresvec; Voffset]; % Init values: b1 = Rn, b2 = Icp, b3 = Icm, b4 = Voffset
    opts = statset('MaxIter', 500000, 'MaxFunEvals', 100000, 'RobustWgtFun', 'andrews');
    B1 = nlinfit(I, V, V1, beta1, opts); % Fit
    Icp = V1(B1,V);
end
files = dir('*.xlsx*');
for k = 1:length(files)
    filenames = files(k).name;
    txt = 'I,V,H';
    [num,txt,raw] = xlsread(filenames);
    % Put data into numerical columns
    Idata = num(:,1)'; Vdata = num(:,2)'; Hdata = num(:,3)';

    [Hu,~,idx] = unique(Hdata);
    Isplit = splitapply(@(x) {x}, Idata(:), idx);
    Vsplit = splitapply(@(x) {x}, Vdata(:), idx);
    Hsplit = splitapply(@(x) {x}, Hdata(:), idx);

    for l = 1:length(Isplit)
        I = Isplit{l,1};
        V = Vsplit{l,1};
        H = Hsplit{l,1};
        % fit the data to the functional form
        Icp = Fraunhofer_Function(I,V,H);
    end
end
(An example of the I, V, H data was attached as an image.)
Right now you are just setting Icp to the second estimated coefficient. From the MATLAB docs for nlinfit, the output beta is:
beta — Estimated regression coefficients, returned as a vector. The number of elements in beta equals the number of elements in beta0.
So to use the estimated parameters, you should call your modelfun with the parameters stored in B1:
Icp = V1(B1,V);
plot(H,Icp);
My understanding was that these three should be the same; however, MATLAB gives completely different results. The first and third are in sync with what I calculated by hand; the second one is different.
x_1 = [1, 2, 0, 5];
x_2 = [1/2, -1/4, 1, 0, 3/4];
y_2_1 = ifft(fft(x_1, 2) .* fft(x_2, 2))
y_2_2 = cconv(x_2, x_1, 2)
y_2_3 = cconv(x_2(1:2), x_1(1:2), 2)
From the documentation:
The modulo-2 circular convolution is equivalent to splitting the linear convolution into two-element arrays and summing the arrays.
So it is not the same to do
res = cconv(x_2, x_1, 2);
as to do
res2 = cconv(x_2, x_1);
res2 = res2(1:2);
The former is equivalent to
res = cconv(x_2, x_1);
res = res(1:2) + res(3:4) + res(5:6) + ...;
(padding with zeros if res is odd in size).
On the other hand,
res3 = ifft(fft(x_1, 2) .* fft(x_2, 2));
is equivalent to
res3 = fft(x_1(1:2)) .* fft(x_2(1:2));
res3 = ifft(res3);
and different from either of the two cconv results.
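For reference, the same relationships are easy to verify outside MATLAB; a minimal NumPy sketch (np.convolve computes the linear convolution):

import numpy as np

x1 = np.array([1.0, 2.0, 0.0, 5.0])
x2 = np.array([0.5, -0.25, 1.0, 0.0, 0.75])

# cconv(x_2, x_1, 2): split the length-8 linear convolution into
# length-2 blocks and sum them
lin = np.convolve(x2, x1)               # length 4 + 5 - 1 = 8
mod2 = lin.reshape(-1, 2).sum(axis=0)

# ifft(fft(x_1, 2) .* fft(x_2, 2)): truncate both signals to 2 samples
# first, i.e. cconv(x_2(1:2), x_1(1:2), 2)
fft2 = np.fft.ifft(np.fft.fft(x1, 2) * np.fft.fft(x2, 2)).real

print(mod2)   # modulo-2 circular convolution of the full signals
print(fft2)   # circular convolution of the truncated signals: different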
I wrote code to implement steepest descent backpropagation, with which I am having issues. I am using the Machine CPU dataset and have scaled the inputs and outputs into the range [0, 1].
The code in MATLAB/Octave is as follows:
steepest descent backpropagation
%SGD = Steepest Gradient Descent
function weights = nnSGDTrain (X, y, nhid_units, gamma, max_epoch, X_test, y_test)
    iput_units = columns (X);
    oput_units = columns (y);
    n = rows (X);
    W2 = rand (nhid_units + 1, oput_units);
    W1 = rand (iput_units + 1, nhid_units);
    train_rmse = zeros (1, max_epoch);
    test_rmse = zeros (1, max_epoch);

    for (epoch = 1:max_epoch)
        delW2 = zeros (nhid_units + 1, oput_units)';
        delW1 = zeros (iput_units + 1, nhid_units)';

        for (i = 1:rows(X))
            o1 = sigmoid ([X(i,:), 1] * W1);                % 1xn+1 * n+1xk = 1xk
            o2 = sigmoid ([o1, 1] * W2);                    % 1xk+1 * k+1xm = 1xm
            D2 = o2 .* (1 - o2);
            D1 = o1 .* (1 - o1);
            e = (y_test(i,:) - o2)';
            delta2 = diag (D2) * e;                         % mxm * mx1 = mx1
            delta1 = diag (D1) * W2(1:(end-1),:) * delta2;  % kxm * mx1 = kx1
            delW2 = delW2 + (delta2 * [o1 1]);              % mx1 * 1xk+1 = mxk+1, already transposed
            delW1 = delW1 + (delta1 * [X(i, :) 1]);         % kx1 * 1xn+1 = kxn+1, already transposed
        end

        delW2 = gamma .* delW2 ./ n;
        delW1 = gamma .* delW1 ./ n;
        W2 = W2 + delW2';
        W1 = W1 + delW1';

        [dummy train_rmse(epoch)] = nnPredict (X, y, nhid_units, [W1(:);W2(:)]);
        [dummy test_rmse(epoch)] = nnPredict (X_test, y_test, nhid_units, [W1(:);W2(:)]);
        printf ('Epoch: %d\tTrain Error: %f\tTest Error: %f\n', epoch, train_rmse(epoch), test_rmse(epoch));
        fflush (stdout);
    end

    weights = [W1(:);W2(:)];

    % plot (1:max_epoch, test_rmse, 1);
    % hold on;
    plot (1:max_epoch, train_rmse(1:end), 2);
    % hold off;
end
predict
%Now SFNN Only
function [o1 rmse] = nnPredict (X, y, nhid_units, weights)
    iput_units = columns (X);
    oput_units = columns (y);
    n = rows (X);

    W1 = reshape (weights(1:((iput_units + 1) * nhid_units),1), iput_units + 1, nhid_units);
    W2 = reshape (weights((((iput_units + 1) * nhid_units) + 1):end,1), nhid_units + 1, oput_units);

    o1 = sigmoid ([X ones(n,1)] * W1);  % n x iput_units+1 * iput_units+1 x nhid_units = n x nhid_units
    o2 = sigmoid ([o1 ones(n,1)] * W2); % n x nhid_units+1 * nhid_units+1 x oput_units = n x oput_units

    rmse = RMSE (y, o2);
end
RMSE function
function rmse = RMSE (a1, a2)
    rmse = sqrt (sum (sum ((a1 - a2).^2))/rows(a1));
end
I have also trained the same dataset using the R RSNNS package's mlp, and the RMSE for the train set (first 100 examples) is around 0.03. But in my implementation I cannot achieve an RMSE lower than 0.14. Sometimes the errors grow for higher learning rates, and no learning rate gets me below 0.14. A paper I referred to also reports a train-set RMSE of around 0.03.
I wanted to know where the problem in the code is. I have followed Raul Rojas' book and confirmed that things are okay.
In the backpropagation code, the line
e = (y_test(i,:) - o2)';
is not correct, because o2 is the output computed from a training example while the difference is taken against an example from the test set, y_test. The line should have been:
e = (y(i,:) - o2)';
which correctly finds the difference between the output predicted by the current model and the target output of the corresponding training example.
It took me 3 days to find this one bug, which had stopped me from moving on to further modifications.
The weights that I get from training, when applied directly to the input, return different results!
I'll show it on a very simple example.
Let's say we have an input vector x = 0:0.01:1
and a target vector t = x^2 (I know it would be better to use a nonlinear network).
After training a 2-layer linear network with one neuron in each layer, we get:
sim(net,0.95) = 0.7850 (some error from training, which is okay and expected)
weights from net.IW,net.LW,net.b:
IW =
0.4547
LW =
2.1993
b =
0.3328 -1.0620
If I use the weights directly, Out = purelin(purelin(0.95*IW+b(1))*LW+b(2)) = 0.6200, I get a different result from sim!
How can that be? What's wrong?
The code:
%Main_TestWeights
close all
clear all
clc
t1 = 0:0.01:1;
x = t1.^2;
hiddenSizes = 1;
net = feedforwardnet(hiddenSizes);
[Xs,Xi,Ai,Ts,EWs,shift] = preparets(net,con2seq(t1),con2seq(x));
net.layers{1,1}.transferFcn = 'purelin';
[net,tr,Y,E,Pf,Af] = train(net,Xs,Ts,Xi,Ai);
view(net);
IW = cat(2,net.IW{1});
LW = cat(2,net.LW{2,1});
b = cat(2,[net.b{1,1},net.b{2,1}]);
%Result from Sim
t2=0.95;
Yk = sim(net,t2)
%Result from Weights
x1 = IW*t2'+b(1)
x1out = purelin(x1)
x2 = purelin(x1out*(LW)+b(2))
The Neural Network Toolbox rescales inputs and outputs to the [-1, 1] range (by default through the 'mapminmax' processing function). You must therefore rescale the input and unscale the output so that your manual computation matches sim()'s output:
%Result from Weights
x1 = 2*t2 - 1;                 % rescale input from [0, 1] to [-1, 1]
x1 = IW*x1+b(1);
x1out = purelin(x1);
x2 = purelin(x1out*(LW)+b(2));
x2 = (x2+1)/2                  % unscale output from [-1, 1] to [0, 1]
then
>> x2 == Yk
ans =
1