Backpropagation in convolution - neural-network

I am having trouble understanding how backpropagation works in convolutional layers. After calculating the error in the hidden layers, we can represent it as an error image. But after that, how do I update the kernel?
I give an example in the figure below: we have an error image at layer (l+1) (with the backpropagated error already calculated) connected to the parent's output at layer (l) through the associated kernel K.
At position (x, y), the error will be: err = e1.k'1 + e2.k'2 + ... + e9.k'9 (k' being the coefficients of the kernel in (l+1)). So, with no momentum, no activation function and no learning rate, the correction of K will be:
K1 = K1 + err * e1
K2 = K2 + err * e2
...
Is this first explanation correct?
After that, how is the error propagated? Do we propagate the error only to the (x, y) position, or to the (x+kx, y+ky) positions, with (kx, ky) the position within the filter K?
http://i.stack.imgur.com/vBJyZ.png

Backpropagation works in convolutional networks just as it does in fully connected deep networks. The only difference is that, because the convolution shares weights across positions, the update is shared too: the gradient for each kernel coefficient is the sum of the contributions from every position where that coefficient was applied.
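As a rough illustration of the shared update, here is a minimal sketch for a single-channel layer with stride 1 and no padding. The input X, the kernel K, the backpropagated error D and the learning rate eta are all made-up values, and the layer is assumed to compute a "valid" cross-correlation with no activation function. Under those assumptions the kernel gradient is the correlation of the layer input with the error image, and the error passed down to layer l is the full convolution of the error image with the kernel, so each error pixel spreads over the whole kernel footprint rather than a single (x, y) position.

```matlab
% Forward pass: Y(i,j) = sum over (a,b) of K(a,b) * X(i+a-1, j+b-1)
X = magic(5);                          % hypothetical 5x5 output of layer l
K = [1 0 -1; 2 0 -2; 1 0 -1];          % hypothetical 3x3 kernel
Y = conv2(X, rot90(K, 2), 'valid');    % conv2 flips its kernel, so rotate K to get correlation

% Backpropagated error image at layer l+1 (dLoss/dY), hypothetical values
D = [1 2 3; 4 5 6; 7 8 9] / 10;

% Kernel gradient: every output position contributes to every coefficient
% dK(a,b) = sum over (i,j) of D(i,j) * X(i+a-1, j+b-1)
dK = conv2(X, rot90(D, 2), 'valid');   % 3x3, same size as K

% Error propagated to layer l: full convolution of D with K
dX = conv2(D, K, 'full');              % 5x5, same size as X

% One plain gradient-descent step on the kernel (eta is an assumed learning rate)
eta = 0.01;
Knew = K - eta * dK;
```

If the layer has an activation function, D would first be multiplied elementwise by the derivative of that activation before these two steps.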

Related

Why does modifying the weights of a recurrent neural network in MATLAB not change the output when predicting on the same data?

I consider the following recurrent neural network (RNN):
[Figure: RNN under consideration]
where x is the input (a vector of reals), h is the hidden state vector and y is the output vector. I trained the network in MATLAB using some data x and obtained W, V, and U.
However, after changing matrix W to W' in MATLAB and keeping U and V the same, the output (y) of the RNN that uses W is the same as the output (y') of the RNN that uses W' when both predict on the same data x. Judging by the above equation, those two outputs should be different, but I cannot reproduce that in MATLAB (when I modify V or U, the outputs do change). How can I fix the code so that the outputs (y) and (y') differ as they should?
The relevant code is shown below:
[x,t] = simplefit_dataset; % x: input data ; t: targets
net = newelm(x,t,5); % Recurrent neural net with 1 hidden layer (5 nodes) and 1 output layer (1 node)
net.layers{1}.transferFcn = 'tansig'; % 'tansig': equivalent to tanh and also is the activation function used for hidden layer
net.biasConnect = [0;0]; % biases set to zero for easier experimenting
net.derivFcn ='defaultderiv'; % defaultderiv: tells Matlab to pick whatever derivative scheme works best for this net
view(net) % displays the network topology
net = train(net,x,t); % trains the network
W = net.LW{1,1}; U = net.IW{1,1}; V = net.LW{2,1}; % network matrices
Y = net(x); % Y: output when predicting on data x using W
net.LW{1,1} = rand(5,5); % This is the modified matrix W, W'
Y_prime = net(x) % Y_prime: output when predicting on data x using W'
max(abs(Y-Y_prime )); % The difference between the two outputs is 0 when it probably shouldn't be.
net.LW{1,1} holds the recurrent weights of your first layer. From the docs:
The weight matrix for the weight going to the ith layer from the jth
layer (or a null matrix [ ]) is located at net.LW{i,j} if
net.layerConnect(i,j) is 1 (or 0).
So net.LW{1,1} holds the weights to the first layer from the first layer (i.e. the recursion), whereas net.LW{2,1} stores the weights to the second layer from the first layer. Now, what does it mean when one can change the recurrent weights randomly without any effect? (In fact, you can even set them to zero with net.LW{1,1} = zeros(size(W)); without any effect.) Note that this is essentially the same as dropping the recursion and creating a simple feed-forward network.
Hypothesis: The recursion has no effect.
You will note that if you change the weights to the second layer (1 neuron) from the first layer (5 neurons) net.LW{2,1} = zeros(size(V));, it will affect your prediction (the same is of course true if you change the input weights net.IW).
Why does the recursion have no effect?
Well, that beats me. I have no idea where this special glitch is or what the theory is behind the newelm network.
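For comparison, here is a hand-written recurrence in base MATLAB. It is only a sketch and not the toolbox implementation: the update h_t = tanh(U*x_t + W*h_{t-1}), y_t = V*h_t with a zero initial state is an assumed standard Elman form (the question's equation is not reproduced in this thread), and all matrices are made-up values. It shows that W only matters from the second time step onward. If every sample were evaluated as an independent first step with a zero state, changing W would have no visible effect; whether that is what happens in the code above is not confirmed here.

```matlab
U  = (1:5)' / 10;                 % input weights (5x1), hypothetical
V  = ones(1, 5) / 5;              % output weights (1x5), hypothetical
Ws = {0.5 * eye(5), zeros(5)};    % recurrent weights W and the modified W'
x  = [0.1 0.2 0.3];               % one input sequence with 3 time steps

Y = zeros(2, numel(x));           % row 1: outputs with W, row 2: outputs with W'
for k = 1:2
    h = zeros(5, 1);              % zero initial hidden state
    for t = 1:numel(x)
        h = tanh(U * x(t) + Ws{k} * h);   % recurrent update
        Y(k, t) = V * h;                  % linear read-out
    end
end

% Difference between the two outputs at each time step
diffSeq = abs(Y(1, :) - Y(2, :));
disp(diffSeq)                     % first entry is zero, later entries are not
```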

Adaptive Linear regression

Let's say I have a set of samples, which consists of a non-stationary stochastic process with a uniform probability distribution (Gaussian). I need an adaptive linear regression over the set of samples. Basically I want the 'best-fit' line to behave a certain way. I have a separate signal, and I know the 'best-fit' line of the form Y=Mx+B will have a slope M proportional to that other signal. So I need the optimization problem to minimize the distance between the points BUT giving me a slope proportional to the other signal. What's the simplest machine learning/stats approach to use for this problem?
If I understand your question correctly, you can use ordinary regression or a gradient-descent type algorithm, but instead of using M and B as the degrees of freedom, use a proportionality constant k applied to the known slope, plus a separate B.
I.e. given the known signal
Y1 = M1*x + B1
fit the line
Y2 = k*M1*x + B2
and solve for k and B2 such that the mean squared difference between the fitted line and the observed (x, y) points is minimised.
In theory, this is intrinsic anyway: if you had solved for an unconstrained linear fit in the first place, k would simply be M2 / M1.
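A minimal least-squares sketch of this reparameterisation, assuming a single known slope M1 and synthetic, noise-free data (all values below are made up for illustration):

```matlab
M1 = 2;                         % slope of the known signal (hypothetical)
x  = (0:4)';                    % sample positions
y  = 3 * M1 * x + 1;            % synthetic observations: true k = 3, true B2 = 1

% Model: y ~ k*(M1*x) + B2, which is linear in the unknowns [k; B2]
A = [M1 * x, ones(numel(x), 1)];
p = A \ y;                      % least-squares solution via backslash

k  = p(1);                      % proportionality constant
B2 = p(2);                      % intercept
```

For a non-stationary signal, the same fit could be repeated over a sliding window of samples with the value of M1 that applies to each window, which is an assumption about how "adaptive" is meant here.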

Machine Learning: How can I include sensitivity or specificity in error calculation?

I have a working model of neural network used for classification. At the moment, I have applied cross-entropy to calculate an error between Test Outcome (model output) and Condition Outcome (true output). The model is used for binary classification but will be extended to handle multiple classes. So far, the error is calculated using cross-entropy in MATLAB:
err = -sum( y.*log(h(x)) + (1-y).*log(1-h(x)) )
I would like the model to perform in such a way that it produces more False Positives than False Negatives. I know there is a so-called confusion matrix where I can specify everything, but I don't know how this could correspond to the error calculation. Any suggestions are very welcome :)
Cheers!
You can weigh the positive class higher or lower than the negative class by introducing a scalar class weight. Since
y .* log(h(x))
represents the loss on the positive training samples and
(1 - y) .* log(1 - h(x))
is the loss on the negative training samples,
err = -sum(w .* y .* log(h(x)) + (1 - y) .* log(1 - h(x)))
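causes the positive training samples to be more important than the negative ones when w>1, and less important when w<1. With w>1, a missed positive (a false negative) costs more than a false alarm, so the model shifts toward producing false positives rather than false negatives. Make sure you modify the derivatives accordingly.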
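As a sketch of what "modify the derivatives accordingly" could look like, the snippet below evaluates the weighted error and its derivative for a few made-up labels and outputs. Here h stands for the vector of model outputs h(x), and the last line assumes the output unit is a logistic sigmoid; both are assumptions for illustration.

```matlab
y = [1 1 0 0];              % true labels (hypothetical)
h = [0.9 0.4 0.2 0.7];      % model outputs h(x), each in (0,1) (hypothetical)
w = 3;                      % weight on the positive class

% Weighted cross-entropy error
err = -sum(w .* y .* log(h) + (1 - y) .* log(1 - h));

% Derivative of err with respect to each output h
dErr_dh = -(w .* y ./ h - (1 - y) ./ (1 - h));

% If h = sigmoid(z), the chain rule gives the derivative with respect to z
dErr_dz = dErr_dh .* h .* (1 - h);   % equals (1 - y).*h - w.*y.*(1 - h)
```

Setting w = 1 recovers the usual unweighted gradient h - y with respect to z.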

local inverse of a neural network

I have a neural network with N input nodes and N output nodes, and possibly multiple hidden layers and recurrences in it but let's forget about those first. The goal of the neural network is to learn an N-dimensional variable Y*, given N-dimensional value X. Let's say the output of the neural network is Y, which should be close to Y* after learning. My question is: is it possible to get the inverse of the neural network for the output Y*? That is, how do I get the value X* that would yield Y* when put in the neural network? (or something close to it)
A major part of the problem is that N is very large, typically in the order of 10000 or 100000, but if anyone knows how to solve this for small networks with no recurrences or hidden layers that might already be helpful. Thank you.
If you can choose the neural network such that the number of nodes in each layer is the same, and the weight matrix is non-singular, and the transfer function is invertible (e.g. leaky relu), then the function will be invertible.
This kind of neural network is simply a composition of matrix multiplication, addition of a bias and a transfer function. To invert it, you just need to apply the inverse of each operation in reverse order. I.e. take the output, apply the inverse transfer function, subtract the bias, multiply by the inverse of the last weight matrix, apply the inverse transfer function again, subtract the bias, multiply by the inverse of the second-to-last weight matrix, and so on.
This is a task that maybe can be solved with autoencoders. You also might be interested in generative models like Restricted Boltzmann Machines (RBMs) that can be stacked to form Deep Belief Networks (DBNs). RBMs build an internal model h of the data v that can be used to reconstruct v. In DBNs, h of the first layer will be v of the second layer and so on.
zenna is right.
If you are using bijective (invertible) activation functions you can invert layer by layer, subtract the bias and take the pseudoinverse (if you have the same number of neurons per every layer this is also the exact inverse, under some mild regularity conditions).
To repeat the conditions: dim(X) == dim(Y) == dim(layer_i), det(Wi) ≠ 0
An example:
Y = tanh( W2*tanh( W1*X + b1 ) + b2 )
X = W1p*( tanh^-1( W2p*(tanh^-1(Y) - b2) ) -b1 ), where W2p and W1p represent the pseudoinverse matrices of W2 and W1 respectively.
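The example above can be checked numerically with a small sketch. The 2x2 weights, biases and input below are made-up values chosen so that both matrices are non-singular; pinv and atanh are standard base functions.

```matlab
W1 = [2 1; 1 3] / 4;    b1 = [0.1; -0.2];    % hypothetical first layer
W2 = [1 -1; 2 1] / 4;   b2 = [0.05; 0.1];    % hypothetical second layer
X  = [0.3; -0.5];                            % input to recover

% Forward pass
Y = tanh(W2 * tanh(W1 * X + b1) + b2);

% Inverse pass: undo each operation in reverse order
W1p = pinv(W1);
W2p = pinv(W2);
Xrec = W1p * (atanh(W2p * (atanh(Y) - b2)) - b1);

disp(norm(Xrec - X))    % reconstruction error, limited only by rounding
```

In practice atanh becomes numerically unstable when an output saturates near ±1, so the reconstruction may degrade for strongly saturated units.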
The following paper is a case study in inverting a function learned by a neural network. It comes from industry and looks like a good starting point for understanding how to set up the problem.
An alternative way to get the x that yields a desired y is to start with a random x (or a seed input) and repeatedly adjust x by gradient descent until it yields a y close to the desired one. The algorithm is similar to backpropagation, except that you take derivatives with respect to x instead of the weights and biases, and mini-batching is not needed. An advantage of this approach is that it accepts a seed (a starting x, if not randomly selected). I also have a hypothesis that the final x will retain some similarity to the initial x (the seed), which would imply that this algorithm has the ability to transpose, depending on the context of the neural network application.
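A minimal sketch of this gradient-descent search, assuming a single tanh layer y = tanh(W*x + b) and a squared-error loss. The weights, target, step size and iteration count are made-up values; a real network would obtain the gradient with respect to x by backpropagating through all of its layers.

```matlab
W = [2 1; 1 3] / 4;   b = [0.1; -0.2];     % hypothetical fixed network weights
ystar = tanh(W * [0.3; -0.5] + b);         % desired output (reachable by construction)

x  = zeros(2, 1);                          % seed input
lr = 0.5;                                  % step size (assumed)
for it = 1:2000
    y = tanh(W * x + b);                   % forward pass
    % Gradient of 0.5*||y - ystar||^2 with respect to x, via the chain rule
    g = W' * ((y - ystar) .* (1 - y.^2));
    x = x - lr * g;                        % adjust the input, not the weights
end

disp(norm(tanh(W * x + b) - ystar))        % remaining output error
```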

Matlab libsvm - how to find the w coefficients

How can I find the vector w, i.e. the perpendicular to the separating plane?
This is how I did it here. If I remember correctly, this is based on how the dual form of the SVM optimisation works out.
model = svmtrain(...);
w = (model.sv_coef' * full(model.SVs));
And the bias is (negative because libsvm writes the decision function as w*x' - rho):
bias = -model.rho;
Then to do the classification (for a linear SVM), for a N-by-M dataset 'features' with N instances and M features,
predictions = sign(features * w' + bias);
If the kernel is not linear, then this won't give you the right answer.
For more information, see "How could I generate the primal variable w of linear SVM?" from the manual of libsvm.
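To make the shapes concrete without needing libsvm installed, the sketch below uses a hand-built stand-in for the model struct. Only the three fields used above (sv_coef, SVs, rho) are mimicked and their values are made up; a real model returned by svmtrain contains more fields.

```matlab
% Hypothetical stand-in for a trained linear-kernel libsvm model
model.SVs     = sparse([1 1; -1 -1]);   % 2 support vectors, M = 2 features
model.sv_coef = [0.5; -0.5];            % alpha_i * y_i for each support vector
model.rho     = 0.5;

w    = model.sv_coef' * full(model.SVs);   % 1-by-M weight vector
bias = -model.rho;

features    = [2 3; -1 -2];                % N-by-M data, N = 2 instances
predictions = sign(features * w' + bias);  % N-by-1 vector of +1 / -1
```

Note that the sign of w and bias depends on which label libsvm saw first during training, so it is worth checking the predictions against svmpredict on a few samples.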