How can you constrain a neural network layer to simply be an n-dimensional rotation layer? - neural-network

I'm looking to constrain one layer of my neural network to specifically find the best rotation of its input in order to satisfy an objective. (My end goal, where R is the rotation layer, is of the form R.transpose() @ f(R @ z).)
I am looking to train this (plus other components) via gradient descent. If z is just two-dimensional, then I can just say
R = [ cos(theta)  -sin(theta)
      sin(theta)   cos(theta) ]
and have theta be a learnable parameter. However, I am lost on how to actually set this up for a d-dimensional space (where d > 10). I've tried looking at resources on how to construct a d-dimensional rotation matrix, and it quickly gets heavy into linear algebra that is way over my head. It feels like this should be easier than it seems, so I suspect I'm overlooking something (like maybe R should just be a usual linear layer without any non-linear activations).
Anyone have any ideas? I appreciate you in advance :)

QR decomposition can help with this (since Q is orthogonal): let W be an unconstrained learnable matrix (without a bias term), compute the decomposition W = QR, and use Q as your orthogonal matrix. If you use PyTorch's QR, backprop will differentiate through the decomposition and update W. (Note that Q is orthogonal but may have determinant -1; if you need a proper rotation, flip the sign of one column of Q whenever det(Q) < 0.)
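As a minimal sketch of that idea in PyTorch (the class name, dimensions and sign fix below are illustrative, not from the original post):

import torch
import torch.nn as nn

class RotationLayer(nn.Module):
    # Learnable orthogonal transform: an unconstrained W is mapped to an
    # orthogonal Q via QR; gradients flow back through the decomposition.
    def __init__(self, d):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d, d))

    def rotation(self):
        Q, R = torch.linalg.qr(self.W)
        # Flip column signs so diag(R) >= 0; this makes the factorization
        # canonical and keeps Q from jumping between sign conventions.
        return Q * torch.sign(torch.diagonal(R))

    def forward(self, z):
        Q = self.rotation()
        return z @ Q.T   # rows of z rotated by Q

Recent PyTorch versions also ship torch.nn.utils.parametrizations.orthogonal, which maintains this kind of constraint on a layer's weight automatically.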

Related

Adaptive linear regression

Let's say I have a set of samples from a non-stationary Gaussian stochastic process. I need an adaptive linear regression over these samples; basically, I want the 'best-fit' line to behave a certain way. I have a separate signal, and I know the best-fit line of the form Y = M*x + B will have a slope M proportional to that other signal. So I need the optimization problem to minimize the distance between the line and the points while constraining the slope to be proportional to the other signal. What's the simplest machine learning/stats approach to use for this problem?
If I understand your question correctly, you can just use normal regression, or a gradient-descent-type algorithm, but instead of having M and B as the degrees of freedom, use a proportionality constant k applied to the known slope, plus a separate intercept B2.
i.e., with the known signal:
Y1 = M1*x + B1
Y2 = k*M1*x + B2
solve for k and B2 such that the mean error over the (x, y) samples is minimised.
In theory this is intrinsic anyway: if you solved the problem with an ordinary linear fit in the first place, k would be M2/M1.
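As a minimal sketch of that fit with NumPy least squares (the data, the known slope m1, and the true k and B2 below are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
m1 = 2.0                                              # known slope signal
y = 1.5 * m1 * x + 3.0 + rng.normal(0, 0.1, x.size)   # true k = 1.5, B2 = 3

# Model y ~ k*(m1*x) + B2: design matrix with columns [m1*x, 1].
A = np.column_stack([m1 * x, np.ones_like(x)])
(k, B2), *_ = np.linalg.lstsq(A, y, rcond=None)
print(k, B2)   # approximately 1.5 and 3.0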

Linear program with double constraints on the same variable

I have a linear program of the form min f*x s.t. A1*x < d1; A2*x < d2. The form with one constraint is implemented in MATLAB in the command linprog. What command can I use to solve a linear program with two constraints?
I could of course create a block-diagonal matrix and double the size of the variable x, but if there is a more efficient way I would like to use it, because the size of the matrix is quite large.
Possibly I don't understand the question right, but can't you just combine the matrices A1 and A2 as A = [A1; A2] (and likewise d = [d1; d2])?
You may be interested in the Dantzig-Wolfe decomposition algorithm for solving linear programs; it takes advantage of this block-diagonal structure. However, I don't think there is an out-of-the-box implementation of it in commercial software.
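For illustration, here is the stacking idea as a sketch in Python with scipy.optimize.linprog (the tiny f, A1, A2, d1, d2 are placeholders; MATLAB's linprog accepts the stacked A and d in exactly the same way):

import numpy as np
from scipy.optimize import linprog

f = np.array([1.0, 2.0, 3.0])
A1 = np.array([[1.0, 1.0, 0.0]]); d1 = np.array([4.0])
A2 = np.array([[0.0, 1.0, 1.0]]); d2 = np.array([5.0])

# Stack both constraint sets into a single inequality system A*x <= d.
A = np.vstack([A1, A2])
d = np.concatenate([d1, d2])

res = linprog(f, A_ub=A, b_ub=d, bounds=[(0, None)] * 3)
print(res.x)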

How to customize an SVM non-linear decision boundary in MATLAB?

I want to train an SVM with a non-linear boundary. The boundary is known and expressed by the formula
y = sgn( (w11*x1 + w12*x2 + w13*x3) * (w21*x4 + w22*x5 + w23*x6) ), where [x1 x2 ... x6] are 1-bit inputs and [w11 w12 w13 w21 w22 w23] are unknown parameters.
How can I learn [w11 w12 w13 w21 w22 w23] from training data?
SVM is not an algorithm for such a task. SVM has its own criterion to maximize, which has nothing to do with the shape of the decision boundary (ok, not nothing, but it is hard to convert one into the other). Obviously, one can try to predefine a custom kernel function to do so, but this seems almost unsolvable (I can't think of any reproducing kernel Hilbert space with such decision boundaries).
In short: your question is a bit like "how to make a watermelon remove nails from the wall?". Obviously you can do some pretty hard "magic" to do so, but this is not what watermelons are for.

FFT matrix-vector multiplication

I have to solve in MATLAB a linear system of equations A*x = B, where A is symmetric and its elements depend on the difference of the indices: A(i,j) = f(i-j).
I use iterative solvers because the size of A is, say, 40000x40000. The iterative solvers require the product A*x, where x is the trial solution. Evaluating this product turns out to be a convolution and can therefore be done by means of fast Fourier transforms (CPU time ~ N*log(N) instead of N^2). I have the following questions about this problem:
Is this convolution circular? Because if it is circular, I think I have to use a specific indexing for the new matrices to take the FFT. Is that right?
I find it difficult to program the routine for the FFT because I cannot understand the indexing I should use. Is there any ready routine I can use to evaluate the product A*x directly by FFT, rather than the convolution? Actually, the matrix A is constructed of 3x3 blocks and is symmetric. A ready routine for the product A*x would be the best solution for me.
In case there is no ready routine, could you give me an idea, by example, of how I could construct such a routine to evaluate a matrix-vector product by FFT?
Thank you in advance,
Panos
Very good and interesting question! :)
For certain special matrix structures, the Ax = b problem can be solved very quickly.
Circulant matrices.
Matrices corresponding to cyclic convolution Ax = h*x (where * is the convolution symbol) are diagonalized in the Fourier domain, and can be solved by:
x = ifft(fft(b)./fft(h));
Triangular and banded.
Triangular matrices and diagonally-dominant banded matrices are solved efficiently by sparse LU factorization:
[L,U] = lu(sparse(A)); x = U\(L\b);
Poisson problem.
If A is a finite difference approximation of the Laplacian, the problem is efficiently solved by multigrid methods (e.g., web search for "matlab multigrid").
Interesting question!
The convolution is not circular in your case, unless you impose additional conditions: A would have to be circulant, i.e., each row a cyclic shift of the previous one (in the example below, that would require e = d, f = c and g = b).
You could do it with conv (retaining only the non-zero-padded part with the option 'valid'), which probably is also N*log(N). For example, let
A = [ a b c d
      e a b c
      f e a b
      g f e a ];
Then A*x is the same as
conv(fliplr([g f e a b c d]),x,'valid').'
Or more generally, A*x is the same as
conv(fliplr([A(end,1:end-1) A(1,:)]),x,'valid').'
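For the FFT route specifically, here is a sketch in NumPy (for illustration; the same indexing carries over to MATLAB's fft/ifft). It embeds the Toeplitz kernel in a length-(2N-1) circulant one by zero-padding x, which also answers the circularity question above:

import numpy as np
from scipy.linalg import toeplitz

def toeplitz_matvec_fft(first_col, first_row, x):
    # A(i,j) = t(i-j): t(k >= 0) comes from the first column,
    # t(k < 0) from the first row, wrapped to the end of the kernel.
    n = len(x)
    c = np.concatenate([first_col, first_row[:0:-1]])   # length 2n-1
    # Circular convolution of the kernel with zero-padded x equals A*x
    # in its first n entries.
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x, len(c)))
    return y[:n].real

# Check against the dense product on a small made-up example.
col = np.array([1.0, 2.0, 3.0, 4.0])   # [a e f g] in the notation above
row = np.array([1.0, 5.0, 6.0, 7.0])   # [a b c d]
x = np.array([1.0, -1.0, 2.0, 0.5])
print(np.allclose(toeplitz(col, row) @ x, toeplitz_matvec_fft(col, row, x)))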
I'd like to add some comments on Pio_Koon's answer.
First of all, I wouldn't advise following the suggestion for triangular and banded matrices. The time taken by a call to Matlab's lu() on a large sparse matrix massively overshadows any benefit gained by solving the linear system as x = U\(L\b).
Second, in the Poisson problem you end up with a circulant matrix, so you can solve it using the FFT as described above. In this specific case, your convolution mask h is a Laplacian, i.e., h = [0 -0.25 0; -0.25 1 -0.25; 0 -0.25 0].

Local inverse of a neural network

I have a neural network with N input nodes and N output nodes, and possibly multiple hidden layers and recurrences in it, but let's forget about those for now. The goal of the neural network is to learn an N-dimensional variable Y*, given an N-dimensional value X. Let's say the output of the neural network is Y, which should be close to Y* after learning. My question is: is it possible to get the inverse of the neural network for the output Y*? That is, how do I get the value X* that would yield Y* (or something close to it) when put into the neural network?
A major part of the problem is that N is very large, typically on the order of 10000 or 100000, but if anyone knows how to solve this for small networks with no recurrences or hidden layers, that might already be helpful. Thank you.
If you can choose the neural network such that the number of nodes in each layer is the same, the weight matrices are non-singular, and the transfer function is invertible (e.g., leaky ReLU), then the function will be invertible.
This kind of neural network is simply a composition of matrix multiplications, bias additions and transfer functions. To invert, you just apply the inverse of each operation in the reverse order: take the output, apply the inverse transfer function, subtract the last bias, multiply by the inverse of the last weight matrix, apply the inverse transfer function again, subtract the next bias, multiply by the inverse of the second-to-last weight matrix, and so on.
This is a task that maybe can be solved with autoencoders. You also might be interested in generative models like Restricted Boltzmann Machines (RBMs) that can be stacked to form Deep Belief Networks (DBNs). RBMs build an internal model h of the data v that can be used to reconstruct v. In DBNs, h of the first layer will be v of the second layer and so on.
zenna is right.
If you are using bijective (invertible) activation functions, you can invert layer by layer: subtract the bias and apply the pseudoinverse (if you have the same number of neurons in every layer, this is also the exact inverse, under some mild regularity conditions).
To repeat the conditions: dim(X) == dim(Y) == dim(layer_i), and det(W_i) != 0.
An example:
Y = tanh( W2*tanh( W1*X + b1 ) + b2 )
X = W1p*( tanh^-1( W2p*( tanh^-1(Y) - b2 ) ) - b1 ), where W2p and W1p are the pseudoinverse matrices of W2 and W1, respectively.
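A quick numerical check of that formula in NumPy (random shapes and values, purely illustrative):

import numpy as np

rng = np.random.default_rng(1)
d = 5
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b1, b2 = rng.normal(size=d), rng.normal(size=d)
x = rng.normal(size=d)

y = np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)          # forward pass

# Inverse pass; pinv is the exact inverse here since the Wi are square
# and (with probability 1 for Gaussian entries) non-singular.
W1p, W2p = np.linalg.pinv(W1), np.linalg.pinv(W2)
x_rec = W1p @ (np.arctanh(W2p @ (np.arctanh(y) - b2)) - b1)
print(np.allclose(x, x_rec))   # True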
The following paper is a case study of inverting a function learned by a neural network. It is a case study from industry and looks like a good starting point for understanding how to set up the problem.
An alternate way of approaching the task of getting the desired x that yields a desired y is to start with a random x (or a seed input), then repeatedly adjust x via gradient descent until it yields a y close to the desired y. The algorithm is similar to backpropagation, except that instead of computing derivatives of the weights and biases, you compute derivatives of x; mini-batching is also not needed. This approach has the advantage of allowing a seed (a starting x, if not randomly selected). I also have a hypothesis that the final x will retain some similarity to the initial x (the seed), which would imply this algorithm has a kind of "transposing" ability, depending on the context of the neural network application.
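A minimal PyTorch sketch of this input-optimization idea (the network net, its size, and the target y_star are placeholders):

import torch

net = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Tanh(),
                          torch.nn.Linear(8, 8))   # stand-in for a trained net
for p in net.parameters():
    p.requires_grad_(False)                        # freeze the weights
y_star = torch.randn(8)                            # desired output

x = torch.randn(8, requires_grad=True)             # the seed input
opt = torch.optim.Adam([x], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(net(x), y_star)
    loss.backward()
    opt.step()
# x now (approximately) yields y_star when fed through the network.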