Why do we "unroll" thetas in neural network back propagation?

Why do we "unroll" thetas in neural network back propagation? - neural-network

In backpropagation implementation, it seems like a norm to unroll (make the thetas as an one-dimensional vectors) thetas and then pass them as a parameter to the cost function.
To illustrate (I asuume 3 layer NN case):
def NNCostFunction(unrolled_thetas, input_layer_size, hidden_layer_size, num_labels, X, y):
# **ROLL AGAIN** unrolled_thetas to theta1, theta2 (3 layer assumption)
# Forward propagate to calculate the cost
# Then Back propagate to calculate the delta
return cost, gradient_theta1, gradient_theta2
What makes me wonder is:
Why do we pass unrolled thetas to the function and then roll it again (form the original shape of thetas) inside of the function? Why don't we just pass original thetas to the cost function?
I think I'm not grasping something important here. There must be a reason why we are doing this. Is it because the optimization implementation in most languages only take theta as a vector? Please shed some light on my understanding!
Thank you.

I figured out. Unrolling was NOT backpropagation specific.
In order to use an of-the-shelf minimizer such as fmincg, the cost function has been set up to unroll the parameters into a single vector params!

I maybe wrong but from what i understand many of functions we use for gradient descent like advance optimization functions such as fminuc does need a vector which is unrolled for better calculation.
Same goes for other matrix calculations, it makes it a lot easier in those things. thats all i know....there may be better reasons for this too.

Related

Double numerical integration in Matlab - Singularity

I have discrete data of a 2D function defined as
theta = linspace(0,pi,nTheta);
phi = linspace(0,2*pi,nPhi);
p=zeros(nPhi,nTheta);%only to show the dimension of my matrix
[np,nt]=ndgrid(phi,theta);
f1 = griddedInterpolant(np,nt,p,'spline');
f2= #(np,nt) f1(np,nt);
integral2(f2,0,2*pi,0,pi)
Note that p is calculated from a complex physical problem, but i showed above how it is initialized.
Also, I can increase nTheta and nPhi, which leads to more accurate calculation of p.
My calculated function (with nPhi=400,nTheta=200) is something like:
I tried 3 ways :
using Trapz function
using the code above but with linear interpolation for gridded interpolant
using the code above with spline interpolation
Although the spline is better than others, i still need to increase nPhi and nTheta, which makes it impossible for me to do the simulation due to its cost.
Is there any suggestion except these 3 methods or any general suggestion how i can do this computation more efficient? (I also took advantage of the symmetry in both directions)
Note that the shape of my function varies in each time step, so a local mesh refinement might be challenging because i don't know the detail of my function in advance.

Nonlinear curve fitting of a matrix function in python

I have the following problem. I have a N x N real matrix called Z(x; t), where x and t might be vectors in general. I have N_s observations (x_k, Z_k), k=1,..., N_s and I'd like to find the vector of parameters t that better approximates the data in the least square sense, which means I want t that minimizes
S(t) = \sum_{k=1}^{N_s} \sum_{i=1}^{N} \sum_{j=1}^N (Z_{k, i j} - Z(x_k; t))^2
This is in general a non-linear fitting of a matrix function. I'm only finding examples in which one has to fit scalar functions which are not immediately generalizable to a matrix function (nor a vector function). I tried using the scipy.optimize.leastsq function, the package symfit and lmfit, but still I don't manage to find a solution. Eventually, I'm ending up writing my own code...any help is appreciated!

You can do curve-fitting with multi-dimensional data. As far as I am aware, none of the low-level algorithms explicitly support multidimensional data, but they do minimize a one-dimensional array in the least-squares sense. And the fitting methods do not really care about the "independent variable(s)" x except in that they help you calculate the array to be minimized - perhaps to calculate a model function to match to y data.
That is to say: if you can write a function that would take the parameter values and calculate the matrix to be minimized, just flatten that 2-d (on n-d) array to one dimension. The fit will not mind.

Optimizing huge amounts of calls of fsolve in Matlab

I'm solving a pair of non-linear equations for each voxel in a dataset of a ~billion voxels using fsolve() in MATLAB 2016b.
I have done all the 'easy' optimizations that I'm aware of. Memory localization is OK, I'm using parfor, the equations are in fairly numerically simple form. All discontinuities of the integral are fed to integral(). I'm using the Levenberg-Marquardt algorithm with good starting values and a suitable starting damping constant, it converges on average with 6 iterations.
I'm now at ~6ms per voxel, which is good, but not good enough. I'd need a order of magnitude reduction to make the technique viable. There's only a few things that I can think of improving before starting to hammer away at accuracy:
The splines in the equation are for quick sampling of complex equations. There are two for each equation, one is inside the 'complicated nonlinear equation'. They represent two equations, one which is has a large amount of terms but is smooth and has no discontinuities and one which approximates a histogram drawn from a spectrum. I'm using griddedInterpolant() as the editor suggested.
Is there a faster way to sample points from pre-calculated distributions?
parfor i=1:numel(I1)
sols = fsolve(#(x) equationPair(x, input1, input2, ...
6 static inputs, fsolve options)
output1(i) = sols(1); output2(i) = sols(2)
end
When calling fsolve, I'm using the 'parametrization' suggested by Mathworks to input the variables. I have a nagging feeling that defining a anonymous function for each voxel is taking a large slice of the time at this point. Is this true, is there a relatively large overhead for defining the anonymous function again and again? Do I have any way to vectorize the call to fsolve?
There are two input variables which keep changing, all of the other input variables stay static. I need to solve one equation pair for each input pair so I can't make it a huge system and solve it at once. Do I have any other options than fsolve for solving pairs of nonlinear equations?
If not, some of the static inputs are the fairly large. Is there a way to keep the inputs as persistent variables using MATLAB's persistent, would that improve performance? I only saw examples of how to load persistent variables, how could I make it so that they would be input only once and future function calls would be spared from the assumedly largish overhead of the large inputs?
EDIT:
The original equations in full form look like:
Where:
and:
Everything else is known, solving for x_1 and x_2. f_KN was approximated by a spline. S_low (E) and S_high(E) are splines, the histograms they are from look like:

So, there's a few things I thought of:
Lookup table
Because the integrals in your function do not depend on any of the parameters other than x, you could make a simple 2D-lookup table from them:
% assuming simple (square) range here, adjust as needed
[x1,x2] = meshgrid( linspace(0, xmax, N) );
LUT_high = zeros(size(x1));
LUT_low = zeros(size(x1));
for ii = 1:N
LUT_high(:,ii) = integral(#(E) Fhi(E, x1(1,ii), x2(:,ii)), ...
0, E_high, ...
'ArrayValued', true);
LUT_low(:,ii) = integral(#(E) Flo(E, x1(1,ii), x2(:,ii)), ...
0, E_low, ...
'ArrayValued', true);
end
where Fhi and Flo are helper functions to compute those integrals, vectorized with scalar x1 and vector x2 in this example. Set N as high as memory will allow.
Those lookup tables you then pass as parameters to equationPair() (which allows parfor to distribute the data). Then just use interp2 in equationPair():
F(1) = I_high - interp2(x1,x2,LUT_high, x(1), x(2));
F(2) = I_low - interp2(x1,x2,LUT_low , x(1), x(2));
So, instead of recomputing the whole integral every time, you evaluate it once for the expected range of x, and reuse the outcomes.
You can specify the interpolation method used, which is linear by default. Specify cubic if you're really concerned about accuracy.
Coarse/Fine
Should the lookup table method not be possible for some reason (memory limitations, in case the possible range of x is too big), here's another thing you could do: split up the whole procedure in 2 parts, which I'll call coarse and fine.
The intent of the coarse method is to improve your initial estimates really quickly, but perhaps not so accurately. The quickest way to approximate that integral by far is via the rectangle method:
do not approximate S with a spline, just use the original tabulated data (so S_high/low = [S_high/low#E0, S_high/low#E1, ..., S_high/low#E_high/low]
At the same values for E as used by the S data (E0, E1, ...), evaluate the exponential at x:
Elo = linspace(0, E_low, numel(S_low)).';
integrand_exp_low = exp(x(1)./Elo.^3 + x(2)*fKN(Elo));
Ehi = linspace(0, E_high, numel(S_high)).';
integrand_exp_high = exp(x(1)./Ehi.^3 + x(2)*fKN(Ehi));
then use the rectangle method:
F(1) = I_low - (S_low * Elo) * (Elo(2) - Elo(1));
F(2) = I_high - (S_high * Ehi) * (Ehi(2) - Ehi(1));
Running fsolve like this for all I_low and I_high will then have improved your initial estimates x0 probably to a point close to "actual" convergence.
Alternatively, instead of the rectangle method, you use trapz (trapezoidal method). A tad slower, but possibly a bit more accurate.
Note that if (Elo(2) - Elo(1)) == (Ehi(2) - Ehi(1)) (step sizes are equal), you can further reduce the number of computations. In that case, the first N_low elements of the two integrands are identical, so the values of the exponentials will only differ in the N_low + 1 : N_high elements. So then just compute integrand_exp_high, and set integrand_exp_low equal to the first N_low elements of integrand_exp_high.
The fine method then uses your original implementation (with the actual integral()s), but then starting at the updated initial estimates from the coarse step.
The whole objective here is to try and bring the total number of iterations needed down from about 6 to less than 2. Perhaps you'll even find that the trapz method already provides enough accuracy, rendering the whole fine step unnecessary.
Vectorization
The rectangle method in the coarse step outlined above is easy to vectorize:
% (uses R2016b implicit expansion rules)
Elo = linspace(0, E_low, numel(S_low));
integrand_exp_low = exp(x(:,1)./Elo.^3 + x(:,2).*fKN(Elo));
Ehi = linspace(0, E_high, numel(S_high));
integrand_exp_high = exp(x(:,1)./Ehi.^3 + x(:,2).*fKN(Ehi));
F = [I_high_vector - (S_high * integrand_exp_high) * (Ehi(2) - Ehi(1))
I_low_vector - (S_low * integrand_exp_low ) * (Elo(2) - Elo(1))];
trapz also works on matrices; it will integrate over each column in the matrix.
You'd call equationPair() then using x0 = [x01; x02; ...; x0N], and fsolve will then converge to [x1; x2; ...; xN], where N is the number of voxels, and each x0 is 1×2 ([x(1) x(2)]), so x0 is N×2.
parfor should be able to slice all of this fairly easily over all the workers in your pool.
Similarly, vectorization of the fine method should also be possible; just use the 'ArrayValued' option to integral() as shown above:
F = [I_high_vector - integral(#(E) S_high(E) .* exp(x(:,1)./E.^3 + x(:,2).*fKN(E)),...
0, E_high,...
'ArrayValued', true);
I_low_vector - integral(#(E) S_low(E) .* exp(x(:,1)./E.^3 + x(:,2).*fKN(E)),...
0, E_low,...
'ArrayValued', true);
];
Jacobian
Taking derivatives of your function is quite easy. Here is the derivative w.r.t. x_1, and here w.r.t. x_2. Your Jacobian will then have to be a 2×2 matrix
J = [dF(1)/dx(1) dF(1)/dx(2)
dF(2)/dx(1) dF(2)/dx(2)];
Don't forget the leading minus sign (F = I_hi/lo - g(x) → dF/dx = -dg/dx)
Using one or both of the methods outlined above, you can implement a function to compute the Jacobian matrix and pass this on to fsolve via the 'SpecifyObjectiveGradient' option (via optimoptions). The 'CheckGradients' option will come in handy there.
Because fsolve usually spends the vast majority of its time computing the Jacobian via finite differences, manually computing a value for it manually will normally speed the algorithm up tremendously.
It will be faster, because
fsolve doesn't have to do extra function evaluations to do the finite differences
the convergence rate will increase due to the improved precision of the Jacobian
Especially if you use the rectangle method or trapz like above, you can reuse many of the computations you've already done for the function values themselves, meaning, even more speed-up.

Rody's answer was the correct one. Supplying the Jacobian was the single largest factor. Especially with the vectorized version, there were 3 orders of magnitude of difference in speed with the Jacobian supplied and not.
I had trouble finding information about this subject online so I'll spell it out here for future reference: It is possible to vectorize independant parallel equations with fsolve() with great gains.
I also did some work with inlining fsolve(). After supplying the Jacobian and being smarter about the equations, the serial version of my code was mostly overhead at ~1*10^-3 s per voxel. At that point most of the time inside the function was spent passing around a options -struct and creating error-messages which are never sent + lots of unused stuff assumedly for the other optimization functions inside the optimisation function (levenberg-marquardt for me). I succesfully butchered the function fsolve and some of the functions it calls, dropping the time to ~1*10^-4s per voxel on my machine. So if you are stuck with a serial implementation e.g. because of having to rely on the previous results it's quite possible to inline fsolve() with good results.
The vectorized version provided the best results in my case, with ~5*10^-5 s per voxel.

How to extrapolate matrix valued functions on Matlab?

I have a matrix valued function which I'm trying to find its limit as x goes to 1.
So, in this example, I have three matrices v1-3, representing respectively the sampled values at [0.85, 0.9, 0.99]. What I do now, which is quite inefficient, is the following:
for i=1:101
for j = 1:160
v_splined = spline([0.85,0.9,0.99], [v1(i,j), v2(i,j), v3(i,j)], [1]);
end
end
There must be a better more efficient way to do this. Especially when soon enough I'll face the situation where v's will be 4-5 dimensional vectors.
Thanks!

Disclaimer: Naively extrapolating is risky business, do so at your own risk
Here's what I would say
Using a spline to extrapolate is risky business and not generally recommended. Do you know anything about the behavior of your function near x=1?
In the case where you only have 3 points you're probably better off using a 2nd order polynomial (a parabola) rather than fitting a spline through the three points. (unless you have a good reason not to do this.)
If you want to use a parabola (or higher order interpolating polynomial when you have more points), you can vectorize your code and use Lagrange or Newton polynomials to perform the extrapolation which will probably give you a nice speed up.
Using interpolating polynomials will also generalize easily to higher order polynomials with more points given. However, this will make extrapolation even more risky since high-order interpolating polynomials tend to oscillate severely near the ends of the domain.
If you want to use Lagrange polynomials to form a parabola, your result is given by:
v_splined = v1*(1-.9)*(1-.99)/( (.85-.9)*(.85-.99) ) ...
+v2*(1-.85)*(1-.99)/( (.9-.85)*(.9-.99) ) ...
+v3*(1-.85)*(1-.9)/( (.99-.85)*(.99-.9) );
I left this un-simplified so you can see how it comes from the Lagrange polynomials, but obviously simplifying is easy. Also note that this eliminates the need for loops.

Matlab recursive curve fitting with custom equations

I have a curve IxV. I also have an equation that I want to fit in this IxV curve, so I can adjust its constants. It is given by:
I = I01(exp((V-R*I)/(n1*vth))-1)+I02(exp((V-R*I)/(n2*vth))-1)
vth and R are constants already known, so I only want to achieve I01, I02, n1, n2. The problem is: as you can see, I is dependent on itself. I was trying to use the curve fitting toolbox, but it doesn't seem to work on recursive equations.
Is there a way to make the curve fitting toolbox work on this? And if there isn't, what can I do?

Assuming that I01 and I02 are variables and not functions, then you should set the problem up like this:
a0 = [I01 I02 n1 n2];
MinFun = #(a) abs(a(1)*(exp(V-R*I)/(a(3)*vth))-1) + a(2)*(exp((V-R*I)/a(4)*vth))-1) - I);
aout = fminsearch(a0,MinFun);
By subtracting I and taking the absolute value, the point where both sides are equal will be the point where MinFun is zero (minimized).

No, the CFTB cannot fit such recursively defined functions. And errors in I, since the true value of I is unknown for any point, will create a kind of errors in variables problem. All you have are the "measured" values for I.
The problem of errors in I MAY be serious, since any errors in I, or lack of fit, noise, model problems, etc., will be used in the expression itself. Then you exponentiate these inaccurate values, potentially casing a mess.
You may be able to use an iterative approach. Thus something like
% 0. Initialize I_pred
I_pred = I;
% 1. Estimate the values of your coefficients, for this model:
% (The curve fitting toolbox CAN solve this problem, given I_pred)
I = I01(exp((V-R*I_pred)/(n1*vth))-1)+I02(exp((V-R*I_pred)/(n2*vth))-1)
% 2. Generate new predictions for I_pred
I_pred = I01(exp((V-R*I_pred)/(n1*vth))-1)+I02(exp((V-R*I_pred)/(n2*vth))-1)
% Repeat steps 1 and 2 until the parameters from the CFTB stabilize.
The above pseudo-code will work only if your starting values are good, and there are not large errors/noise in the model/data. Even on a good day, the above approach may not converge well. But I see little hope otherwise.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Why do we "unroll" thetas in neural network back propagation? - neural-network

I figured out. Unrolling was NOT backpropagation specific. In order to use an of-the-shelf minimizer such as fmincg, the cost function has been set up to unroll the parameters into a single vector params!

Related

Double numerical integration in Matlab - Singularity

Nonlinear curve fitting of a matrix function in python

Optimizing huge amounts of calls of fsolve in Matlab

How to extrapolate matrix valued functions on Matlab?

Matlab recursive curve fitting with custom equations

Categories

Resources