How to compute derivative of a neural network model at a particular point using MATLAB?

In using neural networks to solve differential equations in MATLAB, I can find the derivative of the model using the command 'dlgradient', i.e.
gradientsU = dlgradient(sum(U,'all'),{dlX},'EnableHigherDerivatives',true)
where U is the model found using fullyconnect and dlX is the input.
Now, how can we calculate the derivative of the model U at a particular point?
To be specific, I want to add the derivative of the model at a particular point, say U_0 = U'(5), to the loss function. How can I compute that?
I have followed the MATLAB documentation, which says:
"To evaluate Rosenbrock's function and its gradient at the point [-1,2], create a dlarray of the point and then call dlfeval on the function handle @rosenbrock.
x0 = dlarray([-1,2]);
[fval,gradval] = dlfeval(@rosenbrock,x0);"
But I have already called dlfeval to evaluate the model, so I am not able to call dlfeval again. And when I try to compute it directly using the command dlfeval(@U,U_0), the output is always zero.
It will be really helpful if some insight could be provided. Thanks in advance.
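One pattern that seems to work (a sketch, not from the original thread) is to compute the point derivative inside the same function that dlfeval already calls, so that the extra point is traced together with everything else. Here, model, modelGradients and the target value are placeholder names used only for illustration:

function [loss,gradients] = modelGradients(parameters,dlX,dlX0,target)
    % Forward pass on the training inputs
    U = model(parameters,dlX);
    % dU/dX on the training inputs, kept differentiable so it can enter the loss
    dUdX = dlgradient(sum(U,'all'),dlX,'EnableHigherDerivatives',true);
    % Forward pass and derivative at the extra point, e.g. dlX0 = dlarray(5)
    U0 = model(parameters,dlX0);
    dUdX0 = dlgradient(sum(U0,'all'),dlX0,'EnableHigherDerivatives',true);
    % Example loss: a residual term plus the point-derivative term U'(5)
    loss = sum((dUdX - target).^2,'all') + dUdX0.^2;
    % Gradients of the loss with respect to the learnable parameters
    gradients = dlgradient(loss,parameters);
end

% Called once per training iteration, with the extra point passed in as a dlarray:
% [loss,gradients] = dlfeval(@modelGradients,parameters,dlX,dlarray(5),target);

Both dlX and dlX0 are passed into dlfeval as dlarray inputs so that dlgradient can differentiate with respect to them inside the trace.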

Related

ANN: Approximating non-linear function with neural network

I am learning to build neural networks for regression problems. It works well for approximating linear functions. A 1-5-1 setup with linear activation functions in the hidden and output layers does the trick, and the results are fast and reliable. However, when I try to feed it simple quadratic data (f(x) = x*x), here is what happens:
With a linear activation function, it tries to fit a linear function through the dataset.
And with a TANH function, it tries to fit a TANH curve through the dataset.
This makes me believe that the current setup is inherently unable to learn anything but a linear relation, since it's repeating the shape of the activation function on the chart. But this may not be true, because I've seen other implementations learn curves just perfectly. So I may be doing something wrong. Please provide your guidance.
About my code
My weights are randomized in (-1, 1), and inputs are not normalized. The dataset is fed in random order. Changing the learning rate or adding layers does not change the picture much.
I've created a jsfiddle; the place to play with it is this function:
function trainingSample(n) {
    return [[n], [n]];
}
It produces a single training sample: an array of an input vector array and a target vector array.
In this example it produces an f(x)=x function. Modify it to be [[n], [n*n]] and you've got a quadratic function.
The play button is at the upper right, and there are also two input boxes to manually enter these values. If the target (right) box is left empty, you can test the output of the network by feedforward only.
There is also a configuration file for the network in the code, where you can set learning rate and other things. (Search for var Config)
It has occurred to me that in the setup I am describing, it is impossible to learn non-linear functions because of the choice of features. Nowhere in the forward pass is there an input dependency of power higher than 1, which is why I am seeing a snapshot of my activation function in the output. Duh.
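For contrast, a minimal sketch in MATLAB (used here to match the other threads) of a setup that does learn f(x) = x^2: a non-linear tansig hidden layer feeding a linear output layer, which is what fitnet builds by default. The data range and layer size below are my own choices:

x = linspace(-2,2,200);    % inputs
t = x.^2;                  % quadratic targets
net = fitnet(5);           % 5 tansig hidden units, purelin output
net = train(net,x,t);
y = net(x);                % y now follows the parabola closely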

Non-linear regression using custom neural network in MatLab

I am very new to MATLAB. I got a task to model non-linear regression using a neural network in MATLAB.
I need to create a two-layer neural network where:
The first layer has N neurons with a sigmoid activation function.
The second layer has one neuron with a linear activation function.
Here is how I implemented the network:
net = network(N, 2);
net.layers{1}.transferFcn = 'logsig';
net.layers{1}.size = N
net.layers{2}.size = 1;
Is this implementation correct? How should I assign the linear activation function to the second layer?
A quick reading of the Matlab help on transfer functions (nntransfer) gives you the list of all possible transfer functions you can use. In your case I think you should try either poslin (positive linear) or purelin (pure linear).
When you have such questions, the best way is actually to 'ask' Matlab what possibilities you have.
In this case, I just typed net.layers{2} in the Matlab console window. This displays the list of parameters of the 2nd layer. Then you just click on the TransferFcn link, and the Matlab help page with the possible options for this parameter value opens automatically. This works for any parameter of your neural network ;)
You didn't set the transfer function for the second layer:
net.layers{2}.transferFcn = 'purelin';
The rest is OK.
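For reference, a minimal sketch of what the full custom-network setup might look like with the network object (x and t below stand for your training inputs/targets, and the training/performance functions are just common choices, not requirements):

net = network(1, 2);                   % network(numInputs,numLayers): 1 input, 2 layers
net.inputConnect(1) = 1;               % the input feeds layer 1
net.layerConnect(2,1) = 1;             % layer 1 feeds layer 2
net.outputConnect(2) = 1;              % layer 2 is the output layer
net.biasConnect = [1; 1];              % biases on both layers
net.layers{1}.size = N;                % N hidden neurons
net.layers{1}.transferFcn = 'logsig';  % sigmoid hidden layer
net.layers{2}.size = 1;                % single output neuron
net.layers{2}.transferFcn = 'purelin'; % linear output layer
net.trainFcn = 'trainlm';              % a common choice for regression
net.performFcn = 'mse';
net = configure(net, x, t);            % size the input/output to the data
net = train(net, x, t);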

about backpropagation and sigmoid function

I have been reading this ebook about ANNs: https://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf
and have a doubt about the effect of the sigmoid function when calculating ErrorB. The text says that if I have a threshold neuron I can use:
Target-Output
but because I have a sigmoid function involved I should add:
Output(1-Output)
and end up with:
ErrorB=OutputB(1-OutputB)(TargetB-OutputB)
I mean, why should I add the O(1-O) part? I have tried different values, but I really do not get the intuition for why it should be that way.
Any help?
Thanks
As Kelu stated, that part of the equation is based on the derivative of your transfer function (in this case the sigmoid). To understand why you need derivatives, you need to understand how the delta rule works(*):
Your overall goal is to minimize the error in the network's output using gradient descent. Gradient descent itself tries to find a minimum of the error function (E) by taking steps proportional to the negative of the gradient. A gradient is simply the derivative, and the reason you work with derivatives mathematically is that gradients point in the direction of the greatest rate of increase of the (error) function. Conclusion: since you want to minimize the error, you go in the opposite direction of the gradient.
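As a tiny illustration of that idea (a toy error function of my own choosing, not from the book):

E = @(w) (w - 3)^2;           % toy error as a function of a single weight
dEdw = @(w) 2*(w - 3);        % its gradient
w = 0;                        % initial weight
eta = 0.1;                    % learning rate
for k = 1:50
    w = w - eta * dEdw(w);    % step proportional to the negative gradient
end
% w approaches 3, the minimizer of E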
This is the intuitive reason for using gradients. If you want the mathematical derivation, you should check this basic wiki article (an additional comment, as it's not mentioned anywhere there: the g'(x) in the article is the first derivative of g(x)).
Other transfer functions can be used, e.g. a linear one (in which case there is no g'(x) term, as the derivative is simply a constant) or the hyperbolic tangent, in which case the derivative is again something different.
(*) The equation is derived from the following expression, where you start by minimizing the error of the output:
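The referenced equation is not reproduced here, but the standard derivation it points to goes roughly like this, in the thread's notation (a reconstruction, not the original figure):

E = 1/2 * (TargetB - OutputB)^2,   with   OutputB = sigmoid(netB)
d(OutputB)/d(netB) = OutputB * (1 - OutputB)   (derivative of the sigmoid)
ErrorB = -dE/d(netB)
       = (TargetB - OutputB) * OutputB * (1 - OutputB)
       = OutputB(1 - OutputB)(TargetB - OutputB)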
It is like that because Output(1-Output) is the derivative of the sigmoid function (simplified). In general, this part is based on derivatives; you can try different transfer functions (other than the sigmoid), but then you have to use their derivatives too for the learning to work properly.
If you want you can take a look at my implementation (it's far from perfect, but maybe you will get some idea from it ;)), it's a simple project I made on my university - https://github.com/kelostrada/neuron-network

How calculating hessian works for Neural Network learning

Can anyone explain to me, in an easy and less mathematical way, what a Hessian is and how it works in practice when optimizing the learning process for a neural network?
To understand the Hessian you first need to understand the Jacobian, and to understand the Jacobian you need to understand the derivative.
The derivative is a measure of how fast the function value changes with a change of the argument. So if you have the function f(x)=x^2, you can compute its derivative and obtain knowledge of how fast f(x+t) changes with small enough t. This gives you knowledge of the basic dynamics of the function.
The gradient shows you, for multidimensional functions, the direction of the biggest value change (it is based on the directional derivatives), so given a function, e.g. g(x,y) = -x + y^2, you know that it is better to minimize the value of x while strongly maximizing the value of y. This is the basis of gradient-based methods, like the steepest descent technique (used in traditional backpropagation).
The Jacobian is yet another generalization, needed when your function has many output values, like g(x,y) = (x+1, xy, x-y): you now have one gradient per output value (one for each of the 3 components), each with 2 partial derivatives, together forming a matrix of 2*3 = 6 values.
Now, the derivative shows you the dynamics of the function itself. But you can go one step further: if you can use this dynamics to find the optimum of the function, maybe you can do even better if you find out the dynamics of this dynamics, and so compute derivatives of second order? This is exactly what the Hessian is: a matrix of second-order derivatives of your function. It captures the dynamics of the derivatives, so how fast (and in what direction) the change changes. It may seem a bit complex at first sight, but if you think about it for a while it becomes quite clear. You want to go in the direction of the gradient, but you do not know "how far" (what the correct step size is). So you define a new, smaller optimization problem, where you ask "ok, I have this gradient, how can I tell where to go?" and solve it analogously, using derivatives (and the derivatives of the derivatives form the Hessian).
You may also look at this in a geometrical way: gradient-based optimization approximates your function with a line. You simply try to find the line closest to your function at the current point, and it defines the direction of change. Now, lines are quite primitive; maybe we could use some more complex shapes, like... parabolas? Second-derivative, Hessian-based methods are just trying to fit a parabola (a quadratic function, f(x)=ax^2+bx+c) to your current position, and based on this approximation, choose a valid step.
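To make the "fit a parabola" picture concrete, here is a tiny 1-D MATLAB sketch (a toy function of my own choosing); in one dimension the second derivative plays the role of the Hessian:

f = @(x) x.^4 - 3*x.^2 + 2;    % toy function
df = @(x) 4*x.^3 - 6*x;        % first derivative ("gradient")
ddf = @(x) 12*x.^2 - 6;        % second derivative ("Hessian")
x = 2;                         % starting point
for k = 1:10
    x = x - df(x)/ddf(x);      % jump to the minimum of the local quadratic model
end
% x converges to sqrt(3/2) ~ 1.2247, a local minimum of f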

Matlab Simulink Model of nonlinear Model

I'm trying to create a Matlab Simulink model of the following equation:
I am very new to Simulink and need some help getting started.
OK, this is very easy to do.
Set up the equation so that the result is the highest derivative, in your case d^3y/dt^3.
There you have it, nothing more to do.
How to go on from here, you may ask:
You have x, and you can differentiate it or apply any equation you want to it. The only doubt that may come up is: where the hell should I get y from?
Easy! You have the equation: integrate the result once and use that value for the 4*(d^2y/dt^2)^2 term, integrate it again and use it for the last term, and integrate it again and use it to multiply x. That's the advantage of Simulink: you can close a loop, using the "result" in the equation to calculate the "result" (this is not 100% true, as each integration uses the value from one step before, but it works).
This is the power of Simulink. Still, I strongly recommend you read a bit about it, so you understand why to use Simulink, but I think playing with it is necessary to learn, so: go!
In general, when setting up equations in Simulink, you should set up a number of integrator blocks to get all your states. When that is done, you can sum the different factors together.
Unfortunately I cannot post the model I made for the equation because of my low reputation points (new here).
    dddy          ddy           dy            y
+ -------> 1/s -------> 1/s -------> 1/s ------->
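The same integrator-chain idea can also be checked in plain MATLAB. The original equation image is not shown above, so the sketch below uses a stand-in third-order ODE (my own choice) purely to show the pattern: solve for the highest derivative, then integrate down the chain, which is what the 1/s blocks do:

% Stand-in equation (not the original one):
%   d3y/dt3 = x(t) - 2*d2y/dt2 - 3*dy/dt - y,   with input x(t) = sin(t)
x = @(t) sin(t);
odefun = @(t,s) [ s(2);                              % s(1) = y       -> dy/dt
                  s(3);                              % s(2) = dy/dt   -> d2y/dt2
                  x(t) - 2*s(3) - 3*s(2) - s(1) ];   % s(3) = d2y/dt2 -> d3y/dt3
[t, s] = ode45(odefun, [0 20], [0; 0; 0]);           % zero initial conditions
plot(t, s(:,1))                                      % y(t)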