Is the order of RKF45 in python 4th or 5th order? - scipy

Howdy, just had a numerical analysis question about RKF45. The documentation is a little mysterious about whether it is 4th or 5th order, and even the Wikipedia page says it "is a method of order O(h^4) with an error estimator of order O(h^5)":
https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.RK45.html
https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta%E2%80%93Fehlberg_method
Is the idea that you first do the 4th order method, and then the interpolant of the numerical solution is 5th order with respect to the true solution? So overall the output of this numerical method is 5th order?
Thanks a lot!

Classical embedded methods like Fehlberg 4(5), aka RKF45
The method can advance the state with either its 4th or 5th order result; it was designed to advance with the 4th order value, but is nowadays usually used with the 5th. The step size is variable: the optimal step size is always determined for the 4th order method, with the difference to the 5th order result serving as an estimate of the local error, which in most cases is very accurate.
The relation between error tolerance and number of ODE function evaluations will mostly correspond to the 4th order: for example, dividing the error tolerance by 16 should result in a step size sequence that is locally about half of the original one, doubling the number of ODE function evaluations.
If an exact solution is computable, the error of the numerical integration should be proportional to the tolerance if the 4th order steps are taken, and proportional to the power 5/4 of the tolerance if the 5th order steps are taken.
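As a quick sanity check of that scaling claim, here is a toy Python snippet (not tied to any particular solver): a step size controlled by a 4th order error estimate is proportional to tol^(1/4), so dividing the tolerance by 16 roughly halves the steps.

    # Step sizes controlled by a 4th-order error estimate scale like tol**(1/4),
    # so dividing the tolerance by 16 should roughly halve the step size.
    base_tol = 1e-6  # arbitrary reference tolerance, for illustration only
    for k in range(4):
        tol = base_tol / 16 ** k
        print(f"tol = {tol:.1e}  ->  relative step size ~ {(tol / base_tol) ** 0.25:.3f}")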
Extrapolation methods like DoPri(4)5, aka ode45 or RK45
The method proceeds with the values of the 5th order method. It uses a variable step size that is controlled via the difference between the 4th order and the 5th order results, adjusted to behave roughly like the actual local (unit step) error of the 5th order method. Despite being more of a guide to the local error size than an exact estimate, it is good enough for the actual error to remain in the region of, or below, the error tolerance and for the step size not to leave the stability region of the 5th order method. The method was explicitly designed to show this behavior.
For smooth test equations one can thus expect that a log-log diagram of the number of ODE function evaluations vs. the given error tolerance, or vs. the actual error against an exact solution, gives a curve that is mostly linear with slope about 5. (One would of course compute it the other way round: obtain the number of function evaluations and the actual error for a spread of error tolerances.)
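For illustration, here is a minimal sketch of that experiment in Python, using scipy.integrate.solve_ivp with the RK45 (Dormand-Prince) method; the test equation and the tolerance values are arbitrary choices.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Smooth test problem y' = -y with exact solution y(t) = exp(-t).
    def f(t, y):
        return -y

    t_end = 10.0
    exact = np.exp(-t_end)
    for tol in [1e-3, 1e-5, 1e-7, 1e-9]:
        sol = solve_ivp(f, (0.0, t_end), [1.0], method="RK45", rtol=tol, atol=tol)
        err = abs(sol.y[0, -1] - exact)
        print(f"tol = {tol:.0e}  nfev = {sol.nfev:5d}  error = {err:.2e}")

Plotting nfev against the actual error on log-log axes for a spread of tolerances then gives the curve described above.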
See https://personal.math.ubc.ca/%7Efeldman/math/vble.pdf for some experiments with low-order embedded methods, starting with using the Euler method as embedded method in the explicit midpoint and Heun method.

Related

Activation function to get day of week

I'm writing a program to predict when something will happen. I don't know which activation function to use to get the output as a day of the week (1-7).
I tried the sigmoid function, but it requires me to input the predicted day and it outputs the probability of it; I don't want it to work this way.
I expect the activation function to return values from 0 to infinity. Is ReLU the best activation function for this task?
EDIT:
Also, what if I wanted the output to cover more than 7 days, for example, x will happen on the 9th day from today, or the 15th day from today, etc.? I'm looking for a dynamic way to do this.
What you are trying to do is solve a classification problem with a regression approach. That's at least unconventional.
You can use any activation function you want and define your output as you want, e.g. linear or ReLU with an output range from 1 to 7, or something between -1 (or 0) and 1 like tanh or sigmoid, where you then map the output (-1 -> 1; -0.3 -> 2; ...).
The problem for you will be that you get a floating-point number as a result. So your model not only has to learn how to classify correctly but also how to predict the (almost) exact number you want in your output neuron. That makes the problem more complicated than it has to be. With a model like that it is also likely that for some outlier data points you will get unexpected return values like 0, -1 or 8. What do you do then?
To sum it up: listen to @venkata krishnan, use softmax with seven output neurons, and map the result to a number between 1 and 7 outside the neural network if you have to.
EDIT
What comes to my mind after reading the comments again would be a mix of what you want and what you should do.
You could try to make the second-to-last layer a 7-neuron softmax layer and map its output to a single neuron in the last layer.
I have neither tried that nor read about anything like it, so I can't tell you if it's a good idea (likely not), but you might consider it worth a try.
I want to add to the point of @venkata krishnan, who raises a valid point about your problem setting. You will find an answer to your original question further down, but I strongly suggest you read the following comment first.
Generally, you want to discern between categorical, ordinal and interval variables. I have given a relatively lengthy explanation in a different answer on Stackoverflow, it might be helpful to understand this concept in more detail.
In your scenario, you mostly want to have an understanding of "how wrong" you are. Of course, it is perfectly reasonable to do what you are doing and interpret the output as an interval variable, and therefore assume an ordering (and a distance) between different values.
What is problematic, though, is the fact that you are assuming a continuous space on a discrete variable. E.g., it does not make any sense to interpret the output of 4.3, since you can only tell between 4 (Friday, assuming you start numbering your days at 0), or 5 (Saturday). Any value in between would have to be rounded, which is perfectly fine - until you want to perform backpropagation on this loss.
It is problematic because you are essentially introducing a non-convex and non-continuous function, no matter how you "round" your values. Again, to exemplify this, suppose you round to the nearest number; then, at the value of 4.5, you would see a sudden jump in the loss, which is non-differentiable and will therefore give your optimizer a hard time, potentially limiting the convergence of your system.
If, instead, you utilize several output neurons, as suggested by @venkata krishnan, you might lose the information of distance (how many days you are off) on paper, but you can of course still interpret your loss in any way you like. This would certainly be the better option for a discrete-valued variable.
To answer your original question: I personally would make sure that your loss function is bounded both above and below, as you could otherwise get undefined/inconsistent loss values that might lead to subpar optimization. One way to do this is to re-scale a sigmoid function (sigmoid maps the reals into (0, 1)). You can then just multiply your output by 6 to get a value range of (0, 6), which (after rounding) covers all the values you want.
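As a small illustration of that rescaled-sigmoid idea (plain NumPy, with made-up raw network outputs):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    raw = np.array([-2.0, 0.1, 3.5])      # hypothetical raw network outputs
    scaled = 6.0 * sigmoid(raw)           # bounded to (0, 6)
    days = np.rint(scaled).astype(int)    # round to a day index 0..6
    print(scaled, days)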
As far as I know, there is no activation function that will yield 0 to infinity. You can apply 7 output nodes with a softmax activation function, which will return a probability for each day. There is another solution which may work: you can use 3 output nodes with a binary (step) activation function, which will return either 0 or 1. That means you can have 8 different outputs with only 3 nodes, namely 000, 001, 010, 011, 100, 101, 110 and 111, and you can use 7 of them.
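For concreteness, here is a minimal sketch of the 7-output softmax approach. It assumes Keras/TensorFlow (the question does not name a framework), and the input size n_features and the random data are made-up placeholders just to show the shapes.

    import numpy as np
    import tensorflow as tf  # assumption: Keras/TensorFlow is the framework in use

    n_features = 10  # hypothetical input dimension

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(n_features,)),
        tf.keras.layers.Dense(7, activation="softmax"),  # one probability per day of week
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # Dummy data just to show the shapes; labels are 0..6.
    X = np.random.rand(100, n_features).astype("float32")
    y = np.random.randint(0, 7, size=(100,))
    model.fit(X, y, epochs=2, verbose=0)

    # Map the argmax of the 7 probabilities back to a day number 1..7.
    predicted_day = int(np.argmax(model.predict(X[:1]), axis=1)[0]) + 1
    print(predicted_day)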

Normalising parameter scale with fminsearch

I remember reading once in the Matlab documentation about an optimisation algorithm which allowed the user to specify the "scale" of variation expected for each parameter during the search (at least initially).
I can't remember which function this is, but now I am using fminsearch and there is no such option. In fact, I can't even specify parameter bounds, and the documentation states that it takes 5% of the initial guess as a default step (or 25e-5 if the initial value is 0). Because this choice is relative to the initial guess, it makes me think that perhaps I should re-normalise my parameters to a suitable scale, in order to indirectly define a suitable step for my optimisation problem.
For example, if I have a parameter whose value is on the order of 10e5 but I would like steps on the order of 100, then I should "divide it" by 500 during optimisation (and obviously multiply it back when computing the objective function). However, this becomes trickier if a parameter range is centred around 0, for example; then I can rescale it and offset it.
My question is: is this effectively what people usually do when using the downhill-simplex method, and is there a "standard" or "better" way to do it?
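For illustration, the rescaling described above can be sketched as a wrapper around the objective function; here in Python with scipy's Nelder-Mead, which is the same downhill-simplex algorithm fminsearch uses. The objective and the scale/offset values below are made-up placeholders.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical objective: first parameter lives near 1.2e5, second near 0.3.
    def objective(p):
        a, b = p
        return (a - 1.2e5) ** 2 / 1e8 + (b - 0.3) ** 2

    scale = np.array([1e5, 1.0])    # characteristic scale of each parameter
    offset = np.array([0.0, 0.0])   # offset, useful for ranges centred away from 0

    def scaled_objective(q):
        # Undo the normalisation before evaluating the real objective.
        return objective(offset + scale * q)

    x0 = np.array([1.0e5, 0.0])     # initial guess in original units
    res = minimize(scaled_objective, (x0 - offset) / scale, method="Nelder-Mead")
    print(offset + scale * res.x)   # map the result back to original units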

Time step computation in Matlab ODE solver

I tried to find out how MATLAB computes the step size (not the initial one) when solving ODEs with, for example, the ode45 solver. The source code is really complex, so does anyone know how it works?
You should be aware that the step size is dynamically adapted; there is no single "the" step size.
To get a general, simplified idea: the total error E is composed of the atomic errors of every time step. To first order it is a summation; more exactly, there is some kind of cumulative magnification of the atomic errors involved.
A sensible approach is that every time step of length h should contribute an atomic error of about E·h/T, where T is the length of the integration interval. The order 4 method has a local error of C·h^5, where C is, to zeroth order, a polynomial in the first 4 derivatives of the ODE function. Since the method computes both an order 4 and an order 5 step, call them y4 and y5, one can take y5 as the more precise one, so that approximately C·h^5 = |y4-y5|. This allows one to compute C and to adapt the step size to a·h to achieve the desired atomic error, since one can solve C·(a·h)^5 = (E/T)·(a·h) to get
a = ( (E/T)·h / |y4-y5| )^(1/4)
This does not need to be terribly exact, so that one can just use the adapted step size for the next step if the atomic error is not largely out of range.
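A minimal Python sketch of this simple controller (not MATLAB's actual ode45 implementation; the 0.9 safety factor and the growth/shrink limits are typical but arbitrary choices):

    import numpy as np

    def adapt_step(y4, y5, h, E_over_T):
        """Return the adapted step size a*h, given the order-4 and order-5 results
        of one step of size h and the desired error per unit time E/T."""
        err = np.linalg.norm(y4 - y5)        # estimate of C*h**5
        if err == 0.0:
            return 2.0 * h                   # error negligible, just grow the step
        a = (E_over_T * h / err) ** 0.25     # solve C*(a*h)**5 = (E/T)*(a*h)
        a = min(2.0, max(0.2, 0.9 * a))      # safety factor and growth/shrink limits
        return a * h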
Another approach is to compare if the local error |y4-y5|/h falls inside a bracket around the desired local error E/T and increase/decrease the step size by a constant factor, with a repetition of the step if the step size needed to be reduced.
There is more to the advanced/actual implementations, taking into account relative and absolute error goals, detecting stiffness, i.e., where the local error formula breaks down, …

Simulink: PID Controller - difference between back-calculation and clamping for anti-windup?

I need to implement anti-windup (output limitation) for my PID controller. Simulink offers two options: back-calculation and clamping (documentation), which seem to deliver equal results. I know what back-calculation is doing mathematically. It requires defining the back-calculation gain Kb. This gain depends on how long my controller is saturated, so it is actually a dynamic value (because I may have a high variation of saturation times). Do you see a way to control this value? (In this case it probably would be necessary to build my own PID controller, as shown in the documentation above or in the picture below.)
Which brings me to the question: what is clamping actually doing? And what are the other differences? Which one is faster, which one is more robust against steep slopes? Does anybody have experience using both?
Not sure if this fully answers the question, but the PID Controller documentation page explains a bit more about clamping:
Clamping: Stops integration when the sum of the block components exceeds the output limits and the integrator output and block input have the same sign. Resumes integration when the sum of the block components exceeds the output limits and the integrator output and block input have opposite sign. The integrator portion of the block is: [figure in the documentation]
The clamping circuit implements the logic necessary to determine whether integration continues.
If you select the clamping option and look under the mask, you can probably see the details of the clamping circuit.
In addition to am304's answer, there are some more things to consider.
Clamping
Clamping will always work. It detects when there is integrator windup and sets the integral path of the PID controller to zero to avoid windup, using a simple switch.
Clamping is a commonly used anti-windup method, especially in digital control systems. In serious applications, however, there is also forward clamping involved, which evaluates the controller input as well. This mechanism must be implemented manually.
Back Calculation
Back-calculation highly depends on the back-calculation coefficient Kb. If you don't know how to actually calculate the parameter Kb, don't use back-calculation. This method calculates the difference between the actual controller output and the saturated output and subtracts it from the I-gain path, amplified by Kb.
In most cases the default value Kb = 1 will lead to worse results than clamping; it is even possible that it has no effect at all. Kb should be calculated based on the sampling time or, in case a D-gain is involved, based on the D- and I-gains. Appropriate literature should be consulted to calculate the coefficient. Back-calculation with a properly set coefficient enables better dynamics than clamping!
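To make the two schemes concrete, here is a minimal discrete-time PID step in Python (a rough sketch of the ideas above, not Simulink's implementation; the gains, sample time and output limits are placeholders you would tune for your plant):

    def pid_step(e, state, Kp=1.0, Ki=1.0, Kd=0.0, Kb=1.0, Ts=0.01,
                 u_min=-1.0, u_max=1.0, anti_windup="clamping"):
        """One controller update on error e; state = (integrator, previous error)."""
        integ, e_prev = state
        d = (e - e_prev) / Ts
        u_unsat = Kp * e + Ki * integ + Kd * d
        u = min(u_max, max(u_min, u_unsat))          # actuator saturation

        if anti_windup == "clamping":
            # Stop integrating while saturated and the error would push the output
            # further into saturation.
            saturated = u != u_unsat
            if not (saturated and e * u_unsat > 0):
                integ += Ts * e
        elif anti_windup == "back-calculation":
            # Feed the saturation excess back into the integrator, weighted by Kb.
            integ += Ts * (e + Kb * (u - u_unsat))
        else:
            integ += Ts * e                          # no anti-windup

        return u, (integ, e)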

MATLAB: Slow convergence of convex optimization algorithm

I want to speed up the convergence of a convex optimization problem in MATLAB.
My objective function is convex, has three parameters, and I am using gradient ascent for the maximization.
Right now I am manually writing the iteration, with the termination condition being that the difference between the new and old parameter values is very small (around 0.0000001). I cannot terminate based on the number of iterations because that doesn't guarantee that it has converged to the optimum solution.
So, it takes a lot of time to converge - almost 2 days! Is there any way to speed this up?
Actually my objective function has only three parameters. I know that my first parameter's value should be greater than that of the second.
So starting with the initial condition, the second parameter's value starts increasing rapidly. After it has reached a certain point, the first parameter's value starts increasing rapidly. While the first parameter's value starts increasing, the second parameter's value starts decreasing slowly. Eventually, I have the first parameter's value greater than that of second.
Is there any way to speed up the process? 2 days is a very long time. Furthermore, calculating the gradient is also time consuming. It needs a lot of matrix computations.
I don't want to start with predefined parameter values like the first parameter's value being greater than that of the second. Also, it's not necessary that the first parameter always has to be greater than the second; I just know which parameter value should be greater. Any suggestions?
If the calculation of gradients is very slow and you still want to do a manual implementation, you could try the following; it will take more steps but could be a lot quicker since each step is so simple (a Python sketch is given below):
Define a step size.
Try all the points where each of your variables moves -1, 0 or 1 times the step size (3^3 = 27 possibilities).
Pick the best one.
If the best one is your previous point, multiply the step size by a factor of 0.5.
Of course the success of this process depends on the properties of your function. Furthermore it should be noted that a much simpler solution could be to set the desired difference to something like 0.0001
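A minimal Python sketch of that pattern search for a maximisation problem (the default step sizes, the iteration limit and the example function are arbitrary choices):

    import itertools
    import numpy as np

    def pattern_search(f, x0, step=1.0, min_step=1e-6, max_iter=10000):
        """Maximise f by testing all 3**n moves of -step, 0 or +step per coordinate,
        halving the step whenever no neighbour improves on the current best point."""
        x = np.asarray(x0, dtype=float)
        best = f(x)
        moves = list(itertools.product((-1.0, 0.0, 1.0), repeat=len(x)))
        for _ in range(max_iter):
            candidates = [x + step * np.asarray(m) for m in moves]
            values = [f(c) for c in candidates]
            i = int(np.argmax(values))
            if values[i] > best:
                x, best = candidates[i], values[i]
            else:
                step *= 0.5                 # shrink the pattern around the best point
                if step < min_step:
                    break
        return x, best

    # Example on a made-up concave function with three parameters:
    x_opt, f_opt = pattern_search(lambda p: -(p[0] - 3) ** 2 - (p[1] - 1) ** 2 - p[2] ** 2,
                                  [0.0, 0.0, 0.0])
    print(x_opt, f_opt)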