How to smooth rectangular signal with high order rate-limiter in Simulink? - matlab

Imagine I have a rectangular reference value for the position/displacement x and I need to smooth it.
The math for translatoric movements is quite simple:
speed: v = x'
acceleration: a = v' = x''
jerk. j = a' = v'' = x'''
I need to limit all these values. So I thought about using rate limiters in Simulink:
This approach works perfect for ramp signals, as you can see in the following output:
BUT, my reference signals for x are no ramps, they are rectangles/steps. Hence the rate limiters are not working, because the derivatives they get to limit are already infinite and Simulink throws an error. How can I resolve this problem? Is there actually a more elegant way to implement the high order rate-limiters? I guess this approach could be unstable in some cases.
continue reading: related question

Even though it seems absurd, the following approach is working: integration and instant derivation does the trick:
leading to:
More elegant, faster and simpler solutions for the whole smoothing problem are highly appreciated!

It's generally not a good idea to differentiate signals in Simulink because of numerical issues, I would advise to start with the higher order derivatives (e.g. acceleration) and integrate, much more robust numerically. This is what the doc about the derivative block says:
The Derivative block output might be very sensitive to the dynamics of
the entire model. The accuracy of the output signal depends on the
size of the time steps taken in the simulation. Smaller steps allow a
smoother and more accurate output curve from this block. However,
unlike with blocks that have continuous states, the solver does not
take smaller steps when the input to this block changes rapidly.
Depending on the dynamics of the driving signal and model, the output
signal of this block might contain unexpected fluctuations. These
fluctuations are primarily due to the driving signal output and solver
step size.
Because of these sensitivities, structure your models to use
integrators (such as Integrator blocks) instead of Derivative blocks.
Integrator blocks have states that allow solvers to adjust step size
and improve accuracy of the simulation. See Circuit Model for an
example of choosing the best-form mathematical model to avoid using
Derivative blocks in your models.
See also Best-Form Mathematical Models for more details.

I was trying to do something similar. I was looking for a "Smooth Ramp". Here is what I found:
A simpler approach is to combine ramp with a second order lag. Then the signal approachs s-shape. And your derivatives will exist and be smooth as well. Only thing to remember is that the 2nd or lag must be critically damped.
Y(s) = H(s)*X(s) where H(s) = K*wo^2/(s^2 + 2*zeta*wo*s + wo^2). Here you define zeta = 1.0. Then the s-shape is retained for any K and wo value. Note that X(s) has already been hit by a ramp. In matlab or any other tools, linear ramp and 2nd lag are standard blocks.
Good luck!

I think the 'Transfer Fcn' block is what you're looking for.
If you leave the equation in the default form 1/(s+1) you have a low-pass filter which can be tuned to what you need by changing the numerator and denominator coefficients.

Related

Discrete and Dynamic system Matlab

I'm using recusive least squares (RLS) to identify system parameters for a dynamical system. The RLS algorithm is implemented in discrete time, while the real system is continuous. In practice this is easily done, but how can I simulate these two together? A sequential solution doesn't help, since I want to use the RLS estimate to influence the system input.
The built-in event-triggering can only stop integration, if I got that right. Thus, I'd have to stop at each sampling point of the RLS algorithm and then solve the ode between samples. -> How is this implemented in Simulink?
The only real solution I found was to implement my own RK45 with adaptive step size. It is designed to take discrete and continuous systems (ode and difference equations) and solves with adaptive step size until a new sample has to be taken. This method works like a charm - with slow dynamics only the discrete points are sampled for sufficiently small sampling times and fast dynamics yield small integration step sizes, as expected!
Also the implementation was way less effort than expected and compares surprisingly well to matlabs ode45, ie. lower computational cost, higher accuracy, less oscillations after discrete jumps in the ode!

Why do we take the derivative of the transfer function in calculating back propagation algorithm?

What is the concept behind taking the derivative? It's interesting that for somehow teaching a system, we have to adjust its weights. But why are we doing this using a derivation of the transfer function. What is in derivation that helps us. I know derivation is the slope of a continuous function at a given point, but what does it have to do with the problem.
You must already know that the cost function is a function with the weights as the variables.
For now consider it as f(W).
Our main motive here is to find a W for which we get the minimum value for f(W).
One of the ways for doing this is to plot function f in one axis and W in another....... but remember that here W is not just a single variable but a collection of variables.
So what can be the other way?
It can be as simple as changing values of W and see if we get a lower value or not than the previous value of W.
But taking random values for all the variables in W can be a tedious task.
So what we do is, we first take random values for W and see the output of f(W) and the slope at all the values of each variable(we get this by partially differentiating the function with the i'th variable and putting the value of the i'th variable).
now once we know the slope at that point in space we move a little further towards the lower side in the slope (this little factor is termed alpha in gradient descent) and this goes on until the slope gives a opposite value stating we already reached the lowest point in the graph(graph with n dimensions, function vs W, W being a collection of n variables).
The reason is that we are trying to minimize the loss. Specifically, we do this by a gradient descent method. It basically means that from our current point in the parameter space (determined by the complete set of current weights), we want to go in a direction which will decrease the loss function. Visualize standing on a hillside and walking down the direction where the slope is steepest.
Mathematically, the direction that gives you the steepest descent from your current point in parameter space is the negative gradient. And the gradient is nothing but the vector made up of all the derivatives of the loss function with respect to each single parameter.
Backpropagation is an application of the Chain Rule to neural networks. If the forward pass involves applying a transfer function, the gradient of the loss function with respect to the weights will include the derivative of the transfer function, since the derivative of f(g(x)) is f’(g(x))g’(x).
Your question is a really good one! Why should I move the weight more in one direction when the slope of the error wrt. the weight is high? Does that really make sense? In fact it does makes sense if the error function wrt. the weight is a parabola. However it is a wild guess to assume it is a parabola. As rcpinto says, assuming the error function is a parabola, make the derivation of the a updates simple with the Chain Rule.
However, there are some other parameter update rules that actually addresses this, non-intuitive assumption. You can make update rule that takes the weight a fixed size step in the down-slope direction, and then maybe later in the training decrease the step size logarithmic as you train. (I'm not sure if this method has a formal name.)
There are also som alternative error function that can be used. Look up Cross Entropy in you neural network text book. This is an adjustment to the error function such that the derivative (of the transfer function) factor in the update rule cancels out. Just remember to pick the right cross entropy function based on you output transfer function.
When I first started getting into Neural Nets, I had this question too.
The other answers here have explained the math which makes it pretty clear that a derivative term will appear in your calculations while you are trying to update the weights. But all of those calculations are being done in order to implement Back-propagation, which is just one of the ways of updating weights! Now read on...
You are correct in assuming that at the end of the day, all a neural network tries to do is update its weights to fit the data you feed into it. Within this statement lies your answer too. What you are getting confused with here is the idea of the Back-propagation algorithm. Many textbooks use backprop to update neural nets by default but do not mention that there are other ways to update weights too. This leads to the confusion that neural nets and backprop are the same thing and are inherently connected. This also leads to the false belief that neural nets need backprop to train.
Please remember that Back-propagation is just ONE of the ways out there to train your neural network (although it is the most famous one). Now, you must have seen the math involved in backprop, and hence you can see where the derivative term comes in from (some other answers have also explained that). It is possible that other training methods won't need the derivatives, although most of them do. Read on to find out why....
Think about this intuitively, we are talking about CHANGING weights, the direct mathematical operation related to change is a derivative, makes sense that you should need to evaluate derivatives to change weights.
Do let me know if you are still confused and I'll try to modify my answer to make it better. Just as a parting piece of information, another common misconception is that gradient descent is a part of backprop, just like it is assumed that backprop is a part of neural nets. Gradient descent is just one way to minimize your cost function, there are plenty of others you can use. One of the answers above makes this wrong assumption too when it says "Specifically Gradient Descent". This is factually incorrect. :)
Training a neural network means minimizing an associated "error" function wrt the networks weights. Now there are optimization methods that use only function values (Simplex method of Nelder and Mead, Hooke and Jeeves, etc), methods that in addition use first derivatives (steepest descend, quasi Newton, conjugate gradient) and Newton methods using second derivatives as well. So if you want to use a derivative method, you have to calculate the derivatives of the error function, which in return involves the derivatives of the transfer or activation function.
Back propagation is just a nice algorithm to calculate the derivatives, and nothing more.
Yes, the question was really good, this question was also came in my head while i am understanding the Backpropagation. After doing ForwordPropagation on neural network we do back propagation in network to minimize the total error. And there also many other way to minimize the error.your question is why we are doing derivative in backpropagation, the reason is that, As we all know the meaning of derivative is to find the slope of a function or in other words we can find change of particular thing with respect to particular thing. So here we are doing derivative to minimize the total error with respect to the corresponding weights of the network.
and here by doing the derivation of total error with respect to weights we can find it's slope or in other words we can find what is the change in total error with respect to the small change of the weight, so that we can update the weight to minimize the error with the help of this Gradient Descent formula, that is, Weight= weight-Alpha*(del(Total error)/del(weight)).Or in other words New Weights = Old Weights - learning-rate x Partial derivatives of loss function w.r.t. parameters.
Here Alpha is the learning rate which is control the weight update, means if the derivative the - ve than Alpha make it +ve(Becouse of -Alpha in formula) and if +ve it's remain +ve so that weight update goes in +ve direction and it's reflected to minimize the Total error.And also the as derivative part is multiples with Alpha, it's decrees the step size of Alpha when the weight converge to the optimal value of weight(minimum error). Thats why we are doing derivative to minimize the error.

Matlab: Help in running toolbox for Kalman filter

I have AR(1) model with data samples $N=500$ that is driven by a random input sequence x. THe observation y is corrupted with measurement noise $v$ of zero mean. The model is
y(t) = 0.195y(t-1) + x(t) + v(t) where x(t) is generated as randn(). I am unsure how to represent this as a state space model and how to estimate the parameters $a$ and the states. I tried the state space representation would be
d(t) = \mathbf{a^T} d(t) + x(t)
y(t) = \mathbf{h^T}d(t) + sigma*v(t)
sigma =2.
I cannot understand how to perform parameter and state estimation. Using the toolbox mentioned below, I checked the Equations of KF to be matching with those in textbooks. However, the approach for parameter estimation is different. I will appreciate a recommendation for the implementation procedure.
Implementation 1:
I am following the implementation here : Learning Kalman Filter. This implementation does not use Expectation Maximization to estimate the parameters of AR model and it finds out the Covariance of the process noise. In my case, I don't have a process noise, but an input $x$.
Implementation 2: Kalman Filter by Kevin Murphy is another toolbox which uses EM for parameter estimation of AR model. Now, it is confusing since both the implementations uses different approach for parameter estimation.
I am having a tough time figuring out the correct approach, the state space model representation and the code. Shall appreciate recommendations on how to proceed.
I ran the first implementation for the KalmanARSquareRoot technique and the result is completely different. There is Exponential Moving Average Smoothing being performed and a MA filer of length 30 being used. The toolbox runs fine if I run the Demo examples. But on changing the model, the result is extremely poor. Maybe I am doing something wrong. Do I need to change the equations of KF for my time series?
In the second implementation, I cannot figure out what and where to change the Equations.
In general, if I have to use these tools, then do I need to change the KF equations for every time series model? How do I write the Equations myself if these toolboxes are inappropriate for all the time series model - AR, MA, ARMA?
I only have a bit of experience with Kalman Filters, so take this with a grain of salt.
It seems you shouldn't need to change the equations at all. Working with the second package (learn_kalman), you can create an A0 matrix of size [length(d(t)) length(d(t))]. C0 is the same, and in your case the initial state probably makes sense to be the Identity matrix (unlike your A0. All you need to do is choose a good initial condition.
However, I took a look at your system (plotted an example) and it seems noise dominates your system. KF is an optimal estimator but I have not known it to reject that much noise. It only guarantees a reduced covariance...which means that if your system is mostly dominated by noise, you will calculate a bad model that estimates your system given the noise!
Try plotting [d f] where d is the initial data and f is calculated using the regressive formula f(t) = C * A * f(t-1) :
f(t) = A * f(t-1)
; y(t) = C * f(t)
That is, pretend as if there is no noise but using the estimated AR model. You will see that it rejects all the noise and 'technically' models the system well (since the only unique behaviour is at the very beginning).
For example, if you have a system with A = 0.195, Q=R=0.l then you will converge to an A = 0.2207 but it still isn't good enough. Here the problem is that your initial state is so low, that within a few steps of data and you are essentially at 0 accounting for noise. Naturally KF can converge to a LOT of model solutions that are similar. Any noise will throw off even the best initial condition.
If you increase the resolution of your data in some way (e.g. larger initial condition, more refined timesteps) you will see a good fit. Ex, changing your initial condition to 110 and you'll find the two curves similar, though the model is still fairly different.
I am not aware of any approach to model your data well. If the noise variance is in fact 1 and your system converges to 0 that quickly, it seems doomed to not be effectively modelled since you just don't capture any unique behaviour in the dataset.

MATLAB, averaging multiple fft's ,coherent integration

I have audio record.
I want to detect sinusoidal pattern.
If i do regular fft i have result with bad SNR.
for example
my signal contents 4 high frequencies:
fft result:
To reduce noise i want to do Coherent integration as described in this article: http://flylib.com/books/en/2.729.1.109/1/
but i cant find any MATLAB examples how to do it. Sorry for bad english. Please help )
I look at spectra almost every day, but I never heard of 'coherent integration' as a method to calculate one. As also mentioned by Jason, coherent integration would only work when your signal has a fixed phase during every FFT you average over.
It is more likely that you want to do what the article calls 'incoherent integration'. This is more commonly known as calculating a periodogram (or Welch's method, a slightly better variant), in which you average the squared absolute value of the individual FFTs to obtain a power-spectral-density. To calculate a PSD in the correct way, you need to pay attention to some details, like applying a suitable Fourier window before doing each FFT, doing the proper normalization (so that the result is properly calibrated in i.e. Volt^2/Hz) and using half-overlapping windows to make use of all your data. All of this is implemented in Matlab's pwelch function, which is part of the signal-processing toolbox. See my answer to a similar question about how to use pwelch.
Integration or averaging of FFT frames just amounts to adding the frames up element-wise and dividing by the number of frames. Since MATLAB provides vector operations, you can just add the frames with the + operator.
coh_avg = (frame1 + frame2 + ...) / Nframes
Where frameX are the complex FFT output frames.
If you want to do non-coherent averaging, you just need to take the magnitude of the complex elements before adding the frames together.
noncoh_avg = (abs(frame1) + abs(frame2) + ...) / Nframes
Also note that in order for coherent averaging to work the best, the starting phase of the signal of interest needs to be the same for each FFT frame. Otherwise, the FFT bin with the signal may add in such a way that the amplitudes cancel out. This is usually a tough requirement to ensure without some knowledge of the signal or some external triggering so it is more common to use non-coherent averaging.
Non-coherent integration will not reduce the noise power, but it will increase signal to noise ratio (how the signal power compares to the noise power), which is probably what you really want anyway.
I think what you are looking for is the "spectrogram" function in Matlab, which computes the short time Fourier transform(STFT) of an input signal.
STFT
Spectrogram

3rd-order rate limiter in Simulink? How to generate smooth triggered signals?

First for those, who are not familiar with Simulink, there is a imaginable outside-Simulink partial solution:
I need to create a vector satisfying the following conditions:
known initial value a1
known final value a2
it has a pre-defined step size, but the length is not pre-determined
the first derivative over the whole range is limited to v_max resp. -v_max
the second derivative over the whole range is limited to a_max resp. -a_max
the third derivative over the whole range is limited to j_max resp. -j_max
at the first and the final point all derivatives are zero.
Before you ask "what have you tried so far", I just had the idea to solve it outside Simulink and I tried the whole stuff below ;)
But maybe you guys have a good idea, while I keep working on my own solution.
I'd like to generate smooth ramp signals (3rd derivative limited) based on a trigger signal in Simulink.
To get a triggered step I created a triggered subsystem propagating the trigger output. It looks like that:
But I actually don't want a step, I need a very smooth ramp with limited derivatives up to the 3rd order. The math behind is:
displacement: x
speed: v = x'
acceleration: a = v' = x''
jerk: j = a' = v'' = x'''
(If this looks familiar to you, I once had a very similar question. I thought about a bounty on it, but after the necessary edit of the question both answers would have been invalid)
As there are just rate limiters of 1st order, I used two derivates and a double integration to resolve my problem. But there is a mayor drawback, I can not ignore anymore. For the sake of illustration I chose a relatively big step size of 0.1.
The complete minimal example (Fixed Step, stepsize: 0.1, ode4): Download here
It can be seen, that the signal not even reaches the intended step height of 10 and furthermore is not constant at the end.
Over the development process of my whole model, this approach was satisfactory enough for small step sizes. But I reached the point where I really need the smooth ramp as intended. That means I need a finally constant signal at exactly the value, specified by the step height gain.
I already spent days to resolve the problem, and hope to fine some help here now.
Some of my ideas:
dynamically increase the step height over the actual desired value and saturate the final output. If the rate limits,step height and the simulation step size wouldn't be flexible one could probably find a satisfying solution. But as everything has to be flexible, there are too much cases where the acceleration and jerk limit is violated.
I tried to use the Matlab function block and write my own 3rd order rate limiter. Though it seems possible for me for the trigger moment, I have no solution how to smooth the "deceleration" at the end of the ramp. Also I'd need C-compilers, which would make it hard to use my model on other systems without problems. (At least I think so.)
The solver can not be changed siginificantly (either ode3 or ode4) and a fixed step size is mandatory (0.00001 to 0.01).
Currently used, not really useful approach:
For a dynamic amplification of 1.07 I get the following output (all values normalised on their limits):
Though the displacement looks nice, the violation of the acceleration limit is very harmful.
For a dynamic amplification of 1.05 I get the following output (all values normalised on their limits):
The acceleration stays in its boundaries, but the displacement does not reach the intended value. (not really clear in the picture) The jerk is still to big. (I could live with that, but it's not nice)
So it appears to me that a inside-Simulink solutions is far from reality. Any ideas how to create a well-behaving custom function block?
Simulation step size, step height, and the rate limits are known before the simulation starts. (But I have a lot of these triggered smooth ramps in a row, it should feed a event-discrete control). So I could imagine to create the whole smooth ramp outside simulink and save it as a timeseries object and append it on the current signal when the trigger is activated.
The problems you see are because the difference is not conditioned very well.
Taking the difference amplifies the numerical that exists in your simulation.
Also the jerk will always be large if you try to apply an actual step.
I guess for your approach it would be better to work the other way around:
i.e. make a jerk, acceleration and velocity with which your step is achieved.
I think your looking for something like the ref3 block:
http://www.dct.tue.nl/home_of_ref3.htm
Note the disclaimer on the site and that it is a little cumbersome to use.
An easy (yet to be improved) way is to use a rate limiter and then a state space model with a filter. From the filter you get the velocity, which in turn you can apply a rate limiter to. You continue with rate-limiter and filters until you have the desired curve.
Otherwise you can come up with numerical rate-limiters of higher order using ie runge kutta formulas or finite differences. However it was pointed out, that they may suffer from bad conditioning.
What I usually do is to use one rate limiter and a filter of 3rd Order and just tune the time constant (1 tripple pole), such that my needs are met. This works well, esp
Integrator chains of length > 1 are unstable!
There is a huge field of research dealing with trajectory planning. The easiest way might be to use FIR filters (Biagotti et al) or to implement an online trajectory planner (Ezair et al 2014 / Knierim et al 2012).