Why do equivalent discrete control implementations give different results in Xcos?

I am simulating a simple closed loop speed controller for a DC motor in scilab/xcos.
I have a continuous PI controller working just fine.
I have then discretized the continuous controller and implemented it in two different, but equivalent ways, and it seems that the two discrete implementations provide different results, even though they are supposed to be equivalent.
Both discrete controllers are obtained by the same discretization method (Tustin), but one is implemented as a single DLR Xcos transfer function, whereas the other is implemented as a sum of the P and I parts individually.
The attached model contains all setup in the Context, and illustrates both the continuous controller as well as both of the discrete controllers.
The "componentwise discrete control" tracks the continuous controller reasonably well, whereas the "transfer function discrete control" is unstable.
The problem disappears for shorter sample times, which puzzles me: the math governing the two discrete implementations is identical, so I would expect the relative behaviour of the two discrete controllers to be the same no matter what sample time is used.
I would appreciate any input or explanation as to what I am doing wrong and why these two seemingly equivalent implementations differ.
The model is attached here:

Just add a sample-and-hold in the path of the (continuous!) kp in the componentwise controller ;-) The kp "saves" your stability because it is continuous, while the integrator works in discrete time. The embedded transfer-function controller, however, works entirely in discrete time with sample time Ts, and that is too slow. If you add the discrete behaviour to the kp path via a sample-and-hold, you get the same results from both implementations.
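A minimal Python sketch of why the two forms agree once every path is sampled, assuming a PI controller C(s) = kp + ki/s discretized with Tustin, which gives C(z) = kp + (ki*Ts/2)*(z+1)/(z-1). The function names and the numeric values are illustrative and are not taken from the attached model.

```python
def pi_tustin_transfer_function(errors, kp, ki, ts):
    """Single difference equation obtained from C(z) = (b0*z + b1) / (z - 1)."""
    b0 = kp + ki * ts / 2.0
    b1 = ki * ts / 2.0 - kp
    u_prev, e_prev = 0.0, 0.0
    outputs = []
    for e in errors:
        u = u_prev + b0 * e + b1 * e_prev
        outputs.append(u)
        u_prev, e_prev = u, e
    return outputs


def pi_tustin_componentwise(errors, kp, ki, ts):
    """P path plus a trapezoidal (Tustin) integrator, both fed the SAMPLED error."""
    integral, e_prev = 0.0, 0.0
    outputs = []
    for e in errors:
        integral += ki * ts / 2.0 * (e + e_prev)
        outputs.append(kp * e + integral)
        e_prev = e
    return outputs


# Illustrative sampled error sequence and gains (assumed values).
sampled_errors = [1.0, 0.5, -0.2, 0.3, 0.0]
u_tf = pi_tustin_transfer_function(sampled_errors, kp=2.0, ki=5.0, ts=0.1)
u_cw = pi_tustin_componentwise(sampled_errors, kp=2.0, ki=5.0, ts=0.1)
max_difference = max(abs(a - b) for a, b in zip(u_tf, u_cw))
```

Here the P path only sees the error at the sample instants, which is what the sample-and-hold enforces. If kp instead acts on the continuous error between samples, the componentwise controller is no longer the same system as the transfer-function block.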


Episodic Semi-gradient Sarsa with Neural Network

While trying to implement Episodic Semi-gradient Sarsa with a neural network as the approximator, I wondered how to choose the optimal action based on the currently learned weights of the network. If the action space is discrete, I can just calculate the estimated value of each action in the current state and choose the one that gives the maximum. But this does not seem to be the best way of solving the problem. Furthermore, it does not work if the action space is continuous (like the acceleration of a self-driving car, for example).
So, basically I am wondering how to implement the 10th line, "Choose A' as a function of q̂(S', ·, w)", in this pseudo-code of Sutton:
How are these problems typically solved? Can one recommend a good example of this algorithm using Keras?
Edit: Do I need to modify the pseudo-code when using a network as the approximator? So, that I simply minimize the MSE of the prediction of the network and the reward R for example?
"I wondered how I choose the optimal action based on the currently learned weights of the network"
You have three basic choices:
1. Run the network multiple times, once for each possible value of A' to go with the S' value that you are considering. Take the action with the maximum value as the predicted optimum action (with probability 1-ε, otherwise choose randomly, for the ε-greedy policy typically used in SARSA).
2. Design the network to estimate all action values at once, i.e. to have |A(s)| outputs (perhaps padded to cover "impossible" actions that you need to filter out). This alters the gradient calculations slightly: zero gradient should be applied to the inactive outputs of the last layer (i.e. anything not matching the A of (S,A)). Again, just take the maximum valid output as the estimated optimum action. This can be more efficient than running the network multiple times. This is also the approach used by the recent DQN Atari games playing bot, and AlphaGo's policy networks.
3. Use a policy-gradient method, which works by using samples to estimate the gradient that would improve a policy estimator. See chapter 13 of Sutton and Barto's second edition of Reinforcement Learning: An Introduction for more details. Policy-gradient methods become attractive when there are large numbers of possible actions, and they can cope with continuous action spaces (by estimating the parameters of the distribution for the optimal policy, e.g. choosing the mean and standard deviation of a normal distribution, which you can sample from to take your action). You can also combine policy-gradient with a state-value approach in actor-critic methods, which can be more efficient learners than pure policy-gradient approaches.
Note that if your action space is continuous, you don't have to use a policy-gradient method, you could just quantise the action. Also, in some cases, even when actions are in theory continuous, you may find the optimal policy involves only using extreme values (the classic mountain car example falls into this category, the only useful actions are maximum acceleration and maximum backwards acceleration)
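As a rough illustration of the value-based choices and of quantising a continuous action, here is a minimal Python sketch. It assumes the action values for S' have already been computed by the network (either by one call per action or by a single call with one output per action); `epsilon_greedy` and `quantise` are hypothetical helper names, not part of Keras or any other library.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values."""
    if rng.random() < epsilon:
        # Explore: uniformly random action.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def quantise(low, high, n_actions):
    """Evenly spaced discrete actions covering a continuous range."""
    step = (high - low) / (n_actions - 1)
    return [low + i * step for i in range(n_actions)]


# Example: a continuous acceleration in [-1, 1] reduced to three actions.
actions = quantise(-1.0, 1.0, 3)
greedy_index = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

With epsilon set to 0 the choice is purely greedy; during learning a small positive epsilon keeps some exploration.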
"Do I need to modify the pseudo-code when using a network as the approximator? So, that I simply minimize the MSE of the prediction of the network and the reward R for example?"
No. There is no separate loss function in the pseudocode, such as the MSE you would see used in supervised learning. The error term (often called the TD error) is given by the part in square brackets, and achieves a similar effect. Literally, the term ∇q(S,A,w) (sorry for the missing hat, no LaTeX on SO) means the gradient of the estimator itself, not the gradient of any loss function.
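To make the role of the TD error concrete, here is a small sketch of one semi-gradient Sarsa weight update. It assumes a linear approximator q(S,A,w) = w·x(S,A), chosen only because its gradient with respect to w is simply the feature vector x; with a neural network, that gradient would come from backpropagation instead. The function names and feature vectors are illustrative.

```python
def q_hat(w, x):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sarsa_update(w, x, reward, x_next, alpha, gamma, terminal=False):
    """One semi-gradient Sarsa step: w <- w + alpha * td_error * grad q(S, A, w)."""
    target = reward if terminal else reward + gamma * q_hat(w, x_next)
    td_error = target - q_hat(w, x)
    # For a linear estimator the gradient of q_hat with respect to w is x.
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x)]


# x encodes (S, A) and x_next encodes (S', A'); both are made-up features.
new_w = sarsa_update([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0], alpha=0.5, gamma=0.9)
```

The target is treated as a constant when taking the gradient, which is what makes the method "semi-gradient".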

Converter control simulation on simulink

I am struggling with a little project I decided to tackle. I am trying to replicate an example I found in a book using MATLAB Simulink, but I have no experience with Simulink or control theory (I do understand the principles etc.).
The control block diagram is given, but I do not understand some of the blocks, or how to add my input (a sine wave block in Simulink).
Here are the details:
Example I wish to reproduce
Schematic of the converter and desired control block diagram
If anyone could give me a little insight, or direct me to some examples from which I could build an understanding, that would be great!
Thank you in advance.
The portion entitled controller is the closed-loop feedback control for the system. K(s) would typically contain some type of PI control. In a more complicated control system, the structure of K(s) may be a little different, but it will almost always contain an integration in order to ensure that the system eventually settles at the desired value.
The input Iref is your current command. In this case you would inject your sinusoid here, which would produce a current waveform matching your desired output.
Output m is the modulating waveform produced by the controller. Everything inside the half-bridge converter section is a representation of the converter and everything that it is interfaced to (voltage sources).
The feedforward filter here is also a very important component. Since Vs contains an alternating waveform, the feed forward filter allows the system to respond to changes in Vs without relying on feedback compensation K(s). This helps to decouple current regulation from changes in voltage VD.
To start with the project, you can probably build the half bridge converter as shown. You can inject 400*cos(377t - pi/2) as VD.
For the feedback compensator K(s) you can feed the input into two gains (Ki and Kp) which you will select values for later. At the output of Ki insert an integrator (1/s) then sum the output of Kp and the integrator together.
For the feed-forward filter, you should probably just use a low pass filter with a gain of 1 at DC. The low pass filter prevents noise from entering the system. In this case you are running a simulation, so there will be no noise. However, the filter will eliminate any algebraic loops, which can cause warnings or errors in the simulation.
You can input your control signal at Iref.
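As a rough sketch of the two blocks described above, written in Python rather than Simulink: a PI compensator built from two gains and an integrator, and a first-order low-pass filter with unity DC gain for the feed-forward path. Both use a simple forward-Euler step; the class names, gains, time constant and step size are assumptions for illustration and are not taken from the book example.

```python
class PICompensator:
    """K(s) = Kp + Ki/s, integrated with forward Euler."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def step(self, error):
        # Ki gain followed by the integrator (1/s), then summed with the Kp path.
        self.integral += self.ki * error * self.dt
        return self.kp * error + self.integral


class LowPassFilter:
    """First-order filter 1/(tau*s + 1): gain of 1 at DC."""

    def __init__(self, tau, dt):
        self.alpha = dt / tau  # needs dt < 2*tau for a stable Euler step
        self.y = 0.0

    def step(self, u):
        self.y += self.alpha * (u - self.y)
        return self.y


# Constant error of 1 for 10 steps: output = Kp*1 + Ki*1*(10*dt).
pi = PICompensator(kp=2.0, ki=3.0, dt=0.1)
pi_output = [pi.step(1.0) for _ in range(10)][-1]

# A constant input passes through the filter unchanged once it has settled.
lpf = LowPassFilter(tau=1.0, dt=0.1)
lpf_output = [lpf.step(1.0) for _ in range(500)][-1]
```

In Simulink the same structure is two Gain blocks, an Integrator and a Sum for K(s), and a Transfer Fcn block for the filter.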

MLP with sliding windows = TDNN

I need some confirmation of the following statement.
Are these two equivalent?
1. MLP with sliding time windows
2. Time delay neural network (TDNN)
Can anyone confirm the statement, possibly with a reference? Thanks
"Equivalent" is too generalizing but you can roughly say that in terms of architecture (at least regarding their original proposal - there have been more modifications like the MS-TDNN which is even more different from a MLP). The correct phrasing would be that TDNN is an extended MLP architecture [1].
Both use Backpropagation and both are FeedForward nets.
The main idea can probably be phrased like this:
Delaying the inputs of neurons located in a hidden or the output layer
is similar to multiplying the layers beyond and helps with pattern
scaling and translation and is close to integrating the input signal
over time.
What makes it different from the MLP:
However, in order to deal with delayed or scaled input signals, the
original definition of the TDNN required that all (delayed) links of a
neuron that are connected to one input are identical.
This requirement was dropped in later studies, however, as in [1], where past and present nodes have different weights (which seems reasonable for a number of applications), making the network equivalent to an MLP.
That's all regarding architecture comparisons. Let's talk about training. The results will be different: the whole training will differ depending on whether you feed the same sequential data into an MLP which only gets the current data one-by-one from a sliding window, or feed current and past data together into the TDNN. The big difference is context. With the MLP you'll have the context of past inputs in past activations. With the TDNN you'll have them in present activations, directly coupled to your present inputs. Again, MLPs have no temporal context capabilities (this is why recurrent neural networks are much more popular for sequential data) and the TDNN is an attempt to solve that. The way I see it, TDNN is basically an attempt to merge the two worlds of MLPs (basic Backprop) and RNNs (context/sequences).
TL;DR: If you strip down the TDNNs purpose you can say your statement holds true on an architectural level. But if you compare both architectures side by side in action you will get different observations.
Here is the description of the TDNN taken from the Waibel et al. 1989 paper: "In our TDNN basic unit is modified by introducing delays D1 through Dn as shown in Fig. 1. J inputs of such unit now will be multiplied by several weights, one for each delay". This is essentially an MLP with a sliding window (see also Fig. 2 there).
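To illustrate the "MLP with a sliding window" view, here is a minimal sketch of how a sequence can be turned into delayed-input vectors, so that each input is paired with its n most recent predecessors and each delay gets its own weight in the first layer. The function name `sliding_windows` is hypothetical, and the sketch only builds the inputs; it does not implement the weight sharing of the original TDNN.

```python
def sliding_windows(sequence, n_delays):
    """Return [x[t], x[t-1], ..., x[t-n_delays]] for every valid t."""
    windows = []
    for t in range(n_delays, len(sequence)):
        windows.append([sequence[t - d] for d in range(n_delays + 1)])
    return windows


# Each window is one input vector for an ordinary feed-forward network.
windows = sliding_windows([1, 2, 3, 4], n_delays=2)
```

Feeding each of these vectors to a plain MLP gives the unit a separate weight per delay, which is the structure the quoted passage describes.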

What is the structure of an indirect (error-state) Kalman filter and how are the error equations derived?

I have been trying to implement a navigation system for a robot that uses an Inertial Measurement Unit (IMU) and camera observations of known landmarks in order to localise itself in its environment. I have chosen the indirect-feedback Kalman Filter (a.k.a. Error-State Kalman Filter, ESKF) to do this. I have also had some success with an Extended KF.
I have read many texts and the two I am using to implement the ESKF are "Quaternion kinematics for the error-state KF" and "A Kalman Filter-based Algorithm for IMU-Camera Calibration" (pay-walled paper, google-able).
I am using the first text because it better describes the structure of the ESKF, and the second because it includes details about the vision measurement model. In my question I will be using the terminology from the first text: 'nominal state', 'error state' and 'true state'; which refer to the IMU integrator, Kalman Filter, and the composition of the two (nominal minus errors).
The diagram below shows the structure of my ESKF implemented in Matlab/Simulink; in case you are not familiar with Simulink I will briefly explain the diagram. The green section is the Nominal State integrator, the blue section is the ESKF, and the red section is the sum of the nominal and error states. The 'RT' blocks are 'Rate Transitions' which can be ignored.
My first question: Is this structure correct?
My second question: How are the error-state equations for the measurement models derived?
In my case I have tried using the measurement model of the second text, but it did not work.
Kind Regards,
Your block diagram combines two indirect methods for bringing IMU data into a KF:
(1) You have an external IMU integrator (in green, labelled "INS", sometimes called the mechanization, and described by you as the "nominal state", but I've also seen it called the "reference state"). This method freely integrates the IMU externally to the KF and is usually chosen so you can do this integration at a different (much higher) rate than the KF predict/update step (the indirect form). Historically I think this was popular because the KF is generally the computationally expensive part.
(2) You have also fed your IMU into the KF block as u, which I am assuming is the "command" input to the KF. This is an alternative to the external integrator. In a direct KF you would treat your IMU data as measurements. In order to do that, the KF state would have to model (position, velocity, and) acceleration and (orientation and) angular velocity; otherwise there is no possible H such that Hx can produce estimated IMU output terms. If you instead feed your IMU measurements in as a command, your predict step can simply act as an integrator, so you only have to model as far as velocity and orientation.
You should pick only one of those options. I think the second one is easier to understand, but it is closer to a direct Kalman filter, and requires you to predict/update for every IMU sample, rather than at the (I assume) slower camera framerate.
Regarding measurement equations for version (1), in any KF you can only predict things you can know from your state. The KF state in this case is a vector of error terms, and thus you can only predict things like "position error". As a result you need to pre-condition your measurements in z to be position errors. So make your measurement the difference between your "estimated true state" and your position from "noisy camera observations". This exact idea may be represented by the xHat input to the indirect KF. I don't know anything about the MATLAB/Simulink stuff going on there.
Regarding real-world considerations for the summing block (in red) I refer you to another answer about indirect Kalman filters.
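As a toy illustration of pre-conditioning the measurement into an error, here is a scalar (one-dimensional position) sketch of a single error-state update. It assumes the sign convention from the question, true = nominal minus error, a measurement matrix H = 1, and an error estimate that is reset to zero after each correction. The function and variable names and all numeric values are illustrative; a real ESKF works on vectors and handles orientation errors separately.

```python
def error_state_position_update(p_nominal, p_camera, P, R):
    """One scalar error-state KF update with H = 1.

    p_nominal: position from the external IMU integrator (nominal state)
    p_camera:  position derived from the camera observation
    P, R:      error-state variance and measurement noise variance
    """
    # The measurement fed to the filter is a position ERROR, not a position.
    z = p_nominal - p_camera
    K = P / (P + R)                # Kalman gain
    dx = K * z                     # estimated position error (prior error is zero)
    P_new = (1.0 - K) * P          # updated error variance
    p_corrected = p_nominal - dx   # true = nominal minus error
    return p_corrected, dx, P_new


p_corrected, dx, P_new = error_state_position_update(10.0, 9.0, P=1.0, R=1.0)
```

With equal variances the corrected position lands halfway between the integrator and the camera, as expected.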
Q1) Your SIMULINK model looks to be appropriate. Let me shed some light on quaternion mechanization based KF's which I've worked on for navigation applications.
Since Kalman Filter is an elegant mathematical technique which borrows from the science of stochastics and measurement, it can help you reduce the noise from the system without the need for elaborately modeling the noise.
All KF systems start with some preliminary understanding of the model that you want to make free of noise. The measurements are fed back to evolve the states better (the measurement equation Y = CX). In your case, the states that you are talking about are errors in quaternions, which would be the 4 values dq1, dq2, dq3, dq4.
KF working well in your application would accurately determine the attitude/orientation of the device by controlling the error around the quaternion. The quaternions are spatial orientation of any body, understood using a scalar and a vector, more specifically an angle and an axis.
The error equations that you are talking about are covariances which contribute to Kalman Gain. The covariances denote spread around the mean and they are useful in understanding how the central/ average behavior of the system is changing with time. Low covariances denote less deviation from the mean behavior for any system. As KF cycles run the covariances keep getting smaller.
The Kalman Gain is finally used to compensate for the error between the estimates of the measurements and the actual measurements that are coming in from the camera.
Again, this elegant technique first ensures that the error in the quaternion values converge around zero.
Q2) EKF is a great technique to use as long as you have a non-linear measurement construction technique. Be very careful in using EKF if there are too many transformations in your system, i.e. don't try to reconstruct measurements using transformations on your states. This seriously affects the model sanctity, and since the noise covariances would not undergo similar transformations, there would be a chance of hitting a singularity as soon as the matrices are non-invertible.
You could look at constant gain KF schemes, which would save you from covariance propagation and save substantial computation effort and time. These techniques are quite new and look very promising. They actively absorb P (error covariance), Q (model noise covariance) and R (measurement noise covariance) and work well with EKF schemes.

Determine function parameters with neural network

I am currently studying a doctoral thesis in control theory. At the end of every chapter there is a simulation of a problem related to the chapter's subject. I have finished the theory, but for further understanding I would like to reproduce the simulations. The first simulation is as follows:
The solution of the problem results in a system of differential equations whose right-hand side consists of functions with unknown parameters. The author states the following: "We will use neural networks with one hidden layer, sigmoid basis functions and 5 weights in the external layer in order to approximate every parameter of the unknown functions. More specifically, the weights of the hidden layer are selected through iterative trials and are kept stable during the simulation." He then states the logic by which he selects the initial values of the unknown parameters and shows the results of the simulation.
Could anyone give me a lead on where to look and what I need to know in order to solve this specific problem myself in MATLAB (since this is the environment I am most familiar with)? The results of a Google search are chaotic, since I don't really know what I'm looking for.
If you need any more info, feel free to ask!
You can try MATLAB's Neural Network Toolbox. This gives you a nice UI where you can configure the network, train it with data to find the parameter values and test for performance. No coding involved.
Or, you can program it by hand. Since you are working with one hidden layer, it should be very simple. I am sure any machine learning or neural net (NN) textbook would have an example of it. You can also look on GitHub for projects. There should be many NN projects there, in case you are looking to salvage code from an existing project.
Most importantly, you should start by learning about NN, if you haven't done that already. NN with single hidden layer is easy to implement once you understand the equations for the forward and back propagation.
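For orientation, here is a minimal sketch of the forward pass of such a network, written in Python although the same few lines translate directly to MATLAB. It assumes a scalar input, five sigmoid hidden units whose weights and biases are fixed in advance, and five output-layer weights that are the only adjustable parameters, which is one reading of the quoted description. All names and numbers are illustrative and are not taken from the thesis.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def approximate(x, hidden_weights, hidden_biases, output_weights):
    """Single hidden layer: sum_i W_i * sigmoid(v_i * x + b_i)."""
    basis = [sigmoid(v * x + b) for v, b in zip(hidden_weights, hidden_biases)]
    return sum(w * phi for w, phi in zip(output_weights, basis))


# Hidden-layer parameters chosen by trial and then kept fixed (assumed values).
hidden_weights = [0.5, 1.0, 1.5, 2.0, 2.5]
hidden_biases = [-1.0, -0.5, 0.0, 0.5, 1.0]
output_weights = [0.1, -0.2, 0.3, 0.0, 0.4]  # the 5 adjustable weights

estimate = approximate(0.3, hidden_weights, hidden_biases, output_weights)
```

Because the hidden layer is fixed, the output is linear in the five output weights, so only those need to be adjusted during the simulation.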