I have a question about curve fitting. I have many curves like the one in the picture.
X axis : time
Y axis : temperature
Each sample comes out every 30s.
GOAL : predict the value at the end of the transient
What would you do in this situation?
What I am doing is this :
For every new sample I start a new fit (so each fit is independent of the previous one) and evaluate the fitted curve 2 hours after the start of the measurement (all the curves I have settle before 2 h). If, for a number of consecutive fits (say 5), the predicted value stays more or less the same (±0.2 °C), I assume that the estimate is correct.
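The stopping rule described above could be sketched roughly like this. The function name, the default of 5 fits and the reading of "±0.2 °C" as "within 0.2 of the mean of the recent predictions" are assumptions made for illustration:

```python
def has_settled(predictions, n=5, tol=0.2):
    """Return True if the last n predictions all lie within +/- tol of their mean.

    `predictions` is the list of extrapolated 2 h values, one per new fit.
    """
    if len(predictions) < n:
        return False
    recent = predictions[-n:]
    mean = sum(recent) / n
    return all(abs(p - mean) <= tol for p in recent)


# Hypothetical sequence of predicted end temperatures (one per 30 s sample).
history = [30.0, 31.0, 31.1, 31.05, 31.0, 31.1, 31.15]
print(has_settled(history[:5]))  # early on: still drifting
print(has_settled(history))      # later: the last five agree
```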
This approach seems far too simple to me, and I think I am not exploiting all the available information, for example the error I make at each step (e.g. at minute 4:00 I make a prediction, and at 4:30 I can see how wrong it was).
In the picture, the red part of the curve is excluded from the fit (but the real future data passes through it) and the blue curve is the estimate. As you can see, in this case the prediction is not good. In general I also have flatter curves.
Based on the comments above, I tried to formulate an answer as no one else is giving some input.
I think you are using a good basic procedure. Better results may be obtained by using a more appropriate fitting curve, one which includes all the dominant dynamics but avoids overfitting the data. Based on your figure, the simplest form I could think of is:
s + a(1-e^(-t/tau))
with parameters s (the initial temperature), a (the amplitude of the rise, so the steady-state value is s + a) and tau (the dominant time constant). As you mentioned yourself, limiting the allowed range of the parameters may avoid overfitting and improve the quality of your estimate.
Using an arbitrary high-order function, as you are doing now, may give good interpolation results but is dangerous for extrapolation, because strange effects may occur outside the fitting region.
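To make the first-order model concrete, here is a minimal sketch in plain Python. It assumes the model s + a(1-e^(-t/tau)) from above and restricts tau to a grid of allowed values (one simple way of limiting the parameter range). For each candidate tau the best s and a follow from ordinary linear least squares. The function name, the grid and the synthetic data are all made up for illustration:

```python
import math


def fit_first_order(t, y, tau_grid):
    """Fit y ~ s + a*(1 - exp(-t/tau)), with tau restricted to tau_grid."""
    n = len(t)
    best = None
    for tau in tau_grid:
        g = [1.0 - math.exp(-ti / tau) for ti in t]
        # For a fixed tau the model is linear in s and a: solve the 2x2
        # normal equations of the least-squares problem directly.
        sg = sum(g)
        sgg = sum(v * v for v in g)
        sy = sum(y)
        sgy = sum(gi * yi for gi, yi in zip(g, y))
        det = n * sgg - sg * sg
        if abs(det) < 1e-12:
            continue
        a = (n * sgy - sg * sy) / det
        s = (sy - a * sg) / n
        sse = sum((s + a * gi - yi) ** 2 for gi, yi in zip(g, y))
        if best is None or sse < best[0]:
            best = (sse, s, a, tau)
    _, s, a, tau = best
    return s, a, tau


# Synthetic transient: one sample every 30 s for the first 20 minutes.
t = [30.0 * k for k in range(41)]
y = [20.0 + 15.0 * (1.0 - math.exp(-ti / 600.0)) for ti in t]

tau_grid = [60.0 * k for k in range(1, 61)]  # allowed range: 1 to 60 minutes
s, a, tau = fit_first_order(t, y, tau_grid)
print("predicted end value:", s + a)
```

The predicted end of the transient is s + a, the limit of the model for large t. With noisy data the estimate will of course be less clean than on this synthetic curve.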
Alternatives
Using the error (e.g. correcting for the extrapolated error) may be possible, but is tricky and may not always give good results.
Training a neural network to perform the estimation is probably overkill, but may give better results if applied correctly. Note that you need a lot of training data, which should be representative of the data on which you will use the neural network later on.
I'm trying to get a better understanding of neural networks by trying to program a Convolutional Neural Network by myself.
So far, I'm going to make it pretty simple by not using max-pooling and using simple ReLu-activation. I'm aware of the disadvantages of this setup, but the point is not making the best image detector in the world.
Now, I'm stuck understanding the details of the error calculation, propagating it back and how it interplays with the used activation-function for calculating the new weights.
I read this document (A Beginner's Guide To Understand CNN), but it doesn't help me understand much. The formula for calculating the error already confuses me.
This sum function doesn't have defined start and end points, so I basically can't read it. Maybe you can simply provide me with the correct one?
After that, the author assumes a variable L that is just "that value" (I assume he means E_total?) and gives an example of how to define the new weight:
where W is the weights of a particular layer.
This confuses me, as I was always under the impression that the activation function (ReLU in my case) played a role in how to calculate the new weight. Also, this seems to imply I simply use the error for all layers. Doesn't the error value I propagate back into the next layer somehow depend on what I calculated in the previous one?
Maybe all of this is just incomplete and you can point me in the direction that helps me best in my case.
Thanks in advance.
You do not backpropagate errors, but gradients. The activation function plays a role in calculating the new weight, depending on whether the weight in question comes before or after the activation, and whether the two are connected. If a weight w comes after your non-linearity layer f, then the gradient dL/dw won't depend on f. But if w comes before f and they are connected, then dL/dw will depend on f. For example, suppose w is the weight vector of a fully connected layer, and assume that f directly follows this layer. Then,
dL/dw=(dL/df)*df/dw //notations might change according to the shape
//of the tensors/matrices/vectors you chose, but
//this is just the chain rule
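Here is a tiny numeric sketch of that chain rule for a single neuron with a ReLU directly after it. The squared-error loss, the learning rate and all the numbers are assumptions chosen only to make the example runnable:

```python
def relu(z):
    return max(0.0, z)


def relu_grad(z):
    # Derivative of ReLU: 1 where the unit is active, 0 otherwise.
    return 1.0 if z > 0 else 0.0


def forward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))  # z = w . x


def loss(w, x, y):
    return 0.5 * (relu(forward(w, x)) - y) ** 2


def grad(w, x, y):
    z = forward(w, x)
    dL_df = relu(z) - y      # derivative of the loss w.r.t. the activation
    df_dz = relu_grad(z)     # derivative of the activation w.r.t. its input
    # dz/dw_i = x_i, so the chain rule gives dL/dw_i = dL/df * df/dz * x_i
    return [dL_df * df_dz * xi for xi in x]


w = [0.5, -0.2]
x = [1.0, 2.0]
y = 1.0

g = grad(w, x, y)
learning_rate = 0.1
w_new = [wi - learning_rate * gi for wi, gi in zip(w, g)]
print(g, w_new)
```

When the ReLU is inactive (z <= 0) the factor df/dz is zero, so the weights feeding that unit get no update for that sample. That is the sense in which the activation function enters the weight update.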
As for your cost function, it is correct. Many people write these formulas in this informal style so that you get the idea and can adapt it to your own tensor shapes. By the way, this sort of MSE function is better suited to continuous label spaces. You might want to use a softmax or an SVM loss for image classification (I'll come back to that). Anyway, as you requested a correct form for this function, here is an example. Imagine you have a neural network that predicts a vector field of some kind (like surface normals). Assume that it takes a 2d pixel x_i and predicts a 3d vector v_i for that pixel. Now, in your training data, x_i will already have a ground truth 3d vector (i.e. a label), which we'll call y_i. Then, your cost function will be (the index i runs over all data samples):
sum_i{(y_i-v_i)^t (y_i-v_i)}=sum_i{||y_i-v_i||^2}
But as I said, this cost function works if the labels form a continuous space (here, R^3). This is also called a regression problem.
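Written out with explicit loop bounds, the sum above could look like the following sketch. The two-sample data set is made up, and the function name is just for illustration:

```python
def sum_squared_error(labels, predictions):
    """sum over i = 0 .. N-1 of ||y_i - v_i||^2, with N the number of samples."""
    total = 0.0
    for y_i, v_i in zip(labels, predictions):
        total += sum((yk - vk) ** 2 for yk, vk in zip(y_i, v_i))
    return total


# Two hypothetical samples with 3d ground-truth vectors and predictions.
Y = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
V = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(sum_squared_error(Y, V))
```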
Here's an example if you are interested in (image) classification. I'll explain it with a softmax loss; the intuition for other losses is more or less similar. Assume we have n classes, and imagine that in your training set, for each data point x_i, you have a label c_i that indicates the correct class. Now, your neural network should produce scores for each possible label, which we'll denote s_1,..,s_n. Let's denote the score of the correct class of a training sample x_i as s_{c_i}. Now, if we use a softmax function, the intuition is to transform the scores into a probability distribution and maximise the probability of the correct classes. That is, we maximise the function
sum_i { exp(s_{c_i}) / sum_j(exp(s_j))}
where i runs over all training samples, and j=1,..n on all class labels.
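A small sketch of the softmax probability for one sample, in plain Python. Subtracting the largest score before exponentiating is a common numerical-stability trick, and in practice one usually minimises the negative log of the correct-class probability rather than maximising the probability itself. Both are additions here and not part of the formula above, and the scores are made up:

```python
import math


def softmax(scores):
    """Turn raw class scores s_1..s_n into a probability distribution."""
    m = max(scores)  # shifting by the max does not change the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def correct_class_probability(scores, c):
    return softmax(scores)[c]


def negative_log_likelihood(scores, c):
    # Minimising this is equivalent to maximising the correct-class probability.
    return -math.log(correct_class_probability(scores, c))


scores = [1.0, 2.0, 3.0]  # hypothetical network outputs for one sample
probs = softmax(scores)
print(probs, negative_log_likelihood(scores, 2))
```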
Finally, I don't think the guide you are reading is a good starting point. I recommend this excellent course instead (at least the Andrej Karpathy parts).
I am generating some data whose plots are as shown below
In all the plots I get some outliers at the beginning and at the end. Currently I am truncating the first and the last 10 values. Is there a better way to handle this?
I am basically trying to automatically identify the two points shown below.
This is a fairly general problem with lots of approaches. Usually you will use some a priori knowledge of the underlying system to make it tractable.
So for instance if you expect to see the pattern above - a fast drop, a linear section (up or down) and a fast rise - you could try taking the derivative of the curve and looking for large values and/or sign reversals. Perhaps it would help to bin the data first.
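As a rough sketch of the derivative idea, assuming the drop / linear section / rise pattern described above: take first differences, call a difference "large" when it exceeds k times the median absolute difference, and take the linear section to lie between the last large difference in the first half and the first large difference in the second half. The factor k and the test data are made up:

```python
from statistics import median


def find_linear_section(y, k=5.0):
    """Return (start, end) indices of the section between the fast drop and fast rise."""
    d = [b - a for a, b in zip(y, y[1:])]          # discrete derivative
    thr = k * median([abs(v) for v in d]) + 1e-12  # what counts as a "large" step
    large = [i for i, v in enumerate(d) if abs(v) > thr]
    mid = len(d) // 2
    head = [i for i in large if i < mid]
    tail = [i for i in large if i >= mid]
    start = head[-1] + 1 if head else 0
    end = tail[0] if tail else len(y) - 1
    return start, end


# Synthetic curve: fast drop, slowly rising linear part, fast rise.
y = [10.0, 6.0, 3.0] + [1.0 + 0.01 * i for i in range(50)] + [4.0, 7.0, 10.0]
start, end = find_linear_section(y)
print(start, end)
```

On noisy data you would probably smooth or bin the samples first, otherwise isolated noise spikes inside the linear part also show up as large differences.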
If your pattern is not so easy to define but you are expecting a linear trend you might fit the data to an appropriate class of curve using fit and then detect outliers as those whose error from the fit exceeds a given threshold.
In either case you still have to choose thresholds - mean, variance and higher order moments can help here but you would probably have to analyse existing data (your training set) to determine the values empirically.
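A sketch of the fit-and-threshold variant, in plain Python rather than with fit: fit a straight line by least squares, then flag samples whose residual is far from the median residual. Using k times a MAD-based scale as the threshold is one possible empirical choice and an assumption here, as are the function names and the data:

```python
from statistics import median


def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_outliers(x, y, k=3.0):
    """Boolean mask: True where the residual from the fitted line is an outlier."""
    slope, intercept = fit_line(x, y)
    r = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    med = median(r)
    mad = median([abs(v - med) for v in r])  # robust measure of spread
    thr = k * 1.4826 * mad                   # 1.4826 scales MAD to a std-dev
    return [abs(v - med) > thr for v in r]


# Hypothetical data: an exact line with one bad first sample.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
y[0] += 50.0
mask = flag_outliers(x, y)
print(mask)
```

Note that the outliers also pull the fitted line towards themselves, so with many or very large outliers a robust fit, or a refit after removing the flagged points, may be needed.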
And perhaps, after all that, as Shai points out, you may find that lopping off the first and last ten points gives the best results for the time you spent (cf. Pareto principle).
First, for those who are not familiar with Simulink, there is a conceivable partial solution outside Simulink:
I need to create a vector satisfying the following conditions:
known initial value a1
known final value a2
it has a pre-defined step size, but the length is not pre-determined
the first derivative over the whole range is limited to v_max resp. -v_max
the second derivative over the whole range is limited to a_max resp. -a_max
the third derivative over the whole range is limited to j_max resp. -j_max
at the first and the final point all derivatives are zero.
Before you ask "what have you tried so far", I just had the idea to solve it outside Simulink and I tried the whole stuff below ;)
But maybe you guys have a good idea, while I keep working on my own solution.
I'd like to generate smooth ramp signals (3rd derivative limited) based on a trigger signal in Simulink.
To get a triggered step I created a triggered subsystem propagating the trigger output. It looks like that:
But I actually don't want a step, I need a very smooth ramp with limited derivatives up to the 3rd order. The math behind is:
displacement: x
speed: v = x'
acceleration: a = v' = x''
jerk: j = a' = v'' = x'''
(If this looks familiar to you, I once had a very similar question. I thought about a bounty on it, but after the necessary edit of the question both answers would have been invalid)
As there are only rate limiters of 1st order, I used two derivatives and a double integration to solve my problem. But there is a major drawback that I cannot ignore anymore. For the sake of illustration I chose a relatively big step size of 0.1.
The complete minimal example (Fixed Step, stepsize: 0.1, ode4): Download here
It can be seen that the signal does not even reach the intended step height of 10 and, furthermore, is not constant at the end.
Over the development process of my whole model, this approach was satisfactory enough for small step sizes. But I reached the point where I really need the smooth ramp as intended. That means I need a finally constant signal at exactly the value, specified by the step height gain.
I have already spent days trying to solve the problem, and hope to find some help here now.
Some of my ideas:
Dynamically increase the step height above the actual desired value and saturate the final output. If the rate limits, step height and simulation step size weren't flexible, one could probably find a satisfactory solution. But as everything has to be flexible, there are too many cases where the acceleration and jerk limits are violated.
I tried to use the Matlab function block to write my own 3rd-order rate limiter. Though it seems feasible for the trigger moment, I have no solution for smoothing the "deceleration" at the end of the ramp. I'd also need C compilers, which would make it hard to use my model on other systems without problems. (At least I think so.)
The solver cannot be changed significantly (either ode3 or ode4) and a fixed step size is mandatory (0.00001 to 0.01).
Currently used, not really useful approach:
For a dynamic amplification of 1.07 I get the following output (all values normalised on their limits):
Though the displacement looks nice, the violation of the acceleration limit is very harmful.
For a dynamic amplification of 1.05 I get the following output (all values normalised on their limits):
The acceleration stays within its bounds, but the displacement does not reach the intended value (not really clear in the picture). The jerk is still too big. (I could live with that, but it's not nice.)
So it appears to me that an inside-Simulink solution is far from reality. Any ideas on how to create a well-behaved custom function block?
Simulation step size, step height, and the rate limits are known before the simulation starts. (But I have a lot of these triggered smooth ramps in a row, as it should feed an event-discrete control.) So I could imagine creating the whole smooth ramp outside Simulink, saving it as a timeseries object and appending it to the current signal when the trigger is activated.
The problems you see arise because taking differences is not a well-conditioned operation.
Taking the difference amplifies the numerical noise that exists in your simulation.
Also the jerk will always be large if you try to apply an actual step.
I guess for your approach it would be better to work the other way around:
i.e. make a jerk, acceleration and velocity with which your step is achieved.
I think you're looking for something like the ref3 block:
http://www.dct.tue.nl/home_of_ref3.htm
Note the disclaimer on the site and that it is a little cumbersome to use.
An easy (yet to be improved) way is to use a rate limiter followed by a state-space model with a filter. From the filter you get the velocity, to which you can in turn apply a rate limiter. You continue with rate limiters and filters until you have the desired curve.
Otherwise you can come up with numerical rate limiters of higher order using e.g. Runge-Kutta formulas or finite differences. However, as was pointed out, they may suffer from bad conditioning.
What I usually do is use one rate limiter and a 3rd-order filter, and just tune the time constant (one triple pole) so that my needs are met. This works well, esp
Integrator chains of length > 1 are unstable!
There is a huge field of research dealing with trajectory planning. The easiest way might be to use FIR filters (Biagiotti et al.) or to implement an online trajectory planner (Ezair et al. 2014 / Knierim et al. 2012).
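To illustrate the FIR-filter idea in its simplest form: a step of height a2 - a1 is passed through three moving-average filters in a row, whose window lengths are chosen from v_max, a_max and j_max. The sketch below is plain Python, not Simulink, and is not taken from the cited papers. The window-length rule (round up, then force n2 >= n3 and n1 >= n2 + n3) is a conservative assumption that keeps all three limits satisfied at the cost of a ramp that may be slower than necessary:

```python
import math


def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def smooth_ramp(a1, a2, v_max, a_max, j_max, dt):
    """Jerk-limited ramp from a1 to a2 sampled with fixed step dt."""
    d = a2 - a1
    # Window lengths in samples, rounded up so that the limits are not exceeded.
    n3 = max(1, math.ceil(a_max / (j_max * dt)))
    n2 = max(n3, math.ceil(v_max / (a_max * dt)))
    n1 = max(n2 + n3, math.ceil(abs(d) / (v_max * dt)))
    # Cascade of three moving averages = one kernel; it is the velocity shape.
    kernel = [1.0]
    for n in (n1, n2, n3):
        kernel = convolve(kernel, [1.0 / n] * n)
    # Zero padding so the finite differences vanish at the first and last points.
    kernel = [0.0] * 3 + kernel + [0.0] * 3
    # Integrate the velocity shape to get a normalised position from 0 to 1.
    cumulative, acc = [0.0], 0.0
    for k in kernel:
        acc += k
        cumulative.append(acc)
    return [a1 + d * c / acc for c in cumulative]


def diff(x, dt):
    return [(b - a) / dt for a, b in zip(x, x[1:])]


dt = 0.1
x = smooth_ramp(0.0, 10.0, v_max=2.0, a_max=1.0, j_max=1.0, dt=dt)
v = diff(x, dt)
a = diff(v, dt)
j = diff(a, dt)
print(x[-1], max(map(abs, v)), max(map(abs, a)), max(map(abs, j)))
```

Because the kernel sums to one, the output ends at the target value and stays there, which was the main complaint about the derivative/integrator approach. The resulting vector could then be stored (for example as a timeseries) and played back when the trigger fires. The nested-loop convolution is fine for a sketch but slow for very small step sizes; a running-sum moving average would scale better.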
I am trying to resample/recreate already recorded data for plotting purposes. I thought this is best place to ask the question (besides dsp.se).
The data is sampled at a high frequency, contains too many data points and is not suitable for plotting in the time domain (not enough memory). I want to resample it with minimal loss. The sampling interval of the resulting data doesn't need to be uniform (again, it is for plotting purposes, not analysis), although the input data is equally sampled.
When we use the regular resample command from matlab/octave, it can distort stiff pieces of the curve.
What is the best approach here?
For reference, I put two pictures (found on tex.se):
First image is regular resample
Second image is a better resampled data that can well behave around peaks.
You should try this set of files from the File Exchange. It computes optimal lookup table based on either the maximum set of points or a given error. You can choose from natural, linear, or spline for the interpolation methods. Spline will have the smallest table size but is slower than linear. I don't use natural unless I have a really good reason.
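This is not the File Exchange code, but the "given error" variant with linear interpolation can be sketched generically: keep the end points, find the sample that deviates most from the straight line between two kept points, keep it if the deviation exceeds the tolerance, and repeat on both halves. The function name and the data are made up, and the sketch is in plain Python rather than Matlab/Octave:

```python
def reduce_points(x, y, tol):
    """Indices of samples to keep so that linear interpolation between kept
    samples stays within tol (in y) of every dropped sample."""
    keep = {0, len(x) - 1}
    stack = [(0, len(x) - 1)]
    while stack:
        i0, i1 = stack.pop()
        worst, worst_err = None, tol
        for i in range(i0 + 1, i1):
            # Value of the straight line from sample i0 to sample i1 at x[i].
            yi = y[i0] + (y[i1] - y[i0]) * (x[i] - x[i0]) / (x[i1] - x[i0])
            err = abs(y[i] - yi)
            if err > worst_err:
                worst, worst_err = i, err
        if worst is not None:
            keep.add(worst)
            stack.append((i0, worst))
            stack.append((worst, i1))
    return sorted(keep)


# Flat signal with a single sharp peak: the peak and its flanks survive.
x = [float(i) for i in range(11)]
y = [0.0] * 11
y[5] = 1.0
kept = reduce_points(x, y, tol=0.1)
print(kept)
```

The kept samples are unevenly spaced: dense around peaks and sparse on smooth stretches, which is what keeps the sharp features intact when plotting.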
Sincerely,
Jason
I am building receiver operating characteristic (ROC) curves to evaluate classifiers using the area under the curve (AUC) (more details on that at end of post). Unfortunately, points on the curve often go below the diagonal. For example, I end up with graphs that look like the one here (ROC curve in blue, identity line in grey) :
The third point (0.3, 0.2) goes below the diagonal. To calculate AUC I want to fix such recalcitrant points.
The standard way to do this, for point (fp, tp) on the curve, is to replace it with a point (1-fp, 1-tp), which is equivalent to swapping the predictions of the classifier. For instance, in our example, our troublesome point A (0.3, 0.2) becomes point B (0.7, 0.8), which I have indicated in red in the image linked to above.
This is about as far as my references go in treating this issue. The problem is that if you add the new point into a new ROC (and remove the bad point), you end up with a nonmonotonic ROC curve as shown (red is the new ROC curve, and dotted blue line is the old one):
And here I am stuck. How can I fix this ROC curve?
Do I need to re-run my classifier with the data or classes somehow transformed to take into account this weird behavior? I have looked over a relevant paper, but if I am not mistaken, it seems to be addressing a slightly different problem than this.
In terms of some details: I still have all the original threshold values, fp values, and tp values (and the output of the original classifier for each data point, an output which is just a scalar from 0 to 1 that is a probability estimate of class membership). I am doing this in Matlab starting with the perfcurve function.
Note based on some very helpful emails about this from the people that wrote the articles cited above, and the discussion above, the right answer seems to be: do not try to "fix" individual points in an ROC curve unless you build an entirely new classifier, and then be sure to leave out some test data to see if that was a reasonable thing to do.
Getting points below the identity line is something that simply happens. It's like getting an individual classifier that scores 45% correct even though the optimal theoretical minimum is 50%. That's just part of the variability with real data sets, and unless it is significantly less than expected based on chance, it isn't something you should worry too much about. E.g., if your classifier gets 20% correct, then clearly something is amiss and you might look into the specific reasons and fix your classifier.
Yes, swapping a point for (1-fp, 1-tp) is theoretically effective, but increasing sample size is a safe bet too.
It does seem that your system has a non-monotonic response characteristic so be careful not to bend the rules of the ROC too much or you will impact the robustness of the AUC.
That said, you could try to use a Pareto Frontier Curve (Pareto Front). If that fits the requirements of "Repairing Concavities" then you'll basically sort the points so that the ROC curve becomes monotonic.
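A minimal sketch of the Pareto-front idea, in plain Python rather than Matlab: sort the (fp, tp) points, drop every point that is dominated by another one (lower or equal fp and higher or equal tp), and compute the AUC of what remains with the trapezoid rule. Apart from (0.3, 0.2), the points below are hypothetical, and whether the resulting AUC is a fair estimate is subject to the caveats discussed above:

```python
def pareto_front(points):
    """Keep only (fp, tp) points not dominated by a point with lower fp and higher tp."""
    front, best_tp = [], -1.0
    for fp, tp in sorted(points, key=lambda p: (p[0], -p[1])):
        if tp > best_tp:
            front.append((fp, tp))
            best_tp = tp
    return front


def auc(points):
    """Area under a piecewise-linear curve through points sorted by fp."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ROC points; (0.3, 0.2) is the one below the diagonal.
roc = [(0.0, 0.0), (0.1, 0.3), (0.3, 0.2), (0.5, 0.7), (1.0, 1.0)]
front = pareto_front(roc)
print(front, auc(front))
```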