Fitting a custom equation in MATLAB

I want to fit this equation to find the values of the variables, particularly c:
a*exp(-x/T) + c*(T*(exp(-x/T)-1) + x)
I do have the values of a = -45793671 and T = 64.3096.
Due to the lack of good initial parameters, the SSE and RMSE errors in MATLAB's cftool are very high, and it is not able to fit the data at all.
I also tried other methods (linear fitting), but the problem with high error persists.
Is there any way to fit the data nicely so that I can find the most accurate value for c?
The data for x and y:
x = [0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20];
y = [-45793671 -87174030 -124726368 -165435857 -211887711 -255565545 ...
     -295927582 -332434440 -365137627 -383107046 -408000987 -434975682 ...
     -465932505 -492048864 -513857005 -543087921 -573111110 -588176196 ...
     -607460012 -628445691];

I don't think the bad fit is mainly due to a lack of initial parameters.
First trial:
If we start with the parameters stated in the question, a = -45793671 and T = 64.3096, only the parameter c remains to be fitted. The result is not satisfying.
Second trial:
If we hold only the specified value of T constant and optimize the two parameters c and a, the RMSE improves but the shape of the curve is still poor.
Third trial:
If we drop the specified values of the two parameters T and a and perform a non-linear regression with respect to all three parameters T, c, and a, the result is better.
But a negative value of T might not be acceptable from a physical viewpoint. This suggests that the function y(x) = a*exp(-x/T) + c*(T*(exp(-x/T)-1) + x) might not be a good model. You should check whether there is a typo in the function and/or whether some terms are missing, in order to better model the physical experiment.
For information only (probably not useful):
An even better fit is obtained with a much simpler function: y(x) = A + B*x + C*x^2
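For reference, a minimal sketch of the third trial using the Curve Fitting Toolbox, assuming x and y hold the data listed in the question; the StartPoint values are illustrative guesses only (a and T from the question, c arbitrary), not the values used above:
ft = fittype('a*exp(-x/T) + c*(T*(exp(-x/T)-1) + x)', ...
             'independent', 'x', 'coefficients', {'a', 'T', 'c'});
% StartPoint order follows the coefficient list {a, T, c}
[fo, gof] = fit(x(:), y(:), ft, 'StartPoint', [-4.6e7, 64.3, -3e7]);
disp(fo)                           % fitted a, T, c with confidence bounds
fprintf('RMSE: %g\n', gof.rmse)    % compare against the trials above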


Unreasonable [positive] log-likelihood values from matlab "fitgmdist" function

I want to fit a dataset with a Gaussian mixture model. The dataset contains about 120k samples, and each sample has about 130 dimensions. I use MATLAB to do it, running the following script (with 1000 clusters):
gm = fitgmdist(data, 1000, 'Options', statset('Display', 'iter'), 'RegularizationValue', 0.01);
I get the following outputs:
iter log-likelihood
1 -6.66298e+07
2 -1.87763e+07
3 -5.00384e+06
4 -1.11863e+06
5 299767
6 985834
7 1.39525e+06
8 1.70956e+06
9 1.94637e+06
The log-likelihood is bigger than 0! I think that's unreasonable, and I don't know why.
Could somebody help me?
First of all, it is not a problem of how large your dataset is.
Here is some code that produces similar results with a quite small dataset:
options = statset('Display', 'iter');
x = ones(5,2) + (rand(5,2)-0.5)/1000;
fitgmdist(x,1,'Options',options);
this produces
iter log-likelihood
1 64.4731
2 73.4987
3 73.4987
Of course you know that the log function (the natural logarithm) has a range from -inf to +inf. I guess your problem is that you think the input to the log (i.e. the a posteriori function) should be bounded by [0,1]. Well, the a posteriori function is a probability density function (pdf), which means that its value can be very large for a very dense dataset.
PDFs must be positive (which is why we can use the log on them) and must integrate to 1. But they are not bounded by [0,1].
You can verify this by reducing the density in the above code
x = ones(5,2) + (rand(5,2)-0.5)/1;
fitgmdist(x,1,'Options',options);
this produces
iter log-likelihood
1 -8.99083
2 -3.06465
3 -3.06465
So, I would rather assume that your dataset contains several duplicate (or very close) values.
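To see why a positive log-likelihood is legitimate, here is a minimal illustration (normpdf is from the Statistics and Machine Learning Toolbox): the density of a narrow Gaussian exceeds 1, so its log is positive.
p = normpdf(0, 0, 0.001);   % density of N(0, 0.001^2) at its mean
% p = 1/(0.001*sqrt(2*pi)), about 398.9, so log(p) is about 5.99 > 0
disp(log(p))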

Get matlab to show square roots (i.e. 2^(1/2) instead of 1.414)

I have a few simple equations that I want to pipe through matlab. But I would like to get exact answers, because these values are expected to be used and simplified later on.
Right now MATLAB shows sqrt(2.0) as 1.4142 instead of something like 2^(1/2), as I would like.
I tried turning on format rat, but this is dangerous because it shows sqrt(2) as 1393/985 without any sort of warning.
There is "symbolic math", but it seems like overkill.
All I want is for 2 + sqrt(50) to return something like 2 + 5*(2)^(1/2); even my 5-year-old CASIO calculator can do this!
So what can I do to get 2 + sqrt(50) evaluate to 2 + 5 * (2)^(1/2) in matlab?
As per @Oleg's comment, use symbolic math.
x=sym('2')+sqrt(sym('50'))
x =
5*2^(1/2) + 2
The average time over ten thousand iterations through this expression is 1.2 milliseconds, whilst the time for the numeric expression (x = 2 + sqrt(50)) is only 0.4 microseconds, i.e. roughly three thousand times faster.
I did pre-run the symbolic expression 50 times because, as Oleg points out in his second comment, the symbolic engine needs some warming up. The first run through your expression took my PC almost 2 seconds.
I would therefore recommend using numeric expressions due to the huge difference in calculation time. Only use symbolic expressions when you are forced to (e.g. simplifying expressions for a paper), and then use a symbolic computation engine like Maple or Wolfram Alpha.
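For reference, a minimal sketch of how such a timing comparison can be run with timeit; the absolute numbers above are from the answerer's machine and will differ on yours, and sym requires the Symbolic Math Toolbox:
tNum = timeit(@() 2 + sqrt(50));            % plain double-precision arithmetic
tSym = timeit(@() sym(2) + sqrt(sym(50)));  % exact symbolic arithmetic
fprintf('numeric: %.3g s, symbolic: %.3g s, ratio: %.0f\n', tNum, tSym, tSym/tNum);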
MATLAB's main engine is numeric, not symbolic.
With the Symbolic Math Toolbox: create an expression in x and subs x = 50.
syms x
f = 2+sqrt(x)
subs(f,50)
ans =
50^(1/2) + 2

fminbnd doesn't give the minimum value

I'm trying some built-in functions in MATLAB. I declared a function like this:
function y = myFunction(x)
y = cos(4*x) .* sin(10*x) .* exp(-abs(x));
end
Then I use fminbnd to find the minimum value:
fminbnd(@myFunction, -pi, pi)
This gives me the result:
ans =
0.7768
However, when I plot myFunction over [-pi,pi] with the following code, I get a figure showing otherwise:
>> x = -pi:0.01:pi;
>> y = myFunction(x);
>> plot(x,y)
The plot shows that the minimum value is about -0.77, which is not the result given by fminbnd. What's wrong here? I'm new to MATLAB and don't know where my mistake is.
First things first: fminbnd returns the x-coordinate of the minimum of your function, so the actual minimum value is myFunction(0.7768); x = 0.7768 is only where that minimum occurs.
Now, I tried running your code with more verbose information. Specifically, I wanted to see how the minimum changes at each iteration. I overrode the default settings of fminbnd so we can see what's happening at each iteration.
This is what I get:
>> y = #(x) cos(4*x).*sin(10*x).*exp(-abs(x)); %// No need for function declaration
>> options = optimset('Display', 'iter');
>> [X,FVAL,EXITFLAG] = fminbnd(y, -pi, pi, options)
Func-count x f(x) Procedure
1 -0.741629 0.42484 initial
2 0.741629 -0.42484 golden
3 1.65833 -0.137356 golden
4 0.775457 -0.457857 parabolic
5 1.09264 0.112139 parabolic
6 0.896609 -0.163049 golden
7 0.780727 -0.457493 parabolic
8 0.7768 -0.457905 parabolic
9 0.776766 -0.457905 parabolic
10 0.776833 -0.457905 parabolic
Optimization terminated:
the current x satisfies the termination criteria using OPTIONS.TolX of 1.000000e-04
X =
0.776799595407872
FVAL =
-0.457905463395071
EXITFLAG =
1
X is the location of the minimum, FVAL is the y value of where the minimum is and EXITFLAG=1 means that the algorithm converged properly.
This obviously is not equal to your desired minimum. If I can reference the documentation of fminbnd, it specifically says this:
fminbnd may only give local solutions.
Going with that, the reason why you aren't getting the right answer is that you have a lot of local minima in your function. Specifically, if you zoom in around x = 0.7768, you can see that this point is itself a local minimum.
Since the algorithm managed to find a good local minimum here, it decides to stop.
You can get the true minimum if you restrict the search boundaries of the function to be around where the true minimum is. Instead of [-pi,pi], try something like [-1,1]:
>> [X,FVAL,EXITFLAG] = fminbnd(y, -1, 1, options)
Func-count x f(x) Procedure
1 -0.236068 -0.325949 initial
2 0.236068 0.325949 golden
3 -0.527864 -0.256217 golden
4 -0.32561 0.0218758 parabolic
5 -0.0557281 -0.487837 golden
6 0.0557281 0.487837 golden
7 -0.124612 -0.734908 golden
8 -0.134743 -0.731415 parabolic
9 -0.126213 -0.735006 parabolic
10 -0.126055 -0.735007 parabolic
11 -0.126022 -0.735007 parabolic
12 -0.126089 -0.735007 parabolic
Optimization terminated:
the current x satisfies the termination criteria using OPTIONS.TolX of 1.000000e-04
X =
-0.126055418940111
FVAL =
-0.735007134768142
EXITFLAG =
1
When I did this, I managed to get the right minimum location and the minimum itself.
While this is only a partial answer, I will just point out the following text from the Limitations section of the fminbnd documentation:
fminbnd may only give local solutions.
Which is what is happening in your case. Often, when a function has multiple minima*, optimization algorithms can't find the global minimum.
Generally, the best approach when there are lots of minima is to split the search interval into pieces, compute the minimum on each piece, and then compare to see which one is smaller (a sketch follows the footnote below).
*You can find out whether your function has multiple minima by computing the derivative, counting its zero crossings, and dividing by two.
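A minimal sketch of that split-and-compare idea, using the anonymous function from the answer above; the number of subintervals here is an arbitrary choice:
y = @(x) cos(4*x).*sin(10*x).*exp(-abs(x));
edges = linspace(-pi, pi, 9);       % split [-pi, pi] into 8 subintervals
bestX = NaN; bestF = Inf;
for k = 1:numel(edges)-1
    [xk, fk] = fminbnd(y, edges(k), edges(k+1));  % local minimum on each piece
    if fk < bestF
        bestF = fk; bestX = xk;     % keep the smallest local minimum found
    end
end
fprintf('best minimum found: f(%.6f) = %.6f\n', bestX, bestF)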

Efficient size choice for SciPy Discrete Sine Transform

I noticed that SciPy has an implementation of the Discrete Sine Transform, and I was comparing it to the one that's in MATLAB. The MATLAB documentation notes that for best performance, the size of the inputs should be 2^p -1, presumably for a divide and conquer strategy. Is this also true for the SciPy implementation?
Although this question is old, I happened to have just run some tests before stumbling upon it.
The answer is yes. Internally, SciPy seems to convert the array to size M = 2*(N+1).
Ideally M = 2^i for some integer i, so N should follow N = 2^i - 1.
[Figure: timing vs. FFT size on a log-log scale. Green line: N = 2^i; blue line: N = 2^i + 1; orange line: N = 2^i - 1. The orange line is much smoother, indicating no unexpected memory overhead.]
UPDATE
After digging some more into the documentation of scipy.fftpack, I found that the above answer is only partly true. According to the documentation, "SciPy's FFTPACK has efficient functions for radix {2, 3, 4, 5}". This means that instead of only handling arrays of size M = 2^i efficiently, it can handle any M = 2^i * 3^j * 5^k (4 is not a prime). The optimum for scipy.fftpack.dst (or dct) is then N = M - 1. Finding those numbers can be a little awkward, but luckily there's a function for that, too!
Please note that the above graph is on a log-log scale, so speedups of 40 or so are not uncommon. Thus, choosing a fast size can make your calculations orders of magnitude faster! (I found this out the hard way.)

Solving Algebraic Equations Programmatically [closed]

Closed. This question is off-topic and is not currently accepting answers. Closed 13 years ago.
I have six parametric equations using 18 (not actually 26) different variables, 6 of which are unknown.
I could sit down with a couple of pads of paper and work out the equations for each of the unknowns, but is there a simple programmatic solution (I'm thinking in MATLAB) that will spit out the six equations I'm looking for?
EDIT:
Shame this has been closed, but I guess I can see why. In case anyone is still interested, the equations are (I believe) non-linear:
r11^2 = (l_x1*s_x + m_x)^2 + (l_y1*s_y + m_y)^2
r12^2 = (l_x2*s_x + m_x)^2 + (l_y2*s_y + m_y)^2
r13^2 = (l_x3*s_x + m_x)^2 + (l_y3*s_y + m_y)^2
r21^2 = (l_x1*s_x + m_x - t_x)^2 + (l_y1*s_y + m_y - t_y)^2
r22^2 = (l_x2*s_x + m_x - t_x)^2 + (l_y2*s_y + m_y - t_y)^2
r23^2 = (l_x3*s_x + m_x - t_x)^2 + (l_y3*s_y + m_y - t_y)^2
(Squared the r's, good spot @gnovice!)
Where I need to find t_x t_y m_x m_y s_x and s_y
Why am I calculating these? There are two points, p1 (at 0,0) and p2 (at t_x,t_y). For each of three coordinates (l_x,l_y){1,2,3} I know the distances (r1 & r2) to that point from p1 and p2, but in a different coordinate system. The variables s_x and s_y define how much I'd need to scale one set of coordinates to get to the other, and m_x, m_y how much I'd need to translate it (with t_x and t_y being a way to account for rotation differences between the two systems).
Oh! And I forgot to mention: I also know that the point (l_x,l_y) is below the higher of p1 and p2, i.e. l_y < max(0,t_y), as well as l_y > 0 and l_y < t_y.
It does seem specific enough that I might have to just get my pad out and work it through mathematically!
If you have the Symbolic Toolbox, you can use the SOLVE function. For example:
>> solve('x^2 + y^2 = z^2','z') %# Solve for the symbolic variable z
ans =
(x^2 + y^2)^(1/2)
-(x^2 + y^2)^(1/2)
You can also solve a system of N equations for N variables. Here's an example with 2 equations, 2 unknowns to solve for (x and y), and 6 parameters (a through f):
>> S = solve('a*x + b*y = c','d*x - e*y = f','x','y')
>> S.x
ans =
(b*f + c*e)/(a*e + b*d)
>> S.y
ans =
-(a*f - c*d)/(a*e + b*d)
Are they linear? If so, then you can use principles of linear algebra to set up a 6x6 matrix that represents the system of equations, and solve for it using any standard matrix inversion routine...
If they are not linear, then you need to use numerical analysis methods.
As I recall from many years ago, you create a system of linear approximations to the non-linear equations and solve that linear system over and over again iteratively, feeding the answers back into the inputs each time, until some error metric gets sufficiently small to indicate you have reached the solution. It's obviously done with a computer, and I'm sure there are numerical analysis software packages that will do this for you. However, since an arbitrary system of non-linear equations can involve almost unlimited types and levels of complexity, these packages probably can't create the linear approximations for you (except maybe in the most straightforward standard cases), and you will have to do that part manually.
Yes, there is (assuming these are linear equations): you do this by creating a matrix equation that is equivalent to your six linear equations. For example, if you had the two equations:
6x + 12y = 9
7x - 8y = 14
This could be equivalently represented as:
|6 12| |x|   |9 |
|7 -8| |y| = |14|
(where the matrix and the vector on the left are multiplied together). MATLAB can then solve this for the solution vector (x, y).
I don't have MATLAB installed, so I'm afraid I'm going to have to leave the details up to you :-)
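For completeness, a minimal sketch of those details: in MATLAB the idiomatic way to solve this is the backslash operator, which solves the system without forming an explicit inverse.
A = [6 12; 7 -8];   % coefficient matrix
b = [9; 14];        % right-hand side
xy = A \ b;         % solves A*xy = b (LU decomposition under the hood)
% xy(1) is x, xy(2) is y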
As mentioned above, the answer will depend on whether your equations are linear or nonlinear. For linear systems, you can set up a simple matrix system (but don't use matrix inversion; use LU decomposition, provided your system is well-conditioned).
For non-linear systems, you'll need to use a more advanced solver, most likely some variation on Newton's method. Essentially you'll give Matlab your six equations, and ask it to simultaneously solve for the root (zero) of all of the equations. There are several caveats and complications that come into play when dealing with non-linear systems, one of which is the need for an initial guess that assigns each of your six unknown variables a value close to the true solution. Without a good initial guess, the solver may take a long time finding a solution, or may not converge to a solution at all, even if one exists.
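As an illustration only (not the poster's actual six equations), a hedged sketch of this approach with fsolve from the Optimization Toolbox; the system and the initial guess here are made up:
% Residual form: fsolve drives F(v) toward zero
F = @(v) [v(1)^2 + v(2)^2 - 4;    % x^2 + y^2 = 4
          v(1) - v(2) - 1];       % x - y = 1
x0 = [1; 0];                      % initial guess near the expected root
sol = fsolve(F, x0);              % returns approximately [1.8229; 0.8229]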
Decades ago, MIT developed MACSYMA, a symbolic algebra system for just this kind of thing. MIT sold MACSYMA to Symbolics, which has pretty well folded, dried up, and blown away. However, because of the miracle of military funding, an early version of MACSYMA was required to be released to the government. THAT version was subsequently released under the GPL, and is continuing to be maintained, under the name MAXIMA.
See http://maxima.sourceforge.net/ for more information.