Explanation of two integral equations and implementation - matlab

I have a problem with these two equations showing in the pictures.
I have two vectors represented the C(m) and S(m) in the two equations. I am trying to implement these equations in Matlab. Instead of doing a continuous integral operation, I think I should do the summation. For example, the first equation
A1 = sqrt(sum(C.^2));
Am I right? Also, I am not sure how to implement equation two that contains a ||dM||. Please help.
What are the mathematical meaning of these two equations? I think the first one may related to the 'sum of squares', if C(m) is a vector then this equation will measure the total variance of the random variable in vector C or some kind of average of vector C? What about the second one?
Thanks very much for your help!
A.

In MATLAB there are typically two different ways to do an integration.
For people who have access to the symbolic toolbox, algebraic integration is an option. If this is the case for you, I would look into help int and see which inputs you need.
For the rest, numerical integration is available, this basically means that you just calculate a function on a lot of points and then take the mean of the function values in these points.
For the mathematical meaning some more context would be helpful, and you may want to ask this question at math.stackexchange.com or on the site of whatever field you are in. (stats, physics?)

Related

Checking if a Rational Function Simplifies to a Polynomial in Matlab

Is there a way to check if a rational function is a polynomial in Matlab?
I have a big rational function, call it R, that I am trying to show is a polynomial. I've tried the simplify and simplifyFraction functions and the following (not very effective) procedure:
Split it into denominator and numerator:
[num,den] = numden(R);
Calculate the roots of both polynomials:
r_num = roots(sym2poly(num));
r_den = roots(sym2poly(den));
Check if all the elements of r_den belong to r_num:
Because of numerical imprecision I haven't been able to come up with a reliable way of doing this.
This is a not-so-easy problem and finding greatest common divisor of polynomials is a very active area of research. There are tons of publications and you can find them online.
The main problem is that root finding is an ill-conditioned problem. And recently a few experts are trying to combine the numerical computations with symbolic representations. If you google for ERES method you will have an entry point together with thesis of Christou.
This problem is particularly important for signals and control people because of the transfer function representations and pole zero cancellations. Matlab goes out a long way to make sure that all is OK and a minimal neighborhood of each pole zero is accepted as a cancellation.
So as a quick remedy, convert your polynomial coefficients to 1D vectors, say a and b, and use minreal(tf(a,b)). Then you can extract num and den of that transfer representation.
Shameless plug: I am the author of a python3 library and I also implemented a system theoretical approach. Here and here is the full implementation details with citations about LCM and GCD operations.

Naive bayes classifier calculation

I'm trying to use naive Bayes classifier to classify my dataset.My questions are:
1- Usually when we try to calculate the likehood we use the formula:
P(c|x)= P(c|x1) * P(c|x2)*...P(c|xn)*P(c) . But in some examples it says in order to avoid getting very small results we use P(c|x)= exp(log(c|x1) + log(c|x2)+...log(c|xn) + logP(c)). can anyone explain more to me the difference between these two formula and are they both used to calculate the "likehood" or the sec one is used to calculate something called "information gain".
2- In some cases when we try to classify our datasets some joints are null. Some ppl use "LAPLACE smoothing" technique in order to avoid null joints. Doesnt this technique influence on the accurancy of our classification?.
Thanks in advance for all your time. I'm just new to this algorithm and trying to learn more about it. So is there any recommended papers i should read? Thanks alot.
I'll take a stab at your first question, assuming you lost most of the P's in your second equation. I think the equation you are ultimately driving towards is:
log P(c|x) = log P(c|x1) + log P(c|x2) + ... + log P(c)
If so, the examples are pointing out that in many statistical calculations, it's often easier to work with the logarithm of a distribution function, as opposed to the distribution function itself.
Practically speaking, it's related to the fact that many statistical distributions involve an exponential function. For example, you can find where the maximum of a Gaussian distribution K*exp^(-s_0*(x-x_0)^2) occurs by solving the mathematically less complex problem (if we're going through the whole formal process of taking derivatives and finding equation roots) of finding where the maximum of its logarithm K-s_0*(x-x_0)^2 occurs.
This leads to many places where "take the logarithm of both sides" is a standard step in an optimization calculation.
Also, computationally, when you are optimizing likelihood functions that may involve many multiplicative terms, adding logarithms of small floating-point numbers is less likely to cause numerical problems than multiplying small floating point numbers together is.

Multi-parametric regression in MATLAB?

I have a curve which looks roughly / qualitative like the curves displayed in those 3 images.
The only thing I know is that the first part of the curve is hardware-specific supposed to be a linear curve and the second part is some sort of logarithmic part (might be a combination of two logarithmic curves), i.e. linlog camera. But I couldn't tell the mathematic structure of the equation, e.g. wether it looks like a*log(b)+c or a*(log(c+b))^2 etc. Is there a way to best fit/find out a good regression for this type of curve and is there a certain way to do this specifically in MATLAB? :-) I've got the student version, i.e. all toolboxes etc.
fminsearch is a very general way to find best-fit parameters once you have decided on a parametric equation. And the optimization toolbox has a range of more-sophisticated ways.
Comparing the merits of one parametric equation against another, however, is a deep topic. The main thing to be aware of is that you can always tweak the equation, adding another term or parameter or whatever, and get a better fit in terms of lower sum-squared-error or whatever other goodness-of-fit metric you decide is appropriate. That doesn't mean it's a good thing to keep adding parameters: your solution might be becoming overly complex. In the end the most reliable way to compare how well two different parametric models are doing is to cross-validate: optimize the parameters on a subset of the data, and evaluate only on data that the optimization procedure has not yet seen.
You can try the "function finder" on my curve fitting web site zunzun.com and see what it comes up with - it is free. If you have any trouble please email me directly and I'll do my best to help.
James Phillips
zunzun#zunzun.com

calculate distance between curves

I need to calculate some kind of distance between to curves.
Those are general curves, and may not be functions - that is, some values of x may be mapped to more then one value.
EDIT
The curves are given as a list of X,Y pairs and the logical curve is the line passing through all the points in the order given. a typical data set will include about 1000 points
as noted, the curve may not be a function, but is usually similar to a function
This issue us what prevents using interp1 or the curve fitting toolbox (in Matlab)
The distance measure I was thinking of the the area of the region between the curves - but any reasonable alternative is ok.
EDIT
a sample illustration of to curves, and the area I want to compute
A Matlab solution is preferred, but other languages are also fine.
If you have functions that are of the type y = f(x) and they are defined over the same domain, then a common way to find the "distance" is to use the L2 norm as explained here http://en.wikipedia.org/wiki/L2_norm#p-norm. This is simply the integral of the absolute value of the difference between the functions squared. If you have parametric curves then you cannot directly employ this approach. If the L2 norm is not good enough for your requirements then you will need to provide a more concrete definition of what you mean by "distance". If you are unclear as to what you need try taking a look at different types of mathematical norm and see if any of the commonly used ones are what you need (ie L1 norm, uniform norm). The wikepedia link above is a good start point. If the L2 is good enough then you need a way to calculate the integral that you have - there are many many numerical integration techniques out there and I suggest google is your friend here (or a good numerical analysis text book).
If you do have parametric type curves then this is very nontrivial. Using the "area" between curves is not a good idea as there is no clear way to define this area and would become even more complicated in the general case where you could have self-intersecting curves. If your curves are parametrized in the same way you could try some very crude measurement where you evaluate points on each curve at equally spaced values over the parameter range, then calculate the length of the distance between each, sum and take the average as a notion of "closeness". i.e. partition your parameter range into a set {u_0, ... , u_n} and evaluate curve1(u_i) and curve2(u_i) for each i to generate a set of n paired points. Then sum the euclidean distances between each pair of points.
This is very very crude though and if the parametrizations are different then it wont be much use.
You need to define what you mean by distance between the curves. If it is the closest approach between two general curves, then it becomes quite difficult to solve the problem.
If the "curves" are not even representable as single valued functions of x, then it becomes more complex yet.
Merely telling us that you need to define "some kind of distance" is too broad of a statement to be on-topic here, and it says that you have not yet thought out the problem you wish to solve.
If all you are willing to tell us is that the curves are two totally general parametric curves, which may be closed or not, or they may not even lie over the same domain, then the question becomes so totally ill-posed as to be impossible to answer. What is the area between two curves in that case?
If the curves are defined over the SAME support, then subtracting them and integration of the absolute value or square of the difference will be adequate. But you have already told us that these "curves" may be multi-valued. In that case, it is essentially impossible to do what you are asking.

Integration with matlab

i want to solve this problem:
alt text http://img265.imageshack.us/img265/6598/greenshot20100727091025.png
i don't want to use "int", i want to use "quad" family (quad,dblquad,triplequad)
but i can't.
can you help me?
I assume that your real problem is more complex than this trivial one. The best solution is just to use a symbolic integral. Why is numerical integration difficult?
Numerical integration in ONE dimension typically requires on the order of say 100 function evaluations. (The exact number will be very dependent on the accuracy required, the limits, etc.) This makes a 2-d integral typically require on the order of 100^2 = 10000 function evals. So an adaptive, 5-d integral will require on the order of 100^5 = 1e10 function evaluations. (This is only a very rough order of magnitude estimate here.) My point is, you simply don't want to do that!
Better is to reduce the problem in complexity. If your integral is separable (as is this one) then do so! Reduce a 5-d problem into multiple 1-d problems.
Also, in many cases I see people wanting to do a numerical integration of a Gaussian PDF. See that this is easily solved using a call to erf or erfc, coupled with a transformation. The point is that in many cases special functions are defined to greatly reduce the complexity of a problem.
I should add that in many cases, the key to solving a difficult problem in mathematics is to use mathematics to reduce the problem to something simpler. If you can find a way to reduce the dimensionality of your problem just a bit, it will become much more tractable.
The integral you show is
Analytically solvable: always do analytically what you can
?equal to a number: constant expressions should be eliminated from numerical calculations
not easy to get calculated in MATLAB (or very correct).
You can use cumtrapz to integrate over each variable alone, and call trapz the final integration. Remember that this will blow up the error on any problem that is more complicated than the simple sum of linear functions.
Mathematica is more suited to nD integrations, if you have access to that.
matlab can do symbolic integration
>> x = sym('x'); y = sym('y'); z = sym('z'); u = sym('u'); v = sym('v');
>> int(int(int(int(int(x+y+z+u+v,1,5),-2,3),0,1),-1,1),0,1)
ans =
180
Just noticed you want to do numeric, not symbolic integration
If you look at the source of dblquad and triplequad
>> edit dblquad
you see that they just call the lower versions.
it should be possible for you to add a quadquad and a quintquad (or recursively an n-quad)