Finding the intersection of Gaussian distributions - matlab

How can I find all the points where Gaussian mixture distributions intersect in MATLAB?

The generic category of your question is finding the intersection of two curves, which is a manageable but non-trivial task (the hardest part is making sure you catch all the intersections).
But your problem is very specific: you're looking for the intersection of two Gaussians. This is very good: we both have an analytic formula for your function, and it's guaranteed that there are exactly two intersections unless some of your parameters are the same.*
Let's assume that your distributions are characterized by means mu1, mu2 and scale sigma1, sigma2. Then your Gaussians at position x are defined by the function
1/sqrt(2*pi*sigma^2) * exp(-(x-mu)^2/2/sigma^2)
Turns out that we can fully solve this equation for x on paper:
1/sqrt(2*pi*sigma1^2)*exp(-(x-mu1)^2/2/sigma1^2) == 1/sqrt(2*pi*sigma2^2)*exp(-(x-mu2)^2/2/sigma2^2)
sigma2/sigma1 == exp((x-mu1)^2/2/sigma1^2) * exp(-(x-mu2)^2/2/sigma2^2)
log(sigma2/sigma1) == (x-mu1)^2/2/sigma1^2) - (x-mu2)^2/2/sigma2^2
which results in the parabolic equation ax^2 + bx + c == 0 where
a = 1/(2*sigma1^2) - 1/(2*sigma2^2);
b = mu2/(sigma2^2) - mu1/(sigma1^2);
c = mu1^2/(2*sigma1^2) - mu2^2/(2*sigma2^2) - log(sigma2/sigma1);
It can be easily proved that the D = b^2 - 4 a c discriminant is non-negative, so indeed the equation has two real roots when the parameters aren't degenerate. So the two intersection points are, with the above definitions,
D = b^2 - 4 * a * c;
x1 = (-b + sqrt(D))/(2*a);
x2 = (-b - sqrt(D))/(2*a);
Using two gaussians with the pseudo-random parameters:
% define parameters and Gaussian
mu1=1; sigma1=3; mu2=2; sigma2=4;
% intersections
a = 1/(2*sigma1^2) - 1/(2*sigma2^2);
b = mu2/(sigma2^2) - mu1/(sigma1^2);
c = mu1^2/(2*sigma1^2) - mu2^2/(2*sigma2^2) - log(sigma2/sigma1)
D = b^2 - 4 * a * c;
x1 = (-b + sqrt(D))/(2*a);
x2 = (-b - sqrt(D))/(2*a);
Just to prove that the above intersection points are correct:
>> f = #(x,mu,sigma) 1/sqrt(2*pi*sigma^2) * exp(-(x-mu).^2/2/sigma^2);
>> f(x1,mu1,sigma1) - f(x1,mu2,sigma2)
ans =
2.7756e-17
>> f(x2,mu1,sigma1) - f(x2,mu2,sigma2)
ans =
1.0408e-17
The above means that the values of the two Gaussians at points x1 and x2 are equal to one another within machine precision, which is as good as any numerical answer will get.
*I originally claimed that we always have two intersections as long as the Gaussians aren't exactly the same. Clearly if both the means and the variances are the same we have two degenerate curves and intersections become meaningless. But as Cris Luengo pointed out in a comment, it might happen that there's only one intersection: when the variances are the same and the means are different (i.e. we have two curves of the exact same shape shifted along x). In this case a=0, consequently the corresponding equation is b*x + c == 0, giving us x0 = -c/b for the intersection. So a more exact (but a bit pseudo-codey) answer (given a, b and c) is
if a == 0 % or allow some tolerance... <=> sigma1 == sigma2
if b == 0 % or allow some tolerance... <=> mu1 == mu2
% degenerate curves: a == b == c == 0, f1(x)==f2(x) for all x
disp('curves are degenerate...')
else
% single intersection: mu1 ~= mu2
x1 = -c/b;
end
else
% two intersections; both parameters are different
D = b^2 - 4 * a * c;
x1 = (-b + sqrt(D))/(2*a);
x2 = (-b - sqrt(D))/(2*a);

Related

Laguerre's method to obtain poly roots (Matlab)

I must write using Laguerre's method a piece of code to find the real and complex roots of poly:
P=X^5-5*X^4-6*X^3+6*X^2-3*X+1
I have little doubt. I did the algorithm in the matlab, but 3 out of 5 roots are the same and I don't think that is correct.
syms X %Declearing x as a variabl
P=X^5-5*X^4-6*X^3+6*X^2-3*X+1; %Equation we interest to solve
n=5; % The eq. order
Pd1 = diff(P,X,1); % first differitial of f
Pd2 = diff(P,X,2); %second differitial of f
err=0.00001; %Answear tollerance
N=100; %Max. # of Iterations
x(1)=1e-3; % Initial Value
for k=1:N
G=double(vpa(subs(Pd1,X,x(k))/subs(P,X,x(k))));
H=G^2 - double(subs(Pd2,X,x(k))) /subs(P,X,x(k));
D1= (G+sqrt((n-1)*(n*H-G^2)));
D2= (G-sqrt((n-1)*(n*H-G^2)));
D = max(D1,D2);
a=n/D;
x(k+1)=x(k)-a
Err(k) = abs(x(k+1)-x(k));
if Err(k) <=err
break
end
end
output (roots of polynomial):
x =
0.0010 + 0.0000i 0.1434 + 0.4661i 0.1474 + 0.4345i 0.1474 + 0.4345i 0.1474 + 0.4345i
What you actually see are all the values x(k) which arose in the loop. The last one, 0.1474 + 0.4345i is the end result of this loop - the approximation of the root which is in your given tolerance threshold. The code
syms X %Declaring x as a variable
P = X^5 - 5 * X^4 - 6 * X^3 + 6 * X^2 - 3 * X + 1; %Polynomial
n=5; %Degree of the polynomial
Pd1 = diff(P,X,1); %First derivative of P
Pd2 = diff(P,X,2); %Second derivative of P
err = 0.00001; %Answer tolerance
N = 100; %Maximal number of iterations
x(1) = 0; %Initial value
for k = 1:N
G = double(vpa(subs(Pd1,X,x(k)) / subs(P,X,x(k))));
H = G^2 - double(subs(Pd2,X,x(k))) / subs(P,X,x(k));
D1 = (G + sqrt((n-1) * (n * H-G^2)));
D2 = (G - sqrt((n-1) * (n * H-G^2)));
D = max(D1,D2);
a = n/D;
x(k+1) = x(k) - a;
Err(k) = abs(x(k+1)-x(k));
if Err(k) <=err
fprintf('Initial value %f, result %f%+fi', x(1), real(x(k)), imag(x(k)))
break
end
end
results in
Initial value -2.000000, result -1.649100+0.000000i
If you want to get other roots, you have to use other initial values. For example one can obtain
Initial value 10.000000, result 5.862900+0.000000i
Initial value -2.000000, result -1.649100+0.000000i
Initial value 3.000000, result 0.491300+0.000000i
Initial value 0.000000, result 0.147400+0.434500i
Initial value 1.000000, result 0.147400-0.434500i
These are all zeros of the polynomial.
A method for calculating the next root when you have found another one would be that you divide through the corresponding linear factor and use your loop for the resulting new polynomial. Note that this is in general not very easy to handle since rounding errors can have a big influence on the result.
Problems with the existing code
You do not implement the Laguerre method properly as a method in complex numbers. The denominator candidates D1,D2 are in general complex numbers, it is inadvisable to use the simple max which only has sensible results for real inputs. The aim is to have a=n/D be the smaller of both variants, so that one has to look for the D in [D1,D2] with the larger absolute value. If there were a conditional assignment as in C, this would look like
D = (abs(D_1)>abs(D2)) ? D1 : D2;
As that does not exist, one has to use commands with a similar result
D = D1; if (abs(D_1)<abs(D2)) D=D2; end
The resulting sequence of approximation points is
x(0) = 0.0010000
x(1) = 0.143349512707684+0.466072958423667i
x(2) = 0.164462212064089+0.461399841949893i
x(3) = 0.164466373475316+0.461405404094130i
There is a point where one can not expect the (residual) polynomial value at the root approximation to substantially decrease. The value close to zero is obtained by adding and subtracting rather large terms in the sum expression of the polynomial. The accuracy lost in these catastrophic cancellation events can not be recovered.
The threshold for polynomial values that are effectively zero can be estimated as the machine constant of the double type times the polynomial value where all coefficients and the evaluation point are replaced by their absolute values. This test serves in the code primarily to avoid divisions by zero or near-zero.
Finding all roots
One approach is to apply the method to a sufficiently large number of initial points along some circle containing all the roots, with some strict rules for early termination at too slow convergence. One would have to make the list of the roots found unique, but keep the multiplicity,...
The other standard method is to apply deflation, that is, divide out the linear factor of the root found. This works well in low degrees.
There is no need for the slower symbolic operations as there are functions that work directly on the coefficient array, such as polyval and polyder. Deflation by division with remainder can be achieved using the deconv function.
For real polynomials, we know that the complex conjugate of a root is also a root. Thus initialize the next iteration with the deflated polynomial with it.
Other points:
There is no point in the double conversions as at no point there is a conversion into the single type.
If you don't do anything with it, it makes no sense to create an array, especially not for Err.
Roots of the example
Implementing all this I get a log of
x(0) = 0.001000000000000+0.000000000000000i, |Pn(x(0))| = 0.99701
x(1) = 0.143349512707684+0.466072958423667i, |dx|= 0.48733
x(2) = 0.164462212064089+0.461399841949893i, |dx|=0.021624
x(3) = 0.164466373475316+0.461405404094130i, |dx|=6.9466e-06
root found x=0.164466373475316+0.461405404094130i with value P0(x)=-2.22045e-16+9.4369e-16i
Deflation
x(0) = 0.164466373475316-0.461405404094130i, |Pn(x(0))| = 2.1211e-15
root found x=0.164466373475316-0.461405404094130i with value P0(x)=-2.22045e-16-9.4369e-16i
Deflation
x(0) = 0.164466373475316+0.461405404094130i, |Pn(x(0))| = 4.7452
x(1) = 0.586360702193454+0.016571894375927i, |dx|= 0.61308
x(2) = 0.562204173408499+0.000003168181059i, |dx|=0.029293
x(3) = 0.562204925474889+0.000000000000000i, |dx|=3.2562e-06
root found x=0.562204925474889+0.000000000000000i with value P0(x)=2.22045e-16-1.33554e-17i
Deflation
x(0) = 0.562204925474889-0.000000000000000i, |Pn(x(0))| = 7.7204
x(1) = 3.332994579372812-0.000000000000000i, |dx|= 2.7708
root found x=3.332994579372812-0.000000000000000i with value P0(x)=6.39488e-14-3.52284e-15i
Deflation
x(0) = 3.332994579372812+0.000000000000000i, |Pn(x(0))| = 5.5571
x(1) = -2.224132251798332+0.000000000000000i, |dx|= 5.5571
root found x=-2.224132251798332+0.000000000000000i with value P0(x)=-3.33067e-14+1.6178e-15i
for the modified code
P = [1, -2, -6, 6, -3, 1];
P0 = P;
deg=length(P)-1; % The eq. degree
err=1e-05; %Answer tolerance
N=10; %Max. # of Iterations
x=1e-3; % Initial Value
for n=deg:-1:1
dP = polyder(P); % first derivative of P
d2P = polyder(dP); %second derivative of P
fprintf("x(0) = %.15f%+.15fi, |Pn(x(0))| = %8.5g\n", real(x),imag(x), abs(polyval(P,x)));
for k=1:N
Px = polyval(P,x);
dPx = polyval(dP,x);
d2Px = polyval(d2P,x);
if abs(Px) < 1e-14*polyval(abs(P),abs(x))
break % if value is zero in relative accuracy
end
G = dPx/Px;
H=G^2 - d2Px / Px;
D1= (G+sqrt((n-1)*(n*H-G^2)));
D2= (G-sqrt((n-1)*(n*H-G^2)));
D = D1;
if abs(D2)>abs(D1) D=D2; end % select the larger denominator
a=n/D;
x=x-a;
fprintf("x(%d) = %.15f%+.15fi, |dx|=%8.5g\n",k,real(x),imag(x), abs(a));
if abs(a) < err*(err+abs(x))
break
end
end
y = polyval(P0,x); % check polynomial value of the original polynomial
fprintf("root found x=%.15f%+.15fi with value P0(x)=%.6g%+.6gi\n", real(x),imag(x),real(y),imag(y));
disp("Deflation");
[ P,R ] = deconv(P,[1,-x]); % division with remainder
x = conj(x); % shortcut for conjugate pairs and clustered roots
end

Left matrix divide with vectors

How explicitly does Matlab solve the rightmost equation in c1, specifically ((x-1)\y)?
I am well aware what happens when you use a matrix, e.g. A\b (where A is a matrix and b is a column vector), but what happens when you use backslash on two vectors with equal rows?
The problem:
x = (1:3)';
y = ones(3,1);
c1 = ((x-1)\y) % why does this =0.6?
You're doing [0;1;2]\[1;1;1]. Essentially x=0.6 is the least-squares solution to
[0;1;2]*x=[1;1;1]
The case you have from the documentation is the following:
If A is a rectangular m-by-n matrix with m ~= n, and B is a matrix with m rows, then A\B returns a least-squares solution to the system of equations A*x= B.
(Specifically, you have m=3, n=1).
A = (1:3).' - 1; % = [0;1;2]
B = ones(3,1); % = [1;1;1]
x = A\B; % = 0.6
Algebraically, it's easy to see this is the solution to the least-squares minimisation
% Calculate least squares
leastSquares = sum( ((A*x) - B).^2 )
= sum( ([0;1;2]*x - [1;1;1]).^2 )
= sum( [-1; x-1; 2x-1].^2 )
= 1 + (x-1)^2 + (2x-1)^2
= 1 + x^2 - 2*x + 1 + 4*x^2 - 4*x + 1
= 5*x^2 - 6*x + 3
% Minimum least squares -> derivative = 0
d(leastSquares)/dx = 10*x - 6 = 0
10*x = 6
x = 0.6
I have no doubt that MATLAB uses a more sophisticated algorithm to come to the same conclusion, but this lays out the mathematics in a fairly plain way.
You can see experimentally that there is no better solution by testing the following for various values of x... 0.6 gives the smallest error:
sum( ([0;1;2]*x - [1;1;1]).^2 )

Matlab : Is the square operation in real domain the Conjugate operation in complex domain?

I am trying to evaluate the expression z = (x-y)^2 in real domain and its corresponding adaptation in complex domain. For real domain, this expression is implemented as
let
x = 5;
y = 2;
z = (x-y)^2
z =
9
In complex domain, the expression would become (please correct me if wrong )
z_c = (x_c - y_c)(x_c - y_c)* This is implemented in Matlab by
>> x_c = 5 + 0.9i;
y_c = 2 - 0.34i;
z_c = (x_c-y_c)*conj((x_c -y_c))
z_c =
10.5376
The * operator for conjugate in maths is implemented by conj()
The answers are different and am I using the correct operator?
You have many ways to deal with that in MATLAB:
x = 5 + 2i;
y = 2 - 4i;
% Method A
(x - y) * conj(x - y);
% Method B
(x - y)' * (x - y);
% Method C
norm(x - y, 2) ^ 2;
The first method is using the Conjugate Operator.
This method is written assuming both x and y are scalar.
Method B is using the definition of Inner Product (The ' is the Vector Adjoint Operator - Transpose and Conjugate).
It will work for vectors as well.
Method C is using the built in norm() function of MATLAB.
Enjoy.

MatLab: Matrix with one peak and rest decreasing

I'm trying to create a matrix such that if I define a random number between 0 and 1 and a random location in the matrix, I want all the values around that to "diffuse" out. Here's sort of an example:
0.214 0.432 0.531 0.631 0.593 0.642
0.389 0.467 0.587 0.723 0.654 0.689
0.421 0.523 0.743 0.812 0.765 0.754
0.543 0.612 0.732 0.843 0.889 0.743
0.322 0.543 0.661 0.732 0.643 0.694
0.221 0.321 0.492 0.643 0.521 0.598
if you notice, there's a peak at (4,5) = 0.889 and all the other numbers decrease as they move away from that peak.
I can't figure out a nice way to generate a code that does this. Any thoughts? I need to be able to generate this type of matrix with random peaks and a random rate of decrease...
Without knowing what other constraints you want to implement:
Come up with a function z = f(x,y) whose peak value is at (x0,y0) == (0,0) and whose values range between [0,1]. As an example, the PDF for the Normal distribution with mu = 0 and sigma = 1/sqrt(2*pi) has a peak at x == 0 of 1.0, and whose lower bound is zero. Similarly, a bivariate normal PDF with mu = {0,0} and determinate(sigma) == [1/(2*pi)]^2 will have similar characteristics.
Any mathematical function may have its domain shifted: f(x-x0, y-y0)
Your code will look something like this:
someFunction = #(x,y) theFunctionYouPicked(x,y);
[x0,y0,peak] = %{ you supply these values %};
myFunction = #(x,y) peak * someFunction(x - x0, y - y0);
[dimX,dimY] = %{ you supply these values %};
mymatrix = bsxfun( myFunction, 0:dimX, (0:dimY)' );
You can read more about bsxfun here; however, here's an example of how it works:
bsxfun( blah, [a b c], [d e f]' )
That should give the following matrix (or its transpose ... I don't have matlab in front of me):
[blah(a,d) blah(a,e) blah(a,f);
blah(b,d) blah(b,e) blah(b,f);
blah(c,d) blah(c,e) blah(c,f)]
Get a toy example working, then you can tinker with it to be more flexible. If the function dictating how it decreases is random (with the constraint that points closer to (x0,y0) are larger than more distant points), it won't be an issue to make a procedural function instead of using strictly mathematical ones.
In response to your answer:
Your equation could be thought of as a model for gravity where an object instantaneously induces a force on another mass, then stops exerting force. Following that logic, it could be modified to a naive vector formulation like this:
% v1 & v2 are vectors that point from the two peak points to the point [ii,jj]
theMatrix(ii,jj) = norm( (r1 / norm( v1 )) * v1 / norm( v1 ) ...
+ (r2 / norm( v2 )) * v2 / norm( v2 ) ...
);
The most extreme type of corner case you'll run into is one where v1 & v2 point in the same direction as in the following row:
[ . . A X1 X2 . . ]
... where you want a value for A w/respect to X1 & X2. Using the above expression it'll boil down to A = X1 / norm(v1) + X2 / norm(v2), which will definitely exceed the peak value at X1 because norm(v1) == 1. You could certainly do some dirty stuff to Band-Aid it, but personally I'd start looking for a different function.
Along those lines, if you used Newton's Law of Universal Gravitation with a few modifications:
You wouldn't need an analogue for G, so you could just assume G == 1
Treat each of the points in the matrix as having mass m2 == 1, so the equation reduces to: F_12 == -1 * (m1 / r^2) * RHAT_12
Sum the "force" vectors and calculate the norm to get each value
... you'll still run into the same problem. The corner case I laid out above would boil down to A = X1/norm(v1)^2 + X2/norm(v2)^2 == X1 + X2/4. Since it's inversely proportional to the square of the distances, it'd be easier to Band-Aid than the linear one, but I wouldn't recommend it.
Similarly, if you use polynomials it won't scale well; you can design one that won't ever exceed your chosen peaks, but there wouldn't be a lower bound.
You could use the logistic function to help with this:
1 / (1 + E^(-c*x))
Here's an example of using the logistic function on a degree 4 polynomial with peaks at points 2 & 4; you'll note I gave the polynomial a scaling factor to pull the polynomial down to relatively small values so calculated values aren't so close together.
I ended up creating a code that wraps the way I want based on a dimension, which I provide. Here's the code:
dims = 100;
A = zeros(dims);
b = floor(1+dims*rand(1));
c = floor(1+dims*rand(1));
d = rand(1);
x1 = c;
y1 = b;
A(x1,y1) = d;
for i = 1:dims
for j = i
k = 1-j;
while k <= j
if x1-j>0 && y1+k>0 && y1+k <= dims
if A(x1-j,y1+k) == 0
A(x1-j,y1+k) = eqn(d,x1-j,y1+k,x1,y1);
end
end
k = k+1;
end
end
for k = i
j = 1-k;
while j<=k
if x1+j>0 && y1+k>0 && y1+k <= dims && x1+j <= dims
if A(x1+j,y1+k)==0
A(x1+j, y1+k) = eqn(d,x1+j,y1+k,x1,y1);
end
end
j = j+1;
end
end
for j = i
k = 1-j;
while k<=j
if x1+j>0 && y1-k>0 && x1+j <= dims && y1-k<= dims
if A(x1+j,y1-k) == 0
A(x1+j,y1-k) = eqn(d,x1+j,y1-k,x1,y1);
end
end
k=k+1;
end
end
for k = i
j = 1-k;
while j<=k
if x1-j>0 && y1-k>0 && x1-j <= dims && y1-k<= dims
if A(x1-j,y1-k)==0
A(x1-j,y1-k) = eqn(d,x1-j,y1-k,x1,y1);
end
end
j = j+1;
end
end
end
colormap('hot');
imagesc(A);
colorbar;
If you notice, the code calls a function (I called it eqn), which provided the information for how to changes the values in each cell. The function that I settled on is d/distance (distance being computed using the standard distance formula).
It seems to work pretty well. I'm now just trying to develop a good way to have multiple peaks in the same square without one peak completely overwriting the other.

constant term in Matlab principal component regression (pcr) analysis

I am trying to learn principal component regression (pcr) with Matlab. I use this guide here: http://www.mathworks.fr/help/stats/examples/partial-least-squares-regression-and-principal-components-regression.html
it's really good, but I just cannot understand one step:
we do the PCA and the regression, nice and clear:
[PCALoadings,PCAScores,PCAVar] = princomp(X);
betaPCR = regress(y-mean(y), PCAScores(:,1:2));
And then we adjust the first coefficient:
betaPCR = PCALoadings(:,1:2)*betaPCR;
betaPCR = [mean(y) - mean(X)*betaPCR; betaPCR];
yfitPCR = [ones(n,1) X]*betaPCR;
How come that the coefficient needs to be 'mean(y) - mean(X)*betaPCR' for the constant one factor? Can you explain that to me?
Thanks in advance!
This is really a math question, not a coding question. Your PCA extracts a set of features and puts them in a matrix, which gives you PCALoadings and PCAScores. Pull out the first two principal components and their loadings, and put them in their own matrix:
W = PCALoadings(:, 1:2)
Z = PCAScores(:, 1:2)
The relationship between X and Z is that X can be approximated by:
Z = (X - mean(X)) * W <=> X ~ mean(X) + Z * W' (1)
The intuition is that Z captures most of the "important information" in X, and the matrix W tells you how to transform between the two representations.
Now you can do a regression of y on Z. First you have to subtract the mean from y, so that both the left and right hand sides have mean zero:
y - mean(y) = Z * beta + errors (2)
Now you want to use that regression to make predictions for y from X. Substituting from equation (1) into equation (2) gives you
y - mean(y) = (X - mean(X)) * W * beta
= (X - mean(X)) * beta1
where we have defined beta1 = W * beta (you do this in your third line of code). Rearranging:
y = mean(y) - mean(X) * beta1 + X * beta1
= [ones(n,1) X] * [mean(y) - mean(X) * beta1; beta1]
= [ones(n,1) X] * betaPCR
which works out if we define
betaPCR = [mean(y) - mean(X) * beta1; beta1]
as in your fourth line of code.