I have a question about the MATLAB command gmdistribution, which I use to generate draws from mixtures of Gaussians.
Consider the following code, which draws from a mixture of two bivariate normals:
clear
rng default
P=10^4; %number draws
%First component (X1,X2)
v=1;
mu_a = [0,2];
sigma_a = [v,0;0,v];
%Second component (Y1,Y2)
mu_b = [0,4];
sigma_b = [v,0;0,v];
MU = [mu_a;mu_b];
SIGMA = cat(3,sigma_a,sigma_b);
w = ones(1,2)/2; %equal weight 0.5
obj = gmdistribution(MU,SIGMA,w);
%Draws of the mixture (R1,R2)
R = random(obj,P);%nx2
We know that (R1, R2) may be correlated. Indeed, we can show that
cov(R1, R2)=1/4*cov(X1,Y2)+1/4*cov(X2, Y1)
because
cov(W1,W2) = E(W1*W2) - E(W1)E(W2)
= 1/4 E(X1*X2) + 1/4 E(X1*Y2) + 1/4 E(Y1*X2) + 1/4 E(Y1*Y2)
- [1/2 E(X1) + 1/2 E(Y1)][1/2 E(X2) + 1/2 E(Y2)]
= 1/4 cov(X1,Y2) + 1/4 cov(Y1,X2)
However, if I check their correlation
corr(R(:,1), R(:,2))
I get almost zero (0.0024)
I checked many other values of MU and SIGMA, but I couldn't find any case with a correlation noticeably far from 0. Is this just a coincidence, or does the command gmdistribution impose that (X1,X2) is independent of (Y1,Y2)?
We can best illustrate the problem with a figure. To make the effect more visible, I decreased the variance of both components from 1 to 0.2 (v = 0.2). If we then draw some realisations from the mixture model, we get the following scatterplot:
Each "blob" corresponds to one component, one with its centre at 0,2 the other at 0,4.
The linear correlation coefficient measures how strongly W2 tends to increase (or decrease) linearly as W1 increases. As we can see, there is no such trend in the realisations: as W1 increases, W2 neither increases nor decreases on average.
This is due to both components having the same mean (0) in W1. If that is not the case, e.g. mu_a = [0,2]; and mu_b = [2,5];, we get the following plot:
Here it is clearly visible that if W1 is high, chances are that W2 is also high. This leads to a high positive correlation of about 0.87. To sum up: with diagonal component covariances as in your code, if either mu_a(1) == mu_b(1) or mu_a(2) == mu_b(2), the correlation will be near zero.
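The value of about 0.87 can also be checked without sampling, directly from the mixture parameters. The sketch below assumes the standard formula for the covariance of a mixture, sum_k w_k (Sigma_k + mu_k' * mu_k) - m' * m, where m is the mixture mean; the variable names m, C and rho are mine and not part of gmdistribution.

```matlab
% Theoretical correlation of a two-component Gaussian mixture
v = 0.2;
mu_a = [0,2];
mu_b = [2,5];
MU = [mu_a; mu_b];
SIGMA = cat(3, v*eye(2), v*eye(2));
w = [0.5, 0.5];

m = w * MU;                  % 1x2 mixture mean
C = -(m' * m);               % start from -m'm, then add the weighted second moments
for k = 1:numel(w)
    C = C + w(k) * (SIGMA(:,:,k) + MU(k,:)' * MU(k,:));
end
rho = C(1,2) / sqrt(C(1,1) * C(2,2));   % roughly 0.87 for these parameters
disp(rho)
```

Setting mu_b = [0,4] instead makes C(1,2) exactly zero, which matches the near-zero sample correlation seen with the original parameters.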
I can't get my mind around the concept of how to calculate bias and variance from a random set.
I have created the code to generate a random normal set of numbers.
% Generate random w, x, and noise from standard Gaussian
w = randn(10,1);
x = randn(600,10);
noise = randn(600,1);
and then extract the y values
y = x*w + noise;
After that I split my data into a training (100) and test (500) set
% Split data set into a training (100) and a test set (500)
x_train = x(1:100,:);
x_test = x(101:600,:);
y_train = y(1:100,:);
y_test = y(101:600,:);
train_l = length(y_train);
test_l = length(y_test);
Then I calculated the w for a specific value of lambda (1.2)
lambda = 1.2;
% Calculate the optimal w
A = x_train'*x_train+lambda*train_l*eye(10,10);
B = x_train'*y_train;
w_train = A\B;
Finally, I am computing the square error:
% Compute the mean squared error on both the training and the
% test set
sum_train = sum((x_train*w_train - y_train).^2);
MSE_train = sum_train/train_l;
sum_test = sum((x_test*w_train - y_test).^2);
MSE_test = sum_test/test_l;
I know that if I create a vector of lambda values (I have already done that) and average over some iterations, I can plot the average MSE_train and MSE_test as a function of lambda. A large difference between MSE_test and MSE_train then indicates high variance, and thus overfitting.
In addition, I want to calculate the variance and the bias^2.
The Ridge Regression Notes (page 7) describe how to calculate the bias and the variance.
My question is: should I follow those steps on the whole random dataset (600) or on the training set? I think the bias^2 and the variance should be calculated on the training set. Also, in Theorem 2 (page 7 again) the bias is calculated as the negative product of lambda, W, and beta. The beta is my original w (w = randn(10,1)), am I right?
Sorry for the long post, but I really want to understand how the concept works in practice.
UPDATE 1:
Ok, so following the previous paper didn't generate any good results. So, I took the standard form of Ridge Regression Bias-Variance which is:
Based on that, I created (I used the test set):
% Bias and Variance
sum_bias=sum((y_test - mean(x_test*w_train)).^2);
Bias = sum_bias/test_l;
sum_var=sum((mean(x_test*w_train)- x_test*w_train).^2);
Variance = sum_var/test_l;
But, after 200 iterations and for 10 different lambdas this is what I get, which is not what I expected.
Where in fact, I was hoping for something like this:
sum_bias=sum((y_test - mean(x_test*w_train)).^2); Bias = sum_bias/test_l
Why have you squared the difference between y_test and y_predicted = x_test*w_train?
I don't believe your formula for bias is correct. In your question, the 'bias term' shown in blue is the bias^2. Your formula, however, is neither the bias nor the bias^2, since you have squared the individual residuals rather than the bias as a whole.
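One way to estimate both terms numerically is to refit the model on many independent training sets and look at how the predictions on a fixed test set behave. The following is only a sketch under assumptions that are not in the question: a fresh training set is drawn in every repetition from the same generating model, and the bias is measured against the noise-free targets x_test*w_true, which are only available in a simulation. The names w_true, f_test, preds and n_rep are mine.

```matlab
% Monte Carlo estimate of bias^2 and variance for ridge regression
rng(0);
d = 10; n_train = 100; n_test = 500;
lambda = 1.2; n_rep = 200;

w_true = randn(d,1);
x_test = randn(n_test,d);
f_test = x_test * w_true;          % noise-free targets on the test inputs

preds = zeros(n_test, n_rep);      % one column of predictions per training set
for r = 1:n_rep
    x_train = randn(n_train,d);
    y_train = x_train*w_true + randn(n_train,1);
    w_hat = (x_train'*x_train + lambda*n_train*eye(d)) \ (x_train'*y_train);
    preds(:,r) = x_test * w_hat;
end

mean_pred = mean(preds, 2);                    % average prediction per test point
bias2     = mean((mean_pred - f_test).^2);     % squared bias, averaged over test points
variance  = mean(var(preds, 1, 2));            % variance across training sets
mse       = mean(mean(bsxfun(@minus, preds, f_test).^2));
fprintf('bias^2 = %.4f, variance = %.4f, sum = %.4f, mse = %.4f\n', ...
        bias2, variance, bias2 + variance, mse);
```

With the population normalisation used in var(preds, 1, 2), bias2 + variance equals the mean squared error against the noise-free targets, which is a handy sanity check. Wrapping this in a loop over lambda gives the curves as a function of lambda.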
In the matlab function awgn() that is used to add noise to a signal, is there a way specify the variance?
In general, I would simply have used noisevec = sqrt(2)*randn(length(X),1);, which creates a noise vector of variance 2. Then the noisy observations are
Y = X+noisevec
But, I would like to apply awgn() and then check if the variance of noise is indeed as specified by the user. How to do that?
% add noise to produce
% an SNR of 10dB, use:
X = sin(0:pi/8:6*pi);
Y = awgn(X,10,'measured');
UPDATE: Based on the solution, I expected the output to be the same whether the noise with a specific variance is generated using awgn() as in the answer or without awgn(). Is something wrong in my understanding? Here is how I checked.
x = rand(1,10); % generating source input
snr =10;
variance = 0.1;
%This procedure is based on the answer
y1 = awgn(x, snr, 'measured');
y1 = x + (y1 - x) * sqrt(variance / var(y1 - x));
%This is the traditional way, without using awgn()
y2 = x+sqrt(variance)*randn(1,10);
y1 is not equal to y2. I wonder why?
awgn is parameterised by an SNR rather than by a noise variance. If you have to generate noise with a specific variance, you can define your own noise generator, which simply scales the noise up or down to the desired level:
function y = AddMyNoise(x, variance)
y = awgn(x, 10, 'measured');
y = x + (y - x) * sqrt(variance / var(y - x));
end
UPDATE: Note that this method of forcing the output to have a specific variance could be dangerous: It will give strange outputs if x has few elements. In the limit of x being a scalar, this approach will add a fixed value of +-sqrt(variance) to x. No white noise anymore. But if you have more than a few data points, you will get a reasonably white noise.
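Regarding the check in the question's update: y1 and y2 cannot be equal, because they are built from different random draws. What can be compared is the variance of the added noise. The sketch below uses randn in place of awgn (my substitution, so that it runs without the Communications Toolbox) and a longer vector than in the question; the names n1 and n2 are mine.

```matlab
% Compare the noise variance of the rescaling approach and the plain randn approach
rng(1);
x = rand(1,1000);                        % source input
variance = 0.1;

n1 = randn(size(x));                     % stand-in for the noise added by awgn
n1 = n1 * sqrt(variance / var(n1));      % rescaled: sample variance is forced to 0.1
y1 = x + n1;

n2 = sqrt(variance) * randn(size(x));    % sample variance is only approximately 0.1
y2 = x + n2;

fprintf('var(y1 - x) = %.6f\n', var(y1 - x));
fprintf('var(y2 - x) = %.6f\n', var(y2 - x));
```

The first number matches the requested variance up to rounding error, while the second fluctuates around it from run to run. The individual samples of y1 and y2 still differ.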
(Disclaimer: I thought about posting this on math.statsexchange, but found similar questions there that were moved to SO, so here I am)
The context:
I'm using fft/ifft to determine probability distributions for sums of random variables.
So, for example, I have two uniform probability distributions; in the simplest case, two uniform distributions on the interval [0,1].
To get the probability distribution of the sum of two random variables sampled from these two distributions, one can calculate the product of the Fourier transforms of the two probability densities.
Applying the inverse FFT to this product gives back the probability density of the sum.
An example:
function usumdist_example()
    x = linspace(-1, 2, 1e5);
    dx = diff(x(1:2));
    NFFT = 2^nextpow2(numel(x));
    % take two uniform distributions on [0,0.5]
    intervals = [0, 0.5;
                 0, 0.5];
    figure();
    hold all;
    for i=1:size(intervals,1)
        % construct the prob. dens. function
        P_x = x >= intervals(i,1) & x <= intervals(i,2);
        plot(x, P_x);
        % for each pdf, get the characteristic function fft(pdf,NFFT)
        % and form the product of all char. functions in Y
        if i==1
            Y = fft(P_x,NFFT) / NFFT;
        else
            Y = Y .* fft(P_x,NFFT) / NFFT;
        end
    end
    y = ifft(Y, NFFT);
    x_plot = x(1) + (0:dx:(NFFT-1)*dx);
    plot(x_plot, y / max(y), '.');
end
My issue is, the shape of the resulting prob. dens. function is perfect.
However, the x-axis does not fit to the x I create in the beginning, but is shifted.
In the example, the peak is at 1.5, while it should be 0.5.
The shift changes if I e.g. add a third random variable or if I modify the range of x.
But I can't figure out how.
I'm afraid it might have to do with the fact that I have negative x values, while Fourier transforms usually work in a time/frequency domain, where frequencies < 0 don't make sense.
I'm aware I could find e.g. the peak and shift it to its proper place, but that seems nasty and error prone...
Glad about any ideas!
The problem is that your x origin is -1, not 0. You expect the center of the triangular pdf to be at .5, because that's twice the value of the center of the uniform pdf. However, the correct reasoning is: the center of the uniform pdf is 1.25 above your minimum x, and you get the center of the triangle at 2*1.25 = 2.5 above the minimum x (that is, at 1.5).
In other words: although your original x axis is (-1, 2), the convolution (or the FFT) behaves as if it were (0, 3). In fact, the FFT knows nothing about your x axis; it only uses the y samples. Since your uniform pdf is zero for the first samples, that zero interval of width 1 is stretched to twice its width when you do the convolution (or the FFT). I suggest drawing the convolution on paper to see this (draw the original signal and the signal reflected about the y axis, displace the latter and see when both begin to overlap). So you need a correction in the x_plot line to compensate for this increased width of the zero interval: use
x_plot = 2*x(1) + (0:dx:(NFFT-1)*dx);
and then plot(x_plot, y / max(y), '.') will give the correct graph:
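The same reasoning extends to more than two variables: each pdf contributes one offset of x(1), so with N pdfs the axis starts at N*x(1). Below is a minimal sketch of that generalisation on a coarser grid. The names N, x_peak and i_peak are mine, and the longer NFFT is my own precaution against circular wrap-around, since the linear convolution of N sequences of length M has N*(M-1)+1 samples.

```matlab
% Sum of N uniform variables via FFT, with the x axis offset by N*x(1)
x  = linspace(-1, 2, 3001);              % grid spacing 1e-3
dx = x(2) - x(1);
intervals = [0, 0.5;
             0, 0.5];
N = size(intervals, 1);
NFFT = 2^nextpow2(N * numel(x));         % long enough for the full linear convolution

Y = ones(1, NFFT);
for i = 1:N
    P_x = double(x >= intervals(i,1) & x <= intervals(i,2));
    Y = Y .* fft(P_x, NFFT);             % product of the transforms
end
y = real(ifft(Y));                       % drop the tiny imaginary rounding residue

x_plot = N * x(1) + (0:NFFT-1) * dx;     % axis origin shifted once per pdf
[~, i_peak] = max(y);
x_peak = x_plot(i_peak);
fprintf('peak at x = %.3f\n', x_peak);
```

For the two uniforms on [0, 0.5] the peak lands at about 0.5. Adding a third row to intervals changes N, and the offset follows automatically.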
I am trying to write an algorithm that determines $\mu$, $\sigma$, $\pi$ for each class of a mixture of multivariate normal distributions.
The algorithm is partially finished: it works when I set the initial guesses ($\mu$, $\sigma$, $\pi$) close to the real values. But when I set them far from the real values, the algorithm does not converge. The sigma goes to 0 $(2.30760684053766e-24 2.30760684053766e-24)$.
I think the problem is in my covariance calculation; I am not sure that this is the right way to do it. I found the formula on Wikipedia.
I would be grateful if you could check my algorithm, especially the covariance part.
Have a nice day,
Thanks,
Mixture of 2 Gaussians
size of x = [400, 2] (400 points from a 2-dimensional Gaussian mixture)
size of mu = [2, 2] (row 1 = mean of the first Gaussian, row 2 = mean of the second Gaussian)
for i = 1 : k
gaussEvaluation(i,:) = pInit(i) * mvnpdf(x,muInit(i,:), sigmaInit(i, :) * eye(d));
gaussEvaluationSum = sum(gaussEvaluation(i, :));
%mu calculation
for j = 1 : d
mu(i, j) = sum(gaussEvaluation(i, :) * x(:, j)) / gaussEvaluationSum;
end
%sigma calculation methode 1
%for j = 1 : n
% v = (x(j, :) - muNew(i, :));
% sigmaNew(i) = sigmaNew(i) + gaussEvaluation(i,j) * (v * v');
%end
%sigmaNew(i) = sigmaNew(i) / gaussEvaluationSum;
%sigma calculation methode 2
sub = bsxfun(@minus, x, mu(i,:));
sigma(i,:) = sum(gaussEvaluation(i,:) * (sub .* sub)) / gaussEvaluationSum;
%p calculation
p(i) = gaussEvaluationSum / n;
end
Two points: you can observe this even when you implement Gaussian mixture EM correctly, but in your case the code also seems to be incorrect.
First, this is a problem that you have to deal with whenever you fit mixtures of Gaussians. Sometimes one component of the mixture collapses onto a single point, so that the mean of the component becomes that point and the variance goes to 0; this is known as a 'singularity'. As the variance goes to 0, the likelihood goes to infinity.
Check out slide 42 of this deck: http://www.cs.ubbcluj.ro/~csatol/gep_tan/Bishop-CUED-2006.pdf
The likelihood function that you are evaluating is not log-concave, so the EM algorithm is not guaranteed to converge to the same parameters from different initial values. The link I gave above also gives some solutions to avoid this over-fitting problem, such as putting a prior or a regularization term on your parameters. You can also consider running the algorithm multiple times with different starting parameters and discarding any results with zero-variance components as over-fitted, or just reduce the number of components you are using.
In your case, your equation is right; the covariance update calculation on Wikipedia is the same as the one on slide 45 of the above link. However, if you are in a 2d space, for each component the mean should be a length 2 vector and the covariance should be a 2x2 matrix. Hence your code (for two components) is wrong: a 2x2 matrix is fine for storing the two means, but the covariances need a 2x2x2 array (one 2x2 matrix per component), not a 2x2 matrix.
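To make the shapes concrete, here is a minimal sketch of EM for a two-component mixture in 2d, with the means in a k-by-d matrix and the covariances in a d-by-d-by-k array. It is an illustration and not a drop-in fix for your code. The synthetic data, the deterministic initialisation, the fixed iteration count and the small ridge term reg are all my assumptions, and the density is written out by hand so that the sketch does not depend on mvnpdf. The responsibilities are normalised across components for each point before the M-step.

```matlab
% Minimal EM sketch for a Gaussian mixture with full covariance matrices
rng(0);
x = [randn(200,2); bsxfun(@plus, randn(200,2), [4 4])];   % two synthetic clusters
[n, d] = size(x);
k = 2;

mu    = x([1 n], :);                 % k-by-d means, one starting point per cluster
Sigma = repmat(eye(d), [1 1 k]);     % d-by-d-by-k covariances
p     = ones(1, k) / k;              % mixing weights
reg   = 1e-6;                        % ridge on the covariances (guards against collapse)

for iter = 1:100
    % E-step: weighted densities, then normalise across components
    resp = zeros(n, k);
    for i = 1:k
        sub = bsxfun(@minus, x, mu(i,:));
        q = sum((sub / Sigma(:,:,i)) .* sub, 2);           % squared Mahalanobis distance
        resp(:,i) = p(i) * exp(-0.5*q) / sqrt((2*pi)^d * det(Sigma(:,:,i)));
    end
    resp = bsxfun(@rdivide, resp, sum(resp, 2));

    % M-step: weighted means, full covariances and mixing weights
    Nk = sum(resp, 1);
    for i = 1:k
        mu(i,:) = (resp(:,i)' * x) / Nk(i);
        sub = bsxfun(@minus, x, mu(i,:));
        Sigma(:,:,i) = (sub' * bsxfun(@times, sub, resp(:,i))) / Nk(i) + reg*eye(d);
    end
    p = Nk / n;
end
disp(mu)
```

The reg term is one simple form of the regularisation mentioned above. A convergence test on the log-likelihood would normally replace the fixed number of iterations.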
I have a vector of data, which contains integers in the range -20 to 20.
Below is a plot with the values:
This is a sample of 96 elements from the vector data. The majority of the elements are situated in the interval -2, 2, as can be seen from the above plot.
I want to eliminate the noise from the data. I want to eliminate the low amplitude peaks, and keep the high amplitude peak, namely, peaks like the one at index 74.
Basically, I just want to increase the contrast between the high amplitude peaks and low amplitude peaks, and if it would be possible to eliminate the low amplitude peaks.
Could you please suggest a way of doing this?
I have tried the mapstd function, but the problem is that it also normalizes that high amplitude peak.
I was thinking of using the wavelet transform toolbox, but I don't know exactly how to reconstruct the data from the wavelet decomposition coefficients.
Can you recommend a way of doing this?
One approach to detect outliers is to use the three standard deviation rule. An example:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14;
subplot(211), plot(x)
%# tone down the noisy points
mu = mean(x); sd = std(x); Z = 3;
idx = ( abs(x-mu) > Z*sd ); %# outliers
x(idx) = Z*sd .* sign(x(idx)); %# cap values at 3*STD(X)
subplot(212), plot(x)
EDIT:
It seems I misunderstood the goal here. If you want to do the opposite, maybe something like this instead:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14; x(25) = 20;
subplot(211), plot(x)
%# zero out everything but the high peaks
mu = mean(x); sd = std(x); Z = 3;
x( abs(x-mu) < Z*sd ) = 0;
subplot(212), plot(x)
If it's for demonstrative purposes only, and you're not actually going to be using these scaled values for anything, I sometimes like to increase contrast in the following way:
% your data is in variable 'a'
plot(a.*abs(a)/max(abs(a)))
edit: since we're posting images, here's mine (before/after):
You might try a split window filter. If x is your current sample, the filter would look something like:
k = [L L L L L L 0 0 0 x 0 0 0 R R R R R R]
For each sample x, you average a band of surrounding samples on the left (L) and a band of surrounding samples on the right (R), skipping the samples marked 0 right next to x. If your samples are positive and negative (as yours are), you should take the absolute value first. You then divide the sample x by the average value of these surrounding samples.
y[n] = x[n] / mean(abs(x([L R])))
Each time you do this the peaks are accentuated and the noise is flattened. You can do more than one pass to increase the effect. It is somewhat sensitive to the selection of the widths of these bands, but can work. For example:
Two passes:
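A minimal sketch of one pass of such a split window filter is given below. The widths (3 skipped samples and 6 averaged samples on each side, matching the layout of k above) and the test data are assumptions, as are the names guard, band and nb. Near the edges the window is simply truncated.

```matlab
% One pass of a split window filter
rng(0);
x = randn(100,1);
x(75) = -14;                         % one large peak in otherwise small noise

guard = 3;                           % samples skipped next to x(i)
band  = 6;                           % samples averaged on each side
n = numel(x);
y = zeros(size(x));
for i = 1:n
    left  = max(1, i-guard-band) : (i-guard-1);
    right = (i+guard+1) : min(n, i+guard+band);
    nb = [left, right];              % indices of the surrounding bands
    y(i) = x(i) / mean(abs(x(nb)));  % normalise by the local average magnitude
end

subplot(211), plot(x)
subplot(212), plot(y)
```

A second pass is obtained by feeding y back in as x.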
What you actually need is some kind of compression to scale your data, that is: values between -2 and 2 are scaled by a certain factor and everything else is scaled by another factor. A crude way to accomplish this is to set all small values to zero, i.e.
x = randn(1,100)/2; x(50) = 20; x(25) = -15; % just generating some data
threshold = 2;
smallValues = (abs(x) <= threshold);
y = x;
y(smallValues) = 0;
figure;
plot(x,'DisplayName','x'); hold on;
plot(y,'r','DisplayName','y');
legend show;
Please note that this is a very nonlinear operation (e.g. when you have wanted peaks valued at 2.1 and 1.9, they will be treated very differently: one will be kept, the other will be removed). So for displaying, this might be all you need; for further processing, it depends on what you are trying to do.
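A slightly softer variant of the same idea scales the small values down instead of zeroing them, which is the two-factor compression described above. The gain values below are arbitrary choices for illustration, and the names smallGain and largeGain are mine.

```matlab
% Piecewise scaling: attenuate small values, keep large values
rng(0);
x = randn(1,100)/2; x(50) = 20; x(25) = -15;   % just generating some data
threshold = 2;
smallGain = 0.1;                               % factor for |x| <= threshold
largeGain = 1;                                 % factor for everything else

smallValues = (abs(x) <= threshold);
y = largeGain * x;
y(smallValues) = smallGain * x(smallValues);

figure;
plot(x,'DisplayName','x'); hold on;
plot(y,'r','DisplayName','y');
legend show;
```

The operation still has a jump at the threshold, so the caveat about values just above and just below 2 applies here as well.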
To eliminate the low-amplitude peaks, you have to treat all of the low-amplitude signal as noise and discard it.
If you have any a priori knowledge, just use it.
if your signal is a, then
a(abs(a)<X) = 0
where X is the max expected size of your noise.
If you want to get fancy and find this threshold "on the fly", use kmeans with 3 clusters. It's in the Statistics Toolbox, here:
http://www.mathworks.com/help/toolbox/stats/kmeans.html
Alternatively, you can use Otsu's method on the absolute values of the data, and use the sign back.
Note that these and every other technique I've seen in this thread assume you are doing post-processing. If you are doing this processing in real time, things will have to change.