MATLAB lsqcurvefit with a gaussian convolution model

I am trying to fit data that is described by a Gaussian-weighted sum (a convolution) of exponential decays. An example of some perfect simulated data to illustrate my point:
clc; clear all
modeValue = 40; % mode value of our simulated gaussian
sig = 4; % standard deviation
values = [1:0.5:100]; % vector for gaussian to be evaluated at
gauss=(1/(sqrt(2*pi)*sig)).*exp(-(values-modeValue).^2/(2*sig^2))'; % create our gaussian
gauss=gauss./max(gauss); % normalize
tau = logspace(-6,2,256); % generate our simulated x-data
data = zeros(length(tau), length(values)); % preallocate
for ii = 1:length(values)
data(:,ii) = exp(-tau.*values(ii)); % generate one decay curve per value
end
dataConv = data*gauss; % weighted sum of the decay curves, using the gaussian weights
dataConv = dataConv./max(dataConv); % This is our final simulated data
semilogx(tau,dataConv,'ro');
If this code is run in MATLAB, a perfect (noise-free) simulated data set dataConv is created.
My goal now is to recover the gauss curve by running lsqcurvefit on the generated data. The idea is to give starting values for modeValue and sig, and have MATLAB automatically adjust them: generate all of the decay curves, combine the decays into one data set using the Gaussian weights, check the fit, then go back and tweak the parameter values.
I can fit a single exponential using lsqcurvefit, but I don't see how to do what I am after. Any help is greatly appreciated. Thanks.
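For what it's worth, here is a minimal sketch of how such a fit could be set up (untested; the helper name forwardModel and the starting guesses p0 are my own, and values is kept as a fixed evaluation grid rather than a fitted quantity):
% Sketch: let lsqcurvefit vary p = [modeValue, sig]; the model rebuilds
% the weighted sum of decays on every iteration (local functions in
% scripts require R2016b+, otherwise put forwardModel in its own file)
p0 = [30, 2]; % rough starting guesses for modeValue and sig
pFit = lsqcurvefit(@(p,x) forwardModel(p,x,values), p0, tau(:), dataConv)
function yhat = forwardModel(p, tau, values)
g = exp(-(values - p(1)).^2/(2*p(2)^2)); % gaussian weights
g = g(:)/max(g); % normalize as in the simulation
D = exp(-tau(:)*values(:)'); % one decay curve per column
yhat = D*g; % weighted sum of the decays
yhat = yhat/max(yhat); % same normalization as the data
end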

Related

Integration/differentiation through the FFT

I'm trying to understand how to perform an integration or differentiation of an FFT using MATLAB. However, I think I'm doing something wrong somewhere and would like to know what I'm missing...
Here's an example of an FFT integration that, to the best of my knowledge, should work but doesn't.
clc; clear all; close all;
Fs = 1000; % Sampling frequency
T = 1/Fs; % Sampling period
L = 1500; % Length of signal
t = (0:L-1)*T; % Time vector
f = Fs*(0:(L/2))/L;
omega = 2*pi.*f;
S is the time signal we are going to operate the FFT on, and dS is its derivative. We're going to apply an FFT to dS, and try to integrate that transform to get the same result as S.
S = 0.7*sin(2*pi*50*t) + sin(2*pi*120*t);
dS = 70*pi*cos(2*pi*50*t) + 240*pi*cos(2*pi*120*t); % d/dt of S
P2 = fft(S);
Y = P2(1:L/2+1);
c = fft(dS);
dm = c(1:L/2+1);
From what I found online, to integrate in the frequency domain you need to divide each FFT value by the corresponding 1i*omega (multiplying by 1i*omega differentiates). I'm assuming each point of the FFT result corresponds to a value of my frequency vector f.
dm = dm./(1i*omega); % divide by i*omega to integrate (note omega(1) = 0, so the DC term becomes Inf)
figure
semilogy(f,abs(Y),'b'); hold on
semilogy(f,abs(dm),'r');
We can see on the plot that both curves don't match: the FFT of the initial time signal S is different from the integral of the FFT of the differentiated time signal dS.
The main difference between your two plots is in the noise. Because you use a logarithmic y-axis, the noise gets blown up and looks important. Pay attention to the magnitudes when comparing. Anything about 10^15 times smaller than the peak value should be ignored; that is the precision of the floating-point numbers used.
The relevant part of these frequency spectra is the two peaks. And the difference there between the sine and cosine is the phase. But you are plotting the magnitude, so the function and its derivative will look the same. Plot the phase also! (but only where the magnitude is above the noise level).
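As a rough illustration of that last point, a sketch reusing the variables from the question (the threshold of 1e-6 of the peak is an arbitrary choice):
% Sketch: compare the phases, but only where both spectra are well
% above the floating-point noise floor
thr = 1e-6*max(abs(Y)); % arbitrary noise threshold
mask = abs(Y) > thr & abs(dm) > thr;
figure
plot(f(mask), angle(Y(mask)), 'bo'); hold on
plot(f(mask), angle(dm(mask)), 'r+')
xlabel('f (Hz)'); ylabel('phase (rad)')
legend('fft(S)', 'integrated fft(dS)')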

How do I fit distributions to data sets in matlab?

I'm trying to find a fit to my data using MATLAB, but I'm having a lot of trouble. Here's what I've done so far:
A = load('homicide_crime.txt'); % A is a two-column array: first column is the year, second is the crime count in that year
norm_crime = (A(:,2)-mean(A(:,2)))/std(A(:,2));
[f,x]=hist(norm_crime,20);
plot(x,f/trapz(x,f))
y=normpdf(x,0,1);
hold on
plot(x,y)
This is the resulting plot (image omitted).
Afterwards I tried the Distribution Fitter app, which gave me this (image omitted).
Neither of these looks right, since the peaks aren't aligned and the fitted curve is too small.
Here is the data set for anyone interested: https://pastebin.com/CyddrN1R
Any help is much appreciated.
Actually, I think you are confusing data transformation with distribution fitting.
DATA TRANSFORMATION
In this approach, the data are manipulated through a non-linear transformation in order to achieve a perfect fit. This means that it forces your data to follow the chosen distribution rule. To accomplish this with a normal distribution, all you have to do is apply the following code:
A = load('homicide_crime.txt');
years = A(:,1);
crimes = A(:,2);
figure(),histfit(crimes);
rank = tiedrank(crimes);
p = rank ./ (numel(rank) + 1);
crimes_normal = norminv(p,0,1);
figure(),histfit(crimes_normal);
Using the following manipulation:
crimes_normal = (crimes - mean(crimes)) ./ std(crimes);
that can also be written as:
crimes_normal = zscore(crimes);
you modify your observations so that they have mu=0 and sigma=1, but this is far from making them perfectly fit a normal distribution.
DISTRIBUTION FITTING
In this approach, the parameters of the chosen distribution are calculated over the given dataset, and then random observations are drawn. On one side you have your empirical observations, and on the other side you have your fitted data. A goodness-of-fit test can finally tell you how well empirical observations fit the given distribution comparing them to theoretical observations.
Since you are working with a normal distribution, you know that it is fully described by two parameters, mu and sigma. Hence:
A = load('homicide_crime.txt');
years = A(:,1);
crimes_emp = A(:,2);
[mu,sigma] = normfit(crimes_emp);
% you can also use
% mu = mean(crimes);
% sigma = std(crimes);
% to achieve the same result
[f,x] = hist(crimes_emp);
binWidth = x(2) - x(1);
crimes_the = normpdf(x,mu,sigma); % theoretical density
figure();
bar(x, f ./ (sum(f)*binWidth)); % normalize the histogram to a density so the two are comparable
hold on;
plot(x,crimes_the,'-r','LineWidth',2);
hold off;
And this returns something very close to the problem you originally noticed. As you can clearly see, without even running a Kolmogorov-Smirnov or an Anderson-Darling test... your data doesn't fit a normal distribution well.
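If you do want a formal check, a quick sketch (kstest compares against the standard normal, hence the z-scoring; fitting mu and sigma on the same data makes the test slightly optimistic):
% Sketch: one-sample Kolmogorov-Smirnov test against N(0,1);
% h = 1 means normality is rejected at the 5% significance level
[h,p] = kstest((crimes_emp - mu)/sigma)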
You can try a non-parametric density estimation method. I used kernel density estimation (KDE) with the default normal kernel to obtain the result shown below. The MATLAB command for this is ksdensity(), and its documentation can be found here.
A = load('homicide_crime.txt'); % load data
years = A(:,1);
values = A(:,2);
[f0,x0] = hist(values,100); % histogram counts and bin centers
[f1,x1,b1] = ksdensity(values); % KDE with automatically assigned bandwidth
[f2,x2,b2] = ksdensity(values,'Bandwidth',b1*0.6); % 60% of initial bandwidth (b1)
[f3,x3,b3] = ksdensity(values,'Bandwidth',b2*0.6); % 60% of previous bandwidth (b2) = 36% of initial bandwidth (b1)
[f4,x4,b4] = ksdensity(values,'Bandwidth',b3*0.6); % 60% of previous bandwidth (b3) = 21.6% of initial bandwidth (b1)
figure; hold on;
bar(x0, f0/(sum(f0)*(x0(2)-x0(1)))); % histogram scaled to a density for visualization
plot(x1, f1, 'y')
plot(x2, f2, 'c')
plot(x3, f3, 'g')
plot(x4, f4, 'r','linewidth',3) % final fit
In the code above, I first compute the histogram and then calculate the KDE without any user-specified bandwidth, which leads to an over-smooth fit (yellow curve). After a few trials, reducing the bandwidth to 60% of its previous value each time, I was finally able to get the closest fit (red curve). You can play around with the bandwidth to get an even better fit.
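The manual trials can also be written as a small loop (a sketch; the 0.6 shrink factor and the three passes simply mirror the steps above, and the 'Bandwidth' option requires a reasonably recent MATLAB):
% Sketch: start from the automatic bandwidth, then shrink it by 40% per pass
[f,x,b] = ksdensity(values); % automatic bandwidth first
figure; hold on
plot(x, f, 'y') % over-smooth initial fit
for k = 1:3
b = 0.6*b; % 60% of the previous bandwidth
[f,x] = ksdensity(values,'Bandwidth',b);
plot(x, f)
end
plot(x, f, 'r','linewidth',3) % highlight the last (narrowest) fit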

Non parametric estimate of cdf in Matlab

I have a vector A in Matlab of dimension Nx1. I want to get a non-parametric estimate of the cdf at each point in A and store all the values in a vector B of dimension Nx1. What options do I have?
I have read about ecdf and ksdensity, but it is not clear to me what the difference between them is, or their pros and cons. Any direction would be appreciated.
This doesn't exactly answer your question, but you can compute the empirical CDF very simply:
A = randn(1,1e3); % example Gaussian data
x_cdf = sort(A);
y_cdf = (1:numel(A))/numel(A);
plot(x_cdf, y_cdf) % plot CDF
This works because, by definition, each sample contributes to the (empirical) CDF with an increment of 1/N. That is, for values smaller than the minimum sample the CDF equals 0; for values between the minimum sample and the next highest sample it equals 1/N, etc.
The advantage of this approach is that you know exactly what is being done.
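Applied directly to your question (an Nx1 vector B holding the empirical CDF value at each sample of A), a sketch of the same idea:
% Sketch: empirical CDF of A evaluated at the samples of A themselves
A = randn(1,1e3); % example Gaussian data
N = numel(A);
[~, order] = sort(A);
B = zeros(N,1);
B(order) = (1:N)/N; % B(i) is the fraction of samples <= A(i)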
If you need to evaluate the empirical CDF at prescribed x-axis values:
A = randn(1,1e3); % example Gaussian data
x_cdf = -5:.1:5;
y_cdf = sum(bsxfun(@le, A(:), x_cdf), 1)/numel(A);
plot(x_cdf, y_cdf) % plot CDF
If you have prescribed y-axis values, the corresponding x-axis values are by definition the quantiles of the (empirical) distribution:
A = randn(1,1e3); % example Gaussian data
y_cdf = 0:.01:1;
x_cdf = quantile(A, y_cdf);
plot(x_cdf, y_cdf) % plot CDF
You want ecdf, not ksdensity.
ecdf computes the empirical distribution function of your data set. This converges to the cumulative distribution function of the underlying population as the sample size increases.
ksdensity computes a kernel density estimation from your data. This converges to the probability density function of the underlying population as the sample size increases.
The PDF tells you how likely you are to get values near a given value. It wiggles up and down over your domain, going up near more likely values and falling near less likely values. The CDF tells you how likely you are to get values below a given value. So it always starts at zero at the left end of your domain and increases monotonically to one at the right end of your domain.
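A quick sketch contrasting the two (ksdensity can also return a smoothed CDF estimate via its 'Function' option):
% Sketch: staircase empirical CDF vs. kernel-smoothed CDF estimate
A = randn(1,1e3); % example Gaussian data
[F,x] = ecdf(A); % empirical CDF
stairs(x, F, 'b'); hold on
[Fk,xk] = ksdensity(A,'Function','cdf'); % smoothed CDF estimate
plot(xk, Fk, 'r')
legend('ecdf','ksdensity (cdf)','Location','southeast')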

How do you get the "approximations" and "details" of a wavelet transform in LTFAT?

I'm trying to get started with the LTFAT toolbox for Matlab/Octave.
It is easy to get the wavelet and scaling coefficients of a signal
[c,info] = fwt(signal,'sym8',8);
but I don't know how to get the corresponding "approximations" and "details". I guess they can be obtained, since when you run
plotwavelets(c,info)
they are plotted (the "subbands" d1, d2, ..., a5).
Is anyone familiar with this toolbox?
UPDATE: I asked in the LTFAT mailing list and they helped me out here (credit goes to Nicki Holighaus). Just in case someone stumbles on this...
LTFAT doesn't have a specific function yielding the approximations and details from the corresponding coefficients, but they can easily be computed using the inverse DWT: you just reconstruct the signal with everything set to zero except the coefficients whose details/approximations you want to obtain. This code works for me:
% the DWT coefficients are split according to the different levels
cellCoeffs = wavpack2cell(c,info.Lc);
% number of "bands", including the approximations and all the details
nBands = length(info.Lc);
% a cell array containing the coefficients of the DWT (in the form required by "wavcell2pack") for the reconstruction
emptyCellCoeffs = cell(nBands,1);
% the cell corresponding to each "band" is set to a vector of zeros of the appropriate length
for i=1:nBands
emptyCellCoeffs{i} = zeros(info.Lc(i),1);
end
% res will contain the approximations and details, one per row
res = zeros(nBands,length(signal));
for i=1:nBands
% a copy of the coefficients for the reconstruction with everything set to zero...
aux = emptyCellCoeffs;
% ...except the coefficients for the corresponding level
aux{i} = cellCoeffs{i};
% inverse DWT after turning the cell representation back into a vector
res(i,:) = ifwt(wavcell2pack(aux),info);
end
reconstructionError = sum(sum(res,1) - signal') % should be ~0, up to floating-point error
You can probably do better in terms of efficiency...but I think this is easy to understand.
Cheers!

How to generate normally distributed random number

Can someone explain, or point me to a page that explains, how to create normally distributed random numbers in MATLAB using just the error function, the inverse of the error function, and rand() (a uniform random number generator between 0 and 1)? The random numbers don't have to be bounded to a certain interval. I'm having trouble understanding the error function and its inverse, and how they relate to creating normally distributed random numbers.
You need to apply the method called inverse transform sampling, which works as follows. Assume you want to generate a random variable with a given distribution function F. If you can compute the inverse function F^(-1), then you can obtain the desired random variable by applying F^(-1) to random samples with uniform distribution on the interval [0,1].
The error function (erf in Matlab) almost gives the distribution function of a normal random variable: the standard normal distribution function is Phi(x) = (1 + erf(x/sqrt(2)))/2, so its inverse is Phi^(-1)(u) = sqrt(2)*erfinv(2*u-1). The inverse error function is called erfinv in Matlab, and uniformly distributed random numbers are generated with rand.
With these ingredients you should be able to do the task. Please give it a try first; one possible solution follows:
N = 1e6; % number of samples
x = sqrt(2)*erfinv(2*rand(1,N)-1); % the factor 2 and the sqrt(2) come from the definition of erf
hist(x,31) % plot histogram to check it is (approximately) normal
This link from Mathworks seems to give the answer.
Here's the example from the link:
% First, initialize the random number generator to make the results in this
% example repeatable.
rng(0,'twister');
% Create a vector of 1000 random values drawn from a normal distribution
% with a mean of 500 and a standard deviation of 5.
a = 5;
b = 500;
y = a.*randn(1000,1) + b;
% Calculate the sample mean, standard deviation, and variance.
stats = [mean(y) std(y) var(y)]
% stats =
%
% 499.8368 4.9948 24.9483
%
% The mean and variance are not 500 and 25 exactly because they are
% calculated from a sampling of the distribution.
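For completeness, here is a sketch of the same scaled draw using only rand and erfinv, tying this back to the inverse-transform answer above:
% Sketch: N(500, 5^2) samples without randn, via inverse transform sampling
z = sqrt(2)*erfinv(2*rand(1000,1) - 1); % standard normal from uniform samples
y2 = a.*z + b; % rescale to mean 500, std 5
[mean(y2) std(y2)] % should be close to [500 5]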