(Disclaimer: I thought about posting this on math.statsexchange, but found similar questions there that were moved to SO, so here I am)
The context:
I'm using fft/ifft to determine probability distributions for sums of random variables.
Say I have two uniform probability distributions, in the simplest case two uniform distributions on the interval [0,1].
To get the probability distribution for the sum of two random variables sampled from these two distributions, one can calculate the product of the Fourier transforms of the two probability densities.
Doing the inverse fft on this product, you get back the probability density for the sum.
An example:
function usumdist_example()
x = linspace(-1, 2, 1e5);
dx = diff(x(1:2));
NFFT = 2^nextpow2(numel(x));
% take two uniform distributions on [0,0.5]
intervals = [0, 0.5;
0, 0.5];
figure();
hold all;
for i=1:size(intervals,1)
% construct the prob. dens. function
P_x = x >= intervals(i,1) & x <= intervals(i,2);
plot(x, P_x);
% for each pdf, get the characteristic function fft(pdf,NFFT)
% and form the product of all char. functions in Y
if i==1
Y = fft(P_x,NFFT) / NFFT;
else
Y = Y .* fft(P_x,NFFT) / NFFT;
end
end
y = ifft(Y, NFFT);
x_plot = x(1) + (0:dx:(NFFT-1)*dx);
plot(x_plot, y / max(y), '.');
end
My issue is, the shape of the resulting prob. dens. function is perfect.
However, the x-axis does not fit to the x I create in the beginning, but is shifted.
In the example, the peak is at 1.5, while it should be 0.5.
The shift changes if I e.g. add a third random variable or if I modify the range of x.
But I can't figure out how.
I'm afraid it might have to do with the fact that I have negative x values, while Fourier transforms usually work in a time/frequency domain, where frequencies < 0 don't make sense.
I'm aware I could find e.g. the peak and shift it to its proper place, but that seems nasty and error prone...
Glad about any ideas!
The problem is that your x origin is -1, not 0. You expect the center of the triangular pdf to be at .5, because that's twice the value of the center of the uniform pdf. However, the correct reasoning is: the center of the uniform pdf is 1.25 above your minimum x, and you get the center of the triangle at 2*1.25 = 2.5 above the minimum x (that is, at 1.5).
In other words: although your original x axis is (-1, 2), the convolution (or the FFT) behaves as if it were (0, 3). In fact, the FFT knows nothing about your x axis; it only uses the y samples. Since your uniform pdf is zero for the first samples, that zero interval of width 1 is stretched to twice its width when you do the convolution (or the FFT). I suggest drawing the convolution on paper to see this (draw the original signal and the signal reflected about the y axis, displace the latter and see when the two begin to overlap). So you need a correction in the x_plot line to compensate for this increased width of the zero interval: use
x_plot = 2*x(1) + (0:dx:(NFFT-1)*dx);
and then plot(x_plot, y / max(y), '.') will give the correct graph:
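The same reasoning extends to more than two variables: each factor starts at x(1), so the sum of n variables starts at n*x(1). Below is a minimal sketch of that generalization, not the asker's original function. The names n, lo and hi are made up for illustration, the three variables are assumed to be identically distributed, and the FFT length is padded to n*numel(x) so the result cannot wrap around.

```matlab
% Sketch: pdf of the sum of n uniform variables via FFT, with the
% x axis offset by n*x(1). n, lo, hi are illustrative assumptions.
n  = 3;                           % number of random variables (assumed)
lo = 0;  hi = 0.5;                % each variable is uniform on [lo, hi]
x  = linspace(-1, 2, 1e4);        % common sampling grid
dx = x(2) - x(1);
NFFT = 2^nextpow2(n*numel(x));    % long enough to hold the full convolution

P_x = double(x >= lo & x <= hi);  % sampled (unnormalized) pdf
Y = fft(P_x, NFFT).^n;            % product of n identical transforms
y = real(ifft(Y));                % pdf of the sum, up to a scale factor

% Every factor starts at x(1), so the sum starts at n*x(1)
x_plot = n*x(1) + (0:NFFT-1)*dx;

[~, ipeak] = max(y);
fprintf('peak at x = %.3f, expected about %.3f\n', ...
        x_plot(ipeak), n*(lo + hi)/2);
```

With n = 2 this reduces to the 2*x(1) correction above.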
If I plot a sine like this
x=0:0.05:2*pi;
y=sin(x);
plot(x,y,'.-')
I obviously get a non-uniform density of points (see the attached plot).
What I want is for the points to be equidistant from each other. So I need to define the x array somehow... or is there another way?
The point density is uniform in x. If you want the points to be uniform in y, you could use:
y=-1:.05:1;
plot(asin(y),y,'o')
But then the points aren't uniform in x.
EDIT: Just for fun or for any future readers, to get points uniform in overall distance, the distance between points is d=sqrt(h^2+(f(x+h)-f(x))^2) which is approximately d=h*sqrt(1+f'(x)^2), i.e. h=d/sqrt(1+cos(x)^2) in this case. The curve length is the integral of sqrt(1+f'(x)^2) which in this case is 4*sqrt(2)*ellipticE(1/2) = 7.6404:
N = 100;
d = 7.6404/N;
x = zeros(1,N);
for n = 2:N
x(n) = x(n-1) + d/sqrt(1+cos(x(n-1))^2);
end
y = sin(x);
plot(x,y,'x')
You can check that the distance between points is approximately constant by looking at sqrt(diff(y).^2+diff(x).^2). It's only approximate because of the use of the derivative (at the left endpoint of the interval at that) for the distance, but this gets better as N increases. To get the distance exact, we'd need to numerically solve a trig equation for each point. The curve length is also affected by the approximation and tends to miss the last point.
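The check described above can be sketched as follows. It repeats the loop from this answer and then measures the actual distance between neighbouring points; the variable name seg is introduced here for illustration only.

```matlab
% Sketch: measure how uniform the spacing along the curve actually is.
N = 100;
d = 7.6404/N;                     % target spacing: curve length / N
x = zeros(1, N);
for n = 2:N
    x(n) = x(n-1) + d/sqrt(1 + cos(x(n-1))^2);
end
y = sin(x);

seg = sqrt(diff(x).^2 + diff(y).^2);   % distance between neighbours
fprintf('spacing: min %.4f, max %.4f, target %.4f\n', ...
        min(seg), max(seg), d);
fprintf('last point at x = %.4f (2*pi = %.4f)\n', x(end), 2*pi);
```

Since only N-1 steps are taken, the last point stops short of 2*pi, which is the "missed last point" mentioned above.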
I have a displacement and a time data of a movement of an object.
The object oscillates around zero. That is, first - it gets set into motion by a small amount of force, then it comes to rest. again, a little force is applied and object gets set into motion.
I have found out the velocity and acceleration using
V= [0 ; diff(disp) ./ diff(times)];
A= [0; diff(V) ./ diff(times)];
I was thinking of finding the points where the velocity is zero, but I guess there are more such instances than I need. See the graph below:
[Figure: velocity plot]
I am only interested in the time values at the circled points. Is there a way to get these?
I observe a pattern
velocity increases then decreases by almost same amount.
Then due to friction, it crosses zero by a smaller amount and again becomes negative
finally comes to rest, but a very little velocity is still present.
It is this touch point to zero that I want. Then again force is applied and the same cycle repeats.
Please note that I do not have the times at which the force is applied; otherwise there would be nothing to do.
Also, I did plot the acceleration, but it seems useless.
I am using matlab.
Here's one way to find approximate zeros in gridded data:
% some dummy synthetic data
x = linspace(0, 10, 1e3);
y = exp(-0.3*x) .* sin(x) .* cos(pi*x);
% its derivative (presumably your "acceleration")
yp = diff(y) ./ diff(x);
% Plot data to get an overview
plot(x,y), hold on
% Find zero crossings (product of two consecutive data points is negative)
zero_x = y(1:end-1) .* y(2:end) < 0;
% Use derivative for linear interpolation between those points:
% y0 + yp*(x - x0) = 0  =>  x = x0 - y0/yp
x_cross = x(zero_x) - y(zero_x)./yp(zero_x);
% Plot those zeros
plot(x_cross, zeros(size(x_cross)), 'ro')
Result:
It is then up to you to select which zeros you need, because I could not understand from the question what made those points in the circles so special...
The resting points you asked about have the following properties:
dx / dt = v = 0
d^2 x / dt^2 = a = 0 # at the instant the object reaches v = 0, there is no force on it.
So you may also want to check the second condition to filter the resting points.
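A minimal sketch of that filter is given below. The synthetic displacement (an object at rest until t = 1, then a smoothly starting damped oscillation) and the 1% tolerances tolV and tolA are assumptions made purely for illustration; real data will need thresholds tuned to its noise level. The displacement is called pos here rather than disp, to avoid shadowing MATLAB's built-in disp function.

```matlab
% Sketch: flag samples where velocity AND acceleration are both near zero.
% Synthetic data and tolerances are assumptions, not the asker's data.
times = linspace(0, 30, 3001).';
tau   = max(times - 1, 0);                   % object rests until t = 1
pos   = tau .* exp(-0.5*tau) .* sin(3*tau);  % stand-in for the displacement

V = [0; diff(pos) ./ diff(times)];           % finite differences as in the question
A = [0; diff(V)   ./ diff(times)];

tolV = 0.01 * max(abs(V));                   % "zero" velocity threshold (assumed)
tolA = 0.01 * max(abs(A));                   % "zero" acceleration threshold (assumed)
atRest = abs(V) < tolV & abs(A) < tolA;

% First sample of every stretch that is flagged as resting
restStart = find(diff([false; atRest]) == 1);
restTimes = times(restStart);
fprintf('%d resting stretches found\n', numel(restTimes));
```

Velocity zero crossings in the middle of an oscillation are rejected because the acceleration is large there. Close to the thresholds the flag can flicker, so several onsets near the end of a decay may need to be merged.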
I have a set of data with over 4000 points. I want to exclude grooves from them, ideally from the point from which they start. The data look for example like this:
The problem with this is the noise I get at the top of the plateaus. My idea is to take an average of the most common values within some boundaries (ideally something like the red line in the plot),
and then construct a temporary matrix that is filled one by one with the values Y(i) as long as they are below this average. Once Y(i) rises above the average, I would find the minimum of the temporary matrix and compare it with the global minimum. If the temporary matrix's minimum is not at least something like 80% of the global minimum, it would be discarded as noise.
I've tried using mean(Y), interpolating, and fitting a polynomial (the green line); none of those methods worked well enough for me to be satisfied.
I need this to be extremely robust and it doesn't need to be quick. The top and bottom values can vary a lot, as well as the shape of the plateaus. The groove width is more or less the same.
Do you have any ideas? Again, the point is to extract the values that would make the groove.
How about a median filter?
Let's define some noisy data similar to yours, and plot it in blue:
x = .2*sin((0:9999)/1000); %// signal
x(1000:1099) = x(1000:1099) + sin((0:99)/50*pi); %// noise: spike
x(5000:5199) = x(5000:5199) - sin((0:199)/100*pi); %// noise: wider spike
x = x + .05*sin((0:9999)/10); %// noise: high-freq ripple
plot(x)
Now apply the median filter (using medfilt2 from the Image Processing Toolbox) and plot in red. The parameter k controls the filter memory. It should be chosen to be large compared to noise variations, and small compared to signal variations:
k = 500; %// filter memory. Choose as needed
y = medfilt2(x,[1 k]);
hold on
plot(y, 'r', 'linewidth', 2)
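If no toolbox is available, the same idea can be sketched with a plain loop over a sliding window. This is not what medfilt2 does internally: here the window is clipped at the edges (medfilt2, as far as I know, pads with zeros by default), and the effective window length is 2*floor(k/2)+1 samples. Only the slow signal and the wide disturbance from the example above are reproduced, to keep the sketch short.

```matlab
% Sketch: sliding median with a plain loop (no toolbox needed).
x = .2*sin((0:9999)/1000);                          % signal, as above
x(5000:5199) = x(5000:5199) - sin((0:199)/100*pi);  % wide disturbance, as above

k    = 500;                   % filter memory, as above
half = floor(k/2);
n    = numel(x);
y    = zeros(size(x));
for i = 1:n
    lo = max(1, i - half);    % window is clipped at the array edges
    hi = min(n, i + half);
    y(i) = median(x(lo:hi));
end
```

The loop is slow compared with a built-in filter, but the question states that speed does not matter.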
In case you don't have the Image Processing Toolbox and can't use medfilt2, here is a more manual method: flatten out the extreme values, and do a curve fit with sin1 as the curve type (fit itself comes from the Curve Fitting Toolbox). Note that this will only work if the signal is in fact a sine wave!
x = linspace(0,3*pi,1000);
y1 = sin(x) + rand()*sin(100*x).*(mod(round(10*x),5)<3);
y2 = 20*(mod(round(5*x),5) == 0).*sin(20*x);
y = y1 + y2; %// A messy sine-wave
yy = y; %// Store the messy sine-wave
[~, idx] = sort(y);
y(idx(1:round(0.15*end))) = y(idx(round(0.15*end))); %// Flatten out the smallest values
y(idx(round(0.85*end):end)) = y(idx(round(0.85*end)));%// Flatten out the largest values
[foo goodness output] = fit(x.',y.', 'sin1'); %// Do a curve fit
plot(foo,x,y) %// Plot it
hold on
plot(x,yy,'black')
Might not be perfect, but it's a step in the right direction.
I am trying to use Matlab's nlinfit function to estimate the best fitting Gaussian for x,y paired data. In this case, x is a range of 2D orientations and y is the probability of a "yes" response.
I have copied @norm_func from relevant posts and I'd like to return a smoothed normal distribution that best approximates the observed data in y, along with the magnitude, mean and SD of the best-fitting pdf. At the moment, the fitted function appears to be incorrectly scaled and less than smooth. Any help much appreciated!
x = -30:5:30;
y = [0,0.20,0.05,0.15,0.65,0.85,0.88,0.80,0.55,0.20,0.05,0,0;];
% plot raw data
figure(1)
plot(x, y, ':rs');
axis([-35 35 0 1]);
% initial parameter guesses (based on plot)
initGuess(1) = max(y); % amplitude
initGuess(2) = 0; % mean centred on 0 degrees
initGuess(3) = 10; % SD in degrees
% equation for Gaussian distribution
norm_func = @(p,x) p(1) .* exp(-((x - p(2))/p(3)).^2);
% use nlinfit to fit Gaussian using Least Squares
[bestfit,resid]=nlinfit(y, x, norm_func, initGuess);
% plot function
xFine = linspace(-30,30,100);
figure(2)
plot(x, y, 'ro', x, norm_func(xFine, y), '-b');
Many thanks
If your data actually represent probability estimates which you expect come from normally distributed data, then fitting a curve is not the right way to estimate the parameters of that normal distribution. There are different methods of different sophistication; one of the simplest is the method of moments, which means you choose the parameters such that the moments of the theoretical distribution match those of your sample. In the case of the normal distribution, these moments are simply mean and variance (or standard deviation). Here's the code:
% normalize y to be a probability (sum = 1)
p = y / sum(y);
% compute weighted mean and standard deviation
m = sum(x .* p);
s = sqrt(sum((x - m) .^ 2 .* p));
% compute theoretical probabilities
xs = -30:0.5:30;
pth = normpdf(xs, m, s);
% plot data and theoretical distribution
plot(x, p, 'o', xs, pth * 5)
The result shows a decent fit:
You'll notice the factor 5 in the last line. This is due to the fact that you don't have probability (density) estimates for the full range of values, but from points at distances of 5. In my treatment I assumed that they correspond to something like an integral over the probability density, e.g. over an interval [x - 2.5, x + 2.5], which can be roughly approximated by multiplying the density in the middle by the width of the interval. I don't know if this interpretation is correct for your data.
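To make the role of that factor concrete, here is a small sketch that takes it from the grid spacing instead of hard-coding it, and checks that density times bin width sums to roughly one. The Gaussian density is written out by hand so that normpdf is not needed; the names dx, gauss and pbin are introduced here for illustration.

```matlab
% Sketch: the factor 5 is the spacing of the x grid; density*dx
% approximates the probability of each bin.
x = -30:5:30;
y = [0,0.20,0.05,0.15,0.65,0.85,0.88,0.80,0.55,0.20,0.05,0,0];
dx = x(2) - x(1);                    % spacing of the sample points (5 here)

p = y / sum(y);                      % normalize y to probabilities
m = sum(x .* p);                     % weighted mean
s = sqrt(sum((x - m).^2 .* p));      % weighted standard deviation

% Gaussian density written out by hand (same formula as normpdf(t, m, s))
gauss = @(t) exp(-(t - m).^2 / (2*s^2)) / (s*sqrt(2*pi));

pbin = gauss(x) * dx;                % approximate probability per bin
fprintf('sum(p) = %.3f, sum(density*dx) = %.3f\n', sum(p), sum(pbin));
```

If the points were spaced differently, the same code would pick up the new spacing automatically.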
Your data follow a Gaussian curve and you describe them as probabilities. Are these numbers (y) your raw data – or did you generate them from e.g. a histogram over a larger data set? If the latter, the estimate of the distribution parameters could be improved by using the original full data.
I have a weird problem with the discrete fft. I know that the Fourier Transform of a Gauss function exp(-x^2/2) is again the same Gauss function exp(-k^2/2). I tried to test that with some simple code in MatLab and FFTW but I get strange results.
First, the imaginary part of the result is not zero (in MatLab) as it should be.
Second, the absolute value of the real part is a Gauss curve but without the absolute value half of the modes have a negative coefficient. More precisely, every second mode has a coefficient that is the negative of that what it should be.
Third, the peak of the resulting Gauss curve (after taking the absolute value of the real part) is not at one but much higher. Its height is proportional to the number of points on the x-axis. However, the proportionality factor is not 1 but nearly 1/20.
Could anyone explain to me what I am doing wrong?
Here is the MatLab code that I used:
function [nooutput,M] = fourier_test
Nx = 512; % number of points in x direction
Lx = 50; % width of the window containing the Gauss curve
x = linspace(-Lx/2,Lx/2,Nx); % creating an equidistant grid on the x-axis
input_1d = exp(-x.^2/2); % Gauss function as an input
input_1d_hat = fft(input_1d); % computing the discrete FFT
input_1d_hat = fftshift(input_1d_hat); % ordering the modes such that the peak is centred
plot(real(input_1d_hat), '-')
hold on
plot(imag(input_1d_hat), 'r-')
The answer is basically what Paul R suggests in his second comment: you introduce a phase shift (linearly dependent on the frequency) because the center of the Gaussian described by input_1d is effectively at k>0, where k+1 is the index into input_1d. If you instead center your data (such that input_1d(1) corresponds to the center) as follows, you get a phase-corrected Gaussian in the frequency domain:
Nx = 512; % number of points in x direction
Lx = 50; % width of the window containing the Gauss curve
x = linspace(-Lx/2,Lx/2,Nx); % creating an equidistant grid on the x-axis
%%%%%%%%%%%%%%%%
x=fftshift(x); % <-- center
%%%%%%%%%%%%%%%%
input_1d = exp(-x.^2/2); % Gauss function as an input
input_1d_hat = fft(input_1d); % computing the discrete FFT
input_1d_hat = fftshift(input_1d_hat); % ordering the modes such that the peak is centered
plot(real(input_1d_hat), '-')
hold on
plot(imag(input_1d_hat), 'r-')
From the definition of the DFT, if the Gaussian is not centered such that its maximum occurs at k=0, you will see a phase twist. The effect of fftshift is to perform a circular shift, i.e. to swap the left and right halves of the dataset, which is equivalent to shifting the center of the peak to k=0.
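One detail worth checking: with linspace(-Lx/2,Lx/2,Nx) and an even Nx, x = 0 is not itself a grid point, so after the shift the peak sits half a sample away from index 1 and a small linear phase remains. The sketch below assumes a slightly different grid, with spacing dx = Lx/Nx so that x = 0 is sampled exactly, and uses ifftshift to move that sample to index 1. The grid choice is my assumption, not part of the original code.

```matlab
% Sketch: put the sample x = 0 exactly at index 1 before calling fft.
Nx = 512;                        % number of points in x direction
Lx = 50;                         % width of the window
dx = Lx/Nx;                      % spacing chosen so that x = 0 is on the grid (assumed)
x  = (-Nx/2 : Nx/2-1) * dx;      % runs from -Lx/2 to Lx/2 - dx
input_1d = exp(-x.^2/2);         % Gauss function as an input

% ifftshift moves the x = 0 sample to index 1;
% fftshift then recentres the spectrum for plotting
input_1d_hat = fftshift(fft(ifftshift(input_1d)));

ratio = max(abs(imag(input_1d_hat))) / max(abs(real(input_1d_hat)));
fprintf('max |imag| / max |real| = %.2e\n', ratio);
```

On this grid the shifted input is an even sequence, so its DFT is real up to rounding error.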
As for the amplitude scaling, that is an issue with the definition of the DFT implemented in Matlab. From the documentation for the FFT:
For length N input vector x, the DFT is a length N vector X, with elements
X(k) = sum_{n=1}^{N} x(n)*exp(-j*2*pi*(k-1)*(n-1)/N), 1 <= k <= N.
The inverse DFT (computed by IFFT) is given by
x(n) = (1/N) * sum_{k=1}^{N} X(k)*exp( j*2*pi*(k-1)*(n-1)/N), 1 <= n <= N.
Note that in the forward step the summation is not normalized by N. Therefore if you increase the number of points Nx in the summation while keeping the width Lx of the Gaussian function constant you will increase X(k) proportionately.
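To compare against the continuous transform, the unnormalized sum can be treated as a Riemann sum: multiplying by the grid spacing dx, and by 1/sqrt(2*pi) for the unitary convention in which exp(-x^2/2) is its own transform, should bring the peak back to one. The sketch below assumes that convention and a grid with dx = Lx/Nx that contains x = 0. The raw peak is then about sqrt(2*pi)*Nx/Lx, roughly Nx/20 for Lx = 50, which matches the factor reported in the question.

```matlab
% Sketch: rescale the DFT so it approximates the continuous transform.
Nx = 512;  Lx = 50;
dx = Lx/Nx;                        % grid spacing (assumed grid, contains x = 0)
x  = (-Nx/2 : Nx/2-1) * dx;
f  = exp(-x.^2/2);
F  = fftshift(fft(ifftshift(f)));

% Angular wavenumbers in the same (fftshift) ordering
k  = (-Nx/2 : Nx/2-1) * (2*pi/Lx);

% Riemann-sum approximation of (1/sqrt(2*pi)) * integral f(x) exp(-i*k*x) dx
F_scaled = real(F) * dx / sqrt(2*pi);

err = max(abs(F_scaled - exp(-k.^2/2)));
fprintf('raw peak = %.3f, scaled peak = %.6f, max error = %.1e\n', ...
        max(real(F)), max(F_scaled), err);
```

The k axis is in angular units (2*pi/Lx per bin), which is what the comparison with exp(-k^2/2) requires.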
As for signal leaking into the imaginary frequency dimension, that is due to the discrete form of the DFT, which results in truncation and other effects, as noted again by Paul R. If you reduce Lx while keeping Nx constant, you should see a reduction in the amount of signal in the imaginary dimension relative to the real dimension (compare the spectra while keeping peak intensities in the real dimension equal).
You'll find additional answers to similar questions here and here.