Suppose I am finding the power spectral density of data like this:
x = winter_data.values  # measured at a sampling frequency of 1 Hz
f, Sxx = sp.signal.welch(x, fs=1, window='hann', nperseg=N, noverlap=N // 2)
I want to test that Parseval's theorem works on these data sets. Since welch returns the power spectral density, should we not have
np.trapz(x**2, dx=1)
and
len(x) * np.trapz(Sxx, f)
equal to each other? Or is my definition of power spectral density incorrect? (np.trapz() is just used to calculate the integrals.) I always thought that power spectral density was defined as
S_xx(f) = (1/T)|X(f)|^2
Currently I am not getting them equal.
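A minimal, self-contained sketch of that comparison (synthetic data stands in for winter_data, and the 'hann' window name is assumed): with the default scaling='density', np.trapz(Sxx, f) estimates the mean power of x, so the two quantities above should only agree approximately, not exactly.
import numpy as np
from scipy import signal
rng = np.random.default_rng(0)
x = rng.standard_normal(2**14)  # synthetic stand-in for winter_data.values, fs = 1 Hz
N = 1024
f, Sxx = signal.welch(x, fs=1.0, window='hann', nperseg=N, noverlap=N // 2)
# With scaling='density' (the default), the integral of the one-sided PSD
# estimates the mean power of the signal:
print(np.trapz(Sxx, f), np.mean(x**2))            # approximately equal
# so the energy-style comparison from the question is only approximate:
print(len(x) * np.trapz(Sxx, f), np.trapz(x**2, dx=1))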
Working with Sloan Digital Sky Survey data, I created a composite spectrum of quasars. The spectrum is a plot of wavelength (x-axis), measured in Angstrom, against flux (y-axis), measured in ergs/cm^2/s/Angstrom.
For the power-law calculation, I converted wavelength into frequency, which resulted in THz-scale frequencies. On the y-axis, I first changed the units from ergs/cm^2/s/Angstrom to ergs/cm^2/s/Hz (implementing the conversion in MATLAB) and then to jansky units. This is the resultant plot.
Now I want to calculate the slope of this graph. Should I use the basic fitting tools and take the value of m from there? What other methods are there to calculate the slope of a quasar spectrum for the power law f = f_0 * nu^(-slope)?
Use polyfit with degree 1 to fit a linear model.
P = polyfit(Xtrain, Ytrain, 1);
P will be a vector of two components:
P(1), the slope (the first-degree coefficient), and
P(2), the offset (the zero-degree coefficient).
You can then evaluate the fitted linear model on new data (test data):
Ytest = P(1) * Xtest + P(2);
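The same idea carries over to the power law in the question, f = f_0 * nu^(-slope): fit a degree-1 polynomial in log-log space and read the slope off the first coefficient. A minimal NumPy sketch with placeholder nu and flux arrays (the real ones would come from the converted spectrum):
import numpy as np
rng = np.random.default_rng(0)
# placeholder frequency (THz) and flux (Jy) arrays standing in for the spectrum
nu = np.logspace(2.4, 3.0, 200)
flux = 3.0 * nu ** (-1.5) * np.exp(rng.normal(0.0, 0.05, nu.size))
# f = f0 * nu**(-slope)  =>  log10(f) = log10(f0) - slope * log10(nu),
# so a degree-1 fit in log-log space gives the slope directly
coeffs = np.polyfit(np.log10(nu), np.log10(flux), 1)
slope = -coeffs[0]          # mind the sign convention of the power law
f0 = 10.0 ** coeffs[1]
print(slope, f0)            # roughly 1.5 and 3.0 for this synthetic example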
I have fitted a Gaussian mixture model to multiple joint probability density functions. How can I obtain the conditional probability density function (i.e., p(x|y)) from this mixture model (an NxN matrix) in Matlab?
Based on Bayes' rule, you can write down the formula p(x|y) = p(x,y)/p(y). If you are able to obtain the probability value p(y) for a given y, you can plug it directly into Bayes' formula. Otherwise, you can express each Gaussian of the mixture as a conditional Gaussian with the following parameters (P stands for covariance matrices, mu stands for means):
mu_x|y = mu_x + P_xy P_yy^-1 (y - mu_y)
P_x|y = P_xx - P_xy P_yy^-1 P_yx
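A minimal NumPy sketch of these two formulas for a single component, with scalar x and y and placeholder numbers; for the full mixture you would condition every component this way and reweight each mixture weight by the likelihood of the observed y under that component's y-marginal.
import numpy as np
# One Gaussian component with joint mean and covariance over (x, y);
# the numbers below are placeholders.
mu_x, mu_y = 1.0, 2.0
P_xx, P_xy, P_yy = 0.5, 0.2, 1.0
y_obs = 2.5
# Conditional Gaussian p(x | y = y_obs)
mu_cond = mu_x + P_xy / P_yy * (y_obs - mu_y)
P_cond = P_xx - P_xy / P_yy * P_xy      # note the minus sign
print(mu_cond, P_cond)
# For a mixture component with prior weight w: multiply w by the Gaussian
# likelihood of y_obs under (mu_y, P_yy), then renormalise over components.
w_unnorm = 1.0 * np.exp(-0.5 * (y_obs - mu_y) ** 2 / P_yy) / np.sqrt(2 * np.pi * P_yy)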
I am not doing signal processing, but in my area I use the spectral density of a matrix of data, and I get quite confused at a very detailed level.
%matrix H is given.
corr=xcorr2(H); %get the correlation
spec=fft2(corr); % Wiener-Khinchin Theorem
In Matlab, xcorr2 will calculate the correlation function of this matrix. The lag will range from -N+1 to N-1, so if the size of matrix H is N by N, then the size of corr will be (2N-1) by (2N-1). For discretized data, should I use corr or half of corr?
Another problem is that I think the Wiener-Khinchin theorem is basically for continuous functions. I have always thought that the discrete FT is an approximation to the continuous FT, or you could say it is a tool to calculate the continuous FT. If you use Matlab's built-in function 'fft', you should divide the final result by \delta x.
Is there any kind soul who knows this area well who could share some Matlab code with me?
Basically, approximating a continuous FT by a Discretized FT is the same as approximating an integral by a finite sum.
We will first discuss the 1D case, then we'll discuss the 2D case.
Let's look at the Wiener-Khinchin theorem (for example here).
It states that, for the discrete-time case, the power spectral density of a function with discrete values x[n] is
S(f) = sum_{k=-infty}^{+infty} r_xx[k] * exp(-j*2*pi*f*k),
where
r_xx[k] = E[ x[n] * x*[n-k] ]
is the autocorrelation function of x[n].
1) You can see already that the sum is taken from -infty to +infty in the calculation of S(f)
2) Now consider the Matlab fft. You can see (command 'edit fft' in Matlab) that it is defined as:
X(k) = sum_{n=1}^N x(n)*exp(-j*2*pi*(k-1)*(n-1)/N), 1 <= k <= N.
which is exactly the sum you need in order to calculate the power spectral density at a frequency f.
Note that, for continuous functions, S(f) will be a continuous function; for discretized functions, S(f) will be discrete.
Now that we know all that, it can easily be extended to the 2D case. Indeed, the structure of fft2 matches the structure of the right-hand side of the Wiener-Khinchin theorem for the 2D case.
You will, however, need to divide your result by N*M, where N is the number of sample points in x and M is the number of sample points in y.
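As a quick 1D sanity check of the discrete relation (a NumPy sketch; the arrangement of lags is the only subtle point): the DFT of the full autocorrelation, once lag 0 is rotated to index 0, matches |DFT of the zero-padded signal|^2.
import numpy as np
rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)
# Full linear autocorrelation, lags -(N-1)..(N-1), with lag 0 at index N-1
r = np.correlate(x, x, mode='full')
M = 2 * N - 1
# Rotate so lag 0 sits at index 0, the circular convention the DFT expects
r_circ = np.roll(r, -(N - 1))
# Wiener-Khinchin: DFT of the autocorrelation == |DFT of the zero-padded signal|^2
S_from_r = np.fft.fft(r_circ)
S_direct = np.abs(np.fft.fft(x, n=M)) ** 2
print(np.allclose(S_from_r.real, S_direct))   # True; the imaginary part is ~0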
I have a matrix z(x,y)
This is an NxN arbitrary pdf constructed from a kernel density estimate (i.e. it is not a standard pdf and has no closed-form expression). It is multivariate, cannot be separated into independent factors, and exists only as discrete data.
I want to construct an NxN matrix F(x,y) that is the two-dimensional cumulative distribution function of this pdf, so that I can then randomly sample from it, where F(x,y) = P(X < x, Y < y).
Analytically I think the CDF of a multivariate function is the surface integral of the pdf.
What I have tried is using the cumsum function to calculate the surface integral. I tested this with a multivariate normal against the analytical solution, and there seems to be some discrepancy between the two:
% multivariate parameters
delta = 100;
mu = [1 1];
Sigma = [0.25 .3; .3 1];
x1 = linspace(-2,4,delta); x2 = linspace(-2,4,delta);
[X1,X2] = meshgrid(x1,x2);
% Calculate Normal multivariate pdf
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
% My attempt at a numerical surface integral
FN = cumsum(cumsum(F,1),2);
% Normalise the CDF
FN = FN./max(max(FN));
X = [X1(:) X2(:)];
% Analytic solution to a multivariate normal pdf
p = mvncdf(X,mu,Sigma);
p = reshape(p,delta,delta);
% Highlight the difference
dif = p - FN;
error = max(max(sqrt(dif.^2)));
% %% Plot
figure(1)
surf(x1,x2,F);
caxis([min(F(:))-.5*range(F(:)),max(F(:))]);
xlabel('x1'); ylabel('x2'); zlabel('Probability Density');
figure(2)
surf(X1,X2,FN);
xlabel('x1'); ylabel('x2');
figure(3);
surf(X1,X2,p);
xlabel('x1'); ylabel('x2');
figure(5)
surf(X1,X2,dif)
xlabel('x1'); ylabel('x2');
In particular, the error seems to be in the transition region, which is the most important part.
Does anyone have a better solution to this problem, or see what I'm doing wrong?
Any help would be much appreciated!
EDIT: This is the desired outcome of the cumulative integration. The reason this function is of value to me is that when you randomly generate samples from it on the closed interval [0,1], the higher-weighted (i.e. more likely) values appear more often; in this way the samples converge on the expected value(s) (in the case of multiple peaks). This is the desired outcome for algorithms such as particle filters, neural networks, etc.
Think of the 1-dimensional case first. You have a function represented by a vector F and want to integrate it numerically. cumsum(F) will do that, but it uses a poor form of numerical integration: it treats F as a step function. You could instead do a more accurate numerical integration using the trapezoidal rule or Simpson's rule.
The 2-dimensional case is no different. Your use of cumsum(cumsum(F,1),2) is again treating F as a step function, and the numerical errors resulting from that assumption only get worse as the number of dimensions of integration increases. There exist 2-dimensional analogues of the Trapezoidal rule and Simpson's rule. Since there's a bit too much math to repeat here, take a look here:
http://onestopgate.com/gate-study-material/mathematics/numerical-analysis/numerical-integration/2d-trapezoidal.asp.
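For what it's worth, here is a minimal Python/SciPy version of that idea using scipy.integrate.cumulative_trapezoid on the same multivariate normal as in the question (with a coarser grid to keep the analytic CDF evaluation cheap); the cumulative trapezoidal integral should track the analytic CDF closely, with the remaining error dominated by truncating the grid at -2.
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.stats import multivariate_normal
# same multivariate normal as in the question, on a coarser grid
mu = [1.0, 1.0]
Sigma = [[0.25, 0.3], [0.3, 1.0]]
x1 = np.linspace(-2, 4, 60)
x2 = np.linspace(-2, 4, 60)
X1, X2 = np.meshgrid(x1, x2)
grid = np.dstack((X1, X2))
pdf = multivariate_normal(mu, Sigma).pdf(grid)
# Cumulative trapezoidal integration along each axis approximates
# F(x, y) = int_{-inf}^{x} int_{-inf}^{y} p(u, v) dv du on the grid
F = cumulative_trapezoid(pdf, x1, axis=1, initial=0)   # integrate over x1 (columns)
F = cumulative_trapezoid(F, x2, axis=0, initial=0)     # then over x2 (rows)
cdf = multivariate_normal(mu, Sigma).cdf(grid)          # analytic reference
print(np.max(np.abs(F - cdf)))                          # small residual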
You DO NOT need to compute the 2-dimensional integral of the probability density function in order to sample from the distribution. If you are computing the 2-d integral, you are going about the problem incorrectly.
Here are two ways to approach the sampling problem.
(1) You write that you have a kernel density estimate. A kernel density estimate is a special case of a mixture density. Any mixture density can be sampled by first selecting one kernel (the kernels may be equally or differently weighted; the same procedure applies), and then sampling from that kernel. (That applies in any number of dimensions.) Typically the kernels are some relatively simple distribution, such as a Gaussian, so that they are easy to sample from.
(2) Any joint density P(X, Y) is equal to P(X | Y) P(Y) (and equivalently P(Y | X) P(X)). Therefore you can sample from P(Y) (or P(X)) and then from P(X | Y). In order to sample from P(X | Y), you will need to integrate P(X, Y) along a line Y = y (where y is the sampled value of Y), but (this is crucial) you only need to integrate along that line; you don't need to integrate over all values of X and Y.
If you tell us more about your problem, I can help with the details.
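For approach (1), a minimal NumPy sketch of sampling from a Gaussian kernel density estimate (the data array and the bandwidth h are placeholders):
import numpy as np
rng = np.random.default_rng(0)
# Placeholder KDE: equally weighted Gaussian kernels centred on the data
# points, all with the same bandwidth h.
data = rng.standard_normal((500, 2))
h = 0.3
def sample_kde(n_samples):
    # (1) pick a kernel (i.e. a data point) for each sample, with equal weights here
    idx = rng.integers(0, len(data), size=n_samples)
    # (2) sample from that kernel: the kernel centre plus Gaussian noise of std h
    return data[idx] + h * rng.standard_normal((n_samples, 2))
samples = sample_kde(10000)        # samples.shape == (10000, 2)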
I am confused about the terminology used in scipy.signal.periodogram, namely:
scaling : { 'density', 'spectrum' }, optional
Selects between computing the power spectral density ('density'),
where Pxx has units of V**2/Hz if x is measured in V, and computing
the power spectrum ('spectrum'), where Pxx has units of V**2 if x is
measured in V. Defaults to 'density'.
(see: http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.periodogram.html)
1) A few tests show that the result for the 'density' option depends on the signal length, the window length, and the sampling frequency (it grows when the signal length increases). How come? I would say that it is exactly the density that should not depend on these things. If I take a longer signal I should just get a more accurate estimate, not a different result. Not to mention that the dependence on window length is also very surprising.
The result diverges in the limit of an infinite signal, which could be a feature of energy, but not of power. Shouldn't the periodogram converge to the true theoretical PSD as the length increases? If so, am I supposed to perform another normalisation outside of the signal.periodogram method?
2) To the contrary, I see that the alternative option 'spectrum' gives what I would previously have called the power spectral density: it gives a result independent of the window segment and window length and consistent with theoretical calculation. For instance, for A*sin(2*pi*f*t) a two-sided solution yields two peaks at -f and f, each of height 0.25*A^2.
There is a lot of literature on this subject, but I get the impression that there is also a lot of incompatible terminology, so I will be thankful for any clarification. The straightforward question is how to interpret these options and their units. (I am used to seeing V^2/Hz labeled as "power spectral density".)
Let's take a real array called data, of length N, and with sampling frequency fs. Let's call the time bin dt = 1/fs, and T = N * dt. In the frequency domain, the frequency bin is df = 1/T = fs/N.
The power spectrum PS (scaling='spectrum' in scipy.periodogram) is calculated as follow:
import numpy as np
import scipy.fft as fft
dft = fft.fft(data)
PS = np.abs(dft)**2 / N ** 2
It has the units of V^2. It can be understood as follows. By analogy to the continuous Fourier transform, the energy E of the signal is:
E := np.sum(data**2) * dt = 1/N * np.sum(np.abs(dft)**2) * dt
(by Parseval's theorem). The power P of the signal is the total energy E divided by the duration of the signal T:
P := E/T = 1/N**2 * np.sum(np.abs(dft)**2)
The power P only depends on the Discrete Fourier Transform (DFT) and the number of samples N, not directly on the sampling frequency fs or the signal duration T. The power per frequency channel, i.e. the power spectrum PS, is thus given by the formula above:
PS = np.abs(dft)**2 / N ** 2
For the power spectrum density PSD (scaling='density' in scipy.periodogram), one needs to divide the PS by the frequency bin of the DFT, df:
PSD := PS/df = PS * N * dt = PS * N / fs
and thus:
PSD = np.abs(dft)**2 / N * dt
This has the units of V^2/Hz = V^2 * s, and now depends on the sampling frequency. That way, integrating the PSD over the frequency range gives the same result as summing the individual values of the PS.
This should explain the relations that you see when changing the window, sampling frequency, duration.
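A short numerical check of these formulas against scipy.signal.periodogram (boxcar window, no detrending, two-sided output, so the scaling matches the plain DFT expressions above):
import numpy as np
from scipy import signal
rng = np.random.default_rng(0)
fs = 100.0                       # sampling frequency in Hz
N = 1024
dt = 1.0 / fs
data = rng.standard_normal(N)
dft = np.fft.fft(data)
PS = np.abs(dft) ** 2 / N ** 2   # power spectrum, V^2
PSD = np.abs(dft) ** 2 / N * dt  # power spectral density, V^2/Hz
f, Pxx_spec = signal.periodogram(data, fs=fs, window='boxcar', detrend=False,
                                 return_onesided=False, scaling='spectrum')
f, Pxx_den = signal.periodogram(data, fs=fs, window='boxcar', detrend=False,
                                return_onesided=False, scaling='density')
print(np.allclose(PS, Pxx_spec), np.allclose(PSD, Pxx_den))   # True True
# Integrating the PSD (sum * df) equals summing the PS, and both equal
# the mean square of the signal:
df = fs / N
print(np.allclose(np.sum(PSD) * df, np.sum(PS)), np.allclose(np.sum(PS), np.mean(data ** 2)))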
scipy.signal.periodogram uses the scipy.signal.welch function with zero overlap. Therefore, the scaling is the same as the one provided by the welch function, 'density' or 'spectrum'.
In the case of density scaling, the amplitude will vary with the window length: the longer the window, the higher the frequency resolution, i.e. the smaller the \Delta_f. Since the estimated density is an average, the smaller the \Delta_f, the less zero energy is included in the averaging.
As you have mentioned, the spectrum scaling is an integration of the energy density over the spectrum to produce the energy. Therefore, integrating over zero values does not affect the final value.
The Fourier transform actually requires finite energy over an infinite duration of the time series (like a decaying signal). So if you just make your time series longer by "duplicating" it, the energy will become infinite over the infinite duration.
My main confusion was about the "spectrum" option for scipy.signal.periodogram, which seems to produce a constant energy spectrum even when the time series becomes longer.
Normally, 0.5*A^2 = S(f)*delta_f, where S(f) is the power spectral density. S(f)*delta_f, representing energy, is constant if A is constant. But when using a longer time series, delta_f (i.e. the frequency increment) is reduced accordingly, based on the FFT procedure. For example, a 100 s time series leads to delta_f = 0.01 Hz, while a 1000 s time series has delta_f = 0.001 Hz. S(f), representing a density, will change accordingly.
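A quick illustration of that last point with a sine of amplitude A sampled for two different durations (the specific numbers are arbitrary): the peak of the two-sided density estimate grows with the duration T, while peak * delta_f stays at about 0.25*A^2.
import numpy as np
from scipy import signal
A, f0, fs = 2.0, 5.0, 100.0
for T in (100.0, 1000.0):
    N = int(T * fs)
    t = np.arange(N) / fs
    x = A * np.sin(2 * np.pi * f0 * t)
    f, Pxx = signal.periodogram(x, fs=fs, window='boxcar', detrend=False,
                                return_onesided=False, scaling='density')
    df = fs / N
    k = np.argmin(np.abs(f - f0))
    # the peak density grows with T, but density * delta_f stays ~ A**2 / 4
    print(T, Pxx[k], Pxx[k] * df)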