Let's say I have a set of probabilities from the lower quartile of a dataset that I'm assuming is lognormally distributed. I have no data from the other three quartiles.
How can I fit my data to the first quartile of a lognormal curve to approximate the mean/median?
Consider the following draws of a 2x1 vector in MATLAB, with a probability distribution that is a mixture of two Gaussian components.
P = 10^3; % number of draws
v = 1;
% First component
mu_a = [0, 0.5];
sigma_a = [v, 0; 0, v];
% Second component
mu_b = [0, 8.2];
sigma_b = [v, 0; 0, v];
% Combine
MU = [mu_a; mu_b];
SIGMA = cat(3, sigma_a, sigma_b);
w = ones(1,2)/2; % equal weights of 0.5
obj = gmdistribution(MU, SIGMA, w);
% Draws
RV_temp = random(obj, P); % P x 2
% Transform each component of RV_temp into a uniform on [0,1] by estimating the cdf.
RV1 = ksdensity(RV_temp(:,1), RV_temp(:,1), 'function', 'cdf');
RV2 = ksdensity(RV_temp(:,2), RV_temp(:,2), 'function', 'cdf');
Now, if we check whether RV1 and RV2 are uniformly distributed on [0,1] by doing
ecdf(RV1)
ecdf(RV2)
we can see that RV1 is uniformly distributed on [0,1] (the empirical cdf is close to the 45 degree line) while RV2 is not.
I don't understand why. It seems that the farther apart mu_a(2) and mu_b(2) are, the worse the job done by ksdensity for a reasonable number of draws. Why?
When you have a mixture of N(0.5,v) and N(8.2,v), the range of the generated data is larger than if the expectations were closer together, like the N(0,v) and N(0,v) you have in the other dimension. You then ask ksdensity to approximate a function using P points spread over this wider range.
As with standard linear interpolation, the denser the points, the better the approximation of the function (inside the range), and the same holds here. So in the N(0.5,v) and N(8.2,v) case, where the points are sparser, the approximation is worse than in the N(0,v) and N(0,v) case, where the points are denser.
As a small side note, is there any reason you do not apply ksdensity directly to the bivariate data? Also, I cannot reproduce your comment that 5e2 points are also good. Final comment: 1e3 is typically preferred over 10^3.
I think this is simply about the number of samples you're using. For the first component, the means of the two Gaussians are relatively close, hence a thousand samples are enough to obtain a cdf very close to the U[0,1] cdf. For the second component, the difference is larger and you need more samples. With 100000 samples I obtained the following result:
With 1000 I obtained this:
which is clearly farther from the uniform cdf. Try increasing the number of samples to a million and check whether the result again gets closer.
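A quick way to check this yourself: rerun the question's code with a larger P and compare the empirical cdf of RV2 against the 45-degree line (a minimal sketch reusing obj from the question; the plotting details are my own choices):
P = 1e5;                          % many more draws than the original 10^3
RV_temp = random(obj, P);         % P x 2
RV2 = ksdensity(RV_temp(:,2), RV_temp(:,2), 'function', 'cdf');
figure;
ecdf(RV2); hold on;
plot([0 1], [0 1], 'r--');        % reference line: cdf of U[0,1]
legend('empirical cdf of RV2', 'U[0,1] cdf', 'Location', 'southeast');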
I refer to this paper, second page, right column, second paragraph, where it is stated how to produce quadruple-density wavelet coefficients:
if we do not down sample the wavelet coefficients we generate wavelets with double density, where wavelets of level n are centered every 1/2*2^n. To generate the quadruple density dictionary, we compute the scaling coefficients with double density by not down sampling them. The next step is to calculate double density wavelet coefficients on the two sets of scaling coefficients - even and odd - separately.
I am confused about how to get the two sets of scaling coefficients - even and odd. What is meant by even and odd?
Does that mean splitting the original image matrix into two matrices, one with only the even-indexed entries (0,0), (0,2), ... and one with the odd-indexed entries (0,1), (0,3), ...? What is the advantage?
Thanks
Anyone interested in the overcomplete wavelet transform should have a look at these two links:
http://www.dahlsys.com/cuda/overcomplete_wavelet/index.html
http://eeweb.poly.edu/~onur/source.html
Thanks
I have two matrices that follow a Gaussian distribution. Each has size 3x3. Now I want to estimate the upper and lower thresholds of these matrices. I denote the mean and standard deviation of each matrix by μ1, μ2 and σ1, σ2. The high and low thresholds are
T_high = (μ1 + μ2)/2 + k1*(σ1 + σ2)/2
T_low = (μ1 + μ2)/2 - k2*(σ1 + σ2)/2
where k1 and k2 are constants.
My question is: is my formula correct? Since this is a Gaussian distribution, k1 = k2, right? And this is my code. Could you check it for me?
mu1 = mean(v1(:));    % first matrix
sigma1 = std(v1(:));
mu2 = mean(v2(:));    % second matrix
sigma2 = std(v2(:));
k1 = 1; k2 = 1;
T_high = (mu1 + mu2)/2 + k1*(sigma1 + sigma2)/2;
T_low = (mu1 + mu2)/2 - k2*(sigma1 + sigma2)/2;
In the formula you are using, the joint standard deviation is wrong. It should be
T_high = (mu1 + mu2)/2 + k1*sqrt((sigma1^2 + sigma2^2)/2);
T_low = (mu1 + mu2)/2 - k2*sqrt((sigma1^2 + sigma2^2)/2);
As you treat all 18 pixels as belonging to the same distribution, why not use the following
v = [v1(:); v2(:)];
mu = mean(v);
sigma = std(v);
k1 = 1; k2 = 1;
T_high = mu + k1*sigma;
T_low = mu - k2*sigma;
In MATLAB I need to generate a second derivative of a Gaussian window to apply to a vector representing the height of a curve. I need the second derivative in order to determine the locations of the inflection points and maxima along the curve. The vector representing the curve may be quite noisy, hence the use of the Gaussian window.
What is the best way to generate this window?
Is it best to use the gausswin function to generate the Gaussian window and then take the second derivative of that?
Or to generate the window manually using the equation for the second derivative of the Gaussian?
Or is it even best to apply the Gaussian window to the data first and then take the second derivative of the result? (I know the last two are mathematically the same; however, with discrete data points I do not know which will be more accurate.)
The maximum length of the height vector is going to be around 100-200 elements.
Thanks
Chris
I would create a linear filter composed of the weights generated by the second derivative of a Gaussian function and convolve this with your vector.
The weights of a second derivative of a Gaussian are given by
G''(t) = C * ((t - tau)^2/sigma^4 - 1/sigma^2) * exp(-(t - tau)^2/(2*sigma^2))
where:
Tau is the time shift for the filter. If you are generating weights for a discrete filter of length T with an odd number of samples, set tau to zero and allow t to vary over [-T/2, T/2].
Sigma varies the scale of your operator. Set sigma to a value of around T/6; if you are concerned about a long filter length, the ratio can be reduced so that sigma is around T/4.
C is the normalising factor. This can be derived algebraically, but in practice I always do it numerically after calculating the filter weights. For unity gain when smoothing periodic signals, I set C = 1 / sum(G'').
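Putting the pieces above together, a minimal MATLAB sketch (the variable names and the numerical normalisation are my own choices; the formula is the standard second derivative of a Gaussian):
T = 21;                           % filter length (odd number of samples), so tau = 0
t = (-(T-1)/2 : (T-1)/2)';        % sample positions in [-T/2, T/2]
sigma = T/6;                      % scale of the operator
G2 = (t.^2/sigma^4 - 1/sigma^2) .* exp(-t.^2/(2*sigma^2));
C = 1/sum(abs(G2));               % one possible numerical normalisation
G2 = C*G2;
% Convolve with the (noisy) height vector; 'same' keeps the original length.
% d2 = conv(height, G2, 'same');
% Zero crossings of d2 indicate inflection points of the smoothed curve.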
In terms of your comment on the equivalence of smoothing first and taking the derivative later, I would say it is more involved than that: which derivative operator would you use in the second step? A simple central difference would not yield the same results.
You can get an equivalent (but approximate) response to a second derivative of a Gaussian by filtering the data with two Gaussians of different scales and then taking the point-wise differences between the two resulting vectors. See Difference of Gaussians for that approach.
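If you want to try the Difference-of-Gaussians route, here is a rough sketch (the two scales and their roughly 1.6 ratio are illustrative choices on my part, not prescribed values):
t = (-10:10)';
s1 = 2;  s2 = 1.6*s1;                         % narrower and broader scales
g1 = exp(-t.^2/(2*s1^2));  g1 = g1/sum(g1);   % narrower Gaussian
g2 = exp(-t.^2/(2*s2^2));  g2 = g2/sum(g2);   % broader Gaussian
% d2approx = conv(height, g2, 'same') - conv(height, g1, 'same');  % approximate second-derivative response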
Let us assume I have a vector t with the times (in seconds) of my samples. (These samples are not equally spaced in the time domain.)
I also have a vector data containing the sample values at the times in t.
t and data have the same length.
If I plot the graph, some sort of periodic signal is obtained.
Now I could run abs(fft(data)) to get my spectrum, which is then plotted over the number of data points on the x-axis.
How can I obtain my spectrum with respect to the times in vector t and plot it?
I want to see which frequencies in 1/s, or which periods in s, my signal contains.
Thanks for your help.
[Not the OP's intention]: The FFT will give you the (global) spectrum for any number of input data points. You cannot associate a specific data point (in time) with part of (or the full) spectrum.
What you can do instead is use spectrogram and obtain the Short-Time Fourier Transform (STFT). This will give you a NxM discrete grid of time-frequency FT values (N: FT frequency bins, M: signal time-windows).
By localizing the (overlapping) STFT windows on your data samples of interest you will get N frequency magnitude values, thus the distribution of short-term spectrum estimates as the signal changes in time.
See also the possibly relevant answer here: https://stackoverflow.com/a/12085728/651951
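If your data were uniformly sampled at some rate fs, a minimal spectrogram call would look like the sketch below (the window, overlap, and nfft values are only illustrative choices; see the edit below for the unevenly sampled case):
fs = 100;                              % sampling rate in Hz (example value)
win = 256;  noverlap = 200;  nfft = 512;
[S, F, T] = spectrogram(data, win, noverlap, nfft, fs);
imagesc(T, F, abs(S));  axis xy;
xlabel('Time (s)');  ylabel('Frequency (Hz)');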
EDIT/UPDATE:
For unevenly spaced data you need to consider the Non-Uniform DFT (and Non-uniform FFT implementations). See the relevant question/answer here https://scicomp.stackexchange.com/q/593
The primary approaches for the NFFT or NUFFT are based on creating a uniform grid through local convolutions/interpolation, running the FFT on that grid, and undoing the convolutional effect of the interpolation filter.
You can read more:
A. Dutt and V. Rokhlin, Fast Fourier transforms for nonequispaced data, SIAM J. Sci. Comput., 14, 1993.
L. Greengard and J.-Y. Lee, Accelerating the Nonuniform Fast Fourier Transform, SIAM Review, 46 (3), 2004.
M. Pippig and D. Potts, Particle Simulation Based on Nonequispaced Fast Fourier Transforms, in: Fast Methods for Long-Range Interactions in Complex Systems, 2011.
For an implementation (with an interface to MATLAB) try NFFT and possibly its parallelized version PNFFT. You may find a nice walk-through on how to set it up and use it here.
You can resample or interpolate your sample points to obtain another set of sample points that are equally spaced in t. The chosen spacing or sample rate of that second, equally spaced set will let you assign frequencies to the result of an FFT of the second set.
The results may be noisy or include aliasing unless the initial data set is bandlimited to a sufficiently low frequency to allow interpolation. If bandlimited, then you might try something like cubic splines as an interpolation method.
Although it may look like one can get a high FFT bin frequency resolution by resampling to a larger number of data points, the actual useful resolution accuracy will be more related to the original number of samples.
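A minimal sketch of this resampling approach (my own variable names; it assumes t is in seconds and the signal is bandlimited well below fs/2 so the interpolation is sensible):
fs = 1/median(diff(t));                % pick a target uniform rate from the data
tu = t(1) : 1/fs : t(end);             % uniform time grid
xu = interp1(t, data, tu, 'spline');   % cubic-spline interpolation onto the grid
N = numel(xu);
X = abs(fft(xu));
f = (0:N-1)*fs/N;                      % frequency axis in 1/s (Hz)
plot(f(1:floor(N/2)), X(1:floor(N/2)));
xlabel('Frequency (Hz)');  ylabel('|FFT|');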