Multivariate Empirical CDF - MATLAB

How can I compute a multivariate empirical CDF? Is there anything in MATLAB, or an approach that gives output similar to ecdf but takes a matrix instead of a vector as input?
I'd appreciate any input.
Basically, I would like something like this:
http://reference.wolfram.com/mathematica/ref/EmpiricalDistribution.html

So, to provide an official answer (based on our comment conversation):
Use hist3 to get the empirical pdf (normalize the counts so they sum to 1), and then take a 2D cumulative sum across the pdf to create a 2D cdf. I'm not aware of a single built-in for this, but applying cumsum along each dimension in turn, cumsum(cumsum(p,1),2), does the job. Each entry in the cdf matrix is the sum of all pdf entries whose row and column indices are less than or equal to its own.
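The 2D cumulative sum described above can be sketched roughly as follows. This is a minimal illustration, not a drop-in replacement for hist3: the data X, the bin count nb, and the hand-rolled binning with accumarray are all assumptions made so the snippet runs without the Statistics Toolbox.

```matlab
% Sketch of a bivariate empirical CDF via a 2-D cumulative sum.
% Assumptions: X is an n-by-2 data matrix; nb is a hypothetical bin count.
X  = rand(500, 2);                                  % placeholder data
nb = 10;                                            % bins per dimension (illustrative)
ex = linspace(min(X(:,1)), max(X(:,1)), nb + 1);    % bin edges, dimension 1
ey = linspace(min(X(:,2)), max(X(:,2)), nb + 1);    % bin edges, dimension 2

% Bin index of each point; clamp so the maximum lands in the last bin.
ix = min(floor((X(:,1) - ex(1)) / (ex(2) - ex(1))) + 1, nb);
iy = min(floor((X(:,2) - ey(1)) / (ey(2) - ey(1))) + 1, nb);

counts = accumarray([ix iy], 1, [nb nb]);   % 2-D histogram of counts
p = counts / sum(counts(:));                % empirical pmf on the grid
F = cumsum(cumsum(p, 1), 2);                % F(i,j) = sum of p(1:i, 1:j)
```

F(i,j) then approximates the probability that a point falls at or below the upper edges ex(i+1) and ey(j+1), and F(end,end) is 1.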

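If ecdf works for what you need, and you only need matrix functionality, you can try vectorizing the input to ecdf and then mapping the result back onto the shape of the input. Note that this pools all entries into one univariate ECDF. The output of ecdf has one more element than there are unique values, so it cannot simply be reshaped to size(y).
y = rand(100); % replace this with your actual data...
[f, x] = ecdf(y(:)); % pass in the vectorized version of y; x holds the sorted unique values, with the first one repeated
F = interp1(x(2:end), f(2:end), y); % look up the ECDF value of each entry; F has the same size as y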

Related

Inverse cumulative distribution function in MATLAB given an empirical PDF

Is it possible to generate random numbers with a distribution that depends on empirical probability data? I imagine this is possible by taking the inverse cumulative distribution function. I have seen some examples where this is done in MATLAB (the software that I'm using) but all of those examples have an underlying analytic form for the probability. Here I have only the PDF. For instance, I have data of probabilities for a particular event. Most of the probabilities are zero and hence not unique, but not all.
My goal is to generate the random numbers and then figure out what the distribution is. I'd really appreciate if people can help clear up my thinking here.
EDIT:
I think I want something like:
cdf=cumsum(pdf); % calculate cdf from empirical pdf
M=length(cdf);
xq=linspace(0,1,M);
invcdf=interp1(cdf,xq,xq); % calculate inverse cdf, i.e., x
but how do I take into account that a lot of the values of the pdf are zero and not unique? Is this even the right approach?
I am basing my answer on "Inverse empirical cumulative distribution function" from the MathWorks File Exchange. See that submission for other suggestions on solving your problem.
% y - input: data set
% q - input: desired quantile (can be a scalar or a vector)
% xq - output: ICDF at specified quantile
[f, x] = ecdf(y);
xq = zeros(size(q));
for ii = 1:length(q)
    xq(ii) = min(x(q(ii) <= f));
end
I'd eliminate the for loop if you're only using scalars. Also, there may be a more efficient way to vectorize the for loop, but this should at least get you started.
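For the case in the question, where only a discrete empirical pdf with many zeros is available, one possible sketch of inverse-transform sampling is shown here. The vectors v (support points) and p (probabilities) are hypothetical placeholders. Searching for the first CDF entry that reaches each uniform draw sidesteps the non-unique CDF values that make interp1 fail.

```matlab
% Sketch: draw samples from a discrete empirical pdf that contains zeros.
% v and p are hypothetical placeholders for the values and their probabilities.
v = 1:6;                          % support points (illustrative)
p = [0 0.2 0 0 0.5 0.3];          % empirical probabilities, zeros allowed
c = cumsum(p);
c = c / c(end);                   % empirical CDF, normalized to end at exactly 1
n = 1000;                         % number of samples to draw
u = rand(n, 1);                   % uniform draws on (0,1)
idx = arrayfun(@(t) find(c >= t, 1, 'first'), u);   % smallest index with c >= u
s = v(idx);                       % samples; zero-probability values never occur
```

A histogram of s should then resemble p, which gives a quick check that the generated numbers follow the intended distribution.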

3D Convolution in MATLAB

I'm trying to average blocks in my data to turn my current 336x264x25x27 grid into a new 100x100x27 grid, and I've been trying to use convolution to achieve this.
In a 2-D setting I've been able to convert a 336x264 matrix to 100x100 using the conv2 function in MATLAB. I'm now trying to use convn to accomplish a similar task in 4-D.
I want to average out cells over the first two dimensions so that I end up with a 100x100x27 matrix. My code is as follows:
A = rand(336,264,25,27); % Sample data
A = A(:,:,13,:); % Pick one index along the third dimension (time), which will be constant throughout my output
A = squeeze(A); % Drop the singleton dimension; "A" is now 336x264x27
B = ones(100,100,27); % This is the size of matrix that I would like to achieve. I was under the impression that "B" should be the size of the matrix you want to end up with, but I believe I am mistaken.
C = convn(A,B); % C would hopefully be my 100x100x27 matrix.
Currently, this is resulting in a 435x363x53 matrix. If you could help me with my logic and show me how I might turn "A" into a 100x100x27 matrix using convolution it would be much appreciated!
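As a possible starting point rather than a definitive answer: by default convn returns the full convolution, whose size along each dimension is size(A) + size(B) - 1, which matches the 435x363x53 result (336+100-1, 264+100-1, 27+27-1). The kernel therefore sets the averaging window, not the output size. The sketch below uses small illustrative sizes and a hypothetical block size b. It averages non-overlapping b-by-b blocks in the first two dimensions and leaves the third untouched.

```matlab
% Sketch: why convn(A,B) grows, and one way to block-average with it.
% Small illustrative sizes; b is a hypothetical block size.
A = rand(6, 4, 3);                 % stand-in for the 336x264x27 array
B = ones(2, 2, 3);
C = convn(A, B);                   % full output: size(A) + size(B) - 1 = [7 5 5]

b = 2;                             % average over b-by-b blocks in dims 1 and 2
K = ones(b, b) / (b * b);          % 2-D averaging kernel; third dim untouched
M = convn(A, K, 'valid');          % every sliding b-by-b mean, size [5 3 3]
M = M(1:b:end, 1:b:end, :);        % keep non-overlapping blocks, size [3 2 3]
```

Since 336 and 264 are not integer multiples of 100, an exact 100x100 output would additionally need uneven blocks or interpolation, which this sketch does not attempt.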

Creating 2D points near y=x

I need to generate some random 2D points (for example 30 points) near the y=x line, insert them in a matrix, plot it and then calculate the SVD of the matrix. But since I'm new to MATLAB I don't know how to generate my desired matrix.
Since this looks like homework I'll just post some general ideas here.
randi can be used to get pseudo-random integers. Using that you can create a 2D matrix by duplicating the array and putting the copies side by side. Thus: generate a 30x1 column and duplicate it to get a 30x2 matrix. All rows will have the same two entries, i.e. x=y.
Noise can be added by creating a 30x2 matrix of random numbers with rand and adding it to the previously created matrix.
Check the documentation on svd to see how the singular-value decomposition works; it's fairly straightforward if you know your linear algebra.
Finally for plotting you can use various tools such as image, imagesc, plot, surf and scatter, try them and see which works best for you.
Here is a quick example I made: https://saturnapi.com/fullstack/2d-points-randomly-near-line
x = 13 + 6.*rand(20,1); %// 20 x-values, uniform on [13, 19]
y = x*0.7 + 0.5*rand(20,1); %// points near the line y = 0.7*x, with uniform noise
figure(1);
plot(x,y,'.');
%// Print plot as PNG with resolution of 60 pixels per inch
print("MyPNG.png", "-dpng", "-r60");
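Putting the ideas from this thread together for the y=x case in the question, a minimal sketch might look like this. The coordinate range and the noise level 0.3 are arbitrary illustrative choices, and plotting is left out because it works the same way as in the snippet above.

```matlab
% Sketch: 30 points scattered around y = x, then the SVD of the point matrix.
% The range [0, 10] and noise level 0.3 are arbitrary illustrative choices.
n = 30;
x = 10 * rand(n, 1);               % x coordinates, uniform on [0, 10]
y = x + 0.3 * randn(n, 1);         % y = x plus Gaussian noise
P = [x y];                         % n-by-2 matrix, one point per row

s = svd(P);                        % singular values, largest first
[U, S, V] = svd(P, 'econ');        % V(:,1) is close to [1; 1]/sqrt(2), up to sign
```

Because the points hug a line through the origin, the first singular value dominates the second, and the first right singular vector points along y=x.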

PCA for feature extraction MATLAB

I have a data matrix A of size N-by-M.
I want to use PCA for dimensionality reduction, and I want to set the number of dimensions to 'k'.
I understand that after feature extraction, I should get an N-by-k matrix.
I have tried pcares as follows,
[residuals,reconstructed] = pcares(A,k)
But this does not help me.
I am also trying to use the dr toolbox.
This returns a k-by-k matrix. How do I proceed further?
Any help would be appreciated.
Thank You
pcares gives you the residuals, i.e. the error left when the reconstructed input is subtracted from the input. You can use the pca command instead. It returns an MxM matrix whose columns are the principal components. You can use the first k of them to construct the features:
X = bsxfun(@minus, A, mean(A)) * coeff(:, 1:k);
Here coeff is what is returned from the pca command. The bsxfun call subtracts the column means (it centers the data, as this is what pca does when calculating the output coeff).
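For readers without the Statistics Toolbox, the same projection can be sketched with svd in place of pca. This is a swapped-in technique for illustration: the right singular vectors of the centered data match the columns of coeff up to sign, and the sizes N, M and k below are placeholders.

```matlab
% Sketch: N-by-k feature matrix from the first k principal components.
% Uses svd in place of pca so it runs without toolboxes; N, M, k are illustrative.
N = 50; M = 6; k = 2;
A  = rand(N, M);                          % placeholder data
Ac = A - repmat(mean(A, 1), N, 1);        % center each column
[~, ~, V] = svd(Ac, 'econ');              % columns of V: principal directions
X = Ac * V(:, 1:k);                       % N-by-k reduced features
```

The columns of X are uncorrelated and ordered by decreasing variance, which is the N-by-k result the question asks for.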

Extending a sequence statistically in MATLAB

Are there any built-in functions in MATLAB that would statistically extend a sequence of real numbers to any size I want? I have a sequence of 499 elements and I want to extend it to 4096 elements. Thanks in advance.
If you're wanting to interpolate a vector of 499 elements to a higher resolution of 4096 elements, you can use the INTERP1 function in the following way (where x is your 499-element vector):
y = interp1(x,linspace(1,499,4096));
The above uses the function LINSPACE to generate a 4096-element vector of values spaced linearly between 1 and 499, which is then used as the interpolation points. By default, the INTERP1 function uses linear interpolation to compute new values between the old points. You can use other interpolation methods in the following way:
y = interp1(x,linspace(1,499,4096),'spline'); %# Cubic spline method
y = interp1(x,linspace(1,499,4096),'pchip'); %# Piecewise cubic Hermite method
I don't really understand the word "statistically" in the question, but from your comments it seems that you just need linear (or smooth) interpolation.
Try with interp1q or interp1.
If you know that the data follow a distribution from the Pearson or Johnson system of parametric families, then you can generate more data using the sampling functions pearsrnd and johnsrnd (useful for generating random values without specifying which parametric distribution).
Example:
%# load data, let's say this is a column vector of 499 elements
data = load('data.dat');
%# generate more data using pearsrnd
moments = {mean(data),std(data),skewness(data),kurtosis(data)};
newData = pearsrnd(moments{:}, [4096-499 1]);
%# concat sequences
extendedData = [data; newData];
%# plot histograms (you may need to adjust the num of bins to see the similarity)
subplot(121), hist(data), xlabel('x'), ylabel('Frequency')
subplot(122), hist(extendedData), xlabel('x'), ylabel('Frequency')
or using johnsrnd:
%# generate more data using johnsrnd
quantiles = quantile(data, normcdf([-1.5 -0.5 0.5 1.5]));
newData = johnsrnd(quantiles, [4096-499 1]);
On the other hand, if you want to assume a non-parametric distribution, you can use the ecdf function or the ksdensity function.
Please refer to the demo Nonparametric Estimates of Cumulative Distribution Functions and Their Inverses for a complete example (highly suggested!).
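As a rough, toolbox-free illustration of the non-parametric route (not the method used in the demo), new values can be drawn by inverse-transform sampling from a linearly interpolated empirical CDF. The placeholder data and the plotting positions (i - 0.5)/n are assumptions of this sketch.

```matlab
% Sketch: extend a sample nonparametrically by inverse-transform sampling
% from a linearly interpolated empirical CDF. data is a placeholder vector.
data = randn(499, 1);
xs = sort(data);                              % order statistics
n  = numel(xs);
pp = ((1:n)' - 0.5) / n;                      % plotting positions in (0,1)
u  = pp(1) + (pp(end) - pp(1)) * rand(4096 - n, 1);   % stay inside the data range
newData = interp1(pp, xs, u);                 % interpolated empirical quantiles
extendedData = [data; newData];               % 4096 elements in total
```

Values generated this way never fall outside the observed range, so a kernel estimate such as ksdensity may be preferable when the tails matter.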