scipy kde contour to probability

I have created a bivariate Gaussian kernel density estimate using scipy.stats:
from scipy.stats import gaussian_kde
k = gaussian_kde(data.T)
However, I don't understand how the contours or z values relate to the probability of a new data point falling within those contours.
[Figure: generated distribution with contours]
I would like to be able to define a contour equivalent to p = 0.001, or, put another way, a contour enclosing 99.9% of expected observations.
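The KDE's z values are densities (probability per unit area), not probabilities. The probability of landing inside a contour is the integral of the density over the region it encloses. One common way to get a "99.9%" contour is a highest-density-region threshold: find the density level below which only 0.1% of draws from the KDE fall. Below is a minimal Monte Carlo sketch of that idea. It assumes `data` has shape (N, 2) as in the question. The synthetic data and the helper name `density_threshold` are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde


def density_threshold(kde, mass=0.999, n_samples=100_000, seed=0):
    """Estimate the density level whose contour encloses roughly `mass` of the KDE's probability."""
    # Draw points from the KDE itself; the result has shape (dims, n_samples).
    samples = kde.resample(n_samples, seed=seed)
    # Evaluate the estimated density at each drawn point.
    heights = kde(samples)
    # Only a fraction (1 - mass) of draws should fall where the density is below the threshold.
    return np.quantile(heights, 1.0 - mass)


# Hypothetical stand-in for the question's data: N points in 2D, shape (N, 2).
rng = np.random.default_rng(42)
data = rng.normal(size=(300, 2))

k = gaussian_kde(data.T)
level = density_threshold(k, mass=0.999)
print(f"density level for the 99.9% contour: {level:.3g}")
```

To draw the contour, evaluate `k` on a grid and pass `levels=[level]` to matplotlib's `contour`. Because the threshold is a Monte Carlo estimate, it varies slightly with `n_samples` and `seed`.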

Related

Would you please help me modify my MATLAB code for calculating the surface charge density distribution on a sphere?

I have a question about calculating the surface charge density distribution on a sub-wavelength sphere illuminated by an electromagnetic plane wave with wave vector k0 that excites plasmonic resonances. To investigate these modes, I need the surface charge density distribution on the surface of the sphere. According to many references, I must calculate the scalar product (E_out - E_in) · n, where n is the outward unit normal, which on the surface of the sphere equals the radial unit vector r. I use MATLAB for my calculations. To get the surface charge density distribution, I computed the (E_out - E_in) components in the spherical system and then converted them to Cartesian components so that I could use the surf command, but they are complex numbers. I then used the abs command to get real numbers. However, I need a one-dimensional matrix of real scalar elements that can be positive or negative, and abs gives me only positive numbers, so I can't plot a polar graph showing the charge distribution. Please guide me if possible.
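For time-harmonic fields, the complex values are phasors. The physical (instantaneous) quantity is the real part of the phasor multiplied by the time factor, and that keeps its sign, whereas abs discards it. Since (E_out - E_in) · r is already a scalar (the jump in the radial component, scaled by ε0 for the total surface charge), no Cartesian conversion should be needed for that step. Below is a minimal numpy sketch. It assumes an exp(-iωt) time convention and a hypothetical dipole-like complex amplitude A·cos θ in place of real field data.

```python
import numpy as np

# Polar angle around the sphere (0 to 2*pi so a polar plot closes).
theta = np.linspace(0.0, 2.0 * np.pi, 361)

# Hypothetical complex amplitude of a dipole-like charge pattern; in a real
# calculation this would be eps0 * (Er_out - Er_in) evaluated on the surface.
A = 2.0 * np.exp(1j * 0.7)
sigma_hat = A * np.cos(theta)


def instantaneous_charge(sigma_hat, phase):
    """Real, signed charge density at phase omega*t, assuming an exp(-i*omega*t) convention."""
    return np.real(sigma_hat * np.exp(-1j * phase))


snapshot = instantaneous_charge(sigma_hat, phase=0.0)
magnitude = np.abs(sigma_hat)  # always >= 0: the sign information is lost

print(snapshot.min(), snapshot.max())
```

Plotting `snapshot` against `theta` shows both positive and negative lobes. Choosing `phase = np.angle(A)` gives the snapshot with the largest swing. In MATLAB, the same step is `real(...)` applied to the complex array.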

Lognormal curve fitting equation

I have run into a very peculiar problem. It might seem silly to a lot of you, but I am in dire need of a way out. I am analyzing sets of high-speed images with MATLAB. The image of interest (https://www.dropbox.com/s/h4h26y3mvpao8m6/sample.png?dl=0) is an average of 3000 images (background subtracted). As shown in the picture, I am reading the pixel intensities along columns. Since this is a laser beam, the beam profile away from the wall has the shape of a Gaussian distribution. As I approach the wall (the brightest part at the right of the image), some effect turns the shape into something like a log-normal distribution. In this spreadsheet (https://www.dropbox.com/s/yeim06a5cq3iqg8/sample.xlsx?dl=0) I have pasted the raw intensities as I read through from point A to point B. Column D has the raw intensities and column E has the values obtained with a 'sgolay' fit of the column D values. If I plot these, the result pretty much has the shape of a lognormal distribution. I can get mu and sigma with the 'lognfit' or 'fitdist' functions. Now the question is: what is the equation [expressed as a function of pixel location (x) or pixel intensity (y)] of the fitted lognormal curve that could be used to recreate it? Your help is highly appreciated.
lognfit estimates the mu and sigma of the lognormal distribution. mu is the mean of the logarithms of the values, and sigma is the standard deviation of the logarithms of the values. You can refer to https://en.wikipedia.org/wiki/Log-normal_distribution for the shape of the function given mu and sigma.
With lognrnd(mu,sigma) you can generate samples from the same distribution:
https://it.mathworks.com/help/stats/lognrnd.html?searchHighlight=lognrnd&s_tid=srchtitle_lognrnd_1
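For reference, the lognormal density with parameters mu and sigma is f(x) = exp(-(ln x - mu)^2 / (2 sigma^2)) / (x sigma sqrt(2 pi)) for x > 0, and 0 otherwise. Its peak (mode) is at x = exp(mu - sigma^2). Below is a minimal sketch that evaluates it. The `amplitude` scaling is a hypothetical extra step for overlaying the curve on intensity data. Note that lognfit itself returns only mu and sigma, and it fits a distribution to samples rather than a curve to (x, y) pairs.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Lognormal probability density with log-mean mu and log-std sigma; zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognormal_curve(x, mu, sigma, amplitude=1.0):
    """Hypothetical scaled curve y = amplitude * f(x) for overlaying on intensity profiles."""
    return amplitude * lognormal_pdf(x, mu, sigma)


mu, sigma = 1.2, 0.4  # example parameters, e.g. as returned by lognfit
mode = math.exp(mu - sigma**2)  # the curve peaks at x = exp(mu - sigma^2)
print(mode, lognormal_pdf(mode, mu, sigma))
```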

A MATLAB programming discrepancy for a Gaussian distribution

In my homework, I am required to show that a method can generate a Gaussian distribution. The MATLAB program is shown below:
n=100;
b=25;
len=200000;
X=rand(n,len);
x=sum(X-0.5)*b/n;
[ps2,t2]=hist(x,50);
ps2=ps2/len;
bar(t2,ps2,'y');
hold on;
sigma_2=b^2/(12*n);
R=normrnd(0,sqrt(sigma_2),1,len);
[ps2,t2]=hist(R,50);
ps2=ps2/len;
plot(t2,ps2,'bo-','linewidth',1.5);
x is the sum of n uniformly distributed variables (each shifted by -0.5), multiplied by b/n, so by the central limit theorem x is approximately Gaussian with zero mean and variance sigma^2 = b^2/(12n).
Then I got a plot in which the two distributions matched.
However, when I substituted t2 into the normal density function f(x)=exp(-x.^2/(2*sigma_2))/sqrt(2*pi*sigma_2), the output was much larger than the first one, although the shape was similar.
I wonder why this occurs.
It's because you did not normalize the discrete histograms. For a continuous distribution, the integral of the probability density function is one, so to compare a histogram with the density you should divide the histogram by its integral. An approximate integral of a discrete function is the rectangular integral:
integral(f) ≈ sum(f) * LengthStep
so you should change your code this way:
n=100;
b=25;
len=200000;
X=rand(n,len);
x=sum(X-0.5)*b/n;
[ps2,t2]=hist(x,50);
ps2=ps2/(sum(ps2)*(t2(2)-t2(1))); % normalize discrete distribution
bar(t2,ps2,'y');
hold on;
sigma_2=b^2/(12*n);
R=normrnd(0,sqrt(sigma_2),1,len);
[ps2,t2]=hist(R,50);
ps2=ps2/(sum(ps2)*(t2(2)-t2(1))); % normalize discrete distribution
plot(t2,ps2,'bo-','linewidth',1.5);
hold on
plot(t2,exp(-t2.^2/(2*sigma_2))/sqrt(2*pi*sigma_2),'r'); %plot continuous distribution
And this is the result: [Figure: normalized histograms overlaid with the continuous Gaussian density]
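Here is the same normalization in numpy, as a sketch. Dividing the bin counts by (total count × bin width) turns them into a density estimate whose rectangular integral is 1. This is also what `np.histogram(..., density=True)` computes. The sketch assumes equal-width bins and compares the result against the Gaussian formula from the question.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_2 = 25**2 / (12 * 100)
R = rng.normal(0.0, np.sqrt(sigma_2), size=200_000)

counts, edges = np.histogram(R, bins=50)
width = edges[1] - edges[0]
centers = 0.5 * (edges[:-1] + edges[1:])

# Normalize so that sum(density) * width == 1, as in the answer's MATLAB fix.
density = counts / (counts.sum() * width)

# Continuous Gaussian density evaluated at the bin centers.
pdf = np.exp(-centers**2 / (2 * sigma_2)) / np.sqrt(2 * np.pi * sigma_2)
```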

Generate random samples from arbitrary discrete probability density function in Matlab

I have an arbitrary probability density function discretized as a matrix in MATLAB; that is, for every pair (x,y) the probability is stored in the matrix:
A(x,y) = probability
This is a 100x100 matrix, and I would like to generate random two-dimensional samples (x,y) from this matrix and also, if possible, calculate the mean and other moments of the PDF. I want to do this because, after resampling, I want to fit an approximate Gaussian mixture model to the samples.
I've been looking everywhere but I haven't found anything as specific as this. I hope you may be able to help me.
Thank you.
If you really have a discrete probability density function defined by A (as opposed to a continuous probability density function that is merely described by A), you can "cheat" by turning your 2D problem into a 1D problem.
%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2)); %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)]; %all y values
%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);
%calculate your fake 1D CDF, assumes sum(A(:))==1
CDF = cumsum(A); %remember, the first term out of cumsum is not zero
%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];
%generate random values
N_vals = 1000; %give me 1000 values
rand_vals = rand(N_vals,1); %spans zero to one
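%look into CDF to see which index each rand val falls into: counting the
%CDF entries below each value works even when A contains zeros (interp1
%fails there, because repeated CDF values are not unique grid points)
ind = sum(rand_vals > CDF', 2); %uses implicit expansion (R2016b or later)
ind = min(ind, numel(A)); %guard against round-off if sum(A) is slightly below 1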
%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];
I hope that this helps!
Chip
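Here is the same flatten-then-invert-the-CDF idea in numpy, as a sketch. `np.searchsorted` replaces the interp1/ceil lookup, and it never selects zero-probability cells. The 3×4 matrix `A` is hypothetical and assumed to be non-negative and to sum to 1. Indices here are 0-based, unlike MATLAB's.

```python
import numpy as np


def sample_discrete_2d(A, n_samples, rng):
    """Draw (row, col) index pairs with probability A[row, col]; A must be non-negative and sum to 1."""
    flat = A.ravel()                                # 2D problem -> 1D problem
    cdf = np.cumsum(flat)
    u = rng.random(n_samples) * cdf[-1]             # scale by cdf[-1] to absorb round-off in the sum
    idx = np.searchsorted(cdf, u, side="right")     # first index whose cdf value exceeds u
    rows, cols = np.unravel_index(idx, A.shape)
    return np.column_stack([rows, cols])


A = np.array([[0.0, 0.1, 0.1, 0.0],
              [0.2, 0.0, 0.1, 0.1],
              [0.0, 0.3, 0.0, 0.1]])
rng = np.random.default_rng(0)
xy = sample_discrete_2d(A, 100_000, rng)
```

Sample statistics such as `xy.mean(axis=0)` then approximate the moments of the discrete distribution.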
I don't believe MATLAB has built-in functionality for generating multivariate random variables with an arbitrary distribution. As a matter of fact, the same is true for univariate random numbers. But while the latter can easily be generated by inverting the cumulative distribution function, a multivariate CDF cannot be inverted in the same direct way, so generating such numbers is much messier (the main problem is that two or more variables can be correlated). So this part of your question is far beyond the scope of this site.
Since half an answer is better than no answer, here's how you can compute the mean and higher moments numerically using MATLAB:
%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);
%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1
%plot it if you like
%figure;
%surf(x,y,A)
%actual half-answer starts here
%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2
%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));
So, the point is that you can perform a double integral on a rectangular mesh using two consecutive calls to trapz. This allows you to compute the integral of any quantity that has the same shape as your mesh, but a drawback is that vector components have to be computed independently. If you only wish to compute things that can be parametrized with x and y (which are naturally the same size as your mesh), then you can get along without any additional thinking.
You could also define a function for the integration:
function res=trapz2(xv,yv,A,arg)
if ~isscalar(arg) && any(size(arg)~=size(A))
error('Size of A and arg must be the same!')
end
res=trapz(xv,trapz(yv,A.*arg));
end
This way you can compute stuff like
weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);
NOTE: the reason I used a 101x100 mesh in the example is that the double call to trapz should be performed in the proper order. If you interchange xv and yv in the calls, you get the wrong answer due to inconsistency with the definition of A, but this will not be evident if A is square. I suggest avoiding symmetric quantities during the development stage.
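Below is a numpy analogue of the nested-trapz approach, as a sketch. NumPy 2 names the function `np.trapezoid` (older releases call it `np.trapz`), so the code picks whichever exists. As in the answer, the inner integral runs over y (the rows of the meshgrid) and the outer one over x. The Gaussian test density is hypothetical.

```python
import numpy as np

# np.trapezoid in NumPy >= 2.0, np.trapz in older releases.
trapezoid = getattr(np, "trapezoid", None) or np.trapz


def trapz2(xv, yv, A, arg=1.0):
    """Integrate A*arg over a meshgrid(xv, yv) grid: inner over y (axis 0), outer over x."""
    return trapezoid(trapezoid(A * arg, yv, axis=0), xv)


xv = np.linspace(-50, 50, 101)
yv = np.linspace(-30, 30, 100)
x, y = np.meshgrid(xv, yv)                      # shape (100, 101), like MATLAB's meshgrid

A = np.exp(-((x - 10) ** 2 + (y + 5) ** 2) / 100)
A = A / trapz2(xv, yv, A)                       # normalize to unit integral

mean_x = trapz2(xv, yv, A, x)
mean_y = trapz2(xv, yv, A, y)
var_x = trapz2(xv, yv, A, (x - mean_x) ** 2)    # a second (central) moment
```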

How to use MATLAB contourf to draw a two-dimensional decision boundary

I finished training an SVM and have data X and Y, where X is a feature matrix with only 2 dimensions and Y holds the classification labels. Because the data is only two-dimensional, I would like to draw the decision boundary to show the surface of the support vectors.
I use contourf in MATLAB for this, but I find it really hard to understand how to use the function.
I tried:
#1 try:
contourf(X);
#2 try:
contourf([X(:,1) X(:,2) Y]);
#3 try:
Z(:,:,1)=X(Y==1,:);
Z(:,:,2)=X(Y==2,:);
contourf(Z);
None of these work correctly. I checked the MATLAB help files, but most of them define Z as a function, so I really do not know how to form the correct Z matrix.
If you're using the svmtrain and svmclassify commands from Bioinformatics Toolbox, you can just use the additional input argument (...'showplot', true), and it will display a scatter plot with a decision boundary and the support vectors highlighted.
If you're using your own SVM, or a third-party tool such as libSVM, what you probably need to do is to:
Create a grid of points in your 2D input feature space using the meshgrid command
Classify those points using your trained SVM
Plot the grid of points and the classifications using contourf.
For example, in kind-of-MATLAB-but-pseudocode, assuming your input features are called X1 and X2:
numPtsInGrid = 100;
x1Range = linspace(x1lower, x1upper, numPtsInGrid);
x2Range = linspace(x2lower, x2upper, numPtsInGrid);
[X1, X2] = meshgrid(x1Range, x2Range);
Z = classifyWithMySVMSomehow([X1(:), X2(:)]);
contourf(X1, X2, reshape(Z, size(X1))) % contourf needs Z as a matrix the same size as X1 and X2
Hope that helps.
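Below is the same grid, classify, reshape recipe as a small numpy sketch. The key detail is reshaping the flat predictions back to the grid's shape before contouring, since contour functions expect 2D arrays rather than vectors. The nearest-centroid rule `classify` and its centroids are hypothetical stand-ins for a trained SVM, and the grid bounds are assumptions.

```python
import numpy as np

# Hypothetical class centroids standing in for a trained classifier.
centroids = np.array([[-1.0, 0.0], [1.0, 0.5]])


def classify(points):
    """Hypothetical classifier: label (1 or 2) of the nearest centroid for each row of points."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1) + 1


num_pts = 100
x1_range = np.linspace(-3, 3, num_pts)
x2_range = np.linspace(-2, 2, num_pts)
X1, X2 = np.meshgrid(x1_range, x2_range)            # step 1: grid over the feature space

labels = classify(np.column_stack([X1.ravel(), X2.ravel()]))  # step 2: classify every grid point
Z = labels.reshape(X1.shape)                         # back to grid shape for contour/contourf
# step 3 (plotting) would be e.g. matplotlib's plt.contourf(X1, X2, Z)
```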
I know it's been a while, but I will give it a try in case someone else runs into this issue.
Assume we have a 2D training set for an SVM model; in other words, the feature space is two-dimensional. We know that a kernel SVM model leads to a score (or decision) function of the form:
$$f(x) = \sum_{i=1}^{N} a_i y_i \, k(x, x_i) + b$$
where N is the number of support vectors, x_i is the i-th support vector, a_i is the estimated Lagrange multiplier, and y_i is the associated class label. The values (scores) of the decision function reflect, in a way, the distance of the observation x from the decision boundary, and their sign indicates which side of the boundary x lies on.
Now assume that for every point (X,Y) in the 2D feature space we can find the corresponding score of the decision function. We can plot the results in 3D Euclidean space, where X corresponds to values of the first feature f1, Y to values of the second feature f2, and Z to the return value of the decision function for every point (X,Y). The intersection of this 3D surface with the Z=0 plane gives us the decision boundary in the two-dimensional feature space. In other words, the decision boundary is formed by the (X,Y) points whose scores equal 0. Seems logical, right?
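To make the score function concrete, here is a small numpy sketch of f(x) with an RBF kernel k(x, x_i) = exp(-gamma ||x - x_i||^2). The two support vectors, the multipliers, gamma, and b are hypothetical values, chosen so that the boundary f = 0 is the line x1 = 0. Evaluating f on a grid shows the sign change at that line.

```python
import numpy as np

support_vectors = np.array([[-1.0, 0.0], [1.0, 0.0]])  # hypothetical x_i
alphas = np.array([1.0, 1.0])                           # hypothetical a_i
labels = np.array([-1.0, 1.0])                          # y_i
b = 0.0
gamma = 1.0


def decision_function(points):
    """f(x) = sum_i a_i y_i k(x, x_i) + b for each row of points, with an RBF kernel."""
    sq_dist = ((points[:, None, :] - support_vectors[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-gamma * sq_dist)
    return kernel @ (alphas * labels) + b


# Scores on a grid: the decision boundary is the Z = 0 level set, where the sign changes.
g = np.linspace(-2, 2, 41)
G1, G2 = np.meshgrid(g, g)
scores = decision_function(np.column_stack([G1.ravel(), G2.ravel()])).reshape(G1.shape)
```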
Now, in MATLAB, you can easily do that by first creating a grid in X,Y space:
d = 0.02;
[x1Grid,x2Grid] = meshgrid(minimum_X:d:maximum_X,minimum_Y:d:maximum_Y);
d is selected according to the desired resolution of the grid.
Then, for a trained model SVMModel, find the scores of every grid point:
xGrid = [x1Grid(:),x2Grid(:)];
[~,scores] = predict(SVMModel,xGrid);
Finally, plot the decision boundary:
figure;
contour(x1Grid,x2Grid,reshape(scores(:,2),size(x1Grid)),[0 0],'k');
contour gives us a 2D graph in which information about the third dimension is depicted as lines in the 2D plane. These lines are iso-response curves, in other words (X,Y) points with the same Z value. In our case, the contour at level 0 gives us the decision boundary.
I hope this made all of that clearer. You can find very useful information and examples at the following links:
MATLAB's example
Representation of decision function in 3D space