Scatter plot with density in Matlab - matlab

I would like to plot data set 1 and data set 2 in one plot vertical. Unfortunately the data is huge, so it is just a smear of points and can't see the density. I tried hist3 and other suggestions but it overwrites my data sets and the binning looks awful.
Is there another way to plot scatter density plots? Is there really no Matlab function for it? If not, which program could I use to easy generate such a plot?
A mix between this two examples:
(source: bcgsc.ca)

Thanks to #Emil Albert for a correction (a transpose was missing)
What's wrong with computing hist3 and displaying the result with imagesc?
data1 = randn(1,1e5); %// example data
data2 = randn(1,1e5) + .5*data1 ; %// example data correlated to above
values = hist3([data1(:) data2(:)],[51 51]);
imagesc(values.')
colorbar
axis equal
axis xy
If you want to have the axes in accordance with the true data values: use the second output of hist3 to obtain the positions of the bin centers, and pass them to imagesc:
data1 = randn(1,1e5); %// example data
data2 = 2*randn(1,1e5) + 1.2*data1 + 4; %// example data correlated to above
[values, centers] = hist3([data1(:) data2(:)],[51 51]);
imagesc(centers{:}, values.')
colorbar
axis xy

Try Violin Plot submission on File Exchange. It's very customizable. I use it all the time. Thanks to #Jonas.

Related

Draw 3D model of more than one curve in matlab with vectors

I'm doing a project that involves making a 3D model of the cornea in matlab. I have 6 plot3 in the same graph to draw one cornea
but now i want a surface plot.
Don't mind the curve orientation.
Note that all the plot3 have x, y and z that are vectors
Thanks in advance
If I were you I would use the Surf command doku surf. It is used to display [x,y,z] data. Since you have not have as many touples of data (just 6) you will have to interpolate all the other values. Therefore I would use the scattered interpolant function doku scattered interpolant.
!!!!!!!!!!!!!!Take care all this is pseudocode!!!!!!!!!!!!!!!!
F = scatteredInterpolant(x_existing,y_existing,z_existing);
generates a scattered interpolant object. You do already feed your already existing data in there. Afterwards you generate the points at which you want to interpolate:
%generates samples from -4 t0 4 in 0.05 steps
[x_sample,y_sample] = meshgrid(-4:0.05:4,-4:0.05:4);
Now you calculate the fitted z values using the scattered interpolant obj
z_interpolated=F(x_sample,y_sample) %interpolates
surf(x_sample,y_sample,z_interpolated) %plots with surf between -4 and 4
!!!!!!!!!!!!!!!From here working code!!!!!!!!!!!!!!!!!!!!!
%serialiasation of data (special for this usecase)
x_data=[h0(30:632,6);(a30(28:408,3))+0.527;(a60(276:632,3));(a90(26:575,3))+3.417;(a120(188:586,3))-0.6625;(a150(16:380,3))+1.173];
y_data=[(h0(30:632,5));((a30(28:408,2))-0.9128);(a60(276:632,2));(a90(26:575,2));(a120(188:586,2))-0.3825;((a150(16:380,2))+2.032)];
z_data=[yA0;yA30+0.162;yA60;yA90+0.837;yA120+0.135;yA150+0.135];
% cleaning the data of nan values
x_data=x_data(~isnan(z_data));
y_data=y_data(~isnan(z_data));
z_data=z_data(~isnan(z_data));%random for the looks
%interpolating
F=scatteredInterpolant(x_data,y_data,z_data);
%read yourself what this does
F.Method = 'natural';
F.ExtrapolationMethod = 'none';
%choosing sample points
[x_sample,y_sample] = meshgrid(-6:0.05:6,-6:0.05:6);
%interpolation
z_interpolated=F(x_sample,y_sample);
%plot
surf(x_sample,y_sample,z_interpolated)
I hope I was able to help you. If you try it and it works it would be very nice of you to post the working code here so that in the future here stands a working solution.

Difficulty plotting correct IFFT of 2D function from given data

I am trying to read in a 2D data set into a matrix, plot the matrix, as well as plot the IFFT of the matrix. The data is 128x2 data set, with frequency in the first column (A) vs amplitude in the second column (B)
Unfortunately, plotting the matrix of the data is not plotting the correct waveform. Also, the IFFT seems to be incorrect as well.
waves = csvread('10cm.txt');
A = waves(:,1);
B = abs(waves(:,2));
Matrix = [A B];
waves_transform = abs(ifft2(waves));
figure, plot(waves);
figure, plot(waves_transform)
When I read in each column of the data and plot A vs B, the waveform of the data is correct but the ifft2 of the data is incorrect. I need to properly take the inverse Fourier transform of the two dimensional data that I have read in.
waves = csvread('10cm.txt');
A = waves(:,1);
B = abs(waves(:,2));
Matrix = [A B];
waves_transform = abs(ifft2(Matrix));
figure, plot(A,B);
figure, plot(waves_transform)
waves & waves_transform
Does anyone know why reading in the data and plotting it is different than reading in each of the columns and plotting it results in different graphs? Also, can anyone help me take the IFFT of the 2D data correctly?
10cm.txt DATA FILE HERE: http://pastebin.com/0t0TwVvC
According to MATLAB documentation, if you do plot(Y) and Y is a matrix, then the plot function plots the columns of Y versus their row number. The x-axis scale ranges from 1 to the number of rows in Y.
So, in your case you have to do:
plot(waves(:,1), waves(:,2))
Might I also suggest a free and IMO better numpy package for python

Matlab: plotting frequency distribution with a curve

I have to plot 10 frequency distributions on one graph. In order to keep things tidy, I would like to avoid making a histogram with bins and would prefer having lines that follow the contour of each histogram plot.
I tried the following
[counts, bins] = hist(data);
plot(bins, counts)
But this gives me a very inexact and jagged line.
I read about ksdensity, which gives me a nice curve, but it changes the scaling of my y-axis and I need to be able to read the frequencies from the y-axis.
Can you recommend anything else?
You're using the default number of bins for your histogram and, I will assume, for your kernel density estimation calculations.
Depending on how many data points you have, that will certainly not be optimal, as you've discovered. The first thing to try is to calculate the optimum bin width to give the smoothest curve while simultaneously preserving the underlying PDF as best as possible. (see also here, here, and here);
If you still don't like how smooth the resulting plot is, you could try using the bins output from hist as a further input to ksdensity. Perhaps something like this:
[kcounts,kbins] = ksdensity(data,bins,'npoints',length(bins));
I don't have your data, so you may have to play with the parameters a bit to get exactly what you want.
Alternatively, you could try fitting a spline through the points that you get from hist and plotting that instead.
Some code:
data = randn(1,1e4);
optN = sshist(data);
figure(1)
[N,Center] = hist(data);
[Nopt,CenterOpt] = hist(data,optN);
[f,xi] = ksdensity(data,CenterOpt);
dN = mode(diff(Center));
dNopt = mode(diff(CenterOpt));
plot(Center,N/dN,'.-',CenterOpt,Nopt/dNopt,'.-',xi,f*length(data),'.-')
legend('Default','Optimum','ksdensity')
The result:
Note that the "optimum" bin width preserves some of the fine structure of the distribution (I had to run this a couple times to get the spikes) while the ksdensity gives a smooth curve. Depending on what you're looking for in your data, that may be either good or bad.
How about interpolating with splines?
nbins = 10; %// number of bins for original histogram
n_interp = 500; %// number of values for interpolation
[counts, bins] = hist(data, nbins);
bins_interp = linspace(bins(1), bins(end), n_interp);
counts_interp = interp1(bins, counts, bins_interp, 'spline');
plot(bins, counts) %// original histogram
figure
plot(bins_interp, counts_interp) %// interpolated histogram
Example: let
data = randn(1,1e4);
Original histogram:
Interpolated:
Following your code, the y axis in the above figures gives the count, not the probability density. To get probability density you need to normalize:
normalization = 1/(bins(2)-bins(1))/sum(counts);
plot(bins, counts*normalization) %// original histogram
plot(bins_interp, counts_interp*normalization) %// interpolated histogram
Check: total area should be approximately 1:
>> trapz(bins_interp, counts_interp*normalization)
ans =
1.0009

How to create 3D joint density plot MATLAB?

I 'm having a problem with creating a joint density function from data. What I have is queue sizes from a stock as two vectors saved as:
X = [askQueueSize bidQueueSize];
I then use the hist3-function to create a 3D histogram. This is what I get:
http://dl.dropbox.com/u/709705/hist-plot.png
What I want is to have the Z-axis normalized so that it goes from [0 1].
How do I do that? Or do someone have a great joint density matlab function on stock?
This is similar (How to draw probability density function in MatLab?) but in 2D.
What I want is 3D with x:ask queue, y:bid queue, z:probability.
Would greatly appreciate if someone could help me with this, because I've hit a wall over here.
I couldn't see a simple way of doing this. You can get the histogram counts back from hist3 using
[N C] = hist3(X);
and the idea would be to normalise them with:
N = N / sum(N(:));
but I can't find a nice way to plot them back to a histogram afterwards (You can use bar3(N), but I think the axes labels will need to be set manually).
The solution I ended up with involves modifying the code of hist3. If you have access to this (edit hist3) then this may work for you, but I'm not really sure what the legal situation is (you need a licence for the statistics toolbox, if you copy hist3 and modify it yourself, this is probably not legal).
Anyway, I found the place where the data is being prepared for a surf plot. There are 3 matrices corresponding to x, y, and z. Just before the contents of the z matrix were calculated (line 256), I inserted:
n = n / sum(n(:));
which normalises the count matrix.
Finally once the histogram is plotted, you can set the axis limits with:
xlim([0, 1]);
if necessary.
With help from a guy at mathworks forum, this is the great solution I ended up with:
(data_x and data_y are values, which you want to calculate at hist3)
x = min_x:step:max_x; % axis x, which you want to see
y = min_y:step:max_y; % axis y, which you want to see
[X,Y] = meshgrid(x,y); *%important for "surf" - makes defined grid*
pdf = hist3([data_x , data_y],{x y}); %standard hist3 (calculated for yours axis)
pdf_normalize = (pdf'./length(data_x)); %normalization means devide it by length of
%data_x (or data_y)
figure()
surf(X,Y,pdf_normalize) % plot distribution
This gave me the joint density plot in 3D. Which can be checked by calculating the integral over the surface with:
integralOverDensityPlot = sum(trapz(pdf_normalize));
When the variable step goes to zero the variable integralOverDensityPlot goes to 1.0
Hope this help someone!
There is a fast way how to do this with hist3 function:
[bins centers] = hist3(X); % X should be matrix with two columns
c_1 = centers{1};
c_2 = centers{2};
pdf = bins / (sum(sum(bins))*(c_1(2)-c_1(1)) * (c_2(2)-c_2(1)));
If you "integrate" this you will get 1.
sum(sum(pdf * (c_1(2)-c_1(1)) * (c_2(2)-c_2(1))))

Matlab cdfplot: how to control the spacing of the marker spacing

I have a Matlab figure I want to use in a paper. This figure contains multiple cdfplots.
Now the problem is that I cannot use the markers because the become very dense in the plot.
If i want to make the samples sparse I have to drop some samples from the cdfplot which will result in a different cdfplot line.
How can I add enough markers while maintaining the actual line?
One method is to get XData/YData properties from your curves follow solution (1) from #ephsmith and set it back. Here is an example for one curve.
y = evrnd(0,3,100,1); %# random data
%# original data
subplot(1,2,1)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
%# reduced data
subplot(1,2,2)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
xdata = get(h,'XData');
ydata = get(h,'YData');
set(h,'XData',xdata(1:5:end));
set(h,'YData',ydata(1:5:end));
Another method is to calculate empirical CDF separately using ECDF function, then reduce the results before plotting with PLOT.
y = evrnd(0,3,100,1); %# random data
[f, x] = ecdf(y);
%# original data
subplot(1,2,1)
plot(x,f,'*')
%# reduced data
subplot(1,2,2)
plot(x(1:5:end),f(1:5:end),'r*')
Result
I know this is potentially unnecessary given MATLAB's built-in functions (in the Statistics Toolbox anyway) but it may be of use to other viewers who do not have access to the toolbox.
The empirical CMF (CDF) is essentially the cumulative sum of the empirical PMF. The latter is attainable in MATLAB via the hist function. In order to get a nice approximation to the empirical PMF, the number of bins must be selected appropriately. In the following example, I assume that 64 bins is good enough for your data.
%# compute a histogram with 64 bins for the data points stored in y
[f,x]=hist(y,64);
%# convert the frequency points in f to proportions
f = f./sum(f);
%# compute the cumulative sum of the empirical PMF
cmf = cumsum(f);
Now you can choose how many points you'd like to plot by using the reduced data example given by yuk.
n=20 ; % number of total data markers in the curve graph
M_n = round(linspace(1,numel(y),n)) ; % indices of markers
% plot the whole line, and markers for selected data points
plot(x,y,'b-',y(M_n),y(M_n),'rs')
verry simple.....
try reducing the marker size.
x = rand(10000,1);
y = x + rand(10000,1);
plot(x,y,'b.','markersize',1);
For publishing purposes I tend to use the plot tools on the figure window. This allow you to tweak all of the plot parameters and immediately see the result.
If the problem is that you have too many data points, you can:
1). Plot using every nth sample of the data. Experiment to find an n that results in the look you want.
2). I typically fit curves to my data and add a few sparsely placed markers to plots of the fits to differentiate the curves.
Honestly, for publishing purposes I have always found that choosing different 'LineStyle' or 'LineWidth' properties for the lines gives much cleaner results than using different markers. This would also be a lot easier than trying to downsample your data, and for plots made with CDFPLOT I find that markers simply occlude the stairstep nature of the lines.