How can I create a probability density function in Matlab?

I have a table with two sets of values.
I want to create equally sized bins of x, count the number of values of y in each bin, and then plot the result.
How can I do it?
Data
x y
0 0.0023243872
815.54065 0.0021484715
1111.9492 0.0023388069
1378.9236 0.0021542402
1631.0813 0.0021254013
1927.4899 0.0023618778
2194.3323 0.0021484711
2223.8984 0.0023157364
2446.6221 0.0022868966
2490.8728 0.0023388073
2743.0305 0.0024801167
3009.7410 0.0021917303
3262.1626 0.0022955481
3306.2815 0.0021052146
3335.8479 0.0023330392
3558.5713 0.0024772326
3602.6660 0.0023474589
3825.1497 0.0022292205
4121.6904 0.0021023308
4374.1118 0.0024916520
4447.7969 0.0023935998
4640.5586 0.0022522912
4714.5371 0.0023705289
4937.0991 0.0022263369
5233.6396 0.0021773111
5262.8101 0.0024656970
5455.9673 0.0024339736
5559.7461 0.0024455092
5752.5078 0.0021167498
5752.5078 0.0027021724
5826.4863 0.0023936001
6019.4819 0.0027021721
6048.7842 0.0021686594
6271.3760 0.0024368572
6345.5889 0.0022321043
6567.9165 0.0021167498
6612.3291 0.0022205692
6835.0225 0.0027165920
7131.4312 0.0027483148
7160.6016 0.0023849490
7427.3418 0.0020042793
7457.5381 0.0022032652
7650.2212 0.0021109823
7724.2002 0.0023301556
7724.2783 0.0022090334
7724.2783 0.0021801949
7947.1040 0.0028059918
7947.1040 0.0027425468
8242.3545 0.0019927442
8243.3809 0.0029588358
8465.4980 0.0024455097
8465.4980 0.0022032652
8510.5107 0.0029213454
8539.2910 0.0022148010
8539.2910 0.0020734922
8762.1709 0.0021686594
8762.1709 0.0026070056
8762.7764 0.0028232955
8805.9531 0.0020042795
8806.0313 0.0020590730

I am not sure I understand why you need y.
Is this what you are looking for?
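N = 100 ;
x = rand(N, 1) ;
[M, centers] = hist(x) ; % counts in 10 equally spaced bins (the default) and the bin centers
bar(centers, M) ;        % plot the counts against the bin centers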

I don't think y matters here. You just want to count how many x values fall into each bin; those counts are what you plot on the y axis. You can use hist(x, nbins), where x is your data vector and nbins is the number of bins you want.
See http://www.mathworks.com/help/matlab/ref/hist.html
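Counts alone are not yet a probability density. A minimal sketch, assuming equally spaced bins from min(x) to max(x) and using only the first ten x values from the table as sample data, divides each count by (number of samples times bin width) so that the bar areas sum to 1. The variable names (nBins, edges, pdfEst, centers) are illustrative, not taken from either answer.

```matlab
% Sample data: the first ten x values from the table in the question
x = [0; 815.54065; 1111.9492; 1378.9236; 1631.0813; ...
     1927.4899; 2194.3323; 2223.8984; 2446.6221; 2490.8728];
nBins = 5;                                    % number of equally sized bins
edges = linspace(min(x), max(x), nBins + 1);  % bin edges spanning the data
width = edges(2) - edges(1);                  % common bin width
% Bin index of each value; the maximum value is placed in the last bin
idx = min(floor((x - edges(1)) / width) + 1, nBins);
counts = accumarray(idx, ones(numel(x), 1), [nBins 1]);  % how many x values fall in each bin
pdfEst = counts / (numel(x) * width);         % scale so the bar areas sum to 1
centers = edges(1:end-1)' + width / 2;        % bin centers, e.g. for bar(centers, pdfEst)
```

In releases that have histcounts, histcounts(x, edges, 'Normalization', 'pdf') should give the same densities for these edges; the accumarray version should also run in Octave.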

Related

Which Bins are occupied in a 3D histogram in MatLab

I have 3D data from which I need to calculate properties.
To reduce computation, I want to discretize the space, calculate the properties per bin instead of per data point, and then reassign the properties calculated for each bin back to its data points.
Furthermore, I only want to calculate the bins that contain points.
Since there is no 3D binning function in MATLAB, I use histcounts on each dimension and then search for the unique bins that have been assigned to the data points.
a5pre=compositions(:,1);
a7pre=compositions(:,2);
a8pre=compositions(:,3);
%% BINNING
a5pre_edges=[0,linspace(0.005,0.995,19),1];
a5pre_val=(a5pre_edges(1:end-1) + a5pre_edges(2:end))/2;
a5pre_val(1)=0;
a5pre_val(end)=1;
a7pre_edges=[0,linspace(0.005,0.995,49),1];
a7pre_val=(a7pre_edges(1:end-1) + a7pre_edges(2:end))/2;
a7pre_val(1)=0;
a7pre_val(end)=1;
a8pre_edges=a7pre_edges;
a8pre_val=a7pre_val;
[~,~,bin1]=histcounts(a5pre,a5pre_edges);
[~,~,bin2]=histcounts(a7pre,a7pre_edges);
[~,~,bin3]=histcounts(a8pre,a8pre_edges);
bins=[bin1,bin2,bin3];
[A,~,C]=unique(bins,'rows','stable');
a5pre=a5pre_val(A(:,1));
a7pre=a7pre_val(A(:,2));
a8pre=a8pre_val(A(:,3));
It seems that the unique function is quite time-consuming, so I was wondering if there is a faster way to do it, given that each row contains only integers (bin indices), or perhaps a totally different approach.
Best regards
function [comps,C]=compo_binner(x,y,z,e1,e2,e3,v1,v2,v3)
C=NaN(length(x),1);
comps=NaN(length(x),3);
id=1;
B_temp=zeros(1,3);
for i=1:numel(x)
% bin k holds e(k) <= value < e(k+1); the last edge is included in the last bin (as in histcounts)
B_temp(1)=v1(min(sum(x(i)>=e1),numel(v1)));
B_temp(2)=v2(min(sum(y(i)>=e2),numel(v2)));
B_temp(3)=v3(min(sum(z(i)>=e3),numel(v3)));
% compare whole rows; element-wise ismember would also match permuted rows
C_id=ismember(comps(1:id-1,:),B_temp,'rows');
if any(C_id)
C(i)=find(C_id,1);
else
comps(id,:)=B_temp;
C(i)=id;
id=id+1;
end
end
comps(id:end,:)=[]; % drop the unused preallocated rows
end
But it's much slower than the histcounts/unique version. I can't avoid the find function, and that's a function you certainly want to avoid in a loop when speed matters...
If I understand correctly, you want to compute a 3D histogram. If there's no built-in tool to compute one, it is simple to write one:
function [H, lindices] = histogram3d(data, n)
% histogram3d 3D histogram
% H = histogram3d(data, n) computes a 3D histogram from (x,y,z) values
% in the Nx3 array `data`. `n` is the number of bins between 0 and 1.
% It is assumed all values in `data` are between 0 and 1.
assert(size(data,2) == 3, 'data must be Nx3');
H = zeros(n, n, n);
indices = floor(data * n) + 1;
indices(indices > n) = n;
lindices = sub2ind(size(H), indices(:,1), indices(:,2), indices(:,3));
for ii = 1:size(data,1)
H(lindices(ii)) = H(lindices(ii)) + 1;
end
end
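The explicit for loop inside histogram3d can be replaced by a single accumarray call, which adds 1 at every linear index. A minimal sketch under the same assumption that all values lie in [0, 1]; the four points in data are made up for illustration:

```matlab
% Hypothetical sample data: four points in the unit cube
data = [0.02 0.11 0.97;
        0.03 0.12 0.96;
        0.51 0.52 0.53;
        1.00 0.01 0.26];
n = 20;                                   % bins per dimension
indices = floor(data * n) + 1;            % per-dimension bin index
indices(indices > n) = n;                 % a value of exactly 1 goes into the last bin
lindices = sub2ind([n n n], indices(:,1), indices(:,2), indices(:,3));
% accumarray adds 1 at each linear index, replacing the explicit for loop
H = reshape(accumarray(lindices, ones(size(lindices)), [n^3 1]), [n n n]);
```

The first two points land in the same cell, so that cell ends up with a count of 2.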
Now, given your compositions array, and binning each dimension into 20 bins, we get:
[H, indices] = histogram3d(compositions, 20);
idx = find(H);
[x,y,z] = ind2sub(size(H), idx);
reduced_compositions = ([x,y,z] - 0.5) / 20;
The bin centers for H are at ((1:20)-0.5)/20.
On my machine this runs in a fraction of a second for 5 million input points.
Now, for each compositions(ii,:), you have a number indices(ii), which matches another number idx(jj), corresponding to reduced_compositions(jj,:). One easy way to make the assignment of results is as follows:
H(H > 0) = 1:numel(idx);
indices = H(indices);
Now for each compositions(ii,:), your closest match in the reduced set is reduced_compositions(indices(ii),:).
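Why does H(H > 0) = 1:numel(idx) give the right labels? find and logical-mask assignment both visit H in ascending linear-index order, so the k-th occupied cell gets label k. That is the same labelling unique produces on the linear indices in its default sorted order, as this small sketch with made-up indices on a 3x3x3 grid illustrates:

```matlab
% Hypothetical linear bin indices of six points on a 3x3x3 grid
lindices = [7; 3; 7; 12; 3; 7];
H = accumarray(lindices, ones(size(lindices)), [27 1]);  % points per cell
idx = find(H);                          % occupied cells, in ascending linear order
H(H > 0) = 1:numel(idx);                % label the k-th occupied cell with k
label = H(lindices);                    % row of the reduced set for every point
% unique (default sorted order) assigns the same labels
[~, ~, labelU] = unique(lindices);
```

In the question's terms, this replaces unique(bins, 'rows', 'stable') with a counting pass; note that the labels follow sorted cell order rather than order of first appearance.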

How can I sum data in column 2 using bins in Column 1?

I would like to create 10 equally sized bins of x, sum all values of y inside each bin, and then plot the summed y against x. How can I go about it? Is there a function for it already?
Should I use accumarray?
x y
0 0.0023243872
815.54065 0.0021484715
1111.9492 0.0023388069
1378.9236 0.0021542402
1631.0813 0.0021254013
1927.4899 0.0023618778
2194.3323 0.0021484711
2223.8984 0.0023157364
2446.6221 0.0022868966
2490.8728 0.0023388073
2743.0305 0.0024801167
3009.7410 0.0021917303
3262.1626 0.0022955481
3306.2815 0.0021052146
3335.8479 0.0023330392
3558.5713 0.0024772326
3602.6660 0.0023474589
3825.1497 0.0022292205
4121.6904 0.0021023308
4374.1118 0.0024916520
4447.7969 0.0023935998
4640.5586 0.0022522912
4714.5371 0.0023705289
4937.0991 0.0022263369
5233.6396 0.0021773111
5262.8101 0.0024656970
5455.9673 0.0024339736
5559.7461 0.0024455092
5752.5078 0.0021167498
5752.5078 0.0027021724
5826.4863 0.0023936001
6019.4819 0.0027021721
6048.7842 0.0021686594
6271.3760 0.0024368572
6345.5889 0.0022321043
6567.9165 0.0021167498
6612.3291 0.0022205692
6835.0225 0.0027165920
7131.4312 0.0027483148
7160.6016 0.0023849490
7427.3418 0.0020042793
7457.5381 0.0022032652
7650.2212 0.0021109823
7724.2002 0.0023301556
7724.2783 0.0022090334
7724.2783 0.0021801949
7947.1040 0.0028059918
7947.1040 0.0027425468
8242.3545 0.0019927442
8243.3809 0.0029588358
8465.4980 0.0024455097
8465.4980 0.0022032652
8510.5107 0.0029213454
8539.2910 0.0022148010
8539.2910 0.0020734922
8762.1709 0.0021686594
8762.1709 0.0026070056
8762.7764 0.0028232955
8805.9531 0.0020042795
8806.0313 0.0020590730
Here's a small script that can help you:
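n = 10;%//Number of bins
bins = linspace(0,max(x),n+1);%//Starting and ending points of your bins
x_new = bins(1:n) + 0.5*(bins(2)-bins(1));%//Middle values of your bins
y_new = zeros(size(x_new));
for k = 1:n
    %//Include the left edge (x = 0 is a data point); the last bin also includes max(x)
    if k < n
        inBin = (x>=bins(k))&(x<bins(k+1));
    else
        inBin = (x>=bins(k))&(x<=bins(k+1));
    end
    y_new(k) = sum(y(inBin));
end
plot(x_new,y_new)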
The key is inside the for loop, which uses logical indexing: from vector y we take only those values whose x falls in the bin of interest. The output is a plot of the summed y values against the bin centers.
Hope that helps
You can use the histogram function to put your data into 10 bins:
h=histogram(x,10)
Now you need to sum up the y-values in each bin. The number of elements in each bin is stored in the array h.Values. Because x is sorted in ascending order here, the y-values belonging to each bin are consecutive, so you can sum them with a for loop:
s=zeros(size(h.Values));
start=1;
for i=1:numel(s)
s(i)=sum(y(start:h.Values(i)+start-1));
start=h.Values(i)+start;
end
Now you can plot. Here each value is plotted against the corresponding bin center.
plot(linspace(h.BinLimits(1)+h.BinWidth/2,h.BinLimits(2)-h.BinWidth/2,h.NumBins),s);
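Since the question asks about accumarray: it can do the per-bin sum in one call, and unlike the h.Values loop it does not require x to be sorted. A sketch, assuming bins spanning 0 to max(x) and using the first ten rows of the table with 5 bins so the result is easy to check by hand:

```matlab
% Sample data: the first ten (x, y) rows of the table in the question
x = [0; 815.54065; 1111.9492; 1378.9236; 1631.0813; ...
     1927.4899; 2194.3323; 2223.8984; 2446.6221; 2490.8728];
y = [0.0023243872; 0.0021484715; 0.0023388069; 0.0021542402; 0.0021254013; ...
     0.0023618778; 0.0021484711; 0.0023157364; 0.0022868966; 0.0023388073];
n = 5;                                   % number of bins (10 in the question)
edges = linspace(0, max(x), n + 1);      % equally sized bins from 0 to max(x)
width = edges(2) - edges(1);
bin = min(floor(x / width) + 1, n);      % bin of each x; max(x) goes into the last bin
ySum = accumarray(bin, y, [n 1]);        % sum of y within each bin
centers = edges(1:n)' + width / 2;       % bin centers, e.g. for plot(centers, ySum)
```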

How to interpolate random non monotonic increasing data

So I am working on my thesis and I need to calculate geometric characteristics of an airfoil.
To do this, I need to interpolate the horizontal and vertical coordinates of an airfoil. This is for a tool that will automatically calculate the geometric characteristics from arbitrary airfoil geometry files.
Sometimes the Y values of the airfoil are non-monotonic. Hence, the interp1 command gives an error, since some values in the Y vector are repeated.
Therefore, my question is: how do I recognize and subsequently interpolate non-monotonic data automatically in MATLAB?
Here is a sample data set:
0.999974 0.002176
0.994846 0.002555
0.984945 0.003283
0.973279 0.004131
0.960914 0.005022
0.948350 0.005919
0.935739 0.006810
0.923111 0.007691
0.910478 0.008564
0.897850 0.009428
0.885229 0.010282
0.872617 0.011125
0.860009 0.011960
0.847406 0.012783
0.834807 0.013598
0.822210 0.014402
0.809614 0.015199
0.797021 0.015985
0.784426 0.016764
0.771830 0.017536
0.759236 0.018297
0.746639 0.019053
0.734038 0.019797
0.721440 0.020531
0.708839 0.021256
0.696240 0.021971
0.683641 0.022674
0.671048 0.023367
0.658455 0.024048
0.645865 0.024721
0.633280 0.025378
0.620699 0.026029
0.608123 0.026670
0.595552 0.027299
0.582988 0.027919
0.570436 0.028523
0.557889 0.029115
0.545349 0.029697
0.532818 0.030265
0.520296 0.030820
0.507781 0.031365
0.495276 0.031894
0.482780 0.032414
0.470292 0.032920
0.457812 0.033415
0.445340 0.033898
0.432874 0.034369
0.420416 0.034829
0.407964 0.035275
0.395519 0.035708
0.383083 0.036126
0.370651 0.036530
0.358228 0.036916
0.345814 0.037284
0.333403 0.037629
0.320995 0.037950
0.308592 0.038244
0.296191 0.038506
0.283793 0.038733
0.271398 0.038920
0.259004 0.039061
0.246612 0.039153
0.234221 0.039188
0.221833 0.039162
0.209446 0.039064
0.197067 0.038889
0.184693 0.038628
0.172330 0.038271
0.159986 0.037809
0.147685 0.037231
0.135454 0.036526
0.123360 0.035684
0.111394 0.034690
0.099596 0.033528
0.088011 0.032181
0.076685 0.030635
0.065663 0.028864
0.055015 0.026849
0.044865 0.024579
0.035426 0.022076
0.027030 0.019427
0.019970 0.016771
0.014377 0.014268
0.010159 0.012029
0.007009 0.010051
0.004650 0.008292
0.002879 0.006696
0.001578 0.005207
0.000698 0.003785
0.000198 0.002434
0.000000 0.001190
0.000000 0.000000
0.000258 -0.001992
0.000832 -0.003348
0.001858 -0.004711
0.003426 -0.005982
0.005568 -0.007173
0.008409 -0.008303
0.012185 -0.009379
0.017243 -0.010404
0.023929 -0.011326
0.032338 -0.012056
0.042155 -0.012532
0.052898 -0.012742
0.064198 -0.012720
0.075846 -0.012533
0.087736 -0.012223
0.099803 -0.011837
0.111997 -0.011398
0.124285 -0.010925
0.136634 -0.010429
0.149040 -0.009918
0.161493 -0.009400
0.173985 -0.008878
0.186517 -0.008359
0.199087 -0.007845
0.211686 -0.007340
0.224315 -0.006846
0.236968 -0.006364
0.249641 -0.005898
0.262329 -0.005451
0.275030 -0.005022
0.287738 -0.004615
0.300450 -0.004231
0.313158 -0.003870
0.325864 -0.003534
0.338565 -0.003224
0.351261 -0.002939
0.363955 -0.002680
0.376646 -0.002447
0.389333 -0.002239
0.402018 -0.002057
0.414702 -0.001899
0.427381 -0.001766
0.440057 -0.001656
0.452730 -0.001566
0.465409 -0.001496
0.478092 -0.001443
0.490780 -0.001407
0.503470 -0.001381
0.516157 -0.001369
0.528844 -0.001364
0.541527 -0.001368
0.554213 -0.001376
0.566894 -0.001386
0.579575 -0.001398
0.592254 -0.001410
0.604934 -0.001424
0.617614 -0.001434
0.630291 -0.001437
0.642967 -0.001443
0.655644 -0.001442
0.668323 -0.001439
0.681003 -0.001437
0.693683 -0.001440
0.706365 -0.001442
0.719048 -0.001444
0.731731 -0.001446
0.744416 -0.001443
0.757102 -0.001445
0.769790 -0.001444
0.782480 -0.001445
0.795173 -0.001446
0.807870 -0.001446
0.820569 -0.001446
0.833273 -0.001446
0.845984 -0.001448
0.858698 -0.001448
0.871422 -0.001451
0.884148 -0.001448
0.896868 -0.001446
0.909585 -0.001443
0.922302 -0.001445
0.935019 -0.001446
0.947730 -0.001446
0.960405 -0.001439
0.972917 -0.001437
0.984788 -0.001441
0.994843 -0.001441
1.000019 -0.001441
First column is X and the second column is Y. Notice how the last values of Y are repeated.
Maybe someone can provide me with a piece of code to do this? Any suggestions are welcome as well.
Remember, I need to automate this process.
Thanks for your time and effort I really appreciate it!
There is a quick-and-dirty method if you do not know the exact function defining the foil profile: split your data into two sets, the top and bottom surfaces, so that the x data within each set are monotonic.
First I imported your data table in the variable A, then:
%// just reorganise your input in individual vectors. (this is optional but
%// if you do not do it you'll have to adjust the code below)
x = A(:,1) ;
y = A(:,2) ;
ipos = y > 0 ; %// indices of the top plane
ineg = y <= 0 ; %// indices of the bottom plane
xi = linspace(0,1,500) ; %// new Xi for interpolation
ypos = interp1( x(ipos) , y(ipos) , xi ) ; %// re-interp the top plane
yneg = interp1( x(ineg) , y(ineg) , xi ) ; %// re-interp the bottom plane
y_new = [fliplr(yneg) ypos] ; %// stitches the two half data sets together
x_new = [fliplr(xi) xi] ;
%% // display
figure
plot(x,y,'o')
hold on
plot(x_new,y_new,'.r')
axis equal
As said above, it is quick and dirty. This approach greatly improves the x resolution where the profile is close to horizontal, but you lose a bit of resolution at the nose of the foil, where the profile is close to vertical.
If that's acceptable, you're all set. If you really need the resolution at the nose, you could interpolate on x as above but use a very fine x grid near the nose (instead of the regular x grid I provided as an example).
If you replace the xi definition above with:
xi = [linspace(0,0.01,50) linspace(0.01,1,500)] ;
you get a much finer resolution near the nose.
Adjust that to your needs.
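Splitting on the sign of y assumes the whole upper surface has y > 0 and the whole lower surface has y <= 0, which may not hold for strongly cambered profiles. An alternative, sketched here under the assumption that the points are ordered as in the posted file (trailing edge, upper surface, leading edge, lower surface, trailing edge) and that only one point has the minimum x, is to split at the leading-edge index. The sample A is a subset of the posted rows.

```matlab
% Subset of the posted rows: upper surface from the trailing edge, then lower surface
A = [0.999974  0.002176;
     0.495276  0.031894;
     0.246612  0.039153;
     0.099596  0.033528;
     0.000000  0.000000;
     0.099803 -0.011837;
     0.249641 -0.005898;
     0.503470 -0.001381;
     1.000019 -0.001441];
x = A(:,1);
y = A(:,2);
[~, iLE] = min(x);                                   % leading edge = point with the smallest x
xTop = flipud(x(1:iLE));  yTop = flipud(y(1:iLE));   % upper surface, x ascending
xBot = x(iLE:end);        yBot = y(iLE:end);         % lower surface, x ascending
xi = linspace(0, 1, 201)';                           % common query grid
yiTop = interp1(xTop, yTop, xi);   % NaN where xi lies outside the surface's x range
yiBot = interp1(xBot, yBot, xi);
```

In the full posted file two points have x = 0, so one of them would need to be dropped (or handled separately) before calling interp1, which requires distinct sample points.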
To interpolate, there must be a function defined. When you define y=f(x), you cannot have the same x for two different values of y, because then we are not talking about a function. In your example data, neither x nor y is monotonic, so whichever way you slice it, you'll have two (or more) "y"s for the same "x". If you wish to interpolate, you need to divide this into two separate problems, top and bottom, and define proper functions for interp1/interp2/interpn to work with; for example, split the data at the leading edge, where x==0. In any case, you would have to provide more information than just x or y alone, e.g. x=0.5 and y on the top surface.
On the other hand, if all you want to do is insert a few points between each pair of consecutive (x, y) points in your array, you can do this using finite differences:
%// xy is your N-by-2 data array (x in column 1, y in column 2)
%// transform it into a 3D array where x is in the first slice and y in the second
xy = permute(xy(85:95,:), [3,1,2]); %// 85:95 is near x=0 in your data
%// let's say you want to insert three additional points along each line between every two points on the airfoil
h = [0, 0.25, 0.5, 0.75].'; %// steps along each line - column vector
%// every interpolated point along the way between f(x(n)) and f(x(n+1)) can
%// be defined as: f(x(n) + h) = f(x(n)) + h*( f(x(n+1)) - f(x(n)) )
%// this is the first-order finite-difference approximation in 1D; 2D works
%// the same way, only with the gradient
%// from here it's just matrix manipulation
%// 2D gradient of the xy curve
gradxy = diff(xy, 1, 2); %// diff xy, first order, along the 2nd dimension, where x and y now run
h_times_gradxy = bsxfun(@times, h, gradxy); %// gradient times step size
xy_in_3d_array = bsxfun(@plus, xy(:,1:end-1,:), h_times_gradxy); %// add "f(x)": the new x and y for every step h
[x,y] = deal(xy_in_3d_array(:,:,1), xy_in_3d_array(:,:,2)); %// extract x and y from the 3D array
xy_interp = [x(:), y(:)]; %// use MATLAB's linear indexing to flatten x and y into columns
%// plot to check results
figure; ax = newplot; hold on;
plot(ax, xy(:,:,1), xy(:,:,2),'o-');
plot(ax, xy_interp(:,1), xy_interp(:,2),'+')
legend('Original','Interpolated','Location','best');
axis tight;
grid;
%// The End
The plot produced by this code shows the results near x=0 (rows 85:95), chosen for clarity of presentation.
Hope that helps.
Cheers.
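Another way to sidestep the "not a function" problem described above: the point index increases strictly along the curve even where x and y do not, so x and y can each be interpolated as functions of the index with interp1. For the same steps h, this sketch should produce the same points as the finite-difference code above (up to rounding); it uses five consecutive rows of the posted data (rows 85 to 89) as sample input.

```matlab
% Five consecutive points from the posted table (rows 85-89, near the leading edge)
xy = [0.007009  0.010051;
      0.004650  0.008292;
      0.002879  0.006696;
      0.001578  0.005207;
      0.000698  0.003785];
t = (1:size(xy,1))';                 % point index used as the curve parameter
h = [0, 0.25, 0.5, 0.75];            % fractional steps between neighbouring points
% Query parameters segment by segment: t(1)+h, t(2)+h, ...
tq = reshape(bsxfun(@plus, h', t(1:end-1)'), [], 1);
xy_interp = interp1(t, xy, tq);      % linear interpolation of both columns at once
```

As in the finite-difference version, the last original point is not repeated at the end of xy_interp.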

add constant c to each element of specific part of matrix, avoiding loop

I have an m-by-n matrix X of return values, and I want to add a constant c to each element of the following submatrix Y of X:
Y = X(end-4:end,:)
Is there a way to do this without a loop?
Thanks for any help!
Generate some sample data
X = rand(6,6)
X =
0.9696054 0.7389534 0.7440913 0.2781074 0.0622399 0.0154607
0.8043438 0.8845991 0.1999374 0.2341657 0.6345166 0.8774855
0.0092971 0.1108798 0.1118406 0.6249466 0.3932468 0.4050876
0.6970928 0.1084640 0.0937833 0.8243776 0.7633255 0.0650740
0.3161001 0.4452197 0.1290970 0.5837050 0.5709813 0.2331514
0.0739229 0.5626630 0.8300330 0.9590604 0.0852536 0.0225583
I set Y=X so that X and Y have the same dimensions and it is easy to see where the addition occurred. It's for display purposes, really.
Y=X;
Add the constant to the elements you want, element-wise
Y(end-4:end,:) = X(end-4:end,:)+4
Y =
0.969605 0.738953 0.744091 0.278107 0.062240 0.015461
4.804344 4.884599 4.199937 4.234166 4.634517 4.877485
4.009297 4.110880 4.111841 4.624947 4.393247 4.405088
4.697093 4.108464 4.093783 4.824378 4.763326 4.065074
4.316100 4.445220 4.129097 4.583705 4.570981 4.233151
4.073923 4.562663 4.830033 4.959060 4.085254 4.022558
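The selected elements (the last five rows, here rows 2 to 6) have each been increased by 4; row 1 is unchanged. With your constant c, use Y(end-4:end,:) = X(end-4:end,:) + c.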
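With a general constant c, the indexed assignment is all that's needed. A logical mask is an equivalent alternative that also covers selections that are not a contiguous block. A small sketch using magic(6) as sample data (any m-by-n matrix works):

```matlab
X = magic(6);                 % sample 6x6 matrix
c = 4;                        % constant to add
% Option 1: indexed assignment on the last five rows
Y1 = X;
Y1(end-4:end, :) = Y1(end-4:end, :) + c;
% Option 2: a logical mask selecting the same elements; works for any pattern
mask = false(size(X));
mask(end-4:end, :) = true;
Y2 = X + c * mask;            % adds c where mask is true, 0 elsewhere
```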

Binning in matlab

I have been unable to find a function in MATLAB or Octave to do what I want.
I have a matrix m of two columns (x and y values). I know that I can extract a column with m(:,1) or m(:,2). I want to split it into smaller matrices of [potentially] equal size and plot the mean of these matrices. In other words, I want to put the values into bins based on the x values, then find the mean of each bin. I feel like the hist function should help me, but it doesn't seem to.
Does anyone know of a built-in function to do something like this?
Edit:
I had intended to mention that I looked at hist and couldn't get it to do what I wanted, but it must have slipped my mind.
Example: Let's say I have the following (I'm trying this in Octave, but AFAIK it works in MATLAB):
x=1:20;
y=[1:10,10:-1:1];
m=[x', y'];
If I want 10 bins, I would like m to be split into:
m1=[1:2, 1:2]
...
m5=[9:10, 9:10]
m6=[11:12, 10:-1:9]
...
m10=[19:20, 2:-1:1]
and then get the mean of each bin.
Update: I have posted a follow-up question here. I would greatly appreciate responses.
I have answered this in video form on my blog:
http://blogs.mathworks.com/videos/2009/01/07/binning-data-in-matlab/
Here is the code:
m = rand(10,2); %Generate data
x = m(:,1); %split into x and y
y = m(:,2);
topEdge = 1; % define limits
botEdge = 0; % define limits
numBins = 2; % define number of bins
binEdges = linspace(botEdge, topEdge, numBins+1);
[h,whichBin] = histc(x, binEdges);
binMean = zeros(1, numBins); % preallocate the result
for i = 1:numBins
flagBinMembers = (whichBin == i);
binMembers = y(flagBinMembers);
binMean(i) = mean(binMembers);
end
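For the mean per bin, accumarray with @mean can replace the loop. This sketch uses the example from the question (x = 1:20, y rising from 1 to 10 and back down), with 10 equally sized bins between min(x) and max(x); empty bins, if any, are filled with NaN. accumarray is available in both MATLAB and Octave.

```matlab
x = (1:20)';                               % the example from the question
y = [1:10, 10:-1:1]';
numBins = 10;
binEdges = linspace(min(x), max(x), numBins + 1);
binWidth = binEdges(2) - binEdges(1);
% Bin of each x; max(x) goes into the last bin
whichBin = min(floor((x - binEdges(1)) / binWidth) + 1, numBins);
% Mean of y in each bin; NaN marks bins that received no points
binMean = accumarray(whichBin, y, [numBins 1], @mean, NaN);
binCenters = binEdges(1:end-1)' + binWidth / 2;   % e.g. plot(binCenters, binMean)
```

Each bin here holds two consecutive points, matching the m1 ... m10 split described in the question.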