Matlab, how to calculate AUC (Area Under Curve)?

I have the file data.txt with two columns and N rows, something like this:
0.009943796 0.4667975
0.009795735 0.46777886
0.009623984 0.46897832
0.009564759 0.46941447
0.009546991 0.4703958
0.009428543 0.47224948
0.009375241 0.47475737
0.009298249 0.4767201
[...]
Each pair of values in the file gives the coordinates (x,y) of one point.
When plotted, these points form a curve. I would like to calculate the area under this curve (AUC).
So I load the data:
data = load("data.txt");
X = data(:,1);
Y = data(:,2);
So, X contains all the x coordinates of the points, and Y all the y coordinates.
How could I calculate the area under curve (AUC) ?

The easiest way is the trapezoidal rule function trapz.
If your data is known to be smooth, you could try Simpson's rule, but MATLAB has no built-in function for integrating sampled data with Simpson's rule. (I'm also not sure how to apply it to x/y data where x is not evenly spaced.)

Just add
AUC = trapz(X,Y)
to your program and you will get the area under the curve.

You can also apply the trapezoidal rule by hand:
AUC = sum((Y(1:end-1)+Y(2:end))/2.*...
(X(2:end)-X(1:end-1)));
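One detail worth checking with the data from the question: the x values in data.txt decrease from row to row, and the trapezoidal rule then returns a negative number. The sketch below uses only the eight rows shown in the question and sorts by x before integrating. The variable names (A_raw, Xs, order, A_manual) are made up for illustration.

```matlab
% First eight rows of data.txt as shown in the question (x decreases down the file)
X = [0.009943796; 0.009795735; 0.009623984; 0.009564759; ...
     0.009546991; 0.009428543; 0.009375241; 0.009298249];
Y = [0.4667975; 0.46777886; 0.46897832; 0.46941447; ...
     0.4703958; 0.47224948; 0.47475737; 0.4767201];

A_raw = trapz(X, Y);          % negative here, because X is in decreasing order

[Xs, order] = sort(X);        % sort by x so integration runs left to right
Ys = Y(order);
A = trapz(Xs, Ys);            % positive area under the curve

% Same trapezoidal rule written out by hand
A_manual = sum((Ys(1:end-1) + Ys(2:end)) / 2 .* diff(Xs));
```

If the points are already ordered and only the direction is reversed, abs(trapz(X,Y)) gives the same value as the sorted version.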

An example in MATLAB to help you get your answer ...
x=[3 10 15 20 25 30];
y=[27 14.5 9.4 6.7 5.3 4.5];
trapz(x,y)
If y contains negative values and you only want the area above the x-axis, clip them to zero first:
y=max(y,0)

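% If AUC here means the area under a ROC curve, perfcurve (Statistics Toolbox) returns it as the fourth output
[~,~,~,AUC] = perfcurve(labels,scores,posclass);
% posclass is the label of the positive class, e.g. 1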
http://www.mathworks.com/matlabcentral/newsreader/view_thread/252131

There are alternatives to trapz for anyone ready to do some coding themselves. This link shows an implementation of Simpson's rule, with Python code included. There is also a File Exchange submission on Simpson's rule.
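As a rough illustration of what such a hand-written Simpson's rule might look like, here is a minimal sketch of the composite rule. It assumes evenly spaced x values and an odd number of points, which the data in the question does not satisfy, so it is shown on a synthetic test function (sin on [0, pi], whose exact integral is 2). All names are illustrative.

```matlab
% Composite Simpson's rule for evenly spaced samples (odd number of points)
x = linspace(0, pi, 101);     % 101 points = 100 intervals (must be even)
y = sin(x);                   % synthetic test data, exact integral is 2
h = x(2) - x(1);              % constant spacing
n = numel(x);
assert(mod(n, 2) == 1, 'Simpson''s rule needs an odd number of points');

% Weights 1, 4, 2, 4, ..., 2, 4, 1 times h/3
I_simpson = h / 3 * (y(1) + 4 * sum(y(2:2:end-1)) ...
                          + 2 * sum(y(3:2:end-2)) + y(end));

I_trapz = trapz(x, y);        % trapezoidal rule for comparison

err_simpson = abs(I_simpson - 2);
err_trapz   = abs(I_trapz - 2);
```

On smooth data like this, err_simpson comes out much smaller than err_trapz. For unevenly spaced or noisy measurements, trapz remains the safer choice.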

Related

Creating 2D points near y=x

I need to generate some random 2D points (for example 30 points) near the y=x line, insert them in a matrix, plot it and then calculate the SVD of the matrix. But since I'm new to MATLAB, I don't know how to generate the matrix I want.
Since this looks like homework I'll just post some general ideas here.
randi can be used to get pseudo-random integers. Using that you can create a 2D matrix by duplicating the array and putting the copies together. Thus: generate a 30x1 column and duplicate it into a 30x2 matrix. All rows will have the same two entries, i.e. x=y.
Noise can be added by creating a 30x2 matrix of random numbers with rand and simply adding it to the previously created matrix.
Check the documentation on svd to see how the singular-value decomposition works; it's fairly straightforward if you know your linear algebra.
Finally for plotting you can use various tools such as image, imagesc, plot, surf and scatter, try them and see which works best for you.
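The ideas above could be put together roughly as follows. This is only a sketch under a few assumptions of mine: it draws x with rand instead of randi, adds zero-mean noise to the y column only, and uses an arbitrary range of [0, 10] and a noise width of 0.5.

```matlab
n = 30;                               % number of points
x = 10 * rand(n, 1);                  % x-coordinates, uniform in [0, 10]
noise = 0.5 * (rand(n, 1) - 0.5);     % zero-mean noise in [-0.25, 0.25]
P = [x, x + noise];                   % 30x2 matrix, one point (x, y) per row

figure;
scatter(P(:,1), P(:,2), 'filled');    % the noisy points
hold on;
plot([0 10], [0 10], 'r--');          % the reference line y = x
hold off;

s = svd(P);                           % the two singular values of P
```

Because the points lie close to a line through the origin, the first singular value should be much larger than the second.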
Here is a quick example I made: https://saturnapi.com/fullstack/2d-points-randomly-near-line
x = 13 + 6.*rand(20,1);        %// 20 x-values, uniform in [13, 19]
y = x*0.7 + 0.5*rand(20,1);    %// points near the line y = 0.7*x; use y = x + noise for y = x
figure(1);
plot(x,y,'.');
%// Print plot as PNG with resolution of 60 pixels per inch
print("MyPNG.png", "-dpng", "-r60");

3-D Plotting with MATLAB for Galton's Skewness and Moor's Kurtosis

I know there are many plotting documents for Matlab online and I am pretty sure that this has been asked many times. I apologize in advance for any inconvenience.
I am dealing with a new distribution and I need to draw a 3D plot for different values of the parameters (I could do it with Excel or another program, but my other graphs are drawn with MATLAB, so I need this 3D plot in MATLAB too in order to publish it in an article). I calculated the results using MATLAB loops, but plotting is giving me the hardest time, so I had no other choice but to ask for your assistance. I have these equations for different alphas and betas with a constant sigma, and I calculate Galton's Skewness and Moor's Kurtosis with the last two equations.
median=sqrt(2*(sigma^2)*beta*gammaincinv(0.5,alpha));
q1=sqrt(2*(sigma^2)*beta*gammaincinv((6/8),alpha));
q3=sqrt(2*(sigma^2)*beta*gammaincinv((2/8),alpha));
q4=sqrt(2*(sigma^2)*beta*gammaincinv((7/8),alpha));
q5=sqrt(2*(sigma^2)*beta*gammaincinv((5/8),alpha));
q6=sqrt(2*(sigma^2)*beta*gammaincinv((3/8),alpha));
q7=sqrt(2*(sigma^2)*beta*gammaincinv((1/8),alpha));
galtonskewness=(q1-2*median+q3)/(q1-q3);
moorskurtosis=(q4-q5+q6-q7)/(q1-q3);
Let's assume that,
sigma=1
beta=[0.1 0.2 0.5 1 2 5];
alpha=[0.1 0.2 0.5 1 2 5];
I have used mesh(X,Y,Z) for the same range of alphas and betas with the same increment, but I get the error "these values cannot be complex". I just want to draw something like the one below.
It must be something easy that I am missing, but I do not understand where the mistake is. I appreciate any help. Thank you!
I ran the above code for a 2D mesh of points with alpha and beta both between 0.1 and 5, and I got results for both quantities.
I suspect the problem is your alpha and beta declaration. You are only providing a few points, and if you try to use mesh with them, you won't get good results. Therefore, define a meshgrid of points for both alpha and beta, then vectorize your MATLAB code to produce the kurtosis and skewness surfaces. Only under certain situations should you use for loops. In general, you should avoid using them whenever possible.
How meshgrid works is that given a range of X and Y values, it will produce two (or three if you want 3D co-ordinates) arrays where each location in each array gives you the spatial co-ordinate at that particular location. Therefore, if we did something like:
[X,Y] = meshgrid(1:3, 1:3);
This is what we get:
X =
1 2 3
1 2 3
1 2 3
Y =
1 1 1
2 2 2
3 3 3
Notice that in a 2D grid, for the top-left corner, (x,y) = (1,1), and so for the corresponding location in X, we get 1 and Y we get 1. If you do the same logic for any other position in the 2D grid, you simply look at the X and Y values in each array and it will tell you what the component is for each dimension.
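To make the layout concrete, here is a small sketch with made-up inputs that compares a vectorized computation on the meshgrid output with the equivalent double loop. The function Z = X + Y and the input vectors are arbitrary choices for illustration.

```matlab
x = 1:3;                      % values along the horizontal (column) direction
y = 10:10:40;                 % values along the vertical (row) direction
[X, Y] = meshgrid(x, y);      % both are numel(y)-by-numel(x), i.e. 4x3

% Vectorized: every grid point is evaluated in one expression
Z = X + Y;

% Equivalent double loop: row index i follows y, column index j follows x
Zloop = zeros(numel(y), numel(x));
for i = 1:numel(y)
    for j = 1:numel(x)
        Zloop(i, j) = x(j) + y(i);
    end
end

same = isequal(Z, Zloop);     % true: both approaches give the same matrix
```

The point to remember is that X(i,j) equals x(j) and Y(i,j) equals y(i), so rows follow y and columns follow x.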
As such, instead of looping through all possible points in your grid, generate them all using meshgrid, then vectorize the computation by calculating your values all at once rather than individually. Once you do this, you have the right structure to be able to put this into mesh.
Therefore, try doing this instead:
%// Define meshgrid of points
[alpha,beta] = meshgrid(0.1:0.1:5, 0.1:0.1:5);
%// From your code
sigma = 1;
%// Calculate quantities - Notice that this is all vectorized
med=sqrt(2*(sigma^2)*beta.*gammaincinv(0.5,alpha));
q1=sqrt(2*(sigma^2)*beta.*gammaincinv((6/8),alpha));
q3=sqrt(2*(sigma^2)*beta.*gammaincinv((2/8),alpha));
q4=sqrt(2*(sigma^2)*beta.*gammaincinv((7/8),alpha));
q5=sqrt(2*(sigma^2)*beta.*gammaincinv((5/8),alpha));
q6=sqrt(2*(sigma^2)*beta.*gammaincinv((3/8),alpha));
q7=sqrt(2*(sigma^2)*beta.*gammaincinv((1/8),alpha));
galtonskewness=(q1-2*med+q3)./(q1-q3);
moorskurtosis=(q4-q5+q6-q7)./(q1-q3);
%// Show our meshes
figure;
mesh(alpha, beta, galtonskewness);
figure;
mesh(alpha, beta, moorskurtosis);
Also take note that I renamed your median variable to med. MATLAB has a function called median and so you don't want to unintentionally shadow over this function with a variable of the same name.
This is what I get:
Take note that I'm not getting the plots that you have placed in your post. It may be because I'm choosing the wrong variables to define the mesh, or perhaps your equations are incorrect. Double check what you have here in code against what you know in theory, and try again.
This should hopefully give you enough to start with though!

matlab: cdfplot of relative error

The figure above is the cumulative distribution function (CDF) plot of the relative error (the code used to generate it is attached). The relative error is defined as abs(measured-predicted)/(measured). What could be wrong, or how should I interpret this, given that the plot is supposed to be a smooth curve?
X = load('measured.txt');
Xhat = load('predicted.txt');
idx = find(X>0);
x = X(idx);
xhat = Xhat(idx);
relativeError = abs(x-xhat)./(x);
cdfplot(relativeError);
The input data file is a 4x4 matrix with zeros on the diagonal and some unmeasured entries (represented by 0). I would appreciate your help. Thanks!
The plot should be a discontinuous one because you are using discrete data. You are not plotting an analytic function that explicitly (or implicitly) maps, say, x to y. Instead, all you have is at most 16 points that relate x and y.
The CDF only "grows" when new samples are counted; otherwise its value stays constant, simply because there is no new sample that could increase the "frequency".
You can check the example in MathWorks' cdfplot documentation to understand the concept of an "empirical cdf". Again, the cdf can only increase where you observe a sample.
If you really want to "get" a smooth curve, either 1) add more points so that the discontinuous line looks smoother, or 2) find any statistical model of whatever you are working on, and plot the analytic function instead.
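The step shape described above can be reproduced without cdfplot. The sketch below builds an empirical CDF by hand from a handful of made-up relative errors (the numbers are invented, not taken from the question) and draws it with stairs.

```matlab
relErr = [0.05 0.20 0.10 0.40 0.10 0.30];   % made-up relative errors
xs = sort(relErr(:));                       % sample values in ascending order
n  = numel(xs);
F  = (1:n)' / n;                            % CDF jumps by 1/n at each sample

figure;
stairs([xs(1); xs], [0; F]);                % start at 0, then step up at each sample
xlabel('relative error');
ylabel('empirical CDF');
```

With only six samples the curve has at most six jumps. It only starts to look smooth once there are many samples.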

How can I find equation of a plot connecting data points in Matlab?

I have various plots (with hold on) as shown in the following figure:
I would like to know how to find the equations of these six curves in Matlab. Thanks.
I found the interactive fitting tool in Matlab simple and helpful, though somewhat limited in scope:
The graph above seems to be linear interpolation. Given vectors X and Y of data, where X contains the arguments and Y the function points, you could do
f = interp1(X, Y, x)
to get the linearly interpolated value f(x). For example if the data is
X = [0 1 2 3 4 5];
Y = [0 1 4 9 16 25];
then
y = interp1(X, Y, 1.5)
gives 2.5, a very rough approximation to 1.5^2 = 2.25. interp1 will match the graph exactly, but you might be interested in fancier curve-fitting operations, such as spline approximations.
Does rxns stand for reactions? In that case, your curves are most likely exponential. An exponential function has the form: y = a*exp(b * x) . In your case, y is the width of mixing zone, and x is the time in years. Now, all you need to do is run exponential regression in Matlab to find the optimal values of parameters a and b, and you'll have your equations.
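One simple way to run such an exponential regression is to fit a straight line to log(y), since log(y) = log(a) + b*x. The sketch below uses synthetic, noise-free data generated with a = 2 and b = 0.3 (values I picked for illustration) and requires all y values to be positive.

```matlab
x = (0:5)';                       % synthetic time values
y = 2 * exp(0.3 * x);             % synthetic data with a = 2, b = 0.3

p = polyfit(x, log(y), 1);        % straight-line fit: log(y) = p(1)*x + p(2)
b = p(1);                         % growth rate
a = exp(p(2));                    % prefactor

yfit = a * exp(b * x);            % fitted curve at the data points
```

Note that fitting in log space weights the errors differently from a direct nonlinear least-squares fit of y = a*exp(b*x), so the two approaches can give slightly different parameters on noisy data.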
My advice, though there may be better answers, is to look at the rate of increase of the curve. For example, a cubic is more representative than a quadratic if the curve rises quickly; fit the polynomial and compute the deviation error. For irregular curves, you might try spline fitting. I believe there is also a MATLAB toolbox for spline fitting.
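Comparing polynomial degrees by their deviation error might look like the sketch below. The data is synthetic and generated from a cubic that I chose for illustration, so the cubic fit is expected to win here.

```matlab
x = (0:0.5:5)';                       % synthetic sample points
y = 0.5 * x.^3 - x + 2;               % synthetic data from a known cubic

p2 = polyfit(x, y, 2);                % quadratic fit
p3 = polyfit(x, y, 3);                % cubic fit

% Root-mean-square deviation of each fit from the data
rms2 = sqrt(mean((polyval(p2, x) - y).^2));
rms3 = sqrt(mean((polyval(p3, x) - y).^2));
```

On real measurements a higher degree always reduces the error somewhat, so look for the degree beyond which the improvement becomes negligible.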
There is a way to extract information from your graph using the current figure handle (gcf).
For example, you can get the series that were plotted in a graph:
% Some figure is created and data are plotted on it
figure;
hold on;
A = [1 2 3 4 5 7]; % Dummy data
B = A.*A; % Some other dummy data
plot(A,B);
plot(A.*3,B-1);
% These three lines of code will get you the series that were plotted on your graph
lh=findall(gcf,'type','line'); % Extract the plotted lines from the figure handle
xp=get(lh,'xdata'); % Extract the Xs (a cell array when there are several lines)
yp=get(lh,'ydata'); % Extract the Ys (a cell array when there are several lines)
Other information can be retrieved with findall(gcf,...) in the same way.

How to create 3D joint density plot MATLAB?

I'm having a problem with creating a joint density function from data. What I have is queue sizes from a stock as two vectors saved as:
X = [askQueueSize bidQueueSize];
I then use the hist3-function to create a 3D histogram. This is what I get:
http://dl.dropbox.com/u/709705/hist-plot.png
What I want is to have the Z-axis normalized so that it goes from 0 to 1.
How do I do that? Or does someone have a great joint density MATLAB function in stock?
This is similar (How to draw probability density function in MatLab?) but in 2D.
What I want is 3D with x:ask queue, y:bid queue, z:probability.
Would greatly appreciate if someone could help me with this, because I've hit a wall over here.
I couldn't see a simple way of doing this. You can get the histogram counts back from hist3 using
[N C] = hist3(X);
and the idea would be to normalise them with:
N = N / sum(N(:));
but I can't find a nice way to plot them back to a histogram afterwards (You can use bar3(N), but I think the axes labels will need to be set manually).
The solution I ended up with involves modifying the code of hist3. If you have access to this (edit hist3) then this may work for you, but I'm not really sure what the legal situation is (you need a licence for the statistics toolbox, if you copy hist3 and modify it yourself, this is probably not legal).
Anyway, I found the place where the data is being prepared for a surf plot. There are 3 matrices corresponding to x, y, and z. Just before the contents of the z matrix were calculated (line 256), I inserted:
n = n / sum(n(:));
which normalises the count matrix.
Finally, once the histogram is plotted, you can set the z-axis limits with:
zlim([0, 1]);
if necessary.
With help from someone on the MathWorks forum, this is the solution I ended up with:
(data_x and data_y are the column vectors of values you want to pass to hist3)
x = min_x:step:max_x; % x-axis bin centers you want to see
y = min_y:step:max_y; % y-axis bin centers you want to see
[X,Y] = meshgrid(x,y); % important for "surf" - builds the grid
pdf = hist3([data_x , data_y],{x y}); % standard hist3 (evaluated on your axes)
pdf_normalize = (pdf'./length(data_x)); % normalization: divide by length of data_x (or data_y)
figure()
surf(X,Y,pdf_normalize) % plot distribution
This gave me the joint density plot in 3D, which can be checked by calculating the integral over the surface with:
integralOverDensityPlot = sum(trapz(pdf_normalize));
As the variable step goes to zero, the variable integralOverDensityPlot goes to 1.0.
Hope this helps someone!
There is a fast way to do this with the hist3 function:
[bins centers] = hist3(X); % X should be matrix with two columns
c_1 = centers{1};
c_2 = centers{2};
pdf = bins / (sum(sum(bins))*(c_1(2)-c_1(1)) * (c_2(2)-c_2(1)));
If you "integrate" this you will get 1.
sum(sum(pdf * (c_1(2)-c_1(1)) * (c_2(2)-c_2(1))))
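For readers without the Statistics Toolbox, the same normalization (counts divided by the number of samples times the bin area) can be sketched with accumarray instead of hist3. Everything here is an assumption made for illustration: the data is random, the bin count of 20 is arbitrary, and the bins are equal-width between the minimum and maximum of each column.

```matlab
n = 5000;
data = randn(n, 2);                   % made-up data: column 1 is x, column 2 is y
nb = 20;                              % number of bins per axis (arbitrary)

% Equal-width bin edges spanning each column
xe = linspace(min(data(:,1)), max(data(:,1)), nb + 1);
ye = linspace(min(data(:,2)), max(data(:,2)), nb + 1);
dx = xe(2) - xe(1);
dy = ye(2) - ye(1);

% Bin index of every sample; the maximum value is clamped into the last bin
ix = min(floor((data(:,1) - xe(1)) / dx) + 1, nb);
iy = min(floor((data(:,2) - ye(1)) / dy) + 1, nb);

counts  = accumarray([ix iy], 1, [nb nb]);   % 2D histogram counts
density = counts / (n * dx * dy);            % joint density estimate
total   = sum(density(:)) * dx * dy;         % integrates to 1

% Plot against bin centers; surf expects rows to follow y, hence the transpose
xc = xe(1:end-1) + dx / 2;
yc = ye(1:end-1) + dy / 2;
figure;
surf(xc, yc, density');
xlabel('x'); ylabel('y'); zlabel('density');
```

The z values are densities, so they can exceed 1 when the bins are small. If you want probabilities per bin that lie in [0, 1], use counts / n instead.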