Find the sum of two histograms and plot it - matlab

Good day! I have two histograms, and I can draw them overlaid on top of one another. But I need to sum the counts in each bin and plot the resulting summed histogram.
I cannot sum them because the histograms are built with a function, and I don't see how to get the bin values out of it.
N = 50; % Total amount
x1 = randn(N,1); % Normally distributed numbers
x2 = rand(N,1)*2; % Uniformly distributed numbers on [0, 2)
k = -5:0.5:5;
R1 = histogram(x1, k)
hold on
histogram(x2, k)
grid on

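Maybe you need [x1,x2] for histogram. histogram treats a matrix as one combined data set, so this plots the bin-wise sum (use [x1;x2] if the vectors have different lengths), e.g.,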
histogram([x1,x2],k)
If you have many variables like x1,x2, ..., xN, you can try
histogram(eval(sprintf("[%s]",strjoin(who("x*"),","))),k)
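If you want the summed counts themselves rather than only the plot, a minimal sketch (not part of the original answer) is to compute each histogram's counts with histcounts using the same edges k, add them, and draw the result with histogram's 'BinEdges'/'BinCounts' syntax. The names c1, c2, cSum and hSum are just illustrative. (The object returned by histogram, such as R1 in the question, also exposes its counts in R1.Values.)

```matlab
N = 50;
x1 = randn(N, 1);        % normally distributed
x2 = rand(N, 1) * 2;     % uniformly distributed on [0, 2)
k = -5:0.5:5;            % shared bin edges for both data sets

% Count each data set separately using the same edges
c1 = histcounts(x1, k);
c2 = histcounts(x2, k);

% Bin-wise sum of the two histograms
cSum = c1 + c2;

% Plot the summed histogram from the precomputed counts
figure;
hSum = histogram('BinEdges', k, 'BinCounts', cSum);
grid on
```

Because both data sets share the edges, cSum is the same as the counts of the concatenated data, which is why histogram([x1,x2],k) gives the same picture.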

Related

Cross validation and ROC curve using Matlab: how plot mean ROC curve?

I am using k-fold cross validation with k = 10. Thus, I have 10 ROC curves.
I would like to average the curves. I can't just average the values on the Y axes (from perfcurve) because the returned vectors are not the same size.
[X1,Y1,T1,AUC1] = perfcurve(t_test(1),resp(1),1);
.
.
.
[X10,Y10,T10,AUC10] = perfcurve(t_test(10),resp(10),1);
How to solve this? How can I plot the average curve of the 10 ROC curves?
So, you have k curves with different numbers of points, all bounded to the [0, 1] interval in both dimensions. First, calculate interpolated values for each curve at a common set of query points. The resulting curves all have the same number of points, so you can compute their mean. The interp1 function does the interpolation part.
%% generating sample data
k = 10;
X = cell(k, 1);
Y = cell(k, 1);
hold on;
for i=1:k
n = 10+randi(10);
X{i} = sort([0 1 rand(1, n)]);
Y{i} = sort([0 1 rand(1, n)].^.5);
end
%% Calculating interpolations
% location of query points
X2 = linspace(0, 1, 50);
n = numel(X2);
% initializing values for different curves at different query points
Y2 = zeros(k, n);
for i=1:k
% finding interpolated values for i-th curve
Y2(i, :) = interp1(X{i}, Y{i}, X2);
end
% finding the mean
meanY = mean(Y2, 1);
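The answer stops at meanY, so here is a sketch of actually plotting the average curve over the individual ones. One caveat worth handling: ROC vectors from perfcurve usually contain repeated X (false positive rate) values, and interp1 rejects sample points that are not unique. The sketch below keeps the last, and therefore highest, Y for each repeated X via unique(..., 'last'); that is one reasonable choice for step-like ROC data, not the only one. The repeated value injected into X{1} only simulates that situation.

```matlab
% Sample data as in the answer: k monotone curves on [0, 1]
k = 10;
X = cell(k, 1);
Y = cell(k, 1);
for i = 1:k
    n = 10 + randi(10);
    X{i} = sort([0 1 rand(1, n)]);
    Y{i} = sort([0 1 rand(1, n)].^.5);
end
X{1}(3) = X{1}(2);   % simulate a repeated X value, as perfcurve output often has

X2 = linspace(0, 1, 50);          % common query points
Y2 = zeros(k, numel(X2));
figure; hold on
for i = 1:k
    % interp1 needs unique sample points: keep the last occurrence of each X
    [xu, ia] = unique(X{i}, 'last');
    Y2(i, :) = interp1(xu, Y{i}(ia), X2);
    plot(X{i}, Y{i}, 'Color', [0.8 0.8 0.8]);   % individual curves in grey
end
meanY = mean(Y2, 1);
plot(X2, meanY, 'k', 'LineWidth', 2);           % mean curve in black
xlabel('False positive rate'); ylabel('True positive rate');
hold off
```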
Note that the interpolation method can affect your results. ROC data, for example, are step-like (staircase) data. To get the exact values on such curves, use previous-neighbor interpolation ('previous') instead of linear interpolation, which is the default method of interp1:
Y2(i, :) = interp1(X{i}, Y{i}, X2); % linear
Y3(i, :) = interp1(X{i}, Y{i}, X2, 'previous');
This is how it affects the final results:
I solved it with MATLAB's perfcurve. For that, I passed cell arrays of 1xn vectors as the "labels" and "scores" arguments. perfcurve then treats them as the results of k-fold cross-validation and returns the average ROC curve with its confidence interval, as well as the AUC with its confidence interval.
[X1,Y1,T1,AUC1] = perfcurve(t_test_list,resp_list,1);
t_test_list and resp_list are cell arrays of size 1xk (k is the number of folds), and each cell holds a 1xn vector of labels or scores, respectively.
resp = nnet(x_test(i));
t_test_act = t_test(i);
resp is 2xn (n is the number of predicted samples), with one row per class; there are two classes.
t_test_act contains the labels of the current test set; it is also 2xn and consists of 0s and 1s (each column has one 1 and one 0, indicating the true class of the sample).
resp_list{i} = resp(1,:) %(scores)
t_test_list{i} = t_test_act(1,:) %(labels)
[X1,Y1,T1,AUC1] = perfcurve(t_test_list,resp_list,1);
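A self-contained sketch of this approach on synthetic folds, assuming the Statistics and Machine Learning Toolbox is available. The labels and scores are random stand-ins invented for illustration, and labelsList/scoresList are hypothetical names playing the roles of t_test_list/resp_list. As I read the perfcurve documentation, when confidence bounds are computed from cell-array input, Y has three columns (mean, lower bound, upper bound) and the first column of X holds the averaged X values; the plotting lines assume that layout.

```matlab
rng(0);                                 % reproducible synthetic data
k = 10;                                 % number of folds
labelsList = cell(1, k);
scoresList = cell(1, k);
for i = 1:k
    n = 50 + randi(20);                 % fold sizes differ, as in real CV
    lab = randi([0 1], 1, n);           % hypothetical true classes (0/1)
    scr = lab + 0.8 * randn(1, n);      % hypothetical scores, higher for class 1
    labelsList{i} = lab;
    scoresList{i} = scr;
end

% Cell-array inputs: perfcurve treats each cell as one cross-validation fold
[Xroc, Yroc, ~, AUC] = perfcurve(labelsList, scoresList, 1);

figure; hold on
plot(Xroc(:, 1), Yroc(:, 1), 'k', 'LineWidth', 2);   % mean ROC curve
plot(Xroc(:, 1), Yroc(:, 2), 'k--');                 % lower confidence bound
plot(Xroc(:, 1), Yroc(:, 3), 'k--');                 % upper confidence bound
xlabel('False positive rate'); ylabel('True positive rate');
title(sprintf('Mean AUC = %.3f', AUC(1)));
hold off
```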

matlab scatter plot using colorbar for 2 vectors

I have two columns of data: X = model values of NOx concentrations and Y = observations of NOx concentrations. Now I want to scatter plot X, Y (with marker colors varying) together with a colourbar that shows the counts (i.e. the number of data points in that range). X and Y are daily data for a year, i.e. 365 rows.
Please help me. Any help is greatly appreciated.
I have attached a sample image.
If I understand you correctly, the real problem is creating the color information, that is, creating a bivariate histogram. Luckily, MATLAB has a function, hist3, for that in the Statistics & Machine Learning Toolbox. The syntax is
[N,C] = hist3(X,nbins)
where X is a m-by-2 matrix containing the data, and nbins is a 1-by-2 vector containing the number of bins in each dimension. The return value N is a matrix of size nbins(1)-by-nbins(2), and contains the histogram data. C is a 1-by-2 cell array, containing the bin centers in both dimensions.
% Generate sample data
X = randn(10000, 1);
Y = X + rand(10000, 1);
% Generate histogram
[N,C] = hist3([X,Y], [100,100]);
% Plot
imagesc(C{1},C{2},N.'); % N(i,j) counts X bin i and Y bin j, so transpose to put X on the horizontal axis
set(gca,'YDir','normal');
colormap(flipud(pink));
colorbar;
Result:
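If you want the scatter plot itself, with each marker colored by the count of its bin as the question describes, one option is a sketch with histcounts2 (core MATLAB, no toolbox needed), whose extra outputs give each point's bin indices. The 365 NOx values below are made-up stand-ins, and the 20-by-20 grid is an arbitrary choice.

```matlab
% Hypothetical stand-ins for one year of daily model (X) and observed (Y) NOx
rng(1);
X = 40 + 10 * randn(365, 1);
Y = X + 5 * randn(365, 1);

% Count points on a 20-by-20 grid and get each point's bin indices
[N, ~, ~, binX, binY] = histcounts2(X, Y, [20 20]);

% Look up the count of the bin that each point falls in
counts = N(sub2ind(size(N), binX, binY));

figure
scatter(X, Y, 20, counts, 'filled');   % marker color = number of points in that bin
cb = colorbar;
cb.Label.String = 'Points per bin';
xlabel('Model NOx'); ylabel('Observed NOx');
```

With automatically chosen edges every finite point falls inside the grid, so every marker gets a count of at least 1.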

Matlab - multiple variables normalized histogram?

I'm working in MATLAB, where I have a vector that I need to split into two classes, and then I need a histogram of both resulting vectors (which have different sizes). The values are height records, so the range is about 140-185.
How can I get normalized histograms of both resulting vectors in different colors? I was able to get both normalized vectors in the same colour (which makes them indistinguishable), and also a histogram with different colours but not normalized...
I hope you understand my question and will be able to help me.
Thanks in advance :)
Maybe this is what you need:
matrix = [155+10*randn(2000,1) 165+10*randn(2000,1)];
matrix(1:1100,1) = NaN;
matrix(1101:2000,2) = NaN; %// example data
[y x] = hist(matrix, 15); %// 15 is desired number of bins
y = bsxfun(@rdivide, y, sum(y)) / (x(2)-x(1)); %// normalize to area 1
bar(x,y) %// plots each column of y vs x. Automatically uses different colors
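In newer MATLAB releases the histogram function can do the normalization itself. A minimal sketch, assuming the two classes are stored as separate vectors (h1 and h2 are hypothetical names, filled here with made-up heights): 'Normalization','pdf' scales each histogram to unit area, shared edges keep the bars aligned, and with hold on the second call picks the next color automatically.

```matlab
% Hypothetical height data split into two classes of different sizes
rng(2);
h1 = 155 + 10 * randn(1100, 1);
h2 = 165 + 10 * randn(900, 1);
edges = 100:3:220;                     % shared bin edges so the bars line up

figure; hold on
p1 = histogram(h1, edges, 'Normalization', 'pdf');   % bar areas sum to 1
p2 = histogram(h2, edges, 'Normalization', 'pdf');   % next color in the color order
legend('Class 1', 'Class 2');
xlabel('Height'); ylabel('Probability density');
hold off
```

The bars are semi-transparent by default, so overlapping regions of the two classes stay visible.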

Matlab best technique to remove outliers in data

I have 2 columns x, y of 100 points each. I would like to remove the outliers and fill each gap with the average value of the neighbouring points. Firstly, can I do that? Is there a MATLAB function for it? Secondly, if yes, what is the best technique?
E.g.:
x = 1:1:100
y = rand(1,99)
y(end+1)=2
This example is not very similar to my problem, but here I would like to remove the value 2 at the end and replace it with a value similar to its neighbouring points. In my case, [x,y] follows a nonlinear function with a few outliers.
It depends on what you mean by outlier. If you assume, for example, that outliers are more than three standard deviations from the median, you could do this:
all_idx = 1:length(x);
outlier_idx = abs(x - median(x)) > 3*std(x) | abs(y - median(y)) > 3*std(y); % Find outlier indices
x(outlier_idx) = interp1(all_idx(~outlier_idx), x(~outlier_idx), all_idx(outlier_idx), 'linear', 'extrap'); % Linearly interpolate over outliers in x
y(outlier_idx) = interp1(all_idx(~outlier_idx), y(~outlier_idx), all_idx(outlier_idx), 'linear', 'extrap'); % Do the same for y
This code replaces the outliers by linearly interpolating over their positions from the closest values that are not outliers. The 'extrap' option matters for outliers at the first or last sample (like the 2 at the end of the example): without it, interp1 returns NaN there.
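Because your data follow a nonlinear function, a global median and standard deviation may miss outliers that are only large relative to the local trend. A sketch of a local alternative, assuming R2017a or later for isoutlier: flag points that are more than three local scaled MADs from an 11-point moving median (the window length is an arbitrary choice to tune), then fill them as above. The sine data and the injected spike are made up for illustration.

```matlab
% Hypothetical data: a nonlinear trend with one injected outlier
x = 1:100;
y = sin(x / 10);
y(50) = 3;                                   % spike that should be replaced

% Flag points more than 3 local scaled MADs from an 11-point moving median
isOut = isoutlier(y, 'movmedian', 11);

% Replace flagged points by linear interpolation from the remaining ones;
% 'extrap' also covers outliers at the first or last sample
y(isOut) = interp1(x(~isOut), y(~isOut), x(isOut), 'linear', 'extrap');
```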

MATLAB - Pixelize a plot and make it into a heatmap

I have a matrix with x and y coordinates as well as the temperature value for each of my data points. When I plot this as a scatter plot, some data points obscure others, so the plot does not give a true representation of how the temperature varies in my data set.
To fix this, I would like to decrease the resolution of my graph and create pixels that represent the average temperature of all data points within each pixel's area. Another way to think about the problem is that I need to put a grid over the current plot and average the values within each cell of the grid.
I have found this thread - Generate a heatmap in MatPlotLib using a scatter data set - which shows how to use Python to achieve the end result that I want. However, my current code is in MATLAB, and even though I have tried different suggestions such as heatmap, contourf and imagesc, I can't get the result I want.
You can "reduce the resolution" of your data using accumarray, where you specify which output "bin" each point should go in and specify that you wish to take a mean over all points in that bin.
Some example data:
% make points that overlap a lot
n = 10000
% NOTE: your points do not need to be sorted.
% I only sorted so we can visually see if the code worked,
% see the below plot
Xs = sort(rand(n, 1));
Ys = rand(n, 1);
temps = sort(rand(n, 1));
% plot
colormap("hot")
scatter(Xs, Ys, 8, temps)
(I only sorted by Xs and temps in order to get the stripy pattern above so that we can visually verify if the "reduced resolution" worked)
Now, suppose I want to decrease the resolution of my data by getting just one point per 0.05 units in the X and Y direction, being the average of all points in that square (so since my X and Y go from 0 to 1, I'll get 20*20 points total).
% group into bins of 0.05
binsize = 0.05;
% create the bins
xbins = 0:binsize:1;
ybins = 0:binsize:1;
I use histc to work out which bin each X and Y is in (note - in this case since the bins are regular I could also do idxx = floor((Xs - xbins(1))/binsize) + 1)
% work out which bin each X and Y is in (idxx, idxy)
[nx, idxx] = histc(Xs, xbins);
[ny, idxy] = histc(Ys, ybins);
Then I use accumarray to do a mean of temps within each bin:
% calculate mean in each direction
out = accumarray([idxy idxx], temps, [numel(ybins)-1, numel(xbins)-1], @mean, NaN); % fixed 20x20 size; empty pixels become NaN instead of 0
(Note - this means that the point temps(i) belongs to the "pixel" of our output matrix at row idxy(i), column idxx(i). I used [idxy idxx] rather than [idxx idxy] so that in the resulting matrix, rows correspond to Y and columns to X.)
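histc is no longer recommended in current MATLAB releases; a sketch of the same binning step with discretize, which returns the bin index of each value directly, is below. The random points stand in for your data, and fixing the output size and using NaN for empty pixels are my choices rather than part of the original answer.

```matlab
% Hypothetical scattered data in the unit square
rng(3);
n = 10000;
Xs = rand(n, 1);
Ys = rand(n, 1);
temps = rand(n, 1);

binsize = 0.05;
xbins = 0:binsize:1;                 % bin edges
ybins = 0:binsize:1;

% discretize returns the bin index of each value (NaN if outside the edges)
idxx = discretize(Xs, xbins);
idxy = discretize(Ys, ybins);

% Mean temperature per pixel: rows follow Y bins, columns follow X bins;
% empty pixels are set to NaN instead of 0
sz = [numel(ybins) - 1, numel(xbins) - 1];
out = accumarray([idxy idxx], temps, sz, @mean, NaN);
```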
You can plot like this:
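% PLOT (imagesc expects pixel centers, so pass the bin midpoints rather than the edges)
imagesc(xbins(1:end-1) + binsize/2, ybins(1:end-1) + binsize/2, out)
set(gca, 'YDir', 'normal') % flip Y axis back to normal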
Or as a scatter plot like this (I plot each point at the midpoint of its 'pixel' and also draw the original data points for comparison):
xx = xbins(1:(end - 1)) + binsize/2;
yy = ybins(1:(end - 1)) + binsize/2;
[xx, yy] = meshgrid(xx, yy);
scatter(Xs, Ys, 2, temps);
hold on;
scatter(xx(:), yy(:), 20, out(:));