Matlab - Histogram edges and cut off

I'm trying to more or less replicate the following p-values density histogram, with different data:
So I want to create a histogram with the bin ticks at the beginning/end of a bar. With 15 bars and the values ranging from 0 up to and including 1.
At the moment I'm using the histc command:
xint=1/15;
edges=(0:xint:1);
[n,bin]=histc(data,edges);
bar(edges,n,'histc');
tit='p-values histogram';
htitle=title(tit);
set(htitle,'fontname','Calibri')
xlabel('p-values');
ylabel('Frequency');
Which gives me:
However, if the data is equal to 1, the current code plots a new bar after 1. I guess I need to include the edges (to get the same as the example), but I couldn't seem to find the right command?
Also how can I make the histogram cut off at x=1, like the example? Inserting the "lambda arrow" of the example at 0.6 is preferable (but optional).

Edits 3 and 4: Since you're using Matlab R2013b, which doesn't have histogram, use the number-of-bins syntax of hist to plot:
hist(data, 15)
If you also need the counts, call it with output arguments, which returns the values without drawing anything:
[n, centers] = hist(data, 15)
Note that this returns centers, not edges of the bins. For the arrow, you can use annotation if R2013b supports it. Alternatively (a bit hackish):
hold on   % keep the histogram while adding the arrow pieces
line([0.6 0.6], [1750 1250])   % arrow shaft
plot(0.6, 1250, 'Marker', 'v')   % arrow head
text(0.6, 1750, '\lambda', 'HorizontalAlignment','center', 'VerticalAlignment','bottom')
Edit 2: Try
xint=1/15;
edges_in=(0:xint:1);
histogram(data,edges_in);
to plot directly, rather than using bar. This post from MathWorks says that the histc option of bar() is deprecated.
Use histcounts instead of histc:
xint=1/15;
edges_in=(0:xint:1);
[n,edges_out]=histcounts(data,edges_in); % <-- changed
size(n)
size(edges_in)
size(edges_out)
bar(edges_out(1:end-1),n,'histc'); % <-- changed - last bin edge shouldn't be included
tit='p-values histogram';
htitle=title(tit);
set(htitle,'fontname','Calibri')
xlabel('p-values');
ylabel('Frequency');
axis([0 1 0 2500]); % <-- added - but leave it off for debugging
Per the histc docs, "The last bin consists of the scalar value equal to last value in binranges." So the last bin is just 1.0. By contrast, with histcounts, "The last bin also includes the right bin edge, so that it contains X(i) if edges(end-1) ≤ X(i) ≤ edges(end)." That should do what you want.
I included an axis above to tighten up the plot, but leave that off for debugging so you can see if the last bar is still there.
Edit: Per the histcounts docs, the returned count vector has one fewer element than the edge vector. If that's the case (per the size printouts in the edited code), the last edge should be dropped before calling bar, so that bar doesn't plot an extra bar for it.
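Since histcounts and histogram are not available in R2013b, the same "last bin includes the right edge" behaviour can be approximated with histc by folding its final single-value bin into the previous one. This is only a sketch: the data vector is made up for illustration, and bar is called with bin centers and a relative width of 1 instead of the 'histc' option.

```matlab
% Sketch: 15-bin p-value histogram using only histc and bar.
data  = [0 0.03 0.5 0.97 1 1];        % hypothetical p-values, including exact 1s
edges = linspace(0, 1, 16);           % 16 edges -> 15 bins
n = histc(data, edges);               % n(end) counts values exactly equal to 1
n(end-1) = n(end-1) + n(end);         % fold those into the last real bin
n(end) = [];                          % drop the extra bin -> 15 counts
centers = edges(1:end-1) + diff(edges)/2;
bar(centers, n, 1);                   % width 1 makes neighbouring bars touch
xlim([0 1]);                          % cut the axis off at x = 1
set(gca, 'XTick', edges);             % ticks at the bin edges
```

With this approach no bar is drawn to the right of 1, because the counts vector has exactly one entry per bin.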

Related

Calculate the autocorrelation of a time series created from a normal distribution

I generate a time series from a normal distribution and then I try to plot the autocorrelation by using the following code snippet:
ts1 = normrnd(0,0.25,1,100);
autocorrelation_ts1 = xcorr(ts1);
I was expecting that the autocorrelation would show 1 for x=0 and almost 0 for the rest of values, instead I get value 6 at axis position 100.
I think the question applies both to Matlab and Octave but I am not sure.
First thing is that your second line of code is wrong. I think you meant to put
autocorrelation_ts1 = xcorr(ts1);
Other than this, I think your solution is correct. The reason the max value is at 100 and not 0 is because a temporal shift of 0 in the autocorrelation actually happens on the 100th iteration of the correlation function. In other words, the numbers on the X axis don't correspond to time.
To get time on the X axis change your code to
[autocorrelation_ts1, shifts] = xcorr(ts1);
Then
plot(shifts, autocorrelation_ts1)
With regard to the max value, the MATLAB documentation for xcorr indicates that 1 is not the maximum output value of the function when it is called without the normalization argument. If you want to normalize such that all values are 1 or less, use
[autocorrelation_ts1, shifts] = xcorr(ts1, 'normalized');
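As a quick check of the normalization point, the following sketch confirms that the normalized autocorrelation equals 1 at lag 0. It assumes the Signal Processing Toolbox is available for xcorr, uses randn from base MATLAB in place of normrnd, and uses the older option name 'coeff', which requests the same normalization.

```matlab
% Sketch: normalized autocorrelation of white noise.
ts1 = 0.25 * randn(1, 100);            % same distribution as normrnd(0,0.25,1,100)
[r, lags] = xcorr(ts1, 'coeff');       % scale so the zero-lag value is 1
r0 = r(lags == 0);                     % value at zero shift
stem(lags, r, 'filled');
xlabel('lag');
ylabel('normalized autocorrelation');
```

For a series of length 100, lags runs from -99 to 99, which is why the unlabelled plot shows its peak at index 100.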
Just as a complementary reference to Scott's answer, this is the complete code snippet, including stem chart scaling to show up to 20 shifts/lags.
[auto_ts1, lags] = xcorr(ts1);
ts_begin = ceil(size(lags,2)/2);
ts_end = ts_begin + 20;
stem(lags(ts_begin:ts_end),auto_ts1(ts_begin:ts_end)/max(auto_ts1), 'linewidth', 4.0, 'filled')

Mismatch in Histogram.

I am trying to plot the histogram from the attached datasets in excel files. I have 2 questions (Q.2 is more important). The related csv files can be accessed from this link:
CSV files
1. Why are the two histograms different even though the exact same bins and bin sizes are used?
aa = xlsread('LF_NPV_Branch_Run.csv','C2:C828');
bb = xlsread('RES_Cob.csv','A1:CV827');
cc = aa*ones(1,100);
dev=bb-cc;
err_a=dev';
nbins = 20;
bound_n=min([floor(min(min(err_a))/10)*10,-10])
bound_p=max([ceil(max(max(err_a))/10)*10,10])
bins = linspace(bound_n,bound_p,nbins)
hist(err_a, bins)
figure(2)
hist(err_a(:), bins)
2. For figure 2, the tallest bin shows a count of ~38000, but when I calculate the number of points in the central bin (around zero) I get 63039, which is more than the limit on the Y axis. What is the reason for this apparent mismatch?
val = dev(dev > bins(10) & dev < bins(11));
size(val)
Normally, if you have multiple questions, you should ask them separately, but I can see that these two questions are closely related.
If you read MATLAB's documentation for hist(x,xbins):
If xbins is a vector of evenly spaced values, then hist uses the values as the bin centers.
The bin edges for the bin centred at bins(10) are actually
lower=(bins(9)+bins(10))/2
upper=(bins(10)+bins(11))/2
Therefore, to answer your Q2, you should find that the result of the following matches the bin count shown in the figure:
val = dev(dev > lower & dev <= upper);
size(val)
If you want bins to be the edges, you should use histogram(err_a(:), bins). See Specify Bin Edges of Histogram.
Q1:
err_a is a 100x827 matrix; err_a(:) makes it a 82700x1 column vector.
hist(m, bins) returns a bin for every column in m for each bin centre specified in bins. In your case, err_a has 827 columns. For each bin centre, hist(err_a, bins) gives 827 results and that is why there is a cluster of columns for every bin centre. hist(err_a(:), bins) on the other hand only gives 1 result per bin centre.
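The centre-versus-edge point from Q2 can be reproduced on a small example. This is a sketch with made-up data and centres (x and centers are hypothetical names), comparing the count hist reports for one bin with a manual count between the midpoints of neighbouring centres.

```matlab
% Sketch: hist treats an evenly spaced vector as bin centres.
x = [-12 -4 -1 0 1 3 6 14];               % hypothetical data
centers = linspace(-10, 10, 5);            % -10 -5 0 5 10
n = hist(x, centers);                      % counts per centre, no plot
lower = (centers(2) + centers(3)) / 2;     % left edge of the bin centred at 0
upper = (centers(3) + centers(4)) / 2;     % right edge of the bin centred at 0
manual = sum(x > lower & x <= upper);      % manual count for that bin
```

Here manual agrees with n(3), while counting between centers(3) and centers(4) would give a different number, which is the mismatch described in the question.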

How do I plot values in an array vs the number of times those values appear in Matlab?

I have a set of ages (over 10000 of them) and I want to plot a graph with the age from 20 to 100 on the x axis and then the number of times each of those ages appears in the data on the y axis. I have tried several ways to do this and I can't figure it out. I also have some other data which requires me to plot values vs how many times they occur so any advice on how to do this would be much appreciated.
I'm quite new to Matlab so it would be great if you could explain how things in your answer work rather than just typing out some code.
Thanks.
EDIT:
So I typed histogram(Age, 80) because as I understand that will plot the values in Age on a histogram split up into 80 bars (1 for each age). Instead I get this:
The bars aren't aligned and it's clearly not 1 per age nor has it plotted the number of times each age occurs on the y axis.
You have to use histogram(), and that's correct.
Let's see with an example.
I extract 100 ages between 20 and 100:
ages=randsample([20:100],100,true);
Now I call histogram() in this manner:
h=histogram(ages,[20:100]);
where h is a histogram object and this will also show the following plot:
However, this only looks easy because my ages vector is already in the range 20:100, so it contains no other values. If your vector also contains ages outside 20:100, you can pass the additional 'BinLimits' option to histogram() like this:
h=histogram(ages,80,'BinLimits',[20 100]);   % 'BinLimits' takes a two-element [min max] vector
and this option plots a histogram using the values in ages that fall between 20 and 100 inclusive.
Note: by inspecting h you can actually see and/or edit some properties of your histogram. A property of this object you might be interested in is Values. This is a vector of length 80 (in our case, since we work with 80 bins) in which the i-th element is the number of items in the i-th bin. This will help you count the occurrences (just in case you need them to go on with your analysis).
Like Luis said in comments, hist is the way to go. You should specify bin centers, rather than the number of bins:
ages = randi([20 100], [1 10000]);
hist(ages, [20:100])
Is this what you were looking for?
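If the counts themselves are needed, one possible way to get exactly one bin per integer age is to place the bin edges halfway between the integers. The sketch below assumes a MATLAB version that has histcounts (R2014b or later); the ages vector is random illustration data.

```matlab
% Sketch: one bin per integer age from 20 to 100.
ages = randi([20 100], 1, 10000);      % hypothetical ages
edges = 19.5:1:100.5;                  % edges between integers -> 81 bins
counts = histcounts(ages, edges);      % counts(k) = occurrences of age 19 + k
bar(20:100, counts, 1);                % one bar per age
xlabel('Age');
ylabel('Number of occurrences');
```

With edges 20:100 instead, there are only 80 bins and the last one collects both 99 and 100, so the half-integer edges keep each age separate.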

How to remove data points from a data set in Matlab

In Matlab, I have a vector that is a 1x204 double. It represents a biological signal over a certain period of time and over that time the signal varies - sometimes it peaks and goes up and sometimes it remains relatively small, close to the baseline value of 0. I need to plot the reciprocal of this data (on the x-axis) against another set of data (on the y-axis) in order to do some statistical analysis.
The problem is the points close to 0. For example, the smallest point I have is -0.00497, and 1/(-0.00497) produces a value of about -201, which turns into an "outlier", while the rest of the data is very different and the values are not as large. So I am trying to remove the very small values close to 0 from the data set so that they do not affect 1/value.
I know that I can use the cftool to remove those points from the plot, but how do I get the vector with those points removed? Is there a way of actually removing the points? From the cftool and removing those points on the original, I was able to generate the code and find out which exact points they are, but I don't know how to create a vector with those points removed.
Can anyone help?
I did try using the following for loop to get it to remove values, with 'total_BOLD_time_course' being my signal and '1/total_BOLD_time_course' is what I want to plot, but the problem with this is that in my if statement total_BOLD_time_course(i) = 1, which is not exactly true - so by doing this the points still exist in the vector but are now taking the value 1. But I just want them to be gone from the vector.
for i = 1:204
if total_BOLD_time_course(i) < 0 && total_BOLD_time_course(i) < -0.01
total_BOLD_time_course(i) = 1;
else if total_BOLD_time_course(i) > 0 && total_BOLD_time_course(i) < 0.01
total_BOLD_time_course(i) = 1 ;
end
end
end
To remove points from an array, use the syntax
total_BOLD_time_course( abs(total_BOLD_time_course) < 0.01 ) = nan
that makes them 'blank' on the graph, and ignored by further calculations, but without destroying the temporal sequence of the datapoints.
If actually destroying timepoints is not a concern then do
total_BOLD_time_course( abs(total_BOLD_time_course) < 0.01 ) = []
Then there'll be fewer data points, and they won't map on to any other time_course you have. But the advantage is that it will "close up" the gaps in the graph.
--
PS
note that in your code, the phrase
x<0 && x<-0.01
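is redundant because if any number is less than -0.01, it is automatically less than 0. I believe the second comparison should be x > -0.01, so that the branch catches the small negative values in the same way your else-if branch catches the small positive ones.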
As VHarisop suggests, you can set a threshold for outliers and exclude them. But, depending on your plot, it might be important to ensure that the remaining data are not shunted horizontally to fill the gaps. To plot 1./y as a function of x, you could either just plot(x, 1./y) and then set the y limits with ylim to exclude the outliers from view, or use NaNs:
e = 0.01
y( abs(y) < e ) = nan;
plot( x, 1./y )
For quantitative (non-visual) statistical analysis, either remove the values entirely from y as suggested—bearing in mind that this leaves you with a shorter vector—or use statistics functions that know how to treat NaNs as missing data (nanmean, nanstd, etc).
Yeah, you can. You might want to define a threshold, like e = 0.01, and cut off all vector elements whose absolute value is below e.
Example:
% assuming v is your initial vector
e = 0.01;
new_vector = v(abs(v) > e);
Alternatively, you could use the excludedata tool from the Curve Fitting Toolbox, since you know the indices of the vector elements you want to exclude.
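Because the reciprocal is plotted against a second data set, the same mask has to be applied to both vectors so the pairs stay aligned. A minimal sketch, with signal and other as made-up stand-ins for the two series:

```matlab
% Sketch: drop near-zero samples from both series with one logical mask.
signal = [0.5 -0.004 0.2 0.009 -0.3];   % hypothetical signal values
other  = [1 2 3 4 5];                    % hypothetical paired data
e = 0.01;                                % threshold, chosen for illustration
keep = abs(signal) >= e;                 % true where the sample is kept
x = 1 ./ signal(keep);                   % reciprocal of the kept samples
y = other(keep);                         % matching samples of the other series
plot(x, y, 'o');
```

Indexing both vectors with keep guarantees that x and y have the same length and that each remaining pair comes from the same time point.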

Matlab, smaller duration histogram

I have this histogram plot. It shows the histogram in bins of 100 duration units. I want to show the histogram with smaller bins, for example of width 10. How can I do this in Matlab? Thanks.
Use
hist(data,nbins)
to specify the number of bins. The default is 10, which is where your bins of width 100 come from; for bins of width 10 over the same range use:
hist(data,100)
In addition to the answer by @slezadav, if you want to set a given bin width (10 in your example) you can use something like
hist(data,5:10:995)
Using a vector as the second argument of hist specifies bin centers.
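The centre vector can also be built from the desired bin width rather than typed by hand. This sketch assumes the data lie between 0 and 1000, which is a guess based on the example above; w, lo and hi are illustrative names.

```matlab
% Sketch: bin centres for a fixed bin width.
data = 1000 * rand(1, 500);              % hypothetical durations in [0, 1000]
w  = 10;                                 % desired bin width
lo = 0;                                  % assumed lower limit of the data
hi = 1000;                               % assumed upper limit of the data
centers = (lo + w/2) : w : (hi - w/2);   % same values as 5:10:995
n = hist(data, centers);                 % counts per bin, no plot
hist(data, centers);                     % draw the histogram
```

Changing w to 100 reproduces the coarse histogram, so the same lines cover both cases.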
As explained in the docs:
use the nbins argument of the hist function:
rng(0,'twister')
data = randn(1000,1);
figure
nbins = 5;
hist(data,nbins)
you can check this by changing the parameter of nbins.
See also here: http://www.mathworks.de/de/help/matlab/ref/hist.html