How do I plot values in an array vs the number of times those values appear in Matlab? - matlab

I have a set of ages (over 10000 of them) and I want to plot a graph with the age from 20 to 100 on the x axis and then the number of times each of those ages appears in the data on the y axis. I have tried several ways to do this and I can't figure it out. I also have some other data which requires me to plot values vs how many times they occur so any advice on how to do this would be much appreciated.
I'm quite new to Matlab so it would be great if you could explain how things in your answer work rather than just typing out some code.
Thanks.
EDIT:
So I typed histogram(Age, 80) because as I understand that will plot the values in Age on a histogram split up into 80 bars (1 for each age). Instead I get this:
The bars aren't aligned and it's clearly not 1 per age nor has it plotted the number of times each age occurs on the y axis.

You have to use histogram(), and that's correct.
Let's see with an example.
I extract 100 ages between 20 and 100:
ages=randsample([20:100],100,true);
Now I call histogram() in this manner:
h=histogram(ages,[20:100]);
where h is an histogram object and this will also show the following plot:
However, this might look easy due to the fact that my ages vector is in range 20:100, so it will not contain any other values. If your vector, as instead, contains also ages not in range 20:100, you can specify the additional option 'BinLimits' as third input in histogram() like this:
h=histogram(ages,length([20:100]),'BinLimits',[20:100]);
and this option plots a histogram using the values in ages that fall between 20 and 100 inclusive.
Note: by inspecting h you can actually see and/or edit some proprieties of your histogram. An attribute (field) of such object you might be interested to is Values. This is a vector of length 80 (in our case, since we work with 80 bins) in which the i-th element is the number of items is the i-th bin. This will help you count the occurrences (just in case you need them to go on with your analysis).

Like Luis said in comments, hist is the way to go. You should specify bin edges, rather than the number of bins:
ages = randi([20 100], [1 10000]);
hist(ages, [20:100])
Is this what you were looking for?

Related

Calculating value y for each XI and XII in MATLAB:

I am currently working in matlab to design a way to reconstruct 3D data. For this I have two pictures with black points. The difference in the amount of points per frame is key for the reconstruction, but MATLAB gives an error when matrixes are not equal. This is happening becaus the code is not doing what I want it to do, so can anyone hel me with the following?
I have two columns of Xdata: XLI and XRI
What matlab does when I do XLI-XRI is substracting the pairs i.e XLI(1)-XRI(1) etc, but I want to substract each value of XRI of every value of XLI. i.e
XLI(1)-XRI(1,2,3,4 etc)
XLI(2)-XRI(1 2 3 4 etc)
and so on
Can anyone help?
I think you are looking for a way to deduct all combinations from eachother. Here is an example of how you can do that with bsxfun:
xLI = [1 2 3]
xRI = [1 2]
bsxfun(#minus,xLI ,xRI')
I cannot comment on Dennis's post (not enough points on this website) : his solution should work, but depending on your version of Matlab you might get a "Error using ==> bsxfun" and need to transpose either xLI or xRI for that to work :
bsxfun(#minus,xLI' ,xRI)
Best,
Tepp

MATLAB - histograms of equal size and histogram overlap

An issue I've come across multiple times is wanting to take two similar data sets and create histograms from them where the bins are identical, so as to easily calculate things like histogram overlap.
You can define the number of bins (obviously) using
[counts, bins] = hist(data,number_of_bins)
But there's not an obvious way (as far as I can see) to make the bin size equal for several different data sets. If remember when I initially looked finding various people who seem to have the same issue, but no good solutions.
The right, easy way
As pointed out by horchler, this can easily be achieved using either histc (which lets you define your bins vector), or vectorizing your histogram input into hist.
The wrong, stupid way
I'm leaving below as a reminder to others that even stupid questions can yield worthwhile answers
I've been using the following approach for a while, so figured it might be useful for others (or, someone can very quickly point out the correct way to do this!).
The general approach relies on the fact that MATLAB's hist function defines an equally spaced number of bins between the largest and smallest value in your sample. So, if you append a start (smallest) and end (largest) value to your various samples which is the min and max for all samples of interest, this forces the histogram range to be equal for all your data sets. You can then truncate the first and last values to recreate your original data.
For example, create the following data set
A = randn(1,2000)+7
B = randn(1,2000)+9
[counts_A, bins_A] = hist(A, 500);
[counts_B, bins_B] = hist(B, 500);
Here for my specific data sets I get the following results
bins_A(1) % 3.8127 (this is also min(A) )
bins_A(500) % 10.3081 (this is also max(A) )
bins_B(1) % 5.6310 (this is also min(B) )
bins_B(500) % 13.0254 (this is also max(B) )
To create equal bins you can simply first define a min and max value which is slightly smaller than both ranges;
topval = max([max(A) max(B)])+0.05;
bottomval = min([min(A) min(B)])-0.05;
The addition/subtraction of 0.05 is based on knowledge of the range of values - you don't want your extra bin to be too far or too close to the actual range. That being said, for this example by using the joint min/max values this code will work irrespective of the A and B values generated.
Now we re-create histogram counts and bins using (note the extra 2 bins are for our new largest and smallest value)
[counts_Ae, bins_Ae] = hist([bottomval, A, topval], 502);
[counts_Be, bins_Be] = hist([bottomval, B, topval], 502);
Finally, you truncate the first and last bin and value entries to recreate your original sample exactly
bins_A = bins_Ae(2:501)
bins_B = bins_Ae(2:501)
counts_A = counts_Ae(2:501)
counts_B = counts_Be(2:501)
Now
bins_A(1) % 3.7655
bins_A(500) % 13.0735
bins_B(1) % 3.7655
bins_B(500) % 13.0735
From this you can easily plot both histograms again
bar([bins_A;bins_B]', [counts_A;counts_B]')
And also plot the histogram overlap with ease
bar(bins_A,(counts_A+counts_B)-(abs(counts_A-counts_B)))

Creating a set matrix size

I have results which are 6 columns long however have been printed as 2 then 3 beneath then 1 beneath that! There are hundreds of lines and matlab will not except the structure of the matrix as it is now. Is there any way to tell matlab i want the first 5 results in their own columns then continuing down the rows after that?
My results appear as follows:
0.5 0
0.59095535915335684063 -0.59095535915335395405 -5.89791913085569763
33e-08
... repeated alot
thansk so much, em xx
I would just do a format shortE before you process the output, this will give you everything in scientific notation with 4 digits after the decimal. That 'should' allow you to fit your columns all in one line, so you don't have to deal with the botched output.
In general you should not want the output to be in a too specific format, but suppose you have this matrix:
M =[0.5 0 0.59095535915335684063 -0.59095535915335395405 -5.89791913085569763 33e-08];
To make it an actual matrix I will repeat it a bit:
M = repmat(M,10,1);
Now you can ensure that all six columns will fit on a normal screen by using the format.
format short
Try help format to find more options. Now simply showing the matrix will put all columns next to eachother. If you want one column below, the trick is to reduce your windows width untill it can only hold five columns. Matlab will now print the last column below the first.
M % Simply show the matrix
% Now reduce your window size
M % Simply show it again
This should help you display the numbers in matlab, if you want to process them further you can consider to write them to a file instead. Try help xlswrite for a simple solution.

Core Plot skipped values Graph

i'm trying to draw a Graph with a user-friendly timeline having every day/week (to be decided by time range) as a label at x-axis. However, the datasource values are given on another basis - there might be 10 entries one day and the eleventh comes in a month.
See the photoshop image:
With the latest Core Plot drop I cannot find a way to do it, need your help.
Regards,
user792677.
The scatter plot asks the datasource for x-y pairs of data. There is no requirement that either value follow some sort of sequence or regular pattern. Other plot types work similarly, although the names and number of data fields can vary.
In your example, return 4 from the -numberOfRecordsForPlot: method. Return the following values from the -numberForPlot:field:recordIndex: method. The table assumes your y-values are calculated by a function f(x), but they can of course come from any source. Use the field parameter to determine whether the plot is asking for the x- or y- value.
Index X-Value Y-Value
0 1 f(1)
1 3 f(3)
2 9 f(9)
3 10 f(10)

Arbitrary distribution -> Uniform distribution (Probability Integral Transform?)

I have 500,000 values for a variable derived from financial markets. Specifically, this variable represents distance from the mean (in standard deviations). This variable has a arbitrary distribution. I need a formula that will allow me to select a range around any value of this variable such that an equal (or close to it) amount of data points fall within that range.
This will allow me to then analyze all of the data points within a specific range and to treat them as "similar situations to the input."
From what I understand, this means that I need to convert it from arbitrary distribution to uniform distribution. I have read (but barely understood) that what I am looking for is called "probability integral transform."
Can anyone assist me with some code (Matlab preferred, but it doesn't really matter) to help me accomplish this?
Here's something I put together quickly. It's not polished and not perfect, but it does what you want to do.
clear
randList=[randn(1e4,1);2*randn(1e4,1)+5];
[xCdf,xList]=ksdensity(randList,'npoints',5e3,'function','cdf');
xRange=getInterval(5,xList,xCdf,0.1);
and the function getInterval is
function out=getInterval(yPoint,xList,xCdf,areaFraction)
yCdf=interp1(xList,xCdf,yPoint);
yCdfRange=[-areaFraction/2, areaFraction/2]+yCdf;
out=interp1(xCdf,xList,yCdfRange);
Explanation:
The CDF of the random distribution is shown below by the line in blue. You provide a point (here 5 in the input to getInterval) about which you want a range that gives you 10% of the area (input 0.1 to getInterval). The chosen point is marked by the red cross and the
interval is marked by the lines in green. You can get the corresponding points from the original list that lie within this interval as
newList=randList(randList>=xRange(1) & randList<=xRange(2));
You'll find that on an average, the number of points in this example is ~2000, which is 10% of numel(randList)
numel(newList)
ans =
2045
NOTE:
Please note that this was done quickly and I haven't made any checks to see if the chosen point is outside the range or if yCdfRange falls outside [0 1], in which case interp1 will return a NaN. This is fairly straightforward to implement, and I'll leave that to you.
Also, ksdensity is very CPU intensive. I wouldn't recommend increasing npoints to more than 1e4. I assume you're only working with a fixed list (i.e., you have a list of 5e5 points that you've obtained somehow and now you're just running tests/analyzing it). In that case, you can run ksdensity once and save the result.
I do not speak Matlab, but you need to find quantiles in your data. This is Mathematica code which would do this:
In[88]:= data = RandomVariate[SkewNormalDistribution[0, 1, 2], 10^4];
Compute quantile points:
In[91]:= q10 = Quantile[data, Range[0, 10]/10];
Now form pairs of consecutive quantiles:
In[92]:= intervals = Partition[q10, 2, 1];
In[93]:= intervals
Out[93]= {{-1.397, -0.136989}, {-0.136989, 0.123689}, {0.123689,
0.312232}, {0.312232, 0.478551}, {0.478551, 0.652482}, {0.652482,
0.829642}, {0.829642, 1.02801}, {1.02801, 1.27609}, {1.27609,
1.6237}, {1.6237, 4.04219}}
Verify that the splitting points separate data nearly evenly:
In[94]:= Table[Count[data, x_ /; i[[1]] <= x < i[[2]]], {i, intervals}]
Out[94]= {999, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000}