I have several sets of data. Each set is a list of numbers which is the distance from 0 that a particle has travelled. Each set is associated with a finite time, so set 1 is the distances at T=0; set 2 is the distances at T=1 and so on. The size of each set is the total number of particles and the size of each set is the same.
I want to plot a concentration vs distance line.
For example, if there are 1000 particles (the size of the sets); at time T=0 then the plot will just be a straight line x=0 because all the particles are at 0 (the set contains 1000 zeroes). So the concentration at x=0 =100% and is 0% at all other distances
At T=1 and T=2 and so on, the distances will increase (generally) so I might have sets that look like this: (just an example)
T1 = (1.1,2.2,3.0,1.2,3.2,2.3,1.4...) etc T2 = (2.9,3.2,2.6,4.5,4.3,1.4,5.8...) etc
it is likely that each number in each set is unique in that set
The aim is to have several plots (I can eventually plot them on one graph) that show the concentration on the y-axis and the distance on the x-axis. I imagine that as T increases T0, T1, T2 then the plot will flatten until the concentration is roughly the same everywhere.
The x-axis (distance) has a fixed maximum which is the same for each plot. So, for example, some sets will have a curve that hits zero on the y-axis (concentration) at a low value for x (distance) but as the time increases, I envisage a nearly flat line where the line does not cross the x-axis (concentration is non-zero everywhere)
I have tried this with a histogram, but it is not really giving the results I want. I would like a line plot but have to try and put the distances into common-sense sized bins.
thank you W
some rough data
Y1 = 1.0e-09 * [0.3358, 0.3316, 0.3312, 0.3223, 0.2888, 0.2789, 0.2702,...
0.2114, 0.1919, 0.1743, 0.1738, 0.1702, 0.0599, 0.0003, 0, 0, 0, 0, 0, 0];
Y2 = 1.0e-08 * [0.4566, 0.4130, 0.3439, 0.3160, 0.3138, 0.2507, 0.2483,...
0.1714, 0.1371, 0.1039, 0.0918, 0.0636, 0.0502, 0.0399, 0.0350, 0.0182,...
0.0010, 0, 0, 0];
Y3 = 1.0e-07 * [0.2698, 0.2671, 0.2358, 0.2250, 0.2232, 0.1836, 0.1784,...
0.1690, 0.1616, 0.1567, 0.1104, 0.0949, 0.0834, 0.0798, 0.0479, 0.0296,...
0.0197, 0.0188, 0.0173, 0.0029];
These data sets contain the distances of just 20 particles. The Y0 set is zeros. I will be dealing with thousands, so the data sets will be too large.
Thank you
Well, basically, you are just missing the hold command. But first, put all your data in one matrix, like this:
Y = [1.0e-09 * [0.3358, 0.3316, 0.3312, 0.3223, 0.2888, 0.2789, 0.2702,...
0.2114, 0.1919, 0.1743, 0.1738, 0.1702, 0.0599, 0.0003, 0, 0, 0, 0, 0, 0];
1.0e-08 * [0.4566, 0.4130, 0.3439, 0.3160, 0.3138, 0.2507, 0.2483,...
0.1714, 0.1371, 0.1039, 0.0918, 0.0636, 0.0502, 0.0399, 0.0350, 0.0182,...
0.0010, 0, 0, 0];
1.0e-07 * [0.2698, 0.2671, 0.2358, 0.2250, 0.2232, 0.1836, 0.1784,...
0.1690, 0.1616, 0.1567, 0.1104, 0.0949, 0.0834, 0.0798, 0.0479, 0.0296,...
0.0197, 0.0188, 0.0173, 0.0029]];
Then you need to plot each time step separately, and use the hold on to paste them on the same axes:
hold on
for r = size(Y,1):-1:1
histogram(Y(r,:));
end
hold off
T_names = [repmat('T',size(Y,1),1) num2str((size(Y,1):-1:1).')];
legend(T_names)
Which will give you (using the example data):
Notice, that in the loop I iterate on the rows backwards - that's just to make the narrower histograms plot on the wider, so you can see all of them clearly.
EDIT
In case you want continuous lines rather than bins, first get the histogram values with histcounts, then plot them as a line:
hold on
for r = 1:size(Y,1)
[H,E] = histcounts(Y(r,:));
plot(E,[H(1) H])
end
hold off
T_names = [repmat('T',size(Y,1),1) num2str((1:size(Y,1)).')];
legend(T_names)
With your small example data it doesn't look so impressive though:
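If you want the y-axis to read as a concentration in percent, with the same x-axis bins for every time step, a minimal sketch could look like the following. The matrix Y, the maximum distance xmax and the bin count are made-up values for illustration, not taken from the question; 'Normalization','probability' is an option of histcounts.

```matlab
% Hypothetical distances: one row per time step, one column per particle
Y = [0.1 0.2 0.2 0.3 0.4 0.5;
     0.5 0.9 1.4 1.8 2.2 2.9;
     0.4 1.6 2.7 3.8 4.9 6.0];

xmax    = 6;                               % assumed fixed maximum distance
edges   = linspace(0, xmax, 7);            % 6 equal-width bins shared by all rows
centers = edges(1:end-1) + diff(edges)/2;  % x position used for each bin

conc = zeros(size(Y,1), numel(centers));
for r = 1:size(Y,1)
    % 'probability' makes each row sum to 1, so multiplying by 100 gives percent
    conc(r,:) = 100 * histcounts(Y(r,:), edges, 'Normalization', 'probability');
end

plot(centers, conc.');                     % one line per time step
xlabel('Distance');
ylabel('Concentration (%)');
legend(arrayfun(@(k) sprintf('T%d', k), 1:size(Y,1), 'UniformOutput', false));
```

Because the edges are fixed up front, every line is drawn on the same distance grid, so the curves for different time steps can be compared directly.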
I am following the example to fit a Mixture of Two Normals distribution that
you can find here.
x = [trnd(20,1,50) trnd(4,1,100)+3];
hist(x,-2.25:.5:7.25);
pdf_normmixture = @(x,p,mu1,mu2,sigma1,sigma2) ...
p*normpdf(x,mu1,sigma1) + (1-p)*normpdf(x,mu2,sigma2);
pStart = .5;
muStart = quantile(x,[.25 .75])
sigmaStart = sqrt(var(x) - .25*diff(muStart).^2)
start = [pStart muStart sigmaStart sigmaStart];
lb = [0 -Inf -Inf 0 0];
ub = [1 Inf Inf Inf Inf];
options = statset('MaxIter',300, 'MaxFunEvals',600);
paramEsts = mle(x, 'pdf',pdf_normmixture, 'start',start, ...
'lower',lb, 'upper',ub, 'options',options)
bins = -2.5:.5:7.5;
h = bar(bins,histc(x,bins)/(length(x)*.5),'histc');
h.FaceColor = [.9 .9 .9];
xgrid = linspace(1.1*min(x),1.1*max(x),200);
pdfgrid = pdf_normmixture(xgrid,paramEsts(1),paramEsts(2),paramEsts(3),paramEsts(4),paramEsts(5));
hold on
plot(xgrid,pdfgrid,'-')
hold off
xlabel('x')
ylabel('Probability Density')
Could you please explain why when it calculates
h = bar(bins,histc(x,bins)/(length(x)*.5),'histc');
it divides by (length(x)*.5)?
The idea is to scale your histogram so that it represents probability density instead of counts. This is the unscaled histogram
The vertical axis is the count of how many events fall within each bin. You have defined your bin centers to be -2.25:.5:7.25 and thus your bin width is 0.5. So if we look at the first bar of the histogram, it is telling us that the number of elements in x (or the number of events in your experiment) that fall in the bin -2.5 to -2 (note the width of 0.5) is 2.
But now you want to compare this with a probability density function, and we know that the integral of a PDF is 1. This is the same as saying the area under the PDF curve is 1. So if we want our histogram's vertical scale to match that of the PDF as in this second picture
we need to scale it such that the total area of all the histogram's bars sum to 1. The area of the first bar of the histogram is height times width which according to our investigation above is 2*0.5. Now the width stays the same for all the bins in the histogram so we can find its total area by adding up all the bar heights and then multiplying by the width once at the end. The sum of all the heights in the histogram is the total number of events, which is the total number of elements in x or length(x). Thus the area of the first histogram is length(x)*0.5 and to make this area equal to 1 we need to scale all the bar heights by dividing them by length(x)*0.5.
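A quick numerical check of that argument, as a sketch: the sample values in x are made up, and histcounts is used here instead of histc, but the edges and bin width are the ones from the question.

```matlab
x        = [-1.2 0.3 0.4 1.1 2.6 2.7 3.9];   % hypothetical sample
binWidth = 0.5;
edges    = -2.5:binWidth:7.5;                % same edges as in the question

counts  = histcounts(x, edges);              % raw counts per bin
density = counts / (numel(x) * binWidth);    % scale the heights as in the bar() call

% Total area = sum over bars of height * width
area = sum(density) * binWidth;
fprintf('Total area of the scaled histogram: %g\n', area);
```

The counts add up to numel(x), so dividing by numel(x)*binWidth and then multiplying every bar by its width gives a total area of 1, up to rounding.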
This is a little complicated to explain. I have time series data formatted like this: https://docs.google.com/spreadsheets/d/1B8mN0uD-t4kQr2U20gS713ZFHN6IgGB7OMR3-pqJjrw/edit?usp=sharing
That data represents voltage recordings at .01s intervals. When plotted it looks like this:
Essentially what I want to do is find the time at which the first peak in each very narrow pair occur (ie at ~.1, .75, 1.6, etc).
The time values are in a separate array, but the index values (row numbers) correspond between the two sets.
Any ideas on how to do this?
My initial attempt was something like this from the matlab manual
function [edges2] = risingEdge2(time, data)
threshold = 0.4;
offsetData = [data(2:end); NaN];
edges2 = find(data < threshold & offsetData > threshold);
end
I couldn't figure out a good way to ignore for n seconds after the first peak...I'm also getting many more peaks than expected...probably because of noisy data.
The following approach seems to work for the given data.
%// Define parameters
window_size = 200;
stepsize = 0.4; %// to be used for quantizing data into three levels - 0, 0.4, 0.8
%// Perform a sliding max to get rid of the dips within the spikes
slmax_data = nlfilter(data,[window_size 1],@max);
%// Quantize sliding max data to three levels as the plot seem to suggest
quantized_slmax_data = round((slmax_data-min(slmax_data))./stepsize);
If you zoom into the above figure, you will see ledges around the high peaks -
%// Get sliding mode to get rid of the short ledges around the high peaks
slmax_mode_data = nlfilter(quantized_slmax_data,[window_size 1],@mode);
%// Finally, find the indices where the mode data jump from 0 to 1 only, which
%// correspond to the start of spikes
index = find(diff(slmax_mode_data)==1)+window_size/2;
Output -
index =
682
8048
16487
24164
31797
Here -- find all rising edges, then find those that are very close together and take the first.
rising_edges = find(diff(data > .3) > 0);
first_of_paired_edges = find(diff(time(rising_edges)) < 500);
first_rising_edge_times = time(rising_edges(first_of_paired_edges));
You could then slide up the edge to the peak.
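%// For each edge, walk forward until the signal first decreases; the -1 is there
%// because find returns a position inside the window that starts at n
first_peak_times = time(arrayfun(@(n) n + find(diff(data(n+(0:1000))) < 0, 1, 'first') - 1, ...
    rising_edges(first_of_paired_edges)));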
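To see what the edge-pairing step does, here is a small sketch on synthetic data. The signal, the threshold and max_gap (measured in samples here, not seconds) are assumptions for illustration only.

```matlab
% Synthetic signal: two narrow pairs of spikes, at samples 3/7 and 51/55
data = zeros(1, 100);
data([3 7 51 55]) = 1;

threshold = 0.3;   % assumed detection level
max_gap   = 10;    % assumed: edges closer than this many samples form a pair

% Index of the last sample below threshold before each upward crossing
rising_edges = find(diff(data > threshold) > 0);

% An edge is the first of a pair if the next edge follows within max_gap samples
is_first_of_pair = diff(rising_edges) < max_gap;
first_edges      = rising_edges(is_first_of_pair);

disp(first_edges)
```

Here rising_edges comes out as [2 6 50 54], and only the edges at 2 and 50 are followed closely by another edge, so those are the ones kept.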
I need to plot this function in Matlab:
Lines must be connected, I mean at end of decreasing line, increasing one must start etc. It looks like this:
Any idea? I need it on some wide interval, for example t goes from zero to 10
The reason why it isn't working as expected is because for each curve you are drawing between multiples of 0.1 seconds, the y-intercept is not being properly calculated and so the curves are not placed in the right location. For the first part of your curve, y = -57.5t, the y-intercept is at the origin and so your curve is y = -57.5t as expected. However, when you reach 0.1 seconds, you need to solve for the y-intercept for this new line with the new slope, as it has shifted over. Specifically:
y = 42.5t + b
We know that at t = 0.1 seconds, y = -5.75 given the previous curve. Solving for the y-intercept gives us:
-5.75 = (42.5)(0.1) + b
b = -10
As such, between 0.1s <= t <= 0.2s, your equation of the line is actually:
y = 42.5t - 10
Now, repeating the same procedure at t = 0.2s, we have a new equation of the line, even though it has the same slope as the first segment:
y = -57.5t + b
From the previous curve, we know that at t = 0.2 seconds, y = (42.5)(0.2) - 10 = -1.5. Therefore, the intercept for this new curve is:
-1.5 = -(57.5)(0.2) + b
b = 10
Therefore, y = -57.5t + 10 is the curve between 0.2s <= t <= 0.3s. If you keep repeating these calculations, you'll see that the next y-intercept is -20, then for the next one it's 20, then the next one after that is -30 and so on. You see a nice multiple of 10 pattern for these calculations: the curve with the positive slope always has a negative y-intercept that is a multiple of 10, and the curve with the negative slope always has a positive y-intercept that is a multiple of 10.
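The same continuity argument can be checked numerically. This is only a sketch for the first five segments; the variable names are mine.

```matlab
m = [-57.5 42.5 -57.5 42.5 -57.5];   % slopes of the first five segments
b = zeros(size(m));                  % y-intercepts; the first one is 0

for k = 2:numel(m)
    tk   = (k - 1) * 0.1;            % time where segment k starts
    yk   = m(k-1) * tk + b(k-1);     % value reached by the previous segment
    b(k) = yk - m(k) * tk;           % solve yk = m(k)*tk + b(k) for b(k)
end

disp(b)
```

Up to floating-point rounding, b comes out as [0 -10 10 -20 20], which matches the pattern worked out by hand.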
This is the pattern we need to keep in mind when plotting this curve. Because when you're plotting in MATLAB, we have to plot points discretely, you'll want to define a sampling time that defines the time resolution between each point. Because these are linear curves, you don't need that small of a sampling time, but let's choose 0.01 seconds for the sake of simplicity. This means that we will have 10 points between each new curve.
Therefore, for every 10 points in our plot, we will draw a different curve with a different y-intercept for each curve. Because you want to draw points between 0 to 10 seconds, this means we will need (100)(10) = 1000 points. However, this does not include the origin, so you actually need 1001 points. As such, you'd define your t vector like this:
t = linspace(0,10,1001);
Now, for every 10 points, we need to keep changing our y-intercept. At the first segment, the y-intercept is 0, at the second segment, the y-intercept is -10, and so on. Now, a lot of MATLAB purists are going to tell you that for loops are taboo, but when it comes to indexing operations, for loops can be among the fastest options compared to more vectorized solutions. As an example, take a look at this post, where I implement a solution with a for loop and it was the fastest amongst the other proposed solutions.
First let's define an array of slopes where each element tells us the slope per segment. Because we have 10 seconds worth of segments, and each segment is 0.1 seconds in length, there are 100 segments; the array below holds 101 entries, of which the loop further down only uses the first 100. The first slope, starting at the origin, is -57.5. After this, our slopes alternate between 42.5 and -57.5, and this pair repeats 50 times. To create this array, we do:
m = [-57.5 repmat([42.5 -57.5], 1, 50)];
I use repmat to repeat the [42.5 -57.5] array 50 times for a total of 100 elements, plus the -57.5 at the origin.
Now, let's define a y-intercept vector that tells us what the y intercept is at each segment.
y = zeros(1,101);
y(2:2:101) = 1;
y = 10*cumsum(y);
y(2:2:101) = -y(2:2:101);
The above code will generate a y-intercept vector such that it starts at 0, then has values of -10, 10, then -20, 20, etc. The trick with this code is that I first generate a sequence of [0 1 0 1 0 1 0 1 0 1...]. After, I use cumsum, which does a cumulative summation where for each point in your array, it adds values from the beginning up until that point. Therefore, if we did cumsum on this binary sequence, it would give us [0 1 1 2 2 3 3 4 4...]. When we multiply this by 10, we get [0 10 10 20 20 30 30 40 40...]. Finally, to complete the intercepts, we just negate every even location in this array, and so we finally get [0 -10 10 -20 20 -30 30 -40 40...].
Now, here's the code we're going to use to generate our curve. We are going to iterate through each segment, and generate our output values with the y-intercept taken into account. We first need to allocate an output array that will store our values, then we will populate the values per segment. We also need to keep track of which time values we are going to access to compute our output values.
As such:
%// Define time vector
t = linspace(0,10,1001);
%// Define slopes
m = [-57.5 repmat([42.5 -57.5], 1, 50)];
%// Define y-intercepts
y = zeros(1,101);
y(2:2:101) = 1;
y = 10*cumsum(y);
y(2:2:101) = -y(2:2:101);
%// Calculate the output curves for each segment
out = zeros(1, numel(t));
for idx = 1 : numel(y)-1
%// Compute where in the time array and output array
%// we need to write to
vals_to_access = (idx - 1)*10 + 1 : idx*10;
%// Create the curve for this segment
out(vals_to_access) = m(idx)*t(vals_to_access) + y(idx);
end
%// Copy second last value over to last value
out(end) = out(end-1);
%// Plot the curve
plot(t,out);
axis tight;
The trick with the for loop is to know where to access the time values for each segment, and where to write these values to. That's the purpose of vals_to_access. Also, note that the for loop only populated values in the array from the first index up to the 1000th index, but did not compute the 1001th element. To make things simple, we'll just copy the element from the second last point to the last point, which is why out(end) = out(end-1); is there. The above code will also plot the curve and makes sure that the axes are tightly bound. As such, this is what I get:
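For comparison, here is a sketch of a different way to build the same zigzag: instead of tracking y-intercepts, accumulate slope times step size with cumsum. This is an alternative I am swapping in, not the loop above, and the variable names are mine.

```matlab
t  = linspace(0, 10, 1001);       % same time vector as above
dt = 0.01;                        % spacing between neighbouring samples

% 0-based segment number for each of the 1000 steps between samples
seg = floor((0:numel(t)-2) / 10);

% Even segments fall with slope -57.5, odd segments rise with slope 42.5
slopes    = [-57.5 42.5];
stepSlope = slopes(mod(seg, 2) + 1);

% Start at 0 and add up the increments, so the pieces join up automatically
out = [0 cumsum(stepSlope * dt)];

plot(t, out);
axis tight;
```

Since each sample is the previous one plus an increment, the line is continuous by construction and the last sample (t = 10) is computed rather than copied. It lands on -5.75 at t = 0.1 and -1.5 at t = 0.2, matching the hand calculation.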
I am trying to adjust the scale of the x-axis so that the values are closer together, but I am not able to do so.
I need the output to be like this photo:
However, what I actually get is the photo below:
Here's the code I have written to reproduce this error:
x = [0.1 1 10 100 1000 10000];
y = [1.9904 19.8120 82.6122 93.0256 98.4086 99.4016];
figure;
bar(x,y);
ylabel('Y values');
xlabel('X values');
set(gca,'XTick', [0.1 1 10 100 1000 10000])
How can I adjust the x-axis so that it looks like the first photo?
Because your data has such huge dynamic range, and because of the linear behaviour of the x axis, your graph is naturally going to appear like that. One compromise that I can suggest is that you transform your x data so that it gets mapped to a smaller scale, then remap your x data so that it falls onto a small exponential scale. After, simply plot the data using this remapped scale, then rename the x ticks so that they have the same values as your x data. To do this, I would take the log10 of your data first, then apply an exponential to this data. In this way, you are scaling the x co-ordinates down to a smaller dynamic range. When you apply the exponential to this smaller range, the x co-ordinates will then spread out in a gradual way where higher values of x will certainly make the value go farther along the x-axis, but not too far away like you saw in your original plot.
As such, try something like this:
x = [0.1 1 10 100 1000 10000]; %// Define data
y = [1.9904 19.8120 82.6122 93.0256 98.4086 99.4016];
xplot = (1.25).^(log10(x)); %// Define modified x values
figure;
bar(xplot,y); %// Plot the bar graph on the modified scale
set(gca,'XTick', xplot); %// Define ticks only where the bars are located
set(gca,'XTickLabel', x); %// Rename these ticks to our actual x data
This is what I get:
Note that you'll have to play around with the base of the exponential, which is 1.25 in what I did, to suit your data. Obviously, the bigger the dynamic range of your x data, the smaller this base will have to be in order for your data to be closer to each other.
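To get a feel for how the base changes the spacing, a small sketch like this prints the remapped positions for two bases. The second base, 2, is just an arbitrary comparison value.

```matlab
x     = [0.1 1 10 100 1000 10000];
bases = [1.25 2];                    % 1.25 as above, 2 as an arbitrary comparison

for b = bases
    xplot = b .^ log10(x);           % remapped bar positions for this base
    fprintf('base %.2f: %s\n', b, mat2str(xplot, 4));
end
```

With base 2 the positions are 0.5, 1, 2, 4, 8, 16, so the last bar sits 32 times further out than the first. With base 1.25 they only run from 0.8 to about 2.44, which keeps the bars much closer together.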
Edit from your comments
From your comments, you want the bars to be equidistant in between neighbouring bars. As such, you simply have to make the x axis linear in a small range, from... say... 1 to the total number of x values. You'd then apply the same logic where we rename the ticks on the x axis so that they are from the true x values instead. As such, you only have to change one line, which is xplot. The other lines should stay the same. Therefore:
x = [0.1 1 10 100 1000 10000]; %// Define data
y = [1.9904 19.8120 82.6122 93.0256 98.4086 99.4016];
xplot = 1:numel(x); %// Define modified x values
figure;
bar(xplot,y); %// Plot the bar graph on the modified scale
set(gca,'XTick', xplot); %// Define ticks only where the bars are located
set(gca,'XTickLabel', x); %// Rename these ticks to our actual x data
This is what I get:
I have this problem that I should plot a step plot from a matrix. For example:
[0 0 10 20 50;
50 100 100 300 50]
The second row gives the step widths along the x-axis, so there would be points at 50, 150, 250, 550 and 600. The corresponding y-values should be 0, 0, 10, 20 and 50. The function stairs(B(1,:)) gives me a step plot, but it's somewhat off. I'd appreciate the help!
stairs can take in two sets of values, your x and your y.
So the first issue is that you need to define both x and y;
y = B(1,:);
x = B(2,:);
The second is that your second row holds the steps along x, not the actual values, and stairs needs the actual x positions. So we need to change your x values using cumsum, which performs a cumulative sum. Since we have a couple of points with y=0, as well as calling stairs with two inputs I'm adding some LineSpec options to ensure those points are visible.
x = cumsum(x);
stairs(x,y, '-.xk');
The last point may be a little difficult to see, so you may want to adjust the axis:
xlim([0 700])
ylim([0 60])
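Putting the pieces together, with B defined as the matrix from the question, a complete script could look like this sketch:

```matlab
B = [ 0   0  10  20  50;     % y-values
     50 100 100 300  50];    % step widths along x

y = B(1,:);
x = cumsum(B(2,:));          % turn widths into positions: 50 150 250 550 600

stairs(x, y, '-.xk');        % dash-dot black line with x markers
xlim([0 700]);
ylim([0 60]);
```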