Select and plot value above a threshold - matlab

I have a plot in which there are a few noise components. I am planning to select data from that plot preferably above a threshold in my case I am planning to keep it at 2.009 on the Y axis. And plot the lines going only above it. And if anything is below that i would want to plot it as 0.
as we can see in the figure
t1=t(1:length(t)/5);
t2=t(length(t)/5+1:2*length(t)/5);
t3=t(2*length(t)/5+1:3*length(t)/5);
t4=t(3*length(t)/5+1:4*length(t)/5);
t5=t(4*length(t)/5+1:end);
X=(length(prcdata(:,4))/5);
a = U(1 : X);
b = U(X+1: 2*X);
c = U(2*X+1 : 3*X);
d = U(3*X+1 : 4*X);
e = U(4*X+1 : 5*X);
figure;
subplot (3,2,2)
plot(t1,a);
subplot (3,2,3)
plot(t2,b);
subplot(3,2,4)
plot(t3,c);
subplot(3,2,5)
plot(t4,d);
subplot(3,2,6)
plot(t5,e);
subplot(3,2,1)
plot(t,prcdata(:,5));
figure;
A=a(a>2.009,:);
plot (t1,A);
This code splits the data (in the image into 5 every 2.8 seconds, I am planning to use the thresholding in first 2.8 seconds. Also I had another code but I am just not sure if it works as it took a long time to be analysed
figure;
A=a(a>2.009,:);
plot (t1,A);
for k=1:length(a)
if a(k)>2.009
plot(t1,a(k)), hold on
else
plot(t1,0), hold on
end
end
hold off

The problem is that you are trying to plot potentially several thousand times and adding thousands of points onto a plot which causes severe memory and graphical issues on your computer. One thing you can do is pre process all of the information and then plot it all at once which will take significantly less time.
figure
threshold = 2.009;
A=a>threshold; %Finds all locations where the vector is above your threshold
plot_vals = a.*A; %multiplies by logical vector, this sets invalid values to 0 and leaves valid values untouched
plot(t1,plot_vals)
Because MATLAB is a highly vectorized language, this format will not only be faster to compute due to a lack of for loops, it is also much less intensive on your computer as the graphics engine does not need to process thousands of points individually.
The way MATLAB handles plots is with handles to each line. When you plot a vector, MATLAB is able to simply store the vector in one address and call it once when plotting. However, when each point is called individually, MATLAB has to store each point in a separate location in memory and call all of them individually and graphically handle each point completely separately.
Per request here is the edit
plot(t1(A),plot_vals(A))

Related

Lessening GPU processing power on MATLAB for plots

I have a graph with a lot of points and I want to be able to look at certain intervals using xlim. The issue is, when I increment the interval, I have to rerun my program. This is taking a lot of processing power.
So basically, I make graph using plot, then use xlim. I don't want to keep doing this. Is there a way to plot only certain intervals using plot? That way, MATLAB doesn't have to process the whole vector.
For example
A=[1,2,3,4]
and
B=[1,2,3,4]
If I do plot(A,B) then xlim(1,2) it would plot the graph first and then limit the interval. This takes a lot of processing power if you imagine a really massive complicated graph, thus I don't want to use plot using the normal method.
Is there a way to plot the graph on the interval x=[1,2] with only one function?
Update the XLimMode and NextPlot properties of your axes object before plotting. e.g.
x = randn(128,1);
y = randn(128,1);
hax = axes();
hax.XLimMode = 'manual';
hax.XLim = [1,2];
hax.NextPlot = 'add';
h = plot(x,y,'o','Parent',hax)
hax.NextPlot = 'replace'; % optional

Signal smoothing algorithm (Matlab's moving average)

I have written a simple code that performs a 3-point moving average smoothing algorithm. It is meant to follow the same basic algorithm as Matlab's smooth(...) function as described here.
However, the result of my code is very different from that of Matlab's. Matlab's 3-point filter appears to perform a much more aggressive smoothing.
Here is a comparison of a noisy data smoothed using my code (red) and Matlab's function (blue):
Here is my code written in the form of a function:
function [NewSignal] = smoothing(signal)
NewSignal = signal;
for i = 2 : length(signal)-1
NewSignal(i,:) = (NewSignal(i,:)+NewSignal(i-1,:)+NewSignal(i+1,:))./3;
end
end
Matlab's function is used as follows:
signal = smooth(time, signal, 3, 'moving');
As far as I understand Matlab's function works the same way; it averages 3 adjacent bins to a single bin. So I expected both algorithms to produce the same results.
So, what is the reason for the discrepancy? And how could I tweak my code to produce the same results?
Edit:
My sample data can be found here. It can be accessed using:
M = csvread('DS0009.csv');
time = M(:,1);
signal = M(:,2);
Here is the new result (red plot) using rinkert's correction:
One reason for the difference could be that you are partially using your smoothed signal during smoothing. In your loop, you store the smoothed value in NewSignal(i,:), and for the next sample to smooth this value will be called by NewSignal(i-1,:).
Let NewSignal be determined by the original signal only:
function [NewSignal] = smoothing(signal)
NewSignal = signal;
for i = 2 : length(signal)-1
NewSignal(i,:) = (signal(i,:)+signal(i-1,:)+signal(i+1,:))./3;
end
end
Update: To show that the function above in fact does the same as Matlab's smooth function, let's consider this MVCE:
t = (0:0.01:10).'; % time vector
y = sin(t) + 0.5*randn(size(t));
y_smooth1 = smooth(t,y,3,'moving');
y_smooth2 = smoothing(y);
difference_methods = abs(y_smooth1-y_smooth2);
So creating a sine wave, add some noise, and determine the absolute difference between the two methods. If you take the sum of all the differences you will see that this adds up to something like 7.5137e-14, which cannot explain the differences you see.
Plotting the smooth signal (blue original, red smoothed):
figure(1); clf; hold on
plot(t,y)
plot(t,y_smooth2)
And then plotting the difference between the two methods:
figure(2); clf; hold on;
plot(t,y_smooth1-y_smooth2)
As you can see, the difference is of the order 1e-16, so influenced by the Floating-point relative accuracy (see eps).
To answer your question in the comments: the Function filter and smooth perform arithmetically the same (in the case that they are applied for moving average). however, there are the special cases at the beginning and endpoints which are handled differently.
This is also stated in the documentation of smooth "Because of the way smooth handles endpoints, the result differs from the result returned by the filter function."
Here you see it in an example:
%generate randonm data
signal=rand(1,50);
%plot data
plot(signal,'LineWidth',2)
hold on
%plot filtered data
plot(filter(ones(3,1)/3,1,signal),'r-','LineWidth',2)
%plot smoothed data
plot( smooth(signal,3,'moving'),'m--','LineWidth',2)
%plot smoothed and delayed
plot([zeros(1,1); smooth(signal,3,'moving')],'k--','LineWidth',2)
hold off
legend({'Data','Filter','Smooth','Smooth-Delay'})
As you can see the filtered data (in red) is just a delayed version of the smoothed data (in magenta). Additionally, they differ in the beginning. Delaying the smoothed data results in an identical waveform as the filtered data (besides the beginning). As rinkert pointed out, your approach overwrites the data points which you are accessing in the next step. This is a different issue.
In the next example you will see that rinkerts implementation (smooth-rinkert) is identical to matlabs smooth, and that your approach differs from both due to overwriting the values:
So it is your function which low passes the input stronger. (as pointed out by Cris)

Interpolating irregularly spaced 3D matrix in matlab

I have a time series of temperature profiles that I want to interpolate, I want to ask how to do this if my data is irregularly spaced.
Here are the specifics of the matrix:
The temperature is 30x365
The time is 1x365
Depth is 30x1
Both time and depth are irregularly spaced. I want to ask how I can interpolate them into a regular grid?
I have looked at interp2 and TriScatteredInterp in Matlab, however the problem are the following:
interp2 works only if data is in a regular grid.
TriscatteredInterp works only if the vectors are column vectors. Although time and depth are both column vectors, temperature is not.
Thanks.
Function Interp2 does not require for a regularly spaced measurement grid at all, it only requires a monotonic one. That is, sampling positions stored in vectors depths and times must increase (or decrease) and that's all.
Assuming this is indeed is the situation* and that you want to interpolate at regular positions** stored in vectors rdepths and rtimes, you can do:
[JT, JD] = meshgrid(times, depths); %% The irregular measurement grid
[RT, RD] = meshgrid(rtimes, rdepths); %% The regular interpolation grid
TemperaturesOnRegularGrid = interp2(JT, JD, TemperaturesOnIrregularGrid, RT, RD);
* : If not, you can sort on rows and columns to come back to a monotonic grid.
**: In fact Interp2 has no restriction for output grid (it can be irregular or even non-monotonic).
I would use your data to fit to a spline or polynomial and then re-sample at regular intervals. I would highly recommend the polyfitn function. Actually, anything by this John D'Errico guy is incredible. Aside from that, I have used this function in the past when I had data on a irregularly spaced 3D problem and it worked reasonably well. If your data set has good support, which I suspect it does, this will be a piece of cake. Enjoy! Hope this helps!
Try the GridFit tool on MATLAB central by John D'Errico. To use it, pass in your 2 independent data vectors (time & temperature), the dependent data matrix (depth) along with the regularly spaced X & Y data points to use. By default the tool also does smoothing for overlapping (or nearly) data points. If this is not desired, you can override this (and other options) through a wide range of configuration options. Example code:
%Establish regularly spaced points
num_points = 20;
time_pts = linspace(min(time),max(time),num_points);
depth_pts = linspace(min(depth),max(depth),num_points);
%Run interpolation (with smoothing)
Pest = gridfit(depth, time, temp, time_pts, depth_pts);

Matlab cdfplot: how to control the spacing of the marker spacing

I have a Matlab figure I want to use in a paper. This figure contains multiple cdfplots.
Now the problem is that I cannot use the markers because the become very dense in the plot.
If i want to make the samples sparse I have to drop some samples from the cdfplot which will result in a different cdfplot line.
How can I add enough markers while maintaining the actual line?
One method is to get XData/YData properties from your curves follow solution (1) from #ephsmith and set it back. Here is an example for one curve.
y = evrnd(0,3,100,1); %# random data
%# original data
subplot(1,2,1)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
%# reduced data
subplot(1,2,2)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
xdata = get(h,'XData');
ydata = get(h,'YData');
set(h,'XData',xdata(1:5:end));
set(h,'YData',ydata(1:5:end));
Another method is to calculate empirical CDF separately using ECDF function, then reduce the results before plotting with PLOT.
y = evrnd(0,3,100,1); %# random data
[f, x] = ecdf(y);
%# original data
subplot(1,2,1)
plot(x,f,'*')
%# reduced data
subplot(1,2,2)
plot(x(1:5:end),f(1:5:end),'r*')
Result
I know this is potentially unnecessary given MATLAB's built-in functions (in the Statistics Toolbox anyway) but it may be of use to other viewers who do not have access to the toolbox.
The empirical CMF (CDF) is essentially the cumulative sum of the empirical PMF. The latter is attainable in MATLAB via the hist function. In order to get a nice approximation to the empirical PMF, the number of bins must be selected appropriately. In the following example, I assume that 64 bins is good enough for your data.
%# compute a histogram with 64 bins for the data points stored in y
[f,x]=hist(y,64);
%# convert the frequency points in f to proportions
f = f./sum(f);
%# compute the cumulative sum of the empirical PMF
cmf = cumsum(f);
Now you can choose how many points you'd like to plot by using the reduced data example given by yuk.
n=20 ; % number of total data markers in the curve graph
M_n = round(linspace(1,numel(y),n)) ; % indices of markers
% plot the whole line, and markers for selected data points
plot(x,y,'b-',y(M_n),y(M_n),'rs')
verry simple.....
try reducing the marker size.
x = rand(10000,1);
y = x + rand(10000,1);
plot(x,y,'b.','markersize',1);
For publishing purposes I tend to use the plot tools on the figure window. This allow you to tweak all of the plot parameters and immediately see the result.
If the problem is that you have too many data points, you can:
1). Plot using every nth sample of the data. Experiment to find an n that results in the look you want.
2). I typically fit curves to my data and add a few sparsely placed markers to plots of the fits to differentiate the curves.
Honestly, for publishing purposes I have always found that choosing different 'LineStyle' or 'LineWidth' properties for the lines gives much cleaner results than using different markers. This would also be a lot easier than trying to downsample your data, and for plots made with CDFPLOT I find that markers simply occlude the stairstep nature of the lines.

MATLAB scatter3, plot3 speed discrepencies

This is about how MATLAB can take very different times to plot the same thing — and why.
I generate 10000 points in 3D space:
X = rand(10000, 1);
Y = rand(10000, 1);
Z = rand(10000, 1);
I then used one of four different methods to plot this, to create a plot like so:
I closed all figures and cleared the workspace between each run to try to ensure fairness.
Bulk plotting using scatter3:
>> tic; scatter3(X, Y, Z); drawnow; toc
Elapsed time is 0.815450 seconds.
Individual plotting using scatter3:
>> tic; hold on;
for i = 1:10000
scatter3(X(i), Y(i), Z(i), 'b');
end
hold off; drawnow; toc
Elapsed time is 51.469547 seconds.
Bulk plotting using plot3:
>> tic; plot3(X, Y, Z, 'o'); drawnow; toc
Elapsed time is 0.153480 seconds.
Individual plotting using plot3:
>> tic; hold on
for i = 1:10000
plot3(X(i), Y(i), Z(i), 'o');
end
drawnow; toc
Elapsed time is 5.854662 seconds.
What is it that MATLAB does behind the scenes in the 'longer' routines to take so long? What are the advantages and disadvantages of using each method?
Edit:
Thanks to advice from Ben Voigt (see answers), I have included drawnow commands in the timing — but this has made little difference to the times.
The main difference between the time required to run scatter3 and plot3 comes from the fact that plot3 is compiled, while scatter3 is interpreted (as you'll see when you edit the functions). If scatter3 was compiled as well, the speed difference would be small.
The main difference between the time required to plot in a loop versus plotting in one go is that you add the handle to the plot as a child to the axes (have a look at the output of get(gca,'Children')), and you're thus growing a complicated array inside a loop, which we all know to be slow. Furthermore, you're calling several functions often instead of just once and incur thus calls from the function overhead.
Recalculation of axes limits aren't an issue here. If you run
for i = 1:10000
plot3(X(i), Y(i), Z(i), 'o');
drawnow;
end
which forces Matlab to update the figure at every iteration (and which is A LOT slower), you'll see that the axes limits don't change at all (since the default axes limits are 0 and 1). However, even if the axes limits started out differently, it wouldn't take many iterations for them to converge with these data. Compare with omitting the hold on, which makes plotting take longer, because axes are recalculated at every step.
Why have these different functions? scatter3 allows you to plot points with different marker sizes, and colors under a single handle, while you'd need a loop and get a handle for each point using plot3, which is not only costly in terms of speed, but also in terms of memory. However, if you need to interact with different points (or groups of points) individually - maybe you want to add a separate legend entry for each, maybe you want to be able to turn them on and off separately etc - using plot3 in a loop may be the best (though slow) solution.
For a faster approach, consider this third option (directly uses the low-level function LINE):
line([X,X], [Y,Y], [Z,Z], 'LineStyle','none', 'Marker','o', 'Color','b')
view(3)
Here are some articles discussing plotting performance issues:
Performance: scatter vs. line
Plot performance
Well, if you wanted control over the color of each point, bulk scatter would be faster, because you'd need to call plot separately.
Also, I'm not sure your timing information is accurate because you haven't called drawnow, so the actual drawing could take place after toc.
In summary:
plot3 is fastest because it draws the same marker at many different locations
scatter3 draws many different markers, since size and color of the marker (are allowed to) vary with each point
calling in a loop is really slow, because argument parsing and so forth have to take place repeatedly, in addition as points are added to the plot the axes have to be recalculated