So I have a sample dataset which I need to plot using Matlab.
The columns look like this:
Obviously due to this data set the plot looks exceptionally condensed.
Now I am totally new to plotting and statistical data processing.
What can be done to make the data plot more visually comparable/perusal-able (plotting at larger intervals?)?
Here's the code I wrote:
fid=fopen('me.dat', 'r');
s=textscan(fid,'%s %s %f %f', 'headerlines', 1);
fclose(fid);
a=s{1};
b=s{2};
c=s{3};
d=s{4};
plot(c,d)
Thanks.
When I have this kind of problem, I usually use the following methods:
1) Plot only every certain point. If you have 1D arrays a and b and you want to plot, say, every 5th point, use plot(a(1:5:end),b(1:5:end)), instead of plot(a,b). This works, because a(1:5:end) returns a(1), a(6), a(11), ..., so that you will plot roughly 1/5 of your data points. Here you simply omit most of your data points, so I prefer the second method.
2) If you have Image Processing toolbox, use imresize. Before plotting, resize your data aplot=imresize(a,0.2); If you want to decrease the size of your array by a factor of N, the second argument of the imresize should be 1/N. This generally works better, since you have an idea what's going on in your full dataset.
Related
I have a plot in which there are a few noise components. I am planning to select data from that plot preferably above a threshold in my case I am planning to keep it at 2.009 on the Y axis. And plot the lines going only above it. And if anything is below that i would want to plot it as 0.
as we can see in the figure
t1=t(1:length(t)/5);
t2=t(length(t)/5+1:2*length(t)/5);
t3=t(2*length(t)/5+1:3*length(t)/5);
t4=t(3*length(t)/5+1:4*length(t)/5);
t5=t(4*length(t)/5+1:end);
X=(length(prcdata(:,4))/5);
a = U(1 : X);
b = U(X+1: 2*X);
c = U(2*X+1 : 3*X);
d = U(3*X+1 : 4*X);
e = U(4*X+1 : 5*X);
figure;
subplot (3,2,2)
plot(t1,a);
subplot (3,2,3)
plot(t2,b);
subplot(3,2,4)
plot(t3,c);
subplot(3,2,5)
plot(t4,d);
subplot(3,2,6)
plot(t5,e);
subplot(3,2,1)
plot(t,prcdata(:,5));
figure;
A=a(a>2.009,:);
plot (t1,A);
This code splits the data (in the image into 5 every 2.8 seconds, I am planning to use the thresholding in first 2.8 seconds. Also I had another code but I am just not sure if it works as it took a long time to be analysed
figure;
A=a(a>2.009,:);
plot (t1,A);
for k=1:length(a)
if a(k)>2.009
plot(t1,a(k)), hold on
else
plot(t1,0), hold on
end
end
hold off
The problem is that you are trying to plot potentially several thousand times and adding thousands of points onto a plot which causes severe memory and graphical issues on your computer. One thing you can do is pre process all of the information and then plot it all at once which will take significantly less time.
figure
threshold = 2.009;
A=a>threshold; %Finds all locations where the vector is above your threshold
plot_vals = a.*A; %multiplies by logical vector, this sets invalid values to 0 and leaves valid values untouched
plot(t1,plot_vals)
Because MATLAB is a highly vectorized language, this format will not only be faster to compute due to a lack of for loops, it is also much less intensive on your computer as the graphics engine does not need to process thousands of points individually.
The way MATLAB handles plots is with handles to each line. When you plot a vector, MATLAB is able to simply store the vector in one address and call it once when plotting. However, when each point is called individually, MATLAB has to store each point in a separate location in memory and call all of them individually and graphically handle each point completely separately.
Per request here is the edit
plot(t1(A),plot_vals(A))
I want to make a plot that discontinues at one point using Matlab.
This is what the plot looks like using scatter:
However, I would like to the plot to be a smooth curve but not scattered dots. If I use plot, it would give me:
I don't want the vertical line.
I think I can break the function manually into two pieces, and draw them separately on one figure, but the problem is that I don't know where the breaking point is before hand.
Is there a good solution to this? Thanks.
To find the jump in the data, you can search for the place where the derivative of the function is the largest:
[~,ind] = max(diff(y));
One way to plot the function would be to set that point to NaN and plotting the function as usual:
y(ind) = NaN;
plot(x,y);
This comes with the disadvantage of losing a data point. To avoid this, you could add a data point with value NaN in the middle:
xn = [x(1:ind), mean([x(ind),x(ind+1)]), x(ind+1:end)];
yn = [y(1:ind), NaN, y(ind+1:end)];
plot(xn,yn);
Another solution would be to split the vectors for the plot:
plot(x(1:ind),y(1:ind),'-b', x(ind+1:end),y(ind+1:end),'-b')
All ways so far just handle one jump. To handle an arbitrary number of jumps in the function, one would need some knowledge how large those jumps will be or how many jumps there are. The solution would be similar though.
you should iterate through your data and find the index where there is largest distance between two consecutive points. Break your array from that index in two separate arrays and plot them separately.
I have a huge sparse matrix (1,000 x 1,000,000) that I cannot load on matlab (not enough RAM).
I want to visualize this matrix to have an idea of its sparsity and of the differences of the values.
Because of the memory constraints, I want to proceed as follows:
1- Divide the matrix into 4 matrices
2- Load each matrix on matlab and visualize it so that the colors give an idea of the values (and of the zeros particularly)
3- "Stick" the 4 images I will get in order to have a global idea for the original matrix
(i) Is it possible to load "part of a matrix" in matlab?
(ii) For the visualization tool, I read about spy (and daspect). However, this function only enables to visualize the non-zero values indifferently of their scales. Is there a way to add a color code?
(iii) How can I "stick" plots in order to make one?
If your matrix is sparse, then it seems that the currently method of storing it (as a full matrix in a text file) is very inefficient, and certainly makes loading it into MATLAB very hard. However, I suspect that as long as it is sparse enough, it can still be leaded into MATLAB as a sparse matrix.
The traditional way of doing this would be to load it all in at once, then convert to sparse representation. In your case, however, it would make sense to read in the text file, one line at a time, and convert to a MATLAB sparse matrix on-the-fly.
You can find out if this is possible by estimating the sparsity of your matrix, and using this to see if the whole thing could be loaded into MATLAB's memory as a sparse matrix.
Try something like: (untested code!)
% initialise sparse matrix
sparse_matrix = sparse(num_rows, num_cols);
row_num = 1;
fid = fopen(filename);
% read each line of text file in turn
while ~feof(fid)
this_line = fscanf(fid, '%f');
% add row to sparse matrix (note transpose, which I think is required)
sparse_matrix(row_num, :) = this_line';
row_num = row_num + 1;
end
fclose(fid)
% visualise using spy
spy(sparse_matrix)
Visualisation
With regards to visualisation: visualising a sparse matrix like this via a tool like imagesc is possible, but I believe it may internally create the full matrix – maybe someone can confirm if this is true or not. If it does, then it's going to cause you memory problems.
All spy is really doing is plotting in 2D the locations of the non-zero elements. You can fairly easily write your own spy function, which can have different coloured or sized points depending on the values at each location. See this answer for some examples.
Saving sparse matrices
As I say above, the method your matrix is saved as is pretty inefficient – for a matrix with 10% sparsity, around 95% of your text file will be a zero or a space. I don't know where this data has come from, but if you have any control over its creation (e.g. it comes from another program you have written) it would make much more sense to save only the non-zero elements in the format row_idx, col_idx, value.
You can then use spconvert to import the sparse matrix directly.
One of the simplest methods (if you can actually store the full sparse matrix in RAM) is to use gnuplot to visualize the sparisty pattern.
I was able to spy matrices of size 10-20GB using gnuplot without problems. But make sure you use png or jpeg formats to output the image. Note that you don't need the value of the non-zero entry only the integers (row, col). And plot them "plot "row_col.dat" using 1:2 with points".
This chooses your row as x axis and cols as your y axis and start plotting the non-zero entries. It is very easy to do this. This is the most scalable solution I know. Gnuplot works at decent speed even for very large datasets (>10GB of [row, cols]), but Matlab just hangs (with due respect)
I use imagesc() to visualise arrays. It scales the values in array to values between 0 and 1, then plots the array like a greyscale bitmap image (of course you can change the colormap to make it easier to see detail).
I have a time series of temperature profiles that I want to interpolate, I want to ask how to do this if my data is irregularly spaced.
Here are the specifics of the matrix:
The temperature is 30x365
The time is 1x365
Depth is 30x1
Both time and depth are irregularly spaced. I want to ask how I can interpolate them into a regular grid?
I have looked at interp2 and TriScatteredInterp in Matlab, however the problem are the following:
interp2 works only if data is in a regular grid.
TriscatteredInterp works only if the vectors are column vectors. Although time and depth are both column vectors, temperature is not.
Thanks.
Function Interp2 does not require for a regularly spaced measurement grid at all, it only requires a monotonic one. That is, sampling positions stored in vectors depths and times must increase (or decrease) and that's all.
Assuming this is indeed is the situation* and that you want to interpolate at regular positions** stored in vectors rdepths and rtimes, you can do:
[JT, JD] = meshgrid(times, depths); %% The irregular measurement grid
[RT, RD] = meshgrid(rtimes, rdepths); %% The regular interpolation grid
TemperaturesOnRegularGrid = interp2(JT, JD, TemperaturesOnIrregularGrid, RT, RD);
* : If not, you can sort on rows and columns to come back to a monotonic grid.
**: In fact Interp2 has no restriction for output grid (it can be irregular or even non-monotonic).
I would use your data to fit to a spline or polynomial and then re-sample at regular intervals. I would highly recommend the polyfitn function. Actually, anything by this John D'Errico guy is incredible. Aside from that, I have used this function in the past when I had data on a irregularly spaced 3D problem and it worked reasonably well. If your data set has good support, which I suspect it does, this will be a piece of cake. Enjoy! Hope this helps!
Try the GridFit tool on MATLAB central by John D'Errico. To use it, pass in your 2 independent data vectors (time & temperature), the dependent data matrix (depth) along with the regularly spaced X & Y data points to use. By default the tool also does smoothing for overlapping (or nearly) data points. If this is not desired, you can override this (and other options) through a wide range of configuration options. Example code:
%Establish regularly spaced points
num_points = 20;
time_pts = linspace(min(time),max(time),num_points);
depth_pts = linspace(min(depth),max(depth),num_points);
%Run interpolation (with smoothing)
Pest = gridfit(depth, time, temp, time_pts, depth_pts);
I have a Matlab figure I want to use in a paper. This figure contains multiple cdfplots.
Now the problem is that I cannot use the markers because the become very dense in the plot.
If i want to make the samples sparse I have to drop some samples from the cdfplot which will result in a different cdfplot line.
How can I add enough markers while maintaining the actual line?
One method is to get XData/YData properties from your curves follow solution (1) from #ephsmith and set it back. Here is an example for one curve.
y = evrnd(0,3,100,1); %# random data
%# original data
subplot(1,2,1)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
%# reduced data
subplot(1,2,2)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
xdata = get(h,'XData');
ydata = get(h,'YData');
set(h,'XData',xdata(1:5:end));
set(h,'YData',ydata(1:5:end));
Another method is to calculate empirical CDF separately using ECDF function, then reduce the results before plotting with PLOT.
y = evrnd(0,3,100,1); %# random data
[f, x] = ecdf(y);
%# original data
subplot(1,2,1)
plot(x,f,'*')
%# reduced data
subplot(1,2,2)
plot(x(1:5:end),f(1:5:end),'r*')
Result
I know this is potentially unnecessary given MATLAB's built-in functions (in the Statistics Toolbox anyway) but it may be of use to other viewers who do not have access to the toolbox.
The empirical CMF (CDF) is essentially the cumulative sum of the empirical PMF. The latter is attainable in MATLAB via the hist function. In order to get a nice approximation to the empirical PMF, the number of bins must be selected appropriately. In the following example, I assume that 64 bins is good enough for your data.
%# compute a histogram with 64 bins for the data points stored in y
[f,x]=hist(y,64);
%# convert the frequency points in f to proportions
f = f./sum(f);
%# compute the cumulative sum of the empirical PMF
cmf = cumsum(f);
Now you can choose how many points you'd like to plot by using the reduced data example given by yuk.
n=20 ; % number of total data markers in the curve graph
M_n = round(linspace(1,numel(y),n)) ; % indices of markers
% plot the whole line, and markers for selected data points
plot(x,y,'b-',y(M_n),y(M_n),'rs')
verry simple.....
try reducing the marker size.
x = rand(10000,1);
y = x + rand(10000,1);
plot(x,y,'b.','markersize',1);
For publishing purposes I tend to use the plot tools on the figure window. This allow you to tweak all of the plot parameters and immediately see the result.
If the problem is that you have too many data points, you can:
1). Plot using every nth sample of the data. Experiment to find an n that results in the look you want.
2). I typically fit curves to my data and add a few sparsely placed markers to plots of the fits to differentiate the curves.
Honestly, for publishing purposes I have always found that choosing different 'LineStyle' or 'LineWidth' properties for the lines gives much cleaner results than using different markers. This would also be a lot easier than trying to downsample your data, and for plots made with CDFPLOT I find that markers simply occlude the stairstep nature of the lines.