MATLAB query about for loop, reading in data and plotting - matlab

I am a complete novice at using matlab and am trying to work out if there is a way of optimising my code. Essentially I have data from model outputs and I need to plot them using matlab. In addition I have reference data (with 95% confidence intervals) which I plot on the same graph to get a visual idea on how close the model outputs and reference data is.
In terms of the model outputs I have several thousand files (number sequentially) which I open in a loop and plot. The problem/question I have is whether I can preprocess the data and then plot later - to save time. The issue I seem to be having when I try this is that I have a legend which either does not appear or is inaccurate.
My code (apolgies if it not elegant):
fn= xlsread(['tbobserved' '.xls']);
time= fn(:,1);
totalreference=fn(:,4);
totalreferencelowerci=fn(:,6);
totalreferenceupperci=fn(:,7);
figure
plot(time,totalrefrence,'-', time, totalreferencelowerci,'--', time, totalreferenceupperci,'--');
xlabel('Year');
ylabel('Reference incidence per 100,000 population');
title ('Total');
clickableLegend('Observed reference data', 'Totalreferencelowerci', 'Totalreferenceupperci','Location','BestOutside');
xlim([1910 1970]);
hold on
start_sim=10000;
end_sim=10005;
h = zeros (1,1000);
for i=start_sim:end_sim %is there any way of doing this earlier to save time?
a=int2str(i);
incidenceFile =strcat('result_', 'Sim', '_', a, 'I_byCal_total.xls');
est_tot=importdata(incidenceFile, '\t', 1);
cal_tot=est_tot.data;
magnitude=1;
t1=cal_tot(:,1)+1750;
totalmodel=cal_tot(:,3)+cal_tot(:,5);
h(a)=plot(t1,totalmodel);
xlim([1910 1970]);
ylim([0 500]);
hold all
clickableLegend(h(a),a,'Location','BestOutside')
end
Essentially I was hoping to have a way of reading in the data and then plot later - ie. optimise the code.
I hope you might be able to help.
Thanks.
mp

Regarding your issue concerning
I have a legend which either does not
appear or is inaccurate.
have a look at the following extracts from your code.
...
h = zeros (1,1000);
...
a=int2str(i);
...
h(a)=plot(t1,totalmodel);
...
You are using a character array as index. Instead of h(a) you should use h(i). MATLAB seems to cast the character array a to double as shown in the following example with a = 10;.
>> double(int2str(10))
ans = 49 48
Instead of h(10) the plot handle will be assigned to h([49 48]) which is not your intention.

Related

How to create a vector of the results in a for loop

I have a problem with the following code. I want to store all the values I am creating in the for loop below so that I can make a plot of it. I have tried several things, but nothing works. Does anyone know a simple method to create a vector of the results and then plot them?
dx=0.1;
t=1;
e=1;
for x=-1:dx:1
lower_bound=-100;
upper_bound=x/(sqrt(4*t*e));
e=1;
u=(1/sqrt(pi))*quad(#integ,lower_bound,upper_bound);
plot(x,u)
hold on
end
hold off
I would like to use as much of this matlab code as possible.
dx=0.1;
t=1;
e=1;
xval=[-1:dx:1].';
upper_bound = zeros(numel(xval),1);
u = zeros(numel(xval),1);
for ii=1:numel(xval)
x = xval(ii)
lower_bound=-100;
upper_bound(ii,1)=x/(sqrt(4*t*e));
u(ii,1)=(1/sqrt(pi))*quad(#integ,lower_bound,upper_bound(ii));
end
figure;
plot(xval,u)
by adding the (ii) behind your statements it saves your variables in an array. I did not use that on your lower_bound since it is a constant.
Note that I first created an array xval and called that with integers in ii, since subscriptindices must be positive integers in MATLAB. I also initialised both upper_bound and u by creating a zero matrix before the loop executes. This is handy since extending an existing vector is very memory and time consuming in MATLAB and since you know how big they will get (same number of elements as xval) you might as well use that.
I also got the plot call outside the loop, to prevent you from plotting 21 blue lines in 1 plot.

data driven curve - MATLAB

I have several sets of data that I want to fit but not all of them look the same (some look like a Gaussian with one peak, some like two Gaussians with 2 peaks or Lorentzians). I wanted to try this method
http://www.mathworks.com/matlabcentral/fileexchange/31562-data-driven-fitting-with-matlab/content/fitit.m
but the program given is not complete so I can not use it (there is no line that defines 'train' and 'test'). I am writing it so that it suits and works for my data (based on the code that it is given and the demo). I was able to find the best fit but I am also trying to use the bootstrap technique in order to find the confidence intervals. My data is xdata and ydata and they are sorted and the duplicates have been removed before I use them in my program.
cpart=cvpartition(size(xdata,1),'k',10);
tr_x=xdata(training(cpart,1));
tr_y=ydata(training(cpart,1));
tst_x=xdata(test(cpart,1));
tst_y=ydata(test(cpart,1));
all_span=linspace(0.01,0.99,99);
s=zeros(length(all_span);
for k=1:length(all_span)
f = #(tr_x,tr_y,tst_x,tst_y) norm(tst_y mylowess (tr_x, tr_y, tst_x, all_span (k)))^2
s(k) = sum(crossval(f,datax,datay,'partition',cpart));
end
[~,mj]=min(s);
n_span=all_span(mj);%n_span is the optimal span
function ys=mylowess(x1,y1,xs,span)
ys1 = smooth(x1,y1,span,'loess');
ys = interp1(x1,ys1,xs,'linear',NaN);
if any(isnan(ys))
ys(xs<x1(1)) = ys1(1);
ys(xs>x1(end)) = ys1(end);
end
So up to this point I understand the program and I have managed to find the optimal span. I want to find the confidence intervals but so far I was not able to make it work.
When I type:
NB=length(xdata);
f=#(xdata,ydata) mylowess(xdata,ydata,xdata,n_span);
yboot2 = bootstrp(NB,f,xdata,ydata)';
I get the following error
Error using griddedInterpolant
The grid vectors are not strictly monotonic increasing.
Error in interp1 (line 186)
F = griddedInterpolant(X,V,method);
Error in mylowess (line 26)
ysmooth=interp1(xdata,ysmooth1,xinput,'linear',NaN);
As I said before there are no duplicates in xdata and I have already sorted xdata before I used them in the program. Can anyone see the mistake I am making? Or is there an easier way to get the confidence intervals?
Thank you for your help.

MATLAB loading data from multiple .mat files

My data is x,y co-ordinates in multiple files
a=dir('*.mat')
b={a(:).name}
to load the filenames in a cell array
How do I use a loop to sequentially load one column of data from each file into consecutive rows of a new/separate array......?
I've been doing it individually using e.g.
Load(example1.mat)
A(:,1)=AB(:,1)
Load(example2.mat)
A(:,2)=AB(:,1)
Load(example3.mat)
A(:,3)=AB(:,1)
Obviously very primitive and time consuming!!
My Matlab skills are weak so any advice gratefully received
Cheers
Many thanks again, I'm still figuring out how to read the code but I used it like this;
a=dir('*.mat');
b={a(:).name};
test1=zeros(numel(b),1765);
for k=1:numel(b) S=load(b{k});
I then used the following code to create a PCA cluster plot
test1(k,:)=S.AB(:,2); end [wcoeff,score,latent,tsquared,explained] = pca(test1,... 'VariableWeights','variance');
c3 = wcoeff(:,1:3) coefforth = inv(diag(std(test1)))*wcoeff; I = c3'*c3 cscores = zscore(test1)*coefforth;
figure() plot(score(:,1),score(:,2),'+') xlabel('1st Principal Component') ylabel('2nd Principal Component') –
I was using 'gname' to label the points on the cluster plot but found that the point were simply labelled from 1 to the number of rows in the array.....I was going to ask you about this but I found out simply through trial and error if I used 'gname(b)' this labels the points with the .names listed in b.....
However the clusterplot starts to look very busy/messy once I have labelled quite a few points so now I am wondering is is possible to extract the filenames into a list by dragging round or selecting a few points, I think it is possible as I have read a few related topics.....but any tips/advice around gname or labelled/extracting labels from clusterplots would be greatly appreciated. Apologies again for my formatting I'm still getting used to this website!!!
Here is a way to do it. Hopefully I got what you wanted correctly :)
The code is commented but please ask any questions if something is unclear.
a=dir('*.mat');
b={a(:).name};
%// Initialize the output array. Here SomeNumber depends on the size of your data in AB.
A = zeros(numel(b),SomeNumber);
%// Loop through each 'example.mat' file
for k = 1:numel(b)
%// ===========
%// Here you could do either of the following:
1)
%// Create a name to load with sprintf. It does not require a or b.
NameToLoad = sprintf('example%i.mat',k);
%// Load the data
S = load(NameToLoad);
2)
%// Load directly from b:
S = load(b{k});
%// ===========
%// Now S is a structure containing every variable from the exampleX.mat file.
%// You can access the data using dot notation.
%// Store the data into rows of A
A(k,:) = S.AB(:,1);
end
Hope that is what you meant!

MATLAB error message when plotting big data files?

I have written code to plot data from very large .txt files (20Gb to 60Gb). The .txt files contain two columns of data, that represent the outputs of two sensors from an experiment that I did. The reason the data files are so large is that the data was recorded at 4M samples/s.
The code works well for plotting relatively small .txt files (10Gb), however when I try to plot my larger data files (60Gb) I get the following error message:
Attempted to access TIME(0); index must be a
positive integer or logical.
Error in textscan_loop (line 17)
TIME =
((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift
Time along
The basic idea behind my code is to conserve RAM by reading Nlines of data from .txt on disk to Matlab variable C in RAM, plotting C then clearing C. This process occurs in loop so the data is plotted in chunks until the end of the .txt file is reached. The code can be found below:
Nlines = 1e6; % set numbe of lines to sample per cycle
sample_rate = (1); %sample rate
DECE= 1000;% decimation factor
TIME = (0:sample_rate:sample_rate*((Nlines)-1));%first inctance of time vector
format = '%f\t%f';
fid = fopen('H:\PhD backup\Data/ONK_PP260_G_text.txt');
while(~feof(fid))
C = textscan(fid, format, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point you need the memory!
clearvars C ;
TIME = ((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift Time along
plot((TIME(1:DECE:end)),(d(1:DECE:end,:)))%plot and decimate
hold on;
clearvars d;
end
fclose(fid);
I think the while loop does around 110 cycles before the code stops executing and the error message is displayed, I know this because the graph shows around 110e7 data points and the loop processes 1e6 data points at a time.
If anyone knows why this error might be occurring please let me know.
Cheers,
Jim
The error that you encounter is in fact not in the plotting, but in the line of reference.
Though I have been unable to reproduce the exact error, I suspect it to be related to this:
Time = 1:0
Time(end)
In any case, the way forward is clear. You need to run this code with dbstop if error and observe all relevant variables in the line that throws the error.
From here you will likely figure out what is causing the problem, hopefully just something simple like your code being unable to deal with data size that is an exact multiple of 1000 or so.
Trying to use plot for big data is problematic as matlab is trying to plot every single data point.
Obviously the screen will not display all of these points (many will overlap), and therefore it is recommended to plot only the relevant points. One could subsample and do this manually as you seem to have tried, but fortunately we have a ready to use solution for this:
The Plot (Big) File Exchange Submission
Here is the introduction:
This simple tool intercepts data going into a plot and reduces it to
the smallest possible set that looks identical given the number of
pixels available on the screen. It then updates the data as a user
zooms or pans. This is useful when a user must plot a very large
amount of data and explore it visually.
This works with MATLAB's built-in line plot functions, allowing the
functionality of those to be preserved.
Instead of:
plot(t, x);
One could use:
reduce_plot(t, x);
Most plot options, such as multiple series and line properties, can be
passed in too, such that 'reduce_plot' is largely a drop-in
replacement for 'plot'.
h = reduce_plot(t, x(1, :), 'b:', t, x(2, :), t, x(3, :), 'r--*');
This function works on plots where the "x" data is always increasing,
which is the most common, such as for time series.

Extract parts of a big matrix and allocate them in new variables with loop function

I am a total beginner in MATLAB and I hope to find some help here. I have some model prediction results for 80 individuals alltogether in one large matrix. I need to extract the data for each individual from the big matrix, assign them in a new variable/matrix, do some extra calculations and then plot certain information as needed.
To do so, I am trying to write a script with a loop function but in a complicated, or maybe more accurately: in a primitive way!
Simplified Example:
My matrix is called: All_Indi_Data .... its dimension is: 600 rows x 21 columns
%Column 1: grouping variable (e.g., code or ID with values 1,2,3,4,5, etc.);
%Column 2: independent var.;
%Column 3: t;
%Column 4: OBS;
%Column 5: PRED;
i= length (All_Indi_Data);
%% First Indi.
q=1; % indicating the ID of the indi for which I want to extract the data
j=1; % variable added to insure writing start from the first row
for r=1:i
if All_Indi_Data (r,1)==q
Indi_1 (j,1:21) = All_Indi_Data (r,1:21)
j=j+1
end
end
%% Second Indi.
q=q+1
j=1
for r=1:i
if All_Indi_Data (r,1)==q
Indi_2 (j,1:21) = All_Indi_Data (r,1:21)
j=j+1
end
end
.
.
.
1) My first question is: can I allocate these data in new variables (Indi_1, Indi_2, ect.) in a more simple way with or without the loop function?!!! I would appreciate your help a lot.
2) Is there any code or any way to plot these selected parts (according to the grouping variable, e.g. data for Indi_1) from the previously mentioned big matrix without wasting a lot of time and space (wto recopying the core part of the code again and again) for the script, and using the loop function?! in other words, I would like to detect - with loop function & the grouping variable- which values are of interest and then to plot them (e.g. data in colum 3 with data from column 4 for each individual, starting from the first to the last)?!
I hope that I described my problem clearly and hope to hear something from the expert guys :) ...
Thanks a lot in advance ..
Try the following code:
for idx=1:80
pos=find(All_Indi_Data(:,1)==idx);
eval(['Indi_' num2str(idx) '=All_Indi_Data(pos,:);']);
end
What I do is: in each iteration, I search for a value of the ID, indicated in the variable idx. Note that I do not use ´i´ as the name of a variable, because Matlab uses it and ´j´ and the imaginary unit for complex numbers and that could cause problems.
Then, using find I search for the position (or positions) of All_Indi_Data in which I can find the information of that individual. Now I have in the variable ´pos´ the indexes of the rows in which there is information for the individual of interest.
Finally, using eval I extract the data for each individual into a variable. Note that eval combined with a loop makes it easy to create lots of variables. I indicate the rows I want to extract with ´pos´ and, as I want all the columns, I use just ´:´ (you could use ´1:21´ too).
With another similar loop you can plot the information you want. For example:
for idx=1:80
eval(['x=Indi_' num2str(idx) ';']);
% Now I have in X the information for this individual
%Plot the columns of x I want
plot(x(:, 3), x(:,4));
pause; %stay here until a press a key
end