I have written code to plot data from very large .txt files (20Gb to 60Gb). The .txt files contain two columns of data, that represent the outputs of two sensors from an experiment that I did. The reason the data files are so large is that the data was recorded at 4M samples/s.
The code works well for plotting relatively small .txt files (10Gb), however when I try to plot my larger data files (60Gb) I get the following error message:
Attempted to access TIME(0); index must be a
positive integer or logical.
Error in textscan_loop (line 17)
TIME =
((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift
Time along
The basic idea behind my code is to conserve RAM by reading Nlines of data from .txt on disk to Matlab variable C in RAM, plotting C then clearing C. This process occurs in loop so the data is plotted in chunks until the end of the .txt file is reached. The code can be found below:
Nlines = 1e6; % set numbe of lines to sample per cycle
sample_rate = (1); %sample rate
DECE= 1000;% decimation factor
TIME = (0:sample_rate:sample_rate*((Nlines)-1));%first inctance of time vector
format = '%f\t%f';
fid = fopen('H:\PhD backup\Data/ONK_PP260_G_text.txt');
while(~feof(fid))
C = textscan(fid, format, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point you need the memory!
clearvars C ;
TIME = ((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift Time along
plot((TIME(1:DECE:end)),(d(1:DECE:end,:)))%plot and decimate
hold on;
clearvars d;
end
fclose(fid);
I think the while loop does around 110 cycles before the code stops executing and the error message is displayed, I know this because the graph shows around 110e7 data points and the loop processes 1e6 data points at a time.
If anyone knows why this error might be occurring please let me know.
Cheers,
Jim
The error that you encounter is in fact not in the plotting, but in the line of reference.
Though I have been unable to reproduce the exact error, I suspect it to be related to this:
Time = 1:0
Time(end)
In any case, the way forward is clear. You need to run this code with dbstop if error and observe all relevant variables in the line that throws the error.
From here you will likely figure out what is causing the problem, hopefully just something simple like your code being unable to deal with data size that is an exact multiple of 1000 or so.
Trying to use plot for big data is problematic as matlab is trying to plot every single data point.
Obviously the screen will not display all of these points (many will overlap), and therefore it is recommended to plot only the relevant points. One could subsample and do this manually as you seem to have tried, but fortunately we have a ready to use solution for this:
The Plot (Big) File Exchange Submission
Here is the introduction:
This simple tool intercepts data going into a plot and reduces it to
the smallest possible set that looks identical given the number of
pixels available on the screen. It then updates the data as a user
zooms or pans. This is useful when a user must plot a very large
amount of data and explore it visually.
This works with MATLAB's built-in line plot functions, allowing the
functionality of those to be preserved.
Instead of:
plot(t, x);
One could use:
reduce_plot(t, x);
Most plot options, such as multiple series and line properties, can be
passed in too, such that 'reduce_plot' is largely a drop-in
replacement for 'plot'.
h = reduce_plot(t, x(1, :), 'b:', t, x(2, :), t, x(3, :), 'r--*');
This function works on plots where the "x" data is always increasing,
which is the most common, such as for time series.
Related
I have a piece of MATLAB code that works fine, but I wanted to know is there any faster way of performing the same task, where each .csv file is a 768*768 dimension matrix
Current code:
for k = 1:143
matFileName = sprintf('ang_thresholded%d.csv', k);
matData = load(matFileName);
imshow(matData)
end
Any help in this regard will be very helpful. Thank You!
In general, its better to separate the loading, computational and graphical stuff.
If you have enough memory, you should try to change your code to:
n_files=143;
% If you know the size of your images a priori:
matData=zeros( 768, 768,n_files); % prealocate for speed.
for k = 1:n_files
matFileName = sprintf('ang_thresholded%d.csv', k);
matData(:,:,k) = load(matFileName);
end
seconds=0.01;
for k=1:n_Files
%clf; %Not needed in your case, but needed if you want to plot more than one thing (hold on)
imshow(matData(:,:,k));
pause(seconds); % control "framerate"
end
Note the use of pause().
Here is another option using Matlab's data stores which are designed to work with large datasets or lots of smaller sets. The TabularTextDatastore is specifically for this kind of text based data.
Something like the following. However, note that since I don't have any test files it is sort of notional example ...
ttds = tabularTextDatastore('.\yourDirPath\*.csv'); %Create the data store
while ttds.hasdata %This turns false after reading the last file.
temp = read(ttds); %Returns a Matlab table class
imshow(temp.Variables)
end
Since it looks like your filenames' numbering is not zero padded (e.g. 1 instead of 001) then the file order might get messed up so that may need addressed as well. Anyway I thought this might be a good alternative approach worth considering depending on what else you want to do with the data and how much of it there might be.
I'm trying to split an audio file into 30 millisecond disjoint intervals using Matlab. I have the following code at the moment:
clear all
close all
% load the audio file and get its sampling rate
[y, fs] = audioread('JFK_ES156.wav');
for m = 1 : 6000
[t(m), fs] = audioread('JFK_ES156.wav', [(m*(0.03)*fs) ((m+1)*(0.03)*fs)]);
end
But the problem is that I get the following error:
In an assignment A(I) = B, the number of elements in B and I
must be the same.
Error in splitting (line 12)
[t(m), fs] = audioread('JFK_ES156.wav', [(m*(0.03)*fs)
((m+1)*(0.03)*fs)]);
I don't see why there's a mismatch in the number of elements in B and I and how to solve this. How can I get past this error? Or is there just an easier way to split the audio file (maybe another function I don't know about or something)?
I think the easiest way to split audio is to just load it and use the vec2mat function. so you would have something like this;
[X,Fs] = audioread('JFK_ES156.wav');
%Calculate how many samples you need to capture 30ms of audio
matSize = Fs*0.3;
%Pay attention to that apostrophe. Makes sure samples are stored in columns
%rather than rows.
output = vec2mat(x,matSize)';
%You can now have your audio split up into the different columns of your matrix.
%You can call them by using the column calling command for matrices.
%Plot first 30ms of audio
plot(output(:,1));
%You can join the audio back together using this command.
output = output(:);
Hope that helps. Another good thing about this method is that it keeps all your data in one place!
Edit : One thing I thought of, you may get a problem with this depending on your vector size. But I think vec2mat actually zeroPads your vector. Not a big thing, but if you're moving back and forth between the two, then it might be a good idea to have another variable that stores the original length of your signal.
You should just use the variable y and reshape it to form your split audio. For example,
chunk_size = fs*0.03;
y_chunks = reshape(y, chunk_size, 6000);
That will give you a matrix with each column a 30 ms chunk. This code will also be faster than reading small segments from file in a loop.
As hiandbaii suggested you could also use cell array. Make sure you clear your existing variables before that. Not clearing the array t is probably the reason you got the error "Cell contents assignment to a non-cell array object."
Your original error is because you cannot assign a vector with scalar indexing. That is, 'm' is a scalar, but your audioread call is returning a vector. This is what the error says about mismatch in size of I and B. You could also fix that by making t a 2-D array and use an assignment like
[t(m,:), fs] =
It appears that each 30 ms segment is not equal to one sample. That would be the only case where your code works. i.e. 0.03*fs != 1.
You could try using cells instead.. i.e. replace t(m) with t{m}
My data is x,y co-ordinates in multiple files
a=dir('*.mat')
b={a(:).name}
to load the filenames in a cell array
How do I use a loop to sequentially load one column of data from each file into consecutive rows of a new/separate array......?
I've been doing it individually using e.g.
Load(example1.mat)
A(:,1)=AB(:,1)
Load(example2.mat)
A(:,2)=AB(:,1)
Load(example3.mat)
A(:,3)=AB(:,1)
Obviously very primitive and time consuming!!
My Matlab skills are weak so any advice gratefully received
Cheers
Many thanks again, I'm still figuring out how to read the code but I used it like this;
a=dir('*.mat');
b={a(:).name};
test1=zeros(numel(b),1765);
for k=1:numel(b) S=load(b{k});
I then used the following code to create a PCA cluster plot
test1(k,:)=S.AB(:,2); end [wcoeff,score,latent,tsquared,explained] = pca(test1,... 'VariableWeights','variance');
c3 = wcoeff(:,1:3) coefforth = inv(diag(std(test1)))*wcoeff; I = c3'*c3 cscores = zscore(test1)*coefforth;
figure() plot(score(:,1),score(:,2),'+') xlabel('1st Principal Component') ylabel('2nd Principal Component') –
I was using 'gname' to label the points on the cluster plot but found that the point were simply labelled from 1 to the number of rows in the array.....I was going to ask you about this but I found out simply through trial and error if I used 'gname(b)' this labels the points with the .names listed in b.....
However the clusterplot starts to look very busy/messy once I have labelled quite a few points so now I am wondering is is possible to extract the filenames into a list by dragging round or selecting a few points, I think it is possible as I have read a few related topics.....but any tips/advice around gname or labelled/extracting labels from clusterplots would be greatly appreciated. Apologies again for my formatting I'm still getting used to this website!!!
Here is a way to do it. Hopefully I got what you wanted correctly :)
The code is commented but please ask any questions if something is unclear.
a=dir('*.mat');
b={a(:).name};
%// Initialize the output array. Here SomeNumber depends on the size of your data in AB.
A = zeros(numel(b),SomeNumber);
%// Loop through each 'example.mat' file
for k = 1:numel(b)
%// ===========
%// Here you could do either of the following:
1)
%// Create a name to load with sprintf. It does not require a or b.
NameToLoad = sprintf('example%i.mat',k);
%// Load the data
S = load(NameToLoad);
2)
%// Load directly from b:
S = load(b{k});
%// ===========
%// Now S is a structure containing every variable from the exampleX.mat file.
%// You can access the data using dot notation.
%// Store the data into rows of A
A(k,:) = S.AB(:,1);
end
Hope that is what you meant!
I'm creating a code to do data analysis that will run on a server. The code is supposed to spit a pdf file with 3 plots on it.
I have created a code that generates the plot
fig = figure;
for i = 1:3
%do some calculation to find, X, Y and fit
subplot(3,1,i)
scatter(X,Y)
hold on
plot(X,fit)
end
print (fig, '-dpdf','fig.pdf')
X, Y, and fit are calculated/imported parameters. The output of this code is a pdf document with only the last plot on it (missing the first two).
How can I print all three of them to file?
I tried your code on my CPU (X,Y and fit were generated randomly) and it works fine, so the bug may come from the interaction of this snip of code with you "%do some calculations block"
I would suggest to add a "hold off" command before the end of the for loop ...
gus
I am a complete novice at using matlab and am trying to work out if there is a way of optimising my code. Essentially I have data from model outputs and I need to plot them using matlab. In addition I have reference data (with 95% confidence intervals) which I plot on the same graph to get a visual idea on how close the model outputs and reference data is.
In terms of the model outputs I have several thousand files (number sequentially) which I open in a loop and plot. The problem/question I have is whether I can preprocess the data and then plot later - to save time. The issue I seem to be having when I try this is that I have a legend which either does not appear or is inaccurate.
My code (apolgies if it not elegant):
fn= xlsread(['tbobserved' '.xls']);
time= fn(:,1);
totalreference=fn(:,4);
totalreferencelowerci=fn(:,6);
totalreferenceupperci=fn(:,7);
figure
plot(time,totalrefrence,'-', time, totalreferencelowerci,'--', time, totalreferenceupperci,'--');
xlabel('Year');
ylabel('Reference incidence per 100,000 population');
title ('Total');
clickableLegend('Observed reference data', 'Totalreferencelowerci', 'Totalreferenceupperci','Location','BestOutside');
xlim([1910 1970]);
hold on
start_sim=10000;
end_sim=10005;
h = zeros (1,1000);
for i=start_sim:end_sim %is there any way of doing this earlier to save time?
a=int2str(i);
incidenceFile =strcat('result_', 'Sim', '_', a, 'I_byCal_total.xls');
est_tot=importdata(incidenceFile, '\t', 1);
cal_tot=est_tot.data;
magnitude=1;
t1=cal_tot(:,1)+1750;
totalmodel=cal_tot(:,3)+cal_tot(:,5);
h(a)=plot(t1,totalmodel);
xlim([1910 1970]);
ylim([0 500]);
hold all
clickableLegend(h(a),a,'Location','BestOutside')
end
Essentially I was hoping to have a way of reading in the data and then plot later - ie. optimise the code.
I hope you might be able to help.
Thanks.
mp
Regarding your issue concerning
I have a legend which either does not
appear or is inaccurate.
have a look at the following extracts from your code.
...
h = zeros (1,1000);
...
a=int2str(i);
...
h(a)=plot(t1,totalmodel);
...
You are using a character array as index. Instead of h(a) you should use h(i). MATLAB seems to cast the character array a to double as shown in the following example with a = 10;.
>> double(int2str(10))
ans = 49 48
Instead of h(10) the plot handle will be assigned to h([49 48]) which is not your intention.