I have a function that imports data collected in a txt file using the following code:
FILE = fopen(textDataFileName);
FIC = textscan(FILE, '%s');
FICu = FIC{1,1}(:,:);
n = numel(FICu);
for i = 1:n
FICun = str2double(FICu{i,1});
FICa(i,1) = FICun;
end
After the data is imported my function extracts all the necessary data and then I have other functions that do data analysis for me. My issue though, is that the function as a whole is slowed down by the above for loop. Originally the for loop wasn't a problem as the data set was relatively small; however, new data is appended to the text file everyday and thus the for loop has to contend with larger and larger data sets (the size of which cannot be predicted before importing). Does anyone have any easy way to vectorize my for loop and accomplish the same thing?
As a quick note, changing the format specifier does not result in the behavior one would expect. In fact, changing the format to %f (for floating point) or %d (for integer) causes the function to skip most of the data in the file.
Another update:
My code is now as follows:
FILE = fopen(textDataFileName);
FIC = textscan(FILE, '%s');
FICu = FIC{1,1}(:,:);
n = numel(FICu);
FICa = zeros(n,1);
FICa = str2double(FICu(:,1));
This shaved 2s off the time it takes to complete. Any suggestions? (Also, please note the issue with changing the format specifier; it did not work as one would expect.)
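For reference, a minimal version of the vectorized import (a sketch: the zeros preallocation above is redundant, since str2double returns the whole vector in a single assignment, and fclose is added for tidiness):
FILE = fopen(textDataFileName);
FIC = textscan(FILE, '%s');   % read every whitespace-delimited token as text
fclose(FILE);
FICa = str2double(FIC{1});    % vectorized conversion; non-numeric tokens become NaN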
I am currently working with a script that saves matrices as .mat files built from other .mat files. I need to save 96 separate files, so I have a loop that goes through the matrix names. The matrices must be saved under specific titles, which I have stored in cell arrays. However, when I use the save(filename,variable) function, I get an error saying:
Error using save
Must be a text scalar.
Error in File_Creator (line 35)
save(name,fname);
My matrices need to be named 'PHI_Af', 'PHI_Am', and so on up to 'SLR_EF' (every cr value needs a matrix for every par value). Here is what I am currently attempting:
cr = {'Af','Am','As','Aw','BS','BW','Cs','Cw','Cf','Ds','Dw','Df','ET','EF'};
par = {'PHI','BLD','KS','LAMBDA','PSIS','SLR'};
underscore = {'_'};
%% i and j are parameters in a loop where i = 1:length(par) and j = 1:length(cr)
%% f is the variable currently storing the matrix
s.(horzcat(par{i},underscore{1},cr{j})) = f;
name = string(strcat(par{i},'_',cr{j},'.mat'));
fname = string(s.(horzcat(par{i},underscore{1},cr{j})));
save(name,fname);
When I replace fname with a generic string, e.g. 'f', the command runs, but every matrix is saved under the same variable name ('f'), which makes it extremely difficult to load them all in the same script later.
I hope somebody can tell me what I'm doing wrong or provide me with a better solution. Please let me know if I can provide any more information.
Thank you
Assuming that the matrix, f, changes in each iteration of the loop (due to some other code you didn't post), it seems like this is all the code you need:
cr = {'Af','Am','As','Aw','BS','BW','Cs','Cw','Cf','Ds','Dw','Df','ET','EF'};
par = {'PHI','BLD','KS','LAMBDA','PSIS','SLR'};
for i = 1:length(par)
for j = 1:length(cr)
% add code here that loads the matrix f
name = [par{i}, '_', cr{j}, '.mat'];
save(name, 'f');
end
end
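If you also want the variable inside each .mat file to carry the custom name (PHI_Af and so on) rather than f, one option is the -struct form of save, which writes a struct field out as a variable named after the field. A sketch along those lines, with the loading of f still assumed to happen inside the loop:
for i = 1:length(par)
    for j = 1:length(cr)
        % add code here that loads the matrix f
        vname = [par{i}, '_', cr{j}];
        s = struct();
        s.(vname) = f;                          % field name = desired variable name
        save([vname, '.mat'], '-struct', 's', vname);
    end
end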
I ran a simulation which wrote a huge file to disk. The file is a big matrix v. I can't read it all, but I really only need a portion of the matrix, say, 1:100 of the columns and rows. I'd like to do something like
vtag = dlmread('v',1:100:end, 1:100:end);
Of course, that doesn't work. I know I should have only done the following when writing to the file
dlmwrite('vtag',v(1:100:end, 1:100:end));
But I did not, and running everything again would take two more days.
Thanks
Amir
Thankfully, the dlmread function supports specifying a range to read as the third input. So if you want to read all N columns of the first 100 rows, you can specify that with the following command:
startRow = 1;
startColumn = 1;
endRow = 100;
endColumn = N;
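% dlmread ranges are zero-based, hence the -1 below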
rng = [startRow, startColumn, endRow, endColumn] - 1;
vtag = dlmread(filename, ',', rng);
EDIT Based on your clarification
Since you don't want 1:100 rows but rather 1:100:end rows, the following approach should work better for you.
You can use textscan to read chunks of data at a time. You can read a "good" row and then read in the next "chunk" of data to ignore (discarding it in the process), and continue until you reach the end of the file.
The code below is a slight modification of that idea, except that it uses the HeaderLines input to textscan, which tells the function how many lines to skip before reading data. On the first pass through the loop no lines are skipped; on every later pass, rows2skip lines are skipped, so with startRow = 1 and rows2skip = 99 the loop reads rows 1, 101, 201, and so on, matching the 1:100:end pattern you want. This lets us "jump" through the file very rapidly without any additional file operations.
startRow = 1;
rows2skip = 99;
columns = 3000;
fid = fopen(filename, 'rb');
% For now, we'll just assume you're reading in floating-point numbers
format = repmat('%f ', [1 columns]);
count = 1;
data = {};  % accumulator; grows by one cell per row kept
lines2discard = startRow - 1;
while ~feof(fid)
% Use "HeaderLines" to skip data before reading in data we care about
row = textscan(fid, format, 1, 'Delimiter', ',', 'HeaderLines', lines2discard);
data{count} = [row{:}];
% After the first time through, set the "HeaderLines" (i.e. lines to ignore)
% to be the # we want to skip between lines (much faster than alternatives!)
lines2discard = rows2skip;
count = count + 1;
end
fclose(fid);
data = cat(1, data{:});
You may need to adjust your format specifier for your own type of input.
I have a dataset of size 1069 x 38742. I want to remove every 3rd column of the matrix, so I have written a for loop to get this done.
The code is as follows:
dataTS1 = rand(1069,38742);
for i = 1:12914
    dataTS1(:,3*i) = [];
end
The problem is that it is taking a very long time to execute this code.
After searching for other methods, I also tried the following approach using logical indexing:
dataTS1 = rand(1069,38742);
for i = 1:12914
index = true(1069,size(dataTS1,2));
index(:,3*i) = false;
y = dataTS1(:,index);
end
However, for the second loop, I get the error "Index exceeds matrix dimensions."
As for the first loop, I am not sure why it is taking so long.
It takes so long because every time you 'remove' data, you actually copy a large part of your array to close the gap. (The loop also deletes the wrong columns: after each deletion the remaining columns shift left, so 3*i no longer points at the original third columns.)
Better, use (without a for loop):
dataTS1(:,3:3:end) = [];
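For reference, the logical-indexing idea from the second attempt also works once the mask is a 1-by-N row vector over the columns (the original code built a full 1069-by-N logical matrix, whose linear indices exceed the number of columns, hence the error):
keep = true(1, size(dataTS1, 2));  % one flag per column
keep(3:3:end) = false;             % drop every 3rd column
dataTS1 = dataTS1(:, keep);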
I have a text file that is ~80 MB. It has 2 columns and around 6e6 rows. I would like to import the data into MATLAB, but it is too much data for the load function. I have been playing around with the fopen function but can't get anything to work.
Ideally I would like to take the first col of data and import and eventually have it in one large array in MATLAB. If that isn't possible, I would like to split it into arrays of 34,013 in length. I would also like to do the same for the 2nd col of data.
fileID = fopen('yourfilename.txt');
formatSpec = '%f %f';
while ~feof(fileID)
    C = textscan(fileID,formatSpec,34013);
    % process C{1} and C{2} here; C is overwritten on the next pass
end
fclose(fileID);
Hope this helps.
Edit:
The reason you are getting an error is that C is a cell array with two columns, so you need to take the columns individually and handle them.
For example:
column1data = reshape(C{1},301,113);
column2data = reshape(C{2},301,113);
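As written, though, the while loop keeps only the last chunk of the file. If the goal is the asker's "one large array" per column, a sketch that accumulates the chunks instead (assuming both %f columns parse cleanly):
fileID = fopen('yourfilename.txt');
col1 = {}; col2 = {};
while ~feof(fileID)
    C = textscan(fileID, '%f %f', 34013);
    col1{end+1} = C{1};            % stash each chunk of column 1
    col2{end+1} = C{2};            % and of column 2
end
fclose(fileID);
column1data = vertcat(col1{:});    % one large array per column
column2data = vertcat(col2{:});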
You might also consider converting your file to binary format if the data file does not change between loads; it will then load much faster.
Or you can do a "transparent binary conversion" as in the function below. Only the first load of the data will be slow; all subsequent loads will be fast.
function Data = ReadTextFile(FileName,NColumns)
MatFileName = sprintf('%s.mat',FileName); % binary file name
if exist(MatFileName,'file')==2 % if it exists
S = load(MatFileName,'Data'); % load it instead of
Data = S.Data; % the original text file
return;
end
fh = fopen(FileName); % if the binary file does not exist, load data from the original text file
fh_closer = onCleanup( @() fclose(fh) ); % the file will be closed properly even in case of error
Data = fscanf(fh, repmat('%f ',1,NColumns), [NColumns,inf]);
Data = Data';
save(MatFileName,'Data'); % and make a binary "cache" of the original data for faster subsequent reading
end
Do not forget to remove the MAT file when the original data file is changed.
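Typical usage for the two-column file above might look like this (the file name is the asker's placeholder):
Data = ReadTextFile('yourfilename.txt', 2); % slow on the first call, fast afterwards
column1data = Data(:,1);
column2data = Data(:,2);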
I have this file which is a series of x, y, z coordinates of over 34 million particles and I am reading them in as follows:
parfor i = 1:Ntot
x0(i,1)=fread(fid, 1, 'real*8')';
y0(i,1)=fread(fid, 1, 'real*8')';
z0(i,1)=fread(fid, 1, 'real*8')';
end
Is there a way to read this in without doing a loop? It would greatly speed up the read-in process. I just want three vectors with x, y, z. Other suggestions welcome. Thanks.
I do not have a machine with MATLAB, and I don't have your file to test either, but I think coordinates = fread(fid, [3, Ntot], 'real*8') should work fine.
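If that works, the three vectors follow by indexing the rows; a sketch, untested for the same reason:
coordinates = fread(fid, [3, Ntot], 'real*8'); % 3-by-Ntot, one column per particle
x0 = coordinates(1,:)';  % column vectors, matching the original x0, y0, z0
y0 = coordinates(2,:)';
z0 = coordinates(3,:)';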
Maybe fread is the function you are looking for.
You're right. Reading data in larger batches is usually a key part of speeding up file reads. Another part is pre-allocating the destination variables, for example with a zeros call.
I would do something like this:
%Pre-allocate
x0 = zeros(Ntot,1);
y0 = zeros(Ntot,1);
z0 = zeros(Ntot,1);
%Define a desired batch size. Make this as large as you can, given available memory.
batchSize = 10000;
%Use while to step through file
indexCurrent = 1; %indexCurrent is the next element which will be read
while indexCurrent <= Ntot
%At the end of the file, we may need to read less than batchSize
currentBatch = min(batchSize, Ntot-indexCurrent+1);
%Load a batch of data
tmpLoaded = fread(fid, currentBatch*3, 'real*8')';
%Deal the fread data into the desired three variables
x0(indexCurrent + (0:(currentBatch-1))) = tmpLoaded(1:3:end);
y0(indexCurrent + (0:(currentBatch-1))) = tmpLoaded(2:3:end);
z0(indexCurrent + (0:(currentBatch-1))) = tmpLoaded(3:3:end);
%Update index variable
indexCurrent = indexCurrent + batchSize;
end
Of course, make sure you test, as I have not. I'm always suspicious of off-by-one errors in this sort of work.