Loading data to vector rather than cell in MATLAB - matlab

I load data from text files into my MATLAB function using this code:
data = cell(h.numDirs, numDataFilesInFirstDir);
for d = 1:h.numDirs
    % Code to set fileNames, iDir
    for t = 1:size(fileNames,1)
        fId = fopen([iDir, '/', fileNames{t}]);
        % Drop the first two lines (column headers)
        for skip = 1:2
            fgets(fId);
        end
        U_temp = fscanf(fId, '%f %f', [2, Inf]);
        U_temp = U_temp'; % transpose
        data(d, t) = {U_temp(:,2)};
        fclose(fId);
    end
end
The files should all have the same length (at least across t; usually across d as well, or else I have problems later).
Should I (and how can I) simplify the code here to avoid the (possibly unnecessary) cell array?
I could scan the first data set, then use something like
data = zeros(h.numDirs, numDataFilesInFirstDir, lengthOfFirstFile)
but I don't know if that would be any better. Is that a 'better' solution/method?
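If every file really does have the same number of rows, a minimal sketch of that idea (using the same fileNames/iDir setup as above, and assuming iDir initially points at the first directory) might look like this:
% Sketch only: read the first file once just to find how many data rows each file has
firstFile = fullfile(iDir, fileNames{1});
lengthOfFirstFile = size(dlmread(firstFile, ' ', 2, 0), 1);   % the two header lines are skipped
data = zeros(h.numDirs, numDataFilesInFirstDir, lengthOfFirstFile);
for d = 1:h.numDirs
    % ... set fileNames, iDir as before ...
    for t = 1:size(fileNames, 1)
        U_temp = dlmread(fullfile(iDir, fileNames{t}), ' ', 2, 0);
        data(d, t, :) = U_temp(:, 2);   % errors if a file has a different length
    end
end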

I would use dlmread instead of fscanf. Choosing a data type is hard because your dimensions vary. I wouldn't pad arrays: any benefit from not using cells would be outweighed by the extra complexity and memory hit. Cell arrays are a reasonable choice, and I wouldn't worry too much about preallocation in this case. Below is a similar option using a struct with dynamic field names that embed the source directory and filename, for later reference.
data = struct();
for d = 1:h.numDirs
    % ... set fileNames, iDir as before ...
    for t = 1:size(fileNames, 1)
        file = fullfile(iDir, fileNames{t});
        % skip the two header lines (dlmread row/column offsets are zero-based)
        Utemp = dlmread(file, ' ', 2, 0);
        % field names must be valid identifiers, so sanitise the directory and file names
        dirField = matlab.lang.makeValidName(iDir);
        fileField = matlab.lang.makeValidName(fileNames{t});
        data.(dirField).(fileField) = Utemp(:, 2);
    end
end
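To pull a vector back out later, you would index with the same sanitised names; a hypothetical example (the directory and file names here are made up):
% Hypothetical directory/file names, purely to illustrate the lookup
u = data.(matlab.lang.makeValidName('dir1')).(matlab.lang.makeValidName('file01.txt'));
plot(u);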


How to sparsely read a large file in Matlab?

I ran a simulation which wrote a huge file to disk. The file is a big matrix v. I can't read it all, but I really only need a portion of the matrix, say, 1:100 of the columns and rows. I'd like to do something like
vtag = dlmread('v',1:100:end, 1:100:end);
Of course, that doesn't work. I know I should have only done the following when writing to the file
dlmwrite('vtag',v(1:100:end, 1:100:end));
But I did not, and running everything again would take two more days.
Thanks
Amir
Thankfully, the dlmread function supports specifying a range to read as the third input. So if you want to read all N columns of the first 100 rows, you can specify that with the following command:
startRow = 1;
startColumn = 1;
endRow = 100;
endColumn = N;
rng = [startRow, startColumn, endRow, endColumn] - 1;
vtag = dlmread(filename, ',', rng);
EDIT: Based on your clarification
Since you don't want rows 1:100 but rather 1:100:end, the following approach should work better for you.
You can use textscan to read chunks of data at a time. You read a "good" row, then read in the next "chunk" of data to ignore (discarding it in the process), and continue until you reach the end of the file.
The code below is a slight modification of that idea, except it utilizes the HeaderLines input to textscan, which tells the function how many lines to ignore before reading in the data. The first time through the loop, no lines are skipped; on every subsequent pass, rows2skip lines are skipped. This allows us to "jump" through the file very rapidly without any additional file operations.
startRow = 1;
rows2skip = 99;
columns = 3000;
fid = fopen(filename, 'rb');
% For now, we'll just assume you're reading in floating-point numbers
format = repmat('%f ', [1 columns]);
count = 1;
lines2discard = startRow - 1;
while ~feof(fid)
    % Use "HeaderLines" to skip data before reading in data we care about
    row = textscan(fid, format, 1, 'Delimiter', ',', 'HeaderLines', lines2discard);
    data{count} = [row{:}];
    % After the first time through, set the "HeaderLines" (i.e. lines to ignore)
    % to be the # we want to skip between lines (much faster than alternatives!)
    lines2discard = rows2skip;
    count = count + 1;
end
fclose(fid);
data = cat(1, data{:});
You may need to adjust your format specifier for your own type of input.
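For example (purely an assumption about the data, not something stated in the question), if the first column held integer IDs and the remaining columns floating-point values, the format could be built as:
% Hypothetical layout: one integer ID column followed by floating-point columns
format = ['%d ' repmat('%f ', [1, columns - 1])];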

Save a sparse array in csv

I have a huge sparse matrix a and I want to save it to a .csv file. I cannot call full(a) because I do not have enough RAM, so calling dlmwrite with a full(a) argument is not possible. Note that dlmwrite does not work with sparse matrices.
The desired .csv format is shown below. Note that the first row and first column, containing the letter labels, should be included in the .csv file, and the semicolon in the (0,0) position is necessary too.
;A;B;C;D;E
A;0;1.5;0;1;0
B;2;0;0;0;0
C;0;0;1;0;0
D;0;2.1;0;1;0
E;0;0;0;0;0
Could you please help me to tackle this problem and finally save the sparse matrix in the desired form?
You can use the csvwrite function:
csvwrite('matrix.csv',a)
You could do this iteratively, as follows:
A = sprand(20,30000,.1);
delimiter = ';';
filename = 'filecontaininghugematrix.csv';
dims = size(A);
N = max(dims);
% create names first (note: combvec requires the Deep Learning / Neural Network Toolbox)
idx = 1:26;
alphabet = dec2base(9+idx,36);
n = ceil(log(N)/log(26));
q = 26.^(1:n);
names = cell(sum(q),1);
p = 0;
for ii = 1:n
    temp = repmat({idx},ii,1);
    names(p+(1:q(ii))) = num2cell(alphabet(fliplr(combvec(temp{:})')),2);
    p = p + q(ii);
end
names(N+1:end) = [];
% formats for writing
headStr = repmat(['%s' delimiter],1,dims(2));
headStr = [delimiter headStr(1:end-1) '\n'];
lineStr = repmat(['%f' delimiter],1,dims(2));
lineStr = ['%s' delimiter lineStr(1:end-1) '\n'];
fid = fopen(filename,'w');
% write header
header = names(1:dims(2));
fprintf(fid,headStr,header{:});
% write matrix rows
for ii = 1:dims(1)
    row = full(A(ii,:));
    fprintf(fid, lineStr, names{ii}, row);
end
fclose(fid);
The names cell array is quite memory-demanding for this example. I have no time to fix that now, so think about this part yourself if it is really a problem ;) Hint: just write the header element-wise, first A;, then B;, and so on. For the rows, you can create a function that maps the index ii to the desired letter label (see the sketch below), in which case the complete first part is not necessary.
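A minimal sketch of such a mapping (a hypothetical helper, not part of the answer above) that turns a row index into a spreadsheet-style letter label:
function name = idx2name(ii)
% Map 1 -> 'A', 2 -> 'B', ..., 26 -> 'Z', 27 -> 'AA', 28 -> 'AB', ...
name = '';
while ii > 0
    r = mod(ii - 1, 26);            % 0-based position within the alphabet
    name = [char('A' + r) name];    % prepend the corresponding letter
    ii = floor((ii - 1) / 26);
end
With this, the write loop can use idx2name(ii) as the row label and the header can be written one element at a time, so the large names array never has to be built.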

Reading data from a Text File into Matlab array

I am having difficulty in reading data from a .txt file using Matlab.
I have to create a 200x128 dimension array in Matlab, using the data from the .txt file. This is a repetitive task, and needs automation.
Each row of the .txt file is a complex number of the form a+ib, written as a[space]b. A sample of my text file:
(0)
1.2 2.32222
2.12 3.113
.
.
.
3.2 2.22
(1)
4.4 3.4444
2.33 2.11
2.3 33.3
.
.
.
(2)
.
.
(3)
.
.
(199)
.
.
The row numbers (X) inside the .txt file are surrounded by brackets. My final matrix should be of size 200x128; after each (X) there are exactly 128 complex numbers.
Here is what I would do. First, delete the "(0)"-type lines from your text file (you could even use a simple shell script for that). I put the result into a file called post2.txt.
% First, load the text file into MATLAB:
A = load('post2.txt');
% Create the complex numbers from the two columns of data:
vals = A(:,1) + i*A(:,2);
% Then reshape the column of complex numbers into a matrix. reshape fills
% column-first, so build a 128x200 matrix and transpose (with .') so that
% each block of 128 values becomes one row:
mat = reshape(vals, [128, 200]).';
mat will then be a 200x128 matrix of complex data. Obviously, at this point you can put a loop around this to repeat it for multiple files (see the sketch below).
Hope that helps.
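A minimal sketch of that outer loop, assuming hypothetical cleaned files named post2_1.txt, post2_2.txt, ... (the file names and count are assumptions, not from the question):
numFiles = 10;                             % assumed number of cleaned files
allMats = cell(1, numFiles);
for k = 1:numFiles
    A = load(sprintf('post2_%d.txt', k));  % hypothetical file names
    vals = A(:,1) + 1i*A(:,2);
    allMats{k} = reshape(vals, [128, 200]).';
end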
You can read the data in using the following function:
function data = readData(aFilename, m, n)
% if no parameters were passed, use these as defaults:
if ~exist('aFilename', 'var')
    m = 128;
    n = 200;
    aFilename = 'post.txt';
end
% init some stuff:
data = nan(n, m);
formatStr = repmat('%f', 1, 2*m);
% Read in the Data:
fid = fopen(aFilename);
for ind = 1:n
    lineID = fgetl(fid);   % read and discard the "(X)" line
    dataLine = fscanf(fid, formatStr);
    dataLineComplex = dataLine(1:2:end) + dataLine(2:2:end)*1i;
    data(ind, :) = dataLineComplex;
end
fclose(fid);
(edit) This function can be improved by including the (1) parts in the format string and throwing them out:
function data = readData(aFilename, m, n)
% if no parameters were passed, use these as defaults:
if ~exist('aFilename', 'var')
    m = 128;
    n = 200;
    aFilename = 'post.txt';
end
% init format stuff: skip the "(X)" line, then read m real/imaginary pairs
formatStr = ['(%*d)\n' repmat('%f%f\n', 1, m)];
% Read in the Data:
fid = fopen(aFilename);
data = fscanf(fid, formatStr);
data = data(1:2:end) + data(2:2:end)*1i;
% reshape fills column-first, so build m x n and transpose to get n x m
data = reshape(data, m, n).';
fclose(fid);
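A usage sketch, calling it with the dimensions from the question:
data = readData('post.txt', 128, 200);   % gives a 200x128 complex matrix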

Load multiple tab delimited text files

I have boatloads of tab-delimited text files that contain numerical data in 1000x2 format.
They're named file00001.txt - file10000.txt
I would like to write a script to load each of these files and make a variable containing ONLY the 400th row of the 2nd column of each of these files.
After that I'm going to try and plot a graph with the data I collected - but that's not important here.
I would be very grateful for your help.
Edit -
My most recent endeavour is:
numfiles = 10;
mydata = cell(1, numfiles);
for k = 1:numfiles
    myfilename = sprintf('DM0000%d.txt', k);
    mydata{k} = importdata(myfilename);
end
I'm running into a few problems -
1) If numfiles is >9, the 10th file's entry in the mydata variable comes up as []. This may have something to do with the naming of my files; they're named in this fashion:
DM00000 ... DM00009, DM00010, DM00011, etc.
2) Also, this is pretty slow to load. Someone suggested using fopen; if so, where should I put it in and how?
I'm guessing it'd be somewhere along the lines of fopen('filename', 'r')?
Based on your edit, this is what I'd recommend:
numfiles = 10;
row = 400;
column = 2;
data = zeros(1, numfiles);
for k = 1:numfiles
    filename = sprintf('DM%05d.txt', k);
    fid = fopen(filename, 'r');
    tempdata = textscan(fid, '%f%f');
    fclose(fid);
    data(k) = tempdata{column}(row);
end
I've updated the format spec in sprintf to create the filenames correctly (you were missing the zero padding). I'm using textscan to import the data as doubles (change the %f to something else if required; check out the formatSpec documentation). I also changed data to be a matrix rather than a cell array. You mentioned that you want to plot the data, so it'll be easier if it's a matrix, and I couldn't see any need for a cell array here.
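For reference, the %05d conversion pads the file index with zeros to five digits, matching the DM00000-style names:
sprintf('DM%05d.txt', 9)    % returns 'DM00009.txt'
sprintf('DM%05d.txt', 10)   % returns 'DM00010.txt'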

matlab reading variables with varying lengths into the workspace

This is a follow-up question to
Reading parameters from a text file into the workspace
I am wondering, how would I read the following:
% ---------------------- details --------------------------
%---------------------------------------------------------------
% location:
lat = 54.35
lon = -2.9833
%
Eitan T suggested using:
fid = fopen(filename);
C = textscan(fid, '%[^= ]%*[= ]%f', 'CommentStyle', '%')
fclose(fid);
to obtain the information from the file and then
lat = C{2}(strcmp(C{1}, 'lat'));
lon = C{2}(strcmp(C{1}, 'lon'));
to obtain the relevant parameters. How could I alter this to read the following:
% ---------------------- details --------------------------
%---------------------------------------------------------------
% location:
lat = 54.35
lon = -2.9833
heights = 0, 2, 4, 7, 10, 13, 16, 19, 22, 25, 30, 35
Volume = 6197333, 5630000, 4958800, 4419400, 3880000, 3340600,
3146800, 2780200, 2413600, 2177000, 1696000, 811000
%
where the variable should contain all of the data points following the equal sign (up until the start of the next variable, Volume in this case)?
Thanks for your help
Here's one method, which uses some filthy string hacking and eval to get the result. This works on your example, but I wouldn't really recommend it:
fid = fopen('data.txt');
contents = textscan(fid, '%s', 'CommentStyle', '%', 'Delimiter', '\n');
contents = contents{1};
for ii = 1:length(contents)
    line = contents{ii};
    eval( [strrep(line, '=', '=['), '];'] )   % convert to valid MATLAB syntax
end
A better method would be to read each of the lines using textscan
for ii = 1:length(contents)
    idx = strfind(contents{ii}, ' = ');
    vars{ii} = contents{ii}(1:idx-1);
    vals(ii) = textscan(contents{ii}(idx+3:end), '%f', 'Delimiter', ',');
end
Now the variables vars and vals have the names of your variables, and their values. To extract the values you could do something like
ix = strmatch('lat', vars, 'exact');
lat = vals{ix};
ix = strmatch('lon', vars, 'exact');
lon = vals{ix};
ix = strmatch('heights', vars, 'exact');
heights = vals{ix};
ix = strmatch('Volume', vars, 'exact');
volume = vals{ix};
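As a side note, strmatch is deprecated in newer MATLAB releases; the same exact-match lookups can be written with strcmp (a sketch mirroring the lines above):
lat = vals{strcmp(vars, 'lat')};
lon = vals{strcmp(vars, 'lon')};
heights = vals{strcmp(vars, 'heights')};
volume = vals{strcmp(vars, 'Volume')};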
This can be accomplished using a two-step approach:
1. Read the leading string (first element), the equals sign (ignored), and the rest of the line as a string (second element).
2. Convert these strings-of-the-rest-of-the-lines to floats (second element).
There is however a slight drawback here: your lines seem to follow two formats. One is the format described in step 1; the other is a continuation of the previous line, which contains only numbers.
Because of this, an extra step is required:
1. Read the leading string (first element), the equals sign (ignored), and the rest of the line as a string (second element).
2. This will fail when the "other format" is encountered. Detect this, correct it, and continue.
3. Convert these strings-of-the-rest-of-the-lines to floats (second element).
I think this will do the trick:
fid = fopen('data.txt');
C = [];
try
    while ~feof(fid)
        % Read next set of data, assuming the "first format"
        C_new = textscan(fid, '%[^= ]%*[= ]%s', 'CommentStyle', '%', 'Delimiter', '');
        C = [C; C_new]; %#ok
        % When we have non-trivial data, check for the "second format"
        if ~all(cellfun('isempty', C_new))
            % Second format detected!
            if ~feof(fid)
                % get the line manually
                D = fgetl(fid);
                % Insert new data from fgetl
                C{2}(end) = {[C{2}{end} C{1}{end} D]};
                % Correct the cell
                C{1}(end) = [];
            end
        else
            % empty means we've reached the end
            C = C(1:end-1,:);
        end
    end
    fclose(fid);
catch ME
    % Just to make sure to not have any file handles lingering about
    fclose(fid);
    throw(ME);
end
% convert second elements to floats
C{2} = cellfun(@str2num, C{2}, 'UniformOutput', false);
If you can get rid of the multi-line Volume line, what you have written is valid MATLAB, so just invoke the parameter file as a MATLAB script using the run command.
run(scriptName)
Your only other alternative, as others have shown, is to write what will end up looking like a bastardized Matlab parser. There are definitely better ways to spend your time than doing that!