MATLAB: How to load an array without delimiter from file - matlab

I have been using the load function until now to load my space separated files into arrays in matlab. However this wastes lots of space for me since my values are either 0 or 1. Thus instead of writing files like
0 1 1 0
1 1 1 0
I removed the spaces to create half as big files like:
0110
1110
However now load doesn't work correctly any more (creates a matrix I think with only the first number, so 2x1 instead of 2x4).
I looked around with importdata, reading the file line by line and lots of other stuff but I couldn't find a clear solution.
So essentially I want to read a matrix from a file which doesn't have a delimiter. Every number is an element of the array
Does anyone know of a clean way to do this?
Thanks

Here is one way:
data.txt
0110
1110
MATLAB
%# read file lines as cell array of strings
fid = fopen('data.txt','rt');
C = textscan(fid, '%s', 'Delimiter','');
fclose(fid);
%# extract digits
C = cell2mat(cellfun(@(s)s-'0', C{1}, 'Uniform',false));
result:
C =
0 1 1 0
1 1 1 0
If you are really concerned about memory, you could cast the result to logical with C = logical(C), since 0 and 1 are the only possible values.
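As a rough sketch of the logical idea, assuming every line holds the same number of 0/1 characters, the file can also be read in one go with fileread and converted straight to a logical matrix. The sample file is written first only to keep the snippet self-contained.

```matlab
% Write a small sample file (stand-in for the real data.txt)
fid = fopen('data.txt', 'wt');
fprintf(fid, '0110\n1110\n');
fclose(fid);

txt  = fileread('data.txt');             % whole file as one char vector
rows = regexp(txt, '[01]+', 'match');    % one cell per run of digits (one per line)
C    = logical(vertcat(rows{:}) - '0');  % char matrix -> 0/1 doubles -> logical
```

A logical array takes 1 byte per element instead of the 8 bytes a double needs, which is where the memory saving comes from.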

Adapted from the example here:
function ret = readbin()
fid = fopen('data.txt');
ret = [];
tline = fgetl(fid);
while ischar(tline)
if isempty(ret)
ret = tline;
else
ret = [ret; tline];
end
tline = fgetl(fid);
end
% Turn char '0' into numerical 0
ret = ret - 48;
fclose(fid);
end
Subtracting 48 (ASCII code for '0') you get a numeric matrix with 1's and 0's in the appropriate places. This is the output I get at the end of the function:
K>> ret
ret =
0 1 1 0
1 1 1 0
K>> class(ret)
ans =
double

Inspired by this question: Convert decimal number to vector. Here is a suggestion, although I don't think it will work for large "numbers":
arrayfun(@str2double, str);

An alternate form of Amro's solution would be either
fid = fopen('data.txt','r');
C = textscan(fid, '%1d%1d%1d%1d');
fclose(fid);
M = cell2mat(C)
Or
fid = fopen('data.txt','r');
M_temp = textscan(fid, '%1d');
fclose(fid);
M = reshape(M_temp{1}, 4, 2)' % values arrive row by row, so fill a 4x2 matrix column-wise and transpose
These are slightly different from Amro's solution as the first reads in a %1d format number (1 character integer) and returns a cell array C that can be converted using a much simpler cell2mat command. The second bit of code reads in only a vector of those values then reshapes the result---which works okay if you already know the size of the data but may need additional work if not.
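If the size is not known in advance, one possible workaround (a sketch, assuming all rows have the same width) is to look at the first line to learn the number of columns, rewind, and let reshape work out the number of rows. The variable names nCols and vals are only illustrative.

```matlab
% Write a small sample file (stand-in for the real data.txt)
fid = fopen('data.txt', 'wt');
fprintf(fid, '0110\n1110\n');
fclose(fid);

fid = fopen('data.txt', 'r');
firstLine = fgetl(fid);               % peek at the first row
nCols = numel(strtrim(firstLine));    % its length is the number of columns
frewind(fid);                         % go back to the start of the file
vals = textscan(fid, '%1d');          % every digit as one long column vector
fclose(fid);

% Digits arrive row by row, so fill nCols-by-nRows column-wise, then transpose
M = reshape(vals{1}, nCols, []).';
```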
If your actual files are very large you may find that one of these ways is faster, although it is hard to tell without actually running them side-by-side.

Related

Matlab from text file to sparse matrix.

I have a huge text file in the following format:
1 2
1 3
1 10
1 11
1 20
1 376
1 665255
2 4
2 126
2 134
2 242
2 247
First column is the x coordinate while second column is the y coordinate.
It indicates that if I had to construct a Matrix
M = zeros(N, N);
M(1, 2) = 1;
M(1, 3) = 1;
.
.
M(2, 247) = 1;
This text file is huge and can't be brought to main memory at once. I must read it line by line. And save it in a sparse matrix.
So I need the following function:
function mat = generate( path )
fid = fopen(path);
tline = fgetl(fid);
% initialize an empty sparse matrix. (I know I assigned Mat(1, 1) = 1)
mat = sparse(1);
while ischar(tline)
tline = fgetl(fid);
if ischar(tline)
C = strsplit(tline);
end
mat(C{1}, C{2}) = 1;
end
fclose(fid);
end
But unfortunately besides the first row it just puts trash in my sparse mat.
Demo:
1 7
1 9
2 4
2 9
If I print the sparse mat I get:
(1,1) 1
(50,52) 1
(49,57) 1
(50,57) 1
Any suggestions ?
Fixing what you have...
Your problem is that C is a cell array of characters, not numbers. You need to convert the strings you read from the file into integer values. Instead of strsplit you can use functions like str2num and str2double. Since tline is a space-delimited character array of integers in this case, str2num is the easiest to use to compute C:
C = str2num(tline);
Then you just index C like an array instead of a cell array:
mat(C(1), C(2)) = 1;
Extra tidbit: If you were wondering how your demo code still worked even though C contained characters, it's because MATLAB has a tendency to automatically convert variables to the correct type for certain operations. In this case, the characters were converted to their double ASCII code equivalents: '1' became 49, '2' became 50, etc. Then it used these as indices into mat.
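The conversion described above can be checked on a single line of the demo file. This is only an illustration; the line '2 9' is taken from the demo data.

```matlab
C = strsplit('2 9');      % cell array of char: {'2', '9'}
asIndex = double(C{1});   % what MATLAB uses when a char is an index: its character code
idx = str2double(C);      % the numeric values that were actually intended
```

Here asIndex is the character code of '2' rather than the number 2, which explains entries such as (50,57) in the printed sparse matrix.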
A simpler alternative...
You don't even have to bother with all that mess above, since you can replace your entire function with a much simpler approach using dlmread and sparse like so:
data = dlmread(filePath);
mat = sparse(data(:, 1), data(:, 2), 1);
clear data; % Save yourself some memory if you don't need it any more
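Since the question says the file is too large to load at once, a middle ground may be to read it in chunks with textscan and add each chunk to the sparse matrix. This is a sketch under assumptions: the file name pairs.txt, the matrix size N, and chunkSize are all made up for the example, and N must be known (or bounded) beforehand.

```matlab
% Sample input (stand-in for the huge file): the demo pairs from the question
fid = fopen('pairs.txt', 'wt');
fprintf(fid, '%d %d\n', [1 1 2 2; 7 9 4 9]);
fclose(fid);

N = 10;            % assumed matrix dimension
chunkSize = 2;     % assumed number of lines per read; use something large in practice
mat = sparse(N, N);

fid = fopen('pairs.txt', 'r');
while true
    chunk = textscan(fid, '%f %f', chunkSize);   % read at most chunkSize lines
    if isempty(chunk{1}), break; end             % nothing left to read
    mat = mat + sparse(chunk{1}, chunk{2}, 1, N, N);
end
fclose(fid);

mat = spones(mat); % repeated pairs would have summed above 1; reset them to 1
```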

Read multiple data from a MATLAB file

I am currently trying to read data from a text file written exactly like this:
Height = 10
Length = 10
NodeX = 11
NodeY = 11
K = 10
I've written a small code like this
fileID = fopen('input.dat','r');
[a, b] = fscanf(fileID, '%s %f')
And I get the following answer:
a =
72
101
105
103
104
116
b =
1
It seems quite obvious that I am not managing to write the format specification correctly.
I would like to know how to pick a string along with a float multiple times in the same file.
As the documentation for fscanf states:
If formatSpec contains a combination of numeric and character
specifiers, then fscanf converts each character to its numeric
equivalent. This conversion occurs even when the format explicitly
skips all numeric values (for example, formatSpec is '%*d %s').
MATLAB can be annoyingly bad at reading mixed data types. One possible alternative is to read each line and split up your data using a simple regular expression:
fileID = fopen('input.dat','r');
mydata = {};
ii = 1;
while ~feof(fileID) % While we're not at the end of the file
tline = fgetl(fileID); % Get next line
mydata(ii,:) = regexp(tline, '([a-zA-Z]*) = (\d*)', 'tokens'); % quantifier inside the group so the whole name is captured
ii = ii + 1;
end
fclose(fileID);
This returns a 5 x 1 cell array where each cell contains 2 cells (slightly annoying, but you can pull them out) that match your data. In this case, mydata{1}{1} is Height and mydata{1}{2} is 10.
Edit:
And you can flatten your cell array with a reshape call:
mydata = reshape([mydata{:}], 2, [])';
Which turns mydata in this case into a 5x2 cell array.
The fscanf function is a low-level I/O function and is often not the best choice for such rather high-level file input. One alternative would be to use the textscan function, which allows quite advanced format specifications:
fileID = fopen('input.dat','r');
C = textscan(fileID,'%s = %d')
which creates a 1x2 cell array. The first cell C{1} contains another 5x1 cell, where each field contains the name of the field, e.g. 'Height'. The second cell C{2} contains a 5x1 vector containing all integer values from the file.
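To look values up by name afterwards, the two cells can be combined into a struct. The following is a sketch that assumes the names in the file are valid MATLAB field names; params is a made-up variable name, and %f is used instead of %d so the values come back as doubles.

```matlab
% Sample file with the contents shown in the question
fid = fopen('input.dat', 'wt');
fprintf(fid, 'Height = 10\nLength = 10\nNodeX = 11\nNodeY = 11\nK = 10\n');
fclose(fid);

fileID = fopen('input.dat', 'r');
C = textscan(fileID, '%s = %f');   % C{1}: names, C{2}: values
fclose(fileID);

% One field per line of the file: params.Height, params.NodeX, ...
params = cell2struct(num2cell(C{2}), C{1}, 1);
```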

MATLAB reading to the end of a binary file

I think the solution will be quite simple for somebody with some MATLAB knowhow however I do not know how to do it.
I have a binary file that I am reading with fread and I am reading the first 4 bytes of this file followed by the next 2 bytes.
I basically want this process of reading 4 bytes followed by 2 bytes repeated till the end of the file is reached.
So the number of bytes read is 4,2,4,2,4,2......
I have the following to read the first pair of data and I want this to repeat.
fileID = fopen('MyBinaryFile');
4bytes = fread(fileID, 4);
fseek(fileID, 4, 0);
2bytes = fread(fileID, 2);
Thanks in advance for any help and suggestions
I take it this is a variant of your former question MATLAB reading a mixed data type binary file.
Your goal is to read a binary file containing mixed data type. In your case it contains 2 columns:
1x single value (4 bytes) and 1x int16 value (2 bytes).
There are several ways to read this type of file. They differ in speed because some ways minimize disk access but require more temporary memory, while other ways use only the memory needed but require more disk access (= slower).
Ultimately, the 3 ways I'm going to show you produce exactly the same result.
The direct answer to this question is the version #3 below, but I encourage you to have a look at the 2 other options described here, they are both really worth understanding.
For the purpose of the example, I had to create a binary file as you described. This is done this way:
%% // write example file
A = single(linspace(-3,1,11)) ; %// a few "float" (=single) data
B = int16(-5:5) ; %// a few "int16" data
fileID = fopen('testmixeddata.bin','w');
for il=1:11
fwrite(fileID,A(il),'single');
fwrite(fileID,B(il),'int16');
end
fclose(fileID);
This creates a 2 column binary file, the columns being:
11 values of type float going from -3 to 1.
11 values of type int16 going from -5 to +5.
For future reference:
>> disp(A)
-3.0000 -2.6000 -2.2000 -1.8000 -1.4000 -1.0000 -0.6000 -0.2000 0.2000 0.6000 1.0000
>> disp(B)
-5 -4 -3 -2 -1 0 1 2 3 4 5
In each of the solutions below, the first column will be read into a variable called varSingle, and the second column into a variable called varInt16.
1) Read all data in one go - convert to proper type after
%% // SOLUTION 1 (fastest) : Read all data in one go - convert to proper type after
fileID = fopen('testmixeddata.bin');
R = fread(fileID,'uint8=>uint8') ; %// read all values, most basic data type (unsigned 8 bit integer)
fclose(fileID);
colSize = [4 2] ; %// number of byte for each column [4 byte single, 2 byte int16]
R = reshape( R , sum(colSize) , [] ) ; %// reshape data into a matrix (6 is because 4+2byte=6 byte per column)
temp = R(1:4,:) ; %// extract data for first column into temporary variable (OPTIONAL)
varSingle = typecast( temp(:) , 'single' ) ; %// convert into "single/float"
temp = R(5:end,:) ; %// extract data for second column
varInt16 = typecast( temp(:) , 'int16' ) ; %// convert into "int16"
This is my favourite method, especially for speed, because it minimizes the read/seek operations on disk and most of the post-processing is done in memory (much, much faster than disk operations).
Note that the temporary variable is only there for clarity; you can avoid it altogether if you get your indexing into the raw data right.
The key thing to understand is the use of the typecast function. And the good news is it got even faster since 2014b.
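As a small illustration of what typecast does (it reinterprets the same bytes as another type without changing them), here is a round trip on one value from the example data. The variable names are only for this sketch.

```matlab
x    = single(-2.6);
raw  = typecast(x, 'uint8');      % the 4 bytes that make up the single value
back = typecast(raw, 'single');   % reinterpret those 4 bytes as a single again

n    = typecast(int16(-5), 'uint8');  % an int16 occupies 2 bytes
same = isequal(back, x);              % true: no information is lost
```

This is why Solution 1 can read everything as uint8 first and recover the single and int16 columns afterwards.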
2) Read column by column (using "skipvalue") - 2 pass approach
%% // SOLUTION 2 : Read column by column (using "skipvalue") - 2 pass approach
col1size = 4 ; %// size of data in column 1 (in [byte])
col2size = 2 ; %// size of data in column 2 (in [byte])
fileID = fopen('testmixeddata.bin');
varSingle = fread(fileID,'*single',col2size) ; %// read all "float" values, skipping all "int16"
fseek(fileID,col1size,'bof') ; %// rewind to beginning of column 2 at the top of the file
varInt16 = fread(fileID,'*int16',col1size) ; %// read all "int16" values, skipping all "float"
fclose(fileID);
That works fine too, but it is going to be slower than method 1 above, because you have to scan the file twice. It may be a good option if the file is very large and method 1 above fails because of an out-of-memory error.
3) Read element by element
%% // SOLUTION 3 : Read element by element (slow - not recommended)
fileID = fopen('testmixeddata.bin');
varSingle=[];varInt16=[];
while ~feof(fileID)
try
varSingle(end+1) = fread(fileID, 1, '*single' ) ;
varInt16(end+1) = fread(fileID, 1, '*int16' ) ;
catch
disp('reached End Of File')
end
end
fclose(fileID);
That does work too, and if you were writing C code it would be more than ok. But in Matlab this is not the recommended way to go (your choice ultimately)
As promised, the 3 methods above will give you exactly what we wrote in the file at the beginning:
>> disp(varSingle)
-3.0000 -2.6000 -2.2000 -1.8000 -1.4000 -1.0000 -0.6000 -0.2000 0.2000 0.6000 1.0000
>> disp(varInt16)
-5 -4 -3 -2 -1 0 1 2 3 4 5
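fileID = fopen('MyBinaryFile');
kk = 1;
while ~feof(fileID)
b4 = fread(fileID, 4); % next 4 bytes
b2 = fread(fileID, 2); % following 2 bytes
if numel(b4) < 4 || numel(b2) < 2, break; end % incomplete record: end of file
bytes4(:,kk) = b4; % store each record as one column
bytes2(:,kk) = b2;
kk = kk+1;
end
fclose(fileID);
The while loop condition is ~feof, where feof tests for End-Of-File, so the loop runs as long as you haven't reached the end of your file. Note that fread already advances the file position, so no fseek is needed between the two reads.
I added the kk counter so that you store every record instead of overwriting the values on each loop iteration.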
If you want to get the data without loops, there are MATLABish ways to do that:
%'Sizes'
T = 4; %'Time record size'
D = 2; %'Date record size'
R = T+D; %'Record size'
%'Open file'
f = fopen('MyBinaryFile', 'rb');
if f < 0
error('Could not open file.');
end;
%'Read the entire file at once, and close file'
buf = fread(f, Inf, '*uint8');
fclose(f);
%'Ignore the last unpadded bytes, and reshape by the size of 1 record'
buf = reshape(buf(1:R*fix(numel(buf)/R)), R, []);
%'Pinpoint the data'
time_bytes = buf( 1: T, :);
date_bytes = buf(T+1:T+D, :);

Exporting blank values into a .txt file - MATLAB

I'm currently trying to export multiple matrices of unequal lengths into a delimited .txt file thus I have been padding the shorter matrices with 0's such that dlmwrite can use horzcat without error:
dlmwrite(filename{1},[a,b],'delimiter','\t')
However ideally I do not want the zeroes to appear in the .txt file itself - but rather the entries are left blank.
Currently the .txt file looks like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887 0
61825 0
62785 0
63942 0
65159 0
66304 0
67509 0
68683 0
69736 0
70782 0
But I want it to look like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887
61825
62785
63942
65159
66304
67509
68683
69736
70782
Is there any way I can do this? Is there an alternative to dlmwrite that does not require matrices of equal lengths?
If a is always longer than b you could split vector a into two vectors of same length as vector b and the rest:
a = [1 2 3 4 5 6 7 8]';
b = [9 8 7 ]';
len = numel(b);
dlmwrite( 'foobar.txt', [a(1:len), b ], 'delimiter', '\t' );
dlmwrite( 'foobar.txt', a(len+1:end), 'delimiter', '\t', '-append');
You can read in the numeric data and convert to string and then add proper whitespaces to have the final output as string based cell array, which you can easily write into the output text file.
Stage 1: Get the cell of strings corresponding to the numeric data from column vector inputs a, b, c and so on -
%// Concatenate all arrays into a cell array with numeric data
A = [{a} {b} {c}] %// Edit this to add more columns
%// Create a "regular" 2D shaped cell array to store the cells from A
lens = cellfun('length',A)
max_lens = max(lens)
A_reg = cell(max_lens,numel(lens))
A_reg(:) = {''}
A_reg(bsxfun(@le,[1:max_lens]',lens)) = cellstr(num2str(vertcat(A{:}))) %//'
%// Create a char array that has string data from input arrays as strings
wsp = repmat({' '},max_lens,1) %// Create whitespace cell array
out_char = [];
for iter = 1:numel(A)
out_char = [out_char char(A_reg(:,iter)) char(wsp)]
end
out_cell = cellstr(out_char)
Stage 2: Now, that you have out_cell as the cell array that has the strings to be written to the text file, you have two options next for the writing operation itself.
Option 1 -
dlmwrite('results.txt',out_cell(:),'delimiter','')
Option 2 -
outfile = 'results.txt';
fid = fopen(outfile,'w');
for row = 1:numel(out_cell)
fprintf(fid,'%s\n',out_cell{row});
end
fclose(fid);
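For the two-column case in the question, a plain fprintf loop may be enough. This is only a sketch under the assumption that a is the longer column vector; the file name padded.txt and the sample values are made up.

```matlab
a = [1 2 3 4]';   % longer column
b = [9 8]';       % shorter column

fid = fopen('padded.txt', 'w');
for k = 1:numel(a)
    if k <= numel(b)
        fprintf(fid, '%g\t%g\n', a(k), b(k));  % both columns present
    else
        fprintf(fid, '%g\t\n', a(k));          % second column left blank
    end
end
fclose(fid);
```

No zero padding is needed because each row is written separately.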

how to read a matrix from a text file in matlab

I have a text file which has 500 columns and 500 rows, of numerical(integer) values . Every element in the row is separated by a tab. I want to read this file as a matrix in matlab. Example(my text file is like this):
1 2 2 1 1 2
0 0 0 1 2 0
1 2 2 1 1 2
0 0 0 1 2 0
And after reading this text file as a matrix (a[]) in matlab I want to do transpose.
Help me.
You can use importdata.
Something like:
filename = 'myfile01.txt';
delimiterIn = '\t';
A = importdata(filename,delimiterIn);
A_trans = A';
If your file starts with header lines, pass their count as a third argument (headerlinesIn, the number of lines before the actual data starts). In that case importdata typically returns a struct, with the numbers in its data field.
Adapted from the MATLAB documentation for importdata.
Have you tried load with the -ascii option?
For example
a = load('myfile.txt', '-ascii'); % read the data
a = a.'; %' transpose
% Pre-allocate matrix
Nrow=500; Ncol=500;
a = zeros(Nrow,Ncol);
% Read file
fid = fopen('yourfile.txt','r');
for i = 1:Nrow
a(i,:) = cell2mat(textscan(fid, repmat('%d ', 1, Ncol), 1)); % read one row per call
end
fclose(fid);
% Transpose matrix
a_trans = a.';
You could simply do:
yourVariable = importdata('yourFile.txt')';
%Loads data from file, transposes it and stores it into 'yourVariable'.
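On newer MATLAB releases (R2019a or later, as far as I know), readmatrix is another option for purely numeric text files. The sketch below writes a tiny tab-separated sample first; the file name myfile.txt is just an example.

```matlab
% Small tab-separated sample (stand-in for the 500x500 file)
fid = fopen('myfile.txt', 'wt');
fprintf(fid, '1\t2\t2\n0\t0\t1\n');
fclose(fid);

A = readmatrix('myfile.txt', 'Delimiter', '\t');  % numeric matrix, one row per line
A_trans = A.';                                    % plain (non-conjugate) transpose
```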