Read and parse text file in octave/matlab - matlab

I am trying to read in some values from a file in an octave program (I suspect matlab is similar), but not sure how to do it.
I have input file in the form:
x y
A B C
a_11 ... a_1n
a_21 .. a_2n
...
a_m1 ... a_mn
Where x,y are doubles, A, B, C are integers, and a_11...a_mn is a matrix.
I saw examples of how to read in just the matrix, but how can I read mixed stuff like this?

In my opinion this is not a good way of storing data. But octave offers the functionality to read this as well with dlmread:
data = dlmread (file, sep, r0, c0)
data = dlmread (file, sep, range)
If you have this text file test.csv:
1 2
1.1 2.2 3.3 4.4
1 2 3
4 5 6
7 8 9
You can read in your data like this:
integers = dlmread('test.csv', '', [0 0 0 1]);
floats = dlmread('test.csv', '', [1 0 1 3]);
matrix = dlmread('test.csv', '', 2, 0);

Related

Matlab from text file to sparse matrix.

I have a huge text file in the following format:
1 2
1 3
1 10
1 11
1 20
1 376
1 665255
2 4
2 126
2 134
2 242
2 247
First column is the x coordinate while second column is the y coordinate.
It indicates that if I had to construct a Matrix
M = zeros(N, N);
M(1, 2) = 1;
M(1, 3) = 1;
.
.
M(2, 247) = 1;
This text file is huge and can't be brought to main memory at once. I must read it line by line. And save it in a sparse matrix.
So I need the following function:
function mat = generate( path )
fid = fopen(path);
tline = fgetl(fid);
% initialize an empty sparse matrix. (I know I assigned Mat(1, 1) = 1)
mat = sparse(1);
while ischar(tline)
tline = fgetl(fid);
if ischar(tline)
C = strsplit(tline);
end
mat(C{1}, C{2}) = 1;
end
fclose(fid);
end
But unfortunately besides the first row it just puts trash in my sparse mat.
Demo:
1 7
1 9
2 4
2 9
If I print the sparse mat I get:
(1,1) 1
(50,52) 1
(49,57) 1
(50,57) 1
Any suggestions ?
Fixing what you have...
Your problem is that C is a cell array of characters, not numbers. You need to convert the strings you read from the file into integer values. Instead of strsplit you can use functions like str2num and str2double. Since tline is a space-delimited character array of integers in this case, str2num is the easiest to use to compute C:
C = str2num(tline);
Then you just index C like an array instead of a cell array:
mat(C(1), C(2)) = 1;
Extra tidbit: If you were wondering how your demo code still worked even though C contained characters, it's because MATLAB has a tendency to automatically convert variables to the correct type for certain operations. In this case, the characters were converted to their double ASCII code equivalents: '1' became 49, '2' became 50, etc. Then it used these as indices into mat.
A simpler alternative...
You don't even have to bother with all that mess above, since you can replace your entire function with a much simpler approach using dlmread and sparse like so:
data = dlmread(filePath);
mat = sparse(data(:, 1), data(:, 2), 1);
clear data; % Save yourself some memory if you don't need it any more

MATLAB reading to the end of a binary file

I think the solution will be quite simple for somebody with some MATLAB knowhow however I do not know how to do it.
I have a binary file that I am reading with fread and I am reading the first 4 bytes of this file followed by the next 2 bytes.
I basically want this process of reading 4 bytes followed by 2 bytes repeated till the end of the file is reached.
So the number of bytes read is 4,2,4,2,4,2......
I have the following to read the first pair of data and I want this to repeat.
fileID = fopen('MyBinaryFile');
4bytes = fread(fileID, 4);
fseek(fileID, 4, 0);
2bytes = fread(fileID, 2);
Thanks in advance for any help and suggestions
I take it this is a variant of your former question MATLAB reading a mixed data type binary file.
Your goal is to read a binary file containing mixed data type. In your case it contains 2 columns:
1x single value (4 bytes) and 1x int16 value (2 bytes).
There are several ways to read this type of file. They differ in speed because some ways minimize disk access but require more temporary memory, and other way use just the memory needed but require more disk access (= slower).
Ultimately, the 3 ways I'm going to show you produce exactly the same result.
The direct answer to this question is the version #3 below, but I encourage you to have a look at the 2 other options described here, they are both really worth understanding.
For the purpose of the example, I had to create a binary file as you described. This is done this way:
%% // write example file
A = single(linspace(-3,1,11)) ; %// a few "float" (=single) data
B = int16(-5:5) ; %// a few "int16" data
fileID = fopen('testmixeddata.bin','w');
for il=1:11
fwrite(fileID,A(il),'single');
fwrite(fileID,B(il),'int16');
end
fclose(fileID);
This create a 2 column binary file, the columns being:
11 values of type float going from -3 to 1.
11 values of type int16 going from -5 to +5.
For future reference:
>> disp(A)
-3.0000 -2.6000 -2.2000 -1.8000 -1.4000 -1.0000 -0.6000 -0.2000 0.2000 0.6000 1.0000
>> disp(B)
-5 -4 -3 -2 -1 0 1 2 3 4 5
In each of the solution below, the first column will be read in a variable called varSingle, and the second column in a variable called varInt16.
1) Read all data in one go - convert to proper type after
%% // SOLUTION 1 (fastest) : Read all data in one go - convert to proper type after
fileID = fopen('testmixeddata.bin');
R = fread(fileID,'uint8=>uint8') ; %// read all values, most basic data type (unsigned 8 bit integer)
fclose(fileID);
colSize = [4 2] ; %// number of byte for each column [4 byte single, 2 byte int16]
R = reshape( R , sum(colSize) , [] ) ; %// reshape data into a matrix (6 is because 4+2byte=6 byte per column)
temp = R(1:4,:) ; %// extract data for first column into temporary variable (OPTIONAL)
varSingle = typecast( temp(:) , 'single' ) ; %// convert into "single/float"
temp = R(5:end,:) ; %// extract data for second column
varInt16 = typecast( temp(:) , 'int16' ) ; %// convert into "int16"
This is my favourite method. Specially for speed because it minimizes the read/seek operations on disk, and most post calculations are done in memory (much much faster than disk operations).
Note that the temporary variable I used was only for clarity/verbose, you can avoid it altogether if you get your indexing into the raw data right.
The key thing to understand is the use of the typecast function. And the good news is it got even faster since 2014b.
2) Read column by column (using "skipvalue") - 2 pass approach
%% // SOLUTION 2 : Read column by column (using "skipvalue") - 2 pass approach
col1size = 4 ; %// size of data in column 1 (in [byte])
col2size = 2 ; %// size of data in column 2 (in [byte])
fileID = fopen('testmixeddata.bin');
varSingle = fread(fileID,'*single',col2size) ; %// read all "float" values, skipping all "int16"
fseek(fileID,col1size,'bof') ; %// rewind to beginning of column 2 at the top of the file
varInt16 = fread(fileID,'*int16',col1size) ; %// read all "int16" values, skipping all "float"
fclose(fileID);
That works too. It works fine ... but it is going to be slower than method 1 above, because you will have to scan the file twice. It may be a good option if the file is very large and method 1 above fail because of an out of memory error.
3) Read element by element
%% // SOLUTION 3 : Read element by element (slow - not recommended)
fileID = fopen('testmixeddata.bin');
varSingle=[];varInt16=[];
while ~feof(fileID)
try
varSingle(end+1) = fread(fileID, 1, '*single' ) ;
varInt16(end+1) = fread(fileID, 1, '*int16' ) ;
catch
disp('reached End Of File')
end
end
fclose(fileID);
That does work too, and if you were writing C code it would be more than ok. But in Matlab this is not the recommended way to go (your choice ultimately)
As promised, the 3 methods above will give you exactly what we wrote in the file at the beginning:
>> disp(varSingle)
-3.0000 -2.6000 -2.2000 -1.8000 -1.4000 -1.0000 -0.6000 -0.2000 0.2000 0.6000 1.0000
>> disp(varInt16)
-5 -4 -3 -2 -1 0 1 2 3 4 5
fileID = fopen('MyBinaryFile');
kk=1;
while ~feof(fileID)
bytes4(kk) = fread(fileID, 4);
fseek(fileID, 4, 0);
bytes2(kk) = fread(fileID, 2);
kk=kk+1;
end
the while loop condition is ~feof, which stands for End-Of-File. So as long as you haven't reached the end of your file it runs.
I added the kk just so you store everything and not just overwrite them each loop iteration.
If you want to get the data without loops, there are MATLABish ways to that:
%'Sizes'
T = 4; %'Time record size'
D = 2; %'Date record size'
R = T+D; %'Record size'
%'Open file'
f = fopen('MyBinaryFile', 'rb');
if f < 0
error('Could not open file.');
end;
%'Read the entire file at once, and close file'
buf = fread(f, Inf, '*uint8');
fclose(f);
%'Ignore the last unpadded bytes, and reshape by the size of 1 record'
buf = reshape(buf(1:R*fix(numel(buf)/R)), R, []);
%'Pinpoint the data'
time_bytes = buf( 1: T, :);
date_bytes = buf(T+1:T+D, :);

Read a .txt File into a MATLAB matrix

I have a .out file (.txt) in the form:
This is a text file
This file was created by Andrew on 4/5/14
Certificate Result Test #12
Time A B C D
50 4 3 8 9
55 4 8 7 4
60 8 4 1 4
65 7 1 5 1
70 4 2 2 2
How do I read the numbers in the table into a matrix, called M, in MATLAB whilst ignoring all the text at the start?
I have tried using fscan and M = dlmread(filename) but I recieve errors saying Mismatch between file and format string due to the lines of text at the start.
Thanks in advance
Use textscan with the 'HeaderLines' option:
fid = fopen('my_file.out'); % or whatever your file is called
M = textscan(fid,'%d %d %d %d %d','HeaderLines',7); % using int32 data types, change as required
fclose(fid)
Note that M is a cell array
textscan is a powerful tool, with good low level functions. There is also the more convenient 'importdata' which works for many files of this kind:
m = importdata('my.txt', ' ', 6)
m =
data: [5x5 double]
textdata: {6x5 cell}
colheaders: {'Time' 'A' 'B' 'C' 'D'}
As you can see, it not only returns the data in m.data, but you also get the column headers for free.

Adding time slice in a matlab output using fprintf

I have matlab output of a complex number, where 'modTrace' contains the complex numbers. to write my output in a file called '3pt.txt' I write the following:
modTrace
fileID = fopen('3pt.txt','w');
fprintf ( fileID,'%e+%ei\n',real ( traces ), imag ( traces ) );
fclose(fileID);
The output looks like this:
2.355387e-13+3.263411e-12i
3.037095e-12+1.848502e-12i
2.264321e-12+1.408536e-12i
3.808791e-13-1.647224e-14i
-3.249665e-14+7.954636e-15i
7.026766e-14+1.056313e-13i
and so on. Now if I want the output with time slice numbers to the left, something like:
0 2.355387e-13+3.263411e-12i
1 3.037095e-12+1.848502e-12i
2 2.264321e-12+1.408536e-12i
3 3.808791e-13-1.647224e-14i
4 -3.249665e-14+7.954636e-15i
5 7.026766e-14+1.056313e-13i
and so on, how to I edit the code?
Thank you in advance for help.
Your fprintf statement is not working the way you are intending. It prints out all the values in real(traces) before getting to imag(traces). For example,
>> v = [1 2; 3 4; 5 6]
v =
1 2
3 4
5 6
>> fprintf('%d %d\n',v(:,1),v(:,2))
1 3
5 2
4 6
To pair values the way intended, you need to transpose the matrix:
>> fprintf('%d %d\n',v.')
1 2
3 4
5 6
Following this example, you can prepend slice numbers as follows:
v = [(1:numel(traces)).' real(traces(:)) imag(traces(:))].';
fprintf(fileID,'%d %e+%ei\n',v)

Appending "in a way" multiple matrices

Say I have matrices A and B. I want to create a third matrix C where
A = [1,0,0,1]
B = [1,0,1,0]
C = [11, 00, 01, 10]
Is there such a function in Matlab? If not how would I go about creating a function that does this?
Edit: C is not literal numbers. They are concatenated values of A,B element-wise.
Edit2: The actual problem I am dealing with is I have 10 large matrices of the size [x,y] where x,y > 1000. The elements in these matrices all have 0s and 1s. No other number. What I need to accomplish is have the element [x1,y1] in matrix 1 to be appended to the element in [x1,y1] of matrix 2. and then that value to be appended to [x1,y1] of matrix 3.
Another example:
A = [1,1,1,1;
0,0,0,0]
B = [0,0,0,0;
1,1,1,1]
C = [1,0,1,0;
0,1,0,1]
And I need a matrix D where
D = [101, 100, 101, 101; 010, 011, 010, 011]
I recommend that you avoid manipulating binary numbers as strings where possible. It seems tempting, and there are cases where matlab provides a more elegant solution if you treat a binary number as a string, but you cannot perform binary arithmetic on strings. You can always work with integers as if they were binary (they are stored as bits in your machine after all), and then just display them as binary numbers using dec2bin when necessary.
A = [1,0,0,1]
B = [1,0,1,0]
C = bitshift(A,1)+B;
display(dec2bin(C));
In the other case you show in your question you could use:
A = [1,1,1,1; 0,0,0,0];
B = [0,0,0,0; 1,1,1,1];
C = [1,0,1,0; 0,1,0,1];
D = bitshift(A,2) + bitshift(B,1) + C;
You can also convert an arbitrary length row vector of zeros and ones into its decimal equivalent by defining this simple function:
mat2dec = #(x) x*2.^(size(x,2)-1:-1:0)';
This will also work for matrices too. For example
>> M = [0 0 1; 0 1 1; 0 1 0; 1 1 0; 1 1 1; 1 1 0; 1 0 0];
>> dec2bin(mat2dec(M))
ans =
001
011
010
110
111
110
100
In my experience, treating binary numbers as strings obfuscates your code and is not very flexible. For example, try adding two binary "strings" together. You have to use bin2dec every time, so why not just leave the numbers as numbers until you want to display them? You have already run into some of the issues caused by strings of different lengths too. You will be amazed how one simple change can break everything when treating numbers as strings. The worst part is that an algorithm may work great for one set of data and not for another. If all I test with is two-bit binary numbers and a three-bit number somehow sneaks its way in, I may not see an error, but my results will be inexplicably incorrect. I realize that this is a very subjective issue, and I think that I definitely stand in the minority on StackOverflow, so take it for what it's worth.
It depends how you want the output formatted. You could apply bitshift to the numerical values and convert to binary:
>> b = dec2bin(bitshift(A,1)+B)
b =
11
00
01
10
For a general matrix Digits:
>> Digits
Digits =
1 0 0 0
0 1 0 0
1 0 1 1
1 1 0 0
1 0 0 0
1 0 1 1
1 1 0 0
1 0 1 0
0 1 1 1
0 1 1 1
>> Ddec = D*(2.^(size(D,2)-1:-1:0))'; % dot product
>> dec2bin(Ddec)
ans =
1000
0100
1011
1100
1000
1011
1100
1010
0111
0111
Another way to write that is dec2bin(sum(D .* repmat(2.^(size(D,2)-1:-1:0),size(D,1),1),2)).
For your larger problem, with 10 large matrixes (say M1, M2, ..., M10), you can build the starting Digits matrix by:
Digits = [M1(:) M2(:) M3(:) M4(:) M5(:) M6(:) M7(:) M8(:) M9(:) M10(:)];
If that is reverse the order of the digits, just do Digits = fliplr(Digits);.
If you would rather not reshape anything you can compute the matrix decimal values from the matrices of digits as follows:
M = cat(3,A,B,C);
Ddec = sum(bsxfun(#times,M,permute(2.^(size(M,3)-1:-1:0),[1 3 2])),3)
I see some quite extensive answers so perhaps this is thinking too simple, but how about just this assuming you have vectors of ones and zeros representing your binary numbers:
A = [1,0,0,1];
B = [1,0,1,0];
C = 10*A + B
This should give you the numbers you want, you may want to add leading zeros. Of course this method can easily be expanded to append multiple matrices, you just need to make sure there is a factor (base) 10 between them before adding.