MATLAB - Read Textfile (lines with different formats) line by line

I have a text file (let's call it an input file) of this type:
%My kind of input file
% Comment 1
% Comment 2
4 %Parameter F
2.745 5.222 4.888 1.234 %Parameter X
273.15 373.15 1 %Temperature Initial/Final/Step
3.5 %Parameter Y
%Matrix A
1.1 1.3 1 1.05
2.0 1.5 3.1 2.1
1.3 1.2 1.5 1.6
1.3 2.2 1.7 1.4
I need to read this file and save the values as variables or, even better, as part of different arrays. For example, after reading I should obtain Array1.F = 4; Array1.X should then be a vector of real numbers (four in this example), Array2.Y = 3.5, and Array2.A an F-by-F matrix. There are plenty of functions for reading text files, but I don't know how to read a mix of formats like this. In the past I've used fgetl/fgets to read lines, but they return strings; I've also used fscanf, but it reads the whole file as if every line had the same format. I need something that reads sequentially with predefined formats. I can easily do this in Fortran, reading line by line, because read takes a format statement. What is the equivalent in MATLAB?

This actually parses the file you posted in your example. I could've done better, but I'm tired today:
res = struct();
fid = fopen('test.txt','r');
read_mat = false;
while (~feof(fid))
% Read text line by line...
line = strtrim(fgets(fid));
if (isempty(line))
continue;
end
if (read_mat) % If I'm reading the final matrix...
% I use a regex to capture the values...
mat_line = regexp(line,'(-?(?:\d*\.)?\d+)+','tokens');
% If the regex succeeds I insert the values in the matrix...
if (~isempty(mat_line))
res.A = [res.A; str2double([mat_line{:}])];
continue;
end
else % If I'm not reading the final matrix...
% I use a regex to check if the line matches F and Y parameters...
param_single = regexp(line,'^(-?(?:\d*\.)?\d+) %Parameter (F|Y)$','tokens');
% If the regex succeeds I assign the values...
if (~isempty(param_single))
param_single = param_single{1};
res.(param_single{2}) = str2double(param_single{1});
continue;
end
% I use a regex to check if the line matches X parameters...
param_x = regexp(line,'^((?:-?(?:\d*\.)?\d+ ){4})%Parameter X$','tokens');
% If the regex succeeds I assign the values...
if (~isempty(param_x))
param_x = param_x{1};
res.X = str2double(strsplit(strtrim(param_x{1}),' '));
continue;
end
% If the line indicates that the matrix starts I set my loop so that it reads the final matrix...
if (strcmp(line,'%Matrix A'))
res.A = [];
read_mat = true;
continue;
end
end
end
fclose(fid);
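A lighter-weight alternative, closer to the Fortran "read one line, apply one format" idea, is to combine fgetl with sscanf. The following is only a sketch under a few assumptions: the file has exactly the layout shown in the question, every trailing comment starts with %, and the temporary file name and the variable names rows and T are made up for illustration. sscanf stops at the first token it cannot convert, so the trailing comment is ignored without any regex.

```matlab
% Recreate the sample input in a temporary file so the sketch is self-contained.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    '%My kind of input file', ...
    '4 %Parameter F', ...
    '2.745 5.222 4.888 1.234 %Parameter X', ...
    '273.15 373.15 1 %Temperature Initial/Final/Step', ...
    '3.5 %Parameter Y', ...
    '%Matrix A', ...
    '1.1 1.3 1 1.05', ...
    '2.0 1.5 3.1 2.1', ...
    '1.3 1.2 1.5 1.6', ...
    '1.3 2.2 1.7 1.4');
fclose(fid);

% Read line by line; keep only the lines that start with numeric data.
rows = {};
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    vals = sscanf(tline, '%f');   % stops at the first non-numeric token
    if ~isempty(vals)
        rows{end+1} = vals.';     % store as a row vector
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% Assign by position (assumes the fixed line order of the question).
Array1.F = rows{1};
Array1.X = rows{2};
T        = rows{3};                           % initial / final / step
Array2.Y = rows{4};
Array2.A = vertcat(rows{5:4 + Array1.F});     % F matrix rows follow
```

Because the assignment is purely positional, this only works while the line order stays fixed; the regex version above is more tolerant of reordered parameters.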

Related

Change text line in a file by using Matlab

So I have to modify a .dxf file (an AutoCAD file) by replacing some data in it with other data chosen beforehand. Changing some lines of a .txt file in Matlab is not very difficult.
However, I cannot change a specific line when the new input's length is larger than the old one.
This is what I have and I want to change only 1D57:
TEXT
5
1D57
330
1D52
100
AcDbEntity
8
0
If I have as an input BBBB, everything goes right since both strings have the same length. The same does not apply when I try with BBBBbbbbbbbbbb:
TEXT
5
BBBBbbbbbbbbbb2
100
AcDbEntity
8
0
It overwrites everything after that point until the new string ends. The same happens when the input is shorter: it does not replace the whole line with the new string, but only overwrites characters until the new input ends. For example, in our case with AAA as an input, the result would be AAA7.
This is basically the code I am using to modify the file:
fID = fopen('copia.dxf','r+');
for i = 1:2
LineToReplace = TextIndex(i);
for k = 1:((LineToReplace) - 1);
fgetl(fID);
end
fseek(fID, 0, 'cof');
fprintf (fID, [Data{i}, '\n']);
end
fclose(fID);
You need to overwrite at least the rest of the file in order to change it (unless exact number of characters is replaced), as explained in jodag's comment. For instance,
% String to change and its replacement
% (can readily be automated for more replacements)
str_old = '1D57';
str_new = 'BBBBbbbbbbbbbb';
% Open input and output files
fIN = fopen('copia.dxf','r');
fOUT = fopen('copia_new.dxf','w');
% Temporary line
tline = fgets(fIN);
% Read the entire file line by line
% Write it to the new file
% Replace str_old with str_new when encountered - note, if there is more
% than one occurence of str_old in the file all will be replaced - this can
% be handled with a proper flag
while (ischar(tline))
% char(10) is MATLAB's newline character representation
if strcmp(tline, [str_old, char(10)])
fprintf(fOUT, '%s\n', str_new);
else
% No need for \n - it's already there as we're using fgets
fprintf(fOUT, '%s', tline);
end
tline = fgets(fIN);
end
% Close the files
fclose(fIN);
fclose(fOUT);
% Copy the new file into the original
movefile 'copia_new.dxf' 'copia.dxf'
In practice, it is often far easier to simply overwrite the whole file.
As written in the notes - this can be automated for more replacements and it would also need an additional flag to only replace a given string once.
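If the line number to change is already known (as with TextIndex in the question), one way to avoid the string-matching issue is to read every line into a cell array, replace the entry by index, and write everything back. This is a minimal sketch, assuming the file fits comfortably in memory; the temporary file and the variable names lines and lineToReplace are illustrative and not taken from the original code.

```matlab
% Build a small stand-in for the DXF fragment so the sketch runs on its own.
fname = [tempname, '.dxf'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'TEXT', '5', '1D57', '330', '1D52');
fclose(fid);

% Read all lines (without their terminators) into a cell array.
lines = {};
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    lines{end+1, 1} = tline;
    tline = fgetl(fid);
end
fclose(fid);

% Replace by line number; the new text may be longer or shorter than the old.
lineToReplace = 3;
lines{lineToReplace} = 'BBBBbbbbbbbbbb';

% Rewrite the whole file, one line per cell entry.
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', lines{:});
fclose(fid);

% Read the result back for inspection, then clean up.
txt = fileread(fname);
delete(fname);
```

Replacing by index changes exactly one line, so no extra flag is needed to guard against repeated occurrences of the same string.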

Open specific file with the specific words (16 bits) structure

I have a specific (binary?) file format containing data about the configuration used to take a picture with a custom camera. This file format is named DAI and contains, for example, values of offset, gain, etc.
I am using a black-box Java script to turn this file into a .csv, and I want to do the same thing in Matlab. I've got a config file describing in ASCII format how this file is built (name of the field, type of the data, first_word, last_word, low_bit, high_bit). For example, I know that the first field in the DAI file will be:
spare1; PCHAR; first_word=0; low_bit=0; high_bit=7
But right now I have no clue how to use this information. My first thought was to fopen() the file and use fread() to read the binary data and turn it into the format I want, but I don't know how to use the values of "last_word", "high_bit", etc. to do so. I have a limited understanding of binary files.
To sum up:
file.dai contains the data
file.cfg contains the structure:
mband_1_start_line; PCHAR; first_word=12; low_bit=6; high_bit=15
mband_1_length; PCHAR; first_word=12; low_bit=0; high_bit=5
mband_1_gain; PCHAR; first_word=13; low_bit=0; high_bit=7
mband_1_offset; PCHAR; first_word=13; low_bit=8; last_word=14; high_bit=7
and I want to recover the data corresponding to fields like mband_1_offset.
If someone can help me figure out a good way of doing that, I would be very thankful!
[EDIT: SOLVED] Thanks to your help, I've managed to get the values for each field, even when the header changes!
Here's the final code:
...code to retrieve the content of the .cfg file....
%% Open and read the DAI file
fid = fopen(dai_file,'r','l');
% First thing is to skip the header
% We read a first time the file
dat=fread(fid,inf,'*uint8');
% We search for the position of the end of the header : NUL NUL ETX
% In decimal it gives :
skip = findstr(dat',[000,000,003]);
% We define the wordsize : 2 bytes per (16-bit) word
wordsize = 2;
% We rewind the file to start over to get the values for each field
frewind(fid);
% We initiate the structure camdat containing the datas of the camera
camdat=struct;
% We start the loop for each field of the layout config file
for ct = 1:length(layout)
% Defining the words/bits
first_word = layout{ct,3};
last_word = layout{ct,5};
low_bit = layout{ct,4};
high_bit = layout{ct,6};
% We position to the "skip value + the position of the first_word in bytes"
fseek(fid,skip+first_word*wordsize,-1);
% We compute the number of words (last - first +1)
datasize=last_word-first_word+1;
% We read the datas as uint16 (words are 16bits)
data=fread(fid,datasize,'*uint16');
% We convert it to bits
% Case of 1 word
bits=bitget(data(1),[1:16]);
% Case of 2 words
if length(data) > 1
bits=[bits,bitget(data(2),1:16)];
high_bit = high_bit+16;
end
% We take only the bits that define the field (between low_bit and
% high_bit)
bits_used = bits(low_bit+1:high_bit+1);
% We convert the bits to dec
data = sum(bits_used.*uint16(2).^uint16([0:length(bits_used)-1]));
% We store it in the camdat.field struct
camdat.(layout{ct,1})=data;
end
% We close the DAI file
fclose(fid);
% Displaying for test
camdat
My approach in this case is to find the part of the file that matches your data.
fid = fopen('dai_file.dai','r','l');
dat=fread(fid,inf,'*uint8');
findstr(dat',[74,210,129,93]);
>> 891 1159 1427 1695 ....
Strangely enough, this happens 100 times.
If byte 891 is right, then bios_1 is NOT in the 4th word from bit 0 to 7, but in the 445th word, bit 0 to 7.
Let's try
fid = fopen(dai_file,'r','l');
fseek(fid,445*2,-1)
data=fread(fid,1,'*uint16');
bits=bitget(data(1),[1:16]);
bits = bits(1:8);
data = sum(bits.*uint16(2).^uint16([0:7]))
>> data = 74
Yep, there it is. So I would suggest to add 441 to each word entry and see if it works.
OK, so you have information about the layout of the file.
I would first store this in a more accessible format:
layout{1,1} = 'mband_1_start_line';
layout{1,2} = 'PCHAR';
layout{1,3} = 12;
layout{1,4} = 6;
layout{1,5} = 12;
layout{1,6} = 15;
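Filling the layout by hand gets tedious for many fields, so one possible way to build a row directly from a .cfg line is sketched here. It assumes every line has the "name; type; key=value; ..." shape shown in the question and that a missing last_word means the field ends in the same word as first_word; the variable names cfgLine, entry and layoutRow are illustrative only.

```matlab
% One line of the .cfg file, copied from the question.
cfgLine = 'mband_1_offset; PCHAR; first_word=13; low_bit=8; last_word=14; high_bit=7';

% Split on ';' and trim the surrounding blanks of each piece.
parts = strtrim(strsplit(cfgLine, ';'));

% The first two pieces are the field name and the type.
entry = struct('name', parts{1}, 'type', parts{2});

% The remaining pieces are key=value pairs with numeric values.
for k = 3:numel(parts)
    kv = strsplit(parts{k}, '=');
    entry.(kv{1}) = str2double(kv{2});
end

% Assumption: when last_word is omitted, the field lives in a single word.
if ~isfield(entry, 'last_word')
    entry.last_word = entry.first_word;
end

% Same column order as the layout cell array used in this answer.
layoutRow = {entry.name, entry.type, entry.first_word, ...
             entry.low_bit, entry.last_word, entry.high_bit};
```

Looking the keys up by name rather than by position means the order of low_bit, last_word and high_bit in the .cfg line does not matter.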
Then you loop over the layout
wordsize = 2; % bytes per word
fid = fopen(filename,'r','l'); % open the file (little-endian)
camdat = struct;
for ct = 1:size(layout,1)
fseek(fid, layout{ct,3}*wordsize, -1); % go to the byte position of first_word
datasize = layout{ct,5}-layout{ct,3}+1; % number of words
data = fread(fid, datasize, '*uint16'); % get words
bits = bitget(data(1), 1:16); % convert to bits
for w = 2:datasize
bits = [bits, bitget(data(w), 1:16)];
end
% low_bit and high_bit are 0-based, MATLAB indices are 1-based
bits = bits(layout{ct,4}+1 : (datasize-1)*16+layout{ct,6}+1); % get bits
data = sum(bits.*uint16(2).^uint16(0:(length(bits)-1))); % convert back
camdat.(layout{ct,1}) = data; % store
end
fclose(fid);
There will be problems with values that are longer than 16 bits, of course.
If the wordsize is different, you can change it to 4 for 32 bit, or 8 for 64 bit, but then you have to also change that in the loop.
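The bit extraction can also be written with bitshift and bitand instead of building a bit vector. The sketch below is an illustration, not the method used in this thread: it assumes the same convention as the loop above (the first word holds the low-order bits, and high_bit counts within the last word), and the two word values are invented for the example.

```matlab
% Two consecutive 16-bit words, as fread(fid, 2, '*uint16') would return them.
% The values are made up for illustration.
words = uint16([hex2dec('A5C3'), hex2dec('0F12')]);

% Field spanning both words: low_bit=8 in the first word, high_bit=7 in the second.
low_bit  = 8;
high_bit = 7 + 16;               % shift high_bit into the second word
nbits    = high_bit - low_bit + 1;

% Combine the words into one 32-bit value, first word in the low half.
combined = uint32(words(1)) + bitshift(uint32(words(2)), 16);

% Shift the field down to bit 0 and mask off everything above it.
value = bitand(bitshift(combined, -low_bit), uint32(2^nbits - 1));

% Cross-check against the bit-vector approach used in this thread.
bits  = [bitget(words(1), 1:16), bitget(words(2), 1:16)];
check = sum(double(bits(low_bit+1:high_bit+1)) .* 2.^(0:nbits-1));
```

With these inputs both value and check come out as 4773 (hex 12A5). Working in uint32 also sidesteps the saturation that uint16 arithmetic would run into for fields wider than 16 bits.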
So I've been using your help to figure a way to do what I wanted.
The idea is to go to the byte position of "first_word", take the bits between the first and last word (bounded by low_bit and high_bit), and turn them into a decimal value. Based on your code I've written the following, which gives results, but not the ones I expected (those in the .csv) (attached file).
First, I'm not sure I'm handling correctly the case where last_word differs from first_word.
Second, I'm not sure that my fseek() takes me to the correct byte of the file...
%% Name of the files
%% Open and read the .cfg file
%% Open and read the DAI file
...So here I've got my .cfg opened and store in layout{i,j}
wordsize = 2; %bytes / word
fid = fopen(dai_file,'r','l');
camdat=struct;
for ct = 1:length(layout)
first_word = layout{ct,3};
last_word = layout{ct,5};
low_bit = layout{ct,4};
high_bit = layout{ct,6};
fseek(fid,first_word*wordsize,-1); %go to bytes
datasize=last_word-first_word+1; %number of words
data=fread(fid,datasize,'*uint16'); %get words
bits=bitget(data(1),[1:16]); %convert to bits
if length(data) > 1 % case of 2 words
bits=[bits,bitget(data(2),1:16)];
high_bit = high_bit+16;
end
bits = bits(low_bit+1:high_bit+1);%get bits
data = sum(bits.*uint16(2).^uint16([0:length(bits)-1])); %convert back
camdat.(layout{ct,1})=data; %store
end
camdat
fclose(fid);
So if you have ideas of where I'm wrong, I'll be very grateful !!!!

Reading data from a Text File into Matlab array

I am having difficulty in reading data from a .txt file using Matlab.
I have to create a 200x128 dimension array in Matlab, using the data from the .txt file. This is a repetitive task, and needs automation.
Each row of the .txt file is a complex number of form a+ib, which is of form a[space]b. A sample of my text file :
(0)
1.2 2.32222
2.12 3.113
.
.
.
3.2 2.22
(1)
4.4 3.4444
2.33 2.11
2.3 33.3
.
.
.
(2)
.
.
(3)
.
.
(199)
.
.
The row numbers (X) appear in the .txt file in parentheses. My final matrix should be of size 200x128. After each (X), there are exactly 128 complex numbers.
Here is what I would do. First, delete the "(0)"-type lines from your text file (you could even use a simple shell script for that). I put the result into a file called post2.txt.
% First, load the text file into MATLAB:
A = load('post2.txt');
% Create the complex numbers from the two columns of data:
vals = A(:,1) + 1i*A(:,2);
% Then reshape the column of complex numbers into a matrix. reshape fills
% column by column, so build 128x200 first and transpose to get 200x128
% with one block of 128 values per row:
mat = reshape(vals, [128,200]).';
The mat will be a 200x128 matrix of complex data. Obviously at this point you can put a loop around this to do it multiple times.
Hope that helps.
You can read the data in using the following function:
function data = readData(aFilename, m,n)
% if no parameters were passed, use these as defaults:
if ~exist('aFilename', 'var')
m = 128;
n = 200;
aFilename = 'post.txt';
end
% init some stuff:
data= nan(n, m);
formatStr = [repmat('%f', 1, 2*m)];
% Read in the Data:
fid = fopen(aFilename);
for ind = 1:n
lineID = fgetl(fid);
dataLine = fscanf(fid, formatStr);
dataLineComplex = dataLine(1:2:end) + dataLine(2:2:end)*1i;
data(ind, :) = dataLineComplex;
end
fclose(fid);
(edit) This function can be improved by including the (1) parts in the format string and throwing them out:
function data = readData(aFilename, m,n)
% if no parameters were passed, use these as defaults:
if ~exist('aFilename', 'var')
m = 128;
n = 200;
aFilename = 'post.txt';
end
% init format stuff:
formatStr = ['(%*d)\n' repmat('%f%f\n', 1, m)];
% Read in the Data:
fid = fopen(aFilename);
data = fscanf(fid, formatStr);
data = data(1:2:end) + data(2:2:end)*1i;
data = reshape(data, m, n).'; % reshape fills columns first, so transpose to get one block per row
fclose(fid);
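The order of the reshape is easy to get wrong, so here is a scaled-down sketch (2 blocks of 3 numbers instead of 200 blocks of 128) that can be checked by eye. It is an illustration rather than either answer's exact method: it reads line by line with fgetl and sscanf, skips the "(X)" marker lines because they yield no numbers, and uses a made-up temporary file.

```matlab
m = 3;   % complex numbers per block (128 in the question)
n = 2;   % number of blocks           (200 in the question)

% Small stand-in file with the same structure as the question's data.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '(0)\n1 2\n3 4\n5 6\n(1)\n7 8\n9 10\n11 12\n');
fclose(fid);

% Collect one complex number per data line; "(X)" lines give no numbers.
vals = [];
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    ab = sscanf(tline, '%f %f');
    if numel(ab) == 2
        vals(end+1, 1) = complex(ab(1), ab(2));
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% reshape fills column by column: make each block a column, then transpose
% (non-conjugate .') so that each block becomes one row.
mat = reshape(vals, m, n).';
```

Here mat(1,:) is 1+2i, 3+4i, 5+6i and mat(2,:) is 7+8i, 9+10i, 11+12i. Calling reshape(vals, n, m) directly would instead interleave values from different blocks within a row.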

Matlab: Edit values in text file without changing the file format

I have the following parameter file, in which I want to change the values on the left-hand side, from gam.dat down to 1 1 1 (next to "-tail variable, head variable, variogram type"), without changing the format of the file.
This parameter file will be called inside the loop such that each iteration of the loop would require changing the values inside this parameter file.
Reading and writing from a file has always been my weak point. Any help on how this can be done easily? Thanks!
Parameters
**********
START OF PARAMETERS:
gam.dat -file with data
1 1 - number of variables, column numbers
-1.0e21 1.0e21 - trimming limits
gam.out -file for output
1 -grid or realization number
100 1.0 1.0 -nx, xmn, xsiz
100 1.0 1.0 -ny, ymn, ysiz
20 1.0 1.0 -nz, zmn, zsiz
4 30 -number of directions, number of h
1 0 1 -ixd(1),iyd(1),izd(1)
1 0 2 -ixd(2),iyd(2),izd(2)
1 0 3 -ixd(3),iyd(3),izd(3)
1 1 1 -ixd(4),iyd(4),izd(4)
1 -standardize sill? (0=no, 1=yes)
1 -number of gamma
1 1 1 -tail variable, head variable, gamma type
Something like this might help. Then again it might not be exactly what you're looking for.
fid = fopen(filename, 'r'); % filename is the file name as a string
n = 1;
textline = {};
while( ~feof(fid) ) % This just runs until the end of the file is reached.
textline{n} = fgetl(fid);
% some operations you want to perform?
% You can also do anything you want to the lines here as you are reading them in.
% This will read in every line in the file as well.
n = n + 1;
end
fclose(fid);
fid = fopen(filename, 'w'); % Reopening with 'w' overwrites what is already there.
fprintf(fid, '%s\n', textline{:});
% You can always write to a new file instead if you want to.
fclose(fid);
The only reason I am suggesting the use of fgetl here is because it looks like there are specific operations/changes you want to make based on the line or the information in the line. You can also use fread which will do the same thing but you'll then have to operate on the matrix as a whole after it's built rather than making any modifications to it while reading the data in and building the matrix.
Hope that helps!
More complete example based on the comments below.
fid = fopen('gam.txt');
n = 1;
textline = {};
while( ~feof(fid) ) % This just runs until the end of the file is reached.
textline(n,1) = {fgetl(fid)}
% some operations you want to perform?
% You can also do anything you want to the lines here as you are reading them in.
% This will read in every line in the file as well.
if ( n == 5 ) % This is just an operation that will adjust line number 5.
temp = cell2mat(textline(n));
textline(n,1) = {['newfile.name', temp(regexp(temp, '\s', 'once'):end)]};
end
n = n + 1;
end
fclose(fid)
fid = fopen('gam2.txt', 'w') % 'w' creates the file if it does not exist and overwrites it otherwise.
for(n = 1:length(textline))
fprintf(fid, '%s\n', textline{n}); % fprintf adds back the newline that fgetl stripped
end
fclose(fid)
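For the numeric lines, the per-line operation can keep the description intact by splitting each line into a value part and a comment part. This is a rough sketch under one assumption that is not stated in the question: the description always starts at whitespace followed by '-' and then (optionally after blanks) a letter. The names oldLine, tok and newLine are illustrative.

```matlab
% Pattern: lazily capture the values, then capture the description, which is
% assumed to start with whitespace, a '-', optional blanks and a letter.
pat = '^(.*?)(\s+-\s*[A-Za-z].*)$';

% A line from the parameter file in the question.
oldLine = '100 1.0 1.0 -nx, xmn, xsiz';
tok = regexp(oldLine, pat, 'tokens', 'once');

% Format the new values and re-attach the untouched description.
newValues = sprintf('%d %.1f %.1f', 50, 0.5, 2.0);
newLine = [newValues, tok{2}];

% The same pattern also copes with values that start with a minus sign.
tok2 = regexp('-1.0e21 1.0e21 - trimming limits', pat, 'tokens', 'once');
```

After this, newLine is '50 0.5 2.0 -nx, xmn, xsiz' and tok2{1} is '-1.0e21 1.0e21'. If the real file aligns the descriptions in a fixed column, the value part would additionally need padding to the original width.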

Problem (bug?) loading hexadecimal data into MATLAB

I'm trying to load the following ascii file into MATLAB using load()
% some comment
1 0xc661
2 0xd661
3 0xe661
(This is actually a simplified file. The actual file I'm trying to load contains an undefined number of columns and an undefined number of comment lines at the beginning, which is why the load function was attractive)
For some strange reason, I obtain the following:
K>> data = load('testMixed.txt')
data =
1 50785
2 58977
3 58977
I've observed that the problem occurs anytime there's a "d" in the hexadecimal number.
Direct hex2dec conversion works properly:
K>> hex2dec('d661')
ans =
54881
importdata seems to have the same conversion issue, and so does the ImportWizard:
K>> importdata('testMixed.txt')
ans =
1 50785
2 58977
3 58977
Is that a bug, am I using the load function in some prohibited way, or is there something obvious I'm overlooking?
Are there workarounds around the problem, save from reimplementing the file parsing on my own?
Edited my input file to better reflect my actual file format. I had a bit oversimplified in my original question.
"GOLF" ANSWER:
This starts with the answer from mtrw and shortens it further:
fid = fopen('testMixed.txt','rt');
data = textscan(fid,'%s','Delimiter','\n','MultipleDelimsAsOne','1',...
'CommentStyle','%');
fclose(fid);
data = strcat(data{1},{' '});
data = sscanf([data{:}],'%i',[sum(isspace(data{1})) inf]).';
PREVIOUS ANSWER:
My first thought was to use TEXTSCAN, since it has an option that allows you to ignore certain lines as comments when they start with a given character (like %). However, TEXTSCAN doesn't appear to handle numbers in hexadecimal format well. Here's another option:
fid = fopen('testMixed.txt','r'); % Open file
% First, read all the comment lines (lines that start with '%'):
comments = {};
position = 0;
nextLine = fgetl(fid); % Read the first line
while strcmp(nextLine(1),'%')
comments = [comments; {nextLine}]; % Collect the comments
position = ftell(fid); % Get the file pointer position
nextLine = fgetl(fid); % Read the next line
end
fseek(fid,position,-1); % Rewind to beginning of last line read
% Read numerical data:
nCol = sum(isspace(nextLine))+1; % Get the number of columns
data = fscanf(fid,'%i',[nCol inf]).'; % Note '%i' works for all integer formats
fclose(fid); % Close file
This will work for an arbitrary number of comments at the beginning of the file. The computation to get the number of columns was inspired by Jacob's answer.
New:
This is the best I could come up with. It should work for any number of comment lines and columns. You'll have to do the rest yourself if there are strings, etc.
% Define the characters representing the start of the commented line
% and the delimiter
COMMENT_START = '%';
DELIMITER = ' ';
% Open the file
fid = fopen('testMixed.txt');
% Read each line till we reach the data
l = COMMENT_START;
while(l(1)==COMMENT_START)
l = fgetl(fid);
end
% Compute the number of columns
cols = sum(l==DELIMITER)+1;
% Split the first line
split_l = regexp(l,' ','split');
% Read all the data
A = textscan(fid,'%s');
% Compute the number of rows
rows = numel(A{:})/cols;
% Close the file
fclose(fid);
% Assemble all the data into a matrix of cell strings
DATA = [split_l ; reshape(A{:},[cols rows])'];
% Recognize each column and process accordingly
% by analyzing each element in the first row
numeric_data = zeros(size(DATA));
for i=1:cols
str = DATA(1,i);
% If there is no '0x' present
if isempty(strfind(str{1},'0x'))
% This is a number
numeric_data(:,i) = str2num(char(DATA(:,i)));
else
% This is a hexadecimal number
col = char(DATA(:,i));
numeric_data(:,i) = hex2dec(col(:,3:end));
end
end
% Display the data
format short g;
disp(numeric_data)
This works for data like this:
% Comment 1
% Comment 2
1.2 0xc661 10 0xa661
2 0xd661 20 0xb661
3 0xe661 30 0xc661
Output:
1.2 50785 10 42593
2 54881 20 46689
3 58977 30 50785
OLD:
Yeah, I don't think LOAD is the way to go. You could try:
a = char(importdata('testHexa.txt'));
a = hex2dec(a(:,3:end));
This is based on both gnovice's and Jacob's answers, and is a "best of breed".
For files like:
% this is my comment
% this is my other comment
1 0xc661 123
2 0xd661 456
% surprise comment
3 0xe661 789
4 0xb661 1234567
(where the number of columns within the file MUST be the same, but not known ahead of time, and all comments denoted by a '%' character), the following code is fast and easy to read:
f = fopen('hexdata.txt', 'rt');
A = textscan(f, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', '1', 'CollectOutput', '1', 'CommentStyle', '%');
fclose(f);
A = A{1};
data = sscanf(A{1}, '%i')';
data = repmat(data, length(A), 1);
for ctr = 2:length(A)
data(ctr,:) = sscanf(A{ctr}, '%i')';
end
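For comparison, here is a sketch that avoids relying on the '%i' conversion altogether and converts each token explicitly: tokens starting with 0x go through hex2dec, everything else through str2double. It assumes space-delimited columns and '%' comment lines, and it works on an in-memory cell array of lines (the variable lines is a stand-in for whatever fgetl or textscan returned).

```matlab
% Stand-in for the lines of testMixed.txt from the question.
lines = {'% some comment', '1 0xc661', '2 0xd661', '3 0xe661'};

data = [];
for k = 1:numel(lines)
    tline = strtrim(lines{k});
    % Skip blank lines and comment lines.
    if isempty(tline) || tline(1) == '%'
        continue;
    end
    toks = strsplit(tline, ' ');
    row = zeros(1, numel(toks));
    for j = 1:numel(toks)
        if strncmpi(toks{j}, '0x', 2)
            row(j) = hex2dec(toks{j}(3:end));   % hexadecimal token
        else
            row(j) = str2double(toks{j});       % plain decimal token
        end
    end
    data = [data; row];
end
```

This gives data = [1 50785; 2 54881; 3 58977], matching the hex2dec result quoted in the question. It is slower than a single sscanf call, but each token's interpretation is explicit, and comment lines may appear anywhere in the file.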