Parsing a data file in matlab - matlab

I have a text file with two columns of data. I want to split this file and save it as two individual strings in matlab, but I also need to stop copying the data when I meet an identifier in the data then stat two new strings.
For example
H 3
7 F
B B
T Y
SPLIT
<>
Where SPLIT <> is where I want to end the current string.
I'm trying to use fopen and fscanf, but struggling to get it to do what I want it to.

I tried the following script on the example you provided and it works. I believe the comments are very self explanatory.
% Open text file.
fid = fopen('test.txt');
% Read the first line.
tline = fgetl(fid);
% Initialize counter.
ii = 1;
% Check for end string.
while ~strcmp(tline, 'SPLIT')
% Analyze line only if it is not an empty one.
if ~strcmp(tline, '')
% Read the current line and split it into column 1 and column 2.
[column1(ii), column2(ii)] = strread(tline, ['%c %c']);
% Advance counter.
ii = ii + 1;
end
% Read the next line.
tline = fgetl(fid);
end
% Display results in console.
column1
column2
% Close text file.
fclose(fid);
The key functions here are fgetl and strread. Take a look at their documentation, it has some very nice examples as well. Hope it helps.

Related

how to get the line number having a specific string in a txt file - matlab

I have a txt file that has a lot of content and in this file there is a lot of "include" word and I want to get data from all three lines after that.
myFile.txt:
"-include:
-6.5 6.5
sin(x^2)
diff
-include
-5 5
cos(x^4)
diff"
How do I get this data in an array?
Based on your example (first include has : at the end, second one doesn't) you could use something like this.
fID = fopen('myFile.txt'); % Open the file for reading
textString = textscan(fID, '%s', 'Delimiter', '\n'); % Read all lines into cells
fclose(fID); % Close the file for reading
textString = textString{1}; % Use just the contents on the first cell (each line inside will be one cell)
includeStart = find(contains(textString,'include'))+1; % Find all lines with the word include. Add +1 to get the line after
includeEnd = [includeStart(2:end)-2; length(textString)]; % Get the position of the last line before the next include (and add the last line in the file)
parsedText = cell(length(includeStart), 1); % Create a new cell to store the output
% Loop through all the includes and concatenate the text with strjoin
for it = 1:length(includeStart)
parsedText{it} = strjoin(textString(includeStart(it):includeEnd(it)),'\n');
% Display the output
fprintf('Include block %d:\n', it);
disp(parsedText{it});
end
Which results in the following output:
Include block 1:
-6.5 6.5
sin(x^2)
diff
Include block 2:
-5 5
cos(x^4)
diff
You can tune the loop to suit your needs. If you just want line numbers, use includeStart and includeEnd variables.

Change text line in a file by using Matlab

So I have to modify a .dxf file (an Autocad file) by changing some data in it for another one we choose previously. Changing some lines of a .txt file in Matlab is not pretty difficult.
However, I cannot change a specific line when the new input's length is larger than the old one.
This is what I have and I want to change only 1D57:
TEXT
5
1D57
330
1D52
100
AcDbEntity
8
0
If I have as an input BBBB, everything goes right since both strings have the same length. The same does not apply when I try with BBBBbbbbbbbbbb:
TEXT
5
BBBBbbbbbbbbbb2
100
AcDbEntity
8
0
It deletes everything after it until the string stops. It happens the same when the input is shorter: it does not change the line for the new string but it writes until the new input stops. For example, in our case with AAA as an input, the result would be AAA7.
This is basically the code I am using to modify the file:
fID = fopen('copia.dxf','r+');
for i = 1:2
LineToReplace = TextIndex(i);
for k = 1:((LineToReplace) - 1);
fgetl(fID);
end
fseek(fID, 0, 'cof');
fprintf (fID, [Data{i}, '\n']);
end
fclose(fID);
You need to overwrite at least the rest of the file in order to change it (unless exact number of characters is replaced), as explained in jodag's comment. For instance,
% String to change and it's replacement
% (can readily be automated for more replacements)
str_old = '1D52';
str_new = 'BBBBbbbbbbbbbb';
% Open input and output files
fIN = fopen('copia.dxf','r');
fOUT = fopen('copia_new.dxf','w');
% Temporary line
tline = fgets(fIN);
% Read the entire file line by line
% Write it to the new file
% Replace str_old with str_new when encountered - note, if there is more
% than one occurence of str_old in the file all will be replaced - this can
% be handled with a proper flag
while (ischar(tline))
% char(10) is MATLAB's newline character representation
if strcmp(tline, [str_old, char(10)])
fprintf(fOUT, '%s \n', str_new);
else
% No need for \n - it's already there as we're using fgets
fprintf(fOUT, '%s', tline);
end
tline = fgets(fIN);
end
% Close the files
fclose(fIN);
fclose(fOUT);
% Copy the new file into the original
movefile 'copia_new.dxf' 'copia.dxf'
In practice, it is often far easier to simply overwrite the whole file.
As written in the notes - this can be automated for more replacements and it would also need an additional flag to only replace a given string once.

MATLAB - How to save vectors with different length

I created a file that contains vectors and these could have empty space between their elements.
-77.4 1 0.17 260 88 1004.0 1006.5
-77.3 1 0.17 1009.2 1011.8
I save the file 'myfile.txt' row by row with fprintf() function.
Well, when I load the file with the command load('myfile.txt') I receive this error message "Number of columns on line ... must be the same as previous lines"
How can I fix it? Perhaps save the row vectors by another way? How to do?
Thank you
You would be better off by using the save command as #maxywb stated in his comment, but if you find yourself in a situation where you have a text file that does not have consistent column numbers, you can parse the file line by line and save the results into a cell array
fid = fopen('myFile.txt','r');
values = {};
count = 1;
tline = fgets(fid);
while ischar(tline)
values{count} = textscan(tline,'%f','delimiter',', ');
count = count+1;
tline = fgets(fid);
end
fclose(fid)

Reading CSV with mixed type data

I need to read the following csv file in MATLAB:
2009-04-29 01:01:42.000;16271.1;16271.1
2009-04-29 02:01:42.000;2.5;16273.6
2009-04-29 03:01:42.000;2.599609;16276.2
2009-04-29 04:01:42.000;2.5;16278.7
...
I'd like to have three columns:
timestamp;value1;value2
I tried the approaches described here:
Reading date and time from CSV file in MATLAB
modified as:
filename = 'prova.csv';
fid = fopen(filename, 'rt');
a = textscan(fid, '%s %f %f', ...
'Delimiter',';', 'CollectOutput',1);
fclose(fid);
But it returs a 1x2 cell, whose first element is a{1}='ÿþ2', the other are empty.
I had also tried to adapt to my case the answers to these questions:
importing data with time in MATLAB
Read data files with specific format in matlab and convert date to matal serial time
but I didn't succeed.
How can I import that csv file?
EDIT After the answer of #macduff i try to copy-paste in a new file the data reported above and use:
a = textscan(fid, '%s %f %f','Delimiter',';');
and it works.
Unfortunately that didn't solve the problem because I have to process csv files generated automatically, which seems to be the cause of the strange MATLAB behavior.
What about trying:
a = textscan(fid, '%s %f %f','Delimiter',';');
For me I get:
a =
{4x1 cell} [4x1 double] [4x1 double]
So each element of a corresponds to a column in your csv file. Is this what you need?
Thanks!
Seems you're going about it the right way. The example you provide poses no problems here, I get the output you desire. What's in the 1x2 cell?
If I were you I'd try again with a smaller subset of the file, say 10 lines, and see if the output changes. If yes, then try 100 lines, etc., until you find where the 4x1 cell + 4x2 array breaks down into the 1x2 cell. It might be that there's an empty line or a single empty field or whatever, which forces textscan to collect data in an additional level of cells.
Note that 'CollectOutput',1 will collect the last two columns into a single array, so you'll end up with 1 cell array of 4x1 containing strings, and 1 array of 4x2 containing doubles. Is that indeed what you want? Otherwise, see #macduff's post.
I've had to parse large files like this, and I found I didn't like textscan for this job. I just use a basic while loop to parse the file, and I use datevec to extract the timestamp components into a 6-element time vector.
%% Optional: initialize for speed if you have large files
n = 1000 %% <# of rows in file - if known>
timestamp = zeros(n,6);
value1 = zeros(n,1);
value2 = zeros(n,1);
fid = fopen(fname, 'rt');
if fid < 0
error('Error opening file %s\n', fname); % exit point
end
cntr = 0
while true
tline = fgetl(fid); %% get one line
if ~ischar(tline), break; end; % break out of loop at end of file
cntr = cntr + 1;
splitLine = strsplit(tline, ';'); %% split the line on ; delimiters
timestamp(cntr,:) = datevec(splitLine{1}, 'yyyy-mm-dd HH:MM:SS.FFF'); %% using datevec to parse time gives you a standard timestamp vector
value1(cntr) = splitLine{2};
value2(cntr) = splitLine{3};
end
%% Concatenate at the end if you like
result = [timestamp value1 value2];

Problem (bug?) loading hexadecimal data into MATLAB

I'm trying to load the following ascii file into MATLAB using load()
% some comment
1 0xc661
2 0xd661
3 0xe661
(This is actually a simplified file. The actual file I'm trying to load contains an undefined number of columns and an undefined number of comment lines at the beginning, which is why the load function was attractive)
For some strange reason, I obtain the following:
K>> data = load('testMixed.txt')
data =
1 50785
2 58977
3 58977
I've observed that the problem occurs anytime there's a "d" in the hexadecimal number.
Direct hex2dec conversion works properly:
K>> hex2dec('d661')
ans =
54881
importdata seems to have the same conversion issue, and so does the ImportWizard:
K>> importdata('testMixed.txt')
ans =
1 50785
2 58977
3 58977
Is that a bug, am I using the load function in some prohibited way, or is there something obvious I'm overlooking?
Are there workarounds around the problem, save from reimplementing the file parsing on my own?
Edited my input file to better reflect my actual file format. I had a bit oversimplified in my original question.
"GOLF" ANSWER:
This starts with the answer from mtrw and shortens it further:
fid = fopen('testMixed.txt','rt');
data = textscan(fid,'%s','Delimiter','\n','MultipleDelimsAsOne','1',...
'CommentStyle','%');
fclose(fid);
data = strcat(data{1},{' '});
data = sscanf([data{:}],'%i',[sum(isspace(data{1})) inf]).';
PREVIOUS ANSWER:
My first thought was to use TEXTSCAN, since it has an option that allows you to ignore certain lines as comments when they start with a given character (like %). However, TEXTSCAN doesn't appear to handle numbers in hexadecimal format well. Here's another option:
fid = fopen('testMixed.txt','r'); % Open file
% First, read all the comment lines (lines that start with '%'):
comments = {};
position = 0;
nextLine = fgetl(fid); % Read the first line
while strcmp(nextLine(1),'%')
comments = [comments; {nextLine}]; % Collect the comments
position = ftell(fid); % Get the file pointer position
nextLine = fgetl(fid); % Read the next line
end
fseek(fid,position,-1); % Rewind to beginning of last line read
% Read numerical data:
nCol = sum(isspace(nextLine))+1; % Get the number of columns
data = fscanf(fid,'%i',[nCol inf]).'; % Note '%i' works for all integer formats
fclose(fid); % Close file
This will work for an arbitrary number of comments at the beginning of the file. The computation to get the number of columns was inspired by Jacob's answer.
New:
This is the best I could come up with. It should work for any number of comment lines and columns. You'll have to do the rest yourself if there are strings, etc.
% Define the characters representing the start of the commented line
% and the delimiter
COMMENT_START = '%%';
DELIMITER = ' ';
% Open the file
fid = fopen('testMixed.txt');
% Read each line till we reach the data
l = COMMENT_START;
while(l(1)==COMMENT_START)
l = fgetl(fid);
end
% Compute the number of columns
cols = sum(l==DELIMITER)+1;
% Split the first line
split_l = regexp(l,' ','split');
% Read all the data
A = textscan(fid,'%s');
% Compute the number of rows
rows = numel(A{:})/cols;
% Close the file
fclose(fid);
% Assemble all the data into a matrix of cell strings
DATA = [split_l ; reshape(A{:},[cols rows])']; %' adding this to make it pretty in SO
% Recognize each column and process accordingly
% by analyzing each element in the first row
numeric_data = zeros(size(DATA));
for i=1:cols
str = DATA(1,i);
% If there is no '0x' present
if isempty(findstr(str{1},'0x')) == true
% This is a number
numeric_data(:,i) = str2num(char(DATA(:,i)));
else
% This is a hexadecimal number
col = char(DATA(:,i));
numeric_data(:,i) = hex2dec(col(:,3:end));
end
end
% Display the data
format short g;
disp(numeric_data)
This works for data like this:
% Comment 1
% Comment 2
1.2 0xc661 10 0xa661
2 0xd661 20 0xb661
3 0xe661 30 0xc661
Output:
1.2 50785 10 42593
2 54881 20 46689
3 58977 30 50785
OLD:
Yeah, I don't think LOAD is the way to go. You could try:
a = char(importdata('testHexa.txt'));
a = hex2dec(a(:,3:end));
This is based on both gnovice's and Jacob's answers, and is a "best of breed"
For files like:
% this is my comment
% this is my other comment
1 0xc661 123
2 0xd661 456
% surprise comment
3 0xe661 789
4 0xb661 1234567
(where the number of columns within the file MUST be the same, but not known ahead of time, and all comments denoted by a '%' character), the following code is fast and easy to read:
f = fopen('hexdata.txt', 'rt');
A = textscan(f, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', '1', 'CollectOutput', '1', 'CommentStyle', '%');
fclose(f);
A = A{1};
data = sscanf(A{1}, '%i')';
data = repmat(data, length(A), 1);
for ctr = 2:length(A)
data(ctr,:) = sscanf(A{ctr}, '%i')';
end