I am trying to create a Matlab code that strips the header off of a text file and proceeds in the pattern of recording a line and skipping the next three lines.
I have figured out how to strip the header, but do not know how to code so that the program only records lines 1 (after the header is removed) 5,9, 13, etc.
Any recommendations?
I don't know how your data is formatted, so the actual code you use may differ, but this should give you the idea
file_lines = {};
fid = fopen(filename);
while 1
text_line = fgetl(fid);
%quit reading on an empty line
if ~ischar(text_line)
break
end
%keep the lines that 1 as the first value (this is what you wanted, right?)
data_line = str2num(text_line)
if(data_line(1,1) == 1)
file_lines{end+1} = data_line;
end
end
fclose(fid);
Related
I am trying to concatenate three lines (I want to leave the lines as is; 3 rows) from Shakespeare.txt file that shows:
To be,
or not to be:
that is the question.
My code right now is
fid = fopen('Shakespeare.txt')
while ~feof(fid)
a = fgets(fid);
b = fgets(fid);
c = fgets(fid);
end
fprintf('%s', strcat(a, b, c))
I'm supposed to use strcat and again, I want concatenated and leave them as three rows.
One method of keeping the rows separate is by storing the lines of the text file in a string array. Here a 1 by 3 string array is used. It may also be a good idea to use fgetl() which grabs each line of the text file at a time. Concantenating the outputs of fgetl() as strings may also be another option to ensure the they do not get stored as character (char) arrays. Also using the \n indicates to line break when printing the strings within the array String_Array.
fid = fopen('Shakespeare.txt');
while ~feof(fid)
String_Array(1) = string(fgetl(fid));
String_Array(2) = string(fgetl(fid));
String_Array(3) = string(fgetl(fid));
end
fprintf('%s\n', String_Array);
Ran using MATLAB R2019b
I have to read this txt file:
#Acceleration force along the x y z axes (including gravity).
#timestamp(ns),x,y,z(m/s^2)
#Datetime: 05/07/2013 11:27:39
##########################################
#Activity: 12 - BSC - Back sitting chair - 10s
#Subject ID: 1
#First Name: sub1
#Last Name: sub1
#Age: 32
#Height(cm): 180
#Weight(kg): 85
#Gender: Male
##########################################
#DATA
2318482693000, 0.89064306, -9.576807, -0.019153614 2318492108000, 0.91937345, -9.595961, -0.05746084 2318503365000,
0.8714894, -9.595961, -0.05746084 2318512831000, 0.8523358, -9.643845, -0.038307227 2318523222000, 0.8714894, -9.634268, -0.05746084 2318533156000, 0.90979666, -9.634268, -0.019153614 2318553207000,
0.9672575, -9.634268, -0.009576807 2318563306000, 0.93852705, -9.672575, -0.009576807 2318575332000, 0.9672575, -9.682152, -0.009576807 2318584933000, 0.9672575, -9.662998, -0.009576807 2318595971000, 0.9576807, -9.634268, 0.009576807
And I have to save numbers after the word "#DATA" in a vector.
I don't know how to read a txt file from a certain word (in this case #DATA).
something like
fid = fopen('yourfile.extension');
while ~feof(fid)
tline = fgetl(fid);
disp(tline) %just for debugging purposes
if strncmp("#DATA",tline,5)
break
end
end
while ~feof(fid)
%start reading your data with fscanf for example
end
And never forget your search engine is your friend...
https://fr.mathworks.com/help/matlab/characters-and-strings.html
https://fr.mathworks.com/help/matlab/ref/feof.html
https://fr.mathworks.com/help/matlab/ref/fscanf.html
https://fr.mathworks.com/help/matlab/ref/fgetl.html
https://fr.mathworks.com/help/matlab/ref/strncmp.html
If the file isn't too large, you could load it as a character array and use a regular expression to parse it. It may not be the best method if you want code that's efficient on memory and scales to any file size, but IMO it's very straightforward.
txt = fileread(file);
valList = regexp(txt, '#DATA\n([\d\n\, \.-]+)', 'tokens');
vals = strsplit(valList{1}{1}, ', ');
That's pretty quick-and-dirty, and the regular expression will change depending on how you decide to format the file, but that should get you in the right direction. If you format the file more nicely, you can probably use a prettier expression.
Dear All (with many thanks in advance),
The following script has trouble reading (and therefore writing) the %s character in the file 'master.py'.
I get that matlab thinks the %s is an escape character, so perhaps an option is to modify the terminator, but I have found this difficult.
(EDIT: Forgot to mention the file master.py is not in my control, so I can't modify the file to %%s for example).
%matlab script
%===============
fileID = fopen('script.py','w');
yMax=5;
fprintf(fileID,'yOverallDim = %d\n', -1*yMax);
%READ IN "master.py" for rest of script
fileID2 = fopen('master.py','r');
currentLine = fgets(fileID2);
while ischar(currentLine)
fprintf(fileID,currentLine);
currentLine = fgets(fileID2);
end
fclose(fileID);
fclose(fileID2);
The file 'master.py' looks like this (and the problem is on line 6 'setName ="Set-%s"%(i+1)':
i=0
for yPos in range (0,yOverallDim,yVoxelSize):
yCoordinate=yPos+(yVoxelSize/2) #
for xPos in range (0,xOverallDim,xVoxelSize):
xCoordinate=xPos+(xVoxelSize/2)
setName ="Set-%s"%(i+1)
p = mdb.models['Model-1'].parts['Part-1']
# p = mdb.models['Model-1'].parts['Part-2']
c = p.cells
cells = c.findAt(((xCoordinate, yCoordinate, 10.0), ))
region = p.Set(cells=cells, name=setName)
p.SectionAssignment(region=region, sectionName='Section-1', offset=0.0, offsetType=MIDDLE_SURFACE, offsetField='', thicknessAssignment=FROM_SECTION)
i+=1
In the documentation of fprintf you'll find this:
fprintf(fileID,formatSpec,A1,...,An) applies the formatSpec to all elements of arrays A1,...An in column order, and writes the data to a text file.
So in your function fprintf uses currentLine as format specification, resulting in an unexpected output for line 6. Correct application of fprintf by providing a formatSpec, fixes this issue and doesn't require any replace operations:
fprintf(fileID, '%s', currentLine);
Your script has no trouble reading the % characters correctly. The "problem" is with fprintf(). This function correctly interpretes the percent signs in the string as formatting characters. Therefore, I think you have to manually escape every single % character in your currentLine string:
currentLine = strrep(currentLine, '%', '%%');
At least, it worked when I checked it on your example data.
Thanks applesoup for identifying my fundamental oversight - the problem is in the fprintf - not in the file read
Thanks serial for enhancing the fprintf
Loading a well formatted and delimited text file in Matlab is relatively simple, but I struggle with a text file that I have to read in. Sadly I can not change the structure of the source file, so I have to deal with what I have.
The basic file structure is:
123 180 (two integers, white space delimited)
1.5674e-8
.
.
(floating point numbers in column 1, column 2 empty)
.
.
100 4501 (another two integers)
5.3456e-4 (followed by even more floating point numbers)
.
.
.
.
45 String (A integer in column 1, string in column 2)
.
.
.
A simple
[data1,data2]=textread('filename.txt','%f %s', ...
'emptyvalue', NaN)
Does not work.
How can I properly filter the input data? All examples I found online and in the Matlab help so far deal with well structured data, so I am a bit lost at where to start.
As I have to read a whole bunch of those files >100 I rather not iterate trough every single line in every file. I hope there is a much faster approach.
EDIT:
I made a sample file available here: test.txt (google drive)
I've looked at the text file you supplied and tried to draw a few general conclusions -
When there are two integers on a line, the second integer corresponds to the number of rows following.
You always have (two integers (A, B) followed by "B" floats), repeated twice.
After that you have some free-form text (or at least, I couldn't deduce anything useful about the format after that).
This is a messy format so I doubt there are going to be any nice solutions. Some useful general principles are:
Use fgetl when you need to read a single line (it reads up to the next newline character)
Use textscan when it's possible to read multiple lines at once - it is much faster than reading a line at a time. It has many options for how to parse, which it is worth getting to know (I recommend typing doc textscan and reading the entire thing).
If in doubt, just read the lines in as strings and then analyse them in MATLAB.
With that in mine, here is a simple parser for your files. It will probably need some modifications as you are able to infer more about the structure of the files, but it is reasonably fast on the ~700 line test file you gave.
I've just given the variables dummy names like "a", "b", "floats" etc. You should change them to something more specific to your needs.
function output = readTestFile(filename)
fid = fopen(filename, 'r');
% Read the first line
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
a = nums{1}(1);
b = nums{1}(2);
% Read 'b' of the next lines:
contents = textscan(fid, '%f', b);
floats1 = contents{1};
% Read the next line:
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
c = nums{1}(1);
d = nums{1}(2);
% Read 'd' of the next lines:
contents = textscan(fid, '%f', d);
floats2 = contents{1};
% Read the rest:
rest = textscan(fid, '%s', 'Delimiter', '\n');
output.a = a;
output.b = b;
output.c = c;
output.d = d;
output.floats1 = floats1;
output.floats2 = floats2;
output.rest = rest{1};
end
You can read in the file line by line using the lower-level functions, then parse each line manually.
You open the file handle like in C
fid = fopen(filename);
Then you can read a line using fgetl
line = fgetl(fid);
String tokenize it on spaces is probably the best first pass, storing each piece in a cell array (because a matrix doesn't support ragged arrays)
colnum = 1;
while ~isempty(rem)
[token, rem] = strtok(rem, ' ');
entries{linenum, colnum} = token;
colnum = colnum + 1;
end
then you can wrap all of that inside another while loop to iterate over the lines
linenum = 1;
while ~feof(fid)
% getl, strtok, index bookkeeping as above
end
It's up to you whether it's best to parse the file as you read it or read it into a cell array first and then go over it afterwards.
Your cell entries are all going to be strings (char arrays), so you will need to use str2num to convert them to numbers. It does a good job of working out the format so that might be all you need.
AIR, ID
AIR.SIT
50 1 1 1 0 0 2 1
43.57 -116.24 1. 857.7
Hi, All,
I have a text file like above. Now in Matlab, I want to create 5000 text files, changing the number "2" (the specific number in the 3rd row) from 1 to 5000 in each file, while keeping other contents the same. In every loop, the changed number is the same with the loop number. And the output in every loop is saved into a new text file, with the name like AIR_LoopNumber.SIT.
I've spent some time writing on that. But it is kind of difficult for a newby. Here is what I have:
% - Read source file.
fid = fopen ('Air.SIT');
n = 1;
textline={};
while (~feof(fid))
textline(n,1)={fgetl(fid)};
end
FileName=Air;
% - Replace characters when relevant.
for i = 1 : 5000
filename = sprintf('%s_%d.SIT','filename',i);
end
Anybody can help on finishing the program?
Thanks,
James
If your file is so short you do not have to read it line by line. Just read the full thing in one variable, modify only the necessary part of it before each write, then write the full variable back in one go.
%% // read the full file as a long sequence of 'char'
fid = fopen ('Air.SIT');
fulltext = fread(fid,Inf,'char') ;
fclose(fid) ;
%% // add a few blank placeholder (3 exactly) to hold the 4 digits when we'll be counting 5000
fulltext = [fulltext(1:49) ; 32 ; 32 ; 32 ; fulltext(50:end) ] ;
idx2replace = 50:53 ; %// save the index of the characters which will be modified each loop
%% // Go for it
baseFileName = 'AIR_%d.SIT' ;
for iFile = 1:1000:5000
%// build filename
filename = sprintf(baseFileName,iFile);
%// modify the string to write
fulltext(idx2replace) = num2str(iFile,'%04d').' ; %//'
%// write the file
fidw = fopen( filename , 'w' ) ;
fwrite(fidw,fulltext) ;
fclose(fidw) ;
end
This example works with the text in your example, you may have to adjust slightly the indices of the characters to replace if your real case is different.
Also I set a step of 1000 for the loop to let you try and see if it works without writing 1000's of file. When you are satisfied with the result, remove the 1000 step in the for loop.
Edit:
The format specifier %04d I gave in the first solution insure the output will take 4 characters, and it will pad any smaller number with zero (ex: 23 => 0023). It is sometimes desirable to keep the length constant, and in your particular example it made things easier because the output string would be exactly the same length for all the files.
However it is not mandatory at all, if you do not want the loop number to be padded with zero, you can use the simple format %d. This will only use the required number of digits.
The side effect is that the output string will be of different length for different loop number, so we cannot use one string for all the iterations, we have to recreate a string at each iteration. So the simple modifications are as follow. Keep the first paragraph of the solution above as is, and replace the last 2 paragraphs with the following:
%% // prepare the block of text before and after the character to change
textBefore = fulltext(1:49) ;
textAfter = fulltext(51:end) ;
%% // Go for it
baseFileName = 'AIR_%d.SIT' ;
for iFile = 1:500:5000
%// build filename
filename = sprintf(baseFileName,iFile);
%// rebuild the string to write
fulltext = [textBefore ; num2str(iFile,'%d').' ; textAfter ]; %//'
%// write the file
fidw = fopen( filename , 'w' ) ;
fwrite(fidw,fulltext) ;
fclose(fidw) ;
end
Note:
The constant length of character for a number may not be important in the file, but it can be very useful for your file names to be named AIR_0001 ... AIR_0023 ... AIR_849 ... AIR_4357 etc ... because in a list they will appear properly ordered in any explorer windows.
If you want your files named with constant length numbers, the just use:
baseFileName = 'AIR_%04d.SIT' ;
instead of the current line.