matlab text read and write %s character (without escaping) - matlab

Dear All (with many thanks in advance),
The following script has trouble reading (and therefore writing) the %s character in the file 'master.py'.
I get that matlab thinks the %s is an escape character, so perhaps an option is to modify the terminator, but I have found this difficult.
(EDIT: Forgot to mention the file master.py is not in my control, so I can't modify the file to %%s for example).
%matlab script
%===============
fileID = fopen('script.py','w');
yMax=5;
fprintf(fileID,'yOverallDim = %d\n', -1*yMax);
%READ IN "master.py" for rest of script
fileID2 = fopen('master.py','r');
currentLine = fgets(fileID2);
while ischar(currentLine)
fprintf(fileID,currentLine);
currentLine = fgets(fileID2);
end
fclose(fileID);
fclose(fileID2);
The file 'master.py' looks like this (and the problem is on line 6 'setName ="Set-%s"%(i+1)':
i=0
for yPos in range (0,yOverallDim,yVoxelSize):
yCoordinate=yPos+(yVoxelSize/2) #
for xPos in range (0,xOverallDim,xVoxelSize):
xCoordinate=xPos+(xVoxelSize/2)
setName ="Set-%s"%(i+1)
p = mdb.models['Model-1'].parts['Part-1']
# p = mdb.models['Model-1'].parts['Part-2']
c = p.cells
cells = c.findAt(((xCoordinate, yCoordinate, 10.0), ))
region = p.Set(cells=cells, name=setName)
p.SectionAssignment(region=region, sectionName='Section-1', offset=0.0, offsetType=MIDDLE_SURFACE, offsetField='', thicknessAssignment=FROM_SECTION)
i+=1

In the documentation of fprintf you'll find this:
fprintf(fileID,formatSpec,A1,...,An) applies the formatSpec to all elements of arrays A1,...An in column order, and writes the data to a text file.
So in your function fprintf uses currentLine as format specification, resulting in an unexpected output for line 6. Correct application of fprintf by providing a formatSpec, fixes this issue and doesn't require any replace operations:
fprintf(fileID, '%s', currentLine);

Your script has no trouble reading the % characters correctly. The "problem" is with fprintf(). This function correctly interpretes the percent signs in the string as formatting characters. Therefore, I think you have to manually escape every single % character in your currentLine string:
currentLine = strrep(currentLine, '%', '%%');
At least, it worked when I checked it on your example data.

Thanks applesoup for identifying my fundamental oversight - the problem is in the fprintf - not in the file read
Thanks serial for enhancing the fprintf

Related

string concatenation from a .txt file matlab

I am trying to concatenate three lines (I want to leave the lines as is; 3 rows) from Shakespeare.txt file that shows:
To be,
or not to be:
that is the question.
My code right now is
fid = fopen('Shakespeare.txt')
while ~feof(fid)
a = fgets(fid);
b = fgets(fid);
c = fgets(fid);
end
fprintf('%s', strcat(a, b, c))
I'm supposed to use strcat and again, I want concatenated and leave them as three rows.
One method of keeping the rows separate is by storing the lines of the text file in a string array. Here a 1 by 3 string array is used. It may also be a good idea to use fgetl() which grabs each line of the text file at a time. Concantenating the outputs of fgetl() as strings may also be another option to ensure the they do not get stored as character (char) arrays. Also using the \n indicates to line break when printing the strings within the array String_Array.
fid = fopen('Shakespeare.txt');
while ~feof(fid)
String_Array(1) = string(fgetl(fid));
String_Array(2) = string(fgetl(fid));
String_Array(3) = string(fgetl(fid));
end
fprintf('%s\n', String_Array);
Ran using MATLAB R2019b

Blank cells while reading substring and numbers from with a string with textscan

I have a text file that consists of line after line of data in an xml-like format like this:
<item type="newpoint1" orient_zx="0.8658983248810842" orient_zy="0.4371062806139187" orient_zz="0.2432245678709263" electrostatic_force_x="0" electrostatic_force_y="0" electrostatic_force_z="0" cust_attr_HMTorque_0="0" cust_attr_HMTorque_1="0" cust_attr_HMTorque_2="0" vel_x="0" vel_y="0" vel_z="0" orient_xx="-0.2638371745169712" orient_xy="-0.01401379799313232" orient_xz="0.9644654264455047" pos_x="0" cust_attr_BondForce_0="0" pos_y="0" cust_attr_BondForce_1="0" pos_z="0.16" angvel_x="0" cust_attr_BondForce_2="0" angvel_y="0" id="1" angvel_z="0" charge="0" scaling_factor="1" cust_attr_BondTorque_0="0" cust_attr_BondTorque_1="0" cust_attr_BondTorque_2="0" cust_attr_Damage_0="0" orient_yx="0.4249823952954215" cust_attr_HMForce_0="0" cust_attr_Damage_1="0" orient_yy="-0.8993006799250595" cust_attr_HMForce_1="0" orient_yz="0.1031903618333235" cust_attr_HMForce_2="0" />
I'm only interested in the values within the " " so I'm trying to read this with textscan. To do this I take the first line and do regex find/replace to swap all number for %f and strings for %s, like this:
expression = '"[-+]?\d*\.?\d*"';
expression2 = '"\w*?"';
newStr = regexprep(firstline,expression,'"%f"');
FormatString = sprintf('%s',regexprep(newStr,expression2,'"%s"'));
The I re-open the file to read the files with string with the following call:
while ~feof(InputFile) % Read all lines in file
data = textscan(InputFile,FormatString,'delimiter','\n');
end
But all i get is an array of empty cells. I can't see what my mistake is - can someone point me in the right direction?
Clarification:
Mathworks provides this following example for textscan to remove literal text, which is what I'm trying to do.
"Remove the literal text 'Level' from each field in the second column of the data from the previous example."
filename = fullfile(matlabroot,'examples','matlab','scan1.dat');
fileID = fopen(filename);
C = textscan(fileID,'%s Level%d %f32 %d8 %u %f %f %s %f');
fclose(fileID);
C{2}
Ok, after looking at this with some fresh eyes today I spotted my problem.
newStr = regexprep(firstline,expression,'"%f"');
FormatString = sprintf('%s',regexprep(newStr,expression2,'%q'));
data = textscan(InputFile,FormatString,'delimiter',' ');
The replacement of the string need to be switched to the %q option which allows a string within double quotes to be read and the delimiter for textscan needed to be reverted to a single space. Code working fine now.

Read textfile with a mix of floats, integers and strings in the same column

Loading a well formatted and delimited text file in Matlab is relatively simple, but I struggle with a text file that I have to read in. Sadly I can not change the structure of the source file, so I have to deal with what I have.
The basic file structure is:
123 180 (two integers, white space delimited)
1.5674e-8
.
.
(floating point numbers in column 1, column 2 empty)
.
.
100 4501 (another two integers)
5.3456e-4 (followed by even more floating point numbers)
.
.
.
.
45 String (A integer in column 1, string in column 2)
.
.
.
A simple
[data1,data2]=textread('filename.txt','%f %s', ...
'emptyvalue', NaN)
Does not work.
How can I properly filter the input data? All examples I found online and in the Matlab help so far deal with well structured data, so I am a bit lost at where to start.
As I have to read a whole bunch of those files >100 I rather not iterate trough every single line in every file. I hope there is a much faster approach.
EDIT:
I made a sample file available here: test.txt (google drive)
I've looked at the text file you supplied and tried to draw a few general conclusions -
When there are two integers on a line, the second integer corresponds to the number of rows following.
You always have (two integers (A, B) followed by "B" floats), repeated twice.
After that you have some free-form text (or at least, I couldn't deduce anything useful about the format after that).
This is a messy format so I doubt there are going to be any nice solutions. Some useful general principles are:
Use fgetl when you need to read a single line (it reads up to the next newline character)
Use textscan when it's possible to read multiple lines at once - it is much faster than reading a line at a time. It has many options for how to parse, which it is worth getting to know (I recommend typing doc textscan and reading the entire thing).
If in doubt, just read the lines in as strings and then analyse them in MATLAB.
With that in mine, here is a simple parser for your files. It will probably need some modifications as you are able to infer more about the structure of the files, but it is reasonably fast on the ~700 line test file you gave.
I've just given the variables dummy names like "a", "b", "floats" etc. You should change them to something more specific to your needs.
function output = readTestFile(filename)
fid = fopen(filename, 'r');
% Read the first line
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
a = nums{1}(1);
b = nums{1}(2);
% Read 'b' of the next lines:
contents = textscan(fid, '%f', b);
floats1 = contents{1};
% Read the next line:
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
c = nums{1}(1);
d = nums{1}(2);
% Read 'd' of the next lines:
contents = textscan(fid, '%f', d);
floats2 = contents{1};
% Read the rest:
rest = textscan(fid, '%s', 'Delimiter', '\n');
output.a = a;
output.b = b;
output.c = c;
output.d = d;
output.floats1 = floats1;
output.floats2 = floats2;
output.rest = rest{1};
end
You can read in the file line by line using the lower-level functions, then parse each line manually.
You open the file handle like in C
fid = fopen(filename);
Then you can read a line using fgetl
line = fgetl(fid);
String tokenize it on spaces is probably the best first pass, storing each piece in a cell array (because a matrix doesn't support ragged arrays)
colnum = 1;
while ~isempty(rem)
[token, rem] = strtok(rem, ' ');
entries{linenum, colnum} = token;
colnum = colnum + 1;
end
then you can wrap all of that inside another while loop to iterate over the lines
linenum = 1;
while ~feof(fid)
% getl, strtok, index bookkeeping as above
end
It's up to you whether it's best to parse the file as you read it or read it into a cell array first and then go over it afterwards.
Your cell entries are all going to be strings (char arrays), so you will need to use str2num to convert them to numbers. It does a good job of working out the format so that might be all you need.

How to read text file with variable row length in Matlab?

I have of bunch of CSV files to read in Matlab. All of files has similar structure, except last field is optional. I.e. some files contain it, others are not.
Also files contain both textual and numeric fields, so csvread is not applicable.
Only alternative I know is textscan. Unfortunately, I can't find specifiers for optional fields.
I am looking at spec:
formatSpec = '%d%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%s%[^\n\r]';
and wish last %s be optional.
You could try the strsplit function
http://www.mathworks.com/help/matlab/ref/strsplit.html
To read a file line-by-line, you can use the function fgetl. It reads one line, removes newline-characters and returns the line as a string. At the end of the file, a -1 is returned.
You can then use the sscanf to extract the data according to your format spec (including the %s). If your input data doesn't contain any string at the end, then the last field was empty.
fid = fopen('file.txt','r');
while 1
line = fgetl(fid);
if line == -1
break;
end
A = sscanf(line,formatSpec);
...
end
You can then do whatever you need with A.
For example look at the following example:
line = '1 2.5 3.6 abc';
A = sscanf(line,'%d %f %f %s')
A =
1.0000
2.5000
3.6000
97.0000
98.0000
99.0000
The string will be A(4:end). The string was empty if isempty(A(4:end)), that way you can store the data as you like, e.g. in a cell.
Assuming you don't need the optional column, why not ignore the rest of the line by %*s and delimiter set to newline?

Textscan Matlab ; Doesn't read the format

I have a file in the following format:
**400**,**100**::400,descendsFrom,**76**::0
**400**,**119**::400,descendsFrom,**35**::0
**400**,**4**::400,descendsFrom,**45**::0
...
...
Now I need to read, the part only in the bold. I've written the following formatspec:
formatspec = '%d,%d::%*d,%*s,%d::%*d\n';
data = textscan(fileID, formatspec);
It doesn't seem to work. Can someone tell me what's wrong?
I also need to know how to 'not use' delimiter, and how to proceed if I want to express the exact way my file is written in, for example in the case above.
EDITED
A possible problem is with the %s part of the formatspec variable. Because %s is an arbitrary string therefore the descendsFrom,76::0 part of the line is ordered to this string. So with the formatspec '%d,%d::%d,%s,%d::%d\n' you will get the following cells form the first line:
400 100 400 'descendsFrom,76::0'
To solve this problem you have two possibilities:
formatspec = %d,%d::%d,descendsFrom,%d::%d\n
OR
formatspec = %d,%d::%d,%12s,%d::%d\n
In the first case the 'descendForm' string has to be contained by each row (as in your example). In the second case the string can be changed but its length must be 12.
Your Delimiter is "," you should first delimit it then maybe run a regex. Here is how I would go about it:
fileID = fopen('file.csv');
D = textscan(fileID,'%s %s %s %s ','Delimiter',','); %read everything as strings
column1 = regexprep(D{1},'*','')
column2 = regexprep(D{2},{'*',':'},{'',''})
column3 = D{3}
column4 = regexprep(D{4},{'*',':'},{'',''})
This should generate your 4 columns which you can then combine
I believe the Delimiter can only be one symbol. The more efficient way is to directly do regexprep on your entire line, which would generate:
test = '**400**,**4**::400,descendsFrom,**45**::0'
test = regexprep(test,{'*',':'},{'',''})
>> test = 400,4400,descendsFrom,450
You can do multiple delimiters in textscan, they need to be supplied as a cell array of strings. You don't need the end of line character in the format, and you need to set 'MultipleDelimsAsOne'. Don't have MATLAB to hand but something along these lines should work:
formatspec = '%d %d %*d %*s %d %*d';
data = textscan(fileID, formatspec,'Delimiter',{',',':'},'MultipleDelimsAsOne',1);
If you want to return it as a matrix of numbers not a cell array, try adding also the option 'CollectOutput',1