Blank cells while reading substrings and numbers from a string with textscan - matlab

I have a text file that consists of line after line of data in an xml-like format like this:
<item type="newpoint1" orient_zx="0.8658983248810842" orient_zy="0.4371062806139187" orient_zz="0.2432245678709263" electrostatic_force_x="0" electrostatic_force_y="0" electrostatic_force_z="0" cust_attr_HMTorque_0="0" cust_attr_HMTorque_1="0" cust_attr_HMTorque_2="0" vel_x="0" vel_y="0" vel_z="0" orient_xx="-0.2638371745169712" orient_xy="-0.01401379799313232" orient_xz="0.9644654264455047" pos_x="0" cust_attr_BondForce_0="0" pos_y="0" cust_attr_BondForce_1="0" pos_z="0.16" angvel_x="0" cust_attr_BondForce_2="0" angvel_y="0" id="1" angvel_z="0" charge="0" scaling_factor="1" cust_attr_BondTorque_0="0" cust_attr_BondTorque_1="0" cust_attr_BondTorque_2="0" cust_attr_Damage_0="0" orient_yx="0.4249823952954215" cust_attr_HMForce_0="0" cust_attr_Damage_1="0" orient_yy="-0.8993006799250595" cust_attr_HMForce_1="0" orient_yz="0.1031903618333235" cust_attr_HMForce_2="0" />
I'm only interested in the values within the double quotes, so I'm trying to read this with textscan. To do this I take the first line and use a regex find/replace to swap every number for %f and every string for %s, like this:
expression = '"[-+]?\d*\.?\d*"';
expression2 = '"\w*?"';
newStr = regexprep(firstline,expression,'"%f"');
FormatString = sprintf('%s',regexprep(newStr,expression2,'"%s"'));
Then I re-open the file and read it with that format string using the following call:
while ~feof(InputFile) % Read all lines in file
data = textscan(InputFile,FormatString,'delimiter','\n');
end
But all I get is an array of empty cells. I can't see what my mistake is - can someone point me in the right direction?
Clarification:
Mathworks provides the following example for textscan to remove literal text, which is what I'm trying to do.
"Remove the literal text 'Level' from each field in the second column of the data from the previous example."
filename = fullfile(matlabroot,'examples','matlab','scan1.dat');
fileID = fopen(filename);
C = textscan(fileID,'%s Level%d %f32 %d8 %u %f %f %s %f');
fclose(fileID);
C{2}

Ok, after looking at this with some fresh eyes today I spotted my problem.
newStr = regexprep(firstline,expression,'"%f"');
FormatString = sprintf('%s',regexprep(newStr,expression2,'%q'));
data = textscan(InputFile,FormatString,'delimiter',' ');
The string replacement needed to be switched to the %q specifier, which reads a string enclosed in double quotes, and the delimiter for textscan needed to be reverted to a single space. The code works fine now.
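If the goal is only to collect the quoted values, a regular expression with tokens is another route that avoids building a textscan format string at all. The following is a minimal sketch of that alternative, not the %q approach described above; the shortened sample line and the variable names (sampleLine, tokens, names, values, nums) are made up for illustration.

```matlab
% Hypothetical shortened line in the same attribute="value" layout.
sampleLine = '<item type="newpoint1" pos_z="0.16" id="1" orient_zx="-0.2638" />';

% Each match yields a 1x2 cell: {attribute name, quoted value}.
tokens = regexp(sampleLine, '(\w+)="([^"]*)"', 'tokens');

% Unpack the names and the quoted values into two cell arrays.
names  = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);
values = cellfun(@(t) t{2}, tokens, 'UniformOutput', false);

% str2double returns NaN for non-numeric entries such as 'newpoint1'.
nums = str2double(values);
```

Keeping the attribute names alongside the values may help here, since the attributes in the sample file do not appear in any obvious order.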

Related

matlab text read and write %s character (without escaping)

Dear All (with many thanks in advance),
The following script has trouble reading (and therefore writing) the %s character in the file 'master.py'.
I get that matlab thinks the %s is an escape character, so perhaps an option is to modify the terminator, but I have found this difficult.
(EDIT: Forgot to mention the file master.py is not in my control, so I can't modify the file to %%s for example).
%matlab script
%===============
fileID = fopen('script.py','w');
yMax=5;
fprintf(fileID,'yOverallDim = %d\n', -1*yMax);
%READ IN "master.py" for rest of script
fileID2 = fopen('master.py','r');
currentLine = fgets(fileID2);
while ischar(currentLine)
fprintf(fileID,currentLine);
currentLine = fgets(fileID2);
end
fclose(fileID);
fclose(fileID2);
The file 'master.py' looks like this (and the problem is on line 6, 'setName ="Set-%s"%(i+1)'):
i=0
for yPos in range (0,yOverallDim,yVoxelSize):
    yCoordinate=yPos+(yVoxelSize/2) #
    for xPos in range (0,xOverallDim,xVoxelSize):
        xCoordinate=xPos+(xVoxelSize/2)
        setName ="Set-%s"%(i+1)
        p = mdb.models['Model-1'].parts['Part-1']
        # p = mdb.models['Model-1'].parts['Part-2']
        c = p.cells
        cells = c.findAt(((xCoordinate, yCoordinate, 10.0), ))
        region = p.Set(cells=cells, name=setName)
        p.SectionAssignment(region=region, sectionName='Section-1', offset=0.0, offsetType=MIDDLE_SURFACE, offsetField='', thicknessAssignment=FROM_SECTION)
        i+=1
In the documentation of fprintf you'll find this:
fprintf(fileID,formatSpec,A1,...,An) applies the formatSpec to all elements of arrays A1,...An in column order, and writes the data to a text file.
So in your call, fprintf uses currentLine as the format specification, which produces unexpected output for line 6. Passing an explicit formatSpec fixes the issue and requires no replace operations:
fprintf(fileID, '%s', currentLine);
Your script has no trouble reading the % characters correctly. The "problem" is with fprintf(). This function correctly interprets the percent signs in the string as formatting characters. Therefore, I think you have to manually escape every single % character in your currentLine string:
currentLine = strrep(currentLine, '%', '%%');
At least, it worked when I checked it on your example data.
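The two suggestions can be compared side by side with sprintf, which handles format specifications the same way fprintf does. This is a small sketch using the problem line from master.py as sample data; the variable names viaSpec and viaEscape are only illustrative.

```matlab
% Sample line containing literal percent signs, followed by a newline.
currentLine = ['setName ="Set-%s"%(i+1)', char(10)];

% Option 1: pass the text as data to an explicit '%s' format.
viaSpec = sprintf('%s', currentLine);

% Option 2: escape each percent sign, then use the text as the format.
viaEscape = sprintf(strrep(currentLine, '%', '%%'));

% Both options reproduce the original line unchanged.
disp(strcmp(viaSpec, currentLine));
disp(strcmp(viaEscape, currentLine));
```

One difference worth keeping in mind: with Option 2 the text is still used as a format, so backslash sequences such as \n inside a line would also be interpreted unless they are escaped too. Option 1 treats the whole line as data and avoids that.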
Thanks applesoup for identifying my fundamental oversight - the problem is in the fprintf - not in the file read
Thanks serial for enhancing the fprintf

MATLAB: Reading space separated float values from text file

I am reading a text file using MATLAB's textscan function. The problem is that nothing is read into values because the floating-point numbers are separated by three spaces, and I am too new to MATLAB programming to know an efficient syntax for this. My current code is given below:
Code:
values = textscan(input_file, '%f %f %f %f %f\n %*[^\n]');
The input file follows the following format:
File:
0.781844 952.962130 2251.430836 3412.734125 4456.016362
0.788094 983.834855 2228.432996 3196.415590 4378.885466
0.794344 967.653718 2200.798973 3119.844502 4374.097695
If the floating point values are # separated then the below statement works fine:
values = textscan(input_file, '%f#%f#%f#%f#%f\n %*[^\n]');
Is there any solution other than tokenization?
You need to specify a delimiter, and you should also enable the MultipleDelimsAsOne option so that repeated spaces are treated as a single delimiter:
value = textscan(input_file, '%f %f %f %f %f \n ','Delimiter',' ','MultipleDelimsAsOne',1);
If needed you can also specify several delimiters at the same time:
del = {';',' '};
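As a quick check of these options, textscan can also be run on a character vector instead of a file identifier. The sketch below uses two shortened rows from the question; the names txt, C and M are illustrative, and 'CollectOutput' is an extra option added here only to get a single numeric matrix back.

```matlab
% Two sample rows with three spaces between the values.
txt = sprintf('0.781844   952.962130   2251.430836\n0.788094   983.834855   2228.432996');

% Treat runs of spaces as one delimiter and merge the columns into one matrix.
C = textscan(txt, '%f %f %f', ...
    'Delimiter', ' ', ...
    'MultipleDelimsAsOne', true, ...
    'CollectOutput', true);

M = C{1};   % numeric matrix with one row per line of text
disp(size(M));
```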
If you don't have to use textscan, you could probably use importdata. There you can specify the delimiter as a parameter.
Documentation http://se.mathworks.com/help/matlab/ref/importdata.html
Code example
filename = 'myfile01.txt';
delimiterIn = ' ';
A = importdata(filename,delimiterIn);

How to read text file with variable row length in Matlab?

I have a bunch of CSV files to read in Matlab. All of the files have a similar structure, except that the last field is optional, i.e. some files contain it and others do not.
The files also contain both textual and numeric fields, so csvread is not applicable.
The only alternative I know of is textscan. Unfortunately, I can't find a specifier for optional fields.
I am looking at spec:
formatSpec = '%d%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%s%[^\n\r]';
and I want the last %s to be optional.
You could try the strsplit function
http://www.mathworks.com/help/matlab/ref/strsplit.html
To read a file line-by-line, you can use the function fgetl. It reads one line, removes newline-characters and returns the line as a string. At the end of the file, a -1 is returned.
You can then use the sscanf to extract the data according to your format spec (including the %s). If your input data doesn't contain any string at the end, then the last field was empty.
fid = fopen('file.txt','r');
while true
    line = fgetl(fid);
    if ~ischar(line) % fgetl returns the number -1 at the end of the file
        break;
    end
    A = sscanf(line,formatSpec);
    ...
end
fclose(fid);
You can then do whatever you need with A.
For example:
line = '1 2.5 3.6 abc';
A = sscanf(line,'%d %f %f %s')
A =
1.0000
2.5000
3.6000
97.0000
98.0000
99.0000
The character codes of the string are A(4:end); char(A(4:end))' converts them back to text. The string was empty if isempty(A(4:end)). That way you can store the data as you like, e.g. in a cell.
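For comma-separated lines, the strsplit suggestion can be combined with the same line-by-line idea. The following is only a sketch under assumed data: the two sample lines, the count of three numeric fields, and the names sampleLines, nNumeric, nums and labels are all hypothetical. In a real script each line would come from fgetl instead of a cell array.

```matlab
% Hypothetical sample lines: the trailing text field is present only in the first.
sampleLines = {'1,2.5,3.6,abc', '4,5.5,6.6'};
nNumeric = 3;                               % assumed number of numeric fields

nums   = zeros(numel(sampleLines), nNumeric);
labels = cell(numel(sampleLines), 1);

for k = 1:numel(sampleLines)
    % Keep empty fields (e.g. ",,") instead of collapsing repeated commas.
    fields = strsplit(sampleLines{k}, ',', 'CollapseDelimiters', false);
    nums(k, :) = str2double(fields(1:nNumeric));
    if numel(fields) > nNumeric
        labels{k} = fields{nNumeric + 1};   % optional field is present
    else
        labels{k} = '';                     % optional field is missing
    end
end
```

Checking numel(fields) makes the optional column explicit, rather than inferring it from leftover character codes.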
Assuming you don't need the optional column, why not ignore the rest of the line by %*s and delimiter set to newline?

Textscan Matlab ; Doesn't read the format

I have a file in the following format:
**400**,**100**::400,descendsFrom,**76**::0
**400**,**119**::400,descendsFrom,**35**::0
**400**,**4**::400,descendsFrom,**45**::0
...
...
Now I need to read only the parts in bold. I've written the following formatspec:
formatspec = '%d,%d::%*d,%*s,%d::%*d\n';
data = textscan(fileID, formatspec);
It doesn't seem to work. Can someone tell me what's wrong?
I also need to know how to 'not use' delimiter, and how to proceed if I want to express the exact way my file is written in, for example in the case above.
EDITED
A possible problem is the %s part of the formatspec variable. Because %s matches an arbitrary string, the rest of the line (descendsFrom,76::0) is assigned to that string. So with the formatspec '%d,%d::%d,%s,%d::%d\n' you will get the following cells from the first line:
400 100 400 'descendsFrom,76::0'
To solve this problem you have two possibilities:
formatspec = '%d,%d::%d,descendsFrom,%d::%d\n';
OR
formatspec = '%d,%d::%d,%12s,%d::%d\n';
In the first case the 'descendsFrom' string has to appear in every row (as in your example). In the second case the string can vary, but its length must be 12.
Your delimiter is ",", so you should first split on it and then maybe run a regex. Here is how I would go about it:
fileID = fopen('file.csv');
D = textscan(fileID,'%s %s %s %s ','Delimiter',','); %read everything as strings
column1 = regexprep(D{1},'\*','') % escape the asterisk so it is matched literally
column2 = regexprep(D{2},{'\*',':'},{'',''})
column3 = D{3}
column4 = regexprep(D{4},{'\*',':'},{'',''})
This should generate your 4 columns which you can then combine
I believe the Delimiter can only be one symbol. A more efficient way is to apply regexprep directly to the entire line, which would generate:
test = '**400**,**4**::400,descendsFrom,**45**::0'
test = regexprep(test,{'\*',':'},{'',''})
>> test = 400,4400,descendsFrom,450
You can do multiple delimiters in textscan, they need to be supplied as a cell array of strings. You don't need the end of line character in the format, and you need to set 'MultipleDelimsAsOne'. Don't have MATLAB to hand but something along these lines should work:
formatspec = '%d %d %*d %*s %d %*d';
data = textscan(fileID, formatspec,'Delimiter',{',',':'},'MultipleDelimsAsOne',1);
If you want to return it as a matrix of numbers not a cell array, try adding also the option 'CollectOutput',1
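The multiple-delimiter suggestion can be tried on a character vector without opening a file. This is a sketch, assuming the file holds the plain values without the bold markers; the names txt, C and M are illustrative. Instead of 'CollectOutput', the three returned columns are joined by hand.

```matlab
% Two sample lines from the question, without the bold markers.
txt = sprintf('400,100::400,descendsFrom,76::0\n400,119::400,descendsFrom,35::0');

% Split on commas and colons; "::" counts as a single delimiter.
C = textscan(txt, '%d %d %*d %*s %d %*d', ...
    'Delimiter', {',', ':'}, ...
    'MultipleDelimsAsOne', true);

% %d returns int32 columns; concatenate the three kept fields side by side.
M = [C{:}];
disp(M);
```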

How to read line from a text file as a string in matlab?

I am trying to read a text file in MATLAB which has a format like the following. I am looking to read the whole line as a string.
2402:0.099061 2404:0.136546 2406:0.447161 2407:0.126333 2408:0.213803 2411:0.068189
I tried couple of things.
textscan(fid, '%s') reads the line but splits the line into cells at spaces.
fscanf(fid, '%s') reads the line as a string but removes all the spaces.
fgetl(fid) will do what you're looking for. Newline is stripped off.
textscan uses a whitespace delimiter by default. Set the delimiter to an empty string:
>> q = textscan(fid, '%s', 'Delimiter', '');
>> q{1}{:}
ans = 2402:0.099061 2404:0.136546 2406:0.447161 2407:0.126333 2408:0.213803 2411:0.068189
If you want to read the whole file as string (your file has only one line), try:
s = fileread('input.txt'); %# returns a char vector
s = strtrim(s); %# trim whitespaces
If you look at the source code of FILEREAD function, it is basically reading the file in binary mode as an array of characters: fread(fid, '*char')
Whitespace is treated as a delimiter by default with textscan.
Specifying a different delimiter (one that is not present in your data) when calling it should do the trick. Add this, for example:
'delimiter', '|'
You can also use
file = textread(<fileref goes here>, '%s', 'delimiter', '\n')
then
file{1,1}
will return
ans =
2402:0.099061 2404:0.136546 2406:0.447161 2407:0.126333 2408:0.213803 2411:0.068189
hope this helps
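The same newline-delimiter idea should also carry over to textscan, which accepts a character vector in place of a file identifier. A minimal sketch with made-up two-line sample text follows; the names txt, C and textLines are illustrative.

```matlab
% Two sample lines in the format from the question (shortened).
txt = sprintf('2402:0.099061 2404:0.136546\n2406:0.447161 2407:0.126333');

% With a newline delimiter, each %s field spans a whole line, spaces included.
C = textscan(txt, '%s', 'Delimiter', '\n');

textLines = C{1};        % cell array with one char vector per line
disp(textLines{1});
```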
Use:
clc;
fid = fopen('fileName.m');
tline = fgetl(fid); % read the first line before entering the loop
while ischar(tline)
    disp(strcat("Line imported: ",tline))
    tline = fgetl(fid);
end
fclose(fid);