MATLAB: Reading space-separated float values from text file - matlab

I am reading a text file using MATLAB's textscan function. The problem is that nothing is read into values, because the floating-point numbers are separated by three spaces and I am too new to MATLAB programming to know an efficient syntax for this. My current code is given below:
Code:
values = textscan(input_file, '%f %f %f %f %f\n %*[^\n]');
The input file follows the following format:
File:
0.781844 952.962130 2251.430836 3412.734125 4456.016362
0.788094 983.834855 2228.432996 3196.415590 4378.885466
0.794344 967.653718 2200.798973 3119.844502 4374.097695
If the floating point values are # separated then the below statement works fine:
values = textscan(input_file, '%f#%f#%f#%f#%f\n %*[^\n]');
Is there any solution other than tokenization?

You need to specify a delimiter. You should also enable MultipleDelimsAsOne so that repeated spaces are treated as a single delimiter:
value = textscan(input_file, '%f %f %f %f %f \n ','Delimiter',' ','MultipleDelimsAsOne',1);
If needed you can also specify several delimiters at the same time:
del = {';',' '};
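As a minimal sketch of the two options above, the snippet below runs textscan on an in-memory character vector instead of a file identifier, so it can be tried without the original data file. The sample text (three columns only, with one ';' added to show the second delimiter) and the variable names txt, C, and M are assumptions made for illustration.

```matlab
% Sample text standing in for the file contents (assumed data).
% Values are separated by three spaces; the second row also uses ';'.
txt = sprintf(['0.781844   952.962130   2251.430836\n', ...
               '0.788094;983.834855   2228.432996\n']);

% Several delimiters are passed as a cell array of character vectors.
del = {';', ' '};

% Runs of consecutive delimiters are collapsed into a single delimiter.
C = textscan(txt, '%f %f %f', 'Delimiter', del, 'MultipleDelimsAsOne', 1);

% Concatenate the per-column cell contents into one numeric matrix.
M = [C{:}];
disp(size(M))
```

To read from a file instead, the first argument would be the identifier returned by fopen rather than the text itself.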

If you don't have to use textscan, you could probably use importdata. There you can specify the delimiter as a parameter.
Documentation http://se.mathworks.com/help/matlab/ref/importdata.html
Code example
filename = 'myfile01.txt';
delimiterIn = ' ';
A = importdata(filename,delimiterIn);

Related

Blank cells while reading substring and numbers from with a string with textscan

I have a text file that consists of line after line of data in an xml-like format like this:
<item type="newpoint1" orient_zx="0.8658983248810842" orient_zy="0.4371062806139187" orient_zz="0.2432245678709263" electrostatic_force_x="0" electrostatic_force_y="0" electrostatic_force_z="0" cust_attr_HMTorque_0="0" cust_attr_HMTorque_1="0" cust_attr_HMTorque_2="0" vel_x="0" vel_y="0" vel_z="0" orient_xx="-0.2638371745169712" orient_xy="-0.01401379799313232" orient_xz="0.9644654264455047" pos_x="0" cust_attr_BondForce_0="0" pos_y="0" cust_attr_BondForce_1="0" pos_z="0.16" angvel_x="0" cust_attr_BondForce_2="0" angvel_y="0" id="1" angvel_z="0" charge="0" scaling_factor="1" cust_attr_BondTorque_0="0" cust_attr_BondTorque_1="0" cust_attr_BondTorque_2="0" cust_attr_Damage_0="0" orient_yx="0.4249823952954215" cust_attr_HMForce_0="0" cust_attr_Damage_1="0" orient_yy="-0.8993006799250595" cust_attr_HMForce_1="0" orient_yz="0.1031903618333235" cust_attr_HMForce_2="0" />
I'm only interested in the values within the " ", so I'm trying to read this with textscan. To do this I take the first line and use a regex find/replace to swap all numbers for %f and all strings for %s, like this:
expression = '"[-+]?\d*\.?\d*"';
expression2 = '"\w*?"';
newStr = regexprep(firstline,expression,'"%f"');
FormatString = sprintf('%s',regexprep(newStr,expression2,'"%s"'));
Then I re-open the file and read it with that format string using the following call:
while ~feof(InputFile) % Read all lines in file
data = textscan(InputFile,FormatString,'delimiter','\n');
end
But all I get is an array of empty cells. I can't see what my mistake is. Can someone point me in the right direction?
Clarification:
Mathworks provides this following example for textscan to remove literal text, which is what I'm trying to do.
"Remove the literal text 'Level' from each field in the second column of the data from the previous example."
filename = fullfile(matlabroot,'examples','matlab','scan1.dat');
fileID = fopen(filename);
C = textscan(fileID,'%s Level%d %f32 %d8 %u %f %f %s %f');
fclose(fileID);
C{2}
Ok, after looking at this with some fresh eyes today I spotted my problem.
newStr = regexprep(firstline,expression,'"%f"');
FormatString = sprintf('%s',regexprep(newStr,expression2,'%q'));
data = textscan(InputFile,FormatString,'delimiter',' ');
The replacement for the strings needed to be switched to the %q specifier, which reads a string enclosed in double quotes, and the delimiter for textscan needed to be reverted to a single space. The code is working fine now.
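A different technique from the textscan fix above, shown only as a hedged alternative: regexp with the 'tokens' option can pull every name="value" pair out of a line directly, without building a format string first. The shortened sample line and the variable names below are assumptions for illustration.

```matlab
% Shortened sample line in the same attribute="value" style (assumed data).
sampleLine = ['<item type="newpoint1" orient_zx="0.8658983248810842" ', ...
              'pos_z="0.16" id="1" />'];

% Each match yields a 1x2 cell: {attribute name, quoted value}.
tok = regexp(sampleLine, '(\w+)="([^"]*)"', 'tokens');

% Stack the matches into an N-by-2 cell array.
tok = vertcat(tok{:});
names = tok(:, 1);
vals  = tok(:, 2);

% str2double returns NaN for non-numeric values such as the type string.
nums = str2double(vals);
```

Non-numeric attributes can then be picked out with isnan(nums) and kept as text from vals.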

Matlab: how to read a .txt file with many separators

This is my first question here on Stack Overflow. I have a problem reading a .txt file with MATLAB using textread. The .txt file, which is really messy, has the structure shown below.
"ALMEMO";"BEREICH:";"L420";"DIGI";"DIGI";"DIGI";"DIGI";;;;;;;"DIGI";"DIGI";"DIGI";"DIGI";;;;;;;"DIGI";"DIGI";"DIGI";"DIGI";;;;;;;"DIGI";"DIGI";"DIGI";"DIGI";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;"CoCo";"CoCo";"CoCo";"CoCo";"CuCo";"CoCo";"CoCo";"CoCo";"CoCo";"CoCo";;;;;;;;;;;"CoCo";"CoCo";"CoCo";"CoCo";"CoCo";"CoCo";"CoCo";"CoCo";"CoCo";"CoCo"
"5690-2";"KOMMENTAR:";"";"T,t ";"T,t ";"Temperatur";"T,t ";;;;;;;"RH,Uw ";"RH,Uw ";"Feuchte ";"RH,Uw ";;;;;;;"DT,td ";"DT,td ";"Taupunkt ";"DT,td ";;;;;;;"MH,r g/kg ";"MH,r g/kg ";"Mischung ";"MH,r g/kg ";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;"";"";"";"";"";"";"";"";"";"";;;;;;;;;;;"";"";"";"";"";"";"";"";"";""
"SD3.10";"GW-MAX:";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
"ALMEMO.001";"GW-MIN:";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
"DATUM:";"ZEIT:";"M00: ms";"M01: øC";"M02: øC";"M03: øC";"M04: øC";;;;;;;"M11: %H";"M12: %H";"M13: %H";"M14: %H";;;;;;;"M21: øC";"M22: øC";"M23: øC";"M24: øC";;;;;;;"M31: gk";"M32: gk";"M33: gk";"M34: gk";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;"M70: øC";"M71: øC";"M72: øC";"M73: øC";"M74: øC";"M75: øC";"M76: øC";"M77: øC";"M78: øC";"M79: øC";;;;;;;;;;;"M90: øC";"M91: øC";"M92: øC";"M93: øC";"M94: øC";"M95: øC";"M96: øC";"M97: øC";"M98: øC";"M99: øC"
07.03.21;11:29:24;0,;22,91;23,15;23,68;22,75;;;;;;;38,3;74,1;70,;38,8;;;;;;;8,;18,3;17,8;8,1;;;;;;;6,6;13,2;12,8;6,6;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-;-;-;-;-;-;-;-;-;-;;;;;;;;;;;-;-;-;-;-;-;-;-;-;-
;11:30:24;0,;22,9;23,14;23,69;22,82;;;;;;;38,4;72,6;71,9;38,5;;;;;;;8,;18,;18,3;8,;;;;;;;6,6;12,9;13,2;6,6;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-;-;-;-;-;-;-;-;-;-;;;;;;;;;;;-;-;-;-;-;-;-;-;-;-
;11:31:24;0,;22,94;23,14;23,68;22,88;;;;;;;38,3;75,4;71,5;38,5;;;;;;;8,;18,6;18,2;8,1;;;;;;;6,6;13,4;13,1;6,6;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-;-;-;-;-;-;-;-;-;-;;;;;;;;;;;-;-;-;-;-;-;-;-;-;-
;11:32:24;0,;23,;23,13;23,68;22,95;;;;;;;38,2;73,;72,3;38,5;;;;;;;8,;18,1;18,4;8,1;;;;;;;6,6;13,;13,3;6,7;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-;-;-;-;-;-;-;-;-;-;;;;;;;;;;;-;-;-;-;-;-;-;-;-;-
Six lines of header are followed by the actual data, which are separated by ';' and have the floating-point numbers formatted with commas instead of dots. The data I need is not the whole line but only the first nine elements (date, hour, 9 floating point numbers).
The code I wrote to read the file, a bit naively, looking at other codes, is the following:
[date1, hour1, V0, Temp1, Temp2, Temp3, Temp4, RH1, RH2, RH3, RH4] = textread('file.txt', '%c %c %f %f %f %f %f %c* %c* %c* %c* %c* %c* %f %f %f %f', 'headerlines', 7, 'delimiter', ';');
Obviously it does not work. I think the headers are already skipped in my version of the code, so, to summarize, the following questions remain:
How can I treat many separators as one? (or ignore them, as I tried to do in my code)
How can I make the date, which appears only in the first line after the header, appear in every row? (I think I can fill the first column of the output matrix afterwards with a for loop)
How can I cut the lines of the text file, ignoring everything that comes after the ninth floating point number?
How can I read floating point numbers that use a comma as the decimal separator? (I tried converting the commas to dots with the Notepad "Replace" function; this is a valid solution in my case, but still does not solve the problem)
Thanks in advance to everyone who will answer, take care,
Giuseppe
You could take advantage of the built-in arguments of textscan to skip the header lines and collapse the repeated delimiters. Then replace the decimal commas with dots using strrep. Finally, you can convert your cell array of strings to an array of numbers with str2double.
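fid = fopen('foo.txt');
% Read nine string columns; the textscan option name is 'MultipleDelimsAsOne'.
C = textscan(fid, repmat('%s',1,9), 'Headerlines', 6, 'Delimiter', ';', 'MultipleDelimsAsOne', 1);
fclose(fid);
% Swap decimal commas for dots, then convert the strings to numbers.
col1 = str2double( strrep(C{1}, ',', '.') );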
This is a very roundabout way of accomplishing your task, but text handling is not exactly MATLAB's strong point.
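The remaining points of the question (decimal commas, a date that only appears on the first data row, and keeping only the leading fields) can be sketched line by line with strsplit instead of textscan. This is an illustrative alternative under stated assumptions, not the answer's method: the two shortened sample rows, the choice of fields 4 to 7 as the temperature columns, and all variable names are made up for the example.

```matlab
% Two shortened data rows in the same style as the file (assumed data).
rows = {'07.03.21;11:29:24;0,;22,91;23,15;23,68;22,75;;;38,3', ...
        ';11:30:24;0,;22,9;23,14;23,69;22,82;;;38,4'};

lastDate = '';
dates = cell(numel(rows), 1);
temps = zeros(numel(rows), 4);

for k = 1:numel(rows)
    % Keep empty fields so that column positions stay aligned.
    f = strsplit(rows{k}, ';', 'CollapseDelimiters', false);

    % Carry the most recent non-empty date forward to rows that omit it.
    if ~isempty(f{1})
        lastDate = f{1};
    end
    dates{k} = lastDate;

    % Fields 4..7 are taken as the four temperatures; swap ',' for '.'.
    temps(k, :) = str2double(strrep(f(4:7), ',', '.'));
end
```

Setting 'CollapseDelimiters' to false is the design choice that matters here: with the default, runs of ';' are merged and a row without a date would shift by one column.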

Matlab: Parsing strings

How can I turn strings like 1-14 into 01_014? (and for 2-2 into 02_002)?
I can do something like this:
testpoint_number = '5-16';
temp = textscan(testpoint_number, '%s', 'delimiter', '-');
temp = temp{1};
first_part = temp{1};
second_part = temp{2};
output_prefix = strcat('0',first_part);
parsed_testpoint_number = strcat(output_prefix, '_',second_part);
parsed_testpoint_number
But I feel this is very tedious, and I don't know how to handle the second part (16 to 016)
As you are handling integers, I would suggest changing the textscan format to %d (integer numbers). With that, you can use the formatting power of the *printf commands (e.g. sprintf).
*printf allows you to specify the width of the integer. With %02d, the integer is printed two characters wide and padded with leading zeros.
textscan returns a {1x1} cell, which contains a 2x1 array of integers. *printf can handle this itself, so you just have to supply the argument temp{1}:
temp = textscan(testpoint_number, '%d', 'delimiter', '-');
parsed_testpoint_number = sprintf('%02d_%03d',temp{1});
Your textscanning is probably the most intuitive way to do this, but from then on what I would recommend doing is instead converting the scanned first_part and second_part into numerical format, giving you integers.
You can then sprintf these into your target string using the correct 'c'-style formatters to indicate your zero-padding prefix width, e.g.:
temp = textscan(testpoint_number, '%d', 'delimiter', '-');
parsed_testpoint_number = sprintf('%02d_%03d', temp{1});
Take a look at the C sprintf() documentation for an explanation of the string formatting options.
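A compact variant of the same idea, given as a sketch: sscanf reads the two integers around the literal '-', and sprintf pads them. The helper name padTestpoint is hypothetical, and using sscanf instead of textscan is a substitution made here for brevity.

```matlab
% Hypothetical helper: parse 'a-b' into two integers, then zero-pad them
% to widths 2 and 3 and join them with an underscore.
padTestpoint = @(s) sprintf('%02d_%03d', sscanf(s, '%d-%d'));

out1 = padTestpoint('1-14');
out2 = padTestpoint('2-2');
out3 = padTestpoint('5-16');
disp({out1, out2, out3})
```

Because sscanf returns a 2x1 column and sprintf consumes its arguments element by element, no explicit indexing into the parsed result is needed.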

How to read text file with variable row length in Matlab?

I have a bunch of CSV files to read in MATLAB. All of the files have a similar structure, except that the last field is optional, i.e. some files contain it and others do not.
The files also contain both textual and numeric fields, so csvread is not applicable.
The only alternative I know is textscan. Unfortunately, I can't find a specifier for optional fields.
I am looking at spec:
formatSpec = '%d%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%s%[^\n\r]';
and would like the last %s to be optional.
You could try the strsplit function
http://www.mathworks.com/help/matlab/ref/strsplit.html
To read a file line-by-line, you can use the function fgetl. It reads one line, removes newline-characters and returns the line as a string. At the end of the file, a -1 is returned.
You can then use the sscanf to extract the data according to your format spec (including the %s). If your input data doesn't contain any string at the end, then the last field was empty.
fid = fopen('file.txt','r');
while true
    tline = fgetl(fid);
    % fgetl returns the number -1 (not a char array) at end of file
    if ~ischar(tline)
        break;
    end
    A = sscanf(tline,formatSpec);
    % ...
end
fclose(fid);
You can then do whatever you need with A.
For example:
line = '1 2.5 3.6 abc';
A = sscanf(line,'%d %f %f %s')
A =
1.0000
2.5000
3.6000
97.0000
98.0000
99.0000
The string will be A(4:end). The string was empty if isempty(A(4:end)), that way you can store the data as you like, e.g. in a cell.
Assuming you don't need the optional column, why not ignore the rest of the line by %*s and delimiter set to newline?
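The strsplit suggestion above can be sketched as follows: split each line on the delimiter and check how many fields came back to decide whether the optional last field is present. The two sample lines, the count of three numeric fields, and the variable names are assumptions for illustration; real files would have many more numeric columns.

```matlab
% Sample lines: the first has the optional trailing text field, the
% second does not (assumed data).
rows = {'1,2.5,3.6,abc', '2,4.5,5.6'};
nNumeric = 3;                         % number of leading numeric fields

nums   = zeros(numel(rows), nNumeric);
labels = cell(numel(rows), 1);

for k = 1:numel(rows)
    % Keep empty fields so that positions are not shifted.
    parts = strsplit(rows{k}, ',', 'CollapseDelimiters', false);
    nums(k, :) = str2double(parts(1:nNumeric));

    % The optional field exists only if there are more parts than numbers.
    if numel(parts) > nNumeric
        labels{k} = parts{nNumeric + 1};
    else
        labels{k} = '';
    end
end
```

Unlike the sscanf approach, the text field stays a character vector here instead of being returned as character codes.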

Textscan Matlab ; Doesn't read the format

I have a file in the following format:
**400**,**100**::400,descendsFrom,**76**::0
**400**,**119**::400,descendsFrom,**35**::0
**400**,**4**::400,descendsFrom,**45**::0
...
...
Now I need to read only the parts in bold. I've written the following formatspec:
formatspec = '%d,%d::%*d,%*s,%d::%*d\n';
data = textscan(fileID, formatspec);
It doesn't seem to work. Can someone tell me what's wrong?
I also need to know how to 'not use' delimiter, and how to proceed if I want to express the exact way my file is written in, for example in the case above.
EDITED
A possible problem is the %s part of the formatspec variable. Because %s matches an arbitrary string, the rest of the line, descendsFrom,76::0, is assigned to that string. So with the formatspec '%d,%d::%d,%s,%d::%d\n' you will get the following cells from the first line:
400 100 400 'descendsFrom,76::0'
To solve this problem you have two possibilities:
formatspec = '%d,%d::%d,descendsFrom,%d::%d\n';
OR
formatspec = '%d,%d::%d,%12s,%d::%d\n';
In the first case every row has to contain the string 'descendsFrom' (as in your example). In the second case the string may vary, but its length must be 12.
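The first option (literal text inside the format) can be tried on a single line with sscanf, which accepts the same kind of literal matching. This is a sketch only: the sample line assumes the asterisks in the question are just bold markers and not part of the file, and the variable names are illustrative.

```matlab
% One sample line without the bold markers (assumed file content).
sampleLine = '400,100::400,descendsFrom,76::0';

% Literal characters in the format must match the input exactly, so
% 'descendsFrom' is consumed without being stored.
vals = sscanf(sampleLine, '%d,%d::%d,descendsFrom,%d::%d');

% Keep only the three values that were bold in the question.
wanted = vals([1 2 4])';
disp(wanted)
```

The same format without the trailing \n can be reused per line inside an fgetl loop.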
Your Delimiter is "," you should first delimit it then maybe run a regex. Here is how I would go about it:
fileID = fopen('file.csv');
D = textscan(fileID,'%s %s %s %s ','Delimiter',','); %read everything as strings
column1 = regexprep(D{1},'\*','')              % '*' is a regex metacharacter, so escape it
column2 = regexprep(D{2},{'\*',':'},{'',''})
column3 = D{3}
column4 = regexprep(D{4},{'\*',':'},{'',''})
This should generate your 4 columns which you can then combine
I believe the Delimiter can only be one symbol. The more efficient way is to directly do regexprep on your entire line, which would generate:
test = '**400**,**4**::400,descendsFrom,**45**::0'
test = regexprep(test,{'\*',':'},{'',''})
>> test = 400,4400,descendsFrom,450
You can do multiple delimiters in textscan, they need to be supplied as a cell array of strings. You don't need the end of line character in the format, and you need to set 'MultipleDelimsAsOne'. Don't have MATLAB to hand but something along these lines should work:
formatspec = '%d %d %*d %*s %d %*d';
data = textscan(fileID, formatspec,'Delimiter',{',',':'},'MultipleDelimsAsOne',1);
If you want to return it as a matrix of numbers not a cell array, try adding also the option 'CollectOutput',1
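The suggestion above can be checked without a file by passing the text itself to textscan. The snippet is a sketch under assumptions: the two sample lines omit the bold markers, and the variable names txt, C, and M are illustrative.

```matlab
% Two sample lines standing in for the file contents (assumed data).
txt = sprintf(['400,100::400,descendsFrom,76::0\n', ...
               '400,119::400,descendsFrom,35::0\n']);

% %*d and %*s read a field and discard it; ',' and ':' are both
% delimiters, and '::' is collapsed into a single delimiter.
formatspec = '%d %d %*d %*s %d %*d';
C = textscan(txt, formatspec, 'Delimiter', {',', ':'}, ...
             'MultipleDelimsAsOne', 1, 'CollectOutput', 1);

% With CollectOutput, the three kept int32 columns share one cell.
M = C{1};
disp(M)
```

With a real file, the first argument would be the fileID returned by fopen.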