How to read a string containing a comma and an at sign with textread? - matlab

My prototype data line looks like this:
(1) 11 July England 0-0 Uruguay # Wembley Stadium, London
Currently I'm using this:
[no,dd,mm,t1,p1,p2,t2,loc]=textread('1966.txt','(%d) %d %s %s %d-%d %s # %[%s \n]');
But it gives me the following error:
Error using dataread
Trouble reading string from file (row 1, field 12) ==> Wembley Stadium, London\n
Error in textread (line 174)
[varargout{1:nlhs}]=dataread('file',varargin{:}); %#ok<REMFF1>
So it seems to have trouble with reading a string that contains a comma, or it's the at sign that causes trouble. I read the documentation thoroughly but nowhere does it mention what to do when you have special characters such as # or if you want to read a string that contains a delimiter even though it I don't want it recognized as a delimiter.

You want
[no,dd,mm,t1,p1,p2,t2,loc] = ...
textread('1966.txt','(%d) %d %s %s %d-%d %s # %[^\n]');

Related

How to get the length of a formatted string read in using fscanf in SystemVerilog?

I am reading a text file which has string test cases and decode them to process as Verilog test constructs to simulate. The code that I use to read a file is as follows:
integer pntr,file;
string a,b,c,d;
initial
begin
pntr = $fopen(FOO, "r");
end
always
begin
if(!$feof(pntr))
begin
file = $fscanf(pntr, "%s %s %s %s \n", a,b,c,d);
end
else
$fclose(pntr);
I have tried using
integer k;
k = strlen($fscanf(pntr, "%s %s %s %s \n", a,b,c,d));
$display(k);
and the display statement outputs an "x"
I also tried using
$display(file)
but this also gives me x as the display output. The above code is just a representation of my problem, I am using a larger formatted string to read in larger data. Each line of my testcase may have different size. I have initialized the format to the maximum number of string literals that my testcase can have. I wanted to ask if there is a way to get the length of each line that I read or number of string literals that fscanf read ?
Note: I am using Cadence tools for this task.
Input file looks like
read reg_loc1
write regloc2 2.5V regloc3 20mA
read regloc3 regloc5 regloc7
It's hard to debug your code when you have lots of typos and incomplete code. And you also have a race condition in that pntr may not have been assigned from $fopen if the always block executes before the initial block.
But in any case, the problem with using $fscanf and the %s format is that a newline gets treated as whitespace. It's better to use $fgets to read a line at a time, and the use $sscanf to parse the line:
module top;
integer pntr,file;
string a,b,c,d, line;
initial begin
pntr = $fopen("FOO", "r");
while(!$feof(pntr))
if ((file = $fgets(line,pntr)!=0)) begin
$write("%d line: ", file, line);
file = $sscanf(line, "%s %s %s %s \n", a,b,c,d);
$display(file,,a,b,,c,,d);
end
$fclose(pntr);
end
endmodule

Trying to load text file into Octave GUI into the correct format

I have a text file that contains 10 columns separated by comas and X number of rows denoted by return. The initial header line of the data is a string. The first two columns are character strings while the last 8 columns are integers. So far I have tried fscanf:
t1 = fscanf( test, '%c', inf );
which will import the data as large 1 by XXXXX character matrix and textread which imports it but does not format it correctly:
[a,b,c,d,e,f,g,h,i,j] = textread("test.txt", "%s %s %s %s %s %s %s %s %s %s",\
'headerlines', 1);
I suspect its a simple issue of formatting the notation on textread correctly to get my desired output. Any help is greatly appreciated.

Blank cells while reading substring and numbers from with a string with textscan

I have a text file that consists of line after line of data in an xml-like format like this:
<item type="newpoint1" orient_zx="0.8658983248810842" orient_zy="0.4371062806139187" orient_zz="0.2432245678709263" electrostatic_force_x="0" electrostatic_force_y="0" electrostatic_force_z="0" cust_attr_HMTorque_0="0" cust_attr_HMTorque_1="0" cust_attr_HMTorque_2="0" vel_x="0" vel_y="0" vel_z="0" orient_xx="-0.2638371745169712" orient_xy="-0.01401379799313232" orient_xz="0.9644654264455047" pos_x="0" cust_attr_BondForce_0="0" pos_y="0" cust_attr_BondForce_1="0" pos_z="0.16" angvel_x="0" cust_attr_BondForce_2="0" angvel_y="0" id="1" angvel_z="0" charge="0" scaling_factor="1" cust_attr_BondTorque_0="0" cust_attr_BondTorque_1="0" cust_attr_BondTorque_2="0" cust_attr_Damage_0="0" orient_yx="0.4249823952954215" cust_attr_HMForce_0="0" cust_attr_Damage_1="0" orient_yy="-0.8993006799250595" cust_attr_HMForce_1="0" orient_yz="0.1031903618333235" cust_attr_HMForce_2="0" />
I'm only interested in the values within the " " so I'm trying to read this with textscan. To do this I take the first line and do regex find/replace to swap all number for %f and strings for %s, like this:
expression = '"[-+]?\d*\.?\d*"';
expression2 = '"\w*?"';
newStr = regexprep(firstline,expression,'"%f"');
FormatString = sprintf('%s',regexprep(newStr,expression2,'"%s"'));
The I re-open the file to read the files with string with the following call:
while ~feof(InputFile) % Read all lines in file
data = textscan(InputFile,FormatString,'delimiter','\n');
end
But all i get is an array of empty cells. I can't see what my mistake is - can someone point me in the right direction?
Clarification:
Mathworks provides this following example for textscan to remove literal text, which is what I'm trying to do.
"Remove the literal text 'Level' from each field in the second column of the data from the previous example."
filename = fullfile(matlabroot,'examples','matlab','scan1.dat');
fileID = fopen(filename);
C = textscan(fileID,'%s Level%d %f32 %d8 %u %f %f %s %f');
fclose(fileID);
C{2}
Ok, after looking at this with some fresh eyes today I spotted my problem.
newStr = regexprep(firstline,expression,'"%f"');
FormatString = sprintf('%s',regexprep(newStr,expression2,'%q'));
data = textscan(InputFile,FormatString,'delimiter',' ');
The replacement of the string need to be switched to the %q option which allows a string within double quotes to be read and the delimiter for textscan needed to be reverted to a single space. Code working fine now.

Printing a warning message over multiple lines

I am trying to print a warning message that is a little long and includes 2 variable calls. Here's my code:
warning( 'MATLAB:questionable_argument', ...
'the arguments dt (%d) and h (%d) are sub-optimal. Consider increasing nt or decreasing nx.', ...
dt, h )
Obviously, the line of text extends to the right when viewing the MATLAB code. How can I break it so it wraps nicely? I've tried multiple things but keep getting syntax errors.
As suggested in comments, just insert a \n where you want to break the line. You can also use a variable for the text, to make it easy to read also within the code:
txt = sprintf(['the arguments dt (%d) and h (%d) are sub-optimal.\n'...
'Consider increasing nt or decreasing nx.'],dt,h);
warning( 'MATLAB:questionable_argument',txt)
If you just embed escape characters such as \n in a warning string, it will not work:
warning('Hi there.\nPlease do not do that.')
will just print out:
Warning: hi there.\nPlease do not do that
However, if you pre-format the text using sprintf , then all the escape characters will work. For instance:
warnText = sprintf('Hi there.\nPlease do not do that.');
warning(warnText)
Produces what you want:
Warning: Hi there.
Please do not do that.
A more simple version than EBH had provided is as shown:
str1 = 'text 1';
str2 = 'text 2';
str3 = 'etc.';
str = sprintf('\n%s \n%s \n%s \n',str1,str2,str3);
warning(str)

How to recognize ID, Literals and Comments in Lex file

I have to write a lex program that has these rules:
Identifiers: String of alphanumeric (and _), starting with an alphabetic character
Literals: Integers and strings
Comments: Start with ! character, go to until the end of the line
Here is what I came up with
[a-zA-Z][a-zA-Z0-9]+ return(ID);
[+-]?[0-9]+ return(INTEGER);
[a-zA-Z]+ return ( STRING);
!.*\n return ( COMMENT );
However, I still get a lot of errors when I compile this lex file.
What do you think the error is?
It would have helped if you'd shown more clearly what the problem was with your code. For example, did you get an error message or did it not function as desired?
There are a couple of problems with your code, but it is mainly correct. The first issue I see is that you have not divided your lex program into the necessary parts with the %% divider. The first part of a lex program is the declarations section, where regular expression patterns are specified. The second part is where the action that match patterns are specified. The (optional) third section is where any code (for the compiler) is placed. Code for the compiler can also be placed in the declaration section when delineated by %{ and %} at the start of a line.
If we put your code through lex we would get this error:
"SoNov16.l", line 1: bad character: [
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: bad character: ]
"SoNov16.l", line 1: bad character: +
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: bad character: (
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: bad character: )
"SoNov16.l", line 1: bad character: ;
Did you get something like that? In your example code you are specifying actions (the return(ID); is an example of an action) and thus your code is for the second section. You therefore need to put a %% line ahead of it. It will then be a valid lex program.
You code is dependant on (probably) a parser, which consumes (and declares) the tokens. For testing purposes it is often easier to just print the tokens first. I solved this problem by making a C macro which will do the print and can be redefined to do the return at a later stage. Something like this:
%{
#define TOKEN(t) printf("String: %s Matched: " #t "\n",yytext)
%}
%%
[a-zA-Z][a-zA-Z0-9]+ TOKEN(ID);
[+-]?[0-9]+ TOKEN(INTEGER);
[a-zA-Z]+ TOKEN (STRING);
!.*\n TOKEN (COMMENT);
If we build and test this, we get the following:
abc
String: abc Matched: ID
abc123
String: abc123 Matched: ID
! comment text
String: ! comment text
Matched: COMMENT
Not quite correct. We can see that the ID rule is matching what should be a string. This is due to the ordering of the rules. We have to put the String rule first to ensure it matches first - unless of course you were supposed to match strings inside some quotes? You also missed the underline from the ID pattern. Its also a good idea to match and discard any whitespace characters:
%{
#define TOKEN(t) printf("String: %s Matched: " #t "\n",yytext)
%}
%%
[a-zA-Z]+ TOKEN (STRING);
[a-zA-Z][a-zA-Z0-9_]+ TOKEN(ID);
[+-]?[0-9]+ TOKEN(INTEGER);
!.*\n TOKEN (COMMENT);
[ \t\r\n]+ ;
Which when tested shows:
abc
String: abc Matched: STRING
abc123_
String: abc123_ Matched: ID
-1234
String: -1234 Matched: INTEGER
abc abc123 ! comment text
String: abc Matched: STRING
String: abc123 Matched: ID
String: ! comment text
Matched: COMMENT
Just in case you wanted strings in quotes, that is easy too:
%{
#define TOKEN(t) printf("String: %s Matched: " #t "\n",yytext)
%}
%%
\"[^"]+\" TOKEN (STRING);
[a-zA-Z][a-zA-Z0-9_]+ TOKEN(ID);
[+-]?[0-9]+ TOKEN(INTEGER);
!.*\n TOKEN (COMMENT );
[ \t\r\n] ;
"abc"
String: "abc" Matched: STRING