I'm getting an error saying, NEGATIVE RANGE IN CHARACTER CLASS in lines 15,17 and 19 - lex

Please find the error in lines 15,17 and 19:
%{
#include<stdio.h>
int c=0;
FILE *fp;
%}
operator [+-*/]
identifier [a-zA-Z][a-zA-Z0-9]*
number [0-9]+
expression ({identifier}|{number}){operator}({identifier}|{number})
%%
\n { c++; }
^"#".+ ;
^("int "|"float "|"char ").+ ;
"void main()" ;
{identifier}"="({expression}+";") {printf("Valid arithmetic expression in line %d",c+1);ECHO;printf("\n");}
{identifier}"="({number}|{identifier}";") {printf("Valid assignment statement in line %d",c+1);ECHO;printf("\n");}
({number}|([0-9]+[a-zA-Z0-9]*))"="{expression}+ {printf("Invalid: rules for naming identifier are violated in line %d",c+1);ECHO;printf("\n");}
{identifier}"=;" {printf("Invalid right side of expression missing in line %d",c+1);ECHO;printf("\n");}
{operator}{operator}+ {printf("Invalid multiple operators cannot occur consecutively in line %d",c+1);ECHO;printf("\n");}
. ;
%%
main()
{
yyin=fopen("3b.txt","r");
yylex();
fclose(yyin);
}

I don't think that your error "Negative Range in Character Class" is actually on lines 15, 17, or 19. I believe that it is on line 6. Your code says operator [+-*/], by which you appear to mean "the symbols +, -, *, and /".
However, the - is actually being interpreted as a "range" from + to *. Since + is character 43 and * is character 42, that range is backwards.
If you escape the - with \ before it, you should not have that error anymore.

Related

Lex Parsing for exponent

I am trying to parse a file the data looks like
size = [5e+09, 5e+09, 5e+09]
I have 'size OSQUARE NUMBER COMMA NUMBER COMMA NUMBER ESQUARE'
And NUMBER is defined in tokrules as
t_NUMBER = r'[-]?[0-9]*[\.]*[0-9]+([eE]-?[0-9]+)*'
But I get
Syntax error in input!
LexToken(ID,'e',6,113)
Illegal character '+'
Illegal character '+'
Illegal character '+'
What is wrong with my NUMBER definition?
I am using https://www.dabeaz.com/ply/
The part of your rule which matches exponents is
([eE]-?[0-9]+)*
Clearly, that won't match a +. It should be:
([eE][-+]?[0-9]+)*
Also, it will match 0 or more exponents, which is not correct. It should match 0 or 1:
([eE][-+]?[0-9]+)?

How to recognize ID, Literals and Comments in Lex file

I have to write a lex program that has these rules:
Identifiers: String of alphanumeric (and _), starting with an alphabetic character
Literals: Integers and strings
Comments: Start with ! character, go to until the end of the line
Here is what I came up with
[a-zA-Z][a-zA-Z0-9]+ return(ID);
[+-]?[0-9]+ return(INTEGER);
[a-zA-Z]+ return ( STRING);
!.*\n return ( COMMENT );
However, I still get a lot of errors when I compile this lex file.
What do you think the error is?
It would have helped if you'd shown more clearly what the problem was with your code. For example, did you get an error message or did it not function as desired?
There are a couple of problems with your code, but it is mainly correct. The first issue I see is that you have not divided your lex program into the necessary parts with the %% divider. The first part of a lex program is the declarations section, where regular expression patterns are specified. The second part is where the action that match patterns are specified. The (optional) third section is where any code (for the compiler) is placed. Code for the compiler can also be placed in the declaration section when delineated by %{ and %} at the start of a line.
If we put your code through lex we would get this error:
"SoNov16.l", line 1: bad character: [
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: bad character: ]
"SoNov16.l", line 1: bad character: +
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: bad character: (
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: bad character: )
"SoNov16.l", line 1: bad character: ;
Did you get something like that? In your example code you are specifying actions (the return(ID); is an example of an action) and thus your code is for the second section. You therefore need to put a %% line ahead of it. It will then be a valid lex program.
You code is dependant on (probably) a parser, which consumes (and declares) the tokens. For testing purposes it is often easier to just print the tokens first. I solved this problem by making a C macro which will do the print and can be redefined to do the return at a later stage. Something like this:
%{
#define TOKEN(t) printf("String: %s Matched: " #t "\n",yytext)
%}
%%
[a-zA-Z][a-zA-Z0-9]+ TOKEN(ID);
[+-]?[0-9]+ TOKEN(INTEGER);
[a-zA-Z]+ TOKEN (STRING);
!.*\n TOKEN (COMMENT);
If we build and test this, we get the following:
abc
String: abc Matched: ID
abc123
String: abc123 Matched: ID
! comment text
String: ! comment text
Matched: COMMENT
Not quite correct. We can see that the ID rule is matching what should be a string. This is due to the ordering of the rules. We have to put the String rule first to ensure it matches first - unless of course you were supposed to match strings inside some quotes? You also missed the underline from the ID pattern. Its also a good idea to match and discard any whitespace characters:
%{
#define TOKEN(t) printf("String: %s Matched: " #t "\n",yytext)
%}
%%
[a-zA-Z]+ TOKEN (STRING);
[a-zA-Z][a-zA-Z0-9_]+ TOKEN(ID);
[+-]?[0-9]+ TOKEN(INTEGER);
!.*\n TOKEN (COMMENT);
[ \t\r\n]+ ;
Which when tested shows:
abc
String: abc Matched: STRING
abc123_
String: abc123_ Matched: ID
-1234
String: -1234 Matched: INTEGER
abc abc123 ! comment text
String: abc Matched: STRING
String: abc123 Matched: ID
String: ! comment text
Matched: COMMENT
Just in case you wanted strings in quotes, that is easy too:
%{
#define TOKEN(t) printf("String: %s Matched: " #t "\n",yytext)
%}
%%
\"[^"]+\" TOKEN (STRING);
[a-zA-Z][a-zA-Z0-9_]+ TOKEN(ID);
[+-]?[0-9]+ TOKEN(INTEGER);
!.*\n TOKEN (COMMENT );
[ \t\r\n] ;
"abc"
String: "abc" Matched: STRING

Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE error perl

I can't fix this error...
#temp=split(/(/)/,$headerLine);
this error appears
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE
Use
#temp=split(/(\/)/,$headerLine);
or
#temp=split(m&(/)&,$headerLine);
The slash in the parentheses prematurely terminates your regular expression.
Your second / character is terminating the regular expression, so Perl is interpreting your code as:
#temp=split /(/
followed by garbage.
Simply escape the literal /:
#temp=split(/(\/)/, $headerLine)

Matlab: how to convert character array or string into a formatted output OR parse a string

Could someone please tell me how to convert character array into a formatted output using Matlab?
I am expecting data like this:
CHAR (1 x 29) : 0.050822999 3.141592979 ; (1)
OR
CELL (1 x 1) or string: '0.050822999 3.141592979 ; (1)'
I am looking for output like this:
d1 = 0.050822999; %double
d2 = 3.141592979; %double
index = 1; % integer
I tried transposing and then using str2num(Str'); but, it's returning me 0x 0 double.
Any help would be appreciated.
Regards,
DK
you can use regexp to parse the string
c = { '0.050822999 3.141592979 ; (1)' };
p = regexp( c{1}, '^(\d+\.\d+)\s(\d+\.\d+)\s*;\s*\((\d+)\)$', 'tokens', 'once' ); %//parse the input string
numbers = str2mat(p); %// convert extracted strings to numerical values
Example result
ans =
0.050822999
3.141592979
1
Explaining the regexp pattern:
^ - pattern starts at the beginning of the input string
(\d+\.\d+) - parentheses ('()') enclosing this sub-pattern indicates it as a single token
\d+ matches one or more digits, then expecting \. a dot (notice the \, since . alone in regexp acts as a wildcard) and after the dot \d+ one or more digits are expected.
This token should correspond to the first number, e.g., 0.050822999
\s expecting a single space
(\d+\.\d+) - again, expecting another decimal fraction as the second token.
\s* - expecting white space (zero or more).
; - capture the ; in the expression, but not as a token.
\s+ - expecting white space (zero or more).
\( - expecting an open parenthesis, note the \ since parentheses in regexp are used to denote tokens.
(\d+) - expecting one or more digits as the third token, only integer numbers are expected here. no decimal point.
\) - expecting a closing parenthesis.
$ - pattern should reach the end of the input string.
You can use something like this (if I understood you correctly)
function str_dump(var)
info = whos;
disp([info.class ' ' mat2str(info.size) ' : ' var]);
end
This just shows information about the string. If you want to parse it and convert to another Matlab's structure, you have to explain it more carefully.
%// Input
a = [0.050822999 3.141592979];
n = 1;
%// Output
str = [num2str(a,'%0.9f ') ' ; (' num2str(n) ')']
Result:
str =
0.050822999 3.141592979 ; (1)

Line continuation character in Scala

I want to split the following Scala code line like this:
ConditionParser.parseSingleCondition("field=*value1*").description
must equalTo("field should contain value1")
But which is the line continuation character?
Wrap it in parentheses:
(ConditionParser.parseSingleCondition("field=*value1*").description
must equalTo("field should contain value1"))
Scala does not have a "line continuation character" - it infers a semicolon always when:
An expression can end
The following (not whitespace) line begins not with a token that can start a statement
There are no unclosed ( or [ found before
Thus, to "delay" semicolon inference one can place a method call or the dot at the end of the line or place the dot at the beginning of the following line:
ConditionParser.
parseSingleCondition("field=*value1*").
description must equalTo("field should contain value1")
a +
b +
c
List(1,2,3)
.map(_+1)