Error in the semantic values returned by bison

A part of my bison grammar is as shown
head: OPEN statement CLOSE
    {
        $$ = $2;
    }
    ;
statement: word
    {
        $$ = $1;
    }
    | statement word
    {
        $$ = $1;
        printf("%s", $$);
    }
    ;
Now if my input is [hai hello], where [ is OPEN and ] is CLOSE, the printf statement prints "hai hello" as expected, but the $$ of head contains "hai hello]". The same happens with other grammars too: if I try to print the value of $1, the values of $2, $3, ... are printed as well. Why is that?

The problem is probably in your lexer -- you probably have lexer actions that do something like yylval.str = yytext; to return a semantic value. The problem is that yytext is a pointer into the scanner's read buffer and is only valid until the next call to yylex. So all your semantic values in the parser quickly become dangling pointers and what they point at is no longer valid.
You need to make a copy of the token string in the lexer. Use an action something like yylval.str = strdup(yytext);. Of course, then you have potential memory leak issues in your parser -- you need to free the $n values you don't need anymore.
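To see why copying fixes it, here is a rough analogy in Ruby rather than flex/bison C. Ruby has no dangling pointers, so this only mimics the aliasing: one reused string object stands in for the scanner's buffer, and collect_tokens and its copy: flag are names made up for this sketch.

```ruby
# Hypothetical stand-in for a scanner that reuses one buffer for every token.
def collect_tokens(words, copy:)
  buffer = +""                      # unfrozen, reusable buffer (plays the role of yytext)
  words.map do |word|
    buffer.replace(word)            # the "scanner" overwrites its buffer in place
    copy ? buffer.dup : buffer      # dup is the analogue of strdup(yytext)
  end
end

aliased = collect_tokens(%w[hai hello], copy: false)
copied  = collect_tokens(%w[hai hello], copy: true)

puts aliased.inspect  # every element is the same object as the buffer
puts copied.inspect   # each element is an independent copy
```

With copy: false every saved value is the very same object as the buffer, so all of them show whatever was scanned last. With copy: true each value keeps what was in the buffer at the time it was saved.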

ANTLR4 lexer rule creates errors or conflicts on perl grammar

I am having an issue with my Perl grammar. Here are the relevant parts of the grammar:
element
: element (ASTERISK_CHAR | SLASH_CHAR | PERCENT_CHAR) element
| word
;
SLASH_CHAR: '/';
REGEX_STRING
: '/' (~('/' | '\r' | '\n') | NEW_LINE)* '/'
;
fragment NEW_LINE
: '\r'? '\n'
;
If the rule REGEX_STRING is not commented out, then the following Perl doesn't parse:
$b = 1/2;
$c = 1/2;
<2021/08/20-19:24:37> <ERROR> [parsing.AntlrErrorLogger] - Unit 1: <unknown>:2:6: extraneous input '/2;\r\n$c = 1/' expecting {<EOF>, '=', '**=', '+=', '-=', '.=', '*=', '/=', '%=', CROSS_EQUAL, '&=', '|=', '^=', '&.=', '|.=', '^.=', '<<=', '>>=', '&&=', '||=', '//=', '==', '>=', '<=', '<=>', '<>', '!=', '>', '<', '~~', '++', '--', '**', '.', '+', '-', '*', '/', '%', '=~', '!~', '&&', '||', '//', '&', '&.', '|', '|.', '^', '^.', '<<', '>>', '..', '...', '?', ';', X_KEYWORD, AND, CMP, EQ, FOR, FOREACH, GE, GT, IF, ISA, LE, LT, OR, NE, UNLESS, UNTIL, WHEN, WHILE, XOR, UNSIGNED_INTEGER}
Note that it doesn't matter where the lexer rule REGEX_STRING is used; even if it is not referenced anywhere in the parser rules, its mere presence makes the parsing fail (so the issue is on the lexer side).
If I remove the lexer rule REGEX_STRING, then it gets parsed just fine, but then I can't parse :
$dateCalc =~ /^([0-9]{4})([0-9]{2})([0-9]{2})/
Also, I noticed that this Perl parses, so there seems to be some kind of interaction between the first and the second '/'.
$b = 12; # Removed the / between 1 and 2
$c = 1/2; # Removing the / here would work as well.
I can't work out how to write my regex lexer rule without making something else fail.
What am I missing? How can I parse both expressions?
The basic issue here is that ANTLR4, like many other parsing frameworks, performs lexical analysis independently of the syntax; the same tokens are produced regardless of which tokens might be acceptable to the parser. So it is the lexical analyser which must decide whether a given / is a division operator or the start of a regex, a decision which can really only be made using syntactic information. (There are parsing frameworks which do not have this limitation, and thus can be used to implement scannerless parsers. These include PEG-based parsers and GLR/GLL parsers.)
There's an example of solving this lexical ambiguity, which also shows up in parsing ECMAScript, in the ANTLR4 example directory. (That's a GitHub permalink so that the line numbers cited below continue to work.)
The basic strategy is to decide whether a / can start a regular expression based on the immediately previous token. This works in ECMAScript because the syntactic contexts in which an operator (such as / or /=) can appear are disjoint from the contexts in which an operand can appear. This will probably not translate directly into a Perl parser, but it might help show the possibilities.
Lines 780-782: The regex token itself is protected by a semantic guard:
RegularExpressionLiteral
: {isRegexPossible()}? '/' RegularExpressionBody '/' RegularExpressionFlags
;
Lines 154-182: The guard function itself is simple, but obviously required a certain amount of grammatical analysis to generate the correct test. (Note: The list of tokens has been abbreviated; see the original file for the complete list):
private boolean isRegexPossible() {
    if (this.lastToken == null) {
        return true;
    }
    switch (this.lastToken.getType()) {
        case Identifier:
        case NullLiteral:
        ...
            // After any of the tokens above, no regex literal can follow.
            return false;
        default:
            // In all other cases, a regex literal _is_ possible.
            return true;
    }
}
Lines 127-147: In order for that to work, the scanner must retain the previous token in the member variable lastToken. (Comments removed for space):
@Override
public Token nextToken() {
    Token next = super.nextToken();
    if (next.getChannel() == Token.DEFAULT_CHANNEL) {
        this.lastToken = next;
    }
    return next;
}
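Here is the same previous-token idea boiled down to a toy tokenizer, written in Ruby rather than ANTLR/Java. The token categories, the OPERAND_TYPES list and the tokenize helper are all made up for this sketch and are nowhere near a real Perl lexer; they only show how remembering the last token lets the scanner choose between division and a regex.

```ruby
require "strscan"

# Assumed, abbreviated list: token types that end an operand.
# A '/' right after one of these is treated as division.
OPERAND_TYPES = %i[number variable rparen].freeze

def regex_possible?(last_type)
  !OPERAND_TYPES.include?(last_type)
end

# Returns an array of [type, text] pairs.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  last_type = nil
  until scanner.eos?
    next if scanner.skip(/\s+/)     # whitespace does not update last_type

    token =
      if (text = scanner.scan(/\d+/))
        [:number, text]
      elsif (text = scanner.scan(/\$\w+/))
        [:variable, text]
      elsif (text = scanner.scan(/\)/))
        [:rparen, text]
      elsif regex_possible?(last_type) && (text = scanner.scan(%r{/[^/\r\n]*/}))
        [:regex, text]
      elsif (text = scanner.scan(%r{=~|[=/;(]}))
        [:operator, text]
      else
        raise ArgumentError, "unexpected input at offset #{scanner.pos}"
      end

    tokens << token
    last_type = token.first
  end
  tokens
end
```

With this guard, the / in 1/2 comes right after a number and is scanned as an operator, while the / after =~ comes right after an operator and is allowed to start a regex token.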

Carriage return character not being matched in Swift

I'm trying to parse a file that (apparently) ends its lines with carriage returns, but they aren't being matched as such in Swift, despite having the same UTF8 value. I can see possible fixes for the problem, but I'm curious as to what these characters actually are.
Here's some sample code, with the output below. (CR is set using Character("\r"), although I've tried it using "\r" as well.)
try f.forEach() { c in
print(c, terminator:" ") // DBG
if (c == "\r") {
print("Carriage return found!")
}
print(String(c).utf8.first!, terminator:" ")//DBG
print(String(describing:pstate)) // DBG
...
case .field:
switch c {
case CR,LF :
self.endline()
pstate = .eol
When it reaches the end of line (which shows up as such in my text editors), I get this:
. 46 field
0 48 field
13 field
I 73 field
It doesn't seem to be matching using == or in the switch statement. Is there another approach I should be using for this character?
(I'll note that the parsing works fine with files that terminate in newlines.)
I determined what the problem was. By looking at c.unicodeScalars I discovered that the end of line character was in fact "\r\n", not just "\r". As seen in my code I was only taking the first when printing it out as UTF-8. I don't know if that's something from String.forEach or in the file itself.
I know that there are tests to determine if something is a newline. Swift 5 has them directly (c.isNewline), and there is also the CharacterSet approach as noted by Bill Nattaner.
I'm happier with something that will work in my switch statement (and thus I'll define each one explicitly), but that might change if I expect to deal with a wider variety of files.
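For what it's worth, a Swift Character is an extended grapheme cluster, and CR followed by LF counts as a single cluster, which is why it compares unequal to "\r" on its own. The sketch below shows the same grouping in Ruby (not Swift), whose grapheme_clusters method follows the same Unicode rule; describe and line_break? are names made up for this illustration.

```ruby
# Break a string into grapheme clusters and into code points.
def describe(text)
  {
    graphemes: text.grapheme_clusters,
    codepoints: text.codepoints
  }
end

# Hypothetical helper: list each accepted line ending explicitly,
# including the two-code-point CRLF cluster.
def line_break?(grapheme)
  ["\r", "\n", "\r\n"].include?(grapheme)
end

info = describe("0\r\nI")
puts info[:graphemes].inspect
puts info[:codepoints].inspect
puts "\r\n".bytes.first   # the first UTF-8 byte of the CRLF cluster
```

The first byte of the CRLF cluster is 13, which matches the debug output in the question even though the character itself is not a bare carriage return.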
I'm a little hazy as to what the f.forEach represents, but if your variable c is of type Character then you could replace your if statement with:
if "\(c)".rangeOfCharacter( from: CharacterSet.newlines ) != nil
{
print("Carriage return found!")
}
That way you won't have to invent a list of all-possible new line characters.

How to Determine if a given User input is a Float, String, or a Integer in Ruby

The is_a? method doesn't work; I have tried it, and apparently it checks if the value is derived from an object or something.
I tried something like this:
printf "What is the Regular Price of the book that you are purchasing?"
regular_price=gets.chomp
if regular_price.to_i.to_s == regular_price
print "Thank You #{regular_price}"
break
else
print "Please enter your price as a number"
end
Can someone explain to me more what .to_i and .to_s do? I just thought they convert the user input to a string, or a Numerical Value. I actually don't know how to check input to see if what he put in was a float, a String, or a decimal.
I just keep getting Syntax errors. I just want to know how to check for any of the 3 values and handle them accordingly.
There's a lot to your question so I recommend that you read How do I ask a good question? to help you get answers in the future. I'll go through each of your questions and try to provide answers to point you in the right direction.
The is_a? method works by accepting a class as a parameter and returning a boolean. For example:
'foo'.is_a?(String)
=> true
1234.is_a?(Integer)
=> true
'foo'.is_a?(Integer)
=> false
1234.is_a?(String)
=> false
1.234.is_a?(Float)
=> true
The .to_i method is defined on the String class and will convert a string to an Integer. If there is no valid integer at the start of the string then it will return 0. For example:
"12345".to_i #=> 12345
"99 red balloons".to_i #=> 99
"0a".to_i #=> 0
"hello".to_i #=> 0
The .to_s method on the Integer class will return the string representation of the Integer. For example:
1234.to_s
=> '1234'
The same is true of Float:
1.234.to_s
=> '1.234'
Now let's take a look at your code. When I run it I get SyntaxError: (eval):4: Can't escape from eval with break which is happening because break has nothing to break out of; it isn't used to break out of an if statement but is instead used to break out of a block. For example:
if true
break
end
raises an error. But this does not:
loop do
if true
break
end
end
The reason is that calling break says "break out of the enclosing block," which in this case is the loop do ... end block. In the previous example there was no block enclosing the if statement. You can find more detailed explanations of the behavior of break elsewhere on Stack Overflow.
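Here is a small sketch of how the check from the question could sit inside a loop so that break has something to break out of. To keep it runnable without a keyboard, an array of strings stands in for repeated calls to gets.chomp; first_whole_number is a made-up name.

```ruby
# Returns the first input that round-trips through to_i.to_s, or nil.
def first_whole_number(inputs)
  queue = inputs.dup
  accepted = nil
  loop do
    candidate = queue.shift
    break if candidate.nil?                 # ran out of simulated input
    if candidate.to_i.to_s == candidate
      accepted = candidate
      break                                 # legal here: exits the enclosing loop block
    end
  end
  accepted
end

puts first_whole_number(["abc", "12.5", "42", "7"]).inspect
```

Note that "12.5" is rejected by this test because "12.5".to_i.to_s is "12", so the round trip only accepts plain integers.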
Your final question was "I just want to know how to check for any of the 3 values and handle them accordingly." This answer explains how to do that but the code example is written in a way that's hard to decipher, so I've rewritten it below in an expanded form to make it clear what's happening:
regular_price = gets.chomp
begin
  puts Integer(regular_price)
rescue
  begin
    puts Float(regular_price)
  rescue
    puts 'please enter your price as an Integer or Float'
  end
end
What this code does is first it attempts to convert the string regular_price to an Integer. This raises an exception if it can't be converted. For example:
Integer('1234')
=> 1234
Integer('1.234')
ArgumentError: invalid value for Integer(): "1.234"
Integer('foo')
ArgumentError: invalid value for Integer(): "foo"
If an exception is raised, the rescue clause catches it so that it doesn't propagate any further, and execution continues with the code inside the rescue. In this case, we're saying "if you can't convert to Integer then rescue and try to convert to Float." This works the same way as converting to Integer:
Float('1.234')
=> 1.234
Float('foo')
ArgumentError: invalid value for Float(): "foo"
Finally, we say "if you can't convert to Float then rescue and show an error message."
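If you want to branch on the result rather than just print it, the same two conversions can be wrapped in a method that reports which kind of value it found. This is only a sketch; classify_input and the symbols it returns are names I made up.

```ruby
# Classify a raw input string as :integer, :float, or :string.
# classify_input is a hypothetical helper, not part of Ruby.
def classify_input(text)
  Integer(text)            # raises ArgumentError if text is not an integer
  :integer
rescue ArgumentError
  begin
    Float(text)            # raises ArgumentError if text is not a float
    :float
  rescue ArgumentError
    :string
  end
end

%w[1234 1.234 foo].each do |sample|
  puts "#{sample} is treated as #{classify_input(sample)}"
end
```

You could then use the returned symbol in a case statement to handle each kind of input accordingly.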
I hope this helps and answers your questions.

Substring is getting too little data

I want to grab lots of text content from a .sql file between a --Start and --End comment.
Whatever I do, I can't get the Substring method to grab only the text between the --Start and --End comments:
text.sql
This text I want not
--Start
this text I want here
--End
This text I want not
This is what I tried:
$insertStartComment = "--Start"
$insertEndComment = "--End"
$content = [IO.File]::ReadAllText("C:\temp\test.sql")
$insertStartPosition = $content.IndexOf($insertStartComment) + $insertStartComment.Length
$insertEndPosition = $content.IndexOf($insertEndComment)
$content1 = $content.Substring($insertStartPosition, $content1.Length - $insertEndPosition)
$content = $content1.Substring(0,$content1.Length - $insertEndPosition)
It would be nice if someone could help me find my error :-)
There's an attempt to use an uninitialized variable in the code:
$content1 = $content.Substring($insertStartPosition, $content1.Length - $insertEndPosition)
The variable $content1 isn't initialized yet, so the Substring call goes haywire. When you run the code again, the variable is already set, and the results are even weirder.
Use PowerShell's Set-StrictMode to enable warnings about uninitialized variables.
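Besides the uninitialized variable, keep in mind that the second argument of Substring is a length, not an end position, so the value you want there is the end position minus the start position. Here is that arithmetic as a sketch in Ruby rather than PowerShell; text_between is a made-up helper name.

```ruby
# Return the text between two markers, or nil if either marker is missing.
def text_between(content, start_marker, end_marker)
  start_pos = content.index(start_marker)
  return nil if start_pos.nil?

  start_pos += start_marker.length               # skip past the start marker itself
  end_pos = content.index(end_marker, start_pos) # search only after the start marker
  return nil if end_pos.nil?

  content[start_pos, end_pos - start_pos]        # length = end - start
end

sample = "This text I want not\n--Start\nthis text I want here\n--End\nThis text I want not\n"
puts text_between(sample, "--Start", "--End").strip
```

The result still carries the line breaks that follow --Start and precede --End, hence the strip at the end.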
It's not the substring approach you are looking for, but I figured that I would toss out a RegEx solution. This will find the text between the --Start and --End on a text file. In this case, I am grouping the matched text with a named capture called LineYouWant and display the matches that it finds. This also works if you have multiple instances of --Start--End blocks in a single file.
$Text = [IO.File]::ReadAllText("C:\users\proxb\desktop\SQL.txt")
[regex]::Matches($Text,'.*--Start\s+(?<LineYouWant>.*)\s+--End.*') | ForEach {
$_.Groups['LineYouWant'].Value
}

Why is $ split valid syntax?

I just discovered that Perl ignores whitespace between the sigil and its variable name, and was wondering if someone could tell me whether this is the expected behaviour. I've never run into this before and it can result in strange behaviour inside of strings.
For example, in the following code, $bar will end up with the value 'foo':
my $foo = 'foo';
my $bar = "$ foo";
This also works with variable declarations:
my $
bar = "foo\n";
print $bar;
The second case doesn't really matter much to me but in the case of string interpolation this can lead to very confusing behaviour. Anyone know anything about this?
Yes, it is part of the language. No, you should not use it for serious code. As for being confusing in interpolation, all dollar signs (that are not part of a variable) should be escaped, not just the ones next to letters, so it shouldn't be a problem.
I do not know if this is the real reason behind allowing whitespace in between the sigil and the variable name, but it allows you to do things like
my $          count = 0;
my $file_handle_foo = IO::File->new;
which might be seen by some people as handy (since it puts the sigils and the unique parts of the variable names next to each other). It is also useful for Obfu (see the end of line 9 and beginning of line 10):
#!/usr/bin/perl -w # camel code
use strict;
$_='ev
al("seek\040D
ATA,0, 0;");foreach(1..3)
{<DATA>;}my @camel1hump;my$camel;
my$Camel ;while( <DATA>){$_=sprintf("%-6
9s",$_);my@dromedary 1=split(//);if(defined($
_=<DATA>)){@camel1hum p=split(//);}while(@dromeda
ry1){my$camel1hump=0 ;my$CAMEL=3;if(defined($_=shif
t(@dromedary1 ))&&/\S/){$camel1hump+=1<<$CAMEL;}
$CAMEL--;if(d efined($_=shift(@dromedary1))&&/\S/){
$camel1hump+=1 <<$CAMEL;}$CAMEL--;if(defined($_=shift(
@camel1hump))&&/\S/){$camel1hump+=1<<$CAMEL;}$CAMEL--;if(
defined($_=shift(@camel1hump))&&/\S/){$camel1hump+=1<<$CAME
L;;}$camel.=(split(//,"\040..m`{/J\047\134}L^7FX"))[$camel1h
ump];}$camel.="\n";}@camel1hump=split(/\n/,$camel);foreach(@
camel1hump){chomp;$Camel=$_;y/LJF7\173\175`\047/\061\062\063\
064\065\066\067\070/;y/12345678/JL7F\175\173\047`/;$_=reverse;
print"$_\040$Camel\n";}foreach(@camel1hump){chomp;$Camel=$_;y
/LJF7\173\175`\047/12345678/;y/12345678/JL7F\175\173\0 47`/;
$_=reverse;print"\040$_$Camel\n";}';;s/\s*//g;;eval; eval
("seek\040DATA,0,0;");undef$/;$_=<DATA>;s/\s*//g;( );;s
;^.*_;;;map{eval"print\"$_\"";}/.{4}/g; __DATA__ \124
\1 50\145\040\165\163\145\040\157\1 46\040\1 41\0
40\143\141 \155\145\1 54\040\1 51\155\ 141
\147\145\0 40\151\156 \040\141 \163\16 3\
157\143\ 151\141\16 4\151\1 57\156
\040\167 \151\164\1 50\040\ 120\1
45\162\ 154\040\15 1\163\ 040\14
1\040\1 64\162\1 41\144 \145\
155\14 1\162\ 153\04 0\157
\146\ 040\11 7\047\ 122\1
45\15 1\154\1 54\171 \040
\046\ 012\101\16 3\16
3\15 7\143\15 1\14
1\16 4\145\163 \054
\040 \111\156\14 3\056
\040\ 125\163\145\14 4\040\
167\1 51\164\1 50\0 40\160\
145\162 \155\151
\163\163 \151\1
57\156\056