Composite grammars with ANTLR - import

(Environment: ANTLR 4 with the JavaScript target)
I have a Test.g4 grammar, importing two other grammars (details omitted).
grammar Test;
import Time, Basic;
request: PERIOD '=' exp=interval;
PERIOD: [Pp][Ee][Rr][Ii][Oo][Dd];
Time.g4 grammar is:
grammar Time;
epoch: Digit Digit Digit Digit Digit Digit Digit Digit Digit Digit Digit Digit Digit;
interval: '[' from=epoch ',' to=epoch ']';
Digit: DIGIT;
fragment DIGIT: [0-9];
Basic.g4 grammar is:
grammar Basic;
any: DECIMAL_LITERAL;
DECIMAL_LITERAL: DIGIT+;
fragment DIGIT: [0-9];
There is a custom visitor for Time grammar:
const Time = superclass => class extends superclass {
constructor() {
super();
superclass.call(this);
return this;
}
visitEpoch(ctx) {
return parseInt(ctx.getText());
}
visitInterval(ctx) {
var from = parseInt(this.visit(ctx.from));
var to = parseInt(this.visit(ctx.to));
return { from: from, to: to }
}
}
exports.Time = Time;
The test code is:
#!/usr/bin/env node
const antlr4 = require('antlr4');
const TestLexer = require('./TestLexer').TestLexer;
const TestParser = require('./TestParser').TestParser;
const TestVisitor = require('./TestVisitor').TestVisitor;
const Time = require('./time').Time;
class Visitor extends Time(TestVisitor) {
run(chars) {
var stream = new antlr4.InputStream(chars);
var lexer = new TestLexer(stream);
var tokens = new antlr4.CommonTokenStream(lexer);
var parser = new TestParser(tokens);
parser.buildParseTrees = true;
var tree = parser.request();
var result = this.visitRequest(tree);
console.log(result[2]);
}
}
var time = new Visitor();
time.run('period=[1234567890123,1234567890888]');
If I only import the Time grammar, everything works fine. However, if I import both the Time and Basic grammars, I get the following errors:
line 1:8 mismatched input '1234567890123' expecting Digit
line 1:22 mismatched input '1234567890888' expecting Digit
What am I doing wrong?
Thank you in advance,
RG

Related

Getting a substring of a string in Flutter that contains an emoji can cause a render failure

As part of the app I am shortening a user-given string to 40 characters and appending ellipses at the end if the string is longer than 40 characters. The users are allowed/able to use emoji in their strings.
If a string that is cut off has an emoji at the 40-character mark, the emoji fails to render and shows up as a "�".
Is there a way to reliably get a sub-string of a text but without running into this issue?
Current code:
if (useStr.length > 40) {
useStr = useStr.substring(0, 40) + "...";
}
You should use package:characters and truncate strings based on graphemes (human-perceived characters) and not on (UTF-16) code units.
import 'package:characters/characters.dart';
void main() {
final originalString = '\u{1F336}\uFE0F' * 3; // 🌶️🌶️🌶️
const ellipsis = '\u2026';
for (var i = 1; i < 4; i += 1) {
var s = originalString;
var characters = s.characters;
if (characters.length > i) {
s = '${characters.take(i)}$ellipsis';
}
print(s);
}
}
which prints:
🌶️…
🌶️🌶️…
🌶️🌶️🌶️
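Applied to the original 40-character truncation, a minimal sketch along the same lines (the helper name truncateTo40 is made up; it assumes useStr holds the user-supplied string, as in the question) would be:
import 'package:characters/characters.dart';

String truncateTo40(String useStr) {
  // Count graphemes, not UTF-16 code units, so an emoji sitting on the
  // boundary is either kept whole or dropped, never split in half.
  final chars = useStr.characters;
  if (chars.length > 40) {
    return '${chars.take(40)}...';
  }
  return useStr;
}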

Neatly parsing a date in "MMddyy" format along with other formats in Dart

I guess it is not possible to parse a date in "MMddyy" format in Dart.
void main() {
String strcandidate = "031623";
String format = "MMddyy";
var originalFormat = DateFormat(format).parse(strcandidate);
}
Output:
Uncaught Error: FormatException: Trying to read dd from 031623 at position 6
The following works fine when parsing a date in "MM-dd-yy" format.
void main() {
String strcandidate = "03-16-23";
String format = "MM-dd-yy";
var originalFormat = DateFormat(format).parse(strcandidate);
}
In the problem, the input date string can be in any of several formats, e.g. ['yyyy-MM-dd', 'MMM-yyyy', 'MM/dd/yy']. I am parsing the input string for these formats in a loop as follows.
dateFormatsList = ['yyyy-MM-dd', 'MMM-yyyy', 'MM/dd/yy'];
for (String format in dateFormatsList ) {
try {
originalFormat = DateFormat(format).parse(strcandidate);
dateFound = true;
} catch (e) {}
}
Adding 'MMddyy' to dateFormatsList is not going to work, but a regular expression could be used to parse this format.
However, if all the other formats are parsed with the parse method and one additional format is parsed with a regular expression, the code ends up cluttered rather than neat.
To keep the code as neat and efficient as possible, please share your insights on how to incorporate the 'MMddyy' format cleanly. Thanks so much!
See How do I convert a date/time string to a DateTime object in Dart? for how to parse various date/time strings to DateTime objects.
If you need to mix approaches, you can provide a unified interface. Instead of using a List<String> for your list of formats, you can use a List<DateTime Function(String)>:
import 'package:intl/intl.dart';
/// Parses a [DateTime] from [dateTimeString] using a [RegExp].
///
/// [re] must have named groups with names `year`, `month`, and `day`.
DateTime parseDateFromRegExp(RegExp re, String dateTimeString) {
var match = re.firstMatch(dateTimeString);
if (match == null) {
throw FormatException('Failed to parse: $dateTimeString');
}
var year = match.namedGroup('year');
var month = match.namedGroup('month');
var day = match.namedGroup('day');
if (year == null || month == null || day == null) {
throw ArgumentError('Regular expression is malformed');
}
// In case we're parsing a two-digit year format, instead of
// parsing the strings ourselves, reparse it with [DateFormat] so that it can
// apply its -80/+20 rule.
//
// [DateFormat.parse] doesn't work without separators, which is why we
// can't use it directly on the original string. See:
// https://github.com/dart-lang/intl/issues/210
return DateFormat('yy-MM-dd').parse('$year-$month-$day');
}
typedef DateParser = DateTime Function(String);
DateParser dateParserFromRegExp(String rePattern) =>
(string) => parseDateFromRegExp(RegExp(rePattern), string);
var parserList = [
DateFormat('yyyy-MM-dd').parse,
DateFormat('MMM-yyyy').parse,
DateFormat('MM/dd/yy').parse,
dateParserFromRegExp(
r'^(?<month>\d{2})(?<day>\d{2})(?<year>\d{4})$',
)
];
void main() {
var strcandidate = '12311776';
DateTime? originalFormat;
for (var tryParse in parserList) {
try {
originalFormat = tryParse(strcandidate);
break;
} on Exception {
// Try the next format.
}
}
print(originalFormat);
}
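If the original two-digit 'MMddyy' input (e.g. '031623') also needs to be covered, a sketch of one more entry reusing the same helper might look like this (the variable name parserListWithTwoDigitYear and the six-digit pattern are assumptions, not part of the code above):
var parserListWithTwoDigitYear = [
  ...parserList,
  // Hypothetical extra entry for 'MMddyy' input such as '031623'.
  // parseDateFromRegExp reparses through DateFormat('yy-MM-dd'), so
  // intl's -80/+20 rule for two-digit years still applies.
  dateParserFromRegExp(
    r'^(?<month>\d{2})(?<day>\d{2})(?<year>\d{2})$',
  ),
];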
I think it's a bit hacky, but what about rearranging the string into a form that DateTime.parse understands (year first, with the century prepended)?
void main() {
String strcandidate = "031623";
String strYear = strcandidate.substring(4);
// Assume the century is '20', so the two-digit year '23' becomes 2023
String _newDateTime = '20' + strYear + strcandidate.substring(0, 4);
var _originalFormat = DateTime.parse(_newDateTime);
print(_originalFormat);
}
Add the intl package to your pubspec.yaml, then write this code:
import 'package:intl/intl.dart';
void main() {
var strcandidate = DateTime(2023, 3, 16);
String format = "MMddyy";
var originalFormat = DateFormat(format).format(strcandidate);
print(originalFormat);
}

Lexical Analyzer not getting the next character

So I am working on a project where we are making a small compiler program, but before I can move on to the other parts I am having trouble getting the lexical analyzer to output anything after '\BEGIN'. I debugged it, and it seems the value is stuck in a loop whose condition says the next character is always a newline. Is it because I haven't added the pattern matching for the defined tokens yet?
Here is the code
import java.util
//import com.sun.javafx.fxml.expression.Expression.Parser.Token
/*Lexical analyzer will be responsible for the following:
- finds the lexemes
- Checks each given character determining the tokens
* */
class MyLexicalAnalyzer extends LexicalAnalyzer {
//Array full of the keywords
//val SpecialCharacters = List(']', '#', '*', '+', '\\', '[', '(',')', "![", '=')
val TEXT = "[a-z] | _ | 0-9 | [A-Z]:"
private var sourceLine: String = null
private val lexeme: Array[Char] = new Array[Char](999)
private var nextChar: Char = 0
private var lexLength: Int = 0
private var position: Int = 0
private val lexems: util.List[String] = new util.ArrayList[String]
def start(line: String): Unit = {
initializeLexems()
sourceLine = line
position = 0
getChar()
getNextToken()
}
// A helper method to determine if the current character is a space.
private def isSpace(c: Char) = c == ' '
//Defined and intialized tokens
def initializeLexems(): Any = {
lexems.add("\\BEGIN")
lexems.add("\\END")
lexems.add("\\PARAB")
lexems.add("\\DEF[")
lexems.add("\\USE[")
lexems.add("\\PARAE")
lexems.add("\\TITLE[")
lexems.add("]")
lexems.add("[")
lexems.add("\\")
lexems.add("(")
lexems.add(")")
lexems.add("![")
lexems.add("=")
lexems.add("+")
lexems.add("#")
}
//val pattern = new regex("''").r
def getNextToken() ={
lexLength = 0
// Ignore spaces and add the first character to the token
getNonBlank()
addChar()
getChar()
// Continue gathering characters for the token
while ( {
(nextChar != '\n') && (nextChar != ' ')
}) {
addChar()
getChar()
}
// Convert the gathered character array token into a String
val newToken: String = new String(lexeme)
if (lookup(newToken.substring(0, lexLength)))
MyCompiler.setCurrentToken(newToken.substring(0,lexLength))
}
// A helper method to get the next non-blank character.
private def getNonBlank(): Unit = {
while ( {
isSpace(nextChar)
}) getChar()
}
/*
Method of function that adds the current character to the token
after checking to make sure that length of the token isn't too
long, a lexical error in this case.
*/
def addChar(){
if (lexLength <= 998) {
lexeme({
lexLength += 1; lexLength - 1
}) = nextChar
lexeme(lexLength) = 0
}
else
System.out.println("LEXICAL ERROR - The found lexeme is too long!")
if (!isSpace(nextChar))
while ( {
!isSpace(nextChar)
})
getChar()
lexLength = 0
getNonBlank()
addChar()
}
//Reading from the file its obtaining the tokens
def getChar() {
if (position < sourceLine.length)
nextChar = sourceLine.charAt ( {
position += 1;
position - 1
})
else nextChar = '\n'
def lookup(candidateToken: String): Boolean ={
if (!(lexems.contains(candidateToken))) {
System.out.println("LEXICAL ERROR - '" + candidateToken + "' is not recognized.")
return false
}
return true
}
}
else nextChar = '\n' <- this is the branch the condition takes after reading the first token '\BEGIN'; it then just keeps producing the same output in the debug console, as listed below.
This is what the debug console is outputting after '\BEGIN' is read through.
Can anyone please let me know why that is? It keeps happening no matter how many times I step into it.
Here is the driver class that uses the lexical analyzer
import scala.io.Source
object MyCompiler {
//check the arguments
//check file extensions
//initialization
//get first token
//call start state
var currentToken : String = ""
def main(args: Array[String]): Unit = {
val filename = args(0)
//check if an input file provided
if(args.length == 0) {
//usage error
println("USAGE ERROR: Must provide an input file. ")
System.exit(0)
}
if(!checkFileExtension(args(0))) {
println("USAGE ERROR: Extension name is invalid make sure its .gtx ")
System.exit(0)
}
val Scanner = new MyLexicalAnalyzer
val Parser = new MySyntaxAnalyzer
//getCurrentToken(Scanner.getNextToken())
//Parser.gittex()
for (line <- Source.fromFile(filename).getLines()){
Scanner.start(line)
println()
}
//.......
//If it gets here, it is compiled
//post processing
}
//checks the file extension if valid and ends with .gtx
def checkFileExtension(filename : String) : Boolean = filename.endsWith(".gtx")
def getCurrentToken() : String = this.currentToken
def setCurrentToken(t : String ) : Unit = this.currentToken = t
}
The code is operating as it is supposed to. The first line contains only the string \BEGIN, so the lexical analyser treats the end of the first line as '\n', as shown in this method:
def getChar() {
if (position < sourceLine.length)
nextChar = sourceLine.charAt ( {
position += 1;
position - 1
})
else nextChar = '\n'
However, the comment directly above that method does not describe what the method actually does. This could be a hint as to where your confusion lies. If the comment says it should read from the file, but it is not reading from the file, maybe that's what you've forgotten to implement.
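To illustrate that point, here is a minimal standalone model of getChar() as posted (the object name EndOfLineDemo is made up); it is only a sketch of the behaviour, not a fix:
object EndOfLineDemo {
  var sourceLine: String = "\\BEGIN"
  var position: Int = sourceLine.length // already past the last character of the line
  var nextChar: Char = 0

  def getChar(): Unit = {
    if (position < sourceLine.length)
      nextChar = sourceLine.charAt({ position += 1; position - 1 })
    else nextChar = '\n' // from here on, every call yields '\n'
  }

  def main(args: Array[String]): Unit = {
    getChar(); println(nextChar == '\n') // true
    getChar(); println(nextChar == '\n') // still true, so a loop such as
    // while (!isSpace(nextChar)) getChar()   (the tail of addChar)
    // can never terminate once the end of the line has been reached,
    // because '\n' is not a space and nothing ever changes it.
  }
}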

Parsing text and representing it with tokens using Scala

I'm getting frustrated trying to convert a small part of the Golang templating language to Scala.
Below are the key parts of the lex.go source code: https://github.com/golang/go/blob/master/src/text/template/parse/lex.go
The tests are here: https://github.com/golang/go/blob/master/src/text/template/parse/lex_test.go
Basically this "class" takes a string and returns an array of "itemType" values. In the template string, the start and end of special tokens are marked with curly braces {{ and }}.
For example:
"{{for}}"
returns an array of 4 items:
item{itemLeftDelim, 0, "{{" } // scala case class would be Item(ItemLeftDelim, 0, "")
item{itemIdentifier, 0, "for"}
item{itemRightDelim, 0, "}}"}
item{itemEOF, 0, ""}
The actual call would look like:
l := lex("for", `{{for}}`, "{{", "}}") // you pass in the start and end delimiters {{ and }}
for {
item := l.nextItem()
items = append(items, item)
if item.typ == itemEOF || item.typ == itemError {
break
}
}
return
The key parts of the source code are below:
// itemType identifies the type of lex items.
type itemType int
const (
itemError itemType = iota // error occurred; value is text of error
itemEOF
itemLeftDelim // left action delimiter
// .............. skipped
)
const (
leftDelim = "{{"
rightDelim = "}}"
leftComment = "/*"
rightComment = "*/"
)
// item represents a token or text string returned from the scanner.
type item struct {
typ itemType // The type of this item.
pos Pos // The starting position, in bytes, of this item in the input string.
val string // The value of this item.
}
// stateFn represents the state of the scanner as a function that returns the next state.
type stateFn func(*lexer) stateFn
// lexer holds the state of the scanner.
type lexer struct {
name string // the name of the input; used only for error reports
input string // the string being scanned
leftDelim string // start of action
rightDelim string // end of action
state stateFn // the next lexing function to enter
pos Pos // current position in the input
start Pos // start position of this item
width Pos // width of last rune read from input
lastPos Pos // position of most recent item returned by nextItem
items chan item // channel of scanned items
parenDepth int // nesting depth of ( ) exprs
}
// lex creates a new scanner for the input string.
func lex(name, input, left, right string) *lexer {
if left == "" {
left = leftDelim
}
if right == "" {
right = rightDelim
}
l := &lexer{
name: name,
input: input,
leftDelim: left,
rightDelim: right,
items: make(chan item),
}
go l.run()
return l
}
// run runs the state machine for the lexer.
func (l *lexer) run() {
for l.state = lexText; l.state != nil; {
l.state = l.state(l)
}
}
// nextItem returns the next item from the input.
func (l *lexer) nextItem() item {
item := <-l.items
l.lastPos = item.pos
return item
}
// emit passes an item back to the client.
func (l *lexer) emit(t itemType) {
l.items <- item{t, l.start, l.input[l.start:l.pos]}
l.start = l.pos
}
// lexText scans until an opening action delimiter, "{{".
func lexText(l *lexer) stateFn {
for {
if strings.HasPrefix(l.input[l.pos:], l.leftDelim) {
if l.pos > l.start {
l.emit(itemText)
}
return lexLeftDelim
}
if l.next() == eof {
break
}
}
// Correctly reached EOF.
if l.pos > l.start {
l.emit(itemText)
}
l.emit(itemEOF)
return nil
}
// next returns the next rune in the input.
func (l *lexer) next() rune {
if int(l.pos) >= len(l.input) {
l.width = 0
return eof
}
r, w := utf8.DecodeRuneInString(l.input[l.pos:])
l.width = Pos(w)
l.pos += l.width
return r
}
// lexLeftDelim scans the left delimiter, which is known to be present.
func lexLeftDelim(l *lexer) stateFn {
l.pos += Pos(len(l.leftDelim))
if strings.HasPrefix(l.input[l.pos:], leftComment) {
return lexComment
}
l.emit(itemLeftDelim)
l.parenDepth = 0
return lexInsideAction
}
// lexRightDelim scans the right delimiter, which is known to be present.
func lexRightDelim(l *lexer) stateFn {
l.pos += Pos(len(l.rightDelim))
l.emit(itemRightDelim)
return lexText
}
// there are more stateFn
So I was able to write the item and itemType:
case class Item(typ: ItemType, pos: Int, v: String)
sealed trait ItemType
case object ItemError extends ItemType
case object ItemEOF extends ItemType
case object ItemLeftDelim extends ItemType
...
..
.
The stateFn and Lex definitions:
trait StateFn extends (Lexer => StateFn) {
}
I'm basically really stuck on the main parts here. So things seem to be kicked off like this:
A Lex is created, then "go l.run()" is called.
Run is a loop, which keeps looping until EOF or an error is found.
The loop initializes with lexText, which scans until it finds an {{, and then it sends a message to a channel with all the preceding text of type 'itemText', passing it an 'item'. It then returns the function lexLeftDelim. lexLeftDelim does the same sort of thing, it sends a message 'item' of type itemLeftDelim.
It keeps parsing the string until it reaches EOF basically.
I can't think in Scala that well, but I know I can use an Actor here to pass it a message 'Item'.
For the part about returning a function, I asked and got some good ideas here: How to model recursive function types?
Even after this, I am really frustrated and I can't seem to glue these concepts together.
I'm not looking for someone to implement the entire thing for me, but if someone could write just enough code to parse a simple string like "{{}}" that would be awesome. And if they could explain why they did a certain design that would be great.
I created a case class for Lex:
case class Lex(
name: String,
input: String,
leftDelim: String,
rightDelim: String,
state: StateFn,
var pos: Int = 0,
var start: Int = 0,
var width: Int = 0,
var lastPos: Int = 0,
var parenDepth: Int = 0
) {
def next(): Option[String] = {
if (this.pos >= this.input.length) {
this.width = 0
return None
}
this.width = 1
val nextChar = this.input.drop(this.pos).take(1)
this.pos += 1
Some(nextChar)
}
}
The first stateFn is LexText and so far I have:
object LexText extends StateFn {
def apply(l: Lexer) = {
while {
if (l.input.startsWith(l.leftDelim)) {
if (l.pos > l.start) {
// ????????? emit itemText using an actor?
}
return LexLeftDelim
}
if (l.next() == None) {
break
}
}
if(l.pos > l.start) {
// emit itemText
}
// emit EOF
return None // ?? nil? how can I support an Option[StateFn]
}
}
I need guidance on getting the Actor's setup, along with the main run loop:
func (l *lexer) run() {
for l.state = lexText; l.state != nil; {
l.state = l.state(l)
}
}
This is an interesting problem domain that I tried to tackle using Scala, and so far I am a bit confused. I am hoping someone else finds it interesting, can work with what little I have so far, and can provide some code and critique whether I am doing it correctly or not.
I know deep down I shouldn't be mutating, but I'm still on the first few pages of the functional book :)
If you translate the Go code literally into Scala, you'll get a very unidiomatic piece of code. You'll probably get a much more maintainable (and shorter!) Scala version by using parser combinators. There are plenty of resources about them on the internet.
import scala.util.parsing.combinator._
sealed trait ItemType
case object LeftDelim extends ItemType
case object RightDelim extends ItemType
case object Identifier extends ItemType
case class Item(ty: ItemType, token: String)
object ItemParser extends RegexParsers {
def left: Parser[Item] = """\{\{""".r ^^ { _ => Item(LeftDelim, "{{") }
def right: Parser[Item] = """\}\}""".r ^^ { _ => Item(RightDelim, "}}") }
def ident: Parser[Item] = """[a-z]+""".r ^^ { x => Item(Identifier, x) }
def item: Parser[Item] = left | right | ident
def items: Parser[List[Item]] = rep(item)
}
// ItemParser.parse(ItemParser.items, "{{foo}}")
// res5: ItemParser.ParseResult[List[Item]] =
// [1.8] parsed: List(Item(LeftDelim,{{), Item(Identifier,foo), Item(RightDelim,}}))
Adding whitespace skipping, or configurable left and right delimiters is trivial.
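For instance, a sketch with configurable delimiters, reusing the Item and ItemType definitions above (the class name ConfigurableItemParser is made up), might look like:
import java.util.regex.Pattern
import scala.util.parsing.combinator._

class ConfigurableItemParser(leftDelim: String, rightDelim: String) extends RegexParsers {
  // Pattern.quote makes sure characters such as '{' are matched literally.
  def left: Parser[Item]  = Pattern.quote(leftDelim).r  ^^ { _ => Item(LeftDelim, leftDelim) }
  def right: Parser[Item] = Pattern.quote(rightDelim).r ^^ { _ => Item(RightDelim, rightDelim) }
  def ident: Parser[Item] = """[a-z]+""".r              ^^ { x => Item(Identifier, x) }
  def items: Parser[List[Item]] = rep(left | right | ident)
}

// val p = new ConfigurableItemParser("{{", "}}")
// p.parse(p.items, "{{foo}}")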

Pick up relevant information from a string using regular expression C#3.0

I have been given some file names which can be like
<filename>YYYYMMDD<fileextension>
Some valid file names that satisfy the above pattern are:
xxx20100326.xls,
xxx2v20100326.csv,
x_20100326.xls,
xy2z_abc_20100326_xyz.csv,
abc.xyz.20100326.doc,
ab2.v.20100326.doc,
abc.v.20100326_xyz.xls
Whatever the exact case above, I need to pick out only the date, so for all of these cases the output will be 20100326.
I am trying to achieve this, but with no luck.
Here is what I have done so far
string testdata = "x2v20100326.csv";
string strYYYY = @"\d{4}";
string strMM = @"(1[0-2]|0[1-9])";
string strDD = @"(3[0-1]|[1-2][0-9]|0[1-9])";
string regExPattern = @"\A" + strYYYY + strMM + strDD + @"\Z";
Regex regex = new Regex(regExPattern);
Match match = regex.Match(testdata);
if (match.Success)
{
string result = match.Groups[0].Value;
}
I am using C# 3.0 and .NET Framework 3.5.
Please help. It is very urgent
Thanks in advance.
Try this:
DateTime result = DateTime.MinValue;
System.Globalization.CultureInfo provider = System.Globalization.CultureInfo.InvariantCulture;
var testString = "x2v20100326.csv";
var format = "yyyyMMdd";
try
{
for (int i = 0; i < testString.Length - format.Length; i++)
{
if (DateTime.TryParseExact(testString.Substring(i, format.Length), format, provider, System.Globalization.DateTimeStyles.None, out result))
{
Console.WriteLine("{0} converts to {1}.", testString, result.ToString());
break;
}
}
}
catch (FormatException)
{
Console.WriteLine("{0} is not in the correct format.", testString);
}
This one fetches the date in the string with a regular expression:
var re = new Regex("(?<date>[0-9]{8})");
var test = "asdf_wef_20100615_sdf.csv";
var datevalue = re.Match(test).Groups["date"].Value;
Console.WriteLine(datevalue); // prints 20100615
The characters \A and \Z match the beginning and end of the string, respectively.
I think you need pattern like:
string regExPattern = @"\A.*(?<FullDate>" + strYYYY + strMM + strDD + @").*\..*\Z";
".*" - any symbols at the begin
".*\..*" - any symbols before the dot, dot and any symbols after dot
And get a full date:
match.Groups["FullDate"]
You may need to group things in your month and day expressions:
((1[0-2])|(0[1-9]))
((3[0-1])|([1-2][0-9])|(0[1-9]))
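Putting the pieces together, a sketch (the file name below is just one of the examples from the question, and the class name DateFromFileName is made up) could look like this:
using System;
using System.Text.RegularExpressions;

class DateFromFileName
{
    static void Main()
    {
        string strYYYY = @"\d{4}";
        string strMM = @"((1[0-2])|(0[1-9]))";
        string strDD = @"((3[0-1])|([1-2][0-9])|(0[1-9]))";
        // Anchor the whole file name rather than the date itself,
        // and capture the date in a named group.
        string regExPattern = @"\A.*?(?<FullDate>" + strYYYY + strMM + strDD + @").*\Z";

        Match match = Regex.Match("abc.v.20100326_xyz.xls", regExPattern);
        if (match.Success)
        {
            Console.WriteLine(match.Groups["FullDate"].Value); // prints 20100326
        }
    }
}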