Lexical Analyzer not getting the next character - scala

So I am working on a project where we are making a small compiler program but before I can move on to the other parts I am having troubles with getting the lexical analyzer to output anything after '\BEGIN' afterwards I debugged it and it seems the value is stuck in a loop where the condition is saying the next character is always a newline. Is it because I haven't added the pattern matching yet to the defined tokens?
Here is the code
import java.util
//import com.sun.javafx.fxml.expression.Expression.Parser.Token
/*Lexical analyzer will be responsible for the following:
- finds the lexemes
- Checks each given character determining the tokens
* */
class MyLexicalAnalyzer extends LexicalAnalyzer {
//Array full of the keywords
//val SpecialCharacters = List(']', '#', '*', '+', '\\', '[', '(',')', "![", '=')
val TEXT = "[a-z] | _ | 0-9 | [A-Z]:"
private var sourceLine: String = null
private val lexeme: Array[Char] = new Array[Char](999)
private var nextChar: Char = 0
private var lexLength: Int = 0
private var position: Int = 0
private val lexems: util.List[String] = new util.ArrayList[String]
def start(line: String): Unit = {
initializeLexems()
sourceLine = line
position = 0
getChar()
getNextToken()
}
// A helper method to determine if the current character is a space.
private def isSpace(c: Char) = c == ' '
//Defined and intialized tokens
def initializeLexems(): Any = {
lexems.add("\\BEGIN")
lexems.add("\\END")
lexems.add("\\PARAB")
lexems.add("\\DEF[")
lexems.add("\\USE[")
lexems.add("\\PARAE")
lexems.add("\\TITLE[")
lexems.add("]")
lexems.add("[")
lexems.add("\\")
lexems.add("(")
lexems.add(")")
lexems.add("![")
lexems.add("=")
lexems.add("+")
lexems.add("#")
}
//val pattern = new regex("''").r
def getNextToken() ={
lexLength = 0
// Ignore spaces and add the first character to the token
getNonBlank()
addChar()
getChar()
// Continue gathering characters for the token
while ( {
(nextChar != '\n') && (nextChar != ' ')
}) {
addChar()
getChar()
}
// Convert the gathered character array token into a String
val newToken: String = new String(lexeme)
if (lookup(newToken.substring(0, lexLength)))
MyCompiler.setCurrentToken(newToken.substring(0,lexLength))
}
// A helper method to get the next non-blank character.
private def getNonBlank(): Unit = {
while ( {
isSpace(nextChar)
}) getChar()
}
/*
Method of function that adds the current character to the token
after checking to make sure that length of the token isn't too
long, a lexical error in this case.
*/
def addChar(){
if (lexLength <= 998) {
lexeme({
lexLength += 1; lexLength - 1
}) = nextChar
lexeme(lexLength) = 0
}
else
System.out.println("LEXICAL ERROR - The found lexeme is too long!")
if (!isSpace(nextChar))
while ( {
!isSpace(nextChar)
})
getChar()
lexLength = 0
getNonBlank()
addChar()
}
//Reading from the file its obtaining the tokens
def getChar() {
if (position < sourceLine.length)
nextChar = sourceLine.charAt ( {
position += 1;
position - 1
})
else nextChar = '\n'
def lookup(candidateToken: String): Boolean ={
if (!(lexems.contains(candidateToken))) {
System.out.println("LEXICAL ERROR - '" + candidateToken + "' is not recognized.")
return false
}
return true
}
}
else nextChar = '\n'<- this is where the condition goes after rendering the first character '\BEGIN' then just keeps outputting in the debug console as listed below.
This is what the debug console it outputting after '\BEGIN' is read through
Can anyone please let me know why that is? This happens after I keep stepping into it many times as well.
Here is the driver class that uses the lexical analyzer
import scala.io.Source
object MyCompiler {
//check the arguments
//check file extensions
//initialization
//get first token
//call start state
var currentToken : String = ""
def main(args: Array[String]): Unit = {
val filename = args(0)
//check if an input file provided
if(args.length == 0) {
//usage error
println("USAGE ERROR: Must provide an input file. ")
System.exit(0)
}
if(!checkFileExtension(args(0))) {
println("USAGE ERROR: Extension name is invalid make sure its .gtx ")
System.exit(0)
}
val Scanner = new MyLexicalAnalyzer
val Parser = new MySyntaxAnalyzer
//getCurrentToken(Scanner.getNextToken())
//Parser.gittex()
for (line <- Source.fromFile(filename).getLines()){
Scanner.start(line)
println()
}
//.......
//If it gets here, it is compiled
//post processing
}
//checks the file extension if valid and ends with .gtx
def checkFileExtension(filename : String) : Boolean = filename.endsWith(".gtx")
def getCurrentToken() : String = this.currentToken
def setCurrentToken(t : String ) : Unit = this.currentToken = t
}

The code is operating as it is supposed to. The first line contains only the string \BEGIN so the lexical analyser is treating the end of the first line as an '\n' as shown in this method:
def getChar() {
if (position < sourceLine.length)
nextChar = sourceLine.charAt ( {
position += 1;
position - 1
})
else nextChar = '\n'
However, the comment directly above that method does not describe what the method actually does. This could be a hint as to where your confusion lies. If the comment says it should read from the file, but it is not reading from the file, maybe that's what you've forgotten to implement.

Related

Simple Scala Function Not Removing Double Spaces

For some reason when I input a string with double spaces such as " ", the function does not remove them from the string, nor does it remove them when they are generated by two WUB's in a row
For example:
songDecoder("WUBCATWUBWUBBALLWUB") outputs "CAT_ _BALL" (underscores represent spaces)
I could fix this by other means, but since I have no idea why my current code isn't working I figured I should ask to patch my understanding.
def songDecoder(song:String):String = {
val l = song.indexOf("WUB")
if (song.contains(" ")) {
val e = song.indexOf(" ")
songDecoder(song.patch(e,Nil,1))
}
if (l==0) {
val c = song.patch(l,Nil,3)
songDecoder(c)
}
if (l== -1)
song.trim
else {
val c = song.patch(l,Nil,2)
val b = c.patch(l," ",1)
songDecoder(b)
}
}
The reason it doesn't work is because when you call a recursive method it eventually returns with its result. The code that clears out the double-whitespace doesn't save that result.
if (song.contains(" ")) {
val e = song.indexOf(" ")
songDecoder(song.patch(e,Nil,1)) //send patched song to decoder
} //don't save returned string
//continue with unpatched song
The 2nd if block also recurses without saving the result.
if (l==0) {
val c = song.patch(l,Nil,3)
songDecoder(c) //send patched song to decoder
} //don't save returned string
//continue with unpatched song
You can remove both of those if blocks and you'll get the same results from your method. The only code that effects the output is the final if/else and that's because it is at the end of the method's code block. So whatever the if/else produces that's what the method returns.
if (l== -1)
song.trim //return the final result string
else {
val c = song.patch(l,Nil,2) //remove one WUB
val b = c.patch(l," ",1) //replace with space
songDecoder(b) //return whatever the next recursion returns
}
Just as an FYI, here's a different approach.
def songDecoder(song:String):String =
"(WUB)+".r.replaceAllIn(song, " ").trim
How about something like:
song.split(“(WUB)+”).mkString(“ “).trim

How to return a variable in a function in kotlin

I created a function that recieves input and compare it to a list, when find a match it return the match, in this case this match is the attribute of a class that i created.
I understand that the problem is with the return statement, so in the beginning of the function I declare the return as "Any", further more than that I'm kinda lost.
The error is this: A 'return' expression required in a function with a block body ('{...}')
class Class1(var self: String)
var test_class = Class1("")
fun giver(){
test_class.self = "Anything"
}
class Funciones(){
fun match_finder(texto: String): Any{
var lista = listOf<String>(test_class.self)
var lista_de_listas = listOf<String>("test_class.self")
var count = -1
for (i in lista_de_listas){
count = count + 1
if (texto == i){
lista_de_listas = lista
var variable = lista_de_listas[count]
return variable
}
}
}
}
fun main(){
giver()
var x = "test_class.self"
var funcion = Funciones()
var y = funcion.match_finder(x)
println(y)
}
To explain you what the problem is, let's consider the following code:
class MyClass {
fun doSomething(): String {
val numbers = listOf(1, 2, 3)
for (number in numbers) {
if (number % 2 == 0) {
return "There is at least one even number in the list"
}
}
}
}
If you try compiling it you'll get the same error message as in your question: A 'return' expression required in a function with a block body ('{...}'). Why is that?
Well, we defined a function doSomething returning a String (it could be any other type) but we're returning a result only if the list of numbers contains at least one even number. What should it return if there's no even number? The compiler doesn't know that (how could it know?), so it prompts us that message. We can fix the code by returning a value or by throwing an exception:
class MyClass {
fun doSomething(): String {
val numbers = listOf(1, 2, 3)
for (number in numbers) {
if (number % 2 == 0) {
return "There is at least one even number in the list"
}
}
// return something if the list doesn't contain any even number
return "There is no even number in the list"
}
}
The same logic applies to your original code: what should the function return if there is no i such that texto == i?
Please also note that the solution you proposed may be syntactically correct - meaning it compiles correctly - but will probably do something unexpected. The for loop is useless since the if/else statement will always cause the function to return during the first iteration, so the value "There is no match" could be returned even if a match actually exists later in the list.
I searched online, if someone has the same problem, the correct code is as follows:
class Funciones(){
fun match_finder(texto: String): Any{
var lista = listOf<String>(test_class.self)
var lista_de_listas = listOf<String>("test_class.self")
var count = -1
var variable = " "
for (i in lista_de_listas){
count = count + 1
if (texto == i){
lista_de_listas = lista
var variable = lista_de_listas[count]
return variable
} else {
return "There is no match"
}
}
return variable
}
}

Read first element and next element from a file SCALA

I am reading a file that contain 277272 lines with Int triples (s,p,o) like:
10,44,22
10,47,12
15,38,3
15,41,30
16,38,5
16,44,15
16,47,18
22,38,21
22,41,42
34,44,40
34,47,36
40,38,39
40,41,42
45,38,27
45,41,30
46,44,45
46,47,48
From this file I create a Random access file object in order to navigate trough this file. However I want to extract some specific values that are in the first column, an example can be that I want to extract the values of the row that contains 16 in the first column, then I choose a pointer that is in the half, something like:
var lengthfile = (file.length().asInstanceOf[Int])
var initpointer = lengthfile/2
Then I analize if the first value is 16, if not I did a procedure to move the pointer to the nextlines, or as in this case in the back lines. Once I detect that the first value is 16, I need to know if it was in the first row, the sceond or the last one. The functions that I present here are to get the first value of the line where I have the pointer, and to know the first value from the next line.
def firstvalue(pf: Int, file:RandomAccessFile): List[Int] ={
//val file = new RandomAccessFile(filename, "r")
var pointer = pf
var flag = true
var fline = Option("a")
if (pointer <= file.length()-1){
file.seek(pointer)
fline = Option(file.readLine)
}
else {
pointer = file.length().toInt-12
file.seek(pointer)
fline = Option(file.readLine)
}
while (flag)
{
if (fline.get != "")
{
if (pointer == 0)
{
file.seek(pointer)
fline = Option(file.readLine)
pointer -= 1
flag = false
}
else{
pointer -= 1
file.seek(pointer)
fline = Option(file.readLine)
}
}
else if (fline.get == ""){
flag = false
}
}
pointer += 1
file.seek(pointer)
val line = Option(file.readLine)
val spl = line.get.split(',')
val p = spl.apply(0).toInt
//file.close()
val l = pointer :: p :: Nil
l
}
//def nextvalue(pf: Int, filename:String): List[Int] = {
//val file = new RandomAccessFile(filename, "r")
def nextvalue(pf: Int, file:RandomAccessFile): List[Int] = {
//val file = new RandomAccessFile(filename, "r")
var pointer = pf
var p = 0
var flag = true
var lastline = false
var fline = Option ("a")
if (pointer <= file.length()-1){
file.seek(pointer)
fline = Option(file.readLine)
}
//fline = Option(file.readLine)
while (flag){
if (fline.get != "")
{
if (fline == None)
{
flag = false
lastline = true
}
else{
pointer = file.getFilePointer.toInt
fline = Option(file.readLine)
flag = false
}
}
else if (fline.get == ""){
fline = Option(file.readLine)
flag = false
}
}
if (lastline == false)
{
//file.close()
if (fline != None){
val spl = fline.get.split(',')
p = spl.apply(0).toInt
}
}
val l = pointer :: p :: Nil
l
}
However I have a prformance problem, because I am reading character by character, I am trying to fix that during a lot of days, and I don't have a solution. I don't know If perhaps the file object have a function to read back lines, or something that allows to me improve this code? How can I improve this code?
Source
.fromFile(file)
.getLines
.zipWithIndex
.filter(_._1.startsWith("16,"))
Will give you all lines, that start with "16", along with their indices in the file. That should be faster then seeking back and forth ...

Changing an attribute in an object that belongs to RDD

I have the following code :
def generateStoriesnew(outputPath: String, groupedRDD:RDD[(String,Iterable[String])], isInChurnMode: Boolean, isInChurnPeriod: Boolean) {
val windowedRDD = groupedRDD.map(SOME CODE)
var windowedRDD2 = windowedRDD.filter(r => r != null).map(a=>a.churnPeriod(isInChurnPeriod,isInChurnMode))
val prettyStringRDD = windowedRDD2.map(r => {
r.toString
})
prettyStringRDD.saveAsTextFile(outputPath)
}
and here is the code for ChurnPriod function:
def churnPeriod( churnPeriod:Boolean, churnMode: Boolean): Unit = {
if (churnMode && rootEventType.equalsIgnoreCase("c")){
var churnCustStory: CustStoryN = null
var nonChurnCustStory: CustStoryN = null
var churnPeriodEventStory: mutable.MutableList[StoryEventN] = null
var NonChurnEventstory: mutable.MutableList[StoryEventN] = null
churnPeriodEventStory = new mutable.MutableList[StoryEventN]
NonChurnEventstory = new mutable.MutableList[StoryEventN]
var lastEventChurnPeriod = true
var currentEventStory = eventStory
var max = currentEventStory.length
println(max);
if (currentEventStory.size > 0) {
for (i <- 0 until currentEventStory.length) {
var currentEvent = currentEventStory(i)
if (currentEvent.timeSenseRootEvent < 90) {
churnPeriodEventStory.+=(currentEvent)
//lastEventChurnPeriod = true
}
else {
NonChurnEventstory.+=(currentEvent)
lastEventChurnPeriod = false
}
}
}
if (churnPeriod)
eventStory = churnPeriodEventStory
else
eventStory=null
}
}
but churn period function does not change eventstory which is a member of a custstory class. what am I missing here ?
class CustStoryN (val custId:String,
var rootEventType:String,
var rootEventTime:Long,
var eventStory:mutable.MutableList[StoryEventN])
my hypothesis is either:
1.map is not the right transformation for the function that I have
2.churnPeriod function never get called
3.I can not change eventstory which is a member of cust story class
Does anyone have any idea how I can troubleshoot this problem?
This would be trivial to determine by debugging. Just put a few breakpoints and you can see if the program stops inside the function, and what transformations do occur.
My guess would be that the problem is this line: if (currentEventStory.size > 0), thus, a list that starts at size 0 remains at size 0 forever. Another option is that churnPeriod is never true, thus, you compute a lot but never assign to the eventStory variable.
Your code does need a good cleanup ;-)

Parsing text and representing it with tokens using Scala

I'm getting frustrated trying to convert a small part of the Golang templating language to Scala.
Below are the key parts of the lex.go source code: https://github.com/golang/go/blob/master/src/text/template/parse/lex.go
The tests are here: https://github.com/golang/go/blob/master/src/text/template/parse/lex_test.go
Basically this "class" takes a string and returns an Array of "itemType". In the template string, the start and end of special tokens is using curly braces {{ and }}.
For for example:
"{{for}}"
returns an array of 4 items:
item{itemLeftDelim, 0, "{{" } // scala case class would be Item(ItemLeftDelim, 0, "")
item{itemIdentifier, 0, "for"}
item{itemRightDelim, 0, "}}"}
item{itemEOF, 0, ""}
The actual call would look like:
l := lex("for", `{{for}}`, "{{", "}}") // you pass in the start and end delimeters {{ and }}
for {
item := l.nextItem()
items = append(items, item)
if item.typ == itemEOF || item.typ == itemError {
break
}
}
return
The key parts of the source code are below:
// itemType identifies the type of lex items.
type itemType int
const (
itemError itemType = iota // error occurred; value is text of error
itemEOF
itemLeftDelim // left action delimiter
// .............. skipped
)
const (
leftDelim = "{{"
rightDelim = "}}"
leftComment = "/*"
rightComment = "*/"
)
// item represents a token or text string returned from the scanner.
type item struct {
typ itemType // The type of this item.
pos Pos // The starting position, in bytes, of this item in the input string.
val string // The value of this item.
}
// stateFn represents the state of the scanner as a function that returns the next state.
type stateFn func(*lexer) stateFn
// lexer holds the state of the scanner.
type lexer struct {
name string // the name of the input; used only for error reports
input string // the string being scanned
leftDelim string // start of action
rightDelim string // end of action
state stateFn // the next lexing function to enter
pos Pos // current position in the input
start Pos // start position of this item
width Pos // width of last rune read from input
lastPos Pos // position of most recent item returned by nextItem
items chan item // channel of scanned items
parenDepth int // nesting depth of ( ) exprs
}
// lex creates a new scanner for the input string.
func lex(name, input, left, right string) *lexer {
if left == "" {
left = leftDelim
}
if right == "" {
right = rightDelim
}
l := &lexer{
name: name,
input: input,
leftDelim: left,
rightDelim: right,
items: make(chan item),
}
go l.run()
return l
}
// run runs the state machine for the lexer.
func (l *lexer) run() {
for l.state = lexText; l.state != nil; {
l.state = l.state(l)
}
}
// nextItem returns the next item from the input.
func (l *lexer) nextItem() item {
item := <-l.items
l.lastPos = item.pos
return item
}
// emit passes an item back to the client.
func (l *lexer) emit(t itemType) {
l.items <- item{t, l.start, l.input[l.start:l.pos]}
l.start = l.pos
}
// lexText scans until an opening action delimiter, "{{".
func lexText(l *lexer) stateFn {
for {
if strings.HasPrefix(l.input[l.pos:], l.leftDelim) {
if l.pos > l.start {
l.emit(itemText)
}
return lexLeftDelim
}
if l.next() == eof {
break
}
}
// Correctly reached EOF.
if l.pos > l.start {
l.emit(itemText)
}
l.emit(itemEOF)
return nil
}
// next returns the next rune in the input.
func (l *lexer) next() rune {
if int(l.pos) >= len(l.input) {
l.width = 0
return eof
}
r, w := utf8.DecodeRuneInString(l.input[l.pos:])
l.width = Pos(w)
l.pos += l.width
return r
}
// lexLeftDelim scans the left delimiter, which is known to be present.
func lexLeftDelim(l *lexer) stateFn {
l.pos += Pos(len(l.leftDelim))
if strings.HasPrefix(l.input[l.pos:], leftComment) {
return lexComment
}
l.emit(itemLeftDelim)
l.parenDepth = 0
return lexInsideAction
}
// lexRightDelim scans the right delimiter, which is known to be present.
func lexRightDelim(l *lexer) stateFn {
l.pos += Pos(len(l.rightDelim))
l.emit(itemRightDelim)
return lexText
}
// there are more stateFn
So I was able to write the item and itemType:
case class Item(typ: ItemType, pos: Int, v: String)
sealed trait ItemType
case object ItemError extends ItemType
case object ItemEOF extends ItemType
case object ItemLeftDelim extends ItemType
...
..
.
The stateFn and Lex definitions:
trait StateFn extends (Lexer => StateFn) {
}
I'm basically really stuck on the main parts here. So things seem to be kicked of like this:
A Lex is created, then "go l.run()" is called.
Run is a loop, which keeps looping until EOF or an error is found.
The loop initializes with lexText, which scans until it finds an {{, and then it sends a message to a channel with all the preceding text of type 'itemText', passing it an 'item'. It then returns the function lexLeftDelim. lexLeftDelim does the same sort of thing, it sends a message 'item' of type itemLeftDelim.
It keeps parsing the string until it reaches EOF basically.
I can't think in scala that well, but I know I can use an Actor here to pass it a message 'Item'.
The part of returning a function, I asked I got some good ideas here: How to model recursive function types?
Even after this, I am really frustrated and I can seem to glue these concepts together.
I'm not looking for someone to implement the entire thing for me, but if someone could write just enough code to parse a simple string like "{{}}" that would be awesome. And if they could explain why they did a certain design that would be great.
I created a case class for Lex:
case class Lex(
name: String,
input: String,
leftDelim: String,
rightDelim: String,
state: StateFn,
var pos: Int = 0,
var start: Int = 0,
var width: Int = 0,
var lastPos: Int = 0,
var parenDepth: Int = 0
) {
def next(): Option[String] = {
if (this.pos >= this.input.length) {
this.width = 0
return None
}
this.width = 1
val nextChar = this.input.drop(this.pos).take(1)
this.pos += 1
Some(nextChar)
}
}
The first stateFn is LexText and so far I have:
object LexText extends StateFn {
def apply(l: Lexer) = {
while {
if (l.input.startsWith(l.leftDelim)) {
if (l.pos > l.start) {
// ????????? emit itemText using an actor?
}
return LexLeftDelim
}
if (l.next() == None) {
break
}
}
if(l.pos > l.start) {
// emit itemText
}
// emit EOF
return None // ?? nil? how can I support an Option[StateFn]
}
}
I need guidance on getting the Actor's setup, along with the main run loop:
func (l *lexer) run() {
for l.state = lexText; l.state != nil; {
l.state = l.state(l)
}
}
This is an interesting problem domain that I tried to tackle using Scala, and so far I am a bit confused hoping some else finds it interesting and can work with what little I have so far and provide some code and critique if I am doing it correctly or not.
I know deep down I shouldn't be mutating, but I'm still on the first few pages of the functional book :)
If you translate the go code literally into Scala, you'll get very unidiomatic piece of code. You'll probably get much more maintainable (and shorter!) Scala version by using parser combinators. There are plenty of resources about them on the internet.
import scala.util.parsing.combinator._
sealed trait ItemType
case object LeftDelim extends ItemType
case object RightDelim extends ItemType
case object Identifier extends ItemType
case class Item(ty: ItemType, token: String)
object ItemParser extends RegexParsers {
def left: Parser[Item] = """\{\{""".r ^^ { _ => Item(LeftDelim, "{{") }
def right: Parser[Item] = """\}\}""".r ^^ { _ => Item(RightDelim, "}}") }
def ident: Parser[Item] = """[a-z]+""".r ^^ { x => Item(Identifier, x) }
def item: Parser[Item] = left | right | ident
def items: Parser[List[Item]] = rep(item)
}
// ItemParser.parse(ItemParser.items, "{{foo}}")
// res5: ItemParser.ParseResult[List[Item]] =
// [1.8] parsed: List(Item(LeftDelim,{{), Item(Identifier,foo), Item(RightDelim,}}))
Adding whitespace skipping, or configurable left and right delimiters is trivial.