Parboiled2: reference to position in source text from AST

I am writing a DSL, and learning parboiled2, at the same time. Once my AST is built, I would like to run some semantic checks and, if there are any errors, output error messages that reference the offending positions in the source text.
I am writing things like the following, which, so far, do work:
case class CtxElem[A](start: Int, end: Int, elem: A)

def Identifier = rule {
  push(cursor) ~
  capture(Alpha ~ zeroOrMore(AlphaNum)) ~
  push(cursor) ~
  WhiteSpace ~>
  ((start, identifier, finish) => CtxElem(start, finish, identifier))
}
Is there a better or simpler way?

Parboiled2 (for now) doesn't support parser recovery strategies. This means that if the parser fails, it stops. As far as I remember, it should print the symbol where it failed, or at least you can get the cursor.
So if you're trying to build your own DSL and you need that kind of functionality, I would suggest using a different tool like ANTLR. Parboiled1 supports parser recovery techniques, but for now it's dead and buried in favor of support for the second version. Parboiled2 is good at parsing log files or configuration files (which are assumed to be correct by default), but it's not good for building DSLs.
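That said, if you stay with Parboiled2, the pattern from the question can be factored into a reusable parameterized rule, so each rule doesn't have to repeat the push(cursor) bookkeeping. An untested sketch against the Parboiled2 rule API (the CtxElem case class is taken from the question; the WhiteSpace definition is just an example):

import org.parboiled2._

case class CtxElem[A](start: Int, end: Int, elem: A)

class DslParser(val input: ParserInput) extends Parser {
  // Wraps any Rule1 so it also records the start and end cursor positions.
  def positioned[A](inner: => Rule1[A]): Rule1[CtxElem[A]] = rule {
    push(cursor) ~ inner ~ push(cursor) ~>
      ((start: Int, elem: A, end: Int) => CtxElem(start, end, elem))
  }

  def Identifier = rule {
    positioned(capture(CharPredicate.Alpha ~ zeroOrMore(CharPredicate.AlphaNum))) ~ WhiteSpace
  }

  def WhiteSpace = rule { zeroOrMore(anyOf(" \t\n")) }
}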

Related

Initializing the factory at compile time

I have a factory that should return an implementation depending on the name.
val moduleMap = Map(Modules.moduleName -> new ModuleImpl)

def getModule(moduleName: String): Module =
  moduleMap.get(moduleName) match {
    case Some(m) => m
    case _ =>
      throw new ModuleNotFoundException(
        s"$moduleName - Module could not be found.")
  }
So that each call to getModule does not create a new instance, there is a map in which all the modules must be initialized in a bootstrap class.
I would like to get rid of the need to do this manually (also, all the implementation classes share a distinctive feature).
Options that came to my mind:
Reflection (we can use the Scala Reflection API or any third-party library): an automated process, but it needs to initialize immediately at startup, and reflection is a pain.
Metaprogramming (Scalameta) + reflection: macros only change the code; the execution happens later.
Can we move the initialization process to compile time?
I know the compiler can optimize and replace code. For example, given this fragment before compilation:
val a = 5 + 5
after compilation the compiler changes that piece to 10. Can we use some directives or other tools to evaluate and execute some code at compile time and use only the final value?
Do you use any framework, or do you write your own? I answered a similar question about Guice here. You can use the same approach without Guice as well: instead of a Module you will have your Factory, which you need to initialize from somewhere, and during initialization you fill your map using reflection.
In general I think that is the easiest approach. Alternatively, you can write macros that just replace part of the reflective initialization, but I am not sure that would give you much profit (if I understand your question correctly, this initialization happens just once at startup).
I do not see how Scalameta can help you, except in the case where all your implementations are in a source tree available to you, so that you can analyze it and generate the initialization (similar to macros). That would add a plus (easier search for implementations) but also a minus: it will only work for implementations in your own sources.
Your example of compile-time optimization is not applicable. There you are talking about a compile-time constant (and even with arithmetic it can be problematic, see this comment), but your question requires specific run-time behavior. So from my point of view, "compile time" here can only mean code generation, either from macros or based on Scalameta.
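For illustration, here is a minimal sketch of the "fill the map once at startup" idea. It swaps raw reflection for java.util.ServiceLoader (a standard JDK discovery mechanism), and it assumes a hypothetical name method on Module as the distinctive feature mentioned in the question (Scala 2.13; use scala.collection.JavaConverters on older versions):

import java.util.ServiceLoader
import scala.jdk.CollectionConverters._

trait Module {
  def name: String // hypothetical: each implementation reports its own name
}

class ModuleNotFoundException(msg: String) extends RuntimeException(msg)

object ModuleRegistry {
  // Discovers every Module implementation listed in a
  // META-INF/services/<fully.qualified.Module> file and builds the map once.
  private val moduleMap: Map[String, Module] =
    ServiceLoader.load(classOf[Module]).asScala.map(m => m.name -> m).toMap

  def getModule(moduleName: String): Module =
    moduleMap.getOrElse(
      moduleName,
      throw new ModuleNotFoundException(s"$moduleName - Module could not be found."))
}

The registration files still have to exist, but they can be generated at build time, which removes the hand-written bootstrap class without resorting to macros.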

Scala compiler output after cleanup phase

I would like to develop a tool that post-processes a Scala program once all the heavy lifting has been completed by the Scala compiler. From what I understand, the different phases of the Scala compiler incrementally simplify the program in terms of its syntactic sugar and advanced features like lambdas, closures, pattern matching etc. However, I notice that what comes out of the so-called cleanup phase, which is the last phase before code generation, looks like Scala but is not really Scala.
Does anyone know personally, or can point me to a resource that can help me understand, the language that comes out of the cleanup phase?
To give you an example, in the output of the cleanup phase I see things like:
case <synthetic> val x1: Foo$Bar = l;
case9(){
  if (...some condition...)
    matchEnd8(scala.Predef.Set().empty())
  else
    case10()
};
My hypothesis is that this is the result of translating pattern matching, but it does not look like valid Scala syntax as far as I can tell (I am not an experienced Scala developer at all!).
I guess it all comes down to this: is it possible, in general, to convert the output of the cleanup phase into valid, compilable Scala code?
In general, at any stage in the scalac compiler (even right after parsing), the internal representation used by the compiler is not valid Scala code anymore. That is essentially because of the existence of labels and gotos, which you discovered.
A structure of the form
labelName(...params){
  ...
}
is a label definition, and a call of the form
labelName(...args)
is a jump to that label, assigning the ...args to the ...params.
Labels and gotos are used by scalac (and dotc, but with a different representation) to represent while and do..while loops (immediately after parsing), the translation of matches and the tail-recursive-optimized functions.
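You can see this for yourself by asking scalac to print the trees after a given phase. Given a simple while loop:

object Example {
  def count(n: Int): Int = {
    var i = 0
    while (i < n) i += 1
    i
  }
}

running scalac -Xprint:cleanup Example.scala shows the loop encoded as a label definition plus a jump back to it, roughly (exact output varies with the compiler version):

while$1(){
  if (i.<(n))
    {
      i = i.+(1);
      while$1()
    }
  else
    ()
};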
In general, there is no way to go back from the internal representation to valid Scala code, especially so far in the pipeline as after cleanup.

Scala Anorm - how to use it properly

Scala's Play framework claims that Anorm, i.e. writing your own SQL, is better than an ORM. One of the reasons given is that most often you only want to transfer data between the database and the frontend as JSON anyway. However, most tutorials, and even the Play documentation, give examples of parsing SQL's returned values into case classes, only to convert them into JSON again. So we still have an object-relational mapping anyway, or am I missing the point?
In my database there is a table with 33 columns. Declaring a case class takes me 33 lines; declaring a parser with the ~ operator takes another 33; and using a case statement to create an object takes another 66! Seriously, what am I doing wrong? Is there any shortcut? In Django the same thing takes only 33 lines.
If you're using Anorm within a Play application, then mapping your case class to a JSON object (assuming it has the fairly normal apply and unapply functions defined for it, which most do) should be pretty much as simple as defining an implicit which uses the Scala 2.10+ macro-based JSON-inception methods... so all you actually need is a definition like this:
implicit val myCaseFormats = Json.format[MyCaseClass]
where MyCaseClass is the name of your case class. You could even bake this into the parser combinator you use for deserialising row sets coming back from the database... that would dramatically clean up your code and cut down the amount you have to write.
See here for details on the Json macros:
https://www.playframework.com/documentation/2.1.1/ScalaJsonInception
I use this quite extensively in a pretty large code base, and it does make things quite clean.
As for your Anorm parsers, remember that you don't have to produce a case class instance as the result of a parse... you can actually return anything you like, which could be an indexed sequence of your column values (if you're using something like Shapeless to allow for mixed-type lists, etc.) or some other structure.
You do have macro support in Anorm as well, so the parsers for your case classes can be one-liners like this:
import anorm.{ Macro, RowParser }

val parser: RowParser[MyCaseClass] = Macro.namedParser[MyCaseClass]
If you want to do something custom (such as parsing straight to a JsValue), then you have the flexibility to hand-craft a more elaborate parser.
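Putting the two macros together, a table with 33 columns then needs little more than the case class itself. A minimal sketch (the table and class names are made up for illustration):

import java.sql.Connection
import anorm.{ Macro, RowParser, SQL }
import play.api.libs.json.{ JsValue, Json, OFormat }

case class MyCaseClass(id: Long, name: String) // ...your 33 columns go here

object MyDao {
  // One macro-generated line each for the JSON formatter and the row parser.
  implicit val myCaseFormat: OFormat[MyCaseClass] = Json.format[MyCaseClass]
  val myCaseParser: RowParser[MyCaseClass] = Macro.namedParser[MyCaseClass]

  // Read the rows and serve them as JSON in one pass.
  def rowsAsJson(implicit conn: Connection): JsValue =
    Json.toJson(SQL("SELECT * FROM my_table").as(myCaseParser.*))
}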
HTH

Scala Parser Combinators: Parsing in a stream

I'm using the native parser combinator library in Scala, and I'd like to use it to parse a number of large files. I have my combinators set up, but the file that I'm trying to parse is too large to be read into memory all at once. I'd like to be able to stream from an input file through my parser and write results back to disk, so that I don't need to store it all in memory at once. My current system looks something like this:
import scala.io.Source

val f = Source.fromFile("myfile")
parser.parse(parser.document.+, f.reader).get.map { _.writeToFile }
f.close
This reads the whole file in as it parses, which I'd like to avoid.
There is no easy or built-in way to accomplish this using Scala's parser combinators, which provide a facility for implementing parsing expression grammars.
Operators such as ||| (longest match) are largely incompatible with a stream-parsing model, as they require extensive backtracking capabilities. To accomplish what you are trying to do, you would need to reformulate your grammar so that no backtracking is ever required. This is generally much harder than it sounds.
As mentioned by others, your best bet would be to look into a preliminary phase where you chunk your input (e.g. by line) so that you can handle a portion of the stream at a time.
One easy way of doing it is to grab an Iterator from the Source object and then walk through the lines like so:
import scala.io.Source

val source = Source.fromFile("myFile")
val lines = source.getLines
for (line <- lines) {
  // Do magic with the line value
}
source.close // Close the file

But of course you will need to be able to feed the lines to your parser one by one.
Source: https://groups.google.com/forum/#!topic/scala-user/LPzpXo3sUVE
You might try the StreamReader class that is part of the parsing package.
You would use it something like:
import scala.io.Source.fromFile
import scala.util.parsing.input.StreamReader

val f = StreamReader(fromFile("myfile", "UTF-8").reader())
parseAll(parser, f)
The longest-match alternation another poster mentioned, combined with regex parsers (which call source.subSequence(0, source.length)), means that even StreamReader doesn't help.
The best kludgy answer I have is to use getLines, as others have mentioned, and to chunk, as the accepted answer suggests. My particular input required me to chunk two lines at a time. You could build an iterator out of the chunks to make it slightly less ugly, as in the sketch below.
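A minimal sketch of that chunking approach, assuming one logical record per two-line chunk (the record handling is left as a stub):

import scala.io.Source

val source = Source.fromFile("myfile")
try {
  // grouped(2) re-chunks the line iterator lazily, two lines at a time,
  // so only the current chunk is held in memory.
  for (chunk <- source.getLines().grouped(2)) {
    val record = chunk.mkString("\n")
    // feed `record` to your parser here, e.g.
    // parseAll(parser.document, record)
  }
} finally source.close()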

Specs2 - Unit specification style should not be used in concurrent environments

Specs2 promotes a functional style when dealing with Acceptance specifications (and even Unit specifications if we want).
The risks of using the old style (mutable style) are mentioned in the Specs2 philosophy and concern potential unwanted side effects:
The important things to know are:
side-effects are only used to build the specification fragments, by mutating a variable
they are also used to short-circuit the execution of an example as soon as there is a failure (by throwing an exception)
if you build fragments in the body of examples or execute the same specification concurrently, the sky should fall down
"context" management is to be done with case classes or traits (see org.specs2.examples.MutableSpec)
I can't see how the same specification could be run concurrently, since each specification is distinct from the others (separate class instances), even if we run the same one twice or more simultaneously.
Indeed, specFragments (a mutable variable):
protected[mutable] var specFragments: Fragments = new Fragments()
is declared in a trait called FragmentBuilder, not in an object (in the Scala sense, a singleton) or any other shared place, so specFragments is local to each Specification instance.
So what scenario might put the concurrency mechanism at risk?
I can't come up with a realistic (non-contrived) scenario showing the benefit of Specs2's functional style.
The issues with a mutable specification can only be seen when the specification is being built, not when it is executed. When building a mutable specification it's easy to have unexpected side effects:
import org.specs2._

val spec = new mutable.Specification {
  "example" >> ok
}
import spec._

def addTitle {
  // WHOOPS, forgot to remove this line!
  // test, add also an example
  "this is only for testing" >> ok
  "new title".title
}
addTitle
And the output is:
new title
+ example
+ this is only for testing
Total for specification new title
Finished in 0 ms
2 examples, 0 failure, 0 error
So you're right, the highlighted sentence in the guide ("execute the same specification concurrently") is ambiguous. The construction of the specification itself might be unsafe if several threads were building the same specification object, but not if they were merely running it (the whole process being called "execute" in that sentence).
Your other question is: what are the benefits of the "functional style"? The main benefit, from a user's point of view, is that it is another style of writing specifications, where all the text comes first and all the code is put elsewhere.
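For illustration, a minimal Acceptance-style specification in that spirit (adapted from the standard Specs2 examples):

import org.specs2.Specification

class HelloWorldSpec extends Specification { def is = s2"""
  The 'Hello world' string should
    contain 11 characters        $e1
    start with 'Hello'           $e2
  """

  // All the text lives in the s2 interpolated string above;
  // the code backing each example lives down here.
  def e1 = "Hello world" must haveSize(11)
  def e2 = "Hello world" must startWith("Hello")
}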
In conclusion, do not fear the mutable style of Specification if you like it!