Scala Compiler Plugins with a Macro API

Typically, Scala compiler plugins operate directly on the compiler's internal data structures and utilities. Unfortunately, those compiler APIs change rapidly, with every minor release. As a result, the effort required to maintain a compiler plugin is much larger than the effort required to maintain a Scala macro.
Is it possible to write a compiler plugin that uses the stable API of Scala macros? How can one do that?

It's unlikely that you can be shielded from changes to the infrastructure (the order of phases, the contracts of classes like PluginComponent, etc.), but that part is pretty stable anyway. However, it's totally possible to refrain from using scala.tools.nsc.Global, which is what actually has no compatibility guarantees, and to use the scala.reflect.macros.Universe subset of it instead.
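A minimal sketch of what that can look like, assuming Scala 2 and the standard plugin wiring (the plugin and phase names are made up; the key line is the upcast of global to scala.reflect.macros.Universe):

```scala
import scala.tools.nsc.{Global, Phase}
import scala.tools.nsc.plugins.{Plugin, PluginComponent}

// Hypothetical plugin: only the upcast to scala.reflect.macros.Universe
// matters here; the rest is ordinary plugin boilerplate.
class MacroApiPlugin(val global: Global) extends Plugin {
  val name = "macroApiDemo"
  val description = "traverses trees through the stable macro API"
  val components = List[PluginComponent](component)

  private object component extends PluginComponent {
    val global: MacroApiPlugin.this.global.type = MacroApiPlugin.this.global
    val phaseName = "macro-api-demo"
    val runsAfter = List("typer")

    // The stable view of the compiler cake: Global extends
    // scala.reflect.macros.Universe, so this upcast always succeeds.
    private val u: scala.reflect.macros.Universe = global
    import u._

    def newPhase(prev: Phase): Phase = new StdPhase(prev) {
      def apply(unit: global.CompilationUnit): Unit = {
        // global.Tree and u.Tree are the same type at runtime; the cast
        // only moves us onto the stable, source-compatible API.
        val body = unit.body.asInstanceOf[u.Tree]
        body.foreach {
          case dd: DefDef => println(s"saw method ${dd.name}")
          case _          =>
        }
      }
    }
  }
}
```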

Related

Compilation / Code Generation of External Scala DSL

My understanding is that it is quite simple to create & parse an external DSL in Scala (e.g. representing rules). Is my assumption correct that the DSL can only be interpreted at runtime, and that it does not support code generation (like ANTLR) for achieving better performance?
EDIT: To be more precise, my question is whether I could achieve this (create an external domain-specific language and generate Java/Scala code) with built-in Scala tools/libraries (e.g. http://www.artima.com/pins1ed/combinator-parsing.html), that is, without writing a whole parser / code generator completely by myself in Scala. It's also clear that you can achieve this with third-party tools, but then you have to learn additional stuff and take on additional dependencies. I'm new to the area of implementing DSLs, so I have no gut feeling yet for when to use external tools like ANTLR and what you can (with reasonable effort) do with Scala's on-board features.
Is my assumption correct that the DSL can only be interpreted at runtime, and that it does not support code generation (like ANTLR) for achieving better performance?
No, this is wrong. It is possible to write a compiler in Scala; after all, Scala is Turing-complete (i.e. you can write anything), and you don't even need Turing-completeness for a compiler.
Some examples of compilers written in Scala include:
the Scala compiler itself (in all its variations, Scala-JVM, Scala.js, Scala-native, Scala-virtualized, Typelevel Scala, the abandoned Scala.NET, …)
the Dotty compiler
Scalisp
Scalispa
… and many others …
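For a sense of what the on-board route looks like, the sketch below uses the standard scala-parser-combinators module to parse a tiny, made-up arithmetic rule DSL and then generates Scala source from the AST instead of interpreting it. All names are illustrative:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

sealed trait Expr
case class Num(value: Double) extends Expr
case class Add(l: Expr, r: Expr) extends Expr
case class Mul(l: Expr, r: Expr) extends Expr

object RuleCompiler extends JavaTokenParsers {
  // Grammar: expr = term {"+" term}; term = factor {"*" factor}
  def expr: Parser[Expr] = term ~ rep("+" ~> term) ^^ {
    case t ~ ts => ts.foldLeft(t)(Add)
  }
  def term: Parser[Expr] = factor ~ rep("*" ~> factor) ^^ {
    case f ~ fs => fs.foldLeft(f)(Mul)
  }
  def factor: Parser[Expr] =
    floatingPointNumber ^^ (s => Num(s.toDouble)) | "(" ~> expr <~ ")"

  // Code generation: turn the AST into Scala source text.
  def emit(e: Expr): String = e match {
    case Num(v)    => v.toString
    case Add(l, r) => s"(${emit(l)} + ${emit(r)})"
    case Mul(l, r) => s"(${emit(l)} * ${emit(r)})"
  }

  def main(args: Array[String]): Unit =
    // Prints the generated Scala expression: (1.0 + (2.0 * 3.0))
    println(parseAll(expr, "1 + 2 * 3").map(emit).get)
}
```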

Is it possible/useful to transpile Scala to golang?

Scala Native has recently been released, but the garbage collector it uses (for now) is extremely rudimentary, which makes it unsuitable for serious use.
So I wonder: why not just transpile Scala to Go (a la Scala.js)? It's going to be a fast, portable runtime. And their GC is getting better and better. Not to mention the inheritance of a great concurrency model: channels and goroutines.
So why did scala-native choose to go so low level with LLVM?
What would be the catch with a golang transpiler?
There are two kinds of languages that are good targets for compilers:
1. Languages whose semantics closely match the source language's semantics.
2. Languages which have very low-level and thus very general semantics (or, one might argue, no semantics at all).
Examples for #1 include: compiling ECMAScript 2015 to ECMAScript 5 (most language additions were specifically designed as syntactic sugar for existing features, you just have to desugar them), compiling CoffeeScript to ECMAScript, compiling TypeScript to ECMAScript (basically, after type checking, just erase the types and you are done), compiling Java to JVM byte code, compiling C♯ to CLI CIL bytecode, compiling Python to CPython bytecode, compiling Python to PyPy bytecode, compiling Ruby to YARV bytecode, compiling Ruby to Rubinius bytecode, compiling ECMAScript to SpiderMonkey bytecode.
Examples for #2 include: machine code for a general purpose CPU (RISC even more so), C--, LLVM.
Compiling Scala to Go fits neither of the two. Their semantics are very different.
You need either a language with powerful low-level semantics as the target language, so that you can build your own semantics on top, or you need a language with closely matching semantics, so that you can map your own semantics into the target language.
In fact, even JVM bytecode is already too high-level! It has constructs such as classes that do not match constructs such as Scala's traits, so there has to be a fairly complex encoding of traits into classes and interfaces. Likewise, before invokedynamic, it was actually pretty much impossible to represent dynamic dispatch on structural types in JVM bytecode. The Scala compiler had to resort to reflection, or in other words, deliberately stepping outside of the semantics of JVM bytecode (which resulted in a terrible performance overhead for method dispatch on structural types compared to method dispatch on other class types, even though both are the exact same thing).
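To make the structural-types point concrete, here is a small Scala illustration (the classes are made up); the call goes through runtime reflection, which is why the language import is required:

```scala
import scala.language.reflectiveCalls

object StructuralDemo {
  // A structural type: any object with a quack(): String method qualifies.
  // There is no common interface to dispatch through, so on the JVM the
  // call is implemented with runtime reflection.
  def quack(d: { def quack(): String }): String = d.quack()

  class Duck  { def quack(): String = "quack" }
  class Robot { def quack(): String = "beep" }

  def main(args: Array[String]): Unit = {
    println(quack(new Duck()))  // quack
    println(quack(new Robot())) // beep
  }
}
```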
Proper Tail Calls are another example: we would like to have them in Scala, but because JVM bytecode is not powerful enough to express them without a very complex mapping (basically, you have to forego using the JVM's call stack altogether and manage your own stack, which destroys both performance and Java interoperability), it was decided to not have them in the language.
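A small illustration of that limitation: scalac can rewrite self tail calls into loops (which @tailrec verifies), but mutual tail recursion has no JVM-level encoding and consumes stack frames:

```scala
import scala.annotation.tailrec

object TailCalls {
  // A self tail call: scalac rewrites it into a loop, so @tailrec compiles.
  @tailrec def countdown(n: Int): Unit =
    if (n > 0) countdown(n - 1)

  // Mutual tail calls cannot be expressed in JVM bytecode without managing
  // a separate stack: each call below consumes a JVM stack frame, and
  // annotating either method with @tailrec would be a compile-time error.
  def isEven(n: Int): Boolean = if (n == 0) true else isOdd(n - 1)
  def isOdd(n: Int): Boolean  = if (n == 0) false else isEven(n - 1)
}
```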
Go has some of the same problems: in order to implement Scala's expressive non-local control-flow constructs such as exceptions or threads, we need an equally expressive non-local control-flow construct to map to. For typical target languages, this "expressive non-local control-flow construct" is either continuations or the venerable GOTO. Go has GOTO, but it is deliberately limited in its "non-localness". For writing code by humans, limiting the expressive power of GOTO is a good thing, but for a compiler target language, not so much.
It is very likely possible to rig up powerful control-flow using goroutines and channels, but now we are already leaving the comfortable confines of just mapping Scala semantics to Go semantics, and start building Scala high-level semantics on top of Go high-level semantics that weren't designed for such usage. Goroutines weren't designed as a general control-flow construct to build other kinds of control-flow on top of. That's not what they're good at!
So why did scala-native choose to go so low level with LLVM?
Because that's precisely what LLVM was designed for and is good at.
What would be the catch with a golang transpiler?
The semantics of the two languages are too different for a direct mapping and Go's semantics are not designed for building different language semantics on top of.
their GC is getting better and better
So can Scala Native's. As far as I understand, the current choice of Boehm-Demers-Weiser is basically one of laziness: it's there, it works, you can drop it into your code and it'll just do its thing.
Note that changing the GC is under discussion. There are other GCs which are designed as drop-ins rather than being tightly coupled to the host VM's object layout. E.g. IBM is currently in the process of re-structuring J9, their high-performance JVM, into a set of loosely coupled, independently re-usable "runtime building blocks" components and releasing them under a permissive open source license.
The project is called "Eclipse OMR" (source on GitHub) and it is already production-ready: the Java 8 implementation of IBM J9 was built completely out of OMR components. There is a Ruby + OMR project which demonstrates how the components can easily be integrated into an existing language runtime, because the components themselves assume no language semantics and no specific memory or object layout. The commit which swaps out the GC and adds a JIT and a profiler clocks in at just over 10000 lines. It isn't production-ready, but it boots and runs Rails. They also have a similar project for CPython (not public yet).
why not just transpile Scala to Go (a la Scala.js)?
Note that Scala.js has a lot of the same problems I mentioned above. But they are doing it anyway, because the gain is huge: you get access to every web browser on the planet. There is no comparable gain for a hypothetical Scala.go.
There's a reason why there are initiatives for getting low-level semantics into the browser such as asm.js and WebAssembly, precisely because compiling a high-level language to another high-level language always has this "semantic gap" you need to overcome.
In fact, note that even for lowish-level languages that were specifically designed as compilation targets for a specific language, you can still run into trouble. E.g. Java has generics, JVM bytecode doesn't. Java has inner classes, JVM bytecode doesn't. Java has anonymous classes, JVM bytecode doesn't. All of these have to be encoded somehow, and specifically the encoding (or rather non-encoding) of generics has caused all sorts of pain.
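The generics pain is easy to observe even from Scala; a tiny illustration:

```scala
object ErasureDemo {
  def main(args: Array[String]): Unit = {
    val ints: List[Int]    = List(1, 2, 3)
    val strs: List[String] = List("a", "b")

    // Generics exist only at compile time: after erasure both values are
    // instances of the same runtime class.
    println(ints.getClass == strs.getClass) // true

    // Which is why a runtime check like this is flagged as "unchecked":
    // only "is it a List at all" can actually be tested.
    println(strs.isInstanceOf[List[Int]]) // true(!)
  }
}
```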

Scala Metaprogramming at Runtime

I'm building a tool that will receive data with an unpredictable structure, and I want to generate case classes to match the structure of the received data.
I'm trying to figure out whether it's possible to generate case classes at runtime. The structure will be known only at runtime.
It's something similar to what macros do, but at runtime.
I've found this project on the internet:
mars
which is very close to what I want to do, but I couldn't find out whether it was successful or not.
Another way of doing it is to generate the code, compile it, and put the result on the classpath, like IScala does, in order to use the code interactively. But I don't think that this will scale.
Has anybody already done something like runtime code generation?
This question was also posted on the scala-user mailing list.
UPDATE: (as per the comments)
If all you want is throw-away code generated at runtime to be fed into a library that cannot work with just lists and maps, and not code to be stored and used later, it would make sense to look for solutions to this problem for Java or the JVM in general. That is, unless the library requires some Scala-specific features not available in vanilla JVM bytecode (Scala adds some extras to the bytecode, which Java code doesn't need/have).
What is the benefit of generating statically typed code dynamically, as opposed to using a dynamic data structure?
I would not attempt that at all. Just use a structure such as nested lists and maps.
Runtime code generation is one of the purposes of the Mars Project. Mars is under development; at the moment there is no release version. Mars requires its own toolchain to expand macros at runtime and is expected to use several features unique to scala.meta (http://scalameta.org/), for example AST interpretation and AST persistence. Currently we are working on AST typechecking in scala-reflect, which is required for runtime macro expansion.
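Independent of Mars, the "generate, compile, and load" route mentioned in the question can be done with the on-board ToolBox API from scala-reflect (it needs scala-compiler on the runtime classpath). A minimal sketch; the generated Record class is purely illustrative:

```scala
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

object RuntimeCodegen {
  def main(args: Array[String]): Unit = {
    // A ToolBox can parse, typecheck, compile, and evaluate source at runtime.
    val tb = currentMirror.mkToolBox()

    // Imagine this string was assembled from a structure discovered at
    // runtime; the case class name and fields are illustrative.
    val code =
      """
      case class Record(name: String, value: Int)
      Record("answer", 42)
      """

    val record = tb.eval(tb.parse(code))
    println(record) // prints: Record(answer,42)
  }
}
```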

IntelliJ IDEA 11, Scala slow execution [duplicate]

I've been programming in Scala for a while and I like it, but one thing I'm annoyed by is the time it takes to compile programs. It seems like a small thing, but with Java I could make small changes to my program, click the run button in NetBeans, and BOOM, it's running; over time, compiling in Scala seems to consume a lot of time. I hear that with many large projects a scripting language becomes very important because of the time compilation takes, a need that I didn't see arising when I was using Java.
But I'm coming from Java which, as I understand it, compiles faster than any other compiled language, and is fast because of the very reason I switched to Scala (it's a very simple language).
So I wanted to ask: can I make Scala compile faster, and will scalac ever be as fast as javac?
There are two aspects to the (lack of) speed of the Scala compiler.
1. Greater startup overhead
- Scalac itself consists of a LOT of classes which have to be loaded and JIT-compiled.
- Scalac has to search the classpath for all root packages and files. Depending on the size of your classpath, this can take one to three extra seconds.
Overall, expect a startup overhead of scalac of 4-8 seconds, longer if you run it the first time so disk-caches are not filled.
Scala's answer to startup overhead is to either use fsc or to do continuous building with sbt. IntelliJ needs to be configured to use either option, otherwise its overhead even for small files is unreasonably large.
2. Slower compilation speed. Scalac manages about 500 to 1000 lines/sec; javac manages about 10 times that. There are several reasons for this:
- Type inference is costly, in particular if it involves implicit search.
- Scalac has to do type checking twice: once according to Scala's rules and a second time after erasure according to Java's rules.
- Besides type checking, there are about 15 transformation steps to go from Scala to Java, which all take time.
- Scala typically generates many more classes per given file size than Java, in particular if functional idioms are heavily used. Bytecode generation and class writing take time.
On the other hand, a 1000-line Scala program might correspond to a 2-3K line Java program, so some of the slower speed when counted in lines per second has to be balanced against more functionality per line.
We are working on speed improvements (for instance by generating class files in parallel), but one cannot expect miracles on this front. Scalac will never be as fast as javac.
I believe the solution will lie in compile servers like fsc in conjunction with good dependency analysis so that only the minimal set of files has to be recompiled. We are working on that, too.
The Scala compiler is more sophisticated than Java's, providing type inference, implicit conversion, and a much more powerful type system. These features don't come for free, so I wouldn't expect scalac to ever be as fast as javac. This reflects a trade-off between the programmer doing the work and the compiler doing the work.
That said, compile times have already improved noticeably going from Scala 2.7 to Scala 2.8, and I expect the improvements to continue now that the dust has settled on 2.8. This page documents some of the ongoing efforts and ideas to improve the performance of the Scala compiler.
Martin Odersky provides much more detail in his answer.
You should be aware that Scala compilation takes at least an order of magnitude longer than Java compilation. The reasons for this are as follows:
Naming conventions (a file XY.scala need not contain a class called XY and may contain multiple top-level classes). The compiler may therefore have to search more source files to find a given class/trait/object identifier.
Implicits - heavy use of implicits means the compiler needs to search all in-scope implicit conversions for a given method and rank them to find the "right" one. (I.e. the compiler has a massively increased search domain when locating a method.)
The type system - the Scala type system is way more complicated than Java's and hence takes more CPU time.
Type inference - type inference is computationally expensive and a job that javac does not need to do at all.
scalac includes an 8-bit simulator of a fully armed and operational battle station, viewable using the magic key combination CTRL-ALT-F12 during the GenICode compilation phase.
The best way to do Scala is with IDEA and SBT. Set up an elementary SBT project (which it'll do for you, if you like) and run it in automatic compile mode (command ~compile) and when you save your project, SBT will recompile it.
You can also use the SBT plug-in for IDEA and attach an SBT action to each of your Run Configurations. The SBT plug-in also gives you an interactive SBT console within IDEA.
Either way (SBT running externally or SBT plug-in), SBT stays running and thus all the classes used in building your project get "warmed up" and JIT-ed and the start-up overhead is eliminated. Additionally, SBT compiles only source files that need it. It is by far the most efficient way to build Scala programs.
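For reference, a minimal sketch of such a setup (project name and Scala version are illustrative):

```scala
// build.sbt -- sbt build definitions are themselves written in Scala
name := "my-project"        // illustrative project name
scalaVersion := "2.13.14"   // illustrative Scala version

// Then run `sbt ~compile` once and leave it going: every save triggers an
// incremental recompile inside the already-warm JVM.
```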
The latest revisions of Scala-IDE (Eclipse) are much better at managing incremental compilation.
See "What’s the best Scala build system?" for more.
The other solution is to integrate fsc - Fast offline compiler for the Scala 2 language - (as illustrated in this blog post) as a builder in your IDE.
But not directly in Eclipse, though, as Daniel Spiewak mentions in the comments:
You shouldn't be using FSC within Eclipse directly, if only because Eclipse is already using FSC under the surface.
FSC is basically a thin layer on top of the resident compiler which is precisely the mechanism used by Eclipse to compile Scala projects.
Finally, as Jackson Davis reminds me in the comments:
sbt (Simple Build Tool) also includes some kind of "incremental" compilation (through triggered execution), even though it is not perfect, and enhanced incremental compilation is in the works for the upcoming 0.9 sbt version.
Use fsc - it is a fast Scala compiler that sits as a background task and does not need to be loaded every time. It can reuse a previous compiler instance.
I'm not sure if the NetBeans Scala plugin supports fsc (the documentation says so), but I couldn't make it work. Try the nightly builds of the plugin.
You can use the JRebel plugin which is free for Scala. So you can kind of "develop in the debugger" and JRebel would always reload the changed class on the spot.
I read a statement somewhere by Martin Odersky himself saying that the searches for implicits (the compiler must make sure there is not more than one implicit for the same conversion, to rule out ambiguity) can keep the compiler busy. So it might be a good idea to handle implicits with care.
If it doesn't have to be 100% Scala, but also something similar, you might give Kotlin a try.
-- Oliver
I'm sure this will be down-voted, but extremely rapid turn-around is not always conducive to quality or productivity.
Take time to think more carefully and execute fewer development micro-cycles. Good Scala code is denser and more essential (i.e., free from incidental details and complexity). It demands more thought and that takes time (at least at first). You can progress well with fewer code / test / debug cycles that are individually a little longer and still improve your productivity and the quality of your work.
In short: Seek an optimum working pattern better suited to Scala.

AOT compilation or native code compilation of Scala?

My Scala application needs to perform simple operations over large arrays of integers & doubles, and performance is a bottleneck. I've struggled to put my finger on exactly when certain optimizations kick in (e.g. escape analysis), although I can observe their results through various benchmarks. I'd love to do some AOT compilation of my Scala application, so I can see or enforce (or implement) certain optimizations ... or compile to native code, if possible, so I can cut corners like bounds checking and observe whether it makes a difference.
My question: what alternative compilation methods work for scala? I'm interested in tools like llvm, vmkit, soot, gcj, etc. Who is using those successfully with scala at this point, or are none of these methods currently compatible or maintained?
GCJ can compile JVM classes to native code. This blog describes tests done with Scala code: http://lampblogs.epfl.ch/b2evolution/blogs/index.php/2006/10/02/scala_goes_native_almost?blog=7
To answer my own question, there is no alternative backend for Scala except for the JVM. The .NET backend has been in development for a long time, but its status is unclear. The LLVM backend is also not yet ready for use, and it's not clear what its future is.