I was reading the Scala compiler source recently. The Scala compiler itself is written in Scala, which made me wonder: how was it compiled the first time?
Scala didn't exist yet... right?
The first version of the Scala compiler was written in Java.
It was compiled with the previous version of the Scala compiler, which was written in Java. Note that the name of the current Scala compiler is "nsc", which stands for "new Scala compiler" – that should give you a hint that there existed a compiler before the current one. And that compiler was written in Java.
However, it is not actually necessary that the compiler written in Scala is the second compiler. It could also be the first compiler. How would that work?
Well, the first option is that before there was a Scala compiler, there could have been a Scala interpreter. Then, you could have used that interpreter to run the Scala compiler and compile itself.
The second option is that the compiler could have been translated by hand into another language. Technically speaking, this is also compilation or interpretation (depending on how exactly you perform it), just done by a human brain instead of a program. Niklaus Wirth did this with the first version of the Oberon compiler. The Oberon compiler was always written in Oberon, there was never a different version. It was hand-translated by his students into a Fortran dialect, the translated version compiled by the Fortran compiler, then the compiled version was used to compile the original version.
As for "Scala didn't exist yet... right?": that is actually a completely different question. A language can exist without there ever having been an implementation. For example, Plankalkül existed for roughly 30 years before it was implemented for the very first time. ISWIM has never been implemented, as far as I know, and yet it is a very important programming language.
Related
I succeeded in compiling a Scala project to machine code using Scala Native.
But I want to generate some executable code at runtime (I plan to implement a standalone compiler from a Scala-like language to machine code).
The goal is to have a self-hosted language, independent of JVM.
Is it possible to somehow embed the Scala Native compiler in my project?
As described in https://www.scala-native.org/en/v0.4.0/contrib/build.html,
the build of Scala Native contains the following JVM-based portions, of which the 1st, 3rd, and 4th seem like they would be necessary for a Scala Native compiler embedded in your own compiler:
The Scala Native sbt plugin and its dependencies (directory names are in parentheses). These are JVM projects.
sbtScalaNative (sbt-scala-native)
tools
nir, util
nirparser
testRunner (test-runner)
So Scala Native is not independent of the JVM, as the OP's question requires. That said, studying the NIR (Scala Native Intermediate Representation) portions of the Scala Native codebase might reveal a point, somewhere around the emission of NIR onward, at which a non-JVM NIR-to-LLVM backend could be factored out. The OP's "self-hosted language" that compiles "from a scala-like language to machine code" (via NIR and then LLVM IR) might then be achievable by extracting such a backend fragment from Scala Native's codebase after the parser and AST, which depend on Scala-proper's JVM-based parser; from NIR onward, the code lives in the JVM only because the parser and AST were already there.
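For context, the standard way to drive Scala Native today is through that JVM-hosted sbt plugin. A minimal sketch (version numbers are assumptions, check the current docs):

// project/plugins.sbt
addSbtPlugin("org.scala-native" % "sbt-scala-native" % "0.4.0")

// build.sbt
enablePlugins(ScalaNativePlugin)
scalaVersion := "2.13.4"   // a Scala version supported by the plugin (assumption)

// running `sbt nativeLink` then produces the native executable

So every build still starts from a JVM process running sbt, which is exactly the dependency the question is trying to avoid.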
I'm going through the tutorial and it looks like Scala.js only runs under sbt.
Are there bits of Scala.js (or the Scala environment generally) that are not written in Scala? Or cannot all the necessary bits go through Scala.js for some other reason? What am I missing?
Mostly, this is because the Scala compiler uses too many parts of the JDK that have not been ported to Scala.js (yet). Some of these parts, most notably those related to reading files (on the classpath, and source files), cannot be implemented in JavaScript as such (though they could be implemented for one particular platform, such as Node.js).
There is also the dependency on ASM, a Java bytecode manipulation library written in Java. Even though Scala.js compiles to JavaScript, the Java bytecode is still used for separate compilation (symbol lookup in previously compiled parts, such as libraries).
So, even though the Scala.js specific parts are written in a platform-independent way (e.g., we test that the Scala.js optimizer can optimize itself), there are a lot of parts in scalac that do not work out-of-the-box in Scala.js.
Apologies if this is a duplicate, I didn't hit on the magic keyword while searching.
I have a project where I pull in various dependencies. One of them (jooq) depends on scala 2.10, whereas my application depends on scala 2.11.x.
Although everything "works", I would like to better understand the runtime implications of doing something like this. How will the JVM resolve the different dependencies, and what kind of overhead could I be looking at?
I am trying to determine if it's worthwhile to fork jooq, and compile it against 2.11 (assuming it will compile and work under 2.11).
Scala is not binary compatible between major versions (2.10 to 2.11 for example). This means that there are no guarantees that a library that is compiled for Scala 2.10 will work in a project using 2.11. You might be lucky enough that it works, but I would definitely not depend on that luck for any important codebase.
This is the reason why Scala libraries always have the Scala version in their artifact name, and why sbt has special syntax for dependencies to pick the right library build for the Scala version in use.
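For example (the coordinates below are placeholders, not real artifacts), the %% operator appends the Scala binary-version suffix to the artifact name, so the matching build is selected automatically:

// build.sbt
scalaVersion := "2.11.8"

// %% resolves to the artifact cross-built for the current Scala binary version,
// i.e. some-library_2.11 here
libraryDependencies += "com.example" %% "some-library" % "1.0.0"

// a single % takes the artifact name literally, so this would pin the 2.10 build
libraryDependencies += "com.example" % "some-library_2.10" % "1.0.0"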
On a side note, Martin Odersky (Scala's "father") has been proposing a solution to this problem during the past year: storing an intermediate representation along with the bytecode to allow automagical recompilation to a newer Scala version.
You run the risk of runtime exceptions.
As Scala 2.10 and 2.11 are quite similar, the danger is not as big as it was going from 2.9 to 2.10 or from 2.8 to 2.9, but it is still there, and if this is meant to be production code, you should definitely try to raise jooq to 2.11.
Testing runtime behavior is very well documented, but with the advent of powerful type systems and macro systems, one might also be interested in testing compile-time behavior, for instance when writing a library that provides compile-time guarantees. Say I'm building a set of test matchers and I want to make sure a matcher is as type-safe as I claim it to be:
List(1,2) must beEqualTo(Set(1,2)) // should fail at compile-time
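One way I could imagine expressing this as a test, sketched here under the assumption that ScalaTest is available (its assertTypeError / assertCompiles assertions take the code under test as a string):

import org.scalatest.funsuite.AnyFunSuite

class MatcherCompileTimeSpec extends AnyFunSuite {
  test("comparing a List to a Set should not type-check") {
    // the quoted snippet is parsed and type-checked when the test itself is compiled;
    // the matcher DSL inside the string is hypothetical
    assertTypeError("""List(1, 2) must beEqualTo(Set(1, 2))""")
  }

  test("comparing two Lists should compile") {
    assertCompiles("""List(1, 2) == List(1, 2)""")
  }
}

But I don't know whether that is how people usually do it, hence the question below.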
I can see in the scala compiler project that most of the tests are functional tests where the compiler output is asserted by comparing it with a reference file.
Is there a convention for such tests? An SBT plugin?
I already have an interpreter for my language.
It is implemented with:
parser -> scala parser combinators;
AST -> scala case classes;
evaluator -> scala pattern matching.
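Roughly, the shape of it is something like this (a simplified illustrative sketch, not the real grammar; it assumes the scala-parser-combinators module is on the classpath):

import scala.util.parsing.combinator.JavaTokenParsers

sealed trait Expr                                   // AST as case classes
case class Num(value: Int) extends Expr
case class Add(lhs: Expr, rhs: Expr) extends Expr

object Eval {
  def eval(e: Expr): Int = e match {                // evaluator via pattern matching
    case Num(n)    => n
    case Add(l, r) => eval(l) + eval(r)
  }
}

object MiniParser extends JavaTokenParsers {        // parser combinators
  def num: Parser[Expr]  = wholeNumber ^^ (s => Num(s.toInt))
  def expr: Parser[Expr] = num ~ rep("+" ~> num) ^^ {
    case first ~ rest => rest.foldLeft(first)(Add(_, _))
  }
  def parseExpr(s: String): ParseResult[Expr] = parseAll(expr, s)
}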
Now I want to compile the AST to native code and, hopefully, to Java bytecode as well.
I am thinking of two main options to accomplish at least one of these two tasks:
generate LLVM IR code;
generate C code and/or Java code;
Note: GCJ and SLEM seem to be unusable (GCJ only works with simple code, as far as I could test).
Short Answer
I'd go with Java Bytecode.
Long Answer
The thing is, the higher-level the language you compile to,
The slower and more cumbersome the compilation process is
The more flexibility you get
For instance, if you compile to C, you can then get a lot of possible backends for C compilers - you can generate Java Bytecode, LLVM IR, asm for many architectures, etc., but you basically compile twice. If you choose LLVM IR you're already halfway to compiling to asm (parsing LLVM IR is far faster than parsing a language such as C), but you'll have a very hard time getting Java Bytecode from that. Both intermediate languages can compile to native, though.
I think compiling to some intermediate representation is preferable to compiling to a general-purpose programming language. Between LLVM IR and Java Bytecode I'd go with Java Bytecode - even though I personally like LLVM IR better - because you wrote that you basically want both, and while you can sort of convert Java Bytecode to LLVM IR, the other direction is very difficult.
The only remaining difficulty is translating your language to Java Bytecode. This related question about tools that can make it easier might help.
Finally, another advantage of Java Bytecode is that it'll play well with your interpreter, effectively allowing you to easily generate a hotspot-like JITter (or even a trace compiler).
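To make the bytecode route a bit more concrete, here is a minimal sketch (my own illustration, not something this answer prescribes) that emits a tiny class file with the ASM library; the class name, string, and output path are arbitrary:

import org.objectweb.asm.{ClassWriter, Opcodes}
import java.nio.file.{Files, Paths}

object EmitHello {
  def main(args: Array[String]): Unit = {
    val cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES)
    // public class Hello extends java.lang.Object
    cw.visit(Opcodes.V1_8, Opcodes.ACC_PUBLIC, "Hello", null, "java/lang/Object", null)

    // public static void main(String[] args) { System.out.println("..."); }
    val mv = cw.visitMethod(Opcodes.ACC_PUBLIC | Opcodes.ACC_STATIC, "main",
      "([Ljava/lang/String;)V", null, null)
    mv.visitCode()
    mv.visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;")
    mv.visitLdcInsn("hello from generated bytecode")
    mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println",
      "(Ljava/lang/String;)V", false)
    mv.visitInsn(Opcodes.RETURN)
    mv.visitMaxs(0, 0)   // actual sizes are recomputed because of COMPUTE_FRAMES
    mv.visitEnd()
    cw.visitEnd()

    Files.write(Paths.get("Hello.class"), cw.toByteArray)  // then: java Hello
  }
}

A compiler backend would of course drive the same MethodVisitor calls from its AST instead of hard-coding them.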
I agree with @Oak about the choice of bytecode as the simplest target. A possible Scala library for generating bytecode is CafeBabe by @psuter.
You cannot do everything with it, but for small projects it could be sufficient. The syntax is also very clear. Please see the project wiki for more information.