What would be a good approach to generate native code from an interpreter written with Scala parser combinators?

I already have an interpreter for my language.
It is implemented with:
parser -> scala parser combinators;
AST -> scala case classes;
evaluator -> scala pattern matching.
Now I want to compile the AST to native code and, hopefully, Java bytecode.
I am thinking of two main options to accomplish at least one of these two tasks:
generate LLVM IR code;
generate C code and/or Java code;
Note: GCJ and SLEM seem to be unusable (in my tests, GCJ worked only with simple code).

Short Answer
I'd go with Java Bytecode.
Long Answer
The thing is, the higher-level the language you compile to:
the slower and more cumbersome the compilation process is;
the more flexibility you get.
For instance, if you compile to C, you can then get a lot of possible backends for C compilers - you can generate Java Bytecode, LLVM IR, asm for many architectures, etc., but you basically compile twice. If you choose LLVM IR you're already halfway to compiling to asm (parsing LLVM IR is far faster than parsing a language such as C), but you'll have a very hard time getting Java Bytecode from that. Both intermediate languages can compile to native, though.
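To make the trade-off concrete, here is a minimal sketch that emits both C source and LLVM IR text from the same tree. The `Expr` AST and the `Backends` object are hypothetical stand-ins for illustration, not the asker's actual classes, and only integer addition and multiplication are covered.

```scala
// Hypothetical AST for a tiny arithmetic language (an assumption, not the asker's real AST).
sealed trait Expr
case class Num(value: Int) extends Expr
case class Add(lhs: Expr, rhs: Expr) extends Expr
case class Mul(lhs: Expr, rhs: Expr) extends Expr

object Backends {
  // Backend 1: emit a C expression; a C compiler still has to parse it afterwards.
  def toC(e: Expr): String = e match {
    case Num(v)    => v.toString
    case Add(l, r) => "(" + toC(l) + " + " + toC(r) + ")"
    case Mul(l, r) => "(" + toC(l) + " * " + toC(r) + ")"
  }

  // Backend 2: emit LLVM IR text, one instruction per line, each result in a fresh register.
  def toLlvm(e: Expr): String = {
    val body = new StringBuilder
    var counter = 0
    def fresh(): String = { counter += 1; "%t" + counter }

    // Returns the operand (a constant or a register name) that holds the value of `x`.
    def go(x: Expr): String = x match {
      case Num(v) => v.toString
      case Add(l, r) =>
        val a = go(l)
        val b = go(r)
        val t = fresh()
        body ++= "  " + t + " = add i32 " + a + ", " + b + "\n"
        t
      case Mul(l, r) =>
        val a = go(l)
        val b = go(r)
        val t = fresh()
        body ++= "  " + t + " = mul i32 " + a + ", " + b + "\n"
        t
    }

    val result = go(e)
    "define i32 @main() {\n" + body.toString + "  ret i32 " + result + "\n}\n"
  }
}
```

For `Add(Num(1), Mul(Num(2), Num(3)))`, `toC` yields an expression that still has to be wrapped in a C function and compiled, while `toLlvm` yields a complete function definition that LLVM tools can consume directly.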
I think compiling to some intermediate representation is preferable to compiling to a general-purpose programming language. Between LLVM IR and Java Bytecode I'd go with Java Bytecode - even though I personally like LLVM IR better - because you wrote that you basically want both, and while you can sort of convert Java Bytecode to LLVM IR, the other direction is very difficult.
The only remaining difficulty is translating your language to Java Bytecode. This related question about tools that can make it easier might help.
Finally, another advantage of Java Bytecode is that it'll play well with your interpreter, effectively allowing you to easily build a HotSpot-like JIT compiler (or even a trace compiler).
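As a rough illustration of what translating an AST to a stack-based target involves, the sketch below compiles a hypothetical expression AST to a list of stack instructions and checks the result against a pattern-matching evaluator. The instruction names mirror the JVM opcodes `ldc`, `iadd` and `imul`, but this is an in-memory model only; it does not write class files, and all type and object names are assumptions.

```scala
// Hypothetical AST (an assumption for illustration).
sealed trait Expr
case class Num(value: Int) extends Expr
case class Add(lhs: Expr, rhs: Expr) extends Expr
case class Mul(lhs: Expr, rhs: Expr) extends Expr

// Stack instructions named after the JVM opcodes they correspond to.
sealed trait Instr
case class Ldc(value: Int) extends Instr
case object IAdd extends Instr
case object IMul extends Instr

object StackCompiler {
  // The existing style of evaluator: direct pattern matching on the tree.
  def eval(e: Expr): Int = e match {
    case Num(v)    => v
    case Add(l, r) => eval(l) + eval(r)
    case Mul(l, r) => eval(l) * eval(r)
  }

  // Post-order traversal: push both operands, then emit the operator.
  def compile(e: Expr): List[Instr] = e match {
    case Num(v)    => List(Ldc(v))
    case Add(l, r) => (compile(l) ++ compile(r)) :+ IAdd
    case Mul(l, r) => (compile(l) ++ compile(r)) :+ IMul
  }

  // Tiny stack machine, used here only to check the compiled code.
  def run(code: List[Instr]): Int = {
    val stack = code.foldLeft(List.empty[Int]) {
      case (st, Ldc(v))           => v :: st
      case (b :: a :: rest, IAdd) => (a + b) :: rest
      case (b :: a :: rest, IMul) => (a * b) :: rest
      case (st, op)               => sys.error("stack underflow at " + op + " with " + st)
    }
    stack.head
  }
}
```

Because the evaluator and the compiler share the same AST, `StackCompiler.run(StackCompiler.compile(e))` can be compared against `StackCompiler.eval(e)` as a regression check while the compiler grows.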

I agree with @Oak about the choice of bytecode as the simplest target. A possible Scala library to generate bytecode is CafeBabe by @psuter.
You cannot do everything with it, but for a small project it could be sufficient. The syntax is also very clear. Please see the project wiki for more information.

Related

Can I compile a string containing Scala code to machine code using Scala Native as a library of my program?

I succeeded in compiling a Scala project to machine code using Scala Native.
But I want to generate some executable code at runtime (I plan to implement a standalone compiler from a scala-like language to machine code).
The goal is to have a self-hosted language, independent of the JVM.
Is it possible to somehow embed the Scala Native compiler in my project?
As described in https://www.scala-native.org/en/v0.4.0/contrib/build.html, the build of Scala Native contains the following JVM-based portions, of which the 1st, 3rd, and 4th seem necessary for a Scala Native compiler embedded in your own compiler:
The Scala Native sbt plugin and its dependencies (directory names are in parentheses). These are JVM projects.
sbtScalaNative (sbt-scala-native)
tools
nir, util
nirparser
testRunner (test-runner)
So Scala Native is not independent of the JVM, which is what the OP's question seeks. However, studying the NIR (Scala Native's intermediate representation) portions of the Scala Native codebase might reveal a point, somewhere from the emission of NIR onward, at which a non-JVM NIR-to-LLVM backend could be factored out. The OP's “self-hosted language” that compiles “from a scala-like language to machine code” (via NIR and then LLVM IR) might then be possible, derived from a backend fragment of Scala Native's codebase. The front end (the parser, and perhaps the AST) depends on Scala proper's JVM-based parser, whereas everything from NIR onward runs on the JVM simply because the parser and AST were already there.

GraalVM: How to implement compiler optimizations?

I want to develop a tool that performs certain optimizations in a program based on the program structure. For example, let's say I want to identify if-else within a loop, and my tool shall rewrite it into two loops.
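One reading of this rewrite is loop unswitching, which is only valid when the condition does not change inside the loop. A minimal, language-independent sketch on a toy statement tree follows; the `Stmt` types and the `Unswitch` object are hypothetical, do not use any GraalVM API, and the invariance of the condition is assumed rather than checked.

```scala
// Hypothetical toy IR; conditions and actions are opaque names.
sealed trait Stmt
case class Action(name: String) extends Stmt
case class If(cond: String, thenBranch: List[Stmt], elseBranch: List[Stmt]) extends Stmt
case class Loop(body: List[Stmt]) extends Stmt

object Unswitch {
  // Rewrites `loop { if (c) A else B }` into `if (c) loop { A } else loop { B }`.
  // Assumes, without checking, that `c` is loop-invariant.
  def rewrite(s: Stmt): Stmt = s match {
    case Loop(List(If(c, t, e))) =>
      // Each recursive call sees a strictly smaller loop, so this terminates.
      If(c, List(rewrite(Loop(t))), List(rewrite(Loop(e))))
    case Loop(body)  => Loop(body.map(rewrite))
    case If(c, t, e) => If(c, t.map(rewrite), e.map(rewrite))
    case a: Action   => a
  }
}
```

A real tool would additionally need an analysis proving that the condition is invariant, and a front end per source language that produces a tree of this kind.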
I want the tool to be able to rewrite programs from a wide range of languages, for example Java, C++, Python, JavaScript, etc.
I am exploring if GraalVM can be used for this purpose, to act as the common platform in which I can implement the same optimizations for various languages.
Does GraalVM have a common intermediate representation (something like the LLVM IR)? I looked at the documentation but I am not sure where to get started. Any pointers?
Note: I am not looking for inter-operability between languages. You can assume that the programs I want to rewrite are written in one single language; the language may be different for different programs.
GraalVM has two components that are relevant for this:
compiler, which compiles Java bytecode to native code
truffle, which is a framework for implementing other programming languages on top of GraalVM.
Languages implemented with the Truffle framework get partially evaluated to Java bytecode, which is then compiled by the Graal compiler. This article/talk gives more details, including the IR used by the Graal compiler: https://chrisseaton.com/truffleruby/jokerconf17/. Depending on your concrete use case, you may want to hook into Truffle, the Truffle partial evaluator, or the Graal compiler.

Compilation / Code Generation of External Scala DSL

My understanding is that it is quite simple to create and parse an external DSL in Scala (e.g. representing rules). Is my assumption correct that the DSL can only be interpreted at runtime, and that code generation (as with ANTLR) for achieving better performance is not supported?
EDIT: To be more precise, my question is whether I could achieve this (create an external domain-specific language and generate Java/Scala code) with built-in Scala tools/libraries (e.g. http://www.artima.com/pins1ed/combinator-parsing.html), rather than writing a whole parser and code generator myself in Scala. It's also clear that you can achieve this with third-party tools, but then you have to learn additional stuff and take on additional dependencies. I'm new to implementing DSLs, so I have no gut feeling yet for when to use external tools like ANTLR and what you can do (with reasonable effort) with Scala's on-board facilities.
Is my assumption correct that the DSL can only be interpreted at runtime, and that code generation (as with ANTLR) for achieving better performance is not supported?
No, this is wrong. It is possible to write a compiler in Scala. After all, Scala is Turing-complete (i.e. you can write anything), and you don't even need Turing-completeness for a compiler.
Some examples of compilers written in Scala include
the Scala compiler itself (in all its variations, Scala-JVM, Scala.js, Scala-native, Scala-virtualized, Typelevel Scala, the abandoned Scala.NET, …)
the Dotty compiler
Scalisp
Scalispa
… and many others …
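To illustrate that a DSL front end can feed a code generator instead of an interpreter, here is a minimal sketch that turns parsed rules into Scala source text. The `Rule` case class stands in for whatever a combinator parser would produce, and `RuleCodegen` and the shape of the generated object are assumptions made up for this example.

```scala
// Hypothetical rule AST, as a parser for the external DSL might produce it.
case class Rule(field: String, op: String, threshold: Int, action: String)

object RuleCodegen {
  // Wraps a string in double quotes for the generated source (no escaping is done).
  private def quote(s: String): String = "\"" + s + "\""

  // Emits Scala source for an object whose apply method checks every rule in order.
  def generate(objectName: String, rules: List[Rule]): String = {
    val checks = rules.map { r =>
      "    if (record(" + quote(r.field) + ") " + r.op + " " + r.threshold +
        ") fired += " + quote(r.action)
    }
    val lines =
      List(
        "object " + objectName + " {",
        "  def apply(record: Map[String, Int]): List[String] = {",
        "    val fired = List.newBuilder[String]"
      ) ++ checks ++ List(
        "    fired.result()",
        "  }",
        "}"
      )
    lines.mkString("\n")
  }
}
```

The returned string can be written to a source file and compiled with the rest of the project, so the rules run as ordinary compiled code rather than being interpreted on each evaluation.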

Can I generate Scala bindings for Objective-C and C++ with scala-bindgen?

I've recently found scala-bindgen from a Gitter room on Scala Native. Seems like (at the present point in time) they are developing a tool for generating Scala bindings for C header files.
Are there plans for generating Scala bindings for Objective-C and C++ too?
The initial plan consists only of Scala bindings for the C language. Bindings for Objective-C are planned for the future. Bindings for C++ are pretty unlikely to happen, due to the complexity involved in such a task.
For more information:
http://github.com/frgomes/scala-bindgen

Can Scala.js not compile itself?

I'm going through the tutorial and it looks like Scala.js only runs under sbt.
Are there bits of Scala.js (or the Scala environment generally) that are not written in Scala? Or cannot all the necessary bits go through Scala.js for some other reason? What am I missing?
Mostly, this is because the Scala compiler uses too many parts of the JDK that have not been ported to Scala.js (yet). Some of these parts, most notably those related to reading files (in the classpath, and source files), cannot be implemented in JavaScript as such (though they could be implemented for one particular platform, such as Node.js).
There is also the dependency on ASM, a Java bytecode manipulation library written in Java. Even though Scala.js compiles to JavaScript, the Java bytecode is still used for separate compilation (symbol lookup in previously compiled parts, such as libraries).
So, even though the Scala.js specific parts are written in a platform-independent way (e.g., we test that the Scala.js optimizer can optimize itself), there are a lot of parts in scalac that do not work out-of-the-box in Scala.js.