Scala native has been recently released, but the garbage collector they used (for now) is extremely rudimentary and makes it not suitable for serious use.
So I wonder: why not just transpile Scala to Go (a la Scala.js)? It's going to be a fast, portable runtime. And their GC is getting better and better. Not to mention the inheritance of a great concurrency model: channels and goroutines.
So why did scala-native choose to go so low level with LLVM?
What would be the catch with a golang transpiler?
There are two kinds of languages that are good targets for compilers:
Languages whose semantics closely match the source language's semantics.
Languages which have very low-level and thus very general semantics (or one might argue: no semantics at all).
Examples for #1 include: compiling ECMAScript 2015 to ECMAScript 5 (most language additions were specifically designed as syntactic sugar for existing features, you just have to desugar them), compiling CoffeeScript to ECMAScript, compiling TypeScript to ECMAScript (basically, after type checking, just erase the types and you are done), compiling Java to JVM byte code, compiling C♯ to CLI CIL bytecode, compiling Python to CPython bytecode, compiling Python to PyPy bytecode, compiling Ruby to YARV bytecode, compiling Ruby to Rubinius bytecode, compiling ECMAScript to SpiderMonkey bytecode.
Examples for #2 include: machine code for a general purpose CPU (RISC even more so), C--, LLVM.
Compiling Scala to Go fits neither of the two. Their semantics are very different.
You need either a language with powerful low-level semantics as the target language, so that you can build your own semantics on top, or you need a language with closely matching semantics, so that you can map your own semantics into the target language.
In fact, even JVM bytecode is already too high-level! It has constructs such as classes that do not match constructs such as Scala's traits, so there has to be a fairly complex encoding of traits into classes and interfaces. Likewise, before invokedynamic, it was actually pretty much impossible to represent dynamic dispatch on structural types in JVM bytecode. The Scala compiler had to resort to reflection, or in other words, deliberately stepping outside of the semantics of JVM bytecode (which resulted in a terrible performance overhead for method dispatch on structural types compared to method dispatch on other class types, even though both are the exact same thing).
Proper Tail Calls are another example: we would like to have them in Scala, but because JVM bytecode is not powerful enough to express them without a very complex mapping (basically, you have to forego using the JVM's call stack altogether and manage your own stack, which destroys both performance and Java interoperability), it was decided to not have them in the language.
Go has some of the same problems: in order to implement Scala's expressive non-local control-flow constructs such as exceptions or threads, we need an equally expressive non-local control-flow construct to map to. For typical target languages, this "expressive non-local control-flow construct" is either continuations or the venerable GOTO. Go has GOTO, but it is deliberately limited in its "non-localness". For writing code by humans, limiting the expressive power of GOTO is a good thing, but for a compiler target language, not so much.
It is very likely possible to rig up powerful control-flow using goroutines and channels, but now we are already leaving the comfortable confines of just mapping Scala semantics to Go semantics, and start building Scala high-level semantics on top of Go high-level semantics that weren't designed for such usage. Goroutines weren't designed as a general control-flow construct to build other kinds of control-flow on top of. That's not what they're good at!
So why did scala-native choose to go so low level with LLVM?
Because that's precisely what LLVM was designed for and is good at.
What would be the catch with a golang transpiler?
The semantics of the two languages are too different for a direct mapping and Go's semantics are not designed for building different language semantics on top of.
their GC is getting better and better
So can Scala-native's. As far as I understand, the choice for current use of Boehm-Dehmers-Weiser is basically one of laziness: it's there, it works, you can drop it into your code and it'll just do its thing.
Note that changing the GC is under discussion. There are other GCs which are designed as drop-ins rather than being tightly coupled to the host VM's object layout. E.g. IBM is currently in the process of re-structuring J9, their high-performance JVM, into a set of loosely coupled, independently re-usable "runtime building blocks" components and releasing them under a permissive open source license.
The project is called "Eclipse OMR" (source on GitHub) and it is already production-ready: the Java 8 implementation of IBM J9 was built completely out of OMR components. There is a Ruby + OMR project which demonstrates how the components can easily be integrated into an existing language runtime, because the components themselves assume no language semantics and no specific memory or object layout. The commit which swaps out the GC and adds a JIT and a profiler clocks in at just over 10000 lines. It isn't production-ready, but it boots and runs Rails. They also have a similar project for CPython (not public yet).
why not just transpile Scala to Go (a la Scala.js)?
Note that Scala.JS has a lot of the same problems I mentioned above. But they are doing it anyway, because the gain is huge: you get access to every web browser on the planet. There is no comparable gain for a hypothetical Scala.go.
There's a reason why there are initiatives for getting low-level semantics into the browser such as asm.js and WebAssembly, precisely because compiling a high-level language to another high-level language always has this "semantic gap" you need to overcome.
In fact, note that even for lowish-level languages that were specifically designed as compilation targets for a specific language, you can still run into trouble. E.g. Java has generics, JVM bytecode doesn't. Java has inner classes, JVM bytecode doesn't. Java has anonymous classes, JVM bytecode doesn't. All of these have to be encoded somehow, and specifically the encoding (or rather non-encoding) of generics has caused all sorts of pain.
Related
I want to develop a tool that performs certain optimizations in a program based on the program structure. For example, let's say I want to identify if-else within a loop, and my tool shall rewrite it into two loops.
I want the tool to be able to rewrite programs from a wide range of languages, example Java, C++, Python, Javascript, etc.
I am exploring if GraalVM can be used for this purpose, to act as the common platform in which I can implement the same optimizations for various languages.
Does GraalVM have a common intermediate representation (something like the LLVM IR)? I looked at the documentation but I am not sure where to get started. Any pointers?
Note: I am not looking for inter-operability between languages. You can assume that the programs I want to rewrite are written in one single language; the language may be different for different programs.
GraalVM has two components that are relevant for this:
compiler, which compiles Java bytecode to native code
truffle, which is a framework for implementing other programming languages on top of GraalVM.
Languages implemented with the Truffle framework get partially evaluated to Java bytecode, which is then compiled by the Graal compiler. This article/talk gives more details including the IR used by Graal compiler: https://chrisseaton.com/truffleruby/jokerconf17/. Depending on your concrete use case you may want to hook into Truffle, Truffle partial evaluator or Graal compiler.
My understanding is that it is quite simple to create & parse an external DSL in Scala (e.g. representing rules). Is my assumption correct that the DSL can only be interpreted during runtime but does not support code generation (like ANTLR) for archiving better performance ?
EDIT: To be more precise, my question is if I could achieve this (create an external domain specific language and generate java/scala code) with built-in Scala tools/libraries (e.g. http://www.artima.com/pins1ed/combinator-parsing.html). Not writing a whole parser / code generator completely by yourself in scala. It's also clear that you can achieve this with third-party tools but you have to learn additional stuff and have additional dependencies. I'm new in the area of implementing DSLs, so I have no gutfeeling so far when to use external tools like ANTLR and what you can (with a reasonable effort) do with Scala on-board stuff.
Is my assumption correct that the DSL can only be interpreted during runtime but does not support code generation (like ANTLR) for archiving better performance ?
No, this is wrong. It is possible to write a compiler in Scala, after all, Scala is Turing-complete (i.e. you can write anything), and you don't even need Turing-completeness for a compiler.
Some examples of compilers written in Scala include
the Scala compiler itself (in all its variations, Scala-JVM, Scala.js, Scala-native, Scala-virtualized, Typelevel Scala, the abandoned Scala.NET, …)
the Dotty compiler
Scalisp
Scalispa
… and many others …
I've heard people claim that:
Scala's type system is amazing (existential types, variant, co-variant)
Because of the power of macros, everything is a library in Clojure: (pattern matching, logic programming, non-determinism, ..)
Question:
If both assertions are true, why is Scala's type system not a library in Clojure? Is it because:
types are one of these things that do not work well as a library? [i.e. the changes would somehow have to threaded through every existing clojure library, including clojure.core?]
is Scala's notion of types fundamentally incompatible with clojure protocol / records?
... ?
It's an interesting question.
You are certainly right about Scala having an amazing type system, and about Clojure being phenomenal for meta-programming and extension of the language (although that is about more than just macros....).
A few reasons I can think of:
Clojure is a dynamically typed language while Scala is a statically typed language. Having powerful type inference isn't so much use in a language where you can assume relatively little about the types of your inputs.
Clojure already has a very interesting project to add typing as a library (Typed Clojure) which looks very promising - however it's very different in approach to Scala as it is designed for a dynamic language from the start (inspired more by Typed Racket, I believe).
Clojure philosophy actually discourages certain OOP concepts (particularly implementation inheritance, mutable objects, and data encapsulation). A type system that supports these things (as Scala does) wouldn't be a good fit for Clojure idioms - at best they would be ignored, but they could easily encourage a style of development that would cause people to run into severe problems later.
Clojure already provides tools that solve many of the problems you would typically solve with types in other languages - e.g. the use of protocols for polymorphism.
There's a strong focus in the Clojure community on simplicity (in the sense of the excellent video "Simple Made Easy" - see particularly the slide at 39:30). While Scala's type system is certainly amazing, I think it's a stretch to describe it as "Simple"
Putting in a Scala-style type system would probably require a complete rewrite of the Clojure compiler and make it substantially more complex. Nobody seems to have signed up so far to take on that particular challenge... and there's a risk that even if someone were willing and able to do this then the changes could be rejected for the various cultural / technical reasons covered above.
In the absence of a major change to Clojure itself (which I think would be unlikely) then one interesting possibility would be to create a DSL within Clojure that provided Scala-style type inference for a specific domain and compiled this DSL direct to optimised Java bytecode. I could see that being a useful approach for specific problem domains (large scale numerical data crunching with big matrices, for example).
To simply answer your question "... why is Scala's type system not a library in Clojure?":
Because the type system is part of the scala compiler and not of the scala library. The whole power of scalas type system only exists at compile time. The JVM has no support for things like that, because of type erasure and also, because it would simply slow down execution. And also there is no need for it. If you have a statically typed language, you don't need type information at runtime, unless you want to do dirty stuff.
edit:
#mikera the jvm is sure capable of running the scala compiler, I did not say anything like that. I just said, that the jvm has no support for type systems like that. It does not even support generics. At runtime all these types are gone. The compiler checks for the correctness of a program and removes all the higher kinded types / generics.
example:
val xs: List[Int] = List(1,2,3,4)
val x1: Int = xs.head
will at runtime look like this:
val xs: List = List.apply(1,2,3,4)
val x1: Int = xs.head.asInstanceOf[Int]
But it doesn't matter, because the compiler checked it before. You can only get in trouble here, when you use reflection, because you could put any value in the list and it would break at runtime exactly where the value is casted to Int.
And this is one of the reasons, why the scala type system is not part of the scala library, but built into the compiler.
And also the question of the OP was "... why is Scala's type system not a library in Clojure?" and not "Is it possible to create a type system such as scalas for clojure?" and I perfectly answered that question.
I'm wondering why Scala does not have an IO Monad like Haskell.
So, in Scala the return type of method readLine is String whereas in Haskell the comparable function getLine has the return type IO String.
There is a similar question about this topic, but its answer it not satisfying:
Using IO is certainly not the dominant style in scala.
Can someone explain this a bit further? What was the design decision for not including IO Monads to Scala?
Because Scala is not pure (and has no means to enforce that a function is pure, like D has) and allows side effects. It interoperates closely with Java (e.g. reuses big parts of the Java libraries). Scala is not lazy, so there is no problem regarding execution order like in Haskell (e.g. no need for >> or seq). Under these circumstances introducing the IO Monad would make life harder without gaining much.
But if you really have applications where the IO monad has significant advantages, nothing stops you from writing your own implementation or to use scalaz. See e.g. http://apocalisp.wordpress.com/2011/12/19/towards-an-effect-system-in-scala-part-2-io-monad/
[Edit]
Why wasn't it done as a lazy and pure language?
This would have been perfectly possible (e.g. look at Frege, a JVM language very similar to Haskell). Of course this would make the Java interoperability more complicate, but I don't think this is the main reason. I think a lazy and pure language is a totally cool thing, but simply too alien to most Java programmers, which are the target audience of Scala. Scala was designed to cooperate with Java's object model (which is the exact opposite of pure and lazy), allowing functional and mixed functional-OO programming, but not enforcing it (which would have chased away almost all Java programmers). In fact there is no point in having yet another completely functional language: There is Haskell, Erlang, F# (and other MLs) and Clojure (and other Schemes / Lisps), which are all very sophisticated, stable and successful, and won't be easily replaced by a newcomer.
I've been hearing a lot about different JVM languages, still in vaporware mode, that propose to implement reification somehow. I have this nagging half-remembered (or wholly imagined, don't know which) thought that somewhere I read that Scala somehow took advantage of the JVM's type erasure to do things that it wouldn't be able to do with reification. Which doesn't really make sense to me since Scala is implemented on the CLR as well as on the JVM, so if reification caused some kind of limitation it would show up in the CLR implementation (unless Scala on the CLR is just ignoring reification).
So, is there a good side to type erasure for Scala, or is reification an unmitigated good thing?
See Ola Bini's blog. As we all know, Java has use-site covariance, implemented by having little question marks wherever you think variance is appropriate. Scala has definition-site covariance, implemented by the class designer. He says:
Generics is a complicated language feature. It becomes even more
complicated when added to an existing language that already has
subtyping. These two features don’t play very well together in the
general case, and great care has to be taken when adding them to a
language. Adding them to a virtual machine is simple if that machine
only has to serve one language - and that language uses the same
generics. But generics isn’t done. It isn’t completely understood how
to handle correctly and new breakthroughs are happening (Scala is a
good example of this). At this point, generics can’t be considered
“done right”. There isn’t only one type of generics - they vary in
implementation strategies, feature and corner cases.
...
What this all means is that if you want to add reified generics to the
JVM, you should be very certain that that implementation can encompass
both all static languages that want to do innovation in their own
version of generics, and all dynamic languages that want to create a
good implementation and a nice interfacing facility with Java
libraries. Because if you add reified generics that doesn’t fulfill
these criteria, you will stifle innovation and make it that much
harder to use the JVM as a multi language VM.
i.e. If we had reified generics in the JVM, most likely those reified generics wouldn't be suitable for the features we really like about Scala, and we'd be stuck with something suboptimal.