scala native vs. JNI - scala

I'm lifting a native API to Scala. There appear to be two paths: use JNI or use Scala Native.
JNI usage creates the methods you want in Java and then shadows them in C where you write C code to access your API. Pro: you can use the native API's data structures directly. Con: your Scala code now also has to provide its own native wrapper library increasing chances of portability complications, and now you wrapped the library twice, once in JNI C to get it into the JVM and then in the Java/Scala module. A lot of work, many places for bugs. I used this path many times, I figured it was the way the world was.
Then along comes Scala native which does the reverse and shadows the C functions in Scala. Pro: you don't write in C so no new native library and no double wrapper. Con: it seems you can only lift Scala native primitives so complex native data structures can't be accessed. That makes the native library useless if it can't use its data structures.
Neither is terribly portable, as expected, but is there some functionality I'm over looking that fixes the cons of one or both of these approaches? Or some other reason to pick on Scala native over JNI?

Scala Native does not "shadow" C functions in JVM-based Scala. Scala Native is the direct equivalent of bare-metal/native C language but with Scala syntax instead of C syntax and Scala semantics (without any JVM-based semantics) instead of C semantics. If you were to chose JNI, then you would conceivably have 2 choices in the scope of your question of which language to utilize to invoke the JNI injections into the JVM from outside the JVM: either C or Scala Native. If you were to choose Scala Native to write UI code on a JVM-based infrastructure (e.g., Android) then you would be effectively choosing JNI as well by having Scala Native invoke JNI (instead of C invoking JNI). (All mentioning of C here applies to C++ as well.) All that said, utilizing Scala Native invoking JNI to perform UI function invocation from outside the JVM is highly unorthodox (and utilized widely only in the Xamarin-based world).

Related

GraalVM: How to implement compiler optimizations?

I want to develop a tool that performs certain optimizations in a program based on the program structure. For example, let's say I want to identify if-else within a loop, and my tool shall rewrite it into two loops.
I want the tool to be able to rewrite programs from a wide range of languages, example Java, C++, Python, Javascript, etc.
I am exploring if GraalVM can be used for this purpose, to act as the common platform in which I can implement the same optimizations for various languages.
Does GraalVM have a common intermediate representation (something like the LLVM IR)? I looked at the documentation but I am not sure where to get started. Any pointers?
Note: I am not looking for inter-operability between languages. You can assume that the programs I want to rewrite are written in one single language; the language may be different for different programs.
GraalVM has two components that are relevant for this:
compiler, which compiles Java bytecode to native code
truffle, which is a framework for implementing other programming languages on top of GraalVM.
Languages implemented with the Truffle framework get partially evaluated to Java bytecode, which is then compiled by the Graal compiler. This article/talk gives more details including the IR used by Graal compiler: https://chrisseaton.com/truffleruby/jokerconf17/. Depending on your concrete use case you may want to hook into Truffle, Truffle partial evaluator or Graal compiler.

Access scala function in PySpark

I have a Scala library which contains some utility codes and UDF for the Scala Spark API.
However, I would love to now start to use this Scala library with PySpark. Using Java based classes seems to work pretty OK like outlined Running custom Java class in PySpark, however as I use a library written in Scala some the names of some classes might not be straight forward and contain characters like $.
How is interoperability still possible?
How can I use Java/Scala code which is offering a function requiring a generic type parameter?
In general you don't. While access in such cases is sometimes possible, using __getattribute__ / getattr, Py4j is simply not designed with Scala in mind (that's really not Python specific - while Scala is technically speaking interpolatable with Java, it is much richer language, and many of its features are not easily accessible from other JVM languages).
In practice you should do the same thing that Spark does internally - instead of exposing Scala API directly, you create a lean* Java or Scala API, which is specifically designed for interoperability with guest languages. Since Py4j provides translation only between basic Python and Java types, and doesn't handle commonly used Scala interfaces, you will need such intermediate layer anyway, unless Scala library was specifically designed for Java interoperability.
As of your last concern
How can I use Java/Scala code which is offering a function requiring a generic type parameter?
Py4j can handle Java generics just fine without any special treatment. Advanced Scala features (manifests, class tags, type tags) are typically no go, but once again, there are not designed (though it is possible) with Java interoperability in mind.
* As a rule of thumb, if something is Java friendly (doesn't require any crazy hacks, extensive type conversions, or filling the blanks normally handled by the Scala compiler), it should be a good fit for PySpark as well.

Compilation / Code Generation of External Scala DSL

My understanding is that it is quite simple to create & parse an external DSL in Scala (e.g. representing rules). Is my assumption correct that the DSL can only be interpreted during runtime but does not support code generation (like ANTLR) for archiving better performance ?
EDIT: To be more precise, my question is if I could achieve this (create an external domain specific language and generate java/scala code) with built-in Scala tools/libraries (e.g. http://www.artima.com/pins1ed/combinator-parsing.html). Not writing a whole parser / code generator completely by yourself in scala. It's also clear that you can achieve this with third-party tools but you have to learn additional stuff and have additional dependencies. I'm new in the area of implementing DSLs, so I have no gutfeeling so far when to use external tools like ANTLR and what you can (with a reasonable effort) do with Scala on-board stuff.
Is my assumption correct that the DSL can only be interpreted during runtime but does not support code generation (like ANTLR) for archiving better performance ?
No, this is wrong. It is possible to write a compiler in Scala, after all, Scala is Turing-complete (i.e. you can write anything), and you don't even need Turing-completeness for a compiler.
Some examples of compilers written in Scala include
the Scala compiler itself (in all its variations, Scala-JVM, Scala.js, Scala-native, Scala-virtualized, Typelevel Scala, the abandoned Scala.NET, …)
the Dotty compiler
Scalisp
Scalispa
… and many others …

Is it possible/useful to transpile Scala to golang?

Scala native has been recently released, but the garbage collector they used (for now) is extremely rudimentary and makes it not suitable for serious use.
So I wonder: why not just transpile Scala to Go (a la Scala.js)? It's going to be a fast, portable runtime. And their GC is getting better and better. Not to mention the inheritance of a great concurrency model: channels and goroutines.
So why did scala-native choose to go so low level with LLVM?
What would be the catch with a golang transpiler?
There are two kinds of languages that are good targets for compilers:
Languages whose semantics closely match the source language's semantics.
Languages which have very low-level and thus very general semantics (or one might argue: no semantics at all).
Examples for #1 include: compiling ECMAScript 2015 to ECMAScript 5 (most language additions were specifically designed as syntactic sugar for existing features, you just have to desugar them), compiling CoffeeScript to ECMAScript, compiling TypeScript to ECMAScript (basically, after type checking, just erase the types and you are done), compiling Java to JVM byte code, compiling C♯ to CLI CIL bytecode, compiling Python to CPython bytecode, compiling Python to PyPy bytecode, compiling Ruby to YARV bytecode, compiling Ruby to Rubinius bytecode, compiling ECMAScript to SpiderMonkey bytecode.
Examples for #2 include: machine code for a general purpose CPU (RISC even more so), C--, LLVM.
Compiling Scala to Go fits neither of the two. Their semantics are very different.
You need either a language with powerful low-level semantics as the target language, so that you can build your own semantics on top, or you need a language with closely matching semantics, so that you can map your own semantics into the target language.
In fact, even JVM bytecode is already too high-level! It has constructs such as classes that do not match constructs such as Scala's traits, so there has to be a fairly complex encoding of traits into classes and interfaces. Likewise, before invokedynamic, it was actually pretty much impossible to represent dynamic dispatch on structural types in JVM bytecode. The Scala compiler had to resort to reflection, or in other words, deliberately stepping outside of the semantics of JVM bytecode (which resulted in a terrible performance overhead for method dispatch on structural types compared to method dispatch on other class types, even though both are the exact same thing).
Proper Tail Calls are another example: we would like to have them in Scala, but because JVM bytecode is not powerful enough to express them without a very complex mapping (basically, you have to forego using the JVM's call stack altogether and manage your own stack, which destroys both performance and Java interoperability), it was decided to not have them in the language.
Go has some of the same problems: in order to implement Scala's expressive non-local control-flow constructs such as exceptions or threads, we need an equally expressive non-local control-flow construct to map to. For typical target languages, this "expressive non-local control-flow construct" is either continuations or the venerable GOTO. Go has GOTO, but it is deliberately limited in its "non-localness". For writing code by humans, limiting the expressive power of GOTO is a good thing, but for a compiler target language, not so much.
It is very likely possible to rig up powerful control-flow using goroutines and channels, but now we are already leaving the comfortable confines of just mapping Scala semantics to Go semantics, and start building Scala high-level semantics on top of Go high-level semantics that weren't designed for such usage. Goroutines weren't designed as a general control-flow construct to build other kinds of control-flow on top of. That's not what they're good at!
So why did scala-native choose to go so low level with LLVM?
Because that's precisely what LLVM was designed for and is good at.
What would be the catch with a golang transpiler?
The semantics of the two languages are too different for a direct mapping and Go's semantics are not designed for building different language semantics on top of.
their GC is getting better and better
So can Scala-native's. As far as I understand, the choice for current use of Boehm-Dehmers-Weiser is basically one of laziness: it's there, it works, you can drop it into your code and it'll just do its thing.
Note that changing the GC is under discussion. There are other GCs which are designed as drop-ins rather than being tightly coupled to the host VM's object layout. E.g. IBM is currently in the process of re-structuring J9, their high-performance JVM, into a set of loosely coupled, independently re-usable "runtime building blocks" components and releasing them under a permissive open source license.
The project is called "Eclipse OMR" (source on GitHub) and it is already production-ready: the Java 8 implementation of IBM J9 was built completely out of OMR components. There is a Ruby + OMR project which demonstrates how the components can easily be integrated into an existing language runtime, because the components themselves assume no language semantics and no specific memory or object layout. The commit which swaps out the GC and adds a JIT and a profiler clocks in at just over 10000 lines. It isn't production-ready, but it boots and runs Rails. They also have a similar project for CPython (not public yet).
why not just transpile Scala to Go (a la Scala.js)?
Note that Scala.JS has a lot of the same problems I mentioned above. But they are doing it anyway, because the gain is huge: you get access to every web browser on the planet. There is no comparable gain for a hypothetical Scala.go.
There's a reason why there are initiatives for getting low-level semantics into the browser such as asm.js and WebAssembly, precisely because compiling a high-level language to another high-level language always has this "semantic gap" you need to overcome.
In fact, note that even for lowish-level languages that were specifically designed as compilation targets for a specific language, you can still run into trouble. E.g. Java has generics, JVM bytecode doesn't. Java has inner classes, JVM bytecode doesn't. Java has anonymous classes, JVM bytecode doesn't. All of these have to be encoded somehow, and specifically the encoding (or rather non-encoding) of generics has caused all sorts of pain.

What parts of the Java ecosystem and language should a developer learn to get the most out of Scala?

Many of the available resources for learning Scala assume some background in Java. This can prove challenging for someone who is trying to learn Scala with no Java background.
What are some Java-isms a new Scala developer should know about as they learn the language?
For example, it's useful to know what a CLASSPATH is, what the java command line options are, etc...
That's a really great question! I've never thought about people learning Java just so they have it easier to learn Scala...
Apart from all the basics like for loops and such, learning Java Generics can be really helpful. The Scala equivalent is much more potent (and much harder to understand) than Java Generics. You might want to try to figure out where the limits of Java Generics are, and then in which cases Scala's type constructors can be used to overcome those limitations. At the more basic level, it is important to know why Generics are necessary, and how Java is a strongly typed language.
Java allows you to have multiple constructors for one class. This knowledge will be of no use when you learn Scala, because Scala has another way that allows you to offer several methods to create instances of a class. So, you'd rather not have a deep look into this Java concept.
Here are some concepts that differ very strongly between Java and Scala. So, if you learn the Java concepts and then later on want to learn the equivalent in Scala, you should be aware that the Scala equivalent differs so greatly from the Java version that a typical Java developer will have some difficulty to adapt to the Scala way of thinking. Still, it usually helps to first get used to the Java way, because it is usually simpler and easier to learn. I personally prefer to think of Java as the introductory course, and Scala is the pro version.
Java mutable collection concept vs. Scala mutable/immutable differentiation
static methods (Java) vs. singleton objects (Scala)
for loops
Java return statement vs. Scala functional style ("every expression returns a value")
Java's use of null for "no value" vs. Scala's more explicit Option type
imports
Java's switch vs. Scala's match
And here is a list of stuff that you will probably use from the Java standard library, even if you develop in Scala:
IO
GUI (Scala has a wrapper for Swing, but hey)
URLs, URIs, files
date
timers
And finally, some of Scala's features that have no direct equivalent in Java or the Java standard library:
operator overloading
implicits and implicit conversions
multiple argument lists / currying
anonymous functions / functions as values
actors
streams
Scala pattern matching (which rocks)
traits
type inference
for comprehensions
awesome collection operations like fold or map
Of course, all the lists are incomplete. That's just my view on what is important. I hope it helps.
And, by the way: You should definitely know about the class path and other JVM basics.
The standard library, above all else, because that's what Scala has most in common with Java.
You should also get a basic idea of Java's syntax, because a lot of books end up comparing something in Scala to something in Java. But other than the platform and some of the library, they're totally distinctive languages.
There are a few trivial conventions passed from one to the other (like command line options), but as you read books and tutorials on Scala you should pick those up as you go regardless of previous Java experience.
The serie "Scala for Java Refugees" can gives some indications on typical Java topics you are supposed to know and how they translate into Scala.
For instance, the very basic main() Java function which translate into the Application trait, once considered harmful, and now improved (for Scala 2.9 anyway).