My understanding is that it is quite simple to create and parse an external DSL in Scala (e.g. one representing rules). Is my assumption correct that such a DSL can only be interpreted at runtime, and that there is no support for code generation (as with ANTLR) to achieve better performance?
EDIT: To be more precise, my question is whether I can achieve this (create an external domain-specific language and generate Java/Scala code) with built-in Scala tools and libraries, i.e. without writing a whole parser and code generator myself in Scala. It is also clear that you can achieve this with third-party tools, but then you have to learn additional things and take on additional dependencies. I'm new to implementing DSLs, so I have no gut feeling yet for when to use external tools like ANTLR and what can be done (with reasonable effort) with Scala's on-board facilities.
Is my assumption correct that the DSL can only be interpreted during runtime but does not support code generation (like ANTLR) for achieving better performance?
No, this is wrong. It is possible to write a compiler in Scala. After all, Scala is Turing-complete (i.e. you can write anything in it), and you don't even need Turing-completeness to write a compiler.
Some examples of compilers written in Scala include
the Scala compiler itself (in all its variations, Scala-JVM, Scala.js, Scala-native, Scala-virtualized, Typelevel Scala, the abandoned Scala.NET, …)
the Dotty compiler
… and many others …
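To make the code-generation route concrete, here is a minimal sketch that uses nothing beyond the Scala standard library. The rule AST (`Cond`, `Gt`, `Lt`, `And`, `Rule`), the `RuleCompiler` object and the `Input` type referenced in the generated text are all hypothetical names invented for illustration. A real DSL would build the AST with a parser first.

```scala
// Hypothetical mini-AST for a rule DSL; all names here are illustrative.
sealed trait Cond
final case class Gt(field: String, value: Int) extends Cond
final case class Lt(field: String, value: Int) extends Cond
final case class And(left: Cond, right: Cond) extends Cond
final case class Rule(name: String, cond: Cond)

object RuleCompiler {
  // Translate a condition into a Scala boolean expression (as source text).
  def condToScala(c: Cond): String = c match {
    case Gt(f, v)  => s"(in.$f > $v)"
    case Lt(f, v)  => s"(in.$f < $v)"
    case And(l, r) => s"(${condToScala(l)} && ${condToScala(r)})"
  }

  // Emit a Scala method definition for a rule. `Input` is an assumed
  // record type that the generated code would later be compiled against.
  def generate(rule: Rule): String =
    s"def ${rule.name}(in: Input): Boolean = ${condToScala(rule.cond)}"

  def main(args: Array[String]): Unit =
    println(generate(Rule("adult", And(Gt("age", 17), Lt("age", 120)))))
}
```

The generated text can then be written to a file and compiled like any other Scala source, so the rule is no longer interpreted at runtime.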
I have a Scala library which contains some utility code and UDFs for the Scala Spark API.
However, I would now love to start using this Scala library with PySpark. Using Java-based classes seems to work pretty well, as outlined in "Running custom Java class in PySpark". However, as I use a library written in Scala, the names of some classes might not be straightforward and may contain characters like $.
How is interoperability still possible?
How can I use Java/Scala code which is offering a function requiring a generic type parameter?
In general you don't. While access in such cases is sometimes possible using __getattribute__ / getattr, Py4j is simply not designed with Scala in mind (that's really not Python specific: while Scala is, technically speaking, interoperable with Java, it is a much richer language, and many of its features are not easily accessible from other JVM languages).
In practice you should do the same thing that Spark does internally: instead of exposing the Scala API directly, you create a lean* Java or Scala API which is specifically designed for interoperability with guest languages. Since Py4j provides translation only between basic Python and Java types, and doesn't handle commonly used Scala interfaces, you will need such an intermediate layer anyway, unless the Scala library was specifically designed for Java interoperability.
As for your last concern:
How can I use Java/Scala code which is offering a function requiring a generic type parameter?
Py4j can handle Java generics just fine without any special treatment. Advanced Scala features (manifests, class tags, type tags) are typically a no-go, but once again, they are not designed with Java interoperability in mind (though using them from Java is possible).
* As a rule of thumb, if something is Java friendly (doesn't require any crazy hacks, extensive type conversions, or filling the blanks normally handled by the Scala compiler), it should be a good fit for PySpark as well.
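As an illustration of such an intermediate layer, here is a minimal sketch. Both objects and their method names are hypothetical, and the collection converters assume Scala 2.13 or later (`scala.jdk.CollectionConverters`). The point is only that the facade exposes concrete types and Java collections, while the implicit `ClassTag` and `Ordering` are filled in by the Scala compiler on the Scala side.

```scala
import java.util.{List => JList}
import scala.jdk.CollectionConverters._
import scala.reflect.ClassTag

// Hypothetical Scala-only API: it needs an implicit ClassTag and Ordering,
// which the Scala compiler normally supplies and which are awkward to
// pass by hand from another JVM language or through Py4j.
object ScalaOnlyApi {
  def distinctSorted[T: ClassTag: Ordering](xs: Seq[T]): Array[T] =
    xs.distinct.sorted.toArray
}

// Lean facade: concrete element type, Java collections in and out.
object PythonFriendlyApi {
  def distinctSortedStrings(xs: JList[String]): JList[String] =
    ScalaOnlyApi.distinctSorted(xs.asScala.toSeq).toSeq.asJava
}
```

From PySpark, a method like `distinctSortedStrings` would then be reached through the JVM gateway like any plain Java static method. How the gateway is obtained depends on your setup and is not shown here.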
What are the steps required for evaluating an external DSL in scala, and what libraries are available for these?
After digging around I am able to create an AST out of case classes using parser combinators. What are the next steps in the process? I looked at Kiama, but it is unclear to me from the documentation (maybe due to my limited language processing knowledge) how to maintain symbol tables, how to bind actions to DSL statements, etc.
I agree that it would be good to have more tutorial-style documentation for common language processing tasks in Kiama. We are working on it, but I have nothing concrete to report at the moment.
In the meantime, all I can offer is the examples in the Kiama distribution. In particular, the minijava example is a reasonably accessible compiler for a non-trivial subset of Java. It does name and type analysis (see SemanticAnalysis.scala) and generates JVM bytecode. The semantic analysis uses a simple model of passing around an environment from declarations to uses of names. Feel free to contact us here or on the Kiama mailing list if you have specific questions about how the example works.
The Oberon-0 example is also a complete compiler from an imperative language to C, including semantic analysis.
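The idea of passing an environment from declarations to uses can be sketched in plain Scala. This is not Kiama's API and not the code from the minijava example. The AST (`Decl`, `Use`, `Block`) and the `NameAnalysis` object are hypothetical, and the environment is simply an immutable map acting as the symbol table.

```scala
// Hypothetical AST for a tiny language with declarations and uses.
sealed trait Stmt
final case class Decl(name: String, tpe: String) extends Stmt
final case class Use(name: String) extends Stmt
final case class Block(body: List[Stmt]) extends Stmt

object NameAnalysis {
  // The environment (symbol table) maps visible names to their declared type.
  type Env = Map[String, String]

  // Check one statement. Returns the environment visible to the *next*
  // statement, plus any error messages found along the way.
  def check(stmt: Stmt, env: Env): (Env, List[String]) = stmt match {
    case Decl(n, t) => (env + (n -> t), Nil)
    case Use(n) =>
      if (env.contains(n)) (env, Nil)
      else (env, List(s"'$n' is not declared"))
    case Block(body) =>
      // Declarations made inside a block are dropped when the block ends.
      val (_, errors) = checkAll(body, env)
      (env, errors)
  }

  // Thread the environment through a statement list, left to right.
  def checkAll(stmts: List[Stmt], env: Env): (Env, List[String]) =
    stmts.foldLeft((env, List.empty[String])) { case ((e, errs), s) =>
      val (next, more) = check(s, e)
      (next, errs ++ more)
    }
}
```

Binding actions to statements follows the same shape: one more function that pattern matches on the AST, with the environment carrying values instead of types.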
I have a DSL written using Xtext. What I want is to execute programs written in that DSL so that they actually do something useful.
I wrote a myDslGenerator class implementing the interface IGenerator in Xtend to generate Java code, and it's working fine.
I have two questions:
What is the difference between Interpreter and Code Generator?
Aren't both for executing DSL?
How do I write an interpreter? Is there a step-by-step tutorial? I found many tutorials on generating code using Xtend but couldn't find any on writing an interpreter.
Thank you,
Basically, interpreters and code generators work quite differently. A code generator is like a compiler: it translates your DSL into executable code in another language. An interpreter, on the other hand, traverses your DSL model and executes it directly in your own environment. This means that generated code does not have to depend on your DSL (though of course it can) and can be faster or better optimized, while an interpreter needs to understand the constructs of your language at runtime but can be executed inside your development IDE, without running an additional application.
AFAIK Xtext does not support writing interpreters; it's somewhat out of its scope, as interpreters are extremely language-specific. (Not entirely, though: for Xbase expressions there is an XbaseInterpreter that can be reused, provided you set its classpath correctly.)
I also don't know any step-by-step tutorial about interpreting Xtext DSLs (not even for the XbaseInterpreter), but it basically boils down to a traversal of the AST, and as a node is traversed, the corresponding statement is executed dynamically. For this traversal to work, as expected, the interpreter has to maintain a (possibly hierarchic) context of variables and other references.
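The traversal described above can be sketched as follows. The sketch is in Scala rather than Xtend, and the AST classes (`Num`, `Var`, `Add`, `Assign`, `Scope`) are hypothetical stand-ins for the model classes of an actual DSL. `Context` is an assumed helper showing one way to keep a hierarchic variable context: each scope gets its own context whose lookups fall back to the parent.

```scala
import scala.collection.mutable

// Hypothetical AST; in a real project these would be the DSL's model classes.
sealed trait Node
final case class Num(value: Int) extends Node
final case class Var(name: String) extends Node
final case class Add(left: Node, right: Node) extends Node
final case class Assign(name: String, value: Node) extends Node
final case class Scope(body: List[Node]) extends Node

// A context with an optional parent: lookups fall back to enclosing scopes.
final class Context(parent: Option[Context] = None) {
  private val vars = mutable.Map.empty[String, Int]
  def set(name: String, v: Int): Unit = vars.update(name, v)
  def get(name: String): Int =
    vars.get(name)
      .orElse(parent.map(_.get(name)))
      .getOrElse(sys.error(s"undefined variable: $name"))
}

object Interpreter {
  // Walk the tree; each node is executed as it is visited.
  def eval(node: Node, ctx: Context): Int = node match {
    case Num(v)    => v
    case Var(n)    => ctx.get(n)
    case Add(l, r) => eval(l, ctx) + eval(r, ctx)
    case Assign(n, e) =>
      val v = eval(e, ctx)
      ctx.set(n, v)
      v
    case Scope(body) =>
      // A nested scope gets a fresh context chained to the current one;
      // the scope's value is that of its last statement (0 if empty).
      val inner = new Context(Some(ctx))
      body.foldLeft(0)((_, stmt) => eval(stmt, inner))
  }
}
```

A code generator would have the same pattern match, but each case would emit text in the target language instead of computing a value.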
Given a Scala AST, is there a way to generate Scala source code?
I'm looking into ways to autogenerate Scala source by parsing/analyzing other Scala source. Any tips would be appreciated!
I have been successfully using Scala-Refactoring by Mirko Stocker for this task.
For synthetically constructing ASTs, it relies strongly on the existing Tree DSL of Scala's NSC.
Although the code is a bit messy, you can find an example usage in my project ScalaCollider-UGens.
I have also come across a very useful class by Johannes Rudolph.
See our DMS Software Reengineering Toolkit.
DMS provides a complete ecosystem for parsing/analyzing/optimizing/transforming source code in many languages. It achieves this by providing generic machinery for these tasks as its core capabilities, and specializing that machinery according to explicitly supplied language definitions ("front ends"). DMS has front ends for many languages (C, C++, C#, Java, COBOL, ...) that have been used in anger, and a process for defining others very quickly.
We work on expanding the language set more or less continuously. DMS already has parts of a Scala front end implemented, and we know how to finish it based on the other 30+ front ends we have built, with special emphasis on knowledge of Java.
Many of the available resources for learning Scala assume some background in Java. This can prove challenging for someone who is trying to learn Scala with no Java background.
What are some Java-isms a new Scala developer should know about as they learn the language?
For example, it's useful to know what a CLASSPATH is, what the java command line options are, etc...
That's a really great question! I've never thought about people learning Java just so they have an easier time learning Scala...
Apart from all the basics like for loops and such, learning Java Generics can be really helpful. The Scala equivalent is much more potent (and much harder to understand) than Java Generics. You might want to try to figure out where the limits of Java Generics are, and then in which cases Scala's type constructors can be used to overcome those limitations. At the more basic level, it is important to know why Generics are necessary, and how Java is a strongly typed language.
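One limit of Java Generics that is easy to demonstrate: a Java type parameter can stand for `List<String>`, but not for `List` itself. Scala allows abstracting over the type constructor. The following is a minimal sketch, and the names `Functor`, `Generic` and `doubleAll` are chosen for illustration only.

```scala
// F[_] is a type constructor parameter: it stands for List, Option, ...
// rather than for a fully applied type such as List[Int].
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

object Functor {
  implicit val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
  }
  implicit val optionFunctor: Functor[Option] = new Functor[Option] {
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
  }
}

object Generic {
  // Written once, works for any container that has a Functor instance,
  // and returns the same container type it was given.
  def doubleAll[F[_]](xs: F[Int])(implicit functor: Functor[F]): F[Int] =
    functor.map(xs)(_ * 2)
}
```

Calling `Generic.doubleAll(List(1, 2, 3))` yields a `List[Int]`, and `Generic.doubleAll(Option(21))` yields an `Option[Int]`.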
Java allows you to have multiple constructors for one class. This knowledge will be of no use when you learn Scala, because Scala has another way that allows you to offer several methods to create instances of a class. So, you'd rather not have a deep look into this Java concept.
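As a sketch of what this can look like in Scala, assuming the intended mechanism is default arguments plus factory methods on a companion object (the `Point` class is a made-up example):

```scala
// Primary constructor; the default argument covers the common case.
final class Point(val x: Int, val y: Int = 0) {
  override def toString: String = s"Point($x, $y)"
}

// Companion object: factory methods take the place of the overloaded
// constructors one would typically write in Java.
object Point {
  def apply(x: Int, y: Int): Point = new Point(x, y)
  def origin: Point = new Point(0)
  def fromPair(p: (Int, Int)): Point = new Point(p._1, p._2)
}
```

With these definitions, `Point(1, 2)`, `new Point(5)`, `Point.origin` and `Point.fromPair((3, 4))` all create instances.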
Here are some concepts that differ very strongly between Java and Scala. If you learn the Java concepts and later want to learn the equivalent in Scala, be aware that the Scala version differs so much from the Java one that a typical Java developer will have some difficulty adapting to the Scala way of thinking. Still, it usually helps to first get used to the Java way, because it is usually simpler and easier to learn. I personally prefer to think of Java as the introductory course, and Scala as the pro version.
Java mutable collection concept vs. Scala mutable/immutable differentiation
static methods (Java) vs. singleton objects (Scala)
for loops
Java return statement vs. Scala functional style ("every expression returns a value")
Java's use of null for "no value" vs. Scala's more explicit Option type
Java's switch vs. Scala's match
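A few of the contrasts in this list fit into one small sketch. The `Temperature` object and its data are invented for illustration.

```scala
// Singleton object: plays the role of a Java class full of static methods.
object Temperature {
  // Map(...) is immutable by default; a mutable map must be asked for explicitly.
  private val known = Map("berlin" -> 21, "oslo" -> 12)

  // Option instead of null: the "no value" case is visible in the type.
  def lookup(city: String): Option[Int] = known.get(city.toLowerCase)

  // match instead of switch. The match is an expression whose value is
  // the method's result, so no return statement is needed.
  def describe(city: String): String = lookup(city) match {
    case Some(t) if t >= 20 => s"$city is warm ($t)"
    case Some(t)            => s"$city is cool ($t)"
    case None               => s"no data for $city"
  }
}
```

Because `lookup` returns an `Option`, the compiler makes the caller deal with the missing case, where Java code would have to remember a null check.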
And here is a list of stuff that you will probably use from the Java standard library, even if you develop in Scala:
GUI (Scala has a wrapper for Swing, but hey)
URLs, URIs, files
And finally, some of Scala's features that have no direct equivalent in Java or the Java standard library:
operator overloading
implicits and implicit conversions
multiple argument lists / currying
anonymous functions / functions as values
Scala pattern matching (which rocks)
type inference
for comprehensions
awesome collection operations like fold or map
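Several items from this list can be seen together in a short sketch. All names (`Features`, `Vec`, `scale`, `twice`, `sum`, `pairs`) are made up for illustration, and the implicit `Vec` value is there only to show the mechanism, not as a recommended design.

```scala
object Features {
  // Operator overloading: `+` is an ordinary method name.
  final case class Vec(x: Int, y: Int) {
    def +(that: Vec): Vec = Vec(x + that.x, y + that.y)
  }

  // Multiple argument lists (currying): supply the first list only
  // and you get a function value back.
  def scale(factor: Int)(v: Vec): Vec = Vec(v.x * factor, v.y * factor)
  val twice: Vec => Vec = scale(2)

  // Implicit parameter: the compiler finds `zero` without it being passed.
  implicit val zero: Vec = Vec(0, 0)
  def sum(vs: List[Vec])(implicit start: Vec): Vec =
    vs.foldLeft(start)(_ + _) // fold with an anonymous function

  // For comprehension with a guard; the result type is inferred.
  val pairs = for {
    a <- List(1, 2, 3)
    b <- List(10, 20)
    if a != 2
  } yield a * b
}
```

Here `Features.twice(Vec(1, 2))` is `Vec(2, 4)`, and `Features.pairs` is `List(10, 20, 30, 60)`.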
Of course, all the lists are incomplete. That's just my view on what is important. I hope it helps.
And, by the way: You should definitely know about the class path and other JVM basics.
The standard library, above all else, because that's what Scala has most in common with Java.
You should also get a basic idea of Java's syntax, because a lot of books end up comparing something in Scala to something in Java. But other than the platform and some of the library, they're totally distinct languages.
There are a few trivial conventions passed from one to the other (like command line options), but as you read books and tutorials on Scala you should pick those up as you go regardless of previous Java experience.
The series "Scala for Java Refugees" can give some indication of the typical Java topics you are supposed to know and how they translate into Scala.
For instance, the very basic Java main() function translates into the Application trait, which was once considered harmful and has since been improved (for Scala 2.9 anyway).