Access scala function in PySpark - scala

I have a Scala library which contains some utility codes and UDF for the Scala Spark API.
However, I would love to now start to use this Scala library with PySpark. Using Java based classes seems to work pretty OK like outlined Running custom Java class in PySpark, however as I use a library written in Scala some the names of some classes might not be straight forward and contain characters like $.
How is interoperability still possible?
How can I use Java/Scala code which is offering a function requiring a generic type parameter?

In general you don't. While access in such cases is sometimes possible, using __getattribute__ / getattr, Py4j is simply not designed with Scala in mind (that's really not Python specific - while Scala is technically speaking interpolatable with Java, it is much richer language, and many of its features are not easily accessible from other JVM languages).
In practice you should do the same thing that Spark does internally - instead of exposing Scala API directly, you create a lean* Java or Scala API, which is specifically designed for interoperability with guest languages. Since Py4j provides translation only between basic Python and Java types, and doesn't handle commonly used Scala interfaces, you will need such intermediate layer anyway, unless Scala library was specifically designed for Java interoperability.
As of your last concern
How can I use Java/Scala code which is offering a function requiring a generic type parameter?
Py4j can handle Java generics just fine without any special treatment. Advanced Scala features (manifests, class tags, type tags) are typically no go, but once again, there are not designed (though it is possible) with Java interoperability in mind.
* As a rule of thumb, if something is Java friendly (doesn't require any crazy hacks, extensive type conversions, or filling the blanks normally handled by the Scala compiler), it should be a good fit for PySpark as well.

Related

scala native vs. JNI

I'm lifting a native API to Scala. There appear to be two paths: use JNI or use Scala Native.
JNI usage creates the methods you want in Java and then shadows them in C where you write C code to access your API. Pro: you can use the native API's data structures directly. Con: your Scala code now also has to provide its own native wrapper library increasing chances of portability complications, and now you wrapped the library twice, once in JNI C to get it into the JVM and then in the Java/Scala module. A lot of work, many places for bugs. I used this path many times, I figured it was the way the world was.
Then along comes Scala native which does the reverse and shadows the C functions in Scala. Pro: you don't write in C so no new native library and no double wrapper. Con: it seems you can only lift Scala native primitives so complex native data structures can't be accessed. That makes the native library useless if it can't use its data structures.
Neither is terribly portable, as expected, but is there some functionality I'm over looking that fixes the cons of one or both of these approaches? Or some other reason to pick on Scala native over JNI?
Scala Native does not "shadow" C functions in JVM-based Scala. Scala Native is the direct equivalent of bare-metal/native C language but with Scala syntax instead of C syntax and Scala semantics (without any JVM-based semantics) instead of C semantics. If you were to chose JNI, then you would conceivably have 2 choices in the scope of your question of which language to utilize to invoke the JNI injections into the JVM from outside the JVM: either C or Scala Native. If you were to choose Scala Native to write UI code on a JVM-based infrastructure (e.g., Android) then you would be effectively choosing JNI as well by having Scala Native invoke JNI (instead of C invoking JNI). (All mentioning of C here applies to C++ as well.) All that said, utilizing Scala Native invoking JNI to perform UI function invocation from outside the JVM is highly unorthodox (and utilized widely only in the Xamarin-based world).

Compilation / Code Generation of External Scala DSL

My understanding is that it is quite simple to create & parse an external DSL in Scala (e.g. representing rules). Is my assumption correct that the DSL can only be interpreted during runtime but does not support code generation (like ANTLR) for archiving better performance ?
EDIT: To be more precise, my question is if I could achieve this (create an external domain specific language and generate java/scala code) with built-in Scala tools/libraries (e.g. http://www.artima.com/pins1ed/combinator-parsing.html). Not writing a whole parser / code generator completely by yourself in scala. It's also clear that you can achieve this with third-party tools but you have to learn additional stuff and have additional dependencies. I'm new in the area of implementing DSLs, so I have no gutfeeling so far when to use external tools like ANTLR and what you can (with a reasonable effort) do with Scala on-board stuff.
Is my assumption correct that the DSL can only be interpreted during runtime but does not support code generation (like ANTLR) for archiving better performance ?
No, this is wrong. It is possible to write a compiler in Scala, after all, Scala is Turing-complete (i.e. you can write anything), and you don't even need Turing-completeness for a compiler.
Some examples of compilers written in Scala include
the Scala compiler itself (in all its variations, Scala-JVM, Scala.js, Scala-native, Scala-virtualized, Typelevel Scala, the abandoned Scala.NET, …)
the Dotty compiler
Scalisp
Scalispa
… and many others …

Why do web development frameworks tend to work around the static features of languages?

I was a little surprised when I started using Lift how heavily it uses reflection (or appears to), it was a little unexpected in a statically-typed functional language. My experience with JSP was similar.
I'm pretty new to web development, so I don't really know how these tools work, but I'm wondering,
What aspects of web development encourage using reflection?
Are there any tools (in statically typed languages) that handle (1) referring to code from a template page (2) object-relational mapping, in a way that does not use reflection?
Please see lift source. It doesn't use reflection for most of the code that I have studied. Almost everything is statically typed. If you are referring to lift views they are processed as Xml nodes, that too is not reflection.
Specifically referring to the <lift:Foo.bar/> issue:
When <lift:Foo.bar/> is encountered in the code, Lift makes a few guesses, how the original name should have been (different naming conventions) and then calls java.lang.Class.forName to get the class. (Relevant code in LiftSession.scala and ClassHelpers.scala.) It will only find classes registered with addToPackages during boot.
Note that it is also possible (and common) to register classes and methods manually. Convention is still that all transformations must be of the form NodeSeq => NodeSeq because that is the only thing which makes sense for an untyped HTML/XHTML output.
So, what you have is Lift‘s internal registry of node transformations on one side, and on the other side the implicit registry of the module. Both types use a simple string lookup to execute a method. I guess it is arguable if one is more reflection based than the other.

What parts of the Java ecosystem and language should a developer learn to get the most out of Scala?

Many of the available resources for learning Scala assume some background in Java. This can prove challenging for someone who is trying to learn Scala with no Java background.
What are some Java-isms a new Scala developer should know about as they learn the language?
For example, it's useful to know what a CLASSPATH is, what the java command line options are, etc...
That's a really great question! I've never thought about people learning Java just so they have it easier to learn Scala...
Apart from all the basics like for loops and such, learning Java Generics can be really helpful. The Scala equivalent is much more potent (and much harder to understand) than Java Generics. You might want to try to figure out where the limits of Java Generics are, and then in which cases Scala's type constructors can be used to overcome those limitations. At the more basic level, it is important to know why Generics are necessary, and how Java is a strongly typed language.
Java allows you to have multiple constructors for one class. This knowledge will be of no use when you learn Scala, because Scala has another way that allows you to offer several methods to create instances of a class. So, you'd rather not have a deep look into this Java concept.
Here are some concepts that differ very strongly between Java and Scala. So, if you learn the Java concepts and then later on want to learn the equivalent in Scala, you should be aware that the Scala equivalent differs so greatly from the Java version that a typical Java developer will have some difficulty to adapt to the Scala way of thinking. Still, it usually helps to first get used to the Java way, because it is usually simpler and easier to learn. I personally prefer to think of Java as the introductory course, and Scala is the pro version.
Java mutable collection concept vs. Scala mutable/immutable differentiation
static methods (Java) vs. singleton objects (Scala)
for loops
Java return statement vs. Scala functional style ("every expression returns a value")
Java's use of null for "no value" vs. Scala's more explicit Option type
imports
Java's switch vs. Scala's match
And here is a list of stuff that you will probably use from the Java standard library, even if you develop in Scala:
IO
GUI (Scala has a wrapper for Swing, but hey)
URLs, URIs, files
date
timers
And finally, some of Scala's features that have no direct equivalent in Java or the Java standard library:
operator overloading
implicits and implicit conversions
multiple argument lists / currying
anonymous functions / functions as values
actors
streams
Scala pattern matching (which rocks)
traits
type inference
for comprehensions
awesome collection operations like fold or map
Of course, all the lists are incomplete. That's just my view on what is important. I hope it helps.
And, by the way: You should definitely know about the class path and other JVM basics.
The standard library, above all else, because that's what Scala has most in common with Java.
You should also get a basic idea of Java's syntax, because a lot of books end up comparing something in Scala to something in Java. But other than the platform and some of the library, they're totally distinctive languages.
There are a few trivial conventions passed from one to the other (like command line options), but as you read books and tutorials on Scala you should pick those up as you go regardless of previous Java experience.
The serie "Scala for Java Refugees" can gives some indications on typical Java topics you are supposed to know and how they translate into Scala.
For instance, the very basic main() Java function which translate into the Application trait, once considered harmful, and now improved (for Scala 2.9 anyway).

What are the key differences between Scala and Groovy? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
On the surface Groovy and Scala look pretty similar, aside from Scala being statically typed, and Groovy dynamic.
What are the other key differences, and advantages each have over the other?
How similar are they really?
Is there competition between the two?
If so, who do you think will win in the long run?
They're both object oriented languages for the JVM that have lambdas and closures and interoperate with Java. Other than that, they're extremely different.
Groovy is a "dynamic" language in not only the sense that it is dynamically typed but that it supports dynamic meta-programming.
Scala is a "static" language in that it is statically typed and has virtually no dynamic meta-programming beyond the awkward stuff you can do in Java. Note, Scala's static type system is substantially more uniform and sophisticated than Java's.
Groovy is syntactically influenced by Java but semantically influenced more by languages like Ruby.
Scala is syntactically influenced by both Ruby and Java. It is semantically influenced more by Java, SML, Haskell, and a very obscure OO language called gBeta.
Groovy has "accidental" multiple dispatch due to the way it handles Java overloading.
Scala is single dispatch only, but has SML inspired pattern matching to deal with some of the same kinds of problems that multiple dispatch is meant to handle. However, where multiple dispatch can only dispatch on runtime type, Scala's pattern matching can dispatch on runtime types, values, or both. Pattern matching also includes syntactically pleasant variable binding. It's hard to overstress how pleasant this single feature alone makes programming in Scala.
Both Scala and Groovy support a form of multiple inheritance with mixins (though Scala calls them traits).
Scala supports both partial function application and currying at the language level, Groovy has an awkward "curry" method for doing partial function application.
Scala does direct tail recursion optimization. I don't believe Groovy does. That's important in functional programming but less important in imperative programming.
Both Scala and Groovy are eagerly evaluated by default. However, Scala supports call-by-name parameters. Groovy does not - call-by-name must be emulated with closures.
Scala has "for comprehensions", a generalization of list comprehensions found in other languages (technically they're monad comprehensions plus a bit - somewhere between Haskell's do and C#'s LINQ).
Scala has no concept of "static" fields, inner classes, methods, etc - it uses singleton objects instead. Groovy uses the static concept.
Scala does not have built in selection of arithmetic operators in quite the way that Groovy does. In Scala you can name methods very flexibly.
Groovy has the elvis operator for dealing with null. Scala programmers prefer to use Option types to using null, but it's easy to write an elvis operator in Scala if you want to.
Finally, there are lies, there are damn lies, and then there are benchmarks. The computer language benchmarks game ranks Scala as being between substantially faster than Groovy (ranging from twice to 93 times as fast) while retaining roughly the same source size. benchmarks.
I'm sure there are many, many differences that I haven't covered. But hopefully this gives you a gist.
Is there a competition between them? Yes, of course, but not as much as you might think. Groovy's real competition is JRuby and Jython.
Who's going to win? My crystal ball is as cracked as anybody else's.
scala is meant to be an oo/functional hybrid language and is very well planned and designed. groovy is more like a set of enhancements that many people would love to use in java.
i took a closer look at both, so i can tell :)
neither of them is better or worse than the other. groovy is very good at meta-programming, scala is very good at everything that does not need meta-programming, so...i tend to use both.
Scala has Actors, which make concurrency much easier to implement. And Traits which give true, typesafe multiple inheritance.
You've hit the nail on the head with the static and dynamic typing. Both are part of the new generation of dynamic languages, with closures, lambda expressions, and so on. There are a handful of syntactic differences between the two as well, but functionally, I don't see a huge difference between Groovy and Scala.
Scala implements Lists a bit differently; in Groovy, pretty much everything is an instance of java.util.List, whereas Scala uses both Lists and primitive arrays. Groovy has (I think) better string interpolation.
Scala is faster, it seems, but the Groovy folks are really pushing performance for the 2.0 release. 1.6 gave a huge leap in speed over the 1.5 series.
I don't think that either language will really 'win', as they target two different classes of problems. Scala is a high-performance language that is very Java-like without having quite the same level of boilerplate as Java. Groovy is for rapid prototyping and development, where speed is less important than the time it takes for programmers to implement the code.
Scala has a much steeper learning curve than Groovy. Scala has much more support for functional programming with its pattern matching and tail based recursion, meaning more tools for pure FP.
Scala also has dynamica compilation and I have done it using twitter eval lib (https://github.com/twitter/util ). I kept scala code in a flat file(without any extension) and using eval created scala class at run time.
I would say scala is meta programming and has feature of dynamic complication