I am very new to Scala and Spark, and I found a strange piece of syntax in the Scala code used by the Apache Beam project that I can't understand.
Here is the strange place:
JavaDStream<Metadata> metadataDStream = mapWithStateDStream.map(new Tuple2MetadataFunction());
// register ReadReportDStream to report information related to this read.
new ReadReportDStream(metadataDStream.dstream(), id, getSourceName(source, id), stepName)
.register();
In the code above, you can see that the first argument passed to the ReadReportDStream constructor is
metadataDStream.dstream()
If you go to the definition of dstream(), you will see the following code:
class JavaDStream[T](val dstream: DStream[T])(implicit val classTag: ClassTag[T])
extends AbstractJavaDStreamLike[T, JavaDStream[T], JavaRDD[T]] {
I am wondering why it uses "metadataDStream.dstream()" in the constructor instead of "metadataDStream.dstream"? What does the "()" do?
It's mostly a question of convention. Methods with empty parameter lists are evaluated for their side-effects. Methods without parameters are assumed to be purely functional, and free of side-effects. You can read more about that here - https://docs.scala-lang.org/style/method-invocation.html (Arity-0 section)
So in that case, calling metadataDStream.dstream() probably involves some side effects. Syntactically, though, writing it as metadataDStream.dstream would not be an error.
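As a rough illustration of the arity-0 convention (the class and method names below are made up for the example, not taken from Spark or Beam):

class Counter {
  private var n = 0
  def increment(): Unit = n += 1   // declared with (): callers write increment() to signal a side effect
  def current: Int = n             // declared without a parameter list: reads like a pure accessor
}

val c = new Counter
c.increment()   // parentheses advertise "this does something"
c.current       // no parentheses: just returns a value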
Recently, I developed a Spark Streaming application using Scala and Spark. In this application, I made extensive use of implicit classes (the "pimp my library" pattern) to implement more general utilities, such as writing a DataFrame to HBase by creating an implicit class that adds methods to Spark's DataFrame. For example,
implicit class DataFrameExtension(private val dataFrame: DataFrame) extends Serializable {
  ..... // Custom methods to perform some computations
}
However, a senior architect on my team refactored the code (citing style mismatch and performance as reasons) and moved these methods to a new class. Now, these methods accept a DataFrame as an argument.
Can anyone help me with the following questions?
Whether Scala's implicit classes create any overhead at run time?
Does moving a DataFrame object between methods create any overhead, either in terms of method calls or serialization?
I have searched a bit, but couldn't find any style guide that gives guidelines on using implicit classes or methods over traditional methods.
Thanks in advance.
Whether Scala's implicit classes create any overhead at run time?
Not in your case. There is some overhead when the wrapped type is an AnyVal (and thus needs to be boxed). Implicits are resolved at compile time, and apart from maybe a few virtual method calls there should be no overhead.
Does moving a DataFrame object between methods create any overhead, either in terms of method calls or serialization?
No, no more than any other type. Obviously there will be no serialization.
... if I pass DataFrames between methods in Spark code, it might create a closure and, as a result, pull in the parent class that holds the DataFrame object.
Only if you reference scoped variables inside your DataFrame operations, for example filter($"col" === myVar) where myVar is declared in the scope of the enclosing method or class. In that case, Spark might serialize the wrapping class, but it's easy to avoid. Keep in mind that DataFrames are passed around quite often, and quite deep inside Spark code itself, and probably in every other library you might be using (data sources, for example).
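As a rough sketch of that capture-and-avoid pattern (the class and column names are invented for the example, and it uses a lambda-based filter, where the closure really is shipped to executors):

import org.apache.spark.sql.{DataFrame, Dataset, Row}

class ReportJob {
  val threshold: Int = 10   // field of the enclosing class

  // The lambda references `threshold`, i.e. `this.threshold`, so Spark tries to
  // serialize the whole ReportJob instance to ship the closure to executors
  // (and fails with "Task not serializable" if it can't).
  def filterCapturingThis(df: DataFrame): Dataset[Row] =
    df.filter(row => row.getAs[Int]("value") >= threshold)

  // Copying the field into a local val first means only that Int is captured.
  def filterCapturingLocal(df: DataFrame): Dataset[Row] = {
    val localThreshold = threshold
    df.filter(row => row.getAs[Int]("value") >= localThreshold)
  }
}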
It is very common (and handy) to use extension implicit classes like you did.
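For completeness, here is a minimal, self-contained sketch of that extension pattern (the method is invented; the HBase-writing logic from the question is not reproduced):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object DataFrameSyntax {
  implicit class DataFrameExtension(private val dataFrame: DataFrame) extends Serializable {
    // Example extension method: double the values of a numeric column.
    def withDoubledColumn(name: String): DataFrame =
      dataFrame.withColumn(name, col(name) * 2)
  }
}

// Usage: import DataFrameSyntax._ and then myDf.withDoubledColumn("amount"),
// which the compiler rewrites to new DataFrameExtension(myDf).withDoubledColumn("amount").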
We are pretty familiar with implicits in Scala by now, but macros are still fairly unexplored territory (at least for me) and, despite the presence of some great articles by Eugene Burmako, it is still not easy material to just dive into.
In this particular question I'd like to find out whether it is possible to achieve functionality analogous to the following code using just macros:
implicit class Nonsense(val s: String) {
  def ##(i: Int) = s.charAt(i)
}
So "asd" ## 0 will return 'a', for example. Can I implement macros that use infix notation? The reason to this is I'm writing a DSL for some already existing project and implicits allow making the API clear and concise, but whenever I write a new implicit class, I feel like introducing a new speed-reducing factor. And yes, I do know about value classes and stuff, I just think it would be really great if my DSL transformed into the underlying library API calls during compilation rather than in runtime.
TL;DR: can I replace implicits with macros while not changing the API? Can I write macros in infix form? Is there something even more suitable for this case? Is the trouble worth it?
UPD. To those advocating value classes: in my case I have a little more than just a simple wrapper - the wrappers are often stacked. For example, I have an implicit class that takes some parameters and returns a lambda wrapping those parameters (i.e. a partially applied function), and a second implicit class made specifically for wrapping that type of function. I can achieve something like this:
a --> x ==> b
where the first class wraps a and adds the --> method, and the second one wraps the return type of a --> x and defines ==>(b). Plus, it may really be the case that the user creates a considerable number of objects in this fashion. I just don't know if this will be efficient, so if you could tell me that value classes cover this case, I'd be really glad to know.
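To make the shape concrete, here is a rough sketch of how such a stacked pair of wrappers might look with implicit value classes (all names and the Int-based semantics are invented purely for illustration):

object DslSyntax {
  // First wrapper: adds --> and returns a function that the next class will wrap.
  implicit class ArrowSource(val a: Int) extends AnyVal {
    def -->(x: Int): Int => Int = b => a + x + b
  }

  // Second wrapper: wraps the function returned by --> and adds ==>.
  implicit class ArrowTarget(val f: Int => Int) extends AnyVal {
    def ==>(b: Int): Int = f(b)
  }
}

// import DslSyntax._
// 1 --> 2 ==> 3   // parses as (1 --> 2) ==> 3 and expands to
//                 // ArrowTarget(ArrowSource(1).-->(2)).==>(3), yielding 6

Value classes avoid the wrapper allocation in many (though not all) situations; note that the closure returned by --> is still a real object at runtime.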
Back in the day (2.10.0-RC1) I had trouble using implicit classes for macros (sorry, I don't recollect why exactly) but the solution was to use:
an implicit def macro to convert to a class
a def macro in that class to define the infix operator
So something like the following might work for you:
implicit def toNonsense(s: String): Nonsense = macro ...
...
class Nonsense(...) {
  ...
  def ##(...): ... = macro ...
  ...
}
That was pretty painful to implement. That being said, macros have become easier to write since then.
If you want to check what I did (though I'm not sure it applies to what you want to do), refer to this excerpt of my code (non-idiomatic style).
I won't address the relevance of that approach here, as others have already commented on it.
Within an annotation macro, I'm enumerating the members of a class and want the types of the methods that I find.
So I happily iterate over the body of the class, and collect all the DefDef members.
... which I can't typecheck.
For each DefDef I've tried wrapping it in an Expr and using actualType. I've tried duplicating the thing and transplanting it into an ad-hoc class (via quasiquotes). I've tried everything else I can think of :)
The best I can get is either NoType or Any, depending on the technique used. The worst I get is to have an exception thrown at me.
These are simple methods, of the form def foo(i: String) = i, so the return type needs to be inferred, but there's no external information required. There are no abstract types, or type params, or other members of the class involved here. I'd like to handle more advanced cases later, but want to have these trivial examples working first.
In a plugin, this would be simple. I'd just typecheck the entire unit with errors suppressed and get at what I want through the symbols, then reset the tree attributes for subsequent processing. As a macro... I'm stumped.
What am I missing?
In a macro it's much the same. Instead of typed as in plugins, you call c.typeCheck, but you have to be careful not to fall into a trap (https://github.com/scalamacros/paradise/issues/1) that is supposed to be fixed in 2.10.5 and 2.11.0. After a successful return from c.typeCheck, you can get access to the symbols and do all the usual stuff.
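As a rough sketch of that approach (assuming a 2.10-era macro Context named c; the helper name is invented, and the rhs is typechecked on a duplicate so the original tree's attributes are left untouched):

import scala.reflect.macros.Context

// Illustrative helper: recover the inferred type of a collected DefDef's body.
def inferredRhsType(c: Context)(defDef: c.universe.DefDef): c.universe.Type = {
  import c.universe._
  // May still hit the trap mentioned above on older 2.10.x releases.
  val typedRhs = c.typeCheck(defDef.rhs.duplicate)
  typedRhs.tpe
}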
So I'm playing with writing a Battlecode player in Scala. In Battlecode, certain classes are disallowed, and there is a runtime exception if you ever try to access them. When I use the Array.fill function, I get a message from the Battlecode server saying [java] Illegal class: scala/reflect/Manifest$. This is the offending line:
val g_score = Array.fill[Int](rc.getMapWidth(), rc.getMapHeight())(0)
The method takes an implicit ClassManifest argument which has the following documentation:
A ClassManifest[T] is an opaque descriptor for type T. It is used by the compiler to preserve information necessary for instantiating Arrays in those cases where the element type is unknown at compile time.
But I do know the type of the array elements at compile time; as shown above, I explicitly state that they will be Int. Is there a way to avoid this? As a workaround I've written my own version of Array.fill, but this seems like a hack. As an aside, does Scala have real 2D arrays? Array.fill seems to return an Array[Array[T]], which is also the only way I found to write my own version. That too seems inelegant.
Edit: Using Scala 2.9.1
For background information, see this related question: What is a Manifest in Scala and when do you need it?. There you will find an explanation of why manifests are needed for arrays.
In short: Although the JVM uses type erasure, arrays are an exception and need a manifest. Since you could compile your code, that manifest was found (manifests are always available for proper types). Your error occurs at runtime.
I don't know the details of the Battlecode server, but there are two possibilities: either you are running your compiled classes against a binary-incompatible version of Scala (a difference in major version, e.g. compiled with Scala 2.9 while the server uses 2.10), or the server doesn't even have scala-library.jar on its class path.
As said in the comment, manifests are deprecated in Scala 2.10 and replaced by ClassTag.
EDIT: So it seems the class loader is artificially restricting the allowed classes. My suggestion would be: add a helper Java class. You can easily mix Java and Scala code. If it's just about the Int array instantiation, you could provide something like:
public class Helper {
    public static int[][] makeArray(int d1, int d2) { return new int[d1][d2]; }
}
(hope that's valid java code, a bit rusty)
Also, have you tried creating the outer array with new Array[Array[Int]](d1) and then iterating to create the inner arrays?
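A minimal sketch of that suggestion in Scala (the width and height literals stand in for the rc.getMapWidth()/rc.getMapHeight() calls from the question):

// Build the 2D array manually so no implicit Manifest/ClassTag is summoned:
// new Array[T](n) with a concrete element type compiles without a manifest.
val width  = 60   // e.g. rc.getMapWidth()
val height = 60   // e.g. rc.getMapHeight()
val g_score = new Array[Array[Int]](width)
var i = 0
while (i < width) {
  g_score(i) = new Array[Int](height)  // Int elements default to 0
  i += 1
}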
In groovy one can do:
class Foo {
Integer a,b
}
Map map = [a:1,b:2]
def foo = new Foo(map) // map expanded, object created
I understand that Scala is not, in any sense of the word, Groovy, but I am wondering whether map expansion in this context is supported.
Simplistically, I tried and failed with:
case class Foo(a:Int, b:Int)
val map = Map("a"-> 1, "b"-> 2)
Foo(map: _*) // no dice, always applied to first property
A related thread that shows possible solutions to the problem.
Now, from what I've been able to dig up, as of Scala 2.9.1 at least, reflection in regard to case classes is basically a no-op. The net effect then appears to be that one is forced into some form of manual object creation, which, given the power of Scala, is somewhat ironic.
I should mention that the use case involves the servlet request parameters map. Specifically, using Lift, Play, Spray, Scalatra, etc., I would like to take the sanitized params map (filtered via the routing layer) and bind it to a target case class instance without needing to manually create the object or specify its types. This would require "reliable" reflection and implicits like "str2Date" to handle type conversion errors.
Perhaps in 2.10, with the new reflection library, implementing the above will be a piece of cake. I'm only two months into Scala, so I'm just scratching the surface; I do not see any straightforward way to pull this off right now (for seasoned Scala developers it may be doable).
Well, the good news is that Scala's Product interface, implemented by all case classes, actually doesn't make this very hard to do. I'm the author of a Scala serialization library called Salat that supplies some utilities for using pickled Scala signatures to get typed field information.
https://github.com/novus/salat - check out some of the utilities in the salat-util package.
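As a rough illustration of the runtime-reflection route the question alludes to for 2.10 (this is not how Salat does it; fromMap is an invented helper, and the API names below are for Scala 2.11+ and differ slightly in 2.10):

import scala.reflect.runtime.{universe => ru}

// Instantiate a case class from a Map keyed by constructor-parameter name.
def fromMap[T: ru.TypeTag](values: Map[String, Any]): T = {
  val mirror      = ru.runtimeMirror(getClass.getClassLoader)
  val classSymbol = ru.typeOf[T].typeSymbol.asClass
  val classMirror = mirror.reflectClass(classSymbol)
  val ctor        = ru.typeOf[T].decl(ru.termNames.CONSTRUCTOR).asMethod
  val ctorMirror  = classMirror.reflectConstructor(ctor)
  // Throws if a parameter is missing from the map; there are no compile-time
  // guarantees, as the other answer below points out.
  val args = ctor.paramLists.flatten.map(p => values(p.name.toString))
  ctorMirror(args: _*).asInstanceOf[T]
}

case class Foo(a: Int, b: Int)
// fromMap[Foo](Map("a" -> 1, "b" -> 2))  // Foo(1,2)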
Actually, I think this is something that Salat should do - what a good idea.
Re: D.C. Sobral's point about the impossibility of verifying params at compile time - point taken, but in practice this should work at runtime just like deserializing anything else with no guarantees about structure, like JSON or a Mongo DBObject. Also, Salat has utilities to leverage default args where supplied.
This is not possible, because it is impossible to verify at compile time that all parameters were passed in that map.