I have read various old StackOverflow discussions on this general topic but there is still one part of the puzzle which appears, to me at least, to be missing.
It is simply this: what is the actual mechanism by which the anonymous function is serialized? And, where could we find its source code?
Or is it all just magic?
Other relevant SO articles (the third of these itself points to some useful articles outside StackOverflow):
Serialization of Scala Functions
Why Scala can serialize...
How to serialize functions in Scala
I'm going to answer my own question with what I believe is the correct answer. The reason I'm doing it this way is that it seems to me that this aspect of serialization is never explained, and it does appear to work just by magic. I essentially confirmed (to my satisfaction) the answer as part of the research I was doing to ensure that my question above was indeed appropriate.
But the main reason I'm offering my own answer is that I invite knowledgeable users either to agree with it, to correct it, to expand upon it, or to destroy it. Here goes...
It's all magic. No, I'm just kidding. But essentially the mechanism, once Scala has taken the step of representing the anonymous function as a class, is entirely provided for by Java. In addition, we, the programmers, need to ensure that an anonymous function is as much pure code as possible: no references to any objects that might not be serializable. The secret sauce is to be found in the Java class ObjectStreamClass, which, in turn, is invoked by the Java serialization classes ObjectInputStream and ObjectOutputStream.
Essentially the serialized bytes contain the fully qualified name of the class, its serialVersionUID, and whatever other relevant information is necessary. When deserializing, the system will simply look up the class in the appropriate classpath and return a reference to it. This obviously assumes that the deserializing system has the class in its classpath. The mechanism for that is a little beyond the scope of my research, but it's clear that in a system like Spark, it should be easy to arrange.
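To make the round trip concrete, here is a minimal sketch using nothing but the java.io classes named above. The object name FunctionRoundTrip and the helpers toBytes and fromBytes are made up for illustration; it assumes both ends run in the same JVM, so the function's class is trivially on the classpath.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helper object, not part of Scala or Spark.
object FunctionRoundTrip {
  // Serialize any object to bytes with plain Java serialization.
  def toBytes(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Read the object back; the class must be loadable on this side.
  def fromBytes[A](bytes: Array[Byte]): A = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  // A function literal that captures nothing, so it is "pure code".
  val addOne: Int => Int = (x: Int) => x + 1

  def main(args: Array[String]): Unit = {
    val restored = fromBytes[Int => Int](toBytes(addOne))
    println(restored(41))
  }
}
```

If the function literal captured a non-serializable value, I would expect writeObject to throw a NotSerializableException, which is the failure that the "pure code" advice is meant to avoid.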
No (additional) compilation/decompilation of byte code is necessary as the classLoader has everything necessary. I'm slightly surprised to find the ObjectStreamClass in java.io rather than in the reflection package, but I suppose there's an argument for it being there, given the tight coupling with ObjectInputStream and ObjectOutputStream.
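For anyone who wants to poke at ObjectStreamClass directly, a small sketch follows. Point is a made-up case class (case classes are Serializable by default), and DescriptorSketch is just a hypothetical wrapper; the calls themselves are the standard java.io API.

```scala
import java.io.ObjectStreamClass

// Hypothetical wrapper object for illustration only.
object DescriptorSketch {
  // A made-up serializable type to inspect.
  case class Point(x: Int, y: Int)

  // The class descriptor that Java serialization would write for Point.
  // lookup returns null for classes that are not serializable.
  val descriptor: ObjectStreamClass = ObjectStreamClass.lookup(classOf[Point])

  def main(args: Array[String]): Unit = {
    println(descriptor.getName)             // the class name stored in the stream
    println(descriptor.getSerialVersionUID) // declared, or computed from the class
    println(descriptor.getFields.map(_.getName).mkString(", "))
  }
}
```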
One thing to keep in mind is that while we think in terms of serializing/deserializing objects, rather than classes, what we are dealing with here is an object of type Class.
One more thing to note is that in Scala 2.12, anonymous functions are now implemented differently: as Java 8 lambdas. This has broken the mechanism described above in a rather serious way. So serious that Spark is currently having trouble supporting Scala 2.12. The holdup appears to be this issue: SPARK-14540.
Is there any way we can read Scaladoc comments using reflection? My requirement is to read the #group tag value and use it to count how many functions there are in each group.
No, you can't use Scala reflection to access documentation comments. The reason is simple: comments are, almost by definition, not part of the program. Therefore, it is logically impossible for them to be available via reflection.
In Python, for example, documentation is available from the running program (in fact, even without using reflection), because the documentation is not hidden away in comments, but rather simply assigned to a field of the object that is being documented. Many Lisps (e.g. Clojure), and also Ioke and Seph work that way, too.
In Newspeak, what they call "comments" is available using reflection, but that's because what they call "comments" are not really comments, it is more like arbitrary metadata that can be attached to objects. It is in fact more similar to an annotation in Scala than a comment.
In Scala, documentation is written in comments, and comments are not part of the program (they are literally equivalent to whitespace in the Scala Language Specification). Therefore, they cannot possibly be accessed via reflection.
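As a rough illustration of the difference between a comment and an annotation, here is a sketch in which one method has only a Scaladoc comment and the other carries java.lang.Deprecated, chosen only because it is a standard annotation with runtime retention. The names AnnotationVsComment, Api, documented and annotated are invented for the example; a real grouping scheme would need its own runtime-retained annotation.

```scala
// Hypothetical example types, for illustration only.
object AnnotationVsComment {
  class Api {
    /** This Scaladoc comment is discarded by the compiler. */
    def documented(): Int = 1

    // java.lang.Deprecated has runtime retention, so it is kept in the class file.
    @Deprecated
    def annotated(): Int = 2
  }

  // Names of Api's methods that carry the runtime-visible annotation.
  def annotatedMethods: Set[String] =
    classOf[Api].getDeclaredMethods
      .filter(_.isAnnotationPresent(classOf[Deprecated]))
      .map(_.getName)
      .toSet

  def main(args: Array[String]): Unit =
    println(annotatedMethods)
}
```

Nothing comparable exists for the comment on documented(): there is no reflective call that returns it, because it never reaches the class file.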
I have a very simple question. This is not only true with spray-json but I have read similar claims with argonaut and circe. So please enlighten me.
In spray-json, I have come across the statement saying There is no reflection involved. I understand for type class based approach, if the user provides JsonFormat then all is well. But is this claim also true when it comes to using DefaultJsonProtocol?
Because when you look at this, you can see the usage of clazz.getMethods, clazz.getDeclaredFields, etc. Isn't this the usage of reflection? Of course, thanks to object#apply, we do not need to worry about setting fields, unlike in the Java world using reflection. But at least for reading the field names, I do not understand how reflection can be overlooked.
I'm not very familiar with spray-json, so I won't defend its claims about reflection, which definitely seem to be at odds with the parts of ProductFormats you point to.
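For reference, this is roughly what reading field names through runtime reflection looks like. It is a sketch of the general technique only, not spray-json's actual code; Person and fieldNames are hypothetical names.

```scala
import java.lang.reflect.Modifier

// Hypothetical example, not taken from spray-json.
object RuntimeFieldNames {
  case class Person(name: String, age: Int)

  // Ask the JVM, at runtime, which instance fields a class declares.
  def fieldNames(clazz: Class[_]): Set[String] =
    clazz.getDeclaredFields
      .filterNot(f => f.isSynthetic || Modifier.isStatic(f.getModifiers))
      .map(_.getName)
      .toSet

  def main(args: Array[String]): Unit =
    println(fieldNames(classOf[Person]))
}
```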
I do know more about circe and Argonaut and argonaut-shapeless and Play JSON, all of which do use a kind of reflection to derive codecs for case classes and other user-defined types. The important point is that these libraries don't use runtime reflection—they determine the field names and other information they need at compile time through Scala's macro system.
Generally when people talk about "reflection" in the context of Java or Scala, they mean runtime reflection, but macros also support a kind of reflection, so when I personally talk about how derivation works in these libraries, I try to be careful to specify that there's no runtime reflection involved.
You can argue that compile-time reflection (or metaprogramming, or whatever you want to call it) is much less bad than runtime reflection. It may make your code more complex, and it's very easy to abuse, but it doesn't introduce the same kinds of fragility as runtime reflection, and it doesn't undermine your ability to reason about your code in the same ways that runtime reflection does. If you understand what the macro does (which is a big if), you'll never be surprised at runtime.
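To show what "no runtime reflection" means in practice, here is a hand-written type class instance of the kind a derivation macro might roughly generate for you. Encoder, toJson and Person are invented names, not the API of circe, Argonaut or spray-json, and the encoder does no string escaping.

```scala
// A minimal sketch with hypothetical names; not any library's real API.
object TypeClassSketch {
  trait Encoder[A] { def encode(value: A): String }

  case class Person(name: String, age: Int)

  // The field names are fixed in source code, so nothing is looked up at runtime.
  implicit val personEncoder: Encoder[Person] = new Encoder[Person] {
    def encode(p: Person): String = s"""{"name":"${p.name}","age":${p.age}}"""
  }

  // The compiler picks the instance; a missing instance is a compile error.
  def toJson[A](value: A)(implicit encoder: Encoder[A]): String =
    encoder.encode(value)

  def main(args: Array[String]): Unit =
    println(toJson(Person("Ada", 36)))
}
```

The point of the design is that a missing or mismatched encoder shows up when you compile, not when a request arrives.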
Types are fundamentally about rejecting bad potential programs before you run them, and introspection on types at runtime muddles this all up (as Erik Osheim says, "If you meet a Type in the Runtime, kill it"). On the other hand, introspection on types at compile-time is exactly what compilers do, and macros just give you as the programmer a clean way of getting involved in that process (or at least relatively clean, compared to writing compiler plugins, etc.).
There may also be performance benefits to avoiding runtime reflection, but for me personally that's generally a secondary concern—I hate runtime reflection because I've wasted too much of my life debugging horrible Java code that uses horrible Java libraries that depend heavily on runtime reflection—not because runtime reflection might make my programs marginally slower.
That's all a very long-winded way to say that you should read "there is no reflection involved" in this context as "there is no runtime reflection involved" (and even then you shouldn't take the author at their word, I guess, given all that getMethods stuff in spray-json).
The title might be a little confusing, so let me elaborate. I've been reading some criticism regarding Scala. It was an email sent to Typesafe regarding some deficiencies in Scala from Coda Hale (Yammer's Infrastructure Architect), so to quote:
we stopped seeing lambdas as free and started seeing them as syntactic sugar on top of anonymous classes and thus acquired the same distaste for them as we did anonymous classes.
So, from this, I have a couple of questions regarding how lambdas work in Scala:
What is the difference between a free function and a function that is bound to an anonymous class (technically, aren't all functions bound to the main singleton object)?
What is the impact on performance of using an anonymous class bound function instead of a free function?
Yes, lambdas are still objects, instances of anonymous classes.
This is how the JVM works: everything you hold a reference to is an object. You can have either references or values (primitives), and there's no way around it.
Later versions of Java have MethodHandles. But it's worth noting that MethodHandle is also still just an abstract class - albeit one that the JVM specifically knows how to optimise away at runtime.
Also worth noting is that the JVM can often perform escape analysis on abstract classes (such as Scala's functions), and optimise these away too.
On top of this, Scala can use any object with an apply method as though it were a Function. In this case, the explicit call to apply is emitted in the bytecode and you're not dealing with anonymous classes any more.
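A small sketch of the two cases, with made-up names (ApplySketch, Doubler, tripler): a plain object with an apply method that is called with function syntax, and a function literal that really is a Function1 instance.

```scala
// Hypothetical names, for illustration only.
object ApplySketch {
  // Not a Function1: just an object that happens to have an apply method.
  object Doubler {
    def apply(x: Int): Int = x * 2
  }

  // A function literal: an instance of Function1[Int, Int].
  val tripler: Int => Int = x => x * 3

  def main(args: Array[String]): Unit = {
    println(Doubler(21))                            // sugar for Doubler.apply(21)
    println(tripler(14))                            // sugar for tripler.apply(14)
    println(tripler.isInstanceOf[Function1[_, _]])
    // Doubler only becomes a function value when wrapped explicitly.
    val asFunction: Int => Int = Doubler(_)
    println(asFunction(21))
  }
}
```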
Given all of the above, it's impossible to make a general statement regarding the performance of Scala's function implementation; it depends on your specific code and use case. In general, I wouldn't worry unless you hit a corner case where your profiler pinpoints a problem here (which is very unlikely).
Well, in C for example, a function is just a 32- or 64-bit pointer to a place in memory to jump to, and the concept of a closure doesn't really apply since you can't declare an anonymous C function. I don't know how the C++ lambdas work; I guess the compiler makes a method and passes the fields you want in the closure along with parameters. Maybe that's what you're looking for. In the JVM you have to wrap your logic in a class, so now you have a virtual table of methods, fields, and some methods related to synchronization and the type system.
What is the impact on performance? I don't know; have you noticed an impact on performance? A lot of that extra Java stuff I described really isn't needed for an anonymous class and might just get optimized out. I imagine there are butterflies that influence the weather more than the extra JVM stuff would affect your software.
I'm having a hard time deciphering Scala API documentation.
For example, I've defined a timestamp for use in a database.
def postedDate = column[Timestamp]("posted_date", O NotNull, O Default new Timestamp(Calendar.getInstance.getTimeInMillis), O DBType("timestamp"))
If I hadn't read several examples, of which none were in the API doc, how could I construct this statement? From the Column documentation how could I know the parameters?
I guessed it had something to do with TimestampTypeMapperDelegate but it is still not crystal clear how to use it.
The first thing to note from the scaladoc for Column is that it is abstract, so you probably want to deal directly with one of its subclasses. For example, NamedColumn.
Other things to note are that it has a type parameter and the constructor takes an implicit argument of a TypeMapper of the same parameter type. The docs for TypeMapper provide an example of how to create a custom one, but if you look at the subclasses, there are plenty of provided ones (such as timestamp). The fact that the argument is declared as implicit suggests that there could be one in scope, and if so, it will automatically be used as the parameter without explicitly stating that. If there isn't an implicit in scope that satisfies the requirement, you'll have to provide it.
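The implicit-argument pattern described above can be sketched in a few lines. This is a simplified stand-in, not ScalaQuery's real API: Mapper, intMapper, stringMapper and this column method are hypothetical and only mimic the shape of TypeMapper and BasicTable.column.

```scala
// Simplified, hypothetical stand-in for the TypeMapper pattern.
object ImplicitColumnSketch {
  trait Mapper[T] { def sqlType: String }

  // Provided instances that are in implicit scope for the standard types.
  implicit val intMapper: Mapper[Int] = new Mapper[Int] { def sqlType = "INTEGER" }
  implicit val stringMapper: Mapper[String] = new Mapper[String] { def sqlType = "VARCHAR" }

  // The caller names only the type; the compiler supplies the matching Mapper.
  def column[T](name: String)(implicit mapper: Mapper[T]): String =
    s"$name ${mapper.sqlType}"

  def main(args: Array[String]): Unit = {
    println(column[Int]("id"))
    println(column[String]("title"))
  }
}
```

Asking for column[java.util.Date]("when") here would fail to compile, because no Mapper[java.util.Date] is in scope; that is the case where you would have to provide one yourself.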
The next thing to note is that a TypeMapper is a trait that extends a function with an argument of a BasicProfile and a TypeMapperDelegate result. Basically what's going on here is that the definition of a type mapper is separated from its implementation. This is done to support multiple flavors of database. If you look at the subclasses of BasicProfile, it will become apparent that ScalaQuery supports quite a few, and as we know, their implementations are sometimes quite different.
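The "function from profile to delegate" idea can be sketched like this. All names (Profile, ProfileA, ProfileB, Delegate, Mapper, TimestampMapper) and the SQL type strings are illustrative assumptions, not ScalaQuery's actual classes or any real database's behaviour.

```scala
// Hypothetical sketch of separating a mapper's definition from its implementation.
object ProfileSketch {
  sealed trait Profile
  case object ProfileA extends Profile
  case object ProfileB extends Profile

  // The per-database implementation detail.
  trait Delegate { def sqlTypeName: String }

  // The mapper itself is just a function from a profile to a delegate.
  trait Mapper extends (Profile => Delegate)

  object TimestampMapper extends Mapper {
    def apply(profile: Profile): Delegate = profile match {
      case ProfileA => new Delegate { val sqlTypeName = "TIMESTAMP" } // made-up choice
      case ProfileB => new Delegate { val sqlTypeName = "DATETIME" }  // made-up choice
    }
  }

  def main(args: Array[String]): Unit = {
    println(TimestampMapper(ProfileA).sqlTypeName)
    println(TimestampMapper(ProfileB).sqlTypeName)
  }
}
```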
If you chase the docs for a while, you end up at the BasicTypeMapperDelegates trait that has a bunch of vals in it with delegates for each of the basic types (including timestamps).
BasicTable defines a method called column (which you've found), and the intent of the column method is to shield you from having to know anything about TypeMappers and Delegates as long as you are using standard types.
So, I guess to answer your question about whether there is enough information in the API docs, I'd personally say yes, but the docs could be enhanced with better descriptions of classes, objects, traits and methods.
All that said, I've always found that leveraging examples, API docs, and even the source code of the project provides a robust way of getting up to speed on most open source projects. To be quite blunt, many of these projects (including ScalaQuery) have saved me countless hours of work, but probably cost the author(s) countless hours of personal time to create and make available. These are not necessarily commercial products, and we as consumers shouldn't hold them to the same standards that we hold for-fee products. If you find docs inadequate, contribute!
I was a little surprised when I started using Lift how heavily it uses reflection (or appears to); it was a little unexpected in a statically-typed functional language. My experience with JSP was similar.
I'm pretty new to web development, so I don't really know how these tools work, but I'm wondering:
What aspects of web development encourage using reflection?
Are there any tools (in statically typed languages) that handle (1) referring to code from a template page (2) object-relational mapping, in a way that does not use reflection?
Please see the Lift source. It doesn't use reflection for most of the code that I have studied. Almost everything is statically typed. If you are referring to Lift views, they are processed as XML nodes; that too is not reflection.
Specifically referring to the <lift:Foo.bar/> issue:
When <lift:Foo.bar/> is encountered in the code, Lift makes a few guesses as to what the original name should have been (different naming conventions) and then calls java.lang.Class.forName to get the class. (Relevant code in LiftSession.scala and ClassHelpers.scala.) It will only find classes registered with addToPackages during boot.
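The lookup described above boils down to trying a name against a list of packages with Class.forName. The sketch below shows that idea only; findClass and ClassLookupSketch are hypothetical and this is not the code in ClassHelpers.scala. It uses JDK classes so it runs anywhere.

```scala
import scala.util.Try

// Hypothetical sketch of a string-based class lookup; not Lift's implementation.
object ClassLookupSketch {
  // Try each registered package in order and return the first class that loads.
  def findClass(simpleName: String, packages: List[String]): Option[Class[_]] =
    packages.iterator
      .map(pkg => Try[Class[_]](Class.forName(s"$pkg.$simpleName")).toOption)
      .collectFirst { case Some(cls) => cls }

  def main(args: Array[String]): Unit = {
    // "java.lang.ArrayList" does not exist, so the second package wins.
    println(findClass("ArrayList", List("java.lang", "java.util")).map(_.getName))
    println(findClass("NoSuchSnippet", List("java.lang", "java.util")))
  }
}
```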
Note that it is also possible (and common) to register classes and methods manually. Convention is still that all transformations must be of the form NodeSeq => NodeSeq because that is the only thing which makes sense for an untyped HTML/XHTML output.
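Manual registration is, in spirit, just a map from names to transformations. Here is a sketch under loud assumptions: SnippetRegistrySketch, register and run are invented, and String => String stands in for NodeSeq => NodeSeq so the example needs no XML library. It is not Lift's registration API.

```scala
// Hypothetical registry; String => String is a stand-in for NodeSeq => NodeSeq.
object SnippetRegistrySketch {
  type Transform = String => String

  private var registry: Map[String, Transform] = Map.empty

  // Register a transformation under a name, with no class lookup involved.
  def register(name: String)(f: Transform): Unit =
    registry += (name -> f)

  // Dispatch by plain string lookup; unknown names yield None.
  def run(name: String, input: String): Option[String] =
    registry.get(name).map(f => f(input))

  def main(args: Array[String]): Unit = {
    register("shout")(_.toUpperCase)
    println(run("shout", "hello"))
    println(run("missing", "hello"))
  }
}
```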
So, what you have is Lift's internal registry of node transformations on one side, and on the other side the implicit registry of the module. Both types use a simple string lookup to execute a method. I guess it is arguable if one is more reflection based than the other.