I'm new to scala and struggling with the documentation a little bit. I was looking at a piece of code in the spark codebase (cosine similarity for RowMatrix) and saw that they use Iterator.tabulate. Not knowing what that function does I looked in the scala API docs, only to find out the function does not exist. Except that it does exist, because I can use it in the repl (hmm, maybe I'm looking at the wrong API docs version ... no, this this is the current version).
After a bit of searching I find out that tabulate is defined (at least) in scala.collection.generic.SeqFactory and scala.collection.generic.TraversableFactory. These two however appear not the be connected in the dependency graph. I can't find any path between the two, and hence no way of actually knowing - from looking at the API docs - that .tabulate even exists.
So the question is: how do you find .tabulate and it's documentation from looking at the API docs for the class (say Iterator or Seq). Do I just have to google my way around it, or is there some magic button in the scala docs that will make the thing appear?
This doesn't seem to be limited to just .tabulate but a more common issue (at least for me), looking at library code functions seem to exist that are never mentioned in the API. Another example is
org.apache.spark.mllib.linalg.distributed.RowMatrix.toBreeze
I still don't know if that function exists, some code seems to use it, but I can't find any documentation about it.
In Scala source code all logic of Iterator defined in one file Iterator.scala. Function tabulate that you're looking for is defined in object Iterator in Scala API you make search by trait Iterator so this is why you can't find it.
In right corner of doc you can switch to object iterator and here you will find Iterator$#tabulate util function.
Related
Is there anyway we can read scala doc comments using reflection. My requirement is to read the #group tag value and use it for counting how many functions are there for each group
No, you can't use Scala reflection to access documentation comments. The reason is simple: comments are, almost by definition, not part of the program. Therefore, it is logically impossible for them to be available via reflection.
In Python, for example, documentation is available from the running program (in fact, even without using reflection), because the documentation is not hidden away in comments, but rather simply assigned to a field of the object that is being documented. Many Lisps (e.g. Clojure), and also Ioke and Seph work that way, too.
In Newspeak, what they call "comments" is available using reflection, but that's because what they call "comments" are not really comments, it is more like arbitrary metadata that can be attached to objects. It is in fact more similar to an annotation in Scala than a comment.
In Scala, documentation is written in comments, and comments are not part of the program (they are literally equivalent to whitespace in the Scala Language Specification), and therefore, cannot possibly be part of the program and thus cannot possibly be accessed via reflection.
I have read various old StackOverflow discussions on this general topic but there is still one part of the puzzle which appears, to me at least, to be missing.
It is simply this: what is the actual mechanism by which the anonymous function is serialized? And, where could we find its source code?
Or is it all just magic?
Other relevant SO articles (the third of these itself points to some useful articles outside StockOverflow):
Serialization of Scala Functions
Why Scala can serialize...
How to serialize functions in Scala
I'm going to answer my own question with what, I believe is the correct answer. The reason I'm doing it this way is that it seems to me that this aspect of serialization is never explained and it does appear to work just by magic. I essentially confirmed (to my satisfaction) the answer as part of the research I was doing to ensure that my question above was indeed appropriate.
But the main reason I'm offering my own answer is that I invite knowledgeable users either to agree with it, to correct it, to expand upon it, or to destroy it. Here goes...
It's all magic. No, I'm just kidding. But essentially the mechanism, once Scala has taken the step of representing the anonymous function as a Class, is entirely provided for by Java. In addition, we, the programmer, need to ensure that an anonymous function is as much pure code as possible: no references to any objects that might not be serializable. The secret sauce is to be found in the Java class: ObjectStreamClass. Which, in turn, is invoked by the Java serialization classes: ObjectInputStream and ObjectOutputStream.
Essentially the serialized bytes contain the full pathname of the class, its serialVersionUID, and whatever other relevant information is necessary. When deserializing, the system will simply look up the class in the appropriate classpath and return a reference to it. This obviously assumes that the deserializing system has the class in its classpath. The mechanism for that is a little beyond the scope of my research but it's clear that in a system like Spark, it should be easy to arrange.
No (additional) compilation/decompilation of byte code is necessary as the classLoader has everything necessary. I'm slightly surprised to find the ObjectStreamClass in java.io rather than in the reflection package, but I suppose there's an argument for it being there, given the tight coupling with ObjectInputStream and ObjectOutputStream.
One thing to keep in mind is that while we think in terms of serializing/deserializing objects, rather than classes, what we are dealing with here is an object of type Class.
One more thing to note is that in Scala 2.12, anonymous functions are now implemented differently: as Java8 lambdas. This has broken the mechanism described above in a rather serious way. So serious, that Spark is currently having trouble supporting Scala 2.12. The holdup appears to be this issue: SPARK-14540.
I was doing some research of how to solve this question. However, I am wondering if I can start learning how the function works, or how they pass the argument into the local scope by reading the source code of scala.
I know the source code of scala is hosted in Github, my question is how to locate the definition of def.
Or more generally, how to locate the source code of certain built in functions, operators?
The source code for everything in the Scala standard library is under https://github.com/scala/scala/tree/2.11.x/src/library/scala.
Also, the Scaladoc for the standard library includes links to the source code. So e.g. if you're interested in scala.Option and you're looking at http://www.scala-lang.org/api/2.11.7/#scala.Option, notice that page has "Source: Option.scala" where "Option.scala" is hyperlinked to the source code.
For something like def, which is not part of the standard library, but part of the language, well... there is no single place where def itself is defined. The compiler has 25 phases (you can list them by running scalac -Xshow-phases) and basically every phase participates in the job of making def mean what it means.
If you want to understand def, you'd probably be better off reading the Scala Language Specification; it's highly technical, but still much more approachable than the source code for the compiler.
The part of the spec that addresses your question about named and default arguments is SLS 6.6.1.
I have just finished the first version of a Java 6 compiler plugin, that automatically generates wrappers (proxy, adapter, delegate, call it what you like) based on an annotation.
Since I am doing mixed Java/Scala projects, I would like to be able to use the same annotation inside my Scala code, and get the same generated code (except of course in Scala). That basically means starting from scratch.
What I would like to do, and for which I haven't found an example yet, is how do I generate the code inside a Scala compiler plugin in the same way as in the Java compiler plugin. That is, I match/find where my annotation is used, get the AST for the annotated interface, and then ask the API to give me a Stream/Writer in which I output the generated Scala source code, using String manipulation.
That last part is what I could not find. So how do I tell the API to create a new Scala source file, and give me a Stream/Writer/File/Handle, so I can just write in it, and when I'm done, the Scala compiler compiles it, within the same run in which the plugin was invoked?
Why would I want to do that? Firstly, because than both plugins have the same structure, so maintenance is easy. Secondly, I want to open source it, and there is just no way to support every option that anyone would want, so I expect potential users to want to extend the generation with their own code. This will be a lot easier for them if they just have to do some printf(), instead of learning the AST API (this also applies to me).
Short answer:
It can't be done
Long answer:
You could conceivably generate your source file and push that through a parser instance within your plugin. But not in any way that's likely to be of any use to you, because you'd now have a bigger problem to contend with:
In order to grab all the type/name information for generating the delagate/proxy, you'll have to pick up the annotated type's AST after it has run through both the namer and typer phases (which are inseperable). The catch is that any attempts to call your generated code will already have failed typechecking, the compiler will have thrown an error, and any further bets are off.
Method synthesis is possible in limited cases, so long as you can somehow fool the typechecker for just long enough to get your code generated, which is the trick I pulled with my Autoproxy 'lite' plugin. Even then, you're far better off working with TreeDSL to generate code instead of pumping out raw source.
Kevin is entirely correct, but just for completeness it's worth mentioning that there is another alternative - write a compiler plugin that generates source. This is the approach that I've adopted in Borachio. It's not a very satisfactory solution, but it can be made to work.
Edit - I just reread your question and realised that you're actually asking about generating source anyway
So there is no support for this directly, but it's basically just a question of opening a file and writing the relevant "print" statements. There's no way to invoke the compiler "inside" a plugin AFAIK, but I've written an sbt plugin which hides most of the complexity of invoking the compiler twice.
I use a lot of scala maps, occasionally I want to pass them in as a map to a legacy java api which wants a java.util.Map (and I don't care if it throws away any changes).
An excellent library I have found that does a better job of this:
http://github.com/jorgeortiz85/scala-javautils
(bad name, awesome library). You explicitly invoke .asJava or .asScala depending on what direction you want to go. No surprises.
Scala provides wrappers for Java collections so that they can be used as Scala collections but not the other way around. That being said it probably wouldn't be hard to write your own wrapper and I'm sure it would be useful for the community. This question comes up on a regular basis.
This question and answer discuss this exact problem and the possible solutions. It advises against transparent conversions as they can have very strange side-effects. It advocates using scala-javautils instead. I've been using them in a large project for a few months now and have found them to be very reliable and easy to use.