I am new to Scala and now I went through a construct like the following:
scala> var a = List(('a',1),('b',2))
I googled this one and it turned out to be a Scala tuple2. My question is:
Is this a special Scala constructs i.e whenever I use ('a',3), scala creates a Tuple2 or there is something configured that I can change to make scala create MyTuple2 instead of Tuple2?
Can I create my own class that makes scala use it whenever I use its constructor?
As pointed out in the comments, Tuple2 is just a class like any other. You can create your own like this:
case class MyTuple2[A, B](a: A, b: B)
val myTuples: List[MyTuple2[String, Int]] = List(MyTuple2("a", 1), MyTuple2("b", 2))
As for the syntax, you cannot override the syntax-sugar of normal tuples (afaik). There are certainly things you could do with implicits or macros to fake it, but I'd strongly encourage you not to do that as it's highly surprising to anyone else if something looks like a standard Tuple2, but doesn't behave like one.
I have a code where I wanted to update an RDD as below:
val xRDD = xRDD.zip(tempRDD)
This gave me the error : recursive value x needs type
I want to maintain the xRDD over iterations and modifying it with tempRDD in each iteration. How can I achieve it?
Thanks in advance.
The compiler is telling you that you're attempting to define a variable with itself and use it in it's own definition within an action. To say this another way, you're attempting to use something that doesn't exist in an action to define it.
Edit:
If you have a list of actions that produce new RDD that you'd like to zip together, perhaps you should look at a Fold:
listMyActions.foldLeft(origRDD){ (rdd, f) =>
val tempRDD = f(rdd)
rdd.zip(tempRDD)
}
Don't forget that vals are immutable, this means that you can't reassign something to a previously defined variable. However if you want to do this, you can replace it for a var, which is not recommended, this question is more related to Scala's feature than to Apache-Spark's one. Besides, if you want more information you can consult this post Use of def val and vars in scala.
I'm admittedly very new to Scala, and I'm having trouble with the syntactical sugar I see in many Scala examples.
It often results in a very concise statement, but honestly so far (for me) a bit unreadable.
So I wish to take a typical use of the Option class, safe-dereferencing, as a good place to start for understanding, for example, the use of the underscore in a particular example I've seen.
I found a really nice article showing examples of the use of Option to avoid the case of null.
https://medium.com/#sinisalouc/demystifying-the-monad-in-scala-cc716bb6f534#.fhrljf7nl
He describes a use as so:
trait User {
val child: Option[User]
}
By the way, you can also write those functions as in-place lambda
functions instead of defining them a priori. Then the code becomes
this:
val result = UserService.loadUser("mike")
.flatMap(user => user.child)
.flatMap(user => user.child)
That looks great! Maybe not as concise as one can do in groovy, but not bad.
So I thought I'd try to apply it to a case I am trying to solve.
I have a type Person where the existence of a Person is optional, but if we have a person, his attributes are guaranteed. For that reason, there are no use of the Option type within the Person type itself.
The Person has an PID which is of type Id. The Id type consists of two String types; the Id-Type and the Id-Value.
I've used the Scala console to test the following:
class Id(val idCode : String, val idVal : String)
class Person(val pid : Id, val name : String)
val anId: Id = new Id("Passport_number", "12345")
val person: Person = new Person(anId, "Sean")
val operson : Option[Person] = Some(person)
OK. That setup my person and it's optional instance.
I learned from the above linked article that I could get the Persons Id-Val by using flatMap; Like this:
val result = operson.flatMap(person => Some(person.pid)).flatMap(pid => Some(pid.idVal)).getOrElse("NoValue")
Great! That works. And if I infact have no person, my result is "NoValue".
I used flatMap (and not Map) because, unless I misunderstand (and my tests with Map were incorrect) if I use Map I have to provide an alternate or default Person instance. That I didn't want to have to do.
OK, so, flatMap is the way to go.
However, that is really not a very concise statement.
If I were writing that in more of a groovy style, I guess i'd be able to do something like this:
val result = person?.pid.idVal
Wow, that's a bit nicer!
Surely Scala has a means to provide something at least nearly as nice as Groovy?
In the above linked example, he was able to make his statement more concise using some of that syntactical sugar I mentioned before. The underscore:
or even more concise:
val result = UserService.loadUser("mike")
.flatMap(_.child)
.flatMap(_.child)
So, it seems in this case the underscore character allows you to skip specifying the type (as the type is inferred) and replace it with underscore.
However, when I try the same thing with my example:
val result = operson.flatMap(Some(_.pid)).flatMap(Some(_.idVal)).getOrElse("NoValue")
Scala complains.
<console>:15: error: missing parameter type for expanded function ((x$2) => x$2.idVal)
val result = operson.flatMap(Some(_.pid)).flatMap(Some(_.idVal)).getOrElse("NoValue")
Can someone help me along here?
How am I misunderstanding this?
Is there a short-hand method of writing my above lengthy statement?
Is flatMap the best way to achieve what I am after? Or is there a better more concise and/or readable way to do it ?
thanks in advance!
Why do you insist on using flatMap? I'd just use map for your example instead:
val result = operson.map(_.pid).map(_.idVal).getOrElse("NoValue")
or even shorter:
val result = operson.map(_.pid.idVal).getOrElse("NoValue")
You should only use flatMap with functions that return Options. Your pid and idVals are not Options, so just map them instead.
You said
I have a type Person where the existence of a Person is optional, but if we have a person, his attributes are guaranteed. For that reason, there are no use of the Option type within the Person type itself.
This is the essential difference between your example and the User example. In the User example, both the existence of a User instance, and its child field are options. This is why, to get a child, you need to flatMap. However, since in your example, only the existence of a Person is not guaranteed, after you've retrieved an Option[Person], you can safely map to any of its fields.
Think of flatMap as a map, followed by a flatten (hence its name). If I mapped on child:
val ouser = Some(new User())
val child: Option[Option[User]] = ouser.map(_.child)
I would end up with an Option[Option[User]]. I need to flatten that to a single Option level, that's why I use flatMap in the first place.
If you looking for the most concise solution, consider this:
val result = operson.fold("NoValue")(_.pid.idVal)
Though one could find it not clear or confusing
I'm trying to find documentation of the Map.toList method in Scala but looking at documentation this is a trait : http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Map . So how can I find the documentation for scala Map? WHen I instatiate a Map am I just instantiating the trait ?
Trait scala.collection.immutable.Map is a contract. It's valid for all implementations, so its documentation is a documentation for any immutable scala Map.
In current implementation method Map.apply (Map(a -> b, c -> d, ...)) creates HashMap for more than 4 elements.
There are also classes Map1 - Map4 for 1-4 elements. Also there is a singleton EmptyMap.
But this behavior could be changed in next scala versions in case there will be better implementation for general purpose.
It is defined in Predef. Also its source might be useful.
Traits cannot be instantiated. They are abstract by definition. If they're actually fully implemented, they can (appear to) be instantiated by creating an anonymous type at the point of instantiation:
val x = new FullyImplementedTraitName { }
As to your main question, the documentation for scala.collection.Map should tell you everything you need to know. When you have the full frameset for the ScalaDocs displayed the filter text field at the upper left allows you to narrow down the class and package list by entering a the name (or portion thereof) you're looking for.
I've been working with Scala for a while now and have written a 10,000+ line program with it, but I'm still confused by some of the inner workings. I came to Scala from Python after already having intimate familiarity with Java, C and Lisp, but even so it's been slow going, and a huge problem is the frustrating difficulty I've often found when trying to investigate the inner workings of objects/types/classes/etc. using the Scala REPL as compared with Python. In Python you can investigate any object foo (type, object in a global variable, built-in function, etc.) using foo to see what the thing evaluates to, type(foo) to show its type, dir(foo) to tell you the methods you can call on it, and help(foo) to get the built-in documentation. You can even do things like help("re") to find out the documentation on the package named re (which holds regular-expression objects and methods), even though there isn't an object associated with it.
In Scala, you can try and read the documentation online, go look up the source code to the library, etc., but this can often be very difficult for things where you don't know where or even what they are (and it's often a big chunk to bite off, given the voluminous type hierarchy) -- stuff is floating around in various places (package scala, Predef, various implicit conversions, symbols like :: that are nearly impossible to Google). The REPL should be the way to explore directly, but in reality, things are far more mysterious. Say that I've seen a reference to foo somewhere, but I have no idea what it is. There's apparently no such thing as a "guide to systematically investigating Scala thingies with the REPL", but the following is what I've pieced together after a great deal of trial and error:
If foo is a value (which presumably includes things stored in variables plus companion objects and other Scala objects), you can evaluate foo directly. This ought to tell you the type and value of the result. Sometimes the result is helpful, sometimes not.
If foo is a value, you can use :type foo to get its type. (Not necessarily enlightening.) If you use this on a function call, you get the type of the return value, without calling the function.
If foo is a value, you can use foo.getClass to get its class. (Often more enlightening than the previous, but how does an object's class differ from its type?)
For a class foo, you can use classOf[foo], although it's not obvious what the result means.
Theoretically, you can use :javap foo to disassemble a class -- which should be the most useful of all, but fails entirely and uniformly for me.
Sometimes you have to piece things together from error messages.
Example of failure using :javap:
scala> :javap List
Failed: Could not find class bytes for 'List'
Example of enlightening error message:
scala> assert
<console>:8: error: ambiguous reference to overloaded definition,
both method assert in object Predef of type (assertion: Boolean, message: => Any)Unit
and method assert in object Predef of type (assertion: Boolean)Unit
match expected type ?
assert
^
OK, now let's try a simple example.
scala> 5
res63: Int = 5
scala> :type 5
Int
scala> 5.getClass
res64: java.lang.Class[Int] = int
Simple enough ...
Now, let's try some real cases, where it's not so obvious:
scala> Predef
res65: type = scala.Predef$#3cd41115
scala> :type Predef
type
scala> Predef.getClass
res66: java.lang.Class[_ <: object Predef] = class scala.Predef$
What does this mean? Why is the type of Predef simply type, whereas the class is scala.Predef$? I gather that the $ is the way that companion objects are shoehorned into Java ... but Scala docs on Google tell me that Predef is object Predef extends LowPriorityImplicits -- how can I deduce this from the REPL? And how can I look into what's in it?
OK, let's try another confusing thing:
scala> `::`
res77: collection.immutable.::.type = ::
scala> :type `::`
collection.immutable.::.type
scala> `::`.getClass
res79: java.lang.Class[_ <: object scala.collection.immutable.::] = class scala.collection.immutable.$colon$colon$
scala> classOf[`::`]
<console>:8: error: type :: takes type parameters
classOf[`::`]
^
scala> classOf[`::`[Int]]
res81: java.lang.Class[::[Int]] = class scala.collection.immutable.$colon$colon
OK, this left me hopelessly confused, and eventually I had to go read the source code to make sense of this all.
So, my questions are:
What's the recommended best way from the true Scala experts of using the REPL to make sense of Scala objects, classes, methods, etc., or at least investigate them as best as can be done from the REPL?
How do I get :javap working from the REPL for built-in stuff? (Shouldn't it work by default?)
Thanks for any enlightenment.
You mentioned an important point which Scala lacks a bit: the documentation.
The REPL is a fantastic tool, but it is not as fantastic at it can be. There are too much missing features and features which can be improved - some of them are mentioned in your post. Scaladoc is a nice tool, too, but is far away to be perfect. Furthermore lots of code in the API is not yet or too less documented and code examples are often missing. The IDEs are full ob bugs and compared to the possibilities Java IDEs show us they look like some kindergarten toys.
Nevertheless there is a gigantic difference of Scalas current tools compared to the tools available as I started to learn Scala 2-3 years ago. At that time IDEs compiled permanently some trash in the background, the compiler crashed every few minutes and some documentation was absolutely nonexistent. Frequently I got rage attacks and wished death and corruption to Scala authors.
And now? I do not have any of these rage attacks any more. Because the tools we currently have are great although the are not perfect!
There is docs.scala-lang.org, which summarizes a lot of great documentation. There are Tutorials, Cheat-sheets, Glossaries, Guides and a lot of more great stuff. Another great tools is Scalex, which can find even the weirdest operator one can think of. It is Scalas Hoogle and even though it is not yet as good as his great ideal, it is very useful.
Great improvements are coming with Scala2.10 in form of Scalas own Reflection library:
// needs Scala2.10M4
scala> import scala.reflect.runtime.{universe => u}
import scala.reflect.runtime.{universe=>u}
scala> val t = u.typeOf[List[_]]
t: reflect.runtime.universe.Type = List[Any]
scala> t.declarations
res10: Iterable[reflect.runtime.universe.Symbol] = SynchronizedOps(constructor List, method companion, method isEmpty, method head, method tail, method ::, method :::, method reverse_:::, method mapConserve, method ++, method +:, method toList, method take, method drop, method slice, method takeRight, method splitAt, method takeWhile, method dropWhile, method span, method reverse, method stringPrefix, method toStream, method removeDuplicates)
Documentation for the new Reflection library is still missing, but in progress. It allows one to use scalac in an easy way inside of the REPL:
scala> u reify { List(1,2,3) map (_+1) }
res14: reflect.runtime.universe.Expr[List[Int]] = Expr[List[Int]](immutable.this.List.apply(1, 2, 3).map(((x$1) => x$1.$plus(1)))(immutable.this.List.canBuildFrom))
scala> import scala.tools.reflect.ToolBox
import scala.tools.reflect.ToolBox
scala> import scala.reflect.runtime.{currentMirror => m}
import scala.reflect.runtime.{currentMirror=>m}
scala> val tb = m.mkToolBox()
tb: scala.tools.reflect.ToolBox[reflect.runtime.universe.type] = scala.tools.reflect.ToolBoxFactory$ToolBoxImpl#32f7fa37
scala> tb.parseExpr("List(1,2,3) map (_+1)")
res16: tb.u.Tree = List(1, 2, 3).map(((x$1) => x$1.$plus(1)))
scala> tb.runExpr(res16)
res18: Any = List(2, 3, 4)
This is even greater when we want to know how Scala code is translated internally. Formerly wen need to type scala -Xprint:typer -e "List(1,2,3) map (_+1)"
to get the internally representation. Furthermore some small improvements found there way to the new release, for example:
scala> :type Predef
scala.Predef.type
Scaladoc will gain some type-hierarchy graph (click on type-hierarchy).
With Macros it is possible now, to improve error messages in a great way. There is a library called expecty, which does this:
// copied from GitHub page
import org.expecty.Expecty
case class Person(name: String = "Fred", age: Int = 42) {
def say(words: String*) = words.mkString(" ")
}
val person = Person()
val expect = new Expecty()
// Passing expectations
expect {
person.name == "Fred"
person.age * 2 == 84
person.say("Hi", "from", "Expecty!") == "Hi from Expecty!"
}
// Failing expectation
val word1 = "ping"
val word2 = "pong"
expect {
person.say(word1, word2) == "pong pong"
}
/*
Output:
java.lang.AssertionError:
person.say(word1, word2) == "pong pong"
| | | | |
| | ping pong false
| ping pong
Person(Fred,42)
*/
There is a tool which allows one to find libraries hosted on GitHub, called ls.implicit.ly.
The IDEs now have some semantic highlighting, to show if a member is a object/type/method/whatever. The semantic highlighting feature of ScalaIDE.
The javap feature of the REPL is only a call to the native javap, therefore it is not a very featue-rich tool. You have to fully qualify the name of a module:
scala> :javap scala.collection.immutable.List
Compiled from "List.scala"
public abstract class scala.collection.immutable.List extends scala.collection.AbstractSeq implements scala.collection.immutable.LinearSeq,scala.Product,scala.collection.LinearSeqOptimized{
...
Some time ago I have written a summary of how Scala code is compiled to Bytecode, which offers a lot of things to know.
And the best: This is all done in the last few months!
So, how to use all of these things inside of the REPL? Well, it is not possible ... not yet. ;)
But I can tell you that one day we will have such a REPL. A REPL which shows us documentation if we want to see it. A REPL which let us communicate with it (maybe like lambdabot). A REPL which let us do cool things we still cannot imagine. I don't know when this will be the case, but I know that a lot of stuff was done in the last years and I know even greater stuff will be done in the next years.
Javap works, but you are pointing it to scala.Predef.List, which is a type, not a class. Point it instead to scala.collection.immutable.List.
Now, for the most part just entering a value and seeing what the result's type is is enough. Using :type can be helpful sometimes. I find that use getClass is a really bad way of going about it, though.
Also, you are sometimes mixing types and values. For example, here you refer to the object :::
scala> `::`.getClass res79: java.lang.Class[_ <: object
scala.collection.immutable.::] = class
scala.collection.immutable.$colon$colon$
And here you refer to the class :::
scala> classOf[`::`[Int]] res81: java.lang.Class[::[Int]] = class
scala.collection.immutable.$colon$colon
Objects and classes are not the same thing, and, in fact, there's a common pattern of objects and classes by the same name, with a specific name for their relationship: companions.
Instead of dir, just use tab completion:
scala> "abc".
+ asInstanceOf charAt codePointAt codePointBefore codePointCount
compareTo compareToIgnoreCase concat contains contentEquals endsWith
equalsIgnoreCase getBytes getChars indexOf intern isEmpty
isInstanceOf lastIndexOf length matches offsetByCodePoints regionMatches
replace replaceAll replaceFirst split startsWith subSequence
substring toCharArray toLowerCase toString toUpperCase trim
scala> "abc".compareTo
compareTo compareToIgnoreCase
scala> "abc".compareTo
def compareTo(String): Int
If you enter the power mode, you'll get way more information, but that's hardly for beginners. The above shows types, methods, and method signatures. Javap will decompile stuff, though that requires you to have a good handle on bytecode.
There's other stuff in there -- be sure to look up :help, and see what's available.
Docs are only available through the scaladoc API. Keep it open on the browser, and use its search capability to quickly find classes and methods. Also, note that, as opposed to Java, you don't need to navigate through the inheritance list to get the description of the method.
And they do search perfectly fine for symbols. I suspect you haven't spent much time on scaladoc because other doc tools out there just aren't up to it. Javadoc comes to mind -- it's awful browsing through packages and classes.
If you have specific questions Stack Overflow style, use Symbol Hound to search with symbols.
Use the nightly Scaladocs: they'll diverge from whatever version you are using, but they'll always be the most complete. Besides, right now they are far better in many respects: you can use TAB to alternate between frames, with auto-focus on the search boxes, you can use arrows to navigate on the left frame after filtering, and ENTER to have the selected element appear on the right frame. They have the list of implicit methods, and have class diagrams.
I've made do with a far less powerful REPL, and a far poorer Scaladoc -- they do work, together. Granted, I skipped to trunk (now HEAD) just to get my hands on tab-completion.
Note that scala 2.11.8 New tab-completion in the Scala REPL can facilitate the type exploration/discovery.
It now includes:
CamelCase completion:
try:
(l: List[Int]).rroTAB,
it expands to:
(l: List[Int]).reduceRightOption
Find members by typing any CamelCased part of the name:
try:
classOf[String].typTAB,
to get getAnnotationsByType, getComponentType and others
Complete bean getters without typing get:
try:
(d: java.util.Date).dayTAB
Press TAB twice to see the method signature:
try:
List(1,2,3).partTAB,
which completes to:
List(1,2,3).partition;
press TAB again to display:
def partition(p: Int => Boolean): (List[Int], List[Int])
You need to pass fully qualified class name to javap.
First take it using classOf:
scala> classOf[List[_]]
res2: java.lang.Class[List[_]] = class scala.collection.immutable.List
Then use javap (doesn't work from repl for me: ":javap unavailable on this platform.") so example is from a command line, in repl, I believe, you don't need to specify classpath:
d:\bin\scala\scala-2.9.1-1\lib>javap -classpath scala-library.jar "scala.collection.immutable.List"
But I doubt this will help you. Probably you're trying to use techniques you used to use in dynamic languages. I extremely rarely use repl in scala (while use it often in javascript). An IDE and sources are my all.