I am new to Scala and was wondering what is the difference between initializing a Map data structure using the following three ways:
private val currentFiles: HashMap[String, Long] = new HashMap[String, Long]()
private val currentJars = new HashMap[String, Long]
private val currentVars = Map[String, Long]
There are two different parts to your question.
first, the difference between using an explicit type or not (cases 1 and 2) goes for any class, not necessarily containers.
val x = 1
Here the type is not explicit, and the compiler will try to figure it out using type inference. The type of x will be Int.
val x: Int = 1
Same as above, but now explicitly. If whatever you have at the right of = can't be cast to an Int, you will get a compiler error.
val x: Any = 1
Here we will still store a 1, but the type of the variable will be a parent class, using polymorphism.
The second part of your question is about initialization. The base initialization is as in java:
val x = new List[Int]()
This calls the class constructor and returns a new instance of the exact class.
Now, there is a special method called .apply that you can define and call with just parenthesis, like this:
val x = Seq[Int]()
This is a shortcut for this:
val x = Seq.apply[Int]()
Notice this is a function on the Seq object. The return type is whatever the function wants it to be, it is just another function. That said, it is mostly used to return a new instance of the given type, but there are no guarantees, you need to look at the function documentation to be sure of the contract.
That said, in the case of val x = Map[String, Long]() the implementation returns an actual instance of immutable.HashMap[String, Long], which is kind of the default Map implementation.
Map and HashMap are almost equivalent, but not exactly the same thing.
Map is trait, and HashMap is a class. Although under the hood they may be the same thing (scala.collection.immutable.HashMap) (more on that later).
When using
private val currentVars = Map[String, Long]()
You get a Map instance. In scala, () is a sugar, under the hood you are actually calling the apply() method of the object Map. This would be equivalent to:
private val currentVars = Map.apply[String, Long]()
Using
private val currentJars = new HashMap[String, Long]()
You get a HashMap instance.
In the third statement:
private val currentJars: HashMap[String, Long] = new HashMap[String, Long]()
You are just not relying anymore on type inference. This is exactly the same as the second statement:
private val currentJars: HashMap[String, Long] = new HashMap[String, Long]()
private val currentJars = new HashMap[String, Long]() // same thing
When / Which I use / Why
About type inference, I would recommend you to go with type inference. IMHO in this case it removes verbosity from the code where it is not really needed. But if you really miss like-java code, then include the type :) .
Now, about the two constructors...
Map vs HashMap
Short answer
You should probably always go with Map(): it is shorter, already imported and returns a trait (like a java interface). This last reason is nice because when passing this Map around you won't rely on implementation details since Map is just an interface of what you want or need.
On the other side, HashMap is an implementation.
Long answer
Map is not always a HashMap.
As seen in Programming in Scala, Map.apply[K, V]() can return a different class depending on how many key-value pairs you pass to it (ref):
Number of elements Implementation
0 scala.collection.immutable.EmptyMap
1 scala.collection.immutable.Map1
2 scala.collection.immutable.Map2
3 scala.collection.immutable.Map3
4 scala.collection.immutable.Map4
5 or more scala.collection.immutable.HashMap
When you have less then 5 elements you get an special class for each of these small collections and when you have an empty Map, you get a singleton object.
This is done mostly to get better performance.
You can try it out in repl:
import scala.collection.immutable.HashMap
val m2 = Map(1 -> 1, 2 -> 2)
m2.isInstanceOf[HashMap[Int, Int]]
// false
val m5 = Map(1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5, 6 -> 6)
m5.isInstanceOf[HashMap[Int, Int]]
// true
If you are really curious you can even take a look at the source code.
So, even for performance you should also probably stick with Map().
Related
I am new to Scala, how to extends one properties value from other object. Like in angular, angular.copy(obj1, obj2);
If I have two objects, obj1, obj2, both having the same property names. obj2's value would be overridden to obj1.
How to achieve this in Scala?
Unlike JavaScript, Scala does not allow you to add new properties to existing objects. Objects are created with a specific type and that type defines the properties on the object. If your objects are implemented as a case class then it is easy to make a copy of an object:
val obj = obj1.copy()
You can also update some of the fields of obj1 during the copy and leave others untouched:
val obj = obj1.copy(count=obj2.count)
This will create a new object with all the same values as obj1 except for the field count which will take the value from obj2.
It has been noted in the comments that in Scala you will typically create new objects from old ones, rather than modifying existing objects. This may seem cumbersome at first but it make it much easier to understand the behaviour of more complex code because you know that objects won't change under your feet.
Note on object in Scala vs JS
Note that object in Scala as a completely different concept from the usage in Javascript.
Scala deliberately is not using Prototype base inheritance.
So in short that this is not possible is a feature not a mssing capability of the language.
see https://docs.scala-lang.org/tour/singleton-objects.html
Solution : Use a Map to simulate protoype base inheritance/composition
You can use a Map to achieve a similar behaviour:
scala> val sayHello = () => "Hello"
sayHello: () => String = <function0>
scala> val obj1 = Map("sayHello" -> sayHello , "id" -> 99)
obj1: scala.collection.immutable.Map[String,Any] = Map(sayHello -> <function0>, id -> 99)
scala> val obj2 = Map("id" -> 1)
obj2: scala.collection.immutable.Map[String,Int] = Map(id -> 1)
scala> obj1 ++ obj2
res0: scala.collection.immutable.Map[String,Any] = Map(sayHello -> <function0>, id -> 1)
I have developed a tool using pyspark. In that tool, the user provides a dict of model parameters, which is then passed to an spark.ml model such as Logistic Regression in the form of LogisticRegression(**params).
Since I am transferring to Scala now, I was wondering how this can be done in Spark using Scala? Coming from Python, my intuition is to pass a Scala Map such as:
val params = Map("regParam" -> 100)
val model = new LogisticRegression().set(params)
Obviously, it's not as trivial as that. It seem as in scala, we need to set every single parameter separately, like:
val model = new LogisticRegression()
.setRegParam(0.3)
I really want to avoid being forced to iterate over all user input parameters and set the appropriate parameters with tons of if clauses.
Any ideas how to solve this as elegantly as in Python?
According to the LogisticRegression API you need to set each param individually via setter:
Users can set and get the parameter values through setters and
getters, respectively.
An idea is to build your own mapping function to dynamically call the corresponding param setter using reflection.
Scala is a statically typed language, hence by-design doesn't have anything like Python's **params. As already being considered, you can store them in a Map of type[K, Any], but type erasure would erase types of the Map values due to JVM's runtime constraint.
Shapeless provides some neat mixed-type features that can circumvent the problem. An alternative is to use Scala's TypeTag to preserve type information, as in the following example:
import scala.reflect.runtime.universe._
case class Params[K]( m: Map[(K, TypeTag[_]), Any] ) extends AnyVal {
def add[V](k: K, v: V)(implicit vt: TypeTag[V]) = this.copy(
m = this.m + ((k, vt) -> v)
)
def grab[V](k: K)(implicit vt: TypeTag[V]) = m((k, vt)).asInstanceOf[V]
}
val params = Params[String](Map.empty).
add[Int]("a", 100).
add[String]("b", "xyz").
add[Double]("c", 5.0).
add[List[Int]]("d", List(1, 2, 3))
// params: Params[String] = Params( Map(
// (a,TypeTag[Int]) -> 100, (b,TypeTag[String]) -> xyz, (c,TypeTag[Double]) -> 5.0,
// (d,TypeTag[scala.List[Int]]) -> List(1, 2, 3)
// ) )
params.grab[Int]("a")
// res1: Int = 100
params.grab[String]("b")
// res2: String = xyz
params.grab[Double]("c")
// res3: Double = 5.0
params.grab[List[Int]]("d")
// res4: List[Int] = List(1, 2, 3)
What is the best way to resolve the compilation error in the example below? Assume that 'm' must be of type GenMap and I do not have control over the arguments of myFun.
import scala.collection.GenMap
object Test {
def myFun(m: Map[Int, String]) = m
val m: GenMap[Int, String] = Map(1 -> "One", 2 -> "two")
//Build error here on m.seq
// Found scala.collection.Map[Int, String]
// Required scala.collection.immutable.Map[Int, String]
val result = myFun(m.seq)
}
EDIT:
I should have been clearer. In my actual use-case I don't have control over myFun, so I have to pass it a Map. The 'm' also arises from another scala component as a GenMap. I need to convert one to another, but there appears to be a conflict between collection.Map and collection.immutable.Map
m.seq.toMap will solve your problem.
According to the signature presented in the API toMap returns a scala.collection.immutable.Map which is said to be required in your error message. scala.collection.Map returned by the seq method is a more general trait which besides being a parent to immutable map is also a parent to the mutable and concurrent map.
I thought it could be done as follows
val hash = new HashMap[String, ListBuffer[Int]].withDefaultValue(ListBuffer())
hash("A").append(1)
hash("B").append(2)
println(hash("B").head)
However the above prints the unintuitive value of 1. I would like
hash("B").append(2)
To do something like the following behind the scenes
if (!hash.contains("B")) hash.put("B", ListBuffer())
Use getOrElseUpdate to provide the default value at the point of access:
scala> import collection.mutable._
import collection.mutable._
scala> def defaultValue = ListBuffer[Int]()
defaultValue: scala.collection.mutable.ListBuffer[Int]
scala> val hash = new HashMap[String, ListBuffer[Int]]
hash: scala.collection.mutable.HashMap[String,scala.collection.mutable.ListBuffer[Int]] = Map()
scala> hash.getOrElseUpdate("A", defaultValue).append(1)
scala> hash.getOrElseUpdate("B", defaultValue).append(2)
scala> println(hash("B").head)
2
withDefaultValue uses exactly the same value each time. In your case, it's the same empty ListBuffer that gets shared by everyone.
If you use withDefault instead, you could generate a new ListBuffer every time, but it wouldn't get stored.
So what you'd really like is a method that would know to add the default value. You can create such a method inside a wrapper class and then write an implicit conversion:
class InstantiateDefaults[A,B](h: collection.mutable.Map[A,B]) {
def retrieve(a: A) = h.getOrElseUpdate(a, h(a))
}
implicit def hash_can_instantiate[A,B](h: collection.mutable.Map[A,B]) = {
new InstantiateDefaults(h)
}
Now your code works as desired (except for the extra method name, which you could pick to be shorter if you wanted):
val hash = new collection.mutable.HashMap[
String, collection.mutable.ListBuffer[Int]
].withDefault(_ => collection.mutable.ListBuffer())
scala> hash.retrieve("A").append(1)
scala> hash.retrieve("B").append(2)
scala> hash("B").head
res28: Int = 2
Note that the solution (with the implicit) doesn't need to know the default value itself at all, so you can do this once and then default-with-addition to your heart's content.
By dictionary I mean a lightweight map from names to values that can be used as the return value of a method.
Options that I'm aware of include making case classes, creating anon objects, and making maps from Strings -> Any.
Case classes require mental overhead to create (names), but are strongly typed.
Anon objects don't seem that well documented and it's unclear to me how to use them as arguments since there is no named type.
Maps from String -> Any require casting for retrieval.
Is there anything better?
Ideally these could be built from json and transformed back into it when appropriate.
I don't need static typing (though it would be nice, I can see how it would be impossible) - but I do want to avoid explicit casting.
Here's the fundamental problem with what you want:
def get(key: String): Option[T] = ...
val r = map.get("key")
The type of r will be defined from the return type of get -- so, what should that type be? From where could it be defined? If you make it a type parameter, then it's relatively easy:
import scala.collection.mutable.{Map => MMap}
val map: MMap[String, (Manifest[_], Any) = MMap.empty
def get[T : Manifest](key: String): Option[T] = map.get(key).filter(_._1 <:< manifest[T]).map(_._2.asInstanceOf[T])
def put[T : Manifest](key: String, obj: T) = map(key) = manifest[T] -> obj
Example:
scala> put("abc", 2)
scala> put("def", true)
scala> get[Boolean]("abc")
res2: Option[Boolean] = None
scala> get[Int]("abc")
res3: Option[Int] = Some(2)
The problem, of course, is that you have to tell the compiler what type you expect to be stored on the map under that key. Unfortunately, there is simply no way around that: the compiler cannot know what type will be stored under that key at compile time.
Any solution you take you'll end up with this same problem: somehow or other, you'll have to tell the compiler what type should be returned.
Now, this shouldn't be a burden in a Scala program. Take that r above... you'll then use that r for something, right? That something you are using it for will have methods appropriate to some type, and since you know what the methods are, then you must also know what the type of r must be.
If this isn't the case, then there's something fundamentally wrong with the code -- or, perhaps, you haven't progressed from wanting the map to knowing what you'll do with it.
So you want to parse json and turn it into objects that resemble the javascript objets described in the json input? If you want static typing, case classes are pretty much your only option and there are already libraries handling this, for example lift-json.
Another option is to use Scala 2.9's experimental support for dynamic typing. That will give you elegant syntax at the expense of type safety.
You can use approach I've seen in the casbah library, when you explicitly pass a type parameter into the get method and cast the actual value inside the get method. Here is a quick example:
case class MultiTypeDictionary(m: Map[String, Any]) {
def getAs[T <: Any](k: String)(implicit mf: Manifest[T]): T =
cast(m.get(k).getOrElse {throw new IllegalArgumentException})(mf)
private def cast[T <: Any : Manifest](a: Any): T =
a.asInstanceOf[T]
}
implicit def map2multiTypeDictionary(m: Map[String, Any]) =
MultiTypeDictionary(m)
val dict: MultiTypeDictionary = Map("1" -> 1, "2" -> 2.0, "3" -> "3")
val a: Int = dict.getAs("1")
val b: Int = dict.getAs("2") //ClassCastException
val b: Int = dict.getAs("4") //IllegalArgumetExcepton
You should note that there is no real compile-time checks, so you have to deal with all exceptions drawbacks.
UPD Working MultiTypeDictionary class
If you have only a limited number of types which can occur as values, you can use some kind of union type (a.k.a. disjoint type), having e.g. a Map[Foo, Bar | Baz | Buz | Blargh]. If you have only two possibilities, you can use Either[A,B], giving you a Map[Foo, Either[Bar, Baz]]. For three types you might cheat and use Map[Foo, Either[Bar, Either[Baz,Buz]]], but this syntax obviously doesn't scale well. If you have more types you can use things like...
http://cleverlytitled.blogspot.com/2009/03/disjoint-bounded-views-redux.html
http://svn.assembla.com/svn/metascala/src/metascala/OneOfs.scala
http://www.chuusai.com/2011/06/09/scala-union-types-curry-howard/