HashMap only has one element instead of three - scala

I am filling a HashMap in Scala like so:
val hashMap = new HashMap[P, List[T]]() { list.map(x => put(x.param1, x.param1.elements)) }
The problem is that hashMap will have only a size of 1 while list has a size of 3.
What am I doing wrong here?

You're mixing imperative commands (put, new HashMap) with functional constructs (map). This cannot behave nicely.
What you should do (if I understand your goal correctly):
list.map(x => x.param1 -> x.param1.elements).toMap[P, List[T]]
Also, beware that if several elements in your list have the same param1, only the last one will be kept, since Map can only have one value for a given key.
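A minimal sketch of the approach above, using a hypothetical Item case class in place of the asker's element type (the names Item, key, and elements are assumptions, not taken from the question). It also shows the duplicate-key caveat and one possible way to keep every value by grouping first:

```scala
// Hypothetical stand-in for the asker's element type.
final case class Item(key: String, elements: List[Int])

val items = List(
  Item("a", List(1)),
  Item("b", List(2)),
  Item("a", List(3)) // same key as the first item
)

// Build an immutable Map in one expression; no mutable HashMap or put needed.
// For the repeated key "a", only the last value (List(3)) is kept.
val byKey: Map[String, List[Int]] =
  items.map(x => x.key -> x.elements).toMap

// To keep every value for a repeated key, group first and then merge.
val merged: Map[String, List[Int]] =
  items.groupBy(_.key).map { case (k, xs) => k -> xs.flatMap(_.elements) }
```

Here byKey has two entries even though items has three, which is the same effect the warning above describes.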

Related

Why is only one new object created when called multiple times in `map`?

As I understand it, a way to create a new ArrayBuffer with one element is to say
val buffer = ArrayBuffer(element)
or something like this:
val buffer = ArrayBuffer[Option[String]](None)
Suppose x is a collection with 3 elements. I'm trying to create a map that creates a new 1-element ArrayBuffer for each element in x and associates the element in x with the new buffer. (These are intentionally separate mutable buffers that will be modified by threads.) I tried this:
x.map(elem => (elem, ArrayBuffer[Option[String]](None))).toMap
However, I found (using System.identityHashCode) that only one ArrayBuffer was created, and all 3 elements were mapped to the same value.
Why is this happening? I expected that the tuple expression would be evaluated for each element of x and that this would result in a new ArrayBuffer being created for each evaluation of the expression.
What would be a good workaround?
I'm using Scala 2.11.
Update
In the process of creating a reproducible example, I figured out the problem. Here's the example; Source is an interface defined in our application.
def test1(x: Seq[Source]): Unit = {
val containers = x.map(elem => (elem, ArrayBuffer[Option[String]](None))).toMap
x.foreach(elem => println(
s"test1: elem=${System.identityHashCode(elem)} container=${System.identityHashCode(containers(elem))}"))
x.indices.foreach(n => containers(x(n)).update(0, Some(n.toString)))
x.foreach(elem => println(s"resulting value: ${containers(elem)(0)}"))
}
What I missed was that for the values of x I was trying to use, the class implementing Source was returning true for equals() for all combinations of values. So the resulting map only had one key-value pair.
Apologies for not figuring this out sooner. I'll delete the question after a while.
I think your problem is the toMap. If all three elements are None, the Map will contain just one entry, since they all have the same key.
I played a bit on Scalafiddle (remove the .toMap and you will have 3 ArrayBuffers).
Let me know if I have misunderstood you.
I cannot seem to replicate the issue. For example,
val m =
List(Some("a"), Some("b"), Some("c"))
.map(elem => (elem, ArrayBuffer[Option[String]](None)))
.toMap
m(Some("a")) += Some("42")
m
outputs
res2: scala.collection.immutable.Map[Some[String],scala.collection.mutable.ArrayBuffer[Option[String]]] = Map(
Some(a) -> ArrayBuffer(None, Some(42)),
Some(b) -> ArrayBuffer(None),
Some(c) -> ArrayBuffer(None)
)
where we see Some("42") was added to one buffer whilst others were unaffected.
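The cause described in the question's update can be reproduced with a small sketch. AlwaysEqual is a hypothetical class (not the asker's Source implementation) whose equals returns true for every other instance, so toMap treats all three keys as one key:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical key type: every instance compares equal to every other one.
final class AlwaysEqual(val label: String) {
  override def equals(other: Any): Boolean = other.isInstanceOf[AlwaysEqual]
  override def hashCode(): Int = 0 // kept consistent with equals
}

val keys = Seq(new AlwaysEqual("a"), new AlwaysEqual("b"), new AlwaysEqual("c"))

// Three tuples, each with its own freshly created buffer.
val pairs = keys.map(k => (k, ArrayBuffer[Option[String]](None)))

// toMap collapses keys that are equal, so only one entry survives.
val containers = pairs.toMap
```

The map expression does create three separate buffers; it is the conversion to a Map that discards two of them, so every lookup returns the same buffer.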

Scala style: constant map vs pattern matching

I need to declare a constant mapping in Scala, and wonder what would be the proper way to do that.
The Java way is
private static final String[] numbers = {"zero","one","two","three"}; //Java
val numbers = Array("zero","one","two","three") //Scala
val numbers = collection.immutable.HashMap(0 -> "zero", 1 -> "one", 2 -> "two") //Scala maps
Another way to do that in Scala is
def array(i: Int) = i match {
case 0 => "zero"
case 1 => "one"
case 2 => "two"
}
Is there a standard/recommended way to do it in Scala?
Map provides features that a plain function does not. You can enumerate/scan/traverse/filter existing keys and values for example. Map/reduce/transform etc. (You can have a default value or generate an error on missing keys too, despite what the other answer suggests).
If you don't need any of that, there isn't much difference ... except, if the number of entries is fairly large, access to Map would generally be faster than evaluating the static pattern.
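A short sketch of the difference described above. The names numberName, evenKeys, upper, missing, and withFallback are illustrative only:

```scala
// A constant mapping as an immutable Map.
val numbers: Map[Int, String] = Map(0 -> "zero", 1 -> "one", 2 -> "two")

// The same mapping as a pattern-matching function.
def numberName(i: Int): String = i match {
  case 0 => "zero"
  case 1 => "one"
  case 2 => "two"
  case _ => "unknown" // without this case, other inputs throw a MatchError
}

// A Map can be inspected and transformed; a plain function cannot.
val evenKeys: Set[Int] = numbers.keySet.filter(_ % 2 == 0)
val upper: Map[Int, String] = numbers.map { case (k, v) => k -> v.toUpperCase }

// Missing keys: get returns an Option, withDefaultValue supplies a fallback.
val missing: Option[String] = numbers.get(7)
val withFallback: Map[Int, String] = numbers.withDefaultValue("unknown")
```

Calling numbers(7) directly would throw a NoSuchElementException, which is the "generate an error on missing keys" behaviour mentioned above.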
Not really. It depends on the purpose. Here's a version that generates the keys:
List("zero", "one", "two", "three").zipWithIndex.map(_.swap).toMap
(still a Map, assuming you can use the index)
I've seen both approaches used depending on the context.
If you need to serialize the mapping or pass it around or keep different versions of it, a Map would be better.
Otherwise, pattern matching might be better.

Scala practices: lists and case classes

I've just started using Scala/Spark, having come from a Java background, and I'm still trying to wrap my head around the concept of immutability and other best practices of Scala.
This is a very small segment of code from a larger program:
intersections is RDD[(Key, (String, String))]
obs is (Key, (String, String))
Data is just a case class I've defined above.
val intersections = map1 join map2
var listOfDatas = List[Data]()
intersections take NumOutputs foreach (obs => {
listOfDatas ::= ParseInformation(obs._1.key, obs._2._1, obs._2._2)
})
listOfDatas foreach println
This code works and does what I need it to do, but I was wondering if there was a better way of making this happen. I'm using a variable list and rewriting it with a new list every single time I iterate, and I'm sure there has to be a better way to create an immutable list that's populated with the results of the ParseInformation method call. Also, I remember reading somewhere that instead of accessing the tuple values directly, the way I have done, you should use case classes within functions (as partial functions I think?) to improve readability.
Thanks in advance for any input!
This might work locally, but only because you are calling take locally. It will not work once distributed, as listOfDatas is passed to each worker as a copy. A better way of doing this, IMO, is:
val processedData = intersections map { case (key, (item1, item2)) =>
ParseInformation(key.key, item1, item2)
}
processedData foreach println
A note for developers new to functional programming: if all you are trying to do is transform data in an iterable (such as a List), forget foreach. Use map instead, which runs your transformation on each item and returns a new iterable of the results.
What's the type of intersections? It looks like you can replace foreach with map:
val listOfDatas: List[Data] =
intersections take NumOutputs map (obs => {
ParseInformation(obs._1.key, obs._2._1, obs._2._2)
})
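The two suggestions above (map instead of foreach, and a case pattern instead of _1/_2) can be combined. The sketch below uses an ordinary local List rather than a Spark RDD so that it is self-contained; Key, Data, and parseInformation are hypothetical stand-ins for the asker's definitions, which the question does not show:

```scala
// Hypothetical stand-ins for the asker's types and parsing function.
final case class Key(key: String)
final case class Data(id: String, left: String, right: String)

def parseInformation(id: String, left: String, right: String): Data =
  Data(id, left, right)

// A local collection shaped like the joined data: (Key, (String, String)).
val intersections: List[(Key, (String, String))] = List(
  (Key("k1"), ("a1", "b1")),
  (Key("k2"), ("a2", "b2")),
  (Key("k3"), ("a3", "b3"))
)

val numOutputs = 2

// take + map builds the result list directly, with no var and no foreach.
// The case pattern names the tuple parts instead of using _1 and _2.
val listOfDatas: List[Data] =
  intersections.take(numOutputs).map { case (k, (left, right)) =>
    parseInformation(k.key, left, right)
  }
```

Unlike the ::= loop in the question, which prepends each element and so reverses the order, map keeps the elements in their original order.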

Inverting a key to values mapping

Let's say I have a set of a class Action like this: actions: Set[Action], and each Action class has a val consequences : Set[Consequence], where Consequence is a case class.
I wish to get a map from Consequence to Set[Action] to determine which actions cause a specific Consequence. Obviously since an Action can have multiple Consequences it can appear in multiple sets in the map.
I have been trying to get my head around this (I am new to Scala), wondering if I can do it with something like map() and groupBy(), but a bit lost. I don't wish to revert to imperative programming, especially if there is some Scala mapping function that can help.
What is the best way to achieve this?
Not exactly elegant because groupBy doesn't handle the case of operating already on a Tuple2, so you end up doing a lot of tupling and untupling:
case class Conseq()
case class Action(conseqs: Set[Conseq])
def gimme(actions: Seq[Action]): Map[Conseq, Set[Action]] =
actions.flatMap(a => a.conseqs.map(_ -> a))
.groupBy(_._1)
.mapValues(_.map(_._2)(collection.breakOut))
The first line "zips" each action with all of its consequences, yielding a Seq[(Conseq, Action)]. Grouping this by the first tuple element gives a Map[Conseq, Seq[(Conseq, Action)]]. So the last step needs to transform the map's values from Seq[(Conseq, Action)] to Set[Action]. This can be done with mapValues. Without the explicit builder factory, the inner map would produce a Seq[Action], so one would have to write .mapValues(_.map(_._2).toSet). Passing collection.breakOut in the second parameter list of map saves that step and makes map produce the Set collection type directly.
Another possibility is to use nested folds:
def gimme2(actions: Seq[Action]) = (Map.empty[Conseq, Set[Action]] /: actions) {
(m, a) => (m /: a.conseqs) {
(m1, c) => m1.updated(c, m1.getOrElse(c, Set.empty) + a)
}
}
This is perhaps more readable. We start with an empty result map, traverse the actions, and in the inner fold traverse each action's consequences which get merged into the result map.
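Note that collection.breakOut was removed in Scala 2.13, so the first version above only compiles on older releases. A possible variant of the same flatMap/groupBy idea that avoids breakOut and mapValues is sketched below. The name fields on Conseq and Action are assumptions added only so that instances can be told apart, and invert is an illustrative name:

```scala
// Name fields are added here only so that instances are distinguishable.
final case class Conseq(name: String)
final case class Action(name: String, conseqs: Set[Conseq])

def invert(actions: Set[Action]): Map[Conseq, Set[Action]] =
  actions.toSeq
    .flatMap(a => a.conseqs.toSeq.map(c => c -> a)) // pair each consequence with its action
    .groupBy(_._1)                                  // group the pairs by consequence
    .map { case (c, pairs) => c -> pairs.map(_._2).toSet }

val fire  = Conseq("fire")
val smoke = Conseq("smoke")
val light = Action("light match", Set(fire, smoke))
val burn  = Action("burn toast", Set(smoke))

val byConseq = invert(Set(light, burn))
```

In this example smoke maps to both actions and fire maps only to light, so an action with several consequences appears in several sets, as the question expects.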

Converting a Scala Map to a List

I have a map that I need to map to a different type, and the result needs to be a List. I have two ways (seemingly) to accomplish what I want, since calling map on a map seems to always result in a map. Assuming I have some map that looks like:
val input = Map[String, List[Int]]("rk1" -> List(1,2,3), "rk2" -> List(4,5,6))
I can either do:
val output = input.map{ case(k,v) => (k.getBytes, v) } toList
Or:
val output = input.foldRight(List[Pair[Array[Byte], List[Int]]]()){ (el, res) =>
(el._1.getBytes, el._2) :: res
}
In the first example I convert the type, and then call toList. I assume the runtime is something like O(n*2) and the space required is n*2. In the second example, I convert the type and generate the list in one go. I assume the runtime is O(n) and the space required is n.
My question is, are these essentially identical or does the second conversion cut down on memory/time/etc? Additionally, where can I find information on storage and runtime costs of various Scala conversions?
Thanks in advance.
My favorite way to do this kind of thing is like this:
input.map { case (k,v) => (k.getBytes, v) }(collection.breakOut): List[(Array[Byte], List[Int])]
With this syntax, you are passing to map the builder it needs to reconstruct the resulting collection. (Actually, not a builder, but a builder factory. Read more about Scala's CanBuildFroms if you are interested.) collection.breakOut is meant for exactly the case where you want to change from one collection type to another while doing a map, flatMap, etc. The only bad part is that you have to use the full type annotation for it to be effective (here, I used a type ascription after the expression). Then, there's no intermediary collection being built, and the list is constructed while mapping.
Mapping over a view in the first example could cut down on the space requirement for a large map:
val output = input.view.map{ case(k,v) => (k.getBytes, v) } toList
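Both versions in the question are linear in time; the difference is only whether an intermediate collection is built. Since collection.breakOut no longer exists as of Scala 2.13, one possible alternative that also avoids the intermediate Map is to map over an iterator. The sketch below reuses the question's input; decoded is an illustrative name, and the explicit UTF-8 charset is an assumption added for determinism:

```scala
val input: Map[String, List[Int]] =
  Map("rk1" -> List(1, 2, 3), "rk2" -> List(4, 5, 6))

// iterator.map is lazy, so no intermediate Map is built;
// toList then consumes the iterator once and builds the List.
val output: List[(Array[Byte], List[Int])] =
  input.iterator.map { case (k, v) => (k.getBytes("UTF-8"), v) }.toList

// Arrays compare by reference, so decode them back to strings to inspect.
val decoded: List[(String, List[Int])] =
  output.map { case (bytes, v) => (new String(bytes, "UTF-8"), v) }
```

The postfix toList used in the question still compiles on older Scala versions, but the dotted form .toList avoids the postfix-operator warning.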