First off, apologies for the lame question. I am reading 'Scala for the Impatient' religiously and trying to solve all the exercise questions (with some minimal exploration along the way).
Background :
The exercise question goes like this: Set up a map of prices for a number of gizmos that you covet. Then produce a second map with the same keys and the prices at a 10% discount.
Unfortunately, at this point, most parts of the scaladoc are still cryptic to me, but I understand that the map function of Map takes a function and returns another map after applying that function (I guess?) - def map[B](f: (A) ⇒ B): HashMap[B]. I tried googling but couldn't find many useful results for the map function on Map in Scala :-)
My Question:
As attempted in my variation 3, does using the map function for this purpose make any sense, or should I stick with variation 2, which actually solves my problem?
Code :
val gizmos:Map[String,Double]=Map("Samsung Galaxy S4 Zoom"-> 1000, "Mac Pro"-> 6000.10, "Google Glass"->2000)
//1. Normal for/yield
val discountedGizmos=(for ((k,v)<-gizmos) yield (k, v*0.9)) //Works fine
//2. Variation using mapValues
val discGizmos1=gizmos.mapValues(_*0.9) //Works fine
//3. Variation using only map function
val discGizmos2=gizmos.map((_,v) =>v*0.9) //ERROR : Wrong number of parameters: expected 1
In this case, mapValues does seem the more appropriate method to use. You would use the map method when you need to perform a transformation that requires knowledge of the keys (e.g. converting a product reference into a product name).
That said, the map method is more general, as it gives you access to both the keys and the values, and you could emulate mapValues by transforming the values and passing the keys through untouched. That is where your code above goes wrong, in two ways. First, map on a Map takes a function of a single argument, the (key, value) tuple, so (_,v) => ... is rejected as a two-parameter function; destructure the tuple with case instead. Second, to get a Map back, your function should produce a (key, value) pair, not just a value:
val discGizmos2=gizmos.map{ case (k,v) => (k,v*0.9) } // pass the key through unchanged
It can also be written as:
val discGizmos2 = gizmos.map(kv => (kv._1, kv._2*0.9))
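To make the difference concrete, here is a minimal sketch (the object name DiscountDemo and the shortened price list are only for illustration, they are not from the question). It shows the two working forms side by side, plus what happens if the function returns only the value: that still compiles, but the result is a plain collection of prices rather than a Map.

```scala
object DiscountDemo {
  // Shortened, hypothetical price list standing in for the asker's gizmos
  val gizmos: Map[String, Double] =
    Map("Mac Pro" -> 6000.10, "Google Glass" -> 2000.0)

  // map receives ONE argument, a (key, value) tuple; `case` destructures it
  val viaCase: Map[String, Double] =
    gizmos.map { case (k, v) => (k, v * 0.9) }

  // Same thing using the tuple accessors directly
  val viaTuple: Map[String, Double] =
    gizmos.map(kv => (kv._1, kv._2 * 0.9))

  // Returning only the value drops the keys: the result is no longer a Map
  val valuesOnly: Iterable[Double] =
    gizmos.map { case (_, v) => v * 0.9 }
}
```

Both viaCase and viaTuple keep the original keys; valuesOnly is usually not what you want for this exercise.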
I've just started using Scala/Spark, and having come from a Java background, I'm still trying to wrap my head around the concept of immutability and other Scala best practices.
This is a very small segment of code from a larger program:
intersections is RDD(Key, (String, String))
obs is (Key, (String, String))
Data is just a case class I've defined above.
val intersections = map1 join map2
var listOfDatas = List[Data]()
intersections take NumOutputs foreach (obs => {
listOfDatas ::= ParseInformation(obs._1.key, obs._2._1, obs._2._2)
})
listOfDatas foreach println
This code works and does what I need it to do, but I was wondering if there was a better way of making this happen. I'm using a variable list and rewriting it with a new list every single time I iterate, and I'm sure there has to be a better way to create an immutable list that's populated with the results of the ParseInformation method call. Also, I remember reading somewhere that instead of accessing the tuple values directly, the way I have done, you should use case classes within functions (as partial functions I think?) to improve readability.
Thanks in advance for any input!
This might work locally, but only because you are calling take, which brings the results back to the driver. It will not work once distributed, as listOfDatas is passed to each worker as a copy. The better way of doing this, IMO, is:
// Destructure the (key, (item1, item2)) tuple instead of using _1/_2 accessors
val processedData = intersections map { case (key, (item1, item2)) =>
  ParseInformation(key.key, item1, item2)
}
processedData foreach println
A note for someone new to functional development: if all you are trying to do is transform the data in an iterable (such as a List), forget foreach. Use map instead, which runs your transformation on each item and returns a new iterable of the results.
What's the type of intersections? It looks like you can replace foreach with map:
val listOfDatas: List[Data] =
intersections take NumOutputs map (obs => {
ParseInformation(obs._1.key, obs._2._1, obs._2._2)
})
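As a rough sketch of the same idea without Spark: below, a plain Array stands in for the result of intersections take NumOutputs, and Data and parseInformation are hypothetical stand-ins for the asker's case class and ParseInformation method. The point is only that map plus a case pattern replaces both the var and the _1/_2 accessors.

```scala
object MapInsteadOfForeach {
  // Hypothetical stand-ins for the asker's Data class and ParseInformation method
  final case class Data(key: String, left: String, right: String)
  def parseInformation(key: String, a: String, b: String): Data = Data(key, a, b)

  // Stand-in for `intersections take NumOutputs`, which is a local Array on the driver
  val taken: Array[(String, (String, String))] =
    Array(("k1", ("a", "b")), ("k2", ("c", "d")))

  // No var, no prepending: map builds the new collection, toList fixes the type
  val listOfDatas: List[Data] =
    taken.map { case (key, (item1, item2)) => parseInformation(key, item1, item2) }.toList
}
```

One side effect worth knowing: the original listOfDatas ::= ... prepends, so it builds the list in reverse order, whereas map keeps the order of the input.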
In a Scala program I wrote I have a scala.collection.Map that maps a String to some calculated values (in detail it's Map[String, (Double, immutable.Map[String, Double], Double)] - I know that's ugly and should (and will be) wrapped). Now, if I do this:
stats.map { case(c, (prior, pwc, denom)) => {
println(c)
...
}
}
it takes about 30 seconds to print out roughly 50 times a value of c! The println is just a test statement - the actual calculation I need was even slower (I aborted after 1 minute of complete silence). However, if I do it like this:
stats.mapValues { case (prior, pwc, denom) => {
println(prior)
...
}
}
I don't run into these performance issues ... Can anyone explain why this is happening? Am I not following some important Scala guidelines?
Thanks for the help!
Edit:
I further investigated the behaviour. My guess is that the bottleneck comes from accessing the Map data structure. If I do the following, I have the same performance issues:
classes.foreach{c => {
println(c)
val ps = stats(c)
}
}
Here classes is a List[String] that stores the keys of the Map externally. Without the access to stats(c) no performance losses occur.
mapValues actually returns a view on the original map, which can lead to unexpected performance issues. From this blog post:
...here is a catch: map and mapValues are different in a not-so-subtle
way. mapValues, unlike map, returns a view on the original map. This
view holds references to both the original map and to the
transformation function (here (_ + 1)). Every time the returned map
(view) is queried, the original map is first queried and the
transformation function is called on the result.
I recommend reading the rest of that post for some more details.
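The behaviour described in the quote can be observed with a call counter. This is a small sketch, not taken from the blog post; the names are made up. It assumes the classic mapValues method is available (on Scala 2.13 and later it is deprecated in favour of .view.mapValues(...), but it is just as lazy there).

```scala
object MapValuesViewDemo {
  // Returns (calls made through the mapValues view, calls made by a strict map)
  def countCalls(): (Int, Int) = {
    var calls = 0
    val base = Map("a" -> 1, "b" -> 2)

    // Lazy: nothing is computed here, the function runs on every lookup
    val viewed = base.mapValues { v => calls += 1; v + 1 }
    viewed("a"); viewed("a"); viewed("a")
    val viewCalls = calls

    calls = 0
    // Strict: the function runs once per entry, lookups are then free
    val strict = base.map { case (k, v) => calls += 1; (k, v + 1) }
    strict("a"); strict("a"); strict("a")

    (viewCalls, calls)
  }
}
```

If the stats map in the question was itself produced by mapValues with an expensive function, every stats(c) would repeat that work, which would be consistent with the slowdown seen in the edit. Forcing the view once (for example with .map(identity) or .toMap, depending on the Scala version) is a common workaround.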
Let's say I have a set of a class Action like this: actions: Set[Action], and each Action class has a val consequences : Set[Consequence], where Consequence is a case class.
I wish to get a map from Consequence to Set[Action] to determine which actions cause a specific Consequence. Obviously since an Action can have multiple Consequences it can appear in multiple sets in the map.
I have been trying to get my head around this (I am new to Scala), wondering if I can do it with something like map() and groupBy(), but a bit lost. I don't wish to revert to imperative programming, especially if there is some Scala mapping function that can help.
What is the best way to achieve this?
Not exactly elegant because groupBy doesn't handle the case of operating already on a Tuple2, so you end up doing a lot of tupling and untupling:
case class Conseq()
case class Action(conseqs: Set[Conseq])
def gimme(actions: Seq[Action]): Map[Conseq, Set[Action]] =
actions.flatMap(a => a.conseqs.map(_ -> a))
.groupBy(_._1)
.mapValues(_.map(_._2)(collection.breakOut))
The first line "zips" each action with all of its consequences, yielding a Seq[(Conseq, Action)]. Grouping this by the first product element gives a Map[Conseq, Seq[(Conseq, Action)]]. So the last step needs to transform the map's values from Seq[(Conseq, Action)] to a Set[Action]. This can be done with mapValues. Without the explicit builder factory, the inner map would produce a Seq[Action], so one would have to write .mapValues(_.map(_._2).toSet). Passing collection.breakOut in the second parameter list of map saves that step and makes map directly produce the Set collection type.
Another possibility is to use nested folds:
def gimme2(actions: Seq[Action]) = (Map.empty[Conseq, Set[Action]] /: actions) {
(m, a) => (m /: a.conseqs) {
(m1, c) => m1.updated(c, m1.getOrElse(c, Set.empty) + a)
}
}
This is perhaps more readable. We start with an empty result map, traverse the actions, and in the inner fold traverse each action's consequences which get merged into the result map.
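For readers on newer Scala versions, note that collection.breakOut was removed in Scala 2.13 and the /: operator is deprecated there. The following is a sketch of both approaches using only map, groupBy, toSet and foldLeft, which should behave the same on older and newer versions. The name fields on Conseq and Action are added here purely so the example has distinguishable values.

```scala
object ConsequenceIndex {
  // `name` fields are illustrative additions, not part of the original answer
  final case class Conseq(name: String)
  final case class Action(name: String, conseqs: Set[Conseq])

  // flatMap + groupBy, then strip the duplicated key out of each group
  def byGroup(actions: Seq[Action]): Map[Conseq, Set[Action]] =
    actions
      .flatMap(a => a.conseqs.toSeq.map(c => c -> a))
      .groupBy(_._1)
      .map { case (c, pairs) => c -> pairs.map(_._2).toSet }

  // Nested folds: merge every (consequence, action) pair into the result map
  def byFold(actions: Seq[Action]): Map[Conseq, Set[Action]] =
    actions.foldLeft(Map.empty[Conseq, Set[Action]]) { (m, a) =>
      a.conseqs.foldLeft(m) { (m1, c) =>
        m1.updated(c, m1.getOrElse(c, Set.empty[Action]) + a)
      }
    }
}
```

Using a strict map for the last step of byGroup also avoids handing a lazy mapValues view back to the caller.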
I have a map that I need to map to a different type, and the result needs to be a List. I have two ways (seemingly) to accomplish what I want, since calling map on a map seems to always result in a map. Assuming I have some map that looks like:
val input = Map[String, List[Int]]("rk1" -> List(1,2,3), "rk2" -> List(4,5,6))
I can either do:
val output = input.map{ case(k,v) => (k.getBytes, v) } toList
Or:
val output = input.foldRight(List[Pair[Array[Byte], List[Int]]]()){ (el, res) =>
(el._1.getBytes, el._2) :: res
}
In the first example I convert the type, and then call toList. I assume the runtime is something like O(n*2) and the space required is n*2. In the second example, I convert the type and generate the list in one go. I assume the runtime is O(n) and the space required is n.
My question is, are these essentially identical or does the second conversion cut down on memory/time/etc? Additionally, where can I find information on storage and runtime costs of various scala conversions?
Thanks in advance.
My favorite way to do this kind of thing is like this:
input.map { case (k,v) => (k.getBytes, v) }(collection.breakOut): List[(Array[Byte], List[Int])]
With this syntax, you are passing to map the builder it needs to reconstruct the resulting collection. (Actually, not a builder, but a builder factory. Read more about Scala's CanBuildFroms if you are interested.) collection.breakOut can exactly be used when you want to change from one collection type to another while doing a map, flatMap, etc. — the only bad part is that you have to use the full type annotation for it to be effective (here, I used a type ascription after the expression). Then, there's no intermediary collection being built, and the list is constructed while mapping.
Mapping over a view in the first example could cut down on the space requirement for a large map:
val output = input.view.map{ case(k,v) => (k.getBytes, v) } toList
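Since collection.breakOut no longer exists in Scala 2.13 and later, another option worth sketching is to go through the map's iterator. Iterator.map is lazy, so the list is built in a single pass with no intermediate Map. The object name below is made up, and the explicit "UTF-8" charset is an assumption added so the result does not depend on the platform default.

```scala
object MapToListDemo {
  val input: Map[String, List[Int]] =
    Map("rk1" -> List(1, 2, 3), "rk2" -> List(4, 5, 6))

  // iterator.map does not build a collection; toList builds the only one
  val output: List[(Array[Byte], List[Int])] =
    input.iterator.map { case (k, v) => (k.getBytes("UTF-8"), v) }.toList
}
```

On the complexity question: map followed by toList makes two passes and allocates one extra collection, but that is a constant factor, so both variants in the question are O(n) in time and space.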
I've this code :
val total = ListMap[String,HashMap[Int,_]]()
val hm1 = new HashMap[Int,String]
val hm2 = new HashMap[Int,Int]
...
//insert values in hm1 and in hm2
...
total += "key1" -> hm1
total += "key2" -> hm2
....
val get: HashMap[Int,String] = total.get("key1") match {
case a : HashMap[Int,String] => a
}
This works, but I would like to know whether there is a better (more readable) way to do this.
Thanks to all !
It looks like you're trying to re-implement tuples as maps.
val total : ( Map[Int,String], Map[Int,Int]) = ...
def get : Map[Int,String] = total._1
(edit: oh, sorry, I get it now)
Here's the thing: the code above doesn't work. Type parameters are erased, so the match above will ALWAYS succeed. Try it with key2, for example.
If you want to store multiple types in a Map and retrieve them later, you'll need to use a Manifest and specialized get and put methods. But this has already been answered on Stack Overflow, so I won't repeat myself here.
Your total map, containing maps with non uniform value types, would be best avoided. The question is, when you retrieve the map at "key1", and then cast it to a map of strings, why did you choose String?
The most trivial reason might be that key1 and so on are simply constants, that you know all of them when you write your code. In that case, you probably should have a val for each of your maps, and dispense with map of maps entirely.
It might be that the calls made by the client code carry this knowledge. Say the client calls stringMap("key1") or intMap("key2"), or that one way or another the call implies that some given type is expected, and the client is responsible for not mixing types and names. Again, in that case there is no reason for total. You would have a map of string maps and a map of int maps (provided that you have prior knowledge of a limited number of value types).
What is your reason to have total?
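A minimal sketch of that second suggestion, with one outer map per value type. All names here (TypedMaps, stringMaps, intMaps and the sample entries) are hypothetical and only mirror the stringMap("key1") / intMap("key2") calls mentioned above.

```scala
object TypedMaps {
  // One outer map per value type, so no cast is ever needed
  val stringMaps: Map[String, Map[Int, String]] =
    Map("key1" -> Map(1 -> "one"))
  val intMaps: Map[String, Map[Int, Int]] =
    Map("key2" -> Map(1 -> 100))

  // Lookups return Option, so a missing name is handled explicitly
  def stringMap(name: String): Option[Map[Int, String]] = stringMaps.get(name)
  def intMap(name: String): Option[Map[Int, Int]] = intMaps.get(name)
}
```

The compiler now knows the value type of whatever comes back, and asking for "key1" as an int map simply yields None instead of a badly typed map.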
First of all: this is a non-answer (as I would not recommend the approach I discuss), but it was too long for a comment.
If you haven't got too many different keys in your ListMap, I would suggest trying Malvolio's answer.
Otherwise, due to type erasure, the other approaches based on pattern matching are practically equivalent to this (which works, but is very unsafe):
val get = total("key1").asInstanceOf[HashMap[Int, String]]
The reasons why this is unsafe (unless you like living dangerously) are:
total("key1") does not return an Option (unlike total.get("key1")). If "key1" does not exist, it will throw a NoSuchElementException. I wasn't sure how you were planning to handle the None case anyway.
asInstanceOf will also happily cast total("key2") - which should be a HashMap[Int, Int], but is at this point a HashMap[Int, Any] - to a HashMap[Int, String]. You will run into problems later on, when you try to access the Int value (which Scala now believes is a String).
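The second point can be reproduced in a few lines. This is only a sketch with made-up names, using scala.collection.mutable.HashMap as an assumption about which HashMap the question meant. The cast itself goes through without complaint; the failure only shows up later, at the point where a value is actually used as a String.

```scala
import scala.collection.mutable.HashMap

object ErasureDemo {
  val ints: HashMap[Int, Int] = HashMap(1 -> 42)

  // What the outer map hands back: the value type is no longer known
  val untyped: HashMap[Int, _] = ints

  // Compiles and does not throw: the type arguments are erased at runtime
  val wrong: HashMap[Int, String] = untyped.asInstanceOf[HashMap[Int, String]]

  // Using a value as a String is where the ClassCastException finally appears
  def readFails: Boolean =
    try {
      val n: Int = wrong(1).length
      n < 0
    } catch {
      case _: ClassCastException => true
    }
}
```

A pattern such as case a: HashMap[Int, String] is subject to the same erasure, which is why the compiler emits an unchecked warning for it.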