Scala: How do I use fold* with Map?

I have a Map[String, String] and want to concatenate the values to a single string.
I can see how to do this using a List...
scala> val l = List("te", "st", "ing", "123")
l: List[java.lang.String] = List(te, st, ing, 123)
scala> l.reduceLeft[String](_+_)
res8: String = testing123
fold* or reduce* seem to be the right approach; I just can't get the syntax right for a Map.

Folds on a map work the same way they would on a list of pairs. You can't use reduce because then the result type would have to be the same as the element type (i.e. a pair), but you want a string. So you use foldLeft with the empty string as the neutral element. You also can't just use _+_ because then you'd try to add a pair to a string. You have to instead use a function that adds the accumulated string, the first value of the pair and the second value of the pair. So you get this:
scala> val m = Map("la" -> "la", "foo" -> "bar")
m: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(la -> la, foo -> bar)
scala> m.foldLeft("")( (acc, kv) => acc + kv._1 + kv._2)
res14: java.lang.String = lalafoobar
Explanation of the first argument to fold:
As you know, the function (acc, kv) => acc + kv._1 + kv._2 gets two arguments: the second is the key-value pair currently being processed, and the first is the result accumulated so far. But what is the value of acc when the first pair is processed (and no result has been accumulated yet)? When you use reduce, the first value of acc will be the first pair in the list (and the first value of kv will be the second pair in the list). However, this does not work if you want the type of the result to be different from the element type. So instead of reduce we use fold, where we pass the first value of acc as the first argument to foldLeft.
In short: the first argument to foldLeft says what the starting value of acc should be.
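To see why that starting value matters, here is a small sketch of my own: because foldLeft takes an explicit start value, the result type (here Int) can be completely different from the element type (a pair of Strings).
val m = Map("la" -> "la", "foo" -> "bar")
// start from 0, so the accumulator is an Int rather than a String
val totalLength = m.foldLeft(0)((acc, kv) => acc + kv._1.length + kv._2.length)
// totalLength == 10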
As Tom pointed out, you should keep in mind that maps don't necessarily maintain insertion order (Map2 and co. do, but hashmaps do not), so the string may list the elements in a different order than the one in which you inserted them.
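If you need a deterministic order, one option (a sketch of my own, not from the original answer) is to sort the pairs before folding:
m.toList.sorted.foldLeft("")((acc, kv) => acc + kv._1 + kv._2)
// always "foobarlala", since ("foo","bar") sorts before ("la","la")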

The question has been answered already, but I'd like to point out that there are easier ways to produce those strings, if that's all you want. Like this:
scala> val l = List("te", "st", "ing", "123")
l: List[java.lang.String] = List(te, st, ing, 123)
scala> l.mkString
res0: String = testing123
scala> val m = Map(1 -> "abc", 2 -> "def", 3 -> "ghi")
m: scala.collection.immutable.Map[Int,java.lang.String] = Map((1,abc), (2,def), (3,ghi))
scala> m.values.mkString
res1: String = abcdefghi
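mkString also accepts a separator (and optional start/end delimiters), if you want one; a small addition of mine, continuing the session above:
scala> m.values.mkString(", ")
res2: String = abc, def, ghi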

Related

How can I functionally iterate over a collection combining elements?

I have a sequence of values of type A that I want to transform to a sequence of type B.
Some of the elements of type A can be converted to a B; however, some other elements need to be combined with the immediately previous element to produce a B.
I see it as a small state machine with two states: the first one handles the transformation from A to B when just the current A is needed, or saves the A and moves to the second state if the next row is needed; the second state combines the saved A with the new A to produce a B, and then goes back to state 1.
I'm trying to use scalaz's Iteratees but I fear I'm overcomplicating it, and I'm forced to return a dummy B when the input has reached EOF.
What's the most elegant solution to do it?
What about invoking the sliding() method on your sequence?
You might have to put a dummy element at the head of the sequence so that the first element (the real head) is evaluated/converted correctly.
If you map() over the result from sliding(2) then map will "see" every element with its predecessor.
val input: Seq[A] = ??? // real data here (no dummy values)
val output: Seq[B] = (dummy +: input).sliding(2).flatMap(a2b).toSeq

def a2b(arg: Seq[A]): Seq[B] = {
  // arg holds 2 consecutive elements
  ??? // return a Seq of zero or more converted elements
}
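To make the mechanics concrete, here is a toy instantiation of my own (A = Int, B = Int), where each output combines the current element with its predecessor and the dummy 0 at the head gives the first real element a predecessor:
val nums: Seq[Int] = Seq(3, 7, 12)
// each window is Seq(prev, cur); emit one B per window
val diffs: Seq[Int] = (0 +: nums).sliding(2).map { case Seq(prev, cur) => cur - prev }.toSeq
// diffs == Seq(3, 4, 5)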
Taking a stab at it:
Partition your list into two lists. The first is the one you can directly convert and the second is the one that you need to merge.
scala> val l = List("String", 1, 4, "Hello")
l: List[Any] = List(String, 1, 4, Hello)
scala> val (string, int) = l partition { case s:String => true case _ => false}
string: List[Any] = List(String, Hello)
int: List[Any] = List(1, 4)
Replace the logic in the partition block with whatever you need.
After you have the two lists, you can do whatever you need to with the second one, using something like this:
scala> string ::: int.collect{case i:Integer => i}.sliding(2).collect{
| case List(a, b) => a+b.toString}.toList
res4: List[Any] = List(String, Hello, 14)
You would replace the addition with whatever your aggregate function is.
Hopefully this is helpful.

How to get combined lists from two lists using Scala?

I have two lists
val list1 = List((List("AAA"),"B1","C1"),(List("BBB"),"B2","C2"))
val list2 = List(("AAA",List("a","b","c")),("BBB",List("c","d","e")))
I want to match the first element of list2 with the first element of list1 and get a combined list.
I want the output as:
List((List("AAA"),"B1","C1",List("a","b","c")))
How can I get the above output using Scala?
This is what I came up with:
scala> val l1 = List((List("AAA"),"B1","C1"),(List("BBB"),"B2","C2"))
l1: List[(List[String], String, String)] = List((List(AAA),B1,C1), (List(BBB),B2,C2))
scala> val l2 = List((List("AAA"), List("a", "b", "c")), (List("BBB"), List("c", "d", "e")))
l2: List[(List[String], List[String])] = List((List(AAA),List(a, b, c)), (List(BBB),List(c, d, e)))
scala> l1.collectFirst {
| case tp => l2.find(tp2 => tp2._1.head == tp._1.head).map(founded => (tp._1, tp._2, tp._3, founded._2))
| }.flatten
res2: Option[(List[String], String, String, List[String])] = Some((List(AAA),B1,C1,List(a, b, c)))
You can use collectFirst to filter out values you don't want; for each tuple, you use find on the second list and map the match into the tuple you want.
A couple of notes: this is horrible, and I don't know how you ended up with a Tuple4 in the first place. I personally hate all that tp._* notation; it's hard to read, so think about using case classes to wrap all that into a more manageable structure. Second, I had to use .head, which will throw an exception on an empty list, so you may want to do some checking before that. But as I said, I would completely review my code and avoid spending time working on a flawed architecture in the first place.
You can use zip to combine both lists:
val list1 = List((List("AAA"),"B1","C1"),(List("BBB"),"B2","C2"))
val list2 = List(("AAA",List("a","b","c")),("BBB",List("c","d","e")))
val combinedList = (list1 zip list2)
It gives you the combined list; combinedList.head will then give you the first matched pair.
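If the two lists are guaranteed to be aligned by position, flattening each zipped pair into the exact 4-tuple shape from the question takes one more step; a sketch of my own:
val combined = (list1 zip list2).collect {
  case ((ids, b, c), (key, values)) if ids.headOption.contains(key) =>
    (ids, b, c, values)
}
// combined.head == (List("AAA"), "B1", "C1", List("a", "b", "c"))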

Scala: Map keys with wildcards?

Is it possible to use keys with wildcards for Scala Maps? For example, tuples of the form (x,_)? Example:
scala> val t1 = ("x","y")
scala> val t2 = ("y","x")
scala> val m = Map(t1 -> "foo", t2 -> "foo")
scala> m(("x","y"))
res5: String = foo
scala> m(("x",_))
<console>:11: error: missing parameter type for expanded function ((x$1) => scala.Tuple2("x", x$1))
m(("x",_))
^
It would be great if there was a way to retrieve all (composite_key, value) pairs where only some part of the composite key is defined. Are there other ways to get the same functionality in Scala?
How about using collect?
Map( 1 -> "1" -> "11", 2 -> "2" -> "22").collect { case (k @ (1, _), v) => k -> v }
Using comprehensions like this:
for ( a @ ((k1,k2), v) <- m if k1 == "x" ) yield a
In general, you can do something like
m.filter(m => (m._1 == "x"))
but in your particular example it will still return only one result, because a Map has only one entry per key. If your key itself is composite then it will indeed make more sense:
scala> Map((1,2)->"a", (1,3)->"b", (3,4)->"c").filter(m => (m._1._1 == 1))
res0: scala.collection.immutable.Map[(Int, Int),String] = Map((1,2) -> a, (1,3) -> b)
Think about what is happening under the hood of the Map. The default Map in Scala is scala.collection.immutable.HashMap, which stores things based on their hash codes. Do ("x", "y") and ("x", "y2") have hash codes that relate to each other in any way? No, they don't, so there is no efficient way to implement wildcards with this map. The other answers provide solutions, but these will iterate over every key/value pair in the entire Map, which is not efficient.
If you expect you are going to want to do operations like this, use a TreeMap. This doesn't use a hash table internally, but instead puts elements into a tree based on an ordering. This is similar to the way a relational database uses B-Trees for its indices. Your wildcard query is like using a two-column index to filter on the first column in the index.
Here is an example:
import scala.collection.immutable.TreeMap
val t1 = ("x","y")
val t2 = ("x","y2")
val t3 = ("y","x")
val m = TreeMap(t1 -> "foo1", t2 -> "foo2", t3 -> "foo3")
// "" is < than all other strings
// "x\u0000" is the next > string after "x"
val submap = m.from(("x", "")).to(("x\u0000", ""))
submap.values.foreach(println) // prints foo1, foo2

Summing items within a Tuple

Below is a List of tuples, of type List[(String, String, Int)]:
val data3 = (List( ("id1" , "a", 1), ("id1" , "a", 1), ("id1" , "a", 1) , ("id2" , "a", 1)) )
//> data3 : List[(String, String, Int)] = List((id1,a,1), (id1,a,1), (id1,a,1),
//| (id2,a,1))
I'm attempting to count the occurrences of each Int value associated with each id, so the above data structure should be converted to List((id1,a,3), (id2,a,1)).
This is what I have come up with, but I'm unsure how to group similar items within a Tuple:
data3.map( { case (id,name,num) => (id , name , num + 1)})
//> res0: List[(String, String, Int)] = List((id1,a,2), (id1,a,2), (id1,a,2), (i
//| d2,a,2))
In practice data3 is a Spark RDD; I'm using a List in this example for local testing, but the same solution should be compatible with an RDD.
Update: based on the following code provided by maasg:
val byKey = rdd.map({case (id1,id2,v) => (id1,id2)->v})
val byKeyGrouped = byKey.groupByKey
val result = byKeyGrouped.map{case ((id1,id2),values) => (id1,id2,values.sum)}
I needed to amend it slightly to get the format I expect, which is of type RDD[(String, Seq[(String, Int)])], corresponding to RDD[(id, Seq[(name, count-of-names)])]:
val byKey = rdd.map({case (id1,id2,v) => (id1,id2)->v})
val byKeyGrouped = byKey.groupByKey
val result = byKeyGrouped.map{case ((id1,id2),values) => ((id1),(id2,values.sum))}
val counted = result.groupByKey
In Spark, you would do something like this: (using Spark Shell to illustrate)
val l = List( ("id1" , "a", 1), ("id1" , "a", 1), ("id1" , "a", 1) , ("id2" , "a", 1))
val rdd = sc.parallelize(l)
val grouped = rdd.groupBy{case (id1,id2,v) => (id1,id2)}
val result = grouped.map{case ((id1,id2),values) => (id1,id2,values.foldLeft(0){case (cumm, tuple) => cumm + tuple._3})}
Another option would be to map the rdd into a PairRDD and use groupByKey:
val byKey = rdd.map({case (id1,id2,v) => (id1,id2)->v})
val byKeyGrouped = byKey.groupByKey
val result = byKeyGrouped.map{case ((id1,id2),values) => (id1,id2,values.sum)}
Option 2 is slightly better when handling large sets, as it does not replicate the ids in the accumulated value.
This seems to work when I use scala-ide:
data3
.groupBy(tupl => (tupl._1, tupl._2))
.mapValues(v =>(v.head._1,v.head._2, v.map(_._3).sum))
.values.toList
And the result is the same as required by the question
res0: List[(String, String, Int)] = List((id1,a,3), (id2,a,1))
You should look into List.groupBy.
You can use the id as the key, and then use the length of your values in the map (i.e. all the items sharing the same id) to know the count.
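For instance, a sketch of that idea on the sample data, grouping by both id and name and using the group size as the count (which equals the sum here, since every value is 1):
val counts = data3
  .groupBy { case (id, name, _) => (id, name) }
  .map { case ((id, name), group) => (id, name, group.size) }
  .toList
// List((id1,a,3), (id2,a,1)) -- the entry order of the intermediate Map may vary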
@vptheron has the right idea.
As can be seen in the docs
def groupBy[K](f: (A) ⇒ K): Map[K, List[A]]
Partitions this list into a map of lists according to some discriminator function.
Note: this method is not re-implemented by views. This means when applied to a view it will always force the view and return a new list.
K the type of keys returned by the discriminator function.
f the discriminator function.
returns
A map from keys to lists such that the following invariant holds:
(xs groupBy f)(k) = xs filter (x => f(x) == k)
That is, every key k is bound to a list of those elements x for which f(x) equals k.
So something like the function below, when used with groupBy, will give you a Map whose keys are the ids.
(Sorry, I don't have access to a Scala compiler, so I can't test.)
def f(tuple: (String, String, Int)): String = tuple._1
Then you will have to iterate through the List for each id in the Map and sum up the number of integer occurrences. That is straightforward, but if you still need help, ask in the comments.
The following is the most readable, efficient, and scalable approach:
data.map {
case (key1, key2, value) => ((key1, key2), value)
}
.reduceByKey(_ + _)
which will give an RDD[((String, String), Int)]. By using reduceByKey the summation will parallelize, i.e. for very large groups it will be distributed, and summation will happen on the map side. Think about the case where there are only 10 groups but billions of records; using .sum won't scale, as it will only be able to distribute to 10 cores.
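If you need the flat triple shape back, one more map does it; a sketch, reusing the data RDD from above:
data.map {
  case (key1, key2, value) => ((key1, key2), value)
}
.reduceByKey(_ + _)
.map { case ((key1, key2), sum) => (key1, key2, sum) } // back to RDD[(String, String, Int)]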
A few more notes about the other answers:
Using head here is unnecessary: instead of .mapValues(v => (v.head._1, v.head._2, v.map(_._3).sum)), you can pattern match on the grouping key, which already carries both values (see the sketch below).
Using a foldLeft here is really horrible when the above shows .map(_._3).sum will do: val result = grouped.map{case ((id1,id2),values) => (id1,id2,values.foldLeft(0){case (cumm, tuple) => cumm + tuple._3})}
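A sketch combining both notes, matching on the grouping key instead of calling head and using .map(_._3).sum instead of the foldLeft (grouped is the RDD from the groupBy option above):
val result = grouped.map { case ((id1, id2), values) =>
  (id1, id2, values.map(_._3).sum)
}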

Nested Default Maps in Scala

I'm trying to construct nested maps in Scala, where both the outer and inner map use the "withDefaultValue" method. For example, the following:
val m = HashMap.empty[Int, collection.mutable.Map[Int,Int]].withDefaultValue( HashMap.empty[Int,Int].withDefaultValue(3))
m(1)(2)
res: Int = 3
m(1)(2) = 5
m(1)(2)
res: Int = 5
m(2)(3) = 6
m
res : scala.collection.mutable.Map[Int,scala.collection.mutable.Map[Int,Int]] = Map()
So the map, when addressed by the appropriate keys, gives me back what I put in. However, the map itself appears empty! Even m.size returns 0 in this example. Can anyone explain what's going on here?
Short answer
It's definitely not a bug.
Long answer
The behavior of withDefaultValue is to store a default value (in your case, a mutable map) inside the Map, to be returned in the case that the key does not exist. This is not the same as a value that is inserted into the Map when the key is not found.
Let's look closely at what's happening. It will be easier to understand if we pull the default map out as a separate variable so we can inspect it at will; let's call it default.
import collection.mutable.HashMap
val default = HashMap.empty[Int,Int].withDefaultValue(3)
So default is a mutable map (that has its own default value). Now we can create m and give default as the default value.
import collection.mutable.{Map => MMap}
val m = HashMap.empty[Int, MMap[Int,Int]].withDefaultValue(default)
Now whenever m is accessed with a missing key, it will return default. Notice that this is the exact same behavior as you have because withDefaultValue is defined as:
def withDefaultValue (d: B): Map[A, B]
Notice that it's d: B and not d: => B, so it will not create a new map each time the default is accessed; it will return the same exact object, what we've called default.
So let's see what happens:
m(1) // Map()
Since key 1 is not in m, the default, default, is returned. At this time, default is an empty Map.
m(1)(2) = 5
Since m(1) returns default, this operation stores 5 as the value for key 2 in default. Nothing is written to the Map m, because m(1) resolves to default, which is an entirely separate Map. We can check this by viewing default:
default // Map(2 -> 5)
But as we said, m is left unchanged
m // Map()
Now, how to achieve what you really wanted? Instead of using withDefaultValue, you want to make use of getOrElseUpdate:
def getOrElseUpdate (key: A, op: ⇒ B): B
Notice the op: => B? This means that the argument op will be re-evaluated each time it is needed. This allows us to put a new Map in there, a separate new Map for each missing key. Let's take a look:
val m2 = HashMap.empty[Int, MMap[Int,Int]]
No default values needed here.
m2.getOrElseUpdate(1, HashMap.empty[Int,Int].withDefaultValue(3)) // Map()
Key 1 doesn't exist, so we insert a new HashMap and return that new value. We can check that it was inserted as we expected. Notice that 1 maps to the newly added empty map, and that the 3 was not added anywhere, because of the behavior explained above.
m2 // Map(1 -> Map())
Likewise, we can update the Map as expected:
m2.getOrElseUpdate(1, HashMap.empty[Int,Int].withDefaultValue(1))(2) = 6
and check that it was added:
m2 // Map(1 -> Map(2 -> 6))
withDefaultValue is used to return a value when the key was not found. It does not populate the map, so your map stays empty. It is somewhat like using getOrElse(a, b), where b is provided by withDefaultValue.
I just had the exact same problem, and was happy to find dhg's answer. Since typing getOrElseUpdate all the time is not very concise, I came up with this little extension of the idea that I want to share:
You can declare a class that uses getOrElseUpdate as default behavior for the () operator:
import scala.collection.mutable.HashMap

class DefaultDict[K, V](defaultFunction: K => V) extends HashMap[K, V] {
  override def default(key: K): V = defaultFunction(key)
  override def apply(key: K): V = getOrElseUpdate(key, default(key))
}
Now you can do what you want to do like this:
var map = new DefaultDict[Int, DefaultDict[Int, Int]](
key => new DefaultDict(key => 3))
map(1)(2) = 5
Which does now result in map containing 5 (or rather: containing a DefaultDict containing the value 5 for the key 2).
What you're seeing is the effect of having created a single Map[Int, Int] that serves as the default value whenever a key isn't in the outer map.
scala> val m = HashMap.empty[Int, collection.mutable.Map[Int,Int]].withDefaultValue( HashMap.empty[Int,Int].withDefaultValue(3))
m: scala.collection.mutable.Map[Int,scala.collection.mutable.Map[Int,Int]] = Map()
scala> m(2)(2)
res1: Int = 3
scala> m(1)(2) = 5
scala> m(2)(2)
res2: Int = 5
To get the effect that you're looking for, you'll have to wrap the Map with an implementation that actually inserts the default value when a key isn't found in the Map.
Edit:
I'm not sure what your actual use case is, but you may have an easier time using a pair for the key to a single Map.
scala> val m = HashMap.empty[(Int, Int), Int].withDefaultValue(3)
m: scala.collection.mutable.Map[(Int, Int),Int] = Map()
scala> m((1, 2))
res0: Int = 3
scala> m((1, 2)) = 5
scala> m((1, 2))
res3: Int = 5
scala> m
res4: scala.collection.mutable.Map[(Int, Int),Int] = Map((1,2) -> 5)
I know it's a bit late but I've just seen the post while I was trying to solve the same problem.
Probably the API is different from the 2012 version, but you may want to use withDefault instead of withDefaultValue.
The difference is that withDefault takes a function as a parameter, which is executed every time a missing key is requested ;)
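A sketch of the difference, assuming a Scala version where mutable maps expose withDefault directly; note that, like withDefaultValue, it still does not insert anything on a miss, it just builds a fresh default per lookup:
import scala.collection.mutable
val m = mutable.HashMap.empty[Int, mutable.Map[Int, Int]]
  .withDefault(_ => mutable.HashMap.empty[Int, Int].withDefaultValue(3))
m(1)(2) // 3, served from a freshly built inner map; m itself is still empty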