Fixed Length SortedMap in Scala

I'm new to Scala. Does Scala support a fixed-length SortedMap?
What I have in mind is a map that does the following:
Takes a max_size parameter upon creation
Upon an add, checks whether there are already max_size elements
If there are, removes the smallest key and its value first (key's gonna be an Int)
Then adds the key and value to the map.
Strictly speaking, I don't need the map to be sorted, but sorting seems necessary (and is available) if we're removing the smallest key.
I wanted to ask before I started rolling my own. Also, I will be running this under Samza, which I believe is single-threaded, so concurrency won't be a concern.
I'm on Scala 2.10.

You can do something simple like this based on TreeMap, which keeps its elements ordered by key:
import scala.collection.immutable.TreeMap
def add[K,V](map: TreeMap[K,V], elem: (K,V), maxSize: Int): TreeMap[K,V] = {
map.takeRight(maxSize - 1) + elem
}
Here is how to use it:
scala> val m = TreeMap(1 -> "one", 2 -> "two", 3 -> "three")
m: scala.collection.immutable.TreeMap[Int,String] =
Map(1 -> one, 2 -> two, 3 -> three)
scala> val m1 = add(m, 0 -> "zero", 4)
m1: scala.collection.immutable.TreeMap[Int,String] =
Map(0 -> zero, 1 -> one, 2 -> two, 3 -> three)
scala> val m2 = add(m1, 4 -> "four", 4)
m2: scala.collection.immutable.TreeMap[Int,String] =
Map(1 -> one, 2 -> two, 3 -> three, 4 -> four)
scala> val m3 = add(m2, 5 -> "five", 4)
m3: scala.collection.immutable.TreeMap[Int,String] =
Map(2 -> two, 3 -> three, 4 -> four, 5 -> five)
scala> val m4 = add(m3, 0 -> "zero", 4)
m4: scala.collection.immutable.TreeMap[Int,String] =
Map(0 -> zero, 3 -> three, 4 -> four, 5 -> five)
You can obviously try to make it more convenient to suit your needs.
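If you want the max size fixed at creation, as the question asks, a thin wrapper around TreeMap is one option. This is only a sketch: BoundedMap, entries and add are made-up names (not part of the Scala library), and it assumes Int keys as in the question. It evicts the smallest key only when a genuinely new key would push the map past maxSize, so updating an existing key never drops anything.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical wrapper; the name BoundedMap and its members are illustrative only.
final class BoundedMap[V] private (val maxSize: Int, val entries: TreeMap[Int, V]) {
  // Returns a new BoundedMap. The smallest key is dropped first, but only
  // when the key is new and the map is already full.
  def add(key: Int, value: V): BoundedMap[V] = {
    val trimmed =
      if (!entries.contains(key) && entries.size >= maxSize) entries.drop(1)
      else entries
    new BoundedMap[V](maxSize, trimmed + (key -> value))
  }
}

object BoundedMap {
  // maxSize is supplied once, at creation.
  def empty[V](maxSize: Int): BoundedMap[V] = {
    require(maxSize > 0, "maxSize must be positive")
    new BoundedMap[V](maxSize, TreeMap.empty[Int, V])
  }
}

val b = BoundedMap.empty[String](2).add(1, "one").add(2, "two").add(3, "three")
// b.entries now holds keys 2 and 3; key 1 was evicted
```

It is immutable like the function above, so every add returns a new instance.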

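Aleksey's answer was very helpful. I made a small fix to it: add the element first and then trim to maxSize, so that updating an existing key does not evict another entry.
import scala.collection.immutable.TreeMap
def add[K,V](map: TreeMap[K,V], elem: (K,V), maxSize: Int): TreeMap[K,V] = {
(map + elem).takeRight(maxSize)
}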
val m = TreeMap(1 -> "one", 2 -> "two", 3 -> "three")
val m1 = add(m, 0 -> "zero", 4)
val m2 = add(m1, 4 -> "four", 4)
val m3 = add(m2, 0 -> "zero", 4)
val m4 = add(m3, 1 -> "one", 4)
val m5 = add(m4, 0 -> "zero", 4)
val m6 = add(m5, 1 -> "one", 4)

Related

Concatenate two Scala mutable maps preserving the keys of the first map

The Scala API lets you append one map to another as follows:
import scala.collection.mutable.{Map => MutableMap}
val m1: MutableMap[Int,String] = MutableMap(1 -> "A", 2 -> "B", 3 -> "C")
val m2: MutableMap[Int,String] = MutableMap(2 -> "X", 3 -> "Y", 4 -> "Z")
m1 ++= m2 // outputs: Map(2 -> X, 4 -> Z, 1 -> A, 3 -> Y)
m1 // outputs: Map(2 -> X, 4 -> Z, 1 -> A, 3 -> Y)
The behaviour is to override the repeated pairs with the pairs coming from the right map.
What is a good way to do it in the opposite way? That is, concatenating the pairs of m1 and m2 in m1 where the pairs of m1 are kept if repeated in m2.
m1 ++= (m2 ++ m1) perhaps?
Do you have to mutate m1 (that's rarely the right thing to do in Scala anyway)?
You could just create a new map as m2 ++ m1 otherwise ...
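If m1 really does have to be updated in place, one way (just a sketch, reusing the m1 and m2 from the question) is to copy over only the keys that m1 doesn't have yet:

```scala
import scala.collection.mutable

val m1 = mutable.Map(1 -> "A", 2 -> "B", 3 -> "C")
val m2 = mutable.Map(2 -> "X", 3 -> "Y", 4 -> "Z")

// Add only the entries of m2 whose keys are not already present in m1,
// so the values of m1 win for repeated keys.
for ((k, v) <- m2 if !m1.contains(k)) m1(k) = v

// m1 now contains 1 -> A, 2 -> B, 3 -> C, 4 -> Z (iteration order is unspecified)
```

This avoids building the intermediate m2 ++ m1 map.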
Store as a list (or similar collection) and group them:
val l1 = List(1 -> "A", 2 -> "B", 3 -> "C")
val l2 = List(2 -> "X", 3 -> "Y", 4 -> "Z")
(l1 ::: l2).groupBy(_._1) //Map[Int, List[(Int, String)]]
//output: Map(2 -> List((2,B), (2,X)), 4 -> List((4,Z)), 1 -> List((1,A)), 3 -> List((3,C), (3,Y)))
You can of course remove the leftover integers from the Map's value lists if you want.
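To get from the grouped lists back to a plain Map that keeps the first list's values, something like the following should work. It relies on groupBy keeping the elements of each group in their original order, so the head of each group comes from l1 whenever l1 has that key. The name merged is just for illustration.

```scala
val l1 = List(1 -> "A", 2 -> "B", 3 -> "C")
val l2 = List(2 -> "X", 3 -> "Y", 4 -> "Z")

// Group by key, then keep only the value of the first pair in each group.
val merged: Map[Int, String] =
  (l1 ::: l2).groupBy(_._1).map { case (k, pairs) => k -> pairs.head._2 }

// merged contains 1 -> A, 2 -> B, 3 -> C, 4 -> Z
```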

Removing elements from a map of type (Int, ListBuffer(Int))

I have LinkedHashMaps of type:
val map1 = LinkedHashMap(1 -> 1, 2 -> (1,2), 3 -> (1,2,3))
val map2 = LinkedHashMap(2 -> 2, 3 -> (2,3), 5 -> (2,3,5))
where the keys are node ids of a graph, and each value is the path to that node. I want to implement deleting a node. Suppose I want to delete node 3. I have to do two things: remove the element with key = 3 from every map, and remove the elements that have 3 in their list. How can I do this in Scala?
If you define your map as you have,
val map1 = LinkedHashMap(1 -> 1, 2 -> (1,2), 3 -> (1,2,3))
you do not have key: Int and value: List[Int]; you have key: Int and value: Any.
scala> val map1 = LinkedHashMap(1 -> 1, 2 -> (1,2), 3 -> (1,2,3))
// map1: scala.collection.mutable.LinkedHashMap[Int,Any] = Map(1 -> 1, 2 -> (1,2), 3 -> (1,2,3))
To match your requirement, you should define your map as follows,
scala> val map1 = LinkedHashMap(1 -> List(1), 2 -> List(1,2), 3 -> List(1,2,3))
// map1: scala.collection.mutable.LinkedHashMap[Int,List[Int]] = Map(1 -> List(1), 2 -> List(1, 2), 3 -> List(1, 2, 3))
Now, if you want to delete node 3,
scala> val map2 = map1.filter({
| case (key, list) => key != 3 && !list.contains(3)
| })
// map2: scala.collection.mutable.LinkedHashMap[Int,List[Int]] = Map(1 -> List(1), 2 -> List(1, 2))
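Since the question asks for this on every map, the filter above can be wrapped in a small function and applied to each map in turn. This is a sketch only: removeNode is a made-up name, and the maps use List[Int] paths as suggested above.

```scala
import scala.collection.mutable.LinkedHashMap

// Hypothetical helper: drops the entry for `node` and every entry whose path goes through it.
def removeNode(paths: LinkedHashMap[Int, List[Int]], node: Int): LinkedHashMap[Int, List[Int]] =
  paths.filter { case (key, path) => key != node && !path.contains(node) }

val map1 = LinkedHashMap(1 -> List(1), 2 -> List(1, 2), 3 -> List(1, 2, 3))
val map2 = LinkedHashMap(2 -> List(2), 3 -> List(2, 3), 5 -> List(2, 3, 5))

// Apply the same deletion to every map.
val cleaned = Seq(map1, map2).map(removeNode(_, 3))
// cleaned(0) keeps keys 1 and 2; cleaned(1) keeps only key 2
```

filter returns new maps here, so map1 and map2 themselves are left unchanged.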

When applying filter to a scala Map, how can I also check what entries were removed?

I am learning Scala, and at one point I want to remove entries from a map based on the value (not the key). I also want to know how many entries were removed; my program expects that exactly one entry should be removed.
Removing entries by their values can be done by applying filterNot, ok -- but how can I verify that exactly one entry was removed?
So far the only way I saw to achieve that is to run the predicate twice -- once for the "count" method (to count how often the predicate matches), and then with filterNot, to actually remove the entries.
What is the Scala way of achieving that in one go?
The only other solution I found is to first use filter(...) to get the values to be removed, and then use "-" to throw out the elements by their keys, but again, this requires two runs.
As long as you don't mind creating a collection out of the information that is removed you can use partition:
scala> Map(1 -> 1, 2 -> 2, 3 -> 3)
res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)
scala> res0.partition { case (k, v) => v % 2 == 0 }
res3: (Map(2 -> 2),Map(1 -> 1, 3 -> 3))
If you just want to know if only one entry was removed then you can use size to get the size before and after the filter operation. The difference should be one.
scala> Map(1 -> 1, 2 -> 2, 3 -> 3)
res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)
scala> res0.filterNot { case (k, v) => v % 2 == 0 }
res1: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 3 -> 3)
scala> res0.size - res1.size
res2: Int = 1
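Putting the partition approach together with the "exactly one" check from the question might look like this. It is a sketch with the same even-value predicate as above; the names removed and kept are arbitrary.

```scala
val m = Map(1 -> 1, 2 -> 2, 3 -> 3)

// One pass: entries matching the predicate go left, the rest go right.
val (removed, kept) = m.partition { case (_, v) => v % 2 == 0 }

// Fail fast if the expectation "exactly one entry removed" does not hold.
require(removed.size == 1, s"expected 1 removed entry, got ${removed.size}")
```

The predicate runs once per entry, and removed tells you both how many entries were dropped and which ones.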
Consider groupBy on the mapped values, as follows. Let
val a = ((1 to 3) zip (11 to 13)).toMap
a: Map(1 -> 11, 2 -> 12, 3 -> 13)
then
a.groupBy(_._2 % 2 == 0)
res: Map(false -> Map(1 -> 11, 3 -> 13), true -> Map(2 -> 12))

Compare two Maps in Scala

Is there any pre-defined function that I can use to compare two Maps based on the key and give me the difference? Right now, I iterate over Map1 and, for each key, check whether there is an element in Map2, then pattern match to find the difference. Is there a more elegant way to do this?
Consider the difference between the maps converted into sets of tuples,
(m1.toSet diff m2.toSet).toMap
Try:
val diff = (m1.keySet -- m2.keySet) ++ (m2.keySet -- m1.keySet)
diff contains the keys that are in m1 but not in m2, plus the keys that are in m2 but not in m1.
This solution looks like the right way:
scala> val x = Map(1 -> "a", 2 -> "b", 3 -> "c")
x: scala.collection.immutable.Map[Int,String] = Map(1 -> a, 2 -> b, 3 -> c)
scala> val y = Map(1 -> "a", 2 -> "b", 4 -> "d")
y: scala.collection.immutable.Map[Int,String] = Map(1 -> a, 2 -> b, 4 -> d)
scala> val diff : Map[Int, String] = x -- y.keySet
diff: Map[Int,String] = Map(3 -> c)
Found it here https://gist.github.com/frgomes/69068062e7849dfe9d5a53bd3543fb81
I think the -- operator will do what you're looking for: http://www.scala-lang.org/api/current/index.html#scala.collection.Map#--(xs:scala.collection.GenTraversableOnce[A]):Repr
Although this will probably only work given the assumption that Map2 is always a subset of Map1...
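If Map2 is not a subset of Map1, or the same key can carry different values, the pieces above can be combined into a fuller comparison. This is only a sketch; onlyInX, onlyInY and changed are illustrative names, and y is altered slightly from the earlier example so that key 2 differs.

```scala
val x = Map(1 -> "a", 2 -> "b", 3 -> "c")
val y = Map(1 -> "a", 2 -> "B", 4 -> "d")

// Entries whose keys appear on one side only.
val onlyInX = x -- y.keySet   // Map(3 -> c)
val onlyInY = y -- x.keySet   // Map(4 -> d)

// Keys present in both maps but with different values, paired as (value in x, value in y).
val changed: Map[Int, (String, String)] =
  (x.keySet intersect y.keySet)
    .filter(k => x(k) != y(k))
    .map(k => k -> (x(k), y(k)))
    .toMap                    // Map(2 -> (b,B))
```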

How to avoid the strange order in which maps are concatenated? (A++B++C ---> BAC)

Concatenating three maps a, b and c, I would expect the result to be in the same order as its respective original maps. But, as shown below, the result is like the maps were b, a and c:
Welcome to Scala version 2.10.0 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_26).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import collection.mutable
import collection.mutable
scala> val a = mutable.Map(1->2)
a: scala.collection.mutable.Map[Int,Int] = Map(1 -> 2)
scala> val b = mutable.Map(2->2)
b: scala.collection.mutable.Map[Int,Int] = Map(2 -> 2)
scala> val c = mutable.Map(3->2)
c: scala.collection.mutable.Map[Int,Int] = Map(3 -> 2)
scala> a ++ b ++ c
res0: scala.collection.mutable.Map[Int,Int] = Map(2 -> 2, 1 -> 2, 3 -> 2)
For four maps, it shows b, d, a, c. For two, b, a. The resulting map is always in the same order, no matter the original sequence.
Testing the answer:
scala> import collection.mutable.LinkedHashMap
import collection.mutable.LinkedHashMap
scala> val a = LinkedHashMap(1 -> 2)
a: scala.collection.mutable.LinkedHashMap[Int,Int] = Map(1 -> 2)
scala> val b = LinkedHashMap(2 -> 2)
b: scala.collection.mutable.LinkedHashMap[Int,Int] = Map(2 -> 2)
scala> val c = LinkedHashMap(3 -> 2)
c: scala.collection.mutable.LinkedHashMap[Int,Int] = Map(3 -> 2)
scala> a ++ b ++ c
res0: scala.collection.mutable.Map[Int,Int] = Map(1 -> 2, 2 -> 2, 3 -> 2)
Scala's Map (like Java's) does not have a defined iteration order. If you need to maintain insertion order, you can use a ListMap (which is immutable) or a LinkedHashMap (which is not):
scala> import collection.mutable.LinkedHashMap
import collection.mutable.LinkedHashMap
scala> val a = LinkedHashMap(1 -> 2)
a: scala.collection.mutable.LinkedHashMap[Int,Int] = Map(1 -> 2)
scala> a += (2 -> 2)
res0: a.type = Map(1 -> 2, 2 -> 2)
scala> a += (3 -> 2)
res1: a.type = Map(1 -> 2, 2 -> 2, 3 -> 2)
scala> a
res2: scala.collection.mutable.LinkedHashMap[Int,Int] = Map(1 -> 2, 2 -> 2, 3 -> 2)
But in general if you care about the order of your elements, you're probably better off with a different data structure.
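If the inputs are ordinary mutable Maps and only the result needs to keep the a, b, c order, one option is to accumulate them into a LinkedHashMap. This is a sketch; merged is just an illustrative name.

```scala
import scala.collection.mutable
import scala.collection.mutable.LinkedHashMap

val a = mutable.Map(1 -> 2)
val b = mutable.Map(2 -> 2)
val c = mutable.Map(3 -> 2)

// Append each map in turn; LinkedHashMap remembers the order in which keys were first inserted.
val merged = LinkedHashMap.empty[Int, Int]
Seq(a, b, c).foreach(merged ++= _)

// merged.keys.toList == List(1, 2, 3)
```

Entries within a single input map still come out in that map's own (unspecified) iteration order, so this only fixes the order between maps.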