Scala: difference of two sets by key

Scala: difference of two sets by key - scala

I have two sets of (k,v) pairs:
val x = Set((1,2), (2,10), (3,5), (7,15))
val y = Set((1,200), (3,500))
How to find difference of these two sets by keys, to get:
Set((2,10),(7,15))
Any quick and simple solution?

val ym = y.toMap
x.toMap.filterKeys(k => !(ym contains k)).toSet
Sets don't have keys, maps do. So you convert to map. Then, you can't create a difference on maps, but you can filter the keys to exclude the ones you don't want. And then you're done save for converting back to a Set. (It's not the most efficient way to do this, but it's not bad and it's easy to write.)

Let val keys = y.map(_._1).toSet be the set of keys (first element in the pair) that must not occur as key in x; thus
for ( p <- x if !keys(p._1) ) yield p
as well as
x.collect { case p#(a,b) if !keys(a) => p }
and
x.filter ( p => !keys(p._1) )
x.filterNot ( p => keys(p._1) )

you can try this one:
x filter{ m => y map{_._1} contains m._1} toSet

Related

Scala Maps getOrElseUpdate and Append result

I have a Map which looks like this
var map = Map[Int, Map[Int, Int]]()
I need the functionality where I need to lookup values in the inner map and if the value is found the result is "added" and not just updated.
I have written this code like
map.get(x) match {
case Some(innerMap) =>
innerMap.get(y) match {
case Some(z) => map(x)(y) = z + a
case None => map(x)(y) = z
case None => map(x) = Map(y -> a)
If you pass the inputs x=2,y=2,a=10 and x=2,y=2,a=10 to the code above the map will contain x=2,y=2,a=20 because values are added.
But I don't like this nested pattern matching. (reminds me of the nested ifs) I tried to simplify this using getOrElseUpdate
map
.getOrElseUpdate(x, Map(y -> a))
.getOrElseUpdate(y, a)
This is good but doesn't handle the scenario where y in found in the inner map and we needed to do a z + a

Not necessarily the most efficient solution, but simple - you can instantiate values with zeroes, and then add the value itself assuming all the keys are already there:
map
.getOrElseUpdate(x, Map(y -> 0))
.getOrElseUpdate(y, 0)
map(x)(y) += a

Scala test a Map is bijective

What is a simple approach to testing whether a Map[A,B] is bijective, namely for
val m1 = Map( 1 -> "a", 2 -> "b")
val m2 = Map( 1 -> "a", 2 -> "a")
we have that m1 is bijective unlike m2.

You could do
val m = Map(1 -> "a", 2 -> "a")
val isBijective = m.values.toSet.size == m.size
A bijection is one-to-one and onto. The mapping defined by a Map is definitely onto. Every value has a corresponding key.
The only way it's not one-to-one is if two keys map to the same value. But that means that we'll have fewer values than keys.

As explained here -
For a pairing between X and Y (where Y need not be different from X) to be a bijection, four properties must hold:
1) each element of X must be paired with at least one element of Y
This is inherent nature of Map. In some cases it can be mapped to None which is again one of the element of Y.
2) no element of X may be paired with more than one element of Y,
Again inherent neture of Map.
3) each element of Y must be paired with at least one element of X, and
Every element in Y will have some association with X otherwise it won't exist.
4) no element of Y may be paired with more than one element of X.
Map doesnt have this constraint. So we need to check for this. If Y contains duplicates then it is violated.
So, sufficient check should be "No duplicates in Y")
val a = scala.collection.mutable.Map[Int, Option[Int]]()
a(10) = None
a(20) = None
if(a.values.toArray.distinct.size != a.values.size) println("No") else println("Yes") // No
val a = Map("a"->10, "b"->20)
if(a.values.toArray.distinct.size != a.values.size) println("No") else println("Yes") // Yes
val a = scala.collection.mutable.Map[Int, Option[Int]]()
a(10) = Some(100)
a(20) = None
a(30) = Some(300)
if(a.values.toArray.distinct.size != a.values.size) println("No") else println("Yes") // Yes

Putting two placeholders inside flatMap in Spark Scala to create Array

I am applying flatMap on a scala array and create another array from it:
val x = sc.parallelize(Array(1,2,3,4,5,6,7))
val y = x.flatMap(n => Array(n,n*100,42))
println(y.collect().mkString(","))
1,100,42,2,200,42,3,300,42,4,400,42,5,500,42,6,600,42,7,700,42
But I am trying to use placeholder "_" in the second line of the code where I create y in the following way:
scala> val y = x.flatMap(Array(_,_*100,42))
<console>:26: error: wrong number of parameters; expected = 1
val y = x.flatMap(Array(_,_*100,42))
^
Which is not working. Could someone explain what to do in such cases if I want to use placeholder?

In scala, the number of placeholders in a lambda indicates the cardinality of the lambda parameters.
So the last line is expanded as
val y = x.flatMap((x1, x2) => Array(x1, x2*100, 42))
Long story short, you can't use a placeholder to refer twice to the same element.
You have to use named parameters in this case.
val y = x.flatMap(x => Array(x, x*100, 42))

You can only use _ placeholder once per parameter. (In your case, flatMap method takes single argument, but you are saying -- hey compiler, expect two arguments which is not going to work)
val y = x.flatMap(i => Array(i._1, i._2*100,42))
should do the trick.
val y = x.flatMap { case (i1, i2) => Array(i1, i2*100,42) }
should also work (and probably more readable)

Scala: Map keys with wildcards?

Is it possible to use keys with wildcards for Scala Maps? For example tuples of the form (x,_)? Example:
scala> val t1 = ("x","y")
scala> val t2 = ("y","x")
scala> val m = Map(t1 -> "foo", t2 -> "foo")
scala> m(("x","y"))
res5: String = foo
scala> m(("x",_))
<console>:11: error: missing parameter type for expanded function ((x$1) => scala.Tuple2("x", x$1))
m(("x",_))
^
It would be great if there was a way to retrieve all (composite_key, value) pares where only some part of composite key is defined. Other ways to get the same functionality in Scala?

How about use collect
Map( 1 -> "1" -> "11", 2 -> "2" -> "22").collect { case (k#(1, _), v ) => k -> v }

Using comprehensions like this:
for ( a # ((k1,k2), v) <- m if k1 == "x" ) yield a

In general, you can do something like
m.filter(m => (m._1 == "x"))
but in your particular example it will still return only one result, because a Map has only one entry per key. If your key itself is composite then it will indeed make more sense:
scala> Map((1,2)->"a", (1,3)->"b", (3,4)->"c").filter(m => (m._1._1 == 1))
res0: scala.collection.immutable.Map[(Int, Int),String] = Map((1,2) -> a, (1,3) -> b)

Think about what is happening under the hood of the Map. The default Map in Scala is scala.collection.immutable.HashMap, which stores things based on their hash codes. Do ("x", "y") and ("x", "y2") have hash codes that relate to each other in anyway? No, they don't, and their is no efficient way to implement wildcards with this map. The other answers provide solutions, but these will iterate over key/value pair in the entire Map, which is not efficient.
If you expect you are going to want to do operations like this, use a TreeMap. This doesn't use a hash table internally, put instead puts elements into a tree based on an ordering. This is similar to the way a relational database uses B-Trees for its indices. Your wildcard query is like using a two-column index to filter on the first column in the index.
Here is an example:
import scala.collection.immutable.TreeMap
val t1 = ("x","y")
val t2 = ("x","y2")
val t3 = ("y","x")
val m = TreeMap(t1 -> "foo1", t2 -> "foo2", t3 -> "foo3")
// "" is < than all other strings
// "x\u0000" is the next > string after "x"
val submap = m.from(("x", "")).to(("x\u0000", ""))
submap.values.foreach(println) // prints foo1, foo2

Combine values with same keys in Scala

I currently have 2 lists List('a','b','a') and List(45,65,12) with many more elements and elements in 2nd list linked to elements in first list by having a key value relationship. I want combine elements with same keys by adding their corresponding values and create a map which should look like Map('a'-> 57,'b'->65) as 57 = 45 + 12.
I have currently implemented it as
val keys = List('a','b','a')
val values = List(45,65,12)
val finalMap:Map(char:Int) =
scala.collection.mutable.Map().withDefaultValue(0)
0 until keys.length map (w => finalMap(keys(w)) += values(w))
I feel that there should be a better way(functional way) of creating the desired map than how I am doing it. How could I improve my code and do the same thing in more functional way?

val m = keys.zip(values).groupBy(_._1).mapValues(l => l.map(_._2).sum)
EDIT: To explain how the code works, zip pairs corresponding elements of two input sequences, so
keys.zip(values) = List((a, 45), (b, 65), (a, 12))
Now you want to group together all the pairs with the same first element. This can be done with groupBy:
keys.zip(values).groupBy(_._1) = Map((a, List((a, 45), (a, 12))), (b, List((b, 65))))
groupBy returns a map whose keys are the type being grouped on, and whose values are a list of the elements in the input sequence with the same key.
The keys of this map are the characters in keys, and the values are a list of associated pair from keys and values. Since the keys are the ones you want in the output map, you only need to transform the values from List[Char, Int] to List[Int].
You can do this by summing the values from the second element of each pair in the list.
You can extract the values from each pair using map e.g.
List((a, 45), (a, 12)).map(_._2) = List(45,12)
Now you can sum these values using sum:
List(45, 12).sum = 57
You can apply this transform to all the values in the map using mapValues to get the result you want.

I was going to +1 Lee's first version, but mapValues is a view, and ell always looks like one to me. Just not to seem petty.
scala> (keys zip values) groupBy (_._1) map { case (k,v) => (k, (v map (_._2)).sum) }
res0: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57)
Hey, the answer with fold disappeared. You can't blink on SO, the action is so fast.
I'm going to +1 Lee's typing speed anyway.
Edit: to explain how mapValues is a view:
scala> keys.zip(values).groupBy(_._1).mapValues(l => l.map { v =>
| println("OK mapping")
| v._2
| }.sum)
OK mapping
OK mapping
OK mapping
res2: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57)
scala> res2('a') // recomputes
OK mapping
OK mapping
res4: Int = 57
Sometimes that is what you want, but often it is surprising. I think there is a puzzler for it.

You were actually on the right track to a reasonably efficient functional solution. If we just switch to an immutable collection and use a fold on a key-value zip, we get:
( Map[Char,Int]() /: (keys,values).zipped ) ( (m,kv) =>
m + ( kv._1 -> ( m.getOrElse( kv._1, 0 ) + kv._2 ) )
)
Or you could use withDefaultValue 0, as you did, if you want the final map to have that default. Note that .zipped is faster than zip because it doesn't create an intermediate collection. And a groupBy would create a number of other intermediate collections. Of course it may not be worth optimizing, and if it is you could do even better than this, but I wanted to show you that your line of thinking wasn't far off the mark.