Reduce a List of Map of Tuples - scala

I have the following variable
val x1 = List((List(('a',1), ('e',1), ('t',1)),"eat"), (List(('a',1), ('e',1), ('t',1)),"ate"))
I want to get a
List(Map -> List)
that will look something like the following. The idea is to group words by the characters contained in them
Map(List((a,1), (e,1), (t,1)) -> List(eat, ate))
I have used the following code, which produces the expected value but not quite the right type:
scala> val z1 = x1.groupBy(x => x._1 )
.map(x => Map(x._1 -> x._2.map(z=>z._2)))
.fold(){(a,b) => b}
z1: Any = Map(List((a,1), (e,1), (t,1)) -> List(eat, ate))
However, I would like to return the obvious type Map[List[(Char, Int)],List[String]] and not Any, as is returned in my case. Also, I'm wondering if there is a better way of doing the whole thing. Many thanks!

Try this.
scala> x1.groupBy(_._1).mapValues(_.map(_._2))
res2: scala.collection.immutable.Map[List[(Char, Int)],List[String]] = Map(List((a,1), (e,1), (t,1)) -> List(eat, ate))
But, yeah, I think you might want to reconsider your data representation. That List[(List[(Char,Int)], String)] business is rather cumbersome.
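For reference, here is a minimal sketch of the same grouping written with map instead of mapValues, so the result is a strict Map on both Scala 2.12 and 2.13. The object name GroupWords is only for illustration and is not part of the question.

```scala
object GroupWords {
  val x1 = List(
    (List(('a', 1), ('e', 1), ('t', 1)), "eat"),
    (List(('a', 1), ('e', 1), ('t', 1)), "ate")
  )

  // Group by the character-count list, then keep only the words in each group.
  val grouped: Map[List[(Char, Int)], List[String]] =
    x1.groupBy(_._1).map { case (chars, pairs) => chars -> pairs.map(_._2) }
}
```

The explicit type annotation on grouped makes the compiler confirm the type the question asked for.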

Related

Finding a map in list/vector of maps in Scala

I have a vector/list of maps (Map[String,Int]). How can I find if a key-value pair exists in one of these maps in the list of maps using .find?
val res = List(Map("1" -> 1), Map("2" -> 2)).find(t => t.exists(j => j == ("2", 2)))
println(res)
Use find with exists to check whether the pair exists in any of the maps.
chengpohi's solution is pretty inefficient, and also different to how I understand the question.
Let m: Map[String,Int].
Why chengpohi's solution is inefficient
First, m.exists(j => j == ("2", 2)) looks at every entry of m, while m.get("2").contains(2) performs only a single map lookup.
Note that m.contains("2" -> 2) will not work, as contains on a Map checks for a key, i.e., m.contains("2") works (and is also fast).
To obtain the same result as chengpohi, but efficiently, for a single map m:
def mapExists[K,V](m: Map[K,V], k: K, v: V): Option[(K,V)] =
  m.get(k).filter(_ == v).map(_ => k -> v)
Note that this method returns its arguments, which is quite redundant.
How I understand the question
Second, I understood the question as checking whether the List contains a Map with a specific pair.
This would translate to
def mapExists[K,V](ms: List[Map[K,V]], k: K, v: V): Boolean =
ms.exists(_.get(k).contains(v))
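As a rough sketch of both readings of the question, the following puts the Boolean check next to a variant that returns the matching map. The names FindInMaps, containsPair and findMapWithPair are made up for this example. It assumes Scala 2.11 or later, where Option.contains is available.

```scala
object FindInMaps {
  val maps = List(Map("1" -> 1), Map("2" -> 2))

  // True if any map in the list holds exactly the pair k -> v.
  // Each map is probed with one key lookup instead of a scan of its entries.
  def containsPair[K, V](ms: List[Map[K, V]], k: K, v: V): Boolean =
    ms.exists(_.get(k).contains(v))

  // Same test, but returns the first map that holds the pair, if any.
  def findMapWithPair[K, V](ms: List[Map[K, V]], k: K, v: V): Option[Map[K, V]] =
    ms.find(_.get(k).contains(v))
}
```

The list itself is still traversed linearly; only the per-map check becomes a single lookup.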
It can even be done using just the key we are interested in:
scala> val res = List(Map("A" -> 10), Map("B" -> 20)).find(_.keySet.contains("B"))
res: Option[scala.collection.immutable.Map[String,Int]] = Some(Map(B -> 20))

Scala: Map keys with wildcards?

Is it possible to use keys with wildcards for Scala Maps? For example tuples of the form (x,_)? Example:
scala> val t1 = ("x","y")
scala> val t2 = ("y","x")
scala> val m = Map(t1 -> "foo", t2 -> "foo")
scala> m(("x","y"))
res5: String = foo
scala> m(("x",_))
<console>:11: error: missing parameter type for expanded function ((x$1) => scala.Tuple2("x", x$1))
m(("x",_))
^
It would be great if there was a way to retrieve all (composite_key, value) pairs where only some part of the composite key is defined. Are there other ways to get the same functionality in Scala?
How about using collect?
Map( 1 -> "1" -> "11", 2 -> "2" -> "22").collect { case (k @ (1, _), v ) => k -> v }
Using comprehensions like this:
for ( a @ ((k1,k2), v) <- m if k1 == "x" ) yield a
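Applied to string tuple keys like those in the question, the pattern-based approach could look like the sketch below. The sample map and the name WildcardCollect are assumptions made for illustration. Note that it still visits every entry of the map.

```scala
object WildcardCollect {
  val m = Map(("x", "y") -> "foo", ("x", "z") -> "bar", ("y", "x") -> "baz")

  // Keep every entry whose key has "x" as its first component;
  // the underscore in the pattern plays the role of the wildcard.
  val xs: Map[(String, String), String] =
    m.collect { case (k @ ("x", _), v) => k -> v }
}
```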
In general, you can do something like
m.filter(m => (m._1 == "x"))
but in your particular example it will still return only one result, because a Map has only one entry per key. If your key itself is composite then it will indeed make more sense:
scala> Map((1,2)->"a", (1,3)->"b", (3,4)->"c").filter(m => (m._1._1 == 1))
res0: scala.collection.immutable.Map[(Int, Int),String] = Map((1,2) -> a, (1,3) -> b)
Think about what is happening under the hood of the Map. The default Map in Scala is scala.collection.immutable.HashMap, which stores things based on their hash codes. Do ("x", "y") and ("x", "y2") have hash codes that relate to each other in any way? No, they don't, and there is no efficient way to implement wildcards with this map. The other answers provide solutions, but these will iterate over every key/value pair in the entire Map, which is not efficient.
If you expect you are going to want to do operations like this, use a TreeMap. This doesn't use a hash table internally, but instead puts elements into a tree based on an ordering. This is similar to the way a relational database uses B-Trees for its indices. Your wildcard query is like using a two-column index to filter on the first column in the index.
Here is an example:
import scala.collection.immutable.TreeMap
val t1 = ("x","y")
val t2 = ("x","y2")
val t3 = ("y","x")
val m = TreeMap(t1 -> "foo1", t2 -> "foo2", t3 -> "foo3")
// "" is < than all other strings
// "x\u0000" is the next > string after "x"
val submap = m.from(("x", "")).to(("x\u0000", ""))
submap.values.foreach(println) // prints foo1, foo2
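The same lookup can be wrapped in a small helper. This is only a sketch: withFirst and PrefixLookup are hypothetical names, and it uses range (lower bound inclusive, upper bound exclusive) rather than from/to, on the assumption that range is available on TreeMap in both Scala 2.12 and 2.13.

```scala
import scala.collection.immutable.TreeMap

object PrefixLookup {
  val m = TreeMap(("x", "y") -> "foo1", ("x", "y2") -> "foo2", ("y", "x") -> "foo3")

  // All entries whose first key component equals k1.
  // (k1, "") is the smallest possible key starting with k1, and
  // (k1 + "\u0000", "") is smaller than any key whose first component sorts after k1.
  def withFirst[V](tm: TreeMap[(String, String), V], k1: String): TreeMap[(String, String), V] =
    tm.range((k1, ""), (k1 + "\u0000", ""))

  val xs = withFirst(m, "x")
}
```

Because the tree is ordered on the whole tuple, this only helps when the wildcard is on the trailing component of the key.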

Scala: Why does mapValues produce a view, and are there any stable alternatives?

I was just surprised to learn that mapValues produces a view. The consequence is shown in the following example:
case class thing(id: Int)
val rand = new java.util.Random
val distribution = Map(thing(0) -> 0.5, thing(1) -> 0.5)
val perturbed = distribution mapValues { _ + 0.1 * rand.nextGaussian }
val sumProbs = perturbed.map{_._2}.sum
val newDistribution = perturbed mapValues { _ / sumProbs }
The idea is that I have a distribution, which is perturbed with some randomness then I renormalize it. The code actually fails in its original intention: since mapValues produces a view, _ + 0.1 * rand.nextGaussian is always re-evaluated whenever perturbed is used.
I am now doing something like distribution map { case (s, p) => (s, p + 0.1 * rand.nextGaussian) }, but that's just a little bit verbose. So the purpose of this question is:
Remind people who are unaware of this fact.
Look for reasons why they make mapValues output views.
Whether there is an alternative method that produces concrete Map.
Are there any other commonly-used collection methods that have this trap.
Thanks.
There's a ticket about this, SI-4776 (by YT).
The commit that introduces it has this to say:
Following a suggestion of jrudolph, made filterKeys and mapValues
transform abstract maps, and duplicated functionality for immutable
maps. Moved transform and filterNot from immutable to general maps.
Review by phaller.
I have not been able to find the original suggestion by jrudolph, but I assume it was done to make mapValues more efficient. Given the question, that may come as a surprise, but mapValues is more efficient if you are not likely to iterate over the values more than once.
As a work-around, one can do mapValues(...).view.force to produce a new Map.
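Another option is a small strict helper built on map. The sketch below uses a hypothetical mapValuesStrict function (not a library method) and a call counter to show that the function runs once per entry when the map is built, and not again on later reads.

```scala
object StrictMapValues {
  // Hypothetical helper: builds a new concrete Map, applying f once per entry.
  def mapValuesStrict[K, V, W](m: Map[K, V])(f: V => W): Map[K, W] =
    m.map { case (k, v) => k -> f(v) }

  var calls = 0
  val base = Map("a" -> 1, "b" -> 2)

  // f is evaluated here, exactly once for each of the two entries.
  val strict = mapValuesStrict(base) { v => calls += 1; v * 10 }

  // Repeated reads do not trigger f again.
  val first = strict("a")
  val second = strict("a")
}
```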
The Scaladoc says:
a map view which maps every key of this map to f(this(key)). The resulting map wraps the original map without copying any elements.
So this should be expected, but this scares me a lot; I'll have to review a bunch of code tomorrow. I wasn't expecting behavior like that :-(
Just another workaround:
You can call toSeq to get a copy, and toMap if you need it back as a map, but this creates unnecessary objects and has a performance cost compared with using map.
One can relatively easily write a mapValues that doesn't create a view. I'll do it tomorrow and post the code here if no one does it before me ;)
EDIT:
I found an easy way to 'force' the view, use '.map(identity)' after mapValues (so no need of implementing a specific function):
scala> val xs = Map("a" -> 1, "b" -> 2)
xs: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1, b -> 2)
scala> val ys = xs.mapValues(_ + Random.nextInt).map(identity)
ys: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1315230132, b -> 1614948101)
scala> ys
res7: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1315230132, b -> 1614948101)
It's a shame the type returned isn't actually a view! Otherwise one would have been able to call 'force' ...
This is better (and deprecated) in Scala 2.13, where mapValues now returns a MapView:
API Doc
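A short sketch of the usual Scala 2.13 idiom, assuming Scala 2.13 or later (it does not compile on 2.12, where view has no mapValues): go through view explicitly and copy the result into a strict Map with toMap. The name Strict213 is illustrative only.

```scala
object Strict213 {
  val xs = Map("a" -> 1, "b" -> 2)

  // view.mapValues builds a lazy MapView; toMap evaluates it once into a concrete Map.
  val ys: Map[String, Int] = xs.view.mapValues(_ + 1).toMap
}
```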

How to convert Map[String,Seq[String]] to Map[String,String]

I have a Map[String,Seq[String]] and basically want to convert it to a Map[String,String], since I know the sequence will only have one value.
Someone else already mentioned mapValues, but if I were you I would do it like this:
scala> val m = Map(1 -> Seq(1), 2 -> Seq(2))
m: scala.collection.immutable.Map[Int,Seq[Int]] = Map(1 -> List(1), 2 -> List(2))
scala> m.map { case (k,Seq(v)) => (k,v) }
res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 2)
Two reasons:
The mapValues method produces a view of the result Map, meaning that the function will be recomputed every time you access an element. Unless you plan on accessing each element exactly once, or you only plan on accessing a very small percentage of them, you don't want that recomputation to take place.
Using a case with (k,Seq(v)) ensures that an exception will be thrown if the function ever sees a Seq that doesn't contain exactly one element. Using _(0) or _.head will throw an exception if there are zero elements, but will not complain if you had more than one, which will likely result in mysterious bugs later on when things go missing without errors.
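To make the second point concrete, here is a small sketch that wraps the pattern in a function and feeds it a two-element Seq. The names SingleValue and unwrap are hypothetical; Try is used only to capture the exception for inspection.

```scala
import scala.util.Try

object SingleValue {
  // Succeeds only if every Seq holds exactly one element;
  // any other length fails the pattern and throws a MatchError.
  def unwrap[K, V](m: Map[K, Seq[V]]): Map[K, V] =
    m.map { case (k, Seq(v)) => (k, v) }

  val ok = unwrap(Map("a" -> Seq("aaa"), "b" -> Seq("bbb")))

  // A Seq with two elements: the failure is loud instead of silently dropping "y".
  val bad = Try(unwrap(Map("a" -> Seq("x", "y"))))
}
```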
You can use mapValues().
scala> Map("a" -> Seq("aaa"), "b" -> Seq("bbb"))
res0: scala.collection.immutable.Map[java.lang.String,Seq[java.lang.String]] = Map(a -> List(aaa), b -> List(bbb))
scala> res0.mapValues(_(0))
res1: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(a -> aaa, b -> bbb)
I think I got it by doing the following:
mymap.flatMap(x => Map(x._1 -> x._2.head))
Yet another suggestion:
m mapValues { _.mkString }
This one's agnostic to whether the Seq has multiple elements -- it'll just concatenate all the strings together. If you're concerned about the recomputation of each value, you can make it happen up-front:
(m mapValues { _.mkString }).view.force

Extract elements from one list that aren't in another

Simply, I have two lists and I need to extract the new elements added to one of them.
I have the following
val x = List(1,2,3)
val y = List(1,2,4)
val existing :List[Int]= x.map(xInstance => {
if (!y.exists(yInstance =>
yInstance == xInstance))
xInstance
})
Result :existing: List[AnyVal] = List((), (), 3)
I need to remove all other elements except the numbers with the minimum cost.
Pick a suitable data structure, and life becomes a lot easier.
scala> x.toSet -- y
res1: scala.collection.immutable.Set[Int] = Set(3)
Also beware that:
if (condition) expr1
Is shorthand for:
if (condition) expr1 else ()
Using the result of this, which will usually have the static type Any or AnyVal, is almost always an error. It's only appropriate for side-effects:
if (condition) buffer += 1
if (condition) sys.error("boom!")
retronym's solution is okay IF you don't have repeated elements and you don't care about the order. However, you don't indicate that this is so.
Hence it's probably going to be most efficient to convert y to a set (not x). We'll only need to traverse the list once and will have fast O(log(n)) access to the set.
All you need is
x filterNot y.toSet
// res1: List[Int] = List(3)
edit:
also, there's a built-in method that is even easier:
x diff y
(I had a look at the implementation; it looks pretty efficient, using a HashMap to count occurrences.)
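The three approaches differ once the input has duplicates. The sketch below uses made-up sample lists (not the ones from the question) to show the difference; ListDifference is an illustrative name.

```scala
object ListDifference {
  val x = List(1, 1, 2, 3)
  val y = List(1, 4)

  // Set difference: duplicates and ordering are lost.
  val viaSet: Set[Int] = x.toSet -- y

  // filterNot against a Set: keeps order, drops every occurrence of anything in y.
  val ySet = y.toSet
  val viaFilterNot: List[Int] = x.filterNot(ySet.contains)

  // diff is multiset difference: each element of y cancels one occurrence in x.
  val viaDiff: List[Int] = x.diff(y)
}
```

With these inputs, filterNot removes both 1s while diff removes only one of them, so which is right depends on whether repeated elements should count separately.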
The easy way is to use filter instead, so there's nothing to remove:
val existing :List[Int] =
x.filter(xInstance => !y.exists(yInstance => yInstance == xInstance))
val existing = x.filter(d => !y.exists(_ == d))
Returns
existing: List[Int] = List(3)