I have several hash maps I need to generate combinations of:
A: [x->1, y->2,...]
B: [x->1, a->4,...]
C: [x->1, b->5,...]
...
some possible combinations:
A+B; A; A+C; A+B+C...
For each combination I need to produce the joint hash map and perform an operation on the key-value pairs that have the same key in both hash maps.
All I could come up with was using a binary counter and mapping the digits to the respective hash map:
001 -> A
101 -> A,C
...
Although this solution works, the modulo operations are time-consuming when I have more than 100 hash maps. I'm new to Scala, but I believe there must be a better way to achieve this?
Scala sequences have a combinations function. This gives you the combinations for choosing a certain number of elements from the total. From your question it looks like you want combinations of every size, so in theory your code could be something like:
val elements = List('a, 'b, 'c, 'd)
(1 to elements.size).flatMap(elements.combinations).toList
/* List[List[Symbol]] = List(List('a), List('b), List('c), List('d), List('a, 'b),
List('a, 'c), List('a, 'd), List('b, 'c), List('b, 'd), List('c, 'd),
List('a, 'b, 'c), List('a, 'b, 'd), List('a, 'c, 'd), List('b, 'c, 'd),
List('a, 'b, 'c, 'd)) */
But as pointed out, all combinations will be too many. With 100 elements, choosing 2 from 100 gives you 4950 combinations, 3 gives 161700, 4 gives 3921225, and 5 gives over 75 million, which will likely exhaust memory if you materialize them all. So if you just keep the argument to combinations at 2 or 3 you should be OK.
Well, think of how many combinations there are of your maps: suppose you have N maps.
(the maps individually) + (pairs of maps) + (triples of maps) + ... + (all the maps)
Which is of course
(N choose 1) + (N choose 2) + ... + (N choose N), i.e. 2^N - 1 in total
Where N choose M is defined as:
N! / (M! * (N-M)!)
For N=100 and M=50, N choose M is over 100,000,000,000,000,000,000,000,000,000 so "time consuming" really doesn't do justice to the problem!
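A quick way to sanity-check these numbers is with BigInt, which avoids the overflow that Int or Long would hit (a sketch; the choose helper is mine):
def choose(n: Int, k: Int): BigInt =
  (BigInt(n - k + 1) to BigInt(n)).product / (BigInt(1) to BigInt(k)).product

choose(100, 2)                      // 4950
choose(100, 50)                     // 100891344545564193334812497256, over 10^29
(1 to 100).map(choose(100, _)).sum  // 2^100 - 1 non-empty subsets in total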
Oh, and that assumes that ordering is irrelevant, that is, that A + B equals B + A. If that assumption is wrong, you are faced with significantly more permutations than there are particles in the visible universe.
Why Scala might help with this problem: its parallel collections framework!
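For example, the pairwise merges could be evaluated in parallel. A minimal sketch, assuming Scala 2.12 where .par is built in (in 2.13+ it needs the separate scala-parallel-collections module), with a hypothetical merge standing in for your key-wise operation:
val maps = Vector(
  Map("x" -> 1, "y" -> 2),
  Map("x" -> 1, "a" -> 4),
  Map("x" -> 1, "b" -> 5)
)

def merge(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] =
  (m1.keySet intersect m2.keySet).map(k => k -> (m1(k) + m2(k))).toMap

// each pair of indices is merged on the common fork-join pool
val pairwise = maps.indices.combinations(2).toVector.par
  .map { case Seq(i, j) => (i, j) -> merge(maps(i), maps(j)) }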
Following up on your idea to use integers to represent bitsets: are you using the actual modulo operator? You can also use bitmasks to check whether a given element is in a bitset. (Note that on the JVM both are single instructions, so who knows what's happening there.)
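For illustration, a minimal sketch of the two membership tests (the hasModulo/hasMask names are mine):
val bits = 5 // binary 101: maps A and C are in the subset

// division/modulo test, as described in the question
def hasModulo(bits: Int, i: Int): Boolean = (bits / (1 << i)) % 2 == 1

// single bitwise AND against a mask
def hasMask(bits: Int, i: Int): Boolean = (bits & (1 << i)) != 0

hasModulo(bits, 2) // true: C (bit 2) is set in 101
hasMask(bits, 1)   // false: B (bit 1) is not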
Another potential major improvement: since your operation on the map values is associative, you can save computations by reusing previous results. For example, if you need to combine A, B, C and have already combined A, C into AC, you can just combine B with AC.
The following code implements both ideas:
type MapT = Map[String,Int] // for conciseness later
@scala.annotation.tailrec
def pow2(i: Int, acc: Int = 1): Int = {
  // for reasonably sized ints...
  if (i <= 0) acc else pow2(i - 1, 2 * acc)
}
// initial set of maps
val maps = List(
Map("x" -> 1, "y" -> 2),
Map("x" -> 1, "a" -> 4),
Map("x" -> 1, "b" -> 5)
)
val num = maps.size
// any 'op' that's commutative will do
def combine(m1: MapT, m2: MapT)(op: (Int, Int) => Int): MapT =
  (m1.keySet intersect m2.keySet).map(k => k -> op(m1(k), m2(k))).toMap
val numCombs = pow2(num)
// precomputes all required powers of two
val masks : Array[Int] = (0 until num).map(pow2(_)).toArray
// this array will be filled in, dynamic-programming style
val results : Array[MapT] = Array.fill(numCombs)(Map.empty)
// fill in the results for "combinations" of one map
for((m,i) <- maps.zipWithIndex) { results(masks(i)) = m }
val zeroUntilNum = (0 until num).toList
for (n <- 2 to num; (x :: xs) <- zeroUntilNum.combinations(n)) {
  // The trick here is that we already know the result of combining the maps
  // indexed by xs; we just need to compute the corresponding bitmask and get
  // the result from the array.
  val known = xs.foldLeft(0)((a, i) => a | masks(i))
  val xm = masks(x)
  results(known | xm) = combine(results(known), results(xm))(_ + _)
}
If you print the resulting array, you get:
0 -> Map()
1 -> Map(x -> 1, y -> 2)
2 -> Map(x -> 1, a -> 4)
3 -> Map(x -> 2)
4 -> Map(x -> 1, b -> 5)
5 -> Map(x -> 2)
6 -> Map(x -> 2)
7 -> Map(x -> 3)
Of course, like everyone else pointed out, it will blow up eventually as the number of input maps increases.
Related
I'm looking for an elegant way to combine every element of a Seq with the rest of the elements, for a large collection.
Example: Seq(1,2,3).someMethod should produce something like
Iterator(
(1,Seq(2,3)),
(2,Seq(1,3)),
(3,Seq(1,2))
)
Order of elements doesn't matter. It doesn't have to be a tuple; a Seq(Seq(1), Seq(2,3)) is also acceptable (although kind of ugly).
Note the emphasis on large collection (which is why my example shows an Iterator).
Also note that this is not combinations.
Ideas?
Edit:
In my use case, the numbers are expected to be unique. If a solution can eliminate the dupes, that's fine, but not at additional cost. Otherwise, dupes are acceptable.
Edit 2: In the end, I went with a nested for-loop and skipped the case where i == j, so no new collections were created (sketched below). I upvoted the solutions that were correct and simple ("simplicity is the ultimate sophistication" - Leonardo da Vinci), but even the best ones are quadratic just by the nature of the problem, and some build intermediate collections with ++, which I wanted to avoid because the collection I'm dealing with has close to 50000 elements, i.e. 2.5 billion pairs when squared.
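For reference, a minimal sketch of that nested-loop approach; process is a hypothetical callback standing in for whatever work is done per pair:
// visits every ordered pair (i, j) with i != j, allocating no
// intermediate collections
def withRest[A](xs: IndexedSeq[A])(process: (A, A) => Unit): Unit =
  for {
    i <- xs.indices
    j <- xs.indices
    if i != j
  } process(xs(i), xs(j))

withRest(Vector(1, 2, 3))((x, y) => println(s"$x sees $y"))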
The following code has constant runtime (it does everything lazily), but accessing every element of the resulting collections has a constant overhead: an index shift must be computed on every access.
def faceMap(i: Int)(j: Int) = if (j < i) j else j + 1
def facets[A](simplex: Vector[A]): Seq[(A, Seq[A])] = {
  val n = simplex.size
  (0 until n).view.map { i =>
    (
      simplex(i),
      (0 until n - 1).view.map(j => simplex(faceMap(i)(j)))
    )
  }
}
Example:
println("Example: facets of a 3-dimensional simplex")
for ((i, v) <- facets((0 to 3).toVector)) {
println(i + " -> " + v.mkString("[", ",", "]"))
}
Output:
Example: facets of a 3-dimensional simplex
0 -> [1,2,3]
1 -> [0,2,3]
2 -> [0,1,3]
3 -> [0,1,2]
This code expresses everything in terms of simplices, because "omitting one index" corresponds exactly to the face maps for a combinatorially described simplex. To further illustrate the idea, here is what the faceMap does:
println("Example: how `faceMap(3)` shifts indices")
for (i <- 0 to 5) {
println(i + " -> " + faceMap(3)(i))
}
gives:
Example: how `faceMap(3)` shifts indices
0 -> 0
1 -> 1
2 -> 2
3 -> 4
4 -> 5
5 -> 6
The facets method uses the faceMaps to create a lazy view of the original collection that omits one element by shifting the indices by one starting from the index of the omitted element.
If I understand correctly what you want in terms of handling duplicate values (i.e., duplicate values are to be preserved), here's something that should work. Given the following input:
import scala.util.Random
val nums = Vector.fill(20)(Random.nextInt)
This should get you what you need:
for (i <- Iterator.from(0).take(nums.size)) yield {
  nums(i) -> (nums.take(i) ++ nums.drop(i + 1))
}
On the other hand, if you want to remove dups, I'd convert to a Set:
val numsSet = nums.toSet
for (num <- nums) yield {
  num -> (numsSet - num)
}
seq.iterator.map(x => x -> seq.filter(_ != x))
This is quadratic, but I don't think there is very much you can do about that, because at the end of the day, creating a collection is linear, and you are going to need N of them.
import scala.annotation.tailrec
def prems(s: Seq[Int]): Map[Int, Seq[Int]] = {
  @tailrec
  def p(prev: Seq[Int], s: Seq[Int], res: Map[Int, Seq[Int]]): Map[Int, Seq[Int]] = s match {
    case Nil => res // guard against an empty input
    case x :: Nil => res + (x -> prev)
    case x :: xs => p(x +: prev, xs, res + (x -> (prev ++ xs)))
  }
  p(Seq.empty[Int], s, Map.empty[Int, Seq[Int]])
}
prems(Seq(1,2,3,4))
res0: Map[Int,Seq[Int]] = Map(1 -> List(2, 3, 4), 2 -> List(1, 3, 4), 3 -> List(2, 1, 4),4 -> List(3, 2, 1))
I think you are looking for permutations. You can map the resulting lists into the structure you are looking for:
Seq(1,2,3).permutations.map(p => (p.head, p.tail)).toList
res49: List[(Int, Seq[Int])] = List((1,List(2, 3)), (1,List(3, 2)), (2,List(1, 3)), (2,List(3, 1)), (3,List(1, 2)), (3,List(2, 1)))
Note that the final toList call is only there to trigger the evaluation of the expressions; otherwise, the result is an iterator as you asked for.
In order to get rid of the duplicate heads, toMap seems like the most straightforward approach:
Seq(1,2,3).permutations.map(p => (p.head, p.tail)).toMap
res50: scala.collection.immutable.Map[Int,Seq[Int]] = Map(1 -> List(3, 2), 2 -> List(3, 1), 3 -> List(2, 1))
Here is code I wrote to average coordinate values contained within the values of a Map:
val averaged = Map((2,10) -> List((2.0,11.0), (5.0,8.0)))
//> averaged : scala.collection.immutable.Map[(Int, Int),List[(Double, Double)
//| ]] = Map((2,10) -> List((2.0,11.0), (5.0,8.0)))
averaged.mapValues(m => {
val s1 = m.map(m => m._1).sum
val s2 = m.map(m => m._2).sum
(s1 / m.size , s2 / m.size)
}) //> res0: scala.collection.immutable.Map[(Int, Int),(Double, Double)] = Map((2,
//| 10) -> (3.5,9.5))
This code works as expected, but the mapValues function requires a number of passes equal to the length of the List. Is there a more idiomatic way of achieving the same in Scala?
If I'm understanding your question correctly, you are asking whether it is possible to avoid the traversal of m on each access. The mapValues method returns a view of the Map, meaning that the work is repeated on every access. To avoid that, just use map instead:
val averaged = Map((2, 10) -> List((2.0, 11.0), (5.0, 8.0)))
val result = averaged.map {
  case (key, m) =>
    val (s1, s2) = m.unzip
    key -> (s1.sum / m.size, s2.sum / m.size)
}
println(result)
// Map((2,10) -> (3.5,9.5))
Using unzip additionally means that the code won't traverse m more than once.
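That said, unzip still allocates two intermediate lists that are then summed separately; if you want to avoid those as well, a single foldLeft can accumulate both sums in one pass (a sketch, arguably less clear than unzip):
val singlePass = averaged.map {
  case (key, m) =>
    val (s1, s2) = m.foldLeft((0.0, 0.0)) {
      case ((accX, accY), (x, y)) => (accX + x, accY + y)
    }
    key -> (s1 / m.size, s2 / m.size)
}
// Map((2,10) -> (3.5,9.5))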
Beginner here.
Sorry, but I didn't find an answer, so I'm asking the question here.
I want to know how to do this using the Scala API:
(blabla))( -> List(('(',2),(')',2))
Currently I have this :
"(blabla))(".toCharArray.toList.filter(p => (p == '(' || p == ')')).sortBy(x => x)
Output :
List((, (, ), ))
Now how can I map each character to the tuples I describe ?
Example for a general case :
"t:e:s:t" -> List(('t',2),('e',1),('s',1),(':',3))
Thanks
val source = "ok:ok:k::"
val chars = source.toList
val shorter = chars.distinct.map( c => (c, chars.count(_ == c)))
//> shorter : List[(Char, Int)] = List((o,2), (k,3), (:,4))
Classic groupBy / mapValues use case:
scala> val str = "ok:ok:k::"
str: String = ok:ok:k::
scala> str.groupBy(identity).mapValues(_.size) // identity <=> (x => x)
res0: scala.collection.immutable.Map[Char,Int] = Map(k -> 3, : -> 4, o -> 2)
I like sschaef's solution very much, but I was wondering if anyone could weigh in on how efficient that solution is compared to this one:
scala> val str = "ok:ok:k::"
str: String = ok:ok:k::
scala> str.foldLeft(Map[Char,Int]().withDefaultValue(0))((current, c) => current.updated(c, current(c) + 1))
res29: scala.collection.immutable.Map[Char,Int] = Map(o -> 2, k -> 3, : -> 4)
I think my solution is slower. If we have n total occurrences and m unique values:
My solution: we have the foldLeft over all occurrences, or n. For each of these occurrences we look up once to find the current count and then again to create the updated map. I'm assuming that creating the updated map is constant time.
Total complexity: n * 2m or O(n*m)
sschaef's solution: we have the groupBy, which I'm assuming just appends entries onto a list without checking the map (so for all values this would be a constant-time lookup plus appending to the list), so n. Then mapValues probably iterates over the unique values and grabs the size of each key's list. I'm assuming that getting the size of each entry's list is constant time.
Total complexity: O(n + m)
Does this seem correct or am I mistaken in my assumptions?
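One way to get an empirical answer is a rough timing harness (a sketch only; single runs are skewed by JIT warm-up and GC, so use a proper tool like JMH for real measurements):
def time[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  println(f"$label: ${(System.nanoTime() - start) / 1e6}%.1f ms")
  result
}

val big = scala.util.Random.alphanumeric.take(1000000).mkString

time("groupBy + mapValues")(big.groupBy(identity).mapValues(_.size).toMap)
time("foldLeft") {
  big.foldLeft(Map[Char, Int]().withDefaultValue(0))((m, c) => m.updated(c, m(c) + 1))
}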
I have a Map like:
Map("product1" -> List(Product1ObjectTypes), "product2" -> List(Product2ObjectTypes))
where each ProductXObjectType has a field usage. Based on another field (counter) I have to update all ProductXObjectTypes.
The issue is that this update depends on the previous ProductObjectType, and I can't find a way to get the previous item when iterating over mapValues of this map. So basically, to update the current usage I need: CurrentProduct1ObjectType.counter - PreviousProduct1ObjectType.counter.
Is there any way to do this?
I started it like:
val reportsWithCalculatedUsage =
reportsRefined.flatten.flatten.toList.groupBy(_._2.product).mapValues(f)
but I don't know how to access the previous list item inside mapValues.
I'm not sure if I understand completely, but if you want to update the values inside the lists based on their predecessors, this can generally be done with a fold:
case class Thing(product: String, usage: Int, counter: Int)
val m = Map(
"product1" -> List(Thing("Fnord", 10, 3), Thing("Meep", 0, 5))
//... more mappings
)
//> Map(product1 -> List(Thing(Fnord,10,3), Thing(Meep,0,5)))
m mapValues { list =>
  list.foldLeft(List[Thing]()) {
    case (Nil, head) =>
      List(head)
    case (tail, head) =>
      val previous = tail.head
      val current = head.copy(usage = head.usage + head.counter - previous.counter)
      current :: tail
  }.reverse
}
//> Map(product1 -> List(Thing(Fnord,10,3), Thing(Meep,2,5)))
Note that a regular Map is an unordered collection; you need to use something like TreeMap to have a predictable order of iteration.
Anyway, from what I understand you want to get pairs of consecutive values in a map. Try something like this:
scala> val map = Map(1 -> 2, 2 -> 3, 3 -> 4)
scala> (map, map.tail).zipped.foreach((t1, t2) => println(t1 + " " + t2))
(1,2) (2,3)
(2,3) (3,4)
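The same consecutive pairs can be taken with sliding, which avoids zipping the map with its own tail (a sketch, reusing the map defined above; like zipped, it only makes sense with a predictable iteration order):
map.toList.sliding(2).foreach {
  case List(t1, t2) => println(t1 + " " + t2)
  case _            => // fewer than two entries, nothing to pair
}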
Suppose you have
val docs = List(List("one", "two"), List("two", "three"))
where e.g. List("one", "two") represents a document containing terms "one" and "two", and you want to build a map with the document frequency for every term, i.e. in this case
Map("one" -> 1, "two" -> 2, "three" -> 1)
How would you do that in Scala? (And in an efficient way, assuming a much larger dataset.)
My first Java-like thought is to use a mutable map:
import scala.collection.mutable

val freqs = mutable.Map.empty[String, Int]
for (doc <- docs)
  for (term <- doc)
    freqs(term) = freqs.getOrElse(term, 0) + 1
which works well enough but I'm wondering how you could do that in a more "functional" way, without resorting to a mutable map?
Try this:
scala> docs.flatten.groupBy(identity).mapValues(_.size)
res0: Map[String,Int] = Map(one -> 1, two -> 2, three -> 1)
If you are going to be accessing the counts many times, then you should avoid mapValues since it is "lazy" and, thus, would recompute the size on every access. This version gives you the same result but won't require the recomputations:
docs.flatten.groupBy(identity).map(x => (x._1, x._2.size))
The identity function just means x => x.
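To make the recomputation visible, here is a small demonstration (a sketch; this is the Scala 2.12 behaviour, and in 2.13 the deprecated mapValues delegates to the explicitly lazy .view.mapValues, which behaves the same way):
var sizeCalls = 0
val counts = docs.flatten.groupBy(identity).mapValues { g => sizeCalls += 1; g.size }

counts("two")
counts("two")
println(sizeCalls) // 2: the size was recomputed on the second access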
docs.flatten.foldLeft(new Map.WithDefault(Map[String, Int](), Function.const(0))) {
  (m, x) => m + (x -> (1 + m(x)))
}
What a train wreck!
[Edit]
Ah, that's better!
docs.flatten.foldLeft(Map[String, Int]() withDefaultValue 0) {
  (m, x) => m + (x -> (1 + m(x)))
}
Starting with Scala 2.13, after flattening the list of lists, we can use groupMapReduce, which is a one-pass alternative to groupBy/mapValues:
// val docs = List(List("one", "two"), List("two", "three"))
docs.flatten.groupMapReduce(identity)(_ => 1)(_ + _)
// Map[String,Int] = Map("one" -> 1, "three" -> 1, "two" -> 2)
This:
flattens the List of Lists into a single List
groups list elements (identity) (group part of groupMapReduce)
maps each grouped value occurrence to 1 (_ => 1) (map part of groupMapReduce)
reduces values within a group of values (_ + _) by summing them (reduce part of groupMapReduce).