Cartesian product of two lists - scala

Given a map where a digit is associated to several characters
scala> val conversion = Map("0" -> List("A", "B"), "1" -> List("C", "D"))
conversion: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] =
Map(0 -> List(A, B), 1 -> List(C, D))
I want to generate all possible character sequences based on a sequence of digits. Examples:
"00" -> List("AA", "AB", "BA", "BB")
"01" -> List("AC", "AD", "BC", "BD")
I can do this with for comprehensions
scala> val number = "011"
number: java.lang.String = 011
Create a sequence of possible characters per index
scala> val values = number map { case c => conversion(c.toString) }
values: scala.collection.immutable.IndexedSeq[List[java.lang.String]] =
Vector(List(A, B), List(C, D), List(C, D))
Generate all the possible character sequences
scala> for {
| a <- values(0)
| b <- values(1)
| c <- values(2)
| } yield a+b+c
res13: List[java.lang.String] = List(ACC, ACD, ADC, ADD, BCC, BCD, BDC, BDD)
Here things get ugly and it will only work for sequences of three digits. Is there any way to achieve the same result for any sequence length?

The following suggestion does not use a fixed-length for-comprehension. I don't think that would be a good idea anyway because, as you noticed, you'd be tied to a certain length of your Cartesian product.
scala> def cartesianProduct[T](xss: List[List[T]]): List[List[T]] = xss match {
| case Nil => List(Nil)
| case h :: t => for(xh <- h; xt <- cartesianProduct(t)) yield xh :: xt
| }
cartesianProduct: [T](xss: List[List[T]])List[List[T]]
scala> val conversion = Map('0' -> List("A", "B"), '1' -> List("C", "D"))
conversion: scala.collection.immutable.Map[Char,List[java.lang.String]] = Map(0 -> List(A, B), 1 -> List(C, D))
scala> cartesianProduct("01".map(conversion).toList)
res9: List[List[java.lang.String]] = List(List(A, C), List(A, D), List(B, C), List(B, D))
Why not tail-recursive?
Note that the recursive function above is not tail-recursive. This isn't a problem, because xss will be short unless it contains a lot of singleton lists: the size of the result grows exponentially with the number of non-singleton elements of xss.
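For reference, the same recursion can also be written as a right fold and wired back to the digit problem from the question. This is only a sketch: the object name DigitSequences and the helper sequences are made up for illustration, and the conversion map is copied from the answer above.

```scala
object DigitSequences {
  // Mapping copied from the answer above; digits outside the map will throw.
  val conversion: Map[Char, List[String]] =
    Map('0' -> List("A", "B"), '1' -> List("C", "D"))

  // Cartesian product as a right fold: each choice is prepended
  // to every suffix that has already been built.
  def cartesianProduct[T](xss: List[List[T]]): List[List[T]] =
    xss.foldRight(List(List.empty[T])) { (choices, acc) =>
      for (x <- choices; rest <- acc) yield x :: rest
    }

  // Hypothetical helper: all character sequences for a string of digits.
  def sequences(number: String): List[String] =
    cartesianProduct(number.toList.map(conversion)).map(_.mkString)
}
```

With this, DigitSequences.sequences("011") yields the same eight strings as the hand-written three-generator for-comprehension in the question, and it works for any number of digits.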

I could come up with this:
val conversion = Map('0' -> Seq("A", "B"), '1' -> Seq("C", "D"))
def permut(str: Seq[Char]): Seq[String] = str match {
case Seq() => Seq.empty
case Seq(c) => conversion(c)
case Seq(head, tail @ _*) =>
val t = permut(tail)
conversion(head).flatMap(pre => t.map(pre + _))
}
permut("011")

I just did that as follows and it works
def cross(a:IndexedSeq[Tree], b:IndexedSeq[Tree]) = {
a.map (p => b.map( o => (p,o))).flatten
}
Ignore the Tree type I am dealing with here; it works for arbitrary collections too.
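A generic version of the same idea might look like the following sketch. The object name Cross and the type parameters are assumptions for illustration; the for-comprehension desugars to the same flatMap/map nesting as above.

```scala
object Cross {
  // Pairs every element of `as` with every element of `bs`,
  // keeping the order of `as` as the outer loop.
  def cross[A, B](as: Seq[A], bs: Seq[B]): Seq[(A, B)] =
    for (a <- as; b <- bs) yield (a, b)
}
```

Note that this only pairs two collections; for an arbitrary number of collections a recursive or fold-based product is still needed.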

MapReduce example in Scala

I have this problem in Scala for a homework assignment.
The idea I have had, but have not been able to implement successfully, is:
Iterate through each word; if the word is basketball, take the next word and add it to a map. Reduce by key, and sort from highest to lowest.
Unfortunately, I do not know how to take the next word in a list of words.
For example, I would like to do something like this:
val lines = spark.textFile("basketball_words_only.txt") // process lines in file
// split into individual words
val words = lines.flatMap(line => line.split(" "))
var listBuff = new ListBuffer[String]() // a list Buffer to hold each following word
val it = Iterator(words)
while (it.hasNext) {
listBuff += it.next().next() // <-- this is what I would like to do
}
val follows = listBuff.map(word => (word, 1))
val count = follows.reduceByKey((x, y) => x + y) // another issue as I cannot reduceByKey with a listBuffer
val sort = count.sortBy(_._2,false,1)
val result2 = sort.collect()
for (i <- 0 to result2.length - 1) {
printf("%s follows %d times\n", result2(i)._1, result2(i)._2);
}
Any help would be appreciated
You can get the max count for the first word in all distinct word pairs in a few steps:
1. Strip punctuation and split the content into words, which get lowercased
2. Use sliding(2) to create an array of word pairs
3. Use reduceByKey to count occurrences of distinct word pairs
4. Use reduceByKey again to capture the word pair with the max count for each first word
Sample code as follows:
import org.apache.spark.sql.functions._
import org.apache.spark.mllib.rdd.RDDFunctions._
val wordPairCountRDD = sc.textFile("/path/to/textfile").
flatMap( _.split("""[\s,.;:!?]+""") ).
map( _.toLowerCase ).
sliding(2).
map{ case Array(w1, w2) => ((w1, w2), 1) }.
reduceByKey( _ + _ )
val wordPairMaxRDD = wordPairCountRDD.
map{ case ((w1, w2), c) => (w1, (w2, c)) }.
reduceByKey( (acc, x) =>
if (x._2 > acc._2) (x._1, x._2) else acc
).
map{ case (w1, (w2, c)) => ((w1, w2), c) }
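To try the logic without a cluster, the same steps can be sketched on plain Scala collections. This is not Spark code: groupBy stands in for reduceByKey, and the names WordPairs, pairCounts and mostFrequentFollower are invented for this illustration.

```scala
object WordPairs {
  // Split on whitespace and punctuation, lowercase, pair up neighbours, count pairs.
  def pairCounts(text: String): Map[(String, String), Int] =
    text.split("""[\s,.;:!?]+""").filter(_.nonEmpty).map(_.toLowerCase).toList
      .sliding(2)
      .collect { case List(w1, w2) => (w1, w2) } // skips a lone single-word window
      .toList
      .groupBy(identity)
      .map { case (pair, occurrences) => pair -> occurrences.size }

  // For each first word, keep the follower with the highest count.
  def mostFrequentFollower(
      counts: Map[(String, String), Int]): Map[String, (String, Int)] =
    counts.toList
      .map { case ((w1, w2), c) => (w1, (w2, c)) }
      .groupBy(_._1)
      .map { case (w1, xs) => w1 -> xs.map(_._2).maxBy(_._2) }
}
```

On ties, maxBy keeps whichever candidate it sees first, so the winner among equally frequent followers is not specified here.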
[UPDATE]
If you only need the word pair counts to be sorted (in descending order) per your revised requirement, you can skip step 4 and use sortBy on wordPairCountRDD:
wordPairCountRDD.
sortBy( z => (z._2, z._1._1, z._1._2), ascending = false )
This is from https://spark.apache.org/examples.html:
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
As you can see it counts the occurrence of individual words because the key-value pairs are of the form (word, 1). Which part do you need to change to count combinations of words?
This might help you: http://daily-scala.blogspot.com/2009/11/iteratorsliding.html
Well, my text uses "b" instead of "basketball" and "a", "c" for other words.
scala> val r = scala.util.Random
scala> val s = (1 to 20).map (i => List("a", "b", "c")(r.nextInt (3))).mkString (" ")
s: String = c a c b a b a a b c a b b c c a b b c b
The result is obtained by split, sliding, filter, map, groupBy, map and sortBy:
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").map (_(1)).toList.groupBy (_(0)).map { case (c: Char, l: List[String]) => (c, l.size)}.toList.sortBy (-_._2)
counts: List[(Char, Int)] = List((c,3), (b,2), (a,2))
In small steps, sliding:
scala> val counts = s.split (" ").sliding (2).toList
counts: List[Array[String]] = List(Array(c, a), Array(a, c), Array(c, b), Array(b, a), Array(a, b), Array(b, a), Array(a, a), Array(a, b), Array(b, c), Array(c, a), Array(a, b), Array(b, b), Array(b, c), Array(c, c), Array(c, a), Array(a, b), Array(b, b), Array(b, c), Array(c, b))
filter:
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").toList
counts: List[Array[String]] = List(Array(b, a), Array(b, a), Array(b, c), Array(b, b), Array(b, c), Array(b, b), Array(b, c))
map (_(1)) (takes the second element of each Array)
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").map (_(1)).toList
counts: List[String] = List(a, a, c, b, c, b, c)
groupBy (_(0))
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").map (_(1)).toList.groupBy (_(0))
counts: scala.collection.immutable.Map[Char,List[String]] = Map(b -> List(b, b), a -> List(a, a), c -> List(c, c, c))
map each group to the size of its List:
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").map (_(1)).toList.groupBy (_(0)).map { case (c: Char, l: List[String]) => (c, l.size)}
counts: scala.collection.immutable.Map[Char,Int] = Map(b -> 2, a -> 2, c -> 3)
Finally sort descending:
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").map (_(1)).toList.groupBy (_(0)).map { case (c: Char, l: List[String]) => (c, l.size)}.toList.sortBy (-_._2)
counts: List[(Char, Int)] = List((c,3), (b,2), (a,2))
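The small steps above can be collected into one reusable function. The sketch below is an assumption-laden variant rather than the exact pipeline: it keys the groups by the whole following word instead of its first character, breaks ties alphabetically, and the names Followers and followerCounts are made up.

```scala
object Followers {
  // Counts the words that immediately follow `target`, most frequent first.
  def followerCounts(text: String, target: String): List[(String, Int)] =
    text.split("\\s+").toList
      .sliding(2)
      .collect { case List(`target`, next) => next } // keep only pairs starting with target
      .toList
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toList
      .sortBy { case (word, n) => (-n, word) } // descending count, then alphabetical
}
```

For the sample string s above, Followers.followerCounts(s, "b") gives the same counts as the REPL session, with the two-way tie listed as a before b.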

Functional way to split a map of lists into a list of maps

I'm a bit stuck on this problem. I feel like I'm "thinking backwards" and it's confusing me a bit.
I have a Map[Long, Seq[String]] which I would like to convert into a Seq[Map[Long, String]]. Going the other direction is rather simple, as we can just group elements together, however, I'm not sure how to split this apart in a functional manner.
So,
val x = Map(1 -> List("a","b","c"), 2 -> List("d", "e"), 3 -> List("f"))
should become
List(Map(1 -> "a", 2 -> "d", 3 -> "f"), Map(1 -> "b", 2 -> "e"), Map(1 -> "c"))
I was thinking along the lines of using x.partition and then recursing on each resulting tuple, but I'm not really sure what I'd partition on :/
I'm writing in scala, but any functional answer is welcome (language agnostic).
In Haskell:
> import qualified Data.Map as M
> import Data.List
> m = M.fromList [(1,["a","b","c"]), (2,["d","e"]), (3,["f"])]
> map M.fromList . transpose . map (\(i,xs) -> map ((,) i) xs) . M.toList $ m
[fromList [(1,"a"),(2,"d"),(3,"f")],fromList [(1,"b"),(2,"e")],fromList [(1,"c")]]
M.toList and M.fromList convert a map to a list of association pairs, and back.
map ((,) i) xs is the same as [(i,x) | x<-xs], adding (i,...) to each element.
transpose exchanges the "rows" and "columns" in a list of lists, similarly to a matrix transposition.
Borrowing a neat transpose method from this SO answer, here's another way to do it:
def transpose[A](xs: List[List[A]]): List[List[A]] = xs.filter(_.nonEmpty) match {
case Nil => Nil
case ys: List[List[A]] => ys.map{ _.head }::transpose(ys.map{ _.tail })
}
transpose[(Int, String)](
x.toList.map{ case (k, v) => v.map( (k, _) ) }
).map{ _.toMap }
// Res1: List[scala.collection.immutable.Map[Int,String]] = List(
// Map(1 -> a, 2 -> d, 3 -> f), Map(1 -> b, 2 -> e), Map(1 -> c)
// )
In Scala:
val result = x.toList
.flatMap { case (k, vs) => vs.zipWithIndex.map { case (v, i) => (i, k, v) } } // flatten and add indices to inner lists
.groupBy(_._1) // group by index
.toList.sortBy(_._1).map(_._2) // can be replaced with .values if order isn't important
.map(_.map { case (_, k, v) => (k, v) }.toMap) // remove indices
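Another way to look at it is to build the i-th map from the i-th element of every list that is long enough. The following is a minimal sketch under that reading; SplitMap and split are hypothetical names, and indexing into a List is linear, so an IndexedSeq would be preferable for long value lists.

```scala
object SplitMap {
  // Row i of the result holds, for every key, the i-th value of its list (if any).
  def split[K, V](m: Map[K, Seq[V]]): List[Map[K, V]] = {
    val depth = if (m.isEmpty) 0 else m.values.map(_.size).max
    List.tabulate(depth) { i =>
      m.collect { case (k, vs) if i < vs.size => k -> vs(i) }
    }
  }
}
```

Because the rows are generated by index, the order of the resulting list is deterministic without an explicit sort.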
Here is my answer in OCaml (using just Standard Library):
module M = Map.Make(struct type t = int let compare = compare end)
let of_bindings b =
List.fold_right (fun (k, v) m -> M.add k v m) b M.empty
let splitmap m =
let split1 (k, v) (b1, b2) =
match v with
| [] -> (b1, b2)
| [x] -> ((k, x) :: b1, b2)
| h :: t -> ((k, h) :: b1, (k, t) :: b2)
in
let rec loop sofar m =
if M.cardinal m = 0 then
List.rev sofar
else
let (b1, b2) =
List.fold_right split1 (M.bindings m) ([], [])
in
let (ms, m') = (of_bindings b1, of_bindings b2) in
loop (ms :: sofar) m'
in
loop [] m
It works for me:
# let m = of_bindings [(1, ["a"; "b"; "c"]); (2, ["d"; "e"]); (3, ["f"])];;
val m : string list M.t = <abstr>
# let ms = splitmap m;;
val ms : string M.t list = [<abstr>; <abstr>; <abstr>]
# List.map M.bindings ms;;
- : (M.key * string) list list =
[[(1, "a"); (2, "d"); (3, "f")]; [(1, "b"); (2, "e")]; [(1, "c")]]

Does Scala have a statement equivalent to ML's "as" construct?

In ML, one can assign names for each element of a matched pattern:
fun findPair n nil = NONE
| findPair n ((head as (n1, _))::rest) =
if n = n1 then (SOME head) else (findPair n rest)
In this code, I defined an alias for the first pair of the list and matched the contents of the pair. Is there an equivalent construct in Scala?
You can do variable binding with the @ symbol, e.g.:
scala> val wholeList @ List(x, _*) = List(1,2,3)
wholeList: List[Int] = List(1, 2, 3)
x: Int = 1
I'm sure you'll get a more complete answer later as I'm not sure how to write it recursively like your example, but maybe this variation would work for you:
scala> val pairs = List((1, "a"), (2, "b"), (3, "c"))
pairs: List[(Int, String)] = List((1,a), (2,b), (3,c))
scala> val n = 2
n: Int = 2
scala> pairs find {e => e._1 == n}
res0: Option[(Int, String)] = Some((2,b))
OK, next attempt at direct translation. How about this?
scala> def findPair[A, B](n: A, p: List[Tuple2[A, B]]): Option[Tuple2[A, B]] = p match {
| case Nil => None
| case head::rest if head._1 == n => Some(head)
| case _::rest => findPair(n, rest)
| }
findPair: [A, B](n: A, p: List[(A, B)])Option[(A, B)]
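For a translation that stays closer to the ML original, Scala's pattern binder @ can alias the head pair while its components are matched in the same pattern. This is a sketch; the wrapping object FindPair is only there to make the snippet self-contained.

```scala
object FindPair {
  @annotation.tailrec
  def findPair[A, B](n: A, pairs: List[(A, B)]): Option[(A, B)] = pairs match {
    case Nil => None
    // `head` names the whole pair, `n1` names its first component.
    case (head @ (n1, _)) :: rest =>
      if (n == n1) Some(head) else findPair(n, rest)
  }
}
```

Here head @ (n1, _) plays the role of head as (n1, _) in the ML code.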

How to generate the power set of a set in Scala

I have a Set of items of some type and want to generate its power set.
I searched the web and couldn't find any Scala code that addresses this specific task.
This is what I came up with. It allows you to restrict the cardinality of the sets produced by the length parameter.
def power[T](set: Set[T], length: Int) = {
var res = Set[Set[T]]()
res ++= set.map(Set(_))
for (i <- 1 until length)
res = res.map(x => set.map(x + _)).flatten
res
}
This will not include the empty set. To accomplish this you would have to change the last line of the method simply to res + Set()
Any suggestions how this can be accomplished in a more functional style?
Looks like no-one knew about it back in July, but there's a built-in method: subsets.
scala> Set(1,2,3).subsets foreach println
Set()
Set(1)
Set(2)
Set(3)
Set(1, 2)
Set(1, 3)
Set(2, 3)
Set(1, 2, 3)
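The question also asked about restricting the cardinality of the produced sets. The standard library has an overload subsets(len) that returns only the subsets of exactly that size, so a bounded power set could be sketched as follows. The names Subsets and upTo are invented here, and the result includes the empty set.

```scala
object Subsets {
  // All subsets of `set` with at most `maxSize` elements.
  def upTo[T](set: Set[T], maxSize: Int): Set[Set[T]] =
    (0 to math.min(maxSize, set.size))
      .flatMap(k => set.subsets(k)) // subsets(k): subsets with exactly k elements
      .toSet
}
```

Since subsets returns an Iterator, the unbounded call stays lazy; the sketch above forces everything into a Set, which is only sensible for small inputs.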
Notice that if you have a set S and another set T where T = S ∪ {x} (i.e. T is S with one element added) then the powerset of T - P(T) - can be expressed in terms of P(S) and x as follows:
P(T) = P(S) ∪ { p ∪ {x} | p ∈ P(S) }
That is, you can define the powerset recursively (notice how this gives you the size of the powerset for free - i.e. adding 1-element doubles the size of the powerset). So, you can do this tail-recursively in scala as follows:
scala> def power[A](t: Set[A]): Set[Set[A]] = {
| @annotation.tailrec
| def pwr(t: Set[A], ps: Set[Set[A]]): Set[Set[A]] =
| if (t.isEmpty) ps
| else pwr(t.tail, ps ++ (ps map (_ + t.head)))
|
| pwr(t, Set(Set.empty[A])) //Powerset of ∅ is {∅}
| }
power: [A](t: Set[A])Set[Set[A]]
Then:
scala> power(Set(1, 2, 3))
res2: Set[Set[Int]] = Set(Set(1, 2, 3), Set(2, 3), Set(), Set(3), Set(2), Set(1), Set(1, 3), Set(1, 2))
It actually looks much nicer doing the same with a List (i.e. a recursive ADT):
scala> def power[A](s: List[A]): List[List[A]] = {
| @annotation.tailrec
| def pwr(s: List[A], acc: List[List[A]]): List[List[A]] = s match {
| case Nil => acc
| case a :: as => pwr(as, acc ::: (acc map (a :: _)))
| }
| pwr(s, Nil :: Nil)
| }
power: [A](s: List[A])List[List[A]]
Here's one of the more interesting ways to write it:
import scalaz._, Scalaz._
def powerSet[A](xs: List[A]) = xs filterM (_ => true :: false :: Nil)
Which works as expected:
scala> powerSet(List(1, 2, 3)) foreach println
List(1, 2, 3)
List(1, 2)
List(1, 3)
List(1)
List(2, 3)
List(2)
List(3)
List()
See for example this discussion thread for an explanation of how it works.
(And as debilski notes in the comments, ListW also pimps powerset onto List, but that's no fun.)
Use the built-in combinations function:
val xs = Seq(1,2,3)
(0 to xs.size) flatMap xs.combinations
// Vector(List(), List(1), List(2), List(3), List(1, 2), List(1, 3), List(2, 3),
// List(1, 2, 3))
Note, I cheated and used a Seq, because for reasons unknown, combinations is defined on SeqLike. So with a set, you need to convert to/from a Seq:
val xs = Set(1,2,3)
(0 to xs.size).flatMap(xs.toSeq.combinations).map(_.toSet).toSet
//Set(Set(1, 2, 3), Set(2, 3), Set(), Set(3), Set(2), Set(1), Set(1, 3),
//Set(1, 2))
Can be as simple as:
def powerSet[A](xs: Seq[A]): Seq[Seq[A]] =
xs.foldLeft(Seq(Seq[A]())) {(sets, set) => sets ++ sets.map(_ :+ set)}
Recursive implementation:
def powerSet[A](xs: Seq[A]): Seq[Seq[A]] = {
def go(xsRemaining: Seq[A], sets: Seq[Seq[A]]): Seq[Seq[A]] = xsRemaining match {
case Seq() => sets
case y +: ys => go(ys, sets ++ sets.map(_ :+ y)) // +: matches any Seq, not only List
}
go(xs, Seq[Seq[A]](Seq[A]()))
}
All the other answers seemed a bit complicated, here is a simple function:
def powerSet (l:List[_]) : List[List[Any]] =
l match {
case Nil => List(List())
case x::xs =>
val a = powerSet(xs)
a.map(n => n:::List(x)):::a
}
so
powerSet(List('a','b','c'))
will produce the following result
res0: List[List[Any]] = List(List(c, b, a), List(b, a), List(c, a), List(a), List(c, b), List(b), List(c), List())
Here's another (lazy) version... since we're collecting ways of computing the power set, I thought I'd add it:
def powerset[A](s: Seq[A]) =
Iterator.range(0, 1 << s.length).map(i =>
Iterator.range(0, s.length).withFilter(j =>
(i >> j) % 2 == 1
).map(s)
)
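The idea behind the lazy version is that every integer from 0 to 2^n - 1 is a bitmask: bit j set means "include element j". A sketch that materialises each subset as a List, so the result is easier to inspect, might look like this. BitmaskPowerSet is a made-up name, and the length guard is an added assumption to avoid Int overflow in the shift.

```scala
object BitmaskPowerSet {
  // Lazily enumerates subsets; mask bit j decides whether s(j) is included.
  def powerset[A](s: IndexedSeq[A]): Iterator[List[A]] = {
    require(s.length < 31, "mask must fit in a non-negative Int")
    Iterator.range(0, 1 << s.length).map { mask =>
      s.indices.filter(j => ((mask >> j) & 1) == 1).map(s).toList
    }
  }
}
```

The outer Iterator is still lazy, so taking only the first few subsets does not compute the rest.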
Here's a simple, recursive solution using a helper function:
def concatElemToList[A](a: A, list: List[A]): List[Any] = (a,list) match {
case (x, Nil) => List(List(x))
case (x, ((h:List[_]) :: t)) => (x :: h) :: concatElemToList(x, t)
case (x, (h::t)) => List(x, h) :: concatElemToList(x, t)
}
def powerSetRec[A] (a: List[A]): List[Any] = a match {
case Nil => List()
case (h::t) => powerSetRec(t) ++ concatElemToList(h, powerSetRec (t))
}
so the call of
powerSetRec(List("a", "b", "c"))
will give the result
List(List(c), List(b, c), List(b), List(a, c), List(a, b, c), List(a, b), List(a))
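Note that the result above does not contain the empty set and that the element type is lost to Any. A typed variant of the same recursion, given here only as a sketch with the invented name TypedPowerSet, keeps List[List[A]] throughout and includes the empty set:

```scala
object TypedPowerSet {
  // Power set of the tail, plus the same subsets with the head prepended.
  def powerSet[A](xs: List[A]): List[List[A]] = xs match {
    case Nil => List(Nil) // the power set of the empty list is { {} }
    case h :: t =>
      val rest = powerSet(t)
      rest ++ rest.map(h :: _)
  }
}
```

Computing powerSet(t) once and reusing it also avoids the duplicated recursive call of the version above.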

Reverse / transpose a one-to-many map in Scala

What is the best way to turn a Map[A, Set[B]] into a Map[B, Set[A]]?
For example, how do I turn a
Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
into a
Map("a" -> Set(1),
"b" -> Set(1, 2),
"c" -> Set(2, 3),
"d" -> Set(3))
(I'm using immutable collections only here. And my real problem has nothing to do with strings or integers. :)
with help from aioobe and Moritz:
def reverse[A, B](m: Map[A, Set[B]]) =
m.values.toSet.flatten.map(v => (v, m.keys.filter(m(_)(v)))).toMap
It's a bit more readable if you explicitly call contains:
def reverse[A, B](m: Map[A, Set[B]]) =
m.values.toSet.flatten.map(v => (v, m.keys.filter(m(_).contains(v)))).toMap
Best I've come up with so far is
val intToStrs = Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
def mappingFor(key: String) =
intToStrs.keys.filter(intToStrs(_) contains key).toSet
val newKeys = intToStrs.values.flatten
val inverseMap = newKeys.map(newKey => (newKey -> mappingFor(newKey))).toMap
Or another one using folds:
def reverse2[A,B](m:Map[A,Set[B]])=
m.foldLeft(Map[B,Set[A]]()){case (r,(k,s)) =>
s.foldLeft(r){case (r,e)=>
r + (e -> (r.getOrElse(e, Set()) + k))
}
}
Here's a one statement solution
originalMap
.map{case (k, v)=>v.map{v2=>(v2,k)}}
.flatten
.groupBy{_._1}
.transform {(k, v)=>v.unzip._2.toSet}
This bit rather neatly (*) produces the tuples needed to construct the reverse map
Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
.map{case (k, v)=>v.map{v2=>(v2,k)}}.flatten
produces
List((a,1), (b,1), (b,2), (c,2), (c,3), (d,3))
Converting it directly to a map overwrites the values corresponding to duplicate keys though
Adding .groupBy{_._1} gets this
Map(c -> List((c,2), (c,3)),
a -> List((a,1)),
d -> List((d,3)),
b -> List((b,1), (b,2)))
which is closer. The remaining step is to turn those lists into Sets of the second half of the pairs:
.transform {(k, v)=>v.unzip._2.toSet}
gives
Map(c -> Set(2, 3), a -> Set(1), d -> Set(3), b -> Set(1, 2))
QED :)
(*) YMMV
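Packaged as a generic function, the walkthrough above might look like the sketch below. The name ReverseMap is made up, and toSeq is used up front so that the intermediate pairs are never collapsed by map or set semantics.

```scala
object ReverseMap {
  def reverse[A, B](m: Map[A, Set[B]]): Map[B, Set[A]] =
    m.toSeq
      .flatMap { case (k, vs) => vs.map(v => v -> k) } // one (value, key) pair per edge
      .groupBy(_._1)                                    // group the pairs by value
      .map { case (v, pairs) => v -> pairs.map(_._2).toSet }
}
```

Keys whose set is empty contribute no pairs, so they simply disappear from the reversed map.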
A simple, but maybe not super-elegant solution:
def reverse[A,B](m:Map[A,Set[B]])={
var r = Map[B,Set[A]]()
m.keySet foreach { k=>
m(k) foreach { e =>
r = r + (e -> (r.getOrElse(e, Set()) + k))
}
}
r
}
The easiest way I can think of is:
// unfold values to tuples (v,k)
// for all values v in the Set referenced by key k
def vk = for {
(k,vs) <- m.iterator
v <- vs.iterator
} yield (v -> k)
// fold iterator back into a map
vk.foldLeft(Map[String,Set[Int]]()) {
// older symbolic syntax (deprecated since Scala 2.13): (Map[String,Set[Int]]() /: vk) {
case (m,(k,v)) if m contains k =>
// Map already contains a Set, so just add the value
m updated (k, m(k) + v)
case (m,(k,v)) =>
// key not in the map - wrap value in a Set and return updated map
m updated (k, Set(v))
}