Cartesian Product and Map Combined in Scala

This is a followup to: Expand a Set of Sets of Strings into Cartesian Product in Scala
The idea is you want to take:
val sets = Set(Set("a","b","c"), Set("1","2"), Set("S","T"))
and get back:
Set("a&1&S", "a&1&T", "a&2&S", ..., "c&2&T")
A general solution is:
def combine[A](f:(A, A) => A)(xs:Iterable[Iterable[A]]) =
xs.reduceLeft { (x, y) => x.view.flatMap {a => y.map(f(a, _)) } }
used as follows:
val expanded = combine{(x:String, y:String) => x + "&" + y}(sets).toSet
Theoretically, there should be a way to take input of type Set[Set[A]] and get back a Set[B]. That is, to convert the type while combining the elements.
An example usage would be to take in sets of strings (as above) and output the lengths of their concatenations. The f function in combine would be something of the form:
(a:Int, b:String) => a + b.length
I was not able to come up with an implementation. Does anyone have an answer?

If you really want your combiner function to do the mapping, you can use a fold but as Craig pointed out you'll have to provide a seed value:
def combine[A, B](f: B => A => B, zero: B)(xs: Iterable[Iterable[A]]) =
xs.foldLeft(Iterable(zero)) {
(x, y) => x.view flatMap { y map f(_) }
}
The fact that you need such a seed value follows from the combiner/mapper function type (B, A) => B (or, as a curried function, B => A => B). Clearly, to map the first A you encounter, you're going to need to supply a B.
You can make it somewhat simpler for callers by using a Zero type class:
trait Zero[T] {
def zero: T
}
object Zero {
implicit object IntHasZero extends Zero[Int] {
val zero = 0
}
// ... etc ...
}
Then the combine method can be defined as:
def combine[A, B : Zero](f: B => A => B)(xs: Iterable[Iterable[A]]) =
xs.foldLeft(Iterable(implicitly[Zero[B]].zero)) {
(x, y) => x.view flatMap { y map f(_) }
}
Usage:
combine((b: Int) => (a: String) => b + a.length)(sets)
Scalaz provides a Zero type class, along with a lot of other goodies for functional programming.
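For anyone who wants to try the seeded fold without the type class, here is a minimal self-contained sketch. The name combineWith and the uncurried (B, A) => B signature are my own choices for illustration, not part of the answer above, and it uses a for comprehension rather than the view-based flatMap.

```scala
// Hypothetical helper: cartesian product that converts A to B while combining.
// The caller supplies the seed value `zero` explicitly.
def combineWith[A, B](zero: B)(f: (B, A) => B)(xs: Iterable[Iterable[A]]): Iterable[B] =
  xs.foldLeft(Iterable(zero)) { (acc, ys) =>
    // extend every partial result with every element of the next collection
    for (b <- acc; a <- ys) yield f(b, a)
  }

val sets = Set(Set("a", "b", "c"), Set("1", "2"), Set("S", "T"))

// Sum of string lengths for each combination, collected into a set.
val lengths = combineWith(0)((n: Int, s: String) => n + s.length)(sets).toSet
```

Every string in the sample input has length 1, so each combination sums to 3 and the resulting set collapses to a single value.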

The problem that you're running into is that reduce(Left|Right) takes a function (A, A) => A, which doesn't allow you to change the type. You want something more like foldLeft, which takes (B, A) => B, allowing you to accumulate an output of a different type. Folds need a seed value, though, which can't be an empty collection here. You'd need to take xs apart into a head and tail, map the head iterable to be Iterable[B], and then call foldLeft with the mapped head, the tail, and some function (B, A) => B. That seems like more trouble than it's worth, though, so I'd just do all the mapping up front.
def combine[A, B](f: (B, B) => B)(g: (A) => B)(xs:Iterable[Iterable[A]]) =
xs.map(_.map(g)).reduceLeft { (x, y) => x.view.flatMap {a => y.map(f(a, _)) } }
val sets = Set(Set(1, 2, 3), Set(3, 4), Set(5, 6, 7))
val expanded = combine{(x: String, y: String) => x + "&" + y}{(i: Int) => i.toString}(sets).toSet
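For comparison, the head-and-tail approach that this answer sets aside could look roughly like the sketch below. The name combineFrom and the extra first parameter (which maps elements of the first collection to B) are assumptions made for illustration. The toSeq calls are there so that duplicate results are not silently merged when the inputs are Sets.

```scala
// Hypothetical helper: seeds the fold from the first inner collection
// instead of asking the caller for a zero value.
def combineFrom[A, B](first: A => B)(f: (B, A) => B)(xs: Iterable[Iterable[A]]): Iterable[B] =
  if (xs.isEmpty) Iterable.empty[B]
  else
    xs.tail.foldLeft(xs.head.toSeq.map(first)) { (acc, ys) =>
      // combine each partial result with each element of the next collection
      for (b <- acc; a <- ys.toSeq) yield f(b, a)
    }

val sets = Set(Set("a", "b", "c"), Set("1", "2"), Set("S", "T"))

// One Int per combination: the total length of the chosen strings.
val lengths = combineFrom((s: String) => s.length)((n: Int, s: String) => n + s.length)(sets)
```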

Related

Trying to understand scanLeft on trees in Scala

I'm taking a Coursera course and I'm trying to understand the logic of scanLeft on trees.
We have the following code.
First, a function that takes a tree without intermediate values (values only in the leaves) and returns a tree with intermediate values (values in the nodes as well):
def upsweep[A](t: Tree[A], f: (A,A) => A): TreeRes[A] = t match {
case Leaf(v) => LeafRes(v)
case Node(l, r) => {
val (tL, tR) = parallel(upsweep(l, f), upsweep(r, f))
NodeRes(tL, f(tL.res, tR.res), tR)
}
}
And the following code, which takes a tree with intermediate values (values in the nodes) and returns a tree without intermediate values (a0 is the reduction of all elements to the left of the tree t):
def downsweep[A](t: TreeRes[A], a0: A, f : (A,A) => A): Tree[A] = t match {
case LeafRes(a) => Leaf(f(a0, a))
case NodeRes(l, _, r) => {
val (tL, tR) = parallel(downsweep[A](l, a0, f),
downsweep[A](r, f(a0, l.res), f))
Node(tL, tR) }
}
And finally the scanLeft code:
def scanLeft[A](t: Tree[A], a0: A, f: (A,A) => A): Tree[A] = {
val tRes = upsweep(t, f)
val scan1 = downsweep(tRes, a0, f)
prepend(a0, scan1)
}
My question is: why is it necessary to call upsweep before downsweep?
With upsweep we generate the intermediate values, and later with downsweep we "remove" them (we don't seem to need them).
Thanks in advance.
Look more closely at this part:
case NodeRes(l, _, r) => {
val (tL, tR) = parallel(downsweep[A](l, a0, f),
downsweep[A](r, f(a0, l.res), f))
What is l.res, and why is it necessary? It is created during upsweep: it holds the reduction of the entire left subtree, which is exactly what downsweep needs as the starting value for the right subtree. Without upsweep you would have to recompute that reduction at every node. I recommend drawing, step by step on a piece of paper, what this algorithm does with a simple function like (_ + _). Working through an example by hand is a good technique whenever something is unclear.
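If you'd rather trace it in code than on paper, here is a small sequential sketch. The Tree and TreeRes definitions are my reconstruction (the question doesn't show them), parallel(...) is replaced by a plain tuple so the example runs without the course library, and prepend is left out.

```scala
// Assumed data types; the question does not show the course's definitions.
sealed trait Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

sealed trait TreeRes[A] { def res: A }
case class LeafRes[A](res: A) extends TreeRes[A]
case class NodeRes[A](l: TreeRes[A], res: A, r: TreeRes[A]) extends TreeRes[A]

def upsweep[A](t: Tree[A], f: (A, A) => A): TreeRes[A] = t match {
  case Leaf(v) => LeafRes(v)
  case Node(l, r) =>
    val (tL, tR) = (upsweep(l, f), upsweep(r, f)) // sequential stand-in for parallel
    NodeRes(tL, f(tL.res, tR.res), tR)
}

def downsweep[A](t: TreeRes[A], a0: A, f: (A, A) => A): Tree[A] = t match {
  case LeafRes(a) => Leaf(f(a0, a))
  case NodeRes(l, _, r) =>
    // l.res is the precomputed reduction of the whole left subtree
    Node(downsweep(l, a0, f), downsweep(r, f(a0, l.res), f))
}

val plus = (x: Int, y: Int) => x + y
val t: Tree[Int] = Node(Node(Leaf(1), Leaf(3)), Node(Leaf(8), Leaf(50)))

val up = upsweep(t, plus)          // root res is 62, left child res is 4
val down = downsweep(up, 100, plus)
// down holds the prefix sums 101, 104, 112, 162 in its leaves
```

The right subtree starts from f(100, 4) = 104, and the 4 comes straight from the value stored by upsweep. That stored value is the only reason both halves of downsweep can be started independently.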

Scala: Elegant solution to iterate over (List, List)

I'm trying to come up with an "elegant" solution to iterate over two lists (pairs of values), and perform some tests on the resulting values.
Any ideas? Here's what I have so far, but I get "value filter is not a member of (List[Int], List[Int])", which surprises me; I thought this would work. And I feel like there must be a much cleaner way to express this in Scala.
val accounts = random(count = 100, minimum = 1, maximum = GPDataTypes.integer._2)
val ids = random(count = 100, minimum = 1, maximum = GPDataTypes.integer._2)
for ((id, accountId) <- (ids, accounts)) {
val g = new GPGlimple(Some(id), Some(timestamp), accountId, false, false, 2)
println(g)
g.accountId mustEqual accountId
g.id mustEqual id
g.created.get must beLessThan(System.currentTimeMillis)
g.layers must beNone
g.version must be equalTo 2
}
The simplest solution for this is zip:
(ids zip accounts)
The documentation for zip says:
Returns a list formed from this list and another iterable collection by combining corresponding elements in pairs.
In other words, zip will return a list of tuples.
The zipped method could also work here:
(ids, accounts).zipped
You can find the zipped source for 2-tuples here. Note that this is made available through an enrichment of (T, U) where T is implicitly viewable as a TraversableLike and U is implicitly viewable as an IterableLike. That method returns a ZippedTraversable2, which is a minimal interface that encapsulates this sort of zipped return, and behaves more efficiently for large sequences by inhibiting the creation of intermediary collections. These are generally more performant because they use iterators internally, as can be seen in the source.
Note that the returns here are of different types, which could affect downstream behavior. One important difference is that the normal combinator methods on ZippedTraversable2 are slightly different than those on a Traversable of tuples. The methods on ZippedTraversable2 generally expect a function of 2 arguments, while those on a Traversable of tuples will expect a function with a single argument that is a tuple. For example, you can check this in the REPL for the foreach method:
val s1 = List(1, 2, 3)
val s2 = List('a', 'b', 'c')
(s1 -> s2).zipped.foreach _
// ((Int, Char) => Any) => Unit = <function1>
(s1 zip s2).foreach _
// (((Int, Char)) => Any) => Unit = <function1>
//Notice the extra parens here, signifying a method with a tuple argument
This difference means that you sometimes have to use a different syntax when using zip and zipped:
(s1 zip s2).map { x => x._1 + x._2 }
(s1, s2).zipped.map { x => x._1 + x._2 } // This won't work! The method doesn't take a tuple argument
// conversely
(s1, s2).zipped.map { (x, y) => x + y }
(s1 zip s2).map { (x, y) => x + y } // This won't work! The method doesn't take 2 arguments
//Added note: methods with 2 arguments can often use the more concise underscore notation:
(s1, s2).zipped.map { _ + _ }
Note that if you use the case notation, you're covered either way:
//case works for both syntaxes
(s1, s2).zipped.map { case (x, y) => x + y }
(s1 zip s2).map { case (x, y) => x + y }
This works since the compiler understands this notation for methods with either two arguments, or a single tuple argument, as explained in section 8.5 of the spec:
val f: (Int, Int) => Int = { case (a, b) => a + b }
val g: ((Int, Int)) => Int = { case (a, b) => a + b }
Use zip:
for ((id, accountId) <- ids.zip(accounts)) {
// ...
}
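A tiny standalone illustration of the zip version, with made-up values standing in for the random ids and accounts from the question:

```scala
// Illustrative data only; the question generates these lists randomly.
val ids = List(1, 2, 3)
val accounts = List(10, 20, 30, 40)

// zip pairs elements by position and stops at the shorter list.
val pairs = ids.zip(accounts)

// The tuple pattern in the generator destructures each pair.
val sums = for ((id, accountId) <- ids.zip(accounts)) yield id + accountId
```

Note that the extra element in accounts is dropped without any error, so if the two lists are supposed to have the same length it may be worth checking that separately.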

How is a match word omitted in Scala?

In Scala, you can do
list.filter { item =>
item match {
case Some(foo) => foo.bar > 0
}
}
But you can also do the quicker way by omitting match:
list.filter {
case Some(foo) => foo.bar > 0
}
How is this supported in Scala? Is this new in 2.9? I have been looking for it, and I can't figure out what makes this possible. Is it just part of the Scala compiler?
Edit: parts of this answer are wrong; please refer to huynhjl's answer.
If you omit the match, you signal the compiler that you are defining a partial function. A partial function is a function that is not defined for every input value. For instance, your filter function is only defined for values of type Some[A] (for your custom type A).
PartialFunctions throw a MatchError when you try to apply them where they are not defined. Therefore, when you pass a PartialFunction where a regular Function is expected, you should make sure that your partial function will never be called with an unhandled argument. Such a mechanism is very useful e.g. for unpacking tuples in a collection:
val tupleSeq: Seq[(Int, Int)] = // ...
val sums = tupleSeq.map { case (i1, i2) => i1 + i2 }
APIs which ask for a partial function, like the collect filter-like operation on collections, usually call isDefinedAt before applying the partial function. There, it is safe (and often wanted) to have a partial function that is not defined for every input value.
So you see that although the syntax is close to that of a match, it is actually quite a different thing we're dealing with.
The language specification addresses that in section 8.5. The relevant portions:
An anonymous function can be defined by a sequence of cases
{ case p1 => b1 ... case pn => bn }
If the expected type is scala.Functionk[S1, ..., Sk, R] , the expression is taken to
be equivalent to the anonymous function:
(x1 : S1, ..., xk : Sk) => (x1, ..., xk) match {
case p1 => b1 ... case pn => bn
}
If the expected type is scala.PartialFunction[S, R], the expression is taken to
be equivalent to the following instance creation expression:
new scala.PartialFunction[S, T ] {
def apply(x: S): T = x match {
case p1 => b1 ... case pn => bn
}
def isDefinedAt(x: S): Boolean = {
case p1 => true ... case pn => true
case _ => false
}
}
So typing the expression as PartialFunction or a Function influences how the expression is compiled.
Also trait PartialFunction [-A, +B] extends (A) ⇒ B so a partial function PartialFunction[A,B] is also a Function[A,B].
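To see the effect of the expected type, here is a short sketch that gives the same case block two different types. The value names are mine; only standard library calls are used.

```scala
import scala.util.Try

// Same case block, two different expected types.
val asFunction: Int => Int = { case 2 => 3 }
val asPartial: PartialFunction[Int, Int] = { case 2 => 3 }

// Only the PartialFunction version can be asked where it is defined.
val definedAt1 = asPartial.isDefinedAt(1)
val definedAt2 = asPartial.isDefinedAt(2)

// Applying either one outside its domain throws a MatchError.
val functionFails = Try(asFunction(1)).isFailure
val partialFails = Try(asPartial(1)).isFailure

// lift turns the partial function into a total one returning Option.
val lifted = asPartial.lift(1)

// A PartialFunction can be passed wherever a Function1 is expected.
val mapped = List(2, 2).map(asPartial)
```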
-- Revised post --
Hmm, I'm not sure I see a difference, Scala 2.9.1.RC3,
val f: PartialFunction[Int, Int] = { case 2 => 3 }
f.isDefinedAt(1) // evaluates to false
f.isDefinedAt(2) // evaluates to true
f(1) // match error
val g: PartialFunction[Int, Int] = x => x match { case 2 => 3 }
g.isDefinedAt(1) // evaluates to false
g.isDefinedAt(2) // evaluates to true
g(1) // match error
It seems f and g behave exactly the same as PartialFunctions.
Here's another example demonstrating the equivalence:
Seq(1, "a").collect(x => x match { case s: String => s }) // evaluates to Seq(a)
Even more interesting:
// this compiles
val g: PartialFunction[Int, Int] = (x: Int) => {x match { case 2 => 3 }}
// this fails; found Function[Int, Int], required PartialFunction[Int, Int]
val g: PartialFunction[Int, Int] = (x: Int) => {(); x match { case 2 => 3 }}
So there's some special casing at the compiler level to convert between x => x match {...} and just {...}.
Update. After reading the language spec, this seems like a bug to me. I filed SI-4940 in the bug tracker.

How to generate transitive closure of set of tuples?

What is the best way to generate transitive closure of a set of tuples?
Example:
Input Set((1, 2), (2, 3), (3, 4), (5, 0))
Output Set((1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (5, 0))
//one transitive step
def addTransitive[A, B](s: Set[(A, B)]) = {
s ++ (for ((x1, y1) <- s; (x2, y2) <- s if y1 == x2) yield (x1, y2))
}
//repeat until we don't get a bigger set
def transitiveClosure[A,B](s:Set[(A,B)]):Set[(A,B)] = {
val t = addTransitive(s)
if (t.size == s.size) s else transitiveClosure(t)
}
println(transitiveClosure(Set((1,2), (2,3), (3,4))))
This is not a very efficient implementation, but it is simple.
With the help of unfold,
def unfoldRight[A, B](seed: B)(f: B => Option[(A, B)]): List[A] = f(seed) match {
case Some((a, b)) => a :: unfoldRight(b)(f)
case None => Nil
}
def unfoldLeft[A, B](seed: B)(f: B => Option[(B, A)]) = {
def loop(seed: B)(ls: List[A]): List[A] = f(seed) match {
case Some((b, a)) => loop(b)(a :: ls)
case None => ls
}
loop(seed)(Nil)
}
it becomes rather simple:
def transitiveClosure(input: Set[(Int, Int)]) = {
val map = input.toMap
def closure(seed: Int) = unfoldRight(map get seed) { // the function returns (element, next seed), which is unfoldRight's shape
case Some(`seed`) => None
case Some(n) => Some(seed -> n -> (map get n))
case _ => None
}
map.keySet flatMap closure
}
Another way of writing closure is this:
def closure(seed: Int) = unfoldRight(seed) {
case n if map.get(n) != Some(seed) => map get n map (x => seed -> x -> x)
case _ => None
}
I'm not sure which way I like best, myself. I like the elegance of testing for Some(seed) to avoid loops, but, by the same token, I also like the elegance of mapping the result of map get n.
Neither version returns seed -> seed for loops, so you'll have to add that if needed. Here:
def closure(seed: Int) = unfoldRight(map get seed) {
case Some(`seed`) => Some(seed -> seed -> None)
case Some(n) => Some(seed -> n -> (map get n))
case _ => None
}
Model the problem as a directed graph as follows:
Represent the numbers in the tuples as vertices in a graph.
Then each tuple (x, y) represents a directed edge from x to y. After that, use Warshall's Algorithm to find the transitive closure of the graph.
For the resulting graph, each directed edge is then converted to an (x, y) tuple. That is the transitive closure of the set of tuples.
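A set-based sketch of that idea, skipping the explicit adjacency matrix. The function name warshall is just a label I chose. For each vertex k it adds an edge i -> j whenever i -> k and k -> j are already known, which is the core step of Warshall's algorithm.

```scala
// Transitive closure over a set of directed edges, Warshall style.
def warshall[A](edges: Set[(A, A)]): Set[(A, A)] = {
  val vertices = edges.flatMap { case (x, y) => Set(x, y) }
  vertices.foldLeft(edges) { (reach, k) =>
    val into = reach.collect { case (i, `k`) => i }  // vertices that reach k
    val outOf = reach.collect { case (`k`, j) => j } // vertices reachable from k
    reach ++ (for (i <- into; j <- outOf) yield (i, j))
  }
}

val closure = warshall(Set((1, 2), (2, 3), (3, 4), (5, 0)))
```

On the example from the question this yields the seven pairs listed there. Unlike the DAG-only approaches, it also terminates on inputs with cycles, since it makes exactly one pass per vertex.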
Assuming that what you have is a DAG (there are no cycles in your example data), you could use the code below. It expects the DAG as a Map from T to List[T], which you could get from your input using
input.toList.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) } // toList so the values are List[T], as the method below expects
Here's the transitive closure:
def transitiveClosure[T]( dag: Map[ T, List[T] ] ) = {
var tc = Map.empty[ T, List[T] ]
def getItemTC( item:T ): List[T] = tc.get(item) match {
case None =>
val itemTC = dag.getOrElse(item, Nil) flatMap ( x => x::getItemTC(x) ) // vertices with no outgoing edges are not keys in dag
tc += ( item -> itemTC )
itemTC
case Some(itemTC) => itemTC
}
dag.keys foreach getItemTC
tc
}
This code figures out the closure for each element just once. However:
This code can cause a stack overflow if there are long enough paths through the DAG (the recursion is not tail recursion).
For a large graph, you would probably be better off making tc a mutable Map and then converting it at the end if you wanted an immutable Map.
If your elements were really small integers as in your example, you could improve performance significantly by using Arrays rather than Maps, although doing so would complicate some things.
To eliminate the stack overflow problem (for DAGs), you could do a topological sort, reverse it, and process the items in order. But see also this page:
best known transitive closure algorithm for graph

How to interpret scaladoc?

How does foldRight[B](B) from the Scaladoc match the actual call foldRight(0)?
Here, args is an array of integers in string representation:
val elems = args map Integer.parseInt
elems.foldRight(0) (_ + _)
Scaladoc says:
scala.Iterable.foldRight[B](B)((A, B) => B) : B
Combines the elements of this list together using the binary function f, from right to left, and starting with the value z.
@note Will not terminate for infinite-sized collections.
@return f(a0, f(a1, f(..., f(an, z)...))) if the list is [a0, a1, ..., an].
And, less importantly, what do the periods after f(an, z) mean?
As Steve said, the "..." is just an ellipsis, indicating a variable number of parameters that are not being shown.
Let's go to the Scaladoc, and show this step by step:
def foldRight[B](z: B)(op: (A, B) ⇒ B): B
That doesn't show enough. What is A? That is defined in the Iterable class (or whatever other class it is defined for):
trait Iterable[+A] extends AnyRef // Scala 2.7
trait Iterable[+A] extends Traversable[A] with GenericTraversableTemplate[A, Iterable] with IterableLike[A, Iterable[A]] // Scala 2.8
Ok, so A is the element type of the collection. In your example, A would stand for Int:
val elems = args map Integer.parseInt
Next, [B]. That's a type parameter. Basically, the following two calls are identical in practice, but the first has the type parameter inferred by the compiler:
elems.foldRight(0) (_ + _)
elems.foldRight[Int](0) (_ + _)
If you used 0L instead of 0, then B would stand for Long instead. If you passed a "" instead of 0, then B would stand for String. You can try these out, they all will work.
So, B is Int and z is 0. Note that there are two sets of parentheses in the declaration. That means the method is curried: it receives two parameter lists, in addition to the type parameter ([B]). What that means is that you can omit the second parameter list, and that will return a function which takes that second parameter list and returns the expected result. For example:
val elemsFolder: ((Int, Int) => Int) => Int = elems.foldRight(0)
Which you could then call like this:
elemsFolder(_ + _)
Anyway, the second set receives op, which is expected to be of type (A, B) => B. Or, in other words, a function which receives two parameters -- the first being the same type as the elements of the collection, and the second being the same type as z -- and returns a result of the same type as the second parameter. Since both A and B are Int, it will be a function of (Int, Int) => Int. If you passed "", then it would be a function of type (Int, String) => String.
Finally, the return type of the method is B, which means whatever is the type of z, that will be the type returned by foldRight.
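A quick way to check how B follows the seed is to run the same fold with three different values of z. In the standard library, foldRight's operator takes the element first and the accumulator second. The literal array below stands in for args, and _.toInt is used in place of Integer.parseInt for brevity.

```scala
// Stand-in for `args map Integer.parseInt`.
val elems = Array("1", "2", "3").map(_.toInt)

val asInt = elems.foldRight(0)(_ + _)   // B is Int
val asLong = elems.foldRight(0L)(_ + _) // B is Long

// B is String: each element (an Int) is prepended to the accumulated text.
val asString = elems.foldRight("")((a, b) => a.toString + b)
```

The last call evaluates as f(1, f(2, f(3, ""))), which is why the digits come out in their original order.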
As for how foldRight works, it goes a bit like this:
def foldRight[B](z: B)(op: (A, B) => B): B = {
var acc: B = z
var it = this.reverse.elements // this.reverse.iterator on Scala 2.8
while (!it.isEmpty) {
acc = op(it.next, acc) // element first, accumulator second
}
return acc
}
Which, I hope should be easy enough to understand.
Everything you need to know about foldLeft and foldRight can be gleaned from the following:
scala> List("1", "2", "3").foldRight("0"){(a, b) => "f(" + a + ", " + b + ")"}
res21: java.lang.String = f(1, f(2, f(3, 0)))
scala> List("1", "2", "3").foldLeft("0"){(a, b) => "f(" + a + ", " + b + ")"}
res22: java.lang.String = f(f(f(0, 1), 2), 3)