"for" translation into higher-order list functions - scala

As far as I understand, for expressions are translated into Scala expressions which are built upon:
map
flatMap
withFilter
foreach
Higher-order list methods.
A common example is the one where:
for(b1 <- books; b2 <- books if b1 != b2;
a1 <- b1.authors; a2 <- b2.authors if a1 == a2) yield a1;
Results in:
books flatMap (b1 =>
books withFilter( b2 => b1 != b2) flatMap( b2 =>
b1.authors flatMap ( a1 =>
b2.authors withFilter ( a2 => a2 == a1 ) map ( a2 => a1 )
)
)
)
Where:
books is a list of class Book objects (List[Book])
Book has a public attribute authors of type List[String]
My question is about this line:
b2.authors withFilter ( a2 => a2 == a1 ) map ( a2 => a1 )
Since the condition is a2 == a1 that line is equivalent to:
b2.authors withFilter ( a2 => a2 == a1 ) map ( x => x )
Why isn't the generated code just:
b2.authors filter ( a2 => a2 == a1 )
Can it be explained by the fact that the example is the reproduction of code automatically generated by Scala's compiler?
Is filter out of the for "building bricks"?

The translation of for/yield syntax into method calls is very simple and mechanical, almost at the level of string manipulation. withFilter is necessary in some places for its laziness, therefore it's used everywhere for simplicity. I don't understand the phrasing of your final question, but for/yield expressions are AIUI never translated into calls to filter except in a deprecated way for objects that don't yet have a withFilter method.
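To make the mechanical translation concrete, here is a minimal sketch that compares the for expression with its hand-written withFilter form and with the shorter filter form from the question. The Book class and the sample data are hypothetical and only serve as an illustration; all three variants should produce the same list, the difference being that filter builds an intermediate list at each step while withFilter does not.

```scala
object ForDesugaring {
  // Hypothetical model and sample data, not taken from the question.
  case class Book(title: String, authors: List[String])

  val books: List[Book] = List(
    Book("B1", List("Ann", "Bob")),
    Book("B2", List("Ann")),
    Book("B3", List("Cid"))
  )

  // The for expression from the question.
  val viaFor: List[String] =
    for (b1 <- books; b2 <- books if b1 != b2;
         a1 <- b1.authors; a2 <- b2.authors if a1 == a2) yield a1

  // Hand-written equivalent of the mechanical translation (dot notation).
  val viaWithFilter: List[String] =
    books.flatMap(b1 =>
      books.withFilter(b2 => b1 != b2).flatMap(b2 =>
        b1.authors.flatMap(a1 =>
          b2.authors.withFilter(a2 => a2 == a1).map(a2 => a1))))

  // The shorter form suggested in the question, using strict filter.
  val viaFilter: List[String] =
    books.flatMap(b1 =>
      books.filter(b2 => b1 != b2).flatMap(b2 =>
        b1.authors.flatMap(a1 =>
          b2.authors.filter(a2 => a2 == a1))))
}
```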

Related

Fold function scala's immutable list

I have created an immutable list and try to fold it to a map, where each element is mapped to a constant string "abc". I do it for practice.
While I do that, I am getting an error. I am not sure why the map (here, e1 which has mutable map type) is converted to Any.
val l = collection.immutable.List(1,2,3,4)
l.fold (collection.mutable.Map[Int,String]()) ( (e1,e2) => e1 += (e2,"abc") )
<console>:13: error: value += is not a member of Any
Expression does not convert to assignment because receiver is not assignable.
l.fold (collection.mutable.Map[Int,String]()) ( (e1,e2) => e1 += (e2,"abc") )
At least three different problem sources here:
Map[...] is not a supertype of Int, so you probably want foldLeft, not fold (the fold acts more like the "banana brackets", it expects the first argument to act like some kind of "zero", and the binary operation as some kind of "addition" - this does not apply to mutable maps and integers).
The binary operation must return something, both for fold and foldLeft. In this case, you probably want to return the modified map. This is why you need ; m (last expression is what gets returned from the closure).
The m += (k, v) is not what you think it is. It attempts to invoke a method += with two separate arguments. What you need is to invoke it with a single pair. Try m += ((k, v)) instead (yes, those problems with arity are annoying).
Putting it all together:
l.foldLeft(collection.mutable.Map[Int, String]()){ (m, e) => m += ((e, "abc")); m }
But since you are using a mutable map anyway:
val l = (1 to 4).toList
val m = collection.mutable.Map[Int, String]()
for (e <- l) m(e) = "abc"
This looks arguably clearer to me, to be honest. In a foldLeft, I wouldn't expect the map to be mutated.
Folding is all about combining a sequence of input elements into a single output element. With fold, the output type must be the element type or a supertype of it. Here is the definition of fold:
def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1
In your case the element type A is Int, but the accumulator is a mutable.Map, so A1 is inferred as their common supertype Any, which has no += method. So if you want to build a Map through iteration, you can use foldLeft or any other alternative that allows different input and output types. Here is the definition of foldLeft:
def foldLeft[B](z: B)(op: (B, A) => B): B
Solution:
val l = collection.immutable.List(1, 2, 3, 4)
l.foldLeft(collection.immutable.Map.empty[Int, String]) { (e1, e2) =>
e1 + (e2 -> "abc")
}
Note: I'm not using a mutable Map.
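As a small sketch of the difference between the two signatures: fold works when the accumulator has the element type (or a supertype), while foldLeft lets the accumulator be any type. The last line, using map and toMap, is an alternative that was not suggested in the answers above and is shown only for comparison.

```scala
object FoldVsFoldLeft {
  val l: List[Int] = List(1, 2, 3, 4)

  // fold: accumulator and elements share the type Int.
  val sum: Int = l.fold(0)(_ + _)

  // foldLeft: the accumulator is a Map, the elements are Ints.
  val viaFoldLeft: Map[Int, String] =
    l.foldLeft(Map.empty[Int, String])((acc, e) => acc + (e -> "abc"))

  // Alternative without any fold (an assumption about what reads best).
  val viaToMap: Map[Int, String] = l.map(_ -> "abc").toMap
}
```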

Checking for items in a MiniZinc array

I want to create two arrays in MiniZinc with the same items, not necessarily in the same order. Here, every item in A0 should also be in A1:
array[1..3] of var int:A0;
array[1..3] of var int:A1;
constraint forall(A2 in A0)(
(A2 in A1) /\ A2 < 5
);
But here, there seems to be a type error:
MiniZinc: type error: type error in operator application for `'in''. No matching operator found with left-hand side type `var int' and right-hand side type `array[int] of var int'
How is it possible to check if an array contains the same item that is in another array?
Edit: There is an array2set in the file builtins.mzn but it is not documented in https://www.minizinc.org/doc-2.4.2/ .
The following model works for most FlatZinc solvers such as Gecode, Google-OR-tools, Choco, PicatSAT, and JaCoP, but not for Chuffed (see below). Note the include of "nosets.mzn" so that solvers without innate support for set variables can run the model. Also, I added a smaller domain of A0 and A1 for easier testing.
include "nosets.mzn"; % Support for set variables for all solvers
array[1..3] of var 0..10: A0;
array[1..3] of var 0..10: A1;
constraint
forall(A2 in A0)(
A2 in array2set(A1) /\ A2 < 5
)
/\
forall(A2 in A1)(
A2 in array2set(A0) /\ A2 < 5
);
solve satisfy;
output [ "A0: \(A0) A1: \(A1)\n" ];
However, some solvers don't like this:
Chuffed: Throws "Error: Registry: Constraint bool_lin_eq not found in line no. 101"
Even later note: If the domain is var int (instead of my var 0..10), MiniZinc fails with a weird (and long) error:
...
in array comprehension expression
comprehension iterates over an infinite set
So array2set seems to require that the variable domains must be bounded.
This is the first answer
Here is an approach that seems to work, i.e. using exists and checking for element equality:
constraint forall(A2 in A0)(
exists(i in 1..3) ( A2 = A1[i] /\ A2 < 5)
);
Note: This constraint only ensures that the elements in A0 are in A1. Thus there might be elements in A1 that are not in A0. E.g.
A0: [1,1,4]
A1: [1,4,3]
I guess that you also want the converse, i.e. that all elements in A1 are in A0 as well:
constraint forall(A2 in A1) (
exists(i in 1..3) ( A2 = A0[i] /\ A2 < 5)
);
Note: The following DOES NOT work but would be nice to have. Both yield the error MiniZinc: internal error: var set comprehensions not supported yet.
% idea 1
constraint forall(A2 in A0)(
A2 in {A1[i] | i in 1..3} /\ A2 < 5
);
% idea 2
constraint forall(A2 in A0) (
A2 in {a | a in A1} /\ A2 < 5
);

Scala list of tuples of different size zip issues?

I have two lists as follows:
val a = List((1430299869,"A",4200), (1430299869,"A",0))
val b = List((1430302366,"B",4100), (1430302366,"B",4200), (1430302366,"B",5000), (1430302366,"B",27017), (1430302366,"B",80), (1430302366,"B",9300), (1430302366,"B",9200), (1430302366,"A",5000), (1430302366,"A",4200), (1430302366,"A",80), (1430302366,"A",443), (1430302366,"C",4100), (1430302366,"C",4200), (1430302366,"C",27017), (1430302366,"C",5000), (1430302366,"C",80))
When I zip the two lists as below:
val c = a zip b
it returns results as
List(((1430299869,A,4200),(1430302366,B,4100)), ((1430299869,A,0),(1430302366,B,4200)))
Not all of the tuples are included. How can I zip all of the above data?
EDIT
The expected result is the combination of the two lists, like:
List((1430299869,"A",4200), (1430299869,"A",0),(1430302366,"B",4100), (1430302366,"B",4200), (1430302366,"B",5000), (1430302366,"B",27017), (1430302366,"B",80), (1430302366,"B",9300), (1430302366,"B",9200), (1430302366,"A",5000), (1430302366,"A",4200), (1430302366,"A",80), (1430302366,"A",443), (1430302366,"C",4100), (1430302366,"C",4200), (1430302366,"C",27017), (1430302366,"C",5000), (1430302366,"C",80))
Second Edit
I tried this :
val d = for(((a,b,c),(d,e,f)) <- (a zip b)if(b.equals(e) && c.equals(f))) yield (d,e,f)
but it gives empty results because of (a zip b). When I replaced a zip b with a ++ b, it showed the following error:
constructor cannot be instantiated to expected type;
So how can I get matching tuples?
Just add one list to another:
a ++ b
According to your 2nd edit, what you need is:
for {
(a1,b1,c) <- a //rename extracted to a1 and b1 to avoid confusion
(d,e,f) <- b
if b1.equals(e) && c.equals(f)
} yield (d,e,f)
Or:
for {
(a1, b1, c) <- a
(d, `b1`, `c`) <- b //enclosing it in backticks avoids capture and matches against already defined values
} yield (d, b1, c)
Zipping won't help since you need to compare all tuples in a with all tuples in b , it seems.
a zip b creates a list of pairs of elements from a and b.
What you're most likely looking for is list concatenation, which is a ++ b
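A minimal sketch of the difference, using short hypothetical integer lists instead of the tuples from the question: zip pairs elements by position and stops at the shorter list, while ++ appends one list to the other.

```scala
object ZipVsConcat {
  // Hypothetical short inputs of different lengths.
  val a: List[Int] = List(1, 2)
  val b: List[Int] = List(10, 11, 12)

  // Pairs by position; the third element of b has no partner and is dropped.
  val zipped: List[(Int, Int)] = a.zip(b)

  // Concatenation keeps every element of both lists.
  val concatenated: List[Int] = a ++ b
}
```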
On zipping (pairing) all data in the lists, consider first a briefer input for illustrating the case,
val a = (1 to 2).toList
val b = (10 to 12).toList
Then for instance a for comprehension may convey the needs,
for (i <- a; j <- b) yield (i,j)
which delivers
List((1,10), (1,11), (1,12),
(2,10), (2,11), (2,12))
Update
From OP latest update, consider a dedicated filtering function,
type triplet = (Int,String,Int)
def filtering(key: triplet, xs: List[triplet]) =
xs.filter( v => key._2 == v._2 && key._3 == v._3 )
and so apply it with flatMap,
a.flatMap(filtering(_, b))
List((1430302366,A,4200))
One additional step is to encapsulate this in an implicit class,
implicit class OpsFilter(val keys: List[triplet]) extends AnyVal {
def filtering(xs: List[triplet]) = {
keys.flatMap ( key => xs.filter( v => key._2 == v._2 && key._3 == v._3 ))
}
}
and likewise,
a.filtering(b)
List((1430302366,A,4200))

How to dynamically generate parallel futures with for-yield

I have the code below:
val f1 = Future(genA1)
val f2 = Future(genA2)
val f3 = Future(genA3)
val f4 = Future(genA4)
val results: Future[Seq[A]] = for {
a1 <- f1
a2 <- f2
a3 <- f3
a4 <- f4
} yield Seq(a1, a2, a3, a4)
Now I have a requirement to optionally exclude a2. How should I modify the code? (A solution with map or flatMap is also acceptable.)
Furthermore, say I have M possible futures that need to be aggregated like above, and N of the M could be optionally excluded based on some flag (business logic). How should I handle that?
thanks in advance!
Leon
In question 1, I understand that you want to exclude one entry (e.g. B) from the sequence given some logic, and in question 2, you want to suppress N entries from a total of M and have the future computed on those results. We could generalize both cases to something like this:
// Using a map as simple example, but 'generators' could be a function that creates the required computation
val generators = Map('a' -> genA1, 'b' -> genA2, 'c' -> genA3, 'd' -> genA4)
...
// shouldAccept(k) => Business logic to decide which computations should be executed.
val selectedGenerators = generators.filter{case (k,v) => shouldAccept(k)}
// Create Seq[Future] from the selected computations
val futures = selectedGenerators.map{case (k,v) => Future(v)}
// Create Future[Seq[_]] to have the result of computing all entries.
val result = Future.sequence(futures)
In general, what I think you are looking for is Future.sequence, which takes a Seq[Future[_]] and produces a Future[Seq[_]], which is basically what you are doing "by hand" with the for-comprehension.
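Here is a self-contained sketch of that idea. The generator thunks, their names, and the shouldAccept rule are hypothetical stand-ins for genA1..genA4 and the business logic; the thunks are wrapped in functions so that excluded computations never start.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelFutures {
  // Hypothetical named computations; each thunk runs only when invoked.
  val generators: Seq[(String, () => Int)] = Seq(
    "a1" -> (() => 1),
    "a2" -> (() => 2),
    "a3" -> (() => 3),
    "a4" -> (() => 4)
  )

  // Hypothetical business rule: exclude a2.
  def shouldAccept(name: String): Boolean = name != "a2"

  def run(): Seq[Int] = {
    // Start a Future only for the accepted generators; they run in parallel.
    val futures: Seq[Future[Int]] =
      generators.collect { case (name, gen) if shouldAccept(name) => Future(gen()) }
    // Future.sequence turns Seq[Future[Int]] into Future[Seq[Int]], keeping the order.
    // Blocking with Await is only for this demonstration.
    Await.result(Future.sequence(futures), 5.seconds)
  }
}
```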

Scala for comprehension efficiency?

In the book "Programming In Scala", chapter 23, the author gives an example like:
case class Book(title: String, authors: String*)
val books: List[Book] = // list of books, omitted here
// find all authors who have published at least two books
for (b1 <- books; b2 <- books if b1 != b2;
a1 <- b1.authors; a2 <- b2.authors if a1 == a2)
yield a1
The author says this will be translated into:
books flatMap (b1 =>
books filter (b2 => b1 != b2) flatMap (b2 =>
b1.authors flatMap (a1 =>
b2.authors filter (a2 => a1 == a2) map (a2 =>
a1))))
But if you look into the map and flatMap method definitions (TraversableLike.scala), you may find that they are defined as for loops:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
b.sizeHint(this)
for (x <- this) b += f(x)
b.result
}
def flatMap[B, That](f: A => Traversable[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
for (x <- this) b ++= f(x)
b.result
}
Well, I guess this for will in turn be translated to foreach and then to a while statement, which is a construct and not an expression. Scala doesn't have a for construct, because it wants the for to always yield something.
So, what I want to discuss with you is: why does Scala do this "for translation"?
The author's example uses 4 generators, which will be translated into a 4-level nested for loop in the end. I think it will have really horrible performance when books is large.
Scala encourages people to use this kind of "syntactic sugar". You can always see code that makes heavy use of filter, map and flatMap, as if programmers are forgetting that what they are really doing is nesting one loop inside another, and all that is achieved is making the code look a bit shorter. What do you think?
For comprehensions are syntactic sugar for monadic transformation, and, as such, are useful in all sorts of places. At that, they are much more verbose in Scala than the equivalent Haskell construct (of course, Haskell is non-strict by default, so one can't talk about performance of the construct like in Scala).
Also important, this construct keeps what is being done clear, and avoids quickly escalating indentation or unnecessary private method nesting.
As to the final consideration, whether that hides the complexity or not, I'll posit this:
for {
b1 <- books
b2 <- books
if b1 != b2
a1 <- b1.authors
a2 <- b2.authors
if a1 == a2
} yield a1
It is very easy to see what is being done, and the complexity is clear: b^2 * a^2 (the filter won't alter the complexity), for number of books and number of authors. Now, write the same code in Java, either with deep indentation or with private methods, and try to ascertain, in a quick look, what the complexity of the code is.
So, imho, this doesn't hide the complexity, but, on the contrary, makes it clear.
As for the map/flatMap/filter definitions you mention, they do not belong to List or any other class, so they won't be applied. Basically,
for(x <- List(1, 2, 3)) yield x * 2
is translated into
List(1, 2, 3) map (x => x * 2)
and that is not the same thing as
map(List(1, 2, 3), ((x: Int) => x * 2))
which is how the definition you passed would be called. For the record, the actual implementation of map on List is:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
b.sizeHint(this)
for (x <- this) b += f(x)
b.result
}
I write code so that it's easy to understand and maintain. I then profile. If there's a bottleneck that's where I devote my attention. If it's in something like you've described I'll attack the problem in a different manner. Until then, I love the "sugar." It saves me the trouble of writing things out or thinking hard about it.
There are actually 6 loops. One loop for each filter/flatMap/map
The filter->map pairs can be done in one loop by using lazy views of the collections (iterator method)
In general, it is running 2 nested loops for books to find all book pairs and then two nested loops to find if the author of one book is in the list of authors of the other.
Using simple data structures, you would do the same when coding explicitly.
And of course, the example here is to show a complex 'for' loop, not to write the most efficient code. E.g., instead of a sequence of authors, one could use a Set and then find if the intersection is non empty:
for (b1 <- books; b2 <- books if b1 != b2; a <- (b1.authors & b2.authors)) yield a
Note that in 2.8, the filter call was changed to withFilter, which is lazy and avoids constructing an intermediate structure. See the question "guide to move from filter to withFilter?".
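The laziness of withFilter can be observed with a small sketch. The log buffer and the sample list are hypothetical; the point is only that calling withFilter does not run the predicate, which is evaluated later, when map traverses the collection.

```scala
object LazyFilter {
  // Hypothetical log used to observe when the predicate runs.
  val log = scala.collection.mutable.ListBuffer.empty[String]
  val xs: List[Int] = List(1, 2, 3)

  // Creating the filtered wrapper does not evaluate the predicate.
  val odds = xs.withFilter { x => log += s"test $x"; x % 2 == 1 }
  val loggedBeforeMap: Int = log.size

  // The predicate is applied only now, during the traversal done by map.
  val result: List[Int] = odds.map(_ * 10)
  val loggedAfterMap: Int = log.size
}
```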
I believe the reason that for is translated to map, flatMap and withFilter (as well as value definitions if present) is to make the use of monads easier.
In general I think if the computation you are doing involves looping 4 times, it is fine using the for loop. If the computation can be done more efficiently and performance is important then you should use the more efficient algorithm.
One follow-up to @IttayD's answer on the algorithm's efficiency. It's worth noting that the algorithm in the original post (and in the book) is a nested loop join. In practice, this isn't an efficient algorithm for large datasets, and most databases would use a hash aggregate here instead. In Scala, a hash aggregate would look something like:
(for (book <- books;
author <- book.authors) yield (book, author)
).groupBy(_._2).filter(_._2.size > 1).keys
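As a runnable sketch of the same idea, with a hypothetical Book class and sample data: every (book, author) pair is emitted once, the pairs are grouped by author, and only authors that appear with more than one book are kept. This visits each pair once instead of comparing every book with every other book.

```scala
object HashAggregate {
  // Hypothetical model and sample data.
  case class Book(title: String, authors: List[String])

  val books: List[Book] = List(
    Book("B1", List("Ann", "Bob")),
    Book("B2", List("Ann")),
    Book("B3", List("Cid"))
  )

  // Group the (book, author) pairs by author and keep authors with 2+ books.
  val repeatedAuthors: Set[String] =
    (for (book <- books; author <- book.authors) yield (book, author))
      .groupBy(_._2)
      .filter(_._2.size > 1)
      .keys
      .toSet
}
```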