Difference between applicative and monadic computation in Scala

Given this simple computation I cannot clearly see the difference between using applicative style over monadic style. Are there better examples out there (in Scala) of when to use one over the other?
println( (3.some |@| none[Int] |@| 4.some)( (a: Int, b: Int, c: Int) => a + b + c ) ) // prints None
println( for(
    a <- Some(3);
    b <- none[Int];
    c <- Some(4)
  ) yield( a + b + c ) ) // prints None
Both computations end up as None, so the end result is the same. The only difference I can see is that, when using the applicative syntax, there is no temporary access to those intermediate values, unlike in the for comprehension.
Furthermore, having one None value stops the whole computation. I thought applicative meant "not dependent on the result of the computation before".

The applicative builder syntax will evaluate each term and cannot use the result of a prior computation. However, even if the first result is None, all the other expressions will still be evaluated.
Whereas, with the for comprehension, it will 'fail fast' (it will not evaluate any further expressions after a None, in your case), plus you can access the results of previous computations.
Don't think of these things as simply different styles; they are calling different functions with different behaviours: i.e. flatMap vs apply.
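For instance, here is a minimal sketch in plain Scala (no Scalaz, using Scala 2.13's Option#zip in place of the applicative builder), with a hypothetical trace helper added only to make evaluation visible:
def trace(label: String, o: Option[Int]): Option[Int] = {
  println(s"evaluating $label") // log when each term is evaluated
  o
}
// Monadic (flatMap): fails fast -- after b yields None, c is never evaluated.
val monadic = for {
  a <- trace("a", Some(3))
  b <- trace("b", None)
  c <- trace("c", Some(4))
} yield a + b + c // prints "evaluating a", "evaluating b"; result is None
// Applicative-like (zip): all three terms are evaluated before combining.
val applicative = trace("a", Some(3)) zip trace("b", None) zip trace("c", Some(4)) map {
  case ((a, b), c) => a + b + c
} // prints all three labels; result is still None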

Monads represent sequential computations where each next computation depends on the previous ones (if a previous computation is empty you can't proceed, so you "fail fast"). A more generic example of monadic computation:
println( for(
    a <- Some(1);
    b <- Some(a);
    c <- Some(a + b)
  ) yield( a + b + c ) ) // prints Some(4)
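Desugared to flatMap/map, the sequencing is explicit -- each later step closes over the results of the earlier ones:
Some(1).flatMap(a => Some(a).flatMap(b => Some(a + b).map(c => a + b + c))) // Some(4)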
Applicative is just fmap on steroids, where not only the argument but the mapping function itself can be empty. In your case it can be rewritten as:
4.some <*>
  { none[Int] <*>
    { 3.some <*>
      { (_: Int) + (_: Int) + (_: Int) }.curried.some } }
At some step your function becomes Option[Int => Int] = None, but that doesn't stop it from being applied to 4.some; only the result is None, as expected. You still need to know the value of 4.some.
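To see what <*> is doing in plain Scala, here is a sketch with a hypothetical ap helper (apply a possibly-absent function to a possibly-absent argument; Scalaz's real operator is more general):
def ap[A, B](fa: Option[A])(ff: Option[A => B]): Option[B] =
  for { f <- ff; a <- fa } yield f(a)
val f: Option[Int => Int => Int => Int] =
  Some(((a: Int, b: Int, c: Int) => a + b + c).curried)
val step1 = ap(Some(3))(f)               // Some(<function>): Option[Int => Int => Int]
val step2 = ap(None: Option[Int])(step1) // None: Option[Int => Int]
val step3 = ap(Some(4))(step2)           // still applied to 4.some; result is None
println(step3)                           // None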

Related

Fold method using List as accumulator

To find the prime factors of a number I was using this piece of code:
def primeFactors(num: Long): List[Long] = {
  val exists = (2L to math.sqrt(num).toLong).find(num % _ == 0)
  exists match {
    case Some(d) => d :: primeFactors(num / d)
    case None    => List(num)
  }
}
but then I found a cooler and more functional approach to solve this, using this code:
def factors(n: Long): List[Long] = (2 to math.sqrt(n).toInt)
  .find(n % _ == 0).fold(List(n))(i => i.toLong :: factors(n / i))
Earlier I was using foldLeft or fold simply to get the sum of a list or for other simple calculations, but here I can't seem to understand how fold is working and how this breaks out of the recursive function. Can somebody please explain how the fold functionality works here?
Option's fold
If you look at the signature of Option's fold function, it takes two parameters:
def fold[B](ifEmpty: => B)(f: A => B): B
What it does is apply f to the value of the Option if it is not empty. If the Option is empty, it simply returns the output of ifEmpty (this is the termination condition for the recursion).
So in your case, i => i.toLong :: factors(n / i) represents f, which will be evaluated if the Option is not empty, while List(n) is the termination condition.
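For instance, tracing factors(12) under this reading (a sketch of the call sequence, assuming the definition from the question):
def factors(n: Long): List[Long] = (2 to math.sqrt(n).toInt)
  .find(n % _ == 0).fold(List(n))(i => i.toLong :: factors(n / i))
// factors(12): (2 to 3).find(12 % _ == 0) = Some(2) -> 2 :: factors(6)
// factors(6):  (2 to 2).find(6 % _ == 0)  = Some(2) -> 2 :: factors(3)
// factors(3):  (2 to 1) is empty, so find = None    -> List(3)  (the ifEmpty branch)
println(factors(12)) // List(2, 2, 3)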
fold used for collection / iterators
The other fold that you are talking about, for getting the sum of a collection, comes from TraversableOnce and has a signature like:
def foldLeft[B](z: B)(op: (B, A) => B): B
Here, z is the starting value (0 in the case of sum) and op is an associative binary operator which is applied to z and each value of the collection from left to right.
So both folds differ in their implementation.
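A short side-by-side sketch of the two signatures in action:
// Option's fold: ifEmpty for None, f for the wrapped value.
println(Some(3).fold("empty")(i => s"got $i"))             // got 3
println((None: Option[Int]).fold("empty")(i => s"got $i")) // empty
// Collection foldLeft: threads an accumulator through the elements, left to right.
println(List(1, 2, 3).foldLeft(0)(_ + _))                  // 6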

What is the point of the /: function?

Documentation for /: includes this note:
Note: might return different results for different runs, unless the underlying collection type is ordered or the operator is associative and commutative.
(source)
Does this only apply if the par version of this function is run? Otherwise, is the result deterministic (the same as foldLeft)?
Also, this function calls foldLeft under the hood: def /:[B](z: B)(op: (B, A) => B): B = foldLeft(z)(op)
Their function definitions are the same (except for the function parameter label, "op" instead of "f"):
def /:[B](z: B)(op: (B, A) ⇒ B): B
def foldLeft[B](z: B)(f: (B, A) ⇒ B): B
For these reasons, what is the point of the /: function, and when should it be used in favour of foldLeft?
Is my reasoning incorrect ?
It's just an alternative syntax. Methods ending in : are called on the right hand side.
Instead of
list.foldLeft(0) { op(_, _) }
or
list./:(0) { op(_, _) }
you can
( z /: list ) { op(_, _) }
For example,
scala> val a = List(1,2,3,4)
a: List[Int] = List(1, 2, 3, 4)
scala> ( 0 /: a ) { _ + _ }
res5: Int = 10
Yes, those are aliases originating from dark times when people liked their operators like this:
val x = y |#<#|: z.
The point of the note is to remind you that for collections with unspecified iteration order the result of folds might differ. Consider a Set {1,2,3} that doesn't guarantee the same access order even if left unmodified, and an operation that is not, e.g., associative (like /). Even when run without a par call, this might produce the following (pseudocode):
{1,2,3} foldLeft / ==> (1 / 2) / 3 ==> 1/6 = 0.1(6)
{3,1,2} foldLeft / ==> (3 / 1) / 2 ==> 3/2 = 1.5
In terms of consistency this is similar to applying non-parallelizable operations to parallel collections, though.
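The same order-sensitivity can be reproduced deterministically with two Lists whose element order differs, mirroring the pseudocode above (using Double division, which is not associative):
println(List(1.0, 2.0, 3.0).reduceLeft(_ / _)) // (1 / 2) / 3 = 0.1666...
println(List(3.0, 1.0, 2.0).reduceLeft(_ / _)) // (3 / 1) / 2 = 1.5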

Most idiomatic way to mix synchronous, asynchronous, and parallel computation in a scala for comprehension of futures

Suppose I have 4 future computations to do. The first two can be done in parallel, but the third must be done after the first two (even though the values of the first two are not used in the third -- think of each computation as a command that performs some db operation). Finally, there is a 4th computation that must occur after all of the first 3. Additionally, there is a side effect that can be started after the first 3 complete (think of this as kicking off a periodic runnable). In code, this could look like the following:
for {
  _ <- async1 // not done in parallel with async2 :( is there
  _ <- async2 // any way of achieving this cleanly inside of for?
  _ <- async3
  _ = sideEffect // do I need "=" here??
  _ <- async4
} yield ()
The comments show my doubts about the quality of the code:
What's the cleanest way to do two operations in parallel in a for comprehension?
Is there a way to achieve this result without so many "_" characters (and without assigning a named reference, at least in the case of sideEffect)?
what's the cleanest and most idiomatic way to do this?
You can use zip to combine two futures, including the result of zip itself. You'll end up with tuples holding tuples, but if you use infix notation for Tuple2 it is easy to take them apart. Below I define a synonym ~ for succinctness (this is what the parser combinator library does, except its ~ is a different class that behaves similarly to Tuple2).
As an alternative for _ = for the side effect, you can either move it into the yield, or combine it with the following statement using braces and a semicolon. I would still consider _ = to be more idiomatic, at least so far as having a side effecting statement in the for is idiomatic at all.
val ~ = Tuple2
for {
  a ~ b ~ c <- async1 zip
               async2 zip
               async3
  d <- { sideEffect; async4 }
} yield (a, b, c, d)
for-comprehensions represent monadic operations, and monadic operations are sequenced. There's a superclass of monad, applicative, where computations don't depend on the results of prior computations and thus may be run in parallel.
Scalaz has a |@| operator for combining applicatives, so you can use (future1 |@| future2)(proc(_, _)) to dispatch two futures in parallel and then run "proc" on the result of both of them, as opposed to the sequential computation of for {a <- future1; b <- future2(a)} yield b (or just future1 flatMap future2).
There's already a method on stdlib Futures called .zip that combines Futures in parallel, and indeed the scalaz impl uses this: https://github.com/scalaz/scalaz/blob/scalaz-seven/core/src/main/scala/scalaz/std/Future.scala#L36
And .zip and for-comprehensions may be intermixed to have parallel and sequential parts, as appropriate.
So just using the stdlib syntax, your above example could be written as:
for {
  _ <- async1 zip async2
  _ <- async3
  _ = sideEffect
  _ <- async4
} yield ()
Alternatively, written w/out a for-comprehension:
async1 zip async2 flatMap (_ => async3) flatMap { _ => sideEffect; async4 }
Just as an FYI, it's really simple to get two futures to run in parallel and still process them via a for-comprehension. The suggested solutions of using zip can certainly work, but I find that when I want to handle a couple of futures and do something when they are all done, and I have two or more that are independent of each other, I do something like this:
val f1 = async1
val f2 = async2
// First two futures are now running in parallel
for {
  r1 <- f1
  r2 <- f2
  _ <- async3
  _ = sideEffect
  _ <- async4
} yield {
  ...
}
Now, the way the for comprehension is structured, it certainly waits on f1 before checking on the completion status of f2, but the logic behind these two futures is running at the same time. This is a little simpler than some of the suggestions but still might give you what you need.
Your code already looks well structured, minus computing the futures in parallel.
Use helper functions, ideally writing a code generator to print out helpers for all tuple cases.
As far as I know, you need to name the result or assign it to _.
Example code with helpers:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object Example {
  def run: Future[Unit] = {
    for {
      (a, b, c) <- par(
        Future.successful(1),
        Future.successful(2),
        Future.successful(3)
      )
      constant = 100
      (d, e) <- par(
        Future.successful(a + 10),
        Future.successful(b + c)
      )
    } yield {
      println(constant)
      println(d)
      println(e)
    }
  }

  def par[A, B](a: Future[A], b: Future[B]): Future[(A, B)] = {
    for {
      a <- a
      b <- b
    } yield (a, b)
  }

  def par[A, B, C](a: Future[A], b: Future[B], c: Future[C]): Future[(A, B, C)] = {
    for {
      a <- a
      b <- b
      c <- c
    } yield (a, b, c)
  }
}

Example.run
Example.run
Edit:
generated code for 1 to 20 futures: https://gist.github.com/nanop/c448db7ac1dfd6545967#file-parhelpers-scala
parPrinter script: https://gist.github.com/nanop/c448db7ac1dfd6545967#file-parprinter-scala

Example of the Scala aggregate function

I have been looking and I cannot find an example or discussion of the aggregate function in Scala that I can understand. It seems pretty powerful.
Can this function be used to reduce the values of tuples to make a multimap-type collection? For example:
val list = Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))
After applying aggregate:
Seq(("one" -> Seq("i","1")), ("two" -> Seq("2", "ii")), ("four" -> Seq("iv"))
Also, can you give an example of the parameters z, seqop, and combop? I'm unclear on what these parameters do.
Let's see if some ascii art doesn't help. Consider the type signature of aggregate:
def aggregate [B] (z: B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
Also, note that A refers to the element type of the collection. So, let's say we have 4 elements in this collection; then aggregate might work like this:
z   A   z   A   z   A   z   A
 \ /     \ /seqop\ /     \ /
  B       B       B       B
    \   /  combop   \   /
      B _           _ B
         \  combop  /
              B
Let's see a practical example of that. Say I have a GenSeq("This", "is", "an", "example"), and I want to know how many characters there are in it. I can write the following:
Note the use of par in the snippet of code below. The second function passed to aggregate is what is called after the individual sequences are computed. Scala is only able to do this for collections that can be parallelized.
import scala.collection.GenSeq
val seq = GenSeq("This", "is", "an", "example")
val chars = seq.par.aggregate(0)(_ + _.length, _ + _)
So, first it would compute this:
0 + "This".length // 4
0 + "is".length // 2
0 + "an".length // 2
0 + "example".length // 7
What it does next cannot be predicted (there is more than one way of combining the results), but it might do this (like in the ascii art above):
4 + 2 // 6
2 + 7 // 9
At which point it concludes with
6 + 9 // 15
which gives the final result. Now, this is a bit similar in structure to foldLeft, but it has an additional function (B, B) => B, which fold doesn't have. This function, however, enables it to work in parallel!
Consider, for example, that each of the four initial computations is independent of the others, and they can be done in parallel. The next two (resulting in 6 and 9) can be started once the computations they depend on are finished, and these two can also run in parallel.
The 7 computations, parallelized as above, could take as little time as 3 serial computations.
Actually, with such a small collection the cost of synchronizing the computation would be big enough to wipe out any gains. Furthermore, if you folded this, it would only take 4 computations in total. Once your collections get larger, however, you start to see some real gains.
Consider, on the other hand, foldLeft. Because it doesn't have the additional function, it cannot parallelize any computation:
(((0 + "This".length) + "is".length) + "an".length) + "example".length
Each of the inner parenthesized expressions must be computed before the outer one can proceed.
The aggregate function does not do that (except that it is a very general function, and it could be used to do that). You want groupBy, or something close to it at least. Since you start with a Seq[(String, String)] and you group by taking the first item in the tuple (which is (String, String) => String), it returns a Map[String, Seq[(String, String)]]. You then have to discard the first element in the Seq[(String, String)] values.
So
list.groupBy(_._1).mapValues(_.map(_._2))
There you get a Map[String, Seq[String]]. If you want a Seq instead of a Map, call toSeq on the result. I don't think you have a guarantee on the order in the resulting Seq, though.
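For reference, a runnable version of the above (Scala 2.12-style; on 2.13, mapValues returns a lazy view, so you would use .view.mapValues(...).toMap):
val list = Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))
val grouped = list.groupBy(_._1).mapValues(_.map(_._2))
println(grouped.toSeq)
// e.g. List((two,List(2, ii)), (four,List(iv)), (one,List(i, 1))) -- order not guaranteed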
Aggregate is a more difficult function.
Consider first reduceLeft and reduceRight.
Let as be a non-empty sequence as = Seq(a1, ..., an) of elements of type A, and f: (A,A) => A be some way to combine two elements of type A into one. I will write it as a binary operator #, a1 # a2 rather than f(a1, a2). as.reduceLeft(#) will compute (((a1 # a2) # a3) ... # an). reduceRight will put the parentheses the other way, (a1 # (a2 # ... # an)). If # happens to be associative, one does not care about the parentheses. One could compute it as (a1 # ... # ap) # (ap+1 # ... # an) (there would be parentheses inside the two big parentheses too, but let's not care about that). Then one could do the two parts in parallel, while the nested bracketing in reduceLeft or reduceRight forces a fully sequential computation. But parallel computation is only possible when # is known to be associative, and the reduceLeft method cannot know that.
Still, there could be a method reduce, whose caller would be responsible for ensuring that the operation is associative. Then reduce would order the calls as it sees fit, possibly doing them in parallel. Indeed, there is such a method.
There is a limitation with the various reduce methods, however. The elements of the Seq can only be combined to a result of the same type: # has to be (A,A) => A. But one could have the more general problem of combining them into a B. One starts with a value b of type B and combines it with every element of the sequence. The operator # is (B,A) => B, and one computes (((b # a1) # a2) ... # an). foldLeft does that. foldRight does the same thing but starting with an. There, the # operation has no chance to be associative. When one writes b # a1 # a2, it must mean (b # a1) # a2, as (a1 # a2) would be ill-typed. So foldLeft and foldRight have to be sequential.
Suppose, however, that each A can be turned into a B; let's write it with !, so a! is of type B. Suppose moreover that there is a + operation (B,B) => B, and that # is such that b # a is in fact b + a!. Rather than combining elements with #, one could first transform all of them to B with !, then combine them with +. That would be as.map(!).reduceLeft(+). And if + is associative, then that can be done with reduce, and not be sequential: as.map(!).reduce(+). There could be a hypothetical method as.associativeFold(b, !, +).
Aggregate is very close to that. It may be, however, that there is a more efficient way to implement b # a than b + a!. For instance, if type B is List[A] and b # a is a::b, then a! will be a::Nil, and b1 + b2 will be b2 ::: b1. a::b is way better than (a::Nil) ::: b. To benefit from associativity, but still use #, one first splits b + a1! + ... + an! into (b + a1! + ... + ap!) + (ap+1! + ... + an!), then goes back to using # with (b # a1 # ... # ap) + (ap+1! # ... # an). One still needs the ! on ap+1, because one must start with some b. And the + is still necessary too, appearing between the parentheses. To do that, as.associativeFold(!, +) could be changed to as.optimizedAssociativeFold(b, !, #, +).
Back to +. + is associative, or equivalently, (B, +) is a semigroup. In practice, most of the semigroups used in programming happen to be monoids too, i.e. they contain a neutral element z (for zero) in B, so that for each b, z + b = b + z = b. In that case, the ! operation that makes sense is likely to be a! = z # a. Moreover, as z is a neutral element, b # a1 # ... # an = (b + z) # a1 # ... # an, which is b + (z # a1 # ... # an). So it is always possible to start the aggregation with z. If b is wanted instead, you do b + result at the end. With all those hypotheses, we can do as.aggregate(z, #, +). That is what aggregate does. # is the seqop argument (applied in a sequence z # a1 # a2 # ... # ap), and + is combop (applied to already partially combined results, as in (z # a1 # ... # ap) + (z # ap+1 # ... # an)).
To sum it up, as.aggregate(z)(seqop, combop) computes the same thing as as.foldLeft(z)(seqop), provided that:
(B, combop, z) is a monoid
seqop(b,a) = combop(b, seqop(z,a))
The aggregate implementation may use the associativity of combop to group the computations as it likes (not swapping elements, however; + does not have to be commutative, and ::: is not). It may run them in parallel.
Finally, solving the initial problem using aggregate is left as an exercise to the reader. A hint: implement it using foldLeft, then find z and combop that will satisfy the conditions stated above.
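As a small check of those two conditions with the List-building example from earlier (B = List[A], seqop prepends, combop is reversed concatenation; this is a sketch, not the solution to the exercise):
val z: List[Int] = Nil
def seqop(b: List[Int], a: Int): List[Int] = a :: b
def combop(b1: List[Int], b2: List[Int]): List[Int] = b2 ::: b1
// Condition 1: (B, combop, z) is a monoid -- z is neutral on both sides:
assert(combop(z, List(1, 2)) == List(1, 2) && combop(List(1, 2), z) == List(1, 2))
// Condition 2: seqop(b, a) == combop(b, seqop(z, a)):
assert(seqop(List(2, 1), 3) == combop(List(2, 1), seqop(z, 3))) // both are List(3, 2, 1)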
The signature for a collection with elements of type A is:
def aggregate [B] (z: B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
z is an object of type B acting as a neutral element. If you want to count something, you can use 0, if you want to build a list, start with an empty list, etc.
seqop is analogous to the function you pass to fold methods. It takes two arguments: the first one is of the same type as the neutral element you passed and represents the stuff which was already aggregated in the previous iteration; the second one is the next element of your collection. The result must also be of type B.
combop is a function combining two results into one.
In most collections, aggregate is implemented in TraversableOnce as:
def aggregate[B](z: B)(seqop: (B, A) => B, combop: (B, B) => B): B =
  foldLeft(z)(seqop)
Thus combop is ignored. However, it makes sense for parallel collections, because seqop will first be applied locally in parallel, and then combop is called to finish the aggregation.
So for your example, you can try with a fold first:
val seqOp = (map: Map[String, Set[String]], tuple: (String, String)) =>
  map + (tuple._1 -> (map.getOrElse(tuple._1, Set[String]()) + tuple._2))

list.foldLeft( Map[String, Set[String]]() )( seqOp )
// returns: Map(one -> Set(i, 1), two -> Set(2, ii), four -> Set(iv))
Then you have to find a way of collapsing two multimaps:
val combOp = (map1: Map[String, Set[String]], map2: Map[String, Set[String]]) =>
  (map1.keySet ++ map2.keySet).foldLeft( Map[String, Set[String]]() ) {
    (result, k) =>
      result + (k -> (map1.getOrElse(k, Set[String]()) ++ map2.getOrElse(k, Set[String]())))
  }
Now, you can use aggregate in parallel:
list.par.aggregate( Map[String,Set[String]]() )( seqOp, combOp )
//Returns: Map(one -> Set(i, 1), two -> Set(2, ii), four -> Set(iv))
Applying the method par to list uses the parallel collection (scala.collection.parallel.immutable.ParSeq) of the list to really take advantage of multi-core processors. Without par, there won't be any performance gain, since the aggregate is not done on a parallel collection.
aggregate is like foldLeft but may be executed in parallel.
As missingfactor says, the linear version of aggregate(z)(seqop, combop) is equivalent to foldLeft(z)(seqop). This is, however, impractical in the parallel case, where we would need to combine not only the next element with the previous result (as in a normal fold), but we want to split the iterable into sub-iterables on which we call aggregate and need to combine those again. (In left-to-right order, but not associatively, as we might have combined the last parts before the first parts of the iterable.) This re-combining is in general non-trivial, and therefore one needs a method (S, S) => S to accomplish that.
The definition in ParIterableLike is:
def aggregate[S](z: S)(seqop: (S, T) => S, combop: (S, S) => S): S = {
  executeAndWaitResult(new Aggregate(z, seqop, combop, splitter))
}
which indeed uses combop.
For reference, Aggregate is defined as:
protected[this] class Aggregate[S](z: S, seqop: (S, T) => S, combop: (S, S) => S, protected[this] val pit: IterableSplitter[T])
  extends Accessor[S, Aggregate[S]] {
  @volatile var result: S = null.asInstanceOf[S]
  def leaf(prevr: Option[S]) = result = pit.foldLeft(z)(seqop)
  protected[this] def newSubtask(p: IterableSplitter[T]) = new Aggregate(z, seqop, combop, p)
  override def merge(that: Aggregate[S]) = result = combop(result, that.result)
}
The important part is merge, where combop is applied to two sub-results.
Here is a blog post on how aggregate enables performance on multi-core processors, with benchmarks:
http://markusjais.com/scalas-parallel-collections-and-the-aggregate-method/
Here is a video of the "Scala parallel collections" talk from Scala Days 2011:
http://days2011.scala-lang.org/node/138/272
The description of the video:
Scala Parallel Collections
Aleksandar Prokopec
Parallel programming abstractions become increasingly important as the number of processor cores grows. A high-level programming model enables the programmer to focus more on the program and less on low-level details such as synchronization and load-balancing. Scala parallel collections extend the programming model of the Scala collection framework, providing parallel operations on datasets.
The talk will describe the architecture of the parallel collection framework, explaining their implementation and design decisions. Concrete collection implementations such as parallel hash maps and parallel hash tries will be described. Finally, several example applications will be shown, demonstrating the programming model in practice.
The definition of aggregate in TraversableOnce source is:
def aggregate[B](z: B)(seqop: (B, A) => B, combop: (B, B) => B): B =
foldLeft(z)(seqop)
which is no different than a simple foldLeft. combop doesn't seem to be used anywhere. I am myself confused as to what the purpose of this method is.
Just to clarify the explanations of those before me: in theory, the idea is that aggregate should work like this (I have changed the names of the parameters to make them clearer):
Seq(1,2,3,4).aggregate(0)(
  addToPrev = (prev, curr) => prev + curr,
  combineSums = (sumA, sumB) => sumA + sumB)
Should logically translate to
Seq(1,2,3,4)
  .grouped(2) // split into groups of 2 members each
  .map(prevAndCurrList => prevAndCurrList(0) + prevAndCurrList(1))
  .foldLeft(0)((sumA, sumB) => sumA + sumB)
Because the aggregation and the mapping are separate, the original list could theoretically be split into different groups of different sizes and run in parallel, or even on different machines.
In practice, Scala's current implementation does not support this feature by default, but you can do this in your own code.
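A runnable check of that logical translation (a sketch: aggregate here is the sequential, pre-2.13 method, and grouped(2) assumes an even-length list):
val viaAggregate = Seq(1, 2, 3, 4).aggregate(0)((prev, curr) => prev + curr, (a, b) => a + b)
val viaGrouping = Seq(1, 2, 3, 4)
  .grouped(2)                     // Iterator(Seq(1, 2), Seq(3, 4))
  .map(pair => pair(0) + pair(1)) // Iterator(3, 7)
  .foldLeft(0)((sumA, sumB) => sumA + sumB)
println(viaAggregate == viaGrouping) // true: both are 10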

What is Scala's yield?

I understand Ruby and Python's yield. What does Scala's yield do?
I think the accepted answer is great, but it seems many people have failed to grasp some fundamental points.
First, Scala's for comprehensions are equivalent to Haskell's do notation: nothing more than syntactic sugar for the composition of multiple monadic operations. As this statement will most likely not help anyone who needs help, let's try again… :-)
Scala's for comprehensions are syntactic sugar for the composition of multiple operations with map, flatMap and filter. Or foreach. Scala actually translates a for-expression into calls to those methods, so any class providing them, or a subset of them, can be used with for comprehensions.
First, let's talk about the translations. There are very simple rules:
This
for(x <- c1; y <- c2; z <-c3) {...}
is translated into
c1.foreach(x => c2.foreach(y => c3.foreach(z => {...})))
This
for(x <- c1; y <- c2; z <- c3) yield {...}
is translated into
c1.flatMap(x => c2.flatMap(y => c3.map(z => {...})))
This
for(x <- c; if cond) yield {...}
is translated on Scala 2.7 into
c.filter(x => cond).map(x => {...})
or, on Scala 2.8, into
c.withFilter(x => cond).map(x => {...})
with a fallback to the former if the method withFilter is not available but filter is. Please see the section below for more information on this.
This
for(x <- c; y = ...) yield {...}
is translated into
c.map(x => (x, ...)).map { case (x, y) => {...} }
When you look at very simple for comprehensions, the map/foreach alternatives look, indeed, better. Once you start composing them, though, you can easily get lost in parenthesis and nesting levels. When that happens, for comprehensions are usually much clearer.
I'll show one simple example, and intentionally omit any explanation. You can decide which syntax was easier to understand.
l.flatMap(sl => sl.filter(el => el > 0).map(el => el.toString.length))
or
for {
sl <- l
el <- sl
if el > 0
} yield el.toString.length
withFilter
Scala 2.8 introduced a method called withFilter, whose main difference is that, instead of returning a new, filtered, collection, it filters on-demand. The filter method has its behavior defined based on the strictness of the collection. To understand this better, let's take a look at some Scala 2.7 code with List (strict) and Stream (non-strict):
scala> var found = false
found: Boolean = false
scala> List.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
7
9
scala> found = false
found: Boolean = false
scala> Stream.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
The difference happens because filter is immediately applied with List, returning a list of odds -- since found is false. Only then is foreach executed, but, by this time, changing found is meaningless, as filter has already executed.
In the case of Stream, the condition is not immediately applied. Instead, as each element is requested by foreach, filter tests the condition, which enables foreach to influence it through found. Just to make it clear, here is the equivalent for-comprehension code:
for (x <- List.range(1, 10); if x % 2 == 1 && !found)
if (x == 5) found = true else println(x)
for (x <- Stream.range(1, 10); if x % 2 == 1 && !found)
if (x == 5) found = true else println(x)
This caused many problems, because people expected the if to be considered on-demand, instead of being applied to the whole collection beforehand.
Scala 2.8 introduced withFilter, which is always non-strict, no matter the strictness of the collection. The following example shows List with both methods on Scala 2.8:
scala> var found = false
found: Boolean = false
scala> List.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
7
9
scala> found = false
found: Boolean = false
scala> List.range(1,10).withFilter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
This produces the result most people expect, without changing how filter behaves. As a side note, Range was changed from non-strict to strict between Scala 2.7 and Scala 2.8.
It is used in sequence comprehensions (like Python's list-comprehensions and generators, where you may use yield too).
It is applied in combination with for and writes a new element into the resulting sequence.
Simple example (from scala-lang)
/** Turn command line arguments to uppercase */
object Main {
  def main(args: Array[String]) {
    val res = for (a <- args) yield a.toUpperCase
    println("Arguments: " + res.toString)
  }
}
The corresponding expression in F# would be
[ for a in args -> a.toUpperCase ]
or
from a in args select a.toUpperCase
in Linq.
Ruby's yield has a different effect.
Yes, as Earwicker said, it's pretty much the equivalent to LINQ's select and has very little to do with Ruby's and Python's yield. Basically, where in C# you would write
from ... select ???
in Scala you have instead
for ... yield ???
It's also important to understand that for-comprehensions don't just work with sequences, but with any type which defines certain methods, just like LINQ (a minimal sketch follows this list):
If your type defines just map, it allows for-expressions consisting of a single generator.
If it defines flatMap as well as map, it allows for-expressions consisting of several generators.
If it defines foreach, it allows for-loops without yield (both with single and multiple generators).
If it defines filter, it allows for-filter expressions starting with an if in the for expression.
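Here is a minimal sketch of such a type: a hypothetical Box that defines just map and flatMap, and can therefore be used in for-expressions with multiple generators:
case class Box[A](value: A) {
  def map[B](f: A => B): Box[B] = Box(f(value))
  def flatMap[B](f: A => Box[B]): Box[B] = f(value)
}
val product = for {
  a <- Box(2)
  b <- Box(3)
} yield a * b
// desugars to Box(2).flatMap(a => Box(3).map(b => a * b)) == Box(6)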
Unless you get a better answer from a Scala user (which I'm not), here's my understanding.
It only appears as part of an expression beginning with for, which states how to generate a new list from an existing list.
Something like:
var doubled = for (n <- original) yield n * 2
So there's one output item for each input (although I believe there's a way of dropping duplicates).
This is quite different from the "imperative continuations" enabled by yield in other languages, where it provides a way to generate a list of any length, from some imperative code with almost any structure.
(If you're familiar with C#, it's closer to LINQ's select operator than it is to yield return).
Consider the following for-comprehension
val A = for (i <- Int.MinValue to Int.MaxValue; if i > 3) yield i
It may be helpful to read it out loud as follows
"For each integer i, if it is greater than 3, then yield (produce) i and add it to the list A."
In terms of mathematical set-builder notation, the above for-comprehension is analogous to
A = { i ∈ ℤ : i > 3 }
which may be read as
"For each integer i, if it is greater than 3, then it is a member of the set A."
or alternatively as
"A is the set of all integers i, such that each i is greater than 3."
The keyword yield in Scala is simply syntactic sugar which can be easily replaced by a map, as Daniel Sobral already explained in detail.
On the other hand, yield is absolutely misleading if you are looking for generators (or continuations) similar to those in Python. See this SO thread for more information: What is the preferred way to implement 'yield' in Scala?
yield is similar to a for loop that has a buffer we cannot see, and for each iteration it adds the next item to the buffer. When the for loop finishes running, it returns the collection of all the yielded values. yield can be used with simple arithmetic operators or even in combination with arrays.
Here are two simple examples for your better understanding
scala> for (i <- 1 to 5) yield i * 3
res: scala.collection.immutable.IndexedSeq[Int] = Vector(3, 6, 9, 12, 15)
scala> val nums = Seq(1,2,3)
nums: Seq[Int] = List(1, 2, 3)
scala> val letters = Seq('a', 'b', 'c')
letters: Seq[Char] = List(a, b, c)
scala> val res = for {
| n <- nums
| c <- letters
| } yield (n, c)
res: Seq[(Int, Char)] = List((1,a), (1,b), (1,c), (2,a), (2,b), (2,c), (3,a), (3,b), (3,c))
Hope this helps!!
val aList = List( 1,2,3,4,5 )
val res3 = for ( al <- aList if al > 3 ) yield al + 1
val res4 = aList.filter(_ > 3).map(_ + 1)
println( res3 )
println( res4 )
These two pieces of code are equivalent.
val res3 = for (al <- aList) yield al + 1 > 3
val res4 = aList.map( _+ 1 > 3 )
println( res3 )
println( res4 )
These two pieces of code are also equivalent.
Map is as flexible as yield and vice-versa.
val doubledNums = for (n <- nums) yield n * 2
val ucNames = for (name <- names) yield name.capitalize
Notice that both of those for-expressions use the yield keyword:
Using yield after for is the “secret sauce” that says, “I want to yield a new collection from the existing collection that I’m iterating over in the for-expression, using the algorithm shown.”
taken from here
The Scala documentation clearly says: "yield a new collection from the existing collection".
Another Scala documentation says, "Scala offers a lightweight notation for expressing sequence comprehensions. Comprehensions have the form for (enums) yield e, where enums refers to a semicolon-separated list of enumerators. An enumerator is either a generator which introduces new variables, or it is a filter. "
yield is more flexible than map(); see the example below.
val aList = List( 1,2,3,4,5 )
val res3 = for ( al <- aList if al > 3 ) yield al + 1
val res4 = aList.map( _+ 1 > 3 )
println( res3 )
println( res4 )
The yield version will produce a result like List(5, 6), which is good,
while the map() version will return a result like List(false, false, true, true, true), which is probably not what you intend.