Scala: return a value with a chance? - scala

I'd like to model a function which would return a set of values with weighed probability. Something like this:
25% => return "a"
25% => return "b"
50% => return "c"
Most of the documentation I've seen so far is rather heavy and delves quickly into scientific depths without examples, thus the question:
What's the easiest way to achieve this?
EDIT: I am using Gatling DSL to write a load test with weighed actions. The built-in weighed distribution has a limitation (won't work in loops), which I would like to avoid by having an own implementation. The snippet looks like this:
override def scenarioBuilder = scenario(getClass.getSimpleName)
.exec(Account.create)
.exec(Account.login)
.exec(Account.activate)
.exec(Loop.create)
.forever(getAction)
def getAction = {
// Here is where I lost my wits
// 27.6% => Log.putEvents
// 18.2% => Log.putBinary
// 17.1% => Loop.list
// 14.8% => Key.listIncomingRequests
// rest => Account.get
}

Here's the shortest version of a generic function to choose based on probabilities:
val probabilityMap = List(0.25 -> "a", 0.25 -> "b")
val otherwise = "c"
def getWithProbability[T](probs: List[(Double, T)], otherwise: T): T = {
// some input validations:
assert(probs.map(_._1).sum <= 1.0)
assert(probs.map(_._1).forall(_ > 0))
// get random in (0, 1) range
val rand = Random.nextDouble()
// choose by probability:
probs.foldLeft((rand, otherwise)) {
case ((r, _), (prob, value)) if prob > r => (1.0, value)
case ((r, result), (prob, _)) => (r-prob, result)
}._2
}
foldLeft keeps advancing over the probabilities until it finds one where r is smaller; If we haven't found it, we move onto the next probability with r-prob to "remove" the part of the range we've already covered.
EDIT: an equivalent but perhaps easier to read version can use scanLeft to create cumulative ranges before "searching" for the range within which rand has landed:
def getWithProbability[T](probs: List[(Double, T)], otherwise: T): T = {
// same validations..
// get random in (0, 1) range
val rand = Random.nextDouble()
// create cumulative values, in this case (0.25, a), (0,5, b)
val ranges = probs.tail.scanLeft(probs.head) {
case ((prob1, _), (prob2, value)) => (prob1+prob2, value)
}
ranges.dropWhile(_._1 < rand).map(_._2).headOption.getOrElse(otherwise)
}

There, clean and ready:
randomSwitchOrElse(
27.6 -> exec(Log.putEvents),
18.2 -> exec(Log.putBinary))

Related

Performance around functional programming in scala

I'm working with below stuff as a way to learn functional programming and scala, I came from a python background.
case class Point(x: Int, y:Int)
object Operation extends Enumeration {
type Operation = Value
val TurnOn, TurnOff, Toggle = Value
}
object Status extends Enumeration {
type Status = Value
val On, Off = Value
}
val inputs: List[String]
def parseInputs(s: String): (Point, Point, Operation)
Idea is that we have a light matrix(Point), each Point can be either On or Off as describe in Status.
My inputs is a series of command, asking to either TurnOn, TurnOff or Toggle all the lights from one Point to another Point (The rectangular area defined using two points are bottom-left corner and upper-right corner).
My original solution is like this:
type LightStatus = mutable.Map[Point, Status]
val lightStatus = mutable.Map[Point, Status]()
def updateStatus(p1: Point, p2: Point, op: Operation): Unit = {
(p1, p2) match {
case (Point(x1, y1), Point(x2, y2)) =>
for (x <- x1 to x2)
for (y <- y1 to y2) {
val p = Point(x, y)
val currentStatus = lightStatus.getOrElse(p, Off)
(op, currentStatus) match {
case (TurnOn, _) => lightStatus.update(p, On)
case (TurnOff, _) => lightStatus.update(p, Off)
case (Toggle, On) => lightStatus.update(p, Off)
case (Toggle, Off) => lightStatus.update(p, On)
}
}
}
}
for ((p1, p2, op) <- inputs.map(parseInputs)) {
updateStatus(p1, p2, op)
}
Now I have lightStatus as a map to describe the end status of the entire matrix. This works, but seems less functional to me as I was using a mutable Map instead of an immutable object, so I tried to re-factor this into a more functional way, I ended up with this:
inputs.flatMap(s => parseInputs(s) match {
case (Point(x1, y1), Point(x2, y2), op) =>
for (x <- x1 to x2;
y <- y1 to y2)
yield (Point(x, y), op)
}).foldLeft(Map[Point, Status]())((m, item) => {
item match {
case (p, op) =>
val currentStatus = m.getOrElse(p, Off)
(op, currentStatus) match {
case (TurnOn, _) => m.updated(p, On)
case (TurnOff, _) => m.updated(p, Off)
case (Toggle, On) => m.updated(p, Off)
case (Toggle, Off) => m.updated(p, On)
}
}
})
I have couple questions regarding this process:
My second version doesn't seem as clean and straightforward as the first version to me, I'm not sure if this is because I'm not that familiar with functional programming or I just wrote bad functional code.
Is there someway to simplify the syntax on second piece? Especially the (m, item) => ??? function in the foldLeft part? Something like (m, (point, operation)) => ??? gives me syntax error
The second piece of code takes significantly longer to run, which surprise me a bit as these two code essentially is doing the same thing, As I don't have too much Java background, Any idea what might be causing the performance issue?
Many thanks!
From a Functional Programming perspective, your code suffers from the fact that...
The lightStatus Map "maintains state" and thus requires mutation.
A large "area" of status changes == a large number of data updates.
If you can accept each light status as a Boolean value, here's a design that requires no mutation and has fast status updates even over very large areas.
case class Point(x: Int, y:Int)
class LightGrid private (status: Point => Boolean) {
def apply(p: Point): Boolean = status(p)
private def isWithin(p:Point, ll:Point, ur:Point) =
ll.x <= p.x && ll.y <= p.y && p.x <= ur.x && p.y <= ur.y
//each light op returns a new LightGrid
def turnOn(lowerLeft: Point, upperRight: Point): LightGrid =
new LightGrid(point =>
isWithin(point, lowerLeft, upperRight) || status(point))
def turnOff(lowerLeft: Point, upperRight: Point): LightGrid =
new LightGrid(point =>
!isWithin(point, lowerLeft, upperRight) && status(point))
def toggle(lowerLeft: Point, upperRight: Point): LightGrid =
new LightGrid(point =>
isWithin(point, lowerLeft, upperRight) ^ status(point))
}
object LightGrid { //the public constructor
def apply(): LightGrid = new LightGrid(_ => false)
}
usage:
val ON = true
val OFF = false
val lg = LightGrid().turnOn(Point(2,2), Point(11,11)) //easy numbers
.turnOff(Point(8,8), Point(10,10))
.toggle(Point(1,1), Point(9,9))
lg(Point(1,1)) //ON
lg(Point(7,7)) //OFF
lg(Point(8,8)) //ON
lg(Point(9,9)) //ON
lg(Point(10,10)) //OFF
lg(Point(11,11)) //ON
lg(Point(12,12)) //OFF

scala using calculations from pattern matching's guard (if) in body

I'm using pattern matching in scala a lot. Many times I need to do some calculations in guard part and sometimes they are pretty expensive. Is there any way to bind calculated values to separate value?
//i wan't to use result of prettyExpensiveFunc in body safely
people.collect {
case ...
case Some(Right((x, y))) if prettyExpensiveFunc(x, y) > 0 => prettyExpensiveFunc(x)
}
//ideally something like that could be helpful, but it doesn't compile:
people.collect {
case ...
case Some(Right((x, y))) if {val z = prettyExpensiveFunc(x, y); y > 0} => z
}
//this sollution works but it isn't safe for some `Seq` types and is risky when more cases are used.
var cache:Int = 0
people.collect {
case ...
case Some(Right((x, y))) if {cache = prettyExpensiveFunc(x, y); cache > 0} => cache
}
Is there any better solution?
ps: Example is simplified and I don't expect anwers that shows that I don't need pattern matching here.
You can use cats.Eval to make expensive calculations lazy and memoizable, create Evals using .map and extract .value (calculated at most once - if needed) in .collect
values.map { value =>
val expensiveCheck1 = Eval.later { prettyExpensiveFunc(value) }
val expensiveCheck2 = Eval.later { anotherExpensiveFunc(value) }
(value, expensiveCheck1, expensiveCheck2)
}.collect {
case (value, lazyResult1, _) if lazyResult1.value > 0 => ...
case (value, _, lazyResult2) if lazyResult2.value > 0 => ...
case (value, lazyResult1, lazyResult2) if lazyResult1.value > lazyResult2.value => ...
...
}
I don't see a way of doing what you want without creating some implementation of lazy evaluation, and if you have to use one, you might as well use existing one instead of rolling one yourself.
EDIT. Just in case you haven't noticed - you aren't losing the ability to pattern match by using tuple here:
values.map {
// originial value -> lazily evaluated memoized expensive calculation
case a # Some(Right((x, y)) => a -> Some(Eval.later(prettyExpensiveFunc(x, y)))
case a => a -> None
}.collect {
// match type and calculation
...
case (Some(Right((x, y))), Some(lazyResult)) if lazyResult.value > 0 => ...
...
}
Why not run the function first for every element and then work with a tuple?
Seq(1,2,3,4,5).map(e => (e, prettyExpensiveFunc(e))).collect {
case ...
case (x, y) if y => y
}
I tried own matchers and effect is somehow OK, but not perfect. My matcher is untyped, and it is bit ugly to make it fully typed.
class Matcher[T,E](f:PartialFunction[T, E]) {
def unapply(z: T): Option[E] = if (f.isDefinedAt(z)) Some(f(z)) else None
}
def newMatcherAny[E](f:PartialFunction[Any, E]) = new Matcher(f)
def newMatcher[T,E](f:PartialFunction[T, E]) = new Matcher(f)
def prettyExpensiveFunc(x:Int) = {println(s"-- prettyExpensiveFunc($x)"); x%2+x*x}
val x = Seq(
Some(Right(22)),
Some(Right(10)),
Some(Left("Oh now")),
None
)
val PersonAgeRank = newMatcherAny { case Some(Right(x:Int)) => (x, prettyExpensiveFunc(x)) }
x.collect {
case PersonAgeRank(age, rank) if rank > 100 => println("age:"+age + " rank:" + rank)
}
https://scalafiddle.io/sf/hFbcAqH/3

Parallel Aggregate is not working on lists .length > 8

I'm writing a small exercise app that calculates number of unique letters (incl Unicode) in a seq of strings, and I'm using aggregate for it, as I try to run in parallel
here's my code:
class Frequency(seq: Seq[String]) {
type FreqMap = Map[Char, Int]
def calculate() = {
val freqMap: FreqMap = Map[Char, Int]()
val pattern = "(\\p{L}+)".r
val seqop: (FreqMap, String) => FreqMap = (fm, s) => {
s.toLowerCase().foldLeft(freqMap){(fm, c) =>
c match {
case pattern(char) => fm.get(char) match {
case None => fm+((char, 1))
case Some(i) => fm.updated(char, i+1)
}
case _ => fm
}
}
}
val reduce: (FreqMap, FreqMap) => FreqMap =
(m1, m2) => {
m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) }
}
seq.par.aggregate(freqMap)(seqop, reduce)
}
}
and then the code that makes use of that
object Frequency extends App {
val text = List("abc", "abc", "abc", "abc", "abc", "abc", "abc", "abc", "abc");
def frequency(seq: Seq[String]):Map[Char, Int] = {
new Frequency(seq).calculate()
}
Console println frequency(seq=text)
}
though I supplied "abc" 9 times, the result is Map(a -> 8, b -> 8, c -> 8), as it is for any number of "abc"'s > 8
I've looked at this, and it seems like I'm using aggregate correctly
Any suggestions to make it work?
You're discarding already collected results (the first fm) in your seqop. You need to add these to the new results you're computing, e.g. like this:
def calculate() = {
val freqMap: FreqMap = Map[Char, Int]()
val pattern = "(\\p{L}+)".r
val reduce: (FreqMap, FreqMap) => FreqMap =
(m1, m2) => {
m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) }
}
val seqop: (FreqMap, String) => FreqMap = (fm, s) => {
val res = s.toLowerCase().foldLeft(freqMap){(fm, c) =>
c match {
case pattern(char) => fm.get(char) match {
case None => fm+((char, 1))
case Some(i) => fm.updated(char, i+1)
}
case _ => fm
}
}
// I'm reusing your existing combinator function here:
reduce(res,fm)
}
seq.par.aggregate(freqMap)(seqop, reduce)
}
Depending on how the parallel collections divide the work you discard some of it. In your case (9x "abc") it divides the thing in 8 parallel seqop operations which means you discard exactly one result set. This varies depending on numbers, if you run in with say 17x "abc" it runs in 13 parallel operations, discarding 4 result sets (on my machine anyway - I'm not familiar with the underlying code and how it divides the work, this probably depends on the used ExecutionContext/Threadpool and subsequently number of CPUs/cores and so on).
Generally parallel collections are a drop in replacement for sequential collections, meaning if you drop .par you should still get the same result, albeit usually slower. If you do this with your original code you get a result of 1, which tells you that it's not a parallelization problem. This is a good way to test if you're doing to right thing when using these.
And last but not least: This was harder to spot than usual for me because you use the same variable name twice and subsequently shadow fm. Not doing that would make the code more readable and mistakes such as this easier to spot.

Accumulate result until some condition is met in functional way

I have some expensive computation in a loop, and I need to find max value produced by the calculations, though if, say, it will equal to LIMIT I'd like to stop the calculation and return my accumulator.
It may easily be done by recursion:
val list: List[Int] = ???
val UpperBound = ???
def findMax(ls: List[Int], max: Int): Int = ls match {
case h :: rest =>
val v = expensiveComputation(h)
if (v == UpperBound) v
else findMax(rest, math.max(max, v))
case _ => max
}
findMax(list, 0)
My question: whether this behaviour template has a name and reflected in scala collection library?
Update: Do something up to N times or until condition is met in Scala - There is an interesting idea (using laziness and find or exists at the end) but it is not directly applicable to my particular case or requires mutable var to track accumulator.
I think your recursive function is quite nice, so honestly I wouldn't change that, but here's a way to use the collections library:
list.foldLeft(0) {
case (max, next) =>
if(max == UpperBound)
max
else
math.max(expensiveComputation(next), max)
}
It will iterate over the whole list, but after it has hit the upper bound it won't perform the expensive computation.
Update
Based on your comment I tried adapting foldLeft a bit, based on LinearSeqOptimized's foldLeft implementation.
def foldLeftWithExit[A, B](list: Seq[A])(z: B)(exit: B => Boolean)(f: (B, A) => B): B = {
var acc = z
var remaining = list
while (!remaining.isEmpty && !exit(acc)) {
acc = f(acc, list.head)
remaining = remaining.tail
}
acc
}
Calling it:
foldLeftWithExit(list)(0)(UpperBound==){
case (max, next) => math.max(expensiveComputation(next), max)
}
You could potentially use implicits to omit the first parameter of list.
Hope this helps.

Abort early in a fold

What's the best way to terminate a fold early? As a simplified example, imagine I want to sum up the numbers in an Iterable, but if I encounter something I'm not expecting (say an odd number) I might want to terminate. This is a first approximation
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
nums.foldLeft (Some(0): Option[Int]) {
case (Some(s), n) if n % 2 == 0 => Some(s + n)
case _ => None
}
}
However, this solution is pretty ugly (as in, if I did a .foreach and a return -- it'd be much cleaner and clearer) and worst of all, it traverses the entire iterable even if it encounters a non-even number.
So what would be the best way to write a fold like this, that terminates early? Should I just go and write this recursively, or is there a more accepted way?
My first choice would usually be to use recursion. It is only moderately less compact, is potentially faster (certainly no slower), and in early termination can make the logic more clear. In this case you need nested defs which is a little awkward:
def sumEvenNumbers(nums: Iterable[Int]) = {
def sumEven(it: Iterator[Int], n: Int): Option[Int] = {
if (it.hasNext) {
val x = it.next
if ((x % 2) == 0) sumEven(it, n+x) else None
}
else Some(n)
}
sumEven(nums.iterator, 0)
}
My second choice would be to use return, as it keeps everything else intact and you only need to wrap the fold in a def so you have something to return from--in this case, you already have a method, so:
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
Some(nums.foldLeft(0){ (n,x) =>
if ((n % 2) != 0) return None
n+x
})
}
which in this particular case is a lot more compact than recursion (though we got especially unlucky with recursion since we had to do an iterable/iterator transformation). The jumpy control flow is something to avoid when all else is equal, but here it's not. No harm in using it in cases where it's valuable.
If I was doing this often and wanted it within the middle of a method somewhere (so I couldn't just use return), I would probably use exception-handling to generate non-local control flow. That is, after all, what it is good at, and error handling is not the only time it's useful. The only trick is to avoid generating a stack trace (which is really slow), and that's easy because the trait NoStackTrace and its child trait ControlThrowable already do that for you. Scala already uses this internally (in fact, that's how it implements the return from inside the fold!). Let's make our own (can't be nested, though one could fix that):
import scala.util.control.ControlThrowable
case class Returned[A](value: A) extends ControlThrowable {}
def shortcut[A](a: => A) = try { a } catch { case Returned(v) => v }
def sumEvenNumbers(nums: Iterable[Int]) = shortcut{
Option(nums.foldLeft(0){ (n,x) =>
if ((x % 2) != 0) throw Returned(None)
n+x
})
}
Here of course using return is better, but note that you could put shortcut anywhere, not just wrapping an entire method.
Next in line for me would be to re-implement fold (either myself or to find a library that does it) so that it could signal early termination. The two natural ways of doing this are to not propagate the value but an Option containing the value, where None signifies termination; or to use a second indicator function that signals completion. The Scalaz lazy fold shown by Kim Stebel already covers the first case, so I'll show the second (with a mutable implementation):
def foldOrFail[A,B](it: Iterable[A])(zero: B)(fail: A => Boolean)(f: (B,A) => B): Option[B] = {
val ii = it.iterator
var b = zero
while (ii.hasNext) {
val x = ii.next
if (fail(x)) return None
b = f(b,x)
}
Some(b)
}
def sumEvenNumbers(nums: Iterable[Int]) = foldOrFail(nums)(0)(_ % 2 != 0)(_ + _)
(Whether you implement the termination by recursion, return, laziness, etc. is up to you.)
I think that covers the main reasonable variants; there are some other options also, but I'm not sure why one would use them in this case. (Iterator itself would work well if it had a findOrPrevious, but it doesn't, and the extra work it takes to do that by hand makes it a silly option to use here.)
The scenario you describe (exit upon some unwanted condition) seems like a good use case for the takeWhile method. It is essentially filter, but should end upon encountering an element that doesn't meet the condition.
For example:
val list = List(2,4,6,8,6,4,2,5,3,2)
list.takeWhile(_ % 2 == 0) //result is List(2,4,6,8,6,4,2)
This will work just fine for Iterators/Iterables too. The solution I suggest for your "sum of even numbers, but break on odd" is:
list.iterator.takeWhile(_ % 2 == 0).foldLeft(...)
And just to prove that it's not wasting your time once it hits an odd number...
scala> val list = List(2,4,5,6,8)
list: List[Int] = List(2, 4, 5, 6, 8)
scala> def condition(i: Int) = {
| println("processing " + i)
| i % 2 == 0
| }
condition: (i: Int)Boolean
scala> list.iterator.takeWhile(condition _).sum
processing 2
processing 4
processing 5
res4: Int = 6
You can do what you want in a functional style using the lazy version of foldRight in scalaz. For a more in depth explanation, see this blog post. While this solution uses a Stream, you can convert an Iterable into a Stream efficiently with iterable.toStream.
import scalaz._
import Scalaz._
val str = Stream(2,1,2,2,2,2,2,2,2)
var i = 0 //only here for testing
val r = str.foldr(Some(0):Option[Int])((n,s) => {
println(i)
i+=1
if (n % 2 == 0) s.map(n+) else None
})
This only prints
0
1
which clearly shows that the anonymous function is only called twice (i.e. until it encounters the odd number). That is due to the definition of foldr, whose signature (in case of Stream) is def foldr[B](b: B)(f: (Int, => B) => B)(implicit r: scalaz.Foldable[Stream]): B. Note that the anonymous function takes a by name parameter as its second argument, so it need no be evaluated.
Btw, you can still write this with the OP's pattern matching solution, but I find if/else and map more elegant.
Well, Scala does allow non local returns. There are differing opinions on whether or not this is a good style.
scala> def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
| nums.foldLeft (Some(0): Option[Int]) {
| case (None, _) => return None
| case (Some(s), n) if n % 2 == 0 => Some(s + n)
| case (Some(_), _) => None
| }
| }
sumEvenNumbers: (nums: Iterable[Int])Option[Int]
scala> sumEvenNumbers(2 to 10)
res8: Option[Int] = None
scala> sumEvenNumbers(2 to 10 by 2)
res9: Option[Int] = Some(30)
EDIT:
In this particular case, as #Arjan suggested, you can also do:
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
nums.foldLeft (Some(0): Option[Int]) {
case (Some(s), n) if n % 2 == 0 => Some(s + n)
case _ => return None
}
}
You can use foldM from cats lib (as suggested by #Didac) but I suggest to use Either instead of Option if you want to get actual sum out.
bifoldMap is used to extract the result from Either.
import cats.implicits._
def sumEven(nums: Stream[Int]): Either[Int, Int] = {
nums.foldM(0) {
case (acc, n) if n % 2 == 0 => Either.right(acc + n)
case (acc, n) => {
println(s"Stopping on number: $n")
Either.left(acc)
}
}
}
examples:
println("Result: " + sumEven(Stream(2, 2, 3, 11)).bifoldMap(identity, identity))
> Stopping on number: 3
> Result: 4
println("Result: " + sumEven(Stream(2, 7, 2, 3)).bifoldMap(identity, identity))
> Stopping on number: 7
> Result: 2
Cats has a method called foldM which does short-circuiting (for Vector, List, Stream, ...).
It works as follows:
def sumEvenNumbers(nums: Stream[Int]): Option[Long] = {
import cats.implicits._
nums.foldM(0L) {
case (acc, c) if c % 2 == 0 => Some(acc + c)
case _ => None
}
}
If it finds a not even element it returns None without computing the rest, otherwise it returns the sum of the even entries.
If you want to keep count until an even entry is found, you should use an Either[Long, Long]
#Rex Kerr your answer helped me, but I needed to tweak it to use Either
def foldOrFail[A,B,C,D](map: B => Either[D, C])(merge: (A, C) => A)(initial: A)(it: Iterable[B]): Either[D, A] = {
val ii= it.iterator
var b= initial
while (ii.hasNext) {
val x= ii.next
map(x) match {
case Left(error) => return Left(error)
case Right(d) => b= merge(b, d)
}
}
Right(b)
}
You could try using a temporary var and using takeWhile. Here is a version.
var continue = true
// sample stream of 2's and then a stream of 3's.
val evenSum = (Stream.fill(10)(2) ++ Stream.fill(10)(3)).takeWhile(_ => continue)
.foldLeft(Option[Int](0)){
case (result,i) if i%2 != 0 =>
continue = false;
// return whatever is appropriate either the accumulated sum or None.
result
case (optionSum,i) => optionSum.map( _ + i)
}
The evenSum should be Some(20) in this case.
You can throw a well-chosen exception upon encountering your termination criterion, handling it in the calling code.
A more beutiful solution would be using span:
val (l, r) = numbers.span(_ % 2 == 0)
if(r.isEmpty) Some(l.sum)
else None
... but it traverses the list two times if all the numbers are even
Just for an "academic" reasons (:
var headers = Source.fromFile(file).getLines().next().split(",")
var closeHeaderIdx = headers.takeWhile { s => !"Close".equals(s) }.foldLeft(0)((i, S) => i+1)
Takes twice then it should but it is a nice one liner.
If "Close" not found it will return
headers.size
Another (better) is this one:
var headers = Source.fromFile(file).getLines().next().split(",").toList
var closeHeaderIdx = headers.indexOf("Close")