Optimization of foldLeft - scala

I'm using the following code, and I'm looking for some ideas to make some optimizations.
analyzePayload:
Input: payload which is JsObject and list of rules, each rule has several conditions.
Output: MyReport of all the rules which succeed, notApplicable or failed on this specific payload.
The list can be pretty big, and each Rule has a large number of conditions.
I am looking for ideas on how to optimize this code, perhaps with a lazy collection, a view, a stream, or tail recursion, and why each would help. Thanks!
Also note that I have an analyzeMode setting which can, for example, make the analysis run only until the first rule succeeds.
def analyzePayload(payload: JsObject, rules: List[Rule]): MyReport = {
val analyzeMode = appConfig.analyzeMode
val (succeed, notApplicable, failed) = rules.foldLeft((List[Rule](), List[Rule](), List[Rule]())) { case ( seed @ (succeedRules,notApplicableRules,failedRules), currRule) =>
// Evaluate Single Rule
def step(): (List[Rule], List[Rule], List[Rule]) = evalService.eval(currRule, payload) match {
// If the result is succeed
case EvalResult(true, _, _) => (currRule :: succeedRules, notApplicableRules, failedRules)
// If the result is notApplicable
case EvalResult(_, missing @ _ :: _, _) => (succeedRules, currRule :: notApplicableRules, failedRules)
// If the result is unmatched
case EvalResult(_, _, unmatched @ _ :: _) => (succeedRules, notApplicableRules, currRule :: failedRules)
}
analyzeMode match {
case UNTIL_FIRST_SUCCEED => if(succeedRules.isEmpty) step() else seed
case UNTIL_FIRST_NOT_APPLICABLE => if(notApplicableRules.isEmpty) step() else seed
case UNTIL_FIRST_FAILED => if(failedRules.isEmpty) step() else seed
case DEFAULT => step()
case _ => throw new IllegalArgumentException(s"Unknown mode = ${analyzeMode}")
}
}
MyReport(succeed.reverse, notApplicable.reverse, failed.reverse)
}
First Edit:
Changed the code to use tailrec following @Tim's advice. Any other suggestions, or ideas to make the code a little prettier?
Also, I wanted to ask whether there is any difference in calling view before the foldLeft in the previous implementation.
Would another collection such as ListBuffer or Vector also help?
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): MyReport = {
val analyzeMode = appConfig.analyzeMode
def isCompleted(succeed: List[Rule], notApplicable: List[Rule], failed: List[Rule]) = ((succeed, notApplicable, failed), analyzeMode) match {
case (( _ :: _, _, _), UNTIL_FIRST_SUCCEED) | (( _,_ :: _, _), UNTIL_FIRST_NOT_APPLICABLE) | (( _, _, _ :: _), UNTIL_FIRST_FAILED) => true
case (_, DEFAULT) => false
case _ => throw new IllegalArgumentException(s"Unknown mode on analyzePayload with mode = ${analyzeMode}")
}
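@tailrec
def _analyzePayload(actionRules: List[ActionRule])(succeed: List[Rule], notApplicable: List[Rule], failed: List[Rule]): (List[Rule], List[Rule] ,List[Rule]) = actionRules match {
// Stop early once the selected mode is satisfied. Nil gets its own case: in `case Nil | _ if ...` the guard applies to both alternatives, so an exhausted list with a false guard would fall through and throw a MatchError.
case _ if isCompleted(succeed, notApplicable, failed) => (succeed, notApplicable, failed)
case Nil => (succeed, notApplicable, failed)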
case actionRule :: tail => actionRuleService.eval(actionRule, payload) match {
// If the result is succeed
case EvalResult(true, _, _) => _analyzePayload(tail)(actionRule :: succeed, notApplicable, failed)
// If the result is notApplicable
case EvalResult(_, missing @ _ :: _, _) => _analyzePayload(tail)(succeed, actionRule :: notApplicable, failed)
// If the result is unmatched
case EvalResult(_, _, unmatched @ _ :: _) => _analyzePayload(tail)(succeed, notApplicable, actionRule :: failed)
}
}
val res = _analyzePayload(actionRules)(Nil,Nil,Nil)
MyReport(res._1, res._2, res._3)
}
Edit 2: (Questions)
If the result will be forwarded to the client, is there any point in using a view, since all the data will be evaluated anyway?
Should I use ParSeq instead, or would that just be slower because evalService.eval(...) is not a heavy operation?

Two obvious optimisations:
Use a tail-recursive function rather than foldLeft so that the compiler can generate an optimised loop and terminate as soon as the appropriate rule is found.
Since analyzeMode is constant, take the match outside the foldLeft. Either have separate code paths for each mode, or use analyzeMode to select a function that is used inside the loop to check for termination.
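A minimal sketch of both suggestions combined, under simplified assumptions: `Rule`, `Outcome`, `Mode`, `Report` and the `eval` parameter are hypothetical stand-ins for the question's `Rule`, `EvalResult`, `analyzeMode`, `MyReport` and `evalService.eval`. The mode is matched once to pick a `done` predicate, and the tail-recursive loop stops as soon as that predicate holds.

```scala
import scala.annotation.tailrec

object AnalyzeSketch {
  // Hypothetical stand-ins for the types used in the question.
  final case class Rule(id: Int)

  sealed trait Outcome
  case object Succeeded extends Outcome
  case object NotApplicable extends Outcome
  case object Failed extends Outcome

  sealed trait Mode
  case object UntilFirstSucceed extends Mode
  case object UntilFirstNotApplicable extends Mode
  case object UntilFirstFailed extends Mode
  case object Default extends Mode

  final case class Report(succeed: List[Rule], notApplicable: List[Rule], failed: List[Rule])

  def analyze(rules: List[Rule], mode: Mode)(eval: Rule => Outcome): Report = {
    // Match on the mode once, outside the loop, to select the termination check.
    val done: (List[Rule], List[Rule], List[Rule]) => Boolean = mode match {
      case UntilFirstSucceed       => (s, _, _) => s.nonEmpty
      case UntilFirstNotApplicable => (_, n, _) => n.nonEmpty
      case UntilFirstFailed        => (_, _, f) => f.nonEmpty
      case Default                 => (_, _, _) => false
    }

    @tailrec
    def loop(rest: List[Rule], s: List[Rule], n: List[Rule], f: List[Rule]): Report =
      rest match {
        case rule :: tail if !done(s, n, f) =>
          eval(rule) match {
            case Succeeded     => loop(tail, rule :: s, n, f)
            case NotApplicable => loop(tail, s, rule :: n, f)
            case Failed        => loop(tail, s, n, rule :: f)
          }
        // Either the input is exhausted or the mode's condition is met.
        case _ => Report(s.reverse, n.reverse, f.reverse)
      }

    loop(rules, Nil, Nil, Nil)
  }
}
```

With `UntilFirstSucceed`, rules after the first success are never passed to `eval`, which is the saving the foldLeft version cannot offer because it always visits every element.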

The code is rather fine; the main thing to revisit would be making evalService.eval evaluate multiple rules in a single traversal of the JSON object, assuming the size of the JSON is not negligible.

Related

Parser combinator handling alternation with an optional parser

I have a parser p of type Parser[Option[X]] and another q of type Parser[Y]. (X and Y are concrete types but that's not important here).
I'd like to combine them in such a way that the resulting parser returns a Parser[Either[X, Y]]. This parser will succeed with Left(x) if p yields Some(x) or, failing that, it will succeed with Right(y) if q yields a y. Otherwise, it will fail. Input will be consumed in the successful cases but not in the unsuccessful case.
I'd appreciate any help with this as I can't quite figure out how to make it work.
A little more perseverance after taking a break and I was able to solve this. I don't think my solution is the most elegant and would appreciate feedback:
def compose[X, Y](p: Parser[Option[X]], q: Parser[Y]): Parser[Either[X, Y]] = Parser {
in =>
p(in) match {
case s@this.Success(Some(_), _) => s map (xo => Left(xo.get))
case _ => q(in) match {
case s@this.Success(_, _) => s map (x => Right(x))
case _ => this.Failure("combine: failed", in)
}
}
}
implicit class ParserOps[X](p: Parser[Option[X]]) {
def ?|[Y](q: => Parser[Y]): Parser[Either[X, Y]] = compose(p, q)
}
// Example of usage
def anadicTerm: Parser[AnadicTerm] = (maybeNumber ?| anadicOperator) ^^ {
case Left(x: Number) => debug("anadicTerm (Number)", AnadicTerm(Right(x)))
case Right(x: String) => debug("anadicTerm (String)", AnadicTerm(Left(x)))
}

foldLeft vs foldRight vs tailrec on Scala List

Assume I have a sorted collection c: List[T].
I need to aggregate it with foldLeft/foldRight into a (List[T], List[T], List[T]), with each list still sorted.
To do this, I have two options:
foldLeft, then reverse each of the lists (prepending with ::, since appending would be O(n) at each step)
foldRight, which keeps the original order (also prepending with ::)
I know that foldLeft is implemented tail-recursively and is optimized, but it needs O(n) additional steps to reverse the collections. In this use case, which method gives better performance?
Assume that the op is O(1).
Example of the code:
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val analyzeMode = appConfig.analyzeMode
val (succeed, notApplicable, failed) = actionRules.foldLeft((List[RuleResult](), List[RuleResult](), List[RuleResult]())) { case ( seed @ (succeed, notApplicable, failed), actionRule) =>
// Evaluate Single ActionRule
def step(): (List[RuleResult], List[RuleResult], List[RuleResult]) = actionRuleService.eval(actionRule, payload) match {
// If the result is succeed
case EvalResult(true, _, _) => (RuleResult.fromSucceed(actionRule, appConfig.showSucceedAppData) :: succeed, notApplicable, failed)
// If the result is notApplicable
case EvalResult(_, missing @ _ :: _, _) => (succeed, RuleResult.fromNotApplicable(actionRule, appConfig.showNotApplicableAppData, missing) :: notApplicable, failed)
// If the result is unmatched
case EvalResult(_, _, unmatched @ _ :: _) => (succeed, notApplicable, RuleResult.fromFailed(actionRule, appConfig.showFailedAppData, unmatched) :: failed)
}
analyzeMode match {
case UntilFirstSucceed => if (succeed.isEmpty) step() else seed
case UntilFirstNotApplicable => if (notApplicable.isEmpty) step() else seed
case UntilFirstFailed => if (failed.isEmpty) step() else seed
case Default => step()
case _ => throw new RuntimeException(s"Unknown mode on analyzePayload with mode = ${analyzeMode}")
}
}
InspectorReport(succeed.reverse, notApplicable.reverse, failed.reverse)
}
Before that, I used a tailrec version, which performed badly (why?):
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val analyzeMode = appConfig.analyzeMode
def isCompleted(succeed: Int, notApplicable: Int, failed: Int) = {
analyzeMode match {
case Default => false
case UntilFirstSucceed => if (succeed == 0) false else true
case UntilFirstNotApplicable => if (notApplicable == 0) false else true
case UntilFirstFailed => if (failed == 0) false else true
}
}
@tailrec
def analyzePayloadRec(rules: List[ActionRule])(succeed: List[RuleResult], notApplicable: List[RuleResult], failed: List[RuleResult]): (List[RuleResult], List[RuleResult], List[RuleResult]) = {
if (isCompleted(succeed.size, notApplicable.size, failed.size)) (succeed, notApplicable, failed)
else rules match {
// Base cases:
case Nil => (succeed, notApplicable, failed)
// Evaluate case:
case nextRule :: tail =>
actionRuleService.eval(nextRule, payload) match {
// If the result is succeed
case EvalResult(true, _, _) => analyzePayloadRec(tail)(RuleResult.fromSucceed(nextRule, appConfig.showSucceedAppData) :: succeed, notApplicable, failed)
// If the result is notApplicable
case EvalResult(_, missing @ _ :: _, _) => analyzePayloadRec(tail)(succeed, RuleResult.fromNotApplicable(nextRule, appConfig.showNotApplicableAppData, missing) :: notApplicable, failed)
// If the result is unmatched
case EvalResult(_, _, unmatched @ _ :: _) => analyzePayloadRec(tail)(succeed, notApplicable, RuleResult.fromFailed(nextRule, appConfig.showFailedAppData, unmatched) :: failed)
}
}
}
analyzePayloadRec(actionRules.reverse)(Nil, Nil, Nil).toInspectorReport // todo: if the analyzeModes are not Default - consider use Streams for lazy collection
}
Performance:
The tailrec version was in use until 22:16; after that, the foldLeft code above.
As you can see, the foldLeft version performs much better on the same data.
The x-axis is the timeline and the y-axis is time in ms (1.4s).
Any ideas why? I expected tailrec to be faster, which is also why I am considering foldRight.
What makes your tailrec version slow is
isCompleted(succeed.size, notApplicable.size, failed.size)
Taking the size of a linked list in Scala is linear in its length, and you're computing the size of all three lists (at least two of which you never use, and none of which you use in Default mode).
On the nth iteration of the tailrec, the three size computations together walk about n list nodes, so n iterations cost at least O(n^2).
Passing the lists themselves to isCompleted (as you effectively do in the foldLeft version) and checking isEmpty should dramatically speed things up (especially in the Default case).
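A minimal sketch of that change, assuming a simplified `Mode` type (the names below mirror the question but are hypothetical stand-ins): `isCompleted` receives the lists themselves and only asks whether the relevant one is non-empty, which is a constant-time check on a `List`.

```scala
object IsCompletedSketch {
  // Hypothetical stand-in for the question's analyze modes.
  sealed trait Mode
  case object Default extends Mode
  case object UntilFirstSucceed extends Mode
  case object UntilFirstNotApplicable extends Mode
  case object UntilFirstFailed extends Mode

  // nonEmpty on a List only inspects the head cell, whereas size walks every node.
  def isCompleted[A](mode: Mode)(succeed: List[A], notApplicable: List[A], failed: List[A]): Boolean =
    mode match {
      case Default                 => false
      case UntilFirstSucceed       => succeed.nonEmpty
      case UntilFirstNotApplicable => notApplicable.nonEmpty
      case UntilFirstFailed        => failed.nonEmpty
    }
}
```

Each call now does constant work regardless of how many results have been accumulated, so the recursion as a whole stays linear in the number of rules.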
Assuming we have those case classes:
case class InspectorReport(succeed: List[ActionRule], notApplicable: List[ActionRule], failed: List[ActionRule])
case class ActionRule()
case class EvalResult(success: Boolean, missing: List[String], unmatched: List[String])
And the evaluator:
class ActionRuleService {
def eval(actionRule: ActionRule, payload: JsObject): EvalResult = EvalResult(true, List(), List())
}
val actionRuleService = new ActionRuleService
I would try something like this:
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val evaluated = actionRules.map(actionRule => (actionRule, actionRuleService.eval(actionRule, payload)))
val (success, failed) = evaluated.partition(_._2.success)
val (missing, unmatched) = failed.partition(_._2.missing.nonEmpty)
InspectorReport(success.map(_._1), missing.map(_._1), unmatched.map(_._1))
}
Or:
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val evaluated = actionRules.map(actionRule => (actionRule, actionRuleService.eval(actionRule, payload)))
val success = evaluated.collect {
case (actionRule, EvalResult(true, _, _)) =>
actionRule
}
val missing = evaluated.collect {
case (actionRule, EvalResult(false, missing, _)) if missing.nonEmpty =>
actionRule
}
val unmatched = evaluated.collect {
case (actionRule, EvalResult(false, missing, unmatched)) if missing.isEmpty && unmatched.nonEmpty =>
actionRule
}
InspectorReport(success, missing, unmatched)
}

how to print the index of failure in catch expression in scala?

I have code that looks like this:
import scala.util.{Try, Success, Failure}
Try(
for (i <- 1 to 1000) {
doSomething(df(i))
}
) match {
case Success(t) => println(s"success")
case Failure(t) => println(s"failure")
}
I want to print the index of the failed input. How can I print the index i in the failure case?
You can do this instead using Cats:
import scala.util.Try
import cats.implicits._
// toList gives Cats a List, which has a Traverse instance (a bare Range does not).
(1 to 1000).toList.traverse(i => Try(doSomething(df(i))).toEither.left.map(ex => (ex, i))) match {
case Right(_) => println("success")
case Left((ex, i)) => println(s"failure: ${ex.getMessage} on idx: ${i}")
}
If you do not want to use Cats, you can just:
val attempts = for {
i <- Stream.range(start = 1, end = 1001) // end is exclusive. Thanks to Bogdan for the idea of using a Stream.
} yield Try(doSomething(df(i))).toEither.left.map(ex => (ex, i))
attempts.collectFirst { case Left((ex, i)) => ex -> i } match {
case None => println("success")
case Some((ex, i)) => println(s"failure: ${ex.getMessage} on idx: ${i}")
}
You should definitely follow Luis's answer, but to address your comment, you could also catch IllegalArgumentException and re-throw it with the added index to the message, perhaps something like so:
Try(
for (i <- 1 to 1000) {
try doSomething(i) catch { case e: IllegalArgumentException => throw new IllegalArgumentException(s"Failed with index $i", e)}
}
) match {
case Success(t) => println(s"success")
case Failure(t) => println(s"failure: ${t.getMessage}")
}
However this seems hideous, and I do not advise it.
IMO the question hints that the code is lying.
You could write the code differently:
import scala.util.{Try, Success, Failure}
for (i <- 1 to 1000) {
Try(
doSomething(df(i))
) match {
case Failure(t) => println(s"failure on $i")
case _ =>
}
}
But you don't want to. Why not? Because you want to stop the iteration after the first failure. Yet you're looping from 1 to 1000 even though you don't really intend to do all 1000 iterations; you're using an exception to break out of a for loop.
I would rewrite this code to make it explicit that I don't intend to iterate over the entire range.
You could, for example, use indexWhere instead of for to find the position at which a failure happens. If it returns -1, everything was successful.
So something similar to (untested):
(1 to 1000).indexWhere { index => Try(doSomething(index)).isFailure }
Note that indexWhere returns a zero-based position within the range rather than the range value itself; (1 to 1000).find with the same predicate returns the failing value as an Option instead.
If you would like to obtain the exception as well, not just the index, you could use views (https://docs.scala-lang.org/overviews/collections/views.html) to turn your sequence into a lazily evaluated one, map each index to a tuple of the form (index, Try) (without iterating over the entire collection, thanks to the laziness of the .view result), and then collectFirst the first tuple whose second element is a Failure.
So something like (untested):
(1 to 1000).view.map { index => (index, Try(doSomething(index))) }.collectFirst { case (i, Failure(e)) => println(s"error was $e at index $i") }
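The view-based idea can be sketched as a small self-contained function. The name `firstFailure` and the parameter `f` (standing in for the question's `doSomething`) are assumptions made for illustration only.

```scala
import scala.util.{Failure, Try}

object FirstFailureSketch {
  // Returns the first index for which f throws, together with the exception.
  // The view makes map lazy, so f is not called for indices after the first failure.
  def firstFailure(indices: Range)(f: Int => Any): Option[(Int, Throwable)] =
    indices.view
      .map(i => (i, Try(f(i))))
      .collectFirst { case (i, Failure(e)) => (i, e) }
}
```

Returning an `Option[(Int, Throwable)]` instead of printing inside `collectFirst` leaves it to the caller to decide how to report the failure, and `None` means every index succeeded.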
Alternatively, you could write a very small recursion to iterate over the index sequence (also untested):
def findException(indexes: Seq[Int]): Option[(Int, Throwable)] = indexes match {
  case Seq() => None
  case index +: remaining =>
    Try(doSomething(index)) match {
      case Success(_) => findException(remaining)
      case Failure(e) => Option((index, e))
    }
}
findException(1 to 1000).foreach(println)
One question is how you determined 1 to 1000.
This question would look different if you had a collection of elements to verify rather than a range. In that case you would probably just use foldLeft.

Find person and immediate neighbours in Seq[Person]

Given a Seq[Person] which contains 1 to n Persons (at minimum the one Person being "Tom"), what is the easiest approach to find the Person with name "Tom" as well as the Person right before Tom and the Person right after Tom?
More detailed explanation:
case class Person(name:String)
The list of persons can be arbitrarily long, but will have at least one entry, which must be "Tom". So these lists are all valid cases:
val caseOne = Seq(Person("Tom"), Person("Mike"), Person("Dude"),Person("Frank"))
val caseTwo = Seq(Person("Mike"), Person("Tom"), Person("Dude"),Person("Frank"))
val caseThree = Seq(Person("Tom"))
val caseFour = Seq(Person("Mike"), Person("Tom"))
You get the idea. Since I already have "Tom", the task is to get his left neighbour (if it exists), and the right neighbour (if it exists).
What is the most efficient way to do this in Scala?
My current approach:
var result: (Option[Person], Option[Person]) = (None, None)
for (i <- persons.indices) {
  persons(i).name match {
    case "Tom" if i > 0 && i < persons.size - 1 => result = (Some(persons(i - 1)), Some(persons(i + 1))) // (...), left, `Tom`, right, (...)
    case "Tom" if i > 0 => result = (Some(persons(i - 1)), None) // (...), left, `Tom`
    case "Tom" if i < persons.size - 1 => result = (None, Some(persons(i + 1))) // `Tom`, right, (...)
    case "Tom" => result = (None, None) // `Tom`
    case _ => // not Tom: nothing to do
  }
}
Just doesn't feel like I am doing it the scala way.
Solution by Mukesh prajapati:
val arrayPersons = persons.toArray
val index = arrayPersons.indexOf(Person("Tom"))
if (index >= 0)
result = (arrayPersons.lift(index-1), arrayPersons.lift(index+1))
Pretty short, seems to cover all cases.
Solution by anuj saxena
result = persons.sliding(3).foldLeft((Option.empty[Person], Option.empty[Person]))
{
case ((Some(prev), Some(next)), _) => (Some(prev), Some(next))
case (_, prev :: Person(`name`) :: next :: _) => (Some(prev), Some(next))
case (_, _ :: prev :: Person(`name`) :: _) => (Some(prev), None)
case (_, Person(`name`) :: next :: _) => (None, Some(next))
case (neighbours, _) => neighbours
}
First find the index at which "Tom" is present, then use lift. lift turns a partial function into a plain function returning an Option result:
val index = persons.indexOf(Person("Tom"))
doSomethingWith(persons.lift(index-1), persons.lift(index+1))
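As a minimal runnable sketch of the lift approach: the helper name `neighbours` is hypothetical, and it uses `indexWhere` on the name so that a missing person yields `(None, None)` rather than a misleading neighbour.

```scala
object NeighboursSketch {
  final case class Person(name: String)

  // Returns (left neighbour, right neighbour) of the first person with the given name.
  def neighbours(persons: Seq[Person], name: String): (Option[Person], Option[Person]) = {
    val index = persons.indexWhere(_.name == name)
    if (index < 0) (None, None) // name not present at all
    else (persons.lift(index - 1), persons.lift(index + 1)) // lift returns None for out-of-range indices
  }
}
```

Because `lift` returns `None` for any index outside the sequence, the first-element and last-element cases need no special handling.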
A rule of thumb: we should avoid accessing the content of a List / Seq by index, as it is error-prone (e.g. IndexOutOfBoundsException).
If we want to use indexes, we had better use an Array, as it provides constant-time random access.
So for the current problem, here is my code to find the previous and next elements of a given entry in a Seq or List:
def findNeighbours(name: String, persons: Seq[Person]): Option[(Person, Person)] = {
persons.sliding(3).flatMap{
case prev :: person :: next :: Nil if person.name == name => Some(prev, next)
case _ => None
}.toList.headOption
}
Here the return type is in Option because there is a possibility that we may not find it here (in case of only one person is in the list or the required person is not in the list).
This code will pick the pair on the first occurrence of the person provided in the parameter.
If you have a probability that there might be several occurrences for the provided person, remove the headOption in the last line of the function findNeighbours. Then it will return a List of tuples.
Update
If Person is a case class then we can use deep match like this:
def findNeighbours(name: String, persons: Seq[Person]): Option[(Person, Person)] = {
persons.sliding(3).flatMap{
case prev :: Person(`name`) :: next :: Nil => Some(prev, next)
case _ => None
}.toList.headOption
}
For your solution you need to add more cases (I changed it to use foldLeft in the case of a single answer):
def findNeighboursV2(name: String, persons: Seq[Person]): (Option[Person], Option[Person]) = {
persons.sliding(3).foldLeft((Option.empty[Person], Option.empty[Person])){
case ((Some(prev), Some(next)), _) => (Some(prev), Some(next))
case (_, prev :: Person(`name`) :: next :: _) => (Some(prev), Some(next))
case (_, _ :: prev :: Person(`name`) :: _) => (Some(prev), None)
case (_, Person(`name`) :: next :: _) => (None, Some(next))
case (neighbours, _) => neighbours
}
}
You can use the sliding function:
val persons: Seq[Person] = initializePersons()
// Finds the first full window of three with "Tom" in the middle.
persons.sliding(size = 3).collectFirst {
  case Seq(before, middle, after) if middle.name == "Tom" => (before, middle, after)
}
If you know that there will be only one instance of "Tom" in your Seq, use indexOf instead of looping by hand:
val tomIndex = persons.indexOf(Person("Tom"))
// Note: persons(...) throws if Tom is the first or last element.
doSomethingWith(persons(tomIndex-1), persons(tomIndex+1))
case class Person(name: String)
val persons1 = Seq(Person("Martin"),Person("John"),Person("Tom"),Person("Jack"),Person("Mary"))
val persons2 = Seq(Person("Martin"),Person("John"),Person("Tom"))
val persons3 = Seq(Person("Tom"),Person("Jack"),Person("Mary"))
val persons4 = Seq(Person("Tom"))
def f(persons:Seq[Person]) =
persons
.sliding(3)
.filter(_.contains(Person("Tom")))
.maxBy {
case _ :: Person("Tom") :: _ => 1
case _ => 0
}
.toList
.take(persons.indexOf(Person("Tom")) + 2) // In the case where "Tom" is first, drop the last person
.drop(persons.indexOf(Person("Tom")) - 1) // In the case where "Tom" is last, drop the first person
println(f(persons1)) // List(Person(John), Person(Tom), Person(Jack))
println(f(persons2)) // List(Person(John), Person(Tom))
println(f(persons3)) // List(Person(Tom), Person(Jack))
println(f(persons4)) // List(Person(Tom))

Split a list into a target element, and the rest of the list?

Let's say I have something like the following:
case class Thing(num: Int)
val xs = List(Thing(1), Thing(2), Thing(3))
What I'd like to do is separate the list into one particular value, and the rest of the list. The target value can be at any position in the list, or may not be present at all. The single value needs to be handled separately, after the other values are handled, so I can't simply use pattern matching.
What I have so far is this:
val (targetList, rest) = xs.partition(_.num == 2)
val targetEl = targetList match {
case x :: Nil => x
case _ => null
}
Is it possible to combine the two steps? Like
val (targetEl, rest) = xs.<some_method>
A note on handling order:
The reason that the target element must be handled last is that this is for use in a HTML template (Play framework). The other elements are looped through, and a HTML element is rendered for each. After that group of elements, another HTML element is created for the target element.
You can do it with pattern-matching in map, you just need multiple cases:
xs map {
case t @ Thing(1) => // do something with thing 1
case t => // do something with the other things
}
To handle the OP's extra requirements:
xs map {
case t @ Thing(num) if(num != 1) => // do something with things that are not "1"
case t => // do something with thing 1
}
The following produces a tuple of two lists for some condition.
case class Thing(num: Int)
val xs = List(Thing(1), Thing(2), Thing(3))
val partitioned = xs.foldLeft((List.empty[Thing], List.empty[Thing]))((x, y) => y match {
case t @ Thing(1) => (x._1, t :: x._2)
case t => (t :: x._1, x._2)
})
//(List(Thing(3), Thing(2)),List(Thing(1)))
Try this:
val (targetEl, rest) = (xs.head, xs.tail)
It works for non-empty list. Nil case must be handled separately.
After some experimentation, I've come up with the following, which is almost what I'm looking for:
var (maybeTargetEl, rest) = xs
.foldLeft((Option.empty[Thing], List[Thing]())) { case ((opt, ls), x) =>
if (x.num == 1)
(Some(x), ls)
else
(opt, x :: ls)
}
The target value is still wrapped in a container, but at least it guarantees a single value.
After that I can do
rest map <some_method>
maybeTargetEl map <some_other_method>
If the order of the original list is important:
import scala.collection.mutable.ListBuffer
var (maybeTargetEl, rest) = xs.
foldLeft((Option.empty[Thing], ListBuffer[Thing]())){ case ((opt, lb), x) =>
if (x.num == 1)
(Some(x), lb)
else
(opt, lb += x)
} match {
case (opt, lb) => (opt, lb.toList)
}
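For comparison, a sketch of the same split without a mutable buffer, using `span` from the standard library. The helper name `splitFirst` is hypothetical; it extracts only the first matching element and keeps the remaining elements in their original order.

```scala
object SplitSketch {
  final case class Thing(num: Int)

  // Splits out the first element matching p; the rest keeps its original order.
  def splitFirst[A](xs: List[A])(p: A => Boolean): (Option[A], List[A]) = {
    // before: the longest prefix not matching p; from: starts at the first match (if any).
    val (before, from) = xs.span(a => !p(a))
    from match {
      case target :: after => (Some(target), before ::: after)
      case Nil             => (None, xs)
    }
  }
}
```

Usage would look like `val (maybeTargetEl, rest) = SplitSketch.splitFirst(xs)(_.num == 2)`, after which `rest` can be rendered first and `maybeTargetEl` last.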
@evanjdooner Your solution with fold works if the target element is present only once. If you want to extract only one occurrence of the target element:
def find[T](xs: List[T], target: T, prefix: List[T]): (T, List[T]) = xs match {
  // Backticks match against the existing target value instead of binding a new variable.
  case `target` :: tail => (target, prefix.reverse ::: tail)
  case other :: tail => find(tail, target, other :: prefix)
  case Nil => throw new Exception("Not found")
}
val (el, rest) = find(xs, target, Nil)
Sorry, I can't add it as a comment.