foldLeft vs foldRight vs tailrec on Scala List - scala

Assume I have a collection of c: List[T] sorted.
And I need to make aggregation with foldLeft/foldRight which outputs (List[T],List[T],List[T]), sorted.
To complete this, I have 2 possibilities:
foldLeft, and then reverse each one of the lists (use ::, otherwise its O(n) each step)
foldRight and remain in the same order (use ::)
I know that foldLeft is tailrec implemented and optimized, but making O(n) more steps to reverse the collections, in this use case - which method will generate me better performance?
Assume that the op is O(1)
Example of the code:
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val analyzeMode = appConfig.analyzeMode
val (succeed, notApplicable, failed) = actionRules.foldLeft((List[RuleResult](), List[RuleResult](), List[RuleResult]())) { case ( seed # (succeed, notApplicable, failed), actionRule) =>
// Evaluate Single ActionRule
def step(): (List[RuleResult], List[RuleResult], List[RuleResult]) = actionRuleService.eval(actionRule, payload) match {
// If the result is succeed
case EvalResult(true, _, _) => (RuleResult.fromSucceed(actionRule, appConfig.showSucceedAppData) :: succeed, notApplicable, failed)
// If the result is notApplicable
case EvalResult(_, missing #_ :: _, _) => (succeed, RuleResult.fromNotApplicable(actionRule, appConfig.showNotApplicableAppData, missing) :: notApplicable, failed)
// If the result is unmatched
case EvalResult(_, _, unmatched #_ :: _) => (succeed, notApplicable, RuleResult.fromFailed(actionRule, appConfig.showFailedAppData, unmatched) :: failed)
}
analyzeMode match {
case UntilFirstSucceed => if (succeed.isEmpty) step() else seed
case UntilFirstNotApplicable => if (notApplicable.isEmpty) step() else seed
case UntilFirstFailed => if (failed.isEmpty) step() else seed
case Default => step()
case _ => throw new RuntimeException(s"Unknown mode on analyzePayload with mode = ${analyzeMode}")
}
}
InspectorReport(succeed.reverse, notApplicable.reverse, failed.reverse)
}
Before that, I used tailrec which made bad preformance: (why??)
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val analyzeMode = appConfig.analyzeMode
def isCompleted(succeed: Int, notApplicable: Int, failed: Int) = {
analyzeMode match {
case Default => false
case UntilFirstSucceed => if (succeed == 0) false else true
case UntilFirstNotApplicable => if (notApplicable == 0) false else true
case UntilFirstFailed => if (failed == 0) false else true
}
}
#tailrec
def analyzePayloadRec(rules: List[ActionRule])(succeed: List[RuleResult], notApplicable: List[RuleResult], failed: List[RuleResult]): (List[RuleResult], List[RuleResult], List[RuleResult]) = {
if (isCompleted(succeed.size, notApplicable.size, failed.size)) (succeed, notApplicable, failed)
else rules match {
// Base cases:
case Nil => (succeed, notApplicable, failed)
// Evaluate case:
case nextRule :: tail =>
actionRuleService.eval(nextRule, payload) match {
// If the result is succeed
case EvalResult(true, _, _) => analyzePayloadRec(tail)(RuleResult.fromSucceed(nextRule, appConfig.showSucceedAppData) :: succeed, notApplicable, failed)
// If the result is notApplicable
case EvalResult(_, missing #_ :: _, _) => analyzePayloadRec(tail)(succeed, RuleResult.fromNotApplicable(nextRule, appConfig.showNotApplicableAppData, missing) :: notApplicable, failed)
// If the result is unmatched
case EvalResult(_, _, unmatched #_ :: _) => analyzePayloadRec(tail)(succeed, notApplicable, RuleResult.fromFailed(nextRule, appConfig.showFailedAppData, unmatched) :: failed)
}
}
}
analyzePayloadRec(actionRules.reverse)(Nil, Nil, Nil).toInspectorReport // todo: if the analyzeModes are not Default - consider use Streams for lazy collection
}
Performance:
Until 22:16, this tailrec analyze was in use, then, the above code.
which you can see that generates much better performance - the same data was in use.
The x-axis is timeline and y-axis is time in ms (1.4s)
Any ideas why? tailrec should be faster, that's why I consider using foldRight.

What makes your tailrec version slow is
isCompleted(succeed.size, notApplicable.size, failed.size)
taking the size of a linked list in Scala is linear in the size, and you're computing the size of all three lists (at least two of which you're never using (and you don't use any of them by Default)).
On the nth iteration of the tailrec, you're going to walk n list nodes between the size computations, so n iterations of tailrec is at least O(n^2).
Passing the lists themselves to isCompleted (as you effectively do in the foldLeft version) and checking isEmpty should dramatically speed things up (especially in the Default case).

Assuming we have those case classes:
case class InspectorReport(succeed: List[ActionRule], notApplicable: List[ActionRule], failed: List[ActionRule])
case class ActionRule()
case class EvalResult(success: Boolean, missing: List[String], unmatched: List[String])
And the evaluator:
class ActionRuleService {
def eval(actionRule: ActionRule, payload: JsObject): EvalResult = EvalResult(true, List(), List())
}
val actionRuleService = new ActionRuleService
I would try something like this:
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val evaluated = actionRules.map(actionRule => (actionRule, actionRuleService.eval(actionRule, payload)))
val (success, failed) = evaluated.partition(_._2.success)
val (missing, unmatched) = failed.partition(_._2.missing.nonEmpty)
InspectorReport(success.map(_._1), missing.map(_._1), unmatched.map(_._1))
}
Or:
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): InspectorReport = {
val evaluated = actionRules.map(actionRule => (actionRule, actionRuleService.eval(actionRule, payload)))
val success = evaluated.collect {
case (actionRule, EvalResult(true, _, _)) =>
actionRule
}
val missing = evaluated.collect {
case (actionRule, EvalResult(false, missing, _)) if missing.nonEmpty =>
actionRule
}
val unmatched = evaluated.collect {
case (actionRule, EvalResult(false, missing, unmatched)) if missing.isEmpty && unmatched.nonEmpty =>
actionRule
}
InspectorReport(success, missing, unmatched)
}

Related

Validation JSON schema runtime

I want to avoid Runtime undefined behaivors as follows:
val jsonExample = Json.toJson(0)
jsonExample.asOpt[Instant]
yield Some(1970-01-01T00:00:00Z)
How can I verify this using partial function with a lift or some other way, to check thatits indeed Instant, or how you recommend to validate?
ex1:
val jsonExample = Json.toJson(Instant.now())
jsonExample match { ... }
ex2:
val jsonExample = Json.toJson(0)
jsonExample match { ... }
Examples for desired output:
validateInstant(Json.toJson(Instant.now())) -> return Some(...)
validateInstant(Json.toJson(0)) -> return None
I can do somthing as follows, maybe some other ideas?
Just wanted to add a note regarding parsing json, there some runtime undefined problems when we are trying to parse .asOpt[T]
for example:
Json.toJson("0").asOpt[BigDecimal] // yields Some(0)
Json.toJson(0).asOpt[Instant] // yields Some(1970-01-01T00:00:00Z)
We can validate it as follows or some other way:
Json.toJson("0") match {
case JsString(value) => Some(value)
case _ => None
}
Json.toJson(0) match {
case JsNumber(value) => Some(value)
case _ => None
}
Json.toJson(Instant.now()) match {
case o # JsString(_) => o.asOpt[Instant]
case _ => None
}
You can use Option:
def filterNumbers[T](value: T)(implicit tjs: Writes[T]): Option[Instant] = {
Option(Json.toJson(value)).filter(_.asOpt[JsNumber].isEmpty).flatMap(_.asOpt[Instant])
}
Then the following:
println(filterNumbers(Instant.now()))
println(filterNumbers(0))
will output:
Some(2021-02-22T10:35:13.777Z)
None

Optimization of foldLeft

I'm using the following code, and I'm looking for some ideas to make some optimizations.
analyzePayload:
Input: payload which is JsObject and list of rules, each rule has several conditions.
Output: MyReport of all the rules which succeed, notApplicable or failed on this specific payload.
The size of the list can be pretty big, also each Rule has a big amount of conditions.
I am looking for some ideas on how to optimize that code, maybe with a lazy collection? view? stream? tailrec? and why - Thanks!
Also, note that I have anaylzeMode which can run only until one rule succeeds for ex.
def analyzePayload(payload: JsObject, rules: List[Rule]): MyReport = {
val analyzeMode = appConfig.analyzeMode
val (succeed, notApplicable, failed) = rules.foldLeft((List[Rule](), List[Rule](), List[Rule]())) { case ( seed # (succeedRules,notApplicableRules,failedRules), currRule) =>
// Evaluate Single Rule
def step(): (List[Rule], List[Rule], List[Rule]) = evalService.eval(currRule, payload) match {
// If the result is succeed
case EvalResult(true, _, _) => (currRule :: succeedRules, notApplicableRules, failedRules)
// If the result is notApplicable
case EvalResult(_, missing # _ :: _, _) => (succeedRules, currRule :: notApplicableRules, failedRules
)
// If the result is unmatched
case EvalResult(_, _, unmatched # _ :: _) => (succeedRules, notApplicableRules, currRule :: failedRules)
}
analyzeMode match {
case UNTIL_FIRST_SUCCEED => if(succeedRules.isEmpty) step() else seed
case UNTIL_FIRST_NOT_APPLICABLE => if(notApplicableRules.isEmpty) step() else seed
case UNTIL_FIRST_FAILED => if(failedRules.isEmpty) step() else seed
case DEFAULT => step()
case _ => throw new IllegalArgumentException(s"Unknown mode = ${analyzeMode}")
}
}
MyReport(succeed.reverse, notApplicable.reverse, failed.reverse)
}
First Edit:
Changed the code to use tailrec from #Tim Advise, any other suggestions? or some suggestions to make the code a little prettier?
Also, i wanted to ask if there any difference to use view before the foldLeft on the previous implementation.
Also maybe use other collection such as ListBuffer or Vector
def analyzePayload(payload: JsObject, actionRules: List[ActionRule]): MyReport = {
val analyzeMode = appConfig.analyzeMode
def isCompleted(succeed: List[Rule], notApplicable: List[Rule], failed: List[Rule]) = ((succeed, notApplicable, failed), analyzeMode) match {
case (( _ :: _, _, _), UNTIL_FIRST_SUCCEED) | (( _,_ :: _, _), UNTIL_FIRST_NOT_APPLICABLE) | (( _, _, _ :: _), UNTIL_FIRST_FAILED) => true
case (_, DEFAULT) => false
case _ => throw new IllegalArgumentException(s"Unknown mode on analyzePayload with mode = ${analyzeMode}")
}
#tailrec
def _analyzePayload(actionRules: List[ActionRule])(succeed: List[Rule], notApplicable: List[Rule], failed: List[Rule]): (List[Rule], List[Rule] ,List[Rule]) = actionRules match {
case Nil | _ if isCompleted(succeed, notApplicable, failed) => (succeed, notApplicable, failed)
case actionRule :: tail => actionRuleService.eval(actionRule, payload) match {
// If the result is succeed
case EvalResult(true, _, _) => _analyzePayload(tail)(actionRule :: succeed, notApplicable, failed)
// If the result is notApplicable
case EvalResult(_, missing # _ :: _, _) => _analyzePayload(tail)(succeed, actionRule :: notApplicable, failed)
// If the result is unmatched
case EvalResult(_, _, unmatched # _ :: _) => _analyzePayload(tail)(succeed, notApplicable, actionRule :: failed)
}
}
val res = _analyzePayload(actionRules)(Nil,Nil,Nil)
MyReport(res._1, res._2, res._3)
}
Edit 2: (Questions)
If there result will be forwarded to the Client - There no meaning for do it as view? since all the data will be evaluated right?
Maybe should I use ParSeq instead? or this will be just slower since the operation of the evalService.eval(...) is not a heavy operation?
Two obvious optimisations:
Use a tail-recursive function rater than foldLeft so that the compiler can generate an optimised loop and terminate as soon as the appropriate rule is found.
Since analyzeMode is constant, take the match outside the foldLeft. Either have separate code paths for each mode, or use analyzeMode to select a function that is used inside the loop to check for termination.
The code is rather fine, the main thing to revisit would be to make evalService.eval evaluate multiple rules in a single traversal of the json object, assuming the size of the json is not negligible

Scala: How to add match vals to a list val

I have a few vals that match for matching values
Here is an example:
val job_ = Try(jobId.toInt) match {
case Success(value) => jobs.findById(value).map(_.id)
.getOrElse( Left(WrongValue("jobId", s"$value is not a valid job id")))
case Failure(_) => jobs.findByName(jobId.toString).map(_.id)
.getOrElse( Left(WrongValue("jobId", s"'$jobId' is not a known job title.")))
}
// Here the value arrives as a string e.i "yes || no || true || or false" then converted to a boolean
val bool_ = bool.toLowerCase() match {
case "yes" => true
case "no" => false
case "true" => true
case "false" => false
case other => Left(Invalid("bool", s"wrong value received"))
}
Note: invalid case is case class Invalid(x: String, xx: String)
above i'm looking for a given job value and checking whether it exist in the db or not,
No I have a few of these and want to add to a list, here is my list val and flatten it:
val errors = List(..all my vals errors...).flatten // <--- my_list_val (how do I include val bool_ and val job_)
if (errors.isEmpty) { do stuff }
My result should contain errors from val bool_ and val job_
THANK!
You need to fix the types first. The type of bool_ is Any. Which does not give you something you can work with.
If you want to use Either, you need to use it everwhere.
Then, the easiest approach would be to use a for comprehension (I am assuming you're dealing with Either[F, T] here, where WrongValue and Invalid are both sub-classes of F and you're not really interested in the errors).
for {
foundJob <- job_
_ <- bool_
} yield {
// do stuff
}
Note, that in Scala >= 2.13 you can use toIntOption when converting the String to Int:
vaj job_: Either[F, T] = jobId.toIntOption match {
case Some(value) => ...
case _ => ...
}
Also, in case expressions, you can use alternatives when you have the same statement for several cases:
val bool_: Either[F, Boolean] = bool.toLowerCase() match {
case "yes" | "true" => Right(true)
case "no" | "false" => Right(false)
case other => Left(Invalid("bool", "wrong value received"))
}
So, according to your question, and your comments, these are the types you're dealing with.
type ID = Long //whatever id is
def WrongValue(x: String, xx: String) :String = "?-?-?"
case class Invalid(x: String, xx: String)
Now let's create a couple of error values.
val job_ :Either[String,ID] = Left(WrongValue("x","xx"))
val bool_ :Either[Invalid,Boolean] = Left(Invalid("x","xx"))
To combine and report them you might do something like this.
val errors :List[String] =
List(job_, bool_).flatMap(_.swap.toOption.map(_.toString))
println(errors.mkString(" & "))
//?-?-? & Invalid(x,xx)
After checking types as #cbley explained. You can just do a filter operation with pattern matching on your list:
val error = List(// your variables ).filter(_ match{
case Left(_) => true
case _ => false
})

Find person and immediate neighbours in Seq[Person]

Given a Seq[Person], which contains 1-n Persons (and the minimum 1 Person beeing "Tom"), what is the easiest approach to find a Person with name "Tom" as well as the Person right before Tome and the Person right after Tom?
More detailed explanation:
case class Person(name:String)
The list of persons can be arbitrarily long, but will have at least one entry, which must be "Tom". So those lists can be a valid case:
val caseOne = Seq(Person("Tom"), Person("Mike"), Person("Dude"),Person("Frank"))
val caseTwo = Seq(Person("Mike"), Person("Tom"), Person("Dude"),Person("Frank"))
val caseThree = Seq(Person("Tom"))
val caseFour = Seq(Person("Mike"), Person("Tom"))
You get the idea. Since I already have "Tom", the task is to get his left neighbour (if it exists), and the right neighbour (if it exists).
What is the most efficient way to achieve to do this in scala?
My current approach:
var result:Tuple2[Option[Person], Option[Person]] = (None,None)
for (i <- persons.indices)
{
persons(i).name match
{
case "Tom" if i > 0 && i < persons.size-1 => result = (Some(persons(i-1)), Some(persons(i+1))) // (...), left, `Tom`, right, (...)
case "Tom" if i > 0 => result = (Some(persons(i-1)), None) // (...), left, `Tom`
case "Tom" if i < persons.size-1 => result = (Some(persons(i-1)), None) // `Tom`, right, (...)
case "Tom" => result = (None, None) // `Tom`
}
}
Just doesn't feel like I am doing it the scala way.
Solution by Mukesh prajapati:
val arrayPersons = persons.toArray
val index = arrayPersons.indexOf(Person("Tom"))
if (index >= 0)
result = (arrayPersons.lift(index-1), arrayPersons.lift(index+1))
Pretty short, seems to cover all cases.
Solution by anuj saxena
result = persons.sliding(3).foldLeft((Option.empty[Person], Option.empty[Person]))
{
case ((Some(prev), Some(next)), _) => (Some(prev), Some(next))
case (_, prev :: Person(`name`) :: next :: _) => (Some(prev), Some(next))
case (_, _ :: prev :: Person(`name`) :: _) => (Some(prev), None)
case (_, Person(`name`) :: next :: _) => (None, Some(next))
case (neighbours, _) => neighbours
}
First find out index where "Tom" is present, then use "lift". "lift" turns partial function into a plain function returning an Option result:
index = persons.indexOf("Tom")
doSomethingWith(persons.lift(index-1), persons.lift(index+1))
A rule of thumb: we should never access the content of a list / seq using indexes as it is prone to errors (like IndexNotFoundException).
If we want to use indexes, we better use Array as it provides us random access.
So to the current solution, here is my code to find prev and next element of a certain data in a Seq or List:
def findNeighbours(name: String, persons: Seq[Person]): Option[(Person, Person)] = {
persons.sliding(3).flatMap{
case prev :: person :: next :: Nil if person.name == name => Some(prev, next)
case _ => None
}.toList.headOption
}
Here the return type is in Option because there is a possibility that we may not find it here (in case of only one person is in the list or the required person is not in the list).
This code will pick the pair on the first occurrence of the person provided in the parameter.
If you have a probability that there might be several occurrences for the provided person, remove the headOption in the last line of the function findNeighbours. Then it will return a List of tuples.
Update
If Person is a case class then we can use deep match like this:
def findNeighbours(name: String, persons: Seq[Person]): Option[(Person, Person)] = {
persons.sliding(3).flatMap{
case prev :: Person(`name`) :: next :: Nil => Some(prev, next)
case _ => None
}.toList.headOption
}
For your solution need to add more cases to it (cchanged it to use foldleft in case of a single answer):
def findNeighboursV2(name: String, persons: Seq[Person]): (Option[Person], Option[Person]) = {
persons.sliding(3).foldLeft((Option.empty[Person], Option.empty[Person])){
case ((Some(prev), Some(next)), _) => (Some(prev), Some(next))
case (_, prev :: Person(`name`) :: next :: _) => (Some(prev), Some(next))
case (_, _ :: prev :: Person(`name`) :: _) => (Some(prev), None)
case (_, Person(`name`) :: next :: _) => (None, Some(next))
case (neighbours, _) => neighbours
}
}
You can use sliding function:
persons: Seq[Person] = initializePersons()
persons.sliding(size = 3).find { itr =>
if (itr(1).name = "Tom") {
val before = itr(0)
val middle = itr(1)
val after = itr(2)
}
}
If you know that there will be only one instance of "Tom" in your Seq use indexOf instead of looping by hand:
tomIndex = persons.indexOf("Tom")
doSomethingWith(persons(tomIndex-1), persons(tomIndex+1))
// Start writing your ScalaFiddle code here
case class Person(name: String)
val persons1 = Seq(Person("Martin"),Person("John"),Person("Tom"),Person("Jack"),Person("Mary"))
val persons2 = Seq(Person("Martin"),Person("John"),Person("Tom"))
val persons3 = Seq(Person("Tom"),Person("Jack"),Person("Mary"))
val persons4 = Seq(Person("Tom"))
def f(persons:Seq[Person]) =
persons
.sliding(3)
.filter(_.contains(Person("Tom")))
.maxBy {
case _ :: Person("Tom") :: _ => 1
case _ => 0
}
.toList
.take(persons.indexOf(Person("Tom")) + 2) // In the case where "Tom" is first, drop the last person
.drop(persons.indexOf(Person("Tom")) - 1) // In the case where "Tom" is last, drop the first person
println(f(persons1)) // List(Person(John), Person(Tom), Person(Jack))
println(f(persons2)) // List(Person(John), Person(Tom))
println(f(persons3)) // List(Person(Tom), Person(Jack))
println(f(persons4)) // List(Person(Tom))
Scalafiddle

Tuple seen as Product, compiler rejects reference to element

Constructing phoneVector:
val phoneVector = (
for (i <- 1 until 20) yield {
val p = killNS(r.get("Phone %d - Value" format(i)))
val t = killNS(r.get("Phone %d - Type" format(i)))
if (p == None) None
else
if (t == None) (p,"Main") else (p,t)
}
).filter(_ != None)
Consider this very simple snippet:
for (pTuple <- phoneVector) {
println(pTuple.getClass.getName)
println(pTuple)
//val pKey = pTuple._1.replaceAll("[^\\d]","")
associate() // stub prints "associate"
}
When I run it, I see output like this:
scala.Tuple2
((609) 954-3815,Mobile)
associate
When I uncomment the line with replaceAll(), compile fails:
....scala:57: value _1 is not a member of Product with Serializable
[error] val pKey = pTuple._1.replaceAll("[^\\d]","")
[error] ^
Why does it not recognize pTuple as a Tuple2 and treat it only as Product
OK, this compiles and produces the desired result. But it's too verbose. Can someone please demonstrate a more concise solution for dealing with this typesafe stuff?
for (pTuple <- phoneVector) {
println(pTuple.getClass.getName)
println(pTuple)
val pPhone = pTuple match {
case t:Tuple2[_,_] => t._1
case _ => None
}
val pKey = pPhone match {
case s:String => s.replaceAll("[^\\d]","")
case _ => None
}
println(pKey)
associate()
}
You can do:
for (pTuple <- phoneVector) {
val pPhone = pTuple match {
case (key, value) => key
case _ => None
}
val pKey = pPhone match {
case s:String => s.replaceAll("[^\\d]","")
case _ => None
}
println(pKey)
associate()
}
Or simply phoneVector.map(_._1.replaceAll("[^\\d]",""))
By changing the construction of phoneVector, as wrick's question implied, I've been able to eliminate the match/case stuff because Tuple is assured. Not thrilled by it, but Change is Hard, and Scala seems cool.
Now, it's still possible to slip a None value into either of the Tuple values. My match/case does not check for that, and I suspect that could lead to a runtime error in the replaceAll call. How is that allowed?
def killNS (s:Option[_]) = {
(s match {
case _:Some[_] => s.get
case _ => None
}) match {
case None => None
case "" => None
case s => s
}
}
val phoneVector = (
for (i <- 1 until 20) yield {
val p = killNS(r.get("Phone %d - Value" format(i)))
val t = killNS(r.get("Phone %d - Type" format(i)))
if (t == None) (p,"Main") else (p,t)
}
).filter(_._1 != None)
println(phoneVector)
println(name)
println
// Create the Neo4j nodes:
for (pTuple <- phoneVector) {
val pPhone = pTuple._1 match { case p:String => p }
val pType = pTuple._2
val pKey = pPhone.replaceAll(",.*","").replaceAll("[^\\d]","")
associate(Map("target"->Map("label"->"Phone","key"->pKey,
"dial"->pPhone),
"relation"->Map("label"->"IS_AT","key"->pType),
"source"->Map("label"->"Person","name"->name)
)
)
}
}