Why doesn't Scala optimize calls to the same Extractor?


Take the following example: why is the extractor called multiple times, as opposed to temporarily storing the result of the first call and matching against that? Wouldn't it be reasonable to assume that results from unapply would not change given the same string?
object Name {
val NameReg = """^(\w+)\s(?:(\w+)\s)?(\w+)$""".r
def unapply(fullName: String): Option[(String, String, String)] = {
val NameReg(fname, mname, lname) = fullName
Some((fname, if (mname == null) "" else mname, lname))
}
}
"John Smith Doe" match {
case Name("Jane", _, _) => println("I know you, Jane.")
case Name(f, "", _) => println(s"Hi ${f}")
case Name(f, m, _) => println(s"Howdy, ${f} ${m}.")
case _ => println("Don't know you")
}

Wouldn't it be reasonable to assume that results from unapply would not change given the same string?
Unfortunately, assuming isn't good enough for a (static) compiler. In order for memoizing to be a legal optimization, the compiler has to prove that the expression being memoized is pure and referentially transparent. However, in the general case, this is equivalent to solving the Halting Problem.
It would certainly be possible to write an optimization pass which tries to prove purity of certain expressions and memoizes them if and only if it succeeds, but that may be more trouble than it's worth. Such proofs get very hard very quickly, so they are only likely to succeed for very trivial expressions, which execute very quickly anyway.
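If you know your extractor is pure, you can do the hoisting by hand: run the extraction once and match on the plain result. The following is only a sketch of that idea. The names HoistDemo, parse, greet and the calls counter are made up for illustration, and the extraction returns None on a non-matching string instead of throwing.

```scala
object HoistDemo {
  val NameReg = """^(\w+)\s(?:(\w+)\s)?(\w+)$""".r

  // Hypothetical counter: a side effect like this is exactly what stops
  // the compiler from merging extractor calls on its own.
  var calls = 0

  def parse(fullName: String): Option[(String, String, String)] = {
    calls += 1
    fullName match {
      case NameReg(f, m, l) => Some((f, if (m == null) "" else m, l))
      case _                => None
    }
  }

  // Run the extraction once, then match on the resulting tuple.
  def greet(fullName: String): String = parse(fullName) match {
    case Some(("Jane", _, _)) => "I know you, Jane."
    case Some((f, "", _))     => s"Hi $f"
    case Some((f, m, _))      => s"Howdy, $f $m."
    case None                 => "Don't know you"
  }
}
```

Here the regex runs once per greet call no matter how many cases are tried, because the cases only inspect an already computed tuple.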

What is a pattern match? The spec says it matches the "shape" of the value and binds vars to its "components."
In the realm of mutation, you have questions like, if I match on case class C(var v: V), does a case C(x) capture the mutable field? Well, the answer was no:
https://issues.scala-lang.org/browse/SI-5158
The spec says (sorry) that order of evaluation may be changed, so it recommends against side-effects:
In the interest of efficiency the evaluation of a pattern matching
expression may try patterns in some other order than textual sequence.
This might affect evaluation through side effects in guards.
That's in relation to guard expressions, presumably because extractors were added after case classes.
There's no special promise to evaluate extractors exactly once. (Explicitly in the spec, that is.)
The semantics are only that "patterns are tried in sequence".
Also, your example for regexes can be simplified, since a regex will not be re-evaluated when unapplied to its own Match. See the example in the doc. That is,
Name.NameReg findFirstMatchIn s map {
case Name("Jane", _, _) =>
}
where
object Name { def unapply(m: Regex.Match) = ... }
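Fleshed out, that could look roughly like the sketch below. It assumes the intended parameter type is scala.util.matching.Regex.Match (which is what findFirstMatchIn produces); MatchDemo and greet are illustrative names, not part of the original code.

```scala
import scala.util.matching.Regex

object MatchDemo {
  val NameReg: Regex = """^(\w+)\s(?:(\w+)\s)?(\w+)$""".r

  object Name {
    // Destructure an existing Match; the regex itself is not run again here.
    // group(2) is null when the optional middle name is absent.
    def unapply(m: Regex.Match): Option[(String, String, String)] =
      Some((m.group(1), Option(m.group(2)).getOrElse(""), m.group(3)))
  }

  def greet(s: String): String =
    NameReg.findFirstMatchIn(s) match {
      case Some(Name("Jane", _, _)) => "I know you, Jane."
      case Some(Name(f, "", _))     => s"Hi $f"
      case Some(Name(f, m, _))      => s"Howdy, $f $m."
      case _                        => "Don't know you"
    }
}
```

Name.unapply may still be invoked once per case, but each invocation only reads groups out of the Match, so the expensive part (running the regex) happens a single time.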

Related

Scala match case with multiple branch with if

I have a match case with if and the expression is always the same.
I put some pseudo code:
value match {
case A => same expression
case B(_) if condition1 => same expression
case _ if condition2 => same expression
...
case _ => different expression //similar to an else
}
The match contains both case object (case A) matching and case class(case B(_))
Is it the best practice?

Try to explain this code in words. "This function returns one of two values. The first is returned if the input is A. Or if the input is of type B and a condition holds. Oh, or if a different condition holds. Otherwise, it's the other value". That sounds incredibly complex to me.
I have to recommend breaking this down at least a bit. At minimum, you've got two target expressions, and which one is chosen depends on some predicate of value. That sounds like a Boolean to me. Assuming value is of some trait type Foo (which A.type and B extend), you could write
sealed trait Foo {
def isFrobnicated: Boolean = this match {
case A => true
case B(_) if condition1 => true
case _ => condition2
}
}
...
if (value.isFrobnicated) {
same expression
} else {
different expression
}
Now the cognitive load is split into two different, smaller chunks of code to digest, and presumably isFrobnicated will be given a self-documenting name and a chunk of comments explaining why this distinction is important. Anyone reading the bottom snippet can simply understand "Oh, there's two options, based on the frobnication status", and if they want more details, there's some lovely prose they can go read in the isFrobnicated docs. And all of the complexity of "A or B if this or anything if that" is thrown into its own function, separate from everything else.
If A and B don't have a common supertype that you control, then you can always write a standalone function, an implicit class, or (if you're in Scala 3) a proper extension method. Take your pick.
Depending on your actual use case, there may be more that can be done, but this should be a start.
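To make the pseudocode above concrete, here is one way it might look as compilable code. Everything specific in it is an assumption for illustration: the B(n) payload, the threshold standing in for condition1, and the two result strings standing in for the "same" and "different" expressions.

```scala
object FrobDemo {
  sealed trait Foo {
    // Hypothetical predicate: A always qualifies, B only above the threshold.
    def isFrobnicated(threshold: Int): Boolean = this match {
      case A                     => true
      case B(n) if n > threshold => true
      case _                     => false
    }
  }
  case object A extends Foo
  final case class B(n: Int) extends Foo
  case object C extends Foo

  // The caller now only sees a two-way decision.
  def describe(value: Foo): String =
    if (value.isFrobnicated(10)) "frobnicated" else "plain"
}
```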

Pattern match on value of Either inside a for comprehension?

I have a for comprehension like this:
for {
(value1: String, value2: String, value3: String) <- getConfigs(args)
// more stuff using those values
}
getConfigs returns an Either[Throwable, (Seq[String], String, String)] and when I try to compile I get this error:
value withFilter is not a member of Either[Throwable,(Seq[String], String, String)]
How can I use this method (that returns an Either) in the for comprehension?

Like this:
for {
tuple <- getConfigs()
} println(tuple)
Joking aside, I think that is an interesting question but it is misnamed a bit.
The problem (see above) is not that for comprehensions are not possible but that pattern matching inside the for comprehension is not possible within Either.
There is documentation how for comprehensions are translated but they don't cover each case. This one is not covered there, as far as I can see. So I looked it up in my instance of "Programming in Scala" -- Second Edition (because that is the one I have by my side on dead trees).
Section 23.4 - Translation of for-expressions
There is a subchapter "Translating patterns in generators", which is what is the problem here, as described above. It lists two cases:
Case One: Tuples
Is exactly our case:
for ((x1, …, xn) <- expr1) yield expr2
should translate to expr1.map { case (x1, …, xn) => expr2 }.
Which is exactly what IntelliJ does when you select the code and run the "Desugar for comprehension" action. Yay!
… but that makes it even weirder in my eyes, because the desugared code actually runs without problems.
So this case is the one which is (imho) matching the case, but is not what is happening. At least not what we observed. Hm?!
Case two: Arbitrary patterns
for (pat <- expr1) yield expr2
translates to
expr1 withFilter {
case pat => true
case _ => false
} map {
case pat => expr2
}
where there is now a withFilter method!
This case totally explains the error message and why pattern matching in an Either is not possible.
The chapter ultimately refers to the scala language specification (to an older one though) which is where I stop now.
So I am sorry I can't fully answer that question, but hopefully I could hint well enough at the root of the problem here.
Intuition
So why is Either problematic, and why doesn't it provide a withFilter method where Try and Option do?
Because filter removes elements from the "container", possibly all of them, so we need something that represents an "empty container".
That is easy for Option, where this is obviously None. Also easy for e.g. List. Not so easy for Try, because there are many possible Failure values, each holding a specific exception. However, the library picks particular failures to fill this role:
NoSuchElementException and
UnsupportedOperationException
which is why a Try[X] works but an Either[Throwable, X] does not.
It's almost the same thing, but not entirely. Try knows that its failure side is always a Throwable, and the library authors can take advantage of that.
However, on an Either (which is now right-biased) the "empty" case is the Left case, which is generic. The user determines its type, so the library authors could not pick a generic instance for every possible Left.
I think this is why Either doesn't provide a withFilter out of the box and why your expression fails.
Btw. the
expr1.map { case (x1, …, xn) => expr2 }
case works because it throws a MatchError up the call stack and panics out of the problem, which in itself might be a greater problem.
Oh, and for the ones that are brave enough: I didn't use the "Monad" word up until now, because Scala doesn't have a data structure for it, and for-comprehensions work just fine without it. But maybe a reference won't hurt: additive monads have this "zero" value, which is exactly what Either misses here and what I tried to give some meaning in the "intuition" part.
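The "zero" intuition can be checked with a few lines. This is just a sketch, assuming Scala 2.12 or later (where Either is right-biased and has map); ZeroDemo and its value names are invented for the example.

```scala
import scala.util.{Success, Try}

object ZeroDemo {
  // Option has an obvious empty value: a rejected element becomes None.
  val opt: Option[Int] = Some(1).filter(_ > 5)

  // Try has no natural empty value, so filter manufactures a Failure
  // holding a NoSuchElementException.
  val tried: Try[Int] = Success(1).filter(_ > 5)

  // Either offers no filter/withFilter, but destructuring through map is fine
  // as long as the pattern cannot fail.
  val either: Either[Throwable, (String, String, String)] = Right(("a", "b", "c"))
  val first: Either[Throwable, String] = either.map { case (a, _, _) => a }
}
```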

I guess you want your loop to run only if the value is a Right. If it is a Left, it should not run. This can be achieved really easily:
for {
(value1, value2, value3) <- getConfigs(args).right.toOption
// more stuff using those values
}
Sidenote: I don't know what your exact use case is, but scala.util.Try is better suited for cases where you either have a result or a failure (an exception).
Just write Try { /*some code that may throw an exception*/ } and you'll either have Success(/*the result*/) or a Failure(/*the caught exception*/).
If your getConfigs method returns a Try instead of an Either, then your code above would work without any changes.

You can do this using Oleg's better-monadic-for compiler plugin:
build.sbt:
addCompilerPlugin("com.olegpy" %% "better-monadic-for" % "0.2.4")
And then:
object Test {
def getConfigs: Either[Throwable, (String, String, String)] = Right(("a", "b", "c"))
def main(args: Array[String]): Unit = {
val res = for {
(fst, snd, third) <- getConfigs
} yield fst
res.foreach(println)
}
}
Yields:
a
This works because the plugin removes the unnecessary withFilter and unchecked while desugaring and uses a .map call. Thus, we get:
val res: Either[Throwable, String] =
getConfigs
.map[String](((x$1: (String, String, String)) => x$1 match {
case (_1: String, _2: String, _3: String)
(String, String, String)((fst @ _), (snd @ _), (third @ _)) => fst
}));

I think the part you may find surprising is that the Scala compiler emits this error because you deconstruct the tuple in place. Surprisingly, this forces the compiler to look for a withFilter method, because to the compiler it looks like an implicit check on the type of the value inside the container, and checks on values are implemented using withFilter. If you write your code as
for {
tmp <- getConfigs(args)
(value1: Seq[String], value2: String, value3: String) = tmp
// more stuff using those values
}
it should compile without errors.

PartialFunction That Isn't Partial

Is there a reason to use a PartialFunction on a function that's not partial?
scala> val foo: PartialFunction[Int, Int] = {
| case x => x * 2
| }
foo: PartialFunction[Int,Int] = <function1>
foo is defined as a PartialFunction, but of course the case x will catch all input.
Is this simply bad code as the PartialFunction type indicates to the programmer that the function is undefined for certain inputs?

There is no advantage in using a PartialFunction instead of a Function, but if you have to pass a PartialFunction, then you have to pass a PartialFunction.
Note that, because of the inheritance between these two, overloading a method to accept both results in something difficult to use, as the type inference won't work.

The thing is, there are many examples of times when what you need to define on a trait/object/function definition is a PartialFunction but in reality the real implementation may not be one. Case in point, take a look at def collect[B](f: PartialFunction[A,B]):
val myList = thatList collect {
case Right(value) => value
case Left(other) => other.toInt
}
It's clearly not a "real" partial as it is defined for all input. That said, if I wanted to, I could just have the Right match.
However, if collect had been written to take a plain function, I'd miss out on the desired behavior (that is, a filter and a map rolled into one, based on where the function is defined). That's nice behavior and allows for a lot of flexibility when writing my own code.
So I guess the better question is, will you ever want behavior to reflect that a function might not be defined everywhere? If the answer is no, then don't do it.
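A small sketch of both situations, with made-up data (CollectDemo, xs, all and rights are illustrative names):

```scala
object CollectDemo {
  val xs: List[Either[String, Int]] = List(Right(1), Left("2"), Right(3))

  // Defined for every element: collect behaves like a plain map.
  val all: List[Int] = xs.collect {
    case Right(value) => value
    case Left(other)  => other.toInt
  }

  // Defined only for Right: collect filters and maps in one pass.
  val rights: List[Int] = xs.collect { case Right(value) => value }
}
```

The same collect signature serves both, which is the flexibility a PartialFunction parameter buys.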

PartialFunction literals allow pattern matching directly on arguments (e.g. { case (a, b) => ... } instead of _ match { case (a, b) => ... }), which makes code more readable (see @wheaties' answer for another example).
EDIT: apparently this is wrong, see Daniel C. Sobral's comment on his answer. Not deleting, so that the comments still make sense.

Ending a for-comprehension loop when a check on one of the items returns false

I am a bit new to Scala, so apologies if this is something a bit trivial.
I have a list of items which I want to iterate through. I want to execute a check on each of the items, and if just one of them fails I want the whole function to return false. So you can see this as an AND condition. I want it to be evaluated lazily, i.e. the moment I encounter the first false, return false.
I am used to the for - yield syntax which filters items generated through some generator (list of items, sequence etc.). In my case however I just want to break out and return false without executing the rest of the loop. In normal Java one would just do a return false; within the loop.
In an inefficient way (i.e. not stopping when I encounter the first false item), I could do it:
(for {
item <- items
if !satisfiesCondition(item)
} yield item).isEmpty
Which is essentially saying that if no items make it through the filter all of them satisfy the condition. But this seems a bit convoluted and inefficient (consider you have 1 million items and the first one already did not satisfy the condition).
What is the best and most elegant way to do this in Scala?

Stopping early at the first false for a condition is done using forall in Scala. (A related question)
Your solution rewritten:
items.forall(satisfiesCondition)
To demonstrate short-circuiting:
List(1,2,3,4,5,6).forall { x => println(x); x < 3 }
1
2
3
res1: Boolean = false
The opposite of forall is exists which stops as soon as a condition is met:
List(1,2,3,4,5,6).exists{ x => println(x); x > 3 }
1
2
3
4
res2: Boolean = true

Scala's for comprehensions are not general iterations. That means they cannot produce every possible result that one can produce out of an iteration, as, for example, the very thing you want to do.
There are three things that a Scala for comprehension can do, when you are returning a value (that is, using yield). In the most basic case, it can do this:
Given an object of type M[A], and a function A => B (that is, which returns an object of type B when given an object of type A), return an object of type M[B];
For example, given a sequence of characters, Seq[Char], get UTF-16 integer for that character:
val codes = for (char <- "A String") yield char.toInt
The expression char.toInt converts a Char into an Int, so the String -- which is implicitly converted into a Seq[Char] in Scala --, becomes a Seq[Int] (actually, an IndexedSeq[Int], through some Scala collection magic).
The second thing it can do is this:
Given objects of type M[A], M[B], M[C], etc, and a function of A, B, C, etc into D, return an object of type M[D];
You can think of this as a generalization of the previous transformation, though not everything that could support the previous transformation can necessarily support this transformation. For example, we could produce all the coordinates of a battleship board like this:
val coords = for {
column <- 'A' to 'L'
row <- 1 to 10
} yield s"$column$row"
In this case, we have objects of the types Seq[Char] and Seq[Int], and a function (Char, Int) => String, so we get back a Seq[String].
The third, and final, thing a for comprehension can do is this:
Given an object of type M[A], such that the type M[T] has a zero value for any type T, a function A => B, and a condition A => Boolean, return either the zero or an object of type M[B], depending on the condition;
This one is harder to understand, though it may look simple at first. Let's look at something that looks simple first, say, finding all vowels in a sequence of characters:
def vowels(s: String) = for {
letter <- s
if Set('a', 'e', 'i', 'o', 'u') contains letter.toLower
} yield letter.toLower
val aStringVowels = vowels("A String")
It looks simple: we have a condition, we have a function Char => Char, and we get a result, and there doesn't seem to be any need for a "zero" of any kind. In this case, the zero would be the empty sequence, but it hardly seems worth mentioning it.
To explain it better, I'll switch from Seq to Option. An Option[A] has two sub-types: Some[A] and None. The zero, evidently, is the None. It is used when you need to represent the possible absence of a value, or the value itself.
Now, let's say we have a web server where users who are logged in and are administrators get extra javascript on their web pages for administration tasks (like wordpress does). First, we need to get the user, if there's a user logged in, let's say this is done by this method:
def getUser(req: HttpRequest): Option[User]
If the user is not logged in, we get None, otherwise we get Some(user), where user is the data structure with information about the user that made the request. We can then model that operation like this:
def adminJs(req: HttpRequest): Option[String] = for {
user <- getUser(req)
if user.isAdmin
} yield adminScriptForUser(user)
Here it is easier to see the point of the zero. When the condition is false, adminScriptForUser(user) cannot be executed, so the for comprehension needs something to return instead, and that something is the "zero": None.
In technical terms, Scala's for comprehensions provides syntactic sugars for operations on monads, with an extra operation for monads with zero (see list comprehensions in the same article).
What you actually want to accomplish is called a catamorphism, usually represented as a fold method, which can be thought of as a function of M[A] => B. You can write it with fold, foldLeft or foldRight in a sequence, but none of them would actually short-circuit the iteration.
Short-circuiting arises naturally out of non-strict evaluation, which is the default in Haskell, in which most of these papers are written. Scala, as most other languages, is by default strict.
There are three solutions to your problem:
Use the special methods forall or exists, which target your precise use case, though they don't solve the generic problem;
Use a non-strict collection; there's Scala's Stream, but it has problems that prevent its effective use. The Scalaz library can help you there;
Use an early return, which is how Scala library solves this problem in the general case (in specific cases, it uses better optimizations).
As an example of the third option, you could write this:
def hasEven(xs: List[Int]): Boolean = {
for (x <- xs) if (x % 2 == 0) return true
false
}
Note as well that this is called a "for loop", not a "for comprehension", because it doesn't return a value (well, it returns Unit), since it doesn't have the yield keyword.
You can read more about real generic iteration in the article The Essence of The Iterator Pattern, which is a Scala experiment with the concepts described in the paper by the same name.
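As a rough illustration of the second option (non-strict evaluation) without Stream or Scalaz, an Iterator is enough: its map is lazy, so a short-circuiting consumer never forces the remaining elements. This is a sketch, not the approach recommended above; LazyCheckDemo, check and the evaluated counter are invented for the example.

```scala
object LazyCheckDemo {
  // Hypothetical counter showing how many elements were actually checked.
  var evaluated = 0

  def check(x: Int): Boolean = {
    evaluated += 1
    x < 3
  }

  // iterator.map defers the checks; forall stops pulling at the first false.
  val allPass: Boolean =
    List(1, 2, 3, 4, 5, 6).iterator.map(check).forall(identity)
}
```

Only the elements 1, 2 and 3 are checked here; the first failing element ends the traversal.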

forall is definitely the best choice for the specific scenario but for illustration here's good old recursion:
@tailrec def hasEven(xs: List[Int]): Boolean = xs match {
case head :: tail if head % 2 == 0 => true
case Nil => false
case _ => hasEven(xs.tail)
}
I tend to use recursion a lot for loops w/short circuit use cases that don't involve collections.

UPDATE:
DO NOT USE THE CODE IN MY ANSWER BELOW!
Shortly after I posted the answer below (after misinterpreting the original poster's question), I have discovered a way superior generic answer (to the listing of requirements below) here: https://stackoverflow.com/a/60177908/501113
It appears you have several requirements:
Iterate through a (possibly large) list of items doing some (possibly expensive) work
The work done to an item could return an error
At the first item that returns an error, short circuit the iteration, throw away the work already done, and return the item's error
A for comprehension isn't designed for this (as is detailed in the other answers).
And I was unable to find another Scala collections pre-built iterator that provided the requirements above.
While the code below is based on a contrived example (transforming a String of digits into a BigInt), it is the general pattern I prefer to use; i.e. process a collection and transform it into something else.
def getDigits(shouldOnlyBeDigits: String): Either[IllegalArgumentException, BigInt] = {
@scala.annotation.tailrec
def recursive(
charactersRemaining: String = shouldOnlyBeDigits
, accumulator: List[Int] = Nil
): Either[IllegalArgumentException, List[Int]] =
if (charactersRemaining.isEmpty)
Right(accumulator) //All work completed without error
else {
val item = charactersRemaining.head
val isSuccess =
item.isDigit //Work the item
if (isSuccess)
//This item's work completed without error, so keep iterating
recursive(charactersRemaining.tail, (item - 48) :: accumulator)
else {
//This item hit an error, so short circuit
Left(new IllegalArgumentException(s"item [$item] is not a digit"))
}
}
recursive().map(digits => BigInt(digits.reverse.mkString))
}
When it is called as getDigits("1234") in a REPL (or Scala Worksheet), it returns:
val res0: Either[IllegalArgumentException,BigInt] = Right(1234)
And when called as getDigits("12A34") in a REPL (or Scala Worksheet), it returns:
val res1: Either[IllegalArgumentException,BigInt] = Left(java.lang.IllegalArgumentException: item [A] is not a digit)
You can play with this in Scastie here:
https://scastie.scala-lang.org/7ddVynRITIOqUflQybfXUA

Costly computation occurring in both isDefined and Apply of a PartialFunction

It is quite possible that to know whether a function is defined at some point, a significant part of computing its value has to be done. In a PartialFunction, when implementing isDefinedAt and apply, both methods will have to do that. What to do if this common job is costly?
There is the possibility of caching its result, hoping that apply will be called after isDefinedAt. Definitely ugly.
I often wish that PartialFunction[A,B] would be Function[A, Option[B]], which is clearly isomorphic. Or maybe there could be another method in PartialFunction, say applyOption(a: A): Option[B]. With some mixins, implementors would have a choice of implementing either isDefinedAt and apply, or applyOption. Or all of them, to be on the safe side performance-wise. Clients which test isDefinedAt just before calling apply would be encouraged to use applyOption instead.
However, this is not so. Some major methods in the library, among them collect in collections, require a PartialFunction. Is there a clean (or not so clean) way to avoid paying for computations repeated between isDefinedAt and apply?
Also, is the applyOption(a: A): Option[B] method reasonable? Does it sound feasible to add it in a future version? Would it be worth it?

Why is caching such a problem? In most cases, you have a local computation, so as long as you write a wrapper for the caching, you needn't worry about it. I have the following code in my utility library:
class DroppedFunction[-A,+B](f: A => Option[B]) extends PartialFunction[A,B] {
private[this] var tested = false
private[this] var arg: A = _
private[this] var ans: Option[B] = None
private[this] def cache(a: A): Unit = {
if (!tested || a != arg) {
tested = true
arg = a
ans = f(a)
}
}
def isDefinedAt(a: A) = {
cache(a)
ans.isDefined
}
def apply(a: A) = {
cache(a)
ans.get
}
}
class DroppableFunction[A,B](f: A => Option[B]) {
def drop = new DroppedFunction(f)
}
implicit def function_is_droppable[A,B](f: A => Option[B]) = new DroppableFunction(f)
and then if I have an expensive computation, I write a function method A => Option[B] and do something like (f _).drop to use it in collect or whatnot. (If you wanted to do it inline, you could create a method that takes A=>Option[B] and returns a partial function.)
(The opposite transformation--from PartialFunction to A => Option[B]--is called lifting, hence the "drop"; "unlift" is, I think, a more widely used term for the opposite operation.)
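On Scala 2.10 and later the standard library covers much of this without hand-written caching: Function.unlift turns an A => Option[B] into a PartialFunction, and applyOrElse on the result evaluates the underlying function once per call. As far as I know, collect on the standard collections goes through applyOrElse as well, but treat that as an assumption to verify for your version. A sketch with invented names (UnliftDemo, parsePositive, calls):

```scala
object UnliftDemo {
  // Hypothetical counter standing in for "expensive work was done".
  var calls = 0

  def parsePositive(s: String): Option[Int] = {
    calls += 1
    scala.util.Try(s.toInt).toOption.filter(_ > 0)
  }

  val pf: PartialFunction[String, Int] =
    Function.unlift((s: String) => parsePositive(s))

  // applyOrElse combines the definedness test and the application.
  val hit: Int = pf.applyOrElse("42", (_: String) => -1)
  val callsAfterHit: Int = calls

  val miss: Int = pf.applyOrElse("oops", (_: String) => -1)
  val callsAfterMiss: Int = calls
}
```

Calling pf.isDefinedAt followed by pf.apply would still evaluate parsePositive twice, so the saving only applies to clients that use applyOrElse (or lift).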

Have a look at this thread, Rethinking PartialFunction. You're not the only one wondering about this.

This is an interesting question, and I'll give my 2 cents.
First off, resist the urge for premature optimization. Make sure the partial function is the problem. I was amazed at how fast they are in some cases.
Now assuming there is a problem, where would it come from?
Could be a large number of case clauses
Complex pattern matching
Some complex computation in the if clauses (guards)
One option I'd try is to find ways to fail fast. Break the pattern matching into layers, then chain partial functions. This way you can fail the match early. Also extract repeated sub-matches. For example:
Let's assume OddEvenList is an extractor that breaks a list into an odd list and an even list:
var pf1: PartialFunction[List[Int], R] = {
case OddEvenList(1 :: ors, 2 :: ers) => R1
case OddEvenList(3 :: ors, 4 :: ers) => R2
}
Break it into two parts: one that matches the split, and one that tries to match the result (to avoid repeated computation). However, this may require some re-engineering:
var pf2: PartialFunction[(List[Int], List[Int]), R] = {
case (1 :: ors, 2 :: ers) => R1
case (3 :: ors, 4 :: ers) => R2
}
var pf1: PartialFunction[List[Int], R] = {
case OddEvenList(ors, ers) if pf2.isDefinedAt((ors, ers)) => pf2((ors, ers))
}
I have used this when progressively reading XML files that had a rather inconsistent format.
Another option is to compose partial functions using andThen, although a quick test here seemed to indicate that only the first one is actually tested.

There is absolutely nothing wrong with caching mechanism inside the partial function, if:
the function always returns the same output when passed the same argument
it has no side effects
it is completely hidden from the rest of the world
Such a cached function is not distinguishable from a plain old pure partial function...
