Scala doesn't do type inference on return statements? Why not?

Why does Scala fail to infer the return type of the method when there's an explicit return statement used in the method?
For instance, why does the following code compile?
object Main {
  def who = 5
  def main(args: Array[String]) = println(who)
}
But the following doesn't.
object Main {
  def who = return 5
  def main(args: Array[String]) = println(who)
}

The result type of a method is either the type of the last expression in the block that defines it or, in the absence of a block, the type of the expression that defines it.
When you use return inside a method, you introduce another statement from which the method may return. That means Scala can't determine the type of that return at the point it is found. Instead, it must proceed to the end of the method, combine the types of all exit points to infer a common result type, and then go back and check each of those exit points against it.
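For a concrete illustration (a minimal sketch of my own, not from the original question): with an explicit result type, the compiler has a known type to check every exit point against, so multiple returns are fine:

def sign(x: Int): Int = {
  if (x < 0) return -1 // each return is checked against the declared Int
  if (x > 0) return 1
  0
}

Without the ": Int", scalac would have to defer judgement until it had seen every return in the body.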
To do so would increase the complexity of the compiler and slow it down, for the sole gain of not having to specify a return type when using return. In the present system, on the other hand, inferring the return type comes for free from the limited type inference Scala already uses.
So, in the end, in the balance between compiler complexity and the gains to be had, the latter was deemed to be not worth the former.

It would increase the complexity of the compiler (and language). It's just really funky to be doing type inference on something like that. As with anything type inference related, it all works better when you have a single expression. Scattered return statements effectively create a lot of implicit branching that gets to be very sticky to unify. It's not that it's particularly hard, just sticky. For example:
def foo(xs: List[Int]) = xs map { i => return i; i }
What, I ask you, does the compiler infer here? If the compiler were doing inference with explicit return statements, it would need to be Any. In fact, a lot of methods with explicit return statements would end up returning Any, even if you don't get sneaky with non-local returns. Like I said, sticky.
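To see the unification problem concretely, here is a sketch (the explicit result type is my addition to the example above): the body expression has type List[Int], while return i exits with an Int, and the least upper bound of the two is Any:

def foo(xs: List[Int]): Any = xs map { i => return i; i }

foo(List(1, 2, 3)) // 1 -- a non-local return from inside the lambda
foo(Nil)           // List() -- the lambda never runs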
And on top of that, this isn't a language feature that should be encouraged. Explicit returns do not improve code clarity unless there is just one explicit return, at the end of the function. The reason is pretty easy to see if you view code paths as a directed graph. As I said earlier, scattered returns produce a lot of implicit branching that produces weird leaves on your graph, as well as a lot of extra paths in the main body. It's just funky. Control flow is much easier to see if your branches are all explicit (pattern matching or if expressions), and your code will be much more functional if you don't rely on side-effecting return statements to produce values.
So, like several other "discouraged" features in Scala (e.g. asInstanceOf rather than as), the designers of the language made a deliberate choice to make things less pleasant. Combine that with the complexity it introduces into type inference and the practical uselessness of the results in all but the most contrived of scenarios, and it just doesn't make any sense for scalac to attempt this sort of inference.
Moral of the story: learn not to scatter your returns! That's good advice in any language, not just Scala.

Given this (2.8.Beta1):
object Main {
  def who = return 5
  def main(args: Array[String]) = println(who)
}

<console>:5: error: method who has return statement; needs result type
       def who = return 5
...it seems not inadvertent.

I'm not sure why. Perhaps just to discourage the use of the return statement. :)

Related

When to use implicit parameters

I've been using Scala at work, and I have a question related to implicit parameters.
Often I've seen executionContext defined in method definitions and also in class definitions.
At the same time I've seen classes that accept case classes containing configuration data (timeout, adapter, port, etc.) as regular parameters.
My question is: when passing configuration, why is that parameter not defined as implicit?
Or, the other way around, what if executionContext were defined as a regular parameter?
I'm trying to understand when to use implicit parameters and when not to.
EDIT: maybe passing a case class is not the best example; it was just the first idea that came to mind.
Conceptually, implicits are something "external" to the application logic, and explicit parameters are ... well ... explicit.
Consider a function def f(x: Double): Double = x*x
It is a pure function that transforms a given real number into another real number. It makes sense for x to be an explicit parameter, as it is an intrinsic part of what this function is.
Now, suppose you were implementing some sort of approximate algorithm for multiplication, and wanted to control the precision with which your function computes the answer.
You could do def f(x: Double, precision: Int): Double = ???. It would work, but is inconvenient and kinda clumsy:
- The function definition no longer expresses the conceptual "nature" of the function being a pure transformation on the set of real numbers.
- It complicates things at the call site, because everyone using your function must now be aware of this additional parameter to pass around (imagine you are writing a library for non-engineer math majors to use: they understand abstract transformations and complex formulas, but couldn't care less about numeric precision; how often do you think about precision when you need to compute the area of a square?).
- It also makes existing code harder to read and modify.
So, to make it prettier, you can do def f(x: Double)(implicit precision: Int) = ???. This has the advantage of saying exactly what you want: "I have a transformation double => double that will use the implied precision when the actual result is computed." Those math majors can now write their abstract formulas the way they are used to: val area = square(x), without polluting their logic with annoying configuration they don't really care about.
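To make that concrete, here is a minimal runnable sketch of such a function (the rounding-based body is my own assumption, purely to make precision observable):

def square(x: Double)(implicit precision: Int): Double =
  BigDecimal(x * x).setScale(precision, BigDecimal.RoundingMode.HALF_UP).toDouble

implicit val defaultPrecision: Int = 4 // NB: passing a bare Int implicitly is for illustration only
val area = square(2.33) // 5.4289 -- the precision is picked up implicitly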
When to use this exactly is, certainly, a question of opinion and taste (which is expressly forbidden on SO). Someone could certainly argue about the above example that precision is actually part of the transformation's definition, because 5.429 and 5.4289 (the results of f(2.33)(3) and f(2.33)(4) respectively) are two different numbers.
So, at the end of the day, you just gotta use your judgement and your common sense to make a decision for every case you come across.
When using existing libraries, there is another consideration. Consider:
def foo(f: Future[Int], ec: ExecutionContext) =
  f.map { x => x*x }(ec)
    .map { _.toString }(ec)
    .foreach(println)(ec)
This would look a lot nicer and less messy if you made ec implicit, regardless of where you stand philosophically on whether to consider it a part of your transformation or not:
def foo(f: Future[Int])(implicit ec: ExecutionContext) =
  f.map { x => x*x }.map(_.toString).foreach(println)
Implicits can be used when:
- you need only one value of some type
- it is unambiguous how such a value would be defined
  - this includes both manual definitions and using metaprogramming to generate the value based on e.g. how its type is defined
Futures and Akka decided that passing some "globals" as implicits is a reasonable use case, so they pass as implicits:
- ExecutionContext
- ActorSystem, Materializer
- various configs like Timeout
In general, these are things which you don't want put into some static field, but which are passed around everywhere.
However, the rest of the Scala world would solve this issue with some abstraction that passes these things around under the hood: builders of some sort, constructors, abstractions over (dependencies) => result functions, etc.
E.g. cats.effect.IO doesn't need to pass an ExecutionContext around, because it passes its scheduler around when you run it. Only when you want to explicitly change the pool things are run on do you have to use a dedicated method. In Monix, running things also requires you to pass a Scheduler at the end, when the whole computation is composed. So both approaches let you give up on passing all these ExecutionContexts around. In the case of Future it is necessary, because you need control over thread pools, but you also evaluate things eagerly, and passing the ec manually (futureA.flatMap(f)(ec)) would break for-comprehensions.
As a result, outside the Akka ecosystem and raw Futures, implicits are more often used to carry around type classes, as a means to decouple business logic from particular implementations, to allow adding support for new types without modifying the code that uses these implementations, and so on. (There are tons of examples of type classes in Scala, so I'll skip them here.)
Usually, when I read about people using implicits to pass configs around, it is just a matter of time before it ends in grief. Akka and ExecutionContext kind of require them, but you should just pass configs explicitly. You can group them into case classes to pass a bunch of them around, and it is not that much of an issue. You can also gather all the things required as implicits explicitly into one place and do:
case class Configs(dbEC: EC, mapEC: EC)

class SomeBehavior(configs: Configs) {
  def someAction = {
    if (...) {
      implicit val ec: EC = configs.dbEC
      ...
    } else {
      implicit val ec: EC = configs.mapEC
      ...
    }
  }
}
to make them implicit only in the place that needs them. A good rule of thumb is: do you care if there is something passed around that you don't see right there in the code? Usually the answer is yes, you do; you would prefer to see it. The only exceptions are cases where it is fairly obvious where the value comes from, or where you know the value will be fine and don't much care where it came from.
There are a multitude of use-cases of implicit in Scala: under the hood, they boil down to leveraging the compiler's implicit resolution mechanism to fill in things that might not have explicitly been mentioned, but the use-cases are divergent enough that in Scala 3, each use-case (of those that survive into Scala 3...) gets encoded with a different keyword.
In the case of the execution context, implicit arguments are being used to mimic dynamic scope in a language which is normally statically scoped. The primary win from doing this is that it allows behavior further down the call stack to be decided-upon much further up the call stack without having to always explicitly pass on the behavior through the intervening layers of the stack (while providing a way for those intervening layers to cleanly force a different behavior).
Historically, a major example of this was for things like numeric precision. Many numeric operations end up being implemented through iterated refinement (e.g. when square root was implemented in software, it might be implemented using Newton's method), which means there's a trade-off between speed of calculation and precision (or accuracy). With dynamic scoping, there's a neat way to accomplish this: a global variable for the desired level of precision in mathematical results. Your numeric routine checks the value of that variable and governs itself accordingly. The difference from globals in a statically scoped language is that when A calls B which calls C, if A sets the value of x to 1 and B sets it to 2, x will be 2 when checked in C or B, but once B returns to A, x will once again be 1 (in dynamically scoped languages, you can think of a global variable as really being a name for a stack of values, and the language implementation automatically pops the stack as appropriate).
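A small sketch of how implicits mimic that push/pop behavior (the names are illustrative only):

def c()(implicit precision: Int): Int = precision // C reads the "dynamic variable"

def b(): Int = {
  implicit val precision: Int = 2 // B installs its own binding for its callees...
  c()                             // ...and the binding "pops" when B returns
}

def a(): (Int, Int) = {
  implicit val precision: Int = 1 // A's binding
  (b(), c())                      // (2, 1): C sees B's value only while inside B
}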
Dynamic scoping was once fairly popular (especially so in Lisps before the mid/late 1970s); nowadays the only places you really see it are Bourne shells (including bash) and Emacs Lisp, while some languages (Perl and Common Lisp are probably the two main examples) are hybrids: a variable gets declared in a special way to make it dynamically or statically scoped. Static scoping has pretty clearly won: it's easier for the language implementation and the programmer to reason about.
The cost of that ease is that, in our numeric computation example, we end up with something like the following:
def newtonSqrt(x: Double, precision: Int): Double = ???

/** Calculates the length of the hypotenuse of a right triangle with legs of given lengths */
def hypotenuse(x: Double, y: Double, precision: Int): Double =
  newtonSqrt(x*x + y*y, precision)
Thankfully, Scala supports default arguments, so at least we avoid having to write extra versions that use a default precision. Arguably, the precision parameter is exposing an implementation detail (the fact that our calculations aren't necessarily perfectly mathematically accurate): the important thing is that the length of the hypotenuse is the square root of the sum of the squares of the legs.
In Scala, we can make the precision implicit:
// DON'T ACTUALLY PASS AN INT IMPLICITLY!!!!!!
def newtonSqrt(x: Double)(implicit precision: Int): Double = ???

def hypotenuse(x: Double, y: Double)(implicit precision: Int): Double =
  newtonSqrt(x*x + y*y)
(It's actually really bad to ever pass a primitive, or any type which could plausibly be used for something other than describing the behavior in question, through the implicit mechanism; I'm doing it here for didactic clarity.)
The compiler will effectively translate newtonSqrt(x*x + y*y) to (something very similar to) newtonSqrt(x*x + y*y)(precision). Now callers of hypotenuse can decide to fix the precision via an implicit val, or to defer the choice to their own callers by adding the implicit to their signature.
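For instance (continuing the sketch above), a caller could do either of these:

implicit val precision: Int = 6 // fix the precision here...
hypotenuse(3.0, 4.0)

// ...or defer the decision to its own callers:
def perimeter(a: Double, b: Double)(implicit precision: Int): Double =
  a + b + hypotenuse(a, b)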
Dynamic scoping has long been controversial, so it's no surprise that even the constrained dynamic scoping this usage of implicit embeds is controversial. In Scala's case, it doesn't help that in many cases the tooling throws up its hands when it comes to helping you figure out implicits: most of the really furious compiler errors one encounters are related to missing implicits or collisions, and tracing to figure out which values are in the implicit scope at any time is not something the tooling has a history of helping people with. Thus there are many developers who have decided that explicitly threading through configuration is superior to using implicits.
It's largely a matter of taste and situation whether this sort of behavior description is best passed implicitly or explicitly. (It's worth noting that the type-class pattern, especially without a hard requirement for coherence (that there be one and only one possible way to describe the behavior), as is typical in Scala, is just a special case of this kind of behavior description.)
I should also note that it isn't a binary choice between bundling a few settings into a case class vs. passing them implicitly: you can do both:
case class ProcessSettings(sys: ActorSystem, ec: ExecutionContext)

object ProcessSettings {
  implicit def implicitly(implicit sys: ActorSystem, ec: ExecutionContext): ProcessSettings =
    ProcessSettings(sys, ec)
}
def doStuff(x: SomeInput)(implicit settings: ProcessSettings)

Is returning Either/Option/Try/Or considered a viable / idiomatic approach when function has preconditions for arguments?

First of all, I'm very new to Scala and don't have any experience writing production code with it, so I lack understanding of what is considered a good/best practice among community. I stumbled upon these resources:
https://github.com/alexandru/scala-best-practices
https://nrinaudo.github.io/scala-best-practices/
It is mentioned there that throwing exceptions is not very good practice, which made me think about what would then be a good way to define preconditions for a function, because
A function that throws is a bit of a lie: its type implies it's a total function when it's not.
After a bit of research, it seems that using Option/Either/Try/Or (scalactic) is a better approach, since you can use something like T Or IllegalArgumentException as the return type to clearly indicate that the function is actually partial, using the exception as a way to store a message that can be wrapped in other exceptions.
However, lacking Scala experience, I don't quite understand whether this is actually a viable approach for a real project or whether Predef.require is the way to go. I would appreciate it if someone explained how things are usually done in the Scala community and why.
I've also seen Functional assertion in Scala, but while the idea itself looks interesting, I think PartialFunction is not very suitable for the purpose as it is, because often more than one argument is passed, and tuples look like a hack in this case.
Option or Either is definitely the way to go for functional programming.
With Option it is important to document why None might be returned.
With Either, the left side is the unsuccessful value (the "error"), while the right side is the successful value. The left side does not necessarily have to be an Exception (or a subtype of it); it can be a simple error-message String (type aliases are your friend here) or a custom data type that is suitable for your application.
As an example, I usually use the following pattern when error handling with Either:
// Somewhere in a package.scala
type Error = String // Or choose something more advanced
type EitherE[T] = Either[Error, T]

// Somewhere in the program
def fooMaybe(...): EitherE[Foo] = ...
Try should only be used for wrapping unsafe (most of the time, plain Java) code, giving you the ability to pattern-match on the result:
Try(fooDangerous()) match {
  case Success(value) => ...
  case Failure(exception) => ...
}
But I would suggest only using Try locally and then going with the above-mentioned data types from there.
Some advanced datatypes like cats.effect.IO or monix.reactive.Observable contain error handling natively.
I would also suggest looking into cats.data.EitherT for typeclass-based error handling. Read the documentation, it's definitely worth it.
As a sidenote for everyone coming from Java: Scala treats all exceptions the way Java treats RuntimeExceptions. That means that even when an unsafe piece of code from one of your dependencies throws a (checked) IOException, Scala will never require you to catch or otherwise handle the exception. So as a rule of thumb, when using Java dependencies, almost always wrap them in a Try (or an IO if they execute side effects or block the thread).
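For example, a minimal sketch of wrapping a Java API that throws a checked IOException (the helper name readConfig is mine):

import java.nio.file.{Files, Paths}
import scala.util.Try

def readConfig(path: String): Try[String] =
  Try(new String(Files.readAllBytes(Paths.get(path)), "UTF-8"))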
I think your reasoning is correct. If you have a simple total (as opposed to partial) function whose arguments can nevertheless carry invalid values, then the most common and simple solution is to return some optional result like Option, etc.
It's usually not advisable to throw exceptions, as they break FP laws. You can use any library that returns a more advanced type than Option, like Scalaz Validation, if you need to compose results in ways that are awkward with Option.
Another two alternatives I could offer are to use:
Type-constrained arguments that enforce preconditions. Example: val i: Int Refined Positive = 5, based on https://github.com/fthomas/refined. You can also write your own types which wrap primitive types and assert some properties. The problem here is when you have arguments whose valid values are interdependent and mutually exclusive per argument, for instance x > 1 and y < 1, or x < 1 and y > 1. In such a case you can return an optional value instead of using this approach.
Partial functions, which in essence resemble optional return types: case i: Int if i > 0 => .... Docs: https://www.scala-lang.org/api/2.12.1/scala/PartialFunction.html.
For example, PartialFunction's def lift: (A) => Option[B] turns the partial function into a plain function returning an Option result (see the sketch below), which is similar to returning an option directly. The problem with partial functions is that they are a bit awkward to use and not fully FP-friendly.
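A short sketch of the lift approach (my own example):

val reciprocal: PartialFunction[Int, Double] = {
  case i if i != 0 => 1.0 / i
}

val safeReciprocal: Int => Option[Double] = reciprocal.lift
safeReciprocal(2) // Some(0.5)
safeReciprocal(0) // None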
I think Predef.require belongs to the very rare cases where you don't want to allow any invalid data to be constructed, and is more of a stop-everything-if-this-happens kind of measure. An example would be getting arguments you were never supposed to get.
You use the return type of the function to indicate the type of the result.
If you want to describe a function that can fail for whatever reason, of the types you mentioned you would probably return Try or Either: I am going to "try" to give you a result, or I am going to return "either" a success or a failure.
Now you can specify a custom exception
case class ConditionException(message: String) extends RuntimeException(message)
that you would return if your condition is not satisfied, e.g
import scala.util._

def myfunction(a: String, minLength: Int): Try[String] = {
  if (a.size < minLength) {
    Failure(ConditionException(s"string $a is too short"))
  } else {
    Success(a)
  }
}
and with Either you would get
import scala.util._

def myfunction(a: String, minLength: Int): Either[ConditionException, String] = {
  if (a.size < minLength) {
    Left(ConditionException(s"string $a is too short"))
  } else {
    Right(a)
  }
}
Note that the Either solution clearly indicates the error your function might return.

How unsafe is it to cast an arbitrary function X=>Y to X => Unit in scala?

More explicitly, can this code produce any errors in any scenarios:
def foreach[U](f: Int => U) = f.asInstanceOf[Int => Unit](1)
I know it works, and I have a vague idea why: any function, as an instance of a generic type, must define an erased version of apply, and the JVM performs the type check only when the object is actually returned to code where it has a concrete type (often miles away). So, in theory, as long as I never look at the returned value, I should be safe. I don't have a deep enough understanding of Java bytecode, let alone scalac, to have any certainty about it.
Why would I want to do it? Look at the following example:
val b = mutable.Buffer[Int]()
val ints = Seq(1, 2, 3, 4)
ints foreach { b += _ }
It's a typical Scala construct, as far as imperative style can be typical. The function in this example takes an Int as an argument, and as scalac knows it to be an Int, it will create a closure with a specialized apply(x: Int). Unfortunately, its return type in this case is a mutable.Buffer[Int], which is an AnyRef. As far as I was able to see, scalac will never invoke a specialized apply providing an AnyVal argument if the result is an AnyRef (and vice versa). This means that even if the caller applies the function to an Int, underneath the function will box the argument and invoke the erased variant. Here of course it doesn't matter, as they are boxed within the List anyway, but I'm talking about the principle.
For this reason I prefer to define this type of method as foreach(f: X => Unit), rather than foreach[O](f: X => O) as it is in TraversableOnce. If the input sequence in the example had such a signature, everything would compile just as fine, and the compiler would ignore the actual type of the expression and generate a function with a Unit return type, which - when applied to an unboxed Int - would invoke void apply(Int x) directly, without boxing.
The problem arises with interoperability: sometimes I need to call a method expecting a function with a Unit return type, and all I have is a generic function returning Odin knows what. Of course, I could just write f(_) to box it in another function object instead of passing it directly, but that to a large extent makes the whole optimisation of small tight loops moot.
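To illustrate that interoperability point, a sketch of the two options (the names are mine):

def runsUnit(g: Int => Unit): Unit = g(1)

val f: Int => String = _.toString
runsUnit(f(_))                        // safe: allocates a fresh Int => Unit wrapper
runsUnit(f.asInstanceOf[Int => Unit]) // the cast in question: the erased apply is
                                      // invoked and its result simply discarded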

Why should one prefer Option for error handling over exceptions in Scala?

So I'm learning functional Scala, and the book says exceptions break referential transparency, and thus Option should be used instead, like so:
def pattern(s: String): Option[Pattern] = {
  try {
    Some(Pattern.compile(s))
  } catch {
    case e: PatternSyntaxException => None
  }
}
This seems pretty bad; I mean it seems equivalent to:
catch (Exception e) {
  return null;
}
Save for the fact that we can distinguish "null for error" from "null as genuine value". It seems it should at least return something that contains the error information like:
catch {
  case e: Exception => Fail(e)
}
What am I missing?
In this specific section, Option is used mostly as an example, because the operation in question (calculating the mean) is a partial function: it doesn't produce a value for all possible inputs (the collection could be empty, so there's no way to calculate the mean), and Option is a valid choice here. If you can't calculate the mean because the collection is empty, you just return a None.
But there are many other ways to solve this problem. You could use Either[L, R], with Left being the error result and Right being the good result; you could still throw an exception and wrap it inside a Try object (which seems more common nowadays due to its use in Promise and Future computations); you could use ScalaZ Validation if the error is actually a validation issue.
The main concept you should take away from this part is that the error should be part of the return type of the function, and not some magic operation (the exception) that can't be reasonably declared by the types.
And as a shameless plug, I did blog about Either and Try here.
It would be easier to answer this question if you weren't asking "why is Option better than exceptions?" and "why is Option better than null?" and "why is Option better than Try?" all at the same time.
The answer to the first of these questions is that using exceptions in situations that aren't truly exceptional muddles the control flow of your program. This is where referential transparency comes in—it's much easier for me (or you) to reason about your code if I can think in terms of values and don't have to keep track of where exceptions are being thrown and caught.
The answer to the second question (why not null?) is something like "Have you ever had to deal with NullPointerException in Java?".
For the third question, in general you're right—it's better to use a type like Either[Throwable, A] or Try[A] to represent computations that can fail, since they allow you to pass along more detailed information about the failure. In some cases, though, when a function can only fail in a single obvious way, it makes sense to use Option. For example, if I'm performing a lookup in a map, I probably don't really need or want something like an Either[NoSuchElementException, A], where the error is so abstract that I'd probably end up wrapping it in something more domain-specific anyway. So get on a map just returns an Option[A].
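For example:

val ages = Map("alice" -> 37)
ages.get("alice") // Some(37)
ages.get("bob")   // None -- the only failure mode is "no such key"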
You should use util.Try:
scala> import java.util.regex.Pattern
import java.util.regex.Pattern
scala> def pattern(s: String): util.Try[Pattern] = util.Try(Pattern.compile(s))
pattern: (s: String)scala.util.Try[java.util.regex.Pattern]
scala> pattern("<?++")
res0: scala.util.Try[java.util.regex.Pattern] =
Failure(java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 3
<?++
^)
scala> pattern("[.*]")
res1: scala.util.Try[java.util.regex.Pattern] = Success([.*])
The naive example
def pattern(s: String): Pattern = {
  Pattern.compile(s)
}
has a side effect: it can influence the program that uses it by means other than its result (it can throw an exception). This is discouraged in functional programming, because it increases code complexity.
The code
def pattern(s: String): Option[Pattern] = {
  try {
    Some(Pattern.compile(s))
  } catch {
    case e: PatternSyntaxException => None
  }
}
encapsulates the side-effect-producing part of the program. The information about why the Pattern failed is lost, but sometimes it only matters whether or not it fails. If it matters why the method failed, one can use Try (http://www.scala-lang.org/files/archive/nightly/docs/library/index.html#scala.util.Try):
def pattern(s: String): Try[Pattern] = {
  Try(Pattern.compile(s))
}
I think the other two answers give you good suggestions about how to proceed. I would still argue that throwing an exception is well represented in Scala's type system, using the bottom type Nothing. So it is well-typed, and I wouldn't exactly call it a "magic operation".
However... if your method can quite commonly result in an invalid value, that is, if your call site quite reasonably wants to handle such an invalid value straight away, then using Option, Either or Try is a good approach. In a scenario where your call site doesn't really know what to do with such an invalid value, especially if it is an exceptional condition and not the common case, then you should use exceptions, IMO.
The problem with exceptions is precisely not that they don't work well with functional programming, but that they can be difficult to reason about when you have side effects, because then your call site must ensure the side effects are undone in the case of an exception. If your call site is purely functional, passing on an exception doesn't do any damage.
If every function that does anything with integers declared its return type as Try because of division-by-zero or overflow possibilities, that would totally clutter your code. Another very good reason to use exceptions is invalid argument ranges, or requirements. If you expect an argument to be an integer between 0 and x, you may well throw an IllegalArgumentException if it does not meet that property; conveniently in Scala: require(a >= 0 && a < x).
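A sketch of such a require-based precondition (the names are illustrative):

def at(xs: Vector[Int], i: Int): Int = {
  require(i >= 0 && i < xs.length, s"index $i is out of range")
  xs(i) // reached only when the requirement holds
}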

Should I use Unit or leave out the return type for my scala method?

I am not sure what the difference is between specifying Unit as the return type of my scala method or leaving out the return type altogether. What is the difference?
Can anyone please advise?
Implicit Unit return type:
def f() { println("ABC") }
Explicit Unit return type:
def g(): Unit = { println("ABC") }
Return type inferred from the last expression in the method body, still Unit because that is the type of println, but confusing:
def h() = println("ABC")
All the methods above are equivalent. I would prefer f(), because the lack of an = operator after the method signature is alone enough of a signal for me. Use an explicit : Unit when you want to additionally document the method. The last form is confusing, and is actually flagged with a warning in IntelliJ IDEA.
The = operator is crucial. If it is present, it means: "please return whatever the last statement in the method body evaluates to". Obviously you cannot use this syntax for abstract methods. If it is absent, Unit is assumed.
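A quick sketch of the trap this creates:

def twice(x: Int) { x * 2 }  // no '=': procedure syntax, the result type is Unit!
def twiceOk(x: Int) = x * 2  // with '=': the result type is inferred as Int

val y = twice(3) // y: Unit = () -- compiles, but is almost certainly a bug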
The special syntax for procedures (methods returning Unit) was a mistake. Don't use it. It is confusing and dangerous for beginners with a Java/C(++) background, and it is an unnecessary special treatment. Always use the equals sign, with or without type inference (the latter should only be used for private members):
def foo(): Unit = someCodeReturningUnit()
private def bar() = someCodeReturningUnit()
The Scala community is divided about it. On the one hand, not using explicit return types means you can easily forget the = sign, which results in errors that are often annoying to track. On the other hand, Unit-returning methods having a different syntax puts them in a separate category, which pleases some people.
Personally, I'd rather this syntax would go away -- the errors resulting from missing = are annoying to track. But it's not even deprecated, so I'd rather take advantage of it than just suffer the problems of its existence.
So, use whatever you wish. There's going to be people criticizing your choice either way, and people praising it either way.