In general I have problems figuring out how to write tailrecursive functions when working 'inside' monads. Here is a quick example:
This is from a small example application that I am writing to better understand FP in Scala. First of all the user is prompted to enter a Team consisting of 7 Players. This function recursively reads the input:
import cats.effect.{ExitCode, IO, IOApp}
import cats.implicits._
case class Player (name: String)
case class Team (players: List[Player])
/**
* Reads a team of 7 players from the command line.
* #return
*/
def readTeam: IO[Team] = {
def go(team: Team): IO[Team] = { // here I'd like to add #tailrec
if(team.players.size >= 7){
IO(println("Enough players!!")) >>= (_ => IO(team))
} else {
for {
player <- readPlayer
team <- go(Team(team.players :+ player))
} yield team
}
}
go(Team(Nil))
}
private def readPlayer: IO[Player] = ???
Now what I'd like to achieve (mainly for educational purposes) is to be able to write a #tailrec notation in front of def go(team: Team). But I don't see a possibility to have the recursive call as my last statement because the last statement as far as I can see always has to 'lift' my Team into the IO monad.
Any hint would be greatly appreciated.
First of all, this isn't necessary, because IO is specifically designed to support stack-safe monadic recursion. From the docs:
IO is trampolined in its flatMap evaluation. This means that you can safely call flatMap in a recursive function of arbitrary depth, without fear of blowing the stack…
So your implementation will work just fine in terms of stack safety, even if instead of seven players you needed 70,000 players (although at that point you might need to worry about the heap).
This doesn't really answer your question, though, and of course even #tailrec is never necessary, since all it does is verify that the compiler is doing what you think it should be doing.
While it's not possible to write this method in such a way that it can be annotated with #tailrec, you can get a similar kind of assurance by using Cats's tailRecM. For example, the following is equivalent to your implementation:
import cats.effect.IO
import cats.syntax.functor._
case class Player (name: String)
case class Team (players: List[Player])
// For the sake of example.
def readPlayer: IO[Player] = IO(Player("foo"))
/**
* Reads a team of 7 players from the command line.
* #return
*/
def readTeam: IO[Team] = cats.Monad[IO].tailRecM(Team(Nil)) {
case team if team.players.size >= 7 =>
IO(println("Enough players!!")).as(Right(team))
case team =>
readPlayer.map(player => Left(Team(team.players :+ player)))
}
This says "start with an empty team and repeatedly add players until we have the necessary number", but without any explicit recursive calls. As long as the monad instance is lawful (according to Cats's definition—there's some question about whether tailRecM even belongs on Monad), you don't have to worry about stack safety.
As a side note, fa.as(b) is equivalent to fa >>= (_ => IO(b)) but more idiomatic.
Also as a side note (but maybe a more interesting one), you can write this method even more concisely (and to my eye more clearly) as follows:
import cats.effect.IO
import cats.syntax.monad._
case class Player (name: String)
case class Team (players: List[Player])
// For the sake of example.
def readPlayer: IO[Player] = IO(Player("foo"))
/**
* Reads a team of 7 players from the command line.
* #return
*/
def readTeam: IO[Team] = Team(Nil).iterateUntilM(team =>
readPlayer.map(player => Team(team.players :+ player))
)(_.players.size >= 7)
Again there are no explicit recursive calls, and it's even more declarative than the tailRecM version—it's just "perform this action iteratively until the given condition holds".
One postscript: you might wonder why you'd ever use tailRecM when IO#flatMap is stack safe, and one reason is that you may someday decide to make your program generic in the effect context (e.g. via the finally tagless pattern). In this case you should not assume that flatMap behaves the way you want it to, since lawfulness for cats.Monad doesn't require flatMap to be stack safe. In that case it would be best to avoid explicit recursive calls through flatMap and choose tailRecM or iterateUntilM, etc. instead, since these are guaranteed to be stack safe for any lawful monadic context.
Related
This question already has answers here:
Scalaz pipe operator connected with a list method
(2 answers)
Closed 4 years ago.
I find myself writing Scala programs more often recently.
I like to program in a style that often uses long method chains, but sometimes the transformation you want to apply is not a method of the object you want to transform. So I find myself defining:
class Better[T] (t: T){
def xform[U](func: T => U) = func(t)
}
implicit def improve[T](t: T) = new Better(t)
This allows my to write the chains I want, such as
val content = s3.getObject(bucket, key)
.getObjectContent
.xform(Source.fromInputStream)
.mkString
.toInt
Is there any similar facility already in the standard library? If so, how should I have discovered it without resorting to StackOverflow?
It's not the standard library, but it might be "standard enough": with Cats, you should be able to write something like
val content =
s3
.getObject(bucket, key)
.getObjectContent
.pure[Id].map(Source.fromInputStream)
.mkString
.toInt
where pure[Id] wraps the input value into the do-nothing Id monad, and then passes it as argument to Source.fromInputStream.
EDIT: This does not seem to work reliably. If the object already has a method map, then this method is called instead of Id.map.
Smaller example (just to demonstrate the necessary imports):
import cats.Id
import cats.syntax.applicative._
import cats.syntax.functor._
object Main {
def square(x: Int) = x * x
def main(args: Array[String]): Unit = {
println(42.pure[Id].map(square))
}
}
However, writing either
val content =
Source
.fromInputStream(
s3
.getObject(bucket, key)
.getObjectContent
)
.mkString
.toInt
or
val content =
Source
.fromInputStream(s3.getObject(bucket, key).getObjectContent)
.mkString
.toInt
does not require any extra dependencies, and frees you both from the burden of defining otherwise useless case classes, and also from the burden of reindenting your code every time you rename either content or s3.
It also shows how the expressions are actually nested, and what depends on what - there is a reason why the vast majority of mainstream programming languages of the past 50 years have a call-stack.
For the sake of simplicity suppose we have a method called listTail which is defined in the following way:
private def listTail(ls: List[Int]): List[Int] = {
ls.tail
}
Also we have a method which handles the exception when the list is empty.
private def handleEmptyList(ls: List[Int]): List[Int] = {
if(ls.isEmpty) List.empty[Int]
}
Now I want to create a safe version of the listTail method, which uses both methods:
import scala.util.{Try, Success, Failure}
def safeListTail(ls: List[Int]): List[Int] = {
val tryTail: Try[List[Int]] = Try(listTail(ls))
tryTail match {
case Success(list) => list
case Failure(_) => handleEmptyList(ls)
}
}
My question is, if the two private methods are already tested, then should I test the safe method as well? And if yes, how?
I was thinking just to check if the pattern matching cases are executed depending on the input. That is, when we hit the Failure case then the handleEmptyList method is executed. But I am now aware of how to check this.
Or do I need to refactor my code, and put everything in a single method? Even though maybe my private methods are much more complex than this in the example.
My test are written using ScalaTest.
Allowing your methods to throw intentionally is a bad idea and definitely isn't in the spirit of FP. It's probably better to capture failure in the type signature of methods which have the ability to fail.
private def listTail(ls: List[Int]): Try[List[Int]] = Try {
ls.tail
}
Now your users know that this will return either an Success or a Failure and there's no magic stack unrolling. This already makes it easier to test that method.
You can also get rid of the pattern matching with a simple def safeTailList(ls: List[Int]) = listTail(l).getOrElse(Nil) with this formulation -- pretty nice!
If you want to test this, you can make it package private and test it accordingly.
The better idea would be to reconsider your algorithm. There's machinery that makes getting the safe tail built-in:
def safeTailList(ls: List[Int]) = ls.drop(1)
It is actually the other way around: normally, you don't want to test private methods, only the public ones, because they are the ones that define your interactions with the outside world, as long as they work as promised, who cares what your private methods do, that's just implementation detail.
So, the bottom line is - just test your safeListTail, and that's it, no need to test the inner implementation separately.
BTW, you don't need the match there: Try(listTail(ls)).getOrElse(handleEmptyList(ls)) is equivalent to what you have there ... which is actually not a very good idea, because it swallows other exceptions, not just the one that is thrown when the list is empty, a better approach would be actually to reinstate match but get rid of Try:
ls match {
case Nil => handleEmptyList(ls)
case _ => listTail(ls)
}
This is a "real life" OO design question. I am working with Scala, and interested in specific Scala solutions, but I'm definitely open to hear generic thoughts.
I am implementing a branch-and-bound combinatorial optimization program. The algorithm itself is pretty easy to implement. For each different problem we just need to implement a class that contains information about what are the allowed neighbor states for the search, how to calculate the cost, and then potentially what is the lower bound, etc...
I also want to be able to experiment with different data structures. For instance, one way to store a logic formula is using a simple list of lists of integers. This represents a set of clauses, each integer a literal. We can have a much better performance though if we do something like a "two-literal watch list", and store some extra information about the formula in general.
That all would mean something like this
object BnBSolver[S<:BnBState]{
def solve(states: Seq[S], best_state:Option[S]): Option[S] = if (states.isEmpty) best_state else
val next_state = states.head
/* compare to best state, etc... */
val new_states = new_branches ++ states.tail
solve(new_states, new_best_state)
}
class BnBState[F<:Formula](clauses:F, assigned_variables) {
def cost: Int
def branches: Seq[BnBState] = {
val ll = clauses.pick_variable
List(
BnBState(clauses.assign(ll), ll :: assigned_variables),
BnBState(clauses.assign(-ll), -ll :: assigned_variables)
)
}
}
case class Formula[F<:Formula[F]](clauses:List[List[Int]]) {
def assign(ll: Int) :F =
Formula(clauses.filterNot(_ contains ll)
.map(_.filterNot(_==-ll))))
}
Hopefully this is not too crazy, wrong or confusing. The whole issue here is that this assign method from a formula would usually take just the current literal that is going to be assigned. In the case of two-literal watch lists, though, you are doing some lazy thing that requires you to know later what literals have been previously assigned.
One way to fix this is you just keep this list of previously assigned literals in the data structure, maybe as a private thing. Make it a self-standing lazy data structure. But this list of the previous assignments is actually something that may be naturally available by whoever is using the Formula class. So it makes sense to allow whoever is using it to just provide the list every time you assign, if necessary.
The problem here is that we cannot now have an abstract Formula class that just declares a assign(ll:Int):Formula. In the normal case this is OK, but if this is a two-literal watch list Formula, it is actually an assign(literal: Int, previous_assignments: Seq[Int]).
From the point of view of the classes using it, it is kind of OK. But then how do we write generic code that can take all these different versions of Formula? Because of the drastic signature change, it cannot simply be an abstract method. We could maybe force the user to always provide the full assigned variables, but then this is a kind of a lie too. What to do?
The idea is the watch list class just becomes a kind of regular assign(Int) class if I write down some kind of adapter method that knows where to take the previous assignments from... I am thinking maybe with implicit we can cook something up.
I'll try to make my answer a bit general, since I'm not convinced I'm completely following what you are trying to do. Anyway...
Generally, the first thought should be to accept a common super-class as a parameter. Obviously that won't work with Int and Seq[Int].
You could just have two methods; have one call the other. For instance just wrap an Int into a Seq[Int] with one element and pass that to the other method.
You can also wrap the parameter in some custom class, e.g.
class Assignment {
...
}
def int2Assignment(n: Int): Assignment = ...
def seq2Assignment(s: Seq[Int]): Assignment = ...
case class Formula[F<:Formula[F]](clauses:List[List[Int]]) {
def assign(ll: Assignment) :F = ...
}
And of course you would have the option to make those conversion methods implicit so that callers just have to import them, not call them explicitly.
Lastly, you could do this with a typeclass:
trait Assigner[A] {
...
}
implicit val intAssigner = new Assigner[Int] {
...
}
implicit val seqAssigner = new Assigner[Seq[Int]] {
...
}
case class Formula[F<:Formula[F]](clauses:List[List[Int]]) {
def assign[A : Assigner](ll: A) :F = ...
}
You could also make that type parameter at the class level:
case class Formula[A:Assigner,F<:Formula[A,F]](clauses:List[List[Int]]) {
def assign(ll: A) :F = ...
}
Which one of these paths is best is up to preference and how it might fit in with the rest of the code.
While checking Intel's BigDL repo, I stumbled upon this method:
private def recursiveListFiles(f: java.io.File, r: Regex): Array[File] = {
val these = f.listFiles()
val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
good ++ these.filter(_.isDirectory).flatMap(recursiveListFiles(_, r))
}
I noticed that it was not tail recursive and decided to write a tail recursive version:
private def recursiveListFiles(f: File, r: Regex): Array[File] = {
#scala.annotation.tailrec def recursiveListFiles0(f: Array[File], r: Regex, a: Array[File]): Array[File] = {
f match {
case Array() => a
case htail => {
val these = htail.head.listFiles()
val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
recursiveListFiles0(these.filter(_.isDirectory)++htail.tail, r, a ++ good)
}
}
}
recursiveListFiles0(Array[File](f), r, Array.empty[File])
}
What made this difficult compared to what I am used to is the concept that a File can be transformed into an Array[File] which adds another level of depth.
What is the theory behind recursion on datatypes that have the following member?
def listTs[T]: T => Traversable[T]
Short answer
If you generalize the idea and think of it as a monad (polymorphic thing working for arbitrary type params) then you won't be able to implement a tail recursive implementation.
Trampolines try to solve this very problem by providing a way to evaluate a recursive computation without overflowing the stack. The general idea is to create a stream of pairs of (result, computation). So at each step you'll have to return the computed result up to that point and a function to create the next result (aka thunk).
From Rich Dougherty’s blog:
A trampoline is a loop that repeatedly runs functions. Each function,
called a thunk, returns the next function for the loop to run. The
trampoline never runs more than one thunk at a time, so if you break
up your program into small enough thunks and bounce each one off the
trampoline, then you can be sure the stack won't grow too big.
More + References
In the categorical sense, the theory behind such data types is closely related to Cofree Monads and fold and unfold functions, and in general to Fixed point types.
See this fantastic talk: Fun and Games with Fix Cofree and Doobie by Rob Norris which discusses a use case very similar to your question.
This article about Free monads and Trampolines is also related to your first question: Stackless Scala With Free Monads.
See also this part of the Matryoshka docs. Matryoshka is a Scala library implementing monads around the concept of FixedPoint types.
I am a new to Scala coming from Java background, currently confused about the best practice considering Option[T].
I feel like using Option.map is just more functional and beautiful, but this is not a good argument to convince other people. Sometimes, isEmpty check feels more straight forward thus more readable. Is there any objective advantages, or is it just personal preference?
Example:
Variation 1:
someOption.map{ value =>
{
//some lines of code
}
} orElse(foo)
Variation 2:
if(someOption.isEmpty){
foo
} else{
val value = someOption.get
//some lines of code
}
I intentionally excluded the options to use fold or pattern matching. I am simply not pleased by the idea of treating Option as a collection right now, and using pattern matching for a simple isEmpty check is an abuse of pattern matching IMHO. But no matter why I dislike these options, I want to keep the scope of this question to be the above two variations as named in the title.
Is there any objective advantages, or is it just personal preference?
I think there's a thin line between objective advantages and personal preference. You cannot make one believe there is an absolute truth to either one.
The biggest advantage one gains from using the monadic nature of Scala constructs is composition. The ability to chain operations together without having to "worry" about the internal value is powerful, not only with Option[T], but also working with Future[T], Try[T], Either[A, B] and going back and forth between them (also see Monad Transformers).
Let's try and see how using predefined methods on Option[T] can help with control flow. For example, consider a case where you have an Option[Int] which you want to multiply only if it's greater than a value, otherwise return -1. In the imperative approach, we get:
val option: Option[Int] = generateOptionValue
var res: Int = if (option.isDefined) {
val value = option.get
if (value > 40) value * 2 else -1
} else -1
Using collections style method on Option, an equivalent would look like:
val result: Int = option
.filter(_ > 40)
.map(_ * 2)
.getOrElse(-1)
Let's now consider a case for composition. Let's say we have an operation which might throw an exception. Additionaly, this operation may or may not yield a value. If it returns a value, we want to query a database with that value, otherwise, return an empty string.
A look at the imperative approach with a try-catch block:
var result: String = _
try {
val maybeResult = dangerousMethod()
if (maybeResult.isDefined) {
result = queryDatabase(maybeResult.get)
} else result = ""
}
catch {
case NonFatal(e) => result = ""
}
Now let's consider using scala.util.Try along with an Option[String] and composing both together:
val result: String = Try(dangerousMethod())
.toOption
.flatten
.map(queryDatabase)
.getOrElse("")
I think this eventually boils down to which one can help you create clear control flow of your operations. Getting used to working with Option[T].map rather than Option[T].get will make your code safer.
To wrap up, I don't believe there's a single truth. I do believe that composition can lead to beautiful, readable, side effect deferring safe code and I'm all for it. I think the best way to show other people what you feel is by giving them examples as we just saw, and letting them feel for themselves the power they can leverage with these sets of tools.
using pattern matching for a simple isEmpty check is an abuse of pattern matching IMHO
If you do just want an isEmpty check, isEmpty/isDefined is perfectly fine. But in your case you also want to get the value. And using pattern matching for this is not abuse; it's precisely the basic use-case. Using get allows to very easily make errors like forgetting to check isDefined or making the wrong check:
if(someOption.isEmpty){
val value = someOption.get
//some lines of code
} else{
//some other lines
}
Hopefully testing would catch it, but there's no reason to settle for "hopefully".
Combinators (map and friends) are better than get for the same reason pattern matching is: they don't allow you to make this kind of mistake. Choosing between pattern matching and combinators is a different question. Generally combinators are preferred because they are more composable (as Yuval's answer explains). If you want to do something covered by a single combinator, I'd generally choose them; if you need a combination like map ... getOrElse, or a fold with multi-line branches, it depends on the specific case.
It seems similar to you in case of Option but just consider the case of Future. You will not be able to interact with the future's value after going out of Future monad.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Promise
import scala.util.{Success, Try}
// create a promise which we will complete after sometime
val promise = Promise[String]();
// Now lets consider the future contained in this promise
val future = promise.future;
val byGet = if (!future.value.isEmpty) {
val valTry = future.value.get
valTry match {
case Success(v) => v + " :: Added"
case _ => "DEFAULT :: Added"
}
} else "DEFAULT :: Added"
val byMap = future.map(s => s + " :: Added")
// promise was completed now
promise.complete(Try("PROMISE"))
//Now lets print both values
println(byGet)
// DEFAULT :: Added
println(byMap)
// Success(PROMISE :: Added)