Compose stateless futures with side-effecting futures in Scala - scala

While composing futures with a for-yield structure, some with side effects, some without, I introduced a race condition because a future depending on a side effect did not take the result of that side effecting future as an argument.
In short:
future b reads a value that is changed by a side effect from future a, but future a does not explicitly depend on the result of future b and could therefore happen before b finishes reading.
To solve the problem, my colleague introduced a dummy function taking as an argument the result of b and simply throwing it away. This was done to make the dependency explicit.
The actual code is here:
val futureConnection: Future[(Either[String, Connection], Boolean)] =
  for {
    scannerUser <- scanner.orFail("Scanning user not found")
    scannedUser <- futureScannedUser.orFail("Scanned user not found")
    existsInAnyDirection <- connections.existsInAnyDirection(scannerUser, scannedUser)
    connection <- connections.createConnection(scannerUser, scannedUser, form.magicWord, existsInAnyDirection)
  } yield {
    (connection, existsInAnyDirection)
  }
In this case, future b is
connections.existsInAnyDirection(scannerUser, scannedUser)
and future a with the dummy parameter is
connections.createConnection(scannerUser, scannedUser, form.magicWord, existsInAnyDirection)
Notice that the parameter existsInAnyDirection is never used inside createConnection. This effectively creates the dependency graph that createConnection cannot be initiated before existsInAnyDirection is completed.
Now for the question:
Is there a more sane way to make the dependency explicit?
Bonus Info
My own digging tells me that Scala Futures simply don't handle side effects very well. The methods on the Future trait that deal with side effects return Unit, whereas there could very well be results to read from a side-effecting operation, e.g. error codes, generated IDs, or any other meta info, really.

Future handles the side effect of a postponed computation, like A => Future[B].
You tried to mix a few different side effects but composed only one of them, Future[_].
Try to choose a second container; this can be Product or State, depending on your side effect. Then think in terms of composing side effects (maybe you will need monad transformers). After that, your code could look like this (simplest cases):
for {
  scannerUser <- scanner.orFail("Scanning ...")
  (scannedUser, magicWord) <- futureScannedUser.orFail("Scanned ...")
  connection <- connections.createConnection(scannerUser, scannedUser, magicWord)
} yield {
  (connection, existsInAnyDirection)
}
// OR
for {
  (scannerUser, state) <- scanner.orFail("Scanning ...")
  (scannedUser, nextState) <- futureScannedUser(state).orFail("Scanned ...")
  connection <- connections.createConnection(scannerUser, scannedUser, nextState)
} yield {
  (connection, existsInAnyDirection)
}
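One point worth checking before adding a dummy parameter: a generator in a for-comprehension is only evaluated inside the flatMap callback of the generator before it, so a future that is created inside the comprehension cannot start before the previous step has completed. Below is a minimal sketch of that behaviour. The names readFlag and writeFlag are hypothetical stand-ins for existsInAnyDirection and createConnection, not the asker's actual code.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequencedSideEffects {
  // Returns the order in which the two hypothetical operations ran.
  def run(): List[String] = {
    val lock = new Object
    var events = Vector.empty[String]
    def record(event: String): Unit = lock.synchronized { events = events :+ event }

    // Hypothetical stand-in for existsInAnyDirection: a slow read.
    def readFlag(): Future[Boolean] = Future {
      Thread.sleep(100)
      record("read finished")
      false
    }

    // Hypothetical stand-in for createConnection: a side-effecting write.
    // Note that it takes no dummy parameter.
    def writeFlag(): Future[String] = Future {
      record("write started")
      "connection"
    }

    val result = for {
      exists     <- readFlag()
      connection <- writeFlag() // not called until readFlag() has completed
    } yield (connection, exists)

    Await.result(result, 5.seconds)
    lock.synchronized(events.toList)
  }
}
```

If the write future were instead created as a val before the comprehension, it would already be running, and the ordering would no longer be guaranteed.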

Related

How does the cats-effect IO monad really work?

I'm new to functional programming and Scala, and I was checking out the Cats Effect framework and trying to understand what the IO monad does. So far what I've understood is that writing code in the IO block is just a description of what needs to be done and nothing happens until you explicitly run using the unsafe methods provided, and also a way to make code that performs side-effects referentially transparent by actually not running it.
I tried executing the snippet below just to try to understand what it means:
import cats.effect.IO

object Playground extends App {
  var out = 10
  var state = "paused"
  def changeState(newState: String): IO[Unit] = {
    state = newState
    IO(println("Updated state."))
  }
  def x(string: String): IO[Unit] = {
    out += 1
    IO(println(string))
  }
  val tuple1 = (x("one"), x("two"))
  for {
    _ <- x("1")
    _ <- changeState("playing")
  } yield ()
  println(out)
  println(state)
}
And the output was:
13
paused
I don't understand why the assignment state = newState does not run, but the increment-and-assign expression out += 1 does. Am I missing something obvious on how this is supposed to work? I could really use some help. I understand that I can get this to run using the unsafe methods.
In your particular example, I think what is going on is that regular imperative Scala code is unaffected by the IO monad: it runs when it normally would under the rules of Scala.
When you run:
for {
_ <- x("1")
_ <- changeState("playing")
} yield ()
this immediately calls x. That has nothing to do with the IO monad; it's just how for comprehensions are defined. The first step is to evaluate the first statement so you can call flatMap on it.
As you observe, you never "run" the monadic result, so the argument to flatMap, the monadic continuation, is never invoked, resulting in no call to changeState. This is specific to the IO monad, as, e.g., the List monad's flatMap would have immediately invoked the function (unless it were an empty list).
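The difference between code that runs when a method is called and code that runs when the IO value is executed can be illustrated without cats-effect. The sketch below uses a toy LazyIO type, a hypothetical stand-in that is far simpler than the real cats.effect.IO, and two counters: one mutated in plain Scala, one mutated inside the suspended block.

```scala
object LazyIODemo {
  // Toy stand-in for an IO type: wraps a thunk and runs it only on demand.
  final class LazyIO[A](val unsafeRun: () => A) {
    def flatMap[B](f: A => LazyIO[B]): LazyIO[B] =
      new LazyIO(() => f(unsafeRun()).unsafeRun())
    def map[B](f: A => B): LazyIO[B] =
      new LazyIO(() => f(unsafeRun()))
  }
  object LazyIO {
    def apply[A](body: => A): LazyIO[A] = new LazyIO(() => body)
  }

  // Returns the (eager, suspended) counters before and after running the program.
  def demo(): ((Int, Int), (Int, Int)) = {
    var eager = 0
    var suspended = 0

    def step(): LazyIO[Unit] = {
      eager += 1                // plain Scala: runs whenever step() is called
      LazyIO { suspended += 1 } // suspended: runs only when the LazyIO is run
    }

    val program = for {
      _ <- step() // called immediately, to obtain the value flatMap is invoked on
      _ <- step() // lives inside the continuation, so not called yet
    } yield ()

    val before = (eager, suspended)
    program.unsafeRun()
    val after = (eager, suspended)
    (before, after)
  }
}
```

Before running, only the first step() has been called and nothing suspended has executed; after unsafeRun(), both calls and both suspended blocks have run. Moving a mutation inside the suspended block is what makes it part of the described program.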

Why does Finatra use flatMap and not just map?

This might be a really dumb question but I am trying to understand the logic behind using #flatMap and not just #map in this method definition in Finatra's HttpClient definition:
def executeJson[T: Manifest](request: Request, expectedStatus: Status = Status.Ok): Future[T] = {
  execute(request) flatMap { httpResponse =>
    if (httpResponse.status != expectedStatus) {
      Future.exception(new HttpClientException(httpResponse.status, httpResponse.contentString))
    } else {
      Future(parseMessageBody[T](httpResponse, mapper.reader[T]))
        .transformException { e =>
          new HttpClientException(httpResponse.status, s"${e.getClass.getName} - ${e.getMessage}")
        }
    }
  }
}
Why create a new Future when I can just use #map and instead have something like:
execute(request) map { httpResponse =>
  if (httpResponse.status != expectedStatus) {
    throw new HttpClientException(httpResponse.status, httpResponse.contentString)
  } else {
    try {
      FinatraObjectMapper.parseResponseBody[T](httpResponse, mapper.reader[T])
    } catch {
      case e => throw new HttpClientException(httpResponse.status, s"${e.getClass.getName} - ${e.getMessage}")
    }
  }
}
Would this be purely a stylistic difference and using Future.exception is just better style in this case, whereas throwing almost looks like a side-effect (in reality it's not, as it doesn't exit the context of a Future) or is there something more behind it, such as order of execution and such?
Tl;dr:
What's the difference between throwing within a Future vs returning a Future.exception?
From a theoretical point of view, if we take away the exceptions part (they cannot be reasoned about using category theory anyway), then those two operations are completely identical as long as your construct of choice (in your case Twitter Future) forms a valid monad.
I don't want to go into length over these concepts, so I'm just going to present the laws directly (using Scala Future):
import scala.concurrent.ExecutionContext.Implicits.global
// Functor identity law
Future(42).map(x => x) == Future(42)
// Monad left-identity law
val f = (x: Int) => Future(x)
Future(42).flatMap(f) == f(42)
// combining those two, since every Monad is also a Functor, we get:
Future(42).map(x => x) == Future(42).flatMap(x => Future(x))
// and if we now generalise identity into any function:
Future(42).map(x => x + 20) == Future(42).flatMap(x => Future(x + 20))
So yes, as you already hinted, those two approaches are identical.
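Note that == on two Future values compares object identity, so the equalities above are best read as statements about the eventual results rather than as checks you can run verbatim. A small sketch that compares the awaited results instead (the helper name value is made up for this example):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureLaws {
  // Hypothetical helper: block until the future completes and return its result.
  private def value[A](f: Future[A]): A = Await.result(f, 5.seconds)

  // map with a pure function yields the same result as flatMap into a new Future.
  def mapEqualsFlatMap(): Boolean =
    value(Future(42).map(x => x + 20)) == value(Future(42).flatMap(x => Future(x + 20)))

  // Left identity, checked on results rather than on Future references.
  def leftIdentity(): Boolean = {
    val f = (x: Int) => Future(x)
    value(Future(42).flatMap(f)) == value(f(42))
  }
}
```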
However, there are three comments that I have on this, given that we are including exceptions into the mix:
Be careful - when it comes to throwing exceptions, Scala Future (probably Twitter too) violates the left-identity law on purpose, in order to trade it off for some extra safety.
Example:
import scala.concurrent.ExecutionContext.Implicits.global
def sneakyFuture = {
  throw new Exception("boom!")
  Future(42)
}
val f1 = Future(42).flatMap(_ => sneakyFuture)
// Future(Failure(java.lang.Exception: boom!))
val f2 = sneakyFuture
// Exception in thread "main" java.lang.Exception: boom!
As @randbw mentioned, throwing exceptions is not idiomatic to FP and it violates principles such as purity of functions and referential transparency of values.
Scala and Twitter Future make it easy for you to just throw an exception - as long as it happens in a Future context, exception will not bubble up, but instead cause that Future to fail. However, that doesn't mean that literally throwing them around in your code should be permitted, because it ruins the structure of your programs (similarly to how GOTO statements do it, or break statements in loops, etc.).
Preferred practice is to always evaluate every code path into a value instead of throwing bombs around, which is why it's better to flatMap into a (failed) Future than to map into some code that throws a bomb.
Keep in mind referential transparency.
If you use map instead of flatMap and someone takes the code from the map and extracts it out into a function, then you're safer if this function returns a Future, otherwise someone might run it outside of Future context.
Example:
import scala.concurrent.ExecutionContext.Implicits.global
Future(42).map(x => {
  // this should be done inside a Future
  x + 1
})
This is fine. But after a completely valid refactoring (which utilizes the rule of referential transparency), your code becomes this:
def f(x: Int) = {
  // this should be done inside a Future
  x + 1
}
Future(42).map(x => f(x))
And you will run into problems if someone calls f directly. It's much safer to wrap the code into a Future and flatMap on it.
Of course, you could argue that even when using flatMap someone could rip out the f from .flatMap(x => Future(f(x))), but it's not that likely. On the other hand, simply extracting the response processing logic into a separate function fits perfectly with functional programming's idea of composing small functions into bigger ones, and it's likely to happen.
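To make the comparison concrete, here is a minimal sketch of the two styles. It uses scala.concurrent.Future, with Future.failed as the counterpart of Twitter's Future.exception; the BadStatus exception and the status check are hypothetical and only loosely modelled on the Finatra code in the question.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object FailureAsValue {
  // Hypothetical error type for this example.
  final case class BadStatus(code: Int) extends Exception(s"unexpected status $code")

  // Every code path evaluates to a value: a failed or a successful Future.
  def viaFlatMap(status: Int): Future[String] =
    Future.successful(status).flatMap { s =>
      if (s != 200) Future.failed(BadStatus(s))
      else Future.successful("ok")
    }

  // Same outcome, but one code path throws; the Future machinery catches it.
  def viaMap(status: Int): Future[String] =
    Future.successful(status).map { s =>
      if (s != 200) throw BadStatus(s)
      else "ok"
    }

  // Helper for inspecting the final outcome synchronously.
  def outcome[A](f: Future[A]): Try[A] = Try(Await.result(f, 5.seconds))
}
```

Both variants end in a failed Future for a non-200 status, so the observable result is the same. The difference is that the body passed to flatMap remains safe to extract into a standalone function, because it never throws.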
From my understanding of FP, exceptions are not thrown. This would be, as you said, a side-effect. Exceptions are instead values that are handled at some point in the execution of the program.
Cats (and I'm sure other libraries, too) employs this technique as well (https://github.com/typelevel/cats/blob/master/core/src/main/scala/cats/ApplicativeError.scala).
Therefore, the flatMap call allows the exception to be contained within a satisfied Future here and handled at a later point in the program's execution where other exception value handling may also occur.

How to cancel a future action if another future has failed?

I have two futures (two actions on DB tables), and before saving the modifications I want to check that both futures have finished successfully.
Right now, I start the second future inside the first (as a dependency), but I know this is not the best option. I know I can use a for-comprehension to execute both futures in parallel, but even if one fails, the other will still be executed (not tested yet).
firstFuture.dropColumn(tableName) match {
  case Success(_) => secondFuture.deleteEntity(entity)
  case Failure(e) => throw new Exception(e.getMessage)
}
// the first future alters a table, drops a column
// the second future deletes a row from another table
In this case, if the first future executes successfully, the second can still fail, and I then want to revert the update made by the first future. I have heard about SQL transactions, which seem to be what I need, but how do I use them?
val futuresResult = for {
  first <- firstFuture.dropColumn(tableName)
  second <- secondFuture.deleteEntity(entity)
} yield (first, second)
A for-comprehension is much better in my case because there are no dependencies between these two futures and they can be executed in parallel, but this does not solve my problem: the result can be (success, success) or (failed, success), for example.
Regarding Future running sequentially vs in parallel:
This is a bit tricky because Scala Future is designed to be eager. There are some other constructs across various Scala libraries that handle synchronous and asynchronous effects, such as cats IO, Monix Task, ZIO etc. which are designed in a lazy way, and they don't have this behaviour.
The thing with Future being eager is that it will start the computation as soon as it can. Here "start" means schedule it on an ExecutionContext that is either selected explicitly or present implicitly. While it's technically possible that the execution is stalled a bit in case the scheduler decides to do so, it will most likely be started almost instantaneously.
So if you have a value of type Future, it's going to start running then and there. If you have a lazy value of type Future, or a function / method that returns a value of type Future, then it's not.
But even if all you have are simple values (no lazy vals or defs), if the Future definition is done inside the for-comprehension, then it means it's part of a monadic flatMap chain (if you don't understand this, ignore it for now) and it will be run in sequence, not in parallel. Why? This is not specific to Futures; every for-comprehension has the semantics of being a sequential chain in which you can pass the result of the previous step to the next step. So it's only logical that you can't run something in step n + 1 if it depends on something from step n.
Here's some code to demonstrate this.
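import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val program = for {
  _ <- Future { Thread.sleep(5000); println("f1") }
  _ <- Future { Thread.sleep(5000); println("f2") }
} yield ()
Await.result(program, Duration.Inf)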
This program will wait five seconds, then print "f1", then wait another five seconds, and then print "f2".
Now let's take a look at this:
val f1 = Future { Thread.sleep(5000); println("f1") }
val f2 = Future { Thread.sleep(5000); println("f2") }
val program = for {
  _ <- f1
  _ <- f2
} yield ()
Await.result(program, Duration.Inf)
The program, however, will print "f1" and "f2" simultaneously after five seconds.
Note that the sequence semantics are not really violated in the second case. f2 still has the opportunity to use the result of f1. But f2 is not using the result of f1; it's a standalone value that can be computed immediately (defined with a val). So if we change val f2 to a function, e.g. def f2(number: Int), then the execution changes:
val f1 = Future { Thread.sleep(5000); println("f1"); 42 }
def f2(number: Int) = Future { Thread.sleep(5000); println(number) }
val program = for {
  number <- f1
  _ <- f2(number)
} yield ()
As you would expect, this will print "f1" after five seconds, and only then will the other Future start, so it will print "42" after another five seconds.
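The same distinction can be checked without relying on wall-clock timing. The sketch below is one possible way to do it: it uses a latch to observe whether the second future ran while the first was still in progress. The two-thread pool and the helper names (withPool, startedOutside, startedInside) are assumptions made for this example only.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object EagerVsDeferred {
  // Runs `body` with a dedicated two-thread pool and shuts the pool down afterwards.
  private def withPool[A](body: ExecutionContext => A): A = {
    val pool = Executors.newFixedThreadPool(2)
    try body(ExecutionContext.fromExecutor(pool))
    finally pool.shutdown()
  }

  // Both futures are created before the for-comprehension, so both are already scheduled.
  def startedOutside(): Boolean = withPool { implicit ec =>
    val bRan = new CountDownLatch(1)
    val a = Future { bRan.await(2, TimeUnit.SECONDS) } // true if b ran while a was waiting
    val b = Future { bRan.countDown() }
    Await.result(for { sawB <- a; _ <- b } yield sawB, 5.seconds)
  }

  // The second future is created inside the for-comprehension, i.e. only after the first completes.
  def startedInside(): Boolean = withPool { implicit ec =>
    val bRan = new CountDownLatch(1)
    Await.result(
      for {
        sawB <- Future { bRan.await(200, TimeUnit.MILLISECONDS) } // times out: b does not exist yet
        _    <- Future { bRan.countDown() }
      } yield sawB,
      5.seconds
    )
  }
}
```

startedOutside() observes the second future running concurrently, while startedInside() times out waiting for it, because the second Future expression is not evaluated until the first has finished.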
Regarding transactions:
As @cbley mentioned in the comment, this sounds like you want database transactions. For example, in SQL databases this has a very specific meaning and it ensures the ACID properties.
If that's what you need, you need to solve it on the database layer. Future is too generic for that; it's just an effect type that models sync and async computations. When you see a Future value, just by looking at the type, you can't tell if it's the result of a database call or, say, some HTTP call.
For example, doobie describes every database query as a ConnectionIO type. You can have multiple queries lined up in a for-comprehension, just how you would have with Future:
val program = for {
  a <- database.getA()
  _ <- database.write("foo")
  b <- database.getB()
} yield {
  // use a and b
}
But unlike our earlier examples, here getA() and getB() don't return a value of type Future[A], but ConnectionIO[A]. What's cool about that is that doobie completely takes care of the fact that you probably want these queries to be run in a single transaction, so if getB() fails, "foo" will not be committed to the database.
So what you would do in that case is obtain the full description of your set of queries, wrap it into a single value program of type ConnectionIO, and once you want to actually run the transaction, you would do something like program.transact(myTransactor), where myTransactor is an instance of Transactor, a doobie construct that knows how to connect to your physical database.
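And once you transact, your ConnectionIO[A] is turned into a value of the transactor's effect type, e.g. IO[A] when using cats-effect (which you can convert to a Future[A] with unsafeToFuture() if you need one). If the transaction failed, you'll have a failed effect, and nothing will be really committed to your database.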
If your database operations are independent of each other and can be run in parallel, doobie will also allow you to do that. Committing transactions via doobie, both in sequence and in parallel, is quite nicely explained in the docs.

performance enhancement using flatMap vs for-comprehension

I have an actor in my Play app that, on every tick (2 sec), sends a message to some method:
def onSomething(): Future[Unit] = {
  for {
    a <- somethingThatReturnsFuture
    b <- anotherThingThatReturnsFuture
  } yield ()
}
This method makes two calls that return futures, so I decided to use a for-comprehension. But is it true that a for-comprehension is blocking? Would Akka be unable to call this method again, even with the 16 instances it runs, until the method completes?
If I rewrote my method with flatMap/map, would that allow Akka to perform better? Like this:
def onSomething(): Future[Unit] = {
  somethingThatReturnsFuture.flatMap(res1 => {
    anotherThingThatReturnsFuture.map(res2 => {
      //whatever
    })
  })
}
thanks
As per Luis' comment, for-comprehensions are just syntactic sugar:
"Scala's 'for comprehensions' are syntactic sugar for composition of multiple operations with foreach, map, flatMap, filter or withFilter. Scala actually translates a for-expression into calls to those methods, so any class providing them, or a subset of them, can be used with for comprehensions."
They expand into the underlying monadic operations, so there should be no performance hit compared with using the monadic operations directly. If your methods are independent of each other, then you might gain some performance by taking advantage of Futures being eager and starting them outside the for-comprehension, like so:
val aF = somethingThatReturnsFuture()
val bF = anotherThingThatReturnsFuture() // I started without waiting on anyone
for {
  a <- aF
  b <- bF
} yield {
  a + b
}
However if calculation of b depends on a then you will not be able to kick them off in parallel
for {
  a <- somethingThatReturnsFuture
  b <- anotherThingThatReturnsFuture(a)
} yield {
  a + b
}
Here anotherThingThatReturnsFuture "blocks" in the sense of having to wait on somethingThatReturnsFuture.
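The translation described in the quote can be written out by hand. In the sketch below, which uses two trivial futures chosen only for illustration, the sugared and desugared methods are the same program, which is why neither form blocks more or performs better than the other.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object Desugar {
  // For-comprehension form.
  def sugared(): Future[Int] =
    for {
      a <- Future(1)
      b <- Future(2)
    } yield a + b

  // What the compiler turns it into: every generator but the last becomes flatMap,
  // and the last generator together with yield becomes map.
  def desugared(): Future[Int] =
    Future(1).flatMap(a => Future(2).map(b => a + b))

  // Helper for inspecting a result synchronously (only for this example).
  def await[A](f: Future[A]): A = Await.result(f, 5.seconds)
}
```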
First of all, because the methods you are calling return a Future, none of them will block the calling thread.
What is true is that flatMap chains the two operations sequentially: it calls the first method, which returns immediately because it is a Future, and then calls the second once the first has completed.
This happens in both of the options you posted (for-comprehension and flatMap), because they are basically the same.
If you want to call the two methods at the same time (in two different threads), so that you don't know which of them will start executing first, you have to use parallel collections.
But in your case, it is perhaps better not to use them, because using futures guarantees that the thread will not block.

Is it good practice to use a Try for flow control?

There's a function in our codebase with a signature like this:
def hasPermission(...): Try[Unit]
It basically checks if a user has permission to perform a certain action on a certain item. If the user has permission, it returns an empty Success, and if not, it returns a Failure with a specific exception type. This function is often used within comprehensions like this:
for {
  _ <- hasPermission(...)
  res <- doSomething()
} yield res
This seems like bad practice, but I can't quite articulate why I feel that way. To me, it seems like hasPermission should simply return a Boolean.
Is this an appropriate use of a Try?
edit: I think my question is different than the linked one because it's more specific. That one is asking a general question about returning Try[Unit], which I believe is acceptable in some cases.
If the method says hasPermission, then I'd say it should return a Boolean, or a Try[Boolean]. Try[Unit] is not as obvious as Try[Boolean], and the caller would have to inspect the exception to tell if it didn't have the permission, or whether it failed to retrieve the permission info.
Now that said, generally calling hasPermission and then acting depending on the result can cause race conditions (e.g. if the permission is revoked after hasPermission is called). Therefore it's often preferable to do def doSomething(...): Try[Unit] and then raise e.g. a NoPermissionException.
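A minimal sketch of that second suggestion, where the action itself performs the check and reports a dedicated exception. All names here (NoPermissionException, the allowed set, describe) are hypothetical and only illustrate the shape of the API.

```scala
import scala.util.{Failure, Success, Try}

object PermissionAwareAction {
  // Hypothetical exception type for the "no permission" case.
  final class NoPermissionException(user: String)
      extends Exception(s"$user is not allowed to do this")

  // Hypothetical action that checks the permission as part of doing the work,
  // so there is no window between a separate check and the action.
  def doSomething(user: String, allowed: Set[String]): Try[String] =
    if (allowed(user)) Success(s"done for $user")
    else Failure(new NoPermissionException(user))

  // The caller can tell a permission problem apart from any other failure.
  def describe(result: Try[String]): String = result match {
    case Success(value)                    => value
    case Failure(_: NoPermissionException) => "forbidden"
    case Failure(other)                    => s"error: ${other.getMessage}"
  }
}
```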
Generally, the use of exceptions for control flow is an anti-pattern.
Try tries (no pun intended) to encapsulate that flow control, but if you don't actually need to use exceptions, there's no reason to. Scala 2.12's Either implementation seems close to what you probably want:
"Either is right-biased, which means that Right is assumed to be the default case to operate on. If it is Left, operations like map and flatMap return the Left value unchanged."
Let's assume that you are implementing a web server, and this logic is in control of a particular path:
type Request = ...
type Response = String
type ResponseOr[+T] = Either[Response, T]
def checkPermission: ResponseOr[Unit] =
  if (hasPermission) Right(())
  else Left("insufficient permissions")
def doSomething(req: Request): ResponseOr[Something] =
  if (argumentsAreBad(req)) Left("arguments are bad!")
  else Right(new Something)
def makeResponse(result: Something): Response = ???
def handleIt(req: Request): Response = {
  val result = for {
    _ <- checkPermission
    result <- doSomething(req) // pass the request through
  } yield makeResponse(result)
  result.merge // special method for `Either[T, T]` that gives a `T`
}
You'll see similar behavior to Success vs Failure: think of Left as analogous to a Failure, where if at any point one of the flatMap/map steps returns a Left, that's the end result and the rest are "skipped". No exceptions required.
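For a version that can be run as-is, here is a compact variant of the sketch above with concrete types filled in. The parseAmount step and the Boolean permission flag are made up for this example; the short-circuiting behaviour assumes Scala 2.12 or later, where Either is right-biased.

```scala
object EitherShortCircuit {
  type Response = String
  type ResponseOr[+T] = Either[Response, T]

  // Hypothetical permission check driven by a plain flag.
  def checkPermission(allowed: Boolean): ResponseOr[Unit] =
    if (allowed) Right(()) else Left("insufficient permissions")

  // Hypothetical validation step standing in for doSomething.
  def parseAmount(raw: String): ResponseOr[Int] =
    if (raw.nonEmpty && raw.forall(_.isDigit)) Right(raw.toInt)
    else Left("arguments are bad!")

  def handle(allowed: Boolean, raw: String): Response = {
    val result = for {
      _      <- checkPermission(allowed) // a Left here skips everything below
      amount <- parseAmount(raw)
    } yield s"ok: $amount"
    result.merge // both sides are Response, so collapse to a single value
  }
}
```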