I'm building a library that, as part of its functionality, makes HTTP requests. To get it to work in the multiple environments it'll be deployed in I'd like it to be able to work with or without Futures.
One option is to have the library parametrise the type of its response so you can create an instance of the library with type Future, or an instance with type Id, depending on whether you are using an asynchronous HTTP implementation. (Id might be an Identity monad - enough to expose a consistent interface to users)
I've started with that approach but it has got complicated. What I'd really like to do instead is use the Future type everywhere, boxing synchronous responses in a Future where necessary. However, I understand that using Futures will always entail some kind of threadpool. This won't fly in e.g. AppEngine (a required environment).
Is there a way to create a Future from a value that will be executed on the current thread and thus not cause problems in environments where it isn't possible to spawn threads?
(p.s. as an additional requirement, I need to be able to cross build the library back to Scala v2.9.1 which might limit the features available in scala.concurrent)
From what I understand, you wish to execute something and then wrap the result in a Future. In that case, you can always use a Promise:
import scala.concurrent.Promise
val p = Promise[Int]()
p success 42
val f = p.future
You now have a Future wrapping the final value 42.
Promise is very well explained here.
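As a minimal sketch of the idea above (assuming Scala 2.10 or later, where scala.concurrent.Promise and Future.successful are available), both helpers below produce an already-completed Future without needing an ExecutionContext. The object and method names are made up for illustration. Note that later combinators such as map or flatMap still ask for an implicit ExecutionContext.

```scala
import scala.concurrent.{Future, Promise}
import scala.util.Success

object CompletedFutureDemo {
  // Complete a Promise on the calling thread and expose its Future.
  def viaPromise[A](value: A): Future[A] = {
    val p = Promise[A]()
    p.success(value)
    p.future
  }

  // Future.successful is the standard-library shorthand for the same thing.
  def viaSuccessful[A](value: A): Future[A] = Future.successful(value)

  def main(args: Array[String]): Unit = {
    val f = viaPromise(42)
    println(f.isCompleted)
    println(f.value == Some(Success(42)))
  }
}
```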
Take a look at the Scalaz version of the Future trait. It is built on top of a Trampoline mechanism, which executes on the current thread unless fork or apply is called, and it completely removes all the ExecutionContext imports =)
I'm having a hard time understanding what value effect systems, like ZIO or Cats Effect, provide.
It does not make code readable, e.g.:
val wrappedB = for {
  a <- getA() // : ZIO[R, E, A]
  b <- getB(a) // : ZIO[R, E, B]
} yield b
is no more readable to me than:
val a = getA() // : A
val b = getB(a) // : B
I could even argue that the latter is more straightforward, because calling a function executes it, instead of just creating an effect or execution pipeline.
Delayed execution does not sound convincing, because all the examples I've encountered so far just execute the pipeline right away anyway. Being able to execute effects in parallel or multiple times can be achieved in simpler ways IMHO, e.g. C# has Parallel.ForEach
Composability. Functions can be composed without using effects, e.g. by plain composition.
Pure functional methods. In the end the pure instructions will be executed, so it seems like it's just pretending DB access is pure. It does not help to reason, because while construction of the instructions is pure, executing them is not.
I may be missing something or just downplaying the benefits above or maybe benefits are bigger in certain situations (e.g. complex domain).
What are the biggest selling points to use effect systems?
Because it makes it easy to deal with side effects. From your example:
a <- getA() // ZIO[R, E, A] (doesn't have to be ZIO btw)
val a = getA(): A
The first getA accounts for the effect and the possibility of returning an error, a side effect. This would be like getting an A from some db where the said A may not exist or where you lack permission to access it. The second getA would be like a simple def getA = "A".
How do we put these methods together? What if one throws an error? Should we still proceed to the next method or just quit? What if one blocks your thread?
Hopefully that addresses your second point about composability. To quickly address the rest:
Delayed execution. There are probably two reasons for this. The first is that you don't want an execution to start accidentally, just because you wrote the expression. Starting eagerly breaks what the cool guys refer to as referential transparency. The second is that concurrent execution requires a thread pool or execution context. Normally we want a centralized place where we can fine-tune it for the whole app, and when building a library we can't provide it ourselves; it's the users who provide it. In fact we can also defer the choice of effect type: all you do is define how the effect should behave, and the users can use ZIO, Monix, etc. It's totally up to them.
Purity. Technically speaking wrapping a process in a pure effect doesn't necessarily mean the underlying process actually uses it. Only the implementation knows if it's really used or not. What we can do is lift it to make it compatible with the composition.
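To make the delayed-execution point concrete, here is a toy sketch. The IO class below is a hypothetical, hand-rolled type written only for illustration; it is not the ZIO or Cats Effect API. It shows that building a for-comprehension only describes the work, and that the same description can be run more than once.

```scala
object LazyEffectDemo {
  // A toy effect type: a description of a computation, run only on demand.
  // Hypothetical illustration, not the ZIO or Cats Effect API.
  final class IO[A](val unsafeRun: () => A) {
    def map[B](f: A => B): IO[B] = new IO(() => f(unsafeRun()))
    def flatMap[B](f: A => IO[B]): IO[B] = new IO(() => f(unsafeRun()).unsafeRun())
  }

  object IO {
    def apply[A](a: => A): IO[A] = new IO(() => a)
  }

  // Returns (side effects seen before running, side effects after two runs).
  def demo(): (Int, Int) = {
    var runs = 0
    val getA: IO[Int] = IO { runs += 1; 1 }
    val program: IO[Int] = for {
      a <- getA
      b <- IO(a + 1)
    } yield b
    val before = runs   // building the program has not executed getA
    program.unsafeRun()
    program.unsafeRun() // running the same description again repeats the effect
    (before, runs)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```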
What makes programming with ZIO or Cats great is concurrent programming. There are other reasons too, but this is IMHO the one where I got the "Aha! Now I get it" moment.
Try to write a program that monitors the content of several folders and, for each file added to the folders, parses its content, but with no more than 4 files being parsed at the same time (like the example in the video "What Java developers could learn from ZIO" by Adam Fraser on YouTube, https://www.youtube.com/watch?v=wxpkMojvz24).
I mean, this is really easy to write in ZIO :)
The whole idea of combining data structures (a ZIO is a data structure) in order to make bigger data structures is so easy to understand that I would not want to code without it for complex problems :)
The two examples are not comparable: in the first form, an error in the first statement marks the value of the whole reified sequence as failed, while in the second form it halts the whole program. To be comparable, the second form would have to be a function definition encapsulating the two statements, followed by an assignment of the result of calling it.
But more than that, in order to completely mimic the first form, some additional code has to be written, to catch exceptions and build a true faulty result, while all these things are made for free by ZIO...
I think that the ability to cleanly propagate the error state between successive statements is the real value of the ZIO approach. Any composite ZIO program fragment is then fully composable itself.
That's the main benefit of any workflow based approach, anyway.
It is this modularity which gives to effect handling its real value.
Since an effect is an action which structurally may produce errors, handling effects like this is an excellent way to handle errors in a composable way. In fact, handling effects consists in handling errors!
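The error-propagation argument can be sketched with the standard library alone. The example below uses Either (right-biased from Scala 2.12 on) rather than ZIO, and getA, getB and the error message are hypothetical stand-ins: the first failing step short-circuits the rest, and the composite program is itself a value that can be composed further.

```scala
object ErrorPropagationDemo {
  // Hypothetical steps: each returns either an error message or a value.
  def getA(fail: Boolean): Either[String, Int] =
    if (fail) Left("getA failed") else Right(1)

  def getB(a: Int): Either[String, String] = Right(s"b from $a")

  // The first Left short-circuits; getB is not called in that case.
  def program(fail: Boolean): Either[String, String] =
    for {
      a <- getA(fail)
      b <- getB(a)
    } yield b

  def main(args: Array[String]): Unit = {
    println(program(fail = false))
    println(program(fail = true))
  }
}
```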
These two Scalaz types
scalaz.concurrent.Task[+A]
scalaz.effect.IO[A]
seem very conceptually similar. They both:
Represent a potentially side-effecting computation
Produce a success (A) or failure (Exception) result
Have Monad instances
Can be unsafely unwrapped with run or unsafePerformIO
How do they differ? Why do they both exist?
The core difference is that IO simply delays the execution of something but runs it on the current thread. Task, on the other hand, is capable of executing something concurrently (thus the implicit ExecutorService).
Additionally, Task carries the semantics of scalaz's Future (a Future that is more composable than the classic Scala version, and that gives you finer control of concurrency by making forking explicit instead of executing tasks in parallel as soon as they are instantiated). Furthermore, if you read the source for scalaz's Future, it will point you to Task as a more robust version that can be used in prod.
Finally, note that attemptRun of Task returns \/[Throwable, A] while unsafePerformIO of IO just returns A. This speaks to more robust treatment of real-life error scenarios.
As far as I know, everywhere you would use IO to compose effects you would use Task in a real production codebase.
Here is a good blog post about Task usage: Tim Perrett's Task Post
Disclaimer: I have no scala experience for now, so my question is connected with very basics.
Consider the following example (it may be incomplete):
import akka.actor.{ActorSystem, Props}
import akka.io.IO
import spray.can.Http
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import akka.actor.Actor
import spray.routing._
import spray.http._
object Boot extends App {
  implicit val system = ActorSystem("my-actor-system")
  val service = system.actorOf(Props[MyActor], "my")
  implicit val timeout = Timeout(5.seconds)
  IO(Http) ? Http.Bind(service, interface = "localhost", port = 8080)
}
class MyActor extends Actor with MyService {
  def actorRefFactory = context
  def receive = runRoute(myRoute)
}
trait MyService extends HttpService {
  val myRoute =
    path("my") {
      post {
        complete {
          "PONG"
        }
      }
    }
}
My question is: what actually happens when control reaches complete block? The question seems to be too general, so let me split it.
I see creation of a single actor in the example. Does it mean that the application is single-threaded and uses only one cpu core?
What happens if I make a blocking call inside complete?
If p. 1 is true and p. 2 will block, how do I dispatch requests to utilize all cpus? I see two ways: actor per request and actor per connection. The second one seems to be reasonable, but I cannot find the way to do it using spray library.
If the previous question is irrelevant, will detach directive do? And what about passing function returning Future to complete directive? What is the difference between detach and passing function returning the Future?
What is the proper way to configure number of working threads and balance requests/connections?
It would be great if you could point me to explanations in the official documentation. It is very extensive and I believe I am missing something.
Thank you.
It's answered here by Mathias - one of the Spray authors. Copying his reply for the reference:
In the end the only thing that really completes the request is a call
to requestContext.complete. Thereby it doesn't matter which thread
or Actor context this call is made from. All that matters is that it
does happen within the configured "request-timeout" period. You can of
course issue this call yourself in some way or another, but spray
gives you a number of pre-defined constructs that maybe fit your
architecture better than passing the actual RequestContext around.
Mainly these are:
The complete directive, which simply provides some sugar on top of the "raw" ctx => ctx.complete(…) function literal.
The Future Marshaller, which calls ctx.complete from a future.onComplete handler.
The produce directive, which extracts a function T => Unit that can later be used to complete the request with an instance of a custom
type.
Architecturally, in most cases, it's a good idea to not have the API
layer "leak into" the core of your application. I.e. the application
should not know anything about the API layer or HTTP. It should only
deal with objects of its own domain model. Therefore passing the
RequestContext directly to the application core is mostly not the best
solution.
Resorting to the "ask" and relying on the Future Marshaller is an
obvious, well understood and rather easy alternative. It comes with
the (small) drawback that an ask comes with a mandatory timeout check
itself which logically isn't required (since the spray-can layer
already takes care of request timeouts). The timeout on the ask is
required for technical reasons (so the underlying PromiseActorRef can
be cleaned up if the expected reply never comes).
Another alternative to passing the RequestContext around is the
produce directive (e.g. produce(instanceOf[Foo]) { completer =>
…). It extracts a function that you can pass on to the application
core. When your core logic calls complete(foo) the completion logic
is run and the request completed. Thereby the application core remains
decoupled from the API layer and the overhead is minimal. The
drawbacks of this approach are twofold: first the completer function
is not serializable, so you cannot use this approach across JVM
boundaries. And secondly the completion logic is now running directly
in an actor context of the application core, which might change
runtime behavior in unwanted ways if the Marshaller[Foo] has to do
non-trivial tasks.
A third alternative is to spawn a per-request actor in the API layer
and have it handle the response coming back from the application core.
Then you do not have to use an ask. Still, you end up with the same
problem that the PromiseActorRef underlying an ask has: how to clean
up if no response ever comes back from the application core? With a
per-request actor you have full freedom to implement a solution for
this question. However, if you decide to rely on a timeout (e.g. via
context.setReceiveTimeout) the benefits over an "ask" might be
non-existent.
Which of the described solutions best fits your architecture you need
to decide yourself. However, as I hopefully was able to show, you do
have a couple of alternatives to choose from.
To answer some of your specific questions: there is only a single actor/handler that services the route, so if you make it block, Spray will block. This means you want to either complete the route immediately or dispatch the work using any of the 3 options above.
There are many examples on the web for these 3 options. The easiest is to wrap your code in a Future. Check also "actor per request" option/example. In the end your architecture will define the most appropriate way to go.
Finally, Spray runs on top of Akka, so all Akka configuration still applies. See HOCON reference.conf and application.conf for Actor threading settings.
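As a rough, framework-free sketch of the "wrap your code in a Future" option mentioned above: the example uses only the Scala standard library, and slowLookup, lookupAsync and the pool size of 4 are assumptions made for illustration. In a Spray route, the resulting Future would be handed to complete (via the Future Marshaller described in the quoted reply) instead of being awaited.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingOffloadDemo {
  // Hypothetical blocking call standing in for a database or file lookup.
  def slowLookup(key: String): String = {
    Thread.sleep(100)
    s"value-for-$key"
  }

  // Runs the blocking call on whichever ExecutionContext is supplied, so the
  // calling thread is free to carry on while the lookup is in progress.
  def lookupAsync(key: String)(implicit ec: ExecutionContext): Future[String] =
    Future(slowLookup(key))

  def run(): String = {
    // A dedicated pool for blocking work; the size 4 is an arbitrary choice.
    val pool = Executors.newFixedThreadPool(4)
    implicit val blockingEc: ExecutionContext =
      ExecutionContext.fromExecutorService(pool)
    // Await is used here only so the demo can return the value.
    try Await.result(lookupAsync("my"), 5.seconds)
    finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = println(run())
}
```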
I need to add a WebSocket-to-TCP proxy to my Play 2.3 application, but while the outgoing TCP connection using Akka I/O supports back-pressure, I don't see anything for the WebSocket. There's clearly no support in the actor-based API, but James Roper says:
Iteratees handle this by design, you can't feed a new element into an
iteratee until last future it returns has been redeemed, because you
don't have a reference to it until then.
However, I don't see what he's referring to. Iteratee.foreach, as used in the examples, seems too simple. The only futures I see in the iteratee API are for completing the result of the computation. Should I be completing a Future[Unit] for each message or what?
Iteratee.foldM lets you pass a state along to each step, much like the regular fold operation, and return a future. If you do not have such a state, you can just pass the unit value () and it will behave as a foreach that will not accept the next element until the future completes.
Here is an example of a utility function that does exactly that:
import play.api.libs.iteratee.Iteratee
import scala.concurrent.{ExecutionContext, Future}
def foreachM[E](f: E => Future[Unit])(implicit ec: ExecutionContext): Iteratee[E, Unit] =
  Iteratee.foldM[E, Unit](())((_, e) => f(e))
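The back-pressure behaviour can be illustrated without Play. The sketch below is a plain-Future analogue, not the Iteratee API, and foreachSeq is a hypothetical helper: it folds over a sequence so that the handler for the next element is only invoked once the Future returned for the previous element has completed.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SequentialFoldDemo {
  // Feeds elements to f one at a time: the next call to f starts only after
  // the Future returned for the previous element has completed.
  def foreachSeq[E](items: Seq[E])(f: E => Future[Unit])(
      implicit ec: ExecutionContext): Future[Unit] =
    items.foldLeft(Future.successful(())) { (acc, e) => acc.flatMap(_ => f(e)) }

  def run(): Vector[Int] = {
    import ExecutionContext.Implicits.global
    val lock = new Object
    var seen = Vector.empty[Int]
    val done = foreachSeq(Seq(1, 2, 3)) { n =>
      // Record the element asynchronously; elements arrive strictly in order.
      Future(lock.synchronized { seen = seen :+ n })
    }
    Await.result(done, 5.seconds)
    lock.synchronized(seen)
  }

  def main(args: Array[String]): Unit = println(run())
}
```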
Iteratee is not the same as Iterator. An Iteratee does indeed inherently support back-pressure (in fact you'll find yourself with the opposite problem - by default they don't do any buffering (at least within the pipeline - of course async sockets still have receive buffers), so you sometimes have to add an explicit buffering step to an enumerator/iteratee pipeline to get reasonable performance). The examples look simple but that just means the framework is doing what a framework does and making things easy. If you're doing a significant amount of work, or making async calls, in your handlers, then you shouldn't use the simple Iteratee.foreach, but instead use an API that accepts a Future-based handler; if you're blocking within an Iteratee then you block the whole thing, waste your threads, and defeat the point of using them at all.
I am reading this blog post that claims Futures are not "functional" since they are just wrappers of side-effectful computations. For instance, they contain RPC calls, HTTP requests, etc. Is that correct?
The blog post gives the following example:
def twoUsersFeed(a: UserHandle, b: UserHandle)
                (implicit ec: ExecutionContext): Future[Html] =
  for {
    feedA <- usersFeed(a)
    feedB <- usersFeed(b)
  } yield feedA ++ feedB
you lose the desired property: consistent results (the referential transparency). Also you lose the property of making as few requests as possible. It is difficult to use multi-valued requests and have composable code.
I am afraid I don't get it. Could you explain how we lose the consistent result in this case?
The blog post fails to draw a proper distinction between Future itself and the way it's commonly used, IMO. You could write pure-functional code with Future, if you only ever wrote Futures that called pure, total functions; such code would be referentially transparent and "functional" in every remotely reasonable sense of the word.
What is true is that Futures give you limited control of side effects, if you use them with methods that have side effects. If you create a Future wrapping webClient.get, then creating that Future will send an HTTP call. But that's not a fact about Future, that's a fact about webClient.get!
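A small sketch of why this matters for referential transparency, using only the standard library. Here fetch is a hypothetical stand-in for a side-effecting call like webClient.get, and it simply counts its invocations. Reusing one Future value and writing the same expression twice are not interchangeable, because each Future starts its side effect as soon as it is created.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureEagernessDemo {
  // Returns (calls made when one Future is reused,
  //          calls made when the expression is written twice).
  def run(): (Int, Int) = {
    import ExecutionContext.Implicits.global
    val calls = new AtomicInteger(0)
    // Hypothetical side-effecting call: counts how often it is started.
    def fetch(): Future[Int] = Future(calls.incrementAndGet())

    // Reusing one Future value: the side effect happens once.
    val shared = fetch()
    Await.result(shared.zip(shared), 5.seconds)
    val afterShared = calls.get()

    // Substituting the expression for the value: the side effect happens twice.
    Await.result(fetch().zip(fetch()), 5.seconds)
    val afterInlined = calls.get() - afterShared

    (afterShared, afterInlined)
  }

  def main(args: Array[String]): Unit = println(run())
}
```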
There is a grain of truth in this blog post. Separating expressing your computation from executing it, completely, via e.g. the Free monad, can result in more efficient and more testable code. E.g. you can create a "query language", where you express an operation like "fetch the profile photos of all the mutual friends of A and B" without actually running it. This makes it easier to test if your logic is correct (because it's very easy to make e.g. a test implementation that can "run" the same queries - or even just inspect the "query object" directly) and, as I think the blog post is trying to suggest, means you could e.g. combine multiple requests to fetch the same profile. (This isn't even purely a functional-programming concern - some OO books have the idea of a "command pattern" - though IME functional programming tools like for/yield syntax make it much easier to work in this way). Whereas if all you have is a fetchProfile method that, when run, immediately fires off a HTTP request, then if your code logic requests the same profile twice, there's no way to avoid fetching the same profile twice.
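A minimal sketch of the "query as data" idea described above. FetchProfile, runAll and the fetch function are all hypothetical names, and this is far simpler than a Free monad. Because each request is just a value, an interpreter can inspect the whole batch and run each distinct request only once.

```scala
object QueryAsDataDemo {
  // A request described as plain data; nothing is fetched when it is created.
  final case class FetchProfile(userId: String)

  // Hypothetical interpreter: runs each distinct query once and reports how
  // many underlying fetches were actually performed.
  def runAll(
      queries: List[FetchProfile],
      fetch: String => String): (Map[FetchProfile, String], Int) = {
    var calls = 0
    val results = queries.distinct.map { q =>
      calls += 1
      q -> fetch(q.userId)
    }.toMap
    (results, calls)
  }

  def main(args: Array[String]): Unit = {
    val queries = List(FetchProfile("a"), FetchProfile("b"), FetchProfile("a"))
    println(runAll(queries, id => s"profile-$id"))
  }
}
```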
But that isn't really about Future per se, and IMO this blog post is more confusing than helpful.