Scala Requests module: Is it possible to make concurrent requests? - scala

I am using Scala concurrent module but I am not sure if it works asynchronously or synchronously. I have a list of image URLs and I want to make concurrent requests and get an Array[Byte] for each:
object Runner extends App {
  def getBytesFromUrl(url: String) = {
    requests.get(url).bytes
  }
  val urls = Seq(
    "https://cdn.pixabay.com/photo/2014/02/27/16/10/flowers-276014__340.jpg",
    "https://cdn.pixabay.com/photo/2014/02/27/16/10/flowers-276014__340.jpg",
    "https://cdn.pixabay.com/photo/2014/02/27/16/10/flowers-276014__340.jpg"
  )
  // Would this make concurrent or sequential requests?
  val result = urls.map(url => getBytesFromUrl(url))
}
Would the above code make concurrent or sequential requests? And if it makes sequential requests then what's the right way to make concurrent requests?

I am using Scala concurrent module
You are not, at least not in the code shown.
Maybe you meant the requests-scala library?
but I am not sure if it works asynchronously or synchronously
For concurrency to exist, the calls need to run asynchronously.
Now, again, if you were wondering about requests-scala, then it is synchronous; that should be clear from the README and from the type signatures.
Would the above code make concurrent or sequential requests?
Sequential, as the library explicitly states.
And if it makes sequential requests then what's the right way to make concurrent requests?
For one, you may consider using a different ecosystem since the author has been very clear that he doesn't think you need to be asynchronous at all to be efficient most of the time.
Another thing to consider is if you really need to make three HTTP calls concurrently.
Anyways, you can just use Future.traverse to do what you want.
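import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // or supply your own ExecutionContext

object Runner {
  def getBytesFromUrl(url: String): Array[Byte] = {
    requests.get(url).bytes
  }
  // Wraps each blocking call in a Future and collects the results in input order.
  def getBytesFromUrls(urls: List[String]): Future[List[Array[Byte]]] =
    Future.traverse(urls)(url => Future(getBytesFromUrl(url)))
}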
You can then either compose that Future or Await it if you want to return to synchronous-land.
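As a rough illustration of both options, here is a minimal sketch that composes the Future with map and then blocks on it with Await. The fetchBytes function is a hypothetical stand-in for requests.get(url).bytes so that the example runs without the library, and the 10-second timeout is an arbitrary assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object TraverseSketch {
  // Hypothetical stand-in for requests.get(url).bytes: a blocking call returning bytes.
  def fetchBytes(url: String): Array[Byte] = {
    Thread.sleep(100) // simulate network latency
    url.getBytes("UTF-8")
  }

  // One Future per URL; results come back in the same order as the input list.
  def fetchAll(urls: List[String]): Future[List[Array[Byte]]] =
    Future.traverse(urls)(url => Future(fetchBytes(url)))

  def main(args: Array[String]): Unit = {
    // Composing: stay asynchronous and transform the eventual result.
    val sizes: Future[List[Int]] = fetchAll(List("a", "bb", "ccc")).map(_.map(_.length))
    // Awaiting: block the current thread to get back to synchronous code.
    println(Await.result(sizes, 10.seconds))
  }
}
```

Blocking with Await is usually best kept at the very edge of the program (for example in main or in a test).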
Personal disclaimer: if you actually have a lot of URLs and you need to do other asynchronous things with those bytes, like writing them to disk, then I would recommend looking at other ecosystems like Typelevel, since those provide tools to write that kind of program in a more principled way.

Related

Are Akka actors overkill for doing data crunching/uploading?

I'm quite new to Scala as well as Akka actors. I'm really only reading about their use and implementation now. My background is largely js and python with a bit of C#.
A new service I have to write is going to receive REST requests, then do the following:
Open a socket connection to a message broker
Query an external REST service once
Make many big, long REST requests to another internal service, do math on the responses, and send the result out. Messages are sent through the socket connection as progress updates.
Scalability is the primary concern here, as we may normally receive ~10 small requests per minute, but at unknown times receive several jaw-droppingly enormous and long running requests at once.
Using Scala Futures, the very basic implementation would be something like this:
val smallResponse = smallHttpRequest(args)
smallResponse.onComplete {
  case Success(result) =>
    result.data.grouped(10000).toList.foreach { subList =>
      val bigResponse = getBigSlowHttpRequest(subList)
      bigResponse.onSuccess {
        case crunchableStuff => crunchAndDeliver(crunchableStuff)
      }
    }
  case Failure(error) => handleError(error)
}
My understanding is that on a machine with many cores, letting the JVM handle all the threading underneath the above futures would allow for them all to run in parallel.
This could definitely be written using Akka actors, but I don't know what, if any, benefits I would realize in doing so. Would it be overkill to turn the above into an actor based process with a bunch of workers taking chunks of crunching?
For such an operation, I wouldn't go near Akka Actors -- it's way too much for what looks to be a very basic chain of async requests. The Actor system gives you the ability to safely handle and/or accumulate state in an actor, whilst your task can easily be modeled as a type safe stateless flow of data.
So Futures (or preferably one of the many lazy variants such as the Twitter Future, cats.IO, fs2 Task, Monix, etc) would easily handle that.
No IDE to hand, so there's bound to be a huge mistake in here somewhere!
val smallResponse = smallHttpRequest(args)
// flatMap (not map) because the function passed in returns a Future itself
val result: Future[List[CrunchedData]] = smallResponse.flatMap { response =>
  // List[X] => List[Future[X]]
  val listOfFutures = response.data
    .grouped(10000)
    .toList
    .map(subList => getBigSlowHttpRequest(subList))
  // List[Future[X]] => Future[List[X]]
  Future.sequence(listOfFutures)
}
Afterwards you could pass the future back via the controller if using something like Finch, Http4s, Play, Akka Http, etc. Alternatively, you could inspect the result manually with onComplete, as in your example code.
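The flatMap plus Future.sequence shape can be tried out in isolation. The following is only a sketch: smallHttpRequest and getBigSlowHttpRequest are hypothetical stand-ins that work on integers instead of HTTP responses, and the chunk size is a parameter rather than the 10000 used in the question.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object FanOutSketch {
  // Hypothetical stand-ins for the two services in the question.
  def smallHttpRequest(n: Int): Future[List[Int]] = Future((1 to n).toList)
  def getBigSlowHttpRequest(chunk: List[Int]): Future[Int] = Future(chunk.sum)

  def run(n: Int, chunkSize: Int): Future[List[Int]] =
    smallHttpRequest(n).flatMap { data =>
      // List[List[Int]] => List[Future[Int]]: every chunk request is started here.
      val pending: List[Future[Int]] =
        data.grouped(chunkSize).toList.map(getBigSlowHttpRequest)
      // List[Future[Int]] => Future[List[Int]], in the original chunk order.
      Future.sequence(pending)
    }

  def main(args: Array[String]): Unit =
    println(Await.result(run(10, 4), 5.seconds)) // chunk sums: List(10, 26, 19)
}
```

Note that Future.sequence fails as a whole if any one of the chunk requests fails, so per-chunk error handling (for example with recover) would have to be added before sequencing.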

Play 2.5.x (Scala) -- How does one put a value obtained via wsClient into a (lazy) val

The use case is actually fairly typical. A lot of web services use authorization tokens that you retrieve at the start of a session and you need to send those back on subsequent requests.
I know I can do it like this:
lazy val myData = {
  val request = ws.url("/some/url").withAuth(user, password, WSAuthScheme.BASIC).withHeaders("Accept" -> "application/json")
  Await.result(request.get().map { x => x.json }, 120.seconds)
}
That just feels wrong, as all the docs say never to use Await.
Is there a Future/Promise Scala style way of handling this?
I've found .onComplete which allows me to run code upon the completion of a Promise however without using a (mutable) var I see no way of getting a value in that scope into a lazy val in a different scope. Even with a var there is a possible timing issue -- hence the evils of mutable variables :)
Any other way of doing this?
Unfortunately, there is no way to make this non-blocking - lazy vals are designed to be synchronous and to block any thread accessing them until they are completed with a value (internally a lazy val is represented as a simple synchronized block).
The Future/Promise way in Scala would be to hold a Future[T] or a Promise[T] instead of a val x: T. That, however, means dealing with ExecutionContexts and map calls at every use of the value, and the better resource utilization may not be worth the loss of readability in every case. If you use the value extensively in many parts of your application, it may be OK to leave the Await there.
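To make the trade-off concrete, here is a minimal sketch of the Future-based variant. It does not use Play's WS client: fetchToken is a hypothetical stand-in for the ws.url(...).get() call, and the token value and the counter are made up for illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object LazyFutureSketch {
  // Counts how often the (simulated) remote call is made.
  val fetchCount = new AtomicInteger(0)

  // Hypothetical stand-in for ws.url(...).get().map(_.json).
  def fetchToken(): Future[String] = Future {
    fetchCount.incrementAndGet()
    "token-123"
  }

  // The lazy val holds the Future, not the value: the request is started on
  // first access and the same Future is shared by every later access.
  lazy val token: Future[String] = fetchToken()

  // Every use site has to map over the Future instead of reading a plain value.
  def authorizedCall(path: String): Future[String] =
    token.map(t => s"GET $path with $t")

  def main(args: Array[String]): Unit = {
    println(Await.result(authorizedCall("/a"), 5.seconds))
    println(Await.result(authorizedCall("/b"), 5.seconds))
    println(s"token fetched ${fetchCount.get()} time(s)")
  }
}
```

One caveat with this design is that a failed Future is cached as well, so a real implementation would need some way to retry or refresh the token.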

Playframework non-blocking Action

I came across a problem for which I have not found an answer yet.
I am running Play Framework 2 with Scala.
I was required to write an Action method that performs multiple Future calls.
My question:
1) Is the attached code non-blocking and hence looking the way it should be ?
2) Is there a guarantee that both DAO results are caught at any given time ?
def index = Action.async {
  val t2: Future[Tuple2[List[PlayerCol], List[CreatureCol]]] = for {
    p <- PlayerDAO.findAll()
    c <- CreatureDAO.findAlive()
  } yield (p, c)
  t2.map(t => Ok(views.html.index(t._1, t._2)))
}
Thanks for your feedback.
Is the attached code non-blocking and hence looking the way it should be ?
That depends on a few things. First, I'm going to assume that PlayerDAO.findAll() and CreatureDAO.findAlive() return Future[List[PlayerCol]] and Future[List[CreatureCol]] respectively. What matters most is what these functions actually call internally. Are they making JDBC calls, or using an asynchronous DB driver?
If the answer is JDBC (or some other synchronous db driver), then you're still blocking, and there's no way to make it fully "non-blocking". Why? Because JDBC calls block their current thread, and wrapping them in a Future won't fix that. In this situation, the most you can do is have them block a different ExecutionContext than the one Play is using to handle requests. This is generally a good idea, because if you have several db requests running concurrently, they can block Play's internal thread pool used for handling HTTP requests, and suddenly your server will have to wait to handle other requests (even if they don't require database calls).
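Outside of Play, the idea of moving blocking calls onto their own ExecutionContext can be sketched with the standard library alone. This is an illustration under assumptions, not Play's configuration mechanism: the pool size of 4, the "db-pool" thread names, and the helper runOnDbPool are all made up for the example.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingPoolSketch {
  // Names the threads so it is visible which pool ran a task.
  private val threadFactory: ThreadFactory = new ThreadFactory {
    private val counter = new AtomicInteger(0)
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"db-pool-${counter.incrementAndGet()}")
      t.setDaemon(true) // do not keep the JVM alive for this sketch
      t
    }
  }

  // Dedicated pool for blocking work; the size (4) is an arbitrary assumption.
  val dbContext: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4, threadFactory))

  // Runs a blocking body on the dedicated pool instead of the caller's pool.
  def runOnDbPool[A](body: => A): Future[A] = Future(body)(dbContext)

  def main(args: Array[String]): Unit = {
    val threadName = runOnDbPool {
      Thread.sleep(50) // simulated blocking JDBC call
      Thread.currentThread.getName
    }
    println(Await.result(threadName, 5.seconds))
  }
}
```

The call still blocks a thread, but it is a thread from the dedicated pool, so the threads that handle incoming requests stay free.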
For more on different ExecutionContexts see the thread pools documentation and this answer.
If your answer is an asynchronous database driver like ReactiveMongo (there's also scalikejdbc-async, and maybe some others), then you're in good shape, and I probably made you read a little more than you had to. In that scenario your index controller function would be fully non-blocking.
Is there a guarantee that both DAO results are caught at any given time ?
I'm not quite sure what you mean by this. In your current code, you're actually making these calls in sequence. CreatureDAO.findAlive() isn't called until the Future returned by PlayerDAO.findAll() has completed. Since they do not depend on each other, this is probably not intentional. To make them run in parallel, you should create the Futures before combining them in a for-comprehension:
def index = Action.async {
  val players: Future[List[PlayerCol]] = PlayerDAO.findAll()
  val creatures: Future[List[CreatureCol]] = CreatureDAO.findAlive()
  val t2: Future[(List[PlayerCol], List[CreatureCol])] = for {
    p <- players
    c <- creatures
  } yield (p, c)
  t2.map(t => Ok(views.html.index(t._1, t._2)))
}
The only guarantee about both results being completed is that the yield isn't executed until both Futures have completed successfully (and never, if either of them failed), and likewise the body of t2.map(...) isn't executed until t2 has completed successfully.
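The difference between creating a Future inside the for-comprehension and creating it up front can be observed without a database. In this sketch a Promise plays the role of a first call that is still pending; the names and the helper secondStartedEarly are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object StartOrderSketch {
  // Returns true if the second call had already run while the first was still pending.
  def secondStartedEarly(createSecondUpFront: Boolean): Boolean = {
    val first = Promise[Int]() // stays pending until completed below
    val started = new AtomicBoolean(false)
    def second(): Future[Int] = Future { started.set(true); 2 }

    // Either start the second call now, or leave it to the for-comprehension.
    val eager: Option[Future[Int]] = if (createSecondUpFront) Some(second()) else None

    val combined: Future[Int] = for {
      a <- first.future
      b <- eager.getOrElse(second()) // second() here only runs after `first` completes
    } yield a + b

    eager.foreach(f => Await.ready(f, 5.seconds)) // let the eager call finish
    val observed = started.get()                  // checked while `first` is still pending

    first.success(1)
    Await.result(combined, 5.seconds)
    observed
  }

  def main(args: Array[String]): Unit = {
    println(secondStartedEarly(createSecondUpFront = false)) // false: sequential
    println(secondStartedEarly(createSecondUpFront = true))  // true: started independently
  }
}
```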
Further reading:
Are there any benefits in using non-async actions in Play Framework 2.2?
Understanding the Difference Between Non-Blocking Web Service Calls vs Non-Blocking JDBC

What effect does using Action.async have, since Play uses Netty which is non-blocking

Since Netty is a non-blocking server, what effect does changing an action to use .async have?
def index = Action { ... }
versus
def index = Action.async { ... }
I understand that with .async you will get a Future[SimpleResult]. But since Netty is non-blocking, will Play do something similar under the covers anyway?
What effect will this have on throughput/scalability? Is this a hard question to answer where it depends on other factors?
The reason I am asking is that I have my own custom Action and I want to reset the cookie timeout for every page request, so I am doing this, which is an async call:
object MyAction extends ActionBuilder[abc123] {
  def invokeBlock[A](request: Request[A], block: (abc123[A]) => Future[SimpleResult]) = {
    ...
    val result: Future[SimpleResult] = block(new abc123(..., result))
    result.map(_.withCookies(...))
  }
}
The take away from the above snippet is I am using a Future[SimpleResult], is this similar to calling Action.async but this is inside of my Action itself?
I want to understand what effect this will have on my application design. It seems like just for the ability to set my cookie on a per request basis I have changed from blocking to non-blocking. But I am confused since Netty is non-blocking, maybe I haven't really changed anything in reality as it was already async?
Or have I simply created another async call embedded in another one?
Hoping someone can clarify this with some details and how or what effect this will have in performance/throughput.
You are right that def index = Action { ... } is non-blocking.
The purpose of Action.async is simply to make it easier to work with Futures in your actions.
For example:
def index = Action.async {
  val allOptionsFuture: Future[List[UserOption]] = optionService.findAll()
  allOptionsFuture map { options =>
    Ok(views.html.main(options))
  }
}
Here my service returns a Future, and to avoid dealing with extracting the result I just map it to a Future[SimpleResult] and Action.async takes care of the rest.
If my service was returning List[UserOption] directly I could just use Action.apply, but under the hood it would still be non-blocking.
If you look at Action source code, you can even see that apply eventually calls async:
https://github.com/playframework/playframework/blob/2.3.x/framework/src/play/src/main/scala/play/api/mvc/Action.scala#L432
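The relationship between the two builders can be mimicked with a toy model. To be clear, this is not Play's source code: ToyAction, Request, Result and Handler are hypothetical types made up for this sketch, which only shows how a synchronous builder can delegate to an asynchronous one by lifting the result into an already completed Future.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ToyActionSketch {
  final case class Request(path: String)
  final case class Result(status: Int, body: String)

  // A toy "action" is just a function from a request to a future result.
  type Handler = Request => Future[Result]

  object ToyAction {
    // The caller already provides an asynchronous computation.
    def async(block: Request => Future[Result]): Handler = block

    // The synchronous variant delegates to async by wrapping the finished result.
    def apply(block: Request => Result): Handler =
      async(req => Future.successful(block(req)))
  }

  val plain: Handler = ToyAction(req => Result(200, s"plain ${req.path}"))
  val asynchronous: Handler =
    ToyAction.async(req => Future(Result(200, s"async ${req.path}")))

  def main(args: Array[String]): Unit = {
    // Both handlers have the same type and are used in the same way.
    println(Await.result(plain(Request("/")), 1.second))
    println(Await.result(asynchronous(Request("/")), 1.second))
  }
}
```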
I happened to come across this question. I like the answer from @vptheron, and I also want to share something I read in the book "Reactive Web Applications", which I think is also great.
The Action.async builder expects to be given a function of type Request => Future[Result]. Actions declared in this fashion are not much different from plain Action { request => ... } calls, the only difference is that Play knows that Action.async actions are already asynchronous, so it doesn’t wrap their contents in a future block.
That’s right — Play will by default schedule any Action body to be executed asynchronously against its default web worker pool by wrapping the execution in a future. The only difference between Action and Action.async is that in the second case, we’re taking care of providing an asynchronous computation.
It also presented one sample:
def listFiles = Action { implicit request =>
  val files = new java.io.File(".").listFiles
  Ok(files.map(_.getName).mkString(", "))
}
which is problematic, given its use of the blocking java.io.File API.
Here the java.io.File API is performing a blocking I/O operation, which means that one of the few threads of Play's web worker pool will be hijacked while the OS figures out the list of files in the execution directory. This is the kind of situation you should avoid at all costs, because it means that the worker pool may run out of threads.
The reactive audit tool, available at https://github.com/octo-online/reactive-audit, aims to point out blocking calls in a project.
Hope it helps, too.

Akka actor forward message with continuation

I have an actor which takes the result from another actor and applies some check on it.
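import akka.actor.{Actor, ActorRef}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.duration._

class Actor1(actor2: ActorRef) extends Actor {
  import context.dispatcher // ExecutionContext for map and pipeTo
  implicit val timeout: Timeout = Timeout(5.seconds) // required by ask (?); the value is arbitrary
  def receive = {
    case SomeMessage =>
      val r = actor2 ? NewMessage()
      r.map(someTransform).pipeTo(sender())
  }
}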
Now if I make an ask of Actor1, two futures are generated, which doesn't seem overly efficient. Is there a way to provide a forward with some kind of continuation, or some other approach I could use here? Something like:
case SomeMessage => actor2.forward(NewMessage, someTransform)
Futures are executed in an ExecutionContext, which is like a thread pool. Creating a new future is not as expensive as creating a new thread, but it has its cost. The best way to work with futures is to create as many as needed and compose them so that things that can be computed in parallel are computed in parallel when the necessary resources are available. This way you will make the best use of your machine.
You mentioned that akka documentation discourages excessive use of futures. I don't know where you read this, but what I think it means is to prefer transforming futures rather than creating your own. This is exactly what you are doing by using map. Also, it may mean that if you create a future where it is not needed you are adding unnecessary overhead.
In your case you have a call that returns a future, and you need to apply someTransform and return the result. Using map is the way to go.