Running a background task to update a cache - scala

I'm creating a web server using Akka-HTTP. When receiving a request (to a certain route), the server calls a REST API. The call to this API is fairly long. Currently, I'm caching the result so that the following request uses the cache. I want to have a background task that updates periodically the cache (by calling the API). When receiving the request, the server would use the cached result (instead of having to call the API). The cache would only be updated through the background task.
How would I do that? I can use Akka's scheduling module to run the task periodically but I don't know how to update the cache once the task has run.
Currently, I have something like that:
val roster = Util.get_roster()
var pcache = new SRCache(roster)
val route = cache(lfuCache, keyerFunction)(
pathSingleSlash {
get {
complete(
HttpEntity(
ContentTypes.`text/html(UTF-8)`,Views.index(pcache.get).toString))
}
}
)
pcache.get calls the API (which is quite slow) and I want to replace the API call by something that simply returns the content of the cache and add a background task to update the cache.

Assuming you are using the cache from this example: https://doc.akka.io/docs/akka-http/current/common/caching.html.
import akka.http.caching.scaladsl.Cache
import akka.http.caching.scaladsl.CachingSettings
import akka.http.caching.LfuCache
import akka.http.scaladsl.server.RequestContext
import akka.http.scaladsl.server.RouteResult
import akka.http.scaladsl.model.Uri
import akka.http.scaladsl.server.directives.CachingDirectives._
import scala.concurrent.duration._
// Use the request's URI as the cache's key
val keyerFunction: PartialFunction[RequestContext, Uri] = {
case r: RequestContext => r.request.uri
}
val defaultCachingSettings = CachingSettings(system)
val lfuCacheSettings =
defaultCachingSettings.lfuCacheSettings
.withInitialCapacity(25)
.withMaxCapacity(50)
.withTimeToLive(20.seconds)
.withTimeToIdle(10.seconds)
val cachingSettings =
defaultCachingSettings.withLfuCacheSettings(lfuCacheSettings)
val lfuCache: Cache[Uri, RouteResult] = LfuCache(cachingSettings)
// Create the route
val route = cache(lfuCache, keyerFunction)(innerRoute)
Your background task should be scheduled to update lfuCache. Here is the interface of this cache class you can use: https://doc.akka.io/api/akka-http/10.1.10/akka/http/caching/scaladsl/Cache.html.
Methods of interest:
abstract def get(key: K): Option[Future[V]]
// Retrieves the future instance that is currently in the cache for the given key.
abstract def getOrLoad(key: K, loadValue: (K) ⇒ Future[V]): Future[V]
// Returns either the cached Future for the given key,
// or applies the given value loading function on the key, producing a Future[V].
abstract def put(key: K, mayBeValue: Future[V])
(implicit ex: ExecutionContext): Future[V]
// Cache the given future if not cached previously.
This is the Scheduler interface you can use:
https://doc.akka.io/docs/akka/current/scheduler.html
val cancellable =
system.scheduler.schedule(0 milliseconds, 5 seconds, ...)
Your scheduler will call lfuCache.put(...) every n seconds and update cache.
Next, your code can follow one of these patterns:
Use cached route as your are already doing with:
val route = cache(lfuCache, keyerFunction)(....
Or simply call lfuCache.get(key) or lfuCache.getOrLoad(...) without using caching dsl directives (without this: cache(lfuCache,...).
If you are using Cache class directly for putting and retrieving values, then consider using simpler keys instead of URI values.

Related

Is synchronous HTTP request wrapped in a Future considered CPU or IO bound?

Consider the following two snippets where first wraps scalaj-http requests with Future, whilst second uses async-http-client
Sync client wrapped with Future using global EC
object SyncClientWithFuture {
def main(args: Array[String]): Unit = {
import scala.concurrent.ExecutionContext.Implicits.global
import scalaj.http.Http
val delay = "3000"
val slowApi = s"http://slowwly.robertomurray.co.uk/delay/${delay}/url/https://www.google.co.uk"
val nestedF = Future(Http(slowApi).asString).flatMap { _ =>
Future.sequence(List(
Future(Http(slowApi).asString),
Future(Http(slowApi).asString),
Future(Http(slowApi).asString)
))
}
time { Await.result(nestedF, Inf) }
}
}
Async client using global EC
object AsyncClient {
def main(args: Array[String]): Unit = {
import scala.concurrent.ExecutionContext.Implicits.global
import sttp.client._
import sttp.client.asynchttpclient.future.AsyncHttpClientFutureBackend
implicit val sttpBackend = AsyncHttpClientFutureBackend()
val delay = "3000"
val slowApi = uri"http://slowwly.robertomurray.co.uk/delay/${delay}/url/https://www.google.co.uk"
val nestedF = basicRequest.get(slowApi).send().flatMap { _ =>
Future.sequence(List(
basicRequest.get(slowApi).send(),
basicRequest.get(slowApi).send(),
basicRequest.get(slowApi).send()
))
}
time { Await.result(nestedF, Inf) }
}
}
The snippets are using
Slowwly to simulate slow API
scalaj-http
async-http-client sttp backend
time
The former takes 12 seconds whilst the latter takes 6 seconds. It seems the former behaves as if it is CPU bound however I do not see how that is the case since Future#sequence should executes the HTTP requests in parallel? Why does synchronous client wrapped in Future behave differently from proper async client? Is it not the case that async client does the same kind of thing where it wraps calls in Futures under the hood?
Future#sequence should execute the HTTP requests in parallel?
First of all, Future#sequence doesn't execute anything. It just produces a future that completes when all parameters complete.
Evaluation (execution) of constructed futures starts immediately If there is a free thread in the EC. Otherwise, it simply submits it for a sort of queue.
I am sure that in the first case you have single thread execution of futures.
println(scala.concurrent.ExecutionContext.Implicits.global) -> parallelism = 6
Don't know why it is like this, it might that other 5 thread is always busy for some reason. You can experiment with explicitly created new EC with 5-10 threads.
The difference with the Async case that you don't create a future by yourself, it is provided by the library, that internally don't block the thread. It starts the async process, "subscribes" for a result, and returns the future, which completes when the result will come.
Actually, async lib could have another EC internally, but I doubt.
Btw, Futures are not supposed to contain slow/io/blocking evaluations without blocking. Otherwise, you potentially will block the main thread pool (EC) and your app will be completely frozen.

scala ZIO foreachPar

I'm new to parallel programming and ZIO, i'm trying to get data from an API, by parallel requests.
import sttp.client._
import zio.{Task, ZIO}
ZIO.foreach(files) { file =>
getData(file)
Task(file.getName)
}
def getData(file: File) = {
val data: String = readData(file)
val request = basicRequest.body(data).post(uri"$url")
.headers(content -> "text", char -> "utf-8")
.response(asString)
implicit val backend: SttpBackend[Identity, Nothing, NothingT] = HttpURLConnectionBackend()
request.send().body
resquest.Response match {
case Success(value) => {
val src = new PrintWriter(new File(filename))
src.write(value.toString)
src.close()
}
case Failure(exception) => log error
}
when i execute the program sequentially, it work as expected,
if i tried to run parallel, by changing ZIO.foreach to ZIO.foreachPar.
The program is terminating prematurely, i get that, i'm missing something basic here,
any help is appreciated to help me figure out the issue.
Generally speaking I wouldn't recommend mixing synchronous blocking code as you have with asynchronous non-blocking code which is the primary role of ZIO. There are some great talks out there on how to effectively use ZIO with the "world" so to speak.
There are two key points I would make, one ZIO lets you manage resources effectively by attaching allocation and finalization steps and two, "effects" we could say are "things which actually interact with the world" should be wrapped in the tightest scope possible*.
So lets go through this example a bit, first of all, I would not suggest using the default Identity backed backend with ZIO, I would recommend using the AsyncHttpClientZioBackend instead.
import sttp.client._
import zio.{Task, ZIO}
import zio.blocking.effectBlocking
import sttp.client.asynchttpclient.zio.AsyncHttpClientZioBackend
// Extract the common elements of the request
val baseRequest = basicRequest.post(uri"$url")
.headers(content -> "text", char -> "utf-8")
.response(asString)
// Produces a writer which is wrapped in a `Managed` allowing it to be properly
// closed after being used
def managedWriter(filename: String): Managed[IOException, PrintWriter] =
ZManaged.fromAutoCloseable(UIO(new PrintWriter(new File(filename))))
// This returns an effect which produces an `SttpBackend`, thus we flatMap over it
// to extract the backend.
val program = AsyncHttpClientZioBackend().flatMap { implicit backend =>
ZIO.foreachPar(files) { file =>
for {
// Wrap the synchronous reading of data in a `Task`, but which allows runs this effect on a "blocking" threadpool instead of blocking the main one.
data <- effectBlocking(readData(file))
// `send` will return a `Task` because it is using the implicit backend in scope
resp <- baseRequest.body(data).send()
// Build the managed writer, then "use" it to produce an effect, at the end of `use` it will automatically close the writer.
_ <- managedWriter("").use(w => Task(w.write(resp.body.toString)))
} yield ()
}
}
At this point you will just have the program which you will need to run using one of the unsafe methods or if you are using a zio.App through the main method.
* Not always possible or convenient, but it is useful because it prevents resource hogging by yielding tasks back to the runtime for scheduling.
When you use a purely functional IO library like ZIO, you must not call any side-effecting functions (like getData) except when calling factory methods like Task.effect or Task.apply.
ZIO.foreach(files) { file =>
Task {
getData(file)
file.getName
}
}

How to save & return data from within Future callback

I've been facing an issue the past few days regarding saving & handling data from Futures in Scala. I'm new to the language and the concept of both. Lagom's documentation on Cassandra says to implement roughly 9 files of code and I want to ensure my database code works before spreading it out over that much code.
Specifically, I'm currently trying to implement a proof of concept to send data to/from the cassandra database that lagom implements for you. So far I'm able to send and retrieve data to/from the database, but I'm having trouble returning that data since this all runs asynchronously, and also returning that the data returned successfully.
I've been playing around for a while; The retrieval code looks like this:
override def getBucket(logicalBucket: String) = ServiceCall[NotUsed, String] {
request => {
val returnList = ListBuffer[String]()
println("Retrieving Bucket " + logicalBucket)
val readingFromTable = "SELECT * FROM data_access_service_impl.s3buckets;"
//DB query
var rowsFuture: Future[Seq[Row]] = cassandraSession.selectAll(readingFromTable)
println(rowsFuture)
Await.result(rowsFuture, 10 seconds)
rowsFuture onSuccess {
case rows => {
println(rows)
for (row <- rows) println(row.getString("name"))
for (row <- rows) returnList += row.getString("name")
println("ReturnList: " + returnList.mkString)
}
}
rowsFuture onFailure {
case e => println("An error has occured: " + e.getMessage)
Future {"An error has occured: " + e.getMessage}
}
Future.successful("ReturnList: " + returnList.mkString)
}
}
When this runs, I get the expected database values to 'println' in the onSuccess callback. However, that same variable, which I use in the return statement, outside of the callback prints as empty (and returns empty data as well). This also happens in the 'insertion' function I use, where it doesn't always return variables I set within callback functions.
If I try to put the statement within the callback function, I'm given an error of 'returns Unit, expects Future[String]'. So I'm stuck where I can't return from within the callback functions, so I can't guarantee I'm returning data).
The goal for me is to return a string to the API so that it shows a list of all the s3 bucket names within the DB. That would mean iterating through the Future[Seq[Row]] datatype, and saving the data into a concatenated string. If somebody could help with that, they'll solve 2 weeks of problems I've had reading through Lagom, Akka, Datastax, and Cassandra documentation. I'm flabbergasted at this point (information overload) and there's no clearcut guide I've found on this.
For reference, here's the cassandraSession documentation:
LagomTutorial/Documentation Style Information with their only cassandra-query example
CassandraSession.scala code
The key thing to understand about Future, (and Option, and Either, and Try) is that you do not (in general) get values out of them, you bring computations into them. The most common way to do that is with the map and flatMap methods.
In your case, you want to take a Seq[Row] and transform it into a String. However, your Seq[Row] is wrapped in this opaque data structure called Future, so you can't just rows.mkString as you would if you actually had a Seq[Row]. So, instead of getting the value and performing computation on it, bring your computation rows.mkString to the data:
//DB query
val rowsFuture: Future[Seq[Row]] = cassandraSession.selectAll(readingFromTable)
val rowToString = (row: Row) => row.getString("name")
val computation = (rows: Seq[Row]) => rows.map(rowToString).mkString
// Computation to the data, rather than the other way around
val resultFuture = rowsFuture.map(computation)
Now, when rowsFuture is completed, the new future that you created by calling rowsFuture.map will be fulfilled with the result of calling computation on the Seq[Row] that you actually care about.
At that point you can just return resultFuture and everything will work as anticipated, because the code that calls getBucket is expecting a Future and will handle it as is appropriate.
Why is Future opaque?
The simple reason is because it represents a value that may not currently exist. You can only get the value when the value is there, but when you start your call it isn't there. Rather than have you poll some isComplete field yourself, the code lets you register computations (callbacks, like onSuccess and onFailure) or create new derived future values using map and flatMap.
The deeper reason is because Future is a Monad and monads encompass computation, but do not have an operation to extract that computation out of them
Replace the select for your specific select and the field that you want to obtain for your specific field.The example is only for test, is not a architecture propose.
package ldg.com.dbmodule.model
/**
* Created by ldipotet on 05/11/17.
*/
import com.datastax.driver.core.{Cluster, ResultSet, ResultSetFuture}
import scala.util.{Failure, Success, Try}
import java.util.concurrent.TimeUnit
import scala.collection.JavaConversions._
//Use Implicit Global Context is strongly discouraged! we must create
//our OWN execution CONTEXT !
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Future, _}
import scala.concurrent.duration._
object CassandraDataStaxClient {
//We create here a CallBack in Scala with the DataStax api
implicit def resultSetFutureToScala(rf: ResultSetFuture):
Future[ResultSet] = {
val promiseResult = Promise[ResultSet]()
val producer = Future {
getResultSet(rf) match {
//we write a promise with an specific value
case Success(resultset) => promiseResult success resultset
case Failure(e) => promiseResult failure (new
IllegalStateException)
}
}
promiseResult.future
}
def getResultSet: ResultSetFuture => Try[ResultSet] = rsetFuture => {
Try(
// Other choice can be:
// getUninterruptibly(long timeout, TimeUnit unit) throws
TimeoutException
// for an specific time
//can deal an IOException
rsetFuture.getUninterruptibly
)
}
def main(args: Array[String]) {
def defaultFutureUnit() = TimeUnit.SECONDS
val time = 20 seconds
//Better use addContactPoints and adds more tha one contact point
val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("myOwnKeySpace")
//session.executeAsync es asyncronous so we'll have here a
//ResultSetFuture
//converted to a resulset due to Implicitconversion
val future: Future[ResultSet] = session.executeAsync("SELECT * FROM
myOwnTable")
//blocking on a future is strongly discouraged!! next is an specific
//case
//to make sure that all of the futures have been completed
val results = Await.result(future,time).all()
results.foreach(row=>println(row.getString("any_String_Field"))
session.close()
cluster.close()
}
}

Nesting Futures in Play Action

Im using Play and have an action in which I want to do two things:-
firstly check my cache for a value
secondly, call a web service with the value
Since WS API returns a Future, I'm using Action.async.
My Redis cache module also returns a Future.
Assume I'm using another ExecutionContext appropriately for the potentially long running tasks.
Q. Can someone confirm if I'm on the right track by doing the following. I know I have not catered for the Exceptional cases in the below - just keeping it simple for brevity.
def token = Action.async { implicit request =>
// 1. Get Future for read on cache
val cacheFuture = scala.concurrent.Future {
cache.get[String](id)
}
// 2. Map inside cache Future to call web service
cacheFuture.map { result =>
WS.url(url).withQueryString("id" -> result).get().map { response =>
// process response
Ok(responseData)
}
}
}
My concern is that this may not be the most efficient way of doing things because I assume different threads may handle the task of completing each of the Futures.
Any recommendations for a better approach are greatly appreciated.
That's not specific to Play. I suggest you have a look at documentations explaining how Futures work.
val x: Future[FutureOp2ResType] = futureOp1(???).flatMap { res1 => futureOp2(res1, ???) }
Or with for-comprehension
val x: Future[TypeOfRes] = for {
res1 <- futureOp1(???)
res2 <- futureOp2(res1, ???)
// ...
} yield res
As for how the Futures are executed (using threads), it depends on which ExecutionContext you use (e.g. the global one, the Play one, ...).
WS.get returning a Future, it should not be called within cacheFuture.map, or it will returns a Future[Future[...]].

Is there any point in blocking for a future?

Suppose I have an application serving many requests. One of the requests takes a while to complete. I have the following code:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.Await
import scala.concurrent.Future
def longReq(data:String):String = {
val respFuture = Future{
// some code that computes resp but takes a long time
// can my application process other requests during this time?
resp = ??? // time-consuming step
}
Await.result(respFuture, 2 minutes)
}
If I don't use futures at all, the application will be blocked until resp is computed and no other requests can be served in parallel during that time. However, if I use futures and then block for resp using Await, will the application be able to serve other requests in parallel while resp is being computed?
In your particular example, assuming that longReq is called serially by a request loop, the answer is No, it cannot process anything else. For that longReq would have to return a future instead:
def longReq(data:String): Future[String] = {
Future {
// some code that computes resp but takes a long time
// can my application process other requests during this time?
resp = ??? // time-consuming step
}
}
Of course that just pushes the reason you likely used Await.result further down the line.
The purpose of using Future is to avoid blocking, but it is a turtles-all-the-way-down buy-in. If you want to use a Future, the final recipient has to be able to deal with getting the result in an asynchronous way, i.e. your request loop must have a way to capture the caller in such a way that when the future is finally completed the caller can be told about the result
Let's assume your request loop receives a request object that a response callback, then you would call longReq like this (assuming the use of longReq that returns a Future):
def asyncCall(request: Request): Unit = {
longReq(request.data).map( result => request.response(result) )
}
The most common scenario where you would use the flow is HTTP or other servers where the synchronous Request => Response cycle has an async equivalent of Request => Future[Response], which pretty much any modern server framework offers (Play, Finatra, Scalatra, etc.)
When to use Await.result
The one scenario, where it might be reasonable to use Await.result is if you have a bunch of Futures and are willing to block while the all complete (assuming the use of longReq that returns a Future):
val futures = allData.map(longReq)) // List[Future[String]]
val combined = Future.sequence(futures) // Future[List[String]]
val responses = Await.result(combined, 10.seconds) // List[String]
Of course, combined being a Future, it would still be better to map over it and handle the result asynchronously