Can Spark ForEachPartitionAsync be async on worker nodes? - scala

I write a custom spark sink. In my addBatch method I use ForEachPartitionAsync which if I'm not wrong only makes the driver work asynchronously, returning a future.
val work: FutureAction[Unit] = rdd.foreachPartitionAsync { rows =>
val sourceInfo: StreamSourceInfo = serializeRowsAsInputStream(schema, rows)
val ackIngestion = Future {
ingestRows(sourceInfo) } andThen {
case Success(ingestion) => ackIngestionDone(partitionId, ingestion)
}
Await.result(ackIngestion, timeOut) // I would like to remove this line..
}
work onSuccess {
case _ => // move data from temporary table, report success of all workers
}
work onFailure{
//delete tmp data
case t => throw t.getCause
}
I can't find a way to run the worker nodes without blocking on the Await call, as if I remove them a success is reported to the work future object although the future didn't really finish.
Is there a way to report to the driver that all the workers finished
their asynchronous jobs?
Note: I looked at the foreachPartitionAsync function and it has only one implementation that expects a function that returns a Unit (i would've expected it to have another one returning a future or maybe a CountDownLatch..)

Related

Scala IO wait during map external call

I will start mentioning I am very new to Scala but I have now to maintain a legacy code where some new feature are being tried to be include.
I have the following code:
Where a list is coming as a parameter where a new output needs to be processed. However it seems like code is not waiting for the response to the external service when processing.
def historyBet(jackpotListUser : List[JackpotBetHistory])(implicit MC: AppMarkerContext) : List[LegacyJackpotHistoryResponse] =
for {
bet <- jackpotListUser
prize = jackpotIntegratorService.findJackpotByJackpotHumanId(bet.jackpotHumanId) match {
case Some(jackpot : JackpotResponse) =>
...
extra code extracting price from jackpot : JackpotResponse
...
extra code generating result with prize
} yield result
How can I do a call to jackpotIntegratorService.findJackpotByJackpotHumanId to execute at that time. instead of returning something that F[Option....?
def findJackpotByJackpotHumanId(
jackpotHumanId: JackpotHumanId
)(implicit MC: AppMarkerContext): F[Option[JackpotResponse]] =
jackpotIntegratorRepo.findJackpotByJackpotHumanId(jackpotHumanId)
where it is finally implemented as:
override def findJackpotByJackpotHumanId(
jackpotHumanId: JackpotHumanId
)(implicit mc: AppMarkerContext): IO[Option[JackpotResponse]] =
... code calling an API which return the IO.
Thanks!
I thought I could do IO.await somewhere... but not sure where or how...
because in the "historyBet" function I got a F[] when it was an IO... so what is the syntax to be able to wait for the response and the continue?
Extra Comment:
The real issue we notice is that the method call is starting (the logs shows part of it) but the caller with in the maps continues too.
prize = jackpotIntegratorService.findJackpotByJackpotHumanId
this part of the code continues even when prize, which we want the final object JackpotResponse, not the IO or F.
So, if your method needs to call an IO then it must return an IO unless you unsafeRunSync them... but, as the name suggest, you should not do that.
So the return type is now: IO[List[LegacyJackpotHistoryResponse]
And can be implemented like this:
def historyBet(jackpotListUser: List[JackpotBetHistory])(implicit MC: AppMarkerContext): IO[List[LegacyJackpotHistoryResponse]] =
jackpotListUser.traverse { bet =>
jackpotIntegratorService.findJackpotByJackpotHumanId(bet.jackpotHumanId).map {
case Some(jackpot) =>
// ...
case None =>
// ...
}
}

Play framework action response delayed when creating multiple futures

In the following action it should return response immediately after hitting URL but instead it waits till all the Future blocks are started and then only sends response. It waits till "Starting for group 10" is logged in console even though "Returning from action" is logged immediately after hitting URL.
def test = Action { implicit request =>
Future.successful(0 to 150).foreach { items =>
items.grouped(15).zipWithIndex.foreach{ itemGroupWithIndex =>
val (itemGroup, index) = itemGroupWithIndex
Future {
logger.info("************** Starting for group " + index)
itemGroup.foreach { item =>
Thread.sleep(1000)
logger.info("Completed for item " + item)
}
}
}
}
logger.info("************** Returning from action **************")
Ok(views.html.test("test page"))
}
I am not able to understand reason behind this delay and how can i make this action send response immediately.
Play framework version 2.5.9
Your Action is not Async. You have a synchronous endpoint which is why you see the Returning from action printed immediately on the console. You should probably use the Action.async as your processing type. Using async Actions will drastically improve the overall performance of your application and is highly recommended when building high throughput and responsive web applications.
Two points in your code needs to change
Asynchronous Action: Because you are using Future, the action should be asynchronous: Action.async{...}.
No Blocking Code: The whole point of using Future and asynchronous programming is not to have a code that "blocks" the execution. So I suggest to remove the Thread.sleep(1000) part of the code.
Note that if you write your code non-blocking way; whenever the action method get the result; it will perform the required action(s), such as logging or providing the view.
This is because there are race conditions in your Futures.
You need to ensure that you are returning a single Future[T] and not a Future[Future[T]] (or any layered variances).
If the Futures are independent of each other, use Future.sequence
example:
def future: Future[String] = Future.successful("hi")
def action = Action.async { _ =>
val futures: Seq[Future[String]] = (1 to 50).map(_ => future()).toSeq
val oneFuture = Future.sequence(futures)
oneFuture
}
This will avoid race conditions
FYI, this has nothing to do with the Play framework. This is concurrent programming in scala.

Returning value from Scala future completion

Coming from a Java background, I have been trying to teach myself Scala for some time now. As part of that, I am doing a small pet project that exposes a HTTP endpoint that saves the registration numberof a vehicle against the owner and returns the status.
To give more context, I am using Slick as FRM which performs DB operations asynchronously and returns a Future.
Based on the output of this Future, I want to set the status variable to return back to the client.
Here, is the code
def addVehicleOwner(vehicle: Vehicle): String = {
var status = ""
val addFuture = db.run((vehicles returning vehicles.map(_.id)) += vehicle)
addFuture onComplete {
case Success(id) => {
BotLogger.info(LOGTAG, s"Vehicle registered at $id ")
status = String.format("Registration number - '%s' mapped to owner '%s' successfully", vehicle.registration,
vehicle.owner)
println(s"status inside success $status") //--------- (1)
}
case Failure(e: SQLException) if e.getMessage.contains("SQLITE_CONSTRAINT") => {
status = updateVehicleOwner(vehicle)
BotLogger.info(LOGTAG, s"Updated owner='${vehicle.owner}' for '${vehicle.registration}'")
}
case Failure(e) => {
BotLogger.error(LOGTAG, e)
status = "Sorry, unable to add now!"
}
}
exec(addFuture)
println(s"Status=$status") //--------- (2)
status
}
// Helper method for running a query in this example file:
def exec[T](sqlFuture: Future[T]):T = Await.result(sqlFuture, 1 seconds)
This was fairly simple in Java. With Scala, I am facing the following problems:
The expected value gets printed at (1), but (2) always prints empty string and same is what method returns. Can someone explain why?
I even tried marking the var status as #volatile var status, it still evaluates to empty string.
I know, that the above is not the functional way of doing things as I am muting state. What is the clean way of writing code for such cases.
Almost all the examples I could find described how to map the result of Success or handle Failure by doing a println. I want to do more than that.
What are some good references of small projects that I can refer to? Specially, that follow TDD.
Instead of relying on status to complete inside the closure, you can recover over the Future[T] which handle the exception if they occur, and always returns the result you want. This is taking advantage of the nature of expressions in Scala:
val addFuture =
db.run((vehicles returning vehicles.map(_.id)) += vehicle)
.recover {
case e: SQLException if e.getMessage.contains("SQLITE_CONSTRAINT") => {
val status = updateVehicleOwner(vehicle)
BotLogger.info(
LOGTAG,
s"Updated owner='${vehicle.owner}' for '${vehicle.registration}'"
)
status
}
case e => {
BotLogger.error(LOGTAG, e)
val status = "Sorry, unable to add now!"
status
}
}
val result: String = exec(addFuture)
println(s"Status = $result")
result
Note that Await.result should not be used in any production environment as it synchronously blocks on the Future, which is exactly the opposite of what you actually want. If you're already using a Future to delegate work, you want it to complete asynchronously. I'm assuming your exec method was simply for testing purposes.

Akka Future - Parallel versus Concurrent?

From the well-written Akka Concurrency:
As I understand, the diagram points out, both numSummer and charConcat will run on the same thread.
Is it possible to run each Future in parallel, i.e. on separate threads?
The picture on the left is them running in parallel.
The point of the illustration is that the Future.apply method is what kicks off the execution, so if it doesn't happen until the first future's result is flatMaped (as in the picture on the right), then you don't get the parallel execution.
(Note that by "kicked off", i mean the relevant ExecutionContext is told about the job. How it parallelizes is a different question and may depend on things like the size of its thread pool.)
Equivalent code for the left:
val numSummer = Future { ... } // execution kicked off
val charConcat = Future { ... } // execution kicked off
numSummer.flatMap { numsum =>
charConcat.map { string =>
(numsum, string)
}
}
and for the right:
Future { ... } // execution kicked off
.flatMap { numsum =>
Future { ... } // execution kicked off (Note that this does not happen until
// the first future's result (`numsum`) is available.)
.map { string =>
(numsum, string)
}
}

Consuming a service using WS in Play

I was hoping someone can briefly go over the various ways of consuming a service (this one just returns a string, normally it would be JSON but I just want to understand the concepts here).
My service:
def ping = Action {
Ok("pong")
}
Now in my Play (2.3.x) application, I want to call my client and display the response.
When working with Futures, I want to display the value.
I am a bit confused, what are all the ways I could call this method i.e. there are some ways I see that use Success/Failure,
val futureResponse: Future[String] = WS.url(url + "/ping").get().map { response =>
response.body
}
var resp = ""
futureResponse.onComplete {
case Success(str) => {
Logger.trace(s"future success $str")
resp = str
}
case Failure(ex) => {
Logger.trace(s"future failed")
resp = ex.toString
}
}
Ok(resp)
I can see the trace in STDOUT for success/failure, but my controller action just returns "" to my browser.
I understand that this is because it returns a FUTURE and my action finishes before the future returns.
How can I force it to wait?
What options do I have with error handling?
If you really want to block until feature is completed look at the Future.ready() and Future.result() methods. But you shouldn't.
The point about Future is that you can tell it how to use the result once it arrived, and then go on, no blocks required.
Future can be the result of an Action, in this case framework takes care of it:
def index = Action.async {
WS.url(url + "/ping").get()
.map(response => Ok("Got result: " + response.body))
}
Look at the documentation, it describes the topic very well.
As for the error-handling, you can use Future.recover() method. You should tell it what to return in case of error and it gives you new Future that you should return from your action.
def index = Action.async {
WS.url(url + "/ping").get()
.map(response => Ok("Got result: " + response.body))
.recover{ case e: Exception => InternalServerError(e.getMessage) }
}
So the basic way you consume service is to get result Future, transform it in the way you want by using monadic methods(the methods that return new transformed Future, like map, recover, etc..) and return it as a result of an Action.
You may want to look at Play 2.2 -Scala - How to chain Futures in Controller Action and Dealing with failed futures questions.