Using Scala 2.11.7, rxscala_2.11 0.25.0, and RxJava 1.0.16, the callbacks I pass to oddFutures.foreach never get called in AsyncDisjointedChunkMultiprocessing.process():
package jj.async
import scala.concurrent.Future
import scala.concurrent.ExecutionContext
import rx.lang.scala.Observable
import jj.async.helpers._
/* Problem: How to multi-process records asynchronously in chunks.
Processing steps:
- fetch finite # of records from a repository (10 at-a-time (<= 10 for last batch) because of downstream limitations)
- process ea. chunk through a filter asynchronously (has 10-record input limit)
- compute the reverse of the filtered result
- enrich (also has 10-record input limit) filtered results asynchronously
- return enriched filtered results once all records are processed
*/
object AsyncDisjointedChunkMultiprocessing {
private implicit val ec = ExecutionContext.global
def process(): List[Enriched] = {
@volatile var oddsBuffer = Set[Int]()
@volatile var enrichedFutures = Observable just Set[Enriched]()
oddFutures.foreach(
odds =>
if (odds.size + oddsBuffer.size >= chunkSize) {
val chunkReach = chunkSize - oddsBuffer.size
val poors = oddsBuffer ++ odds take chunkReach
enrichedFutures = enrichedFutures + poors
oddsBuffer = odds drop chunkReach
} else {
oddsBuffer ++= odds
},
error => throw error,
() => enrichedFutures + oddsBuffer)
enrichedFutures.toBlocking.toList.flatten
}
private def oddFutures: Observable[Set[Int]] =
Repository.query(chunkSize) { chunk =>
evenFuture(chunk) map {
filtered => chunk -- filtered
}
}
private def evenFuture(chunk: Set[Int]): Future[Set[Int]] = {
checkSizeLimit(chunk)
Future { Remote even chunk }
}
}
class Enriched(i: Int)
object Enriched {
def apply(i: Int) = new Enriched(i)
def enrich(poors: Set[Int]): Set[Enriched] = {
checkSizeLimit(poors);
Thread.sleep(1000)
poors map { Enriched(_) }
}
}
object Repository {
def query(fetchSize: Int)(f: Set[Int] => Future[Set[Int]]): Observable[Set[Int]] = {
implicit val ec = ExecutionContext.global
Observable.from {
Thread.sleep(20)
f(Set(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
Thread.sleep(20)
f(Set(11, 12, 13, 14, 15, 16, 17, 18, 19, 20))
Thread.sleep(15)
f(Set(21, 22, 23, 24, 25))
}
}
}
package object helpers {
val chunkSize = 10
implicit class EnrichedObservable(enrichedObs: Observable[Set[Enriched]]) {
def +(poors: Set[Int]): Observable[Set[Enriched]] = {
enrichedObs merge Observable.just {
Enriched.enrich(poors)
}
}
}
def checkSizeLimit(set: Set[_ <: Any]) =
if (set.size > chunkSize) throw new IllegalArgumentException(s"$chunkSize-element limit violated: ${set.size}")
}
// unmodifiable
object Remote {
def even = { xs: Set[Int] =>
Thread.sleep(1500)
xs filter { _ % 2 == 0 }
}
}
Is there something wrong with the way I'm creating my Observable.from(Future) in Repository.query()?
The problem is that I was trying to create an observable from multiple futures, but Observable.from(Future) accepts only a single future. The compiler did not complain because I carelessly omitted the separating commas, so the whole block was passed as one argument to an unsuspecting overload and only its last expression was used. My solution:
object Repository {
def query(f: Set[Int] => Future[Set[Int]])(fetchSize: Int = 10): Observable[Future[Set[Int]]] =
// observable (as opposed to list) because modeling a process
// where the total result size is unknown beforehand.
// Also, not creating or applying because it blocks the futures
(1 to 21 by fetchSize).foldLeft(Observable just Future((Set[Int]()))) { (obs, i) =>
obs + f(DataSource.fetch(i)())
}
}
object DataSource {
def fetch(begin: Int)(fetchSize: Int = 10) = {
val end = begin + fetchSize
Thread.sleep(200)
(for {
i <- begin until end
} yield i).toSet
}
}
where:
implicit class FutureObservable(obs: Observable[Future[Set[Int]]]) {
def +(future: Future[Set[Int]]) =
obs merge (Observable just future)
}
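To see why the version without commas compiled, here is a minimal sketch that uses only the standard library. The `record` and `countArgs` helpers are hypothetical stand-ins, not part of RxScala. It shows that a brace block passed as an argument evaluates every statement, but only its last expression becomes the argument.

```scala
import scala.collection.mutable.ListBuffer

object BlockArgumentSketch {
  // Hypothetical helper: remembers every value it is called with.
  val evaluated = ListBuffer.empty[Int]
  def record(i: Int): Int = { evaluated += i; i }

  // Hypothetical varargs method standing in for an overloaded factory.
  def countArgs(xs: Int*): Int = xs.size

  // With commas: three separate arguments.
  val withCommas: Int = countArgs(record(1), record(2), record(3))

  // Without commas: one block whose value is its last expression, record(6).
  val withBlock: Int = countArgs { record(4); record(5); record(6) }

  def main(args: Array[String]): Unit = {
    println(s"with commas: $withCommas argument(s)")
    println(s"with block:  $withBlock argument(s)")
    println(s"evaluated:   ${evaluated.mkString(", ")}")
  }
}
```

In the question's code the same thing happens with the three `f(Set(...))` calls: all three futures are started, but only the last one reaches Observable.from.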
I need functionality like monix.Observable.bufferTimedAndCounted, but with a custom "weigher". I found the bufferTimedWithPressure operator, which allows using an item's weight:
val subj = PublishSubject[String]()
subj
.bufferTimedWithPressure(1.seconds, 5, _ => 3)
.subscribe(s => {
println(s)
Future.successful(Ack.Continue)
})
for (i <- 1 to 60) {
Thread.sleep(100)
subj.onNext(i.toString)
}
But emission happens only once per specified duration. I need behavior like bufferTimedAndCounted, where emission also happens as soon as the buffer is full. How can I achieve that?
I copied BufferTimedObservable from the monix sources and changed it slightly to add a weight function (note: I have not tested it in all cases):
import java.util.concurrent.TimeUnit
import monix.execution.Ack.{Continue, Stop}
import monix.execution.cancelables.{CompositeCancelable, MultiAssignCancelable}
import monix.execution.{Ack, Cancelable}
import monix.reactive.Observable
import monix.reactive.observers.Subscriber
import scala.collection.mutable.ListBuffer
import scala.concurrent.Future
import scala.concurrent.duration.{Duration, FiniteDuration, MILLISECONDS}
/**
* Copied from monix sources, adapted to use size instead of count
*
*/
final class BufferTimedWithWeigthObservable[+A](source: Observable[A], timespan: FiniteDuration, maxSize: Int, sizeOf: A => Int)
extends Observable[Seq[A]] {
require(timespan > Duration.Zero, "timespan must be strictly positive")
require(maxSize >= 0, "maxSize must be non-negative")
def unsafeSubscribeFn(out: Subscriber[Seq[A]]): Cancelable = {
val periodicTask = MultiAssignCancelable()
val connection = source.unsafeSubscribeFn(new Subscriber[A] with Runnable {
self =>
implicit val scheduler = out.scheduler
private[this] val timespanMillis = timespan.toMillis
// MUST BE synchronized by `self`
private[this] var ack: Future[Ack] = Continue
// MUST BE synchronized by `self`
private[this] var buffer = ListBuffer.empty[A]
// MUST BE synchronized by `self`
private[this] var currentSize = 0
private[this] var sizeOfLast = 0
private[this] var expiresAt = scheduler.clockMonotonic(MILLISECONDS) + timespanMillis
locally {
// Scheduling the first tick, in the constructor
periodicTask := out.scheduler.scheduleOnce(timespanMillis, TimeUnit.MILLISECONDS, self)
}
// Runs periodically, every `timespan`
def run(): Unit = self.synchronized {
val now = scheduler.clockMonotonic(MILLISECONDS)
// Do we still have time remaining?
if (now < expiresAt) {
// If we still have time remaining, it's either a scheduler
// problem, or we rushed to signaling the bundle upon reaching
// the maximum size in onNext. So we sleep some more.
val remaining = expiresAt - now
periodicTask := scheduler.scheduleOnce(remaining, TimeUnit.MILLISECONDS, self)
} else if (buffer != null) {
// The timespan has passed since the last signal so we need
// to send the current bundle
sendNextAndReset(now, byPeriod = true).syncOnContinue(
// Schedule the next tick, but only after we are done
// sending the bundle
run())
}
}
// Must be synchronized by `self`
private def sendNextAndReset(now: Long, byPeriod: Boolean = false): Future[Ack] = {
val prepare = if (byPeriod) buffer else buffer.dropRight(1)
// Reset
if (byPeriod) {
buffer = ListBuffer.empty[A]
currentSize = 0
sizeOfLast = 0
} else {
buffer = buffer.takeRight(1)
currentSize = sizeOfLast
}
// Setting the time of the next scheduled tick
expiresAt = now + timespanMillis
ack = ack.syncTryFlatten.syncFlatMap {
case Continue => out.onNext(prepare)
case Stop => Stop
}
ack
}
def onNext(elem: A): Future[Ack] = self.synchronized {
val now = scheduler.clockMonotonic(MILLISECONDS)
buffer.append(elem)
sizeOfLast = sizeOf(elem)
currentSize = currentSize + sizeOfLast
// 9 and 9 true
//10 and 9
if (expiresAt <= now || (maxSize > 0 && maxSize < currentSize)) {
sendNextAndReset(now)
}
else {
Continue
}
}
def onError(ex: Throwable): Unit = self.synchronized {
periodicTask.cancel()
ack = Stop
buffer = null
out.onError(ex)
}
def onComplete(): Unit = self.synchronized {
periodicTask.cancel()
if (buffer.nonEmpty) {
val bundleToSend = buffer.toList
// In case the last onNext isn't finished, then
// we need to apply back-pressure, otherwise this
// onNext will break the contract.
ack.syncOnContinue {
out.onNext(bundleToSend)
out.onComplete()
}
} else {
// We can just stream directly
out.onComplete()
}
// GC relief
buffer = null
// Ensuring that nothing else happens
ack = Stop
}
})
CompositeCancelable(connection, periodicTask)
}
}
How to use it:
object MonixImplicits {
implicit class RichObservable[+A](source: Observable[A]) {
def bufferTimedAndSized(timespan: FiniteDuration, maxSize: Int, sizeOf: A => Int): Observable[Seq[A]] = {
new BufferTimedWithWeigthObservable(source, timespan, maxSize, sizeOf)
}
}
}
import MonixImplicits._
someObservable.bufferTimedAndSized(1.seconds, 5, item => item.size)
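The size-triggered branch of the operator above can be illustrated on a plain collection. This is only a sketch under stated assumptions: `chunkByWeight` is a hypothetical helper, it ignores the timer entirely, and it is not part of monix. Like the operator, it closes a bundle when adding the next element would push the total weight past the limit, and carries that element over into the next bundle.

```scala
object WeightedChunksSketch {
  // Hypothetical helper: splits items into bundles whose total weight stays
  // within maxWeight (a single over-weight item still gets its own bundle).
  def chunkByWeight[A](items: Seq[A], maxWeight: Int)(sizeOf: A => Int): Vector[Vector[A]] = {
    val zero = (Vector.empty[Vector[A]], Vector.empty[A], 0)
    val (closed, open, _) = items.foldLeft(zero) {
      case ((acc, cur, weight), item) =>
        val w = sizeOf(item)
        if (cur.nonEmpty && weight + w > maxWeight) (acc :+ cur, Vector(item), w) // close bundle, carry item over
        else (acc, cur :+ item, weight + w)                                       // keep filling the bundle
    }
    // Flush whatever is left, as onComplete does in the operator.
    if (open.nonEmpty) closed :+ open else closed
  }

  def main(args: Array[String]): Unit =
    println(chunkByWeight(Seq("aa", "bbb", "c", "dddd", "ee"), 5)(_.length))
}
```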
I'm trying to execute a function within a given time frame, but if the computation times out I want a partial result instead of just an exception.
The attached code solves it.
The timedRun function is from the question "Computation with time limit".
Is there a better approach?
package ga
object Ga extends App {
//this is the ugly...
var bestResult = "best result";
try {
val result = timedRun(150)(bestEffort())
} catch {
case e: Exception =>
print ("timed at = ")
}
println(bestResult)
//dummy function
def bestEffort(): String = {
var res = 0
for (i <- 0 until 100000) {
res = i
bestResult = s" $res"
}
" " + res
}
//This is the elegant part from stackoverflow gruenewa
@throws(classOf[java.util.concurrent.TimeoutException])
def timedRun[F](timeout: Long)(f: => F): F = {
import java.util.concurrent.{ Callable, FutureTask, TimeUnit }
val task = new FutureTask(new Callable[F]() {
def call() = f
})
new Thread(task).start()
task.get(timeout, TimeUnit.MILLISECONDS)
}
}
I would introduce a small intermediate class for more explicitly communicating the partial results between threads. That way you don't have to modify non-local state in any surprising ways. Then you can also just catch the exception within the timedRun method:
class Result[A](var result: A)
val result = timedRun(150)("best result")(bestEffort)
println(result)
//dummy function
def bestEffort(r: Result[String]): Unit = {
var res = 0
for (i <- 0 until 100000) {
res = i
r.result = s" $res"
}
r.result = " " + res
}
def timedRun[A](timeout: Long)(initial: A)(f: Result[A] => _): A = {
import java.util.concurrent.{ Callable, FutureTask, TimeUnit }
val result = new Result(initial)
val task = new FutureTask(new Callable[A]() {
def call() = { f(result); result.result }
})
new Thread(task).start()
try {
task.get(timeout, TimeUnit.MILLISECONDS)
} catch {
case e: java.util.concurrent.TimeoutException => result.result
}
}
It's admittedly a bit awkward since you don't usually have the "return value" of a function passed in as a parameter. But I think it's the least-radical modification of your code that makes sense. You could also consider modeling your computation as something that returns a Stream or Iterator of partial results, and then essentially do .takeWhile(notTimedOut).last. But how feasible that is really depends on the actual computation.
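The iterator idea can be sketched as follows, assuming the computation can be expressed as an Iterator of successively better partial results. `lastWithin` is a hypothetical helper name. Note that the check is cooperative: a single step that itself runs longer than the budget is not interrupted.

```scala
object PartialResultsSketch {
  // Hypothetical helper: pulls results until the time budget is used up or
  // the iterator is exhausted, and returns the last result seen.
  def lastWithin[A](budgetMillis: Long, initial: A)(steps: Iterator[A]): A = {
    val start = System.nanoTime()
    val budgetNanos = budgetMillis * 1000000L
    var best = initial
    while (System.nanoTime() - start < budgetNanos && steps.hasNext) best = steps.next()
    best
  }

  def main(args: Array[String]): Unit = {
    // An endless computation that keeps yielding "better" results.
    val result = lastWithin(150, "best result")(Iterator.from(0).map(i => s" $i"))
    println(result)
  }
}
```

No extra thread or shared mutable holder is needed here, at the cost of requiring the computation to yield control between steps.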
First, you need one of the solutions for recovering after a future times out, which unfortunately are not built into Scala:
See: Scala Futures - built in timeout?
For example:
def withTimeout[T](fut:Future[T])(implicit ec:ExecutionContext, after:Duration) = {
val prom = Promise[T]()
val timeout = TimeoutScheduler.scheduleTimeout(prom, after)
val combinedFut = Future.firstCompletedOf(List(fut, prom.future))
fut onComplete{case result => timeout.cancel()}
combinedFut
}
Then it is easy:
var bestResult = "best result"
val expensiveFunction = Future {
var res = 0
for (i <- 0 until 10000) {
Thread.sleep(10)
res = i
bestResult = s" $res"
}
" " + res
}
val timeoutFuture = withTimeout(expensiveFunction) recover {
case _: TimeoutException => bestResult
}
println(Await.result(timeoutFuture, 1 seconds))
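The TimeoutScheduler used above comes from the linked answer and is not shown here. As a rough stand-in, the following sketch builds the same kind of withTimeout on a JDK java.util.Timer. The object name and the explicit `after` parameter are choices made for this sketch, not the linked answer's code.

```scala
import java.util.{Timer, TimerTask}
import java.util.concurrent.TimeoutException
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration.FiniteDuration

object TimeoutSketch {
  // One daemon timer shared by all timeouts, so it never keeps the JVM alive.
  private val timer = new Timer("timeout-sketch", true)

  def withTimeout[T](fut: Future[T], after: FiniteDuration)(
      implicit ec: ExecutionContext): Future[T] = {
    val prom = Promise[T]()
    val task = new TimerTask {
      // Fails the promise once the delay elapses (no-op if already completed).
      def run(): Unit = { prom.tryFailure(new TimeoutException(s"timed out after $after")); () }
    }
    timer.schedule(task, after.toMillis)
    // Cancel the pending timer task as soon as the real future finishes.
    fut.onComplete { _ => task.cancel() }
    Future.firstCompletedOf(List(fut, prom.future))
  }
}
```

As in the answer, the underlying computation keeps running after the timeout; only the returned future completes early.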
I'm implementing an iterator over an HTTP resource that returns a paged list of elements. I tried to do this with a plain Iterator, but that is a blocking implementation, and since I'm using akka it makes my dispatcher go a little crazy.
I want to implement the same iterator using akka-stream. The problem is that I need a slightly different retry strategy.
The service returns a list of elements, each identified by an id, and sometimes when I query for the next page, the service returns the same products as on the current page.
My current algorithm is this:
var seenIds = Set.empty[Int] // assuming Element.id is an Int; adjust to the real id type
var position = 0
def isProblematicPage(elements: Seq[Element]): Boolean = {
val currentIds = elements.map(_.id).toSet
val intersection = seenIds & currentIds
val hasOnlyNewIds = intersection.isEmpty
if (hasOnlyNewIds) {
seenIds = seenIds | currentIds
}
!hasOnlyNewIds
}
def incrementPage(): Unit = {
position += 10
}
def doBackOff(attempt: Int): Unit = {
// Backoff logic
}
@tailrec
def fetchPage(attempt: Int = 0): Iterator[Element] = {
if (attempt > MaxRetries) {
incrementPage()
return Iterator.empty
}
val eventualPage = service.retrievePage(position, position + 10)
val page = Await.result(eventualPage, 5 minutes)
if (isProblematicPage(page)) {
doBackOff(attempt)
fetchPage(attempt + 1)
} else {
incrementPage()
page.iterator
}
}
I'm doing the implementation using akka-streams but I can't figure out how to accumulate the pages and test for repetition using the streams structure.
Any suggestions?
The Flow.scan method is useful in such situations.
I would start your stream with a source of positions:
type Position = Int
//0,10,20,...
def positionIterator() : Iterator[Position] = Iterator from (0,10)
val positionSource : Source[Position,_] = Source fromIterator positionIterator
This position source can then be directed to a Flow.scan which utilizes a function similar to your fetchPage (side note: you should avoid awaits as much as possible, there is a way to not have awaits in your code but that is outside the scope of your original question). The new function needs to take in the "state" of already seen Elements:
def fetchPageWithState(service : Service)
(seenEls : Set[Element], position : Position) : Set[Element] = {
val maxRetries = 10
val seenIds = seenEls map (_.id)
@tailrec
def readPosition(attempt : Int) : Seq[Element] = {
if(attempt > maxRetries)
Seq.empty
else {
val eventualPage : Seq[Element] =
Await.result(service.retrievePage(position, position + 10), 5 minutes)
if(eventualPage.map(_.id).exists(seenIds.contains)) {
doBackOff(attempt)
readPosition(attempt + 1)
}
else
eventualPage
}
}//end def readPosition
seenEls ++ readPosition(0).toSet
}//end def fetchPageWithState
This can now be used within a Flow:
def fetchFlow(service : Service) : Flow[Position, Set[Element],_] =
Flow[Position].scan(Set.empty[Element])(fetchPageWithState(service))
The new Flow can be easily connected to your Position Source to create a Source of Set[Element]:
def elementsSource(service : Service) : Source[Set[Element], _] =
positionSource via fetchFlow(service)
Each new value from elementsSource will be an ever growing Set of unique Elements from fetched pages.
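The accumulation that Flow.scan performs here has a direct standard-library analogue in scanLeft, which makes the "ever growing Set" easy to try without akka. The sketch below uses a hypothetical `Element` case class and hard-coded pages. Like scanLeft, and as far as I know Flow.scan as well, the first emitted value is the initial empty set.

```scala
object ScanSketch {
  // Hypothetical element type; the real one comes from the service.
  final case class Element(id: Int)

  // Each emitted value is the set of all elements seen up to that page.
  def accumulate(pages: Iterator[Seq[Element]]): Iterator[Set[Element]] =
    pages.scanLeft(Set.empty[Element])((seen, page) => seen ++ page)

  def main(args: Array[String]): Unit = {
    val pages = Iterator(Seq(Element(1), Element(2)), Seq(Element(2), Element(3)))
    accumulate(pages).foreach(println)
  }
}
```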
The Flow.scan stage was good advice, but it cannot deal with futures, so I implemented an asynchronous version, Flow.scanAsync, which is now available as of akka 2.4.12.
The current implementation is:
val service: WebService
val maxTries: Int
val backOff: FiniteDuration
def retry[T](zero: T, attempt: Int = 0)(f: => Future[T]): Future[T] = {
f.recoverWith {
case ex if attempt >= maxTries =>
Future(zero)
case ex =>
akka.pattern.after(backOff, system.scheduler)(retry(zero, attempt + 1)(f))
}
}
def isProblematicPage(lastPage: Seq[Element], currPage: Seq[Element]): Boolean = {
val lastPageIds = lastPage.map(_.id).toSet
val currPageIds = currPage.map(_.id).toSet
val intersection = lastPageIds & currPageIds
intersection.nonEmpty
}
def retrievePage(lastPage: Seq[Element], startIndex: Int): Future[Seq[Element]] = {
retry(Seq.empty) {
service.fetchPage(startIndex).map { currPage: Seq[Element] =>
if (isProblematicPage(lastPage, currPage)) throw new ProblematicPageException(startIndex)
else currPage
}
}
}
val pagesRange: Range = Range(0, maxItems, pageSize)
val scanAsyncFlow = Flow[Int].via(ScanAsync(Seq.empty)(retrievePage))
Source(pagesRange)
.via(scanAsyncFlow)
.mapConcat(identity)
.runWith(Sink.seq)
Thanks Ramon for the advice :)
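The retry helper above can be tried in isolation with plain futures. This sketch drops the akka.pattern.after back-off delay and takes the attempt limit as a parameter, so it only illustrates the recoverWith recursion, not the timing behaviour.

```scala
import scala.concurrent.{ExecutionContext, Future}

object RetrySketch {
  // Re-evaluates the by-name future until it succeeds or maxAttempts
  // retries have been used, then falls back to zero.
  def retry[T](zero: T, maxAttempts: Int, attempt: Int = 0)(f: => Future[T])(
      implicit ec: ExecutionContext): Future[T] =
    f.recoverWith {
      case _ if attempt >= maxAttempts => Future.successful(zero)
      case _                           => retry(zero, maxAttempts, attempt + 1)(f)
    }
}
```

Because `f` is a by-name parameter, every retry starts a fresh future. An exception thrown while constructing the future, rather than inside it, would not be caught by recoverWith.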
Basically, I'm running two queries on Cassandra that return futures, then I need to do some computation and return the value (an average of values).
Here is my code:
object TestWrapFuture {
def main(args: Array[String]) {
val category = 5392
ExtensiveComputation.average(category).onComplete {
case Success(s) => println(s)
case Failure(f) => throw new Exception(f)
}
}
}
class ExtensiveComputation {
val volume = new ListBuffer[Int]()
def average(categoryId: Int): Future[Double] = {
val productsByCategory = Product.findProductsByCategory(categoryId)
productsByCategory.map { prods =>
for (prod <- prods if prod._2) {
Sku.findSkusByProductId(prod._1).map { skus =>
skus.foreach(sku => volume += (sku.height.get * sku.width.get * sku.length.get))
}
}
val average = volume.sum / volume.length
average
}
}
}
object ExtensiveComputation extends ExtensiveComputation
So what is the problem?
Each skus.foreach appends its result values to a ListBuffer. Since everything is async, when I try to obtain the result in my main, I get an error saying I can't divide by zero.
Indeed, since Sku.findSkusByProductId returns a Future, the volume buffer is still empty when I try to compute the average.
Should I block on something prior to this computation, or should I do something else?
EDIT
Well, I tried to block like this:
val volume = new ListBuffer[Int]()
def average(categoryId: Int): Future[Double] = {
val productsByCategory = Product.findProductsByCategory(categoryId)
val blocked = productsByCategory.map { prods =>
for (prod <- prods if prod._2) {
Sku.findSkusByProductId(prod._1).map { skus =>
skus.foreach(sku => volume += (sku.height.get * sku.width.get * sku.length.get))
}
}
}
Await.result(blocked, Duration.Inf)
val average = volume.sum / volume.length
Future.successful(average)
}
Then I got two different results from this piece of code:
Sku.findSkusByProductId(prod._1).map { skus =>
skus.foreach(sku => volume += (sku.height.get * sku.width.get * sku.length.get))
}
1 - When there are just a few (around 50) to be looked up in Cassandra, it runs and gives me the result
2 - When there are many (around 1000), it gives me
java.lang.ArithmeticException: / by zero
EDIT 2
I tried this code, as @Olivier Michallat proposed:
def average(categoryId: Int): Future[Double] = {
val productsByCategory = Product.findProductsByCategory(categoryId)
productsByCategory.map { prods =>
for (prod <- prods if prod._2) findBlocking(prod._1)
volume.sum / volume.length
}
}
def findBlocking(productId: Long) = {
val future = Sku.findSkusByProductId(productId).map { skus =>
skus.foreach(sku => volume += (sku.height.get * sku.width.get * sku.length.get))
}
Await.result(future, Duration.Inf)
}
And the following, as @kolmar proposed:
def average(categoryId: Int): Future[Int] = {
for {
prods <- Product.findProductsByCategory(categoryId)
filtered = prods.filter(_._2)
skus <- Future.traverse(filtered)(p => Sku.findSkusByProductId(p._1))
} yield {
val volumes = skus.flatten.map(sku => sku.height.get * sku.width.get * sku.length.get)
volumes.sum / volumes.size
}
}
Both work when there are few skus to find (around 50), but both fail when there are many (around 1000), throwing ArithmeticException: / by zero
It seems that it could not compute everything before returning the future...
You need to wait until all the futures generated by findSkusByProductId have completed before you compute the average. So accumulate all these futures in a Seq, call Future.sequence on it to get a Future[Seq], then map that future to a function that computes the average. Then replace productsByCategory.map with a flatMap.
Since you have to call a function that returns a Future on a sequence of arguments, it's better to use Future.traverse for that.
For example:
object ExtensiveComputation {
def average(categoryId: Int): Future[Double] = {
for {
products <- Product.findProductsByCategory(categoryId)
filtered = products.filter(_._2)
skus <- Future.traverse(filtered)(p => Sku.findSkusByProductId(p._1))
} yield {
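// traverse yields one Seq of skus per product, so flatten before mapping
val volumes = skus.flatten.map { sku =>
sku.height.get * sku.width.get * sku.length.get }
// convert to Double so the division is not truncated to an integer
volumes.sum.toDouble / volumes.size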
}
}
}
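The same pattern can be exercised without Cassandra. In this sketch `findVolumes` is a hypothetical stand-in for the Sku lookup, and the result is an Option so that an empty result set is reported explicitly instead of dividing by zero. That is one possible design, not necessarily what the original code should return.

```scala
import scala.concurrent.{ExecutionContext, Future}

object AverageSketch {
  // Runs one lookup per product id, waits for all of them via traverse,
  // then averages the flattened volumes (None when there are no volumes).
  def averageVolume(productIds: Seq[Int])(findVolumes: Int => Future[Seq[Int]])(
      implicit ec: ExecutionContext): Future[Option[Double]] =
    Future.traverse(productIds)(findVolumes).map { perProduct =>
      val volumes = perProduct.flatten
      if (volumes.isEmpty) None
      else Some(volumes.sum.toDouble / volumes.size)
    }
}
```

The average is computed inside the map on the traversed future, so it cannot run before every lookup has completed. No shared ListBuffer is needed.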
I am doing the exercises from Learning Concurrent Programming in Scala.
The exercise question is in the code comment below.
The program prints the HTML contents correctly for a valid URL and a sufficiently long timeout.
The program prints "Error occured" for a valid URL and a short timeout.
However, for an invalid URL, "Error occured" is not printed. What is the problem with the code below?
/*
* Implement a command-line program that asks the user to input a URL of some website,
* and displays the HTML of that website. Between the time that the user hits ENTER and
* the time that the HTML is retrieved, the program should repetitively print a . to the
* standard output every 50 milliseconds, with a two seconds timeout. Use only futures
* and promises, and avoid the synchronization primitives from the previous chapters.
* You may reuse the timeout method defined in this chapter.
*/
object Excersices extends App {
val timer = new Timer()
def timeout(t: Long = 1000): Future[Unit] = {
val p = Promise[Unit]
val timer = new Timer(true)
timer.schedule(new TimerTask() {
override def run() = {
p success ()
timer cancel()
}
}, t)
p future
}
def printDot = println(".")
val taskOfPrintingDot = new TimerTask {
override def run() = printDot
}
println("Enter a URL")
val lines = io.Source.stdin.getLines()
val url = if (lines hasNext) Some(lines next) else None
timer.schedule(taskOfPrintingDot, 0L, 50.millisecond.toMillis)
val timeOut2Sec = timeout(2.second.toMillis)
val htmlContents = Future {
url map { x =>
blocking {
Source fromURL (x) mkString
}
}
}
Future.firstCompletedOf(Seq(timeOut2Sec, htmlContents)) map { x =>
timer cancel ()
x match {
case Some(x) =>
println(x)
case _ =>
println("Error occured")
}
}
Thread sleep 5000
}
As @Gábor Bakos said, an exception produces a Failure, which is not handled by map:
val fut = Future { Some(Source fromURL ("hhhttp://google.com")) }
scala> fut map { x => println(x) } //nothing printed
res12: scala.concurrent.Future[Unit] = scala.concurrent.impl.Promise$DefaultPromise@5e025724
To process the failure, use the recover method:
scala> fut recover { case failure => None } map { x => println(x) }
None
res13: scala.concurrent.Future[Unit] = scala.concurrent.impl.Promise$DefaultPromise@578afc83
In your context it's something like:
Future.firstCompletedOf(Seq(timeOut2Sec, htmlContents)) recover {case x => println("Error:" + x); None} map { x => ...}
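A self-contained sketch of the same idea, without the REPL or a network call: `render` is a hypothetical helper that first turns a failed future into None and only then maps, so the error branch is reached for a failure as well.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object RecoverSketch {
  // recover converts a Failure into a successful None; without it,
  // the map below would simply be skipped for a failed future.
  def render(html: Future[Option[String]])(implicit ec: ExecutionContext): Future[String] =
    html
      .recover { case NonFatal(_) => None }
      .map {
        case Some(body) => body
        case None       => "Error occurred"
      }
}
```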
The complete code after using recover, as advised by @dk14:
object Exercises extends App {
val timer = new Timer()
def timeout(t: Long = 1000): Future[Unit] = {
val p = Promise[Unit]
val timer = new Timer(true)
timer.schedule(new TimerTask() {
override def run() = {
p success ()
timer cancel ()
}
}, t)
p future
}
def printDot = println(".")
val taskOfPrintingDot = new TimerTask {
override def run() = {
printDot
}
}
println("Enter a URL")
val lines = io.Source.stdin.getLines()
val url = if (lines hasNext) Some(lines next) else None
timer.schedule(taskOfPrintingDot, 0L, 50.millisecond.toMillis)
val timeOut2Sec = timeout(2.second.toMillis)
val htmlContents = Future {
url map { x =>
blocking {
Source fromURL (x) mkString
}
}
}
Future.firstCompletedOf(Seq(timeOut2Sec, htmlContents))
.recover { case x => println("Error:" + x); None }
.map { x =>
timer cancel ()
x match {
case Some(x) =>
println(x)
case _ =>
println("Timeout occurred")
}
}
Thread sleep 5000
}