I am having an issue using RxJava backpressure. Basically, I have one producer that produces more items than the consumer can handle and want to have some buffer queue to handle only items that I can deal with, and request when I complete some of them, like in this example:
object Tester extends App {
Observable[Int] { subscriber =>
(1 to 100).foreach { e =>
subscriber.onNext(e)
Thread.sleep(100)
println("produced " + e + "(" + Thread.currentThread().getName + Thread.currentThread().getId + ")")
}
}
.subscribeOn(NewThreadScheduler())
.observeOn(ComputationScheduler())
.subscribe(
new Subscriber[Int]() {
override def onStart(): Unit = {
request(2)
}
override def onNext(value: Int): Unit = {
Thread.sleep(1000)
println("consumed " + value + "(" + Thread.currentThread().getName + Thread.currentThread().getId + ")")
request(1)
}
override def onCompleted(): Unit = {
println("finished ")
}
})
Thread.sleep(100000)
I expect to get output like
produced 1(RxNewThreadScheduler-113)
consumed 1(RxComputationThreadPool-312)
produced 2(RxNewThreadScheduler-113)
consumed 2(RxComputationThreadPool-312)
produced 3(RxNewThreadScheduler-113)
consumed 3(RxComputationThreadPool-312)
......
but instead, I get
produced 1(RxNewThreadScheduler-113)
produced 2(RxNewThreadScheduler-113)
produced 3(RxNewThreadScheduler-113)
produced 4(RxNewThreadScheduler-113)
produced 5(RxNewThreadScheduler-113)
produced 6(RxNewThreadScheduler-113)
produced 7(RxNewThreadScheduler-113)
produced 8(RxNewThreadScheduler-113)
produced 9(RxNewThreadScheduler-113)
consumed 1(RxComputationThreadPool-312)
produced 10(RxNewThreadScheduler-113)
produced 11(RxNewThreadScheduler-113)
produced 12(RxNewThreadScheduler-113)
produced 13(RxNewThreadScheduler-113)
.....
When you implement your Observable using Observable.create it is up to you to manage backpressure (which is not a simple task). Here your observable simply ignores reactive pull requests (you just iterate, not waiting for a request to call the iterator's next() method).
If possible, try to use Observable factory methods like range, etc... and composing using map/flatMap to obtain the desired source Observable, as those will respect backpressure.
Otherwise, have a look at the experimental utility classes introduced recently for correctly managing backpressure in a OnSubscribe implementation: AsyncOnSubscribe and SyncOnSubscribe.
Here is a quite naïve example:
Observable<Integer> backpressuredObservable =
Observable.create(SyncOnSubscribe.createStateful(
() -> 0, //starts the state at 0
(state, obs) -> {
int i = state++; //first i is 1 as desired
obs.next(i);
if (i == 100) { //maximum is 100, stop there
obs.onCompleted();
}
return i; //update the state
}));
Related
I'm new to scala and extremely new to scalaz. Through a different stackoverflow answer and some handholding, I was able to use scalaz.stream to implement a Process that would continuously fetch twitter API results. Now i'd like to do the same thing for the Cassandra DB where the twitter handles are stored.
The code for fetching the twitter results is here:
def urls: Seq[(Handle,URL)] = {
Await.result(
getAll(connection).map { List =>
List.map(twitterToGet =>
(twitterToGet.handle, urlBoilerPlate + twitterToGet.handle + parameters + twitterToGet.sinceID)
)
},
5 seconds)
}
val fetchUrl = channel.lift[Task, (Handle, URL), Fetched] {
url => Task.delay {
val finalResult = callTwitter(url)
if (finalResult.tweets.nonEmpty) {
connection.updateTwitter(finalResult)
} else {
println("\n" + finalResult.handle + " does not have new tweets")
}
s"\ntwitter Fetch & database update completed"
}
}
val P = Process
val process =
(time.awakeEvery(3.second) zipWith P.emitAll(urls))((b, url) => url).
through(fetchUrl)
val fetched = process.runLog.run
fetched.foreach(println)
What I'm planning to do is use
def urls: Seq[(Handle,URL)] = {
to continuously fetch Cassandra results (with an awakeEvery) and send them off to an actor to run the above twitter fetching code.
My question is, what is the best way to implement this with scalaz.stream? Note that i'd like it to get ALL the database results, then have a delay before getting ALL the database results again. Should i use the same architecture as the twitter fetching code above? If so, how would I create a channel.lift that doesn't require input? Is there a better way in scalaz.stream?
Thanks in advance
Got this working today. The cleanest way to do it would be to emit the database results as a stream and attach a sink to the end of the stream to do the twitter processing. What I actually have is a bit more complex as it retrieves the database results continuously and sends them off to an actor for the twitter processing. The style of retrieving the results follows my original code from my question:
val connection = new simpleClient(conf.getString("cassandra.node"))
implicit val threadPool = new ScheduledThreadPoolExecutor(4)
val system = ActorSystem("mySystem")
val twitterFetch = system.actorOf(Props[TwitterFetch], "twitterFetch")
def myEffect = channel.lift[Task, simpleClient, String]{
connection: simpleClient => Task.delay{
val results = Await.result(
getAll(connection).map { List =>
List.map(twitterToGet =>
(twitterToGet.handle, urlBoilerPlate + twitterToGet.handle + parameters + twitterToGet.sinceID)
)
},
5 seconds)
println("Query Successful, results= " +results +" at " + format.print(System.currentTimeMillis()))
twitterFetch ! fetched(connection, results)
s"database fetch completed"
}
}
val P = Process
val process =
(time.awakeEvery(3.second).flatMap(_ => P.emit(connection).
through(myEffect)))
val fetching = process.runLog.run
fetching.foreach(println)
Some notes:
I had asked about using channel.lift without input, but it became clear that the input should be the cassandra connection.
The line
val process =
(time.awakeEvery(3.second).flatMap(_ => P.emit(connection).
through(myEffect)))
Changed from zipWith to flatMap because I wanted to retrieve the results continuously instead of once.
I was hoping someone can briefly go over the various ways of consuming a service (this one just returns a string, normally it would be JSON but I just want to understand the concepts here).
My service:
def ping = Action {
Ok("pong")
}
Now in my Play (2.3.x) application, I want to call my client and display the response.
When working with Futures, I want to display the value.
I am a bit confused, what are all the ways I could call this method i.e. there are some ways I see that use Success/Failure,
val futureResponse: Future[String] = WS.url(url + "/ping").get().map { response =>
response.body
}
var resp = ""
futureResponse.onComplete {
case Success(str) => {
Logger.trace(s"future success $str")
resp = str
}
case Failure(ex) => {
Logger.trace(s"future failed")
resp = ex.toString
}
}
Ok(resp)
I can see the trace in STDOUT for success/failure, but my controller action just returns "" to my browser.
I understand that this is because it returns a FUTURE and my action finishes before the future returns.
How can I force it to wait?
What options do I have with error handling?
If you really want to block until feature is completed look at the Future.ready() and Future.result() methods. But you shouldn't.
The point about Future is that you can tell it how to use the result once it arrived, and then go on, no blocks required.
Future can be the result of an Action, in this case framework takes care of it:
def index = Action.async {
WS.url(url + "/ping").get()
.map(response => Ok("Got result: " + response.body))
}
Look at the documentation, it describes the topic very well.
As for the error-handling, you can use Future.recover() method. You should tell it what to return in case of error and it gives you new Future that you should return from your action.
def index = Action.async {
WS.url(url + "/ping").get()
.map(response => Ok("Got result: " + response.body))
.recover{ case e: Exception => InternalServerError(e.getMessage) }
}
So the basic way you consume service is to get result Future, transform it in the way you want by using monadic methods(the methods that return new transformed Future, like map, recover, etc..) and return it as a result of an Action.
You may want to look at Play 2.2 -Scala - How to chain Futures in Controller Action and Dealing with failed futures questions.
I want omit all the same type of messages except the last one:
def receive = {
case Message(type:MessageType, data:Int) =>
// remove previous and get only last message of passed MessageType
}
for example when I send:
actor ! Message(MessageType.RUN, 1)
actor ! Message(MessageType.RUN, 2)
actor ! Message(MessageType.FLY, 1)
then I want to recevie only:
Message(MessageType.RUN, 2)
Message(MessageType.FLY, 1)
Of course if they will be send very fast, or on high CPU load
You could wait a very short amount of time, storing the most recent messages that arrive, and then process only those most recent ones. This can be accomplished by sending messages to yourself, and scheduleOnce. See the second example under the Akka HowTo: Common Patterns, Scheduling Periodic Messages. Instead of scheduling ticks whenever the last tick ends, you can wait until new messages arrive. Here's an example of something like that:
case class ProcessThis(msg: Message)
case object ProcessNow
var onHold = Map.empty[MessageType, Message]
var timer: Option[Cancellable] = None
def receive = {
case msg # Message(t, _) =>
onHold += t -> msg
if (timer.isEmpty) {
import context.dispatcher
timer = Some(context.system.scheduler.scheduleOnce(1 millis, self, ProcessNow))
}
case ProcessNow =>
timer foreach { _.cancel() }
timer = None
for (m <- onHold.values) self ! ProcessThis(m)
onHold = Map.empty
case ProcessThis(Message(t, data)) =>
// really process the message
}
Incoming Messages are not actually processed right away, but are stored in a Map that keeps only the last of each MessageType. On the ProcessNow tick message, they are really processed.
You can change the length of time you wait (in my example set to 1 millisecond) to strike a balance between responsivity (length of time from a message arriving to response) and efficiency (CPU or other resources used or held up).
type is not a good name for a field, so let's use messageType instead. This code should do what you want:
var lastMessage: Option[Message] = None
def receive = {
case m => {
if (lastMessage.fold(false)(_.messageType != m.messageType)) {
// do something with lastMessage.get
}
lastMessage = Some(m)
}
}
I have a moderate number of long-running Actors and I wish to write a synchronous function that returns the first one of these that completes. I can do it with a spin-wait on futures (e.g.,:
while (! fs.exists(f => f.isSet) ) {
Thread.sleep(100)
}
val completeds = fs.filter(f => f.isSet)
completeds.head()
), but that seems very "un-Actor-y"
The scala.actors.Futures class has two methods awaitAll() and awaitEither() that seem awfully close; if there were an awaitAny() I'd jump on it. Am I missing a simple way to do this or is there a common pattern that is applicable?
A more "actorish" way of waiting for completion is creating an actor in charge of handling completed result (lets call it ResultHandler)
Instead of replying, workers send their answer to ResultHandler in fire-and-forget manner. The latter will continue processing the result while other workers complete their job.
The key for me was the discovery that every (?) Scala object is, implicitly, an Actor, so you can use Actor.react{ } to block. Here is my source code:
import scala.actors._
import scala.actors.Actor._
//Top-level class that wants to return the first-completed result from some long-running actors
class ConcurrentQuerier() {
//Synchronous function; perhaps fulfilling some legacy interface
def synchronousQuery : String = {
//Instantiate and start the monitoring Actor
val progressReporter = new ProgressReporter(self) //All (?) objects are Actors
progressReporter.start()
//Instantiate the long-running Actors, giving each a handle to the monitor
val lrfs = List (
new LongRunningFunction(0, 2000, progressReporter), new LongRunningFunction(1, 2500, progressReporter), new LongRunningFunction(3, 1500, progressReporter),
new LongRunningFunction(4, 1495, progressReporter), new LongRunningFunction(5, 1500, progressReporter), new LongRunningFunction(6, 5000, progressReporter) )
//Start 'em
lrfs.map{ lrf =>
lrf.start()
}
println("All actors started...")
val start = System.currentTimeMillis()
/*
This blocks until it receives a String in the Inbox.
Who sends the string? A: the progressReporter, which is monitoring the LongRunningFunctions
*/
val s = receive {
case s:String => s
}
println("Received " + s + " after " + (System.currentTimeMillis() - start) + " ms")
s
}
}
/*
An Actor that reacts to a message that is a tuple ("COMPLETED", someResult) and sends the
result to this Actor's owner. Not strictly necessary (the LongRunningFunctions could post
directly to the owner's mailbox), but I like the idea that monitoring is important enough
to deserve its own object
*/
class ProgressReporter(val owner : Actor) extends Actor {
def act() = {
println("progressReporter awaiting news...")
react {
case ("COMPLETED", s) =>
println("progressReporter received a completed signal " + s);
owner ! s
case s =>
println("Unexpected message: " + s ); act()
}
}
}
/*
Some long running function
*/
class LongRunningFunction(val id : Int, val timeout : Int, val supervisor : Actor) extends Actor {
def act() = {
//Do the long-running query
val s = longRunningQuery()
println(id.toString + " finished, sending results")
//Send the results back to the monitoring Actor (the progressReporter)
supervisor ! ("COMPLETED", s)
}
def longRunningQuery() : String = {
println("Starting Agent " + id + " with timeout " + timeout)
Thread.sleep(timeout)
"Query result from agent " + id
}
}
val cq = new ConcurrentQuerier()
//I don't think the Actor semantics guarantee that the result is absolutely, positively the first to have posted the "COMPLETED" message
println("Among the first to finish was : " + cq.synchronousQuery)
Typical results look like:
scala ActorsNoSpin.scala
progressReporter awaiting news...
All actors started...
Starting Agent 1 with timeout 2500
Starting Agent 5 with timeout 1500
Starting Agent 3 with timeout 1500
Starting Agent 4 with timeout 1495
Starting Agent 6 with timeout 5000
Starting Agent 0 with timeout 2000
4 finished, sending results
progressReporter received a completed signal Query result from agent 4
Received Query result from agent 4 after 1499 ms
Among the first to finish was : Query result from agent 4
5 finished, sending results
3 finished, sending results
0 finished, sending results
1 finished, sending results
6 finished, sending results
Let's say you have a gui component and 10 threads all tell it to repaint at sufficiently the same time as they all arrive before a single paint operation takes place. Instead of naively wasting resources repainting 10 times, just merge/ignore all but the last one and repaint once (or more likely, twice--once for the first, and once for the last). My understanding is that the Swing repaint manager does this.
Is there a way to accomplish this same type of behavior in a Scala Actor? Is there a way to look at the queue and merge messages, or ignore all but the last of a certain type or something?
Something like this?:
act =
loop {
react {
case Repaint(a, b) => if (lastRepaint + minInterval < System.currentTimeMillis) {
lastRepaint = System.currentTimeMillis
repaint(a, b)
}
}
If you want to repaint whenever the actor's thread gets a chance, but no more, then:
(UPDATE: repainting using the last message arguments)
act =
loop {
react {
case r#Repaint(_, _) =>
var lastMsg = r
def findLast: Unit = {
reactWithin(0) {
case r#Repaint(_, _) =>
lastMsg = r
case TIMEOUT => repaint(lastMsg.a, lastMsg.b)
}
}
findLast
}
}