I want to write a program that will consume an infinite stream from the web. The stream will come in as JSON over web sockets. I'm looking for the data structure I should use.
Requirements are:
The stream will be infinite. I don't ever want to stop listening for new data.
The time between new events in the stream is unknown. They can come rapidly one after another, but it's also possible to have long pauses of several hours. I want to consume events when there is something, and wait in the meantime.
I want to consume events sequentially, one after another.
My component should only transform the consumed events and forward them on. I tried something like this:
fun consume(stream: Stream<WebEvent>): Sequence<TransformedEvent> {
    return try {
        stream.asSequence().let { seq ->
            var currentEvent = generator.firstEvent(seq.first())
            seq.map {
                currentEvent = generator.nextEvent(currentEvent, it)
                return@map currentEvent
            }
        }
    } catch (e: NoSuchElementException) {
        throw EmptyStreamException(e)
    }
}
The generator in this example "needs" the "previous" event to generate the new one, but that's part of the transformation logic. I'm interested in consuming the stream.
This worked, but I'm wondering whether there is a better way to do it in Kotlin, maybe with a blocking queue or something like that.
I'm currently struggling to get the desired behaviour when using Combine. I've previously used the RX framework and believe (from what I remember) that the described scenario is possible by specifying backpressure strategies for buffering.
So the issue I have is that I have a publisher that publishes values very rapidly, and two subscribers to it: one which can react just as fast as the values are published (cool beans), and a second subscriber that runs some CPU-expensive processing.
I know that in order to support the second, slower subscriber I need to allow buffering of values, but I don't seem to be able to make this happen. Here is what I have so far:
let subject = PassthroughSubject<Int, Never>()

// publish some values
Task {
    for i in 0... {
        subject.send(i)
    }
}

subject
    .print("fast")
    .sink { _ in }

subject
    .map { n -> Int in
        sleep(1) // CPU intensive work here
        return n
    }
    .print("slow")
    .sink { _ in }
Originally I thought I could use .buffer(..) on the slow subscriber, but this doesn't appear to be the use case. What seems to happen is that the subject dispatches to each subscriber, and only after the subscriber finishes does it demand more from the publisher; in this case that seems to block the .send(..) call of the publishing loop.
Any advice would be greatly appreciated 👍
Is there a way in Scala to execute something in a loop without blocking the entire flow?
I have the following code to transmit something in the Actor model.
All actors send something to other actors:
def some_method = {
  loop {
    // Transmit something
    Thread.sleep(100)
  }
}
I also have some code to receive what other actors send, but control never comes out of the loop: it sleeps and continues without ever leaving the loop. Thus all actors keep sending, but nobody receives. How can I fix this?
If I understand you correctly, you want the transmission to occur every 100ms, but you don't want to create another Thread for that (and a Thread.sleep inside an actor may indeed block the flow).
You can use reactWithin:
import java.util.Date
import math.max
import scala.actors.TIMEOUT // needed for the TIMEOUT case below

// meant to be called from inside an actor, since loop and reactWithin are actor methods
def some_method = {
  var last_transmission_time = 0L // getTime returns a Long
  loop {
    val current_time = (new Date).getTime
    reactWithin(max(0, last_transmission_time + 100 - current_time)) {
      // actor reaction cases
      case TIMEOUT => {
        // Transmit something
        last_transmission_time = (new Date).getTime
      }
    }
  }
}
The last_transmission_time saves the last time a transmission was done.
The reaction timeout is calculated so that a TIMEOUT will occur when the current time is the last-transmission-time + 100ms.
If a timeout occurred, it means over 100ms have passed since the last transmission, so another transmission should be made.
If the reaction cases themselves may take a lot of time, then I don't see any simple solution but creating another thread.
I didn't try the code because I'm not sure that I fully understand your problem.
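In case it helps, here is a rough sketch (my own, not tested against your setup) of that separate-thread fallback: a dedicated daemon thread does the periodic transmission, so the actor's reaction cases can take as long as they need. transmit() is just a hypothetical placeholder for whatever you actually send.

def transmit(): Unit = println("transmitting") // hypothetical stand-in for the real send

val transmissionThread = new Thread(new Runnable {
  def run(): Unit = {
    while (true) {
      transmit()        // send to the other actors
      Thread.sleep(100) // roughly 100 ms between transmissions
    }
  }
})
transmissionThread.setDaemon(true) // don't keep the JVM alive just for this loop
transmissionThread.start()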
If you want to execute long running computations concurrently (on a single machine), Akka actors can help.
One approach is to spawn a new actor for each piece of work. Something like
while (true) {
  val actor = system.actorOf(Props[ProcessingActor])
  (actor ? msg).map {
    ...
    system.stop(actor)
  }
}
A second idea is to configure a set number of actors behind a router. And then send all messages to the router.
val router = system.actorOf(Props[ProcessingActor].withRouter(RoundRobinRouter(nrOfInstances = 5)))
while (true) {
  (router ? msg).map { ... }
}
I wonder, which is better if the system is overloaded (rate of incoming messages is higher than processing rate)?
Which will last longer? And will both eventually blow up the system with an OOMError?
Before you create a new Actor for each task, you could also just use a Future. It really depends on what you want to achieve. To get as much work done with the least memory usage, you should use the actor/router approach. Futures are more expensive, because each task would create a new instance of Future and Promise. But it really depends on your use case which approach is better. I just wouldn't create a lot of actors when there really is no need for them, especially as system.actorOf always creates a new error kernel.
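For comparison, here is a rough sketch of the Future-per-task variant I mean (my own code, assuming Scala 2.10+ futures; process() is a hypothetical stand-in for whatever ProcessingActor does with a message):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// hypothetical stand-in for the work ProcessingActor would do
def process(msg: String): String = msg.reverse

// one Future per piece of work, running on the pooled global execution context;
// no actor to create, ask and stop for every message
val results: Seq[Future[String]] =
  Seq("one", "two", "three").map { msg =>
    Future {
      process(msg)
    }
  }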
I'm implementing long polling in Play 2.0 in a potentially distributed environment. The way I understand it, when Play gets a request it should suspend pending notification of an update, then go to the db to fetch new data, and repeat. I started looking at the chat example that Play 2.0 offers, but it uses websockets. Furthermore, it doesn't look like it's capable of being distributed. So I thought I would use Akka's event bus. I took the EventStream implementation and replicated my own with LookupClassification. However, I'm stumped as to how I'm going to get a message back (or, for that matter, what the subscriber should be instead of an ActorRef).
EventStream implementation:
https://github.com/akka/akka/blob/master/akka-actor/src/main/scala/akka/event/EventStream.scala
I am not sure this is what you are looking for, but there is quite a simple solution in the comet-clock sample that you can adapt to use Akka actors. It uses an infinite iframe instead of long polling. I have used an adapted version for a more complex application doing multiple DB calls and long computations in Akka actors, and it works fine.
def enum = Action {
  // get your actor
  val myActorRef = Akka.system.actorOf(Props[TestActor])

  // do some query to your DB here. Promise.timeout is used to simulate a blocking call
  def getDatabaseItem(id: Int): Promise[String] = { Promise.timeout("test", 10 milliseconds) }

  // test iterator, you will want something smarter here
  val items1 = 1 to 10 toIterator

  // this is a very simple enumerator that takes ints from an existing iterator
  // (from http request parameters, for instance) and does some computations
  def myEnum(it: Iterator[Int]): Enumerator[String] = Enumerator.fromCallback[String] { () =>
    if (!items1.hasNext)
      Promise.pure[Option[String]](None) // we are done with our computations
    else {
      // get the next int, query the database and compose the promise with a further query to the Akka actor
      getDatabaseItem(items1.next).flatMap { dbValue =>
        implicit val timeout = new Timeout(10 milliseconds)
        val future = (myActorRef ? dbValue) mapTo manifest[String]
        // here we convert the Akka actor's reply to the right Promise[Option] output
        future.map(v => Some(v)).asPromise
      }
    }
  }

  // finally we stream the result to the infinite iframe.
  // console.log is the javascript callback, you will want something more interesting.
  Ok.stream(myEnum(items1) &> Comet(callback = "console.log"))
}
Note that this fromCallback doesn't allow you to combine enumerators with "andThen"; the trunk version of Play 2 has a generateM method that might be more appropriate if you want to use combinations.
It's not long polling, but it works fine.
I stumbled on your question while looking for the same thing.
I found the streaming solutions unsatisfying, as they caused the "spinner of death" in WebKit browsers (i.e. the page shows it is loading all the time).
Anyhow, didn't have any luck finding good examples but I managed to create my own proof-of-concept using promises:
https://github.com/kallebertell/longpoll
I have read that, when using react, all actors can execute in a single thread. I often process a collection in parallel and need to output the result. I do not believe System.out.println is threadsafe so I need some protection. One way (a traditional way) I could do this:
val lock = new Object

def printer(msg: Any) {
  lock.synchronized {
    println(msg)
  }
}

(1 until 1000).par.foreach { i =>
  printer(i)
}

println("done.")
How does this first solution compare to using actors in terms of efficiency? Is it true that I'm not creating a new thread?
val printer = actor {
  loop {
    react {
      case msg => println(msg)
    }
  }
}

(1 until 10000).par.foreach { i =>
  printer ! i
}

println("done.")
It doesn't seem to be a good alternative however, because the actor code never completes. If I put a println at the bottom it is never hit, even though it looks like it goes through every iteration for i. What am I doing wrong?
As you have it now with your Actor code, you only have one actor doing all the printing. As you can see from running the code, the values are all printed out sequentially by the Actor whereas in the parallel collection code, they're out of order. I'm not too familiar with parallel collections, so I don't know the performance gains between the two.
However, if your code is doing a lot of work in parallel, you probably would want to go with multiple actors. You could do something like this:
def printer = actor {
  loop {
    react {
      case msg => println(msg)
    }
  }
}

val num_workers = 10
val worker_bees = Vector.fill(num_workers)(printer)

(1 until 1000).foreach { i =>
  worker_bees(i % num_workers) ! i
}
The def is important. This way you're actually creating multiple actors and not just flooding one.
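To make the difference concrete, here is a small sketch (mine, not part of the original code): Vector.fill evaluates its argument by name, so a def produces a fresh actor for every slot, while a val would put one and the same actor into all of them.

import scala.actors.Actor._

val single = actor { loop { react { case msg => println(msg) } } }
def fresh  = actor { loop { react { case msg => println(msg) } } }

val oneActorTenTimes  = Vector.fill(10)(single) // ten references to the same actor
val tenDistinctActors = Vector.fill(10)(fresh)  // ten independent actors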
One actor instance will never process more than one message at a time. Whatever thread pool is allocated for the actors, each actor instance will only occupy one thread at a time, so you are guaranteed that all the printing will be processed serially.
As for not finishing, the execution of an actor never returns from a react or a loop, so:
val printer = actor {
  loop {
    react {
      case msg => println(msg)
    }
    // This line is never reached because of react
  }
  // This line is never reached because of loop
}
If you replace loop and react with a while loop and receive, you'll see that everything inside the while loop executes as expected.
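For illustration, a minimal sketch of that while/receive variant (my own code; the running flag is just so the example can terminate):

import scala.actors.Actor._

val printer = actor {
  var running = true
  while (running) {
    receive {
      case "stop" => running = false
      case msg    => println(msg)
    }
    // unlike with react, execution continues here after every message
  }
}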
To fix your actor implementation, you need to tell the actor to exit; only then will the program exit as well.
val printer = actor {
  loop {
    react {
      case "stop" => exit()
      case msg    => println(msg)
    }
  }
}

(1 until 1000).par.foreach { printer ! _ }
printer ! "stop"
In both your examples there are thread pools involved, backing both the parallel collections library and the actor library, but they are created as needed.
However, println is thread-safe, as it does indeed have a lock in its internals.
(1 until 1000).par.foreach { println(_) } // is threadsafe
As for performance, there are many factors. The first is that moving from a lock that multiple threads are contending for to a lock being used by only one thread (one actor) will increase performance. Second, if you are going to use actors and want performance, use Akka. Akka actors are blazingly fast compared to Scala actors. Also, I hope that the stdout println is writing to is going to a file and not the screen, since involving the display drivers is going to kill your performance.
Using the parallel collections library is wonderful for performance, since you can take advantage of multiple cores for your computation. If each computation is very small, then try the actor route for centralized reporting. However, if each computation is significant and takes a decent amount of CPU time, then stick with just using println by itself. You really are not in a contended lock situation.
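If you do go the Akka route, a minimal sketch of the printer actor might look like this (my own code, assuming Akka 2.x classic actors, not taken from your project):

import akka.actor.{Actor, ActorSystem, Props}

// one actor serializes all printing; messages are processed one at a time
class PrinterActor extends Actor {
  def receive = {
    case msg => println(msg)
  }
}

val system  = ActorSystem("printing")
val printer = system.actorOf(Props[PrinterActor], "printer")

// fire-and-forget sends from the parallel foreach
(1 until 1000).par.foreach { i => printer ! i }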
I'm not sure I understand your problem correctly. For me your actor code works fine and terminates.
Nevertheless, you can safely use println with parallel collections, so all you really need is something like this:
(1 until 1000).par.foreach { println(_) }
Works like a charm here. I assume you already know that the output order will vary, but I just want to stress it again, because the question comes up every so often. So don't expect the numbers to scroll down your screen in successive fashion.