Colossus Background Task - scala

I am building an application using Tumblr's new Colossus framework (http://tumblr.github.io/colossus/). There is still limited documentation on it (and the fact that I'm still very new to Akka doesn't help), so I was wondering if someone could chime in on whether my approach is correct.
The application is simple and consists of two key components:
A thin web service layer that will queue tasks into Redis
A background worker which will poll the same Redis instance for available tasks and process them as they become available
I made a simple example to demonstrate that my concurrency model will work (and it does), which I posted below. However, I would like to make sure that there is not a more idiomatic way to do this.
import colossus.IOSystem
import colossus.protocols.http.Http
import colossus.protocols.http.HttpMethod.Get
import colossus.protocols.http.UrlParsing._
import colossus.service.{Callback, Service}
import colossus.task.Task
object QueueProcessor {
implicit val io = IOSystem() // Create separate IOSystem for worker
Task { ctx =>
while(true) {
// Below code is for testing purposes only. This is where the Redis loop will live, and will use a blocking call to get the next available task
Thread.sleep(5000)
println("task iteration")
}
}
def ping = println("starting") // Method to launch this processor
}
object Main extends App {
implicit val io = IOSystem() // Primary IOSystem for the web service
QueueProcessor.ping // Launch worker
Service.serve[Http]("app", 8080) { ctx =>
ctx.handle { conn =>
conn.become {
case req#Get on Root => Callback.successful(req.ok("Here"))
// The methods to add tasks to the queue will live here
}
}
}
}
I tested the above model and it works. The background loop continues running while the service happily accepts requests. But, I think that there might be a better way to do this with workers (nothing found in documentation), or perhaps Akka Streams?

I got it working with something that seems semi-idiomatic to me. However, new answers & feedback are still welcomed!
class Processor extends Actor {
import scala.concurrent.ExecutionContext.Implicits.global
override def receive = {
case "start" => self ! "next"
case "next" => {
Future {
blocking {
// Blocking call here to wait on Redis (BRPOP/BLPOP)
self ! "next"
}
}
}
}
}
object Main extends App {
implicit val io = IOSystem()
val processor = io.actorSystem.actorOf(Props[Processor])
processor ! "start"
Service.serve[Http]("app", 8080) { ctx =>
ctx.handle { conn =>
conn.become {
// Queue here
case req#Get on Root => Callback.successful(req.ok("Here\n"))
}
}
}
}

Related

Wrapping Pub-Sub Java API in Akka Streams Custom Graph Stage

I am working with a Java API from a data vendor providing real time streams. I would like to process this stream using Akka streams.
The Java API has a pub sub design and roughly works like this:
Subscription sub = createSubscription();
sub.addListener(new Listener() {
public void eventsReceived(List events) {
for (Event e : events)
buffer.enqueue(e)
}
});
I have tried to embed the creation of this subscription and accompanying buffer in a custom graph stage without much success. Can anyone guide me on the best way to interface with this API using Akka? Is Akka Streams the best tool here?
To feed a Source, you don't necessarily need to use a custom graph stage. Source.queue will materialize as a buffered queue to which you can add elements which will then propagate through the stream.
There are a couple of tricky things to be aware of. The first is that there's some subtlety around materializing the Source.queue so you can set up the subscription. Something like this:
def bufferSize: Int = ???
Source.fromMaterializer { (mat, att) =>
val (queue, source) = Source.queue[Event](bufferSize).preMaterialize()(mat)
val subscription = createSubscription()
subscription.addListener(
new Listener() {
def eventsReceived(events: java.util.List[Event]): Unit = {
import scala.collection.JavaConverters.iterableAsScalaIterable
import akka.stream.QueueOfferResult._
iterableAsScalaIterable(events).foreach { event =>
queue.offer(event) match {
case Enqueued => () // do nothing
case Dropped => ??? // handle a dropped pubsub element, might well do nothing
case QueueClosed => ??? // presumably cancel the subscription...
}
}
}
}
)
source.withAttributes(att)
}
Source.fromMaterializer is used to get access at each materialization to the materializer (which is what compiles the stream definition into actors). When we materialize, we use the materializer to preMaterialize the queue source so we have access to the queue. Our subscription adds incoming elements to the queue.
The API for this pubsub doesn't seem to support backpressure if the consumer can't keep up. The queue will drop elements it's been handed if the buffer is full: you'll probably want to do nothing in that case, but I've called it out in the match that you should make an explicit decision here.
Dropping the newest element is the synchronous behavior for this queue (there are other queue implementations available, but those will communicate dropping asynchronously which can be really bad for memory consumption in a burst). If you'd prefer something else, it may make sense to have a very small buffer in the queue and attach the "overall" Source (the one returned by Source.fromMaterializer) to a stage which signals perpetual demand. For example, a buffer(downstreamBufferSize, OverflowStrategy.dropHead) will drop the oldest event not yet processed. Alternatively, it may be possible to combine your Events in some meaningful way, in which case a conflate stage will automatically combine incoming Events if the downstream can't process them quickly.
Great answer! I did build something similar. There are also kamon metrics to monitor queue size exc.
class AsyncSubscriber(projectId: String, subscriptionId: String, metricsRegistry: CustomMetricsRegistry, pullParallelism: Int)(implicit val ec: Executor) {
private val logger = LoggerFactory.getLogger(getClass)
def bufferSize: Int = 1000
def source(): Source[(PubsubMessage, AckReplyConsumer), Future[NotUsed]] = {
Source.fromMaterializer { (mat, attr) =>
val (queue, source) = Source.queue[(PubsubMessage, AckReplyConsumer)](bufferSize).preMaterialize()(mat)
val receiver: MessageReceiver = {
(message: PubsubMessage, consumer: AckReplyConsumer) => {
metricsRegistry.inputEventQueueSize.update(queue.size())
queue.offer((message, consumer)) match {
case QueueOfferResult.Enqueued =>
metricsRegistry.inputQueueAddEventCounter.increment()
case QueueOfferResult.Dropped =>
metricsRegistry.inputQueueDropEventCounter.increment()
consumer.nack()
logger.warn(s"Buffer is full, message nacked. Pubsub should retry don't panic. If this happens too often, we should also tweak the buffer size or the autoscaler.")
case QueueOfferResult.Failure(ex) =>
metricsRegistry.inputQueueDropEventCounter.increment()
consumer.nack()
logger.error(s"Failed to offer message with id=${message.getMessageId()}", ex)
case QueueOfferResult.QueueClosed =>
logger.error("Destination Queue closed. Something went terribly wrong. Shutting down the jvm.")
consumer.nack()
mat.shutdown()
sys.exit(1)
}
}
}
val subscriptionName = ProjectSubscriptionName.of(projectId, subscriptionId)
val subscriber = Subscriber.newBuilder(subscriptionName, receiver).setParallelPullCount(pullParallelism).build
subscriber.startAsync().awaitRunning()
source.withAttributes(attr)
}
}
}

Execute some logic asynchronously in spray routing

Here is my simple routing application:
object Main extends App with SimpleRoutingApp {
implicit val system = ActorSystem("my-system")
startServer(interface = "0.0.0.0", port = System.getenv("PORT").toInt) {
import format.UsageJsonFormat._
import spray.httpx.SprayJsonSupport._
path("") {
get {
complete("OK")
}
} ~
path("meter" / JavaUUID) {
meterUUID => pathEnd {
post {
entity(as[Usage]) {
usage =>
// execute some logic asynchronously
// do not wait for the result
complete("OK")
}
}
}
}
}
}
What I want to achieve is to execute some logic asynchronously in my path directive, do not wait for the result and return immediately HTTP 200 OK.
I am quite new to Scala and spray and wondering if there is any spray way to solve this specific problem. Otherwise I would go into direction of creating Actor for every request and letting it to do the job. Please advice.
There's no special way of handling this in spray: simply fire your async action (a method returning a Future, a message sent to an actor, whatever) and call complete right after.
def doStuffAsync = Future {
// literally anything
}
path("meter" / JavaUUID) { meterUUID =>
pathEnd {
post {
entity(as[Usage]) { usage =>
doStuffAsync()
complete("OK")
}
}
}
}
Conversely, if you need to wait for an async action to complete before sending the response, you can use spray-specific directives for working with Futures or Actors.

spray and actor non deterministic tests

Helo,
at the beginning i wold like to apologize for my english :)
akka=2.3.6
spray=1.3.2
scalatest=2.2.1
I encountered strange behavior of teting routes, which asks actors in handleWith directive,
I've route with handleWith directive
pathPrefix("firstPath") {
pathEnd {
get(complete("Hello from this api")) ~
post(handleWith { (data: Data) =>{ println("receiving data")
(dataCalculator ? data).collect {
case Success(_) =>
Right(Created -> "")
case throwable: MyInternalValidatationException =>
Left(BadRequest -> s"""{"${throwable.subject}" : "${throwable.cause}"}""")
}
}})
}
}
and simple actor wchich always responds when receive object Data and has own receive block wrapped in LoggingReceive, so I should see logs when message is receiving by actor
and i test it using (I think simple code)
class SampleStarngeTest extends WordSpec with ThisAppTestBase with OneInstancePerTest
with routeTestingSugar {
val url = "/firstPath/"
implicit val routeTestTimeout = RouteTestTimeout(5 seconds)
def postTest(data: String) = Post(url).withJson(data) ~> routes
"posting" should {
"pass" when {
"data is valid and comes from the identified user" in {
postTest(correctData.copy(createdAt = System.currentTimeMillis()).asJson) ~> check {
print(entity)
status shouldBe Created
}
}
"report is valid and comes from the anonymous" in {
postTest(correctData.copy(createdAt = System.currentTimeMillis(), adid = "anonymous").asJson) ~> check {
status shouldBe Created
}
}
}
}
}
and behavior:
When I run either all tests in package (using Intellij Idea 14 Ultimate) or sbt test I encounter the same results
one execution -> all tests pass
and next one -> not all pass, this which not pass I can see:
1. fail becouse Request was neither completed nor rejected within X seconds ( X up tp 60)
2. system console output from route from line post(handleWith { (data: Data) =>{ println("receiving data"), so code in handleWith was executed
3. ask timeout exception from route code, but not always (among failed tests)
4. no logs from actor LoggingReceive, so actor hasn't chance to respond
5. when I rerun teststhe results are even different from the previous
Is there problem with threading? or test modules, thread blocking inside libraries? or sth else? I've no idea why it isn't work :(

how to cancel ConsoleReader.readLine()

first of all, i'm learning scala and new to the java world.
I want to create a console and run this console as a service that you could start and stop.
I was able to run a ConsoleReader into an Actor but i don't know how to stop properly the ConsoleReader.
Here is the code :
import eu.badmood.util.trace
import scala.actors.Actor._
import tools.jline.console.ConsoleReader
object Main {
def main(args:Array[String]){
//start the console
Console.start(message => {
//handle console inputs
message match {
case "exit" => Console.stop()
case _ => trace(message)
}
})
//try to stop the console after a time delay
Thread.sleep(2000)
Console.stop()
}
}
object Console {
private val consoleReader = new ConsoleReader()
private var running = false
def start(handler:(String)=>Unit){
running = true
actor{
while (running){
handler(consoleReader.readLine("\33[32m> \33[0m"))
}
}
}
def stop(){
//how to cancel an active call to ConsoleReader.readLine ?
running = false
}
}
I'm also looking for any advice concerning this code !
The underlying call to read a characters from the input is blocking. On non-Windows platform, it will use System.in.read() and on Windows it will use org.fusesource.jansi.internal.WindowsSupport.readByte.
So your challenge is to cause that blocking call to return when you want to stop your console service. See http://www.javaspecialists.eu/archive/Issue153.html and Is it possible to read from a InputStream with a timeout? for some ideas... Once you figure that out, have read return -1 when your console service stops, so that ConsoleReader thinks it's done. You'll need ConsoleReader to use your version of that call:
If you are on Windows, you'll probably need to override tools.jline.AnsiWindowsTerminal and use the ConsoleReader constructor that takes a Terminal (otherwise AnsiWindowsTerminal will just use WindowsSupport.readByte` directly)
On unix, there is one ConsoleReader constructor that takes an InputStream, you could provide your own wrapper around System.in
A few more thoughts:
There is a scala.Console object already, so for less confusion name yours differently.
System.in is a unique resource, so you probably need to ensure that only one caller uses Console.readLine at a time. Right now start will directly call readLine and multiple callers can call start. Probably the console service can readLine and maintain a list of handlers.
Assuming that ConsoleReader.readLine responds to thread interruption, you could rewrite Console to use a Thread which you could then interrupt to stop it.
object Console {
private val consoleReader = new ConsoleReader()
private var thread : Thread = _
def start(handler:(String)=>Unit) : Thread = {
thread = new Thread(new Runnable {
override def run() {
try {
while (true) {
handler(consoleReader.readLine("\33[32m> \33[0m"))
}
} catch {
case ie: InterruptedException =>
}
}
})
thread.start()
thread
}
def stop() {
thread.interrupt()
}
}
You may overwrite your ConsoleReader InputStream. IMHO this is reasonable well because of STDIN is a "slow" stream. Please improve example for your needs. This is only sketch, but it works:
def createReader() =
terminal.synchronized {
val reader = new ConsoleReader
terminal.enableEcho()
reader.setBellEnabled(false)
reader.setInput(new InputStreamWrapper(reader.getInput())) // turn on InterruptedException for InputStream.read
reader
}
with InputStream wrapper:
class InputStreamWrapper(is: InputStream, val timeout: Long = 50) extends FilterInputStream(is) {
#tailrec
final override def read(): Int = {
if (is.available() != 0)
is.read()
else {
Thread.sleep(timeout)
read()
}
}
}
P.S. I tried to use NIO - a lot of troubles with System.in (especially crossplatform). I returned to this variant. CPU load is near 0%. This is suitable for such interactive application.

Possible to make a producer/consumer obj in Scala using actors 'threadless' (no receive...)?

So I want to write some network code that appears to be blocking, without actually blocking a thread. I'm going to send some data out on the wire, and have a 'queue' of responses that will come back over the network. I wrote up a very simple proof of concept, inspired by the producer/consumer example on the actor tutorial found here: http://www.scala-lang.org/node/242
The thing is, using receive appears to take up a thread, and so I'm wondering if theres anyway to not take up a thread and still get the 'blocking feel'. Heres my code sample:
import scala.actors.Actor._;
import scala.actors.Actor;
case class Request(val s:String);
case class Message(val s:String);
class Connection {
private val act:Actor = actor {
loop {
react {
case m:Message => receive { case r:Request => reply { m } }
}
}
}
def getNextResponse(): Message = {
return (act !? new Request("get")).asInstanceOf[Message];
}
//this would call the network layer and send something over the wire
def doSomething() {
generateResponse();
}
//this is simulating the network layer getting some data back
//and sending it to the appropriate Connection object
private def generateResponse() {
act ! new Message("someData");
act ! new Message("moreData");
act ! new Message("even more data");
}
}
object runner extends Application {
val conn = new Connection();
conn.doSomething();
println( conn.getNextResponse());
println(conn.getNextResponse());
println(conn.getNextResponse());
}
Is there a way to do this without using the receive, and thereby making it threadless?
You could directly rely on react which should block and release the thread:
class Connection {
private val act:Actor = actor {
loop {
react {
case r:Request => reply { r }
}
}
}
[...]
I expect that you can use react rather than receive and not have actors take up threads like receive does. There is thread on this issue at receive vs react.