Consumer Poll Rate with Akka, SQS, and Camel - scala

A project I'm working on requires the reading of messages from SQS, and I decided to use Akka to distribute the processing of these messages.
Since SQS is support by Camel, and there is functionality built in for use in Akka in the Consumer class, I imagined it would be best to implement the endpoint and read messages this way, though I had not seen many examples of people doing so.
My problem is that I cannot poll my queue quickly enough to keep my queue empty, or near empty. What I originally thought was that I could get a Consumer to receive messages over Camel from SQS at a rate of X/s. From there, I could simply create more Consumers to get up to the rate at which I needed messages processed.
My Consumer:
import akka.camel.{CamelMessage, Consumer}
import akka.actor.{ActorRef, ActorPath}
class MyConsumer() extends Consumer {
def endpointUri = "aws-sqs://my_queue?delay=1&maxMessagesPerPoll=10&accessKey=myKey&secretKey=RAW(mySecret)"
var count = 0
def receive = {
case msg: CamelMessage => {
count += 1
}
case _ => {
println("Got something else")
}
}
override def postStop(){
println("Count for actor: " + count)
}
}
As shown, I've set delay=1 as well as &maxMessagesPerPoll=10 to improve the rate of messages, but I'm unable to spawn multiple consumers with the same endpoint.
I read in the docs that By default endpoints are assumed not to support multiple consumers. and I believe this holds true for SQS endpoints as well, as spawning multiple consumers will give me only one consumer where after running the system for a minute, the output message is Count for actor: x instead of the others which output Count for actor: 0.
If this is at all useful; I'm able to read approximately 33 messages/second with this current implementation on the single consumer.
Is this the proper way to be reading messages from an SQS queue in Akka? If so, is there way I can get this to scale outward so that I can increase my rate of message consumption closer to that of 900 messages/second?

Sadly Camel does not currently support parallel consumption of messages on SQS.
http://camel.465427.n5.nabble.com/Amazon-SQS-listener-as-multi-threaded-td5741541.html
To address this I've written my own Actor to poll batch messages SQS using the aws-java-sdk.
def receive = {
case BeginPolling => {
// re-queue sending asynchronously
self ! BeginPolling
// traverse the response
val deleteMessageList = new ArrayList[DeleteMessageBatchRequestEntry]
val messages = sqs.receiveMessage(receiveMessageRequest).getMessages
messages.toList.foreach {
node => {
deleteMessageList.add(new DeleteMessageBatchRequestEntry(node.getMessageId, node.getReceiptHandle))
//log.info("Node body: {}", node.getBody)
filterSupervisor ! node.getBody
}
}
if(deleteEntryList.size() > 0){
val deleteMessageBatchRequest = new DeleteMessageBatchRequest(queueName, deleteMessageList)
sqs.deleteMessageBatch(deleteMessageBatchRequest)
}
}
case _ => {
log.warning("Unknown message")
}
}
Though I'm not certain if this is the best implementation, and it could of course be improved upon so that requests are not constantly hitting an empty queue, it does suit my current needs of being able to poll messages from the same queue.
Getting about 133 (messages/second)/actor from SQS with this.

Camel 2.15 supports concurrentConsumers, though not sure how useful this is in that I don't know if akka camel support 2.15 and I do not know if having one Consumer actor makes a difference even if there are multiple consumers.

Related

Akka HTTP / Error Response entity was not subscribed after 1 second

I searched the other StackOverflow question/answers towards this error, but couldn't find a hint for solving this problem.
The Akka HTTP application runs for like 5 hours under high workload without problems, and than I start to get multiple:
Response entity was not subscribed after 1 second. Make sure to read the response `entity` body or call `entity.discardBytes()` on it -- in case you deal with `HttpResponse`, use the shortcut `response.discardEntityBytes()`. GET /api/name123 Empty -> 200 OK Default(142 bytes)
and later
The connection actor has terminated. Stopping now.
The actor is only sending out API requests and afterwards forwards those responses to another actor if successfully, in case of failure, that request is added back to the todo stack and retried later. This is the main code:
private def makeApiRequest(id: String): Unit = {
val url = UrlBuilder(id)
val request = HttpRequest(method = HttpMethods.GET, uri = url)
val f: Future[(StatusCode, String)] = Http(context.system)
.singleRequest(request)
.flatMap(_.toStrict(2.seconds))
.flatMap { resp =>
Unmarshal(resp.entity).to[String].map((resp.status, _))
}
context.pipeToSelf(f) {
case Success(response) =>
API_HandleResponseSuccess(id, response._1, response._2)
case Failure(e) =>
API_HandleResponseFailure(id, e.getMessage)
}
}
I don't really understand why I get the "Response entity was not subscribed..." error, as I do Unmarshal(resp.entity).to[String] and thereby would think, that no .DiscardEntityBytes() is needed, or does it needs to be still included somehow?
Side information: Also confusing to me, why the CPU performance doesn't stay constant.
Within the actor do I track the response times of each request and calculate the amount of max. parallel requests possible to handle with the given hardware conditions (restricted to a max max of 120 though) on a regular basis to account for API response time fluctuations, so there should be always enough room to make the requests without starving for that actor. In addition would that be the respective application.conf:
dispatcher-worker-io {
type = Dispatcher
executor = "thread-pool-executor"
thread-pool-executor {
fixed-pool-size = 120
keep-alive-time = 60s
allow-core-timeout = off
}
shutdown-timeout = 60s
throughput = 1
}
...
akka.http.client.host-connection-pool.max-connections = 180
akka.http.client.host-connection-pool.max-open-requests = 256
akka.http.client.host-connection-pool.max-retries = 0
Any ideas on why I after 5 hours without problems start to get those exceptions mentioned above?
or
Has an idea of which part of above shared code might leads to this non-linear CPU performance?
I also made multiple of those long lasting hour runs, and it always ends out like this, somehow it's starving after 5 to 6 hours.
val AkkaVersion = "2.6.15"
val AkkaHttpVersion = "10.2.6"
Directly from the docs (https://doc.akka.io/docs/akka-http/current/client-side/request-level.html):
Always make sure you consume the response entity streams (of type
Source[ByteString,Unit]). Connect the response entity Source to a
Sink, or call response.discardEntityBytes() if you don’t care about
the response entity.
Read the Implications of the streaming nature of Request/Response
Entities section for more details.
If the application doesn’t subscribe to the response entity within
akka.http.host-connection-pool.response-entity-subscription-timeout,
the stream will fail with a TimeoutException: Response entity was not
subscribed after ....
You need to .discardEntityBytes() in case of failure. Right now you only consume it on success.
Perhaps high CPU load is caused by all these unfreed resources on the JVM + retries of all the failures.

Gatling: Producer and consumer users

I have a load test where three sets of users create something and a different set of users perform some actions on them.
What is the recommended way to co-ordinate this behaviour in Gatling?
I'm currently using an object which contains a LinkedBlockingQueue which the "producers" put the ID and consumers take, see below.
However, it causes the test to hang after ~20s (targeting 1tps).
I've also tried using poll with a timeout, but instead of hanging the poll almost always fails (after 30s) or causes a hang if the timeout is larger (1m+).
This seems to be because all the threads are blocked waiting for something from the queue so isn't compatible with the way Gatling tests run (i.e. not 1 thread per user). Is there a non-blocking way to wait in the Gatling DSL?
Producer.scala
// ...
scenario("Produce stuff")
.exec(/* HTTP call which extracts an ID*/)
.exec(session => Queue.ids.put(session("my-id").as[String])
// ...
Consumer.scala
// ...
scenario("Consume stuff")
.exec(session => session.set("my-id", Queue.ids.take()))
.exec(/* HTTP call which users ID*/)
// ...
Queue.scala
object Queue {
val ids = new LinkedBlockingQueue[String]()
}
As an alternative I've tried to use the application functionality but it seems a harder problem to ensure that each user picks a unique item from the app.
Acknowledging this is all a hack, my current solution in Consumer.scala is:
doIf(_ => Queue.ids.size() < MIN_COUNT)(
pause(30) // wait for 30s if queue is initially too small
)
.doWhile(_ => Queue.ids.size() >= MIN_COUNT)(
exec(session => session.set("my-id", Queue.ids.take()))
.exec(...)
.pause(30)
)

Akka: send error from routee back to caller

In my project, I created UserRepositoryActor which create their own router with 10 UserRepositoryWorkerActor instances as routee, see hierarchy below:
As you see, if any error occur while fetching data from database, it will occur at worker.
Once I want to fetch user from database, I send message to UserRepositoryActor with this command:
val resultFuture = userRepository ? FindUserById(1)
and I set 10 seconds for timeout.
In case of network connection has problem, UserRepositoryWorkerActor immediately get ConnectionException from underlying database driver and then (what I think) router will restart current worker and send FindUserById(1) command to other worker that available and resultFuture will get AskTimeoutException after 10 seconds passed. Then some time later, once connection back to normal, UserRepositoryWorkerActor successfully fetch data from database and then try to send result back to the caller and found that resultFuture was timed out.
I want to propagate error from UserRepositoryWorkerActor up to the caller immediately after exception occur, so that will prevent resultFuture to wait for 10 seconds and stop UserRepositoryWorkerActor to try to fetch data again and again.
How can I do that?
By the way, if you have any suggestions to my current design, please suggest me. I'm very new to Akka.
Your assumption about Router resending the message is wrong. Router has already passed the message to routee and it doesnt have it any more.
As far as ConnectionException is concerned, you could wrap in a scala.util.Try and send response to sender(). Something like,
Try(SomeDAO.getSomeObjectById(id)) match {
case Success(s) => sender() ! s
case Failure(e) => sender() ! e
}
You design looks correct. Having a router allows you to distribute work and also to limit number of concurrent workers accessing the database.
Option 1
You can make your router watch its children and act accordingly when they are terminated. For example (taken from here):
import akka.routing.{ ActorRefRoutee, RoundRobinRoutingLogic, Router }
class Master extends Actor {
var router = {
val routees = Vector.fill(5) {
val r = context.actorOf(Props[Worker])
context watch r
ActorRefRoutee(r)
}
Router(RoundRobinRoutingLogic(), routees)
}
def receive = {
case w: Work =>
router.route(w, sender())
case Terminated(a) =>
router = router.removeRoutee(a)
val r = context.actorOf(Props[Worker])
context watch r
router = router.addRoutee(r)
}
}
In your case you can send some sort of a failed message from the repository actor to the client. Repository actor can maintain a map of worker ref to request id to know which request failed when worker terminates. It can also record the time between the start of the request and actor termination to decide whether it's worth retrying it with another worker.
Option 2
Simply catch all non-fatal exceptions in your worker actor and reply with appropriate success/failed messages. This is much simpler but you might still want to restart the worker to make sure it's in a good state.
p.s. Router will not restart failed workers, neither it will try to resend messages to them by default. You can take a look at supervisor strategy and Option 1 above on how to achieve that.

How to limit the number of actors of a particular type?

I've created an actor to send messages to a chat server. However, the chat server only permits 5 connections per user. If I hammer my scala server I get error messages because my chat clients get disconnected.
So how can I configure akka so that my XmppSenderActors only use a maximum of 5 threads? I don't want to restrict the rest of the actor system, only this object (at the path /XmppSenderActor/).
I'm trying this config since I think it's the dispatcher I need to configure, but I'm not sure:
akka.actor.deployment {
/XmppSenderActor {
dispatcher = xmpp-dispatcher
}
xmpp-dispatcher {
fork-join-executor.parallelism-min = 2
fork-join-executor.parallelism-max = 3
}
}
This gives me an error though: akka.ConfigurationException: Dispatcher [xmpp-dispatcher] not configured for path akka://sangria-server/user/XmppSenderActor
I would probably try to configure a Router instead.
http://doc.akka.io/docs/akka/2.0/scala/routing.html
A dispatcher seems to deal with sending messages to the inbox rather than the actual number or Actor targets.
That configuration in particular could work for you:
akka.actor.deployment {
/router {
router = round-robin
nr-of-instances = 5
}
}
The nr-of-instances will create 5 childrens from the get going and therefore fill your needs.
You might need to find the right Router implementation though.

Implement timeout in actors

I am new to scala and actors. I need to implement such hypothetical situation:
Server wait for messages, if it does not get any in say 10s period of time, it sends message to the Client. Otherwise it receives messages incoming. If it is inside processing some message and another message comes, it needs to be queued (I suppose that is done automatically by scala actors).
The second problem I am encountering is Sleeping. I need the actor to sleep for some constant period of time when it receives the message. But on the other hand I can't block, as I want incoming messages to be queued for further processing.
How about this?
loop {
reactWithin(10000) {
case TIMEOUT => // send message to client
case work => // do work
}
}
Daniel has provided a better answer to the no-input condition part of the question. So I've edited out my inferior solution.
As to the delayed response part of the question, the message queue doesn't block while an actor sleeps. It can just sleep and messages will still accumulate.
However, if you want a fixed delay from when you receive a message to when you process it, you can, for example, create an actor that works immediately but wraps the message in a request for a delay:
case class Delay(when: Long, what: Any) { }
// Inside class DelayingActor(workingActor: Actor)
case msg => workingActor ! Delay(delayValue + System.currentTimeMillis , msg)
Then, the working actor would
case Delay(t,msg) =>
val t0 = System.currentTimeMillis
if (t>t0) Thread.sleep( t - t0 )
msg match {
// Handle message
}