I want to implement a high-throughput server that accepts multiple clients. Every request should query a database, so I need some kind of async behavior.
I followed the ROUTER-to-REQ pattern from the documentation and combined it with Futures, so I ended up with this "architecture":
trait ZmqProtocol extends Protocol {
private val pool = Executors.newCachedThreadPool()
private implicit val ec: ExecutionContextExecutor = ExecutionContext.fromExecutor(pool)
val context: ZMQ.Context = ZMQ.context(1)
val socket: ZMQ.Socket = context.socket(ZMQ.ROUTER)
socket.bind("tcp://*:5555")
override def receiveMessages(): String = {
while (true) {
val address = socket.recv(0)
val empty = socket.recv(0)
val request = socket.recv(0)
Future {
val message = new String(request)
getResponseFromDb(message)
} onComplete {
case Success(response) =>
// Send reply back to client
socket.send(address, ZMQ.SNDMORE)
socket.send("".getBytes, ZMQ.SNDMORE)
socket.send(response.getBytes(), 0)
case Failure(ex) => println(ex)
}
}
"DONE"
}
}
I understand this won't work because I'm sharing the socket between threads inside the Future, so I need a better model. I know ZeroMQ sockets are fast and a few worker threads would be enough on the input side, but if the bottleneck is on the database side and I need to do other work while waiting for the DB, I presume all my threads would soon be exhausted.
Would it be too much overhead to create a new socket and bind on ROUTER in every Future, or is there a better solution?
Also, for Scala developers: is there a way to force onComplete to be executed on the main thread (I suppose that would solve the issue)? Thanks!
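On the last point, onComplete takes an ExecutionContext, so a callback can be pinned to one chosen thread by passing a single-thread context explicitly. The following is a minimal sketch of that idea only, using the standard library and no ZeroMQ. The names CallbackThreadSketch, ownerEc, workerEc and handle are made up for illustration, and the upper-casing stands in for the database call. It does not by itself make the ROUTER loop above safe, because recv would still have to run on that same owner thread.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object CallbackThreadSketch {
  // Hypothetical: the one thread that would be allowed to touch the socket.
  private val ownerExecutor = Executors.newSingleThreadExecutor()
  val ownerEc: ExecutionContext = ExecutionContext.fromExecutor(ownerExecutor)

  // Hypothetical: a separate pool for the slow (database) work.
  private val workerExecutor = Executors.newFixedThreadPool(4)
  val workerEc: ExecutionContext = ExecutionContext.fromExecutor(workerExecutor)

  // Name of the owner thread, so a caller can check where callbacks ran.
  def ownerThreadName(): String =
    Await.result(Future(Thread.currentThread().getName)(ownerEc), 5.seconds)

  // Runs the work on workerEc, then runs the callback on ownerEc.
  def handle(request: String): Future[String] = {
    val replied = Promise[String]()
    Future(request.toUpperCase)(workerEc).onComplete { result =>
      // In the real server, socket.send(...) would go here.
      replied.complete(result.map(r => s"$r@${Thread.currentThread().getName}"))
    }(ownerEc)
    replied.future
  }

  def shutdown(): Unit = {
    ownerExecutor.shutdown()
    workerExecutor.shutdown()
  }
}
```

Each reply produced by handle is tagged with the name of the thread that ran the callback, which should always be the single owner thread.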
I have a backend app written in Scala Play. Because I have a real-time implementation using Akka actors with data stored in a Redis server, I want each backend instance (deployed on CentOS servers) to be both a publisher and a subscriber to the Redis service. Why? Because a third-party app will send requests to my backend to update the data in Redis, and I want all actors on all instances to push data to clients (frontend) regardless of which backend instance the request is routed to (a load balancer is used).
So, when instance1 publishes to Redis, I want all subscribers (instance2, instance3, and even instance1, since each instance must be both pub and sub) to push data to clients.
I created an object with a Publisher and a Subscriber client, expecting them to behave as singletons. But for some unknown reason, overnight my instances get unsubscribed from the Redis server without any message. I believe this because the next day my Redis service has 0 subscribers. I don't know whether my implementation is bad or Redis simply kills the connections after some time.
RedisPubSubServer.scala (in fact, these are just 2 Akka actors that take a RedisClient as a parameter)
class Subscriber(client: RedisClient) extends Actor {
var callback: PubSubMessage => Any = { m => }
implicit val timeout = Timeout(2 seconds)
override def receive: Receive = {
case Subscribe(channel) => client.subscribe(channel)(callback)
case Register(cb) => callback = cb; self ? true
case Unsubscribe(channel) => client.unsubscribe(channel); self ? true
}
}
class Publisher(client: RedisClient) extends Actor {
implicit val timeout = Timeout(2 seconds)
override def receive: Receive = {
case Publish(channel, msg) => client.publish(channel, msg); self ? true
}
}
RedisPubSubClient.scala (here I create the Publisher and Subscriber as singleton)
object Pub {
println("starting publishing service...")
val config = ConfigFactory.load.getObject("redis").toConfig
val client = new RedisClient(config.getString("master"), config.getInt("port"))
val system = ActorSystem("RedisPublisher")
val publisher = system.actorOf(Props(new Publisher(client)))
def publish(channel: String, message: String) =
publisher ! Publish(channel, message)
}
object Sub {
val client = new RedisClient(config.getString("master"), config.getInt("port"))
val system = ActorSystem("RedisSubscriber")
val subscriber = system.actorOf(Props(new Subscriber(client)))
println("SUB Registering...")
subscriber ! Register(callback)
def sub(channel: String) = subscriber ! Subscribe(channel)
def unsub(channel: String) = subscriber ! Unsubscribe(channel)
def callback(msg: PubSubMessage) = {
msg match {
case S(channel, no) => println(s"subscribed to $channel and count $no")
case U(channel, no) => println(s"unsubscribed from $channel and count $no")
case M(channel, msg) => msg match {
case "exit" => client.unsubscribe()
case jsonString => // do the job
}
case E(e) => println(s"ERR = ${e.getMessage}")
}
}
}
and the RedisService
object RedisService {
val system = ActorSystem("RedisServiceSubscriber")
val subscriber = system.actorOf(Props(new Subscriber(client)))
subscriber ! Register(callback)
subscriber ! Subscribe("channelName")
// I expect this subscriber to live as long as the backend instance
}
From an API endpoint, I push data by calling Pub's publish method:
def reloadData(request: AnyType) {
Pub.publish("channelName", requestAsString)
}
Is it possible that the Publisher/Subscriber actors are killed after a while, and that this causes errors in the Redis pub/sub clients?
For the Publisher, I am thinking of creating the client each time the API call is made, but for the Subscriber I see no alternative to a singleton object that listens to Redis for the entire life of the backend.
Thanks.
edit: used library:
"net.debasishg" %% "redisclient" % "3.41"
After some research, I found another Scala Redis library that seems to do exactly what I need in an easier manner:
"com.github.etaty" %% "rediscala" % "1.9.0"
I am a bit lost using the akka-http libraries to create a server. The communication I need to establish is as follows:
There is one server and n clients (n < 5)
Sometimes the clients send a command to the server, the server evaluates/delegates the command and answers the client
There are constant broadcast messages from the server to all clients
Given that:
my server needs to manage multiple 'sessions' that are connected via a websocket
Here is my websocket endpoint:
path("socket") {
handleWebSocketMessages(listen())
}
And here is the listen() method:
// stores offers to broadcast to all clients
private var offers: List[TextMessage => Unit] = List()
def listen(): Flow[Message, Message, NotUsed] = {
val inbound: Sink[Message, Any] = Sink.foreach(m => /* handle the message */) // (*)
val outbound: Source[Message, SourceQueueWithComplete[Message]] =
Source.queue[Message](16, OverflowStrategy.fail)
Flow.fromSinkAndSourceMat(inbound, outbound)((_, outboundMat) => {
offers ::= outboundMat.offer
NotUsed
})
}
def sendText(text: String): Unit = {
for (connection <- offers) connection(TextMessage.Strict(text))
}
With this approach I can register multiple clients and send to them using the sendText(text: String) method. But there is one big problem: how do I answer only a specific client after I have evaluated its command? (see (*))
[Another thing that's bugging me is that offers is a var, which seems wrong when programming in a purely FP way, but I can accept that if the rest is working]
Edit:
To elaborate I basically need to be able to implement a method looking like this:
def onMessageReceived(m: Message, answer: TextMessage => Unit): Unit = {
val response: TextMessage = handleMessage(m)
answer(response)
}
But I cannot figure out where to call this method in my websocket Flow.
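One way to think about the "answer only one client" requirement, independent of Akka, is to key each connection's send function by an id instead of keeping an anonymous list. The sketch below is standard-library Scala only. ConnectionRegistry and its methods are hypothetical names, and the assumption is that each registered function would be something like the materialized queue's offer for that connection.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap

// Hypothetical registry: one send function per connected client, keyed by id.
final class ConnectionRegistry[A] {
  private val nextId = new AtomicLong(0L)
  private val connections = TrieMap.empty[Long, A => Unit]

  // Called once per new connection; the returned id identifies that client.
  def register(send: A => Unit): Long = {
    val id = nextId.incrementAndGet()
    connections.put(id, send)
    id
  }

  // Called when the connection closes.
  def unregister(id: Long): Unit = {
    connections.remove(id)
    ()
  }

  // Answers a single client; returns false if the client is no longer registered.
  def sendTo(id: Long, msg: A): Boolean =
    connections.get(id) match {
      case Some(send) => send(msg); true
      case None       => false
    }

  // Sends to every registered client.
  def broadcast(msg: A): Unit =
    connections.values.foreach(send => send(msg))
}
```

With the id captured when the connection is set up, the inbound handler for that connection can call sendTo(id, response), while sendText-style broadcasts go through broadcast. A concurrent map also removes the need for a var.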
I am not really sure if that is the way to go, but this seems to be working:
var actors: List[ActorRef] = Nil
private def wsFlow(implicit materializer: ActorMaterializer): Flow[ws.Message, ws.Message, NotUsed] = {
val (actor, source) = Source.actorRef[String](10, akka.stream.OverflowStrategy.dropTail)
.toMat(BroadcastHub.sink[String])(Keep.both)
.run()
actors = actor :: actors
val wsHandler: Flow[ws.Message, ws.Message, NotUsed] =
Flow[ws.Message]
.merge(source)
.map {
case TextMessage.Strict(tm) => handleMessage(actor, tm)
case _ => TextMessage.Strict("Ignored message!")
}
wsHandler
}
def broadcast(msg: String): Unit = {
actors.foreach(_ ! TextMessage.Strict(msg))
}
I am using a third party library to provide parsing services (user agent parsing in my case) which is not a thread safe library and has to operate on a single threaded basis. I would like to write a thread safe API that can be called by multiple threads to interact with it via Futures API as the library might introduce some potential blocking (IO). I would also like to provide back pressure when necessary and return a failed future when the parser doesn't catch up with the producers.
This could actually be a generic question: how to interact, with back pressure in a concurrent environment, with any client or library that is not thread safe (user agent or geolocation parsers, DB clients like Redis, log collectors like fluentd).
I came up with the following formula:
1. Encapsulate the parser within a dedicated actor.
2. Create an Akka Streams source queue that receives a ParseRequest containing the user agent and a Promise to complete, and use the ask pattern via mapAsync to interact with the parser actor.
3. Create another actor to encapsulate the source queue.
Is this the way to go? Is there a simpler way to achieve this, maybe using a graph stage? Can it be done without the ask pattern and with less code?
The actor in step 3 exists because I'm not sure whether the source queue is thread safe. I wish the docs stated it plainly, but they don't. There are conflicting claims on the web, some saying it is and some saying it isn't.
Once materialized, is the source queue thread safe for pushing elements from different threads?
(the code may not compile and is prone to potential failures, and is only intended for this question in place)
class UserAgentRepo(dbFilePath: String)(implicit actorRefFactory: ActorRefFactory) {
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
implicit val askTimeout = Timeout(5 seconds)
// API to parser - delegates the request to the back pressure actor
def parse(userAgent: String): Future[Option[UserAgentData]] = {
val p = Promise[Option[UserAgentData]]
parserBackPressureProvider ! UserAgentParseRequest(userAgent, p)
p.future
}
// Actor to provide back pressure that delegates requests to parser actor
private class ParserBackPressureProvider extends Actor {
private val parser = context.actorOf(Props[UserAgentParserActor])
val queue = Source.queue[UserAgentParseRequest](100, OverflowStrategy.dropNew)
.mapAsync(1)(request => (parser ? request.userAgent).mapTo[Option[UserAgentData]].map(_ -> request.p))
.to(Sink.foreach({
case (result, promise) => promise.success(result)
}))
.run()
override def receive: Receive = {
case request: UserAgentParseRequest => queue.offer(request).map {
case QueueOfferResult.Enqueued =>
case _ => request.p.failure(new RuntimeException("parser busy"))
}
}
}
// Actor parser
private class UserAgentParserActor extends Actor {
private val up = new UserAgentParser(dbFilePath, true, 50000)
override def receive: Receive = {
case userAgent: String =>
sender ! Try {
up.parseUa(userAgent)
}.toOption.map(UserAgentData(userAgent, _))
}
}
private case class UserAgentParseRequest(userAgent: String, p: Promise[Option[UserAgentData]])
private val parserBackPressureProvider = actorRefFactory.actorOf(Props[ParserBackPressureProvider])
}
Do you have to use actors for this?
It does not seem like you need all this complexity; Scala/Java has all the tools you need "out of the box":
class ParserFacade(parser: UserAgentParser, val capacity: Int = 100) {
private implicit val ec = ExecutionContext
.fromExecutor(
new ThreadPoolExecutor(
1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(capacity)
)
)
def parse(ua: String): Future[Option[UserAgentData]] = try {
Future(Some(UserAgentData(ua, parser.parseUa(ua))))
.recover { case _ => None }
} catch {
case _: RejectedExecutionException =>
Future.failed(new RuntimeException("parser is busy"))
}
}
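A variant of the same idea that hands the task to the executor directly and completes a Promise, so the "busy" path does not depend on how Future.apply reports a rejected task. This is a sketch under assumptions: SingleThreadFacade and its parse function parameter are hypothetical stand-ins for the parser wrapper, not part of any library.

```scala
import java.util.concurrent.{LinkedBlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.{Future, Promise}
import scala.util.Try

// Hypothetical wrapper: runs a non-thread-safe function on exactly one thread
// and rejects new work once `capacity` requests are already waiting.
final class SingleThreadFacade[A, B](parse: A => B, capacity: Int) {
  private val executor = new ThreadPoolExecutor(
    1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable](capacity))

  def submit(input: A): Future[B] = {
    val p = Promise[B]()
    try {
      executor.execute(new Runnable {
        // Runs on the single worker thread; a failure in parse fails the Future.
        def run(): Unit = { p.complete(Try(parse(input))); () }
      })
    } catch {
      case e: RejectedExecutionException =>
        // Queue is full: fail fast instead of blocking the caller.
        p.failure(new RuntimeException("parser is busy", e))
    }
    p.future
  }

  def shutdown(): Unit = executor.shutdown()
}
```

With a capacity of 1, one request can be running and one waiting; a third submitted at that moment gets a failed Future immediately.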
I have an HTTP Connection Pool that hangs after a couple of hours of running:
private def createHttpPool(host: String): SourceQueue[(HttpRequest, Promise[HttpResponse])] = {
val pool = Http().cachedHostConnectionPoolHttps[Promise[HttpResponse]](host)
Source.queue[(HttpRequest, Promise[HttpResponse])](config.poolBuffer, OverflowStrategy.dropNew)
.via(pool).toMat(Sink.foreach {
case ((Success(res), p)) => p.success(res)
case ((Failure(e), p)) => p.failure(e)
})(Keep.left).run
}
I enqueue items with:
private def enqueue(uri: Uri): Future[HttpResponse] = {
val promise = Promise[HttpResponse]
val request = HttpRequest(uri = uri) -> promise
queue.offer(request).flatMap {
case Enqueued => promise.future
case _ => Future.failed(ConnectionPoolDroppedRequest)
}
}
And resolve the response like this:
private def request(uri: Uri): Future[HttpResponse] = {
def retry = {
Thread.sleep(config.dispatcherRetryInterval)
logger.info(s"retrying")
request(uri)
}
logger.info("req-start")
for {
response <- enqueue(uri)
_ = logger.info("req-end")
finalResponse <- response.status match {
case TooManyRequests => retry
case OK => Future.successful(response)
case _ => response.entity.toStrict(10.seconds).map(s => throw Error(s.toString, uri.toString))
}
} yield finalResponse
}
The result of this function is then always transformed if the Future is successful:
def get(uri: Uri): Future[Try[JValue]] = {
for {
response <- request(uri)
json <- Unmarshal(response.entity).to[Try[JValue]]
} yield json
}
Everything works fine for a while and then all I see in the logs are req-start and no req-end.
My akka configuration is like this:
akka {
actor.deployment.default {
dispatcher = "my-dispatcher"
}
}
my-dispatcher {
type = Dispatcher
executor = "fork-join-executor"
fork-join-executor {
parallelism-min = 256
parallelism-factor = 128.0
parallelism-max = 1024
}
}
akka.http {
host-connection-pool {
max-connections = 512
max-retries = 5
max-open-requests = 16384
pipelining-limit = 1
}
}
I'm not sure if this is a configuration problem or a code problem. I set the parallelism and connection numbers this high because without them I get a very poor req/s rate (I want to request as fast as possible, and I have other rate-limiting code to protect the server).
You are not consuming the entity of the responses you get back from the server. Citing the docs below:
Consuming (or discarding) the Entity of a request is mandatory! If
accidentally left neither consumed or discarded Akka HTTP will assume
the incoming data should remain back-pressured, and will stall the
incoming data via TCP back-pressure mechanisms. A client should
consume the Entity regardless of the status of the HttpResponse.
The entity comes in the form of a Source[ByteString, _] which needs to be run to avoid resource starvation.
If you don't need to read the entity, the simplest way to consume the entity bytes is to discard them, by using
res.discardEntityBytes()
(you can attach a callback by adding - e.g. - .future().map(...)).
This page in the docs describes all the alternatives to this, including how to read the bytes if needed.
--- EDIT
After more code/info was provided, it is clear that the resource consumption is not the problem. There is another big red flag in this implementation, namely the Thread.sleep in the retry method.
This is a blocking call that is very likely to starve the threading infrastructure of your underlying actor system.
A full blown explanation of why this is dangerous was provided in the docs.
Try changing that and using akka.pattern.after (docs). Example below:
def retry = akka.pattern.after(200 millis, using = system.scheduler)(request(uri))
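To illustrate why a scheduled retry does not hold a thread while waiting, here is a standard-library sketch of the same shape. It is not Akka's implementation: DelayedRetry, after and retry are hypothetical names, a ScheduledExecutorService stands in for system.scheduler, and the retry is triggered by a failed Future rather than by a TooManyRequests status.

```scala
import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration.FiniteDuration
import scala.util.control.NonFatal

object DelayedRetry {
  // Starts `op` after `delay` without blocking any thread in the meantime.
  def after[T](delay: FiniteDuration, scheduler: ScheduledExecutorService)(op: => Future[T]): Future[T] = {
    val p = Promise[T]()
    scheduler.schedule(new Runnable {
      def run(): Unit = {
        val attempt = try op catch { case NonFatal(e) => Future.failed[T](e) }
        p.completeWith(attempt)
        ()
      }
    }, delay.toMillis, TimeUnit.MILLISECONDS)
    p.future
  }

  // Re-runs `op` after `delay` while it keeps failing and attempts remain.
  def retry[T](attemptsLeft: Int, delay: FiniteDuration, scheduler: ScheduledExecutorService)(
      op: () => Future[T])(implicit ec: ExecutionContext): Future[T] =
    op().recoverWith {
      case NonFatal(_) if attemptsLeft > 1 =>
        after(delay, scheduler)(retry(attemptsLeft - 1, delay, scheduler)(op))
    }
}
```

The waiting time is spent inside the scheduler's timer, not in Thread.sleep on a dispatcher thread, so the pool that serves the connection pool callbacks stays free.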
I want to make a request to a server asynchronously involving actors. Say I have 2 actors:
class SessionRetriever extends Actor {
import SessionRetriever._
def receive = {
case Get =>
val s = getSessionIdFromServer() // 1
sender ! Result(s) // 2
}
def getSessionIdFromServer(): String = { ... } // 3
}
object SessionRetriever {
case object Get
case class Result(s: String)
}
And
class RequestSender extends Actor {
val sActor = context actorOf Props[SessionRetriever]
def receiver = {
// get session id
val sesId = sActor ! SessionRetriever.Get
val res = sendRequestToServer(sesId)
logToFile(res)
context shutdown sActor
}
def sendRequestToServer(sessionId: String): String = { .... }
}
My questions:
val s = getSessionIdFromServer() // 1
sender ! Result(s) // 2
1) getSessionIdFromServer() makes a synchronous request to the server. I think it would be much better to make the request asynchronous, correct? Then it would return Future[String] instead of a plain String.
2) How do I make it asynchronous: by using an AsyncHttpClient (if I recall its name correctly) or by wrapping its synchronous body in Future { }?
3) Should I use a blocking { } block? If yes, where exactly: inside its body, or here: val s = blocking { getSessionIdFromServer() }?
P.S. I'd rather not use async { } and await { } at this point because they are quite high-level functions and, after all, they are built on top of Futures.
You might try this non-blocking approach:
def receive = {
case Get =>
// assume getSessionIdFromServer() runs asynchronously
val f: Future[String] = getSessionIdFromServer()
val client = sender //keep it local to use when future back
f onComplete {
case Success(rep) => client ! Result(rep)
case Failure(ex) => client ! Failed(ex)
}
}
1) If getSessionIdFromServer() is blocking then you should execute it asynchronously from your receive function, otherwise your actor will block each time it receives a new request and will always wait until it receives a new session before processing the next request.
2) Using a Future will "move" the blocking operation to a different thread. So, your actor will not be blocked and will be able to keep processing incoming requests - that's good -, however you are still blocking a thread - not so great. Using the AsyncHttpClient is a good idea. You can explore other non-blocking httpClient, like PlayWebService.
3) I am not very familiar with blocking, so I am not sure I should advise anything here. From what I understand, it tells the thread pool that the operation is blocking and that it should spawn a temporary new thread to handle it, which avoids having all your workers blocked. Again, if you do that your actor will not be blocked, but you are still blocking a thread while getting the session from the server.
To summarize: just use an async http client in getSessionIdFromServer if it is possible. Otherwise, use either Future{} or blocking.
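For the fallback case, a minimal sketch of where blocking would go: inside the Future body, around the synchronous call. BlockingSketch and sessionId are hypothetical names, and getSessionIdFromServer here is a stub that only sleeps. The hint matters mainly for execution contexts that honor it, such as the global one. A thread is still occupied for the duration of the call.

```scala
import scala.concurrent.{ExecutionContext, Future, blocking}

object BlockingSketch {
  // Hypothetical stand-in for the synchronous call in the question.
  def getSessionIdFromServer(): String = {
    Thread.sleep(50) // simulates waiting on the network
    "session-42"
  }

  // Wraps the synchronous call so the caller (for example an actor) is not blocked.
  def sessionId()(implicit ec: ExecutionContext): Future[String] =
    Future {
      // Marks the enclosed code as blocking so the pool may compensate.
      blocking {
        getSessionIdFromServer()
      }
    }
}
```

Writing val s = blocking { getSessionIdFromServer() } directly in receive would still block the actor, since nothing moves the call off the actor's thread.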
To do an asynchronous call with AsyncHttpClient you could deal with the java Future via a scala Promise.
import scala.concurrent.Future
import com.ning.http.client.AsyncHttpClient
import scala.concurrent.Promise
import java.util.concurrent.Executor
object WebClient {
private val client = new AsyncHttpClient
case class BadStatus(status: Int) extends RuntimeException
def get(url: String)(implicit exec: Executor): Future[String] = {
val f = client.prepareGet(url).execute();
val p = Promise[String]()
f.addListener(new Runnable {
def run = {
val response = f.get
if (response.getStatusCode / 100 < 4)
p.success(response.getResponseBodyExcerpt(131072))
else p.failure(BadStatus(response.getStatusCode))
}
}, exec)
p.future
}
def shutdown(): Unit = client.close()
}
object WebClientTest extends App {
import scala.concurrent.ExecutionContext.Implicits.global
WebClient get "http://www.google.com/" map println foreach (_ => WebClient.shutdown())
}
And then deal with the future completion via a callback.
Credit for the code to the awesome reactive programming course at Coursera.
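The same Promise-bridging idea can be shown without any HTTP library, using the JDK's CompletableFuture as the Java-side future. This is a sketch for illustration only: JavaFutureBridge and toScala are hypothetical names, and CompletableFuture is swapped in for the ListenableFuture returned by AsyncHttpClient.

```scala
import java.util.concurrent.CompletableFuture
import java.util.function.BiConsumer
import scala.concurrent.{Future, Promise}

object JavaFutureBridge {
  // Completes a Scala Promise from the Java future's completion callback.
  def toScala[T](cf: CompletableFuture[T]): Future[T] = {
    val p = Promise[T]()
    cf.whenComplete(new BiConsumer[T, Throwable] {
      def accept(value: T, error: Throwable): Unit = {
        // Exactly one of value/error is meaningful, depending on the outcome.
        if (error != null) p.failure(error) else p.success(value)
        ()
      }
    })
    p.future
  }
}
```

No thread waits on the Java future here: the Promise is completed from the callback, and the resulting Scala Future can then be handled with onComplete as in the actor example.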