I'm a newbie to akka streams. I'm using kafka as a source(using ReactiveKafka library) and do some processing of data through the flow and using a subscriber(EsHandler) as sink.
Now I need to handle errors and push it to different kafka queue through an error handler. I'm trying to use EsHandler both as publisher and subscriber. I'm not sure how to include EsHandler as an middle man instead of sink.
This is my code:
val publisher = Kafka.kafka.consume(topic, "es", new StringDecoder())
val flow = Flow[String].map { elem => JsonConverter.convert(elem.toString()) }
val sink = Sink.actorSubscriber[GenModel](Props(classOf[EsHandler]))
Source(publisher).via(flow).to(sink).run()
class EsHandler extends ActorSubscriber with ActorPublisher[Model] {
val requestStrategy = WatermarkRequestStrategy(100)
def receive = {
case OnNext(msg: Model) =>
context.actorOf(Props(classOf[EsStorage], self)) ! msg
case OnError(err: Exception) =>
context.stop(self)
case OnComplete =>
context.stop(self)
case Response(msg) =>
if (msg.isError()) onNext(msg.getContent())
}
}
class ErrorHandler extends ActorSubscriber {
val requestStrategy = WatermarkRequestStrategy(100)
def receive = {
case OnNext(msg: Model) =>
println(msg)
}
}
We highly recommend against implementing your own Processor (which is the name the reactive streams spec gives to "Subscriber && Publisher". It is pretty hard to get it right, which is why there's not Publisher exposed directly as helper trait.
Instead, most of the time you'll want to use Sources/Sinks (or Publishers/Subscribers) provided to you and run your operations between those, as map/filter etc. steps.
In fact, there is an existing implementation for Kafka Sources and Sinks you can use, it's called reactive-kafka and is verified by the Reactive Streams TCK, so you can trust it to be valid implementations.
Related
I have a backend app writed in Scala Play. Because I have a realtime implementation using Akka Actors with data stored in a Redis server, I want as my each backend instance (deployed on centos servers) to be a Publisher and in same time a Subscriber to Redis service. Why this? Because a 3rd party app will send requests to my backend to update the data from Redis, and I want that all actors from all instances to push data to clients (frontend) indifferent on which backend instance is redirected this request (a load balancer is used there).
So, when instance1 will publish on Redis, I want that all subscribers(instance2, instance3, even instance1 because I said each instance must be pub/sub) to push data to clients.
I created an object with a Publisher and a Subscriber client and I was expecting that these will have a singleton behavior. But, for an unknown reason, over the night I see that my instances are unsubscribed from the Redis server without a message. I think this, because in the next day, my Redis service have 0 subscribers. I don't know if I have a bad implementation there or just Redis kill the connections after some time.
RedisPubSubServer.scala (In facts, here are just 2 Akka Actors which take RedisClient as params)
class Subscriber(client: RedisClient) extends Actor {
var callback: PubSubMessage => Any = { m => }
implicit val timeout = Timeout(2 seconds)
override def receive: Receive = {
case Subscribe(channel) => client.subscribe(channel)(callback)
case Register(cb) => callback = cb; self ? true
case Unsubscribe(channel) => client.unsubscribe(channel); self ? true
}
}
class Publisher(client: RedisClient) extends Actor {
implicit val timeout = Timeout(2 seconds)
override def receive: Receive = {
case Publish(channel, msg) => client.publish(channel, msg); self ? true
}
}
RedisPubSubClient.scala (here I create the Publisher and Subscriber as singleton)
object Pub {
println("starting publishing service...")
val config = ConfigFactory.load.getObject("redis").toConfig
val client = new RedisClient(config.getString("master"), config.getInt("port"))
val system = ActorSystem("RedisPublisher")
val publisher = system.actorOf(Props(new Publisher(client)))
def publish(channel: String, message: String) =
publisher ! Publish(channel, message)
}
object Sub {
val client = new RedisClient(config.getString("master"), config.getInt("port"))
val system = ActorSystem("RedisSubscriber")
val subscriber = system.actorOf(Props(new Subscriber(client)))
println("SUB Registering...")
subscriber ! Register(callback)
def sub(channel: String) = subscriber ! Subscribe(channel)
def unsub(channel: String) = subscriber ! Unsubscribe(channel)
def callback(msg: PubSubMessage) = {
msg match {
case S(channel, no) => println(s"subscribed to $channel and count $no")
case U(channel, no) => println(s"unsubscribed from $channel and count $no")
case M(channel, msg) => msg match {
case "exit" => client.unsubscribe()
case jsonString => // do the job
}
case E(e) => println(s"ERR = ${e.getMessage}")
}
}
}
and the RedisService
object RedisService {
val system = ActorSystem("RedisServiceSubscriber")
val subscriber = system.actorOf(Props(new Subscriber(client)))
subscriber ! Register(callback)
subscriber ! Subscribe("channelName")
// So, here I'm expecting that subscriber to have a life-cycle as the backend instance
}
from an api endpoint, I push data calling Pub publish method as:
def reloadData(request: AnyType) {
Pub.publish("channelName", requestAsString)
}
Can be possible as Publisher/Subscriber Actors to be killed after a while and due of that to throw in some errors for redis clients Pub/Sub?
For Publisher, I must say that I'm thinking to create the client each time when the api call is made, but for the Subscriber, I can not use another way that a singleton object which will listen the Redis entire life of the backend.
thanks
edit: used library:
"net.debasishg" %% "redisclient" % "3.41"
After some researches, I found another scala redis lib which seems to do exactly what I need in an easier maner
"com.github.etaty" %% "rediscala" % "1.9.0"
I am using a third party library to provide parsing services (user agent parsing in my case) which is not a thread safe library and has to operate on a single threaded basis. I would like to write a thread safe API that can be called by multiple threads to interact with it via Futures API as the library might introduce some potential blocking (IO). I would also like to provide back pressure when necessary and return a failed future when the parser doesn't catch up with the producers.
It could actually be a generic requirement/question, how to interact with any client/library which is not thread safe (user agents/geo locations parsers, db clients like redis, loggers collectors like fluentd), with back pressure in a concurrent environments.
I came up with the following formula:
encapsulate the parser within a dedicated Actor.
create an akka stream source queue that receives ParseReuqest that contains the user agent and a Promise to complete, and using the ask pattern via mapAsync to interact with the parser actor.
create another actor to encapsulate the source queue.
Is this the way to go? Is there any other way to achieve this, maybe simpler ? maybe using graph stage? can it be done without the ask pattern and less code involved?
the actor mentioned in number 3, is because I'm not sure if the source queue is thread safe or not ?? I wish it was simply stated in the docs, but it doesn't. there are multiple versions over the web, some stating it's not and some stating it is.
Is the source queue, once materialized, is thread safe to push elements from different threads?
(the code may not compile and is prone to potential failures, and is only intended for this question in place)
class UserAgentRepo(dbFilePath: String)(implicit actorRefFactory: ActorRefFactory) {
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
implicit val askTimeout = Timeout(5 seconds)
// API to parser - delegates the request to the back pressure actor
def parse(userAgent: String): Future[Option[UserAgentData]] = {
val p = Promise[Option[UserAgentData]]
parserBackPressureProvider ! UserAgentParseRequest(userAgent, p)
p.future
}
// Actor to provide back pressure that delegates requests to parser actor
private class ParserBackPressureProvider extends Actor {
private val parser = context.actorOf(Props[UserAgentParserActor])
val queue = Source.queue[UserAgentParseRequest](100, OverflowStrategy.dropNew)
.mapAsync(1)(request => (parser ? request.userAgent).mapTo[Option[UserAgentData]].map(_ -> request.p))
.to(Sink.foreach({
case (result, promise) => promise.success(result)
}))
.run()
override def receive: Receive = {
case request: UserAgentParseRequest => queue.offer(request).map {
case QueueOfferResult.Enqueued =>
case _ => request.p.failure(new RuntimeException("parser busy"))
}
}
}
// Actor parser
private class UserAgentParserActor extends Actor {
private val up = new UserAgentParser(dbFilePath, true, 50000)
override def receive: Receive = {
case userAgent: String =>
sender ! Try {
up.parseUa(userAgent)
}.toOption.map(UserAgentData(userAgent, _))
}
}
private case class UserAgentParseRequest(userAgent: String, p: Promise[Option[UserAgentData]])
private val parserBackPressureProvider = actorRefFactory.actorOf(Props[ParserBackPressureProvider])
}
Do you have to use actors for this?
It does not seem like you need all this complexity, scala/java hasd all the tools you need "out of the box":
class ParserFacade(parser: UserAgentParser, val capacity: Int = 100) {
private implicit val ec = ExecutionContext
.fromExecutor(
new ThreadPoolExecutor(
1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(capacity)
)
)
def parse(ua: String): Future[Option[UserAgentData]] = try {
Future(Some(UserAgentData(ua, parser.parseUa(ua)))
.recover { _ => None }
} catch {
case _: RejectedExecutionException =>
Future.failed(new RuntimeException("parser is busy"))
}
}
Let's suppose I have a master with a n slaves, where the slaves are located on other computers.
Master actor looks like
class MasterActor extends Actor {
val router: ActorRef = // ... initialized router to contact the slaves
override def receive: Receive = {
case work: DoWork => router ! work
}
}
While my slaves look like
class SlaveActor extends Actor {
override def receive: Receive = {
case work: DoWork => // Some logic that takes a couple of seconds
}
}
In the case where a machine where some slaves are hosted crashes, or if the app has been stopped while processing some work, I expect a fallback mechanism that enable the system (I suppose the master) to be aware of that failure and then redistribute the lost work to the other slaves.
I have understood the principle of supervision in Akka, that enables master to get notified when a child actor is unreachable, but how can I get back the specific instance of work that the actor had to redistribute it ?
As I started using Akka very recently, my approach is perhaps not adapted to the best practices and should I solve this situation in another way ?
Thanks !
Now this approach won't work in every scenario but how about storing the child actor work to a file system. You can create provide an id to each child and save the work in a file based on key -> value pair, where child actor id is the key and value is the work in progress.
Now, what data has to be saved, that you have to decide but this is one of the simplest approach that you can give a try.
Hope this helps!
It sounds like what you want is Akka Persistence. The main model for this is event sourcing
A rough outline would be to persist one event when receiving a DoWork and persist a second event when finishing the work. On recovery, the state is then a list of work not finished (which will, if you get the protocol right be at most one piece of work), along these lines:
case class DidWork(dw: DoWork)
class SlaveActor extends PersistentActor {
val persistenceId: String = ???
var workToDo: List[DoWork] = Nil
val receiveRecover: Receive = {
case dw: DoWork => workToDo = dw :: workToDo
case DidWork(dw: DoWork) => workToDo = workToDo.filter(_ != dw)
case SnapshotOffer(_, snapshot: List[DoWork]) => workToDo = snapshot
}
val snapshotInterval = ???
val receiveCommand: Receive = {
case dw: DoWork =>
persist(dw) { event =>
workToDo = dw :: workToDo
context.system.eventStream.publish(event)
if (lastSequenceNr % snapshotInterval == 0 && lastSequenceNr != 0)
saveSnapshot(workToDo)
doWork
}
}
private[this] def doWork: Unit = {
workToDo.reverse.foreach { dw: DoWork =>
// do the work
persist(DidWork(dw)) { event =>
workToDo = workToDo.tail
context.system.eventStream.publish(event)
if (lastSequenceNr % snapshotInterval == 0 && lastSequenceNr != 0)
saveSnapshot(workToDo)
}
}
}
}
If you want to make sure that the master knows about workers going away you can watch them (see docs https://doc.akka.io/docs/akka/current/actors.html#lifecycle-monitoring-aka-deathwatch ). If you keep track of what workload you have assigned to what worker, that is yet not completed you can implement resending. However you must likely consider the case where the master or it's actor system goes away as well, and how to recover if that happens.
The Akka distributed workers sample covers exactly this so it could be a good source of inspiration: https://developer.lightbend.com/guides/akka-distributed-workers-scala/
I'm trying to write an Actor which connects to an Amazon Kinesis stream and then relays any messages received via Comet to a Web UI. I'm using Source.actorPublisher for this and using the json method with Comet in Play described here. I got the events working just fine using Source.tick(), but when I tried using an ActorPublisher, the Actor never seems to be sent any Request messages as expected. How are requests for data usually sent down an Akka flow? I'm using v2.5 of the Play Framework.
My controller code:
def subDeviceSeen(id: Id): Action[AnyContent] = Action {
val deviceSeenSource: Source[DeviceSeenMessage, ActorRef] = Source.actorPublisher(DeviceSeenEventActor.props)
Ok.chunked(deviceSeenSource
.filter(m => m.id == id)
.map(Json.toJson(_))
via Comet.json("parent.deviceSeen")).as(ContentTypes.JSON)
}
Am I doing anything obviously wrong in the above? Here is my Actor code:
object DeviceSeenEventActor {
def props: Props = Props[DeviceSeenEventActor]
}
class DeviceSeenEventActor extends ActorPublisher[DeviceSeenMessage] {
implicit val mat = ActorMaterializer()(context)
val log = Logging(context.system, this)
def receive: Receive = {
case Request => log.debug("Received request message")
initKinesis()
context.become(run)
case Cancel => context.stop(self)
}
def run: Receive = {
case vsm:DeviceSeenMessage => onNext(vsm)
log.debug("Received request message")
onCompleteThenStop() //we are currently only interested in one message
case _:Any => log.warning("Unknown message received by event Actor")
}
private def initKinesis() = {
//init kinesis, a worker is created and given a reference to this Actor.
//The reference is used to send messages to the actor.
}
}
The 'Received request message' log line is never displayed. Am I missing some implicit? There are no warnings or anything else obvious displayed in the play console.
The issue was that I was pattern matching on case Request => ... instead of case Request() => .... Since I didn't have a default case in my receive() method, the message was simply dropped by the Actor.
I am trying to continuously read the wikipedia IRC channel using this lib: https://github.com/implydata/wikiticker
I created a custom Akka Publisher, which will be used in my system as a Source.
Here are some of my classes:
class IrcPublisher() extends ActorPublisher[String] {
import scala.collection._
var queue: mutable.Queue[String] = mutable.Queue()
override def receive: Actor.Receive = {
case Publish(s) =>
println(s"->MSG, isActive = $isActive, totalDemand = $totalDemand")
queue.enqueue(s)
publishIfNeeded()
case Request(cnt) =>
println("Request: " + cnt)
publishIfNeeded()
case Cancel =>
println("Cancel")
context.stop(self)
case _ =>
println("Hm...")
}
def publishIfNeeded(): Unit = {
while (queue.nonEmpty && isActive && totalDemand > 0) {
println("onNext")
onNext(queue.dequeue())
}
}
}
object IrcPublisher {
case class Publish(data: String)
}
I am creating all this objects like so:
def createSource(wikipedias: Seq[String]) {
val dataPublisherRef = system.actorOf(Props[IrcPublisher])
val dataPublisher = ActorPublisher[String](dataPublisherRef)
val listener = new MessageListener {
override def process(message: Message) = {
dataPublisherRef ! Publish(Jackson.generate(message.toMap))
}
}
val ticker = new IrcTicker(
"irc.wikimedia.org",
"imply",
wikipedias map (x => s"#$x.wikipedia"),
Seq(listener)
)
ticker.start() // if I comment this...
Thread.currentThread().join() //... and this I get Request(...)
Source.fromPublisher(dataPublisher)
}
So the problem I am facing is this Source object. Although this implementation works well with other sources (for example from local file), the ActorPublisher don't receive Request() messages.
If I comment the two marked lines I can see, that my actor has received the Request(count) message from my flow. Otherwise all messages will be pushed into the queue, but not in my flow (so I can see the MSG messages printed).
I think it's something with multithreading/synchronization here.
I am not familiar enough with wikiticker to solve your problem as given. One question I would have is: why is it necessary to join to the current thread?
However, I think you have overcomplicated the usage of Source. It would be easier for you to work with the stream as a whole rather than create a custom ActorPublisher.
You can use Source.actorRef to materialize a stream into an ActorRef and work with that ActorRef. This allows you to utilize akka code to do the enqueing/dequeing onto the buffer while you can focus on the "business logic".
Say, for example, your entire stream is only to filter lines above a certain length and print them to the console. This could be accomplished with:
def dispatchIRCMessages(actorRef : ActorRef) = {
val ticker =
new IrcTicker("irc.wikimedia.org",
"imply",
wikipedias map (x => s"#$x.wikipedia"),
Seq(new MessageListener {
override def process(message: Message) =
actorRef ! Publish(Jackson.generate(message.toMap))
}))
ticker.start()
Thread.currentThread().join()
}
//these variables control the buffer behavior
val bufferSize = 1024
val overFlowStrategy = akka.stream.OverflowStrategy.dropHead
val minMessageSize = 32
//no need for a custom Publisher/Queue
val streamRef =
Source.actorRef[String](bufferSize, overFlowStrategy)
.via(Flow[String].filter(_.size > minMessageSize))
.to(Sink.foreach[String](println))
.run()
dispatchIRCMessages(streamRef)
The dispatchIRCMessages has the added benefit that it will work with any ActorRef so you aren't required to only work with streams/publishers.
Hopefully this solves your underlying problem...
I think the main problem is Thread.currentThread().join(). This line will 'hang' current thread because this thread is waiting for himself to die. Please read https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html#join-long- .