I'm using the pub-sub pattern in fs2. I dynamically create topics and subscribers while processing a stream of messages. For some reason, my subscribers receive only the initial message; messages published afterwards never reach them.
def startPublisher2[In](inputStream: Stream[F, Event]): Stream[F, Unit] = {
  inputStream.through(processingPipe)
}

val processingPipe: Pipe[F, Event, Unit] = { inputStream =>
  inputStream.flatMap {
    case message: Message[_] =>
      initSubscriber(message).flatMap { topic => Stream.eval(topic.publish1(message)) }
  }
}
def initSubscriber[In](message: Message[In]): Stream[F, Topic[F, Event]] = {
  Option(sessions.get(message.sessionId)) match {
    case None =>
      println(s"=== Create new topic for sessionId=${message.sessionId}")
      val topic = Topic[F, Event](message)
      sessions.put(message.sessionId, topic)
      Stream.eval(topic).flatMap { t =>
        //TODO: Is there a better solution?
        Stream.empty.interruptWhen(interrupter) concurrently startSubscribers2(t)
      }
    case Some(topic) =>
      println(s"=== Existing topic for sessionId=${message.sessionId}")
      Stream.eval(topic)
  }
}
The subscriber code is simple:
def startSubscribers2(topic: Topic[F, Event]): Stream[F, Unit] = {
  def processEvent(): Pipe[F, Event, Unit] =
    _.flatMap {
      case e @ Text(_) =>
        Stream.eval(F.delay(println(s"Subscriber processing event: $e")))
      case Message(content, sessionId) =>
        //Thread.sleep(2000)
        Stream.eval(F.delay(println(s"Subscriber #$sessionId got message: $content")))
      case Quit =>
        println("Quit")
        Stream.eval(interrupter.set(true))
    }

  topic.subscribe(10).through(processEvent())
}
The output is the following:
=== Create new topic for sessionId=11111111-1111-1111-1111-111111111111
Subscriber #11111111-1111-1111-1111-111111111111 got message: 1
=== Existing topic for sessionId=11111111-1111-1111-1111-111111111111
=== Create new topic for sessionId=22222222-2222-2222-2222-222222222222
Subscriber #22222222-2222-2222-2222-222222222222 got message: 1
=== Create new topic for sessionId=33333333-3333-3333-3333-333333333333
Subscriber #33333333-3333-3333-3333-333333333333 got message: 1
=== Existing topic for sessionId=22222222-2222-2222-2222-222222222222
=== Existing topic for sessionId=22222222-2222-2222-2222-222222222222
I don't see messages published to the existing topic.
Also, I'm wondering whether there is a better way to start an async stream of subscribers than Stream.empty.interruptWhen(interrupter) concurrently startSubscribers2(t).
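For example, would launching the subscriber on its own fiber be acceptable? A rough sketch of what I mean (it assumes F has a cats-effect Concurrent instance; startSubscribers2 and interrupter are the same as above), rather than attaching the subscriber to Stream.empty:

def startSubscriberInBackground(t: Topic[F, Event]): Stream[F, Topic[F, Event]] =
  Stream
    // run the subscriber on a background fiber, interruptible via `interrupter`
    .eval(F.start(startSubscribers2(t).interruptWhen(interrupter).compile.drain))
    .map(_ => t) // re-emit the topic so the caller can publish to it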
I have a ListenerActor which listens to messages from the backend and pushes them through a channel as SSE events.
I want to keep my actor alive so that I can stream continuously. How do I add keepAlive to my actor?
P.S.: I am not using Akka Streams or Akka HTTP.
def filter(inboxId: String): Enumeratee[SSEPublisher.ListenerEnvelope, SSEPublisher.ListenerEnvelope] =
  Enumeratee.filter[SSEPublisher.ListenerEnvelope] { envelope => envelope.inboxId == inboxId }

def convert: Enumeratee[SSEPublisher.ListenerEnvelope, String] =
  Enumeratee.map[SSEPublisher.ListenerEnvelope] { envelope =>
    Json.toJson(envelope).toString()
  }

def connDeathWatch(addr: String): Enumeratee[SSEPublisher.ListenerEnvelope, SSEPublisher.ListenerEnvelope] =
  Enumeratee.onIterateeDone { () =>
    println(addr + " - SSE disconnected")
  }
implicit def pair[E]: EventNameExtractor[E] = EventNameExtractor[E] { p =>
  val parsedJson = scala.util.parsing.json.JSON.parseFull(s"$p").get
  val topic = parsedJson.asInstanceOf[Map[String, String]].apply("topic")
  Some(topic)
}

implicit def id[E]: EventIdExtractor[E] = EventIdExtractor[E](p => Some(UUID.randomUUID().toString))
def events(inboxId: String) = InboxResource(inboxId)(AuthScope.Basic)(authUser => Action { implicit request =>
  Ok.feed(content = ncf.sseEnumerator
    &> filter(inboxId)
    &> convert
    &> EventSource()
  ).as("text/event-stream")
})
override def receive: Receive = {
  case Tick =>
    log.info(s"sending re-register tick to event-publisher")
    Topics.all.foreach { a: Topic =>
      log.info(s"$a")
      clusterClient ! ClusterClient.SendToAll(publisherPath, SSEPublisher.AddListener(a, self))
    }
  case ListenerEnvelope(topic, inboxId, itemId, sourceId, message) =>
    log.info(s"Received message from event publisher for topic $topic, for inbox $inboxId, msg: $message")
    channel.push(SSEPublisher.ListenerEnvelope(topic, inboxId, itemId, sourceId, message))
}
You can create a keepAlive protocol at the actor level and use the scheduler to send the keepAlive message to the actor.
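A minimal sketch of that idea (the KeepAlive message, the 30-second interval, and the heartbeat push are illustrative, not taken from your code):

import akka.actor.{Actor, ActorLogging, Cancellable}
import scala.concurrent.duration._

case object KeepAlive

class KeepAliveListener extends Actor with ActorLogging {
  import context.dispatcher

  private var keepAliveTask: Cancellable = _

  override def preStart(): Unit =
    // schedule a KeepAlive tick to ourselves every 30 seconds
    keepAliveTask = context.system.scheduler.schedule(30.seconds, 30.seconds, self, KeepAlive)

  override def postStop(): Unit = keepAliveTask.cancel()

  override def receive: Receive = {
    case KeepAlive =>
      // in your ListenerActor this is where you would push a heartbeat/comment
      // event through the SSE channel so the connection is not treated as idle
      log.debug("keep-alive tick")
    // ...existing Tick / ListenerEnvelope handling stays as it is...
  }
}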
def convert(t: SomeType): Enumeratee[SSEPublisher.ListenerEnvelope, String] = {
  // pattern match on type t
  ???
}
I am currently playing around with Akka Streams and tried the following example:
get the first element from Kafka when a certain HTTP endpoint is requested.
This is the code I wrote, and it's working.
get {
  path("ticket" / IntNumber) { ticketNr =>
    val future = Consumer.plainSource(consumerSettings, Subscriptions.topics("tickets"))
      .take(1)
      .completionTimeout(5.seconds)
      .runWith(Sink.head)

    onComplete(future) {
      case Success(record) => complete(HttpEntity(ContentTypes.`text/html(UTF-8)`, record.value()))
      case _               => complete(HttpResponse(StatusCodes.NotFound))
    }
  }
}
I am just wondering if this is the idiomatic way of working with (Akka) streams.
So is there a more "direct" way of connecting the Kafka stream to the HTTP response stream?
For example, when POSTing I do this:
val kafkaTicketsSink = Flow[String]
  .map(new ProducerRecord[Array[Byte], String]("tickets", _))
  .toMat(Producer.plainSink(producerSettings))(Keep.right)

post {
  path("ticket") {
    (entity(as[Ticket]) & extractMaterializer) { (ticket, mat) =>
      val f = Source.single(ticket).map(t => t.description).runWith(kafkaTicketsSink)(mat)
      onComplete(f) { _ =>
        val locationHeader = headers.Location(s"/ticket/${ticket.id}")
        complete(HttpResponse(StatusCodes.Created, headers = List(locationHeader)))
      }
    }
  }
}
Maybe this can also be improved?
You could keep a single, backpressured stream alive using Sink.queue, and pull an element from the materialized queue every time a request is received. pull() gives you back one element when available (or None once the stream completes) and backpressures the Kafka source in the meantime.
Something along the lines of:
val queue = Consumer.plainSource(consumerSettings, Subscriptions.topics("tickets"))
  .runWith(Sink.queue())

get {
  path("ticket" / IntNumber) { ticketNr =>
    val future: Future[Option[ConsumerRecord[String, String]]] = queue.pull()
    onComplete(future) {
      case Success(Some(record)) => complete(HttpEntity(ContentTypes.`text/html(UTF-8)`, record.value()))
      case _                     => complete(HttpResponse(StatusCodes.NotFound))
    }
  }
}
More info on Sink.queue can be found in the docs.
I started playing around with Scala and came across this particular boilerplate for a web socket chatroom in Scala.
It uses MergeHub.source() and BroadcastHub.sink() as the Source and Sink for sending the messages to all connected clients.
The example works fine for exchanging messages as it is.
private val (chatSink, chatSource) = {
  // Don't log MergeHub$ProducerFailed as error if the client disconnects.
  // recoverWithRetries -1 is essentially "recoverWith"
  val source = MergeHub.source[WSMessage]
    .log("source")
    .recoverWithRetries(-1, { case _: Exception ⇒ Source.empty })
  val sink = BroadcastHub.sink[WSMessage]
  source.toMat(sink)(Keep.both).run()
}

private val userFlow: Flow[WSMessage, WSMessage, _] = {
  Flow.fromSinkAndSource(chatSink, chatSource)
}
def chat(): WebSocket = {
  WebSocket.acceptOrResult[WSMessage, WSMessage] {
    case rh if sameOriginCheck(rh) =>
      Future.successful(userFlow).map { flow =>
        Right(flow)
      }.recover {
        case e: Exception =>
          val msg = "Cannot create websocket"
          logger.error(msg, e)
          val result = InternalServerError(msg)
          Left(result)
      }
    case rejected =>
      logger.error(s"Request ${rejected} failed same origin check")
      Future.successful {
        Left(Forbidden("forbidden"))
      }
  }
}
I want to store the messages that are exchanged in the chatroom in a DB.
I tried adding map and fold functions to the source and sink to get hold of the messages that are sent, but I wasn't able to.
I tried adding a Flow stage between MergeHub and BroadcastHub like below:
val flow = Flow[WSMessage].map(element => println(s"Message: $element"))
source.via(flow).toMat(sink)(Keep.both).run()
But it throws a compilation error saying that toMat cannot be referenced with such a signature.
Can someone help, or point me to how I can get hold of the messages that are sent and store them in a DB?
Link for full template:
https://github.com/playframework/play-scala-chatroom-example
Let's look at your flow:
val flow = Flow[WSMessage].map(element => println(s"Message: $element"))
It takes elements of type WSMessage and emits Unit (the result of println). Here it is again with the full type annotation:
val flow: Flow[WSMessage, Unit, NotUsed] = Flow[WSMessage].map(element => println(s"Message: $element"))
This will clearly not work as the sink expects WSMessage and not Unit.
Here's how you can fix the above problem:
val flow = Flow[WSMessage].map { element =>
  println(s"Message: $element")
  element
}
Note that for persisting messages in the database, you will most likely want to use an async stage, roughly:
val flow = Flow[WSMessage].mapAsync(parallelism) { element =>
  println(s"Message: $element")
  // assuming DB.write() returns a Future[Unit]
  DB.write(element).map(_ => element)
}
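With the flow emitting WSMessage again, it can sit between the two hubs exactly as in your original snippet. This is just your code with the flow inserted (an implicit Materializer is assumed in scope, and DB.write is still the assumed persistence call):

private val (chatSink, chatSource) = {
  val source = MergeHub.source[WSMessage]
    .log("source")
    .recoverWithRetries(-1, { case _: Exception => Source.empty })
  val sink = BroadcastHub.sink[WSMessage]
  // every chat message now passes through `flow`, which persists it before broadcasting
  source.via(flow).toMat(sink)(Keep.both).run()
}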
I am playing a bit with NATS Streaming and I have a problem with subscriber rate limiting. When I set max in flight to 1 and the ack wait timeout to 1 second, and I have a consumer which is basically a Thread.sleep(1000), then I get the same event multiple times. I thought that by limiting the in-flight count and using manual acks this would not happen. How can I get exactly-once delivery with very slow consumers?
case class EventBus[I, O](inputTopic: String, outputTopic: String, connection: Connection, eventProcessor: StatefulEventProcessor[I, O]) {
  // the event bus could be some abstract class while the `Connection` could be injected using DI
  val subscriptionOptions: SubscriptionOptions = new SubscriptionOptions.Builder()
    .setManualAcks(true)
    .setDurableName("foo")
    .setMaxInFlight(1)
    .setAckWait(1, TimeUnit.SECONDS)
    .build()

  if (!inputTopic.isEmpty) {
    connection.subscribe(inputTopic, new MessageHandler() {
      override def onMessage(m: Message) {
        m.ack()
        try {
          val event = eventProcessor.deserialize(m.getData)
          eventProcessor.onEvent(event)
        } catch {
          case any =>
            try {
              val command = new String(m.getData)
              eventProcessor.onCommand(command)
            } catch {
              case any => println(s"de-serialization error: $any")
            }
        } finally {
          println("got event")
        }
      }
    }, subscriptionOptions)
  }

  if (!outputTopic.isEmpty) {
    eventProcessor.setBus(e => {
      try {
        connection.publish(outputTopic, eventProcessor.serialize(e))
      } catch {
        case ex => println(s"serialization error $ex")
      }
    })
  }
}
abstract class StatefulEventProcessor[I, O] {
  private var bus: Option[O => Unit] = None

  def onEvent(event: I): Unit
  def onCommand(command: String): Unit

  def serialize(o: O): Array[Byte] =
    SerializationUtils.serialize(o.asInstanceOf[java.io.Serializable])

  def deserialize(in: Array[Byte]): I =
    SerializationUtils.deserialize[I](in)

  def setBus(push: O => Unit): Unit = {
    if (bus.isDefined) {
      throw new IllegalStateException("bus already set")
    } else {
      bus = Some(push)
    }
  }

  def push(event: O) =
    bus.get.apply(event)
}
EventBus("out-1", "out-2", sc, new StatefulEventProcessor[String, String] {
override def onEvent(event: String): Unit = {
Thread.sleep(1000)
push("!!!" + event)
}
override def onCommand(command: String): Unit = {}
})
(0 until 100).foreach(i => sc.publish("out-1", SerializationUtils.serialize(s"test-$i")))
First, there is no exactly-once (re)delivery guarantee with NATS Streaming. What MaxInflight gives you is the assurance that the server will not send new messages to the subscriber until the number of unacknowledged messages is below that number. So with MaxInflight(1), you are asking the server to send the next new message only after receiving the ack for the previously delivered message. However, this does not block redelivery of unacknowledged messages.
The server has no guarantee or knowledge that a message was actually received by a subscriber. This is what the ACK is for: to let the server know that the message was properly processed by the subscriber. If the server did not honor redelivery (even when MaxInflight is reached), then a "lost" message would stall your subscription forever. Keep in mind that the NATS Streaming server and clients are not directly connected to each other with a TCP connection (they are both connected to a NATS server, aka gnatsd).
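What you can do is make redelivery much less likely for a slow consumer: ack only after the work is done, and set AckWait well above your worst-case processing time. A sketch using the same subscription API as your code (the 30-second value is just illustrative, and this still does not make delivery exactly-once):

val subscriptionOptions: SubscriptionOptions = new SubscriptionOptions.Builder()
  .setManualAcks(true)
  .setDurableName("foo")
  .setMaxInFlight(1)
  .setAckWait(30, TimeUnit.SECONDS) // comfortably larger than the slowest handler
  .build()

connection.subscribe(inputTopic, new MessageHandler() {
  override def onMessage(m: Message): Unit = {
    val event = eventProcessor.deserialize(m.getData)
    eventProcessor.onEvent(event) // do the slow work first
    m.ack()                       // ack only after successful processing
  }
}, subscriptionOptions)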
I can't find a lifecycle description for the high-level consumer. I'm on 0.8.2.2 and can't use the "modern" consumer from kafka-clients. Here is my code:
def consume(numberOfEvents: Int, await: Duration = 100.millis): List[MessageEnvelope] = {
  val consumerProperties = new Properties()
  consumerProperties.put("zookeeper.connect", kafkaConfig.zooKeeperConnectString)
  consumerProperties.put("group.id", consumerGroup)
  consumerProperties.put("auto.offset.reset", "smallest")

  val consumer = Consumer.create(new ConsumerConfig(consumerProperties))
  try {
    val messageStreams = consumer.createMessageStreams(
      Predef.Map(kafkaConfig.topic -> 1),
      new DefaultDecoder,
      new MessageEnvelopeDecoder)

    val receiveMessageFuture = Future[List[MessageEnvelope]] {
      messageStreams(kafkaConfig.topic)
        .flatMap(stream => stream.take(numberOfEvents).map(_.message()))
    }
    Await.result(receiveMessageFuture, await)
  } finally {
    consumer.shutdown()
  }
}
It's not clear to me: should I shut down the consumer after each message retrieval, or can I keep the instance and reuse it for message fetching? I suppose reusing the instance is the right way, but I can't find any articles / best practices.
I'm trying to reuse the consumer and/or messageStreams. It doesn't work well for me and I can't find the reason.
If I try to reuse messageStreams, I get this exception:
2017-04-17_19:57:57.088 ERROR MessageEnvelopeConsumer - Error while awaiting for messages java.lang.IllegalStateException: Iterator is in failed state
java.lang.IllegalStateException: Iterator is in failed state
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:54)
at scala.collection.IterableLike$class.take(IterableLike.scala:134)
at kafka.consumer.KafkaStream.take(KafkaStream.scala:25)
Happens here:
def consume(numberOfEvents: Int, await: Duration = 100.millis): List[MessageEnvelope] = {
  try {
    val receiveMessageFuture = Future[List[MessageEnvelope]] {
      messageStreams(kafkaConfig.topic)
        .flatMap(stream => stream.take(numberOfEvents).map(_.message()))
    }
    Try(Await.result(receiveMessageFuture, await)) match {
      case Success(result) => result
      case Failure(_: TimeoutException) => List.empty
      case Failure(e) =>
        // ===> never got any message from the topic
        logger.error(s"Error while awaiting for messages ${e.getClass.getName}: ${e.getMessage}", e)
        List.empty
    }
  } catch {
    case e: Exception =>
      logger.warn(s"Error while consuming messages", e)
      List.empty
  }
}
I also tried creating messageStreams each time. No luck:
2017-04-17_20:02:44.236 WARN MessageEnvelopeConsumer - Error while consuming messages
kafka.common.MessageStreamsExistException: ZookeeperConsumerConnector can create message streams at most once
at kafka.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:151)
at MessageEnvelopeConsumer.consume(MessageEnvelopeConsumer.scala:47)
Happens here:
def consume(numberOfEvents: Int, await: Duration = 100.millis): List[MessageEnvelope] = {
  try {
    val messageStreams = consumer.createMessageStreams(
      Predef.Map(kafkaConfig.topic -> 1),
      new DefaultDecoder,
      new MessageEnvelopeDecoder)

    val receiveMessageFuture = Future[List[MessageEnvelope]] {
      messageStreams(kafkaConfig.topic)
        .flatMap(stream => stream.take(numberOfEvents).map(_.message()))
    }
    Try(Await.result(receiveMessageFuture, await)) match {
      case Success(result) => result
      case Failure(_: TimeoutException) => List.empty
      case Failure(e) =>
        logger.error(s"Error while awaiting for messages ${e.getClass.getName}: ${e.getMessage}", e)
        List.empty
    }
  } catch {
    case e: Exception =>
      // ===> now the exception is raised here
      logger.warn(s"Error while consuming messages", e)
      List.empty
  }
}
UPD
I used an iterator-based approach. It looks like this:
// consumerProperties.put("consumer.timeout.ms", "100")
private lazy val consumer: ConsumerConnector = Consumer.create(new ConsumerConfig(consumerProperties))

private lazy val messageStreams: Seq[KafkaStream[Array[Byte], MessageEnvelope]] =
  consumer.createMessageStreamsByFilter(Whitelist(kafkaConfig.topic), 1, new DefaultDecoder, new MessageEnvelopeDecoder)

private lazy val iterator: ConsumerIterator[Array[Byte], MessageEnvelope] = {
  val stream = messageStreams.head
  stream.iterator()
}
def consume(): List[MessageEnvelope] = {
  try {
    if (iterator.hasNext) {
      val fromKafka: MessageAndMetadata[Array[Byte], MessageEnvelope] = iterator.next
      List(fromKafka.message())
    } else {
      List.empty
    }
  } catch {
    case _: ConsumerTimeoutException =>
      List.empty
    case e: Exception =>
      logger.warn(s"Error while consuming messages", e)
      List.empty
  }
}
Now I'm trying to figure out if it automatically commits offsets to ZK...
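As far as I can tell, the old high-level consumer auto-commits offsets to ZooKeeper by default; the relevant properties seem to be these (defaults shown as I understand them, so treat the values as illustrative):

// auto-commit is enabled by default in the 0.8.x high-level consumer
consumerProperties.put("auto.commit.enable", "true")
// how often offsets are written to ZooKeeper (default 60000 ms, if I read the docs right)
consumerProperties.put("auto.commit.interval.ms", "60000")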
Constant shutdown causes unnecessary consumer group rebalances, which affect performance a lot. See this article for best practices: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
My answer is in the latest question update: the iterator approach works for me as expected.