Akka streams Source.actorRef vs Source.queue vs buffer, which one to use? - scala

I am using akka-streams-kafka to created a stream consumer from a kafka topic.
Using broadcast to serve events from kafka topic to web socket clients.
I have found following three approaches to create a stream Source.
Question:
My goal is to serve hundreds/thousands of websocket clients (some of which might be slow consumers). Which approach scales better?
Appreciate any thoughts?
Broadcast lowers the rate down to slowest consumer.
BUFFER_SIZE = 100000
Source.ActorRef (source actor does not support backpressure option)
val kafkaSourceActorWithBroadcast = {
val (sourceActorRef, kafkaSource) = Source.actorRef[String](BUFFER_SIZE, OverflowStrategy.fail)
.toMat(BroadcastHub.sink(bufferSize = 256))(Keep.both).run
Consumer.plainSource(consumerSettings,
Subscriptions.topics(KAFKA_TOPIC))
.runForeach(record => sourceActorRef ! Util.toJson(record.value()))
kafkaSource
}
Source.queue
val kafkaSourceQueueWithBroadcast = {
val (futureQueue, kafkaQueueSource) = Source.queue[String](BUFFER_SIZE, OverflowStrategy.backpressure)
.toMat(BroadcastHub.sink(bufferSize = 256))(Keep.both).run
Consumer.plainSource(consumerSettings, Subscriptions.topics(KAFKA_TOPIC))
.runForeach(record => futureQueue.offer(Util.toJson(record.value())))
kafkaQueueSource
}
buffer
val kafkaSourceWithBuffer = Consumer.plainSource(consumerSettings, Subscriptions.topics(KAFKA_TOPIC))
.map(record => Util.toJson(record.value()))
.buffer(BUFFER_SIZE, OverflowStrategy.backpressure)
.toMat(BroadcastHub.sink(bufferSize = 256))(Keep.right).run
Websocket route code for completeness:
val streamRoute =
path("stream") {
handleWebSocketMessages(websocketFlow)
}
def websocketFlow(where: String): Flow[Message, Message, NotUsed] = {
Flow[Message]
.collect {
case TextMessage.Strict(msg) => Future.successful(msg)
case TextMessage.Streamed(stream) =>
stream.runFold("")(_ + _).flatMap(msg => Future.successful(msg))
}
.mapAsync(parallelism = PARALLELISM)(identity)
.via(logicStreamFlow)
.map { msg: String => TextMessage.Strict(msg) }
}
private def logicStreamFlow: Flow[String, String, NotUsed] =
Flow.fromSinkAndSource(Sink.ignore, kafkaSourceActorWithBroadcast)

Related

Akka committableOffset store in DB

I am beginner in akka and I have an problem statement to work with. I have an akka flow that's reading Kafka Events from some topic and doing some transformation before creating Commitable offset of the message.
I am not sure the best way to add a akka sink on top of this code to store the transformed events in some DB
def eventTransform : Flow[KafkaMessage,CommittableRecord[Either[Throwable,SomeEvent]],NotUsed]
def processEvents
: Flow[KafkaMessage, ConsumerMessage.CommittableOffset, NotUsed] =
Flow[KafkaMessage]
.via(eventTransform)
.filter({ x =>
x.value match {
case Right(event: SomeEvent) =>
event.status != "running"
case Left(_) => false
}
})
.map(_.message.committableOffset)
This is my akka source calling the akka flow
private val consumerSettings: ConsumerSettings[String, String] = ConsumerSettings(
system,
new StringDeserializer,
new StringDeserializer,
)
.withGroupId(groupId)
.withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
private val committerSettings: CommitterSettings = CommitterSettings(system)
private val control = new AtomicReference[Consumer.Control](Consumer.NoopControl)
private val restartableSource = RestartSource
.withBackoff(restartSettings) { () =>
Consumer
.committableSource(consumerSettings, Subscriptions.topics(topicIn))
.mapMaterializedValue(control.set)
.via(processEvents) // calling the flow here
}
restartableSource
.toMat(Committer.sink(committerSettings))(Keep.both)
.run()
def api(): Behavior[Message] =
Behaviors.receive[Message] { (_, message) =>
message match {
case Stop =>
context.pipeToSelf(control.get().shutdown())(_ => Stopped)
Behaviors.same
case Stopped =>
Behaviors.stopped
}
}
.receiveSignal {
case (_, _ #(PostStop | PreRestart)) =>
control.get().shutdown()
Behaviors.same
}
}

Akka-http first websocket client only receives the data once from a kafka topic

I am using akka-http websocket to push messages from a kafka topic to websocket clients.
For this purpose, i created a plain kafka consumer (using akka-streams-kafka connector) with offset set to "earliest" so that every new websocket client connecting gets all the data from the beginning.
The problem is that the first connected websocket client gets all the data and other ws clients (connecting after the first client has got all the data) do not get any. The kafka topic has 1million records.
I am using the BroadcastHub from Akka-streams.
Appreciate any suggestions.
lazy private val kafkaPlainSource: Source[String, NotUsed] = {
val consumerSettings = ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
.withBootstrapServers(KAFKA_BROKERS)
.withGroupId(UUID.randomUUID().toString)
.withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
val kafkaSource = Consumer.plainSource(consumerSettings, Subscriptions.topics(KAFKA_TOPIC))
.mapAsync(PARALLELISM) { cr =>
Future {
cr.value
}
}
kafkaSource.toMat(BroadcastHub.sink)(Keep.right).run
}
def logicFlow: Flow[String, String, NotUsed] =
Flow.fromSinkAndSourceCoupled(Sink.ignore, kafkaSource)
val websocketFlow: Flow[Message, Message, Any] = {
Flow[Message]
.map {
case TextMessage.Strict(msg) => msg
case _ => println("ignore streamed message")
}
.via(logicFlow)
.map { msg: String => TextMessage.Strict(msg) }
}
lazy private val streamRoute =
path("stream") {
handleWebSocketMessages {
websocketFlow
.watchTermination() { (_, done) =>
done.onComplete {
case Success(_) =>
log.info("Stream route completed successfully")
case Failure(ex) =>
log.error(s"Stream route completed with failure : $ex")
}
}
}
}
def startServer(): Unit = {
bindingFuture = Http().bindAndHandle(wsRoutes, HOST, PORT)
log.info(s"Server online at http://localhost:9000/")
}
def stopServer(): Unit = {
bindingFuture
.flatMap(_.unbind())
.onComplete{
_ => system.terminate()
log.info("terminated")
}
}

Kafka producer hangs on send

The logic is that a streaming job, getting data from a custom source has to write both to Kafka as well as HDFS.
I wrote a (very) basic Kafka producer to do this, however the whole streaming job hangs on the send method.
class KafkaProducer(val kafkaBootstrapServers: String, val kafkaTopic: String, val sslCertificatePath: String, val sslCertificatePassword: String) {
val kafkaProps: Properties = new Properties()
kafkaProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBootstrapServers)
kafkaProps.put("acks", "1")
kafkaProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
kafkaProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
kafkaProps.put("ssl.truststore.location", sslCertificatePath)
kafkaProps.put("ssl.truststore.password", sslCertificatePassword)
val kafkaProducer: KafkaProducer[Long, Array[String]] = new KafkaProducer(kafkaProps)
def sendKafkaMessage(message: Message): Unit = {
message.data.foreach(list => {
val producerRecord: ProducerRecord[Long, Array[String]] = new ProducerRecord[Long, Array[String]](kafkaTopic, message.timeStamp.getTime, list.toArray)
kafkaProducer.send(producerRecord)
})
}
}
And the code calling the producer:
receiverStream.foreachRDD(rdd => {
val messageRowRDD: RDD[Row] = rdd.mapPartitions(partition => {
val parser: Parser = new Parser
val kafkaProducer: KafkaProducer = new KafkaProducer(kafkaBootstrapServers, kafkaTopic, kafkaSslCertificatePath, kafkaSslCertificatePass)
val newPartition = partition.map(message => {
Logger.getLogger("importer").error("Writing Message to Kafka...")
kafkaProducer.sendKafkaMessage(message)
Logger.getLogger("importer").error("Finished writing Message to Kafka")
Message.data.map(singleMessage => parser.parseMessage(Message.timeStamp.getTime, singleMessage))
})
newPartition.flatten
})
val df = sqlContext.createDataFrame(messageRowRDD, Schema.messageSchema)
Logger.getLogger("importer").info("Entries-count: " + df.count())
val row = Try(df.first)
row match {
case Success(s) => Persister.writeDataframeToDisk(df, outputFolder)
case Failure(e) => Logger.getLogger("importer").warn("Resulting DataFrame is empty. Nothing can be written")
}
})
From the logs I can tell that each executor is reaching the "sending to kafka" point, however not any further. All executors hang on that and no exception is thrown.
The Message class is a very simple case class with 2 fields, a timestamp and an array of strings.
This was due to the acks field in Kafka.
Acks was set to 1 and sends went ahead a lot faster.

Akka - Measure time of consumer

I'm developing a system that pulls messages from a JMS(the consumer) and push it to a Kafka Topic(the producer).
Since my consumer stays alive waiting for new messages arriving in the JMS queue and push it to Kafka, how can I effectively measure how many messages I can pull by second?
Here is my code:
My Consumer:
class ActiveMqConsumerActor extends Consumer {
var startTime: Long = _
val log = Logging(context.system, this)
val producerActor = context.actorOf(Props[KafkaProducerActor])
override def autoAck = false
override def endpointUri: String = "activemq:KafkaTest"
override def receive: Receive = LoggingReceive {
case msg: CamelMessage =>
val camelMsg = msg.bodyAs[String]
producerActor ! Message(camelMsg.getBytes)
sender() ! Ack
case ex: Exception => sender() ! Failure(ex)
case _ =>
log.error("Got a message that I don't understand")
sender() ! Failure(new Exception("Got a message that I don't understand"))
}
}
The main:
object ActiveMqConsumerTest extends App {
val system = ActorSystem("KafkaSystem")
val camel = CamelExtension(system)
val camelContext = camel.context
camelContext.addComponent("activemq", ActiveMQComponent.activeMQComponent("tcp://0.0.0.0:61616"))
val consumer = system.actorOf(Props[ActiveMqConsumerActor].withRouter(FromConfig), "consumer")
val producer = system.actorOf(Props[KafkaProducerActor].withRouter(FromConfig), "producer")
}
Thanks
You can try using something like "Metrics". https://dropwizard.github.io/metrics/3.1.0/manual/ You can define precise metrics, including time and use that inside of your actor.

request-reply with akka-camel and ActiveMQ

Update: It would seem that an even simpler test case is not working: just trying to send a message from an ActiveMQ producer to an ActiveMQ consumer via the in-process broker. Here is the code:
val brokerURL = "vm://localhost?broker.persistent=false"
val connectionFactory = new ActiveMQConnectionFactory(brokerURL)
val connection = connectionFactory.createConnection()
val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
val queue = session.createQueue("foo.bar")
val producer = session.createProducer(queue)
val consumer = session.createConsumer(queue)
val message = session.createTextMessage("marco")
producer.send(message)
val resp = consumer.receive(2000)
assert(resp != null)
I'm trying to implement a very simple request-reply pattern using akka-camel. Here's my (testbench) code which is trying to use activeMQ directly to send a message and expect a response:
val brokerURL = "vm://localhost?broker.persistent=false"
// create in-process broker, session, queue, etc...
val connectionFactory = new ActiveMQConnectionFactory(brokerURL)
val connection = connectionFactory.createConnection()
val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
val queue = session.createQueue("myapp.somequeue")
val producer = session.createProducer(queue)
val tempDest = session.createTemporaryQueue()
val respConsumer = session.createConsumer(tempDest)
val message = session.createTextMessage("marco")
message.setJMSReplyTo(tempDest)
message.setJMSCorrelationID("myCorrelationID")
// create actor system with CamelExtension
val camel = CamelExtension(system)
val camelContext = camel.context
camelContext.addComponent("activemq", ActiveMQComponent.activeMQComponent(brokerURL))
val listener = system.actorOf(Props[Frontend])
// send a message, expect a response
producer.send(message)
val resp: TextMessage = respConsumer.receive(5000).asInstanceOf[TextMessage]
assert(resp.getText() == "polo")
I've tried two different approaches for the Consumer actor. The first is simpler, which attempts to respond using sender !:
class Frontend extends Actor with Consumer {
def endpointUri = "activemq:myapp.somequeue"
override def autoAck = false
def receive = {
case msg: CamelMessage => {
println("received %s" format msg.bodyAs[String])
sender ! "polo"
}
}
}
The second attempts to reply using the CamelTemplate:
class Frontend extends Actor with Consumer {
def endpointUri = "activemq:myapp.somequeue"
override def autoAck = false
def receive = {
case msg: CamelMessage => {
println("received %s" format msg.bodyAs[String])
val replyTo = msg.getHeaderAs("JMSReplyTo", classOf[ActiveMQTempQueue], camelContext)
val correlationId = msg.getHeaderAs("JMSCorrelationID", classOf[String], camelContext)
camel.template.sendBodyAndHeader("activemq:"+replyTo.getQueueName(), "polo", "JMSCorrelationID", correlationId)
}
}
}
I do see the println() output from my actor's receive method, so the ActiveMQ message is getting into the actor, but I get a timeout on the respConsumer.receive() call in the testbench. I've tried lots of combinations of specifying and not specifying headers in the reply. I've also tried enabling and disabling autoAck.
Thanks in advance.
Turns out I needed to call connection.start() in the JMS code.