How to get a Scala Kafka Consumer in a Play application to continuously listen to the broker through the life of the application?

I am trying to create a Play-Scala application that uses a Scala Kafka Consumer to listen to a Kafka broker. I am using the Cake Solutions Scala Kafka Client library, and following their example here.
I have created a wrapper class to act as a Kafka consumer provider, and I have bound it as an eager singleton so that it is created when the application starts up.
The problem is that the consumer will listen to the broker when the application starts up, but not after that.
Here is my code for the ConsumerProvider:
trait KafkaConsumerProvider {
  def consumer: ActorRef
}

@Singleton
class KafkaConsumerProviderImpl @Inject() (actorSystem: ActorSystem, configuration: Configuration)
  extends KafkaConsumerProvider {

  private val consumerConf: KafkaConsumer.Conf[String, String] = KafkaConsumer.Conf(
    keyDeserializer = new StringDeserializer,
    valueDeserializer = new StringDeserializer,
    bootstrapServers = configuration.get[String]("messageBroker.bootstrapServers"),
    groupId = configuration.get[String]("messageBroker.consumer.groupId"),
    enableAutoCommit = false,
    autoCommitInterval = 1000,
    sessionTimeoutMs = 10000,
    maxPartitionFetchBytes = ConsumerConfig.DEFAULT_MAX_PARTITION_FETCH_BYTES,
    maxPollRecords = 500,
    maxPollInterval = 300000,
    maxMetaDataAge = 300000,
    autoOffsetReset = OffsetResetStrategy.LATEST,
    isolationLevel = IsolationLevel.READ_UNCOMMITTED
  )

  private val actorConf: KafkaConsumerActor.Conf = KafkaConsumerActor.Conf(
    scheduleInterval = 1.seconds,   // scheduling interval for Kafka polling when the consumer is inactive
    unconfirmedTimeout = 3.seconds, // how long to wait for a confirmation before redelivery
    maxRedeliveries = 3             // maximum number of times an unconfirmed message will be redelivered
  )

  override val consumer: ActorRef = {
    val receiverActor = actorSystem.actorOf(ReceiverActor.props)
    val topics = configuration.get[String]("messageBroker.consumer.topics").split(",").toSeq
    val _consumer = actorSystem.actorOf(KafkaConsumerActor.props(consumerConf, actorConf, receiverActor))
    _consumer ! Subscribe.AutoPartition(topics)
    _consumer
  }
}
and here is how I am binding the dependency as an eager singleton in Module.scala:
class Module extends AbstractModule with ScalaModule {
  override def configure(): Unit = {
    bind[KafkaMessageBrokerWriter].to[KafkaMessageBrokerWriterImpl].asEagerSingleton()
    bind[KafkaConsumerProvider].to[KafkaConsumerProviderImpl].asEagerSingleton()
  }
}
How do I get the consumer to keep listening?

The problem was that, in the ReceiverActor, I forgot to confirm the offsets:
sender() ! Confirm(records.offsets)
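For reference, a minimal sketch of a receiver actor that confirms offsets with the Cake Solutions client (the processing step is a placeholder, and committing on confirmation via commit = true is an assumption that matches enableAutoCommit = false above):

class ReceiverActor extends Actor {
  // Extractor for the typed record batches the KafkaConsumerActor delivers
  private val recordsExt = ConsumerRecords.extractor[String, String]

  override def receive: Receive = {
    case recordsExt(records) =>
      // ... process records here (e.g. iterate over records.values) ...
      // Confirming is what lets the KafkaConsumerActor advance and keep polling;
      // without it the same batch is redelivered until maxRedeliveries is exhausted.
      sender() ! Confirm(records.offsets, commit = true)
  }
}

object ReceiverActor {
  def props: Props = Props(new ReceiverActor)
}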

Related

Akka Streams Reactive Kafka - OutOfMemoryError under high load

I am running an Akka Streams Reactive Kafka application which should be functional under heavy load. After running the application for around 10 minutes, the application goes down with an OutOfMemoryError. I tried to debug the heap dump and found that akka.dispatch.Dispatcher is taking ~5GB of memory. Below are my config files.
Akka version: 2.4.18
Reactive Kafka version: 2.4.18
1. application.conf:

consumer {
  num-consumers = "2"
  c1 {
    bootstrap-servers = "localhost:9092"
    bootstrap-servers = ${?KAFKA_CONSUMER_ENDPOINT1}
    groupId = "testakkagroup1"
    subscription-topic = "test"
    subscription-topic = ${?SUBSCRIPTION_TOPIC1}
    message-type = "UserEventMessage"
    poll-interval = 100ms
    poll-timeout = 50ms
    stop-timeout = 30s
    close-timeout = 20s
    commit-timeout = 15s
    wakeup-timeout = 10s
    use-dispatcher = "akka.kafka.default-dispatcher"
    kafka-clients {
      enable.auto.commit = true
    }
  }
}
2. build.sbt:
java -Xmx6g \
-Dcom.sun.management.jmxremote.port=27019 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.rmi.server.hostname=localhost \
-Dzookeeper.host=$ZK_HOST \
-Dzookeeper.port=$ZK_PORT \
-jar ./target/scala-2.11/test-assembly-1.0.jar
3. Source and Sink actors:

class EventStream extends Actor with ActorLogging {

  implicit val actorSystem = context.system
  implicit val timeout: Timeout = Timeout(10 seconds)
  implicit val materializer = ActorMaterializer()

  val settings = Settings(actorSystem).KafkaConsumers

  override def receive: Receive = {
    case StartUserEvent(id) =>
      startStreamConsumer(consumerConfig("EventMessage" + ".c" + id))
  }

  def startStreamConsumer(config: Map[String, String]) = {
    val consumerSource = createConsumerSource(config)
    val consumerSink = createConsumerSink()
    val messageProcessor = startMessageProcessor(actorA, actorB, actorC)
    log.info("Starting The UserEventStream processing")
    val future = consumerSource.map { message =>
      val m = s"${message.record.value()}"
      messageProcessor ? m
    }.runWith(consumerSink)
    future.onComplete {
      case _ => actorSystem.stop(messageProcessor)
    }
  }

  def startMessageProcessor(actorA: ActorRef, actorB: ActorRef, actorC: ActorRef) = {
    actorSystem.actorOf(Props(classOf[MessageProcessor], actorA, actorB, actorC))
  }

  def createConsumerSource(config: Map[String, String]) = {
    val kafkaMBAddress = config("bootstrap-servers")
    val groupID = config("groupId")
    val topicSubscription = config("subscription-topic").split(',').toList
    println(s"Subscriptiontopics $topicSubscription")
    val consumerSettings = ConsumerSettings(actorSystem, new ByteArrayDeserializer, new StringDeserializer)
      .withBootstrapServers(kafkaMBAddress)
      .withGroupId(groupID)
      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
      .withProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")
    Consumer.committableSource(consumerSettings, Subscriptions.topics(topicSubscription: _*))
  }

  def createConsumerSink() = {
    Sink.foreach(println)
  }
}
In this case actorA, actorB, and actorC are doing some business-logic processing and database interaction. Is there anything I am missing in handling the Akka Reactive Kafka consumers, such as commit, error, or throttling configuration? Looking into the heap dump, my guess is that messages are piling up.
One thing I would change is the following:
val future = consumerSource.map { message =>
  val m = s"${message.record.value()}"
  messageProcessor ? m
}.runWith(consumerSink)
In the above code, you're using ask to send messages to the messageProcessor actor and expect replies, but in order for ask to function as a backpressure mechanism, you need to use it with mapAsync (more information is in the documentation). Something like the following:
val future =
  consumerSource
    .mapAsync(parallelism = 5) { message =>
      val m = s"${message.record.value()}"
      messageProcessor ? m
    }
    .runWith(consumerSink)
Adjust the level of parallelism as needed.
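One additional note: the ask operator (?) used inside mapAsync needs an implicit akka.util.Timeout in scope (the question's EventStream already defines one), and the messageProcessor actor must reply to every message; otherwise the futures time out and the stream fails. A sketch of that declaration, with an arbitrary five-second value:

import akka.util.Timeout
import scala.concurrent.duration._

// Each ask inside mapAsync fails with an AskTimeoutException if no reply arrives in time,
// which terminates the stream unless a supervision/recovery strategy is attached.
implicit val askTimeout: Timeout = Timeout(5.seconds)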

How to Test Kafka Consumer

I have a Kafka consumer (built in Scala) which extracts the latest records from Kafka. The consumer looks like this:
val consumerProperties = new Properties()
consumerProperties.put("bootstrap.servers", "localhost:9092")
consumerProperties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
consumerProperties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
consumerProperties.put("group.id", "something")
consumerProperties.put("auto.offset.reset", "latest")
val consumer = new KafkaConsumer[String, String](consumerProperties)
consumer.subscribe(java.util.Collections.singletonList("topic"))
Now, I want to write an integration test for it. Is there any way or any best practice for testing Kafka consumers?
You need to start ZooKeeper and Kafka programmatically for integration tests.
1.1 Start ZooKeeper (ZooKeeperServer)
def startZooKeeper(zooKeeperPort: Int, zkLogsDir: Directory): ServerCnxnFactory = {
  val tickTime = 2000
  val zkServer = new ZooKeeperServer(zkLogsDir.toFile.jfile, zkLogsDir.toFile.jfile, tickTime)
  val factory = ServerCnxnFactory.createFactory
  factory.configure(new InetSocketAddress("0.0.0.0", zooKeeperPort), 1024)
  factory.startup(zkServer)
  factory
}
1.2 Start Kafka (KafkaServer)
case class StreamConfig(streamTcpPort: Int = 9092,
                        streamStateTcpPort: Int = 2181,
                        stream: String,
                        numOfPartition: Int = 1,
                        nodes: Map[String, String] = Map.empty)

def startKafkaBroker(config: StreamConfig,
                     kafkaLogDir: Directory): KafkaServer = {
  val syncServiceAddress = s"localhost:${config.streamStateTcpPort}"
  val properties: Properties = new Properties
  properties.setProperty("zookeeper.connect", syncServiceAddress)
  properties.setProperty("broker.id", "0")
  properties.setProperty("host.name", "localhost")
  properties.setProperty("advertised.host.name", "localhost")
  properties.setProperty("port", config.streamTcpPort.toString)
  properties.setProperty("auto.create.topics.enable", "true")
  properties.setProperty("log.dir", kafkaLogDir.toAbsolute.path)
  properties.setProperty("log.flush.interval.messages", 1.toString)
  properties.setProperty("log.cleaner.dedupe.buffer.size", "1048577")
  config.nodes.foreach {
    case (key, value) => properties.setProperty(key, value)
  }
  val broker = new KafkaServer(new KafkaConfig(properties))
  broker.startup()
  println(s"KafkaStream Broker started at ${properties.get("host.name")}:${properties.get("port")} at ${kafkaLogDir.toFile}")
  broker
}
Then emit some events to the stream using a KafkaProducer, and consume them with your consumer to test and verify that it is working.
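For illustration, a minimal sketch of emitting a test event with the plain Kafka producer API against the embedded broker started above (the topic name, key, and value are assumptions):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val producerProperties = new Properties()
producerProperties.put("bootstrap.servers", "localhost:9092")
producerProperties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
producerProperties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](producerProperties)
// Send a record the consumer under test should pick up, then flush so it is really on the broker.
producer.send(new ProducerRecord[String, String]("topic", "test-key", "test-value"))
producer.flush()
producer.close()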
Alternatively, you can use scalatest-eventstream, which has a startBroker method that will start ZooKeeper and Kafka for you. It also has destroyBroker, which will clean up Kafka after the tests. For example:
class MyStreamConsumerSpecs extends FunSpec with BeforeAndAfterAll with Matchers {

  implicit val config =
    StreamConfig(streamTcpPort = 9092, streamStateTcpPort = 2181, stream = "test-topic", numOfPartition = 1)

  val kafkaStream = new KafkaEmbeddedStream

  override protected def beforeAll(): Unit = {
    kafkaStream.startBroker
  }

  override protected def afterAll(): Unit = {
    kafkaStream.destroyBroker
  }

  describe("Kafka Embedded stream") {
    it("does consume some events") {
      // uses application.properties
      //   emitter.broker.endpoint=localhost:9092
      //   emitter.event.key.serializer=org.apache.kafka.common.serialization.StringSerializer
      //   emitter.event.value.serializer=org.apache.kafka.common.serialization.StringSerializer
      kafkaStream.appendEvent("test-topic", """{"MyEvent" : { "myKey" : "myValue"}}""")

      val consumerProperties = new Properties()
      consumerProperties.put("bootstrap.servers", "localhost:9092")
      consumerProperties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      consumerProperties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      consumerProperties.put("group.id", "something")
      consumerProperties.put("auto.offset.reset", "earliest")

      val myConsumer = new KafkaConsumer[String, String](consumerProperties)
      myConsumer.subscribe(java.util.Collections.singletonList("test-topic"))

      val events = myConsumer.poll(2000)
      events.count() shouldBe 1
      events.iterator().next().value() shouldBe """{"MyEvent" : { "myKey" : "myValue"}}"""
      println("=================" + events.count())
    }
  }
}

Akka - Measure time of consumer

I'm developing a system that pulls messages from JMS (the consumer) and pushes them to a Kafka topic (the producer).
Since my consumer stays alive waiting for new messages to arrive in the JMS queue and pushes them to Kafka, how can I effectively measure how many messages I can pull per second?
Here is my code:
My Consumer:
class ActiveMqConsumerActor extends Consumer {

  var startTime: Long = _
  val log = Logging(context.system, this)
  val producerActor = context.actorOf(Props[KafkaProducerActor])

  override def autoAck = false

  override def endpointUri: String = "activemq:KafkaTest"

  override def receive: Receive = LoggingReceive {
    case msg: CamelMessage =>
      val camelMsg = msg.bodyAs[String]
      producerActor ! Message(camelMsg.getBytes)
      sender() ! Ack
    case ex: Exception => sender() ! Failure(ex)
    case _ =>
      log.error("Got a message that I don't understand")
      sender() ! Failure(new Exception("Got a message that I don't understand"))
  }
}
The main:
object ActiveMqConsumerTest extends App {
  val system = ActorSystem("KafkaSystem")
  val camel = CamelExtension(system)
  val camelContext = camel.context
  camelContext.addComponent("activemq", ActiveMQComponent.activeMQComponent("tcp://0.0.0.0:61616"))
  val consumer = system.actorOf(Props[ActiveMqConsumerActor].withRouter(FromConfig), "consumer")
  val producer = system.actorOf(Props[KafkaProducerActor].withRouter(FromConfig), "producer")
}
Thanks
You can try using something like Dropwizard Metrics (https://dropwizard.github.io/metrics/3.1.0/manual/). You can define precise metrics, including timers and rates, and use them inside your actor.
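For example, a minimal sketch using a Dropwizard Meter to track messages per second (the registry object, metric name, and console reporter are illustrative assumptions, not part of the original code):

import java.util.concurrent.TimeUnit
import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

object ConsumerMetrics {
  val registry = new MetricRegistry()
  // A Meter tracks throughput: mean rate plus 1-, 5- and 15-minute moving averages.
  val consumedMessages = registry.meter("activemq.messages.consumed")

  // Print the rates to stdout every 10 seconds (a JMX or Graphite reporter works too).
  val reporter = ConsoleReporter.forRegistry(registry)
    .convertRatesTo(TimeUnit.SECONDS)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .build()
  reporter.start(10, TimeUnit.SECONDS)
}

// Then mark the meter once per consumed message, e.g. in ActiveMqConsumerActor.receive:
//   case msg: CamelMessage =>
//     ConsumerMetrics.consumedMessages.mark()
//     ...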

Stopping Spark Streaming after reading the first batch of data

I am using Spark Streaming to consume Kafka messages. I want to get some messages as a sample from Kafka instead of reading all messages. So I want to read a batch of messages, return them to the caller, and stop Spark Streaming. Currently I am passing the batchInterval time to the awaitTermination method of the Spark streaming context. I don't know how to return the processed data to the caller from Spark Streaming. Here is the code I am using currently:
def getsample(params: scala.collection.immutable.Map[String, String]): Unit = {
  if (params.contains("zookeeperQourum"))
    zkQuorum = params.get("zookeeperQourum").get
  if (params.contains("userGroup"))
    group = params.get("userGroup").get
  if (params.contains("topics"))
    topics = params.get("topics").get
  if (params.contains("numberOfThreads"))
    numThreads = params.get("numberOfThreads").get
  if (params.contains("sink"))
    sink = params.get("sink").get
  if (params.contains("batchInterval"))
    interval = params.get("batchInterval").get.toInt

  val sparkConf = new SparkConf().setAppName("KafkaConsumer").setMaster("spark://cloud2-server:7077")
  val ssc = new StreamingContext(sparkConf, Seconds(interval))
  val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap

  var consumerConfig = scala.collection.immutable.Map.empty[String, String]
  consumerConfig += ("auto.offset.reset" -> "smallest")
  consumerConfig += ("zookeeper.connect" -> zkQuorum)
  consumerConfig += ("group.id" -> group)

  var data = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](ssc, consumerConfig, topicMap, StorageLevel.MEMORY_ONLY).map(_._2)
  val streams = data.window(Seconds(interval), Seconds(interval)).map(x => new String(x))
  streams.foreach(rdd => rdd.foreachPartition(itr => {
    while (itr.hasNext && size >= 0) {
      var msg = itr.next
      println(msg)
      sample.append(msg)
      sample.append("\n")
      size -= 1
    }
  }))

  ssc.start()
  ssc.awaitTermination(5000)
  ssc.stop(true)
}
So, instead of saving messages in a StringBuilder called "sample", I want to return them to the caller.
You can implement a StreamingListener and then, inside its onBatchCompleted method, call ssc.stop():
private class MyJobListener(ssc: StreamingContext) extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted) = synchronized {
    ssc.stop(true)
  }
}
This is how you attach the listener to your StreamingContext:
val listen = new MyJobListener(ssc)
ssc.addStreamingListener(listen)
ssc.start()
ssc.awaitTermination()
We can get sample messages using the following piece of code:
var sampleMessages = streams.repartition(1).mapPartitions(x => x.take(10))
If we want to stop after the first batch, we should implement our own StreamingListener and stop the streaming context in its onBatchCompleted method, as shown above.
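To actually hand the sampled messages back to the caller, one option (a sketch that reuses the DStream streams built in getsample above; not part of the original answer) is to collect each sampled RDD into a driver-side buffer with foreachRDD and return the buffer after the context has stopped. Note that appending inside foreachPartition, as in the question, happens on the executors and never reaches the driver's builder.

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

def collectSample(ssc: StreamingContext, streams: DStream[String]): List[String] = {
  val sample = ArrayBuffer.empty[String]
  streams.foreachRDD { rdd =>
    // take() brings the records to the driver, so appending to the local buffer here is safe
    sample ++= rdd.take(10)
  }
  ssc.start()
  ssc.awaitTermination(5000)
  ssc.stop(true)
  sample.toList
}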

request-reply with akka-camel and ActiveMQ

Update: It would seem that an even simpler test case is not working: just trying to send a message from an ActiveMQ producer to an ActiveMQ consumer via the in-process broker. Here is the code:
val brokerURL = "vm://localhost?broker.persistent=false"
val connectionFactory = new ActiveMQConnectionFactory(brokerURL)
val connection = connectionFactory.createConnection()
val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
val queue = session.createQueue("foo.bar")
val producer = session.createProducer(queue)
val consumer = session.createConsumer(queue)
val message = session.createTextMessage("marco")
producer.send(message)
val resp = consumer.receive(2000)
assert(resp != null)
I'm trying to implement a very simple request-reply pattern using akka-camel. Here's my (test-bench) code, which tries to use ActiveMQ directly to send a message and expect a response:
val brokerURL = "vm://localhost?broker.persistent=false"
// create in-process broker, session, queue, etc...
val connectionFactory = new ActiveMQConnectionFactory(brokerURL)
val connection = connectionFactory.createConnection()
val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
val queue = session.createQueue("myapp.somequeue")
val producer = session.createProducer(queue)
val tempDest = session.createTemporaryQueue()
val respConsumer = session.createConsumer(tempDest)
val message = session.createTextMessage("marco")
message.setJMSReplyTo(tempDest)
message.setJMSCorrelationID("myCorrelationID")
// create actor system with CamelExtension
val camel = CamelExtension(system)
val camelContext = camel.context
camelContext.addComponent("activemq", ActiveMQComponent.activeMQComponent(brokerURL))
val listener = system.actorOf(Props[Frontend])
// send a message, expect a response
producer.send(message)
val resp: TextMessage = respConsumer.receive(5000).asInstanceOf[TextMessage]
assert(resp.getText() == "polo")
I've tried two different approaches for the Consumer actor. The first, which is simpler, attempts to respond using sender !:
class Frontend extends Actor with Consumer {
  def endpointUri = "activemq:myapp.somequeue"

  override def autoAck = false

  def receive = {
    case msg: CamelMessage =>
      println("received %s" format msg.bodyAs[String])
      sender ! "polo"
  }
}
The second attempts to reply using the CamelTemplate:
class Frontend extends Actor with Consumer {
  def endpointUri = "activemq:myapp.somequeue"

  override def autoAck = false

  def receive = {
    case msg: CamelMessage =>
      println("received %s" format msg.bodyAs[String])
      val replyTo = msg.getHeaderAs("JMSReplyTo", classOf[ActiveMQTempQueue], camelContext)
      val correlationId = msg.getHeaderAs("JMSCorrelationID", classOf[String], camelContext)
      camel.template.sendBodyAndHeader("activemq:" + replyTo.getQueueName(), "polo", "JMSCorrelationID", correlationId)
  }
}
I do see the println() output from my actor's receive method, so the ActiveMQ message is getting into the actor, but I get a timeout on the respConsumer.receive() call in the testbench. I've tried lots of combinations of specifying and not specifying headers in the reply. I've also tried enabling and disabling autoAck.
Thanks in advance.
Turns out I needed to call connection.start() in the JMS code.
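In other words, a JMS connection does not deliver any messages to its consumers until start() is called, so receive() kept timing out. In the test-bench code above the fix is one extra line:

val connection = connectionFactory.createConnection()
connection.start() // without this, respConsumer.receive(...) returns null after the timeout
val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)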