Kafka: one producer for two topics vs. two producers - scala

There are two Kafka topics:
logs (plain text), for which I use the standard Kafka StringSerializer
events (JSON), for which I use a custom JSON serializer
There is a REST web application (Servlet-based).
Which approach is better for this application?
Approach 1: create a single producer for both topics:
val producer = new KafkaProducer[String, AnyRef](...props...)
// send logs
producer.send(new ProducerRecord[String, AnyRef]("logs", "some log key", "some log str"))
// send events
producer.send(new ProducerRecord[String, AnyRef]("events", "some evt key", Event("some")))
Approach 2: create two producers with strongly typed values:
val logsProducer = new KafkaProducer[String, String](...props...)
val eventsProducer = new KafkaProducer[String, Event](...props...)
// send logs
logsProducer.send(new ProducerRecord[String, String]("logs", "some log key", "some log str"))
// send events
eventsProducer.send(new ProducerRecord[String, Event]("events", "some evt key", Event("some event")))
Update 1: for Approach 1 I use my own serializer based on Json4s:
import java.util
import org.apache.kafka.common.serialization.Serializer

class KafkaJson4sSerializer[T <: AnyRef] extends Serializer[T] {
  import org.json4s._
  import org.json4s.native.Serialization
  import org.json4s.native.Serialization.write

  implicit val formats: Formats = Serialization.formats(NoTypeHints)

  override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {}

  override def serialize(topic: String, data: T): Array[Byte] =
    write(data).getBytes("UTF-8")

  override def close(): Unit = {}
}
val p = new Properties()
p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")
p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
  classOf[KafkaJson4sSerializer[AnyRef]].getName) // fully-qualified name of the custom serializer above
val producer = new KafkaProducer[String, AnyRef](p)
// send a String to topic 'logs'
producer.send(new ProducerRecord[String, AnyRef]("logs", "k1", "string value"))
// send an Event to topic 'events'
producer.send(new ProducerRecord[String, AnyRef]("events", "k2", Event("some evt")))
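As an aside (a sketch I am adding, not part of the original question): instead of registering the custom serializer by class name, serializer instances can be passed directly to the KafkaProducer constructor, which sidesteps the fully-qualified-name issue entirely:
import org.apache.kafka.common.serialization.StringSerializer

// Sketch: same Properties object `p` as above, but the serializers are supplied
// as instances; the *.serializer class-name entries then become redundant.
val producerWithInstances = new KafkaProducer[String, AnyRef](
  p,
  new StringSerializer(),               // key serializer
  new KafkaJson4sSerializer[AnyRef]()   // the custom value serializer defined above
)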

Related

Kafka Producer/Consumer crashing every second API call

Every time I make the second API call, I get an error in Postman saying "There was an internal server error."
I don't understand whether the problem is in my Kafka producer or consumer; they both worked fine yesterday. The messages no longer arrive at the consumer, and I can't make a second API call because the code crashes every second time (without any logs from the Scala side).
This is my producer code:
class Producer(topic: String, brokers: String) {

  val producer = new KafkaProducer[String, String](configuration)

  private def configuration: Properties = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    props.put(ProducerConfig.ACKS_CONFIG, "all")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getCanonicalName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getCanonicalName)
    props
  }

  def sendMessages(message: String): Unit = {
    val record = new ProducerRecord[String, String](topic, "1", message)
    producer.send(record)
    producer.close()
  }
}
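One thing that stands out to me (my reading of the code, not something stated in the question): sendMessages closes the producer right after the first send, so the second API call goes through an already-closed KafkaProducer, which throws an IllegalStateException. A minimal sketch that keeps the producer open and closes it only on shutdown:
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

// Sketch: the same Producer, but the underlying KafkaProducer survives across requests.
class LongLivedProducer(topic: String, brokers: String) {

  private val producer = new KafkaProducer[String, String](configuration)

  private def configuration: Properties = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    props.put(ProducerConfig.ACKS_CONFIG, "all")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getCanonicalName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getCanonicalName)
    props
  }

  def sendMessages(message: String): Unit = {
    val record = new ProducerRecord[String, String](topic, "1", message)
    producer.send(record) // no producer.close() here
  }

  def close(): Unit = producer.close() // call once, when the application shuts down
}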
This is where I'm using it:
object Message extends DefaultJsonProtocol with SprayJsonSupport {
  val newConversation = new Producer(brokers = KAFKA_BROKER, topic = "topic_2")

  def sendMessage(sender_id: String, receiver_id: String, content: String): String = {
    val JsonMessage = Map("sender_id" -> sender_id, "receiver_id" -> receiver_id, "content" -> content)
    val i = JsonMessage.toJson.prettyPrint
    newConversation.sendMessages(i)
    "Message Sent"
  }
}
And this is the API:
final case class Message(sender_id: String, receiver_id: String, content: String)

object producerRoute extends DefaultJsonProtocol with SprayJsonSupport {
  implicit val MessageFormat = jsonFormat3(Message)

  val sendMessageRoute: Route = (post & path("send")) {
    entity(as[Message]) { msg =>
      complete(sendMessage(msg.sender_id, msg.receiver_id, msg.content))
    }
  }
}
On the other hand, this is my Consumer code:
class Consumer(brokers: String, topic: String, groupId: String) {

  val consumer = new KafkaConsumer[String, String](configuration)
  consumer.subscribe(util.Arrays.asList(topic))

  private def configuration: Properties = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getCanonicalName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getCanonicalName)
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId)
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true)
    props
  }

  def receiveMessages(): Array[String] = {
    val a: ArrayBuffer[String] = new ArrayBuffer[String]
    while (true) {
      val records = consumer.poll(Duration.ofSeconds(0))
      records.forEach(record => a.addOne(record.value()))
    }
    println(a.toArray)
    a.toArray
  }
}

object Consumer extends App {
  val consumer = new Consumer(brokers = KAFKA_BROKER, topic = "topic_2", groupId = "test")
  consumer.receiveMessages()
}
I don't even get the output from the print in the consumer anymore. I don't understand what the problem is, as it worked fine before and I haven't changed anything since the last time it worked.
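A second observation (again mine, not from the question): receiveMessages can never reach its println, because the while (true) loop never exits, and poll(Duration.ofSeconds(0)) returns immediately, usually with no records. A minimal standalone polling-loop sketch (broker address and topic assumed) that blocks for data and prints records as they arrive:
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer
import scala.jdk.CollectionConverters._

object ConsumerLoopSketch extends App {
  val props = new Properties()
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // assumed broker address
  props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getCanonicalName)
  props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getCanonicalName)
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "test")
  props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(java.util.Collections.singletonList("topic_2"))

  while (true) {
    val records = consumer.poll(Duration.ofSeconds(1)) // wait up to 1 second for new records
    records.asScala.foreach(record => println(record.value())) // handle records as they arrive
  }
}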

How can I reduce lag in a Kafka consumer/producer

I am looking to improve this Scala Kafka code. What should I do in the consumer and producer to reduce lag?
This is code I got from someone else.
I know it is not difficult code, but I have never seen Scala code before and I am only beginning to learn about Kafka, so I have a hard time finding the problem.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.util.Try

class KafkaMessenger(val servers: String, val sender: String) {
  val props = new Properties()
  props.put("bootstrap.servers", servers)
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("producer.type", "async")

  val producer = new KafkaProducer[String, String](props)

  def send(topic: String, message: Any): Try[Unit] = Try {
    producer.send(new ProducerRecord(topic, message.toString))
  }

  def close(): Unit = producer.close()
}

object KafkaMessenger {
  def apply(host: String, topic: String, sender: String, message: String): Unit = {
    val messenger = new KafkaMessenger(host, sender)
    messenger.send(topic, message)
    messenger.close()
  }
}
And this is the consumer code:
import java.util.Properties
import java.util.concurrent.Executors

import com.satreci.g2gs.common.impl.utils.KafkaMessageTypes._
import kafka.admin.AdminUtils
import kafka.consumer._
import kafka.utils.ZkUtils
import org.I0Itec.zkclient.{ZkClient, ZkConnection}
import org.slf4j.LoggerFactory

import scala.language.postfixOps

class KafkaListener(val zookeeper: String,
                    val groupId: String,
                    val topic: String,
                    val handleMessage: ByteArrayMessage => Unit,
                    val workJson: String = ""
                   ) extends AutoCloseable {

  private lazy val logger = LoggerFactory.getLogger(this.getClass)

  val config: ConsumerConfig = createConsumerConfig(zookeeper, groupId)
  val consumer: ConsumerConnector = Consumer.create(config)

  val sessionTimeoutMs: Int = 10 * 1000
  val connectionTimeoutMs: Int = 8 * 1000
  val zkClient: ZkClient = ZkUtils.createZkClient(zookeeper, sessionTimeoutMs, connectionTimeoutMs)
  val zkUtils = new ZkUtils(zkClient, new ZkConnection(zookeeper), false)

  def createConsumerConfig(zookeeper: String, groupId: String): ConsumerConfig = {
    val props = new Properties()
    props.put("zookeeper.connect", zookeeper)
    props.put("group.id", groupId)
    props.put("auto.offset.reset", "smallest")
    props.put("zookeeper.session.timeout.ms", "5000")
    props.put("zookeeper.sync.time.ms", "200")
    props.put("auto.commit.interval.ms", "1000")
    props.put("partition.assignment.strategy", "roundrobin")
    new ConsumerConfig(props)
  }

  def run(threadCount: Int = 1): Unit = {
    val streams = consumer.createMessageStreamsByFilter(Whitelist(topic), threadCount)
    if (!AdminUtils.topicExists(zkUtils, topic)) {
      AdminUtils.createTopic(zkUtils, topic, 1, 1)
    }
    val executor = Executors.newFixedThreadPool(threadCount)
    for (stream <- streams) {
      executor.submit(new MessageConsumer(stream))
    }
    logger.debug(s"KafkaListener started with $threadCount thread(s) (topic=$topic)")
  }

  override def close(): Unit = {
    consumer.shutdown()
    logger.debug(s"$topic Listener closed")
  }

  class MessageConsumer(val stream: MessageStream) extends Runnable {
    override def run(): Unit = {
      val it = stream.iterator()
      while (it.hasNext()) {
        val message = it.next().message()
        if (workJson == "") {
          handleMessage(message)
        } else {
          val strMessage = new String(message)
          val newMessage = s"$strMessage/#/$workJson"
          val outMessage = newMessage.toCharArray.map(c => c.toByte)
          handleMessage(outMessage)
        }
      }
    }
  }
}
Specifically, I want to change the structure so that a new KafkaProducer object is not created every time a message is sent. There seem to be many other possible improvements to reduce lag as well.
Increase the number of consumer (KafkaListener) instances with the same group id.
This will increase the consumption rate, and eventually the lag between producer writes and consumer reads will be minimized.
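On the producer side mentioned in the question, a minimal sketch (building on the KafkaMessenger class above; names here are mine) that reuses one long-lived instance instead of creating and closing a producer per message:
// Sketch: one KafkaMessenger (and therefore one KafkaProducer) for the whole application.
object ReusableMessenger {
  // Assumed values for illustration; substitute the real broker list and sender id.
  private val messenger = new KafkaMessenger("localhost:9092", "sender-1")

  def publish(topic: String, message: String): Unit =
    messenger.send(topic, message)

  def shutdown(): Unit = messenger.close() // call once, when the application stops
}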

Kafka and akka (scala): How to create Source[CommittableMessage[Array[Byte], String], Consumer.Control]?

For a unit test I would like to create a source of committable messages with a Consumer.Control materialized value, or to transform a source created like this:
val message: Source[Array[Byte], NotUsed] = Source.single("one message".getBytes)
into something like this:
Source[CommittableMessage[Array[Byte], String], Consumer.Control]
The goal is to unit-test actor behavior on messages without having to install Kafka on the build machine.
You can use this helper to create a CommittableMessage:
package akka.kafka.internal

import akka.Done
import akka.kafka.ConsumerMessage.{CommittableMessage, CommittableOffsetBatch, GroupTopicPartition, PartitionOffset}
import akka.kafka.internal.ConsumerStage.Committer
import org.apache.kafka.clients.consumer.ConsumerRecord

import scala.collection.immutable
import scala.concurrent.Future

object AkkaKafkaHelper {

  private val committer = new Committer {
    def commit(offsets: immutable.Seq[PartitionOffset]): Future[Done] = Future.successful(Done)
    def commit(batch: CommittableOffsetBatch): Future[Done] = Future.successful(Done)
  }

  def commitableMessage[K, V](key: K, value: V, topic: String = "topic", partition: Int = 0,
                              offset: Int = 0, groupId: String = "group"): CommittableMessage[K, V] = {
    val partitionOffset = PartitionOffset(GroupTopicPartition(groupId, topic, partition), offset)
    val record = new ConsumerRecord(topic, partition, offset, key, value)
    CommittableMessage(record, ConsumerStage.CommittableOffsetImpl(partitionOffset)(committer))
  }
}
Use Consumer.committableSource to create a Source[CommittableMessage[K, V], Control]. The idea is that in your test you would produce one or more messages onto some topic, then use committableSource to consume from that same topic.
The following is an example that illustrates this approach: it's a slightly adjusted excerpt from the IntegrationSpec in the Akka Streams Kafka project. IntegrationSpec uses scalatest-embedded-kafka, which provides an in-memory Kafka instance for ScalaTest specs.
Source(1 to 100)
  .map(n => new ProducerRecord(topic1, partition0, null: Array[Byte], n.toString))
  .runWith(Producer.plainSink(producerSettings))

val consumerSettings = createConsumerSettings(group1)
val (control, probe1) = Consumer.committableSource(consumerSettings, TopicSubscription(Set(topic1)))
  .filterNot(_.record.value == InitialMsg)
  .mapAsync(10) { elem =>
    elem.committableOffset.commitScaladsl().map { _ => Done }
  }
  .toMat(TestSink.probe)(Keep.both)
  .run()

probe1
  .request(25)
  .expectNextN(25).toSet should be(Set(Done))

probe1.cancel()
Await.result(control.isShutdown, remainingOrDefault)

Kafka producer hangs on send

The logic is that a streaming job, getting data from a custom source, has to write both to Kafka and to HDFS.
I wrote a (very) basic Kafka producer to do this; however, the whole streaming job hangs on the send method.
class KafkaProducer(val kafkaBootstrapServers: String, val kafkaTopic: String,
                    val sslCertificatePath: String, val sslCertificatePassword: String) {

  val kafkaProps: Properties = new Properties()
  kafkaProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBootstrapServers)
  kafkaProps.put("acks", "1")
  kafkaProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
  kafkaProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
  kafkaProps.put("ssl.truststore.location", sslCertificatePath)
  kafkaProps.put("ssl.truststore.password", sslCertificatePassword)

  val kafkaProducer: KafkaProducer[Long, Array[String]] = new KafkaProducer(kafkaProps)

  def sendKafkaMessage(message: Message): Unit = {
    message.data.foreach(list => {
      val producerRecord: ProducerRecord[Long, Array[String]] =
        new ProducerRecord[Long, Array[String]](kafkaTopic, message.timeStamp.getTime, list.toArray)
      kafkaProducer.send(producerRecord)
    })
  }
}
And the code calling the producer:
receiverStream.foreachRDD(rdd => {
  val messageRowRDD: RDD[Row] = rdd.mapPartitions(partition => {
    val parser: Parser = new Parser
    val kafkaProducer: KafkaProducer =
      new KafkaProducer(kafkaBootstrapServers, kafkaTopic, kafkaSslCertificatePath, kafkaSslCertificatePass)

    val newPartition = partition.map(message => {
      Logger.getLogger("importer").error("Writing Message to Kafka...")
      kafkaProducer.sendKafkaMessage(message)
      Logger.getLogger("importer").error("Finished writing Message to Kafka")
      message.data.map(singleMessage => parser.parseMessage(message.timeStamp.getTime, singleMessage))
    })
    newPartition.flatten
  })

  val df = sqlContext.createDataFrame(messageRowRDD, Schema.messageSchema)
  Logger.getLogger("importer").info("Entries-count: " + df.count())

  val row = Try(df.first)
  row match {
    case Success(s) => Persister.writeDataframeToDisk(df, outputFolder)
    case Failure(e) => Logger.getLogger("importer").warn("Resulting DataFrame is empty. Nothing can be written")
  }
})
From the logs I can tell that each executor reaches the "sending to Kafka" point, but no further. All executors hang there, and no exception is thrown.
The Message class is a very simple case class with two fields: a timestamp and an array of strings.
This turned out to be due to the acks setting in Kafka.
Once acks was set to 1, sends went through a lot faster.
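As a side note (not part of the original answer): send is asynchronous, so broker-side problems never throw at the call site; they only surface through the returned Future or a callback. A small sketch of logging send results, using the same log4j-style Logger as the job above:
import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}
import org.apache.log4j.Logger

// Sketch: surface asynchronous send failures instead of fire-and-forget.
def sendWithLogging(producer: KafkaProducer[String, String], topic: String, value: String): Unit =
  producer.send(new ProducerRecord[String, String](topic, value), new Callback {
    override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
      if (exception != null)
        Logger.getLogger("importer").error("Kafka send failed", exception) // errors show up here
      else
        Logger.getLogger("importer").debug(s"Sent to ${metadata.topic()}-${metadata.partition()}@${metadata.offset()}")
  })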

How to Test Kafka Consumer

I have a Kafka consumer (built in Scala) which extracts the latest records from Kafka. The consumer looks like this:
val consumerProperties = new Properties()
consumerProperties.put("bootstrap.servers", "localhost:9092")
consumerProperties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
consumerProperties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
consumerProperties.put("group.id", "something")
consumerProperties.put("auto.offset.reset", "latest")
val consumer = new KafkaConsumer[String, String](consumerProperties)
consumer.subscribe(java.util.Collections.singletonList("topic"))
Now, I want to write an integration test for it. Is there any way, or any best practice, for testing Kafka consumers?
You need to start ZooKeeper and Kafka programmatically for integration tests.
1.1 Start ZooKeeper (ZooKeeperServer):
import java.net.InetSocketAddress
import org.apache.zookeeper.server.{ServerCnxnFactory, ZooKeeperServer}
import scala.reflect.io.Directory

def startZooKeeper(zooKeeperPort: Int, zkLogsDir: Directory): ServerCnxnFactory = {
  val tickTime = 2000
  val zkServer = new ZooKeeperServer(zkLogsDir.toFile.jfile, zkLogsDir.toFile.jfile, tickTime)
  val factory = ServerCnxnFactory.createFactory
  factory.configure(new InetSocketAddress("0.0.0.0", zooKeeperPort), 1024)
  factory.startup(zkServer)
  factory
}
1.2 Start Kafka (KafkaServer):
import java.util.Properties
import kafka.server.{KafkaConfig, KafkaServer}

case class StreamConfig(streamTcpPort: Int = 9092,
                        streamStateTcpPort: Int = 2181,
                        stream: String,
                        numOfPartition: Int = 1,
                        nodes: Map[String, String] = Map.empty)

def startKafkaBroker(config: StreamConfig, kafkaLogDir: Directory): KafkaServer = {
  val syncServiceAddress = s"localhost:${config.streamStateTcpPort}"

  val properties: Properties = new Properties
  properties.setProperty("zookeeper.connect", syncServiceAddress)
  properties.setProperty("broker.id", "0")
  properties.setProperty("host.name", "localhost")
  properties.setProperty("advertised.host.name", "localhost")
  properties.setProperty("port", config.streamTcpPort.toString)
  properties.setProperty("auto.create.topics.enable", "true")
  properties.setProperty("log.dir", kafkaLogDir.toAbsolute.path)
  properties.setProperty("log.flush.interval.messages", 1.toString)
  properties.setProperty("log.cleaner.dedupe.buffer.size", "1048577")

  config.nodes.foreach {
    case (key, value) => properties.setProperty(key, value)
  }

  val broker = new KafkaServer(new KafkaConfig(properties))
  broker.startup()
  println(s"KafkaStream Broker started at ${properties.get("host.name")}:${properties.get("port")} at ${kafkaLogDir.toFile}")
  broker
}
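For completeness, here is a hedged sketch (my own wiring; the temp directories and ports are assumptions, not from the answer) of starting and stopping these helpers around a test run:
import scala.reflect.io.Directory

// Sketch: start ZooKeeper and Kafka once, run the tests, then shut everything down.
val zkLogsDir    = Directory.makeTemp("zookeeper-logs")
val kafkaLogsDir = Directory.makeTemp("kafka-logs")

val zkFactory = startZooKeeper(2181, zkLogsDir)
val broker    = startKafkaBroker(StreamConfig(stream = "test-topic"), kafkaLogsDir)

// ... produce and consume against localhost:9092 here ...

broker.shutdown()
zkFactory.shutdown()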
Emit some events to the stream using a KafkaProducer (see the sketch just below), then consume them with your consumer to test and verify that it works.
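For example, a minimal sketch of emitting a test event to the embedded broker (topic name and port assumed from the configuration above):
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Sketch: produce one record, blocking until the broker acknowledges it.
val producerProps = new Properties()
producerProps.put("bootstrap.servers", "localhost:9092")
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val testProducer = new KafkaProducer[String, String](producerProps)
testProducer.send(new ProducerRecord[String, String]("test-topic", "key", "test-value")).get()
testProducer.close()

// ...then run the consumer under test and assert on what it polls.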
You can use scalatest-eventstream, which has a startBroker method that starts ZooKeeper and Kafka for you, and a destroyBroker method that cleans up Kafka after the tests. For example:
class MyStreamConsumerSpecs extends FunSpec with BeforeAndAfterAll with Matchers {

  implicit val config =
    StreamConfig(streamTcpPort = 9092, streamStateTcpPort = 2181, stream = "test-topic", numOfPartition = 1)

  val kafkaStream = new KafkaEmbeddedStream

  override protected def beforeAll(): Unit = {
    kafkaStream.startBroker
  }

  override protected def afterAll(): Unit = {
    kafkaStream.destroyBroker
  }

  describe("Kafka Embedded stream") {
    it("does consume some events") {
      // uses application.properties
      //   emitter.broker.endpoint=localhost:9092
      //   emitter.event.key.serializer=org.apache.kafka.common.serialization.StringSerializer
      //   emitter.event.value.serializer=org.apache.kafka.common.serialization.StringSerializer
      kafkaStream.appendEvent("test-topic", """{"MyEvent" : { "myKey" : "myValue"}}""")

      val consumerProperties = new Properties()
      consumerProperties.put("bootstrap.servers", "localhost:9092")
      consumerProperties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      consumerProperties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      consumerProperties.put("group.id", "something")
      consumerProperties.put("auto.offset.reset", "earliest")

      val myConsumer = new KafkaConsumer[String, String](consumerProperties)
      myConsumer.subscribe(java.util.Collections.singletonList("test-topic"))

      val events = myConsumer.poll(2000)
      events.count() shouldBe 1
      events.iterator().next().value() shouldBe """{"MyEvent" : { "myKey" : "myValue"}}"""
      println("=================" + events.count())
    }
  }
}