I'm using Scala and Kafka to create a topic-based pub-sub architecture.
My question is: how can I handle the one-to-one messaging part of my application using Kafka topics?
This is my producer class:
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

class Producer(topic: String, key: String, brokers: String, message: String) {

  val producer = new KafkaProducer[String, String](configuration)

  private def configuration: Properties = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    props.put(ProducerConfig.ACKS_CONFIG, "all")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getCanonicalName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getCanonicalName)
    props
  }

  def sendMessages(): Unit = {
    val record = new ProducerRecord[String, String](topic, key, message)
    producer.send(record)
    producer.close()
  }
}
And this is my consumer class:
import java.time.Duration
import java.util
import java.util.Properties

import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer

class Consumer(brokers: String, topic: String, groupId: String) {

  val consumer = new KafkaConsumer[String, String](configuration)
  consumer.subscribe(util.Arrays.asList(topic))

  private def configuration: Properties = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getCanonicalName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getCanonicalName)
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId)
    //props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")
    props
  }

  def receiveMessages(): Unit = {
    while (true) {
      // poll with a non-zero timeout so the loop does not spin when no records are available
      consumer.poll(Duration.ofSeconds(1)).forEach(record => println(s"Received message: $record"))
    }
  }
}
I also have an auth service that takes care of everything related to authenticating via JWT tokens.
I am confused about how to send messages to specific users. I thought about creating a "Message" class, but I got lost when it comes to how to send these "specific" users their messages and how to partition these messages in Kafka for later use:
class Message {

  def sendMessage(sender_id: String, receiver_id: String, content: String): Unit = {
    val newMessage = new Producer(brokers = KAFKA_BROKER, key = sender_id + " to " + receiver_id, topic = "topic_1", message = content)
    newMessage.sendMessages()
  }

  def loadMessage(): Unit = {
    //
  }
}
My thought was to specify a custom key for all messages belonging to the same conversation, but I couldn't find the right way to retrieve these messages later on: my consumer returns everything contained in that topic, no matter what the key is. Meaning, all the users will eventually get all the messages. I know my modeling seems messy, but I couldn't find the right way to do it, and I'm also somewhat confused about the usage of the group_id in the consumer.
Could someone show me the right way to achieve what I'm trying to do here, please?
couldn't find the right way to retrieve these messages later on ... consumer returns everything contained in that topic no matter what the key is
You would need to .assign the Consumer instance to a specific partition rather than use .subscribe, which reads all partitions. Or you'd use a specific topic for each conversation.
But then you need unique partitions/topics for every conversation that will ever exist. In a regular chat application where users create and remove rooms at will, that will not scale for Kafka.
Ultimately, I'd suggest writing your data somewhere other than Kafka that you can actually query and index on a "conversationId" and/or user IDs, rather than trying to forward those events directly from Kafka into your "chat" application.
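For illustration, a minimal sketch of the .assign approach, assuming the same consumer properties as in the question and that you already know which partition (or dedicated topic) a conversation maps to; the topic name and partition number are placeholders:

import java.time.Duration
import java.util

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// props: the same Properties used in the Consumer class above
val consumer = new KafkaConsumer[String, String](props)

// Pin the consumer to a single partition instead of subscribing to the whole topic.
// "topic_1" and partition 0 are placeholders for wherever the conversation lives.
val conversationPartition = new TopicPartition("topic_1", 0)
consumer.assign(util.Arrays.asList(conversationPartition))

while (true) {
  consumer.poll(Duration.ofSeconds(1)).forEach { record =>
    println(s"${record.key}: ${record.value}")
  }
}

Note that even then, the key only determines which partition a record goes to; every consumer of that partition still sees all records in it, which is why a queryable store keyed by conversation is the more natural fit here.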
I have written a Spark Kafka producer, which pulls messages from Hive and pushes them into Kafka. Most of the records (messages) are getting duplicated when we ingest into Kafka, though I do not have any duplicates before pushing into Kafka. I have added the configurations related to exactly-once semantics, making the Kafka producer idempotent.
Below is the code snippet I am using for the Kafka producer:
import java.util.{Properties, UUID}
import org.apache.kafka.clients.CommonClientConfigs
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.KafkaException
import org.apache.kafka.common.errors.{AuthorizationException, OutOfOrderSequenceException, ProducerFencedException}
import org.apache.log4j.LogManager
import org.apache.spark.sql.DataFrame
object KafkaWriter {

  lazy val log = LogManager.getLogger("KafkaWriter")

  private def writeKafkaWithoutRepartition(df: DataFrame, topic: String, noOfPartitions: Int,
                                           kafkaBootstrapServer: String): Unit = {
    log.info("Inside Method KafkaWriter::writeKafkaWithoutRepartition no of partitions =" + noOfPartitions)
    df.foreachPartition(
      iter => {
        val properties = getKafkaProducerPropertiesForBulkLoad(kafkaBootstrapServer)
        properties.setProperty(ProducerConfig.TRANSACTIONAL_ID_CONFIG, UUID.randomUUID().toString + "-" + System.currentTimeMillis().toString)
        log.info("Inside Method writeKafkaWithoutRepartition:: inside writekafka :: kafka properties ::" + properties)
        val kafkaProducer = new KafkaProducer[String, String](properties)
        try {
          log.info("kafka producer property enable.idempotence ::" + properties.getProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG))
          kafkaProducer.initTransactions()
          kafkaProducer.beginTransaction()
          log.info("Inside Method writeKafkaWithoutRepartition:: inside each partition kafka transactions started")
          iter.foreach(row => {
            log.info("Inside Method writeKafkaWithoutRepartition:: inside each iterator record kafka transactions started")
            kafkaProducer.send(new ProducerRecord(topic, row.getAs[String]("value")))
          })
          kafkaProducer.commitTransaction()
          log.info("Inside Method writeKafkaWithoutRepartition:: kafka transactions completed")
        } catch {
          case e @ (_: ProducerFencedException | _: OutOfOrderSequenceException | _: AuthorizationException) =>
            // We can't recover from these exceptions, so our only option is to close the producer and exit.
            log.error("Exception occured while sending records to kafka ::" + e.getMessage)
            kafkaProducer.close()
          case e: KafkaException =>
            // For all other exceptions, just abort the transaction and try again.
            log.error("Exception occured while sending records to kafka ::" + e.getMessage)
            kafkaProducer.abortTransaction()
          case ex: Exception =>
            // For all other exceptions, just abort the transaction and try again.
            log.error("Exception occured while sending records to kafka ::" + ex.getMessage)
            kafkaProducer.abortTransaction()
        } finally {
          kafkaProducer.close()
        }
      })
  }
  def writeWithoutRepartition(df: DataFrame, topic: String, noOfPartitions: Int, kafkaBootstrapServer: String): Unit = {
    val repartitionedDF = df.selectExpr("to_json(struct(*)) AS value")
    log.info("Inside KafkaWriter::writeWithoutRepartition ")
    writeKafkaWithoutRepartition(repartitionedDF, topic, noOfPartitions, kafkaBootstrapServer)
  }
  def getKafkaProducerPropertiesForBulkLoad(kafkaBootstrapServer: String): Properties = {
    val properties = new Properties
    properties.setProperty(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, kafkaBootstrapServer)
    properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
    properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
    properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
    // properties.setProperty(CommonClientConfigs.REQUEST_TIMEOUT_MS_CONFIG, "400000")
    // properties.setProperty(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "300000")
    properties.put(ProducerConfig.RETRIES_CONFIG, "1000")
    properties.put(ProducerConfig.ACKS_CONFIG, "all")
    properties.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1")
    properties
  }
}
I set isolation.level --> read_committed on the Kafka consumer end (see the snippet below).
I tried setting min.insync.replicas --> 2 (in my opinion this property might not play an important role here, but I still tried).
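For reference, the consumer-side setting above is just a property on the standard Kafka consumer configuration; a minimal sketch (the broker address is a placeholder):

import java.util.Properties
import org.apache.kafka.clients.consumer.ConsumerConfig

val consumerProps = new Properties()
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder
// read_committed makes the consumer skip records from aborted or still-open transactions
consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed")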
Spark version: 2.3.1
Kafka client version: 2.2.1
And I am also using transactions while producing messages into Kafka: init, begin, and commit transactions for each message. I am ingesting around 100 million records in one run; I did split the data into smaller chunks, say 100 million divided into 1 million at a time, before ingesting into Kafka.
I also tried using Structured Streaming, still no luck:
df.selectExpr(s""" '${key}' as key """, "to_json(struct(*)) AS value").write.format("kafka").options(getKafkaProducerProperties(topic)).save
I am not sure if I am missing any configurations on the Kafka producer, broker, or at the consumer end.
Not sure if I can include any.
Thanks in advance.
I am trying to create a Play-Scala application that uses a Scala Kafka Consumer to listen to a Kafka broker. I am using the Cake Solutions Scala Kafka Client library, and following their example here.
I have created a containing class to act as a Kafka consumer provider, and I have bound this as an eager singleton so that it is created when the application starts up.
The problem is that the consumer will listen to the broker when the application starts up, but not after that.
Here is my code for the ConsumerProvider:
trait KafkaConsumerProvider {
  def consumer: ActorRef
}

@Singleton
class KafkaConsumerProviderImpl @Inject() (actorSystem: ActorSystem, configuration: Configuration)
  extends KafkaConsumerProvider {

  private val consumerConf: KafkaConsumer.Conf[String, String] = KafkaConsumer.Conf(
    keyDeserializer = new StringDeserializer,
    valueDeserializer = new StringDeserializer,
    bootstrapServers = configuration.get[String]("messageBroker.bootstrapServers"),
    groupId = configuration.get[String]("messageBroker.consumer.groupId"),
    enableAutoCommit = false,
    autoCommitInterval = 1000,
    sessionTimeoutMs = 10000,
    maxPartitionFetchBytes = ConsumerConfig.DEFAULT_MAX_PARTITION_FETCH_BYTES,
    maxPollRecords = 500,
    maxPollInterval = 300000,
    maxMetaDataAge = 300000,
    autoOffsetReset = OffsetResetStrategy.LATEST,
    isolationLevel = IsolationLevel.READ_UNCOMMITTED
  )

  private val actorConf: KafkaConsumerActor.Conf = KafkaConsumerActor.Conf(
    scheduleInterval = 1.seconds,   // scheduling interval for Kafka polling when consumer is inactive
    unconfirmedTimeout = 3.seconds, // duration for how long to wait for a confirmation before redelivery
    maxRedeliveries = 3             // maximum number of times an unconfirmed message will be redelivered
  )

  override val consumer: ActorRef = {
    val receiverActor = actorSystem.actorOf(ReceiverActor.props)
    val topics = configuration.get[String]("messageBroker.consumer.topics").split(",").toSeq
    val _consumer = actorSystem.actorOf(KafkaConsumerActor.props(consumerConf, actorConf, receiverActor))
    _consumer ! Subscribe.AutoPartition(topics)
    _consumer
  }
}
and here is how I am binding the dependency as an eager singleton in Module.scala:
class Module extends AbstractModule with ScalaModule {
  override def configure(): Unit = {
    bind[KafkaMessageBrokerWriter].to[KafkaMessageBrokerWriterImpl].asEagerSingleton()
    bind[KafkaConsumerProvider].to[KafkaConsumerProviderImpl].asEagerSingleton()
  }
}
How do I get the consumer to keep listening?
The problem was that, in the ReceiverActor, I forgot to confirm the offsets:
sender() ! Confirm(records.offsets)
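For completeness, a sketch of what the receiver actor looks like with that confirmation in place, assuming the ConsumerRecords extractor pattern from the Cake Solutions example (the actor and extractor names here are illustrative):

import akka.actor.{Actor, Props}
import cakesolutions.kafka.akka.ConsumerRecords
import cakesolutions.kafka.akka.KafkaConsumerActor.Confirm

object ReceiverActor {
  def props: Props = Props(new ReceiverActor)
  // Extractor that matches batches whose keys and values are Strings
  private val recordsExt = ConsumerRecords.extractor[String, String]
}

class ReceiverActor extends Actor {
  import ReceiverActor._

  override def receive: Receive = {
    case recordsExt(records) =>
      records.pairs.foreach { case (_, value) => println(s"Received: $value") }
      // Without this, the KafkaConsumerActor treats the batch as unconfirmed,
      // keeps redelivering it, and never moves on to new records.
      sender() ! Confirm(records.offsets)
  }
}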
My goal is to use Kafka to read in a string in JSON format, apply a filter to it, and then sink the message out (still as a JSON string).
For testing purposes, my input string message looks like:
{"a":1,"b":2}
And my implementation is:
def main(args: Array[String]): Unit = {
  // parse input arguments
  val params = ParameterTool.fromArgs(args)

  if (params.getNumberOfParameters < 4) {
    println("Missing parameters!\n"
      + "Usage: Kafka --input-topic <topic> --output-topic <topic> "
      + "--bootstrap.servers <kafka brokers> "
      + "--zookeeper.connect <zk quorum> --group.id <some id> [--prefix <prefix>]")
    return
  }

  val env = StreamExecutionEnvironment.getExecutionEnvironment
  env.getConfig.disableSysoutLogging
  env.getConfig.setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000))
  // create a checkpoint every 5 seconds
  env.enableCheckpointing(5000)
  // make parameters available in the web interface
  env.getConfig.setGlobalJobParameters(params)

  // create a Kafka streaming source consumer for Kafka 0.10.x
  val kafkaConsumer = new FlinkKafkaConsumer010(
    params.getRequired("input-topic"),
    new JSONKeyValueDeserializationSchema(false),
    params.getProperties)
  val messageStream = env.addSource(kafkaConsumer)

  val filteredStream: DataStream[ObjectNode] = messageStream.filter(node => node.get("a").asText.equals("1")
    && node.get("b").asText.equals("2"))

  messageStream.print()

  // Refer to: https://stackoverflow.com/documentation/apache-flink/9004/how-to-define-a-custom-deserialization-schema#t=201708080802319255857
  filteredStream.addSink(new FlinkKafkaProducer010[ObjectNode](
    params.getRequired("output-topic"),
    new SerializationSchema[ObjectNode] {
      override def serialize(element: ObjectNode): Array[Byte] = element.toString.getBytes()
    }, params.getProperties
  ))

  env.execute("Kafka 0.10 Example")
}
As can be seen, I want to print the message stream to the console and sink the filtered messages to Kafka. However, I can see neither of them.
The interesting thing is, if I modify the schema of the KafkaConsumer from JSONKeyValueDeserializationSchema to SimpleStringSchema, I can see the messageStream printed to the console. Code as shown below:
val kafkaConsumer = new FlinkKafkaConsumer010(
  params.getRequired("input-topic"),
  new SimpleStringSchema,
  params.getProperties)
val messageStream = env.addSource(kafkaConsumer)
messageStream.print()
This makes me think that if I use JSONKeyValueDeserializationSchema, my input message is actually not accepted by Kafka. But this seems weird and quite different from the online documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html).
Hope someone can help me out!
The JSONKeyValueDeserializationSchema() expects a message key with each Kafka message, and I am assuming that no key is supplied when the JSON messages are produced and sent over the Kafka topic.
Thus, to solve the issue, try using JSONDeserializationSchema(), which expects only the message and creates an ObjectNode based on the message received.
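A minimal sketch of that change against the consumer in the question, assuming the JSONDeserializationSchema shipped with the Flink 1.2.x Kafka connector:

// Same source as before, but the schema parses only the record value into an
// ObjectNode and does not require a message key.
val kafkaConsumer = new FlinkKafkaConsumer010[ObjectNode](
  params.getRequired("input-topic"),
  new JSONDeserializationSchema(),
  params.getProperties)

val messageStream: DataStream[ObjectNode] = env.addSource(kafkaConsumer)
messageStream.print()

With this schema the JSON fields sit at the top level of the ObjectNode, so the existing node.get("a") / node.get("b") filter can stay as it is; with JSONKeyValueDeserializationSchema the payload is nested under a "value" field instead.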
I am trying to implement a setup where multiple web browsers open a websocket connection to my akka-http server in order to read all messages posted to a Kafka topic.
So the stream of messages should go this way:
kafka topic -> akka-http -> websocket connection 1
                         -> websocket connection 2
                         -> websocket connection 3
For now I have created a path for the websocket:
val route: Route =
  path("ws") {
    handleWebSocketMessages(notificationWs)
  }
Then I have created a consumer for my kafka topic:
val consumerSettings = ConsumerSettings(system, new ByteArrayDeserializer, new StringDeserializer)
  .withBootstrapServers("localhost:9092")
  .withGroupId("group1")
  .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

val source = Consumer
  .plainSource(consumerSettings, Subscriptions.topics("topic1"))
And then finally I want to connect this source to the websocket in handleWebSocketMessages
def handleWebSocketMessages: Flow[Message, Message, Any] =
  Flow[Message].mapConcat {
    case tm: TextMessage =>
      TextMessage(source) :: Nil
    case bm: BinaryMessage =>
      // ignore binary messages but drain content to avoid the stream being clogged
      bm.dataStream.runWith(Sink.ignore)
      Nil
  }
Here is the error I get when I try to use source in the TextMessage:
Error:(77, 9) overloaded method value apply with alternatives:
(textStream: akka.stream.scaladsl.Source[String,Any])akka.http.scaladsl.model.ws.TextMessage
(text: String)akka.http.scaladsl.model.ws.TextMessage.Strict
cannot be applied to (akka.stream.scaladsl.Source[org.apache.kafka.clients.consumer.ConsumerRecord[Array[Byte],String],akka.kafka.scaladsl.Consumer.Control])
TextMessage(source)::Nil
I think I'm making numerous mistakes along the way, but I would say that the most blocking part is handleWebSocketMessages.
The first thing is to understand that source is of type Source[ConsumerRecord[K, V], Control].
So it's not something that you can pass as an argument to a TextMessage.
Now, let's take the websocket's point of view:
An outgoing message is built for each message in the Kafka source. The message will be a TextMessage built from a String transformation of the Kafka message.
For each incoming message, just println() it.
So the Flow can be seen as two components: the Source and the Sink.
val incomingMessages: Sink[Message, Any] =
  Sink.foreach(println(_))

val outgoingMessages: Source[Message, Any] =
  source
    // plainSource emits ConsumerRecord[K, V], so the value can be read directly
    .map { consumerRecord => TextMessage(consumerRecord.value) }

val handleWebSocketMessages: Flow[Message, Message, Any] =
  Flow.fromSinkAndSource(incomingMessages, outgoingMessages)
Hope it helps.
I have an application that should send a finite number of messages to Kafka and then quit. For some reason, the Kafka connection stays up even if I close the producer. My implementation (in Scala) is more or less
object Kafka {
  private val props = new Properties()
  props.put("compression.codec", DefaultCompressionCodec.codec.toString)
  props.put("producer.type", "sync")
  props.put("metadata.broker.list", "localhost:9092")
  props.put("batch.num.messages", "200")
  props.put("message.send.max.retries", "3")
  props.put("request.required.acks", "-1")
  props.put("client.id", "myclient")

  private val producer = new Producer[Array[Byte], Array[Byte]](new ProducerConfig(props))

  private def encode(msg: Message) = new KeyedMessage("topic", msg.id.getBytes, write(msg).getBytes)

  def send(msg: Message) = Try(producer.send(encode(msg)))

  def close() = producer.close()
}
Here Message is a simple case class, and how I convert it to a byte array is not really relevant.
The messages do arrive, but when I eventually call Kafka.close(), the application does not exit, and the connection does not seem to be released.
Is there a way to explicitly ask Kafka to terminate the connection?
def close() = producer.close()
This creates a function called close that calls producer.close(); defining it does not call it.
I see no evidence that your code actually closes the producer.
You need to actually call:
producer.close
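In other words, the sending side has to invoke it explicitly once all messages are out; a minimal sketch, where messages stands for whatever finite collection the application produces:

// Send the finite batch, then close the producer so its network threads
// shut down and the JVM can exit.
messages.foreach(Kafka.send)
Kafka.close()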