Suppose my producer is writing a message to Topic A. Once the message is in Topic A, I want to copy the same message to Topic B. Is this possible in Kafka?
If I understand correctly, you just want stream.to("topic-b"), although that seems strange if you are not doing anything to the data.
Note: The specified topic should be manually created before it is used.
I am not clear what use case you are trying to achieve by simply copying data from one topic to another. If both topics are in the same Kafka cluster, it is never a good idea to have two topics with the same content.
I believe the gap here is that you are probably not clear about the concept of a consumer group in Kafka. You probably have two actions to perform on each message consumed from the Kafka topic, and you are wondering whether a message consumed by the first application will still be available for the second application to consume. Kafka solves this common use case with consumer groups.
Let's differentiate between Kafka and other message queues, and you will see that you do not need to copy the same message between two topics.
In other message queues, such as SQS (Simple Queue Service), once a message has been consumed by one consumer, it is not available to other consumers. It is the consumer's responsibility to delete the message safely once it has processed it. This is meant to ensure that the same message is not processed by two consumers, which would lead to inconsistency.
But in Kafka, it is totally fine to have multiple sets of consumers consuming from the same topic. Each set of consumers forms a group, commonly termed a consumer group. Within a group, each partition of the topic is assigned to one consumer, so a given message is processed by only one consumer of that group.
Now the catch is that multiple consumer groups can consume from the same Kafka topic. Each consumer group processes the messages in whatever way it wants, and there is no interference between the consumers of two different consumer groups.
To fulfill your use case, I believe you need two consumer groups, each of which can simply process the messages in its own way. You do not need to copy the data between two topics.
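The idea that groups do not interfere with each other can be sketched with a toy in-memory model. This is not the Kafka client API; the class name, the group names "billing" and "shipping", and the message values are made up for illustration. It only shows the principle: reading does not remove anything from the log, and every group tracks its own position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model (NOT the Kafka client API): one append-only log plus
// an independent read position ("offset") for every consumer group.
final class GroupOffsetsDemo {
    private final List<String> log = new ArrayList<>();            // reading never removes messages
    private final Map<String, Integer> offsets = new HashMap<>();  // group id -> next index to read

    void produce(String message) {
        log.add(message);
    }

    // Returns every message the given group has not seen yet and advances
    // only that group's offset; other groups are unaffected.
    List<String> poll(String groupId) {
        int from = offsets.getOrDefault(groupId, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        offsets.put(groupId, log.size());
        return batch;
    }

    public static void main(String[] args) {
        GroupOffsetsDemo topicA = new GroupOffsetsDemo();
        topicA.produce("order-1");
        topicA.produce("order-2");
        // Hypothetical group names: two applications, two groups, one topic.
        System.out.println("billing  sees " + topicA.poll("billing"));
        System.out.println("shipping sees " + topicA.poll("shipping"));
        System.out.println("billing  sees " + topicA.poll("billing")); // nothing new for this group
    }
}
```

With a real consumer, the equivalent step is simply giving each application its own group.id.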
Hope this helps.
There are two immediate options to forward the contents of one topic to another:
1. by using the stream feature of Kafka to create a forwarding link between the two topics.
2. by creating a consumer / producer pair and using those to receive and then forward on messages.
I have a short piece of code that shows both (in Scala):
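import java.time.Duration
import scala.collection.JavaConverters._ // asJava / asScala (scala.jdk.CollectionConverters._ on Scala 2.13+)
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.kstream.KStream
// Implicit String serdes for stream/to; assumed here, the Serdes package name varies by Kafka version.
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.Serdes._

// Streams topology: forward everything from topic2 to topic3.
def topologyPlan(): StreamsBuilder = {
  val builder = new StreamsBuilder
  val inputTopic: KStream[String, String] = builder.stream[String, String]("topic2")
  inputTopic.to("topic3")
  builder
}

// createStreams, createConsumer and createProducer are defined in the full class.
def run(): Unit = {
  val kafkaStreams = createStreams(topologyPlan())
  kafkaStreams.start()
  val kafkaConsumer = createConsumer()
  val kafkaProducer = createProducer()
  kafkaConsumer.subscribe(List("topic1").asJava)
  // Consumer / producer pair: forward everything from topic1 to topic2.
  while (true) {
    val records = kafkaConsumer.poll(Duration.ofSeconds(5)).asScala
    for (data <- records) {
      kafkaProducer.send(new ProducerRecord[String, String]("topic2", data.value()))
    }
  }
}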
Looking at the run method, the first two lines set up a streams object that uses topologyPlan() to listen for messages in 'topic2' and forward them to 'topic3'.
The remaining lines show how a consumer can listen to 'topic1' and use a producer to send the messages onward to 'topic2'.
The final point of the example is that Kafka is flexible enough to let you mix options depending on what you need, so the code above will take messages in 'topic1' and send them to 'topic3' via 'topic2'.
If you want to see the code that sets up consumer, producer and streams, see the full class here.
How does the pubsub work in Kafka?
I was reading about Kafka topic-partition theory, and it mentioned that within one consumer group, each partition is processed by only one consumer. Now there are two cases:
If the producer does not specify a partition key or message key, messages are distributed evenly across the partitions of the topic. If this is the case, and there can be only one consumer (or subscriber, in the case of pub-sub) per partition, how do all the subscribers receive the same message?
If a producer produces to a specific partition, how do the other consumers (or subscribers) receive the message?
How does pub-sub work in each of the above cases? If only a single consumer can be attached to a specific partition, how do other consumers receive the same message?
Kafka prevents more than one consumer in a group from reading a single partition. If you have a use case where multiple consumers in one consumer group need to process a particular event, then Kafka is probably the wrong tool. Otherwise, you need to write code external to the Kafka API to transmit one consumer's events to other services via other protocols. The Kafka Streams Interactive Queries feature (with an RPC layer) is one example of this.
Or you would need many unique consumer groups, each reading the same event.
The answer doesn't change when producers send data to specific partitions, since the "evenly distributed" partitions are still pre-computed as far as the consumer is concerned. Consumers are assigned specific partitions, and that assignment is not coordinated with any producer.
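To make the "one consumer per partition, per group" rule concrete, here is a toy round-robin assignment. It is only an illustration: Kafka's real assignors differ in detail, and the class, method, and consumer names are hypothetical. The point is that the assignment is computed separately for each group, so every group covers all partitions on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy round-robin assignment (illustrative only; not Kafka's actual assignor code).
final class PartitionAssignmentDemo {
    // Spreads partitions 0..partitionCount-1 over the consumers of ONE group,
    // so each partition ends up with exactly one owner in that group.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String consumer : consumers) {
            owned.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            owned.get(owner).add(partition);
        }
        return owned;
    }

    public static void main(String[] args) {
        // Two groups subscribed to the same 3-partition topic.
        System.out.println("group A: " + assign(Arrays.asList("a1", "a2"), 3));
        System.out.println("group B: " + assign(Arrays.asList("b1"), 3));
    }
}
```

In this sketch, group A splits the three partitions between its two members, while the single consumer in group B owns all three, so both groups still see every message.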
I have a use case where I want to consume from a Kafka topic and, depending on some logic, if I am not able to process the message right now, I want to enqueue the message back to the same topic it was read from.
Something like this:
Topic1 ---> Consumer ---> Can't process now
  ^                               |
  |__________Re-enqueues__________|
Is it possible?
Yes, this is possible.
However, be aware that, depending on your retention settings, the re-ingested message might exist in the topic multiple times. Also, the consumer will consume all messages as long as it is running, so it can end up having consumed all the valid messages while re-ingesting the remaining ones over and over again.
The typical pattern to deal with messages that should be re-ingested into your pipeline is to send them to a dedicated Kafka topic. Once your consumer is fixed to be able to process those messages you can then have your consumer read that dedicated topic just once.
As Kafka has a topic-based pub-sub architecture, how can I handle the One-to-One and Group Messaging parts of a web application using Kafka?
I am using SpringBoot+Angular stack and Docker Kafka server.
I'll write another answer here.
Based on my experience with a chat service, you only need one topic for all the messages, with a well-designed message body.
public class Message {
    private String from; // user id
    private String to;   // user id or group id
}
Then you can create, say, 100 partitions for this topic and start two consumers to consume them (50 partitions per consumer in the beginning).
Then, if your system hits a bottleneck, you can easily scale out with X more consumers to handle the load.
How do you distribute the messages from the consumer? I used to send the messages to a mobile app: every app instance has a long-lived connection to the server, and the server sends the messages to the app over that channel. For group chat, I created a Redis cache that stores all the active users in each group, so I can easily look up the users who belong to a group and send them the messages.
Another thing: Kafka should stay decoupled from your business logic and act only as a messaging system that transfers messages. If you tie your business logic to Kafka, for example by creating "One-to-One" topics and deleting them after the conversations finish, Kafka will become very messy.
By One-to-One, I suppose you mean one producer and one consumer, i.e. using it as a queue.
This is certainly possible with Kafka. You can have one consumer subscribe to a topic and restrict others by not giving them authorization. See Authorization in Kafka.
Note that once a message is consumed, it is not deleted; rather, its offset is committed so that the same consumer will not consume it again.
By Group Messaging, I suppose you mean one producer > multiple consumers or multiple producers > multiple consumers.
This is also possible, a producer can produce messages to a topic and multiple consumers can consume them.
If all the consumers have the same group id, then each consumer in the group gets only a subset of messages.
If they have different group ids then each consumer will get all messages.
Multiple producers also can produce to the same topic.
A consumer can also subscribe to multiple topics.
OK, it's a very complicated question, so I will try to give some simple, basic information.
Kafka topics are divided into a number of partitions. Partitions allow you to parallelize a topic by splitting the data in a particular topic across multiple brokers — each partition can be placed on a separate machine to allow for multiple consumers to read from a topic in parallel.
So if you are using partitions, it means you can have multiple consumers consuming the topic in parallel.
Consumers are organized into consumer groups for a given topic: each consumer within the group reads from a unique partition, and the group as a whole consumes all messages from the entire topic.
Basically, if you have only one group, each message is processed by only one consumer in that group. Note that this alone is not an exactly-once guarantee: with default settings delivery is at-least-once, so a message can be processed again after a failure or rebalance.
If you need two consumer groups, you need to think about why you need two. Are the consumers in the two groups handling different logic?
There is more; please check the official documentation, or ask a smaller question.
We are using Kafka to store messages that are produced by one node in our cluster and distributed to all nodes in the cluster. I have it mostly working with akka-streams, but there are a couple of questions I need answered to tie this up. There are some constraints.
First of all, the message has to be consumed by every node in the cluster but produced by only one node. I understand I can assign each node its own group id (probably its node ID), which means each node will get the message. That's sorted. But here are the questions.
The data is extremely transient and fairly large (just under a meg) and cannot be compressed further or broken up. If there is a new message on the topic, the old one is pretty much trash. How can I limit the topic to basically just one message at most?
Given that the data is necessary for the node to start, I need to consume the latest message on the topic no matter whether I have consumed it before and, hopefully, without creating a unique group id every time I start the server. Is this possible and, if so, how can it be done?
Finally, the data is usually on the topic, but on occasion it is not there, and ideally I need to be able to check whether there is a message and, if not, ask the producer to create it. Is this possible?
This is the code I am currently using to start the consumer:
private Control startMatrixConsumer() {
    final ConsumerSettings<Long, byte[]> matrixConsumerSettings = ConsumerSettings
        .create(services.actorSystem(), new LongDeserializer(), new ByteArrayDeserializer())
        .withBootstrapServers(services.config().getString("kafka.bootstrapServers"))
        .withGroupId("group1") // todo put in the conf ??
        .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
    final String topicName = Matrix.class.getSimpleName() + '-' + eventId;
    final AutoSubscription subscription = Subscriptions.topics(topicName);
    // Fixed: the variable is matrixConsumerSettings (lower-case m), as declared above.
    return Consumer.plainSource(matrixConsumerSettings, subscription)
        .named(Matrix.class.getSimpleName() + "-Kafka-Consumer-" + eventId)
        .map(data -> {
            final Matrix matrix = services.kryoDeserialize(data.value(), Matrix.class);
            log.debug(format("Received %s for event %d from Kafka", Matrix.class.getSimpleName(), matrix.getEventId()));
            return matrix;
        })
        .filter(Objects::nonNull)
        .to(Sink.actorRef(getSelf(), NotUsed.getInstance()))
        .run(ActorMaterializer.create(getContext()));
}
Thanks a bunch.
The message has to be consumed by every node in the cluster but produced by only one.
You are correct, you can achieve this by having a unique group id per node.
How can I limit the topic to basically just one message currently maximum?
Kafka provides compacted topics.
A compacted topic retains only the most recent message for a given key. For instance, Kafka stores consumer offsets in a compacted topic.
In your case, produce every message with the same key, and the Kafka Log Cleaner will delete old messages. Please be aware that compaction is performed periodically, so you can end up with two (or more) messages with the same key for a short period of time (depending on your Log Cleaner configuration).
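As a rough sketch, the topic-level settings involved could be collected like this. The config names (cleanup.policy, min.cleanable.dirty.ratio, segment.ms) are standard Kafka topic configs, but the values and the class name are illustrative assumptions, not recommendations; tune them for your cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of topic-level settings for a compacted topic.
// The values below are assumptions chosen for illustration only.
final class CompactedTopicConfig {
    static Map<String, String> topicConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        // Keep only the latest record per key instead of deleting by age or size.
        config.put("cleanup.policy", "compact");
        // Let the Log Cleaner run when only a small share of the log is uncompacted.
        config.put("min.cleanable.dirty.ratio", "0.01");
        // Roll segments often: the active segment is not compacted.
        config.put("segment.ms", "60000");
        return config;
    }

    public static void main(String[] args) {
        topicConfig().forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

A map like this can be supplied when the topic is created, for example through the admin client's NewTopic.configs(...) or as --config options of the kafka-topics tool. Even with aggressive settings, old records for the key can linger until the cleaner has run.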
I need to consume the latest message on the topic no matter whether I have consumed it before.
You can achieve this by not committing the consumer offset (enable.auto.commit set to false) and setting auto.offset.reset to earliest. With one message in your compacted topic and a consumer that starts from the beginning of the topic, that message is always consumed after the node starts.
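A minimal sketch of those two settings as plain consumer properties follows. The helper name, the broker address, and the group id in main are hypothetical; the property names are standard consumer configs. Note that auto.offset.reset only takes effect when the group has no committed offset, which is why offsets must never be committed for this to work on every start.

```java
import java.util.Properties;

// Sketch: consumer settings so that every start reads from the beginning of the topic.
final class StartupConsumerConfig {
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Never store a position for this group...
        props.setProperty("enable.auto.commit", "false");
        // ...so that each start falls back to the earliest available offset.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker address and per-node group id.
        consumerProps("localhost:9092", "node-1").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In the consumer from the question, the same values could presumably be passed with the withProperty(...) calls it already uses, replacing "latest" with "earliest".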
I need to be able to check if there is a message there and if not ask the producer to create the message.
Unfortunately, I am not aware of any Kafka functionality that could help you with that. Most of the time Kafka is used to decouple producers and consumers.
If one Kafka consumer application is reading messages from Kafka, the other is not able to read them, and vice versa.
We are running two independent applications: one processes the messages and the other reads them and puts them into a database.
A message that is being processed in the first application is not available in the second application.
Without seeing the code I can only guess ... :-)
I think that you have a topic with only one partition and both consumer applications are in the same consumer group. In this case, only one consumer gets messages from the single partition in the topic.
If you want both applications to receive messages from the same topic, you need to put them into different consumer groups (the group.id parameter in the consumer properties).