When creating a Kafka producer, we can assign a client id. What is it used for? Can a consumer get the producer's client id, for example to see which producer produced a message?
No, a consumer cannot get the producer's client-id.
From the Kafka documentation, client-ids are:
An id string to pass to the server when making requests. The purpose
of this is to be able to track the source of requests beyond just
ip/port by allowing a logical application name to be included in
server-side request logging.
They are only used for identifying clients in the broker logs.
No, you'd have to pass it on as part of the key or value (or in a record header) if you need it at the consumer side.
Kafka's philosophy is to decouple producers and consumers. A topic can be read by 0-n consumers and be written to by 0-n producers. Kafka is usually used for communication between (micro)service boundaries where services don't care about who produced a message, just about its contents.
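If a consumer really does need to know who produced a record, the producer has to put that information into the record itself. The following is a minimal sketch in plain Python with no Kafka client involved; the envelope layout and the names wrap_value, unwrap_value and producer_id are illustrative assumptions, not part of any Kafka API.

```python
import json
from typing import Dict, Tuple


def wrap_value(producer_id: str, payload: Dict) -> bytes:
    """Producer side: serialize the payload together with the producer's identity."""
    envelope = {"producer_id": producer_id, "payload": payload}
    return json.dumps(envelope).encode("utf-8")


def unwrap_value(raw: bytes) -> Tuple[str, Dict]:
    """Consumer side: recover the producer identity and the payload."""
    envelope = json.loads(raw.decode("utf-8"))
    return envelope["producer_id"], envelope["payload"]


# These bytes are what would be sent as the record value.
raw_value = wrap_value("billing-service-1", {"order": 42})

# The consumer only sees the bytes, yet can still tell who produced them.
sender, payload = unwrap_value(raw_value)
```

The client id configured on the producer plays no role here; the identity travels only because the application chose to include it.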
How does the pubsub work in Kafka?
I was reading about Kafka topic-partition theory, and it mentioned that within one consumer group, each partition is processed by only one consumer. Now there are two cases:
If the producer doesn't specify a partition or a message key, messages are distributed evenly across the partitions of the topic. If this is the case, and there can be only one consumer (or subscriber, in pub/sub terms) per partition, how do all the subscribers receive the same message?
If a producer produces to a specific partition, how do the other consumers (or subscribers) receive the message?
How does pub/sub work in each of the above cases? If only a single consumer can be attached to a specific partition, how do other consumers receive the same message?
Kafka prevents more than one consumer in a group from reading a single partition. If you have a use case where multiple consumers in the same consumer group need to process a particular event, then Kafka is probably the wrong tool. Otherwise, you need to write code outside the Kafka API to forward one consumer's events to other services via other protocols. The Kafka Streams Interactive Queries feature (with an RPC layer) is one example of this.
Or you would need many distinct consumer groups, each of which reads the same event.
The answer doesn't change when producers send data to specific partitions, since "evenly distributed" records have also already been placed in a partition by the time a consumer sees them. Consumers are assigned specific partitions, and that assignment is not coordinated with any producer.
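The group behaviour described above can be sketched as a small simulation. This is plain Python rather than a Kafka client, and the round-robin assign_partitions helper is a simplified stand-in for Kafka's real partition assignors; the group and consumer names are made up for illustration.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of a group (round-robin)."""
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)


partitions = [0, 1, 2, 3]
groups = {
    "group-a": ["a1", "a2"],  # two consumers split the partitions
    "group-b": ["b1"],        # a single consumer owns every partition
}

# A record written to partition 2 is delivered once per group,
# to whichever member of that group owns partition 2.
record_partition = 2
receivers = []
for group, members in groups.items():
    for consumer, owned in assign_partitions(partitions, members).items():
        if record_partition in owned:
            receivers.append(consumer)
```

Within group-a only a1 receives the record, while b1 receives it as well because it belongs to a different group. That is how several subscribers end up seeing the same message.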
A kafka message has:
key, value, compression type, headers(key-value pairs,optional),
partition+offset, timestamp
The key is hashed to determine which partition the producer writes to.
Then why do we need the partition as part of the message?
Also, how does the producer know the offset, since the offset seems more like a property of the Kafka server? And doesn't that cause coupling between server and producer?
And how would it work if multiple producers are writing to a topic, since the offsets sent by them may clash?
why do we need partition as part of message.
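It's optional for the client to set the record partition. The partition is still needed in the protocol because the key is not hashed server-side and then rerouted: the client picks the partition (from the key, or explicitly) and sends the record to the broker that leads that partition.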
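A rough sketch of client-side partition selection follows. It is plain Python, and zlib.crc32 is used only as a stand-in hash: the Java client's default partitioner hashes keys with murmur2, so the partition numbers here will not match a real cluster. The function name choose_partition is hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int, explicit_partition=None) -> int:
    """Pick the target partition on the client, before anything is sent.

    An explicitly set partition wins; otherwise the key is hashed.
    crc32 is a stand-in here, not the hash Kafka actually uses.
    """
    if explicit_partition is not None:
        return explicit_partition
    return zlib.crc32(key) % num_partitions


# The same key always maps to the same partition...
hashed_first = choose_partition(b"user-1", 6)
hashed_again = choose_partition(b"user-1", 6)

# ...unless the caller overrides the choice.
forced = choose_partition(b"user-1", 6, explicit_partition=4)
```

Because the choice is made before the request leaves the client, the request has to name the partition it is addressed to.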
how does producer know the offset as offset seems more like a property of kafka server?
The producer does not know the offset when a batch is sent. It would need a callback to get the RecordMetadata, which carries the offset once the broker has acknowledged the write.
And doesn't it cause coupling between server and producer?
Yes? It's the Kafka protocol. The consumer is also "coupled" with the server because it must understand how to communicate with it.
multiple producers are writing to a topic, as offset sent by them may clash?
Offsets cannot clash, because clients do not set the record offset; the broker does. What can happen is reordering: if max.in.flight.requests.per.connection is more than 1 and retries are enabled, batches may get rearranged, even though send requests are initially ordered.
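To illustrate why offsets from several producers cannot collide, here is a toy model of a single partition log in plain Python. PartitionLog is a hypothetical class written for this example, not a Kafka API; the only point it makes is that the offset is assigned at append time by the log's owner.

```python
class PartitionLog:
    """Toy partition log: the broker side assigns offsets on append."""

    def __init__(self):
        self.records = []

    def append(self, producer: str, value: str) -> int:
        # The offset is simply the next position in the log; the
        # producer never supplies it.
        offset = len(self.records)
        self.records.append((offset, producer, value))
        return offset


log = PartitionLog()

# Two producers interleave their writes to the same partition.
offsets = [
    log.append("producer-1", "a"),
    log.append("producer-2", "b"),
    log.append("producer-1", "c"),
]
```

Each append receives the next free offset regardless of which producer sent it, and the producer only learns the value from the acknowledgement.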
As part of a design decision at my client site, the components (microservices) involved in the HTTP request-response flow are allowed to produce messages to a Kafka topic, but not allowed to consume messages from a Kafka topic.
Such components can read and write the database, talk to other components, and produce messages to a topic, but cannot consume messages from a Kafka topic.
Instead, the design suggests writing separate utilities that consume messages from Kafka topics and store them in the database. Components involved in the request-response flow then read that information from the database.
What are the design flaws if such components consume Kafka topics directly? Why does the design suggest writing separate utilities to consume the topics and store the messages in the database, so that components can read that information from the database?
Kafka topics are divided into partitions, and for each consumer group, the partitions are distributed among the consumers in that group. Each consumer is responsible for consuming the messages in the partitions it gets assigned.
Presumably, your request handling machines are clustered and load balanced. There are two ways you might have those machines subscribe to Kafka topics, and both of those ways are broken:
You could put your request handling machines in different consumer groups. In this case, each one will have to consume all of the messages. That is probably not what you want; typically you want each consumer to pull from the queue and each message to be processed only once. Also, the consumers will be out of sync and will process the messages at different rates.
You could put your request handling machines in the same consumer group. In this case, each one will only have access to the partitions that it is assigned. Every machine will see a different message stream. This, of course, is also not what you want. Clients would get different results depending on which machine the load balancer directed them to.
If you want all of your request handling machines to pull from the same queue of messages across the whole topic, then they need to communicate with a single consumer that is assigned all the partitions.
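The arrangement the design proposes can be sketched as follows: one consumer owns all partitions and writes into a shared database, and every request handler reads from that database. This is a plain Python simulation using an in-memory SQLite database; the table name, the columns and the functions consume_into_db and handle_request are assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE prices (sku TEXT PRIMARY KEY, price REAL)")


def consume_into_db(records):
    """The single consumer: upsert every record it reads into the database."""
    for sku, price in records:
        db.execute(
            "INSERT OR REPLACE INTO prices (sku, price) VALUES (?, ?)",
            (sku, price),
        )
    db.commit()


def handle_request(sku):
    """Any request handler: answer from the shared database, not from Kafka."""
    row = db.execute("SELECT price FROM prices WHERE sku = ?", (sku,)).fetchone()
    return row[0] if row else None


# Records as they might arrive from all partitions of the topic.
consume_into_db([("apple", 1.0), ("pear", 2.5), ("apple", 1.2)])
```

Every handler that calls handle_request("apple") gets the same answer, whichever machine the load balancer picked, because they all read the state built by the one consumer.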
Does an idempotent producer have to be transactional in order to ensure idempotency when publishing to a multi-partitioned topic? After reading Kafka documentation I am still unsure if it does or not.
My environment is Kafka 1.0 cluster and Kafka 1.1 client.
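An idempotent producer is assigned an id that is sent along with its messages, together with a sequence number per partition. With these, the partition's leader broker is able to say 'Oh, I already handled this message' and discard a retried duplicate.
The idempotent producer and transactional messaging are two different mechanisms for achieving exactly-once messaging semantics: idempotence de-duplicates retries within each partition, whereas transactions make writes across multiple partitions atomic.
So, no: a producer does not have to be transactional to be idempotent on a multi-partition topic.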
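The de-duplication idea can be sketched per partition as below. This is a deliberately simplified toy in plain Python: PartitionLeader is a hypothetical class, and a real broker tracks sequence numbers per batch and also rejects gaps, which this sketch does not attempt.

```python
class PartitionLeader:
    """Toy leader for one partition that drops retried duplicates."""

    def __init__(self):
        self.log = []
        self.last_sequence = {}  # producer id -> highest sequence appended

    def append(self, producer_id: int, sequence: int, value: str) -> bool:
        if sequence <= self.last_sequence.get(producer_id, -1):
            return False  # already appended: ignore the retry
        self.log.append(value)
        self.last_sequence[producer_id] = sequence
        return True


# One leader per partition; each keeps its own sequence state.
leaders = {0: PartitionLeader(), 1: PartitionLeader()}

leaders[0].append(7, 0, "a")
leaders[1].append(7, 0, "b")

# The acknowledgement for "a" was lost, so the producer retries it.
retry_accepted = leaders[0].append(7, 0, "a")
```

Each partition de-duplicates independently, which is why no transaction is needed just to avoid duplicates on a topic with several partitions.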
I have one Kafka producer and one consumer. The producer publishes to one topic; the data is picked up from there and some processing is done. The consumer reads from another topic whether the processing of the data from topic 1 was successful or not, i.e. topic 2 contains success or failure messages. Currently I start my consumer and then publish the data to topic 1. I want to make the producer and consumer synchronous, i.e. once the producer publishes the data, the consumer should read the success or failure message for that data, and only then should the producer proceed with the next set of data.
Apache Kafka, and publish/subscribe messaging in general, seek to decouple producers and consumers through the use of streaming asynchronous events. What you are describing is more like a batch job or a synchronous Remote Procedure Call (RPC), where the producer and consumer are explicitly coupled together. The standard Apache Kafka Producer/Consumer APIs do not support this message exchange pattern, but you can always write your own simple wrapper on top of the Kafka APIs that uses correlation IDs, consumption ACKs, and request/response messages to make your own interface that behaves as you wish.
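A wrapper of that kind might look roughly like the sketch below. It is plain Python with two in-memory queues standing in for topic 1 and topic 2, and the worker is called inline instead of running as a separate process; send_and_wait, worker and the success rule are all invented for illustration.

```python
import uuid
from collections import deque

requests = deque()  # stands in for topic 1
replies = deque()   # stands in for topic 2


def worker():
    """Process one request and publish the outcome under the same correlation id."""
    correlation_id, data = requests.popleft()
    status = "success" if data >= 0 else "failure"
    replies.append((correlation_id, status))


def send_and_wait(data):
    """Publish one item, then block until the reply with the matching id arrives."""
    correlation_id = str(uuid.uuid4())
    requests.append((correlation_id, data))
    worker()  # in a real system this runs elsewhere, asynchronously
    while replies:
        reply_id, status = replies.popleft()
        if reply_id == correlation_id:
            return status
    raise TimeoutError("no reply for " + correlation_id)


# The caller only moves on to the next item once the previous one is answered.
results = [send_and_wait(x) for x in (1, -1, 5)]
```

With real topics, the wait would become a consumer poll loop with a timeout, and replies carrying a different correlation id would have to be kept for other callers rather than dropped.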
Short Answer : You can't do that, Kafka doesn't provide that support.
Long Answer: As Hans explained, the publish/subscribe messaging model keeps publishers and subscribers completely unaware of each other, and I believe that is where the power of this model lies. A producer can produce without worrying about whether there is any consumer, and a consumer can consume without worrying about how many producers there are.
The closest you can get is to make your producer synchronous, which means waiting until your message is received and acknowledged by the broker.
If you want to do that, flush after every send.