kafka consumer Connection check - apache-kafka

I want to check the connection of the Kafka consumer at a remote location.
It is possible to determine whether the consumer is allocated to the partition.
At a remote location, i can get detailed information about the topic from the Kafka broker.
But can the consumer guarantee that the consumer is able to receive the message that the consumer is matched to the topic's partition?

You can use Kafka Rest Proxy to determine whether a partition has been assigned to the consumer or not.
I think, Kafka Rest Proxy have all functionality which you are looking for.

Related

How client will automatically detect a new leader when the primary one goes down in Kafka?

Consider the below scenario:
I have a Kakfa broker cluster(localhost:9002,localhost:9003,localhost:9004,localhost:9005).
Let's say localhost:9002 is my primary(leader) for the cluster.
Now my producer is producing data and sending it to the broker(localhost:9002).
If my primary broker(localhost:9002) goes down, with the help of Zookeeper or some other consensus algorithm new leader will be elected(consider localhost:9003 is now the new leader).
So, in the above scenario can someone please explain to me how the Kafka client(producer) will get notified about the new broker configuration(localhost:9003) and how it will connect to the new leaders and start producing data again.
Kafka clients are receiving the necessary meta information from the cluster automatically on each request when reading from or writing to a topic in case of a leadership change.
In general, the client sends a (read/write) request to one of the bootstrap server, listed in the configuration bootstrap.servers. This initial request (hence called bootstrap) returns the details on which broker the topic partition leader is located so that the client can communicate directly with that broker. Each individual broker contains all meta information for the entire cluster, meaning also having the knowledge on the partition leader of other brokers.
Now, if one of your broker goes down and the leadership of a topic partition switches, your producer will get notified about it through that mechanism.
There is a KafkaProducer configuration called metadata.max.age.ms which you can modify to update metadata on your producer even if there is no leadership change happening:
"Controls how long the producer will cache metadata for a topic that's idle. If the elapsed time since a topic was last produced to exceeds the metadata idle duration, then the topic's metadata is forgotten and the next access to it will force a metadata fetch request."
Just a few notes on your question:
The term "Kafka broker cluster" does not really exists. You have a Kafka cluster containing one or multiple Kafka brokers.
You do not have a broker as a "primary(leader) for the cluster" but you have for each TopicPartition a leader. Maybe you mean the Controller which is located on one of the brokers within your cluster.

Apache Kafka: how to configure message buffering properly

I run a system comprising an InfluxDB, a Kafka Broker and data sources (sensors) producing time series data. The purpose of the broker is to protect the database from inbound event overload and as a format-agnostic platform for ingesting data. The data is transferred from Kafka to InfluxDB via Apache Camel routes.
I would like to use Kafka a intermediate message buffer in case a Camel route crashes or becomes unavailable - which is the most often error in the system. Up to now, I didn’t achieve to configure Kafka in a manner that inbound messages remain available for later consumption.
How do I configure it properly?
The messages will retain in Kafka topics based on its retention policies (you can choose between time or byte size limits) as described in the Topic Configurations. With
cleanup.policy=delete
Retention.ms=-1
the messages will in a Kafka topic will never be deleted.
Then your camel consumer will be able to re-read all messages (offsets) if you select a new consumer group or reset the offsets of the existing consumer group. Otherwise, your camel consumer might auto commit the messages (check corresponding consumer configuration) and it will not be possible to re-read offsets again for the same consumer group.
To limit the consumption rate of the camel consumer you may adjust configurations like maxPollRecords or fetchMaxBytes which are described in the docs.

Does kafka broker always check if its the leader while responding to read/write request

I am seeing org.apache.kafka.common.errors.NotLeaderForPartitionException on my producer which I understand happens when producer tries to produce messages to a broker which is not a leader for the partition.
Does that mean each time a leader fulfills a write request it first checks if its the leader or not?
If yes does that translates to a zookeeper request for every write request to know if the node is the leader?
How Producer Get MetaData About Brokers
The producer sends a meta request with a list of topics to one of the brokers you supplied when configuring the producer.
The response from the broker contains a list of partitions in those topics and the leader for each partition. The producer caches this information and therefore, it knows where to redirect the messages.
When Producer Will Refresh MetaData
I think this depends what kafka client you used.There are some small differents between ruby, java or other kafka client.for example, in java:
producer will fetch metadata when client initialize,then period update it depends on expiration time.
producer also will force update metadata when request error occured,such as InvalidMetadataException.
But in ruby-kafka client, it usually refresh metadata when error occured or initialize.

Is it possible to kill a consumer from the Kafka server?

When I check for consumer lag, it shows that a particular consumer-id is running from a particular host consuming from a topic.
But when I go to that host there is no such consumer running.
How do I kill this consumer-id, so that I can reset consumer offset for the group that its part of.
Kafka server version: 0.11.0.1
Kafka client version(stuck): 0.10.0.2
This consumer-id got stuck in the first place as it was not able to consume messages because of some messages having headers in Kafka.
I've already tried the following:
Consuming from a different host and different Kafka version, it consumes messages but the consumer-id, host does not change.
Restarting kafka broker which is the leader for that topic.
Changing security groups to prevent the host from connecting to my broker.
Perhaps what you see is not a consumer id, but a consumer group, see Kafka docs, consumer config to learn about the difference.
Kafka uses consumer groups to keep track of the last consumed message (consumer offset), so when talking about the consumer lag this is probably the explanation.
This means there is no consumer running and you only need to get rid of the consumer offset for this group. See e.g. How do I delete a Kafka Consumer Group to reset offsets?

how producers find kafka reader

The producers send messages by setting up a list of Kafka Broker as follows.
props.put("bootstrap.servers", "127.0.0.1:9092,127.0.0.1:9092,127.0.0.1:9092");
I wonder "producers" how to know that which of the three brokers knew which one had a partition leader.
For a typical distributed server, either you have a load bearing server or have a virtual IP, but for Kafka, how is it loaded?
Does the producers program try to connect to one broker at random and look for a broker with a partition leader?
A Kafka cluster contains multiple broker instances. At any given time, exactly one broker is the leader while the remaining are the in-sync-replicas (ISR) which contain the replicated data. When the leader broker is taken down unexpectedly, one of the ISR becomes the leader.
Kafka chooses one broker’s partition’s replicas as leader using ZooKeeper. When a producer publishes a message to a partition in a topic, it is forwarded to its leader.
According to Kafka documentation:
The partitions of the log are distributed over the servers in the
Kafka cluster with each server handling data and requests for a share
of the partitions. Each partition is replicated across a configurable
number of servers for fault tolerance.
Each partition has one server which acts as the "leader" and zero or
more servers which act as "followers". The leader handles all read and
write requests for the partition while the followers passively
replicate the leader. If the leader fails, one of the followers will
automatically become the new leader. Each server acts as a leader for
some of its partitions and a follower for others so load is well
balanced within the cluster.
You can find topic and partition leader using this piece of code.
EDIT:
The producer sends a meta request with a list of topics to one of the brokers you supplied when configuring the producer.
The response from the broker contains a list of partitions in those topics and the leader for each partition. The producer caches this information and therefore, it knows where to redirect the messages.
It's quite an old question but I have the same question and after researched, I want to share the answer cuz I hope it can help others.
To determine leader of a partition, producer uses a request type called a metadata request, which includes a list of topics the producer is interested in.
The broker will response specifies which partitions exist in the topics, the replicas for each partition, and which replica is the leader.
Metadata requests can be sent to any broker because all brokers have a metadata cache that contains this information.