I enabled JMX on the Kafka brokers on port 8081. When I view the MBean properties in JConsole, I only see the following for kafka.consumer:
kafka.consumer:type=FetchRequestAndResponseMetrics,name=FetchRequestRateAndTimeMs,clientId=ReplicaFetcherThread-2-413
kafka.consumer:type=FetchRequestAndResponseMetrics,name=FetchResponseSize,clientId=ReplicaFetcherThread-0-413
But none of the other MBeans identified in the documentation under Kafka Consumer Metrics are emitted via JMX.
Kafka version: 0.8.2.1
I am specifically interested in:
kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+)
Any thoughts?
The JMX port you are listening on is the broker's port, but the kafka.consumer MBeans are metrics from the consumer JVM. So if you have another JVM that is consuming a topic, attach JConsole to that JVM and you will see the kafka.consumer MBeans.
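For example, assuming the consuming JVM is a console consumer, you can expose its metrics through the JMX_PORT environment variable that the Kafka launch scripts honor (the port number here is arbitrary):

> JMX_PORT=9998 bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test

Then point JConsole at that process on port 9998 and the kafka.consumer MBeans should appear.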
ConsumerLag is an overloaded term in Kafka, it can stand for:
Consumer's metric: the calculated difference between the consumer's current log offset and the producer's current log offset. You can find it under a JMX bean if you're using a Java/Scala based consumer (e.g. the pykafka consumer doesn't export metrics):
kafka v0.8.2.x:
kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+)
kafka v0.9+:
kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+)
Consumer lag used to be stored in ZooKeeper (Kafka <= v0.8); newer versions of Kafka have a special topic, __consumer_offsets, that stores each consumer's committed offsets. There are tools (e.g. kafka-manager) that can compute lag by consuming messages from this topic and comparing the committed offsets against the log end offsets. In kafka-manager you have to enable this feature for each cluster:
[ ] Poll consumer information (Not recommended for large # of consumers)
Broker's metric: represents the offset difference between partition leaders and their followers. You can find this metric under the JMX bean:
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
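If you want to read these beans programmatically rather than through JConsole, here is a minimal sketch using the standard JMX remote API; the host, port, and client id below are placeholders, and it assumes the consumer JVM exposes remote JMX:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MaxLagReader {
    public static void main(String[] args) throws Exception {
        // Hypothetical host/port: point this at the JVM that runs the consumer
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://consumer-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // clientId is a placeholder; substitute your consumer's actual client id
            ObjectName bean = new ObjectName(
                    "kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=my-client");
            // Yammer-metrics gauges (Kafka 0.8.x) expose their reading as the "Value" attribute
            System.out.println("MaxLag = " + mbs.getAttribute(bean, "Value"));
        }
    }
}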
This may help to find it for 0.8, but I am currently running a Kafka 0.10 broker and consumer. When using a console consumer, I pointed JConsole at this consumer and found on the MBeans tab: kafka.consumer -> consumer-fetch-manager-metrics -> consumer-1 -> Attributes -> records-lag-max.
I am using Debezium, which makes use of Kafka Connect.
Kafka Connect requires a couple of topics to be created:
OFFSET_STORAGE_TOPIC
This environment variable is required when running the Kafka Connect service. Set this to the name of the Kafka topic where the Kafka Connect services in the group store connector offsets. The topic should have many partitions, be highly replicated (e.g., 3x or more) and should be configured for compaction.
STATUS_STORAGE_TOPIC
This environment variable should be provided when running the Kafka Connect service. Set this to the name of the Kafka topic where the Kafka Connect services in the group store connector status. The topic can have multiple partitions, should be highly replicated (e.g., 3x or more) and should be configured for compaction.
Does anyone have any specific recommended compaction configs for these topics?
e.g.
is it enough to set just:
cleanup.policy: compact
unclean.leader.election.enable: true
or also:
min.compaction.lag.ms: 60000
segment.ms: 1800000
min.cleanable.dirty.ratio: 0.01
delete.retention.ms: 100
The defaults should be fine; Connect will create and configure those topics on its own unless you pre-create them with your own settings.
These are the only cases I can think of when you would adjust the compaction settings:
a connect group lingering on the topic longer than you want it to; for example, a source connector that doesn't start immediately after a long downtime because it's still processing the offsets topic
your Connect cluster doesn't accurately report its state, or the tasks do not rebalance appropriately (because the status topic is in a bad state)
The __consumer_offsets (compacted) topic is what is used for Sink connectors, and would be configured separately for all consumers, not only Connect
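If you do decide to pre-create the topics yourself rather than let Connect do it, here is a minimal sketch using the AdminClient (available since Kafka 0.11); the broker address, topic name, partition count, and replication factor are placeholders:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateConnectTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, String> configs = new HashMap<>();
            configs.put("cleanup.policy", "compact"); // the one setting Connect requires
            // Topic name, partition count, and replication factor are examples only
            NewTopic offsets = new NewTopic("connect-offsets", 25, (short) 3).configs(configs);
            admin.createTopics(Collections.singleton(offsets)).all().get();
        }
    }
}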
I am using the Kafka client library that comes with Kafka 0.11.0.1. I noticed that the KafkaConsumer no longer needs ZooKeeper configured. Does that mean the ZooKeeper server will automatically be located through the Kafka bootstrap server?
Since Kafka 0.9, the KafkaConsumer implementation stores offset commits and consumer group information in the Kafka brokers themselves. This eliminates the ZooKeeper dependency for clients and increases the scalability of consumers.
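As a sketch, a new-style consumer is configured entirely from a bootstrap broker list; the broker addresses, group id, and topic below are placeholders:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NoZkConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Only brokers are listed; there is no zookeeper.connect property at all
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                // Group coordination and offset commits go through the brokers
                for (ConsumerRecord<String, String> record : consumer.poll(1000))
                    System.out.println(record.offset() + ": " + record.value());
            }
        }
    }
}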
I have setup Apache Kafka and confirmed that producers and consumers work on the localhost.
How to set up Kafka so that:
multiple producers feed messages into a broker on a network computer
many consumers on the network can consume the messages from the broker
I noticed the following line in the server.properties used to start the Kafka server: zookeeper.connect=localhost:2181. Is this setting specifying the addresses the server listens on, or the server's address/port on the network?
ZooKeeper is used internally by Kafka to coordinate the cluster (leader election). In versions of Kafka before 0.8, ZK was the exclusive store for consumer offsets (what has been consumed so far), but from 0.8.1, I think, you can choose whether to store offsets in ZK or in a special Kafka topic called __consumer_offsets.
What you're interested in is the advertised.host.name and advertised.port settings, which Kafka exposes to clients (or "what addresses it listens to", as you put it).
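For example, a hypothetical fragment of server.properties (the host name is a placeholder for whatever name your clients can resolve):

advertised.host.name=kafka1.example.com
advertised.port=9092

Clients that contact any broker get back these advertised values in the metadata, so they must be reachable from the client machines, not just from localhost.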
zookeeper.connect is the address of the ZooKeeper server that Kafka connects to. The documentation for the broker configuration can be found here: http://kafka.apache.org/documentation.html#brokerconfigs
I've been following the high level consumer example, but it seems that is for consuming from Kafka. I want to connect to ZooKeeper (zookeperhost:2181) and get the list of Kafka brokers associated with it. Is there a way to do this with the HLC?
Also, is there a way to use SimpleConsumer to find a list of kafka brokers, given zk?
As you can see in the link you gave, you don't pass a broker list to the HLC, but
props.put("zookeeper.connect", a_zookeeper);
So it's already linked to zookeeper, and from there it will discover kafka brokers.
For your second question, you have the option of using ZkClient to read the /brokers data in ZooKeeper, but I wouldn't do it, since it depends on Kafka implementation details, which may or may not be stable.
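If you accept that caveat, here is a minimal sketch using the plain ZooKeeper client (rather than ZkClient) to list the registered brokers; the connection string is the question's placeholder:

import org.apache.zookeeper.ZooKeeper;

public class BrokerLister {
    public static void main(String[] args) throws Exception {
        // Brokers register ephemeral znodes under /brokers/ids while they are alive
        ZooKeeper zk = new ZooKeeper("zookeperhost:2181", 10000, event -> { });
        for (String id : zk.getChildren("/brokers/ids", false)) {
            byte[] data = zk.getData("/brokers/ids/" + id, false, null);
            // Each znode holds JSON like {"host":"...","port":9092,...}
            System.out.println("broker " + id + " -> " + new String(data, "UTF-8"));
        }
        zk.close();
    }
}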
If I have multiple brokers, which broker should my producer use? Do I need to manually switch the broker to balance the load? Also why does the consumer only need a zookeeper endpoint instead of a broker endpoint?
A quick example from the tutorial:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
which broker should my producer use? Do I need to manually switch the broker to balance the load?
Kafka runs as a cluster, i.e. a set of nodes, so when producing anything you need to give it the list of brokers that you've configured for your application. Below is a small note taken from the documentation.
“metadata.broker.list” defines where the Producer can find one or more Brokers to determine the Leader for each topic. This does not need to be the full set of Brokers in your cluster but should include at least two in case the first Broker is not available. No need to worry about figuring out which Broker is the leader for the topic (and partition), the Producer knows how to connect to the Broker and ask for the metadata then connect to the correct Broker.
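As a sketch with the old (0.8-era) producer API the quote describes, assuming placeholder broker addresses and topic name:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class BrokerListProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Two brokers listed so metadata can still be fetched if one is down
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        Producer<String, String> producer = new Producer<>(new ProducerConfig(props));
        // The producer looks up the partition leader itself before sending
        producer.send(new KeyedMessage<String, String>("test", "hello"));
        producer.close();
    }
}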
Hope this clears up some of your confusion.
Also why does the consumer only need a zookeeper endpoint instead of a broker endpoint?
This is not technically correct, as there are two types of consumer APIs available: the high-level and the low-level (simple) consumer.
The high-level consumer basically takes care of most things, like leader detection, threading issues, etc., but does not provide much control over messages, which is exactly the purpose of the alternative: the Simple, or low-level, consumer, for which you need to provide the brokers and partition-related details.
So the consumer needs a ZooKeeper endpoint only when you go with the high-level API; when using the SimpleConsumer you need to provide the broker and partition information yourself.
Kafka sets a single broker as the leader for each partition of each topic. The leader is responsible for handling both reads and writes to that partition. You cannot decide to read or write from a non-Leader broker.
So, what does it mean to provide a broker or list of brokers to the kafka-console-producer ? Well, the broker or brokers you provide on the command-line are just the first contact point for your producer. If the broker you list is not the leader for the topic/partition you need, your producer will get the current leader info (called "topic metadata" in kafka-speak) and reconnect to other brokers as necessary before sending writes. In fact, if your topic has multiple partitions it may even connect to several brokers in parallel (if the partition leaders are different brokers).
Second q: why does the consumer require a zookeeper list for connections instead of a broker list? The answer to that is that kafka consumers can operate in "groups" and zookeeper is used to coordinate those groups (how groups work is a larger issue, beyond the scope of this Q). Zookeeper also stores broker lists for topics, so the consumer can pull broker lists directly from zookeeper, making an additional --broker-list a bit redundant.
Kafka Producer API does not interact directly with Zookeeper. However, the High Level Consumer API connects to Zookeeper to fetch/update the partition offset information for each consumer. So, the consumer API would fail if it cannot connect to Zookeeper.
All the above answers are correct for older versions of Kafka, but things have changed with the arrival of Kafka 0.9.
Now there is no longer any direct interaction with ZooKeeper from either the producer or the consumer. Another interesting thing is that with 0.9, Kafka has removed the distinction between the high-level and low-level APIs, since both are replaced by a unified consumer API.