What I want to achieve is to be sure that my Kafka streams consumer does not have lag.
I have simple Kafka streams application that materialized one topic as store in form of GlobalKTable.
When I try to describe consumer on Kafka by command:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my-application-id
I can't see any results. And there is no error either. When I list all consumers by:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --all-groups
my application consumer is listed correctly.
Any idea where to find additional information what is happening that I can't describe consumer?
(Any other Kafka streams consumers that write to topics can be described correctly.)
If your application does only materialize a topic into a GlobalKTable no consumer group is formed. Internally, the "global consumer" does not use subscribe() but assign() and there is no consumer group.id configured (as you can verify from the logs) and no offset are committed.
The reason is, that all application instances need to consume all topic partitions (ie, broadcast pattern). However, a consumer group is designed such that different instances read different partitions for the same topic. Also, per consumer group, only one offset can be committed per partition -- however, if multiple instance read the same partition and would commit offsets using the same group.id the commits would overwrite each other.
Hence, using a consumer group while "broadcasting" data does not work.
However, all consumers should expose a "lag" metrics records-lag-max and records-lag (cf https://kafka.apache.org/documentation/#consumer_fetch_monitoring). Hence, you should be able to hook in via JMX to monitor the lag. Kafka Streams includes client metrics via KafkaStreams#metrics(), too.
Related
I'm trying to run some tests to understand MM2 behavior. As part of that I had the following questions:
How to correctly pass a custom consumer group for MM2 in mm2.properties?
Based on this question, tried passing <alias>.group.id=temp_cons_group in mm2.properties and on restarting the MM2 instance could see the consumer group mentioned in the MM2 logs.
However, when I try listing consumer groups registered in the source broker, the group doesn't show up?
How to test if the property <alias>.consumer.auto.offset.reset works?
Here, I want to consume the same messages again so in reference to the question, tried setting <source_alias>.consumer.auto.offset.reset to earliest and restarted MM2.
I was able to see the property set correctly in MM2 logs but did not get the messages from the beginning in the target cluster topic.
How do I start a MM2 instance to start consuming messages from a specific offset for a topic present in the source cluster?
MirrorMaker does not use a consumer group to run and instead uses the assign() API, so it's expected that you don't see a group.
It's hard to "test". One way to verify this configuration was picked up is to check it's present in the logs when MirrorMaker starts its consumers.
This is currently not trivial to do. There's a KIP in progress to improve the process but at the moment it requires manually updating the internal offset topic from your Connect instance. At a very high level, here's the process:
First, ensure MirrorMaker is not running. Then you need to find the offset records for MirrorMaker in the offsets topic using a command like:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic <CONNECT_OFFSET_TOPIC \
--from-beginning \
--property print.key=true | grep <SOURCE_CONNECTOR_NAME>
You will see records with offsets for each partition MirrorMaker handles. To update the offsets, you need to produce new records to this topic with the offsets you want. For each partition, ensure your record has the same key as the existing message so it replaces the existing stored offsets.
I write Kafka consumed messages on a file (backup.log). To do this, I created a service on my CentOS which run kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic test --consumer.config consumer.properties >> /backup/backup.log
I have two Linux host.
Each Linux host one Kafka consumer and each consumer belong to a consumer group. So I have two consumer group.
This way, If a consumer is down, the other consumer has all the messages as he belongs to another consumer group.
Currently, I have only one topic with four partitions.
Question: I have to consume multiples topics. I read that it's recommended to add consumers (parallelize) instead of using only one consumer for too many topics/partitions.
-> What is the best way to achieve this? Duplicate the service which runs kafka-console-consumer.sh? Or another way?
What I want to achieve is to be sure that my Kafka streams consumer does not have lag.
I have simple Kafka streams application that materialized one topic as store in form of GlobalKTable.
When I try to describe consumer on Kafka by command:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my-application-id
I can't see any results. And there is no error either. When I list all consumers by:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --all-groups
my application consumer is listed correctly.
Any idea where to find additional information what is happening that I can't describe consumer?
(Any other Kafka streams consumers that write to topics can be described correctly.)
If your application does only materialize a topic into a GlobalKTable no consumer group is formed. Internally, the "global consumer" does not use subscribe() but assign() and there is no consumer group.id configured (as you can verify from the logs) and no offset are committed.
The reason is, that all application instances need to consume all topic partitions (ie, broadcast pattern). However, a consumer group is designed such that different instances read different partitions for the same topic. Also, per consumer group, only one offset can be committed per partition -- however, if multiple instance read the same partition and would commit offsets using the same group.id the commits would overwrite each other.
Hence, using a consumer group while "broadcasting" data does not work.
However, all consumers should expose a "lag" metrics records-lag-max and records-lag (cf https://kafka.apache.org/documentation/#consumer_fetch_monitoring). Hence, you should be able to hook in via JMX to monitor the lag. Kafka Streams includes client metrics via KafkaStreams#metrics(), too.
I have several consumers that connect to Kafka Cluster that I do not have control over. At the same time, I would like to have visibility into how those consumers are configured.
Is there an API to list all consumers (if there is one for publishers, it is an added benefit) and then read all their configs?
I am talking about these consumer settings:
https://docs.confluent.io/current/installation/configuration/consumer-configs.html#cp-config-consumer
This is not possible as most of those settings are configured at the consumer only and are not pushed to the brokers or any topic.
It's possible however to get a high-level description for a given consumer group:
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group consumer-group
I have 3 spark-streaming consumers with same group id subscribed to a Kafka topic which has only one partition. So that one partition is assigned to one consumer and other consumers are idle. We need to kill the idle consumers. We are using Spark with YARN.
How to identify the consumer which is idle
Is there any commands from kafka to check which consumer is idle or is there any other we can achieve the same?
kafka-consumer-groups --group name will show all clients within a group and those assigned a partition
Although, it doesn't hurt to have backup processes incase one of the consumer threads fails
Try this -
$ kafka-consumer-groups.sh --describe --group group_name --members