Delete unused kafka consumer group - apache-kafka

I'm using Apache Kafka 0.10 with a compacted topic as a distributed cache sync mechanism. When the application starts up, it generates an instance-specific consumer group id. As instances are added and removed for horizontal scalability, we obviously accumulate a large number of group ids that should never be used again.
I'm sure this is the perfect use case for KStreams and KTables, but I'm trying to do this myself, both for intellectual reasons and because KStreams and KTables are labeled alpha quality in 0.10.
Is there a Kafka API call that I can use that could delete an existing consumer group, knowing that it should never be used again?
Since ZooKeeper does not maintain consumer offsets in version 0.10, is there a way to delete the consumer group using Kafka?

It's possible with the CLI that ships with Kafka:
./bin/kafka-consumer-groups \
--bootstrap-server <bootstrap_server(s)> \
--topic <topic_name> \
--delete \
--group <consumer_group_name>
The kafka-consumer-groups script is available in the bin directory of the Kafka installation.
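If you would rather do this programmatically, later Kafka versions gained group deletion on the Java AdminClient (KIP-222, Kafka 2.0; this did not exist yet in 0.10). A minimal sketch, assuming a broker on localhost:9092 and a hypothetical group name:
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import java.util.Collections;
import java.util.Properties;

public class DeleteGroup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Deletion fails if the group still has active members.
            admin.deleteConsumerGroups(Collections.singletonList("cache-sync-instance-42"))
                 .all()
                 .get();
        }
    }
}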

Since Kafka 0.9, an internal topic is used to store committed offsets. You can configure how long those offsets should be kept via offsets.retention.minutes. (See also offsets.retention.check.interval.ms).
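To check what retention your brokers actually apply, here is a sketch using the AdminClient's describeConfigs (the config key is the documented one; the broker id "0" and bootstrap address are assumptions):
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collections;
import java.util.Properties;

public class ShowOffsetRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Fetch the live config of broker 0 and print the offset retention setting.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config config = admin.describeConfigs(Collections.singleton(broker))
                                 .all().get().get(broker);
            System.out.println(config.get("offsets.retention.minutes").value());
        }
    }
}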

Related

MM2.0 consumer group behavior

I'm trying to run some tests to understand MM2 behavior. As part of that I had the following questions:
How to correctly pass a custom consumer group for MM2 in mm2.properties?
Based on this question, I tried passing <alias>.group.id=temp_cons_group in mm2.properties, and on restarting the MM2 instance I could see the consumer group mentioned in the MM2 logs.
However, when I try listing the consumer groups registered on the source broker, the group doesn't show up.
How to test if the property <alias>.consumer.auto.offset.reset works?
Here, I want to consume the same messages again, so with reference to the question, I tried setting <source_alias>.consumer.auto.offset.reset to earliest and restarted MM2.
I was able to see the property set correctly in the MM2 logs, but I did not get the messages from the beginning in the target cluster topic.
How do I start an MM2 instance so that it consumes messages from a specific offset for a topic present in the source cluster?
MirrorMaker does not use a consumer group to run and instead uses the assign() API, so it's expected that you don't see a group.
It's hard to "test". One way to verify this configuration was picked up is to check it's present in the logs when MirrorMaker starts its consumers.
This is currently not trivial to do. There's a KIP in progress to improve the process but at the moment it requires manually updating the internal offset topic from your Connect instance. At a very high level, here's the process:
First, ensure MirrorMaker is not running. Then you need to find the offset records for MirrorMaker in the offsets topic using a command like:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic <CONNECT_OFFSET_TOPIC> \
--from-beginning \
--property print.key=true | grep <SOURCE_CONNECTOR_NAME>
You will see records with offsets for each partition MirrorMaker handles. To update the offsets, you need to produce new records to this topic with the offsets you want. For each partition, ensure your record has the same key as the existing message so it replaces the existing stored offsets.
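If it helps, here is a heavily hedged sketch of that produce step in Java. The topic name, connector name, and the exact key/value JSON shapes below are illustrative assumptions; copy the exact key from the record you found with the console consumer above and change only the offset value:
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class OverwriteConnectOffset {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // ASSUMED shapes for illustration only -- reuse the exact key bytes you read
        // from the offsets topic, otherwise compaction will not replace the record.
        String key = "[\"my-mm2-connector\",{\"cluster\":\"source\",\"partition\":0,\"topic\":\"my-topic\"}]";
        String value = "{\"offset\":42}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "mm2-offsets.source.internal" is a placeholder for your Connect offsets topic.
            producer.send(new ProducerRecord<>("mm2-offsets.source.internal", key, value)).get();
        }
    }
}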

Kafka consumer group description does not include all topics [duplicate]

What I want to achieve is to be sure that my Kafka Streams consumer does not have lag.
I have a simple Kafka Streams application that materializes one topic as a store in the form of a GlobalKTable.
When I try to describe the consumer group on Kafka with the command:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my-application-id
I don't see any results, and there is no error either. When I list all consumer groups with:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --all-groups
my application consumer is listed correctly.
Any idea where to find additional information about why I can't describe this consumer group?
(Any other Kafka Streams consumers that write to topics can be described correctly.)
If your application only materializes a topic into a GlobalKTable, no consumer group is formed. Internally, the "global consumer" does not use subscribe() but assign(), there is no consumer group.id configured (as you can verify from the logs), and no offsets are committed.
The reason is that all application instances need to consume all topic partitions (i.e., a broadcast pattern). However, a consumer group is designed so that different instances read different partitions of the same topic. Also, per consumer group, only one offset can be committed per partition; if multiple instances read the same partition and committed offsets using the same group.id, the commits would overwrite each other.
Hence, using a consumer group while "broadcasting" data does not work.
However, all consumers expose the lag metrics records-lag-max and records-lag (cf. https://kafka.apache.org/documentation/#consumer_fetch_monitoring). Hence, you should be able to hook in via JMX to monitor the lag. Kafka Streams also exposes its clients' metrics via KafkaStreams#metrics().
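For example, a minimal sketch of scanning those metrics in-process, assuming a running KafkaStreams instance (the metric names are the documented ones; Metric#metricValue() exists since Kafka 1.0):
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;
import java.util.Map;

public class LagCheck {
    // Prints the consumer lag metrics of a running KafkaStreams instance.
    static void printLag(KafkaStreams streams) {
        for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if (name.name().equals("records-lag-max") || name.name().equals("records-lag")) {
                // tags() identifies the client and, for records-lag, the topic-partition.
                System.out.printf("%s %s = %s%n", name.name(), name.tags(), entry.getValue().metricValue());
            }
        }
    }
}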

Where are consumer groups list stored in recent Kafka version?

Apparently, in earlier Kafka versions, the list of consumers for a given consumer group was stored in ZooKeeper.
Where is this information stored in the latest Kafka release?
Since Kafka 0.10, the list of consumer groups is stored in the __consumer_offsets topic.
That topic contains both the committed offsets and the group metadata (group.id, members, generation, leader, ...). Groups are stored using GroupMetadataMessage messages (offsets use OffsetsMessage).
You can dump the group metadata using the GroupMetadataMessageFormatter. For example:
./bin/kafka-console-consumer.sh \
--formatter "kafka.coordinator.group.GroupMetadataManager\$GroupMetadataMessageFormatter" \
--bootstrap-server localhost:9092 \
--topic __consumer_offsets
Note that this is a compacted topic, so to get the list of groups, you would need to "materialize" all entries. This is what brokers do when you use the listConsumerGroups() API.
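For example, a sketch of calling that API from the Java AdminClient (available since Kafka 2.0 via KIP-222; the bootstrap address is an assumption):
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupListing;
import java.util.Properties;

public class ListGroups {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // The broker materializes __consumer_offsets and returns one listing per group.
            for (ConsumerGroupListing listing : admin.listConsumerGroups().all().get()) {
                System.out.println(listing.groupId());
            }
        }
    }
}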
Consumer groups for the old "high level" consumer are still stored in ZooKeeper; offsets for the new consumer have been stored in Kafka since 0.9.

Kafka Connect offset.storage.topic not receiving messages (i.e. how to access Kafka Connect offset metadata?)

I am working on setting up a Kafka Connect Distributed Mode application which will be a Kafka to S3 pipeline. I am using Kafka 0.10.1.0-1 and Kafka Connect 3.1.1-1. So far things are going smoothly, but one aspect that is important to the larger system I am working with requires knowing offset information of the Kafka -> FileSystem pipeline. According to the documentation, the offset.storage.topic configuration will be the location the distributed mode application uses for storing offset information. This makes sense given how Kafka stores consumer offsets in the 'new' Kafka. However, after doing some testing with the FileStreamSinkConnector, nothing is being written to my offset.storage.topic, which has the default value connect-offsets.
To be specific, I am using a Python Kafka producer to push data to a topic and using Kafka Connect with the FileStreamSinkConnector to output the data from the topic to a file. This works and behaves as I expect the connector to behave. Additionally, when I stop and restart the connector, the application remembers its position in the topic and there is no data duplication. However, when I go to the offset.storage.topic to see what offset metadata is stored, there is nothing in the topic.
This is the command that I use:
kafka-console-consumer --bootstrap-server kafka1:9092,kafka2:9092,kafka3:9092 --topic connect-offsets --from-beginning
I receive this message after letting this command run for a minute or so:
Processed a total of 0 messages
So to summarize, I have 2 questions:
Why is offset metadata not being written to the topic that should be storing this even though my distributed application is keeping state correctly?
How do I access offset metadata information for a Kafka Connect distributed mode application? This is 100% necessary for my team's Lambda Architecture implementation of our system.
Thanks for the help.
Liju is correct: connect-offsets is used to track offsets for source connectors (which have a producer but not a consumer). Sink connectors have a consumer and track offsets the usual way, in the __consumer_offsets topic.
The best way to look at last committed offsets is with the consumer group tool:
bin/kafka-consumer-groups.sh --group connect-elastic-login-connector --bootstrap-server localhost:9092 --describe
The group name is always "connect-" followed by the connector name (in my case, elastic-login-connector). This will show the latest offsets committed by the group, which basically acknowledges that all messages up to those offsets were written to Elastic.
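The same lookup can be done programmatically on newer clusters; a sketch with the Java AdminClient (Kafka 2.0+, so newer than the 0.10.1 broker in the question), reusing the example group name above:
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import java.util.Map;
import java.util.Properties;

public class ConnectorOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Fetch the last committed offset per partition for the sink connector's group.
            Map<TopicPartition, OffsetAndMetadata> offsets =
                admin.listConsumerGroupOffsets("connect-elastic-login-connector")
                     .partitionsToOffsetAndMetadata()
                     .get();
            offsets.forEach((tp, om) -> System.out.println(tp + " -> " + om.offset()));
        }
    }
}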
The offsets might be getting committed to Kafka's default offset commit topic, i.e. __consumer_offsets.
The new S3 Connector released by Confluent might be of interest to you.
From what you describe, maybe it can significantly simplify your goal of exporting records from Kafka to your S3 buckets.

removing a kafka consumer group in zookeeper

I'm using kafka_2.9.2-0.8.1.1 with zookeeper 3.4.6.
Is there a utility that can automatically remove a consumer group from zookeeper? Or can I just remove everything under /consumers/[group_id] in zookeeper? If the latter, is there anything else I'm missing & can this be done with a live system?
Update:
As of Kafka version 2.3.0, there is a new utility:
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --delete --group my-group
Related doc: http://kafka.apache.org/documentation/#basic_ops_consumer_lag
See below for more discussion
As of v0.9.0, Kafka ships with a suite of tools in the bin directory, one of which is the kafka-consumer-groups.sh tool. This will delete a consumer group:
./kafka-consumer-groups.sh --zookeeper <zookeeper_url> --delete --group <group-name>
For new consumers (which use a Kafka topic to manage offsets instead of ZooKeeper), you cannot delete the group information using Kafka's built-in tools.
Here is an example of trying to delete the group information for a new style consumer using the kafka-consumer-groups.sh script:
bin/kafka-consumer-groups.sh --bootstrap-server "kafka:9092" --delete --group "indexer" --topic "cleaned-logs"
Option '[delete]' is only valid with '[zookeeper]'. Note that there's no need to delete group metadata for the new consumer as the group is deleted when the last committed offset for that group expires.
Here's the important part of that response:
Note that there's no need to delete group metadata for the new consumer as the group is deleted when the last committed offset for that group expires.
This is kind of annoying from a monitoring perspective (especially when tracking offsets via something like Burrow), because it means that if you change consumer group names in your code, you'll keep seeing that the old groups are behind on their offsets until those offsets expire.
Hypothetically, you could write a tombstone to that topic manually (which is what happens during offset expiration), but I haven't found any tools that make this easy.
You can delete a group from Kafka with the CLI:
kafka-consumer-groups --bootstrap-server localhost:9092 --delete --group group_name
Currently, as far as I know, the only way to remove a Kafka consumer group is to manually delete the ZooKeeper path /consumers/[group_id].
If you just want to delete a consumer group, there is nothing to worry about when manually deleting the ZooKeeper path, but if you are doing it to rewind offsets, the following will be helpful.
First of all, you should stop all the consumers belonging to the consumer group before removing the ZooKeeper path. If you don't, those consumers will not consume newly produced messages and will soon close their connections to the ZooKeeper cluster.
When you restart the consumers, if you want them to start off from the beginning, set the auto.offset.reset property to smallest (or earliest in new Kafka releases). The default value of the property is largest (or latest in new Kafka releases), which makes your restarted consumers read from after the largest offset, in turn consuming only newly produced messages. For more information about the property, refer to Consumer Config in the Kafka documentation.
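For example, with the new (Java) consumer; the property names are the documented ones, while the group and topic names are placeholders:
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class RewoundConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-new-group"); // a group with no committed offsets
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // "earliest" replaces the old consumer's "smallest"; the default "latest"
        // (old: "largest") would skip everything already in the topic.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ": " + record.value());
            }
        }
    }
}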
FYI, there is a question "How can I rewind the offset in the consumer?" in the Kafka FAQ, but it wasn't much help to me.