When Schema Registry initializes, it creates two internal consumer groups:
kafkastore.group.id
public static final String KAFKASTORE_GROUP_ID_CONFIG = "kafkastore.group.id";
schema.registry.group.id
public static final String SCHEMAREGISTRY_GROUP_ID_CONFIG = "schema.registry.group.id";
But these consumer groups are not listed by this command:
kafka-consumer-groups.sh --group "schema-registry" --describe --state --verbose
The Schema Registry doesn't use a consumer group since it doesn't need to store any offsets.
It always seeks to the beginning of the schemas topic, reads it to the end to populate its internal cache, and never commits offsets.
schema.registry.group.id is used for clustering between highly available instances, rather than for the consumer group API.
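To make that concrete, here is a minimal, hypothetical sketch (not Schema Registry's actual code) of the same pattern: a group-less consumer that assigns itself the single-partition _schemas topic, seeks to the beginning, reads to the end, and never commits:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // placeholder
// Note: no group.id is set, and nothing below ever commits offsets
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    TopicPartition tp = new TopicPartition("_schemas", 0); // default kafkastore.topic
    consumer.assign(Collections.singletonList(tp));        // assign(), not subscribe()
    consumer.seekToBeginning(Collections.singletonList(tp));
    long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
    while (consumer.position(tp) < end) {                  // read to the end once
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
            // apply the record to the in-memory cache here
        }
    }
}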
I'm trying to run some tests to understand MM2 behavior. As part of that I had the following questions:
How to correctly pass a custom consumer group for MM2 in mm2.properties?
Based on this question, I tried passing <alias>.group.id=temp_cons_group in mm2.properties, and on restarting the MM2 instance I could see the consumer group mentioned in the MM2 logs.
However, when I list the consumer groups registered on the source broker, the group doesn't show up.
How to test if the property <alias>.consumer.auto.offset.reset works?
Here, I want to consume the same messages again, so in reference to that question I tried setting <source_alias>.consumer.auto.offset.reset to earliest and restarted MM2.
I was able to see the property set correctly in MM2 logs but did not get the messages from the beginning in the target cluster topic.
How do I start a MM2 instance to start consuming messages from a specific offset for a topic present in the source cluster?
MirrorMaker does not use a consumer group to run and instead uses the assign() API, so it's expected that you don't see a group.
It's hard to "test". One way to verify this configuration was picked up is to check that it's present in the logs when MirrorMaker starts its consumers.
This is currently not trivial to do. There's a KIP in progress to improve the process, but at the moment it requires manually updating the internal offsets topic of your Connect instance. At a very high level, here's the process:
First, ensure MirrorMaker is not running. Then you need to find the offset records for MirrorMaker in the offsets topic using a command like:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic <CONNECT_OFFSET_TOPIC> \
--from-beginning \
--property print.key=true | grep <SOURCE_CONNECTOR_NAME>
You will see records with offsets for each partition MirrorMaker handles. To update the offsets, you need to produce new records to this topic with the offsets you want. For each partition, ensure your record has the same key as the existing message so it replaces the existing stored offsets.
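For illustration, producing such a record might look roughly like this with kafka-console-producer.sh; the key/value JSON layout shown here is an assumption based on how MirrorSourceConnector wraps source offsets, so copy the exact key from the records you dumped above rather than typing it by hand:
./bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
--topic <CONNECT_OFFSET_TOPIC> \
--property parse.key=true \
--property key.separator='|'
Then paste a line whose key matches the existing record verbatim, with the desired offset as the value, e.g.:
["MirrorSourceConnector",{"cluster":"source","partition":0,"topic":"my-topic"}]|{"offset":42}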
I have an application that uses Apache Kafka and creates a new consumer group on every startup. It takes a fixed string and appends a generated UUID to form the group ID (e.g. my_consumer_group_123324234234, my_consumer_group_123324234235, ...). When I shut down the app, the old consumer groups stay unused, and Kafka doesn't remove them until offsets.retention.minutes has elapsed.
I wonder if it is possible to remove unused consumer groups (filtered by a name pattern like 'my_consumer_group_*') with a script.
Yes, it should be possible using the kafka-consumer-groups.sh script included with Kafka.
You could create a script that periodically lists the existing consumer groups
kafka-consumer-groups.sh --bootstrap-server <kafka-servers-addrs> --list
Then describes each one of them
kafka-consumer-groups.sh --bootstrap-server <kafka-servers-addrs> --describe --group <consumer-group>
One option to detect if they are unused is to parse the output to see if it returns:
Consumer group '<consumer-group>' has no active members.
Note that relying on this message could be a bit brittle, since the wording could change across Kafka versions, so I'd look for some other, more robust approach (e.g. a status code returned by the script, if any, or initializing your own client; see the admin-client sketch after the delete command below).
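For instance, a quick check built on that exact message could look like:
kafka-consumer-groups.sh --bootstrap-server <kafka-servers-addrs> --describe --group <consumer-group> 2>&1 | grep -q "has no active members" && echo "<consumer-group> is unused"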
And then deletes the ones that are unused:
kafka-consumer-groups.sh --bootstrap-server <kafka-servers-addrs> --delete --group <consumer-group1> --group <consumer-group2>
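If you'd rather take the more robust route than parsing CLI output, here is a minimal sketch using Kafka's Java AdminClient; the class name and the my_consumer_group_ prefix are illustrative, and group state reporting requires brokers 2.6+:
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.ConsumerGroupState;

public class UnusedGroupCleaner {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // Keep groups that match the naming pattern and have no active members.
            // state() is only populated by brokers 2.6+; an absent state is treated as "in use".
            List<String> unused = admin.listConsumerGroups().all().get().stream()
                    .filter(g -> g.groupId().startsWith("my_consumer_group_"))
                    .filter(g -> g.state().map(s -> s == ConsumerGroupState.EMPTY).orElse(false))
                    .map(g -> g.groupId())
                    .collect(Collectors.toList());
            // Deleting a group with active members fails, so this is safe to re-run periodically
            admin.deleteConsumerGroups(unused).all().get();
            System.out.println("Deleted groups: " + unused);
        }
    }
}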
What I want to achieve is to be sure that my Kafka Streams consumer does not have lag.
I have a simple Kafka Streams application that materializes one topic as a store in the form of a GlobalKTable.
When I try to describe the consumer group on Kafka with the command:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my-application-id
I can't see any results, and there is no error either. When I list all consumer groups with:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --all-groups
my application's consumer is listed correctly.
Any idea where to find additional information about what is happening, and why I can't describe the consumer group?
(Any other Kafka Streams consumers that write to topics can be described correctly.)
If your application only materializes a topic into a GlobalKTable, no consumer group is formed. Internally, the "global consumer" does not use subscribe() but assign(), there is no group.id configured (as you can verify from the logs), and no offsets are committed.
The reason is that all application instances need to consume all topic partitions (i.e., a broadcast pattern). However, a consumer group is designed such that different instances read different partitions of the same topic. Also, only one offset can be committed per partition per consumer group; if multiple instances read the same partition and committed offsets using the same group.id, the commits would overwrite each other.
Hence, using a consumer group while "broadcasting" data does not work.
However, all consumers expose the lag metrics records-lag-max and records-lag (cf. https://kafka.apache.org/documentation/#consumer_fetch_monitoring), so you should be able to hook in via JMX to monitor the lag. Kafka Streams exposes its clients' metrics via KafkaStreams#metrics(), too.
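For example, a minimal sketch of reading those lag metrics in-process, where streams is assumed to be your running KafkaStreams instance:
// Print the consumer lag metrics exposed by the embedded Kafka clients
streams.metrics().forEach((name, metric) -> {
    if (name.name().equals("records-lag") || name.name().equals("records-lag-max")) {
        System.out.printf("%s %s = %s%n", name.group(), name.tags(), metric.metricValue());
    }
});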
I am working with Kafka 2.11 and am fairly new to it. I am trying to understand Kafka consumer groups. I have 3 Spark applications consuming from the same topic, and each of them receives all the messages from that topic. As I have not set any consumer group ID in the applications, I'm assuming that Kafka is assigning some distinct consumer group ID to each of them.
I need to reset the Kafka offsets for one of the applications using the command below. As I don't know the consumer group name of my application, I'm kind of stuck here. Do I need to explicitly assign a group ID in the application and then use it in the command below?
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --to-datetime 2017-11-19T07:52:43.000 --group <group_name> --topic <topic_name> --execute
If this is true, how can I get the consumer group ID of each application?
A consumer group.id is mandatory when you subscribe to topics; if you do not set one, you will get an exception. So obviously you're setting it somewhere in your code, or the framework or library you're using is setting it internally. You should always set group.id yourself.
You can get the consumer group IDs by using the following command:
bin/kafka-consumer-groups.sh --list --bootstrap-server <kafka-broker-ip>:9092
If you look at the Spark code, you will find the KafkaSourceProvider class, which is responsible for the Kafka source reader; there you can see that a random group.id is generated:
private[kafka010] class KafkaSourceProvider extends DataSourceRegister
  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source = {
    validateStreamOptions(parameters)
    // Each running query should use its own group id. Otherwise, the query may be only assigned
    // partial data since Kafka will assign partitions to multiple consumers having the same group
    // id. Hence, we should generate a unique id for each query.
    val uniqueGroupId = s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
    ...
  }
You can search for group IDs with the spark-kafka-source prefix, but you can't tell which group.id belongs to a particular query.
To find all consumer group IDs, you can use the following command:
./kafka-consumer-groups.sh --bootstrap-server KAFKA_ADDRESS --list
To check a consumer group's offsets, you can use the following command:
./kafka-consumer-groups.sh --bootstrap-server KAFKA_ADDRESS --group=GROUP_ID --describe
As i have not mentioned any consumer group id in applications I'm assuming that Kafka is assigning some distinct consumer group id to each of them
The Kafka brokers don't assign consumer group names to the consumers connected to them.
When a consumer connects, subscribing to a topic, it "joins" a group.
If you are using a Spark application without specifying any consumer group, it means that the library/framework you are using to connect to Kafka from Spark is assigning consumer group names itself.
I'm working with the Kafka 0.9.1 new consumer API. The consumer is manually assigned to a partition. For this consumer, I would like to see its progress (meaning its lag). Since I added the group ID consumer-tutorial as a property, I assumed that I could use the command
bin/kafka-consumer-groups.sh --new-consumer --describe --group consumer-tutorial --bootstrap-server localhost:9092
(as explained here http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client)
Unfortunately, my consumer group's details are not shown by the above command, so I cannot monitor the progress of my consumer (its lag). How can I monitor the lag in the scenario described above (manually assigned partition)?
The code is:
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "consumer-tutorial");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
String topic = "my-topic";
TopicPartition topicPartition = new TopicPartition(topic, 0);
// Manual assignment: no subscribe(), so no consumer group membership is formed
consumer.assign(Arrays.asList(topicPartition));
// The 0.9 API takes varargs here; newer clients take a Collection instead
consumer.seekToBeginning(topicPartition);
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records)
            System.out.println(record.offset() + ": " + record.value());
        consumer.commitSync(); // was commitSynch(), which does not compile
    }
} finally {
    consumer.close();
}
Just in case you don't want to write code to get this info or run command-line tools/shell scripts ad hoc, there are a number of tools that will capture Kafka metrics, including consumer lag. Off the top of my head: Burrow and SPM for Kafka do a good job. Here is a bit of background about Kafka offsets, consumer lag, and a few metrics derived from what Kafka exposes via JMX. HTH.
If you're interested in JMX exposure of consumer group lag, here is the agent I wrote:
https://github.com/peterkovgan/kafka9.offsets
You can run this agent on some Kafka node to expose offset lag statistics to external readers.
There are examples of how to use this agent with Telegraf
(https://influxdata.com/time-series-platform/telegraf/).
In the end (combining e.g. Telegraf, InfluxDB, and Grafana) you can see nice graphs of offset lag for several consumer groups.
In the kafka-consumer-groups.sh command, your group name is incorrect: it should be --group consumer-tutorial, not consumer-tutorial-group.
The problem with your code is directly related to the manual assignment of consumers to topic-partitions.
You specify a consumer group in the group.id property; however, the group ID is only used when you subscribe to a topic (or a set of topics) via the KafkaConsumer.subscribe() API. In your example, you are using the assign() method, which manually attaches the client to the specified topic-partition pairs without utilising the underlying consumer group primitives. This is why you are unable to see the consumer lag. Tools such as Burrow will not work in this case either, because they query the offsets of the consumer group, which isn't there.
There are two options available to you:
Use the consumer group feature properly, using the subscribe() API. This is the dominant use case for Kafka. However, seekToBeginning() will also not work in this case, as the offsets will be entirely managed by the consumer group.
Drop the consumer group altogether and manage both partition assignments and offsets manually. This gives you the maximum possible flexibility but is a lot of work, and you might find yourself reinventing the wheel. Most people will not go down this path, unless the consumer group feature of Kafka does not suit your needs.
The choice will depend squarely on your use case. For conventional stream processing, #1 is the idiomatic approach. This is what Kafka was designed for. #2 implies that you know what you are doing and transfers all of the group management responsibility onto your application.
Note: Kafka does not have a "partial" mode where you do some of the group management and Kafka does the rest. It's either all-in or none at all.
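If you go with option #2, here is a minimal sketch of computing the lag yourself, reusing the consumer from the question's code; note that endOffsets() requires a 0.10.1+ client, so on 0.9.x you would have to seekToEnd() and read position() instead:
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

// Lag per assigned partition = log-end offset minus the consumer's current position
Map<TopicPartition, Long> endOffsets = consumer.endOffsets(consumer.assignment());
for (TopicPartition tp : consumer.assignment()) {
    long lag = endOffsets.get(tp) - consumer.position(tp);
    System.out.println("Lag for " + tp + ": " + lag);
}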
You can use a simple and powerful tool for lag monitoring called prometheus-kafka-consumer-group-exporter. See the URL below:
https://github.com/braedon/prometheus-kafka-consumer-group-exporter
After installation, run the command below to export consumer group metrics on your chosen port:
/usr/bin/python3 /usr/local/bin/prometheus-kafka-consumer-group-exporter -p PORT -b KAFKA_CLUSTER_IP_PORT
After running the above command, verify the data at http://YOUR-SERVER-IP:PORT, e.g. 127.0.0.1:9208.
Now you can use any metrics scraper for dashboards and alerting; I am using Prometheus & Grafana.
This can be run on any shared server (e.g. a Kafka broker, ZooKeeper server, or Prometheus server) because it has very low overhead on system resources.