default consumer group id in kafka

I am working with Kafka 2.11 and am fairly new to it. I am trying to understand Kafka consumer groups. I have 3 Spark applications consuming from the same topic, and each of them is receiving all the messages from that topic. As I have not mentioned any consumer group id in the applications, I'm assuming that Kafka is assigning some distinct consumer group id to each of them.
I need to reset the Kafka offset for one of the applications using the command below. As I don't know the consumer group name of my application, I'm kind of stuck here. Do I need to explicitly assign a group id in the application and then use it in the command below?
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --to-datetime 2017-11-19T07:52:43.000 --group <group_name> --topic <topic_name> --execute
If this is true, how can I get the consumer group id of each application?

Consumer group.id is mandatory when you subscribe to topics. If you do not set a consumer group.id, you will get an exception. So obviously you're setting it somewhere in your code, or the framework or library you're using is setting it internally. You should always set group.id yourself.
You can get the consumer group ids by using the following command:
bin/kafka-consumer-groups.sh --list --bootstrap-server <kafka-broker-ip>:9092

If you look into the Spark code you can find the KafkaSourceProvider class, which is responsible for the Kafka source reader; there you can see that a random group.id is generated:
private[kafka010] class KafkaSourceProvider extends DataSourceRegister {
  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source = {
    validateStreamOptions(parameters)
    // Each running query should use its own group id. Otherwise, the query may be only assigned
    // partial data since Kafka will assign partitions to multiple consumers having the same group
    // id. Hence, we should generate a unique id for each query.
    val uniqueGroupId = s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
    ...
  }
}
You can search for group ids with the spark-kafka-source prefix, but you can't tell which generated group.id belongs to which particular query.
To find all consumer group ids you can use following command:
./kafka-consumer-groups.sh --bootstrap-server KAFKA_ADDRESS --list
To check consumer groups offsets you can use following command:
./kafka-consumer-groups.sh --bootstrap-server KAFKA_ADDRESS --group=GROUP_ID --describe
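Since every Structured Streaming query gets a generated id with the spark-kafka-source prefix, you can at least narrow the candidates down by filtering the --list output. A minimal Python sketch over hypothetical output (the group names below are made up):

```python
# Hypothetical output of: kafka-consumer-groups.sh --bootstrap-server ... --list
list_output = """\
test-consumer-group
spark-kafka-source-3c1a07d0-bb5e-4f4b-9b27-1a9f16b8a2cd--1205849728
spark-kafka-source-9e2f6c11-7d2a-4d6a-8f3e-0c4b2d1e5a6f-1467231550
console-consumer-67807
"""

# Keep only the group ids that Spark's KafkaSourceProvider generated
spark_groups = [g for g in list_output.splitlines()
                if g.startswith("spark-kafka-source-")]
print(spark_groups)
```

If only one query is running, this usually leaves a single candidate; with several queries you still cannot map a generated id back to a specific query from the name alone.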

As I have not mentioned any consumer group id in the applications, I'm assuming that Kafka is assigning some distinct consumer group id to each of them
The Kafka brokers don't assign consumer group names to consumers connected to them.
When a consumer connects, subscribing to a topic, it "joins" a group.
If you are using a Spark application without specifying any consumer group, it means that the library/framework you are using to connect to Kafka from the Spark application is assigning consumer group names itself.

Related

Consumer does not join a consumer group

I am using kafka-python and I want to consume messages from a topic. For monitoring reasons, I want to create a consumer and assign it to a consumer group.
I am using the following functions:
server = KafkaConsumer(application.name, bootstrap_servers=str(ip_address) + ':' + str(ip_port), client_id=str(application.name) + '_dispatcher', group_id='xxxxxx')
server.subscribe(topics=[application.name])
However, when monitoring the consumer groups using:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --all-groups
I still see that the consumer is not added to the consumer group and the group is not rebalancing. Could someone explain to me what the problem is with Kafka?

Kafka consumer group description does not include all topics [duplicate]

What I want to achieve is to be sure that my Kafka streams consumer does not have lag.
I have a simple Kafka Streams application that materializes one topic as a store in the form of a GlobalKTable.
When I try to describe the consumer group on Kafka with the command:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my-application-id
I can't see any results. And there is no error either. When I list all consumers by:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --all-groups
my application consumer is listed correctly.
Any idea where to find additional information about what is happening, i.e. why I can't describe the consumer group?
(Any other Kafka streams consumers that write to topics can be described correctly.)
If your application only materializes a topic into a GlobalKTable, no consumer group is formed. Internally, the "global consumer" does not use subscribe() but assign(), there is no consumer group.id configured (as you can verify from the logs), and no offsets are committed.
The reason is that all application instances need to consume all topic partitions (i.e., a broadcast pattern). However, a consumer group is designed such that different instances read different partitions of the same topic. Also, per consumer group, only one offset can be committed per partition; if multiple instances read the same partition and committed offsets using the same group.id, the commits would overwrite each other.
Hence, using a consumer group while "broadcasting" data does not work.
However, all consumers expose the "lag" metrics records-lag-max and records-lag (cf. https://kafka.apache.org/documentation/#consumer_fetch_monitoring). Hence, you should be able to hook in via JMX to monitor the lag. Kafka Streams also exposes its client metrics via KafkaStreams#metrics().
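For intuition, records-lag is just the difference between a partition's log end offset and the consumer's current position, and records-lag-max is the maximum of that across the partitions the consumer reads. A tiny sketch with made-up offsets for a "global consumer" that reads every partition:

```python
# Hypothetical log end offsets and consumer positions per partition
# (all values below are invented for illustration)
log_end_offsets = {0: 1250, 1: 990, 2: 1410}
positions       = {0: 1248, 1: 990, 2: 1400}

# records-lag per partition; records-lag-max across partitions
records_lag = {p: log_end_offsets[p] - positions[p] for p in positions}
records_lag_max = max(records_lag.values())
print(records_lag_max)  # 10
```

In the real setup these numbers come from the JMX metrics above rather than from kafka-consumer-groups.sh, since the global consumer commits no offsets for the tool to report.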

How to create a new consumer group in kafka

I am running Kafka locally following the instructions in the quick start guide,
and I defined my consumer group configuration in config/consumer.properties so that my consumer can pick up messages from the defined group.id.
Running the following command,
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
results in,
test-consumer-group <-- group.id defined in config/consumer.properties
console-consumer-67807 <-- when connecting to kafka via kafka-console-consumer.sh
I am able to connect to Kafka via a Python-based consumer that is configured to use the provided group.id, i.e. test-consumer-group.
First of all, I am not able to understand how/when Kafka creates consumer groups. It seems it loads config/consumer.properties at some point and additionally implicitly creates a consumer group (in my case console-consumer-67807) when connecting via kafka-console-consumer.sh.
How can I explicitly create my own consumer group, let's say my-created-consumer-group?
You do not explicitly create consumer groups; rather, you build consumers that always belong to a consumer group. No matter which technology (Spark, Spring, Flink, ...) you are using, each Kafka consumer will have a consumer group. The consumer group is configurable for each individual consumer.
It seems it loads config/consumer.properties at some point and additionally implicitly creates a consumer group (in my case console-consumer-67807) when connecting via kafka-console-consumer.sh
If you do not tell your console consumer to actually make use of that file it will not be taken into consideration.
There are the following alternatives to provide the name of a consumer group:
Console Consumer with property file (--consumer.config)
This is how the file config/consumer.properties should look:
# consumer group id
group.id=my-created-consumer-group
And this is how you would then ensure that the console-consumer takes this group.id into consideration:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning --consumer.config /path/to/config/consumer.properties
Console consumer with --group
For console consumers, the consumer group is created automatically with the prefix "console-consumer" and a suffix that looks like a PID, unless you provide your own consumer group by adding --group:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning --group my-created-consumer-group
Standard code-based consumer API
When using the standard Java/Scala/... consumer API you can provide the consumer group through the properties:
Properties settings = new Properties();
settings.put(ConsumerConfig.GROUP_ID_CONFIG, "basic-consumer");
// set more properties
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(settings)) {
    consumer.subscribe(Arrays.asList("test-topic"));
}

How to separate a consumer from a consumer group without losing the offset?

I have a Logstash Kafka consumer group subscribed to nearly 20 topics, which isn't performing well for one particular high-priority Kafka topic. So I've decided to remove that topic from the consumer group and launch a separate consumer group for high-priority topics, but unfortunately with that I'm losing the offset that was tracked in the old consumer group.
Is there any way I can start the new Logstash consumer group with the last offset from the old consumer group?
Thanks
You can use the Kafka scripts to set the offset for the new group.
Sample scenario:
Stop the application.
Check the current offset for the group. You can use the following command; the output will contain information about the current offset, log end offset, lag, etc. for each topic:
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group groupId --describe
Set the offset for the new group id that will be used by the new application (suppose the offset for the topic you would like to switch is 10001):
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group newGroupId --to-offset 10001 --topic topicName --reset-offsets --execute
Remove topicName from the topics list of the old application.
Set up the new Logstash configuration with the newGroupId group id.
Start the old and new Logstash applications.
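The hand-off between steps 2 and 3 can also be scripted: pull the committed offset out of the --describe output and feed it into the reset command for the new group. A rough Python sketch over hypothetical describe output (group and topic names are placeholders, and the final command is only assembled, not run):

```python
# Hypothetical output of kafka-consumer-groups.sh --describe for the old group
describe_output = """\
GROUP      TOPIC     PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
oldGroupId topicName 0         10001          10500          499
"""

# Pull the committed offset for topicName, partition 0
offset = None
for line in describe_output.splitlines()[1:]:
    group, topic, partition, current, end, lag = line.split()
    if topic == "topicName" and partition == "0":
        offset = int(current)

# Seed the new group at that offset (requires a running broker to execute)
cmd = ("./kafka-consumer-groups.sh --bootstrap-server localhost:9092 "
       f"--group newGroupId --topic topicName --reset-offsets "
       f"--to-offset {offset} --execute")
print(offset)  # 10001
```

With multiple partitions you would repeat this per partition (using --topic topicName:PARTITION in the reset command), since each partition has its own committed offset.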