How to create a new consumer group in kafka - apache-kafka

I am running Kafka locally, following the instructions in the quick start guide,
and I defined my consumer group configuration in config/consumer.properties so that my consumer can pick up messages using the defined group.id.
Running the following command,
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
results in,
test-consumer-group <-- group.id defined in config/consumer.properties
console-consumer-67807 <-- when connecting to kafka via kafka-console-consumer.sh
I am able to connect to Kafka via a Python-based consumer that is configured to use the provided group.id, i.e. test-consumer-group.
First of all, I do not understand how and when Kafka creates consumer groups. It seems that it loads config/consumer.properties at some point, and additionally it implicitly creates a consumer group (in my case console-consumer-67807) when I connect via kafka-console-consumer.sh.
How can I explicitly create my own consumer group, let's say my-created-consumer-group?

You do not explicitly create consumer groups but rather build consumers which always belong to a consumer group. No matter which technology (Spark, Spring, Flink, ...) you are using, each Kafka Consumer will have a Consumer Group. The consumer group is configurable for each individual consumer.
"It seems it loads the config/consumer.properties at some point of time and additionally it implicitly creates consumer-group (in my case console-consumer-67807) when connecting via kafka-console-consumer.sh"
If you do not tell your console consumer to actually make use of that file, it will not be taken into consideration.
There are the following alternatives to provide the name of a consumer group:
Console Consumer with property file (--consumer.config)
This is what the file config/consumer.properties should look like:
# consumer group id
group.id=my-created-consumer-group
And this is how you would then ensure that the console-consumer takes this group.id into consideration:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning --consumer.config /path/to/config/consumer.properties
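A code-based consumer can read the same kind of file with the standard java.util.Properties loader. The following is only a minimal sketch: the class and method names are made up for illustration, and the file content is passed in as a string so that the example runs without a Kafka installation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Kafka.
final class ConsumerPropertiesFile {
    // Parses properties-file text and returns its group.id, or null if the key is absent.
    static String groupIdOf(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("group.id");
    }

    public static void main(String[] args) {
        // Same two lines as the consumer.properties example above.
        String text = "# consumer group id\n" + "group.id=my-created-consumer-group\n";
        System.out.println(groupIdOf(text));
    }
}
```

Lines starting with # are treated as comments by the loader, so only the group.id entry ends up in the resulting Properties object.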
Console consumer with --group
For console consumers, the consumer group is created automatically with the prefix "console-consumer" and a randomly generated numeric suffix (it looks like a PID but is not one), unless you provide your own consumer group by adding --group:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning --group my-created-consumer-group
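The naming rule can be pictured with a small sketch. It assumes the console consumer appends a random number below 100000 to the "console-consumer-" prefix when no group is given; the exact scheme may differ between Kafka versions, and the class and method names here are hypothetical.

```java
import java.util.Random;

// Toy model of how a console consumer ends up with its group name.
final class ConsoleGroupName {
    static final String PREFIX = "console-consumer-";

    // Returns the requested group if one was given, otherwise an auto-generated name.
    static String resolve(String requestedGroup, Random random) {
        if (requestedGroup != null && !requestedGroup.isEmpty()) {
            return requestedGroup;
        }
        // Assumption: suffix is a random number in [0, 100000).
        return PREFIX + random.nextInt(100000);
    }

    public static void main(String[] args) {
        System.out.println(resolve("my-created-consumer-group", new Random()));
        System.out.println(resolve(null, new Random()));
    }
}
```

Because the fallback name changes on every start, two console consumers started without --group never end up in the same group.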
Standard code-based consumer API
When using the standard Java/Scala/... Consumer API, you can provide the consumer group through the properties:
Properties settings = new Properties();
settings.put(ConsumerConfig.GROUP_ID_CONFIG, "basic-consumer");
// set more properties (bootstrap servers, key/value deserializers, ...)
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(settings)) {
    consumer.subscribe(Arrays.asList("test-topic"));
    // poll for records here; the consumer is closed when the block exits
}
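As a rough sketch of what "set more properties" usually involves, the helper below fills in the settings a string-based consumer typically needs. It uses the plain string keys that the ConsumerConfig constants stand for, so it compiles without the Kafka client on the classpath; the class name, method name, and broker address are assumptions for illustration.

```java
import java.util.Properties;

// Hypothetical helper that builds a minimal consumer configuration.
final class BasicConsumerSettings {
    static Properties forGroup(String bootstrapServers, String groupId) {
        Properties settings = new Properties();
        settings.put("bootstrap.servers", bootstrapServers);   // ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG
        settings.put("group.id", groupId);                      // ConsumerConfig.GROUP_ID_CONFIG
        settings.put("key.deserializer",                        // ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG
                "org.apache.kafka.common.serialization.StringDeserializer");
        settings.put("value.deserializer",                      // ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG
                "org.apache.kafka.common.serialization.StringDeserializer");
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = forGroup("localhost:9092", "basic-consumer");
        System.out.println(settings.getProperty("group.id"));
    }
}
```

The resulting Properties object is what would be handed to the KafkaConsumer constructor; the group comes into existence on the broker side once such a consumer subscribes and joins it.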

Related

find no consumer in Kafka consumer group but consume is normal

I am using Kafka v0.11.
I have a consumer group with group.id = test, but when I run a command like kafka-consumer-groups --bootstrap-server localhost:9092 --group test --describe, I find no consumer under this group.
However, the service consumes messages normally.
Does anyone know why? Thanks.

How to retrieve Kafka Consumer Configs

I have several consumers that connect to Kafka Cluster that I do not have control over. At the same time, I would like to have visibility into how those consumers are configured.
Is there an API to list all consumers (if there is one for publishers, it is an added benefit) and then read all their configs?
I am talking about these consumer settings:
https://docs.confluent.io/current/installation/configuration/consumer-configs.html#cp-config-consumer
This is not possible as most of those settings are configured at the consumer only and are not pushed to the brokers or any topic.
It's possible however to get a high-level description for a given consumer group:
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group consumer-group

default consumer group id in kafka

I am working with Kafka 2.11 and am fairly new to it. I am trying to understand Kafka consumer groups. I have 3 Spark applications consuming from the same topic, and each of them is receiving all the messages from that topic. As I have not specified any consumer group id in the applications, I'm assuming that Kafka is assigning a distinct consumer group id to each of them.
I need to reset the Kafka offset for one of the applications using the command below. As I don't know the consumer group name of my application, I'm kind of stuck here. Do I need to explicitly assign a group id in the application and then use it in the command below?
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --to-datetime 2017-11-1907:52:43:00:000 --group <group_name> --topic <topic_name> --execute
If this is true, how can I get consumer group id of each application? I can't
A consumer group.id is mandatory whenever the consumer uses subscribe() (group management) or commits offsets to Kafka. If you do not set group.id in that case, you will get an exception. So either you are setting it somewhere in your code, or the framework or library you are using sets it internally. You should always set group.id yourself.
You can get the consumer group ids by using the following command:
bin/kafka-consumer-groups.sh --list --bootstrap-server <kafka-broker-ip>:9092
If you look at the Spark code, you can find the KafkaSourceProvider class, which is responsible for the Kafka source reader. There you can see that a random group.id is generated:
private[kafka010] class KafkaSourceProvider extends DataSourceRegister
  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source = {
    validateStreamOptions(parameters)
    // Each running query should use its own group id. Otherwise, the query may be only assigned
    // partial data since Kafka will assign partitions to multiple consumers having the same group
    // id. Hence, we should generate a unique id for each query.
    val uniqueGroupId = s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
    ...
  }
You can search for group ids with the spark-kafka-source prefix, but because the rest of the name is random you cannot easily tell which group.id belongs to which application.
To find all consumer group ids, you can use the following command:
./kafka-consumer-groups.sh --bootstrap-server KAFKA_ADDRESS --list
To check a consumer group's offsets, you can use the following command:
./kafka-consumer-groups.sh --bootstrap-server KAFKA_ADDRESS --group=GROUP_ID --describe
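To make the naming scheme concrete, here is a small Java sketch that mirrors the format string from the Spark snippet above and filters a list of group names (such as the output of --list) by that prefix. The class and method names are hypothetical, and the group names in main are sample values rather than real output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Illustrative helper, not part of Spark or Kafka.
final class SparkGroupIds {
    static final String PREFIX = "spark-kafka-source-";

    // Same shape as the Scala string: prefix, random UUID, hash of the metadata path.
    static String uniqueGroupId(String metadataPath) {
        return PREFIX + UUID.randomUUID() + "-" + metadataPath.hashCode();
    }

    // Keeps only the group names that look like Spark-generated ones.
    static List<String> sparkGroups(List<String> allGroups) {
        List<String> result = new ArrayList<>();
        for (String group : allGroups) {
            if (group.startsWith(PREFIX)) {
                result.add(group);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> listed = Arrays.asList(
                "test-consumer-group", uniqueGroupId("/tmp/checkpoint/sources/0"), "console-consumer-67807");
        System.out.println(sparkGroups(listed));
    }
}
```

Since the UUID part changes with every query start, the filtered list shows that Spark groups exist, but not which application owns which one.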
"As I have not mentioned any consumer group id in applications I'm assuming that Kafka is assigning some distinct consumer group id to each of them"
The Kafka brokers don't assign consumer group names to consumers connected to them.
When a consumer connects, subscribing to a topic, it "joins" a group.
If you are using a Spark application without specifying any consumer group, it means that the library or framework you are using to connect to Kafka from Spark is assigning the consumer group names itself.

How to run two console consumers in the same consumer group?

When I run two instances of kafka-console-consumer with the exact same properties (using the default config/consumer.properties), I get the same messages on both instances.
./bin/kafka-console-consumer.sh --bootstrap-server :9092 --topic test1
If both the instances have the same consumer group id, shouldn't Kafka send a given message to only one of the consumers? How to run them as one consumer group?
From the Kafka docs, I found this:
The default for console consumer's enable.auto.commit property when no group.id is provided is now set to false. This is to avoid polluting the consumer coordinator cache as the auto-generated group is not likely to be used by other consumers.
Here is the trick: use the command below to list all consumer groups across all topics. As you described, I opened four console consumers and wanted to check which consumer groups were consuming from the topic:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
Every console consumer starts with a different group id, which is also why each of them consumes from the beginning whenever --from-beginning is added. The output of the command shows one auto-generated group per console consumer:
Note: This will not show information about old Zookeeper-based consumers.
console-consumer-66835
console-consumer-38647
console-consumer-18983
console-consumer-18365
console-consumer-96734
The easiest way to set group.id for the console consumer is:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --consumer-property group.id=test1
Read up on Managing Consumer Groups.
The trick is to use --consumer.config config/consumer.properties or --consumer-property group.id=test1, either of which specifies the group.id explicitly.
./bin/kafka-console-consumer.sh \
--bootstrap-server localhost:9092 \
--topic test1 \
--consumer.config config/consumer.properties
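Why the group id decides who sees which message can be illustrated with a toy model. The sketch below is not Kafka's real partition assignor; it only assumes that every group receives all partitions of the topic and that the partitions are spread over the members of a group, here in simple round-robin order. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of partition sharing within and across consumer groups.
final class GroupAssignmentModel {
    // consumerToGroup maps a consumer name to its group.id (iterated in insertion order).
    // Returns, per consumer, the partitions it would read in this simplified model.
    static Map<String, List<Integer>> assign(Map<String, String> consumerToGroup, int partitionCount) {
        Map<String, List<String>> membersByGroup = new LinkedHashMap<>();
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : consumerToGroup.entrySet()) {
            membersByGroup.computeIfAbsent(entry.getValue(), g -> new ArrayList<>()).add(entry.getKey());
            assignment.put(entry.getKey(), new ArrayList<>());
        }
        // Each group sees every partition; within a group each partition has exactly one reader.
        for (List<String> members : membersByGroup.values()) {
            for (int partition = 0; partition < partitionCount; partition++) {
                assignment.get(members.get(partition % members.size())).add(partition);
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, String> sameGroup = new LinkedHashMap<>();
        sameGroup.put("consumer-a", "test1");
        sameGroup.put("consumer-b", "test1");
        System.out.println(assign(sameGroup, 2)); // {consumer-a=[0], consumer-b=[1]}

        Map<String, String> separateGroups = new LinkedHashMap<>();
        separateGroups.put("consumer-a", "console-consumer-66835");
        separateGroups.put("consumer-b", "console-consumer-38647");
        System.out.println(assign(separateGroups, 2)); // {consumer-a=[0, 1], consumer-b=[0, 1]}
    }
}
```

In the second case both consumers read every partition, which matches the behaviour described in the question. In the model, a topic with a single partition leaves the second consumer of a shared group without any partition.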

how to get the all messages in a topic from kafka server

I would like to get all the messages in a topic, from the beginning, from the server.
Example:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testTopic --from-beginning
With the console command above I can get all messages in a topic from the beginning, but I have not been able to consume all the messages in a topic from the beginning using Java code.
You can get all messages using the following command:
cd Users/kv/kafka/bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic topicName --from-beginning --max-messages 100
The easiest way would be to start a consumer and drain all the messages. I don't know how many partitions you have in your topic or whether you already have an existing consumer group, but you have a few options:
Have a look at this API: https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
1) If you already have a consumer in the same consumer group and still want to start consuming from the beginning, use the seek option listed in the API doc and set the offset to 0 (or call seekToBeginning) for each consumer in the group. This starts consumption from the beginning.
2) Otherwise, you can start a few consumers in a new consumer group & you would not have to worry about seek.
PS: Please remember to provide more details about your setup in the future if you have more questions on Kafka. A lot of things depend on how you have configured your infrastructure & how you would prefer it to be and would thus vary from case to case.
TopicPartition topicPartition = new TopicPartition(topic, 0);
List<TopicPartition> partitions = Arrays.asList(topicPartition);
consumer.assign(partitions);
consumer.seekToBeginning(partitions);
Just change the consumer group, so that no committed offsets exist for it yet:
ConsumerConfig.GROUP_ID_CONFIG - to a new group id
and set
ConsumerConfig.AUTO_OFFSET_RESET_CONFIG - earliest
Sample code:
props.put(ConsumerConfig.GROUP_ID_CONFIG, "newID");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
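The reason a new group id is needed can be sketched as a simple decision rule: auto.offset.reset only matters when the group has no committed offset for a partition. The model below is a simplification with hypothetical names; it ignores cases such as a committed offset that has fallen out of range.

```java
// Toy model of where a consumer starts reading a partition.
final class StartingOffsetModel {
    // committedOffset is null when the group has never committed for this partition.
    static long startingOffset(Long committedOffset, String autoOffsetReset, long logStart, long logEnd) {
        if (committedOffset != null) {
            return committedOffset; // an existing group resumes where it left off
        }
        switch (autoOffsetReset) {
            case "earliest":
                return logStart;    // oldest record still retained
            case "latest":
                return logEnd;      // only records produced from now on
            default:
                throw new IllegalStateException(
                        "no committed offset and auto.offset.reset=" + autoOffsetReset);
        }
    }

    public static void main(String[] args) {
        System.out.println(startingOffset(null, "earliest", 0L, 42L)); // new group: 0
        System.out.println(startingOffset(17L, "earliest", 0L, 42L));  // existing group: 17
    }
}
```

This is why setting earliest on a group that has already committed offsets does not replay the topic, while a fresh group id does.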