Two Kafka consumers in the same group and one partition - apache-kafka

I have a Kafka topic with one partition. When I created the consumers I missed setting the group.id explicitly, and there are now 2 different consumers for the same partition.
I can receive messages, and my Flink source extends FlinkKafkaConsumer011.
Ideally only one consumer should receive each message since they are in the same group, but in my case both consumers are receiving messages and I am not sure why.
To test, I restarted one of my jobs, and the newly started consumer does not pick up where it left off, because the other consumer is committing offsets, it seems.
Flink 1.6.0
Kafka 0.11.2
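Worth noting: Flink's Kafka connector assigns topic partitions to its parallel source instances itself and uses group.id only for committing offsets, not for partition assignment, so two separate Flink jobs with the same group.id will each read all partitions. For reference, a minimal sketch of how such a source is typically wired up with group.id set explicitly (topic name, group name, and broker address here are hypothetical):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class KafkaSourceJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Set group.id explicitly; Flink uses it for offset commits only,
        // not for deciding which instance reads which partition.
        props.setProperty("group.id", "my-flink-group");

        env.addSource(new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), props))
           .print();

        env.execute("kafka-source-job");
    }
}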

Related

How does Kafka commit work in a consumer group?

I'm very new to Apache Kafka, so I'm facing some issues.
If one consumer has auto-commit enabled and that machine has consumed data, then other new consumers in the group do not receive those messages.
I have 3 consumers and 1 producer. Every consumer is running on a different machine. I consumed messages from a single machine with auto-commit enabled for testing purposes. After that, when I started the other two consumer machines, they did not receive those messages.
Since the other two consumers did not receive those messages, I want those messages to be delivered to these two consumers.
So, how does Kafka commit work for a consumer group?

Set offset for an unconnected kafka topic in a consumer group without stopping consumers

We have a single consumer group set up for the continuous migration of multiple topics. There is a service running which connects a new consumer instance to each topic. We can disable the consumers for single topics while keeping the others running.
Sometimes we have to reset the offsets for a specific topic so that the migration starts over. Is there a way to get around having to disable all of the consumers to do so? Since offsets are kept per topic within a consumer group, I don't see why we have to disable the connections to all of the other topics. If this is not possible, what is the benefit of reading multiple topics with a single consumer group?
Example:
service A -> consumer with group "migration" -> consumes topic A
service A -> consumer with group "migration" -> consumer for topic B is stopped
service A -> consumer with group "migration" -> consumes topic C
set offset for group "migration" for topic B
Error: Assignments can only be reset if the group 'migration' is inactive, but the current state is Stable.
Each member of a consumer group is assigned topic-partitions, so as long as the group is up (continuously polling/committing for its assigned partitions) you cannot reset offsets.
But
there is a solution that allows you to do it without service interruption (stopping consumers).
First, switch your partition assignor to CooperativeStickyAssignor:
partition.assignment.strategy:
[RangeAssignor,CooperativeStickyAssignor]
Restart all application instances one by one.
Then remove the RangeAssignor:
partition.assignment.strategy:
[CooperativeStickyAssignor]
Restart all application instances one by one again.
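As a minimal sketch, the two phases correspond to consumer configuration roughly like this (shown with the plain Java consumer; the class and method names here are made up for illustration, only the Kafka config key and assignor classes are real):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.RangeAssignor;

public class AssignorSwitch {
    // Phase 1: list both assignors, then rolling-restart every instance.
    static void phaseOne(Properties props) {
        props.setProperty(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                RangeAssignor.class.getName() + "," + CooperativeStickyAssignor.class.getName());
    }

    // Phase 2: keep only the cooperative assignor, then rolling-restart again.
    static void phaseTwo(Properties props) {
        props.setProperty(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());
    }
}

The intermediate step matters because all members of the group must share a common assignment protocol at every point during the rolling restart.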
After this change, you can now:
Take the topic (the one whose offsets you'd like to reset) off the consumer's topic list.
Build and redeploy your app.
=> The cooperative assignor will revoke/remove the partitions of this topic from the assignment, without stopping consumption of the other topics.
Reset your offsets with:
kafka-consumer-groups --bootstrap-server xxxx --group GROUP_ID_NAME --topic TOPIC_NAME --reset-offsets --to-xxx ... --execute
Then put the topic back in your consumer config, and build and redeploy it.
Pay attention: this feature is only available from kafka-client version 2.4.
More details here : https://www.confluent.io/fr-fr/blog/cooperative-rebalancing-in-kafka-streams-consumer-ksqldb/

Flink Kafka - Flink job not sending messages to different partitions

I have the below configuration:
One Kafka topic with 2 partitions
One ZooKeeper instance
One Kafka instance
Two consumers with the same group id
Flink job snippet:
speStream.addSink(new FlinkKafkaProducer011<>(
        kafkaTopicName, new SimpleStringSchema(), props));
Scenario 1:
I have written a Flink job (producer) in Eclipse which reads a file from a folder and puts the messages on the Kafka topic.
When I run this code from Eclipse, it works fine.
For example: if I place a file with 100 records, Flink sends a few messages to partition 1 and a few to partition 2, and hence both consumers get some of the messages.
Scenario 2:
When I create a jar of the above code and run it on the Flink server, Flink sends all the messages to a single partition and hence only one consumer gets all the messages.
I want the behaviour of scenario 1 using the jar created in scenario 2.
For Flink-Kafka producers, add null as the last parameter:
speStream.addSink(new FlinkKafkaProducer011<>(
    kafkaTopicName,
    new SimpleStringSchema(),
    props,
    (FlinkKafkaPartitioner<String>) null));
The short explanation is that this stops Flink from using its default partitioner, FlinkFixedPartitioner. With it turned off, Kafka distributes the data among its partitions as it sees fit. If it is NOT turned off, each parallelism/task slot used by the sink's FlinkKafkaProducer will write to only one partition.
If you do not provide a FlinkKafkaPartitioner and do not explicitly say to use Kafka's own partitioner, a FlinkFixedPartitioner will be used, meaning that all events from one task end up in the same partition.
To use Kafka's partitioner, use this ctor:
speStream.addSink(new FlinkKafkaProducer011<>(kafkaTopicName, new SimpleStringSchema(), props, Optional.empty()));
The difference between running from the IDE and on the Flink server is probably due to different parallelism or partitioning setups within Flink.

Kafka consumer behaviour

I was writing a Kafka consumer and I have a query related to consumer processes.
I have a consumer with groupId="testGroupId", and using the same groupId I consume from multiple topics, say "topic1" and "topic2".
Also, assume "topic1" is already created on the broker whereas "topic2" is not yet created.
Now if I start the consumer, I see consumer threads for "topic1" (which is already created) in the ZooKeeper nodes, but I do not see any consumer thread(s) for "topic2".
My question is: will the consumer thread(s) for "topic2" be created only after we create the topic on the broker?
I assume you use a Kafka ConsumerConnector method like createMessageStreamsByFilter. The consumer subscribes to Kafka topic events, and when a new topic matching the filter appears it will subscribe to that topic automatically.
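That is the old high-level consumer API. For what it's worth, recent Kafka clients offer the same behaviour through pattern subscription on the modern KafkaConsumer, which also picks up matching topics created after the subscription; a minimal sketch (broker address and regex are hypothetical):

import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PatternSubscribe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "testGroupId");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Pattern subscription: topics matching the regex are picked up
            // automatically, including ones created later ("topic2" in the
            // question). New topics are discovered on the next metadata
            // refresh (metadata.max.age.ms).
            consumer.subscribe(Pattern.compile("topic[12]"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("%s-%d@%d: %s%n", r.topic(), r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}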

Messages sent to all consumers with the same consumer group name

There is the following consumer code:
from kafka.client import KafkaClient
from kafka.consumer import SimpleConsumer

kafka = KafkaClient("localhost", 9092)
consumer = SimpleConsumer(kafka, "my-group", "my-topic")
consumer.seek(0, 2)

for message in consumer:
    print message

kafka.close()
Then I produce messages with the script:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic
The thing is that when I start the consumers as two different processes, I receive each new message in both processes. However, I want each message to be delivered to only one consumer, not broadcast.
In the Kafka documentation (https://kafka.apache.org/documentation.html) it is written:
If all the consumer instances have the same consumer group, then this
works just like a traditional queue balancing load over the consumers.
I see that the group for these consumers is the same - my-group.
How can I make it so that each new message is read by exactly one consumer instead of being broadcast?
The consumer-group API was not officially supported until Kafka v0.8.1 (released March 12, 2014). For earlier server versions, consumer groups do not work correctly. And as of this post, the kafka-python library does not attempt to send group offset data:
https://github.com/mumrah/kafka-python/blob/c9d9d0aad2447bb8bad0e62c97365e5101001e4b/kafka/consumer.py#L108-L115
It's hard to tell from the example above what your ZooKeeper configuration is, or if there is one at all. You'll need a ZooKeeper cluster for the consumer-group information to be persisted with respect to which consumer within each group has consumed to a given offset.
A solid example is here:
Official Kafka documentation - Consumer Group Example
This should not happen - make sure that both consumers are registered under the same consumer group in the ZooKeeper znodes. Each message on a topic should be consumed by a consumer group exactly once, so only one consumer out of the group should receive the message, not what you are experiencing. What version of Kafka are you using?
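As an aside, here is a minimal sketch of the queue-like behaviour the docs describe, using the modern Java KafkaConsumer for contrast with the old SimpleConsumer above (topic, group, and broker address are hypothetical). Start two copies of this process and, provided the topic has at least two partitions, each message is delivered to only one of them:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Shared group.id: the topic's partitions are divided among all running
        // instances, so each record is delivered to exactly one of them.
        props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.println(r.value());
                }
            }
        }
    }
}

Note that this queue-style balancing only spreads load if the topic has at least as many partitions as consumers; with a single partition, one consumer in the group receives everything.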