Using Kafka & Zookeeper, Different group id does not retrieve Kafka msgs from the beginning - apache-zookeeper

We have Kafka v2.10 and ZooKeeper v3.4 set up and working. We have written high-level consumers that consume log messages from Kafka. Consumer A starts up consuming messages for topic T with group id G1 (following the high-level consumer example in the Apache Kafka documentation).
Then when consumer B starts up with the same topic T but group id G2, it connects to Kafka/ZooKeeper but consumes log messages starting at the offset after the last one consumed by consumer A.
My understanding is that it should be given log messages starting from the lowest offset available in Kafka for that topic. Any idea why it's not doing that?
We are not replicating Kafka or ZooKeeper yet. Our setup at this point is simple and straightforward, and we are just trying to get basic functionality working.
Any help is appreciated.
Also, do you know where we can find the new directories that ZooKeeper supposedly creates every time a consumer with a new group id connects (for tracking the offset for that group id)?

Can you try adding this to the configuration when creating the consumer group? The default for auto.offset.reset is "largest", which is why a brand-new group id starts at the tail of the log rather than at the beginning:
props.put("auto.offset.reset", "smallest");
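For context, a minimal sketch of such a configuration (the ZooKeeper address and group id are assumptions, not from the original post):

```java
import java.util.Properties;

public class ConsumerConfigExample {
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // assumed address
        props.put("group.id", "G2");                      // the new group from the question
        // With the old high-level consumer, "smallest" makes a group that has
        // no committed offsets start from the earliest available offset.
        // (Newer clients use auto.offset.reset=earliest for the same effect.)
        props.put("auto.offset.reset", "smallest");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig().getProperty("auto.offset.reset"));
    }
}
```

Note that auto.offset.reset only applies when the group has no committed offset yet; once G2 commits offsets, restarts resume from those commits.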

Related

Kafka consumer group's offset stuck for one topic

I have an application that uses fs2-kafka to read business events from a Kafka cluster. The application has multiple fs2-kafka consumers, each subscribed to a different topic. But one of the consumers seems to be stuck: it does not consume any events.
Checking the consumer group's offsets yielded the following results:
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
consumer topic 0 - 5 - consumer-consumer-1-99c1c19a-faaf-40e6-a3dc-75b7d04e96f9 /10.0.3.2 consumer-consumer-1
(edited slightly for privacy)
I have also managed to get the CURRENT-OFFSET to 1 (though it seems no actual consuming happened, because none of my logs were triggered), but regardless, the group does not seem to want to move its offset.
The topic has just one partition, and there's only one consumer/consumer group reading from that topic. There is no reason I can see for Kafka to hold consumers back from consuming. If it matters: that topic, like every other topic in this cluster, is created automatically via Kafka's "AUTO_CREATE_TOPICS". (This is a dev environment; it's simply more convenient than creating topics by hand.)
The strangest thing is this: the same code works on a different topic. Also, as is always the case with these things, the issue does not reproduce on my laptop. There are barely any differences between my local Kafka and the Kafka in our dev cluster.
Originally, I had just one consumer group for the entire application. I have now tried multiple consumer groups per consumer and even sharing a consumer for reading from multiple topics. The only topic that's stuck is this one, every other topic works.
I have also tried:
Restarting kafka and the app, updating kafka to a newer version
Manually resetting the consumer group's offsets
Deleting the topic
Apart from deleting all of Kafka's data, I believe I have tried everything on my side and Kafka's.

Kafka consumer: how to check if all the messages in the topic partition are completely consumed?

Is there any API or attribute that can be used or compared to determine whether all messages in a topic partition have been consumed? We are working on a test that will use another consumer in the same consumer group to check whether the topic partition still has any messages. One of our app services also uses Kafka to process internal events. So is there any way to sync the progress of message consumption?
Yes, you can use the admin API.
From the admin API you can get the topic's end offsets for each partition, and a given consumer group's committed offsets. If all messages have been read, subtracting the latter from the former will evaluate to 0 for all partitions.
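As a sketch of the arithmetic this answer describes, assuming the two per-partition offset maps have already been fetched (in modern clients they would come from `AdminClient.listOffsets` and `AdminClient.listConsumerGroupOffsets`; the maps below are illustrative stand-ins for those results):

```java
import java.util.HashMap;
import java.util.Map;

public class LagCheck {
    // The group is fully caught up when end offset minus committed offset
    // is zero for every partition. A partition with no committed offset
    // is treated as offset 0 (i.e., nothing consumed yet).
    static boolean fullyConsumed(Map<Integer, Long> endOffsets,
                                 Map<Integer, Long> committed) {
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long c = committed.getOrDefault(e.getKey(), 0L);
            if (e.getValue() - c > 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Long> end = new HashMap<>();
        end.put(0, 5L);
        end.put(1, 3L);
        Map<Integer, Long> cur = new HashMap<>();
        cur.put(0, 5L);
        cur.put(1, 3L);
        System.out.println(fullyConsumed(end, cur)); // true: lag is 0 everywhere
        cur.put(1, 2L);
        System.out.println(fullyConsumed(end, cur)); // false: partition 1 lags by 1
    }
}
```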

Should Kafka consumers be started before producers?

When I produce some messages with the Kafka console producer and then start a consumer, I do not receive those messages.
However, I do receive messages produced after the consumer has been started. Should Kafka consumers be started before producers?
--from-beginning seems to give all messages, including ones that were already consumed.
Please help me with this at both the console level and with a Java client example, for starting the producer first and then consuming by starting a consumer.
Kafka stores messages for a configurable amount of time; the default is a week. Consumers do not need to be "available" to receive messages, but they do need to know where they should start reading from.
The console consumer defaults to looking at the latest offset for all partitions, so if you're not actively producing data you see nothing as a consumer. You can specify a group flag for the console consumer or a Java client; the group is what tracks which offsets have been read within the Kafka protocol, and where a read request will resume from if you stop a consumer in that group.
Otherwise, I think you can only give an offset along with a single partition to consume from.
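For the Java client side of the question, a minimal sketch of the relevant consumer configuration (broker address and group name are assumptions; with the old high-level consumer the equivalent setting is auto.offset.reset=smallest):

```java
import java.util.Properties;

public class FromBeginningConfig {
    // With a group id and auto.offset.reset=earliest, a modern Java consumer
    // that has no committed offsets starts from the beginning of the topic,
    // so messages produced before the consumer started are still delivered.
    // On restart, the same group resumes from its last committed offset.
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("group.id", "console-example");         // assumed group name
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("auto.offset.reset"));
    }
}
```

The console-level equivalent is passing --from-beginning (read everything once) or --group (track offsets and resume) to the console consumer.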

Kafka consumer group.id multiple messages

I have developed a Kafka consumer, and there will be multiple instances of this consumer running in production. I know how to use group.id so as not to duplicate the processing of data. Is there a way to have all the consumers receive the message, but send one consumer a leader bit?
Is there a way to have a group.id per topic, or even per key within a topic?
Looks like this has nothing to do with Kafka. You already know that by providing a unique group.id for each consumer, all consumer instances will get all messages from the topic. As far as the push to the DB is concerned, you can factor out that logic and use a distributed lock so that the push-to-DB part of your application is only executed by one of the consumers. Is this a Java-based setup?
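If the goal is for every instance to receive every message, one way to give each instance its own group is sketched below (the base group name and broker address are illustrative assumptions; the leader-election / distributed-lock part is not shown):

```java
import java.util.Properties;
import java.util.UUID;

public class BroadcastConsumerConfig {
    // Deriving a unique group.id per process means every instance forms its
    // own consumer group, so each instance independently sees all messages.
    static Properties build(String baseGroup) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("group.id", baseGroup + "-" + UUID.randomUUID());
        return props;
    }

    public static void main(String[] args) {
        String g1 = build("cache-loader").getProperty("group.id");
        String g2 = build("cache-loader").getProperty("group.id");
        System.out.println(!g1.equals(g2)); // two instances get distinct groups
    }
}
```

The trade-off is that such ephemeral groups accumulate committed offsets that are never reused, which is usually acceptable for broadcast-style consumers.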

How to find out the latest offset of a Kafka topic to know when my reader is up-to-date with topic?

I have a server that needs to keep an in-memory cache of all users. So, assuming that the list won't be big (a couple hundred thousand items), I'd like to use a Kafka topic with keyed messages, where the key is a userId, to keep the current state of that list; the admin application will send a new user object to that topic when something changes. So when the server starts, it simply needs to read everything from that topic from the beginning and populate its cache.
The population phase takes about 20-30 seconds depending on the connection to Kafka, so the server must not come online until it has read everything from the topic and has an up-to-date cache (all the messages in the topic at the moment of start are considered up-to-date). But I don't see how to determine whether I have read everything from the Kafka stream, so that I can notify other services that the cache is populated and the server can start serving requests. I've read about the high watermark but don't see it exposed in the Java consumer API.
So how do I find out the latest offset of a Kafka topic, to know when my reader is up-to-date?
Assuming you are using the high-level consumer.
The high watermark is not available in the high-level consumer.
**As you mentioned: all the messages in the topic at the moment of start are considered up-to-date**
When your application starts, you can do the following using the SimpleConsumer API:
Find the number of partitions in the topic by issuing a TopicMetadataRequest to any broker in the Kafka cluster.
Create a partition-to-latestOffset map, where the key is the partition and the value is the latest offset available in that partition (offsets are longs):
Map<Integer, Long> offsetMap = new HashMap<>();
For each partition p in the topic:
A. Find the leader of partition p
B. Send an OffsetRequest to the leader
C. Get the latestOffset from the OffsetResponse
D. Add an entry to offsetMap where the key is partition p and the value is latestOffset
Then start reading messages from Kafka using the high-level consumer:
A. For each message you get from the KafkaStream:
AA. Get the partition and offset of the message
BB. if (offset >= offsetMap.get(partition) - 1) stop reading from this stream
(latestOffset is the offset the next message would receive, so the last existing message in a partition has offset latestOffset - 1.)
Hope this helps.
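The stop condition in step BB can be sketched as plain bookkeeping, independent of the Kafka client (this assumes every partition holds at least one message, and treats latestOffset as the log-end offset, i.e. the offset the next message would get):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CatchUpTracker {
    private final Map<Integer, Long> latest;          // partition -> log-end offset
    private final Set<Integer> caughtUp = new HashSet<>();

    CatchUpTracker(Map<Integer, Long> latestOffsets) {
        this.latest = latestOffsets;
    }

    // A partition is caught up once we have seen offset latestOffset - 1,
    // the last message that existed when the snapshot of offsets was taken.
    void onMessage(int partition, long offset) {
        if (offset >= latest.get(partition) - 1) caughtUp.add(partition);
    }

    // The cache is populated once every partition has been caught up.
    boolean cachePopulated() {
        return caughtUp.size() == latest.size();
    }

    public static void main(String[] args) {
        Map<Integer, Long> latest = new HashMap<>();
        latest.put(0, 3L); // partition 0 holds offsets 0..2
        CatchUpTracker t = new CatchUpTracker(latest);
        t.onMessage(0, 0);
        t.onMessage(0, 1);
        System.out.println(t.cachePopulated()); // false: offset 2 not seen yet
        t.onMessage(0, 2);
        System.out.println(t.cachePopulated()); // true: all of partition 0 read
    }
}
```

Once cachePopulated() returns true, the server can signal other services that it is ready to serve requests.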