Springboot kafka consumer not receiving messages after another has already received the message - apache-kafka

I am new to KAFKA and would like help
I have 2 applications (springboot), they are identical/copies only with different ports.
http://localhost:8080/
http://localhost:8081/
They are both consumers
the two listen to the topic XXX
I have another APP that plays the role of producer.
whenever I send something to topic XXX.
only one of them consumes the message and the other does not.
I have tested both individually and they listen normally if they are listening alone, but if they are listening together only one of them will listen.
I'm using
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-stream-binder-kafka</artifactId>
<version>3.0.0.RELEASE</version>
</dependency>
spring.cloud.stream.kafka.binder.autoCreateTopics=true
spring.cloud.stream.kafka.binder.headers=type
spring.cloud.stream.kafka.default.consumer.ackEachRecord=true
spring.cloud.stream.kafka.default.consumer.enableDlq=true
spring.cloud.stream.kafka.default.consumer.standardHeaders=both
spring.cloud.stream.kafka.default.consumer.dlqName=api***.d**.api***
my listener
#StreamListener(target=SomeString.TOPIC, condition = "headers['type']=='***' or headers['type']=='***'")
public void handle(GenericMessage<String> message) throws BusinessException {
***
}

The Apache Kafka Binder implementation maps each destination to an
Apache Kafka topic. The consumer group maps directly to the same
Apache Kafka concept. Partitioning also maps directly to Apache Kafka
partitions as well.
check kafka consumer group behavior -> https://kafka.apache.org/documentation/#consumerconfigs_group.id
if kafka consumers have same groupId just one of listen message, you should give them different groupId
app in 8080 -> spring.cloud.stream.bindings.<channelName>.group=consumer-group-1
app in 8081 -> spring.cloud.stream.bindings.<channelName>.group=consumer-group-2

Related

How does a consumer know it is no longer listed in the Kafka cluster?

We have this issue that when Kafka brokers must be taken offline, no consumer service has any idea about that and keeps running.
We tried listing consumers in the new Kafka instance, and saw no existing consumer listed there. All consumers listed are those newly created.
We had to manually terminate all existing consumer services which is not convenient every time we hit this issue.
Question - How does a consumer know it is no longer listed in the Kafka cluster so it should terminate itself?
P.S. We use Spring Kafka.
1 -- To Check Clusters & Replica status ?
Check Kafka cluster all broker status
$ zookeeper-shell.sh localhost:9001 ls /brokers/ids
Check Kafka cluster Specific broker status
$ zookeeper-shell.sh localhost:9001 get /brokers/ids/<id>
specific to replica_unavailability check
$ kafka-check --cluster-type=sample_type replica_unavailability
For first broker check
$ kafka-check --cluster-type=sample_type --broker-id 3 replica_unavailability --first-broker-only
Any partitions replicas not available
$ kafka-check --cluster-type=sample_type replica_unavailability
Checking offline partitions
$ kafka-check --cluster-type=sample_type offline
2 -- Code sample to send/auto-shutdown
2 custom options to do handle the shutdown using a kill-message,
do it gracefully by sending a kill-message before taking down
brokers or topics.
Option 1: Consider an in-band message/signal - i.e. send a “kill” message pertaining to topics/brokers consumer is listening to as it follows the offset order on the topic-partition
Option 2: make the consumer listen to 2 topics for e.g. “topic” and “topic_kill”
The difference between the 2 options above, is that the first version is comes in the the order it was sent, consider that there maybe blocking messages maybe waiting, depending on your implementation, to be consumed before that “kill message”.
While, the second version allows kill-signal to arrive independently without being blocked out of band, this is a nicer & reusable architectural pattern, with a clear separation between data topic and signaling.
Code Sample a) producer sending the kill-message & b) consumer to recieve and handle the shutdown
// Producer -- modify and adapt as needed
import json
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers=['0.0.0.0:<my port number>'],
key_serializer=lambda m: m.encode('utf8'),
value_serializer=lambda m: json.dumps(m).encode('utf8'))
def send_kill(topic: str, partitions: [int]):
for p in partitions:
producer.send(topic, key='kill', partition=p)
producer.flush()
// Consumer to accept a kill-message -- please modify and adapt as needed
import json
from kafka import KafkaConsumer
from kafka.structs import OffsetAndMetadata, TopicPartition
consumer = KafkaConsumer(bootstrap_servers=['0.0.0.0:<my port number>'],
key_deserializer=lambda m: m.decode('utf8'),
value_deserializer=lambda m: json.loads(m.decode('utf8')),
auto_offset_reset="earliest",
group_id='1')
consumer.subscribe(['topic'])
for msg in consumer:
tp = TopicPartition(msg.topic, msg.partition)
offsets = {tp: OffsetAndMetadata(msg.offset, None)}
if msg.key == "kill":
consumer.commit(offsets=offsets)
consumer.unsuscribe()
exit(0)
# do your work...
consumer.commit(offsets=offsets)

How to configure multiple kafka consumer in application.yml file

Actually i have a springboot based micro-service , and i have used kafka to produce/consume data from different system.
Now my question is i have two different topics and based on topics i have two different consumer classes to consume data,
how to define multiple consumer properties in application.yml file ?
I configured for one consumer in application.yml like below :-
spring:
kafka:
consumer:
bootstrapservers: http://199.968.98.101:9092
group-id: groupid-QA-02
auto-offset-reset: latest
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
I am using #KafkaListener in my consumer classes
example of consumer method which i used in code
#KafkaListener(topics = "${app.topic.b2b_tf_ta_req}", groupId = "${app.topic.groupoId}")
public void consume(String message) throws Exception {
}
As far as I know bootstrap-servers accept comma separated list of servers
i.e. if you set it to server1:9092,server2:9092 kafka should connect to all of them

Kafka - how to use #KafkaListener(topicPattern="${kafka.topics}") where property kafka.topics is 'sss.*'?

I'm trying to implement Kafka consumer with topic names as a pattern. E.g. #KafkaListener(topicPattern="${kafka.topics}") where property kafka.topics is 'sss.*'. Now when I send message to topic 'sss.test' or any other topic name like 'sss.xyz', 'sss.pqr', it's throwing error as below:
WARN o.apache.kafka.clients.NetworkClient - Error while fetching metadata with correlation id 12 : {sss.xyz-topic=LEADER_NOT_AVAILABLE}
I tried to enable listeners & advertised.listeners in the server.properties file but when I re-start Kafka it consumes messages from all old topics which were tried. The moment I use new topic name, it throws above error.
Kafka doesn't support pattern matching? Or there's some configuration which I'm missing? Please suggest.

How to recover lost message in kafka consumer

I'm writing an application in Apache camel. I am consume messages from some Kafka topic via camel Kafka component and dumps into database for recovery in case of any crash/restart happens. Below is the camel URI
kafka:?autoCommitEnable=false&groupId=r&keySerializerClass=org.apache.kafka.common.serialization.StringSerializer&serializerClass=org.apache.kafka.common.serialization.StringSerializer&topic=
My use case is - I have consumed some message(s) from Kafka but could not dumped the same into the database for recovery and crash happens.Now how to get the all the lost messages with the same consumer group ID after restarting the application ?
Thanks
Now how to get the all the lost messages with the same consumer group ID after restarting the application ?
Actually kafka store the consumer offset for you, If you do commit offset in your application. So when you restart the application, It will consume message from the last offset stored in kafka.
You could set the AutocommitEnable=true OR
I also found this https://github.com/apache/camel/blob/camel-2.18.2/components/camel-kafka/src/main/java/org/apache/camel/component/kafka/KafkaConsumer.java.
There are some piece code :
if (endpoint.getConfiguration().isAutoCommitEnable() != null
&& !endpoint.getConfiguration().isAutoCommitEnable()) {
long partitionLastoffset = partitionRecords.get(partitionRecords.size() - 1).offset();
consumer.commitSync(Collections.singletonMap(
partition, new OffsetAndMetadata(partitionLastoffset + 1)));
}
The Camel will take care of this even you do not set the AutocommitEnable.

flink kafka consumer groupId not working

I am using kafka with flink.
In a simple program, I used flinks FlinkKafkaConsumer09, assigned the group id to it.
According to Kafka's behavior, when I run 2 consumers on the same topic with same group.Id, it should work like a message queue. I think it's supposed to work like:
If 2 messages sent to Kafka, each or one of the flink program would process the 2 messages totally twice(let's say 2 lines of output in total).
But the actual result is that, each program would receive 2 pieces of the messages.
I have tried to use consumer client that came with the kafka server download. It worked in the documented way(2 messages processed).
I tried to use 2 kafka consumers in the same Main function of a flink programe. 4 messages processed totally.
I also tried to run 2 instances of flink, and assigned each one of them the same program of kafka consumer. 4 messages.
Any ideas?
This is the output I expect:
1> Kafka and Flink2 says: element-65
2> Kafka and Flink1 says: element-66
Here's the wrong output i always get:
1> Kafka and Flink2 says: element-65
1> Kafka and Flink1 says: element-65
2> Kafka and Flink2 says: element-66
2> Kafka and Flink1 says: element-66
And here is the segment of code:
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
ParameterTool parameterTool = ParameterTool.fromArgs(args);
DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer09<>(parameterTool.getRequired("topic"), new SimpleStringSchema(), parameterTool.getProperties()));
messageStream.rebalance().map(new MapFunction<String, String>() {
private static final long serialVersionUID = -6867736771747690202L;
#Override
public String map(String value) throws Exception {
return "Kafka and Flink1 says: " + value;
}
}).print();
env.execute();
}
I have tried to run it twice and also in the other way:
create 2 datastreams and env.execute() for each one in the Main function.
There was a quite similar question on the Flink user mailing list today, but I can't find the link to post it here. So here a part of the answer:
"Internally, the Flink Kafka connectors don’t use the consumer group
management functionality because they are using lower-level APIs
(SimpleConsumer in 0.8, and KafkaConsumer#assign(…) in 0.9) on each
parallel instance for more control on individual partition
consumption. So, essentially, the “group.id” setting in the Flink
Kafka connector is only used for committing offsets back to ZK / Kafka
brokers."
Maybe that clarifies things for you.
Also, there is a blog post about working with Flink and Kafka that may help you (https://data-artisans.com/blog/kafka-flink-a-practical-how-to).
Since there is not much use of group.id of flink kafka consumer other than commiting offset to zookeeper. Is there any way of offset monitoring as far as flink kafka consumer is concerned. I could see there is a way [with the help of consumer-groups/consumer-offset-checker] for console consumers but not for flink kafka consumers.
We want to see how our flink kafka consumer is behind/lagging with kafka topic size[total number of messages in topic at given point of time], it is fine to have it at partition level.