Spring Integration - Kafka Outbound Adapter Acknowledge Issue - apache-kafka

Before I post my question, I would like to thank Gary and Artem for helping me resolve my earlier issues; because of that I am able to successfully post messages from JMS to Kafka with a transaction in place.
Now I am facing another issue while testing what happens when my Kafka is down.
For the first few retries while Kafka is down, the Kafka outbound adapter throws an exception and the messages are returned to JMS and retried again and again.
However, after a couple of retries, even though Kafka is still down, the messages are dequeued from JMS and I get the following exception:
2017-07-10 23:27:51.117 ERROR 16116 --- [enerContainer-1] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='null' and payload='Test JPMC' to topic test:
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
My integration XML is:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:jms="http://www.springframework.org/schema/integration/jms"
xmlns:integration="http://www.springframework.org/schema/integration"
xmlns:int-kafka="http://www.springframework.org/schema/integration/kafka"
xmlns:task="http://www.springframework.org/schema/task"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/integration/jms
http://www.springframework.org/schema/integration/jms/spring-integration-jms.xsd
http://www.springframework.org/schema/integration/kafka
http://www.springframework.org/schema/integration/kafka/spring-integration-kafka.xsd">
<jms:message-driven-channel-adapter
id="helloJMSAdapater" container="requestListenerContainer"
channel="helloChannel" extract-payload="true" error-channel="errorChannel"/>
<integration:recipient-list-router
input-channel="errorChannel">
<integration:recipient channel="errorOutputChannel" />
<integration:recipient channel="rethrowChannel" />
</integration:recipient-list-router>
<jms:outbound-channel-adapter id="errorQueueChannelAdapter"
channel="errorOutputChannel" destination="errorQueue" connection-factory="jmsConnectionfactory"
delivery-persistent="true" explicit-qos-enabled="true" />
<int-kafka:outbound-channel-adapter
id="kafkaOutboundChannelAdapter" kafka-template="kafkaTemplate"
auto-startup="true" sync="true" channel="inputToKafka" topic="test">
</int-kafka:outbound-channel-adapter>
</beans>
I don't want to acknowledge the JMS messages unless they are successfully posted to Kafka.
Is this because of some default parameter that Kafka is setting?
My Kafka config is below:
@Configuration
@Component
public class KafkaConfig {

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // this.brokerAddress
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        // set more properties
        return new DefaultKafkaProducerFactory<>(props);
    }
}

That isn't a Kafka problem. If your message is "dequeued from JMS", make sure the redelivery policy on the queue is configured for infinite redeliveries.
For example, the ActiveMQ story is here: http://activemq.apache.org/redelivery-policy.html
maximumRedeliveries (default 6): sets the maximum number of times a message will be redelivered before it is considered a poison pill and returned to the broker so it can go to a Dead Letter Queue.
Set to -1 for unlimited redeliveries.
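To make that concrete, here is a minimal sketch (assuming ActiveMQ is the JMS broker behind jmsConnectionfactory; the broker URL is illustrative) of configuring the connection factory so redeliveries never give up and the message stays on the queue while Kafka is down:

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.RedeliveryPolicy;

public class JmsRedeliveryConfig {

    public ActiveMQConnectionFactory jmsConnectionfactory() {
        ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
        RedeliveryPolicy policy = cf.getRedeliveryPolicy();
        policy.setMaximumRedeliveries(-1); // -1 = unlimited redeliveries; the message is never sent to the DLQ
        cf.setRedeliveryPolicy(policy);
        return cf;
    }
}

With unlimited redeliveries, the JMS message keeps rolling back onto the queue until the Kafka send finally succeeds.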

Related

Spring Integration Kafka - Exception Handling by using SeekToCurrentErrorHandler

Problem statement:
Need to handle exceptions that occur while consuming messages from Kafka:
Commit the failed offset.
Seek to the next unprocessed offset, so that the next poll starts from that offset.
It seems all of this is handled by SeekToCurrentErrorHandler.java in Spring Kafka.
How can this functionality be leveraged in Spring Integration Kafka?
Please help with this.
Versions used:
Spring Integration Kafka - 3.3.1
Spring for Apache Kafka - 2.5.x
#Bean(name ="kafkaConsumerFactory")
public ConsumerFactory consumerFactory0(
HashMap<String, String> properties = new HashMap<>();
properties.put("bootstrap.servers", "kafkaServerl");
properties.put("key.deserializer", StringDeserializer.class);
properties.put("value.deserializer", StringDeserializer.class);
properties.put("auto.offset.reset", "earliest");
} return new DefaultKafkaConsumerFactoryo(properties); I
#Bean("customKafkalistenerContainer")
public ConcurrentMessagelistenerContainerCtring, AddAccountReqRes> customKafkaListenerContainer() (
ContainerProperties containerProps = new ContainerProperties("Topici");
containerProps.setGroupld("Groupldl");
return (ConcurrentMessagelistenerContainerCtring, CustomReqRes>) new ConcurrentMessageListenerContainer<>(
} kafkaConsumerFactory, containerProps);
IntegrationFlows.from(Kafka.messageDrivenChannelAdapter(customKafkalistenerContainer, KafkaMessageDrivenChannelAdapter.ListenerMode.record)
.errorChannel(errorChannel()))
.handle(transformationProcessor, "process")
.channel("someChannel")
.get();
spring-integration-kafka uses spring-kafka underneath, so you just need to configure the adapter's container with the error handler.
spring-integration-kafka was moved into spring-integration starting with 5.4 (it was an extension previously). So, the current version of both jars is 5.4.2.
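As a rough sketch (assuming Spring Kafka 2.5.x; the bean, topic, and group names mirror the question and the back-off values are illustrative), the error handler is set on the listener container that backs the message-driven channel adapter:

import org.springframework.context.annotation.Bean;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Bean("customKafkaListenerContainer")
public ConcurrentMessageListenerContainer<String, String> customKafkaListenerContainer(
        ConsumerFactory<String, String> kafkaConsumerFactory) {
    ContainerProperties containerProps = new ContainerProperties("Topic1");
    containerProps.setGroupId("GroupId1");
    ConcurrentMessageListenerContainer<String, String> container =
            new ConcurrentMessageListenerContainer<>(kafkaConsumerFactory, containerProps);
    // On an exception, seek back to the failed record (and the remainder of the poll);
    // with this back-off the record is retried twice more before it is given up on.
    container.setErrorHandler(new SeekToCurrentErrorHandler(new FixedBackOff(0L, 2L)));
    return container;
}

Failed offsets are then not committed, and the next poll starts again from the first unprocessed record.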

Spring Kafka consumer not able to consume records

We are using Spring Kafka to consume records in batches. We sometimes face an issue where the application starts and doesn't consume any records even though there are enough unread messages. Instead, we continuously see info logs saying:
[INFO]-[FetchSessionHandler:handleError:440] - [Consumer clientId=consumer-2, groupId=groupId] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1027: org.apache.kafka.common.errors.DisconnectException.
Others have faced this issue and everyone says to ignore it, since it is just an info log. We also see that after some time the application starts picking up records without us doing anything, but it is very unpredictable how long it might take to start consuming records :(
We didn't see this error when we were using Spring Cloud Stream. Not sure if we have missed any configuration in spring-kafka.
Has anyone faced this issue in the past? Please let us know if we are missing something. We have a huge load on our topics; if there is a lot of lag, could this happen?
We are using:
Spring Kafka 2.2.2.RELEASE
Spring Boot 2.1.2.RELEASE
Kafka 0.10.0.1 (we understand it's very old; for unavoidable reasons we have to use it :()
Here is our code:
application.yml
li.topics: CUSTOM.TOPIC.JSON
spring:
application:
name: DataPublisher
kafka:
listener:
type: batch
ack-mode: manual_immediate
consumer:
enable-auto-commit: false
max-poll-records: 500
fetch-min-size: 1
fetch-max-wait: 1000
group-id: group-dev-02
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer: CustomResourceDeserialiser
auto-offset-reset: earliest
Consumer:
public class CustomKafkaBatchConsumer {

    @KafkaListener(topics = "#{'${li.topics}'.split(',')}", id = "${spring.kafka.consumer.group-id}")
    public void receiveData(@Payload List<CustomResource> customResources,
            Acknowledgment acknowledgment,
            @Header(KafkaHeaders.RECEIVED_PARTITION_ID) List<Integer> partitions,
            @Header(KafkaHeaders.OFFSET) List<Long> offsets) {
    }
}
Deserialiser:
public class CustomResourceDeserialiser implements Deserializer<CustomResource> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
    }

    @Override
    public CustomResource deserialize(String topic, byte[] data) {
        if (data != null) {
            try {
                ObjectMapper objectMapper = ObjectMapperFactory.getInstance();
                return objectMapper.readValue(data, CustomResource.class);
            } catch (IOException e) {
                log.error("Failed to deserialise with {}", e.getMessage());
            }
        }
        return null;
    }

    @Override
    public void close() {
    }
}
This could be because of KAFKA-8052 (Intermittent INVALID_FETCH_SESSION_EPOCH error on FETCH request), which is fixed in Kafka 2.3.0.
Unfortunately, as of Aug 21, 2019, Spring Cloud Stream hasn't upgraded its dependencies to the 2.3.0 release of kafka-clients yet.
You can try adding these as explicit dependencies in your Gradle build:
compile ('org.apache.kafka:kafka-streams:2.3.0')
compile ('org.apache.kafka:kafka-clients:2.3.0')
compile ('org.apache.kafka:connect-json:2.3.0')
compile ('org.apache.kafka:connect-api:2.3.0')
Update
This could also be caused by Kafka broker/client incompatibility. If your cluster is behind the client version you might see all kinds of odd problems such as this. For example, if your Kafka broker is on 1.x.x and your Kafka consumer is on 2.x.x, this could happen.
I have faced the same problem before; the solution was either to decrease the current partition count or to increase the number of consumers. In my case, we had ~100M records on 60 partitions and I came across the same error when a single pod was running. I scaled to 30 pods (30 consumers) and the problem was solved.

Kafka - org.apache.kafka.common.errors.NetworkException

I have Kafka client code which connects to Kafka brokers (server 0.10.1, client 0.10.2). There are 2 topics with 2 different consumer groups in the code, and there is also a producer. We get the NetworkException from the producer code once in a while (once in 2 days, once in 5 days, ...). We see consumer group (re)joining info in the logs for both consumer groups, followed by the NetworkException from the producer future.get() call. Not sure why we are getting this error.
Code :-
final Future<RecordMetadata> futureResponse =
producer.send(new ProducerRecord<>("ping_topic", "ping"));
futureResponse.get();
Exception :-
org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:70)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:57)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)
Kafka API definition for NetworkException,
"A misc. network-related IOException occurred when making a request.
This could be because the client's metadata is out of date and it is
making a request to a node that is now dead."
Thanks
I had the same error while testing the Kafka Consumer. I was using a sender template for it.
In the consumer configuration I additionally set the following properties:
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 15000);
After sending the message I added a thread sleep:
ListenableFuture<SendResult<String, String>> future =
senderTemplate.send(MyConsumer.TOPIC_NAME, jsonPayload);
Thread.sleep(10000);
It was necessary to make the test work, but maybe not suitable for your case.
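If a fixed sleep turns out to be flaky, an alternative sketch (same senderTemplate, topic, and payload as above) is to block on the returned future with a timeout, since ListenableFuture extends java.util.concurrent.Future, so the test only waits until the broker has acknowledged the record:

ListenableFuture<SendResult<String, String>> future =
        senderTemplate.send(MyConsumer.TOPIC_NAME, jsonPayload);
// Wait up to 10 seconds for the acknowledgement instead of sleeping unconditionally.
future.get(10, TimeUnit.SECONDS);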

Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time

Kafka Version : 0.10.2.1,
Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time
This exception occurs because you are queueing records at a much faster rate than they can be sent.
When you call the send method, the ProducerRecord is stored in an internal buffer for sending to the broker. The method returns immediately once the ProducerRecord has been buffered, regardless of whether it has been sent.
Records are grouped into batches for sending to the broker, to reduce the transport overhead per message and increase throughput.
Once a record is added to a batch, there is a time limit for sending that batch, to ensure it is sent within a specified duration. This is controlled by the producer configuration parameter request.timeout.ms, which defaults to 30 seconds. See the related answer.
If a batch has been queued longer than the timeout limit, the exception is thrown and the records in that batch are removed from the send queue.
The producer configs block.on.buffer.full, metadata.fetch.timeout.ms and timeout.ms have been removed; they were initially deprecated in Kafka 0.9.0.0.
Therefore, try increasing request.timeout.ms.
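For example, a rough sketch of the relevant entries added to the producer's config map (the values are illustrative assumptions, not tuned recommendations):

props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000"); // default 30000; batches waiting longer than this are expired
props.put(ProducerConfig.LINGER_MS_CONFIG, "5");              // keep linger small so batches are dispatched promptly
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");         // smaller batches fill and flush faster under load
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "60000");       // how long send() may block waiting for metadata or buffer space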
Still, if you have any problem related to throughput, you can also refer to the following blog.
This issue arises when either the brokers/topics/partitions are not reachable from the producer, or the producer times out while records are still queued.
I found that you can encounter this issue even with live brokers. In my case, the topic partition leaders were pointing to inactive broker IDs. To fix this, you have to migrate those leaders to active brokers.
Use the topic reassignment tool for the impacted topics.
Topic Migration: https://kafka.apache.org/21/documentation.html#basic_ops_automigrate
I had the same message and fixed it by cleaning the Kafka data from ZooKeeper. After that it's working.
I had faced the same issue in an AKS cluster; just restarting the Kafka and ZooKeeper servers resolved it.
FOR KAFKA DOCKER CASE
I spent a lot of time finding out what happened, including changing server.properties, producer.properties, and my code (Eclipse). That did not work for me (I send messages from my laptop to Kafka running in Docker on a Linux server).
I cleaned Kafka and ZooKeeper and reinstalled them via docker-compose.yml (I'm a newbie). Please look at my docker-compose.yml file and follow how I changed these IPs to my Linux server's IP.
[Screenshots of the original and modified bitnami/kafka docker-compose.yml, where 10.5.1.30 is my Linux server's IP address, and of the equivalent wurstmeister/kafka configuration.]
After that, I ran my code and here's the result:
[Screenshot of the successful send.]
full code:
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
public class SimpleProducer {

    public static void main(String[] args) throws Exception {
        try {
            String topicName = "demo";
            Properties props = new Properties();
            props.put("bootstrap.servers", "10.5.1.30:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            Producer<String, String> producer = new KafkaProducer<String, String>(props);
            Future<RecordMetadata> f = producer.send(new ProducerRecord<String, String>(topicName, "Eclipse3"));
            System.out.println("Message sent successfully, total of message is: " + f.get().toString());
            producer.close();
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
        System.out.println("Successful");
    }
}
Hope that helps. Peace !!!
Say a topic has 100 partitions (0-99). Kafka lets you produce records to a topic by specifying a particular partition. I faced this issue when trying to produce to a partition > 99, because the brokers reject such records.
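As a hypothetical illustration (topic name and producer setup assumed), an explicit partition index must be smaller than the topic's partition count:

// Topic "my-topic" has 100 partitions, numbered 0..99.
ProducerRecord<String, String> ok = new ProducerRecord<>("my-topic", 99, "key", "value");   // valid partition
ProducerRecord<String, String> bad = new ProducerRecord<>("my-topic", 100, "key", "value"); // partition does not exist
producer.send(bad); // never finds the partition in metadata and eventually fails with a TimeoutException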
We tried everything, but no luck.
Decreased the producer batch size and increased request.timeout.ms.
Restarted the target Kafka cluster, still no luck.
Checked replication on the target Kafka cluster; that was also working fine.
Added retries and retry.backoff.ms to the producer properties.
Added linger.ms to the Kafka producer properties as well.
Finally, in our case there was an issue with the Kafka cluster itself: from 2 servers we were unable to fetch metadata.
When we changed the target Kafka cluster to our dev box, it worked fine.

Kafka Producer: Got error produce response with correlation NETWORK_EXCEPTION

We are running Kafka in distributed mode across 2 servers.
I'm sending messages to Kafka through the Java SDK to a queue (topic) which has replication factor 2 and 1 partition.
We are running in async mode.
I don't find anything abnormal in the Kafka logs.
Can anyone help in finding out what the cause could be?
Properties props = new Properties();
props.put("bootstrap.servers", serverAdress);
props.put("acks", "all");
props.put("retries", "1");
props.put("linger.ms",0);
props.put("buffer.memory",10240000);
props.put("max.request.size", 1024000);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, Object> producer = new org.apache.kafka.clients.producer.KafkaProducer<>(props);
Exception trace:
-2017-08-15T02:36:29,148 [kafka-producer-network-thread | producer-1] WARN producer.internals.Sender - Got error produce response with
correlation id 353736 on topic-partition BPA_BinLogQ-0, retrying (0
attempts left). Error: NETWORK_EXCEPTION
You are getting a NETWORK_EXCEPTION, so this tells you that something is wrong with the network connection to the Kafka broker you were producing to. Either the broker shut down or the TCP connection was closed for some reason.
A quick code dive shows the most probable cause: a lost connection to the upstream broker, which causes the delivery to fail internally inside the Sender (link) - you might want to enable trace logging in Sender to confirm that:
if (response.wasDisconnected()) {
log.trace("Cancelled request with header {} due to node {} being disconnected",
requestHeader, response.destination());
for (ProducerBatch batch : batches.values())
completeBatch(batch, new ProduceResponse.PartitionResponse(Errors.NETWORK_EXCEPTION, String.format("Disconnected from node %s", response.destination())),
correlationId, now);
}
Now, with the batch completed unsuccessfully, it gets retried; but from the logs you attached it looks like you ran out of retries (0 attempts left), so the error propagates to your level (link):
if (canRetry(batch, response, now)) {
log.warn(
"Got error produce response with correlation id {} on topic-partition {}, retrying ({} attempts left). Error: {}",
....
reenqueueBatch(batch, now);
}
So the ideas are:
investigate your network connectivity - unfortunately this might mean tracing at least on client-side (esp. NetworkClient that does all the upstream broker management) to see if there's any connection loss;
increase the producer's retries value (though newer versions of the Kafka client default it to MAX_INT or so); a rough sketch follows below.
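A minimal sketch of that second idea, reusing the producer properties from the question (the exact values here are assumptions, not tuned recommendations):

props.put("retries", Integer.MAX_VALUE);                // let the client retry transient network failures itself
props.put("retry.backoff.ms", 500);                     // pause between retry attempts
props.put("max.in.flight.requests.per.connection", 1); // keep ordering intact while retrying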