Spring Kafka consumer not able to consume records - apache-kafka

We are using Spring Kafka to consume records in batches. We are sometimes facing an issue where the application starts and it doesn't consume any records even though there are enough unread messages. Instead we continuously see info logs saying.
[INFO]-[FetchSessionHandler:handleError:440] - [Consumer clientId=consumer-2, groupId=groupId] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1027: org.apache.kafka.common.errors.DisconnectException.
People are facing this issue and everyone says to ignore it, since it is just a info log. Even, we see after sometime the application starts picking up the records without doing anything. But, it is very unpredictable on how long it might take to start consuming records :(
We didn't see this error when we were using Spring cloud stream. Not sure if we have missed any configuration in spring-kafka.
Anyone faced this issue in past, please let us know if we are missing something. We have huge load in our topics and if there is a lot of lag, could this happen?
We are using Spring Kafka of 2.2.2.RELEASE
Spring boot 2.1.2.RELEASE
Kafka 0.10.0.1 (We understand it's very old, because of unavoidable reasons we are having to use this :()
Here is our code:
application.yml
li.topics: CUSTOM.TOPIC.JSON
spring:
application:
name: DataPublisher
kafka:
listener:
type: batch
ack-mode: manual_immediate
consumer:
enable-auto-commit: false
max-poll-records: 500
fetch-min-size: 1
fetch-max-wait: 1000
group-id: group-dev-02
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer:CustomResourceDeserialiser
auto-offset-reset: earliest
Consumer:
public class CustomKafkaBatchConsumer {
#KafkaListener(topics = "#{'${li.topics}'.split(',')}", id = "${spring.kafka.consumer.group-id}")
public void receiveData(#Payload List<CustomResource> customResources,
Acknowledgment acknowledgment,
#Header(KafkaHeaders.RECEIVED_PARTITION_ID) List<Integer> partitions,
#Header(KafkaHeaders.OFFSET) List<Long> offsets) {
}
}
Deserialiser:
public class CustomResourceDeserialiser implements Deserializer<CustomResource> {
#Override
public void configure(Map<String, ?> configs, boolean isKey) {
}
#Override
public CustomResource deserialize(String topic, byte[] data) {
if (data != null) {
try {
ObjectMapper objectMapper = ObjectMapperFactory.getInstance();
return objectMapper.readValue(data, CustomResource.class);
} catch (IOException e) {
log.error("Failed to deserialise with {}",e.getMessage());
}
}
return null;
}
#Override
public void close() {
}
}

This could be because of this Kafka-8052 - Intermittent INVALID_FETCH_SESSION_EPOCH error on FETCH request issue. This is fixed in Kafka 2.3.0
Unfortunately, as of Aug 21, 2019 Spring cloud streams haven't upgraded it's dependencies yet with 2.3.0 release of kafka-clients yet.
You can try adding these as explicit dependencies in your gradle
compile ('org.apache.kafka:kafka-streams:2.3.0')
compile ('org.apache.kafka:kafka-clients:2.3.0')
compile ('org.apache.kafka:connect-json:2.3.0')
compile ('org.apache.kafka:connect-api:2.3.0')
Update
This could also be caused by kafka Broker - client incompatibility. If your cluster is behind the client version you might see all kinds of odd problems such as this. Example, let's say, your kafka broker is on 1.x.x and your kafka-consumer is on 2.x.x, this could happen

I have faced the same problem before, solution was either the decrease current partition count or increase the number of consumers. In my case, we have ~100M data on 60 partition and I came across the same error when single pod is running. I scaled 30 pods (30 consumers) and the problem was solved.

Related

Deserilaisation exception Spring Kafka | Corrupt message gets logged everytime server is restarted with ErrorDeserialiser

I have followed the below documentation https://www.confluent.io/blog/spring-kafka-can-your-kafka-consumers-handle-a-poison-pill/ to handle deserialization exceptions.
It works fine, the message gets logged and move forward, but everytime I restart the server the bad messages are logged again.
Is there a way I can skip/acknowledge the bad message once it is logged so that it doesnt get picked up again on restart the server
Consumer YAML
spring:
kafka:
bootstrap-servers: localhost:9092
consumer:
# Configures the Spring Kafka ErrorHandlingDeserializer that delegates to the 'real' deserializers
key-deserializer: org.springframework.kafka.support.serializer.ErrorHandlingDeserializer
value-deserializer: org.springframework.kafka.support.serializer.ErrorHandlingDeserializer
properties:
# Delegate deserializers
spring.deserializer.key.delegate.class: org.apache.kafka.common.serialization.StringDeserializer
spring.deserializer.value.delegate.class: org.sprngframework.kafka.support.serializer.JsonDeserializer
Also, NOTE:=
From the above, in logs I see it gives a warn that spring.deserializer.value.delegate.class is given -> but not a known config.
Have also confgured this
#Bean
#Configuration
#EnableKafka
public class KafkaConfiguration {
/**
* Boot will autowire this into the container factory.
*/
#Bean
public LoggingErrorHandler errorHandler() {
return new LoggingErrorHandler();
}
}
Could someone please advise on the same

Kafka streams fail on decoding timestamp metadata inside StreamTask

We got strange errors on Kafka Streams during starting app
java.lang.IllegalArgumentException: Illegal base64 character 7b
at java.base/java.util.Base64$Decoder.decode0(Base64.java:743)
at java.base/java.util.Base64$Decoder.decode(Base64.java:535)
at java.base/java.util.Base64$Decoder.decode(Base64.java:558)
at org.apache.kafka.streams.processor.internals.StreamTask.decodeTimestamp(StreamTask.java:985)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:303)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:265)
at org.apache.kafka.streams.processor.internals.AssignedTasks.initializeNewTasks(AssignedTasks.java:71)
at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:385)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:769)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
and, as a result, error about failed stream: ERROR KafkaStreams - stream-client [xxx] All stream threads have died. The instance will be in error state and should be closed.
According to code inside org.apache.kafka.streams.processor.internals.StreamTask, failure happened due to error in decoding timestamp metadata (StreamTask.decodeTimestamp()). It happened on prod, and can't reproduce on stage.
What could be the root cause of such errors?
Extra info: our app uses Kafka-Streams and consumes messages from several kafka brokers using the same application.id and state.dir (actually we switch from one broker to another, but during some period we connected to both brokers, so we have two kafka streams, one per each broker). As I understand, consumer group lives on broker side (so shouldn't be a problem), but state dir is on client side. Maybe some race condition occurred due to using the same state.dir for two kafka streams? could it be the root cause?
We use kafka-streams v.2.4.0, kafka-clients v.2.4.0, Kafka Broker v.1.1.1, with the following configs:
default.key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
default.timestamp.extractor: org.apache.kafka.streams.processor.WallclockTimestampExtractor
default.deserialization.exception.handler: org.apache.kafka.streams.errors.LogAndContinueExceptionHandler
commit.interval.ms: 5000
num.stream.threads: 1
auto.offset.reset: latest
Finally, we figured out what is the root cause of corrupted metadata by some consumer groups.
It was one of our internal monitoring tool (written with pykafka) that corrupted metadata by temporarily inactive consumer groups.
Metadata were unencrupted and contained invalid data like the following: {"consumer_id": "", "hostname": "monitoring-xxx"}.
In order to understand what exactly we have in consumer metadata, we could use the following code:
Map<String, Object> config = Map.of( "group.id", "...", "bootstrap.servers", "...");
String topicName = "...";
Consumer<byte[], byte[]> kafkaConsumer = new KafkaConsumer<byte[], byte[]>(config, new ByteArrayDeserializer(), new ByteArrayDeserializer());
Set<TopicPartition> topicPartitions = kafkaConsumer.partitionsFor(topicName).stream()
.map(partitionInfo -> new TopicPartition(topicName, partitionInfo.partition()))
.collect(Collectors.toSet());
kafkaConsumer.committed(topicPartitions).forEach((key, value) ->
System.out.println("Partition: " + key + " metadata: " + (value != null ? value.metadata() : null)));
Several options to fix already corrupted metadata:
change consumer group to a new one. caution that you might lose or duplicate messages depending on the latest or earliest offset reset policy. so for some cases, this option might be not acceptable
overwrite metadata manually (timestamp is encoded according to logic inside StreamTask.decodeTimestamp()):
Map<TopicPartition, OffsetAndMetadata> updatedTopicPartitionToOffsetMetadataMap = kafkaConsumer.committed(topicPartitions).entrySet().stream()
.collect(Collectors.toMap(Map.Entry::getKey, (entry) -> new OffsetAndMetadata((entry.getValue()).offset(), "AQAAAXGhcf01")));
kafkaConsumer.commitSync(updatedTopicPartitionToOffsetMetadataMap);
or specify metadata as Af////////// that means NO_TIMESTAMP in Kafka Streams.

Spring Kafka test with embedded Kafka failed on deleting logs

I am using spring kafka with the embedded kafka for JUnit test, it gives an error for every test on windows:
Error deleting C:\Users:LXX691\AppData\Local\Temp\kafka-1103610162480947200/.lock: The process cannot access the file because it is being used by another process.
I just did the basic configuration like below
#SpringBootTest(webEnvironment = RANDOM_PORT)
#RunWith(SpringRunner.class)
public class KafkaTest {
#Autowired
EmbeddedKafkaBroker broker;
#Before
void setUp() throws Exception() {
// setup producer and consumers
}
#Test
void test() {
producer.send(new ProducerRecord<>("topic", "content"));
}
}
Any suggestion to resolve or any workaround is appreciated.
This is a known issue in Apache Kafka: https://issues.apache.org/jira/browse/KAFKA-8145.
Unfortunately there is nothing in Spring Kafka we can do on the matter.
See more info here: Kafka: unable to start Kafka - process can not access file 00000000000000000000.timeindex and here https://github.com/spring-projects/spring-kafka/issues/194

Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time

Kafka Version : 0.10.2.1,
Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time
This exception is occuring because you are queueing records at a much faster rate than they can be sent.
When you call the send method, the ProducerRecord will be stored in an internal buffer for sending to the broker. The method returns immediately once the ProducerRecord has been buffered, regardless of whether it has been sent.
Records are grouped into batches for sending to the broker, to reduce the transport overheard per message and increase throughput.
Once a record is added into a batch, there is a time limit for sending that batch to ensure that it has been sent within a specified duration. This is controlled by the Producer configuration parameter, request.timeout.ms, which defaults to 30 seconds. See related answer
If the batch has been queued longer than the timeout limit, the exception will be thrown. Records in that batch will be removed from the send queue.
Producer configs block.on.buffer.full, metadata.fetch.timeout.ms and timeout.ms have been removed. They were initially deprecated in Kafka 0.9.0.0.
Therefore give a try for increasing request.timeout.ms
Still, if you have any problem related to throughput, you can also refer following blog
This issue originates when wither brokers/topics/partitions are not able to contact with producer or producer times out before the queue.
I found that even for a live brokers you can encounter this issue. In my case, the topic partitions leaders were pointing to inactive broker ids. To fix this issue, you have to migrate those leaders to active brokers.
Use topic-reassignment tool for impacted topics.
Topic Migration: https://kafka.apache.org/21/documentation.html#basic_ops_automigrate
I had same message and I fixed it cleaning the kafka data from zookeeper. After that it's working.
i had faced same issue in aks cluster, just restarting of kafka and zookeeper servers resolved the issue.
FOR KAFKA DOCKER CASE
For a lot of time find out what happened, including changes server.properties , producer.properties and my code (Eclipse). That does not work for me (I send message from my laptop to Kafka Docker on a Linux server)
I cleaned Kafka and Zookeeper and reinstall them by docker-compose.yml(I'm newbie). Please look at my docker-compose.yml file and follow how I changes these IP to my Linux server's IP
bitnami/kafka
bitnami/kafka
to...
bitnami-changed
while 10.5.1.30 is my Linux server's IP address
wurstmeister kafka
wurstmeister
after that, I ran my code and here's result:
result
full code:
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
public class SimpleProducer {
public static void main(String[] args) throws Exception {
try {
String topicName = "demo";
Properties props = new Properties();
props.put("bootstrap.servers", "10.5.1.30:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<String, String>(props);
Future<RecordMetadata> f = producer.send(new ProducerRecord<String, String>(topicName, "Eclipse3"));
System.out.println("Message sent successfully, total of message is: " + f.get().toString());
producer.close();
} catch (Exception e) {
System.out.println(e.getMessage());
}
System.out.println("Successful");
}
}
Hope that helps. Peace !!!
Say a topic has 100 partitions (0-99). Kafka lets you produce records to a topic by specifying a particular partition. Faced the issue where I'm trying to produce to partition > 99, because brokers reject these records.
We tried everything, but no luck.
Decreased producer batch size and increased request.timeout.ms.
Restarted target kafka cluster, still no luck.
Checked replication on target kafka cluster, that as well was working fine.
Added retries, retries.backout.ms in prodcuer properties.
Added linger.time as well in kafka prodcuer properties.
Finally our case there was issue with kafka cluster itself, from 2 servers we were unable to fetch metadata in between.
When we changed target kafka cluster to our dev box, it worked fine.

flink kafka consumer groupId not working

I am using kafka with flink.
In a simple program, I used flinks FlinkKafkaConsumer09, assigned the group id to it.
According to Kafka's behavior, when I run 2 consumers on the same topic with same group.Id, it should work like a message queue. I think it's supposed to work like:
If 2 messages sent to Kafka, each or one of the flink program would process the 2 messages totally twice(let's say 2 lines of output in total).
But the actual result is that, each program would receive 2 pieces of the messages.
I have tried to use consumer client that came with the kafka server download. It worked in the documented way(2 messages processed).
I tried to use 2 kafka consumers in the same Main function of a flink programe. 4 messages processed totally.
I also tried to run 2 instances of flink, and assigned each one of them the same program of kafka consumer. 4 messages.
Any ideas?
This is the output I expect:
1> Kafka and Flink2 says: element-65
2> Kafka and Flink1 says: element-66
Here's the wrong output i always get:
1> Kafka and Flink2 says: element-65
1> Kafka and Flink1 says: element-65
2> Kafka and Flink2 says: element-66
2> Kafka and Flink1 says: element-66
And here is the segment of code:
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
ParameterTool parameterTool = ParameterTool.fromArgs(args);
DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer09<>(parameterTool.getRequired("topic"), new SimpleStringSchema(), parameterTool.getProperties()));
messageStream.rebalance().map(new MapFunction<String, String>() {
private static final long serialVersionUID = -6867736771747690202L;
#Override
public String map(String value) throws Exception {
return "Kafka and Flink1 says: " + value;
}
}).print();
env.execute();
}
I have tried to run it twice and also in the other way:
create 2 datastreams and env.execute() for each one in the Main function.
There was a quite similar question on the Flink user mailing list today, but I can't find the link to post it here. So here a part of the answer:
"Internally, the Flink Kafka connectors don’t use the consumer group
management functionality because they are using lower-level APIs
(SimpleConsumer in 0.8, and KafkaConsumer#assign(…) in 0.9) on each
parallel instance for more control on individual partition
consumption. So, essentially, the “group.id” setting in the Flink
Kafka connector is only used for committing offsets back to ZK / Kafka
brokers."
Maybe that clarifies things for you.
Also, there is a blog post about working with Flink and Kafka that may help you (https://data-artisans.com/blog/kafka-flink-a-practical-how-to).
Since there is not much use of group.id of flink kafka consumer other than commiting offset to zookeeper. Is there any way of offset monitoring as far as flink kafka consumer is concerned. I could see there is a way [with the help of consumer-groups/consumer-offset-checker] for console consumers but not for flink kafka consumers.
We want to see how our flink kafka consumer is behind/lagging with kafka topic size[total number of messages in topic at given point of time], it is fine to have it at partition level.