Kafka producer acknowledgement time delay and debugging - apache-kafka

When the Kafka producer's send() method is invoked it returns a Future of RecordMetadata, whose constructor is:
public RecordMetadata(TopicPartition topicPartition,
                      long baseOffset,
                      long relativeOffset,
                      long timestamp,
                      java.lang.Long checksum,
                      int serializedKeySize,
                      int serializedValueSize)
This contains the timestamp of the record in the topic/partition, but is there a way to find out the timestamp of the acknowledgment sent by the broker?
I am noticing a delay in acknowledgment receipt and would like to debug further to understand the cause of this delay.
Is there a log level in the Kafka broker that allows printing acknowledgment information in the server logs?

I found a TRACE log level in both Apache Kafka and Spring Kafka. Could it be what you are looking for?
org.springframework.kafka.core.KafkaTemplate
protected ListenableFuture<SendResult<K, V>> doSend(final ProducerRecord<K, V> producerRecord) {
    final Producer<K, V> producer = getTheProducer();
    if (this.logger.isTraceEnabled()) {
        this.logger.trace("Sending: " + producerRecord);
    }
    ...
    producer.send(producerRecord, new Callback() {
        @Override
        public void onCompletion(RecordMetadata metadata, Exception exception) {
            ...
            if (KafkaTemplate.this.logger.isTraceEnabled()) {
                KafkaTemplate.this.logger.trace("Sent ok: " + producerRecord + ", metadata: " + metadata);
            }
            ...
        }
    });
    ...
}
org.apache.kafka.clients.producer.KafkaProducer
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    ...
    log.trace("Sending record {} with callback {} to topic {} partition {}",
            record, callback, record.topic(), partition);
    ...
}
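Setting the two loggers shown above (org.springframework.kafka.core.KafkaTemplate and org.apache.kafka.clients.producer.KafkaProducer) to TRACE in the application's logging configuration surfaces those per-record send lines. If the goal is to measure the send-to-acknowledgement delay itself, a minimal client-side sketch is to record the wall-clock time just before send() and compute the difference in the producer callback; the broker address and topic name below are placeholder assumptions:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AckLatencyProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long sendTimeMs = System.currentTimeMillis();
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
                // The callback fires when the broker acknowledges the record (or the send fails),
                // so this delta approximates the send-to-ack delay as observed by the client.
                long ackDelayMs = System.currentTimeMillis() - sendTimeMs;
                if (exception == null) {
                    System.out.printf("acked offset=%d record-timestamp=%d ack-delay=%d ms%n",
                            metadata.offset(), metadata.timestamp(), ackDelayMs);
                } else {
                    System.err.println("send failed after " + ackDelayMs + " ms: " + exception);
                }
            });
            producer.flush();
        }
    }
}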

Related

Multi-threading on Kafka send in Spring Reactor Kafka

I have a reactive Kafka application that reads data from a topic, transforms the message and writes to another topic. The topic has multiple partitions, so I am creating multiple consumers to read from it in parallel. Each consumer runs on a different thread, but it looks like the Kafka send runs on the same thread even though it is called from different consumers.
I tested by logging the thread name to understand the thread workflow. The receive thread name is different for each consumer, but on the Kafka send [kafkaProducerTemplate.send] the thread name [Thread name: producer-1] is the same for all the consumers. I don't understand how that works; I would expect it to be different for all consumers on send as well. Can someone help me understand how this works?
@Bean
public ReceiverOptions<String, String> kafkaReceiverOptions(String topic, KafkaProperties kafkaProperties) {
    ReceiverOptions<String, String> basicReceiverOptions = ReceiverOptions.create(kafkaProperties.buildConsumerProperties());
    return basicReceiverOptions.subscription(Collections.singletonList(topic))
            .addAssignListener(receiverPartitions -> log.debug("onPartitionAssigned {}", receiverPartitions))
            .addRevokeListener(receiverPartitions -> log.debug("onPartitionsRevoked {}", receiverPartitions));
}

@Bean
public ReactiveKafkaConsumerTemplate<String, String> kafkaConsumerTemplate(ReceiverOptions<String, String> kafkaReceiverOptions) {
    return new ReactiveKafkaConsumerTemplate<String, String>(kafkaReceiverOptions);
}

@Bean
public ReactiveKafkaProducerTemplate<String, List<Object>> kafkaProducerTemplate(KafkaProperties properties) {
    Map<String, Object> props = properties.buildProducerProperties();
    return new ReactiveKafkaProducerTemplate<String, List<Object>>(SenderOptions.create(props));
}

public void run(String... args) {
    for (int i = 0; i < topicPartitionsCount; i++) {
        readWrite(destinationTopic).subscribe();
    }
}
public Flux<String> readWrite(String destTopic) {
    return kafkaConsumerTemplate
            .receiveAutoAck()
            .doOnNext(consumerRecord -> log.info("received key={}, value={} from topic={}, offset={}",
                    consumerRecord.key(),
                    consumerRecord.value(),
                    consumerRecord.topic(),
                    consumerRecord.offset())
            )
            .doOnNext(consumerRecord -> log.info("Record received from partition {} in thread {}", consumerRecord.partition(), Thread.currentThread().getName()))
            .doOnNext(s -> sendToKafka(s, destTopic))
            .map(ConsumerRecord::value)
            .onErrorContinue((exception, errorConsumer) -> {
                log.error("Error while consuming : {}", exception.getMessage());
            });
}
public void sendToKafka(ConsumerRecord<String, String> consumerRecord, String destTopic) {
    kafkaProducerTemplate.send(destTopic, consumerRecord.key(), transformRecord(consumerRecord))
            .doOnNext(senderResult -> log.info("Record received from partition {} in thread {}", consumerRecord.partition(), Thread.currentThread().getName()))
            .doOnSuccess(senderResult -> {
                log.debug("Sent {} offset : {}", metrics, senderResult.recordMetadata().offset());
            })
            .doOnError(exception -> {
                log.error("Error while sending message to destination topic : {}", exception.getMessage());
            })
            .subscribe();
}
All sends for a producer are run on a single-threaded Scheduler (via .publishOn()).
See DefaultKafkaSender.doSend().
You should create a sender for each consumer.
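A minimal sketch of that suggestion, reusing the question's names (topicPartitionsCount, destinationTopic, readWrite), is to build one ReactiveKafkaProducerTemplate per consumer pipeline instead of sharing the single bean; the assumed change is that readWrite and sendToKafka take the template as a parameter rather than using the shared field:
// Sketch only: each pipeline gets its own sender, so sends no longer funnel through one producer-1 thread.
public ReactiveKafkaProducerTemplate<String, List<Object>> newKafkaProducerTemplate(KafkaProperties properties) {
    return new ReactiveKafkaProducerTemplate<>(SenderOptions.create(properties.buildProducerProperties()));
}

public void run(String... args) {
    for (int i = 0; i < topicPartitionsCount; i++) {
        // kafkaProperties is assumed to be injected into this class
        ReactiveKafkaProducerTemplate<String, List<Object>> template = newKafkaProducerTemplate(kafkaProperties);
        readWrite(destinationTopic, template).subscribe();
    }
}
Each template wraps its own reactor-kafka sender and therefore its own KafkaProducer, so send callbacks run on producer-1, producer-2, and so on rather than all on one thread; within a single sender, sends are still serialized on one thread by design.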

Kafka retry mechanism doesn't stop even though the previous retry attempt was successful

I have a Kafka retry mechanism in place which retries 2 times, waiting 30 seconds between attempts. I noticed that even though the first retry attempt was successful, it still runs the second attempt. This results in duplicate messages in the Kafka topic. Is there any way to stop Kafka from doing unnecessary retries when the previous retry attempt was successful?
Here is my listener configuration
@Bean
@ConditionalOnMissingBean(name = "kafkaListenerContainerFactory")
public ConcurrentKafkaListenerContainerFactory<String, SpecificRecord> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, SpecificRecord> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(testConsumerFactory());
    factory.getContainerProperties().setAckOnError(false);
    factory.getContainerProperties().setAckMode(AckMode.RECORD);
    SeekToCurrentErrorHandler errorHandler =
            new SeekToCurrentErrorHandler((record, exception) -> {
                LOGGER.error("Error while processing the record {}", exception.getCause().getMessage());
            }, new FixedBackOff(30000L, 2L));
    factory.setErrorHandler(errorHandler);
    return factory;
}
Here is my listener logic and the flow
ConsumerA consumes the data from the topicA and makes a call to microservice for the data
After getting the data, Producer publishes the data to topicB
ConsumerB consumes the data from the topicB and makes a call to a microservice for persisting the data
Once data gets persisted, a new message gets published to topicB.
Consumer logic for topicA
@KafkaListener(topics = "${test.topicA.name}",
        containerFactory = "kafkaListenerContainerFactory")
public void topicListener(ConsumerRecord<String, SpecificRecord> record) {
    LOGGER.info("Consumed {} topic from partition {} ", record.topic(), record.partition());
    testService.getData(record);
}
Consumer logic for topicB
@KafkaListener(topics = "${test.topicB.name}",
        containerFactory = "kafkaListenerContainerFactory")
public void topicListener(ConsumerRecord<String, SpecificRecord> record) {
    LOGGER.info("Consumed {} topic from partition {} ", record.topic(), record.partition());
    testService2.persistDetails(record);
}

Transactional Kafka Producer

I am trying to make my Kafka producer transactional.
I am sending 10 messages. If any error occurs, no message should be sent to Kafka, i.e. none or all.
I am using Spring Boot KafkaTemplate.
@Configuration
@EnableKafka
public class KakfaConfiguration {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> config = new HashMap<>();
        // props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
        // props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, appProps.getJksLocation());
        // props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, appProps.getJksPassword());
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.ACKS_CONFIG, acks);
        config.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, retryBackOffMsConfig);
        config.put(ProducerConfig.RETRIES_CONFIG, retries);
        config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        config.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "prod-99");
        return new DefaultKafkaProducerFactory<>(config);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }

    @Bean(name = "ktm")
    public KafkaTransactionManager kafkaTransactionManager() {
        KafkaTransactionManager ktm = new KafkaTransactionManager(producerFactory());
        ktm.setTransactionSynchronization(AbstractPlatformTransactionManager.SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
        return ktm;
    }

}
I am sending 10 messages like below, as mentioned in the documentation. 9 messages should be sent, and 1 message has a size over 1 MB, which gets rejected by the Kafka broker due to a RecordTooLargeException.
https://docs.spring.io/spring-kafka/reference/html/#using-kafkatransactionmanager
@Component
@EnableTransactionManagement
class Sender {

    @Autowired
    private KafkaTemplate<String, String> template;

    private static final Logger LOG = LoggerFactory.getLogger(Sender.class);

    @Transactional("ktm")
    public void sendThem(List<String> toSend) throws InterruptedException {
        List<ListenableFuture<SendResult<String, String>>> futures = new ArrayList<>();
        CountDownLatch latch = new CountDownLatch(toSend.size());
        ListenableFutureCallback<SendResult<String, String>> callback = new ListenableFutureCallback<SendResult<String, String>>() {

            @Override
            public void onSuccess(SendResult<String, String> result) {
                LOG.info(" message success : " + result.getProducerRecord().value());
                latch.countDown();
            }

            @Override
            public void onFailure(Throwable ex) {
                LOG.error("Message Failed ");
                latch.countDown();
            }

        };
        toSend.forEach(str -> {
            ListenableFuture<SendResult<String, String>> future = template.send("t_101", str);
            future.addCallback(callback);
            futures.add(future);
        });
        if (latch.await(12, TimeUnit.MINUTES)) {
            LOG.info("All sent ok");
        } else {
            for (int i = 0; i < toSend.size(); i++) {
                if (!futures.get(i).isDone()) {
                    LOG.error("No send result for " + toSend.get(i));
                }
            }
        }
    }

}
But when I look at the topic t_hello_world, 9 messages are there. My expectation was to see 0 messages, as my producer is transactional.
How can I achieve this?
I am getting the following logs
2020-04-30 18:04:36.036 ERROR 18688 --- [ scheduling-1] o.s.k.core.DefaultKafkaProducerFactory : commitTransaction failed: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#1eb5a312, txId=prod-990]
org.apache.kafka.common.KafkaException: Cannot execute transactional method because we are in an error state
at org.apache.kafka.clients.producer.internals.TransactionManager.maybeFailWithError(TransactionManager.java:923) ~[kafka-clients-2.4.1.jar:na]
at org.apache.kafka.clients.producer.internals.TransactionManager.lambda$beginCommit$2(TransactionManager.java:297) ~[kafka-clients-2.4.1.jar:na]
at org.apache.kafka.clients.producer.internals.TransactionManager.handleCachedTransactionRequestResult(TransactionManager.java:1013) ~[kafka-clients-2.4.1.jar:na]
at org.apache.kafka.clients.producer.internals.TransactionManager.beginCommit(TransactionManager.java:296) ~[kafka-clients-2.4.1.jar:na]
at org.apache.kafka.clients.producer.KafkaProducer.commitTransaction(KafkaProducer.java:713) ~[kafka-clients-2.4.1.jar:na]
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.commitTransaction(DefaultKafkaProducerFactory.java
Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
2020-04-30 18:04:36.037 WARN 18688 --- [ scheduling-1] o.s.k.core.DefaultKafkaProducerFactory : Error during transactional operation; producer removed from cache; possible cause: broker restarted during transaction: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#1eb5a312, txId=prod-990]
2020-04-30 18:04:36.038 INFO 18688 --- [ scheduling-1] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-prod-990, transactionalId=prod-990] Closing the Kafka producer with timeoutMillis = 5000 ms.
2020-04-30 18:04:36.038 INFO 18688 --- [oducer-prod-990] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-990, transactionalId=prod-990] Aborting incomplete transaction due to shutdown
Uncommitted records are written to the log; when a transaction commits or rolls back, an extra record is written to the log with the state of the transaction.
Consumers, by default, see all records, including the uncommitted records (but not the special commit/abort record).
For the console consumer, you need to set the isolation level to read_committed. See the help:
--isolation-level <String>    Set to read_committed in order to filter out
                              transactional messages which are not committed.
                              Set to read_uncommitted to read all messages.
                              (default: read_uncommitted)
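To confirm what a consumer sees under each isolation level, a minimal plain-Java sketch with isolation.level set to read_committed might look like the following (broker address and group id are assumptions; t_101 is the topic from the question):
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "tx-check");                // assumption
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Only records from committed transactions are returned; records written inside
        // the aborted transaction above would not show up here.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("t_101"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
With read_uncommitted (the default) the same code would also return the records from the aborted transaction, which matches the 9 messages observed in the topic.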
If I provide the below configurations in the yml file, will I still need to create the factory, template and transaction-manager beans as given in the example code?
For the given transaction example, if I use a simple consumer (Java code) or Kafka Tools, will I be able to view any record for the above example? Hopefully not; am I correct as per the transaction example?
spring:
  profiles: local
  kafka:
    producer:
      client-id: book-event-producer-client
      bootstrap-servers: localhost:9092,localhost:9093,localhost:9094
      key-serializer: org.apache.kafka.common.serialization.IntegerSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      transaction-id-prefix: tx-${random.uuid}
      properties:
        enable.idempotence: true
        acks: all
        retries: 2
        metadata.max.idle.ms: 10000

Reactor Kafka - At-Least-Once - handling failures and offsets in multi partition

Below is the consumer code to receive messages from a Kafka topic (8 partitions) and process them.
@Component
public class MessageConsumer {

    private static final String TOPIC = "mytopic.t";
    private static final String GROUP_ID = "mygroup";
    private final ReceiverOptions consumerSettings;
    private static final Logger LOG = LoggerFactory.getLogger(MessageConsumer.class);

    @Autowired
    public MessageConsumer(@Qualifier("consumerSettings") ReceiverOptions consumerSettings) {
        this.consumerSettings = consumerSettings;
        consumerMessage();
    }

    private void consumerMessage() {
        KafkaReceiver<String, String> receiver = KafkaReceiver.create(receiverOptions(Collections.singleton(TOPIC)));
        Scheduler scheduler = Schedulers.newElastic("FLUX_DEFER", 10, true);
        Flux.defer(receiver::receive)
                .groupBy(m -> m.receiverOffset().topicPartition())
                .flatMap(partitionFlux ->
                        partitionFlux.publishOn(scheduler)
                                .concatMap(m -> {
                                    LOG.info("message received from kafka : " + "key : " + m.key() + " partition: " + m.partition());
                                    return process(m.key(), m.value())
                                            .thenEmpty(m.receiverOffset().commit());
                                }))
                .retryBackoff(5, Duration.ofSeconds(2), Duration.ofHours(2))
                .doOnError(err -> {
                    handleError(err);
                }).retry()
                .doOnCancel(() -> close()).subscribe();
    }

    private void close() {
    }

    private void handleError(Throwable err) {
        LOG.error("kafka stream error : ", err);
    }

    private Mono<Void> process(String key, String value) {
        if (key.equals("error"))
            return Mono.error(new Exception("process error : "));
        LOG.error("message consumed : " + key);
        return Mono.empty();
    }

    public ReceiverOptions<String, String> receiverOptions(Collection<String> topics) {
        return consumerSettings
                .commitInterval(Duration.ZERO)
                .commitBatchSize(0)
                .addAssignListener(p -> LOG.info("Group {} partitions assigned {}", GROUP_ID, p))
                .addRevokeListener(p -> LOG.info("Group {} partitions assigned {}", GROUP_ID, p))
                .subscription(topics);
    }
}

@Bean(name = "consumerSettings")
public ReceiverOptions<String, String> getConsumerSettings() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, GROUP_ID);
    props.put(ConsumerConfig.CLIENT_ID_CONFIG, GROUP_ID);
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put("max.block.ms", "3000");
    props.put("request.timeout.ms", "3000");
    return ReceiverOptions.create(props);
}
On receiving each message, my processing logic returns an empty Mono if the consumed message is processed successfully.
Everything works as expected if no error is returned by the processing logic.
But if I throw an error to simulate exception behaviour in my processing logic for a particular message, then the message that caused the exception is skipped. The stream moves on to the next message.
What I want to achieve is: process the current message and commit the offset if it is successful, then move to the next record.
If there is any exception in processing the message, do not commit the current offset and retry the same message until it succeeds. Don't move to the next message until the current message is successful.
Please let me know how to handle processing failures without skipping the message, and how to make the stream restart from the offset where the exception was thrown.
Regards,
Vinoth
The below code works for me. The idea is to retry failed messages a configured number of times; if processing still fails, move the message to a failure queue and commit the offset. At the same time, process messages from other partitions concurrently.
If a message from a particular partition fails the configured number of times, restart the stream after a delay so that we can handle dependency failures without hitting them continuously.
@Autowired
public ReactiveMessageConsumer(@Qualifier("consumerSettings") ReceiverOptions consumerSettings, MessageProducer producer) {
    this.consumerSettings = consumerSettings;
    this.fraudCheckService = fraudCheckService;
    this.producer = producer;
    consumerMessage();
}

private void consumerMessage() {
    int numRetries = 3;
    Scheduler scheduler = Schedulers.newElastic("FLUX_DEFER", 10, true);
    KafkaReceiver<String, String> receiver = KafkaReceiver.create(receiverOptions(Collections.singleton(TOPIC)));
    Flux<GroupedFlux<TopicPartition, ReceiverRecord<String, String>>> f = Flux.defer(receiver::receive)
            .groupBy(m -> m.receiverOffset().topicPartition());
    Flux f1 = f.publishOn(scheduler).flatMap(r -> r.publishOn(scheduler).concatMap(b ->
            Flux.just(b)
                    .concatMap(a -> {
                        LOG.error("processing message - order: {} offset: {} partition: {}", a.key(), a.receiverOffset().offset(), a.receiverOffset().topicPartition().partition());
                        return process(a.key(), a.value())
                                .then(a.receiverOffset().commit())
                                .doOnSuccess(d -> LOG.info("committing order {}: offset: {} partition: {} ", a.key(), a.receiverOffset().offset(), a.receiverOffset().topicPartition().partition()))
                                .doOnError(d -> LOG.info("committing offset failed for order {}: offset: {} partition: {} ", a.key(), a.receiverOffset().offset(), a.receiverOffset().topicPartition().partition()));
                    })
                    .retryWhen(companion -> companion
                            .doOnNext(s -> LOG.info(" --> Exception processing message for order {}: offset: {} partition: {} message: {} ", b.key(), b.receiverOffset().offset(), b.receiverOffset().topicPartition().partition(), s.getMessage()))
                            .zipWith(Flux.range(1, numRetries), (error, index) -> {
                                if (index < numRetries) {
                                    LOG.info(" --> Retrying {} order: {} offset: {} partition: {} ", index, b.key(), b.receiverOffset().offset(), b.receiverOffset().topicPartition().partition());
                                    return index;
                                } else {
                                    LOG.info(" --> Retries Exhausted: {} - order: {} offset: {} partition: {}. Message moved to error queue. Commit and proceed to next", index, b.key(), b.receiverOffset().offset(), b.receiverOffset().topicPartition().partition());
                                    producer.sendMessages(ERROR_TOPIC, b.key(), b.value());
                                    b.receiverOffset().commit();
                                    //return index;
                                    throw Exceptions.propagate(error);
                                }
                            })
                            .flatMap(index -> Mono.delay(Duration.ofSeconds((long) Math.pow(1.5, index - 1) * 3)))
                            .doOnNext(s -> LOG.info(" --> Retried at: {} ", LocalTime.now()))
                    )
    ));

    f1.doOnError(a -> {
        LOG.info("Moving to next message because of : ", a);
        try {
            Thread.sleep(5000); // configurable
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }).retry().subscribe();
}

public ReceiverOptions<String, String> receiverOptions(Collection<String> topics) {
    return consumerSettings
            .commitInterval(Duration.ZERO)
            .commitBatchSize(0)
            .addAssignListener(p -> LOG.info("Group {} partitions assigned {}", GROUP_ID, p))
            .addRevokeListener(p -> LOG.info("Group {} partitions assigned {}", GROUP_ID, p))
            .subscription(topics);
}

private Mono<Void> process(OrderId orderId, TraceId traceId) {
    try {
        Thread.sleep(500); // simulate slow response
    } catch (InterruptedException e) {
        // Causes the restart
        e.printStackTrace();
    }
    if (orderId.getId().startsWith("error")) // simulate error scenario
        return Mono.error(new Exception("processing message failed for order: " + orderId.getId()));
    return Mono.empty();
}
Create different consumer groups.
Each consumer group would be related to one database.
Create your consumers so that they only process the relevant events and push them to the related database. If the database is down, configure the consumer to retry an infinite number of times.
If your consumer dies for any reason, make sure it starts from where the earlier consumer left off. There is a small possibility that your consumer dies right after committing data to the database but before sending the ack to the Kafka broker. You need to update the consumer code to make sure that you process messages exactly once (if needed), as sketched below.
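A hedged sketch of the "commit only after the database write" idea with a plain kafka-clients consumer (persistToDatabase is a placeholder; broker address and group id are assumptions; mytopic.t is the topic from the question):
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DbCommittingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "db-writer-group");         // one group per database
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false); // commit manually, after the DB write

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("mytopic.t"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    persistToDatabase(record.value()); // placeholder: retry here while the database is down
                }
                consumer.commitSync(); // commit only once the whole batch is in the database
            }
        }
    }

    private static void persistToDatabase(String value) {
        // hypothetical persistence call
    }
}
Offsets are committed only after the batch is persisted, so a crash between the database write and the commit re-delivers the batch (at-least-once); deduplicating in the database (for example by key) is what turns that into exactly-once effects if they are required.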

Kafka producer timestamp not able to be read from the consumer record

I am using the Kafka 0.10.2 KafkaProducer to produce data to a topic.
Below is my code.
ProducerRecord record = new ProducerRecord(topic, (Integer) null, Long.valueOf(System.currentTimeMillis()), partitionKey.getBytes(), message.getBytes());
Future<RecordMetadata> future = this.producer.send(record, new Callback() {
    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
        if (e != null) {
            LOGGER.error("Error producing to topic " + partitionKey, e.getCause());
        } else {
            LOGGER.info(" Successfully produced topic " + recordMetadata.topic() + " on partition " + recordMetadata.partition() + " at " + recordMetadata.timestamp());
        }
    }
});
This message is getting replicated through MirrorMaker to a different cluster. I wrote a MirrorMaker handler to capture the per-message delay.
However, when I check the timestamp at the MirrorMaker consumer, I am not able to see the timestamp. Below is the code for the MirrorMaker handler.
@Override
public List<ProducerRecord<byte[], byte[]>> handle(BaseConsumerRecord record) {
    LOGGER.debug("Timestamp from dal producer " + record.timestamp());
    return Collections.singletonList(new ProducerRecord<byte[], byte[]>(topicToSend, partitionToSend, timeStampAtMM, record.key(), record.value()));
}
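As a hedged sketch (not an answer from the thread), the source record's timestamp can be propagated explicitly through the ProducerRecord constructor that takes a timestamp, instead of a separately tracked value:
@Override
public List<ProducerRecord<byte[], byte[]>> handle(BaseConsumerRecord record) {
    // Forward the consumed record's timestamp so the mirrored record carries the original
    // CreateTime (assuming the destination topic uses message.timestamp.type=CreateTime).
    return Collections.singletonList(new ProducerRecord<byte[], byte[]>(
            topicToSend, partitionToSend, record.timestamp(), record.key(), record.value()));
}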