Kafka Real-Time guarantees - apache-kafka

Can Kafka guarantee that a consumer sees a message within x ms after it has been (successfully) produced?
I have a system where service A accepts requests. Service B needs to be able to answer how many requests had come in by a certain time, and it needs to be precise. My plan is:
Service A accepts a request, produces a message, and waits for the ack of at least one replica. Once it gets the ack, it tells the user that their request is "in the system".
When Service B is asked, it waits x ms and then checks the topic for new requests. That way I know with 100% certainty the state of Service A at "now() - x ms".
For this to work, Kafka needs to guarantee that I can consume a message at most x ms after it has been produced. Is that the case?

In Kafka, messages are available for consumption once the high watermark has been incremented. This is guaranteed to happen after the minimum number of in sync replicas has been satisfied. The trade-off here is durability for latency. If you require lower latency, only acknowledging at the leader is faster. However, this isn't as durable as waiting for 2 in sync replicas. When answering "when can I consume?", you are really answering "when has the message been acknowledged as written by Kafka?"
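To make the high watermark idea concrete, here is a toy model in Python. It is not Kafka code: it assumes the high watermark is the smallest log end offset among the in-sync replicas, and the replica names and helper functions are made up for illustration.

```python
def high_watermark(log_end_offsets, in_sync):
    """Toy model: the high watermark is the smallest log end offset
    among the in-sync replicas (replica names are illustrative)."""
    return min(log_end_offsets[replica] for replica in in_sync)


def visible_offsets(log_end_offsets, in_sync):
    """Offsets a consumer may read: everything below the high watermark."""
    return list(range(high_watermark(log_end_offsets, in_sync)))


# The leader has appended offsets 0..4; one follower is still catching up.
leo = {"leader": 5, "follower-1": 5, "follower-2": 3}

# While the slow follower is in the ISR, it holds the watermark at 3.
slow_in_isr = visible_offsets(leo, {"leader", "follower-1", "follower-2"})

# If the slow follower drops out of the ISR, the watermark moves up to 5.
slow_out_of_isr = visible_offsets(leo, {"leader", "follower-1"})
```

In this model, how soon a message becomes visible depends on how quickly the in-sync followers catch up, which is why the answer is phrased in terms of acknowledgement rather than a fixed number of milliseconds.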


Handling catastrophic failover in Kafka

Let's imagine a simple message processing pipeline:
A group of consumers listens to a topic, picks up messages one by one, does some sort of processing, and sends them on to the next topic.
Some messages crash the consumer or make it hang forever (so a liveness probe kills the consumer after a timeout).
In this case the consumer is not able to commit the offset, so the offending message gets picked up by another consumer, which it also crashes.
Ideally we want to move the message to a dead letter topic after N such attempts.
This can be achieved by introducing shared storage.
But this creates coupling between the services and introduces a Single Point of Failure (SPOF), namely the shared database.
I'm looking for ideas on how to work around this with stateless services.
If this approach fits your context (that's something you should judge, as I'm only trying to give a suggestion), please consider decoupling the consumption and the processing.
In your case, the consumer is stopped not because it was unable to read from Kafka, or because the Kafka broker was unable to provide messages, but because the processing of the message was too slow and/or unsuccessful.
The consumer was, in fact, receiving the messages correctly. It was the processing of them that got it declared dead.
First of all, see the KafkaConsumer javadoc section on this (just above the constructor summary). The second option is the one quoted here:
2. Decouple Consumption and Processing
Another alternative is to have one or more consumer threads that do
all data consumption and hands off ConsumerRecords instances to a
blocking queue consumed by a pool of processor threads that actually
handle the record processing. This option likewise has pros and cons:
PRO: This option allows independently scaling the number of consumers
and processors. This makes it possible to have a single consumer that
feeds many processor threads, avoiding any limitation on partitions.
CON: Guaranteeing order across the processors requires particular care
as the threads will execute independently an earlier chunk of data may
actually be processed after a later chunk of data just due to the luck
of thread execution timing. For processing that has no ordering
requirements this is not a problem.
CON: Manually committing the position becomes harder as it requires
that all threads co-ordinate to ensure that processing is complete for
that partition.
Essentially, it works like this: the consumer keeps reading and hands the responsibility for processing and process-timeout management to the processor threads.
Error handling for message processing is the responsibility of the processor threads as well. For example, if a timeout is thrown or an exception occurs, the processor sends the message to your defined "dead" queue, or does whatever handling you wish, without involving the consumer. Regardless of whether the processor threads succeed or fail, the consumer continues its job and is never considered dead for not calling poll() within the specified timeout.
You should control the number of messages the consumer retrieves in each poll call so as not to saturate the processors. It's a balancing act between how fast the processors finish their job, how many messages the consumer retrieves (max.poll.records) at each iteration, and the timeout specified for the consumer.
Decoupled workflow
The first element to mention is the queue (with a limited size, which you should also manage so that it doesn't fill up and cause an OOM).
This queue is the link between the consumer and processor threads: essentially a buffer that can grow or shrink depending on the workload at any given time. It absorbs overloads, something like a dam or barrier, to use an analogy.
KAFKA <------> CONSUMER ----> QUEUE -----|
What you get is a second queue-lag mechanism:
1. Kafka Consumer LAG (the messages still to be read from the partition/topic)
2. Queue LAG (received messages still need to be processed)
KAFKA <--(LAG)--> CONSUMER ----> QUEUE --(LAG)--|
The queue could be some kind of thread-safe queue, such as a ConcurrentLinkedQueue, or you could manage the synchronization yourself with a custom queue. Note that ConcurrentLinkedQueue is unbounded, so a bounded BlockingQueue such as ArrayBlockingQueue may be a better fit for the size limit mentioned above.
Essentially, the duties are divided, and the consumer is given the easiest one (as it's the most crucial).
consume-->send to queue
read from queue|-->[manage timeout]
|==>PROCESS MESSAGE ==> send to topic
|-->[handle failed messages]
You should also handle the case where processor threads die or deadlock, but those mechanisms are usually already implemented in most thread pool variants.
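The division of duties above can be sketched roughly as follows. This is a minimal Python model using only the standard library, not a Kafka client: `records` stands in for what the consumer's poll loop would return, `process` for the real handler, and the two returned lists for the output topic and the "dead" queue. All of these names are hypothetical, and the sketch leaves out processing timeouts and offset commits.

```python
import queue
import threading


def run_pipeline(records, process, workers=2, queue_size=4):
    """Feed `records` through a bounded queue to a pool of worker threads.

    Returns (outputs, dead_letters) instead of producing to real topics.
    """
    buffer = queue.Queue(maxsize=queue_size)  # bounded, so it cannot grow until OOM
    outputs, dead_letters = [], []
    lock = threading.Lock()
    stop = object()  # sentinel telling a worker to exit

    def worker():
        while True:
            record = buffer.get()
            if record is stop:
                break
            try:
                result = process(record)
                with lock:
                    outputs.append(result)
            except Exception as exc:
                # Failed records go to the "dead" list; the consumer never sees the error.
                with lock:
                    dead_letters.append((record, repr(exc)))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()

    # The "consumer": it only reads and hands off. put() blocks while the
    # queue is full, which is the back-pressure the queue provides.
    for record in records:
        buffer.put(record)

    for _ in threads:
        buffer.put(stop)
    for thread in threads:
        thread.join()
    return outputs, dead_letters


def handler(record):
    if record == "poison":
        raise ValueError("cannot process")
    return record.upper()


outputs, dead_letters = run_pipeline(["a", "poison", "b", "c"], handler)
```

Here the poison record ends up in `dead_letters` while the other records are still processed, and the loop that plays the consumer role never raises.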
I suggest that the workers share a single KafkaProducer. The producer is thread safe, and since the output topic would be the same for the group of consumers, this would also improve performance. From the KafkaProducer javadoc:
The producer is thread safe and sharing a single producer instance
across threads will generally be faster than having multiple
instances.
In summary, each consumer thread feeds n processor threads. Some variants could be:
- 1 consumer - 1 worker (no processing parallelization, just division of duties)
- 1 consumer - 2 workers
- 1 consumer - 4 workers
- 2 consumers - 4 workers (2 for each)
- 2 consumers - 8 workers (4 for each)
Read the pros and cons of this mechanism in the javadoc carefully, and judge whether it could be a solution for your specific case.
In my opinion, there's a PRO that isn't reflected in the docs, and it is the root of this answer/suggestion:
Consumption shouldn't be affected by processing. This approach avoids any consumer thread being considered dead because of slow message processing, and offers an extra "safety window" thanks to the queue. I'm not saying that the consumer will carry on happily when all processors fail for every message, or when the queue hits its maximum size; it will in fact be stopped by processing, but much, much later and for bigger reasons that couldn't be avoided. This approach buys some extra time, an extra shield, before that happens. Just like a dam, it can still fail if it can't hold any more water.
Well, hope you take this as a suggestion, and may it be helpful somehow. It may avoid most of the dead consumer issues you're having. If well managed, it's a good approach for 24/7 real time data workflow.

Kafka - max.in.flight.requests.per.connection is per producer or session?

I am going through the documentation, and it is a little confusing about the parameter "max.in.flight.requests.per.connection":
The maximum number of unacknowledged requests the client will send on a single connection before blocking. Note that if this setting is set to be greater than 1 and there are failed sends, there is a risk of message re-ordering due to retries (i.e., if retries are enabled).
Does the phrase "unacknowledged requests" refer to per producer, per connection, or per client?
Please see the answer below from Eugene. I'm not sure if this answer was wrong or if Kafka changed its behaviour in the 2 years between the answers.
Original answer
It's per partition. Kafka internally might multiplex connections (e.g. send several requests over a single connection for different topics/partitions that are handled by the same broker), or have an individual connection per partition, but these are performance concerns which are mostly dealt with within the client.
The documentation of retries sheds some more light (and clarifies that it is per partition):
Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error. Allowing retries without setting max.in.flight.requests.per.connection to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first. Note additionally that produce requests will be failed before the number of retries has been exhausted if the timeout configured by delivery.timeout.ms expires first before successful acknowledgement. Users should generally prefer to leave this config unset and instead use delivery.timeout.ms to control retry behavior.
This is a setting per connection, that is, per broker. A producer internally uses a Sender thread that dispatches batches from the RecordAccumulator to the broker (in simpler words: it sends messages). This sender thread is allowed to have at most ${max.in.flight.requests.per.connection} requests for which it has not yet received acknowledgements from the broker. Think about it this way: in a typical processing cycle, the sender performs these operations:
Drain batches -> Make Requests -> Poll Connections -> Fire Callbacks.
So at some point (Poll Connections) it can send a request to the broker without waiting for the response; it will check for the response in the next cycle. It can have up to max.in.flight.requests.per.connection such unacknowledged requests.
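The re-ordering risk that the documentation mentions can be illustrated with a toy model of a single connection. This is a simplified Python simulation, not the real Sender: the `deliver` function, the batch names, and the "rounds" are all assumptions made for the sketch, and it ignores the idempotent producer entirely.

```python
def deliver(batches, max_in_flight, fails_once):
    """Toy model of one producer connection to one broker.

    `batches` are sent in order, at most `max_in_flight` at a time.
    A batch named in `fails_once` fails on its first attempt and is resent
    after the other in-flight requests of that round have been handled.
    Returns the order in which batches land in the log.
    """
    pending = list(batches)
    already_failed = set()
    log = []
    while pending:
        in_flight, pending = pending[:max_in_flight], pending[max_in_flight:]
        retries = []
        for batch in in_flight:
            if batch in fails_once and batch not in already_failed:
                already_failed.add(batch)
                retries.append(batch)  # transient error: queue for resend
            else:
                log.append(batch)      # acknowledged by the broker
        pending = retries + pending    # retries go first in the next round
    return log


# Two requests in flight: b2 succeeds while b1 is being retried.
reordered = deliver(["b1", "b2", "b3"], max_in_flight=2, fails_once={"b1"})

# One request in flight: b2 is not sent until b1 has been acknowledged.
ordered = deliver(["b1", "b2", "b3"], max_in_flight=1, fails_once={"b1"})
```

With two requests in flight the second batch overtakes the first; with one in flight the original order is kept, at the cost of throughput.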

dealing with Kafka's exactly once processing edge-cases

I'm trying to do a POC for processing messages with Kafka, for an implementation which absolutely requires exactly-once processing. Example: a payment system that must process a credit card transaction only once.
What edge cases should we protect against?
One failure scenario covered here is:
1.) If a consumer fails and does not commit that it has read through a particular offset, the message will be read again.
Let's say consumers live in Kubernetes pods, and one of the hosts goes offline. We will potentially have messages that have been processed but not marked as processed in Kafka before the pods went away due to the underlying hardware issue. Do I understand this error scenario correctly?
Are there other failure scenarios that we need to fully understand on the producer/consumer side when thinking about Kafka doing exactly-once processing?
I'm basically going to repeat and expand on an answer I gave here:
A few scenarios can result in duplication:
- Consumers only periodically checkpoint their positions. A consumer crash can result in duplicate processing of some range of records.
- Producers have client-side timeouts. This means the producer may think a request timed out and re-transmit while, broker-side, it actually succeeded.
- If you mirror data between Kafka clusters, that's usually done with a producer + consumer pair of some sort, which can lead to more duplication.
There are also scenarios that end in data loss. Look up "unclean leader election" (disabling it trades away availability).
Also, Kafka "exactly once" configurations only work if all your inputs, outputs, and side effects happen on the same Kafka cluster, which often makes them of limited use in real life.
There are a few Kafka features you could try using to reduce the likelihood of this happening to you:
1. Set enable.idempotence to true in your producer configs (see https://kafka.apache.org/documentation/#producerconfigs). This incurs some overhead.
2. Use transactions when producing. This incurs overhead and adds latency.
3. Set transactional.id on the producer in case you fail over across machines. This gets complicated to manage at scale.
4. Set isolation.level to read_committed on the consumer. This adds latency (and needs to be done in combination with 2 above).
5. Shorten auto.commit.interval.ms on the consumer. This just reduces the window of duplication and doesn't really solve anything; it incurs overhead at really low values.
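As a rough illustration, the settings listed above might be collected like this. The keys are the Kafka configuration names from the list; the Python dict form, the transactional id, the group name, and the interval value are assumptions for the sketch (a Java client would put the same keys into a Properties object).

```python
# Producer side: idempotence plus a transactional id.
producer_config = {
    "enable.idempotence": True,
    "acks": "all",                            # idempotence requires acks=all
    "transactional.id": "payments-worker-1",  # hypothetical id, stable per producer instance
}

# Consumer side: read only committed transactional data, commit offsets more often.
consumer_config = {
    "group.id": "payments",                # hypothetical group name
    "isolation.level": "read_committed",
    "auto.commit.interval.ms": 1000,       # illustrative value; narrows the duplicate window only
}
```

None of these settings covers side effects outside Kafka, such as the call to the card processor itself, which matches the caveat above about inputs, outputs, and side effects living on the same cluster.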
I have to say that as someone who's been maintaining a VERY large kafka installation for the past few years I'd never use a bank that relied on kafka for its core transaction processing though ...

What atomicity guarantees - if any - does Kafka have regarding batch writes?

We're now moving one of our services from pushing data through legacy communication tech to Apache Kafka.
The current logic is to send a message to IBM MQ and retry if errors occur. I want to replicate that, but I don't have any idea what guarantees the broker provides in that scenario.
Let's say I send 100 messages in a batch via a producer using the Java client library. Assuming the batch reaches the cluster, is there a possibility that only part of it is accepted (e.g. a disk is full, or some partitions I touch in my write are under-replicated)? Can I detect that problem from my producer and retry only those messages that weren't accepted?
I searched for "kafka atomicity guarantee" but came up empty; maybe there's a well-known term for it.
When you say you send 100 messages in one batch, do you mean you want to control this number of messages yourself, or are you OK with letting the producer batch a certain number of messages and then send the batch?
I ask because I'm not sure you can control the number of messages in one producer batch. The API will queue and batch them for you, but without a guarantee that they are all batched together (I'll check that, though).
If you're OK with letting the API batch messages for you, here are some clues about how they are acknowledged.
When dealing with the producer, Kafka comes with some reliability guarantees regarding writes (including "batch writes").
As stated in this SlideShare deck:
https://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign (83)
The original list of messages is partitioned (randomly if the default partitioner is used) based on their destination partitions/topics, i.e. split into smaller batches.
Each post-split batch is sent to the respective leader broker/ISR (the individual send()’s happen sequentially), and each is acked by its respective leader broker according to request.required.acks
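So, regarding atomicity: given the behavior above, I'm not sure the whole batch will be treated as atomic. Maybe you can send your batch using the same key for every message, since they will then all go to the same partition, and thus maybe become atomic. Independently of batching, each send() call in the Java client returns its own Future and accepts a callback, so the producer can see which individual records failed and retry only those.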
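The effect of the key on batch splitting can be sketched as follows. This is a Python illustration only: `zlib.crc32` is a stand-in for the client's real key hash (an assumption for this sketch), and `split_by_partition` is a hypothetical helper. The point is simply that equal keys always map to the same partition.

```python
import zlib
from collections import defaultdict


def split_by_partition(messages, num_partitions):
    """Group (key, value) messages by destination partition.

    zlib.crc32 stands in for the client's real key hash; what matters
    is that equal keys always land in the same partition.
    """
    batches = defaultdict(list)
    for key, value in messages:
        partition = zlib.crc32(key.encode("utf-8")) % num_partitions
        batches[partition].append(value)
    return dict(batches)


# 100 messages with distinct keys: split into one sub-batch per partition hit.
mixed = split_by_partition([(f"user-{i}", i) for i in range(100)], num_partitions=6)

# 100 messages sharing one key: a single sub-batch, in the original order.
same_key = split_by_partition([("order-42", i) for i in range(100)], num_partitions=6)
```

With distinct keys the 100 messages will typically be spread over several partitions, each acknowledged separately; with a shared key they stay together, although the producer may still send them as more than one request.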
If you need more clarity about the acknowledgment rules when producing, here is how it works, as stated at https://docs.confluent.io/current/clients/producer.html:
You can control the durability of messages written to Kafka through the acks setting.
The default value of "1" requires an explicit acknowledgement from the partition leader that the write succeeded.
The strongest guarantee that Kafka provides is with "acks=all", which guarantees that not only did the partition leader accept the write, but it was successfully replicated to all of the in-sync replicas.
You can also look into the producer's enable.idempotence setting if you aim to have no duplicates while producing.

Reliable fire-n-forget Kafka producer implementation strategy

I'm in the middle of a first-mile problem with Kafka. Everybody deals with partitioning, etc., but how do you handle the first mile?
My system consists of many applications, distributed across nodes, that produce events. I need to deliver these events to a set of applications acting as consumers in a reliable/fail-safe way. The messaging system of choice is Kafka (due to its log nature), but it's not set in stone.
The events should be propagated in a decoupled, fire-n-forget manner as much as possible. This means the producers should be fully responsible for reliably delivering their messages, and the apps producing events shouldn't have to worry about event delivery at all.
The producer's reliability scheme has to account for:
box connection outage - during an outage producer can't access network at all; Kafka cluster is thus not reachable
box restart - both producer and event producing app restart (independently); producer should persist in-flight messages (during retrying, batching, etc.)
internal Kafka exceptions - message size was too large; serialization exception; etc.
No library I've examined so far covers these cases. Is there a suggested strategy how to solve this?
I know there are retriable and non-retriable errors during the producer's send(). For the retriable ones, the library usually handles everything internally. However, a non-retriable error ends with an exception in the async callback...
Should I blindly replay these forever? For network outages that should work, but how about Kafka internal errors, say message too large? There might be a DeadLetterQueue-like mechanism + replay. However, how to deal with message count...
About persistence: a lightweight DB backend should solve this. Just create a persistent queue and then remove the messages already sent/ACKed. However, I'm afraid that if it were this simple, it would have been implemented in the standard Kafka libraries a long time ago. Performance would probably go south.
Seeing things like KAFKA-3686 or KAFKA-1955 makes me a bit worried.
Thanks in advance.
We have a production system whose primary use case is reliable message delivery. I can't go into much detail, but I can share a high-level design of how we achieve this. Note that this system guarantees "at least once delivery" messaging semantics.
First, we designed a message schema, and every message sent to this system must follow it.
Then we write the message to a MySQL message table, which is sharded by date, with a field marking it as delivered or not.
We have an app constantly polling the DB for rows marked undelivered. It picks up a row, constructs the message, and sends it to the load balancer. This is a blocking call, and the app updates the message row to delivered only when it gets a 200 back.
In case of a 5xx, the app retries the message with a sleep back-off. You can also make the retries configurable as per your needs.
Each source system maintains its own polling app and DB.
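The polling app described above could look roughly like this. It is a minimal sketch, not our production code: an in-memory SQLite table stands in for the sharded MySQL table, `send` is a hypothetical blocking call to the load balancer that returns an HTTP-style status, and giving up on a 4xx is an assumption of the sketch.

```python
import sqlite3
import time


def drain_outbox(db, send, max_attempts=3, backoff_s=0.0):
    """Send every undelivered row; mark it delivered only after a 200."""
    rows = db.execute(
        "SELECT id, payload FROM messages WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        for attempt in range(max_attempts):
            status = send(payload)
            if status == 200:
                db.execute("UPDATE messages SET delivered = 1 WHERE id = ?", (row_id,))
                db.commit()
                break
            if status < 500:
                break  # 4xx: resending the same payload will not help
            time.sleep(backoff_s * (attempt + 1))  # back off before retrying a 5xx


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, payload TEXT, delivered INTEGER DEFAULT 0)"
)
db.executemany("INSERT INTO messages (payload) VALUES (?)", [("a",), ("b",)])
db.commit()

calls = []


def flaky_send(payload):
    # Returns 503 on the first attempt for "a", then 200: a stand-in for a real HTTP call.
    calls.append(payload)
    return 503 if payload == "a" and calls.count(payload) == 1 else 200


drain_outbox(db, flaky_send)
undelivered = db.execute("SELECT COUNT(*) FROM messages WHERE delivered = 0").fetchone()[0]
```

Because a row is only marked delivered after the 200 comes back, a crash between the send and the update leads to a resend, which is where the at-least-once semantics come from.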
Producer Array
This is basically an array of machines under a load balancer, waiting for incoming messages and producing them to the Kafka cluster.
We maintain 3 replicas of each topic, and in the producer config we keep acks = -1, which is very important for your fire-n-forget requirement. As per the docs:
acks=all This means the leader will wait for the full set of in-sync
replicas to acknowledge the record. This guarantees that the record
will not be lost as long as at least one in-sync replica remains
alive. This is the strongest available guarantee. This is equivalent
to the acks=-1 setting
As I said, producing is a blocking call. It returns 2xx if the message is produced successfully across all 3 replicas,
4xx if the message doesn't meet the schema requirements, and
5xx if the Kafka broker threw some exception.
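The status mapping above can be sketched as a small handler. This is an illustration in Python with hypothetical names: `produce` stands for the blocking produce call and is assumed to raise when the broker rejects the write, and `required_fields` is a simplified stand-in for the message schema.

```python
def handle_produce(message, required_fields, produce):
    """Return an HTTP-style status for one incoming message."""
    if not all(field in message for field in required_fields):
        return 400  # schema violation: the caller should not retry as-is
    try:
        produce(message)
    except Exception:
        return 500  # broker-side failure: the caller may retry
    return 200


def ok_produce(message):
    pass  # pretend the write was acknowledged


def broken_produce(message):
    raise RuntimeError("broker unavailable")


schema = ["id", "payload"]
accepted = handle_produce({"id": 1, "payload": "x"}, schema, ok_produce)
rejected = handle_produce({"id": 1}, schema, ok_produce)
failed = handle_produce({"id": 1, "payload": "x"}, schema, broken_produce)
```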
Consumer Array
This is a normal array of machines running Kafka high-level consumers for the topic's consumer groups.
We are currently running this setup in production, with a few additional components for some other functional flows, and it is basically fire-n-forget from the source's point of view.
This system addresses all of your concerns:
box connection outage: until the source polling app gets a 2xx, it will produce again and again, which may lead to duplicates.
box restart: thanks to the retry mechanism of the source, this shouldn't be a problem either.
internal Kafka exceptions: taken care of by the polling app, as the producer array replies with a 5xx when it is unable to produce, and the message is then retried.
acks = -1 also ensures that every in-sync replica has a copy of the message before the write is acknowledged, so a broker going down will not be an issue either.