Kafka producer resilience config: Fail but never block - apache-kafka

I am currently learning some Kafka best practices from netflix (https://www.slideshare.net/wangxia5/netflix-kafka). It is a very good slide. However, I really dont understand one of the slides (slide 18) mentioned about producer resilience configuration, I hope someone in stackoverflow is very kind to give me insight for that (Cant find the video or reach out the author...).
The slide mentioned: Fail but never block in producer resilience configuration.
Block.on.buffer.full=false
Even thought this is the deprecated configuration, I guess the idea is to let producer fail right away rather than block to wait. In the latest kafka configuration, I can use a small value for block.max.ms to fail the producer to sends message rather than blocking it.
Question 1: Why we want to fail it right away, does it means retry later on rather than block it ?
Handle Potential Block for first meta data request
Question 2: I can understand the meta data in the consumer side. i.e registering consumer group and sort of stuff, but what is meta data request for producer point of view ? and is it potentially blocked ? Is there any kafka documentation to describe that
Periodically check whether Kafka producer was open successfully
Question 3: Is there a way we can check that and what benefits for that check ?
Thanks in advance :)

You have to keep in mind how a kafka producer works:
From the API-Documentation:
The producer consists of a pool of buffer space that holds records
that haven't yet been transmitted to the server as well as a
background I/O thread that is responsible for turning these records
into requests and transmitting them to the cluster.
If you call the send method to send a record to the broker, this message will be added to an internal buffer (the size of this buffer can be configured using the buffer.memory configuration property). Now different things can happen:
Happy path: The messages from the buffer will get converted into requests to the broker by the background I/O thread, the broker will ACK this messages and everything will be fine.
The messages can not be send to the kafka broker (connection to broker is broken, you are producing messages faster than they can send out, etc.). In this case it is up to you to decide what to do. Setting the max.block.ms (as an replacement for block.on.buffer.full) to a positive value the send message will block for this amount of time(1) and through a timeout exception afterwards.
Regarding your questions:
(1) If I got the slides right, Netflix explicitly wants to throw away the messages which they can't send to the broker (instead of blocking, retrying, failing ...). This of course highly depends on your application and the kind of messages you are dealing with. If it "just log messages" it might be no big deal. If it comes to financial transactions you may want to
(2) The producer needs some metadata about the cluster. E.g. it needs to know which key goes to which partition. There is a good blogpost by hortonworks how the producer works internaly. I think it is worth reading: https://community.hortonworks.com/articles/72429/how-kafka-producer-work-internally.html
Furthermore the statement:
Handle Potential Block for first meta data request
points to an issues which is as far as I know still around. The very first call of send will do a sync. metadata request to the broker and therefor may take longer.
(3) Connections to the producers are closed by the broker if the producer is idle for some time (see connections.max.idle.ms). I am not aware of some standard way to keep the connection of your consumer alive or even to check if the connection is still alive. What you could do is peridicaly send a metadatarequest to the broker (producer.partitionsFor(anyTopic)). But again maybe this is not an issue for your application.
(1) When it comes to details what is taken into account to calculate the time passed it get's a bit tricky. For max.block.ms it is actually:
metadata fetch time
buffer full block time
serialization time (customized serializer)
partitioning time (customized partitioner)

Related

kafka connect behavior on no consumers in group

I have a problem that I'm not sure how to understand it, more than actually fix it.
I tried some support here but I think I was wrong and this might be related to the actual connectors themselves and not my infrastructure setup
https://github.com/strimzi/strimzi-kafka-operator/discussions/6418
Not sure if relevant but I am using the JDBC Sink connector from confluent.
Once I set it up everything seems to work fine but every one in a while all my consumers will leave the group and basically data will not longer be sink.
I can see in the log:
[Consumer clientId=connector-consumer-sink-jdbc-bde-objects-0, groupId=connect-sink-jdbc-bde-objects] Member connector-consumer-sink-jdbc-bde-objects-0-3333ce54-ddc4-41ad-878e-cc74f36e8ffd sending LeaveGroup request to coordinator abc-strimzi-kafka-kafka-0.abc-strimzi-kafka-kafka-brokers.insight-data-flow-nightly.svc:9093 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
The message is relatively clear, the data cannot be processed in time and the connector stops working.
Now the part I don't like, or is strange to me is that I would expect the connector to go into an error status and not stay as if nothing bad was happening.
Second even if this happens I would like to sort of get an alert but I can't distinguish this from normal rebalancing activities in the consumer group.
Does anyone have experience with this problem or can point me if there is any special configuration I should look out for at the kafka level that could be used to manage this?
I can't for sure define the pool size and the pool interval but somehow the values might not always be correct and when they stop being correct I would like to see that there is some sort of error and not that the lag simply keeps grown and no messages are consumed.
Thanks for any feedback and best regards

Kafka Streams commits offset when producer throws an exception

In my Kafka streams application I have a single processor that is scheduled to produce output messages every 60 seconds. Output message is built from messages that come from a single input topic. Sometimes it happens that the output message is bigger than the configured limit on broker (1MB by default). An exception is thrown and the application shuts down. Commit interval is set to default (60s).
In such case I would expect that on the next run all messages that were consumed during those 60s preceding the crash would be re-consumed. But in reality the offset of those messages is committed and the messages are not processed again on the next run.
Reading answers to similar questions it seems to me that the offset should not be committed. When I increase commit interval to 120s (processor still punctuates every 60s) then it works as expected and the offset is not committed.
I am using default processing guarantee but I have also tried exactly_once. Both have the same result. Calling context.commit() from processor seems to have no effect on the issue.
Am I doing something wrong here?
The contract of a Processor in Kafka Streams is, that you have fully processed an input record and forward() all corresponding output messages before process() return. -- This contract implies that Kafka Streams is allowed to commit the corresponding offset after process() returns.
It seem you "buffer" messages within process() in-memory to emit them later. This violated this contract. If you want to "buffer" messages, you should attach a state store to the Processor and put all those messages into the store (cf https://kafka.apache.org/25/documentation/streams/developer-guide/processor-api.html#state-stores). The store is managed by Kafka Streams for you and it's fault-tolerant. This way, after an error the state will be recovered and you don't loose any data (even if the input messages are not reprocessed).
I doubt that setting the commit interval to 120 seconds actually works as expected for all cases, because there is no alignment between when a commit happens and when punctuation is called.
Some of this will depend on the client you are using and whether it's based on librdkafka.
Some of the answer will also depend on how you are "looping" over the "poll" method. A typical example will look like the code under "Automatic Offset Committing" at https://kafka.apache.org/23/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
But this assumes quite a rapid poll loop (100ms + processing time) and a auto.commit.timeout.ms at 1000ms (the default is usually 5000ms).
If I read your question correctly, you seem to consuming messages once per 60 seconds?
Something to be aware of is that the behavior of kafka client is quite tied to how frequently poll is called (some libraries will wrap poll inside something like a "Consume" method). Calling poll frequently is important in order to appear "alive" to the broker. You will get other exceptions if you do not poll at least every max.poll.interval.ms (default 5min). It can lead to clients being kicked out of their consumer groups.
anyway, to the point... auto.commit.interval.ms is just a maximum. If a message has been accepted/acknowledged or StoreOffset has been used, then, on poll, the client can decide to update the offset on the broker. Maybe due to client side buffer size being hit or some other semantic.
Another thing to look at (esp if using a librdkafka based client. others have something similar) is enable.auto.offset.store (default true) this will "Automatically store offset of last message provided to application" so every time you poll/consume a message from the client it will StoreOffset. If you also use auto.commit then your offset may move in ways you might not expect.
See https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md for the full set of config for librdkafka.
There are many/many ways of consuming/acknowledging. I think for your case, the comment for max.poll.interval.ms on the config page might be relevant.
"
Note: It is recommended to set enable.auto.offset.store=false for long-time processing applications and then explicitly store offsets (using offsets_store()) after message processing
"
Sorry that this "answer" is a bit long winded. I hope there are some threads for you to pull on.

How to manage Kafka transactional producer objects in request oriented applications

What is the best practice for managing Kafka producer objects in request oriented (e.g. http or RPC servers) applications, when configured as transactional producers? Specifically, how to share producer objects among serving threads, and how to define the transactional.id configuration value for those objects?
In non-transactional usage, producer objects are thread safe and it is common to share one object among all request serving threads. It is also straightforward to setup transactional producer objects to be used by kafka consumer threads, just instantiating one object for each consumer thread works well.
Combining transactional producers with request oriented applications appears to be more complicated, as the life-cycle of serving threads is usually dynamically controlled by a thread pool. I can think of a few options, all with downsides:
Share a single object, protected against concurrency by some kind of mutex. Contention under load would probably be a serious problem.
Instantiate a producer object for each request coming in. KafkaProducer objects are slow to initialize, as they maintain network connections, threads, and other heavyweight objects; paying this cost for each request seems impractical.
Maintain a pool of producer objects, and lease one for each request. The main downside I can see is the amount of machinery required. It is also unclear how to configure transactional.id for these objects, as their lifecycle does not map cleanly to a shard identifier in a partitioned, stateful, application as the documentation says.
Are there other options? Is there an optimal approach?
TL;DR
The transactional id is for preventing duplicates caused by zombie processes in the read-process-write pattern where you read from and produce to kafka topics. For request oriented applications, e.g. messages being produced by an incoming http request, transactional id doesn't bring any benefit (of course you still need to assign one if you want to use transactions and shouldn't be repeated between producers in the same process or different processes in your cluster)
Long answer
As the docs say, transactional producers are not thread safe
As is hinted at in the example, there can be only one open transaction per producer. All messages sent between the beginTransaction() and commitTransaction() calls will be part of a single transaction
so as you correctly explained there can't be concurrent access to the producer so we must pick one of the three options you described.
For this answer I'm going to assume that request oriented applications corresponds to http requests as the mechanism is triggering a message being produced with a transaction (actually, more than one message, otherwise will be enough with idempotent producers and transactions won't be needed)
In terms of correctness all of them are ok as, option 1 would work but depending on your application throughput it could have a high contention, option 2 will also work but you will pay the price of a higher latency and won't be very efficient.
IMHO I think option 3 could be the best since is a compromise between of the two previous options, although of course requires a more careful implementation than just opening a new producer each time.
Transactional id
The question that remains is how to assign a transactional id to the producer, specially in the last case (although both options 1 and 3 share the same concern, since in both cases we are reusing a producer with the same transactional id to handle different requests).
To answer this we first need to understand that the goal of transactional.id is to protect us from having duplicate message being produced caused by zombie processes (a process that hangs for a while, e.g. bc of a long gc pause, and is considered dead but after a while comes back and continues), this is called zombie fencing.
An important detail to understand the need of zombie fencing is understanding in which use case it could happen and this is the read-process-write pattern where you read from a topic, process the element and write to an output topic and the offset topic, which give us atomicity and Exactly-once semantics (if you are not doing any side effects on the process step).
Idempotent producers prevent us from having duplicates caused by producer retries (where the message was persisted by the broker but the ack wasn't received by the producer) and two-phase commit within kafka (where we are not only writing to the output but also marked the message as consumed by also producing to the offset topic) prevent us from having duplicates caused by consuming the message more than once (if the process crashes after producing to the output topic but before committing the offset).
There is still a subtle case where a duplicate can be introduced and it is a zombie producer, which is fenced by monotonically increasing an epoch each time a producer calls initTransactions that will be send with every message the producer sends.
So, for a producer to be fenced, another producer should have being started with the same transaction id, the key here is explained by Jason Gustafson in this talk
"what we are looking for is a guarantee that for each input partition there is only a single write that is responsible for reading that data and writing the output"
This means the transactional.id is assigned in terms of the partition is being consumed in the "read-process-write" pattern.
So if a process that has assigned partition 0 of topic A is considered dead, a rebalance will kick off and the new process that is assigned should create a producer with the same transactional.id, that's why it should be something like this <prefix><group>.<topic>.<partition> as described in this answer, where the partition is part of the transactional.id. This also means a producer per partition assigned, which could also represent an overhead depending on how many topics and partitions your consumers are being assigned.
This slides from the talk clarifies this situation
Transactional id before process crash
Transactional id reassigned to other process after crash
Transactional id in http requests
Going back to your original question, http requests won't follow the read-process-write pattern where zombies can introduce duplicates, because each http request will be unique, even if you introduce a unique identifier it will be a different message from the point of view of the transactional producer.
In this case I would argue that you may still have value using the transactional producer if you want the atomicity of writing to two different topics, but you can choose a random transactional id for option 2, or reuse it for options 1 and 3.
UPDATE
My answer is outdated since is based in an old version of kafka.
The overhead of having one producer per partition described before was a concern that was tackled in KIP-447
This architecture does not scale well as the number of input partitions increases. Every producer come with separate memory buffers, a separate thread, separate network connections. This limits the performance of the producer since we cannot effectively use the output of multiple tasks to improve batching. It also causes unneeded load on brokers since there are more concurrent transactions and more redundant metadata management.
This is the main difference as explained in this post
When the partition assignment is finalized after a consumer group rebalance, the first step for the consumer is to always get the next offset to begin fetching data. With this observation, the OffsetFetch protocol protection is enhanced, such that when a consumer group has pending transactional offsets associated with one partition, the OffsetFetch call can be blocked until the associated transaction completes. Previously, the “outdated” offset data would be returned and the application allowed to continue immediately.
Whit this new feature, the use of transactional.id is no longer clear to me.
Although it is still unclear why fencing requires both blocking the poll if there are pending transactions while it seems to me that the sending the consumer group metadata should be enough (I assume a zombie producer will be fenced by commiting with an old generation.id for that group.id, the generation.id being bumped with each rebalance) it seems the transactional.id doesn't play a major role anymore. e.g. spring docs says
With mode V1, the producer is "fenced" if another instance with the same transactional.id is started. Spring manages this by using a Producer for each group.id/topic/partition; when a rebalance occurs a new instance will use the same transactional.id and the old producer is fenced.
With mode V2, it is not necessary to have a producer for each group.id/topic/partition because consumer metadata is sent along with the offsets to the transaction and the broker can determine if the producer is fenced using that information instead.

Reliable fire-n-forget Kafka producer implementation strategy

I'm in middle of a 1st mile problem with Kafka. Everybody deals with partitioning, etc. but how to handle the 1st mile?
My system consists of many applications producing events distributed on nodes. I need to deliver these events to a set of applications acting as consumers in a reliable/fail-safe way. The messaging system of choice is Kafka (due its log nature) but it's not set in stone.
The events should be propagated in a decoupled fire-n-forget manner as most as possible. This means the producers should be fully responsible for reliable delivering their messages. This means apps producing events shouldn't worry about the event delivery at all.
Producer's reliability schema has to account for:
box connection outage - during an outage producer can't access network at all; Kafka cluster is thus not reachable
box restart - both producer and event producing app restart (independently); producer should persist in-flight messages (during retrying, batching, etc.)
internal Kafka exceptions - message size was too large; serialization exception; etc.
No library I've examined so far covers these cases. Is there a suggested strategy how to solve this?
I know there are retriable and non-retriable errors during Producer's send(). On those retriable, the library usually handles everything internally. However, non-retriable ends with an exception in async callback...
Should I blindly replay these to infinity? For network outages it should work but how about Kafka internal errors - say message too large. There might be a DeadLetterQueue-like mechanism + replay. However, how to deal with message count...
About the persistence - a lightweight DB backend should solve this. Just creating a persistent queue and then removing those already send/ACKed. However, I'm afraid that if it was this simple it would be already implemented in standard Kafka libraries long time ago. Performance would probably go south.
Seeing things like KAFKA-3686 or KAFKA-1955 makes me a bit worried.
Thanks in advance.
We have a production system whose primary use case is reliable message delivery. I can't go in much detail, however i can share a high level design on how we achieve this. However this system is guarantees "atleast once delivery" messaging sematics.
Source
First we designed a message schema, and all the message sent to this
system must follow it.
Then we write the message to the a mysql message table, which is sharded by
date, with a field marked as delivered or not
We have a app constantly polling db, with rows marked un-delivered, picks up a row, constructs the message and send it to the load balancer, this is a blocking call and
updates the message row to delivered, only when returned 200
In case of 5xx, the app will retry the message with sleep back off. Also you can make the retries configurable as per your need.
Each source system maintains their own polling app and db.
Producer Array
This is basically a array of machines under a load balancer waiting for incoming messages and produce those to the Kafka Cluster.
We maintain 3 replicas of each topic and in the producer Config we keep acks = -1 , which is very important for your fire-n-forget requirement. As per the doc
acks=all This means the leader will wait for the full set of in-sync
replicas to acknowledge the record. This guarantees that the record
will not be lost as long as at least one in-sync replica remains
alive. This is the strongest available guarantee. This is equivalent
to the acks=-1 setting
As I said producing is a blocking call, and it will return 2xx if the message is produced succesfully across all 3 replicas.
4xx, if message is doesn't meet the schema requirements
5xx, if the kafka broker threw some exception.
Consumer Array
This is a normal array of machines, running Kafka High level Consumers for the topic's consumer groups.
We are currently running this setup with few additional components for some other functional flows in production and it is basically fire-n-forget from the source point of view.
This system addresses all of your concerns.
box connection outage : Unless the source polling app gets 2xx,it
will produce again-again which may lead to duplicates.
box restart : Due to retry mechanism of the source , this shouldn't be a problem as well.
internal Kafka exceptions : Taken care by polling app, as producer array will reply with 5xx unable to produce, and will be further retried.
Acks = -1, also ensures that all the replicas are in-sync and have a copy of the message, so broker going down will not be a issue as well.

multiplexing consumer and producer in kafka

In my kafka consumer threads(high level), after I consumed a message I am applying some business logic to this message and forwarding this to a WS. But this webservice may be down sometimes and since I consumed this object from kafka and offset is moved forward, i would missed this object.
One way get rid of from this problem is to disabling autocommit in zookeeper and committing offset by calling programmaticaly but i expect that this is a very costly operation. I will be producing to kafka at about 2000 tps and may increase later times.
Another way - which i am not sure if it is a good idea - is if i face with any problem, producing this consumed object to kafka again but i didn't see any post related to this across all my googleings. Is this a thing which is even not considerable?
Can you please give me some insights about handling this situation.
Thanks
You can post back the failed message to the same topic or another of your choice.
If you use the same topic, you will push the messages at the end of the topic and they will be picked up after the others (so if order matters to you don't do this). Also if the action that you perform before sending the message is not idempotent you will have to something to identifying this records so they don't perform the action twice.
If you use a failed_topic, you can push the messages that you can't send to this topic and when the WS is healthy again you need to create a consumer that consumes all the messages there and sends them to the WS.
Hope it helps!
Moving such messages to an error queue and retrying them later is a well known approach.
See Dead letter channel