Kafka-streams corrupt message-handling semantics

I notice that Kafka records have a CRC field. If a record in a log file gets corrupted (e.g. a single bit in the middle of the message gets flipped), what should I expect to see in the streams application in each of these cases:
The topic is replicated
The topic is not replicated
Since we are using Avro, I can imagine one of the following occurs:
Underlying infrastructure detects CRC error and sources it from another broker
The DeserializationExceptionHandler kicks in
Some other error occurs and the topology falls over or the message is skipped, according to policy

For CRC errors, an exception should be thrown in your Streams application when it tries to deserialize the records, and thus the DeserializationExceptionHandler kicks in.
In Kafka, all reads and writes are handled by the partition leader. Follower brokers only replicate data passively in the background and don't serve any reads or writes from clients, so a corrupted record is not transparently sourced from another broker.
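To make the handler part concrete, here is a minimal sketch of how the policy is usually selected through the Streams configuration. The application id and bootstrap address are hypothetical placeholders, and the keys are written as plain strings so the snippet compiles without the Kafka jars. The handler key is the name I know from the 1.x to 3.x releases, so check it against the version you run.

```java
import java.util.Properties;

final class StreamsErrorHandlingConfig {
    // Builds the configuration that would be passed to new KafkaStreams(topology, props).
    static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "corruption-demo");     // hypothetical name
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical address
        // Skip records that cannot be deserialized and log them.
        // LogAndFailExceptionHandler (the default) stops processing instead.
        props.put("default.deserialization.exception.handler",
                  "org.apache.kafka.streams.errors.LogAndContinueExceptionHandler");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("default.deserialization.exception.handler"));
    }
}
```

Whether to continue or fail is the "according to policy" choice from the question: continuing drops the bad record, failing stops the topology until someone intervenes.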

Related

Kafka Stream delivery semantic for a simple forwarder

I got a stateless Kafka Stream that consumes from a topic and publishes into a different queue (Cloud PubSub) within a forEach. The topology does not end on producing into a new Kafka topic.
How do I know which delivery semantic I can guarantee? Knowing that it's just a message forwarder and no deserialisation or any other transformation or whatsoever is applied: are there any cases in which I could have duplicates or missed messages?
I'm thinking about the following scenarios and their impact on how offsets are committed:
Sudden application crash
Error occurring on publish
Thanks guys

If you consider the Kafka-to-Kafka loop that a Kafka Streams application usually creates, setting the property
processing.guarantee=exactly_once
is enough to get exactly-once semantics, including in failure scenarios.
Under the hood Kafka uses a transaction to guarantee that the consume - process - produce - commit offset processing is executed with all or nothing guarantee.
Writing a sink connector with exactly-once semantics from Kafka to Google PubSub would mean solving the same issues Kafka already solves for the Kafka-to-Kafka scenario:
1. The producer.send() could result in duplicate writes of message B due to internal retries. This is addressed by the idempotent producer and is not the focus of the rest of this post.
2. We may reprocess the input message A, resulting in duplicate B messages being written to the output, violating the exactly once processing semantics. Reprocessing may happen if the stream processing application crashes after writing B but before marking A as consumed. Thus when it resumes, it will consume A again and write B again, causing a duplicate.
3. Finally, in distributed environments, applications will crash or, worse, temporarily lose connectivity to the rest of the system. Typically, new instances are automatically started to replace the ones which were deemed lost. Through this process, we may have multiple instances processing the same input topics and writing to the same output topics, causing duplicate outputs and violating the exactly once processing semantics. We call this the problem of “zombie instances.”
Assuming your producer logic to Cloud PubSub does not suffer from problem 1, just like Kafka producers when using enable.idempotence=true, you are still left with problems 2 and 3.
Without solving these issues, your processing semantics will be the delivery semantics your consumer is using, so at-least-once if offsets are only committed after the publish has succeeded.
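The reprocessing problem can be reproduced without a broker. The sketch below is a plain-Java simulation, not Kafka code: Sink is a hypothetical stand-in for the PubSub publisher, the list index plays the role of the committed offset, and the commit happens after every record (Kafka Streams actually commits on an interval, which only widens the window).

```java
import java.util.ArrayList;
import java.util.List;

final class AtLeastOnceForwarderSketch {
    /** Hypothetical stand-in for the external system; it just records what it received. */
    static final class Sink {
        final List<String> received = new ArrayList<>();
        void publish(String message) { received.add(message); }
    }

    /**
     * Forwards records starting at committedOffset and returns the new committed offset.
     * If crashAfterPublishAt equals the current offset, the method returns right after
     * the publish, simulating a crash before the offset commit.
     */
    static int run(List<String> input, int committedOffset, Sink sink, int crashAfterPublishAt) {
        int offset = committedOffset;
        while (offset < input.size()) {
            sink.publish(input.get(offset));
            if (offset == crashAfterPublishAt) {
                return offset;          // crash: the commit below never happens
            }
            offset++;                   // "commit" the offset of the forwarded record
        }
        return offset;
    }

    public static void main(String[] args) {
        List<String> input = List.of("A", "B", "C");
        Sink sink = new Sink();
        int committed = run(input, 0, sink, 1);   // crash after publishing B
        run(input, committed, sink, -1);          // restart from the last committed offset
        System.out.println(sink.received);
    }
}
```

After the restart B is published a second time, which is the duplicate from problem 2. Committing before publishing would flip the failure mode to a lost message instead, i.e. at-most-once.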

How to handle various failure conditions in Kafka

Issue we were facing:
In our system we were logging a ticket in the database with status NEW and also putting it in the Kafka queue for further processing. The processors pick those tickets from the Kafka queue, process them and update the status accordingly. We found that some tickets are left in the NEW state forever. So we were guessing whether tickets are failing to get produced to the queue or are not getting consumed.
Message loss / duplication scenarios (and some other related points):
So I started to dig exhaustively to know in what all ways we can face message loss and duplication in Kafka. Below I have listed all possible message loss and duplication scenarios that I can find in this post:
How data loss can occur in different approaches to handle all replicas down
Handle by waiting for leader to come online
Messages sent between the time all replicas went down and the time the leader comes back online are lost.
Handle by electing new broker as a leader once it comes online
If new broker is out of sync from previous leader, all data written between the
time where this broker went down and when it was elected the new leader will be
lost. As additional brokers come back up, they will see that they have committed
messages that do not exist on the new leader and drop those messages.
How data loss can occur when leader goes down, while other replicas may be up
In this case, the Kafka controller will detect the loss of the leader and elect a new leader from the pool of in sync replicas. This may take a few seconds and result in LeaderNotAvailable errors from the client. However, no data loss will occur as long as producers and consumers handle this possibility and retry appropriately.
When a consumer may miss a message
If Kafka is configured to keep messages for a day and a consumer is down for a period of longer than a day, the consumer will lose messages.
Evaluating different approaches to consumer consistency
Message might not be processed when consumer is configured to receive each message at most once
Message might be duplicated / processed twice when consumer is configured to receive each message at least once
No message is processed multiple times or left unprocessed if consumer is configured to receive each message exactly once.
Kafka provides below guarantees as long as you are producing to one partition and consuming from one partition. All guarantees are off if you are reading from the same partition using two consumers or writing to the same partition using two producers.
Kafka makes the following guarantees about data consistency and availability:
Messages sent to a topic partition will be appended to the commit log in the order they are sent,
a single consumer instance will see messages in the order they appear in the log,
a message is ‘committed’ when all in sync replicas have applied it to their log, and
any committed message will not be lost, as long as at least one in sync replica is alive.
Approach I came up with:
After reading several articles, I felt I should do following:
If message is not enqueued, producer should resend
For this, the producer should listen for an acknowledgement for each message sent. If no acknowledgement is received, it can retry sending the message
Producer should be async with callback:
As explained in last example here
How to avoid duplicates in case of producer retries sending
To avoid duplicates in the queue, set enable.idempotence=true in the producer configs. This makes the producer ensure that exactly one copy of each message is written to the log, even when a send is retried. It requires the following properties to be set on the producer:
max.in.flight.requests.per.connection<=5
retries>0
acks=all (obtain an ack only when all in-sync replicas have committed the message)
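Put together, the settings above could look like the following sketch. The bootstrap address and the String serializers are assumptions for illustration. The keys are plain strings so the snippet compiles without the Kafka jars, and the resulting Properties would be handed to new KafkaProducer<>(props).

```java
import java.util.Properties;

final class IdempotentProducerConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Let the broker drop duplicates caused by producer-internal retries.
        props.put("enable.idempotence", "true");
        // The three settings idempotence depends on:
        props.put("acks", "all");
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        props.put("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Note that this only deduplicates retries made by the same producer instance. If the application itself calls send() twice for the same ticket, both copies are written.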
Producer should be transactional
As explained here.
Set transactional id to unique id:
producerProps.put("transactional.id", "prod-1");
Because we've enabled idempotence, Kafka will use this transaction id as part of its algorithm to deduplicate any message this producer sends, ensuring idempotency.
Use transactions semantics: init, begin, commit, close
As explained here:
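producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record1);
    producer.send(record2);
    // Both records become visible to read_committed consumers together.
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    // Another producer with the same transactional.id has taken over; this one cannot continue.
    producer.close();
} catch (KafkaException e) {
    // Any other error: abort so that neither record is exposed.
    producer.abortTransaction();
}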
Consumer should be transactional
consumerProps.put("isolation.level", "read_committed");
This ensures that the consumer doesn't read any transactional messages before the transaction completes.
Manually commit offset in consumer
As explained here
Process record and save offsets atomically
Say by atomically saving both record processing output and offsets to any database. For this we need to set auto commit of database connection to false and manually commit after persisting both processing output and offset. This also requires setting enable.auto.commit to false.
Read initial offset (say to read after recovery from cache) from database
Seek consumer to this offset and then read from that position.
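As a rough illustration of why storing the result and the offset in one database transaction avoids both loss and duplication, here is a broker-free simulation. Store is a hypothetical in-memory stand-in for the database, and the list index stands in for the partition offset. With a real consumer, the starting position would be applied with KafkaConsumer#seek(TopicPartition, long) after a restart.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetInDatabaseSketch {
    /** Hypothetical database: the result and the next offset are saved together or not at all. */
    static final class Store {
        private final List<String> results = new ArrayList<>();
        private long nextOffset = 0;

        // Simulates one database transaction covering both writes.
        synchronized void commit(String result, long newNextOffset) {
            results.add(result);
            nextOffset = newNextOffset;
        }
        synchronized long nextOffset() { return nextOffset; }
        synchronized List<String> results() { return new ArrayList<>(results); }
    }

    /** Processes the log from the stored offset; crashBeforeOffset simulates a crash before that record commits. */
    static void consume(List<String> log, Store store, long crashBeforeOffset) {
        long position = store.nextOffset();      // read the offset only once, on (re)start
        while (position < log.size()) {
            if (position == crashBeforeOffset) {
                return;                          // crash: nothing for this record was saved
            }
            String result = log.get((int) position).toUpperCase();
            store.commit(result, position + 1);  // output and offset move together
            position++;
        }
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c");
        Store store = new Store();
        consume(log, store, 1);    // crashes while handling "b"
        consume(log, store, -1);   // restart resumes from the stored offset
        System.out.println(store.results() + " next=" + store.nextOffset());
    }
}
```

Because the offset is only read when the consumer starts, the database is not queried per record. In steady state the consumer just keeps its in-memory position.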
Doubts I have:
(Some doubts might be primary and can be resolved by implementing code. But I want words from experienced kafka developer.)
Does the consumer need to read the offset from database only for initial (/ first after consumer recovery) read or for all reads? I feel it needs to read offset from database only on restarts, as explained here
Do we have to opt for manual partitioning? Does this approach work only with auto partitioning off? I have this doubt because this example explains storing offsets in MySQL by specifying partitions explicitly.
Do we need both: producer-side Kafka transactions and consumer-side database transactions (for storing the offset and processing records atomically)? I feel that for producer idempotence we need the producer to have a unique transaction id, and for that we need to use the Kafka transactional API (init, begin, commit). And as a counterpart, the consumer also needs to set isolation.level to read_committed. However, can we ensure no message loss and no duplicate processing without using Kafka transactions? Or are they absolutely necessary?
Should we persist offset to external db as explained above and here
or send offset to transaction as explained here (also I didn't get what exactly it means by sending offset to transaction)
or follow sync async commit combo explained here.
I feel message loss / duplication scenarios 1 and 2 are handled by points 1 to 4 of approach I explained above.
I feel message loss / duplication scenario 3 is handled by point 6 of approach I explained above.
How do we implement the different consumer consistency approaches stated in message loss / duplication scenario 4? Is there any configuration, or does it need to be implemented as custom logic inside the consumer?
Message loss / duplication scenario 5 says: "Kafka provides below guarantees as long as you are producing to one partition and consuming from one partition."? Is it something to concern about while building correct system?
Is any consideration unnecessary/redundant in the approach I came up with above? Also did I miss any necessary consideration? Did I miss any message loss / duplication scenarios?
Is there any other standard / recommended / preferable approach to ensure no message loss and duplicate processing than what I have thought of above?
Do I have to actually code above approach using kafka APIs? or is there any high level API built atop kafka API which allows to easily ensure no message loss and duplicate processing?
Looking at the issue we were facing (as stated at the very beginning), we were wondering whether we can recover any lost/unprocessed messages from the files in which Kafka stores messages. However, that isn't correct, right?
(Extremely sorry for such an exhaustive post but wanted to write question which will ask all related question at one place allowing to build big picture of how to build system around kafka.)

Is consumer offset committed even when failing to post to output topic in Kafka Streams?

If I have a Kafka stream application that fails to post to a topic (because the topic does not exist) does it commit the consumer offset and continue, or will it loop on the same message until it can resolve the output topic? The application merely prints an error and runs fine otherwise from what I can observe.
An example of the error when trying to post to topic:
Error while fetching metadata with correlation id 80 : {super.cool.test.topic=UNKNOWN_TOPIC_OR_PARTITION}
In my mind it would just spin on the same message until the issue is resolved in order to not lose data? I could not find a clear answer on what the default behavior is. We haven't set autocommit to off or anything like that, most of the settings are set to the default.
I am asking as we don't want to end up in a situation where the health check is fine (application is running while printing errors to log) and we are just throwing away tons of Kafka messages.

Kafka Streams will not commit the offsets for this case, as it provides at-least-once processing guarantees (in fact, it's not even possible to reconfigure Kafka Streams differently -- only stronger exactly-once guarantees are possible). Also, Kafka Streams disables auto-commit on the consumer always (and does not allow you to enable it), as Kafka Streams manages committing offset itself.
If you run with default settings, the producer should actually throw an exception and the corresponding thread should die. You can get a callback when a thread dies by registering a handler via KafkaStreams#setUncaughtExceptionHandler().
You can also observe KafkaStreams#state() (or register a callback KafkaStreams#setStateListener()). The state will go to DEAD if all threads are dead (note, there was a bug in older version for which the state was still RUNNING for this case: https://issues.apache.org/jira/browse/KAFKA-5372)
Hence, the application should not be in a healthy state. Kafka Streams will not retry the input message but will stop processing, and you would need to restart the client. On restart, it would re-read the failed input message and retry writing to the output topic.
If you want Kafka Streams to retry, you need to increase the producer config retries so that the producer retries the write internally instead of throwing an exception. This may eventually "block" further processing if the producer write buffer becomes full.

Kafka stream application not consume data after restart

After I restarted our Kafka cluster, my Kafka Streams application didn't receive messages from the input topic and I got a "can't create internal topic" exception. After some research, I reset the input topic and the application with the Kafka tool kafka-streams-application-reset.sh.
Unfortunately, it didn't resolve the problem and I got the exception again.

From the error message, you can infer that the topic already exists and thus, cannot be created. The reason for the failure is, that the existing topic does not have the expected number of partitions (it has 1 instead of 150) -- if the number of partitions would match, Kafka Streams would just use the existing topic.
This can happen, if you have topic auto-create enabled at the brokers (and the topic was created with a wrong number of partitions), or if the number of partitions of your input topic changed. Kafka Streams does not automatically change the number of partitions for the repartition topic, because this might result in data corruption and thus lead to incorrect results.
One way to fix this is to manually delete the topic. Note that this might result in data loss, and you should only do this if you know that it is what you want.
Another (better) way would be to reset the application cleanly using bin/kafka-streams-application-reset.sh in combination with KafkaStreams#cleanUp().
Because you need to clean up the application and users should be aware of the implications, Kafka Streams fails in order to make the user aware of the issue instead of "automagically" taking actions that might be undesired from the user's point of view.
Check out the docs for more details. There is also a blog post that explains application reset in details:
https://kafka.apache.org/11/documentation/streams/developer-guide/app-reset-tool.html
https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/

Reliable fire-n-forget Kafka producer implementation strategy

I'm in middle of a 1st mile problem with Kafka. Everybody deals with partitioning, etc. but how to handle the 1st mile?
My system consists of many applications producing events, distributed across nodes. I need to deliver these events to a set of applications acting as consumers in a reliable/fail-safe way. The messaging system of choice is Kafka (due to its log nature) but it's not set in stone.
The events should be propagated in a decoupled fire-n-forget manner as much as possible. This means the producers should be fully responsible for reliably delivering their messages, and the apps producing events shouldn't worry about event delivery at all.
Producer's reliability schema has to account for:
box connection outage - during an outage producer can't access network at all; Kafka cluster is thus not reachable
box restart - both producer and event producing app restart (independently); producer should persist in-flight messages (during retrying, batching, etc.)
internal Kafka exceptions - message size was too large; serialization exception; etc.
No library I've examined so far covers these cases. Is there a suggested strategy how to solve this?
I know there are retriable and non-retriable errors during Producer's send(). On those retriable, the library usually handles everything internally. However, non-retriable ends with an exception in async callback...
Should I blindly replay these to infinity? For network outages it should work but how about Kafka internal errors - say message too large. There might be a DeadLetterQueue-like mechanism + replay. However, how to deal with message count...
About the persistence - a lightweight DB backend should solve this. Just create a persistent queue and then remove the messages already sent/ACKed. However, I'm afraid that if it was this simple it would have been implemented in the standard Kafka libraries a long time ago. Performance would probably go south.
Seeing things like KAFKA-3686 or KAFKA-1955 makes me a bit worried.
Thanks in advance.

We have a production system whose primary use case is reliable message delivery. I can't go into much detail, but I can share a high-level design of how we achieve this. Note that this system guarantees "at-least-once delivery" messaging semantics.
Source
First we designed a message schema, and all the messages sent to this system must follow it.
Then we write the message to a MySQL message table, which is sharded by date, with a field marking it as delivered or not.
We have an app constantly polling the DB for rows marked undelivered. It picks up a row, constructs the message and sends it to the load balancer. This is a blocking call, and the app updates the message row to delivered only when a 200 is returned.
In case of a 5xx, the app will retry the message with sleep backoff. You can also make the retries configurable as per your need.
Each source system maintains its own polling app and DB.
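The polling loop on the source side can be sketched roughly as follows. This is a simplified in-memory illustration and not our production code: Row stands in for a row of the message table, send stands in for the blocking HTTP call to the load balancer, and the handling of 4xx (give up and leave the row undelivered for inspection) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class OutboxPollerSketch {
    /** Hypothetical stand-in for one row of the message table. */
    static final class Row {
        final String payload;
        boolean delivered;
        int attempts;
        Row(String payload) { this.payload = payload; }
    }

    /**
     * One polling pass: sends every undelivered row and marks it delivered only on 200.
     * A 5xx is retried with a doubling sleep backoff, up to maxAttempts per pass.
     */
    static void pollOnce(List<Row> table, ToIntFunction<String> send,
                         int maxAttempts, long initialBackoffMillis) {
        for (Row row : table) {
            if (row.delivered) continue;
            long backoff = initialBackoffMillis;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                row.attempts++;
                int status = send.applyAsInt(row.payload);   // blocking call
                if (status == 200) { row.delivered = true; break; }
                if (status < 500) break;                     // assumption: 4xx is not retried
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoff);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;                              // stop polling when interrupted
                    }
                    backoff *= 2;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("ticket-1"));
        int[] calls = {0};
        // Fake endpoint: fails twice with 503, then accepts the message.
        pollOnce(table, payload -> ++calls[0] < 3 ? 503 : 200, 5, 10);
        System.out.println(table.get(0).delivered + " after " + table.get(0).attempts + " attempts");
    }
}
```

A row that is still undelivered after one pass is simply picked up again by the next pass, which is also why a crash between a successful send and the row update produces a duplicate rather than a loss.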
Producer Array
This is basically an array of machines under a load balancer, waiting for incoming messages and producing them to the Kafka cluster.
We maintain 3 replicas of each topic and in the producer Config we keep acks = -1 , which is very important for your fire-n-forget requirement. As per the doc
acks=all This means the leader will wait for the full set of in-sync
replicas to acknowledge the record. This guarantees that the record
will not be lost as long as at least one in-sync replica remains
alive. This is the strongest available guarantee. This is equivalent
to the acks=-1 setting
As I said, producing is a blocking call, and it will return 2xx if the message is produced successfully across all 3 replicas.
4xx, if the message doesn't meet the schema requirements
5xx, if the kafka broker threw some exception.
Consumer Array
This is a normal array of machines, running Kafka High level Consumers for the topic's consumer groups.
We are currently running this setup with few additional components for some other functional flows in production and it is basically fire-n-forget from the source point of view.
This system addresses all of your concerns.
box connection outage: Until the source polling app gets a 2xx, it will keep producing the message again, which may lead to duplicates.
box restart: Due to the retry mechanism of the source, this shouldn't be a problem either.
internal Kafka exceptions: Taken care of by the polling app, as the producer array will reply with a 5xx when it is unable to produce, and the message will be retried.
acks=-1 also ensures that all in-sync replicas have a copy of the message, so a broker going down will not be an issue either.