Using Kafka Connect, how to "commit offsets" as soon as a "put" is completed in SinkTask - apache-kafka

I am using Kafka Connect to get messages from a Kafka broker (v0.10.2) and then sync them to a downstream service.
Currently, I have code in SinkTask#put that processes each SinkRecord and then persists it to the downstream service.
A couple of key requirements:
We need to make sure the messages are persisted to the downstream service AT LEAST once.
If the downstream service throws an error or says it didn't process the message, then we need to make sure that the messages are re-read.
So we thought we could rely on SinkTask#flush to effectively back out of committing offsets for that particular poll/cycle of received messages, by throwing an exception or something else that tells Connect not to commit the offsets but to retry on the next poll.
But as we found out, flush is actually time-based and more or less independent of the polls; it commits the offsets when it reaches a certain time threshold.
In 0.10.2 SinkTask#preCommit was introduced, so we thought we could use it for our purposes. But nowhere in the documentation is it mentioned that there is a 1:1 relationship between SinkTask#put and SinkTask#preCommit.
Essentially, we want to commit offsets as soon as a single put succeeds, and likewise not commit the offsets if that particular put failed.
How can we accomplish this, if not via SinkTask#preCommit?

Getting data into and out of Kafka correctly can be challenging, and Kafka Connect makes this easier since it uses best practices and hides many of the complexities. For sink connectors, Kafka Connect reads messages from a topic, sends them to your connector, and then periodically commits the largest offsets for the various topic partitions that have been read and processed.
Note that "sending them to your connector" corresponds to the put(Collection<SinkRecord>) method, and this may be called many times before Kafka Connect commits the offsets. You can control how frequently Kafka Connect commits offsets, but Kafka Connect ensures that it will only commit an offset for a message when that message was successfully processed by the connector.
When the connector is operating nominally, everything is great and your connector sees each message once, even when the offsets are committed periodically. However, should the connector fail, then when it restarts the connector will start at the last committed offset. That might mean your connector sees some of the same messages that it processed just before the crash. This usually is not a problem if you carefully write your connector to have at least once semantics.
Why does Kafka Connect commit offsets periodically rather than with every record? Because it saves a lot of work and doesn't really matter when things are going nominally. It's only when things go wrong that the offset lag matters. And even then, if you're having Kafka Connect handle offsets your connector needs to be ready to handle messages at least once. Exactly once is possible, but your connector has to do more work (see below).
Writing Records
You have a lot of flexibility in writing a connector, and that's good because a lot will depend on the capabilities of the external system to which it's writing. Let's look at different ways of implementing put and flush.
If the system supports transactions or can handle a batch of updates, your connector's put(Collection<SinkRecord>) could write all of the records in that collection using a single transaction / batch, retrying as many times as needed until the transaction / batch completes, before finally throwing an error. In this case, put does all the work and will either succeed or fail. If it succeeds, then Kafka Connect knows all of the records were handled properly and can thus (at some point) commit the offsets. If your put call fails, then Kafka Connect doesn't know whether any of the records were processed, so it doesn't update its offsets and it stops your connector. Your connector's flush(...) would need to do nothing, since Kafka Connect is handling all the offsets.
If the system doesn't support transactions and instead you can only submit items one at a time, you might have your connector's put(Collection<SinkRecord>) attempt to write out each record individually, blocking until it succeeds and retrying each as needed before throwing an error. Again, put does all the work, and the flush method might not need to do anything.
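As a rough illustration of this one-record-at-a-time approach, here is a minimal sketch of a SinkTask whose put does all the work. DownstreamClient and its write/close methods are hypothetical stand-ins for your own service client, and instead of blocking retries the sketch throws RetriableException, which is one way to ask Kafka Connect to redeliver the batch without committing its offsets:

```java
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

import java.util.Collection;
import java.util.Map;

// Sketch only: DownstreamClient is a hypothetical client for the external service.
public class DownstreamSinkTask extends SinkTask {

    private DownstreamClient client;

    @Override
    public void start(Map<String, String> props) {
        client = new DownstreamClient(props.get("downstream.url"));
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            try {
                // Block until the downstream service accepts this record.
                client.write(record.value());
            } catch (Exception e) {
                // RetriableException asks Kafka Connect to redeliver the batch later;
                // offsets for these records are not committed.
                throw new RetriableException(
                        "Failed to write record at offset " + record.kafkaOffset(), e);
            }
        }
    }

    @Override
    public void stop() {
        if (client != null) {
            client.close();
        }
    }

    @Override
    public String version() {
        return "0.1";
    }
}
```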
So far, my examples do all the work in put. You always have the option of having put simply buffer the records and instead do all the work of writing to the external service in flush or preCommit. One reason you might do this is so that your writes are time-based, just like flush and preCommit are. If you don't want your writes to be time-based, you probably don't want to do the writes in flush or preCommit.
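A sketch of that buffering variant, again with a hypothetical DownstreamClient: put only collects records and the actual write happens in flush, so a failure there still prevents the offsets from being committed.

```java
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Sketch only: DownstreamClient is a hypothetical client for the external service.
public class BufferingSinkTask extends SinkTask {

    private final List<SinkRecord> buffer = new ArrayList<>();
    private DownstreamClient client;

    @Override
    public void start(Map<String, String> props) {
        client = new DownstreamClient(props.get("downstream.url"));
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        buffer.addAll(records);   // no external I/O here
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        client.writeBatch(buffer);   // if this throws, Kafka Connect won't commit these offsets
        buffer.clear();
    }

    @Override
    public void stop() { }

    @Override
    public String version() {
        return "0.1";
    }
}
```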
To Record Offsets or Not To Record
As mentioned above, by default Kafka Connect will periodically record the offsets so that upon restart the connector can begin where it last left off.
However, sometimes it is desirable for a connector to record the offsets in the external system, especially when that can be done atomically. When such a connector starts up, it can look in the external system to find out the offset that was last written, and can then tell Kafka Connect where it wants to start reading. With this approach your connector may be able to do exactly once processing of messages.
When sink connectors do this, they actually don't need Kafka Connect to commit any offsets at all. The flush method is simply an opportunity for your connector to know which offsets Kafka Connect is committing for you, and since it doesn't return anything it can't modify those offsets or tell Kafka Connect which offsets the connector is handling itself.
This is where the preCommit method comes in. It really is a replacement for flush (it actually takes the same parameters as flush), except that it is expected to return the offsets that Kafka Connect should commit. By default, preCommit just calls flush and then returns the same offsets that were passed to preCommit, which means Kafka Connect should commit all the offsets it passed to the connector via preCommit. But if your preCommit returns an empty set of offsets, then Kafka Connect will record no offsets at all.
So, if your connector is going to handle all offsets in the external system and doesn't need Kafka Connect to record anything, then you should override the preCommit method instead of flush, and return an empty set of offsets.
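For example, here is a minimal sketch of a sink task that tracks offsets entirely in the external system; the external writes and the offset lookup (lookupOffsetInExternalSystem) are hypothetical placeholders, and the key part is the preCommit override returning an empty map:

```java
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch only: the external writes and offset bookkeeping are elided.
public class ExternallyTrackedSinkTask extends SinkTask {

    @Override
    public void start(Map<String, String> props) { }

    @Override
    public void open(Collection<TopicPartition> partitions) {
        // Look up the last stored position for each partition in the external
        // system and tell Kafka Connect where to start reading.
        Map<TopicPartition, Long> startOffsets = new HashMap<>();
        for (TopicPartition tp : partitions) {
            startOffsets.put(tp, lookupOffsetInExternalSystem(tp));
        }
        context.offset(startOffsets);
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        // Write each record together with its offset to the external system,
        // ideally atomically, so the external system is the source of truth.
    }

    @Override
    public Map<TopicPartition, OffsetAndMetadata> preCommit(
            Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        // Returning an empty map tells Kafka Connect to record no offsets at all.
        return Collections.emptyMap();
    }

    @Override
    public void stop() { }

    @Override
    public String version() {
        return "0.1";
    }

    private long lookupOffsetInExternalSystem(TopicPartition tp) {
        return 0L;   // placeholder: query the external system for the next offset to read
    }
}
```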

Related

(Spring) Kafka appears to consume newly produced messages out of order

Situation:
We have a Spring Boot / Spring Kafka application that is reading from a Kafka topic with a single partition. There is a single instance of the application running and it has a single-threaded KafkaMessageListenerContainer (not Concurrent). We have a single consumer group.
We want to manage offsets ourselves based on committing to a transactional database. At startup, we read initial offsets from our database and seek to that offset and begin reading older messages. (For example with an empty database, we would start at offset 0.) We do this via implementing ConsumerRebalanceListener and seek()ing in that callback. We pause() the KafkaMessageListenerContainer prior to starting it so that we don't read any messages prior to the ConsumerRebalanceListener being invoked (then we resume() the container inside the ConsumerRebalanceListener.onPartitionsAssigned() callback). We acknowledge messages manually as they are consumed.
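To make the description concrete, here is a rough sketch of that startup flow; OffsetStore is a hypothetical DAO backed by the transactional database, and the listener would be registered on the container via ContainerProperties#setConsumerRebalanceListener:

```java
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.listener.ConsumerAwareRebalanceListener;
import org.springframework.kafka.listener.KafkaMessageListenerContainer;

import java.util.Collection;

// Sketch only: OffsetStore is a hypothetical DAO over the transactional database.
public class DbOffsetRebalanceListener implements ConsumerAwareRebalanceListener {

    private final OffsetStore offsetStore;
    private final KafkaMessageListenerContainer<?, ?> container;   // pause()d before start()

    public DbOffsetRebalanceListener(OffsetStore offsetStore,
                                     KafkaMessageListenerContainer<?, ?> container) {
        this.offsetStore = offsetStore;
        this.container = container;
    }

    @Override
    public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        for (TopicPartition tp : partitions) {
            // e.g. offset 0 for an empty database
            consumer.seek(tp, offsetStore.nextOffsetFor(tp));
        }
        // Only start delivering records once we have seeked to the stored offsets.
        container.resume();
    }
}
```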
Issue:
While in the middle of reading these older messages (1000s of messages and 10s of seconds/minutes into the reading), a separate application produces messages into the same topic and partition we're reading from.
We observe that these newly produced messages are consumed immediately, intermingled with the older messages we're in the process of reading. So we observe message offsets that jump in this single consumer thread: from the basically sequential offsets of the older messages to ones that are from the new messages that were just produced, and then back to the older, sequential ones.
We don't see any errors in reading messages or anything that would trigger retries or anything like that. The reads of newer messages happen in the main thread as do the reads of older messages, so I don't believe there's another listener container running.
How could this happen? Doesn't this seem contrary to the ordering guarantees Kafka is supposed to provide? How can we prevent this behavior?
Details:
We have the following settings (some in properties, some in code, please excuse the mix):
properties.consumer.isolationLevel = KafkaProperties.IsolationLevel.READ_COMMITTED
properties.consumer.maxPollRecords = 500
containerProps.ackMode = ContainerProperties.AckMode.MANUAL
containerProps.eosMode = ContainerProperties.EOSMode.BETA
spring.kafka.consumer.auto-offset-reset=none
spring.kafka.enable-auto-commit=false
Versions:
Spring Kafka 2.5.5.RELEASE
Kafka 2.5.1
(we could definitely try upgrading if there was a reason to believe this was the result of a bug that was fixed since then.)
I can share some code snippets for any of the above if it's interesting.

Is consumer offset committed even when failing to post to output topic in Kafka Streams?

If I have a Kafka Streams application that fails to post to a topic (because the topic does not exist), does it commit the consumer offset and continue, or will it loop on the same message until it can resolve the output topic? The application merely prints an error and otherwise runs fine, from what I can observe.
An example of the error when trying to post to topic:
Error while fetching metadata with correlation id 80 : {super.cool.test.topic=UNKNOWN_TOPIC_OR_PARTITION}
In my mind it would just spin on the same message until the issue is resolved, in order not to lose data? I could not find a clear answer on what the default behavior is. We haven't set autocommit to off or anything like that; most of the settings are left at their defaults.
I am asking as we don't want to end up in a situation where the health check is fine (application is running while printing errors to log) and we are just throwing away tons of Kafka messages.
Kafka Streams will not commit the offsets for this case, as it provides at-least-once processing guarantees (in fact, it's not even possible to reconfigure Kafka Streams differently -- only the stronger exactly-once guarantee is possible). Also, Kafka Streams always disables auto-commit on the consumer (and does not allow you to enable it), as Kafka Streams manages committing offsets itself.
If you run with the default settings, the producer should actually throw an exception and the corresponding thread should die -- you can get a callback if a thread dies by registering a handler via KafkaStreams#setUncaughtExceptionHandler().
You can also observe KafkaStreams#state() (or register a callback KafkaStreams#setStateListener()). The state will go to DEAD if all threads are dead (note, there was a bug in older version for which the state was still RUNNING for this case: https://issues.apache.org/jira/browse/KAFKA-5372)
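As a small sketch of wiring these callbacks up (the state names used here are from recent releases, and the handler bodies are placeholders for your own alerting or health-check logic):

```java
import org.apache.kafka.streams.KafkaStreams;

public class StreamsHealthWiring {

    // Attach the callbacks mentioned above to an already-built KafkaStreams instance.
    public static void wire(KafkaStreams streams) {
        streams.setUncaughtExceptionHandler((thread, exception) -> {
            // Invoked when a stream thread dies, e.g. after the producer fails
            // to write to the missing output topic.
            System.err.println("Stream thread " + thread.getName() + " died: " + exception);
        });

        streams.setStateListener((newState, oldState) -> {
            // Terminal states in recent versions are ERROR / NOT_RUNNING;
            // flip your health check here so the instance stops reporting healthy.
            if (newState == KafkaStreams.State.ERROR || newState == KafkaStreams.State.NOT_RUNNING) {
                System.err.println("Streams transitioned from " + oldState + " to " + newState);
            }
        });
    }
}
```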
Hence, the application should not be in a healthy state, and Kafka Streams will not retry the input message but will stop processing, so you would need to restart the client. On restart, it would re-read the failed input message and retry writing to the output topic.
If you want Kafka Streams to retry, you need to increase the producer config retries so that the producer retries writing internally instead of throwing an exception. This may eventually "block" further processing if the producer's write buffer becomes full.
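A minimal sketch of raising the producer retries through the Streams configuration; the application id and bootstrap servers are placeholders:

```java
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class RetryingStreamsConfig {

    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");      // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        // Let the embedded producer retry internally instead of throwing,
        // which would otherwise kill the stream thread.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), Integer.MAX_VALUE);
        return props;
    }
}
```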

How to make a restartable producer?

The latest versions of Kafka support exactly-once semantics (EoS). To support this, extra details are added to each message. This means that at your consumer, if you print the offsets of the messages, they won't necessarily be sequential. This makes it harder to poll a topic to read the last committed message.
In my case, the consumer printed something like this:
Offset-0 0
Offset-2 1
Offset-4 2
Problem: In order to write a restartable producer, I poll the topic and read the content of the last message. In this case, the last message would be offset #5, which is not a valid consumer record. Hence, I see errors in my code.
I can use the solution provided at: Getting the last message sent to a kafka topic. The only problem is that instead of using consumer.seek(partition, last_offset-1), I would use consumer.seek(partition, last_offset-2). This can immediately resolve my issue, but it's not an ideal solution.
What would be the most reliable and best solution to get the last committed message for a consumer written in Java? OR
Is it possible to use a local state store for a partition? OR
What is the most recommended way to store the last message to withstand producer failure? OR
Are Kafka connectors restartable? Is there any specific API that I can use to make producers restartable?
FYI - I am not looking for a quick fix.
In my case, multiple producers push data to one big topic. Therefore, reading the entire topic would be a nightmare.
The solution that I found is to maintain another topic, i.e. "P1_Track", where a producer can store metadata. Within a transaction, the producer sends data to the big topic and to P1_Track.
When I restart a producer, it will read P1_Track and figure out where to start from.
I am also thinking about storing the last committed message in a database and using it when the producer process restarts.
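A minimal sketch of the transactional P1_Track approach described above; the topic names, transactional id, and the metadata written to P1_Track are placeholders:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class RestartableProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");
        props.put("transactional.id", "producer-p1");   // required for transactions

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                // The data record and the tracking record commit (or abort) together.
                producer.send(new ProducerRecord<>("big-topic", "key-42", "payload"));
                producer.send(new ProducerRecord<>("P1_Track", "producer-p1", "key-42"));
                producer.commitTransaction();
            } catch (RuntimeException e) {
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```

On restart, the producer reads the latest committed entry from P1_Track to decide where to resume.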

How reliable are Kafka Connectors in case of failures?

I'm thinking of using a Kafka Connector vs creating my own Kafka Consumers/Producers to move some data from/to Kafka, and I see the value Kafka Connectors provide in terms of scalability and fault tolerance. However, I haven't been able to find how exactly connectors behave if the "Task" fails for some reason. Here are a couple of scenarios:
For a sink connector (S3-Sink), if it (the Task) fails (after all retries) to successfully send the data to the destination (for example due to a network issue), what happens to the worker? Does it crash? Is it able to re-consume the same data from Kafka later on?
For a source connector (JDBC Source), if it fails to send to Kafka, does it re-process the same data later on? Does it depend on what the source is?
Does answer to the above questions depend on which connector we are talking about?
In Kafka 2.0, I think, they introduced the concept of graceful error handling, which can skip over bad messages or write them to a DLQ topic.
1) The S3 sink can fail, and it'll just stop processing data. However, if you fix the problem (for the various edge cases that may arise), the sink itself provides exactly-once delivery to S3. The consumed offsets are stored as regular consumer offsets and will not be committed to Kafka until the file upload completes. However, obviously, if you don't fix the issue before the topic's retention period elapses, you're losing data.
2) Yes, it depends on the source. I don't know the semantics of the JDBC connector, but it really depends on which query mode you're using. For example, with the incrementing timestamp mode, if you run a query every 5 seconds for all rows within a range, I do not believe it'll retry old, missed time windows.
Overall, the failure-recovery scenarios are all dependent on the systems being connected to. Some errors are recoverable and some are not (for example, if your S3 access keys get revoked, it won't write files until you get a new credential set).

Simple-Kafka-consumer message delivery duplication

I am trying to implement a simple Producer-->Kafka-->Consumer application in Java. I am able to produce as well as consume the messages successfully, but the problem occurs when I restart the consumer: some of the already consumed messages get picked up again by the consumer from Kafka (not all messages, but a few of the last consumed messages).
I have set autooffset.reset=largest in my consumer and my autocommit.interval.ms property is set to 1000 milliseconds.
Is this 'redelivery of some already consumed messages' a known problem, or is there any other settings that I am missing here?
Basically, is there a way to ensure none of the previously consumed messages are getting picked up/consumed by the consumer?
Kafka uses Zookeeper to store consumer offsets. Since Zookeeper operations are pretty slow, it's not advisable to commit the offset after consumption of every message.
It's possible to add a shutdown hook to the consumer that will manually commit the topic offset before exit. However, this won't help in certain situations (like a JVM crash or kill -9). To guard against those situations, I'd advise implementing custom commit logic that commits the offset locally after processing each message (to a file or local database), and also commits the offset to Zookeeper every 1000 ms. Upon consumer startup, both of these locations should be queried, and the maximum of the two values should be used as the consumption offset.
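The answer above refers to the old Zookeeper-based consumer. As a rough illustration of the same dual-commit idea with the modern KafkaConsumer (which commits to Kafka rather than Zookeeper), with the topic, group id, and file location as placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class DualCommitConsumer {

    public static void main(String[] args) throws IOException {
        TopicPartition tp = new TopicPartition("my-topic", 0);
        Path localOffsetFile = Path.of("offset-" + tp + ".txt");

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "dual-commit-group");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));

            // On startup, use the larger of the locally stored and broker-stored offsets.
            long local = Files.exists(localOffsetFile)
                    ? Long.parseLong(Files.readString(localOffsetFile).trim()) : 0L;
            OffsetAndMetadata committed = consumer.committed(tp);
            consumer.seek(tp, Math.max(local, committed == null ? 0L : committed.offset()));

            long lastBrokerCommit = System.currentTimeMillis();
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.offset() + " " + record.value());   // "processing"
                    // Cheap local commit after every message.
                    Files.writeString(localOffsetFile, Long.toString(record.offset() + 1));
                }
                // Slower broker commit roughly every 1000 ms.
                if (System.currentTimeMillis() - lastBrokerCommit > 1000) {
                    consumer.commitSync();
                    lastBrokerCommit = System.currentTimeMillis();
                }
            }
        }
    }
}
```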