Kafka Streams processing guarantee: difference between exactly_once and exactly_once_beta - apache-kafka

The question is simple: what is the difference between these two guarantees in Kafka Streams?
processing.guarantee: exactly_once / exactly_once_beta
The docs say:
Using "exactly_once" requires broker version 0.11.0 or newer, while using "exactly_once_beta" requires broker version 2.5 or newer. Note that if exactly-once processing is enabled, the default for parameter commit.interval.ms changes to 100ms.
But there is nothing about the difference.

When you configure exactly_once_beta, transaction processing is done with a newer implementation (KIP-447) that uses a single producer per stream thread instead of one per task, which gives better performance and scalability as the number of partitions and tasks grows.
Note, however, that a two-step migration is necessary if you have been using exactly_once with an earlier Kafka version.
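For reference, here is a minimal sketch of where that guarantee is configured in a Kafka Streams application (the application id, bootstrap servers, and topology are placeholders, not taken from the question; the StreamsConfig.EXACTLY_ONCE_BETA constant is available from the 2.6 client):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class EosConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-demo-app");       // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder

        // "exactly_once" works with brokers >= 0.11.0 (one producer per task);
        // "exactly_once_beta" requires brokers >= 2.5 (one producer per stream thread).
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_BETA);

        // With either setting, commit.interval.ms defaults to 100 ms unless overridden.
        // new KafkaStreams(topology, props).start();  // topology construction omitted
    }
}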

Related

Can the new Flink Kafka consumer (KafkaSource) start from the old FlinkKafkaConsumer's savepoint/checkpoint?

I have a job which is running with the old Flink Kafka consumer (FlinkKafkaConsumer). Now I want to migrate it to KafkaSource, but I am not sure what the impact of this migration will be. I want my job to start from the latest successful checkpoint taken by the old FlinkKafkaConsumer. Is that possible? If it is not possible, what is the right way for me to migrate the Kafka consumer?
Assuming the same configuration, the two should be usable interchangeably, as long as the group-id you configure on the new source matches the one used by your earlier implementation. You can use this in conjunction with OffsetsInitializer.committedOffsets() to ensure that you continue reading from the same offsets that were previously committed for that group:
KafkaSource.<YourExampleClass>builder()
    ...
    .setGroupId("your-previous-group-id")
    .setStartingOffsets(OffsetsInitializer.committedOffsets())
    .build();
While the two should just work, it's worth noting that your specific pipeline, and how it uses parallelism, could reveal some of the differences between FlinkKafkaConsumer and the newer KafkaSource:
the KafkaSource behaves differently than FlinkKafkaConsumer in the case where the number of Kafka partitions is smaller than the parallelism of Flink's Kafka Source operator.
How to upgrade from FlinkKafkaConsumer to KafkaSource is covered in the release notes for Flink 1.14, where FlinkKafkaConsumer was deprecated. You can find it at https://nightlies.apache.org/flink/flink-docs-release-1.14/release-notes/flink-1.14/#deprecate-flinkkafkaconsumer
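For completeness, a minimal end-to-end sketch of the migrated source under stated assumptions (bootstrap servers, topic name, and String values are placeholders; OffsetsInitializer.committedOffsets() resumes from the offsets committed to Kafka under the old group.id):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaSourceMigrationSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")                       // placeholder
                .setTopics("your-topic")                                     // placeholder
                .setGroupId("your-previous-group-id")                        // must match the old consumer's group.id
                .setStartingOffsets(OffsetsInitializer.committedOffsets())   // resume from the committed group offsets
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
           .print();

        env.execute("kafka-source-migration");
    }
}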

Micronaut Kafka: unable to use exactly-once Kafka message semantics

I am using Micronaut Kafka to set up my producer, and I am using the @KafkaClient annotation to set up all the producer config.
Micronaut Kafka lets me set all the parameters needed for a transactional producer.
When I push the message, I get back an exception saying:
io.micronaut.messaging.exceptions.MessagingClientException: Exception sending producer record: Cannot perform 'send' before completing a call to initTransactions when transactions are enabled.
Referring back to the Micronaut documentation, it looks like the relevant section asks you to use the KafkaProducer API to implement this feature.
From what I can tell, the KafkaProducer.initTransactions() method needs to be invoked before starting transactions, and it doesn't look like that is happening.
Has anyone faced a similar issue implementing this?
I guess you are using a single-node cluster for development, right? If so, you should configure transaction.state.log.min.isr=1 and transaction.state.log.replication.factor=1 on your local cluster. Both are preconfigured to 3 by default.
There is also a relevant section in the Confluent docs: https://docs.confluent.io/current/streams/developer-guide/config-streams.html
processing.guarantee
The processing guarantee that should be used. Possible values are "at_least_once" (default) and "exactly_once". Note that if exactly-once processing is enabled, the default for parameter commit.interval.ms changes to 100ms. Additionally, consumers are configured with isolation.level="read_committed" and producers are configured with retries=Integer.MAX_VALUE and enable.idempotence=true per default. Note that "exactly_once" processing requires a cluster of at least three brokers by default, which is the recommended setting for production. For development, you can change this by adjusting the broker settings in both transaction.state.log.replication.factor and transaction.state.log.min.isr to the number of brokers you want to use.
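For context, the error in the question corresponds to skipping the first step of the plain-client transactional flow. A minimal sketch with the raw KafkaProducer is shown below (broker address, transactional id, and topic are placeholders; on a single-broker development cluster you would also apply the transaction.state.log.* overrides described above):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-tx-id");       // placeholder

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();   // must complete before any transactional send
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("demo-topic", "key", "value")); // placeholder topic
            producer.commitTransaction();
        }
    }
}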

Can we set a custom value for the producer configuration 'delivery.timeout.ms'?

I'm using apache kafka-clients 2.0.1, and looking at the ProducerConfig class I didn't find the property delivery.timeout.ms. Does this mean we can't override this producer configuration to set a custom value?
This parameter was introduced via KIP-91.
That KIP was implemented in Kafka version 2.1.0.
Released Nov 20, 2018
Kafka 2.1.0 includes a number of significant new features. Here is a summary of some notable changes:
Java 11 support
Support for Zstandard, which achieves compression comparable to gzip with higher compression and especially decompression speeds (KIP-110)
Avoid expiring committed offsets for active consumer group (KIP-211)
Provide Intuitive User Timeouts in The Producer (KIP-91)
...
Time to upgrade :)
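Once you are on kafka-clients 2.1.0 or newer, the setting is exposed both as the string key delivery.timeout.ms and as a ProducerConfig constant. A small sketch (broker address and the timeout value are just illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class DeliveryTimeoutSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Upper bound on the time between send() and the success/failure callback (KIP-91).
        // It must be at least request.timeout.ms + linger.ms.
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 180000);          // illustrative value

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}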

Fault Tolerance of FlinkKafkaConsumer in HiBench

I am running some experiments to test the fault tolerance capabilities of Apache Flink. I am currently using the HiBench framework with the WordCount micro benchmark implemented for Flink.
I noticed that if I kill a TaskManager during an execution, the state of the Flink operators is recovered after the automatic "redeploy" but many (all?) tuples sent from the benchmark to Kafka are missed (stored in Kafka but not received in Flink).
It seems that after the recovery, the FlinkKafkaConsumer (the benchmark uses FlinkKafkaConsumer08), instead of starting to read from the last offset read before the failure, starts reading from the latest available one (losing all the events sent during the failure).
Any suggestion?
Thanks!
The problem was with the HiBench framework itself and the version of Flink it uses.
I had to update the version of Flink in the benchmark in order to use the "setStartFromGroupOffsets()" method in the Kafka consumer.
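A rough sketch of what that looks like after the upgrade (bootstrap servers, ZooKeeper address, group id, and topic are placeholders; the start-position setting only applies when there is no checkpoint to restore from, since on recovery Flink rewinds to the offsets stored in the last successful checkpoint):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;

public class FaultTolerantKafkaSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5000);   // checkpoint every 5 s so the Kafka offsets become part of Flink's state

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");   // placeholder
        props.setProperty("zookeeper.connect", "localhost:2181");   // required by the 0.8 consumer; placeholder
        props.setProperty("group.id", "hibench-wordcount");         // placeholder

        // The 0.8-specific connector only ships with older Flink releases.
        FlinkKafkaConsumer08<String> consumer =
                new FlinkKafkaConsumer08<>("wordcount-input", new SimpleStringSchema(), props); // placeholder topic
        consumer.setStartFromGroupOffsets(); // only used when there is no checkpoint/savepoint to restore from

        env.addSource(consumer).print();
        env.execute("fault-tolerant-wordcount");
    }
}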

Kafka: copy topics between different versions?

I've got two brokers: the first runs 0.9 and the second runs 0.10.
Various workers and daemons consume and produce messages on both brokers.
For one application, I need messages from a 0.9 topic to be consumable by an application that uses KStreams and is connected to the 0.10 broker.
Is there a straightforward way of copying just the one topic from 0.9 to 0.10? Or using the 0.10 clients to connect to 0.9? I'd hate to have to resort to cramming both versions in the same jar. Just consuming the 0.9 broker with a 0.10 client doesn't seem to work.
In general, only Kafka brokers are backward compatible (not Kafka clients). Thus, a client can connect to newer brokers, but not to older ones.
Because the Kafka Streams library uses the 0.10.x client, it only works with 0.10.x brokers.
Thus, upgrading your brokers as described here: https://kafka.apache.org/documentation.html#upgrade is the best way to go (this is safe, as brokers are backward compatible, so it does not break any other applications using this broker with older clients).
As an alternative, you could also use Mirror Maker to replicate the topic from the 0.9.x cluster to the 0.10.x cluster.
My solution was to use jarjar to rewrite the 0.9 clients jar so that the classes/types don't conflict with the 0.10 client. It's dirty, but it works around the JVM's opinion about having two versions of the same library.