Kafka Streams state store backup topic partitioning strategy - apache-kafka

Kafka guarantees that messages with same key will always go to the same
partition.
For instance, I have message with the string key: 2329. And two topics t1 and t2. When I perform write of this message it goes into partition 1 in both topics, as expected.
Now the problem itself: I'm using Kafka Streams 0.10.2.0 persistent state store, which automatically creates a backup topic. Now in case of this backup topic message with the key: 2329 goes into another partition (partition 0), which is strange to me.
Has anyone encountered this issue?

I've found where was the issue.
To increase performance I've skipped repartitioning before write data to the state store. And used another column from the value as the state store key. And it worked until additional topic with enrichment information has been added. So I just forgot to perform repartitioning.

Related

Kafka Streams: Increasing topic partitions for an application performing a KTable-KTable foreign key join

Most of the information I find relates to primary key joins. I understand foreign key joins are a relatively new feature for Kafka Streams. I'm interested in how this will scale. I understand that Kafka Streams parallelism is capped by the number of partitions on each topic, however I have a few questions around what it means to increase the input topic partitions.
Does the foreign key join have the same requirement to co-partition input topics? That is, do both topics need to have the same number of partitions?
How does one add a partitions later after the application has been running in production for months or years? The changelog topics backing each KTable store data from certain input topic partitions. If one is to increase the partitions in the input topics, how does this impact our KTables' state stores and changelogs? Presumably, we cannot just start over and lose that data since it has accumulated over months and years and is essential to performing the join. It may not be quickly replaced by upstream data. Do we need to blow away our state stores, create new input topics, and re-send all KTable changelog topic data to them?
How about the other internal "subscription" topics?
Does the foreign key join have the same requirement to co-partition input topics? That is, do both topics need to have the same number of partitions?
No. For more details check out https://www.confluent.io/blog/data-enrichment-with-kafka-streams-foreign-key-joins/
How does one add a partitions later after the application has been running in production for months or years?
You cannot really do this, even if you don't use Kafka Streams. The issue is, that your input data is partitioned by key, and if you add a partition the partitioning in your input topic breaks. -- The recommended pattern is to create a new topic with different number of partitions.
The changelog topics backing each KTable store data from certain input topic partitions. If one is to increase the partitions in the input topics, how does this impact our KTables' state stores and changelogs?
It would break the application. In fact, Kafka Streams will check and will raise an exception if it detect that the number of input topic partitions does not match the number of changelog topic partitions.

Tombstones and key store cleanup

I have a few Kafka consumers implemented in Java and I am implementing a standalone application to check records and tombstone them. The hope is that Kafka will delete state stores as it compacts topics.
Now... I am getting a bit confused on the different types of stores that are created by Kafka. For each type of store, I would like to know:
Is it deleted when Kafka deletes old records in the corresponding topic?
Is it deleted when you tombstone records in the corresponding topic?
Are we stuck with it?
The types of stores that I see are the following:
KSTREAM-AGGREGATE-STATE-STORE changelog
KSTREAM-AGGREGATE-STATE-STORE repartition
(KTABLE) STATE-STORE changelog
KSTREAM-KEY-SELECT repartition
For the stream topology using an Aggregate function, we already have a tombstone strategy which should cover stores of type #1. We send a null message to the stream application & it returns it as the aggregate result.
For stores of type #3, I will be running a tombstoning application on the corresponding ktable. I expect the changelog to shrink.
However, for stores of type #2 and #4, I have no idea how they are cleaned up. They corresponding to selectKey()+leftJoin() functions in my consumer's topology. However, they are tied to a kstream-centric topology so I don't know what to do to have them cleaned up. Any suggestions that don't involve stopping the broker?
#2 and #4 are not stores - they are internal topics created by Kafka Streams.
There is no way to explicitly cleanup these repartition topics but Kafka would automatically cleanup these based on the kafka server's retention.ms config (which is default of 7 days if not specified explicitly) as these topics (#2 and #4) are created by Kafka Streams with cleanup.policy = delete.

Kafka message partitioning by key

We have a business process/workflow that is being started when initial event message is received and closed when the last message is processed. We have up to 100,000 processes executed each day. My problem is that the order of the messages that come to specific process has to be processed by the same order messages were received. If one of the messages fails, the process has to freeze until the problem is fixed, despite that all other processes has to continue. For this kind of situation i am thinking of using Kafka. first solution that came to my mind was to use Topic partitioning by message key. The key of the message would be the ProcessId. This way i could be sure that all process messages would be partitioned and kafka would guarantee the order. As i am new to Kafka what i managed to figure out that partitions has to be created in advance and that makes everything to difficult. so my questions are:
1) when i produce message to kafka's topic that does not exist, the topic is created on runtime. Is it possible to have same behavior for topic partitions?
2) there can be more than 100,000 active partitions on the topic, is that a problem?
3) can partition be deleted after all messages from that topic were read?
4) maybe you can suggest other approaches to my problem?
When i produce message to kafka's topic that does not exist, the topic is created on runtime. Is it possible to have same behavior for topic partitions?
You need to specify number of partitions while creating topic. New Partitions won't be create automatically(as is the case with topic creation), you have to change number of partitions using topic tool.
More Info: https://kafka.apache.org/documentation/#basic_ops_modify_topi
As soon as you increase number of partitions, producer and consumer will be notified of new paritions, thereby leading them to rebalance. Once rebalanced, producer and consumer will start producing and consuming from new partition.
there can be more than 100,000 active partitions on the topic, is that a problem?
Yes, having this much partitions will increase overall latency.
Go through how-choose-number-topics-partitions-kafka-cluster on how to decide number of partitions.
can partition be deleted after all messages from that topic were read?
Deleting a partition would lead to data loss and also the remaining data's keys would not be distributed correctly so new messages would not get directed to the same partitions as old existing messages with the same key. That's why Kafka does not support decreasing partition count on topic.
Also, Kafka doc states that
Kafka does not currently support reducing the number of partitions for a topic.
I suppose you choose wrong feature to solve you task.
In general, partitioning is used for load balancing.
Incoming messages will be distributed on given number of partition according to the partitioning strategy which defined at broker start. In short, default strategy just calculate i=key_hash mod number_of_partitions and put message to ith partition. More about strategies you could read here
Message ordering is guaranteed only within partition. With two messages from different partitions you have no guarantees which come first to the consumer.
Probably you would use group instead. It's option for consumer
Each group consumes all messages from topic independently.
Group could consist of one consumer or more if you need it.
You could assign many groups and add new group (in fact, add new consumer with new groupId) dynamically.
As you could stop/pause any consumer, you could manually stop all consumers related to specified group. I suppose there is no single command to do that but I'm not sure. Anyway, if you have single consumer in each group you could stop it easily.
If you want to remove the group you just shutdown and drop out related consumers. No actions on broker side is needed.
As a drawback you'll get 100,000 consumers which read (single) topic. It's heavy network load at least.

Get latest values from a topic on consumer start, then continue normally

We have a Kafka producer that produces keyed messages in a very high frequency to topics whose retention time = 10 hours. These messages are real-time updates and the used key is the ID of the element whose value has changed. So the topic is acting as a changelog and will have many duplicate keys.
Now, what we're trying to achieve is that when a Kafka consumer launches, regardless of the last known state (new consumer, crashed, restart, etc..), it will somehow construct a table with the latest values of all the keys in a topic, and then keeps listening for new updates as normal, keeping the minimum load on Kafka server and letting the consumer do most of the job. We tried many ways and none of them seems the best.
What we tried:
1 changelog topic + 1 compact topic:
The producer sends the same message to both topics wrapped in a transaction to assure successful send.
Consumer launches and requests the latest offset of the changelog topic.
Consumes the compacted topic from beginning to construct the table.
Continues consuming the changelog since the requested offset.
Cons:
Having duplicates in compacted topic is a very high possibility even with setting the log compaction frequency the highest possible.
x2 number of topics on Kakfa server.
KSQL:
With KSQL we either have to rewrite a KTable as a topic so that consumer can see it (Extra topics), or we will need consumers to execute KSQL SELECT using to KSQL Rest Server and query the table (Not as fast and performant as Kafka APIs).
Kafka Consumer API:
Consumer starts and consumes the topic from beginning. This worked perfectly, but the consumer has to consume the 10 hours change log to construct the last values table.
Kafka Streams:
By using KTables as following:
KTable<Integer, MarketData> tableFromTopic = streamsBuilder.table("topic_name", Consumed.with(Serdes.Integer(), customSerde));
KTable<Integer, MarketData> filteredTable = tableFromTopic.filter((key, value) -> keys.contains(value.getRiskFactorId()));
Kafka Streams will create 1 topic on Kafka server per KTable (named {consumer_app_id}-{topic_name}-STATE-STORE-0000000000-changelog), which will result in a huge number of topics since we a big number of consumers.
From what we have tried, it looks like we need to either increase the server load, or the consumer launch time. Isn't there a "perfect" way to achieve what we're trying to do?
Thanks in advance.
By using KTables, Kafka Streams will create 1 topic on Kafka server per KTable, which will result in a huge number of topics since we a big number of consumers.
If you are just reading an existing topic into a KTable (via StreamsBuilder#table()), then no extra topics are being created by Kafka Streams. Same for KSQL.
It would help if you could clarify what exactly you want to do with the KTable(s). Apparently you are doing something that does result in additional topics being created?
1 changelog topic + 1 compact topic:
Why were you thinking about having two separate topics? Normally, changelog topics should always be compacted. And given your use case description, I don't see a reason why it should not be:
Now, what we're trying to achieve is that when a Kafka consumer launches, regardless of the last known state (new consumer, crashed, restart, etc..), it will somehow construct a table with the latest values of all the keys in a topic, and then keeps listening for new updates as normal [...]
Hence compaction would be very useful for your use case. It would also prevent this problem you described:
Consumer starts and consumes the topic from beginning. This worked perfectly, but the consumer has to consume the 10 hours change log to construct the last values table.
Note that, to reconstruct the latest table values, all three of Kafka Streams, KSQL, and the Kafka Consumer must read the table's underlying topic completely (from beginning to end). If that topic is NOT compacted, this might indeed take a long time depending on the data volume, topic retention settings, etc.
From what we have tried, it looks like we need to either increase the server load, or the consumer launch time. Isn't there a "perfect" way to achieve what we're trying to do?
Without knowing more about your use case, particularly what you want to do with the KTable(s) once they are populated, my answer would be:
Make sure the "changelog topic" is also compacted.
Try KSQL first. If this doesn't satisfy your needs, try Kafka Streams. If this doesn't satisfy your needs, try the Kafka Consumer.
For example, I wouldn't use the Kafka Consumer if it is supposed to do any stateful processing with the "table" data, because the Kafka Consumer lacks built-in functionality for fault-tolerant stateful processing.
Consumer starts and consumes the topic from beginning. This worked
perfectly, but the consumer has to consume the 10 hours change log to
construct the last values table.
During the first time your application starts up, what you said is correct.
To avoid this during every restart, store the key-value data in a file.
For example, you might want to use a persistent map (like MapDB).
Since you give the consumer group.id and you commit the offset either periodically or after each record is stored in the map, the next time your application restarts it will read it from the last comitted offset for that group.id.
So the problem of taking a lot of time occurs only initially (during first time). So long as you have the file, you don't need to consume from beginning.
In case, if the file is not there or is deleted, just seekToBeginning in the KafkaConsumer and build it again.
Somewhere, you need to store this key-values for retrieval and why cannot it be a persistent store?
In case if you want to use Kafka streams for whatever reason, then an alternative (not as simple as the above) is to use a persistent backed store.
For example, a persistent global store.
streamsBuilder.addGlobalStore(Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore(topic), keySerde, valueSerde), topic, Consumed.with(keySerde, valueSerde), this::updateValue);
P.S: There will be a file called .checkpoint in the directory which stores the offsets. In case if the topic is deleted in the middle you get OffsetOutOfRangeException. You may want to avoid this, perhaps by using UncaughtExceptionHandler
Refer to https://stackoverflow.com/a/57301986/2534090 for more.
Finally,
It is better to use Consumer with persistent file rather than Streams for this, because of simplicity it offers.

Kafka Streams - KTable from topic with retention policy

I'm experimenting with kafka streams and I have the following setup:
I have an existing kafka topic with a key space that is unbounded (but predictable and well known).
My topic has a retention policy (in bytes) to age out old records.
I'd like to materialize this topic into a Ktable where I can use the Interactive Queries API to retrieve records by key.
Is there any way to make my KTable "inherit" the retention policy from my topic? So that when records are aged out of the primary topic, they're no longer available in the ktable?
I'm worried about dumping all of the records into the KTable and having the StateStore grow unbounded.
One solution that I can think of is to transform into a windowed stream with hopping windows equal to a TimeToLive for the record, but I'm wondering if there's a better solution in a more native way.
Thanks.
It is unfortunately not support atm. There is a JIRA though: https://issues.apache.org/jira/browse/KAFKA-4212
Another possibility would be to insert tombstone messages (<key,null>) into the input topic. The KTable would pick those up and delete the corresponding key from the store.