How does KStreams handle state store data when adding additional partitions? - apache-kafka

I have one partition of data with one app instance and one local state store. It's been running for a time and has lots of stateful data. I need to update that to 5 partitions with 5 app instances. What happens to the one local state store when the partitions are added and the app is brought back online? Do I have to delete the local state store and start over? Will the state store be shuffled across the additional app instance state stores automatically according to the partitioning strategy?

Do I have to delete the local state store and start over?
That is the recommended way to handle it (cf. https://docs.confluent.io/platform/current/streams/developer-guide/app-reset-tool.html). In fact, if you change the number of input topic partitions and restart your application, Kafka Streams will fail with an error, because the state store has only one shard, while 5 shards would be expected given that you now have 5 input topic partitions.
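For the local side of the reset, Kafka Streams provides KafkaStreams#cleanUp(), which deletes the instance's local state directory; the reset tool linked above handles the broker side (committed offsets and internal topics). A minimal sketch in Java, with the application ID and bootstrap server as placeholders:

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

StreamsBuilder builder = new StreamsBuilder();
// ... topology definition ...

KafkaStreams streams = new KafkaStreams(builder.build(), props);
// cleanUp() wipes this instance's local state directory; it may only be
// called before start() or after close().
streams.cleanUp();
streams.start();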
Will the state store be shuffled across the additional app instance state stores automatically according to the partitioning strategy?
No. Note that the same applies to the data in your input topic: if you partition your input data by key (i.e., when writing into the input topic upstream), old records remain in the existing partition and are therefore not partitioned properly.
In general, it is recommended to over-partition your input topics upfront so that you never need to change the number of partitions later on. You might therefore consider going up to 10, or even 20, partitions instead of just 5.

Related

Sharing state between KStream applications in same consumer group with globalStateStore

The current problem I am trying to solve is about sharing state between multiple applications in the same consumer group; they consume data from the same topic but from different partitions.
So I have an inputTopic with, say, 3 partitions. I will be running 3 KStream microservices (e.g., MS1, MS2, MS3), one per partition; each microservice will process its partition and write the result to an output topic.
Problem: most of the time a microservice can operate independently within its partition, but there are cases where a microservice needs to pull the previous state of an attribute before it can process a record, and this state might previously have been processed and stored by another microservice.
An example would be data about a guy walking along 3 sections of a road, where each section represents a partition. If this guy walks from section 1 to section 2, his state is no longer published by the section 1 publisher; it is now published by the section 2 publisher. Say I have one microservice processing his data per section: when I see records coming into section 2, I need to check the guy's previous state, i.e., whether he just started walking on my section or is coming to my section from another one, in order to continue processing his data.
Proposed solution: I have been reading about globalStateStore, and it seems like it might solve my problem. So I will write down my thinking here and some questions, just wondering if you can see any problems in my approach:
Have each microservice read the input topic from its assigned partition.
Have a GlobalStateStore to store the state so all 3 microservices can read it.
Since you cannot write directly into the globalStateStore, I might have to create an intermediate topic to store the state (e.g., <BobLocation,Long>; <BobMood,String>). The global state store will be created from this topic ("global-topic") - is this correct?
Then every time my microservice gets a message, it will first read the globalStateStore to update its state and then process the record - do I read it as a GlobalKTable?
Then write the updated state to the "global-topic".
Are there any implications for the restart process? Since I am reading state from the global state store all the time, is there a problem when one app dies and another one takes over? A sketch of what I have in mind is shown below.
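A minimal sketch of this setup, assuming string serdes throughout; the store name "global-store" and the join logic are placeholders:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder builder = new StreamsBuilder();

// Each instance reads only its assigned partitions of the input topic.
KStream<String, String> input =
        builder.stream("inputTopic", Consumed.with(Serdes.String(), Serdes.String()));

// The GlobalKTable is populated from ALL partitions of "global-topic",
// so every instance sees the full state.
GlobalKTable<String, String> globalState = builder.globalTable(
        "global-topic",
        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("global-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String()));

// Look up the previous state for each record's key, derive the new state,
// and write it back to "global-topic" so all instances pick it up.
input.leftJoin(globalState,
                (key, value) -> key,  // use the record key as table lookup key
                (value, prevState) -> prevState == null ? value : prevState + "|" + value)
        .to("global-topic", Produced.with(Serdes.String(), Serdes.String()));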
Thank you so much guys!

How long is the data in KTable stored?

Using this as a reference: a stream of profile updates is stored in a KTable object.
How long will this data be stored in the KTable object?
Say we run multiple instances of the application and, somehow, an instance crashes. What about the KTable data belonging to that instance? Will it be "recovered" by another instance?
I am thinking about storing updates of data that is rarely updated. So if an instance crashes and another instance has to build that data from scratch, it is possible it will never get that data again, because it will never be streamed again, or only very rarely.
The KTable is backed by a topic, so how long the data is stored depends on that topic's retention and cleanup policies.
If the cleanup policy is compact, then each unique key is stored "forever", or until the broker runs out of space, whichever is sooner.
If you run multiple instances, then each KTable will hold a subset of the data, from the partitions it consumed from; no single table will have all the data.
If any instance crashes, it will need to read all data from the beginning of its changelog topic, but you can configure standby replicas to account for that scenario.
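A sketch of enabling standby replicas, with the application ID and bootstrap server as placeholders:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "profile-updates");   // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
// Keep one hot standby copy of each state store on another instance, so a
// crashed instance's tasks can fail over without replaying the whole
// changelog topic from the beginning.
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);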
More info at https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Internal+Data+Management

Kafka local state store of multiple partitions

I am using the Kafka Processor API and I create a state store from a topic with 3 partitions (I have 3 brokers); I have 1 stream instance. When I get the local state store, can I get all keys? Why do certain keys work while others don't? Is this normal?
Thank you
The number of application instances does not matter in this case. Because the input topic has 3 partitions, the state store is created with 3 shards. Processing happens with 3 parallel tasks. Each task instantiates a copy of your topology, processes one input topic partition, and uses one shard.
Compare: https://kafka.apache.org/21/documentation/streams/architecture
If you want to access different shards, you can use the "Interactive Queries" feature for key/value lookups (and key-range queries) over all shards.
Also, there is the notion of a global state store, which loads data from all partitions into a single store (no sharding). However, it provides different semantics compared to "regular" stores, because store updates are not time-synchronized with the rest of the processing.
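A minimal Interactive Queries sketch (Kafka 2.5+ API; the store name "my-store" is a placeholder for whatever name the topology uses):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// 'streams' is the running KafkaStreams instance.
ReadOnlyKeyValueStore<String, String> store = streams.store(
        StoreQueryParameters.fromNameAndType("my-store", QueryableStoreTypes.keyValueStore()));

// all() iterates over every shard hosted by this instance; with a single
// instance, that covers all 3 shards and therefore all keys.
try (KeyValueIterator<String, String> it = store.all()) {
    while (it.hasNext()) {
        System.out.println(it.next());
    }
}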

Kafka: topic compaction notification?

I was given the following architecture that I'm trying to improve.
I receive a stream of DB changes which end up in a compacted topic. The stream is basically key/value pairs and the keyspace is large (~4 GB).
The topic is consumed by one Kafka Streams process that stores the data in RocksDB (one store per consumer/shard). The processor does two different things:
join the data with another stream.
check if a message from the topic is a new key or an update to an existing one. If it is an update, it sends the old key/value pair and the new key/value pair to a different topic (updates are rare).
The construct has a couple of problems:
The two different functionalities of the stream processor belong to different teams and should not be part of the same code base. They are put together to save memory. If we separated them, we would have to duplicate the RocksDB stores.
I would prefer to use a normal KTable join instead of the handcrafted join that's currently in the code.
RocksDB seems to be a bit of overkill if the data is already persisted in a topic. We are currently running into some performance issues, and I assume it would be faster if we just kept everything in memory.
Question 1:
Is there a way to hook into the compaction process of a compacted topic? I would like a notification (to a different topic) for every key that is actually compacted (including the old and new value).
If this is somehow possible I could easily split the code bases apart and simplify the join.
Question 2:
Any other idea on how this can be solved more elegantly?
Your overall design makes sense.
About your join semantics: I guess you need to stick with the Processor API, as a regular KTable join cannot provide what you want. It is also not possible to hook into the compaction process.
However, Kafka Streams also supports in-memory state stores: https://kafka.apache.org/documentation/streams/developer-guide/processor-api.html#state-stores
RocksDB is used by default to allow the state to be larger than the available main memory. Spilling to disk with RocksDB is not about reliability (the changelog topic provides fault tolerance); however, it has the advantage that stores can be recreated more quickly if an instance comes back online on the same machine, as it is not required to re-read the whole changelog topic.
Whether you want to split the app into two is your own decision, and depends on how many resources you are willing to provide.
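A sketch of declaring such an in-memory store with the Processor API; the store name, serdes, and processor name are placeholders:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

// In-memory instead of RocksDB: the changelog topic still provides fault
// tolerance, but the state must fit into the heap.
StoreBuilder<KeyValueStore<String, String>> storeBuilder =
        Stores.keyValueStoreBuilder(
                Stores.inMemoryKeyValueStore("db-change-store"),  // placeholder
                Serdes.String(),
                Serdes.String());
// Attach it to the topology:
// topology.addStateStore(storeBuilder, "my-processor");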

Kafka Streams - all instances local store pointing to the same topic

We have the following problem:
We want to listen on a certain Kafka topic and build its "history": for a specified key, extract some data, add it to the already existing list for that key (or create a new one if it does not exist), and put it into another topic, which has only a single partition and is highly compacted. Another app can then just listen on that topic and update its own history list.
I'm thinking about how this fits with the Kafka Streams library. We can certainly use aggregation:
msgReceived.map((key, word) -> new KeyValue<>(key, word))
           .groupBy((k, v) -> k, stringSerde, stringSerde)
           .aggregate(String::new,
                      (k, v, stockTransactionCollector) -> stockTransactionCollector + "|" + v,
                      stringSerde, "summaries2")
           .to(stringSerde, stringSerde, "transaction-summary50");
which creates a local store backed by Kafka and uses it as the history table.
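For reference, a rough equivalent of this snippet in the current (post-1.0) DSL, keeping the store and topic names from above; Grouped, Materialized, and Produced replace the old serde-overload arguments:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

msgReceived
        .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
        .aggregate(String::new,
                (k, v, agg) -> agg + "|" + v,
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("summaries2")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.String()))
        .toStream()
        .to("transaction-summary50", Produced.with(Serdes.String(), Serdes.String()));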
My concern is that if we decide to scale such an app, each running instance will create a new backing topic ${applicationId}-${storeName}-changelog (I assume each app has a different applicationId). Each instance starts to consume the input topic, gets a different set of keys, and builds a different subset of the state. If Kafka decides to rebalance, some instances will start to miss some historic state in their local stores, as they get a completely new set of partitions to consume from.
The question is: if I just set the same applicationId for each running instance, will each instance eventually replay all data from the very same Kafka topic, so that each running instance has the same local state?
Why would you create multiple apps with different IDs to perform the same job? The way Kafka Streams achieves parallelism is through tasks:
An application’s processor topology is scaled by breaking it into multiple tasks.
More specifically, Kafka Streams creates a fixed number of tasks based on the input stream partitions for the application, with each task assigned a list of partitions from the input streams (i.e., Kafka topics). The assignment of partitions to tasks never changes so that each task is a fixed unit of parallelism of the application.
Tasks can then instantiate their own processor topology based on the assigned partitions; they also maintain a buffer for each of its assigned partitions and process messages one-at-a-time from these record buffers. As a result stream tasks can be processed independently and in parallel without manual intervention.
If you need to scale your app, you can start new instances running the same app (same application ID), and some of the already assigned tasks will be reassigned to the new instance. The migration of the local state stores is handled automatically by the library:
When the re-assignment occurs, some partitions – and hence their corresponding tasks including any local state stores – will be “migrated” from the existing threads to the newly added threads. As a result, Kafka Streams has effectively rebalanced the workload among instances of the application at the granularity of Kafka topic partitions.
I recommend having a look at this guide.
My concern is that if we decide to scale such an app, each running instance will create a new backing topic ${applicationId}-${storeName}-changelog (I assume each app has a different applicationId). Each instance starts to consume the input topic, gets a different set of keys, and builds a different subset of the state. If Kafka decides to rebalance, some instances will start to miss some historic state in their local stores, as they get a completely new set of partitions to consume from.
Some assumptions are not correct:
if you run multiple instances of your application to scale it, all of them must have the same application ID (cf. Kafka's consumer group management protocol) -- otherwise, load will not be shared, because each instance will be considered its own application and each instance will get all partitions assigned.
Thus, if all instances use the same application ID, all running application instances will use the same changelog topic name, and what you intend to do should work out of the box.
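As a sketch, scaling out then just means starting more instances that share the same application.id; the ID and server below are placeholders:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

// Every instance must be started with the SAME application.id; Kafka's
// consumer group protocol then splits the input partitions (and with them
// the state store shards and the shared changelog topic) among instances.
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "history-builder");   // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder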