Mix of state stores and partitions across Kafka Streams instances - apache-kafka

I built a Kafka Streams application with a state store. Now I am trying to scale this application. When I run the application on three different servers, Kafka distributes the partitions and state stores seemingly at random.
For example:
Instance1 gets: partition-0, partition-1
Instance2 gets: partition-2, stateStore-repartition-0
Instance3 gets: stateStore-repartition-1, stateStore-repartition-2
I want to assign one stateStore and one partition per instance. What am I doing wrong?
My KafkaStreams Config:
final Properties properties = new Properties();
properties.setProperty(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
properties.setProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS_CONFIG);
try {
    properties.setProperty(StreamsConfig.STATE_DIR_CONFIG,
            Files.createTempDirectory(stateStoreName).toAbsolutePath().toString());
} catch (final IOException e) {
    // use the default one
}
And my stream is:
stream.groupByKey()
      .windowedBy(TimeWindows.of(timeWindowDuration))
      .<TradeStats>aggregate(
          () -> new TradeStats(),
          (k, v, tradestats) -> tradestats.add(v),
          Materialized.<String, TradeStats, WindowStore<Bytes, byte[]>>as(stateStoreName)
                      .withValueSerde(new TradeStatsSerde()))
      .toStream();

From what I can see so far (as mentioned in my comment to your question: please share your state store definition), everything is fine, and I suspect a slight misconception on your side regarding the question
What am I doing wrong?
Basically, nothing. :-)
For the partition part of your question: the partitions get distributed across the consumers according to the configured partition assignor (see https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/CooperativeStickyAssignor.html or the related interfaces).
For the state store part of your question: maybe here lies a little misconception about how state stores work. They are usually backed by a Kafka changelog topic, which does not reside on your application host(s) but in the Kafka cluster itself. To be more precise, a part of the whole state store lives in a local (RocksDB or in-memory) key/value store on each of your application hosts, exactly as the state store assignment in your question shows. However, these are only parts, or shards, of the complete state store, which is maintained in the Kafka cluster.
So in a nutshell: everything is fine, let Kafka do the assignment job and interfere only if you have really special use cases or good reasons. :-) Kafka also ensures correct redundancy and rebalancing of all partitions in case your application hosts go down.
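If you just want to see which tasks (and therefore which partitions and store shards) ended up on each instance, a minimal sketch along the following lines can help. It assumes a started KafkaStreams instance called streams; the accessor names below are from the 2.x client (around 3.0 they were renamed to metadataForLocalThreads()):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.processor.ThreadMetadata;

// Print the task assignment of this instance (sketch, 2.x API).
static void printAssignment(final KafkaStreams streams) {
    for (final ThreadMetadata thread : streams.localThreadsMetadata()) {
        System.out.println("Thread " + thread.threadName()
                + " -> active tasks: " + thread.activeTasks());
    }
}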
If you still want to control the assignment yourself, the use case would be interesting to know for further help.

Related

Kafka Stateful Stream processor with statestore: Behind the scenes

I am trying to understand Stateful Stream processor.
As I understand it, this type of stream processor maintains some sort of state using a state store.
I came to know that one of the ways to implement a state store is RocksDB. Assume the following topology (with only one processor being stateful):
A -> B -> C; processor B is stateful with a local state store and changelog enabled. I am using the low-level API.
Assume the stream processor listens on a single Kafka topic, say topic-1 with 10 partitions.
I observed that when the application is started (2 instances on different physical machines with num.stream.threads = 5), it creates a directory structure for the state store that looks like this:
0_0, 0_1, 0_2 ... 0_9 (each machine has five, so ten directories in total).
I was going through some online material that said we should create a StoreBuilder and attach it to the topology using addStateStore() instead of creating a state store within a processor.
Like:
topology.addStateStore(storeBuilder, "processorName")
Ref also: org.apache.kafka.streams.state.Store
I don't understand the difference between attaching a StoreBuilder to the topology and creating a state store within the processor. What is the difference between them?
The second part: for the state store it creates directories like 0_0, 0_1, etc. Who creates them and how? Is there some sort of 1:1 mapping between the Kafka topics (which the stream processor listens to) and the number of directories that get created for the state store?
I don't understand the difference between attaching a StoreBuilder to the topology and creating a state store within the processor. What is the difference between them?
In order to let Kafka Streams manage the store for you (fault-tolerance, migration), Kafka Streams needs to be aware of the store. Thus, you give Kafka Streams a StoreBuilder and Kafka Streams creates and manages the store for you.
If you just create a store inside your processor, Kafka Streams is not aware of the store and the store won't be fault-tolerant.
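A minimal sketch of the "attach a StoreBuilder to the topology" variant, using the older low-level Processor API; the store name "my-store" and the node names are made up for the example:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

// The processor looks the managed store up by name instead of creating one itself.
class CountingProcessor implements Processor<String, String> {
    private KeyValueStore<String, Long> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        store = (KeyValueStore<String, Long>) context.getStateStore("my-store");
    }

    @Override
    public void process(final String key, final String value) {
        final Long old = store.get(key);
        store.put(key, old == null ? 1L : old + 1L);
    }

    @Override
    public void close() {}
}

// Registering the store with the topology lets Kafka Streams manage it
// (changelog topic, restoration, migration between instances).
final StoreBuilder<KeyValueStore<String, Long>> storeBuilder =
        Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("my-store"),
                Serdes.String(),
                Serdes.Long());

final Topology topology = new Topology();
topology.addSource("Source", "topic-1");
topology.addProcessor("B", CountingProcessor::new, "Source");
topology.addStateStore(storeBuilder, "B");   // attach the store to processor "B"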
For the state store it creates directories like 0_0, 0_1, etc. Who creates them and how? Is there some sort of 1:1 mapping between the Kafka topics (which the stream processor listens to) and the number of directories that get created for the state store?
Yes, there is a mapping. The store is sharded based on the number of input topic partitions. You also get a "task" per partition, and the task directories are named y_z with y being the sub-topology number and z being the partition number. For your simple topology you only have one sub-topology, so all the directories you see have the same 0_ prefix.
Hence, your logical store has 10 physical shards. This sharding allows Kafka Streams to migrate state when the corresponding input topic partition is assigned to a different instance. Overall, you can run up to 10 instances, each processing one partition and hosting one shard of your store.

Correlating in Kafka and dynamic topics

I am building a correlated system using Kafka. Suppose there's a service A that performs data processing, and there are thousands of clients B that submit jobs to it. Bs are short-lived: they appear on the network, push their data to A, and then two important things happen:
B will immediately receive a status from A;
B then will either drop out completely, stay online to receive further updates on the status, or sporadically pop back on to check the status.
(This is not dissimilar to grid computing or MPI.)
Both points should be achieved using the well-known concept of a correlationId: B possesses a unique id (a UUID in my case), which it sends to A in the headers; A, in turn, uses it as the reply-to topic to send status updates to. This means topics have to be created on the fly; they can't be predetermined.
I have auto.create.topics.enable switched on, and it does create topics dynamically, but existing consumers are not aware of them and need to be restarted [to fetch topic metadata, I suppose, if I understood the docs right]. I also checked the consumer's metadata.max.age.ms setting, but it doesn't seem to help, even if I set it to a very low value.
As far as I've read, this is still unanswered (e.g. kafka filtering/Dynamic topic creation, kafka consumer to dynamically detect topics added, Can a Kafka producer create topics and partitions?) or answered unsatisfactorily.
As there are hundreds of As and thousands of Bs, I can't possibly use shared topics or anything like that, lest I overload my network. I could use Kafka's admin tools, or whatever they're called, to pre-create topics, but I find that somewhat silly (even though I've seen real-life examples of people using them to talk to the ZooKeeper and Kafka infrastructure itself).
So the question is: is there a way to dynamically create Kafka topics such that both consumer and producer become aware of them without being restarted? And, in the worst case, will the admin tools really help, and on which side must I use them, A or B?
Kafka 0.11, Java 8
UPDATE
Creating topics with AdminClient doesn't help for whatever reason; consumers still throw LEADER_NOT_AVAILABLE when I try to subscribe.
OK, so I'll answer my own question.
Creating topics with AdminClient works only if performed before corresponding consumers are created.
I changed my topology, taking 1) into account and introducing the exchange of correlation IDs in message headers (same as in JMS). I also had to implement certain topology management methodologies, grouping Bs into containers.
It should be noted that, as many people have said, this only works when Bs are in single-consumer groups and listen to topics with 1 partition.
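For reference, a minimal sketch of pre-creating the reply-to topic with AdminClient before the corresponding consumer subscribes; the bootstrap address, topic naming scheme, and partition/replication settings are placeholders:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

static void createReplyTopic(final String correlationId) throws Exception {
    final Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
    try (final AdminClient admin = AdminClient.create(props)) {
        // Single-partition reply-to topic named after the correlation id.
        final NewTopic replyTopic = new NewTopic("reply-" + correlationId, 1, (short) 1);
        admin.createTopics(Collections.singleton(replyTopic)).all().get(); // block until created
    }
    // Only after this returns, create/subscribe the consumer for the reply topic.
}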
To get some idea of the work I'm doing, you might have a look at the middleware framework I've been working on: https://github.com/ikonkere/magic.
Creating an unbounded number of topics is not recommended. I'd advise redesigning your topology/system.
I've thought of creating dynamic topics myself but then realized that eventually ZooKeeper will fail as it runs out of memory due to stale topics (imagine how many topics could be created a year from now). Maybe this could work if you make sure there is some upper bound on the topics ever created. Overall, an administrative headache.
If you look up using Kafka for request-response, you will find that others also say it is awkward to do (Does Kafka support request response messaging).

What are the side effects of using Apache Kafka as a key/value store?

I know Kafka is not a k/v store, but bear with me. Suppose that it's roughly implemented using the k/v API below. Each key is a topic, and the current "value" of the key is the last message written to the topic:
put(key, value) --> publish(topic=key, message=value)
get(key) --> consume(topic=key, offset = last_offset - 1)
Furthermore, suppose that the state is replicated between different Kafka clusters (using MirrorMaker bidirectionally), so as to allow users to read/write to the closer datacenter to reduce latency.
I already know some of the obvious side effects of doing this. For instance:
Since a "key" maps to a topic, you can only have 1 partition in order to guarantee ordering (because you want the last value put to always be at the end of the log).
The retention policy needs to be considered, because the last message in the log could be deleted
If you do a put(key, value) to the cluster closest to you, even though that is technically the most recent put on that key, MirrorMaker (due to latency) may publish an outdated key from another cluster, overwriting your most recent put value
The main concerns here are latency, especially between different clusters. How do you think this solution holds up under a stressful workload (say, thousands of writes per second on a given key/topic) and stressful network conditions, compared to a traditional k/v solution such as Redis, memcached, or etcd?
Thoughts?
Thank you kindly.
Kafka can work as a KV event store; in fact, there is an improvement already implemented: https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams
And here are a couple of links with more examples of how to use Kafka Streams to query the state stored in Kafka: https://blog.codecentric.de/en/2017/03/interactive-queries-in-apache-kafka-streams/, https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
It uses RocksDB by default but is pluggable: https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
You will have to think about how to manage the storage at the application level, but essentially your concerns are handled by the Kafka Streams API.
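As a rough illustration of that approach, one could materialize a compacted topic into a table and query it locally via interactive queries. This is only a sketch: the topic/store names are placeholders, properties is assumed to be a valid Streams configuration, and the store-lookup call differs slightly between client versions (newer ones take a StoreQueryParameters argument):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

final StreamsBuilder builder = new StreamsBuilder();
// "put" side: write to the topic; "get" side: materialize it as a named store.
builder.table("kv-topic", Materialized.as("kv-store"));

final KafkaStreams streams = new KafkaStreams(builder.build(), properties);
streams.start();

// get(key): query the local shard of the store.
final ReadOnlyKeyValueStore<Object, Object> store =
        streams.store("kv-store", QueryableStoreTypes.keyValueStore());
final Object value = store.get("some-key");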
Hope this helps.

Kafka Streams - all instances local store pointing to the same topic

We have the following problem:
We want to listen on a certain Kafka topic and build its "history": for a specified key, extract some data, add it to the already existing list for that key (or create a new one if it does not exist), and put it to another topic, which has only a single partition and is highly compacted. Another app can just listen on that topic and update its history list.
I'm wondering how this fits with the Kafka Streams library. We can certainly use aggregation:
msgReceived.map((key, word) -> new KeyValue<>(key, word))
           .groupBy((k, v) -> k, stringSerde, stringSerde)
           .aggregate(String::new,
               (k, v, stockTransactionCollector) -> stockTransactionCollector + "|" + v,
               stringSerde, "summaries2")
           .to(stringSerde, stringSerde, "transaction-summary50");
which creates a local store backed by Kafka and uses it as the history table.
My concern is that if we decide to scale such an app, each running instance will create a new backing topic ${applicationId}-${storeName}-changelog (I assume each app has a different applicationId). Each instance starts to consume the input topic, gets a different set of keys, and builds a different subset of the state. If Kafka decides to rebalance, some instances will start to miss some historic state in their local store as they get a completely new set of partitions to consume from.
The question is: if I just set the same applicationId for each running instance, should it eventually replay all the data from the very same Kafka topic so that each running instance has the same local state?
Why would you create multiple apps with different IDs to perform the same job? The way Kafka Streams achieves parallelism is through tasks:
An application’s processor topology is scaled by breaking it into multiple tasks.
More specifically, Kafka Streams creates a fixed number of tasks based on the input stream partitions for the application, with each task assigned a list of partitions from the input streams (i.e., Kafka topics). The assignment of partitions to tasks never changes so that each task is a fixed unit of parallelism of the application.
Tasks can then instantiate their own processor topology based on the assigned partitions; they also maintain a buffer for each of its assigned partitions and process messages one-at-a-time from these record buffers. As a result stream tasks can be processed independently and in parallel without manual intervention.
If you need to scale your app, you can start new instances running the same app (same application ID), and some of the already assigned tasks will be reassigned to the new instance. The migration of the local state stores is handled automatically by the library:
When the re-assignment occurs, some partitions – and hence their corresponding tasks including any local state stores – will be “migrated” from the existing threads to the newly added threads. As a result, Kafka Streams has effectively rebalanced the workload among instances of the application at the granularity of Kafka topic partitions.
I recommend you have a look at this guide.
My concern is that if we decide to scale such an app, each running instance will create a new backing topic ${applicationId}-${storeName}-changelog (I assume each app has a different applicationId). Each instance starts to consume the input topic, gets a different set of keys, and builds a different subset of the state. If Kafka decides to rebalance, some instances will start to miss some historic state in their local store as they get a completely new set of partitions to consume from.
Some assumptions are not correct:
If you run multiple instances of your application to scale it, all of them must have the same application ID (cf. Kafka's consumer group management protocol) -- otherwise, the load will not be shared because each instance will be considered its own application, and each instance will get all partitions assigned.
Thus, if all instances use the same application ID, all running application instances will use the same changelog topic name, and what you intend to do should work out of the box.
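As a concrete illustration, all instances would share the exact same Streams configuration, so they form one consumer group and share one changelog topic per store. A minimal sketch with placeholder values (with the store named "summaries2" as in your snippet, the changelog would be named "history-builder-summaries2-changelog"):

import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

final Properties props = new Properties();
// Identical on every instance -- this is what makes them one application.
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "history-builder");  // placeholder id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // placeholder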

Is Kafka Stream StateStore global over all instances or just local?

In the Kafka Streams WordCount example, a StateStore is used to store the word counts. If there are multiple instances in the same consumer group, is the StateStore global to the group, or just local to a consumer instance?
Thanks
This depends on your view on a state store.
In Kafka Streams the state is sharded, and thus each instance holds part of the overall application state. For example, the DSL's stateful operators use a local RocksDB instance to hold their shard of the state. Thus, in this regard, the state is local.
On the other hand, all changes to the state are written into a Kafka topic. This topic does not "live" on the application host but in the Kafka cluster, consists of multiple partitions, and can be replicated. In case of an error, this changelog topic is used to recreate the state of the failed instance in another still-running instance. Thus, as the changelog is accessible by all application instances, it can be considered global, too.
Keep in mind that the changelog is the source of truth for the application state, and the local stores are basically caches of shards of the state.
Moreover, in the WordCount example, a record stream (the data stream) gets partitioned by words, such that the count of one word will be maintained by a single instance (and different instances maintain the counts for different words).
For an architectural overview, I recommend http://docs.confluent.io/current/streams/architecture.html
Also this blog post should be interesting http://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
It is worth mentioning that there is a GlobalKTable improvement proposal:
GlobalKTable will be fully replicated once per KafkaStreams instance. That is, each KafkaStreams instance will consume all partitions of the corresponding topic.
From the Confluent Platform's mailing list, I've got this information:
You could start prototyping using Kafka 0.10.2 (or trunk) branch... 0.10.2-rc0 already has GlobalKTable!
Here's the actual PR.
And the person who told me that was Matthias J. Sax ;)
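For completeness, a minimal sketch of declaring a GlobalKTable and joining a stream against it with the current DSL; the topic names and the join logic are placeholders:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

final StreamsBuilder builder = new StreamsBuilder();

// Fully replicated on every KafkaStreams instance, as described above.
final GlobalKTable<String, String> lookup = builder.globalTable("lookup-topic");

final KStream<String, String> input = builder.stream("input-topic");
input.join(lookup,
           (key, value) -> key,                             // map each record to a table key
           (value, tableValue) -> value + "|" + tableValue) // combine stream and table values
     .to("output-topic");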
Use a Processor instead of a Transformer for all the transformations you want to perform on the input topic whenever there is a use case of looking up data from a GlobalStateStore. Use context.forward(key, value, childName) to send the data to the downstream nodes. context.forward(key, value, childName) may be called multiple times in process() and punctuate(), so as to send multiple records to a downstream node. If there is a requirement to update the GlobalStateStore, do this only in the Processor passed to addGlobalStore(..), because there is a GlobalStreamThread associated with the GlobalStateStore, which keeps the state of the store consistent across all the running KafkaStreams instances.
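A rough sketch of the "update the GlobalStateStore only in the processor passed to addGlobalStore(..)" part, using the older Processor API; the store, topic, and node names are examples:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

// The only place the global store is written to; the GlobalStreamThread replays
// "global-topic" through this processor on every running instance.
class GlobalStoreUpdater implements Processor<String, String> {
    private KeyValueStore<String, String> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        store = (KeyValueStore<String, String>) context.getStateStore("global-store");
    }

    @Override
    public void process(final String key, final String value) {
        store.put(key, value);
    }

    @Override
    public void close() {}
}

final Topology topology = new Topology();
topology.addGlobalStore(
        Stores.keyValueStoreBuilder(
                Stores.inMemoryKeyValueStore("global-store"),
                Serdes.String(), Serdes.String())
              .withLoggingDisabled(),                  // global stores must disable logging
        "GlobalSource",
        Serdes.String().deserializer(),
        Serdes.String().deserializer(),
        "global-topic",
        "GlobalUpdater",
        GlobalStoreUpdater::new);

Any other processor in the topology can then read the store via context.getStateStore("global-store") and forward enriched records downstream with context.forward(...).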