On a Kafka broker, it's recommended to use multiple drives for the message logs to improve throughput. That's why brokers have a log.dirs property that can list multiple directories, which are assigned to partitions in a round-robin fashion.
We have many installations that we already set up this way for event-driven Kafka applications, typically 4 nodes with 5 disks each.
Now we want to use Kafka Streams with a key-value store where we persist computed data for fast range queries. We see that Kafka Streams maps partitions one-to-one to state store shards, and creates a separate subdirectory for each one.
However, we can't configure how to spread those subdirectories across different disks. We can only configure a single parent directory as 'state.dir' (StreamsConfig.STATE_DIR_CONFIG).
Is there a configuration I am missing? Or is having multiple disks not so relevant for Kafka Streams?
There is no such configuration in Kafka Streams; spreading the state across multiple disks must be handled at the OS level, via RAID configurations for example.
Or you can implement the StateStore interface and write your own provider that can use multiple disks (or remote distributed filesystems).
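For the RAID route, the configuration itself stays trivial: you point the single state directory at the striped volume and let the OS spread the I/O. A minimal sketch, assuming a hypothetical mount point /mnt/raid0 (the path is an example, not from the question):

```properties
# Kafka Streams accepts only one state directory, so let the
# RAID layer stripe it across disks (mount point is illustrative):
state.dir=/mnt/raid0/kafka-streams
```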
We are thinking about using the Strimzi Kafka-Bridge (https://strimzi.io/docs/bridge/latest/#proc-creating-kafka-bridge-consumer-bridge) as an HTTP(S) gateway to an existing Kafka cluster.
The documentation mentions the creation of consumers using arbitrary names for taking part in a consumer group. These names can subsequently be used to consume messages, seek, sync offsets, and so on.
The question is: Am I right in assuming the following?
The bridge consumers seem to be created and maintained in just one Kafka-Bridge instance.
If I want to use more than one bridge because of fault-tolerance-requirements, the name-information about a specific consumer will not be available on the other nodes, since there is no synchronization or common storage between the bridge-nodes.
So if the clients of the Kafka-Bridge are not sticky, as soon as one communicates with another node (e.g. because of round-robin handling by a load balancer), the consumer information will not be available there, and the HTTP(S) clients must be prepared to reconfigure the consumers on the new node they are communicating with.
The offsets will be lost. In the worst case, fetching messages and syncing their offsets will always happen on different nodes.
Or did I overlook anything?
You are right. The state and the Kafka connections are currently not shared in any way between the bridge instances. The general recommendation is that when using consumers, you should run the bridge with a single replica only (and, if needed, deploy different bridge instances for different consumer groups).
I am trying to understand Stateful Stream processor.
As I understand in this type of stream-processor, it maintains some sort of state using State Store.
I came to know, one of the ways to implement State Store is using RocksDB. Assuming the following topology (and only one processor being stateful)
A->B->C ; processor B as stateful with local state store and changelog enabled. I am using low level API.
Assuming the stream processor listens on a single Kafka topic, say topic-1 with 10 partitions.
I observed that when the application is started (2 instances on different physical machines and num.stream.threads = 5), it creates a directory structure for the state store like:
0_0, 0_1, 0_2, ..., 0_9 (each machine has five, so 10 directories in total, one per partition).
I was going through some online material which said we should create a StoreBuilder and attach it to the topology using addStateStore() instead of creating a state store within a processor.
Like:
topology.addStateStore(storeBuilder,"processorName")
Ref also: org.apache.kafka.streams.state.Store
I don't understand the difference between attaching a StoreBuilder to the topology versus actually creating a state store within a processor. What is the difference between them?
The second part: for the state store it creates directories like 0_0, 0_1, etc. Who creates them, and how? Is there some sort of 1:1 mapping between the Kafka topics (on which the stream processor is listening) and the number of directories that get created for the state store?
I don't understand the difference between attaching a StoreBuilder to the topology versus actually creating a state store within a processor. What is the difference between them?
In order to let Kafka Streams manage the store for you (fault-tolerance, migration), Kafka Streams needs to be aware of the store. Thus, you give Kafka Streams a StoreBuilder and Kafka Streams creates and manages the store for you.
If you just create a store inside your processor, Kafka Streams is not aware of the store and the store won't be fault-tolerant.
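A sketch of the managed approach with the Processor API (the store name "my-store" and the MyProcessor class are placeholders, not from the original question):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

// Build a persistent key-value store description and hand it to the
// topology, so Kafka Streams creates and manages the store itself
// (changelog topic, restoration, migration between instances):
StoreBuilder<KeyValueStore<String, String>> storeBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("my-store"),  // placeholder name
        Serdes.String(),
        Serdes.String());

Topology topology = new Topology();
topology.addSource("source", "topic-1");
topology.addProcessor("processorName", MyProcessor::new, "source");
topology.addStateStore(storeBuilder, "processorName");
```

Inside MyProcessor.init() you would then retrieve the managed store via context.getStateStore("my-store") rather than constructing a store object yourself.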
For the state store it creates directories like 0_0, 0_1, etc. Who creates them, and how? Is there some sort of 1:1 mapping between the Kafka topics (on which the stream processor is listening) and the number of directories that get created for the state store?
Yes, there is a mapping. The store is sharded based on the number of input topic partitions. You also get one "task" per partition, and the task directories are named y_z, with y being the sub-topology number and z being the partition number. Because your simple topology has only one sub-topology, all the directories you see have the same 0_ prefix.
Hence, your logical store has 10 physical shards. This sharding allows Kafka Streams to migrate state when the corresponding input topic partition is assigned to a different instance. Overall, you can run up to 10 instances, and each would process one partition and host one shard of your store.
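The y_z naming scheme can be sketched in plain Java (one sub-topology with id 0 and 10 input partitions, as in your setup):

```java
import java.util.ArrayList;
import java.util.List;

public class TaskDirs {
    // Task directories are named <sub-topology-id>_<partition-id>;
    // one task (and one store shard) exists per input partition.
    static List<String> taskDirs(int subTopologyId, int numPartitions) {
        List<String> dirs = new ArrayList<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            dirs.add(subTopologyId + "_" + partition);
        }
        return dirs;
    }

    public static void main(String[] args) {
        System.out.println(taskDirs(0, 10));
        // prints [0_0, 0_1, 0_2, 0_3, 0_4, 0_5, 0_6, 0_7, 0_8, 0_9]
    }
}
```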
I have a Spring Boot application that uses the Processor API to generate a Topology, and it also adds a global state store (addGlobalStateStore) to the same topology.
I want to create another topology (and hence another KafkaStreams) for reading from another set of topics and want to share the previously created store in the new topology. By share I mean that the underlying state store should be the same for both topologies. Any data written from one topology should be visible in the other.
Is that possible without writing wrapper endpoints (e.g. REST calls) to access the state store?
Or does my use case need an external state store, e.g. Redis?
No, you can't share state stores across topologies. Instead, if possible, you can break your topologies down into sub-topologies of a single topology, which will make the store available to all the processors defined there.
If that is not possible for you, you can use external storage.
According to Stream Partitions and Tasks:
Sub-topologies (also called sub-graphs): If there are multiple processor topologies specified in a Kafka Streams application, each task only instantiates one of the topologies for processing. In addition, a single processor topology may be decomposed into independent sub-topologies (or sub-graphs). A sub-topology is a set of processors, that are all transitively connected as parent/child or via state stores in the topology. Hence, different sub-topologies exchange data via topics and don't share any state stores. Each task may instantiate only one such sub-topology for processing. This further scales out the computational workload to multiple tasks.
This means that sub-topologies (hence topologies too) can't share any state stores.
Solution for your scenario:
Create a single KafkaStreams instance whose topology contains everything you would otherwise put into your 2 distinct topologies. Because both parts use the same store, they are transitively connected, so there will be no separate sub-topologies: the whole thing forms a single sub-topology, and hence a single task. This is the main drawback: the topology can't be split into sub-topologies to be run by multiple threads. It doesn't mean the application as a whole can't use multiple threads; that still depends on the chosen parallelism (num.stream.threads).
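Concretely, the merged-topology approach could look something like this (topic names, processor classes, the shared StoreBuilder, and props are all placeholders for your own code):

```java
// Both formerly separate topologies become branches of one Topology;
// attaching the same store to both processors makes them transitively
// connected, so they end up in the same sub-topology:
Topology topology = new Topology();
topology.addSource("source-1", "input-topic-1");
topology.addSource("source-2", "input-topic-2");
topology.addProcessor("proc-1", ProcessorA::new, "source-1");
topology.addProcessor("proc-2", ProcessorB::new, "source-2");
topology.addStateStore(sharedStoreBuilder, "proc-1", "proc-2");

KafkaStreams streams = new KafkaStreams(topology, props);
streams.start();
```

Writes from either processor then land in the same underlying store, which both can read.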
I am using the Kafka Processor API, and I create a state store from a topic with 3 partitions (I have 3 brokers). I have 1 instance of the streams application. I would like to know: when I access the local state store, can I get all keys? Why do certain keys work while others don't? Is that normal?
Thank you
The number of application instances does not matter for this case. Because the input topic has 3 partitions, the state store is created with 3 shards. Processing happens with 3 parallel tasks. Each task instantiates a copy of your topology, processes one input topic partition, and uses one shard.
Compare: https://kafka.apache.org/21/documentation/streams/architecture
If you want to access different shards, you can use the "Interactive Queries" feature for key/value lookups (and key-range queries) over all shards.
Also, there is the notion of a global state store, which would load the data from all partitions into a single store (no sharding). However, it provides different semantics compared to "regular" stores, because store updates are not time-synchronized with the other processing.
I have 4 machines on which a Kafka cluster is configured, with a topology in which each machine runs one ZooKeeper node and two brokers.
With this configuration, what do you advise as the maximum number of topics and partitions for best performance?
Replication factor: 3
Using Kafka 0.10.XX
Thanks!
Each topic is restricted to 100,000 partitions no matter how many nodes (as of July 2017)
As for the number of topics, that depends on how much RAM the smallest machine has. This is because ZooKeeper keeps everything in memory for quick access (it also doesn't shard the znodes, it just replicates them across the ZK nodes upon write). This effectively means that once you exhaust one machine's memory, ZK will fail to add more topics. You will most likely run out of file handles before reaching this limit on the Kafka broker nodes.
To quote the KAFKA docs on their site (6.1 Basic Kafka Operations https://kafka.apache.org/documentation/#basic_ops_add_topic):
Each sharded partition log is placed into its own folder under the Kafka log directory. The name of such folders consists of the topic name, appended by a dash (-) and the partition id. Since a typical folder name can not be over 255 characters long, there will be a limitation on the length of topic names. We assume the number of partitions will not ever be above 100,000. Therefore, topic names cannot be longer than 249 characters. This leaves just enough room in the folder name for a dash and a potentially 5 digit long partition id.
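The 249-character limit quoted above falls out of simple arithmetic (a sketch, not official Kafka code):

```java
public class TopicNameLimit {
    // A partition folder is named "<topic>-<partitionId>". Folder names
    // are capped at 255 characters, and partition ids are assumed to
    // stay below 100,000 (at most 5 digits), so the topic name may use:
    static int maxTopicNameLength() {
        int maxFolderName = 255;
        int dash = 1;              // the "-" separator
        int maxPartitionDigits = 5; // up to 99,999
        return maxFolderName - dash - maxPartitionDigits;
    }

    public static void main(String[] args) {
        System.out.println(maxTopicNameLength()); // prints 249
    }
}
```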
To quote the Zookeeper docs (https://zookeeper.apache.org/doc/trunk/zookeeperOver.html):
The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.
Performance:
Depending on your publishing and consumption semantics, the appropriate number of topics and partitions will change. The following is a set of questions you should ask yourself to gain insight into a potential solution (your question is very open-ended):
Is the data I am publishing mission critical (i.e. cannot lose it, must be sure I published it, must have exactly once consumption)?
Should I make the producer.send() call as synchronous as possible or continue to use the asynchronous method with batching (do I trade-off publishing guarantees for speed)?
Are the messages I am publishing dependent on one another? Does message A have to be consumed before message B (implies A published before B)?
How do I choose which partition to send my message to?
Should I: assign the message to a partition (extra producer logic), let the cluster decide in a round-robin fashion, or assign a key that will hash to one of the partitions for the topic (need to come up with an evenly distributed hash to get good load balancing across partitions)?
How many topics should you have? How is this connected to the semantics of your data? Will auto-creating topics for many distinct logical data domains be efficient (think of the effect on Zookeeper and administrative pain to delete stale topics)?
Partitions provide parallelism (more consumers possible) and possibly increased positive load balancing effects (if producer publishes correctly). Would you want to assign parts of your problem domain elements to specific partitions (when publishing send data for client A to partition 1)? What side-effects does this have (think refactorability and maintainability)?
Will you want to make more partitions than you need so you can scale up if needed with more brokers/consumers? How realistic is automatic scaling of a KAFKA cluster given your expertise? Will this be done manually? Is manual scaling viable for your problem domain (are you building KAFKA around a fixed system with well known characteristics or are you required to be able to handle severe spikes in messages)?
How will my consumers subscribe to topics? Will they use pre-configured configurations or use a regex to consume many topics? Are the messages between topics dependent or prioritized (need extra logic on consumer to implement priority)?
Should you use different network interfaces for replication between brokers (i.e. port 9092 for producers/consumers and 9093 for replication traffic)?
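On the partition-selection question above: for keyed messages, the default producer behaviour boils down to hashing the key modulo the partition count. A simplified sketch of the principle (Kafka's DefaultPartitioner actually applies murmur2 to the serialized key bytes, not hashCode(); the key "client-A" is just an example):

```java
public class PartitionSketch {
    // Simplified key-to-partition mapping, for illustration only.
    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result is non-negative,
        // mirroring what Kafka does with its murmur2 hash value.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always lands on the same partition:
        System.out.println(partitionFor("client-A", 10));
    }
}
```

The important property is determinism: all messages with the same key go to the same partition, which preserves per-key ordering.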
Good Links:
http://cloudurable.com/ppt/4-kafka-detailed-architecture.pdf
https://www.slideshare.net/ToddPalino/putting-kafka-into-overdrive
https://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844
https://kafka.apache.org/documentation/