Kafka version: 1.0.0
Let's say the streams application uses the low-level Processor API, maintains state, and reads from a topic with 10 partitions. Please clarify whether the internal topic is expected to be created with the same number of partitions, or per the broker default. If it's the latter, and we need to increase the partitions of the internal topic, is there any option?
Kafka Streams will create the topic for you. And yes, it will create it with the same number of partitions as your input topic. During startup, Kafka Streams also checks if the topic has the expected number of partitions and fails if not.
The internal topic is basically a regular topic like any other, and you can change its number of partitions via the command line tools, just as for any other topic. However, this should never be required. Also note that dropping/adding partitions will mess up your state.
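If you ever do need to grow a topic (again, this should not normally be necessary for internal topics, and it will invalidate existing state), here is a minimal sketch using the AdminClient that ships with Kafka 1.0; the topic name and broker address below are placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreasePartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // "my-app-store-changelog" is a hypothetical internal topic name.
            // The partition count can only be increased, never decreased.
            admin.createPartitions(Collections.singletonMap(
                    "my-app-store-changelog", NewPartitions.increaseTo(20)))
                 .all().get(); // block until the operation completes
        }
    }
}
```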
I am studying Kafka streams, KTable, GlobalKTable, etc., and I am now confused about them.
What exactly is GlobalKTable?
But overall, if I have a topic with N partitions and one Kafka Streams application, after I send some data to the topic, how many streams (partitions?) will I have?
I did some tests and noticed that the mapping is 1:1. But what if I replicate the topic over different brokers?
Thank you all
I'll try to answer your questions as you have them listed here.
A GlobalKTable has all partitions available in each instance of your Kafka Streams application, whereas a KTable is partitioned over all of the instances of your application. In other words, all instances of your Kafka Streams application have access to all records in the GlobalKTable; hence it is used for more static data and for looking up records in joins.
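For illustration, here is a minimal sketch of both table types in the Streams DSL (topic names and join logic are made up):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class TableExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // GlobalKTable: fully replicated, every instance sees all records.
        GlobalKTable<String, String> users = builder.globalTable("users");

        // KTable: sharded, each instance only holds its own partitions.
        KTable<String, String> counts = builder.table("counts");

        KStream<String, String> clicks = builder.stream("clicks");

        // Stream-GlobalKTable join: works without repartitioning because
        // the lookup data is available locally on every instance.
        clicks.join(users,
                    (clickKey, clickValue) -> clickKey, // map record to table key
                    (click, user) -> user + " clicked " + click)
              .to("enriched-clicks");
    }
}
```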
As for a topic with N partitions: if you have one Kafka Streams application, it will consume and work with all records from the input topic. If you were to spin up another instance of your streams application, then each instance would process half of the partitions, giving you higher throughput due to the parallelization of the work.
For example, if you have input topic A with four partitions and one Kafka Streams application, then the single application processes all records. But if you were to launch two instances of the same Kafka Streams application, then each instance would process records from 2 partitions; the workload is split across all running instances with the same application-id.
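In code, scaling out means nothing more than starting the same program again with the same application.id; a minimal sketch (the id, topics, and broker address are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ScaleOutExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // All instances sharing this id form one group; Kafka assigns each
        // instance a subset of the input topic's partitions.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        // Start this same jar on a second machine and the 4 partitions of
        // "input-topic" are split 2/2 between the two instances.
        new KafkaStreams(builder.build(), props).start();
    }
}
```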
Topics can be replicated across different brokers in Kafka; a replication factor of 3 is common in production (the broker default is 1, controlled by default.replication.factor). A replication factor of 3 means the records for a given partition are stored on the lead broker for that partition and on two other follower brokers (assuming a three-node broker cluster).
Hope this clears things up some.
-Bill
I'm planning to do some tests with ClickHouse by ingesting my Kafka topics into a SummingMergeTree using this method: https://clickhouse.yandex/docs/en/table_engines/kafka/
For my tests on a dev environment I'm not worried about the volume, but on the production environment we are already consuming those topics, and we have to run many consumers to read messages as fast as they are pushed. My question is: is there a way in ClickHouse to have many Kafka consumers on one table with the Kafka engine?
Thanks,
Romaric
Reading the documentation, it seems that the num_consumers parameter of the Kafka engine is exactly what you need:
num_consumers – The number of consumers per table. Default: 1. Specify more consumers if the throughput of one consumer is insufficient. The total number of consumers should not exceed the number of partitions in the topic, since only one consumer can be assigned per partition.
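On the table side that could look roughly like the following sketch (the table layout, topic, and group names are made up; note that in the SETTINGS spelling of the engine parameters this option is called kafka_num_consumers):

```sql
CREATE TABLE kafka_queue (
    ts DateTime,
    metric String,
    value Float64
) ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'broker:9092',
    kafka_topic_list = 'metrics',
    kafka_group_name = 'clickhouse_metrics',
    kafka_format = 'JSONEachRow',
    kafka_num_consumers = 4;  -- must not exceed the topic's partition count
```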
I'm a newbie to Kafka. I had a glance at the Kafka documentation. It seems that dispatching messages to a subscribing consumer group is implemented by binding partitions to consumer instances.
One important thing to remember when working with Apache Kafka is that the number of consumers in the same consumer group should be less than or equal to the number of partitions in the consumed topic. Otherwise, the excess consumers will not receive any messages from the topic.
In a non-prod environment, I didn't configure the topic's partitions. In that case, is there only a single partition in Kafka? And if I start multiple consumers sharing the same group and subscribe them to the topic, would the messages always be dispatched to the same instance in the group? In other words, do I have to partition the topic to get the load-balancing feature of a consumer group?
Thanks!
You are absolutely right. One partition cannot be processed in parallel (by one consumer group). You can treat a partition as atomic; it cannot be split.
If you configure your non-prod and prod environments with the same number of partitions per topic, that should help you find the correct number of consumers and catch problems before moving to prod.
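As a minimal sketch of the load-balancing behaviour (topic, group, and broker names are placeholders): every copy of this program that shares the same group.id gets its own subset of the partitions, and with a single partition only one copy ever receives messages.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        // Run this program N times with the same group.id: the topic's
        // partitions are divided among the N instances.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```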
I have the following questions regarding topics and partitions:
1) What is the difference between n topics with m partitions and n*m topics?
Would there be a difference between accessing m partitions through m threads and n*m topics using n*m different processes?
2) A perfect use case differentiating the high-level and low-level consumer.
3) In case of a failure (i.e. a message not delivered), where can I find the error logs in Kafka?
1) What is the difference between n topics with m partitions and n*m topics?
There has to be at least one partition for every topic. A topic is just a named group of partitions, and partitions are really streams of data. Code that uses the Kafka producer is normally not concerned with partitions; it just sends a message to a topic.
By default the producer selects a partition by hashing the message's key (and falls back to round-robin when there is no key), but you can create a custom partitioner if needed and select a partition based on the message's content.
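A minimal sketch of such a custom partitioner (the routing rule is invented for illustration; the key-hashing line mirrors what the default partitioner does for keyed messages):

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Register with:
// props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, ContentBasedPartitioner.class.getName());
public class ContentBasedPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionCountForTopic(topic);
        // Made-up rule: send "urgent" payloads to partition 0, hash the rest.
        if (value != null && value.toString().startsWith("URGENT")) {
            return 0;
        }
        if (keyBytes == null) {
            return 0; // no key: pick a fixed partition for this sketch
        }
        // Same hashing the default partitioner uses for keyed messages.
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override public void configure(Map<String, ?> configs) { }

    @Override public void close() { }
}
```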
If there is only one partition, only one broker processes messages for the topic and appends them to a file.
On the other hand, if there are as many partitions as brokers, message processing is parallelized and there is up to an m-times speedup (minus overhead). That assumes each broker runs on its own box and Kafka's data storage is not shared among brokers.
If there are more partitions for a topic than brokers, Kafka tries to distribute them evenly among all of the brokers.
The same goes for reading from Kafka. If there is only one partition, the consumer's speed is limited by the max read speed of a single disk. If there are multiple partitions, messages from all partitions (on different brokers) are retrieved in parallel.
1a) Would there be a difference between accessing m partitions through m threads and n*m topics using n*m different processes?
You're mixing partitions and topics here, see my answer above.
2) A perfect use case differentiating the high-level and low-level consumer.
High-level consumer:
I just want to use Kafka as an extremely fast persistent FIFO buffer and not worry much about the details.
Low-level consumer:
I want to have custom partition-consuming logic, e.g. start reading data from newly created topics without needing to reconnect consumers to brokers.
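With the modern Java consumer, that kind of low-level control looks roughly like this sketch (manual partition assignment instead of group management; topic and broker names are placeholders):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LowLevelConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // assign() takes full manual control: no group.id, no rebalancing.
            TopicPartition tp = new TopicPartition("new-topic", 0);
            consumer.assign(Collections.singletonList(tp));
            // Choose the exact starting offset yourself.
            consumer.seekToBeginning(Collections.singletonList(tp));

            ConsumerRecords<String, String> records = consumer.poll(1000);
            System.out.println("fetched " + records.count() + " records");
        }
    }
}
```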
3) In case of a failure (i.e. a message not delivered), where can I find the error logs in Kafka?
Kafka uses log4j for logging. Where the logs end up depends on the log4j configuration (in the case of the producer and consumer).
Kafka broker logs are normally stored in /var/log/kafka/.