What is the limit on the number of partitions that a Kafka broker can support? And if there is no hard limit, how many partitions per broker will keep my cluster running well and fast?
According to recent blog posts, 200K+ partitions per cluster, but that of course depends heavily on the actual hardware you run and how well you can maintain Kafka at that scale. I doubt a Raspberry Pi would be able to handle that much load, for example.
There isn't exactly a hard limit per broker, but the old rule of thumb was that fewer than 1,000 partitions on average per broker would keep the cluster working optimally.
That blog says 4,000 is now a good number:
"As a rule of thumb, we recommend each broker to have up to 4,000 partitions and each cluster to have up to 200,000 partitions."
According to the Apache blog, a Kafka broker can support up to 4,000 partitions, and each cluster up to 200,000 partitions.
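If you want to see where your cluster stands against those numbers, here is a rough sketch using the Java AdminClient (the bootstrap address is a placeholder; note the per-broker guideline counts partition replicas, not just leaders):

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListTopicsOptions;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class PartitionBudget {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // List every topic (including internal ones) and describe it.
            Set<String> names = admin.listTopics(new ListTopicsOptions().listInternal(true)).names().get();
            Map<String, TopicDescription> topics = admin.describeTopics(names).all().get();

            int partitions = 0;
            int replicas = 0; // the ~4,000/broker guideline counts replicas, not only leaders
            for (TopicDescription td : topics.values()) {
                partitions += td.partitions().size();
                for (TopicPartitionInfo p : td.partitions()) {
                    replicas += p.replicas().size();
                }
            }
            int brokers = admin.describeCluster().nodes().get().size();
            System.out.printf("topics=%d partitions=%d replicas=%d avgReplicasPerBroker=%.0f%n",
                    topics.size(), partitions, replicas, (double) replicas / brokers);
        }
    }
}
```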
Related
I have a Kafka cluster (using Aiven on AWS):
Kafka Hardware
Startup-2 (2 CPU, 2 GB RAM, 90 GB storage, no backups) 3-node high availability set
Ping between my consumers and the Kafka Broker is 0.7ms.
Background
I have a topic such that:
It contains data about 3000 entities.
Entity lifetime is a week.
Each week there will be a different set of 3,000 entities (on average).
Each entity may have between 15k to 50k messages in total.
There can be at most 500 messages per second.
Architecture
My team built an architecture in which a group of consumers parses this data, performs some transformations (without any filtering!), and then sends the final messages back to Kafka, to topic=<entity-id>.
That is, the data is written back to Kafka, into a topic that contains only the data of a single entity.
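A rough sketch of that consume-transform-produce loop (the input topic name, the transformation, and the assumption that the entity id travels as the message key are all illustrative):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class EntityFanOut {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092"); // placeholder
        consumerProps.put("group.id", "entity-fan-out");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // placeholder
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("raw-entities")); // hypothetical input topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String entityId = record.key(); // assumes the entity id is the key
                    String result = transform(record.value());
                    // One output topic per entity, as described above.
                    producer.send(new ProducerRecord<>("entity-" + entityId, entityId, result));
                }
            }
        }
    }

    private static String transform(String value) {
        return value; // stand-in for the real transformation (no filtering)
    }
}
```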
Questions
At any given time, there can be up to 3-4k topics in Kafka (one topic for each unique entity).
Can my kafka handle it well? If not, what do I need to change?
Do I need to delete topics, or is it fine to have (a lot of!) unused topics pile up over time?
Each consumer of the final messages will consume 100 topics at the same time. I know Kafka clients can consume multiple topics concurrently, but I'm not sure what the best practices are for that (a minimal subscription sketch follows these questions).
Please share your concerns.
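For reference, subscribing one consumer to many topics is a single call in the Java client; a minimal sketch with illustrative topic names:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class HundredTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "final-readers");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        // The 100 per-entity topics assigned to this consumer (names are illustrative).
        List<String> topics = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            topics.add("entity-" + i);
        }

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(topics); // one subscription covers all 100 topics
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    System.out.printf("%s -> %s%n", record.topic(), record.value());
                }
            }
        }
    }
}
```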
Requirements
Please focus on the potential problems of this architecture and try not to discuss alternative architectures (fewer topics, more consumers, etc.).
The number of topics is not so important in itself, but each Kafka topic is partitioned and the total number of partitions could impact performance.
The general recommendation from the Apache Kafka community is to have no more than 4,000 partitions per broker (this includes replicas). The linked KIP article explains some of the possible issues you may face if the limit is breached, and with 3,000 topics it would be easy to do so unless you choose a low partition count and/or replication factor for each topic.
Choosing a low partition count for a topic is sometimes not a good idea, because it limits the parallelism of reads and writes, leading to performance bottlenecks for your clients.
Choosing a low replication factor for a topic is also sometimes not a good idea, because it increases the chance of data loss upon failure.
Generally it's fine to have unused topics on the cluster, but be aware that the cluster still pays a cost to manage the metadata for all of those partitions, and some operations will take longer than if the topics were not there at all.
There is also a per-cluster limit but that is much higher (200,000 partitions). So your architecture might be better served simply by increasing the node count of your cluster.
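If it helps, both knobs are fixed when a topic is created; a sketch with the Java AdminClient (topic name, partition count, and replication factor are placeholders chosen to fit the 3-node cluster described above):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateEntityTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // 1 partition, replication factor 2: with ~3,000 such topics this costs
            // ~6,000 partition replicas cluster-wide, i.e. ~2,000 per broker on 3 nodes,
            // which stays under the 4,000-per-broker guideline.
            NewTopic topic = new NewTopic("entity-42", 1, (short) 2);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```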
Will we have any problem if we have millions of partitions for one topic?
Due to our business requirements, we are considering creating a partition for every user in Kafka.
We have millions of users.
Any insight would be appreciated!
Yes, I think you will end up having problems if you have millions of partitions for several reasons:
(Most importantly!!) Customers come and go, so you would constantly need to change the number of partitions or keep plenty of unused partitions around, because you cannot reduce the number of partitions within a topic (see the sketch after this list).
More Partitions Requires More Open File Handles: More Partitions means more directories and segment files on disk.
More Partitions May Increase Unavailability: Planned failures move Leaders off of a Broker one at a time, with minimal downtime per partition. In a hard failure all the leaders are immediately unavailable.
More Partitions May Increase End-to-end Latency: For the message to be seen by a Consumer it must be committed. The Broker replicates data from the leader with a single thread, resulting in overhead per Partition.
More Partitions May Require More Memory In the Client
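On the first point: the Admin API only allows growing a topic; a minimal sketch (topic name and target count are illustrative):

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class GrowPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // Grow the hypothetical "users" topic to 120 partitions.
            // Requesting FEWER partitions than the topic currently has fails:
            // Kafka has no shrink operation, which is the crux of the first point.
            admin.createPartitions(Map.of("users", NewPartitions.increaseTo(120))).all().get();
        }
    }
}
```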
More details are provided in the Confluent blog post How to choose the number of topics/partitions in a Kafka cluster?.
In addition, according to Confluent's training material for Kafka developers, it is recommended:
"The current limits (2-4K Partitions/Broker, 100s K Partitions per cluster) are maximums. Most environments are well below these values (typically in the 1000-1500 range or less per Broker)."
This blog explains that "Apache Kafka Supports 200K Partitions Per Cluster".
This might change with the replacement of ZooKeeper (KIP-500), but, coming back to the first bullet point above, it would still be an unhealthy software design.
I am exploring different PubSub platforms and was wondering what the limits are in Kafka for listening to multiple topics. Consider, for instance, this use case: we have trains, station entry gates, and devices that all publish their telemetry. Currently this is done over an MQ, but as data rates increase (smart trains, etc.) we need to move to a new PubSub/streaming platform, and Kafka is of course on that list.
As I see it there are two strategies for aggregating this telemetry into a stream:
aggregate on consumption, in which each train/device initially gets its own topic and topic aggregation is done using a regex-topic / virtual topic
aggregate on production, in which all trains produce to a single topic and consumers use filters if necessary to single out individual producers
As I understand it, Kafka is not particularly suited to a high number of topics (>10,000), but it could be done. Would a regex-topic be able to aggregate 2,000-3,000 topics?
From a technical point of view, it could be done, but in practice this is not common. Why? ZooKeeper. It is advised that a cluster have a maximum of 4,000 partitions per broker, partly due to the overhead of performing leader election for all of them in ZooKeeper.
I recommend reading these posts about this interesting topic on Confluent's blog:
How to choose the number of topics/partitions in a Kafka cluster?
Apache Kafka Supports 200K Partitions Per Cluster
Apache Kafka Made Simple: A First Glimpse of a Kafka Without ZooKeeper
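As for the regex-topic part of the question: the Java consumer can subscribe by pattern, and matching topics created later are picked up automatically on a metadata refresh. A minimal sketch, assuming per-train topics named train-<id>:

```java
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TelemetryAggregator {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "telemetry-aggregator");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // One subscription that aggregates every matching per-train topic;
            // topics created later that match the pattern are picked up on the
            // next metadata refresh.
            consumer.subscribe(Pattern.compile("train-.*"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    System.out.printf("%s: %s%n", record.topic(), record.value());
                }
            }
        }
    }
}
```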
We have a 3 host Kafka cluster. We have 136 topics, each of which has 100 partitions, with a replication factor of 3. This makes for 13,600 partitions across our cluster.
Is this a sane configuration of our topics?
It's too many. You should ask yourself whether you have (or plan to have soon) enough consumer instances to need that many partitions. Then, if you really do plan to have 13k consumer instances, what sort of hardware are you running these brokers on such that they would be able to serve that many consumers? That's even before you consider the additional impact of many partitions pre-1.1: https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
To me this looks like 100 was a round number that seemed future proof. I'd suggest starting at a much lower number per topic (say 2 or 10) and seeing if you actually hit scale issues that demand more partitions before trying to jump to expert mode. You can always add more partitions later.
The short answer to your question is 'It depends'.
More partitions in a Kafka cluster lead to higher throughput; however, you need to be aware that the number of partitions also has an impact on availability and latency.
In general, more partitions:
Lead to Higher Throughput
Require More Open File Handles
May Increase Unavailability
May Increase End-to-end Latency
May Require More Memory In the Client
You need to study the trade-offs and make sure that you've picked the number of partitions that satisfies your requirements regarding throughput, latency and required resources.
For further details refer to this blog post from Confluent.
Partitions = max(NP, NC)
where:
NP is the number of partitions required to sustain producer throughput, calculated as TT/TP.
NC is the number of partitions required to sustain consumer throughput, calculated as TT/TC.
TT is the total expected throughput for our system.
TP is the max throughput of a single producer to a single partition.
TC is the max throughput of a single consumer from a single partition.
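A worked example with made-up numbers: with a target throughput TT of 100 MB/s, a measured TP of 10 MB/s, and a measured TC of 20 MB/s, the formula gives max(100/10, 100/20) = 10 partitions:

```java
public class PartitionSizing {
    public static void main(String[] args) {
        double tt = 100.0; // TT: target total throughput, MB/s (made up)
        double tp = 10.0;  // TP: measured max producer throughput per partition, MB/s
        double tc = 20.0;  // TC: measured max consumer throughput per partition, MB/s

        long np = (long) Math.ceil(tt / tp); // partitions needed on the produce side
        long nc = (long) Math.ceil(tt / tc); // partitions needed on the consume side

        System.out.println("partitions = max(" + np + ", " + nc + ") = " + Math.max(np, nc));
        // prints: partitions = max(10, 5) = 10
    }
}
```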
Does number of partitions have an impact on producer throughput in Kafka?
( I understand that number of partitions is the upper bound for degree of parallelism on consumer side, but does it affect the producer performance ? )
I used the producer performance tool in Kafka to test this on a Kafka cluster set up on AWS. I observed that for 3, 6, and 20 partitions the aggregated throughput in the cluster was approximately the same (around 200 MB/s). I would appreciate it if you could help me clarify this issue.
Thank you.
An answer in two parts:
From the Kafka consumer perspective: yes, partitions give improved throughput for Kafka consumers. But I found that you really want to minimise the number of Kafka consumers (and therefore partitions) if you want good scalability. Here's a link to a blog I wrote last year on a Kafka IoT application (see section 2.3).
From the Kafka producer perspective, throughput drops with more partitions. Last week I ran some benchmarks with Kafka producers and different numbers of partitions and found that throughput drops off significantly as partitions increase. To "size" a Kafka cluster correctly, the only solution is to increase the cluster size (nodes and/or cores) until you get the target capacity with the required number of partitions. I needed 2M writes/s and 200 partitions (for concurrency on the consumer side). On a 6-node cluster with 4 cores per node I could do 2.1M writes/s with 6 partitions, but only 1.2M writes/s with 200 partitions. On a 6-node cluster with 8-core nodes I could get 4.6M writes/s with 6 partitions, and slightly more than my target throughput of 2.4M writes/s with 200 partitions. I haven't blogged about these results yet, but here's a link to the current blog series (Anomalia Machina).
Note: throughput can also be increased by (a) reducing the replication factor or (b) writing to only a subset of partitions (!), but then you probably don't need all the partitions.
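On (b), a sketch of what writing to a subset of partitions looks like: the producer can pass an explicit partition number instead of letting the key hash decide (topic name and partition choice are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SubsetWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1_000; i++) {
                // Write only to partitions 0-5 of a hypothetical 200-partition topic
                // by passing an explicit partition number to ProducerRecord.
                int partition = i % 6;
                producer.send(new ProducerRecord<>("benchmark-topic", partition,
                        Integer.toString(i), "payload-" + i));
            }
        }
    }
}
```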