How do I set up Apache Kafka's replication factor? - apache-kafka

I was just wondering how I can set the replication factor in Apache Kafka? I can't find a good tutorial about this, and I'm learning it for a mini project.
If you have any links, please post them below.

In the broker properties, you define the replication factors for both internal topics and auto-created topics. Ideally, you'd disable auto creation; any other topic creation API then requires the replication factor to be specified explicitly, so refer to whatever documentation you can find, starting with the official Kafka website.
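For reference, here is a rough sketch of the broker-side settings involved (these are standard broker properties; the values are just examples):

    # server.properties
    # replication factor used when the broker auto-creates topics
    default.replication.factor=3
    # replication factors for Kafka's internal topics
    offsets.topic.replication.factor=3
    transaction.state.log.replication.factor=3
    # ideally, disable auto creation and create topics explicitly instead
    auto.create.topics.enable=false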

Related

Do you need multiple ZooKeeper instances to run a multi-broker Kafka?

I'm new to Kafka.
Kafka is supposed to be used as a distributed service, but the tutorials and blog posts I found online never mention whether there is one ZooKeeper node or several.
The tutorials just spin up one ZooKeeper instance and then multiple Kafka brokers.
Is that how it is supposed to be done?
ZooKeeper is a centralized coordination service for distributed systems; clusters use it to maintain the distributed system. It achieves distributed synchronization via metadata such as configuration information, naming, and so on.
In typical architectures, a Kafka cluster is served by 3 ZooKeeper nodes; if the deployment is very large, this can be ramped up to 5 ZooKeeper nodes, but that in turn adds load, since all nodes have to stay in sync and all metadata-related activity is handled by ZooKeeper.
Also, it should be noted that, as an improvement, newer Kafka releases reduce the dependency on ZooKeeper in order to improve metadata scalability, reduce the complexity of maintaining metadata in an external component, and improve recovery from unexpected shutdowns. With the new approach, controller failover is almost instantaneous. This is achieved by Kafka Raft metadata mode, termed 'KRaft', which runs Kafka without ZooKeeper by moving all the responsibilities ZooKeeper used to handle into a service within the Kafka cluster itself, operating on the event-based mechanism used by the KRaft protocol.
Tutorials generally keep things nice and simple, so one ZooKeeper (often one Kafka broker too). Useful for getting started; useless for any kind of resilience :)
In practice, you are going to need three ZooKeeper nodes minimum.
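As a rough sketch, wiring a three-node ensemble together and pointing each broker at it looks like this (hostnames are placeholders):

    # zoo.cfg on each ZooKeeper node (the ensemble itself)
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888

    # server.properties on each Kafka broker
    zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181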
If it helps, here is an enterprise reference architecture whitepaper for the deployment of Apache Kafka
Disclaimer: I work for Confluent, who publish the above whitepaper.

Is it possible to run MirrorMaker in Kafka without using Kafka Connect?

Looking to come up with a solution that would mirror or replicate one Kafka environment to another without needing Kafka Connect. Having a hard time coming up with any possible solutions or workarounds. Very new to Kafka; would appreciate any thoughts and/or guidance!
MirrorMaker 2 is based on Kafka Connect. The original MirrorMaker is not; however, it is no longer recommended because it is not very fault tolerant.
Most Kafka replication solutions are built on Kafka Connect (Confluent Replicator as another example)
Uber's uReplicator, mentioned in the comments, is built on Apache Helix and requires a ZooKeeper connection, which Kafka Connect does not, so it ultimately depends on what access and infrastructure you have available.
Since Kafka comes with the Connect API and MirrorMaker2 pre-installed, there should be little reason to find alternatives unless it absolutely doesn't work for your use case (which is...?)
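To illustrate how little extra tooling MirrorMaker 2 needs, a minimal standalone setup is roughly a properties file plus the script shipped with Kafka (cluster aliases and addresses below are placeholders):

    # mm2.properties
    clusters = source, target
    source.bootstrap.servers = source-kafka:9092
    target.bootstrap.servers = target-kafka:9092

    # replicate every topic from source to target
    source->target.enabled = true
    source->target.topics = .*

You would then start it with bin/connect-mirror-maker.sh mm2.properties; under the hood this still runs Kafka Connect workers, which is the point made above.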

Is there any option to set a custom replication factor in Debezium?

I want to use a custom replication factor for all my topics at creation time. I'm running a 7-node cluster and I want to set 5 as the replication factor for all of my topics.
Kafka Connect connectors cannot create their own topics with custom settings. This is something that's slated for improvement with KIP-158.
As things currently stand (Apache Kafka 2.4), connectors can only rely on the broker creating topics automatically (if auto.create.topics.enable=true) with the default replication factor and so on, or you can pre-create the topics yourself and set the required replication factor at that point.
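For the 7-node cluster above, pre-creating a topic with a replication factor of 5 would look something like this (topic name, partition count and broker address are placeholders):

    bin/kafka-topics.sh --create \
      --bootstrap-server broker1:9092 \
      --topic my-topic \
      --partitions 10 \
      --replication-factor 5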

How to set replication factor in librdkafka?

I'm using librdkafka to develop a Kafka message producer in C++.
Is there a way to create a topic with a custom replication factor, different from the default one?
CONFIGURATION.md does not explicitly mention any such parameter, but the Kafka tools allow for this.
While auto topic creation is currently supported by librdkafka, it merely uses the broker's topic default configuration.
What you need is manual topic creation from the client. The broker support for this was recently added in KIP-4, which is also supported through librdkafka's Admin API.
See the rd_kafka_CreateTopics() API.

Kafka: Is it possible to create a topic with a specified replication factor using the Java client?

The official Kafka documentation says the following in 4.7 Replication:
you can set this replication factor on a topic-by-topic basis
But in the Javadoc of its Java client, I can't find any API relating to creating a topic with a replication factor. Is it only possible via the provided shell script?
You may use the AdminUtils.createTopic() method from the kafka.admin package - https://github.com/apache/kafka/blob/97e61d4ae2feaf0551e75fa8cdd041f49f42a9a5/core/src/main/scala/kafka/admin/AdminUtils.scala#L409-L418
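Note that AdminUtils lives in Kafka's internal Scala code; on newer Kafka versions the public Java AdminClient can do the same thing. A minimal sketch (broker address, topic name, partition count and replication factor are placeholders):

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    import java.util.Collections;
    import java.util.Properties;

    public class CreateTopicWithReplication {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            // AdminClient talks to the brokers directly, no ZooKeeper connection needed
            try (AdminClient admin = AdminClient.create(props)) {
                // topic name, number of partitions, replication factor
                NewTopic topic = new NewTopic("my-topic", 3, (short) 2);
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }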