I have a Kafka cluster of 2 nodes. My Kafka version is 0.8.1.
I need to migrate it to a different set of servers.
What's the best way to migrate while maintaining no downtime and no data loss?
Assuming the new servers and old servers live together in the same data center, the easiest way is to add the new ones as replicas for all the existing partitions you have. Kafka will bring them in sync, making them ISRs (in-sync replicas). Once they are in sync, you should be able to safely shut down the old nodes.
This of course depends on how your consumers are configured (will they automatically find the new nodes?) and which version of Kafka you're on.
Take a look at:
https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion
https://kafka.apache.org/documentation.html#basic_ops_decommissioning_brokers
You will need to use the kafka-reassign-partitions.sh tool to make this happen. Test in a non-production environment first.
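For illustration, assuming the old brokers have IDs 1 and 2, the new ones 3 and 4, and a single topic named my-topic with two partitions (all of these names and IDs are placeholders), the reassignment file and commands would look roughly like this (the 0.8.x tools talk to ZooKeeper directly):
cat > reassign.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"my-topic","partition":0,"replicas":[3,4]},
  {"topic":"my-topic","partition":1,"replicas":[4,3]}
]}
EOF
# start moving the replicas onto the new brokers
bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 --reassignment-json-file reassign.json --execute
# re-run with --verify until every partition reports the reassignment completed
bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 --reassignment-json-file reassign.json --verify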
I would suggest using Kafka MirrorMaker. Have a look at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330 or https://kafka.apache.org/documentation.html#basic_ops_mirror_maker
Basically you run something like this:
cp /usr/lib/kafka/config/consumer.properties oldCluster.consumer.properties
cp /usr/lib/kafka/config/producer.properties newCluster.producer.properties
Then you set up the old cluster's consumer settings in oldCluster.consumer.properties, for example:
bootstrap.servers=clusterOldServer1.full.name:9092
auto.offset.reset=earliest
#zookeeper.connect=commentedOutZookeeperForOlderKafka
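The producer side then points at the new cluster; a minimal newCluster.producer.properties might look like this (the hostname is just an example):
bootstrap.servers=clusterNewServer1.full.name:9092
# optional, but reduces the chance of losing messages while mirroring
acks=all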
And then you run the migration command, which runs as a long-lived process ("daemon"):
time kafka-run-class kafka.tools.MirrorMaker --consumer.config oldCluster.consumer.properties --producer.config newCluster.producer.properties --whitelist="topic.*regexp"
I have a recent Kafka cluster which uses KRaft. I am facing some problems with it, possibly due to the use of KRaft. I wish to switch to ZooKeeper without losing data. Downtime is okay. How do I go about it?
I'm afraid there isn't a documented process to downgrade a cluster from KRaft to ZooKeeper.
If you've found an issue with KRaft, you should report it to the Kafka project via a Jira ticket so it can get fixed.
Assuming your KRaft cluster is somewhat functional, a way to preserve your data is to create a new cluster (running ZooKeeper) and use a tool like MirrorMaker to migrate your data.
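A minimal sketch of that migration using MirrorMaker 2, which ships with recent Kafka releases; the cluster aliases and bootstrap hostnames below are placeholders:
# mm2.properties
clusters = kraft, zk
kraft.bootstrap.servers = kraft-broker1:9092
zk.bootstrap.servers = zk-broker1:9092
kraft->zk.enabled = true
kraft->zk.topics = .*
replication.factor = 3
Then start it with bin/connect-mirror-maker.sh mm2.properties. Note that by default MirrorMaker 2 prefixes mirrored topic names with the source cluster alias.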
I am looking to migrate a small 4-node Kafka cluster with about 300 GB of data on each broker to a new cluster. The problem is we are currently running Cloudera's flavor of Kafka (CDK) and we would like to run Apache Kafka. For the most part CDK is very similar to Apache Kafka, but I am trying to figure out the best way to migrate. I originally looked at using MirrorMaker, but to my understanding it will re-process messages once we cut over the consumers to the new cluster, so I think that is out. I was wondering if we could spin up a new Apache Kafka cluster and add it to the CDK cluster (not sure how this will work yet, if at all), then decommission the CDK brokers one at a time. Otherwise I am out of ideas other than spinning up a new Apache Kafka cluster and just making code changes to every producer/consumer to point to the new cluster, which I am not really a fan of as it will cause downtime.
Currently running CDK 3.1.0, which is equivalent to Apache Kafka 1.0.1.
MirrorMaker would copy the data, but not consumer offsets, so they'd be left at their configured auto.offset.reset policies.
I was wondering if we could spin up a new Apache Kafka cluster and add it to the CDK cluster
If possible, that would be the most effective way to migrate the cluster. For each new broker, give it a unique broker ID and the same Zookeeper connection string as the others, then it'll be part of the same cluster.
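For example, a new broker's server.properties would only need something along these lines to join the existing cluster (the ID, hostnames, and paths here are made up):
broker.id=10
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
listeners=PLAINTEXT://new-broker1.example.com:9092
log.dirs=/var/lib/kafka/data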
Then, you'll need to manually run the partition reassignment tool to move all existing topic partitions off of the old brokers and onto the new ones, as data will not automatically be replicated.
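A rough example of that flow (the topic names and the new broker IDs 10-13 are hypothetical):
cat > topics-to-move.json <<'EOF'
{"version":1,"topics":[{"topic":"topic-a"},{"topic":"topic-b"}]}
EOF
# propose an assignment that only uses the new brokers
bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 --topics-to-move-json-file topics-to-move.json --broker-list "10,11,12,13" --generate
# save the proposed assignment to reassignment.json, then apply and verify it
bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 --reassignment-json-file reassignment.json --execute
bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 --reassignment-json-file reassignment.json --verify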
Alternatively, you could try shutting down the CDK cluster, backing up the data directories onto the new brokers, then starting the same Kafka version as your CDK on those new machines (the stored log format is version-dependent, so it matters).
Also make sure that you back up a copy of the server.properties files for the new brokers.
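A per-broker sketch of that copy, assuming a clean shutdown first (the paths and hostnames are placeholders):
# on the old broker, after stopping the Kafka service
rsync -a /var/local/kafka/data/ new-broker1:/var/local/kafka/data/
scp /etc/kafka/conf/server.properties new-broker1:/etc/kafka/conf/
# on the new broker: keep the same broker.id, point it at the same ZooKeeper, and start the same Kafka version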
I am trying to configure two Kafka servers on a cluster of 3 nodes, while there is already one Kafka broker (version 0.8) running with the application. There is a dependency on that Kafka 0.8 broker, so it cannot be disturbed/upgraded.
Now for a POC, I need to configure 1.0.0 since my new code is compatible with this version and above...
My task is to push data from Oracle to Hive tables. For this I am using JDBC Connect to fetch data from Oracle and the Hive JDBC driver to push data to Hive tables. It should be a fast and easy way...
I need the following help:
Can I use spark-submit to run this data push to Hive?
Can I simply copy kafka_2.12-1.0.0 onto one of the nodes of my Linux cluster and run my code on it? I think I need to configure my zookeeper.properties and server.properties with ports not in use and start these new ZooKeeper and Kafka services separately? Please note I cannot disturb the existing ZooKeeper and Kafka already running.
Kindly help me achieve this.
I'm not sure running two very memory-intensive applications (Kafka and/or Kafka Connect) on the same machines is considered very safe, especially if you do not want to disturb existing applications. Realistically, a rolling restart with an upgrade will be best for performance and feature reasons. And, no, two Kafka versions should not be part of the same cluster, unless you are in the middle of a rolling upgrade scenario.
If at all possible, please use new hardware... I assume Kafka 0.8 is running on machines that could be old and out of warranty? Then there's no significant reason that I know of not to use a newer version of Kafka. But yes, extract it on any machine you'd like, and perhaps use something like Ansible, or whichever config management tool you prefer, to do it for you.
You can actually share the same ZooKeeper cluster; just make sure each Kafka cluster uses its own chroot path rather than the same settings. For example,
Cluster 0.8
zookeeper.connect=zoo.example.com:2181/kafka08
Cluster 1.x
zookeeper.connect=zoo.example.com:2181/kafka10
Also, it's not clear where Spark fits into this architecture. Please don't use a JDBC sink for Hive. Use the proper HDFS Kafka Connect sink, which has direct Hive support via the metastore. And while the JDBC source might work for Oracle, chances are you might already be able to afford a license for GoldenGate.
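As a sketch of what that sink looks like, assuming Confluent's kafka-connect-hdfs connector is installed (the topic, HDFS, and metastore hostnames are placeholders):
name=oracle-to-hive
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=oracle-mytable
hdfs.url=hdfs://namenode.example.com:8020
flush.size=1000
hive.integration=true
hive.metastore.uris=thrift://metastore.example.com:9083
hive.database=default
schema.compatibility=BACKWARD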
I was able to get the two Kafka versions, 0.8 and 1.0, running on the same server with their respective ZooKeepers.
Steps followed:
1. Copy the version package folder to the server at the desired location.
2. Change the configuration settings in zookeeper.properties and server.properties (here you need to set ports which are not in use on that particular server; see the example after these steps).
3. Start the services and push data to the Kafka topics.
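For example, the 1.0.0 POC instance could use settings like these; the ports and paths are only illustrative, and anything works as long as it does not collide with the existing 0.8 instance:
# zookeeper.properties for the POC instance
clientPort=2182
dataDir=/var/lib/zookeeper-poc
# server.properties for the POC broker
broker.id=0
listeners=PLAINTEXT://:9093
log.dirs=/var/lib/kafka-logs-poc
zookeeper.connect=localhost:2182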
Note: this requirement is only for a POC and not an ideal production setup. As answered above, we should upgrade to the next version rather than follow what is practiced here.
I have a three-node Kafka cluster in service running on a separate three-node Zookeeper cluster. I intend to switch Kafka to use a new five-node Zookeeper cluster, and although I have found information about doing that, I have an extra wrinkle where Kafka will be using a custom znode parent path on the new cluster.
For instance, my current Kafka Zookeeper string looks something like this:
192.0.2.11:2181,192.0.2.12:2181,192.0.2.13:2181
I'm looking to switch it to this:
192.0.2.21:2181,192.0.2.22:2181,192.0.2.23:2181,192.0.2.24:2181,192.0.2.25:2181/kafka/uid1
The reason for this is that we intend to reuse the larger Zookeeper cluster for other Kafka clusters. Don't worry, this is for testing and not production. However, we still want to do this without losing any data on the stream that is coming into Kafka, so we want to do this without taking anything down.
Is this possible?
I have come across the following questions:
Copy/Migrate old zookeeper znode/data to new zookeeper
best way to copy data across 2 zookeeper cluster?
Unfortunately they appear to require some downtime, which I'm hoping to avoid.
This page (https://qgraph.io/blog/migrating-kafka-zookeeper-cluster/) was a little more helpful in the way of rollover, but not with znode migration.
I've been looking for 'znode symlinks' or 'specifying znode path per zookeeper server' but neither seems possible. Am I out of luck, facing downtime and possibly lost data?
From what I can tell, there is no way to move Kafka's parent znode without restarting Kafka. There is no such thing as a hard or soft link for znodes: https://www.igvita.com/2010/04/30/distributed-coordination-with-zookeeper/
I am going to install Kafka for company messaging. The plan is to first install the kafka on a single huge machine and scale it to 4-5 machines (a cluster) later if needed.
I have little experience with Kafka. I want to ask whether it is possible to scale by just changing the parameters in the broker configuration and installing ZooKeeper on the newly joined machines.
Or how can I roughly do this in the easiest way? More specifically, with Cloudera Kafka in CDH.
Thanks
To scale Kafka you will have to add more partitions to topics, if needed, using kafka-topics.sh, and then reassign partitions to your new brokers using kafka-reassign-partitions.sh.
The reassign utility will replicate and dispatch your data automatically. You can do it for a whole topic or for a selective set of partitions.
The complete documentation is at https://kafka.apache.org/documentation.html. Just take a look at section 6 (Operations).
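For example (the topic name, partition count, broker IDs, and ZooKeeper host below are placeholders):
# add partitions to an existing topic
bin/kafka-topics.sh --zookeeper zk1.example.com:2181 --alter --topic my-topic --partitions 12
# generate a reassignment plan that spreads partitions over the new brokers, then execute it
bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 --topics-to-move-json-file topics.json --broker-list "4,5,6" --generate
bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 --reassignment-json-file reassignment.json --execute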