I have a recent Kafka cluster that uses KRaft. I am facing some problems with it, possibly because of KRaft. I want to switch to ZooKeeper without losing data. Downtime is okay. How do I go about it?
I'm afraid there isn't a documented process to downgrade a cluster from KRaft to ZooKeeper.
If you've found an issue with KRaft, you should report it to the Kafka project via a Jira ticket so it can get fixed.
Assuming your KRaft cluster is somewhat functional, a way to preserve your data is to create a new cluster (running ZooKeeper) and use a tool like MirrorMaker to migrate your data.
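A minimal sketch of what such a migration could look like with MirrorMaker 2, assuming both clusters are reachable from one machine. The cluster aliases (`kraft`, `zk`) and the host names are hypothetical placeholders; the script only writes the config file and prints the launch command.

```shell
#!/bin/sh
# Sketch only: writes a one-way MirrorMaker 2 config into the current directory.
# The aliases "kraft"/"zk" and both host names are hypothetical placeholders.
set -eu

cat > mm2.properties <<'EOF'
# Aliases for the two clusters (names are arbitrary)
clusters = kraft, zk
kraft.bootstrap.servers = kraft-broker1.example.com:9092
zk.bootstrap.servers = zk-broker1.example.com:9092

# Copy every topic from the KRaft cluster to the ZooKeeper-based one, not back
kraft->zk.enabled = true
kraft->zk.topics = .*
zk->kraft.enabled = false
EOF

echo "Wrote mm2.properties"
echo "Run from the Kafka install directory: bin/connect-mirror-maker.sh mm2.properties"
```

By default MirrorMaker 2 prefixes mirrored topic names with the source alias (here `kraft.`), so clients on the new cluster would see renamed topics unless a different replication policy is configured.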
I'm looking for a solution that would mirror or replicate a Kafka environment without needing Kafka Connect. I'm having a hard time coming up with any possible solutions or workarounds. I'm very new to Kafka and would appreciate any thoughts or guidance!
MirrorMaker2 is based on Kafka Connect. The original MirrorMaker is not; however, it is no longer recommended because it's not very fault tolerant.
Most Kafka replication solutions are built on Kafka Connect (Confluent Replicator is another example).
Uber uReplicator, mentioned in the comments, is built on Apache Helix and requires a ZooKeeper connection, which Kafka Connect does not, so the choice ultimately depends on what access and infrastructure you have available.
Since Kafka comes with the Connect API and MirrorMaker2 pre-installed, there should be little reason to find alternatives unless it absolutely doesn't work for your use case (which is...?)
We are trying to implement the replication of data between two Ignite data clusters.
For this purpose, we are using Kafka Connect.
We have followed the things mentioned in this document -> https://dzone.com/articles/linking-apache-ignite-and-apache-kafka-for-highly
Everything works fine as long as we use one cache and the PUT operation.
But when I do the same for the REMOVED operation, I can see the CacheEvent record in the connector's consumer thread, but the data is not removed from the sink cluster nodes.
Can someone please help with this case?
It might be an issue with the Ignite Kafka integration. Try to collect all the relevant details and report to the Ignite community via the user list.
In the meantime, if the issue is not solved you can consider other replication options:
Replication via the certified GridGain Kafka Connect integration.
Datacenter replication feature by GridGain.
I am trying to configure two Kafka servers on a cluster of 3 nodes. One Kafka broker (version 0.8) is already running with the application, and there is a dependency on that version, so it cannot be disturbed or upgraded.
Now, for a POC, I need to set up 1.0.0, since my new code is compatible with this version and above.
My task is to push data from Oracle to Hive tables. For this I am using JDBC Connect to fetch data from Oracle and Hive JDBC to push data to the Hive tables. It should be a fast and easy way.
I need help with the following:
Can I use spark-submit to run this data push to Hive?
Can I simply copy kafka_2.12-1.0.0 to one of the nodes on my Linux server and run my code against it? I think I need to configure zookeeper.properties and server.properties with ports that are not in use and start the new ZooKeeper and Kafka services separately. Please note that I cannot disturb the existing ZooKeeper and Kafka that are already running.
Kindly help me achieve this.
I'm not sure running two very memory-intensive applications (Kafka and/or Kafka Connect) on the same machines is considered very safe, especially if you do not want to disturb existing applications. Realistically, a rolling restart with an upgrade will be best for performance and feature reasons. And no, two Kafka versions should not be part of the same cluster unless you are in the middle of a rolling upgrade.
If at all possible, please use new hardware. I assume Kafka 0.8 is running on machines that could be old and out of warranty? Then there's no significant reason that I know of not to use a newer version of Kafka. But yes, you can extract it on any machine you'd like, perhaps using something like Ansible, or whichever config management tool you prefer, to do it for you.
You can actually share the same ZooKeeper cluster; just make sure each Kafka cluster uses a different chroot path in zookeeper.connect. For example:
Cluster 0.8
zookeeper.connect=zoo.example.com:2181/kafka08
Cluster 1.x
zookeeper.connect=zoo.example.com:2181/kafka10
Also, it's not clear where Spark fits into this architecture. Please don't use the JDBC sink for Hive. Use the proper HDFS Kafka Connect sink, which has direct Hive support via the metastore. And while the JDBC source might work for Oracle, chances are you might already be able to afford a license for GoldenGate.
I was able to get two Kafka versions, 0.8 and 1.0, running on the same server, each with its own ZooKeeper.
Steps followed:
1. Copy the version's package folder to the desired location on the server.
2. Change the configuration settings in zookeeper.properties and server.properties (here you need to set ports that are not in use on that particular server).
3. Start the services and push data to the Kafka topics.
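For step 2, the settings that typically have to differ from the already-running instance are the ports and the data directories. A minimal sketch, assuming ports 2182 and 9093 are free and that directories under /tmp are acceptable for a POC (all of these values are hypothetical):

```shell
#!/bin/sh
# Sketch only: configs for a second, independent ZooKeeper + Kafka 1.0.0
# instance on a host that already runs another pair. Ports and paths are
# hypothetical; pick values that are unused on your server.
set -eu

cat > zookeeper-poc.properties <<'EOF'
# Must not collide with the existing ZooKeeper (commonly 2181)
clientPort=2182
dataDir=/tmp/zookeeper-poc
EOF

cat > server-poc.properties <<'EOF'
broker.id=0
# Must not collide with the existing broker (commonly 9092)
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-poc
# Point at the new ZooKeeper, not the one used by the 0.8 broker
zookeeper.connect=localhost:2182
EOF

echo "Start ZooKeeper: bin/zookeeper-server-start.sh zookeeper-poc.properties"
echo "Start Kafka:     bin/kafka-server-start.sh server-poc.properties"
```

Keeping the new files under separate names leaves the configuration of the existing 0.8 installation untouched.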
Note: this setup is only for a POC and is not an ideal production environment. As answered above, we should upgrade to a newer version rather than do what is described here.
We are migrating our application from Apache Kafka to Confluent Platform.
Apache Kafka version:1.1.0
Confluent :4.1.0
We tried these options:
Manually copying the ZooKeeper logs and Kafka logs - not an optimal way because of volume and data correctness.
MirrorMaker - this will replicate newly created topics and ACLs. It will not migrate old details in Apache Kafka.
Please suggest better approaches.
You can keep your existing Kafka and Zookeeper installation.
Confluent Platform does not change the way these run or manage data.
You can configure the REST Proxy, Schema Registry, Control Center, KSQL, etc. to use your existing bootstrap servers or ZooKeeper connection. Nothing should need to be migrated; you're only adding extra consumer/producer services that just happen to be provided by Confluent.
If you later plan on upgrading your brokers, you can start up new ones from the Confluent package, migrate the partitions, then shut down the old ones. The same applies to ZooKeeper, but make sure that you have at least 2 nodes up during this process, and always have an odd number of them available after your transition.
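As a rough illustration of pointing the Confluent services at an existing cluster, the sketch below writes minimal configs for Schema Registry and REST Proxy. The host name is a hypothetical placeholder, and the property names should be checked against the documentation for your Confluent version.

```shell
#!/bin/sh
# Sketch only: minimal configs that point Confluent services at an existing
# Apache Kafka cluster. The broker address is a hypothetical placeholder.
set -eu

BOOTSTRAP="PLAINTEXT://kafka1.example.com:9092"

cat > schema-registry.properties <<EOF
listeners=http://0.0.0.0:8081
# Schema Registry stores schemas in a topic on the existing brokers
kafkastore.bootstrap.servers=$BOOTSTRAP
kafkastore.topic=_schemas
EOF

cat > kafka-rest.properties <<EOF
# REST Proxy is just another producer/consumer client of the same brokers
bootstrap.servers=$BOOTSTRAP
EOF

echo "Wrote schema-registry.properties and kafka-rest.properties"
```

Neither file touches the brokers' own configuration, which is why no data has to move.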
I have a kafka cluster of 2 nodes. My kafka version is 0.8.1.
I need to migrate it to a different set of servers.
What's the best way to migrate with no downtime and no data loss?
Assuming the new servers and old servers live together in the same data center, the easiest way will be to add the new ones as replicas for all the existing partitions you have. Kafka will bring them in sync, making them ISRs. Once they are in sync, you should be able to safely shut down the old nodes.
This of course depends on how your consumers are configured (will they automatically find the new nodes?) and which version of Kafka you're on.
Take a look at:
https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion
https://kafka.apache.org/documentation.html#basic_ops_decommissioning_brokers
You will need to use the kafka-reassign-partitions.sh tool to make this happen. Test in a non-production environment first.
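The reassignment workflow described in those documentation sections can be sketched roughly as follows. The topic name, the ZooKeeper address, and the new broker IDs (3 and 4) are hypothetical; the script only writes the input file and prints the commands to run.

```shell
#!/bin/sh
# Sketch only: prepares input for kafka-reassign-partitions.sh.
# Topic name, ZooKeeper address and broker IDs are hypothetical placeholders.
set -eu

ZK="zk1.example.com:2181"
NEW_BROKERS="3,4"

# List the topics whose partitions should move to the new brokers
cat > topics-to-move.json <<'EOF'
{"version": 1, "topics": [{"topic": "my-topic"}]}
EOF

# Print the follow-up commands instead of running them
cat <<EOF
1) Generate a candidate assignment and save the proposed JSON as reassignment.json:
   bin/kafka-reassign-partitions.sh --zookeeper $ZK --topics-to-move-json-file topics-to-move.json --broker-list "$NEW_BROKERS" --generate
2) Start the move:
   bin/kafka-reassign-partitions.sh --zookeeper $ZK --reassignment-json-file reassignment.json --execute
3) Check progress until every partition reports success:
   bin/kafka-reassign-partitions.sh --zookeeper $ZK --reassignment-json-file reassignment.json --verify
EOF
```

Once verification succeeds and the old brokers no longer hold any partitions, they can be shut down.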
I would suggest using Kafka MirrorMaker. Have a look at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330 or https://kafka.apache.org/documentation.html#basic_ops_mirror_maker
Basically you run something like this:
cp /usr/lib/kafka/config/consumer.properties oldCluster.consumer.properties
cp /usr/lib/kafka/config/producer.properties newCluster.producer.properties
Then you set up the old cluster's settings in oldCluster.consumer.properties, for example:
bootstrap.servers=clusterOldServer1.full.name:9092
auto.offset.reset=earliest
#zookeeper.connect=commentedOutZookeeperForOlderKafka
And then you run the migration command, which keeps running like a daemon:
time kafka-run-class kafka.tools.MirrorMaker --consumer.config oldCluster.consumer.properties --producer.config newCluster.producer.properties --whitelist="topic.*regexp"