Migrating topics, ACLs, and messages from Apache Kafka to Confluent Platform - apache-kafka

We are migrating our application from Apache Kafka to Confluent Platform.
Apache Kafka version: 1.1.0
Confluent Platform version: 4.1.0
We have tried these options:
Manually copying the Zookeeper logs and Kafka logs - not an optimal approach because of the data volume and concerns about data correctness.
MirrorMaker - this will replicate newly created topics and ACLs, but it will not migrate what already exists in Apache Kafka.
Please suggest a better approach.

You can keep your existing Kafka and Zookeeper installation.
Confluent Platform does not change the way these run or manage data.
You can configure the REST Proxy, Schema Registry, Control Center, KSQL, etc. to use your existing bootstrap servers or Zookeeper connection. Nothing should need to be migrated; you're only adding extra consumer/producer services that just happen to be provided by Confluent.
If you later plan on upgrading your brokers, you can start up new ones from the Confluent package, migrate the partitions, then shut down the old ones. The same applies to Zookeeper, but make sure that you have at least 2 servers up during this process, and always have an odd number of them available after your transition.
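The advice about Zookeeper counts follows from majority quorums: an ensemble keeps serving requests only while more than half of its servers are up. The snippet below is a minimal sketch of that arithmetic; the function names are made up for illustration and are not part of any Zookeeper or Kafka API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare a few ensemble sizes: size, quorum, tolerated failures.
for size in (3, 4, 5):
    print(size, quorum_size(size), tolerated_failures(size))
```

With 3 servers, 2 must stay up, which matches the "at least 2" above. Going from 3 to 4 servers does not let you lose any more of them, which is why an odd count is the usual recommendation.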

Related

Kafka Cluster Migration Impact on Clients

I'm trying to migrate our non-Kerberos-secured Hortonworks cluster (running Kafka 2.0.0) to a Kerberos-secured Cloudera cluster (running Kafka 2.5.0). We have multiple producers (around 20 applications) and consumers that push messages to and pull messages from the existing brokers. I have a few questions on how to proceed:
Should my producers communicate with the new Kerberos server, and how? Should I install Kerberos for each one of my apps?
Since the Kafka version changes, what changes should I make to my producers? I'm thinking about creating another producer for each application that writes to the new cluster, since the old producers will not be able to write to the new cluster.
Based on your experience, what is the best approach to follow for this use case?
I'm thinking about creating additional producers with the new kafka-client version for each application that already has a producer for the old cluster.
I also read about Kafka mirroring, but I'm assuming it's not useful for my use case: since the Kafka version is changing, I assume mirroring to another version is not supported.
On the security side, I don't know what changes I should make to the current producers.
Thanks in advance

Kafka MM1.0 vs Kafka MM2.0 vs Confluent Replicator vs Confluent Cluster linking

I know what are differences between Apache Kafka MM1 and Apache Kafka MM2.
Kafka MM1 doesn't support an Active-Active setup, offset syncing is also an issue in MM1, and there are many more limitations.
Overview of Active-Active Kafka Cluster using MirrorMaker 2.0
a-look-inside-kafka-mirrormaker-2
But I am not able to understand the differences between Replicator and Cluster Linking.
Replicator was released before MM2 and offers most of the same features, but can also copy topic configurations, Schema Registry details, and partition changes (I don't think MM2 can do that, MM1 definitely does not).
AFAIK, Cluster Linking is almost like "serverless replication": it doesn't depend on running/maintaining a Connect cluster, as I believe it runs directly on the brokers, which also makes it less scalable as a replication solution. It also requires restarting the brokers to enable/disable it, as compared to simply starting/stopping a Connect cluster.

Using confluent cp-schema-registry, does it have to talk to the same Kafka you are using for producers/consumers?

We already have Kafka running in production. And unfortunately it's an older version, 0.10.2. I want to start using cp-schema-registry, from the community edition of Confluent Platform. That would mean installing the older 3.2.2 image of schema registry for compatibility with our old kafka.
From what I've read in the documentation, it seems that Confluent Schema Registry uses Kafka as its backend for storing its state. But the clients that are producing to/reading from Kafka topics talk to Schema Registry independently of Kafka.
So I am wondering if it would be easier to manage in production if I ran Schema Registry/Kafka/Zookeeper all together in one container, independent of our main Kafka cluster. Then I could use the latest version of everything. The other benefit is that standing up this new service component could not cause any unexpected negative consequences for the existing Kafka cluster.
I find the documentation doesn't really explain well what the pros/cons of each deployment strategy are. Can someone offer guidance on how they have deployed schema registry in an environment with an existing Kafka? What is the main advantage of connecting schema registry to your main Kafka cluster?
Newer Kafka clients are backwards compatible with Kafka 0.10, so there's no reason you couldn't use a newer Schema Registry than 3.2.
From the docs:
"Schema Registry that is included in Confluent Platform 3.2 and later is compatible with any Kafka broker that is included in Confluent Platform 3.0 and later"
I would certainly avoid putting everything in one container. That's not how containers are meant to be used, and there's no reason you would need another Zookeeper server.
Having a secondary Kafka cluster only to hold one topic of schemas seems unnecessary when you could store the same information on your existing cluster.
"the clients that are producing to/reading from Kafka topics talk to Schema Registry independently of Kafka"
Clients talk to both. Only Avro schemas are sent over HTTP before your regular client code reaches the topic. No, schemas and client data do not have to be part of the same Kafka cluster.
Anytime anyone deploys Schema Registry, it's being added to "an existing Kafka"; the only difference is that yours might have more data in it.
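To make "store the same information on your existing cluster" concrete, here is a rough sketch that renders a Schema Registry properties file pointing at existing brokers. The keys listeners, kafkastore.bootstrap.servers and kafkastore.topic are, as far as I know, standard Schema Registry settings, but check them against the documentation for the version you deploy. The broker hostnames are placeholders and render_properties is a hypothetical helper, not part of any Confluent tooling.

```python
def render_properties(settings: dict) -> str:
    """Render a dict as Java-style key=value properties text."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Assumed values: the hostnames below are placeholders for your existing brokers.
schema_registry_settings = {
    # HTTP endpoint that producers and consumers call to register/fetch schemas.
    "listeners": "http://0.0.0.0:8081",
    # Existing Kafka cluster used as the storage backend.
    "kafkastore.bootstrap.servers": (
        "PLAINTEXT://broker1.example.internal:9092,"
        "PLAINTEXT://broker2.example.internal:9092"
    ),
    # Topic in which the schemas are stored.
    "kafkastore.topic": "_schemas",
}

print(render_properties(schema_registry_settings))
```

The only footprint on the existing cluster in this setup is the one schemas topic, which is the point made above about a second cluster being unnecessary.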

Migration Cloudera Kafka (CDK) to Apache Kafka

I am looking to migrate a small 4-node Kafka cluster, with about 300GB of data on each broker, to a new cluster. The problem is that we are currently running Cloudera's flavor of Kafka (CDK) and we would like to run Apache Kafka. For the most part CDK is very similar to Apache Kafka, but I am trying to figure out the best way to migrate. I originally looked at using MirrorMaker, but to my understanding consumers would re-process messages once we cut them over to the new cluster, so I think that is out. I was wondering if we could spin up a new Apache Kafka cluster and add it to the CDK cluster (not sure how this will work yet, if at all), then decommission the CDK servers one at a time. Otherwise I am out of ideas other than spinning up a new Apache Kafka cluster and making code changes to every producer/consumer to point to the new cluster, which I am not really a fan of as it will cause downtime.
We are currently running CDK 3.1.0, which is equivalent to Apache Kafka 1.0.1.
MirrorMaker would copy the data, but not consumer offsets, so they'd be left at their configured auto.offset.reset policies.
"I was wondering if we could spin up a new Apache Kafka cluster and add it to the CDK cluster"
If possible, that would be the most effective way to migrate the cluster. For each new broker, give it a unique broker ID and the same Zookeeper connection string as the others, then it'll be part of the same cluster.
Then, you'll need to manually run the partition reassignment tool to move all existing topic partitions off of the old brokers and onto the new ones, as data will not automatically be replicated.
Alternatively, you could try shutting down the CDK cluster, copying the data directories onto the new brokers, then starting the same version of Kafka as your CDK on those new machines (as the stored log format is important).
Also make sure that you back up a copy of the server.properties files for the new brokers.
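As a rough illustration of the reassignment step, the sketch below builds the JSON document that kafka-reassign-partitions accepts via --reassignment-json-file. The topic names, broker IDs and the simple round-robin placement are assumptions made for the example, and build_reassignment is a hypothetical helper rather than anything shipped with Kafka.

```python
import json


def build_reassignment(partitions, new_broker_ids, replication_factor):
    """Spread (topic, partition) pairs round-robin across the new brokers."""
    if replication_factor > len(new_broker_ids):
        raise ValueError("replication factor exceeds the number of new brokers")
    entries = []
    for index, (topic, partition) in enumerate(sorted(partitions)):
        # Pick consecutive brokers, wrapping around the end of the list.
        replicas = [
            new_broker_ids[(index + offset) % len(new_broker_ids)]
            for offset in range(replication_factor)
        ]
        entries.append({"topic": topic, "partition": partition, "replicas": replicas})
    return {"version": 1, "partitions": entries}


# Hypothetical topics and new broker IDs.
plan = build_reassignment(
    [("orders", 0), ("orders", 1), ("payments", 0)],
    new_broker_ids=[101, 102, 103],
    replication_factor=2,
)
print(json.dumps(plan, indent=2))
```

The tool can also propose a plan itself with --generate and --broker-list, which is probably the easier route for a real cluster. A hand-built file like this is mainly useful when you want explicit control over where each replica lands.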

Is kafka_2.11-0.9.0.1 compatible with Zookeeper 3.4.12?

Currently we are using Apache kafka_2.11-0.9.0.1 and Apache Solr 5.5 with Zookeeper 3.4.6.
But we are upgrading Apache Solr, hence need to upgrade Zookeeper to 3.4.12.
Kafka works with this Zookeeper version according to our basic testing, but we just want to confirm whether or not Zookeeper 3.4.12 is officially supported with kafka_2.11-0.9.0.1.
Yes, it will work (just tested), but unless you back up the Kafka data in Zookeeper and restore it to the new one, you will lose all the Kafka data, meaning your topics and committed offsets will be lost.
FWIW, it might be worth upgrading Kafka as well.