Does Mirrormaker 2 need a third kafka for mirroring operation? - apache-kafka

I have a question about using MirrorMaker 2.
MirrorMaker 2 is based on the Kafka Connect framework and can be viewed at its core as a combination of a Kafka source and sink connector. So in the MM2 architecture there are source and sink connectors. But is there an extra Kafka cluster for the connectors in MM2? In the Kafka Connect design, source and sink connectors need a Kafka cluster to move data.
For example, MM2 needs source and target clusters; my question is whether MM2 needs a third Kafka cluster for the mirroring operation, beyond the source and target clusters.
My other question is: can MM2 connectors be run in distributed mode? I didn't see any configuration for this.
For example, in a Docker environment, is the configuration below enough to run MM2 in distributed mode?
mirrormaker:
  image: 'wpietri/mirror-maker:2'
  environment:
    - SOURCE=source_ip:9092
    - DESTINATION=dest_ip:9092
    - TOPICS=test-topic
  deploy:
    replicas: 3
    mode: replicated

Currently MirrorMaker 2 is a set of source connectors.
A source connector grabs records from an external system and hands them to the Kafka Connect runtime, which writes them into Kafka.
For MirrorMaker 2, the "external system" is another Kafka cluster. So to work, MirrorMaker 2 only needs 2 Kafka clusters: one where the connectors get records (called the source cluster) and one that Kafka Connect is connected to (called the target cluster).
MirrorMaker 2 connectors are standard Kafka Connect connectors. They can be used directly with Kafka Connect in standalone or distributed mode.
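For reference, a minimal MM2 configuration run with the dedicated connect-mirror-maker.sh driver might look like the following sketch. The cluster aliases, broker addresses, and topic name are placeholders matching the example above:

```properties
# mm2.properties -- minimal sketch; addresses and aliases are placeholders
clusters = source, target
source.bootstrap.servers = source_ip:9092
target.bootstrap.servers = dest_ip:9092

# enable one-way replication from source to target
source->target.enabled = true
source->target.topics = test-topic

# for a single-broker test setup only; use 3 in production
replication.factor = 1
```

This is started with bin/connect-mirror-maker.sh mm2.properties, which internally runs the MM2 connectors on a distributed Connect setup; running multiple instances of it with the same config forms a cluster.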

Related

How to transfer data from source Kafka cluster to target Kafka cluster using Kafka Connect?

As described in the title, I have a use case where I want to copy data from a source Kafka topic (Cloudera Kafka cluster) to a destination Kafka topic (AWS MSK Kafka cluster) using the Kafka Connect framework. I have already gone through some of the available options, e.g. the kafkacat utility and MirrorMaker 2. But I am curious whether there is any such connector available in open source.
Links followed:
Kafkacat: https://rmoff.net/2019/09/29/copying-data-between-kafka-clusters-with-kafkacat/
MirrorMaker: https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
MirrorMaker 2 is open source, and it does exactly what you're asking:
https://github.com/apache/kafka/tree/trunk/connect/mirror
kcat is also open source, but it doesn't exactly scale, and it doesn't use the Connect framework.
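As a sketch (broker addresses and topic name are placeholders), a one-shot copy with kcat is just a consumer piped into a producer:

```shell
# consume all values from the source topic, exit at end of partition (-e),
# then produce the raw messages to the same topic on the target cluster
kcat -b source_ip:9092 -t test-topic -C -e | \
  kcat -b dest_ip:9092 -t test-topic -P
```

Note that this copies message values only; keys, headers, and partition assignment are not preserved by default, which is part of why it doesn't work as a replication solution at scale.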

Kafka MM1.0 vs Kafka MM2.0 vs Confluent Replicator vs Confluent Cluster linking

I know the differences between Apache Kafka MM1 and Apache Kafka MM2.
Kafka MM1 doesn't support an active-active setup, offset syncing is also an issue in MM1, and more:
Overview of Active-Active Kafka Cluster using MirrorMaker 2.0
a-look-inside-kafka-mirrormaker-2
But I am not able to understand the differences between Replicator and Cluster Linking.
Replicator was released before MM2 and offers most of the same features, but can also copy topic configurations, Schema Registry details, and partition changes (I don't think MM2 can do that, MM1 definitely does not).
AFAIK, Cluster Linking is almost like "serverless replication": it doesn't depend on running/maintaining a Connect cluster, as I believe it runs directly on the brokers, which also makes it not as scalable as a replication solution. It also requires restarting the brokers to enable/disable it, as compared to simply starting/stopping a Connect cluster.

How can I show different kafka to my confluent?

I installed Confluent, and it comes with its own Kafka.
I want to switch from the bundled Kafka to a different one.
Which .properties (or other) file must I change to point at a different Kafka?
Thanks in advance.
In your Kafka Connect worker configuration, you need to set bootstrap.servers to point to the broker(s) on your source Kafka cluster.
You can only connect to one source Kafka cluster per Kafka Connect worker. If you need to stream data from multiple Kafka clusters, you would run multiple Kafka Connect workers.
Edit: If you're using the Confluent CLI, then the Kafka Connect worker config is taken from etc/schema-registry/connect-avro-distributed.properties.

Kafka and Kafka Connect deployment environment

If I already have Kafka running on premises, is Kafka Connect just a configuration on top of my existing Kafka, or does Kafka Connect require its own server/environment separate from that of my existing Kafka?
Kafka Connect is part of Apache Kafka, but it runs as a separate process, called a Kafka Connect Worker. Except in a sandbox environment, you would usually deploy it on a separate machine/node from your Kafka brokers.
This diagram shows conceptually how it runs, separate from your brokers:
You can run Kafka Connect on a single node, or as part of a cluster (for throughput and redundancy).
You can read more here about installation and configuration and architecture of Kafka Connect.
Kafka Connect is its own configuration on top of your bootstrap server's configuration.
For Kafka Connect you can choose between a standalone server or distributed Connect servers, and you'll have to update the corresponding properties file to point to your currently running Kafka server(s).
Look under {kafka-root}/config and you'll see the worker configuration files.
You'll basically update connect-standalone.properties or connect-distributed.properties based on your needs.
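For example, the key settings to change in connect-distributed.properties typically look like this sketch (broker addresses and names are placeholders):

```properties
# point the worker at your own brokers instead of localhost
bootstrap.servers = kafka-broker1:9092,kafka-broker2:9092

# workers sharing the same group.id form one Connect cluster
group.id = connect-cluster

# internal topics where Connect stores connector configs, offsets, and status
config.storage.topic = connect-configs
offset.storage.topic = connect-offsets
status.storage.topic = connect-status
```

In standalone mode only bootstrap.servers (and an offset storage file) is needed, since there is no worker group or internal topics to coordinate through.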

Run Kafka and Kafka-connect on different servers?

I want to know if Kafka and Kafka Connect can run on different servers. So a connector would be started on server A and send data from a Kafka topic on server B to HDFS or S3, etc. Thanks
Yes, and for production deployments this is typically recommended for resource reasons. Generally you'd deploy a cluster of Kafka brokers (3+ for HA), and then a cluster of Kafka Connect workers (as many as needed for throughput capacity / resilience), all on separate nodes.
For more details, see the Confluent Enterprise Reference Architecture.
Yes, you can do it.
I have my set of Kafka servers, and the Kafka Connect applications run on different machines and write data into HDFS. You have to list the brokers in bootstrap.servers in the worker properties file (config/connect-distributed.properties or config/connect-standalone.properties) instead of localhost:9092.