DC replication via Kafka Connect for Apache Ignite - apache-kafka

We are trying to implement the replication of data between two Ignite data clusters.
For this purpose, we are using Kafka Connect.
We have followed the things mentioned in this document -> https://dzone.com/articles/linking-apache-ignite-and-apache-kafka-for-highly
Everything is working fine till we use one cache and PUT operation.
But when I use the same for REMOVED operation, in the consumer thread of the connector, I can see the CacheEvent record, but the data is not being removed from the Sink Cluster nodes.
Can someone please help with this case?

It might be an issue with the Ignite Kafka integration. Try to collect all the relevant details and report to the Ignite community via the user list.
In the meantime, if the issue is not solved you can consider other replication options:
Replication via the certified GridGain Kafka Connect integration.
Datacenter replication feature by GridGain.

Related

Switching Kafka from Kraft to Zookeeper

I have a recent Kafka cluster which uses Kraft. I am facing some problems with it and it is possibly due to use of Kraft. I wish to switch to Zookeeper without losing data. Downtime is okay. How do I go about it?
I'm afraid there isn't a documented process to downgrade a cluster from KRaft to ZooKeeper.
If you've found an issue with KRaft, you should report it to the Kafka project via a Jira ticket so it can get fixed.
Assuming your KRaft cluster is somewhat functional, a way to preserve your data is to create a new cluster (running ZooKeeper) and use a tool like MirrorMaker to migrate your data.

Best practise and methods for Kafka parameters and monitoring

I am going to implement a Snowflake Kafka connector with the continuous ingestion of data to target database snowflake.
What are the best practices for :
Kafka for its clusters
Kafka and its related parameters
Monitoring resources
Kafka for its clusters
Run at least 3 brokers
Kafka and its related parameters
That's too broad and has nothing to do with running a Connect cluster or implementing one. The defaults are mostly fine. You can find the production recommendations in the Kafka documentation.
Monitoring resources
Use JMX. https://docs.confluent.io/platform/current/kafka/monitoring.html
going to implement a Snowflake Kafka connector
Snowflake already has a connector... I'd start by forking rather than making your own

Running Source Connector on Demand and Not Based on poll.interval.ms

I have a table that is updated once / twice a day, but I want the data to be pushed to Kafka immediately after the table is updated. Is it possible to avoid running the connector every poll.interval.ms, but rather to run it only after the table is updated (sync on demand or trigger the sync in some other way after the table update)
I apologize if this question is stupid... Can sink connector be running on one Kafka cluster, but pull messages from another Kafka cluster and insert them into Postgres. I'm not talking about replicating messages from Cluster A to Cluster B and then inserting messages from Cluster B to Postgres. I'm talking about Connector running on Cluster B but pulling messages from Cluster A and writing them to Postgres.
Thanks!
If you use log-based change data capture (Debezium, etc) then you capture changes as soon as they are there, without needing to re-query the database. If you use query-based CDC then you do have to query the database on a polling interval. For query-based vs log-based CDC see this blog or talk.
One option would be to use the Kafka Connect REST API to control the connector - but you're kind of going against the streaming paradigm here and will start to find awkward edges in doing this. For example, when do you decide to pause the connector? How do you determine that it's ingested all the changes? etc.
Using log-based CDC is low-impact on the source system and commonly the route that people go.
Kafka Connect does not run on your Kafka cluster. Kafka Connect runs as its own cluster. Physically, it can be co-located for purposes of dev/sandbox environment (this ref arch is useful for production). See also this talk "Running Kafka Connect".
So in your example, "Cluster B" is actually a Kafka Connect cluster - and it would be configured to read from Kafka cluster "A", and that is fine.

Kafka Connect with Amazon MSK

How do I use Kafka Connect adapters with Amazon MSK?
As per the AWS documentation, it supports Kafka connect but not documented about how to setup adapters and use it.
Edit Oct 2021: MSK Connect has been launched, see https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/
AFAIK Amazon MSK does not provide managed connectors, so you have to run them yourself. This is done by running the Kafka Connect worker process (a JVM) and then providing it one or more connector configurations to run.
From the point of view of a Kafka Connect worker it just needs a Kafka cluster to connect to; it shouldn't matter whether it's MSK or on-premises, since it's ultimately 'just' a consumer/producer underneath.
You can see more, including a live demo, here: https://rmoff.dev/bbuzz19-kafka-connect
For an example of configuring Kafka Connect to use a cloud-hosted Kafka platform (in this case, Confluent Cloud), see this article.
If you are interested in managed connectors in the Cloud, check out the connectors that are provided in Confluent Cloud.
Disclaimer: I work for Confluent :)
AWS now supports MSK Connect, a new feature of MSK service based on Kafka Connect allowing you to deploy managed Kafka connectors built for Kafka connect
Check the announcement here: https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/
There are two aspects to this
Kafka Connect is a framework which should be deployed separately from kafka brokers. MSK only provides kafka brokers. If you want to use Kafka Connect with MSK you would need to use EC2 instances and deploy the kafka binaries.Kafka Connect framework is bundled along with kafka
Coming to connectors if you donot have a confluent subscription or similar - I am afraid your choices get very limited. But having said you can always write your own connectors. Writing new connectors is not that difficult rather you can apply your business specific logic and be on your way quite quickly.

Migrating topics,ACL and messages from apache kafka to confluent platform

We are migrating our application from Apache Kafka to Confluent Platform .
Apache Kafka version:1.1.0
Confluent :4.1.0
Tried these options:
Manually copying the zookeeper logs and Kafka Logs- Not an optimal way
because of volume and data correctness.
Mirror Maker - This will replicate newly created topics and ACL. It will not
migrate old details in Apache Kafka
Please suggest better approaches on this.
You can keep your existing Kafka and Zookeeper installation.
Confluent does not change any way these run or manage data.
You can configure the REST Proxy, Schema Registry, Control Center, KSQL, etc. to use your existing bootstrap servers or Zookeeper connection; nothing should need migrated, you're only adding extra consumer/producer services which just happen to be provided by Confluent.
If you later plan on upgrading your brokers, then you can start up new ones from the Confluent package, migrate the partitions, then shut down the old ones. Similarly for Zookeeper, but make sure that you have at least 2 up during this process, and always have an odd number of them available after your transition