Using Amazon MSK and Debezium SQL Server Connector. Error while fetching metadata with correlation id 7 : {TestKafkaDB=UNKNOWN_TOPIC_OR_PARTITION}

I was trying to connect my RDS MS SQL Server instance to a Kafka cluster on Amazon MSK using the Debezium SQL Server Connector to stream changes.
I configured the connector and the Kafka Connect worker, then ran Connect with
bin/connect-standalone.sh ../worker.properties connect/dbzmmssql.properties
and got:
WARN [Producer clientId=producer-1] Error while fetching metadata with correlation id 10 : {TestKafkaDB=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1031)

I've solved this problem and just want to share my solution with others who are new to Kafka.
TestKafkaDB=UNKNOWN_TOPIC_OR_PARTITION basically means the connector couldn't find a usable topic on the Kafka broker. The reason I was facing this is that the broker didn't automatically create a new topic for the stream.
To solve this, I changed the cluster configuration in the AWS MSK console: I changed auto.create.topics.enable from its default of false to true and applied the updated configuration to the cluster. That solved my problem.
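For anyone hitting the same thing, the change amounts to creating a new MSK configuration revision and applying it to the cluster. A rough sketch with the AWS CLI (the ARNs and version value are placeholders you'd substitute; double-check the flags against the MSK docs):

# msk-config.properties -- server properties for the new revision
auto.create.topics.enable=true

# create the configuration (or a new revision of an existing one)
aws kafka create-configuration --name my-msk-config \
    --server-properties fileb://msk-config.properties

# apply it to the cluster
aws kafka update-cluster-configuration \
    --cluster-arn <your-cluster-arn> \
    --configuration-info Arn=<configuration-arn>,Revision=1 \
    --current-version <current-cluster-version>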

Related

How to make a Data Pipeline from MQTT to KAFKA Broker to MongoDB?

How can I make a data pipeline? I am sending data from MQTT to a Kafka topic using a source connector, and on the other side I have connected the Kafka broker to MongoDB using a sink connector. Both connectors work properly individually, but I am having trouble making a single pipeline that goes from MQTT to Kafka and then to MongoDB. How can I integrate them?
[Screenshots: the MQTT source connector config (node 1), a message published from MQTT, the Kafka consumer output, and the MongoDB sink connector config (node 2) with the resulting MongoDB data]
It is hard to tell what exactly the problem is without more logs. Please provide your connect config as well, and check the /status endpoint of your connectors. I still don't understand exactly what issue you are facing: you are saying that the MQTT source connector sends messages successfully to the Kafka topic, and your MongoDB sink connector successfully reads that Kafka topic and writes to your MongoDB, hence your pipeline. Where is the error? Is your Kafka the same Kafka, or two separate Kafka clusters? Both look like localhost, but is it the same machine?
Please elaborate and explain what you are expecting. What does "pipeline" mean in your words?
You need both connectors to share the same Kafka cluster. What do node 1 and node 2 mean: are they separate Kafka instances? Your connectors need to connect to the same Kafka node/cluster in order to share the data inside the Kafka topic, one for input and one for output. Share your bootstrap servers parameters, and share your Kafka server.properties as well.
In order to run two different Connect clusters against the same Kafka cluster, you need to set different internal topics for each Connect cluster (see the sketch after this list):
config.storage.topic
offset.storage.topic
status.storage.topic
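A minimal sketch of the relevant lines in each worker's properties file (the topic names and group ids here are made up for illustration; each Connect cluster also needs its own group.id):

# worker-a.properties -- Connect cluster A
group.id=connect-cluster-a
config.storage.topic=connect-a-configs
offset.storage.topic=connect-a-offsets
status.storage.topic=connect-a-status

# worker-b.properties -- Connect cluster B (must not share any of these with A)
group.id=connect-cluster-b
config.storage.topic=connect-b-configs
offset.storage.topic=connect-b-offsets
status.storage.topic=connect-b-status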

Using Debezium MySQL connector with MSK shows "INVALID_REPLICATION_FACTOR"

I'm using Debezium MySQL with MSK, a very simple setup. The connection to MySQL (Aurora) tests fine, and Kafka topic creation and listing both work.
However, when I run the connector, after a lot of INFO output scrolls by, I get
WARN [Producer clientId=xxx] Error while fetching metadata with correlation id 1 : {xxx.xxx=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient:1100)
Many of these keep showing up and the connector does not work properly.
After quite a while I found out the reason: the default replication factor on MSK follows the Kafka best practice of 3, but I had only created 2 brokers.
The configuration stayed at 3, so when the connector tried to auto-create a topic with 3 replicas, it failed. The strange thing is that even when I manually created the topic with replication factor 2, the connector would throw the very same warning; it seems the Debezium connector always attempts to create its internal topics.
Creating a new configuration revision that sets the replication factor to 2 solved the problem.
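The revision itself is just a one-line server-properties change, applied to the cluster the same way as any other MSK configuration update; a minimal sketch:

# server properties for the new MSK configuration revision
# with only 2 brokers, the default replication factor must not exceed 2
default.replication.factor=2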

kafka connect mongo on kafka MSK

I am using Kafka MSK in AWS, so we don't have a native Kafka Connect with all the required connectors like on Confluent.
I'm working with the Kafka MongoDB connector and I want to find a way to push the connector jar onto an instance of the Kafka MSK cluster.
The path the jar should be pushed to is the plugin.path as defined in the worker properties.
Any way to do this, please?
MSK doesn't give you a hosted Kafka Connect worker. You'd need to provision and run this yourself, e.g. on EC2. This worker would then connect to your Kafka cluster (MSK in this case).
To be clear: MSK is only the hosted Kafka brokers (and ZooKeeper). It does not include Kafka Connect, which is what you need in order to run connectors.
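A minimal sketch of the worker properties you'd run on that EC2 instance (the broker endpoints and plugin directory are placeholders; this assumes standalone mode):

# worker.properties on the EC2 instance
bootstrap.servers=b-1.mycluster.xxxx.kafka.us-east-1.amazonaws.com:9092,b-2.mycluster.xxxx.kafka.us-east-1.amazonaws.com:9092
# directory where you drop the MongoDB connector jar(s)
plugin.path=/opt/kafka/plugins
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# standalone mode stores offsets in a local file
offset.storage.file.filename=/tmp/connect.offsets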

Kafka-MongoDB Debezium Connector : distributed mode

I am working on the Debezium MongoDB source connector. Can I run the connector on my local machine in distributed mode, giving the Kafka bootstrap server address of a remote machine (deployed in Kubernetes) and a remote MongoDB URL?
I tried this and the connector starts successfully with no errors, just a few warnings, but no data flows from MongoDB.
I'm using the command below to run the connector:
./bin/connect-distributed ./etc/schema-registry/connect-avro-distributed.properties ./etc/kafka/connect-mongodb-source.properties
If not, how else can I achieve this? I do not want to install a local Kafka or MongoDB as most of the tutorials suggest; I want to use our test servers for this.
I followed this tutorial: https://medium.com/tech-that-works/cloud-kafka-connector-for-mongodb-source-8b525b779772
Below are more details on the issue.
The connector starts fine; I see the lines below at the end of the connector log:
INFO [Worker clientId=connect-1, groupId=connect-cluster] Starting connectors and tasks using config offset -1 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1000)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1021)
I have also defined MongoDB config in /etc/kafka/connect-mongodb-source.properties as follows
name=mongodb-source-connector
connector.class=io.debezium.connector.mongodb.MongoDbConnector
mongodb.hosts=/remoteserveraddress:27017
mongodb.name=mongo_conn
initial.sync.max.threads=1
tasks.max=1
But data is not flowing between MongoDB and Kafka. I have also posted a separate question for this: Kafka-MongoDB Debezium Connector : distributed mode
Any pointers are appreciated.
connect-distributed only accepts a single property file.
You must use the REST API to configure Kafka Connect in Distributed mode.
https://docs.confluent.io/current/connect/references/restapi.html
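For example, the connector config from the question could be submitted to the worker's REST endpoint (assuming it listens on the default port 8083) like this:

curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "mongodb-source-connector",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.hosts": "/remoteserveraddress:27017",
    "mongodb.name": "mongo_conn",
    "initial.sync.max.threads": "1",
    "tasks.max": "1"
  }
}'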
Note: by default, the consumer will read the latest data off the topic, not existing data.
You would add this to the connect-avro-distributed.properties to fix it
consumer.auto.offset.reset=earliest

How to use Kafka connect to transmit data to Kafka broker in another machine?

I'm trying to use Kafka Connect in Confluent Platform 3.2.1 and everything works fine in my local env. Then I encountered this problem when I tried to use a Kafka source connector to send data to another machine.
I deployed the Kafka JDBC source connector on machine A to capture database A, and deployed a Kafka broker B (along with ZooKeeper and Schema Registry) on machine B. The source connector cannot send data to broker B and throws the following exception:
[2017-05-19 16:37:22,709] ERROR Failed to commit offsets for WorkerSourceTask{id=test-multi-0} (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:112)
[2017-05-19 16:38:27,711] ERROR Failed to flush WorkerSourceTask{id=test-multi-0}, timed out while waiting for producer to flush outstanding 3 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:304)
I tried configuring the server.properties on broker B like this:
listeners=PLAINTEXT://:9092
and left the advertised.listeners setting commented out.
Then I used
bootstrap.servers=192.168.19.234:9092
in my source connector where 192.168.19.234 is the IP of machine B. Machine A and B are in the same subnet.
I suspect this has something to do with my server.properties.
How should I configure things to get this working? Thanks in advance.
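For what it's worth, a broker that has to be reachable from other machines generally needs advertised.listeners set to an address those machines can resolve; with it commented out, the broker advertises its own hostname, which machine A may not be able to resolve. A minimal sketch for broker B's server.properties, assuming machine B's IP from the question:

listeners=PLAINTEXT://:9092
# advertise an address machine A can actually reach
advertised.listeners=PLAINTEXT://192.168.19.234:9092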