Use multiple collections with MongoDB Kafka Connector

According to the documentation, if you don't provide a value it will read from all collections:
"name of the collection in the database to watch for changes. If not set then all collections will be watched."
I checked the connector source code and confirmed this:
https://github.com/mongodb/mongo-kafka/blob/k133/src/main/java/com/mongodb/kafka/connect/source/MongoSourceTask.java#L462
However, if the collection is not provided, I get an error like this:
ERROR WorkerSourceTask{id=mongo-source-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:186)
org.apache.kafka.connect.errors.ConnectException: com.mongodb.MongoCommandException: Command failed with error 73 (InvalidNamespace): '{aggregate: 1} is not valid for '$changeStream'; a collection is required.' on server localhost:27018. The full response is {"operationTime": {"$timestamp": {"t": 1603928795, "i": 1}}, "ok": 0.0, "errmsg": "{aggregate: 1} is not valid for '$changeStream'; a collection is required.", "code": 73, "codeName": "InvalidNamespace", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1603928795, "i": 1}}, "signature": {"hash": {"$binary": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type": "00"}, "keyId": {"$numberLong": "0"}}}}
This is my configuration file:
name=mongo-source
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
tasks.max=1
# Connection and source configuration
connection.uri=mongodb://localhost:27017,localhost:27018/order
database=order
collection=
topic.prefix=redemption
poll.max.batch.size=1000
poll.await.time.ms=5000
# Change stream options
pipeline=[]
batch.size=0
change.stream.full.document=updateLookup
collation=
copy.existing=true
errors.tolerance=all
If a collection is specified, I'm able to use the connector and generate topics.
Looking at the logs, it appears the connector is connecting to the db:
INFO Watching for database changes on 'order' (com.mongodb.kafka.connect.source.MongoSourceTask:620)
Source code:
else if (collection.isEmpty()) {
    LOGGER.info("Watching for database changes on '{}'", database);
    MongoDatabase db = mongoClient.getDatabase(database);
    changeStream = pipeline.map(db::watch).orElse(db.watch());
} else
If I go to my mongo console, I see the following:
rs0:SECONDARY> db.watch()
2020-10-28T18:13:50.344-0600 E QUERY [thread1] TypeError: db.watch is not a function :
@(shell):1:1
rs0:SECONDARY> db.watch
test.watch

I was using MongoDB 3.6, which supports watching collections but does not support watching databases or deployments (instances), which is why I was getting those errors.
I found this in the documentation:
Starting in MongoDB 4.0, you can open a change stream cursor for a single database (excluding admin, local, and config database) to watch for changes to all its non-system collections.
https://docs.mongodb.com/manual/changeStreams/#watch-collection-database-deployment
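For illustration, this is what the same test looks like in the mongo shell once you are on MongoDB 4.0 or later (a minimal sketch, assuming a replica set and the order database from the configuration above):

use order
// On 4.0+, db.watch() opens a change stream over all non-system
// collections in the current database.
watchCursor = db.watch()
while (!watchCursor.isExhausted()) {
    if (watchCursor.hasNext()) {
        printjson(watchCursor.next())
    }
}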

You can listen to change streams from multiple MongoDB collections. You just need to provide a regex for the collection names in the pipeline; you can even provide a regex for database names if you have multiple databases to listen to.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collections_.*/}}]}}]"
You can even exclude any database you don't want to listen to using $nin.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^collections_.*/}}]}}]"
Here is the complete Kafka connector configuration.
Mongo to Kafka source connector
{
  "name": "mongo-to-kafka-connect",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "publish.full.document.only": "true",
    "tasks.max": "3",
    "key.converter.schemas.enable": "false",
    "topic.creation.enable": "true",
    "poll.await.time.ms": 1000,
    "poll.max.batch.size": 100,
    "topic.prefix": "any prefix for topic name",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "connection.uri": "mongodb://<username>:<password>@ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
    "value.converter.schemas.enable": "false",
    "copy.existing": "true",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 3,
    "topic.creation.compacted.cleanup.policy": "compact",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mongo.errors.log.enable": "true",
    "heartbeat.interval.ms": 10000,
    "pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collections_.*/}}]}}]"
  }
}
You can get more details from the official docs:
https://www.mongodb.com/docs/kafka-connector/current/source-connector/
https://docs.confluent.io/platform/current/connect/index.html

Related

For the Kafka MongoDB sink connector, I want to map the same ObjectId that already exists

I am using the Kafka sink connector for MongoDB. I want to push a JSON document from a Kafka topic to MongoDB, but I am getting an error when using $oid in the document.
Below is the error:
{"name":"mongodb-sink-connector","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"FAILED","worker_id":"localhost:8083","trace":"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:610)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:330)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:237)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.kafka.connect.errors.DataException: Failed to write mongodb documents\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:227)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1541)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.put(MongoSinkTask.java:122)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:582)\n\t... 10 more\nCaused by: java.lang.IllegalArgumentException: Invalid BSON field name $oid\n\tat org.bson.AbstractBsonWriter.writeName(AbstractBsonWriter.java:534)\n\tat com.mongodb.internal.connection.BsonWriterDecorator.writeName(BsonWriterDecorator.java:193)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:117)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat org.bson.codecs.EncoderContext.encodeWithChildContext(EncoderContext.java:91)\n\tat org.bson.codecs.BsonDocumentCodec.writeValue(BsonDocumentCodec.java:139)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:118)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:221)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:187)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:63)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:29)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writeDocument(BsonWriterHelper.java:77)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writePayload(BsonWriterHelper.java:59)\n\tat com.mongodb.internal.connection.CommandMessage.encodeMessageBodyWithMetadata(CommandMessage.java:162)\n\tat com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:138)\n\tat com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:59)\n\tat com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:268)\n\tat com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:100)\n\tat 
com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:490)\n\tat com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:71)\n\tat com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:253)\n\tat com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:202)\n\tat com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:118)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeCommand(MixedBulkWriteOperation.java:431)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:251)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:194)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.OperationHelper.withReleasableConnection(OperationHelper.java:621)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:187)\n\tat com.mongodb.client.internal.MongoCollectionImpl.executeBulkWrite(MongoCollectionImpl.java:442)\n\tat com.mongodb.client.internal.MongoCollectionImpl.bulkWrite(MongoCollectionImpl.java:422)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:209)\n\t... 13 more\n"}],"type":"sink"}
Below is the document I inserted into the Kafka topic:
{"_id": {"$oid": "634fd99b52281517a468f3a7"},"schema": {"type": "struct", "fields": [{"type": "int32","optional": true, "field": "id"}, {"type": "string", "optional": true, "field": "name"}, {"type": "string", "optional": true, "field": "middel_name"}, {"type": "string", "optional": true, "field": "surname"}],"optional": false, "name": "foobar"},"payload": {"id":45,"name":"mongo","middle_name": "mmp","surname": "kafka"}}
Below are the connector settings I have used:
{
  "name": "mongodb-sink-connector",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "migration-mongo",
    "connection.uri": "mongodb://abc:xyz@xx.xx.xx.01:27018,xx.xx.xx.02:27018,xx.xx.xx.03:27018/?authSource=admin&replicaSet=dev",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "document.id.strategy.overwrite.existing": "false",
    "validate.non.null": false,
    "database": "foo",
    "collection": "product"
  }
}
Kafka Connect JSONConverter payloads should only have schema and payload fields, not _id, and you need "value.converter.schemas.enable": "true". If you set that to false, then you can remove schema and payload and put _id directly in the payload...
The ID used by the Mongo client is more commonly associated with the Kafka record key itself, not with values embedded within the value part you've shown, but this depends on the ID strategy.
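To illustrate the first option, here is a sketch of the same record value reshaped so that only schema and payload appear at the top level (the top-level _id envelope field is dropped, and the misspelled middel_name schema field is normalized to match the payload; this is an assumed reshaping, not a verified fix for preserving the ObjectId):

{
  "schema": {
    "type": "struct",
    "fields": [
      {"type": "int32", "optional": true, "field": "id"},
      {"type": "string", "optional": true, "field": "name"},
      {"type": "string", "optional": true, "field": "middle_name"},
      {"type": "string", "optional": true, "field": "surname"}
    ],
    "optional": false,
    "name": "foobar"
  },
  "payload": {"id": 45, "name": "mongo", "middle_name": "mmp", "surname": "kafka"}
}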

MongoDB Kafka Connect - Sink connector failing on updates

I am new to Kafka Connect.
I am trying to sync the change stream from one MongoDB collection to another using Kafka connectors, for both insert and update operations.
Source config:
{
  "name": "mongo-sourceV2",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb://mongo1:27017/?replicaSet=rs0",
    "database": "quickstart",
    "collection": "transactionV2",
    "pipeline": "[{\"$match\":{\"operationType\": { \"$in\": [ \"update\",\"insert\" ]}}}]"
  }
}
Sink config:
{
  "name": "mongo-sinkV2",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "connection.uri": "mongodb://mongo1:27017/?replicaSet=rs0",
    "database": "quickstart",
    "collection": "transactionV1",
    "topics": "quickstart.transactionV2",
    "errors.tolerance": "all",
    "errors.log.enable": true,
    "mongo.errors.tolerance": "all",
    "mongo.errors.log.enable": true,
    "change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler"
  }
}
Kafka topic event for an update:
{"schema":{"type":"string","optional":false},"payload":"{\"_id\": {\"_data\": \"8262A512F7000000012B022C0100296E5A1004195DB8CC822F4A4FAE4ECCE5917B98A946645F6964006462A512A5F74E67E722B3B6760004\"}, \"operationType\": \"update\", \"clusterTime\": {\"$timestamp\": {\"t\": 1654985463, \"i\": 1}}, \"ns\": {\"db\": \"quickstart\", \"coll\": \"transactionV2\"}, \"documentKey\": {\"_id\": {\"$oid\": \"62a512a5f74e67e722b3b676\"}}, \"updateDescription\": {\"updatedFields\": {\"amount\": 10001}, \"removedFields\": [], \"truncatedArrays\": []}}"}
My inserts are streaming fine, but the updates are failing on the sink connector side with this exception:
[2022-06-11 22:11:07,195] ERROR Unable to process record SinkRecord{kafkaOffset=9, timestampType=CreateTime} ConnectRecord{topic='quickstart.transactionV2', kafkaPartition=0, key={"_id": {"_data": "8262A512F7000000012B022C0100296E5A1004195DB8CC822F4A4FAE4ECCE5917B98A946645F6964006462A512A5F74E67E722B3B6760004"}}, keySchema=Schema{STRING}, value={"_id": {"_data": "8262A512F7000000012B022C0100296E5A1004195DB8CC822F4A4FAE4ECCE5917B98A946645F6964006462A512A5F74E67E722B3B6760004"}, "operationType": "update", "clusterTime": {"$timestamp": {"t": 1654985463, "i": 1}}, "ns": {"db": "quickstart", "coll": "transactionV2"}, "documentKey": {"_id": {"$oid": "62a512a5f74e67e722b3b676"}}, "updateDescription": {"updatedFields": {"amount": 10001}, "removedFields": [], "truncatedArrays": []}}, valueSchema=Schema{STRING}, timestamp=1654985467191, headers=ConnectHeaders(headers=)} (com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData)
org.apache.kafka.connect.errors.DataException: Warning unexpected field(s) in updateDescription [truncatedArrays]. {"updatedFields": {"amount": 10001}, "removedFields": [], "truncatedArrays": []}. Cannot process due to risk of data loss.
at com.mongodb.kafka.connect.sink.cdc.mongodb.operations.OperationHelper.getUpdateDocument(OperationHelper.java:99)
at com.mongodb.kafka.connect.sink.cdc.mongodb.operations.Update.perform(Update.java:57)
at com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler.handle(ChangeStreamHandler.java:84)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.lambda$buildWriteModelCDC$3(MongoProcessedSinkRecordData.java:99)
at java.base/java.util.Optional.flatMap(Optional.java:294)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.lambda$buildWriteModelCDC$4(MongoProcessedSinkRecordData.java:99)
I got the answer from the MongoDB community: the issue is in the pipeline.
https://www.mongodb.com/community/forums/t/mongodb-kafka-connect-changestreamhandler-do-not-support-truncatedarrays/169214/3
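Based on that thread, one possible workaround (a sketch only, assuming MongoDB 4.2+ so that the $unset stage is allowed in change stream pipelines) is to strip the unexpected truncatedArrays field in the source connector's pipeline, reusing the existing $match stage, so the ChangeStreamHandler never sees it:

"pipeline": "[{\"$match\":{\"operationType\":{\"$in\":[\"update\",\"insert\"]}}},{\"$unset\":\"updateDescription.truncatedArrays\"}]"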

Multiple collections from MongoDB to a Kafka topic

The application writes data every month to a new collection (for example, journal_2205, journal_2206). Is it possible to configure the connector so that it reads the oplog from the new collection and writes to one topic? I use this connector:
https://www.mongodb.com/docs/kafka-connector/current/source-connector/
Thank you!
Yes, this is possible: you can listen to change streams from multiple MongoDB collections. You just need to provide a regex for the collection names in the pipeline; you can even provide a regex for database names if you have multiple databases.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^journal_.*/}}]}}]"
You can even exclude any database you don't want to listen to using $nin.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^journal_.*/}}]}}]"
Here is the complete Kafka connector configuration.
Mongo to Kafka source connector
{
  "name": "mongo-to-kafka-connect",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "publish.full.document.only": "true",
    "tasks.max": "3",
    "key.converter.schemas.enable": "false",
    "topic.creation.enable": "true",
    "poll.await.time.ms": 1000,
    "poll.max.batch.size": 100,
    "topic.prefix": "any prefix for topic name",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "connection.uri": "mongodb://<username>:<password>@ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
    "value.converter.schemas.enable": "false",
    "copy.existing": "true",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 3,
    "topic.creation.compacted.cleanup.policy": "compact",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mongo.errors.log.enable": "true",
    "heartbeat.interval.ms": 10000,
    "pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^journal_.*/}}]}}]"
  }
}
You can get more details from the official docs:
https://www.mongodb.com/docs/kafka-connector/current/source-connector/
https://docs.confluent.io/platform/current/connect/index.html

Debezium MongoDB connector can't read the oplog, even when using change streams

I need to implement a CDC pattern but can't get it to work: my Debezium worker is up and running, but my connectors still fail, despite my efforts.
I've tested a simple watch on my Mongo cluster and it works:
watchCursor = db.mydb.watch()
while (!watchCursor.isExhausted()) {
    if (watchCursor.hasNext()) {
        print(watchCursor.next());
    }
}
So I can tell that my user has the rights on the cluster to watch change streams.
I still get an error when running my tasks:
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "10.114.129.247:8083",
"trace": "org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.\n\tat io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:115)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: io.debezium.DebeziumException: org.apache.kafka.connect.errors.ConnectException: Error while attempting to get oplog position\n\tat io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:85)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:153)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:135)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:108)\n\t... 5 more\nCaused by: org.apache.kafka.connect.errors.ConnectException: Error while attempting to get oplog position\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.lambda$establishConnectionToPrimary$3(MongoDbSnapshotChangeEventSource.java:234)\n\tat io.debezium.connector.mongodb.ConnectionContext$MongoPrimary.execute(ConnectionContext.java:292)\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.lambda$determineSnapshotOffsets$6(MongoDbSnapshotChangeEventSource.java:295)\n\tat java.base/java.util.HashMap$Values.forEach(HashMap.java:976)\n\tat io.debezium.connector.mongodb.ReplicaSets.onEachReplicaSet(ReplicaSets.java:115)\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.determineSnapshotOffsets(MongoDbSnapshotChangeEventSource.java:290)\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.doExecute(MongoDbSnapshotChangeEventSource.java:99)\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.doExecute(MongoDbSnapshotChangeEventSource.java:52)\n\tat io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:76)\n\t... 
8 more\nCaused by: com.mongodb.MongoQueryException: Query failed with error code 13 and error message 'not authorized on local to execute command { find: \"oplog.rs\", filter: {}, sort: { $natural: -1 }, limit: 1, singleBatch: true, $db: \"local\", lsid: { id: UUID(\"de332d4a-32ec-424f-ae09-b32376444d11\") }, $readPreference: { mode: \"primaryPreferred\" } }' on server rc1a-REDACTED.mdb.yandexcloud.net:27018\n\tat com.mongodb.internal.operation.FindOperation$1.call(FindOperation.java:663)\n\tat com.mongodb.internal.operation.FindOperation$1.call(FindOperation.java:653)\n\tat com.mongodb.internal.operation.OperationHelper.withReadConnectionSource(OperationHelper.java:583)\n\tat com.mongodb.internal.operation.FindOperation.execute(FindOperation.java:653)\n\tat com.mongodb.internal.operation.FindOperation.execute(FindOperation.java:81)\n\tat com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:184)\n\tat com.mongodb.client.internal.FindIterableImpl.first(FindIterableImpl.java:200)\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.lambda$determineSnapshotOffsets$5(MongoDbSnapshotChangeEventSource.java:297)\n\tat io.debezium.connector.mongodb.ConnectionContext$MongoPrimary.execute(ConnectionContext.java:288)\n\t... 15 more\n"
}
]
Simply put, I do not have rights on oplog.rs, even though I'm not using the oplog but change streams.
Here is my configuration:
{
  "name": "account3-connector",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "errors.log.include.messages": "true",
    "transforms.unwrap.delete.handling.mode": "rewrite",
    "mongodb.password": "REDACTED",
    "transforms": "unwrap,idToKey,extractIdKey",
    "capture.mode": "change_streams_update_full",
    "collection.include.list": "account.account",
    "mongodb.ssl.enabled": "false",
    "transforms.idToKey.fields": "id",
    "transforms.unwrap.type": "io.debezium.connector.mongodb.transforms.ExtractNewDocumentState",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "transforms.extractIdKey.field": "id",
    "database.include.list": "account",
    "errors.log.enable": "true",
    "mongodb.hosts": "rc1a-REDACTED.mdb.yandexcloud.net:27018,rc1b-REDACTED.mdb.yandexcloud.net:27018,rc1c-REDACTED.mdb.yandexcloud.net:27018",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "transforms.idToKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "mongodb.user": "debezium",
    "mongodb.name": "loyalty.raw.last",
    "key.converter.schemas.enable": "false",
    "internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "name": "account3-connector",
    "errors.tolerance": "all",
    "transforms.extractIdKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key"
  },
  "tasks": [],
  "type": "source"
}
Do any of you have an idea? What could be wrong?
I'm using Debezium 1.8.1 on the Confluent base image 6.2.0:
FROM confluentinc/cp-kafka-connect-base:6.2.0
I've run into the same problem. I don't have a fix, but it seems that during initialization the connector tries to query the oplog even when using change streams.
This is the full stack trace:
com.mongodb.MongoQueryException: Query failed with error code 13 with name 'Unauthorized' and error message 'not authorized on local to execute command { find: \"oplog.rs\", filter: {}, sort: { $natural: -1 }, limit: 1, singleBatch: true, $db: \"local\", lsid: { id: UUID(\"51399eae-bb7a-4706-b35c-048e878d23a8\") }, $readPreference: { mode: \"primaryPreferred\" } }' on server pl-0-westeurope-azure.pi8eh.mongodb.net:1025
at com.mongodb.internal.operation.FindOperation.lambda$execute$1(FindOperation.java:699)
at com.mongodb.internal.operation.OperationHelper.lambda$withSourceAndConnection$2(OperationHelper.java:566)
at com.mongodb.internal.operation.OperationHelper.withSuppliedResource(OperationHelper.java:591)
at com.mongodb.internal.operation.OperationHelper.lambda$withSourceAndConnection$3(OperationHelper.java:565)
at com.mongodb.internal.operation.OperationHelper.withSuppliedResource(OperationHelper.java:591)
at com.mongodb.internal.operation.OperationHelper.withSourceAndConnection(OperationHelper.java:564)
at com.mongodb.internal.operation.FindOperation.lambda$execute$2(FindOperation.java:690)
at com.mongodb.internal.async.function.RetryingSyncSupplier.get(RetryingSyncSupplier.java:65)
at com.mongodb.internal.operation.FindOperation.execute(FindOperation.java:722)
at com.mongodb.internal.operation.FindOperation.execute(FindOperation.java:86)
at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:191)
at com.mongodb.client.internal.FindIterableImpl.first(FindIterableImpl.java:213)
at io.debezium.connector.mongodb.MongoUtil.getOplogEntry(MongoUtil.java:236)
at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.lambda$initializeOffsets$4(MongoDbStreamingChangeEventSource.java:306)
at io.debezium.connector.mongodb.ConnectionContext$MongoPrimary.execute(ConnectionContext.java:317)
at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.lambda$initializeOffsets$5(MongoDbStreamingChangeEventSource.java:305)
at java.base/java.util.HashMap$Values.forEach(HashMap.java:977)
at io.debezium.connector.mongodb.ReplicaSets.onEachReplicaSet(ReplicaSets.java:120)
at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.initializeOffsets(MongoDbStreamingChangeEventSource.java:300)
at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.execute(MongoDbStreamingChangeEventSource.java:89)
at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.execute(MongoDbStreamingChangeEventSource.java:51)
at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:174)
at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:141)
at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:109)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
The MongoUtil.getOplogEntry method is called, which queries the oplog.rs collection in the local database.
My guess is that the user does not have permission to perform this query.
I don't know which permission is required, or whether it is possible to skip this step somehow.
From the documentation, it seems that granting permission to read the oplog should fix the problem, but I have not confirmed this yet.
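If you want to try that, a minimal sketch in the mongo shell would be to grant the connector's user read access to the local database, which contains oplog.rs (this assumes the debezium user from the configuration above is defined in the admin database and that you have an administrative session; it is an unverified workaround):

use admin
// grant read on the 'local' database so the connector can query oplog.rs
db.grantRolesToUser("debezium", [ { role: "read", db: "local" } ])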

Error handling for invalid JSON in kafka sink connector

I have a sink connector for MongoDB that takes JSON from a topic and puts it into a MongoDB collection. But when I send invalid JSON from a producer to that topic (e.g. with an unescaped special character ", as in {"id":1,"name":"\"}), the connector stops. I tried using errors.tolerance = all, but the same thing happens. What should happen is that the connector skips and logs the invalid JSON and keeps running. My distributed-mode connector config is as follows:
{
  "name": "sink-mongonew_test1",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "error7",
    "connection.uri": "mongodb://****:27017",
    "database": "abcd",
    "collection": "abc",
    "type.name": "kafka-connect",
    "key.ignore": "true",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "value.projection.list": "id",
    "value.projection.type": "whitelist",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneTimestampsStrategy",
    "delete.on.null.values": "false",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "false",
    "errors.tolerance": "all",
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    "errors.deadletterqueue.topic.name": "crm_data_deadletterqueue",
    "errors.deadletterqueue.topic.replication.factor": "1",
    "errors.deadletterqueue.context.headers.enable": "true"
  }
}
Since Apache Kafka 2.0, Kafka Connect has included error handling options, including the functionality to route messages to a dead letter queue, a common technique in building data pipelines.
https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/
As commented, you're using connect-api-1.0.1.*.jar, i.e. version 1.0.1, which explains why those properties are not working.
Your alternatives, outside of running a newer version of Kafka Connect, include NiFi or Spark Structured Streaming.