Debezium Mongo Connector can't read oplog, even using changestream - mongodb

I need to implement a CDC pattern but can't manage to make it work : my debezium worker is up and running, but my connectors still fail, despite my efforts.
I've tested a simple "watch" on my mongo cluster and it works :
watchCursor = db.mydb.watch()
while (!watchCursor.isExhausted()){
if (watchCursor.hasNext()){
print(watchCursor.next());
}
}
So, i can tell that my user has rights on the clusters to watch changestreams.
I still have an error when running my tasks :
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "10.114.129.247:8083",
"trace": "org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.\n\tat io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:115)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: io.debezium.DebeziumException: org.apache.kafka.connect.errors.ConnectException: Error while attempting to get oplog position\n\tat io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:85)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:153)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:135)\n\tat io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:108)\n\t... 5 more\nCaused by: org.apache.kafka.connect.errors.ConnectException: Error while attempting to get oplog position\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.lambda$establishConnectionToPrimary$3(MongoDbSnapshotChangeEventSource.java:234)\n\tat io.debezium.connector.mongodb.ConnectionContext$MongoPrimary.execute(ConnectionContext.java:292)\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.lambda$determineSnapshotOffsets$6(MongoDbSnapshotChangeEventSource.java:295)\n\tat java.base/java.util.HashMap$Values.forEach(HashMap.java:976)\n\tat io.debezium.connector.mongodb.ReplicaSets.onEachReplicaSet(ReplicaSets.java:115)\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.determineSnapshotOffsets(MongoDbSnapshotChangeEventSource.java:290)\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.doExecute(MongoDbSnapshotChangeEventSource.java:99)\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.doExecute(MongoDbSnapshotChangeEventSource.java:52)\n\tat io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:76)\n\t... 8 more\nCaused by: com.mongodb.MongoQueryException: Query failed with error code 13 and error message 'not authorized on local to execute command { find: \"oplog.rs\", filter: {}, sort: { $natural: -1 }, limit: 1, singleBatch: true, $db: \"local\", lsid: { id: UUID(\"de332d4a-32ec-424f-ae09-b32376444d11\") }, $readPreference: { mode: \"primaryPreferred\" } }' on server rc1a-REDACTED.mdb.yandexcloud.net:27018\n\tat com.mongodb.internal.operation.FindOperation$1.call(FindOperation.java:663)\n\tat com.mongodb.internal.operation.FindOperation$1.call(FindOperation.java:653)\n\tat com.mongodb.internal.operation.OperationHelper.withReadConnectionSource(OperationHelper.java:583)\n\tat com.mongodb.internal.operation.FindOperation.execute(FindOperation.java:653)\n\tat com.mongodb.internal.operation.FindOperation.execute(FindOperation.java:81)\n\tat com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:184)\n\tat com.mongodb.client.internal.FindIterableImpl.first(FindIterableImpl.java:200)\n\tat io.debezium.connector.mongodb.MongoDbSnapshotChangeEventSource.lambda$determineSnapshotOffsets$5(MongoDbSnapshotChangeEventSource.java:297)\n\tat io.debezium.connector.mongodb.ConnectionContext$MongoPrimary.execute(ConnectionContext.java:288)\n\t... 15 more\n"
}
]
Simply said, i do not have rights on oplog.rs. Despite i'm not using oplog, but "changestream".
Here is my configuration
{
"name": "account3-connector",
"config": {
"connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
"errors.log.include.messages": "true",
"transforms.unwrap.delete.handling.mode": "rewrite",
"mongodb.password": "REDACTED",
"transforms": "unwrap,idToKey,extractIdKey",
"capture.mode": "change_streams_update_full",
"collection.include.list": "account.account",
"mongodb.ssl.enabled": "false",
"transforms.idToKey.fields": "id",
"transforms.unwrap.type": "io.debezium.connector.mongodb.transforms.ExtractNewDocumentState",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"transforms.extractIdKey.field": "id",
"database.include.list": "account",
"errors.log.enable": "true",
"mongodb.hosts": "rc1a-REDACTED.mdb.yandexcloud.net:27018,rc1b-REDACTED.mdb.yandexcloud.net:27018,rc1c-REDACTED.mdb.yandexcloud.net:27018",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"transforms.idToKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"mongodb.user": "debezium",
"mongodb.name": "loyalty.raw.last",
"key.converter.schemas.enable": "false",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "account3-connector",
"errors.tolerance": "all",
"transforms.extractIdKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key"
},
"tasks": [],
"type": "source"
}
Do any of you have an idea ? What could be wrong ?
I'm using Debezium 1.8.1 on Confluent base image 6.2.0
FROM confluentinc/cp-kafka-connect-base:6.2.0

I've run in to the same problem. I don't have a fix, but it seems that during initialization the connector tries to query the oplog even when using change streams.
This is the full stack trace:
com.mongodb.MongoQueryException: Query failed with error code 13 with name 'Unauthorized' and error message 'not authorized on local to execute command { find: \"oplog.rs\", filter: {}, sort: { $natural: -1 }, limit: 1, singleBatch: true, $db: \"local\", lsid: { id: UUID(\"51399eae-bb7a-4706-b35c-048e878d23a8\") }, $readPreference: { mode: \"primaryPreferred\" } }' on server pl-0-westeurope-azure.pi8eh.mongodb.net:1025
at com.mongodb.internal.operation.FindOperation.lambda$execute$1(FindOperation.java:699)
at com.mongodb.internal.operation.OperationHelper.lambda$withSourceAndConnection$2(OperationHelper.java:566)
at com.mongodb.internal.operation.OperationHelper.withSuppliedResource(OperationHelper.java:591)
at com.mongodb.internal.operation.OperationHelper.lambda$withSourceAndConnection$3(OperationHelper.java:565)
at com.mongodb.internal.operation.OperationHelper.withSuppliedResource(OperationHelper.java:591)
at com.mongodb.internal.operation.OperationHelper.withSourceAndConnection(OperationHelper.java:564)
at com.mongodb.internal.operation.FindOperation.lambda$execute$2(FindOperation.java:690)
at com.mongodb.internal.async.function.RetryingSyncSupplier.get(RetryingSyncSupplier.java:65)
at com.mongodb.internal.operation.FindOperation.execute(FindOperation.java:722)
at com.mongodb.internal.operation.FindOperation.execute(FindOperation.java:86)
at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:191)
at com.mongodb.client.internal.FindIterableImpl.first(FindIterableImpl.java:213)
at io.debezium.connector.mongodb.MongoUtil.getOplogEntry(MongoUtil.java:236)
at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.lambda$initializeOffsets$4(MongoDbStreamingChangeEventSource.java:306)
at io.debezium.connector.mongodb.ConnectionContext$MongoPrimary.execute(ConnectionContext.java:317)
at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.lambda$initializeOffsets$5(MongoDbStreamingChangeEventSource.java:305)
at java.base/java.util.HashMap$Values.forEach(HashMap.java:977)
at io.debezium.connector.mongodb.ReplicaSets.onEachReplicaSet(ReplicaSets.java:120)
at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.initializeOffsets(MongoDbStreamingChangeEventSource.java:300)
at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.execute(MongoDbStreamingChangeEventSource.java:89)
at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.execute(MongoDbStreamingChangeEventSource.java:51)
at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:174)
at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:141)
at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:109)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
The MongoUtil.getOplogEntry method is called which queries the "oplog.rs" collection from the "local" database.
My guess is that the user does not have permission to perform this query.
I don't which permission is required, or if it is possible to skip this step somehow.
From the documentation, it seems that adding permission to read the oplog should fix the problem, but I have not confirmed this yet.

Related

For Kafka mongo sink connector I want to map same Object ID which is existing

I am Using kafka sink connector for mongodb. Here I want to push some json document from kafka topic to mongodb, But I am facing error at using $oid in the document.
Below is the error:
{"name":"mongodb-sink-connector","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"FAILED","worker_id":"localhost:8083","trace":"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:610)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:330)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:237)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.kafka.connect.errors.DataException: Failed to write mongodb documents\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:227)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1541)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.put(MongoSinkTask.java:122)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:582)\n\t... 10 more\nCaused by: java.lang.IllegalArgumentException: Invalid BSON field name $oid\n\tat org.bson.AbstractBsonWriter.writeName(AbstractBsonWriter.java:534)\n\tat com.mongodb.internal.connection.BsonWriterDecorator.writeName(BsonWriterDecorator.java:193)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:117)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat org.bson.codecs.EncoderContext.encodeWithChildContext(EncoderContext.java:91)\n\tat org.bson.codecs.BsonDocumentCodec.writeValue(BsonDocumentCodec.java:139)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:118)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:221)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:187)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:63)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:29)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writeDocument(BsonWriterHelper.java:77)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writePayload(BsonWriterHelper.java:59)\n\tat com.mongodb.internal.connection.CommandMessage.encodeMessageBodyWithMetadata(CommandMessage.java:162)\n\tat com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:138)\n\tat com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:59)\n\tat com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:268)\n\tat com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:100)\n\tat com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:490)\n\tat com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:71)\n\tat com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:253)\n\tat com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:202)\n\tat com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:118)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeCommand(MixedBulkWriteOperation.java:431)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:251)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:194)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.OperationHelper.withReleasableConnection(OperationHelper.java:621)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:187)\n\tat com.mongodb.client.internal.MongoCollectionImpl.executeBulkWrite(MongoCollectionImpl.java:442)\n\tat com.mongodb.client.internal.MongoCollectionImpl.bulkWrite(MongoCollectionImpl.java:422)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:209)\n\t... 13 more\n"}],"type":"sink"}
Below is the document I inserted in kafka topic:
{"_id": {"$oid": "634fd99b52281517a468f3a7"},"schema": {"type": "struct", "fields": [{"type": "int32","optional": true, "field": "id"}, {"type": "string", "optional": true, "field": "name"}, {"type": "string", "optional": true, "field": "middel_name"}, {"type": "string", "optional": true, "field": "surname"}],"optional": false, "name": "foobar"},"payload": {"id":45,"name":"mongo","middle_name": "mmp","surname": "kafka"}}
Below is my Connector settings I have used:
{
"name": "mongodb-sink-connector",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "migration-mongo",
"connection.uri": "mongodb://abc:xyz#xx.xx.xx.01:27018,xx.xx.xx.02:27018,xx.xx.xx.03:27018/?authSource=admin&replicaSet=dev",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"document.id.strategy.overwrite.existing": "false",
"validate.non.null": false,
"database": "foo",
"collection": "product"
}
}
Kafka Connect JSONConverter payloads should only have schema and payload fields, not _id. And you need "value.converter.schemas.enable": "true". If you set that to false, then you can remove schema and payload, and put _id directly in the payload...
The ID used by the Mongo Client is more commonly associated with the Kafka record key itself, not any values embedded within the value part that you've shown, but this depends on the ID Strategy

MongoDB Kafka Connect - Sink connector failing on updates

I am new to Kafka connect.
I am trying sync the change stream from 1 mongo collection to another using Kafka connectors, both Inserts and updates operations
Source config-
{
"name": "mongo-sourceV2",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"connection.uri": "mongodb://mongo1:27017/?replicaSet=rs0",
"database": "quickstart",
"collection": "transactionV2",
"pipeline": "[{\"$match\":{\"operationType\": { \"$in\": [ \"update\",\"insert\" ]}}}]"
}}
Sink Config -
{
"name": "mongo-sinkV2",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"connection.uri": "mongodb://mongo1:27017/?replicaSet=rs0",
"database": "quickstart",
"collection": "transactionV1",
"topics": "quickstart.transactionV2",
"errors.tolerance": "all",
"errors.log.enable": true,
"mongo.errors.tolerance": "all",
"mongo.errors.log.enable": true,
"change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler"
}}
Kafka Topic Event for update -
{"schema":{"type":"string","optional":false},"payload":"{\"_id\": {\"_data\": \"8262A512F7000000012B022C0100296E5A1004195DB8CC822F4A4FAE4ECCE5917B98A946645F6964006462A512A5F74E67E722B3B6760004\"}, \"operationType\": \"update\", \"clusterTime\": {\"$timestamp\": {\"t\": 1654985463, \"i\": 1}}, \"ns\": {\"db\": \"quickstart\", \"coll\": \"transactionV2\"}, \"documentKey\": {\"_id\": {\"$oid\": \"62a512a5f74e67e722b3b676\"}}, \"updateDescription\": {\"updatedFields\": {\"amount\": 10001}, \"removedFields\": [], \"truncatedArrays\": []}}"}
My Inserts are streaming fine but the updates are failing on the sink connector side with Exception
[2022-06-11 22:11:07,195] ERROR Unable to process record SinkRecord{kafkaOffset=9, timestampType=CreateTime} ConnectRecord{topic='quickstart.transactionV2', kafkaPartition=0, key={"_id": {"_data": "8262A512F7000000012B022C0100296E5A1004195DB8CC822F4A4FAE4ECCE5917B98A946645F6964006462A512A5F74E67E722B3B6760004"}}, keySchema=Schema{STRING}, value={"_id": {"_data": "8262A512F7000000012B022C0100296E5A1004195DB8CC822F4A4FAE4ECCE5917B98A946645F6964006462A512A5F74E67E722B3B6760004"}, "operationType": "update", "clusterTime": {"$timestamp": {"t": 1654985463, "i": 1}}, "ns": {"db": "quickstart", "coll": "transactionV2"}, "documentKey": {"_id": {"$oid": "62a512a5f74e67e722b3b676"}}, "updateDescription": {"updatedFields": {"amount": 10001}, "removedFields": [], "truncatedArrays": []}}, valueSchema=Schema{STRING}, timestamp=1654985467191, headers=ConnectHeaders(headers=)} (com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData)
org.apache.kafka.connect.errors.DataException: Warning unexpected field(s) in updateDescription [truncatedArrays]. {"updatedFields": {"amount": 10001}, "removedFields": [], "truncatedArrays": []}. Cannot process due to risk of data loss.
at com.mongodb.kafka.connect.sink.cdc.mongodb.operations.OperationHelper.getUpdateDocument(OperationHelper.java:99)
at com.mongodb.kafka.connect.sink.cdc.mongodb.operations.Update.perform(Update.java:57)
at com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler.handle(ChangeStreamHandler.java:84)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.lambda$buildWriteModelCDC$3(MongoProcessedSinkRecordData.java:99)
at java.base/java.util.Optional.flatMap(Optional.java:294)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.lambda$buildWriteModelCDC$4(MongoProcessedSinkRecordData.java:99)
Got the update from mongodb community.
The issue is in pipeline
https://www.mongodb.com/community/forums/t/mongodb-kafka-connect-changestreamhandler-do-not-support-truncatedarrays/169214/3

Kafka Connect Mongo Sink connector is failing due to schema

I'm doing my first steps in the kafka-connect area and things are not going as expected.
I created a Mongo sink connector (kafka to mongo), but it fails to consume the messages with the following error:
Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:489)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:469)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic test_topic to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:114)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:489)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
I don't wanna use any schema but for some reason it fails on the deserialize, probably because the schema config is missing.
My configurations:
{
"name": "kafka-to-mongo",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"tasks.max": "1",
"topics": "test_topic",
"connection.uri": "mongodb://mongodb4:27017/test-db",
"database": "auto-resolve",
"value.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"collection": "test_collection",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"document.id.strategy.partial.value.projection.list": "id",
"document.id.strategy.partial.value.projection.type": "AllowList",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy"
}
}
Example of a message in the topic:
{
"_id": "62a6e4d88c1e1e0011902616",
"type": "alert",
"key": "test444",
"timestamp": 1655104728,
"source_system": "api.test",
"tags": [
{
"type": "aaa",
"value": "bbb"
},
{
"type": "fff",
"value": "rrr"
}
],
"ack": "no",
"main": "nochange",
"all_ids": []
}
Any ideas? Thanks!

Use multiple collections with MongoDB Kafka Connector

According with the documentation if you don't provide a value it will read from all collections
"name of the collection in the database to watch for changes. If not set then all collections will be watched."
I saw the connector source code and I confirmed this:
https://github.com/mongodb/mongo-kafka/blob/k133/src/main/java/com/mongodb/kafka/connect/source/MongoSourceTask.java#L462
However if the collection is not provided I got an error like this:
ERROR WorkerSourceTask{id=mongo-source-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:186)
org.apache.kafka.connect.errors.ConnectException: com.mongodb.MongoCommandException: Command failed with error 73 (InvalidNamespace): '{aggregate: 1} is not valid for '$changeStream'; a collection is required.' on server localhost:27018. The full response is {"operationTime": {"$timestamp": {"t": 1603928795, "i": 1}}, "ok": 0.0, "errmsg": "{aggregate: 1} is not valid for '$changeStream'; a collection is required.", "code": 73, "codeName": "InvalidNamespace", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1603928795, "i": 1}}, "signature": {"hash": {"$binary": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type": "00"}, "keyId": {"$numberLong": "0"}}}}
This is my configuration file
name=mongo-source
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
tasks.max=1
# Connection and source configuration
connection.uri=mongodb://localhost:27017,localhost:27018/order
database=order
collection=
topic.prefix=redemption
poll.max.batch.size=1000
poll.await.time.ms=5000
# Change stream options
pipeline=[]
batch.size=0
change.stream.full.document=updateLookup
collation=
copy.existing=true
errors.tolerance=all
If a collection is used, I'm able to use the connector and generate topics.
Seeing the logs it appears the connector is connecting to the db:
INFO Watching for database changes on 'order' (com.mongodb.kafka.connect.source.MongoSourceTask:620)
Source Code
else if (collection.isEmpty()) {
LOGGER.info("Watching for database changes on '{}'", database);
MongoDatabase db = mongoClient.getDatabase(database);
changeStream = pipeline.map(db::watch).orElse(db.watch());
} else
If I go to my mongo console, I'm having the following:
rs0:SECONDARY> db.watch()
2020-10-28T18:13:50.344-0600 E QUERY [thread1] TypeError: db.watch is not a function :
#(shell):1:1
rs0:SECONDARY> db.watch
test.watch
I was using mongo 3.6 version which supports to watch collections but doesn't support to watch databases or deployments (instances), therefore I was getting those errors.
I found this on the documentation:
Starting in MongoDB 4.0, you can open a change stream cursor for a single database (excluding admin, local, and config database) to watch for changes to all its non-system collections.
https://docs.mongodb.com/manual/changeStreams/#watch-collection-database-deployment
You can listen to multiple change streams from multiple mongo collections. You just need to provide the Regex for the collection names in pipeline, you can even provide the Regex for database names if you have multiple databases to listen to.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collections_.*/}}]}}]"
You can even exclude any given database using $nin, which you dont want to listen for any change-stream.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^collections_.*/}}]}}]"
Here is the complete Kafka connector configuration.
Mongo to Kafka source connector
{
"name": "mongo-to-kafka-connect",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"publish.full.document.only": "true",
"tasks.max": "3",
"key.converter.schemas.enable": "false",
"topic.creation.enable": "true",
"poll.await.time.ms": 1000,
"poll.max.batch.size": 100,
"topic.prefix": "any prefix for topic name",
"output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
"connection.uri": "mongodb://<username>:<password>#ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
"value.converter.schemas.enable": "false",
"copy.existing": "true",
"topic.creation.default.replication.factor": 3,
"topic.creation.default.partitions": 3,
"topic.creation.compacted.cleanup.policy": "compact",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"mongo.errors.log.enable": "true",
"heartbeat.interval.ms": 10000,
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collections_.*/}}]}}]"
}
}
You can get more details from official docs.
https://www.mongodb.com/docs/kafka-connector/current/source-connector/
https://docs.confluent.io/platform/current/connect/index.html

Kafka Connect JDBC Sink "schema does not exist"

I am tryin to sink data into postgresql with kafka connect but I am getting the error that the schema does not exist.
Is it possible that the name of the topic, that includes dots, makes the problem, because the error mentioned that the schema "logstash" does not exist, and this is the string till the first dot?
ERROR:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:568)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:326)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLException: org.postgresql.util.PSQLException: ERROR: schema \"logstash\" does not exist
Position: 14
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:546)
... 10 more
Caused by: java.sql.SQLException: org.postgresql.util.PSQLException: ERROR: schema \"logstash\" does not exist
Position: 14
... 12 more
Sink config:
{
"name": "jdbc.apache.access.log.sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"topics": "logstash.apache.access.log",
"connection.url": "jdbc:postgresql://<IP_OF_POSTGRESQL>:5432/kafka",
"connection.user": "kafka",
"connection.password": "<PASSWORD>",
"insert.mode": "upsert",
"pk.mode": "kafka",
"auto.create": true,
"auto.evolve": true,
"value.converter.schema.registry.url": "http://schema-registry:8081",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "true"
}
}
Schema (called with API):
{
"subject": "logstash.apache.access.log-value",
"version": 3,
"id": 3,
"schema": "{\"type\":\"record\",\"name\":\"log\",\"namespace\":\"value_logstash.apache.access\",\"fields\":[{\"name\":\"clientip\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"verb\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"response\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"request\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"bytes\",\"type\":[\"null\",\"string\"],\"default\":null}]}"
}
EDITED:
I tried to create a new topic with underscores. It looks like the dots are really the cause of the error. Is there any solution I can avoid it or do I made a mistake in my configuration...?
You should be able to use a RegexRouter SMT to rename a topic to something without periods before the Sink action of the database write.