MongoDB Kafka Sink Connector doesn't process the RenameByRegex processor

I need to listen to events from a Kafka topic and sink them to a collection in MongoDB. The message contains a nested object with an id property, like in the example below.
{
"testId": 1,
"foo": "bar",
"foos": [{ "id":"aaaaqqqq-rrrrr" }]
}
I'm trying to rename this nested id to _id with a regex:
{
"connector.class":"com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "test",
"connection.uri": "mongodb://mongo:27017",
"database": "test_db",
"collection": "test",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"value.projection.list":"testId",
"value.projection.type": "whitelist",
"post.processor.chain": "com.mongodb.kafka.connect.sink.processor.DocumentIdAdder, com.mongodb.kafka.connect.sink.processor.field.renaming.RenameByRegex",
"field.renamer.regexp": "[{\"regexp\":\"\b(id)\b\", \"pattern\":\"\b(id)\b\",\"replace\":\"_id\"}]"
}
And the result of a config/validate call is a 500 Internal Server Error with this message:
{
"error_code": 500,
"message": null
}
Am I missing something, or is this an issue?

I think all you want is a Kafka Connect Single Message Transform (SMT), more precisely ReplaceField:
Filter or rename fields within a Struct or Map.
The following will replace the id field name with _id:
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "id:_id"
In your case, before applying the above transformation you might also want to flatten foos:
"transforms": "flatten",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": "."
and finally apply the transformation for renaming the field:
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "foos.id:foos._id"

Only Map objects supported in absence of schema for record conversion to BigQuery format

I'm streaming data from Postgres to Kafka to BigQuery. Most tables in PG have a primary key; as such, most tables/topics have an Avro key and value schema, and these all go to BigQuery fine.
I do have a couple of tables that do not have a PK, and subsequently have no Avro key schema.
When I create a sink connector for those tables, the connector errors with:
Caused by: com.wepay.kafka.connect.bigquery.exception.ConversionConnectException: Only Map objects supported in absence of schema for record conversion to BigQuery format.
If I remove the 'key.converter' config, then I get a 'Top-level Kafka Connect schema must be of type 'struct'' error.
How do I handle this?
Here's the connector config for reference:
{
"project": "staging",
"defaultDataset": "data_lake",
"keyfile": "<redacted>",
"keySource": "JSON",
"sanitizeTopics": "true",
"kafkaKeyFieldName": "_kid",
"autoCreateTables": "true",
"allowNewBigQueryFields": "true",
"upsertEnabled": "false",
"bigQueryRetry": "5",
"bigQueryRetryWait": "120000",
"bigQueryPartitionDecorator": "false",
"name": "hd-sink-bq",
"connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
"tasks.max": "1",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "<redacted>",
"key.converter.basic.auth.credentials.source": "USER_INFO",
"key.converter.schema.registry.basic.auth.user.info": "<redacted>",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "<redacted>",
"value.converter.basic.auth.credentials.source": "USER_INFO",
"value.converter.schema.registry.basic.auth.user.info": "<redacted>",
"topics": "public.event_issues",
"errors.tolerance": "all",
"errors.log.include.messages": "true",
"errors.deadletterqueue.topic.name": "connect.bq-sink.deadletter",
"errors.deadletterqueue.topic.replication.factor": "1",
"errors.deadletterqueue.context.headers.enable": "true",
"transforms": "tombstoneHandler",
"offset.flush.timeout.ms": "300000",
"transforms.dropNullRecords.predicate": "isNullRecord",
"transforms.dropNullRecords.type": "org.apache.kafka.connect.transforms.Filter",
"transforms.tombstoneHandler.behavior": "drop_warn",
"transforms.tombstoneHandler.type": "io.aiven.kafka.connect.transforms.TombstoneHandler"
}
In my case, I used to handle this by using a predicate, as follows:
{
...
"predicates.isTombstone.type":
"org.apache.kafka.connect.transforms.predicates.RecordIsTombstone",
"predicates": "isTombstone",
"transforms.x.predicate":"isTombstone",
"transforms.x.negate":true
...
}
This is as per the docs here, and transforms.x.negate will skip such tombstone records.
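Applied to the config in the question, the same predicate mechanism might look roughly like this, reusing the dropNullRecords/isNullRecord names already present there. Here the Filter transform is applied only to tombstone records, so no negate is needed; this is a sketch, not a verified config:
"transforms": "dropNullRecords",
"transforms.dropNullRecords.type": "org.apache.kafka.connect.transforms.Filter",
"transforms.dropNullRecords.predicate": "isNullRecord",
"predicates": "isNullRecord",
"predicates.isNullRecord.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"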

For the Kafka Mongo sink connector, I want to map the same ObjectId that already exists

I am using the Kafka sink connector for MongoDB. I want to push some JSON documents from a Kafka topic to MongoDB, but I am facing an error when using $oid in the document.
Below is the error:
{"name":"mongodb-sink-connector","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"FAILED","worker_id":"localhost:8083","trace":"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:610)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:330)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:237)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.kafka.connect.errors.DataException: Failed to write mongodb documents\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:227)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1541)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.put(MongoSinkTask.java:122)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:582)\n\t... 10 more\nCaused by: java.lang.IllegalArgumentException: Invalid BSON field name $oid\n\tat org.bson.AbstractBsonWriter.writeName(AbstractBsonWriter.java:534)\n\tat com.mongodb.internal.connection.BsonWriterDecorator.writeName(BsonWriterDecorator.java:193)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:117)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat org.bson.codecs.EncoderContext.encodeWithChildContext(EncoderContext.java:91)\n\tat org.bson.codecs.BsonDocumentCodec.writeValue(BsonDocumentCodec.java:139)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:118)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:221)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:187)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:63)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:29)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writeDocument(BsonWriterHelper.java:77)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writePayload(BsonWriterHelper.java:59)\n\tat com.mongodb.internal.connection.CommandMessage.encodeMessageBodyWithMetadata(CommandMessage.java:162)\n\tat com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:138)\n\tat com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:59)\n\tat com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:268)\n\tat com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:100)\n\tat 
com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:490)\n\tat com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:71)\n\tat com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:253)\n\tat com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:202)\n\tat com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:118)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeCommand(MixedBulkWriteOperation.java:431)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:251)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:194)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.OperationHelper.withReleasableConnection(OperationHelper.java:621)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:187)\n\tat com.mongodb.client.internal.MongoCollectionImpl.executeBulkWrite(MongoCollectionImpl.java:442)\n\tat com.mongodb.client.internal.MongoCollectionImpl.bulkWrite(MongoCollectionImpl.java:422)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:209)\n\t... 13 more\n"}],"type":"sink"}
Below is the document I inserted into the Kafka topic:
{"_id": {"$oid": "634fd99b52281517a468f3a7"},"schema": {"type": "struct", "fields": [{"type": "int32","optional": true, "field": "id"}, {"type": "string", "optional": true, "field": "name"}, {"type": "string", "optional": true, "field": "middel_name"}, {"type": "string", "optional": true, "field": "surname"}],"optional": false, "name": "foobar"},"payload": {"id":45,"name":"mongo","middle_name": "mmp","surname": "kafka"}}
Below are the connector settings I have used:
{
"name": "mongodb-sink-connector",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "migration-mongo",
"connection.uri": "mongodb://abc:xyz#xx.xx.xx.01:27018,xx.xx.xx.02:27018,xx.xx.xx.03:27018/?authSource=admin&replicaSet=dev",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"document.id.strategy.overwrite.existing": "false",
"validate.non.null": false,
"database": "foo",
"collection": "product"
}
}
Kafka Connect JsonConverter payloads should only have schema and payload fields, not _id, and you need "value.converter.schemas.enable": "true". If you set that to false, then you can remove schema and payload and put _id directly in the payload...
The ID used by the Mongo client is more commonly associated with the Kafka record key itself, not with any values embedded within the value part that you've shown, but this depends on the ID strategy.
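For illustration, with "value.converter.schemas.enable": "false" the value written to the topic might then look roughly like this. Note the ObjectId is shown as a plain string here, since the literal $oid key is what triggered the BSON error; whether you want a string or a real ObjectId depends on the chosen ID strategy:
{
  "_id": "634fd99b52281517a468f3a7",
  "id": 45,
  "name": "mongo",
  "middle_name": "mmp",
  "surname": "kafka"
}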

Kafka connect failing to create Mongo Source Connector

I'm getting an error while creating a source connector. It works fine in all environments except one. I'm using a MongoDB user with read-write permission that has the changeStream and find actions, but I'm still getting this error. Also, the error is not reported by /connector-plugins/{connectorType}/config/validate.
Config:
{
"name": "mongo-source",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"pipeline": "{{pipeline}}", //variable replaced appropriate value
"transforms.dropPrefix.replacement": "{{topic}}", //variable replaced appropriate value
"topic.prefix": "",
"tasks.max": "1",
"poll.await.time.ms": 5,
"connection.uri": "${file:/secrets/secrets.properties:mongo.connection.uri}",
"transforms": "dropPrefix",
"change.stream.full.document": "updateLookup",
"errors.tolerance": "none",
"transforms.dropPrefix.regex": ".*",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter"
}
}
ERROR
{
"error_code": 400,
"message": "Connector configuration is invalid and contains the following 1 error(s):\nInvalid user permissions. Missing the following action permissions: changeStream, find\nYou can also find the above list of errors at the endpoint `/connector-plugins/{connectorType}/config/validate`"
}
You have to try /connector-plugins/{connectorType}/config/validate with the config value as the request data:
{
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"pipeline": "{{pipeline}}", //variable replaced appropriate value
"transforms.dropPrefix.replacement": "{{topic}}", //variable replaced appropriate value
"topic.prefix": "",
"tasks.max": "1",
"poll.await.time.ms": 5,
"connection.uri": "${file:/secrets/secrets.properties:mongo.connection.uri}",
"transforms": "dropPrefix",
"change.stream.full.document": "updateLookup",
"errors.tolerance": "none",
"transforms.dropPrefix.regex": ".*",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter"
}
P.S.: At a minimum you need a "name" field inside the config value.
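For example, an assumed request against a worker on localhost:8083 might look like this; the connector class alias in the path and the minimal body are illustrative only:
PUT http://localhost:8083/connector-plugins/MongoSourceConnector/config/validate
Content-Type: application/json
{
  "name": "mongo-source",
  "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
  "connection.uri": "${file:/secrets/secrets.properties:mongo.connection.uri}",
  "tasks.max": "1"
}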

multiple collections mongodb to Kafka topic

The application writes data every month to a new collection (for example, journal_2205, journal_2206). Is it possible to configure the connector so that it reads the oplog from each new collection and writes to one topic? I am using this connector:
https://www.mongodb.com/docs/kafka-connector/current/source-connector/
Thank you!
Yes, this is possible: you can listen to multiple change streams from multiple Mongo collections. You just need to provide a regex for the collection names in the pipeline; you can even provide a regex for the database names if you have multiple databases.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^journal_.*/}}]}}]"
You can even use $nin to exclude any database whose change streams you don't want to listen to.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^journal_.*/}}]}}]"
Here is the complete Kafka connector configuration.
Mongo to Kafka source connector
{
"name": "mongo-to-kafka-connect",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"publish.full.document.only": "true",
"tasks.max": "3",
"key.converter.schemas.enable": "false",
"topic.creation.enable": "true",
"poll.await.time.ms": 1000,
"poll.max.batch.size": 100,
"topic.prefix": "any prefix for topic name",
"output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
"connection.uri": "mongodb://<username>:<password>#ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
"value.converter.schemas.enable": "false",
"copy.existing": "true",
"topic.creation.default.replication.factor": 3,
"topic.creation.default.partitions": 3,
"topic.creation.compacted.cleanup.policy": "compact",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"mongo.errors.log.enable": "true",
"heartbeat.interval.ms": 10000,
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^journal_.*/}}]}}]"
}
}
You can get more details from the official docs:
https://www.mongodb.com/docs/kafka-connector/current/source-connector/
https://docs.confluent.io/platform/current/connect/index.html
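Since the question asks for all journal_* collections to land in one topic, and the source connector otherwise produces one topic per collection, a RegexRouter SMT (the same transform used in the source-connector question above) could be added to the config. A sketch; the alias, regex, and target topic name are assumptions:
"transforms": "journalToSingleTopic",
"transforms.journalToSingleTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.journalToSingleTopic.regex": ".*journal_.*",
"transforms.journalToSingleTopic.replacement": "journal"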

Kafka Connect JDBC failed on JsonConverter

I am working on a design: MySQL -> Debezium -> Kafka -> Flink -> Kafka -> Kafka Connect JDBC -> MySQL. The following is a sample message I write from Flink (I also tried using the Kafka console producer as well):
{
"schema": {
"type": "struct",
"fields": [
{
"type": "int64",
"optional": false,
"field": "id"
},
{
"type": "string",
"optional": true,
"field": "name"
}
],
"optional": true,
"name": "user"
},
"payload": {
"id": 1,
"name": "Smith"
}
}
but Connect failed in JsonConverter with:
DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:338)
I have debugged, and in the method public SchemaAndValue toConnectData(String topic, byte[] value), value is null. My sink configuration is:
{
"name": "user-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "user",
"connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
"connection.user": "root",
"connection.password": "root",
"auto.create": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value"
}
}
Can someone please help me with this issue?
I think the issue is not related to the value serialization (of the Kafka message). It is rather a problem with the key of the message.
What is your key.converter? I think it is the same as value.converter (org.apache.kafka.connect.json.JsonConverter). Your key might be a simple String that doesn't contain schema and payload fields.
Try changing key.converter to org.apache.kafka.connect.storage.StringConverter.
For Kafka Connect you set default converters, but you can also set a specific one for a particular connector configuration (that will override the default). For that you have to modify your config request:
{
"name": "user-sink",
"config": {
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "user",
"connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
"connection.user": "root",
"connection.password": "root",
"auto.create": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value"
}
}
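For reference, the worker-level defaults mentioned above normally live in the Connect worker properties file (e.g. connect-distributed.properties; the file path and exact values here are assumptions), and the per-connector setting overrides them:
# Worker-wide converter defaults; a connector-level key.converter/value.converter overrides these
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true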