Kafka connect failing to create Mongo Source Connector - mongodb

I'm getting an error while creating a source connector. It works fine in all the environments except one place. I'm using a mongodb user having read-write permission which has all the actions changeStream and find. But still I'm getting this error. Also this is not present in /connector-plugins/{connectorType}/config/validate though.
Config:
{
"name": "mongo-source",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"pipeline": "{{pipeline}}", //variable replaced appropriate value
"transforms.dropPrefix.replacement": "{{topic}}", //variable replaced appropriate value
"topic.prefix": "",
"tasks.max": "1",
"poll.await.time.ms": 5,
"connection.uri": "${file:/secrets/secrets.properties:mongo.connection.uri}",
"transforms": "dropPrefix",
"change.stream.full.document": "updateLookup",
"errors.tolerance": "none",
"transforms.dropPrefix.regex": ".*",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter"
}
}
ERROR
{
"error_code": 400,
"message": "Connector configuration is invalid and contains the following 1 error(s):\nInvalid user permissions. Missing the following action permissions: changeStream, find\nYou can also find the above list of errors at the endpoint `/connector-plugins/{connectorType}/config/validate`"
}

You have to try /connector-plugins/{connectorType}/config/validate with config value in data:
{
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"pipeline": "{{pipeline}}", //variable replaced appropriate value
"transforms.dropPrefix.replacement": "{{topic}}", //variable replaced appropriate value
"topic.prefix": "",
"tasks.max": "1",
"poll.await.time.ms": 5,
"connection.uri": "${file:/secrets/secrets.properties:mongo.connection.uri}",
"transforms": "dropPrefix",
"change.stream.full.document": "updateLookup",
"errors.tolerance": "none",
"transforms.dropPrefix.regex": ".*",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter"
}
P.S.: At least you need a field "name" inside config value.

Related

sink json like string to a databae with kafka connect jdbc

i am producing simple plaintext json like data to kafka with simple kafka-console-producer command and i want to sink this data to database table. i have tried many ways to do this. but always i get deserializer error or unknown magic bytes error.
there is no serialization and schema validation on that. but the data is always same type.
we cant change the producer configs to add serializer also.
schema :
{
"type": "record",
"name": "people",
"namespace": "com.cena",
"doc": "This is a sample Avro schema to get you started. Please edit",
"fields": [
{
"name": "first_name",
"type": "string",
"default":null
},
{
"name": "last_name",
"type": "string",
"default":null
},
{
"name": "town",
"type": "string",
"default":null
},
{
"name": "country_code",
"type": "string",
"default":null
},
{
"name": "mobile_number",
"type": "string",
"default":null
}
]
}
Connector :
{
"name": "JdbcSinkConnecto",
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"table.name.format": "people",
"topics": "people",
"tasks.max": "1",
"transforms": "RenameField",
"transforms.RenameField.renames": "\"town:city,mobile_number:msisdn\"",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"connection.url": "jdbc:postgresql://localhost:5432/postgres",
"connection.password": "postgres",
"connection.user": "postgres",
"insert.mode": "insert",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schema.registry.url": "http://http://localhost:8081"
}
data sample :
{"first_name": "some_name","last_name": "Family","town": "some_city","country_code": "+01","mobile_number": "some_number"}
Is there a way to use kafka connect for this ?
with simple kafka-console-producer
That doesn't use Avro, so I'm not sure why you added an Avro schema to the question.
You also don't show value.converter value, so it's unclear if that is truly JSON or Avro...
You are required to add a schema to the data for JDBC sink. If you use plain JSON and kafka-console-producer, then you need data that looks like {"schema": ... , "payload": { your data here } }, then you need value.converter.schemas.enabled=true for class of JsonConverter
ref. Converters and Serializers Deep Dive
If you want to use Avro, then use kafka-avro-console-producer ... This still accepts JSON inputs, but serializes to Avro (and will fix your magic byte error)
Another option would be to use ksqlDB to first parse the JSON into a defined STREAM with typed and named fields, then you can run the Connector from it in embedded mode
By the way, StringConverter does not use schema registry, so remove schema.registry.url property for it... And if you want to use a registry, don't put http:// twice

For Kafka mongo sink connector I want to map same Object ID which is existing

I am Using kafka sink connector for mongodb. Here I want to push some json document from kafka topic to mongodb, But I am facing error at using $oid in the document.
Below is the error:
{"name":"mongodb-sink-connector","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"FAILED","worker_id":"localhost:8083","trace":"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:610)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:330)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:237)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.kafka.connect.errors.DataException: Failed to write mongodb documents\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:227)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1541)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.put(MongoSinkTask.java:122)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:582)\n\t... 10 more\nCaused by: java.lang.IllegalArgumentException: Invalid BSON field name $oid\n\tat org.bson.AbstractBsonWriter.writeName(AbstractBsonWriter.java:534)\n\tat com.mongodb.internal.connection.BsonWriterDecorator.writeName(BsonWriterDecorator.java:193)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:117)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat org.bson.codecs.EncoderContext.encodeWithChildContext(EncoderContext.java:91)\n\tat org.bson.codecs.BsonDocumentCodec.writeValue(BsonDocumentCodec.java:139)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:118)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:221)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:187)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:63)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:29)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writeDocument(BsonWriterHelper.java:77)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writePayload(BsonWriterHelper.java:59)\n\tat com.mongodb.internal.connection.CommandMessage.encodeMessageBodyWithMetadata(CommandMessage.java:162)\n\tat com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:138)\n\tat com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:59)\n\tat com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:268)\n\tat com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:100)\n\tat com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:490)\n\tat com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:71)\n\tat com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:253)\n\tat com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:202)\n\tat com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:118)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeCommand(MixedBulkWriteOperation.java:431)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:251)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:194)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.OperationHelper.withReleasableConnection(OperationHelper.java:621)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:187)\n\tat com.mongodb.client.internal.MongoCollectionImpl.executeBulkWrite(MongoCollectionImpl.java:442)\n\tat com.mongodb.client.internal.MongoCollectionImpl.bulkWrite(MongoCollectionImpl.java:422)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:209)\n\t... 13 more\n"}],"type":"sink"}
Below is the document I inserted in kafka topic:
{"_id": {"$oid": "634fd99b52281517a468f3a7"},"schema": {"type": "struct", "fields": [{"type": "int32","optional": true, "field": "id"}, {"type": "string", "optional": true, "field": "name"}, {"type": "string", "optional": true, "field": "middel_name"}, {"type": "string", "optional": true, "field": "surname"}],"optional": false, "name": "foobar"},"payload": {"id":45,"name":"mongo","middle_name": "mmp","surname": "kafka"}}
Below is my Connector settings I have used:
{
"name": "mongodb-sink-connector",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "migration-mongo",
"connection.uri": "mongodb://abc:xyz#xx.xx.xx.01:27018,xx.xx.xx.02:27018,xx.xx.xx.03:27018/?authSource=admin&replicaSet=dev",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"document.id.strategy.overwrite.existing": "false",
"validate.non.null": false,
"database": "foo",
"collection": "product"
}
}
Kafka Connect JSONConverter payloads should only have schema and payload fields, not _id. And you need "value.converter.schemas.enable": "true". If you set that to false, then you can remove schema and payload, and put _id directly in the payload...
The ID used by the Mongo Client is more commonly associated with the Kafka record key itself, not any values embedded within the value part that you've shown, but this depends on the ID Strategy

Debezium Postgres and ElasticSearch - Store complex Object in ElasticSearch

I have in Postgres a database with a table "product" which is connected 1 to n with "sales_Channel". So 1 Product can have multiple SalesChannel. Now I want to transfer it to ES and keep it up to date, so I am using debezium and kafka. It is no problem to transfer the single tables to ES. I can query for SalesChannels and Products. But I need Products with all SalesChannels attached as a Result. How get I debezium to transfer this?
mapping for Product
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "integer"
}
}
}
}
}
sink for Product
{
"name": "es-sink-product",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": "1",
"topics": "product",
"connection.url": "http://elasticsearch:9200",
"transforms": "unwrap,key",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.drop.deletes": "false",
"transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.key.field": "id",
"key.ignore": "false",
"type.name": "_doc",
"behavior.on.null.values": "delete"
}
}
you either need to use Outbox pattern, see https://debezium.io/documentation/reference/1.2/configuration/outbox-event-router.html
or you can use aggregate objects, see
https://github.com/debezium/debezium-examples/tree/master/jpa-aggregations
https://github.com/debezium/debezium-examples/tree/master/kstreams-fk-join

MongoDB Kafka Sink Connector doesn't process the RenameByRegex processor

I need to listening events from a Kafka Topic and Sink to a collection in MongoDB. The message contains an nested object with an id property, like in the example above.
{
"testId": 1,
"foo": "bar",
"foos": [{ "id":"aaaaqqqq-rrrrr" }]
}
I'm trying to rename this nested id to _id with RegExp
{
"connector.class":"com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "test",
"connection.uri": "mongodb://mongo:27017",
"database": "test_db",
"collection": "test",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"value.projection.list":"testId",
"value.projection.type": "whitelist",
"post.processor.chain": "com.mongodb.kafka.connect.sink.processor.DocumentIdAdder, com.mongodb.kafka.connect.sink.processor.field.renaming.RenameByRegex",
"field.renamer.regexp": "[{\"regexp\":\"\b(id)\b\", \"pattern\":\"\b(id)\b\",\"replace\":\"_id\"}]"
}
And the result of a config/validate is 500 Internal Server Error, with that message:
{
"error_code": 500,
"message": null
}
I missing something or is a issue?
I think all you want is Kafka Connect Single Message Transform (SMT) and more precisely ReplaceField:
Filter or rename fields within a Struct or Map.
The following will replace id field name with _id:
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "id:_id"
In your case, before applying the above trasnformation you might also want to Flatten foos:
"transforms": "flatten",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": "."
and finally apply the transformation for renaming the field:
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "foos.id:foos._id"

Kafka Connect JDBC failed on JsonConverter

I am working on a design MySQL -> Debezium -> Kafka -> Flink -> Kafka -> Kafka Connect JDBC -> MySQL. Following is sample message i write from Flink (I also tried using Kafka console producer as well)
{
"schema": {
"type": "struct",
"fields": [
{
"type": "int64",
"optional": false,
"field": "id"
},
{
"type": "string",
"optional": true,
"field": "name"
}
],
"optional": true,
"name": "user"
},
"payload": {
"id": 1,
"name": "Smith"
}
}
but connect failed on JsonConverter
DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:338)
I have debugged and in method public SchemaAndValue toConnectData(String topic, byte[] value) value is null. My sink configurations are:
{
"name": "user-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "user",
"connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
"connection.user": "root",
"connection.password": "root",
"auto.create": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value"
}
}
Can someone please help me on this issue?
I think an issue is not related with the value serialization (of Kafka message). It is rather problem with the key of the message.
What is your key.converter? I think it is the same like value.converter (org.apache.kafka.connect.json.JsonConverter). Your key might be simple String, that doesn't contain schema, payload
Try to change key.converter to org.apache.kafka.connect.storage.StringConverter
For Kafka Connect you set default Converters, but you can also set specific one for your particular Connector configuration (that will overwrite default one). For that you have to modify your config request:
{
"name": "user-sink",
"config": {
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "user",
"connection.url": "jdbc:mysql://localhost:3306/my_db?verifyServerCertificate=false",
"connection.user": "root",
"connection.password": "root",
"auto.create": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value"
}
}