I am getting the following error when I run the Kafka JDBC sink connector to PostgreSQL:
JsonConverter with schemas.enable requires "schema" and "payload"
fields and may not contain additional fields. If you are trying to
deserialize plain JSON data, set schemas.enable=false in your
converter configuration.
However, my topic contains the following message structure, with a schema added just as it's presented online:
rowtime: 2022/02/04 12:45:48.520 Z, key: , value:
{
  "schema": {
    "type": "struct",
    "fields": [
      {"type": "int", "field": "ID", "optional": false},
      {"type": "date", "field": "Date", "optional": false},
      {"type": "varchar", "field": "ICD", "optional": false},
      {"type": "int", "field": "CPT", "optional": false},
      {"type": "double", "field": "Cost", "optional": false}
    ],
    "optional": false,
    "name": "test"
  },
  "payload": {
    "ID": "24427934",
    "Date": "2019-05-22",
    "ICD": "883.436",
    "CPT": "60502",
    "cost": "1374.36"
  }
},
partition: 0
My configuration for the connector is:
curl -X PUT http://localhost:8083/connectors/claim_test/config \
  -H "Content-Type: application/json" \
  -d '{
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://localhost:5432/ae2772",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "topics": "test_7",
    "auto.create": "true",
    "insert.mode": "insert"
  }'
After some changes, I now get the following message:
WorkerSinkTask{id=claim_test} Error converting message value in topic 'test_9' partition 0 at offset 0 and timestamp 1644005137197: Unknown schema type: int
int is not a valid schema type. Should be int8, int16, int32, or int64.
Similarly, date, varchar and double are not valid either.
The types used in the JSON schema are Kafka Connect types, not Postgres or SQL-specific types (a date should either be converted to a Unix epoch int64 timestamp or sent as a string).
You can find the supported schema types here: https://github.com/apache/kafka/blob/trunk/connect/api/src/main/java/org/apache/kafka/connect/data/Schema.java
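As a rough sketch (not a tested message, keeping the field names from the question), the schema and payload could be rewritten with valid Connect types like this; note that the payload values then also have to match the declared types. Date is shown as a plain string here; a Connect logical date/timestamp type is another option.

{
  "schema": {
    "type": "struct",
    "name": "test",
    "optional": false,
    "fields": [
      {"type": "int32",   "field": "ID",   "optional": false},
      {"type": "string",  "field": "Date", "optional": false},
      {"type": "string",  "field": "ICD",  "optional": false},
      {"type": "int32",   "field": "CPT",  "optional": false},
      {"type": "float64", "field": "Cost", "optional": false}
    ]
  },
  "payload": {"ID": 24427934, "Date": "2019-05-22", "ICD": "883.436", "CPT": 60502, "Cost": 1374.36}
}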
I am producing simple plain-text JSON-like data to Kafka with the plain kafka-console-producer command, and I want to sink this data to a database table. I have tried many ways to do this, but I always get a deserializer error or an "unknown magic byte" error.
There is no serialization or schema validation on the producer side, but the data always has the same shape.
We also can't change the producer configs to add a serializer.
Schema:
{
  "type": "record",
  "name": "people",
  "namespace": "com.cena",
  "doc": "This is a sample Avro schema to get you started. Please edit",
  "fields": [
    { "name": "first_name", "type": "string", "default": null },
    { "name": "last_name", "type": "string", "default": null },
    { "name": "town", "type": "string", "default": null },
    { "name": "country_code", "type": "string", "default": null },
    { "name": "mobile_number", "type": "string", "default": null }
  ]
}
Connector:
{
  "name": "JdbcSinkConnecto",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "table.name.format": "people",
  "topics": "people",
  "tasks.max": "1",
  "transforms": "RenameField",
  "transforms.RenameField.renames": "\"town:city,mobile_number:msisdn\"",
  "transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
  "connection.url": "jdbc:postgresql://localhost:5432/postgres",
  "connection.password": "postgres",
  "connection.user": "postgres",
  "insert.mode": "insert",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "key.converter.schema.registry.url": "http://http://localhost:8081"
}
Data sample:
{"first_name": "some_name","last_name": "Family","town": "some_city","country_code": "+01","mobile_number": "some_number"}
Is there a way to use Kafka Connect for this?
with simple kafka-console-producer
That doesn't use Avro, so I'm not sure why you added an Avro schema to the question.
You also don't show the value.converter value, so it's unclear whether that is truly JSON or Avro...
You are required to add a schema to the data for the JDBC sink. If you use plain JSON and kafka-console-producer, then you need data that looks like {"schema": ... , "payload": { your data here } }, and you need value.converter.schemas.enable=true for the JsonConverter class.
ref. Converters and Serializers Deep Dive
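For the people data in this question, a record with that envelope might look roughly like this (the all-string field types are an assumption based on the sample data):

{
  "schema": {
    "type": "struct",
    "name": "people",
    "optional": false,
    "fields": [
      {"type": "string", "field": "first_name", "optional": true},
      {"type": "string", "field": "last_name", "optional": true},
      {"type": "string", "field": "town", "optional": true},
      {"type": "string", "field": "country_code", "optional": true},
      {"type": "string", "field": "mobile_number", "optional": true}
    ]
  },
  "payload": {"first_name": "some_name", "last_name": "Family", "town": "some_city", "country_code": "+01", "mobile_number": "some_number"}
}

The sink would then use value.converter=org.apache.kafka.connect.json.JsonConverter with value.converter.schemas.enable=true.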
If you want to use Avro, then use kafka-avro-console-producer ... This still accepts JSON inputs, but serializes to Avro (and will fix your magic byte error)
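A minimal sketch of that approach, assuming a broker at localhost:9092 and a Schema Registry at http://localhost:8081, and dropping the "default": null entries from the question's schema (strict Avro parsers reject a null default for a plain string field):

kafka-avro-console-producer --broker-list localhost:9092 --topic people \
  --property schema.registry.url=http://localhost:8081 \
  --property value.schema='{"type":"record","name":"people","namespace":"com.cena","fields":[{"name":"first_name","type":"string"},{"name":"last_name","type":"string"},{"name":"town","type":"string"},{"name":"country_code","type":"string"},{"name":"mobile_number","type":"string"}]}'
{"first_name": "some_name","last_name": "Family","town": "some_city","country_code": "+01","mobile_number": "some_number"}

The sink connector would then use value.converter=io.confluent.connect.avro.AvroConverter with value.converter.schema.registry.url pointing at the same registry.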
Another option would be to use ksqlDB to first parse the JSON into a defined STREAM with typed and named fields, then you can run the Connector from it in embedded mode
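A rough ksqlDB sketch of that option for the people data (stream names are illustrative):

CREATE STREAM people_json (first_name VARCHAR, last_name VARCHAR, town VARCHAR, country_code VARCHAR, mobile_number VARCHAR)
  WITH (KAFKA_TOPIC='people', VALUE_FORMAT='JSON');

CREATE STREAM people_avro WITH (VALUE_FORMAT='AVRO') AS
  SELECT * FROM people_json;

The JDBC sink can then read the Avro topic behind people_avro, or be created directly from ksqlDB with a CREATE SINK CONNECTOR statement if an embedded Connect worker is configured.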
By the way, StringConverter does not use the Schema Registry, so remove the schema.registry.url property for it... And if you do want to use a registry, don't put http:// twice.
I am using the Kafka sink connector for MongoDB. I want to push some JSON documents from a Kafka topic to MongoDB, but I am getting an error when using $oid in the document.
Below is the error:
{"name":"mongodb-sink-connector","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"FAILED","worker_id":"localhost:8083","trace":"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:610)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:330)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:237)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.kafka.connect.errors.DataException: Failed to write mongodb documents\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:227)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1541)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.put(MongoSinkTask.java:122)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:582)\n\t... 10 more\nCaused by: java.lang.IllegalArgumentException: Invalid BSON field name $oid\n\tat org.bson.AbstractBsonWriter.writeName(AbstractBsonWriter.java:534)\n\tat com.mongodb.internal.connection.BsonWriterDecorator.writeName(BsonWriterDecorator.java:193)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:117)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat org.bson.codecs.EncoderContext.encodeWithChildContext(EncoderContext.java:91)\n\tat org.bson.codecs.BsonDocumentCodec.writeValue(BsonDocumentCodec.java:139)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:118)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:221)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:187)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:63)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:29)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writeDocument(BsonWriterHelper.java:77)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writePayload(BsonWriterHelper.java:59)\n\tat com.mongodb.internal.connection.CommandMessage.encodeMessageBodyWithMetadata(CommandMessage.java:162)\n\tat com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:138)\n\tat com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:59)\n\tat com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:268)\n\tat com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:100)\n\tat 
com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:490)\n\tat com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:71)\n\tat com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:253)\n\tat com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:202)\n\tat com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:118)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeCommand(MixedBulkWriteOperation.java:431)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:251)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:194)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.OperationHelper.withReleasableConnection(OperationHelper.java:621)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:187)\n\tat com.mongodb.client.internal.MongoCollectionImpl.executeBulkWrite(MongoCollectionImpl.java:442)\n\tat com.mongodb.client.internal.MongoCollectionImpl.bulkWrite(MongoCollectionImpl.java:422)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:209)\n\t... 13 more\n"}],"type":"sink"}
Below is the document I produced to the Kafka topic:
{"_id": {"$oid": "634fd99b52281517a468f3a7"},"schema": {"type": "struct", "fields": [{"type": "int32","optional": true, "field": "id"}, {"type": "string", "optional": true, "field": "name"}, {"type": "string", "optional": true, "field": "middel_name"}, {"type": "string", "optional": true, "field": "surname"}],"optional": false, "name": "foobar"},"payload": {"id":45,"name":"mongo","middle_name": "mmp","surname": "kafka"}}
Below are the connector settings I have used:
{
  "name": "mongodb-sink-connector",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "migration-mongo",
    "connection.uri": "mongodb://abc:xyz#xx.xx.xx.01:27018,xx.xx.xx.02:27018,xx.xx.xx.03:27018/?authSource=admin&replicaSet=dev",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "document.id.strategy.overwrite.existing": "false",
    "validate.non.null": false,
    "database": "foo",
    "collection": "product"
  }
}
Kafka Connect JsonConverter payloads should only have schema and payload fields, not _id, and you need "value.converter.schemas.enable": "true". If you set that to false, then you can remove schema and payload and put _id directly in the payload...
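For the first option (keeping schemas.enable=true), the record from the question would drop the top-level _id and keep only the envelope, roughly like this (the middel_name/middle_name mismatch between the question's schema and payload would also need to be aligned):

{
  "schema": {
    "type": "struct",
    "name": "foobar",
    "optional": false,
    "fields": [
      {"type": "int32", "optional": true, "field": "id"},
      {"type": "string", "optional": true, "field": "name"},
      {"type": "string", "optional": true, "field": "middle_name"},
      {"type": "string", "optional": true, "field": "surname"}
    ]
  },
  "payload": {"id": 45, "name": "mongo", "middle_name": "mmp", "surname": "kafka"}
}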
The ID used by the Mongo client is more commonly associated with the Kafka record key itself, not with values embedded within the record value you've shown, but this depends on the ID strategy.
I'm trying to create a stream in ksqlDB over a topic in Kafka using an Avro schema.
The command looks like this:
CREATE STREAM customer_stream WITH (KAFKA_TOPIC='customers', VALUE_FORMAT='JSON', VALUE_SCHEMA_ID=1);
Topic customers looks like this:
Using the command - print 'customers';
Key format: ¯_(ツ)_/¯ - no data processed
Value format: JSON or KAFKA_STRING
rowtime: 2022/09/29 12:34:53.440 Z, key: , value: {"Name":"John Smith","PhoneNumbers":["212 555-1111","212 555-2222"],"Remote":false,"Height":"62.4","FicoScore":" > 640"}, partition: 0
rowtime: 2022/09/29 12:34:53.440 Z, key: , value: {"Name":"Jane Smith","PhoneNumbers":["269 xxx-1111","269 xxx-2222"],"Remote":false,"Height":"69.9","FicoScore":" > 690"}, partition: 0
To this topic an Avro schema has been added:
{
  "type": "record",
  "name": "Customer",
  "namespace": "com.acme.avro",
  "fields": [
    { "name": "ficoScore", "type": ["null", "string"], "default": null },
    { "name": "height", "type": ["null", "double"], "default": null },
    { "name": "name", "type": ["null", "string"], "default": null },
    { "name": "phoneNumbers", "type": ["null", { "type": "array", "items": ["null", "string"] }], "default": null },
    { "name": "remote", "type": ["null", "boolean"], "default": null }
  ]
}
When I run the command below, I get this reply:
CREATE STREAM customer_stream WITH (KAFKA_TOPIC='customers', VALUE_FORMAT='JSON', VALUE_SCHEMA_ID=1);
VALUE_FORMAT should support schema inference when VALUE_SCHEMA_ID is provided. Current format is JSON.
Any suggestions?
JSON doesn't use schema IDs. JSON_SR format does, but if you want Avro, then you need to use AVRO as the format.
You dont "add schemas" to topics. You can only register them in the registry.
Example of converting JSON to Avro with ksqlDB:
CREATE STREAM sensor_events_json (sensor_id VARCHAR, temperature INTEGER, ...)
WITH (KAFKA_TOPIC='events-topic', VALUE_FORMAT='JSON');
CREATE STREAM sensor_events_avro WITH (VALUE_FORMAT='AVRO') AS SELECT * FROM sensor_events_json;
Notice that you don't need to refer to any schema ID, as the serializer will auto-register the necessary schema.
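Applied to the customers topic from the question, that might look like the following (column names are taken from the JSON sample; Height and FicoScore are declared VARCHAR because the JSON values are strings):

CREATE STREAM customers_json (Name VARCHAR, PhoneNumbers ARRAY<VARCHAR>, Remote BOOLEAN, Height VARCHAR, FicoScore VARCHAR)
  WITH (KAFKA_TOPIC='customers', VALUE_FORMAT='JSON');

CREATE STREAM customers_avro WITH (VALUE_FORMAT='AVRO') AS
  SELECT * FROM customers_json;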
I'm producing a message by using the kafka-avro-console-producer binary by doing:
kafka-avro-console-producer --broker-list broker:9092 --topic example-topic --property schema.registry.url='http://schema-registry:8081' --property value.schema='{"type": "record","name": "test","fields": [{"name": "before", "type": ["null", {"type": "record", "name": "columns", "fields":[{"name": "name", "type": "string"}]}],"default": "null"},{"name": "after", "type": ["null", "columns"],"default": "null"}]}'
{"before": null,"after": {"name": "John"}}
sending the following message:
{"before": null,"after": {"name": "John"}}
and by applying the following Avro schema:
{
  "type": "record",
  "name": "test",
  "fields": [
    {
      "name": "before",
      "type": ["null", {
        "type": "record",
        "name": "columns",
        "fields": [
          { "name": "name", "type": "string" }
        ]
      }],
      "default": "null"
    },
    {
      "name": "after",
      "type": ["null", "columns"],
      "default": "null"
    }
  ]
}
The error I'm getting in return is the following:
Caused by: org.apache.avro.AvroTypeException: Unknown union branch name
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445)
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
at io.confluent.kafka.formatter.AvroMessageReader.jsonToAvro(AvroMessageReader.java:213)
at io.confluent.kafka.formatter.AvroMessageReader.readMessage(AvroMessageReader.java:180)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:55)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
For those of you who are willing to go deeper into the rabbit hole: I'm building an integration between Oracle GoldenGate and Apache Kafka by using the Oracle GoldenGate Big Data connector. I'm currently experiencing problems with a model equivalent to the one described here:
https://www.ateam-oracle.com/oracle-goldengate-big-data-adapter-apache-kafka-producer
When trying to apply the schema described in the above web page to its corresponding model (and after completing the JSON model), I'm getting the same error as the one I get with the model and schema in this question.
Thank you all very much.
This is the problem:
"type": ["null", "columns"]
You cannot refer back to other record types. You'll need to expand that out, like you did for the other field.
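As a sketch, the expanded "after" field might look like the following; the inline record needs a new name (columns_after is just an illustrative choice), since Avro will not accept two definitions of the same record name in one schema:

{
  "name": "after",
  "type": ["null", {
    "type": "record",
    "name": "columns_after",
    "fields": [
      {"name": "name", "type": "string"}
    ]
  }],
  "default": null
}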
I have an Avro schema defined as follows:
[
  {
    "namespace": "com.fun.message",
    "type": "record",
    "name": "FileData",
    "doc": "Avro Schema for FileData",
    "fields": [
      { "name": "id", "type": "string", "doc": "Unique file id" },
      { "name": "absolutePath", "type": "string", "doc": "Absolute path of file" },
      { "name": "fileName", "type": "string", "doc": "File name" },
      { "name": "source", "type": "string", "doc": "unique identification of source" },
      { "name": "metaData", "type": { "type": "map", "values": "string" } }
    ]
  }
]
I want to push this data to Postgres using the JDBC sink connector, and I need to transform the "metaData" field (which is a map type) in my schema into a string. How do I do this?
You need to use SMTs, and AFAIK there is currently no SMT that perfectly meets your requirement (ExtractField is a Map.get operation, so nested fields cannot be extracted in one pass). You could take a look at Debezium's io.debezium.transforms.UnwrapFromEnvelope SMT, which you could modify in order to extract nested fields.
UnwrapFromEnvelope is used for CDC event flattening, i.e. extracting fields from more complex structures like the envelopes Debezium produces (which I believe are similar to your structure).
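For illustration only, once such a custom SMT exists, wiring it into the JDBC sink configuration might look like this (com.example.MapToStringValue and its field property are hypothetical, not a published SMT):

"transforms": "mapToString",
"transforms.mapToString.type": "com.example.MapToStringValue",
"transforms.mapToString.field": "metaData"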