Produce Avro messages in Confluent Control Center UI - apache-kafka

To develop a data transfer application, I first need to define the key/value Avro schemas. The producer application will not be developed until the Avro schemas are defined.
I cloned a topic and its key/value Avro schemas that are already working, and also cloned the JDBC sink connector; I simply changed the topic and connector names.
Then I copied an existing message that had previously gone through the sink successfully and produced it with the Confluent topic message UI producer.
But the sink connector fails with the error: "Unknown magic byte!"
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe.getByteBuffer(AbstractKafkaSchemaSerDe.java:250)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer$DeserializationContext.<init>(AbstractKafkaAvroDeserializer.java:323)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaAvroDeserializer.java:164)
at io.confluent.connect.avro.AvroConverter$Deserializer.deserialize(AvroConverter.java:172)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:107)
... 17 more
[2022-07-25 03:45:42,385] INFO Stopping task (io.confluent.connect.jdbc.sink.JdbcSinkTask)
Reading other questions (e.g. "Unknown magic byte with kafka-avro-console-consumer"), it seems the message has to be serialized using the schema.
Is it possible to send a message to a topic with Avro key/value schemas using the Confluent topic UI?
Any idea whether the Avro schemas need information that depends on the connector/source, or whether the namespace depends on the topic name?
This is my key schema; the topic's name is knov_03:
{
"connect.name": "dbserv1.MY_DB_SCHEMA.ps_sap_incoming.Key",
"fields": [
{
"name": "id_sap_incoming",
"type": "long"
}
],
"name": "Key",
"namespace": "dbserv1.MY_DB_SCHEMA.ps_sap_incoming",
"type": "record"
}
Connector:
{
"name": "knov_05",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"topics": "knov_03",
"connection.url": "jdbc:mysql://eXXXXX:3306/MY_DB_SCHEMA?useSSL=FALSE&nullCatalogMeansCurrent=true",
"connection.user": "USER",
"connection.password": "PASSWORD",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.mode": "record_key",
"pk.fields": "id_sap_incoming",
"auto.create": "true",
"auto.evolve": "true",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"key.converter.schema.registry.url": "http://schema-registry:8081"
}
}
Thanks.
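
For reference, a hedged sketch of producing a test record outside the Control Center UI so that both key and value go through the Confluent Avro serializer (and therefore carry the magic byte / schema-id header the sink's AvroConverter expects). The broker address, the value-schema file name and the --bootstrap-server flag (older versions use --broker-list) are assumptions; adjust to your environment.
kafka-avro-console-producer \
  --bootstrap-server broker:9092 \
  --topic knov_03 \
  --property schema.registry.url=http://schema-registry:8081 \
  --property parse.key=true \
  --property key.separator='|' \
  --property key.schema='{"type":"record","name":"Key","namespace":"dbserv1.MY_DB_SCHEMA.ps_sap_incoming","fields":[{"name":"id_sap_incoming","type":"long"}]}' \
  --property value.schema="$(cat value-schema.json)"
# Then type one record per line, key and value separated by '|', e.g.:
# {"id_sap_incoming":42}|{ ...value fields matching the value schema... }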

Related

Only Map objects supported in absence of schema for record conversion to BigQuery format

I'm streaming data from Postgres to Kafka to BigQuery. Most tables in PG have a primary key, so most tables/topics have an Avro key and value schema, and these all go to BigQuery fine.
I do have a couple of tables that do not have a PK and consequently have no Avro key schema.
When I create a sink connector for those tables, the connector errors with:
Caused by: com.wepay.kafka.connect.bigquery.exception.ConversionConnectException: Only Map objects supported in absence of schema for record conversion to BigQuery format.
If I remove the 'key.converter' config, I instead get a 'Top-level Kafka Connect schema must be of type 'struct'' error.
How do I handle this?
Here's the connector config for reference,
{
"project": "staging",
"defaultDataset": "data_lake",
"keyfile": "<redacted>",
"keySource": "JSON",
"sanitizeTopics": "true",
"kafkaKeyFieldName": "_kid",
"autoCreateTables": "true",
"allowNewBigQueryFields": "true",
"upsertEnabled": "false",
"bigQueryRetry": "5",
"bigQueryRetryWait": "120000",
"bigQueryPartitionDecorator": "false",
"name": "hd-sink-bq",
"connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
"tasks.max": "1",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "<redacted>",
"key.converter.basic.auth.credentials.source": "USER_INFO",
"key.converter.schema.registry.basic.auth.user.info": "<redacted>",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "<redacted>",
"value.converter.basic.auth.credentials.source": "USER_INFO",
"value.converter.schema.registry.basic.auth.user.info": "<redacted>",
"topics": "public.event_issues",
"errors.tolerance": "all",
"errors.log.include.messages": "true",
"errors.deadletterqueue.topic.name": "connect.bq-sink.deadletter",
"errors.deadletterqueue.topic.replication.factor": "1",
"errors.deadletterqueue.context.headers.enable": "true",
"transforms": "tombstoneHandler",
"offset.flush.timeout.ms": "300000",
"transforms.dropNullRecords.predicate": "isNullRecord",
"transforms.dropNullRecords.type": "org.apache.kafka.connect.transforms.Filter",
"transforms.tombstoneHandler.behavior": "drop_warn",
"transforms.tombstoneHandler.type": "io.aiven.kafka.connect.transforms.TombstoneHandler"
}
In my case, I used to handle this by using a predicate, as follows:
{
...
"predicates.isTombstone.type":
"org.apache.kafka.connect.transforms.predicates.RecordIsTombstone",
"predicates": "isTombstone",
"transforms.x.predicate":"isTombstone",
"transforms.x.negate":true
...
}
This is as per the docs here; transforms.x.negate makes the transform skip such tombstone records.
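
For a concrete instance of this pattern, here is a hedged sketch that completes the dropNullRecords / isNullRecord wiring already present in the question's config, submitted through the Connect REST API (the Connect host is an assumption, and only the relevant keys are shown):
# Filter + RecordIsTombstone drops null-value (tombstone) records before they
# reach the BigQuery writer; without "negate" the Filter applies to tombstones.
curl -s -X PUT http://connect:8083/connectors/hd-sink-bq/config \
  -H 'Content-Type: application/json' \
  -d '{
        "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
        "topics": "public.event_issues",
        "transforms": "dropNullRecords",
        "transforms.dropNullRecords.type": "org.apache.kafka.connect.transforms.Filter",
        "transforms.dropNullRecords.predicate": "isNullRecord",
        "predicates": "isNullRecord",
        "predicates.isNullRecord.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"
      }'
# Note: PUT replaces the entire connector config, so the remaining settings from
# the original config must be included alongside these keys.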

how to connect kafka topics to postgres database using kafka connect jdbc sink

I got errors while trying to connect Kafka topics to Postgres using the JDBC sink connector.
These are the error logs (see images below) that I got when I tried with this configuration:
{
"name": "temperature_jdbcsink",
"config" : {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"task.max": "1",
"topics": "temperature",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://localhost:8081",
"value.converter.schema.registry.url": "http://localhost:8081",
"transforms": "Flatten, RenameFields",
"transfores.Flatten.type":"org.apache.kafka.connect.transforms.Flatten$value",
"transforms.Flatten.deliniter":"_",
"transforms.RenameFields.type": "org.apache.kafka.connect.transforms.ReplaceField$value",
"transforms.RenameFields.renames": "value:value,timestamp:timestamp",
"connection.url": "jdbc:postgresql://localhost:5432/jdbcsink",
"connection.user": "postgres",
"connection.password": "postgres",
"insert.mode": "upsert",
"batch.size":"2",
"table.name.format": "temperature",
"pk.mode":"none",
"db.timezone": "Asia/Kolkata"
}
}
https://i.stack.imgur.com/fXqO3.png
https://i.stack.imgur.com/V5Btk.png
The error says "Unknown magic byte". This means there is data in the topic that wasn't produced using the Confluent Avro serializer.
For example, are your keys really Avro? That isn't common when storing data with the JDBC sink, since database primary keys are typically plain string or integer types. Therefore, use the respective converter for those, not Avro.
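
As a hedged sketch of that suggestion: if the keys are plain strings (or absent) rather than Confluent-Avro-encoded, override only the key converter and keep Avro for the value. The Connect host is an assumption and the connector name follows the question's config; PUT replaces the whole config, so the remaining settings would need to be included too.
curl -s -X PUT http://localhost:8083/connectors/temperature_jdbcsink/config \
  -H 'Content-Type: application/json' \
  -d '{
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "temperature",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://localhost:8081"
      }'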

Getting junk/control characters in Kafka topic from confluent MongoDb source connector

I am trying to get data into a Kafka topic from the Confluent MongoDB source connector. Below is the connector config:
{
"value.converter.schema.registry.url": <-SR url->,
"key.converter.schema.registry.url": <-SR url->,
"name": <-connector-name->,
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"tasks.max": "2",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"connection.uri": <-mongo-source-url->,
"database": <-source-db->,
"collection": <-source-tablename->,
"publish.full.document.only": "true",
"output.format.key": "json",
"output.format.value": "json",
"output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
"copy.existing": "false"
}
I am getting data in the Kafka topic, but the key & value look like below:
"key":"\u0000\u0000\u0000\u0005Đ\u0002{\"_id\": {\"_data\": \"ABCD123\"}}",
"value":"\u0000\u0000\u0000\u0005��\t{\"_id\": \"ABCD123\", \"name\": abc, \"id\": 174}"
Has anyone come across a similar issue?
My topic schema is in Avro format, which is why I need to use the AvroConverter. Also, I have to set 'output.format' for key and value to 'json' because the source schema is not constant.
Would really appreciate any help here.
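
For what it's worth, those leading \u0000\u0000\u0000\u0005... characters look like the 5-byte Confluent wire-format header (magic byte plus schema id) that the AvroConverter prepends to every record; a plain consumer prints it as control characters, while a Schema Registry-aware consumer strips it. A hedged sketch (broker, registry and topic names are assumptions):
kafka-avro-console-consumer \
  --bootstrap-server broker:9092 \
  --topic my_mongo_topic \
  --from-beginning \
  --property schema.registry.url=http://schema-registry:8081 \
  --property print.key=true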

How to get logical types from schema registry to avro files using Kafka GCS Connector

I'm loading Avro files into GCS using the Kafka GCS connector. In my schema in the Schema Registry I have logical types on some of my columns, but it seems they're not being transferred to the files. How can logical types from a schema be carried over to the Avro files?
Here is my connector configuration for what it's worth:
{
"connector.class": "io.confluent.connect.gcs.GcsSinkConnector",
"confluent.topic.bootstrap.servers": "kafka.internal:9092",
"flush.size": "200000",
"tasks.max": "300",
"topics": "prod_ny, prod_vr",
"group.id": "gcs_sink_connect",
"value.converter.value.subject.name.strategy": "io.confluent.kafka.serializers.subject.RecordNameStrategy",
"gcs.credentials.json": "---",
"confluent.license: "---",
"value.converter.schema.registry.url": "http://p-og.prod:8081",
"gcs.bucket.name": "kafka_load",
"format.class": "io.confluent.connect.gcs.format.avro.AvroFormat",
"gcs.part.size": "5242880",
"confluent.topic.replication.factor": "1",
"name": "gcs_sink_prod",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"storage.class": "io.confluent.connect.gcs.storage.GcsStorage",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"auto.offset.reset": "latest"
}
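
One hedged way to narrow this down is to check what the registry actually holds for the value subject and confirm the logicalType attributes are present in the registered schema. With RecordNameStrategy (as configured above) the subject is the record's fully qualified name; the subject below is a placeholder, and jq is only used for pretty-printing:
curl -s http://p-og.prod:8081/subjects/com.example.MyRecord/versions/latest | jq -r '.schema' | jq .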

Error handling for invalid JSON in kafka sink connector

I have a sink connector for MongoDB that takes JSON from a topic and puts it into a MongoDB collection. But when I send invalid JSON from a producer to that topic (e.g. with an improperly escaped special character ") => {"id":1,"name":"\"}, the connector stops. I tried using errors.tolerance = all, but the same thing happens. What should happen is that the connector skips and logs that invalid JSON and keeps running. My distributed-mode connector config is as follows:
{
"name": "sink-mongonew_test1",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "error7",
"connection.uri": "mongodb://****:27017",
"database": "abcd",
"collection": "abc",
"type.name": "kafka-connect",
"key.ignore": "true",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"value.projection.list": "id",
"value.projection.type": "whitelist",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneTimestampsStrategy",
"delete.on.null.values": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"errors.deadletterqueue.topic.name": "crm_data_deadletterqueue",
"errors.deadletterqueue.topic.replication.factor": "1",
"errors.deadletterqueue.context.headers.enable": "true"
}
}
Since Apache Kafka 2.0, Kafka Connect has included error handling options, including the functionality to route messages to a dead letter queue, a common technique in building data pipelines.
https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/
As commented, you're using connect-api-1.0.1.*.jar, i.e. version 1.0.1, which explains why those properties are not working.
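
A quick, hedged way to confirm the worker version is the Connect REST root endpoint (host and port are assumptions); the errors.* / dead-letter-queue options need Kafka Connect 2.0 or newer:
curl -s http://localhost:8083/
# -> {"version":"1.0.1","commit":"..."}   (illustrative output)
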
Your alternatives, outside of running a newer version of Kafka Connect, include NiFi or Spark Structured Streaming.