Kafka Connect issue when reading from a RabbitMQ queue - apache-kafka

I'm trying to read data into my topic from a RabbitMQ queue using the Kafka connector with the configuration below:
{
"name" : "RabbitMQSourceConnector1",
"config" : {
"connector.class" : "io.confluent.connect.rabbitmq.RabbitMQSourceConnector",
"tasks.max" : "1",
"kafka.topic" : "rabbitmqtest3",
"rabbitmq.queue" : "taskqueue",
"rabbitmq.host" : "localhost",
"rabbitmq.username" : "guest",
"rabbitmq.password" : "guest",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "true"
}
}
But I'm having trouble converting the source stream to JSON format, as I seem to be losing the original message.
Original:
{'id': 0, 'body': '010101010101010101010101010101010101010101010101010101010101010101010'}
Received:
{"schema":{"type":"bytes","optional":false},"payload":"eyJpZCI6IDEsICJib2R5IjogIjAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMCJ9"}
Does anyone have an idea why this is happening?
EDIT: I tried to convert the message to String using the "value.converter": "org.apache.kafka.connect.storage.StringConverter", but the result is the same:
11/27/19 4:07:37 PM CET , 0 , [B@1583a488
EDIT2:
I'm now receiving the JSON file, but the content is still Base64-encoded.
Any idea how to convert it back to UTF-8 directly?
{
"name": "adls-gen2-sink",
"config": {
"connector.class":"io.confluent.connect.azure.datalake.gen2.AzureDataLakeGen2SinkConnector",
"tasks.max":"1",
"topics":"rabbitmqtest3",
"flush.size":"3",
"format.class":"io.confluent.connect.azure.storage.format.json.JsonFormat",
"value.converter":"org.apache.kafka.connect.converters.ByteArrayConverter",
"internal.value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"topics.dir":"sw66jsoningest",
"confluent.topic.bootstrap.servers":"localhost:9092",
"confluent.topic.replication.factor":"1",
"partitioner.class" : "io.confluent.connect.storage.partitioner.DefaultPartitioner"
}
}
UPDATE:
I got the solution, considering this flow:
Message (JSON) --> RabbitMq (ByteArray) --> Kafka (ByteArray) -->ADLS (JSON)
On the RabbitMQ-to-Kafka (source) connector I used this converter, which passes the raw message bytes through unchanged, so they are no longer Base64-encoded inside a JSON envelope:
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
Afterwards, on the ADLS sink connector, I treated the message as a String and saved it as JSON:
"value.converter":"org.apache.kafka.connect.storage.StringConverter",
"format.class":"io.confluent.connect.azure.storage.format.json.JsonFormat",
Many thanks!

If you set "schemas.enable": "false", you shouldn't be getting the schema and payload fields
If you want no translation to happen at all, use ByteArrayConverter
If your data is just a plain string (which includes JSON), use StringConverter
It's not clear how you're printing the resulting message, but looks like you're printing the byte array and not decoding it to a String
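To see that the message is not actually lost, the Base64 payload can be decoded by hand. The following is a minimal Python sketch and not part of the original answer; the helper name decode_envelope and the shortened sample body are assumptions for illustration. It relies on JsonConverter writing a bytes value as a Base64 string in the payload field when schemas are enabled.

```python
import base64
import json


def decode_envelope(envelope_json: str) -> dict:
    """Unwrap a JsonConverter bytes envelope and parse the inner JSON document."""
    envelope = json.loads(envelope_json)
    raw = base64.b64decode(envelope["payload"])  # Base64 text -> original bytes
    return json.loads(raw.decode("utf-8"))       # bytes -> UTF-8 text -> dict


# Hypothetical message, shortened from the one in the question.
original = b'{"id": 1, "body": "0101"}'
envelope_json = json.dumps({
    "schema": {"type": "bytes", "optional": False},
    "payload": base64.b64encode(original).decode("ascii"),
})

decoded = decode_envelope(envelope_json)
print(decoded)

# The first characters of the payload shown in the question decode the same way.
prefix = base64.b64decode("eyJpZCI6IDEs")
print(prefix)
```

The output beginning with [B that StringConverter produced points the same way: it is the default Java string form of a byte array, so the bytes were printed rather than decoded.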

Related

Kafka connect RabbitMQ unable to use insert field transform: Only Struct objects supported for [field insertion], found: [B

I'm trying to use the InsertField Kafka Connect transformation with the RabbitMQ connector.
My configuration:
"config": {
"connector.class": "io.confluent.connect.rabbitmq.RabbitMQSourceConnector",
"confluent.topic.bootstrap.servers": "kafka:29092",
"topic.creation.default.replication.factor": 1,
"topic.creation.default.partitions": 1,
"tasks.max": "2",
"kafka.topic": "test",
"rabbitmq.queue": "events",
"rabbitmq.host": "rabbitmq",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"transforms": "InsertField",
"transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.InsertField.static.field": "MessageSource",
"transforms.InsertField.static.value": "Kafka Connect framework"
}
I have also tried using ByteArrayConverter as the value converter. Using Python, I send a message as follows:
msg = json.dumps(body)
self.channel.basic_publish(exchange="", routing_key="events", body=msg)
Calling encode() on the message first to turn it into a byte array does not work either.
The exception I'm receiving is:
Caused by: org.apache.kafka.connect.errors.DataException: Only Struct objects supported for [field insertion], found: [B
at org.apache.kafka.connect.transforms.util.Requirements.requireStruct(Requirements.java:52)
at org.apache.kafka.connect.transforms.InsertField.applyWithSchema(InsertField.java:162)
at org.apache.kafka.connect.transforms.InsertField.apply(InsertField.java:133)
at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 11 more
I understand the error and thought that using JsonConverter will solve it, but I was wrong. I've also used "value.converter.schemas.enable" : "false" to no avail.
Would appreciate any help. I don't mind sending the data in json form or bytes form, I just want a key:value pair to be added to the event.
Thanks
As the error indicates, you can only insert fields into Structs. The RabbitMQ connector produces values with a String/Bytes schema, so to get a Struct you must chain a HoistField transform before the InsertField one.
To get a Struct from JsonConverter, your JSON needs two top-level fields named schema and payload, and the connector then needs
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/
Alternatively, use Kafka headers for "source" information, rather than trying to inject into the value
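The HoistField-then-InsertField chaining described above could look like the following. This is a sketch only: it builds the transform section of the connector config as a Python dict and prints it as JSON. The alias Hoist and the wrapper field name data are assumptions; the transform classes and their field, static.field, and static.value settings are the standard Kafka Connect ones.

```python
import json


def hoist_then_insert(wrapper_field: str, static_field: str, static_value: str) -> dict:
    """Build SMT settings that wrap the raw value in a Struct, then add a static field."""
    return {
        # Order matters: HoistField must run before InsertField.
        "transforms": "Hoist,InsertField",
        "transforms.Hoist.type": "org.apache.kafka.connect.transforms.HoistField$Value",
        "transforms.Hoist.field": wrapper_field,  # hypothetical wrapper field name
        "transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
        "transforms.InsertField.static.field": static_field,
        "transforms.InsertField.static.value": static_value,
    }


smt_config = hoist_then_insert("data", "MessageSource", "Kafka Connect framework")
print(json.dumps(smt_config, indent=2))
```

Once the value is a Struct, the value converter has to be one that can serialize a Struct (JsonConverter, for example), and a bytes field inside it will then appear Base64-encoded in the JSON output.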

MirrorSourceConnector: override consumer key.serializer property

I am trying to run MirrorSourceConnector from a Topic in cluster A to cluster B.
After creating the connector and consuming the first message, I noticed that the mirrored topic's key and value are always serialized as a ByteArray. In the case of the key, this is a bit of a problem when doing transformations with a custom class.
After checking the MirrorSourceConfig class on GitHub, I found that the source.admin. and target.admin. prefixes should let me add consumer and producer properties. But it does not seem to make any difference (in the logs I can still see that the ByteArray serializer is being used).
My connector config looks like this:
{"target.cluster.status.storage.replication.factor": "-1",
"connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
"auto.create.mirror.topics.enable": true,
"offset-syncs.topic.replication.factor": "1",
"replication.factor": "1",
"sync.topic.acls.enabled": "false",
"topics": "test-topic",
"target.cluster.config.storage.replication.factor": "-1",
"source.cluster.alias": "source-cluster-dev",
"source.cluster.bootstrap.servers": "source-cluster-dev:9092",
"target.cluster.offset.storage.replication.factor": "-1",
"target.cluster.alias": "target-cluster-dev",
"target.cluster.security.protocol": "PLAINTEXT",
"header.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"name": "test-mirror-connector",
"source.admin.key.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
"source.admin.value.deserializer":"org.apache.kafka.common.serialization.ByteArrayDeserializer",
"target.admin.key.serializer": "org.apache.kafka.common.serialization.StringDeserializer",
"target.admin.value.serializer":"org.apache.kafka.common.serialization.ByteArrayDeserializer",
"target.cluster.bootstrap.servers": "target-cluster-dev:9092"}
Is there a way to override the consumer and producer serialization/deserialization properties, or any other way to make the mirrored topic exactly the same as the source topic in terms of serialization?

Error With RowKey Definition on Confluent BigTable Sink Connector

I'm trying to use the BigTable Sink Connector from Confluent to read data from Kafka and write it into my BigTable instance, but I'm receiving the following error message:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:614)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:329)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.ConnectException: Error with RowKey definition: Row key definition was defined, but received, deserialized kafka key is not a struct. Unable to construct a row key.
at io.confluent.connect.bigtable.client.RowKeyExtractor.getRowKey(RowKeyExtractor.java:69)
at io.confluent.connect.bigtable.client.BufferedWriter.addWriteToBatch(BufferedWriter.java:84)
at io.confluent.connect.bigtable.client.InsertWriter.write(InsertWriter.java:47)
at io.confluent.connect.bigtable.BaseBigtableSinkTask.put(BaseBigtableSinkTask.java:99)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:586)
... 10 more
Due to some technical limitations, the message producer cannot produce messages with a key, so I'm using some transforms to take information from the payload and set it as the message key.
Here's my connector payload:
{
"name" : "DATALAKE.BIGTABLE.SINK.QUEUEING.ZTXXD",
"config" : {
"connector.class" : "io.confluent.connect.gcp.bigtable.BigtableSinkConnector",
"key.converter" : "org.apache.kafka.connect.storage.StringConverter",
"value.converter" : "org.apache.kafka.connect.json.JsonConverter",
"topics" : "APP-DATALAKE-QUEUEING-ZTXXD_DATALAKE-V1",
"transforms" : "HoistField,AddKeys,ExtractKey",
"gcp.bigtable.project.id" : "bigtable-project-id",
"gcp.bigtable.instance.id" : "bigtable-instance-id",
"gcp.bigtable.credentials.json" : "XXXXX",
"transforms.ExtractKey.type" : "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.HoistField.field" : "raw_data_cf",
"transforms.ExtractKey.field" : "KEY1,ATT1",
"transforms.HoistField.type" : "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.AddKeys.type" : "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.AddKeys.fields" : "KEY1,ATT1",
"row.key.definition" : "KEY1,ATT1",
"table.name.format" : "raw_ZTXXD_DATALAKE",
"consumer.override.group.id" : "svc-datalake-KAFKA_2_BIGTABLE",
"confluent.topic.bootstrap.servers" : "xxxxxx:9092",
"input.data.format" : "JSON",
"confluent.topic" : "_dsp-confluent-license",
"input.key.format" : "STRING",
"key.converter.schemas.enable" : "false",
"confluent.topic.security.protocol" : "SASL_SSL",
"row.key.delimiter" : "/",
"confluent.topic.sasl.jaas.config" : "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"XXXXX\" password=\"XXXXXX\";",
"value.converter.schemas.enable" : "false",
"auto.create.tables" : "true",
"auto.create.column.families" : "true",
"confluent.topic.sasl.mechanism" : "PLAIN"
}
}
And here's my message produced to Kafka:
{
"MANDT": "110",
"KEY1": "1",
"KEY2": null,
"ATT1": "1M",
"ATT2": "0000000000",
"TABLE_NAME": "ZTXXD_DATALAKE",
"IUUC_OPERATION": "I",
"CREATETIMESTAMP": "2022-01-24T20:26:45.247Z"
}
In my transforms I'm doing three operations:
HoistField puts my payload inside a two-level structure (the Connect docs for BigTable say that the connector expects a two-level structure in order to infer the column families).
AddKeys adds the columns that I consider key columns to the message key.
ExtractKey strips the field names from the key built in the previous step, leaving only the values themselves.
I've been reading the documentation for this connector for Bigtable and it's not clear to me if the connector works well with the JSON format. Could you let me know?
JSON should work, but...
deserialized kafka key is not a struct
This is because you have set schemas.enable=false on the value converter, so when ValueToKey runs, the value is not a Connect Struct; HoistField produces a Java Map instead.
If you're not able to use the Schema Registry and switch the serialization format, then you'll need to try and find a way to get the REST Proxy to infer the schema of the JSON message before it produces the data (I don't think it can). Otherwise, your records need to include schema and payload fields, and you need to enable schemas on the converters. Explained here
Another option: there may be a third-party transform project that sets the schema of the record, but nothing like that is built in (it's not part of SetSchemaMetadata).
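For the "schema and payload" option, a producer-side sketch follows. It wraps a flat JSON record in the envelope that JsonConverter expects when schemas.enable is true. The helper name with_connect_schema is hypothetical, and treating every field as an optional string is an assumption that fits the sample message in the question but not necessarily real data.

```python
import json


def with_connect_schema(record: dict) -> dict:
    """Wrap a flat record in a JsonConverter schema/payload envelope."""
    fields = [
        # Assumption: every column is an optional string (nulls allowed).
        {"field": name, "type": "string", "optional": True}
        for name in record
    ]
    return {
        "schema": {"type": "struct", "fields": fields, "optional": False},
        "payload": record,
    }


message = {"MANDT": "110", "KEY1": "1", "KEY2": None, "ATT1": "1M"}
envelope = with_connect_schema(message)
print(json.dumps(envelope, indent=2))
```

With messages shaped like this, value.converter.schemas.enable would be set to true so that the value reaches the transforms as a Struct.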

Kafka Connect Transformations - RegexRouter replacement topic names in lowercase

We are trying to set up a connector (Debezium) in Kafka Connect and transform all the topic names generated by this connector via regular expressions. The regex below works and detects the patterns we want, but we also need all the topic names to be created in lowercase.
We have tried putting \L$1 in the replacement expression, but it just prints an L in front of our topic names, for example LOutbound.Policy instead of outbound.policy
Does anybody know how to do this? Thanks in advance!
This is the connector curl command
curl -i -X PUT http://kafka-alpha-cp-kafka-connect:8083/connectors/kafka-bi-datacontract/config -H "Content-Type: application/json" -d '{
"name": "kafka-bi-datacontract",
"connector.class" : "io.debezium.connector.sqlserver.SqlServerConnector",
"database.hostname" : "ukdb3232123",
"database.server.name" : "ukdb3232123\\perf",
"database.port" : "12442",
"database.user" : "KafkaConnect-BI",
"database.password" : "*******",
"database.dbname" : "BeazleyIntelligenceDataContract",
"snapshot.lock.timeout.ms" : "10000000",
"table.whitelist" : "Outbound.Policy,Outbound.Section",
"database.history.kafka.bootstrap.servers" : "kafka-alpha-cp-kafka-headless:9092",
"database.history.kafka.topic": "schema-changes.bidatacontract",
"transforms": "dropTopicPrefix",
"transforms.dropTopicPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropTopicPrefix.regex":"^[^.]+.(.*)",
"transforms.dropTopicPrefix.replacement":"\\L$1"
}'
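\L$1 and \\L$1 both end up the same as L$1: Java regex replacement strings have no case-conversion operators, so the backslash merely escapes the following L.
You would need to create or find your own transform for lowercasing.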
Once you do, you can do something like this
"transforms": "dropTopicPrefix,lowerTopic",
"transforms.dropTopicPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropTopicPrefix.regex":"^[^.]+.(.*)",
"transforms.dropTopicPrefix.replacement":"$1",
"transforms.lowerTopic.type":"com.example.foo.LowerCase$Topic"
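To make the intended result concrete, here is a small Python simulation of the two steps (strip the prefix, then lowercase). It is not a Kafka Connect transform, which would have to be written in Java; the function name route_topic and the sample topic names are made up for illustration. The dot after the prefix is escaped here, which the regex in the question does not do.

```python
import re

# Same idea as the RegexRouter pattern, with the separator dot escaped.
PREFIX_PATTERN = re.compile(r"^[^.]+\.(.*)")


def route_topic(topic: str) -> str:
    """Drop everything up to the first dot (if present), then lowercase the result."""
    match = PREFIX_PATTERN.fullmatch(topic)  # the whole topic name has to match
    stripped = match.group(1) if match else topic
    return stripped.lower()  # the step a custom lowercasing transform would perform


print(route_topic("server1.Outbound.Policy"))
print(route_topic("NoPrefix"))
```

Keeping the lowercasing as a separate step mirrors the two-transform chain above: RegexRouter only rewrites the name, and a second transform changes its case.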

Kafka Connect: Topic shows 3x the number of events than expected

We are using Kafka Connect JDBC to sync tables between two databases (Debezium would be perfect for this but is out of the question).
The sync generally works fine, but there seem to be 3x as many events/messages stored in the topic as expected.
What could be the reason for this?
Some additional information
The target database contains the expected number of rows (the message count in the topics divided by 3).
Most of the topics are split into 3 partitions (Key is set via SMT, DefaultPartitioner is used).
JDBC Source Connector
{
"name": "oracle_source",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:oracle:thin:@dbdis01.allesklar.de:1521:stg_cdb",
"connection.user": "****",
"connection.password": "****",
"schema.pattern": "BBUCH",
"topic.prefix": "oracle_",
"table.whitelist": "cdc_companies, cdc_partners, cdc_categories, cdc_additional_details, cdc_claiming_history, cdc_company_categories, cdc_company_custom_fields, cdc_premium_custom_field_types, cdc_premium_custom_fields, cdc_premiums, cdc, cdc_premium_redirects, intermediate_oz_data, intermediate_oz_mapping",
"table.types": "VIEW",
"mode": "timestamp+incrementing",
"incrementing.column.name": "id",
"timestamp.column.name": "ts",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"validate.non.null": false,
"numeric.mapping": "best_fit",
"db.timezone": "Europe/Berlin",
"transforms":"createKey, extractId, dropTimestamp, deleteTransform",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "id",
"transforms.extractId.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractId.field": "id",
"transforms.dropTimestamp.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.dropTimestamp.blacklist": "ts",
"transforms.deleteTransform.type": "de.meinestadt.kafka.DeleteTransformation"
}
}
JDBC Sink Connector
{
"name": "postgres_sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"connection.url": "jdbc:postgresql://writer.branchenbuch.psql.integration.meinestadt.de:5432/branchenbuch",
"connection.user": "****",
"connection.password": "****",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.schemas.enable": true,
"insert.mode": "upsert",
"pk.mode": "record_key",
"pk.fields": "id",
"delete.enabled": true,
"auto.create": true,
"auto.evolve": true,
"topics.regex": "oracle_cdc_.*",
"transforms": "dropPrefix",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "oracle_cdc_(.*)",
"transforms.dropPrefix.replacement": "$1"
}
}
[Image: Strange Topic Count]
This isn't an answer per se, but it's easier to format here than in the comments box.
It's not clear why you'd be getting duplicates. Some possibilities would be:
You have more than one instance of the connector running
You have one instance of the connector running but have previously run other instances which loaded the same data to the topic
Data is coming from multiple tables and being merged into one topic (not possible here based on your config, but it could happen if you were using a Single Message Transform to modify the target topic name)
In terms of investigation I would suggest:
Isolate the problem by splitting the connector into one connector per table.
Examine each topic and locate examples of the duplicate messages. See if there is a pattern to which topics have duplicates. KSQL will be useful here:
SELECT ROWKEY, COUNT(*) FROM source GROUP BY ROWKEY HAVING COUNT(*) > 1
I'm guessing at ROWKEY (the key of the Kafka message) - you'll know your data and which columns should be unique and can be used to detect duplicates.
Once you've found a duplicate message, use kafkacat to examine the duplicate instances. Do they have the exact same Kafka message timestamp?
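The same duplicate check can be done outside KSQL once messages have been dumped from a topic. The sketch below is illustrative only: the records are made-up (key, timestamp) pairs and find_duplicate_keys is a hypothetical helper. It groups by key and reports the keys seen more than once, along with whether the copies share one timestamp.

```python
from collections import defaultdict


def find_duplicate_keys(records):
    """Return {key: sorted list of timestamps} for keys that occur more than once."""
    timestamps_by_key = defaultdict(list)
    for key, timestamp in records:
        timestamps_by_key[key].append(timestamp)
    return {
        key: sorted(stamps)
        for key, stamps in timestamps_by_key.items()
        if len(stamps) > 1
    }


# Made-up sample: key 1 appears three times, key 2 once.
sample = [(1, 1000), (2, 1000), (1, 1000), (1, 2000)]
duplicates = find_duplicate_keys(sample)
for key, stamps in duplicates.items():
    same_time = len(set(stamps)) == 1
    print(key, stamps, "identical timestamps" if same_time else "different timestamps")
```

Whether the copies carry identical or different timestamps is the detail the kafkacat step above is meant to surface.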
For more back and forth, StackOverflow isn't such an appropriate platform - I'd recommend heading to http://cnfl.io/slack and the #connect channel.