ExtractField$Key doesn't work with MySqlCdcSource in Confluent Cloud

I'm creating the following connector:
CREATE
SOURCE CONNECTOR IF NOT EXISTS `myconn` WITH (
"name" = 'myconn',
"connector.class" = 'MySqlCdcSource',
"tasks.max" = 1,
--Database config--------------------------
"database.hostname" = '${dbHost}',
"database.port"....
--Kafka config------------------------------
"kafka.api.key" = '${kafkaApiKey}',
"kafka.auth.mode"...
--Connector behavior------------------------
"output.data.format" = 'AVRO',
"output.key.format" = 'STRING',
"key.converter" = 'org.apache.kafka.connect.storage.StringConverter',
"key.converter.schemas.enable" = true,
"tombstones.on.delete" = true,
"null.handling.mode" = 'keep',
"include.schema.changes" = false,
"table.include.list" = 'sandbox\.(xxx|yyy)',
"errors.tolerance" = 'none',
"errors.log.enable" = true,
"errors.log.include.messages" = true,
--Topics configuration----------------------
"topic.creation.default.cleanup.policy" = 'compact',
"topic.creation.default.min.insync.replicas"...
--Predicates--------------------------------
"predicates" = 'TopicDoestHaveIdField',
"predicates.TopicDoestHaveIdField.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.TopicDoestHaveIdField.pattern" = 'myconn.sandbox\.(zzz|ooo)$',
--Transforms--------------------------------
"transforms" = 'extractKey',
"transforms.extractKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractKey.field" = 'id',
"transforms.extractKey.predicate" = 'TopicDoestHaveIdField',
"transforms.extractKey.negate" = true
);
In local development I'm working with io.debezium.connector.mysql.MySqlConnector and the transformation works correctly; the problem appears when I create the connector in Confluent Cloud (using MySqlCdcSource). It gives me the following error:
The field configured for org.apache.kafka.connect.transforms.ExtractField transform does not exist in the key or value of Kafka records. Please verify the record's key or value has the configured field.
The problem is that it doesn't find the field id, but when I test without the transformation I see this key in the topic: Struct(id=0000), so the field exists. I suppose it is related to types, or some configuration option is missing in my connector. Any ideas?

The problem was caused by incorrect usage of a predicate. Take a look at the following part:
--Predicates--------------------------------
"predicates" = 'TopicDoestHaveIdField',
"predicates.TopicDoestHaveIdField.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.TopicDoestHaveIdField.pattern" = 'myconn.sandbox\.(zzz|ooo)$',
--Transforms--------------------------------
"transforms" = 'extractKey',
"transforms.extractKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractKey.field" = 'id',
"transforms.extractKey.predicate" = 'TopicDoestHaveIdField',
"transforms.extractKey.negate" = true
As you can see, negate is being used, so the extractKey transform is applied to every topic that does not match the pattern. Even though I was using "table.include.list", some additional topics were included, and they didn't have an id field.
In my case the confusion was caused by this message in the log:
The field configured for org.apache.kafka.connect.transforms.ExtractField transform does not exist in the key or value of Kafka records. Please verify the record's key or value has the configured field.
(There is no information about the topic involved.) I assumed the message referred to one of the two allowed tables ('zzz' or 'ooo'), but in the end it was a third, additional topic that had been included.
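One way to avoid this pitfall (a sketch, not the exact fix used here) is to drop negate and instead use a predicate that positively matches the topics known to contain the id column, so the transform can never reach unexpected topics. The pattern below assumes the topics keep the same myconn.sandbox.<table> naming used above, with xxx and yyy as the placeholder table names:
--Predicates--------------------------------
"predicates" = 'TopicHasIdField',
"predicates.TopicHasIdField.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.TopicHasIdField.pattern" = 'myconn\.sandbox\.(xxx|yyy)$',
--Transforms--------------------------------
"transforms" = 'extractKey',
"transforms.extractKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractKey.field" = 'id',
"transforms.extractKey.predicate" = 'TopicHasIdField'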

Related

Tombstone disappears when ExtractField$Key transform is added to connector config

I have the following connector declared with ksqlDB:
CREATE
SOURCE CONNECTOR `myconn` WITH (
"name" = 'myconn',
"connector.class" = 'io.debezium.connector.mysql.MySqlConnector',
"tasks.max" = 1,
"database.hostname" = 'myconn-db',
"database.port" = '${dbPort}',
"database.user" = '${dbUsername}',
"database.password" = '${dbPassword}',
"database.history.kafka.topic" = 'myconn_db_history',
"database.history.kafka.bootstrap.servers" = '${bootstrapServer}',
"database.server.name" = 'myconn_db',
"database.allowPublicKeyRetrieval" = '${allowPublicKeyRetrieval}',
"table.include.list" = 'myconn.links,myconn.imports',
"message.key.columns" = 'myconn.links:id',
"tombstones.on.delete" = true,
"null.handling.mode" = 'keep',
"transforms" = 'unwrap',
"transforms.unwrap.type" = 'io.debezium.transforms.ExtractNewRecordState',
"transforms.unwrap.drop.tombstones" = false,
"transforms.unwrap.delete.handling.mode" = 'none'
);
Tombstones are successfully sent, but the key in the messages is Struct(id=00000). In order to change the key to just 00000, I've used the ExtractField$Key transform:
CREATE
SOURCE CONNECTOR `myconn` WITH (
"name" = 'myconn',
"connector.class" = 'io.debezium.connector.mysql.MySqlConnector',
"tasks.max...
--- I omit all the rest for convenience ---
"transforms" = 'unwrap,extractKey',
---New lines added (next 3)
"transforms.extractKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractKey.field" = 'id',
"include.schema.changes" = false
);
Just by adding those last three lines, the keys are now OK, but the tombstones disappear; there are no tombstones in the topic. Do you know the reason?
As you can see, I have more than one table allowed in the include list (table.include.list). The second one has a different id field; not 'id' but 'import_id'. It seems that internally that field couldn't be properly extracted, so tombstones (for all tables) were ignored.
I'm not sure what the reason for that behavior is (no errors are reported by describe connector myconn; something like 'key id not found' would have been useful), but I solved the issue by handling each topic with its proper key.
Here you have the new connector definition:
CREATE
SOURCE CONNECTOR `myconn` WITH (
"name" = 'myconn',
"connector.class" = 'io.debezium.connector.mysql.MySqlConnector',
"tasks.max" = 1,
--Database config--------------------------
"database.hostname" = 'myconn-db',
"database.port" = '${dbPort}',
"database.user" = '${dbUsername}',
"database.password" = '${dbPassword}',
"database.history.kafka.topic" = 'myconn_db_history',
"database.history.kafka.bootstrap.servers" = '${bootstrapServer}',
"database.server.name" = 'myconn_db',
"database.allowPublicKeyRetrieval" = '${allowPublicKeyRetrieval}',
"table.include.list" = 'myconn.links,myconn.imports',
--Connector behavior------------------------
"tombstones.on.delete" = true,
"null.handling.mode" = 'keep',
"include.schema.changes" = false,
--Predicates--------------------------------
"predicates" = 'TopicDoestHaveIdField,IsImportTopic',
"predicates.TopicDoestHaveIdField.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.TopicDoestHaveIdField.pattern" = 'myconn_db.myconn\.(imports)',
"predicates.IsImportTopic.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.IsImportTopic.pattern" = 'myconn_db.myconn.imports',
--Transforms--------------------------------
"transforms" = 'unwrap,extractKey,extractImportKey',
"transforms.unwrap.type" = 'io.debezium.transforms.ExtractNewRecordState',
"transforms.unwrap.drop.tombstones" = false,
"transforms.unwrap.delete.handling.mode" = 'none',
"transforms.extractKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractKey.field" = 'id',
"transforms.extractKey.predicate" = 'TopicDoestHaveIdField',
"transforms.extractKey.negate" = true,
"transforms.extractImportKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractImportKey.field" = 'import_id',
"transforms.extractImportKey.predicate" = 'IsImportTopic'
);
Now I have the tombstones in the topics and rows are properly removed from tables.
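As a quick sanity check (a sketch, assuming access to the ksqlDB CLI and the <database.server.name>.<database>.<table> topic naming implied by the config above), printing the topic should now show each deleted row as its key followed by a NULL value, i.e. the tombstone:
PRINT 'myconn_db.myconn.links' FROM BEGINNING;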

Cannot use csvSftpConnector with schema registry

I'm trying to configure a standalone producer to read CSV files from an SFTP server and send the data to a topic in the cloud.
So far I have succeeded in reading my CSV data from the file and parsing it according to my value.schema.
But now, instead of using a fixed schema in the configuration, I'd like to use Schema Registry. So I configured an Avro schema for my test topic in Confluent Cloud, generated the API key/secret, and updated my config files.
I can see that the connection is working fine: there are no authentication errors, and via the CLI I can access the test schema. But when I try to run the producer I get the following error:
[2021-09-20 16:39:53,442] INFO SftpCsvSourceConnectorConfig values:
batch.size = 1000
behavior.on.error = IGNORE
cleanup.policy = NONE
csv.case.sensitive.field.names = false
csv.escape.char = 92
csv.file.charset = UTF-8
csv.first.row.as.header = false
csv.ignore.leading.whitespace = true
csv.ignore.quotations = false
csv.keep.carriage.return = false
csv.null.field.indicator = NEITHER
csv.quote.char = 34
csv.rfc.4180.parser.enabled = false
csv.separator.char = 44
csv.skip.lines = 0
csv.strict.quotes = false
csv.verify.reader = true
empty.poll.wait.ms = 250
error.path = /home/alberto/opt/confluent-6.2.0/sftp2/error
file.minimum.age.ms = 0
finished.path = /home/alberto/opt/confluent-6.2.0/sftp2/finished
input.file.pattern = .*.csv
input.path = /home/alberto/opt/confluent-6.2.0/sftp2/data
kafka.topic = testSchema
kerberos.keytab.path =
kerberos.user.principal =
key.schema = {"name" : "com.example.users.UserKey","type" : "STRUCT","isOptional" : true,"fieldSchemas" : {"material" : {"type" : "STRING","isOptional" : true}}}
parser.timestamp.date.formats = [yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd' 'HH:mm:ss]
parser.timestamp.timezone = UTC
processing.file.extension = .PROCESSING
proxy.password = [hidden]
proxy.username =
schema.generation.enabled = false
schema.generation.key.fields = []
schema.generation.key.name = defaultkeyschemaname
schema.generation.value.name = defaultvalueschemaname
sftp.host = 192.168.1.6
sftp.password = [hidden]
sftp.port = 22
sftp.proxy.url =
sftp.username = user
timestamp.field =
timestamp.mode = PROCESS_TIME
tls.passphrase = [hidden]
tls.pemfile =
tls.private.key = [hidden]
tls.public.key = [hidden]
value.schema =
...
Caused by: org.apache.kafka.common.config.ConfigException: Both configs key.schema and value.schema must be set if schema.generation.enabled is false, but key.schema was not null and value.schema was null.
at io.confluent.connect.sftp.source.SftpSourceConnectorConfig.validateSchema(SftpSourceConnectorConfig.java:181)
at io.confluent.connect.sftp.source.SftpSourceConnectorConfig.<init>(SftpSourceConnectorConfig.java:121)
at io.confluent.connect.sftp.source.SftpCsvSourceConnectorConfig.<init>(SftpCsvSourceConnectorConfig.java:157)
at io.confluent.connect.sftp.SftpCsvSourceConnector.start(SftpCsvSourceConnector.java:44)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:184)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:209)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:348)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:331)
... 7 more
If I set schema.generation.enabled to true, it seems that it creates an empty schema:
value.schema = {"type":"STRUCT","isOptional":false,"fieldSchemas":{}}
and then I get:
org.apache.kafka.common.config.ConfigException: Failed to access Avro data from topic testSchema : Schema being registered is incompatible with an earlier schema for subject "testSchema-value"; error code: 409; error code: 409
as if it's trying to register a schema, which is not what I want; I just need to fetch the schema from the registry and use it.
If anyone needs any additional information regarding the configuration, I'll be happy to provide it.
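For reference, the exception at the top of the log is explicit: when schema.generation.enabled is false, both key.schema and value.schema have to be set. A minimal sketch of a value.schema entry for the connector properties, mirroring the Connect-schema JSON format already shown for key.schema (the field names material and quantity are purely illustrative, and this still defines the schema inline rather than fetching it from Schema Registry):
# value.schema in the same Connect-schema JSON format as key.schema; field names are illustrative only
value.schema={"name" : "com.example.users.UserValue","type" : "STRUCT","isOptional" : false,"fieldSchemas" : {"material" : {"type" : "STRING","isOptional" : true},"quantity" : {"type" : "INT64","isOptional" : true}}}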

How to rename/replace a field within a struct in Kafka-connect SMT?

The description of the ReplaceField SMT says it can "Filter or rename fields within a Struct or Map". However, I can't find any working example of replacing or renaming fields within a struct.
I've got data in a topic being written into Elasticsearch using the Kafka Connect Elasticsearch Sink. For simplicity, assume the format of the data looks like this:
{
  "ID": 22,
  "ITEM": "Shampoo",
  "USER": {
    "NAME": "jon",
    "AGE": 25
  }
}
So if I'm trying to rename/replace USER.NAME or USER.AGE, how would I configure that in the connector? (I've written everything in ksqlDB.) This is my current config, where I rename ITEM to product and ID to id:
CREATE SINK CONNECTOR ELASTIC_SINK WITH (
'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
'connection.url' = 'http://host.docker.internal:9200',
'type.name' = '_doc',
'topics' = 'ELASTIC_TOPIC',
'key.ignore' = 'false',
'schema.ignore' = 'true',
'transforms' = 'RenameField',
'transforms.RenameField.type' = 'org.apache.kafka.connect.transforms.ReplaceField$Value',
'transforms.RenameField.renames' = 'ITEM:product,ID:id'
);
Take a look at the existing SO question and answer: https://stackoverflow.com/a/56601093/4778022
You can provide the path to the field to rename, with parts separated by periods.
CREATE SINK CONNECTOR ELASTIC_SINK WITH (
'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
'connection.url' = 'http://host.docker.internal:9200',
'type.name' = '_doc',
'topics' = 'ELASTIC_TOPIC',
'key.ignore' = 'false',
'schema.ignore' = 'true',
'transforms' = 'RenameField',
'transforms.RenameField.type' = 'org.apache.kafka.connect.transforms.ReplaceField$Value',
'transforms.RenameField.renames' = 'USER.NAME:name,ITEM:product,ID:id'
);

Flink SQL CLI client CREATE TABLE from Kafka

I am trying to create a table in the Apache Flink SQL client. I want to filter my JSON data in Flink, which arrives continuously from a Kafka cluster.
The JSON looks like this:
{"lat":25.77,"lon":-80.19,"timezone":"America\/New_York",
"timezone_offset":-14400,
"current.dt":1592151550,
"current.sunrise":1592130546,
"current.sunset":1592179999,
"current.temp":302.77,
"current.feels_like":306.9,
"current.pressure":1017,
"current.humidity":78,
"current.dew_point":298.52,
"current.uvi":11.97,
"current.clouds":75,
"current.visibility":16093,
"current.wind_speed":3.6,
"current.wind_deg":60,
"current.weather.0.id":803,
"current.weather.0.main":"Clouds",
"current.weather.0.description":"broken clouds",
"current.weather.0.icon":"04d"}
The part I am interested in:
"current.weather.0.description":"broken clouds"
I want to filter my data whenever the current.weather description is "moderate rain". I tried to create two tables in Flink:
the Rain table, where the whole JSON arrives, and
the ProcessedRain table, where my filtered data will be stored and sent back to another Kafka cluster.
CREATE TABLE Rain (current.weather.0.description varchar) WITH ('connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topic' = 'WeatherRawData',
'format.type' = 'json',
'connector.properties.0.key' = 'bootstrap.servers',
'connector.properties.0.value' = 'kafka:9092',
'connector.properties.1.key' = 'group.id',
'connector.properties.1.value' = 'flink-input-group',
'connector.startup-mode' = 'earliest-offset'
);
CREATE TABLE ProcessedRain(
current.weather.0.description varchar
) WITH (
'connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topic' = 'WeatherProcessedData',
'format.type' = 'json',
'connector.properties.0.key' = 'bootstrap.servers',
'connector.properties.0.value' = 'kafka:9092',
'connector.properties.1.key' = 'group.id',
'connector.properties.1.value' = 'flink-output-group'
);
The error message I get:
[ERROR] Could not execute SQL statement. Reason: org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered "current" at line 1, column 20. Was expecting one of:
"PRIMARY" ...
"UNIQUE" ...
"WATERMARK" ...
<BRACKET_QUOTED_IDENTIFIER> ...
<QUOTED_IDENTIFIER> ...
<BACK_QUOTED_IDENTIFIER> ...
<IDENTIFIER> ...
<UNICODE_QUOTED_IDENTIFIER> ...
How should my CREATE TABLE statement be written correctly?
I think it should be
CREATE TABLE ProcessedRain (
`current.weather.0.description` VARCHAR
) WITH (
'connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topic' = 'WeatherProcessedData',
'format.type' = 'json',
'connector.properties.bootstrap.servers' = 'kafka:9092',
'connector.properties.group.id' = 'flink-output-group'
);
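With both tables declared that way, the filtering itself could be a plain INSERT INTO ... SELECT (a sketch, assuming the Rain table is also declared with the same back-quoted column name):
-- Copy only the records whose description is 'moderate rain' from Rain into ProcessedRain
INSERT INTO ProcessedRain
SELECT `current.weather.0.description`
FROM Rain
WHERE `current.weather.0.description` = 'moderate rain';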

WSO2 SP - Kafka source with JSON attributes

I'm trying to read JSON data from Kafka, using the following code:
#source(type = 'kafka', bootstrap.servers = 'localhost:9092', topic.list = 'TestTopic',
group.id = 'test', threading.option = 'single.thread', #map(type = 'json'))
define stream myDataStream (json object);
But it failed with the following error:
[2019-03-27_11-39-32_103] ERROR
{org.wso2.extension.siddhi.map.json.sourcemapper.JsonSourceMapper} -
Stream "myDataStream" does not have an attribute named "ABC",
but the received event {"event":{"ABC":"1"}} does. Hence dropping the message.
Check whether the json string is in a correct format for default mapping.
I've tried adding the attributes
#source(type = 'kafka', bootstrap.servers = 'localhost:9092',
topic.list = 'TestTopic', group.id = 'test',
threading.option = 'single.thread',
#map(type = 'json', #attributes(ABC = '$.ABC')))
Syntax error:
Error at 'json' defined at stream 'myDataStream', attribute 'json' is
not mapped
Any help would be greatly appreciated.
There is an error in the syntax of the stream definition; it should be:
define stream myDataStream (ABC string);
Here the attribute name must match the key in the JSON messages, in this case ABC.
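Putting it together, a minimal sketch of the combined source and stream definition (written with Siddhi's @source/@map annotation syntax; the broker, topic and group values are the ones from the question):
-- Map incoming JSON events onto a stream whose attribute name matches the JSON key (ABC)
@source(type = 'kafka', bootstrap.servers = 'localhost:9092', topic.list = 'TestTopic',
group.id = 'test', threading.option = 'single.thread', @map(type = 'json'))
define stream myDataStream (ABC string);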