I'm trying to read JSON data from Kafka using the following code:
#source(type = 'kafka', bootstrap.servers = 'localhost:9092', topic.list = 'TestTopic',
group.id = 'test', threading.option = 'single.thread', #map(type = 'json'))
define stream myDataStream (json object);
But it failed with the following error:
[2019-03-27_11-39-32_103] ERROR
{org.wso2.extension.siddhi.map.json.sourcemapper.JsonSourceMapper} -
Stream "myDataStream" does not have an attribute named "ABC",
but the received event {"event":{"ABC":"1"}} does. Hence dropping the message.
Check whether the json string is in a correct format for default mapping.
I've also tried adding an explicit attribute mapping:
#source(type = 'kafka', bootstrap.servers = 'localhost:9092',
topic.list = 'TestTopic', group.id = 'test',
threading.option = 'single.thread',
#map(type = 'json', #attributes(ABC = '$.ABC')))
This fails with a syntax error:
Error at 'json' defined at stream 'myDataStream', attribute 'json' is
not mapped
Any help would be greatly appreciated.
There is an error in the stream definition. It should be:
define stream myDataStream (ABC string);
The attribute name must match the key of the JSON message, in this case ABC.
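Putting the answer together with the source definition from the question (only the stream attributes change; the broker, topic, and mapping settings are taken unchanged from the question):
#source(type = 'kafka', bootstrap.servers = 'localhost:9092', topic.list = 'TestTopic',
group.id = 'test', threading.option = 'single.thread', #map(type = 'json'))
define stream myDataStream (ABC string);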
I'm creating the following connector:
CREATE
SOURCE CONNECTOR IF NOT EXISTS `myconn` WITH (
"name" = 'myconn',
"connector.class" = 'MySqlCdcSource',
"tasks.max" = 1,
--Database config--------------------------
"database.hostname" = '${dbHost}',
"database.port"....
--Kafka config------------------------------
"kafka.api.key" = '${kafkaApiKey}',
"kafka.auth.mode"...
--Connector behavior------------------------
"output.data.format" = 'AVRO',
"output.key.format" = 'STRING',
"key.converter" = 'org.apache.kafka.connect.storage.StringConverter',
"key.converter.schemas.enable" = true,
"tombstones.on.delete" = true,
"null.handling.mode" = 'keep',
"include.schema.changes" = false,
"table.include.list" = 'sandbox\.(xxx|yyy)',
"errors.tolerance" = 'none',
"errors.log.enable" = true,
"errors.log.include.messages" = true,
--Topics configuration----------------------
"topic.creation.default.cleanup.policy" = 'compact',
"topic.creation.default.min.insync.replicas"...
--Predicates--------------------------------
"predicates" = 'TopicDoestHaveIdField',
"predicates.TopicDoestHaveIdField.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.TopicDoestHaveIdField.pattern" = 'myconn.sandbox\.(zzz|ooo)$',
--Transforms--------------------------------
"transforms" = 'extractKey',
"transforms.extractKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractKey.field" = 'id',
"transforms.extractKey.predicate" = 'TopicDoestHaveIdField',
"transforms.extractKey.negate" = true
);
In local development I'm working with io.debezium.connector.mysql.MySqlConnector and the transformation works correctly; the problem appears when I create the connector in Confluent Cloud (using MySqlCdcSource). It gives me the following error:
The field configured for org.apache.kafka.connect.transforms.ExtractField transform does not exist in the key or value of Kafka records. Please verify the record's key or value has the configured field.
The problem is that it doesn't find the field id, but when I test without the transformation I see this key in the topic: Struct(id=0000), so the field exists. I suppose it is related to types, or some configuration option is missing in my connector. Any ideas?
The problem was related to incorrect usage of the predicate. Take a look at the following part:
--Predicates--------------------------------
"predicates" = 'TopicDoestHaveIdField',
"predicates.TopicDoestHaveIdField.type" = 'org.apache.kafka.connect.transforms.predicates.TopicNameMatches',
"predicates.TopicDoestHaveIdField.pattern" = 'myconn.sandbox\.(zzz|ooo)$',
--Transforms--------------------------------
"transforms" = 'extractKey',
"transforms.extractKey.type" = 'org.apache.kafka.connect.transforms.ExtractField$Key',
"transforms.extractKey.field" = 'id',
"transforms.extractKey.predicate" = 'TopicDoestHaveIdField',
"transforms.extractKey.negate" = true
As you can see, negate is being used, so the extractKey transform applies to all the topics that do not match the predicate. Despite my use of "table.include.list", some additional topics were included, and they didn't have an id field.
In my case the confusion was caused by the message in the log:
The field configured for org.apache.kafka.connect.transforms.ExtractField transform does not exist in the key or value of Kafka records. Please verify the record's key or value has the configured field.
(There is no information about the topic involved.) I assumed the message was about one of the two allowed tables ('zzz' or 'ooo'), but in the end it was a third topic that had been included.
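To make the behaviour concrete (topic names taken from this configuration; the third topic that actually caused the error is not shown in the question): with negate = true the predicate is inverted, so extractKey runs on every topic that does not match the pattern.
myconn.sandbox.zzz -> matches 'myconn.sandbox\.(zzz|ooo)$' -> extractKey is skipped
myconn.sandbox.xxx -> does not match the pattern -> extractKey runs and requires an 'id' field in the record key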
ClickHouse version 21.12.3.32. I'm following this PR (https://github.com/ClickHouse/ClickHouse/pull/21850) to handle incorrect messages from a Kafka topic, but after some investigation I've found that if a single message contains broken data, the whole batch of received messages cannot be parsed, which can lead to data loss.
Kafka engine table:
CREATE TABLE default.kafka_engine (message String)
ENGINE = Kafka
SETTINGS
kafka_broker_list = 'kafka:9092',
kafka_topic_list = 'topic',
kafka_group_name = 'group',
kafka_format = 'JSONAsString',
kafka_row_delimiter = '\n',
kafka_num_consumers = 1,
kafka_handle_error_mode ='stream';
Example of broken message: [object Object]
First message error: JSON object must begin with '{'.: (at row 1).
Other messages error: Cannot parse input: expected ']' at end of stream..
Is it possible to skip just the broken message and correctly parse the other messages in a batch received from the Kafka topic?
Changing the kafka_format to RawBLOB fixed my issue.
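For reference, a sketch of the same table definition with only the format swapped; kafka_row_delimiter is omitted here because RawBLOB reads each whole message into the single String column:
CREATE TABLE default.kafka_engine (message String)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'topic',
    kafka_group_name = 'group',
    kafka_format = 'RawBLOB',
    kafka_num_consumers = 1,
    kafka_handle_error_mode = 'stream';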
I'm trying to configure a standalone producer to read CSV files from an SFTP server and send the data to a topic in the cloud.
So far I have succeeded in reading my CSV data from the file and parsing it according to my value.schema.
But now, instead of using a fixed configuration schema, I'd like to use the Schema Registry. So I configured an Avro schema for my test topic on Confluent Cloud, generated the API key/secret, and updated my config files.
I can see that the connection is working fine (there are no authentication errors, and via the CLI I can access the test schema), but when I try to run the producer I get the following error:
[2021-09-20 16:39:53,442] INFO SftpCsvSourceConnectorConfig values:
batch.size = 1000
behavior.on.error = IGNORE
cleanup.policy = NONE
csv.case.sensitive.field.names = false
csv.escape.char = 92
csv.file.charset = UTF-8
csv.first.row.as.header = false
csv.ignore.leading.whitespace = true
csv.ignore.quotations = false
csv.keep.carriage.return = false
csv.null.field.indicator = NEITHER
csv.quote.char = 34
csv.rfc.4180.parser.enabled = false
csv.separator.char = 44
csv.skip.lines = 0
csv.strict.quotes = false
csv.verify.reader = true
empty.poll.wait.ms = 250
error.path = /home/alberto/opt/confluent-6.2.0/sftp2/error
file.minimum.age.ms = 0
finished.path = /home/alberto/opt/confluent-6.2.0/sftp2/finished
input.file.pattern = .*.csv
input.path = /home/alberto/opt/confluent-6.2.0/sftp2/data
kafka.topic = testSchema
kerberos.keytab.path =
kerberos.user.principal =
key.schema = {"name" : "com.example.users.UserKey","type" : "STRUCT","isOptional" : true,"fieldSchemas" : {"material" : {"type" : "STRING","isOptional" : true}}}
parser.timestamp.date.formats = [yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd' 'HH:mm:ss]
parser.timestamp.timezone = UTC
processing.file.extension = .PROCESSING
proxy.password = [hidden]
proxy.username =
schema.generation.enabled = false
schema.generation.key.fields = []
schema.generation.key.name = defaultkeyschemaname
schema.generation.value.name = defaultvalueschemaname
sftp.host = 192.168.1.6
sftp.password = [hidden]
sftp.port = 22
sftp.proxy.url =
sftp.username = user
timestamp.field =
timestamp.mode = PROCESS_TIME
tls.passphrase = [hidden]
tls.pemfile =
tls.private.key = [hidden]
tls.public.key = [hidden]
value.schema =
...
Caused by: org.apache.kafka.common.config.ConfigException: Both configs key.schema and value.schema must be set if schema.generation.enabled is false, but key.schema was not null and value.schema was null.
at io.confluent.connect.sftp.source.SftpSourceConnectorConfig.validateSchema(SftpSourceConnectorConfig.java:181)
at io.confluent.connect.sftp.source.SftpSourceConnectorConfig.<init>(SftpSourceConnectorConfig.java:121)
at io.confluent.connect.sftp.source.SftpCsvSourceConnectorConfig.<init>(SftpCsvSourceConnectorConfig.java:157)
at io.confluent.connect.sftp.SftpCsvSourceConnector.start(SftpCsvSourceConnector.java:44)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:184)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:209)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:348)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:331)
... 7 more
If I set schema.generation.enabled to true, it seems that it creates an empty schema:
value.schema = {"type":"STRUCT","isOptional":false,"fieldSchemas":{}}
and then I get:
org.apache.kafka.common.config.ConfigException: Failed to access Avro data from topic testSchema : Schema being registered is incompatible with an earlier schema for subject "testSchema-value"; error code: 409; error code: 409
as if it's trying to register a schema, which is not what I want; I just need to fetch the schema from the registry and use it.
If anyone needs any additional information about the configuration, I'll be happy to provide it.
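For what it's worth, the ConfigException above means that with schema.generation.enabled = false, value.schema has to be set explicitly in the same JSON format as key.schema. A rough sketch of what that property could look like; the schema name and field names below are hypothetical, not taken from the real topic:
# hypothetical value.schema, mirroring the key.schema format above
value.schema={"name" : "com.example.users.UserValue","type" : "STRUCT","isOptional" : true,"fieldSchemas" : {"material" : {"type" : "STRING","isOptional" : true},"description" : {"type" : "STRING","isOptional" : true}}}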
The description of the ReplaceField SMT says it can "Filter or rename fields within a Struct or Map". However, I can't find any working example of replacing or renaming fields within a struct.
I've got data in a topic being written into Elasticsearch using the Kafka Connect Elasticsearch Sink. For simplicity, assume the data looks like this:
{
  "ID": 22,
  "ITEM": "Shampoo",
  "USER": {
    "NAME": "jon",
    "AGE": 25
  }
}
So if I'm trying to rename/replace USER.NAME or USER.AGE, how would I configure that in the connector? (I've written everything in ksqlDB.) This is my current config, where I rename ITEM to product and ID to id:
CREATE SINK CONNECTOR ELASTIC_SINK WITH (
'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
'connection.url' = 'http://host.docker.internal:9200',
'type.name' = '_doc',
'topics' = 'ELASTIC_TOPIC',
'key.ignore' = 'false',
'schema.ignore' = 'true',
'transforms' = 'RenameField',
'transforms.RenameField.type' = 'org.apache.kafka.connect.transforms.ReplaceField$Value',
'transforms.RenameField.renames' = 'ITEM:product,ID:id'
);
Take a look at the existing SO question and answer: https://stackoverflow.com/a/56601093/4778022
You can provide the path to the field to rename, with parts separated by periods.
CREATE SINK CONNECTOR ELASTIC_SINK WITH (
'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
'connection.url' = 'http://host.docker.internal:9200',
'type.name' = '_doc',
'topics' = 'ELASTIC_TOPIC',
'key.ignore' = 'false',
'schema.ignore' = 'true',
'transforms' = 'RenameField',
'transforms.RenameField.type' = 'org.apache.kafka.connect.transforms.ReplaceField$Value',
'transforms.RenameField.renames' = 'USER.NAME:name,ITEM:product,ID:id'
);
I have sample Python code and I am trying to construct and populate REST API request parameters.
The headers and authorization params are working fine, but I am not sure how to translate the "queryBands" and "data" variables below for my REST request using a REST client.
import json
import urllib2  # the original snippet targets Python 2's urllib2

# Query band name/value pairs sent along with the request.
queryBands = {}
queryBands['appName'] = 'MyApp'
queryBands['version'] = '1.0'

# Setting request fields, including SQL.
data = {}
data['query'] = 'SELECT * from db limit 5'
data['queryBands'] = queryBands
data['format'] = 'array'

# url and headers (including Authorization) are defined earlier in my script.
request = urllib2.Request(url, json.dumps(data), headers)
try:
    response = urllib2.urlopen(request)
except urllib2.HTTPError as e:
    print(e.read())
Should I declare new variables, or pass these values as the "body" when making the REST API call?
I am using the Chrome Advanced REST Client, but it could be any REST client.
import json
queryBands = {}
queryBands['applicationName'] = 'MyApp'
queryBands['version'] = '1.0'
data = {}
data['query'] = 'SELECT * from db limit 5'
data['queryBands'] = queryBands
data['format'] = 'array'
print(json.dumps(data))
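This prints the JSON document below (key order may vary). In a REST client such as the Advanced REST Client, this string goes into the request body, together with the same headers (including Authorization) that already work:
{"query": "SELECT * from db limit 5", "queryBands": {"applicationName": "MyApp", "version": "1.0"}, "format": "array"}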