kafka SMT keeps failing to extract json field to use as message key - apache-kafka

I am using leneses.io s3 source connector to read json files and trying to set message key using SMT.
Here is the config used for connector on AWS MSK
connector.class=io.lenses.streamreactor.connect.aws.s3.source.S3SourceConnector
tasks.max=1
topics=topic_3
connect.s3.vhost.bucket=true
connect.s3.aws.auth.mode=Credentials
connect.s3.aws.access.key=<<access key>>
connect.s3.aws.region=eu-central-1
connect.s3.aws.secret.key=<<secret key>>
schema.enable=false
connect.s3.kcql=INSERT INTO topic_3 SELECT * FROM bucket1:json STOREAS `JSON` WITH_FLUSH_COUNT = 1
aws.region=eu-central-1
aws.custom.endpoint=https://s3.eu-central-1.amazonaws.com
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms=createKey
key.converter.schemas.enable=false
transforms.createKey.fields=id
value.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.storage.StringConverter
I can't get the SMT to work and running into below error
[Worker-0d3e3af50908b12ee] [2022-04-13 11:43:08,461] ERROR [dev2-s3-source-connector-4|task-0] Error encountered in task dev2-s3-source-connector-4-0. Executing stage 'TRANSFORMATION' with class 'org.apache.kafka.connect.transforms.ValueToKey'. (org.apache.kafka.connect.runtime.errors.LogReporter:66)
[Worker-0d3e3af50908b12ee] org.apache.kafka.connect.errors.DataException: Only Map objects supported in absence of schema for [copying fields from value to key], found: java.lang.String
P.s. if the SMT commands were removed from config then json files are being read into kafka topic with no issues (but the message key is empty)

Related

How to solve my error in redshiftsinkconnector

I try to connect kafka and redshift in the redshiftsink connector.the connector is running and the task is
enter image description here failed .
Your error - Failed to deserialize data in topic ... to Avro
So, if your data is not Avro, then change your key.converter and/or value.converter to the appropriate config. You need to consult your Producer code for the matching serializers.

SF_KAFKA_CONNECTOR name is empty or invalid error using Confluent Cloud and Snowflake Kafka Connector

I have a cluster running in Confluent Cloud and am able to Produce and Consume data using other applications. However, when I try to hook up the Snowflake Kafka Connector I receive these errors:
[2019-10-15 22:12:08,979] INFO Creating connector source-snowflake of type com.snowflake.kafka.connector.SnowflakeSinkConnector (org.apache.kafka.connect.runtime.Worker)
[2019-10-15 22:12:08,983] INFO Instantiated connector source-snowflake with version 0.5.1 of type class com.snowflake.kafka.connector.SnowflakeSinkConnector (org.apache.kafka.connect.runtime.Worker)
[2019-10-15 22:12:08,986] INFO
[SF_KAFKA_CONNECTOR] Snowflake Kafka Connector Version: 0.5.1 (com.snowflake.kafka.connector.Utils)
[2019-10-15 22:12:09,029] INFO
[SF_KAFKA_CONNECTOR] SnowflakeSinkConnector:start (com.snowflake.kafka.connector.SnowflakeSinkConnector)
[2019-10-15 22:12:09,030] ERROR
[SF_KAFKA_CONNECTOR] name is empty or invalid. It should match Snowflake object identifier syntax. Please see the documentation. (com.snowflake.kafka.connector.Utils)
[2019-10-15 22:12:09,033] ERROR WorkerConnector{id=source-snowflake} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector)
com.snowflake.kafka.connector.internal.SnowflakeKafkaConnectorException:
[SF_KAFKA_CONNECTOR] Exception: Invalid input connector configuration
[SF_KAFKA_CONNECTOR] Error Code: 0001
[SF_KAFKA_CONNECTOR] Detail: input kafka connector configuration is null, missing required values, or wrong input value
at com.snowflake.kafka.connector.internal.SnowflakeErrors.getException(SnowflakeErrors.java:347)
at com.snowflake.kafka.connector.internal.SnowflakeErrors.getException(SnowflakeErrors.java:306)
at com.snowflake.kafka.connector.Utils.validateConfig(Utils.java:400)
at com.snowflake.kafka.connector.SnowflakeSinkConnector.start(SnowflakeSinkConnector.java:131)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:111)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:136)
at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:196)
at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:252)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:1079)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300(DistributedHerder.java:117)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$15.call(DistributedHerder.java:1095)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$15.call(DistributedHerder.java:1091)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Here is my scrubbed Snowflake config file:
{
"name":"snowsink",
"config":{
"connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector",
"tasks.max":"8",
"topics":"tp-snow-test",
"buffer.count.records":"100",
"buffer.flush.time":"60",
"buffer.size.bytes":"65536",
"snowflake.url.name":"xxxxxxx.east-us-2.azure.snowflakecomputing.com",
"snowflake.user.name":"svc_cc_strm",
"snowflake.private.key":"<key>",
"snowflake.private.key.passphrase":<password>,
"snowflake.database.name":"testdb",
"snowflake.schema.name":"test1",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"com.snowflake.kafka.connector.records.SnowflakeJsonConverter"
}
}
Any ideas? Thanks.
The name of the connector should be a valid SQL identifier to Snowflake. So many of the kafka topic examples have dashes in them that when I first tried the Snowflake Kafka connector I got this same error.
According to the documentation, a Snowflake pipe is created using the connector_name specified, and pipe names must be valid SQL identifiers.
The connector creates one pipe for each topic partition. The name is:
SNOWFLAKE_KAFKA_CONNECTOR_PIPE_.
Also from the same doc page at "Fields in the Configuration File" for name:
Application name. This must be unique across all Kafka connectors used by the customer. This name name must be a valid Snowflake unquoted identifier.
If the topic has a dash in it then it will need to mapped to a table name that is also a proper SQL identifier in your connector config, otherwise it will try to create the table name as the same as the topic name and fail on the "-" in the name.
You need to change the name of your connector (source-snow) to remove the - from it (so that it matches this validation pattern).
🤷‍♂️
You need to have below entry in your config file , below topics entry.
"topics":"tp-snow-test",
"snowflake.topic2table.map": "tp-snow-test:TestKafkaTable",

MongoDB Kafka Connector not generating the message key with the Mongo document id

I'm using the beta release of the MongoDB Kafka Connector to publish from MongoDB to a Kafka topic.
Messages are generated into Kafka but their key is null when it should be the document id:
This is my connect standalone config:
bootstrap.servers=xxx:9092
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter you want to apply
# it to
key.converter.schemas.enable=false
value.converter.schemas.enable=false
# The internal converter used for offsets and config data is configurable and must be specified, but most users will
# always want to use the built-in default. Offset and config data is never visible outside of Kafka Connect in this format.
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
And the mongodb source properties:
name=mongo-source
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
tasks.max=1
# Connection and source configuration
connection.uri=mongodb+srv://xxx
database=mydb
collection=mycollection
topic.prefix=someprefix
poll.max.batch.size=1000
poll.await.time.ms=5000
# Change stream options
pipeline=[]
batch.size=0
change.stream.full.document=updateLookup
collation=
Below there's an example of a message String value:
"{\"_id\": {\"_data\": \"xxx\"}, \"operationType\": \"replace\", \"clusterTime\": {\"$timestamp\": {\"t\": 1564140389, \"i\": 1}}, \"fullDocument\": {\"_id\": \"5\", \"name\": \"Some Client\", \"clientId\": \"someclient\", \"clientSecret\": \"1234\", \"whiteListedIps\": [], \"enabled\": true, \"_class\": \"myproject.Client\"}, \"ns\": {\"db\": \"mydb\", \"coll\": \"mycollection\"}, \"documentKey\": {\"_id\": \"5\"}}"
I tried using a transform to extract if from the value, specifically from the documentKey field:
transforms=InsertKey
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=documentKey
But got an exception:
Caused by: org.apache.kafka.connect.errors.DataException: Only Struct objects supported for [copying fields from value to key], found: java.lang.String
at org.apache.kafka.connect.transforms.util.Requirements.requireStruct(Requirements.java:52)
at org.apache.kafka.connect.transforms.ValueToKey.applyWithSchema(ValueToKey.java:79)
at org.apache.kafka.connect.transforms.ValueToKey.apply(ValueToKey.java:65)
Any ideas to generate a key with the document id?
According to exception, that is thrown:
Caused by: org.apache.kafka.connect.errors.DataException: Only Struct objects supported for [copying fields from value to key], found: java.lang.String
at org.apache.kafka.connect.transforms.util.Requirements.requireStruct(Requirements.java:52)
at org.apache.kafka.connect.transforms.ValueToKey.applyWithSchema(ValueToKey.java:79)
at org.apache.kafka.connect.transforms.ValueToKey.apply(ValueToKey.java:65)
Unfortunately Mongo DB connector, that you use, it doesn't create properly schema.
Above connector create Record with key and value schema's as String.
Check this line:: How record is created by connector. That is the reason why you can't apply Transformation to it
This should be supported in release 1.3.0:
https://jira.mongodb.org/browse/KAFKA-40

Kafka Connect Hdfs Sink Connector - Class io.confluent.connect.hdfs.string.StringFormat could not be found

Hi i am trying to move csv data from kafka to hdfs using hdfs sink connector and below are the properties i used
Connect.properties
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
format.class=io.confluent.connect.hdfs.string.StringFormat
tasks.max=1
topics=topic_name
hadoop.conf.dir=/etc/hadoop/conf
hdfs.url=hdfs://nameservice1/dir
flush.size=3
hdfs.authentication.kerberos=true
connect.hdfs.principal=principal
connect.hdfs.keytab=principal.keytab
hdfs.namenode.principal=principal
partitioner.class=io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner
partition.duration.ms=300000
path.format=path.format='year'=YYYY/'month'=MM/'day'=dd
locale=en
timezone=EST
worker properties
bootstrap.servers=kafkaserver
plugin.path=/opt/confluent/share/java
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.file.filename=/tmp/connect.offsets
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
consumer.group.id=connect_group
consumer.auto.offset.reset=earliest
and i use confluent-5.0.1
but i get the below exception when i run kafka connect
java.util.concurrent.ExecutionException:
org.apache.kafka.connect.runtime.rest.errors.BadRequestException:
Connector configuration is invalid and contains the following 1
error(s): Invalid value io.confluent.connect.hdfs.string.StringFormat
for configuration format.class: Class
io.confluent.connect.hdfs.string.StringFormat could not be found.**
You can also find the above list of errors at the endpoint
/{connectorType}/config/validate at
org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:79)
at
org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:66)
at
org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:110)
Caused by:
org.apache.kafka.connect.runtime.rest.errors.BadRequestException:
Connector configuration is invalid and contains the following 1
error(s): Invalid value io.confluent.connect.hdfs.string.StringFormat
for configuration format.class: Class
io.confluent.connect.hdfs.string.StringFormat could not be found. You
can also find the above list of errors at the endpoint
/{connectorType}/config/validate at
org.apache.kafka.connect.runtime.AbstractHerder.maybeAddConfigErrors(AbstractHerder.java:423)
at
org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:189)
at
org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:107)

Can I write custom kafka connect transform for converting JSON to AVRO?

I want to use kafka-connect-hdfs for writing schemaless json records from kafka to hdfs file.
If I am using JsonConvertor as key/value convertor then it is not working. But if I am using StringConvertor then it writes the json as escaped string.
For example:
actual json -
{"name":"test"}
data written to hdfs file -
"{\"name\":\"test\"}"
expected output to hdfs file -
{"name":"test"}
Is there any way or alternative I can achieve this or I have to use it with schema only?
Below is the exception I get when I try to use JSONConvertor:
[2017-09-06 14:40:19,344] ERROR Task hdfs-sink-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:148)
org.apache.kafka.connect.errors.DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:308)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:406)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:250)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:180)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:148)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Configuration of quickstart-hdfs.properties:
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test_hdfs_avro
hdfs.url=hdfs://localhost:9000
flush.size=1
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
Configuration of connect-avro-standalone.properties:
bootstrap.servers=localhost:9092
schemas.enable=false
key.converter.schemas.enable=false
value.converter.schemas.enable=false
When you specify a converter in you connector's configuration properties you need to include all the properties pertaining to this converter, regardless of whether such properties are included in the worker's config too.
In the above example, you'll need to specify both:
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
in quickstart-hdfs.properties.
FYI, JSON export is coming up in the HDFS Connector soon. Track the related pull request here: https://github.com/confluentinc/kafka-connect-hdfs/pull/196
Update: JsonFormat has been merged to master branch.