Snowflake Kafka connector config issue - apache-kafka

I'm following the steps in this guide: Snowflake Connector for Kafka.
The error message I'm getting is:
BadRequestException: Connector config {.....} contains no connector type
I am running the command as
sh kafka_2.12-2.3.0/bin/connect-standalone.sh connect-standalone.properties snowflake_kafka_config.json
My config files are:
connect-standalone.properties
bootstrap.servers=localhost:9092
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/Users/kafka_test/kafka
The jar file snowflake-kafka-connector-0.5.1.jar is in plugin.path
snowflake_kafka_config.json
{
  "name": "Kafka_Test",
  "Config": {
    "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "tasks.max": "8",
    "topics": "test",
    "snowflake.topic2table.map": "",
    "buffer.count.records": "1",
    "buffer.flush.time": "60",
    "buffer.size.bytes": "65536",
    "snowflake.url.name": "<url>",
    "snowflake.user.name": "<user_name>",
    "snowflake.private.key": "<private_key>",
    "snowflake.private.key.passphrase": "<pass_phrase>",
    "snowflake.database.name": "<db>",
    "snowflake.schema.name": "<schema>",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
    "value.converter.schema.registry.url": "",
    "value.converter.basic.auth.credentials.source": "",
    "value.converter.basic.auth.user.info": ""
  }
}
Kafka is running locally; I have a producer and consumer up and can see the data flowing.

This is the same question I answered over on the Confluent community Slack, but I'll post it here for reference too :-)
The connect worker log shows that the connector JAR itself is being loaded, so the `contains no connector type` error is because your config formatting is fubar.
You're running in Standalone mode, but passing in a JSON file, which won't work. My personal opinion is to always use distributed mode, even if it's just a single node. Check this out if you need a recap on standalone vs distributed: http://rmoff.dev/ksldn19-kafka-connect
If you must use standalone then you need your connector config (snowflake_kafka_config.json) to be a properties file like this:
param1=argument1
param2=argument2
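For example, here's a rough sketch of your Snowflake config flattened into that properties format, using the same keys and placeholder values as your JSON (saved as e.g. snowflake_kafka_config.properties; I've left out the empty schema registry / basic auth values):
name=Kafka_Test
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=8
topics=test
snowflake.topic2table.map=
buffer.count.records=1
buffer.flush.time=60
buffer.size.bytes=65536
snowflake.url.name=<url>
snowflake.user.name=<user_name>
snowflake.private.key=<private_key>
snowflake.private.key.passphrase=<pass_phrase>
snowflake.database.name=<db>
snowflake.schema.name=<schema>
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.snowflake.kafka.connector.records.SnowflakeJsonConverter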
You can see valid JSON examples (if you use distributed mode) here: https://github.com/confluentinc/demo-scene/blob/master/kafka-connect-zero-to-hero/demo_zero-to-hero-with-kafka-connect.adoc#stream-data-from-kafka-to-elasticsearch
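And if you do go the distributed route, a rough sketch of registering the same connector over the Connect REST API (assuming the worker's REST interface is on the default port 8083; note the wrapper key is lowercase "config", and the remaining snowflake.* and converter settings from your original file go inside the same block):
curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors \
  -d '{
    "name": "Kafka_Test",
    "config": {
      "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
      "tasks.max": "8",
      "topics": "test",
      "snowflake.url.name": "<url>",
      "snowflake.user.name": "<user_name>",
      "snowflake.private.key": "<private_key>",
      "snowflake.database.name": "<db>",
      "snowflake.schema.name": "<schema>"
    }
  }'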

Related

Kafka connect MongoDB sink connector using kafka-avro-console-producer

I'm trying to write some documents to MongoDB using the Kafka Connect MongoDB connector. I've managed to set up all the required components and start up the connector, but when I send a message to Kafka using kafka-avro-console-producer, Kafka Connect gives me the following error:
org.apache.kafka.connect.errors.DataException: Error: `operationType` field is doc is missing.
I've tried adding this field to the message, but then Kafka Connect asks me to include a documentKey field. It seems like I need to include some extra fields apart from the payload defined in my schema, but I can't find comprehensive documentation. Does anyone have an example of a Kafka message payload (using kafka-avro-console-producer) that goes through a Kafka -> Kafka Connect -> MongoDB pipeline?
Below is an example of one of the messages I'm sending to Kafka (by the way, kafka-avro-console-consumer is able to consume the messages):
./kafka-avro-console-producer --broker-list kafka:9093 --topic sampledata --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"field1","type":"string"}]}'
{"field1": "value1"}
And here is the configuration of the sink connector:
{"name": "mongo-sink",
"config": {
"connector.class":"com.mongodb.kafka.connect.MongoSinkConnector",
"value.converter":"io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url":"http://schemaregistry:8081",
"connection.uri":"mongodb://cadb:27017",
"database":"cognitive_assistant",
"collection":"topicData",
"topics":"sampledata6",
"change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler"
}
}
I've just managed to make the connector work. I deleted the change.data.capture.handler property from the connector configuration and it works now.
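For reference, the working sink config is simply the one above with that property dropped:
{
  "name": "mongo-sink",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schemaregistry:8081",
    "connection.uri": "mongodb://cadb:27017",
    "database": "cognitive_assistant",
    "collection": "topicData",
    "topics": "sampledata6"
  }
}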

Kafka topics not created after initial batch load

I have a running instance of the Landoop fast-data-dev Docker container with SQL Server as the source and S3 as the sink. The source configuration is:
name=JdbcSourceConnector
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:sqlserver://hostname.com.au:1433;databaseName=DB;user=;password=
topic.prefix=activitylog
timestamp.column.name=DateTimeStamp
query=Select * from dbo.activitylog
mode=timestamp
table.types=VIEW
poll.interval.ms=5000
batch.max.rows=1000
Once the S3 sink is configured, I can see around 1000 rows immediately written to S3, but after that nothing happens. My source has thousands of rows, but the connector does not produce any more records after the first run.
Is there any further configuration needed for the connector to keep producing records continuously?
The S3 sink configuration looks like this:
name=S3SinkConnector
connector.class=io.confluent.connect.s3.S3SinkConnector
topics=activitylog
tasks.max=1
format.class=io.confluent.connect.s3.format.avro.AvroFormat
flush.size=3
s3.bucket.name=testbucket
storage.class=io.confluent.connect.s3.storage.S3Storage
s3.region=ap-southeast-2
s3.part.size=5242880
partitioner.class=TimeBasedPartitioner
path.format= YYYYMMdd_HH
timezone=AEST
Please help.

Connector config contains no connector type

I'm trying to use JDBC Connector to connect to a PostgreSQL database on my cluster (the database is not directly managed by the cluster).
I've been calling the Kafka Connect with the following command:
connect-standalone.sh worker.properties jdbc-connector.properties
This is the content of the worker.properties file:
class=io.confluent.connect.jdbc.JdbcSourceConnector
name=test-postgres-1
tasks.max=1
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/home/user/offest
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.json.JsonConverter
connection.url=jdbc:postgresql://database-server.url:port/database?user=user&password=password
And this is the content of jdbc-connector.properties:
mode=incrementing
incrementing.column.name=id
topic.prefix=test-postgres-jdbc-
When I try to launch the connector with the above command it crashes with the following error:
[2018-04-16 11:39:08,164] ERROR Failed to create job for jdbc.properties (org.apache.kafka.connect.cli.ConnectStandalone:88)
[2018-04-16 11:39:08,166] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:99)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector config {mode=incrementing, incrementing.column.name=pdv, topic.prefix=test-postgres-jdbc-} contains no connector type
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:80)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:67)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:96)
Caused by: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector config {mode=incrementing, incrementing.column.name=id, topic.prefix=test-postgres-jdbc-} contains no connector type
at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:233)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:158)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:93)
After noting that the connector causing the error displayed only information from jdbc-connector.properties, I tried merging the two files together, but then the command terminates abruptly (without creating a topic or an offset file) with the following output:
[SLF4J infos...]
[2018-04-16 11:48:54,620] INFO Usage: ConnectStandalone worker.properties connector1.properties [connector2.properties ...] (org.apache.kafka.connect.cli.ConnectStandalone:59)
You need to have most of those properties in the jdbc-connector.properties, not the worker.properties. See https://docs.confluent.io/current/connect/connect-jdbc/docs/source_config_options.html for a full list of config options that go in the connector configuration (jdbc-connector.properties in your example).
Try this:
worker.properties:
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/home/user/offest
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.json.JsonConverter
jdbc-connector.properties:
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
name=test-postgres-1
tasks.max=1
mode=incrementing
incrementing.column.name=id
topic.prefix=test-postgres-jdbc-
connection.url=jdbc:postgresql://database-server.url:port/database?user=user&password=password
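With the properties split like that, the launch command from the question stays exactly the same:
connect-standalone.sh worker.properties jdbc-connector.properties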
You can see some more examples with Kafka Connect here:
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
https://www.confluent.io/blog/blogthe-simplest-useful-kafka-connect-data-pipeline-in-the-world-or-thereabouts-part-2/
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-3/

Kafka Connect Number type fields

When I want to use Kafka Connect with Oracle as the source RDBMS, NUMBER type fields are seen as bytes, like below.
Column "ID" with value "4" has been sent as a number, but at the consumer console this value is seen as "ID":"BA==".
What can I do to solve this issue?
Kafka Connect is started with the below command:
connect-standalone ./etc/kafka/connect-standalone.properties /home/kafka/oracle.properties.test
######## connect-standalone.properties
# These are defaults. This file just demonstrates how to override some settings.
bootstrap.servers=kafkaserver01.localdomain:9092
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true
# The internal converter used for offsets and config data is configurable and must be specified, but most users will
# always want to use the built-in default. Offset and config data is never visible outside of Kafka Connect in this format.
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000
######## /home/kafka/oracle.properties.test Configuration File
name=oracle-connect-test1
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
topic.prefix=
connection.url=jdbc:oracle:thin:test/oracle@testsrv01:1521:testdb
table.whitelist=TEST1,TEST2
mode=timestamp
timestamp.column.name=CDC_TIMESTAMP
## Console Consumer
kafka-console-consumer --bootstrap-server kafkaserver01.localdomain:9092 --topic TEST1
Thanks.
I found the solution. Please add the below configuration to your connector source properties:
numeric.precision.mapping=true
It will disable the byte encoding of numeric values in the topic.
With the newer version of kafka-connect-jdbc (4.1.1) you can use the property
numeric.mapping=best_fit
for best results.
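Put together with the source properties from the question, the connector config would end up roughly like this (a sketch, keeping the original connection and table settings):
name=oracle-connect-test1
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
topic.prefix=
connection.url=jdbc:oracle:thin:test/oracle@testsrv01:1521:testdb
table.whitelist=TEST1,TEST2
mode=timestamp
timestamp.column.name=CDC_TIMESTAMP
numeric.mapping=best_fit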

Kafka-connect issue

I installed Apache Kafka (Confluent) on CentOS 7 and am trying to run the FileStream Kafka Connect connector in distributed mode, but I was getting the below error:
[2017-08-10 05:26:27,355] INFO Added alias 'ValueToKey' to plugin 'org.apache.kafka.connect.transforms.ValueToKey' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:290)
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "internal.key.converter" which has no default value.
at org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:463)
at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:453)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:62)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:75)
at org.apache.kafka.connect.runtime.WorkerConfig.<init>(WorkerConfig.java:197)
at org.apache.kafka.connect.runtime.distributed.DistributedConfig.<init>(DistributedConfig.java:289)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:65)
This is now resolved by updating the workers.properties as mentioned in http://docs.confluent.io/current/connect/userguide.html#connect-userguide-distributed-config
Command used:
/home/arun/kafka/confluent-3.3.0/bin/connect-distributed.sh ../../../properties/file-stream-demo-distributed.properties
Filestream properties file (workers.properties):
name=file-stream-demo-distributed
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/tmp/demo-file.txt
bootstrap.servers=localhost:9092,localhost:9093,localhost:9094
config.storage.topic=demo-2-distributed
offset.storage.topic=demo-2-distributed
status.storage.topic=demo-2-distributed
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter.schemas.enable=false
group.id=""
I added the below properties and the command went through without any errors.
bootstrap.servers=localhost:9092,localhost:9093,localhost:9094
config.storage.topic=demo-2-distributed
offset.storage.topic=demo-2-distributed
status.storage.topic=demo-2-distributed
group.id=""
But now when I run the consumer command, I am unable to see the messages from /tmp/demo-file.txt. Please let me know if there is a way I can check whether the messages are published to Kafka topics and partitions.
kafka-console-consumer --zookeeper localhost:2181 --topic demo-2-distributed --from-beginning
I believe I am missing something really basic here. Can someone please help?
You need to define unique topics for the Kafka Connect framework to store its config, offsets, and status.
In your workers.properties file, change these parameters to something like the following:
config.storage.topic=demo-2-distributed-config
offset.storage.topic=demo-2-distributed-offset
status.storage.topic=demo-2-distributed-status
These topics are used to store the state and configuration metadata of Connect, not the messages for any of the connectors that run on top of Connect. Do not use the console consumer on any of these three topics and expect to see the messages.
The messages are stored in the topic configured in the connector configuration JSON with the parameter called "topics".
Example file-sink-config.json file
{
  "name": "MyFileSink",
  "config": {
    "topics": "mytopic",
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "tasks.max": 1,
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "file": "/tmp/demo-file.txt"
  }
}
Once the distributed worker is running you need to apply the config file to it using curl like so:
curl -X POST -H "Content-Type: application/json" --data @file-sink-config.json http://localhost:8083/connectors
After that, the config will be safely stored in the config topic you created, for all distributed workers to use. Make sure the config topic (and the status and offset topics) will not expire messages, or you will lose your connector configuration when it does.
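To check whether messages are actually landing in a connector's data topic (rather than in the three internal topics above), point the console consumer at that topic instead. For example, against the "mytopic" topic from the sketch above, assuming a broker at localhost:9092:
kafka-console-consumer --bootstrap-server localhost:9092 --topic mytopic --from-beginning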