I am trying to create a Kafka-to-BigQuery data pipeline using the Confluent BigQuerySinkConnector. The test environment is the cp-all-in-one Docker container, to which I added the connector (it is not included by default). I believe I have completed all the necessary definitions on the Google BigQuery side, but the connector keeps raising a Schema Registry error and I cannot work out why. I created a table named rest_avro in my BigQuery dataset. The topic schema is:
{
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "age",
      "type": [
        "null",
        "int"
      ]
    }
  ],
  "name": "User",
  "type": "record"
}
I defined this schema on the BigQuery table manually.
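For reference, this is roughly the equivalent table definition on the BigQuery side, written as DDL; the dataset name is a placeholder and the column modes are my interpretation of the Avro union above:

CREATE TABLE my_dataset.rest_avro (   -- my_dataset is a placeholder
  name STRING NOT NULL,               -- required, from the Avro "name": "string" field
  age INT64                           -- nullable, from the ["null", "int"] union
);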
No errors appear while the connector is running, and my connector configuration is loaded successfully:
[2020-08-19 13:21:46,803] INFO SinkConnectorConfig values:
config.action.reload = restart
connector.class = com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
errors.deadletterqueue.context.headers.enable = false
errors.deadletterqueue.topic.name =
errors.deadletterqueue.topic.replication.factor = 3
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
header.converter = null
key.converter = null
name = kcbq-connect1
tasks.max = 1
topics = [rest-avro]
topics.regex =
transforms = []
value.converter = null
(org.apache.kafka.connect.runtime.SinkConnectorConfig)
[2020-08-19 13:21:46,803] INFO EnrichedConnectorConfig values:
config.action.reload = restart
connector.class = com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
errors.deadletterqueue.context.headers.enable = false
errors.deadletterqueue.topic.name =
errors.deadletterqueue.topic.replication.factor = 3
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
header.converter = null
key.converter = null
name = kcbq-connect1
tasks.max = 1
topics = [rest-avro]
topics.regex =
transforms = []
value.converter = null
(org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig)
[2020-08-19 13:21:46,804] INFO [Worker clientId=connect-1, groupId=compose-connect-group] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2020-08-19 13:21:55,940] INFO [Consumer clientId=connector-consumer-kcbq-connect1-0, groupId=connect-kcbq-connect1] Seeking to offset 8 for partition rest-avro-0 (org.apache.kafka.clients.consumer.KafkaConsumer)
But the console keeps printing errors like the ones below. Do you have any idea what is going wrong?
[2020-08-19 13:09:45,663] ERROR Task failed with org.apache.kafka.connect.errors.ConnectException error: Exception encountered while trying to fetch latest schema metadata from Schema Registry (com.wepay.kafka.connect.bigquery.write.batch.KCBQThreadPoolExecutor)
Exception in thread "pool-3-thread-79" org.apache.kafka.connect.errors.ConnectException: Exception encountered while trying to fetch latest schema metadata from Schema Registry
at com.wepay.kafka.connect.bigquery.schemaregistry.schemaretriever.SchemaRegistrySchemaRetriever.retrieveSchema(SchemaRegistrySchemaRetriever.java:67)
at com.wepay.kafka.connect.bigquery.SchemaManager.updateSchema(SchemaManager.java:58)
at com.wepay.kafka.connect.bigquery.write.row.AdaptiveBigQueryWriter.attemptSchemaUpdate(AdaptiveBigQueryWriter.java:129)
at com.wepay.kafka.connect.bigquery.write.row.AdaptiveBigQueryWriter.performWriteRequest(AdaptiveBigQueryWriter.java:96)
at com.wepay.kafka.connect.bigquery.write.row.BigQueryWriter.writeRows(BigQueryWriter.java:117)
at com.wepay.kafka.connect.bigquery.write.batch.TableWriter.run(TableWriter.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:153)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:188)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getLatestVersion(RestService.java:359)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getLatestVersion(RestService.java:351)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getLatestSchemaMetadata(CachedSchemaRegistryClient.java:136)
at com.wepay.kafka.connect.bigquery.schemaregistry.schemaretriever.SchemaRegistrySchemaRetriever.retrieveSchema(SchemaRegistrySchemaRetriever.java:63)
... 8 more
The error says that the BigQuery Sink Connector is unable to retrieve the latest schema from the Schema Registry container.
It seems that the schemaRegistryLocation field has been misconfigured in the BigQuery Sink connector properties; most probably it is set to http://localhost:8081.
As the docker-compose.yml shipped in Confluent's repository shows, in a Dockerized environment this endpoint needs to be defined as http://schema-registry:8081.
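To confirm this from inside the Connect container, you can query the registry directly; the service names below follow the cp-all-in-one docker-compose.yml, so adjust them if yours differ:

docker-compose exec connect curl -s http://schema-registry:8081/subjects
# a reachable registry returns the registered subjects, e.g. ["rest-avro-value"]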
With the properties below I managed to create the BigQuery Sink Connector:
{
  "name": "customer-connect1",
  "config": {
    "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.venue_organization.customers",
    "topicsToTables": "dbserver1.venue_organization.customers=customers",
    "sanitizeTopics": "true",
    "autoCreateTables": "true",
    "autoUpdateSchemas": "true",
    "schemaRetriever": "com.wepay.kafka.connect.bigquery.schemaregistry.schemaretriever.SchemaRegistrySchemaRetriever",
    "schemaRegistryLocation": "http://schema-registry:8081",
    "bufferSize": "100000",
    "maxWriteSize": "10000",
    "tableWriteWait": "1000",
    "project": "venue-organization",
    "datasets": ".*=venue_organization",
    "keyfile": " /data/venue-organization-service-account.json"
  }
}
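I submit that JSON through the Connect REST API in the usual way; the file name and the worker's default port 8083 are assumptions on my part:

# POST the connector config shown above (file name is illustrative)
curl -X POST -H "Content-Type: application/json" \
  --data @bigquery-sink.json \
  http://localhost:8083/connectors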
More details here: Google BigQuery Sink Connector Configuration Properties
I am trying to get a distributed setup of the ignite-connector to run. Sadly, it does not work. I was able to capture the log produced when the connector is created via the API.
API POST payload to /connectors
{
  "name": "ignite-connector",
  "config": {
    "connector.class": "org.apache.ignite.stream.kafka.connect.IgniteSinkConnector",
    "tasks.max": "2",
    "topics": "someTopic1",
    "cacheName": "myCache",
    "cacheAllowOverwrite": true,
    "igniteCfg": "/opt/ignite/examples/config/example-cache.xml"
  }
}
I set up the ignite-connector as a plugin: I built an uber JAR from the repo, put it in a separate directory, and included that directory as a plugin in the .properties file I use to start connect-distributed.sh.
I also set the classpath for both the connector and Kafka jobs, which I manage with systemd:
Environment=CLASSPATH=/opt/kafka/ignite-connector/*
The full error log follows:
[2022-11-17 19:49:30,268] INFO [ignite-connector|worker] SinkConnectorConfig values:
config.action.reload = restart
connector.class = org.apache.ignite.stream.kafka.connect.IgniteSinkConnector
errors.deadletterqueue.context.headers.enable = false
errors.deadletterqueue.topic.name =
errors.deadletterqueue.topic.replication.factor = 3
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
header.converter = null
key.converter = null
name = ignite-connector
predicates = []
tasks.max = 2
topics = [someTopic1]
topics.regex =
transforms = []
value.converter = null
(org.apache.kafka.connect.runtime.SinkConnectorConfig:376)
[2022-11-17 19:49:30,272] INFO [ignite-connector|worker] EnrichedConnectorConfig values:
config.action.reload = restart
connector.class = org.apache.ignite.stream.kafka.connect.IgniteSinkConnector
errors.deadletterqueue.context.headers.enable = false
errors.deadletterqueue.topic.name =
errors.deadletterqueue.topic.replication.factor = 3
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
header.converter = null
key.converter = null
name = ignite-connector
predicates = []
tasks.max = 2
topics = [someTopic1]
topics.regex =
transforms = []
value.converter = null
(org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig:376)
[2022-11-17 19:49:30,276] INFO [ignite-connector|worker] Instantiated connector ignite-connector with version 3.3.1 of type class org.apache.ignite.stream.kafka.connect.IgniteSinkConnector (org.apache.kafka.connect.runtime.Worker:322)
[2022-11-17 19:49:30,276] INFO [ignite-connector|worker] Finished creating connector ignite-connector (org.apache.kafka.connect.runtime.Worker:347)
[2022-11-17 19:49:30,277] ERROR [ignite-connector|worker] WorkerConnector{id=ignite-connector} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector:201)
java.lang.NoClassDefFoundError: org/apache/ignite/internal/util/typedef/internal/A
at org.apache.ignite.stream.kafka.connect.IgniteSinkConnector.start(IgniteSinkConnector.java:55)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:193)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:218)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:363)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:346)
at org.apache.kafka.connect.runtime.WorkerConnector.doRun(WorkerConnector.java:146)
at org.apache.kafka.connect.runtime.WorkerConnector.run(WorkerConnector.java:123)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
[2022-11-17 19:49:30,277] INFO [Worker clientId=connect-1, groupId=connect-cluster] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1687)
[2022-11-17 19:49:30,280] ERROR [ignite-connector|worker] [Worker clientId=connect-1, groupId=connect-cluster] Failed to start connector 'ignite-connector' (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1811)
org.apache.kafka.connect.errors.ConnectException: Failed to start connector: ignite-connector
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.lambda$startConnector$35(DistributedHerder.java:1782)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:349)
at org.apache.kafka.connect.runtime.WorkerConnector.doRun(WorkerConnector.java:146)
at org.apache.kafka.connect.runtime.WorkerConnector.run(WorkerConnector.java:123)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.ConnectException: Failed to transition connector ignite-connector to state STARTED
... 8 more
Caused by: java.lang.NoClassDefFoundError: org/apache/ignite/internal/util/typedef/internal/A
at org.apache.ignite.stream.kafka.connect.IgniteSinkConnector.start(IgniteSinkConnector.java:55)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:193)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:218)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:363)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:346)
... 7 more
The mentioned class (A) is included in ignite-core-2.9.1.jar, which is bundled inside the uber JAR in the plugin directory.
Any pointers are appreciated.
There seems to be a misunderstanding of what "plugins" are. They are only classes defined as implementations of Converters, Transforms, and Connectors.
Internal Ignite classes are none of these, so they won't be loaded by the plugin.path classloader.
To fix this, you'll need to ensure you export CLASSPATH=/path/to/ignite-files/*.jar before starting the Connect process; you can use jar -tf to validate that the class exists in any specific JAR.
This is not a hack; it is simply how the Java classloader works.
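For example (paths are illustrative, taken from the question, and this assumes the individual Ignite JARs, not only the uber JAR, sit in that directory):

# check that the missing class is actually packaged in one of the JARs
jar -tf /opt/kafka/ignite-connector/ignite-core-2.9.1.jar | grep 'org/apache/ignite/internal/util/typedef/internal/A.class'

# make the Ignite JARs visible to the worker's application classloader, then start Connect
export CLASSPATH=/opt/kafka/ignite-connector/*
bin/connect-distributed.sh config/connect-distributed.properties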
I currently have the following data in my Kafka topic:
{"tran_slip":"00002060","tran_amount":"111.22"}
{"tran_slip":"00000005","tran_amount":"123"}
{"tran_slip":"00000006","tran_amount":"123"}
{"tran_slip":"00000007","tran_amount":"123"}
Since the data in my Kafka topic does not have a schema, I figured I could enforce one using the AWS Glue Schema Registry.
So I created an Avro Schema in the following manner:
{
  "type": "record",
  "namespace": "int_trans",
  "name": "transaction",
  "fields": [
    {
      "name": "tran_slip",
      "type": "string"
    },
    {
      "name": "tran_amount",
      "type": "string"
    }
  ]
}
I then created a Confluent JDBC sink connector on MSK Kafka Connect to sink data from the Kafka topic back into an Oracle DB, with the properties below:
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
value.converter.schemaAutoRegistrationEnabled=true
connection.password=******
transforms.extractKeyFromStruct.type=org.apache.kafka.connect.transforms.ExtractField$Key
tasks.max=1
key.converter.region=*******
transforms=RenameField
key.converter.schemaName=KeySchema
value.converter.avroRecordType=GENERIC_RECORD
internal.key.converter.schemas.enable=false
value.converter.schemaName=ValueSchema
auto.evolve=false
transforms.RenameField.type=org.apache.kafka.connect.transforms.ReplaceField$Value
key.converter.avroRecordType=GENERIC_RECORD
value.converter=com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter
insert.mode=upsert
key.converter=org.apache.kafka.connect.storage.StringConverter
transforms.RenameField.renames=tran_slip:TRAN_SLIP, tran_amount:TRAN_AMOUNT
table.name.format=abc.transactions_sink
topics=aws-db.abc.transactions
batch.size=1
value.converter.registry.name=registry_transactions
value.converter.region=*****
key.converter.registry.name=registry_transactions
key.converter.schemas.enable=false
internal.key.converter=com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter
delete.enabled=false
key.converter.schemaAutoRegistrationEnabled=true
connection.user=*******
internal.value.converter.schemas.enable=false
value.converter.schemas.enable=true
internal.value.converter=com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter
auto.create=false
connection.url=*********
pk.mode=record_value
pk.fields=tran_slip
With these settings I keep getting the following error:
org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
at com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter.toConnectData(AWSKafkaAvroConverter.java:118)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertValue(WorkerSinkTask.java:545)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:501)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:501)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:478)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:328)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:238)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.amazonaws.services.schemaregistry.exception.AWSSchemaRegistryException: Didn't find secondary deserializer.
at com.amazonaws.services.schemaregistry.deserializers.SecondaryDeserializer.deserialize(SecondaryDeserializer.java:65)
at com.amazonaws.services.schemaregistry.deserializers.avro.AWSKafkaAvroDeserializer.deserializeByHeaderVersionByte(AWSKafkaAvroDeserializer.java:150)
at com.amazonaws.services.schemaregistry.deserializers.avro.AWSKafkaAvroDeserializer.deserialize(AWSKafkaAvroDeserializer.java:114)
at com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter.toConnectData(AWSKafkaAvroConverter.java:116)
Can someone please guide me on what I am doing wrong in the configuration? I'm rather new to this topic.
I am using Kafka Connect with the Google Pub/Sub Connector to write messages from GCP Pub/Sub into Kafka topics.
My Connector has the following configuration:
{
  "name": "MyTopicSourceConnector",
  "config": {
    "connector.class": "com.google.pubsub.kafka.source.CloudPubSubSourceConnector",
    "tasks.max": "10",
    "kafka.topic": "myTopic",
    "cps.project": "my-project-id",
    "cps.subscription": "myTopic-sub",
    "name": "MyTopicSourceConnector",
    "key.converter": "io.confluent.connect.protobuf.ProtobufConverter",
    "key.converter.schema.registry.url": "http://myurl-schema-registry:8081",
    "value.converter": "io.confluent.connect.protobuf.ProtobufConverter",
    "value.converter.schema.registry.url": "http://myurl-schema-registry:8081"
  }
}
The proto message value schema looks like this:
syntax = "proto3";

message value_myTopic {
  bytes message = 1;
  string notificationConfig = 2;
  string eventTime = 3;
  string bucketId = 4;
  string payloadFormat = 5;
  string eventType = 6;
  string objectId = 7;
  string objectGeneration = 8;
}
This setup works when I use Avro or JSON (with the appropriate converters), but with Protobuf the connector throws the following error right after being deployed, and fails:
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSourceTask.convertTransformedRecord(WorkerSourceTask.java:292)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:321)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:245)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Unsupported root schema of type STRING
at io.confluent.connect.protobuf.ProtobufData.rawSchemaFromConnectSchema(ProtobufData.java:315)
at io.confluent.connect.protobuf.ProtobufData.fromConnectSchema(ProtobufData.java:304)
at io.confluent.connect.protobuf.ProtobufData.fromConnectData(ProtobufData.java:109)
at io.confluent.connect.protobuf.ProtobufConverter.fromConnectData(ProtobufConverter.java:83)
at org.apache.kafka.connect.storage.Converter.fromConnectData(Converter.java:63)
at org.apache.kafka.connect.runtime.WorkerSourceTask.lambda$convertTransformedRecord$1(WorkerSourceTask.java:292)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 11 more
I don't know why you have this issue.
In case you are interested, instead of the Confluent Protobuf converter there is a community one from Blue Apron; it just requires one extra config property to specify which Protobuf (de)serialization class to use.
An example from Snowflake showing the two converters can be seen here.
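A minimal sketch of that config in properties form. The converter class and the protoClassName property are recalled from the Blue Apron project's README, and the compiled protobuf class name is hypothetical, so verify both against that project's documentation:

# community Protobuf converter (Blue Apron) instead of io.confluent.connect.protobuf.ProtobufConverter
value.converter=com.blueapron.connect.protobuf.ProtobufConverter
# fully-qualified Java class compiled from the value_myTopic proto above (name is illustrative)
value.converter.protoClassName=com.example.protos.ValueMyTopic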
I cannot create a Kafka -> Cassandra sink connector using ksqlDB:
CREATE SINK CONNECTOR cassandra WITH( "connector.class" = 'io.confluent.connect.cassandra.CassandraSinkConnector', "tasks.max" = '1', "topics" = 'tst', "cassandra.contact.points" = 'cassandra', "cassandra.keyspace" = 'test', "cassandra.write.mode" = 'Update', "confluent.topic.bootstrap.servers" = 'kafka:9092' );
ERROR [CASS|worker] WorkerConnector{id=CASS} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector:118)
org.apache.kafka.connect.errors.ConnectException: Error while attempting to create/find topic(s) '_confluent-command'
at org.apache.kafka.connect.util.TopicAdmin.createTopics(TopicAdmin.java:262)
at io.confluent.license.LicenseStore$1.run(LicenseStore.java:161)
at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:128)
at io.confluent.license.LicenseStore.start(LicenseStore.java:190)
at io.confluent.license.LicenseManager.<init>(LicenseManager.java:155)
at io.confluent.license.LicenseManager.<init>(LicenseManager.java:140)
at io.confluent.connect.utils.licensing.ConnectLicenseManager$Builder.lambda$build$0(ConnectLicenseManager.java:210)
at io.confluent.connect.utils.licensing.ConnectLicenseManager.registerOrValidateLicense(ConnectLicenseManager.java:255)
at io.confluent.connect.cassandra.CassandraSinkConnector.doStart(CassandraSinkConnector.java:50)
at io.confluent.connect.cassandra.CassandraSinkConnector.start(CassandraSinkConnector.java:45)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:110)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:135)
at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:195)
at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:257)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:1190)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300(DistributedHerder.java:126)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$14.call(DistributedHerder.java:1206)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$14.call(DistributedHerder.java:1202)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 3 larger than available brokers: 1.
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
at org.apache.kafka.connect.util.TopicAdmin.createTopics(TopicAdmin.java:229)
... 21 more
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 3 larger than available brokers: 1.
The Confluent Cassandra sink connector's confluent.topic.replication.factor property has a default value of 3.
Modifying the default value in the connector config solved the problem!
"confluent.topic.replication.factor" = '1',
I get a task error for a distributed configuration of a Cassandra sink connector. I ran the following command to get the status:
curl -s localhost:8083/connectors/cassandraSinkConnector2/status | jq
{
"name": "cassandraSinkConnector2",
"connector": {
"state": "RUNNING",
"worker_id": localhost:8083"
},
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "localhost:8083",
"trace": "org.apache.kafka.common.KafkaException: Failed to construct kafka consumer\n\tat org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:811)\n\tat org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:624)\n\tat org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:605)\n\tat org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:505)\n\tat org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:441)\n\tat org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:865)\n\tat org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:110)\n\tat org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:880)\n\tat org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:876)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: org.apache.kafka.common.KafkaException: io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor ClassNotFoundException exception occurred\n\tat org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:357)\n\tat org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:332)\n\tat org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:319)\n\tat org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:701)\n\t... 12 more\nCaused by: java.lang.ClassNotFoundException: io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor\n\tat java.net.URLClassLoader.findClass(URLClassLoader.java:382)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:424)\n\tat org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:357)\n\tat java.lang.Class.forName0(Native Method)\n\tat java.lang.Class.forName(Class.java:348)\n\tat org.apache.kafka.common.utils.Utils.loadClass(Utils.java:338)\n\tat org.apache.kafka.common.utils.Utils.newInstance(Utils.java:327)\n\tat org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:355)\n\t... 15 more\n"
}
],
"type": "sink"
Stack trace:
"trace": "org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:811)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:624)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:605)
at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:505)
at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:441)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:865)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:110)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:880)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:876)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.KafkaException: io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor ClassNotFoundException exception occurred
at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:357)
at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:332)
at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:319)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:701)
... 12 more
Caused by: java.lang.ClassNotFoundException: io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.kafka.common.utils.Utils.loadClass(Utils.java:338)
at org.apache.kafka.common.utils.Utils.newInstance(Utils.java:327)
at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:355)
... 15 more
You can find the connector configuration below.
{
  "name": "cassandraSinkConnector2",
  "config": {
    "connector.class": "io.confluent.connect.cassandra.CassandraSinkConnector",
    "tasks.max": "1",
    "topics": "appartenance_de",
    "cassandra.contact.points": "localhost",
    "cassandra.kcql": "INSERT INTO app_test SELECT * FROM app_de",
    "cassandra.port": "9042",
    "cassandra.keyspace": "dev_dkks",
    "cassandra.username": "superuser",
    "cassandra.password": "password",
    "cassandra.write.mode": "insert",
    "value.converter.schemas.enable": "true",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "name": "cassandraSinkConnector2"
  },
  "tasks": [
    {
      "connector": "cassandraSinkConnector2",
      "task": 0
    }
  ],
  "type": "sink"
}
New error:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:560)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Record with a null key was encountered. This connector requires that records from Kafka contain the keys for the Cassandra table. Please use a transformation like org.apache.kafka.connect.transforms.ValueToKey to create a key with the proper fields.
at io.confluent.connect.cassandra.CassandraSinkTask.put(CassandraSinkTask.java:86)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:538)
... 10 more
"
The root error is:
java.lang.ClassNotFoundException: io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
The Monitoring Interceptors are part of Confluent Platform. You can either disable their use in your Kafka Connect worker config, or better, make sure that the /usr/share/java/monitoring-interceptors/monitoring-interceptors-5.2.1.jar JAR is available to your Kafka Connect worker.
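If the interceptors were enabled through the worker properties, the relevant lines typically look like the ones below (a sketch; your worker file may differ). Either remove them, or keep them and make the JAR visible to the worker:

# connect-distributed.properties
# remove or comment these out to stop Connect's consumers/producers from loading the interceptors...
consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
# ...or keep them and put the interceptor JAR on the worker's classpath, e.g.
# export CLASSPATH=/usr/share/java/monitoring-interceptors/monitoring-interceptors-5.2.1.jar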
The new error you're seeing is
org.apache.kafka.connect.errors.DataException:
Record with a null key was encountered. This connector requires that records from Kafka contain the keys for the Cassandra table.
Please use a transformation like org.apache.kafka.connect.transforms.ValueToKey to create a key with the proper fields.
I'd recommend using a Single Message Transform, as the error suggests, to correctly key your data. You can see an example of doing this here, and the documentation for the transform here. A minimal sketch follows.
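The fragment below would go inside the connector's "config" block; the field list is a placeholder for whichever value field(s) make up the Cassandra primary key:

"transforms": "createKey",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "<comma-separated key fields from the record value>"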