Kafka connect with mysql custom query - apache-kafka

I have set up incremental data sync with Kafka Connect.
Now I want to achieve the same with a custom query, but I am getting an error.
My config file is:
name=mysql-whitelist-timestamp-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://127.0.0.1:3306/demo?user=root&password=root
query=select name from students3 where marks = 10
mode=timestamp
table.whitelist=students3
timestamp.column.name=timestamp
topic.prefix=test-mysql-jdbc-
And I get the error below:
ERROR WorkerConnector{id=mysql-whitelist-timestamp-source} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector:119)
org.apache.kafka.connect.errors.ConnectException: query may not be combined with whole-table copying settings.

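The table.whitelist setting must not be used together with a custom query. It is one of the "whole-table copying settings" the error message refers to, so remove it from the config when query is set. See the full explanation.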
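As a rough illustration of the rule behind this error, the sketch below checks a config for the conflicting keys. The helper name find_query_conflicts is hypothetical and not part of the connector. The rewritten query is also an assumption that goes beyond the answer above: in incremental modes the connector appends its own WHERE clause, so the filter is wrapped in a subquery and the timestamp column is selected.

```python
# Settings the JDBC source connector treats as "whole-table copying" settings.
WHOLE_TABLE_KEYS = ("table.whitelist", "table.blacklist")


def find_query_conflicts(config):
    """Return the whole-table keys that are set alongside a custom query."""
    if not config.get("query"):
        return []
    return [key for key in WHOLE_TABLE_KEYS if config.get(key)]


original = {
    "name": "mysql-whitelist-timestamp-source",
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "query": "select name from students3 where marks = 10",
    "mode": "timestamp",
    "table.whitelist": "students3",
    "timestamp.column.name": "timestamp",
    "topic.prefix": "test-mysql-jdbc-",
}

# Drop table.whitelist; the table is already named inside the query.
fixed = {k: v for k, v in original.items() if k != "table.whitelist"}
# Assumption: wrap the filter in a subquery and select the timestamp column,
# so the connector can append its own incremental WHERE clause.
fixed["query"] = (
    "select * from (select name, timestamp from students3 where marks = 10) as t"
)

print(find_query_conflicts(original))  # ['table.whitelist']
print(find_query_conflicts(fixed))     # []
```

Note that with a custom query, topic.prefix is used as the full topic name rather than as a prefix.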

Related

How to solve my error in redshiftsinkconnector

I am trying to connect Kafka to Redshift using the Redshift sink connector. The connector is running, but the task has failed.
Your error - Failed to deserialize data in topic ... to Avro
So, if your data is not Avro, then change your key.converter and/or value.converter to the appropriate config. You need to consult your Producer code for the matching serializers.
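If the producer code is not at hand, a quick look at the raw bytes of one record can hint at the format. The sketch below is only a heuristic, and describe_payload is a hypothetical helper: it relies on Confluent's Avro serializer prefixing each message with a zero byte and a 4-byte schema ID, and treats a leading brace or bracket as JSON.

```python
import struct


def describe_payload(payload: bytes) -> str:
    """Guess, from the first bytes only, how a record value was serialized."""
    # Confluent's Avro serializer writes a zero "magic byte" followed by a
    # 4-byte big-endian schema ID before the Avro-encoded body.
    if len(payload) >= 5 and payload[0] == 0:
        schema_id = struct.unpack(">I", payload[1:5])[0]
        return f"looks like Confluent Avro (schema id {schema_id})"
    if payload[:1] in (b"{", b"["):
        return "looks like JSON"
    return "unknown; possibly a plain string or another binary format"


print(describe_payload(b"\x00\x00\x00\x00\x07body"))
print(describe_payload(b'{"id": 1}'))
print(describe_payload(b"hello"))
```

If the data turns out to be JSON, org.apache.kafka.connect.json.JsonConverter is the usual converter; for plain strings, org.apache.kafka.connect.storage.StringConverter.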

Kafka connect S3 source failing with read-only registry

I am trying to read Avro records stored in S3 in order to put them back in a Kafka topic using the S3 source provided by Confluent.
I already have the topics and the registry set up with the right schemas, but when the Connect S3 source tries to serialize my records to the topics I get this error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: ...
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:121)
at io.confluent.connect.avro.AvroConverter$Serializer.serialize(AvroConverter.java:143)
at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:84)
... 15 more
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject com-row-count-value is in read-only mode; error code: 42205
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:495)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:486)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:459)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:214)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:276)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:252)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:75)
It seems that the Connect producer does not try to look up the schema ID when the schema already exists; instead it tries to register the schema, and my registry is read-only.
Does anyone know whether this is a bug, or is there some configuration I am missing?
If you're sure the correct schema for that subject is already registered by some other means, you can try to set auto.register.schemas to false in the serializer configuration.
See here for more details: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#handling-differences-between-preregistered-and-client-derived-schemas
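In Kafka Connect the serializer is configured through the converter, so serializer options are set with a key.converter. or value.converter. prefix. The sketch below shows that prefixing on a connector config held as a Python dict. The helper name disable_schema_registration and the registry URL are placeholders, not taken from the question.

```python
def disable_schema_registration(config, side="value"):
    """Return a copy of a connector config with schema auto-registration
    turned off for the given converter ("key" or "value")."""
    if side not in ("key", "value"):
        raise ValueError("side must be 'key' or 'value'")
    updated = dict(config)
    # Options prefixed with "<side>.converter." are handed to that converter,
    # which passes them on to its serializer.
    updated[f"{side}.converter.auto.register.schemas"] = "false"
    return updated


s3_source = {
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    # Placeholder URL, not taken from the question.
    "value.converter.schema.registry.url": "http://localhost:8081",
}

updated = disable_schema_registration(s3_source)
print(updated["value.converter.auto.register.schemas"])  # false
```

The original dict is left unchanged, so the two configs can be compared before the updated one is submitted.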

How to ignore error result in Kafka Connect Elasticsearch

I am trying to run Kafka Connect for Elasticsearch.
By mistake I wrote a bad record to the Kafka topic.
I have since fixed the issue and am inserting correct values, but Elasticsearch is still throwing an error on the earlier record in the topic.
Here is the error
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'lambdaDemo0': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"lambdaDemo0-9749-0e710000fd04"; line: 1, column: 13]
Is there any way I can ignore the older record in the topic and tell Kafka Connect to pick up the latest record?
When I try to delete the topic it is marked for deletion, but the records are still present in the topic.
I tried the two properties below, but they do not seem to be working:
drop.invalid.message=true
behavior.on.malformed.documents=ignore
Please suggest how I can clean up the bad record in the topic.
You can tell Kafka Connect to just skip bad records
errors.tolerance = all
Optionally, you can route these messages to another topic (known as a dead letter queue) for inspection by adding
errors.tolerance = all
errors.deadletterqueue.topic.name = my-dlq-topic
These settings are valid for Kafka Connect with any connector that is failing in the serialisation/deserialisation stage of processing. For more information see this article.
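To make the effect of these settings concrete, here is a conceptual simulation in Python. It is not Kafka Connect code: the process function is hypothetical and only mimics the behaviour described above, where a record that fails conversion is either fatal or set aside in a dead letter list.

```python
import json


def process(records, tolerance="none"):
    """Convert raw JSON records, mimicking errors.tolerance semantics."""
    delivered, dead_letters = [], []
    for raw in records:
        try:
            delivered.append(json.loads(raw))
        except json.JSONDecodeError:
            if tolerance != "all":
                raise  # default behaviour: the task fails on the bad record
            dead_letters.append(raw)  # tolerance "all": skip and set aside
    return delivered, dead_letters


records = [b"lambdaDemo0-9749-0e710000fd04", b'{"id": 1}']
delivered, dead_letters = process(records, tolerance="all")
print(delivered)     # [{'id': 1}]
print(dead_letters)  # [b'lambdaDemo0-9749-0e710000fd04']
```

This also suggests why drop.invalid.message and behavior.on.malformed.documents had no effect: those are settings of the Elasticsearch connector itself, whereas the error in the question is raised earlier, while the converter is parsing the bytes.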

SF_KAFKA_CONNECTOR name is empty or invalid error using Confluent Cloud and Snowflake Kafka Connector

I have a cluster running in Confluent Cloud and am able to Produce and Consume data using other applications. However, when I try to hook up the Snowflake Kafka Connector I receive these errors:
[2019-10-15 22:12:08,979] INFO Creating connector source-snowflake of type com.snowflake.kafka.connector.SnowflakeSinkConnector (org.apache.kafka.connect.runtime.Worker)
[2019-10-15 22:12:08,983] INFO Instantiated connector source-snowflake with version 0.5.1 of type class com.snowflake.kafka.connector.SnowflakeSinkConnector (org.apache.kafka.connect.runtime.Worker)
[2019-10-15 22:12:08,986] INFO
[SF_KAFKA_CONNECTOR] Snowflake Kafka Connector Version: 0.5.1 (com.snowflake.kafka.connector.Utils)
[2019-10-15 22:12:09,029] INFO
[SF_KAFKA_CONNECTOR] SnowflakeSinkConnector:start (com.snowflake.kafka.connector.SnowflakeSinkConnector)
[2019-10-15 22:12:09,030] ERROR
[SF_KAFKA_CONNECTOR] name is empty or invalid. It should match Snowflake object identifier syntax. Please see the documentation. (com.snowflake.kafka.connector.Utils)
[2019-10-15 22:12:09,033] ERROR WorkerConnector{id=source-snowflake} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector)
com.snowflake.kafka.connector.internal.SnowflakeKafkaConnectorException:
[SF_KAFKA_CONNECTOR] Exception: Invalid input connector configuration
[SF_KAFKA_CONNECTOR] Error Code: 0001
[SF_KAFKA_CONNECTOR] Detail: input kafka connector configuration is null, missing required values, or wrong input value
at com.snowflake.kafka.connector.internal.SnowflakeErrors.getException(SnowflakeErrors.java:347)
at com.snowflake.kafka.connector.internal.SnowflakeErrors.getException(SnowflakeErrors.java:306)
at com.snowflake.kafka.connector.Utils.validateConfig(Utils.java:400)
at com.snowflake.kafka.connector.SnowflakeSinkConnector.start(SnowflakeSinkConnector.java:131)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:111)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:136)
at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:196)
at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:252)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:1079)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300(DistributedHerder.java:117)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$15.call(DistributedHerder.java:1095)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$15.call(DistributedHerder.java:1091)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Here is my scrubbed Snowflake config file:
{
  "name":"snowsink",
  "config":{
    "connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "tasks.max":"8",
    "topics":"tp-snow-test",
    "buffer.count.records":"100",
    "buffer.flush.time":"60",
    "buffer.size.bytes":"65536",
    "snowflake.url.name":"xxxxxxx.east-us-2.azure.snowflakecomputing.com",
    "snowflake.user.name":"svc_cc_strm",
    "snowflake.private.key":"<key>",
    "snowflake.private.key.passphrase":"<password>",
    "snowflake.database.name":"testdb",
    "snowflake.schema.name":"test1",
    "key.converter":"org.apache.kafka.connect.storage.StringConverter",
    "value.converter":"com.snowflake.kafka.connector.records.SnowflakeJsonConverter"
  }
}
Any ideas? Thanks.
The name of the connector should be a valid SQL identifier to Snowflake. So many of the kafka topic examples have dashes in them that when I first tried the Snowflake Kafka connector I got this same error.
According to the documentation, a Snowflake pipe is created using the connector_name specified, and pipe names must be valid SQL identifiers.
The connector creates one pipe for each topic partition. The name is:
SNOWFLAKE_KAFKA_CONNECTOR_PIPE_.
Also from the same doc page at "Fields in the Configuration File" for name:
Application name. This must be unique across all Kafka connectors used by the customer. This name must be a valid Snowflake unquoted identifier.
If the topic has a dash in it, it will need to be mapped in your connector config to a table name that is also a proper SQL identifier. Otherwise the connector will try to create a table with the same name as the topic and fail on the "-" in the name.
You need to change the name of your connector (source-snowflake in your log) to remove the - from it, so that it passes the connector's name validation.
🤷‍♂️
You need to add the following entry to your config file, below the topics entry:
"topics":"tp-snow-test",
"snowflake.topic2table.map": "tp-snow-test:TestKafkaTable",
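The naming rules from the answers above can be checked before the connector is submitted. The sketch below is an approximation, not the connector's own validation: the regular expression follows Snowflake's rules for unquoted identifiers, check_connector is a hypothetical helper, and it assumes, as described above, that an unmapped topic is used as the table name unchanged.

```python
import re

# Unquoted Snowflake identifiers: start with a letter or underscore, then
# letters, digits, underscores, or dollar signs.
UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def is_valid_identifier(name):
    return UNQUOTED_IDENTIFIER.fullmatch(name) is not None


def check_connector(name, topics, topic2table=""):
    """Return readable problems with a connector name and its table names."""
    problems = []
    if not is_valid_identifier(name):
        problems.append(f"connector name {name!r} is not a valid identifier")
    # topic2table has the form "topic1:table1,topic2:table2".
    mapping = dict(
        pair.strip().split(":", 1) for pair in topic2table.split(",") if pair.strip()
    )
    for topic in (t.strip() for t in topics.split(",")):
        table = mapping.get(topic, topic)  # assumption: unmapped topic = table
        if not is_valid_identifier(table):
            problems.append(f"topic {topic!r} maps to invalid table name {table!r}")
    return problems


print(check_connector("source-snowflake", "tp-snow-test"))
print(check_connector("snowsink", "tp-snow-test", "tp-snow-test:TestKafkaTable"))  # []
```

The first call reports both the connector name and the unmapped topic; the second, with a dash-free name and the topic2table mapping, reports nothing.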

Kafka Connect - Cannot ALTER to add missing field SinkRecordField{schema=Schema{BYTES}, name='CreateUID', isPrimaryKey=true},

I am using the JDBC source connector to read data from a Teradata table and push it to a Kafka topic. But when I try to use the JDBC sink connector to read the Kafka topic and push to an Oracle table, it throws the error below.
I suspect the error is caused by the parameters pk.mode and pk.fields, which I am not sure how to set.
My Teradata table has a composite primary key of UserID + DatabaseID. I have created the table in Oracle with the same primary key, UserID + DatabaseID.
ERROR Cannot ALTER to add missing field SinkRecordField{schema=Schema{BYTES}, name='CreateUID', isPrimaryKey=true}, as it is not optional and does not have a default value
Below is my sink connector config:
{name=teradata_sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=TERADATA_ACCESSRIGHTS
connection.url=
connection.user=
connection.password=
auto.create=false
table.name.format=TERADATA_ACCESSRIGHTS
pk.mode=record_value
pk.fields=USERID+DATABASEID
auto.evolve=true
insert.mode=upsert
}
Please suggest how I can use the JDBC sink connector with the given primary keys.
It looks like the schema of your target does not match the source. Since you have auto.evolve=true, Connect tries to ALTER the target table to add the missing column, and here it cannot, for the reason shown in the error:
`CreateUID`…is not optional and does not have a default value
Does that column exist in your target table as well as source?
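One way to answer that question is to compare the record's field names with the target table's columns. The sketch below is illustrative only: both helpers are hypothetical, and every field and column name other than CreateUID is made up. It also shows how pk.fields is read as a comma-separated list, so a value joined with "+" is treated as a single field name.

```python
def parse_pk_fields(value):
    """Split a pk.fields setting as a comma-separated list of field names."""
    return [field.strip() for field in value.split(",") if field.strip()]


def compare_fields(record_fields, table_columns):
    """Return (missing, case_only): record fields with no matching column,
    and fields that match a column only when case is ignored."""
    columns = set(table_columns)
    lowered = {column.lower() for column in table_columns}
    missing, case_only = [], []
    for field in record_fields:
        if field in columns:
            continue
        if field.lower() in lowered:
            case_only.append(field)
        else:
            missing.append(field)
    return missing, case_only


print(parse_pk_fields("USERID+DATABASEID"))  # ['USERID+DATABASEID']
print(parse_pk_fields("USERID,DATABASEID"))  # ['USERID', 'DATABASEID']

# Hypothetical record fields and Oracle columns, for illustration only.
record_fields = ["UserID", "DatabaseID", "CreateUID"]
table_columns = ["USERID", "DATABASEID"]
print(compare_fields(record_fields, table_columns))
# (['CreateUID'], ['UserID', 'DatabaseID'])
```

Fields in the first list are the ones auto.evolve would try to add. Whether a case-only mismatch matters depends on how the sink quotes identifiers, which this sketch does not model.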