Kafka: Source-Connector to Topic mapping is Flaky - apache-kafka

I have the following Kafka connector configuration (below). I have created the "member" topic already (30 partitions). The problem is that I will install the connector and it will work; i.e.
curl -d "#mobiledb-member.json" -H "Content-Type: application/json" -X PUT https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/mobiledb-member-connector/config
curl -s https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/member-connector/topics
returns:
{"member-connector":{"topics":[member]}}
the status call returns no errors:
curl -s https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/mobiledb-member-connector/status
{"name":"member-connector","connector":{"state":"RUNNING","worker_id":"ttgssqa0vrhap81.***.net:8085"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"ttgssqa0vrhap81.***.net:8085"}],"type":"source"}
... but at other times, I will install a similar connector config and it will return no topics.
{"member-connector":{"topics":[]}}
Yet the status shows no errors, and the connector logs give no clue as to why this connector-to-topic mapping isn't working. Why aren't the logs helping out?
Connector configuration:
{
"connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector",
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"connection.url":"jdbc:sqlserver:****;",
"connection.user":"***",
"connection.password":"***",
"transforms":"createKey",
"table.poll.interval.ms":"120000",
"key.converter.schemas.enable":"false",
"value.converter.schemas.enable":"false",
"poll.interval.ms":"5000",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"name":"member-connector",
"tasks.max":"4",
"query":"SELECT * FROM member_kafka_test",
"table.types":"TABLE",
"topic.prefix":"member",
"mode":"timestamp+incrementing",
"transforms.createKey.fields":"member_id",
"incrementing.column.name": "member_id",
"timestamp.column.name" : "update_datetime"
}
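In case it helps with diagnosing, this is how I have been double-checking whether records actually reach the topic and whether the mapping comes back after a bounce (the broker address below is a placeholder; the connector name and worker URL are the same as above):
# Check whether any records have landed in the topic (broker address is a placeholder)
kafka-console-consumer --bootstrap-server localhost:9092 --topic member --from-beginning --max-messages 5
# Restart the connector, then re-query its active topics
curl -X POST https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/member-connector/restart
curl -s https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/member-connector/topics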

Related

Not able to get data inserted into the Snowflake database through the Snowflake Kafka connector, even though the connector starts

Kafka,Zookeeper user@ip-XX-XX-XX-XXX config]$ curl -X POST -H "Content-Type: application/json" --data @snowflake-connector.json http://XX.XX.XX.XXX:8083/connectors
{"name":"file-stream-distributed","config":{"connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector","tasks.max":"1","topics":"uat.product.topic","buffer.count.records":"10000","buffer.flush.time":"60","buffer.size.bytes":"5000000","snowflake.url.name":"XXX00000.XX-XXXX-0.snowflakecomputing.com:443","snowflake.user.name":"kafka_connector_user_1","snowflake.private.key":"XXXXX","snowflake.database.name":"KAFKA_DB","snowflake.schema.name":"KAFKA_SCHEMA","key.converter":"org.apache.kafka.connect.storage.StringConverter","key.converter.schemas.enable":"true","value.converter":"com.snowflake.kafka.connector.records.SnowflakeJsonConverter","value.converter:schemas.enable":"true","value.converter.schema.registry.url":"http://localhost:8081","name":"file-stream-distributed"},"tasks":[],"type":"sink"}[Panamax-UAT-Kafka,Zookeeper deploy@ip-XX-XX-XX-XXX config]$
Kafka,Zookeeper user@ip-XX-XX-XX-XXX config]$
Checking the status of the connector, but it's giving Not Found:
Kafka,Zookeeper user@ip-XX-XX-XX-XXX config]$ curl XX.XX.XX.XXX:8083/connectors/file-stream-demo-distributed/tasks
{"error_code":404,"message":"Connector file-stream-demo-distributed not found"}
Kafka,Zookeeper user@ip-XX-XX-XX-XXX config]$
And the data is not getting inserted into the database
logs: /opt/Kafka/kafka/logs/connect.logs:
[2022-05-29 14:51:22,521] INFO [Worker clientId=connect-1, groupId=connect-cluster] Joined group at generation 39 with protocol version 2 and got assignment: Assignment{error=0, leader='connect-1-fee23ff6-e61d-4e4c-982d-da2af6272a08', leaderUrl='http://XX.XX.XX.XXX:8083/', offset=4353, connectorIds=[], taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1681)
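For reference, the worker can also list every connector it has registered, which shows the exact name to use in the status and tasks calls; a quick sketch against the same (redacted) host and port:
# List all registered connector names on this worker
curl -s http://XX.XX.XX.XXX:8083/connectors
# Then query status/tasks using the name exactly as returned
curl -s http://XX.XX.XX.XXX:8083/connectors/file-stream-distributed/status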
You need to specify a user that has the appropriate CREATE TABLE privileges for use by the Kafka connector. I suggest you review the documentation.
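As a rough sketch of what the grants might look like (the role name here is made up, and the exact privilege set should be checked against Snowflake's Kafka connector documentation; the database, schema, and user names come from the config above):
# Hypothetical role for the connector user
snowsql -q "GRANT USAGE ON DATABASE KAFKA_DB TO ROLE kafka_connector_role;"
snowsql -q "GRANT USAGE ON SCHEMA KAFKA_DB.KAFKA_SCHEMA TO ROLE kafka_connector_role;"
snowsql -q "GRANT CREATE TABLE, CREATE STAGE, CREATE PIPE ON SCHEMA KAFKA_DB.KAFKA_SCHEMA TO ROLE kafka_connector_role;"
snowsql -q "GRANT ROLE kafka_connector_role TO USER kafka_connector_user_1;"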

How does the JDBC sink connector insert values into a Postgres database

I'm using the JDBC sink connector to load data from a Kafka topic into a Postgres database.
Here is my configuration:
curl --location --request PUT 'http://localhost:8083/connectors/sink_1/config' \
--header 'Content-Type: application/json' \
--data-raw '{
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"connection.url":"jdbc:postgresql://localhost:5432/postgres",
"connection.user":"user",
"connection.password":"passwd",
"tasks.max" : "10",
"topics":"<topic_name_same_as_tablename>",
"insert.mode":"insert",
"key.converter":"org.apache.kafka.connect.converters.ByteArrayConverter",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"quote.sql.identifiers":"never",
"errors.tolerance":"all",
"errors.deadletterqueue.topic.name":"failed_records",
"errors.deadletterqueue.topic.replication.factor":"1",
"errors.log.enable":"true"
}'
My table has 100k+ records, so I tried partitioning the topic into 10 partitions and setting tasks.max to 10 to speed up the loading process, which was much faster compared to a single partition.
Can someone help me understand how the sink connector loads data into Postgres? Which insert statement will it use: approach-1 or approach-2? If it is approach-1, can we achieve approach-2, and if so, how?
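One way to see exactly which statements the connector issues is to enable statement logging on the Postgres side and watch the server log while the sink is running; a sketch, assuming superuser access and the connection settings above:
# Log every statement the server receives (revert once done)
psql -h localhost -p 5432 -U user -d postgres -c "ALTER SYSTEM SET log_statement = 'all';"
psql -h localhost -p 5432 -U user -d postgres -c "SELECT pg_reload_conf();"
# Then tail the Postgres server log while the connector writes; the exact INSERT form shows up there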

Stream geometry points into kafka topic using kafka JDBC source connector

I am trying to stream a table from PostgreSQL to a Kafka topic using the Kafka JDBC connector.
This table contains a geometry column with data type Point.
Here is my JDBC connector configuration:
curl -i -X PUT http://localhost:8083/connectors/source-jdbc-pg-00/config \
-H "Content-Type: application/json" -d '{
"connector.class" : "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url" : "jdbc:postgresql://postgres:5432/postgres",
"connection.user" : "me",
"connection.password" : "12345",
"topic.prefix" : "pg-",
"mode" : "bulk",
"poll.interval.ms" : 3600000
}'
The connector is running successfully, but when I check the topic, the geometry column is missing.
Is it possible to stream geospatial data using Kafka JDBC connectors?
Does anyone have any advice?
UPDATE 2021-04-12
I can see that postgis-jdbc-2.5.0.jar has been installed properly in this directory:
cd /usr/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib
But when I checked the kafka-connect log, it seemed that the postgis jdbc plugin was not working properly:
kafka-connect | [2021-04-11 23:38:25,275] WARN [source-jdbc-pg-00|task-0] JDBC type 1111 ("public"."geometry") not currently supported (io.confluent.connect.jdbc.dialect.PostgreSqlDatabaseDialect:1153)
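One workaround, given that the dialect does not map the geometry type, is to push the conversion into the query so the connector only sees types it understands; a sketch, assuming a hypothetical table my_table with a geometry column geom (ST_AsText returns the point as WKT text):
# Same connector as above, but with a query that converts the geometry to text
curl -i -X PUT http://localhost:8083/connectors/source-jdbc-pg-00/config \
-H "Content-Type: application/json" -d '{
"connector.class" : "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url" : "jdbc:postgresql://postgres:5432/postgres",
"connection.user" : "me",
"connection.password" : "12345",
"topic.prefix" : "pg-",
"mode" : "bulk",
"poll.interval.ms" : 3600000,
"query" : "SELECT id, ST_AsText(geom) AS geom_wkt FROM my_table"
}'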

Kafka FileStreamSinkConnector recreated the file it writes to after I deleted it

OK, this is an unusual one.
I made a File stream sink connector as follows:
curl -X POST http://cpnode.local.lan:8083/connectors -H "Content-Type: application/json" --data '{
"name":"file-sink-connector-002",
"config": {
"tasks.max":"1",
"batch.size":"1000",
"batch.max.rows":"1000",
"poll.interval.ms":"500",
"connector.class":"org.apache.kafka.connect.file.FileStreamSinkConnector",
"file":"/kafka/names.txt",
"table.name.format":"tb_sensordata",
"topics":"names",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable":"false",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable":"false"
}
}'
While the connector was running, I deleted the file names.txt.
After an hour or so... it recreated the file.
I started a console producer and produced some data to the topic. The sink connector wrote the data to the file.
Can anyone explain this behavior, please?
According to this pull request (MINOR: Append or create file in FileStreamSinkTask #5406), the file mentioned in a FileStreamSinkConnector will get created by kafka-connect if it does not exist.
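A quick way to reproduce this end to end (the broker address is an assumption; the file path and topic come from the connector config above):
# Delete the sink file while the connector is running
rm /kafka/names.txt
# Produce a few records to the topic the connector reads from (type some lines, then Ctrl-C)
kafka-console-producer --bootstrap-server cpnode.local.lan:9092 --topic names
# The connector recreates the file and appends the new records
cat /kafka/names.txt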

Invalid characters shown when consuming using the Kafka console consumer

While consuming from the Kafka topic using the Kafka console consumer or kt (a Go CLI tool for Kafka), I am getting invalid characters:
...
\u0000\ufffd?\u0006app\u0000\u0000\u0000\u0000\u0000\u0000\u003e#\u0001
\u0000\u000cSec-39\u001aSome Actual Value Text\ufffd\ufffd\ufffd\ufffd\ufffd
\ufffd\u0015#\ufffd\ufffd\ufffd\ufffd\ufffd\ufff
...
Even though Kafka connect can actually sink the proper data to an SQL database.
Given that you say
Kafka connect can actually sink the proper data to an SQL database.
my assumption would be that you're using Avro serialization for the data on the topic. Kafka Connect configured correctly will take the Avro data and deserialise it.
However, console tools such as kafka-console-consumer, kt, kafkacat et al do not support Avro, and so you get a bunch of weird characters if you use them to read data from a topic that is Avro-encoded.
To read Avro data to the command line you can use kafka-avro-console-consumer:
kafka-avro-console-consumer \
--bootstrap-server kafka:29092 \
--topic test_topic_avro \
--property schema.registry.url=http://schema-registry:8081
Edit: Adding a suggestion from @CodeGeas too:
Alternatively, reading data using REST Proxy can be done with the following:
# Create a consumer for Avro data
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
-H "Accept: application/vnd.kafka.v2+json" \
--data '{"name": "my_consumer_instance", "format": "avro", "auto.offset.reset": "earliest"}' \
http://kafka-rest-instance:8082/consumers/my_json_consumer
# Subscribe the consumer to a topic
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
--data '{"topics":["YOUR-TOPIC-NAME"]}' \
http://kafka-rest-instance:8082/consumers/my_json_consumer/instances/my_consumer_instance/subscription
# Then consume some data from a topic using the base URL in the first response.
curl -X GET -H "Accept: application/vnd.kafka.avro.v2+json" \
http://kafka-rest-instance:8082/consumers/my_json_consumer/instances/my_consumer_instance/records
To delete the consumer afterwards:
curl -X DELETE -H "Accept: application/vnd.kafka.avro.v2+json" \
http://kafka-rest-instance:8082/consumers/my_json_consumer/instances/my_consumer_instance
By default, the console consumer tool deserializes both the message key and value using a ByteArrayDeserializer, but then tries to print the data to the command line using the default formatter.
However, this tool allows you to customize the deserializers and the formatter used. See the following extract from the help output:
--formatter <String: class> The name of a class to use for
formatting kafka messages for
display. (default: kafka.tools.
DefaultMessageFormatter)
--property <String: prop> The properties to initialize the
message formatter. Default
properties include:
print.timestamp=true|false
print.key=true|false
print.value=true|false
key.separator=<key.separator>
line.separator=<line.separator>
key.deserializer=<key.deserializer>
value.deserializer=<value.
deserializer>
Users can also pass in customized
properties for their formatter; more
specifically, users can pass in
properties keyed with 'key.
deserializer.' and 'value.
deserializer.' prefixes to configure
their deserializers.
--key-deserializer <String:
deserializer for key>
--value-deserializer <String:
deserializer for values>
Using these settings, you should be able to change the output to be what you want.
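For example, a sketch using built-in deserializers (the topic name and the key/value types here are assumptions; pick the classes that match how the data was produced):
kafka-console-consumer --bootstrap-server kafka:29092 \
--topic some-topic \
--from-beginning \
--property print.key=true \
--property key.separator=: \
--key-deserializer org.apache.kafka.common.serialization.StringDeserializer \
--value-deserializer org.apache.kafka.common.serialization.LongDeserializer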