Retry attempt without data loss when sink-side Solr is down during runtime - apache-kafka

curl -X POST -H "Content-Type: application/json" --data '{
"name": "t1",
"config": {
"tasks.max": "1",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"key.converter.schemas.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"connector.class": "com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector",
"topics": "TRAN",
"solr.queue.size": "100",
"solr.commit.within": "10",
"solr.url": "http://192.168.2.221:27052/solr/TRAN",
"errors.retry.delay.max.ms":"5000",
"errors.retry.timeout":"600000",
"errors.tolerance":"all",
"errors.log.enable":"true",
"errors.log.include.messages":"false",
"errors.deadletterqueue.topic.name":"DEAD_TRAN",
"errors.deadletterqueue.topic.replication.factor":"1",
"retry.backoff.ms":"1000",
"reconnect.backoff.ms":"5000",
"reconnect.backoff.max.ms":"600000"
}
}' http://localhost:8083/connectors
I need the connector to retry (without any data loss), based on the retry count from the connector config, if the Solr server goes down at runtime.
In my case it works perfectly when both the connector and Solr are running [Active].
But when only the Solr server is down, there is no retry; the records headed for Solr are dropped, which leads to data loss.
Error information and the connector config from the Kafka Connect log (screenshots from the original post, not reproduced here).

I've just checked the SinkTask implementation of that specific connector, and it does throw a RetriableException in its put() method.
In theory, and according to your connector configuration, it should keep retrying for 10 minutes ("errors.retry.timeout": "600000"). If your Solr instance recovers within those 10 minutes, there shouldn't be any data loss.
If you want to fully block your connector until Solr is back on its feet, have you tried setting "errors.retry.timeout": "-1"?
As per the documentation of errors.retry.timeout:
The maximum duration in milliseconds that a failed operation will be
reattempted. The default is 0, which means no retries will be
attempted. Use -1 for infinite retries.
PS: IMHO this could lead to a deadlock-like situation if, for some reason, a single message permanently fails its sink operation (i.e. if the sink keeps rejecting it).
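For illustration, a minimal sketch of applying that change through the standard Kafka Connect REST API, assuming the connector name "t1" from the question. Note that PUT /connectors/<name>/config replaces the whole configuration, so in practice you would resend every property from the original request, not just the ones shown here:

curl -X PUT -H "Content-Type: application/json" --data '{
  "connector.class": "com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector",
  "tasks.max": "1",
  "topics": "TRAN",
  "solr.url": "http://192.168.2.221:27052/solr/TRAN",
  "errors.tolerance": "all",
  "errors.retry.timeout": "-1",
  "errors.retry.delay.max.ms": "5000",
  "errors.deadletterqueue.topic.name": "DEAD_TRAN",
  "errors.deadletterqueue.topic.replication.factor": "1"
}' http://localhost:8083/connectors/t1/config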

Related

Sink data from DLQ topic to another table as CLOB

I'm using a connector to sink records from a topic to a DB. Some records are also redirected to a DLQ (dead letter queue) topic. The records in the DLQ may contain wrong types, wrong sizes, non-Avro values, etc. What I want to do is sink all of those records into an Oracle DB table. This table will only have two columns: a CLOB for the entire message, and the record date.
To sink from Kafka we need a schema, and since this topic will contain many types of records, I can't create a proper schema (or can I?). I just want to sink the messages as a whole; how can I achieve this?
I've tried it with this schema and connector:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"DLQ_TEST\",\"namespace\":\"DLQ_TEST\",\"fields\":[
{\"name\":\"VALUE\",\"type\":[\"null\",\"string\"]},
{\"name\":\"RECORDDATE\",\"type\":[\"null\",\"long\"]}]}"}' http://server:8071/subjects/DLQ_INSERT-value/versions
curl -i -X PUT -H "Content-Type:application/json" http://server:8083/connectors/sink_DLQ_INSERT/config -d '{
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "connection.url": "jdbc:oracle:thin:@oracleserver:1357/VITORCT",
  "table.name.format": "GLOBAL.DLQ_TEST_DLQ",
  "connection.password": "${file:/kafka/vty/pass.properties:vitweb_pwd}",
  "connection.user": "${file:/kafka/vty/pass.properties:vitweb_user}",
  "tasks.max": "1",
  "topics": "DLQ_TEST_dlq",
  "key.converter.schemas.enable": "false",
  "value.converter.schemas.enable": "true",
  "auto.create": "false",
  "insert.mode": "insert",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "org.apache.kafka.connect.storage.StringConverter"
}'
I don't exactly understand how to make the connector use this schema.
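One detail worth noting (an observation, not something stated in the post): a schema registered in Schema Registry is only consulted if the connector's value converter is the Avro converter; with StringConverter the payload is treated as a plain string. A sketch of the converter properties that would point at the registered DLQ_INSERT-value schema, assuming the registry URL used above is reachable from the Connect worker:

  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://server:8071"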

Kafka Connect: streaming changes from Postgres to topics using debezium

I'm pretty new to the Kafka and Kafka Connect world. I am trying to implement CDC using Kafka (on MSK), Kafka Connect (using the Debezium connector for PostgreSQL) and an RDS Postgres instance. Kafka Connect runs in a K8s pod in our cluster deployed in AWS.
Before diving into the details of the configuration used, I'll try to summarise the problem:
Once the connector starts, it sends messages to the topic as expected (snapshot).
Once we make any change to a table (create, update, delete), no messages are sent to the topic. We would expect to see messages about the changes made to the table.
My connector config looks like:
{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "database.user": "root",
  "database.dbname": "insights",
  "slot.name": "cdc_organization",
  "tasks.max": "1",
  "column.blacklist": "password, access_key, reset_token",
  "database.server.name": "insights",
  "database.port": "5432",
  "plugin.name": "wal2json_rds_streaming",
  "schema.whitelist": "public",
  "table.whitelist": "public.kafka_connect_cdc_test",
  "key.converter.schemas.enable": "false",
  "database.hostname": "de-test-sre-12373.cbplqnioxomr.eu-west-1.rds.amazonaws.com",
  "database.password": "MYSECRETPWD",
  "value.converter.schemas.enable": "false",
  "name": "source-postgres",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "snapshot.mode": "initial"
}
We have tried different values for the plugin.name property: wal2json, wal2json_streaming and wal2json_rds_streaming.
There's no connection problem between the connector and the DB, as we already see messages flowing through as soon as the connector starts.
Is there a configuration issue with the connector described above that prevents us from seeing messages for new changes in the topic?
Thanks
Your connector config looks a bit confusing. I'm pretty new to Kafka as well, so I can't pinpoint the issue, but this is the connector config that works for me.
{
  "name": "<connector_name>",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.server.name": "<server>",
    "database.port": "5432",
    "database.hostname": "<host>",
    "database.user": "<user>",
    "database.dbname": "<dbname>",
    "tasks.max": "1",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "<kafka_topic_name>",
    "plugin.name": "pgoutput",
    "include.schema.changes": "true"
  }
}
If this configuration doesn't work either, look through the log console; sometimes the relevant error isn't the last line written to the console.
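As a complement to reading the logs, the Connect REST API can surface task-level failures directly; a minimal sketch, assuming the worker listens on localhost:8083 and the connector is named source-postgres as in the question:

curl -s http://localhost:8083/connectors/source-postgres/status

A task in FAILED state includes the underlying stack trace in its "trace" field, which is often quicker to spot than scrolling the console output.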

How to pass data when meets a condition from MongoDB to a Kafka topic with a source connector and a pipeline property?

I'm working on a source connector to watch for changes in a Mongo collection and push them to a Kafka topic. This works nicely until I add the requirement to publish to the Kafka topic only when a specific condition is met (name=Kathe). That is, I need to put data in the topic only if the update changes the name to Kathe.
My connector's config looks like:
{
  "connection.uri": "xxxxxx",
  "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter.schemas.enable": "false",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "false",
  "topic.prefix": "qu",
  "database": "sample_analytics",
  "collection": "customers",
  "copy.existing": "true",
  "pipeline": "[{\"$match\":{\"name\":\"Kathe\"}}]",
  "publish.full.document.only": "true",
  "flush.timeout.ms": "15000"
}
I have also tried with:
"pipeline":"[{\"$match\":{\"name\":{ \"$eq\":\"Kathe\"}}}]"
But it does not produce messages when the condition is met.
Am I making a mistake?
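One thing worth checking (an observation based on how MongoDB change streams are shaped, not something stated in the post): the pipeline is applied to the change-stream event, where the document's fields sit under fullDocument, so a $match on a top-level name field never fires for updates. A sketch of the adjusted property:

"pipeline": "[{\"$match\":{\"fullDocument.name\":\"Kathe\"}}]"

This is only a starting point to verify against the connector's documentation, especially in combination with copy.existing.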

how to override key.serializer in kafka connect jdbc

I am doing a MySQL to Kafka connection using the Kafka JDBC source connector. Everything works fine. Now I need to pass key.serializer and value.serializer to encrypt data as shown at Macronova, but I didn't see any change in the output.
POST request to create the source connector:
curl -X POST -H "Content-Type: application/json" --data '{
"name": "jdbc-source-connector-2",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"tasks.max": 10,
"connection.url": "jdbc:mysql://localhost:3306/connect_test?user=roo&password=roo",
"mode": "incrementing",
"table.whitelist" : "test",
"incrementing.column.name": "id",
"timestamp.column.name": "modified",
"topic.prefix": "table-",
"poll.interval.ms": 1000
}
}' http://localhost:8083/connectors
Connectors take converters only; key.serializer and value.serializer are producer properties and are ignored in a connector config.
If you want to encrypt a whole string, you'd need to implement your own converter, or change the code that writes into the database to write into Kafka instead, then consume from Kafka and write to the database as well as any other downstream systems.
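For illustration, the converter-style equivalent of what the config above attempts, as a sketch (StringConverter ships with Kafka Connect; it only changes how records are converted, it does not encrypt anything — that would still require a custom converter as noted above):

  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "org.apache.kafka.connect.storage.StringConverter"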

With Kafka JDBC source connector getting only 1000 records/sec. How to improve the record pulling rate

I am using Kafka Connect with the JDBC source connector. The connector works fine, but I am only able to get about 1000 messages/sec into the topic from the Oracle DB. I tried most of the configuration settings, in both standalone and distributed modes, but with no luck. Please help. Below is my JDBC source connector configuration:
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "ORA_SRC_DEVDB",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@xxxxxxx/DBDEV",
    "connection.user": "xxxxxx",
    "connection.password": "xxxxxx",
    "query": "select * from A.LOG_AUDIT",
    "topic.prefix": "Topic_POC",
    "tasks.max": "1",
    "poll.interval.ms": "5000",
    "batch.max.rows": "1000",
    "table.poll.interval.ms": "60000",
    "mode": "timestamp",
    "timestamp.column.name": "MODIFIED_DATEnTIME"
  }
}'
The destination topic "Topic_POC" was created with 3 partitions and 3 replicas.
poll.interval.ms: frequency in ms to poll for new data in each table (default 5000).
batch.max.rows: maximum number of rows to include in a single batch (default 100).
In your case, every 5 seconds you poll at most 1000 records from the DB. Decreasing poll.interval.ms and increasing batch.max.rows could improve the fetch rate.
Not only that; the factors below also impact your fetch rate:
The rate of incoming data into the database.
The I/O rate from the DB through the JDBC connector to Kafka.
DB table performance, e.g. whether there is a proper index on the timestamp column.
After all, the connector uses JDBC to fetch data from the database, so everything you would face with a single JDBC application applies here too.
In my experience the JDBC connector is pretty fast.
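To make the tuning suggestion above concrete, a sketch of the two properties with adjusted values (the numbers are illustrative assumptions, not measured recommendations):

  "poll.interval.ms": "1000",
  "batch.max.rows": "5000"

With these values the connector would poll every second and pull up to 5000 rows per batch, instead of at most 1000 rows every 5 seconds.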