Debezium: Check if snapshot is complete for Postgres initial_only

I'm using Debezium postgres connector v1.4.2.Final.
I'm using snapshot.mode=initial_only, where I only want to take the table snapshot(s) and not stream the incremental changes. Once the snapshot is completed, I want to stop/kill the connector. How can I find out whether the snapshot is complete and it's safe to kill the connector?
I want to use this to add new tables to an existing connector. To do that, I'm trying the following (a sketch of the step-2 registration follows this list):
1. Kill the original connector (snapshot.mode=initial).
2. Start a new connector with snapshot.mode=initial_only for the new tables.
3. Stop the new connector once the snapshot is complete.
4. Start the original connector after adding the new tables to table.whitelist.
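Roughly, the step-2 registration I'm doing looks like this sketch, posting to the Kafka Connect REST API with Java 11's HttpClient (worker address, credentials, server name and table list are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSnapshotOnlyConnector {
    public static void main(String[] args) throws Exception {
        // Temporary connector: snapshot the new tables only, no streaming afterwards.
        String config = "{"
                + "\"name\": \"inventory-snapshot-only\","
                + "\"config\": {"
                + "\"connector.class\": \"io.debezium.connector.postgresql.PostgresConnector\","
                + "\"database.hostname\": \"localhost\","
                + "\"database.port\": \"5432\","
                + "\"database.user\": \"debezium\","
                + "\"database.password\": \"secret\","
                + "\"database.dbname\": \"inventory\","
                + "\"database.server.name\": \"inventory_snapshot\","   // distinct from the original connector
                + "\"snapshot.mode\": \"initial_only\","
                + "\"table.whitelist\": \"public.new_table_a,public.new_table_b\""
                + "}}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}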

Please check the JMX metrics. Verify whether this one, https://debezium.io/documentation/reference/1.5/connectors/postgresql.html#connectors-snaps-metric-snapshotcompleted_postgresql, suits your needs.
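For example, a minimal sketch that polls that metric over JMX until the snapshot is reported complete, assuming the Connect worker exposes remote JMX on localhost:9010 and database.server.name is dbserver1 (both placeholders); the object name follows the pattern documented for the Postgres snapshot metrics:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class WaitForSnapshot {
    public static void main(String[] args) throws Exception {
        // Adjust host/port to wherever your Connect worker exposes JMX.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
        JMXConnector jmx = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmx.getMBeanServerConnection();
            // "dbserver1" stands in for your database.server.name.
            ObjectName snapshotMetrics = new ObjectName(
                    "debezium.postgres:type=connector-metrics,context=snapshot,server=dbserver1");
            while (!(Boolean) mbs.getAttribute(snapshotMetrics, "SnapshotCompleted")) {
                Thread.sleep(5_000);   // poll until SnapshotCompleted flips to true
            }
            System.out.println("Snapshot completed, safe to stop the initial_only connector.");
        } finally {
            jmx.close();
        }
    }
}

Once it prints, you can delete the initial_only connector and restart the original one.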

Related

Can I initiate an ad-hoc Debezium snapshot without a signaling table?

I am running a Debezium connector to PostgreSQL. The snapshot.mode I use is initial, since I don't want to resnapshot just because the connector has been restarted. However, during development I want to restart the process, as the messages expire from Kafka before they have been read.
If I delete and recreate the connector via the Kafka Connect REST API, this doesn't do anything, because the information in the offset/status/config topics is preserved. I have to delete and recreate those topics and restart the whole Connect cluster to trigger another snapshot.
Am I missing a more convenient way of doing this?
You will also need a new name for the connector, as well as a new database.server.name in the connector config, since the stored offset information is keyed by those names. It should essentially be like deploying the connector for the first time again.
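As an illustrative sketch (the names are made up), the only things that have to change relative to the old deployment are the name you register the connector under and database.server.name; the rest of the config can stay the same:

import java.util.LinkedHashMap;
import java.util.Map;

public class RenamedConnectorConfig {
    public static void main(String[] args) {
        // Kafka Connect keys stored offsets by connector name, and Debezium keys its
        // topics and source offsets by the logical server name, so changing both makes
        // the deployment look brand new and the snapshot runs again.
        Map<String, String> config = new LinkedHashMap<>();
        config.put("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        config.put("database.server.name", "inventory_v2");   // was "inventory"
        config.put("snapshot.mode", "initial");
        // ... database.hostname, database.user, table.whitelist, etc. unchanged ...
        config.forEach((k, v) -> System.out.println(k + "=" + v));
        // Register this under a new connector name, e.g. "inventory-connector-v2".
    }
}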

How can I increase tasks.max for the Debezium MySQL connector?

I tried setting the Debezium MySQL connector property
tasks.max=50
but the connector logs show the error below:
java.lang.IllegalArgumentException: Only a single connector task may be started
I am using MSK Connect with a Debezium custom plugin and Debezium version 1.8.
It's not possible.
The database binlog must be read sequentially by a single task.
If you want to distribute the workload, run multiple connectors for different tables (a sketch follows below).
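A rough sketch of that idea, with made-up server, connector and table names: each connector still runs exactly one task, but each one captures a disjoint table.include.list, so the work is spread across connectors (and Connect workers).

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitTablesAcrossConnectors {
    public static void main(String[] args) {
        // Illustrative only: give each connector its own disjoint slice of the tables.
        List<List<String>> slices = Arrays.asList(
                Arrays.asList("shop.orders", "shop.order_items"),
                Arrays.asList("shop.customers", "shop.addresses"));

        for (int i = 0; i < slices.size(); i++) {
            Map<String, String> config = new LinkedHashMap<>();
            config.put("connector.class", "io.debezium.connector.mysql.MySqlConnector");
            config.put("database.server.name", "shop_" + i);              // unique per connector
            config.put("database.server.id", String.valueOf(5400 + i));   // unique MySQL server id
            config.put("table.include.list", String.join(",", slices.get(i)));
            config.put("tasks.max", "1");                                 // always 1 for the Debezium MySQL connector
            System.out.println("connector shop-connector-" + i + ": " + config);
        }
    }
}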

Debezium SQL Server Connector Kafka Initial Snapshot

According to the Debezium SQL Server connector documentation, the initial snapshot only fires on the connector's first run.
However, if I delete the connector and create a new one with the same name, the initial snapshot does not run either.
Is this by design or a known issue?
Any help appreciated.
Kafka Connect stores details about connectors, such as their snapshot status and ingest progress, even after they've been deleted. If you recreate a connector with the same name, it will assume it's the same connector and will try to continue from where the previous one got to.
If you want a connector to start from scratch (i.e. run snapshot etc) then you need to give the connector a new name. (Technically, you could also go into Kafka Connect and muck about with the internal data to remove the data for the connector of the same name, but that's probably a bad idea)
Give your connector a new database.server.name value, which will also result in new topics. The snapshot doesn't fire again because the offsets already recorded under the old name show the snapshot as completed, so the connector resumes from them instead of snapshotting.

How do I read a table in PostgreSQL using Flink?

I want to do some analytics with Flink on data in PostgreSQL. How and where should I provide the host, port, username and password? I was trying the table source approach mentioned in this link: https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/common.html#register-tables-in-the-catalog.
final static ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
final static TableSource csvSource = new CsvTableSource("localhost", port);
I am not sure how to get started, actually. I went through all the documentation but could not find a detailed explanation of this.
The tables and catalog referred to in the link you've shared are part of Flink's SQL support, wherein you can use SQL to express computations (queries) to be performed on data ingested into Flink. This is not about connecting Flink to a database; rather, it's about having Flink behave somewhat like a database.
To the best of my knowledge, there is no Postgres source connector for Flink. There is a JDBC table sink, but it only supports append mode (via INSERTs).
The CsvTableSource is for reading data from CSV files, which can then be processed by Flink.
If you want to operate on your data in batches, one approach you could take would be to export the data from Postgres to CSV, and then use a CSVTableSource to load it into Flink. On the other hand, if you wish to establish a streaming connection, you could connect Postgres to Kafka and then use one of Flink's Kafka connectors.
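For the batch route, a minimal sketch assuming the Flink 1.4 Table API from the docs you linked; the CSV path, field names and types are placeholders for whatever you export from Postgres (e.g. with COPY ... TO ... CSV):

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.sources.CsvTableSource;
import org.apache.flink.types.Row;

public class CsvAnalytics {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        // CSV file exported from Postgres, e.g. with: COPY users TO '/tmp/users.csv' CSV;
        CsvTableSource csvSource = new CsvTableSource(
                "/tmp/users.csv",
                new String[] {"id", "name"},
                new TypeInformation<?>[] {BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO});

        tableEnv.registerTableSource("users", csvSource);

        Table result = tableEnv.scan("users").select("id, name");
        DataSet<Row> rows = tableEnv.toDataSet(result, Row.class);
        rows.print();   // print() triggers execution of the batch job
    }
}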
Reading a Postgres instance directly isn't supported as far as I know. However, you can get real-time streaming of Postgres changes by using a Kafka server and a Debezium instance that replicates from Postgres to Kafka.
Debezium connects using the native Postgres replication mechanism on the DB side and emits all record inserts, updates or deletes as a message on the Kafka side. You can then use the Kafka topic(s) as your input in Flink.
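A small sketch of that streaming setup, assuming Flink 1.4 with the Kafka 0.11 connector dependency, a broker on localhost:9092, and a hypothetical Debezium topic dbserver1.public.users (Debezium names topics <server.name>.<schema>.<table>); the change events arrive as JSON strings that you would still parse downstream:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class ReadDebeziumTopic {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-postgres-cdc");

        // Each record is a JSON change event (insert/update/delete) emitted by Debezium.
        env.addSource(new FlinkKafkaConsumer011<>(
                        "dbserver1.public.users", new SimpleStringSchema(), props))
           .print();

        env.execute("read-postgres-changes-via-debezium");
    }
}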

Postgres streaming using JDBC Kafka Connect

I am trying to stream changes in my Postgres database using the Kafka Connect JDBC connector. I am running into issues at startup because the database is quite big and the initial query dies every time, as rows change while it is running.
What is the best practice for starting off the JDBC Connector on really huge tables?
Assuming you can't pause the workload on the database you're streaming from long enough for the initialisation to complete, I would look at Debezium.
In fact, depending on your use case, I would look at Debezium regardless :) It lets you do true CDC against Postgres (and MySQL and MongoDB), and is a Kafka Connect plugin just like the JDBC Connector is so you retain all the benefits of that.