Debezium or JDBC sink: skip tables without primary key - PostgreSQL

I have Debezium running in a container, capturing all changes to the records of a PostgreSQL database. In addition I have a Confluent JDBC sink container that writes all changes to another database.
The source connector definition allows multiple tables to be captured, and therefore some of the captured tables have primary keys and some do not.
In the sink connector the definition specifies pk.mode as follows:
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.mode": "record_key",
But since some tables in the source database have no primary key, the sink connector throws the following error:
Caused by: org.apache.kafka.connect.errors.ConnectException: PK mode for table 'contex_str_dealer_branch_address' is RECORD_KEY, but record key schema is missing
Normally there should be a few options:
exclude tables without primary keys from the source
exclude tables without primary keys from the sink
use the primary key for tables that have one, and fall back to all other columns for tables that do not
Is there any way to skip those tables from any operation?

In Debezium there isn't a parameter to filter out tables without primary keys.
If you know the tables in advance, you can exclude them with table.exclude.list in the Debezium source connector.
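For example, to skip the table from the error above (assuming it lives in the public schema):

"table.exclude.list": "public.contex_str_dealer_branch_address"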
In the JDBC sink connector you have two options:
No fields are used as primary key (keys are ignored):
"pk.mode": "none"
Or all fields from the value struct will be used:
"pk.mode": "record_value"

Related

PostgreSQL Upsert without using On Conflict clause

Is it possible to achieve Upsert in Postgres without using the On Conflict clause?
I have a requirement where I converted a normal table into a partitioned table with a partition key that was not part of the Primary Key when the table was non-partitioned.
Since the partition key is added to the primary key column list now, my Upsert statements are failing as the On Conflict clause is missing the partition key. But as per the requirement, I cannot add the partition key to the On Conflict clause as I will have more than one row for the previous primary key column combination in the partitioned table.
Hence, I want Upsert to be achieved without the On Conflict clause. Can someone suggest what alternatives would work?
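One alternative, sketched here as a suggestion rather than a drop-in answer (the table accounts(acc_id, region, balance), partitioned by region with acc_id as the old primary key, is hypothetical, and unlike ON CONFLICT this pattern is not safe against concurrent writers without additional locking), is an UPDATE-then-INSERT in a single statement using a writable CTE:

-- Update the row if it exists; insert it only if the update touched nothing.
WITH updated AS (
  UPDATE accounts
     SET balance = 100
   WHERE acc_id = 42
  RETURNING acc_id
)
INSERT INTO accounts (acc_id, region, balance)
SELECT 42, 'eu', 100
WHERE NOT EXISTS (SELECT 1 FROM updated);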

Is it possible to define a different Reroute.key.field.name per Postgres table?

Debezium by default uses the primary key as the partition key; however, some of my tables should be partitioned by a different key (e.g. user),
hence I wanted to use transforms.Reroute.key.field.name=user_id for that specific table only, while all the rest keep using the primary key.
Docs:
https://debezium.io/documentation/reference/configuration/topic-routing.html#_example
However, it's not clear to me how to apply that transform to only one table and not all the others.
Instead of re-routing, you could specify the message.key.columns connector option for customizing the columns that make up the message key for specific tables.
message.key.columns=inventory.customers:user_id
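message.key.columns takes a semicolon-separated list of fully-qualified table names (the linked docs also show regular expressions for the table-name part), each followed by its key columns; tables that are not listed keep their primary key. A sketch with a second, hypothetical table added:

message.key.columns=inventory.customers:user_id;inventory.orders:order_id,order_date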

Kafka sink connector creation for a table with a three-column primary key

I have created a source JDBC connector for a table that has no primary key (the table has columns a, b, c, d, e) and is part of an external database. I have a replica table in my database, where I created a primary key on columns a, b and c, since those three combined form unique data. I am trying to create an upsert sink connector with pk.fields set to a,b,c, but when I launch the sink connector it goes to a degraded state, and I cannot see any proper error in connect.log either. I have set pk.mode to record_value and pk.fields to a,b,c. Can someone please let me know if there is anything missing in the setup?
Note: it works if I change insert.mode to insert and remove pk.fields; pk.mode stays record_value.
Update:
Hi Robin, the source table, AccountDetails, has columns accNumber, bankABA, bankOrigAccNumber, SpendingLimit and ExpirationDate, and there is no primary key on this table. The target table, AccountInformation, has the same columns but with the primary key (accNumber, bankABA, bankOrigAccNumber), since we need a primary key at the target for use in a different application. I have created a source connector, which is working fine to pull the data once every 24 hours. I am trying to create a sink connector in upsert mode to push the data from the topic to the table, with the primary key mode as record_value and the primary key fields as "accNumber,bankABA,bankOrigAccNumber". When I launch the sink, it goes to a degraded state.
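For reference, a minimal sketch of such a sink configuration (the connection URL and topic name are assumptions; with pk.mode record_value, every field listed in pk.fields must exist in the message value, with matching case):

{
  "name": "account-information-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://target-host:5432/target_db",
    "topics": "AccountDetails",
    "table.name.format": "AccountInformation",
    "insert.mode": "upsert",
    "pk.mode": "record_value",
    "pk.fields": "accNumber,bankABA,bankOrigAccNumber"
  }
}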

How to use a subquery in the IN clause when partitioning

I have 2 local Docker PostgreSQL 10.7 servers set up. On my hot instance, I have a huge table that I wanted to partition by date (I achieved that). The data from the partitioned table (let's call it PART_TABLE) is stored on the other server; only PART_TABLE_2019 is stored on the HOT instance. And here comes the problem: I don't know how to partition 2 other tables that have foreign keys referencing PART_TABLE, based on that FK. PART_TABLE and TABLE2_PART are both stored on the HOT instance.
I was thinking something like this:
create table TABLE2_PART_2019 partition of TABLE2_PART for values in (select uuid from PART_TABLE_2019);
But the query doesn't work, and I don't know if this is a good idea (performance-wise and logically).
Let me just mention that I could solve this with a function or a script, etc., but I would like to do it without scripting.
From the doc at https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE:
"While primary keys are supported on partitioned tables, foreign keys referencing partitioned tables are not supported. (Foreign key references from a partitioned table to some other table are supported.)"
With PostgreSQL v10, you cannot define a foreign key on the partitioned table itself, but you can create one on each individual partition.
You could upgrade to PostgreSQL v11 which allows foreign keys to be defined on partitioned tables.
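Note also that the partition bounds in FOR VALUES must be literals, so the subquery form from the question cannot work. A minimal sketch of the v10 per-partition foreign key, using the table names from the question and assuming PART_TABLE_2019 has a primary key or unique constraint on uuid (the constraint name is illustrative):

-- Attach the FK to the individual partition, referencing the
-- matching partition of PART_TABLE rather than the parent.
ALTER TABLE table2_part_2019
  ADD CONSTRAINT table2_part_2019_uuid_fkey
  FOREIGN KEY (uuid) REFERENCES part_table_2019 (uuid);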
Can you explain what a HOT instance is and why it would make this difficult?

Postgres foreign-key constraints in a non-public schema

I have a question regarding constraints on custom schemas.
My app creates a new/separate schema for each client, named after the client (i.e. clienta, clientb, ...). Some of the tables have foreign-key constraints, but they don't work on schemas other than the default public schema. For example, let's say there is a schema called clienta which has projects and tasks tables, and the model Task has a belongsTo(models.Project) association (i.e. the projects table's primary key is a foreign key for the tasks table). The issue starts here: when trying to create a record in the tasks table, an error comes up saying foreign key violation error... Key (project_id)=(1) is not present in table "projects", even though the projects table has the respective record with id = 1. I am wondering if this is a limitation of the Sequelize library itself, or am I missing something in the configs?
Sequelize config
"development": {
"database": "my_app",
"host": "127.0.0.1",
"dialect": "postgres",
"operatorsAliases": "Sequelize.Op",
"dialectOptions": {
"prependSearchPath": true
},
"define": {
"underscored": true
}
}
Example of create function:
models.Task.create({...args}, { searchPath: 'clienta' })
N.B Everything works as expected in public schema.
The sync method API lists two options relating to DB-schema:
options.schema - schema the table should be created in
options.searchPath - optional parameter to set searchPath (Postgresql)
When using schemas other than the default, and an association between models has been created (using for instance belongsTo), it is important to set searchPath to the name of the schema of the target table. Following the explanation of search_path in PostgreSQL, not specifying the searchPath will have the constraint referring to a table (if it exists) in the default schema (usually 'public').
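A minimal sketch of what this looks like in practice, using the model names from the question (the sync options come from the API listed above; field names are illustrative):

// Create both tables inside the client's schema, resolving the
// belongsTo target in that same schema.
await models.Project.sync({ schema: 'clienta', searchPath: 'clienta' });
await models.Task.sync({ schema: 'clienta', searchPath: 'clienta' });

// Pass the same searchPath on create so the foreign-key check looks
// at clienta.projects instead of public.projects.
const project = await models.Project.create({ name: 'Demo' }, { searchPath: 'clienta' });
await models.Task.create({ name: 'First task', project_id: project.id }, { searchPath: 'clienta' });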