Kafka sink connector creation for a table with a three-column primary key

I have created a JDBC source connector for a table that has no primary key (the table has columns a, b, c, d, e) and is part of an external database. I have a replica table in my database, where I created a primary key on columns a, b, and c, since those three combined uniquely identify a row. I am trying to create an upsert sink connector with pk.mode set to record_value and pk.fields set to a,b,c, but when I launch the sink connector it goes into a degraded state, and I cannot find any useful error in connect.log either. Can someone please let me know if anything is missing in the setup?
Note: it works if I change the mode to insert and remove pk.fields; the pk.mode is still record_value.
Update:
Hi Robin, the source table, AccountDetails, has columns accNumber, bankABA, bankOrigAccNumber, SpendingLimit, and ExpirationDate, and it has no primary key. The target table, AccountInformation, has the same columns but with a primary key on (accNumber, bankABA, bankOrigAccNumber), since we need a primary key at the target for use by a different application. I have created a source connector, which works fine pulling the data once every 24 hours. I am trying to create a sink connector in upsert mode to push the data from the topic to the table, with the primary key mode as record_value and the primary key fields as "accNumber,bankABA,bankOrigAccNumber". When I launch the sink, it goes into a degraded state.
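For reference, an upsert sink configuration along the lines described might look like this; the connector name, connection details, and topic name are illustrative, and every field listed in pk.fields must exist (and be non-null) in the record value:

{
  "name": "account-details-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://localhost:5432/targetdb",
    "connection.user": "connect",
    "connection.password": "connect-secret",
    "topics": "AccountDetails",
    "table.name.format": "AccountInformation",
    "insert.mode": "upsert",
    "pk.mode": "record_value",
    "pk.fields": "accNumber,bankABA,bankOrigAccNumber"
  }
}

When a task degrades without a useful message in connect.log, the Connect REST status endpoint (GET /connectors/<name>/status) usually carries the full stack trace of the failing task.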

Related

Adding a partition from another DB to existing partitioned table in Postgres 12

Short description of the problem
I need to build a partitioned table "users" with two partitions located on separate servers (Moscow and Hamburg); each partition is a table with the columns:
id - integer primary key with auto increment
region - smallint partition key, which equals either 100 for Hamburg or 200 for Moscow
login - unique character varying with length of 100.
I intended to generate ids from sequences as n*1000 + 100 for Hamburg and n*1000 + 200 for Moscow, so that just by looking at the primary key I know which partition a row belongs to (see the sketch below).
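That numbering scheme maps directly onto one sequence per server; a sketch with an illustrative name:

-- Hamburg: generates 100, 1100, 2100, ...; the Moscow server would use START WITH 200 instead
CREATE SEQUENCE users_id_seq INCREMENT BY 1000 START WITH 100;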
region is intended to be read-only and never to change after creation, so records will not move between partitions.
SELECT queries must be able to return records from all partitions and UPDATE queries must be able to modify records on all partitions, while INSERT/DELETE queries only need to add/delete records in the local partition, so the data stored in the partitions is not completely isolated.
What was done
Using pgAdmin4
I created a "test" table on Hamburg server, added all column info, marked it as partitioned table with partition key region and partition type List.
I created a "hamburg" partition in this table, adding primary key constraint as id,region and unique key constraint as login,region.
I created a "moscow" table on Moscow server with the same column info as "test"
I added postgres_fdw extension to Hamburg server, created Foreign server pointing to DB on Moscow server and User mapping.
I added "moscow" foreign table to Hamburg server pointing to "moscow" table on Moscow server.
What is my problem
I couldn't figure out how to attach this foreign table as the second partition of the "test" table.
When I tried to attach the partition through the pgAdmin dialog in the "test" table's partition properties, it showed me an error: cannot unpack non-iterable Response object
When I tried to add the partition with the following query:
ALTER TABLE public.test ATTACH PARTITION public.moscow FOR VALUES IN (200);
It shows me an error:
ERROR: cannot attach foreign table "moscow" as partition of partitioned table "test"
DETAIL: Table "test" contains unique indexes.
SQL state: 42809
I removed the unique constraint from the login column, but it shows the same error.
When I create a partitioned table with the same properties and both partitions initially located on the same server, everything works well, except that Postgres enforces login uniqueness per partition rather than across the whole table, but I assume that is a limitation of the mechanism.
So, how can I attach a table located on the second server as a partition of a partitioned table located on the first one?
The error message is pretty clear: since you cannot create an index on a foreign table, PostgreSQL cannot create the partition of the unique index for that partition. But the unique index is required to implement the constraint.
See this source comment:
/*
* If we're attaching a foreign table, we must fail if any of the indexes
* is a constraint index; otherwise, there's nothing to do here. Do this
* before starting work, to avoid wasting the effort of building a few
* non-unique indexes before coming across a unique one.
*/
Either drop the unique constraint or don't use foreign tables as partitions.
OK, I was finally able to add a foreign table as a partition of a partitioned table.
It was necessary to drop the primary key on id and the unique constraint on login for the partitioned table.
After that I was able to attach the foreign table as a partition.
Later I added the primary key on id and the unique constraint on login back on each local partition.
So in the end I have a globally unique id, since it is generated by per-DB sequences whose value ranges never intersect. For login uniqueness, I have to manually check the global table for an existing record before inserting.
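Sketched as SQL against the tables from the question (the constraint names are assumptions; the actual names depend on how the constraints were created):

-- drop the constraints that block attaching a foreign table
ALTER TABLE test DROP CONSTRAINT test_pkey;
ALTER TABLE test DROP CONSTRAINT test_login_region_key;

-- now the attach succeeds
ALTER TABLE public.test ATTACH PARTITION public.moscow FOR VALUES IN (200);

-- re-create the constraints on the local partition only
ALTER TABLE hamburg ADD PRIMARY KEY (id, region);
ALTER TABLE hamburg ADD UNIQUE (login, region);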
P.S. Hopefully, this partitioning mechanism in postgres is suitable for geographically distant regions.

Is it possible to define a different Reroute.key.field.name per postgres table?

Debezium by default uses the primary key as the partition key, but some of my tables should be partitioned by a different key (e.g. user);
hence I wanted to use transforms.Reroute.key.field.name=user_id for that specific table only, while all the rest keep using the primary key.
Docs:
https://debezium.io/documentation/reference/configuration/topic-routing.html#_example
However, it's not clear to me how to apply that transform to only one table and not all the others.
Instead of re-routing, you could specify the message.key.columns connector option, which customizes the columns that make up the message key for specific tables:
message.key.columns=inventory.customers:user_id
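If several tables need different key columns, multiple mappings can be listed, separated by semicolons (the second table and its columns here are illustrative):

message.key.columns=inventory.customers:user_id;inventory.orders:order_ref,region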

Instance has a NULL identity key error during insert with session.add() in sqlalchemy into partitioned table with trigger

I am using PostgreSQL and SQLAlchemy for my Flask project.
I recently partitioned one of my big tables on created_on using PostgreSQL triggers.
But now if I try to insert a record into the master table with db.session.add(obj) in SQLAlchemy, I get an error saying:
Instance has a NULL identity key. If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values. Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.
Here I am using a sequence to generate my primary key values. Please help me with this.
Use autoincrement=True when defining your column. For example, in my code sno is an auto-increment field:
class Contact(db.Model):
    sno = db.Column(db.Integer, primary_key=True, autoincrement=True)
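With that in place, a flush should populate the key via RETURNING; a minimal usage sketch, assuming the Flask-SQLAlchemy db instance from above:

contact = Contact()
db.session.add(contact)
db.session.commit()   # sno should now be set from the database sequence
print(contact.sno)

One caveat worth checking in the partitioned-table case: a BEFORE INSERT trigger that redirects the row to a child table and returns NULL suppresses the row that INSERT ... RETURNING would hand back to SQLAlchemy, which produces exactly this "NULL identity key" error regardless of the column settings.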

How to write another query in IN function when partitioning

I have two local Docker PostgreSQL 10.7 servers set up. On my hot instance, I have a huge table that I wanted to partition by date (which I achieved). The data from the partitioned table (let's call it PART_TABLE) is stored on the other server; only PART_TABLE_2019 is stored on the HOT instance. And here comes the problem: I don't know how to partition two other tables that have foreign keys to PART_TABLE, based on that FK. PART_TABLE and TABLE2_PART are both stored on the HOT instance.
I was thinking something like this:
create table TABLE2_PART_2019 partition of TABLE2_PART for values in (select uuid from PART_TABLE_2019);
But that query doesn't work, and I don't know whether this is a good idea (performance-wise and logically).
Let me just mention that I could solve this with a function or a script, etc., but I would like to do it without scripting.
From the docs at https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE:
"While primary keys are supported on partitioned tables, foreign keys
referencing partitioned tables are not supported. (Foreign key
references from a partitioned table to some other table are
supported.)"
With PostgreSQL v10, you cannot define foreign keys on partitioned tables themselves; you can only create them on each individual partition, as sketched below.
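Under v10, the per-partition approach might look like this (the column and constraint names are assumptions, and the referenced uuid column must carry a unique constraint on the referenced partition):

ALTER TABLE table2_part_2019
    ADD CONSTRAINT table2_part_2019_part_fk
    FOREIGN KEY (part_table_uuid) REFERENCES part_table_2019 (uuid);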
Alternatively, you could upgrade to PostgreSQL v11, which allows foreign keys to be defined on partitioned tables.
Can you explain what a HOT instance is and why it would make this difficult?

Unable to load the source file details into mysql database using TALEND tool

I am new to Talend.
I need to upload the incoming file details into MySQL tables.
Please can you provide an example (Talend ETL) of propagating the primary key from one table (tMysqlOutput) to another table (tMysqlOutput) in Talend? This primary key will act as a foreign key for another table. I am struggling with this.
Existing scenario:
filedelimiter --> tmap --> stage --> tmap --> master table
(I want to propagate the primary key from this master table to the child table.)
This is the tool we are using: http://www.talend.com/