SymmetricDS pk alternative - postgresql

After reading the SymmetricDS user guide, I'm not sure whether SymmetricDS supports conflict resolution that is based not on the PK but exclusively on my own custom columns.
Given the following scenario:
2 nodes with bi-directional update
each node has one table products which must be synchronized
Now, the table schema looks like this (simplified):
id (pk) | name (char) | reference (char)
What I would like to know is: is it possible to define the column reference as the identifier for conflict resolution and insert/update operations, instead of the pk column id?
Example:
Node0
id (pk) | name (char) | reference (char)
1       | Foo         | IN001
2       | FooBaz      | IN003
----
Node1
id (pk) | name (char) | reference (char)
1       | Bar         | EX001
2       | Foo         | IN001
Changes to row 2 in Node1 would trigger updates to row 1 in Node0, while creating a new record in Node0/1 would trigger an insert in the respective other node, considering that the PK might already be taken there.
Furthermore, I would like to filter the rows to be synchronized by the value of the column reference, meaning that only rows where reference starts with 'IN' should be synced.
Thanks!

Look at the column 'SYNC_KEY_NAMES' on the TRIGGER table.
Specify a comma-delimited list of columns that should be used as the
key for synchronization operations. By default, if not specified, then
the primary key of the table will be used.
If you insert the value 'reference' into this column, SDS will handle that column as the PK.
Leaving id as a PK creates a hurdle. If this column auto-increments, you can try to exclude it via the trigger table column 'EXCLUDED_COLUMN_NAMES'. Since it is the PK, I don't know whether SDS will ignore it or not.
If that does not work you will have to write a Custom Load Filter to increment the id field on insert.
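For example, a sketch against the sym_trigger configuration table (the trigger_id value 'products' is an assumption; use the id of your own trigger row):
update sym_trigger
   set sync_key_names = 'reference',        -- sync on reference instead of the PK
       excluded_column_names = 'id'         -- try to keep the auto-increment id local
 where trigger_id = 'products';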

Related

Implement revision number while keeping primary key in sql

Suppose I have a PostgreSQL table with a primary key and some data:
pkey                 | price
---------------------+-------
0075QlyLvw8bi7q6XJo7 | 20
(1 row)
However, I would like to save historical updates to it without losing the functionality that comes from referencing its key in other tables as a foreign key.
I am thinking of doing some kind of revision_number + timestamp approach where each "update" would be a new row, example:
pkey                 | price | rev_no
---------------------+-------+-------
0075QlyLvw8bi7q6XJo7 | 20    | 0
0075QlyLvw8bi7q6XJo7 | 15    | 1
(2 rows)
Then create a view that always takes the highest revision number of the table and reference keys from that view.
However, this workaround seems a bit too heavy to me for a task that in my opinion should be fairly common. Is there something I'm missing? Do you have a better solution, or is there a well-known paradigm for this type of problem that I don't know about?
Assuming pkey is actually the defined primary key, you cannot implement the revision scheme you outlined without creating a history table and moving old data into it: the primary key must be unique across all revisions. But if you have a properly normalized table there are several valid methods; the following is one:
1. Review the other attributes and identify the candidate business key (columns of business meaning that could be defined unique -- perhaps the item name).
2. If not already present, add 2 columns: an effective timestamp and a superseded timestamp.
3. Now create a partial unique index on the column(s) identified in #1, filtered to rows where the superseded timestamp is null, meaning "this is the currently active version".
4. Create a simple view as select * from table. Since this is a simple view it is fully updatable. Use this view for select, insert and delete, but for update create an INSTEAD OF trigger. This trigger will set the superseded timestamp of the currently active row and insert a new row with the update applied and an incremented version number.
With the above you get a unique key on the currently active revision. Further, you maintain the history of all relationships at each version. (See demo, including a couple of useful functions.)
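As a sketch of steps 1-4 (the table and column names here -- prices, item_name, effective_ts, superseded_ts -- are assumptions, and the linked demo may differ):
create table prices (
    pkey          text      not null,
    item_name     text      not null,   -- business key from step 1
    price         numeric   not null,
    rev_no        integer   not null default 0,
    effective_ts  timestamp not null default now(),
    superseded_ts timestamp             -- null = currently active revision
);

-- step 3: only one active revision per business key
create unique index prices_active_uk on prices (item_name)
    where superseded_ts is null;

-- step 4: simple (hence updatable) view over the active rows
create view current_prices as
    select * from prices where superseded_ts is null;

create function prices_update_fn() returns trigger as $$
begin
    -- close out the currently active revision...
    update prices
       set superseded_ts = now()
     where item_name = old.item_name
       and superseded_ts is null;
    -- ...and insert the new one with an incremented revision number
    insert into prices (pkey, item_name, price, rev_no)
    values (old.pkey, old.item_name, new.price, old.rev_no + 1);
    return new;
end;
$$ language plpgsql;

create trigger current_prices_upd
    instead of update on current_prices
    for each row execute function prices_update_fn();
An update through the view, e.g. update current_prices set price = 15 where pkey = '0075QlyLvw8bi7q6XJo7';, then supersedes the rev_no 0 row and leaves the rev_no 1 row active, matching the table in the question.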

One-way multi column UNIQUE constraint

I have a system in which I am trying to describe event-based interactions between two targets. In our system, an event (interaction) has a "source" and a "target" (basically who [did what] to whom):
-- tried to remove some of the "noise" from this for the sake of the post:
CREATE TABLE interaction_relationship (
    id integer CONSTRAINT interaction_pk PRIMARY KEY,
    source_id integer NOT NULL CONSTRAINT source_fk REFERENCES entity(id),
    target_id integer NOT NULL CONSTRAINT target_fk REFERENCES entity(id),
    -- CONSTRAINT(s)
    CONSTRAINT interaction_relationship_deduplication UNIQUE (source_id, target_id)
);
Constraint interaction_relationship_deduplication is the source of my question:
In our system, a single source can interact with a single target multiple times, but that relationship can only exist once, i.e. if I am a mechanic working on a car, I may see that car multiple times in my shop, but I have only one relationship with that single car:
id | source_id | target_id
---+-----------+----------
a  | 123abc    | 456def
b  | 123abc    | 789ghi
Ideally, this table also represents a unidirectional relationship. source_id is always the "owner" of the interaction, i.e. if the car 456def ran over the mechanic 123abc there would be another entry in the interaction_relationship table:
id | source_id | target_id
---+-----------+----------
1  | 123abc    | 456def
2  | 123abc    | 789ghi
3  | 456def    | 123abc
So, my question: does UNIQUE on multiple columns take value order into consideration? Or would the above cause a failure?
does UNIQUE on multiple columns take value order into consideration?
Yes. The tuple (123abc, 456def) is different from the tuple (456def, 123abc); they may both exist in the table at the same time.
That said, you might want to remove the surrogate id from the relationship table; there's hardly any use for it. A relation table (as opposed to an entity table, and even there) does totally fine with a multi-column primary key, which is naturally the combination of source and target.
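For example, a sketch of that alternative (keeping the column definitions from the question):
CREATE TABLE interaction_relationship (
    source_id integer NOT NULL REFERENCES entity(id),
    target_id integer NOT NULL REFERENCES entity(id),
    CONSTRAINT interaction_pk PRIMARY KEY (source_id, target_id)
);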

Moving data between PostgreSQL databases respecting conflicting keys

Situation
I have 2 databases which were at one time direct copies of each other, but now they contain new, different data.
What do I want to do
I want to move data from database "SOURCE" to database "TARGET", but the problem is that the tables use auto-incremented keys, and since both databases are used at the same time, many of the IDs are already taken in TARGET, so I cannot just identity-insert the data coming from SOURCE.
But in theory we could just not use identity insert at all and let the database take care of assigning new IDs.
What makes it harder is that we have around 50 tables, each connected to the others by foreign keys. Clearly the foreign keys will also have to be changed, or else they will no longer reference the correct rows.
Let's see a very simplified example:
table Human {
    id integer NOT NULL PK AutoIncremented
    name varchar NOT NULL
    parentId integer NULL FK -> Human.id
}
table Pet {
    id integer NOT NULL PK AutoIncremented
    name varchar NOT NULL
    ownerId integer NOT NULL FK -> Human.id
}
SOURCE Human
Id name parentId
==========================
1 Aron null
2 Bert 1
3 Anna 2
SOURCE Pet
Id name ownerId
==========================
1 Frankie 1
2 Doggo 2
TARGET Human
Id name parentId
==========================
1 Armin null
2 Cecil 1
TARGET Pet
Id name ownerId
==========================
1 Gatto 2
Let's say I want to move Aron, Bert, Anna, Frankie and Doggo to the TARGET database.
But if we directly try to insert them without caring about the original ids, the foreign keys will be garbled:
TARGET Human
Id name parentId
==========================
1 Armin null
2 Cecil 1
3 Aron null
4 Bert 1
5 Anna 2
TARGET Pet
Id name ownerId
==========================
1 Gatto 2
2 Frankie 1
3 Doggo 2
The father of Anna is Cecil, and the owner of Doggo is also Cecil instead of Bert. The parent of Bert is Armin instead of Aron.
How I want it to look is:
TARGET Human
Id name parentId
==========================
1 Armin null
2 Cecil 1
3 Aron null
4 Bert 3
5 Anna 4
TARGET Pet
Id name ownerId
==========================
1 Gatto 2
2 Frankie 3
3 Doggo 4
Imagine having around 50 similar tables with thousands of rows, so we will have to automate the solution.
Questions
Is there a specific tool I can utilize?
Is there some simple SQL logic to precisely do that?
Do I need to roll my own software to do this (e.g. a service that connects to both databases, reads everything into EF including all relations, and saves it to the other DB)? I fear that there are too many gotchas and it would be time consuming.
Is there a specific tool? Not as far as I know.
Is there some simple SQL? Not exactly simple but not all that complex either.
Do you need to roll your own? Maybe, depending on whether you think you can use the SQL (below).
I would guess there is no direct path, the problem being, as you note, getting the FK values reassigned. The following adds a column to all the tables which can be used to match rows across the databases. For this I would use a uuid. Then with that you can copy from one table set to the other, except for the FKs. After copying you can join on the uuid to complete the FKs.
-- establish a reference field unique across databases.
alter table target_human add sync_id uuid default gen_random_uuid ();
alter table target_pet add sync_id uuid default gen_random_uuid ();
alter table source_human add sync_id uuid default gen_random_uuid ();
alter table source_pet add sync_id uuid default gen_random_uuid ();
-- copy source_human to target_human, except for id and parentId (target assigns new ids)
insert into target_human(name,sync_id)
select name, sync_id
from source_human;
-- repoint parentId in target_human at the new id of the same parent, matched via sync_id
with conv (sync_parent, sync_child, new_parent) as
( select h2p.sync_id sync_parent, h2c.sync_id sync_child, h1.id new_parent
from source_human h2c
join source_human h2p on h2c.parentid = h2p.id
join target_human h1 on h1.sync_id = h2p.sync_id
)
update target_human h1
set parentid = c.new_parent
from conv c
where h1.sync_id = c.sync_child;
-----------------------------------------------------------------------------------------------
alter table target_pet alter column ownerId drop not null;
insert into target_pet(name, sync_id)
select name, sync_id
from source_pet ;
with conv ( sync_pet,new_owner) as
( select p2.sync_id, h1.id
from source_pet p2
join source_human h2 on p2.ownerid = h2.id
join target_human h1 on h2.sync_id = h1.sync_id
)
update target_pet p1
set ownerid = c.new_owner
from conv c
where p1.sync_id = c.sync_pet;
alter table target_pet alter column ownerId set not null;
See demo. Now reverse the source and target table definitions to complete the other side of the sync. You can then drop the uuid columns if so desired, but you may want to keep them: if you have gotten the databases out of sync once, you will do so again. You could even go a step further and make the uuid your PK/FK and then just copy the data; the keys would remain correct, but that might involve updating the apps to the revised DB structure. This does not address communication across databases, but I assume you already have that handled. You will need to repeat this for each table set; perhaps you can write a script to generate the statements. Further, I would guess this has fewer gotchas and is less time consuming than rolling your own. It is basically 5 queries per table set, and to clean up the current mess, even 500 queries is not that much.

ERROR: cannot create a unique index without the column "date_time" (used in partitioning)

I just started using TimescaleDB with PostgreSQL. I have a database named storage_db which contains a table named day_ahead_prices.
After installing TimescaleDB, I was following "Migrate from the same PostgreSQL database" to migrate my storage_db into a TimescaleDB database.
When I did (indexes included):
CREATE TABLE tsdb_day_ahead_prices (LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
select create_hypertable('tsdb_day_ahead_prices', 'date_time');
It gave me the following error:
ERROR: cannot create a unique index without the column "date_time" (used in partitioning)
But when I did (indexed excluded):
CREATE TABLE tsdb_day_ahead_prices (LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);
select create_hypertable('tsdb_day_ahead_prices', 'date_time');
It was successful. Following which, I did
select create_hypertable('tsdb_day_ahead_prices', 'date_time');
and it gave me the following output:
create_hypertable
------------------------------------
(3,public,tsdb_day_ahead_prices,t)
(1 row)
I am a bit new to this, so can anyone please explain the difference between the two, and why I was getting an error in the first case?
P.S.:
My day_ahead_prices looks as follows:
id | country_code | values | date_time
----+--------------+---------+----------------------------
1 | LU | 100.503 | 2020-04-11 14:04:30.461605
2 | LU | 100.503 | 2020-04-11 14:18:39.600574
3 | DE | 106.68 | 2020-04-11 15:59:10.223965
Edit 1:
I created the day_ahead_prices table in Python using Flask and Flask-SQLAlchemy; the code is:
class day_ahead_prices(db.Model):
    __tablename__ = "day_ahead_prices"
    id = db.Column(db.Integer, primary_key=True)
    country_code = db.Column(avail_cc_enum, nullable=False)
    values = db.Column(db.Float(precision=2), nullable=False)
    date_time = db.Column(db.DateTime, default=datetime.now(tz=tz), nullable=False)

    def __init__(self, country_code, values):
        self.country_code = country_code
        self.values = values
When executing CREATE TABLE tsdb_day_ahead_prices (LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES); you're telling the database to create the tsdb_day_ahead_prices table using day_ahead_prices as a template (same columns, same types for those columns), and also to take the default values, constraints, and indexes defined on the original table and create the same on the new table.
Then you are executing the timescaledb command that makes the tsdb_day_ahead_prices table
a hypertable. A hypertable is an abstraction that hides away the partitioning of the physical
table. (https://www.timescale.com/products/how-it-works). You are telling
TimescaleDB to make the tsdb_day_ahead_prices a hypertable using the date_time column as a partitioning key.
When creating hypertables, one constraint that TimescaleDB imposes is that the partitioning column (in your case date_time) must be included in any unique indexes (and primary keys) on that table. (https://docs.timescale.com/latest/using-timescaledb/schema-management#indexing-best-practices)
The first error you get, cannot create a unique index without the column "date_time", is exactly because of this: you copied the primary key definition on the id column, so the primary key is preventing the table from becoming a hypertable.
The second time, you created the tsdb_day_ahead_prices table but didn't copy the indexes from the original table, so the primary key (which is really a unique index) was not defined, and the creation of the hypertable was successful.
The output you get from the create_hypertable function tells you that you have a new hypertable, in the public schema, the name of the hypertable, and the internal id that timescaledb uses for it.
So now you can use tsdb_day_ahead_prices as normal, and TimescaleDB underneath will make sure the data goes into the proper partitions/chunks.
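If you do want a unique key on such a table, a common workaround (a sketch following the indexing guideline above, not something shown in the thread) is to include the partitioning column in the key:
CREATE TABLE tsdb_day_ahead_prices (
    LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES,
    PRIMARY KEY (id, date_time)
);
select create_hypertable('tsdb_day_ahead_prices', 'date_time');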
Does the id need to be unique for this table?
If you're going to be keeping time-series data
then each row may not really be unique for each id, but may be uniquely identified by the id at a given time.
You can create a separate table for the items that you're identifying
items(id PRIMARY KEY, country_code) and have the hypertable be
day_ahead_prices(time, value, item_id REFERENCES items(id))
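In actual DDL, that suggestion might look like the following sketch (column types are assumptions):
CREATE TABLE items (
    id integer PRIMARY KEY,
    country_code text NOT NULL
);
CREATE TABLE day_ahead_prices (
    time timestamptz NOT NULL,
    value numeric NOT NULL,
    item_id integer NOT NULL REFERENCES items(id)
);
select create_hypertable('day_ahead_prices', 'time');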

Postgres check if related table has an entry if field is True

I have a table with a boolean column automated. When that field is set to TRUE, then another table needs to have an entry referring to that row.
Table A
id | automated
---------------------
1 | False
2 | True
3 | False
Table B
id | FK-TableA | Value
-------------------------------
2 | 2 | X
So whenever a new entry gets inserted into Table A with automated set to TRUE, there also has to be a row inserted (or already present) in Table B with a reference to it.
It seems an unnatural flow to me; with the constraint you are stating, the natural flow would be a TRIGGER on Table B that inserts a record into Table A whenever a new Table B record is inserted.
But I understand this is a simplification of a more elaborate problem, so if you really need this kind of procedure, there is still a question to be answered: what happens when the check is negative? Should there be an exception? Should the record be inserted with FALSE instead of TRUE? Should the record be ignored? There are two options from my point of view:
Create a TRIGGER before INSERT on Table A that updates the table accordingly (create a PROCEDURE that checks whether the referenced row exists and a TRIGGER that executes this procedure); see the sketch below.
Create a RULE on INSERT on Table A that checks that the record exists in Table B and changes the record, or else does nothing.
With a little more background I can help you with Trigger/Rule.
Anyway, take into account that this can be a real performance mistake if this table gets lots of INSERTs; in that case you should go for some offline procedure (not done on the live INSERT) instead.
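For the trigger option, a minimal sketch, assuming the desired behaviour on a failed check is an exception; it uses a deferrable constraint trigger (an assumption on my part) so the Table B row can be inserted in the same transaction, and the names table_a, table_b and table_a_id are placeholders:
create function check_automated() returns trigger as $$
begin
    if new.automated and not exists (
        select 1 from table_b where table_a_id = new.id
    ) then
        raise exception 'automated row % has no entry in Table B', new.id;
    end if;
    return null;  -- return value is ignored for AFTER row triggers
end;
$$ language plpgsql;

create constraint trigger table_a_automated_check
    after insert or update on table_a
    deferrable initially deferred
    for each row execute function check_automated();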
It is ugly and introduces a redundancy into the database, but I cannot think of a better way than this:
Introduce a new column b_id to a.
Add a UNIQUE constraint on ("FK-TableA", id) to b.
Add a foreign key on a so that (id, b_id) REFERENCES b("FK-TableA", id).
Add a CHECK (b_id IS NOT NULL OR NOT automated) constraint on a.
Then you have to point b_id to one of the rows in b that points back to this a row.
To make it perfect, you'd have to add triggers that guarantee that after each modification, the two foreign keys are still consistent.
I told you it was ugly!
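In DDL form, the scheme above might look like this sketch (table names a and b as in the description; making both foreign keys deferrable is my addition, so the circular references can actually be inserted):
create table b (
    id integer primary key,
    a_id integer not null,          -- the "FK-TableA" column
    value text,
    constraint b_a_unique unique (a_id, id)
);

create table a (
    id integer primary key,
    automated boolean not null,
    b_id integer,
    constraint a_needs_b check (b_id is not null or not automated),
    constraint a_b_fk foreign key (id, b_id) references b (a_id, id)
        deferrable initially deferred
);

alter table b
    add constraint b_a_fk foreign key (a_id) references a (id)
        deferrable initially deferred;

-- both rows must arrive in one transaction so the deferred checks pass at commit
begin;
insert into a (id, automated, b_id) values (2, true, 2);
insert into b (id, a_id, value) values (2, 2, 'X');
commit;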