Debezium does not capture 'cascade delete' - debezium

Is there any way to capture 'on delete cascade' events?
I have read in the Debezium faq (at the bottom of the page):
== Why don't I see DELETE events in some cases?
This may be caused by the usage of CASCADE DELETE statements. In
this case the deletion events generated by the database
https://dev.mysql.com/doc/refman/5.7/en/innodb-and-mysql-replication.html [are
not part of the binlog] and thus cannot be captured by Debezium.
I appreciate a similar behaviour with 'ON UPDATE CASCADE'
If that events are not part of the binlog I understand that Debezium cannot capture it but, Is there any easy alternative way to do it? In a database this is very important.

The only solution I can think about is to replace cascade operations with triggers that will execute the delete/update as needed.

Related

PostgreSQL logical replication - ignore pre-existing data

Imagine dropping a subscription and recreating it from scratch. Is it possible to ignore existing data during the first synchronization?
Creating a subscription with (copy_data=false) is not an option because I do want to copy data, I just don't want to copy already existing data.
Example: There is a users table and a corresponding publication on the master. This table has 1 million rows and every minute a new row is added. Then we drop the subscription for a day.
If we recreate the subscription with (copy_data=true), replication will not start due to a conflict with already existing data. If we specify (copy_data=false), 1440 new rows will be missing. How can we synchronize the publisher and the subscriber properly?
You cannot do that, because PostgreSQL has no way of telling when the data were added.
You'd have to reconcile the tables by hand (or INSERT ... ON CONFLICT DO NOTHING).
Unfortunately PostgreSQL does not support nice skip options for conflicts yet, but I believe it will be enhanced in the feature.
Based on #Laurenz Albe answer which recommends the use of the statement:
INSERT ... ON CONFLICT DO NOTHING.
I believe that it would be better to use the following command which also will take care any possible updates on your data before you start the subscription again:
INSERT ... ON CONFLICT UPDATE SET...
Finally I have to say that both are dirty solutions as during the execution of the above statement and the creation of the subscription, new lines may have been arrived which will result in losing them until you perform again the custom sync.
I have seen some other suggested solutions using the LSN number from the Postgresql log file...
For me maybe is elegant and safe to delete all the data from the destination table and create the replication again!

Postgresql replication without DELETE statement

We have a requirement that says we should have a copy of all the items that were in our system at one point. The most simple way to explain it would be replication but ignoring the delete statement (INSERT and UPDATE are ok)
Is this possible ? or maybe the better question would be what is the best approach to tackle this kind of problem?
Make a copy/replica of current database and use triggers via dblink from current database to the replica. Use after insert and after update trigger to insert and update data in replica.
So whenever a row insertion/updation take place in current database it will directly reflect to replica.
I'm not sure that I understand the question completely, but I'll try to help:
First (opposite to #Sunit) - I suggest avoiding triggers. Triggers are introducing additional overhead and impacting performance.
The solution I would use (and I'm actually using in few of my projects with similar demands) - don't use DELETE at all. Instead you can add bit (boolean) column called "Deleted", set its default value to 0 (false), and instead of deleting the row you update this field to 1 (true). You'll also need to change your other queries (SELECT) to include something like "WHERE Deleted = 0".
Another option is to continue using DELETE as usual, and to allow deleting records from both primary and replica, but configure WAL archiving, and store WAL archives in some shared directory. This will allow you to moment-in-time recovery, meaning that you'll be able to restore another PostgreSQL instance to state of your cluster in any moment in time (i.e. before the deletion). This way you'll have a trace of deleted records, but pretty complicated procedure to reach the records. Depending on how often deleted records will be checked in the future (maybe they are not checked at all, but simply kept for just-in-case tracking) this approach my also help.

CREATE SCHEMA IF NOT EXISTS raises duplicate key error

To give some context, the command is issued inside a task, and many task might issue the same command from multiple workers at the same time.
Each tasks tries to create a postgres schema. I often get the following error:
IntegrityError: (IntegrityError) duplicate key value violates unique constraint "pg_namespace_nspname_index"
DETAIL: Key (nspname)=(9621584361) already exists.
'CREATE SCHEMA IF NOT EXISTS "9621584361"'
Postgres version is PostgreSQL 9.4rc1.
Is it a bug in Postgres?
This is a bit of a wart in the implementation of IF NOT EXISTS for tables and schemas. Basically, they're an upsert attempt, and PostgreSQL doesn't handle the race conditions cleanly. It's safe, but ugly.
If the schema is being concurrently created in another session but isn't yet committed, then it both exists and does not exist, depending on who you are and how you look. It's not possible for other transactions to "see" the new schema in the system catalogs because it's uncommitted, so it's entry in pg_namespace is not visible to other transactions. So CREATE SCHEMA / CREATE TABLE tries to create it because, as far as it's concerned, the object doesn't exist.
However, that inserts a row into a table with a unique constraint. Unique constraints must be able to see uncommitted rows in order to function. So the insert blocks (stops) until the first transaction that did the CREATE either commits or rolls back. If it commits, the second transaction aborts, because it tried to insert a row that violates a unique constraint. CREATE SCHEMA isn't smart enough to catch this case and re-try.
To properly fix this PostgreSQL would probably need predicate locking, where it could lock the potential for a row. This might get added as part of the current work going on for implementing UPSERT.
For these particular commands, PostgreSQL could probably do a dirty read of the system catalogs, where it can see uncommitted changes. Then it could wait for the uncommitted transaction to commit or roll back, re-do the dirty read to see if someone else is waiting, and retry. But this would have a race condition where someone else might create the schema between when you do the read to check for it and when you try to create it.
So the IF NOT EXISTS variants would have to:
Check to see if the schema exists; if it does, finish without doing anything.
Attempt to create the table
If creation fails due to a unique constraint error, retry at the start
If table creation succeeds, finish
As far as I know nobody's implemented that, or they tried and it wasn't accepted. There would be possible issues with transaction ID burn rate, etc, with this approach.
I think this is a bug of sorts, but it's a "yeah, we know" kind of bug, not a "we'll get right on fixing that" kind of bug. Feel free to post to pgsql-bugs about it; at the very least the documentation should mention this caveat about IF NOT EXISTS.
I don't recommend doing DDL concurrently like that.
I needed to work around this limitation in an application where schemas are created concurrently. What worked for me was adding
LOCK TABLE pg_catalog.pg_namespace
in the transaction including CREATE SCHEMA. Looks like a dirty and unsafe thing to do, but helped me to solve the problem which occurred only in tests anyway.

How to wait during SELECT that pending INSERT commit?

I'm using PostgreSQL 9.2 in a Windows environment.
I'm in a 2PC (2 phase commit) environment using MSDTC.
I have a client application, that starts a transaction at the SERIALIZABLE isolation level, inserts a new row of data in a table for a specific foreign key value (there is an index on the column), and vote for completion of the transaction (The transaction is PREPARED). The transaction will be COMMITED by the Transaction Coordinator.
Immediatly after that, outside of a transaction, the same client requests all the rows for this same specific foreign key value.
Because there may be a delay before the previous transaction is really commited, the SELECT clause may return a previous snapshot of the data. In fact, it does happen sometimes, and this is problematic. Of course the application may be redesigned but until then, I'm looking for a lock solution. Advisory Lock ?
I already solved the problem while performing UPDATE on specific rows, then using SELECT...FOR SHARE, and it works well. The SELECT waits until the transaction commits and return old and new rows.
Now I'm trying to solve it for INSERT.
SELECT...FOR SHARE does not block and return immediatley.
There is no concurrency issue here as only one client deals with a specific set of rows. I already know about MVCC.
Any help appreciated.
To wait for a not-yet-committed INSERT you'd need to take a predicate lock. There's limited predicate locking in PostgreSQL for the serializable support, but it's not exposed directly to the user.
Simple SERIALIZABLE isolation won't help you here, because SERIALIZABLE only requires that there be an order in which the transactions could've occurred to produce a consistent result. In your case this ordering is SELECT followed by INSERT.
The only option I can think of is to take an ACCESS EXCLUSIVE lock on the table before INSERTing. This will only get released at COMMIT PREPARED or ROLLBACK PREPARED time, and in the mean time any other queries will wait for the lock. You can enforce this via a BEFORE trigger to avoid the need to change the app. You'll probably get the odd deadlock and rollback if you do it that way, though, because INSERT will take a lower lock then you'll attempt lock promotion in the trigger. If possible it's better to run the LOCK TABLE ... IN ACCESS EXCLUSIVE MODE command before the INSERT.
As you've alluded to, this is mostly an application mis-design problem. Expecting to see not-yet-committed rows doesn't really make any sense.

What is supported as transactional in postgres

I am trying find out what is postgres can handle safely inside of transaction, but I cannot find the relavant information in the postgres manual. So far I have found out the following:
UPDATE, INSERT and DELTE are fully supported inside transactions and rolled back when the transaction is not finished
DROP TABLE is not handled safely inside a transaction, and is undone with a CREATE TABLE, thus recreates the dropped table but does not repopulate it
CREATE TABLE is also not truly transactionized and is instead undone with a corresponding DROP TABLE
Is this correct? Also I could not find any hints as to the handling of ALTER TABLE and TRUNCATE. In what way are those handled and are they safe inside transactions? Is there a difference of the handling between different types of transactions and different versions of postgres?
DROP TABLE is transactional. To undo this, you need to issue a ROLLBACK not a CREATE TABLE. The same goes for CREATE TABLE (which is also undone using ROLLBACK).
ROLLBACK is always the only correct way to undo a transaction - that includes ALTER TABLE and TRUNCATE.
The only thing that is never transactional in Postgres are the numbers generated by a sequence (CREATE/ALTER/DROP SEQUENCE themselves are transactional though).
Best I'm aware all of these commands are transaction aware, except for TRUNCATE ... RESTART IDENTITY (and even that one is transactional since 9.1.)
See the manual on concurrency control and transaction-related commands.