Slony "duplicate key value violates unique constraint" error - postgresql

I have a problem that has been going on for a long time. I use Slony to replicate a database from the master to a slave, and from that slave to three other backup servers. Once every 2-3 weeks there is a duplicate key problem that happens only on one specific table (big, but not the biggest in the database).
It started to occur about a year ago on Postgres 8.4 and Slony 1, and we switched to 2.0.1. Later we upgraded to 2.0.4, and then successfully upgraded Slony to 2.1.3, which is our current version. We started fresh replication on the same computers and it all went well until today. We got the same duplicate key error on the same table (with different keys every time, of course).
The way to clean it up is just to delete the invalid key on the slaves (it spreads across all nodes), and everything works again. Data is not corrupted. But the source of the problem remains unsolved.
On Google I found nothing related to this problem (we did not use TRUNCATE on any table, and we did not change the structure of the table).
Any ideas what can be done about it?

When this problem occurred in our setup, it turned out that the schema of the master database was older than the slaves' and didn't have the UNIQUE constraint for this particular column. So, my advice would be:
make sure the master table does in fact have the constraint
if not:
    clean the table
    add the constraint
else:
    revoke write privileges from all clients except slony for the replicated tables.
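
The checklist above could be sketched in SQL roughly as follows. The table public.mytable, the column some_column, the constraint name and the role app_user are placeholders that do not come from the original setup, so substitute your own. Under Slony, schema changes to replicated tables are normally applied through slonik's EXECUTE SCRIPT rather than run directly, so treat step 3 as the statement that would go into such a script.

```sql
-- 1. Run on the master AND on a slave, then compare the output.
--    pg_indexes lists unique indexes whether they were created as
--    constraints or with CREATE UNIQUE INDEX.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename  = 'mytable';

-- 2. "Clean the table": find values that would violate the constraint.
SELECT some_column, count(*) AS copies
FROM public.mytable
GROUP BY some_column
HAVING count(*) > 1;

-- 3. "Add the constraint" once the duplicates are gone
--    (hypothetical constraint name).
ALTER TABLE public.mytable
    ADD CONSTRAINT mytable_some_column_key UNIQUE (some_column);

-- 4. On the slaves: take write privileges away from ordinary clients
--    (hypothetical role app_user), leaving only the Slony user able to write.
REVOKE INSERT, UPDATE, DELETE, TRUNCATE ON public.mytable FROM app_user;
```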

As Craig has said, this is usually a write transaction to a replica. So the first thing to do is to verify permissions. If this keeps happening, what you can do is start logging connections of the readers of the replicas and keep the logs around, so that when the issue happens you can track down where the bad tuple came from. This can generate a LOT of logs, however, so you probably want to see to what extent you can narrow this down first. You presumably know which replica this is starting on, so you can go from there.
One particular area of concern is what happens if you have a user-defined function which writes. A casual observer might not spot that in the query, nor might a connection pooler.
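
A minimal sketch of the logging and the function check described above, assuming a hypothetical role report_reader that the read-only clients of the replica connect as. The connection logging itself is a server setting, so it appears only as a comment. The regular expression in the second query is a crude heuristic that will produce false positives, not a reliable detector of writing functions.

```sql
-- In postgresql.conf on the replica (then reload the server):
--   log_connections = on
--   log_line_prefix = '%m [%p] %u@%d %h '   -- time, pid, user@db, client host

-- Log every top-level statement issued by one suspect role only, to keep
-- the log volume down. Must be run as a superuser; applies to new sessions.
ALTER ROLE report_reader SET log_statement = 'all';

-- List user-defined volatile functions whose source mentions a write
-- keyword. A "SELECT some_function()" from a reader can still write
-- through any of these.
SELECT n.nspname AS schema_name,
       p.proname AS function_name
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND p.provolatile = 'v'
  AND p.prosrc ~* '(insert|update|delete|truncate)'
ORDER BY 1, 2;
```

Once the culprit is found, the per-role setting can be removed again with ALTER ROLE report_reader RESET log_statement.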

Related

Postgresql logical replication, duplicate key errors on subscriber

I am still quite new to postgres logical replication and am having trouble when replicating a large data set.
In our development setup there is one publisher with 5 subscribers, all tables in one schema are being replicated. All servers are running pg 13.1, the subscribers are basically all clones of the same system.
Once a month or so we have to clear down most of the tables being replicated and re-populate them from a legacy system, a process that starts with deleting a chunk of data from the table (as defined by part of the key) and then copying that chunk of data across for each table. The size of the data is around 90GB all told.
Every time we do this, one or more of the subscribers will get stuck (not always the same ones). For the stuck subscriber(s), the logs on the publisher show "could not send data to client: Connection reset by peer".
Looking at the logs on the subscriber(s) it shows duplicate key errors "ERROR: duplicate key value violates unique constraint" but, from the key it shows, it will be a different row on each server (though often the same table).
Deleting the offending row on the subscriber simply makes it then fall over at the next row (so it's obviously more than just one row).
This makes no sense to me: nothing else is writing to the tables on these subscribers, and I can't really picture a situation where replication would be trying to write the same data twice.
So far the only solution I have is to drop the bad subscriber(s) and restart replication on them.
Does anyone have any advice or ideas as to why this happens or how to fix it?

How to see changes in a postgresql database

My PostgreSQL database is updated each night.
At the end of each nightly update, I need to know what data changed.
The update process is complex, takes a couple of hours and requires dozens of scripts, so I don't know whether that affects how I could see what data has changed.
The database is around 1 TB in size, so any method that requires starting a temporary database may be very slow.
The database is an AWS instance (RDS). I have automated backups enabled (these are different to RDS snapshots which are user initiated). Is it possible to see the difference between two RDS automated backups?
I do not know whether it is possible to see the difference between RDS snapshots. But in the past we tested several solutions for a similar problem. Maybe you can take some inspiration from them.
The obvious solution is, of course, an auditing system. This way you can see relatively simply what was changed, down to column values, depending on the granularity of your auditing system. Of course, there is an impact on your application due to the auditing triggers and the queries into the audit tables.
Another possibility: for tables with primary keys, you can store the primary key values together with the hidden system columns 'xmin' and 'ctid' (https://www.postgresql.org/docs/current/static/ddl-system-columns.html) for each row before the update, and compare them with the values after the update. But this way you can only identify changed / inserted / deleted rows, not which columns changed.
You can set up a streaming replica with replication slots (and, to be on the safe side, WAL archiving as well). Then stop replication on the replica before the updates, and compare the data after the updates using dblink selects. But these queries can be very heavy.
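
The xmin/ctid comparison could look roughly like this. It is a sketch for a single table, assuming a hypothetical table mytable with an integer primary key id. The snapshot table name snapshot_before is also made up. The system columns are cast to text and aliased because a user table cannot have columns named xmin or ctid.

```sql
-- Before the nightly update: remember key, xmin and ctid of every row.
CREATE TABLE snapshot_before AS
SELECT id,
       xmin::text AS row_xmin,
       ctid::text AS row_ctid
FROM mytable;

-- After the nightly update: classify every row that differs.
SELECT COALESCE(b.id, a.id) AS id,
       CASE
           WHEN b.id IS NULL THEN 'inserted'
           WHEN a.id IS NULL THEN 'deleted'
           ELSE 'changed'
       END AS change_type
FROM snapshot_before b
FULL OUTER JOIN (
    SELECT id,
           xmin::text AS row_xmin,
           ctid::text AS row_ctid
    FROM mytable
) a ON a.id = b.id
WHERE b.id IS NULL
   OR a.id IS NULL
   OR a.row_xmin <> b.row_xmin
   OR a.row_ctid <> b.row_ctid;

-- Clean up until the next run.
DROP TABLE snapshot_before;
```

A row that was updated to the same values it already had still gets a new xmin and ctid, so it shows up as 'changed'. Operations that rewrite the table (for example VACUUM FULL) move rows to new ctids as well.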

Stop BDR from replicating DROP TABLE or CREATE TABLE

I have two databases with tables that I want to sync. I don't want to sync any other table. I'm using Postgres-BDR to do that.
Those tables are part of replication set common. There are some circumstances where other tables share a name across nodes (but are NOT in common), and a node will call DROP TABLE and then CREATE TABLE. Even though those tables aren't part of the common replication set, these commands are still replicated to the other nodes, causing the other node to lose all of its data in its table and then create an empty table.
How can I stop this? I only want commands that affect common to be replicated to the other nodes.
Nevermind, I found it. It's available with bdr.skip_ddl_replication.
I just put bdr.skip_ddl_replication = on in postgresql.conf, restarted the server, and BOOM! Works like a charm.
EDIT
It would be prudent of me to point out that the documentation warns that this option could break database replication if used improperly. But since I'll be VERY tightly controlling the table schema, it shouldn't cause any problems.

How to rollback an update in PostgreSQL

While editing some records in my PostgreSQL database using SQL in the terminal (on Ubuntu Lucid), I made a wrong update.
Instead of -
update mytable set start_time='13:06:00' where id=123;
I typed -
update mytable set start_time='13:06:00';
So, all records now have the same start_time value.
Is there a way to undo this change? There are some 500+ records in the table, and I do not know what the start_time value for each record was.
Is it lost forever?
I'm assuming it was a transaction that's already committed? If so, that's what "commit" means, you can't go back.
Some data may be recoverable if you're lucky. Stop the database NOW.
Here's an answer I wrote on the same topic earlier. I hope it's helpful.
This might be too: Recoved deleted rows in postgresql .
Unless the data is absolutely critical, just restore from backups; it'll be a lot easier and less painful. If you didn't have backups, consider yourself soundly thwacked.
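
For future manual edits, one way to avoid ending up with an already-committed mistake is to run the statement inside an explicit transaction and check the reported row count before committing. This is only a sketch using the table and values from the question. It does nothing for an update that has already been committed. By default psql commits each statement on its own, which is why the explicit BEGIN matters.

```sql
BEGIN;

UPDATE mytable SET start_time = '13:06:00' WHERE id = 123;
-- psql reports the number of affected rows after the statement.
-- If that number is not what you expected (for example hundreds of rows
-- instead of one), undo everything done since BEGIN:
ROLLBACK;

-- Only when the row count looks right, make the change permanent instead:
-- COMMIT;
```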
If you catch the mistake and immediately bring down any applications using the database and take it offline, you can potentially use Point-in-Time Recovery (PITR) to replay your Write Ahead Log (WAL) files up to, but not including, the moment when the errant transaction was made. This would return the database to the state it was in prior, thus effectively 'undoing' that transaction.
As an approach for a production application database it has a number of obvious limitations, but there are circumstances in which PITR may be the best option available, especially when critical data loss has occurred. However, it is of no value if archiving was not already configured before the corruption event.
https://www.postgresql.org/docs/current/static/continuous-archiving.html
Similar capabilities exist with other relational database engines.
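
Since PITR depends on archiving having been configured before the incident, it may be worth checking the current state of a server ahead of time. A minimal sketch using standard settings and, on PostgreSQL 9.4 or later, the pg_stat_archiver view:

```sql
-- Must not be 'minimal', otherwise the WAL lacks what archiving needs.
SHOW wal_level;

-- Must be 'on' (or 'always') for WAL segments to be archived at all.
SHOW archive_mode;

-- The shell command that copies each finished WAL segment away;
-- an empty value means nothing is actually being archived.
SHOW archive_command;

-- PostgreSQL 9.4+: has archiving been succeeding recently?
SELECT archived_count,
       last_archived_wal,
       last_archived_time,
       failed_count
FROM pg_stat_archiver;
```

A working archive alone is not enough: a base backup taken before the bad transaction is also needed as the starting point for the replay.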

How to prevent Write Ahead Logging on just one table in PostgreSQL?

I am considering log-shipping of Write Ahead Logs (WAL) in PostgreSQL to create a warm-standby database. However I have one table in the database that receives a huge amount of INSERT/DELETEs each day, but which I don't care about protecting the data in it. To reduce the amount of WALs produced I was wondering, is there a way to prevent any activity on one table from being recorded in the WALs?
Ran across this old question, which now has a better answer. Postgres 9.1 introduced "Unlogged Tables", which are tables that don't log their DML changes to WAL. See the docs for more info, but at least now there is a solution for this problem.
See Waiting for 9.1 - UNLOGGED tables by depesz, and the 9.1 docs.
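
A short sketch of what this looks like in practice, with a made-up table name and columns. ALTER TABLE ... SET UNLOGGED for an existing table needs PostgreSQL 9.5 or later.

```sql
-- New table whose contents are not written to the WAL (9.1+).
CREATE UNLOGGED TABLE busy_queue (
    id         bigserial   PRIMARY KEY,
    payload    text,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Verify: relpersistence is 'u' for unlogged, 'p' for ordinary tables.
SELECT relname, relpersistence
FROM pg_class
WHERE relname = 'busy_queue';

-- Convert an existing ordinary table instead (9.5+, rewrites the table):
-- ALTER TABLE busy_queue SET UNLOGGED;
```

The trade-off matches the question's "don't care about protecting the data": an unlogged table is emptied after a crash or unclean shutdown, and its contents are not available on a standby.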
Unfortunately, I don't believe there is. The WAL logging operates on the page level, which is much lower than the table level and doesn't even know which page holds data from which table. In fact, the WAL files don't even know which pages belong to which database.
You might consider moving your high activity table to a completely different instance of PostgreSQL. This seems drastic, but I can't think of another way off the top of my head to avoid having that activity show up in your WAL files.
To offer one option in answer to my own question: there are temp tables - "temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below)" - which I think don't generate WALs. Even so, this might not be ideal, as the table creation & design will have to be in the code.
I'd consider memcached for use-cases like this. You can even spread the load over a bunch of cheap machines too.