Has anyone heard of or experienced the following phenomenon?
Using PostgreSQL 9.0.5 on Windows.
= table structure =
[parent] - [child] - [grandchild]
I found that a record strangely remained in the [child] table.
This record exists in violation of the foreign key constraint.
These tables store my application's transaction data.
All of the above tables have a numeric PRIMARY KEY.
All of these tables have FOREIGN KEY constraints (between parent and child, and between child and grandchild).
My application updates each record's status as the transaction progresses.
Once all statuses have changed to "normal_end", my app copies these records to archive tables (same structure, same constraints),
and then deletes them once the copy to the archive tables has finished.
The status of the record that remained in the [child] table was NOT "normal_end" but "processing",
but the status of the copied data (same ID) in the archive table was "normal_end".
No error was reported in pg_log.
This seems very strange to me...
I suspect that the deleted data might have come back to life!?
Can deleted data unexpectedly become active again?
There should never be data that violates a foreign key constraint (except during a transaction with deferred constraints).
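For illustration, a deferred constraint is the only case where a violating row can legitimately be visible, and only until the transaction commits. A minimal sketch with made-up table names:

    -- hypothetical tables mirroring the parent/child layout
    CREATE TABLE parent (id numeric PRIMARY KEY);
    CREATE TABLE child  (id numeric PRIMARY KEY,
                         parent_id numeric REFERENCES parent (id)
                                   DEFERRABLE INITIALLY DEFERRED);

    BEGIN;
    INSERT INTO child VALUES (1, 42);  -- no matching parent yet; allowed because the FK check is deferred
    INSERT INTO parent VALUES (42);    -- the violation is resolved before the check runs
    COMMIT;                            -- the FK is checked here; an unresolved violation would abort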
A deleted row should stay deleted once the transaction is committed; that's one of the requirements of ACID. However, the correct working of PostgreSQL relies on the correct functioning of your OS and hardware. When PostgreSQL fsyncs a file, it should really be written to disk or to a non-volatile cache. Unfortunately, it sometimes happens that disks or controllers tell the system a write has finished while it is actually still sitting in a volatile cache. If you have a RAID controller with RAM but no battery, make sure the controller's cache is set to write-through.
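If you want to rule out the configuration side first, these settings are worth checking; fsync and full_page_writes in particular should stay on for crash safety (the controller's cache mode itself has to be checked outside the database):

    SHOW fsync;
    SHOW full_page_writes;
    SHOW synchronous_commit;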
Personally, I have seen PostgreSQL end up with incorrect data once: a duplicate row (same primary key) after a crash on a Windows XP machine (most likely running 9.0.x). Windows XP machines are not very reliable for running PostgreSQL; they often produce strange network errors.
Related
I am still quite new to Postgres logical replication and am having trouble replicating a large data set.
In our development setup there is one publisher with 5 subscribers, all tables in one schema are being replicated. All servers are running pg 13.1, the subscribers are basically all clones of the same system.
Once a month or so we have to clear down most of the tables being replicated and re-populate them from a legacy system, a process that starts with deleting a chunk of data from the table (as defined by part of the key) and then copying that chunk of data across for each table. The size of the data is around 90GB all told.
Every time we do this, one or more of the subscribers gets stuck (not always the same ones). The logs on the publisher show "could not send data to client: Connection reset by peer" for the stuck subscriber(s).
The logs on the subscriber(s) show duplicate key errors ("ERROR: duplicate key value violates unique constraint"), but judging from the reported key it is a different row on each server (though often the same table).
Deleting the offending row on the subscriber just makes it fall over at the next row (so it's clearly more than just one row).
This makes no sense to me, nothing else is writing to the tables on these subscribers and I can't really picture a situation where replication would be trying to write the same data twice.
So far the only solution I have is to drop the bad subscriber(s) and restart replication on them.
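For reference, the reset currently amounts to roughly this on the affected subscriber (subscription, publication, table and connection names are placeholders):

    -- drop the stuck subscription (this normally also drops its replication slot on the publisher)
    DROP SUBSCRIPTION the_sub;

    -- empty the local copies so the fresh initial sync does not collide with old rows
    TRUNCATE my_schema.table1, my_schema.table2;

    -- recreate the subscription; copy_data = true re-runs the initial table copy
    CREATE SUBSCRIPTION the_sub
        CONNECTION 'host=publisher dbname=mydb user=repl'
        PUBLICATION the_pub
        WITH (copy_data = true);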
Does anyone have any advice or ideas as to why this happens or how to fix it?
I'm performing schema changes on a large database, correcting ancient design mistakes (expanding primary keys and their corresponding foreign keys from INTEGER to BIGINT). The basic process is:
1. Shut down our application.
2. Drop DB triggers and constraints.
3. Perform the changes (ALTER TABLE foo ALTER COLUMN bar TYPE BIGINT for each table and primary/foreign key).
4. Recreate the triggers and constraints (NOT VALID).
5. Restart the application.
6. Validate the constraints (ALTER TABLE foo VALIDATE CONSTRAINT bar for each constraint).
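For a single parent/child pair, steps 2-4 and 6 look roughly like this (table and constraint names are placeholders):

    -- step 2: drop the old foreign key
    ALTER TABLE child DROP CONSTRAINT child_parent_id_fkey;

    -- step 3: widen the key columns
    ALTER TABLE parent ALTER COLUMN id        TYPE BIGINT;
    ALTER TABLE child  ALTER COLUMN parent_id TYPE BIGINT;

    -- step 4: recreate the constraint without scanning the table yet
    ALTER TABLE child ADD CONSTRAINT child_parent_id_fkey
        FOREIGN KEY (parent_id) REFERENCES parent (id) NOT VALID;

    -- step 6: validate later, while the application is already back up
    ALTER TABLE child VALIDATE CONSTRAINT child_parent_id_fkey;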
Note:
Our Postgres DB (version 11.7) and our application are hosted on Heroku.
Some of our tables are quite large (millions of rows, the largest being ~1.2B rows).
The problem is in the final validation step. When conditions are just "right", a single ALTER TABLE foo VALIDATE CONSTRAINT bar can create database writes at a pace that exceeds the WAL's write capacity. This leads to varying degrees of unhappiness up to crashing the DB server. (My understanding is that Heroku uses a bespoke WAL plug-in to implement their "continuous backups" and "db follower" features. I've tried contacting Heroku support on this issue -- their response was less than helpful, even though we're on an enterprise-level support contract).
My question: Is there any downside to leaving these constraints in the NOT VALID state?
Related: Does anyone know why validating a constraint generates so much write activity?
There are downsides to leaving a constraint as NOT VALID. Firstly, you may have data that doesn't meet the constraint requirements, meaning you have data that shouldn't be in your table. But the query planner also won't be able to use the constraint predicate to rule out rows that would violate it.
As for all the WAL activity, I can only imagine that it has to set a flag on those rows to mark them as valid. That should produce relatively little WAL compared to actual row updates, but I suppose if enough rows are being validated it will still generate a lot of WAL. It shouldn't usually cause a crash unless storage becomes full.
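If you do decide to leave some constraints unvalidated, you can at least keep an inventory of them, since pg_constraint records the validation state:

    -- list every constraint still marked NOT VALID
    SELECT conrelid::regclass AS table_name,
           conname            AS constraint_name,
           contype            AS constraint_type
    FROM   pg_constraint
    WHERE  NOT convalidated;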
I need some advice about the following scenario.
I have multiple embedded systems, each running a PostgreSQL database, at different places, and we have a server running CentOS at our premises.
Each system runs at a remote location and has multiple tables in its database. These tables have the same names as the server's tables, but each system's table names differ from the other systems', e.g.:
system 1 has tables:
sys1_table1
sys1_table2
system 2 has tables:
sys2_table1
sys2_table2
I want to update the tables sys1_table1, sys1_table2, sys2_table1 and sys2_table2 on the server on every insert done on system 1 and system 2.
One solution is to write a trigger on each table that runs on every insert on both systems' tables and inserts the same data into the server's tables. This trigger would also delete the records on the systems after inserting the data on the server. The problem with this solution is that if the connection to the server cannot be established due to a network issue, the trigger will not execute and the insert will be lost. I have looked at the following solution for this:
Trigger to insert rows in remote database after deletion
The second solution is to replicate the tables from system 1 and system 2 to the server's tables. The problem with replication is that if we delete data on the systems, it also deletes the records on the server. I could add an alternative trigger on the server's tables that writes to a duplicate table, so the replicated table can be emptied without affecting the data, but that would make for a very long list of tables if we have more than 200 systems.
The third solution is to create foreign tables using postgres_fdw or dblink and update the data in the server's tables directly. But won't this affect the data on the server when we delete data from a system's table? And what happens if there is no connectivity to the server?
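For reference, the foreign-table setup I have in mind looks roughly like this (server address, credentials and column list are made up):

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER central_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'server.example.com', dbname 'central', port '5432');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER central_server
        OPTIONS (user 'sys1_writer', password 'secret');

    -- local name for the table that actually lives on the server;
    -- inserts into it go over the network
    CREATE FOREIGN TABLE server_sys1_table1 (
        id   bigint,
        data text
    )
    SERVER central_server
    OPTIONS (schema_name 'public', table_name 'sys1_table1');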
The fourth solution is to write an application in Python on each system that connects to the server's database and writes the data in real time; if there is no connectivity to the server, it stores the data in sys1_table1 or sys2_table2 or whatever table the data belongs to, and after reconnecting, the code sends the table data to the server's tables.
Which option would be best for this scenario? I like the trigger solution best, but is there any way to avoid data loss in case of disconnection from the server?
I'd go with the fourth solution, or perhaps with the third, as long as it is triggered from outside the database. That way you can easily survive connection loss.
The first solution with triggers has the problems you already detected. It is also a bad idea to start potentially long operations, like data replication across a network of uncertain quality, inside a database transaction. Long transactions mean long locks and inefficient autovacuum.
The second solution may actually also be an option, if you have a recent PostgreSQL version that supports logical replication. You can use a publication WITH (publish = 'insert,update'), so that DELETE and TRUNCATE are not replicated. Replication can cope with lost connectivity (for a while), but it is not an option if you want the data at the source to be deleted after it has been replicated.
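A sketch of that setup (names and the connection string are placeholders; the publisher needs wal_level = logical). With publish limited to insert and update, a DELETE on the source table is simply not sent to the subscriber:

    -- on the embedded system (publisher)
    CREATE PUBLICATION sys1_pub
        FOR TABLE sys1_table1, sys1_table2
        WITH (publish = 'insert,update');

    -- on the central server (subscriber), where tables of the same names exist
    CREATE SUBSCRIPTION sys1_sub
        CONNECTION 'host=system1.example.com dbname=sysdb user=repl'
        PUBLICATION sys1_pub;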
As we know, the details of every job are stored in the RDBMS in the table Hsp_Job_Status. Unfortunately, this table gets truncated each time we restart services. As per business requirements, we need to keep a record of the BRs launched by users and their details, so we developed a workaround and created a trigger on the table that inserts each new row/update into a backup table. This was working fine until now.
Recently, after a restart, the values of Job_id (i.e. the primary key) are no longer appearing in order. The series restarted from an earlier number: it had been running in the 106XX range, but after the restart the numbering started again from 100XX. Since Hsp_Job_Status was truncated during the restart, there was no duplicate primary key issue in that table, but it did create duplicate values in the backup table, and this has caused issues with the backup table and the procedure we use.
Usually the series is continuous even after the table is truncated, so maybe something went wrong during the restart. Can you please suggest what I should check and do to resolve this issue?
Thanks in advance.
Partial answer: the simple solution is to prepend an instance prefix to the Job_Id and increment the active instance on service startup. The instance table can then include details from startup/shutdown events to help drive SLA metrics. Unfortunately, I can't tell you exactly how to implement such a scheme, since it's been many years since I've spoken any SQL dialect.
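To make the idea concrete, here is a rough and entirely hypothetical sketch of what the backup side could look like (table and column names are invented, and the real Hsp_Job_Status has more columns):

    -- one row per service start; the highest instance_id is the active one
    CREATE TABLE job_instance (
        instance_id  integer PRIMARY KEY,
        started_at   timestamp NOT NULL
    );

    -- backup keyed by (instance, job) so a Job_Id reused after a restart no longer collides
    CREATE TABLE hsp_job_status_backup (
        instance_id  integer NOT NULL REFERENCES job_instance (instance_id),
        job_id       numeric NOT NULL,
        job_details  varchar(4000),
        PRIMARY KEY (instance_id, job_id)
    );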
I have a problem that has been going on for a long time. I use Slony to replicate a database from the master to a slave, and from that slave to three other backup servers. Once every 2-3 weeks there is a key duplication problem that happens only on one specific table (big, but not the biggest in the database).
It started to occur about a year ago on Postgres 8.4 and Slony 1, and we switched to 2.0.1. Later we upgraded to 2.0.4, and then successfully upgraded Slony to 2.1.3, which is our current version. We started fresh replication on the same machines and everything was going well until today, when we got the same duplicate key error on the same table (with different keys every time, of course).
The way to clean it up is simply to delete the invalid key on the slaves (it spreads across all nodes) and everything works again. The data is not corrupted, but the source of the problem remains unknown.
Searching on Google I found nothing related to this problem (we did not use TRUNCATE on any table, and we did not change the structure of the table).
Any ideas what can be done about it?
When this problem occurred in our setup, it turned out that the schema of the master database was older than the slaves' and didn't have the UNIQUE constraint on this particular column. So, my advice would be (see the sketch after this list):
make sure the master table does in fact have the constraint
if not:
clean the table
add the constraint
else:
revoke write privileges from all clients except Slony for the replicated tables.
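A sketch of those steps on the master (table, column and constraint names are placeholders):

    -- does the master actually have the unique constraint / index?
    SELECT indexname, indexdef
    FROM   pg_indexes
    WHERE  tablename = 'replicated_table';

    -- if not: remove duplicates (keeping one arbitrary row per key) ...
    DELETE FROM replicated_table a
    USING  replicated_table b
    WHERE  a.ctid > b.ctid
      AND  a.key_col = b.key_col;

    -- ... then add the constraint
    ALTER TABLE replicated_table
        ADD CONSTRAINT replicated_table_key_col_key UNIQUE (key_col);

    -- otherwise: make sure only the slony role can write to the replicated table
    -- (repeat for any role that was granted write access directly)
    REVOKE INSERT, UPDATE, DELETE, TRUNCATE ON replicated_table FROM PUBLIC;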
As Craig has said, usually this is caused by a write transaction on a replica, so the first thing to do is to verify permissions. If it keeps happening, you can start logging connections on the readers of the replicas and keep those logs around, so that when the issue happens you can track down where the bad tuple came from. This can generate a LOT of logs, however, so you probably want to see how far you can narrow things down first. You presumably know which replica this starts on, so you can go from there.
One particular area of concern would be a user-defined function that writes. A casual observer might not spot that in the query, and neither would a connection pooler.
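If you go the logging route, the relevant settings are standard postgresql.conf parameters on the replicas (reload afterwards with "pg_ctl reload" or SELECT pg_reload_conf()); the prefix below is just one possible choice:

    log_connections = on
    log_disconnections = on
    log_line_prefix = '%m [%p] %u@%d %r '   # timestamp, pid, user@database, client host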