Implications of leaving a constraint NOT VALID - postgresql

I'm performing schema changes on a large database, correcting ancient design mistakes (expanding primary keys and their corresponding foreign keys from INTEGER to BIGINT). The basic process is (a rough per-key sketch of the SQL follows the list):
1. Shut down our application.
2. Drop DB triggers and constraints.
3. Perform the changes (ALTER TABLE foo ALTER COLUMN bar TYPE BIGINT for each affected table and primary/foreign key column).
4. Recreate the triggers and constraints (NOT VALID).
5. Restart the application.
6. Validate the constraints (ALTER TABLE foo VALIDATE CONSTRAINT bar for each constraint).
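For a single pair of tables, say foo with primary key column bar and a referencing table baz (placeholder names), the per-key SQL looks roughly like this:
-- step 2: drop the dependent foreign key (and any triggers) first
ALTER TABLE baz DROP CONSTRAINT baz_bar_fkey;
-- step 3: widen both sides of the relationship
ALTER TABLE foo ALTER COLUMN bar TYPE BIGINT;
ALTER TABLE baz ALTER COLUMN bar TYPE BIGINT;
-- step 4: recreate the constraint without scanning the existing rows
ALTER TABLE baz ADD CONSTRAINT baz_bar_fkey FOREIGN KEY (bar) REFERENCES foo (bar) NOT VALID;
-- step 6: later, scan the table and mark the constraint as validated
ALTER TABLE baz VALIDATE CONSTRAINT baz_bar_fkey;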
Note:
Our Postgres DB (version 11.7) and our application are hosted on Heroku.
Some of our tables are quite large (millions of rows, the largest being ~1.2B rows).
The problem is in the final validation step. When conditions are just "right", a single ALTER TABLE foo VALIDATE CONSTRAINT bar can create database writes at a pace that exceeds the WAL's write capacity. This leads to varying degrees of unhappiness up to crashing the DB server. (My understanding is that Heroku uses a bespoke WAL plug-in to implement their "continuous backups" and "db follower" features. I've tried contacting Heroku support on this issue -- their response was less than helpful, even though we're on an enterprise-level support contract).
My question: Is there any downside to leaving these constraints in the NOT VALID state?
Related: Does anyone know why validating a constraint generates so much write activity?

There are downsides to leaving a constraint as NOT VALID. Firstly, you may have data that doesn't meet the constraint's requirements, meaning you have rows that shouldn't be in your table. But also, the query planner won't use a constraint it can't trust, so an unvalidated constraint can't be used to rule out rows or otherwise optimize queries based on its predicate.
As for all the WAL activity, my best guess is that the validation scan touches every row and ends up setting hint bits, which dirties pages; with wal_log_hints or data checksums enabled (as is typical on hosted platforms), those dirtied pages get WAL-logged. That should produce a relatively small amount of WAL compared to actual row updates, but with enough rows being validated it still adds up to a lot. It shouldn't usually cause a crash unless storage becomes full.
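On the first question, one practical safeguard against constraints silently staying NOT VALID forever is to check the catalog periodically (convalidated is the flag that VALIDATE CONSTRAINT sets); for example:
SELECT conrelid::regclass AS table_name, conname
FROM pg_constraint
WHERE NOT convalidated;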

Related

When should we use NOT ENFORCED with a foreign key constraint in DB2?

The IBM DB2 documentation says:
To improve the performance of queries, you can add informational
constraints to your tables.
And there is this NOT ENFORCED option we can provide:
ALTER TABLE <name> <constraint attributes> NOT ENFORCED
The explanation is fairly simple:
NOT ENFORCED should only be specified if the table data is
independently known to conform to the constraint. Query results might
be unpredictable if the data does not actually conform to the
constraint.
From what I understand, having, say, a foreign key in a table declared as NOT ENFORCED is effectively the same as not having it at all.
But then what are the real use cases for it, and when should this option be used?
(What is the difference between having a NOT ENFORCED constraint and not having it at all?)
The so-called informational constraints can be used to improve performance. They do this by giving the database extra knowledge about the data: without the informational constraint, Db2 would not know about the relationship between the two tables and their related columns. With it, the SQL query compiler and optimizer can exploit that fact and optimize query execution.
As a consequence, an informational constraint should only be applied when the data really is constrained in the specified way. Db2 does not enforce it; the user (you) is guaranteeing that property of the data. Hence, when it is not true, query results can be wrong, because Db2 assumes that the declared relationships hold.
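A sketch of what that looks like in Db2 (hypothetical table and column names):
ALTER TABLE daily_sales
ADD CONSTRAINT fk_sales_product
FOREIGN KEY (product_id) REFERENCES products (product_id)
NOT ENFORCED
ENABLE QUERY OPTIMIZATION;  -- the optimizer may use the relationship, but Db2 never checks it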

Enforcing Foreign Key Constraint Over Table From pg_dump With --exclude-table-data

I'm currently working on dumping one of our customers' databases in a way that allows us to create new databases from this customer's basic structure, but without bringing along their private data.
So far, I've had success with pg_dump combined with the --exclude-table and --exclude-table-data options, which allowed me to bring only the data I'll effectively need for this task.
However, there are a few tables that mix rows referencing some of the data I left behind with rows referencing data that I had to bring, and this is causing me a few issues during the restore operation. Specifically, when the dump tries to enforce FOREIGN KEY constraints on certain columns of these tables, it fails because some rows have keys with no matching data in the respective foreign table -- because I chose not to bring that table's data!
I know I can log into the database after the dump is complete, delete any rows that reference data that no longer exists, and create the constraint myself, but I'd like to automate the process as much as possible. Is there a way to tell pg_dump or pg_restore (or any other program) not to bring rows from table A if they reference table B and table B's data was excluded from the backup? Or to tell Postgres that I'd like that specific foreign key to be active before importing the table's data?
For reference, I'm working with PostgreSQL 9.2 on a RHEL 7 server.
What if you disable foreign key checking when you restore your database dump, and after that remove the orphaned rows from the referencing table?
By the way, I recommend you fix your database schema afterwards so there is no chance of wrong tuples being inserted into your database.
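A rough sketch of that approach, assuming a hypothetical table orders whose customer_id references customers (id), a data-only restore, and superuser access (pg_restore's --disable-triggers option automates the first and last statements):
ALTER TABLE orders DISABLE TRIGGER ALL;  -- also disables the internal FK triggers
-- ... restore the data for orders here ...
ALTER TABLE orders ENABLE TRIGGER ALL;
-- remove rows whose referenced customer was excluded from the dump
DELETE FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id);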

Size of Foreign key constraint

My googling powers were not strong enough for this one. This is a purely theoretical question.
Let's say I have a huge database with hundreds of tables, and each table has a user column which references the user table.
Now if I would change the user column to have a foreign key constraint, would the increase in database size be noticeable?
If by "change the user column to have a foreign key constraint" you mean something like:
alter table some_table
add constraint fk_some_table_users
foreign key (user_id)
references users (id);
Then the answer is: no, this will in no way change the size of your database (except some additional rows in the system catalogs to store the definition of your constraint).
The constraints will improve the reliability of your data, and in some cases might even help the optimizer to remove unnecessary joins or take other shortcuts based on the constraint information. There is, however, a small performance overhead when inserting or deleting rows, because the constraint needs to be verified. But this small overhead does not outweigh the advantages you gain from having consistent data in your database.
I have never seen an application which claimed to be able to "have that under control" that didn't need data cleaning after having been in production for some time. So best leave this kind of check to the database.
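If you want to convince yourself, a quick before-and-after check (same hypothetical names as above; pg_total_relation_size includes indexes and TOAST data):
SELECT pg_size_pretty(pg_total_relation_size('some_table'));
alter table some_table
add constraint fk_some_table_users
foreign key (user_id)
references users (id);
SELECT pg_size_pretty(pg_total_relation_size('some_table'));
-- both size reports should be identical; only pg_constraint gained a row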

Restore PostgreSQL dump with new primary key values

I've got a problem with a PostgreSQL dump / restore. We have a production application running on PostgreSQL 8.4. I need to create some values in the database in the testing environment and then import just this chunk of data into the production environment. The data is generated by the application, and I need to use this approach because it needs testing before going into production.
Now that I described the environment, here is my problem:
In the testing database, I leave nothing but the data I need to move to the production database. The data is spread across multiple tables linked with foreign keys with multiple levels (like a tree). I then use pg_dump to export the desired tables into binary format.
When I try to import, the database correctly imports the root table entries with new primary key values, but does not import any of the data from the other tables. I believe the problem is that the foreign keys on the child tables no longer match the new primary key values.
Is there a way to achieve such an import which will update all the primary key values of all affected tables in the tree to correct serial (auto increment) values automatically and also update all foreign keys according to these new primary key values?
I have an idea of how to do this with the assistance of a programming language while connected to both databases, but that would be very problematic for me to achieve since I don't have direct access to the customer's production server.
Thanks in advance!
That one seems to me like a complex migration issue. You can create PL/pgSQL migration scripts that perform the inserts and use RETURNING to capture the newly generated serial values, then reuse them as foreign keys for the dependent tables further along the tree. I do not know the structure of your tree, but in some cases reading sequence values in advance into arrays may be required for complexity or performance reasons.
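A minimal sketch of that idea, assuming a hypothetical parent table node and child table leaf (names and columns are made up; on 8.4 this has to live in a function rather than a DO block):
CREATE FUNCTION import_batch() RETURNS void LANGUAGE plpgsql AS $$
DECLARE
  new_node_id integer;
BEGIN
  INSERT INTO node (name) VALUES ('imported root')
  RETURNING id INTO new_node_id;           -- capture the freshly generated serial
  INSERT INTO leaf (node_id, payload)
  VALUES (new_node_id, 'imported child');  -- reuse it as the foreign key
END;
$$;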
Another approach can be to examine the production sequence values and estimate sequence values that will not be used in the near future. Fabricate the test data in the test environment with serial values that will not collide with the production sequence values. Then load that data into the prod database and adjust the sequence values of the prod environment so that the test sequence values will not be reused. This will leave a gap in your ID sequence, so you must examine whether anything (like other processes) relies on the sequence values being continuous.
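The sequence-adjustment part of that approach might look like this (sequence name, table name, and the chosen offset are hypothetical):
-- on the test system: start the test IDs well above anything production will reach soon
SELECT setval('node_id_seq', 5000000);
-- on production, after loading the test data: move the sequence past the imported IDs
SELECT setval('node_id_seq', (SELECT max(id) FROM node));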

How to trigger creation/update of another row of record if one row is created/updated in postgresql

I am receiving a CSV of records from outside. When I create or update an entry in PostgreSQL, I need to create a mirror entry that differs only in sign. This could be done at the program level, but I am curious whether it is possible using triggers.
The examples I can find all end with code like
FOR EACH ROW EXECUTE PROCEDURE foo()
And they usually deal with checks, adding additional info using NEW.additionalfield, or inserting into another table. If I use a trigger this way to insert another row into the same table, it seems the trigger will be fired again and the creation becomes recursive.
Any way to work this out?
When dealing with triggers, the rules of thumb are:
If it changes the current row, based on some business rules or other (e.g. adding extra info or processing calculated fields), it belongs in a BEFORE trigger.
If it has side effects on one or more rows in separate tables, it belongs in an AFTER trigger.
If it runs integrity checks on any table that no other built-in constraints (checks, unique keys, foreign keys, exclude, etc.) can take care of, it belongs in a CONSTRAINT [after] trigger.
If it has side effects on one or more other rows within the same table, you should probably revisit your schema, your code flow, or both.
Regarding that last point, there actually are workarounds in Postgres, such as trying to get a lock or checking xmin vs the transaction's xid, to avoid getting bogged down in recursive scenarios. A recent version additionally introduced pg_trigger_depth(). But I'd still advise against it.
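A sketch of the pg_trigger_depth() guard, for completeness (hypothetical entries table and trigger function; per the above, revisiting the schema is usually the better fix):
CREATE FUNCTION mirror_entry() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  IF pg_trigger_depth() > 1 THEN
    RETURN NEW;                                        -- fired by our own insert below: do nothing
  END IF;
  INSERT INTO entries (amount) VALUES (-NEW.amount);   -- the sign-flipped mirror row
  RETURN NEW;
END;
$$;

CREATE TRIGGER trg_mirror_entry
AFTER INSERT ON entries
FOR EACH ROW EXECUTE PROCEDURE mirror_entry();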
Note that a constraint trigger can be created as deferrable initially deferred. This will delay the constraint trigger until the very end of the transaction, rather than immediately after the statement.
Your question and nickname hint that you're wondering how to automatically balance a set of lines in a double-entry book-keeping application. Assuming so, do NOT create the balancing entry automatically. Instead, begin a transaction, enter each line separately, and have a (for each row, deferrable initially deferred) constraint trigger pick things up from there and reject the entire batch if anything is unbalanced. Proceeding that way will spare you a mountain of headaches when you want to balance more than two or three lines with each other.
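A rough sketch of that pattern, assuming a hypothetical journal_lines (batch_id, amount) table where each batch must sum to zero (names and the balancing rule are made up):
CREATE FUNCTION check_batch_balanced() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  IF (SELECT sum(amount) FROM journal_lines WHERE batch_id = NEW.batch_id) <> 0 THEN
    RAISE EXCEPTION 'batch % is not balanced', NEW.batch_id;
  END IF;
  RETURN NULL;   -- the return value of an AFTER trigger is ignored
END;
$$;

CREATE CONSTRAINT TRIGGER trg_batch_balanced
AFTER INSERT OR UPDATE ON journal_lines
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE PROCEDURE check_batch_balanced();
Because the trigger is deferred, the check runs at commit time, after all lines of the batch have been entered.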
Another reading might be that you want to create an audit trail. If so, create additional audit tables and use after triggers to populate them. There are multiple ways to create and manage these audit tables. Look into slowly changing dimensions. (Fwiw, type 6 with a start_end column of type tsrange or tstzrange works well for the audit tables if you're interested in a table's full history including its history of relationships with other audit tables.) Use the "live" tables for your application to keep things fast, and use the audit-tables when you need historical reporting.
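For the audit-trail reading, a minimal sketch (hypothetical tables; a real slowly-changing-dimension setup would carry more columns and handle deletes too):
CREATE TABLE entries_audit (
  entry_id     integer    NOT NULL,
  amount       numeric    NOT NULL,
  valid_during tstzrange  NOT NULL   -- when this version of the row was current
);

CREATE FUNCTION audit_entries() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  -- close the previous version of this row, if there is one
  UPDATE entries_audit
     SET valid_during = tstzrange(lower(valid_during), now())
   WHERE entry_id = NEW.id AND upper_inf(valid_during);
  -- open the new version
  INSERT INTO entries_audit (entry_id, amount, valid_during)
  VALUES (NEW.id, NEW.amount, tstzrange(now(), NULL));
  RETURN NULL;
END;
$$;

CREATE TRIGGER trg_audit_entries
AFTER INSERT OR UPDATE ON entries
FOR EACH ROW EXECUTE PROCEDURE audit_entries();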