Can Postgres silently ignore column constraint conflicts?

I have a Postgres 9.6 table with certain columns that must be unique. If I try to insert a duplicate row, I want Postgres to simply ignore the insert and continue, instead of failing or aborting. If the insert is wrapped in a transaction, it shouldn't abort the transaction or affect other updates in the transaction.
I assume there's a way to create the table as described above, but I haven't figured it out yet.
Bonus points if you can show me how to do it in Rails.

This is possible with the ON CONFLICT clause for INSERT:
The optional ON CONFLICT clause specifies an alternative action to raising a unique violation or exclusion constraint violation error. For each individual row proposed for insertion, either the insertion proceeds, or, if an arbiter constraint or index specified by conflict_target is violated, the alternative conflict_action is taken. ON CONFLICT DO NOTHING simply avoids inserting a row as its alternative action.
This is a relatively new feature and only available since Postgres 9.5, but that isn't an issue for you.
This is not something you specify at table creation; you'll need to modify each insert. I don't know how this works with Rails, but I suspect you'll have to write at least part of the queries manually.
This feature is also often called UPSERT, which is probably a better term to search for if you want to look for an integrated way in Rails to do this.
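For example, a minimal sketch in plain SQL (the table and column names are illustrative):
CREATE TABLE users (email text UNIQUE, name text);

INSERT INTO users (email, name) VALUES ('a@example.com', 'Alice');
-- The duplicate below is silently skipped: the statement succeeds
-- with "INSERT 0 0" and does not abort the enclosing transaction.
INSERT INTO users (email, name) VALUES ('a@example.com', 'Duplicate')
ON CONFLICT (email) DO NOTHING;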

Related

Is it possible to access current column data on conflict

I want the following behaviour when inserting data (conflict on id):
if there is no model with same id in db do INSERT
if there is entry with same id in db and that entry is newer (updated_at field) do NOT UPDATE
if there is entry with same id in db and that entry is older (updated_at field) do UPDATE
I'm using Ecto for this and want to work with constraints; however, I cannot find an option to do so in the documentation. Pseudo code of the constraint could look like:
CHECK: NULL(current.updated_at) or incoming.updated_at > current.updated_at
Is such behaviour possible in Postgres?
PostgreSQL does not support CHECK constraints that reference table data other than the new or updated row being checked. While a CHECK constraint that violates this rule may appear to work in simple tests, it cannot guarantee that the database will not reach a state in which the constraint condition is false (due to subsequent changes of the other row(s) involved). This would cause a database dump and reload to fail. The reload could fail even when the complete database state is consistent with the constraint, due to rows not being loaded in an order that will satisfy the constraint. If possible, use UNIQUE, EXCLUDE, or FOREIGN KEY constraints to express cross-row and cross-table restrictions.
If what you desire is a one-time check against other rows at row insertion, rather than a continuously-maintained consistency guarantee, a custom trigger can be used to implement that. (This approach avoids the dump/reload problem because pg_dump does not reinstall triggers until after reloading data, so that the check will not be enforced during a dump/reload.)
That should be simple using the WHERE clause of ON CONFLICT ... DO UPDATE:
INSERT INTO mytable (id, entry) VALUES (42, '2021-05-29 12:00:00')
ON CONFLICT (id)
DO UPDATE SET entry = EXCLUDED.entry
-- update only when the incoming row is newer than the stored one
WHERE mytable.entry < EXCLUDED.entry;

Migrating `int` to `bigint` in PostgreSQL without any downtime?

I have a database that is going to experience the integer exhaustion problem that Basecamp famously faced back in November. I have several months to figure out what to do.
Is there a no-downtime-required, proactive solution to migrating this column type? If so what is it? If not, is it just a matter of eating the downtime and migrating the column when I can?
Is this article sufficient, assuming I have several days/weeks to perform the migration now before I'm forced to do it when I run out of ids?
Use logical replication.
With logical replication you can have different data types at primary and standby.
Copy the schema with pg_dump -s, change the data types on the copy and then start logical replication.
Once all data is copied over, switch the application to use the standby.
For zero down time, the application has to be able to reconnect and retry, but that's always a requirement in such a case.
You need PostgreSQL v10 or better for that, and your database:
shouldn't modify the schema during the migration, as DDL is not replicated
shouldn't use sequences (SERIAL or IDENTITY), as the last used value would not be replicated
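A minimal sketch of that setup, with illustrative names (mydb, mytable):
-- On the primary, publish the table that still has the int column:
CREATE PUBLICATION bigint_migration FOR TABLE mytable;

-- On the standby, after restoring the schema from pg_dump -s
-- with the column changed to bigint:
CREATE SUBSCRIPTION bigint_migration
CONNECTION 'host=primary-host dbname=mydb'
PUBLICATION bigint_migration;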
Another solution for pre-v10 databases where all transactions are short:
Add a bigint column to the table.
Create a BEFORE trigger that sets the new column whenever a row is added or updated.
Run a series of UPDATEs that set the new column from the old one where it IS NULL. Keep those batches short so locks are held briefly and deadlocks stay rare. Make sure these transactions run with session_replication_role = replica so they don't fire triggers.
Once all rows are updated, create a unique index CONCURRENTLY on the new column.
Add a unique constraint USING the index you just created. That will be fast.
Perform the switch:
BEGIN;
ALTER TABLE ... DROP COLUMN oldcol;
ALTER TABLE ... RENAME COLUMN newcol TO oldcol;
COMMIT;
That will be fast.
Your new column has no NOT NULL set; adding one cannot be done without a long, invasive lock. But you can add a CHECK (newcol IS NOT NULL) constraint created as NOT VALID. That is good enough, and you can later validate it without disruption.
If there are foreign key constraints, things get a little more complicated. You have to drop these and create NOT VALID foreign keys to the new column.
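A sketch of the first steps, assuming a table mytable with an int column id (all names here are illustrative):
-- Step 1: add the wider column.
ALTER TABLE mytable ADD COLUMN id_new bigint;

-- Step 2: keep the new column in sync for inserted and updated rows.
CREATE FUNCTION copy_id() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
NEW.id_new := NEW.id;
RETURN NEW;
END;$$;

CREATE TRIGGER copy_id_trigger
BEFORE INSERT OR UPDATE ON mytable
FOR EACH ROW EXECUTE PROCEDURE copy_id();

-- Step 3: backfill in short batches; replica mode keeps other triggers quiet.
SET session_replication_role = replica;
UPDATE mytable SET id_new = id
WHERE id_new IS NULL AND id BETWEEN 1 AND 10000;
-- ...repeat for the remaining id ranges...

-- Steps 4 and 5: index without blocking writes, then attach the constraint.
CREATE UNIQUE INDEX CONCURRENTLY mytable_id_new_idx ON mytable (id_new);
ALTER TABLE mytable ADD CONSTRAINT mytable_id_new_key
UNIQUE USING INDEX mytable_id_new_idx;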
Create a copy of the old table, but with the modified ID field. Next, create a trigger on the old table that inserts new data into both tables. Finally, copy data from the old table to the new one (it is a good idea to distinguish pre-trigger data from post-trigger data, for example by id if it is sequential). Once you are done, switch the tables and delete the old one.
This obviously requires twice as much space (and time for copy) but will work without any downtime.

Is it possible to perform an upsert that requires filtering in Postgres?

I'm wondering if it's possible to use the following statement to do an upsert w/ filtering. That is, can I first try to update with a where clause, if it fails, then insert, rather than the other way around? I would like to do this in Postgres.
INSERT ... ON CONFLICT DO NOTHING/UPDATE
I did see this, but it is definitely a bit more complicated
https://dba.stackexchange.com/questions/13468/idiomatic-way-to-implement-upsert-in-postgresql
That is, can I first try to update with a where clause, if it fails, then insert, rather than the other way around?
It's unclear why you would want to do this.
The purpose of UPSERT is to ensure that the database contains exactly one row with a given key and with a given set of other column values. Postgres tries INSERT first because INSERT will fail when the key conflicts with a duplicate row (so that it can fall back to updating the conflicting row instead of raising an exception). UPDATE will not fail if the WHERE clause matches nothing. It will successfully update zero rows. UPDATE can fail if you violate a constraint (e.g. a CHECK or NOT NULL constraint), but it won't fail just because you didn't match any rows.
And, on the other hand, if your UPDATE would change an existing row, then your INSERT would necessarily fail with a uniqueness violation (because the row exists). So trying the INSERT first doesn't actually change the result in this case.
It is possible to hang a condition on PostgreSQL's UPSERT, with syntax of the form INSERT... ON CONFLICT DO UPDATE... WHERE.... This will:
Insert the rows you provide.
For each conflict with an existing row, evaluate the WHERE condition for that row.
If the WHERE condition is satisfied, update the existing row, otherwise do nothing with it.
I believe this is functionally equivalent to what you are asking for, because:
If the row does not exist, Postgres will INSERT it. UPDATE wouldn't have affected it, so your method would have had to fall back to INSERTing it anyway.
If the row exists, but does not match the WHERE clause, then Postgres will do nothing. I think your method would either do nothing or fail with a uniqueness constraint after trying to INSERT it, but perhaps you had something else in mind for this case.
If the row exists and matches the WHERE clause, both Postgres and your method will do an UPDATE on that row.
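To make those three cases concrete, a small sketch (the scores table and its columns are illustrative):
-- Assumes: CREATE TABLE scores (player text UNIQUE, best int);
INSERT INTO scores (player, best) VALUES ('alice', 300)
ON CONFLICT (player)
DO UPDATE SET best = EXCLUDED.best
WHERE scores.best < EXCLUDED.best;
-- no row for 'alice': the row is inserted
-- row exists with best >= 300: the conflict is resolved by doing nothing
-- row exists with best < 300: the row is updated to 300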

How to set Ignore Duplicate Key in Postgresql while table creation itself

I am creating a table in PostgreSQL 9.5 where id is the primary key. While inserting rows into the table, if anyone tries to insert a duplicate id, I want it to be ignored instead of raising an exception. Is there any way to set this at table creation itself, so that duplicate entries get ignored?
There are many techniques to resolve the duplicate-insertion issue when writing the insertion query, e.g. using ON CONFLICT DO NOTHING, or using a WHERE EXISTS clause. But I want to handle this on the table-creation end, so that the person writing the insertion query doesn't need to bother with it.
Creating a RULE is one possible solution. Are there other possible solutions? Maybe something like this:
`CREATE TABLE dbo.foo (bar int PRIMARY KEY WITH (FILLFACTOR=90, IGNORE_DUP_KEY = ON))`
Although this exact statement doesn't work on PostgreSQL 9.5 on my machine.
Add a BEFORE INSERT trigger, or a rule ON INSERT ... DO INSTEAD; otherwise it has to be handled by the inserting query. Both solutions will require more resources on each insert.
An alternative is to provide a function with arguments for the insert that checks for duplicates, so end users call the function instead of an INSERT statement.
Note that a WHERE EXISTS sub-query is not atomic, so you can still get an exception after the check.
On 9.5, ON CONFLICT DO NOTHING is still the best solution.
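For illustration, a sketch of the rule-based approach adapted to the question's table (keeping in mind the atomicity caveat above):
CREATE TABLE foo (bar int PRIMARY KEY);

-- Rewrite conflicting INSERTs into no-ops:
CREATE RULE ignore_dup_bar AS
ON INSERT TO foo
WHERE EXISTS (SELECT 1 FROM foo WHERE bar = NEW.bar)
DO INSTEAD NOTHING;

INSERT INTO foo VALUES (1);
INSERT INTO foo VALUES (1);  -- rewritten to a no-op by the rule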

postgres condition trigger missing from clause

I'm trying to create a trigger on my table such that it only runs if the 'prepaid' column is true for rows where I've modified the value of the 'points_per_month' column. I tried this:
CREATE TRIGGER "fix_usage_trigger"
AFTER UPDATE OF "points_per_month"
ON "public"."clients"
FOR EACH ROW WHEN (ROW.prepaid)
EXECUTE PROCEDURE "fix_prepaid_client_available_usage"();
psql is telling me this:
ERROR: missing FROM-clause entry for table "row"
LINE 1: ...r_month" ON "public"."clients" FOR EACH ROW WHEN (ROW.prepai...
Clearly I have no FROM clause there, but I'm not sure why I'd need one, nor where to put it.
That should be WHEN (NEW.prepaid), per David's comment. You can access OLD and NEW in the WHEN clause (the row before and after the update) much like table aliases. The error message is Postgres complaining that row is not a known table.
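Applied to the trigger from the question:
CREATE TRIGGER "fix_usage_trigger"
AFTER UPDATE OF "points_per_month"
ON "public"."clients"
FOR EACH ROW WHEN (NEW.prepaid)
EXECUTE PROCEDURE "fix_prepaid_client_available_usage"();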
Two additional notes:
it might need to be WHEN (OLD.prepaid OR NEW.prepaid) if you want to handle billing plan switches, or alternatively two separate triggers. Conversely, use WHEN (OLD.prepaid AND NEW.prepaid) if you do not, and someone might run database queries that inadvertently fire the trigger and create undesirable state (add a unit test or two).
the function's name suggests something might be wrong further up in your code flow. You might want to fix that instead, by setting the available usage properly to begin with. Doing so might be more efficient, too.