Sql - Column Check Constraint based on Date - tsql

This isn't a duplicate of Check constraint on date but I might have missed another similar question.
On MS SQL, you can create the following constraint:
ALTER TABLE [X] WITH CHECK ADD CONSTRAINT [CCCHK03_TBX] CHECK
(
[TBX_YEAR] = DATEPART( year, GetDate() )
)
You can insert records just fine, but I am unable to fully test the implications. What will happen when the server date ticks over to 2017? My impression is that it will allow inserts for 2017 but theoretically invalidate all existing records for 2016.
This is an insert-only table, so records are never meant to be updated and that side isn't an issue. My main concern is whether this could potentially cause a server stability issue.
I can't seem to find anything relating to this but MS must have a reason for allowing such a constraint.
Normally I'd advise creating an insert trigger and checking against that but this got me curious.
Edit: Expanded test from destination-data's answer:
IF ( OBJECT_ID( 'tempdb..#CheckTest' ) IS NOT NULL )
DROP TABLE #CheckTest
GO
CREATE TABLE #CheckTest ( MN INT )
ALTER TABLE #CheckTest WITH CHECK ADD CONSTRAINT CHK_MN CHECK ( MN = DATEPART( SECOND, GETDATE() ) )
ALTER TABLE #CheckTest CHECK CONSTRAINT CHK_MN
GO
-- Control Test. This will fail with:
--Msg 547, Level 16, State 0, Line 12
--The INSERT statement conflicted with the CHECK constraint "CHK_MN". The conflict occurred in database "tempdb", table "dbo.#CheckTest", column 'MN'.
--The statement has been terminated.
INSERT INTO #CheckTest ( MN )
VALUES ( DATEPART( SECOND, DATEADD( SECOND, 5, GETDATE() ) ) )
-- Add 5 different seconds.
DECLARE @Counter INT = 0;
WHILE @Counter < 5
BEGIN
INSERT INTO #CheckTest ( MN )
VALUES ( DATEPART( SECOND, GETDATE() ) )
SET @Counter += 1;
-- Delay for a second.
WAITFOR DELAY '00:00:01';
END
GO
-- Add a different second.
-- Disabling and Enabling a check will work just fine so long as the check already exists.
ALTER TABLE #CheckTest NOCHECK CONSTRAINT CHK_MN;
INSERT INTO #CheckTest ( MN )
VALUES ( DATEPART( SECOND, DATEADD( SECOND, 5, GETDATE() ) ) )
ALTER TABLE #CheckTest CHECK CONSTRAINT CHK_MN;
-- Control Test. This will fail with:
--Msg 547, Level 16, State 0, Line 12
--The INSERT statement conflicted with the CHECK constraint "CHK_MN". The conflict occurred in database "tempdb", table "dbo.#CheckTest", column 'MN'.
--The statement has been terminated.
INSERT INTO #CheckTest ( MN )
VALUES ( DATEPART( SECOND, DATEADD( SECOND, 5, GETDATE() ) ) )
GO
-- Check table contents.
SELECT * FROM #CheckTest;
GO
-- Dropping and recreating the check constraint will result in an error:
--Msg 547, Level 16, State 0, Line 37
--The ALTER TABLE statement conflicted with the CHECK constraint "CHK_MN". The conflict occurred in database "tempdb", table "dbo.#CheckTest", column 'MN'.
ALTER TABLE #CheckTest DROP CONSTRAINT CHK_MN;
ALTER TABLE #CheckTest WITH CHECK ADD CONSTRAINT CHK_MN CHECK ( MN = DATEPART( SECOND, GETDATE() ) )
GO
DROP TABLE #CheckTest
GO
Edit 2: Summary
Based on the testing and feedback, while it was a fun exercise and does appear to be perfectly valid, I'd advise against it purely from a future-proofing point of view, as the check can never be altered once rows from earlier years exist. Personally I think a trigger-based constraint would be the most maintainable.
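For reference, a minimal sketch of that trigger-based alternative, assuming the table and column names from the question (the trigger name is hypothetical and this is untested):
CREATE TRIGGER TRG_X_YearCheck
ON [X]
AFTER INSERT
AS
BEGIN
    -- Reject inserts whose TBX_YEAR is not the current year. Unlike the check
    -- constraint, this rule can be changed later without re-validating old rows.
    IF EXISTS (SELECT 1 FROM inserted WHERE TBX_YEAR <> DATEPART(year, GETDATE()))
    BEGIN
        RAISERROR('TBX_YEAR must be the current year.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;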

Good question!
The short answer is that you will be OK. Checks are validated only when you insert or update columns referenced by the check, so your existing records will be unaffected by the change in year.
But there are some potential issues to consider. This type of check can make it harder to amend the table design, especially via SSMS (which tends to create a new table, import the data from the old one, drop the old table and rename the new). The import will fail because old records no longer match the current constraint rule. Of course you can still amend the table using T-SQL.
Amending Table
-- Disable check.
ALTER TABLE [Schema].[Table] NOCHECK CONSTRAINT [Check];
-- Makes changes here.
-- Enable check.
ALTER TABLE [Schema].[Table] CHECK CONSTRAINT [Check];
You do not need to disable the check when updating other columns within the table.
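For example (a hedged illustration; SomeOtherColumn is a hypothetical column not referenced by the check):
-- This succeeds even for rows whose TBX_YEAR is an earlier year, because the
-- check is only evaluated when a column it references is modified.
UPDATE [X] SET [SomeOtherColumn] = 'new value' WHERE [TBX_YEAR] = 2016;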
Test Query
Testing was a little difficult with years, so I swapped to minutes.
CREATE TABLE #CheckTest
(
MN INT CONSTRAINT CHK_MN CHECK (MN = DATEPART(MINUTE, GETDATE()))
)
;
-- Add two different minutes.
DECLARE @Counter INT = 0;
WHILE @Counter < 2
BEGIN
INSERT INTO #CheckTest
(
MN
)
VALUES
(DATEPART(MINUTE, GETDATE()))
;
SET @Counter = @Counter + 1;
-- Delay for a minute.
WAITFOR DELAY '00:01:00'
END
-- Check table contents.
SELECT
*
FROM
#CheckTest
;

While I agree with the comment by Vladimir Baranov, I would also think about the following.
The query optimizer may take CHECK constraints into account when data is queried (example), but as stated in the documentation, this is only done for trusted constraints:
The query optimizer does not consider constraints that are defined
WITH NOCHECK.
While you created the constraint specifying WITH CHECK
ALTER TABLE [X] WITH CHECK ADD CONSTRAINT [CCCHK03_TBX] CHECK ...
a year later (after rows for the next year have been added) you will not be able to do
ALTER TABLE [X] WITH CHECK CHECK CONSTRAINT [CCCHK03_TBX]
Trying to do so will produce the error message
Msg 547, Level 16, State 0, Line ... The ALTER TABLE statement
conflicted with the CHECK constraint "CCCHK03_TBX". The conflict
occurred in database "DbName", table "dbo.X", column 'TBX_YEAR'.
However in the sys.check_constraints system view
select is_not_trusted
from sys.check_constraints
where object_id = object_id('CCCHK03_TBX')
you will still see is_not_trusted = 0, whereas effectively, I think, it behaves like is_not_trusted = 1 (because some of the table rows no longer satisfy the check expression).
I'm not sure, but I think this may lead to situations where the query optimizer generates a sub-optimal plan, unless it is smart enough not to consider CHECK constraints containing non-deterministic expressions (I was not able to find this documented; it would be great if someone could shed light on it). I do believe the optimizer is smart enough in these circumstances not to make false assumptions that would produce incorrect query results.
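As a hedged illustration (not from the original post) of why trust matters: with a trusted, deterministic check the optimizer can answer a contradictory query without reading the table at all.
-- Hypothetical example: if [X] had a trusted constraint CHECK (TBX_YEAR = 2016),
-- this query could be satisfied with a constant scan returning zero rows.
SELECT * FROM [X] WHERE [TBX_YEAR] = 2015;
Whether it attempts the same reasoning for a non-deterministic expression like DATEPART(year, GETDATE()) is exactly the open question above.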

Related

Is it safe to insert data from inside of a CTE expression?

Background
I have a function that generates detail rows and inserts these into a detail table. I want to automatically ensure that the referenced master row is available before inserting the detail rows.
A BEFORE INSERT trigger does the job, but unfortunately the job is done too well. If a detail row is prevented due to a unique index, the master row will still be inserted, leaving me with a master without children (I don't want that).
I managed to solve this by inserting the master rows inside a CTE and then inserting the detail rows in the main query. This works, but I'm worried that it's not a safe way of doing it.
INSERT rows from inside of a CTE expression
Here is the code to try this out. The input_cte is just generating static dummy data in the example. This particular example could be built with separate static insert SQLs, but in my real case the input_cte is dynamic.
Is this a safe way to solve this, or is it out of spec and could blow up in the next PG version?
CREATE TABLE master (
id INT NOT NULL GENERATED BY DEFAULT AS IDENTITY,
PRIMARY KEY (id)
);
CREATE TABLE detail (
id INT NOT NULL GENERATED BY DEFAULT AS IDENTITY,
master_id INT,
PRIMARY KEY (id, master_id)
);
ALTER TABLE detail
ADD CONSTRAINT detail_master_id_fkey
FOREIGN KEY (master_id)
REFERENCES master (id);
WITH input_cte AS (
SELECT 1 AS master_id, 1 AS detail_id
UNION SELECT 1, 2
UNION SELECT 2, 1
),
insert_cte AS (
--Ignore conflicts, as the master row could already exist
INSERT INTO master (id)
SELECT DISTINCT master_id
FROM input_cte
ON CONFLICT DO NOTHING
)
INSERT INTO detail (id, master_id)
SELECT detail_id, master_id
FROM input_cte;
SELECT * FROM master;
SELECT * FROM detail;
Edit
Bummer.. I just realised that this method will also insert master rows even if the detail rows would be stopped by my unique index.
I see two options. Which should I choose?
Check exactly what I can insert before trying to insert. I.e. do the same thing as my unique index does.
Go ahead with the above solution or the BEFORE INSERT trigger concept and then clean up unused master rows afterwards with a separate DELETE query.
Edit 2 in response to Haleemur Ali's comment
I agree with Haleemur, but in my case it's a bit more complicated than the small example I created. The unique key on my detail table can actually have NULL values. The index in my detail table (project_sequence) looks like this to enable NULL values:
CREATE UNIQUE INDEX
project_sequence_unique_combinations ON main.project_sequence
(project_id, controlpoint_type_id, COALESCE(drawing_id, 0), COALESCE(layer_guid, '00000000-0000-0000-0000-000000000000'));
Due to the possible NULL values I cannot use these fields in my primary key, so I have a surrogate integer key. I'm calculating these key values in my CTE so they will always be unique, i.e. they can be inserted into the master table (sequence) even if the detail rows get stopped by the unique index.
To clarify I've inserted my actual code below. This code works as it should, but it sure would be nice to be able to utilize the new fancy DEFERRED and REFERENCES triggers.
SELECT COALESCE(MAX(id), 0)
INTO _max_sequence_id
FROM main.sequence;
WITH cte AS (
SELECT
d.project_id,
pct.controlpoint_type_id,
d.id as drawing_id,
DENSE_RANK() OVER(ORDER BY d.id, sequence_group_key) + _max_sequence_id + 1 AS new_sequence_id
FROM main.drawing d
CROSS JOIN main.project_controlpoint_type pct
--This left JOIN along with "ps.project_id IS NULL" is my
--current solution, i.e. its "option 1" from above.
LEFT JOIN main.project_sequence ps
ON ps.project_id = d.project_id
AND ps.drawing_id = d.id
AND ps.controlpoint_type_id = pct.controlpoint_type_id
WHERE d.project_id = _project_id
AND pct.project_id = _project_id
AND pct.sequence_level_id = 2
AND ps.project_id IS NULL
),
insert_sequence_cte AS (
INSERT INTO main.sequence
(id, project_id, last_value)
SELECT DISTINCT cte.new_sequence_id, cte.project_id, 0
FROM cte
ON CONFLICT DO NOTHING
)
INSERT INTO main.project_sequence
(project_id, controlpoint_type_id, drawing_id, sequence_id)
SELECT
project_id,
controlpoint_type_id,
drawing_id,
new_sequence_id
FROM cte;
The manual page on WITH queries states that your use case is legitimate and supported:
You can use data-modifying statements (INSERT, UPDATE, or DELETE) in WITH.
and
... data-modifying statements are only allowed in WITH clauses that are attached to the top-level statement. However, normal WITH visibility rules apply, so it is possible to refer to the WITH statement's output from the sub-SELECT.
Further:
If a data-modifying statement in WITH lacks a RETURNING clause, then it forms no temporary table and cannot be referred to in the rest of the query. Such a statement will be executed nonetheless.
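If you need the rest of the query to see which master rows were actually created, you can add RETURNING to the data-modifying CTE; a minimal hedged sketch using the example tables above (note that masters that already existed produce no RETURNING row under ON CONFLICT DO NOTHING):
WITH new_masters AS (
    INSERT INTO master (id)
    SELECT DISTINCT master_id FROM (VALUES (1), (2)) AS v(master_id)
    ON CONFLICT DO NOTHING
    RETURNING id  -- only the rows this statement actually inserted
)
SELECT id AS newly_created_master FROM new_masters;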
If the rule "no master without a detail" is important to your model, you should let the DB enforce it. This will free you from "auto-inserting" master to reduce the potential for error and let you use conventional methods again.
Take a look at CONSTRAINT TRIGGERS. They allow you to just detect and reject violations of your no-master-without-detail rule while leaving the actual INSERTs to application code.
Your use case would need a CONSTRAINT TRIGGER on your master table that is DEFERRABLE INITIALLY DEFERRED. This allows you to INSERT master, then INSERT detail and still be sure that the transaction only commits if all is consistent.
From the manual linked above:
Constraint triggers must be AFTER ROW triggers on plain tables (not foreign tables). They can be fired either at the end of the statement causing the triggering event, or at the end of the containing transaction; in the latter case they are said to be deferred.
You'll need two triggers, one handling INSERT/UPDATE on master and another one handling DELETE/UPDATE on detail:
CREATE CONSTRAINT TRIGGER trigger_assert_master_has_detail
AFTER INSERT OR UPDATE OF id ON master
DEFERRABLE INITIALLY DEFERRED FOR EACH ROW
EXECUTE PROCEDURE assert_master_has_detail();
CREATE CONSTRAINT TRIGGER trigger_assert_no_leftover_master
AFTER DELETE OR UPDATE OF master_id ON detail
DEFERRABLE INITIALLY DEFERRED FOR EACH ROW
EXECUTE PROCEDURE assert_no_leftover_master();
Note that UPDATEs will only fire the trigger if they concern the FK/PK column.
The two trigger functions will then check if there are 1-n details for the master:
CREATE FUNCTION assert_master_has_detail() RETURNS trigger
AS
$$
BEGIN
IF NOT EXISTS (SELECT 1 FROM detail WHERE master_id = NEW.id)
THEN
RAISE EXCEPTION 'no detail for master_id=%', NEW.id;
ELSE
RETURN NEW;
END IF;
END;
$$
LANGUAGE plpgsql;
CREATE FUNCTION assert_no_leftover_master() RETURNS trigger
AS
$$
BEGIN
IF NOT EXISTS (SELECT 1 FROM detail WHERE master_id = OLD.master_id)
AND EXISTS (SELECT 1 FROM master WHERE id = OLD.master_id)
THEN
RAISE EXCEPTION 'last detail for master_id=% removed, but master still exists', OLD.master_id;
ELSE
RETURN NULL;
END IF;
END;
$$
LANGUAGE plpgsql;
Example of a violation:
INSERT INTO master
VALUES (1);
-- ERROR: no detail for master_id=1
-- CONTEXT: PL/pgSQL function assert_master_has_detail() line 5 at RAISE
and a legal scenario:
INSERT INTO master
VALUES (1);
INSERT INTO detail
VALUES (10, 1);
-- trigger fired at end of transaction, finds everything is OK
Here's a complete solution as dbfiddle.

Postgresql function not working as expected with INSERT INTO

I have a function to insert data from one table to another:
$BODY$
BEGIN
INSERT INTO backups.calls2 (uid,queue_id,connected,callerid2)
SELECT distinct (c.uid) ,c.queue_id,c.connected,c.callerid2
FROM public.calls c
WHERE c.connected is not null;
RETURN;
EXCEPTION WHEN unique_violation THEN NULL;
END;
$BODY$
And the structure of the table:
CREATE TABLE backups.nc_calls_id
(
uid character(30) NOT NULL,
queue_id integer,
callerid2 text,
connected timestamp without time zone,
id serial NOT NULL,
CONSTRAINT calls2_pkey PRIMARY KEY (uid)
)
WITH (
OIDS=FALSE
);
When I first executed this query, everything went OK; 200,000 rows were inserted into the new table with unique IDs.
But now, when I execute it again, no rows are inserted.
From the rather minimalist description given (no PostgreSQL version, no CREATE FUNCTION statement showing params etc, no other table structure, no function invocation) I'm guessing that you're attempting to do a merge, where you insert a row only if it doesn't exist, skipping rows that already exist.
What the above function will do is skip all rows if any row already exists.
You need to either use a loop to do the insert within individual BEGIN ... EXCEPTION blocks (slow) or LOCK the table and do an INSERT INTO ... SELECT ... FROM newtable WHERE NOT EXISTS (SELECT 1 FROM oldtable where oldtable.key = newtable.key).
The INSERT INTO ... SELECT ... WHERE NOT EXISTS method will perform a lot better but will fail if more than one runs concurrently or if anything else inserts into the destination table at the same time. LOCKing the destination table before running it will make sure it's safe.
The PL/PgSQL looping BEGIN ... EXCEPTION method sounds nice and safe at first glance. Then you think about what happens when you run two of them at once. One will insert some keys first, one will insert other keys first, so they have a split of the values between them. That's OK, together they make up the full set. But what if only one of them commits and the other fails for some reason? You'll have an interesting sparsely inserted result. For that reason it's probably best to lock the destination table if using this approach too ... in which case you might as well use the vastly more efficient single pass INSERT with subquery-based uniqueness violation check.
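A hedged sketch of the lock-plus-anti-join approach, using the table and column names from the question (adjust the lock mode and key column to your schema):
BEGIN;
-- Block concurrent writers (reads are still allowed) so the NOT EXISTS check
-- cannot race with another session inserting the same uid.
LOCK TABLE backups.calls2 IN SHARE ROW EXCLUSIVE MODE;
INSERT INTO backups.calls2 (uid, queue_id, connected, callerid2)
SELECT DISTINCT c.uid, c.queue_id, c.connected, c.callerid2
FROM public.calls c
WHERE c.connected IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM backups.calls2 b WHERE b.uid = c.uid);
COMMIT;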

T-SQL - If column exists, execute query involving column, else execute another query where column is not present [duplicate]

Does anyone see what's wrong with this code for SQL Server?
IF NOT EXISTS(SELECT *
FROM sys.columns
WHERE Name = 'OPT_LOCK'
AND object_ID = Object_id('REP_DSGN_SEC_GRP_LNK'))
BEGIN
ALTER TABLE REP_DSGN_SEC_GRP_LNK
ADD OPT_LOCK NUMERIC(10, 0)
UPDATE REP_DSGN_SEC_GRP_LNK
SET OPT_LOCK = 0
ALTER TABLE REP_DSGN_SEC_GRP_LNK
ALTER COLUMN OPT_LOCK NUMERIC(10, 0) NOT NULL
END;
When I run this, I get:
Msg 207, Level 16, State 1, Line 3
Invalid column name 'OPT_LOCK'.
on the update command.
Thanks.
In this case you can avoid the problem by adding the column as NOT NULL and setting the values for existing rows in one statement as per my answer here.
More generally the problem is a parse/compile issue. SQL Server tries to compile all statements in the batch before executing any of the statements.
When a statement references a table that doesn't exist at all, the statement is subject to deferred compilation. When the table already exists, it throws an error if you reference a non-existent column. The best way around this is to do the DDL in a different batch from the DML.
If a statement references both a non-existent column in an existing table and a non-existent table, the error may or may not be thrown before compilation is deferred.
You can either submit it in separate batches (e.g. by using the batch separator GO in the client tools) or perform it in a child scope that is compiled separately by using EXEC or EXEC sp_executesql.
The first approach would require you to refactor your code, as an IF ... block cannot span batches.
IF NOT EXISTS(SELECT *
FROM sys.columns
WHERE Name = 'OPT_LOCK'
AND object_ID = Object_id('REP_DSGN_SEC_GRP_LNK'))
BEGIN
ALTER TABLE REP_DSGN_SEC_GRP_LNK
ADD OPT_LOCK NUMERIC(10, 0)
EXEC('UPDATE REP_DSGN_SEC_GRP_LNK SET OPT_LOCK = 0');
ALTER TABLE REP_DSGN_SEC_GRP_LNK
ALTER COLUMN OPT_LOCK NUMERIC(10, 0) NOT NULL
END;
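For completeness, a hedged sketch of the separate-batch variant mentioned above, using the identifiers from the question; each batch is compiled only after the previous one has executed, so the UPDATE can see the new column (the existence check cannot span GO, so the second batch simply assumes the column is present):
IF NOT EXISTS (SELECT * FROM sys.columns
               WHERE Name = 'OPT_LOCK'
                 AND object_id = OBJECT_ID('REP_DSGN_SEC_GRP_LNK'))
    ALTER TABLE REP_DSGN_SEC_GRP_LNK ADD OPT_LOCK NUMERIC(10, 0);
GO
UPDATE REP_DSGN_SEC_GRP_LNK SET OPT_LOCK = 0 WHERE OPT_LOCK IS NULL;
ALTER TABLE REP_DSGN_SEC_GRP_LNK ALTER COLUMN OPT_LOCK NUMERIC(10, 0) NOT NULL;
GO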
The root cause of the error is that the whole batch is compiled before the ALTER TABLE runs, so the newly added column is not yet visible in sys.columns when the UPDATE statement is parsed.
For your information, you can replace the IF NOT EXISTS check with the COL_LENGTH function. It takes two parameters:
the table name, and
the column you are searching for.
If the column is found, it returns the defined length of the column in bytes (e.g. INT returns 4); when it is not found, it returns NULL.
So you could use it as follows, and also combine the three statements into one.
IF (SELECT COL_LENGTH('REP_DSGN_SEC_GRP_LNK','OPT_LOCK')) IS NULL
BEGIN
ALTER TABLE REP_DSGN_SEC_GRP_LNK
ADD OPT_LOCK NUMERIC(10, 0) NOT NULL DEFAULT 0
END;
Makes it simpler.

PostgreSQL, triggers, and concurrency to enforce a temporal key

I want to define a trigger in PostgreSQL to check that the inserted row, on a generic table, has the property: "no other row exists with the same key in the same valid time" (the keys are sequenced keys). In fact, I have already implemented it. But since the trigger has to scan the entire table, I'm now wondering: is there a need for a table-level lock, or is this managed somehow by PostgreSQL itself?
Here is an example.
In the upcoming PostgreSQL 9.0 I would have defined the table in this way:
CREATE TABLE medicinal_products
(
aic_code CHAR(9), -- sequenced key
full_name VARCHAR(255),
market_time PERIOD,
EXCLUDE USING gist
(aic_code CHECK WITH =,
market_time CHECK WITH &&)
);
but in fact I have defined it like this:
CREATE TABLE medicinal_products
(
PRIMARY KEY (aic_code, vs),
aic_code CHAR(9), -- sequenced key
full_name VARCHAR(255),
vs DATE NOT NULL,
ve DATE,
CONSTRAINT valid_time_range
CHECK (ve > vs OR ve IS NULL)
);
Then I wrote a trigger that checks the constraint: "two distinct medicinal products can have the same code in two different periods, but not at the same time".
So the code:
INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');
returns an error.
One solution is to have a second table to use for detecting clashes, and populate that with a trigger. Using the schema you added to the question:
CREATE TABLE medicinal_product_date_map(
aic_code char(9) NOT NULL,
applicable_date date NOT NULL,
UNIQUE(aic_code, applicable_date));
(note: this is the second attempt due to misreading your requirement the first time round. hope it's right this time).
Some functions to maintain this table:
CREATE FUNCTION add_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
INSERT INTO medicinal_product_date_map
SELECT $1, $2 + d
FROM generate_series(0, $3 - $2) AS d
$$;
CREATE FUNCTION clr_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
DELETE FROM medicinal_product_date_map
WHERE aic_code = $1 AND applicable_date BETWEEN $2 AND $3
$$;
And populate the table first time with:
SELECT count(add_medicinal_product_date_range(aic_code, vs, ve))
FROM medicinal_products;
Now create triggers to populate the date map after changes to medicinal_products: after insert calls add_, after update calls clr_ (old values) and add_ (new values), after delete calls clr_.
CREATE FUNCTION sync_medicinal_product_date_map()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
IF TG_OP = 'UPDATE' OR TG_OP = 'DELETE' THEN
PERFORM clr_medicinal_product_date_range(OLD.aic_code, OLD.vs, OLD.ve);
END IF;
IF TG_OP = 'UPDATE' OR TG_OP = 'INSERT' THEN
PERFORM add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve);
END IF;
RETURN NULL;
END;
$$;
CREATE TRIGGER sync_date_map
AFTER INSERT OR UPDATE OR DELETE ON medicinal_products
FOR EACH ROW EXECUTE PROCEDURE sync_medicinal_product_date_map();
The uniqueness constraint on medicinal_product_date_map will trap any products being added with the same code on the same day:
steve@steve [local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT 0 1
steve@steve [local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');
ERROR: duplicate key value violates unique constraint "medicinal_product_date_map_aic_code_applicable_date_key"
DETAIL: Key (aic_code, applicable_date)=(1 , 2010-03-01) already exists.
CONTEXT: SQL function "add_medicinal_product_date_range" statement 1
SQL statement "SELECT add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve)"
PL/pgSQL function "sync_medicinal_product_date_map" line 6 at PERFORM
This depends on the values being checked having a discrete space, which is why I asked about dates vs. timestamps. Although timestamps are technically discrete, since PostgreSQL only stores microsecond resolution, adding an entry to the map table for every microsecond the product is applicable is not practical.
Having said that, you could probably also get away with something better than a full-table scan to check for overlapping timestamp intervals, with some trickery on looking for only the first interval not after or not before... however, for easy discrete spaces I prefer this approach which IME can also be handy for other things too (e.g. reports that need to quickly find which products are applicable on a certain day).
I also like this approach because it feels right to leverage the database's uniqueness-constraint mechanism this way. Also, I feel it will be more reliable in the context of concurrent updates to the master table: without locking the table against concurrent updates, it would be possible for a validation trigger to see no conflict and allow inserts in two concurrent sessions, which are then seen to conflict when both transactions' effects are visible.
Just a thought: if the valid time blocks could be coded with a number or something, creating a UNIQUE index on Id+TimeBlock would be blazingly fast and resolve all table-lock problems.
It is managed by PostgreSQL itself. A SELECT acquires an ACCESS SHARE lock, which means you can query the table but not update it.
A radical solution which might help you is to use a cache like ehcache or memcached to store the id/timeblock info and not use PostgreSQL at all. Many of these can be persisted so they would survive a server restart, and they do not exhibit this locking behaviour.
Why can't you use a UNIQUE constraint? It will be much faster (it's an index) and easier.
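For completeness, on PostgreSQL 9.2 and later the exclusion-constraint approach sketched in the question handles both the uniqueness and the concurrency concern natively; a hedged example assuming the btree_gist extension and a daterange column in place of the vs/ve pair:
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE medicinal_products
(
    aic_code    CHAR(9),
    full_name   VARCHAR(255),
    market_time daterange,
    EXCLUDE USING gist (aic_code WITH =, market_time WITH &&)
);
-- The second insert fails because the two ranges for code '1' overlap.
INSERT INTO medicinal_products VALUES ('1', 'A', daterange('2010-01-01', '2010-04-01'));
INSERT INTO medicinal_products VALUES ('1', 'A', daterange('2010-03-01', '2010-06-01'));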

how to emulate "insert ignore" and "on duplicate key update" (sql merge) with postgresql?

Some SQL servers have a feature where INSERT is skipped if it would violate a primary/unique key constraint. For instance, MySQL has INSERT IGNORE.
What's the best way to emulate INSERT IGNORE and ON DUPLICATE KEY UPDATE with PostgreSQL?
With PostgreSQL 9.5, this is now native functionality (like MySQL has had for several years):
INSERT ... ON CONFLICT DO NOTHING/UPDATE ("UPSERT")
9.5 brings support for "UPSERT" operations.
INSERT is extended to accept an ON CONFLICT DO UPDATE/IGNORE clause. This clause specifies an alternative action to take in the event of a would-be duplicate violation.
...
Further example of new syntax:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Edit: in case you missed warren's answer, PG9.5 now has this natively; time to upgrade!
Building on Bill Karwin's answer, to spell out what a rule based approach would look like (transferring from another schema in the same DB, and with a multi-column primary key):
CREATE RULE "my_table_on_duplicate_ignore" AS ON INSERT TO "my_table"
WHERE EXISTS(SELECT 1 FROM my_table
WHERE (pk_col_1, pk_col_2)=(NEW.pk_col_1, NEW.pk_col_2))
DO INSTEAD NOTHING;
INSERT INTO my_table SELECT * FROM another_schema.my_table WHERE some_cond;
DROP RULE "my_table_on_duplicate_ignore" ON "my_table";
Note: The rule applies to all INSERT operations until the rule is dropped, so not quite ad hoc.
For those of you that have Postgres 9.5 or higher, the new ON CONFLICT DO NOTHING syntax should work:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT field_one, field_two, field_three
FROM source_table
ON CONFLICT (field_one) DO NOTHING;
For those of us on an earlier version, this LEFT JOIN anti-join will work instead:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT source_table.field_one, source_table.field_two, source_table.field_three
FROM source_table
LEFT JOIN target_table ON source_table.field_one = target_table.field_one
WHERE target_table.field_one IS NULL;
Try to do an UPDATE first. If it doesn't modify any row, the row didn't exist, so do an INSERT. Obviously, you do this inside a transaction.
You can of course wrap this in a function if you don't want to put the extra code on the client side. You also need a retry loop for the very rare race condition in that approach.
There's an example of this in the documentation: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html, example 40-2 right at the bottom.
That's usually the easiest way. You can do some magic with rules, but it's likely going to be a lot messier. I'd recommend the wrap-in-function approach over that any day.
This works for single-row or few-row inserts. If you're dealing with large numbers of rows, for example from a subquery, you're best off splitting it into two queries, one for INSERT and one for UPDATE (with an appropriate join/subselect of course; no need to write your main filter twice).
To get the INSERT IGNORE logic, you can do something like the following. I found that simply inserting from a SELECT statement of literal values worked best; you can then mask out the duplicate keys with a NOT EXISTS clause. To get the ON DUPLICATE KEY UPDATE logic, I suspect a PL/pgSQL loop would be necessary.
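A hedged sketch of that retry loop, modelled on the documentation example linked above, for a hypothetical table mytable(key, value):
CREATE FUNCTION upsert_mytable(k INT, v TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- First try to update the existing row.
        UPDATE mytable SET value = v WHERE key = k;
        IF FOUND THEN
            RETURN;
        END IF;
        -- Not there: try to insert. If another session beat us to it,
        -- catch the unique_violation and loop back to the UPDATE.
        BEGIN
            INSERT INTO mytable (key, value) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Do nothing, and loop to try the UPDATE again.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;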
INSERT INTO manager.vin_manufacturer
(SELECT * FROM( VALUES
('935',' Citroën Brazil','Citroën'),
('ABC', 'Toyota', 'Toyota'),
('ZOM',' OM','OM')
) as tmp (vin_manufacturer_id, manufacturer_desc, make_desc)
WHERE NOT EXISTS (
--ignore anything that has already been inserted
SELECT 1 FROM manager.vin_manufacturer m where m.vin_manufacturer_id = tmp.vin_manufacturer_id)
)
INSERT INTO mytable(col1,col2)
SELECT 'val1','val2'
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE col1='val1')
As @hanmari mentioned in his comment, when inserting into a Postgres table, ON CONFLICT (..) DO NOTHING is the best way to avoid inserting duplicate data:
query = """INSERT INTO db_table_name(column_name)
           VALUES (%s) ON CONFLICT (column_name) DO NOTHING;"""
The ON CONFLICT clause allows the INSERT statement to still insert the non-conflicting rows of data. The query-and-values code is an example of inserting data from an Excel file into a Postgres database table.
I have constraints on a Postgres table to make sure the ID field is unique. Instead of running a delete on duplicate rows of data, I add a line of SQL code that restarts the ID sequence at 1.
Example:
q = 'ALTER SEQUENCE id_column_seq RESTART WITH 1'
If my data has an ID field, I do not use it as the primary/serial ID; I create an ID column and set it to serial.
I hope this information is helpful to everyone.
*I have no college degree in software development/coding. Everything I know in coding, I study on my own.
Looks like PostgreSQL supports a schema object called a rule.
http://www.postgresql.org/docs/current/static/rules-update.html
You could create a rule ON INSERT for a given table, making it do NOTHING if a row exists with the given primary key value, or else making it do an UPDATE instead of the INSERT if a row exists with the given primary key value.
I haven't tried this myself, so I can't speak from experience or offer an example.
This solution avoids using rules:
BEGIN
INSERT INTO tableA (unique_column,c2,c3) VALUES (1,2,3);
EXCEPTION
WHEN unique_violation THEN
UPDATE tableA SET c2 = 2, c3 = 3 WHERE unique_column = 1;
END;
but it has a performance drawback (see PostgreSQL.org):
A block containing an EXCEPTION clause is significantly more expensive
to enter and exit than a block without one. Therefore, don't use
EXCEPTION without need.
For bulk loads, you can always delete the rows before the insert. Deleting a row that doesn't exist doesn't cause an error, so it's safely skipped.
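A hedged sketch of that pattern, assuming a hypothetical staging table staging_rows with the same key column as the earlier mytable example:
BEGIN;
-- Remove any rows that would collide, then insert the full batch.
DELETE FROM mytable USING staging_rows s WHERE mytable.col1 = s.col1;
INSERT INTO mytable (col1, col2) SELECT col1, col2 FROM staging_rows;
COMMIT;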
For data import scripts, to replace "IF NOT EXISTS", in a way, there's a slightly awkward formulation that nevertheless works:
DO
$do$
BEGIN
PERFORM id
FROM whatever_table;
IF NOT FOUND THEN
-- INSERT stuff
END IF;
END
$do$;