I'm using Python to write to a postgres database:
sql_string = "INSERT INTO hundred (name,name_slug,status) VALUES ("
sql_string += hundred + ", '" + hundred_slug + "', " + status + ");"
cursor.execute(sql_string)
But because some of my rows are identical, I get the following error:
psycopg2.IntegrityError: duplicate key value
violates unique constraint "hundred_pkey"
How can I write an 'INSERT unless this row already exists' SQL statement?
I've seen complex statements like this recommended:
IF EXISTS (SELECT * FROM invoices WHERE invoiceid = '12345')
UPDATE invoices SET billed = 'TRUE' WHERE invoiceid = '12345'
ELSE
INSERT INTO invoices (invoiceid, billed) VALUES ('12345', 'TRUE')
END IF
But firstly, is this overkill for what I need, and secondly, how can I execute one of those as a simple string?
Postgres 9.5 (released since 2016-01-07) offers an "upsert" command, also known as an ON CONFLICT clause to INSERT:
INSERT ... ON CONFLICT DO NOTHING/UPDATE
It solves many of the subtle problems you can run into when using concurrent operation, which some other answers propose.
How can I write an 'INSERT unless this row already exists' SQL statement?
There is a nice way of doing conditional INSERT in PostgreSQL:
INSERT INTO example_table
(id, name)
SELECT 1, 'John'
WHERE
NOT EXISTS (
SELECT id FROM example_table WHERE id = 1
);
CAVEAT This approach is not 100% reliable for concurrent write operations, though. There is a very tiny race condition between the SELECT in the NOT EXISTS anti-semi-join and the INSERT itself. It can fail under such conditions.
One approach would be to create a non-constrained (no unique indexes) table to insert all your data into and do a select distinct from that to do your insert into your hundred table.
So high level would be. I assume all three columns are distinct in my example so for step3 change the NOT EXITS join to only join on the unique columns in the hundred table.
Create temporary table. See docs here.
CREATE TEMPORARY TABLE temp_data(name, name_slug, status);
INSERT Data into temp table.
INSERT INTO temp_data(name, name_slug, status);
Add any indexes to the temp table.
Do main table insert.
INSERT INTO hundred(name, name_slug, status)
SELECT DISTINCT name, name_slug, status
FROM hundred
WHERE NOT EXISTS (
SELECT 'X'
FROM temp_data
WHERE
temp_data.name = hundred.name
AND temp_data.name_slug = hundred.name_slug
AND temp_data.status = status
);
Unfortunately, PostgreSQL supports neither MERGE nor ON DUPLICATE KEY UPDATE, so you'll have to do it in two statements:
UPDATE invoices
SET billed = 'TRUE'
WHERE invoices = '12345'
INSERT
INTO invoices (invoiceid, billed)
SELECT '12345', 'TRUE'
WHERE '12345' NOT IN
(
SELECT invoiceid
FROM invoices
)
You can wrap it into a function:
CREATE OR REPLACE FUNCTION fn_upd_invoices(id VARCHAR(32), billed VARCHAR(32))
RETURNS VOID
AS
$$
UPDATE invoices
SET billed = $2
WHERE invoices = $1;
INSERT
INTO invoices (invoiceid, billed)
SELECT $1, $2
WHERE $1 NOT IN
(
SELECT invoiceid
FROM invoices
);
$$
LANGUAGE 'sql';
and just call it:
SELECT fn_upd_invoices('12345', 'TRUE')
This is exactly the problem I face and my version is 9.5
And I solve it with SQL query below.
INSERT INTO example_table (id, name)
SELECT 1 AS id, 'John' AS name FROM example_table
WHERE NOT EXISTS(
SELECT id FROM example_table WHERE id = 1
)
LIMIT 1;
Hope that will help someone who has the same issue with version >= 9.5.
Thanks for reading.
You can make use of VALUES - available in Postgres:
INSERT INTO person (name)
SELECT name FROM person
UNION
VALUES ('Bob')
EXCEPT
SELECT name FROM person;
I know this question is from a while ago, but thought this might help someone. I think the easiest way to do this is via a trigger. E.g.:
Create Function ignore_dups() Returns Trigger
As $$
Begin
If Exists (
Select
*
From
hundred h
Where
-- Assuming all three fields are primary key
h.name = NEW.name
And h.hundred_slug = NEW.hundred_slug
And h.status = NEW.status
) Then
Return NULL;
End If;
Return NEW;
End;
$$ Language plpgsql;
Create Trigger ignore_dups
Before Insert On hundred
For Each Row
Execute Procedure ignore_dups();
Execute this code from a psql prompt (or however you like to execute queries directly on the database). Then you can insert as normal from Python. E.g.:
sql = "Insert Into hundreds (name, name_slug, status) Values (%s, %s, %s)"
cursor.execute(sql, (hundred, hundred_slug, status))
Note that as #Thomas_Wouters already mentioned, the code above takes advantage of parameters rather than concatenating the string.
There is a nice way of doing conditional INSERT in PostgreSQL using WITH query:
Like:
WITH a as(
select
id
from
schema.table_name
where
column_name = your_identical_column_value
)
INSERT into
schema.table_name
(col_name1, col_name2)
SELECT
(col_name1, col_name2)
WHERE NOT EXISTS (
SELECT
id
FROM
a
)
RETURNING id
we can simplify the query using upsert
insert into invoices (invoiceid, billed)
values ('12345', 'TRUE')
on conflict (invoiceid) do
update set billed=EXCLUDED.billed;
INSERT .. WHERE NOT EXISTS is good approach. And race conditions can be avoided by transaction "envelope":
BEGIN;
LOCK TABLE hundred IN SHARE ROW EXCLUSIVE MODE;
INSERT ... ;
COMMIT;
It's easy with rules:
CREATE RULE file_insert_defer AS ON INSERT TO file
WHERE (EXISTS ( SELECT * FROM file WHERE file.id = new.id)) DO INSTEAD NOTHING
But it fails with concurrent writes ...
The approach with the most upvotes (from John Doe) does somehow work for me but in my case from expected 422 rows i get only 180.
I couldn't find anything wrong and there are no errors at all, so i looked for a different simple approach.
Using IF NOT FOUND THEN after a SELECT just works perfectly for me.
(described in PostgreSQL Documentation)
Example from documentation:
SELECT * INTO myrec FROM emp WHERE empname = myname;
IF NOT FOUND THEN
RAISE EXCEPTION 'employee % not found', myname;
END IF;
psycopgs cursor class has the attribute rowcount.
This read-only attribute specifies the number of rows that the last
execute*() produced (for DQL statements like SELECT) or affected (for
DML statements like UPDATE or INSERT).
So you could try UPDATE first and INSERT only if rowcount is 0.
But depending on activity levels in your database you may hit a race condition between UPDATE and INSERT where another process may create that record in the interim.
Your column "hundred" seems to be defined as primary key and therefore must be unique which is not the case. The problem isn't with, it is with your data.
I suggest you insert an id as serial type to handly the primary key
If you say that many of your rows are identical you will end checking many times. You can send them and the database will determine if insert it or not with the ON CONFLICT clause as follows
INSERT INTO Hundred (name,name_slug,status) VALUES ("sql_string += hundred
+",'" + hundred_slug + "', " + status + ") ON CONFLICT ON CONSTRAINT
hundred_pkey DO NOTHING;" cursor.execute(sql_string);
INSERT INTO invoices (invoiceid, billed) (
SELECT '12345','TRUE' WHERE NOT EXISTS (
SELECT 1 FROM invoices WHERE invoiceid='12345' AND billed='TRUE'
)
)
I was looking for a similar solution, trying to find SQL that work work in PostgreSQL as well as HSQLDB. (HSQLDB was what made this difficult.) Using your example as a basis, this is the format that I found elsewhere.
sql = "INSERT INTO hundred (name,name_slug,status)"
sql += " ( SELECT " + hundred + ", '" + hundred_slug + "', " + status
sql += " FROM hundred"
sql += " WHERE name = " + hundred + " AND name_slug = '" + hundred_slug + "' AND status = " + status
sql += " HAVING COUNT(*) = 0 );"
Here is a generic python function that given a tablename, columns and values, generates the upsert equivalent for postgresql.
import json
def upsert(table_name, id_column, other_columns, values_hash):
template = """
WITH new_values ($$ALL_COLUMNS$$) as (
values
($$VALUES_LIST$$)
),
upsert as
(
update $$TABLE_NAME$$ m
set
$$SET_MAPPINGS$$
FROM new_values nv
WHERE m.$$ID_COLUMN$$ = nv.$$ID_COLUMN$$
RETURNING m.*
)
INSERT INTO $$TABLE_NAME$$ ($$ALL_COLUMNS$$)
SELECT $$ALL_COLUMNS$$
FROM new_values
WHERE NOT EXISTS (SELECT 1
FROM upsert up
WHERE up.$$ID_COLUMN$$ = new_values.$$ID_COLUMN$$)
"""
all_columns = [id_column] + other_columns
all_columns_csv = ",".join(all_columns)
all_values_csv = ','.join([query_value(values_hash[column_name]) for column_name in all_columns])
set_mappings = ",".join([ c+ " = nv." +c for c in other_columns])
q = template
q = q.replace("$$TABLE_NAME$$", table_name)
q = q.replace("$$ID_COLUMN$$", id_column)
q = q.replace("$$ALL_COLUMNS$$", all_columns_csv)
q = q.replace("$$VALUES_LIST$$", all_values_csv)
q = q.replace("$$SET_MAPPINGS$$", set_mappings)
return q
def query_value(value):
if value is None:
return "NULL"
if type(value) in [str, unicode]:
return "'%s'" % value.replace("'", "''")
if type(value) == dict:
return "'%s'" % json.dumps(value).replace("'", "''")
if type(value) == bool:
return "%s" % value
if type(value) == int:
return "%s" % value
return value
if __name__ == "__main__":
my_table_name = 'mytable'
my_id_column = 'id'
my_other_columns = ['field1', 'field2']
my_values_hash = {
'id': 123,
'field1': "john",
'field2': "doe"
}
print upsert(my_table_name, my_id_column, my_other_columns, my_values_hash)
The solution in simple, but not immediatly.
If you want use this instruction, you must make one change to the db:
ALTER USER user SET search_path to 'name_of_schema';
after these changes "INSERT" will work correctly.
Related
Assume I have the following table
CREATE TABLE my_table (
id UUID,
field_1 VARCHAR
field_2 VARCHAR
field_3 VARCHAR
);
This table is linked to a form in front-end where I can update field_1, field_2 and field_3, but I want to only update the fields that the user has filled.
For software and team reasons, we have the following function to update, built and called from front-end:
CREATE OR REPLACE FUNCTION public.update_my_table(
_my_table_input my_table%ROWTYPE
) RETURNS VOID AS $$
BEGIN
UPDATE
my_table
SET
field_1 = _my_table_input.field_1
field_2 = _my_table_input.field_2,
field_3 = _my_table_input.field_3
WHERE
id = _my_table.id
END;
_my_table_input represents a row of table with an id and a list of all fields that the user has modified.
In the code I'm treating, the issue I'm facing is that I'm passing a table row which can be incomplete, meaning that for instance _my_table_input.field_3 can be null, and in this case, I don't want to update field_3
I could change the way the front works, but I was thinking and I'm looking for an elegant to UPDATE my data of only defined fields of the function input, something like:
UPDATE
my_table
SET
field_1 = _my_table_input.field_1 IF EXISTS(_my_table_input.field_1)
field_2 = _my_table_input.field_2 IF EXISTS(_my_table_input.field_2),
field_3 = _my_table_input.field_3 IF EXISTS(_my_table_input.field_3)
WHERE
id = _my_table.id
I checked online, and on PostgreSQL docs, but I could not find what I want. So I thought maybe some of you could have a brilliant idea.
How about using "dynamic query" approach?
I'd like to share the idea, although I probably don't know the most graceful way to do so with Postgreql, but I think it may go like these:
with
-- combining each desired field in temp table
set_fields (field) as (
values
(case
when _my_table_input.field1 is not null
then concat('field1 = ', _my_table_input.field1)
else null
end),
(case
when _my_table_input.field2 is not null
then concat('field2 = ', _my_table_input.field2)
else null
end),
),
-- aggregating parts of statement in another temp table
update_sql_statement as (
select
concat(
'update ',
'my_table ',
'set ',
array_to_string(array_agg(field), ', ')
-- ' where id = _my_table.id'
) as statement
from set_fields
)
-- Then take the finally runable statment string out to execute
-- This is the part I'm not sure about, but I believe postgresql have some way of executing certain string from temp.
-- execute(format(select statement from update_sql_statement))
select statement from update_sql_statement
Try see this doc if it help spark something.
This is a simple example of what I need, for any given table, I need to get all the instances of the primary keys, this is a little example, but I need a generic way to do it.
create table foo
(
a numeric
,b text
,c numeric
constraint pk_foo primary key (a,b)
)
insert into foo(a,b,c) values (1,'a',1),(2,'b',2),(3,'c',3);
select <the magical thing>
result
a|b
1 |1|a|
2 |2|b|
3 |3|c|
.. ...
I need to control if the instances of the primary keys are changed by the user, but I don't want to repeat code in too many tables! I need a generic way to do it, I will put <the magical thing>
in a function to put it on a trigger before update and blah blah blah...
In PostgreSQL you must always provide a resulting type for a query. However, you can obtain the code of the query you need, and then execute the query from the client:
create or replace function get_key_only_sql(regclass) returns string as $$
select 'select '|| (
select string_agg(quote_ident(att.attname), ', ' order by col)
from pg_index i
join lateral unnest(indkey) col on (true)
join pg_attribute att on (att.attrelid = i.indrelid and att.attnum = col)
where i.indrelid = $1 and i.indisprimary
group by i.indexrelid
limit 1) || ' from '||$1::text
end;
$$ language sql;
Here's some client pseudocode using the function above:
sql = pgexecscalar("select get_key_only_sql('mytable'::regclass)");
rs = pgopen(sql);
In a SQL Server 2008R2 sproc, I have the following code to calculate the total number of records returned by a dynamic query:
-- count the actual number of results
DECLARE #rowcount TABLE (Value int);
INSERT INTO #rowcount
EXEC('SELECT COUNT(*) ' + #sqlBody);
SELECT #varcharActualNumResults = VALUE FROM #rowcount;
The dynamic query is broken up into three parts: #sqlTop, #sqlBody, and #sqlBottom.
I want to know how many records are actually returned by #sqlBody, but the actual execution is the concatenated #sqlTop + #sqlBody + #sqlBottom.
The issue is that the above query takes 24136ms (roughly 24 seconds) where the actual number of records is around 18,000.
Another issue comes up where I want to get the rowcount for the entire query:
EXEC (#sqlTop + #sqlBody + #sqlBottom)
SET #NumberOfResultsReturned = ##ROWCOUNT;
This execution takes roughly two seconds.
Here is a sample query:
SELECT TOP(10) Title
FROM ItemData
WHERE (
FREETEXT(Title, '"windshield"')
OR
( [Title] LIKE '%mazda 6%' )
)
AND ( WebsiteID=1 )
ORDER BY DateAdded DESC
#sqlBody contains everything between the FROM and before ORDER BY.
How can this be optimized?
UPDATE
In my sproc I defined my temp table then attempt to populate it through the EXEC command. However at the EXEC command line I'm getting an error indicating the temp table has to be defined:
DECLARE #Temp_Results TABLE (ItemListID BIGINT, Title VARCHAR(255) )
EXEC ('INSERT INTO #Temp_Results (ItemListID, Title) (' + #sqlTop + #sqlBody + ')')
If I take out the exec command and execute the SQL it works fine. How can I get around this?
From your update, you've created a Table Variable, not a Temporary Table. Table variables (as with all other variables) are not available in nested scopes. Temp tables are. You can try:
CREATE TABLE #Temp_Results (ItemListID BIGINT, Title VARCHAR(255) )
EXEC ('INSERT INTO #Temp_Results (ItemListID, Title) (' + #sqlTop + #sqlBody + ')')
I understand your problem now and I think the best way is to not use a dynamic query. However, if you have to use the dynamic query here are some ways to speed it up.
Make sure all the items in the where have indexes.
Add your counting into the EXEC so that the row count is returned with one exec call. For example in your dynamic query put the results of the reduction in a temp table and count that table then select from that table.
I'm using PostgreSQL 9.0 and I have a table with just an artificial key (auto-incrementing sequence) and another unique key. (Yes, there is a reason for this table. :)) I want to look up an ID by the other key or, if it doesn't exist, insert it:
SELECT id
FROM mytable
WHERE other_key = 'SOMETHING'
Then, if no match:
INSERT INTO mytable (other_key)
VALUES ('SOMETHING')
RETURNING id
The question: is it possible to save a round-trip to the DB by doing both of these in one statement? I can insert the row if it doesn't exist like this:
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING id
... but that doesn't give the ID of an existing row. Any ideas? There is a unique constraint on other_key, if that helps.
Have you tried to union it?
Edit - this requires Postgres 9.1:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH new_row AS (
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING *
)
SELECT * FROM new_row
UNION
SELECT * FROM mytable WHERE other_key = 'SOMETHING';
results in:
id | other_key
----+-----------
1 | SOMETHING
(1 row)
No, there is no special SQL syntax that allows you to do select or insert. You can do what Ilia mentions and create a sproc, which means it will not do a round trip fromt he client to server, but it will still result in two queries (three actually, if you count the sproc itself).
using 9.5 i successfully tried this
based on Denis de Bernardy's answer
only 1 parameter
no union
no stored procedure
atomic, thus no concurrency problems (i think...)
The Query:
WITH neworexisting AS (
INSERT INTO mytable(other_key) VALUES('hello 2')
ON CONFLICT(other_key) DO UPDATE SET existed=true -- need some update to return sth
RETURNING *
)
SELECT * FROM neworexisting
first call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|false |
second call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|true |
First create your table ;-)
CREATE TABLE mytable (
id serial NOT NULL,
other_key text NOT NULL,
created timestamptz NOT NULL DEFAULT now(),
existed bool NOT NULL DEFAULT false,
CONSTRAINT mytable_pk PRIMARY KEY (id),
CONSTRAINT mytable_uniq UNIQUE (other_key) --needed for on conflict
);
you can use a stored procedure
IF (SELECT id FROM mytable WHERE other_key = 'SOMETHING' LIMIT 1) < 0 THEN
INSERT INTO mytable (other_key) VALUES ('SOMETHING')
END IF
I have an alternative to Denis answer, that I think is less database-intensive, although a bit more complex:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH table_sel AS (
SELECT id
FROM mytable
WHERE other_key = 'test'
UNION
SELECT NULL AS id
ORDER BY id NULLS LAST
LIMIT 1
), table_ins AS (
INSERT INTO mytable (id, other_key)
SELECT
COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)),
'test'
FROM table_sel
ON CONFLICT (id) DO NOTHING
RETURNING id
)
SELECT * FROM table_ins
UNION ALL
SELECT * FROM table_sel
WHERE id IS NOT NULL;
In table_sel CTE I'm looking for the right row. If I don't find it, I assure that table_sel returns at least one row, with a union with a SELECT NULL.
In table_ins CTE I try to insert the same row I was looking for earlier. COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)) is saying: id could be defined, if so, use it; whereas if id is null, increment the sequence on id and use this new value to insert a row. The ON CONFLICT clause assure
that if id is already in mytable I don't insert anything.
At the end I put everything together with a UNION between table_ins and table_sel, so that I'm sure to take my sweet id value and execute both CTE.
This query needs to search for the value other_key only once, and is a "search this value" not a "check if this value not exists in the table", that is very heavy; in Denis alternative you use other_key in both types of searches. In my query you "check if a value not exists" only on id that is a integer primary key, that, for construction, is fast.
Minor tweak a decade late to Denis's excellent answer:
-- Create the table with a unique constraint
CREATE TABLE mytable (
id serial PRIMARY KEY
, other_key varchar NOT NULL UNIQUE
);
WITH new_row AS (
-- Only insert when we don't find anything, avoiding a table lock if
-- possible.
INSERT INTO mytable ( other_key )
SELECT 'SOMETHING'
WHERE NOT EXISTS (
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
)
RETURNING *
)
(
-- This comes first in the UNION ALL since it'll almost certainly be
-- in the query cache. Marginally slower for the insert case, but also
-- marginally faster for the much more common read-only case.
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
-- Don't check for duplicates to be removed
UNION ALL
-- If we reach this point in iteration, we needed to do the INSERT and
-- lock after all.
SELECT *
FROM new_row
) LIMIT 1 -- Just return whatever comes first in the results and allow
-- the query engine to cut processing short for the INSERT
-- calculation.
;
The UNION ALL tells the planner it doesn't have to collect results for de-duplication. The LIMIT 1 at the end allows the planner to short-circuit further processing/iteration once it knows there's an answer available.
NOTE: There is a race condition present here and in the original answer. If the entry does not already exist, the INSERT will fail with a unique constraint violation. The error can be suppressed with ON CONFLICT DO NOTHING, but the query will return an empty set instead of the new row. This is a difficult problem because getting that info from another transaction would violate the I in ACID.
How to delete N oldest entries. I'm limited Sybase. I need to write a stored procedure which would accept a number X and then leave only X newest entries in the table.
For example:
Say ID is auto incremented. The smaller it is, the older this entry is.
ID Text
=========
1 ASD
2 DSA
3 HJK
4 OIU
I need a procedure which would be executed like this.
execute CleanUp 2
and the result will be
ID Text
=========
3 HJK
4 OIU
Note: SQL Server syntax, but should work
Delete from TableName where ID in
(select top N ID from TableName order by ID )
If you want N to be a parameter you will have to construct the statement string and execute it
declare #query varchar(4000)
set #query = 'Delete from TableName where ID in '
set #query = #query + '(select top ' + #N + ' ID from TableName order by ID )'
exec sp_executesql #query
I Like Eduardo's option best as it's the simplest solution, but since Sergej mentions it is quite slow, here's an alternative solution:
Create a stored procedure that does the following:
Create a temp table with the same structure as the original table.
Insert the top N rows into the temp table.
Truncate the original table.
Copy the rows from the temp table back to the original table.
Generally this will be much faster, especially if you have lots of rows in the table.
If you have a clustered index on id, it's safe to execute a delete top query.
delete top 2 from TableName;
I know this is an old question but this can be done without constructing the statement as the top answer say, using a CTE :
WITH MyCTE AS
(
SELECT Field1, Field2, ROW_NUMBER() OVER (ORDER BY Field1 ASC) AS RowNum
FROM MyTable
WHERE Field2 = #WhatIWant
)
DELETE FROM MyCTE WHERE RowNum <= #NbRowsToDelete;