I have created the following function in Postgres 9.3.5:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
RETURNS text AS
$BODY$
DECLARE
   result text;
BEGIN
   select min(id) into result from table
   where  id_used is null and id_type = val2;

   update table set
          id_used = 'Y',
          col1 = val1,
          id_used_date = now()
   where  id_type = val2
   and    id = result;
RETURN result;
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;
When I run this function in a loop over 1000 or more records, it just freezes and says "query is running". When I check my table, nothing is being updated. When I run it for one or two records, it runs fine.
Example of the function being run:
select get_result('123','idtype');
table columns:
id character varying(200),
col1 character varying(200),
id_used character varying(1),
id_used_date timestamp without time zone,
id_type character(200)
id is the table index.
Can someone help?
Most probably you are running into race conditions. When you run your function 1000 times in quick succession in separate transactions, something like this happens:
T1              T2              T3 ...
SELECT min(id)  -- id 1
                SELECT min(id)  -- id 1
                                SELECT min(id)  -- id 1
...
UPDATE id 1
                Row id 1 locked, wait ...
                                Row id 1 locked, wait ...
COMMIT
                Wake up, UPDATE id 1 again!
                COMMIT
                                Wake up, UPDATE id 1 again!
                                COMMIT
...
Largely rewritten and simplified as SQL function:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS
$func$
   UPDATE table t
   SET    id_used = 'Y'
        , col1 = val1
        , id_used_date = now()
   FROM  (
      SELECT id
      FROM   table
      WHERE  id_used IS NULL
      AND    id_type = val2
      ORDER  BY id
      LIMIT  1
      FOR    UPDATE   -- lock to avoid race condition! see below ...
      ) t1
   WHERE  t.id_type = val2
-- AND    t.id_used IS NULL -- repeat condition (not needed if the row is locked)
   AND    t.id = t1.id
   RETURNING t.id;
$func$ LANGUAGE sql;
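The call stays the same as in the question:
SELECT get_result('123', 'idtype');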
Related question with a lot more explanation:
Atomic UPDATE .. SELECT in Postgres
Explain
Don't run two separate SQL statements. That is more expensive and widens the time frame for race conditions. One UPDATE with a subquery is much better.
You don't need PL/pgSQL for this simple task. You still can use PL/pgSQL if you want to; the UPDATE stays the same (a sketch follows below).
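For illustration, a minimal PL/pgSQL sketch of that (my own variant; I use tbl as a stand-in name, since table is a reserved word, and the UPDATE is the same as above):

CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS
$func$
DECLARE
   result text;
BEGIN
   UPDATE tbl t
   SET    id_used = 'Y'
        , col1 = val1
        , id_used_date = now()
   FROM  (
      SELECT id
      FROM   tbl
      WHERE  id_used IS NULL
      AND    id_type = val2
      ORDER  BY id
      LIMIT  1
      FOR    UPDATE          -- same lock as in the SQL version
      ) t1
   WHERE  t.id = t1.id
   RETURNING t.id
   INTO   result;            -- capture the claimed id

   RETURN result;
END
$func$ LANGUAGE plpgsql;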
You need to lock the selected row to defend against race conditions. But you cannot do this with the aggregate function you had, because, per the documentation:
The locking clauses cannot be used in contexts where returned rows
cannot be clearly identified with individual table rows; for example
they cannot be used with aggregation.
Bold emphasis mine. Luckily, you can replace min(id) easily with the equivalent ORDER BY / LIMIT 1 shown above, which can use an index just as well.
If the table is big, you need an index on id at least. Assuming that id is indexed already as the PRIMARY KEY, that helps. But this additional partial multicolumn index would probably help a lot more:
CREATE INDEX foo_idx ON table (id_type, id)
WHERE id_used IS NULL;
Alternative solutions
Advisory locks may be the superior approach here (a sketch follows below):
Postgres UPDATE ... LIMIT 1
Or you may want to lock many rows at once:
How to mark certain nr of rows in table on concurrent access
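For illustration, a rough sketch of an advisory-lock variant (my adaptation, not taken from the linked answers; it assumes mapping the text id to an integer lock key with hashtext() is acceptable):

CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS
$func$
   UPDATE tbl t
   SET    id_used = 'Y'
        , col1 = val1
        , id_used_date = now()
   FROM  (
      SELECT id
      FROM   tbl
      WHERE  id_used IS NULL
      AND    id_type = val2
      AND    pg_try_advisory_xact_lock(hashtext(id))  -- skip rows locked by other transactions
      ORDER  BY id
      LIMIT  1
      ) t1
   WHERE  t.id = t1.id
   RETURNING t.id;
$func$ LANGUAGE sql;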
I would like to write a stored procedure using a statement to iterate through a result-set of records provided by another statement, and union the end results into one single result-set. Can anyone advise on an approach for this?
For example, a generic set of records to iterate through:
SELECT sys.schemas.name + '.' + sys.objects.name as [schm_obj]
FROM sys.objects
INNER JOIN sys.schemas
ON sys.objects.schema_id = sys.schemas.schema_id
AND sys.schemas.name IN ('dbo')
Generic query to be executed on each record:
SELECT DISTINCT referenced_schema_name + '.' + referenced_entity_name
FROM sys.dm_sql_referenced_entities(@schm_obj, 'OBJECT')
The parameter @schm_obj is to be replaced by a single field value returned in each row of the first query; eventually, I would like to union all results. Any advice would be greatly appreciated.
You need dynamic SQL to do this. I am confused, as you say "stored procedure" but then show a table function; table functions (unless something changed in 2012) cannot execute dynamic SQL. If you want a programmable object that returns data for a schema or metadata value that YOU SUPPLY, I would use a dynamic procedure that labels what you asked for and returns its value. You can then insert the results into a table variable and iterate over it, or insert into it to your heart's content.
create proc dbo.Dynaminizer
(
    @ObjName nvarchar(64)
)
as
BEGIN
    declare @SQL nvarchar(1024)

    select @SQL = 'Select ''' + @ObjName + ''' as Name, ' + @ObjName + ' from sys.tables'

    execute sp_executesql @SQL
END

declare @temp table ( name varchar(32), value varchar(128))

insert into @temp
exec Dynaminizer 'name'

insert into @temp
exec Dynaminizer 'object_id'

select *
from @temp
UPDATE LATER THAT DAY....
If you just want to take values from one table or set and then run a function that takes a schema-and-name combo over each of them, there is another method. I would use a CTE to build the qualified name as a custom column, and then CROSS APPLY the function to that dataset, once per row in the set. This evaluates the function as if it were executed for each value and posts the results. There is no need for a union unless you have something more specific in mind. You can take it even further: if you want the dependencies inline (as opposed to on separate rows), you can relate the dataset back to itself and use an XML cast to turn the dependencies into a comma-separated list.
Example usage of both:
-- multi row example to show referencing relationships
with procs as
(
Select
schema_name(schema_id) + '.' + name as Name
from sys.procedures p
)
select
p.Name
, referenced_schema_name + '.' + referenced_entity_name as Ref
from procs p
cross apply sys.dm_sql_referenced_entities(p.Name,'OBJECT')
;
-- take it a step further and put relationships inside a single column
with procs as
(
Select
schema_name(schema_id) + '.' + name as Name
from sys.procedures p
)
, setup as
(
select
p.Name
, referenced_schema_name + '.' + referenced_entity_name as Ref
from procs p
cross apply sys.dm_sql_referenced_entities(p.Name,'OBJECT')
)
Select distinct
Name
, stuff(
(
select
', ' + Ref
from setup x
where x.Name = m.Name
for xml path('')
)
, 1, 2, '') as Dependencies
from setup m
Let's say we have a table with some data in it.
IF OBJECT_ID('dbo.table1') IS NOT NULL
BEGIN
DROP TABLE dbo.table1;
END
CREATE TABLE table1 ( DATA INT );
---------------------------------------------------------------------
-- Generating testing data
---------------------------------------------------------------------
INSERT INTO dbo.table1(data)
SELECT 100
UNION ALL
SELECT 200
UNION ALL
SELECT NULL
UNION ALL
SELECT 400
UNION ALL
SELECT 400
UNION ALL
SELECT 500
UNION ALL
SELECT NULL;
How do I delete the 2nd, 5th, and 6th records in the table? The order is defined by the following query.
SELECT data
FROM dbo.table1
ORDER BY data DESC;
Note: this is in a SQL Server 2000 environment.
Thanks.
In short, you need something in the table to indicate sequence. The "2nd row" is a non sequitur when nothing enforces a sequence. However, a possible solution might be (toy example => toy solution):
If object_id('tempdb..#NumberedData') Is Not Null
Drop Table #NumberedData
Create Table #NumberedData
(
Id int not null identity(1,1) primary key clustered
, data int null
)
Insert #NumberedData( data )
SELECT 100
UNION ALL SELECT 200
UNION ALL SELECT NULL
UNION ALL SELECT 400
UNION ALL SELECT 400
UNION ALL SELECT 500
UNION ALL SELECT NULL
Begin Tran

Delete table1

Insert table1( data )
Select data
From #NumberedData
Where Id Not In(2,5,6)

If @@Error = 0
    Commit Tran
Else
    Rollback Tran
Obviously, this type of solution is not guaranteed to work exactly as you want, but the concept is the best you will get. In essence, you stuff your rows into a table with an identity column and use that to identify the rows to remove. Removing the rows entails emptying the original table and re-populating it with only the rows you want. Without a unique key of some kind, there just is no clean way of handling this problem.
As you are probably aware, you can do this very straightforwardly in later versions using ROW_NUMBER():
delete t from
(select ROW_NUMBER() over (order by data) r from table1) t
where r in (2,5,6)
Even without that, it is possible to use the undocumented %%LOCKRES%% function to differentiate between two identical rows:
SELECT data, %%LOCKRES%%
FROM dbo.table1
I don't think that's available in SQL Server 2000 though.
In SQL, sets don't have order, but cursors do, so you could use something like the below. NB: I was expecting to be able to use DELETE ... WHERE CURRENT OF, but that relies on a PK, so the code to delete a row is not as simple as I was hoping for.
In the event that the data to be deleted is a duplicate, there is no guarantee that this will delete the same row as CURRENT OF would have. However, in that eventuality the ordering of tied rows is arbitrary anyway, so whichever row is deleted could equally well have been given that row number in the cursor ordering.
DECLARE @RowsToDelete TABLE
(
    rowidx INT PRIMARY KEY
)

INSERT INTO @RowsToDelete SELECT 2 UNION SELECT 5 UNION SELECT 6

DECLARE @PrevRowIdx int
DECLARE @CurrentRowIdx int
DECLARE @Offset int
SET @CurrentRowIdx = 1
DECLARE @data int

DECLARE ordered_cursor SCROLL CURSOR FOR
SELECT data
FROM dbo.table1
ORDER BY data

OPEN ordered_cursor
FETCH NEXT FROM ordered_cursor INTO @data

WHILE EXISTS(SELECT * FROM @RowsToDelete)
BEGIN
    SET @PrevRowIdx = @CurrentRowIdx
    SET @CurrentRowIdx = (SELECT TOP 1 rowidx FROM @RowsToDelete ORDER BY rowidx)
    SET @Offset = @CurrentRowIdx - @PrevRowIdx
    DELETE FROM @RowsToDelete WHERE rowidx = @CurrentRowIdx
    FETCH RELATIVE @Offset FROM ordered_cursor INTO @data
    /* Can't use DELETE ... WHERE CURRENT OF, as here that requires a PK */
    SET ROWCOUNT 1
    DELETE FROM dbo.table1 WHERE (data = @data OR (data IS NULL AND @data IS NULL))
    SET ROWCOUNT 0
END

CLOSE ordered_cursor
DEALLOCATE ordered_cursor
To perform any action on a set of rows (such as deleting them), you need to know what identifies those rows.
So, you have to come up with criteria that identifies the rows you want to delete.
Providing a toy example, like the one above, is not particularly useful.
You plan ahead, and if you anticipate this is possible, you add a surrogate key column or some such.
In general, you make sure you don't create tables without PKs.
It's like asking "Say I don't look both directions before crossing the road and I step in front of a bus."
I'm using Python to write to a postgres database:
sql_string = "INSERT INTO hundred (name,name_slug,status) VALUES ("
sql_string += hundred + ", '" + hundred_slug + "', " + status + ");"
cursor.execute(sql_string)
But because some of my rows are identical, I get the following error:
psycopg2.IntegrityError: duplicate key value
violates unique constraint "hundred_pkey"
How can I write an 'INSERT unless this row already exists' SQL statement?
I've seen complex statements like this recommended:
IF EXISTS (SELECT * FROM invoices WHERE invoiceid = '12345')
UPDATE invoices SET billed = 'TRUE' WHERE invoiceid = '12345'
ELSE
INSERT INTO invoices (invoiceid, billed) VALUES ('12345', 'TRUE')
END IF
But firstly, is this overkill for what I need, and secondly, how can I execute one of those as a simple string?
Postgres 9.5 (released 2016-01-07) offers an "upsert" command, also known as an ON CONFLICT clause for INSERT:
INSERT ... ON CONFLICT DO NOTHING/UPDATE
It solves many of the subtle problems you can run into with concurrent operations, which the approaches proposed in some other answers are subject to.
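Applied to the hundred table from the question, a sketch might look like this (the values are placeholders, and it assumes the primary key behind hundred_pkey covers name):

INSERT INTO hundred (name, name_slug, status)
VALUES ('foo', 'foo-slug', '1')
ON CONFLICT DO NOTHING;   -- or: ON CONFLICT (name) DO UPDATE SET ...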
How can I write an 'INSERT unless this row already exists' SQL statement?
There is a nice way of doing conditional INSERT in PostgreSQL:
INSERT INTO example_table
(id, name)
SELECT 1, 'John'
WHERE
NOT EXISTS (
SELECT id FROM example_table WHERE id = 1
);
CAVEAT This approach is not 100% reliable for concurrent write operations, though. There is a very tiny race condition between the SELECT in the NOT EXISTS anti-semi-join and the INSERT itself. It can fail under such conditions.
One approach would be to create a non-constrained (no unique indexes) table to insert all your data into, and do a SELECT DISTINCT from that to do your insert into your hundred table.
At a high level, the steps are below. I assume all three columns are distinct in my example, so for step 3 change the NOT EXISTS join to only join on the unique columns in the hundred table.
Create a temporary table (column types assumed here):
CREATE TEMPORARY TABLE temp_data (name text, name_slug text, status text);
Insert data into the temp table:
INSERT INTO temp_data (name, name_slug, status) VALUES (...);
Add any indexes to the temp table.
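For example (the index name and column choice are illustrative):
CREATE INDEX temp_data_idx ON temp_data (name, name_slug, status);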
Do the main table insert:
INSERT INTO hundred (name, name_slug, status)
SELECT DISTINCT name, name_slug, status
FROM temp_data
WHERE NOT EXISTS (
    SELECT 'X'
    FROM hundred
    WHERE hundred.name = temp_data.name
    AND   hundred.name_slug = temp_data.name_slug
    AND   hundred.status = temp_data.status
);
Unfortunately, PostgreSQL (before 9.5) supports neither MERGE nor ON DUPLICATE KEY UPDATE, so you'll have to do it in two statements:
UPDATE invoices
SET    billed = 'TRUE'
WHERE  invoiceid = '12345'
INSERT
INTO invoices (invoiceid, billed)
SELECT '12345', 'TRUE'
WHERE '12345' NOT IN
(
SELECT invoiceid
FROM invoices
)
You can wrap it into a function:
CREATE OR REPLACE FUNCTION fn_upd_invoices(id VARCHAR(32), billed VARCHAR(32))
RETURNS VOID
AS
$$
UPDATE invoices
SET    billed = $2
WHERE  invoiceid = $1;
INSERT
INTO invoices (invoiceid, billed)
SELECT $1, $2
WHERE $1 NOT IN
(
SELECT invoiceid
FROM invoices
);
$$
LANGUAGE sql;
and just call it:
SELECT fn_upd_invoices('12345', 'TRUE')
This is exactly the problem I faced, and my version is 9.5. I solved it with the SQL query below.
INSERT INTO example_table (id, name)
SELECT 1 AS id, 'John' AS name
WHERE NOT EXISTS (
    SELECT id FROM example_table WHERE id = 1
);
(Selecting the constants without a FROM clause ensures the statement also works when the table is empty.)
Hope that will help someone who has the same issue with version >= 9.5.
Thanks for reading.
You can make use of VALUES - available in Postgres:
INSERT INTO person (name)
SELECT name FROM person
UNION
VALUES ('Bob')
EXCEPT
SELECT name FROM person;
I know this question is from a while ago, but thought this might help someone. I think the easiest way to do this is via a trigger. E.g.:
Create Function ignore_dups() Returns Trigger
As $$
Begin
If Exists (
Select
*
From
hundred h
Where
-- Assuming all three fields are primary key
h.name = NEW.name
And h.name_slug = NEW.name_slug
And h.status = NEW.status
) Then
Return NULL;
End If;
Return NEW;
End;
$$ Language plpgsql;
Create Trigger ignore_dups
Before Insert On hundred
For Each Row
Execute Procedure ignore_dups();
Execute this code from a psql prompt (or however you like to execute queries directly on the database). Then you can insert as normal from Python. E.g.:
sql = "Insert Into hundreds (name, name_slug, status) Values (%s, %s, %s)"
cursor.execute(sql, (hundred, hundred_slug, status))
Note that, as @Thomas_Wouters already mentioned, the code above takes advantage of parameters rather than concatenating the string.
There is a nice way of doing conditional INSERT in PostgreSQL using a WITH query, like:
WITH a AS (
    SELECT id
    FROM   schema.table_name
    WHERE  column_name = your_identical_column_value
)
INSERT INTO schema.table_name
       (col_name1, col_name2)
SELECT value1, value2
WHERE  NOT EXISTS (
    SELECT id
    FROM   a
)
RETURNING id;
(Here value1 and value2 stand for the values you want to insert.)
We can simplify the query using upsert (available since Postgres 9.5):
insert into invoices (invoiceid, billed)
values ('12345', 'TRUE')
on conflict (invoiceid) do
update set billed=EXCLUDED.billed;
INSERT .. WHERE NOT EXISTS is a good approach, and race conditions can be avoided with a transaction "envelope":
BEGIN;
LOCK TABLE hundred IN SHARE ROW EXCLUSIVE MODE;
INSERT ... ;
COMMIT;
It's easy with rules:
CREATE RULE file_insert_defer AS ON INSERT TO file
WHERE (EXISTS ( SELECT * FROM file WHERE file.id = new.id)) DO INSTEAD NOTHING
But it fails with concurrent writes ...
The approach with the most upvotes (from John Doe) does somehow work for me, but in my case I get only 180 rows out of an expected 422.
I couldn't find anything wrong, and there are no errors at all, so I looked for a different, simple approach.
Using IF NOT FOUND THEN after a SELECT works perfectly for me (this is PL/pgSQL, described in the PostgreSQL documentation).
Example from documentation:
SELECT * INTO myrec FROM emp WHERE empname = myname;
IF NOT FOUND THEN
RAISE EXCEPTION 'employee % not found', myname;
END IF;
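Adapted to the hundred table from this question, a minimal PL/pgSQL sketch might look like this (the function and variable names are mine; the same race-condition caveat as with NOT EXISTS applies):

CREATE OR REPLACE FUNCTION insert_hundred(_name text, _slug text, _status text)
  RETURNS void AS
$$
DECLARE
   _existing text;
BEGIN
   SELECT name INTO _existing FROM hundred WHERE name = _name;
   IF NOT FOUND THEN
      INSERT INTO hundred (name, name_slug, status)
      VALUES (_name, _slug, _status);
   END IF;
END;
$$ LANGUAGE plpgsql;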
psycopg's cursor class has the attribute rowcount.
This read-only attribute specifies the number of rows that the last
execute*() produced (for DQL statements like SELECT) or affected (for
DML statements like UPDATE or INSERT).
So you could try UPDATE first and INSERT only if rowcount is 0.
But depending on activity levels in your database you may hit a race condition between UPDATE and INSERT where another process may create that record in the interim.
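A minimal psycopg2 sketch of that pattern, reusing the invoices example from another answer here (the values are placeholders):

cursor.execute(
    "UPDATE invoices SET billed = %s WHERE invoiceid = %s",
    ('TRUE', '12345'),
)
if cursor.rowcount == 0:   # nothing was updated, so the row does not exist yet
    cursor.execute(
        "INSERT INTO invoices (invoiceid, billed) VALUES (%s, %s)",
        ('12345', 'TRUE'),
    )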
Your column "hundred" seems to be defined as primary key and therefore must be unique which is not the case. The problem isn't with, it is with your data.
I suggest you insert an id as serial type to handly the primary key
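For illustration, a sketch of such a table definition (the column types are guesses, since the full schema isn't shown in the question):

CREATE TABLE hundred (
    id        serial PRIMARY KEY,
    name      varchar(200),
    name_slug varchar(200),
    status    varchar(10)
);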
If many of your rows are identical, you will end up checking many times. Instead, you can send them all and let the database determine whether to insert or not with the ON CONFLICT clause, as follows:
INSERT INTO Hundred (name,name_slug,status) VALUES ("sql_string += hundred
+",'" + hundred_slug + "', " + status + ") ON CONFLICT ON CONSTRAINT
hundred_pkey DO NOTHING;" cursor.execute(sql_string);
INSERT INTO invoices (invoiceid, billed) (
SELECT '12345','TRUE' WHERE NOT EXISTS (
SELECT 1 FROM invoices WHERE invoiceid='12345' AND billed='TRUE'
)
)
I was looking for a similar solution, trying to find SQL that would work in PostgreSQL as well as HSQLDB. (HSQLDB was what made this difficult.) Using your example as a basis, this is the format that I found elsewhere.
sql = "INSERT INTO hundred (name,name_slug,status)"
sql += " ( SELECT " + hundred + ", '" + hundred_slug + "', " + status
sql += " FROM hundred"
sql += " WHERE name = " + hundred + " AND name_slug = '" + hundred_slug + "' AND status = " + status
sql += " HAVING COUNT(*) = 0 );"
Here is a generic Python (2.x) function that, given a table name, columns and values, generates the upsert equivalent for PostgreSQL.
import json

def upsert(table_name, id_column, other_columns, values_hash):
    # build an UPDATE-then-INSERT upsert as a single statement using CTEs
    template = """
    WITH new_values ($$ALL_COLUMNS$$) as (
        values
            ($$VALUES_LIST$$)
    ),
    upsert as
    (
        update $$TABLE_NAME$$ m
        set
            $$SET_MAPPINGS$$
        FROM new_values nv
        WHERE m.$$ID_COLUMN$$ = nv.$$ID_COLUMN$$
        RETURNING m.*
    )
    INSERT INTO $$TABLE_NAME$$ ($$ALL_COLUMNS$$)
    SELECT $$ALL_COLUMNS$$
    FROM new_values
    WHERE NOT EXISTS (SELECT 1
                      FROM upsert up
                      WHERE up.$$ID_COLUMN$$ = new_values.$$ID_COLUMN$$)
    """

    all_columns = [id_column] + other_columns
    all_columns_csv = ",".join(all_columns)
    all_values_csv = ','.join([query_value(values_hash[column_name]) for column_name in all_columns])
    set_mappings = ",".join([c + " = nv." + c for c in other_columns])

    q = template
    q = q.replace("$$TABLE_NAME$$", table_name)
    q = q.replace("$$ID_COLUMN$$", id_column)
    q = q.replace("$$ALL_COLUMNS$$", all_columns_csv)
    q = q.replace("$$VALUES_LIST$$", all_values_csv)
    q = q.replace("$$SET_MAPPINGS$$", set_mappings)
    return q

def query_value(value):
    # quote and escape a Python value for inclusion in SQL (Python 2: note unicode)
    if value is None:
        return "NULL"
    if type(value) in [str, unicode]:
        return "'%s'" % value.replace("'", "''")
    if type(value) == dict:
        return "'%s'" % json.dumps(value).replace("'", "''")
    if type(value) == bool:
        return "%s" % value
    if type(value) == int:
        return "%s" % value
    return value

if __name__ == "__main__":
    my_table_name = 'mytable'
    my_id_column = 'id'
    my_other_columns = ['field1', 'field2']
    my_values_hash = {
        'id': 123,
        'field1': "john",
        'field2': "doe"
    }
    print upsert(my_table_name, my_id_column, my_other_columns, my_values_hash)
The solution is simple, but not immediately obvious.
If you want to use this instruction, you must make one change to the db:
ALTER USER user_name SET search_path TO 'name_of_schema';
After these changes, INSERT will work correctly.
How do I delete the N oldest entries? I'm limited to Sybase. I need to write a stored procedure which accepts a number X and then leaves only the X newest entries in the table.
For example:
Say ID is auto incremented. The smaller it is, the older this entry is.
ID Text
=========
1 ASD
2 DSA
3 HJK
4 OIU
I need a procedure which would be executed like this.
execute CleanUp 2
and the result will be
ID Text
=========
3 HJK
4 OIU
Note: SQL Server syntax, but should work
Delete from TableName where ID in
(select top N ID from TableName order by ID )
If you want N to be a parameter, you will have to construct the statement string and execute it:
declare @query nvarchar(4000)
set @query = 'Delete from TableName where ID in '
set @query = @query + '(select top ' + cast(@N as varchar(10)) + ' ID from TableName order by ID )'
exec sp_executesql @query
I like Eduardo's option best, as it's the simplest solution, but since Sergej mentions it is quite slow, here's an alternative solution:
Create a stored procedure that does the following:
Create a temp table with the same structure as the original table.
Insert the X newest rows into the temp table.
Truncate the original table.
Copy the rows from the temp table back to the original table.
Generally this will be much faster, especially if you have lots of rows in the table.
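A rough sketch of such a procedure (my illustration, in SQL Server-style syntax like the accepted answer; Sybase specifics such as SET ROWCOUNT behavior and identity handling on re-insert may differ):

create proc CleanUp @X int
as
begin
    set rowcount @X                    -- limit the next statement to X rows

    select ID, Text
    into   #keep                       -- the X newest rows land in the temp table
    from   TableName
    order  by ID desc

    set rowcount 0

    truncate table TableName           -- fast delete of everything

    insert into TableName (ID, Text)   -- an identity ID column would need identity_insert
    select ID, Text from #keep

    drop table #keep
end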
If you have a clustered index on id, it's safe to execute a delete top query.
delete top 2 from TableName;
I know this is an old question, but this can be done without constructing the statement as the top answer does, by using a CTE:
WITH MyCTE AS
(
    SELECT Field1, Field2, ROW_NUMBER() OVER (ORDER BY Field1 ASC) AS RowNum
    FROM MyTable
    WHERE Field2 = @WhatIWant
)
DELETE FROM MyCTE WHERE RowNum <= @NbRowsToDelete;