Get all instances of primary keys of a table - postgresql

This is a simple example of what I need: for any given table, I need to get all the instances of its primary keys. Here is a small example, but I need a generic way to do it.
create table foo
(
    a numeric,
    b text,
    c numeric,
    constraint pk_foo primary key (a, b)
);
insert into foo (a, b, c) values (1,'a',1), (2,'b',2), (3,'c',3);
select <the magical thing>
result:
 a | b
---+---
 1 | a
 2 | b
 3 | c
 ...
I need to detect when the instances of the primary keys are changed by the user, but I don't want to repeat code across many tables! I need a generic way to do it: I will put <the magical thing>
in a function to use in a trigger before update, and blah blah blah...

In PostgreSQL you must always provide a result type for a query up front. However, you can generate the text of the query you need, and then execute that query from the client:
create or replace function get_key_only_sql(regclass) returns text as $$
select 'select '
       || (select string_agg(quote_ident(att.attname), ', ' order by k.ord)
           -- with ordinality (9.4+) preserves the column order of a composite key
           from pg_index i
           cross join lateral unnest(i.indkey::int2[]) with ordinality as k(col, ord)
           join pg_attribute att
             on att.attrelid = i.indrelid
            and att.attnum = k.col
           where i.indrelid = $1
             and i.indisprimary)
       || ' from ' || $1::text;
$$ language sql;
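For the foo table defined above, the call and its output would look like this (a sketch of the intended behavior, not captured output):
select get_key_only_sql('foo'::regclass);
-- expected result (a single text value):
--   select a, b from foo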
Here's some client pseudocode using the function above:
sql = pgexecscalar("select get_key_only_sql('mytable'::regclass)");
rs = pgopen(sql);
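Since the stated goal is a generic BEFORE UPDATE check, here is a hedged sketch of a trigger function that looks up the primary-key columns in the catalogs at run time and rejects any change to them; the name forbid_pk_update and the error message are assumptions for illustration, not part of the original answer:
create or replace function forbid_pk_update() returns trigger as $$
declare
    col     text;
    changed boolean;
begin
    -- walk the primary-key columns of whatever table the trigger is attached to
    for col in
        select att.attname
        from pg_index i
        join pg_attribute att
          on att.attrelid = i.indrelid
         and att.attnum = any (i.indkey)
        where i.indrelid = TG_RELID
          and i.indisprimary
    loop
        -- compare the column in OLD and NEW by name
        execute format('select ($1).%I is distinct from ($2).%I', col, col)
           into changed
          using OLD, NEW;
        if changed then
            raise exception 'primary key column % must not be changed', col;
        end if;
    end loop;
    return NEW;
end;
$$ language plpgsql;
Attach it per table with create trigger trg_foo_pk before update on foo for each row execute function forbid_pk_update(); (on PostgreSQL 10 and older, use execute procedure instead of execute function).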


Concatenate string instead of just replacing it

I have a table with standard columns where I want to perform regular INSERTs.
But one of the columns is of type varchar with special semantics. It's a string that's supposed to behave as a set of strings, where the elements of the set are separated by commas.
E.g. if one row has in that varchar column the value fish,sheep,dove, and I insert the string ,fish,eagle, I want the result to be fish,sheep,dove,eagle (i.e. eagle gets added to the set, but fish doesn't, because it's already in the set).
I have here this Postgres code that does the "set concatenation" that I want:
SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array('fish,sheep,dove' || ',fish,eagle', ','))) AS x;
But I can't figure out how to apply this logic to insertions.
What I want is something like:
CREATE TABLE IF NOT EXISTS t00(
userid int8 PRIMARY KEY,
a int8,
b varchar);
INSERT INTO t00 (userid,a,b) VALUES (0,1,'fish,sheep,dove');
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array(t00.b || EXCLUDED.b, ','))) AS x;
How can I achieve something like that?
Storing comma separated values is a huge mistake to begin with. But if you really want to make your life harder than it needs to be, you might want to create a function that merges two comma separated lists:
create function merge_lists(p_one text, p_two text)
    returns text
as
$$
    select string_agg(item, ',')
    from (
        select e.item
        from unnest(string_to_array(p_one, ',')) as e(item)
        where e.item <> '' --< necessary because of the leading , in your data
        union
        select t.item
        from unnest(string_to_array(p_two, ',')) t(item)
        where t.item <> ''
    ) t;
$$
language sql;
If you are using Postgres 14 or later, unnest(string_to_array(..., ',')) can be replaced with string_to_table(..., ',').
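Under that assumption, the function body could be rewritten as follows (a sketch; the signature stays the same):
select string_agg(item, ',')
from (
    select item from string_to_table(p_one, ',') as t(item) where item <> ''
    union
    select item from string_to_table(p_two, ',') as t(item) where item <> ''
) t;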
Then your INSERT statement gets a bit simpler:
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = merge_lists(excluded.b, t00.b);
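A quick sanity check of the function on the sample values (a sketch; note that string_agg output order is not guaranteed without an ORDER BY):
select merge_lists('fish,sheep,dove', ',fish,eagle');
-- expected: a comma separated set containing fish, sheep, dove and eagle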
I think I was only missing parentheses around the SELECT statement:
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = (SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array(t00.b || EXCLUDED.b, ','))) AS x);

SELECT clause in FOR LOOP control using plpgsql

I'm trying to write a script that outputs all the foos that are used by only one user; if a foo is used by more than one user, it shouldn't be output.
Here are my tables:
foos (id, value)
users (id, name)
used (foo_id, user_id)
And here is my non-working script:
FUNCTION output_unshared_foos ()
RETURNS foos AS
$a$
DECLARE
foocounts RECORD;
BEGIN
SELECT u.foo_id, count(*)
INTO foocounts -- store in the local variable
FROM used u
GROUP BY u.foo_id;
FOR f IN SELECT * FROM foos
LOOP
IF (SELECT fc.count < 2 FROM foocounts fc WHERE fc.foo_id = f.id) THEN
RETURN NEXT f;
END IF;
END LOOP;
END
$a$ language plpgsql;
It doesn't seem to work: every row is returned, and the conditional check seems to always be true.
Your first problem is that you can't store the result of a query that returns more than one row into a single variable (the SELECT u.foo_id, count(*) INTO ... part). I'm surprised you don't get a runtime error when you call your function.
Your function also doesn't compile: the record f is not declared, and a function defined as returns foos can't use return next (that requires returns setof foos).
But your approach is wrong (even if it worked). Doing row-by-row processing is almost always the wrong choice in SQL. SQL and relational databases are meant to handle sets, not single rows.
Your problem can be solved with a single query:
select foo_id
from used
group by foo_id
having count(distinct user_id) = 1
This will return all foo ids that are used by exactly one user.
If you need the additional information from the foos table, you can join the above query to the foos table:
select f.*
from foos f
join (
select foo_id
from used
group by foo_id
having count(distinct user_id) = 1
) u on f.id = u.foo_id
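If you still want this wrapped in a set-returning function like the original attempt, a minimal sketch (assuming the tables above; drop the old function first if its signature differs) could be:
create or replace function output_unshared_foos()
returns setof foos as $$
    select f.*
    from foos f
    join (
        select foo_id
        from used
        group by foo_id
        having count(distinct user_id) = 1
    ) u on f.id = u.foo_id;
$$ language sql;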

Get columns that differ between 2 rows

I have a table company with 60 columns. The goal is to create a tool to find, compare and eliminate duplicates in this table.
Example: I find 2 companies that potentially are the same, but I need to know which values (columns) differ between these 2 rows in order to continue.
I could compare the rows column by column, 60 times over, but I'm looking for a simpler and more generic solution.
Something like:
SELECT * FROM company where co_id=22
SHOW DIFFERENCE
SELECT * FROM company where co_id=33
The result should be the column names that differ.
For this you may use an intermediate key/value representation of the rows, with JSON functions or alternatively with the hstore extension (largely superseded by JSON for this purpose). JSON comes built in with every reasonably recent version of PostgreSQL, whereas hstore must be installed in the database with CREATE EXTENSION.
Demo:
CREATE TABLE table1 (id int primary key, t1 text, t2 text, t3 text);
Let's insert two rows that differ by the primary key and one other column (t3).
INSERT INTO table1 VALUES
(1,'foo','bar','baz'),
(2,'foo','bar','biz');
Solution with json
First we get a key/value representation of each row along with its original row number, then we pair consecutive rows by row number and
filter out the keys whose values are the same:
WITH rowcols AS (
    select rn, key, value
    from (select row_number() over () as rn,
                 row_to_json(table1.*) as r
          from table1) AS s
    cross join lateral json_each_text(s.r)
)
select r1.key
from rowcols r1
join rowcols r2 on (r1.rn = r2.rn - 1 and r1.key = r2.key)
where r1.value <> r2.value;
Sample result:
key
-----
id
t3
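To compare two specific rows by primary key rather than consecutive rows, a variant of the same idea could look like this (a sketch against the demo table; is distinct from also reports NULL vs. non-NULL differences):
select j1.key
from json_each_text((select row_to_json(t) from table1 t where id = 1)) j1
join json_each_text((select row_to_json(t) from table1 t where id = 2)) j2
  on j1.key = j2.key
where j1.value is distinct from j2.value;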
Solution with hstore
SELECT skeys(h1-h2) from
(select hstore(t.*) as h1 from table1 t where id=1) h1
CROSS JOIN
(select hstore(t.*) as h2 from table1 t where id=2) h2;
h1-h2 computes the difference key by key and skeys() outputs the result as a set.
Result:
skeys
-------
id
t3
The select-list might be refined with skeys((h1-h2)-'id'::text) to always remove id, which, as the primary key, will obviously always differ between rows.
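As noted above, hstore must be installed once per database before this query works:
CREATE EXTENSION IF NOT EXISTS hstore;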
Here's a stored procedure that should get you most of the way...
While this should work "as is", it has no error checking, which you should add.
It gets all the columns in the table and loops over them. A difference is detected when the count of distinct values across the two rows is more than one.
The output is:
The count of the number of differences
A NOTICE for each column where there is a difference
It might be more useful to return a rowset of the columns with the differences. Anyway, good luck!
Usage:
SELECT showdifference('public','company','co_id',22,33)
CREATE OR REPLACE FUNCTION showdifference(p_schema text, p_tablename text, p_idcolumn text, p_firstid integer, p_secondid integer)
RETURNS INTEGER AS
$BODY$
DECLARE
    l_diffcount INTEGER;
    l_column text;
    l_dupcount integer;
    column_cursor CURSOR FOR
        select column_name
        from information_schema.columns
        where table_name = p_tablename
          and table_schema = p_schema
          and column_name <> p_idcolumn;
BEGIN
    -- need error checking here, to ensure the table and schema exist and the columns exist
    -- Should also check that the record ids exist.
    -- Should also check that the column type of the id field is integer
    -- Set the number of differences to zero.
    l_diffcount := 0;
    -- use a cursor to iterate over the columns found in information_schema.columns
    OPEN column_cursor;
    LOOP
        FETCH column_cursor INTO l_column;
        EXIT WHEN NOT FOUND;
        -- build a query to see if there is a difference between the columns. If there is, raise a notice
        EXECUTE 'select count(distinct ' || quote_ident(l_column) || ' ) from '
                || quote_ident(p_schema) || '.' || quote_ident(p_tablename)
                || ' where ' || quote_ident(p_idcolumn) || ' in (' || p_firstid || ',' || p_secondid || ')'
            INTO l_dupcount;
        IF l_dupcount > 1 THEN
            -- increment the counter
            l_diffcount := l_diffcount + 1;
            RAISE NOTICE '% has % differences', l_column, l_dupcount; -- for "real" you might want to return a rowset and could do something here
        END IF;
    END LOOP;
    CLOSE column_cursor;
    RETURN l_diffcount;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT
COST 100;

How to work around the "Recursive CTE member can refer itself only in FROM clause" requirement?

I'm trying to run a graph search to find all nodes accessible from a starting point, like so:
with recursive
nodes_traversed as (
select START_NODE ID
from START_POSITION
union all
select ed.DST_NODE
from EDGES ed
join nodes_traversed NT
on (NT.ID = ed.START_NODE)
and (ed.DST_NODE not in (select ID from nodes_traversed))
)
select distinct * from nodes_traversed
Unfortunately, when I try to run that, I get an error:
Recursive CTE member (nodes_traversed) can refer itself only in FROM clause.
That "not in select" clause is important to the recursive expression, though, as it provides the ending point. (Without it, you get infinite recursion.) Using generation counting, like in the accepted answer to this question, would not help, since this is a highly cyclic graph.
Is there any way to work around this without having to create a stored proc that does it iteratively?
Here is my solution, which uses a global temporary table. I have limited the recursion both by level and by the nodes already collected in the temporary table.
I am not sure how it will perform on a large data set.
create procedure get_nodes (
    START_NODE integer)
returns (
    NODE_ID integer)
as
declare variable C1 integer;
declare variable C2 integer;
begin
  /**
  create global temporary table id_list(
      id integer
  );
  create index id_list_idx1 ON id_list (id);
  */
  delete from id_list;
  while ( 1 = 1 ) do
  begin
    select count(distinct id) from id_list into :c1;
    insert into id_list
    select id from
    (
      with recursive nodes_traversed as (
        select :START_NODE as ID, 0 as Lv
        from RDB$DATABASE
        union all
        select ed.DST_NODE, Lv + 1
        from EDGES ed
        join nodes_traversed NT
          on (NT.ID = ed.START_NODE)
         and nt.Lv < 5 -- Max recursion level
         and nt.id not in (select id from id_list)
      )
      select distinct id from nodes_traversed
    );
    select count(distinct id) from id_list into :c2;
    if (c1 = c2) then break;
  end
  for select distinct id from id_list into :node_id do
  begin
    suspend;
  end
end

Postgres: INSERT if does not exist already

I'm using Python to write to a postgres database:
sql_string = "INSERT INTO hundred (name,name_slug,status) VALUES ("
sql_string += hundred + ", '" + hundred_slug + "', " + status + ");"
cursor.execute(sql_string)
But because some of my rows are identical, I get the following error:
psycopg2.IntegrityError: duplicate key value
violates unique constraint "hundred_pkey"
How can I write an 'INSERT unless this row already exists' SQL statement?
I've seen complex statements like this recommended:
IF EXISTS (SELECT * FROM invoices WHERE invoiceid = '12345')
UPDATE invoices SET billed = 'TRUE' WHERE invoiceid = '12345'
ELSE
INSERT INTO invoices (invoiceid, billed) VALUES ('12345', 'TRUE')
END IF
But firstly, is this overkill for what I need, and secondly, how can I execute one of those as a simple string?
Postgres 9.5 (released 2016-01-07) introduced an "upsert" command, also known as the ON CONFLICT clause of INSERT:
INSERT ... ON CONFLICT DO NOTHING/UPDATE
It solves many of the subtle concurrency problems you can otherwise run into with the approaches some other answers propose.
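Applied to the question's hundred table, it could look like this (a sketch; the literal values are placeholders):
INSERT INTO hundred (name, name_slug, status)
VALUES ('Alice', 'alice', 1)
ON CONFLICT DO NOTHING;  -- skip the row if the primary key already exists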
How can I write an 'INSERT unless this row already exists' SQL statement?
There is a nice way of doing conditional INSERT in PostgreSQL:
INSERT INTO example_table
(id, name)
SELECT 1, 'John'
WHERE
NOT EXISTS (
SELECT id FROM example_table WHERE id = 1
);
CAVEAT This approach is not 100% reliable for concurrent write operations, though. There is a very tiny race condition between the SELECT in the NOT EXISTS anti-semi-join and the INSERT itself. It can fail under such conditions.
One approach would be to create a non-constrained (no unique indexes) table, insert all your data into it, and do a SELECT DISTINCT from it for the insert into your hundred table.
At a high level: I assume all three columns are distinct in my example, so for step 3 change the NOT EXISTS join to only join on the unique columns in the hundred table.
Create the temporary table (see the PostgreSQL docs for CREATE TABLE):
CREATE TEMPORARY TABLE temp_data (
    name text,        -- column types are assumptions; match them to hundred
    name_slug text,
    status integer
);
INSERT data into the temp table:
INSERT INTO temp_data (name, name_slug, status)
VALUES (...);  -- your candidate rows go here
Add any indexes to the temp table.
Do the main table insert:
INSERT INTO hundred (name, name_slug, status)
SELECT DISTINCT name, name_slug, status
FROM temp_data
WHERE NOT EXISTS (
    SELECT 'X'
    FROM hundred
    WHERE hundred.name = temp_data.name
      AND hundred.name_slug = temp_data.name_slug
      AND hundred.status = temp_data.status
);
Unfortunately, PostgreSQL supported neither MERGE nor ON DUPLICATE KEY UPDATE when this was written (ON CONFLICT arrived in 9.5 and MERGE in PostgreSQL 15), so you had to do it in two statements:
UPDATE invoices
SET billed = 'TRUE'
WHERE invoiceid = '12345';

INSERT
INTO invoices (invoiceid, billed)
SELECT '12345', 'TRUE'
WHERE '12345' NOT IN
    (
    SELECT invoiceid
    FROM invoices
    );
You can wrap it into a function:
CREATE OR REPLACE FUNCTION fn_upd_invoices(id VARCHAR(32), billed VARCHAR(32))
RETURNS VOID
AS
$$
    UPDATE invoices
    SET billed = $2
    WHERE invoiceid = $1;

    INSERT
    INTO invoices (invoiceid, billed)
    SELECT $1, $2
    WHERE $1 NOT IN
        (
        SELECT invoiceid
        FROM invoices
        );
$$
LANGUAGE sql;
and just call it:
SELECT fn_upd_invoices('12345', 'TRUE')
This is exactly the problem I faced, and my version is 9.5.
I solved it with the SQL query below.
INSERT INTO example_table (id, name)
SELECT 1 AS id, 'John' AS name FROM example_table
WHERE NOT EXISTS(
SELECT id FROM example_table WHERE id = 1
)
LIMIT 1;
Note that this variant inserts nothing when example_table is empty, because the outer SELECT then has no source rows. Hope that helps someone who has the same issue with version >= 9.5.
Thanks for reading.
You can make use of VALUES - available in Postgres:
INSERT INTO person (name)
SELECT name FROM person
UNION
VALUES ('Bob')
EXCEPT
SELECT name FROM person;
I know this question is from a while ago, but thought this might help someone. I think the easiest way to do this is via a trigger. E.g.:
Create Function ignore_dups() Returns Trigger
As $$
Begin
If Exists (
Select
*
From
hundred h
Where
-- Assuming all three fields are primary key
h.name = NEW.name
And h.name_slug = NEW.name_slug
And h.status = NEW.status
) Then
Return NULL;
End If;
Return NEW;
End;
$$ Language plpgsql;
Create Trigger ignore_dups
Before Insert On hundred
For Each Row
Execute Procedure ignore_dups();
Execute this code from a psql prompt (or however you like to execute queries directly on the database). Then you can insert as normal from Python. E.g.:
sql = "Insert Into hundreds (name, name_slug, status) Values (%s, %s, %s)"
cursor.execute(sql, (hundred, hundred_slug, status))
Note that, as @Thomas_Wouters already mentioned, the code above takes advantage of parameters rather than concatenating the string.
There is a nice way of doing a conditional INSERT in PostgreSQL using a WITH query, like:
WITH a AS (
    SELECT id
    FROM schema.table_name
    WHERE column_name = your_identical_column_value
)
INSERT INTO schema.table_name
    (col_name1, col_name2)
SELECT
    value1, value2  -- the new values to insert
WHERE NOT EXISTS (
    SELECT id
    FROM a
)
RETURNING id
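Instantiated against the invoices example from earlier in this question, the pattern could read (a sketch; values are placeholders):
WITH a AS (
    SELECT invoiceid
    FROM invoices
    WHERE invoiceid = '12345'
)
INSERT INTO invoices (invoiceid, billed)
SELECT '12345', 'TRUE'
WHERE NOT EXISTS (
    SELECT invoiceid
    FROM a
)
RETURNING invoiceid;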
We can simplify the query using an upsert:
insert into invoices (invoiceid, billed)
values ('12345', 'TRUE')
on conflict (invoiceid) do
update set billed=EXCLUDED.billed;
INSERT .. WHERE NOT EXISTS is a good approach. And race conditions can be avoided by a transaction "envelope":
BEGIN;
LOCK TABLE hundred IN SHARE ROW EXCLUSIVE MODE;
INSERT ... ;
COMMIT;
It's easy with rules:
CREATE RULE file_insert_defer AS ON INSERT TO file
WHERE (EXISTS ( SELECT * FROM file WHERE file.id = new.id)) DO INSTEAD NOTHING
But it fails with concurrent writes ...
The approach with the most upvotes (from John Doe) does somehow work for me, but in my case I got only 180 of the expected 422 rows.
I couldn't find anything wrong and there were no errors at all, so I looked for a different, simple approach.
Using IF NOT FOUND THEN after a SELECT works perfectly for me.
(described in the PostgreSQL documentation)
Example from documentation:
SELECT * INTO myrec FROM emp WHERE empname = myname;
IF NOT FOUND THEN
RAISE EXCEPTION 'employee % not found', myname;
END IF;
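Transferred to the invoices example used elsewhere on this page, the same pattern could be wrapped in an anonymous plpgsql block (a sketch, not from the original answer):
DO $$
DECLARE
    myrec invoices%ROWTYPE;
BEGIN
    SELECT * INTO myrec FROM invoices WHERE invoiceid = '12345';
    IF NOT FOUND THEN
        INSERT INTO invoices (invoiceid, billed) VALUES ('12345', 'TRUE');
    END IF;
END $$;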
psycopg's cursor class has the attribute rowcount.
This read-only attribute specifies the number of rows that the last
execute*() produced (for DQL statements like SELECT) or affected (for
DML statements like UPDATE or INSERT).
So you could try UPDATE first and INSERT only if rowcount is 0.
But depending on activity levels in your database you may hit a race condition between UPDATE and INSERT where another process may create that record in the interim.
Your column "hundred" seems to be defined as primary key and therefore must be unique which is not the case. The problem isn't with, it is with your data.
I suggest you insert an id as serial type to handly the primary key
If many of your rows are identical, you will end up checking many times. Instead, you can send them all and let the database determine whether to insert each one, with the ON CONFLICT clause, as follows:
sql_string = "INSERT INTO hundred (name, name_slug, status) VALUES ("
sql_string += hundred + ", '" + hundred_slug + "', " + status + ")"
sql_string += " ON CONFLICT ON CONSTRAINT hundred_pkey DO NOTHING;"
cursor.execute(sql_string)
INSERT INTO invoices (invoiceid, billed) (
SELECT '12345','TRUE' WHERE NOT EXISTS (
SELECT 1 FROM invoices WHERE invoiceid='12345' AND billed='TRUE'
)
)
I was looking for a similar solution, trying to find SQL that would work in PostgreSQL as well as HSQLDB. (HSQLDB was what made this difficult.) Using your example as a basis, this is the format that I found elsewhere.
sql = "INSERT INTO hundred (name,name_slug,status)"
sql += " ( SELECT " + hundred + ", '" + hundred_slug + "', " + status
sql += " FROM hundred"
sql += " WHERE name = " + hundred + " AND name_slug = '" + hundred_slug + "' AND status = " + status
sql += " HAVING COUNT(*) = 0 );"
Here is a generic Python function that, given a table name, columns and values, generates the upsert equivalent for PostgreSQL.
import json

def upsert(table_name, id_column, other_columns, values_hash):
    template = """
    WITH new_values ($$ALL_COLUMNS$$) as (
        values
        ($$VALUES_LIST$$)
    ),
    upsert as
    (
        update $$TABLE_NAME$$ m
        set
        $$SET_MAPPINGS$$
        FROM new_values nv
        WHERE m.$$ID_COLUMN$$ = nv.$$ID_COLUMN$$
        RETURNING m.*
    )
    INSERT INTO $$TABLE_NAME$$ ($$ALL_COLUMNS$$)
    SELECT $$ALL_COLUMNS$$
    FROM new_values
    WHERE NOT EXISTS (SELECT 1
                      FROM upsert up
                      WHERE up.$$ID_COLUMN$$ = new_values.$$ID_COLUMN$$)
    """
    all_columns = [id_column] + other_columns
    all_columns_csv = ",".join(all_columns)
    all_values_csv = ','.join([query_value(values_hash[column_name]) for column_name in all_columns])
    set_mappings = ",".join([c + " = nv." + c for c in other_columns])

    q = template
    q = q.replace("$$TABLE_NAME$$", table_name)
    q = q.replace("$$ID_COLUMN$$", id_column)
    q = q.replace("$$ALL_COLUMNS$$", all_columns_csv)
    q = q.replace("$$VALUES_LIST$$", all_values_csv)
    q = q.replace("$$SET_MAPPINGS$$", set_mappings)
    return q

def query_value(value):
    # quote/format a Python value as a SQL literal
    if value is None:
        return "NULL"
    if isinstance(value, str):
        return "'%s'" % value.replace("'", "''")
    if isinstance(value, dict):
        return "'%s'" % json.dumps(value).replace("'", "''")
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "%s" % value
    if isinstance(value, int):
        return "%s" % value
    return value

if __name__ == "__main__":
    my_table_name = 'mytable'
    my_id_column = 'id'
    my_other_columns = ['field1', 'field2']
    my_values_hash = {
        'id': 123,
        'field1': "john",
        'field2': "doe"
    }
    print(upsert(my_table_name, my_id_column, my_other_columns, my_values_hash))
The solution is simple, but not immediately obvious. If you want to use this statement, you must make one change to the database:
ALTER USER user SET search_path TO 'name_of_schema';
After this change, INSERT will work correctly.