Generic trigger to restrict insertions based on count - PostgreSQL

Background
In a PostgreSQL 9.0 database, there are various tables that have many-to-many relationships. The number of those relationships must be restricted. A couple of example tables include:
CREATE TABLE authentication (
id bigserial NOT NULL, -- Primary key
cookie character varying(64) NOT NULL, -- Authenticates the user with a cookie
ip_address character varying(40) NOT NULL -- Device IP address (IPv6-friendly)
);
CREATE TABLE tag_comment (
id bigserial NOT NULL, -- Primary key
comment_id bigint, -- Foreign key to the comment table
tag_name_id bigint -- Foreign key to the tag name table
);
Different relationships, however, have different limitations. For example, in the authentication table, a given ip_address is allowed 1024 cookie values; whereas, in the tag_comment table, each comment_id can have 10 associated tag_name_ids.
Problem
Currently, a number of functions have these restrictions hard-coded, scattering the limitations throughout the database and preventing them from being changed dynamically.
Question
How would you impose a maximum many-to-many relationship limit on tables in a generic fashion?
Idea
Create a table to track the limits:
CREATE TABLE imposed_maximums (
id serial NOT NULL,
table_name character varying(128) NOT NULL,
column_group character varying(128) NOT NULL,
column_count character varying(128) NOT NULL,
max_size INTEGER
);
Establish the restrictions:
INSERT INTO imposed_maximums
(table_name, column_group, column_count, max_size) VALUES
('authentication', 'ip_address', 'cookie', 1024);
INSERT INTO imposed_maximums
(table_name, column_group, column_count, max_size) VALUES
('tag_comment', 'comment_id', 'tag_name_id', 10);
Create a trigger function:
CREATE OR REPLACE FUNCTION impose_maximum()
RETURNS trigger AS
$BODY$
BEGIN
-- Join this up with imposed_maximums somehow?
select
count(1)
from
-- the table name
where
-- the group column = NEW value to INSERT;
RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql;
Attach the trigger to every table:
CREATE TRIGGER trigger_authentication_impose_maximum
BEFORE INSERT
ON authentication
FOR EACH ROW
EXECUTE PROCEDURE impose_maximum();
Obviously it won't work as written... is there a way to make it work, or otherwise enforce the restrictions such that they are:
in a single location; and
not hard-coded?
Thank you!

I've been writing similar generic triggers.
The trickiest part is getting the value of an entry in the NEW record based on the column name.
I'm doing it the following way:
convert the NEW data into an array;
find the attnum of the column and use it as an index into the array.
This approach works as long as there are no commas in the data :( I don't know of another way to convert the NEW or OLD variables into an array of values.
The following function might help:
CREATE OR REPLACE FUNCTION impose_maximum() RETURNS trigger AS $impose_maximum$
DECLARE
_sql text;
_cnt int8;
_vals text[];
_anum int4;
_im record;
BEGIN
_vals := string_to_array(translate(trim(NEW::text), '()', ''), ',');
FOR _im IN SELECT * FROM imposed_maximums WHERE table_name = TG_TABLE_NAME LOOP
SELECT attnum INTO _anum FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class t ON t.oid = a.attrelid
WHERE t.relkind = 'r' AND t.relname = TG_TABLE_NAME
AND NOT a.attisdropped AND a.attname = _im.column_group;
_sql := 'SELECT count('||quote_ident(_im.column_count)||')'||
' FROM '||quote_ident(_im.table_name)||
' WHERE '||quote_ident(_im.column_group)||' = $1';
EXECUTE _sql INTO _cnt USING _vals[_anum];
IF _cnt >= CAST(_im.max_size AS int8) THEN -- >= because the trigger fires BEFORE INSERT: the new row is not counted yet
RAISE EXCEPTION 'Maximum of % hit for column % in table %(%=%)',
_im.max_size, _im.column_count,
_im.table_name, _im.column_group, _vals[_anum];
END IF;
END LOOP;
RETURN NEW;
END; $impose_maximum$ LANGUAGE plpgsql;
This function will check for all conditions defined for a given table.
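A note on the comma problem: if the hstore module is available (the version shipped with PostgreSQL 9.0 can convert a record), the column value can be fetched from NEW by name, which avoids both the string parsing and the pg_attribute lookup. A minimal sketch, assuming hstore is installed and a _val text variable is added to the DECLARE block:
-- instead of the _vals/_anum machinery inside the loop:
_val := hstore(NEW) -> _im.column_group; -- column value looked up by name
EXECUTE _sql INTO _cnt USING _val;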

Yes, there is a way to make it work.
In my personal opinion your idea is the way to go. It just needs one level of "meta". The table imposed_maximums should have triggers that fire after insert, update and delete; their code should in turn create, modify or remove the per-table triggers and functions.
Take a look at the EXECUTE statement of PL/pgSQL, which essentially allows you to execute any string. Needless to say, this string may contain definitions of triggers, functions, etc. Obviously, you have access to OLD and NEW in the triggers, so you can fill in the placeholders in the string and you are done.
I believe you should be able to accomplish what you want with this answer. Please note that this is my personal view on the topic and it might not be an optimal solution - I would like to see a different, maybe also more efficient, approach.
Edit - Below is a sample from one of my old projects. It is located inside a function that is triggered before update (though now that I think of it, maybe it should have been called after ;) And yes, the code is messy, as it does not use the nice $escape$ syntax. I was really, really young then. Nonetheless, the snippet demonstrates that it is possible to achieve what you want.
query:=''CREATE FUNCTION '' || NEW.function_name || ''('';
IF NEW.parameter=''t'' THEN
query:=query || ''integer'';
END IF;
query:=query || '') RETURNS setof '' || type_name || '' AS'' || chr(39);
query:=query || '' DECLARE list '' || type_name || ''; '';
query:=query || ''BEGIN '';
query:=query || '' FOR list IN EXECUTE '' || chr(39) || chr(39);
query:=query || temp_s || '' FROM '' || NEW.table_name;
IF NEW.parameter=''t'' THEN
query:=query || '' WHERE id='' || chr(39) || chr(39) || ''||'' || chr(36) || ''1'';
ELSE
query:=query || '';'' || chr(39) || chr(39);
END IF;
query:=query || '' LOOP RETURN NEXT list; '';
query:=query || ''END LOOP; RETURN; END; '' || chr(39);
query:=query || ''LANGUAGE '' || chr(39) || ''plpgsql'' || chr(39) || '';'';
EXECUTE query;
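For comparison, the same kind of dynamic definition is much easier to read with dollar quoting and format() (format() appeared in PostgreSQL 9.1, so it is not available on the 9.0 install from the question). A rough sketch of the snippet above in that style, leaving out the WHERE id = $1 branch for brevity:
query := format(
$sql$CREATE FUNCTION %I(%s) RETURNS setof %I AS $fn$
DECLARE
list %I;
BEGIN
FOR list IN EXECUTE %L LOOP
RETURN NEXT list;
END LOOP;
RETURN;
END;
$fn$ LANGUAGE plpgsql$sql$
, NEW.function_name -- %I: safely quoted identifier
, CASE WHEN NEW.parameter = 't' THEN 'integer' ELSE '' END
, type_name
, type_name
, temp_s || ' FROM ' || NEW.table_name); -- %L: safely quoted literal
EXECUTE query;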

This function + trigger could be used as a template. If you combine them with @Sorrow's technique of dynamically generating the functions + triggers, this could solve the OP's problem.
Please note that, instead of recalculating the count for every affected row (by calling the COUNT() aggregate function), I maintain an 'incremental' count. This should be cheaper.
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path='tmp';
CREATE TABLE authentication
( id bigserial NOT NULL -- Primary key
, cookie varchar(64) NOT NULL -- Authenticates the user with a cookie
, ip_address varchar(40) NOT NULL -- Device IP address (IPv6-friendly)
, PRIMARY KEY (ip_address, cookie)
);
CREATE TABLE authentication_ip_count (
ip_address character varying(40) NOT NULL
PRIMARY KEY -- REFERENCES authentication(ip_address)
, refcnt INTEGER NOT NULL DEFAULT 0
--
-- This is much easier:
-- keep the max value inside the table
-- + use a table constraint
-- , maxcnt INTEGER NOT NULL DEFAULT 2 -- actually 100
-- , CONSTRAINT no_more_cookies CHECK (refcnt <= maxcnt)
);
CREATE TABLE imposed_maxima
( id serial NOT NULL
, table_name varchar NOT NULL
, column_group varchar NOT NULL
, column_count varchar NOT NULL
, max_size INTEGER NOT NULL
, PRIMARY KEY (table_name,column_group,column_count)
);
INSERT INTO imposed_maxima(table_name,column_group,column_count,max_size)
VALUES('authentication','ip_address','cookie', 2);
CREATE OR REPLACE FUNCTION authentication_impose_maximum()
RETURNS trigger AS
$BODY$
DECLARE
dummy INTEGER;
BEGIN
IF (TG_OP = 'INSERT') THEN
INSERT INTO authentication_ip_count (ip_address)
SELECT sq.*
FROM ( SELECT NEW.ip_address) sq
WHERE NOT EXISTS (
SELECT *
FROM authentication_ip_count nx
WHERE nx.ip_address = sq.ip_address
);
UPDATE authentication_ip_count
SET refcnt = refcnt + 1
WHERE ip_address = NEW.ip_address
;
SELECT COUNT(*) into dummy -- ac.refcnt, mx.max_size
FROM authentication_ip_count ac
JOIN imposed_maxima mx ON (1=1) -- cross join against the single maxima row
WHERE ac.ip_address = NEW.ip_address
AND mx.table_name = 'authentication'
AND mx.column_group = 'ip_address'
AND mx.column_count = 'cookie'
AND ac.refcnt > mx.max_size
;
IF FOUND AND dummy > 0 THEN
RAISE EXCEPTION 'Cookie monster detected';
END IF;
ELSIF (TG_OP = 'DELETE') THEN
UPDATE authentication_ip_count
SET refcnt = refcnt - 1
WHERE ip_address = OLD.ip_address
;
DELETE FROM authentication_ip_count ac
WHERE ac.ip_address = OLD.ip_address
AND ac.refcnt <= 0
;
-- ELSIF (TG_OP = 'UPDATE') THEN
-- (Only needed if we allow updates of ip-address)
-- otherwise the count stays the same.
END IF;
IF (TG_OP = 'DELETE') THEN
RETURN OLD; -- NEW is NULL in a DELETE trigger; returning it would cancel the delete
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER trigger_authentication_impose_maximum
BEFORE INSERT OR UPDATE OR DELETE
ON authentication
FOR EACH ROW
EXECUTE PROCEDURE authentication_impose_maximum();
-- Test it ...
INSERT INTO authentication(ip_address, cookie) VALUES ('1.2.3.4', 'Some koekje' );
INSERT INTO authentication(ip_address, cookie) VALUES ('1.2.3.4', 'kaakje' );
INSERT INTO authentication(ip_address, cookie) VALUES ('1.2.3.4', 'Yet another cookie' );
RESULTS:
INSERT 0 1
CREATE FUNCTION
CREATE TRIGGER
INSERT 0 1
INSERT 0 1
ERROR: Cookie monster detected
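The commented-out columns in authentication_ip_count above hint at an even simpler variant: keep the maximum inside the counter table and let a table constraint enforce it, so the explicit count check in the trigger disappears. A sketch of that idea, using the same tables:
CREATE TABLE authentication_ip_count (
ip_address character varying(40) NOT NULL PRIMARY KEY
, refcnt INTEGER NOT NULL DEFAULT 0
, maxcnt INTEGER NOT NULL DEFAULT 1024 -- per-address limit
, CONSTRAINT no_more_cookies CHECK (refcnt <= maxcnt)
);
-- The INSERT branch of the trigger then only needs the upsert plus
-- UPDATE authentication_ip_count SET refcnt = refcnt + 1 ...;
-- exceeding the limit makes that UPDATE fail with a check violation.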

Related

Cannot Get Dynamic Exec of SP to Return INOUT Param

Using PostgreSQL 13.2. A stored procedure (the Requestor) is given the name of a list of stored procedures to run (the job group). All sp's executed this way are coded to write a log record as their last task. I have chosen to pull that 'append log' code out of all of the sp's and instead send the log record (always a single record) back through an INOUT rowtype param, but I have run into trouble. In my example below, the requestor sp loads the records returned by the sp's it calls into a temp table shaped like the permanent log table.
That permanent table looks like this:
create table public.job_log (
log_id integer,
event_id integer,
job_id integer,
rows_affected integer);
Any one of the jobs that is executed by the requestor sp might look like this one:
CREATE OR REPLACE procedure public.get_log_rcd(
inout p_log_rcd public.job_log)
LANGUAGE 'plpgsql'
as
$BODY$
declare
v_log_id integer = 40;
v_event_id integer = 698;
v_job_id integer = 45;
v_rows_affected integer = 60;
begin
select
v_log_id
, v_event_id
, v_job_id
, v_rows_affected
into
p_log_rcd.log_id,
p_log_rcd.event_id,
p_log_rcd.job_id,
p_log_rcd.rows_affected;
end;
$BODY$
This sample sp doesn't do anything--its purpose here is only to simulate initializing the log parameters to return to the caller.
Again, the requestor sp that's going to run jobs like the one above creates a temp table with the same structure as the permanent log:
drop table if exists tmp_log_cache;
create temp table tmp_log_cache as table public.job_log with no data;
If the requestor sp didn't have to do dynamic SQL, it would look something like this block here:
do
$$
declare
big_local public.job_log;
begin
call public.get_log_rcd( big_local );
insert into tmp_log_cache (
log_id
, event_id
, job_id
, rows_affected )
values (
big_local.log_id
, big_local.event_id
, big_local.job_id
, big_local.rows_affected);
end;
$$;
Doing a
select * from tmp_log_cache;
Returns a row containing the 4 expected column values; all is well. But dynamic execution is required, and, as I'm sure most folks here know, the following dog don't hunt:
do
$$
declare
big_local public.job_log;
v_query_text varchar;
v_job_name varchar = 'public.get_log_rcd';
begin
select 'call ' || v_job_name || '( $1 );'
into v_query_text;
execute v_query_text using big_local::public.job_log;
insert into tmp_log_cache (
log_id
, event_id
, job_id
, rows_affected )
values (
big_local.log_id
, big_local.event_id
, big_local.job_id
, big_local.rows_affected);
end;
$$;
The above dynamic statement executes without error, but the insert statement only has NULL values to work with--a row is inserted, all nulls. Any suggestions warmly welcomed. The sp's that comprise the various job groups could probably have been implemented as functions, although in all cases their primary tasks are to massage, normalize, and cleanse telemetry data, not to spit anything out, per se.
Hmm, the documentation states that "parameter symbols (...) only work in SELECT, INSERT, UPDATE, and DELETE commands.", so this probably isn't possible using parameters.
But as a workaround you can build a dynamic DO and include a variable to get the values and the INSERT in there.
DO
$o$
DECLARE
v_query_text varchar;
v_job_name varchar := format('%I.%I',
'public',
'get_log_rcd');
BEGIN
v_query_text := concat('DO ',
'$i$ ',
'DECLARE ',
' big_local public.job_log; ',
'BEGIN ',
' CALL ', v_job_name, '(big_local); ',
' INSERT INTO tmp_log_cache ',
' (log_id, ',
' event_id, ',
' job_id, ',
' rows_affected) ',
' VALUES (big_local.log_id, ',
' big_local.event_id, ',
' big_local.job_id, ',
' big_local.rows_affected); ',
'END; ',
'$i$; ');
EXECUTE v_query_text;
END;
$o$;
db<>fiddle
Thanks--I would not have considered the ability to execute a DO block using EXECUTE; it just would not have occurred to me. Well, here's my solution: flip to functions.
With the understanding that my 'Requestor' is only given sp's to run because that's what we had to do with SQL Server and it was reflex, I made the one-line change needed to flip my example sp above into a function:
CREATE OR REPLACE function public.get_log_rcdf(
inout p_log_rcd public.job_log)
returns public.job_log
LANGUAGE 'plpgsql'
as
$BODY$
declare
v_log_id integer = 40;
v_event_id integer = 698;
v_job_id integer = 45;
v_rows_affected integer = 60;
begin
select
v_log_id
, v_event_id
, v_job_id
, v_rows_affected
into
p_log_rcd.log_id,
p_log_rcd.event_id,
p_log_rcd.job_id,
p_log_rcd.rows_affected;
end;
$BODY$
In fact, the change to a function only required the addition of a RETURNS line. Done. Then the dynamic call was tweaked to a SELECT and the EXECUTE modified with an INTO:
do
$$
declare
big_local public.job_log;
v_query_text varchar;
v_job_name varchar = 'public.get_log_rcdf';
begin
select 'select * from ' || v_job_name || '( $1 );'
into v_query_text;
raise info 'SQL text is: %', v_query_text;
execute v_query_text into big_local using big_local;
insert into tmp_log_cache (
log_id
, event_id
, job_id
, rows_affected )
values (
big_local.log_id
, big_local.event_id
, big_local.job_id
, big_local.rows_affected);
end;
$$;
and the process now works exactly as desired. I'll tidy up my handling of the dynamic function name as illustrated in the first answer, and I think we're done here.

pl/pgsql CTE insert with parent child tables and array of ROWTYPE

I have two history tables. One is the parent and the second is the detail. In this case they are history tables that track changes in another table.
CREATE TABLE IF NOT EXISTS history (
id serial PRIMARY KEY,
tablename text,
row_id integer,
ts timestamp,
username text,
source text,
action varchar(10)
);
CREATE TABLE IF NOT EXISTS history_detail (
id serial PRIMARY KEY,
master_id integer NOT NULL references history(id),
colname text,
oldval text,
newval text
);
I then have a function that compares an existing row with a new row. The comparison seems straightforward to me. The part I am struggling with is inserting the differences into my history tables. During the compare I store the differences in an array of history_detail; of course, at that time I do not know what the id of the parent history row will be. That is where I am getting hung up.
CREATE OR REPLACE FUNCTION update_prescriber(_npi integer, colnames text[]) RETURNS VOID AS $$
DECLARE
t text[];
p text[];
pos integer := 0;
ts text;
updstmt text := '';
sstmt text := '';
colname text;
_id integer;
_tstr text := '';
_dtl history_detail%ROWTYPE;
_dtls history_detail[] DEFAULT '{}';
BEGIN
-- get the master table row id.
SELECT id INTO _id FROM master WHERE npi = _npi;
-- these select all the rows' column values cast as text.
SELECT unnest_table('tempmaster', 'WHERE npi = ''' || _npi || '''') INTO t;
SELECT unnest_table('master', 'WHERE npi = ''' || _npi || '''') INTO p;
-- go through the arrays and compare values
FOREACH ts IN ARRAY t
LOOP
pos := pos + 1;
-- pos + 1 because the master table has the ID column
IF p[pos + 1] != ts THEN
colname := colnames[pos];
updstmt := updstmt || ', ' || colname || '=t.' || colname;
sstmt := sstmt || ',' || colname;
_dtl.colname := colname;
_dtl.oldval := p[pos + 1];
_dtl.newval := ts;
_dtls := array_append(_dtls, _dtl);
RAISE NOTICE 'THERE IS a difference for COLUMN %, old: %, new: %', colname, p[pos + 1], ts;
END IF;
END LOOP;
RAISE NOTICE 'dtls length: %', array_length(_dtls,1);
RAISE NOTICE 'dtls: %', _dtls;
RAISE NOTICE 'done comparing: %', updstmt;
IF length(updstmt) > 0 THEN
WITH hist AS (
INSERT INTO history
(tablename, row_id, ts, username, source, action)
VALUES
('master', _id, current_timestamp, 'me', 'source', 'update')
RETURNING *
), dtls AS (
SELECT hist.id_
INSERT INTO history_detail
--
-- this is where I am having trouble
--
;
_tstr := 'UPDATE master
SET ' || substr(updstmt,2) || '
FROM (SELECT ' || substr(sstmt,2) || ' FROM tempmaster WHERE npi = ''' || _npi || ''') AS t
WHERE master.id = ' || _id || ';';
EXECUTE _tstr;
END IF;
END;
$$ LANGUAGE plpgsql;
In an ideal world I would be able to do all of this in a statement. I know I could do it in multiple statements wrapped inside another BEGIN..END. I would like to make sure that I do it in the most efficient way possible. I don't think that there is a way to get rid of the dynamic EXECUTE, but hopefully someone smarter than me can push me in the right direction.
Thanks for any help.
I was able to create a statement that would insert into the 2 history tables at once.
WITH hist AS (
INSERT INTO history
(tablename, row_id, ts, username, source, action)
VALUES
('master', _id, current_timestamp, 'me', 'source', 'update')
RETURNING id
), dtls AS (
SELECT (my_row).*
FROM unnest(_dtls) my_row
), inserts AS (
SELECT hist.id AS master_id,
dtls.colname AS colname,
dtls.newval AS newval,
dtls.oldval AS oldval
FROM dtls,hist
)
INSERT INTO history_detail
(master_id, colname, newval, oldval)
SELECT * FROM inserts
;
I'd still like to add the column update as something that isn't an EXECUTE statement, but I really don't think that is possible.
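For what it's worth, the intermediate CTEs can be folded away; a slightly shorter equivalent of the statement above (same tables and variables):
WITH hist AS (
INSERT INTO history
(tablename, row_id, ts, username, source, action)
VALUES
('master', _id, current_timestamp, 'me', 'source', 'update')
RETURNING id
)
INSERT INTO history_detail
(master_id, colname, newval, oldval)
SELECT hist.id, d.colname, d.newval, d.oldval
FROM hist, unnest(_dtls) d; -- cross join: hist has exactly one row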

Performance of joining on multiple columns with potential NULL values

Let's say we have the following table
CREATE TABLE my_table
(
record_id SERIAL,
column_1 INTEGER,
column_2 INTEGER,
column_3 INTEGER,
price NUMERIC
);
With the following data
INSERT INTO my_table (column_1, column_2, column_3, price) VALUES
(1, NULL, 1, 54.99),
(1, NULL, 1, 69.50),
(NULL, 2, 2, 54.99),
(NULL, 2, 2, 69.50),
(3, 3, NULL, 54.99),
(3, 3, NULL, 69.50);
Now we do something like
CREATE TABLE my_table_aggregations AS
SELECT
ROW_NUMBER() OVER () AS aggregation_id,
column_1,
column_2,
column_3
FROM my_table
GROUP BY
column_1,
column_2,
column_3;
What I want to do now is assign an aggregation_id to each record_id in my_table. Now because I have NULL values I can't simply join by t1.column_1 = t2.column_1, because NULL = NULL is NULL and so the join will exclude these records.
Now I know that I should use something like this
SELECT
t.record_id,
agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
((t.column_1 IS NULL AND agg.column_1 IS NULL) OR t.column_1 = agg.column_1) AND
((t.column_2 IS NULL AND agg.column_2 IS NULL) OR t.column_2 = agg.column_2) AND
((t.column_3 IS NULL AND agg.column_3 IS NULL) OR t.column_3 = agg.column_3)
);
The problem here is that I am dealing with hundreds of millions of records and having an OR in the join seems to take forever to run.
There is an alternative, which is something like this
SELECT
t.record_id,
agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
COALESCE(t.column_1, -1) = COALESCE(agg.column_1, -1) AND
COALESCE(t.column_2, -1) = COALESCE(agg.column_2, -1) AND
COALESCE(t.column_3, -1) = COALESCE(agg.column_3, -1)
);
But the problem with this is that I am assuming there is no value in any of those columns which is -1.
Do note, this is an example; I am well aware that I could use DENSE_RANK to get the same result, so let's pretend that isn't an option.
Is there some crazy awesome way to get around having to use COALESCE while keeping the performance advantage it has over the correct OR version? I ran tests, and the COALESCE is over 10 times faster than the OR.
I am running this on a Greenplum database so I am not sure if this performance difference is the same on a standard Postgres database.
Since my solution with NULLIF had performance problems, and your use of COALESCE was much faster, I wonder if you could try tweaking that solution to deal with the issue of -1. To do that, you could try casting to avoid false matches. I'm not sure what the performance hit would be, but it would look like:
SELECT
t.record_id,
agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
COALESCE(cast(t.column_1 as varchar), 'NA') =
COALESCE(cast(agg.column_1 as varchar), 'NA') AND
COALESCE(cast(t.column_2 as varchar), 'NA') =
COALESCE(cast(agg.column_2 as varchar), 'NA') AND
COALESCE(cast(t.column_3 as varchar), 'NA') =
COALESCE(cast(agg.column_3 as varchar), 'NA')
);
After doing some thinking, I decided the best approach to this is to dynamically find a value for each column that can be used as the second param in a COALESCE join. The function is rather long, but it does what I need and, more importantly, it keeps the COALESCE performance. The only downside is that getting the MIN values adds some time, but we are talking a minute.
Here is the function:
CREATE OR REPLACE FUNCTION pg_temp.get_null_join_int_value
(
left_table_schema TEXT,
left_table_name TEXT,
left_table_columns TEXT[],
right_table_schema TEXT,
right_table_name TEXT,
right_table_columns TEXT[],
output_table_schema TEXT,
output_table_name TEXT
) RETURNS TEXT AS
$$
DECLARE
column_name TEXT;
sql TEXT;
complete_sql TEXT;
full_left_table TEXT;
full_right_table TEXT;
full_output_table TEXT;
BEGIN
/*****************************
VALIDATE PARAMS
******************************/
-- this section validates all of the function parameters ensuring that the values that cannot be NULL are not so
-- also checks for empty arrays which is not allowed and then ensures both arrays are of the same length
IF (left_table_name IS NULL) THEN
RAISE EXCEPTION 'left_table_name cannot be NULL';
ELSIF (left_table_columns IS NULL) THEN
RAISE EXCEPTION 'left_table_columns cannot be NULL';
ELSIF (right_table_name IS NULL) THEN
RAISE EXCEPTION 'right_table_name cannot be NULL';
ELSIF (right_table_columns IS NULL) THEN
RAISE EXCEPTION 'right_table_columns cannot be NULL';
ELSIF (output_table_name IS NULL) THEN
RAISE EXCEPTION 'output_table_name cannot be NULL';
ELSIF (array_upper(left_table_columns, 1) IS NULL) THEN
RAISE EXCEPTION 'left_table_columns cannot be an empty array';
ELSIF (array_upper(right_table_columns, 1) IS NULL) THEN
RAISE EXCEPTION 'right_table_columns cannot be an empty array';
ELSIF (array_upper(left_table_columns, 1) <> array_upper(right_table_columns, 1)) THEN
RAISE EXCEPTION 'left_table_columns and right_table_columns must have a matching array length';
END IF;
/************************
TABLE NAMES
*************************/
-- create the full name of the left table
-- the schema name can be NULL which means that the table is temporary
-- because of this, we need to detect if we should specify the schema
IF (left_table_schema IS NOT NULL) THEN
full_left_table = left_table_schema || '.' || left_table_name;
ELSE
full_left_table = left_table_name;
END IF;
-- create the full name of the right table
-- the schema name can be NULL which means that the table is temporary
-- because of this, we need to detect if we should specify the schema
IF (right_table_schema IS NOT NULL) THEN
full_right_table = right_table_schema || '.' || right_table_name;
ELSE
full_right_table = right_table_name;
END IF;
-- create the full name of the output table
-- the schema name can be NULL which means that the table is temporary
-- because of this, we need to detect if we should specify the schema
IF (output_table_schema IS NOT NULL) THEN
full_output_table = output_table_schema || '.' || output_table_name;
ELSE
full_output_table = output_table_name;
END IF;
/**********************
LEFT TABLE
***********************/
-- start to create the table which will store the min values from the left table
sql =
'DROP TABLE IF EXISTS temp_null_join_left_table;' || E'\n' ||
'CREATE TEMP TABLE temp_null_join_left_table AS' || E'\n' ||
'SELECT';
-- loop through each column name in the left table column names parameter
FOR column_name IN SELECT UNNEST(left_table_columns) LOOP
-- find the minimum value in this column and subtract one
-- we will use this as a value we know is not in the column of this table
sql = sql || E'\n\t' || 'MIN("' || column_name || '")-1 AS "' || column_name || '",';
END LOOP;
-- remove the trailing comma from the SQL
sql = TRIM(TRAILING ',' FROM sql);
-- finish the SQL to create the left table min values
sql = sql || E'\n' ||
'FROM ' || full_left_table || ';';
-- run the query that creates the table which stores the minimum values for each column in the left table
EXECUTE sql;
-- store the sql which will be the return value of the function
complete_sql = sql;
/************************
RIGHT TABLE
*************************/
-- start to create the table which will store the min values from the right table
sql =
'DROP TABLE IF EXISTS temp_null_join_right_table;' || E'\n' ||
'CREATE TEMP TABLE temp_null_join_right_table AS' || E'\n' ||
'SELECT';
-- loop through each column name in the right table column names parameter
FOR column_name IN SELECT UNNEST(right_table_columns) LOOP
-- find the minimum value in this column and subtract one
-- we will use this as a value we know is not in the column of this table
sql = sql || E'\n\t' || 'MIN("' || column_name || '")-1 AS "' || column_name || '",';
END LOOP;
-- remove the trailing comma from the SQL
sql = TRIM(TRAILING ',' FROM sql);
-- finish the SQL to create the right table min values
sql = sql || E'\n' ||
'FROM ' || full_right_table || ';'; -- was full_left_table: the right-table minima must come from the right table
-- run the query that creates the table which stores the minimum values for each column in the right table
EXECUTE sql;
-- store the sql which will be the return value of the function
complete_sql = complete_sql || E'\n\n' || sql;
-- start to create the final output table which will contain the column names defined in the left_table_columns parameter
-- each column will contain a negative value that is not present in both the left and right tables for the given column
sql =
'DROP TABLE IF EXISTS ' || full_output_table || ';' || E'\n' ||
'CREATE ' || (CASE WHEN output_table_schema IS NULL THEN 'TEMP ' ELSE '' END) || 'TABLE ' || full_output_table || ' AS' || E'\n' || -- ELSE '' needed: concatenating NULL would null out the whole string
'SELECT';
-- loop through each index of the left_table_columns array
FOR i IN coalesce(array_lower(left_table_columns, 1), 1)..coalesce(array_upper(left_table_columns, 1), 1) LOOP
-- add to the sql a call to the LEAST function
-- this function takes an infinite number of columns and returns the smallest value within those columns
-- we have -1 hardcoded because the smallest minimum value may be a positive integer and so we need to ensure the number used is negative
-- this way we will not confuse this value with a real ID from a table
sql = sql || E'\n\t' || 'LEAST(l."' || left_table_columns[i] || '", r."' || right_table_columns[i] || '", -1) AS "' || left_table_columns[i] || '",';
END LOOP;
-- remove the trailing comma from the SQL
sql = TRIM(TRAILING ',' FROM sql);
-- finish off the SQL which creates the final table
sql = sql || E'\n' ||
'FROM temp_null_join_left_table l' || E'\n' ||
'CROSS JOIN temp_null_join_right_table r' || ';';
-- create the final table
EXECUTE sql;
-- store the sql which will be the return value of the function
complete_sql = complete_sql || E'\n\n' || sql;
-- we no longer need these tables
sql =
'DROP TABLE IF EXISTS temp_null_join_left_table;' || E'\n' ||
'DROP TABLE IF EXISTS temp_null_join_right_table;';
EXECUTE sql;
-- store the sql which will be the return value of the function
complete_sql = complete_sql || E'\n\n' || sql;
-- return the SQL that has been run, good for debugging purposes or just understanding what the function does
RETURN complete_sql;
END;
$$
LANGUAGE plpgsql;
Below is an example usage of the function
SELECT pg_temp.get_null_join_int_value
(
-- left table
'public',
'my_table',
'{"column_1", "column_2", "column_3"}',
-- right table
'public',
'my_table_aggregations',
'{"column_1", "column_2", "column_3"}',
-- output table
NULL,
'temp_null_join_values'
);
Once the temp_null_join_values table is created you can use a subselect in the join as the second COALESCE param.
DROP TABLE IF EXISTS temp_result_table;
CREATE TEMP TABLE temp_result_table AS
SELECT
t.record_id,
agg.aggregation_id
FROM public.my_table t
JOIN my_table_aggregations agg ON
(
COALESCE(t.column_1, (SELECT column_1 FROM temp_null_join_values)) = COALESCE(agg.column_1, (SELECT column_1 FROM temp_null_join_values)) AND
COALESCE(t.column_2, (SELECT column_2 FROM temp_null_join_values)) = COALESCE(agg.column_2, (SELECT column_2 FROM temp_null_join_values)) AND
COALESCE(t.column_3, (SELECT column_3 FROM temp_null_join_values)) = COALESCE(agg.column_3, (SELECT column_3 FROM temp_null_join_values))
);
I hope this helps someone
How about:
SELECT
t.record_id,
agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
NULLIF(t.column_1, agg.column_1) IS NULL
AND
NULLIF(agg.column_1, t.column_1) IS NULL
AND
NULLIF(t.column_2, agg.column_2) IS NULL
AND
NULLIF(agg.column_2, t.column_2) IS NULL
AND
NULLIF(t.column_3, agg.column_3) IS NULL
AND
NULLIF(agg.column_3, t.column_3) IS NULL
);
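As an aside, the standard SQL spelling of a null-safe comparison is IS NOT DISTINCT FROM, which states the intent directly. Be aware, though, that PostgreSQL's planner (and Greenplum's) has historically not been able to use hash or merge joins for it, so it tends to perform like the OR version rather than the COALESCE one:
SELECT
t.record_id,
agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
t.column_1 IS NOT DISTINCT FROM agg.column_1 AND
t.column_2 IS NOT DISTINCT FROM agg.column_2 AND
t.column_3 IS NOT DISTINCT FROM agg.column_3
);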

PostgreSQL - Determine which columns were updated

I have a table with many columns, one of which is a lastUpdate column.
I am writing a trigger in plpgsql for Postgres 9.1 that should set a value for lastUpdate upon an UPDATE to the record.
The challenge is to exclude some pre-defined columns from that trigger; updating those specific columns shouldn't affect the lastUpdate value of the record.
Any advice?
In PostgreSQL you can access the previous values using the OLD alias and the new ones using the NEW alias. There is even a specific example in the docs for what you need:
CREATE TRIGGER check_update
BEFORE UPDATE ON accounts
FOR EACH ROW
WHEN (OLD.balance IS DISTINCT FROM NEW.balance)
EXECUTE PROCEDURE check_account_update();
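Applied to the original question, the WHEN clause can enumerate exactly the columns that should bump lastUpdate, simply leaving the excluded ones out of the condition. A minimal sketch (the table and column names here are made up for illustration):
CREATE OR REPLACE FUNCTION touch_lastupdate() RETURNS trigger AS $$
BEGIN
NEW.lastupdate := now(); -- assumed timestamp column
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER set_lastupdate
BEFORE UPDATE ON mytable
FOR EACH ROW
WHEN (OLD.important_col1 IS DISTINCT FROM NEW.important_col1
OR OLD.important_col2 IS DISTINCT FROM NEW.important_col2) -- excluded columns simply don't appear here
EXECUTE PROCEDURE touch_lastupdate();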
I know it is an old question, but I found myself with the same need and I managed to do it with a trigger using the information_schema.columns table.
I attach here a possible solution, where the only parameters to edit would be TIMEUPDATE_FIELD and EXCLUDE_FIELDS in the trigger function check_update_testtrig():
CREATE TABLE testtrig
(
id bigserial NOT NULL,
col1 integer,
col2 integer,
col3 integer,
lastupdate timestamp not null default now(),
lastread timestamp,
CONSTRAINT testtrig_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE OR REPLACE FUNCTION check_update_testtrig()
RETURNS trigger AS
$BODY$
DECLARE
TIMEUPDATE_FIELD text := 'lastupdate';
EXCLUDE_FIELDS text[] := ARRAY['lastread'];
PK_FIELD text := 'id';
ROW_RES RECORD;
IS_DISTINCT boolean := false;
COND_RES integer := 0;
BEGIN
FOR ROW_RES IN
SELECT column_name
FROM information_schema.columns
WHERE table_schema = TG_TABLE_SCHEMA
AND table_name = TG_TABLE_NAME
AND column_name != TIMEUPDATE_FIELD
AND NOT(column_name = ANY (EXCLUDE_FIELDS))
LOOP
EXECUTE 'SELECT CASE WHEN $1.' || ROW_RES.column_name || ' IS DISTINCT FROM $2.' || ROW_RES.column_name || ' THEN 1 ELSE 0 END'
INTO STRICT COND_RES
USING NEW, OLD;
IS_DISTINCT := IS_DISTINCT OR (COND_RES = 1);
END LOOP;
IF (IS_DISTINCT)
THEN
EXECUTE 'UPDATE ' || TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME || ' SET ' || TIMEUPDATE_FIELD || ' = now() WHERE ' || PK_FIELD || ' = $1.' || PK_FIELD
USING NEW;
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
CREATE TRIGGER trigger_update_testtrig
AFTER UPDATE
ON testtrig
FOR EACH ROW
EXECUTE PROCEDURE check_update_testtrig();
Looking at your question and your comment on Jakub Kania's answer, I would say that part of the solution is to create an extra table.
The issue is that constraints on a column should only govern the functioning of that column itself; they should not affect the values of other columns in the table. Specifying which columns influence the status column 'lastUpdate' is, imo, business logic.
Which columns should have an impact on the value of 'lastUpdate' changes along with the business, not with the table design. Therefore the solution should imo consist of a table in combination with a trigger.
I would add a table with a column holding a list of column names (it can be of an array type) to be used in a trigger on the table like the one described by Jakub Kania. If the default behaviour should be that a new column changes the value of 'lastUpdate', then the trigger should list only the names of columns that do not change it. If the default behaviour is to not change 'lastUpdate', then the list should contain the names of columns that do change it.
If the updated column is within the list of columns, then the lastUpdate field should be updated.
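A sketch of that configuration table (all names hypothetical); the trigger function from the previous answer could then load its EXCLUDE_FIELDS from it instead of hard-coding them:
CREATE TABLE lastupdate_excluded_columns (
table_name text PRIMARY KEY,
excluded text[] NOT NULL -- columns that must not bump lastupdate
);
-- inside the trigger function, replacing the hard-coded array:
-- SELECT excluded INTO EXCLUDE_FIELDS
-- FROM lastupdate_excluded_columns
-- WHERE table_name = TG_TABLE_NAME;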

EXECUTE...INTO...USING statement in PL/pgSQL can't execute into a record?

I'm attempting to write an area of a function in PL/pgSQL that loops through an hstore and sets a record's column (the key of the hstore) to a specific value (the value of the hstore). I'm using Postgres 9.1.
The hstore will look like: ' "column1"=>"value1","column2"=>"value2" '
Generally, here is what I want from a function that takes in an hstore and has a record with values to modify:
FOR my_key, my_value IN
SELECT key,
value
FROM EACH( in_hstore )
LOOP
EXECUTE 'SELECT $1'
INTO my_row.my_key
USING my_value;
END LOOP;
The error which I am getting with this code:
record "my_row" has no field "my_key". I've been searching for quite a while now for a solution, but everything else I've tried to achieve the same result hasn't worked either.
Simpler alternative to your posted answer. Should perform much better.
This function retrieves a row from a given table (in_table_name) by primary key value (in_row_pk) and inserts it as a new row into the same table, with some values replaced (in_override_values). The new primary key value, as filled in by the column default, is returned (pk_new).
CREATE OR REPLACE FUNCTION f_clone_row(in_table_name regclass
, in_row_pk int
, in_override_values hstore
, OUT pk_new int)
LANGUAGE plpgsql AS
$func$
DECLARE
_pk text; -- name of PK column
_cols text; -- list of names of other columns
BEGIN
-- Get name of PK column
SELECT INTO _pk a.attname
FROM pg_catalog.pg_index i
JOIN pg_catalog.pg_attribute a ON a.attrelid = i.indrelid
AND a.attnum = i.indkey[0] -- single PK col!
WHERE i.indrelid = in_table_name
AND i.indisprimary;
-- Get list of columns excluding PK column
SELECT INTO _cols string_agg(quote_ident(attname), ',')
FROM pg_catalog.pg_attribute
WHERE attrelid = in_table_name -- regclass used as OID
AND attnum > 0 -- exclude system columns
AND attisdropped = FALSE -- exclude dropped columns
AND attname <> _pk; -- exclude PK column
-- INSERT cloned row with override values, returning new PK
EXECUTE format('
INSERT INTO %1$I (%2$s)
SELECT %2$s
FROM (SELECT (t #= $1).* FROM %1$I t WHERE %3$I = $2) x
RETURNING %3$I'
, in_table_name, _cols, _pk)
USING in_override_values, in_row_pk -- use override values directly
INTO pk_new; -- return new pk directly
END
$func$;
Call:
SELECT f_clone_row('tbl', 1, '"col1"=>"foo_new","col2"=>"bar_new"');
db<>fiddle here
Old sqlfiddle
Use regclass as input parameter type, so only valid table names can be used to begin with and SQL injection is ruled out. The function also fails earlier and more gracefully if you should provide an illegal table name.
Use an OUT parameter (pk_new) to simplify the syntax.
No need to figure out the next value for the primary key manually. It is inserted automatically and returned after the fact. That's not only simpler and faster, you also avoid wasted or out-of-order sequence numbers.
Use format() to simplify the assembly of the dynamic query string and make it less error-prone. Note how I use positional parameters for identifiers and unquoted strings respectively.
I build on your implicit assumption that allowed tables have a single primary key column of type integer with a column default. Typically serial columns.
Key element of the function is the final INSERT:
Merge override values with the existing row using the #= operator in a subselect and decompose the resulting row immediately.
Then you can select only relevant columns in the main SELECT.
Let Postgres assign the default value for the PK and get it back with the RETURNING clause.
Write the returned value into the OUT parameter directly.
All done in a single SQL command, that is generally fastest.
Since I didn't want to have to use any external functions for speed purposes, I created a solution using hstores to insert a record into a table:
CREATE OR REPLACE FUNCTION fn_clone_row(in_table_name character varying, in_row_pk integer, in_override_values hstore)
RETURNS integer
LANGUAGE plpgsql
AS $function$
DECLARE
my_table_pk_col_name varchar;
my_key text;
my_value text;
my_row record;
my_pk_default text;
my_pk_new integer;
my_pk_new_text text;
my_row_hstore hstore;
my_row_keys text[];
my_row_keys_list text;
my_row_values text[];
my_row_values_list text;
BEGIN
-- Get the next value of the pk column for the table.
SELECT ad.adsrc,
at.attname
INTO my_pk_default,
my_table_pk_col_name
FROM pg_attrdef ad
JOIN pg_attribute at
ON at.attnum = ad.adnum
AND at.attrelid = ad.adrelid
JOIN pg_class c
ON c.oid = at.attrelid
JOIN pg_constraint cn
ON cn.conrelid = c.oid
AND cn.contype = 'p'
AND cn.conkey[1] = at.attnum
JOIN pg_namespace n
ON n.oid = c.relnamespace
WHERE c.relname = in_table_name
AND n.nspname = 'public';
-- Get the next value of the pk in a local variable
EXECUTE ' SELECT ' || my_pk_default
INTO my_pk_new;
-- Set the integer value back to text for the hstore
my_pk_new_text := my_pk_new::text;
-- Add the next value statement to the hstore of changes to make.
in_override_values := in_override_values || hstore( my_table_pk_col_name, my_pk_new_text );
-- Copy over only the given row to the record.
EXECUTE ' SELECT * '
' FROM ' || quote_ident( in_table_name ) ||
' WHERE ' || quote_ident( my_table_pk_col_name ) ||
' = ' || quote_nullable( in_row_pk )
INTO my_row;
-- Replace the values that need to be changed in the column name array
my_row := my_row #= in_override_values;
-- Create an hstore of my record
my_row_hstore := hstore( my_row );
-- Create a string of comma-delimited, quote-enclosed column names
my_row_keys := akeys( my_row_hstore );
SELECT array_to_string( array_agg( quote_ident( x.colname ) ), ',' )
INTO my_row_keys_list
FROM ( SELECT unnest( my_row_keys ) AS colname ) x;
-- Create a string of comma-delimited, quote-enclosed column values
my_row_values := avals( my_row_hstore );
SELECT array_to_string( array_agg( quote_nullable( x.value ) ), ',' )
INTO my_row_values_list
FROM ( SELECT unnest( my_row_values ) AS value ) x;
-- Insert the values into the columns of a new row
EXECUTE 'INSERT INTO ' || in_table_name || '(' || my_row_keys_list || ')'
' VALUES (' || my_row_values_list || ')';
RETURN my_pk_new;
END
$function$;
It's quite a bit longer than what I had envisioned, but it works and is actually quite speedy.