Up-to-date dictionary of distinct values for column - postgresql

I have a table with many columns and several million rows like
CREATE TABLE foo (
id integer,
thing1 text,
thing2 text,
...
stuff text);
How can I manage the relevance of dictionary of unique values of stuff column that originally populates like this:
INSERT INTO stuff_dict SELECT DISTINCT stuff from foo;
Should I manually synchronize (check if new stuff value already in stuff_dict before every insert/update) or use triggers for each insert/update/delete from foo table. In latter case, what is the best design for such a trigger(s)?
UPDATE: view does not suit here, because the SELECT * FROM stuff_dict should run as fast as possible (even CREATE INDEX ON foo(stuff) does not help much when foo has tens of millions of records).

A materialized view seems to be the simplest option for a large table.
In the trigger function just refresh the view. You can use concurrently option (see the pozs's comment below).
create materialized view stuff_dict as
select distinct stuff
from foo;
create or replace function refresh_stuff_dict()
returns trigger language plpgsql
as $$
begin
refresh materialized view /*concurrently*/ stuff_dict;
return null;
end $$;
create trigger refresh_stuff_dict
after insert or update or delete or truncate
on foo for each statement
execute procedure refresh_stuff_dict();
While the solution with a materialized view is straightforward it may be suboptimal when the table foo is modified frequently. In this case use a table for the dictionary. An index will be helpful.
create table stuff_dict as
select distinct stuff
from foo;
create index on stuff_dict(stuff);
The trigger function is more complicated and should be fired for each row after insert/update/delete:
create or replace function refresh_stuff_dict()
returns trigger language plpgsql
as $$
declare
do_insert boolean := tg_op = 'INSERT' or tg_op = 'UPDATE' and new.stuff <> old.stuff;
do_delete boolean := tg_op = 'DELETE' or tg_op = 'UPDATE' and new.stuff <> old.stuff;
begin
if do_insert and not exists (select 1 from stuff_dict where stuff = new.stuff) then
insert into stuff_dict values(new.stuff);
end if;
if do_delete and not exists (select 1 from foo where stuff = old.stuff) then
delete from stuff_dict
where stuff = old.stuff;
end if;
return case tg_op when 'DELETE' then old else new end;
end $$;
create trigger refresh_stuff_dict
after insert or update or delete
on foo for each row
execute procedure refresh_stuff_dict();

Related

PostgreSQL trigger: first condition executes but not the second

A trigger works on the first part of a function but not the second.
I'm trying to set up a trigger that does two things:
Update a field - geom - whenever the fields lat or lon are updated, using those two fields.
Update a field - country - from the geom field by referencing another table.
I've tried different syntaxes of using NEW, OLD, BEFORE and AFTER conditions, but whatever I do, I can only get the first part to work.
Here's the code:
CREATE OR REPLACE FUNCTION update_geometries()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
update schema.table a set geom = st_setsrid(st_point(a.lon, a.lat), 4326);
update schema.table a set country = b.name
from reference.admin_layers_0 b where st_intersects(a.geom,b.geom)
and a.pk = new.pk;
RETURN NEW;
END;
$$;
CREATE TRIGGER
geom_update
AFTER INSERT OR UPDATE of lat,lon on
schema.table
FOR EACH STATEMENT EXECUTE PROCEDURE update_geometries();
There is no new on a statement level trigger. (well, there is, but it is always Null)
You can either keep the statement level and update the entire a table, i.e. remove the and a.pk = new.pk, or, if only part of the rows are updated, change the trigger for a row-level trigger and only update the affected rows
CREATE OR REPLACE FUNCTION update_geometries()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
NEW.geom = st_setsrid(st_point(NEW.lon, NEW.lat), 4326);
SELECT b.name
INTO NEW.country
FROM reference.admin_layers_0 b
WHERE st_intersects(NEW.geom,b.geom);
RETURN NEW;
END;
$$;
CREATE TRIGGER
geom_update
BEFORE INSERT OR UPDATE of lat,lon on
schema.table
FOR EACH ROW EXECUTE PROCEDURE update_geometries();

Joins around if statement for PostgreSQL trigger function

I have a table a with 3 triggers that insert, update, or delete corresponding rows in b whenever a row in a is inserted, updated, or deleted. All 3 triggers use the same trigger function p.
CREATE OR REPLACE FUNCTION p ()
RETURNS TRIGGER
AS $$
BEGIN
IF (TG_OP = 'INSERT') THEN
-- INSERT INTO b ...
RETURN NEW;
ELSIF (TG_OP = 'UPDATE') THEN
-- UPDATE b ...
RETURN NEW;
ELSIF (TG_OP = 'DELETE') THEN
-- DELETE FROM b ...
RETURN NEW;
ELSE
RETURN NULL;
END IF;
END;
$$ LANGUAGE PLPGSQL;
CREATE TRIGGER i AFTER INSERT ON a FOR EACH ROW EXECUTE PROCEDURE p ();
CREATE TRIGGER u AFTER UPDATE ON a FOR EACH ROW EXECUTE PROCEDURE p ();
CREATE TRIGGER d AFTER DELETE ON a FOR EACH ROW EXECUTE PROCEDURE p ();
a also has a foreign key a1 into c (with primary key c1), and I would like to alter p in such a way that it enters the IF/ELSIF branches also depending on a column c2 in c: if that joined column changed, enter the INSERT and UPDATE branches; if it stayed the same, enter the UPDATE branch. In effect, something like this:
IF (TG_OP = 'INSERT') OR ((TG_OP = 'UPDATE') AND (oldC.c2 <> newC.c2)) THEN
-- ...
ELSIF (TG_OP = 'UPDATE') OR (oldC.c2 = newC.c2) THEN
-- ...
ELSIF (TG_OP = 'DELETE') OR ((TG_OP = 'UPDATE') AND (oldC.c2 <> newC.c2)) THEN
-- ...
ELSE
-- ...
END IF;
where oldC and newC would result from joins similar to these (with approx. syntax):
SELECT oldC.* FROM a, c AS oldC WHERE OLD.a1 = c.c1;
SELECT newC.* FROM a, c AS newC WHERE NEW.a1 = c.c1;
So what is needed in effect are two joins outside the IF statement, which would allow it to refer to oldC and newC (or something analogous). Is this possible and how would the altered version of p look (with correct PostgreSQL syntax)?
First off, there is no NEW in case of a DELETE, so RETURN NEW; doesn't make sense. This would raise an exception before Postgres 11, where this was changed:
In PL/pgSQL trigger functions, the OLD and NEW variables now read as NULL when not assigned (Tom Lane)
Previously, references to these variables could be parsed but not
executed.
It doesn't matter what you return for AFTER triggers anyway. Might as well be RETURN NULL;
There is no OLD in case of an INSERT, either.
You would only need a single trigger the way you have it right now:
CREATE TRIGGER a_i_u_d -- *one* trigger
AFTER INSERT OR UPDATE OR DELETE ON a
FOR EACH ROW EXECUTE FUNCTION p ();
However, I suggest separate trigger functions for INSERT, UPDATE and DELETE to avoid complications. Then you need three separate triggers, each calling its respective trigger function.
The case you want to add can only affect UPDATE. Nothing can "change" like you describe with INSERT or DELETE. And strictly speaking, what you ask for is impossible even for an UPDATE trigger:
depending on a column c2 in c: if that joined column changed ...
A trigger function on table a only sees a single snapshot of table c. There is no way to detect any "change" in that table. If you really meant to write:
depending on column a.a1: if that changed so that the referenced value c.c2 is different now ...
.. then there is a way:
Since a BEFORE trigger is less prone to endless loops and other complications, I demonstrate a BEFORE UPDATE trigger. (Changing to AFTER is trivial.):
CREATE OR REPLACE FUNCTION p_upbef()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
IF NEW.a1 <> OLD.a1 THEN -- assuming a1 is defined NOT NULL
IF (SELECT count(DISTINCT c.c2) > 1 -- covers possible NULL in c2 as well
FROM c
WHERE c.c1 IN (NEW.a1, OLD.a1)) THEN
-- do something
END IF;
END IF;
RETURN NEW;
END
$func$;
If a1 can be NULL and you need to track changes from / to NULL as well, you need to do more ...
Trigger:
CREATE TRIGGER upbef
BEFORE UPDATE ON a
FOR EACH ROW EXECUTE FUNCTION p_upbef ();
Since everything depends on a change in a.a1 now (and you don't have other things in the trigger) you can move the outer IF to the trigger itself (cheaper):
CREATE OR REPLACE FUNCTION p_upbef()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
IF (SELECT count(DISTINCT c.c2) > 1 -- covers NULL as well
FROM c
WHERE c.c1 IN (NEW.a1, OLD.a1)) THEN -- assuming a1 is NOT NULL!
-- do something
END IF;
RETURN NEW;
END
$func$;
Trigger:
CREATE TRIGGER upbef
BEFORE UPDATE OF a1 ON a -- !
FOR EACH ROW EXECUTE FUNCTION p_upbef();
It's not exactly the same, since an UPDATE involving the column a1 might actually leave the value unchanged , but it's good enough either way for our purpose: to only run the expensive check on c.c2 in relevant cases.
Related:
How To Avoid Looping Trigger Calls In PostgreSQL 9.2.1
Cycloning in trigger

Only perform update if column exists

Is it possible to execute an update conditionally if a column exists?
For instance, I may have a column in a table and if that column exists I want that update executed, otherwise, just skip it (or catch its exception).
You can do it inside a function. If you don't want to use the function later you can just drop it afterwards.
To know if a column exists in a certain table, you can try to fetch it using a select(or a perform, if you're gonna discard the result) in information_schema.columns.
The query bellow creates a function that searches for a column bar in a table foo, and if it finds it, updates its value. Later the function is run, then droped.
create function conditional_update() returns void as
$$
begin
perform column_name from information_schema.columns where table_name= 'foo' and column_name = 'bar';
if found then
update foo set bar = 12345;
end if;
end;
$$ language plpgsql;
select conditional_update();
drop function conditional_update();
With the following table as example :
CREATE TABLE mytable (
idx INT
,idy INT
);
insert into mytable values (1,2),(3,4),(5,6);
you can create a custom function like below to update:
create or replace function fn_upd_if_col_exists(_col text,_tbl text,_val int) returns void as
$$
begin
If exists (select 1
from information_schema.columns
where table_schema='public' and table_name=''||_tbl||'' and column_name=''||_col||'' ) then
execute format('update mytable set '||_col||'='||_val||'');
raise notice 'updated';
else
raise notice 'column %s doesn''t exists on table %s',_col,_tbl;
end if;
end;
$$
language plpgsql
and you can call this function like:
select fn_upd_if_col_exists1('idz','mytable',111) -- won't update raise "NOTICE: column idz deosnt exists on table mytables"
select fn_upd_if_col_exists1('idx','mytable',111) --will upadate column idx with value 1111 "NOTICE: updated"

FOR EACH STATEMENT trigger example

I've been looking at the documentation of postgresql triggers, but it seems to only show examples for row-level triggers, but I can't find an example for a statement-level trigger.
In particular, it is not quite clear how to iterate in the update/inserted rows in a single statement, since NEW is for a single record.
OLD and NEW are null or not defined in a statement-level trigger. Per documentation:
NEW
Data type RECORD; variable holding the new database row for INSERT/UPDATE operations in row-level triggers. This variable is
null in statement-level triggers and for DELETE operations.
OLD
Data type RECORD; variable holding the old database row for UPDATE/DELETE operations in row-level triggers. This variable is null in statement-level triggers and for INSERT operations.
Bold emphasis mine.
Up to Postgres 10 this read slightly different, much to the same effect, though:
... This variable is unassigned in statement-level triggers. ...
While those record variables are still of no use for statement level triggers, a new feature very much is:
Transition tables in Postgres 10+
Postgres 10 introduced transition tables. Those allow access to the whole set of affected rows. The manual:
AFTER triggers can also make use of transition tables to inspect the entire set of rows changed by the triggering statement.
The CREATE TRIGGER command assigns names to one or both transition
tables, and then the function can refer to those names as though they
were read-only temporary tables. Example 43.7 shows an example.
Follow the link to the manual for code examples.
Example statement-level trigger without transition tables
Before the advent of transition tables, those were even less common. A useful example is to send notifications after certain DML commands.
Here is a basic version of what I use:
-- Generic trigger function, can be used for multiple triggers:
CREATE OR REPLACE FUNCTION trg_notify_after()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
PERFORM pg_notify(TG_TABLE_NAME, TG_OP);
RETURN NULL;
END
$func$;
-- Trigger
CREATE TRIGGER notify_after
AFTER INSERT OR UPDATE OR DELETE ON my_tbl
FOR EACH STATEMENT
EXECUTE PROCEDURE trg_notify_after();
For Postgres 11 or later use the equivalent, less confusing syntax:
...
EXECUTE FUNCTION trg_notify_after();
See:
Trigger function does not exist, but I am pretty sure it does
Well, here are some examples of statement-level triggers.
Table:
CREATE TABLE public.test (
number integer NOT NULL,
text character varying(50)
);
Trigger function:
OLD and NEW are still NULL
The return value can also be always left NULL.
CREATE OR REPLACE FUNCTION public.tr_test_for_each_statement()
RETURNS trigger
LANGUAGE plpgsql
AS
$$
DECLARE
x_rec record;
BEGIN
raise notice '=operation: % =', TG_OP;
IF (TG_OP = 'UPDATE' OR TG_OP = 'DELETE') THEN
FOR x_rec IN SELECT * FROM old_table LOOP
raise notice 'OLD: %', x_rec;
END loop;
END IF;
IF (TG_OP = 'INSERT' OR TG_OP = 'UPDATE') THEN
FOR x_rec IN SELECT * FROM new_table LOOP
raise notice 'NEW: %', x_rec;
END loop;
END IF;
RETURN NULL;
END;
$$;
Settings statement-level triggers
Only AFTER and only one event is supported.
CREATE TRIGGER tr_test_for_each_statement_insert
AFTER INSERT ON public.test
REFERENCING NEW TABLE AS new_table
FOR EACH STATEMENT
EXECUTE PROCEDURE public.tr_test_for_each_statement();
CREATE TRIGGER tr_test_for_each_statement_update
AFTER UPDATE ON public.test
REFERENCING NEW TABLE AS new_table OLD TABLE AS old_table
FOR EACH STATEMENT
EXECUTE PROCEDURE public.tr_test_for_each_statement();
CREATE TRIGGER tr_test_for_each_statement_delete
AFTER DELETE ON public.test
REFERENCING OLD TABLE AS old_table
FOR EACH STATEMENT
EXECUTE PROCEDURE public.tr_test_for_each_statement();
Examples:
INSERT INTO public.test(number, text) VALUES (1, 'a');
=operation: INSERT =
NEW: (1,a)
INSERT INTO public.test(number, text) VALUES (2, 'b'), (3, 'b');
=operation: INSERT =
NEW: (2,b)
NEW: (3,b)
UPDATE public.test SET number = number + 1 WHERE text = 'a';
=operation: UPDATE =
OLD: (1,a)
NEW: (2,a)
UPDATE public.test SET number = number + 10 WHERE text = 'b';
=operation: UPDATE =
OLD: (2,b)
OLD: (3,b)
NEW: (12,b)
NEW: (13,b)
DELETE FROM public.test;
=operation: DELETE =
OLD: (2,a)
OLD: (12,b)
OLD: (13,b)

postgres count from table efficient way

In my application we are using postgresql,now it has one million records in summary table.
When I run the following query it takes 80,927 ms
SELECT COUNT(*) AS count
FROM summary_views
GROUP BY question_id,category_type_id
Is there any efficient way to do this?
COUNT(*) in PostgreSQL tends to be slow. It's a feature of MVCC. One of the workarounds of the problem is a row counting trigger with a helper table:
create table table_count(
table_count_id text primary key,
rows int default 0
);
CREATE OR REPLACE FUNCTION table_count_update()
RETURNS trigger AS
$BODY$
begin
if tg_op = 'INSERT' then
update table_count set rows = rows + 1
where table_count_id = TG_TABLE_NAME;
elsif tg_op = 'DELETE' then
update table_count set rows = rows - 1
where table_count_id = TG_TABLE_NAME;
end if;
return null;
end;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
Next step is to add proper trigger declaration for each table you'd like to use it with. For example for table tab_name:
begin;
insert into table_count values
('tab_name',(select count(*) from tab_name));
create trigger tab_name_table_count after insert or delete
on tab_name for each row execute procedure table_count_update();
commit;
It is important to run in a transaction block to keep actual count and helper table in sync in case of delete or insert between initial count and trigger creation. Transaction guarantees this. From now on to get current count instantly, just invoke:
select rows from table_count where table_count_id = 'tab_name';
Edit: In case of your group by clause, you'll need more sophisticated trigger function and count table.