PostgreSQL: sum of an ever-growing list

I have a select sum(field) from table where boolean_field = true kind of query.
There are now about 8 million rows, and the table is only going to get bigger.
Understandably, Postgres does a sequential scan instead of using the index, to avoid having to load all of it into memory at once.
At least that's how I understand it.
What would be a good way to make this run at an acceptable speed, at any size?
EDIT: version is Postgres 11.2
EDIT2: Already using an index on (boolean_field) where boolean_field = true

You could keep a table that contains the sum:
START TRANSACTION;

CREATE TABLE table_sum (s double precision NOT NULL);

CREATE FUNCTION upd_sum() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   CASE TG_OP
      WHEN 'INSERT' THEN
         IF NEW.boolean_field THEN
            UPDATE table_sum SET s = s + NEW.field;
         END IF;
         RETURN NEW;
      WHEN 'UPDATE' THEN
         IF NEW.boolean_field OR OLD.boolean_field THEN
            UPDATE table_sum
            SET s = s
                  + CASE WHEN NEW.boolean_field THEN NEW.field ELSE 0.0 END
                  - CASE WHEN OLD.boolean_field THEN OLD.field ELSE 0.0 END;
         END IF;
         RETURN NEW;
      WHEN 'DELETE' THEN
         IF OLD.boolean_field THEN
            UPDATE table_sum SET s = s - OLD.field;
         END IF;
         RETURN OLD;
      WHEN 'TRUNCATE' THEN
         UPDATE table_sum SET s = 0.0;
         RETURN NULL;
   END CASE;
END;$$;

CREATE TRIGGER upd_sum1 AFTER INSERT OR UPDATE OR DELETE ON "table"
   FOR EACH ROW EXECUTE PROCEDURE upd_sum();

CREATE TRIGGER upd_sum2 AFTER TRUNCATE ON "table"
   FOR EACH STATEMENT EXECUTE PROCEDURE upd_sum();

INSERT INTO table_sum
   SELECT coalesce(sum(field), 0)  -- coalesce in case no rows qualify yet
   FROM "table"
   WHERE boolean_field;

COMMIT;
Some explanations:
See the documentation for PL/pgSQL, PL/pgSQL trigger functions and CREATE TRIGGER.
This script is in a single transaction so that the counter is initialized correctly in the face of concurrent transactions. CREATE TRIGGER will take an ACCESS EXCLUSIVE lock on the table, so that all concurrent data access is blocked until the counter is initialized.
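With the triggers in place, reading the current sum no longer touches the big table at all (a minimal usage sketch, assuming the names above):
SELECT s FROM table_sum;  -- constant time, regardless of the size of "table"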

Related

PSQL Add value from row to another value in the same row using triggers

I have a test table with three columns (file, qty, qty_total). I will input multiple rows, for example: insert into test_table (file, qty) VALUES ('A', 5);. What I want is for a trigger to take the value from qty and add it to qty_total on commit, because this value will get updated, as this example demonstrates: update test_table set qty = 10 where file = 'A'; So the qty_total is now 15. Thanks
Managed to solve this myself. I created a trigger function:

CREATE FUNCTION public.qty_total()
    RETURNS trigger
    LANGUAGE plpgsql
    COST 100.0
    VOLATILE NOT LEAKPROOF
AS $BODY$
BEGIN
    IF TG_OP = 'UPDATE' THEN
        NEW."total" := (OLD.total + NEW.col2);
        RETURN NEW;
    ELSE
        NEW."total" := NEW.col2;
        RETURN NEW;
    END IF;
END;
$BODY$;

ALTER FUNCTION public.qty_total()
    OWNER TO postgres;

This is called by a trigger:

CREATE TRIGGER qty_trigger
    BEFORE INSERT OR UPDATE
    ON public.test
    FOR EACH ROW
    EXECUTE PROCEDURE qty_total();

Now when I insert a new code and value, the value is copied to the total; when it is updated, the value is added to the total and I have my new qty_total. This may not have the best error catching in it, but since I am passing the data from PHP, I am happy to make sure the errors are caught and removed there.
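A quick way to verify the behavior (a sketch; note the function references columns col2 and total, so the test table is assumed to use those names rather than qty/qty_total from the question):

CREATE TABLE public.test (file text, col2 integer, total integer);
INSERT INTO public.test (file, col2) VALUES ('A', 5);   -- total is set to 5
UPDATE public.test SET col2 = 10 WHERE file = 'A';      -- total becomes 5 + 10 = 15
SELECT file, col2, total FROM public.test;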

Endless loop in trigger function

This is a trigger that is called by either an insert, update or a delete on a table. It is guaranteed that the calling table has all the columns impacted, and that a matching "_deletes" table also exists.
CREATE OR REPLACE FUNCTION sample_trigger_func() RETURNS TRIGGER AS $$
DECLARE
    operation_code char;
    table_name varchar(50);
    delete_table_name varchar(50);
    old_id integer;
BEGIN
    table_name = TG_TABLE_NAME;
    delete_table_name = TG_TABLE_NAME || '_deletes';
    SELECT SUBSTR(TG_OP, 1, 1)::CHAR INTO operation_code;

    IF TG_OP = 'DELETE' THEN
        OLD.mod_op = operation_code;
        OLD.mod_date = now();
        RAISE INFO 'OLD: %', (OLD).name;
        EXECUTE format('INSERT INTO %s VALUES %s', delete_table_name, (OLD).*);
    ELSE
        EXECUTE format('UPDATE TABLE %s SET mod_op = %s AND mod_date = %s'
                     , TG_TABLE_NAME, operation_code, now());
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
The ELSE branch triggers an endless loop. There may be more problems.
How to fix it?
The ELSE branch can be radically simplified. But a couple more things are inefficient / inaccurate / dangerous:
CREATE OR REPLACE FUNCTION sample_trigger_func()
  RETURNS TRIGGER AS
$func$
BEGIN
   IF TG_OP = 'DELETE' THEN
      RAISE INFO 'OLD: %', OLD.name;

      EXECUTE format('INSERT INTO %I SELECT ($1).*', TG_TABLE_NAME || '_deletes')
      USING OLD #= hstore('{mod_op, mod_datetime}'::text[]
                        , ARRAY[left(TG_OP, 1), now()::text]);
      RETURN OLD;
   ELSE  -- INSERT or UPDATE
      NEW.mod_op := left(TG_OP, 1);
      NEW.mod_datetime := now();
      RETURN NEW;
   END IF;
END
$func$ LANGUAGE plpgsql;
In the ELSE branch just assign to NEW directly. No need for more dynamic SQL - which would fire the same trigger again causing an endless loop. That's the primary error.
RETURN NEW; outside the IF construct would break your trigger function for DELETE, since NEW is not assigned for DELETEs.
A key feature is the use of hstore and the hstore operator #= to dynamically change two selected fields of the well-known row type - which is unknown at the time of writing the code. This way you do not tamper with the original OLD value, which might have surprising side effects if you have more triggers down the chain of events.
OLD #= hstore('{mod_op, mod_datetime}'::text[]
, ARRAY[left(TG_OP, 1), now()::text]);
The additional module hstore must be installed. Details:
How to set value of composite variable field using dynamic SQL
Passing column names dynamically for a record variable in PostgreSQL
Using the hstore(text[], text[]) variant here to construct an hstore value with multiple fields on the fly.
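To try that constructor standalone (a minimal sketch; the hstore extension ships with the standard contrib modules):

CREATE EXTENSION IF NOT EXISTS hstore;
SELECT hstore(ARRAY['mod_op', 'mod_datetime']
            , ARRAY['D', now()::text]);
-- result: "mod_op"=>"D" plus the current timestamp under "mod_datetime"
-- (key order in the output may vary)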
The assignment operator in plpgsql is :=. See:
The forgotten assignment operator "=" and the commonplace ":="
Note that I used the column name mod_datetime instead of the misleading mod_date, since the column is obviously a timestamp and not a date.
I added a couple of other improvements while I was at it. The trigger itself should look like this:
CREATE TRIGGER insupdel_bef
BEFORE INSERT OR UPDATE OR DELETE ON table_name
FOR EACH ROW EXECUTE PROCEDURE sample_trigger_func();
SQL Fiddle.
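For reference, a table layout the rewritten trigger would work against (hypothetical names; the question only guarantees that the columns exist and that a matching deletes table is present):

CREATE TABLE tbl (
   id           serial PRIMARY KEY
 , name         text
 , mod_op       char(1)
 , mod_datetime timestamptz
);
CREATE TABLE tbl_deletes (LIKE tbl);

CREATE TRIGGER insupdel_bef
BEFORE INSERT OR UPDATE OR DELETE ON tbl
FOR EACH ROW EXECUTE PROCEDURE sample_trigger_func();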

postgres count from table efficient way

In my application we are using PostgreSQL; the summary table now has one million records.
When I run the following query, it takes 80,927 ms:
SELECT COUNT(*) AS count
FROM summary_views
GROUP BY question_id, category_type_id
Is there any efficient way to do this?
COUNT(*) in PostgreSQL tends to be slow. That's a consequence of MVCC: row visibility has to be checked per row, so the rows must be scanned. One of the workarounds for the problem is a row-counting trigger with a helper table:
create table table_count(
    table_count_id text primary key,
    rows int default 0
);

CREATE OR REPLACE FUNCTION table_count_update()
  RETURNS trigger AS
$BODY$
begin
    if tg_op = 'INSERT' then
        update table_count set rows = rows + 1
        where table_count_id = TG_TABLE_NAME;
    elsif tg_op = 'DELETE' then
        update table_count set rows = rows - 1
        where table_count_id = TG_TABLE_NAME;
    end if;
    return null;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;
The next step is to add a trigger declaration for each table you'd like to use it with. For example, for table tab_name:
begin;
insert into table_count values
('tab_name',(select count(*) from tab_name));
create trigger tab_name_table_count after insert or delete
on tab_name for each row execute procedure table_count_update();
commit;
It is important to run this in a transaction block to keep the actual count and the helper table in sync, in case rows are inserted or deleted between the initial count and the trigger creation. The transaction guarantees this. From then on, to get the current count instantly, just invoke:
select rows from table_count where table_count_id = 'tab_name';
Edit: For your GROUP BY clause, you'll need a more sophisticated trigger function and count table.
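A sketch of what that could look like for the question's grouping columns (an assumption-laden example: the names are taken from the query above, and ON CONFLICT requires PostgreSQL 9.5+):

create table summary_views_count(
    question_id int,
    category_type_id int,
    rows int not null default 0,
    primary key (question_id, category_type_id)
);

CREATE OR REPLACE FUNCTION summary_views_count_update()
  RETURNS trigger AS
$BODY$
begin
    if tg_op = 'INSERT' then
        insert into summary_views_count as c
            values (new.question_id, new.category_type_id, 1)
        on conflict (question_id, category_type_id)
            do update set rows = c.rows + 1;
    elsif tg_op = 'DELETE' then
        update summary_views_count
        set rows = rows - 1
        where question_id = old.question_id
          and category_type_id = old.category_type_id;
    end if;
    return null;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;

create trigger summary_views_count_trg after insert or delete
on summary_views for each row execute procedure summary_views_count_update();

As before, initialize the counts and create the trigger in a single transaction.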

Execute deferred trigger only once per row in PostgreSQL

I have a deferred AFTER UPDATE trigger on a table, set to fire when a certain column is updated. It's an integer type I'm using as a counter.
I'm not 100% certain but it looks like if I increment that particular column 100 times during a transaction, the trigger is queued up and executed 100 times at the end of the transaction.
I would like the trigger to only be scheduled once per row no matter how many times I've incremented that column.
Can I do that somehow?
Alternatively if triggered triggers must queue up regardless if they are duplicates, can I clear this queue during the first run of the trigger?
The Postgres version is 9.1. Here's what I have:
CREATE CONSTRAINT TRIGGER counter_change
    AFTER UPDATE OF "Counter" ON "table"
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW
    EXECUTE PROCEDURE counter_change();

CREATE OR REPLACE FUNCTION counter_change()
    RETURNS trigger
    LANGUAGE plpgsql AS
$$
DECLARE
BEGIN
    PERFORM some_expensive_procedure(NEW."id");
    RETURN NEW;
END;$$;
This is a tricky problem. But it can be done with per-column triggers and conditional trigger execution introduced in PostgreSQL 9.0.
You need an "updated" flag per row for this solution. Use a boolean column in the same table for simplicity. But it could be in another table or even a temporary table per transaction.
The expensive payload is executed once per row where the counter was updated (once or multiple times).
This should also perform well, because ...
... it avoids multiple calls of triggers at the root (scales well)
... does not change additional rows (minimize table bloat)
... does not need expensive exception handling.
Consider the following:
Demo
Tested in PostgreSQL 9.1 with a separate schema x as the test environment.
Tables and dummy rows
-- DROP SCHEMA x;
CREATE SCHEMA x;
CREATE TABLE x.tbl (
   id int
 , counter int
 , trig_exec_count integer  -- for monitoring payload execution
 , updated bool
);
Insert two rows to demonstrate it works with multiple rows:
INSERT INTO x.tbl VALUES
(1, 0, 0, NULL)
,(2, 0, 0, NULL);
Trigger functions and Triggers
1.) Execute expensive payload
CREATE OR REPLACE FUNCTION x.trg_upaft_counter_change_1()
  RETURNS trigger AS
$BODY$
BEGIN
   -- PERFORM some_expensive_procedure(NEW.id);

   -- Update trig_exec_count to count executions of the expensive payload.
   -- Could be in another table; for simplicity, I use the same:
   UPDATE x.tbl t
   SET    trig_exec_count = trig_exec_count + 1
   WHERE  t.id = NEW.id;

   RETURN NULL;  -- RETURN value of AFTER trigger is ignored anyway
END;
$BODY$ LANGUAGE plpgsql;

2.) Flag row as updated:

CREATE OR REPLACE FUNCTION x.trg_upaft_counter_change_2()
  RETURNS trigger AS
$BODY$
BEGIN
   UPDATE x.tbl
   SET    updated = TRUE
   WHERE  id = NEW.id;

   RETURN NULL;
END;
$BODY$ LANGUAGE plpgsql;

3.) Reset the "updated" flag:

CREATE OR REPLACE FUNCTION x.trg_upaft_counter_change_3()
  RETURNS trigger AS
$BODY$
BEGIN
   UPDATE x.tbl
   SET    updated = NULL
   WHERE  id = NEW.id;

   RETURN NULL;
END;
$BODY$ LANGUAGE plpgsql;
Trigger names are relevant! When fired by the same event, triggers are executed in alphabetical order of their names.
1.) Payload, only if not "updated" yet:
CREATE CONSTRAINT TRIGGER upaft_counter_change_1
    AFTER UPDATE OF counter ON x.tbl
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW
    WHEN (NEW.updated IS NULL)
    EXECUTE PROCEDURE x.trg_upaft_counter_change_1();

2.) Flag row as updated, only if not "updated" yet:

CREATE TRIGGER upaft_counter_change_2  -- not deferred!
    AFTER UPDATE OF counter ON x.tbl
    FOR EACH ROW
    WHEN (NEW.updated IS NULL)
    EXECUTE PROCEDURE x.trg_upaft_counter_change_2();

3.) Reset the flag. No endless loop, because of the trigger condition:

CREATE CONSTRAINT TRIGGER upaft_counter_change_3
    AFTER UPDATE OF updated ON x.tbl
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW
    WHEN (NEW.updated)
    EXECUTE PROCEDURE x.trg_upaft_counter_change_3();
Test
Run the UPDATE and the SELECT separately to see the deferred effect. If executed together (in one transaction), the SELECT would show the new tbl.counter but the old tbl.trig_exec_count.
UPDATE x.tbl SET counter = counter + 1;
SELECT * FROM x.tbl;
Now, update the counter multiple times in a single transaction. The payload will only be executed once. Voilà!
BEGIN;
UPDATE x.tbl SET counter = counter + 1;
UPDATE x.tbl SET counter = counter + 1;
UPDATE x.tbl SET counter = counter + 1;
UPDATE x.tbl SET counter = counter + 1;
UPDATE x.tbl SET counter = counter + 1;
COMMIT;

SELECT * FROM x.tbl;
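Since the behavior is deterministic, the end state can be spelled out (assuming the first UPDATE ran in its own transaction, followed by the five updates in one transaction):

SELECT id, counter, trig_exec_count, updated FROM x.tbl ORDER BY id;
-- Expected:
--  id | counter | trig_exec_count | updated
--   1 |       6 |               2 |
--   2 |       6 |               2 |
-- The payload ran once per row and transaction: 1 + 1 = 2 times, not 6.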
I don't know of a way to collapse trigger execution to once per (updated) row per transaction, but you can emulate this with a TEMPORARY ON COMMIT DROP table which tracks those modified rows and performs your expensive operation only once per row per tx:
CREATE OR REPLACE FUNCTION counter_change() RETURNS TRIGGER
AS $$
BEGIN
    -- If we're the first invocation of this trigger in this tx,
    -- create our scratch table. Create the unique index separately to
    -- avoid NOTICEs without fiddling with log_min_messages.
    BEGIN
        CREATE LOCAL TEMPORARY TABLE tbl_counter_tx_once
            ("id" integer NOT NULL)  -- match the type of your "id" column
            ON COMMIT DROP;
        CREATE UNIQUE INDEX ON tbl_counter_tx_once ("id");
    EXCEPTION WHEN duplicate_table THEN
        NULL;
    END;

    -- If we're the first invocation in this tx *for this row*,
    -- then do our expensive operation.
    BEGIN
        INSERT INTO tbl_counter_tx_once ("id") VALUES (NEW."id");
        PERFORM SOME_EXPENSIVE_OPERATION_HERE(NEW."id");
    EXCEPTION WHEN unique_violation THEN
        NULL;
    END;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
There's of course a risk of name collision with that temporary table, so choose judiciously.
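Wired up with the question's deferred trigger, repeated increments in one transaction then invoke the expensive operation only once per row (a usage sketch; names as in the question):

BEGIN;
UPDATE "table" SET "Counter" = "Counter" + 1 WHERE "id" = 1;
UPDATE "table" SET "Counter" = "Counter" + 1 WHERE "id" = 1;
COMMIT;  -- the expensive operation runs once for id 1, not twice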

count number of rows to be affected before update in trigger

I want to know the number of rows that will be affected by an UPDATE query in a BEFORE per-statement trigger. Is that possible?
The problem is that I want to allow only queries that will update up to 4 rows. If the affected row count is 5 or more, I want to raise an error.
I don't want to do this in application code, because I need this check at the database level.
Is this at all possible?
Thanks in advance for any clues on that
Write a function that executes the update for you, and raises an error to roll the change back if too many rows are affected:
create function update_max(varchar, int)
RETURNS void AS
$BODY$
DECLARE
    sql ALIAS FOR $1;
    max ALIAS FOR $2;
    rcount INT;
BEGIN
    EXECUTE sql;
    GET DIAGNOSTICS rcount = ROW_COUNT;
    IF rcount > max THEN
        --ROLLBACK;
        RAISE EXCEPTION 'Too many rows affected (%).', rcount;
    END IF;
    --COMMIT;
END;
$BODY$ LANGUAGE plpgsql;
Then call it like:
select update_max('update t1 set id = id + 10 where id < 4', 3);
where the first parameter is your SQL statement and the second is your maximum row count.
Simon had a good idea, but his implementation is unnecessarily complicated. This is my proposal:
create or replace function trg_check_max_4()
returns trigger as $$
begin
    perform true from pg_class
    where relname = 'check_max_4' and relnamespace = pg_my_temp_schema();
    if not FOUND then
        create temporary table check_max_4
            (value int check (value <= 4))
            on commit drop;
        insert into check_max_4 values (0);
    end if;
    update check_max_4 set value = value + 1;
    return new;
end; $$ language plpgsql;
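The function still has to be attached as a row-level trigger to the table in question (a sketch; the table name test is assumed here):

create trigger trg_check_max_4
before update on test
for each row execute procedure trg_check_max_4();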
I've created something like this:
begin;

create table test (
    id integer
);

insert into test(id) select generate_series(1, 100);

create or replace function trg_check_max_4_updated_records()
returns trigger as $$
declare
    counter_ integer := 0;
    tablename_ text := 'temptable';
begin
    raise notice 'trigger fired';
    select count(42) into counter_
    from pg_catalog.pg_tables where tablename = tablename_;
    if counter_ = 0 then
        raise notice 'Creating table %', tablename_;
        execute 'create temporary table ' || tablename_ || ' (counter integer) on commit drop';
        execute 'insert into ' || tablename_ || ' (counter) values(1)';
        execute 'select counter from ' || tablename_ into counter_;
        raise notice 'Actual value for counter = [%]', counter_;
    else
        execute 'select counter from ' || tablename_ into counter_;
        execute 'update ' || tablename_ || ' set counter = counter + 1';
        raise notice 'updating';
        execute 'select counter from ' || tablename_ into counter_;
        raise notice 'Actual value for counter = [%]', counter_;
        if counter_ > 4 then
            raise exception 'Cannot change more than 4 rows in one transaction';
        end if;
    end if;
    return new;
end; $$ language plpgsql;

create trigger trg_bu_test
before update on test
for each row
execute procedure trg_check_max_4_updated_records();

update test set id = 10 where id <= 1;
update test set id = 10 where id <= 2;
update test set id = 10 where id <= 3;
update test set id = 10 where id <= 4;
update test set id = 10 where id <= 5;

rollback;
The main idea is to have a 'before update for each row' trigger that creates (if necessary) a temporary table, dropped at the end of the transaction. That table holds a single row with one value: the number of rows updated in the current transaction. For each updated row the value is incremented; if it exceeds 4, the transaction is aborted.
But I think this is the wrong solution for your problem. What stops someone from running the "wrong" query you describe twice, in two transactions, changing 8 rows? And what about deleting rows, or truncating the table?
PostgreSQL has two types of triggers: row and statement triggers. Row triggers only work within the context of a row so you can't use those. Unfortunately, "before" statement triggers don't see what kind of change is about to take place so I don't believe you can use those, either.
Based on that, I would say it's unlikely you'll be able to build that kind of protection into the database using triggers, unless you don't mind using an "after" trigger and rolling back the transaction if the condition isn't satisfied. I wouldn't mind being proved wrong. :)
Have a look at using the SERIALIZABLE isolation level. I believe this will give you a consistent view of the data within your transaction. Then you can use option #1 that MusiGenesis mentioned, without the timing vulnerability. Test it, of course, to validate.
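A sketch of that combination (hypothetical statements, borrowing the t1 example from above; the "commit only if the count is small enough" decision would live in application code or a function):

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM t1 WHERE id < 4;     -- check the would-be row count first
UPDATE t1 SET id = id + 10 WHERE id < 4;  -- both statements see the same snapshot
COMMIT;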
I've never worked with postgresql, so my answer may not apply. In SQL Server, your trigger can call a stored procedure which would do one of two things:
1. Perform a SELECT COUNT(*) to determine the number of records that will be affected by the UPDATE, and then only execute the UPDATE if the count is 4 or less.
2. Perform the UPDATE within a transaction, and only commit the transaction if the returned number of rows affected is 4 or less.
No. 1 is timing-vulnerable (the number of records affected by the UPDATE may change between the COUNT(*) check and the actual UPDATE). No. 2 is pretty inefficient if there are many cases where the number of rows updated is greater than 4.