Count number of rows to be affected before UPDATE in a trigger - PostgreSQL

I want to know the number of rows that will be affected by an UPDATE query in a BEFORE per-statement trigger. Is that possible?
The problem is that I want to allow only queries that will update up to 4 rows. If the affected row count is 5 or more, I want to raise an error.
I don't want to do this in application code because I need this check at the DB level.
Is this at all possible?
Thanks in advance for any clues.

Write a function that updates the rows for you or performs a rollback. Sorry for poor style formatting.
create function update_max(varchar, int)
RETURNS void AS
$BODY$
DECLARE
    sql ALIAS FOR $1;
    max ALIAS FOR $2;
    rcount INT;
BEGIN
    EXECUTE sql;
    GET DIAGNOSTICS rcount = ROW_COUNT;  -- rows affected by the statement just EXECUTEd
    IF rcount > max THEN
        -- RAISE EXCEPTION aborts the surrounding transaction, so no explicit
        -- ROLLBACK is needed (COMMIT/ROLLBACK aren't allowed inside functions anyway)
        RAISE EXCEPTION 'Too many rows affected (%).', rcount;
    END IF;
END;
$BODY$ LANGUAGE plpgsql;
Then call it like
select update_max('update t1 set id=id+10 where id < 4', 3);
where the first parameter is your SQL statement and the second is your maximum row count.

Simon had a good idea, but his implementation is unnecessarily complicated. Here is my proposal:
create or replace function trg_check_max_4()
returns trigger as $$
begin
    -- create the per-transaction counter table on first invocation
    perform true from pg_class
        where relname = 'check_max_4' and relnamespace = pg_my_temp_schema();
    if not FOUND then
        create temporary table check_max_4
            (value int check (value <= 4))
            on commit drop;
        insert into check_max_4 values (0);
    end if;
    -- the check constraint errors out on the fifth updated row
    update check_max_4 set value = value + 1;
    return new;
end; $$ language plpgsql;
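For completeness, a function like this has to be attached as a per-row BEFORE UPDATE trigger; a sketch, with t1 standing in for your table:
create trigger trg_t1_check_max_4
    before update on t1
    for each row
    execute procedure trg_check_max_4();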

I've created something like this:
begin;

create table test (
    id integer
);
insert into test(id) select generate_series(1,100);

create or replace function trg_check_max_4_updated_records()
returns trigger as $$
declare
    counter_ integer := 0;
    tablename_ text := 'temptable';
begin
    raise notice 'trigger fired';
    select count(42) into counter_
        from pg_catalog.pg_tables where tablename = tablename_;
    if counter_ = 0 then
        raise notice 'Creating table %', tablename_;
        execute 'create temporary table ' || tablename_ || ' (counter integer) on commit drop';
        execute 'insert into ' || tablename_ || ' (counter) values(1)';
        execute 'select counter from ' || tablename_ into counter_;
        raise notice 'Actual value for counter = [%]', counter_;
    else
        execute 'update ' || tablename_ || ' set counter = counter + 1';
        raise notice 'updating';
        execute 'select counter from ' || tablename_ into counter_;
        raise notice 'Actual value for counter = [%]', counter_;
        if counter_ > 4 then
            raise exception 'Cannot change more than 4 rows in one transaction';
        end if;
    end if;
    return new;
end; $$ language plpgsql;

create trigger trg_bu_test before
    update on test
    for each row
    execute procedure trg_check_max_4_updated_records();

update test set id = 10 where id <= 1;
update test set id = 10 where id <= 2;
update test set id = 10 where id <= 3;
update test set id = 10 where id <= 4;
update test set id = 10 where id <= 5;

rollback;
The main idea is to have a trigger on 'before update for each row' that creates (if necessary) a temporary table (dropped at the end of the transaction). In this table there is just one row with one value: the number of rows updated in the current transaction. For each updated row the value is incremented. If the value is bigger than 4, the transaction is aborted.
But I think this is the wrong solution for your problem. What stops someone from running the offending query twice, changing 8 rows in two transactions? And what about deleting rows, or truncating the table?

PostgreSQL has two types of triggers: row and statement triggers. Row triggers only work within the context of a single row, so you can't use those. Unfortunately, "before" statement triggers don't see what kind of change is about to take place, so I don't believe you can use those either.
Based on that, I would say it's unlikely you'll be able to build that kind of protection into the database using triggers, unless you don't mind using an "after" trigger and rolling back the transaction if the condition isn't satisfied. Wouldn't mind being proved wrong. :)
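For what it's worth, on PostgreSQL 10 and later an AFTER statement trigger can see the affected rows through a transition table, which makes exactly this kind of protection possible. A minimal sketch, assuming a table t1 (names here are made up):
CREATE OR REPLACE FUNCTION trg_limit_update_to_4()
RETURNS trigger AS $$
DECLARE
    rcount bigint;
BEGIN
    -- updated_rows is the transition table declared in CREATE TRIGGER below
    SELECT count(*) INTO rcount FROM updated_rows;
    IF rcount > 4 THEN
        -- raising an exception aborts the UPDATE that fired the trigger
        RAISE EXCEPTION 'Too many rows affected (%)', rcount;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_t1_limit_update
    AFTER UPDATE ON t1
    REFERENCING NEW TABLE AS updated_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE trg_limit_update_to_4();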

Have a look at using Serializable Isolation Level. I believe this will give you a consistent view of the database data within your transaction. Then you can use option #1 that MusiGenesis mentioned, without the timing vulnerability. Test it of course to validate.

I've never worked with postgresql, so my answer may not apply. In SQL Server, your trigger can call a stored procedure which would do one of two things:
Perform a SELECT COUNT(*) to determine the number of records that will be affected by the UPDATE, and then only execute the UPDATE if the count is 4 or less
Perform the UPDATE within a transaction, and only commit the transaction if the returned number of rows affected is 4 or less
No. 1 is vulnerable to a race condition (the number of records affected by the UPDATE may change between the COUNT(*) check and the actual UPDATE). No. 2 is pretty inefficient if there are many cases where the number of rows updated is greater than 4.
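Translated to PostgreSQL, option No. 1 might look like the sketch below (the table, column, and predicate are placeholders); as noted, the count can go stale before the UPDATE runs unless the rows are first locked with SELECT ... FOR UPDATE:
CREATE OR REPLACE FUNCTION update_if_few()
RETURNS void AS $$
DECLARE
    rcount int;
BEGIN
    -- count first (racy: rows matching the predicate may change before the UPDATE)
    SELECT count(*) INTO rcount FROM t1 WHERE id < 4;
    IF rcount > 4 THEN
        RAISE EXCEPTION 'Refusing to update % rows (limit 4)', rcount;
    END IF;
    UPDATE t1 SET id = id + 10 WHERE id < 4;
END;
$$ LANGUAGE plpgsql;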

Related

PostgreSQL concurrent check if row exists in table

Suppose I have some simple logic:
If a user has had no balance accrual earlier (which is recorded in the accruals table), we must add $100 to his balance:
START TRANSACTION;
DO LANGUAGE plpgsql $$
DECLARE
    _accrual accruals;
BEGIN
    --LOCK TABLE accruals; -- label A
    SELECT * INTO _accrual FROM accruals WHERE user_id = 1;
    IF _accrual.accrual_id IS NOT NULL THEN
        RAISE SQLSTATE '22023';
    END IF;
    UPDATE users SET balance = balance + 100 WHERE user_id = 1;
    INSERT INTO accruals (user_id, amount) VALUES (1, 100);
END
$$;
COMMIT;
The problem with this transaction is that it is not concurrency-safe.
Running this transaction in parallel can result in user_id=1 getting balance=200 and 2 accruals recorded.
How do I test the concurrency issue?
1. I run in session 1: START TRANSACTION; LOCK TABLE accruals;
2. In session 2 and session 3 I run this transaction
3. In session 1: ROLLBACK
The question is: how do I make this fully concurrency-safe and make sure the user gets the $100 only once?
The only way I see is to lock the table (label A in the code sample).
But do I have another way?
The simplest way is probably to use the serializable isolation level (by changing default_transaction_isolation). Then one of the processes should get something like "ERROR: could not serialize access due to concurrent update"
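A sketch of both ways to apply it:
-- per transaction:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... the DO block from the question ...
COMMIT;

-- or as the server-wide default (requires superuser):
ALTER SYSTEM SET default_transaction_isolation = 'serializable';
SELECT pg_reload_conf();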
If you want to keep the isolation level at 'read committed', then you can just count accruals at the end and throw an error:
START TRANSACTION;
DO LANGUAGE plpgsql $$
DECLARE
    _accrual accruals;
    _count int;
BEGIN
    SELECT * INTO _accrual FROM accruals WHERE user_id = 1;
    IF _accrual.accrual_id IS NOT NULL THEN
        RAISE SQLSTATE '22023';
    END IF;
    UPDATE users SET balance = balance + 100 WHERE user_id = 1;
    INSERT INTO accruals (user_id, amount) VALUES (1, 100);
    SELECT count(*) INTO _count FROM accruals WHERE user_id = 1;
    IF _count > 1 THEN
        RAISE SQLSTATE '22023';
    END IF;
END
$$;
COMMIT;
This works because one process will block the other on the UPDATE (assuming a non-zero number of rows gets updated), and by the time one process commits and releases the blocked one, its inserted row will be visible to the other.
Formally there is then no need for the first check, but if you don't want a lot of churn due to rolled-back INSERTs and UPDATEs, you might want to keep it.

Postgresql: sum of an ever-growing list

I have a select sum(field) from table where boolean_field = true kind of query.
There are now about 8 million rows, and the table is only going to get bigger.
Understandably, Postgres does a seq scan instead of an index scan, to avoid having to load all of it into memory at once.
At least that's how I understand it.
What would be a good way to make this run at an acceptable speed, at any size?
EDIT: version is Postgres 11.2
EDIT2: Already using an index on (boolean_field) where boolean_field = true
You could keep a table that contains the sum:
START TRANSACTION;

CREATE TABLE table_sum (s double precision NOT NULL);

CREATE FUNCTION upd_sum() RETURNS trigger
    LANGUAGE plpgsql AS
$$BEGIN
    CASE TG_OP
        WHEN 'INSERT' THEN
            IF NEW.boolean_field THEN
                UPDATE table_sum SET s = s + NEW.field;
            END IF;
            RETURN NEW;
        WHEN 'UPDATE' THEN
            IF NEW.boolean_field OR OLD.boolean_field THEN
                UPDATE table_sum
                SET s = s
                    + CASE WHEN NEW.boolean_field THEN NEW.field ELSE 0.0 END
                    - CASE WHEN OLD.boolean_field THEN OLD.field ELSE 0.0 END;
            END IF;
            RETURN NEW;
        WHEN 'DELETE' THEN
            IF OLD.boolean_field THEN
                UPDATE table_sum SET s = s - OLD.field;
            END IF;
            RETURN OLD;
        WHEN 'TRUNCATE' THEN
            UPDATE table_sum SET s = 0.0;
            RETURN NULL;
    END CASE;
END;$$;

CREATE TRIGGER upd_sum1 AFTER INSERT OR UPDATE OR DELETE ON "table"
    FOR EACH ROW EXECUTE PROCEDURE upd_sum();
CREATE TRIGGER upd_sum2 AFTER TRUNCATE ON "table"
    FOR EACH STATEMENT EXECUTE PROCEDURE upd_sum();

INSERT INTO table_sum
    SELECT coalesce(sum(field), 0.0)  -- coalesce: sum over zero rows is NULL
    FROM "table"
    WHERE boolean_field;

COMMIT;
Some explanations:
Documentation links to PL/pgSQL, PL/pgSQL triggers and CREATE TRIGGER
This script is in a single transaction so that the counter is initialized correctly in the face of concurrent transactions. CREATE TRIGGER will take an ACCESS EXCLUSIVE lock on the table, so that all concurrent data access is blocked until the counter is initialized.
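Reading the maintained sum is then a single-row lookup instead of a scan over 8 million rows:
SELECT s FROM table_sum;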

Postgres: count from table in an efficient way

In my application we are using PostgreSQL; the summary table now has one million records.
When I run the following query it takes 80,927 ms
SELECT COUNT(*) AS count
FROM summary_views
GROUP BY question_id,category_type_id
Is there any efficient way to do this?
COUNT(*) in PostgreSQL tends to be slow. It's a consequence of MVCC. One of the workarounds for the problem is a row-counting trigger with a helper table:
create table table_count(
    table_count_id text primary key,
    rows int default 0
);

CREATE OR REPLACE FUNCTION table_count_update()
RETURNS trigger AS
$BODY$
begin
    if tg_op = 'INSERT' then
        update table_count set rows = rows + 1
            where table_count_id = TG_TABLE_NAME;
    elsif tg_op = 'DELETE' then
        update table_count set rows = rows - 1
            where table_count_id = TG_TABLE_NAME;
    end if;
    return null;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;
Next step is to add proper trigger declaration for each table you'd like to use it with. For example for table tab_name:
begin;
insert into table_count values
    ('tab_name', (select count(*) from tab_name));
create trigger tab_name_table_count after insert or delete
    on tab_name for each row execute procedure table_count_update();
commit;
It is important to run this in a transaction block, to keep the actual count and the helper table in sync in case rows are inserted or deleted between the initial count and the trigger creation; the transaction guarantees this. From now on, to get the current count instantly, just invoke:
select rows from table_count where table_count_id = 'tab_name';
Edit: In the case of your GROUP BY clause, you'll need a more sophisticated trigger function and count table, as sketched below.
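A sketch of what that might look like for the query above, assuming PostgreSQL 9.5 or later for INSERT ... ON CONFLICT (all names here are hypothetical, and UPDATEs that move a row between groups would need extra handling along the lines of the upd_sum function earlier):
create table summary_views_count(
    question_id int,
    category_type_id int,
    rows int not null default 0,
    primary key (question_id, category_type_id)
);

CREATE OR REPLACE FUNCTION summary_views_count_update()
RETURNS trigger AS
$BODY$
begin
    if tg_op = 'INSERT' then
        -- create the group's counter on first sight, otherwise increment it
        insert into summary_views_count as c
            values (NEW.question_id, NEW.category_type_id, 1)
        on conflict (question_id, category_type_id)
            do update set rows = c.rows + 1;
    elsif tg_op = 'DELETE' then
        update summary_views_count set rows = rows - 1
            where question_id = OLD.question_id
              and category_type_id = OLD.category_type_id;
    end if;
    return null;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;

begin;
insert into summary_views_count
    select question_id, category_type_id, count(*)
    from summary_views group by question_id, category_type_id;
create trigger summary_views_count_trg after insert or delete
    on summary_views for each row
    execute procedure summary_views_count_update();
commit;
The slow GROUP BY then becomes: select * from summary_views_count;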

A Postgres update trigger performs everything except the actual update

Let's use a test table:
CREATE TABLE labs.date_test
(
    pkey int NOT NULL,
    val integer,
    date timestamp without time zone,
    CONSTRAINT date_test_pkey PRIMARY KEY (pkey)
);
I have a trigger function defined as below. It is a function to insert a date into a specified column in the table. Its arguments are the primary key, the name of the date field, and the date to be inserted:
CREATE OR REPLACE FUNCTION tf_set_date()
RETURNS trigger AS
$BODY$
DECLARE
    table_name text;
    pkey_col text := TG_ARGV[0];
    date_col text := TG_ARGV[1];
    date_val text := TG_ARGV[2];
BEGIN
    table_name := format('%I.%I', TG_TABLE_SCHEMA, TG_TABLE_NAME);
    IF TG_NARGS != 3 THEN
        RAISE 'Wrong number of args for tf_set_date()'
            USING HINT = 'Check triggers for table ' || table_name;
    END IF;
    EXECUTE format('UPDATE %s SET %I = %s' ||
                   ' WHERE %I = ($1::text::%s).%I',
                   table_name, date_col, date_val,
                   pkey_col, table_name, pkey_col)
        USING NEW;
    RAISE NOTICE '%', NEW;
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
The actual trigger definition is as follows:
CREATE TRIGGER t_set_ready_date
BEFORE UPDATE OF val
ON labs.date_test
FOR EACH ROW
EXECUTE PROCEDURE tf_set_date('pkey', 'date', 'localtimestamp(0)');
Now say I do: INSERT INTO labs.date_test(pkey) VALUES (1);
Then I perform an update as follows:
UPDATE labs.date_test SET val = 1 WHERE pkey = 1;
Now the date gets inserted as expected. But the val field is still NULL. It does not have 1 as one would expect (or rather as I expected).
What am I doing wrong? The RAISE NOTICE in the trigger shows that NEW is still what I expect it to be. Aren't UPDATEs allowed in BEFORE UPDATE triggers? One comment about Postgres triggers seems to indicate that the original UPDATE gets overwritten if there is an UPDATE statement in a BEFORE UPDATE trigger. Can someone help me out?
EDIT
I am trying to update the same table that invoked the trigger, and that too the same row which is to be modified by the UPDATE statement that invoked the trigger. I am running Postgresql 9.2
Given all the dynamic table names it isn't entirely clear if this trigger issues an update on the same table that invoked the trigger.
If so: That won't work. You can't UPDATE some_table in a BEFORE trigger on some_table. Or, more strictly, you can, but if you update any row that is affected by the statement that's invoking the trigger, the results will be unpredictable, so it isn't generally a good idea.
Instead, alter the values in NEW directly. You can't do this with dynamic column names, unfortunately; you'll just have to customise the trigger or use an AFTER trigger to do the update after the rows have already been changed.
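With a hard-coded column name, that advice reduces the whole dynamic UPDATE to a one-line assignment. A sketch against the date_test table above (the function name is made up):
CREATE OR REPLACE FUNCTION tf_set_date_simple()
RETURNS trigger AS
$BODY$
BEGIN
    NEW.date := localtimestamp(0);  -- modify the row in flight
    RETURN NEW;                     -- this modified row is what actually gets written
END;
$BODY$
LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS t_set_ready_date ON labs.date_test;
CREATE TRIGGER t_set_ready_date
    BEFORE UPDATE OF val
    ON labs.date_test
    FOR EACH ROW
    EXECUTE PROCEDURE tf_set_date_simple();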
I am not sure, but your trigger can recurse: it does an UPDATE on the same table from an UPDATE trigger. This is usually bad practice, and it is usually not a good idea to write overly generic triggers. I don't know your full use case, and maybe you need it, but you have to be sure you are protected against recursion.
For debugging triggers it is good to put debug messages at the start and end of the function body. You can also use a GET DIAGNOSTICS statement after the EXECUTE statement to get information about the impact of the dynamic SQL:
DECLARE
    _updated_rows int;
    _query text;
BEGIN
    RAISE NOTICE 'Start trigger function xxx';
    ...
    _query := format('UPDATE ...');
    RAISE NOTICE 'dynamic sql %, %', _query, new;
    EXECUTE _query USING new;
    GET DIAGNOSTICS _updated_rows = ROW_COUNT;
    RAISE NOTICE 'Updated rows %', _updated_rows;
    ...

Variable containing the number of rows affected by previous DELETE? (in a function)

I have a function that is used as an INSERT trigger. This function deletes rows that would conflict with [the serial number in] the row being inserted. It works beautifully, so I'd really rather not debate the merits of the concept.
DECLARE
    re1 feeds_item.shareurl%TYPE;
BEGIN
    SELECT regexp_replace(NEW.shareurl, '/[^/]+(-[0-9]+\.html)$', '/[^/]+\\1') INTO re1;
    RAISE NOTICE 'DELETEing rows from feeds_item where shareurl ~ ''%''', re1;
    DELETE FROM feeds_item WHERE shareurl ~ re1;
    RETURN NEW;
END;
I would like to add to the NOTICE an indication of how many rows are affected (aka: deleted). How can I do that (using LANGUAGE 'plpgsql')?
UPDATE:
Based on some excellent guidance from "Chicken in the kitchen", I have changed it to this:
DECLARE
    re1 feeds_item.shareurl%TYPE;
    num_rows int;
BEGIN
    SELECT regexp_replace(NEW.shareurl, '/[^/]+(-[0-9]+\.html)$', '/[^/]+\\1') INTO re1;
    DELETE FROM feeds_item WHERE shareurl ~ re1;
    IF FOUND THEN
        GET DIAGNOSTICS num_rows = ROW_COUNT;
        RAISE NOTICE 'DELETEd % row(s) from feeds_item where shareurl ~ ''%''', num_rows, re1;
    END IF;
    RETURN NEW;
END;
For a very robust solution that is part of PostgreSQL's SQL dialect, and not just plpgsql, you could also do the following:
with a as (DELETE FROM feeds_item WHERE shareurl ~ re1 returning 1)
select count(*) from a;
You can actually get lots more information such as:
with a as (delete from sales returning amount)
select sum(amount) from a;
to see totals. In this way you could compute any aggregate, and even group and filter it.
In Oracle PL/SQL, the system variable to store the number of deleted / inserted / updated rows is:
SQL%ROWCOUNT
After a DELETE / INSERT / UPDATE statement, and BEFORE COMMITTING, you can store SQL%ROWCOUNT in a variable of type NUMBER. Remember that COMMIT or ROLLBACK resets the value of SQL%ROWCOUNT to zero, so you have to copy the SQL%ROWCOUNT value into a variable BEFORE you COMMIT or ROLLBACK.
Example:
BEGIN
    DECLARE
        affected_rows NUMBER DEFAULT 0;
    BEGIN
        DELETE FROM feeds_item
        WHERE shareurl = re1;
        affected_rows := SQL%ROWCOUNT;
        DBMS_OUTPUT.put_line(
            'This DELETE would affect '
            || affected_rows
            || ' records in FEEDS_ITEM table.');
        ROLLBACK;
    END;
END;
I have also found this interesting solution (source: http://markmail.org/message/grqap2pncqd6w3sp ):
On 4/7/07, Karthikeyan Sundaram wrote:
Hi,
I am using 8.1.0 postgres and trying to write a plpgsql block. In that I am inserting a row. I want to check to see if the row has been
inserted or not.
In oracle we can say like this
begin
    insert into table_a values (1);
    if sql%rowcount > 0 then
        dbms.output.put_line('rows inserted');
    else
        dbms.output.put_line('rows not inserted');
    end if;
end;
Is there something equal to sql%rowcount in postgres? Please help.
Regards skarthi
Maybe:
http://www.postgresql.org/docs/8.2/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
Click on the link above, you'll see this content:
37.6.6. Obtaining the Result Status

There are several ways to determine the effect of a command. The first method is to use the GET DIAGNOSTICS command, which has the form:

GET DIAGNOSTICS variable = item [ , ... ];

This command allows retrieval of system status indicators. Each item is a key word identifying a state value to be assigned to the specified variable (which should be of the right data type to receive it). The currently available status items are ROW_COUNT, the number of rows processed by the last SQL command sent down to the SQL engine, and RESULT_OID, the OID of the last row inserted by the most recent SQL command. Note that RESULT_OID is only useful after an INSERT command into a table containing OIDs.

An example:

GET DIAGNOSTICS integer_var = ROW_COUNT;

The second method to determine the effects of a command is to check the special variable named FOUND, which is of type boolean. FOUND starts out false within each PL/pgSQL function call. It is set by each of the following types of statements:

A SELECT INTO statement sets FOUND true if a row is assigned, false if no row is returned.

A PERFORM statement sets FOUND true if it produces (and discards) a row, false if no row is produced.

UPDATE, INSERT, and DELETE statements set FOUND true if at least one row is affected, false if no row is affected.

A FETCH statement sets FOUND true if it returns a row, false if no row is returned.

A FOR statement sets FOUND true if it iterates one or more times, else false. This applies to all three variants of the FOR statement (integer FOR loops, record-set FOR loops, and dynamic record-set FOR loops). FOUND is set this way when the FOR loop exits; inside the execution of the loop, FOUND is not modified by the FOR statement, although it may be changed by the execution of other statements within the loop body.

FOUND is a local variable within each PL/pgSQL function; any changes to it affect only the current function.
I would like to share my code (I had this idea from Roelof Rossouw):
CREATE OR REPLACE FUNCTION my_schema.sp_delete_mytable(_id integer)
RETURNS integer AS
$BODY$
DECLARE
    AFFECTEDROWS integer;
BEGIN
    WITH a AS (DELETE FROM mytable WHERE id = _id RETURNING 1)
    SELECT count(*) INTO AFFECTEDROWS FROM a;
    IF AFFECTEDROWS = 1 THEN
        RETURN 1;
    ELSE
        RETURN 0;
    END IF;
EXCEPTION WHEN OTHERS THEN
    RETURN 0;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
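Calling it is then straightforward (42 here is just an arbitrary id):
SELECT my_schema.sp_delete_mytable(42);  -- returns 1 if a row was deleted, 0 otherwise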