Postgres trigger when any of "top" rows change - postgresql

I have a table with about 10,000 users. I want to watch the top 10, as sorted by score, and give realtime updates whenever the composition of the list changes or any of the scores of the top 10 change.
What's the best way to do this with a trigger, and will this be a highly expensive trigger to keep running?

As long as you have an index on score, it would be cheap to have an AFTER UPDATE trigger FOR EACH ROW that contains:
IF (SELECT count(*)
    FROM (SELECT 1
          FROM users
          WHERE score > NEW.score
          LIMIT 10
         ) top_users
   ) < 10
OR (SELECT count(*)
    FROM (SELECT 1
          FROM users
          WHERE score > OLD.score
          LIMIT 10
         ) top_users
   ) < 10
THEN
   /* your action here */
END IF;
You'd also need a DELETE trigger that contains only the second part of the query and an INSERT trigger with only the first part.
The problem I see here is /* your action here */.
This should be a short operation like adding something to a queue, otherwise you might end up with long transactions and long locks.
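Putting the pieces together, the whole setup could look roughly like this, with pg_notify standing in for the short queue-like action (a sketch only: the users table and its score column come from the question, while the function, trigger and channel names are made up):
CREATE OR REPLACE FUNCTION notify_top10_change()
RETURNS trigger
LANGUAGE plpgsql AS
$$
DECLARE
    affected boolean := false;
BEGIN
    -- NEW exists for INSERT/UPDATE, OLD for UPDATE/DELETE
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        affected := (SELECT count(*)
                     FROM (SELECT 1 FROM users WHERE score > NEW.score LIMIT 10) t
                    ) < 10;
    END IF;

    IF NOT affected AND TG_OP IN ('UPDATE', 'DELETE') THEN
        affected := (SELECT count(*)
                     FROM (SELECT 1 FROM users WHERE score > OLD.score LIMIT 10) t
                    ) < 10;
    END IF;

    IF affected THEN
        PERFORM pg_notify('top10_changed', TG_OP);  -- cheap, non-blocking stand-in for "add to a queue"
    END IF;

    RETURN NULL;  -- AFTER trigger: the return value is ignored
END;
$$;

CREATE TRIGGER users_top10_watch
AFTER INSERT OR DELETE OR UPDATE OF score ON users
FOR EACH ROW EXECUTE FUNCTION notify_top10_change();  -- EXECUTE PROCEDURE before PostgreSQL 11
A listener (LISTEN top10_changed) can then re-read the top 10 whenever a notification arrives, keeping the trigger itself short.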

Related

How to ensure sum of amounts in one table is less than the amount of another?

Say I have a table of marbles
id | color  | total
---+--------+------
 1 | blue   |     5
 2 | red    |    10
 3 | swirly |     3
and I need to put them into bags with a unique constraint on (bag_id, marble_id):
bag_id | marble_id | quantity
-------+-----------+---------
     1 | 1 (blue)  |        2
     1 | 2 (red)   |        3
     2 | 1 (blue)  |        2
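For reference, the table definitions behind this could look roughly like the following (assumed; the original post doesn't show the DDL):
CREATE TABLE marble (
    id    int  PRIMARY KEY,
    color text NOT NULL,
    total int  NOT NULL
);

CREATE TABLE bag (
    bag_id    int NOT NULL,
    marble_id int NOT NULL REFERENCES marble (id),
    quantity  int NOT NULL,
    UNIQUE (bag_id, marble_id)  -- the constraint the ON CONFLICT clause below relies on
);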
I have a query for bagging at most the number of remaining marbles
WITH unbagged AS (
   SELECT marble.total - COALESCE( SUM( bag.quantity ), 0 ) AS quantity
   FROM   marble
   LEFT   JOIN bag ON marble.id = bag.marble_id
   WHERE  marble.id = :marble_id
   GROUP  BY marble.id )
INSERT INTO bag (bag_id, marble_id, quantity)
SELECT :bag_id,
       :marble_id,
       LEAST( :quantity, unbagged.quantity )
FROM   unbagged
ON CONFLICT (bag_id, marble_id) DO UPDATE
SET    quantity = bag.quantity
                + LEAST( EXCLUDED.quantity,
                         (SELECT quantity FROM unbagged) )
which works great, until one day it gets called twice at exactly the same time for the same marble, and I end up with 6 swirly marbles in a bag (or maybe 3 each in 2 bags), even though there are only 3 in total.
I think I understand why, but I don't know how to prevent it from happening.
Your algorithm isn't exactly clear to me, but the core issue is concurrency.
Manual locking
Your query processes a single given row in table marble at a time. The cheapest solution is to take an exclusive lock on that row (assuming that's the only query writing to marble and bag). Then the next transaction trying to mess with the same kind of marble has to wait until the current one has committed (or rolled back).
BEGIN;
SELECT FROM marble WHERE id = :marble_id FOR UPDATE; -- row level lock
WITH unbagged AS ( ...
COMMIT;
SERIALIZABLE
Or use serializable transaction isolation; that's the more expensive "catch-all" solution, and you have to be prepared to repeat the transaction in case of a serialization error. Like:
BEGIN ISOLATION LEVEL SERIALIZABLE;
WITH unbagged AS ( ...
COMMIT;
Related:
How to atomically replace a subset of table data
Atomic UPDATE .. SELECT in Postgres

How to select for update one row from table A and all joined rows from table B in Postgres?

I have a table that is a queue of tasks with each task requiring exclusive access to several resources. I want my query to select a single task that doesn't need resources claimed by other similar sessions.
If each task had to work on a single resource I would've written something like this:
select *
from tasks t
inner join resources r
on r.id = t.resource_id
order by t.submitted_ts
limit 1
for update skip locked
But since I have multiple resources I somehow have to lock them all:
select *
from tasks t
inner join task_details td
on t.id = td.task_id
inner join resources r
on r.id = td.resource_id
order by t.submitted_ts, t.id
limit ???
for update skip locked
I cannot use LIMIT 1, since I need to lock all the joined resources rows.
It also seems to me that I should try to lock all of a task's resources rows at once, so it must be NOWAIT (not SKIP LOCKED) for resources, and SKIP LOCKED for tasks.
First I had to create a helper function that either locks all the linked resources rows, or none of them:
create or replace function try_lock_resources(p_task_id bigint)
returns boolean
language plpgsql
as $$
begin
    perform *
    from task_details td
    join resources r on td.resource_id = r.resource_id
    where td.task_id = p_task_id
    for update of r nowait;

    return true;
exception when lock_not_available then
    return false;
end;
$$;
Then I needed to invoke this function for each row:
select *
from tasks
where processing_status = 'Unprocessed'
and try_lock_resources(task_id)
order by created_ts
limit 1
for update skip locked
After this query is run, only the returned row and its associated resources are locked. I verified that an identical query from another session returns the first unprocessed task that has no resources in common with the one returned by the first session.
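A worker session would then use it inside a single transaction, roughly like this (a sketch; the 'Processed' status value and the :claimed_task_id placeholder are assumptions):
BEGIN;

-- claim one task and its resources; the row locks are held until COMMIT
SELECT *
FROM tasks
WHERE processing_status = 'Unprocessed'
  AND try_lock_resources(task_id)
ORDER BY created_ts
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ... do the actual work for the returned task ...

UPDATE tasks
SET processing_status = 'Processed'   -- assumed status value
WHERE task_id = :claimed_task_id;     -- the id returned by the SELECT above

COMMIT;  -- releases the task row lock and the resource locks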
P.S.: the original answer used a different query (which you shouldn't use as is):
with unprocessed_tasks as materialized (
select *
from tasks t
where processing_status = 'Unprocessed'
order by created_ts
)
select *
from unprocessed_tasks
where try_lock_resources(task_id)
limit 1
for update skip locked
The problem with this query is that the following could (and did) happen:
session A runs the query, locks task X and starts working on it
session B starts running the query, the materialized CTE is run first, returning task X among other tasks
session A commits the transaction and releases all locks
session B finishes running the query, locks task X and starts working on it
The LIMIT clause applies to the joined table. Instead of table A, use a subquery with its own LIMIT.
SELECT "table a"."учебный год",
       "table b".семестр
FROM  (SELECT "Учебный год"."учебный год"
       FROM "Учебный год"
       ORDER BY "Учебный год"."учебный год"
       LIMIT 1) "table a"
INNER JOIN "Семестр" "table b"
        ON "table b"."учебный год" = "table a"."учебный год"

Is there any way to update this column faster using PostgreSQL

I have about 200,000,000 rows and I am trying to update one of the columns. This query seems particularly slow, and I am not sure what exactly is wrong or if it is just slow.
UPDATE table1 p
SET location = a.location
FROM table2 a
WHERE p.idv = a.idv;
I currently have an index on idv for both of the tables. Is there some way to make this faster?
I encountered the same problem several weeks ago; in the end I used the following strategy to drastically improve the speed. I guess it is not the best approach, but it may serve as a reference.
Write a simple function which accepts a range of IDs. The function executes the update SQL, but only for that range of IDs (see the sketch below).
Also add location != a.location to the WHERE clause. Skipping rows whose value would not change means fewer dead tuples, so the table bloats less; bloat hurts query performance and requires a VACUUM to restore it.
I execute the function concurrently from about 30 threads, which intuitively should cut the total time by roughly a factor of 30. You can use an even higher number of threads if you are ambitious enough.
So it executes something like the following concurrently:
update table1 p set location = a.location from table2 a where p.idv = a.idv and p.location != a.location and p.id between 1 and 100000;
update table1 p set location = a.location from table2 a where p.idv = a.idv and p.location != a.location and p.id between 100001 and 200000;
update table1 p set location = a.location from table2 a where p.idv = a.idv and p.location != a.location and p.id between 200001 and 300000;
.....
.....
Another advantage is that by printing some simple timing statistics in each function call, I can see the update progress and estimate the remaining time.
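Put together, the ranged-update helper described above might look roughly like this (a sketch; table and column names come from the question, the function name and the NOTICE output are assumptions):
create or replace function update_location_range(p_from bigint, p_to bigint)
returns integer
language plpgsql
as $$
declare
    rows_updated integer;
begin
    update table1 p
    set    location = a.location
    from   table2 a
    where  p.idv = a.idv
    and    p.location != a.location        -- skip rows that would not change (less bloat)
    and    p.id between p_from and p_to;   -- only this slice of ids

    get diagnostics rows_updated = row_count;
    raise notice 'ids % - %: % rows updated at %', p_from, p_to, rows_updated, clock_timestamp();
    return rows_updated;
end;
$$;

-- each thread/connection then processes its own slice, e.g.:
-- select update_location_range(1, 100000);
-- select update_location_range(100001, 200000);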
Creating a new table can be faster than updating the existing data, so you can try the following:
CREATE TABLE new_table AS
SELECT
    a.*, -- list the columns you need here instead of a.* (leaving out the old location)
    COALESCE(b.location, a.location) AS location -- location taken from table2 when available
FROM table1 a
LEFT JOIN table2 b ON b.idv = a.idv;
After creation you will be able to drop the old table and rename the new one.
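The swap itself could then look roughly like this (a sketch; keep in mind that indexes, constraints, defaults, triggers and privileges of the old table have to be recreated on the new one, and the index name below is just an example):
BEGIN;
DROP TABLE table1;
ALTER TABLE new_table RENAME TO table1;
COMMIT;

-- then recreate indexes etc. on the renamed table, for example:
-- CREATE INDEX table1_idv_idx ON table1 (idv);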

historical result of SELECT statement

I want to query a large number of rows and display them to the user; however, the user will only see, for example, 10 rows at a time. I will use LIMIT and OFFSET, and when they press the 'next' button in the user interface the next 10 rows will be fetched.
The database is updated all the time. Is there any way to guarantee that the user sees the next 10 rows as they were at the time of the first SELECT, so that any changes to the data are not reflected when they move on to the next 10 rows of the result?
In other words, the first SELECT should act like a snapshot of the past: any updates made after it should not be visible to the subsequent SELECT ... LIMIT ... OFFSET queries.
You can use cursors, example:
drop table if exists test;
create table test(id int primary key);
insert into test
select i from generate_series(1, 50) i;
declare cur cursor with hold for
select * from test order by id;
move absolute 0 cur; -- get 1st page
fetch 5 cur;
id
----
1
2
3
4
5
(5 rows)
truncate test;
move absolute 5 cur; -- get 2nd page
fetch 5 cur; -- works even though the table is already empty
id
----
6
7
8
9
10
(5 rows)
close cur;
Note that this is a rather expensive solution. A cursor declared WITH HOLD keeps a copy of the complete result set (a snapshot) on the server, which can cause significant load. Personally, I would rather look for alternative approaches (with dynamic results).
Read the documentation: DECLARE, FETCH.

Best way to keep column value up to date

I have the following tables (simplified) :
They are connected with a foreign key (element_id). If all acquisition entries of an element have a delivery_time greater than 28, the element gets the status critical. At the moment I use a view
based on the elements table which checks for every element_id whether it is critical. The function I use reads the min(delivery_time) of an element and checks whether it is greater than 28. This calculation is done every time the view is queried.
The solution works, but it's slow. I also think the approach above does a lot of unnecessary work, because the critical status can only change if the acquisitions table is modified.
My new approach would be to add a boolean column "critical" to the elements table, and to set up a trigger function on the acquisitions table which updates the critical status of the modified element (if necessary). Then the critical status would always be up to date and the selects should be much faster.
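For illustration, that trigger approach could look roughly like this (a sketch only; it builds on the table definitions shown in the edit below plus the proposed boolean column, and the function and trigger names are made up):
ALTER TABLE elements ADD COLUMN critical boolean NOT NULL DEFAULT false;

CREATE OR REPLACE FUNCTION refresh_element_critical()
RETURNS trigger
LANGUAGE plpgsql AS
$BODY$
DECLARE
    affected_id integer;
BEGIN
    IF TG_OP = 'DELETE' THEN
        affected_id := OLD.element_id;
    ELSE
        affected_id := NEW.element_id;
        -- an UPDATE that moves an acquisition to another element would need both ids refreshed
    END IF;

    UPDATE elements e
    SET    critical = COALESCE( (SELECT min(a.delivery_time) > 28
                                 FROM   acquisitions a
                                 WHERE  a.element_id = affected_id), false)
    WHERE  e.element_id = affected_id;

    RETURN NULL;  -- AFTER trigger: the return value is ignored
END
$BODY$;

CREATE TRIGGER acquisitions_refresh_critical
AFTER INSERT OR UPDATE OR DELETE ON acquisitions
FOR EACH ROW EXECUTE FUNCTION refresh_element_critical();  -- EXECUTE PROCEDURE before PostgreSQL 11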
Is my new approach suitable, or are there better ways to solve my problem?
Edit, here are the CREATE statements of the tables, view and function:
CREATE TABLE elements (
element_id serial primary key,
elemnt_name varchar(100));
CREATE TABLE acquisitions (
acquisition_id serial primary key,
element_id int NOT NULL,
delivery_time int,
foreign key (element_id) references elements(element_id));
CREATE OR REPLACE FUNCTION is_element_critical(param integer)
RETURNS boolean AS
$BODY$
DECLARE
delivery_date_int integer;
BEGIN
SELECT into delivery_date_int min(delivery_time)
from acquisitions where element_id = param;
IF delivery_date_int > 28 THEN
RETURN true;
ELSE
return false;
END IF;
END
$BODY$
LANGUAGE plpgsql VOLATILE;
CREATE OR REPLACE VIEW elementview AS
SELECT elements.element_id,
elements.elemnt_name, is_element_critical(elements.element_id)
AS is_element_critical
FROM elements;
With ~10000 acquisitions and ~ 1500 elements a select on the elementview takes 1600 ms.
One problem with your approach is that the function is evaluated for each row in the view.
You could try to use a join and process this in a set-based manner (which is very often a better approach than a row-by-row processing).
CREATE OR REPLACE VIEW elementview
AS
SELECT e.element_id,
e.elemnt_name,
min(a.delivery_time) > 28 as is_element_critical
FROM elements e
JOIN acquisitions a ON a.element_id = e.element_id
GROUP BY e.element_id, e.elemnt_name;
Adding an index on acquisitions(element_id, delivery_time) might speed up this query.
If you don't have an acquisition for each element you might want to change this to a LEFT JOIN.
If the number of acquisitions that are critical is much lower than the number that are not, you might be able to speed this up even further using a partial index:
create index idx_ac on acquisitions (element_id, delivery_time)
where delivery_time > 28;
And then only join against acquisitions that are critical:
SELECT e.element_id,
e.elemnt_name,
min(a.delivery_time) > 28 as is_element_critical
FROM elements e
LEFT JOIN acquisitions a ON a.element_id = e.element_id and a.delivery_time > 28
GROUP BY e.element_id, e.elemnt_name;
The left join is necessary because of the added join condition and a.delivery_time > 28.
On my laptop the first query runs in 35ms (2000 elements, 30000 acquisitions). The second one runs in 5ms. Every element has at least one acquisition that is critical (which is probably not very realistic).