Constraint on sum from rows - postgresql

I've got a table in PostgreSQL 9.4:
user_votes (
user_id int,
portfolio_id int,
car_id int,
vote int
)
Is it possible to put a constraint on the table so that a user has at most 99 points to vote with in each portfolio?
This means that a user can have multiple rows with the same user_id and portfolio_id, but different car_id and vote. The sum of the votes should never exceed 99, but it can be spread across different cars.
So doing:
INSERT INTO user_votes (user_id, portfolio_id, car_id, vote) VALUES
(1, 1, 1, 20),
(1, 1, 7, 40),
(1, 1, 9, 25)
would all be allowed, but adding a row that pushes the total above 99 votes should fail, like this one:
INSERT INTO user_votes (user_id, portfolio_id, car_id, vote) VALUES
(1, 1, 21, 40)

Unfortunately not. If you try to create such a constraint you will see this error message:
ERROR: aggregate functions are not allowed in check constraints
But the wonderful thing about PostgreSQL is that there is always more than one way to skin a cat. You can use a BEFORE trigger to check that the data you are trying to insert fulfills your requirements.
Row-level triggers fired BEFORE can return null to signal the trigger
manager to skip the rest of the operation for this row (i.e.,
subsequent triggers are not fired, and the INSERT/UPDATE/DELETE does
not occur for this row). If a nonnull value is returned then the
operation proceeds with that row value.
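Inside your trigger you would sum the votes the user has already cast in that portfolio
SELECT COALESCE(SUM(vote), 0) INTO vote_sum FROM user_votes WHERE user_id = NEW.user_id AND portfolio_id = NEW.portfolio_id
Now if vote_sum + NEW.vote exceeds 99 you return NULL and the row will not be inserted. If the INSERT should fail with an error rather than be skipped silently, raise an exception instead of returning NULL.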
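Putting the pieces together, a minimal sketch of such a trigger could look like the following. The names check_vote_budget and user_votes_budget are made up for illustration, and this version raises an exception so that the INSERT fails visibly, as the question asks. It covers INSERT only; an UPDATE of vote would need a similar check that leaves the old row out of the sum. Concurrent transactions can still each pass the check unless they are serialized, for example with SERIALIZABLE isolation or a lock per user.

```sql
-- Table from the question, repeated so the sketch runs on its own.
CREATE TABLE IF NOT EXISTS user_votes (
    user_id      int,
    portfolio_id int,
    car_id       int,
    vote         int
);

-- Hypothetical trigger function: rejects a row that would push the
-- user's total in this portfolio above 99.
CREATE OR REPLACE FUNCTION check_vote_budget() RETURNS trigger AS $$
DECLARE
    used_votes bigint;
BEGIN
    -- Rows inserted earlier by the same statement are already visible here.
    SELECT COALESCE(SUM(vote), 0) INTO used_votes
    FROM user_votes
    WHERE user_id = NEW.user_id
      AND portfolio_id = NEW.portfolio_id;

    IF used_votes + NEW.vote > 99 THEN
        RAISE EXCEPTION 'user % would exceed 99 votes in portfolio %',
            NEW.user_id, NEW.portfolio_id;
    END IF;

    RETURN NEW;  -- non-null: the INSERT proceeds with this row
END;
$$ LANGUAGE plpgsql;

-- EXECUTE PROCEDURE is the spelling that works on 9.4.
CREATE TRIGGER user_votes_budget
    BEFORE INSERT ON user_votes
    FOR EACH ROW EXECUTE PROCEDURE check_vote_budget();
```

With this in place the first three rows from the question (20 + 40 + 25 = 85) are accepted, and the fourth row with 40 votes is rejected because 85 + 40 exceeds 99.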

Related

Fast new row insertion if a value of a column depends on previous value in existing row

I have a table cusers with a primary key:
primary key(uid, lid, cnt)
And I try to insert some values into the table:
insert into cusers (uid, lid, cnt, dyn, ts)
values
(A, B, C, (
select C - cnt
from cusers
where uid = A and lid = B
order by ts desc
limit 1
), now())
on conflict do nothing
Quite often (in about 98% of cases) a row cannot be inserted into cusers because it violates the primary key constraint, so the expensive select query does not need to be executed at all. But as far as I can see, PostgreSQL first evaluates the select query for the dyn column and only then rejects the row because of the uid, lid, cnt violation.
What is the best way to insert rows quickly in such a situation?
Another explanation
I have a system where one row depends on another. Here is an example:
(x, x, 2, 2, <timestamp>)
(x, x, 5, 3, <timestamp>)
Two columns contain an absolute value (2 and 5) and a relative value (2 and 5 - 2 = 3). Each time I insert a new row it should:
avoid duplicate rows (see the primary key constraint)
if the new row differs, compute the difference and put it into the dyn column (so I take the last inserted row for the user according to the timestamp and subtract the values).
Another solution I've found is to use returning uid, lid, ts on the inserts to get the user ids that were really inserted; this is how I know they differ from existing rows. Then I update the inserted values:
update cusers
set dyn = (
select max(cnt) - min(cnt)
from (
select cnt
from cusers
where uid = A and lid = B
order by ts desc
limit 2) last_two
)
where uid = A and lid = B and ts = TS
But this is not a fast approach either, as it searches the whole ts column to find the two most recently inserted rows for each user. I need a fast insert query because I insert millions of rows at a time (but I do not write duplicates).
What could the solution be? Maybe I need a new index for this? Thanks in advance.
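One possible direction, sketched here under assumptions rather than measured: filter out rows that already exist before the subquery for dyn is evaluated, and add an index that matches the "latest row per uid, lid" lookup. The column types, the index name cusers_uid_lid_ts_idx and the one-row batch are hypothetical, and the COALESCE to 0 is a guess based on the first example row, where dyn equals cnt. The expectation is that the select-list subquery runs only for rows that pass the NOT EXISTS filter, which is worth confirming with EXPLAIN ANALYZE on real data.

```sql
-- Assumed column types; the question only shows the primary key.
CREATE TABLE IF NOT EXISTS cusers (
    uid int,
    lid int,
    cnt int,
    dyn int,
    ts  timestamptz,
    PRIMARY KEY (uid, lid, cnt)
);

-- Lets "latest row for this uid/lid" be answered from the index
-- instead of scanning ts.
CREATE INDEX IF NOT EXISTS cusers_uid_lid_ts_idx
    ON cusers (uid, lid, ts DESC);

INSERT INTO cusers (uid, lid, cnt, dyn, ts)
SELECT v.uid, v.lid, v.cnt,
       -- Difference to the most recent row; 0 is assumed when none exists.
       v.cnt - COALESCE((SELECT c.cnt
                         FROM cusers c
                         WHERE c.uid = v.uid AND c.lid = v.lid
                         ORDER BY c.ts DESC
                         LIMIT 1), 0),
       now()
FROM (VALUES (1, 1, 2)) AS v(uid, lid, cnt)   -- hypothetical batch
WHERE NOT EXISTS (                            -- cheap primary key probe
    SELECT 1
    FROM cusers c
    WHERE c.uid = v.uid AND c.lid = v.lid AND c.cnt = v.cnt
)
ON CONFLICT DO NOTHING;                       -- still guards duplicates within the batch
```

Rows of the same batch do not see each other in the subquery, so this only helps when each uid, lid pair appears at most once per statement.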

Is it possible to look at the output of the previous row of a PostgreSQL query?

This is the question: Is it possible to look at the outputs (what has been selected) of the previous row of a running SQL query in Postgres?
I know that lag exists to look at the inputs, the "from" of the query. I also know that a CTE, subquery or lateral join can solve most issues of this kind. But I think the problem I'm facing genuinely requires a peek at the output of the previous row. Why? Because the output of the current row depends on a constant from a lookup table, and the value used to look up that constant is an aggregate of all the previous rows. If that lookup returns the wrong constant, all subsequent rows will drift further and further from the expected value.
The rest of this text is a simplified example based on the problem I'm facing. It should be possible to paste it into PostgreSQL 12 or above and play around. I'm terribly sorry that it is as complicated as it is, but I think it is the simplest I can make it while still retaining the core issue: a lookup in a lookup table based on an aggregate of all previous rows, plus the fact that the "inventory" being tracked is modeled as a series of transactions of two discrete types.
The database itself exists to keep track of multiple fish farms, or cages full of fish. Fish can be moved/transferred between these farms, and the farms are fed roughly daily. Why not just carry the aggregate as a field in the table? Because it should be possible to switch out the lookup table after the season is over, to adjust it to better match reality.
-- A listing of all groups of fish ever grown.
create table farms (
id bigserial primary key,
start timestamp not null,
stop timestamp
);
insert into farms
(id, start)
values (
1, '2021-02-01T13:37'
);
-- A transfer of fish from one farm to another.
-- If the source is null the fish is transferred from another fishery outside our system.
-- If the destination is null the fish is being slaughtered, removed from the system.
create table transfers (
source bigint references farms(id),
destination bigint references farms(id),
timestamp timestamp not null default current_timestamp,
total_weight_g bigint not null constraint positive_nonzero_total_weight_g check (total_weight_g > 0),
average_weight_g bigint not null constraint positive_nonzero_average_weight_g check (average_weight_g > 0),
number_fish bigint generated always as (total_weight_g / average_weight_g) stored
);
insert into transfers
(source, destination, timestamp, total_weight_g, average_weight_g)
values
(null, 1, '2021-02-01T16:38', 5, 5),
(null, 1, '2021-02-15T16:38', 500, 500);
-- Transactions of fish feed into a farm.
create table feedings (
id bigserial primary key,
growth_table bigint not null,
farm bigint not null references farms(id),
amount_g bigint not null constraint positive_nonzero_amunt_g check (amount_g > 0),
timestamp timestamp not null
);
insert into feedings
(farm, growth_table, amount_g, timestamp)
values
(1, 1, 1, '2021-02-02T13:37'),
(1, 1, 1, '2021-02-03T13:37'),
(1, 1, 1, '2021-02-04T13:37'),
(1, 1, 1, '2021-02-05T13:37'),
(1, 1, 1, '2021-02-06T13:37'),
(1, 1, 1, '2021-02-07T13:37');
create view combined_feed_and_transfer_history as
with transfer_history as (
select timestamp, destination as farm, total_weight_g, average_weight_g, number_fish
from transfers as deposits
where deposits.destination = 1 -- TODO: This view only works for one farm, fix that.
union all
select timestamp, source as farm, -total_weight_g, -average_weight_g, -number_fish
from transfers as withdrawals
where withdrawals.source = 1
)
select timestamp, farm, total_weight_g, number_fish, average_weight_g, null as growth_table
from transfer_history
union all
select timestamp, farm, amount_g, 0 as number_fish, 0 as average_weight_g, growth_table
from feedings
order by timestamp;
-- Conversion tables from feed to gained weight.
create table growth_coefficients (
growth_table bigserial not null,
average_weight_g bigint not null constraint positive_nonzero_weight check (average_weight_g > 0),
feed_conversion_rate double precision not null constraint positive_foderkonverteringsfaktor check (feed_conversion_rate >= 0),
primary key(growth_table, average_weight_g)
);
insert into growth_coefficients
(average_weight_g, feed_conversion_rate, growth_table)
values
(5.00,0.10,1),
(10.00,10.00,1),
(20.00,1.30,1),
(50.00,1.31,1),
(100.00,1.32,1),
(300.00,1.36,1),
(600.00,1.42,1),
(1000.00,1.50,1),
(1500.00,1.60,1),
(2000.00,1.70,1),
(2500.00,1.80,1),
(3000.00,1.90,1),
(4000.00,2.10,1),
(5000.00,2.30,1);
-- My current solution is a bad one. It does a CTE that sums over all events but does not account
-- for the feed conversion rate. That means that the average weight used to look up the feed
-- conversion rate will diverge more and more from reality the further into the season we get.
-- This is why it is important to look at the output, the average weight, of the previous row.
-- We start by summing up all the transfer and feed events to get a rough average_weight_g.
with estimate as (
select
timestamp,
farm,
total_weight_g as transaction_size_g,
growth_table,
sum(total_weight_g) over (order by timestamp) as sum_weight_g,
sum(number_fish) over (order by timestamp) as sum_number_fish,
sum(total_weight_g) over (order by timestamp) / sum(number_fish) over (order by timestamp) as average_weight_g
from
combined_feed_and_transfer_history
)
select
timestamp,
sum_number_fish,
transaction_size_g as trans_g,
sum_weight_g,
closest_lookup_table_weight.average_weight_g as lookup_g,
converted_weight_g as conv_g,
sum(converted_weight_g) over (order by timestamp) as sum_conv_g,
sum(converted_weight_g) over (order by timestamp) / sum_number_fish as sum_average_g
from
estimate
join lateral ( -- We then use this estimated_average_weight to look up the closest constant in the growth coefficient table.
(select gc.average_weight_g - estimate.average_weight_g as diff, gc.average_weight_g from growth_coefficients gc where gc.average_weight_g >= estimate.average_weight_g order by gc.average_weight_g asc limit 1)
union all
(select estimate.average_weight_g - gc.average_weight_g as diff, gc.average_weight_g from growth_coefficients gc where gc.average_weight_g <= estimate.average_weight_g order by gc.average_weight_g desc limit 1)
order by diff
limit 1
) as closest_lookup_table_weight
on true
join lateral ( -- If the historical event is a feeding we need to lookup the feed conversion rate.
select case when estimate.growth_table is null then 1
else (select gc.feed_conversion_rate
from growth_coefficients gc
where gc.growth_table = estimate.growth_table
and gc.average_weight_g = closest_lookup_table_weight.average_weight_g)
end as feed_conversion_rate
) as growth_coefficient
on true
join lateral (
select feed_conversion_rate * transaction_size_g as converted_weight_g
) as converted_weight_g
on true;
At the very bottom is my current "solution". With the above example data the sum_conv_g should end up being 5.6, but because the aggregate used for the lookup does not account for the conversion rate, sum_conv_g ends up at 45.2 instead.
One idea I had: is there perhaps something like query-local variables that could store the sum_average_g between rows? There's always the escape hatch of pulling the transactions out into my general-purpose programming language, Clojure, and solving it there, but it would be neat if it could be solved entirely within the database.
You have to formulate a recursive subquery. I posted a simplified version of this question over at the DBA SE and got the answer there. The answer to that question can be found here and can be expanded to this more complicated question, though I would wager that no one will ever have the interest to do that.
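To illustrate what "recursive" buys you here, the following is a deliberately tiny sketch and not the fish-farm solution itself. The inline tables feed and rates and their numbers are invented. Each step of the recursive CTE sees the weight produced by the previous step and uses it for the rate lookup, which is exactly the "output of the previous row" a window function cannot provide.

```sql
WITH RECURSIVE
feed(n, amount_g) AS (              -- hypothetical feedings, numbered in time order
    VALUES (1, 1.0), (2, 1.0), (3, 1.0)
),
rates(min_weight_g, rate) AS (      -- hypothetical lookup table
    VALUES (0.0, 0.10), (5.2, 0.50)
),
walk AS (
    -- Starting state: 5 g before any feeding.
    SELECT 0 AS n, 5.0::numeric AS weight_g
    UNION ALL
    -- Next state: previous weight plus feed times the rate that
    -- applies to the previous weight.
    SELECT f.n,
           w.weight_g + f.amount_g * (SELECT r.rate
                                      FROM rates r
                                      WHERE r.min_weight_g <= w.weight_g
                                      ORDER BY r.min_weight_g DESC
                                      LIMIT 1)
    FROM walk w
    JOIN feed f ON f.n = w.n + 1
)
SELECT n, weight_g
FROM walk
ORDER BY n;
```

The weights go 5.0, 5.1, 5.2 and then 5.7: the third feeding picks the higher rate because the weight carried over from the second step has reached 5.2. In the real schema the row number would come from row_number() over the combined history ordered by timestamp.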

postgresql Update with SUM

In the given simplified example, I would like to update ‘total_score’ for each event in the events table with the sum of the scores of the users who participated in that event.
Using a cursor for this is easy to comprehend and implement, but I would like to refactor this to a set-based approach, using SELECT / UPDATE on the entire column, as the cursor is just too slow and probably bad practice in this case.
I would be grateful if you could not only provide the required query, but also explain or link to explanations of how to think in a ‘set-based’ manner rather than a procedural, cursor-based one.
POSTGRESQL version = 13
SETUP:
The tables are simplified as much as possible; there are no PKs etc.
There are users, user activities and events in which users can participate. Every event is unique and takes place at a certain time on a certain world.
If a user's activity on a certain world covers the whole event duration, that user is considered to have participated in the event. For each event I need to sum the scores of all users who participated and then update total_score with the calculated sum.
TABLES:
Analyzed users table (contains selected users from the whole user base who meet certain criteria, e.g. age). The users inserted below already meet those requirements.
CREATE TABLE analyzed_user
(
user_id bigint NOT NULL,
score numeric
);
INSERT INTO analyzed_user VALUES(100, 400);
INSERT INTO analyzed_user VALUES(200, 800);
INSERT INTO analyzed_user VALUES(300, 1500);
Events table - events in which users could participate. Events always end before 23:59 (they never run over into the next day).
CREATE TABLE event
(
event_id bigint NOT NULL,
date date,
start_time time without time zone,
end_time time without time zone,
world varchar,
total_score numeric NOT NULL DEFAULT 0
);
INSERT INTO event VALUES (1, '2021-07-27', '08:00:00', '09:00:00', 'Earth', 0);
INSERT INTO event VALUES (2, '2021-07-27', '12:00:00', '13:00:00', 'Earth', 0);
INSERT INTO event VALUES (3, '2021-07-27', '14:00:00', '15:00:00', 'Mars', 0);
INSERT INTO event VALUES (4, '2021-07-27', '20:00:00', '21:00:00', 'Mars', 0);
User activity table (for simplicity, activities also end before 23:59; the server restarts at midnight and kicks everybody out).
CREATE TABLE activity
(
user_id bigint NOT NULL,
date date,
start_time time without time zone,
end_time time without time zone,
world varchar
);
INSERT INTO activity VALUES (100, '2021-07-27', '07:00:00', '14:00:00', 'Earth');
INSERT INTO activity VALUES (100, '2021-07-27', '23:00:00', '23:30:00', 'Earth');
INSERT INTO activity VALUES (100, '2021-07-27', '15:00:00', '22:00:00', 'Mars');
INSERT INTO activity VALUES (200, '2021-07-27', '7:30:00', '9:30:00', 'Earth');
INSERT INTO activity VALUES (200, '2021-07-27', '13:00:00', '16:30:00', 'Mars');
INSERT INTO activity VALUES (200, '2021-07-27', '18:00:00', '20:20:00', 'Mars');
INSERT INTO activity VALUES (300, '2021-07-27', '11:30:00', '14:30:00', 'Earth');
INSERT INTO activity VALUES (300, '2021-07-27', '17:00:00', '18:30:00', 'Mars');
INSERT INTO activity VALUES (300, '2021-07-27', '19:30:00', '22:30:00', 'Mars');
In short, the cursor approach (bad practice here?) would be the following:
1. Create temp table qualified_user (id, score) to keep track of the users participating in the current row's event.
2. Open cursor for select * from event
3. Fetch row
4. Truncate table qualified_user
5. Insert into qualified_user (user_id, score)
select a.user_id, u.score from activity a
left join analyzed_user u ON a.user_id = u.user_id
WHERE
a.world = record.world AND
a.start_time <= record.start_time AND
a.end_time >= record.end_time
i.e. gathering the users, and their scores, whose activity coincided with the current cursor record's (event's) time and world.
6. Select sum of score from qualified_user table
7. Update current of cursor with the calculated sum.
8. Repeat steps 3 to 7 as long as there are more rows.
So basically it goes event by event and, for each event (its start time, end time and world), selects the users whose activity coincides with it and sums their scores. The logic is fine, but the running time of the cursor is awful. This is something I could not properly express in a set-based manner.
My attempts were multiple variations of the following (also with a WITH clause etc.), but they always ended in a group by, which I believe is the problem(?) (aggregate functions like SUM are not allowed without a group by clause):
UPDATE event ee
SET total_score = gg.total
FROM (SELECT SUM(uu.score) AS total, aa.user_id, aa.world, aa.start_time, aa.end_time, aa.date
FROM activity aa
LEFT JOIN qualified_user uu ON aa.user_id = uu.user_id
GROUP BY aa.user_id, aa.world, aa.start_time, aa.end_time, aa.date
) AS gg
WHERE
gg.world = ee.world AND
gg.start_time <= ee.start_time AND
gg.end_time >= ee.end_time AND
gg.date = ee.date;
The correct results would be:
Event #1 total score = 400 + 800 = 1200
Event #2 total score = 400+1500 = 1900
Event #3 total score = 800
Event #4 total score = 400+1500 = 1900
But as you can check, the results from the above query are wrong (800/1500/800/400).
To me it is not logical to group the users' activities, but without the grouping I get an error.
I would be grateful if you could explain what is wrong with the query above and provide the proper query.
The two major problems I was dealing with here were:
aggregate functions are not allowed in update statements
reference problem from inner to outer queries
After failing, digging and learning about CTEs, I came up with the following solution.
WITH cte AS(
SELECT ee.event_id, ee.date, ee.start_time, ee.end_time, ee.world,
(SELECT SUM(score) FROM
activity aa
LEFT JOIN analyzed_user uu ON aa.user_id = uu.user_id
WHERE
aa.world = ee.world AND
aa.start_time <= ee.start_time AND
aa.end_time >= ee.end_time AND
aa.date = ee.date) AS total
FROM event ee)
UPDATE event eee
SET total_score = cte.total
FROM cte
WHERE eee.event_id = cte.event_id
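As for thinking in sets: instead of "for each event, find its users", describe all (event, qualifying activity) pairs at once with a join, collapse them to one row per event with GROUP BY, and then join that result back to the table being updated. A sketch of that shape, assuming the analyzed_user, event and activity tables above, could be:

```sql
UPDATE event e
SET total_score = s.total
FROM (
    -- One row per event: the sum over every activity that covers it.
    SELECT ev.event_id,
           COALESCE(SUM(u.score), 0) AS total   -- 0 when nobody took part
    FROM event ev
    LEFT JOIN activity a
           ON a.world = ev.world
          AND a.date = ev.date
          AND a.start_time <= ev.start_time
          AND a.end_time >= ev.end_time
    LEFT JOIN analyzed_user u
           ON u.user_id = a.user_id
    GROUP BY ev.event_id
) AS s
WHERE e.event_id = s.event_id;
```

Grouping by the event rather than by the activity is the difference from the failed attempt: the attempt produced one sum per activity row, so each event ended up with the score of a single matching activity. The COALESCE matters because total_score is declared NOT NULL and SUM over no rows yields NULL. With the sample data this should give 1200, 1900, 800 and 1900 for events 1 to 4.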

How to show chain element by order

I want to write a query that returns item ids ordered by their position in a chain.
The chain works like this: each element has a right and a left fk and an index.
Elements can be either appended or prepended to the chain, so the id from the table does not help to reconstruct the current chain order.
This is the db structure:
create table public.chain_data
(
id integer not null
constraint chain_data_pkey
primary key,
unique_identifiers_id integer not null
constraint fk_388447e52a0b191e
references public.unique_identifiers
on delete cascade,
chain_data_name varchar(255) not null,
carriage boolean default false,
left_id integer
constraint fk_388447e5e26cce02
references public.chain_data,
right_id integer
constraint fk_388447e554976835
references public.chain_data
);
alter table public.chain_data
owner to "universal-counter";
create index idx_388447e52a0b191e
on public.chain_data (unique_identifiers_id);
create unique index left_right_uniq_idx
on public.chain_data (right_id, left_id);
create unique index carriage_uniq_index
on public.chain_data (unique_identifiers_id, carriage)
where (carriage <> false);
And here is some example data. This chain began with id = 10, and then new items (rows) were prepended at the start of the chain. Each element has left and right dependencies. So the inserts are:
INSERT INTO public.chain_data (id, unique_identifiers_id, chain_data_name, carriage, left_id, right_id)
VALUES
(10, 8, 'dddd_2', true, 22, null),
(22, 8, 'shuba', false, 23, 10),
(24, 8, 'viktor', false, null, 23),
(23, 8, 'ivan', false, 24, 22);
Given this data, the query should return the ids like this:
24, 23, 22, 10
because the element with id = 24 is at the start of the chain, then, following the left and right dependencies, come 23, 22 and 10; id = 10 is the last element in the chain.
demo:db<>fiddle
You can use a recursive CTE for that:
WITH RECURSIVE chain AS (
SELECT id, right_id -- 1
FROM chain_data
WHERE left_id IS NULL
UNION
SELECT cd.id, cd.right_id -- 2
FROM chain_data cd
JOIN chain c ON c.right_id = cd.id
)
SELECT
string_agg(id::text, ', ') -- 3
FROM
chain
1. Initial part of the recursion: the record with the NULL value in left_id
2. The recursion part: join the current table on the previous step, using the previous right_id as the current id
3. Afterwards you can aggregate all fetched records with the string_agg() aggregation to return your string list.
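string_agg() without an ORDER BY does not promise any particular order, so if the list has to come out in chain order every time, one option is to carry a position counter through the recursion and sort by it. The column name pos is made up for this sketch, which assumes the chain_data table and sample rows above and a chain without cycles (with UNION ALL a cycle would recurse forever).

```sql
WITH RECURSIVE chain AS (
    SELECT id, right_id, 1 AS pos          -- start of the chain
    FROM chain_data
    WHERE left_id IS NULL
    UNION ALL
    SELECT cd.id, cd.right_id, c.pos + 1   -- next element to the right
    FROM chain_data cd
    JOIN chain c ON c.right_id = cd.id
)
SELECT string_agg(id::text, ', ' ORDER BY pos) AS ordered_ids
FROM chain;
```

For the four sample rows this yields 24, 23, 22, 10.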

How do I add a constraint with a where clause in PostgreSQL?

I have a table with reservations. A reservation is made of a date range and a time range. Reservations also belong to a couple of other models. I would like to add a constraint that makes it impossible to create reservations for overlapping times.
I have this:
CREATE TABLE reservations (
id integer NOT NULL,
dates daterange,
times timerange,
desk_id integer NOT NULL,
space_id integer,
);
ALTER TABLE reservations ADD EXCLUDE USING gist (dates WITH &&, times WITH &&)
It works well. But I want this constraint to be scoped to desk_id and space_id.
It should be possible to save a record for overlapping times/dates when the record is for a different desk_id or space_id.
How can I do this?
You can use the exact same mechanism you were using, and also add desk_id and space_id to the exclusion. For these two columns, use the = operator instead of the && operator (meaning overlaps). Using = on plain integers in a GiST index requires the btree_gist extension:
ALTER TABLE reservations
ADD EXCLUDE
USING gist (desk_id WITH =, space_id WITH =, dates WITH &&, times WITH &&) ;
These inserts will work, because they involve two different desk_id values:
INSERT INTO
reservations
(id, dates, times, desk_id, space_id)
VALUES
(1, '[20170101,20170101]'::daterange, '[10:00,11:00]'::timerange, 10, 10),
(2, '[20170101,20170101]'::daterange, '[10:30,11:00]'::timerange, 20, 10) ;
This insert will fail, because it has an overlapping time range and the same desk_id and space_id as an existing row:
INSERT INTO
reservations
(id, dates, times, desk_id, space_id)
VALUES
(3, '[20170101,20170101]'::daterange, '[10:00,11:00]'::timerange, 10, 10) ;
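For anyone trying this from scratch, the pieces the snippets above take for granted can be sketched as follows. PostgreSQL has no built-in timerange type, so it is assumed here to be a user-defined range over time. The table name reservations_demo and the constraint name no_double_booking are made up so the sketch does not collide with an existing reservations table.

```sql
-- GiST support for "=" on plain integers.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Assumed definition of the timerange type used in the question.
CREATE TYPE timerange AS RANGE (subtype = time);

CREATE TABLE reservations_demo (
    id       integer NOT NULL,
    dates    daterange,
    times    timerange,
    desk_id  integer NOT NULL,
    space_id integer,
    -- Two rows conflict only if ALL four comparisons are true.
    CONSTRAINT no_double_booking EXCLUDE USING gist (
        desk_id WITH =, space_id WITH =, dates WITH &&, times WITH &&
    )
);

-- Different desks, so both rows are accepted.
INSERT INTO reservations_demo (id, dates, times, desk_id, space_id) VALUES
    (1, '[2017-01-01,2017-01-01]', '[10:00,11:00]', 10, 10),
    (2, '[2017-01-01,2017-01-01]', '[10:30,11:00]', 20, 10);
```

One thing to keep in mind: space_id is nullable, and a comparison with NULL never counts as a conflict, so rows with a NULL space_id are not protected by this constraint.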