How to show chain elements in order - PostgreSQL

My goal is to create a query that returns item ids according to their position in a chain.
The chain logic is: each element has a left and a right foreign key, and an index.
Elements can be added to the chain with both append and prepend operations, so the table's id does not help to reconstruct the current chain order.
This is the DB structure:
create table public.chain_data
(
    id integer not null
        constraint chain_data_pkey
            primary key,
    unique_identifiers_id integer not null
        constraint fk_388447e52a0b191e
            references public.unique_identifiers
            on delete cascade,
    chain_data_name varchar(255) not null,
    carriage boolean default false,
    left_id integer -- nullable: the head of the chain has left_id = NULL
        constraint fk_388447e5e26cce02
            references public.chain_data,
    right_id integer
        constraint fk_388447e554976835
            references public.chain_data
);
alter table public.chain_data
    owner to "universal-counter";

create index idx_388447e52a0b191e
    on public.chain_data (unique_identifiers_id);

create unique index left_right_uniq_idx
    on public.chain_data (right_id, left_id);

create unique index carriage_uniq_index
    on public.chain_data (unique_identifiers_id, carriage)
    where (carriage <> false);
And example data: this chain began with id = 10, and then new items (rows) were prepended at the start of the chain. Each element has left and right dependencies. So the inserts are:
INSERT INTO public.chain_data (id, unique_identifiers_id, chain_data_name, carriage, left_id, right_id)
VALUES
(10, 8, 'dddd_2', true, 22, null),
(22, 8, 'shuba', false, 23, 10),
(24, 8, 'viktor', false, null, 23),
(23, 8, 'ivan', false, 24, 22);
Given this, the query should return the ids like this:
24, 23, 22, 10
because the element with id = 24 is at the start of the chain; following the left and right dependencies then gives 23 and 22, and id = 10 is the last element in the chain.

demo:db<>fiddle
You can use a recursive CTE for that:
WITH RECURSIVE chain AS (
    SELECT id, right_id           -- 1
    FROM chain_data
    WHERE left_id IS NULL
    UNION
    SELECT cd.id, cd.right_id     -- 2
    FROM chain_data cd
    JOIN chain c ON c.right_id = cd.id
)
SELECT
    string_agg(id::text, ', ')    -- 3
FROM
    chain
1. The non-recursive part: start with the record whose left_id is NULL, i.e. the head of the chain.
2. The recursive part: join chain_data to the previous step, using the previous step's right_id as the current id.
3. Afterwards you can aggregate all fetched records with the string_agg() aggregate to return your comma-separated list.
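Note that string_agg() without an ORDER BY relies on the rows arriving in recursion order, which PostgreSQL does not guarantee. If you want the ordering to be explicit, one option is to carry a position counter through the recursion (a sketch; the pos column is introduced here purely for illustration):
WITH RECURSIVE chain AS (
    SELECT id, right_id, 1 AS pos     -- head of the chain gets position 1
    FROM chain_data
    WHERE left_id IS NULL
    UNION ALL
    SELECT cd.id, cd.right_id, c.pos + 1
    FROM chain_data cd
    JOIN chain c ON c.right_id = cd.id
)
SELECT string_agg(id::text, ', ' ORDER BY pos)
FROM chain;
With the sample data this yields "24, 23, 22, 10" regardless of how the planner delivers the rows.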

Related

T-SQL How to exclude a child record when it's also its own parent record?

I have a scenario like the following:
create table #Example (
id int
, overall_id int
, parent_id int
, child_id int
);
insert into #Example values
(1, 25963, 491575090, 491575090)
,(2, 25963, 547952026, 491575090)
,(3, 25963, 547952026, 230085039)
,(4, 25963, 547952026, 547952026);
select e.*
from #Example as e;
drop table #Example;
I want to exclude the record with id "2" because that child is also its own parent record (see id "1").
I do not want to exclude 3, because that child record is not its own parent record. And I don't want to exclude 1 and 4 because those are their own parent records.
One problem is that in my business scenario I have no corresponding "ID" field; that is something I provided in this example so that I could refer to each row uniquely.
Any help on techniques to exclude record 2 would be greatly appreciated!
I still don't understand the question, but the expected result falls out of:
select *
from #Example as E
where not exists (
select 42
from #Example as IE
where
-- There is a row that is self parenting?!
IE.parent_id = IE.child_id and
-- The row under consideration is related in a child/parent way?
IE.child_id = E.child_id and
-- It isn't the same row as we're considering.
IE.id <> E.id );
See dbfiddle.
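For what it's worth, the same logic can also be written as an anti-join instead of NOT EXISTS (a sketch, equivalent for the data above):
select e.*
from #Example as e
left join #Example as sp
    on sp.parent_id = sp.child_id  -- a row that is self-parenting
   and sp.child_id = e.child_id    -- related to the row under consideration
   and sp.id <> e.id               -- but not the same row
where sp.id is null;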

Is it possible to look at the output of the previous row of a PostgreSQL query?

This is the question: is it possible to look at the outputs, i.e. what has been selected, from the previous row of a running SQL query in Postgres?
I know that lag() exists to look at the inputs, the "from" side of the query. I also know that a CTE, subquery or lateral join can solve most issues of this kind. But I think the problem I'm facing genuinely requires a peek at the output of the previous row. Why? Because the output of the current row depends on a constant from a lookup table, and the value used to look up that constant is an aggregate over all the previous rows. And if that lookup returns the wrong constant, all subsequent rows will be increasingly off from the expected value.
The whole rest of this text is a simplified example based on the problem I'm facing. It should be possible to feed it to PostgreSQL 12 and above and play around. I'm terribly sorry that it is as complicated as it is, but I think it is the simplest I can make it while still retaining the core issue: a lookup in a lookup table based on an aggregate over all previous rows, plus the fact that the "inventory" being tracked is modeled as a series of transactions of two discrete types.
The database itself exists to keep track of multiple fish farms, i.e. cages full of fish. Fish can be transferred between these farms, and the farms are fed roughly daily. Why not just carry the aggregate as a field in the table? Because it should be possible to swap out the lookup table after the season is over, to adjust it to better match reality.
-- A listing of all groups of fish ever grown.
create table farms (
id bigserial primary key,
start timestamp not null,
stop timestamp
);
insert into farms
(id, start)
values (
1, '2021-02-01T13:37'
);
-- A transfer of fish from one farm to another.
-- If the source is null the fish are transferred in from a fishery outside our system.
-- If the destination is null the fish are being slaughtered, i.e. removed from the system.
create table transfers (
source bigint references farms(id),
destination bigint references farms(id),
timestamp timestamp not null default current_timestamp,
total_weight_g bigint not null constraint positive_nonzero_total_weight_g check (total_weight_g > 0),
average_weight_g bigint not null constraint positive_nonzero_average_weight_g check (average_weight_g > 0),
number_fish bigint generated always as (total_weight_g / average_weight_g) stored
);
insert into transfers
(source, destination, timestamp, total_weight_g, average_weight_g)
values
(null, 1, '2021-02-01T16:38', 5, 5),
(null, 1, '2021-02-15T16:38', 500, 500);
-- Transactions of fish feed into a farm.
create table feedings (
id bigserial primary key,
growth_table bigint not null,
farm bigint not null references farms(id),
amount_g bigint not null constraint positive_nonzero_amount_g check (amount_g > 0),
timestamp timestamp not null
);
insert into feedings
(farm, growth_table, amount_g, timestamp)
values
(1, 1, 1, '2021-02-02T13:37'),
(1, 1, 1, '2021-02-03T13:37'),
(1, 1, 1, '2021-02-04T13:37'),
(1, 1, 1, '2021-02-05T13:37'),
(1, 1, 1, '2021-02-06T13:37'),
(1, 1, 1, '2021-02-07T13:37');
create view combined_feed_and_transfer_history as
with transfer_history as (
select timestamp, destination as farm, total_weight_g, average_weight_g, number_fish
from transfers as deposits
where deposits.destination = 1 -- TODO: This view only works for one farm, fix that.
union all
select timestamp, source as farm, -total_weight_g, -average_weight_g, -number_fish
from transfers as withdrawals
where withdrawals.source = 1
)
select timestamp, farm, total_weight_g, number_fish, average_weight_g, null as growth_table
from transfer_history
union all
select timestamp, farm, amount_g, 0 as number_fish, 0 as average_weight_g, growth_table
from feedings
order by timestamp;
-- Conversion tables from feed to gained weight.
create table growth_coefficients (
growth_table bigserial not null,
average_weight_g bigint not null constraint positive_nonzero_weight check (average_weight_g > 0),
feed_conversion_rate double precision not null constraint positive_foderkonverteringsfaktor check (feed_conversion_rate >= 0),
primary key(growth_table, average_weight_g)
);
insert into growth_coefficients
(average_weight_g, feed_conversion_rate, growth_table)
values
(5.00,0.10,1),
(10.00,10.00,1),
(20.00,1.30,1),
(50.00,1.31,1),
(100.00,1.32,1),
(300.00,1.36,1),
(600.00,1.42,1),
(1000.00,1.50,1),
(1500.00,1.60,1),
(2000.00,1.70,1),
(2500.00,1.80,1),
(3000.00,1.90,1),
(4000.00,2.10,1),
(5000.00,2.30,1);
-- My current solution is a bad one. It does a CTE that sums over all events but does not account
-- for the feed conversion rate. That means that the average weight used to look up the feed
-- conversion rate will diverge more and more from reality the further into the season we get.
-- This is why it is important to look at the output, the average weight, of the previous row.
-- We start by summing up all the transfer and feed events to get a rough average_weight_g.
with estimate as (
select
timestamp,
farm,
total_weight_g as transaction_size_g,
growth_table,
sum(total_weight_g) over (order by timestamp) as sum_weight_g,
sum(number_fish) over (order by timestamp) as sum_number_fish,
sum(total_weight_g) over (order by timestamp) / sum(number_fish) over (order by timestamp) as average_weight_g
from
combined_feed_and_transfer_history
)
select
timestamp,
sum_number_fish,
transaction_size_g as trans_g,
sum_weight_g,
closest_lookup_table_weight.average_weight_g as lookup_g,
converted_weight_g as conv_g,
sum(converted_weight_g) over (order by timestamp) as sum_conv_g,
sum(converted_weight_g) over (order by timestamp) / sum_number_fish as sum_average_g
from
estimate
join lateral ( -- We then use this estimated average_weight_g to look up the closest constant in the growth coefficient table.
    (select gc.average_weight_g - estimate.average_weight_g as diff, gc.average_weight_g
     from growth_coefficients gc
     where gc.average_weight_g >= estimate.average_weight_g
     order by gc.average_weight_g asc
     limit 1)
    union all
    (select estimate.average_weight_g - gc.average_weight_g as diff, gc.average_weight_g
     from growth_coefficients gc
     where gc.average_weight_g <= estimate.average_weight_g
     order by gc.average_weight_g desc
     limit 1)
    order by diff
    limit 1
) as closest_lookup_table_weight
on true
join lateral ( -- If the historical event is a feeding we need to look up the feed conversion rate.
    select case when estimate.growth_table is null then 1
                else (select feed_conversion_rate
                      from growth_coefficients gc
                      where gc.growth_table = estimate.growth_table
                        and gc.average_weight_g = closest_lookup_table_weight.average_weight_g)
           end as feed_conversion_rate
) as growth_coefficient
on true
join lateral (
select feed_conversion_rate * transaction_size_g as converted_weight_g
) as converted_weight_g
on true;
At the very bottom is my current "solution". With the example data above, sum_conv_g should end up being 5.6, but because the aggregate used for the lookup does not account for the conversion rate, sum_conv_g ends up at 45.2 instead.
One idea I had was that there is perhaps something like query-local variables one could use to store sum_average_g between rows? There's always the escape hatch of just pulling the transactions out into my general-purpose programming language, Clojure, and solving it there, but it would be neat if it could be solved entirely within the database.
You have to formulate a recursive subquery. I posted a simplified version of this question over at the DBA SE and got the answer there. The answer to that question can be found here and can be expanded to this more complicated question, though I would wager that no one will ever have the interest to do that.
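To sketch the pattern (not the full fish-farm solution): number the rows first, then let a recursive CTE step through them one at a time, so each step can read the previous step's computed output. In the sketch below, the events table and the lookup_rate() function are hypothetical stand-ins for combined_feed_and_transfer_history and the growth-coefficient lookup:
-- Hypothetical, minimal version of the recursive pattern:
-- events(ts, amount) stands in for combined_feed_and_transfer_history,
-- lookup_rate(x) stands in for the growth_coefficients lookup.
with recursive numbered as (
    select row_number() over (order by ts) as rn, ts, amount
    from events
),
walk as (
    -- First row: initialize the running output.
    select rn, ts, amount, amount::numeric as running_output
    from numbered
    where rn = 1
    union all
    -- Each further row: the lookup uses the PREVIOUS row's output.
    select n.rn, n.ts, n.amount,
           w.running_output + n.amount * lookup_rate(w.running_output)
    from numbered n
    join walk w on n.rn = w.rn + 1
)
select * from walk;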

PostgreSQL multicolumn index not fully used

I have a large (~110 million rows) table on PostgreSQL 12.3 whose relevant fields can be described by the following DDL:
CREATE TABLE tbl
(
item1_id integer,
item2_id integer,
item3_id integer,
item4_id integer,
type_id integer
)
One of the queries we execute often is:
SELECT type_id, item1_id, item2_id, item3_id, item4_id
FROM tbl
WHERE
type_id IS NOT NULL
AND item1_id IN (1, 2, 3)
AND (
item2_id IN (4, 5, 6)
OR item2_id IS NULL
)
AND (
item3_id IN (7, 8, 9)
OR item3_id IS NULL
)
AND (
item4_id IN (10, 11, 12)
OR item4_id IS NULL
)
Although we have indexes for each of the individual columns, the query is still relatively slow (a couple of seconds). Hoping to optimize this, I created the following index:
CREATE INDEX tbl_item_ids
ON public.tbl USING btree
(item1_id ASC, item2_id ASC, item3_id ASC, item4_id ASC)
WHERE type_id IS NOT NULL;
Unfortunately the query performance barely improved - EXPLAIN tells me this is because although an index scan is done with this newly created index, only item1_id is used as an Index Cond, whereas all the other filters are applied at table level (i.e. plain Filter).
I'm not sure why the index is not used in its entirety (or at least for more than the item1_id column). Is there an obvious reason for this? Is there a way I can restructure the index or the query itself to help with performance?
A multi-column index can only be used for more than the first column if the condition on the first column uses an equality comparison (=). IN or = ANY does not qualify.
So you will be better off with individual indexes for each column, which can be combined with a bitmap OR.
You should try to avoid OR in the WHERE condition, perhaps with
WHERE coalesce(item2_id, -1) IN (-1, 4, 5, 6)
where -1 is a value that doesn't occur. Then you could use an index on the coalesce expression.
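A sketch of what that could look like, assuming -1 never occurs in the data (index names are illustrative):
CREATE INDEX tbl_item2_coalesce_idx ON tbl (coalesce(item2_id, -1));
CREATE INDEX tbl_item3_coalesce_idx ON tbl (coalesce(item3_id, -1));
CREATE INDEX tbl_item4_coalesce_idx ON tbl (coalesce(item4_id, -1));

SELECT type_id, item1_id, item2_id, item3_id, item4_id
FROM tbl
WHERE type_id IS NOT NULL
  AND item1_id IN (1, 2, 3)
  AND coalesce(item2_id, -1) IN (-1, 4, 5, 6)
  AND coalesce(item3_id, -1) IN (-1, 7, 8, 9)
  AND coalesce(item4_id, -1) IN (-1, 10, 11, 12);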

postgres constraint based on SELECT condition

Hello! I think I have a somewhat tricky Postgres situation:
Parents have children. Children have an age, and a flag that marks them as appreciated.
The rule: a parent can't appreciate two children of the same age!
My question is: how do I enforce this rule?
Current schema:
CREATE TABLE parent (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL
);
CREATE TABLE child (
id SERIAL PRIMARY KEY,
parent INTEGER REFERENCES parent(id) NOT NULL,
name VARCHAR(50) NOT NULL,
age INTEGER NOT NULL,
appreciated BOOLEAN NOT NULL
);
Put some values in:
INSERT INTO parent(name) VALUES
('bob'), -- assume bob's id = 0
('mary'); -- assume mary's id = 1
INSERT INTO child(parent, name, age, appreciated) VALUES
(0, 'child1', 10, FALSE), -- Bob has children 1, 2, 3
(0, 'child2', 10, FALSE),
(0, 'child3', 15, FALSE),
(1, 'child4', 20, FALSE), -- Mary has children 4, 5, 6
(1, 'child5', 20, FALSE),
(1, 'child6', 10, FALSE);
All fine so far. No child is appreciated, which is always valid.
Mary is allowed to appreciate child6:
UPDATE child SET appreciated=TRUE WHERE name='child6';
Bob is allowed to appreciate child2. child2 is the same age as child6 (who is already appreciated), but child6 is not Bob's child.
UPDATE child SET appreciated=TRUE WHERE name='child2';
Bob now cannot appreciate child1, because child1 is the same age as child2, and child2 is already appreciated:
UPDATE child SET appreciated=TRUE WHERE name='child1'; -- This needs to FAIL!
How do I enforce such a constraint? I'm open to all kinds of solutions, but modifying the general schema is not an option.
Thanks in advance!
How about a UNIQUE partial index, like so:
CREATE UNIQUE INDEX ON child(parent,age) WHERE appreciated;
So every (parent, age) pair has to be unique, but only among appreciated children.
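With the sample data above, the sequence of updates then behaves as required (the index name is illustrative):
CREATE UNIQUE INDEX child_appreciated_uniq
    ON child(parent, age) WHERE appreciated;

UPDATE child SET appreciated = TRUE WHERE name = 'child6'; -- ok
UPDATE child SET appreciated = TRUE WHERE name = 'child2'; -- ok: different parent
UPDATE child SET appreciated = TRUE WHERE name = 'child1'; -- fails: duplicate (parent, age)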
You might want to use a trigger that fires BEFORE the insert/update and that raises an error if the constraint is not satisfied.
In PostgreSQL I suppose it should look something like this:
create function check_appreciated() returns trigger as $$
declare
    dummy integer;
begin
    if new.appreciated then
        -- Count other appreciated children of the same parent with the same age.
        select count(*) into dummy
        from child
        where appreciated
          and parent = new.parent
          and age = new.age
          and id <> new.id;
        if dummy > 0 then
            raise exception 'Too many appreciated children';
        end if;
    end if;
    return new;
end;
$$ language plpgsql;

create trigger check_appreciated_trg
    before insert or update on child
    for each row
    execute procedure check_appreciated();
Some documentation
The simplest thing I would think to do is add a flag, say grateful (default false), to the parent model, and when child.appreciated becomes true, also set parent.grateful = true.
Check the value of parent.grateful in the function that acts on child.appreciated.
If parent.grateful == true,
return "sorry, this parent has already shown their appreciation."
LOL this is an interesting concept though. Good Luck. :)

Constraint on sum from rows

I've got a table in PostgreSQL 9.4:
user_votes (
    user_id int,
    portfolio_id int,
    car_id int,
    vote int
)
Is it possible to put a constraint on the table so that a user has a maximum of 99 points to vote with in each portfolio?
This means that a user can have multiple rows with the same user_id and portfolio_id, but different car_id and vote. The sum of the votes should never exceed 99, but it can be spread among different cars.
So doing:
INSERT INTO user_votes (user_id, portfolio_id, car_id, vote) VALUES
(1, 1, 1, 20),
(1, 1, 7, 40),
(1, 1, 9, 25)
would all be allowed, but trying to add another row that pushes the total over 99 votes should fail:
INSERT INTO user_votes (user_id, portfolio_id, car_id, vote) VALUES
(1, 1, 21, 40)
Unfortunately no; if you try to create such a constraint you will see this error message:
ERROR: aggregate functions are not allowed in check constraints
But the wonderful thing about PostgreSQL is that there is always more than one way to skin a cat. You can use a BEFORE trigger to check that the data you are trying to insert fulfills your requirements.
Row-level triggers fired BEFORE can return null to signal the trigger
manager to skip the rest of the operation for this row (i.e.,
subsequent triggers are not fired, and the INSERT/UPDATE/DELETE does
not occur for this row). If a nonnull value is returned then the
operation proceeds with that row value.
Inside your trigger you would sum up the votes the user has already placed in that portfolio:
SELECT COALESCE(SUM(vote), 0) INTO vote_sum FROM user_votes WHERE user_id = NEW.user_id AND portfolio_id = NEW.portfolio_id;
Now if vote_sum plus NEW.vote exceeds 99 you return NULL and the row will not be inserted.
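Put together, a minimal sketch of such a trigger could look like this (function and trigger names are illustrative; note that PostgreSQL 9.4 uses EXECUTE PROCEDURE):
create function check_vote_budget() returns trigger as $$
declare
    vote_sum integer;
begin
    -- Sum the votes this user has already placed in this portfolio.
    select coalesce(sum(vote), 0) into vote_sum
    from user_votes
    where user_id = new.user_id
      and portfolio_id = new.portfolio_id;
    -- Skip the insert if the new vote would push the total past 99.
    if vote_sum + new.vote > 99 then
        return null;
    end if;
    return new;
end;
$$ language plpgsql;

create trigger enforce_vote_budget
    before insert on user_votes
    for each row
    execute procedure check_vote_budget();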