Summation graph over tree in PostgreSQL - postgresql

I have a salaries table and a department tree stored as (child, parent) pairs. I need to walk the tree and compute, for every node, the sum of the salaries of all its descendants. My query appears to double-count values.
Test data:
CREATE TABLE deps (
id serial,
child varchar,
parent varchar);
CREATE TABLE salaries (
name varchar,
salary numeric);
INSERT INTO salaries(name, salary) VALUES
('manager1', 100),
('manager2', 100),
('manager3', 100),
('manager4', 100),
('manager5', 100),
('manager6', 100),
('manager7', 100),
('manager8', 100),
('manager9', 100),
('engeneer1', 100),
('engeneer2', 100),
('engeneer3', 100),
('engeneer4', 100),
('engeneer5', 100),
('engeneer6', 100),
('engeneer7', 100),
('engeneer8', 100),
('engeneer9', 100),
('engeneer10', 100),
('accountant1', 100),
('accountant2', 100),
('accountant3', 100),
('accountant4', 100);
insert INTO deps(child, parent) VALUES
('manager1', 'management'),
('manager2', 'management'),
('manager3', 'management'),
('manager4', 'management'),
('management_team1', 'management'),
('management_team1_1', 'management_team1'),
('management_team1_2', 'management_team1'),
('manager5', 'management_team1_1'),
('manager6', 'management_team1_1'),
('manager7', 'management_team1'),
('manager8', 'management_team1_2'),
('manager9', 'management_team1_2'),
('engeneer1', 'it'),
('engeneer2', 'it'),
('engeneer3', 'it'),
('engeneer4', 'it'),
('it_dep1', 'it'),
('it_dep2', 'it'),
('engeneer5', 'it_dep1'),
('engeneer6', 'it_dep1'),
('engeneer7', 'it_dep2'),
('engeneer8', 'it_dep2'),
('it_dep3', 'it_dep2'),
('engeneer9', 'it_dep3'),
('engeneer10', 'it_dep3'),
('accountant1', 'accounts'),
('accountant2', 'accounts'),
('accountant3', 'accounts'),
('accountant4', 'accounts'),
('management', NULL),
('accounts', NULL),
('it', NULL);
Query:
WITH RECURSIVE tree ("depth", parent, child) as(
SELECT
0,
parent,
child
FROM deps
WHERE parent IS NULL
UNION
SELECT
"depth" + 1,
tree.child,
deps.child
FROM tree JOIN deps ON tree.child = deps.parent
),
graph (child, parent, "depth", value) as(
-- non recursive
SELECT
tree.child,
tree.parent,
tree."depth",
salaries.salary -- outbound amount from the node, which equals the salary at this depth
FROM
tree JOIN salaries ON salaries.name = tree.child
WHERE tree."depth" = (SELECT max("depth") FROM tree) -- we start from the deepest level of the hierarchy
UNION
-- recursive
SELECT
current_tree.child,
current_tree.parent,
current_tree."depth",
COALESCE(current_tree.salary,
sum(graph.value) OVER (PARTITION BY current_tree.child)
) -- outbound amount from the node, which equals the sum of all incoming amounts
FROM
graph,
LATERAL (SELECT * FROM tree LEFT JOIN salaries ON salaries.name = tree.child WHERE tree."depth" = graph."depth" - 1) AS current_tree
LEFT JOIN LATERAL (SELECT * FROM tree WHERE tree."depth" = graph."depth") AS previous_tree
ON current_tree.child = previous_tree.parent
-- WHERE graph."depth" = (SELECT max("depth") FROM graph)
)
SELECT * FROM graph
WHERE graph."depth" = (SELECT max("depth") FROM graph)
This gives an error, since referencing the graph CTE twice inside its own recursive term is not allowed.
DB Fiddle sample
I expect to see 32 rows, one per relation in the initial tree, each carrying the sum over all of its child nodes.
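One possible way around the double reference, sketched here under assumptions rather than taken from the question: first expand every node into (ancestor, descendant) pairs with a recursive CTE that references itself only once, then join salaries and aggregate. The sketch uses a reduced copy of the test data and runs the SQL through Python's sqlite3 so it can be executed anywhere; the CTE name closure and its column names are made up for illustration. The query itself uses only standard recursive-CTE syntax that PostgreSQL also accepts.

```python
import sqlite3

# Reduced copy of the question's data: part of the "it" branch plus "accounts".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deps (child TEXT, parent TEXT);
CREATE TABLE salaries (name TEXT, salary NUMERIC);
INSERT INTO deps (child, parent) VALUES
  ('it', NULL), ('accounts', NULL),
  ('engeneer1', 'it'), ('it_dep2', 'it'),
  ('engeneer7', 'it_dep2'), ('it_dep3', 'it_dep2'),
  ('engeneer9', 'it_dep3'), ('engeneer10', 'it_dep3'),
  ('accountant1', 'accounts');
INSERT INTO salaries (name, salary) VALUES
  ('engeneer1', 100), ('engeneer7', 100), ('engeneer9', 100),
  ('engeneer10', 100), ('accountant1', 100);
""")

query = """
WITH RECURSIVE closure (ancestor, descendant) AS (
    -- every node is its own descendant, so leaves keep their own salary
    SELECT child, child FROM deps
    UNION ALL
    -- extend each pair by one level; closure is referenced only once
    SELECT c.ancestor, d.child
    FROM closure AS c
    JOIN deps AS d ON d.parent = c.descendant
)
SELECT c.ancestor, COALESCE(SUM(s.salary), 0) AS total
FROM closure AS c
LEFT JOIN salaries AS s ON s.name = c.descendant
GROUP BY c.ancestor
ORDER BY c.ancestor
"""

# Map each node to the salary total of its whole subtree.
totals = dict(conn.execute(query).fetchall())
for node, total in totals.items():
    print(node, total)
```

Because each (ancestor, descendant) pair appears exactly once in a tree, no salary is counted twice for the same ancestor. The result has one row per distinct child in deps, so on the full test data that should be the expected 32 rows.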

Related

Postgresql data calculation

I'm trying to do some calculations using Postgres, but with no success so far. My query goes something like this:
select ....,
(select json_agg(data_table)
from (..... HERE GOES THE RESULT OF THE CALCULATION + a lot of business and data.... ) as data_table)
from foo
To illustrate, here is an example table:
create temp table tbdata (id smallint, parent_id smallint, value numeric(25,2));
insert into tbdata values(1, null, 100), (2, 1, 50), (3, 1, 49), (4, 3, 20), (5, 3, 29);
select * from tbdata;
I need to calculate the difference between the sum of the siblings and the value of their parent. Example:
ID 2(50) + ID 3(49) = 99
ID 1(parent) = 100
So I need to add 1 to any one of the children (let's say ID 3), and the result will be:
ID 2(50) + ID 3(49 + 1) = 100
ID 1(parent) = 100
After that, ID 3 has changed, so I need to update one of its children:
ID 4(20) + ID 5(29) = 49
ID 3(parent) = 50
Then again, I update the value of ID 5 with the difference (50 - 49):
ID 4(20) + ID 5(29 + 1) = 50
ID 3(parent) = 50
I tried using recursive queries, window functions, and CTEs, but I always get stuck on something. I was able to do it using a function with a loop, but I don't want to do that.
Is there any way I can do it with a single SQL statement?
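The walk-through above can be sketched as a single top-down recursive query, under a few assumptions that are not in the question: the difference always goes to the child with the highest id (the question says any child will do), and the helper CTE names sums and adjusted are invented for illustration. The SQL is run through Python's sqlite3 only to make the sketch executable; it sticks to standard recursive-CTE syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbdata (id INTEGER, parent_id INTEGER, value NUMERIC);
INSERT INTO tbdata VALUES
  (1, NULL, 100), (2, 1, 50), (3, 1, 49), (4, 3, 20), (5, 3, 29);
""")

query = """
WITH RECURSIVE
sums (parent_id, total, pick) AS (
    -- original sum of each sibling group, and the child that absorbs the difference
    -- (assumption: the child with the highest id)
    SELECT parent_id, SUM(value), MAX(id)
    FROM tbdata
    WHERE parent_id IS NOT NULL
    GROUP BY parent_id
),
adjusted (id, value) AS (
    -- the root keeps its value
    SELECT id, value FROM tbdata WHERE parent_id IS NULL
    UNION ALL
    -- a child keeps its value, except the picked child, which also receives
    -- (adjusted parent value - original sum of the siblings)
    SELECT t.id,
           t.value + CASE WHEN t.id = s.pick THEN a.value - s.total ELSE 0 END
    FROM adjusted AS a
    JOIN tbdata AS t ON t.parent_id = a.id
    JOIN sums AS s ON s.parent_id = a.id
)
SELECT id, value FROM adjusted ORDER BY id
"""

# Map each id to its adjusted value.
adjusted = dict(conn.execute(query).fetchall())
print(adjusted)
```

Since the adjustment of a parent is known before its children are visited, the difference propagates down the tree level by level, which matches the two manual steps in the example (ID 3 becomes 50, then ID 5 becomes 30).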

I keep getting an unexpected SELECT error in my SnowSQL statement

I keep getting an unexpected SELECT error as well as an unexpected ON error at lines 61 and 64 of my SnowSQL statement.
I'm not sure why; if anyone can help, that would be great. I've added the relevant portion of my SnowSQL statement below.
I'm trying to use a SELECT statement within a WHERE clause. Is there a way to do this?
AS select
t1.sunday_date,
t1.sunday_year_month,
t1.sunday_month,
t1.dc,
t1.source_sku,
t1.Product_Family,
t1.Product_type,
t1.Product_Subtype,
t1.Material,
t1.Color,
t1.Size,
t1.EOL_Date,
t1.NPI_Date,
t1.period_start,
t1.period_month,
IIF( t4.period_start < t1.sunday_date, iif(ISNULL(ta.actual_quantity), 0, ta.actual_quantity),
IIF(ISNULL(tfc.SOPFCSTOVERRIDE ), iif(ISNULL(tf.Period_Start), 0, tf.dc_forecast) , tfc.SOPFCSTOVERRIDE
)) AS forecast_updated,
iif(ISNULL(tf.Period_Start),t4.period_start,tf.Period_Start) AS period_start_forecast,
iif(ISNULL(ti.VALUATED_UNRESTRICTED_USE_STOCK), 0, ti.VALUATED_UNRESTRICTED_USE_STOCK) AS inventory_quantity,
iif(ISNULL(ti.HCI_DS_KEYFIGURE_QUANTITY), 0, ti.HCI_DS_KEYFIGURE_QUANTITY) AS in_transit_quantity,
iif(ISNULL(ti.planned_quantity), 0, ti.planned_quantity) AS inbound_quantity,
iif(ISNULL(tbac.backlog_ecomm ), 0, tbac.backlog_ecomm) + iif(ISNULL(tbac_sap.backlog_sap_open), 0, tbac_sap.backlog_sap_open) AS backlog_quantity,
iif(ISNULL(ta.actual_quantity), 0, ta.actual_quantity) AS actual_quantity,
iif(ISNULL(tso.open_orders), 0, tso.open_orders) AS open_orders,
iif(ISNULL(tf.Period_Start), 0, tf.dc_forecast) AS forecast,
tfc.SOPFCSTOVERRIDE AS forecast_consumption,
iif(ISNULL(tpc.SHIP_DATE), 0, tpc.SHIP_DATE) AS production_current_week,
iif(ISNULL(tpc.SHIP_DATE), 0, tpc.SHIP_DATE) AS production_next_week,
NOW() AS updated_timestamp
FROM ( ( ( ( ( ( ( ( (
SELECT
e.sunday_date,
e.sunday_month,
e.sunday_year_month,
d.dc,
c.SOURCE_SKU,
c.Product_Family,
c.Product_Type,
c.Product_Subtype,
c.Material,
c.Color,
c.Size,
c.EOL_Date,
c.NPI_Date,
b.period_start,
b.period_month
FROM
(SELECT sunday_date, sunday_month, sunday_year_month FROM bas_report_date) AS e,
(SELECT distinct Week_Date AS period_start, DateSerial('445_Year','445_Month',1) AS period_month from inv_bas_445_Month_Alignment) AS b,
(SELECT source_sku AS source_sku, Product_Family, Product_Type, Product_Subtype, Material, Color, Size, EOL_Date, NPI_Date from inv_vw_product_dev ) AS c,
(SELECT dc AS dc FROM inv_bas_dc_site_lookup) AS d
WHERE b.period_start >=
( select
MIN(mt.Reference_Date )
FROM BAS_report_date tr
INNER JOIN inv_bas_445_Month_Alignment mt ON tr.sunday_month = DateSerial(mt.'445_Year',mt.'445_Month,1')
)
AND b.period_start <= DateAdd("ww", 26,e.sunday_date)
) t1
LEFT JOIN
(
SELECT
MATERIAL_NUMBER,
CINT(LOCATION_NUMBER) AS Int_Location_ID,
HCI_DS_KEYFIGURE_DATE,
HCI_DS_KEYFIGURE_QUANTITY,
PLANNED_QUANTITY,
VALUATED_UNRESTRICTED_USE_STOCK
FROM inv_vw_ibp_transit_inventorry_dev
) ti
You can replace the DateSerial() function
(which is from VBA / MS Access / Excel from the Microsoft universe)
with DATE_FROM_PARTS().
DATE_FROM_PARTS() also supports the less obvious out-of-range arithmetic of DateSerial():
DateSerial(2020, 1, 1 - 1) gets you New Year's Eve, the day before New Year's Day (2019-12-31).
DATE_FROM_PARTS(2020, 1 - 1, 1 - 1) goes back one month and one day from New Year's Day (2019-11-30).
DATE_FROM_PARTS(y, m + 1, 0) is the end of month m (EOM).
And so on.
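To make the rollover rule concrete, here is a small Python emulation of it. This is not Snowflake code: the helper date_from_parts below is a hypothetical stand-in that only mimics how out-of-range month and day values roll over into neighbouring months and years.

```python
from datetime import date, timedelta


def date_from_parts(year: int, month: int, day: int) -> date:
    """Hypothetical emulation of month/day rollover (not the Snowflake function)."""
    # Normalise the month first: month 0 is December of the previous year, etc.
    months = year * 12 + (month - 1)
    first_of_month = date(months // 12, months % 12 + 1, 1)
    # Then offset the day: day 1 is the first of the month, day 0 the day before.
    return first_of_month + timedelta(days=day - 1)


new_years_eve = date_from_parts(2020, 1, 1 - 1)
month_and_day_back = date_from_parts(2020, 1 - 1, 1 - 1)
end_of_february = date_from_parts(2020, 2 + 1, 0)

print(new_years_eve, month_and_day_back, end_of_february)
```

The month is resolved before the day offset is applied, which is why a day of 0 always lands on the last day of the preceding month.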

PostgreSQL missing from clause entry for table in function

I have the following function that handles a trigger on insert or update:
CREATE OR REPLACE FUNCTION ticketChangeFunc() RETURNS TRIGGER AS $$
BEGIN
INSERT INTO dw.FactSalesHeader (DateKey, LocationKey, EmployeeKey, AppointmentTypeKey, TicketStatusTypeKey, TicketID,TotalAmount, IsNewPatient, IsActive)
SELECT d.date_key, COALESCE(l.LocationKey, 0), COALESCE(e.EmployeeKey, 0), COALESCE(a.AppointmentTypeKey, 0), COALESCE(ts.TicketStatusTypeKey, 0), NEW.ticket_id,
NEW.total_amount, NEW.is_new_patient, NEW.is_active
FROM db1.tickets t
JOIN dw.DimDate d on t.ticket_date = d.db_date
LEFT JOIN dw.DimLocation l on NEW.location_id = l.LocationID
LEFT JOIN dw.DimEmployee e on NEW.counselor_id = e.EmployeeID
LEFT JOIN dw.DimAppointmentType a on NEW.office_visit_ind = a.AppointmentTypeFlagAttribute
LEFT JOIN dw.DimTicketStatus ts on NEW.ticket2_status = ts.TicketStatusTypeID
ON CONFLICT (TicketID)
DO UPDATE
SET DateKey = d.date_key,
LocationKey = COALESCE(l.LocationKey, 0),
EmployeeKey = COALESCE(e.EmployeeKey, 0),
AppointmentTypeKey = COALESCE(a.AppointmentTypeKey, 0),
TicketStatusTypeKey = COALESCE(ts.TicketStatusTypeKey, 0),
TotalAmount = NEW.total_amount,
IsNewPatient = NEW.is_new_patient,
IsActive = NEW.is_active;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
I get the error: missing FROM-clause entry for table "d"
I thought maybe it was somehow related to the db1.tickets table that I'm not technically using in the query. I've tried taking out that bit and just doing FROM dw.DimDate with WHERE d.date_key = NEW.ticket_date, and that gives the same error.
There are a lot of these types of questions on SO, but I haven't found one that addresses this particular scenario, as I feel like it has something to do with it being inside of this trigger function.
The function gets called from this:
CREATE TRIGGER trg_tickets AFTER INSERT OR UPDATE ON db1.tickets
FOR EACH ROW EXECUTE PROCEDURE ticketChangeFunc();
I'm also wondering if using the ON CONFLICT UPDATE clause updates every row when it finds a match regardless of whether the values differ? Is there a performance impact to this, and if so, is there a way to check for equality of each field and do nothing if there are no differences?
For those who may happen to run into this in the future, the solution for me was to add a conditional to the function and handle inserts separately from updates:
CREATE OR REPLACE FUNCTION ticketChangeFunc() RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO dw.FactSalesHeader (DateKey, LocationKey, EmployeeKey, AppointmentTypeKey, TicketStatusTypeKey, TicketID, TotalAmount, IsNewPatient, IsActive)
SELECT d.date_key, COALESCE(l.LocationKey, 0), COALESCE(e.EmployeeKey, 0), COALESCE(a.AppointmentTypeKey, 0), COALESCE(ts.TicketStatusTypeKey, 0), NEW.ticket_id,
NEW.total_amount, NEW.is_new_patient, NEW.is_active
-- NEW already holds the ticket row; joining db1.tickets without a filter would return one row per existing ticket
FROM dw.DimDate d
LEFT JOIN dw.DimLocation l ON NEW.location_id = l.LocationID
LEFT JOIN dw.DimEmployee e ON NEW.counselor_id = e.EmployeeID
LEFT JOIN dw.DimAppointmentType a ON NEW.office_visit_ind = a.AppointmentTypeFlagAttribute
LEFT JOIN dw.DimTicketStatus ts ON NEW.ticket2_status = ts.TicketStatusTypeID
WHERE d.db_date = NEW.ticket_date;
ELSE
-- Alias the target table instead of repeating it in FROM: repeating it is an unconstrained self-join, so every row would be updated
UPDATE dw.FactSalesHeader AS hdr
SET DateKey = CAST(TO_CHAR(NEW.ticket_date, 'YYYYMMDD') as integer),
LocationKey = COALESCE(l.LocationKey, 0),
EmployeeKey = COALESCE(e.EmployeeKey, 0),
AppointmentTypeKey = COALESCE(a.AppointmentTypeKey, 0),
TicketStatusTypeKey = COALESCE(ts.TicketStatusTypeKey, 0),
TotalAmount = NEW.total_amount,
IsNewPatient = NEW.is_new_patient,
IsActive = NEW.is_active
-- single-row driver so the dimension lookups can stay LEFT JOINs
FROM (SELECT 1) AS one_row
LEFT JOIN dw.DimLocation l ON NEW.location_id = l.LocationID
LEFT JOIN dw.DimEmployee e ON NEW.counselor_id = e.EmployeeID
LEFT JOIN dw.DimAppointmentType a ON NEW.office_visit_ind = a.AppointmentTypeFlagAttribute
LEFT JOIN dw.DimTicketStatus ts ON NEW.ticket2_status = ts.TicketStatusTypeID
WHERE hdr.TicketID = NEW.ticket_id;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
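On the original error and the question about needless updates: in PostgreSQL, the SET and WHERE clauses of ON CONFLICT ... DO UPDATE can see only the existing target row and the proposed row (EXCLUDED), not aliases such as d or l from the INSERT's SELECT, which is why "d" is reported as missing. A WHERE clause on DO UPDATE can also skip rows whose values have not changed. The sketch below shows both ideas using SQLite's upsert, whose syntax is modelled on PostgreSQL's, run from Python so that it is executable. The table fact_sales_header and its columns are simplified, hypothetical stand-ins. In PostgreSQL the null-safe comparison would be written IS DISTINCT FROM instead of SQLite's IS NOT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales_header ("
    " ticket_id INTEGER PRIMARY KEY, total_amount NUMERIC, is_active INTEGER)"
)

# excluded.* refers to the row that the INSERT tried to add.
# The trailing WHERE turns the update into a no-op when nothing differs.
upsert = """
INSERT INTO fact_sales_header (ticket_id, total_amount, is_active)
VALUES (?, ?, ?)
ON CONFLICT (ticket_id) DO UPDATE
SET total_amount = excluded.total_amount,
    is_active = excluded.is_active
WHERE fact_sales_header.total_amount IS NOT excluded.total_amount
   OR fact_sales_header.is_active IS NOT excluded.is_active
"""

conn.execute(upsert, (1, 100, 1))  # first time: plain insert
conn.execute(upsert, (1, 100, 1))  # same values: conflict, update skipped
conn.execute(upsert, (1, 250, 0))  # changed values: conflict, row updated

row = conn.execute(
    "SELECT ticket_id, total_amount, is_active FROM fact_sales_header"
).fetchone()
print(row)
```

In the trigger function this would mean replacing d.date_key, l.LocationKey and so on in the SET list with the corresponding EXCLUDED columns, since those values were already computed by the SELECT that feeds the INSERT.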

Interconnecting tables on PostgreSQL

I am a newbie here.
I am using PostgreSQL to manipulate lots of data in my specific field of research. Unfortunately, I am encountering a problem that is not allowing me to continue my analysis. I tried to simplify my problem to clearly illustrate it.
Let's suppose I have a table called "Buyers" with this data:
table_buyers
Each buyer can make ONLY ONE purchase in each store, or none. There are three stores, and there is a table for each one, as shown below:
table_store1
table_store2
table_store3
To create the tables, I am using the following code:
CREATE TABLE public.buyer
(
ID integer NOT NULL PRIMARY KEY,
name text NOT NULL,
phone text NOT NULL
)
WITH (
OIDS = FALSE
)
;
CREATE TABLE public.Store1
(
ID_buyer integer NOT NULL PRIMARY KEY,
total_order numeric NOT NULL,
total_itens integer NOT NULL
)
WITH (
OIDS = FALSE
)
;
CREATE TABLE public.Store2
(
ID_buyer integer NOT NULL PRIMARY KEY,
total_order numeric NOT NULL,
total_itens integer NOT NULL
)
WITH (
OIDS = FALSE
)
;
CREATE TABLE public.Store3
(
ID_buyer integer NOT NULL PRIMARY KEY,
total_order numeric NOT NULL,
total_itens integer NOT NULL
)
WITH (
OIDS = FALSE
)
;
To add the information on the tables, I am using the following code:
INSERT INTO buyer (ID, name, phone) VALUES
(1, 'Alex', 88888888),
(2, 'Igor', 77777777),
(3, 'Mike', 66666666);
INSERT INTO Store1 (ID_buyer, total_order, total_itens) VALUES
(1, 87.45, 8),
(2, 14.00, 3),
(3, 12.40, 4);
INSERT INTO Store2 (ID_buyer, total_order, total_itens) VALUES
(1, 785.12, 7),
(2, 9874.21, 25);
INSERT INTO Store3 (ID_buyer, total_order, total_itens) VALUES
(2, 45.87, 1);
As all the tables are connected by the buyer's ID, I would like a query that generates an output like this:
desired output table.
Please note that if a buyer did not buy anything in a store, the output must show '0'.
I know this is an easy task, but unfortunately, I have been failing to accomplish it.
Using the 'AND' logical operator, I tried the following code to accomplish this task:
SELECT
buyer.id,
buyer.name,
store1.total_order,
store2.total_order,
store3.total_order
FROM
public.buyer,
public.store1,
public.store2,
public.store3
WHERE
buyer.id = store1.id_buyer AND
buyer.id = store2.id_buyer AND
buyer.id = store3.id_buyer;
But, obviously, it returned only 'Igor', as he is the only buyer who has bought items in all three stores (print screen).
Then, I tried the 'OR' logical operator, just like the following code:
SELECT
buyer.id,
buyer.name,
store1.total_order,
store2.total_order,
store3.total_order
FROM
public.buyer,
public.store1,
public.store2,
public.store3
WHERE
buyer.id = store1.id_buyer OR
buyer.id = store2.id_buyer OR
buyer.id = store3.id_buyer;
But then, it returns 12 lines with wrong values (print screen).
Clearly, my mistake is that my code does not account for buyers who have not bought in all three stores. I just can't correct it on my own. Can you please help me?
I would greatly appreciate an answer that points me in the right direction. Thanks a lot!
Tips on how I can search for this issue are very welcome as well!
OK. I doubt that this is the final answer for you, but it's a start:
SELECT
buyer.id,
buyer.name,
COALESCE( gb_store1.total_orders, 0 ) as store1_total,
COALESCE( gb_store2.total_orders, 0 ) as store2_total,
COALESCE( gb_store3.total_orders, 0 ) as store3_total
FROM
public.buyer
-- no commas between joins; the summed column is total_order, as defined in the store tables
LEFT OUTER JOIN ( SELECT ID_buyer,
SUM( total_order ) as total_orders,
SUM( total_itens ) as total_itens
FROM public.store1
GROUP BY ID_buyer ) gb_store1 ON gb_store1.id_buyer = buyer.id
LEFT OUTER JOIN ( SELECT ID_buyer,
SUM( total_order ) as total_orders,
SUM( total_itens ) as total_itens
FROM public.store2
GROUP BY ID_buyer ) gb_store2 ON gb_store2.id_buyer = buyer.id
LEFT OUTER JOIN ( SELECT ID_buyer,
SUM( total_order ) as total_orders,
SUM( total_itens ) as total_itens
FROM public.store3
GROUP BY ID_buyer ) gb_store3 ON gb_store3.id_buyer = buyer.id ;
So, this query has a couple of elements you should focus on. The subselects with GROUP BY let you total within each store table by ID_buyer. The LEFT OUTER JOIN makes it so your query still returns a row for a buyer even if a subselect finds no matching record. Finally, COALESCE lets you return 0 when one of your totals is NULL (because the subselect found no match).
Hope this helps.
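Since ID_buyer is the primary key of every store table, each buyer has at most one row per store, so a plain LEFT JOIN without the grouping subselects should be enough for this data. The following is a minimal sketch of that variant, with the tables trimmed to the columns the output needs and run through Python's sqlite3 so it is executable; the aliases b, s1, s2 and s3 are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buyer  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE store1 (id_buyer INTEGER PRIMARY KEY, total_order NUMERIC NOT NULL);
CREATE TABLE store2 (id_buyer INTEGER PRIMARY KEY, total_order NUMERIC NOT NULL);
CREATE TABLE store3 (id_buyer INTEGER PRIMARY KEY, total_order NUMERIC NOT NULL);
INSERT INTO buyer  VALUES (1, 'Alex'), (2, 'Igor'), (3, 'Mike');
INSERT INTO store1 VALUES (1, 87.45), (2, 14.00), (3, 12.40);
INSERT INTO store2 VALUES (1, 785.12), (2, 9874.21);
INSERT INTO store3 VALUES (2, 45.87);
""")

query = """
SELECT b.id,
       b.name,
       COALESCE(s1.total_order, 0) AS store1_total,
       COALESCE(s2.total_order, 0) AS store2_total,
       COALESCE(s3.total_order, 0) AS store3_total
FROM buyer AS b
LEFT JOIN store1 AS s1 ON s1.id_buyer = b.id  -- keeps buyers with no store1 row
LEFT JOIN store2 AS s2 ON s2.id_buyer = b.id
LEFT JOIN store3 AS s3 ON s3.id_buyer = b.id
ORDER BY b.id
"""

rows = conn.execute(query).fetchall()
for r in rows:
    print(r)
```

If a store table could ever hold several rows per buyer, the grouped subselects from the answer would be needed again to avoid multiplying rows across the joins.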

SQL: PIVOTting Count & Percentage against a column

I'm trying to produce a report that shows, for each Part No, the results of tests on those parts in terms of the numbers passed and failed, and the percentages passed and failed.
So far, I have the following:
SELECT r2.PartNo, [Pass] AS Passed, [Fail] as Failed
FROM
(SELECT ResultID, PartNo, Result FROM Results) r1
PIVOT (Count(ResultID) FOR Result IN ([Pass], [Fail])) AS r2
ORDER By r2.PartNo
This is half of the solution (the totals for passes and fails); the question is, how do I push on and include percentages?
I haven't tried yet, but I imagine that I can start again from scratch, and build up a series of subqueries, but this is more a learning exercise - I want to know the 'best' (most elegant or most efficient) solution, so I thought I'd seek advice.
Can I extend this PIVOT query, or should I take a different approach?
DDL:
CREATE TABLE RESULTS (
[ResultID] [int] NOT NULL,
[SerialNo] [int] NOT NULL,
[PartNo] [varchar](10) NOT NULL,
[Result] [varchar](10) NOT NULL);
DML:
INSERT INTO Results VALUES (1, '100', 'ABC', 'Pass')
INSERT INTO Results VALUES (2, '101', 'DEF', 'Pass')
INSERT INTO Results VALUES (3, '100', 'ABC', 'Fail')
INSERT INTO Results VALUES (4, '102', 'DEF', 'Pass')
INSERT INTO Results VALUES (5, '102', 'DEF', 'Pass')
INSERT INTO Results VALUES (6, '102', 'DEF', 'Fail')
INSERT INTO Results VALUES (7, '101', 'DEF', 'Fail')
UPDATE:
My solution, based on bluefeet's answer is:
SELECT r2.PartNo,
[Pass] AS Passed,
[Fail] as Failed,
ROUND(([Fail] / CAST(([Pass] + [Fail]) AS REAL)) * 100, 2) AS PercentFailed
FROM
(SELECT ResultID, PartNo, Result FROM Results) r1
PIVOT (Count(ResultID) FOR Result IN ([Pass], [Fail])) AS r2
ORDER By r2.PartNo
I've ROUNDed a FLOAT (rather than CAST to DECIMAL twice) because it's a tiny bit more efficient, and I've also decided that we only really need the failure percentage.
It sounds like you just need to add a column for Percent Passed and Percent Failed. You can calculate those columns on your PIVOT.
SELECT r2.PartNo
, [Pass] AS Passed
, [Fail] as Failed
, ([Pass] / Cast(([Pass] + [Fail]) as decimal(5, 2))) * 100 as PercentPassed
, ([Fail] / Cast(([Pass] + [Fail]) as decimal(5, 2))) * 100 as PercentFailed
FROM
(
SELECT ResultID, PartNo, Result
FROM Results
) r1
PIVOT
(
Count(ResultID)
FOR Result IN ([Pass], [Fail])
) AS r2
ORDER By r2.PartNo
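As for taking a different approach: conditional aggregation is a common alternative to PIVOT that produces the counts and percentages in one GROUP BY. This is a different technique from the PIVOT queries above, sketched here with the question's sample data and run through Python's sqlite3 so it is executable. It assumes Result only ever holds 'Pass' or 'Fail', because COUNT(*) is used as the denominator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Results (
  ResultID INTEGER NOT NULL,
  SerialNo INTEGER NOT NULL,
  PartNo   TEXT NOT NULL,
  Result   TEXT NOT NULL);
INSERT INTO Results VALUES
  (1, 100, 'ABC', 'Pass'), (2, 101, 'DEF', 'Pass'), (3, 100, 'ABC', 'Fail'),
  (4, 102, 'DEF', 'Pass'), (5, 102, 'DEF', 'Pass'), (6, 102, 'DEF', 'Fail'),
  (7, 101, 'DEF', 'Fail');
""")

# Each SUM(CASE ...) counts the rows of one outcome within the PartNo group.
query = """
SELECT PartNo,
       SUM(CASE WHEN Result = 'Pass' THEN 1 ELSE 0 END) AS Passed,
       SUM(CASE WHEN Result = 'Fail' THEN 1 ELSE 0 END) AS Failed,
       ROUND(100.0 * SUM(CASE WHEN Result = 'Pass' THEN 1 ELSE 0 END) / COUNT(*), 2) AS PercentPassed,
       ROUND(100.0 * SUM(CASE WHEN Result = 'Fail' THEN 1 ELSE 0 END) / COUNT(*), 2) AS PercentFailed
FROM Results
GROUP BY PartNo
ORDER BY PartNo
"""

rows = conn.execute(query).fetchall()
for r in rows:
    print(r)
```

Multiplying by 100.0 before dividing forces floating-point division, which plays the same role as the CAST in the PIVOT versions.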