why does calculation changes in order taken - postgresql

why does this query miscalculates the colum for ordertaken?? is it about the groupings>
SELECT t2.employeeid, t1.firstname || ' ' || t1.lastname as fullname,
sum(t3.quantity*t4.price) as totalsales,
COUNT(t2.orderid) AS ordertaken, COUNT(DISTINCT t2.customerid) AS uniquecustomercount
from employees as t1
join orders as t2 on t2.employeeid = t1.employeeid
join order_details as t3 on t2.orderid = t3.orderid
join products as t4 on t4.productid = t3.productid
group by t2.employeeid, fullname
correct computation for column ordertaken

If I understand correctly, you need a distinct count for both counts:
SELECT
t2.employeeid,
t1.firstname || ' ' || t1.lastname AS fullname,
SUM(t3.quantity*t4.price) AS totalsales,
COUNT(DISTINCT t2.orderid) AS ordertaken,
COUNT(DISTINCT t2.customerid) AS uniquecustomercount
FROM employees AS t1
INNER JOIN orders AS t2 ON t2.employeeid = t1.employeeid
INNER JOIN order_details AS t3 ON t2.orderid = t3.orderid
INNER JOIN products AS t4 ON t4.productid = t3.productid
GROUP BY t2.employeeid, fullname;
The distinct counts are required on the counts from the orders table because the joins to order_details and products may duplicate a given order record, magnifying the count.

Related

How to use DISTINCT ON in ARRAY_AGG()?

I have the following query:
SELECT array_agg(DISTINCT p.id) AS price_ids,
array_agg(p.name) AS price_names
FROM items
LEFT JOIN prices p on p.item_id = id
LEFT JOIN third_table t3 on third_table.item_id = id
WHERE id = 1;
When I LEFT JOIN the third_table all my prices are duplicated.
I'm using DISTINCT inside ARRAY_AGG() to get the ids without dups, but I want the names without dups aswell.
If I use array_agg(DISTINCT p.name) AS price_names, it will return distinct values based on the name, not the id.
I want to do something similar to array_agg(DISTINCT ON (p.id) p.name) AS price_names, but it is invalid.
How can I use DISTINCT ON inside ARRAY_AGG()?
Aggregate first, then join:
SELECT p.price_ids,
p.price_names,
t3.*
FROM items
LEFT JOIN (
SELECT pr.item_id,
array_agg(pr.id) AS price_ids,
array_agg(pr.name) AS price_names
FROM prices pr
GROUP BY pr.item_id
) p on p.item_id = items.id
LEFT JOIN third_table t3 on third_table.item_id = id
WHERE items.id = 1;
Using a lateral join might be faster if you only pick a single item:
SELECT p.price_ids,
p.price_names,
t3.*
FROM items
LEFT JOIN LATERAL (
SELECT array_agg(pr.id) AS price_ids,
array_agg(pr.name) AS price_names
FROM prices pr
WHERE pr.item_id = items.id
) p on true
LEFT JOIN third_table t3 on third_table.item_id = id
WHERE items.id = 1;

Strange Behaviour on Postgresql query

We created a view in Postgres and I am getting strange result.
View Name: event_puchase_product_overview
When I try to get records with *, I get the correct result. but when I try to get specific fields, I get wrong values.
I hope the screens attached here can explain the problem well.
select *
from event_purchase_product_overview
where id = 15065;
select id, departure_id
from event_puchase_product_overview
where id = 15065;
VIEW definition:
CREATE OR REPLACE VIEW public.event_puchase_product_overview AS
SELECT row_number() OVER () AS id,
e.id AS departure_id,
e.type AS event_type,
e.name,
p.id AS product_id,
pc.name AS product_type,
product_date.attribute AS option,
p.upcomming_date AS supply_date,
pr.date_end AS bid_deadline,
CASE
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_hotel'::text) tt)) THEN e.maximum_rooms
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_flight'::text) tt)) THEN e.maximum_seats
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_bike'::text) tt)) THEN e.maximum_bikes
ELSE e.maximum_seats
END AS departure_qty,
CASE
WHEN now()::date > pr.date_end AND po.state::text = 'draft'::text THEN true
ELSE false
END AS is_deadline,
pl.product_qty::integer AS purchased_qty,
pl.comments,
pl.price_unit AS unit_price,
rp.id AS supplier,
po.id AS po_ref,
po.state AS po_state,
po.date_order AS po_date,
po.user_id AS operator,
pl.po_state_line AS line_status
FROM event_event e
LEFT JOIN product_product p ON p.related_departure = e.id
LEFT JOIN product_template pt ON pt.id = p.product_tmpl_id
LEFT JOIN product_category pc ON pc.id = pt.categ_id
LEFT JOIN purchase_order_line pl ON pl.product_id = p.id
LEFT JOIN purchase_order po ON po.id = pl.order_id
LEFT JOIN purchase_order_purchase_requisition_rel prr ON prr.purchase_order_id = po.id
LEFT JOIN purchase_requisition pr ON pr.id = prr.purchase_requisition_id
LEFT JOIN res_partner rp ON rp.id = po.partner_id
LEFT JOIN ( SELECT p_1.id AS product_id,
pav.name AS attribute
FROM product_product p_1
LEFT JOIN product_attribute_value_product_product_rel pa ON pa.prod_id = p_1.id
LEFT JOIN product_attribute_value pav ON pav.id = pa.att_id
LEFT JOIN product_attribute pat ON pat.id = pav.attribute_id
WHERE pat.name::text <> ALL (ARRAY['Date'::character varying, 'Departure'::character varying]::text[])) product_date ON product_date.product_id = p.id
WHERE (p.id IN ( SELECT DISTINCT mrp_bom_line.product_id
FROM mrp_bom_line)) AND p.active
ORDER BY e.id, pt.categ_id, p.id;
If I add new event_event or new product_product I'll get a new definition of row_number in my view, then the column ID of my view is not stable.
at least you can't use row_number as Id of the view,
If you insist to use row_number, you can use the Order By "creation DATE" by this way all new records will be as last lines in the view and this will not change the correspondency between ID (row_number) and other columns.
Hope that helps !
Very likely the execution plan of your query depends on the columns you select. Compare the execution plans!
Your id is generated using the row_number window function. Now window functions are executed before the ORDER BY clause, so the order will depend on the execution plan and hence on the columns you select.
Using row_number without an explicit ordering doesn't make any sense.
To fix that, don't use
row_number() OVER ()
but
row_number() OVER (ORDER BY e.id, pt.categ_id, p.id)
so that you have a reliable ordering.
In addition, you should omit the ORDER BY clause at the end.

Problems with Postgresql ERROR: subquery uses ungrouped column "ev.title" from outer query

I have a query like this:
select c.id, c.name, c.website, c.longdescription, c.description, c.email,
(SELECT jsonb_agg(ev) FROM
(SELECT ev.title, ev.description, ev.longdescription,
(SELECT jsonb_agg(ed) FROM
(SELECT ed.startdate, ed.enddate, ed.id WHERE ed.id notnull)ed) as dates, ev.id WHERE ev.id notnull) ev) as events,
(SELECT jsonb_agg(ca) FROM (SELECT ct.zip, ca.id, ca.street1, ca.street2, ca.addresstype_id, ST_Y(ca.geopoint::geometry) as latitude, ST_X(ca.geopoint::geometry) as longitude
WHERE ca.id notnull)ca) as addresses
FROM companies c
LEFT JOIN events ev ON ev.company_id = c.id
LEFT JOIN companyaddresses ca ON ca.company_id = c.id
LEFT JOIN cities ct ON ct.id = ca.city_id
LEFT JOIN eventdates ed ON ed.event_id = ev.id
GROUP by c.id
I am getting the error "ERROR: subquery uses ungrouped column "ev.title" from
outer query Position: 125".
Can't figure out how to group it correctly for the subqueries. Any suggestions?
Give this a try:
SELECT c.id, c.name, c.website, c.longdescription, c.description, c.email,
    (SELECT jsonb_agg(ev) FROM
        (SELECT even.title, even.description, even.longdescription,
            (SELECT jsonb_agg(ed) FROM
                (SELECT eventdates.startdate, eventdates.enddate, eventdates.id FROM eventdates WHERE eventdates.event_id = even.id)ed) as dates,
        even.id FROM events even WHERE even.company_id = c.id) ev) as events,
jsonb_agg((SELECT ca FROM (SELECT ct.zip, ca.id, ca.street1, ca.street2, ca.addresstype_id, ST_Y(ca.geopoint::geometry) as latitude, ST_X(ca.geopoint::geometry) as longitude WHERE ca.id notnull)ca)) as addresses
FROM companies c
LEFT JOIN companyaddresses ca ON ca.company_id = c.id
LEFT JOIN cities ct ON ct.id = ca.city_id
Group by c.id

Avoiding Order By in T-SQL

Below sample query is a part of my main query. I found SORT operator in below query is consuming 30% of the cost.
To avoid SORT, there is need of creation of Indexes. Is there any other way to optimize this code.
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA
WHERE ID = r.ID
AND Status = 3
AND TableA_ID >ISNULL((
SELECT TOP 1 TableA_ID
FROM TableA
WHERE ID = r.ID
AND Status <> 3
ORDER BY T_Date DESC
), 0)
ORDER BY T_Date ASC
Looks like you can use not exists rather than the sorts. I think you'll probably get a better performance boost by use a CTE or derived table instead of the a scalar subquery.
select *
from r ... left outer join
(
select ID, min(t_date) as min_date from TableA t1
where status = 3 and not exists (
select 1 from TableA t2
where t2.ID = t1.ID
and t2.status <> 3 and t2.t_date > t1.t_date
)
group by ID
) as md on md.ID = r.ID ...
or
select *
from r ... left outer join
(
select t1.ID, min(t1.t_date) as min_date
from TableA t1 left outer join TableA t2
on t2.ID = t1.ID and t2.status <> 3
where t1.status = 3 and t1.t_date < t2.t_date
group by t1.ID
having count(t2.ID) = 0
) as md on md.ID = r.ID ...
It also appears that you're relying on an identity column but it's not clear what those values mean. I'm basically ignoring it and using the date column instead.
Try this:
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA a1
LEFT JOIN (
SELECT ID, MAX(TableA_ID) AS MaxAID
FROM TableA
WHERE Status <> 3
GROUP BY ID
) a2 ON a2.ID = a1.ID AND a1.TableA_ID > coalesce(a2.MAXAID,0)
WHERE a1.ID = r.ID AND a1.Status = 3
ORDER BY T_Date ASC
The use of TOP 1 in combination with the unexplained r alias concern me. There's almost certainly a MUCH better way to get this data into your results that doesn't involve doing this in a sub query (unless this is for an APPLY operation).

Left join filtering only works on where clause

Can someone explain this to me?
I have two huge tables on which I wish to LEFT JOIN, but I think it would be more efficient to filter at the 'on' rather than the 'where'.
Select * from Table1 t1 left join Table2 t2 on t1.Id = t2.Id and t1.Enabled = 1
Rather than
Select * from Table1 t1 left join Table2 t2 on t1.Id = t2.Id where t1.Enabled = 1
The above results are not the same. No matter what I put in as filtering, it stays unaffected:
Select * from Table1 t1 left join Table2 t2 on t1.Id = t2.Id and t1.Enabled = 1
yields exactly the same results as
Select * from Table1 t1 left join Table2 t2 on t1.Id = t2.Id and t1.Enabled = 0
Also, Table1 might only have a few records where Enabled = 1, so it would be inefficient to join millions of records first and then filter to find only 10.
I can do a sub select first, but I somehow I feel it is not the right way:
Select * from (Select * from Table1 where Enabled = 1) a left join Table2 t2 on a.Id = t2.Id
Types of Joins
Left outer join All rows from the first-named table (the "left"
table, which appears leftmost in the JOIN clause) are included.
Unmatched rows in the right table do not appear.
When you move the condition to the where then t1.Enabled = 1 is applied
Do you have actual performance problems with the first query?