Grouping results when nesting queries - tsql

I am trying to get a 6-month trend of entities in my database that match some criteria; however, I need to nest queries a few levels deep to determine whether an entity qualifies.
The entities are "members" who may have multiple "accounts" and I need to make sure that none of their accounts have certain flags set before I include them.
If I wanted to just get a count as of a specific date (we keep historical data), I would do something like:
SELECT COUNT(sup.SSN)
FROM MemberSuppTable as sup
WHERE (
    sup.ProcessDate = #PROCESSDATE
    AND sup.MemberSuppID IN (
        SELECT summ.MemberSuppID
        FROM MemberSummaryTable as summ
        WHERE (
            summ.ProcessDate = #PROCESSDATE
            AND summ.AccountNumber IN (
                SELECT acct.AccountNumber
                FROM AccountTable as acct
                WHERE (
                    acct.ProcessDate = #PROCESSDATE
                    --other criteria for account exclusion go here.
                )
            )
        )
    )
)
MemberSuppTable has high-level info on members:
(ID, FirstAccountOpenDate, status, etc.)
MemberSummaryTable ties accounts to members in the MemberSuppTable:
(AccountNumber, MemberSuppID, ...)
Now I'm trying to get a count for each month-end process date, grouped by process date, in a single query.
So, where the above query would return
ssn count
----------
1,000,000
I want:
process date | ssn count
------------------------
20160430 | 8,000,000
20160531 | 8,500,000
... | ...
20160331 | 1,000,000
So far I've come up with the following (see below for why it doesn't work):
WITH valid_dates AS (
    SELECT D.ProcessDate
    FROM arcu.vwARCUProcessDates AS D
    WHERE D.FullDate = D.MonthEndDate
      AND D.ProcessDate >= #SDATE
)
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable as sup
WHERE (
    sup.ProcessDate IN (SELECT * FROM valid_dates)
    AND sup.MemberSuppID IN (
        SELECT summ.MemberSuppID
        FROM MemberSummaryTable as summ
        WHERE (
            summ.ProcessDate IN (SELECT * FROM valid_dates)
            AND summ.AccountNumber IN (
                SELECT acct.AccountNumber
                FROM AccountTable as acct
                WHERE (
                    acct.ProcessDate IN (SELECT * FROM valid_dates)
                    ...
                )
            )
        )
    )
)
GROUP BY (sup.ProcessDate)
With the above query though, I believe that a member would be included in ALL groups if they matched the criteria for ANY process date in the valid_dates table.
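To make that suspicion concrete, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The tables are hypothetical miniatures of the ones in the question, valid_dates is a plain table instead of a CTE, and the Flagged column is an invented placeholder for the elided exclusion criteria. The first query has the question's shape; the second ties each subquery to the outer row's ProcessDate, which is one possible fix.

```python
import sqlite3

# Hypothetical miniature copies of the question's tables. "Flagged" is an
# invented stand-in for the elided account-exclusion criteria (0 = qualifies).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate TEXT, MemberSuppID INT, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate TEXT, MemberSuppID INT, AccountNumber INT);
CREATE TABLE AccountTable       (ProcessDate TEXT, AccountNumber INT, Flagged INT);
CREATE TABLE valid_dates        (ProcessDate TEXT);
INSERT INTO valid_dates VALUES ('20160430'), ('20160531');
-- Member 1 exists on both dates, but account 10 only qualifies in April.
INSERT INTO MemberSuppTable    VALUES ('20160430', 1, '111'), ('20160531', 1, '111');
INSERT INTO MemberSummaryTable VALUES ('20160430', 1, 10), ('20160531', 1, 10);
INSERT INTO AccountTable       VALUES ('20160430', 10, 0), ('20160531', 10, 1);
""")

# Shape of the question's query: each subquery only checks "any valid date".
uncorrelated = con.execute("""
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
WHERE sup.ProcessDate IN (SELECT ProcessDate FROM valid_dates)
  AND sup.MemberSuppID IN (
      SELECT summ.MemberSuppID
      FROM MemberSummaryTable AS summ
      WHERE summ.ProcessDate IN (SELECT ProcessDate FROM valid_dates)
        AND summ.AccountNumber IN (
            SELECT acct.AccountNumber
            FROM AccountTable AS acct
            WHERE acct.ProcessDate IN (SELECT ProcessDate FROM valid_dates)
              AND acct.Flagged = 0))
GROUP BY sup.ProcessDate
ORDER BY sup.ProcessDate
""").fetchall()

# Same query, but every subquery is tied to the outer row's ProcessDate.
correlated = con.execute("""
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
WHERE sup.ProcessDate IN (SELECT ProcessDate FROM valid_dates)
  AND sup.MemberSuppID IN (
      SELECT summ.MemberSuppID
      FROM MemberSummaryTable AS summ
      WHERE summ.ProcessDate = sup.ProcessDate
        AND summ.AccountNumber IN (
            SELECT acct.AccountNumber
            FROM AccountTable AS acct
            WHERE acct.ProcessDate = summ.ProcessDate
              AND acct.Flagged = 0))
GROUP BY sup.ProcessDate
ORDER BY sup.ProcessDate
""").fetchall()

print(uncorrelated)  # [('20160430', 1), ('20160531', 1)]  May is counted wrongly
print(correlated)    # [('20160430', 1)]
```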
Can anyone help me out? (I'm new to SQL, so forgive me if I'm missing something simple.)

First, I would rewrite your first query using INNER JOIN instead of WHERE ... IN:
SELECT COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable as sup
INNER JOIN MemberSummaryTable AS summ
ON summ.MemberSuppID = sup.MemberSuppID
INNER JOIN AccountTable AS acct
ON acct.AccountNumber = summ.AccountNumber
WHERE sup.ProcessDate = #PROCESSDATE
AND summ.ProcessDate = #PROCESSDATE
AND acct.ProcessDate = #PROCESSDATE
-- other criteria for account exclusion go here.
This looks more compact and is (IMHO) more readable.
Next, I would change the query so that #PROCESSDATE occurs only once:
SELECT COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable as sup
INNER JOIN MemberSummaryTable AS summ
ON summ.MemberSuppID = sup.MemberSuppID
INNER JOIN AccountTable AS acct
ON acct.AccountNumber = summ.AccountNumber
WHERE sup.ProcessDate = #PROCESSDATE
AND summ.ProcessDate = sup.ProcessDate
AND acct.ProcessDate = sup.ProcessDate
-- other criteria for account exclusion go here.
You can keep these conditions in the WHERE clause, but I prefer to put them in the ON clause:
SELECT COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ
ON summ.MemberSuppID = sup.MemberSuppID
AND summ.ProcessDate = sup.ProcessDate
INNER JOIN AccountTable AS acct
ON acct.AccountNumber = summ.AccountNumber
AND acct.ProcessDate = sup.ProcessDate
WHERE sup.ProcessDate = #PROCESSDATE
-- other criteria for account exclusion go here.
Now it's easy to get the COUNT for each ProcessDate:
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable as sup
INNER JOIN MemberSummaryTable AS summ
ON summ.MemberSuppID = sup.MemberSuppID
AND summ.ProcessDate = sup.ProcessDate
INNER JOIN AccountTable AS acct
ON acct.AccountNumber = summ.AccountNumber
AND acct.ProcessDate = sup.ProcessDate
-- WHERE criteria for account exclusion go here.
GROUP BY sup.ProcessDate
To also filter by valid_dates, you only need one additional JOIN and a few WHERE conditions:
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable as sup
INNER JOIN MemberSummaryTable AS summ
ON summ.MemberSuppID = sup.MemberSuppID
AND summ.ProcessDate = sup.ProcessDate
INNER JOIN AccountTable AS acct
ON acct.AccountNumber = summ.AccountNumber
AND acct.ProcessDate = sup.ProcessDate
INNER JOIN arcu.vwARCUProcessDates AS d
ON d.ProcessDate = sup.ProcessDate
WHERE d.FullDate = d.MonthEndDate
AND d.ProcessDate >= #SDATE
-- AND criteria for account exclusion go here.
GROUP BY sup.ProcessDate
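A runnable sketch of this final query, again with SQLite (via Python's sqlite3) standing in for SQL Server: the arcu schema prefix is dropped, vwARCUProcessDates is a plain table, #SDATE becomes a bound parameter, and Flagged is a hypothetical exclusion column. All rows are made up; one non-month-end date and one flagged account show that both filters take effect.

```python
import sqlite3

# Hypothetical miniature tables; "Flagged" (0 = qualifies) is an invented
# placeholder for the elided account-exclusion criteria.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate TEXT, MemberSuppID INT, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate TEXT, MemberSuppID INT, AccountNumber INT);
CREATE TABLE AccountTable       (ProcessDate TEXT, AccountNumber INT, Flagged INT);
-- Plain-table stand-in for arcu.vwARCUProcessDates.
CREATE TABLE vwARCUProcessDates (ProcessDate TEXT, FullDate TEXT, MonthEndDate TEXT);
INSERT INTO vwARCUProcessDates VALUES
  ('20160429', '2016-04-29', '2016-04-30'),   -- not a month end
  ('20160430', '2016-04-30', '2016-04-30'),
  ('20160531', '2016-05-31', '2016-05-31');
INSERT INTO MemberSuppTable VALUES
  ('20160429', 1, '111'), ('20160430', 1, '111'),
  ('20160531', 1, '111'), ('20160531', 2, '222'), ('20160531', 3, '333');
INSERT INTO MemberSummaryTable VALUES
  ('20160429', 1, 10), ('20160430', 1, 10),
  ('20160531', 1, 10), ('20160531', 2, 20), ('20160531', 3, 30);
INSERT INTO AccountTable VALUES
  ('20160429', 10, 0), ('20160430', 10, 0),
  ('20160531', 10, 0), ('20160531', 20, 0), ('20160531', 30, 1);  -- 30 is excluded
""")

rows = con.execute("""
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ
        ON summ.MemberSuppID = sup.MemberSuppID
       AND summ.ProcessDate = sup.ProcessDate
INNER JOIN AccountTable AS acct
        ON acct.AccountNumber = summ.AccountNumber
       AND acct.ProcessDate = sup.ProcessDate
INNER JOIN vwARCUProcessDates AS d
        ON d.ProcessDate = sup.ProcessDate
WHERE d.FullDate = d.MonthEndDate
  AND d.ProcessDate >= ?      -- plays the role of #SDATE
  AND acct.Flagged = 0        -- hypothetical exclusion criterion
GROUP BY sup.ProcessDate
ORDER BY sup.ProcessDate
""", ("20160401",)).fetchall()

print(rows)  # [('20160430', 1), ('20160531', 2)]
```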
For better performance, it might be better to GROUP BY d.ProcessDate, but don't forget to adjust the SELECT list as well.
Edit:
As noted in the comments, the DISTINCT keyword has to be used if every SSN should be counted only once, so I edited the solution.
Note also that even with DISTINCT, the first query is not always equivalent to the original one: if sup.SSN is not unique, the queries can return different results.
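The difference described in the edit can be reproduced with a few made-up rows, using SQLite through Python's sqlite3 as a stand-in for SQL Server. Two hypothetical member records share one SSN and one of them has two accounts, so the original IN query, the join without DISTINCT and the join with DISTINCT return three different counts.

```python
import sqlite3

# Hypothetical data: member records 1 and 2 share an SSN, and member 1 has two
# qualifying accounts on the same process date.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate TEXT, MemberSuppID INT, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate TEXT, MemberSuppID INT, AccountNumber INT);
CREATE TABLE AccountTable       (ProcessDate TEXT, AccountNumber INT);
INSERT INTO MemberSuppTable    VALUES ('20160430', 1, '111'), ('20160430', 2, '111');
INSERT INTO MemberSummaryTable VALUES ('20160430', 1, 10), ('20160430', 1, 11),
                                      ('20160430', 2, 12);
INSERT INTO AccountTable       VALUES ('20160430', 10), ('20160430', 11), ('20160430', 12);
""")

JOINS = """
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ
        ON summ.MemberSuppID = sup.MemberSuppID
       AND summ.ProcessDate = sup.ProcessDate
INNER JOIN AccountTable AS acct
        ON acct.AccountNumber = summ.AccountNumber
       AND acct.ProcessDate = sup.ProcessDate
WHERE sup.ProcessDate = '20160430'
"""

# Original IN form: one row per qualifying MemberSuppTable row, no DISTINCT.
in_count = con.execute("""
SELECT COUNT(sup.SSN)
FROM MemberSuppTable AS sup
WHERE sup.ProcessDate = '20160430'
  AND sup.MemberSuppID IN (
      SELECT summ.MemberSuppID FROM MemberSummaryTable AS summ
      WHERE summ.ProcessDate = '20160430'
        AND summ.AccountNumber IN (
            SELECT acct.AccountNumber FROM AccountTable AS acct
            WHERE acct.ProcessDate = '20160430'))
""").fetchone()[0]

# The join produces one row per (member, account) pair.
join_count = con.execute("SELECT COUNT(sup.SSN)" + JOINS).fetchone()[0]
join_distinct = con.execute("SELECT COUNT(DISTINCT sup.SSN)" + JOINS).fetchone()[0]

print(in_count, join_count, join_distinct)  # 2 3 1
```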

IN clauses are perfectly fine for such a query. They can be more readable than joins, because they show clearly which table you select data from and which tables are only accessed to check that a record exists. Your query is well structured and shows you have given it some thought.
Your query would be even more readable, however, without the unnecessary alias names and parentheses.
Anyway, I guess you want to use the same process date that you find in the subqueries, so enhance your IN clauses accordingly (note that this row-value form of IN works in databases such as Oracle, PostgreSQL and MySQL, but SQL Server does not support it):
select processdate, count(distinct ssn)
from membersupptable
where (processdate, membersuppid) in
(
  select processdate, membersuppid
  from membersummarytable
  where (processdate, accountnumber) in
  (
    select processdate, accountnumber
    from accounttable
    where processdate in
    (
      select processdate
      from vwarcuprocessdates
      where fulldate = monthenddate
      and processdate >= #sdate
    )
  )
)
group by processdate;
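A sketch of this query in SQLite (3.15 or later accepts row values in IN), next to a correlated EXISTS rewrite that expresses the same per-date matching without row values, which is the form SQL Server accepts. The data is hypothetical and #sdate is passed as a bound parameter.

```python
import sqlite3

# Hypothetical lowercase tables matching the answer.
con = sqlite3.connect(":memory:")
con.executescript("""
create table vwarcuprocessdates (processdate text, fulldate text, monthenddate text);
create table membersupptable    (processdate text, membersuppid int, ssn text);
create table membersummarytable (processdate text, membersuppid int, accountnumber int);
create table accounttable       (processdate text, accountnumber int);
insert into vwarcuprocessdates values
  ('20160430', '2016-04-30', '2016-04-30'),
  ('20160531', '2016-05-31', '2016-05-31');
-- Member 1 has no account link on 20160531, so it must not be counted then.
insert into membersupptable    values ('20160430', 1, '111'), ('20160531', 1, '111'),
                                      ('20160531', 2, '222');
insert into membersummarytable values ('20160430', 1, 10), ('20160531', 2, 20);
insert into accounttable       values ('20160430', 10), ('20160531', 20);
""")

valid_dates = """
select processdate from vwarcuprocessdates
where fulldate = monthenddate and processdate >= ?"""

# Row-value IN, as in the answer.
tuple_in = f"""
select processdate, count(distinct ssn)
from membersupptable
where (processdate, membersuppid) in (
  select processdate, membersuppid
  from membersummarytable
  where (processdate, accountnumber) in (
    select processdate, accountnumber
    from accounttable
    where processdate in ({valid_dates})))
group by processdate
order by processdate"""

# Equivalent correlated EXISTS form (no row values needed).
exists_form = f"""
select s.processdate, count(distinct s.ssn)
from membersupptable s
where exists (
  select 1 from membersummarytable m
  where m.processdate = s.processdate
    and m.membersuppid = s.membersuppid
    and exists (
      select 1 from accounttable a
      where a.processdate = m.processdate
        and a.accountnumber = m.accountnumber
        and a.processdate in ({valid_dates})))
group by s.processdate
order by s.processdate"""

sdate = ("20160401",)  # plays the role of #sdate
tuple_rows = con.execute(tuple_in, sdate).fetchall()
exists_rows = con.execute(exists_form, sdate).fetchall()
print(tuple_rows)   # [('20160430', 1), ('20160531', 1)]
print(exists_rows)  # [('20160430', 1), ('20160531', 1)]
```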

Related

How to use DISTINCT ON in ARRAY_AGG()?

I have the following query:
SELECT array_agg(DISTINCT p.id) AS price_ids,
       array_agg(p.name) AS price_names
FROM items
LEFT JOIN prices p ON p.item_id = items.id
LEFT JOIN third_table t3 ON t3.item_id = items.id
WHERE items.id = 1;
When I LEFT JOIN third_table, all my prices are duplicated.
I'm using DISTINCT inside ARRAY_AGG() to get the ids without duplicates, but I want the names without duplicates as well.
If I use array_agg(DISTINCT p.name) AS price_names, it will return distinct values based on the name, not the id.
I want to do something similar to array_agg(DISTINCT ON (p.id) p.name) AS price_names, but it is invalid.
How can I use DISTINCT ON inside ARRAY_AGG()?

Aggregate first, then join:
SELECT p.price_ids,
p.price_names,
t3.*
FROM items
LEFT JOIN (
SELECT pr.item_id,
array_agg(pr.id) AS price_ids,
array_agg(pr.name) AS price_names
FROM prices pr
GROUP BY pr.item_id
) p on p.item_id = items.id
LEFT JOIN third_table t3 on t3.item_id = items.id
WHERE items.id = 1;
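The row multiplication behind the question, and why aggregating first avoids it, can be shown in SQLite through Python's sqlite3. SQLite has no array_agg, so count() stands in for it here; the tables and rows are hypothetical, with two distinct prices that happen to share a name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY);
CREATE TABLE prices (id INTEGER PRIMARY KEY, item_id INT, name TEXT);
CREATE TABLE third_table (id INTEGER PRIMARY KEY, item_id INT);
INSERT INTO items VALUES (1);
-- Two distinct prices that happen to share a name.
INSERT INTO prices VALUES (1, 1, 'Standard'), (2, 1, 'Standard');
INSERT INTO third_table VALUES (1, 1), (2, 1), (3, 1);
""")

# Join first, aggregate second: each price is repeated once per third_table row,
# and DISTINCT on the name collapses two different prices into one.
join_first = con.execute("""
SELECT count(p.id), count(DISTINCT p.id), count(DISTINCT p.name)
FROM items
LEFT JOIN prices p ON p.item_id = items.id
LEFT JOIN third_table t3 ON t3.item_id = items.id
WHERE items.id = 1
""").fetchone()

# Aggregate first, join second, as in the answer: per-item counts stay correct.
agg_first = con.execute("""
SELECT p.price_count, p.name_count, t3.id
FROM items
LEFT JOIN (
    SELECT pr.item_id, count(pr.id) AS price_count, count(pr.name) AS name_count
    FROM prices pr
    GROUP BY pr.item_id
) p ON p.item_id = items.id
LEFT JOIN third_table t3 ON t3.item_id = items.id
WHERE items.id = 1
ORDER BY t3.id
""").fetchall()

print(join_first)  # (6, 2, 1)
print(agg_first)   # [(2, 2, 1), (2, 2, 2), (2, 2, 3)]
```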
Using a lateral join might be faster if you only pick a single item:
SELECT p.price_ids,
p.price_names,
t3.*
FROM items
LEFT JOIN LATERAL (
SELECT array_agg(pr.id) AS price_ids,
array_agg(pr.name) AS price_names
FROM prices pr
WHERE pr.item_id = items.id
) p on true
LEFT JOIN third_table t3 on t3.item_id = items.id
WHERE items.id = 1;

Strange Behaviour on Postgresql query

We created a view in Postgres and I am getting strange results.
View Name: event_puchase_product_overview
When I select all columns with *, I get the correct result, but when I select specific columns, I get wrong values.
I hope the screenshots attached here explain the problem well.
select *
from event_purchase_product_overview
where id = 15065;
select id, departure_id
from event_puchase_product_overview
where id = 15065;
VIEW definition:
CREATE OR REPLACE VIEW public.event_puchase_product_overview AS
SELECT row_number() OVER () AS id,
e.id AS departure_id,
e.type AS event_type,
e.name,
p.id AS product_id,
pc.name AS product_type,
product_date.attribute AS option,
p.upcomming_date AS supply_date,
pr.date_end AS bid_deadline,
CASE
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_hotel'::text) tt)) THEN e.maximum_rooms
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_flight'::text) tt)) THEN e.maximum_seats
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_bike'::text) tt)) THEN e.maximum_bikes
ELSE e.maximum_seats
END AS departure_qty,
CASE
WHEN now()::date > pr.date_end AND po.state::text = 'draft'::text THEN true
ELSE false
END AS is_deadline,
pl.product_qty::integer AS purchased_qty,
pl.comments,
pl.price_unit AS unit_price,
rp.id AS supplier,
po.id AS po_ref,
po.state AS po_state,
po.date_order AS po_date,
po.user_id AS operator,
pl.po_state_line AS line_status
FROM event_event e
LEFT JOIN product_product p ON p.related_departure = e.id
LEFT JOIN product_template pt ON pt.id = p.product_tmpl_id
LEFT JOIN product_category pc ON pc.id = pt.categ_id
LEFT JOIN purchase_order_line pl ON pl.product_id = p.id
LEFT JOIN purchase_order po ON po.id = pl.order_id
LEFT JOIN purchase_order_purchase_requisition_rel prr ON prr.purchase_order_id = po.id
LEFT JOIN purchase_requisition pr ON pr.id = prr.purchase_requisition_id
LEFT JOIN res_partner rp ON rp.id = po.partner_id
LEFT JOIN ( SELECT p_1.id AS product_id,
pav.name AS attribute
FROM product_product p_1
LEFT JOIN product_attribute_value_product_product_rel pa ON pa.prod_id = p_1.id
LEFT JOIN product_attribute_value pav ON pav.id = pa.att_id
LEFT JOIN product_attribute pat ON pat.id = pav.attribute_id
WHERE pat.name::text <> ALL (ARRAY['Date'::character varying, 'Departure'::character varying]::text[])) product_date ON product_date.product_id = p.id
WHERE (p.id IN ( SELECT DISTINCT mrp_bom_line.product_id
FROM mrp_bom_line)) AND p.active
ORDER BY e.id, pt.categ_id, p.id;

If you add a new event_event or product_product row, row_number is recomputed in your view, so the id column of the view is not stable.
In short, you can't use row_number as the id of the view.
If you insist on using row_number, order it by creation date: that way all new records appear as the last rows of the view, and the correspondence between id (row_number) and the other columns does not change.
Hope that helps !

Very likely the execution plan of your query depends on the columns you select. Compare the execution plans!
Your id is generated using the row_number window function. Now window functions are executed before the ORDER BY clause, so the order will depend on the execution plan and hence on the columns you select.
Using row_number without an explicit ordering doesn't make any sense.
To fix that, don't use
row_number() OVER ()
but
row_number() OVER (ORDER BY e.id, pt.categ_id, p.id)
so that you have a reliable ordering.
In addition, you should omit the ORDER BY clause at the end.
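A cut-down sketch of the recommended fix, run in SQLite (3.25 or later, for window functions) through Python's sqlite3. The view's id comes from row_number() OVER (ORDER BY e.id), so selecting * or just two columns yields the same id for the same row. The table, view name and data are hypothetical; the last lines show that such an id is still not stable when rows are inserted, which is the point made in the previous answer.

```python
import sqlite3

# Hypothetical cut-down view with an explicit ORDER BY inside OVER.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE event_event (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO event_event VALUES (30, 'c'), (10, 'a'), (20, 'b');
CREATE VIEW overview AS
SELECT row_number() OVER (ORDER BY e.id) AS id,
       e.id AS departure_id,
       e.name
FROM event_event AS e;
""")

wide = con.execute("SELECT * FROM overview WHERE id = 2").fetchall()
narrow = con.execute("SELECT id, departure_id FROM overview WHERE id = 2").fetchall()
print(wide, narrow)  # [(2, 20, 'b')] [(2, 20)]

# Deterministic, but not stable: a new row that sorts earlier shifts the ids.
con.execute("INSERT INTO event_event VALUES (5, 'z')")
after_insert = con.execute("SELECT departure_id FROM overview WHERE id = 2").fetchall()
print(after_insert)  # [(10,)]
```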

MariaDB - order by with more selects

I have this SQL:
select * from `posts`
where `posts`.`deleted_at` is null
and `expire_at` >= '2017-03-26 21:23:42.000000'
and (
select count(distinct tags.id) from `tags`
inner join `post_tag` on `tags`.`id` = `post_tag`.`tag_id`
where `post_tag`.`post_id` = `posts`.`id`
and (`tags`.`tag` like 'PHP' or `tags`.`tag` like 'pop' or `tags`.`tag` like 'UI')
) >= 1
Is it possible to order the results by the number of matching tags in each post?
Maybe by adding an alias somewhere?
Any information would help.

Convert your correlated subquery into a join:
select p.*
from posts p
join (
select pt.post_id,
count(distinct t.id) as tag_count
from tags t
inner join post_tag pt on t.id = pt.tag_id
where t.tag in ('PHP', 'pop', 'UI')
group by pt.post_id
) pt on p.id = pt.post_id
where p.deleted_at is null
and p.expire_at >= '2017-03-26 21:23:42.000000'
order by pt.tag_count desc;
Also, note that I replaced the chain of LIKE ... OR conditions with a single IN, because you are not matching any pattern, i.e. there is no % in the strings. A single IN is simpler in that case.
Also, if you name your tables and columns so that they don't clash with keywords, you don't need the backticks, which make a query harder to read.
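A runnable version of the rewritten query, using SQLite (via Python's sqlite3) as a stand-in for MariaDB and made-up posts and tags. It selects p.id and tag_count instead of p.* so the ordering is easy to check; the deleted and expired posts show that the outer filters still apply.

```python
import sqlite3

# Hypothetical schema and rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, deleted_at TEXT, expire_at TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE post_tag (post_id INT, tag_id INT);
INSERT INTO tags VALUES (1, 'PHP'), (2, 'pop'), (3, 'UI'), (4, 'Java');
INSERT INTO posts VALUES
  (1, NULL, '2018-01-01 00:00:00'),                    -- PHP + UI
  (2, NULL, '2018-01-01 00:00:00'),                    -- PHP only
  (3, NULL, '2018-01-01 00:00:00'),                    -- Java only: no match
  (4, '2017-03-01 00:00:00', '2018-01-01 00:00:00'),   -- deleted
  (5, NULL, '2017-01-01 00:00:00');                    -- expired
INSERT INTO post_tag VALUES (1, 1), (1, 3), (2, 1), (3, 4),
                            (4, 1), (4, 2), (4, 3), (5, 1);
""")

rows = con.execute("""
SELECT p.id, pt.tag_count
FROM posts p
JOIN (
    SELECT pt.post_id, COUNT(DISTINCT t.id) AS tag_count
    FROM tags t
    INNER JOIN post_tag pt ON t.id = pt.tag_id
    WHERE t.tag IN ('PHP', 'pop', 'UI')
    GROUP BY pt.post_id
) pt ON p.id = pt.post_id
WHERE p.deleted_at IS NULL
  AND p.expire_at >= '2017-03-26 21:23:42.000000'
ORDER BY pt.tag_count DESC
""").fetchall()

print(rows)  # [(1, 2), (2, 1)]
```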

Avoiding Order By in T-SQL

The sample query below is part of my main query. I found that the SORT operator in it accounts for 30% of the cost.
Avoiding the SORT would require creating indexes. Is there any other way to optimize this code?
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA
WHERE ID = r.ID
AND Status = 3
AND TableA_ID >ISNULL((
SELECT TOP 1 TableA_ID
FROM TableA
WHERE ID = r.ID
AND Status <> 3
ORDER BY T_Date DESC
), 0)
ORDER BY T_Date ASC

It looks like you can use NOT EXISTS rather than the sorts. I think you'll probably get a better performance boost by using a CTE or derived table instead of a scalar subquery.
select *
from r ... left outer join
(
select ID, min(t_date) as min_date from TableA t1
where status = 3 and not exists (
select 1 from TableA t2
where t2.ID = t1.ID
and t2.status <> 3 and t2.t_date > t1.t_date
)
group by ID
) as md on md.ID = r.ID ...
or
select *
from r ... left outer join
(
    select t1.ID, min(t1.t_date) as min_date
    from TableA t1 left outer join TableA t2
      on t2.ID = t1.ID and t2.status <> 3 and t2.t_date > t1.t_date
    where t1.status = 3 and t2.ID is null
    group by t1.ID
) as md on md.ID = r.ID ...
It also appears that you're relying on an identity column but it's not clear what those values mean. I'm basically ignoring it and using the date column instead.
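The NOT EXISTS form and the left-join form can be compared on hypothetical data, using SQLite through Python's sqlite3 as a stand-in for SQL Server. The left join is written here as an anti-join (the date comparison in the ON clause, then t2.ID IS NULL), which is the form that matches NOT EXISTS. The last query shows what happens when the date comparison sits in WHERE and the filter is HAVING COUNT(t2.ID) = 0 instead.

```python
import sqlite3

# Hypothetical TableA rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TableA (TableA_ID INTEGER PRIMARY KEY, ID INT, Status INT, T_Date TEXT);
INSERT INTO TableA VALUES
  (1, 1, 3, '2017-01-01'), (2, 1, 1, '2017-01-02'),
  (3, 1, 3, '2017-01-03'), (4, 1, 3, '2017-01-04'),
  (5, 2, 3, '2017-02-01'), (6, 2, 3, '2017-02-05'),
  (7, 3, 3, '2017-03-01'), (8, 3, 2, '2017-03-02');
""")

# Earliest status-3 date per ID among rows with no later non-3 row.
not_exists = con.execute("""
SELECT ID, MIN(T_Date) AS min_date FROM TableA t1
WHERE Status = 3 AND NOT EXISTS (
    SELECT 1 FROM TableA t2
    WHERE t2.ID = t1.ID AND t2.Status <> 3 AND t2.T_Date > t1.T_Date)
GROUP BY ID ORDER BY ID
""").fetchall()

# Same thing as an anti-join: the date test lives in ON, the filter is IS NULL.
anti_join = con.execute("""
SELECT t1.ID, MIN(t1.T_Date) AS min_date
FROM TableA t1
LEFT OUTER JOIN TableA t2
  ON t2.ID = t1.ID AND t2.Status <> 3 AND t2.T_Date > t1.T_Date
WHERE t1.Status = 3 AND t2.ID IS NULL
GROUP BY t1.ID ORDER BY t1.ID
""").fetchall()

# For contrast: date test in WHERE plus HAVING COUNT(t2.ID) = 0 never matches,
# because the WHERE comparison already discards every row where t2 is NULL.
date_in_where = con.execute("""
SELECT t1.ID, MIN(t1.T_Date)
FROM TableA t1
LEFT OUTER JOIN TableA t2 ON t2.ID = t1.ID AND t2.Status <> 3
WHERE t1.Status = 3 AND t1.T_Date < t2.T_Date
GROUP BY t1.ID
HAVING COUNT(t2.ID) = 0
""").fetchall()

print(not_exists)     # [(1, '2017-01-03'), (2, '2017-02-01')]
print(anti_join)      # [(1, '2017-01-03'), (2, '2017-02-01')]
print(date_in_where)  # []
```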

Try this:
SELECT TOP 1 CONVERT(DATE, a1.T_Date) AS T_Date
FROM TableA a1
LEFT JOIN (
    SELECT ID, MAX(TableA_ID) AS MaxAID
    FROM TableA
    WHERE Status <> 3
    GROUP BY ID
) a2 ON a2.ID = a1.ID
WHERE a1.ID = r.ID
  AND a1.Status = 3
  AND a1.TableA_ID > COALESCE(a2.MaxAID, 0)
ORDER BY a1.T_Date ASC
The use of TOP 1 in combination with the unexplained r alias concerns me. There's almost certainly a MUCH better way to get this data into your results that doesn't involve doing this in a subquery (unless this is for an APPLY operation).
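Whether the TableA_ID comparison sits in the LEFT JOIN's ON clause or in WHERE matters: in ON it only decides whether a2 matches, so status-3 rows that fail it are still returned with a2 set to NULL. A small SQLite sketch (via Python's sqlite3) with hypothetical rows shows both placements next to the question's original query; LIMIT 1 replaces TOP 1, date() replaces CONVERT(DATE, ...), and a bound parameter replaces r.ID.

```python
import sqlite3

# Hypothetical rows for a single ID.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TableA (TableA_ID INTEGER PRIMARY KEY, ID INT, Status INT, T_Date TEXT);
INSERT INTO TableA VALUES
  (1, 1, 3, '2017-01-01 08:00:00'),
  (2, 1, 1, '2017-01-02 08:00:00'),
  (3, 1, 3, '2017-01-03 08:00:00');
""")

MAX_NON3 = """(SELECT ID, MAX(TableA_ID) AS MaxAID
               FROM TableA WHERE Status <> 3 GROUP BY ID) a2"""

# Comparison in the LEFT JOIN's ON clause: row 1 survives with a2 = NULL.
in_on = con.execute(f"""
SELECT date(a1.T_Date) FROM TableA a1
LEFT JOIN {MAX_NON3} ON a2.ID = a1.ID AND a1.TableA_ID > coalesce(a2.MaxAID, 0)
WHERE a1.ID = ? AND a1.Status = 3
ORDER BY a1.T_Date LIMIT 1""", (1,)).fetchone()

# Comparison in WHERE: only status-3 rows after the last non-3 row remain.
in_where = con.execute(f"""
SELECT date(a1.T_Date) FROM TableA a1
LEFT JOIN {MAX_NON3} ON a2.ID = a1.ID
WHERE a1.ID = ? AND a1.Status = 3 AND a1.TableA_ID > coalesce(a2.MaxAID, 0)
ORDER BY a1.T_Date LIMIT 1""", (1,)).fetchone()

# The question's original scalar-subquery form, for comparison.
original = con.execute("""
SELECT date(T_Date) FROM TableA
WHERE ID = ? AND Status = 3
  AND TableA_ID > coalesce((SELECT TableA_ID FROM TableA
                            WHERE ID = ? AND Status <> 3
                            ORDER BY T_Date DESC LIMIT 1), 0)
ORDER BY T_Date LIMIT 1""", (1, 1)).fetchone()

print(in_on, in_where, original)  # ('2017-01-01',) ('2017-01-03',) ('2017-01-03',)
```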

Postgresql query COUNT and MAX together?

SELECT id,icon,type,cnt
FROM capability
JOIN (
SELECT s0_.capability_id AS capability_id0 ,
count(capability_id) as cnt
FROM service_offer_capability s0_
INNER JOIN service_offer s1_ ON s0_.service_offer_id = s1_.id
WHERE s0_.value <> 'i:0;' AND s1_.service_id = 2
GROUP BY s0_.capability_id
) af
ON af.capability_id0=id;
All I want to do is add max(cnt) as an extra column. I know that I can order by cnt and take the first row, but I am looking for an alternative. Is it possible, or do I have to run multiple queries?

This should do it:
SELECT id,
icon,
type,
cnt,
max(cnt) over () as max_cnt
FROM capability
JOIN (
SELECT s0_.capability_id AS capability_id0 ,
count(capability_id) as cnt
FROM service_offer_capability s0_
INNER JOIN service_offer s1_ ON s0_.service_offer_id = s1_.id
WHERE s0_.value <> 'i:0;' AND s1_.service_id = 2
GROUP BY s0_.capability_id
) af
ON af.capability_id0=id;
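A small check of the window-function approach in SQLite (3.25 or later) through Python's sqlite3. The rows are hypothetical, and only id, cnt and max_cnt are selected to keep the output short; because OVER () has an empty window, every row sees the maximum over the whole joined result.

```python
import sqlite3

# Hypothetical rows for the question's three tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE capability (id INTEGER PRIMARY KEY, icon TEXT, type TEXT);
CREATE TABLE service_offer (id INTEGER PRIMARY KEY, service_id INT);
CREATE TABLE service_offer_capability (capability_id INT, service_offer_id INT, value TEXT);
INSERT INTO capability VALUES (1, 'a.png', 'x'), (2, 'b.png', 'y');
INSERT INTO service_offer VALUES (1, 2), (2, 2), (3, 1);
INSERT INTO service_offer_capability VALUES
  (1, 1, 'i:1;'), (1, 2, 'i:1;'),
  (2, 1, 'i:1;'), (2, 2, 'i:0;'), (2, 3, 'i:1;');
""")

rows = con.execute("""
SELECT id, cnt, max(cnt) OVER () AS max_cnt
FROM capability
JOIN (
    SELECT s0_.capability_id AS capability_id0,
           count(capability_id) AS cnt
    FROM service_offer_capability s0_
    INNER JOIN service_offer s1_ ON s0_.service_offer_id = s1_.id
    WHERE s0_.value <> 'i:0;' AND s1_.service_id = 2
    GROUP BY s0_.capability_id
) af ON af.capability_id0 = id
ORDER BY id
""").fetchall()

print(rows)  # [(1, 2, 2), (2, 1, 2)]
```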
