SUM(CASE WHEN ...) returns a greater number than COUNT(DISTINCT..) - postgresql

I have written a query in two models, but I can't figure out why the second query returns a greater number than the first one; while the number that the first one, COUNT(DISTINCT...) returns is correct:
WITH types(id) AS (VALUES('{1, 4, 5, 3}'::INTEGER[])),
date_gen64 AS
(
SELECT CAST (generate_series(date '10/1/2017', date '11/15/2017', interval
'1 day') AS date) as days ORDER BY days)
SELECT cl.class_date AS c_date,
count(DISTINCT (CASE WHEN co.id = 1 THEN p.id END)),
count(DISTINCT (CASE WHEN co.id = 2 THEN p.id END))
FROM person p
JOIN envelope e ON e.personID = p.id
JOIN "class" cl on cl.id = p.classID
JOIN course co ON co.id = cl.course_id AND co.id = 1
JOIN types ON cr.type_id = ANY (types.id)
RIGHT JOIN date_gen64 dg ON dg.days = cl.class_date
GROUP BY cl.class_date
ORDER BY cl.class_date
The above query returns 26 but following query returns 27!
The reason why I rewrote it with SUM is that the first query
was too slow. But my question is that why the second one counts more?
WITH types(id) AS (VALUES('{1, 4, 5, 3}'::INTEGER[]))
SELECT tmpcl.days,
SUM(CASE WHEN tmp80.course_id = 1 THEN 1
ELSE 0 END),
SUM(CASE WHEN tmp80.course_id = 2 THEN 1
ELSE 0 END)
FROM (
SELECT CAST (generate_series(date '10/1/2017', date '11/15/2017',
interval '1 day') AS date) as days ORDER BY days) tmpcl
LEFT JOIN (
SELECT DISTINCT p.id AS "person_id",
cl.class_date AS c_date,
co.id AS "course_id"
FROM person p
JOIN envelope e ON e.personID = p.id
JOIN "class" cl on cl.id = p.classID
JOIN course co ON co.id = cl.course_id
JOIN types ON cr.type_id = ANY (types.id)
WHERE co.id IN ( 1 , 2 )
) tmp80 ON tmpcl.days = tmp80.class_date
GROUP BY tmpcl.days
ORDER BY tmpcl.days

You can theoretically have multiple people enrolled in the same class on the same day. Indeed that would seem to be the main point of having classes. So each time there are multiple people assigned to the same class on the same day you can have a higher count than you would in your first query. Does that make sense?
You don't appear to be using p.id in that inner query so simply remove it and your counts should match.
WITH types(id) AS (VALUES('{1, 4, 5, 3}'::INTEGER[]))
SELECT tmpcl.days,
SUM(CASE WHEN tmp80.course_id = 1 THEN 1
ELSE 0 END),
SUM(CASE WHEN tmp80.course_id = 2 THEN 1
ELSE 0 END)
FROM (
SELECT CAST (generate_series(date '10/1/2017', date '11/15/2017',
interval '1 day') AS date) as days ORDER BY days) tmpcl
LEFT JOIN (
SELECT DISTINCT cl.class_date AS c_date,
co.id AS "course_id"
FROM person p
JOIN envelope e ON e.personID = p.id
JOIN "class" cl on cl.id = p.classID
JOIN course co ON co.id = cl.course_id
JOIN types ON cr.type_id = ANY (types.id)
WHERE co.id IN ( 1 , 2 )
) tmp80 ON tmpcl.days = tmp80.class_date
GROUP BY tmpcl.days
ORDER BY tmpcl.days

Related

How to append CTE results to main query output?

I've created a TSQL query that pulls from two sets of tables in my database. The tables in the Common Table Expression are different from the tables in the main query. I'm joining on MRN and need the end result to contain accounts from both sets of tables. I've written the following query to this end:
with cteHosp as(
select Distinct p.EncounterNumber, p.MRN, p.AdmitAge
from HospitalPatients p
inner join Eligibility e on p.MRN = e.MRN
inner join HospChgDtl c on p.pt_id = c.pt_id
inner join HospitalDiagnoses d on p.pt_id = d.pt_id
where p.AdmitAge >=12
and d.dx_cd in ('G89.4','R52.1','R52.2','Z00.129')
)
Select Distinct a.AccountNo, a.dob, DATEDIFF(yy, a.dob, GETDATE()) as Age
from RHCCPTDetail c
inner join RHCAppointments a on c.ClaimID = a.ClaimID
inner join Eligibility e on c.hl7Id = e.MRN
full outer join cteHosp on e.MRN = cteHosp.MRN
where DATEDIFF(yy, a.dob, getdate()) >= 12
and left(c.PriDiag,7) in ('G89.4','R52.1','R52.2', 'Z00.129')
or (
DATEDIFF(yy, a.dob, getdate()) >= 12
and LEFT(c.DiagCode2,7) in ('G89.4','R52.1','R52.2','Z00.129')
)
or (
DATEDIFF(yy, a.dob, getdate()) >= 12
and LEFT(c.DiagCode3,7) in ('G89.4','R52.1','R52.2','Z00.129')
)
or (
DATEDIFF(yy, a.dob, getdate()) >= 12
and LEFT(c.DiagCode4,7) in ('G89.4','R52.1','R52.2','Z00.129')
)
order by AccountNo
How do I merge together the output of both the common table expression and the main query into one set of results?
Merge performs inserts, updates or deletes. I believe you want to join the cte. If so, here is an example.
Notice the cteBatch is joined to the Main query below.
with
cteBatch (BatchID,BatchDate,Creator,LogID)
as
(
select
BatchID
,dateadd(day,right(BatchID,3) -1,
cast(cast(left(BatchID,4) as varchar(4))
+ '-01-01' as date)) BatchDate
,Creator
,LogID
from tblPriceMatrixBatch b
unpivot
(
LogID
for Logs in (LogIDISI,LogIDTG,LogIDWeb)
)u
)
Select
0 as isCurrent
,i.InterfaceID
,i.InterfaceName
,b.BatchID
,b.BatchDate
,case when isdate(l.start) = 0 and isdate(l.[end]) = 0 then 'Scheduled'
when isdate(l.start) = 1 and isdate(l.[end]) = 0 then 'Running'
when isdate(l.start) = 1 and isdate(l.[end]) = 1 and isnull(l.haserror,0) = 1 then 'Failed'
when isdate(l.start) = 1 and isdate(l.[end]) = 1 and isnull(l.haserror,0) != 1 then 'Success'
else 'idunno' end as stat
,l.Start as StartTime
,l.[end] as CompleteTime
,b.Creator as Usr
from EOCSupport.dbo.Interfaces i
join EOCSupport.dbo.Logs l
on i.InterfaceID = l.InterfaceID
join cteBatch b
on b.logid = l.LogID

how to select top 10 without duplicates

Using SQL Server 2012
I need to select TOP 10 Producer based on a ProducerCode. But the data is messed up, users were entering same Producers just spelled differently and with the same ProducerCode.
So I just need TOP 10, so if the ProducerCode is repeating, I just want to pick the first one in a list.
How can I achieve that?
Sample of my data
;WITH cte_TopWP --T
AS
(
SELECT distinct ProducerCode, Producer,SUM(premium) as NetWrittenPremium,
SUM(CASE WHEN PolicyType = 'New Business' THEN Premium ELSE 0 END) as NewBusiness1,
SUM(CASE WHEN PolicyType = 'Renewal' THEN Premium ELSE 0 END) as Renewal1,
SUM(CASE WHEN PolicyType = 'Rewrite' THEN Premium ELSE 0 END) as Rewrite1
FROM ProductionReportMetrics
WHERE YEAR(EffectiveDate) = 2016 AND TransactionType = 'Policy' AND CompanyLine = 'Arch Insurance Company'--AND ProducerType = 'Wholesaler'
GROUP BY ProducerCode,Producer
)
,
cte_Counts --C
AS
(
SELECT distinct ProducerCode, ProducerName, COUNT (distinct ControlNo) as Submissions2,
SUM(CASE WHEN QuotedPremium IS NOT NULL THEN 1 ELSE 0 END) as Quoted2,
SUM(CASE WHEN Type = 'New Business' AND Status IN ('Bound','Cancelled','Notice of Cancellation') THEN 1 ELSE 0 END ) as NewBusiness2,
SUM(CASE WHEN Type = 'Renewal' AND Status IN ('Bound','Cancelled','Notice of Cancellation') THEN 1 ELSE 0 END ) as Renewal2,
SUM(CASE WHEN Type = 'Rewrite' AND Status IN ('Bound','Cancelled','Notice of Cancellation') THEN 1 ELSE 0 END ) as Rewrite2,
SUM(CASE WHEN Status = 'Declined' THEN 1 ELSE 0 END ) as Declined2
FROM ClearanceReportMetrics
WHERE YEAR(EffectiveDate)=2016 AND CompanyLine = 'Arch Insurance Company'
GROUP BY ProducerCode,ProducerName
)
SELECT top 10 RANK() OVER (ORDER BY NetWrittenPremium desc) as Rank,
t.ProducerCode,
c.ProducerName as 'Producer',
NetWrittenPremium,
t.NewBusiness1,
t.Renewal1,
t.Rewrite1,
c.[NewBusiness2]+c.[Renewal2]+c.[Rewrite2] as PolicyCount,
c.Submissions2,
c.Quoted2,
c.[NewBusiness2],
c.Renewal2,
c.Rewrite2,
c.Declined2
FROM cte_TopWP t --LEFT OUTER JOIN tblProducers p on t.ProducerCode=p.ProducerCode
LEFT OUTER JOIN cte_Counts c ON t.ProducerCode=c.ProducerCode
You should use ROW_NUMBER to fix your issue.
https://msdn.microsoft.com/en-us/library/ms186734.aspx
A good example of this is the following answer:
https://dba.stackexchange.com/a/22198
Here's the code example from the answer.
SELECT * FROM
(
SELECT acss_lookup.ID AS acss_lookupID,
ROW_NUMBER() OVER
(PARTITION BY your_distinct_column ORDER BY any_column_you_think_is_appropriate)
as num,
acss_lookup.product_lookupID AS acssproduct_lookupID,
acss_lookup.region_lookupID AS acssregion_lookupID,
acss_lookup.document_lookupID AS acssdocument_lookupID,
product.ID AS product_ID,
product.parent_productID AS productparent_product_ID,
product.label AS product_label,
product.displayheading AS product_displayheading,
product.displayorder AS product_displayorder,
product.display AS product_display,
product.ignorenewupdate AS product_ignorenewupdate,
product.directlink AS product_directlink,
product.directlinkURL AS product_directlinkURL,
product.shortdescription AS product_shortdescription,
product.logo AS product_logo,
product.thumbnail AS product_thumbnail,
product.content AS product_content,
product.pdf AS product_pdf,
product.language_lookupID AS product_language_lookupID,
document.ID AS document_ID,
document.shortdescription AS document_shortdescription,
document.language_lookupID AS document_language_lookupID,
document.document_note AS document_document_note,
document.displayheading AS document_displayheading
FROM acss_lookup
INNER JOIN product ON (acss_lookup.product_lookupID = product.ID)
INNER JOIN document ON (acss_lookup.document_lookupID = document.ID)
)a
WHERE a.num = 1
ORDER BY product_displayheading ASC;
You could do this:
SELECT ProducerCode, MIN(Producer) AS Producer, ...
GROUP BY ProducerCode

Join two queries become one subquery

I want to ask how to combine these two queries become one subquery?
select c.CustomerName, A.Qty
from Customer c join (select s.CustomerID, pd.Date, s.Qty
from Period pd join Sales s on pd.TimeID = s.TimeID) A on c.CustomerID = A.CustomerID
where #Date = A.Date
and
select sum(case when (pd.Date between '2010-03-15' and #Date) then s.Qty else 0 end) as TotalQty
from Period pd full join Sales s on pd.TimeID = s.TimeID full join Customer c on s.CustomerID = c.CustomerID
group by c.CustomerName, c.CustomerID
They should result one table contains these following columns: CustomerName, Qty, and TotalQty. I've tried many ways but they didn't work at all. Really hope your help, thanks.
Answering your question assuming you are looking at the result as one row and not "column" :
select c.CustomerName, A.Qty,(select sum(case when (pd.Date between '2010-03-15' and #Date) then s.Qty else 0 end)
from Period pd full join Sales s on pd.TimeID = s.TimeID full join Customer c on s.CustomerID = c.CustomerID
group by c.CustomerName, c.CustomerID
) as TotalQty
from Customer c join (select s.CustomerID, pd.Date, s.Qty
from Period pd join Sales s on pd.TimeID = s.TimeID) A on c.CustomerID = A.CustomerID
where #Date = A.Date
If you still want the result as a single column , google "row to column transpose"

How to display rollup data in new column?

I have the following query which returns the number of android questions per each day on StackOverflow in the year of 2011. I want to get the sum of all the questions asked during the year 2011. For this I am using ROLLUP.
select
year(p.CreationDate) as [Year],
month(p.CreationDate) as [Month],
day(p.CreationDate) as [Day],
count(*) as [QuestionsAskedToday]
from Posts p
inner join PostTags pt on p.id = pt.postid
inner join Tags t on t.id = pt.tagid
where
t.tagname = 'android' and
p.CreationDate > '2011-01-01 00:00:00'
group by year(p.CreationDate), month(p.CreationDate),day(p.CreationDate)
​with rollup
order by year(p.CreationDate), month(p.CreationDate) desc,day(p.CreationDate) desc​
This is the output:
The sum of all questions asked on each day in 2011 is being displayed in the QuestionsAskedToday column itself.
Is there a way to display the rollup in a new column with an alias?
Link to the query
To show this as a column rather than a row you can use SUM(COUNT(*)) OVER () instead of ROLLUP. (Online Demo)
SELECT YEAR(p.CreationDate) AS [Year],
MONTH(p.CreationDate) AS [Month],
DAY(p.CreationDate) AS [Day],
COUNT(*) AS [QuestionsAskedToday],
SUM(COUNT(*)) OVER () AS [Total]
FROM Posts p
INNER JOIN PostTags pt
ON p.id = pt.postid
INNER JOIN Tags t
ON t.id = pt.tagid
WHERE t.tagname = 'android'
AND p.CreationDate > '2011-01-01 00:00:00'
GROUP BY YEAR(p.CreationDate),
MONTH(p.CreationDate),
DAY(p.CreationDate)
ORDER BY YEAR(p.CreationDate),
MONTH(p.CreationDate) DESC,
DAY(p.CreationDate) DESC
You could take an approach like this: Example
SELECT
YEAR(p.CreationDate) AS 'Year'
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 0
THEN CAST(MONTH(p.CreationDate) AS VARCHAR(2))
ELSE 'Totals:'
END AS 'Month'
, CASE
WHEN GROUPING(DAY(p.CreationDate)) = 0
THEN CAST(DAY(p.CreationDate) AS VARCHAR(2))
ELSE 'Totals:'
END AS [DAY]
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 0
AND GROUPING(DAY(p.CreationDate)) = 0
THEN COUNT(1)
END AS 'QuestionsAskedToday'
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 1
OR GROUPING(DAY(p.CreationDate)) = 1
THEN COUNT(1)
END AS 'Totals'
FROM Posts AS p
INNER JOIN PostTags AS pt ON p.id = pt.postid
INNER JOIN Tags AS t ON t.id = pt.tagid
WHERE t.tagname = 'android'
AND p.CreationDate >= '2011-01-01'
GROUP BY ROLLUP(YEAR(p.CreationDate)
, MONTH(p.CreationDate)
, DAY(p.CreationDate))
ORDER BY YEAR(p.CreationDate)
, MONTH(p.CreationDate) DESC
, DAY(p.CreationDate) DESC​​​​​​​
If this is what you wanted, the same technique can be applied to Years as well to total them in the new column, or their own column, if you want to query for multiple years and aggregate them.

Cannot perform an aggregate function on a subquery

Can someone help me with this query?
SELECT p.OwnerName, SUM(ru.MonthlyRent) AS PotentinalRent, SUM(
(SELECT COUNT(t.ID) * ru.MonthlyRent FROM tblTenant t
WHERE t.UnitID = ru.ID)
) AS ExpectedRent
FROM tblRentalUnit ru
LEFT JOIN tblProperty p ON p.ID = ru.PropertyID
GROUP BY p.OwnerName
I'm having problems with the second sum, it won't let me do it. Evidently SUM won't work on subqueries, but I need to calculate the expected rent (MonthlyRent if there is a tenant assigned to the RentalUnit's id, 0 of they're not). How can I make this work?
SELECT p.OwnerName, SUM(ru.MonthlyRent) AS PotentialRent, SUM(cnt) AS ExpectedRent
FROM tblRentalUnit ru
LEFT JOIN
tblProperty p
ON p.ID = ru.PropertyID
OUTER APPLY
(
SELECT COUNT(t.id) * ru.MonthlyRent AS cnt
FROM tblTenant t
WHERE t.UnitID = ru.ID
) td
GROUP BY p.OwnerName
Here's a test script to check:
WITH tblRentalUnit AS
(
SELECT 1 AS id, 100 AS MonthlyRent, 1 AS PropertyID
UNION ALL
SELECT 2 AS id, 300 AS MonthlyRent, 2 AS PropertyID
),
tblProperty AS
(
SELECT 1 AS id, 'Owner 1' AS OwnerName
UNION ALL
SELECT 2 AS id, 'Owner 2' AS OwnerName
),
tblTenant AS
(
SELECT 1 AS id, 1 AS UnitID
UNION ALL
SELECT 2 AS id, 1 AS UnitID
)
SELECT p.OwnerName, SUM(ru.MonthlyRent) AS PotentialRent, SUM(cnt) AS ExpectedRent
FROM tblRentalUnit ru
LEFT JOIN
tblProperty p
ON p.ID = ru.PropertyID
OUTER APPLY
(
SELECT COUNT(t.id) * ru.MonthlyRent AS cnt
FROM tblTenant t
WHERE t.UnitID = ru.ID
) td
GROUP BY p.OwnerName
What is the meaning of the sum of the unitMonthlyRent times the number of tenants, for some partiicular rental unit (COUNT(t.ID) * ru.MonthlyRent )?
Is it the case that all you are trying to do is see the difference between the total potential rent from all untis versus the expected rent (From only occcupied units) ? If so, then try this
Select p.OwnerName,
Sum(r.MonthlyRent) AS PotentinalRent,
Sum(Case t.Id When Null Then 0
Else r.MonthlyRent End) ExpectedRent
From tblRentalUnit r
Left Join tblTenant t
On t.UnitID = r.ID
left Join tblProperty p
On p.ID = r.PropertyID)
Group By p.OwnerName