Hi I have the following code, I wanted to ask if it is okay to Left join a table to an unpivoted version of itself as shown in the code below.
with physician_diag as (
SELECT baseentityid, eventdate,locationid,teamid,average_muac,child_age,child_gender,physician_diagnosis, dignosis_data
FROM [VITAL_DWH].[vr].[event_physician_visit]
UNPIVOT
(dignosis_data FOR physician_diagnosis IN
(
well_baby,
severe_pneumonia) )
AS unpvt )
,
final as (
select *, dense_rank() OVER (PARTITION BY baseentityid ORDER BY eventdate) AS rn from (
select distinct
v.baseentityid, cast(v.eventdate as date) as eventdate, v.locationid, v.teamid,
p.physician_diagnosis, p.dignosis_data,
v.average_muac, v.child_age , v.child_gender
FROM [VITAL_DWH].[vr].[event_physician_visit] v
left join physician_diagnosis p on p.baseentityid=v.baseentityid and p.eventdate=v.eventdate
)
Can't I achieve the same result without left joining, as is shown below -
with physician_diag as (
SELECT baseentityid, eventdate,locationid,teamid,average_muac,child_age,child_gender,physician_diagnosis, dignosis_data
FROM [VITAL_DWH].[vr].[event_physician_visit]
UNPIVOT
(dignosis_data FOR physician_diagnosis IN
(
well_baby,
severe_pneumonia
)
)
AS unpvt
),
final as (
-- Final
select *, dense_rank() OVER (PARTITION BY baseentityid ORDER BY eventdate) AS rn from (
select * from physician_diag
) F
In the second analysis,
select count(distinct baseentityid) from final
gives 9104
In the first analysis,
select count(distinct baseentityid) from final
gives 9107
I was confused as to why there is a difference.
Related
I am writing a query in Redshift to answer the question "Give the average lifetime spend of users who spent more on their first order than their second order." This is based off of an order_items table which has one row for every item ordered (so an order with 3 items would be represented in 3 rows). Here's a snapshot of the first 10 rows:
First 10 rows of order_items:
Here is my solution:
with
cte1_lifetime as (
select oi.user_id, sum(oi.sale_price) as lifetime_spend
from order_items as oi
group by oi.user_id
),
cte2_order as (
select oi.user_id, oi.order_id, sum(oi.sale_price) as order_total, rank() over(partition by oi.user_id order by oi.created_at) as order_rank
from order_items as oi
group by oi.user_id, oi.order_id, oi.created_at
order by oi.user_id, oi.order_id
),
cte3_first_order as (
select user_id, order_id, order_total
from cte2_order
where order_rank=1
order by user_id, order_id
),
cte4_second_order as (
select user_id, order_id, order_total
from cte2_order
where order_rank=2
order by user_id, order_id
)
select avg(cte1.lifetime_spend) as average_lifetime_spend
from cte1_lifetime as cte1
where exists (
select *
from cte3_first_order as cte3, cte4_second_order as cte4
where cte3.user_id=cte4.user_id
and cte1.user_id=cte3.user_id
and cte3.order_total > cte4.order_total)
And here is the answer key:
WITH
table1 AS
(SELECT user_id, order_id,
SUM(sale_price) OVER (PARTITION BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as order_total,
RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS "sequence"
FROM order_items)
,
table2 AS
(SELECT user_id, SUM(sale_price) AS lifetime_spend
FROM order_items
WHERE EXISTS
(SELECT t1.user_id
FROM table1 t1, table1 t2
WHERE t1.user_id = t2.user_id AND t1.sequence = 1 AND t2.sequence = 2 AND t1.order_total>t2.order_total
AND t1.user_id = order_items.user_id)
GROUP BY 1
ORDER BY 1)
SELECT AVG(lifetime_spend)
FROM table2
These answers yield slightly different results on the same data- an average lifetime spend of $215 vs $220. I'd really like to understand why they are different but so far I can't figure it out. Any ideas?
I'm writing a query that will bring me back either row or rank '1'. When I run the query with the 'where rank or row clause equals 1', no data returns for those columns but if I take out the 'where rank or row clause' the data populates. Can someone tell me why the where clause is causing no data to return?
LEFT OUTER JOIN
(SELECT DISTINCT CLMED.SIMPLE_GENERIC_C
, MEDINFO.MEDICATION_ID as [CYTOTEC]
,FMED.PAT_ENC_CSN_ID
, FMED.HOSP_ADMSN_DATE
-- , MIN(FMED.TAKEN_DATETIME) over (Partition by FMED.PAT_ENC_CSN_ID,
MEDINFO.MEDICATION_ID ) as 'FIRST DOSE'
, MIN(FMED.TAKEN_DATETIME) over (Partition by FMED.PAT_ENC_CSN_ID, CLMED.SIMPLE_GENERIC_C
) AS 'FIRST DOSE'
, RANK() over (Partition by FMED.PAT_ENC_CSN_ID, CLMED.SIMPLE_GENERIC_C order by
FMED.TAKEN_DATETIME DESC) 'CYTOTEC_RANK'
, ROW_NUMBER() over (Partition by FMED.PAT_ENC_CSN_ID, MEDINFO.MEDICATION_ID order by
FMED.TAKEN_DATETIME) AS 'CYTOTEC_ROW'
, FMED.DISPLAY_NAME
, FMED.ORDER_MED_ID
, ORDMED.DESCRIPTION
, ZCGEN.NAME
FROM MED_ADMIN FMED
LEFT JOIN MEDINFO MEDINFO ON FMED.ORDER_MED_ID = MEDINFO.ORDER_MED_ID
LEFT JOIN MED ORDMED ON FMED.ORDER_MED_ID = ORDMED.ORDER_MED_ID
LEFT JOIN MEDICATION CLMED ON MEDINFO.MEDICATION_ID = CLMED.MEDICATION_ID
LEFT JOIN GENERIC ZCGEN ON CLMED.SIMPLE_GENERIC_C = ZCGEN.SIMPLE_GENERIC_C
WHERE ZCGEN.NAME = 'miSOPROStol'
AND 'CYTOTEC_RANK' = 1
) AS [CYT] ON CYT.PAT_ENC_CSN_ID = VITALS.PAT_ENC_CSN_ID
I am using TSQL, SSMS v.17.9.1 The underlying db is Microsoft SQL Server 2014 SP3
For display purposes, I want to concatenate the results of two queries:
SELECT TOP 1 colA as 'myCol1' FROM tableA
--
SELECT TOP 1 colB as 'myCol2' FROM tableB
and display the results from the queries in one row in SSMS.
(The TOP 1 directive would hopefully guarantee the same number of results from each query, which would assist displaying them together. If this could be generalized to TOP 10 per query that would help also)
This should work for any number of rows, it assumes you want to pair ordered by the values in the column displayed
With
TableA_CTE AS
(
SELECT TOP 1 colA as myCol1
,Row_Number() OVER (ORDER BY ColA DESC) AS RowOrder
FROM tableA
),
TableB_CTE AS
(
SELECT TOP 1 colB as myCol2
,Row_Number() OVER (ORDER BY ColB DESC) AS RowOrder
FROM tableB
)
SELECT A.myCol1, B.MyCol2
FROM TableA_CTE AS A
INNER JOIN TableB_CTE AS B
ON A.RowOrder = B.RowOrder
There are currently two issues with the accepted answer:
I) a missing comma before the line: "Table B As"
II) TSQL seems to find it recursive as written, so I re-wrote it in a non-recursive way:
This is a re-working of the accepted answer that actually works in T-SQL:
USE [Database_1];
With
CTE_A AS
(
SELECT TOP 1 [Col1] as myCol1
,Row_Number() OVER (ORDER BY [Col2] desc) AS RowOrder
FROM [TableA]
)
,
CTE_B AS
(
SELECT TOP 1 [Col2] as myCol2
,Row_Number() OVER (ORDER BY [Col2] desc) AS RowOrder
FROM [TableB]
)
SELECT A.myCol1, B.myCol2
FROM CTE_A AS A
INNER JOIN CTE_B AS B
ON ( A.RowOrder = B.RowOrder)
Have a table with 3 columns: ID, Signature, and Datetime, and it's grouped by Signature Having Count(*) > 9.
select * from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
I now want to select the 1st and 10th records only, per Signature. What determines rank is the Datetime descending. Thus, I would expect every Signature to have 2 rows.
Thanks,
I would go with a couple of common table expressions.
The first will select all records from the table as well as a count of records per signature, and the second one will select from the first where the record count > 9 and add row_number partitioned by signature - and then just select from that where the row_number is either 1 or 10:
With cte1 AS
(
SELECT ID, Signature, Datetime, COUNT(*) OVER(PARTITION BY Signature) As NumberOfRows
FROM #Sigs
), cte2 AS
(
SELECT ID, Signature, Datetime, ROW_NUMBER() OVER(PARTITION BY Signature ORDER BY DateTime DESC) As Rn
FROM cte1
WHERE NumberOfRows > 9
)
SELECT ID, Signature, Datetime
FROM cte2
WHERE Rn IN (1, 10)
ORDER BY Signature desc
Because I don't know what your data looks like, this might need some adjustment.
The simplest way here, since you already know your sort order (DateTime DESC) and partitioning (Signature), is probably to assign row numbers and then select the rows you want.
SELECT *
FROM
(
select o.Signature
,o.DateTime
,ROW_NUMBER() OVER (PARTITION BY o.Signature ORDER BY o.DateTime DESC) [Row]
from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
)
WHERE [Row] IN (1,10)
I would like to produce a string containing some parsed numeric ranges.
I have a table with some data
b_id,s_id
1,50
1,51
1,53
1,61
1,62
1,63
2,91
2,95
2,96
2,97
Using only SQL in PostgreSQL, how could I produce this output:
b_id,s_seqs
1,"50-51,53,61-63"
2,"91,95-97"
How on earth do I do that?
select b_id, string_agg(seq, ',' order by seq_no) as s_seqs
from (
select
b_id, seq_no,
replace(regexp_replace(string_agg(s_id::text, ','), ',.+,', '-'), ',', '-') seq
from (
select
b_id, s_id,
sum(mark) over w as seq_no
from (
select
b_id, s_id,
(s_id- 1 <> lag(s_id, 1, s_id) over w)::int as mark
from my_table
window w as (partition by b_id order by s_id)
) s
window w as (partition by b_id order by s_id)
) s
group by 1, 2
) s
group by 1;
Here you can find a step-by-step analyse from the innermost query towards the outside.