find max value for specific column while still seeing other columns - postgresql

For tables patient and labh
patient
id lastname
19 patientone
20 patienttwo
labh
patientid lastname loinc datetime numerical
19 patientone 4548-4 2014-05-15 00:00:00 6.5
19 patientone 4548-4 2015-05-15 00:00:00 7.5
19 patientone 4548-4 2016-05-15 00:00:00 3.5
19 patientone 4548-4 2017-05-15 00:00:00 5.5
19 patientone 5000-3 2018-05-15 00:00:00 123
20 patienttwo 4548-4 2013-05-15 00:00:00 2.5
20 patienttwo 4548-4 2012-05-15 00:00:00 1.5
20 patienttwo 4548-4 2011-05-15 00:00:00 9.5
20 patienttwo 4548-4 2010-05-15 00:00:00 3.5
Desired output:
patientid lastname datetime numerical
19 patientone 2017-05-15 00:00:00 5.5
20 patienttwo 2013-05-15 00:00:00 2.5
The labh table holds lab values (numerical), the type of lab (loinc), and when each lab was done (datetime). I'd like to query for the most recent value of loinc = 4548-4 for each patient, and I'd like the output to show both the date and the value.
I've tried the query below. It shows the most recent dates, but I can't see the values (numerical) at the same time. When I add the numerical column, it shows all the values, not just the most recent.
Select Distinct patient.id, patient.lastname,
Max(Date_Trunc('day', labh.datetime)) As "Date"
From patient
Inner Join labh On patient.id = labh.patientid
Where labh.loinc = '4548-4'
Group By patient.id, patient.lastname, patient.firstname
Order By patient.id

You haven't selected the numerical column in your query. You can use a CTE that ranks each patient's rows with ROW_NUMBER(), partitioning by patient id and ordering each partition by date in descending order, and then keep only the top-ranked row.
So you can try:
WITH summary AS (
SELECT p.id as "Patient ID",
p.lastname as "Patient Name",
l.datetime As "Date",
l.numerical as "Numerical",
ROW_NUMBER() OVER (PARTITION BY p.id
ORDER BY l.datetime DESC) AS rank
FROM patient p
Inner Join labh l
On p.id = l.patientid)
SELECT "Patient ID",
"Patient Name",
"Date",
"Numerical"
FROM summary
WHERE rank = 1;
Against the sample data without the loinc 5000-3 row, this gives:
Patient ID Patient Name Date Numerical
19 patientone 2017-05-15T00:00:00.000Z 5.5
20 patienttwo 2013-05-15T00:00:00.000Z 2.5
UPDATE
Since you've updated the question and changed the expected output, the only modification needed is a WHERE condition inside the CTE:
WITH summary AS (
SELECT p.id as "Patient ID",
p.lastname as "Patient Name",
l.datetime As "Date",
l.numerical as "Numerical",
ROW_NUMBER() OVER (PARTITION BY p.id
ORDER BY l.datetime DESC) AS rank
FROM patient p
Inner Join labh l
On p.id = l.patientid
where l.loinc = '4548-4') -- Added this line
SELECT "Patient ID",
"Patient Name",
"Date",
"Numerical"
FROM summary
WHERE rank = 1;
This will give you the same result:
Patient ID Patient Name Date Numerical
19 patientone 2017-05-15T00:00:00.000Z 5.5
20 patienttwo 2013-05-15T00:00:00.000Z 2.5
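To check the filtered ROW_NUMBER() query against the sample data without a Postgres server, here is a minimal sketch that runs it in an in-memory SQLite database from Python. It assumes a Python build whose bundled SQLite is 3.25 or newer (needed for window functions). The trimmed-down labh schema (no lastname column) and the names conn and latest are only for illustration.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, lastname TEXT);
CREATE TABLE labh (patientid INTEGER, loinc TEXT, datetime TEXT, numerical REAL);
INSERT INTO patient VALUES (19, 'patientone'), (20, 'patienttwo');
INSERT INTO labh VALUES
  (19, '4548-4', '2014-05-15 00:00:00', 6.5),
  (19, '4548-4', '2015-05-15 00:00:00', 7.5),
  (19, '4548-4', '2016-05-15 00:00:00', 3.5),
  (19, '4548-4', '2017-05-15 00:00:00', 5.5),
  (19, '5000-3', '2018-05-15 00:00:00', 123),
  (20, '4548-4', '2013-05-15 00:00:00', 2.5),
  (20, '4548-4', '2012-05-15 00:00:00', 1.5),
  (20, '4548-4', '2011-05-15 00:00:00', 9.5),
  (20, '4548-4', '2010-05-15 00:00:00', 3.5);
""")

# Rank each patient's 4548-4 rows from newest to oldest, then keep rank 1.
latest = conn.execute("""
WITH summary AS (
  SELECT p.id, p.lastname, l.datetime, l.numerical,
         ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY l.datetime DESC) AS rn
  FROM patient p
  JOIN labh l ON p.id = l.patientid
  WHERE l.loinc = '4548-4'
)
SELECT id, lastname, datetime, numerical
FROM summary
WHERE rn = 1
ORDER BY id
""").fetchall()

for row in latest:
    print(row)
```

The filter has to sit inside the CTE. If it were applied after ranking, patient 19's newest row (the 5000-3 lab from 2018) would take rank 1 and then be filtered away, leaving no row for that patient.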

To achieve this in Postgres (and other SQL RDBMSes), you essentially identify the max value together with the key it belongs to (here patientid), then join that back to the rest of the data you want to retrieve:
SELECT patient.*, labh.*
FROM patient
JOIN labh
ON patient.id = labh.patientid
JOIN (SELECT patientid, max(datetime) AS datetime -- alias so the join below can reference it
FROM labh
WHERE loinc = '4548-4'
GROUP BY patientid) maxvals
ON maxvals.patientid = labh.patientid AND
maxvals.datetime = labh.datetime
WHERE labh.loinc = '4548-4'

Related

How to Outer Join a Calendar table to view dates with 0 records

I have a table with records of orders by customers and a calendar table with dates from Jan 2022 spanning 10 years. I want to get the number of customers for every day of the last 28 days, including days with 0 customers recorded. So I need to outer join the calendar table to the customer records. However, I can't get the outer join to work correctly.
Here's how I did it:
SELECT order_date as 'date', COUNT(orderstatus) as 'customers'
FROM orders
RIGHT OUTER JOIN calendar ON
calendar.date = orders.order_date
WHERE sellerid = 11
I'm getting:
date customers
2022-01-02 9
I wanted to see:
date customers
2022-01-01 0
2022-01-02 9
2022-01-03 0
.
.
.
You would not get the results that you posted in your question unless you group by date, so I guess you missed that part of your code.
You need a WHERE clause to filter the calendar's rows for the last 28 days and you must move the condition sellerid = 11 to the ON clause:
SELECT c.date,
COUNT(o.order_date) customers
FROM calendar c LEFT JOIN orders o
ON o.sellerid = 11 AND o.order_date = c.date
WHERE c.date BETWEEN CURRENT_DATE - INTERVAL 28 DAY AND CURRENT_DATE
GROUP BY c.date;
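To see why the seller filter belongs in the ON clause, here is a small sketch using Python's sqlite3 module with a three-day calendar. The sample rows and the names in_on and in_where are made up for illustration, and the 28-day range filter is left out so the result does not depend on today's date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (date TEXT PRIMARY KEY);
CREATE TABLE orders (order_date TEXT, sellerid INTEGER, orderstatus TEXT);
INSERT INTO calendar VALUES ('2022-01-01'), ('2022-01-02'), ('2022-01-03');
INSERT INTO orders VALUES
  ('2022-01-02', 11, 'paid'),
  ('2022-01-02', 11, 'paid'),
  ('2022-01-03', 12, 'paid');   -- another seller, should not be counted
""")

# Filter in the ON clause: every calendar day survives, unmatched days count 0.
in_on = conn.execute("""
SELECT c.date, COUNT(o.order_date) AS customers
FROM calendar c
LEFT JOIN orders o ON o.sellerid = 11 AND o.order_date = c.date
GROUP BY c.date
ORDER BY c.date
""").fetchall()

# Filter in the WHERE clause: rows where o.sellerid is NULL are thrown away,
# so the outer join behaves like an inner join.
in_where = conn.execute("""
SELECT c.date, COUNT(o.order_date) AS customers
FROM calendar c
LEFT JOIN orders o ON o.order_date = c.date
WHERE o.sellerid = 11
GROUP BY c.date
ORDER BY c.date
""").fetchall()

print(in_on)
print(in_where)
```

Counting o.order_date rather than * matters as well: COUNT(*) would report 1 for a day whose only row is the NULL-padded one from the outer join.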

How I can find duplicate values in the result of a join operation?

I have two tables
MappingTable > Id, ItemId, Quantity
ItemTable > ItemId, Name, DateOfPurchase
I wanted to find out the duplicate rows having same Quantity and same DateOfPurchase.
eg. I have
Id ItemId Quantity
1 01 4
2 03 5
3 05 4
ItemId Name DateOfPurchase
01 AB 2019-10-30 18:30:00
05 XY 2019-10-30 18:17:00
Result:
Quantity DateOfPurchase Name
4 2019-10-30 AB
4 2019-10-30 XY
So, I might join these tables and then find duplicates
How can I do that?
One option is to use window functions, if your database supports them:
select *
from (
select
m.*,
i.name,
i.dateOfPurchase,
-- compare on the calendar day, since the sample timestamps differ within the same day
count(*) over(partition by m.quantity, cast(i.dateOfPurchase as date)) cnt
from MappingTable m
inner join ItemTable i on i.itemId = m.itemId
) t
where cnt > 1
order by quantity, dateOfPurchase
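Here is a runnable sketch of the window-count idea against the sample rows, using Python's sqlite3 module (assuming a bundled SQLite of 3.25 or newer). It uses SQLite's date() function to compare purchases by calendar day; other databases would use their own cast. The PurchaseDay alias and the duplicates variable are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MappingTable (Id INTEGER, ItemId TEXT, Quantity INTEGER);
CREATE TABLE ItemTable (ItemId TEXT, Name TEXT, DateOfPurchase TEXT);
INSERT INTO MappingTable VALUES (1, '01', 4), (2, '03', 5), (3, '05', 4);
INSERT INTO ItemTable VALUES
  ('01', 'AB', '2019-10-30 18:30:00'),
  ('05', 'XY', '2019-10-30 18:17:00');
""")

# Count the joined rows sharing the same quantity and purchase day,
# then keep only the rows whose group has more than one member.
duplicates = conn.execute("""
SELECT Quantity, PurchaseDay, Name
FROM (
  SELECT m.Quantity,
         date(i.DateOfPurchase) AS PurchaseDay,
         i.Name,
         COUNT(*) OVER (PARTITION BY m.Quantity, date(i.DateOfPurchase)) AS cnt
  FROM MappingTable m
  JOIN ItemTable i ON i.ItemId = m.ItemId
) t
WHERE cnt > 1
ORDER BY Quantity, PurchaseDay, Name
""").fetchall()

print(duplicates)
```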

How do I select the min opendate from a list of duplicates?

I have 3 columns. SSN|AccountNumber|OpenDate
1 SSN may have multiple AccountNumbers
Each AccountNumber has a corresponding OpenDate
In my list I have many SSN's, each containing several account numbers which may have been opened on different days.
I want the results of my query to be SSN | earliest OpenDate | the AccountNumber that corresponds to that earliest OpenDate.
I'm dealing with about 200,000 records.
EDIT: First I did
select SSN, min(OpenDate), AcctNumber from Table Group By SSN, AccountNumber
but that didn't quite give me the correct data.
The raw data gives me something like this:
SSN | AcctNumber | OpenDate
---------------------------
10 101 Jan
10 102 Feb
10 103 Mar
From this I got 10, Jan, and AcctNumber 102, which is not the account number associated with the Jan OpenDate. After looking at other rows, I found that the account number I got was just one of the account numbers associated with that SSN, rather than the one that corresponds to min(OpenDate).
WITH CTE AS (
SELECT SSN, AcctNumber, OpenDate,
ROW_NUMBER() OVER (PARTITION BY SSN ORDER BY OpenDate ASC) AS RN
FROM tbl -- tbl stands in for your table name
)
SELECT SSN, AcctNumber, OpenDate
FROM CTE
WHERE RN = 1;
If your table is like this:
SSN | AcctNumber | OpenDate
---------------------------
10 101 April
10 101 May
10 102 April
20 201 June
20 201 July
Do you want your query to return this?
SSN | AcctNumber | OpenDate
---------------------------
10 101 April
10 102 April
20 201 June
Then you would use this query:
select ssn, min(OpenDate), acctNumber from tbl group by ssn, acctNumber
You can try this:
select SSN , AcctNumber, OpenDate
from (SELECT SSN , AcctNumber, OpenDate
, ROW_NUMBER() OVER ( PARTITION BY SSN ORDER BY OpenDate ASC ) AS RN
FROM tbl) AS temp -- tbl stands in for your table name
WHERE temp.RN= 1
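The difference between grouping by SSN and AcctNumber and ranking within each SSN can be checked with a small sketch in Python's sqlite3 module (assuming SQLite 3.25 or newer). The accounts table, its ISO-formatted dates, and the variable names are made-up sample data, since month names like Jan and Feb would not sort chronologically as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (SSN INTEGER, AcctNumber INTEGER, OpenDate TEXT);
INSERT INTO accounts VALUES
  (10, 101, '2020-01-15'), (10, 102, '2020-02-15'), (10, 103, '2020-03-15'),
  (20, 201, '2020-06-01'), (20, 202, '2020-05-01');
""")

# Grouping by SSN and AcctNumber yields one row per account, not per SSN.
per_pair = conn.execute("""
SELECT SSN, MIN(OpenDate), AcctNumber
FROM accounts
GROUP BY SSN, AcctNumber
""").fetchall()

# Ranking inside each SSN keeps the whole row that carries the earliest date.
earliest = conn.execute("""
SELECT SSN, AcctNumber, OpenDate
FROM (
  SELECT SSN, AcctNumber, OpenDate,
         ROW_NUMBER() OVER (PARTITION BY SSN ORDER BY OpenDate ASC) AS rn
  FROM accounts
) ranked
WHERE rn = 1
ORDER BY SSN
""").fetchall()

print(len(per_pair))
print(earliest)
```

If two accounts under one SSN share the same earliest OpenDate, ROW_NUMBER() picks one of them arbitrarily. Adding AcctNumber to the ORDER BY makes the choice deterministic.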

T-SQL - Data Islands and Gaps - How do I summarise transactional data by month?

I'm trying to query some transactional data to establish the CurrentProductionHours value for each Report at the end of each month.
Providing there has been a transaction for each report in each month, that's pretty straight-forward... I can use something along the lines of the code below to partition transactions by month and then pick out the rows where TransactionByMonth = 1 (effectively, the last transaction for each report each month).
SELECT
ReportId,
TransactionId,
CurrentProductionHours,
ROW_NUMBER() OVER (PARTITION BY [ReportId], [CalendarYear], [MonthOfYear]
ORDER BY TransactionTimestamp desc
) AS TransactionByMonth
FROM
tblSource
The problem that I have is that there will not necessarily be a transaction for every report every month... When that's the case, I need to carry forward the last known CurrentProductionHours value to the month which has no transaction as this indicates that there has been no change. Potentially, this value may need to be carried forward multiple times.
Source Data:
ReportId TransactionTimestamp CurrentProductionHours
1 2014-01-05 13:37:00 14.50
1 2014-01-20 09:15:00 15.00
1 2014-01-21 10:20:00 10.00
2 2014-01-22 09:43:00 22.00
1 2014-02-02 08:50:00 12.00
Target Results:
ReportId Month Year ProductionHours
1 1 2014 10.00
2 1 2014 22.00
1 2 2014 12.00
2 2 2014 22.00
I should also mention that I have a date table available, which can be referenced if required.
** UPDATE 05/03/2014 **
I now have a query which is generating results as shown in the example below, but I'm left with islands of data (where a transaction existed in that month) and gaps in between. My question is still similar but in some ways a little more generic: what is the best way to fill gaps between data islands if you have the dataset below as a starting point?
ReportId Month Year ProductionHours
1 1 2014 10.00
1 2 2014 12.00
1 3 2014 NULL
2 1 2014 22.00
2 2 2014 NULL
2 3 2014 NULL
Any advice about how to tackle this would be greatly appreciated!
Try this:
;with a as
(
select dateadd(m, datediff(m, 0, min(TransactionTimestamp))+1,0) minTransactionTimestamp,
max(TransactionTimestamp) maxTransactionTimestamp from tblSource
), b as
(
select minTransactionTimestamp TT, maxTransactionTimestamp
from a
union all
select dateadd(m, 1, TT), maxTransactionTimestamp
from b
where tt < maxTransactionTimestamp
), c as
(
select distinct t.ReportId, b.TT from tblSource t
cross apply b
)
select c.ReportId,
month(dateadd(m, -1, c.TT)) Month,
year(dateadd(m, -1, c.TT)) Year,
x.CurrentProductionHours
from c
cross apply
(select top 1 CurrentProductionHours from tblSource
where TransactionTimestamp < c.TT
and ReportId = c.ReportId
order by TransactionTimestamp desc) x
A similar approach, but using a Cartesian product in the first step to obtain all the combinations of report ids and months.
A second step adds to that Cartesian product the maximum timestamp from the source table where the month is less than or equal to the month in the current row.
Finally, it joins the source table to that intermediate result by report id and timestamp to obtain the latest source table row for every report id/month.
;
WITH allcombinations -- Cartesian (reportid X yearmonth)
AS ( SELECT reportid ,
yearmonth
FROM ( SELECT DISTINCT
reportid
FROM tblSource
) a
JOIN ( SELECT DISTINCT
DATEPART(yy, transactionTimestamp)
* 100 + DATEPART(MM,
transactionTimestamp) yearmonth
FROM tblSource
) b ON 1 = 1
),
maxdates --add correlated max timestamp where the month is less or equal to the month in current record
AS ( SELECT a.* ,
( SELECT MAX(transactionTimestamp)
FROM tblSource t
WHERE t.reportid = a.reportid
AND DATEPART(yy, t.transactionTimestamp)
* 100 + DATEPART(MM,
t.transactionTimestamp) <= a.yearmonth
) maxtstamp
FROM allcombinations a
)
-- join previous data to the source table by reportid and timestamp
SELECT distinct m.reportid ,
m.yearmonth ,
t.CurrentProductionHours
FROM maxdates m
JOIN tblSource t ON t.transactionTimestamp = m.maxtstamp and t.reportid=m.reportid
ORDER BY m.reportid ,
m.yearmonth
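Both answers come down to the same idea: build every report/month combination, then look up the latest transaction at or before the end of that month. Here is a minimal sketch of that idea in Python's sqlite3 module rather than T-SQL. The months table is a hypothetical stand-in for the date table mentioned in the question, and its next_month_start column is an assumption made to keep the comparison simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblSource (ReportId INTEGER, TransactionTimestamp TEXT,
                        CurrentProductionHours REAL);
INSERT INTO tblSource VALUES
  (1, '2014-01-05 13:37:00', 14.50),
  (1, '2014-01-20 09:15:00', 15.00),
  (1, '2014-01-21 10:20:00', 10.00),
  (2, '2014-01-22 09:43:00', 22.00),
  (1, '2014-02-02 08:50:00', 12.00);
-- Hypothetical month table: one row per month plus the first day of the next month.
CREATE TABLE months (yr INTEGER, mo INTEGER, next_month_start TEXT);
INSERT INTO months VALUES
  (2014, 1, '2014-02-01'), (2014, 2, '2014-03-01'), (2014, 3, '2014-04-01');
""")

# For every report/month pair, take the newest transaction before the next
# month begins, which carries the last known value across months with no rows.
filled = conn.execute("""
SELECT r.ReportId, m.mo, m.yr,
       (SELECT s.CurrentProductionHours
        FROM tblSource s
        WHERE s.ReportId = r.ReportId
          AND s.TransactionTimestamp < m.next_month_start
        ORDER BY s.TransactionTimestamp DESC
        LIMIT 1) AS ProductionHours
FROM (SELECT DISTINCT ReportId FROM tblSource) r
CROSS JOIN months m
ORDER BY m.yr, m.mo, r.ReportId
""").fetchall()

for row in filled:
    print(row)
```

Driving the months from a separate table means a month with no transactions for any report (March here) still appears in the output, with the value carried forward for each report.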

Derive EndDate of Current Row From StartDate of Next Row

Can someone please help me with how to create an end date from a start date?
Products are referred to a company for testing. While a product is with the company, they carry out multiple tests on different dates and record the test date to establish the product condition (i.e. OutcomeID).
I need to establish the StartDate, which is the TestDate, and the EndDate, which is the start date of the next row. But if multiple consecutive tests resulted in the same OutcomeID, I need to return only one row, with the StartDate of the first test and the EndDate of the last test. In other words, consecutive tests where the OutcomeID did not change should collapse into one row.
Here is my data set
CREATE TABLE #ProductTests
(
RequestID int not null,
ProductID int not null,
TestID int not null,
TestDate datetime null,
OutcomeID int
)
insert into #ProductTests
(RequestID ,ProductID ,TestID ,TestDate ,OutcomeID )
select 1,2,22,'2005-01-21',10
union all
select 1,2,42,'2007-03-17',10
union all
select 1,2,45,'2010-12-25',10
union all
select 1,2,325,'2011-01-14',13
union all
select 1,2,895,'2011-08-10',15
union all
select 1,2,111,'2011-12-23',15
union all
select 1,2,636,'2012-05-02',10
union all
select 1,2,554,'2012-11-08',17
--select *from #producttests
RequestID ProductID TestID TestDate OutcomeID
1 2 22 2005-01-21 10
1 2 42 2007-03-17 10
1 2 45 2010-12-25 10
1 2 325 2011-01-14 13
1 2 895 2011-08-10 15
1 2 111 2011-12-23 15
1 2 636 2012-05-02 10
1 2 554 2012-11-08 17
And this is what I need to achieve.
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
As you can see from the dataset, the first three tests (22, 42, and 45) all resulted in OutcomeID 10, so in my result I only need the start date of test 22 and the end date of test 45, which is the start date of test 325. As you can see, in test 636 the OutcomeID has gone back to 10 from 15, so it needs to be returned too.
--This is what I have managed to achieve at the moment using the following script
select T1.RequestID,T1.ProductID,T1.TestDate AS StartDate
,MIN(T2.TestDate) AS EndDate ,T1.OutcomeID
from #producttests T1
left join #ProductTests T2 ON T1.RequestID=T2.RequestID
and T1.ProductID=T2.ProductID and T2.TestDate>T1.TestDate
group by T1.RequestID,T1.ProductID ,T1.OutcomeID,T1.TestDate
order by T1.TestDate
Result:
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2007-03-17 10
1 2 2007-03-17 2010-12-25 10
1 2 2010-12-25 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2011-12-23 15
1 2 2011-12-23 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
It's Nov 7 and this is still not answered, so here is my solution. It's not so pretty, but it works.
My hint is to read about windowing, ranking, and aggregate functions like ROW_NUMBER, RANK, AVG, SUM, etc. Those are essential when you want to write reports, and they have become quite powerful in SQL Server 2012.
I have also used a CTE (common table expression), but it can be written as a subquery or a temporary table.
;with cte ( ida, requestid, productid, testid, testdate, outcomeid) as
(
-- select rows where the outcome id is changing
select b.* from
(select ROW_NUMBER() over( partition by requestid, productid order by testDate) as id, * from #ProductTests)a
right outer join
(select ROW_NUMBER() over(partition by requestid, productid order by testDate) as id, * from #ProductTests) b
on a.requestID = b.requestID and a.productID = b.productID and a.id +1 = b.id
where 1=1
--or a.id = 1
and a.outcomeid <> b.outcomeid or b.outcomeid is null or a.id is null
)
select --*
a.RequestID,a.ProductID,a.TestDate AS StartDate ,MIN(b.TestDate) AS EndDate ,a.OutcomeID
from cte a left join cte b on a.requestid = b.requestid and a.productid = b.productid and a.testdate < b.testdate
group by a.RequestID,a.ProductID ,a.OutcomeID,a.TestDate
order by StartDate
Actually, there seem to be two problems in your question. One is how to group sequential (based on specific criteria) rows containing the same value. The other is the one actually spelled out in your title, i.e. how to use the next row's StartDate as the current row's EndDate.
Personally, I would solve these two problems in the order I mentioned them, so I would first address the grouping problem. One way to group the data properly in this case would be to use double ranking like this:
WITH partitioned AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
- ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
FROM #ProductTests
)
, grouped AS (
SELECT
RequestID,
ProductID,
StartDate = MIN(TestDate),
OutcomeID
FROM partitioned
GROUP BY
RequestID,
ProductID,
OutcomeID,
grp
)
SELECT *
FROM grouped
;
This should give you the following output for your data sample:
RequestID ProductID StartDate OutcomeID
--------- --------- ---------- ---------
1 2 2005-01-21 10
1 2 2011-01-14 13
1 2 2011-08-10 15
1 2 2012-05-02 10
1 2 2012-11-08 17
Obviously, one thing is still missing, and it's EndDate, and now is the right time to care about it. Use ROW_NUMBER() once again, to rank the result set of the grouped CTE, then use the rankings in the join condition when joining the result set with itself (using an outer join):
WITH partitioned AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
- ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
FROM #ProductTests
)
, grouped AS (
SELECT
RequestID,
ProductID,
StartDate = MIN(TestDate),
OutcomeID,
rnk = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY MIN(TestDate))
FROM partitioned
GROUP BY
RequestID,
ProductID,
OutcomeID,
grp
)
SELECT
g1.RequestID,
g1.ProductID,
g1.StartDate,
g2.StartDate AS EndDate,
g1.OutcomeID
FROM grouped g1
LEFT JOIN grouped g2
ON g1.RequestID = g2.RequestID
AND g1.ProductID = g2.ProductID
AND g1.rnk = g2.rnk - 1
;
You can try this query at SQL Fiddle to verify that it returns the output you are after.
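As a variation on the final self join: on SQL Server 2012 and later, LEAD() can fetch the next group's StartDate directly. The sketch below is not the query from the answer above. It keeps the same double-ranking step and swaps the self join for LEAD(), run here in Python's sqlite3 module (SQLite 3.25 or newer also supports LEAD) against a plain ProductTests table instead of the temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductTests (RequestID INTEGER, ProductID INTEGER, TestID INTEGER,
                           TestDate TEXT, OutcomeID INTEGER);
INSERT INTO ProductTests VALUES
  (1, 2, 22,  '2005-01-21', 10), (1, 2, 42,  '2007-03-17', 10),
  (1, 2, 45,  '2010-12-25', 10), (1, 2, 325, '2011-01-14', 13),
  (1, 2, 895, '2011-08-10', 15), (1, 2, 111, '2011-12-23', 15),
  (1, 2, 636, '2012-05-02', 10), (1, 2, 554, '2012-11-08', 17);
""")

periods = conn.execute("""
WITH partitioned AS (
  -- The difference of the two rankings stays constant within a consecutive
  -- run of the same OutcomeID, so (OutcomeID, grp) identifies each island.
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
       - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID
                            ORDER BY TestDate) AS grp
  FROM ProductTests
),
grouped AS (
  SELECT RequestID, ProductID, MIN(TestDate) AS StartDate, OutcomeID
  FROM partitioned
  GROUP BY RequestID, ProductID, OutcomeID, grp
)
SELECT RequestID, ProductID, StartDate,
       -- The next island's start becomes this island's end (NULL for the last).
       LEAD(StartDate) OVER (PARTITION BY RequestID, ProductID
                             ORDER BY StartDate) AS EndDate,
       OutcomeID
FROM grouped
ORDER BY StartDate
""").fetchall()

for row in periods:
    print(row)
```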