I am new to PostgreSQL. I have a table called 'sales':
create table sales
(
cust varchar(20),
prod varchar(20),
day integer,
month integer,
year integer,
state char(2),
quant integer
);
insert into sales values ('Bloom', 'Pepsi', 2, 12, 2001, 'NY', 4232);
insert into sales values ('Knuth', 'Bread', 23, 5, 2005, 'PA', 4167);
insert into sales values ('Emily', 'Pepsi', 22, 1, 2006, 'CT', 4404);
insert into sales values ('Emily', 'Fruits', 11, 1, 2000, 'NJ', 4369);
insert into sales values ('Helen', 'Milk', 7, 11, 2006, 'CT', 210);
insert into sales values ('Emily', 'Soap', 2, 4, 2002, 'CT', 2549);
Now I want to find the “most favorable” month (the month in which the largest quantity of the product was
sold) and the “least favorable” month (the month in which the smallest quantity was sold) for each product.
The result should have one row per product, showing its most and least favorable months.
I entered
SELECT
prod product,
MAX(CASE WHEN rn2 = 1 THEN month END) MOST_FAV_MO,
MAX(CASE WHEN rn1 = 1 THEN month END) LEAST_FAV_MO
FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY prod ORDER BY quant ) rn1,
ROW_NUMBER() OVER(PARTITION BY prod ORDER BY quant DESC) rn2
FROM sales
) x
WHERE rn1 = 1 or rn2 = 1
GROUP BY prod,quant;
This returns NULL values for each product, and 20 rows in total.
How can I remove the NULL values in these rows and reduce the result to 10 rows (there are 10 distinct products in total)?
I would say that the GROUP BY clause should be
GROUP BY prod
Otherwise you get one row per distinct quant, which is not what you want.
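With GROUP BY prod alone, each product collapses to a single row and the CASE/MAX trick picks out both months with no NULL padding. A minimal sketch of the corrected query — run here against SQLite 3.25+ via Python's sqlite3 purely so the example is self-contained; the SQL is the same shape as the PostgreSQL version:

```python
import sqlite3

# Build the sample table from the question in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (cust TEXT, prod TEXT, day INT, month INT,
                    year INT, state TEXT, quant INT);
INSERT INTO sales VALUES
 ('Bloom','Pepsi',2,12,2001,'NY',4232),
 ('Knuth','Bread',23,5,2005,'PA',4167),
 ('Emily','Pepsi',22,1,2006,'CT',4404),
 ('Emily','Fruits',11,1,2000,'NJ',4369),
 ('Helen','Milk',7,11,2006,'CT',210),
 ('Emily','Soap',2,4,2002,'CT',2549);
""")

# Same query as in the question, but grouped by prod only.
rows = conn.execute("""
SELECT prod,
       MAX(CASE WHEN rn2 = 1 THEN month END) AS most_fav_mo,
       MAX(CASE WHEN rn1 = 1 THEN month END) AS least_fav_mo
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY prod ORDER BY quant)      rn1,
             ROW_NUMBER() OVER (PARTITION BY prod ORDER BY quant DESC) rn2
      FROM sales) x
WHERE rn1 = 1 OR rn2 = 1
GROUP BY prod   -- one row per product, no NULL-padded rows
""").fetchall()

print(dict((p, (hi, lo)) for p, hi, lo in rows))
```

With this sample data Pepsi gets most favorable month 1 (quant 4404) and least favorable month 12 (quant 4232); products with a single row get the same month for both.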
I've got some periodic counter data (like once a second) from different objects that I wish to combine into an hourly total.
If I do it with separate column names, it's pretty straightforward:
CREATE TABLE ts1 (
id INTEGER,
ts TIMESTAMP,
count0 integer,
count1 integer,
count2 integer
);
INSERT INTO ts1 VALUES
(1, '2017-12-07 10:37:48', 10, 20, 50),
(2, '2017-12-07 10:37:48', 13, 7, 88),
(1, '2017-12-07 10:37:49', 12, 23, 34),
(2, '2017-12-07 10:37:49', 11, 13, 46),
(1, '2017-12-07 10:37:50', 8, 33, 80),
(2, '2017-12-07 10:37:50', 9, 3, 47),
(1, '2017-12-07 10:37:51', 17, 99, 7),
(2, '2017-12-07 10:37:51', 9, 23, 96);
SELECT id, date_trunc('hour', ts + '1 hour') nts,
sum(count0), sum(count1), sum(count2)
FROM ts1 GROUP BY id, nts;
id | nts | sum | sum | sum
----+---------------------+-----+-----+-----
1 | 2017-12-07 11:00:00 | 47 | 175 | 171
2 | 2017-12-07 11:00:00 | 42 | 46 | 277
(2 rows)
The problem is that different objects have different numbers of counts (though each particular object's rows -- ones sharing the same ID -- all have the same number of counts). Hence I want to use an array.
The corresponding table looks like this:
CREATE TABLE ts2 (
id INTEGER,
ts TIMESTAMP,
counts INTEGER[]
);
INSERT INTO ts2 VALUES
(1, '2017-12-07 10:37:48', ARRAY[10, 20, 50]),
(2, '2017-12-07 10:37:48', ARRAY[13, 7, 88]),
(1, '2017-12-07 10:37:49', ARRAY[12, 23, 34]),
(2, '2017-12-07 10:37:49', ARRAY[11, 13, 46]),
(1, '2017-12-07 10:37:50', ARRAY[8, 33, 80]),
(2, '2017-12-07 10:37:50', ARRAY[9, 3, 47]),
(1, '2017-12-07 10:37:51', ARRAY[17, 99, 7]),
(2, '2017-12-07 10:37:51', ARRAY[9, 23, 96]);
I have looked at this answer https://stackoverflow.com/a/24997565/1076479 and I get the general gist of it, but I cannot figure out how to get the correct rows summed together when I try to combine it with the grouping by id and timestamp.
For example, with this I get all the rows, not just the ones with matching id and timestamp:
SELECT id, date_trunc('hour', ts + '1 hour') nts, ARRAY(
SELECT sum(elem) FROM ts2 t, unnest(t.counts)
WITH ORDINALITY x(elem, rn) GROUP BY rn ORDER BY rn
) FROM ts2 GROUP BY id, nts;
id | nts | array
----+---------------------+--------------
1 | 2017-12-07 11:00:00 | {89,221,448}
2 | 2017-12-07 11:00:00 | {89,221,448}
(2 rows)
FWIW, I'm using PostgreSQL 9.6.
The problem with your original query is that you're summing over all rows, because the GROUP BY id, nts is executed in the outer query only. Combining a CTE with a LATERAL join does the trick:
WITH tmp AS (
    SELECT
        id,
        date_trunc('hour', ts + '1 hour') nts,
        rn,   -- keep the element position so we can rebuild the array in order
        sum(elem) AS counts
    FROM
        ts2
        LEFT JOIN LATERAL unnest(counts) WITH ORDINALITY x(elem, rn) ON TRUE
    GROUP BY
        id, nts, rn
)
SELECT id, nts, array_agg(counts ORDER BY rn) FROM tmp GROUP BY id, nts;
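The computation the query performs — element-wise sums of counts, grouped by (id, hour bucket) — can be sketched in plain Python to check the expected result (the rounding mirrors date_trunc('hour', ts + '1 hour') from the question):

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (id, timestamp string, counts array).
rows = [
    (1, '2017-12-07 10:37:48', [10, 20, 50]),
    (2, '2017-12-07 10:37:48', [13, 7, 88]),
    (1, '2017-12-07 10:37:49', [12, 23, 34]),
    (2, '2017-12-07 10:37:49', [11, 13, 46]),
    (1, '2017-12-07 10:37:50', [8, 33, 80]),
    (2, '2017-12-07 10:37:50', [9, 3, 47]),
    (1, '2017-12-07 10:37:51', [17, 99, 7]),
    (2, '2017-12-07 10:37:51', [9, 23, 96]),
]

totals = defaultdict(lambda: None)
for id_, ts, counts in rows:
    # date_trunc('hour', ts + '1 hour'): bucket each row into the hour
    # boundary that follows it.
    t = datetime.strptime(ts, '%Y-%m-%d %H:%M:%S') + timedelta(hours=1)
    nts = t.replace(minute=0, second=0)
    cur = totals[(id_, nts)]
    totals[(id_, nts)] = (counts[:] if cur is None
                          else [a + b for a, b in zip(cur, counts)])

for key in sorted(totals):
    print(key, totals[key])
```

This yields {47, 175, 171} for id 1 and {42, 46, 277} for id 2 in the 11:00 bucket, matching the per-column sums from the first (ts1) version.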
I have a rather unusual request that I'm trying to satisfy with a single SQL statement. I know I could use a cursor, but I want to see whether it can be done in plain SQL.
Here is my source data.
1 - 1:00 PM
2 - 1:02 PM
3 - 1:03 PM
4 - 1:05 PM
5 - 1:06 PM
6 - 1:09 PM
7 - 1:10 PM
8 - 1:12 PM
9 - 1:13 PM
10 - 1:15 PM
I'm trying to create a function that, given an interval, returns the resulting data set.
For example, if I pass in 5 minutes, the records I would want back are 1, 4, 7, and 10.
Is there a way to do this in SQL? Note: if record 4 (1:05 PM) wasn't in the data set, I would expect to see 1, 5, and 8: record 5 because it is the next record at least 5 minutes after record 1, and record 8 because it is the next record at least 5 minutes after record 5.
Here is a create script that you should have provided:
declare @Table1 TABLE
([id] int, [time] time)
;
INSERT INTO @Table1
([id], [time])
VALUES
(1, '1:00 PM'),
(2, '1:02 PM'),
(3, '1:03 PM'),
(4, '1:05 PM'),
(5, '1:06 PM'),
(6, '1:09 PM'),
(7, '1:10 PM'),
(8, '1:12 PM'),
(9, '1:13 PM'),
(10, '1:15 PM')
;
I would do this with this query:
declare @interval int
set @interval = 5
;with next_times as(
select id, [time], (select min([time]) from @Table1 t2 where t2.[time] >= dateadd(minute, @interval, t1.[time])) as next_time
from @Table1 t1
),
t as(
select id, [time], next_time
from next_times t1 where id=1
union all
select t3.id, t3.[time], t3.next_time
from t inner join next_times t3
on t.next_time = t3.[time]
)
select id, [time] from t order by 1
-- results:
id time
----------- ----------------
1 13:00:00.0000000
4 13:05:00.0000000
7 13:10:00.0000000
10 13:15:00.0000000
(4 row(s) affected)
It works even for the situations with a missing interval:
-- delete the 1:05 PM record
delete from @Table1 where id = 4;
;with next_times as(
select id, [time], (select min([time]) from @Table1 t2 where t2.[time] >= dateadd(minute, @interval, t1.[time])) as next_time
from @Table1 t1
),
t as(
select id, [time], next_time
from next_times t1 where id=1
union all
select t3.id, t3.[time], t3.next_time
from t inner join next_times t3
on t.next_time = t3.[time]
)
select id, [time] from t order by 1;
-- results:
id time
----------- ----------------
1 13:00:00.0000000
5 13:06:00.0000000
8 13:12:00.0000000
(3 row(s) affected)
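The recursive CTE above is a greedy scan: keep the first record, then repeatedly keep the earliest record at least `interval` minutes after the last kept one. A plain-Python sketch of the same logic, using the sample times from the question:

```python
from datetime import datetime, timedelta

def pick(records, interval_minutes):
    """Greedy scan over (id, time) pairs sorted by time: keep the first
    record, then the earliest record at least `interval_minutes` after
    the most recently kept one, and so on."""
    gap = timedelta(minutes=interval_minutes)
    kept = []
    for rec_id, t in records:
        if not kept or t >= kept[-1][1] + gap:
            kept.append((rec_id, t))
    return [rec_id for rec_id, _ in kept]

# Sample data: ids 1-10 at 1:00, 1:02, 1:03, 1:05, ... 1:15 PM.
data = [(i + 1, datetime(2000, 1, 1, 13, m))
        for i, m in enumerate([0, 2, 3, 5, 6, 9, 10, 12, 13, 15])]

print(pick(data, 5))                              # full data set
print(pick([r for r in data if r[0] != 4], 5))    # with 1:05 PM removed
```

The two calls reproduce the expected outputs from the question: 1, 4, 7, 10 for the full set, and 1, 5, 8 once the 1:05 PM record is deleted.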
I need to build a patient population based on clinic visits. The qualifying criteria (filter) for this population is 3 visits in a 6 week period over the evaluation year. How can I code this?
DECLARE #Records TABLE (ptID INT, date DATE)
INSERT INTO #Records VALUES
(1, '2016-01-01')
,(1, '2016-01-05')
,(1, '2016-02-01')
,(1, '2016-10-01')
,(2, '2015-12-01')
,(2, '2015-12-10')
,(2, '2015-12-31')
,(2, '2016-01-01')
,(2, '2016-01-05')
,(2, '2016-03-05')
,(3, '2016-01-01')
,(3, '2016-02-01')
,(3, '2016-03-01')
,(3, '2016-04-01')
,(3, '2016-05-01')
,(3, '2016-06-01')
,(3, '2016-07-01')
,(3, '2016-08-01')
select a.ptID , a.date
from #Records a
join #Records b
on a.ptID = b.ptID
and datediff(wk, a.date, b.date) <= 6
and datediff(wk, a.date, b.date) > 0
and DATEPART(yy, a.date) = DATEPART(yy, b.date)
group by a.ptID, a.date
having count(*) >= 2
Paparazzi deserves all the credit for comparing the table to itself. I'm just refining his comparison here.
DECLARE #Records TABLE (
PatientID INT
,VisitDate DATE
)
INSERT INTO #Records VALUES
(1, '2016-01-01')
,(1, '2016-01-05')
,(1, '2016-02-01')
,(1, '2016-10-01')
,(2, '2015-12-01')
,(2, '2016-01-01')
,(2, '2016-01-05')
,(2, '2016-03-05')
;WITH SixWeeks
AS (
SELECT a.PatientID AS PID1, a.VisitDate AS Date1,
b.PatientID AS PID2, b.VisitDate AS Date2,
DATEDIFF(dd, a.VisitDate, b.VisitDate) AS DD
FROM #Records a
JOIN #Records b
ON a.PatientID = b.PatientID
AND DATEDIFF(dd, a.VisitDate, b.VisitDate) <= 42
AND DATEPART(yy, a.VisitDate) = 2016
WHERE EXISTS (SELECT * FROM #Records WHERE (VisitDate > a.VisitDate AND VisitDate < b.VisitDate))
)
SELECT PID1 FROM SixWeeks
GROUP BY PID1
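The qualifying rule itself — some run of 3 visits spanning no more than 6 weeks (42 days, the reading used in the join above) — can be sketched in plain Python as a check on each patient's sorted visit dates. Restricting the window to visits inside the evaluation year is an assumption here:

```python
from datetime import date, timedelta

def qualifies(visits, year=2016, window=timedelta(days=42), n=3):
    """True if some n consecutive visits (sorted, within `year`)
    span at most `window`."""
    vs = sorted(v for v in visits if v.year == year)
    return any(vs[i + n - 1] - vs[i] <= window
               for i in range(len(vs) - n + 1))

# Patients 1 and 2 from the refined sample data.
visits = {
    1: [date(2016, 1, 1), date(2016, 1, 5), date(2016, 2, 1), date(2016, 10, 1)],
    2: [date(2015, 12, 1), date(2016, 1, 1), date(2016, 1, 5), date(2016, 3, 5)],
}
print({pid: qualifies(v) for pid, v in visits.items()})
```

Patient 1 qualifies (Jan 1 to Feb 1 is 31 days for three visits); patient 2 does not, because the 2015 visit falls outside the evaluation year and the three 2016 visits span 64 days.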
I have a 100K-row table representing sales during particular time periods. Usually the periods are at least a few hours long, but occasionally we get a period that's only a few minutes long. These tiny periods mess up downstream reporting, so I'd like to merge them with the preceding period. Any period that's 30 minutes or less should be merged with the previous period, with sales data summed across the merged periods. There may be zero, one, or many subsequent short periods between long periods. There are no time gaps in the data: the start of one period is always the same as the end of the previous one.
What's a good set-based way (no cursors!) to perform this merging?
Existing data (simplified) looks like this:
UnitsSold Start End
---------------------------------------------------
10 06-12-2013 08:03 06-12-2013 12:07
12 06-12-2013 12:07 06-12-2013 16:05
1 06-12-2013 16:05 06-12-2013 16:09
1 06-12-2013 16:09 06-12-2013 16:13
7 06-12-2013 16:13 06-12-2013 20:10
Desired output would look like this:
UnitsSold Start End
---------------------------------------------------
10 06-12-2013 08:03 06-12-2013 12:07
14 06-12-2013 12:07 06-12-2013 16:13
7 06-12-2013 16:13 06-12-2013 20:10
Unfortunately we're still on SQL Server 2008 R2, so we can't leverage the cool new window functions in SQL Server 2012, which might make this problem easier to solve efficiently.
There's a good discussion of a similar problem in Merge adjacent rows in SQL?. I particularly like the PIVOT/UNPIVOT solution, but I'm stumped for how to adapt it to my problem.
My idea is:
create a list containing only the long periods
find the start of the next long period with OUTER APPLY
sum the units with a subquery
Something like this
declare #t table (UnitsSold int, start datetime, finish datetime)
insert into #t values (10, '20130612 08:03', '20130612 12:07')
insert into #t values (12, '20130612 12:07', '20130612 16:05')
insert into #t values (1, '20130612 16:05', '20130612 16:09')
insert into #t values (1, '20130612 16:09', '20130612 16:13')
insert into #t values (7, '20130612 16:13', '20130612 20:10')
select
(select SUM(UnitsSold) from #t t3 where t3.start>=t1.start and t3.finish<=ISNULL(oa.start, t1.finish)) as UnitsSold,
t1.start,
ISNULL(oa.start, t1.finish) as finish
from #t t1
outer apply (
select top(1) start
from #t t2
where datediff(minute,t2.start, t2.finish)>30
and t2.start >= t1.finish
order by t2.start
) oa
where datediff(minute, t1.start, t1.finish)>30
Using a recursive CTE:
DECLARE #t TABLE (UnitsSold INT, Start DATETIME, Finish DATETIME)
INSERT INTO #t VALUES
(10, '06-12-2013 08:03', '06-12-2013 12:07'),
(12, '06-12-2013 12:07', '06-12-2013 16:05'),
(1, '06-12-2013 16:05', '06-12-2013 16:09'),
(1, '06-12-2013 16:09', '06-12-2013 16:13'),
(7, '06-12-2013 16:13', '06-12-2013 20:10')
;WITH rec AS (
-- Returns periods > 30 minutes
SELECT u.UnitsSold, u.Start, u.Finish
FROM #t u WHERE DATEDIFF(MINUTE, u.Start, u.Finish) > 30
UNION ALL
-- Adds on adjoining periods <= 30 minutes
SELECT
u.UnitsSold + r.UnitsSold,
r.Start,
u.Finish
FROM rec r
INNER JOIN #t u ON r.Finish = u.Start
AND DATEDIFF(MINUTE, u.Start, u.Finish) <= 30)
-- Since the CTE also returns incomplete periods we need
-- to filter out the relevant periods, in this case the
-- last/max values for each start value.
SELECT
MAX(r.UnitsSold) AS UnitsSold,
r.Start AS Start,
MAX(r.Finish) AS Finish
FROM rec r
GROUP BY r.Start
Using a CTE and a cumulative sum (note that IIF and the windowed SUM ... OVER (ORDER BY ...) require SQL Server 2012 or later, so this won't run on 2008 R2):
DECLARE #t TABLE (UnitsSold INT, Start DATETIME, Finish DATETIME)
INSERT INTO #t VALUES
(10, '06-12-2013 08:03', '06-12-2013 12:07'),
(12, '06-12-2013 12:07', '06-12-2013 16:05'),
(1, '06-12-2013 16:05', '06-12-2013 16:09'),
(1, '06-12-2013 16:09', '06-12-2013 16:13'),
(7, '06-12-2013 16:13', '06-12-2013 20:10')
;WITH groups AS (
SELECT UnitsSold, Start, Finish,
-- Cumulative sum, IIF returns 1 for each row that
-- should generate a new row in the final result.
SUM(IIF(DATEDIFF(MINUTE, Start, Finish) <= 30, 0, 1)) OVER (ORDER BY Start) csum
FROM #t)
SELECT
SUM(UnitsSold) UnitsSold,
MIN(Start) Start,
MAX(Finish) Finish
FROM groups
GROUP BY csum
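The cumulative-sum grouping idea translates directly into a linear scan: each period longer than 30 minutes starts a new group, and shorter periods are folded into the current one. A plain-Python sketch, using the sample periods from the question (a leading short period with no predecessor starting its own group is an assumption, since the question doesn't cover that case):

```python
from datetime import datetime

def merge(periods, threshold_minutes=30):
    """Fold each period of <= threshold_minutes into the preceding
    group, summing units and extending the finish time.
    `periods` is a list of (units, start, finish), sorted by start."""
    merged = []
    for units, start, finish in periods:
        long_enough = (finish - start).total_seconds() > threshold_minutes * 60
        if long_enough or not merged:
            merged.append([units, start, finish])
        else:
            merged[-1][0] += units       # add the short period's units
            merged[-1][2] = finish       # extend the group's finish
    return [tuple(p) for p in merged]

d = lambda s: datetime.strptime(s, '%Y-%m-%d %H:%M')
periods = [
    (10, d('2013-06-12 08:03'), d('2013-06-12 12:07')),
    (12, d('2013-06-12 12:07'), d('2013-06-12 16:05')),
    (1,  d('2013-06-12 16:05'), d('2013-06-12 16:09')),
    (1,  d('2013-06-12 16:09'), d('2013-06-12 16:13')),
    (7,  d('2013-06-12 16:13'), d('2013-06-12 20:10')),
]
for units, start, finish in merge(periods):
    print(units, start, finish)
```

This reproduces the desired output: three rows, with the two 4-minute periods folded into the 12:07-16:05 period to give 14 units sold from 12:07 to 16:13.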