Group and Stuff multiple rows based on Count condition

Group and Stuff multiple rows based on Count condition - tsql

I have a script that runs every 10 minutes and returns table with events from past 24 hours (marked by the script run time)
ID Name TimeOfEvent EventCategory TeamColor
1 Verlene Bucy 2015-01-30 09:10:00.000 1 Blue
2 Geneva Rendon 2015-01-30 09:20:00.000 2 Blue
3 Juliane Hartwig 2015-01-30 09:25:00.000 3 Blue
4 Vina Dutton 2015-01-30 12:55:00.000 2 Red
5 Cristin Lewis 2015-01-30 15:50:00.000 2 Red
6 Reiko Cushman 2015-01-30 17:10:00.000 1 Red
7 Mallie Temme 2015-01-30 18:35:00.000 3 Blue
8 Keshia Seip 2015-01-30 19:55:00.000 2 Blue
9 Rosalia Maher 2015-01-30 20:35:00.000 3 Red
10 Keven Gabel 2015-01-30 21:25:00.000 3 Red
Now I'd like to select two groups of Names based on these conditions:
1) Select Names from same EventCategory having 4 or more records in past 24 hours.
2) Select Names from same EventCategory and same TeamColor having 2 or more records in past 1 hour.
So my result would be:
4+per24h: Geneva Rendon, Vina Dutton, Cristin Lewis, Keshia Seip EventCategory = 2
4+per24h: Juliane Hartwig, Mallie Temme, Rosalia Maher, Keven Gabel EventCategory = 3
2+per1h: Rosalia Maher, Keven Gabel EventCategory = 3, TeamColor = Red
For the first one, I have written this:
SELECT mt.EventCategory, MAX(mt.[name]), MAX(mt.TimeOfEvent), MAX(mt.TeamColor)
FROM #mytable mt
GROUP BY mt.EventCategory
HAVING COUNT(mt.EventCategory) >= 4
because I don't care for the actual time as long as it's in the past 24 hours (and it always is), but I have trouble stuffing the names in one row.
The second part, I have no idea how to do. Because the results need to have both same EventCategory and TeamColor and also be limited by the one hour bracket.

this is possible, but you mix two separate issues. Here you find them combined with UNION:
Just paste this into an empty query window and execute. Adapt to your needs:
DECLARE #tbl TABLE(ID INT,Name VARCHAR(100),TimeOfEvent DATETIME,EventCategory INT,TeamColor VARCHAR(10));
INSERT INTO #tbl VALUES
(1,'Verlene Bucy','2015-01-30T09:10:00.000',1,'Blue')
,(2,'Geneva Rendon','2015-01-30T09:20:00.000',2,'Blue')
,(3,'Juliane Hartwig','2015-01-30T09:25:00.000',3,'Blue')
,(4,'Vina Dutton','2015-01-30T12:55:00.000',2,'Red')
,(5,'Cristin Lewis','2015-01-30T15:50:00.000',2,'Red')
,(6,'Reiko Cushman','2015-01-30T17:10:00.000',1,'Red')
,(7,'Mallie Temme','2015-01-30T18:35:00.000',3,'Blue')
,(8,'Keshia Seip','2015-01-30T19:55:00.000',2,'Blue')
,(9,'Rosalia Maher','2015-01-30T20:35:00.000',3,'Red')
,(10,'Keven Gabel','2015-01-30T21:25:00.000',3,'Red');
WITH Extended AS
(
SELECT *
,DATEDIFF(MINUTE,'2015-01-30T21:26:00.000',TimeOfEvent) AS MinuteDiff --use GETDATE() here...
,COUNT(*) OVER(PARTITION BY EventCategory) AS CountCategory
FROM #tbl AS tbl
)
,Filtered24Hours AS
(
SELECT *
FROM Extended
WHERE CountCategory >=4
)
,Filtered60Mins AS
(
SELECT *
FROM Extended
WHERE MinuteDiff >=-60
AND CountCategory >=2
)
SELECT DISTINCT (SELECT COUNT(*) FROM Filtered24Hours AS x WHERE x.EventCategory=outerSource.EventCategory) AS CountNames
,'per24h' AS TimeIntervall
,STUFF((
SELECT ' ,' + innerSource.Name
FROM Filtered24Hours AS innerSource
WHERE innerSource.EventCategory=outerSource.EventCategory
ORDER BY innerSource.TimeOfEvent
FOR XML PATH('')
),1,2,'') AS Names
,EventCategory
,NULL
FROM Filtered24Hours AS outerSource
UNION
SELECT DISTINCT (SELECT COUNT(*) FROM Filtered60Mins AS x WHERE x.EventCategory=outerSource.EventCategory)
,'per1h'
,STUFF((
SELECT ' ,' + innerSource.Name
FROM Filtered60Mins AS innerSource
WHERE innerSource.EventCategory=outerSource.EventCategory
ORDER BY innerSource.TimeOfEvent
FOR XML PATH('')
),1,2,'')
,EventCategory
,TeamColor
FROM Filtered60Mins AS outerSource
The result
Count Interv Names Category Team
4 per24h Geneva Rendon ,Vina Dutton ,Cristin Lewis ,Keshia Seip 2 NULL
4 per24h Juliane Hartwig ,Mallie Temme ,Rosalia Maher ,Keven Gabel 3 NULL
2 per1h Rosalia Maher ,Keven Gabel 3 Red

Related

How do i write a group by query in PostgreSQL

I'm getting errors with PostgreSQL when am writing a group by query,
am sure someone will tell me to put all the columns I've selected in group by, but that will not give me the correct results.
Am writing a query that will select all the vehicles in the database and group the results by vehicles, giving me the total distance and cost for a given period.
Here is how am doing the query.
SELECT i.vehicle AS vehicle,
i.costcenter AS costCenter,
i.department AS department,
SUM(i.quantity) AS liters,
SUM(i.totalcost) AS Totalcost,
v.model AS model,
v.vtype AS vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates::text LIKE '%2019-03%' AND i.deleted_at IS NULL
GROUP BY i.vehicle;
If I put all the columns that are in the select in the group bt, the results will not be correct.
How do i go about this without putting all the columns in group by and creating sub-queries?
The fuel table looks like:
vehicle dates department quantity totalcost
1 2019-01-01 102 12 1200
1 2019-01-05 102 15 1500
1 2019-01-13 102 18 1800
1 2019-01-22 102 10 1000
2 2019-01-01 102 12 1260
2 2019-01-05 102 19 1995
2 2019-01-13 102 28 2940
Vehicle Table
id model vtype
1 1 2
2 4 6
2 5 7
This is the results i expect from the query
vehicle dates department quantity totalcost model vtype
1 2019-01-01 102 12 1200 1 2
1 2019-01-05 102 15 1500 1 2
1 2019-01-13 102 18 1800 1 2
1 2019-01-22 102 10 1000 1 2
1 2019-01-18 102 10 1000 1 2
1 65 6500
2 2019-01-01 102 12 1260 5 7
2 2019-01-05 102 19 1995 5 7
2 2019-01-13 102 28 2940 5 7
1 45 6195

Your query doesn't really make sense. Apparently there can be multiple departments and costcenters per vehicle in the fuelissuances table - which of those should be returned?
One way to deal with that, is to return all of them, e.g. as an array:
SELECT i.vehicle,
array_agg(i.costcenter) as costcenters,
array_agg(i.department) as departments,
SUM(i.quantity) AS liters,
SUM(i.totalcost) AS Totalcost,
v.model,
v.vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates >= date '2019-03-01'
and i.date < date '2019-04-01'
AND i.deleted_at IS NULL
group by i.vehicle, v.model, v.vtype;
Instead of an array, you could also return a comma separated lists of those values, e.g. string_agg(i.costcenter, ',') as costcenters.
Adding the columns v.model and v.vtype won't (shouldn't) change anything as the group by i.vehicle will only return a single vehicle anyway and thus the model and vtype won't change for that in the group.
Note that I removed the useless aliases and replaced the condition on the date with a proper range condition that can make use of an index on the dates column.
Edit
Based on your new sample data, you want a running total, rather than a "regular" aggregation. This can easily be done using window functions
SELECT i.vehicle,
i.costcenter,
i.department,
SUM(i.quantity) over (w) AS liters,
SUM(i.totalcost) over (w) AS Totalcost,
v.model,
v.vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates >= date '2019-01-01'
and i.dates < date '2019-02-01'
AND i.deleted_at IS NULL
window w as (partition by i.vehicle order by i.dates)
order by i.vehicle, i.dates;
I would not create those "total" lines using SQL, but rather in your front end that display the data.
Online example: https://rextester.com/CRJZ27446

You need to use a nested query to get those SUM you want inside that query.
SELECT i.vehicle AS vehicle,
i.costcenter AS costCenter,
i.department AS department,
(SELECT SUM(i.quantity) FROM TABLES WHERE CONDITIONS GROUP BY vehicle) AS liters,
(SELECT SUM(i.totalcost) FROM TABLES WHERE CONDITIONS GROUP BY vehicle) AS Totalcost,
v.model AS model,
v.vtype AS vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates::text LIKE '%2019-03%' AND i.deleted_at IS NULL;

How can I evaluate data over time in Postgresql?

I need to find users who have posted three times or more, three months in a row. I wrote this query:
select count(id), owneruserid, extract(month from creationdate) as postmonth from posts
group by owneruserid, postmonth
having count(id) >=3
order by owneruserid, postmonth
And I get this:
count owneruserid postmonth
36 -1 1
23 -1 2
45 -1 3
41 -1 4
18 -1 5
24 -1 6
31 -1 7
78 -1 8
83 -1 9
17 -1 10
88 -1 11
127 -1 12
3 6 11
3 7 12
4 8 1
8 8 12
4 12 4
3 12 5
3 22 2
4 22 4
(truncated)
Which is great. How can I query for users who posted three times or more, three months or more in a row? Thanks.

This is called the Islands and Gaps problem, specifically it's an Island problem with a date range. You should,
Fix this question up.
Flag it to be sent to dba.stackexchange.com
To solve this,
Create a pseudo column with a window that has 1 if the row preceding it does not correspond to the preceding mont
Create groups out of that with COUNT()
Check to make sure the count(*) for the group is greater than or equal to three.
Query,
SELECT l.id, creationdaterange, count(*)
FROM (
SELECT t.id,
t.creationdate,
count(range_reset) OVER (PARTITION BY t.id ORDER BY creationdate) AS creationdaterange
FROM (
SELECT id,
creationdate,
CASE
WHEN date_trunc('month',creationdate::date)::date - interval '1 month' = date_trunc('month',lag(creationdate))::date OVER (PARTITION BY id ORDER BY creationdate)
THEN 1
END AS range_reset
FROM post
ORDER BY id, creationdate
) AS t;
) AS l
GROUP BY t.id, creationdaterange
HAVING count(*) >= 3;

Retrieve information dynamically from multiple CTE

I have multiple CTEs and I want to retrieve some information from a couple of them into next CTE.
So, I have this information from one of the CTEs:
PeriodID StarDate
1 2006-01-01
2 2007-04-25
3 2008-08-16
4 2009-12-08
5 2011-04-017
and this from other:
RecordID Date
100 2007-04-15
101 2008-05-21
102 2008-06-06
103 2008-07-01
104 2009-11-12
And I need to show in next one:
RecordID Date PeriodID
100 2007-04-15 1
101 2008-05-21 2
102 2008-06-06 2
103 2008-07-01 2
104 2009-11-12 3
I can use some case/when statement to define if date of record is in period 1,2,3,4 or 5 but it some situation I can have different numbers of periods return from the first CTE.
Is there a way to do this in the above context?

You can have multiple CTEs defined as follows, and then select from and join them as you would any other table.
with cte1 as (select * ...),
cte2 as (select * ...)
select
cte2.*,
periodid
from cte2
cross apply
(select top 1 * from cte1 where cte2.recorddate> cte1.startdate order by startdate desc) v

T-SQL A problem with SELECT TOP (case [...])

I have query like that:
(as You see I'd like to retrieve 50% of total rows or first 100 rows etc)
//#AllRowsSelectType is INT
SELECT TOP (
case #AllRowsSelectType
when 1 then 100 PERCENT
when 2 then 50 PERCENT
when 3 then 25 PERCENT
when 4 then 33 PERCENT
when 5 then 50
when 6 then 100
when 7 then 200
end
) ROW_NUMBER() OVER(ORDER BY [id]) AS row_num, a,b,c etc
why have I the error : "Incorrect syntax near the keyword 'PERCENT'." on line "when 1 [...]"

The syntax for TOP is:
TOP (expression) [PERCENT]
[ WITH TIES ]
The reserved keyword PERCENT cannot be included in the expression. Instead you can run two different queries: one for when you want PERCENT and another for when you don't.
If you need this to be one query you can run both queries and use UNION ALL to combine the results:
SELECT TOP (
CASE #AllRowsSelectType
WHEN 1 THEN 100
WHEN 2 THEN 50
WHEN 3 THEN 25
WHEN 4 THEN 33
ELSE 0
END) PERCENT
ROW_NUMBER() OVER(ORDER BY [id]) AS row_num, a, b, c, ...
UNION ALL
SELECT TOP (
CASE #AllRowsSelectType
WHEN 5 THEN 50
WHEN 6 THEN 100
WHEN 7 THEN 200
ELSE 0
END)
ROW_NUMBER() OVER(ORDER BY [id]) AS row_num, a, b, c, ...

You're also mixing two different types of use. The other is.
DECLARE #ROW_LIMT int
IF #AllRowsSelectType < 5
SELECT #ROW_LIMIT = COUNT(*)/#AllRowsSelectType FROM myTable -- 100%, 50%, 33%, 25%
ELSE
SELECT #ROW_LIMIT = 50 * POWER(2, #AllRowsSelectType - 5) -- 50, 100, 200...
WITH OrderedMyTable
(
select *, ROW_NUMBER() OVER (ORDER BY id) as rowNum
FROM myTable
)
SELECT * FROM OrderedMyTable
WHERE rowNum <= #ROW_LIMIT

You could do:
select top (CASE #FilterType WHEN 2 THEN 50 WHEN 3 THEN 25 WHEN 4 THEN 33 ELSE 100 END) percent * from
(select top (CASE #FilterType WHEN 5 THEN 50 WHEN 6 THEN 100 WHEN 7 THEN 200 ELSE 2147483647 END) * from
<your query here>
) t
) t
Which may be easier to read.

Extract Unique Time Slices in Oracle

I use Oracle 10g and I have a table that stores a snapshot of data on a person for a given day. Every night an outside process adds new rows to the table for any person whose had any changes to their core data (stored elsewhere). This allows a query to be written using a date to find out what a person 'looked' like on some past day. A new row is added to the table even if only a single aspect of the person has changed--the implication being that many columns have duplicate values from slice to slice since not every detail changed in each snapshot.
Below is a data sample:
SliceID PersonID StartDt Detail1 Detail2 Detail3 Detail4 ...
1 101 08/20/09 Red Vanilla N 23
2 101 08/31/09 Orange Chocolate N 23
3 101 09/15/09 Yellow Chocolate Y 24
4 101 09/16/09 Green Chocolate N 24
5 102 01/10/09 Blue Lemon N 36
6 102 01/11/09 Indigo Lemon N 36
7 102 02/02/09 Violet Lemon Y 36
8 103 07/07/09 Red Orange N 12
9 104 01/31/09 Orange Orange N 12
10 104 10/20/09 Yellow Orange N 13
I need to write a query that pulls out time slices records where some pertinent bits, not the whole record, have changed. So, referring to the above, if I only want to know the slices in which Detail3 has changed from its previous value, then I would expect to only get rows having SliceID 1, 3 and 4 for PersonID 101 and SliceID 5 and 7 for PersonID 102 and SliceID 8 for PersonID 103 and SliceID 9 for PersonID 104.
I'm thinking I should be able to use some sort of Oracle Hierarchical Query (using CONNECT BY [PRIOR]) to get what I want, but I have not figured out how to write it yet. Perhaps YOU can help.
Thanks you for your time and consideration.

Here is my take on the LAG() solution, which is basically the same as that of egorius, but I show my workings ;)
SQL> select * from
2 (
3 select sliceid
4 , personid
5 , startdt
6 , detail3 as new_detail3
7 , lag(detail3) over (partition by personid
8 order by startdt) prev_detail3
9 from some_table
10 )
11 where prev_detail3 is null
12 or ( prev_detail3 != new_detail3 )
13 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
8 103 07-JUL-09 N
9 104 31-JAN-09 N
7 rows selected.
SQL>
The point about this solution is that it hauls in results for 103 and 104, who don't have slice records where detail3 has changed. If that is a problem we can apply an additional filtration, to return only rows with changes:
SQL> with subq as (
2 select t.*
3 , row_number () over (partition by personid
4 order by sliceid ) rn
5 from
6 (
7 select sliceid
8 , personid
9 , startdt
10 , detail3 as new_detail3
11 , lag(detail3) over (partition by personid
12 order by startdt) prev_detail3
13 from some_table
14 ) t
15 where t.prev_detail3 is null
16 or ( t.prev_detail3 != t.new_detail3 )
17 )
18 select sliceid
19 , personid
20 , startdt
21 , new_detail3
22 , prev_detail3
23 from subq sq
24 where exists ( select null from subq x
25 where x.personid = sq.personid
26 and x.rn > 1 )
27 order by sliceid
28 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
SQL>
edit
As egorius points out in the comments, the OP does want hits for all users, even if they haven't changed, so the first version of the query is the correct solution.

In addition to OMG Ponies' answer: if you need to query slices for all persons, you'll need partition by:
SELECT s.sliceid
, s.personid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (
PARTITION BY t.personid ORDER BY t.startdt
) prev_val
FROM t) s
WHERE (s.prev_val IS NULL OR s.prev_val != s.detail3)

I think you'll have better luck with the LAG function:
SELECT s.sliceid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t) s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)
Subquery Factoring alternative:
WITH slices AS (
SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t)
SELECT s.sliceid
FROM slices s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)