Select multiple values by category in T-SQL

I have the following table, with over 70K records:
test_1:
ClientID Category
22 Stress
22 Alcohol
22 Scizo
23 Stress
23 Alcohol
24 Stress
24 Scizo
25 Bi Polar
25 Cocaine
25 Meth
26 Stress
I need to SELECT only those ClientIDs that have both Category = 'Stress' and Category = 'Alcohol' within the same ClientID.
So I expect ClientIDs 22 and 23 in my output.
(ClientID 24 has only 'Stress' and no 'Alcohol'; the same goes for ClientID 26, and ClientID 25 has neither 'Stress' nor 'Alcohol'. That means 24, 25, and 26 shouldn't be selected.)
With this simple code my result includes ClientIDs 22, 23, 24, and 26, where 'Stress' appears without 'Alcohol' in the last two IDs.
SELECT
[ClientID]
,[Category]
FROM
[WH].[dbo].[Test_1]
WHERE
(0=0)
and (Category = 'Stress' or Category = 'Alcohol')
If I write my WHERE statement with AND
WHERE
(0=0)
and (Category = 'Stress' AND Category = 'Alcohol')
then I have no records displayed
Please HELP!
UPD -
Question answered (see below)
Also, if I want to see the actual categories (not just the IDs) in my query, I do the following:
SELECT
m.[ClientID]
,m.[Category]
FROM
[WH].[dbo].[Test_1] m
INNER JOIN
(
SELECT
[ClientID]
FROM
[WH].[dbo].[Test_1]
WHERE
[Category] IN ('Stress', 'Alcohol')
GROUP BY
[ClientID]
HAVING COUNT(DISTINCT Category) = 2
) cte ON m.ClientID = cte.ClientID
I get the following result:
ClientID Category
22 Stress
22 Alcohol
22 Scizo
23 Stress
23 Alcohol

The problem with your current approach is that the WHERE clause evaluates its conditions one record at a time, and a single row can never have Category equal to both 'Stress' and 'Alcohol', which is why the AND version returns nothing. Instead, you want to perform the category check across multiple records for the same ClientID. One approach uses aggregation:
SELECT ClientID
FROM [WH].[dbo].[Test_1]
WHERE Category IN ('Stress', 'Alcohol')
GROUP BY ClientID
HAVING COUNT(DISTINCT Category) = 2;
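
An equivalent way to express the same requirement, if you prefer set operators (just a sketch of an alternative; the aggregation above works fine):
SELECT ClientID
FROM [WH].[dbo].[Test_1]
WHERE Category = 'Stress'
INTERSECT
SELECT ClientID
FROM [WH].[dbo].[Test_1]
WHERE Category = 'Alcohol';
INTERSECT returns the distinct ClientIDs that appear in both sets, which is exactly the "has both categories" condition.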

Related

forward rolling sum with different stopping points by row

First, some sample data so the business problem can be explained -
select
ItemID = 276,
Quantity,
Bucket,
DaysInMonth = day(eomonth(Bucket)),
DailyQuantity = cast(Quantity * 1.0 / day(eomonth(Bucket)) as decimal(4, 0)),
DaysFactor
into #data
from
(
values
('1/1/2021', 95, 5500),
('2/1/2021', 75, 6000),
('3/1/2021', 80, 5000),
('4/1/2021', 82, 5300),
('5/1/2021', 90, 5200),
('6/1/2021', 80, 6500),
('7/1/2021', 85, 6100),
('8/1/2021', 90, 5100),
('9/1/2021', null, 5800),
('10/1/2021', null, 5900)
) d (Bucket, DaysFactor, Quantity);
select * from #data;
Now, the business problem -
The first row has a DaysFactor of 95.
The forward rolling sum for this row is calculated as
(31 x 177) + (28 x 214) + (31 x 161) + (5 x 177) = 17,355
That is...
the daily quantity for all 31 days of the 1/1/2021 bucket plus
the daily quantity for all 28 days of the 2/1/2021 bucket plus
the daily quantity for all 31 days of the 3/1/2021 bucket plus
the daily quantity for 5 days of the 4/1/2021 bucket.
This results in 95 days of forward looking quantity.
95 days = 31 + 28 + 31 + 5
For the second row, with a DaysFactor of 75, it would start with daily quantity for the 28 days in the 2/1/2021 bucket and go out until a total of 75 days' worth of quantity were summed, like so:
(28 x 214) + (31 x 161) + (16 x 177) = 13,815
75 days = 28 + 31 + 16
One approach to this is building a calendar of daily demand and then summing quantity over the specified days. However, I'm stuck on how to do the summing. Here is the code that builds the calendar with daily quantities:
with
dates as
(
select
FirstDay = min(cast(Bucket as date)),
LastDay = eomonth(max(cast(Bucket as date)))
from #data
),
tally as (
select top (select datediff(d, FirstDay, LastDay) + 1 from dates) --restrict to number of rows equal to number of days between first and last days
n = row_number() over(order by (select null)) - 1
from sys.messages
),
calendar as (
select
Bucket = dateadd(d, t.n, d.FirstDay)
from tally t
cross join dates d
)
select
c.Bucket,
d.DailyQuantity
from #data d
inner join calendar c
on year(d.Bucket) = year(c.Bucket)
and month(d.Bucket) = month(c.Bucket);
Here's a screenshot of a subset of rows from this query:
I was hoping to use T-SQL's LEAD() to do this but don't see a way to put the DaysFactor into the ROWS clause within OVER(). Is there a way to do that? If not, is there a set-based approach to calculating the forward rolling sum?
Expected result set:
Figured it out using an approach different from LEAD(). This column was added to #data:
BucketEnd = cast(dateadd(d, DaysFactor - 1, Bucket) as date)
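The modified script isn't shown in the post, but presumably the #data build from the question just gained that extra column, roughly like this (a sketch combining the original sample-data script with the new column):
select
    ItemID = 276,
    Quantity,
    Bucket,
    DaysInMonth = day(eomonth(Bucket)),
    DailyQuantity = cast(Quantity * 1.0 / day(eomonth(Bucket)) as decimal(4, 0)),
    DaysFactor,
    BucketEnd = cast(dateadd(d, DaysFactor - 1, Bucket) as date) -- the new column
into #data
from
(
    values
    ('1/1/2021', 95, 5500),
    ('2/1/2021', 75, 6000),
    ('3/1/2021', 80, 5000),
    ('4/1/2021', 82, 5300),
    ('5/1/2021', 90, 5200),
    ('6/1/2021', 80, 6500),
    ('7/1/2021', 85, 6100),
    ('8/1/2021', 90, 5100),
    ('9/1/2021', null, 5800),
    ('10/1/2021', null, 5900)
) d (Bucket, DaysFactor, Quantity);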
Then the code that builds the calendar with daily quantities, shown in the original question, was put into a temp table called #calendar.
Then this query performs the calculations:
select
d.ItemID,
d.Bucket,
RollingForwardQuantitySum = sum(iif(c.Bucket between d.Bucket and d.BucketEnd, c.DailyQuantity, null))
from #data d
cross join #calendar c
group by
d.ItemID,
d.Bucket
order by
d.ItemID,
cast(d.Bucket as date);
The output from this query matches the expected result set screenshot in the original post.
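A possible variant of the same idea (an untested sketch, not part of the original answer): instead of a full CROSS JOIN plus IIF, join on the date range directly so only the relevant calendar rows reach the aggregate. A LEFT JOIN keeps buckets whose DaysFactor is NULL in the output, matching the CROSS JOIN behaviour.
select
    d.ItemID,
    d.Bucket,
    RollingForwardQuantitySum = sum(c.DailyQuantity)
from #data d
left join #calendar c
    on c.Bucket between d.Bucket and d.BucketEnd -- same range check as the IIF above
group by
    d.ItemID,
    d.Bucket
order by
    d.ItemID,
    cast(d.Bucket as date);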

Book "The Art of PostgreSQL" self-study: what does the lateral join mean?

I guess many people in this tag have bought the book "The Art of PostgreSQL"; the book's content is the context for this question. I am learning from the book at my own pace. I ran into some problems, so I wrote an email to the author and am also asking the question here.
On page 47:
I totally don't understand what line 27, limit :n, means.
I also don't know what line 34, ss(name, albumid, count) on true, means.
I kind of get some part of it, but I am still not sure:
what does a LATERAL JOIN do?
-- name: genre-top-n
2 -- Get the N top tracks by genre
3 select genre.name as genre,
4 case when length(ss.name) > 15
5 then substring(ss.name from 1 for 15) || '…'
6 else ss.name
7 end as track,
8 artist.name as artist
9 from genre
10 left join lateral
11 /*
12 * the lateral left join implements a nested loop over
13 * the genres and allows to fetch our Top-N tracks per
14 * genre, applying the order by desc limit n clause.
15 *
16 * here we choose to weight the tracks by how many
17 * times they appear in a playlist, so we join against
18 * the playlisttrack table and count appearances.
19 */
20 (
21 select track.name, track.albumid, count(playlistid)
22 from track
23 left join playlisttrack using (trackid)
24 where track.genreid = genre.genreid
25 group by track.trackid
26 order by count desc
27 limit :n
28 )
29 /*
30 * the join happens in the subquery's where clause, so
31 * we don't need to add another one at the outer join
32 * level, hence the "on true" spelling.
33 */
34 ss(name, albumid, count) on true
35 join album using(albumid)
36 join artist using(artistid)
37 order by genre.name, ss.count desc;
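
For what it's worth (this is not from the book, just a restatement of what the inline comments say): LATERAL lets the subquery refer to genre.genreid from the outer FROM list, so conceptually the subquery is re-run once per genre row, like a nested loop. limit :n is a named query parameter (the N in "top N per genre") that gets bound when the query is executed. ss(name, albumid, count) simply gives the subquery the alias ss and names its three columns, so the outer query can say ss.name and ss.count. ON TRUE is there because a LEFT JOIN requires a join condition, but the real correlation already happens in the subquery's WHERE clause. For a single genre, the lateral subquery behaves like this standalone top-N query (a sketch with made-up values: genreid = 1 and :n bound to 5):
select track.name, track.albumid, count(playlistid)
from track
left join playlisttrack using (trackid)
where track.genreid = 1 -- the lateral correlation supplies this value per genre
group by track.trackid
order by count desc
limit 5;                -- :n is bound to a number like this at run time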

Postgres: Nested records in a Recursive query in depth first manner

I am working on a simple comment system where a user can comment on other comments, thus creating a hierarchy. To get the comments in hierarchical order I am using a recursive Common Table Expression in Postgres.
Below are the fields and the query used:
id
user_id
parent_comment_id
message
WITH RECURSIVE CommentCTE AS (
SELECT id, parent_comment_id, user_id
FROM comment
WHERE parent_comment_id is NULL
UNION ALL
SELECT child.id, child.parent_comment_id, child.user_id
FROM comment child
JOIN CommentCTE
ON child.parent_comment_id = CommentCTE.id
)
SELECT * FROM CommentCTE
The above query returns records in a breadth-first manner:
id parent_comment_id user_id
10 null 30
9 null 30
11 9 30
14 10 31
15 10 31
12 11 30
13 12 31
But can it be modified to achieve something like the result below, where the records for each comment thread are returned together, in a depth-first manner? The point of getting the data this way is to make rendering on the front end smoother.
id parent_comment_id user_id
9 null 30
11 9 30
12 11 30
13 12 31
10 null 30
14 10 31
15 10 31
Generally I solve this problem by synthesising a "Path" column which can be sorted lexically, e.g. 0001:0003:0006:0009 is a child of 0001:0003:0006. Each child entry can be created by concatenating the path element to the parent's path. You don't have to return this column to the client, just use it for sorting.
id parent_comment_id user_id sort_key
9 null 30 0009
11 9 30 0009:0011
12 11 30 0009:0011:0012
13 12 31 0009:0011:0012:0013
10 null 30 0010
14 10 31 0010:0014
15 10 31 0010:0015
The path element doesn't have to be anything in particular provided it sorts lexically in the order you want children at that level to sort, and is unique at that level. Basing it on an auto-incrementing ID is fine.
Using a fixed length path element is not strictly speaking necessary but makes it easier to reason about.
WITH RECURSIVE CommentCTE AS (
SELECT id, parent_comment_id, user_id,
lpad(id::text, 4, '0') sort_key -- pad with '0' so the keys match the example above and sort lexically
FROM comment
WHERE parent_comment_id is NULL
UNION ALL
SELECT child.id, child.parent_comment_id, child.user_id,
concat(CommentCTE.sort_key, ':', lpad(child.id::text, 4, '0')) -- qualify as child.id: a bare "id" is ambiguous here
FROM comment child
JOIN CommentCTE
ON child.parent_comment_id = CommentCTE.id
)
SELECT * FROM CommentCTE order by sort_key

Find max value in a group in FileMaker

How do I select only the max values per group in the following set?
id productid price year
---------------------------
1 11 0,10 2015
2 11 0,12 2016
3 11 0,11 2017
4 22 0,08 2016
5 33 0,02 2016
6 33 0,01 2017
Expected result for each productid and max year would be
id productid price year
---------------------------
3 11 0,11 2017
4 22 0,08 2016
6 33 0,01 2017
This works for me.
ExecuteSQL (
"SELECT t.id, t.productid, t.price, t.\"year\"
FROM test t
WHERE \"year\" =
(SELECT MAX(\"year\") FROM test tt WHERE t.productid = tt.productid)"
; " " ; "")
Adapted from this answer:
https://stackoverflow.com/a/21310671/832407
A simple SQL query will give you the last year for every product record:
ExecuteSQL (
"SELECT productid, MAX ( \"year\")
FROM myTable
GROUP By productid";
"";"" )
Getting the price for that year is going to be trickier, as FileMaker SQL does not fully support subqueries or temp tables.
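If the correlated subquery ever becomes a problem, the classic join-back version of the same groupwise-max pattern is another option. In plain SQL it looks like the sketch below; whether FileMaker's ExecuteSQL accepts a derived table in the FROM clause like this is something you would have to test.
SELECT t.id, t.productid, t.price, t."year"
FROM test t
JOIN (
    SELECT productid, MAX("year") AS maxyear -- last year per product
    FROM test
    GROUP BY productid
) m ON m.productid = t.productid
   AND m.maxyear = t."year"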

SQL select statement filter

I am struggling with a filter for clients in our system. Each client has a plan that is carried out monthly. For each plan there can be multiple visits, and for each visit there can be different visit tasks, with each task falling under a category, e.g.
ClientNo VisitNo VisitTaskID TaskCategory
------------------------------------------
900001 100 19 P
900001 100 18 P
900001 100 01 H
900001 105 21 P
900001 105 19 P
900001 105 16 C
I want to do a count for clients who receive only VisitTaskID 19 for TaskCategory 'P'. I tried using the query below but it will not filter out the other VisitTasks under category P
SELECT COUNT (ClientNo)
FROM Tasks
WHERE VisitTask NOT IN (02,03....18,20,21)
The result still counts clients with VisitTaskIDs that I thought I was filtering out.
Each VisitTaskID is unique no matter what category it falls under.
Any help is much appreciated.
Clients who only have task 19 within category 'P':
SELECT COUNT(DISTINCT t.ClientNo)          -- count each client once
FROM Tasks t
WHERE t.VisitTask = 19 AND t.TaskCategory = 'P'
AND NOT EXISTS (SELECT 1 FROM Tasks x      -- and no other category 'P' task for that client
                WHERE x.ClientNo = t.ClientNo
                AND x.VisitTask <> 19 AND x.TaskCategory = 'P')
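An equivalent aggregation-based formulation (just a sketch of an alternative to NOT EXISTS): group each client's category 'P' rows and keep only the groups where every VisitTask is 19.
SELECT COUNT(*) AS ClientCount
FROM (
    SELECT ClientNo
    FROM Tasks
    WHERE TaskCategory = 'P'
    GROUP BY ClientNo
    -- no category 'P' task other than 19 ...
    HAVING SUM(CASE WHEN VisitTask <> 19 THEN 1 ELSE 0 END) = 0
    -- ... and at least one task 19
       AND SUM(CASE WHEN VisitTask = 19 THEN 1 ELSE 0 END) > 0
) only19;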