Subselect and Max - tsql

Alright, I've been trying to conceptualize this for the better part of the afternoon and still cannot figure out how to structure this subselect.
The data I need to report are ages for a given student major, grouped by the past 3 fiscal years. Each fiscal year has 3 semesters (summer, fall, spring). I need my query grouped on the FiscalYear and AgeRange fields, counting the distinct student IDs.
I currently have this for my SQL statement:
Select COUNT(distinct StuID), AgeRange, FiscalYear
from tblStatic
where Campus like 'World%' and (enrl_act like 'REG%' or enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by FiscalYear, AgeRange
order by FiscalYear, AgeRange
This works, except that the totals don't match my headcount of students for the fiscal year. The reason is that a student may cross from one age range into the next during the fiscal year and is then counted twice.
How can I use a subselect to remove these duplicates? I have been trying to use my semester field, taking the max semester within a fiscal year for a given student.
Data Sample:
Count AgeRange FiscalYear
3 1 to 19 09/10
20 20 to 23 09/10
60 24 to 29 09/10
96 30 to 39 09/10
34 40 to 49 09/10
14 50 to 59 09/10
3 60+ 09/10
2 1 to 19 10/11
24 20 to 23 10/11
73 24 to 29 10/11
109 30 to 39 10/11
43 40 to 49 10/11
11 50 to 59 10/11
2 60+ 10/11
1 1 to 19 11/12
17 20 to 23 11/12
75 24 to 29 11/12
123 30 to 39 11/12
44 40 to 49 11/12
14 50 to 59 11/12
2 60+ 11/12
Solution: (Just got this working, and it produces headcounts that match what they are supposed to be)
Select COUNT(distinct S.StuID), AR.AgeRange, S.FiscalYear
from tblStatic S
INNER JOIN
( Select S.StuID, MIN(AgeRange) as AgeRange
From tblStatic S
Group By S.StuID) AR on S.StuID=AR.StuID
where Campus like 'World%' and (enrl_act like 'REG%' or
enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by S.FiscalYear, AR.AgeRange
order by S.FiscalYear, AR.AgeRange

Replace each student's age range with its maximum (or minimum, if you like) age range that fiscal year, then count them:
;
WITH sourceData AS (
SELECT
StuID,
-- one age range per student per fiscal year
MaxAgeRangeThisFiscalYear = MAX(AgeRange) OVER
(PARTITION BY StuID, FiscalYear),
FiscalYear
FROM tblStatic
WHERE Campus LIKE 'World%'
AND (enrl_act LIKE 'REG%' OR enrl_act LIKE 'SCH%')
AND StuMaj = 'LAWSC'
AND FiscalYear IN ('09/10', '10/11', '11/12')
)
SELECT
FiscalYear,
AgeRange = MaxAgeRangeThisFiscalYear,
Count = COUNT(DISTINCT StuID)
FROM sourceData
GROUP BY
FiscalYear,
MaxAgeRangeThisFiscalYear
ORDER BY
FiscalYear,
MaxAgeRangeThisFiscalYear
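To see why pinning each student to a single age range per fiscal year fixes the headcount, here is a minimal sketch using Python's standard-library sqlite3 module. The rows are made up for illustration, the Campus/enrl_act/StuMaj filters are left out, and a grouped subselect stands in for the MAX(...) OVER window, so this is SQLite rather than T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblStatic (StuID INTEGER, AgeRange TEXT, FiscalYear TEXT, Semester TEXT)"
)
# Hypothetical rows: student 1 moves from '20 to 23' to '24 to 29' during 09/10.
conn.executemany(
    "INSERT INTO tblStatic VALUES (?, ?, ?, ?)",
    [
        (1, "20 to 23", "09/10", "summer"),
        (1, "24 to 29", "09/10", "spring"),
        (2, "24 to 29", "09/10", "fall"),
        (3, "30 to 39", "09/10", "fall"),
    ],
)

# Original approach: student 1 shows up in two age ranges.
naive = conn.execute(
    """
    SELECT FiscalYear, AgeRange, COUNT(DISTINCT StuID)
    FROM tblStatic
    GROUP BY FiscalYear, AgeRange
    ORDER BY FiscalYear, AgeRange
    """
).fetchall()

# Subselect approach: reduce to one row per (student, fiscal year) first.
fixed = conn.execute(
    """
    SELECT FiscalYear, AgeRange, COUNT(*)
    FROM (
        SELECT StuID, FiscalYear, MAX(AgeRange) AS AgeRange
        FROM tblStatic
        GROUP BY StuID, FiscalYear
    ) AS per_student
    GROUP BY FiscalYear, AgeRange
    ORDER BY FiscalYear, AgeRange
    """
).fetchall()

print(naive)  # counts add up to 4 although there are only 3 students
print(fixed)  # counts add up to 3
```

MAX on the AgeRange text only works here because the labels happen to sort in age order ('1 to 19' < '20 to 23' < ... < '60+').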

Related

Trying to partition to remove rows where two columns don't match sql

How can I filter out rows within a group that do not have matching values in two columns?
I have a table A like:
CODE  US_ID  US_PRICE  NON_US_ID  NON_US_PRICE
5109  57     10        75         10
0206  85     11        58         11
0206  85     15        33         14
0206  85     41        22         70
T100  20     10        49         NULL
T100  20     38        64         38
Within each CODE group, I want to check whether US_PRICE = NON_US_PRICE and remove that row from the resulting table.
I tried:
SELECT *,
CASE WHEN US_PRICE != NON_US_PRICE OVER (PARTITION BY CODE) END
FROM A;
but I think I am missing something when I try to partition by CODE.
I want the resulting table to look like
CODE  US_ID  US_PRICE  NON_US_ID  NON_US_PRICE
0206  85     15        33         14
0206  85     41        22         70
T100  20     10        49         NULL
For the provided sample, a simple WHERE clause produces that result:
SELECT *
FROM A
WHERE US_PRICE IS DISTINCT FROM NON_US_PRICE;
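Unlike the != operator, IS DISTINCT FROM treats NULL as a comparable value, so the row where NON_US_PRICE is NULL is kept instead of being silently dropped.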
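The difference can be checked with a small Python sqlite3 sketch over the sample table from the question. One assumption to note: SQLite's IS NOT operator is used here as a stand-in for PostgreSQL's IS DISTINCT FROM, since it has the same NULL-safe behaviour and is available in older SQLite builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE A (CODE TEXT, US_ID INTEGER, US_PRICE INTEGER, "
    "NON_US_ID INTEGER, NON_US_PRICE INTEGER)"
)
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?, ?, ?)",
    [
        ("5109", 57, 10, 75, 10),
        ("0206", 85, 11, 58, 11),
        ("0206", 85, 15, 33, 14),
        ("0206", 85, 41, 22, 70),
        ("T100", 20, 10, 49, None),
        ("T100", 20, 38, 64, 38),
    ],
)

# Plain inequality: 10 != NULL evaluates to NULL, so that row is filtered out.
with_ne = conn.execute(
    "SELECT CODE, US_PRICE, NON_US_PRICE FROM A "
    "WHERE US_PRICE != NON_US_PRICE ORDER BY rowid"
).fetchall()

# NULL-safe inequality (SQLite spelling of IS DISTINCT FROM).
null_safe = conn.execute(
    "SELECT CODE, US_PRICE, NON_US_PRICE FROM A "
    "WHERE US_PRICE IS NOT NON_US_PRICE ORDER BY rowid"
).fetchall()

print(with_ne)    # two rows, the T100 row with the NULL price is missing
print(null_safe)  # three rows, matching the desired output
```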

PostgreSQL Lag Function between of Two Tables

I am new to PostgreSQL/Python, so please bear with me!
Assume we have two tables:
an item table with itemid, price, time.
a user table with columns userid, itemid, timecreated, quantity, firstprice, lastprice, difference.
Example tables:
item table:
itemid price time
RBK 92 1546408800
LBV 51 1546408800
ZBT 49 1546408800
GLS 22 1546408800
DBC 17 1546408800
RBK 91 1546495200
LBV 55 1546495200
ZBT 51 1546495200
GLS 24 1546495200
DBC 28 1546581600
RBK 108 1546581600
LBV 46 1546581600
ZBT 49 1546581600
GLS 21 1546581600
DBC 107 1546581600
All the values in the item table come from an API.
And the user table:
userid itemid timecreated quantity firstprice currentprice difference
1 RBK 1546408800 20
2 RBK 1546408800 15
3 RBK 1546408800 35
3 GLS 1546408800 101
3 DBC 1546495200 140
1 RBV 1546495200 141
2 RBK 1546495200 25
2 RBV 1546581600 31
The user table is a Django-based table where a user can register/add new items to follow their prices.
My struggle is accessing the item table to fetch the first price, i.e. the price with the same timestamp. In this example, firstprice for userid 1 and RBK (1546408800) should be filled with 92.
I tried a trick with lag in PostgreSQL, but it does not seem to work:
update user
set firstprice = tt.prev_price
from (select item.*,
lag(price) over (partition by itemid order by time) as prev_price
from item
) tt
where tt.id = item.id and
tt.prev_close is distinct from item.price;
I can get the current price from the API, but I haven't found a way to fill firstprice from the item table. I will be creating a trigger for this query. I searched Google and Stack Overflow but couldn't find anything that helped. Thanks in advance.
I can suggest the following approach (it may not be the fastest):
update "user" set firstprice = (
select price from "item" i
where i.itemid = "user".itemid and i.time >= "user".timecreated order by i.time limit 1
);
It calculates firstprice with a correlated subquery: for each user row, it takes the earliest item price recorded at or after timecreated.
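A rough way to try the correlated-subquery update without a PostgreSQL server is the Python sqlite3 sketch below. It uses a subset of the rows from the question and only the columns the update touches; SQLite accepts the same statement, but that is an assumption about portability, not a claim about PostgreSQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "item" (itemid TEXT, price INTEGER, time INTEGER)')
conn.execute(
    'CREATE TABLE "user" (userid INTEGER, itemid TEXT, timecreated INTEGER, '
    "quantity INTEGER, firstprice INTEGER)"
)
conn.executemany(
    'INSERT INTO "item" VALUES (?, ?, ?)',
    [
        ("RBK", 92, 1546408800),
        ("GLS", 22, 1546408800),
        ("RBK", 91, 1546495200),
        ("RBK", 108, 1546581600),
    ],
)
conn.executemany(
    'INSERT INTO "user" (userid, itemid, timecreated, quantity) VALUES (?, ?, ?, ?)',
    [
        (1, "RBK", 1546408800, 20),
        (2, "RBK", 1546495200, 25),
        (3, "GLS", 1546408800, 101),
        (1, "RBV", 1546495200, 141),  # no such item, so firstprice stays NULL
    ],
)

# For each user row, take the earliest item price at or after timecreated.
conn.execute(
    """
    UPDATE "user" SET firstprice = (
        SELECT price FROM "item" i
        WHERE i.itemid = "user".itemid AND i.time >= "user".timecreated
        ORDER BY i.time LIMIT 1
    )
    """
)

first_prices = conn.execute(
    'SELECT userid, itemid, firstprice FROM "user" ORDER BY rowid'
).fetchall()
print(first_prices)
```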

PostgreSQL : comparing two sets of results does not work

I have a table that contains 3 columns of ids, clothes, shoes, customers and relates them.
I have a query that works fine :
select clothes, shoes from table where customers = 101 (all clothes and shoes of customer 101). This returns
clothes - shoes (SET A)
1 6
1 2
33 12
24 null
Another query that works fine :
select clothes ,shoes from table
where customers in
(select customers from table where clothes = 1 and customers <> 101 ) (all clothes and shoes of any other customer than 101, with specified clothes). This returns
shoes - clothes(SET B)
6 null
null 24
1 1
2 1
12 null
null 26
14 null
Now I want to get all clothes and shoes from SET A that are not in SET B.
So (example) select from SET A where NOT IN SET B. This should return just clothes 33, right?
I try to convert this to a working query :
select clothes, shoes from table where customers = 101
and
(clothes,shoes) not in
(
select clothes,shoes from
table where customers in
(select customers from table where clothes = 1 and customers <> 101 )
) ;
I tried different syntaxes, but the one above looks the most logical.
Problem is I never get clothes 33, just an empty set.
How do I fix this? What goes wrong?
Thanks
Edit , here is the contents of the table
id shoes customers clothes
1 1 1 1
2 1 4 1
3 1 5 1
4 2 2 2
5 2 3 1
6 1 3 1
44 2 101 1
46 6 101 1
49 12 101 33
51 13 102
52 101 24
59 107 51
60 107 24
62 23 108 51
63 23 108 2
93 124 25
95 6 125
98 127 25
100 3 128
103 24 131
104 25 132
105 102 28
106 10 102
107 23 133
108 4 26
109 6 4
110 4 24
111 12 4
112 14 4
116 102 48
117 102 24
118 102 25
119 102 26
120 102 29
122 134 31
The except clause in PostgreSQL works the way the minus operator does in Oracle. I think that will give you what you want.
I think your query is notionally right, but I suspect those pesky nulls are affecting your results. Just as a null is neither equal nor not equal to 5 (it is unknown, so the comparison is neither true nor false), a null is also neither "in" nor "not in" anything...
select clothes, shoes
from table1
where customers = 101
except
select clothes, shoes
from table1
where customers in (
select customers
from table1
where clothes = 1 and customers != 101
)
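The effect of the NULLs can be reproduced with a short Python sqlite3 sketch. The two tables below are hypothetical names (set_a, set_b) holding SET A and SET B exactly as printed in the question; SQLite is assumed to follow the same three-valued logic as PostgreSQL for NOT IN and the same NULL handling for EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE set_a (clothes INTEGER, shoes INTEGER)")
conn.execute("CREATE TABLE set_b (clothes INTEGER, shoes INTEGER)")
conn.executemany(
    "INSERT INTO set_a VALUES (?, ?)",
    [(1, 6), (1, 2), (33, 12), (24, None)],
)
conn.executemany(
    "INSERT INTO set_b VALUES (?, ?)",
    [(None, 6), (24, None), (1, 1), (1, 2), (None, 12), (26, None), (None, 14)],
)

# 33 is not in the list, but the list contains NULL, so the answer is
# "unknown" (NULL) rather than true, and a WHERE clause rejects the row.
not_in_result = conn.execute(
    "SELECT 33 NOT IN (SELECT clothes FROM set_b)"
).fetchone()[0]

# EXCEPT compares whole rows and treats NULLs as equal to each other.
except_rows = conn.execute(
    """
    SELECT clothes, shoes FROM set_a
    EXCEPT
    SELECT clothes, shoes FROM set_b
    ORDER BY clothes, shoes
    """
).fetchall()

print(not_in_result)  # None
print(except_rows)
```

With the sets as printed, EXCEPT keeps both (1, 6) and (33, 12), because the pair (1, 6) does not occur in SET B even though each value appears there separately.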
In PostgreSQL, null is an unknown value, so you must get rid of potential nulls in the subquery results:
select id,clothes,shoes from t1 where customers = 101 -- or select id...
and (
clothes not in
(
select COALESCE(clothes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
)
OR
shoes not in
(
select COALESCE(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
)
)
If you want unique pairs, use:
select clothes, shoes from t1 where customers = 101
and
(clothes,shoes) not in
(
select coalesce(clothes,-1),coalesce(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
) ;
You can't get just "clothes 33" if you are selecting both the clothes and shoes columns...
Also, if you need to know exactly which column (clothes or shoes) was unique to this customer, you might use this little "hack":
select id,clothes,-1 AS shoes from t1 where customers = 101
and
clothes not in
(
select COALESCE(clothes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101)
)
UNION
select id,-1,shoes from t1 where customers = 101
and
shoes not in
(
select COALESCE(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101)
)
And your result would be:
id=49, clothes=33, shoes=-1
(I assume that there aren't any clothes or shoes with id -1; you may use any exotic value here.)
Cheers

Postgres: Nested records in a Recursive query in depth first manner

I am working on a simple comment system where a user can comment on other comments, thus creating a hierarchy. To get the comments in a hierarchical order I am using Common Table Expression in Postgres.
Below are the fields and the query used:
id
user_id
parent_comment_id
message
WITH RECURSIVE CommentCTE AS (
SELECT id, parent_comment_id, user_id
FROM comment
WHERE parent_comment_id is NULL
UNION ALL
SELECT child.id, child.parent_comment_id, child.user_id
FROM comment child
JOIN CommentCTE
ON child.parent_comment_id = CommentCTE.id
)
SELECT * FROM CommentCTE
The above query returns records in a breadth first manner:
id parent_comment_id user_id
10 null 30
9 null 30
11 9 30
14 10 31
15 10 31
12 11 30
13 12 31
But can it be modified to achieve something like below where records are returned together for that comment set, in a depth first manner? The point is to get the data in this way to make rendering on the Front-end smoother.
id parent_comment_id user_id
9 null 30
11 9 30
12 11 30
13 12 31
10 null 30
14 10 31
15 10 31
Generally I solve this problem by synthesising a "Path" column which can be sorted lexically, e.g. 0001:0003:0006:0009 is a child of 0001:0003:0006. Each child entry can be created by concatenating the path element to the parent's path. You don't have to return this column to the client, just use it for sorting.
id parent_comment_id user_id sort_key
9 null 30 0009
11 9 30 0009:0011
12 11 30 0009:0011:0012
13 12 31 0009:0011:0012:0013
10 null 30 0010
14 10 31 0010:0014
15 10 31 0010:0015
The path element doesn't have to be anything in particular provided it sorts lexically in the order you want children at that level to sort, and is unique at that level. Basing it on an auto-incrementing ID is fine.
Using a fixed length path element is not strictly speaking necessary but makes it easier to reason about.
WITH RECURSIVE CommentCTE AS (
SELECT id, parent_comment_id, user_id,
-- pad with zeros (lpad pads with spaces by default) so keys sort lexically
lpad(id::text, 4, '0') sort_key
FROM comment
WHERE parent_comment_id is NULL
UNION ALL
SELECT child.id, child.parent_comment_id, child.user_id,
-- qualify id: both child and CommentCTE have an id column
concat(CommentCTE.sort_key, ':', lpad(child.id::text, 4, '0'))
FROM comment child
JOIN CommentCTE
ON child.parent_comment_id = CommentCTE.id
)
SELECT * FROM CommentCTE order by sort_key
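The path-based ordering can be tried out with Python's sqlite3 module, as sketched below with the seven comments from the question. This is an adaptation, not the PostgreSQL query itself: SQLite has no lpad, so printf('%04d', id) is assumed as the zero-padding step, and || replaces concat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment (id INTEGER PRIMARY KEY, parent_comment_id INTEGER, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO comment VALUES (?, ?, ?)",
    [
        (9, None, 30),
        (10, None, 30),
        (11, 9, 30),
        (12, 11, 30),
        (13, 12, 31),
        (14, 10, 31),
        (15, 10, 31),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE CommentCTE(id, parent_comment_id, user_id, sort_key) AS (
        -- root comments start a path with their own zero-padded id
        SELECT id, parent_comment_id, user_id, printf('%04d', id)
        FROM comment
        WHERE parent_comment_id IS NULL
        UNION ALL
        -- each child appends its zero-padded id to the parent's path
        SELECT child.id, child.parent_comment_id, child.user_id,
               CommentCTE.sort_key || ':' || printf('%04d', child.id)
        FROM comment child
        JOIN CommentCTE ON child.parent_comment_id = CommentCTE.id
    )
    SELECT id, sort_key FROM CommentCTE ORDER BY sort_key
    """
).fetchall()

ordered_ids = [r[0] for r in rows]
print(ordered_ids)  # depth-first: each thread's replies follow their parent
```

A width of 4 only holds up to id 9999; a real table would need a width that covers the largest id.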

Complicated AVG within date range

I've got a table with a tracking of a plant's equipment installation.
Here is a sample:
ID Name Date Percentage
1 GT-001 2011-01-08 30
2 GT-002 2011-01-11 40
3 GT-003 2011-02-02 30
4 GT-001 2011-02-03 50
5 GT-003 2011-02-15 50
6 GT-004 2011-02-15 30
7 GT-002 2011-02-15 60
8 GT-001 2011-02-20 60
9 GT-003 2011-03-01 60
10 GT-004 2011-03-05 50
11 GT-001 2011-03-10 70
12 GT-004 2011-03-15 60
And the corresponding script:
CREATE TABLE [dbo].[SampleTable](
[ID] [int] NOT NULL,
[Name] [nvarchar](50) NULL,
[Date] [date] NULL,
[Percentage] [int] NULL) ON [PRIMARY]
GO
--Populate the table with values
INSERT INTO [dbo].[SampleTable] VALUES
('1', 'GT-001', '2011-01-08', '30'),
('2', 'GT-002', '2011-01-11', '40'),
('3', 'GT-003', '2011-02-02', '30'),
('4', 'GT-001', '2011-02-03', '50'),
('5', 'GT-003', '2011-02-15', '50'),
('6', 'GT-004', '2011-02-15', '30'),
('7', 'GT-002', '2011-02-15', '60'),
('8', 'GT-001', '2011-02-20', '60'),
('9', 'GT-003', '2011-03-01', '60'),
('10', 'GT-004', '2011-03-05', '50'),
('11', 'GT-001', '2011-03-10', '70'),
('12', 'GT-004', '2011-03-15', '60');
GO
What I need is to create a chart with Date on the X axis and Average Percentage on the Y axis. Average Percentage is the average percentage of all equipment as of that particular date, counting from the beginning of the installation process (MIN(Fields!Date.Value, "EquipmentDataset")).
Having had no luck implementing this in SSRS alone, I decided to build a more elaborate dataset for it using T-SQL.
I guess it is necessary to add a calculated column named 'AveragePercentage' that stores the average percentage on that date, using only the latest percentage value of each piece of equipment in the range between the beginning of the installation process (MIN(Date)) and the current row's date. It smells like recursion, but I'm a newbie to T-SQL.
Here is the desired output
ID Name Date Percentage Average
1 GT-001 2011-01-08 30 30
2 GT-002 2011-01-11 40 35
3 GT-003 2011-02-02 30 33
4 GT-001 2011-02-03 50 40
5 GT-003 2011-02-15 50 48
6 GT-004 2011-02-15 30 48
7 GT-002 2011-02-15 60 48
8 GT-001 2011-02-20 60 50
9 GT-003 2011-03-01 60 53
10 GT-004 2011-03-05 50 58
11 GT-001 2011-03-10 70 60
12 GT-004 2011-03-15 60 63
What do you think?
I'd really appreciate any help.
You could use outer apply with row_number to find the latest value for each machine. An additional subquery is required because you cannot use row_number in the where clause directly. Here's the query:
select t1.id
, t1.Name
, t1.Date
, t1.Percentage
, avg(1.0*last_per_machine.percentage)
from SampleTable t1
outer apply
(
select *
from (
select row_number() over (partition by Name order by id desc)
as rn
, *
from SampleTable t2
where t2.date <= t1.date
) as numbered
where rn = 1
) as last_per_machine
group by
t1.id
, t1.Name
, t1.Date
, t1.Percentage
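As a cross-check of the "average of each machine's latest value" logic, here is a Python sqlite3 sketch over the sample data. SQLite has no OUTER APPLY, so nested correlated subqueries are used instead; like the row_number ordering above, it assumes that a higher ID means a later reading for the same machine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SampleTable (ID INTEGER, Name TEXT, Date TEXT, Percentage INTEGER)"
)
conn.executemany(
    "INSERT INTO SampleTable VALUES (?, ?, ?, ?)",
    [
        (1, "GT-001", "2011-01-08", 30),
        (2, "GT-002", "2011-01-11", 40),
        (3, "GT-003", "2011-02-02", 30),
        (4, "GT-001", "2011-02-03", 50),
        (5, "GT-003", "2011-02-15", 50),
        (6, "GT-004", "2011-02-15", 30),
        (7, "GT-002", "2011-02-15", 60),
        (8, "GT-001", "2011-02-20", 60),
        (9, "GT-003", "2011-03-01", 60),
        (10, "GT-004", "2011-03-05", 50),
        (11, "GT-001", "2011-03-10", 70),
        (12, "GT-004", "2011-03-15", 60),
    ],
)

rows = conn.execute(
    """
    SELECT t1.ID,
           (SELECT AVG(t2.Percentage)
            FROM SampleTable t2
            -- keep only the latest reading per machine up to t1's date
            WHERE t2.ID = (SELECT MAX(t3.ID)
                           FROM SampleTable t3
                           WHERE t3.Name = t2.Name
                             AND t3.Date <= t1.Date)) AS Average
    FROM SampleTable t1
    ORDER BY t1.ID
    """
).fetchall()

averages = [avg for _, avg in rows]
print(averages)
```

The values are unrounded (for example 47.5 for the three rows dated 2011-02-15); rounding half up gives the whole numbers in the desired output.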