Count 2 aggregates using case in a nested and joined Select - tsql

I've hit a wall with a problem a user presented to me.
Overview:
The user requires a statistic that looks like this:
|TEAM | TARGET | WEEKLY | MONTHLY |
|-----|------- |--------|---------|
|AAA | 80 | 15 | 59 |
|BBB | 80 | 12 | 35 |
|CCC | 80 | 13 | 50 |
|DDD | 80 | 6 | 39 |
|EEE | 80 | 7 | 28 |
|FFF | 80 | 11 | 30 |
|GGG | 80 | 10 | 28 |
|HHH | 80 | 8 | 48 |
I am at this point with the code:
DECLARE @StartExDate datetime
DECLARE @EndExDate datetime
declare @ThisWeekNow int
SET @StartExDate = (SELECT(CONVERT(DATETIME, (SELECT DATEADD(DAY, 1, EOMONTH(GETDATE(),-1))))))
SET @EndExDate = (SELECT(CONVERT(DATETIME, (SELECT DATEADD(DAY, 1, EOMONTH(GETDATE()))))))
SET @ThisWeekNow = (SELECT DATEPART(wk, GETDATE()))
Select
tt.Team
,tt.Target
,W_Total
,M_Total
From [Team_Targets] tt
join (
Select
count(0) as M_Total,
Case When DATEPART(wk,s.Date) = @ThisWeekNow Then count(1) end as W_Total,
m.FERef
FROM [tr_type] s
join [tr] c
on s.ItemID = c.ItemID
join [a] a
on c.ParentID = a.ItemID
join [M] m
on a.ERef = m.ERef and a.MN = m.MN
where s.Date between @StartExDate and @EndExDate
and (s.DRef = 1546 or s.DRef = 1658)
group by m.FERef, s.Date
) t
on tt.team = t.FERef
group by tt.team, tt.Target, t.M_Total, t.W_Total
What I get is this:
|TEAM | TARGET | WEEKLY | MONTHLY |
|-----|------- |--------|---------|
|AAA | 80 | NULL | 1 |
|AAA | 80 | 1 | 1 |
|BBB | 80 | NULL | 1 |
|BBB | 80 | 1 | 1 |
|CCC | 80 | NULL | 1 |
|CCC | 80 | 1 | 1 |
|DDD | 80 | NULL | 1 |
|DDD | 80 | 1 | 1 |
|EEE | 80 | NULL | 1 |
|EEE | 80 | 1 | 1 |
|FFF | 80 | NULL | 1 |
|FFF | 80 | 1 | 1 |
|GGG | 80 | NULL | 1 |
|GGG | 80 | 1 | 1 |
|HHH | 80 | NULL | 1 |
|HHH | 80 | 1 | 1 |
I'm a bit stumped.
If I drop one aggregate I get something useful.
The issue is that Team_Targets is a user table, while the rest comes from the off-the-shelf system we use, hence the joins and nested selects.
Is there a way to get the desired result? Any way will do.
I'm on 3 hours of sleep a day this week, so I'm sure I'm missing something and/or using the wrong function. Constant distractions at work don't help either.
Hearty thanks for any and all suggestions.

Try this. Make your subquery return only one row for each FERef: group by m.FERef alone rather than by m.FERef and s.Date, and move the CASE inside the aggregate so that rows outside the current week add nothing to W_Total.
DECLARE @StartExDate DATETIME
DECLARE @EndExDate DATETIME
DECLARE @ThisWeekNow INT
-- First day of the current month and first day of the next month
SET @StartExDate = DATEADD(DAY, 1, EOMONTH(GETDATE(), -1))
SET @EndExDate = DATEADD(DAY, 1, EOMONTH(GETDATE()))
SET @ThisWeekNow = DATEPART(wk, GETDATE())
SELECT
    tt.Team,
    tt.Target,
    t.W_Total,
    t.M_Total
FROM
    [Team_Targets] tt
    JOIN (
        SELECT
            COUNT(*) AS M_Total,
            -- ELSE 0 returns 0 rather than NULL for a team with no rows this week
            SUM(CASE WHEN DATEPART(wk, s.Date) = @ThisWeekNow THEN 1 ELSE 0 END) AS W_Total,
            m.FERef
        FROM
            [tr_type] s
            JOIN [tr] c ON s.ItemID = c.ItemID
            JOIN [a] a ON c.ParentID = a.ItemID
            JOIN [M] m ON a.ERef = m.ERef AND a.MN = m.MN
        WHERE
            -- Half-open range, so the first day of next month is excluded
            s.Date >= @StartExDate AND s.Date < @EndExDate
            AND s.DRef IN (1546, 1658)
        GROUP BY m.FERef -- not s.Date: one row per team
    ) t ON tt.team = t.FERef
GROUP BY tt.team, tt.Target, t.M_Total, t.W_Total
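The conditional-aggregation idea can be checked on a small scale. The following is only a sketch: it uses Python's built-in sqlite3 module instead of SQL Server, and the `events` table with its precomputed `week` column is a hypothetical stand-in for the joined system tables. It shows that putting the CASE inside SUM and grouping by team alone yields one row per team with both totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team_targets (team TEXT, target INTEGER);
-- Hypothetical stand-in for the joined system tables: one row per event
CREATE TABLE events (team TEXT, week INTEGER);
INSERT INTO team_targets VALUES ('AAA', 80), ('BBB', 80);
INSERT INTO events VALUES ('AAA', 14), ('AAA', 15), ('AAA', 15),
                          ('BBB', 13), ('BBB', 15);
""")

this_week = 15  # assumed current week number for the example

rows = conn.execute("""
SELECT tt.team, tt.target, t.w_total, t.m_total
FROM team_targets AS tt
JOIN (
    SELECT team,
           COUNT(*) AS m_total,
           -- CASE inside the aggregate: only this week's rows count
           SUM(CASE WHEN week = ? THEN 1 ELSE 0 END) AS w_total
    FROM events
    GROUP BY team          -- one row per team, not per date
) AS t ON t.team = tt.team
ORDER BY tt.team
""", (this_week,)).fetchall()

for row in rows:
    print(row)
```

Each team appears once, with the weekly count being a subset of the monthly count.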

Related

How can you filter for only the max value from a queried table in PostgreSQL?

I'm fairly new to Postgresql and my problem can be simplified to the following:
Suppose that I have 2 tables:
Table A:
id | join_value | filter_data1 | filter_data2
---------------------------------------------
1 | 1 | "Yes" | 1
2 | 1 | "Yes" | 3
3 | 2 | "No" | 0
Table B:
id | join_value | filter_data1 | filter_data2 | date
---------------------------------------------------------
1 | 3 | "Yes" | 0 | 1/3/2021
2 | 1 | "Yes" | 17 | 1/3/2021
3 | 1 | "No" | -1 | 1/2/2021
4 | 1 | "Yes" | 32 | 1/2/2021
5 | 1 | "Yes" | 40 | 1/3/2021
I would like to filter these tables on the filter data and then join them on the join value. The catch is that I would then like to only grab the values that have a date == MAX(date). Here is an example of a query that I have attempted.
SELECT * FROM
(SELECT * FROM A
WHERE filter_data1 = 'Yes'
AND filter_data2 > 2)
AS a_tab
JOIN
(SELECT * FROM B
WHERE filter_data1 = 'Yes'
AND filter_data2 > 16)
AS b_tab
ON a_tab.join_value = b_tab.join_value;
This would give me the following table:
id | join_value | filter_data1 | filter_data2 | id | filter_data1 | filter_data2 | date
------------------------------------------------------------------------------------------
2 | 1 | "Yes" | 3 | 2 | "Yes" | 17 | 1/3/2021
2 | 1 | "Yes" | 3 | 4 | "Yes" | 32 | 1/2/2021
2 | 1 | "Yes" | 3 | 5 | "Yes" | 40 | 1/3/2021
But the problem is, I would like to also do a 'WHERE date = MAX(date)'
The resulting table would be this:
id | join_value | filter_data1 | filter_data2 | id | filter_data1 | filter_data2 | date
------------------------------------------------------------------------------------------
2 | 1 | "Yes" | 3 | 2 | "Yes" | 17 | 1/3/2021
2 | 1 | "Yes" | 3 | 5 | "Yes" | 40 | 1/3/2021
Does anyone have any ideas how to accomplish this?
First, a hint: your existing query can be written more readably with a plain join and a single WHERE clause. Note that string literals take single quotes and the comparison is case-sensitive, so the value must be 'Yes' exactly as stored:
SELECT
a.*, b.*
FROM a
INNER JOIN b ON b.join_value = a.join_value
WHERE a.filter_data1 = 'Yes' AND a.filter_data2 > 2
AND b.filter_data1 = 'Yes' AND b.filter_data2 > 16
Next, add another column that holds the maximum value of the date column across the whole output. A window function does this:
SELECT
a.*, b.*, MAX(b.date) OVER ()
FROM a
INNER JOIN b ON b.join_value = a.join_value
WHERE a.filter_data1 = 'Yes' AND a.filter_data2 > 2
AND b.filter_data1 = 'Yes' AND b.filter_data2 > 16
Window functions are evaluated after the WHERE clause, so the condition cannot be added at the same query level. Instead, use this query as a subquery and put the condition in the outer query:
SELECT
*
FROM (
SELECT
a.*, b.*, MAX(b.date) OVER () AS max_date
FROM a
INNER JOIN b ON b.join_value = a.join_value
WHERE a.filter_data1 = 'Yes' AND a.filter_data2 > 2
AND b.filter_data1 = 'Yes' AND b.filter_data2 > 16
) t
WHERE t.date = t.max_date
This should give you the required results.
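To try the window-function approach on the sample data, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL (window functions need SQLite 3.25 or later). The dates are stored as ISO strings, assuming the question's 1/3/2021 and 1/2/2021 mean 3 and 2 January 2021; the aliases `a_id` and `b_id` are introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, join_value INTEGER,
                filter_data1 TEXT, filter_data2 INTEGER);
CREATE TABLE b (id INTEGER, join_value INTEGER,
                filter_data1 TEXT, filter_data2 INTEGER, date TEXT);
INSERT INTO a VALUES (1, 1, 'Yes', 1), (2, 1, 'Yes', 3), (3, 2, 'No', 0);
INSERT INTO b VALUES (1, 3, 'Yes', 0,  '2021-01-03'),
                     (2, 1, 'Yes', 17, '2021-01-03'),
                     (3, 1, 'No',  -1, '2021-01-02'),
                     (4, 1, 'Yes', 32, '2021-01-02'),
                     (5, 1, 'Yes', 40, '2021-01-03');
""")

rows = conn.execute("""
SELECT a_id, b_id, date
FROM (
    SELECT a.id AS a_id, b.id AS b_id, b.date AS date,
           MAX(b.date) OVER () AS max_date   -- max over all joined rows
    FROM a
    JOIN b ON b.join_value = a.join_value
    WHERE a.filter_data1 = 'Yes' AND a.filter_data2 > 2
      AND b.filter_data1 = 'Yes' AND b.filter_data2 > 16
) AS t
WHERE date = max_date                         -- filter in the outer query
ORDER BY b_id
""").fetchall()

for row in rows:
    print(row)
```

Only the two joined rows dated 2021-01-03 remain; the row from B with id 4 is filtered out.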

Sort using auxiliary fields, start and end

In PostgreSQL, what is the best way to sort records using start and end fields in a generic way, without the need to include in the query the first record (where start_id=3)?
Example table:
+-------+----------+--------+--------+
| FK_ID | START_ID | END_ID | STRING |
+-------+----------+--------+--------+
| 77 | 1 | 9 | E |
| 82 | 5 | 2 | A |
| 77 | 7 | 1 | I |
| 77 | 3 | 7 | W |
| 82 | 9 | 5 | Q |
| 77 | 9 | 5 | X |
| 82 | 2 | 7 | G |
+-------+----------+--------+--------+
Sorted where FK_ID = 77:
+----+---+---+---+
| 77 | 3 | 7 | W |
| 77 | 7 | 1 | I |
| 77 | 1 | 9 | E |
| 77 | 9 | 5 | X |
+----+---+---+---+
Sorted where FK_ID = 82:
+----+---+---+---+
| 82 | 9 | 5 | Q |
| 82 | 5 | 2 | A |
| 82 | 2 | 7 | G |
+----+---+---+---+
Result query sequence:
+-------+----------+
| FK_ID | SEQUENCE |
+-------+----------+
| 82 | QAG |
| 77 | WIEX |
+-------+----------+
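I do not think this is the most efficient way, but you can try a recursive CTE. The chain for each fk_id starts at the row whose start_id is not the end_id of any other row with the same fk_id; each step then follows end_id to the next start_id:
WITH RECURSIVE path AS (
    -- Anchor: first row of each chain
    SELECT t1.fk_id, t1.start_id, t1.end_id, t1.string, 1 AS depth
    FROM myTable AS t1
    WHERE NOT EXISTS (
        SELECT 1 FROM myTable AS t2
        WHERE t2.fk_id = t1.fk_id AND t2.end_id = t1.start_id
    )
    UNION ALL
    -- Step: follow end_id to start_id within the same fk_id
    SELECT t.fk_id, t.start_id, t.end_id, t.string, p.depth + 1
    FROM myTable AS t
    JOIN path AS p ON p.fk_id = t.fk_id AND p.end_id = t.start_id
)
SELECT fk_id, string_agg(string, '' ORDER BY depth) AS sequence
FROM path GROUP BY fk_id;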
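As a quick check of the recursive idea on the sample table, the sketch below runs a recursive CTE through Python's built-in sqlite3 module instead of PostgreSQL. It assumes each fk_id forms a single chain with no cycles, matches rows within the same fk_id, and tracks a `depth` counter (a name introduced for this example) so the letters can be put in order; the final concatenation is done in Python to avoid relying on ordered aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (fk_id INTEGER, start_id INTEGER,
                      end_id INTEGER, string TEXT);
INSERT INTO myTable VALUES
    (77, 1, 9, 'E'), (82, 5, 2, 'A'), (77, 7, 1, 'I'), (77, 3, 7, 'W'),
    (82, 9, 5, 'Q'), (77, 9, 5, 'X'), (82, 2, 7, 'G');
""")

rows = conn.execute("""
WITH RECURSIVE path(fk_id, end_id, string, depth) AS (
    -- Anchor: rows whose start_id is nobody's end_id within the same fk_id
    SELECT t1.fk_id, t1.end_id, t1.string, 1
    FROM myTable AS t1
    WHERE NOT EXISTS (
        SELECT 1 FROM myTable AS t2
        WHERE t2.fk_id = t1.fk_id AND t2.end_id = t1.start_id
    )
    UNION ALL
    -- Step: the next row starts where the previous one ended
    SELECT t.fk_id, t.end_id, t.string, p.depth + 1
    FROM myTable AS t
    JOIN path AS p ON p.fk_id = t.fk_id AND p.end_id = t.start_id
)
SELECT fk_id, string FROM path ORDER BY fk_id, depth
""").fetchall()

# Concatenate the letters per fk_id in chain order
sequences = {}
for fk_id, letter in rows:
    sequences[fk_id] = sequences.get(fk_id, "") + letter

print(sequences)
```

For the sample data this reproduces the sequences given in the question, WIEX for 77 and QAG for 82.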

Valid periods - SQL VIEW

I have 2 tables (actually there are 4, but for now lets say it's 2) with data like this:
Table PersonA
ClientID ID From Till
1 10 1.1.2017 30.4.2017
1 12 1.8.2017 2.1.2018
Table PersonB
ClientID ID From Till
1 6 1.3.2017 30.6.2017
And I need to generate view that would show something like this:
ClientID From Till PersonA PersonB
1 1.1.2017 28.2.2017 10 NULL
1 1.3.2017 30.4.2017 10 6
1 1.5.2017 30.6.2017 NULL 6
1 1.8.2017 02.1.2018 12 NULL
So basically I need to create a view that shows which "persons" each client had in a given period.
When there is an overlap, the client has both PersonA and PersonB (the same should apply to PersonC and PersonD).
In the final view, one client can't have any overlapping dates.
I don't know how to approach this.
In an adaptation of this algorithm, we can already handle the overlaps:
declare @PersonA table(ClientID int, ID int, [From] date, Till date);
insert into @PersonA values (1,10,'20170101','20170430'),(1,12,'20170801','20180112');
declare @PersonB table(ClientID int, ID int, [From] date, Till date);
insert into @PersonB values (1,6,'20170301','20170630');
declare @PersonC table(ClientID int, ID int, [From] date, Till date);
insert into @PersonC values (1,12,'20170401','20170625');
declare @PersonD table(ClientID int, ID int, [From] date, Till date);
insert into @PersonD values (1,14,'20170501','20170525'),(1,14,'20170510','20171122');
-- X: every From and every Till of every table, as one list of edge dates
with X(ClientID,EdgeDate)
as (select ClientID
,case
when toggle = 1
then Till
else [From]
end as EdgeDate
from
(
select ClientID,[From],Till from @PersonA
union all
select ClientID,[From],Till from @PersonB
union all
select ClientID,[From],Till from @PersonC
union all
select ClientID,[From],Till from @PersonD
) as concated
cross join
(
select -1 as toggle
union all
select 1 as toggle
) as toggler
-- merged: each edge date paired with the next later edge date
),merged
as (select distinct
S.ClientID
,S.EdgeDate as [From]
,min(E.EdgeDate) as Till
from
X as S
inner join X as E
on S.ClientID = E.ClientID
and S.EdgeDate < E.EdgeDate
group by S.ClientID
,S.EdgeDate
-- prds: which person from each table covers each merged period
),prds
as (select distinct
merged.ClientID
,merged.[From]
,merged.Till
,A.ID as PersonA
,B.ID as PersonB
,C.ID as PersonC
,D.ID as PersonD
from
merged
left join @PersonA as A
on merged.ClientID = A.ClientID
and A.[From] <= merged.[From]
and merged.Till <= A.Till
left join @PersonB as B
on merged.ClientID = B.ClientID
and B.[From] <= merged.[From]
and merged.Till <= B.Till
left join @PersonC as C
on merged.ClientID = C.ClientID
and C.[From] <= merged.[From]
and merged.Till <= C.Till
left join @PersonD as D
on merged.ClientID = D.ClientID
and D.[From] <= merged.[From]
and merged.Till <= D.Till
where not(A.ID is null
and B.ID is null
and C.ID is null
and D.ID is null
)
)
select ClientID
,[From]
,case
-- pull Till back one day when the next period starts on the same date
when Till = lead([From]) over(partition by ClientID order by Till)
then dateadd(d,-1,Till)
else Till
end as Till
,PersonA
,PersonB
,PersonC
,PersonD
from
prds
order by ClientID
,[From]
,Till;
Output with just the two Person tables given in the question:
+----------+------------+------------+---------+---------+
| ClientID | From | Till | PersonA | PersonB |
+----------+------------+------------+---------+---------+
| 1 | 2017-01-01 | 2017-02-28 | 10 | NULL |
| 1 | 2017-03-01 | 2017-04-29 | 10 | 6 |
| 1 | 2017-04-30 | 2017-06-30 | NULL | 6 |
| 1 | 2017-08-01 | 2018-01-12 | 12 | NULL |
+----------+------------+------------+---------+---------+
Output of script as it is above, with four Person tables:
+----------+------------+------------+---------+---------+---------+---------+
| ClientID | From | Till | PersonA | PersonB | PersonC | PersonD |
+----------+------------+------------+---------+---------+---------+---------+
| 1 | 2017-01-01 | 2017-02-28 | 10 | NULL | NULL | NULL |
| 1 | 2017-03-01 | 2017-03-31 | 10 | 6 | NULL | NULL |
| 1 | 2017-04-01 | 2017-04-29 | 10 | 6 | 12 | NULL |
| 1 | 2017-04-30 | 2017-04-30 | NULL | 6 | 12 | NULL |
| 1 | 2017-05-01 | 2017-05-09 | NULL | 6 | 12 | 14 |
| 1 | 2017-05-10 | 2017-05-24 | NULL | 6 | 12 | 14 |
| 1 | 2017-05-25 | 2017-06-24 | NULL | 6 | 12 | 14 |
| 1 | 2017-06-25 | 2017-06-29 | NULL | 6 | NULL | 14 |
| 1 | 2017-06-30 | 2017-07-31 | NULL | NULL | NULL | 14 |
| 1 | 2017-08-01 | 2017-11-21 | 12 | NULL | NULL | 14 |
| 1 | 2017-11-22 | 2018-01-12 | 12 | NULL | NULL | NULL |
+----------+------------+------------+---------+---------+---------+---------+
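The edge-date idea can also be sketched outside SQL. The Python below is not a translation of the script above but a variant of the same technique, under the assumptions that Till is inclusive, that all rows belong to one client, and that each table has no overlaps within itself. It cuts the timeline at every From and at the day after every Till, so each segment is either wholly inside or wholly outside any given row. The function name `split_periods` and the tuple layout are made up for this example.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def split_periods(tables):
    """tables maps a table name to a list of (person_id, from, till) rows
    with inclusive dates. Returns (from, till, {table name: id or None})."""
    # A new segment starts at every From and on the day after every Till.
    cuts = sorted({d
                   for rows in tables.values()
                   for _, start, end in rows
                   for d in (start, end + ONE_DAY)})
    result = []
    for seg_start, seg_end in zip(cuts, cuts[1:]):
        seg_last = seg_end - ONE_DAY  # inclusive last day of the segment
        owners = {
            name: next((pid for pid, start, end in rows
                        if start <= seg_start and seg_last <= end), None)
            for name, rows in tables.items()
        }
        # Skip gaps in which the client has no person at all
        if any(pid is not None for pid in owners.values()):
            result.append((seg_start, seg_last, owners))
    return result


periods = split_periods({
    "PersonA": [(10, date(2017, 1, 1), date(2017, 4, 30)),
                (12, date(2017, 8, 1), date(2018, 1, 2))],
    "PersonB": [(6, date(2017, 3, 1), date(2017, 6, 30))],
})

for start, end, owners in periods:
    print(start, end, owners)
```

With the two tables from the question this yields the four periods listed there, including 2017-03-01 to 2017-04-30 for the overlap of PersonA 10 and PersonB 6.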

How to select based on values "keyed" by another column?

PostgreSQL newbie here. I have data that look like this:
+-----------+---------+-------+
| StudentID | ClassID | Grade |
+-----------+---------+-------+
| 19927 | A13 | 5 |
| 19927 | A07 | 3 |
| 19927 | B22 | 7 |
| 10001 | A13 | 2 |
| 10001 | A07 | 8 |
| 22207 | A13 | 7 |
| 22207 | A07 | 10 |
| 22207 | C80 | 2 |
| 27516 | A07 | 8 |
+-----------+---------+-------+
I'm trying to select all students which have a higher grade in class A13 than in class A07. This means only including students who actually have grades in both classes.
What's the best way to do this? Having been brought up on Stata, I would normally try:
selecting only rows where classID = A07 or A13
reshaping to wide
select using a where clause on A13 > A07
But I feel like this is very un-SQL-like.
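PostgreSQL offers lots of different ways of doing this; here's one. Join the A13 rows to the A07 rows of the same student, which also drops students who lack a grade in either class:
SELECT a13.* FROM
(SELECT * FROM table1 WHERE classid = 'A13') AS a13
INNER JOIN
(SELECT * FROM table1 WHERE classid = 'A07') AS a07
ON a13.studentid = a07.studentid
AND a13.grade > a07.grade;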
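A self-join of this kind can be tried on the sample data with Python's built-in sqlite3 module (used here in place of PostgreSQL; the table name `table1` is taken from the answer). The sketch pairs each student's A13 row with the same student's A07 row and keeps the pair only when the A13 grade is higher.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (studentid INTEGER, classid TEXT, grade INTEGER);
INSERT INTO table1 VALUES
    (19927, 'A13', 5), (19927, 'A07', 3), (19927, 'B22', 7),
    (10001, 'A13', 2), (10001, 'A07', 8),
    (22207, 'A13', 7), (22207, 'A07', 10), (22207, 'C80', 2),
    (27516, 'A07', 8);
""")

students = [row[0] for row in conn.execute("""
SELECT a13.studentid
FROM table1 AS a13
JOIN table1 AS a07
  ON a07.studentid = a13.studentid   -- same student in both classes
WHERE a13.classid = 'A13'
  AND a07.classid = 'A07'
  AND a13.grade > a07.grade
ORDER BY a13.studentid
""")]

print(students)
```

Student 27516 has no A13 grade and therefore never appears, and of the three students with both grades only 19927 scored higher in A13.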

How to preserve additional keys when using "SELECT DISTINCT"?

I'm looking to preserve the sid, and cid pairs that link my tables when using SELECT DISTINCT in my query. signature, ip_src, and ip_dst is what makes it distinct. I just want the output to also include the corresponding sid and cid pairs.
QUERY:
SELECT DISTINCT signature, ip_src, ip_dst FROM
(SELECT *
FROM event
INNER JOIN sensor ON (sensor.sid = event.sid)
INNER JOIN iphdr ON (iphdr.cid = event.cid) AND (iphdr.sid = event.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
ORDER BY timestamp DESC)
as d_dup;
OUTPUT:
signature | ip_src | ip_dst
-----------+------------+------------
29177 | 3244829114 | 2887777034
29177 | 2960340989 | 2887777034
29179 | 2887777893 | 2887777556
29178 | 1208608738 | 2887777034
29178 | 1211607091 | 2887777034
29177 | 776526845 | 2887777034
29177 | 1332731268 | 2887777034
(7 rows)
SUB QUERY:
SELECT *
FROM event
INNER JOIN sensor ON (sensor.sid = event.sid)
INNER JOIN iphdr ON (iphdr.cid = event.cid) AND (iphdr.sid = event.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
ORDER BY timestamp DESC;
OUTPUT:
sid | cid | signature | timestamp | sid | hostname | interface | filter | detail | encoding | last_cid | sid | cid | ip_src | ip_dst | ip_ver | ip_hlen | ip_tos | ip_len | ip_id | ip_flags | ip_off | ip_ttl | ip_proto | ip_csum
-----+-------+-----------+-------------------------+-----+---------------------+-----------+--------+--------+----------+----------+-----+-------+------------+------------+--------+---------+--------+--------+-------+----------+--------+--------+----------+---------
3 | 13123 | 29177 | 2014-11-15 20:53:14.656 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13123 | 3244829114 | 2887777034 | 4 | 5 | 0 | 344 | 19301 | 0 | 0 | 122 | 6 | 8686
3 | 13122 | 29177 | 2014-11-15 20:53:14.43 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13122 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 19071 | 0 | 0 | 122 | 6 | 9191
3 | 13121 | 29177 | 2014-11-15 18:45:13.461 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13121 | 3244829114 | 2887777034 | 4 | 5 | 0 | 366 | 25850 | 0 | 0 | 122 | 6 | 2115
3 | 13120 | 29177 | 2014-11-15 18:45:13.23 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13120 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 25612 | 0 | 0 | 122 | 6 | 2650
3 | 13119 | 29177 | 2014-11-15 18:45:01.887 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13119 | 3244829114 | 2887777034 | 4 | 5 | 0 | 352 | 13697 | 0 | 0 | 122 | 6 | 14282
3 | 13118 | 29177 | 2014-11-15 18:45:01.681 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13118 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 13464 | 0 | 0 | 122 | 6 | 14798
4 | 51 | 29179 | 2014-11-15 18:44:02.06 | 4 | VS-101-Z1:dna2:dna3 | dna2:dna3 | | 1 | 0 | 51 | 4 | 51 | 2887777893 | 2887777556 | 4 | 5 | 0 | 80 | 18830 | 0 | 0 | 63 | 17 | 40533
3 | 13117 | 29177 | 2014-11-15 18:41:46.418 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13117 | 1332731268 | 2887777034 | 4 | 5 | 0 | 261 | 15393 | 0 | 0 | 119 | 6 | 62131
...
(30 rows)
How do I keep the sid, and cid when using SELECT DISTINCT?
This is shorter and probably faster:
SELECT DISTINCT ON (signature, ip_src, ip_dst)
signature, ip_src, ip_dst, sid, cid
FROM event e
JOIN sensor s USING (sid)
JOIN iphdr i USING (cid, sid)
WHERE timestamp >= NOW() - '1 day'::interval
ORDER BY signature, ip_src, ip_dst, timestamp DESC;
This assumes you want the latest row (greatest timestamp) from each set of dupes.
Detailed explanation: "Select first row in each GROUP BY group?"
Sounds like you are looking for a window function:
SELECT *
FROM (
SELECT *,
row_number() over (partition by signature, ip_src, ip_dst order by timestamp desc) as rn
FROM event
JOIN sensor ON sensor.sid = event.sid
JOIN iphdr ON iphdr.cid = event.cid AND iphdr.sid = event.sid
WHERE timestamp >= NOW() - interval '1' day
) as d_dup
where rn = 1
order by timestamp desc;
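The row_number() pattern can be exercised with Python's built-in sqlite3 module (SQLite 3.25 or later, standing in for PostgreSQL). In this sketch the single `alerts` table and its `ts` column are hypothetical, a flattened stand-in for the joined event and iphdr rows, filled with a few rows from the question's output; fractional seconds are padded to three digits so that text comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened stand-in for event joined to iphdr
CREATE TABLE alerts (sid INTEGER, cid INTEGER, signature INTEGER,
                     ip_src INTEGER, ip_dst INTEGER, ts TEXT);
INSERT INTO alerts VALUES
    (3, 13123, 29177, 3244829114, 2887777034, '2014-11-15 20:53:14.656'),
    (3, 13122, 29177, 3244829114, 2887777034, '2014-11-15 20:53:14.430'),
    (4, 51,    29179, 2887777893, 2887777556, '2014-11-15 18:44:02.060'),
    (3, 13117, 29177, 1332731268, 2887777034, '2014-11-15 18:41:46.418');
""")

rows = conn.execute("""
SELECT sid, cid, signature, ip_src, ip_dst
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY signature, ip_src, ip_dst
               ORDER BY ts DESC            -- newest row gets rn = 1
           ) AS rn
    FROM alerts
) AS d_dup
WHERE rn = 1
ORDER BY ts DESC
""").fetchall()

for row in rows:
    print(row)
```

Each (signature, ip_src, ip_dst) combination appears once, and the sid and cid come from its most recent row.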
Maybe something like this?
SELECT DISTINCT e.sid, e.cid, ip_src, ip_dst
FROM event e
INNER JOIN sensor s ON (s.sid = e.sid)
INNER JOIN iphdr i ON (i.cid = e.cid) AND (i.sid = e.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL;
If you want the combination of (signature, ip_src, ip_dst) to be unique in the result (one row for each combination) then you can try something like this:
SELECT max(e.cid), max(e.sid), signature, ip_src, ip_dst
FROM event e
INNER JOIN sensor s ON (s.sid = e.sid)
INNER JOIN iphdr i ON (i.cid = e.cid) AND (i.sid = e.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
GROUP BY signature, ip_src, ip_dst;
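Note that this returns the maximum cid and the maximum sid for each combination, computed independently, so the two values are not guaranteed to come from the same row.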