SQL Query to get top 2 records of group - group-by

I have the following input table:
Source EventType
A X
A X
A X
A Y
A Y
A Z
B L
B L
B L
B L
B M
B N
B N
Expected output
Source EventType Frequency
A X 3
A Y 2
B L 4
B N 2
How can I write a SQL query that produces the result shown above?
I was able to get the right result, but only for one source at a time:
select TOP 2 eventtype, count(*) as frequency
from myEventTable
where source = 'A'
group by eventtype
order by count(*) desc

We can use ROW_NUMBER here:
WITH cte AS (
SELECT Source, EventType, COUNT(*) as Frequency,
ROW_NUMBER() OVER (PARTITION BY Source ORDER BY COUNT(*) DESC) rn
FROM myEventTable
GROUP BY Source, Eventtype
)
SELECT Source, EventType, Frequency
FROM cte
WHERE rn <= 2;
The reason this works is that ROW_NUMBER is applied after the GROUP BY operation completes, i.e. it runs against the groups. We can then easily limit to the top 2 per source, as ordered by frequency descending.
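As a rough check of the idea, here is a minimal sketch that runs the same logic against an in-memory SQLite database from Python. It assumes SQLite 3.25 or later (for window functions), and it splits the aggregation and the numbering into two CTEs so the "ROW_NUMBER runs against the groups" step is explicit. The function name and the EventType tie-breaker are illustrative additions, not part of the original answer.

```python
import sqlite3

def top_two_per_source(rows):
    """Return the two most frequent event types for each source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myEventTable (Source TEXT, EventType TEXT)")
    conn.executemany("INSERT INTO myEventTable VALUES (?, ?)", rows)
    query = """
        WITH counts AS (
            -- Step 1: collapse the rows into one row per (Source, EventType).
            SELECT Source, EventType, COUNT(*) AS Frequency
            FROM myEventTable
            GROUP BY Source, EventType
        ),
        ranked AS (
            -- Step 2: number the grouped rows within each source.
            -- EventType is an assumed tie-breaker to keep results deterministic.
            SELECT Source, EventType, Frequency,
                   ROW_NUMBER() OVER (PARTITION BY Source
                                      ORDER BY Frequency DESC, EventType) AS rn
            FROM counts
        )
        SELECT Source, EventType, Frequency
        FROM ranked
        WHERE rn <= 2
        ORDER BY Source, Frequency DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question.
sample = ([("A", "X")] * 3 + [("A", "Y")] * 2 + [("A", "Z")]
          + [("B", "L")] * 4 + [("B", "M")] + [("B", "N")] * 2)
print(top_two_per_source(sample))
```

Note that ROW_NUMBER picks arbitrarily between event types with equal counts unless a tie-breaker is given; RANK or DENSE_RANK would keep all tied rows instead.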

Related

DB2: SQL to return all rows in a group having a particular value of a column in two latest records of this group

I have a DB2 table having one of the columns (A) which has either value PQR or XYZ.
I need output where the latest two records based on col C date have value A = PQR.
Sample Table
A B C
--- ----- ----------
PQR Mark 08/08/2019
PQR Mark 08/01/2019
XYZ Mark 07/01/2019
PQR Joe 10/11/2019
XYZ Joe 10/01/2019
PQR Craig 06/06/2019
PQR Craig 06/20/2019
In this sample table, my output would be the Mark and Craig records.
Since Db2 11.1
You may use the NTH_VALUE OLAP function.
Refer to the OLAP specification.
SELECT A, B, C
FROM
(
SELECT
A, B, C
, NTH_VALUE (A, 1) OVER (PARTITION BY B ORDER BY C DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) C1
, NTH_VALUE (A, 2) OVER (PARTITION BY B ORDER BY C DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) C2
FROM TAB
)
WHERE C1 = 'PQR' AND C2 = 'PQR'
Older versions
SELECT T.*
FROM TAB T
JOIN
(
SELECT B
FROM
(
SELECT
A, B
, ROWNUMBER() OVER (PARTITION BY B ORDER BY C DESC) RN
FROM TAB
)
WHERE RN IN (1, 2)
GROUP BY B
HAVING MIN(A) = MAX(A) AND COUNT(1) = 2 AND MIN(A) = 'PQR'
) G ON G.B = T.B;
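The following is a minimal sketch of the older-versions approach, run against SQLite from Python rather than Db2. It assumes SQLite 3.25 or later, uses ROW_NUMBER() in place of Db2's ROWNUMBER(), and stores the dates as ISO strings so that they sort correctly as text. The function name is hypothetical.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as YYYY-MM-DD.
ROWS = [
    ("PQR", "Mark", "2019-08-08"),
    ("PQR", "Mark", "2019-08-01"),
    ("XYZ", "Mark", "2019-07-01"),
    ("PQR", "Joe", "2019-10-11"),
    ("XYZ", "Joe", "2019-10-01"),
    ("PQR", "Craig", "2019-06-06"),
    ("PQR", "Craig", "2019-06-20"),
]

def groups_with_latest_two_pqr(rows):
    """Return all rows of each group whose two latest records have A = 'PQR'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TAB (A TEXT, B TEXT, C TEXT)")
    conn.executemany("INSERT INTO TAB VALUES (?, ?, ?)", rows)
    query = """
        SELECT T.A, T.B, T.C
        FROM TAB T
        JOIN (
            SELECT B
            FROM (
                SELECT A, B,
                       ROW_NUMBER() OVER (PARTITION BY B ORDER BY C DESC) AS RN
                FROM TAB
            )
            WHERE RN IN (1, 2)
            GROUP BY B
            -- Exactly two latest rows, and both of them are 'PQR'.
            HAVING COUNT(*) = 2 AND MIN(A) = 'PQR' AND MAX(A) = 'PQR'
        ) G ON G.B = T.B
        ORDER BY T.B, T.C DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

for row in groups_with_latest_two_pqr(ROWS):
    print(row)
```

With the sample data this returns every row for Craig and Mark, and none for Joe, whose second-latest record is XYZ.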
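A simpler query is possible if you only need the two most recent PQR rows across the whole table, rather than a check per group:
SELECT A, B, C
FROM tab
WHERE A = 'PQR'
ORDER BY C DESC FETCH FIRST 2 ROWS ONLY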

calculate rank without using rank or rownums function by using single column

Do not use any functions like rank or rownums.
Hint: Formulate matrix operation using sql. A rank of an item indicates how many items are less than or equal to it.
A matrix can be simulated by cross join and rank can be derived by
counting items smaller than the current item.
Table A:-
x
----
d
b
a
g
c
k
k
g
Expected output:
x1 | rank
----+------
a | 1
b | 2
d | 3
g | 4
c | 5
k | 6
select x as x1, count(x) as rank
from (select DISTINCT x from A order by x) as sub
Your current query is on the right track, using a distinct subquery. For a working version, use a correlated subquery in the select clause which takes counts:
SELECT
x AS x1,
(SELECT COUNT(DISTINCT x) FROM A t WHERE t.x <= sub.x) rank
FROM (SELECT DISTINCT x FROM A) AS sub
ORDER BY
x;
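To see the counting idea in action, here is a small sketch that runs the correlated subquery against an in-memory SQLite database. The function name and the rnk alias are illustrative choices. The subquery involves no window function, so any reasonably recent SQLite should do.

```python
import sqlite3

def rank_without_window(values):
    """Rank distinct values by counting how many distinct values are <= each one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (x TEXT)")
    conn.executemany("INSERT INTO A VALUES (?)", [(v,) for v in values])
    query = """
        SELECT sub.x AS x1,
               (SELECT COUNT(DISTINCT t.x) FROM A t WHERE t.x <= sub.x) AS rnk
        FROM (SELECT DISTINCT x FROM A) AS sub
        ORDER BY sub.x
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question, duplicates included.
print(rank_without_window(["d", "b", "a", "g", "c", "k", "k", "g"]))
```

For the sample data the ranks follow alphabetical order, so c gets rank 3; the expected output listed in the question places c after g, which this query does not reproduce.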

Top N values in window frame

I have a table t with 3 fields of interest:
d (date), pid (int), and score (numeric)
I am trying to calculate a 4th field that is an average of each player's top N (3 or 5) scores for the days before the current row.
I tried the following join on a subquery but it is not producing the results I'm looking for:
SELECT t.d, t.pid, t.score, sq.highscores
FROM t, (SELECT *, avg(score) as highscores FROM
(SELECT *, row_number() OVER w AS rnum
FROM t AS t2
WINDOW w AS (PARTITION BY pid ORDER BY score DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)) isq
WHERE rnum <= 3) sq
WHERE t.d = sq.d AND t.pid = sq.pid
Any suggestions would be greatly appreciated! I'm a hobbyist programmer and this is more complex of a query than I'm used to.
You can't select * and avg(score) in the same (inner) query: which non-aggregated values should be returned for each average? PostgreSQL won't make that decision for you.
Because you PARTITION BY pid in the innermost query, you should GROUP BY pid in the aggregating subquery. That way, you can SELECT pid, avg(score) as highscores:
SELECT pid, avg(score) as highscores
FROM (SELECT *, row_number() OVER w AS rnum
FROM t AS t2
WINDOW w AS (PARTITION BY pid ORDER BY score DESC)) isq
WHERE rnum <= 3
GROUP BY pid
Note: ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING makes no difference for row_number().
But if N is fixed (and N will be small in your real-world use case too), you can solve this without nesting subqueries by using the nth_value() window function:
SELECT d, pid, score,
(coalesce(nth_value(score, 1) OVER w, 0) +
coalesce(nth_value(score, 2) OVER w, 0) +
coalesce(nth_value(score, 3) OVER w, 0)) /
((nth_value(score, 1) OVER w IS NOT NULL)::int +
(nth_value(score, 2) OVER w IS NOT NULL)::int +
(nth_value(score, 3) OVER w IS NOT NULL)::int) highscores
FROM t
WINDOW w AS (PARTITION BY pid ORDER BY score DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
http://rextester.com/GUUPO5148
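Here is a hedged sketch of the nth_value() variant, adapted to SQLite (3.25 or later assumed) so it can be run from Python. SQLite has no ::int cast, but an IS NOT NULL test already evaluates to 0 or 1 there, so the casts are simply dropped. The function name and sample rows are made up for illustration.

```python
import sqlite3

def top3_average(rows):
    """Attach to every row the average of that player's top three scores."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (d TEXT, pid INTEGER, score REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT d, pid, score,
               (COALESCE(NTH_VALUE(score, 1) OVER w, 0) +
                COALESCE(NTH_VALUE(score, 2) OVER w, 0) +
                COALESCE(NTH_VALUE(score, 3) OVER w, 0)) /
               ((NTH_VALUE(score, 1) OVER w IS NOT NULL) +
                (NTH_VALUE(score, 2) OVER w IS NOT NULL) +
                (NTH_VALUE(score, 3) OVER w IS NOT NULL)) AS highscores
        FROM t
        WINDOW w AS (PARTITION BY pid ORDER BY score DESC
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
        ORDER BY pid, d
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical data: player 1 has four scores, player 2 only two.
sample_scores = [
    ("2024-01-01", 1, 10.0),
    ("2024-01-02", 1, 40.0),
    ("2024-01-03", 1, 20.0),
    ("2024-01-04", 1, 30.0),
    ("2024-01-01", 2, 5.0),
    ("2024-01-02", 2, 7.0),
]
for row in top3_average(sample_scores):
    print(row)
```

Because the frame spans the whole partition, every row of a player carries the same average, computed over all of that player's scores. Restricting it to days before the current row, as the question asks, would need a different frame or a self-join.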

how to join two tables without repetition of the cells from second table in postgresql using PLSQL

When I try to join the two tables below, I cannot get the output I want.
I tried using a join, but it didn't work. Let me know if this is possible with PL/SQL.
Table 1:
col1 col2
1 a
1 b
1 c
2 a
2 b
3 a
table 2:
col1 col2
1 x
1 y
2 x
2 y
3 x
3 y
The output must be:
col1 col2 col3
1 a x
1 b y
1 c
2 a x
2 b y
3 a x
3 y
If I use a join, I do not get the output shown above.
The output I am getting is
1 a x
1 a y
1 b x
1 b y
1 c x
1 c y
2 a x
.....
.....
3 a x
3 a y
What you are looking for is called a FULL OUTER JOIN. Its result contains rows from both input tables, with matching records combined.
You can find more information here: https://stackoverflow.com/questions/4796872/full-outer-join-in-mysql
Using window functions, specifically ROW_NUMBER() partitioned by col1 in both tables, we can get a per-group row number that can be used as part of the join.
In other words, it seems to me that the order of the records is crucial for the join and for the result set you want. Combining that with @Benvorth's suggestion of a FULL OUTER JOIN, to get NULLs in both directions, I believe this might work:
SELECT
COALESCE(t1.col1,t2.col1) as col1,
t1.col2,
t2.col2 AS col3
FROM
(SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) as col1_row_number FROM table1) t1
FULL OUTER JOIN
(SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) as col1_row_number FROM table2) t2 ON
t1.col1 = t2.col1 AND
t1.col1_row_number = t2.col1_row_number
The ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) expression assigns a row number to each record, restarting at 1 for each new col1 value encountered. You can think of it as a rank within each distinct col1 value, ordered by col2. For table1, the output of the subquery SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) as col1_row_number FROM table1 will look like:
Table 1:
col1 col2 col1_row_number
1 a 1
1 b 2
1 c 3
2 a 1
2 b 2
3 a 1
So we do that with both tables, then we use that row number as part of the join along with col1.
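The row-number join can be tried out with the sketch below, which uses SQLite from Python. It assumes SQLite 3.25 or later for ROW_NUMBER(). Since FULL OUTER JOIN is only available in newer SQLite releases, the sketch emulates it with a LEFT JOIN plus a UNION ALL of the unmatched table2 rows; that emulation is a substitution for the sketch, not part of the answer above, and the function name is hypothetical.

```python
import sqlite3

def pairwise_join(table1_rows, table2_rows):
    """Pair rows of two tables by col1 and by their position within each col1 group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT)")
    conn.execute("CREATE TABLE table2 (col1 INTEGER, col2 TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", table1_rows)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", table2_rows)
    query = """
        WITH t1 AS (
            SELECT col1, col2,
                   ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS rn
            FROM table1
        ),
        t2 AS (
            SELECT col1, col2,
                   ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS rn
            FROM table2
        )
        -- Every table1 row, with its table2 partner if one exists.
        SELECT t1.col1 AS col1, t1.col2 AS col2, t2.col2 AS col3, t1.rn AS rn
        FROM t1
        LEFT JOIN t2 ON t1.col1 = t2.col1 AND t1.rn = t2.rn
        UNION ALL
        -- Table2 rows that found no partner in table1.
        SELECT t2.col1, NULL, t2.col2, t2.rn
        FROM t2
        LEFT JOIN t1 ON t1.col1 = t2.col1 AND t1.rn = t2.rn
        WHERE t1.col1 IS NULL
        ORDER BY col1, rn
    """
    # Drop the helper row number before returning.
    result = [(c1, c2, c3) for c1, c2, c3, _ in conn.execute(query)]
    conn.close()
    return result

table1 = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"), (3, "a")]
table2 = [(1, "x"), (1, "y"), (2, "x"), (2, "y"), (3, "x"), (3, "y")]
for row in pairwise_join(table1, table2):
    print(row)
```

Missing cells come back as None (NULL), which corresponds to the blank cells in the desired output.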

GROUP BY getting the second highest date

I'm currently doing this group by to retrieve the max date :
SELECT A, MAX(B) FROM X GROUP BY A
This is perfectly working. However, when I try to retrieve the second highest value, I'm totally lost.
If anyone has an idea...
Try this:
SELECT X.A,
MAX(X.B)
FROM YourTable X
JOIN
(
SELECT
X1.A,
MAX(X1.B) AS B
FROM YourTable X1
GROUP BY X1.A
) X1 ON X1.A = X.A
AND X.B < X1.B
GROUP BY X.A
Basically this says get the max of all the ones that are less than the max.
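A minimal sketch of this "max of everything below the max" idea, run against SQLite from Python, might look as follows. The table and column names follow the question (X, A, B); the derived-table alias m, the max_b and second_highest column names, the function name, and the sample dates are all made up for illustration.

```python
import sqlite3

def second_highest_per_group(rows):
    """Return, per group A, the largest B that is strictly below the group's maximum."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE X (A TEXT, B TEXT)")
    conn.executemany("INSERT INTO X VALUES (?, ?)", rows)
    query = """
        SELECT X.A, MAX(X.B) AS second_highest
        FROM X
        JOIN (
            -- Highest value per group.
            SELECT A, MAX(B) AS max_b
            FROM X
            GROUP BY A
        ) m ON m.A = X.A AND X.B < m.max_b
        GROUP BY X.A
        ORDER BY X.A
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical data with ISO dates, which compare correctly as text.
sample_dates = [
    ("a", "2020-01-01"), ("a", "2020-02-01"), ("a", "2020-03-01"),
    ("b", "2021-05-05"), ("b", "2021-06-06"),
    ("c", "2022-01-01"),
]
print(second_highest_per_group(sample_dates))
```

A group with only one distinct value, such as c here, has nothing below its maximum and therefore drops out of the result.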
You can use the ranking function ROW_NUMBER in a cte:
WITH CTE AS
(
SELECT A,
B,
RN = ROW_NUMBER() OVER (PARTITION BY A ORDER BY B DESC)
FROM dbo.X
)
SELECT A, B
FROM CTE
WHERE RN <= 2
This will return the two highest values for each group (if that is what you want). Use WHERE RN = 2 instead to return only the second highest.
Your columns are rather ambiguous, but if A is the date and B is some other value you wish to sort by, then one way to do it could be:
SELECT A FROM X ORDER BY B DESC LIMIT 2
This will give you 2 rows, with the highest displayed first and the second highest after it.