I have a simple but very important concept to clarify in T-SQL.
I am writing a lot of T-SQL queries against a table, with many aggregations and GROUP BY clauses.
In the SELECT clause of my query, I have a CASE-WHEN expression. Please see below:
Statement 1:
SELECT X, Y, Z,
A = CASE
WHEN P = 1 THEN B
ELSE Q
END,
SUM(Sales)
FROM mytable
GROUP BY
X, Y, Z,
CASE
WHEN P = 1 THEN B
ELSE Q
END
Can Statement 1 be rewritten as Statement 2?
Statement 2:
SELECT X, Y, Z,
A = CASE
WHEN P = 1 THEN B
ELSE Q
END,
SUM(Sales)
FROM mytable
GROUP BY
X, Y, Z,
P, B, Q
Is Statement 1 equivalent to Statement 2?
Can the CASE-WHEN expression in the GROUP BY clause be replaced by the individual columns it uses?
Will the result set always be the same?
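No, not in general. The difference depends on how many distinct combinations of P, B and Q your data contains compared with the number of distinct values your CASE expression produces: grouping by P, B and Q can split one CASE value into several groups. You can spot the difference in this example: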
IF OBJECT_ID('tempdb..#Data') IS NOT NULL
DROP TABLE #Data
CREATE TABLE #Data (
P INT,
B INT,
Q INT,
Sales INT)
INSERT INTO #Data (
P,
B,
Q,
Sales)
VALUES
(1, 20, 300, 1000),
(1, 20, 400, 500),
(2, 1, 1, 50),
(2, 1, 1, 250)
-- Statement 2
SELECT
P,
B,
Q,
TotalSales = SUM(D.Sales)
FROM
#Data AS D
GROUP BY
P,
B,
Q
/*
All distinct combinations of P, B and Q are listed, with their sales summed
P B Q TotalSales
1 20 300 1000
1 20 400 500
2 1 1 300
*/
-- Statement 1
SELECT
CaseResult = CASE WHEN P = 1 THEN B ELSE Q END,
TotalSales = SUM(D.Sales)
FROM
#Data AS D
GROUP BY
CASE WHEN P = 1 THEN B ELSE Q END
/*
The grouping value depends on B when P = 1 (and not on Q!), so
all records with P = 1 and the same B are grouped together, and
all records where P is not 1 (here P = 2) and with the same Q are grouped together
(a B value that happens to equal a Q value would also share a group)
CaseResult TotalSales
1 300
20 1500
*/
If your data happens to map each CASE result to exactly one combination of P, B and Q (within each X, Y, Z), both queries return the same results. That depends on the data, though, not on the queries, so you cannot rely on it in general.
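The comparison can be reproduced outside SQL Server with a minimal sketch in Python's built-in sqlite3 module. This is an assumed stand-in, not T-SQL: the temp table #Data becomes an ordinary in-memory table named data. Both queries share the question's SELECT list; only the GROUP BY differs.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server's #Data temp table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (P INT, B INT, Q INT, Sales INT)")
conn.executemany(
    "INSERT INTO data (P, B, Q, Sales) VALUES (?, ?, ?, ?)",
    [(1, 20, 300, 1000), (1, 20, 400, 500), (2, 1, 1, 50), (2, 1, 1, 250)],
)

# Statement 1 shape: group by the CASE expression itself.
by_case = conn.execute(
    """
    SELECT CASE WHEN P = 1 THEN B ELSE Q END AS CaseResult,
           SUM(Sales) AS TotalSales
    FROM data
    GROUP BY CASE WHEN P = 1 THEN B ELSE Q END
    ORDER BY CaseResult
    """
).fetchall()

# Statement 2 shape: same SELECT list, but group by the underlying columns.
by_columns = conn.execute(
    """
    SELECT CASE WHEN P = 1 THEN B ELSE Q END AS CaseResult,
           SUM(Sales) AS TotalSales
    FROM data
    GROUP BY P, B, Q
    ORDER BY CaseResult, TotalSales
    """
).fetchall()

print(by_case)     # [(1, 300), (20, 1500)]
print(by_columns)  # [(1, 300), (20, 500), (20, 1000)]
```

Grouping by the columns splits the CASE value 20 into two rows, because it comes from two different (P, B, Q) combinations.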
Related
From a starting table, let's say:
A | B | C
--+---+---
1 | 1 | 99
2 | 2 | 88
3 | 3 | 77
I'm trying to write a query that returns a table where column C follows this rule: when A is 2, C should be its existing value plus the value of C from the row where A is 1. Here's the desired result:
A | B | C
--+---+----
1 | 1 | 99
2 | 2 | 187
3 | 3 | 77
I'm unsure whether grouping makes sense here, especially since there might be multiple similar criteria. The closest query I could come up with is:
SELECT A, B, C+(SELECT C FROM table1 WHERE A=1 LIMIT 1) FROM table1 WHERE A=2;
but this isn't valid SQL, since subqueries can't be used like this. Any suggestions are welcome, even if they involve somehow altering the structure of the original table.
Consider the approach below (tested in BigQuery):
select a, b, c +
case a
when 2 then sum(if(a = 1, c, 0)) over()
else 0
end c
from your_table
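Applied to the sample data in your question, this returns the expected result: the row with A = 2 gets C = 88 + 99 = 187, and the other rows are unchanged.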
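As a rough check outside BigQuery, the same window-function idea can be run in SQLite through Python's sqlite3 module. Assumptions: BigQuery's IF(...) is rewritten as an equivalent CASE expression for portability, and the SQLite build must be 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (a INT, b INT, c INT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)",
                 [(1, 1, 99), (2, 2, 88), (3, 3, 77)])

# SUM(...) OVER () totals c over the rows where a = 1 (the CASE replaces
# BigQuery's IF); only the a = 2 row adds that total to its own c.
rows = conn.execute(
    """
    SELECT a, b,
           c + CASE a
                 WHEN 2 THEN SUM(CASE WHEN a = 1 THEN c ELSE 0 END) OVER ()
                 ELSE 0
               END AS c
    FROM your_table
    ORDER BY a
    """
).fetchall()
print(rows)  # [(1, 1, 99), (2, 2, 187), (3, 3, 77)]
```

Because SUM(...) OVER () adds up c for every row with a = 1, several such rows would all be added together, whereas a scalar subquery expects a single row.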
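Alternatively, use a CASE expression with a scalar subquery (the subquery must return at most one row):
SELECT
A,
B,
CASE
WHEN A = 2 THEN C + (SELECT C FROM table1 WHERE A = 1)
ELSE C
END AS C
FROM
table1;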
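A runnable sketch of the scalar-subquery version, again using SQLite through Python's sqlite3 module as an assumed stand-in database, with the question's table name table1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (A INT, B INT, C INT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(1, 1, 99), (2, 2, 88), (3, 3, 77)])

# The scalar subquery should return one row. If several rows had A = 1,
# SQLite would silently use the first one, while SQL Server or PostgreSQL
# would raise an error instead.
rows = conn.execute(
    """
    SELECT A, B,
           CASE WHEN A = 2 THEN C + (SELECT C FROM table1 WHERE A = 1)
                ELSE C
           END AS C
    FROM table1
    ORDER BY A
    """
).fetchall()
print(rows)  # [(1, 1, 99), (2, 2, 187), (3, 3, 77)]
```

If no row has A = 1, the subquery yields NULL, and so does C for the A = 2 row.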
I know the title is vague at best, but I can't find a better way to describe my problem.
For example, I have the following two tables:
TableA
IdA | Code | Value
----+------+------
123 | A    | 1
123 | B    | 2
123 | C    | 3
456 | A    | 4
456 | F    | 6
456 | E    | 7
...
TableB
IdB | Code | Value
----+------+------
X   | A    | 1
X   | B    | 2
X   | C    | 3
Y   | G    | 2
Y   | D    | 8
Y   | C    | 3
Z   | A    | 1
Z   | B    | 2
Z   | C    | 3
Z   | D    | 5
...
A set of records for a given IdA in TableA correlates to an equivalent set of records in TableB having a specific IdB.
For instance, IdA = 123 in TableA has exactly three rows with certain codes and values. These would "map" to the rows with IdB = X in TableB, because X has the same combination of codes and values and the same number of rows. Note that they would not map to IdB = Z, because Z has an additional row for code D, which IdA = 123 doesn't have in TableA.
Given only IdA, what is the best way to write a query that finds IdB?
If the codes and values were known, I could have done something similar to this:
SELECT b.IdB FROM TableB b
WHERE
EXISTS(SELECT * FROM TableB x WHERE x.IdB = b.IdB AND x.Code = 'A' AND x.Value = '1') AND
EXISTS(SELECT * FROM TableB x WHERE x.IdB = b.IdB AND x.Code = 'B' AND x.Value = '2') AND
EXISTS(SELECT * FROM TableB x WHERE x.IdB = b.IdB AND x.Code = 'C' AND x.Value = '3') AND
(SELECT COUNT(*) FROM TableB x WHERE x.IdB = b.IdB) = 3
But now I'm only given a value for IdA, so I need to look up values from TableA and combine that in the query for TableB. Any clever ideas on how to tackle this?
This is a question of Relational Division Without Remainder.
There are many solutions; here is one:
Take TableB and LEFT JOIN the TableA rows for the given IdA to it,
but also calculate a total count over the whole set of rows from A.
Group by IdB.
Filter to the groups where the row count equals the number of matches to A (COUNT(a.IdA) counts only non-NULLs, i.e. only matched rows), and where the row count also equals the total number of A rows we want to match.
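DECLARE @idA int = 123;
SELECT
b.IdB
FROM TableB b
LEFT JOIN (
-- rows of TableA for the given IdA, each tagged with the total row count
SELECT *,
total = COUNT(*) OVER ()
FROM TableA a
WHERE a.IdA = @idA
) a ON b.Code = a.Code AND b.Value = a.Value
GROUP BY
b.IdB
-- every TableB row in the group matched, and the group size equals the A row count
HAVING COUNT(*) = COUNT(a.IdA)
AND COUNT(*) = MIN(a.total);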
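The same relational-division query can be exercised end to end with a minimal sketch in Python's sqlite3 module (SQLite 3.25+ assumed for COUNT(*) OVER ()). The T-SQL variable @idA becomes a bound ? parameter, the sample rows are the ones shown in the question (without the elided rows), and the helper find_idb is hypothetical. It assumes each (Code, Value) pair appears at most once per IdA and per IdB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (IdA INT, Code TEXT, Value INT)")
conn.execute("CREATE TABLE TableB (IdB TEXT, Code TEXT, Value INT)")
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?)", [
    (123, "A", 1), (123, "B", 2), (123, "C", 3),
    (456, "A", 4), (456, "F", 6), (456, "E", 7),
])
conn.executemany("INSERT INTO TableB VALUES (?, ?, ?)", [
    ("X", "A", 1), ("X", "B", 2), ("X", "C", 3),
    ("Y", "G", 2), ("Y", "D", 8), ("Y", "C", 3),
    ("Z", "A", 1), ("Z", "B", 2), ("Z", "C", 3), ("Z", "D", 5),
])

QUERY = """
SELECT b.IdB
FROM TableB b
LEFT JOIN (
    SELECT a.*, COUNT(*) OVER () AS total   -- number of A rows to match
    FROM TableA a
    WHERE a.IdA = ?
) a ON b.Code = a.Code AND b.Value = a.Value
GROUP BY b.IdB
HAVING COUNT(*) = COUNT(a.IdA)              -- every B row found a match
   AND COUNT(*) = MIN(a.total)              -- and no A row is left over
ORDER BY b.IdB
"""

def find_idb(id_a):
    """Hypothetical helper: return the IdB values whose rows match id_a exactly."""
    return [row[0] for row in conn.execute(QUERY, (id_a,))]

print(find_idb(123))  # ['X']
print(find_idb(456))  # []
```

Y fails the first condition (only one of its rows matches), and Z fails the second (four rows against three A rows).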
Do not use any functions like rank or rownum.
Hint: formulate a matrix operation using SQL. The rank of an item indicates how many distinct items are less than or equal to it.
A matrix can be simulated with a cross join, and the rank can be derived by
counting the items less than or equal to the current item.
Table A:
x
----
d
b
a
g
c
k
k
g
Expected output:
x1 | rank
----+------
a | 1
b | 2
c | 3
d | 4
g | 5
k | 6
My attempt so far:
select x as x1, count(x) as rank
from (select DISTINCT x from A order by x) as sub
Your current query is on the right track in using a DISTINCT subquery. For a working version, add a correlated subquery in the SELECT clause that does the counting:
SELECT
x AS x1,
(SELECT COUNT(DISTINCT x) FROM A t WHERE t.x <= sub.x) rank
FROM (SELECT DISTINCT x FROM A) AS sub
ORDER BY
x;
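A minimal sketch, run through Python's sqlite3 module as an assumed stand-in, that checks the correlated-subquery answer against the cross-join "matrix" formulation from the hint. Neither query uses a ranking function. The alias rnk (instead of rank) is an arbitrary choice to sidestep dialects where RANK is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (x TEXT)")
conn.executemany("INSERT INTO A (x) VALUES (?)",
                 [(v,) for v in ["d", "b", "a", "g", "c", "k", "k", "g"]])

# The answer's approach: for each distinct x, count distinct values <= x.
correlated = conn.execute(
    """
    SELECT sub.x AS x1,
           (SELECT COUNT(DISTINCT t.x) FROM A t WHERE t.x <= sub.x) AS rnk
    FROM (SELECT DISTINCT x FROM A) AS sub
    ORDER BY sub.x
    """
).fetchall()

# The hint's "matrix": pair every distinct value with every other one
# (CROSS JOIN), keep pairs where s2.x <= s1.x, and count them per s1.x.
matrix = conn.execute(
    """
    SELECT s1.x AS x1, COUNT(*) AS rnk
    FROM (SELECT DISTINCT x FROM A) AS s1
    CROSS JOIN (SELECT DISTINCT x FROM A) AS s2
    WHERE s2.x <= s1.x
    GROUP BY s1.x
    ORDER BY s1.x
    """
).fetchall()

print(correlated)            # [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('g', 5), ('k', 6)]
print(matrix == correlated)  # True
```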
I have the following input table:
Source EventType
A X
A X
A X
A Y
A Y
A Z
B L
B L
B L
B L
B M
B N
B N
Expected output (the two most frequent event types for each source):
Source EventType Frequency
A X 3
A Y 2
B L 4
B N 2
How can I write a SQL query that produces the result shown above?
I was able to get the results, but only for one source at a time:
select TOP 2 eventtype, count(*) as frequency
from myEventTable
where source = 'A'
group by eventtype
order by count(*) desc
We can use ROW_NUMBER here:
WITH cte AS (
SELECT Source, EventType, COUNT(*) as Frequency,
ROW_NUMBER() OVER (PARTITION BY Source ORDER BY COUNT(*) DESC) rn
FROM myEventTable
GROUP BY Source, Eventtype
)
SELECT Source, EventType, Frequency
FROM cte
WHERE rn <= 2;
The reason this works is that ROW_NUMBER is applied after the GROUP BY operation completes, i.e. it runs against the groups. We can then easily limit to the top 2 per source, as ordered by frequency descending.
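To see the per-source top-2 logic run, here is a minimal sketch using Python's sqlite3 module as an assumed stand-in (SQLite 3.25+ for ROW_NUMBER); the sample rows are the question's input table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myEventTable (Source TEXT, EventType TEXT)")
events = ([("A", "X")] * 3 + [("A", "Y")] * 2 + [("A", "Z")]
          + [("B", "L")] * 4 + [("B", "M")] + [("B", "N")] * 2)
conn.executemany("INSERT INTO myEventTable VALUES (?, ?)", events)

# ROW_NUMBER is evaluated after GROUP BY, so it numbers the groups
# within each Source by their COUNT(*), highest first.
rows = conn.execute(
    """
    WITH cte AS (
        SELECT Source, EventType, COUNT(*) AS Frequency,
               ROW_NUMBER() OVER (PARTITION BY Source
                                  ORDER BY COUNT(*) DESC) AS rn
        FROM myEventTable
        GROUP BY Source, EventType
    )
    SELECT Source, EventType, Frequency
    FROM cte
    WHERE rn <= 2
    ORDER BY Source, Frequency DESC
    """
).fetchall()
print(rows)  # [('A', 'X', 3), ('A', 'Y', 2), ('B', 'L', 4), ('B', 'N', 2)]
```

ROW_NUMBER breaks ties on Frequency arbitrarily, so with two equally frequent event types only one would survive; using RANK() in place of ROW_NUMBER() would keep both.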
I have a table x with the fields a, b, c and d. I want a SELECT statement that is GROUPED BY a HAVING a_particular_value = ANY(array_agg(b)) and that retrieves a, MIN(d), and c, where c comes from the row that satisfies a_particular_value = ANY(array_agg(b)).
It's a bit confusing.
Let me try to explain. a_particular_value = ANY(array_agg(b)) matches one or more of the records in each group of a. I want to retrieve the value of c from the record that makes the condition true, while NOT filtering out the other records, because I still need them for the other aggregate function, MIN(d).
Here is the query I've tried so far:
SELECT a, MIN(d) FROM x
GROUP BY a
HAVING 1 = ANY(array_agg(b))
The only thing that's left to do is put c in the SELECT clause. How do I do this?
with agg as (
select a, min(d) as d
from x
group by a
having 1 = any(array_agg(b))
)
select distinct on (a)
a, c, d
from
x
inner join
agg using (a, d)
order by a, c
If min(d) is not unique within an a group, more than one corresponding c may exist. The query above returns the smallest c; if you want the biggest, use this instead:
order by a, c desc
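The answer above takes c from the row(s) holding MIN(d) in each group. If you instead want c from the row where b matches the value, a different technique, conditional aggregation, computes both in one pass without filtering rows out. Below is a sketch in SQLite via Python's sqlite3 module; SQLite has no arrays, so 1 = ANY(array_agg(b)) is replaced by the portable SUM(CASE WHEN b = 1 THEN 1 ELSE 0 END) > 0, and the sample rows are hypothetical. In PostgreSQL you could write min(c) FILTER (WHERE b = 1) and bool_or(b = 1) instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (a INT, b INT, c TEXT, d INT)")
# Hypothetical sample rows: group a = 2 has no row with b = 1.
conn.executemany("INSERT INTO x VALUES (?, ?, ?, ?)", [
    (1, 1, "c1", 10), (1, 2, "c2", 5),
    (2, 3, "c3", 7), (2, 4, "c4", 8),
    (3, 1, "c5", 2),
])

rows = conn.execute(
    """
    SELECT a,
           MIN(d) AS min_d,                          -- over all rows of the group
           MIN(CASE WHEN b = 1 THEN c END) AS c_b1   -- only rows where b = 1
    FROM x
    GROUP BY a
    HAVING SUM(CASE WHEN b = 1 THEN 1 ELSE 0 END) > 0  -- stand-in for 1 = ANY(array_agg(b))
    ORDER BY a
    """
).fetchall()
print(rows)  # [(1, 5, 'c1'), (3, 2, 'c5')]
```

MIN ignores the NULLs produced by the CASE for non-matching rows; if several rows in a group have b = 1, it picks the smallest of their c values.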
Since c can have several values within a group in this scenario, one option is to group by c as well:
SELECT a, c FROM x
GROUP BY a, c
HAVING 1 = ANY(array_agg(b))
If you want to eliminate the rows whose b does not satisfy the condition before GROUP BY is applied, use WHERE instead, as the documentation for HAVING explains: http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-HAVING
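The WHERE/HAVING distinction matters for MIN(d), as the question notes. A small SQLite sketch (Python's sqlite3 module, hypothetical rows, and a portable stand-in for 1 = ANY(array_agg(b)), since SQLite has no arrays) compares the two placements of the condition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (a INT, b INT, c TEXT, d INT)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO x VALUES (?, ?, ?, ?)", [
    (1, 1, "c1", 10), (1, 2, "c2", 5),
    (2, 3, "c3", 7), (2, 4, "c4", 8),
    (3, 1, "c5", 2),
])

# WHERE removes rows with b <> 1 before grouping, so MIN(d) only sees b = 1 rows.
where_rows = conn.execute(
    "SELECT a, MIN(d) FROM x WHERE b = 1 GROUP BY a ORDER BY a"
).fetchall()

# HAVING keeps every row for the aggregate and only discards whole groups.
having_rows = conn.execute(
    """
    SELECT a, MIN(d) FROM x
    GROUP BY a
    HAVING SUM(CASE WHEN b = 1 THEN 1 ELSE 0 END) > 0
    ORDER BY a
    """
).fetchall()

print(where_rows)   # [(1, 10), (3, 2)]
print(having_rows)  # [(1, 5), (3, 2)]
```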