Does Redshift support User-Defined Variables in SELECT?

I'm reviewing some of our Redshift queries and found cases with multiple levels of nested select like the one below:
LEFT JOIN
(
SELECT *
FROM (
SELECT
id,
created_at,
min(created_at) OVER (PARTITION BY id, slug) AS transition_date
FROM table
WHERE status = 'cancelled'
GROUP BY id, Y, Z, created_at
)
WHERE created_at = transition_date
) t1 ON b.id = t1.id
If this were MySQL, I would have done something like this to remove one level of nested SELECT:
LEFT JOIN
(
SELECT
id,
created_at,
@tdate := min(created_at) OVER (PARTITION BY id, slug) AS transition_date
FROM table
WHERE status = 'cancelled' and @tdate = bul.created_at
GROUP BY id, Y, Z, created_at
) t1 ON b.id = t1.id
Is it possible to do something similar in Redshift?
UPDATE:
I forgot to include GROUP BY in the nested SELECT, which may affect the answer.

You can move the condition for the transition_date into the JOIN condition:
LEFT JOIN
(
SELECT
id,
created_at,
min(created_at) OVER (PARTITION BY id, slug) AS transition_date
FROM table
WHERE status = 'cancelled'
) t1 ON b.id = t1.id AND t1.created_at = t1.transition_date
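To see the effect of moving the filter into the JOIN condition, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Redshift (it needs an SQLite build with window functions, 3.25 or later). The table name `events`, the outer table `b`, and the sample rows are assumptions made up for the illustration; they are not from the original query.

```python
import sqlite3

# Hypothetical stand-ins: `b` is the outer table, `events` plays the role of `table`.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER);
    CREATE TABLE events (id INTEGER, slug TEXT, status TEXT, created_at TEXT);
    INSERT INTO b VALUES (1), (2);
    INSERT INTO events VALUES
        (1, 'x', 'cancelled', '2020-01-01'),
        (1, 'x', 'cancelled', '2020-02-01'),
        (1, 'x', 'active',    '2019-12-01');
""")

# One level of nesting: the "earliest row only" filter lives in the ON clause.
rows = conn.execute("""
    SELECT b.id, t1.created_at
    FROM b
    LEFT JOIN (
        SELECT id,
               created_at,
               MIN(created_at) OVER (PARTITION BY id, slug) AS transition_date
        FROM events
        WHERE status = 'cancelled'
    ) t1 ON b.id = t1.id AND t1.created_at = t1.transition_date
    ORDER BY b.id
""").fetchall()

print(rows)  # [(1, '2020-01-01'), (2, None)]
```

Because the filter only touches columns of the right-hand side of the LEFT JOIN, rows of `b` without a match (id 2 here) are still returned with NULLs, the same as with the extra nested SELECT.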

Related

Delete duplicate rows with Active flag false in Postgresql

I have a table with columns "ID", "Name", "Email", and "Active". I added some duplicate values to the table.
I want to delete only the duplicate rows whose Active flag is false, not every row whose Active flag is false. In the table I want to delete the 2nd row only.
You may try the correlated subquery below (before the actual delete, you might want to check the result with a SELECT query):
DELETE FROM YOUR_TABLE T1
WHERE EXISTS (SELECT NULL
FROM YOUR_TABLE T2
WHERE T1.ID = T2.ID
AND T1.NAME = T2.NAME
AND T1.EMAIL = T2.EMAIL
AND T1.ACTIVE <> T2.ACTIVE)
AND UPPER(T1.ACTIVE) = 'FALSE'
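Try the query below. Note that DELETE t1 FROM ... INNER JOIN is MySQL multi-table syntax; PostgreSQL does not accept it and uses DELETE ... USING instead: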
DELETE t1 FROM tablename t1
INNER JOIN tablename t2
WHERE t1.id > t2.id AND t1.Name = t2.Name AND t1.Email =t2.Email AND t1.Active='FALSE'
DELETE FROM users T1
USING users T2
WHERE T1.ID <> T2.ID
AND T1.Name = T2.Name
AND T1.Email = T2.Email
AND T1.Active = FALSE;
DEMO
CREATE TABLE IF NOT EXISTS users (
ID serial PRIMARY KEY,
Name VARCHAR ( 50 ) NOT NULL,
Email VARCHAR ( 50 ) NOT NULL,
Active BOOLEAN NOT NULL
);
INSERT INTO users(Name, Email, Active) VALUES
('John', 'john.gmail.com', TRUE),
('John', 'john.gmail.com', FALSE),
('Bob', 'bob.gmail.com', FALSE);
SELECT * FROM users;
DELETE FROM users T1
USING users T2
WHERE T1.ID <> T2.ID
AND T1.Name = T2.Name
AND T1.Email = T2.Email
AND T1.Active = FALSE;
SELECT * FROM users;
DELETE FROM some_table
WHERE id IN (SELECT id FROM some_table GROUP BY id HAVING COUNT(*)>1)
AND NOT active;
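As a rough way to try the idea without a PostgreSQL server, the sketch below replays the demo data in Python's built-in sqlite3. SQLite has neither DELETE ... USING nor a BOOLEAN type, so this is an assumption-laden adaptation: Active is stored as 0/1 and the delete uses an EXISTS subquery that only removes an inactive row when an active row with the same Name and Email exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL,
        active INTEGER NOT NULL  -- SQLite has no BOOLEAN; 1 = true, 0 = false
    );
    INSERT INTO users (name, email, active) VALUES
        ('John', 'john.gmail.com', 1),
        ('John', 'john.gmail.com', 0),
        ('Bob',  'bob.gmail.com',  0);
""")

# Delete an inactive row only if an active twin (same name and email) exists.
conn.execute("""
    DELETE FROM users
    WHERE active = 0
      AND EXISTS (
          SELECT 1
          FROM users AS t2
          WHERE t2.name = users.name
            AND t2.email = users.email
            AND t2.active = 1
      )
""")

remaining = conn.execute(
    "SELECT id, name, active FROM users ORDER BY id"
).fetchall()
print(remaining)  # [(1, 'John', 1), (3, 'Bob', 0)]
```

Bob's row survives even though it is inactive, because it has no duplicate. Requiring an active twin also means two inactive duplicates would both be kept, which may or may not be what you want.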

How to optimise a SQL query to check for consistency of column values across tables

I would like to check across multiple tables that the same keys / same number of keys are present in each of the tables.
Currently I have created a solution that checks the count of keys per individual table, checks the count of keys when all tables are merged together, then compares.
This solution works but I wonder if there is a more optimal solution...
Example solution as it stands:
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_a;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_b;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_c;
SELECT COUNT(DISTINCT a.variable) AS num_ids
FROM (SELECT DISTINCT VARIABLE FROM table_a) a
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_b) b ON a.variable = b.variable
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_c) c ON a.variable = c.variable;
UPDATE:
The difficulty that I'm facing putting this together in one query is that any of the tables might not be unique on the VARIABLE that I am looking to check, so I've had to use DISTINCT before merging to avoid expanding the join.
Since we are only counting, I think there is no need in joining the tables on the variable column. A UNION should be enough.
We still have to use DISTINCT to ignore/suppress duplicates, which often means extra sort.
An index on variable should help for getting counts for separate tables, but it will not help for getting the count of the combined table.
Here is an example for comparing two tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_AB
AS
(
SELECT COUNT(DISTINCT variable) AS CountAB
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
) AS AB
)
SELECT
CASE WHEN CountA = CountAB AND CountB = CountAB
THEN 'same' ELSE 'different' END AS ResultAB
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_AB
;
Three tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_C
AS
(
SELECT COUNT(DISTINCT variable) AS CountC
FROM TableC
)
,CTE_ABC
AS
(
SELECT COUNT(DISTINCT variable) AS CountABC
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableC
) AS AB
)
SELECT
CASE WHEN CountA = CountABC AND CountB = CountABC AND CountC = CountABC
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
CROSS JOIN CTE_ABC
;
I deliberately chose CTEs because, as far as I know, Postgres materializes each CTE, and in our case each CTE will have only one row. (Since PostgreSQL 12, a side-effect-free CTE that is referenced only once may be inlined instead unless it is declared MATERIALIZED.)
Using array_agg with ORDER BY is an even better variant, if it is available on Redshift. You'll still need to use DISTINCT, but you don't have to merge all tables together.
WITH
CTE_A
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS A
FROM TableA
)
,CTE_B
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS B
FROM TableB
)
,CTE_C
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS C
FROM TableC
)
SELECT
CASE WHEN A = B AND B = C
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
;
Well, here is probably the nastiest piece of SQL I could build for you :) I will forever deny that I wrote this and claim that my Stack Overflow account was hacked ;)
SELECT
'All OK'
WHERE
( SELECT COUNT(DISTINCT id) FROM table_a ) = ( SELECT COUNT(DISTINCT id) FROM table_b )
AND ( SELECT COUNT(DISTINCT id) FROM table_b ) = ( SELECT COUNT(DISTINCT id) FROM table_c )
By the way, this won't optimise the query - it's still doing three queries (but I guess it's better than 4?).
UPDATE: In light of your use-case below: NEW sql fiddle http://sqlfiddle.com/#!15/a0403/1
SELECT DISTINCT
tbl_a.a_count,
tbl_b.b_count,
tbl_c.c_count
FROM
( SELECT COUNT(id) a_count, array_agg(id order by id) ids FROM table_a) tbl_a,
( SELECT COUNT(id) b_count, array_agg(id order by id) ids FROM table_b) tbl_b,
( SELECT COUNT(id) c_count, array_agg(id order by id) ids FROM table_c) tbl_c
WHERE
tbl_a.ids = tbl_b.ids
AND tbl_b.ids = tbl_c.ids
The above query returns a row only if all three tables contain exactly the same ids (and therefore the same number of rows); otherwise it returns nothing.
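The count-based check from the UNION ALL answer can be sketched as a small helper. This uses Python's built-in sqlite3 rather than Postgres or Redshift, and the function name `same_keys` and the sample rows are assumptions made up for the illustration. The idea is unchanged: if every table's distinct count equals the distinct count of all tables stacked together, the key sets are identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (variable INTEGER);
    CREATE TABLE table_b (variable INTEGER);
    CREATE TABLE table_c (variable INTEGER);
    INSERT INTO table_a VALUES (1), (2), (2), (3);  -- duplicates are allowed
    INSERT INTO table_b VALUES (3), (1), (2);
    INSERT INTO table_c VALUES (1), (2), (4);
""")

def same_keys(conn, tables):
    """Return True when every table holds the same set of `variable` values."""
    # Table names are interpolated directly, so they must come from trusted code.
    counts = [
        conn.execute(f"SELECT COUNT(DISTINCT variable) FROM {t}").fetchone()[0]
        for t in tables
    ]
    # UNION ALL keeps duplicates; the single outer DISTINCT removes them once.
    union = " UNION ALL ".join(f"SELECT variable FROM {t}" for t in tables)
    combined = conn.execute(
        f"SELECT COUNT(DISTINCT variable) FROM ({union})"
    ).fetchone()[0]
    return all(c == combined for c in counts)

print(same_keys(conn, ["table_a", "table_b"]))             # True
print(same_keys(conn, ["table_a", "table_b", "table_c"]))  # False
```

table_c has the same number of distinct keys as the others (three), but the stacked count rises to four, which is what exposes the mismatch.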

How to use a Table type in query

I have 9000 rows in the News table and use this code to select 20 of them:
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in(Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
) as myTable
where myTable.Num BETWEEN 100 and 120
But it takes 28 seconds to read the results! When I test the query without the joined table, I get the result in 1 second.
So I want to use a table type to hold the joined table and use that in the query. I created a new table type and used it as follows:
DECLARE @MyTable2 IntListTable
Insert Into @MyTable2
Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in @MyTable2
) as myTable
where myTable.Num BETWEEN 100 and 120
But I get an error at:
SubjectID in @MyTable2
Error:
Incorrect syntax near '@MyTable2'.
Edit:
When I test my code with:
Select myTable.Title
or use this code instead of the joined table:
Where SubjectID in(13,14,20,21,25,24,26,24,28,29,30,54,55,60,47,98,99,65,14,20,33,666,987,254)
I get the result in 1 second.
But when I use this in the query:
Select myTable.MoreText
it takes 28 seconds to read. Why?
Try this,
Select x.Num
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in(Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
) x
where x.Num <21
WITH myTempTable as (Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in (SELECT SubjectID FROM myTempTable)
) as myTable
where myTable.Num BETWEEN 100 and 120
You can try the above query.
There is absolutely no need for a User-Defined Table Type in this query. It adds work but no actual benefit.
The problem is most likely the fact that you are using an IN list as those translate out to be an OR condition for each of the values. But an IN list isn't needed either.
This query can actually be simplified by rethinking it in terms of an INNER JOIN, which should be better as it will allow the Query Optimizer to do its job.
SELECT *
FROM (
SELECT nw.*, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS [Num]
FROM News nw
INNER JOIN (
MenuSubject
INNER JOIN Menu
ON MenuSubject.MenuID = Menu.MenuID
) ON MenuSubject.SubjectID = nw.SubjectID
) AS myTable
WHERE myTable.Num BETWEEN 100 AND 120;
One final simplification that can be made, though I doubt it is needed here since 9000 rows is almost no data at all, is to first dump the results to a local temporary table and then use that in the INNER JOIN:
CREATE TABLE #Subjects
(
SubjectID INT NOT NULL -- PRIMARY KEY -- test with and without PK to see if it helps
);
INSERT INTO #Subjects (SubjectID)
SELECT MenuSubject.SubjectID
FROM MenuSubject
INNER JOIN Menu
ON Menu.MenuID = MenuSubject.MenuID;
SELECT *
FROM (
SELECT nw.*, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS [Num]
FROM News nw
INNER JOIN #Subjects sub
ON sub.SubjectID = nw.SubjectID
) AS myTable
WHERE myTable.Num BETWEEN 100 AND 120;
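One thing worth checking before swapping the IN subquery for an INNER JOIN: the two forms only return the same rows when a SubjectID appears at most once in the joined result. The sketch below shows the difference on tiny made-up data, using Python's built-in sqlite3 instead of SQL Server; the `NewsID` column and all rows are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE News (NewsID INTEGER, SubjectID INTEGER, DateSend TEXT);
    CREATE TABLE Menu (MenuID INTEGER);
    CREATE TABLE MenuSubject (MenuID INTEGER, SubjectID INTEGER);
    INSERT INTO News VALUES (1, 10, '2020-01-01'), (2, 20, '2020-01-02');
    INSERT INTO Menu VALUES (1), (2);
    -- Subject 10 is attached to two menus.
    INSERT INTO MenuSubject VALUES (1, 10), (2, 10), (1, 20);
""")

# IN behaves like a semi-join: each News row is returned at most once.
in_count = conn.execute("""
    SELECT COUNT(*) FROM News
    WHERE SubjectID IN (
        SELECT ms.SubjectID
        FROM MenuSubject ms JOIN Menu m ON ms.MenuID = m.MenuID
    )
""").fetchone()[0]

# A plain join repeats a News row once per matching MenuSubject row.
join_count = conn.execute("""
    SELECT COUNT(*) FROM News nw
    JOIN MenuSubject ms ON ms.SubjectID = nw.SubjectID
    JOIN Menu m ON m.MenuID = ms.MenuID
""").fetchone()[0]

print(in_count, join_count)  # 2 3
```

If duplicates like this are possible in your data, the ROW_NUMBER() paging would count the repeated rows too, so de-duplicate the subject list first (for example with SELECT DISTINCT when filling the temporary table).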

postgresql where clause behavior

I made two queries that I thought should have the same result:
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
WHERE table2.value = '1')
AS result1 ORDER BY id1)
AS result2;
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
)
AS result1 ORDER BY id1)
AS result2
WHERE value = '1';
The only difference being that one had the WHERE clause inside SELECT DISTINCT ON, and the other outside that, but inside SELECT COUNT. But the results were not the same. I don't understand why the position of the WHERE clause should make a difference in this case. Can anyone explain? Or is there a better way to phrase this question?
Here's a good way to look at this:
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a;
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a
WHERE value = 2;
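The problem has to do with when the filter is applied relative to DISTINCT ON. A WHERE clause at the same level as DISTINCT ON (or in a subquery below it) removes rows before one row per id is picked, so every id that has a row with the wanted value survives. A WHERE clause outside runs after DISTINCT ON has already kept a single row per id, and if the kept row has a different value the whole id is dropped. It is behavior by design.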
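The ordering effect can be imitated in a few lines of plain Python. This is only a simulation under assumptions: `distinct_on_id` is a made-up helper that keeps the first row it sees per id, whereas PostgreSQL may keep any of the tied rows when ORDER BY does not break the tie. The sample rows are invented as well.

```python
def distinct_on_id(rows):
    """Keep the first row seen for each id, like DISTINCT ON (id) ... ORDER BY id."""
    kept = {}
    # sorted() is stable, so rows with the same id keep their input order.
    for row_id, value in sorted(rows, key=lambda r: r[0]):
        kept.setdefault(row_id, (row_id, value))
    return list(kept.values())

rows = [(1, "2"), (1, "1"), (2, "1"), (3, "2")]

# WHERE inside: filter first, then keep one row per id.
filter_then_distinct = distinct_on_id([r for r in rows if r[1] == "1"])

# WHERE outside: keep one row per id first, then filter the survivors.
distinct_then_filter = [r for r in distinct_on_id(rows) if r[1] == "1"]

print(len(filter_then_distinct), len(distinct_then_filter))  # 2 1
```

Id 1 has a row with value '1', but when the filter runs last the row that was kept for id 1 happens to have value '2', so id 1 disappears from the count.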

ordering by rows

OK, so I have a query I am trying to build. I have 2 tables: table1 has a bunch of regular records with a unique ID (auto increment), and table2 has records that reference some of those ids from table1. I am trying to order the table1 rows by how many table2 records share their ID, highest first. Here's what I've got:
SELECT * FROM table1
WHERE table1.status = 1
AND (SELECT COUNT(*) FROM table2 WHERE table2.tbl1_id = table1.id)
ORDER BY table1.id DESC
Thanks :)
SELECT table1.id
FROM table1
LEFT JOIN table2 ON table2.tbl1_id = table1.id
WHERE table1.status = 1
GROUP BY table1.id
ORDER BY COUNT(table2.tbl1_id) DESC
Try this:
SELECT a.*, b.cnt
FROM table1 a LEFT JOIN
(
SELECT tbl1_id, COUNT(*) cnt
FROM table2
GROUP BY tbl1_id
) b
ON a.id = b.tbl1_id
WHERE a.status = 1
ORDER BY cnt DESC
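Here is a small runnable sketch of the derived-table approach, using Python's built-in sqlite3 and made-up rows. Two details are additions for the illustration and not part of the answers above: COALESCE turns the NULL count of unmatched rows into 0, and `a.id DESC` is added as a tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE table2 (tbl1_id INTEGER);
    INSERT INTO table1 VALUES (1, 1), (2, 1), (3, 0), (4, 1);
    INSERT INTO table2 VALUES (1), (2), (2), (2), (3), (3);
""")

rows = conn.execute("""
    SELECT a.id, COALESCE(b.cnt, 0) AS cnt
    FROM table1 a
    LEFT JOIN (
        -- Count the table2 rows once per table1 id.
        SELECT tbl1_id, COUNT(*) AS cnt
        FROM table2
        GROUP BY tbl1_id
    ) b ON a.id = b.tbl1_id
    WHERE a.status = 1
    ORDER BY cnt DESC, a.id DESC
""").fetchall()

print(rows)  # [(2, 3), (1, 1), (4, 0)]
```

Id 3 is filtered out by its status, and id 4 still appears with a count of 0 because of the LEFT JOIN.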