How to use a Table type in query - tsql

I have 9000 rows in the News table and use this code to select 20 of them:
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in(Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
) as myTable
where myTable.Num BETWEEN 100 and 120
But the query takes 28 seconds to run! When I test the same query without the joined tables, I get the result in 1 second.
So I want to use a table type to hold the result of the join and use that in the query. I declared a variable of the table type using the following code:
DECLARE #MyTable2 IntListTable
Insert Into #MyTable2
Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in #MyTable2
) as myTable
where myTable.Num BETWEEN 100 and 120
But get Error in
SubjectID in #MyTable2
Error:
Incorrect syntax near '#MyTable2'.
Edit:
When I test my code with:
Select myTable.Title
or use this code instead of the joined tables:
Where SubjectID in(13,14,20,21,25,24,26,24,28,29,30,54,55,60,47,98,99,65,14,20,33,666,987,254)
I get the result in 1 second.
But when I use this in the query:
Select myTable.MoreText
it takes 28 seconds. Why?

Try this,
Select x.Num
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in(Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
) x
where x.Num <21

WITH myTempTable as (Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in (SELECT SubjectID FROM myTempTable)
) as myTable
where myTable.Num BETWEEN 100 and 120
You can try the query above.

There is absolutely no need for a User-Defined Table Type in this query. It adds work but no actual benefit.
The problem is most likely the fact that you are using an IN list as those translate out to be an OR condition for each of the values. But an IN list isn't needed either.
This query can actually be simplified by rethinking it in terms of an INNER JOIN, which should be better as it will allow the Query Optimizer to do its job.
SELECT *
FROM (
SELECT nw.*, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS [Num]
FROM News nw
INNER JOIN (
MenuSubject
INNER JOIN Menu
ON MenuSubject.MenuID = Menu.MenuID
) ON MenuSubject.SubjectID = nw.SubjectID
) AS myTable
WHERE myTable.Num BETWEEN 100 AND 120;
One final simplification that can be made, though I doubt it is needed here since 9000 rows is almost no data at all, is to first dump the results to a local temporary table and then use that in the INNER JOIN:
CREATE TABLE #Subjects
(
SubjectID INT NOT NULL -- PRIMARY KEY -- test with and without PK to see if it helps
);
INSERT INTO #Subjects (SubjectID)
SELECT MenuSubject.SubjectID
FROM MenuSubject
INNER JOIN Menu
ON Menu.MenuID = MenuSubject.MenuID;
SELECT *
FROM (
SELECT nw.*, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS [Num]
FROM News nw
INNER JOIN #Subjects sub
ON sub.SubjectID = nw.SubjectID
) AS myTable
WHERE myTable.Num BETWEEN 100 AND 120;
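One caveat about swapping IN for INNER JOIN, offered as a hedged sketch rather than a statement about the asker's actual data: if the same SubjectID is attached to more than one menu, the join returns that News row once per match, while IN returns it only once, which would also shift the ROW_NUMBER() values. The example below uses Python's sqlite3 module purely as a stand-in for SQL Server; the NewsID column and all sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE News (NewsID INTEGER PRIMARY KEY, SubjectID INTEGER, DateSend TEXT);
    CREATE TABLE Menu (MenuID INTEGER PRIMARY KEY);
    CREATE TABLE MenuSubject (MenuID INTEGER, SubjectID INTEGER);
    INSERT INTO News VALUES (1, 13, '2020-01-01'), (2, 14, '2020-01-02'), (3, 99, '2020-01-03');
    INSERT INTO Menu VALUES (1), (2);
    -- Subject 13 is attached to two menus (hypothetical data).
    INSERT INTO MenuSubject VALUES (1, 13), (2, 13), (1, 14);
""")

# The subject list from the question, reused in each variant below.
subjects = """
    SELECT ms.SubjectID AS SubjectID
    FROM MenuSubject ms INNER JOIN Menu m ON ms.MenuID = m.MenuID
"""

# IN is a membership test: each News row appears at most once.
in_rows = conn.execute(
    f"SELECT NewsID FROM News WHERE SubjectID IN ({subjects}) ORDER BY NewsID"
).fetchall()

# A plain join repeats the News row for every matching MenuSubject row.
join_rows = conn.execute(
    f"SELECT nw.NewsID FROM News nw INNER JOIN ({subjects}) s "
    "ON s.SubjectID = nw.SubjectID ORDER BY nw.NewsID"
).fetchall()

# Joining to a de-duplicated subject list gives the same rows as IN.
distinct_join_rows = conn.execute(
    f"SELECT nw.NewsID FROM News nw "
    f"INNER JOIN (SELECT DISTINCT SubjectID FROM ({subjects})) s "
    "ON s.SubjectID = nw.SubjectID ORDER BY nw.NewsID"
).fetchall()

print(in_rows, join_rows, distinct_join_rows)
```

If MenuSubject can contain repeated SubjectID values, inserting SELECT DISTINCT into the temporary table would serve the same purpose, and it would also be required for the optional PRIMARY KEY to accept the rows.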

Related

Calculate difference between the row counts of tables in two schemas in PostgreSQL

I have two tables with the same name in two different schemas (old and new dump). I would like to know the difference between the two.
I have two queries that give the old and new counts:
select count(*) as count_old from(
SELECT
distinct id
FROM
schema1.compound)q1
select count(*) as count_new from(
SELECT
distinct id
FROM
schema2.compound)q2
I would like to have the following output:
table_name count_old count_new diff
compound 4740 4735 5
Any help is appreciated. Thanks in advance
with counts as (
select
(select count(distinct id) from schema1.compound) as count_old,
(select count(distinct id) from schema2.compound) as count_new
)
select
'compound' as table_name,
count_old,
count_new,
count_old - count_new as diff
from counts;
I think you could do something like this:
SELECT 'compound' AS table_name, count_old, count_new, (count_old - count_new) AS diff FROM (
SELECT
(SELECT count(*) FROM (SELECT DISTINCT id FROM schema1.compound) AS q1) AS count_old,
(SELECT count(*) FROM (SELECT DISTINCT id FROM schema2.compound) AS q2) AS count_new
) AS counts;
It was probably answered already, but it is a subquery/nested query.
You can directly compute the COUNT on distinct values if you use the DISTINCT keyword inside your aggregation function. Then you can join the queries extracting your two needed values, and use them inside your query to get the output table.
WITH cte AS (
SELECT new.cnt AS count_new,
old.cnt AS count_old
FROM (SELECT COUNT(DISTINCT id) AS cnt FROM schema1.compound) AS old
INNER JOIN (SELECT COUNT(DISTINCT id) AS cnt FROM schema2.compound) AS new
ON 1 = 1
)
SELECT 'compound' AS table_name,
count_new,
count_old,
count_old - count_new AS diff
FROM cte
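To check the shape of the result on sample data, here is a small sketch that runs the scalar-subquery version of the query. It uses Python's sqlite3 module, with two attached in-memory databases playing the role of the PostgreSQL schemas; that substitution and the sample ids are assumptions made only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attached in-memory databases stand in for the two PostgreSQL schemas (assumption).
conn.execute("ATTACH DATABASE ':memory:' AS schema1")
conn.execute("ATTACH DATABASE ':memory:' AS schema2")
conn.execute("CREATE TABLE schema1.compound (id INTEGER)")
conn.execute("CREATE TABLE schema2.compound (id INTEGER)")

# Hypothetical data: the old dump has 4 distinct ids, the new dump has 3.
conn.executemany("INSERT INTO schema1.compound VALUES (?)", [(i,) for i in (1, 2, 2, 3, 4)])
conn.executemany("INSERT INTO schema2.compound VALUES (?)", [(i,) for i in (1, 2, 3)])

# One row with both counts and their difference.
row = conn.execute("""
    WITH counts AS (
        SELECT
            (SELECT COUNT(DISTINCT id) FROM schema1.compound) AS count_old,
            (SELECT COUNT(DISTINCT id) FROM schema2.compound) AS count_new
    )
    SELECT 'compound' AS table_name, count_old, count_new,
           count_old - count_new AS diff
    FROM counts
""").fetchone()

print(row)
```

Note that a difference of zero only says the two counts match; it does not show that the same ids are present in both schemas.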

How to optimise a SQL query to check for consistency of column values across tables

I would like to check across multiple tables that the same keys / same number of keys are present in each of the tables.
Currently I have created a solution that checks the count of keys per individual table, checks the count of keys when all tables are merged together, then compares.
This solution works but I wonder if there is a more optimal solution...
Example solution as it stands:
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_a;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_b;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_c;
SELECT COUNT(DISTINCT a.variable) AS num_ids
FROM (SELECT DISTINCT VARIABLE FROM table_a) a
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_b) b ON a.variable = b.variable
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_c) c ON a.variable = c.variable;
UPDATE:
The difficulty I'm facing in putting this together in one query is that any of the tables might not be unique on the VARIABLE that I am looking to check, so I've had to use DISTINCT before merging to avoid expanding the join.
Since we are only counting, I think there is no need to join the tables on the variable column. A UNION should be enough.
We still have to use DISTINCT to ignore/suppress duplicates, which often means an extra sort.
An index on variable should help for getting counts for separate tables, but it will not help for getting the count of the combined table.
Here is an example for comparing two tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_AB
AS
(
SELECT COUNT(DISTINCT variable) AS CountAB
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
) AS AB
)
SELECT
CASE WHEN CountA = CountAB AND CountB = CountAB
THEN 'same' ELSE 'different' END AS ResultAB
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_AB
;
Three tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_C
AS
(
SELECT COUNT(DISTINCT variable) AS CountC
FROM TableC
)
,CTE_ABC
AS
(
SELECT COUNT(DISTINCT variable) AS CountABC
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableC
) AS AB
)
SELECT
CASE WHEN CountA = CountABC AND CountB = CountABC AND CountC = CountABC
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
CROSS JOIN CTE_ABC
;
I deliberately chose CTEs because, as far as I know, Postgres materializes them (PostgreSQL 12 and later may inline a CTE that is referenced only once unless it is declared AS MATERIALIZED), and in our case each CTE will have only one row.
Using array_agg with ORDER BY is an even better variant, if it is available on Redshift. You still need DISTINCT, but you don't have to merge all the tables together.
WITH
CTE_A
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS A
FROM TableA
)
,CTE_B
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS B
FROM TableB
)
,CTE_C
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS C
FROM TableC
)
SELECT
CASE WHEN A = B AND B = C
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
;
Well, here is probably the nastiest piece of SQL I could build for you :) I will forever deny that I wrote this and claim that my Stack Overflow account was hacked ;)
SELECT
'All OK'
WHERE
( SELECT COUNT(DISTINCT id) FROM table_a ) = ( SELECT COUNT(DISTINCT id) FROM table_b )
AND ( SELECT COUNT(DISTINCT id) FROM table_b ) = ( SELECT COUNT(DISTINCT id) FROM table_c )
By the way, this won't optimise the query - it's still doing three queries (but I guess it's better than 4?).
UPDATE: In light of your use-case below: NEW sql fiddle http://sqlfiddle.com/#!15/a0403/1
SELECT DISTINCT
tbl_a.a_count,
tbl_b.b_count,
tbl_c.c_count
FROM
( SELECT COUNT(id) a_count, array_agg(id order by id) ids FROM table_a) tbl_a,
( SELECT COUNT(id) b_count, array_agg(id order by id) ids FROM table_b) tbl_b,
( SELECT COUNT(id) c_count, array_agg(id order by id) ids FROM table_c) tbl_c
WHERE
tbl_a.ids = tbl_b.ids
AND tbl_b.ids = tbl_c.ids
The above query will only return if all tables have the same number of rows, ensuring that the IDS are also the same.
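A caveat worth illustrating: comparing only the per-table distinct counts cannot tell whether the keys themselves match, while also comparing against the count over the combined tables (the UNION ALL approach) can. The sketch below runs both checks through Python's sqlite3 module as a stand-in for PostgreSQL/Redshift; the two small tables and their values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (variable INTEGER);
    CREATE TABLE table_b (variable INTEGER);
    -- Same number of distinct keys (3 each), but key 3 vs key 4 differ.
    INSERT INTO table_a VALUES (1), (2), (2), (3);
    INSERT INTO table_b VALUES (1), (2), (4);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

count_a = scalar("SELECT COUNT(DISTINCT variable) FROM table_a")
count_b = scalar("SELECT COUNT(DISTINCT variable) FROM table_b")
count_ab = scalar("""
    SELECT COUNT(DISTINCT variable)
    FROM (SELECT variable FROM table_a
          UNION ALL
          SELECT variable FROM table_b) AS ab
""")

# Count-only comparison: passes even though the keys differ.
counts_match = count_a == count_b

# Each table's count must equal the combined count for the key sets to be equal.
same_keys = count_a == count_ab and count_b == count_ab

print(counts_match, same_keys)
```

The combined count can only equal both individual counts when neither table contributes a key the other lacks, which is why the second check is the stricter one.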

How to optimize SELECT DISTINCT when using multiple Joins?

I have read that using CTEs can speed up a SELECT DISTINCT by up to 100 times (link to the website). They give the following example:
USE tempdb;
GO
DROP TABLE dbo.Test;
GO
CREATE TABLE
dbo.Test
(
data INTEGER NOT NULL,
);
GO
CREATE CLUSTERED INDEX c ON dbo.Test (data);
GO
-- Lots of duplicated values
INSERT dbo.Test WITH (TABLOCK)
(data)
SELECT TOP (5000000)
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) / 117329
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
WITH RecursiveCTE
AS (
SELECT data = MIN(T.data)
FROM dbo.Test T
UNION ALL
SELECT R.data
FROM (
-- A cunning way to use TOP in the recursive part of a CTE :)
SELECT T.data,
rn = ROW_NUMBER() OVER (ORDER BY T.data)
FROM dbo.Test T
JOIN RecursiveCTE R
ON R.data < T.data
) R
WHERE R.rn = 1
)
SELECT *
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
How would one apply this to a query that has multiple joins? For example, I am trying to run the query below, but it takes roughly two and a half minutes. How would I optimize it?
SELECT DISTINCT x.code
From jpa
INNER JOIN jp ON jpa.ID=jp.ID
INNER JOIN jd ON (jd.ID=jp.ID And jd.JID=3)
INNER JOIN l ON jpa.ID=l.ID AND l.CID=3
INNER JOIN fa ON fa.ID=jpa.ID
INNER JOIN x ON fa.ID=x.ID
1) GROUP BY on every column worked faster for me.
2) If you have duplicates in some of the tables then you can also pre select that and join from that as an inner query.
3) Generally you can nest a join if you expect that it will limit the data.
See: SQL join format - nested inner joins
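Points 1 and 2 can be sketched on a cut-down version of the query. The example below keeps only the fa and x tables, fills them with hypothetical rows, and uses Python's sqlite3 module as a stand-in for the real server. It shows that DISTINCT, GROUP BY, and a join against a pre-deduplicated inner query all return the same codes; whether any of them is faster depends on the actual data and plan, so each variant would need to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fa (ID INTEGER);
    CREATE TABLE x (ID INTEGER, code TEXT);
    -- fa holds many duplicate IDs (hypothetical data).
    INSERT INTO fa VALUES (1), (1), (1), (2);
    INSERT INTO x VALUES (1, 'A'), (2, 'B'), (3, 'C');
""")

# Original shape: join first, remove duplicates at the end.
distinct_rows = conn.execute("""
    SELECT DISTINCT x.code
    FROM fa INNER JOIN x ON fa.ID = x.ID
    ORDER BY x.code
""").fetchall()

# Point 1: GROUP BY on the selected column instead of DISTINCT.
group_by_rows = conn.execute("""
    SELECT x.code
    FROM fa INNER JOIN x ON fa.ID = x.ID
    GROUP BY x.code
    ORDER BY x.code
""").fetchall()

# Point 2: de-duplicate fa in an inner query so the join handles fewer rows.
# The outer DISTINCT stays because different IDs could share a code.
prededup_rows = conn.execute("""
    SELECT DISTINCT x.code
    FROM (SELECT DISTINCT ID FROM fa) f INNER JOIN x ON f.ID = x.ID
    ORDER BY x.code
""").fetchall()

print(distinct_rows, group_by_rows, prededup_rows)
```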

Postgresql rows to columns (UNION ALL to JOIN)

Hello, with this query I get one result column with four rows. How can I change it to get four named columns, each with its own result?
SELECT COUNT(*) FROM vehicles WHERE cus=1
UNION ALL
SELECT COUNT(*) FROM user WHERE cus=1
UNION ALL
SELECT COUNT(*) FROM vehicle_events WHERE cus=1
UNION ALL
SELECT COUNT(*) FROM vehicle_alerts WHERE cus=1
Thanks in advance.
SELECT a.ct veh_count, b.ct user_count, c.ct event_count, d.ct alert_count
FROM
( SELECT COUNT(*) ct FROM vehicles WHERE cus=1 ) a,
( SELECT COUNT(*) ct FROM user WHERE cus=1 ) b,
( SELECT COUNT(*) ct FROM vehicle_events WHERE cus=1 ) c,
( SELECT COUNT(*) ct FROM vehicle_alerts WHERE cus=1 ) d;
UNION only adds rows; it has no effect on the columns.
Columns, which define the "shape" of the row tuples, must appear as selected columns [1].
For example:
SELECT
(SELECT COUNT(*) FROM vehicles WHERE cus=1) as veh_count
,(SELECT COUNT(*) FROM users WHERE cus=1) as user_count
..
[1] There are other constructs that can allow this, see crosstab for example - but the columns are fixed by the query command. It takes dynamic SQL to get a variable number of columns.
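As a runnable sketch of the scalar-subquery form with all four columns filled in, the example below uses Python's sqlite3 module as a stand-in for PostgreSQL. The table contents are hypothetical, and the users table name follows the second answer rather than the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
    CREATE TABLE vehicles (cus INTEGER);
    CREATE TABLE users (cus INTEGER);
    CREATE TABLE vehicle_events (cus INTEGER);
    CREATE TABLE vehicle_alerts (cus INTEGER);
    -- Hypothetical rows; only those with cus = 1 are counted.
    INSERT INTO vehicles VALUES (1), (1), (2);
    INSERT INTO users VALUES (1);
    INSERT INTO vehicle_events VALUES (1), (1), (1);
    INSERT INTO vehicle_alerts VALUES (2);
""")

# Each scalar subquery becomes one named column of a single result row.
row = conn.execute("""
    SELECT
        (SELECT COUNT(*) FROM vehicles WHERE cus = 1) AS veh_count,
        (SELECT COUNT(*) FROM users WHERE cus = 1) AS user_count,
        (SELECT COUNT(*) FROM vehicle_events WHERE cus = 1) AS event_count,
        (SELECT COUNT(*) FROM vehicle_alerts WHERE cus = 1) AS alert_count
""").fetchone()

result = dict(row)
print(result)
```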

postgresql where clause behavior

I made two queries that I thought should have the same result:
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
WHERE table2.value = '1')
AS result1 ORDER BY id1)
AS result2;
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
)
AS result1 ORDER BY id1)
AS result2
WHERE value = '1';
The only difference being that one had the WHERE clause inside SELECT DISTINCT ON, and the other outside that, but inside SELECT COUNT. But the results were not the same. I don't understand why the position of the WHERE clause should make a difference in this case. Can anyone explain? Or is there a better way to phrase this question?
Here's a good way to look at this:
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a;
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a
WHERE value = 2;
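The difference comes from the order in which the clauses are evaluated. DISTINCT ON keeps a single row per id, and a WHERE at the same query level is applied before that row is chosen, whereas a WHERE in an outer query only sees the rows that DISTINCT ON kept. In the first query above, the surviving row for id 1 may have value 1 or value 2; in the second, only the value 2 row is a candidate. It is behavior by design.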
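The order of operations can be modelled in a few lines of plain Python. This is only a sketch: the distinct_on_first helper and the joined rows are hypothetical, and the helper keeps the first row it sees per key, whereas PostgreSQL's choice is unpredictable when ORDER BY lists only id1.

```python
def distinct_on_first(rows, key_index=0):
    """Keep the first row seen for each key, roughly like DISTINCT ON (key)."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key_index], row)
    return list(seen.values())

# (id1, value) pairs as the join of table1 and table2 might produce them.
joined = [(1, "2"), (1, "1"), (2, "1"), (3, "2")]

# First query: WHERE filters the joined rows, then DISTINCT ON picks one per id1.
inner_filter = distinct_on_first([r for r in joined if r[1] == "1"])

# Second query: DISTINCT ON picks one row per id1 first, then WHERE filters
# only the rows that survived.
outer_filter = [r for r in distinct_on_first(joined) if r[1] == "1"]

print(len(inner_filter), len(outer_filter))
```

Here id1 = 1 has a row with value '1', but in the second form that row was discarded before the filter ran, so the two counts differ.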