How do I use CROSS APPLY in this situation? - tsql

I have an inline TVF that accepts a primary key of a table and computes a value from the row with that primary key (strictly speaking it returns a table with that value as part of the select, but whatever).
Now I want to do something like this:
SELECT something
FROM table1
CROSS APPLY thefunction(table1primarykey) func
ON func.computedvalue = func.computedvalue(table2primarykey)
The problem is that I have not brought table2 into the query yet, and I could not do it this way, because the only way table1 and table2 are related is through the same function's return value.
How can I do something like this?

How about
SELECT *
FROM (
SELECT *
FROM table1
CROSS APPLY thefunction(table1primarykey)
) AS t1
INNER JOIN (
SELECT *
FROM table2
CROSS APPLY thefunction(table2primarykey)
) AS t2 ON t1.computedvalue = t2.computedvalue
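If you prefer to avoid the derived tables, a minimal sketch (assuming the inline TVF exposes a column named computedvalue; all other names are the question's placeholders) is to apply the function to each side directly and filter in the WHERE clause:
-- hypothetical variant: apply the TVF to each table, then filter
SELECT t1.*, t2.*
FROM table1 AS t1
CROSS APPLY thefunction(t1.table1primarykey) AS f1
CROSS JOIN table2 AS t2
CROSS APPLY thefunction(t2.table2primarykey) AS f2
WHERE f1.computedvalue = f2.computedvalue;
This is the same inner join expressed as a CROSS JOIN plus a WHERE filter, which keeps both APPLY calls visible at the top level.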

Related

Is there a way to alias a query of the type "table.column" inside a function?

I created a CTE; the second part of the CTE contains a select with an st_contains. In this one, I have two columns from different tables with the same name. I want to alias one of these 'table.column' combinations, because the selection at the end of the CTE outputs an error: column reference " " is ambiguous.
with
table1 as (
...
),
table2 as (
select *
from table3, table4
where st_contains(table3.atribute1, table4.atribute1)
)
select
table1.atribute1,
table2.atribute1 -- here I need something like table2.table3.atribute1
from table1
join table2 on table1.atribute2 = table2.atribute2
;
I hope I explained the problem well.
Thanks!
Alias the table3 and table4 columns in your table2 CTE to resolve the ambiguity.
with
table1 as (
...
),
table2 as (
select table3.atribute1 as table3attr1, table4.atribute1 as table4attr1
from table3, table4
where st_contains(table3.atribute1, table4.atribute1)
)
select
table1.atribute1,
table2.table3attr1 -- use the aliased column name
from table1
join table2 on table1.atribute2 = table2.atribute2
;

Postgres join involving tables having a join condition defined on a text array

I have two tables in postgresql
One table is of the form
Create table table1(
ID serial PRIMARY KEY,
Type text[]
);
Create table table2(
type text,
sellerID int
);
Now I want to get all the rows from table1 whose type matches a type in table2, but the problem is that in table1 the type is an array.
In case the type in the table has an identifiable delimiter like ',' or ';', you can rewrite the query with regexp_split_to_table(type, ','); the unnest function can be used as well.
For example:
select * from
( select id, regexp_split_to_table(type, ',') as type from table1 ) table1
inner join
( select * from table2 ) table2
on trim(table1.type) = trim(table2.type)
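Since table1.type in the question is an actual text[] column, a sketch with unnest (mentioned above) could look like the following; the alias names are illustrative:
-- expand the array to one row per element, then join on the scalar value
select t1.id, t2.*
from (
select id, unnest(type) as single_type
from table1
) t1
inner join table2 t2
on trim(t1.single_type) = trim(t2.type);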
Another good example can be found - https://www.dbrnd.com/2017/03/postgresql-regexp_split_to_array-to-split-string-using-different-delimiters/
SELECT
a[1] AS DiskInfo
,a[2] AS DiskNumber
,a[3] AS MessageKeyword
FROM (
SELECT regexp_split_to_array('Postgres Disk information , disk 2 , failed', ',')
) AS dt(a)
You can use the ANY operator in the JOIN condition:
select *
from table1 t1
join table2 t2 on t2.type = any (t1.type);
Note that if the types in table1 match multiple rows in table2, you would get duplicates (rows from table1 repeated) because that's how a join works. Maybe you want an EXISTS condition instead:
select *
from table1 t1
where exists (select *
from table2 t2
where t2.type = any(t1.type));

How to optimise a SQL query to check for consistency of column values across tables

I would like to check across multiple tables that the same keys / same number of keys are present in each of the tables.
Currently I have created a solution that checks the count of keys per individual table, checks the count of keys when all tables are merged together, then compares.
This solution works but I wonder if there is a more optimal solution...
Example solution as it stands:
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_a;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_b;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_c;
SELECT COUNT(DISTINCT a.variable) AS num_ids
FROM (SELECT DISTINCT VARIABLE FROM table_a) a
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_b) b ON a.variable = b.variable
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_c) c ON a.variable = c.variable;
UPDATE:
The difficulty that I'm facing putting this together in one query is that any of the tables might not be unique on the VARIABLE that I am looking to check, so I've had to use DISTINCT before merging to avoid expanding the join.
Since we are only counting, I think there is no need to join the tables on the variable column. A UNION should be enough.
We still have to use DISTINCT to ignore/suppress duplicates, which often means an extra sort.
An index on variable should help with getting the counts for the separate tables, but it will not help with getting the count of the combined table.
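For reference, the indexes the previous sentence refers to would be hypothetical ones like these (Postgres-style syntax; plain Redshift has no CREATE INDEX and relies on sort/distribution keys instead):
CREATE INDEX idx_table_a_variable ON TableA (variable);
CREATE INDEX idx_table_b_variable ON TableB (variable);
CREATE INDEX idx_table_c_variable ON TableC (variable);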
Here is an example for comparing two tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_AB
AS
(
SELECT COUNT(DISTINCT variable) AS CountAB
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
) AS AB
)
SELECT
CASE WHEN CountA = CountAB AND CountB = CountAB
THEN 'same' ELSE 'different' END AS ResultAB
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_AB
;
Three tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_C
AS
(
SELECT COUNT(DISTINCT variable) AS CountC
FROM TableC
)
,CTE_ABC
AS
(
SELECT COUNT(DISTINCT variable) AS CountABC
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableC
) AS ABC
)
SELECT
CASE WHEN CountA = CountABC AND CountB = CountABC AND CountC = CountABC
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
CROSS JOIN CTE_ABC
;
I deliberately chose CTEs, because as far as I know Postgres materializes CTEs, and in our case each CTE will have only one row.
Using array_agg with ORDER BY is an even better variant, if it is available on Redshift. You'll still need to use DISTINCT, but you don't have to merge all the tables together.
WITH
CTE_A
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS A
FROM TableA
)
,CTE_B
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS B
FROM TableB
)
,CTE_C
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS C
FROM TableC
)
SELECT
CASE WHEN A = B AND B = C
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
;
Well, here is probably the nastiest piece of SQL I could build for you :) I will forever deny that I wrote this and will claim that my Stack Overflow account was hacked ;)
SELECT
'All OK'
WHERE
( SELECT COUNT(DISTINCT id) FROM table_a ) = ( SELECT COUNT(DISTINCT id) FROM table_b )
AND ( SELECT COUNT(DISTINCT id) FROM table_b ) = ( SELECT COUNT(DISTINCT id) FROM table_c )
By the way, this won't optimise the query - it's still doing three queries (but I guess it's better than 4?).
UPDATE: In light of your use-case below: NEW sql fiddle http://sqlfiddle.com/#!15/a0403/1
SELECT DISTINCT
tbl_a.a_count,
tbl_b.b_count,
tbl_c.c_count
FROM
( SELECT COUNT(id) a_count, array_agg(id order by id) ids FROM table_a) tbl_a,
( SELECT COUNT(id) b_count, array_agg(id order by id) ids FROM table_b) tbl_b,
( SELECT COUNT(id) c_count, array_agg(id order by id) ids FROM table_c) tbl_c
WHERE
tbl_a.ids = tbl_b.ids
AND tbl_b.ids = tbl_c.ids
The above query will only return a row if all three tables contain exactly the same IDs (the ordered id arrays must match), which also guarantees the same row counts.

How to optimize SELECT DISTINCT when using multiple Joins?

I have read that using CTEs you can speed up a SELECT DISTINCT by up to 100 times. Link to the website. They have the following example:
USE tempdb;
GO
DROP TABLE dbo.Test;
GO
CREATE TABLE
dbo.Test
(
data INTEGER NOT NULL
);
GO
CREATE CLUSTERED INDEX c ON dbo.Test (data);
GO
-- Lots of duplicated values
INSERT dbo.Test WITH (TABLOCK)
(data)
SELECT TOP (5000000)
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) / 117329
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
WITH RecursiveCTE
AS (
SELECT data = MIN(T.data)
FROM dbo.Test T
UNION ALL
SELECT R.data
FROM (
-- A cunning way to use TOP in the recursive part of a CTE :)
SELECT T.data,
rn = ROW_NUMBER() OVER (ORDER BY T.data)
FROM dbo.Test T
JOIN RecursiveCTE R
ON R.data < T.data
) R
WHERE R.rn = 1
)
SELECT *
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
How would one apply this to a query that has multiple joins? For example, I am trying to run the query found below, however it takes roughly two and a half minutes. How would I optimize this accordingly?
SELECT DISTINCT x.code
From jpa
INNER JOIN jp ON jpa.ID=jp.ID
INNER JOIN jd ON (jd.ID=jp.ID And jd.JID=3)
INNER JOIN l ON jpa.ID=l.ID AND l.CID=3
INNER JOIN fa ON fa.ID=jpa.ID
INNER JOIN x ON fa.ID=x.ID
1) GROUP BY on every selected column worked faster for me (see the sketch below).
2) If you have duplicates in some of the tables, you can also pre-select the distinct rows and join from that as an inner query.
3) Generally you can nest joins if you expect that the nested join will limit the data:
SQL join format - nested inner joins
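As a sketch of point 1, the query from the question can be rewritten with GROUP BY instead of DISTINCT (same table and column names as above; whether it is actually faster depends on the execution plan):
SELECT x.code
FROM jpa
INNER JOIN jp ON jpa.ID = jp.ID
INNER JOIN jd ON jd.ID = jp.ID AND jd.JID = 3
INNER JOIN l ON jpa.ID = l.ID AND l.CID = 3
INNER JOIN fa ON fa.ID = jpa.ID
INNER JOIN x ON fa.ID = x.ID
GROUP BY x.code;
For point 2, the same idea would be to join to a derived table such as (SELECT DISTINCT ID FROM fa) fa instead of fa itself, when fa is the table contributing the duplicates.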

set TableA boolean based on TableB record

My data looks like this:
TableA
- id INT
- is_in_table_b BOOL
TableB
- id INT
- table_a_id INT
I accidentally wiped out the 'is_in_table_b' BOOL on my dev machine while reorganizing the data structures, and I forgot how I created it. It's just a shortcut for some dev benchmarks.
All the "UPATE ... FROM ...." variations I tried are setting everything as "true" based on a the join. I can't remember if I originally had a CAST in this.
Does anyone know of a simple, elegant way to accomplish this? I just want to set is_in_table_b to True if the TableA.id appears in TableB.table_a_id. I know some non-elegant ways with inner queries, but I want to remember the more-correct ways to do this. I'm positive I had this done in an "UPDATE FROM" originally.
This one should be simple enough:
UPDATE tableA SET
is_in_table_b = exists (select 1 FROM tableB WHERE table_a_id=tableA.id);
yeah, do a JOIN between the tables for an UPDATE.
the setup:
CREATE TABLE table_a (
id int not null auto_increment primary key,
is_in_b boolean
);
CREATE TABLE table_b (
table_a_id int
);
-- create some test data in table_a;
INSERT INTO table_a (is_in_b) VALUES (FALSE), (FALSE), (FALSE);
INSERT INTO table_a (is_in_b) SELECT FALSE
FROM table_a a1
JOIN table_a a2
JOIN table_a a3;
-- and create a subset of matching data in table_b;
INSERT INTO table_b (table_a_id)
SELECT id FROM table_a ORDER BY RAND() limit 5;
now the answer:
UPDATE table_a
JOIN table_b ON table_a_id = table_a.id
SET is_in_b = TRUE;
See the results with
SELECT * from table_b;
SELECT * FROM table_a WHERE is_in_b;
Works on http://sqlfiddle.com/#!2/8afc0/1 (MySQL). Postgres does not accept the UPDATE ... JOIN form, but the same update can be written with UPDATE ... FROM.
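A sketch of that Postgres variant, assuming the same table names as in the setup above:
-- Postgres equivalent of the MySQL join-update
UPDATE table_a
SET is_in_b = TRUE
FROM table_b
WHERE table_b.table_a_id = table_a.id;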
Consider dropping that redundant column altogether and using a view or a "generated column" instead (with the EXISTS expression provided by @Daniel). Details under this related question:
Store common query as column?
Just be sure to have an index on TableB.table_a_id.
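A minimal sketch of the view idea, reusing the EXISTS expression from the first answer (the view name is hypothetical; a generated column in Postgres cannot contain a subquery, which is why a view is shown), together with the index just mentioned:
-- view exposing the flag instead of storing it
CREATE VIEW table_a_with_flag AS
SELECT a.id,
       EXISTS (SELECT 1 FROM TableB b WHERE b.table_a_id = a.id) AS is_in_table_b
FROM TableA a;
-- supporting index on the referenced column
CREATE INDEX idx_tableb_table_a_id ON TableB (table_a_id);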