nvarchar vs varchar \ IN vs JOIN - tsql

Question regarding "IN" versus "INNER JOIN" and VARCHAR versus NVARCHAR. The examples below lay it out.
This query returns the CORRECT results (11 rows):
Uses nvarchar(10) data columns for comparison (Id and Location) via an INNER JOIN.
SELECT b.Location
FROM [configuration].[dbo].[M3_Customer] a
inner join [configuration].[dbo].[M3_WarehouseInventory] b
on a.Id = b.Location
WHERE 1=1
and b.[WarehouseId] = 'NCL'
AND a.[Status] <> '90'
GROUP BY b.[Location], a.Id
This query also returns the CORRECT records (11 rows):
Uses the query above as a sub-query, still comparing the same columns but via IN, and adds a forced CONVERT from NVARCHAR to VARCHAR.
SELECT *
FROM [configuration].[dbo].[M3_Customer]
WHERE 1=1
AND convert(varchar(10),[configuration].[dbo].[M3_Customer].[Id]) in (
SELECT convert(varchar(10),b.Location)
FROM [configuration].[dbo].[M3_Customer] a
inner join [configuration].[dbo].[M3_WarehouseInventory] b
on a.Id = b.Location
WHERE 1=1
and b.[WarehouseId] = 'NCL'
AND a.[Status] <> '90'
GROUP BY b.[Location], a.Id
)
This query returns INCORRECT results (10 rows):
The only difference from the query above is that the comparison columns are converted to NVARCHAR.
SELECT *
FROM [configuration].[dbo].[M3_Customer]
WHERE 1=1
AND convert(nvarchar(10),[configuration].[dbo].[M3_Customer].[Id]) in (
SELECT convert(nvarchar(10),b.Location)
FROM [configuration].[dbo].[M3_Customer] a
inner join [configuration].[dbo].[M3_WarehouseInventory] b
on a.Id = b.Location
WHERE 1=1
and b.[WarehouseId] = 'NCL'
AND a.[Status] <> '90'
GROUP BY b.[Location], a.Id
)
This is the original query that returned INCORRECT results - 10 records, when it should return 11:
The issue appears tied to the difference between an INNER JOIN and IN, as well as varchar vs nvarchar.
SELECT *
FROM [configuration].[dbo].[M3_Customer]
WHERE 1=1
AND [configuration].[dbo].[M3_Customer].[Id] IN (
SELECT [configuration].[dbo].[M3_WarehouseInventory].[Location]
FROM [configuration].[dbo].[M3_WarehouseInventory]
WHERE 1=1
and [configuration].[dbo].[M3_WarehouseInventory].[WarehouseId] = 'NCL'
GROUP BY [configuration].[dbo].[M3_WarehouseInventory].[Location]
)
AND [configuration].[dbo].[M3_Customer].[Status] <> '90'
The INNER JOIN logic works with nvarchar.
The IN logic works when converting to varchar.
The IN logic using nvarchar fails to return one of the records.
We applied LTRIM(RTRIM()) to all columns at one point, but that did not resolve the issue.
Only the combination of IN and converting the comparison columns to varchar resolved the issue.
Why?
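Not a definitive answer, but a diagnostic sketch that may help narrow it down (table and column names taken from the queries above): dumping the raw bytes of both sides of the comparison often exposes hidden trailing characters or collation quirks that behave differently under implicit nvarchar/varchar conversion.
-- Diagnostic sketch: show each NCL Location next to the Customer.Id it matches
-- (if any) and the raw bytes of both values, so hidden characters stand out.
SELECT DISTINCT
       b.Location,
       c.Id,
       CONVERT(varbinary(20), b.Location) AS location_bytes,
       CONVERT(varbinary(20), c.Id)       AS id_bytes
FROM [configuration].[dbo].[M3_WarehouseInventory] b
LEFT JOIN [configuration].[dbo].[M3_Customer] c
       ON c.Id = b.Location
WHERE b.WarehouseId = 'NCL';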

Related

Postgres join involving tables having a join condition defined on a text array

I have two tables in PostgreSQL.
One table is of the form
Create table table1(
ID serial PRIMARY KEY,
Type text[]
)
Create table table2(
type text,
sellerID int
)
Now I want to get all the rows from table1 whose type matches a type in table2, but the problem is that in table1 the type is an array.
In case the type in the table has an identifiable delimiter like ',' or ';', you can rewrite the query using regexp_split_to_table(type, ','); on versions later than 9.5 the unnest function can be used too.
For example:
select *
from (select id, regexp_split_to_table(type, ',') as type from table1) table1
inner join
(select * from table2) table2
on trim(table1.type) = trim(table2.type)
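Since Type in table1 is actually a Postgres array (text[]), a sketch using unnest, mentioned above, avoids string splitting altogether (assuming the table definitions from the question):
-- Expand each table1 row into one row per array element, then join on the element.
select t1.id, t2.sellerID
from table1 t1
cross join lateral unnest(t1.Type) as u(elem)
join table2 t2 on t2.type = u.elem;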
Another good example can be found at https://www.dbrnd.com/2017/03/postgresql-regexp_split_to_array-to-split-string-using-different-delimiters/
SELECT
a[1] AS DiskInfo
,a[2] AS DiskNumber
,a[3] AS MessageKeyword
FROM (
SELECT regexp_split_to_array('Postgres Disk information , disk 2 , failed', ',')
) AS dt(a)
You can use the ANY operator in the JOIN condition:
select *
from table1 t1
join table2 t2 on t2.type = any (t1.type);
Note that if the types in table1 match multiple rows in table2, you would get duplicates (from table1), because that's how a join works. Maybe you want an EXISTS condition instead:
select *
from table1 t1
where exists (select *
from table2 t2
where t2.type = any(t1.type));

Postgres JOIN on timestamp fails

Trying to do a simple FULL OUTER JOIN on a timestamp, and it is outputting the full Cartesian product instead of matching identical dates. What is wrong here?
SQL Fiddle with example data
CREATE TABLE A (
id INT,
time TIMESTAMP
);
CREATE TABLE B (
id INT,
time TIMESTAMP
);
Query:
SELECT A.Id AS a_id, A.Time AS a_time, B.Id AS b_id, B.Time AS b_time
FROM A
FULL OUTER JOIN B ON A.Time = B.Time
-- This works:
-- SELECT A.id, A.time, B.id, B.time
-- FROM A
-- FULL OUTER JOIN B ON A.id = B.id
You are using the wrong format parameters in TO_DATE() in your INSERTs. This is easy to test if you run:
SELECT * FROM A;
SELECT * FROM B;
Instead of
TO_DATE('01-01-2002', '%d-%m-%Y')
It should be:
TO_DATE('01-01-2002', 'DD-MM-YYYY')
SQL DEMO
In your SQL Fiddle, all the inserted dates end up the same because the date pattern is wrong. Try using TO_DATE('01-01-2002', 'DD-MM-YYYY') instead of TO_DATE('01-01-2002', '%d-%m-%Y').
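For reference, a sketch with made-up sample values showing the corrected inserts and the original join (table and column names from the question):
-- 'DD-MM-YYYY' is a Postgres to_date() template pattern; the '%' style belongs to strftime-like formatters.
INSERT INTO A (id, time) VALUES (1, TO_DATE('01-01-2002', 'DD-MM-YYYY'));
INSERT INTO B (id, time) VALUES (1, TO_DATE('02-01-2002', 'DD-MM-YYYY'));
-- With correctly parsed (and now distinct) dates, the FULL OUTER JOIN matches
-- on equal timestamps instead of degenerating into a cartesian product.
SELECT A.id AS a_id, A.time AS a_time, B.id AS b_id, B.time AS b_time
FROM A
FULL OUTER JOIN B ON A.time = B.time;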

TSQL - Why does sysname appear when I create an nvarchar column?

I have a table in my T-SQL database:
CREATE TABLE dbo.Test
(
Col nVarChar (50) null
)
GO
And then I executed this query:
Select
c.name As Name, ty.name as Type, c.max_length As MaxLength, c.precision As Precision, c.scale As Scale, c.is_nullable As IsNullable, *
From
sys.schemas s
inner join sys.tables t on s.schema_id = t.schema_id
inner join sys.columns c on t.object_id = c.object_id
inner join sys.types ty on ty.system_type_id = c.system_type_id
Where
s.name LIKE 'dbo' AND t.name LIKE 'Test'
The question is... why are there two rows?!
Check this:
SELECT ROW_NUMBER() OVER(PARTITION BY system_type_id ORDER BY system_type_id)
, *
FROM sys.types;
Check the first column for values >1...
There are a few types mapping to the same system_type_id. Some names are just aliases for something else...
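For context: sysname is an alias type for nvarchar(128) and shares nvarchar's system_type_id, which is why your column matches two rows in sys.types. A minimal sketch of the same query joined on user_type_id instead, which returns one row per column:
Select
c.name As Name, ty.name as Type, c.max_length As MaxLength, c.precision As Precision, c.scale As Scale, c.is_nullable As IsNullable
From
sys.schemas s
inner join sys.tables t on s.schema_id = t.schema_id
inner join sys.columns c on t.object_id = c.object_id
inner join sys.types ty on ty.user_type_id = c.user_type_id
Where
s.name LIKE 'dbo' AND t.name LIKE 'Test'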
UPDATE
This question addresses the same issue...

How to optimise a SQL query to check for consistency of column values across tables

I would like to check across multiple tables that the same keys / same number of keys are present in each of the tables.
Currently I have created a solution that checks the count of keys per individual table, checks the count of keys when all tables are merged together, then compares.
This solution works but I wonder if there is a more optimal solution...
Example solution as it stands:
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_a;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_b;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_c;
SELECT COUNT(DISTINCT a.variable) AS num_ids
FROM (SELECT DISTINCT VARIABLE FROM table_a) a
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_b) b ON a.variable = b.variable
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_c) c ON a.variable = c.variable;
UPDATE:
The difficulty I'm facing in putting this together in one query is that any of the tables might not be unique on the VARIABLE I'm checking, so I've had to use DISTINCT before merging to avoid expanding the join.
Since we are only counting, I think there is no need to join the tables on the variable column. A UNION should be enough.
We still have to use DISTINCT to ignore/suppress duplicates, which often means an extra sort.
An index on variable should help when getting the counts for the separate tables, but it will not help when getting the count of the combined table.
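A plain PostgreSQL sketch of such indexes (the index names are made up; this assumes an engine that supports CREATE INDEX):
CREATE INDEX idx_table_a_variable ON table_a (variable);
CREATE INDEX idx_table_b_variable ON table_b (variable);
CREATE INDEX idx_table_c_variable ON table_c (variable);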
Here is an example for comparing two tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_AB
AS
(
SELECT COUNT(DISTINCT variable) AS CountAB
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
) AS AB
)
SELECT
CASE WHEN CountA = CountAB AND CountB = CountAB
THEN 'same' ELSE 'different' END AS ResultAB
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_AB
;
Three tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_C
AS
(
SELECT COUNT(DISTINCT variable) AS CountC
FROM TableC
)
,CTE_ABC
AS
(
SELECT COUNT(DISTINCT variable) AS CountABC
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableC
) AS ABC
)
SELECT
CASE WHEN CountA = CountABC AND CountB = CountABC AND CountC = CountABC
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
CROSS JOIN CTE_ABC
;
I deliberately chose CTEs because, as far as I know, Postgres materializes CTEs, and in our case each CTE will have only one row.
Using array_agg with ORDER BY is an even better variant, if it is available on Redshift. You'll still need DISTINCT, but you don't have to merge all the tables together.
WITH
CTE_A
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS A
FROM TableA
)
,CTE_B
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS B
FROM TableB
)
,CTE_C
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS C
FROM TableC
)
SELECT
CASE WHEN A = B AND B = C
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
;
Well, here is probably the nastiest piece of SQL I could build for you :) I will forever deny that I wrote this and will claim my Stack Overflow account was hacked ;)
SELECT
'All OK'
WHERE
( SELECT COUNT(DISTINCT id) FROM table_a ) = ( SELECT COUNT(DISTINCT id) FROM table_b )
AND ( SELECT COUNT(DISTINCT id) FROM table_b ) = ( SELECT COUNT(DISTINCT id) FROM table_c )
By the way, this won't optimise the query - it's still doing three queries (but I guess it's better than 4?).
UPDATE: In light of your use-case below: NEW sql fiddle http://sqlfiddle.com/#!15/a0403/1
SELECT DISTINCT
tbl_a.a_count,
tbl_b.b_count,
tbl_c.c_count
FROM
( SELECT COUNT(id) a_count, array_agg(id order by id) ids FROM table_a) tbl_a,
( SELECT COUNT(id) b_count, array_agg(id order by id) ids FROM table_b) tbl_b,
( SELECT COUNT(id) c_count, array_agg(id order by id) ids FROM table_c) tbl_c
WHERE
tbl_a.ids = tbl_b.ids
AND tbl_b.ids = tbl_c.ids
The above query only returns a row when all the tables contain exactly the same set of IDs; comparing the sorted arrays guarantees that both the counts and the values match.

How to optimize SELECT DISTINCT when using multiple Joins?

I have read that using CTEs you can speed up a SELECT DISTINCT by up to 100 times. Link to the website. They have the following example:
USE tempdb;
GO
DROP TABLE dbo.Test;
GO
CREATE TABLE
dbo.Test
(
data INTEGER NOT NULL,
);
GO
CREATE CLUSTERED INDEX c ON dbo.Test (data);
GO
-- Lots of duplicated values
INSERT dbo.Test WITH (TABLOCK)
(data)
SELECT TOP (5000000)
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) / 117329
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
WITH RecursiveCTE
AS (
SELECT data = MIN(T.data)
FROM dbo.Test T
UNION ALL
SELECT R.data
FROM (
-- A cunning way to use TOP in the recursive part of a CTE :)
SELECT T.data,
rn = ROW_NUMBER() OVER (ORDER BY T.data)
FROM dbo.Test T
JOIN RecursiveCTE R
ON R.data < T.data
) R
WHERE R.rn = 1
)
SELECT *
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
How would one apply this to a query that has multiple joins? For example, I am trying to run the query below, but it takes roughly two and a half minutes. How would I optimize it accordingly?
SELECT DISTINCT x.code
From jpa
INNER JOIN jp ON jpa.ID=jp.ID
INNER JOIN jd ON (jd.ID=jp.ID And jd.JID=3)
INNER JOIN l ON jpa.ID=l.ID AND l.CID=3
INNER JOIN fa ON fa.ID=jpa.ID
INNER JOIN x ON fa.ID=x.ID
1) GROUP BY on every selected column worked faster for me (see the sketch below).
2) If some of the tables contain duplicates, you can also pre-select the distinct rows and join from that as an inner query.
3) In general, you can nest joins if you expect a particular join to limit the data.
See also: SQL join format - nested inner joins
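To illustrate point 1, a sketch of the question's query rewritten with GROUP BY in place of DISTINCT (same tables and join conditions as above):
-- GROUP BY on the selected column replaces DISTINCT; the result set is identical.
SELECT x.code
FROM jpa
INNER JOIN jp ON jpa.ID = jp.ID
INNER JOIN jd ON jd.ID = jp.ID AND jd.JID = 3
INNER JOIN l ON jpa.ID = l.ID AND l.CID = 3
INNER JOIN fa ON fa.ID = jpa.ID
INNER JOIN x ON fa.ID = x.ID
GROUP BY x.code;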