How to do a case-INSENSITIVE check using IS DISTINCT FROM - postgresql

I have to compare the values of multiple columns (character, integer, and date types) from 2 tables.
I have used the following type of SQL (the actual SQL has 100+ columns on each side of IS DISTINCT FROM).
SELECT table1.* from
Table1 JOIN table2 ON ( table1.id = table2.id )
WHERE
( table1.name, table1.email, table1.dob, table1.application_id )
IS DISTINCT FROM
( table2.name, table2.email, table2.dob, table2.application_id );
If the email or name differs only in case, the rows are considered distinct.
Using LOWER() solves the problem, but I would need to write it for each text column. Can I do this in any other way?
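For reference, the LOWER() variant mentioned above looks like the first query in this sketch. The second query is one possible shortcut, not a confirmed answer: it turns each column list into a single text value with ROW(...)::text and lower-cases it once. It assumes every text column in the list may be compared case-insensitively, and that equal values in the non-text columns always have the same text form (not true for, e.g., numeric 1.0 vs 1.00).

```sql
-- Baseline from the question: wrap each text column in LOWER().
SELECT table1.*
FROM table1
JOIN table2 ON table1.id = table2.id
WHERE (LOWER(table1.name), LOWER(table1.email), table1.dob, table1.application_id)
      IS DISTINCT FROM
      (LOWER(table2.name), LOWER(table2.email), table2.dob, table2.application_id);

-- Possible shortcut (assumption: lower-casing every listed column is acceptable):
-- cast the whole row to text once and compare the lower-cased strings.
SELECT table1.*
FROM table1
JOIN table2 ON table1.id = table2.id
WHERE LOWER(ROW(table1.name, table1.email, table1.dob, table1.application_id)::text)
      IS DISTINCT FROM
      LOWER(ROW(table2.name, table2.email, table2.dob, table2.application_id)::text);
```

The column lists still have to be spelled out, but no per-column function call is needed in the second form.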

Related

nvarchar vs varchar \ IN vs JOIN

Question regarding "IN" versus "INNER JOIN" and VARCHAR versus NVARCHAR
Examples lay it out...
This query returns the CORRECT results (11 count):
Uses nvarchar(10) data columns for comparison (Id and Location) via INNER JOIN
SELECT b.Location
FROM [configuration].[dbo].[M3_Customer] a
inner join [configuration].[dbo].[M3_WarehouseInventory] b
on a.Id = b.Location
WHERE 1=1
and b.[WarehouseId] = 'NCL'
AND a.[Status] <> '90'
GROUP BY b.[Location], a.Id
This query returns the CORRECT records (11 count):
Uses the same query above as a sub-query, still comparing the same columns but using IN, and adds a forced CONVERT from NVARCHAR to VARCHAR.
SELECT *
FROM [configuration].[dbo].[M3_Customer]
WHERE 1=1
AND convert(varchar(10),[configuration].[dbo].[M3_Customer].[Id]) in (
SELECT convert(varchar(10),b.Location)
FROM [configuration].[dbo].[M3_Customer] a
inner join [configuration].[dbo].[M3_WarehouseInventory] b
on a.Id = b.Location
WHERE 1=1
and b.[WarehouseId] = 'NCL'
AND a.[Status] <> '90'
GROUP BY b.[Location], a.Id
)
This query returns INCORRECT results (10 count):
The only difference between this query and the one above is that the comparison columns are converted to NVARCHAR
SELECT *
FROM [configuration].[dbo].[M3_Customer]
WHERE 1=1
AND convert(nvarchar(10),[configuration].[dbo].[M3_Customer].[Id]) in (
SELECT convert(nvarchar(10),b.Location)
FROM [configuration].[dbo].[M3_Customer] a
inner join [configuration].[dbo].[M3_WarehouseInventory] b
on a.Id = b.Location
WHERE 1=1
and b.[WarehouseId] = 'NCL'
AND a.[Status] <> '90'
GROUP BY b.[Location], a.Id
)
This is the original query that returned INCORRECT results - 10 records, should be returning 11 records:
The issue is associated with the difference between an INNER JOIN and IN, as well as varchar vs nvarchar
SELECT *
FROM [configuration].[dbo].[M3_Customer]
WHERE 1=1
AND [configuration].[dbo].[M3_Customer].[Id] IN (
SELECT [configuration].[dbo].[M3_WarehouseInventory].[Location]
FROM [configuration].[dbo].[M3_WarehouseInventory]
WHERE 1=1
and [configuration].[dbo].[M3_WarehouseInventory].[WarehouseId] = 'NCL'
GROUP BY [configuration].[dbo].[M3_WarehouseInventory].[Location]
)
AND [configuration].[dbo].[M3_Customer].[Status] <> '90'
The INNER JOIN logic works with nvarchar.
The IN logic works when converting to varchar.
The IN logic using nvarchar failed to return one of the records (10 instead of 11).
We applied LTRIM(RTRIM()) to all columns at one point, but that did not resolve the issue.
Only the combination of IN and converting the comparison columns to varchar resolved the issue.
Why?

Postgres Crosstab query with CTE (with clause)

I recently started working with Postgres and need to pivot data.
I wrote the following query:
select *
from crosstab (
$$
with tmp_kv as (
select distinct pat_id
,col.name as key, replace(replace(replace(value, '[',''), ']', ''),'"','') as value
from (
select p.Id as pat_id, nullif(kv.key,'undefined')::int as key, trim(kv.value::text,'"') as value
from pat_table p
left join e_table e on e.pat_id = p.id and e.id is null
,jsonb_each_text(p.data) as kv
) t
left join lateral (
select name::text as name from public.config_fields fld
where id = t.key
) col on true
)
select pat_id, key, value
from tmp_kv
where nullif(trim(key),'') is not null
order by pat_id, key
$$,$$
select distinct key from tmp_kv -- (Get error "relation "tmp_kv" does not exist" )
where nullif(trim(key),'') is not null
order by 1
$$
) as (
pat_id bigint
...
...
);
The query works if I move the WITH clause out into a temporary table. But it will be deployed to production with read replicas, so it needs to work with a CTE. Is there a way?
The two queries passed as strings to the crosstab() function are separate queries.
A CTE can only be attached to a single query.
What you ask for is strictly impossible.
Since you have to spell out the (static) return type for crosstab() anyway, and the result of the query in the 2nd parameter has to match that, it's pointless to use a query with a dynamic result as 2nd parameter to begin with.
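To illustrate the last point, here is a minimal sketch with a static category list as the 2nd parameter. The sample rows and the category names 'age' and 'city' are made up for illustration; in the real query they would be the known names from config_fields, and the first string would hold the full CTE query.

```sql
-- crosstab() is provided by the tablefunc extension.
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    $$
    -- Source query: must return (row name, category, value), ordered by row name.
    SELECT pat_id, key, value
    FROM (VALUES (1::bigint, 'age',  '42'),
                 (1,         'city', 'Oslo'),
                 (2,         'age',  '37')) AS kv(pat_id, key, value)
    ORDER BY pat_id, key
    $$,
    $$
    -- Static category list: it has to match the column definition list below anyway.
    VALUES ('age'), ('city')
    $$
) AS ct (pat_id bigint, age text, city text);
```

Because the category list is spelled out, the second string no longer needs to see tmp_kv at all.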

Postgres join involving tables having join condition defined on an text array

I have two tables in postgresql
One table is of the form
Create table table1(
ID serial PRIMARY KEY,
type text[]
);
Create table table2(
type text,
sellerID int
);
Now I want to get all the rows from table1 whose type matches a type in table2, but the problem is that in table1 the type is an array.
If the type column is stored as text with an identifiable delimiter such as ',' or ';', you can split it into rows with regexp_split_to_table(type, ','). For a true array column, the unnest function can be used instead.
For example:
select *
from ( select id, regexp_split_to_table(type, ',') as type from table1 ) t1
inner join table2 t2
on trim(t1.type) = trim(t2.type);
Another good example can be found - https://www.dbrnd.com/2017/03/postgresql-regexp_split_to_array-to-split-string-using-different-delimiters/
SELECT
a[1] AS DiskInfo
,a[2] AS DiskNumber
,a[3] AS MessageKeyword
FROM (
SELECT regexp_split_to_array('Postgres Disk information , disk 2 , failed', ',')
) AS dt(a)
You can use the ANY operator in the JOIN condition:
select *
from table1 t1
join table2 t2 on t2.type = any (t1.type);
Note that if the types in a table1 row match multiple rows in table2, you would get duplicates (from table1) because that's how a join works. Maybe you want an EXISTS condition instead:
select *
from table1 t1
where exists (select *
from table2 t2
where t2.type = any(t1.type));
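The difference between the two queries is easiest to see on a small data set. This is a minimal sketch; the temp tables and sample rows are assumptions made up for illustration.

```sql
CREATE TEMP TABLE table1 (id serial PRIMARY KEY, type text[]);
CREATE TEMP TABLE table2 (type text, sellerid int);

INSERT INTO table1 (type) VALUES ('{a,b}'), ('{c}');
INSERT INTO table2 VALUES ('a', 10), ('b', 20), ('z', 30);

-- JOIN: table1 row 1 matches both 'a' and 'b', so it is returned twice.
SELECT t1.id, t2.sellerid
FROM table1 t1
JOIN table2 t2 ON t2.type = ANY (t1.type);

-- EXISTS: table1 row 1 is returned once, however many table2 rows match.
SELECT t1.id
FROM table1 t1
WHERE EXISTS (SELECT *
              FROM table2 t2
              WHERE t2.type = ANY (t1.type));
```

Row 2 ('{c}') has no match in table2 and is returned by neither query.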

PostgreSQL - Append a table to another and add a field without listing all fields

I have two tables:
table_a with fields item_id,rank, and 50 other fields.
table_b with fields item_id, and the same 50 fields as table_a
I need to write a SELECT query that adds the rows of table_b to table_a but with rank set to a specific value, let's say 4.
Currently I have:
SELECT * FROM table_a
UNION
SELECT item_id, 4 rank, field_1, field_2, ...
How can I join the two tables together without writing out all of the fields and without using an INSERT query?
EDIT:
My idea is to join table_b to table_a somehow with the rank field remaining empty, then simply replace the null rank fields. The rank field is never null, but item_id can be duplicated and table_a may have item_id values that are not in table_b, and vice-versa.
I am not sure I understand why you need this, but you can use jsonb functions:
select (jsonb_populate_record(null::table_a, row)).*
from (
select to_jsonb(a) as row
from table_a a
union
select to_jsonb(b) || '{"rank": 4}'
from table_b b
) s
order by item_id;
Working example in rextester.
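To try the jsonb approach end to end, here is a minimal sketch on made-up data. A single field_1 stands in for the 50 shared fields, and the sample rows are assumptions for illustration only.

```sql
CREATE TEMP TABLE table_a (item_id int, rank int, field_1 text);
CREATE TEMP TABLE table_b (item_id int, field_1 text);

INSERT INTO table_a VALUES (1, 1, 'x');
INSERT INTO table_b VALUES (2, 'y');

-- Each row becomes a jsonb object; table_b rows get a "rank" key added,
-- then every object is expanded back into the column layout of table_a.
SELECT (jsonb_populate_record(null::table_a, j)).*
FROM (
    SELECT to_jsonb(a) AS j FROM table_a a
    UNION
    SELECT to_jsonb(b) || '{"rank": 4}'::jsonb FROM table_b b
) s
ORDER BY item_id;
```

Columns are matched by name rather than by position, which is why the field list never has to be written out.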
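I'm pretty sure I've got it. The predefined rank column can be inserted into table_b by joining table_b to a subset of itself that contains only the columns to the left of the position where the new column should go.
WITH
-- DISTINCT keeps duplicated item_id values from multiplying rows in the join
_leftcols AS ( SELECT DISTINCT item_id, 4 rank FROM table_b ),
-- _leftcols goes first so the output order is item_id, rank, then the remaining table_b columns
_combined AS ( SELECT * FROM _leftcols JOIN table_b USING (item_id) )
SELECT * FROM _combined
UNION
SELECT * FROM table_a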

How to optimise a SQL query to check for consistency of column values across tables

I would like to check across multiple tables that the same keys / same number of keys are present in each of the tables.
Currently I have created a solution that checks the count of keys per individual table, checks the count of keys when all tables are merged together, then compares.
This solution works but I wonder if there is a more optimal solution...
Example solution as it stands:
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_a;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_b;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_c;
SELECT COUNT(DISTINCT a.variable) AS num_ids
FROM (SELECT DISTINCT VARIABLE FROM table_a) a
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_b) b ON a.variable = b.variable
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_c) c ON a.variable = c.variable;
UPDATE:
The difficulty that I'm facing in putting this together in one query is that any of the tables might not be unique on the VARIABLE that I am looking to check, so I've had to use DISTINCT before merging to avoid expanding the join.
Since we are only counting, I think there is no need in joining the tables on the variable column. A UNION should be enough.
We still have to use DISTINCT to ignore/suppress duplicates, which often means an extra sort.
An index on variable should help for getting counts for separate tables, but it will not help for getting the count of the combined table.
Here is an example for comparing two tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_AB
AS
(
SELECT COUNT(DISTINCT variable) AS CountAB
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
) AS AB
)
SELECT
CASE WHEN CountA = CountAB AND CountB = CountAB
THEN 'same' ELSE 'different' END AS ResultAB
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_AB
;
Three tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_C
AS
(
SELECT COUNT(DISTINCT variable) AS CountC
FROM TableC
)
,CTE_ABC
AS
(
SELECT COUNT(DISTINCT variable) AS CountABC
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableC
) AS AB
)
SELECT
CASE WHEN CountA = CountABC AND CountB = CountABC AND CountC = CountABC
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
CROSS JOIN CTE_ABC
;
I deliberately chose CTEs because, as far as I know, Postgres materializes each CTE (always before version 12; from version 12 on, a CTE referenced only once may be inlined unless it is declared AS MATERIALIZED), and in our case each CTE will have only one row.
Using array_agg with ORDER BY is an even better variant, if it is available on Redshift. You'll still need to use DISTINCT, but you don't have to merge all tables together.
WITH
CTE_A
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS A
FROM TableA
)
,CTE_B
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS B
FROM TableB
)
,CTE_C
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS C
FROM TableC
)
SELECT
CASE WHEN A = B AND B = C
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
;
Well, here is probably the nastiest piece of SQL I could build for you :) I will forever deny that I wrote this and claim that my stackoverflow account was hacked ;)
SELECT
'All OK'
WHERE
( SELECT COUNT(DISTINCT id) FROM table_a ) = ( SELECT COUNT(DISTINCT id) FROM table_b )
AND ( SELECT COUNT(DISTINCT id) FROM table_b ) = ( SELECT COUNT(DISTINCT id) FROM table_c )
By the way, this won't optimise the query - it's still doing three queries (but I guess it's better than 4?).
UPDATE: In light of your use-case below: NEW sql fiddle http://sqlfiddle.com/#!15/a0403/1
SELECT DISTINCT
tbl_a.a_count,
tbl_b.b_count,
tbl_c.c_count
FROM
( SELECT COUNT(id) a_count, array_agg(id order by id) ids FROM table_a) tbl_a,
( SELECT COUNT(id) b_count, array_agg(id order by id) ids FROM table_b) tbl_b,
( SELECT COUNT(id) c_count, array_agg(id order by id) ids FROM table_c) tbl_c
WHERE
tbl_a.ids = tbl_b.ids
AND tbl_b.ids = tbl_c.ids
The above query only returns a row if all tables contain the same list of IDs, which also means they have the same number of rows.