Find all tables having data in a given column - postgresql

I have a set of around 500 schemas, and many of them have common columns. Whenever I have an update, I have to manually check all schemas having those columns and update them if they have the data.
I was trying to get all the tables having those columns, together with the number of rows for a specific column value.
E.g. let's say I have a col1 column in tables A, B and C. Can I get data in the following format?
col1   table   number
1005   A       3
1005   B       4
1005   C       5
1006   A       7
Here 1005 is a value in col1, A is the table, and 3 is the number of rows in table A with 1005 in col1.
Kindly excuse my formatting and lack of queries because I posted this question from mobile.
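(The set of candidate tables itself comes straight from the catalog; the function in the answer below wraps this lookup plus the per-value counting into one call:)
SELECT table_schema, table_name
FROM information_schema.columns
WHERE column_name = 'col1';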

Create the function below and use it to extract the data:
DROP FUNCTION IF EXISTS fun_test (CHARACTER VARYING);
DROP TYPE IF EXISTS fun_test_out;
CREATE TYPE fun_test_out AS (
    "schema_name" VARCHAR(255),
    "table_name" VARCHAR(255),
    "column_value" VARCHAR(255),
    "count" INT
);
CREATE OR REPLACE FUNCTION fun_test (colname CHARACTER VARYING)
RETURNS SETOF fun_test_out
AS
$$
DECLARE
    r fun_test_out;
    l_colname VARCHAR(255);
    l_cte TEXT;
    l_insert TEXT;
    tables RECORD;
BEGIN
    l_colname := colname;
    DROP TABLE IF EXISTS tmp_output;
    CREATE TEMP TABLE tmp_output
    (
        schema_name VARCHAR(255),
        table_name VARCHAR(255),
        column_value VARCHAR(255),
        count INT
    );
    DROP TABLE IF EXISTS tmp_tablename;
    CREATE TEMP TABLE tmp_tablename
    (
        table_schema VARCHAR(255),
        table_name VARCHAR(255),
        column_name VARCHAR(255)
    );
    -- collect every table that has the requested column
    l_cte := 'INSERT INTO tmp_tablename ' || chr(10) ||
             'SELECT table_schema, table_name, column_name' || chr(10) ||
             'FROM information_schema.columns WHERE column_name = ''' || l_colname || '''';
    EXECUTE l_cte;
    -- count the rows per distinct value of that column in each table
    FOR tables IN
        SELECT table_schema, table_name, column_name
        FROM tmp_tablename
    LOOP
        l_insert := 'INSERT INTO tmp_output ' || chr(10) ||
                    'SELECT ''' || tables.table_schema || ''',''' || tables.table_name || ''',' || tables.column_name || ', COUNT(*)' || chr(10) ||
                    'FROM ' || tables.table_schema || '.' || tables.table_name || chr(10) ||
                    'GROUP BY ' || tables.column_name;
        EXECUTE l_insert;
    END LOOP;
    /******************************************************************
    FINAL SELECT
    ******************************************************************/
    FOR r IN
        SELECT *
        FROM tmp_output
    LOOP
        RETURN NEXT r;
    END LOOP;
    DROP TABLE IF EXISTS tmp_output;
    DROP TABLE IF EXISTS tmp_tablename;
END
$$
LANGUAGE plpgsql;
You can call the function using the statement below:
SELECT * FROM fun_test('column_name');
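As a side note, the concatenation above breaks on mixed-case or otherwise quoted identifiers. If that matters, the inner INSERT can be built with format() instead, where %I quotes identifiers and %L quotes literals; a sketch of just that statement:
l_insert := format(
    'INSERT INTO tmp_output
     SELECT %L, %L, %I::varchar, COUNT(*)
     FROM %I.%I
     GROUP BY %I',
    tables.table_schema, tables.table_name, tables.column_name,
    tables.table_schema, tables.table_name, tables.column_name);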

Related

Something wrong with execute in PostgreSQL

I made a function that will change the type of the columns of any table. For example, I create a table test:
create table test(id text, name text, born text);
insert into test values ('1', 'Ivanov', '10-10-2012'), ('2', 'Petrov', '01-01-1999'),
('3', 'Sidorov', '03-12-1975');
Then I make a table with column names and new data types:
create table col_types(column_name text, data_type text);
insert into col_types values ('id', 'integer'), ('name', 'text'),
('born', 'date');
This is my function:
create or replace function change_columns(my_table text, columns_types_table text)
returns void as $$
declare
    r text;
    cur_type text;
begin
    raise notice 'NOTICE smth: %', 1;
    for r in (select column_name from information_schema.columns
              where table_name = my_table) loop
        raise notice 'NOTICE r: %', 2;
        execute 'select data_type from ' -- ERROR here
            || quote_ident(columns_types_table)
            || ' where column_name = '
            || quote_ident(r)
            into strict cur_type;
        raise notice 'NOTICE cur_type: %', 3;
        execute 'alter table '
            || quote_ident(my_table)
            || ' alter column '
            || quote_ident(r)
            || ' type '
            || quote_ident(cur_type)
            || ' using '
            || r
            || '::'
            || cur_type;
    end loop;
end
$$
language plpgsql;
My ERROR:
ERROR: column "id" does not exist
Where: PL/pgSQL function change_columns(text,text) line 9 at EXECUTE
The function call:
select change_columns('test', 'col_types');
The full error message (at least in Postgres 14) is
ERROR: column "id" does not exist
LINE 1: select data_type from col_types where column_name = id
                                                            ^
QUERY: select data_type from col_types where column_name = id
CONTEXT: PL/pgSQL function change_columns(text,text) line 9 at EXECUTE
which makes it easy to see where you went wrong: id is treated as an identifier here. You actually wanted to run the query
select data_type from col_types where column_name = 'id'
--                                                  ^  ^
To generate this, you need to use quote_literal instead of quote_ident:
execute 'select data_type from '
|| quote_ident(columns_types_table)
|| ' where column_name = '
|| quote_literal(r)
into strict cur_type;
Notice I also had to change the alter column statement not to escape the type name.
An even better approach would be to use a parameterised query with a USING clause:
execute 'select data_type from '
|| quote_ident(columns_types_table)
|| ' where column_name = $1'
into strict cur_type
using r;
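Putting both fixes together, the corrected function would look roughly like this (a sketch; the type name is deliberately interpolated unquoted so that names like character varying work):
create or replace function change_columns(my_table text, columns_types_table text)
returns void as $$
declare
    r text;
    cur_type text;
begin
    for r in (select column_name from information_schema.columns
              where table_name = my_table) loop
        -- look up the target type, passing the column name as a query parameter
        execute 'select data_type from '
            || quote_ident(columns_types_table)
            || ' where column_name = $1'
            into strict cur_type
            using r;
        -- apply the type change
        execute 'alter table ' || quote_ident(my_table)
            || ' alter column ' || quote_ident(r)
            || ' type ' || cur_type
            || ' using ' || quote_ident(r) || '::' || cur_type;
    end loop;
end
$$ language plpgsql;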

Update Null columns to Zero dynamically in Redshift

Here is the code in SAS. It finds the numeric columns with missing values and replaces them with 0s:
DATA dummy_table;
    SET dummy_table;
    ARRAY DUMMY _NUMERIC_;
    DO OVER DUMMY;
        IF DUMMY=. THEN DUMMY=0;
    END;
RUN;
I am trying to replicate this in Redshift; here is what I tried:
create or replace procedure sp_replace_null_to_zero(IN tbl_nm varchar) as $$
Begin
    Execute 'declare ' ||
        'tot_cnt int := (select count(*) from information_schema.columns where table_name = ' || tbl_nm || ');' ||
        'init_loop int := 0; ' ||
        'cn_nm varchar; '
    Begin
        While init_loop <= tot_cnt
        Loop
            Raise info 'init_loop = %', Init_loop;
            Raise info 'tot_cnt = %', tot_cnt;
            Execute 'Select column_name into cn_nm from information_schema.columns ' ||
                'where table_name ='|| tbl_nm || ' and ordinal_position = init_loop ' ||
                'and data_type not in (''character varying'',''date'',''text''); '
            Raise info 'cn_nm = %', cn_nm;
            if cn_nm is not null then
                Execute 'Update ' || tbl_nm ||
                    'Set ' || cn_nm = 0 ||
                    'Where ' || cn_nm is null or cn_nm =' ';
            end if;
            init_loop = init_loop + 1;
        end loop;
    End;
End;
$$ language plpgsql;
Issues I am facing:
When I pass the input parameter here, I am getting a count of 0:
tot_cnt int := (select count(*) from information_schema.columns where table_name = ' || tbl_nm || ');'
For testing purposes I tried hardcoding the table name inside the proc, and I get the error: amazon invalid operation: value for domain information_schema.cardinal_number violates check constraint "cardinal_number_domain_check"
Is this even possible in Redshift? How can I implement this logic, or is there another workaround?
Expert advice needed here!!
You can simply run an UPDATE over the table(s) using the NVL(cn_nm,0) function
UPDATE tbl_raw
SET col2 = NVL(col2,0);
However, UPDATE is a fairly expensive operation. Consider just using a view over your table that wraps the columns in NVL(cn_nm,0):
CREATE VIEW tbl_clean
AS
SELECT col1
, NVL(col2,0) col2
FROM tbl_raw;
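If you don't want to maintain the column list by hand, the UPDATE statements can be generated from the catalog. A sketch (assuming these are the numeric types involved; run the statements it prints):
SELECT 'UPDATE ' || table_name
    || ' SET ' || column_name || ' = 0'
    || ' WHERE ' || column_name || ' IS NULL;'
FROM information_schema.columns
WHERE table_name = 'tbl_raw'
  AND data_type IN ('smallint', 'integer', 'bigint', 'numeric', 'real', 'double precision');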

How to call a stored procedure inside another stored procedure in postgreSQL

I have two tables, table_version_1 and table_version_2, and I am trying to generate a new table based on them. For this I have to write 4 stored procedures:
procedure1, procedure2, procedure3, procedure4
I am triggering procedure1 from my application:
select * from procedure1
In procedure1 I call procedure2:
PERFORM procedure2
In procedure2 I call procedure3:
PERFORM procedure3
And in procedure3 I call procedure4:
PERFORM procedure4
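(A minimal sketch of one link in that chain, with a hypothetical body; PERFORM evaluates a function and discards its result:)
CREATE OR REPLACE FUNCTION procedure3(tb_name text) RETURNS void AS $$
BEGIN
    -- ... this step's work goes here ...
    PERFORM procedure4(tb_name);  -- run the next step, ignoring its return value
END;
$$ LANGUAGE plpgsql;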
My stored procedure looks something like below.
CREATE OR REPLACE FUNCTION procedure1(tb_name text, compare_tb_name text, file_version text, compare_file_version text)
RETURNS text
LANGUAGE plpgsql
COST 100
VOLATILE
AS $BODY$
DECLARE
    createquery text;
    fpo_data jsonb;
    inddata jsonb;
    f_primary_key text;
    rowscount INTEGER;
    datacount INTEGER;
    fcdata1 text;
    fid INTEGER;
BEGIN
    createquery := 'CREATE TABLE IF NOT EXISTS ' || tb_name || '_' || file_version || '_' || compare_file_version || '_pk(
        id serial PRIMARY KEY,
        fc_pkey1 VARCHAR (250) NULL,
        fc_pkey2 VARCHAR (250) NULL,
        fc_pkey3 VARCHAR (250) NULL,
        fpo_data TEXT NULL,
        fid INTEGER NULL
    )';
    EXECUTE createquery;
    EXECUTE 'SELECT count(*) FROM ' || tb_name || '_' || file_version || '_' || compare_file_version || '_pk' INTO rowscount;
    EXECUTE 'SELECT count(*) FROM ' || tb_name INTO datacount;
    IF (rowscount <> datacount) THEN
        EXECUTE 'SELECT json_agg((fpdata, foseqid))::jsonb
                 FROM (SELECT fo_data AS fpdata, fo_seq_id AS foseqid
                       FROM ' || tb_name || '
                       LIMIT 1000
                 ) t' INTO fpo_data;
        FOR inddata IN SELECT * FROM jsonb_array_elements(fpo_data) LOOP
            EXECUTE 'INSERT INTO ' || tb_name || '_' || file_version || '_' || compare_file_version || '_pk(fc_pkey1, fpo_data, fid) VALUES ($1, $2, $3)' USING f_primary_key, inddata, fid;
        END LOOP;
    ELSE
        PERFORM procedure2(tb_name, compare_tb_name, file_version, compare_file_version);
    END IF;
    RETURN 'Primary Key Generation completed';
END;
$BODY$;
I have not written out the complete query here, just the important steps.
My issue is this: the query above has a CREATE, an INSERT and a SELECT, and at the end of the stored procedure I have written a RETURN statement. If I remove the RETURN, all my steps (the CREATE, INSERT and SELECT) fail, and if I keep the RETURN then it never gets to procedure2. What is the correct way to run this procedure?

pl/pgsql CTE insert with parent child tables and array of ROWTYPE

I have two history tables. One is the parent and the second is the detail. In this case they are history tables that track changes in another table.
CREATE TABLE IF NOT EXISTS history (
    id serial PRIMARY KEY,
    tablename text,
    row_id integer,
    ts timestamp,
    username text,
    source text,
    action varchar(10)
);
CREATE TABLE IF NOT EXISTS history_detail (
    id serial PRIMARY KEY,
    master_id integer NOT NULL references history(id),
    colname text,
    oldval text,
    newval text
);
I then have a function that compares an existing row with a new row. The compare seems straightforward to me. The part I am struggling with is inserting the differences into my history tables. During the compare I store the differences in an array of history_detail; of course, at that time I do not know what the id of the parent table row will be. That is where I am getting hung up.
CREATE OR REPLACE FUNCTION update_prescriber(_npi integer, colnames text[]) RETURNS VOID AS $$
DECLARE
    t text[];
    p text[];
    pos integer := 0;
    ts text;
    updstmt text := '';
    sstmt text := '';
    colname text;
    _id integer;
    _tstr text := '';
    _dtl history_detail%ROWTYPE;
    _dtls history_detail[] DEFAULT '{}';
BEGIN
    -- get the master table row id.
    SELECT id INTO _id FROM master WHERE npi = _npi;
    -- these select all the rows' column values cast as text.
    SELECT unnest_table('tempmaster', 'WHERE npi = ''' || _npi || '''') INTO t;
    SELECT unnest_table('master', 'WHERE npi = ''' || _npi || '''') INTO p;
    -- go through the arrays and compare values
    FOREACH ts IN ARRAY t
    LOOP
        pos := pos + 1;
        -- pos + 1 because the master table has the ID column
        IF p[pos + 1] != ts THEN
            colname := colnames[pos];
            updstmt := updstmt || ', ' || colname || '=t.' || colname;
            sstmt := sstmt || ',' || colname;
            _dtl.colname := colname;
            _dtl.oldval := p[pos + 1];
            _dtl.newval := ts;
            _dtls := array_append(_dtls, _dtl);
            RAISE NOTICE 'THERE IS a difference for COLUMN %, old: %, new: %', colname, p[pos + 1], ts;
        END IF;
    END LOOP;
    RAISE NOTICE '_dtls length: %', array_length(_dtls, 1);
    RAISE NOTICE '_dtls: %', _dtls;
    RAISE NOTICE 'done comparing: %', updstmt;
    IF length(updstmt) > 0 THEN
        WITH hist AS (
            INSERT INTO history
                (tablename, row_id, ts, username, source, action)
            VALUES
                ('master', _id, current_timestamp, 'me', 'source', 'update')
            RETURNING *
        ), dtls AS (
            SELECT hist.id_
            INSERT INTO history_detail
            --
            -- this is where I am having trouble
            --
        ;
        _tstr := 'UPDATE master
            SET ' || substr(updstmt, 2) || '
            FROM (SELECT ' || substr(sstmt, 2) || ' FROM tempmaster WHERE npi = ''' || _npi || ''') AS t
            WHERE master.id = ' || _id || ';';
        EXECUTE _tstr;
    END IF;
END;
$$ LANGUAGE plpgsql;
In an ideal world I would be able to do all of this in one statement. I know I could do it in multiple statements wrapped inside another BEGIN..END, but I would like to do it in the most efficient way possible. I don't think there is a way to get rid of the dynamic EXECUTE, but hopefully someone smarter than me can push me in the right direction.
Thanks for any help.
I was able to create a statement that would insert into the 2 history tables at once.
WITH hist AS (
    INSERT INTO history
        (tablename, row_id, ts, username, source, action)
    VALUES
        ('master', _id, current_timestamp, 'me', 'source', 'update')
    RETURNING id
), dtls AS (
    SELECT (my_row).*
    FROM unnest(_dtls) my_row
), inserts AS (
    SELECT hist.id AS master_id,
           dtls.colname AS colname,
           dtls.newval AS newval,
           dtls.oldval AS oldval
    FROM dtls, hist
)
INSERT INTO history_detail
    (master_id, colname, newval, oldval)
SELECT * FROM inserts;
I'd still like to add the column update as something that isn't an EXECUTE statement, but I really don't think that is possible.
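Since the SET list is assembled at runtime, the dynamic EXECUTE is indeed unavoidable; the most you can do is build the statement more safely. A sketch of the same UPDATE via format(), where %L quotes the literal and %s splices the precomputed fragments:
_tstr := format(
    'UPDATE master SET %s
     FROM (SELECT %s FROM tempmaster WHERE npi = %L) AS t
     WHERE master.id = %s',
    substr(updstmt, 2), substr(sstmt, 2), _npi, _id);
EXECUTE _tstr;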

Performance of joining on multiple columns with potential NULL values

Let's say we have the following table
CREATE TABLE my_table
(
record_id SERIAL,
column_1 INTEGER,
column_2 INTEGER,
column_3 INTEGER,
price NUMERIC
);
With the following data
INSERT INTO my_table (column_1, column_2, column_3, price) VALUES
(1, NULL, 1, 54.99),
(1, NULL, 1, 69.50),
(NULL, 2, 2, 54.99),
(NULL, 2, 2, 69.50),
(3, 3, NULL, 54.99),
(3, 3, NULL, 69.50);
Now we do something like
CREATE TABLE my_table_aggregations AS
SELECT
ROW_NUMBER() OVER () AS aggregation_id,
column_1,
column_2,
column_3
FROM my_table
GROUP BY
column_1,
column_2,
column_3;
What I want to do now is assign an aggregation_id to each record_id in my_table. Because I have NULL values, I can't simply join on t1.column_1 = t2.column_1: NULL = NULL evaluates to NULL, so the join would exclude those records.
Now I know that I should use something like this
SELECT
t.record_id,
agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
((t.column_1 IS NULL AND agg.column_1 IS NULL) OR t.column_1 = agg.column_1) AND
((t.column_2 IS NULL AND agg.column_2 IS NULL) OR t.column_2 = agg.column_2) AND
((t.column_3 IS NULL AND agg.column_3 IS NULL) OR t.column_3 = agg.column_3)
);
The problem here is that I am dealing with hundreds of millions of records and having an OR in the join seems to take forever to run.
There is an alternative, which is something like this
SELECT
t.record_id,
agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
COALESCE(t.column_1, -1) = COALESCE(agg.column_1, -1) AND
COALESCE(t.column_2, -1) = COALESCE(agg.column_2, -1) AND
COALESCE(t.column_3, -1) = COALESCE(agg.column_3, -1)
);
The problem with this is that I am assuming there is no -1 value in any of those columns.
Do note, this is an example; I am well aware I could use DENSE_RANK to get the same result, so let's pretend that isn't an option.
Is there some crazy awesome way to get around having to use COALESCE while keeping its performance advantage over the correct OR version? I ran tests, and the COALESCE is over 10 times faster than the OR.
I am running this on a Greenplum database so I am not sure if this performance difference is the same on a standard Postgres database.
Since my solution with NULLIF had performance problems, and your use of COALESCE was much faster, I wonder if you could try tweaking that solution to deal with the issue of -1. To do that, you could try casting to avoid false matches. I'm not sure what the performance hit would be, but it would look like:
SELECT
t.record_id,
agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
COALESCE(cast(t.column_1 as varchar), 'NA') =
COALESCE(cast(agg.column_1 as varchar), 'NA') AND
COALESCE(cast(t.column_2 as varchar), 'NA') =
COALESCE(cast(agg.column_2 as varchar), 'NA') AND
COALESCE(cast(t.column_3 as varchar), 'NA') =
COALESCE(cast(agg.column_3 as varchar), 'NA')
);
After doing some thinking, I decided the best approach to this is to dynamically find a value for each column that can be used as the second parameter in a COALESCE join. The function is rather long, but it does what I need and, more importantly, it keeps the COALESCE performance. The only downside is that getting the MIN values is an additional time cost, but we are talking a minute.
Here is the function:
CREATE OR REPLACE FUNCTION pg_temp.get_null_join_int_value
(
    left_table_schema TEXT,
    left_table_name TEXT,
    left_table_columns TEXT[],
    right_table_schema TEXT,
    right_table_name TEXT,
    right_table_columns TEXT[],
    output_table_schema TEXT,
    output_table_name TEXT
) RETURNS TEXT AS
$$
DECLARE
    column_name TEXT;
    sql TEXT;
    complete_sql TEXT;
    full_left_table TEXT;
    full_right_table TEXT;
    full_output_table TEXT;
BEGIN
    /*****************************
    VALIDATE PARAMS
    ******************************/
    -- this section validates all of the function parameters, ensuring that the values that cannot be NULL are not NULL
    -- it also rejects empty arrays, then ensures both arrays have the same length
    IF (left_table_name IS NULL) THEN
        RAISE EXCEPTION 'left_table_name cannot be NULL';
    ELSIF (left_table_columns IS NULL) THEN
        RAISE EXCEPTION 'left_table_columns cannot be NULL';
    ELSIF (right_table_name IS NULL) THEN
        RAISE EXCEPTION 'right_table_name cannot be NULL';
    ELSIF (right_table_columns IS NULL) THEN
        RAISE EXCEPTION 'right_table_columns cannot be NULL';
    ELSIF (output_table_name IS NULL) THEN
        RAISE EXCEPTION 'output_table_name cannot be NULL';
    ELSIF (array_upper(left_table_columns, 1) IS NULL) THEN
        RAISE EXCEPTION 'left_table_columns cannot be an empty array';
    ELSIF (array_upper(right_table_columns, 1) IS NULL) THEN
        RAISE EXCEPTION 'right_table_columns cannot be an empty array';
    ELSIF (array_upper(left_table_columns, 1) <> array_upper(right_table_columns, 1)) THEN
        RAISE EXCEPTION 'left_table_columns and right_table_columns must have a matching array length';
    END IF;
    /************************
    TABLE NAMES
    *************************/
    -- build the full name of each table
    -- the schema name can be NULL, which means that the table is temporary,
    -- so we need to detect whether to qualify the name with the schema
    IF (left_table_schema IS NOT NULL) THEN
        full_left_table = left_table_schema || '.' || left_table_name;
    ELSE
        full_left_table = left_table_name;
    END IF;
    IF (right_table_schema IS NOT NULL) THEN
        full_right_table = right_table_schema || '.' || right_table_name;
    ELSE
        full_right_table = right_table_name;
    END IF;
    IF (output_table_schema IS NOT NULL) THEN
        full_output_table = output_table_schema || '.' || output_table_name;
    ELSE
        full_output_table = output_table_name;
    END IF;
    /**********************
    LEFT TABLE
    ***********************/
    -- start to create the table which will store the min values from the left table
    sql =
        'DROP TABLE IF EXISTS temp_null_join_left_table;' || E'\n' ||
        'CREATE TEMP TABLE temp_null_join_left_table AS' || E'\n' ||
        'SELECT';
    -- loop through each column name in the left table column names parameter
    FOR column_name IN SELECT UNNEST(left_table_columns) LOOP
        -- find the minimum value in this column and subtract one;
        -- we will use this as a value we know is not in the column of this table
        sql = sql || E'\n\t' || 'MIN("' || column_name || '")-1 AS "' || column_name || '",';
    END LOOP;
    -- remove the trailing comma from the SQL
    sql = TRIM(TRAILING ',' FROM sql);
    -- finish the SQL to create the left table min values
    sql = sql || E'\n' || 'FROM ' || full_left_table || ';';
    -- run the query that creates the table which stores the minimum values for each column in the left table
    EXECUTE sql;
    -- store the sql which will be the return value of the function
    complete_sql = sql;
    /************************
    RIGHT TABLE
    *************************/
    -- the same again for the right table
    sql =
        'DROP TABLE IF EXISTS temp_null_join_right_table;' || E'\n' ||
        'CREATE TEMP TABLE temp_null_join_right_table AS' || E'\n' ||
        'SELECT';
    FOR column_name IN SELECT UNNEST(right_table_columns) LOOP
        sql = sql || E'\n\t' || 'MIN("' || column_name || '")-1 AS "' || column_name || '",';
    END LOOP;
    sql = TRIM(TRAILING ',' FROM sql);
    sql = sql || E'\n' || 'FROM ' || full_right_table || ';';
    EXECUTE sql;
    complete_sql = complete_sql || E'\n\n' || sql;
    /************************
    OUTPUT TABLE
    *************************/
    -- start to create the final output table which will contain the column names defined in the left_table_columns parameter
    -- each column will contain a negative value that is not present in either the left or the right table for the given column
    sql =
        'DROP TABLE IF EXISTS ' || full_output_table || ';' || E'\n' ||
        'CREATE ' || (CASE WHEN output_table_schema IS NULL THEN 'TEMP ' ELSE '' END) || 'TABLE ' || full_output_table || ' AS' || E'\n' ||
        'SELECT';
    -- loop through each index of the left_table_columns array
    FOR i IN COALESCE(array_lower(left_table_columns, 1), 1)..COALESCE(array_upper(left_table_columns, 1), 1) LOOP
        -- LEAST takes any number of arguments and returns the smallest value among them
        -- we hardcode -1 because the smallest minimum value may be a positive integer, and we need the number used to be negative
        -- this way we will not confuse this value with a real ID from a table
        sql = sql || E'\n\t' || 'LEAST(l."' || left_table_columns[i] || '", r."' || right_table_columns[i] || '", -1) AS "' || left_table_columns[i] || '",';
    END LOOP;
    -- remove the trailing comma from the SQL
    sql = TRIM(TRAILING ',' FROM sql);
    -- finish off the SQL which creates the final table
    sql = sql || E'\n' ||
        'FROM temp_null_join_left_table l' || E'\n' ||
        'CROSS JOIN temp_null_join_right_table r' || ';';
    -- create the final table
    EXECUTE sql;
    complete_sql = complete_sql || E'\n\n' || sql;
    -- we no longer need these tables
    sql =
        'DROP TABLE IF EXISTS temp_null_join_left_table;' || E'\n' ||
        'DROP TABLE IF EXISTS temp_null_join_right_table;';
    EXECUTE sql;
    complete_sql = complete_sql || E'\n\n' || sql;
    -- return the SQL that has been run, useful for debugging or for understanding what the function does
    RETURN complete_sql;
END;
$$
LANGUAGE plpgsql;
Below is an example usage of the function
SELECT pg_temp.get_null_join_int_value
(
-- left table
'public',
'my_table',
'{"column_1", "column_2", "column_3"}',
-- right table
'public',
'my_table_aggregations',
'{"column_1", "column_2", "column_3"}',
-- output table
NULL,
'temp_null_join_values'
);
Once the temp_null_join_values table is created, you can do a sub-select in the join for the COALESCE's second parameter.
DROP TABLE IF EXISTS temp_result_table;
CREATE TEMP TABLE temp_result_table AS
SELECT
t.record_id,
agg.aggregation_id
FROM public.my_table t
JOIN my_table_aggregations agg ON
(
COALESCE(t.column_1, (SELECT column_1 FROM temp_null_join_values)) = COALESCE(agg.column_1, (SELECT column_1 FROM temp_null_join_values)) AND
COALESCE(t.column_2, (SELECT column_2 FROM temp_null_join_values)) = COALESCE(agg.column_2, (SELECT column_2 FROM temp_null_join_values)) AND
COALESCE(t.column_3, (SELECT column_3 FROM temp_null_join_values)) = COALESCE(agg.column_3, (SELECT column_3 FROM temp_null_join_values))
);
I hope this helps someone
How about:
SELECT
    t.record_id,
    agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
    NULLIF(t.column_1, agg.column_1) IS NULL AND
    NULLIF(agg.column_1, t.column_1) IS NULL AND
    NULLIF(t.column_2, agg.column_2) IS NULL AND
    NULLIF(agg.column_2, t.column_2) IS NULL AND
    NULLIF(t.column_3, agg.column_3) IS NULL AND
    NULLIF(agg.column_3, t.column_3) IS NULL
);
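For completeness: PostgreSQL (and Greenplum, which is derived from it) also supports IS NOT DISTINCT FROM, the built-in null-safe equality comparison. Whether its plan matches the COALESCE join's performance is worth benchmarking on your data:
SELECT
    t.record_id,
    agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
    t.column_1 IS NOT DISTINCT FROM agg.column_1 AND
    t.column_2 IS NOT DISTINCT FROM agg.column_2 AND
    t.column_3 IS NOT DISTINCT FROM agg.column_3;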