PostgreSQL: create a log schema

So my problem is simple. I have a schema prod with many tables, and another one log with the exact same tables and structure (only the primary keys differ).
When I do an UPDATE or DELETE in the schema prod, I want to record the old data in the log schema.
I have the following function, called after an UPDATE or DELETE:
CREATE FUNCTION prod.log_data() RETURNS trigger
LANGUAGE plpgsql AS $$
DECLARE
    v RECORD;
    column_names text;
    value_names text;
BEGIN
    -- get the column names of the current table and store the list in a text variable
    column_names = '';
    value_names = '';
    FOR v IN SELECT * FROM information_schema.columns
             WHERE table_name = quote_ident(TG_TABLE_NAME)
               AND table_schema = quote_ident(TG_TABLE_SCHEMA) LOOP
        column_names = column_names || ',' || v.column_name;
        value_names = value_names || ',$1.' || v.column_name;
    END LOOP;
    -- remove the leading ','
    column_names = substring(column_names FROM 2);
    value_names = substring(value_names FROM 2);
    -- execute the insert into the log schema
    EXECUTE 'INSERT INTO log.' || TG_TABLE_NAME || ' ( ' || column_names || ' ) VALUES ( ' || value_names || ' )' USING OLD;
    RETURN NULL; -- the return value is ignored, since this runs as an AFTER trigger
END;$$;
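(For reference, the function is attached to each table with an AFTER trigger along these lines; the trigger name and the "user" table are illustrative:)
CREATE TRIGGER log_data
AFTER UPDATE OR DELETE ON prod."user"
FOR EACH ROW EXECUTE PROCEDURE prod.log_data();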
The annoying part is that I have to get column names from information_schema for each row.
I would rather use this:
EXECUTE 'INSERT INTO log.' || TG_TABLE_NAME || ' SELECT ' || OLD;
But some values can be NULL so this will execute:
INSERT INTO log.user SELECT 2,,,"2015-10-28 13:52:44.785947"
instead of
INSERT INTO log.user SELECT 2,NULL,NULL,"2015-10-28 13:52:44.785947"
Any idea to convert ",," to ",NULL,"?
Thanks
-Quentin

First of all, I must say that in my opinion using the PostgreSQL system tables (like information_schema) is the proper way for such a use case, especially since you only have to write it once: you create the function prod.log_data() and you're done. Moreover, it can be dangerous to rely on OLD in that context (just like SELECT *), because the column order is not specified.
But to answer your exact question: the only way I know is to do some string operations on OLD. Observe that the concatenation ... ' SELECT ' || OLD casts OLD to text, and that default cast produces the ugly double commas. So you can then work on that text. In the end I propose:
DECLARE
    tmp TEXT;
    ...
BEGIN
    ...
    /* step 1: cast OLD to text, giving something like (2,,3,4,,) */
    SELECT '' || OLD INTO tmp;
    /* step 2: take care of the commas at the beginning and end: '(,' and ',)' */
    tmp := replace(replace(tmp, '(,', '(NULL,'), ',)', ',NULL)');
    /* step 3: replace the remaining empty fields with NULL between the commas */
    SELECT array_to_string(string_to_array(tmp, ',', ''), ',', 'NULL') INTO tmp;
    /* now we can EXECUTE */
    EXECUTE 'INSERT INTO log.' || TG_TABLE_NAME || ' SELECT ' || tmp;
Of course you can do steps 1-3 in one big step:
SELECT array_to_string(string_to_array(replace(replace('' || OLD, '(,', '(NULL,'), ',)', ',NULL)'), ',', ''), ',', 'NULL') INTO tmp;
In my opinion this approach isn't any better than using information_schema, but it's your call.
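For illustration, here is what the three steps do to the sample row text from the comment above; this standalone statement can be run directly:
SELECT array_to_string(
           string_to_array(
               replace(replace('(2,,3,4,,)', '(,', '(NULL,'), ',)', ',NULL)'),
               ',', ''),
           ',', 'NULL');
-- result: (2,NULL,3,4,NULL,NULL)
Bear in mind that the text form of a row wraps strings and timestamps in double quotes, which the generated SELECT would parse as identifiers, so this string-based route stays fragile for non-numeric columns.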

Related

Getting A "Could Not Open Relation" Error On Simple Query

I have a function that creates a set of INSERT INTO ... VALUES scripts. If I uncomment the dvp.content line, the function fails with "ERROR: could not open relation with OID ###", which refers to the temp table. The content column is of type jsonb. Not sure where to begin.
CREATE OR REPLACE FUNCTION export_docs_as_sql(doc_list uuid[], to_org_id uuid)
RETURNS table(id integer, sql text)
AS $$
BEGIN
    ...
    -- use a temp table to gather all INSERT statements
    CREATE TEMP TABLE IF NOT EXISTS doc_data_export(
        id serial PRIMARY KEY,
        sql text
    );
    ...
    -- get doc_version_pages
    INSERT INTO doc_data_export(sql)
    SELECT 'INSERT INTO doc_version_pages(id, doc_version_id, persona_id, care_category_id, patient_group_id, title, content, created_at, updated_at, is_guide, is_root) VALUES (' ||
        quote_literal(dvp.id::TEXT) || ', ' ||
        quote_literal(dvp.doc_version_id::TEXT) || ', ' ||
        CASE WHEN p.name IS NOT NULL THEN '(SELECT px.id FROM personas px WHERE px.org_id = ' || quote_literal(dv.id::TEXT) || ' AND px.name = ' || quote_literal(p.name) || '), ' ELSE 'NULL, ' END ||
        CASE WHEN c.name IS NOT NULL THEN '(SELECT cx.id FROM care_categories cx WHERE cx.org_id = ' || quote_literal(to_org_id) || ' AND cx.name = ' || quote_literal(c.name) || '), ' ELSE 'NULL, ' END ||
        CASE WHEN g.name IS NOT NULL THEN '(SELECT gx.id FROM patient_groups gx WHERE gx.org_id = ' || quote_literal(to_org_id) || ' AND gx.name = ' || quote_literal(g.name) || '), ' ELSE 'NULL, ' END ||
        quote_literal(dvp.title::TEXT) || ', ' ||
        --dvp.content || ', ' ||
        quote_literal(dvp.created_at::TEXT) || ', ' ||
        quote_literal(now()::timestamp) || ', ' ||
        quote_literal(dvp.is_guide::TEXT) || ', ' ||
        quote_literal(dvp.is_root::TEXT) || ');'
    FROM unnest(doc_list) l
    INNER JOIN doc_versions dv ON l = dv.doc_id
    INNER JOIN doc_version_pages dvp ON dv.id = dvp.doc_version_id
    LEFT JOIN personas p ON dvp.persona_id = p.id
    LEFT JOIN care_categories c ON dvp.care_category_id = c.id
    LEFT JOIN patient_groups g ON dvp.patient_group_id = g.id;
    ...
    -- output all inserts
    RETURN QUERY SELECT * FROM doc_data_export;
    -- drop temp table
    DROP TABLE doc_data_export;
END;
$$ LANGUAGE plpgsql;
The "Could Not Open Relation" problem is occurring due to the bug described here, which remains an issue as of Postgres 14.0:
What seems to be happening is that if the strings are large enough to be
toasted, then the data returned out of the function with RETURN QUERY
contains toast pointers referencing the temp table's toast table.
If you drop the temp table then those pointers will fail upon use.
To explain further: when a column value is larger than the TOAST_TUPLE_THRESHOLD (a compile-time setting, usually 2KB) and cannot be compressed below it, or when the column is configured with a storage parameter of EXTERNAL, the value is broken into chunks and stored in a special secondary table called a TOAST table. This table lives in the pg_toast schema and is named like pg_toast.pg_toast_<table OID>.
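As an aside, you can see which TOAST table backs a given relation by querying pg_class; a quick illustration (the table name here is just an example):
SELECT reltoastrelid::regclass
FROM pg_class
WHERE relname = 'doc_data_export';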
So when you add dvp.content to the SQL string that you insert into doc_data_export, some of those values exceed the thresholds above and are therefore TOASTed, and your RETURN QUERY sends back only the pointers to the values in the TOAST table. After the return is done, the temporary table and its corresponding TOAST table are dropped. When the outer query then attempts to materialize the results, it can't find the TOAST table that these pointers reference, hence the cryptic error message you see.
You can avoid sending TOAST pointers for the temporary table (and thus safely DROP it after the RETURN QUERY) by performing an operation on the sql column that returns the same value:
RETURN QUERY SELECT id, sql || '' FROM doc_data_export;
The simple function below reproduces a minimal example of the TOAST bug when you set fail to true, and demonstrates the successful workaround when you set fail to false.
DROP FUNCTION IF EXISTS buttered_toast(boolean);
CREATE OR REPLACE FUNCTION buttered_toast(fail boolean)
RETURNS table(id integer, enormous_data text)
AS $$
BEGIN
    CREATE TEMPORARY TABLE tbl_with_toasts (
        id integer PRIMARY KEY,
        enormous_data text
    ) ON COMMIT DROP;
    -- generate a giant string that is sure to generate a TOAST table
    INSERT INTO tbl_with_toasts(id, enormous_data)
    SELECT 1, string_agg(gen_random_uuid()::text, '-')
    FROM generate_series(1, 10000) as ints(int);
    IF buttered_toast.fail THEN
        -- will return pointers to tbl_with_toasts's TOAST table for the "enormous_data" column
        RETURN QUERY SELECT tbl_with_toasts.id, tbl_with_toasts.enormous_data FROM tbl_with_toasts;
    ELSE
        -- will generate and return new values for the "enormous_data" column
        RETURN QUERY SELECT tbl_with_toasts.id, tbl_with_toasts.enormous_data || '' FROM tbl_with_toasts;
    END IF;
    DROP TABLE tbl_with_toasts;
END;
$$ LANGUAGE plpgsql;
-- fails with "could not open relation"
select * from buttered_toast(true);
-- succeeds
select * from buttered_toast(false);

Update Null columns to Zero dynamically in Redshift

Here is the code in SAS. It finds the numeric columns with blank values and replaces them with 0s:
DATA dummy_table;
    SET dummy_table;
    ARRAY DUMMY _NUMERIC_;
    DO OVER DUMMY;
        IF DUMMY=. THEN DUMMY=0;
    END;
RUN;
I am trying to replicate this in Redshift; here is what I tried:
create or replace procedure sp_replace_null_to_zero(IN tbl_nm varchar) as $$
Begin
    Execute 'declare ' ||
            'tot_cnt int := (select count(*) from information_schema.columns where table_name = ' || tbl_nm || ');' ||
            'init_loop int := 0; ' ||
            'cn_nm varchar; '
    Begin
        While init_loop <= tot_cnt
        Loop
            Raise info 'init_loop = %', Init_loop;
            Raise info 'tot_cnt = %', tot_cnt;
            Execute 'Select column_name into cn_nm from information_schema.columns ' ||
                    'where table_name =' || tbl_nm || ' and ordinal_position = init_loop ' ||
                    'and data_type not in (''character varying'',''date'',''text''); '
            Raise info 'cn_nm = %', cn_nm;
            if cn_nm is not null then
                Execute 'Update ' || tbl_nm ||
                        'Set ' || cn_nm = 0 ||
                        'Where ' || cn_nm is null or cn_nm =' ';
            end if;
            init_loop = init_loop + 1;
        end loop;
    End;
End;
$$ language plpgsql;
Issues I am facing
When I pass the input parameter here, I get a count of 0:
tot_cnt int := (select count(*) from information_schema.columns where table_name = ' || tbl_nm || ');'
For testing purposes I tried hardcoding the table name inside the procedure, and I get the error: amazon invalid operation: value for domain information_schema.cardinal_number violates check constraint "cardinal_number_domain_check"
Is this even possible in Redshift? How can I implement this logic, or is there any other workaround?
Expert advice needed!
You can simply run an UPDATE over the table(s) using the NVL(cn_nm,0) function
UPDATE tbl_raw
SET col2 = NVL(col2,0);
However UPDATE is a fairly expensive operation. Consider just using a view over your table that wraps the columns in NVL(cn_nm,0)
CREATE VIEW tbl_clean
AS
SELECT col1
, NVL(col2,0) col2
FROM tbl_raw;
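If you want to keep it dynamic across all numeric columns, as the SAS code does, one option is to generate the UPDATE statements from information_schema and run the generated output afterwards. A sketch, with tbl_raw as a placeholder table name (adjust the type list to your schema):
SELECT 'UPDATE ' || table_name || ' SET ' || column_name ||
       ' = NVL(' || column_name || ', 0) WHERE ' || column_name || ' IS NULL;'
FROM information_schema.columns
WHERE table_name = 'tbl_raw'
  AND data_type IN ('smallint', 'integer', 'bigint', 'numeric', 'double precision', 'real');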

Issue while forming regex string in a stored procedure

I am trying to run the query below in a stored procedure and it's not working.
We tried printing the query using RAISE NOTICE and saw that an E gets prepended to the regex, and that's the reason the query doesn't show any output.
Not working
select order,version from mytable
where substring(version from quote_literal('[0-9]+\.[0-9]{1}'))
IN ('6.2') and order= 'ABC';
But the same query works fine if I run it from the pgAdmin query tool.
Working
select order,version from mytable
where substring(version from '[0-9]+\.[0-9]{1}')
IN ('6.2') and order= 'ABC';
My requirement is to build the regex dynamically in the stored procedure. Please guide me on how to achieve this.
Below is the line of code in my stored procedure,
sql_query = sql_query || ' AND substring(version from ' || quote_literal( '[0-9]+\.[0-9]{1}' ) || ') IN (' || quote_literal(compatibleVersions) || ')';
raise notice 'Value: %', sql_query;
EXECUTE sql_query INTO query_result ;
and from the notice I am getting the output below:
AND substring(version from E'[0-9]+\\.[0-9]{1}') IN ('{6.2}')
My requirement is to make this regex work.
I narrowed it down to these queries:
working
select substring(version from '[0-9]+\.[0-9]{1}') from mytable ;
not working
select substring(version from quote_literal('[0-9]+\.[0-9]{1}')) from mytable ;
Now I think it's easy to fix. You can also try running the above queries at your end.
Since your problem is not really the extended string literal syntax using E, but the string representation of the array in the IN list, your PL/pgSQL should look somewhat like this:
sql_query = sql_query ||
    ' AND substring(version from ' || quote_literal('[0-9]+\.[0-9]{1}') ||
    ') IN (' || (SELECT string_agg(quote_literal(x), ', ')
                 FROM unnest(compatibleVersions) AS x(x)) || ')';
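With compatibleVersions = '{6.2}', the notice should then show something along these lines (the E'...' form is harmless; what changed is the IN list):
AND substring(version from E'[0-9]+\\.[0-9]{1}') IN ('6.2')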
quote_literal should be used in situations where you want to construct queries dynamically. In such situations, the quote_literal call is rendered as an E'...' literal in the final constructed query.
right way to use
select * from config_support_module where substring(firmware from '[0-9]+\.[0-9]{1}') IN ('6.2');
select * from config_support_module where substring(firmware from E'[0-9]+\\.[0-9]{1}') IN ('6.2');
wrong usage of quote_literal in static queries
select * from config_support_module where substring(firmware from quote_literal('[0-9]+\.[0-9]{1}')) IN ('6.2') ;
This doesn't give you any errors or output: quote_literal returns the pattern with the quote characters included, so the quotes themselves become part of the regex and nothing matches.
quote_literal usage in dynamic queries
sql_query = sql_query || ' AND substring(version from ' || quote_literal( '[0-9]+\.[0-9]{1}' ) || ') ... .. ...

Performance of joining on multiple columns with potential NULL values

Let's say we have the following table:
CREATE TABLE my_table
(
    record_id SERIAL,
    column_1 INTEGER,
    column_2 INTEGER,
    column_3 INTEGER,
    price NUMERIC
);
With the following data
INSERT INTO my_table (column_1, column_2, column_3, price) VALUES
(1, NULL, 1, 54.99),
(1, NULL, 1, 69.50),
(NULL, 2, 2, 54.99),
(NULL, 2, 2, 69.50),
(3, 3, NULL, 54.99),
(3, 3, NULL, 69.50);
Now we do something like
CREATE TABLE my_table_aggregations AS
SELECT
    ROW_NUMBER() OVER () AS aggregation_id,
    column_1,
    column_2,
    column_3
FROM my_table
GROUP BY
    column_1,
    column_2,
    column_3;
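With the sample data this produces one row per distinct combination, so something like the following (the aggregation_id values depend on scan order):
 aggregation_id | column_1 | column_2 | column_3
----------------+----------+----------+----------
              1 |        1 |          |        1
              2 |          |        2 |        2
              3 |        3 |        3 |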
What I want to do now is assign an aggregation_id to each record_id in my_table. Because I have NULL values, I can't simply join on t1.column_1 = t2.column_1: NULL = NULL evaluates to NULL, so the join would exclude those records.
Now I know that I should use something like this
SELECT
    t.record_id,
    agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
    ((t.column_1 IS NULL AND agg.column_1 IS NULL) OR t.column_1 = agg.column_1) AND
    ((t.column_2 IS NULL AND agg.column_2 IS NULL) OR t.column_2 = agg.column_2) AND
    ((t.column_3 IS NULL AND agg.column_3 IS NULL) OR t.column_3 = agg.column_3)
);
The problem here is that I am dealing with hundreds of millions of records and having an OR in the join seems to take forever to run.
There is an alternative, which is something like this
SELECT
    t.record_id,
    agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
    COALESCE(t.column_1, -1) = COALESCE(agg.column_1, -1) AND
    COALESCE(t.column_2, -1) = COALESCE(agg.column_2, -1) AND
    COALESCE(t.column_3, -1) = COALESCE(agg.column_3, -1)
);
But the problem with this is that it assumes there is no value of -1 in any of those columns.
Do note, this is an example; I am well aware I could use DENSE_RANK to get the same result, so let's pretend that isn't an option.
Is there some crazy awesome way to avoid COALESCE while keeping the performance advantage it has over the correct OR version? I ran tests, and the COALESCE is over 10 times faster than the OR.
I am running this on a Greenplum database so I am not sure if this performance difference is the same on a standard Postgres database.
Since my solution with NULLIF had performance problems, and your use of COALESCE was much faster, I wonder if you could try tweaking that solution to deal with the issue of -1. To do that, you could try casting to avoid false matches: an integer's text form can never be 'NA', so the sentinel cannot collide with real data. I'm not sure what the performance hit would be, but it would look like:
SELECT
    t.record_id,
    agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
    COALESCE(cast(t.column_1 as varchar), 'NA') = COALESCE(cast(agg.column_1 as varchar), 'NA') AND
    COALESCE(cast(t.column_2 as varchar), 'NA') = COALESCE(cast(agg.column_2 as varchar), 'NA') AND
    COALESCE(cast(t.column_3 as varchar), 'NA') = COALESCE(cast(agg.column_3 as varchar), 'NA')
);
After doing some thinking, I decided the best approach to this is to dynamically find a value for each column that can be used as the second parameter in a COALESCE join. The function is rather long, but it does what I need and, more importantly, it keeps the COALESCE performance. The only downside is that computing the MIN values costs additional time, but we are talking about a minute.
Here is the function:
CREATE OR REPLACE FUNCTION pg_temp.get_null_join_int_value
(
    left_table_schema TEXT,
    left_table_name TEXT,
    left_table_columns TEXT[],
    right_table_schema TEXT,
    right_table_name TEXT,
    right_table_columns TEXT[],
    output_table_schema TEXT,
    output_table_name TEXT
) RETURNS TEXT AS
$$
DECLARE
    colum_name TEXT;
    sql TEXT;
    complete_sql TEXT;
    full_left_table TEXT;
    full_right_table TEXT;
    full_output_table TEXT;
BEGIN
    /*****************************
    VALIDATE PARAMS
    ******************************/
    -- this section validates all of the function parameters ensuring that the values that cannot be NULL are not so
    -- also checks for empty arrays which is not allowed and then ensures both arrays are of the same length
    IF (left_table_name IS NULL) THEN
        RAISE EXCEPTION 'left_table_name cannot be NULL';
    ELSIF (left_table_columns IS NULL) THEN
        RAISE EXCEPTION 'left_table_columns cannot be NULL';
    ELSIF (right_table_name IS NULL) THEN
        RAISE EXCEPTION 'right_table_name cannot be NULL';
    ELSIF (right_table_columns IS NULL) THEN
        RAISE EXCEPTION 'right_table_columns cannot be NULL';
    ELSIF (output_table_name IS NULL) THEN
        RAISE EXCEPTION 'output_table_name cannot be NULL';
    ELSIF (array_upper(left_table_columns, 1) IS NULL) THEN
        RAISE EXCEPTION 'left_table_columns cannot be an empty array';
    ELSIF (array_upper(right_table_columns, 1) IS NULL) THEN
        RAISE EXCEPTION 'right_table_columns cannot be an empty array';
    ELSIF (array_upper(left_table_columns, 1) <> array_upper(right_table_columns, 1)) THEN
        RAISE EXCEPTION 'left_table_columns and right_table_columns must have a matching array length';
    END IF;
    /************************
    TABLE NAMES
    *************************/
    -- create the full name of the left table
    -- the schema name can be NULL which means that the table is temporary
    -- because of this, we need to detect if we should specify the schema
    IF (left_table_schema IS NOT NULL) THEN
        full_left_table = left_table_schema || '.' || left_table_name;
    ELSE
        full_left_table = left_table_name;
    END IF;
    -- create the full name of the right table
    -- the schema name can be NULL which means that the table is temporary
    -- because of this, we need to detect if we should specify the schema
    IF (right_table_schema IS NOT NULL) THEN
        full_right_table = right_table_schema || '.' || right_table_name;
    ELSE
        full_right_table = right_table_name;
    END IF;
    -- create the full name of the output table
    -- the schema name can be NULL which means that the table is temporary
    -- because of this, we need to detect if we should specify the schema
    IF (output_table_schema IS NOT NULL) THEN
        full_output_table = output_table_schema || '.' || output_table_name;
    ELSE
        full_output_table = output_table_name;
    END IF;
    /**********************
    LEFT TABLE
    ***********************/
    -- start to create the table which will store the min values from the left table
    sql =
        'DROP TABLE IF EXISTS temp_null_join_left_table;' || E'\n' ||
        'CREATE TEMP TABLE temp_null_join_left_table AS' || E'\n' ||
        'SELECT';
    -- loop through each column name in the left table column names parameter
    FOR colum_name IN SELECT UNNEST(left_table_columns) LOOP
        -- find the minimum value in this column and subtract one
        -- we will use this as a value we know is not in the column of this table
        sql = sql || E'\n\t' || 'MIN("' || colum_name || '")-1 AS "' || colum_name || '",';
    END LOOP;
    -- remove the trailing comma from the SQL
    sql = TRIM(TRAILING ',' FROM sql);
    -- finish the SQL to create the left table min values
    sql = sql || E'\n' ||
        'FROM ' || full_left_table || ';';
    -- run the query that creates the table which stores the minimum values for each column in the left table
    EXECUTE sql;
    -- store the sql which will be the return value of the function
    complete_sql = sql;
    /************************
    RIGHT TABLE
    *************************/
    -- start to create the table which will store the min values from the right table
    sql =
        'DROP TABLE IF EXISTS temp_null_join_right_table;' || E'\n' ||
        'CREATE TEMP TABLE temp_null_join_right_table AS' || E'\n' ||
        'SELECT';
    -- loop through each column name in the right table column names parameter
    FOR colum_name IN SELECT UNNEST(right_table_columns) LOOP
        -- find the minimum value in this column and subtract one
        -- we will use this as a value we know is not in the column of this table
        sql = sql || E'\n\t' || 'MIN("' || colum_name || '")-1 AS "' || colum_name || '",';
    END LOOP;
    -- remove the trailing comma from the SQL
    sql = TRIM(TRAILING ',' FROM sql);
    -- finish the SQL to create the right table min values
    -- (this must read from the right table, not the left one)
    sql = sql || E'\n' ||
        'FROM ' || full_right_table || ';';
    -- run the query that creates the table which stores the minimum values for each column in the right table
    EXECUTE sql;
    -- store the sql which will be the return value of the function
    complete_sql = complete_sql || E'\n\n' || sql;
    -- start to create the final output table which will contain the column names defined in the left_table_columns parameter
    -- each column will contain a negative value that is not present in both the left and right tables for the given column
    -- (the ELSE '' is needed so the concatenation does not become NULL when a schema is given)
    sql =
        'DROP TABLE IF EXISTS ' || full_output_table || ';' || E'\n' ||
        'CREATE ' || (CASE WHEN output_table_schema IS NULL THEN 'TEMP ' ELSE '' END) || 'TABLE ' || full_output_table || ' AS' || E'\n' ||
        'SELECT';
    -- loop through each index of the left_table_columns array
    FOR i IN coalesce(array_lower(left_table_columns, 1), 1)..coalesce(array_upper(left_table_columns, 1), 1) LOOP
        -- add to the sql a call to the LEAST function
        -- this function takes any number of columns and returns the smallest value within those columns
        -- we have -1 hardcoded because the smallest minimum value may be a positive integer and so we need to ensure the number used is negative
        -- this way we will not confuse this value with a real ID from a table
        sql = sql || E'\n\t' || 'LEAST(l."' || left_table_columns[i] || '", r."' || right_table_columns[i] || '", -1) AS "' || left_table_columns[i] || '",';
    END LOOP;
    -- remove the trailing comma from the SQL
    sql = TRIM(TRAILING ',' FROM sql);
    -- finish off the SQL which creates the final table
    sql = sql || E'\n' ||
        'FROM temp_null_join_left_table l' || E'\n' ||
        'CROSS JOIN temp_null_join_right_table r' || ';';
    -- create the final table
    EXECUTE sql;
    -- store the sql which will be the return value of the function
    complete_sql = complete_sql || E'\n\n' || sql;
    -- we no longer need these tables
    sql =
        'DROP TABLE IF EXISTS temp_null_join_left_table;' || E'\n' ||
        'DROP TABLE IF EXISTS temp_null_join_right_table;';
    EXECUTE sql;
    -- store the sql which will be the return value of the function
    complete_sql = complete_sql || E'\n\n' || sql;
    -- return the SQL that has been run, good for debugging purposes or just understanding what the function does
    RETURN complete_sql;
END;
$$
LANGUAGE plpgsql;
Below is an example usage of the function
SELECT pg_temp.get_null_join_int_value
(
    -- left table
    'public',
    'my_table',
    '{"column_1", "column_2", "column_3"}',
    -- right table
    'public',
    'my_table_aggregations',
    '{"column_1", "column_2", "column_3"}',
    -- output table
    NULL,
    'temp_null_join_values'
);
Once the temp_null_join_values table is created, you can use a sub-select in the join as the second COALESCE parameter.
DROP TABLE IF EXISTS temp_result_table;
CREATE TEMP TABLE temp_result_table AS
SELECT
    t.record_id,
    agg.aggregation_id
FROM public.my_table t
JOIN my_table_aggregations agg ON
(
    COALESCE(t.column_1, (SELECT column_1 FROM temp_null_join_values)) = COALESCE(agg.column_1, (SELECT column_1 FROM temp_null_join_values)) AND
    COALESCE(t.column_2, (SELECT column_2 FROM temp_null_join_values)) = COALESCE(agg.column_2, (SELECT column_2 FROM temp_null_join_values)) AND
    COALESCE(t.column_3, (SELECT column_3 FROM temp_null_join_values)) = COALESCE(agg.column_3, (SELECT column_3 FROM temp_null_join_values))
);
I hope this helps someone
How about:
SELECT
    t.record_id,
    agg.aggregation_id
FROM my_table t
JOIN my_table_aggregations agg ON
(
    NULLIF(t.column_1, agg.column_1) IS NULL AND
    NULLIF(agg.column_1, t.column_1) IS NULL AND
    NULLIF(t.column_2, agg.column_2) IS NULL AND
    NULLIF(agg.column_2, t.column_2) IS NULL AND
    NULLIF(t.column_3, agg.column_3) IS NULL AND
    NULLIF(agg.column_3, t.column_3) IS NULL
);
NULLIF(a, b) returns NULL when a = b (and also when a is NULL), so requiring NULLIF in both directions to be NULL is true exactly when the two values are equal or both NULL.

How can I measure the amount of space taken by blobs on a Firebird 2.1 database?

I have a production database, using Firebird 2.1, where I need to find out how much space is used by each table, including the blobs. The blob part is the tricky one, because it is not covered by the standard statistical report.
I do not have easy access to the server's desktop, so installing UDFs etc. is not a good solution.
How can I do this easily?
You can count the total size of all BLOB fields in a database with the following statement, which iterates over every BLOB column via the system tables (rdb$field_type = 261 denotes a BLOB) and sums OCTET_LENGTH over each:
EXECUTE BLOCK RETURNS (BLOB_SIZE BIGINT)
AS
DECLARE VARIABLE RN CHAR(31) CHARACTER SET UNICODE_FSS;
DECLARE VARIABLE FN CHAR(31) CHARACTER SET UNICODE_FSS;
DECLARE VARIABLE S BIGINT;
BEGIN
    BLOB_SIZE = 0;
    FOR
        SELECT r.rdb$relation_name, r.rdb$field_name
        FROM rdb$relation_fields r
        JOIN rdb$fields f ON r.rdb$field_source = f.rdb$field_name
        WHERE f.rdb$field_type = 261
        INTO :RN, :FN
    DO BEGIN
        EXECUTE STATEMENT
            'SELECT SUM(OCTET_LENGTH(' || :FN || ')) FROM ' || :RN ||
            ' WHERE NOT ' || :FN || ' IS NULL'
            INTO :S;
        BLOB_SIZE = :BLOB_SIZE + COALESCE(:S, 0);
    END
    SUSPEND;
END
I modified Andrej's code example to show the size of each blob field, not only the sum of all blobs, and used SET TERM so you can copy and paste this snippet directly into tools like FlameRobin.
SET TERM #;
EXECUTE BLOCK
RETURNS (BLOB_SIZE BIGINT, TABLENAME CHAR(31), FIELDNAME CHAR(31))
AS
DECLARE VARIABLE RN CHAR(31) CHARACTER SET UNICODE_FSS;
DECLARE VARIABLE FN CHAR(31) CHARACTER SET UNICODE_FSS;
DECLARE VARIABLE S BIGINT;
BEGIN
    BLOB_SIZE = 0;
    FOR
        SELECT r.rdb$relation_name, r.rdb$field_name
        FROM rdb$relation_fields r
        JOIN rdb$fields f ON r.rdb$field_source = f.rdb$field_name
        WHERE f.rdb$field_type = 261
        INTO :RN, :FN
    DO BEGIN
        EXECUTE STATEMENT
            'SELECT SUM(OCTET_LENGTH(' || :FN || ')) AS BLOB_SIZE, ''' || :RN || ''', ''' || :FN || '''
             FROM ' || :RN ||
            ' WHERE NOT ' || :FN || ' IS NULL'
            INTO :BLOB_SIZE, :TABLENAME, :FIELDNAME;
        SUSPEND;
    END
END
#
SET TERM ;#
This example doesn't work with ORDER BY; maybe a more elegant solution without EXECUTE BLOCK exists.