Count null values in each column of a table : PSQL - postgresql

I have a very big table but as an example I will only provide a very small part of it as following:-
col1 col2 col3 col4
10 2 12
13 4 11
0 1
3 5 111
I know how to find null values in one column. What I want to find is how many null values are there in each column just by writing one query.
Thanks in advance

You can use an aggregate with a filter:
select count(*) filter (where col1 is null) as col1_nulls,
count(*) filter (where col2 is null) as col2_nulls,
count(*) filter (where col3 is null) as col3_nulls,
count(*) filter (where col4 is null) as col4_nulls
from the_table;

I think you can generate this query on the fly. Here is an example of one way you can approach it:
CREATE OR REPLACE FUNCTION null_counts(tablename text)
RETURNS SETOF jsonb LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE 'SELECT to_jsonb(t) FROM (SELECT ' || (
SELECT string_agg('count(*) filter (where ' || a.attname::text || ' is null) as ' || a.attname || '_nulls', ',')
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = tablename::regclass
AND a.attnum > 0
AND a.attisdropped = false
) || ' FROM ' || tablename::regclass || ') as t';
END
$func$;
SELECT null_counts('your_table') AS val;

Related

Compare two tables and find the missing column using left join

I wanted to compare the two tables employees and employees_a and find the missing columns in the table comployees_a.
select a.Column_name,
From User_tab_columns a
LEFT JOIN User_tab_columns b
ON upper(a.table_name) = upper(b.table_name)||'_A'
AND a.column_name = b.column_name
Where upper(a.Table_name) = 'EMPLOYEES'
AND upper(b.table_name) = 'EMPLOYEES_A'
AND b.column_name is NULL
;
But this doesnt seems to be working. No rows are returned.
My employees table has the below columns
emp_name
emp_id
base_location
department
current_location
salary
manager
employees_a table has below columns
emp_name
emp_id
base_location
department
current_location
I want to find the rest two columns and add them into employees_a table.
I have more than 50 tables like this to compare them and find the missing column and add those columns into their respective "_a" table.
Missing columns? Why not using the MINUS set operator, seems to be way simpler, e.g.
select column_name from user_tables where table_name = 'EMP_1'
minus
select column_name from user_tables where table_name = 'EMP_2'
Thirstly, check if user_tab_columns table contains columns of your tables (in my case user_tab_columns is empty and I have to use all_tab_columns):
select a.Column_name
From User_tab_columns a
Where upper(a.Table_name) = 'EMPLOYEES'
Secondly, remove line AND upper(b.table_name) = 'EMPLOYEES_A', because upper(b.table_name) is null in case a column is not found. You have b.table_name in JOIN part of the SELECT already.
select a.Column_name
From User_tab_columns a
LEFT JOIN User_tab_columns b
ON upper(a.table_name) = upper(b.table_name)||'_A'
AND a.column_name = b.column_name
Where upper(a.Table_name) = 'EMPLOYEES'
AND b.column_name is NULL
You do not need any joins and can use:
select 'ALTER TABLE EMPLOYEES_A ADD "'
|| Column_name || '" '
|| CASE MAX(data_type)
WHEN 'NUMBER'
THEN 'NUMBER(' || MAX(data_precision) || ',' || MAX(data_scale) || ')'
WHEN 'VARCHAR2'
THEN 'VARCHAR2(' || MAX(data_length) || ')'
END
AS sql
From User_tab_columns
Where Table_name IN ('EMPLOYEES', 'EMPLOYEES_A')
GROUP BY COLUMN_NAME
HAVING COUNT(CASE table_name WHEN 'EMPLOYEES' THEN 1 END) = 1
AND COUNT(CASE table_name WHEN 'EMPLOYEES_A' THEN 1 END) = 0;
Or, for multiple tables:
select 'ALTER TABLE ' || MAX(table_name) || '_A ADD "'
|| Column_name || '" '
|| CASE MAX(data_type)
WHEN 'NUMBER'
THEN 'NUMBER(' || MAX(data_precision) || ',' || MAX(data_scale) || ')'
WHEN 'VARCHAR2'
THEN 'VARCHAR2(' || MAX(data_length) || ')'
END
AS sql
From User_tab_columns
Where Table_name IN ('EMPLOYEES', 'EMPLOYEES_A', 'SOMETHING', 'SOMETHING_A')
GROUP BY
CASE
WHEN table_name LIKE '%_A'
THEN SUBSTR(table_name, 1, LENGTH(table_name) - 2)
ELSE table_name
END,
COLUMN_NAME
HAVING COUNT(CASE WHEN table_name NOT LIKE '%_A' THEN 1 END) = 1
AND COUNT(CASE WHEN table_name LIKE '%_A' THEN 1 END) = 0;
fiddle

Postgresql dynamic dataset aggregation

I'm trying to aggregate multiple datasets where I have to be able to create dynamic rules on how to aggregate the data on an id basis. Below is a quick approach I wrote up that seems to do what I intended. Is there a more performant (and preferably safe) way of doing this. tbl1...n will be larger in size than agg_rule. I'm running postgres 13.0, if there are new features coming in 14 that'll help, that could also be of interest. I am only interesting in coalescing and sum:ing like below if that simplifies the problem.
CREATE TABLE tbl1 AS
SELECT 1 id
, 1 seq
, 1 val;
CREATE TABLE tbl2 AS
SELECT 1 id
, 1 seq
, 1 val;
CREATE TABLE tbl3 AS
SELECT 1 id
, 1 seq
, 1 val;
CREATE TABLE agg_rule AS
SELECT 1 id
, 'coalesce(tbl1.val, tbl2.val + tbl3.val)' expr;
CREATE OR REPLACE FUNCTION eval(_id INTEGER, _seq INTEGER)
RETURNS INTEGER
LANGUAGE plpgsql
AS $$
DECLARE _res INTEGER;
BEGIN
EXECUTE 'SELECT ' || (
SELECT expr
FROM agg_rule
WHERE id = _id
) || '
FROM tbl1
FULL JOIN tbl2 USING (id, seq)
FULL JOIN tbl3 USING (id, seq)
WHERE id = ' || _id || '
AND seq = ' || _seq || ';' INTO _res;
RETURN _res;
END
$$;

How do I find all the NUMERIC columns in a table and do a SUM() on them?

I have a few tables in Netezza, DB2 and PostgreSQL databases, for which I need to reconcile and the best way we have come out with is to do a SUM() across all the NUMERIC Table columns on all the 3 databases.
Does anyone have a quick and simple way to find all the COLUMNS which are either NUMERIC or INTEGER or BIGINT and then run a SUM() on all these?
For comparing the results, I can do it manually also, or if someone has a way to capture these results in a common table and automatically check the differences in the SUM?
For DB2 you can use this metadata which will help you to find out the data type for each column
SELECT
COLUMN_NAME || ' ' || REPLACE(REPLACE(DATA_TYPE,'DECIMAL','NUMERIC'),'CHARACTER','VARCHAR') ||
CASE
WHEN DATA_TYPE = 'TIMESTAMP' THEN ''
ELSE
' (' ||
CASE
WHEN CHARACTER_MAXIMUM_LENGTH IS NOT NULL THEN CAST(CHARACTER_MAXIMUM_LENGTH AS VARCHAR(30))
WHEN NUMERIC_PRECISION IS NOT NULL THEN CAST(NUMERIC_PRECISION AS VARCHAR(30)) ||
CASE
WHEN NUMERIC_SCALE = 0 THEN ''
ELSE ',' || CAST(NUMERIC_SCALE AS VARCHAR(3))
END
ELSE ''
END || ')'
END || ',' "SQLCOL",
COLUMN_NAME,
DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE, ORDINAL_POSITION
FROM SYSIBM.COLUMNS
WHERE TABLE_NAME = 'insert your table name'
AND TABLE_SCHEMA = 'insert your table schema'
ORDER BY ORDINAL_POSITION
For Netezza, I got the following query:
SELECT 0 AS ATTNUM, 'SELECT' AS SQL
UNION
SELECT ATTNUM, 'SUM(' || ATTNAME || ') AS S_' || ATTNAME || ',' AS COLMN
FROM _V_RELATION_COLUMN RC
WHERE NAME = '<table-name>'
AND FORMAT_TYPE= 'NUMERIC'
UNION
SELECT 10000 AS ATTNUM, ' 0 AS FLAG FROM ' || '<table-name>'
ORDER BY ATTNUM
Still looking how to do this across DB2 and PostgreSQL.

PostgreSQL row to string

I have a result from a query like the below, which does not have a fixed number of columns
ID COL1 COL2 COL3 COL4
-------------------------------------
1 VAL11 VAL12 VAL13 VAL14
2 VAL21 VAL22 VAL23 VAL24
Now I want the result to be something like this.
RESULT
-----------------------------------------------------
ID:1, COL1:VAL11, COL2:VAL12, COL3:VAL13, COL4:VAL14
ID:2, COL1:VAL21, COL2:VAL22, COL3:VAL23, COL4:VAL24
Please help.
The quick and dirty way, but without the column names and including NULL values:
SELECT tbl::text
FROM tbl;
The slow & sure way:
SELECT array_to_string(ARRAY[
'ID:' || id
,'COL1:' || col1
,'COL2:' || col2
], ', ') AS result
FROM tbl;
If a column holds a NULL value, it will be missing in the result. I do not just concatenate, because NULL values would nullify the whole row.
array_to_string() makes sure that commas are only inserted where needed.
PostgreSQL 9.1 introduced the new function concat_ws() (much like the one in MySQL) with which we can further simplify:
SELECT concat_ws(', '
'ID:' || id
,'COL1:' || col1
,'COL2:' || col2
) AS result
FROM tbl;
SELECT
'ID:' ||coalesce(id::text, '<null>')
||', '||'COL1:'||coalesce(col1::text, '<null>')
||', '||'COL2:'||coalesce(col2::text, '<null>')
FROM tbl;
You can use this SQL to generate the first one for you (in case there're lot's of columns):
SELECT E'SELECT \n'||string_agg(trim(stmt), E' \n')||E'\n FROM tbl;'
FROM (SELECT
CASE WHEN a.attnum > 1 THEN $$||', '||$$ ELSE '' END ||
$$'$$||upper(a.attname)||$$:'||coalesce($$||quote_ident(a.attname)||
$$::text, '<null>')$$ AS stmt
FROM pg_attribute a, pg_class t
WHERE t.relkind='r' AND t.relname = 'tbl' AND a.attrelid = t.oid
AND NOT a.attisdropped AND a.attnum > 0) AS s;

Creating a view of frequencies from a table with unknown number of columns

I have a table with n columns and I need to create a view which contains the frequencies of every unique value in every column. n is unknown since, I need to apply the solution on numerous tables with different number of columns.
For example i have table:
column1 column2 column3
value1 value2 value3
value2 value2 value1
value1 value2 value2
The view should be something like this:
columnname value frequency
column1 value1 2
column1 value2 1
column2 value2 3
...
Since I have very little experience with sql any help would be extremely appreciated.
Many thanks in advance!
Thus far I have come up with this but am sort of stonewalled now.
CREATE or REPLACE FUNCTION create_view () RETURNS setof record AS $$
DECLARE
col RECORD;
BEGIN
for col in execute 'select column_name from information_schema.columns
where table_name = ''table123''' LOOP
???
END LOOP;
return;
END;
$$
LANGUAGE 'plpgsql';
Simple SQL:
SELECT 'col1' AS col, col1 AS val, count(*) AS ct FROM tbl GROUP BY col1
UNION ALL
SELECT 'col2', col2, count(*) FROM tbl GROUP BY col2
UNION ALL
SELECT 'col3', col3, count(*) FROM tbl GROUP BY col3
PL/pqSQL function executing dynamic SQL:
CREATE OR REPLACE FUNCTION f_demo(_schema text, _tbl text)
RETURNS TABLE(col text, val text, ct bigint) AS
$xx$
DECLARE
_fld text;
_sql text := '';
BEGIN
FOR _fld IN
SELECT a.attname -- use quote_ident to safeguard against SQLi
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = (COALESCE(_schema || '.', '') || _tbl)::regclass
AND a.attnum > 0
AND NOT a.attisdropped
-- AND a.attname ~~ '%col%' -- if you want to pick specific columns
LOOP
RETURN QUERY EXECUTE
'SELECT $1, ' || quote_ident(_fld) || '::text, count(*)
FROM ' || COALESCE(quote_ident(_schema) || '.', '') || quote_ident(_tbl) || '
GROUP BY 2'
USING _fld;
END LOOP;
END;
$xx$
LANGUAGE 'plpgsql';
Call:
SELECT * FROM f_demo('public', 'mytable');
Or, if you want to use the schema provided by search_path:
SELECT * FROM f_demo(NULL, 'mytable');
Major points
Works for any table and any number of columns with values of any type.
Values are cast to text to simplify my example. Could be done with a polymorphic type, too.
See this related answer for info on plpgsql techniques and links: https://stackoverflow.com/q/8146245/939860