Perform query using tables and columns from information_schema - postgresql

I'm trying to using information_schema.columns to find all of the columns in my database that has a geometry type and then check the SRID for the data in those columns.
I can do this with multiple queries where I first find the table names and column names
SELECT table_name, column_name
FROM information_schema.columns
WHERE udt_name = 'geometry';
and then (manually)
SELECT ST_SRID(column_name)
FROM table_name;
for each entry.
Does anyone how to streamline this into a single query?

Table names can't be variable; Postgres needs to be able to come up with an execution plan before it knows the parameter values. So you can't do this in a simple SQL statement.
Instead, you need to construct a dynamic query string using a procedural language like PL/pgSQL:
CREATE FUNCTION SRIDs() RETURNS TABLE (
tablename TEXT,
columnname TEXT,
srid INTEGER
) AS $$
BEGIN
FOR tablename, columnname IN (
SELECT table_name, column_name
FROM information_schema.columns
WHERE udt_name = 'geometry'
)
LOOP
EXECUTE format(
'SELECT ST_SRID(%s) FROM %s',
columnname, tablename
) INTO srid;
RETURN NEXT;
END LOOP;
END
$$
LANGUAGE plpgsql;
SELECT * FROM SRIDs();

Related

Find table names that contain a specific column entry from another table [duplicate]

Is it possible to search every column of every table for a particular value in PostgreSQL?
A similar question is available here for Oracle.
How about dumping the contents of the database, then using grep?
$ pg_dump --data-only --inserts -U postgres your-db-name > a.tmp
$ grep United a.tmp
INSERT INTO countries VALUES ('US', 'United States');
INSERT INTO countries VALUES ('GB', 'United Kingdom');
The same utility, pg_dump, can include column names in the output. Just change --inserts to --column-inserts. That way you can search for specific column names, too. But if I were looking for column names, I'd probably dump the schema instead of the data.
$ pg_dump --data-only --column-inserts -U postgres your-db-name > a.tmp
$ grep country_code a.tmp
INSERT INTO countries (iso_country_code, iso_country_name) VALUES ('US', 'United States');
INSERT INTO countries (iso_country_code, iso_country_name) VALUES ('GB', 'United Kingdom');
Here's a pl/pgsql function that locates records where any column contains a specific value.
It takes as arguments the value to search in text format, an array of table names to search into (defaults to all tables) and an array of schema names (defaults all schema names).
It returns a table structure with schema, name of table, name of column and pseudo-column ctid (non-durable physical location of the row in the table, see System Columns)
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_tables name[] default '{}',
haystack_schema name[] default '{}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
JOIN information_schema.table_privileges p ON
(t.table_name=p.table_name AND t.table_schema=p.table_schema
AND p.privilege_type='SELECT')
JOIN information_schema.schemata s ON
(s.schema_name=t.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND (c.table_schema=ANY(haystack_schema) OR haystack_schema='{}')
AND t.table_type='BASE TABLE'
LOOP
FOR rowctid IN
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
schemaname,
tablename,
columnname,
needle
)
LOOP
-- uncomment next line to get some progress report
-- RAISE NOTICE 'hit in %.%', schemaname, tablename;
RETURN NEXT;
END LOOP;
END LOOP;
END;
$$ language plpgsql;
See also the version on github based on the same principle but adding some speed and reporting improvements.
Examples of use in a test database:
Search in all tables within public schema:
select * from search_columns('foobar');
schemaname | tablename | columnname | rowctid
------------+-----------+------------+---------
public | s3 | usename | (0,11)
public | s2 | relname | (7,29)
public | w | body | (0,2)
(3 rows)
Search in a specific table:
select * from search_columns('foobar','{w}');
schemaname | tablename | columnname | rowctid
------------+-----------+------------+---------
public | w | body | (0,2)
(1 row)
Search in a subset of tables obtained from a select:
select * from search_columns('foobar', array(select table_name::name from information_schema.tables where table_name like 's%'), array['public']);
schemaname | tablename | columnname | rowctid
------------+-----------+------------+---------
public | s2 | relname | (7,29)
public | s3 | usename | (0,11)
(2 rows)
Get a result row with the corresponding base table and and ctid:
select * from public.w where ctid='(0,2)';
title | body | tsv
-------+--------+---------------------
toto | foobar | 'foobar':2 'toto':1
Variants
To test against a regular expression instead of strict equality, like grep, this part of the query:
SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L
may be changed to:
SELECT ctid FROM %I.%I WHERE cast(%I as text) ~ %L
For case insensitive comparisons, you could write:
SELECT ctid FROM %I.%I WHERE lower(cast(%I as text)) = lower(%L)
to search every column of every table for a particular value
This does not define how to match exactly.
Nor does it define what to return exactly.
Assuming:
Find any row with any column containing the given value in its text representation - as opposed to equaling the given value.
Return the table name (regclass) and the tuple ID (ctid), because that's simplest.
Here is a dead simple, fast and slightly dirty way:
CREATE OR REPLACE FUNCTION search_whole_db(_like_pattern text)
RETURNS TABLE(_tbl regclass, _ctid tid) AS
$func$
BEGIN
FOR _tbl IN
SELECT c.oid::regclass
FROM pg_class c
JOIN pg_namespace n ON n.oid = relnamespace
WHERE c.relkind = 'r' -- only tables
AND n.nspname !~ '^(pg_|information_schema)' -- exclude system schemas
ORDER BY n.nspname, c.relname
LOOP
RETURN QUERY EXECUTE format(
'SELECT $1, ctid FROM %s t WHERE t::text ~~ %L'
, _tbl, '%' || _like_pattern || '%')
USING _tbl;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM search_whole_db('mypattern');
Provide the search pattern without enclosing %.
Why slightly dirty?
If separators and decorators for the row in text representation can be part of the search pattern, there can be false positives:
column separator: , by default
whole row is enclosed in parentheses:()
some values are enclosed in double quotes "
\ may be added as escape char
And the text representation of some columns may depend on local settings - but that ambiguity is inherent to the question, not to my solution.
Each qualifying row is returned once only, even when it matches multiple times (as opposed to other answers here).
This searches the whole DB except for system catalogs. Will typically take a long time to finish. You might want to restrict to certain schemas / tables (or even columns) like demonstrated in other answers. Or add notices and a progress indicator, also demonstrated in another answer.
The regclass object identifier type is represented as table name, schema-qualified where necessary to disambiguate according to the current search_path:
Find the referenced table name using table, field and schema name
What is the ctid?
How do I decompose ctid into page and row numbers?
You might want to escape characters with special meaning in the search pattern. See:
Escape function for regular expression or LIKE patterns
There is a way to achieve this without creating a function or using an external tool. By using Postgres' query_to_xml() function that can dynamically run a query inside another query, it's possible to search a text across many tables. This is based on my answer to retrieve the rowcount for all tables:
To search for the string foo across all tables in a schema, the following can be used:
with found_rows as (
select format('%I.%I', table_schema, table_name) as table_name,
query_to_xml(format('select to_jsonb(t) as table_row
from %I.%I as t
where t::text like ''%%foo%%'' ', table_schema, table_name),
true, false, '') as table_rows
from information_schema.tables
where table_schema = 'public'
)
select table_name, x.table_row
from found_rows f
left join xmltable('//table/row'
passing table_rows
columns
table_row text path 'table_row') as x on true
Note that the use of xmltable requires Postgres 10 or newer. For older Postgres version, this can be also done using xpath().
with found_rows as (
select format('%I.%I', table_schema, table_name) as table_name,
query_to_xml(format('select to_jsonb(t) as table_row
from %I.%I as t
where t::text like ''%%foo%%'' ', table_schema, table_name),
true, false, '') as table_rows
from information_schema.tables
where table_schema = 'public'
)
select table_name, x.table_row
from found_rows f
cross join unnest(xpath('/table/row/table_row/text()', table_rows)) as r(data)
The common table expression (WITH ...) is only used for convenience. It loops through all tables in the public schema. For each table the following query is run through the query_to_xml() function:
select to_jsonb(t)
from some_table t
where t::text like '%foo%';
The where clause is used to make sure the expensive generation of XML content is only done for rows that contain the search string. This might return something like this:
<table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<table_row>{"id": 42, "some_column": "foobar"}</table_row>
</row>
</table>
The conversion of the complete row to jsonb is done, so that in the result one could see which value belongs to which column.
The above might return something like this:
table_name | table_row
-------------+----------------------------------------
public.foo | {"id": 1, "some_column": "foobar"}
public.bar | {"id": 42, "another_column": "barfoo"}
Online example for Postgres 10+
Online example for older Postgres versions
Without storing a new procedure you can use a code block and execute to obtain a table of occurences. You can filter results by schema, table or column name.
DO $$
DECLARE
value int := 0;
sql text := 'The constructed select statement';
rec1 record;
rec2 record;
BEGIN
DROP TABLE IF EXISTS _x;
CREATE TEMPORARY TABLE _x (
schema_name text,
table_name text,
column_name text,
found text
);
FOR rec1 IN
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_name <> '_x'
AND UPPER(column_name) LIKE UPPER('%%')
AND table_schema <> 'pg_catalog'
AND table_schema <> 'information_schema'
AND data_type IN ('character varying', 'text', 'character', 'char', 'varchar')
LOOP
sql := concat('SELECT ', rec1."column_name", ' AS "found" FROM ',rec1."table_schema" , '.',rec1."table_name" , ' WHERE UPPER(',rec1."column_name" , ') LIKE UPPER(''','%my_substring_to_find_goes_here%' , ''')');
RAISE NOTICE '%', sql;
BEGIN
FOR rec2 IN EXECUTE sql LOOP
RAISE NOTICE '%', sql;
INSERT INTO _x VALUES (rec1."table_schema", rec1."table_name", rec1."column_name", rec2."found");
END LOOP;
EXCEPTION WHEN OTHERS THEN
END;
END LOOP;
END; $$;
SELECT * FROM _x;
If you're using IntelliJ add your DB to Database view then right click on databases and select full text search, it will list all tables and all fields for your specific text.
And if someone think it could help. Here is #Daniel Vérité's function, with another param that accept names of columns that can be used in search. This way it decrease the time of processing. At least in my test it reduced a lot.
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_columns name[] default '{}',
haystack_tables name[] default '{}',
haystack_schema name[] default '{public}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND (c.column_name=ANY(haystack_columns) OR haystack_columns='{}')
AND t.table_type='BASE TABLE'
LOOP
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
schemaname,
tablename,
columnname,
needle
) INTO rowctid;
IF rowctid is not null THEN
RETURN NEXT;
END IF;
END LOOP;
END;
$$ language plpgsql;
Bellow is an example of usage of the search_function created above.
SELECT * FROM search_columns('86192700'
, array(SELECT DISTINCT a.column_name::name FROM information_schema.columns AS a
INNER JOIN information_schema.tables as b ON (b.table_catalog = a.table_catalog AND b.table_schema = a.table_schema AND b.table_name = a.table_name)
WHERE
a.column_name iLIKE '%cep%'
AND b.table_type = 'BASE TABLE'
AND b.table_schema = 'public'
)
, array(SELECT b.table_name::name FROM information_schema.columns AS a
INNER JOIN information_schema.tables as b ON (b.table_catalog = a.table_catalog AND b.table_schema = a.table_schema AND b.table_name = a.table_name)
WHERE
a.column_name iLIKE '%cep%'
AND b.table_type = 'BASE TABLE'
AND b.table_schema = 'public')
);
Here's #Daniel Vérité's function with progress reporting functionality.
It reports progress in three ways:
by RAISE NOTICE;
by decreasing value of supplied {progress_seq} sequence from
{total number of colums to search in} down to 0;
by writing the progress along with found tables into text file,
located in c:\windows\temp\{progress_seq}.txt.
_
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_tables name[] default '{}',
haystack_schema name[] default '{public}',
progress_seq text default NULL
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
DECLARE
currenttable text;
columnscount integer;
foundintables text[];
foundincolumns text[];
begin
currenttable='';
columnscount = (SELECT count(1)
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND t.table_type='BASE TABLE')::integer;
PERFORM setval(progress_seq::regclass, columnscount);
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND t.table_type='BASE TABLE'
LOOP
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
schemaname,
tablename,
columnname,
needle
) INTO rowctid;
IF rowctid is not null THEN
RETURN NEXT;
foundintables = foundintables || tablename;
foundincolumns = foundincolumns || columnname;
RAISE NOTICE 'FOUND! %, %, %, %', schemaname,tablename,columnname, rowctid;
END IF;
IF (progress_seq IS NOT NULL) THEN
PERFORM nextval(progress_seq::regclass);
END IF;
IF(currenttable<>tablename) THEN
currenttable=tablename;
IF (progress_seq IS NOT NULL) THEN
RAISE NOTICE 'Columns left to look in: %; looking in table: %', currval(progress_seq::regclass), tablename;
EXECUTE 'COPY (SELECT unnest(string_to_array(''Current table (column ' || columnscount-currval(progress_seq::regclass) || ' of ' || columnscount || '): ' || tablename || '\n\nFound in tables/columns:\n' || COALESCE(
(SELECT string_agg(c1 || '/' || c2, '\n') FROM (SELECT unnest(foundintables) AS c1,unnest(foundincolumns) AS c2) AS t1)
, '') || ''',''\n''))) TO ''c:\WINDOWS\temp\' || progress_seq || '.txt''';
END IF;
END IF;
END LOOP;
END;
$$ language plpgsql;
-- Below function will list all the tables which contain a specific string in the database
select TablesCount(‘StringToSearch’);
--Iterates through all the tables in the database
CREATE OR REPLACE FUNCTION **TablesCount**(_searchText TEXT)
RETURNS text AS
$$ -- here start procedural part
DECLARE _tname text;
DECLARE cnt int;
BEGIN
FOR _tname IN SELECT table_name FROM information_schema.tables where table_schema='public' and table_type='BASE TABLE' LOOP
cnt= getMatchingCount(_tname,Columnames(_tname,_searchText));
RAISE NOTICE 'Count% ', CONCAT(' ',cnt,' Table name: ', _tname);
END LOOP;
RETURN _tname;
END;
$$ -- here finish procedural part
LANGUAGE plpgsql; -- language specification
-- Returns the count of tables for which the condition is met.
-- For example, if the intended text exists in any of the fields of the table,
-- then the count will be greater than 0. We can find the notifications
-- in the Messages section of the result viewer in postgres database.
CREATE OR REPLACE FUNCTION **getMatchingCount**(_tname TEXT, _clause TEXT)
RETURNS int AS
$$
Declare outpt text;
BEGIN
EXECUTE 'Select Count(*) from '||_tname||' where '|| _clause
INTO outpt;
RETURN outpt;
END;
$$ LANGUAGE plpgsql;
--Get the fields of each table. Builds the where clause with all columns of a table.
CREATE OR REPLACE FUNCTION **Columnames**(_tname text,st text)
RETURNS text AS
$$ -- here start procedural part
DECLARE
_name text;
_helper text;
BEGIN
FOR _name IN SELECT column_name FROM information_schema.Columns WHERE table_name =_tname LOOP
_name=CONCAT('CAST(',_name,' as VarChar)',' like ','''%',st,'%''', ' OR ');
_helper= CONCAT(_helper,_name,' ');
END LOOP;
RETURN CONCAT(_helper, ' 1=2');
END;
$$ -- here finish procedural part
LANGUAGE plpgsql; -- language specification

how to iterate in all schemas and find count from all tables present in all schemas with same table name for every 5mins?

imagine there are 5 schemas in my database and in every schema there is a common name table (ex:- table1) after every 5mins records get inserted in table1, how I can iterate in all schemas n calculate the count of table1[i have to automate the process so i am going to write the code in function and call that function after every 5mins using crontab].
Basically 2 options: Hard code schema.table and union the results. So something like:
create or replace function count_rows_in_each_table1()
returns table (schema_name text, number_or_rows integer)
language sql
as $$
select 'schema1', count(*) from schema1.table1 union all
select 'schema2', count(*) from schema2.table1 union all
select 'schema3', count(*) from schema3.table1 union all
...
select 'scheman', count(*) from scheman.table1;
$$;
The alternative being building the query dynamically from information_scheme.
create or replace function count_rows_in_each_table1()
returns table (schema_name text, number_of_rows bigint)
language plpgsql
as $$
declare
c_rows_count cursor is
select table_schema::text
from information_schema.tables
where table_name = 'table1';
l_tbl record;
l_sql_statement text = '';
l_connector text = '';
l_base_select text = 'select ''%s'', count(*) from %I.table1';
begin
for l_tbl in c_rows_count
loop
l_sql_statement = l_sql_statement ||
l_connector ||
format (l_base_select, l_tbl.table_schema, l_tbl.table_schema);
l_connector = ' union all ';
end loop;
raise notice E'Running Query: \n%', l_sql_statement;
return query execute l_sql_statement;
end;
$$;
Which is better. With few schema and few schema add/drop, opt for the first. It is direct and easily shows what you are doing. If you add/drop schema often then opt for the second. If you have many schema, but seldom add/drop them then modify the second to generate the first, save and schedule execution of the generated query.
NOTE: Not tested

Function to return dynamic set of columns for given table

I have a fields table to store column information for other tables:
CREATE TABLE public.fields (
schema_name varchar(100),
table_name varchar(100),
column_text varchar(100),
column_name varchar(100),
column_type varchar(100) default 'varchar(100)',
column_visible boolean
);
And I'd like to create a function to fetch data for a specific table.
Just tried sth like this:
create or replace function public.get_table(schema_name text,
table_name text,
active boolean default true)
returns setof record as $$
declare
entity_name text default schema_name || '.' || table_name;
r record;
begin
for r in EXECUTE 'select * from ' || entity_name loop
return next r;
end loop;
return;
end
$$
language plpgsql;
With this function I have to specify columns when I call it!
select * from public.get_table('public', 'users') as dept(id int, uname text);
I want to pass schema_name and table_name as parameters to function and get record list, according to column_visible field in public.fields table.
Solution for the simple case
As explained in the referenced answers below, you can use registered (row) types, and thus implicitly declare the return type of a polymorphic function:
CREATE OR REPLACE FUNCTION public.get_table(_tbl_type anyelement)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE format('TABLE %s', pg_typeof(_tbl_type));
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM public.get_table(NULL::public.users); -- note the syntax!
Returns the complete table (with all user columns).
Wait! How?
Detailed explanation in this related answer, chapter
"Various complete table types":
Refactor a PL/pgSQL function to return the output of various SELECT queries
TABLE foo is just short for SELECT * FROM foo:
Is there a shortcut for SELECT * FROM?
2 steps for completely dynamic return type
But what you are trying to do is strictly impossible in a single SQL command.
I want to pass schema_name and table_name as parameters to function and get record list, according to column_visible field in
public.fields table.
There is no direct way to return an arbitrary selection of columns (return type not known at call time) from a function - or any SQL command. SQL demands to know number, names and types of resulting columns at call time. More in the 2nd chapter of this related answer:
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?
There are various workarounds. You could wrap the result in one of the standard document types (json, jsonb, hstore, xml).
Or you generate the query with one function call and execute the result with the next:
CREATE OR REPLACE FUNCTION public.generate_get_table(_schema_name text, _table_name text)
RETURNS text AS
$func$
SELECT format('SELECT %s FROM %I.%I'
, string_agg(quote_ident(column_name), ', ')
, schema_name
, table_name)
FROM fields
WHERE column_visible
AND schema_name = _schema_name
AND table_name = _table_name
GROUP BY schema_name, table_name
ORDER BY schema_name, table_name;
$func$ LANGUAGE sql;
Call:
SELECT public.generate_get_table('public', 'users');
This create a query of the form:
SELECT usr_id, usr FROM public.users;
Execute it in the 2nd step. (You might want to add column numbers and order columns.)
Or append \gexec in psql to execute the return value immediately. See:
How to force evaluation of subquery before joining / pushing down to foreign server
Be sure to defend against SQL injection:
INSERT with dynamic table name in trigger function
Define table and column names as arguments in a plpgsql function?
Asides
varchar(100) does not make much sense for identifiers, which are limited to 63 characters in standard Postgres:
Maximum characters in labels (table names, columns etc)
If you understand how the object identifier type regclass works, you might replace schema and table name with a singe regclass column.
I think you just need another query to get the list of columns you want.
Maybe something like (this is untested):
create or replace function public.get_table(_schema_name text, _table_name text, active boolean default true) returns setof record as $$
declare
entity_name text default schema_name || '.' || table_name;
r record;
columns varchar;
begin
-- Get the list of columns
SELECT string_agg(column_name, ', ')
INTO columns
FROM public.fields
WHERE fields.schema_name = _schema_name
AND fields.table_name = _table_name
AND fields.column_visible = TRUE;
-- Return rows from the specified table
RETURN QUERY EXECUTE 'select ' || columns || ' from ' || entity_name;
RETURN;
end
$$
language plpgsql;
Keep in mind that column/table references may need to be surrounded by double quotes if they have certain characters in them.

How to use variable as table name in plpgsql

I'm new to plpgsql. I'm trying to run a simple query in plpgsql using a variable as table name in plpgsql. But the variable is being interpreted as the table name instead of the value of the variable being interpreted as variable name.
DECLARE
v_table text;
z_table text;
max_id bigint;
BEGIN
FOR v_table IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'my_database'
AND table_schema = 'public'
AND table_name not like 'z_%'
LOOP
z_table := 'z_' || v_table;
SELECT max(id) from z_table INTO max_id;
DELETE FROM v_table where id > max_id;
END LOOP;
Some background information. For every table in my database, I have another table starting with "z_". E.g. for a table called "employee" I have identical table called "z_employee". z_employee contains the same set of data as employee. I use it to restore the employee table at the start of every test.
When I run this function I get the following error:
ERROR: relation "z_table" does not exist
LINE 1: SELECT max(id) from z_table
My guess is that I'm not allowed to use the variable z_table in the SQL query. At least not the way I'm using it here. But I don't know how it's supposed to be done.
Use dynamic SQL with EXECUTE, simplify, and escape identifiers properly:
CREATE OR REPLACE FUNCTION f_test()
RETURNS void AS
$func$
DECLARE
v_table text;
BEGIN
FOR v_table IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'my_database'
AND table_schema = 'public'
AND table_name NOT LIKE 'z_%'
LOOP
EXECUTE format('DELETE FROM %I v WHERE v.id > (SELECT max(id) FROM %I)'
, v_table, 'z_' || v_table);
END LOOP;
END
$func$ LANGUAGE plpgsql;
Table names may need to be quoted to defend against syntax errors or even SQL injection! I use the convenient format() to concatenate the DELETE statement and escape identifiers properly.
A separate SELECT would be more expensive. You can do it all with a single DELETE statement.
Related:
Table name as a PostgreSQL function parameter
Aside:
You might use the (slightly faster) system catalog pg_tables instead:
SELECT tablename
FROM pg_catalog.pg_tables
WHERE schemaname = 'public'
AND tablename NOT LIKE 'z_%'
See:
How to check if a table exists in a given schema
table_catalog in information_schema.tables has no equivalent here. Only tables of the current database are visible anyway. So the above predicate WHERE table_catalog = 'my_database' produces an empty result set when connected to the wrong database.

How to add column if not exists on PostgreSQL?

Question is simple. How to add column x to table y, but only when x column doesn't exist ? I found only solution here how to check if column exists.
SELECT column_name
FROM information_schema.columns
WHERE table_name='x' and column_name='y';
With Postgres 9.6 this can be done using the option if not exists
ALTER TABLE table_name ADD COLUMN IF NOT EXISTS column_name INTEGER;
Here's a short-and-sweet version using the "DO" statement:
DO $$
BEGIN
BEGIN
ALTER TABLE <table_name> ADD COLUMN <column_name> <column_type>;
EXCEPTION
WHEN duplicate_column THEN RAISE NOTICE 'column <column_name> already exists in <table_name>.';
END;
END;
$$
You can't pass these as parameters, you'll need to do variable substitution in the string on the client side, but this is a self contained query that only emits a message if the column already exists, adds if it doesn't and will continue to fail on other errors (like an invalid data type).
I don't recommend doing ANY of these methods if these are random strings coming from external sources. No matter what method you use (client-side or server-side dynamic strings executed as queries), it would be a recipe for disaster as it opens you to SQL injection attacks.
Postgres 9.6 added ALTER TABLE tbl ADD COLUMN IF NOT EXISTS column_name.
So this is mostly outdated now. You might use it in older versions, or a variation to check for more than just the column name.
CREATE OR REPLACE function f_add_col(_tbl regclass, _col text, _type regtype)
RETURNS bool
LANGUAGE plpgsql AS
$func$
BEGIN
IF EXISTS (SELECT FROM pg_attribute
WHERE attrelid = _tbl
AND attname = _col
AND NOT attisdropped) THEN
RETURN false;
ELSE
EXECUTE format('ALTER TABLE %s ADD COLUMN %I %s', _tbl, _col, _type);
RETURN true;
END IF;
END
$func$;
Call:
SELECT f_add_col('public.kat', 'pfad1', 'int');
Returns true on success, else false (column already exists).
Raises an exception for invalid table or type name.
Why another version?
This could be done with a DO statement, but DO statements cannot return anything. And if it's for repeated use, I would create a function.
I use the object identifier types regclass and regtype for _tbl and _type which a) prevents SQL injection and b) checks validity of both immediately (cheapest possible way). The column name _col has still to be sanitized for EXECUTE with quote_ident(). See:
Table name as a PostgreSQL function parameter
format() requires Postgres 9.1+. For older versions concatenate manually:
EXECUTE 'ALTER TABLE ' || _tbl || ' ADD COLUMN ' || quote_ident(_col) || ' ' || _type;
You can schema-qualify your table name, but you don't have to.
You can double-quote the identifiers in the function call to preserve camel-case and reserved words (but you shouldn't use any of this anyway).
I query pg_catalog instead of the information_schema. Detailed explanation:
How to check if a table exists in a given schema
Blocks containing an EXCEPTION clause are substantially slower.
This is simpler and faster. The manual:
Tip
A block containing an EXCEPTION clause is significantly more
expensive to enter and exit than a block without one.
Therefore, don't use EXCEPTION without need.
Following select query will return true/false, using EXISTS() function.
EXISTS(): The argument of EXISTS is an arbitrary SELECT statement, or
subquery. The subquery is evaluated to determine whether it returns
any rows. If it returns at least one row, the result of EXISTS is
"true"; if the subquery returns no rows, the result of EXISTS is
"false"
SELECT EXISTS(SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'x'
AND column_name = 'y');
and use the following dynamic SQL statement to alter your table
DO
$$
BEGIN
IF NOT EXISTS (SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'x'
AND column_name = 'y') THEN
ALTER TABLE x ADD COLUMN y int DEFAULT NULL;
ELSE
RAISE NOTICE 'Already exists';
END IF;
END
$$
For those who use Postgre 9.5+(I believe most of you do), there is a quite simple and clean solution
ALTER TABLE if exists <tablename> add if not exists <columnname> <columntype>
the below function will check the column if exist return appropriate message else it will add the column to the table.
create or replace function addcol(schemaname varchar, tablename varchar, colname varchar, coltype varchar)
returns varchar
language 'plpgsql'
as
$$
declare
col_name varchar ;
begin
execute 'select column_name from information_schema.columns where table_schema = ' ||
quote_literal(schemaname)||' and table_name='|| quote_literal(tablename) || ' and column_name= '|| quote_literal(colname)
into col_name ;
raise info ' the val : % ', col_name;
if(col_name is null ) then
col_name := colname;
execute 'alter table ' ||schemaname|| '.'|| tablename || ' add column '|| colname || ' ' || coltype;
else
col_name := colname ||' Already exist';
end if;
return col_name;
end;
$$
This is basically the solution from sola, but just cleaned up a bit. It's different enough that I didn't just want to "improve" his solution (plus, I sort of think that's rude).
Main difference is that it uses the EXECUTE format. Which I think is a bit cleaner, but I believe means that you must be on PostgresSQL 9.1 or newer.
This has been tested on 9.1 and works. Note: It will raise an error if the schema/table_name/or data_type are invalid. That could "fixed", but might be the correct behavior in many cases.
CREATE OR REPLACE FUNCTION add_column(schema_name TEXT, table_name TEXT,
column_name TEXT, data_type TEXT)
RETURNS BOOLEAN
AS
$BODY$
DECLARE
_tmp text;
BEGIN
EXECUTE format('SELECT COLUMN_NAME FROM information_schema.columns WHERE
table_schema=%L
AND table_name=%L
AND column_name=%L', schema_name, table_name, column_name)
INTO _tmp;
IF _tmp IS NOT NULL THEN
RAISE NOTICE 'Column % already exists in %.%', column_name, schema_name, table_name;
RETURN FALSE;
END IF;
EXECUTE format('ALTER TABLE %I.%I ADD COLUMN %I %s;', schema_name, table_name, column_name, data_type);
RAISE NOTICE 'Column % added to %.%', column_name, schema_name, table_name;
RETURN TRUE;
END;
$BODY$
LANGUAGE 'plpgsql';
usage:
select add_column('public', 'foo', 'bar', 'varchar(30)');
Can be added to migration scripts invoke function and drop when done.
create or replace function patch_column() returns void as
$$
begin
if exists (
select * from information_schema.columns
where table_name='my_table'
and column_name='missing_col'
)
then
raise notice 'missing_col already exists';
else
alter table my_table
add column missing_col varchar;
end if;
end;
$$ language plpgsql;
select patch_column();
drop function if exists patch_column();
In my case, for how it was created reason it is a bit difficult for our migration scripts to cut across different schemas.
To work around this we used an exception that just caught and ignored the error. This also had the nice side effect of being a lot easier to look at.
However, be wary that the other solutions have their own advantages that probably outweigh this solution:
DO $$
BEGIN
BEGIN
ALTER TABLE IF EXISTS bobby_tables RENAME COLUMN "dckx" TO "xkcd";
EXCEPTION
WHEN undefined_column THEN RAISE NOTICE 'Column was already renamed';
END;
END $$;
You can do it by following way.
ALTER TABLE tableName drop column if exists columnName;
ALTER TABLE tableName ADD COLUMN columnName character varying(8);
So it will drop the column if it is already exists. And then add the column to particular table.
Simply check if the query returned a column_name.
If not, execute something like this:
ALTER TABLE x ADD COLUMN y int;
Where you put something useful for 'x' and 'y' and of course a suitable datatype where I used int.