Count non-null values for each column in a table (PostgreSQL, PL/pgSQL)

I am trying to count, for each column in a table, the number of rows in which that column is not null.
There is a simple table, actor_new.
The first two columns (actor_id, first_name) are not null in 203 rows.
The other two columns (last_name, last_update) are not null in 200 rows.
This simple test outputs the same value for all columns, although a separate SELECT for each column returns the correct counts. Please help me understand what is wrong in the LOOP block.
create or replace function new_cnt_test_ho(in_table text, out out_table text, out cnt_rows int) returns setof record AS $$
DECLARE i text;
BEGIN
FOR i IN
select column_name
from information_schema."columns"
where table_schema = 'public'
and table_name = in_table
LOOP
execute '
select $1, count($1)
from '|| quote_ident(in_table) ||'
where $1 is not null '
INTO out_table, cnt_rows
using i, quote_literal(i), quote_ident(in_table), quote_literal(in_table) ;
return next;
END LOOP;
END;
$$LANGUAGE plpgsql
Result:
select * from new_cnt_test_ho('actor_new')
out_table |cnt_rows|
-----------+--------+
actor_id | 203|
first_name | 203|
last_name | 203|
last_update| 203|
There are four parameters in the USING clause because I assumed the problem was in the quoting, so I tried each of the arguments $1 to $4 in turn.
The correct result should look like this:
out_table |cnt_rows|
-----------+--------+
actor_id | 203|
first_name | 203|
last_name | 200|
last_update| 200|

Based on your title: the input is a table name, and the output is a table in which one column holds the column name and the other holds the result of count(column).
Your version returns the same count for every column because a value passed with USING is always a data value, never an identifier: count($1) counts a non-null text constant once per row.
First check whether the table exists.
Then loop over the column names and run one query per column.
A sample query is select 'cola',count(cola) from count_nulls. The first occurrence is the literal 'cola', so we need quote_literal(cols.column_name);
the second is the column name, so we need quote_ident(cols.column_name).
select 'cola',count(cola) from count_nulls counts all non-null values in column cola. If every value in a column is null, it returns 0.
The following function returns the expected result. It can be simplified, since I left the raise notice debugging output in.
CREATE OR REPLACE FUNCTION get_all_nulls (_table text)
RETURNS TABLE (
column_name_ text,
numberofnull bigint
)
AS $body$
DECLARE
cols RECORD;
_sql text;
_table_exists boolean;
_table_reg regclass;
BEGIN
_table_reg := _table::regclass;
_table_exists := (
SELECT
EXISTS (
SELECT
FROM
pg_tables
WHERE
schemaname = 'public'
AND tablename = _table));
FOR cols IN
SELECT
column_name
FROM
information_schema.columns
WHERE
table_name = _table
AND table_schema = 'public' LOOP
_sql := 'select ' || quote_literal(cols.column_name) || ',count(' || quote_ident(cols.column_name) || ') from ' || quote_ident(_table::text);
RAISE NOTICE '_sql:%', _sql;
RETURN query EXECUTE _sql;
END LOOP;
END;
$body$ STRICT
LANGUAGE plpgsql;
setup.
begin;
create table count_nulls(cola int, colb int, colc int);
INSERT into count_nulls values(null,null,null);
INSERT into count_nulls values(1,null,null);
INSERT into count_nulls values(2,3,null);
commit;
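A shorter variant of the function above can be sketched with format(), assuming PostgreSQL 9.1 or later. The function name count_not_null_per_column and the output column not_null_count are made up for this illustration. Declaring the parameter as regclass makes the call fail early if the table does not exist, so no separate existence check is needed.

```sql
CREATE OR REPLACE FUNCTION count_not_null_per_column(_table regclass)
  RETURNS TABLE (column_name_ text, not_null_count bigint)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _col text;
BEGIN
   -- Visible, non-dropped columns of the table, in definition order
   FOR _col IN
      SELECT a.attname
      FROM   pg_attribute a
      WHERE  a.attrelid = _table
      AND    a.attnum > 0
      AND    NOT a.attisdropped
      ORDER  BY a.attnum
   LOOP
      -- %L inserts the name as a quoted literal, %I as a quoted identifier;
      -- %s inserts the regclass value, which is already safely quoted
      RETURN QUERY EXECUTE format(
         'SELECT %L::text, count(%I) FROM %s', _col, _col, _table);
   END LOOP;
END
$func$;

SELECT * FROM count_not_null_per_column('count_nulls');
```

With the three rows from the setup above, this should report 2 for cola, 1 for colb and 0 for colc.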

Related

Find table names that contain a specific column entry from another table

Is it possible to search every column of every table for a particular value in PostgreSQL?
A similar question is available here for Oracle.
How about dumping the contents of the database, then using grep?
$ pg_dump --data-only --inserts -U postgres your-db-name > a.tmp
$ grep United a.tmp
INSERT INTO countries VALUES ('US', 'United States');
INSERT INTO countries VALUES ('GB', 'United Kingdom');
The same utility, pg_dump, can include column names in the output. Just change --inserts to --column-inserts. That way you can search for specific column names, too. But if I were looking for column names, I'd probably dump the schema instead of the data.
$ pg_dump --data-only --column-inserts -U postgres your-db-name > a.tmp
$ grep country_code a.tmp
INSERT INTO countries (iso_country_code, iso_country_name) VALUES ('US', 'United States');
INSERT INTO countries (iso_country_code, iso_country_name) VALUES ('GB', 'United Kingdom');
Here's a pl/pgsql function that locates records where any column contains a specific value.
It takes as arguments the value to search for in text format, an array of table names to search in (defaults to all tables) and an array of schema names (defaults to all schemas).
It returns a table structure with the schema, the name of the table, the name of the column and the pseudo-column ctid (the non-durable physical location of the row in the table, see System Columns).
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_tables name[] default '{}',
haystack_schema name[] default '{}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
JOIN information_schema.table_privileges p ON
(t.table_name=p.table_name AND t.table_schema=p.table_schema
AND p.privilege_type='SELECT')
JOIN information_schema.schemata s ON
(s.schema_name=t.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND (c.table_schema=ANY(haystack_schema) OR haystack_schema='{}')
AND t.table_type='BASE TABLE'
LOOP
FOR rowctid IN
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
schemaname,
tablename,
columnname,
needle
)
LOOP
-- uncomment next line to get some progress report
-- RAISE NOTICE 'hit in %.%', schemaname, tablename;
RETURN NEXT;
END LOOP;
END LOOP;
END;
$$ language plpgsql;
See also the version on github based on the same principle but adding some speed and reporting improvements.
Examples of use in a test database:
Search in all tables within public schema:
select * from search_columns('foobar');
schemaname | tablename | columnname | rowctid
------------+-----------+------------+---------
public | s3 | usename | (0,11)
public | s2 | relname | (7,29)
public | w | body | (0,2)
(3 rows)
Search in a specific table:
select * from search_columns('foobar','{w}');
schemaname | tablename | columnname | rowctid
------------+-----------+------------+---------
public | w | body | (0,2)
(1 row)
Search in a subset of tables obtained from a select:
select * from search_columns('foobar', array(select table_name::name from information_schema.tables where table_name like 's%'), array['public']);
schemaname | tablename | columnname | rowctid
------------+-----------+------------+---------
public | s2 | relname | (7,29)
public | s3 | usename | (0,11)
(2 rows)
Get a result row from the corresponding base table using its ctid:
select * from public.w where ctid='(0,2)';
title | body | tsv
-------+--------+---------------------
toto | foobar | 'foobar':2 'toto':1
Variants
To test against a regular expression instead of strict equality, like grep, this part of the query:
SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L
may be changed to:
SELECT ctid FROM %I.%I WHERE cast(%I as text) ~ %L
For case insensitive comparisons, you could write:
SELECT ctid FROM %I.%I WHERE lower(cast(%I as text)) = lower(%L)
to search every column of every table for a particular value
This does not define how to match exactly.
Nor does it define what to return exactly.
Assuming:
Find any row with any column containing the given value in its text representation - as opposed to equaling the given value.
Return the table name (regclass) and the tuple ID (ctid), because that's simplest.
Here is a dead simple, fast and slightly dirty way:
CREATE OR REPLACE FUNCTION search_whole_db(_like_pattern text)
RETURNS TABLE(_tbl regclass, _ctid tid) AS
$func$
BEGIN
FOR _tbl IN
SELECT c.oid::regclass
FROM pg_class c
JOIN pg_namespace n ON n.oid = relnamespace
WHERE c.relkind = 'r' -- only tables
AND n.nspname !~ '^(pg_|information_schema)' -- exclude system schemas
ORDER BY n.nspname, c.relname
LOOP
RETURN QUERY EXECUTE format(
'SELECT $1, ctid FROM %s t WHERE t::text ~~ %L'
, _tbl, '%' || _like_pattern || '%')
USING _tbl;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM search_whole_db('mypattern');
Provide the search pattern without enclosing %.
Why slightly dirty?
If separators and decorators for the row in text representation can be part of the search pattern, there can be false positives:
column separator: , by default
whole row is enclosed in parentheses:()
some values are enclosed in double quotes "
\ may be added as escape char
And the text representation of some columns may depend on local settings - but that ambiguity is inherent to the question, not to my solution.
Each qualifying row is returned once only, even when it matches multiple times (as opposed to other answers here).
This searches the whole DB except for system catalogs. Will typically take a long time to finish. You might want to restrict to certain schemas / tables (or even columns) like demonstrated in other answers. Or add notices and a progress indicator, also demonstrated in another answer.
The regclass object identifier type is represented as table name, schema-qualified where necessary to disambiguate according to the current search_path:
Find the referenced table name using table, field and schema name
What is the ctid?
How do I decompose ctid into page and row numbers?
You might want to escape characters with special meaning in the search pattern. See:
Escape function for regular expression or LIKE patterns
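The escaping mentioned above could look like the following sketch for LIKE patterns. The helper name f_like_escape is an assumption made for this example. It also assumes the default escape character \ and standard_conforming_strings = on (the default since PostgreSQL 9.1).

```sql
CREATE OR REPLACE FUNCTION f_like_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT AS
$func$
SELECT replace(replace(replace($1
         , '\', '\\')   -- escape the escape character itself first
         , '%', '\%')   -- then the wildcard for any string
         , '_', '\_');  -- then the wildcard for a single character
$func$;

-- Hypothetical call: the underscore is now matched literally
SELECT * FROM search_whole_db(f_like_escape('my_pattern'));
```

The backslash has to be replaced first; otherwise the backslashes added for % and _ would be doubled again.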
There is a way to achieve this without creating a function or using an external tool. By using Postgres' query_to_xml() function that can dynamically run a query inside another query, it's possible to search a text across many tables. This is based on my answer to retrieve the rowcount for all tables:
To search for the string foo across all tables in a schema, the following can be used:
with found_rows as (
select format('%I.%I', table_schema, table_name) as table_name,
query_to_xml(format('select to_jsonb(t) as table_row
from %I.%I as t
where t::text like ''%%foo%%'' ', table_schema, table_name),
true, false, '') as table_rows
from information_schema.tables
where table_schema = 'public'
)
select table_name, x.table_row
from found_rows f
left join xmltable('//table/row'
passing table_rows
columns
table_row text path 'table_row') as x on true
Note that the use of xmltable requires Postgres 10 or newer. For older Postgres versions, this can also be done using xpath().
with found_rows as (
select format('%I.%I', table_schema, table_name) as table_name,
query_to_xml(format('select to_jsonb(t) as table_row
from %I.%I as t
where t::text like ''%%foo%%'' ', table_schema, table_name),
true, false, '') as table_rows
from information_schema.tables
where table_schema = 'public'
)
select table_name, r.data as table_row
from found_rows f
cross join unnest(xpath('/table/row/table_row/text()', table_rows)) as r(data)
The common table expression (WITH ...) is only used for convenience. It loops through all tables in the public schema. For each table the following query is run through the query_to_xml() function:
select to_jsonb(t)
from some_table t
where t::text like '%foo%';
The where clause is used to make sure the expensive generation of XML content is only done for rows that contain the search string. This might return something like this:
<table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<table_row>{"id": 42, "some_column": "foobar"}</table_row>
</row>
</table>
The conversion of the complete row to jsonb is done, so that in the result one could see which value belongs to which column.
The above might return something like this:
table_name | table_row
-------------+----------------------------------------
public.foo | {"id": 1, "some_column": "foobar"}
public.bar | {"id": 42, "another_column": "barfoo"}
Without storing a new procedure, you can use a code block and EXECUTE to obtain a table of occurrences. You can filter the results by schema, table or column name.
DO $$
DECLARE
value int := 0;
sql text := 'The constructed select statement';
rec1 record;
rec2 record;
BEGIN
DROP TABLE IF EXISTS _x;
CREATE TEMPORARY TABLE _x (
schema_name text,
table_name text,
column_name text,
found text
);
FOR rec1 IN
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_name <> '_x'
AND UPPER(column_name) LIKE UPPER('%%')
AND table_schema <> 'pg_catalog'
AND table_schema <> 'information_schema'
AND data_type IN ('character varying', 'text', 'character', 'char', 'varchar')
LOOP
sql := concat('SELECT ', rec1."column_name", ' AS "found" FROM ',rec1."table_schema" , '.',rec1."table_name" , ' WHERE UPPER(',rec1."column_name" , ') LIKE UPPER(''','%my_substring_to_find_goes_here%' , ''')');
RAISE NOTICE '%', sql;
BEGIN
FOR rec2 IN EXECUTE sql LOOP
RAISE NOTICE '%', sql;
INSERT INTO _x VALUES (rec1."table_schema", rec1."table_name", rec1."column_name", rec2."found");
END LOOP;
EXCEPTION WHEN OTHERS THEN
END;
END LOOP;
END; $$;
SELECT * FROM _x;
If you're using IntelliJ, add your DB to the Database view, then right-click on the database and select full text search. It will list all tables and all fields that contain your specific text.
In case it helps someone: here is @Daniel Vérité's function with an additional parameter that accepts the names of the columns to search in. This reduces the processing time; in my tests it reduced it considerably. Note that this version uses EXECUTE ... INTO, so it returns at most one ctid per matching column.
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_columns name[] default '{}',
haystack_tables name[] default '{}',
haystack_schema name[] default '{public}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND (c.column_name=ANY(haystack_columns) OR haystack_columns='{}')
AND t.table_type='BASE TABLE'
LOOP
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
schemaname,
tablename,
columnname,
needle
) INTO rowctid;
IF rowctid is not null THEN
RETURN NEXT;
END IF;
END LOOP;
END;
$$ language plpgsql;
Below is an example of how to use the search_columns function created above.
SELECT * FROM search_columns('86192700'
, array(SELECT DISTINCT a.column_name::name FROM information_schema.columns AS a
INNER JOIN information_schema.tables as b ON (b.table_catalog = a.table_catalog AND b.table_schema = a.table_schema AND b.table_name = a.table_name)
WHERE
a.column_name iLIKE '%cep%'
AND b.table_type = 'BASE TABLE'
AND b.table_schema = 'public'
)
, array(SELECT b.table_name::name FROM information_schema.columns AS a
INNER JOIN information_schema.tables as b ON (b.table_catalog = a.table_catalog AND b.table_schema = a.table_schema AND b.table_name = a.table_name)
WHERE
a.column_name iLIKE '%cep%'
AND b.table_type = 'BASE TABLE'
AND b.table_schema = 'public')
);
Here's @Daniel Vérité's function with progress reporting functionality.
It reports progress in three ways:
by RAISE NOTICE;
by decreasing the value of the supplied {progress_seq} sequence from {total number of columns to search in} down to 0;
by writing the progress, along with the tables found so far, into a text file located at c:\windows\temp\{progress_seq}.txt.
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_tables name[] default '{}',
haystack_schema name[] default '{public}',
progress_seq text default NULL
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
DECLARE
currenttable text;
columnscount integer;
foundintables text[];
foundincolumns text[];
begin
currenttable='';
columnscount = (SELECT count(1)
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND t.table_type='BASE TABLE')::integer;
PERFORM setval(progress_seq::regclass, columnscount);
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND t.table_type='BASE TABLE'
LOOP
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
schemaname,
tablename,
columnname,
needle
) INTO rowctid;
IF rowctid is not null THEN
RETURN NEXT;
foundintables = foundintables || tablename;
foundincolumns = foundincolumns || columnname;
RAISE NOTICE 'FOUND! %, %, %, %', schemaname,tablename,columnname, rowctid;
END IF;
IF (progress_seq IS NOT NULL) THEN
PERFORM nextval(progress_seq::regclass);
END IF;
IF(currenttable<>tablename) THEN
currenttable=tablename;
IF (progress_seq IS NOT NULL) THEN
RAISE NOTICE 'Columns left to look in: %; looking in table: %', currval(progress_seq::regclass), tablename;
EXECUTE 'COPY (SELECT unnest(string_to_array(''Current table (column ' || columnscount-currval(progress_seq::regclass) || ' of ' || columnscount || '): ' || tablename || '\n\nFound in tables/columns:\n' || COALESCE(
(SELECT string_agg(c1 || '/' || c2, '\n') FROM (SELECT unnest(foundintables) AS c1,unnest(foundincolumns) AS c2) AS t1)
, '') || ''',''\n''))) TO ''c:\WINDOWS\temp\' || progress_seq || '.txt''';
END IF;
END IF;
END LOOP;
END;
$$ language plpgsql;
-- The function below lists all the tables in the database that contain a specific string
select TablesCount('StringToSearch');
-- Iterates through all the tables in the database
CREATE OR REPLACE FUNCTION TablesCount(_searchText TEXT)
RETURNS text AS
$$ -- here start procedural part
DECLARE _tname text;
DECLARE cnt int;
BEGIN
FOR _tname IN SELECT table_name FROM information_schema.tables where table_schema='public' and table_type='BASE TABLE' LOOP
cnt= getMatchingCount(_tname,Columnames(_tname,_searchText));
RAISE NOTICE 'Count% ', CONCAT(' ',cnt,' Table name: ', _tname);
END LOOP;
RETURN _tname;
END;
$$ -- here finish procedural part
LANGUAGE plpgsql; -- language specification
-- Returns the number of rows in the given table for which the condition is met.
-- For example, if the search text occurs in any of the fields of the table,
-- the count will be greater than 0. The notices appear
-- in the Messages section of the result viewer.
CREATE OR REPLACE FUNCTION getMatchingCount(_tname TEXT, _clause TEXT)
RETURNS int AS
$$
Declare outpt text;
BEGIN
EXECUTE 'Select Count(*) from '||_tname||' where '|| _clause
INTO outpt;
RETURN outpt;
END;
$$ LANGUAGE plpgsql;
-- Gets the fields of a table and builds the WHERE clause from all of its columns.
CREATE OR REPLACE FUNCTION Columnames(_tname text,st text)
RETURNS text AS
$$ -- here start procedural part
DECLARE
_name text;
_helper text;
BEGIN
FOR _name IN SELECT column_name FROM information_schema.Columns WHERE table_name =_tname LOOP
_name=CONCAT('CAST(',_name,' as VarChar)',' like ','''%',st,'%''', ' OR ');
_helper= CONCAT(_helper,_name,' ');
END LOOP;
RETURN CONCAT(_helper, ' 1=2');
END;
$$ -- here finish procedural part
LANGUAGE plpgsql; -- language specification
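The three functions above concatenate table and column names into the SQL string without quoting them. The same WHERE-clause construction can be sketched with format(), where %I quotes identifiers and %L quotes the search value. The function name build_like_clause is hypothetical, and the schema is assumed to be public.

```sql
CREATE OR REPLACE FUNCTION build_like_clause(_tname text, _search text)
  RETURNS text
  LANGUAGE sql STABLE AS
$func$
SELECT coalesce(
          string_agg(
             format('CAST(%I AS text) LIKE %L'
                  , c.column_name, '%' || _search || '%')
           , ' OR ' ORDER BY c.ordinal_position)
        , 'false')   -- no columns found: a clause that matches nothing
FROM   information_schema.columns c
WHERE  c.table_schema = 'public'
AND    c.table_name   = _tname;
$func$;

-- Hypothetical call; prints the generated clause for inspection
SELECT build_like_clause('countries', 'United');
```

The result is one LIKE condition per column joined with OR, so no trailing "1=2" is needed to close the expression.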

Array error passing dynamic number of parameters to function

I'm trying to create a function that receives the name of an existing table in my schema and several column names of that table (a dynamic number of columns), and returns a table with a single column that holds the values of all those columns, separated by commas.
I'm trying this:
CREATE OR REPLACE PROCEDURE public.matching(IN table text, VARIADIC column_names text[])
LANGUAGE 'plpgsql'
AS $BODY$DECLARE
column_text text;
BEGIN
EXECUTE format ($$ SELECT array_to_string(%s, ' ')$$, column_names) into column_text;
EXECUTE format ($$ CREATE TABLE temp1 AS
SELECT concat(%s, ' ') FROM %s $$, column_text, table);
END;$BODY$;
This returns an error:
ERROR: syntax error at or near «{»
LINE 1: SELECT array_to_string({city,address}, ' ')
What is the error?
If you simplify the generation of the dynamic SQL, things get easier:
CREATE OR REPLACE PROCEDURE public.matching(IN table_name text, VARIADIC column_names text[])
LANGUAGE plpgsql
AS
$BODY$
DECLARE
l_sql text;
BEGIN
l_sql := format($s$
create table temp1 as
select concat_ws(',', %s) as everything
from %I
$s$, array_to_string(column_names, ','), table_name);
raise notice 'Running %', l_sql;
EXECUTE l_sql;
END;
$BODY$;
So if you e.g. pass in 'some_table' and {'one', 'two', 'three'} the generated SQL will look like this:
create table temp1 as select concat_ws(',', one,two,three) as everything from some_table
I also used a column alias for the new column, so that the column in the new table has a defined name. Note that the way I put the column names into the SQL string won't properly deal with identifiers that need double quotes (but those should be avoided anyway).
If you want to "return a table", then maybe a function might be the better solution:
CREATE OR REPLACE function matching(IN table_name text, VARIADIC column_names text[])
returns table (everything text)
LANGUAGE plpgsql
AS
$BODY$
DECLARE
l_sql text;
BEGIN
l_sql := format($s$
select concat_ws(',', %s) as everything
from %I
$s$, array_to_string(column_names, ','), table_name);
return query execute l_sql;
END;
$BODY$;
Then you can use it like this:
select *
from matching('some_table', 'one', 'two', 'three');
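If column names that need double quotes cannot be avoided, one possible approach is to quote each name with quote_ident() before joining them. The following is only a sketch under that assumption: the function name matching_quoted is made up, and WITH ORDINALITY requires PostgreSQL 9.4 or later.

```sql
CREATE OR REPLACE FUNCTION matching_quoted(table_name text, VARIADIC column_names text[])
  RETURNS TABLE (everything text)
  LANGUAGE plpgsql AS
$BODY$
DECLARE
   l_cols text;
BEGIN
   -- Quote every column name individually and keep the order of the arguments
   SELECT string_agg(quote_ident(u.c), ', ' ORDER BY u.ord)
   INTO   l_cols
   FROM   unnest(column_names) WITH ORDINALITY AS u(c, ord);

   RETURN QUERY EXECUTE format(
      $q$SELECT concat_ws(',', %s) FROM %I$q$, l_cols, table_name);
END
$BODY$;

-- Hypothetical call with a column name that contains a space
SELECT * FROM matching_quoted('some_table', 'one', 'Two Words');
```

For this call the generated statement would be SELECT concat_ws(',', one, "Two Words") FROM some_table.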
I propose different but similar code, using the following script:
CREATE OR REPLACE PROCEDURE public.test(IN p_old_table text, IN p_old_column_names text[], IN p_new_table text)
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
old_column_list text;
ctas_stmt text;
BEGIN
old_column_list = array_to_string(p_old_column_names,',');
RAISE NOTICE 'old_column_list=%', old_column_list;
ctas_stmt = format('CREATE TABLE %s AS SELECT %s from %s', p_new_table, old_column_list, p_old_table);
RAISE NOTICE 'ctas_stmt=%', ctas_stmt;
EXECUTE ctas_stmt;
END;
$BODY$;
--
create table t(x int, y text, z timestamp, z1 text);
insert into t values (1, 'OK', current_timestamp, null);
select * from t;
--
call test('t',ARRAY['x','y','z'], 'tmp');
--
\d tmp;
select * from tmp;
Running it gives the following output:
CREATE OR REPLACE PROCEDURE public.test(IN p_old_table text, IN p_old_column_names text[], IN p_new_table text)
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
old_column_list text;
ctas_stmt text;
BEGIN
old_column_list = array_to_string(p_old_column_names,',');
RAISE NOTICE 'old_column_list=%', old_column_list;
ctas_stmt = format('CREATE TABLE %s AS SELECT %s from %s', p_new_table, old_column_list, p_old_table);
RAISE NOTICE 'ctas_stmt=%', ctas_stmt;
EXECUTE ctas_stmt;
END;
$BODY$;
CREATE PROCEDURE
create table t(x int, y text, z timestamp, z1 text);
CREATE TABLE
insert into t values (1, 'OK', current_timestamp, null);
INSERT 0 1
select * from t;
x | y | z | z1
---+----+----------------------------+----
1 | OK | 2020-04-14 11:37:28.641328 |
(1 row)
call test('t',ARRAY['x','y','z'], 'tmp');
psql:tvar.sql:24: NOTICE: old_column_list=x,y,z
psql:tvar.sql:24: NOTICE: ctas_stmt=CREATE TABLE tmp AS SELECT x,y,z from t
CALL
Table "public.tmp"
Column | Type | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
x | integer | | |
y | text | | |
z | timestamp without time zone | | |
select * from tmp;
x | y | z
---+----+----------------------------
1 | OK | 2020-04-14 11:37:28.641328
(1 row)

Using variables in a plpgsql function

OK, so I used string_agg like this:
select string_agg(DISTINCT first_name,', ' ORDER BY first_name) FROM person_test;
Then I wrote this to return the matching rows from the table:
SELECT *
FROM person_test
where first_name = ANY(string_to_array('Aaron,Anne', ','));
Now I want to put this in a function so that, instead of actually typing names into string_to_array, I can just call string_agg.
I am new to Postgres and am not finding any good documentation online on how to do this. I believe I would have to declare the string_agg and then call it in string_to_array, but I am having no luck.
This was my attempt. I know it is not right, but I would appreciate some feedback. I am getting an error between results and ALIAS, and on the return.
create or REPLACE FUNCTION select_persons(VARIADIC names TEXT[]);
declare results ALIAS select string_agg(DISTINCT first_name,', ' ORDER BY first_name) FROM person_test;
BEGIN
return setof person_test LANGUAGE sql as $$
select * from person_test
where first_name = any(results)
end;
$$ language sql;
You can create a function with variable number of arguments.
Example:
create table person_test (id int, first_name text);
insert into person_test values
(1, 'Ann'), (2, 'Bob'), (3, 'Ben');
create or replace function select_persons(variadic names text[])
returns setof person_test language sql as $$
select *
from person_test
where first_name = any(names)
$$;
select * from select_persons('Ann');
id | first_name
----+------------
1 | Ann
(1 row)
select * from select_persons('Ann', 'Ben', 'Bob');
id | first_name
----+------------
1 | Ann
2 | Bob
3 | Ben
(3 rows)
To use a variable inside a plpgsql function, you should declare the variable and use select ... into (or assignment statement). Example:
create or replace function my_func()
returns setof person_test
language plpgsql as $$
declare
aggregated_names text;
begin
select string_agg(distinct first_name,', ' order by first_name)
into aggregated_names
from person_test;
-- here you can do something using aggregated_names
return query
select *
from person_test
where first_name = any(string_to_array(aggregated_names, ', '));
end $$;
select * from my_func();
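A variation on the function above keeps the names in an array variable instead of a delimited string, which avoids the round trip through string_to_array and cannot break on names that contain the delimiter. The names my_func_array and name_list are assumptions made for this sketch.

```sql
create or replace function my_func_array()
returns setof person_test
language plpgsql as $$
declare
    name_list text[];   -- holds the aggregated names as an array
begin
    select array_agg(distinct first_name order by first_name)
    into name_list
    from person_test;

    -- the array variable can be used directly with any()
    return query
    select *
    from person_test
    where first_name = any(name_list);
end $$;

select * from my_func_array();
```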

Get IDs from multiple columns in multiple tables as one set or array

I have multiple tables, each with two columns of interest: connection_node_start_id and connection_node_end_id. My goal is to get a collection of all those IDs, either as a flat ARRAY or as a new TABLE consisting of one column.
Example output ARRAY:
result = {1,4,7,9,2,5}
Example output TABLE:
IDS
-------
1
4
7
9
2
5
My first attempt is somewhat clumsy and does not work properly, as the SELECT statement returns just one row. It seems there must be a simple way to do this. Can someone point me in the right direction?
CREATE OR REPLACE FUNCTION get_connection_nodes(anyarray)
RETURNS anyarray AS
$$
DECLARE
table_name varchar;
result integer[];
sel integer[];
BEGIN
FOREACH table_name IN ARRAY $1
LOOP
RAISE NOTICE 'table_name(%)',table_name;
EXECUTE 'SELECT ARRAY[connection_node_end_id,
connection_node_start_id] FROM ' || table_name INTO sel;
RAISE NOTICE 'sel(%)',sel;
result := array_cat(result, sel);
END LOOP;
RETURN result;
END
$$
LANGUAGE 'plpgsql';
Test table:
connection_node_start_id | connection_node_end_id
--------------------------------------------------
1 | 4
7 | 9
Call:
SELECT get_connection_nodes(ARRAY['test_table']);
Result:
{1,4} -- only 1st row, rest is missing
For Postgres 9.3+
CREATE OR REPLACE FUNCTION get_connection_nodes(text[])
RETURNS TABLE (ids int) AS
$func$
DECLARE
_tbl text;
BEGIN
FOREACH _tbl IN ARRAY $1
LOOP
RETURN QUERY EXECUTE format('
SELECT t.id
FROM %I, LATERAL (VALUES (connection_node_start_id)
, (connection_node_end_id)) t(id)'
, _tbl);
END LOOP;
END
$func$ LANGUAGE plpgsql;
Related answer on dba.SE:
SELECT DISTINCT on multiple columns
Or drop the loop and concatenate a single query. Probably fastest:
CREATE OR REPLACE FUNCTION get_connection_nodes2(text[])
RETURNS TABLE (ids int) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT string_agg(format(
'SELECT t.id FROM %I, LATERAL (VALUES (connection_node_start_id)
, (connection_node_end_id)) t(id)'
, tbl), ' UNION ALL ')
FROM unnest($1) tbl
);
END
$func$ LANGUAGE plpgsql;
Related:
Loop through like tables in a schema
LATERAL was introduced with Postgres 9.3.
For older Postgres
You can use the set-returning function unnest() in the SELECT list, too:
CREATE OR REPLACE FUNCTION get_connection_nodes2(text[])
RETURNS TABLE (ids int) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT string_agg(
'SELECT unnest(ARRAY[connection_node_start_id
, connection_node_end_id]) FROM ' || tbl
, ' UNION ALL '
)
FROM (SELECT quote_ident(tbl) AS tbl FROM unnest($1) tbl) t
);
END
$func$ LANGUAGE plpgsql;
Should work with pg 8.4+ (or maybe even older). Works with current Postgres (9.4) as well, but LATERAL is much cleaner.
Or make it very simple:
CREATE OR REPLACE FUNCTION get_connection_nodes3(text[])
RETURNS TABLE (ids int) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT string_agg(format(
'SELECT connection_node_start_id FROM %1$I
UNION ALL
SELECT connection_node_end_id FROM %1$I'
, tbl), ' UNION ALL ')
FROM unnest($1) tbl
);
END
$func$ LANGUAGE plpgsql;
format() was introduced with pg 9.1.
Might be a bit slower with big tables because each table is scanned once for every column (so 2 times here). Sort order in the result is different, too - but that does not seem to matter for you.
Be sure to sanitize / escape identifiers to defend against SQL injection and other illegal syntax. Details:
Table name as a PostgreSQL function parameter
The EXECUTE ... INTO statement can only return data from a single row:
If multiple rows are returned, only the first will be assigned to the INTO variable.
In order to concatenate values from all rows you have to aggregate them first by column and then append the arrays:
EXECUTE 'SELECT array_agg(connection_node_end_id) ||
array_agg(connection_node_start_id) FROM ' || table_name INTO sel;
You're probably looking for something like this:
CREATE OR REPLACE FUNCTION d (tblname TEXT [])
RETURNS TABLE (c INTEGER) AS $$
DECLARE sql TEXT;
BEGIN
WITH x
AS (SELECT unnest(tblname) AS tbl),
y AS (
SELECT FORMAT('
SELECT connection_node_end_id
FROM %s
UNION ALL
SELECT connection_node_start_id
FROM %s
', tbl, tbl) AS s
FROM x)
SELECT string_agg(s, ' UNION ALL ')
INTO sql
FROM y;
RETURN QUERY EXECUTE sql;
END;$$
LANGUAGE plpgsql;
CREATE TABLE a (connection_node_end_id INTEGER, connection_node_start_id INTEGER);
INSERT INTO A VALUES (1,2);
CREATE TABLE b (connection_node_end_id INTEGER, connection_node_start_id INTEGER);
INSERT INTO B VALUES (100, 101);
SELECT * from d(array['a','b']);
c
-----
1
2
100
101
(4 rows)

PostgreSQL 9.3: missing FROM-clause entry for table

I have a table with two columns.
Example:
create table t1
(
cola varchar,
colb varchar
);
Now I want to insert rows into it from a function.
The function takes two parameters of type varchar, each holding a comma-separated string of the values to insert into the table above.
Parameters:
cola varchar = 'a,b,c,d';
colb varchar = 'e,f,g,h';
The values in the parameters above have to be inserted into the table like this:
cola colb
----------------
a e
b f
c g
d h
My try:
create or replace function fun_t1(cola varchar,colb varchar)
returns void as
$body$
Declare
v_Count integer;
v_i integer = 0;
v_f1 text;
v_cola varchar;
v_colb varchar;
v_query varchar;
Begin
drop table if exists temp_table;
create temp table temp_table
(
cola varchar,
colb varchar
);
v_Count := length(cola) - length(replace(cola, ',', ''));
raise info '%',v_Count;
WHILE(v_i<=v_Count) LOOP
INSERT INTO temp_table
SELECT LEFT(cola,CHARINDEX(',',cola||',',0)-1)
,LEFT(colb,CHARINDEX(',',colb||',',0)-1);
cola := overlay(cola placing '' from 1 for CHARINDEX(',',cola,0));
colb := overlay(colb placing '' from 1 for CHARINDEX(',',colb,0));
v_i := v_i + 1;
END LOOP;
for v_f1 IN select * from temp_table loop
v_cola := v_f1.cola; /* Error occurred here */
v_colb := v_f1.colb;
v_query := 'INSERT INTO t1 values('''||v_cola||''','''||v_colb||''')';
execute v_query;
end loop;
end;
$body$
language plpgsql;
Note: The function uses temp_table because it is required for other purposes in the function, which I have not shown here.
Calling function:
SELECT fun_t1('a,b,c','d,e,f');
Getting an error:
missing FROM-clause entry for table "v_f1"
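The error message comes from the declaration v_f1 text. Because v_f1 is a scalar variable, PL/pgSQL does not treat v_f1.cola as a field of a variable, and the SQL parser then looks for a table named v_f1. Declaring the loop variable as record avoids this. A minimal sketch, assuming the table t1 from the question exists; the variable name v_row is made up:

```sql
DO $$
DECLARE
   v_row record;   -- record instead of text, so v_row.cola is a valid field reference
BEGIN
   FOR v_row IN SELECT cola, colb FROM t1 LOOP
      RAISE NOTICE 'cola=%, colb=%', v_row.cola, v_row.colb;
   END LOOP;
END
$$;
```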
Try it this way, using split_part():
create or replace function ins_t1(vala varchar,valb varchar,row_cnt int) returns void as
$$
BEGIN
FOR i IN 1..row_cnt LOOP -- row_cnt is the number rows you need to insert (ex. 4 or 5 or whatever it is)
insert into t1 (cola,colb)
values (
(select split_part(vala,',',i))
,(select split_part(valb,',',i))
);
END LOOP;
END;
$$
language plpgsql
Function call: select ins_t1('a,b,c,d','e,f,g,h',4);
As mike-sherrill-cat-recall said in his answer, you can also use regexp_split_to_table:
create or replace function fn_t1(vala varchar,valb varchar) returns void
as
$$
insert into t1 (cola, colb)
select col1, col2 from (select
trim(regexp_split_to_table(vala, ',')) col1,
trim(regexp_split_to_table(valb, ',')) col2)t;
$$
language sql
Function call: select fn_t1('a,b,c,d','e,f,g,h');
If there's no compelling reason to use a function for this, you can just split the text using a regular expression. Here I've expressed your arguments as a common table expression, but that's just for convenience.
with data (col1, col2) as (
select 'a, b, c, d'::text, 'e, f, g, h'::text
)
select
trim(regexp_split_to_table(col1, ',')) as col_a,
trim(regexp_split_to_table(col2, ',')) as col_b
from data;
col_a col_b
--
a e
b f
c g
d h
If there is a compelling reason to use a function, just wrap a function definition around that SELECT statement.
create function strings_to_table(varchar, varchar)
returns table (col_a varchar, col_b varchar)
as
'select trim(regexp_split_to_table($1, '','')),
trim(regexp_split_to_table($2, '',''));'
language sql
stable
returns null on null input;
select * from strings_to_table('a,b,c,d', 'e,f, g, h');
col_a col_b
--
a e
b f
c g
d h
My personal preference is usually to build functions like this to return tables rather than inserting into tables. To insert, I'd usually write a SQL statement like this.
insert into foo (col_a, col_b)
select col_a, col_b from strings_to_table('a,b,c,d', 'e,f,g,h');
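On PostgreSQL 9.4 or later, the same pairing can also be sketched without regular expressions by passing both arrays to a single unnest() call in the FROM clause. This is an alternative illustration, not part of the answer above. Two set-returning functions in the SELECT list only line up reliably when both lists have the same length (their behaviour for unequal lengths changed in PostgreSQL 10), whereas the multi-argument unnest() pads the shorter array with NULL.

```sql
SELECT trim(t.col_a) AS col_a
     , trim(t.col_b) AS col_b
FROM   unnest(string_to_array('a,b,c,d', ',')
            , string_to_array('e,f, g, h', ',')) AS t(col_a, col_b);
```

This returns the pairs (a, e), (b, f), (c, g) and (d, h), with trim() removing the stray spaces around g and h.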
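The simplest way is to use PL/Python for this. (plpythonu is the Python 2 variant; on recent PostgreSQL releases, which no longer support Python 2, the language would have to be plpython3u.)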
create or replace function fill_t1(cola varchar, colb varchar) returns void as $$
for r in zip(cola.split(','), colb.split(',')):
plpy.execute(plpy.prepare('insert into t1(cola, colb) values ($1, $2)', ['varchar', 'varchar']), [r[0], r[1]])
$$ language plpythonu;
The result:
# create table t1 (cola varchar, colb varchar);
CREATE TABLE
# select fill_t1('1,2,3', '4,5,6');
fill_t1
---------
(1 row)
# select * from t1;
cola | colb
------+------
1 | 4
2 | 5
3 | 6
(3 rows)
You can read about Python zip function here: https://docs.python.org/2/library/functions.html#zip