plpgsql return column name and column stats as a table - plpgsql

I am trying to use PL/pgSQL in a Postgres database to:
- loop through all columns in a schema and, for every column that is double precision, return a table containing the fields 'table name, column name, min value, max value, mean value, median value'.
- So far I have managed to return the min, max and mean of these columns, but not as a table, despite defining the table in the RETURNS clause.
Question:
How do I return a table properly?
How do I include the column name and table name for the columns within this table? I have tried many things and got a full range of error messages.
Becky
DROP FUNCTION household.numeric_stats(schemanm text);
CREATE OR REPLACE FUNCTION household.numeric_stats(schemanm text)
returns table(min double precision, max double precision, avg double precision)as $$
DECLARE
cname text;
tname text;
BEGIN
for cname,tname in SELECT column_name::text col,table_name::text tble FROM information_schema.columns
where table_schema = schemanm and data_type in ('double precision')
and table_name::text not in ('ap_household','derived_forest_income', 'derived_product_income','derivedproduct_income','view_income_overview_by_household')
LOOP
RAISE NOTICE 'cname is: % from %', cname, tname;
return query
execute format('select min(%I), max(%I), avg(%I) from %I.%I where %I != ''NaN''', cname, cname, cname, schemanm, tname, cname);
END;
$$
LANGUAGE plpgsql;

Would work like this:
CREATE OR REPLACE FUNCTION household.numeric_stats(schemanm text)
RETURNS TABLE(tname text, cname text
, min float8, max float8
, avg float8, median float8) AS
$func$
BEGIN
FOR tname, cname IN
SELECT table_name::text, column_name::text
FROM information_schema.columns
WHERE table_schema = schemanm
AND data_type = 'double precision'
AND table_name <> ALL ('{ap_household,derived_forest_income, derived_product_income,derivedproduct_income,view_income_overview_by_household}'::varchar[])
LOOP
-- RAISE NOTICE 'tname: %, cname: %', tname, cname;
RETURN QUERY EXECUTE format(
$f$SELECT $1, $2, min(%1$I), max(%1$I), avg(%1$I), median(%1$I)
FROM %2$I.%3$I
WHERE %1$I <> 'NaN'$f$, cname, schemanm, tname)
USING tname, cname;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM household.numeric_stats('myschema');
You need to create the median aggregate function before you can use it. Instructions are in the Postgres Wiki:
https://wiki.postgresql.org/wiki/Aggregate_Median
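As an alternative sketch, assuming Postgres 9.4 or later, the built-in ordered-set aggregate percentile_cont() can compute a median without creating a custom aggregate. The inline VALUES list below is made-up sample data, used only to keep the example self-contained:

```sql
-- Median via the built-in ordered-set aggregate (Postgres 9.4+).
-- NaN values are filtered out, as in the function above.
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v) AS median
FROM  (VALUES (1.0::float8), (2.0), (10.0)) AS t(v)  -- sample data
WHERE  v <> 'NaN';
```

Inside the format() call, median(%1$I) could then be swapped for percentile_cont(0.5) WITHIN GROUP (ORDER BY %1$I), assuming the server version allows it.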

Related

PostgreSQL: function to display columns in alphabetical order

On PostgreSQL, I need to see the table's columns in alphabetical order, so I'm using the query:
SELECT column_name, data_type FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'organizations' ORDER BY column_name ASC;
I use it a lot every day, so I want to create a function:
CREATE OR REPLACE FUNCTION seecols(table_name text)
RETURNS TABLE (column_name varchar, data_type varchar)
AS $func$
DECLARE
_query varchar;
BEGIN
-- Displays columns by alphabetic order
_query := 'SELECT column_name, data_type FROM information_schema.columns WHERE table_name = '''||table_name||''' ';
RETURN QUERY EXECUTE _query;
END;
$func$ LANGUAGE plpgsql;
But when I try:
SELECT seecols('organizations');
I'm getting:
**structure of query does not match function result type**
I guess the line "RETURNS TABLE (column_name varchar, data_type varchar)" is wrongly defined. But since this is my first time using plpgsql, I don't know how to make it more dynamic.
You need neither dynamic SQL nor PL/pgSQL here. Just embed your SQL query in an SQL function:
CREATE OR REPLACE FUNCTION seecols (IN t_name text, OUT column_name varchar, OUT data_type varchar)
RETURNS setof record LANGUAGE sql AS $$
SELECT column_name, data_type
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = t_name
ORDER BY column_name ASC ;
$$ ;
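If a PL/pgSQL version is still wanted, one possible sketch is below. It assumes the error comes from the information_schema column types not matching the declared varchar result columns, so each output column is cast explicitly. The function name seecols_plpgsql and the parameter name _tbl are hypothetical:

```sql
CREATE OR REPLACE FUNCTION seecols_plpgsql(_tbl text)
RETURNS TABLE (column_name varchar, data_type varchar)
AS $func$
BEGIN
-- Cast to the declared result types; qualify columns with the alias c
-- so they do not clash with the output column names.
RETURN QUERY
SELECT c.column_name::varchar, c.data_type::varchar
FROM information_schema.columns c
WHERE c.table_name::text = _tbl
ORDER BY c.column_name;
END
$func$ LANGUAGE plpgsql;
```

Call it with SELECT * FROM seecols_plpgsql('organizations'); to get separate columns rather than a single record column.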

PL/PGSQL Operator does not exist: information_schema.sql_identifier

I'm writing a function to write a dynamic query.
This is the original query without function
SELECT b.column_name, a.default_flag, CAST(AVG(a.payment_ratio) AS NUMERIC), CAST(MAX(a.payment_ratio) AS NUMERIC) FROM user_joined a, information_schema.columns b where b.column_name = 'payment_ratio' group by a.default_flag, b.column_name
Then, I put it into a function like this
CREATE OR REPLACE FUNCTION test4(col text)
RETURNS TABLE(
col_name TEXT,
default_flag bigint,
average NUMERIC,
maximum NUMERIC) AS $$
BEGIN
RETURN QUERY EXECUTE FORMAT
('SELECT CAST(b.column_name AS TEXT), a.default_flag, CAST(AVG(a.'||col||') AS NUMERIC), CAST(MAX(a.'||col||') AS NUMERIC) FROM user_joined a, information_schema.columns b where b.column_name = %I group by a.default_flag, b.column_name', col);
END; $$
LANGUAGE PLPGSQL;
When I try to run
SELECT * FROM test4('payment_ratio')
I get this error
ERROR: operator does not exist: information_schema.sql_identifier = double precision
LINE 1: ... information_schema.columns b where b.column_name = payment_...
Is there anything wrong with my function?
The columns in information_schema have the (somewhat strange) data type sql_identifier and that can't be compared directly to a text value. You need to cast it in the SQL query.
You are also using the %I incorrectly. In the join condition the column name is a string constant so you need to use %L there. In the SELECT list, it's an identifier, so you need to use %I there.
CREATE OR REPLACE FUNCTION test4(col text)
RETURNS TABLE(
col_name TEXT,
default_flag bigint,
average NUMERIC,
maximum NUMERIC) AS $$
BEGIN
RETURN QUERY EXECUTE
FORMAT ('SELECT CAST(b.column_name AS TEXT),
a.default_flag, CAST(AVG(a.%I) AS NUMERIC),
CAST(MAX(a.%I) AS NUMERIC)
FROM user_joined a
JOIN information_schema.columns b ON b.column_name::text = %L
group by a.default_flag, b.column_name', col, col, col);
END; $$
LANGUAGE PLPGSQL;
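To illustrate the difference between the two format specifiers, here is a minimal sketch that can be run on its own; the input strings are only examples:

```sql
-- %I formats the argument as an SQL identifier (quoted only when needed),
-- %L formats it as a quoted string literal.
SELECT format('%I', 'payment_ratio') AS as_identifier,     -- payment_ratio
       format('%I', 'Payment Ratio') AS quoted_identifier, -- "Payment Ratio"
       format('%L', 'payment_ratio') AS as_literal;        -- 'payment_ratio'
```

This is why %I in the comparison produced a column reference (and the double precision type in the error message) instead of a string constant.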

How to do postgresql select query function using parameter?

I want to create a PostgreSQL function that returns records. If I pass an id parameter, it should be added to the WHERE clause. If I do not pass the id parameter, or pass null, no WHERE clause should be added to the query.
CREATE OR REPLACE FUNCTION my_func(id integer)
RETURNS TABLE (type varchar, total bigint) AS $$
DECLARE where_clause VARCHAR(200);
BEGIN
IF id IS NOT NULL THEN
where_clause = ' group_id= ' || id;
END IF ;
RETURN QUERY SELECT
type,
count(*) AS total
FROM
table1
WHERE
where_clause ???
GROUP BY
type
ORDER BY
type;
END
$$
LANGUAGE plpgsql;
You can either use one condition that takes care of both situations (then you don't need PL/pgSQL to begin with):
CREATE OR REPLACE FUNCTION my_func(p_id integer)
RETURNS TABLE (type varchar, total bigint)
AS $$
SELECT type,
count(*) AS total
FROM table1
WHERE p_id is null or group_id = p_id
GROUP BY type
ORDER BY type;
$$
LANGUAGE sql;
But an OR condition like that is typically not good for performance. The second option is to simply run two different statements:
CREATE OR REPLACE FUNCTION my_func(p_id integer)
RETURNS TABLE (type varchar, total bigint)
AS $$
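begin
-- Columns are qualified with the alias t because the output column "type"
-- would otherwise be ambiguous with the table column inside PL/pgSQL.
if (p_id is null) then
return query
SELECT t.type,
count(*) AS total
FROM table1 t
GROUP BY t.type
ORDER BY t.type;
else
return query
SELECT t.type,
count(*) AS total
FROM table1 t
WHERE t.group_id = p_id
GROUP BY t.type
ORDER BY t.type;
end if;
END
$$
LANGUAGE plpgsql;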
And finally you can build a dynamic SQL string depending on the parameter:
CREATE OR REPLACE FUNCTION my_func(p_id integer)
RETURNS TABLE (type varchar, total bigint)
AS $$
declare
l_sql text;
begin
l_sql := 'SELECT type, count(*) AS total FROM table1 ';
if (p_id is not null) then
l_sql := l_sql || ' WHERE group_id = '||p_id;
end if;
l_sql := l_sql || ' GROUP BY type ORDER BY type';
return query execute l_sql;
end;
$$
LANGUAGE plpgsql;
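A variant of the dynamic approach, sketched here as an assumption rather than as part of the original answer, passes the value through EXECUTE ... USING instead of concatenating it into the string. It reuses table1, type and group_id from the question:

```sql
CREATE OR REPLACE FUNCTION my_func(p_id integer)
RETURNS TABLE (type varchar, total bigint)
AS $$
declare
l_sql text := 'SELECT type, count(*) AS total FROM table1';
begin
if p_id is not null then
-- $1 is a placeholder filled in by the USING clause below
l_sql := l_sql || ' WHERE group_id = $1';
end if;
l_sql := l_sql || ' GROUP BY type ORDER BY type';
-- passing p_id is harmless when the query does not reference $1
return query execute l_sql using p_id;
end;
$$
LANGUAGE plpgsql;
```

Passing the value as a parameter keeps it out of the SQL text, so no quoting or escaping of the value is needed.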
Nothing special is required; just use the variable as it is. For more information, please refer to: plpgsql function parameters

Get IDs from multiple columns in multiple tables as one set or array

I have multiple tables, each with two columns of interest: connection_node_start_id and connection_node_end_id. My goal is to get a collection of all those IDs, either as a flat ARRAY or as a new TABLE consisting of one column.
Example output ARRAY:
result = {1,4,7,9,2,5}
Example output TABLE:
IDS
-------
1
4
7
9
2
5
My first attempt is somewhat clumsy and does not work properly, as the SELECT statement returns just one row. It seems there must be a simple way to do this. Can someone point me in the right direction?
CREATE OR REPLACE FUNCTION get_connection_nodes(anyarray)
RETURNS anyarray AS
$$
DECLARE
table_name varchar;
result integer[];
sel integer[];
BEGIN
FOREACH table_name IN ARRAY $1
LOOP
RAISE NOTICE 'table_name(%)',table_name;
EXECUTE 'SELECT ARRAY[connection_node_end_id,
connection_node_start_id] FROM ' || table_name INTO sel;
RAISE NOTICE 'sel(%)',sel;
result := array_cat(result, sel);
END LOOP;
RETURN result;
END
$$
LANGUAGE 'plpgsql';
Test table:
connection_node_start_id | connection_node_end_id
--------------------------------------------------
1 | 4
7 | 9
Call:
SELECT get_connection_nodes(ARRAY['test_table']);
Result:
{1,4} -- only 1st row, rest is missing
For Postgres 9.3+
CREATE OR REPLACE FUNCTION get_connection_nodes(text[])
RETURNS TABLE (ids int) AS
$func$
DECLARE
_tbl text;
BEGIN
FOREACH _tbl IN ARRAY $1
LOOP
RETURN QUERY EXECUTE format('
SELECT t.id
FROM %I, LATERAL (VALUES (connection_node_start_id)
, (connection_node_end_id)) t(id)'
, _tbl);
END LOOP;
END
$func$ LANGUAGE plpgsql;
Related answer on dba.SE:
SELECT DISTINCT on multiple columns
Or drop the loop and concatenate a single query. Probably fastest:
CREATE OR REPLACE FUNCTION get_connection_nodes2(text[])
RETURNS TABLE (ids int) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT string_agg(format(
'SELECT t.id FROM %I, LATERAL (VALUES (connection_node_start_id)
, (connection_node_end_id)) t(id)'
, tbl), ' UNION ALL ')
FROM unnest($1) tbl
);
END
$func$ LANGUAGE plpgsql;
Related:
Loop through like tables in a schema
LATERAL was introduced with Postgres 9.3.
For older Postgres
You can use the set-returning function unnest() in the SELECT list, too:
CREATE OR REPLACE FUNCTION get_connection_nodes2(text[])
RETURNS TABLE (ids int) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT string_agg(
'SELECT unnest(ARRAY[connection_node_start_id
, connection_node_end_id]) FROM ' || tbl
, ' UNION ALL '
)
FROM (SELECT quote_ident(tbl) AS tbl FROM unnest($1) tbl) t
);
END
$func$ LANGUAGE plpgsql;
Should work with pg 8.4+ (or maybe even older). Works with current Postgres (9.4) as well, but LATERAL is much cleaner.
Or make it very simple:
CREATE OR REPLACE FUNCTION get_connection_nodes3(text[])
RETURNS TABLE (ids int) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT string_agg(format(
'SELECT connection_node_start_id FROM %1$I
UNION ALL
SELECT connection_node_end_id FROM %1$I'
, tbl), ' UNION ALL ')
FROM unnest($1) tbl
);
END
$func$ LANGUAGE plpgsql;
format() was introduced with pg 9.1.
Might be a bit slower with big tables because each table is scanned once for every column (so 2 times here). Sort order in the result is different, too - but that does not seem to matter for you.
Be sure to sanitize / escape identifiers to defend against SQL injection and other illegal syntax. Details:
Table name as a PostgreSQL function parameter
The EXECUTE ... INTO statement can only return data from a single row:
If multiple rows are returned, only the first will be assigned to the INTO variable.
In order to concatenate values from all rows you have to aggregate them first by column and then append the arrays:
EXECUTE 'SELECT array_agg(connection_node_end_id) ||
array_agg(connection_node_start_id) FROM ' || table_name INTO sel;
You're probably looking for something like this:
CREATE OR REPLACE FUNCTION d (tblname TEXT [])
RETURNS TABLE (c INTEGER) AS $$
DECLARE sql TEXT;
BEGIN
WITH x
AS (SELECT unnest(tblname) AS tbl),
y AS (
SELECT FORMAT('
SELECT connection_node_end_id
FROM %s
UNION ALL
SELECT connection_node_start_id
FROM %s
', tbl, tbl) AS s
FROM x)
SELECT string_agg(s, ' UNION ALL ')
INTO sql
FROM y;
RETURN QUERY EXECUTE sql;
END;$$
LANGUAGE plpgsql;
CREATE TABLE a (connection_node_end_id INTEGER, connection_node_start_id INTEGER);
INSERT INTO A VALUES (1,2);
CREATE TABLE b (connection_node_end_id INTEGER, connection_node_start_id INTEGER);
INSERT INTO B VALUES (100, 101);
SELECT * from d(array['a','b']);
c
-----
1
2
100
101
(4 rows)

PostgreSQL 9.3: Check only time from timestamp

I have the following table with one field of type timestamp.
Create table Test_Timestamp
(
ColumnA timestamp
);
Now inserting some records for demonstration:
INSERT INTO Test_Timestamp VALUES('1900-01-01 01:21:15'),
('1900-01-01 02:11:25'),
('1900-01-01 12:52:10'),
('1900-01-01 03:20:05');
Now I have created a function Function_Test with two parameters, St_Time and En_Time, both of type varchar, to which I pass only a time such as 00:00:01. The function should return the rows of the table whose time falls between the two time parameters.
CREATE OR REPLACE FUNCTION Function_Test
(
St_Time varchar,
En_Time varchar
)
RETURNS TABLE
(
columX timestamp
)
AS
$BODY$
Declare
sql varchar;
wher varchar;
BEGIN
wher := 'Where columna BETWEEN '|| to_char(cast(St_Time as time),'''HH24:MI:SS''') ||' AND '|| to_char(cast(En_Time as time),'''HH24:MI:SS''') ||'';
RAISE INFO '%',wher;
sql := 'SELECT * FROM Test_Timestamp ' || wher ;
RAISE INFO '%',sql;
RETURN QUERY EXECUTE sql;
END;
$BODY$
LANGUAGE PLPGSQL;
---Calling function
SELECT * FROM Function_Test('00:00:00','23:59:59');
But getting an error:
ERROR: invalid input syntax for type timestamp: "00:00:01"
LINE 1: ...ELECT * FROM Test_Timestamp where ColumnA BETWEEN '00:00:01'...
You can cast the column to a time: ColumnA::time
You should also not pass a time (or a date, or a timestamp) as a varchar. And you don't need dynamic SQL or a PL/pgSQL function for this:
CREATE OR REPLACE FUNCTION Function_Test(St_Time time, en_Time time)
RETURNS TABLE (columX timestamp)
AS
$BODY$
SELECT *
FROM Test_Timestamp
where columna::time between st_time and en_time;
$BODY$
LANGUAGE sql;
Call it like this:
select *
from Function_Test(time '03:00:00', time '21:10:42');
You can use extract to get the hour, minute and second:
http://www.postgresql.org/docs/9.3/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
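A minimal sketch of that approach against the Test_Timestamp table from the question; the hour range in the WHERE clause is an arbitrary example:

```sql
-- Pull the time parts out of the timestamp and filter on the hour only.
SELECT ColumnA,
       extract(hour   FROM ColumnA) AS hh,
       extract(minute FROM ColumnA) AS mi,
       extract(second FROM ColumnA) AS ss
FROM   Test_Timestamp
WHERE  extract(hour FROM ColumnA) BETWEEN 2 AND 11;
```

With the sample rows inserted above, this should keep only the 02:11:25 and 03:20:05 entries. For comparisons that include minutes and seconds, casting the column to time is probably simpler.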