Postgres Functions: Getting the Return Table Column Details - postgresql

I feel the need to get the column names and data types of the table returned by any function that has a 'record' return data type, because...
A key process in an existing SQL Server-based system makes use of a stored procedure that takes a user-defined function as a parameter. An initial step gets the column names and types of the table returned by the function that was passed as a parameter.
In Postgres 13 I can use pg_proc.prorettype and the corresponding pg_type to find functions that return record types...that's a start. I can also use pg_get_function_result() to get the string containing the information I need. But, it's a string, and while I ultimately will have to assemble a very similar string, this is just one application of the info. Is there a tabular equivalent containing (column_name, data_type, ordinal_position), or do I need to do that myself?
Is there access to a composite data type the system may have created when such a function is created?
One option that I think will work for me, but I think it's a little weird, is to:
> create temp table t as select * from function() limit 0;
then look that table up in info_schema.columns, assemble what I need and drop the temp table...putting all of this into a function.

You can query the catalog table pg_proc, which contains all the required information:
SELECT coalesce(p.na, 'column' || p.i),
p.ty::regtype,
p.i
FROM pg_proc AS f
CROSS JOIN LATERAL unnest(
coalesce(f.proallargtypes, ARRAY[f.prorettype]),
f.proargmodes,
f.proargnames
)
WITH ORDINALITY AS p(ty,mo,na,i)
WHERE f.proname = 'interval_ok'
AND coalesce(p.mo, 'o') IN ('o', 't')
ORDER BY p.i;

Related

Process a row with unknown structure in a cursor

I am new to using cursors for looping through a set of rows. But so far I had prior knowledge of which columns I am about to read.
E.g.
DECLARE db_cursor FOR
SELECT Column1, Column2
FROM MyTable
DECLARE #ColumnOne VARCHAR(50), #ColumnTwo VARCHAR(50)
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO #ColumnOne, #ColumnTwo
...
But the tables I am about to read into my key/value table have no specific structure and I should be able to process them one row at a time. How, using a nested cursor, can I loop through all the columns of the fetched row and process them according to their type and name?
TSQL cursors are not really designed to read data from tables of unknown structure. The two possibilities I can think of to achieve something in that direction are:
First read the column names of an unknown table from the Information Schema Views (see System Information Schema Views (Transact-SQL)). Then use dynamic SQL to create the cursor.
If you simply want to get any columns as a large string value, you might also try a simple SELECT * FROM TABLE_NAME FOR XML AUTO and further process the retrieved data for your purposes (see FOR XML (SQL Server)).
SQL is not very good in dealing with sets generically. In most cases you must know column names, data types and much more in advance. But there is XQuery. You can transform any SELECT into XML rather easily and use the mighty abilities to deal with generic structures there. I would not recommend this, but it might be worth a try:
CREATE PROCEDURE dbo.Get_EAV_FROM_SELECT
(
#SELECT NVARCHAR(MAX)
)
AS
BEGIN
DECLARE #tmptbl TABLE(TheContent XML);
DECLARE #cmd NVARCHAR(MAX)= N'SELECT (' + #SELECT + N' FOR XML RAW, ELEMENTS XSINIL);';
INSERT INTO #tmptbl EXEC(#cmd);
SELECT r.value('*[1]/text()[1]','nvarchar(max)') AS RowID
,c.value('local-name(.)','nvarchar(max)') AS ColumnKey
,c.value('text()[1]','nvarchar(max)') AS ColumnValue
FROM #tmptbl t
CROSS APPLY t.TheContent.nodes('/row') A(r)
CROSS APPLY A.r.nodes('*[position()>1]') B(c)
END;
GO
EXEC Get_EAV_FROM_SELECT #SELECT='SELECT TOP 10 o.object_id,o.* FROM sys.objects o';
GO
--Clean-Up for test purpose
DROP PROCEDURE Get_EAV_FROM_SELECT;
The idea in short
The select is passed into the procedure as string. With the SP we create a statement dynamically and create XML from it.
The very first column is considered to be the Row's ID, if not (like in sys.objects) we can write the SELECT and force it that way.
The inner SELECT will read each row and return a classical EAV-list.

How to update a value using string_agg without using a function?

I solved this problem with a function and I like my solution, but I want to know if there is a way to solve this problem without using functions. Here is the thing:
There are four tables that are relevant to this:
entities: entities of the system (tenants)
members: members of an entity
member_sets: sets of members
members_and_sets: table to join members and sets (many to many)
The member_sets table has a column named bits which is a binary representation of the set, so, for example, if an entity has 5 members and one specific set has the third member, the value of the bits column is 00100, the entity has three special kinds of sets: universe, empty and unit, their binary repesentation is: 11111, 00000 and 10000 respectively, assuming the unit set has the first member.
The challenge is keep this binary representation of the set up to date; Whenever one member is added to the entity, all binary representations must be updated. This is easy to do with a trigger and a function, my solution is this:
CREATE FUNCTION setbits(INTEGER) RETURNS VARBIT AS
$$SELECT STRING_AGG(belongs, '')::VARBIT AS setbits FROM (
SELECT LEAST(COALESCE(members_and_sets.set_id, 0), 1)::text AS belongs
FROM members LEFT JOIN members_and_sets
ON members.id=members_and_sets.member_id
AND members_and_sets.set_id=$1
GROUP BY members.id,members_and_sets.set_id
ORDER BY members.id)
AS bitsring;$$
LANGUAGE SQL
RETURNS NULL ON NULL INPUT;
-- calling this function in a trigger after inserting a new member:
UPDATE member_sets ms
SET bits=setbits(ms.id)
WHERE ms.entity_id=NEW.entity_id;
Now my question is: Can I do this without using a function? I tried with CTE but apparently I'm to noob to accomplish this; I couldn't pass the set_id to the must inner query so my solution was to wrap the query in a function and pass the set_id as an argument to the function. Again, this solution works perfectly, I just want to know if there is no way I can do this without a function call.
As your function body is simply a SELECT, you should be able to replace the function call with a subquery:
UPDATE member_sets ms
SET bits= (
SELECT STRING_AGG(belongs, '')::VARBIT AS setbits FROM (
SELECT LEAST(COALESCE(members_and_sets.set_id, 0), 1)::text AS belongs
FROM members LEFT JOIN members_and_sets
ON members.id=members_and_sets.member_id
AND members_and_sets.set_id=ms.id
GROUP BY members.id,members_and_sets.set_id
ORDER BY members.id)
AS bitsring
)
WHERE ms.entity_id=NEW.entity_id;

Executing queries dynamically in PL/pgSQL

I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
Now there's a myriad places I had in mind for storing these queries, either a variable of some sort, or a table column, the idea being to then have these queries execute.
I thought of storing the generated queries in a variable then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement which is the approach employed here (see right pane), but Postgres won't let me declare a variable outside a function and I've been scratching my head with how this would fit together, whether that's even the direction to follow, perhaps there's something simpler.
Would you have any pointers, I'm currently trying something like this, inspired by this other question but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
dyn_sql text;
BEGIN
dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
FROM information_schema.columns
WHERE table_name = 'salesdata';
execute dyn_sql
END
$$ LANGUAGE PLPGSQL;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
, string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
, pg_typeof(_tbl)::text)
FROM pg_attribute
WHERE attrelid = pg_typeof(_tbl)::text::regclass
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
);
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
db<>fiddle here
Old sqlfiddle
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answer with detailed explanation:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure, nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
This might fail for some aggregate function. sum() returns numeric for a sum(bigint_column) to accommodate for a sum overflowing the base data type. Casting back to bigint might fail ...
#Erwin Brandstetter, Many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need to draw a complete profile is a variety of things, min, max values, counts, count of nulls, mean etc... so a bunch of queries have to be ran for each columns, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types, i was sort of expecting this to throw a spanner in the works at some point, my main concern was with how to automate the query generation, and its execution, this last bit being my main concern.
I have tried the function you provide (I probably will need to start learning some plpgsql) but get a error at the SELECT (t::tbl) :
ERROR: type "tbl" does not exist
btw, what is the (t::abc) notation referred as, in python this would be a list slice, but it’s probably not the case in PLPGSQL

Postgresql dblink

Trying to be lazy when looking at an example
SELECT realestate.address, realestate.parcel, s.sale_year, s.sale_amount,
FROM realestate INNER JOIN
dblink('dbname=somedb port=5432 host=someserver
user=someuser password=somepwd',
'SELECT parcel_id, sale_year,
sale_amount FROM parcel_sales')
AS s(parcel_id char(10),sale_year int, sale_amount int)
Is there a way of getting the AS section filled in from the table?
I'm copying data from tables of the same name and structure on different servers.
If I can get the structure to copy from the existing table, it will save me a lot of time
Thanks
Bruce
The answer is: No. See the doc:
Since dblink can be used with any query, it is declared to return record, rather than specifying any particular set of columns. This means that you must specify the expected set of columns in the calling query — otherwise PostgreSQL would not know what to expect.
http://www.postgresql.org/docs/9.1/static/contrib-dblink-function.html
Edit: by the way, for a table or a view, you can get the fields name and type in a first query:
select column_name
from information_schema.columns
where table_name = 'your_table_or_view';
You could then use it to fill the fields declaration.
Alexis

SQL Views - no variables?

Is it possible to declare a variable within a View? For example:
Declare #SomeVar varchar(8) = 'something'
gives me the syntax error:
Incorrect syntax near the keyword 'Declare'.
You are correct. Local variables are not allowed in a VIEW.
You can set a local variable in a table valued function, which returns a result set (like a view does.)
http://msdn.microsoft.com/en-us/library/ms191165.aspx
e.g.
CREATE FUNCTION dbo.udf_foo()
RETURNS #ret TABLE (col INT)
AS
BEGIN
DECLARE #myvar INT;
SELECT #myvar = 1;
INSERT INTO #ret SELECT #myvar;
RETURN;
END;
GO
SELECT * FROM dbo.udf_foo();
GO
You could use WITH to define your expressions. Then do a simple Sub-SELECT to access those definitions.
CREATE VIEW MyView
AS
WITH MyVars (SomeVar, Var2)
AS (
SELECT
'something' AS 'SomeVar',
123 AS 'Var2'
)
SELECT *
FROM MyTable
WHERE x = (SELECT SomeVar FROM MyVars)
EDIT: I tried using a CTE on my previous answer which was incorrect, as pointed out by #bummi. This option should work instead:
Here's one option using a CROSS APPLY, to kind of work around this problem:
SELECT st.Value, Constants.CONSTANT_ONE, Constants.CONSTANT_TWO
FROM SomeTable st
CROSS APPLY (
SELECT 'Value1' AS CONSTANT_ONE,
'Value2' AS CONSTANT_TWO
) Constants
#datenstation had the correct concept. Here is a working example that uses CTE to cache variable's names:
CREATE VIEW vwImportant_Users AS
WITH params AS (
SELECT
varType='%Admin%',
varMinStatus=1)
SELECT status, name
FROM sys.sysusers, params
WHERE status > varMinStatus OR name LIKE varType
SELECT * FROM vwImportant_Users
also via JOIN
WITH params AS ( SELECT varType='%Admin%', varMinStatus=1)
SELECT status, name
FROM sys.sysusers INNER JOIN params ON 1=1
WHERE status > varMinStatus OR name LIKE varType
also via CROSS APPLY
WITH params AS ( SELECT varType='%Admin%', varMinStatus=1)
SELECT status, name
FROM sys.sysusers CROSS APPLY params
WHERE status > varMinStatus OR name LIKE varType
Yes this is correct, you can't have variables in views
(there are other restrictions too).
Views can be used for cases where the result can be replaced with a select statement.
Using functions as spencer7593 mentioned is a correct approach for dynamic data. For static data, a more performant approach which is consistent with SQL data design (versus the anti-pattern of writting massive procedural code in sprocs) is to create a separate table with the static values and join to it. This is extremely beneficial from a performace perspective since the SQL Engine can build effective execution plans around a JOIN, and you have the potential to add indexes as well if needed.
The disadvantage of using functions (or any inline calculated values) is the callout happens for every potential row returned, which is costly. Why? Because SQL has to first create a full dataset with the calculated values and then apply the WHERE clause to that dataset.
Nine times out of ten you should not need dynamically calculated cell values in your queries. Its much better to figure out what you will need, then design a data model that supports it, and populate that data model with semi-dynamic data (via batch jobs for instance) and use the SQL Engine to do the heavy lifting via standard SQL.
What I do is create a view that performs the same select as the table variable and link that view into the second view. So a view can select from another view. This achieves the same result
How often do you need to refresh the view? I have a similar case where the new data comes once a month; then I have to load it, and during the loading processes I have to create new tables. At that moment I alter my view to consider the changes.
I used as base the information in this other question:
Create View Dynamically & synonyms
In there, it is proposed to do it 2 ways:
using synonyms.
Using dynamic SQL to create view (this is what helped me achieve my result).