PostgreSQL function using dblink, can I use input variables to define table structure - and return the result as a variable? - postgresql

I am trying to create a function in PostgreSQL that contains a dblink query. The definition of the field that it returns could vary, so I would like to set the field name and data types as input parameters. The error being returned suggests that it isn't referencing the input variables when I name them,
CREATE OR REPLACE FUNCTION validation.dblink_date_check(
username text,
pass text,
srcschemaname text,
srctablename text,
srcdbname text,
datefield text,
datetype text DEFAULT 'date'::text,
srchostname text DEFAULT 'xxx' text,
srcport integer DEFAULT 1234,
OUT max_date timestamp)
RETURNS timestamp
LANGUAGE 'plpgsql'
COST 100
VOLATILE PARALLEL UNSAFE
AS $BODY$
BEGIN
max_date := (
select c.*
from dblink('dbname='||srcdbname||' host='||srchostname||' port='||srcport||' user='||username||' password='||pass||'',
'select
max('||datefield||') from '||srcschemaname||'.'||srctablename||' a')
--this doesn't work
as c(datefield datetype
));
/*--this works
as c(datefield date
));;*/
raise notice '% - (source): %', date_trunc('second', clock_timestamp()::timestamp),max_date;
return;
End;
$BODY$;
When it doesn't work the error I get is,
ERROR: type "datetype" does not exist
LINE 6: as c(datefield datetype
^
Is there a way to structure the syntax to work (as sometimes it might not be a date, it might be a timestamp, or an integer etc)? I have used 'execute' in other functions (to insert), but using it to select the variable 'max_date' doesn't work.

Related

How to execute a query with a column name passed as parameter to a plpgsql function?

I have a table with multiple columns in PostgreSQL. I try to make a function returning a table with a few default columns and a variable column. The column name should be passed as function parameter. Example:
SELECT * FROM get_gas('temperature');
This is my code right now:
CREATE OR REPLACE FUNCTION get_gas(gas text)
RETURNS TABLE (id INTEGER, node_id INTEGER,
gas_name DOUBLE PRECISION,
measurement_timestamp timestamp without time zone )
AS
$$
BEGIN
SELECT measurements_lora.id, measurements_lora.node_id, gas, measurements_lora.measurement_timestamp
AS measure
FROM public.measurements_lora;
END
$$ LANGUAGE plpgsql;
When passing, for example, 'temperature' as column name (gas), I want to get a table with these columns from the function call.
id - node_id - temperature - measurement_timestamp
How would I achieve this?
You can use EXECUTE statement.
CREATE OR REPLACE FUNCTION get_gas(gas text) RETURNS TABLE (f1 INTEGER, f2 INTEGER, f3 DOUBLE PRECISION, f4 timestamp without time zone ) AS
$$
DECLARE
sql_to_execute TEXT;
BEGIN
SELECT 'SELECT measurements_lora.id,
measurements_lora.node_id, '
|| gas ||',
measurements_lora.measurement_timestamp AS measure
FROM public.measurements_lora '
INTO sql_to_execute;
RETURN QUERY EXECUTE sql_to_execute;
END
$$ LANGUAGE plpgsql;
This will create a variable sql_to_execute with your field and. QUERY EXECUTE will execute your interpreted query.
EDIT 1: Look at the another answer the concernings about security issues.
If you really need dynamic SQL in a PL/pgSQL function (which you don't), be sure to defend against SQL injection! Like:
CREATE OR REPLACE FUNCTION get_gas(gas text)
RETURNS TABLE (id integer
, node_id integer
, gas_name double precision
, measurement_timestamp timestamp)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
'SELECT m.id, m.node_id, m.%I, m.measurement_timestamp
FROM public.measurements_lora m'
, gas
);
END
$func$;
The format specifier %I in format() double-quotes identifiers where needed,
See:
SQL injection in Postgres functions vs prepared queries
Insert text with single quotes in PostgreSQL

Using parameter as column name in Postgres function

I have a Postgres table bearing the following form
CREATE TABLE "public"."days"
(
"id" integer NOT NULL,
"day" character varying(9) NOT NULL,
"visits" bigint[] NOT NULL,
"passes" bigint[] NOT NULL
);
I would like to write a function that allows me to return the visits or the passees column as its result for a specified id. My first attempt goes as follows
CREATE OR REPLACE FUNCTION day_entries(INT,TEXT) RETURNS BIGINT[] LANGUAGE sql AS
'SELECT $2 FROM days WHERE id = $1;'
which fails with an error along the lines of
return type mismatch in function declared to return bigint[]
DETAIL: Actual return type is text.
If I put in visits in place of the $2 things work just as expected. It would make little sense to define several functions to match different columns from the days table. Is there a way to pass the actual column name as a parameter while still keeping Postgres happy?
You can't use parameters as identifiers (=column name), you need dynamic SQL for that. And that requires PL/pgSQL:
CREATE OR REPLACE FUNCTION day_entries(p_id int, p_column text)
RETURNS BIGINT[]
AS
$$
declare
l_result bigint[];
begin
execute format('SELECT %I FROM days WHERE id = $1', p_column)
using p_id
into l_result;
return l_result;
end;
$$
LANGUAGE plpgsql;
format() properly deals with identifiers when building dynamic SQL. The $1 is a parameter placeholder and the value for that is passed with the using p_id clause of the execute statement.

Execute a dynamic crosstab query

I implemented this function in my Postgres database: http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/
Here's the function:
create or replace function xtab (tablename varchar, rowc varchar, colc varchar, cellc varchar, celldatatype varchar) returns varchar language plpgsql as $$
declare
dynsql1 varchar;
dynsql2 varchar;
columnlist varchar;
begin
-- 1. retrieve list of column names.
dynsql1 = 'select string_agg(distinct '||colc||'||'' '||celldatatype||''','','' order by '||colc||'||'' '||celldatatype||''') from '||tablename||';';
execute dynsql1 into columnlist;
-- 2. set up the crosstab query
dynsql2 = 'select * from crosstab (
''select '||rowc||','||colc||','||cellc||' from '||tablename||' group by 1,2 order by 1,2'',
''select distinct '||colc||' from '||tablename||' order by 1''
)
as ct (
'||rowc||' varchar,'||columnlist||'
);';
return dynsql2;
end
$$;
So now I can call the function:
select xtab('globalpayments','month','currency','(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)','text');
Which returns (because the return type of the function is varchar):
select * from crosstab (
'select month,currency,(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)
from globalpayments
group by 1,2
order by 1,2'
, 'select distinct currency
from globalpayments
order by 1'
) as ct ( month varchar,CAD text,EUR text,GBP text,USD text );
How can I get this function to not only generate the code for the dynamic crosstab, but also execute the result? I.e., the result when I manually copy/paste/execute is this. But I want it to execute without that extra step: the function shall assemble the dynamic query and execute it:
Edit 1
This function comes close, but I need it to return more than just the first column of the first record
Taken from: Are there any way to execute a query inside the string value (like eval) in PostgreSQL?
create or replace function eval( sql text ) returns text as $$
declare
as_txt text;
begin
if sql is null then return null ; end if ;
execute sql into as_txt ;
return as_txt ;
end;
$$ language plpgsql
usage: select * from eval($$select * from analytics limit 1$$)
However it just returns the first column of the first record :
eval
----
2015
when the actual result looks like this:
Year, Month, Date, TPV_USD
---- ----- ------ --------
2016, 3, 2016-03-31, 100000
What you ask for is impossible. SQL is a strictly typed language. PostgreSQL functions need to declare a return type (RETURNS ..) at the time of creation.
A limited way around this is with polymorphic functions. If you can provide the return type at the time of the function call. But that's not evident from your question.
Refactor a PL/pgSQL function to return the output of various SELECT queries
You can return a completely dynamic result with anonymous records. But then you are required to provide a column definition list with every call. And how do you know about the returned columns? Catch 22.
There are various workarounds, depending on what you need or can work with. Since all your data columns seem to share the same data type, I suggest to return an array: text[]. Or you could return a document type like hstore or json. Related:
Dynamic alternative to pivot with CASE and GROUP BY
Dynamically convert hstore keys into columns for an unknown set of keys
But it might be simpler to just use two calls: 1: Let Postgres build the query. 2: Execute and retrieve returned rows.
Selecting multiple max() values using a single SQL statement
I would not use the function from Eric Minikel as presented in your question at all. It is not safe against SQL injection by way of maliciously malformed identifiers. Use format() to build query strings unless you are running an outdated version older than Postgres 9.1.
A shorter and cleaner implementation could look like this:
CREATE OR REPLACE FUNCTION xtab(_tbl regclass, _row text, _cat text
, _expr text -- still vulnerable to SQL injection!
, _type regtype)
RETURNS text
LANGUAGE plpgsql AS
$func$
DECLARE
_cat_list text;
_col_list text;
BEGIN
-- generate categories for xtab param and col definition list
EXECUTE format(
$$SELECT string_agg(quote_literal(x.cat), '), (')
, string_agg(quote_ident (x.cat), %L)
FROM (SELECT DISTINCT %I AS cat FROM %s ORDER BY 1) x$$
, ' ' || _type || ', ', _cat, _tbl)
INTO _cat_list, _col_list;
-- generate query string
RETURN format(
'SELECT * FROM crosstab(
$q$SELECT %I, %I, %s
FROM %I
GROUP BY 1, 2 -- only works if the 3rd column is an aggregate expression
ORDER BY 1, 2$q$
, $c$VALUES (%5$s)$c$
) ct(%1$I text, %6$s %7$s)'
, _row, _cat, _expr -- expr must be an aggregate expression!
, _tbl, _cat_list, _col_list, _type);
END
$func$;
Same function call as your original version. The function crosstab() is provided by the additional module tablefunc which has to be installed. Basics:
PostgreSQL Crosstab Query
This handles column and table names safely. Note the use of object identifier types regclass and regtype. Also works for schema-qualified names.
Table name as a PostgreSQL function parameter
However, it is not completely safe while you pass a string to be executed as expression (_expr - cellc in your original query). This kind of input is inherently unsafe against SQL injection and should never be exposed to the general public.
SQL injection in Postgres functions vs prepared queries
Scans the table only once for both lists of categories and should be a bit faster.
Still can't return completely dynamic row types since that's strictly not possible.
Not quite impossible, you can still execute it (from a query execute the string and return SETOF RECORD.
Then you have to specify the return record format. The reason in this case is that the planner needs to know the return format before it can make certain decisions (materialization comes to mind).
So in this case you would EXECUTE the query, return the rows and return SETOF RECORD.
For example, we could do something like this with a wrapper function but the same logic could be folded into your function:
CREATE OR REPLACE FUNCTION crosstab_wrapper
(tablename varchar, rowc varchar, colc varchar,
cellc varchar, celldatatype varchar)
returns setof record language plpgsql as $$
DECLARE outrow record;
BEGIN
FOR outrow IN EXECUTE xtab($1, $2, $3, $4, $5)
LOOP
RETURN NEXT outrow
END LOOP;
END;
$$;
Then you supply the record structure on calling the function just like you do with crosstab.
Then when you all the query you would have to supply a record structure (as (col1 type, col2 type, etc) like you do with connectby.

How to derive a column name in the return type from input parameters to the function?

Using Postgres 9.5 I have built this function:
CREATE or REPLACE FUNCTION func_getratio_laglag(_numeratorLAG text, _n1 int, _denominatorLAG text, _n2 int, _table text)
RETURNS TABLE (date_t timestamp without time zone, customer_code text, index text, ratio real) AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'SELECT
date_t,
customer_code,
index,
(LAG('||quote_ident(_numeratorLAG)||',' || quote_literal(_n1)||') OVER W / LAG('||quote_ident(_denominatorLAG)||','|| quote_literal(_n2)||') OVER W) '
|| ' FROM ' || quote_ident(_table)
|| ' WINDOW W AS (PARTITION BY customer_code ORDER BY date_t asc);';
END;
$BODY$ LANGUAGE plpgsql;
All the function does is allow me the ability to pick a 2 different columns from a specified table and calculate a ratio between them based on different lag windows. To execute the function above I use the following query:
SELECT * FROM func_getratio_laglag('order_first',1,'order_last',0,'customers_hist');
This returns a table with the column labels date_t, customer_code, index and ratio. I have really struggled on how to output ratio as a dynamic column label. That is, I would like to make it contingent on the input parameters e.g. if I ran the select query above then I would like the column labels date_t, customer_code, index and order_first_1_order_last_0.
I am stuck, any advice or hints?
How to derive a column name in the return type from input parameters to the function?
The short answer: Not possible.
SQL is very rigid about column data types and names. Those have to be declared before or at call time at the latest. No exceptions. No truly dynamic column names.
I can think of 3 half-way workarounds:
1. Column aliases
Use your function as is (or rather the audited version I suggest below) and add column aliases in the function call:
SELECT * FROM func_getratio_laglag('order_first',1,'order_last',0,'customers_hist')
AS f(date_t, customer_code, index, order_first_1_order_last_0)
I would do that.
2. Column definition list
Create your function to return anonymous records:
RETURNS SETOF record
Then you have to provide a column definition list with every call:
SELECT * FROM func_getratio_laglag('order_first',1,'order_last',0,'customers_hist')
AS f(date_t timestamp, customer_code text, index text, order_first_1_order_last_0 real)
I would not do that.
3. Use a registered row type as polymorphic input / output type.
Mostly useful if you happen to have row types at hand. You could register a row type on the fly by crating a temporary table, but that seems like overkill for your use case.
Details in the last chapter of this answer:
Refactor a PL/pgSQL function to return the output of various SELECT queries
Function audit
Use format() to make building query string much more safe and simple.
Read the manual if you are not familiar with it.
CREATE OR REPLACE FUNCTION func_getratio_laglag(
_numerator_lag text, _n1 int
, _denominator_lag text, _n2 int
, _table regclass)
RETURNS TABLE (date_t timestamp, customer_code text, index text, ratio real) AS
$func$
BEGIN
RETURN QUERY EXECUTE format (
'SELECT date_t, customer_code, index
, (lag(%I, %s) OVER w / lag(%I, %s) OVER w) -- data type must match
FROM %s
WINDOW w AS (PARTITION BY customer_code ORDER BY date_t)'
, _numerator_lag, _n1, _denominator_lag, _n2, _table::text
);
END
$func$ LANGUAGE plpgsql;
Note the data type regclass for the table name. That's my personal (optional) suggestion.
Table name as a PostgreSQL function parameter
Aside: I would also advise not to use mixed-case identifiers in Postgres.
Are PostgreSQL column names case-sensitive?

How to pass timestamps to a function and retrieve resulting rows

I have created custom data type. In that I have given alias name of the one field. you will get that in body of the function below.
create type voucher as (
ori numeric, RECEIPT_NO numeric
, receipt_date timestamp with time zone, reg_no character varying
, patient_name character varying, tot_refund_bill_amount double precision
, username character varying );
Thea above statement completes successfully.
Then I want to create a function:
create or replace function billing.voucher_receipt (in_from_date timestamp with time zone, in_to_date timestamp with time zone)
returns setof voucher as $$
declare
out_put voucher%rowtype;
begin
return query(select C.receipt_no as ori ,A.RECEIPT_NO, receipt_date , A.reg_no, patient_name, tot_refund_bill_amount, username
from billing.tran_counter_receipt as a inner join mas_user as b on a.ent_by=b.uid AND cash_book='REFUND'
INNER JOIN billing.tran_BILL AS C ON C.REG_NO=A.REG_NO AND C.CASH_BOOK='GCASH' where receipt_date>=in_from_date and receipt_date<=in_to_date);
end;$$
LANGUAGE plpgsql
Executes without problem.
But when I call it with input like this:
select * from voucher_receipt ('2014-09-25 11:42:44.298346+05:30'
, '2014-09-29 11:03:47.573049+05:30')
it shows an error:
ERROR: function voucher_receipt(unknown, unknown) does not exist
LINE 1: select * from voucher_receipt ('2014-09-25 11:42:44.298346+0...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Can any one help me out from this?
Explain error
You created your function in the schema billing with:
create or replace function billing.voucher_receipt( ...
Then you call without schema-qualification:
select * from voucher_receipt ( ...
This only works while your current setting for search_path includes the schema billing.
Better function
You don't need to create a composite type. Unless you need the same type in multiple places just use RETURNS TABLE to define the return type in the function:
CREATE OR REPLACE FUNCTION billing.voucher_receipt (_from timestamptz
, _to timestamptz)
RETURNS TABLE (
ori numeric
, receipt_no numeric
, receipt_date timestamptz
, reg_no varchar
, patient_name varchar
, tot_refund_bill_amount float8
, username varchar) AS
$func$
BEGIN
RETURN QUERY
SELECT b.receipt_no -- AS ori
, cr.RECEIPT_NO
, ??.receipt_date
, cr.reg_no
, ??.patient_name
, ??.tot_refund_bill_amount
, ??.username
FROM billing.tran_counter_receipt cr
JOIN billing.tran_bill b USING (reg_no)
JOIN mas_user u ON u.uid = cr.ent_by
WHERE ??.receipt_date >= _from
AND ??.receipt_date <= _to
AND b.CASH_BOOK = 'GCASH'
AND ??.cash_book = 'REFUND'
END
$func$ LANGUAGE plpgsql;
Notes
Don't call your parameters "date" while they are actually timestamptz.
RETURN QUERY does not require parentheses.
No need for DECLARE out_put voucher%rowtype; at all.
Your format was inconsistent and messy. That ruins readability and that's also where bugs can hide.
This could just as well be a simple SQL function.
Column names in RETURNS TABLE are visible in the function body almost everywhere. table-qualify columns in your query to avoid ambiguities (and errors). Replace all ??. I left in the code, where information was missing.
Output column names are superseded by names in the RETURNS declaration. So AS ori in the SELECT list is just documentation in this case.
Why schema-qualify billing.tran_bill but not mas_user?