Postgres: Function check duplicates in table - postgresql

I have a table with multiple duplicates and I want to turn the query that finds them into a function. Could you please help me make a function from this code? Thanks.
SELECT id_member,id_service,amount,date, count(*) as number_of_duplicates
from evidence
GROUP BY id_member,id_service,amount,date
HAVING COUNT(*) > 1;
CREATE OR REPLACE FUNCTION check_for_duplicates()
RETURNS VOID AS
$BODY$
BEGIN
SELECT id_member,id_service,amount,date, count(*) as number_of_duplicates
from evidence
GROUP BY id_member,id_service,amount,date
HAVING COUNT(*) > 1;
END;
$BODY$
LANGUAGE plpgsql;

If a function should return a result set, it needs to be declared as returns table (...) or returns setof.
You also don't need a PL/pgSQL function for that, a simple SQL function will do and is more efficient:
CREATE OR REPLACE FUNCTION check_for_duplicates()
RETURNS table (id_member integer, id_service integer, amount numeric, date date, number_of_duplicates bigint)
AS
$BODY$
SELECT id_member,id_service,amount,date, count(*) as number_of_duplicates
from evidence
GROUP BY id_member,id_service,amount,date
HAVING COUNT(*) > 1;
$BODY$
LANGUAGE sql;
You didn't show us the definition of the table evidence so I had to guess the data type of the columns. You will need to adjust the types in the returns table (...) part to match the types from the table.
Having said that: I would create a view for things like that, not a function.
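A minimal sketch of the view variant, reusing the query from the question. The view name duplicate_evidence is made up for illustration; unlike the function, a view needs no declared column types because it takes them from the table evidence.

```sql
-- Hypothetical view name; the query itself is the one from the question.
CREATE OR REPLACE VIEW duplicate_evidence AS
SELECT id_member, id_service, amount, date, count(*) AS number_of_duplicates
FROM evidence
GROUP BY id_member, id_service, amount, date
HAVING count(*) > 1;

-- Used like any table:
SELECT * FROM duplicate_evidence;
```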
Unrelated, but: date is a horrible name for a column. For one, because it's also a keyword, but more importantly because it doesn't document what the column contains: a release date? An expiration date? A due date?

Related

issue with create function in PostgreSQL

I am trying to get year from orderdate function
type
orderdate date
create or replace function getyearfromdate(year date)returns
table
as
$$
begin
return QUERY execute (
'select extract (year from orderdate) FROM public.orderalbum'
);
end;
$$
language plpgsql;
I wrote the logic but am not able to create the function.
I want to return the year from the orderdate.
I want to pass an orderdate and return the year from the function.
I am facing the error below:
ERROR: syntax error at or near "as"
LINE 3: as
^
SQL state: 42601
Character: 70
Based on your comments, it seems you only want a wrapper around the extract() function. In that case you do not want a set returning function. And you don't need PL/pgSQL or even dynamic SQL for this:
create or replace function getyearfromdate(p_date_value date)
returns int --<< make this a scalar function!
as
$$
select extract(year from p_date_value)::int;
$$
language sql;
Note that I renamed your parameter as I find a parameter named year for a date value highly confusing.
That function can then be used as part of a SELECT list:
SELECT ..., getyearfromdate(orderdate)
FROM public.orderalbum
GROUP BY ...
Original answer based on the question before comments clarified it.
As documented in the manual returns table requires a table definition.
Your use of dynamic SQL is also useless.
create or replace function getyearfromdate(year date)
returns table (year_of_month int)
as
$$
begin
return QUERY
select extract(year from orderdate)::int
FROM public.orderalbum;
end;
$$
language plpgsql;
I am not sure why you are passing a parameter to the function that you never use.

How to return different table data based on an ID passed to an SQL function

Based on this question I would like to know if it is possible to return different table data based on an ID passed to the function.
Something like (pseudocode):
CREATE FUNCTION schemaB.testFunc(p_id INT, select_param INT)
RETURNS setof schemaZ.Table_1
AS
$$
CASE
WHEN select_param = 1 THEN SELECT * FROM schemaZ.Table_1 WHERE id = p_id
WHEN select_param = 2 THEN SELECT * FROM schemaZ.Table_2 WHERE id = p_id
END;
$$
language sql;
Table_1 and Table_2 have no columns in common, which invalidates the above RETURNS clause.
This is generally impossible with SQL functions. Even with a polymorphic return type, the actual return type must be determined at call time. But all statements in an SQL function are planned before the function is executed. So you'd always end up with an error message for one of the SELECT statements returning data that doesn't fit the return type.
It can be done with dynamic SQL in a PL/pgSQL function, with some trickery:
CREATE OR REPLACE FUNCTION f_demo(_tabletype anyelement, _id int)
RETURNS SETOF anyelement LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE
format('SELECT * FROM %s WHERE id = $1', pg_typeof(_tabletype))
USING _id;
END
$func$;
Call (important!):
SELECT * FROM f_demo(null::schemaZ.Table_1, 1);
The "trick" is to cast a null value to the desired table type, thereby defining the return type and choosing from which table to select. Detailed explanation:
Refactor a PL/pgSQL function to return the output of various SELECT queries
Take this as proof of concept. Typically, there are better (safer, less confusing, more performant) solutions ...
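To illustrate the call pattern: only the cast changes when the other table is wanted. This sketch assumes schemaZ.Table_2 also has an id column, as the pseudocode in the question implies.

```sql
-- Rows from Table_1; the result has the row type of Table_1
SELECT * FROM f_demo(null::schemaZ.Table_1, 1);

-- Same function, rows from Table_2; the result row type now follows Table_2
-- (assumes Table_2 has an id column)
SELECT * FROM f_demo(null::schemaZ.Table_2, 1);
```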
Related:
Difference between language sql and language plpgsql in PostgreSQL functions

Query on Return Statement - PostgreSQL

I have this question, I was doing some migration from SQL Server to PostgreSQL 12.
The scenario, I am trying to accomplish:
The function should have a RETURN Statement, be it with SETOF 'tableType' or RETURN TABLE ( some number of columns )
The body starts with a count of records, if there is no record found based on input parameters, then simply Return Zero (0), else, return the entire set of record defined in the RETURN Statement.
The equivalent in SQL Server or Oracle: they can just put a SELECT statement inside a procedure to accomplish this. But it is rather difficult in PostgreSQL.
Any suggestion, please.
What I have accomplished so far: if no record is found, it simply returns NULL, maybe using PERFORM, or maybe by selecting NULL for the columns of the returning tableType.
I hope I am clear !
What I want is something like -
============================================================
CREATE OR REPLACE FUNCTION public.get_some_data(
id integer)
RETURNS TABLE ( id_1 integer, name character varying )
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
p_id alias for $1;
v_cnt integer:=0;
BEGIN
SELECT COUNT(1) FROM public.exampleTable e
WHERE id::integer = e.id::integer;
IF v_cnt= 0 THEN
SELECT 0;
ELSE
SELECT
a.id, a.name
public.exampleTable a
where a.id = p_id;
END;
$BODY$;
If you just want to return a set of a single table, using returns setof some_table is indeed the easiest way. The most basic SQL function to do that would be:
create function get_data()
returns setof some_table
as
$$
select *
from some_table;
$$
language sql;
PL/pgSQL isn't really necessary to put a SELECT statement into a function, but if you need to do other things, you need to use RETURN QUERY in a PL/pgSQL function:
create function get_data()
returns setof some_table
as
$$
begin
return query
select *
from some_table;
end;
$$
language plpgsql;
A function has exactly one return type. You can't have a function that sometimes returns an integer and sometimes returns thousands of rows with a dozen columns.
The only thing you could do, if you insist on returning something is something like this:
create function get_data()
returns setof some_table
as
$$
begin
return query
select *
from some_table;
if not found then
return query
select (null::some_table).*;
end if;
end;
$$
language plpgsql;
But I would consider the above an extremely ugly and confusing (not to say stupid) solution. I certainly wouldn't let that pass through a code review.
The caller of the function can test if something was returned in the same way I implemented that ugly hack: check the found variable after using the function.
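A sketch of such a caller-side check, assuming get_data() is the plain RETURN QUERY version (the padded variant always returns a row, so FOUND would always be true there). The variable name r and the notice text are illustrative only.

```sql
DO $$
DECLARE
   r some_table;  -- row variable with the table's row type
BEGIN
   -- SELECT INTO sets FOUND to true only if a row was assigned
   SELECT * INTO r FROM get_data() LIMIT 1;

   IF NOT FOUND THEN
      RAISE NOTICE 'get_data() returned no rows';
   END IF;
END
$$;
```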
One more hack to get as close as possible to what you want. But I will repeat what others have told you: you cannot do what you want directly. Just because MS SQL Server lets you get away with poor coding does not mean Postgres is obligated to do so. As the link by @a_horse_with_no_name implies, converting code is easy once you migrate how you think about the problem in the first place. The closest you can get is to return a tuple with a 0 id. The following is one way.
create or replace function public.get_some_data(
p_id integer)
returns table ( id integer, name character varying )
language plpgsql
as $$
declare
v_at_least_one boolean = false;
v_exp_rec record;
begin
for v_exp_rec in
select a.id, a.name
from public.exampletable a
where a.id = p_id
union all
select 0,null
loop
if v_exp_rec.id::integer > 0
or (v_exp_rec.id::integer = 0 and not v_at_least_one)
then
id = v_exp_rec.id;
name = v_exp_rec.name;
return next;
v_at_least_one = true;
end if;
end loop ;
return;
end
$$;
But that is still just a hack and assumes there is no valid row with id=0. A much better approach would be for the calling routine to check what the function returns (it has to do that in one way or another anyway) and let the function just return the data found instead of making up data. That is the mindset shift. Doing that, you can reduce this function to a simple select statement:
create or replace function public.get_some_data2(
p_id integer)
returns table ( id integer, name character varying )
language sql strict
as $$
select a.id, a.name
from public.exampletable a
where a.id = p_id;
$$;
Or one of the other solutions offered.
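A sketch of how a caller might test for an empty result in plain SQL instead of relying on a made-up row. The id value 42 is arbitrary.

```sql
-- true if the function returned at least one row for this id
SELECT EXISTS (SELECT 1 FROM public.get_some_data2(42)) AS has_rows;

-- or count the rows; 0 means nothing was found
SELECT count(*) AS row_count FROM public.get_some_data2(42);
```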

Execute a dynamic crosstab query

I implemented this function in my Postgres database: http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/
Here's the function:
create or replace function xtab (tablename varchar, rowc varchar, colc varchar, cellc varchar, celldatatype varchar) returns varchar language plpgsql as $$
declare
dynsql1 varchar;
dynsql2 varchar;
columnlist varchar;
begin
-- 1. retrieve list of column names.
dynsql1 = 'select string_agg(distinct '||colc||'||'' '||celldatatype||''','','' order by '||colc||'||'' '||celldatatype||''') from '||tablename||';';
execute dynsql1 into columnlist;
-- 2. set up the crosstab query
dynsql2 = 'select * from crosstab (
''select '||rowc||','||colc||','||cellc||' from '||tablename||' group by 1,2 order by 1,2'',
''select distinct '||colc||' from '||tablename||' order by 1''
)
as ct (
'||rowc||' varchar,'||columnlist||'
);';
return dynsql2;
end
$$;
So now I can call the function:
select xtab('globalpayments','month','currency','(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)','text');
Which returns (because the return type of the function is varchar):
select * from crosstab (
'select month,currency,(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)
from globalpayments
group by 1,2
order by 1,2'
, 'select distinct currency
from globalpayments
order by 1'
) as ct ( month varchar,CAD text,EUR text,GBP text,USD text );
How can I get this function to not only generate the code for the dynamic crosstab, but also execute the result? I.e., the result when I manually copy/paste/execute is this. But I want it to execute without that extra step: the function shall assemble the dynamic query and execute it:
Edit 1
This function comes close, but I need it to return more than just the first column of the first record
Taken from: Are there any way to execute a query inside the string value (like eval) in PostgreSQL?
create or replace function eval( sql text ) returns text as $$
declare
as_txt text;
begin
if sql is null then return null ; end if ;
execute sql into as_txt ;
return as_txt ;
end;
$$ language plpgsql
usage: select * from eval($$select * from analytics limit 1$$)
However it just returns the first column of the first record :
eval
----
2015
when the actual result looks like this:
Year, Month, Date, TPV_USD
---- ----- ------ --------
2016, 3, 2016-03-31, 100000
What you ask for is impossible. SQL is a strictly typed language. PostgreSQL functions need to declare a return type (RETURNS ..) at the time of creation.
A limited way around this is with polymorphic functions, if you can provide the return type at the time of the function call. But that's not evident from your question.
Refactor a PL/pgSQL function to return the output of various SELECT queries
You can return a completely dynamic result with anonymous records. But then you are required to provide a column definition list with every call. And how do you know about the returned columns? Catch 22.
There are various workarounds, depending on what you need or can work with. Since all your data columns seem to share the same data type, I suggest to return an array: text[]. Or you could return a document type like hstore or json. Related:
Dynamic alternative to pivot with CASE and GROUP BY
Dynamically convert hstore keys into columns for an unknown set of keys
But it might be simpler to just use two calls: 1: Let Postgres build the query. 2: Execute and retrieve returned rows.
Selecting multiple max() values using a single SQL statement
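As a rough sketch of the "document type" idea mentioned above: a function with the fixed return type SETOF json can run an arbitrary query string and hand back one JSON document per row, which sidesteps the need to declare the columns. The function name exec_as_json is hypothetical, the query string must not end with a semicolon, and executing a passed-in string is open to SQL injection, so this is only suitable for trusted input.

```sql
-- Hypothetical helper: run a query string and return each row as json.
CREATE OR REPLACE FUNCTION exec_as_json(_query text)
  RETURNS SETOF json
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- wrap the query in a subquery and convert each row to a json document
   RETURN QUERY EXECUTE format('SELECT row_to_json(t) FROM (%s) t', _query);
END
$func$;

-- Usage with any query text, e.g. the one from the question:
SELECT * FROM exec_as_json('SELECT * FROM analytics LIMIT 1');
```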
I would not use the function from Eric Minikel as presented in your question at all. It is not safe against SQL injection by way of maliciously malformed identifiers. Use format() to build query strings unless you are running an outdated version older than Postgres 9.1.
A shorter and cleaner implementation could look like this:
CREATE OR REPLACE FUNCTION xtab(_tbl regclass, _row text, _cat text
, _expr text -- still vulnerable to SQL injection!
, _type regtype)
RETURNS text
LANGUAGE plpgsql AS
$func$
DECLARE
_cat_list text;
_col_list text;
BEGIN
-- generate categories for xtab param and col definition list
EXECUTE format(
$$SELECT string_agg(quote_literal(x.cat), '), (')
, string_agg(quote_ident (x.cat), %L)
FROM (SELECT DISTINCT %I AS cat FROM %s ORDER BY 1) x$$
, ' ' || _type || ', ', _cat, _tbl)
INTO _cat_list, _col_list;
-- generate query string
RETURN format(
'SELECT * FROM crosstab(
$q$SELECT %I, %I, %s
FROM %s
GROUP BY 1, 2 -- only works if the 3rd column is an aggregate expression
ORDER BY 1, 2$q$
, $c$VALUES (%5$s)$c$
) ct(%1$I text, %6$s %7$s)'
, _row, _cat, _expr -- expr must be an aggregate expression!
, _tbl, _cat_list, _col_list, _type);
END
$func$;
Same function call as your original version. The function crosstab() is provided by the additional module tablefunc which has to be installed. Basics:
PostgreSQL Crosstab Query
This handles column and table names safely. Note the use of object identifier types regclass and regtype. Also works for schema-qualified names.
Table name as a PostgreSQL function parameter
However, it is not completely safe while you pass a string to be executed as expression (_expr - cellc in your original query). This kind of input is inherently unsafe against SQL injection and should never be exposed to the general public.
SQL injection in Postgres functions vs prepared queries
Scans the table only once for both lists of categories and should be a bit faster.
Still can't return completely dynamic row types since that's strictly not possible.
Not quite impossible: you can still execute it from within a function (execute the query string and return SETOF RECORD).
Then you have to specify the return record format. The reason in this case is that the planner needs to know the return format before it can make certain decisions (materialization comes to mind).
So in this case you would EXECUTE the query, return the rows and return SETOF RECORD.
For example, we could do something like this with a wrapper function but the same logic could be folded into your function:
CREATE OR REPLACE FUNCTION crosstab_wrapper
(tablename varchar, rowc varchar, colc varchar,
cellc varchar, celldatatype varchar)
returns setof record language plpgsql as $$
DECLARE outrow record;
BEGIN
FOR outrow IN EXECUTE xtab($1, $2, $3, $4, $5)
LOOP
RETURN NEXT outrow;
END LOOP;
END;
$$;
Then, when you call the function, you have to supply the record structure (AS (col1 type, col2 type, etc.)), just like you do with crosstab or connectby.
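A sketch of such a call, assuming the original xtab() from the question and the four currencies shown in its sample output. The column definition list is written by hand and has to match what the generated query actually returns, which is exactly the limitation discussed above.

```sql
SELECT *
FROM crosstab_wrapper(
        'globalpayments', 'month', 'currency',
        '(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)', 'text')
     AS ct (month varchar, cad text, eur text, gbp text, usd text);
-- The column list above is an assumption based on the sample output;
-- it must be adjusted whenever the set of currencies changes.
```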

How to return uncertain number columns of a table from a postgresql function?

As we know, plpgsql functions can return a table like this:
RETURNS table(int, char(1), ...)
But how to write this function, when the list of columns is uncertain at the time of creating the function.
When a function returns anonymous records
RETURNS SETOF record
you have to provide a column definition list when calling it with SELECT * FROM. SQL demands to know column names and types to interpret *. For registered tables and types this is provided by the system catalog. For functions you need to declare it yourself one way or the other. Either in the function definition or in the call. The call could look like @Craig already provided. You probably didn't read his answer carefully enough.
Depending on what you need exactly, there are a number of ways around this, though:
1) Return a single anonymous record
Example:
CREATE OR REPLACE FUNCTION myfunc_single() -- return a single anon rec
RETURNS record AS
$func$
DECLARE
rec record;
BEGIN
SELECT into rec 1, 'foo'; -- note missing type for 'foo'
RETURN rec;
END
$func$ LANGUAGE plpgsql;
This is a very limited niche. Only works for a single anonymous record from a function defined with:
RETURNS record
Call without * FROM:
SELECT myfunc_single();
Won't work for a SRF (set-returning function) and only returns a string representation of the whole record (type record). Rarely useful.
To get individual cols from a single anonymous record, you need to provide a column definition list again:
SELECT * FROM myfunc_single() AS (id int, txt unknown); -- note "unknown" type
2) Return well known row type with a super-set of columns
Example:
CREATE TABLE t (id int, txt text, the_date date);
INSERT INTO t VALUES (3, 'foz', '2014-01-13'), (4, 'baz', '2014-01-14');
CREATE OR REPLACE FUNCTION myfunc_tbl() -- return well known table
RETURNS SETOF t AS
$func$
BEGIN
RETURN QUERY
TABLE t;
-- SELECT * FROM t; -- equivalent
END
$func$ LANGUAGE plpgsql;
The function returns all columns of the table. This is short and simple and performance won't suffer as long as your table doesn't hold a huge number of columns or huge columns.
Select individual columns on call:
SELECT id, txt FROM myfunc_tbl();
SELECT id, the_date FROM myfunc_tbl();
-> SQLfiddle demonstrating all.
3) Advanced solutions
This answer is long enough already. And this closely related answer has it all:
Refactor a PL/pgSQL function to return the output of various SELECT queries
Look to the last chapter in particular: Various complete table types
If the result is of uncertain/undefined format you must use RETURNS record or (for a multi-row result) RETURNS SETOF record.
The calling query must then specify the table format, e.g.:
SELECT * FROM my_func() AS result(a integer, b char(1));
BTW, char is an awful data type with insane space-padding rules that date back to the days of fixed-width file formats. Don't use it. Always just use text or varchar.
Given comments, let's make this really explicit:
regress=> CREATE OR REPLACE FUNCTION f_something() RETURNS SETOF record AS $$
SELECT 1, 2, TEXT 'a';
$$ LANGUAGE SQL;
CREATE FUNCTION
regress=> SELECT * FROM f_something();
ERROR: a column definition list is required for functions returning "record"
LINE 1: SELECT * FROM f_something();
regress=> SELECT * FROM f_something() AS x(a integer, b integer, c text);
a | b | c
---+---+---
1 | 2 | a
(1 row)