Syntax error in create aggregate - postgresql

Trying to create an aggregate function:
create aggregate min (my_type) (
sfunc = least,
stype = my_type
);
ERROR: syntax error at or near "least"
LINE 2: sfunc = least,
^
What am I missing?
Although the manual calls least a function:
The GREATEST and LEAST functions select the largest or smallest value from a list of any number of expressions.
I can not find it:
\dfS least
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+------+------------------+---------------------+------
(0 rows)

Like CASE, COALESCE and NULLIF, GREATEST and LEAST are listed in the chapter Conditional Expressions. These SQL constructs are not implemented as functions .. like #Laurenz provided in the meantime.
The manual advises:
Tip: If your needs go beyond the capabilities of these conditional
expressions, you might want to consider writing a stored procedure in
a more expressive programming language.
The terminology is a bit off here as well, since Postgres does not support true "stored procedures", just functions. (Which is why there is an open TODO item "Implement stored procedures".)
This manual page might be sharpened to avoid confusion ...
#Laurenz also provided an example. I would just use LEAST in the function to get identical functionality:
CREATE FUNCTION f_least(anyelement, anyelement)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS
'SELECT LEAST($1, $2)';
Do not make it STRICT, that would be incorrect. LEAST(1, NULL) returns 1 and not NULL.
Even if STRICT was correct, I would not use it, because it can prevent function inlining.
Note that this function is limited to exactly two parameters while LEAST takes any number of parameters. You might overload the function to cover 3, 4 etc. input parameters. Or you could write a VARIADIC function for up to 100 parameters.

LEAST and GREATEST are not real functions; internally they are parsed as MinMaxExpr (see src/include/nodes/primnodes.h).
You could achieve what you want with a generic function like this:
CREATE FUNCTION my_least(anyelement, anyelement) RETURNS anyelement
LANGUAGE sql IMMUTABLE CALLED ON NULL INPUT
AS 'SELECT LEAST($1, $2)';
(thanks to Erwin Brandstetter for the CALLED ON NULL INPUT and the idea to use LEAST.)
Then you can create your aggregate as
CREATE AGGREGATE min(my_type) (sfunc = my_least, stype = my_type);
This will only work if there are comparison functions for my_type, otherwise you have to come up with a different my_least function.

Related

PostgreSQL Base Types (Scalar Type)

I have a use case where a custom base type in a PostgreSQL database would be very beneficial for dealing with non-linear data. The examples of this include defining using a input and output function to a C function. In my case I would rather just define the inp and out functions using SQL and then using the "LIKE" to inherit everything else from the double precision. Has anyone done this? is it even possible?
Possible example:
-- sample linear to logrithmic functions
CREATE FUNCTION to_linear(anyelement) RETURNS double precision
LANGUAGE SQL
AS
$$
SELECT CASE WHEN $1 > 0 THEN 30 / log($1::double precision) ELSE 0 END
$$;
create function to_log(anyelement) returns double precision
language sql
as $$
select 10^($1::double precision/30.0);
$$;
-- create the base type
create type mylogdata
(
INPUT = to_linear,
OUTPUT = to_log,
LIKE = double precision
) ;
-- sample use in a table definition
CREATE TABLE test_table(
mydata mylogdata
);
What I'm really after is a "sudo" or "partial" base-type to allow for a simple in-out conversions while allowing the existing functions (sum, average, etc...) to work on the inherited type (in this case, double precision); basically avoiding to write/rewrite functions in C.
Thoughts? Ideas? Comments? Not possible? :)
Much Thanks!
On a side note, if we had do go down the 'C' route, I think there could be an opportunity to create a more generic logarithmic scalar/base-type like the Char, Varchar, or Arbitrary Precision Number which could allow for the dynamic declaration of the log base and scale of the non-linear data.
Something like this could a big win for the science community and those of us dealing with "wave" based data like sound, vibration, earth quakes, light, radiation, etc. Here is a sample definition of the base:
Logarithmic(base, scale)
-- Below my idea for use in a table definition
-- Obviously the IN/OUT functions would have to be modified to use the base and scaling
-- as defined ( most likely in C ?? )
CREATE TABLE test_table
(
mydata logarithmic(10, 30)
);
If someone is interested in partnering in creating something like this, let me know.
If you want to write a data type with new type input and output functions, you have to write that in C. No doubt you can reuse a lot of functions from double precision.
But I would go the other way. Rather than having a type that looks like a number, but the known arithmetic operators behave weirdly, define a new set of operators on an existing data type. That can be done in SQL, and it feels more natural to me.
If you create an operator class for your new operators, you can also use them with indexes.

can i declare a postgres function that takes an array of any type?

Ie I need a function that can be called like this
select myfunc({1,'foo', true})
or
select myfunc({42.0,7, false, x'ff'})
to be 100% clear I actually want
select myfunc(array[col1,col2,col3])
where col1, col2, col3 are of different types. Maybe that makes a difference to the answers
https://www.postgresql.org/docs/current/static/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC
Each position (either argument or return value) declared as anyelement is allowed to have any specific actual data type, but in any given call they must all be the same actual type. Each position declared as anyarray can have any array data type, but similarly they must all be the same type.
function can accept anyarray, which effectively is array of values of any same type, not an array of any type mixed in one array...
what you probably look for instead would be something like:
so=# create function ae(i anyelement) returns anyelement as $$
begin
raise info '%',i::text; return i;
end;
$$ language plpgsql
;
CREATE FUNCTION
so=# create table pm100(f float,b bool, t bytea);
CREATE TABLE
so=# select ae((42.0, false, '\xff')::pm100);
INFO: (42,f,"\\xff")
ae
----------------
(42,f,"\\xff")
(1 row)
No, You cannot do it - #Vao Tsun reply is absolutely correct. PostgreSQL SQL language is pretty static - like C or Pascal. There are few dynamic features, but these features are limited.
Any query has two stages - planning and execution. And data types of any value must be known in planning time (dynamic queries and record type is a exception - but only locally in PLpgSQL). Because all types must be known before execution, PostgreSQL doesn't allow features that can hold type dynamic values - like polymorphic collections.
For constant values there can be workaround for your case. You can write variadic function with parameters of "any" type. It has sense only for constant values - types are known in planning time, and this functions can be implemented only in C language. For example, the format function is of this kind.
The necessity do some dynamic work is signal of "broken" design. "broken" from PostgreSQL perspective. Some patterns cannot be implemented in Postgres, and is better implement it outside or with different kind of software.

PostgreSQL - Auto Cast for types?

I'm working on porting database from Firebird to PostgreSQL and have many errors related to type cast. For example let's take one simple function:
CREATE OR REPLACE FUNCTION f_Concat3 (
s1 varchar, s2 varchar, s3 varchar
)
RETURNS varchar AS
$body$
BEGIN
return s1||s2||s3;
END;
$body$ LANGUAGE 'plpgsql' IMMUTABLE CALLED ON NULL INPUT SECURITY INVOKER LEAKPROOF COST 100;
As Firebird is quite flexible to types this functions was called differently: some of the arguments might be another type: integer/double precision/timestamp. And of course in Postgres function call f_Concat3 ('1', 2, 345.345) causes an error like:
function f_Concat3(unknown, integer, numeric) not found.
The documentation is recomended to use an explicit cast like:
f_Concat3 ('1'::varchar, 2::varchar, 345.345::varchar)
Also I can create a function clones for all possible combinations of types what might occur and it will work. An example to resolve error:
CREATE OR REPLACE FUNCTION f_Concat3 (
s1 varchar, s2 integer, s3 numeric
)
RETURNS varchar AS
$body$
BEGIN
return s1::varchar||s2::varchar||s3::varchar;
END;
However this is very bad and ugly and it wont work with big functions.
Important: We have one general code base for all DB and use our own language to create application objects (forms, reports, etc) which contains select queries. It is not possible to use explicit cast on function calls cause we will lose compatibility with other DB.
I am confused that the integer argument can not be casted to the numeric or double precision, or date / number to a string. I even face problems with integer to smallint, and vice versa. Most database act not like this.
Is there any best practice for such situation?
Is there any alternatives for explicit cast?
SQL is a typed language, and PostgreSQL takes that more seriously than other relational databases. Unfortunately that means extra effort when porting an application with sloppy coding.
It is tempting to add implicit casts, but the documentation warns you from creating casts between built-in data types:
Additional casts can be added by the user with the CREATE CAST command. (This is usually done in conjunction with defining new data types. The set of casts between built-in types has been carefully crafted and is best not altered.)
This is not an idle warning, because function resolution and other things may suddenly fail or misbehave if you create new casts between existing types.
I think that if you really don't want to clean up the code (which would make it more portable for the future), you have no choice but to add more versions of your functions.
Fortunately PostgreSQL has function overloading which makes that possible.
You can make the job easier by using one argument with a polymorphic type in your function definition, like this:
CREATE OR REPLACE FUNCTION f_concat3 (
s1 text, s2 integer, s3 anyelement
) RETURNS text
LANGUAGE sql IMMUTABLE LEAKPROOF AS
'SELECT f_concat3(s1, s2::text, s3::text)';
You cannot use more than one anyelement argument though, because that will only work if all such parameters are of the same type.
If you use function overloading, be careful that you don't create ambiguities that would make function resolution fail.

Indexing an array for full text search

I am trying to index documents to be searchable on their tag array.
CREATE INDEX doc_search_idx ON documents
USING gin(
to_tsvector('english', array_to_string(tags, ' ')) ||
to_tsvector('english', coalesce(notes, '')))
)
Where tags is a (ci)text[]. However, PG will refuse to index array_to_string because it is not always immutable.
PG::InvalidObjectDefinition: ERROR: functions in index expression must be marked IMMUTABLE
I've tried creating a homebrew array_to_string immutable function, but I feel like playing with fire as I don't know what I'm doing. Any way not to re-implement it?
Looks like I could just repackage the same function and label it immutable, but looks like there are risks when doing that.
How do I index the array for full-text search?
In my initial answer I suggested a plain cast to text: tags::text. However, while most casts to text from basic types are defined IMMUTABLE, this it is not the case for array types. Obviously because (quoting Tom Lane in a post to pgsql-general):
Because it's implemented via array_out/array_in rather than any more
direct method, and those are marked stable because they potentially
invoke non-immutable element I/O functions.
Bold emphasis mine.
We can work with that. The general case cannot be marked as IMMUTABLE. But for the case at hand (cast citext[] or text[] to text) we can safely assume immutability. Create a simple IMMUTABLE SQL function that wraps the function. However, the appeal of my simple solution is mostly gone now. You might as well wrap array_to_string() (like you already pondered) for which similar considerations apply.
For citext[] (create separate functions for text[] if needed):
Either (based on a plain cast to text):
CREATE OR REPLACE FUNCTION f_ciarr2text(citext[])
RETURNS text LANGUAGE sql IMMUTABLE AS 'SELECT $1::text';
This is faster.
Or (using array_to_string() for a result without curly braces):
CREATE OR REPLACE FUNCTION f_ciarr2text(citext[])
RETURNS text LANGUAGE sql IMMUTABLE AS $$SELECT array_to_string($1, ',')$$;
This is a bit more correct.
Then:
CREATE INDEX doc_search_idx ON documents USING gin (
to_tsvector('english', COALESCE(f_ciarr2text(tags), '')
|| ' ' || COALESCE(notes,'')));
I did not use the polymorphic type ANYARRAY like in your answer, because I know text[] or citext[] are safe, but I can't vouch for all other array types.
Tested in Postgres 9.4 and works for me.
I added a space between the two strings to avoid false positive matches across the concatenated strings. There is an example in the manual.
If you sometimes want to search just tags or just notes, consider a multicolumn index instead:
CREATE INDEX doc_search_idx ON documents USING gin (
to_tsvector('english', COALESCE(f_ciarr2text(tags), '')
, to_tsvector('english', COALESCE(notes,''));
The risks you are referring to apply to temporal functions mostly, which are used in the referenced question. If time zones (or just the type timestamptz) are involved, results are not actually immutable. We do not lie about immutability here. Our functions are actually IMMUTABLE. Postgres just can't tell from the general implementation it uses.
Related
Often people think they need text search, while similarity search with trigram indexes would be a better fit:
PostgreSQL LIKE query performance variations
Not relevant in this exact case, but while working with citext, consider this:
Index on column with data type citext not used
Here's my naive solution, to wrap it and call it immutable, as suspected.
CREATE FUNCTION immutable_array_to_string(arr ANYARRAY, sep TEXT)
RETURNS text
AS $$
SELECT array_to_string(arr, sep);
$$
LANGUAGE SQL
IMMUTABLE
;

How to add a custom aggregate function (eg. MAX/MIN) to PostgreSQL?

I would like to add some extension function in Postgresql like max/min, but I could not find the source function of them. Could anyone suggest which part of the source code should I view? thanks,
Here is an example. I have relation: model(id int) where model is a bunch of CAD models each one has an ID ; I want to find all models which id>5 and area>5. but I do not want to calculate all face area, so I uses having clause only calculate a subset. here is the query:
select model.id, model.face_number
from
model
where
id>5
group by model.id
having
area(model.id)>5;
I want to define function area(oid) function like max/min as FDW. but I do not know how to pass the input parameters, so I want to compare it with min/max.
This doesn't make much sense.
min and max are aggregate functions. They reduce a set of rows into a single value.
Your problem description doesn't seem to have much to do with aggregation. So it's not at all clear that aggregate functions have anything to do with it.
If you really do need to write an aggregate function, start with the PostgreSQL manual:
User-defined aggregates
C-language extension functions
Writing extensions
Extension-building infrastructure
I strongly recommend that you prototype your aggregate function in PL/PgSQL or another procedural language. Write it in C only if you've demonstrated that it can work using a quicker-to-work-with language, and determined that you need it faster than you can do with PL/PgSQL or PL/Python or whatever.
Anyway, if you want to find the implementation of min/max, start here:
select a.*, so.oprname as aggsortopname, tt.typname as aggtranstypename
from pg_aggregate a
inner join pg_proc p on (a.aggfnoid = p.oid)
inner join pg_type tt on (a.aggtranstype = tt.oid)
inner join pg_operator so on (a.aggsortop = so.oid)
where p.proname = 'max';
There you'll see that the aggregate is composed of multiple parts: a transform function, a sort operator, a transitional state type, an optional final function, etc. The documentation on user-defined aggregates explains that in detail.
So there's no single "max function". The definition of max in pg_proc.h actually just refers to a dummy function.
So for max(int4), it's defined as the transition function int4larger (src/backend/utils/adt/int.c) over transition type int4, with the sort operator >, with no final function.
You do not want an aggregate function for what you have described. You also should not worry about performance until you have a working version--likely PostgreSQL's query optimizer will do exactly what you want if you write this:
select model.id, model.face_number
from
model
where
id>5 and area(model.id)>5;
Here is an example function:
CREATE FUNCTION area(int in_id)
RETURNS double precision AS $$
SELECT length*width FROM model WHERE id=in_id;
$$ LANGUAGE SQL STABLE;
Of course you can replace length*width with some more appropriate calculation.