Pass initial condition as an argument to a custom aggregate - postgresql

I want to create a function that takes an initial condition as an argument and uses a set of values to compute a final result. In my specific case (it has to do with geometry processing in PostGIS), it's important that each member of the set is processed against the current (which might be the initial) state one at a time, to keep the result clean. (I need to deal with sliver and gap issues, and have had a very difficult time doing so any way other than one element at a time.) The processing I need to do is already defined as a function that takes two appropriate arguments (where the first can be the current state and the second can be a value from the set).
So I want something similar to what you would expect is intended by this:
SELECT my_func('some initial condition', my_table.some_column) FROM my_table;
Aggregates seem like a natural fit for this, but I can't figure out how to get the function to accept an initial state. An iterative approach in PL/pgSQL would be fairly straightforward:
CREATE FUNCTION my_func(initial sometype, vals sometype[])
RETURNS sometype
LANGUAGE plpgsql
AS $$
DECLARE
    current sometype := initial;
    v sometype;
BEGIN
    FOREACH v IN ARRAY vals LOOP
        current := SomeBinaryOperation(current, v);
    END LOOP;
    RETURN current;
END
$$;
But it would require rolling the values up into an array manually:
SELECT my_func('some initial condition', ARRAY_AGG(my_table.some_column)) FROM my_table;
You can create aggregates with multiple arguments, but the arguments that follow the first one are used as additional arguments to the transition function. I can see no way that one of them could be turned into an initial condition. (At least not without a remarkably hacky function that treats its third argument as an initial condition if the first argument is NULL or similar. And that's only if the aggregate argument can be a constant instead of a column.)
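To illustrate what I mean, a two-argument aggregate looks roughly like this (hypothetical names, concrete types just to show the mechanics); the second aggregate argument is fed to the transition function on every row, never used once as an initial state:
CREATE FUNCTION my_transition(state text, val text, extra text)
RETURNS text LANGUAGE sql AS $$
    -- "extra" arrives here for every row processed
    SELECT state || '+' || val
$$;

CREATE AGGREGATE my_agg(text, text) (
    SFUNC = my_transition,
    STYPE = text,
    INITCOND = ''
);
-- usage: SELECT my_agg(some_column, 'some constant') FROM my_table;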
Am I best off just using the PL/pgSQL iterative approach, or is there a way to create an aggregate that accepts its initial condition as an argument? Or is there something I haven't thought of?
I'm on PostgreSQL 9.3 at the moment, but upgrading may be an option if there's new stuff that would help.

Related

Get data copied by a function

I have a fairly complicated data structure spread across several tables, and a function that makes a copy of that structure. I want to make a copy and get the newly created data back in a single query, like this:
SELECT
*
FROM
main_table
JOIN other_table
ON (main_table.id = other_table.main_id)
WHERE
main_table.id = make_copy(old_id);
The copy is successfully created, but it is not returned by the above query. I guess it is not yet visible to the outer query, or not yet committed.
I have also tried WITH ... SELECT ..., but with no success.
The function make_copy(id) is declared as VOLATILE because it modifies the database, and multiple calls with the same parameter will create multiple copies.
A possible solution would be for the make_copy(id) function to return the whole new data structure (SELECT * FROM make_copy(old_id)), but that would require a lot of aliasing (many tables have an id or name column). I would also end up with many places that build (read) that data structure.
How can I call that function and use its result (and all side effects) in one query?
I'm afraid that's not possible without splitting it into two queries.
A CTE can't help you here - see Data-Modifying Statements in WITH (note the example there of updating a table inside a CTE):
...The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means that RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query...
And I guess you cannot do this with a function either - see Function Volatility Categories:
For functions written in SQL or in any of the standard procedural languages, there is a second important property determined by the volatility category, namely the visibility of any data changes that have been made by the SQL command that is calling the function. A VOLATILE function will see such changes, a STABLE or IMMUTABLE function will not. ... VOLATILE functions obtain a fresh snapshot at the start of each query they execute.
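To illustrate that last point, here is a rough sketch (with made-up table and column names, and assuming main_table.id is generated by a default such as a serial) of how changes can be passed around only through RETURNING, with the copy done directly by data-modifying CTEs instead of a function:
WITH new_main AS (
    INSERT INTO main_table (name)
    SELECT name FROM main_table WHERE id = 1
    RETURNING *
), new_other AS (
    INSERT INTO other_table (main_id, detail)
    SELECT new_main.id, o.detail
    FROM other_table o, new_main
    WHERE o.main_id = 1
    RETURNING *
)
SELECT *
FROM new_main
JOIN new_other ON new_other.main_id = new_main.id;
The main query still cannot see the new rows in main_table or other_table (same snapshot), but it can see whatever the two sub-statements hand back through RETURNING.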

Postgres PL/pgSQL, possible to declare anonymous custom types?

With DB2 I'm able to declare anonymous custom types (e.g. row types or composite types) for my user defined functions - see the following example (especially the last line):
DB2 example:
CREATE OR REPLACE FUNCTION myFunction(IN input1 DECIMAL(5), IN input2 DECIMAL(5))
RETURNS DECIMAL(2)
READS SQL DATA
LANGUAGE SQL
NO EXTERNAL ACTION
NOT DETERMINISTIC
BEGIN
DECLARE TYPE customAnonymousType AS ROW(a1 DECIMAL(2), a2 DECIMAL(2), a3 DECIMAL(2));
/* do something fancy... */
Can I do something similar with PL/pgSQL? I know I could use existing row types, as well as existing user-defined types - but do I really have to define the type in advance?
I also know about the RECORD type, but as far as I understand I would not be able to use it in arrays (and it would not be a well-defined type either).
Comments asked for an example; even though it lengthens the question a lot, I tried to define a fairly simple example (still for DB2):
CREATE OR REPLACE FUNCTION myFunction(IN input1 DECIMAL(5), IN input2 DECIMAL(5))
RETURNS DECIMAL(2)
READS SQL DATA
LANGUAGE SQL
NO EXTERNAL ACTION
NOT DETERMINISTIC
BEGIN
DECLARE TYPE customAnonymousType AS ROW(a1 DECIMAL(2), a2 CHARACTER VARYING(50));
DECLARE TYPE customArray AS customAnonymousType ARRAY[INTEGER];
DECLARE myArray customArray;
SET myArray[input1] = (50, 'Product 1');
SET myArray[input2] = (99, 'Product 2');
RETURN myArray[ARRAY_FIRST(myArray)].a1;
END
This function of course only serves as a dummy (but I suppose it is already quite long for a question here). Actually, it just decides which number to return depending on whether input1 is greater than input2. If input1 is smaller than input2, it returns 50; if input2 is smaller than or equal to input1, it returns 99.
I know I'm not even using the a2 character field of my type (so in this case I could also just use a number array), and that there are probably many, many better solutions for returning two fixed numbers depending on the input values, but my original question remains: can I use anonymous custom types in PL/pgSQL (as I would in Oracle or DB2 procedures), or are there any similar alternatives?
You cannot create types with local visibility in Postgres. This functionality is not supported; Postgres supports only globally defined custom composite types.
See the CREATE TYPE documentation. This statement cannot be used in the DECLARE part of a plpgsql block.
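The closest you can get is to declare the composite type globally, before the function, and then use it inside - a rough sketch of your DB2 example in plpgsql (type, function and column names are just illustrative):
CREATE TYPE product_entry AS (a1 numeric, a2 varchar(50));

CREATE OR REPLACE FUNCTION my_function(input1 int, input2 int)
RETURNS numeric
LANGUAGE plpgsql
AS $$
DECLARE
    my_array product_entry[];
BEGIN
    my_array[input1] := (50, 'Product 1')::product_entry;
    my_array[input2] := (99, 'Product 2')::product_entry;
    -- array_lower() plays the role of DB2's ARRAY_FIRST here
    RETURN (my_array[array_lower(my_array, 1)]).a1;
END
$$;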

How to find out if a sequence was initialized in this session?

I need to read the current value of a sequence in a function. However, for the first time in each session I try to use currval(), I get following error:
currval of sequence "foo_seq" is not yet defined in this session
Hint for those who might find this question by googling for this error: you need to initialize the sequence for each session, either by nextval() or setval().
I could use something like lastval() or even setval('your_table_id_seq', (SELECT MAX(id) FROM your_table)); instead, but this seems either prone to gaps or slower than a simple currval(). My aim is to avoid gaps and inconsistencies (I know some of the values will be added manually), so using nextval() before the logic that handles them is not ideal for my purpose. I would need it to initialize the sequence for the session anyway, but I would prefer to do something like this:
--start of the function here
IF is_not_initialized THEN
SELECT setval('foo_seq', (SELECT MAX(id) FROM bar_table)) INTO _current;
ELSE
SELECT currval('foo_seq') INTO _current;
END IF;
--some magic with the _current variable and nextvalue() on the right position
The point is that I have no idea what "is_not_initialized" might look like, or whether it is possible at all. Is there any function or other trick to do it?
EDIT: Actually, my plan is to let each group of customers choose between a proper sequence, no sequence at all, and the strange "something like a sequence" I'm asking about now. Even if a customer wanted such a strange sequence, it would be used only for the columns where it is needed - usually because there is some analog data and we need to store its keys (usually an almost gapless sequence) in the DB for backward compatibility.
Anyway, you are right that this is hardly proper solution and that no sequence might be better than such a messy workaround in those situations, so I'll think (and discuss with customers) again whether it is really needed.
Craig, a_horse and pozs have provided information which can help you understand the principles of using sequences. Setting aside the question of how you are going to use it, here is a function which returns the current value of a sequence if it has been initialized, or null otherwise.
If a sequence seq has not yet been initialized, currval(seq) raises an exception with SQLSTATE 55000.
create or replace function current_seq_value(seq regclass)
returns bigint
language plpgsql
as $$
begin
    begin
        return (select currval(seq));
    exception
        when sqlstate '55000' then return null;
    end;
end $$;
select current_seq_value('my_table_id_seq')
My aim is to avoid gaps and inconsistencies
You cannot use sequences if you want to avoid gaps. Nor can you reasonably use sequences if you want to assign some values manually.
The approach you are taking is unsound. It will not work. Forget about it, it isn't going to do what you think it's going to do.
I just wrote a sample implementation of a trivial gapless sequence generator for someone a few days ago, and there's a more complete one in this question.
You need to understand that unlike true sequences, gapless sequences are transactional. A consequence is that only one running transaction can have an uncommitted ID. If 100 concurrent transactions try to get IDs, only one of them will actually get the ID. The others will have to wait until that one commits or rolls back. So they're terrible for concurrency, especially if combined with long running transactions. They can also cause deadlocks if you use multiple different gapless sequences and different transactions might access them in different orders.
So think carefully whether you really need this.
Read: PostgreSQL gapless sequences
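For reference, the usual shape of such a gapless generator is a counter table plus a locking function, roughly like this sketch (table and function names are made up):
CREATE TABLE gapless_seq (
    seq_name   text PRIMARY KEY,
    last_value bigint NOT NULL
);

CREATE OR REPLACE FUNCTION next_gapless_value(p_name text)
RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
    v bigint;
BEGIN
    -- the UPDATE takes a row lock held until the calling transaction commits
    -- or rolls back; that lock is what serializes all concurrent callers
    UPDATE gapless_seq
       SET last_value = last_value + 1
     WHERE seq_name = p_name
    RETURNING last_value INTO v;
    RETURN v;
END
$$;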

Is it possible to use a stable function in an index in Postgres?

I've been working on a project at work and have come to the realization that I must invoke a function in several of the queries' WHERE clauses. The performance isn't terrible exactly, but I would love to improve it. So I looked at the docs for indexes which mentioned that:
An index field can be an expression computed from the values of one or more columns of the table row.
Awesome. So I tried creating an index:
CREATE INDEX idx_foo ON foo_table (stable_function(foo_column));
And received an error:
ERROR: functions in index expression must be marked IMMUTABLE
So then I read about Function Volatility Categories which had this to say about stable volatility:
In particular, it is safe to use an expression containing such a function in an index scan condition.
Based on the phrasing "index scan condition" I'm guessing it doesn't mean an actual index. So what does it mean? Is it possible to utilize a stable function in an index? Or do we have to go all the way and ensure this would work as an immutable function?
We're using Postgres v9.0.1.
An "index scan condition" is a search condition, and can use a volatile function, which will be called for each row processed. An index definition can only use a function if it is immutable -- that is, that function will always return the same value when called with any given set of arguments, and has no user-visible side effects. If you think about it a little, you should be able to see what kind of trouble you could get into if the function might return a different value than what it did when the index entry was created.
You might be tempted to lie to the database and declare a function as immutable which isn't really; but if you do, the database will probably do surprising things that you would rather it didn't.
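To illustrate the distinction with the example from the question (normalize_code and created_at are made-up names): a function can back an expression index only if it is declared IMMUTABLE, while a STABLE function such as now() is still fine on the other side of a comparison, where it is evaluated once per scan and its result used as the index search key:
-- immutable, so it may be used in an index definition
CREATE FUNCTION normalize_code(text) RETURNS text
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT lower(trim($1)) $$;

CREATE INDEX idx_foo_norm ON foo_table (normalize_code(foo_column));

-- now() is only stable, so it cannot appear in an index definition,
-- but this query can still use a plain index on created_at:
SELECT * FROM foo_table WHERE created_at >= now() - interval '1 day';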
9.0.1 has bugs for which fixes are available. Please upgrade to 9.0.somethingrecent.
http://www.postgresql.org/support/versioning/

Execute statements for every record in a table

I have a temporary table (or, say, a function which returns a table of values).
I want to execute some statements for each record in the table.
Can this be done without using cursors?
I'm not opposed to cursors, but would like a more elegant syntax/way of doing it.
Something like this randomly made-up syntax:
for (select A,B from #temp) exec DoSomething A,B
I'm using SQL Server 2005.
I don't think what you want to do is that easy.
What I have found is that you can create a scalar function taking the arguments A and B and then, from within the function, execute an Extended Stored Procedure. This might achieve what you want to do, but it seems it might also make the code even more complex.
I think for readability and maintainability, you should stick to the CURSOR implementation.
I would look into changing the stored proc so that it can work against a set of data rather than a single row input.
Would CROSS/OUTER APPLY do what you want, if you need RBAR processing?
It's elegant, but it depends on what processing you need to do.
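If the processing can be expressed as an inline table-valued function rather than a stored procedure, a rough sketch of the CROSS APPLY approach would look like this (names and the function body are placeholders):
CREATE FUNCTION dbo.DoSomethingFn (@A int, @B int)
RETURNS TABLE
AS RETURN (SELECT @A + @B AS Result);  -- stand-in for the real per-row work
GO

SELECT t.A, t.B, r.Result
FROM #temp AS t
CROSS APPLY dbo.DoSomethingFn(t.A, t.B) AS r;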