Avoid to nest aggregation functions in PostgreSQL 8.3.4 - postgresql

Assuming that my subquery yields a number of rows with the columns (x,y), I would like to calculate the value avg(abs(x-mean)/y). where mean effectively is avg(x).
select avg(abs(x-avg(x))/y) as incline from subquery fails because I cannot nest aggregation functions. Nor can I think of a way to calculate the mean in a subquery while keeping the original result set. An avgdev function as it exists in other dialects would not exactly help me, so here I am stuck. Probably just due to lack of sql knowledge - calculating the value from the result set in postprocessing is easy.
Which SQL construct could help me?
Edit: Server version is 8.3.4. No window functions with WITH or OVER available here.

Not sure I understand you correctly, but you might be looking for something like this:
SELECT avg(x - mean/y)
FROM (
SELECT x,
y,
avg(x) as mean over(partition by your_grouping_column)
FROM your_table
) t
If you do not need to group your results to get the correct avg(x) then simply leave out the "partition by" using an empty over: over()

if your data sets are not too large you could accumulate them into an array and then return the incline from a function:
create type typ as (x numeric, y numeric);
create aggregate array_accum( sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}' );
create or replace function unnest(anyarray) returns setof anyelement
language sql immutable strict as $$
select $1[i] from generate_series(array_lower($1,1), array_upper($1,1)) i;$$;
create function get_incline(typ[]) returns numeric
language sql immutable strict as $$
select avg(abs(x-(select avg(x) from unnest($1)))/y) from unnest($1);$$;
select get_incline((select array_accum((x,y)::typ) from subquery));
sample view for testing:
create view subquery as
select generate_series(1,5) as x, generate_series(1,6) as y;

One option I found is to use a temporary table:
begin;
create temporary table sub on commit drop as (...subquery code...);
select avg(abs(x-mean)/y) as incline from (SELECT x, y, (SELECT avg(x) FROM sub) AS mean FROM sub) as sub2;
commit;
But is that overkill?

Related

PostgreSQL function returning a table without specifying each field in a table

I'm moving from SQL server to Postgresql. In SQL Server I can define table-based function as an alias for a query. Example:
Create Function Example(#limit int) As
Returns Table As Return
Select t1.*, t2.*, price * 0.5 discounted
From t1
Inner Join t2 on t1.id = t2.id
Where t1.price < #limit;
GO
Select * From Example(100);
It gives me a way to return all fields and I don't need to specify types for them. I can easily change field types of a table, add new fields, delete fields, and then re-create a function.
So the question is how to do such thing in Postgresql? I found out that Postgresql requires to explicitly specify all field names and types when writing a function. May be I need something else, not a function?
Postgres implicitly creates a type for each table. So, if you are just selecting from one table, it's easiest to use that type in your function definition:
CREATE TABLE test (id int, value int);
CREATE FUNCTION mytest(p_id int)
RETURNS test AS
$$
SELECT * FROM test WHERE id = p_id;
$$ LANGUAGE SQL;
You are then free to add, remove, or alter columns in test and your function will still return the correct columns.
EDIT:
The question was updated to use the function parameter in the limit clause and to use a more complex query. I would still recommend a similar approach, but you could use a view as #Bergi recommends:
CREATE TABLE test1 (a int, b int);
CREATE TABLE test2 (a int, c int);
CREATE VIEW test_view as SELECT a, b, c from test1 JOIN test2 USING (a);
CREATE FUNCTION mytest(p_limit int)
RETURNS SETOF test_view AS
$$
SELECT * FROM test_view FETCH FIRST p_limit ROWS ONLY
$$ LANGUAGE SQL;
You aren't going to find an exact replacement for the behavior in SQL Server, it's just not how Postgres works.
If you change the function frequently, I'd suggest to use view instead of a function. Because every time you re-create a function, it gets compiled and it's a bit expensive, otherwise you're right - Postgres requires field name and type in functions:
CREATE OR REPLACE VIEW example AS
SELECT t1.*, t2.*, price * 0.5 discounted
FROM t1 INNER JOIN t2 ON t1.id = t2.id;
then
SELECT * FROM example WHERE price < 100;
You can do something like this
CREATE OR REPLACE FUNCTION Example(_id int)
RETURNS RECORD AS
$$
DECLARE
data_record RECORD;
BEGIN
SELECT * INTO data_record FROM SomeTable WHERE id = _id;
RETURN data_record;
END;
$$
LANGUAGE 'plpgsql';

Postgresql 8.4 a variable hold set of records returned from another function

I am using postgresql 8.4 in backend
I have a postgres function say A() it can return a set of records (3 columns) like:
<A_id>::int,<A_ts_1>::timestamp,<A_ts_2>::timestamp
function A define like this(for example):
CREATE OR REPLACE FUNCTION A()
RETURNS SETOF record AS
$$
DECLARE
BEGIN
RETURN QUERY SELECT DISTINCT ON (A.id) A.id, A.ts_1, A.ts_2 FROM tablea;
END;
$$ LANGUAGE plpgsql;
SQL
function A has been called in another function B. In function B I need a variable to hold what returned from A() then do some query for example:
<variable> = select * from A();
a_id_array = ARRAY(select A_id from <variable>);
a_filtered_array = ARRAY(select A_id from <variable> where A_ts_1 ><a_timestamp> and A_ts_2 < <a_timestamp>);
So My question is what variable I should define to hold the set of records returned from A().
I tried temp table which really not good for multi-session env, it blocks data insertion. postgresql create temp table could block data insertion?
I checked doc for views seems not meet my requirements, however I may wrong so if any of you could give me an idea on how to use view in this case and use view will block data insertion as well?
Thank all!
P.S.
I think the worse case is in function B() I call function A() twice for example:
a_id_array = ARRAY(select A_id from A());
a_filtered_array = ARRAY(select A_id from A() where A_ts_1 ><a_timestamp> and A_ts_2 < <a_timestamp>);
Then my question would slightly change, can I achive this case just using one function call to A()?
PostgreSQL doesn't (yet, as of postgres 10) have table-valued variables backed by a tuplestore. So your best options are:
Return a REFCURSOR and use it from the other function. Can be clumsy to work with as you cannot reuse the resultset easily or FETCH in a subquery. It's not always easy to generate a cursor resultset either, depending on how you're creating the results.
Use temp tables with generated names so they don't collide. Lots of dynamic SQL involved here (EXECUTE format(...)) but it works.
Avoid trying to pass result sets between functions
After researching, found a way to replace using temp table and query returned set of record which is using WITH query.
SELECT c.r_ids, c.a_r_ids into a_id_array, a_filtered_array FROM(
WITH returned_r AS (SELECT * FROM a())
SELECT * from (
SELECT ARRAY( SELECT A_id from returned_r ) as r_ids ) as a
CROSS JOIN (
SELECT ARRAY(SELECT A_id FROM returned_r WHERE A_ts_1 is NOT NULL AND A_ts_2 IS NULL) as a_r_ids
) as b
) as c;

Merge hstore data from multiple rows

Imagine I have a table with this definition:
CREATE TABLE test (
values HSTORE NOT NULL
);
Imagine I insert a few records and end up with the following:
values
-----------------------------
"a"=>"string1","b"=>"string2"
"b"=>"string2","c"=>"string3"
Is there any way I can make an aggregate query that will give me a new hstore with the merged keys (and values) for all rows.
Pseudo-query:
SELECT hstore_sum(values) AS value_sum FROM test;
Desired result:
value_sum
--------------------------------------------
"a"=>"string1","b"=>"string2","c"=>"string3"
I am aware of potential conflicts with different values for each key, but in my case the order / priority of which value is picked is not important (it does not even have to be deterministic, as they will be the same for the same key).
Is this possible out of the box or do you have to use some specific homemade SQL functions or other to do it?
You can do a lot of things, f.ex:
My first thought was to use the each() function, and aggregate keys and values separately, like:
SELECT hstore(array_agg(key), array_agg(value))
FROM test,
LATERAL each(hs);
But this performs the worst.
You can use the hstore_to_array() function too, to build a key-value altering array, like (#JakubKania):
SELECT hstore(array_agg(altering_pairs))
FROM test,
LATERAL unnest(hstore_to_array(hs)) altering_pairs;
But this isn't perfect yet.
You can rely the hstore values' representation, and build up a string, which will contain all your pairs:
SELECT hstore(string_agg(nullif(hs::text, ''), ','))
FROM test;
This is quite fast. However, if you want, you can use a custom aggregate function (which can use the built-in hstore concatenation):
CREATE AGGREGATE hstore_sum (hstore) (
SFUNC = hs_concat(hstore, hstore),
STYPE = hstore
);
-- i used the internal function (hs_concat) for the concat (||) operator,
-- if you do not want to rely on this function,
-- you could easily write an equivalent in a custom SQL function
SELECT hstore_sum(hs)
FROM test;
SQLFiddle
There is no in built function for that but hstore offers a few functions that allow to transform it to something else, for example an array. So we cast it to array, merge the arrays and create hstore from final array:
SELECT hstore(array_agg(x)) FROM
(SELECT unnest(hstore_to_array(hs)) AS x
FROM test)
as q;
http://sqlfiddle.com/#!15/cb11a/1
P.S. Some other combination (like going with JSON) might be more efficient.
This is what I wrote which works in production now. I avoid excessive conversion between types e.g. hstore and array. I also don't use hs_concat as sfunc directly as it will pruduce NULL if any of the hashes it is aggregating is NULL.
CREATE OR REPLACE FUNCTION public.agg_hstore_sum_sfunc(state hstore, val hstore)
RETURNS hstore AS $$
BEGIN
IF val IS NOT NULL THEN
IF state IS NULL THEN
state := val;
ELSE
state := state || val;
END IF;
END IF;
RETURN state;
END;
$$ LANGUAGE 'plpgsql';
CREATE AGGREGATE public.sum(hstore) (
SFUNC = public.agg_hstore_sum_sfunc,
STYPE = hstore
);

eliminate duplicate array values in postgres

I have an array of type bigint, how can I remove the duplicate values in that array?
Ex: array[1234, 5343, 6353, 1234, 1234]
I should get array[1234, 5343, 6353, ...]
I tested out the example SELECT uniq(sort('{1,2,3,2,1}'::int[])) in the postgres manual but it is not working.
I faced the same. But an array in my case is created via array_agg function. And fortunately it allows to aggregate DISTINCT values, like:
array_agg(DISTINCT value)
This works for me.
The sort(int[]) and uniq(int[]) functions are provided by the intarray contrib module.
To enable its use, you must install the module.
If you don't want to use the intarray contrib module, or if you have to remove duplicates from arrays of different type, you have two other ways.
If you have at least PostgreSQL 8.4 you could take advantage of unnest(anyarray) function
SELECT ARRAY(SELECT DISTINCT UNNEST('{1,2,3,2,1}'::int[]) ORDER BY 1);
?column?
----------
{1,2,3}
(1 row)
Alternatively you could create your own function to do this
CREATE OR REPLACE FUNCTION array_sort_unique (ANYARRAY) RETURNS ANYARRAY
LANGUAGE SQL
AS $body$
SELECT ARRAY(
SELECT DISTINCT $1[s.i]
FROM generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY 1
);
$body$;
Here is a sample invocation:
SELECT array_sort_unique('{1,2,3,2,1}'::int[]);
array_sort_unique
-------------------
{1,2,3}
(1 row)
... Where the statandard libraries (?) for this kind of array_X utility??
Try to search... See some but no standard:
postgres.cz/wiki/Array_based_functions: good reference!
JDBurnZ/postgresql-anyarray, good initiative but needs some collaboration to enhance.
wiki.postgresql.org/Snippets, frustrated initiative, but "offcial wiki", needs some collaboration to enhance.
MADlib: good! .... but it is an elephant, not an "pure SQL snippets lib".
Simplest and faster array_distinct() snippet-lib function
Here the simplest and perhaps faster implementation for array_unique() or array_distinct():
CREATE FUNCTION array_distinct(anyarray) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x) FROM unnest($1) t(x);
$f$ LANGUAGE SQL IMMUTABLE;
NOTE: it works as expected with any datatype, except with array of arrays,
SELECT array_distinct( array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99] ),
array_distinct( array['3','3','hello','hello','bye'] ),
array_distinct( array[array[3,3],array[3,3],array[3,3],array[5,6]] );
-- "{1,2,3,4,6,8,99}", "{3,bye,hello}", "{3,5,6}"
the "side effect" is to explode all arrays in a set of elements.
PS: with JSONB arrays works fine,
SELECT array_distinct( array['[3,3]'::JSONB, '[3,3]'::JSONB, '[5,6]'::JSONB] );
-- "{"[3, 3]","[5, 6]"}"
Edit: more complex but useful, a "drop nulls" parameter
CREATE FUNCTION array_distinct(
anyarray, -- input array
boolean DEFAULT false -- flag to ignore nulls
) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x)
FROM unnest($1) t(x)
WHERE CASE WHEN $2 THEN x IS NOT NULL ELSE true END;
$f$ LANGUAGE SQL IMMUTABLE;
Using DISTINCT implicitly sorts the array. If the relative order of the array elements needs to be preserved while removing duplicates, the function can be designed like the following: (should work from 9.4 onwards)
CREATE OR REPLACE FUNCTION array_uniq_stable(anyarray) RETURNS anyarray AS
$body$
SELECT
array_agg(distinct_value ORDER BY first_index)
FROM
(SELECT
value AS distinct_value,
min(index) AS first_index
FROM
unnest($1) WITH ORDINALITY AS input(value, index)
GROUP BY
value
) AS unique_input
;
$body$
LANGUAGE 'sql' IMMUTABLE STRICT;
I have assembled a set of stored procedures (functions) to combat PostgreSQL's lack of array handling coined anyarray. These functions are designed to work across any array data-type, not just integers as intarray does: https://www.github.com/JDBurnZ/anyarray
In your case, all you'd really need is anyarray_uniq.sql. Copy & paste the contents of that file into a PostgreSQL query and execute it to add the function. If you need array sorting as well, also add anyarray_sort.sql.
From there, you can peform a simple query as follows:
SELECT ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234])
Returns something similar to: ARRAY[1234, 6353, 5343]
Or if you require sorting:
SELECT ANYARRAY_SORT(ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234]))
Return exactly: ARRAY[1234, 5343, 6353]
Here's the "inline" way:
SELECT 1 AS anycolumn, (
SELECT array_agg(c1)
FROM (
SELECT DISTINCT c1
FROM (
SELECT unnest(ARRAY[1234,5343,6353,1234,1234]) AS c1
) AS t1
) AS t2
) AS the_array;
First we create a set from array, then we select only distinct entries, and then aggregate it back into array.
In a single query i did this:
SELECT (select array_agg(distinct val) from ( select unnest(:array_column) as val ) as u ) FROM :your_table;
For people like me who still have to deal with postgres 8.2, this recursive function can eliminate duplicates without altering the sorting of the array
CREATE OR REPLACE FUNCTION my_array_uniq(bigint[])
RETURNS bigint[] AS
$BODY$
DECLARE
n integer;
BEGIN
-- number of elements in the array
n = replace(split_part(array_dims($1),':',2),']','')::int;
IF n > 1 THEN
-- test if the last item belongs to the rest of the array
IF ($1)[1:n-1] #> ($1)[n:n] THEN
-- returns the result of the same function on the rest of the array
return my_array_uniq($1[1:n-1]);
ELSE
-- returns the result of the same function on the rest of the array plus the last element
return my_array_uniq($1[1:n-1]) || $1[n:n];
END IF;
ELSE
-- if array has only one item, returns the array
return $1;
END IF;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
for exemple :
select my_array_uniq(array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99]);
will give
{3,8,2,6,4,1,99}

Sorting array elements

I want to write a stored procedure that gets an array as input parameter and sort that array and return the sorted array.
The best way to sort an array of integers is without a doubt to use the intarray extension, which will do it much, much, much faster than any SQL formulation:
CREATE EXTENSION intarray;
SELECT sort( ARRAY[4,3,2,1] );
A function that works for any array type is:
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(SELECT unnest($1) ORDER BY 1)
$$;
(I've replaced my version with Pavel's slightly faster one after discussion elsewhere).
In PostrgreSQL 8.4 and up you can use:
select array_agg(x) from (select unnest(ARRAY[1,5,3,7,2]) AS x order by x) as _;
But it will not be very fast.
In older Postgres you can implement unnest like this
CREATE OR REPLACE FUNCTION unnest(anyarray)
RETURNS SETOF anyelement AS
$BODY$
SELECT $1[i] FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$BODY$
LANGUAGE 'sql' IMMUTABLE
And array_agg like this:
CREATE AGGREGATE array_agg (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);
But it will be even slower.
You can also implement any sorting algorithm in pl/pgsql or any other language you can plug in to postgres.
Just use the function unnest():
SELECT
unnest(ARRAY[1,2]) AS x
ORDER BY
x DESC;
See array functions in the Pg docs.
This worked for me from http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I#General_array_sort
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
Please see Craig's answer since he is far more more knowledgable on Postgres and has a better answer. Also if possible vote to delete my answer.
Very nice exhibition of PostgreSQL's features is general procedure for sorting by David Fetter.
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
If you're looking for a solution which will work across any data-type, I'd recommend taking the approach laid out at YouLikeProgramming.com.
Essentially, you can create a stored procedure (code below) which performs the sorting for you, and all you need to do is pass your array to that procedure for it to be sorted appropriately.
I have also included an implementation which does not require the use of a stored procedure, if you're looking for your query to be a little more transportable.
Creating the stored procedure
DROP FUNCTION IF EXISTS array_sort(anyarray);
CREATE FUNCTION
array_sort(
array_vals_to_sort anyarray
)
RETURNS TABLE (
sorted_array anyarray
)
AS $BODY$
BEGIN
RETURN QUERY SELECT
ARRAY_AGG(val) AS sorted_array
FROM
(
SELECT
UNNEST(array_vals_to_sort) AS val
ORDER BY
val
) AS sorted_vals
;
END;
$BODY$
LANGUAGE plpgsql;
Sorting array values (works with any array data-type)
-- The following will return: {1,2,3,4}
SELECT ARRAY_SORT(ARRAY[4,3,2,1]);
-- The following will return: {in,is,it,on,up}
SELECT ARRAY_SORT(ARRAY['up','on','it','is','in']);
Sorting array values without a stored procedure
In the following query, simply replace ARRAY[4,3,2,1] with your array or query which returns an array:
WITH
sorted_vals AS (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
)
SELECT
ARRAY_AGG(val) AS sorted_array
FROM
sorted_vals
... or ...
SELECT
ARRAY_AGG(vals.val) AS sorted_arr
FROM (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
) AS vals
I'm surprised no-one has mentioned the containment operators:
select array[1,2,3] <# array[2,1,3] and array[1,2,3] #> array[2,1,3];
?column?
══════════
t
(1 row)
Notice that this requires that all elements of the arrays must be unique.
(If a contains b and b contains a, they must be the same if all elements are unique)