Merge hstore data from multiple rows - postgresql

Imagine I have a table with this definition:
CREATE TABLE test (
values HSTORE NOT NULL
);
Imagine I insert a few records and end up with the following:
values
-----------------------------
"a"=>"string1","b"=>"string2"
"b"=>"string2","c"=>"string3"
Is there any way I can make an aggregate query that will give me a new hstore with the merged keys (and values) for all rows.
Pseudo-query:
SELECT hstore_sum(values) AS value_sum FROM test;
Desired result:
value_sum
--------------------------------------------
"a"=>"string1","b"=>"string2","c"=>"string3"
I am aware of potential conflicts with different values for each key, but in my case the order / priority of which value is picked is not important (it does not even have to be deterministic, as they will be the same for the same key).
Is this possible out of the box or do you have to use some specific homemade SQL functions or other to do it?

You can do a lot of things, f.ex:
My first thought was to use the each() function, and aggregate keys and values separately, like:
SELECT hstore(array_agg(key), array_agg(value))
FROM test,
LATERAL each(hs);
But this performs the worst.
You can use the hstore_to_array() function too, to build a key-value altering array, like (#JakubKania):
SELECT hstore(array_agg(altering_pairs))
FROM test,
LATERAL unnest(hstore_to_array(hs)) altering_pairs;
But this isn't perfect yet.
You can rely the hstore values' representation, and build up a string, which will contain all your pairs:
SELECT hstore(string_agg(nullif(hs::text, ''), ','))
FROM test;
This is quite fast. However, if you want, you can use a custom aggregate function (which can use the built-in hstore concatenation):
CREATE AGGREGATE hstore_sum (hstore) (
SFUNC = hs_concat(hstore, hstore),
STYPE = hstore
);
-- i used the internal function (hs_concat) for the concat (||) operator,
-- if you do not want to rely on this function,
-- you could easily write an equivalent in a custom SQL function
SELECT hstore_sum(hs)
FROM test;
SQLFiddle

There is no in built function for that but hstore offers a few functions that allow to transform it to something else, for example an array. So we cast it to array, merge the arrays and create hstore from final array:
SELECT hstore(array_agg(x)) FROM
(SELECT unnest(hstore_to_array(hs)) AS x
FROM test)
as q;
http://sqlfiddle.com/#!15/cb11a/1
P.S. Some other combination (like going with JSON) might be more efficient.

This is what I wrote which works in production now. I avoid excessive conversion between types e.g. hstore and array. I also don't use hs_concat as sfunc directly as it will pruduce NULL if any of the hashes it is aggregating is NULL.
CREATE OR REPLACE FUNCTION public.agg_hstore_sum_sfunc(state hstore, val hstore)
RETURNS hstore AS $$
BEGIN
IF val IS NOT NULL THEN
IF state IS NULL THEN
state := val;
ELSE
state := state || val;
END IF;
END IF;
RETURN state;
END;
$$ LANGUAGE 'plpgsql';
CREATE AGGREGATE public.sum(hstore) (
SFUNC = public.agg_hstore_sum_sfunc,
STYPE = hstore
);

Related

Pg sql use ilike operator to search text in array

I have query like
_search_text := 'ind'
select * from table where (_search_text ILIKE ANY (addresses))
The st.addresses have value like [india,us,uk,pak,bang]
It should return each item rows where any item of column addresses contains the string _search_text,
Currently it returns only if give full india in _search_text.
What should I make the change
I was also try to thinkin to use unnest, but since it wil be a sub clause of a very long where cluase... so avoid that.
Thanks
I afraid so it is not possible - LIKE, ILIKE supports a array only on right side, and there is searching pattern. Not string. You can write own SQL function:
CREATE OR REPLACE FUNCTION public.array_like(text[], text)
RETURNS boolean
LANGUAGE sql
IMMUTABLE STRICT
AS $function$
select bool_or(v like $2) from unnest($1) v
$function$
and the usage can looks like:
WHERE array_like(addresses, _search_text)
So the readability of this query can be well. But I afraid about performance. There cannot be used any index on this expression. It is result of little bit bad design (the data are not normalized).
ilike ignores cases (difference in upper or lower cases), it doesn't search for string containing your value.
In your case you can use:
_search_text := '%ind%'
select * from table where (_search_text ILIKE ANY (addresses))
ANY will not work here because the arguments are in the wrong order for ILIKE.
You can define a custom operator to make this work. But this will most likely suffer from poor performance:
create table addresses(id integer primary key, states varchar[]);
insert into addresses values (1,'{"belgium","france","united states"}'),
(2,'{"belgium","ireland","canada"}');
CREATE OR REPLACE FUNCTION pg_catalog."rlike"(
leftarg text,
rightarg text)
RETURNS boolean
LANGUAGE 'sql'
COST 100
IMMUTABLE STRICT PARALLEL SAFE SUPPORT pg_catalog.textlike_support
AS $BODY$
SELECT pg_catalog."like"(rightarg,leftarg);
$BODY$;
ALTER FUNCTION pg_catalog."rlike"(text, text)
OWNER TO postgres;
CREATE OPERATOR ~~~ (
leftarg = text,
rightarg = text,
procedure = rlike
);
select * from addresses where 'bel%'::text ~~~ ANY (states);
It should be possible to define this function in C to make it faster.

PostgreSQL custom aggregate: more efficient option than array concatenation

Consider a custom aggregate intended to take the set union of a bunch of arrays:
CREATE FUNCTION array_union_step (s ANYARRAY, n ANYARRAY) RETURNS ANYARRAY
AS $$ SELECT s || n; $$
LANGUAGE SQL IMMUTABLE LEAKPROOF PARALLEL SAFE;
CREATE FUNCTION array_union_final (s ANYARRAY) RETURNS ANYARRAY
AS $$
SELECT array_agg(i ORDER BY i) FROM (
SELECT DISTINCT UNNEST(x) AS i FROM (VALUES(s)) AS v(x)
) AS w WHERE i IS NOT NULL;
$$
LANGUAGE SQL IMMUTABLE LEAKPROOF PARALLEL SAFE;
CREATE AGGREGATE array_union (ANYARRAY) (
SFUNC = array_union_step,
STYPE = ANYARRAY,
FINALFUNC = array_union_final,
INITCOND = '{}',
PARALLEL = SAFE
);
As I understand it, array concatenation in PostgreSQL copies all the elements of both inputs into a new array, so this is quadratic in the total number of elements (before deduplication). Is there a more efficient alternative without writing extension code in C? (Specifically, using either LANGUAGE SQL or LANGUAGE plpgsql.) For instance, maybe it's possible for the step function to take and return a set of rows somehow?
An example of the kind of data this needs to be able to process:
create temp table demo (tag int, values text[]);
insert into demo values
(1, '{"a", "b"}'),
(2, '{"c", "d"}'),
(1, '{"a"}'),
(2, '{"c", "e", "f"}');
select tag, array_union(values) from demo group by tag;
tag | array_union
-----+-------------
2 | {c,d,e,f}
1 | {a,b}
Note in particular that the built-in array_agg cannot be used with this data, because the arrays are not all the same length:
select tag, array_agg(values) from demo group by tag;
ERROR: cannot accumulate arrays of different dimensionality
Array concatenation is expensive. That's why build-in array_agg() uses the internal array builder structure. Unfortunately, you cannot use this API on the SQL level.
I don't think using temp tables for custom aggregation is correct. Creating and dropping tables is expensive (temp table too) or very expensive (with high frequency), INSERT and SELECT is expensive, too. (Tables have a much more complex format than arrays.) If you need really fast aggregation, then write a C functions and use the array builder.
If you cannot use your own C extension, then use the built-in array_agg() function with deduplication and sort on already aggregated data.
CREATE OR REPLACE FUNCTION array_distinct(anyarray)
RETURNS anyarray AS $$
BEGIN
RETURN ARRAY(SELECT DISTINCT v FROM unnest($1) u(v) WHERE v IS NOT NULL ORDER BY v);
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
Call:
SELECT ..., array_distinct(array_agg(somecolumn)) FROM tab;

PostgreSQL Functions - Potentially NULL argument Comparison

I have a simplified function defined like so (many columns and parameters removed):
CREATE OR REPLACE FUNCTION write_no_duplicates(
IN p_foreign_key BIGINT
, IN p_write_value TEXT
, OUT output my_table
)
LANGUAGE plpgsql AS $$
BEGIN
SELECT *
INTO STRICT output
FROM my_table
WHERE my_table.foreign_key = p_foreign_key
AND (my_table.write_value = p_write_value
OR (my_table.write_value IS NULL AND p_write_value IS NULL));
EXCEPTION WHEN NO_DATA_FOUND THEN
INSERT INTO my_table (foreign_key, write_value)
VALUES (p_foreign_key, p_write_value)
RETURNING *
INTO STRICT output;
END $$;
The purpose of the function is to write a new row, but prevent duplicates. We aren't using a unique constraint due to the size & width of the table, and we don't mind if writes are slower.
My question has two parts:
Is there a better way to perform the null comparison?
Is there a better way of ensuring uniqueness?
Thank you for your time!

eliminate duplicate array values in postgres

I have an array of type bigint, how can I remove the duplicate values in that array?
Ex: array[1234, 5343, 6353, 1234, 1234]
I should get array[1234, 5343, 6353, ...]
I tested out the example SELECT uniq(sort('{1,2,3,2,1}'::int[])) in the postgres manual but it is not working.
I faced the same. But an array in my case is created via array_agg function. And fortunately it allows to aggregate DISTINCT values, like:
array_agg(DISTINCT value)
This works for me.
The sort(int[]) and uniq(int[]) functions are provided by the intarray contrib module.
To enable its use, you must install the module.
If you don't want to use the intarray contrib module, or if you have to remove duplicates from arrays of different type, you have two other ways.
If you have at least PostgreSQL 8.4 you could take advantage of unnest(anyarray) function
SELECT ARRAY(SELECT DISTINCT UNNEST('{1,2,3,2,1}'::int[]) ORDER BY 1);
?column?
----------
{1,2,3}
(1 row)
Alternatively you could create your own function to do this
CREATE OR REPLACE FUNCTION array_sort_unique (ANYARRAY) RETURNS ANYARRAY
LANGUAGE SQL
AS $body$
SELECT ARRAY(
SELECT DISTINCT $1[s.i]
FROM generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY 1
);
$body$;
Here is a sample invocation:
SELECT array_sort_unique('{1,2,3,2,1}'::int[]);
array_sort_unique
-------------------
{1,2,3}
(1 row)
... Where the statandard libraries (?) for this kind of array_X utility??
Try to search... See some but no standard:
postgres.cz/wiki/Array_based_functions: good reference!
JDBurnZ/postgresql-anyarray, good initiative but needs some collaboration to enhance.
wiki.postgresql.org/Snippets, frustrated initiative, but "offcial wiki", needs some collaboration to enhance.
MADlib: good! .... but it is an elephant, not an "pure SQL snippets lib".
Simplest and faster array_distinct() snippet-lib function
Here the simplest and perhaps faster implementation for array_unique() or array_distinct():
CREATE FUNCTION array_distinct(anyarray) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x) FROM unnest($1) t(x);
$f$ LANGUAGE SQL IMMUTABLE;
NOTE: it works as expected with any datatype, except with array of arrays,
SELECT array_distinct( array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99] ),
array_distinct( array['3','3','hello','hello','bye'] ),
array_distinct( array[array[3,3],array[3,3],array[3,3],array[5,6]] );
-- "{1,2,3,4,6,8,99}", "{3,bye,hello}", "{3,5,6}"
the "side effect" is to explode all arrays in a set of elements.
PS: with JSONB arrays works fine,
SELECT array_distinct( array['[3,3]'::JSONB, '[3,3]'::JSONB, '[5,6]'::JSONB] );
-- "{"[3, 3]","[5, 6]"}"
Edit: more complex but useful, a "drop nulls" parameter
CREATE FUNCTION array_distinct(
anyarray, -- input array
boolean DEFAULT false -- flag to ignore nulls
) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x)
FROM unnest($1) t(x)
WHERE CASE WHEN $2 THEN x IS NOT NULL ELSE true END;
$f$ LANGUAGE SQL IMMUTABLE;
Using DISTINCT implicitly sorts the array. If the relative order of the array elements needs to be preserved while removing duplicates, the function can be designed like the following: (should work from 9.4 onwards)
CREATE OR REPLACE FUNCTION array_uniq_stable(anyarray) RETURNS anyarray AS
$body$
SELECT
array_agg(distinct_value ORDER BY first_index)
FROM
(SELECT
value AS distinct_value,
min(index) AS first_index
FROM
unnest($1) WITH ORDINALITY AS input(value, index)
GROUP BY
value
) AS unique_input
;
$body$
LANGUAGE 'sql' IMMUTABLE STRICT;
I have assembled a set of stored procedures (functions) to combat PostgreSQL's lack of array handling coined anyarray. These functions are designed to work across any array data-type, not just integers as intarray does: https://www.github.com/JDBurnZ/anyarray
In your case, all you'd really need is anyarray_uniq.sql. Copy & paste the contents of that file into a PostgreSQL query and execute it to add the function. If you need array sorting as well, also add anyarray_sort.sql.
From there, you can peform a simple query as follows:
SELECT ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234])
Returns something similar to: ARRAY[1234, 6353, 5343]
Or if you require sorting:
SELECT ANYARRAY_SORT(ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234]))
Return exactly: ARRAY[1234, 5343, 6353]
Here's the "inline" way:
SELECT 1 AS anycolumn, (
SELECT array_agg(c1)
FROM (
SELECT DISTINCT c1
FROM (
SELECT unnest(ARRAY[1234,5343,6353,1234,1234]) AS c1
) AS t1
) AS t2
) AS the_array;
First we create a set from array, then we select only distinct entries, and then aggregate it back into array.
In a single query i did this:
SELECT (select array_agg(distinct val) from ( select unnest(:array_column) as val ) as u ) FROM :your_table;
For people like me who still have to deal with postgres 8.2, this recursive function can eliminate duplicates without altering the sorting of the array
CREATE OR REPLACE FUNCTION my_array_uniq(bigint[])
RETURNS bigint[] AS
$BODY$
DECLARE
n integer;
BEGIN
-- number of elements in the array
n = replace(split_part(array_dims($1),':',2),']','')::int;
IF n > 1 THEN
-- test if the last item belongs to the rest of the array
IF ($1)[1:n-1] #> ($1)[n:n] THEN
-- returns the result of the same function on the rest of the array
return my_array_uniq($1[1:n-1]);
ELSE
-- returns the result of the same function on the rest of the array plus the last element
return my_array_uniq($1[1:n-1]) || $1[n:n];
END IF;
ELSE
-- if array has only one item, returns the array
return $1;
END IF;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
for exemple :
select my_array_uniq(array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99]);
will give
{3,8,2,6,4,1,99}

Sending one record from cursor to another function Postgres

FYI: I am completely new to using cursors...
So I have one function that is a cursor:
CREATE FUNCTION get_all_product_promos(refcursor, cursor_object_id integer) RETURNS refcursor AS '
BEGIN
OPEN $1 FOR SELECT *
FROM promos prom1
JOIN promo_objects ON (prom1.promo_id = promo_objects.promotion_id)
WHERE prom1.active = true AND now() BETWEEN prom1.start_date AND prom1.end_date
AND promo_objects.object_id = cursor_object_id
UNION
SELECT prom2.promo_id
FROM promos prom2
JOIN promo_buy_objects ON (prom2.promo_id =
promo_buy_objects.promo_id)
LEFT JOIN promo_get_objects ON prom2.promo_id = promo_get_objects.promo_id
WHERE (prom2.buy_quantity IS NOT NULL OR prom2.buy_quantity > 0) AND
prom2.active = true AND now() BETWEEN prom2.start_date AND
prom2.end_date AND promo_buy_objects.object_id = cursor_object_id;
RETURN $1;
END;
' LANGUAGE plpgsql;
SO then in another function I call it and need to process it:
...
--Get the promotions from the cursor
SELECT get_all_product_promos('promo_cursor', this_object_id)
updated := FALSE;
IF FOUND THEN
--Then loop through your results
LOOP
FETCH promo_cursor into this_promotion
--Preform comparison logic -this is necessary as this logic is used in other contexts from other functions
SELECT * INTO best_promo_results FROM get_best_product_promos(this_promotion, this_object_id, get_free_promotion, get_free_promotion_value, current_promotion_value, current_promotion);
...
SO the idea here is to select from the cursor, loop using fetch (next is assumed correct?) and put the record fetched into this_promotion. Then send the record in this_promotion to another function. I can't figure out what to declare the type of this_promotion in get_best_product_promos. Here is what I have:
CREATE OR REPLACE FUNCTION get_best_product_promos(this_promotion record, this_object_id integer, get_free_promotion integer, get_free_promotion_value numeric(10,2), current_promotion_value numeric(10,2), current_promotion integer)
RETURNS...
It tells me: ERROR: plpgsql functions cannot take type record
OK first I tried:
CREATE OR REPLACE FUNCTION get_best_product_promos(this_promotion get_all_product_promos, this_object_id integer, get_free_promotion integer, get_free_promotion_value numeric(10,2), current_promotion_value numeric(10,2), current_promotion integer)
RETURNS...
Because I saw some syntax in the Postgres docs showed a function being created w/ a input parameter that had a type 'tablename' this works, but it has to be a tablename not a function :( I know I am so close, I was told to use cursors to pass records around. So I studied up. Please help.
One possibility would be to define the query you have in get_all_product_promos as a all_product_promos view instead. Then you would automatically have the "all_product_promos%rowtype" type to pass between functions.
That is, something like:
CREATE VIEW all_product_promos AS
SELECT promo_objects.object_id, prom1.*
FROM promos prom1
JOIN promo_objects ON (prom1.promo_id = promo_objects.promotion_id)
WHERE prom1.active = true AND now() BETWEEN prom1.start_date AND prom1.end_date
UNION ALL
SELECT promo_buy_objects.object_id, prom2.*
FROM promos prom2
JOIN promo_buy_objects ON (prom2.promo_id = promo_buy_objects.promo_id)
LEFT JOIN promo_get_objects ON prom2.promo_id = promo_get_objects.promo_id
WHERE (prom2.buy_quantity IS NOT NULL OR prom2.buy_quantity > 0)
AND prom2.active = true
AND now() BETWEEN prom2.start_date AND prom2.end_date
You should be able to verify using EXPLAIN that querying SELECT * FROM all_product_promos WHERE object_id = ? takes the object_id parameter into the two subqueries rather than filtering afterwards. Then from another function you can write:
DECLARE
this_promotion all_product_promos%ROWTYPE;
BEGIN
FOR this_promotion IN
SELECT * FROM all_product_promos WHERE object_id = this_object_id
LOOP
-- deal with promotion in this_promotion
END LOOP;
END
TBH I would avoid using cursors to pass records around in PLPGSQL. In fact, I would avoid using cursors in PLPGSQL full stop- unless you need to pass a whole resultset to another function for some reason. This method of simply looping through the statement is much simpler, with the caveat that the entire resultset is materialised into memory first.
The other disadvantage with this approach is that if you need to add a column to all_product_promos, you will need to recreate all the functions that depend on it, since you can't add columns to a view with 'alter view'. AFAICT this affects named types create with CREATE TYPE too, since ALTER TYPE doesn't seem to allow you to add columns to a type either.
So, you can specify a record format using "CREATE TYPE" to pass between functions. Any relation automagically specifies a type called <relation>%ROWTYPE which you can also use.
The answer:
Select specific fields in the cursor function instead of *
then:
CREATE TYPE get_all_product_promos as (buy_quantity integer, discount_amount numeric(10,2), get_quantity integer, discount_type integer, promo_id integer);
Then I can say:
CREATE OR REPLACE FUNCTION get_best_product_promos(this_promotion get_all_product_promos, this_object_id integer, get_free_promotion integer, get_free_promotion_value numeric(10,2), current_promotion_value numeric(10,2), current_promotion integer)
RETURNS...