Aggregation on fixed size JSONB array in PostgreSQL

I'm struggling with aggregations on a JSONB field in a PostgreSQL database. This is probably easier explained with an example, so I'll create and populate a table called analysis with 2 columns (id and analysis) as follows:
create table analysis (
  id serial primary key,
  analysis jsonb
);
insert into analysis (id, analysis) values
  (1, '{"category" : "news", "results" : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, null, null]}'),
  (2, '{"category" : "news", "results" : [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, null, 26]}'),
  (3, '{"category" : "news", "results" : [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46]}'),
  (4, '{"category" : "sport", "results" : [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66]}'),
  (5, '{"category" : "sport", "results" : [71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]}'),
  (6, '{"category" : "weather", "results" : [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106]}');
As you can see, the analysis JSONB field always contains 2 attributes: category and results. The results attribute will always contain a fixed-length array of size 16. I've used various functions such as jsonb_array_elements, but what I'm trying to do is the following:
Group by analysis->'category'
Average of each array element
What I want is a statement that returns 3 rows grouped by category (i.e. news, sport and weather), each with a fixed-length array of 16 averages. To further complicate things, if there are nulls in the array then we should ignore them (i.e. we are not simply summing and dividing by the number of rows). The result should look something like the following:
category | analysis_average
-----------+--------------------------------------------------------------------------------------------------------------
"news" | [14.33, 15.33, 16.33, 17.33, 18.33, 19.33, 20.33, 21.33, 22.33, 23.33, 24.33, 25.33, 26.33, 27.33, 45, 36]
"sport" | [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76]
"weather" | [91, 92, 93, 94, 95, 96, 97, 98, 99, 00, 101, 102, 103, 104, 105, 106]
NOTE: Notice the 45 and 36 in the last 2 array items of the 1st row, which illustrates ignoring the nulls.
I had considered creating a view which exploded the array into 16 columns, i.e.:
create view analysis_view as
select a.*,
       (a.analysis->'results'->>0)::int as result0,
       (a.analysis->'results'->>1)::int as result1
       /* ... etc. for all 16 array entries ... */
from analysis a;
This seems extremely inelegant to me and removes the advantages of using an array in the first place, but I could probably hack something together using that approach.
Any pointers or tips will be most appreciated!
Also, performance is really important here, so the faster the better!

This will work for any array length:
select category, array_agg(average order by subscript) as average
from (
  select
    a.analysis->>'category' category,
    subscript,
    avg(v)::numeric(5,2) as average
  from analysis a,
       lateral unnest(
         array(select jsonb_array_elements_text(analysis->'results')::int)
       ) with ordinality s(v, subscript)
  group by 1, 2
) s
group by category;
category | average
----------+----------------------------------------------------------------------------------------------------------
news | {14.33,15.33,16.33,17.33,18.33,19.33,20.33,21.33,22.33,23.33,24.33,25.33,26.33,27.33,45.00,36.00}
sport | {61.00,62.00,63.00,64.00,65.00,66.00,67.00,68.00,69.00,70.00,71.00,72.00,73.00,74.00,75.00,76.00}
weather | {91.00,92.00,93.00,94.00,95.00,96.00,97.00,98.00,99.00,100.00,101.00,102.00,103.00,104.00,105.00,106.00}
See the PostgreSQL documentation on table functions (WITH ORDINALITY) and on LATERAL.

Because the array is always the same length, you can use generate_series instead of typing out the index of every array element yourself. You CROSS JOIN with that generated series so the index is applied to every row, and you can get the element at position s from the array. Then it is just a matter of aggregating the data using GROUP BY.
The query then becomes:
SELECT category, array_agg(val ORDER BY s) analysis_average
FROM (
  SELECT analysis->'category' category, s, AVG((analysis->'results'->>s)::numeric) val
  FROM analysis
  CROSS JOIN generate_series(0, 15) s
  GROUP BY category, s
) q
GROUP BY category
15 is in this case the last index of the array (16-1).
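If you prefer not to hardcode the 15, a small variation (my own tweak, relying on the question's guarantee that every results array has the same length) derives the upper bound from the data with jsonb_array_length:
SELECT category, array_agg(val ORDER BY s) analysis_average
FROM (
  SELECT analysis->'category' category, s, AVG((analysis->'results'->>s)::numeric) val
  FROM analysis
  -- derive the last index from the array itself instead of hardcoding 15
  CROSS JOIN LATERAL generate_series(0, jsonb_array_length(analysis->'results') - 1) s
  GROUP BY category, s
) q
GROUP BY category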

It can be done in a more traditional way, like:
select
  (t.analysis->'category')::varchar,
  array_math_avg(array(select jsonb_array_elements_text(t.analysis->'results')::int))::numeric(9,2)[]
from analysis t
group by 1 order by 1;
but we need to do some preparation:
create type t_array_math_agg as (
  c int[],
  a numeric[]
);

create or replace function array_math_sum_f(in t_array_math_agg, in numeric[]) returns t_array_math_agg as $$
declare
  r t_array_math_agg;
  i int;
begin
  if $2 is null then
    return $1;
  end if;
  r := $1;
  for i in array_lower($2,1)..array_upper($2,1) loop
    if coalesce(r.a[i], $2[i]) is null then
      -- nothing seen at this position yet: keep the slot, but as null
      r.a[i] := null;
    elsif $2[i] is null then
      -- null element: leave sum and count untouched so it is truly ignored
      null;
    else
      r.a[i] := coalesce(r.a[i],0) + $2[i];
      r.c[i] := coalesce(r.c[i],0) + 1;
    end if;
  end loop;
  return r;
end; $$ immutable language plpgsql;

create or replace function array_math_avg_final(in t_array_math_agg) returns numeric[] as $$
declare
  r numeric[];
  i int;
begin
  if array_lower($1.a, 1) is null then
    return null;
  end if;
  for i in array_lower($1.a,1)..array_upper($1.a,1) loop
    -- positions that only ever saw nulls divide null by null, yielding null
    r[i] := $1.a[i] / $1.c[i];
  end loop;
  return r;
end; $$ immutable language plpgsql;

create aggregate array_math_avg(numeric[]) (
  sfunc=array_math_sum_f,
  finalfunc=array_math_avg_final,
  stype=t_array_math_agg,
  initcond='({},{})'
);
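A quick way to convince yourself of the null handling is to run the aggregate over a couple of inline rows (values invented purely for illustration):
-- per position: avg(1,5) = 3; the nulls at positions 2 and 3 are ignored
select array_math_avg(a)::numeric(9,2)[] as avgs
from (values (array[1, null, 3]::numeric[]),
             (array[5, 7, null]::numeric[])) v(a);
-- avgs: {3.00,7.00,3.00}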

Related

Is there a PostGIS function to conditionally merge linestring geometries to the neighboring ones?

I have a lines (multilinestring) table in my PostGIS database (Postgres 11), which I converted to linestrings; I also checked the validity (ST_IsValid()) of the new linestring geometries.
create table my_line_tbl as
select
  gid gid_multi,
  adm_code, t_count,
  st_length((st_dump(st_linemerge(geom))).geom)::int len,
  (st_dump(st_linemerge(geom))).geom geom
from my_multiline_tbl
order by gid;
alter table my_line_tbl add column id serial primary key not null;
The first 10 rows look like this:
id, gid_multi, adm_code, t_count, len, geom
1, 1, 30, 5242, 407, LINESTRING(...)
2, 1, 30, 3421, 561, LINESTRING(...)
3, 2, 50, 5248, 3, LINESTRING(...)
4, 2, 50, 1458, 3, LINESTRING(...)
5, 2, 60, 2541, 28, LINESTRING(...)
6, 2, 30, 3325, 4, LINESTRING(...)
7, 2, 20, 1142, 5, LINESTRING(...)
8, 2, 30, 1425, 7, LINESTRING(...)
9, 3, 30, 2254, 4, LINESTRING(...)
10, 3, 50, 2254, 50, LINESTRING(...)
I am trying to develop the following logic:
Find all <= 10 m segments and merge them into a neighboring (previous or next) > 10 m geometry
If there are many <= 10 m segments next to each other, merge them together to make a > 10 m segment (min length: > 10 m)
In case of intersections, merge any <= 10 m segments into the longest neighboring geometry
I thought of using SQL window functions to check the length (st_length()) of succeeding geometries (lead(id) over ()) and then merging them, but the problem with this approach is that successive IDs are not necessarily next to each other (they do not intersect, st_intersects()).
My code attempt (dynamic SQL) is below, where I try to separate the <= 10 and > 10 meter geometries.
with lt10mseg as (
  select id, gid_multi, len, geom lt10m_geom
  from my_line_tbl
  where len <= 10
  order by id
), gt10mseg as (
  select id, gid_multi, len, geom gt10m_geom
  from my_line_tbl
  where len > 10
  order by id
)
select st_intersects(lt10m_geom, gt10m_geom)
from lt10mseg, gt10mseg
order by lt10mseg.id
Any help/suggestions (dynamic SQL/PLPGSQL) to continue developing the above logic? The ultimate goal is to get rid of the <= 10 m segments by merging them into their neighbors.
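Not a full answer, but a hedged building block for the intersection rule, using only the table and columns shown above: for every <= 10 m segment, pick the longest intersecting > 10 m neighbor, which st_union/st_linemerge can then fuse into one linestring. Chains of adjacent short segments would still need an iterative or recursive pass on top of this.
-- For each <= 10 m segment, pick the longest intersecting > 10 m segment
-- and produce a merged geometry for that pair.
select s.id as short_id,
       n.id as long_id,
       st_linemerge(st_union(s.geom, n.geom)) as merged_geom
from my_line_tbl s
join lateral (
  select l.id, l.len, l.geom
  from my_line_tbl l
  where l.len > 10
    and l.id <> s.id
    and st_intersects(l.geom, s.geom)
  order by l.len desc
  limit 1
) n on true
where s.len <= 10;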

Postgresql update column with integer values

I have a column of jsonb type that contains a list of elements, either in string or in integer format.
What I want now is to make all of them the same type, e.g. either all int or all string format.
Tried: this way I get a single element, but I need to update all of the elements inside the list.
SELECT parent_path -> 1 AS path
FROM abc
LIMIT 10
OR
Update abc SET parent_path = ARRAY[parent_path]::TEXT[] AS parent_path
FROM abc
OR
UPDATE abc SET parent_path = replace(parent_path::text, '"', '') where id=123
Current Output
path
[6123697, 178, 6023099]
[625953521394212864, 117, 6023181]
["153", "6288361", "553248635949090971"]
[553248635358954983, 178320, 174, 6022967]
[6050684, 6050648, 120, 6022967]
[653, 178238, 6239135, 38, 6023117]
["153", "6288496", "553248635977039112"]
[553248635998143523, 6023185]
[553248635976194501, 6022967]
[553248635976195634, 6022967]
Expected Output
path
[6123697, 178, 6023099]
[625953521394212864, 117, 6023181]
[153, 6288361, 553248635949090971] <----
[553248635358954983, 178320, 174, 6022967]
[6050684, 6050648, 120, 6022967]
[653, 178238, 6239135, 38, 6023117]
[153, 6288496, 553248635977039112] <----
[553248635998143523, 6023185]
[553248635976194501, 6022967]
[553248635976195634, 6022967]
Note the missing double quotes in the marked lists. I've tried several methods from here but no luck.
You will have to unnest them, clean up each element, then aggregate it back into an array:
The following converts all elements to integers:
select (select jsonb_agg(x.i::bigint order by idx)
        from jsonb_array_elements_text(a.path) with ordinality as x(i, idx)
       ) as clean_path
from abc a;
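Since the goal is to rewrite the stored rows, the same aggregation can be wrapped in an UPDATE. A minimal sketch, assuming the column is really named parent_path as in the question's own attempts:
update abc a
set parent_path = (
  select jsonb_agg(x.i::bigint order by idx)
  from jsonb_array_elements_text(a.parent_path) with ordinality as x(i, idx)
);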
You can use a scalar subquery to select, unnest, and aggregate the elements:
WITH mytable AS (
  SELECT row_number() over () as id, col::JSONB
  FROM (VALUES ('[6123697, 178, 6023099]'),
               ('["6123697", "178", "6023099"]')) as bla(col)
)
SELECT id, (SELECT JSONB_AGG(el::int) FROM jsonb_array_elements_text(col) as el)
FROM mytable
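One caveat: the question's sample data contains values such as 553248635949090971, which overflow int, so against the real table the cast should be ::bigint (as in the previous snippet) rather than ::int.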

how to update string concatenation in postgres sql with existing value

I have a reports table with values as shown below:
id reportIdList
1 123, 124, 125
2 123, 124, 125
3 123, 124, 125, 127
4 123, 124, 125, 127
I need some help with SQL to add an additional value, as in:
id reportIdList
1 123, 124, 125, *126*
2 123, 124, 125, *126*
3 123, 124, 125, *126*, 127
4 123, 124, 125, *126*, 127
Currently I have a way to update:
update reports set reportIdList = reportIdList || ',126';
But this would update the table as shown below:
id reportIdList
1 123, 124, 125, *126*
2 123, 124, 125, *126*
3 123, 124, 125, 127, *126*
4 123, 124, 125, 127, *126*
Any help is appreciated, thanks
The easiest way is to create a function to deal with your bad design:
create or replace function add_element(p_input text, p_add text)
  returns text
as
$$
  select string_agg(x::text, ',' order by x)
  from (
    select trim(nullif(x,''))
    from unnest(string_to_array(p_input, ',')) as e(x)
    union
    select p_add
  ) t(x);
$$
language sql;
Then you can do:
update the_table
  set reportidlist = add_element(reportidlist, '126');
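For reference, a quick check of the helper; note the result loses the spaces after the commas, since the function re-joins with a bare ',':
select add_element('123, 124, 125, 127', '126');
-- returns: 123,124,125,126,127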
But you should really fix your data model and stop storing comma separated strings.

PostgreSQL: Add condition in where clause using CASE

I am using PostgreSQL 8.2 and I am also new to PostgreSQL.
I have to add one condition in the WHERE clause depending on a specific value (49) of the field (activity.type). Here is my query:
SELECT activity.*
FROM activity
LEFT JOIN event_types ON activity.customstatusid = event_types.id,
     getviewableemployees(3222, NULL) AS report
WHERE
(
  CASE WHEN activity.type = 49 THEN
    'activity.individualid IN(SELECT individualid from prospects where prospects.individualid = activity.individualid)'
  ELSE 1
  END
)
AND activity.date BETWEEN '2016-10-01' AND '2016-10-06'
AND activity.type IN (21, 22, 49, 50, 37, 199)
AND (event_types.status = 1 or event_types.status IS NULL);
When I run the above query from the PostgreSQL command line, I get the error below:
ERROR: invalid input syntax for integer: "activity.individualid IN(SELECT individualid from prospects where prospects.individualid = activity.individualid)"
What am I missing here?
Implement your where clause as:
WHERE (
activity.type != 49 OR
activity.individualid IN (
SELECT individualid from prospects
WHERE prospects.individualid = activity.individualid)
)
AND activity.date BETWEEN '2016-10-01' AND '2016-10-06'
AND activity.type IN (21, 22, 49, 50, 37, 199)
AND (event_types.status = 1 or event_types.status IS NULL);
The first clause will only be true when either:
activity.type != 49; or
activity.type = 49 and activity.individualid is found in the subquery.
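As a side note (my own variation, not part of the original answer): since the subquery is already correlated on individualid, an equivalent and arguably clearer form uses EXISTS:
WHERE (
    activity.type != 49
    OR EXISTS (
      SELECT 1
      FROM prospects p
      WHERE p.individualid = activity.individualid)
  )
  AND activity.date BETWEEN '2016-10-01' AND '2016-10-06'
  AND activity.type IN (21, 22, 49, 50, 37, 199)
  AND (event_types.status = 1 OR event_types.status IS NULL);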

PostgreSQL, calculating ingredients from normatives

Some time ago I got a very reliable query here for calculating ingredients from normatives, but with time the need arose to advance it a bit.
Here are example tables:
CREATE TABLE myusedfood
(mybill int, mydate text, food_code int, food_name text, qtyu integer, meas text);
INSERT INTO myusedfood (mybill, mydate, food_code, food_name, qtyu, meas)
VALUES (1, '03.01.2014', 10, 'spaghetti', 3, 'pcs'),
(2, '04.01.2014', 156, 'mayonnaise', 2, 'pcs'),
(3, '06.01.2014', 173, 'ketchup', 1, 'pcs'),
(4, '07.01.2014', 172, 'bolognese sauce', 2, 'pcs'),
(5, '08.01.2014', 173, 'ketchup', 1, 'pcs'),
(6, '15.01.2014', 175, 'worchester sauce', 2, 'pcs'),
(7, '16.01.2014', 177, 'parmesan', 1, 'pcs'),
(8, '17.01.2014', 10, 'spaghetti', 2, 'pcs'),
(9, '18.01.2014', 156, 'mayonnaise', 1, 'pcs'),
(10, '19.01.2014', 10, 'spaghetti', 2, 'pcs'),
(11, '19.01.2014', 1256, 'spaghetti rinf', 100, 'gramm'),
(12, '20.01.2014', 156, 'mayonnaise', 2, 'pcs'),
(13, '21.01.2014', 173, 'ketchup', 1, 'pcs'),
(14, '19.01.2014', 10, 'spaghetti', 2, 'pcs');
DROP TABLE IF EXISTS myingredients;
CREATE TABLE myingredients
(food_code int, ingr_code int, ingr_name text, qtyi decimal(10, 3), meas text);
INSERT INTO myingredients (food_code, ingr_code, ingr_name, qtyi, meas)
VALUES (10, 1256, 'spaghetti rinf', 75, 'gramm'),
(156, 1144, 'salt', 0.3, 'gramm'),
(10, 1144, 'salt', 0.5, 'gramm'),
(156, 1140, 'fresh egg', 50, 'gramm'),
(172, 1138, 'tomato', 80, 'gramm'),
(156, 1139, 'mustard', 5, 'gramm'),
(172, 1136, 'clove', 1, 'gramm'),
(156, 1258, 'oil', 120, 'gramm'),
(172, 1135, 'laurel', 0.4, 'gramm'),
(10, 1258, 'oil', 0.4, 'gramm'),
(172, 1130, 'corned beef', 40, 'gramm');
The first table contains the bill number and date, the normative/ingredient code and name, and the quantity sold.
The second represents the normative code, the ingredient code and name, and the ingredient's quantity in the related normative.
With this query I get all ingredients from all normatives listed, plus normatives which don't have any ingredients. That works OK:
SELECT SUM(f.qtyu) AS used,
       COALESCE(i.ingr_code, f.food_code) AS code,
       COALESCE(i.ingr_name, f.food_name) AS f_name,
       SUM(COALESCE(i.qtyi, 1) * f.qtyu) AS qty,
       COALESCE(i.meas, f.meas) AS meas
FROM myusedfood f
LEFT JOIN myingredients i ON f.food_code = i.food_code
GROUP BY COALESCE(i.ingr_code, f.food_code),
         COALESCE(i.ingr_name, f.food_name),
         COALESCE(i.meas, f.meas)
ORDER BY code;
Over time my real tables have become huge and the queries slower. I often need to get results for just one food code, but I don't know how to do that properly with regard to speed.
For example, with the tables shown, if we query for ingredient code 1256 the result should be:
used ing.code ing.name qty meas
------------------------------------------------------
109, 1256, "spaghetti rinf", 775.000, "gramm"
Here article 1256 is "used" 9 times through normative 10 (9 × 75 g = 675 g) and once alone (100 g), for a total qty of 775 grams.
Here I have 2 questions:
How to optimize upper query to list only one code instead of all, say code 1256?
How to get list of all records related to code 1256.
Result should look like this:
bill date code name qty
------------------------------------------------------
1, '03.01.2014', 10, 'spaghetti', 225
8, '17.01.2014', 10, 'spaghetti', 150
10, '19.01.2014', 10, 'spaghetti', 150
11, '19.01.2014', 1256, 'spaghetti rinf', 100
14, '19.01.2014', 10, 'spaghetti', 150
------------------------------------------------------
775
Note that ingredient 1256 may be sold within normative 10 or alone as is.
Please help me to get those two needed queries.
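As a starting point for question 1, here is a hedged sketch (my own variation on the query above, checked only against the sample data): since COALESCE(i.ingr_code, f.food_code) identifies the output code, the same expression can drive the filter so only rows contributing to code 1256 are joined and summed.
SELECT SUM(f.qtyu) AS used,
       COALESCE(i.ingr_code, f.food_code) AS code,
       COALESCE(i.ingr_name, f.food_name) AS f_name,
       SUM(COALESCE(i.qtyi, 1) * f.qtyu) AS qty,
       COALESCE(i.meas, f.meas) AS meas
FROM myusedfood f
LEFT JOIN myingredients i ON f.food_code = i.food_code
WHERE COALESCE(i.ingr_code, f.food_code) = 1256  -- keep only rows contributing to code 1256
GROUP BY 2, 3, 5;
-- With the sample data this returns: used = 109, qty = 775.000 (9 pcs × 75 g + 100 g)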