How to merge JSONB field in a tree structure?

I have a table in Postgres which stores a tree structure. Each node has a jsonb field: params_diff:
CREATE TABLE tree (id INT, parent_id INT, params_diff JSONB);
INSERT INTO tree VALUES
(1, NULL, '{ "some_key": "some value" }'::jsonb)
, (2, 1, '{ "some_key": "other value", "other_key": "smth" }'::jsonb)
, (3, 2, '{ "other_key": "smth else" }'::jsonb);
The thing I need is to select a node by id with additional generated params field which contains the result of merging all params_diff from the whole parents chain:
SELECT tree.*, /* some magic here */ AS params FROM tree WHERE id = 3;
id | parent_id | params_diff | params
----+-----------+----------------------------+-------------------------------------------------------
3 | 2 | {"other_key": "smth else"} | {"some_key": "other value", "other_key": "smth else"}

Generally, a recursive CTE can do the job. Example:
Use table alias in another query to traverse a tree
We just need a bit more magic to decompose, process and re-assemble the JSON result. I am assuming from your example that you want each key only once, with the first value found along the search path (bottom-up):
WITH RECURSIVE cte AS (
SELECT id, parent_id, params_diff, 1 AS lvl
FROM tree
WHERE id = 3
UNION ALL
SELECT t.id, t.parent_id, t.params_diff, c.lvl + 1
FROM cte c
JOIN tree t ON t.id = c.parent_id
)
SELECT id, parent_id, params_diff
, (SELECT json_object(array_agg(key ORDER BY lvl)
, array_agg(value ORDER BY lvl))::jsonb
FROM (
SELECT key, value
FROM (
SELECT DISTINCT ON (key)
p.key, p.value, c.lvl
FROM cte c, jsonb_each_text(c.params_diff) p
ORDER BY p.key, c.lvl
) sub1
ORDER BY lvl
) sub2
) AS params
FROM cte
WHERE id = 3;
How?
Walk the tree with a classic recursive CTE.
Build a derived table of all keys and values with jsonb_each_text() in an (implicit) LATERAL join, and remember the level in the search path (lvl).
Use DISTINCT ON to get the "first" (lowest lvl) value for each key. Details:
Select first row in each GROUP BY group?
Sort and aggregate resulting keys and values and feed the arrays to json_object() to build the final params value.
SQL Fiddle (only as far as pg 9.3 can go with json instead of jsonb).
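If you would rather do the merge in application code, the same "nearest value wins" rule can be sketched in Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for Postgres, params_diff is stored as plain TEXT, and merged_params is a hypothetical helper name, not part of the answer above.

```python
import json
import sqlite3

# Assumption: SQLite stands in for Postgres; params_diff is stored as JSON text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INT, parent_id INT, params_diff TEXT)")
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", [
    (1, None, '{"some_key": "some value"}'),
    (2, 1, '{"some_key": "other value", "other_key": "smth"}'),
    (3, 2, '{"other_key": "smth else"}'),
])

def merged_params(conn, node_id):
    """Hypothetical helper: merge params_diff from node_id up to the root."""
    rows = conn.execute("""
        WITH RECURSIVE cte(id, parent_id, params_diff, lvl) AS (
            SELECT id, parent_id, params_diff, 1 FROM tree WHERE id = ?
            UNION ALL
            SELECT t.id, t.parent_id, t.params_diff, c.lvl + 1
            FROM cte c JOIN tree t ON t.id = c.parent_id
        )
        SELECT params_diff FROM cte ORDER BY lvl
    """, (node_id,)).fetchall()
    params = {}
    for (diff,) in rows:
        for key, value in json.loads(diff).items():
            # The lowest lvl is visited first, so an existing key is never overwritten.
            params.setdefault(key, value)
    return params

print(merged_params(conn, 3))
```

The setdefault call plays the role of DISTINCT ON (key) ... ORDER BY key, lvl: the first value seen for a key, walking bottom-up, is the one that is kept.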

Related

Postgresql - select query with aggregated decisions column as json

I have table which contains specified columns:
id - bigint
decisions - varchar(80)
type - varchar(258)
I want to make a select query which in result returns something like this(id, decisionsValues with counts as json, type):
id decisions type
1 {"firstDecisionsValue":countOfThisValue, "secondDecisionsValue": countOfThisValue} entryType
I heard that I could try json_agg, but it cannot be combined with COUNT the way I need. I tried json_agg with this query:
SELECT ac.id,
json_agg(ac.decisions),
ac.type
FROM myTable ac
GROUP BY ac.id, ac.type;
but I end up with this (for the entry with id 1 there are two occurrences of firstDecisionsValue and one occurrence of secondDecisionsValue):
id decisions type
1 {"firstDecisionsValue", "firstDecisionsValue", "secondDecisionsValue"} entryType
minimal reproducible example
CREATE TABLE myTable
(
id bigint,
decisions varchar(80),
type varchar(258)
);
INSERT INTO myTable
VALUES (1, 'firstDecisionsValue', 'myType');
INSERT INTO myTable
VALUES (1, 'firstDecisionsValue', 'myType');
INSERT INTO myTable
VALUES (1, 'secondDecisionsValue', 'myType');
Can you give me any tips on how to get the expected result?
1, {"firstDecisionsValue":2, "secondDecisionsValue":1}, entryType

You can try this
SELECT a.id, jsonb_object_agg(a.decisions, a.count), a.type
FROM
( SELECT id, type, decisions, count(*) AS count
FROM myTable
GROUP BY id, type, decisions
) AS a
GROUP BY a.id, a.type
see the result in dbfiddle.

First, count the rows for each (id, type, decisions) combination. After that, use json_object_agg (or jsonb_object_agg) to build the JSON object from the decisions and their counts.
Demo
with data as (
select
ac.id,
ac.type,
ac.decisions,
count(*)
from
myTable ac
group by
ac.id,
ac.type,
ac.decisions
)
select
d.id,
d.type,
json_object_agg(d.decisions, d.count)
from
data d
group by
d.id,
d.type
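The two-step shape of both answers (count per group first, then fold the counts into one object per id and type) can be sketched in Python. This is a hedged illustration only: SQLite stands in for Postgres, the second step is done in Python rather than with json_object_agg, and decision_counts is a hypothetical helper name.

```python
import sqlite3
from collections import defaultdict

# Assumption: an in-memory SQLite table mirrors the minimal reproducible example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, decisions TEXT, type TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", [
    (1, "firstDecisionsValue", "myType"),
    (1, "firstDecisionsValue", "myType"),
    (1, "secondDecisionsValue", "myType"),
])

def decision_counts(conn):
    """Hypothetical helper: {(id, type): {decision: count}}."""
    # Step 1: count rows per (id, type, decisions), as in the inner query.
    rows = conn.execute("""
        SELECT id, type, decisions, COUNT(*)
        FROM myTable
        GROUP BY id, type, decisions
    """)
    # Step 2: fold the counts into one object per (id, type),
    # which is what json_object_agg does in the outer query.
    result = defaultdict(dict)
    for row_id, row_type, decision, n in rows:
        result[(row_id, row_type)][decision] = n
    return dict(result)

print(decision_counts(conn))
```

Aggregates cannot be nested directly, which is why the count has to happen in a subquery (or CTE) before the object aggregation.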

SQL left join case statement

Need some help working out the SQL. Unfortunately the version of tsql is SybaseASE which I'm not too familiar with, in MS SQL I would use a windowed function like RANK() or ROW_NUMBER() in a subquery and join to those results ...
Here's what I'm trying to resolve
TABLE A
Id
1
2
3
TABLE B
Id,Type
1,A
1,B
1,C
2,A
2,B
3,A
3,C
4,B
4,C
I would like to return one row for each ID. If the ID has a type 'A' record, that is the one to display; otherwise any other type will do, but it cannot be null (some arbitrary ordering, such as alphabetical, is fine for prioritizing the "other" types).
Results:
1, A
2, A
3, A
4, B
A regular left join (ON A.id = B.id and B.type = 'A') ALMOST returns what I am looking for however it returns null for the type when I want the 'next available' type.

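You can use an INNER JOIN on a subquery (FirstTypeResult) that returns the minimum type per Id. Because 'A' sorts before the other types, MIN picks 'A' whenever it exists and otherwise the next type alphabetically.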
Eg:
SELECT TABLEA.[Id], FirstTypeResult.[Type]
FROM TABLEA
JOIN (
SELECT [Id], Min([Type]) As [Type]
FROM TABLEB
GROUP BY [Id]
) FirstTypeResult ON FirstTypeResult.[Id] = TABLEA.[Id]
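To see how the MIN-per-Id join behaves on the sample data, here is a small Python sketch. It is only an illustration: SQLite stands in for Sybase ASE, and the names table_a, table_b, first_type_per_id and min_type_all are assumptions made for the example.

```python
import sqlite3

# Assumption: SQLite stands in for Sybase ASE; table names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER);
CREATE TABLE table_b (id INTEGER, type TEXT);
INSERT INTO table_a VALUES (1), (2), (3);
INSERT INTO table_b VALUES
    (1, 'A'), (1, 'B'), (1, 'C'),
    (2, 'A'), (2, 'B'),
    (3, 'A'), (3, 'C'),
    (4, 'B'), (4, 'C');
""")

def first_type_per_id(conn):
    """Join table_a to the minimum type per id, as in the answer above."""
    return conn.execute("""
        SELECT a.id, f.type
        FROM table_a a
        JOIN (SELECT id, MIN(type) AS type
              FROM table_b
              GROUP BY id) f ON f.id = a.id
        ORDER BY a.id
    """).fetchall()

def min_type_all(conn):
    """The subquery on its own, covering every id present in table_b."""
    return conn.execute("""
        SELECT id, MIN(type) FROM table_b GROUP BY id ORDER BY id
    """).fetchall()

print(first_type_per_id(conn))
print(min_type_all(conn))
```

Note that Id 4 only shows up when the query is driven from table B, because the sample table A has no row for it.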

How can I SUM distinct records in a Postgres database where there are duplicate records?

Imagine a table that looks like this:
The SQL to get this data was just SELECT *
The first column is "row_id" the second is "id" - which is the order ID and the third is "total" - which is the revenue.
I'm not sure why there are duplicate rows in the database, but when I do a SUM(total), it's including the second entry in the database, even though the order ID is the same, which is causing my numbers to be larger than if I select distinct(id), total - export to excel and then sum the values manually.
So my question is - how can I SUM on just the distinct order IDs so that I get the same revenue as if I exported to excel every distinct order ID row?
Thanks in advance!

Easy - just divide by the count:
select id, sum(total) / count(id)
from orders
group by id
See live demo.
Also handles any level of duplication, eg triplicates etc.
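A quick way to convince yourself of the divide-by-count trick is to run it on a few duplicated rows. The sketch below is illustrative only: it uses SQLite in place of Postgres, borrows the sample rows from another answer on this page, and per_order_totals is a hypothetical helper name. Keep in mind that SUM / COUNT is simply an average, so it matches the real order total only when every duplicate of an order carries the same total.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; totals are stored as REAL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (row_id INTEGER, id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (6395, 1509, 112.0), (22986, 1509, 112.0),
    (1393, 3284, 40.37), (24360, 3284, 40.37),
])

def per_order_totals(conn):
    """Hypothetical helper: one total per order id, duplicates averaged away."""
    return conn.execute("""
        SELECT id, SUM(total) / COUNT(id)
        FROM orders
        GROUP BY id
        ORDER BY id
    """).fetchall()

for order_id, total in per_order_totals(conn):
    print(order_id, total)
```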

You can try something like this (with your example):
Table
create table test (
row_id int,
id int,
total decimal(15,2)
);
insert into test values
(6395, 1509, 112), (22986, 1509, 112),
(1393, 3284, 40.37), (24360, 3284, 40.37);
Query
with distinct_records as (
select distinct id, total from test
)
select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
on a.id = b.id
group by a.id, b.actual_total
Result
| id | actual_total | row_ids |
|------|--------------|------------|
| 1509 | 112 | 6395,22986 |
| 3284 | 40.37 | 1393,24360 |
Explanation
We do not know why orders and totals appear more than once with different row_id values. So, using a common table expression (CTE) introduced by WITH, we get the distinct id and total pairs.
Below the CTE, we total this distinct data. We join id in the original table with the aggregation over distinct values, then collect the row_ids into an array so that the output looks cleaner.
SQLFiddle example
http://sqlfiddle.com/#!15/72639/3

Create custom aggregate:
CREATE OR REPLACE FUNCTION sum_func (
double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE 'sql';
CREATE AGGREGATE dist_sum (
pg_catalog."any",
double precision)
(
SFUNC = sum_func,
STYPE = float8
);
And then calc distinct sum like:
select dist_sum(distinct id, total)
from orders
SQLFiddle

You can use DISTINCT in your aggregate functions:
SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id
Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
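One thing worth checking with SUM(DISTINCT total) is what "distinct" applies to. Grouped by id it removes duplicate totals within one order, but without the GROUP BY it would also collapse two different orders that happen to share a total. The Python sketch below illustrates this with SQLite standing in for Postgres; the third order (id 4000) is a made-up row added purely to show the effect.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; order 4000 is hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (row_id INTEGER, id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1509, 112.0), (2, 1509, 112.0),   # duplicated order
    (3, 3284, 40.37), (4, 3284, 40.37),   # duplicated order
    (5, 4000, 112.0),                     # different order, same total as 1509
])

# Per order: duplicates inside each id are removed.
per_id = conn.execute(
    "SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id ORDER BY id"
).fetchall()

# Grand total, naive: 112.0 is counted once although two orders have it.
naive_grand = conn.execute(
    "SELECT SUM(DISTINCT total) FROM orders"
).fetchone()[0]

# Grand total, deduplicated on (id, total) pairs first.
safe_grand = conn.execute(
    "SELECT SUM(total) FROM (SELECT DISTINCT id, total FROM orders) AS t"
).fetchone()[0]

print(per_id, naive_grand, safe_grand)
```

So the per-id form is fine when duplicates of an order always carry the same total, while a grand total is safer when computed over distinct (id, total) pairs.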

If we can trust that each order's total really belongs on a single row, we could eliminate the duplicates in a subquery by selecting the MAX of the PK id column per order. An example:
CREATE TABLE test2 (id int, order_id int, total int);
insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);
select order_id, sum(total)
from test2 t
join (
select max(id) as id
from test2
group by order_id) as sq
on t.id = sq.id
group by order_id
sql fiddle

In difficult cases:
select
id,
(
SELECT SUM(value::numeric)
FROM jsonb_each_text(jsonb_object_agg(row_id, total))
) as total
from orders
group by id

I would suggest just use a sub-Query:
SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"
The above gives you the total for each id.
Use the query below if you want the grand total with duplicates removed:
SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"

Using subselect (http://sqlfiddle.com/#!7/cef1c/51):
select sum(total) from (
select distinct id, total
from orders
) as t -- Postgres has traditionally required an alias for a subquery in FROM
Using CTE (http://sqlfiddle.com/#!7/cef1c/53):
with distinct_records as (
select distinct id, total from orders
)
select sum(total) from distinct_records;

Postgresql recursive CTE results ordering

I'm working on a query to pull data out of a hierarchy
e.g.
CREATE table org (
id INT PRIMARY KEY,
name TEXT NOT NULL,
parent_id INT);
INSERT INTO org (id, name) VALUES (0, 'top');
INSERT INTO org (id, name, parent_id) VALUES (1, 'middle1', 0);
INSERT INTO org (id, name, parent_id) VALUES (2, 'middle2', 0);
INSERT INTO org (id, name, parent_id) VALUES (3, 'bottom3', 1);
WITH RECURSIVE parent_org (id, parent_id, name) AS (
SELECT id, parent_id, name
FROM org
WHERE id = 3
UNION ALL
SELECT o.id, o.parent_id, o.name
FROM org o, parent_org po
WHERE po.parent_id = o.id)
SELECT id, parent_id, name
FROM parent_org;
It works as expected.
3 1 "bottom3"
1 0 "middle1"
0 "top"
It's also returning the data in the order that I expect, and it makes sense to me that it would do this because of the way that the results would be discovered.
The question is, can I count on the order being like this?

Yes, in practice the order is predictable, although SQL itself only guarantees row order when the outer query has an ORDER BY. In the Postgres WITH doc, they give the following example:
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1,
ARRAY[ROW(g.f1, g.f2)],
false
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
path || ROW(g.f1, g.f2),
ROW(g.f1, g.f2) = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
About which they say in a Tip box (formatting mine):
The recursive query evaluation algorithm produces its output in
breadth-first search order. You can display the results in depth-first
search order by making the outer query ORDER BY a "path" column
constructed in this way.
You do appear to be getting breadth-first output in your case above, based on the INSERT statements, so if you wanted a different order you could modify your outer SELECT to sort explicitly.
For your single chain of ancestors, a root-first ordering would probably look like this (note that ORDER BY id only works here because the ids happen to increase down the tree):
WITH RECURSIVE parent_org (id, parent_id, name) AS (
SELECT id, parent_id, name
FROM org
WHERE id = 3
UNION ALL
SELECT o.id, o.parent_id, o.name
FROM org o, parent_org po
WHERE po.parent_id = o.id)
SELECT id, parent_id, name
FROM parent_org
ORDER BY id;
Running things through in my head, I would expect that to yield this:
0 "top"
1 0 "middle1"
3 1 "bottom3"
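If the order matters, a more robust option than relying on discovery order (or on id values) is to carry an explicit depth counter through the recursion and sort on it. The Python sketch below shows this with SQLite standing in for Postgres; the depth column, the ancestors helper and its root_first flag are assumptions added for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; the schema mirrors the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE org (id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent_id INTEGER);
INSERT INTO org (id, name) VALUES (0, 'top');
INSERT INTO org (id, name, parent_id) VALUES (1, 'middle1', 0);
INSERT INTO org (id, name, parent_id) VALUES (2, 'middle2', 0);
INSERT INTO org (id, name, parent_id) VALUES (3, 'bottom3', 1);
""")

def ancestors(conn, start_id, root_first=False):
    """Hypothetical helper: the chain from start_id to the root, explicitly ordered."""
    direction = "DESC" if root_first else "ASC"
    sql = f"""
        WITH RECURSIVE parent_org (id, parent_id, name, depth) AS (
            SELECT id, parent_id, name, 0 FROM org WHERE id = ?
            UNION ALL
            SELECT o.id, o.parent_id, o.name, po.depth + 1
            FROM org o JOIN parent_org po ON po.parent_id = o.id
        )
        SELECT id, parent_id, name
        FROM parent_org
        ORDER BY depth {direction}   -- depth 0 is the starting node
    """
    return conn.execute(sql, (start_id,)).fetchall()

print(ancestors(conn, 3))                   # starting node first
print(ancestors(conn, 3, root_first=True))  # root first
```

Sorting on depth keeps the result stable even if ids are not assigned in tree order.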

Order by objects relation (PostgreSQL)

I have two tables, for example:
The first has object and parent columns:
object | parent
-------+---------
object1| null
object2| object1
object3| null
The second has object and reference columns:
object | reference
-------+---------
object1| null
object2| null
object3| object1
I need to query the tables so that the result is ordered as follows: the parent first, then its children, then the objects that reference the parent.
object1
object2
object3
Is it possible to do this in one SQL query, or do I need to sort manually in an array? It seems like a classic task, so a solution probably already exists somewhere.

Is this what you're looking for?
CREATE TABLE oparen (object varchar(10), parent varchar(10));
CREATE TABLE oref (object varchar(10), ref varchar(10));
INSERT INTO oparen VALUES
('object1',null),('object2','object1'),
('object3',null),('object4','object2');
INSERT INTO oref VALUES
('object1',null),('object2',null),('object3','object1'),
('object5','object6'),('object6','object1'),('object7','object4');
WITH hier AS (
SELECT parent AS obj, 1 AS rank FROM oparen
WHERE parent IS NOT NULL
UNION
SELECT object, 2 FROM oparen
WHERE parent IS NOT NULL
UNION
SELECT object, 3 FROM oref
WHERE ref IS NOT NULL),
allobj AS (
SELECT object AS obj FROM oparen
UNION
SELECT object FROM oref)
SELECT a.obj, coalesce(h.rank, 4) AS rank
FROM allobj a LEFT JOIN hier h ON a.obj = h.obj
ORDER BY coalesce(h.rank, 4), a.obj;
EDIT: After the improved example in the answer below, the following query should do the trick:
WITH parents AS (
SELECT parent AS obj, 1 AS rank FROM oparen
WHERE parent IS NOT NULL
),
family AS (
SELECT * FROM parents
UNION ALL
SELECT object, 2 FROM oparen op
WHERE parent IS NOT NULL
AND NOT EXISTS (SELECT obj FROM parents WHERE obj = op.object)
),
hier AS (
SELECT * FROM family
UNION ALL
SELECT object AS obj, coalesce(f.rank + 2, 5) AS rank
FROM oref LEFT JOIN family f ON oref.ref = f.obj
WHERE ref IS NOT NULL
),
allobj AS (
SELECT object AS obj FROM oparen
UNION
SELECT object FROM oref)
SELECT a.obj, h.rank AS rank
FROM allobj a LEFT JOIN hier h ON a.obj = h.obj
ORDER BY h.rank, a.obj;
The test setup at the top has been updated to match the new requirements.

I inserted following data:
INSERT INTO oparen VALUES
('object1',null),('object2','object1'),('object3',null),('object4','object2');
INSERT INTO oref VALUES
('object1',null),('object2',null),('object3','object1'),('object5','object6'),('object6','object1');
The order is incorrect and object2 is listed twice. Adding DISTINCT on obj also breaks the order. object6 should come before object5.

No, this does not work. I checked it against other data and simplified the query to use only the content of the oref table:
INSERT INTO oref VALUES
('object1',null),('object2',null),('object3','object1'),
('object5','object6'),('object6','object1'),('object7','object4'), ('object4','object5');
WITH family AS (
SELECT object AS obj, 1 AS rank FROM oref
WHERE ref IS NULL
),
hier AS (
SELECT * FROM family
UNION ALL
SELECT object AS obj, coalesce(f.rank + 2, 5) AS rank
FROM oref LEFT JOIN family f ON oref.ref = f.obj
WHERE ref IS NOT NULL
),
allobj AS (
SELECT object AS obj FROM oref)
SELECT a.obj, h.rank AS rank
FROM allobj a
LEFT JOIN hier h ON a.obj = h.obj
ORDER BY h.rank, a.obj;
I think a recursive query is needed here. I will write one and post it here.

The following recursive query works:
WITH RECURSIVE tables(object, rank) AS (
SELECT DISTINCT o.object, 1 AS rank FROM oref o
WHERE o.ref IS NULL
UNION
SELECT o.object, t.rank + 1 AS rank
FROM (SELECT DISTINCT o.object, o.ref FROM oref o
WHERE ref IS NOT NULL) o, tables t
WHERE o.ref = t.object AND rank <= t.rank
),
ordered AS (
SELECT * FROM tables
)
SELECT * FROM tables
WHERE tables.rank = (SELECT MAX(rank) FROM ordered WHERE ordered.object = tables.object)
ORDER BY rank;
Any comments, questions, objections, propositions? ;)
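The idea behind the recursive query (rank 1 for objects without a reference, rank + 1 for anything that references an already ranked object, then keep the highest rank per object) can be tried out with a short Python sketch. It is only an illustration under stated assumptions: SQLite stands in for Postgres, the column is called rnk instead of rank, ranked_objects is a hypothetical helper, and the data is assumed to contain no reference cycles (a cycle would make the recursion run forever).

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; data is the oref sample from above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oref (object TEXT, ref TEXT)")
conn.executemany("INSERT INTO oref VALUES (?, ?)", [
    ("object1", None), ("object2", None), ("object3", "object1"),
    ("object5", "object6"), ("object6", "object1"),
    ("object7", "object4"), ("object4", "object5"),
])

def ranked_objects(conn):
    """Hypothetical helper: objects ordered so that referenced objects come first."""
    return conn.execute("""
        WITH RECURSIVE ranked(object, rnk) AS (
            -- Objects that reference nothing start at rank 1.
            SELECT object, 1 FROM oref WHERE ref IS NULL
            UNION
            -- Anything referencing a ranked object sits one level below it.
            SELECT o.object, r.rnk + 1
            FROM oref o JOIN ranked r ON o.ref = r.object
        )
        SELECT object, MAX(rnk) AS max_rnk
        FROM ranked
        GROUP BY object
        ORDER BY max_rnk, object
    """).fetchall()

for obj, rnk in ranked_objects(conn):
    print(rnk, obj)
```

Taking MAX(rnk) per object replaces the correlated subquery in the final SELECT above; object6 comes out before object5, as required.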
