PostgreSQL - select query with aggregated decisions column as JSON

I have a table with the following columns:
id - bigint
decisions - varchar(80)
type - varchar(258)
I want to write a select query that returns something like this (id, decision values with their counts as JSON, type):
id decisions type
1 {"firstDecisionsValue":countOfThisValue, "secondDecisionsValue": countOfThisValue} entryType
I heard that I can try playing with json_agg, but there is no way to apply a COUNT inside it. I tried json_agg with this query:
SELECT ac.id,
       json_agg(ac.decisions),
       ac.type
FROM myTable ac
GROUP BY ac.id, ac.type;
but it ends with this (for the entry with id 1 there are two occurrences of firstDecisionsValue and one occurrence of secondDecisionsValue):
id decisions type
1 {"firstDecisionsValue", "firstDecisionsValue", "secondDecisionsValue"} entryType
Minimal reproducible example:
CREATE TABLE myTable
(
id bigint,
decisions varchar(80),
type varchar(258)
);
INSERT INTO myTable
VALUES (1, 'firstDecisionsValue', 'myType');
INSERT INTO myTable
VALUES (1, 'firstDecisionsValue', 'myType');
INSERT INTO myTable
VALUES (1, 'secondDecisionsValue', 'myType');
Can you give me any tips on how to get the expected result below?
1, {"firstDecisionsValue":2, "secondDecisionsValue":1}, myType

You can try this
SELECT a.id, jsonb_object_agg(a.decisions, a.count), a.type
FROM
( SELECT id, type, decisions, count(*) AS count
FROM myTable
GROUP BY id, type, decisions
) AS a
GROUP BY a.id, a.type
See the result in dbfiddle.

First, calculate the count for each (id, type, decisions) combination; after that, use jsonb_object_agg to create the JSON.
Demo
with data as (
    select ac.id, ac.type, ac.decisions, count(*) as count
    from myTable ac
    group by ac.id, ac.type, ac.decisions
)
select d.id, d.type, json_object_agg(d.decisions, d.count)
from data d
group by d.id, d.type;
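Against the minimal reproducible example, either query should return one row per (id, type) group, along these lines (the two answers order the columns differently, and json vs jsonb differ only in whitespace):
1  myType  {"firstDecisionsValue": 2, "secondDecisionsValue": 1}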

Related

Apply join, sort on date column and select the first row where one of the column values is not null

I have two tables (Table A and Table B) in a Postgres DB.
Both have an "id" column in common. Table A has one column, "id", and Table B has three columns: "id", "date", and "value" ($).
For each "id" in Table A there exist multiple rows in Table B in the format (id, date, value).
For instance, for the Table A row with "id" 1, suppose Table B contains the following rows:
(1, 2018-06-21, null)
(1, 2018-06-20, null)
(1, 2018-06-19, 202)
(1, 2018-06-18, 200)
I would like to extract the most recently dated non-null value. For example, for id 1 the result should be 202. Please share your thoughts, or let me know in case more info is required.
Here is the solution I went ahead with:
with mapping as (
    select distinct table1.id, table2.value, table2.date,
           row_number() over (partition by table1.id order by table2.date desc nulls last) as row_number
    from table1
    left join table2 on table2.id = table1.id and table2.value is not null
)
select * from mapping where row_number = 1
Let me know if there is scope for improvement.
You may very well want an inner join, not an outer join. If table1 has an id that does not exist in table2, or one whose table2 rows have only null values, you will get NULL for both date and value. This is how an outer join works: if nothing in the right-hand table matches the ON condition, it returns NULL for each column of that table. So:
with mapping as
(select distinct table1.id
, table2.value
, table2.date
, row_number() over (partition by table1.id order by table2.date desc nulls last) as row_number
from table1
join table2 on table2.id=table1.id and table2.value is not null
)
select *
from mapping
where row_number = 1;
See an example of each here. Your query worked because all your test data satisfied the first part of the ON condition. You really need test data that fails it to see what your query does.
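For instance, if table1 contains id 2 but table2 has no rows for it (or only rows with a null value), the outer-join version still emits (2, null, null) with row_number = 1, while the inner join drops id 2 entirely.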
Caution: DATE and VALUE are very poor choices for column names. Both are SQL-standard reserved words, although Postgres itself does not reserve them. Further, DATE is a Postgres data type, and having columns named after data types leads to confusion.
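For example, a rename along these lines sidesteps the problem (the new names here are just illustrative):
alter table table2 rename column "date" to value_date;
alter table table2 rename column "value" to amount;
-- until renamed, double-quoting the columns at least makes the intent explicit:
select "date", "value" from table2;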

Add Column in table with value partition by group

My table is something like
CREATE TABLE table1
(
_id text,
name text,
data_type int,
data_value int,
data_date timestamp -- insertion time
);
Now, due to a system bug, many duplicate entries were created, and I need to remove the duplicates and keep only unique entries, ignoring data_date because it is a system-generated date.
My query to do that is something like:
DELETE FROM table1 A
USING ( SELECT _id, name, data_type, data_value, MIN(data_date) min_date
FROM table1
GROUP BY _id, name, data_type, data_value
HAVING count(data_date) > 1) B
WHERE A._id = B._id
AND A.name = B.name
AND A.data_type = B.data_type
AND A.data_value = B.data_value
AND A.data_date != B.min_date;
This query works; however, with millions of records in the table, I want a faster way. My idea is to create a new column whose value is computed as a partition over [_id, name, data_type, data_value], i.e. the columns in the GROUP BY. However, I could not find a way to create such a column.
I would appreciate it if anyone could suggest a way to create such a column.
Edit 1:
One more thing to add: I don't want to use a CTE or subquery to populate this new column, because it would be the same as my existing query.
The best way is simply to create a new table without the duplicated records (the table name below is illustrative):
CREATE TABLE table1_dedup AS
SELECT _id, name, data_type, data_value, MIN(data_date) AS min_date
FROM table1
GROUP BY _id, name, data_type, data_value;
Alternatively, you can compute a rank and then filter, but a subquery is needed:
RANK() OVER (PARTITION BY your_variables ORDER BY data_date ASC) r
and then filter on r = 1; a full sketch follows.
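A minimal sketch of that rank-and-filter deletion, assuming the table has no primary key, so Postgres's system column ctid is used to pinpoint the physical rows (ROW_NUMBER is used instead of RANK so that ties on data_date are broken):
DELETE FROM table1 t
USING (
    SELECT ctid,
           ROW_NUMBER() OVER (PARTITION BY _id, name, data_type, data_value
                              ORDER BY data_date ASC) AS r
    FROM table1
) d
WHERE t.ctid = d.ctid
  AND d.r > 1;  -- keep only the earliest row of each group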

How to dynamically specify columns in the UNPIVOT operator

I currently have the following query:
WITH History AS (
SELECT
kz.*,
kz.__$operation AS operation,
map.tran_begin_time as beginT,
map.tran_end_time as endT
FROM cdc.fn_cdc_get_all_changes_dbo_EXT_GeolObject_KategZalezh(sys.fn_cdc_get_min_lsn('dbo_EXT_GeolObject_KategZalezh'), sys.fn_cdc_get_max_lsn(), 'all') AS kz
INNER JOIN [cdc].[lsn_time_mapping] map
ON kz.[__$start_lsn] = map.start_lsn
where kz.GUID_BalanceHC_Zalezh = 'DDA9AB3A-A0AF-4623-9362-0000C8C83D63'
),
UnpivotedValues AS(
SELECT guid, GUID_another, field, val, operation, beginT, endT
FROM History
UNPIVOT ( [val] FOR field IN
(
area,
oilwidthmin,
oilwidthmax,
efectivwidthmin,
efectivwidthmax,
etc...
))t
),
UnpivotedWithLastValue AS (
SELECT
*,
--Use LAG() to get the last value for the same field
LAG(val, 1) OVER (PARTITION BY guid, GUID_another, field ORDER BY BeginT) LastVal
FROM UnpivotedValues
)
SELECT * FROM UnpivotedWithLastValue WHERE val <> LastVal OR LastVal IS NULL ORDER BY guid
This query returns the changed values for a single table that has CDC (Change Data Capture) enabled.
I want to create a stored procedure that receives the columns to be unpivoted and the CDC function (e.g. cdc.fn_cdc_get_all_...) as parameters and returns the result set.
The results for these tables must be joined into one report.
In my case parameter 1 is cdc.fn_cdc_get_all_changes_dbo_EXT_GeolObject_KategZalezh(sys.fn_cdc_get_min_lsn('dbo_EXT_GeolObject_KategZalezh'), sys.fn_cdc_get_max_lsn(), 'all'). This is the CDC function.
How should I pass the list of fields that I want in the result? What should that string look like?
Also, is there a way to do this without dynamic SQL? Dynamic SQL is not the best solution for performance.
As you know, SQL Server is declarative by design and does not support macro substitution.
A static UNPIVOT would clearly be more performant, but here is a simplified example of an approach that does not require dynamic SQL, only a little XML.
Example
Let's assume your table/results look like the sample data below. Notice that we only specify the key fields to EXCLUDE in the final WHERE clause.
Declare @YourData table (ID int, Active bit, First_Name varchar(50), Last_Name varchar(50), EMail varchar(50), Salary decimal(10,2))
Insert into @YourData values
(1,1,'John','Smith','john.smith@email.com',85600),
(2,0,'Jane','Doe','jane.doe@email.com',83200)
;with cte as (
    -- Replace with your complex query
    Select * from @YourData
)
Select A.ID
      ,A.Active
      ,C.*
From cte A
Cross Apply (Select XMLData = cast((Select A.* for XML RAW) as xml)) B
Cross Apply (
    Select Item  = attr.value('local-name(.)','varchar(100)')
          ,Value = attr.value('.','varchar(max)')
    From B.XMLData.nodes('/row') C1(n)
    Cross Apply C1.n.nodes('./@*') C2(attr)
    Where attr.value('local-name(.)','varchar(100)') not in ('ID','Active')
) C
Returns:
| ID | Active | Item       | Value                |
|----|--------|------------|----------------------|
| 1  | 1      | First_Name | John                 |
| 1  | 1      | Last_Name  | Smith                |
| 1  | 1      | EMail      | john.smith@email.com |
| 1  | 1      | Salary     | 85600.00             |
| 2  | 0      | First_Name | Jane                 |
| 2  | 0      | Last_Name  | Doe                  |
| 2  | 0      | EMail      | jane.doe@email.com   |
| 2  | 0      | Salary     | 83200.00             |

How can I SUM distinct records in a Postgres database where there are duplicate records?

Imagine a table that looks like this:
| row_id | id   | total |
|--------|------|-------|
| 6395   | 1509 | 112   |
| 22986  | 1509 | 112   |
| 1393   | 3284 | 40.37 |
| 24360  | 3284 | 40.37 |
The SQL to get this data was just SELECT *
The first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue).
I'm not sure why there are duplicate rows in the database, but when I do SUM(total) it includes the second entry, even though the order ID is the same, which makes my numbers larger than if I select distinct(id), total, export to Excel, and then sum the values manually.
So my question is: how can I SUM over just the distinct order IDs so that I get the same revenue as if I exported every distinct order ID row to Excel?
Thanks in advance!
Easy - just divide by the count:
select id, sum(total) / count(id)
from orders
group by id
See live demo.
Also handles any level of duplication, eg triplicates etc.
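For example, if an order with total 112 appears twice, sum(total) is 224 and count(id) is 2, so 224 / 2 = 112, the true order total. This works because exact duplicates contribute equally to both the sum and the count.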
You can try something like this (with your example):
Table
create table test (
row_id int,
id int,
total decimal(15,2)
);
insert into test values
(6395, 1509, 112), (22986, 1509, 112),
(1393, 3284, 40.37), (24360, 3284, 40.37);
Query
with distinct_records as (
select distinct id, total from test
)
select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
on a.id = b.id
group by a.id, b.actual_total
Result
| id | actual_total | row_ids |
|------|--------------|------------|
| 1509 | 112 | 6395,22986 |
| 3284 | 40.37 | 1393,24360 |
Explanation
We do not know what the reason is for orders and totals appearing more than once with different row_id values. So, using a common table expression (CTE) introduced by the WITH clause, we get the distinct id and total pairs.
Below the CTE, we use this distinct data to do the totaling: we join the original table with the aggregation over the distinct values, and then aggregate the row_ids so the information looks cleaner.
SQLFiddle example
http://sqlfiddle.com/#!15/72639/3
Create a custom aggregate:
CREATE OR REPLACE FUNCTION sum_func (
double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE 'sql';
CREATE AGGREGATE dist_sum (
pg_catalog."any",
double precision)
(
SFUNC = sum_func,
STYPE = float8
);
And then calculate the distinct sum like:
select dist_sum(distinct id, total)
from orders
SQLFiddle
You can use DISTINCT in your aggregate functions:
SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id
Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
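With the sample data, SUM(DISTINCT total) for id 1509 returns 112 rather than 224. One caveat: it would also collapse two genuinely different rows of the same order that happen to share the same total.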
If we can trust that the total for one order is the same on every duplicated row, we could eliminate the duplicates in a sub-query by selecting the MAX of the PK id column. An example:
CREATE TABLE test2 (id int, order_id int, total int);
insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);
select order_id, sum(total)
from test2 t
join (
select max(id) as id
from test2
group by order_id) as sq
on t.id = sq.id
group by order_id
sql fiddle
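With this data the query returns (1, 50) and (2, 100): each order is counted once, because only the row with the highest id survives the sub-query.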
In difficult cases:
select
id,
(
SELECT SUM(value::int4)
FROM jsonb_each_text(jsonb_object_agg(row_id, total))
) as total
from orders
group by id
I would suggest just using a sub-query:
SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"
The above will give you the total for each id.
Use the query below if you want the grand total with every duplicate removed:
SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
Using subselect (http://sqlfiddle.com/#!7/cef1c/51):
select sum(total) from (
    select distinct id, total
    from orders
) t
Using CTE (http://sqlfiddle.com/#!7/cef1c/53):
with distinct_records as (
select distinct id, total from orders
)
select sum(total) from distinct_records;
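With the sample data (1509 at 112, 3284 at 40.37), both forms return 112 + 40.37 = 152.37, rather than the 304.74 you get when every duplicated row is summed.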

Postgresql recursive CTE results ordering

I'm working on a query to pull data out of a hierarchy, e.g.:
CREATE table org (
id INT PRIMARY KEY,
name TEXT NOT NULL,
parent_id INT);
INSERT INTO org (id, name) VALUES (0, 'top');
INSERT INTO org (id, name, parent_id) VALUES (1, 'middle1', 0);
INSERT INTO org (id, name, parent_id) VALUES (2, 'middle2', 0);
INSERT INTO org (id, name, parent_id) VALUES (3, 'bottom3', 1);
WITH RECURSIVE parent_org (id, parent_id, name) AS (
SELECT id, parent_id, name
FROM org
WHERE id = 3
UNION ALL
SELECT o.id, o.parent_id, o.name
FROM org o, parent_org po
WHERE po.parent_id = o.id)
SELECT id, parent_id, name
FROM parent_org;
It works as expected.
3 1 "bottom3"
1 0 "middle1"
0 "top"
It's also returning the data in the order that I expect, and it makes sense to me that it would do this because of the way that the results would be discovered.
The question is, can I count on the order being like this?
Yes, there is a defined order. In the Postgres WITH documentation, they give the following example:
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1,
ARRAY[ROW(g.f1, g.f2)],
false
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
path || ROW(g.f1, g.f2),
ROW(g.f1, g.f2) = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
About which they say in a Tip box (formatting mine):
The recursive query evaluation algorithm produces its output in
breadth-first search order. You can display the results in depth-first
search order by making the outer query ORDER BY a "path" column
constructed in this way.
You do appear to be getting breadth-first output in your case above based on the INSERT statements, so I would say you could, if you wanted, modify your outer SELECT to order it in another fashion.
I believe the analog for depth-first in your case would probably be this:
WITH RECURSIVE parent_org (id, parent_id, name) AS (
SELECT id, parent_id, name
FROM org
WHERE id = 3
UNION ALL
SELECT o.id, o.parent_id, o.name
FROM org o, parent_org po
WHERE po.parent_id = o.id)
SELECT id, parent_id, name
FROM parent_org
ORDER BY id;
I would expect (running things through in my head) that to yield this:
0 "top"
1 0 "middle1"
3 1 "bottom3"
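For completeness, here is a sketch that carries the recursion depth along explicitly (mirroring the depth column in the documentation's example), so the root-first ordering does not depend on the id values:
WITH RECURSIVE parent_org (id, parent_id, name, depth) AS (
    SELECT id, parent_id, name, 1
    FROM org
    WHERE id = 3
    UNION ALL
    SELECT o.id, o.parent_id, o.name, po.depth + 1
    FROM org o, parent_org po
    WHERE po.parent_id = o.id)
SELECT id, parent_id, name
FROM parent_org
ORDER BY depth DESC;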