PostgreSQL array unique aggregation - postgresql

I have a large table with the following structure:
CREATE TABLE t (
id SERIAL primary key ,
a_list int[] not null,
b_list int[] not null,
c_list int[] not null,
d_list int[] not null,
type int not null
)
I want to query all unique values from a_list, b_list, c_list and d_list for a given type, like this:
select
some_array_unique_agg_function(a_list),
some_array_unique_agg_function(b_list),
some_array_unique_agg_function(c_list),
some_array_unique_agg_function(d_list),
count(1)
from t
where type = 30
For example, for this data
+----+---------+--------+--------+---------+------+
| id | a_list | b_list | c_list | d_list | type |
+----+---------+--------+--------+---------+------+
| 1 | {1,3,4} | {2,4} | {1,1} | {2,4,5} | 30 |
| 2 | {1,2,4} | {2,4} | {4,1} | {2,4,5} | 30 |
| 3 | {1,3,5} | {2} | {} | {2,4,5} | 30 |
+----+---------+--------+--------+---------+------+
I want the following result:
+-------------+--------+--------+-----------+-------+
| a_list | b_list | c_list | d_list | count |
+-------------+--------+--------+-----------+-------+
| {1,2,3,4,5} | {2,4} | {1,4} | {2,4,5} | 3 |
+-------------+--------+--------+-----------+-------+
Is there a function like some_array_unique_agg_function for this purpose?

Try this
with cte as (select
unnest( a_list::text[] )::integer as a_list,
unnest( b_list::text[] )::integer as b_list,
unnest( c_list::text[] )::integer as c_list,
unnest( d_list::text[] )::integer as d_list,
(select count(*) from t where type = 30) as type -- count only the rows of the requested type
from t
where type = 30
)
select array_agg(distinct a_list),array_agg(distinct b_list)
,array_agg(distinct c_list),array_agg(distinct d_list),type from cte group by type ;
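Result (in PostgreSQL 10 and later, the NULL entries come from shorter arrays being padded with NULL when several unnest() calls share one select list):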
"{1,2,3,4,5}";"{2,4,NULL}";"{1,4,NULL}";"{2,4,5}";3

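As a possible alternative, here is a minimal sketch that aggregates each column in its own scalar subquery, so arrays of different lengths never pad each other with NULL. It assumes the table t from the question and uses only the built-in unnest() and array_agg(); the alias x and the output column names are chosen here for illustration.

```sql
-- Sketch only: assumes table t from the question.
-- Each subquery unnests one array column on its own, so no NULL padding occurs.
SELECT
    (SELECT array_agg(DISTINCT x ORDER BY x)
       FROM t, unnest(t.a_list) AS x
      WHERE t.type = 30) AS a_list,
    (SELECT array_agg(DISTINCT x ORDER BY x)
       FROM t, unnest(t.b_list) AS x
      WHERE t.type = 30) AS b_list,
    (SELECT array_agg(DISTINCT x ORDER BY x)
       FROM t, unnest(t.c_list) AS x
      WHERE t.type = 30) AS c_list,
    (SELECT array_agg(DISTINCT x ORDER BY x)
       FROM t, unnest(t.d_list) AS x
      WHERE t.type = 30) AS d_list,
    -- Rows with empty arrays still count, because this subquery does not unnest.
    (SELECT count(*) FROM t WHERE type = 30) AS count;
```

On the three sample rows this should give {1,2,3,4,5}, {2,4}, {1,4}, {2,4,5} and a count of 3. A column whose arrays are all empty comes back as NULL rather than {}. Each subquery scans t separately, so on a large table an index on type is probably worth having.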
Related

How to expand columns into individual timesteps in PostgreSQL

I have a table of columns that represent a time series. The datatypes are not important, but anything after timestep2 could potentially be NULL.
| id | timestep1 | timestep2 | timestep3 | timestep4 |
|----|-----------|-----------|-----------|-----------|
| a | foo1 | bar1 | baz1 | qux1 |
| b | foo2 | bar2 | baz2 | NULL |
I am attempting to retrieve a view of the data more suitable for modeling. My modeling use-case requires that I break each time series (row) into rows representing their individual "states" at each step. That is:
| id | timestep1 | timestep2 | timestep3 | timestep4 |
|----|-----------|-----------|-----------|-----------|
| a | foo1 | NULL | NULL | NULL |
| a | foo1 | bar1 | NULL | NULL |
| a | foo1 | bar1 | baz1 | NULL |
| a | foo1 | bar1 | baz1 | qux1 |
| b | foo2 | NULL | NULL | NULL |
| b | foo2 | bar2 | NULL | NULL |
| b | foo2 | bar2 | baz2 | NULL |
How can I accomplish this in PostgreSQL?
Use UNION, which also removes the duplicate rows that arise when trailing timesteps are already NULL:
select id, timestep1, timestep2, timestep3, timestep4
from my_table
union
select id, timestep1, timestep2, timestep3, null
from my_table
union
select id, timestep1, timestep2, null, null
from my_table
union
select id, timestep1, null, null, null
from my_table
order by
id,
timestep2 nulls first,
timestep3 nulls first,
timestep4 nulls first
There is a more compact solution, maybe more convenient when dealing with a greater number of timesteps:
select distinct
id,
timestep1,
case when i > 1 then timestep2 end as timestep2,
case when i > 2 then timestep3 end as timestep3,
case when i > 3 then timestep4 end as timestep4
from my_table
cross join generate_series(1, 4) as i
order by
id,
timestep2 nulls first,
timestep3 nulls first,
timestep4 nulls first
Test it in Db<>fiddle.

PostgreSQL crosstab doesn't work as desired

In this example, I expect the resulting pivot table to have values in all 4 columns, but only 2 of them are populated.
It should have returned something like this:
| time | trace1 | trace2 | trace3 | trace4 |
| -----------------------------------------|
| t | v | v | v | v |
| t | v | v | v | null |
| t | null | v | v | v |
| t | v | v | null | v |
| t | v | null | v | v |
|------------------------------------------|
but I got this instead:
| time | trace1 | trace2 | trace3 | trace4 |
| -----------------------------------------|
| t | v | v | null | null |
| t | v | v | null | null |
| t | v | v | null | null |
| t | v | null | null | null |
| t | v | null | null | null |
|------------------------------------------|
Even worse, if I remove order by unixdatetime, everything is squashed into a single column, as below:
| time | trace1 | trace2 | trace3 | trace4 |
| -----------------------------------------|
| t | v | null | null | null |
| t | v | null | null | null |
| t | v | null | null | null |
| t | v | null | null | null |
| t | v | null | null | null |
|------------------------------------------|
Here's the code:
select *
from crosstab(
$$
select
unixdatetime,
gaugesummaryid,
value::double precision
from
(values
(1546300800,187923,1.5),
(1546387200,187923,1.5),
(1546473600,187923,1.5),
(1546560000,187923,1.75),
(1546646400,187923,1.75),
(1546732800,187923,1.75),
(1546819200,187923,1.75),
(1546905600,187923,1.5),
(1546992000,187923,1.5),
(1547078400,187923,1.5),
(1547164800,187923,1.5),
(1547337600,187924,200),
(1547424000,187924,200),
(1547510400,187924,200),
(1547596800,187924,200),
(1547683200,187924,200),
(1547769600,187924,200),
(1547856000,187924,200),
(1547942400,187924,200),
(1548028800,187924,200),
(1548115200,187924,200),
(1548201600,187924,200),
(1548288000,187924,200),
(1546300800,187926,120),
(1546387200,187926,120),
(1546473600,187926,120),
(1546560000,187926,110),
(1546646400,187926,110),
(1546732800,187926,110),
(1546819200,187926,110),
(1546905600,187926,115),
(1546992000,187926,115),
(1547078400,187926,115),
(1547942400,187927,100),
(1548028800,187927,100),
(1548115200,187927,100),
(1548201600,187927,100),
(1548288000,187927,100)
) as t (unixdatetime, gaugesummaryid, value)
order by unixdatetime
$$
) as final_result (
unixdatetime int,
trace1 double precision,
trace2 double precision,
trace3 double precision,
trace4 double precision
);
Here's the link in case you'd like to play around:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=2c4f6098fb89b78898ba1bf6afa7f439
How to get the desired result?
I would recommend using the filter (where ...) clause instead of a pivot table:
select
unixdatetime,
min(value) filter (where gaugesummaryid = 187923) as trace_1,
min(value) filter (where gaugesummaryid = 187924) as trace_2,
min(value) filter (where gaugesummaryid = 187926) as trace_3,
min(value) filter (where gaugesummaryid = 187927) as trace_4
from my_table -- placeholder name; "table" itself is a reserved word
group by 1;
Note that the filter clause can only be attached to an aggregate function. In your case there is at most one value per unixdatetime and gaugesummaryid, so it does not matter whether you use min, max, avg or sum.
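To try the filter approach without creating a table, the following sketch feeds four of the question's rows through a CTE. The CTE name readings is made up for this example, and the output columns follow the question's trace1..trace4 naming.

```sql
-- Sketch only: "readings" is a hypothetical stand-in for the real table.
WITH readings (unixdatetime, gaugesummaryid, value) AS (
    VALUES
        (1546300800, 187923, 1.5),
        (1546300800, 187926, 120),
        (1547942400, 187924, 200),
        (1547942400, 187927, 100)
)
SELECT
    unixdatetime,
    -- Each aggregate only sees the rows of one gauge; others contribute nothing.
    min(value) FILTER (WHERE gaugesummaryid = 187923) AS trace1,
    min(value) FILTER (WHERE gaugesummaryid = 187924) AS trace2,
    min(value) FILTER (WHERE gaugesummaryid = 187926) AS trace3,
    min(value) FILTER (WHERE gaugesummaryid = 187927) AS trace4
FROM readings
GROUP BY unixdatetime
ORDER BY unixdatetime;
```

This should return two rows: 1546300800 with trace1 = 1.5 and trace3 = 120, and 1547942400 with trace2 = 200 and trace4 = 100. The remaining cells are NULL, because an aggregate over zero matching rows yields NULL.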
Use the 2-argument form of the crosstab function:
SELECT *
FROM crosstab(
$$
SELECT
unixdatetime,
gaugesummaryid,
value::double precision
FROM test
ORDER BY unixdatetime
$$
, 'SELECT DISTINCT gaugesummaryid FROM test ORDER BY 1 LIMIT 4'
) as final_result (
unixdatetime int,
trace1 double precision,
trace2 double precision,
trace3 double precision,
trace4 double precision
)
yields
| unixdatetime | trace1 | trace2 | trace3 | trace4 |
|--------------+--------+--------+--------+--------|
| 1546300800 | 1.5 | | 120 | |
| 1546387200 | 1.5 | | 120 | |
| 1546473600 | 1.5 | | 120 | |
| 1546560000 | 1.75 | | 110 | |
| 1546646400 | 1.75 | | 110 | |
| 1546732800 | 1.75 | | 110 | |
| 1546819200 | 1.75 | | 110 | |
| 1546905600 | 1.5 | | 115 | |
| 1546992000 | 1.5 | | 115 | |
| 1547078400 | 1.5 | | 115 | |
| 1547164800 | 1.5 | | | |
| 1547337600 | | 200 | | |
| 1547424000 | | 200 | | |
| 1547510400 | | 200 | | |
| 1547596800 | | 200 | | |
| 1547683200 | | 200 | | |
| 1547769600 | | 200 | | |
| 1547856000 | | 200 | | |
| 1547942400 | | 200 | | 100 |
| 1548028800 | | 200 | | 100 |
| 1548115200 | | 200 | | 100 |
| 1548201600 | | 200 | | 100 |
| 1548288000 | | 200 | | 100 |
Using this setup:
DROP TABLE IF EXISTS test;
CREATE TABLE test (
unixdatetime bigint,
gaugesummaryid int,
value double precision
);
INSERT INTO test VALUES
(1546300800,187923,1.5),
(1546387200,187923,1.5),
(1546473600,187923,1.5),
(1546560000,187923,1.75),
(1546646400,187923,1.75),
(1546732800,187923,1.75),
(1546819200,187923,1.75),
(1546905600,187923,1.5),
(1546992000,187923,1.5),
(1547078400,187923,1.5),
(1547164800,187923,1.5),
(1547337600,187924,200),
(1547424000,187924,200),
(1547510400,187924,200),
(1547596800,187924,200),
(1547683200,187924,200),
(1547769600,187924,200),
(1547856000,187924,200),
(1547942400,187924,200),
(1548028800,187924,200),
(1548115200,187924,200),
(1548201600,187924,200),
(1548288000,187924,200),
(1546300800,187926,120),
(1546387200,187926,120),
(1546473600,187926,120),
(1546560000,187926,110),
(1546646400,187926,110),
(1546732800,187926,110),
(1546819200,187926,110),
(1546905600,187926,115),
(1546992000,187926,115),
(1547078400,187926,115),
(1547942400,187927,100),
(1548028800,187927,100),
(1548115200,187927,100),
(1548201600,187927,100),
(1548288000,187927,100);
Since some of the target values may be missing, you need the 2-argument form of crosstab() (as unutbu provided).
But it makes no sense to use a query producing unstable results as the 2nd parameter. Use a VALUES expression (or similar) to provide a stable set of target columns in sync with the resulting column definition list, like this:
SELECT *
FROM crosstab(
$$
SELECT *
FROM (
VALUES
(bigint '1546300800', 187923, float8 '1.5')
, (1546387200,187923,1.5)
, (1546473600,187923,1.5)
-- , ...
, (1548288000,187927,100)
) t (unixdatetime, gaugesummaryid, value)
ORDER BY 1,2
$$
, 'VALUES (187923), (187924), (187926), (187927)' -- !!
) final_result (unixdatetime int
, trace1 float8
, trace2 float8
, trace3 float8
, trace4 float8);
db<>fiddle here
Detailed explanation:
PostgreSQL Crosstab Query
It would be nice to get results for a dynamic number of target columns from a single query. Alas, SQL does not work like that. There are various workarounds. See:
Execute a dynamic crosstab query
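The difference between the two forms of crosstab() can be seen on a tiny data set. This is a minimal sketch, assuming the tablefunc extension can be installed; the three input rows and the names ct, row_id, cat_a and cat_b are invented for illustration.

```sql
-- crosstab() is provided by the tablefunc extension.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- 1-argument form: values are filled left to right within each row group,
-- so the single 'b' value of row 1 ends up in cat_a.
SELECT *
FROM crosstab(
    $$ VALUES (1, 'b', 20), (2, 'a', 10), (2, 'b', 30) ORDER BY 1, 2 $$
) AS ct (row_id int, cat_a int, cat_b int);

-- 2-argument form: the second query lists the categories in column order,
-- so each value is placed under its own category and gaps stay NULL.
SELECT *
FROM crosstab(
    $$ VALUES (1, 'b', 20), (2, 'a', 10), (2, 'b', 30) ORDER BY 1, 2 $$,
    $$ VALUES ('a'), ('b') $$
) AS ct (row_id int, cat_a int, cat_b int);
```

The first query should return (1, 20, NULL) and (2, 10, 30); the second should return (1, NULL, 20) and (2, 10, 30). The same left-to-right filling explains the question's output, where values pile up in trace1 and trace2 whenever some gauges have no reading for a timestamp.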

Valid periods - SQL VIEW

I have 2 tables (actually there are 4, but for now let's say it's 2) with data like this:
Table PersonA
ClientID ID From Till
1 10 1.1.2017 30.4.2017
1 12 1.8.2017 2.1.2018
Table PersonB
ClientID ID From Till
1 6 1.3.2017 30.6.2017
I need to generate a view that shows something like this:
ClientID From Till PersonA PersonB
1 1.1.2017 28.2.2017 10 NULL
1 1.3.2017 30.4.2017 10 6
1 1.5.2017 30.6.2017 NULL 6
1 1.8.2017 02.1.2018 12 NULL
So basically I need to create a view that shows which "persons" each client had in a given period.
When there is an overlap, the client has both PersonA and PersonB (the same should apply to PersonC and PersonD).
In the final view, one client can't have any overlapping dates.
I don't know how to approach this.
In an adaptation of this algorithm, we can already handle the overlaps:
-- Temp tables (a # name cannot be used with "declare ... table")
create table #PersonA (ClientID int, ID int, [From] date, Till date);
insert into #PersonA values (1,10,'20170101','20170430'),(1,12,'20170801','20180112');
create table #PersonB (ClientID int, ID int, [From] date, Till date);
insert into #PersonB values (1,6,'20170301','20170630');
create table #PersonC (ClientID int, ID int, [From] date, Till date);
insert into #PersonC values (1,12,'20170401','20170625');
create table #PersonD (ClientID int, ID int, [From] date, Till date);
insert into #PersonD values (1,14,'20170501','20170525'),(1,14,'20170510','20171122');
with X(ClientID,EdgeDate)
as (select ClientID
,case
when toggle = 1
then Till
else [From]
end as EdgeDate
from
(
select ClientID,[From],Till from #PersonA
union all
select ClientID,[From],Till from #PersonB
union all
select ClientID,[From],Till from #PersonC
union all
select ClientID,[From],Till from #PersonD
) as concated
cross join
(
select-1 as toggle
union all
select 1 as toggle
) as toggler
),merged
as (select distinct
S.ClientID
,S.EdgeDate as [From]
,min(E.EdgeDate) as Till
from
X as S
inner join X as E
on S.ClientID = E.ClientID
and S.EdgeDate < E.EdgeDate
group by S.ClientID
,S.EdgeDate
),prds
as (select distinct
merged.ClientID
,merged.[From]
,merged.Till
,A.ID as PersonA
,B.ID as PersonB
,C.ID as PersonC
,D.ID as PersonD
from
merged
left join #PersonA as A
on merged.ClientID = A.ClientID
and A.[From] <= merged.[From]
and merged.Till <= A.Till
left join #PersonB as B
on merged.ClientID = B.ClientID
and B.[From] <= merged.[From]
and merged.Till <= B.Till
left join #PersonC as C
on merged.ClientID = C.ClientID
and C.[From] <= merged.[From]
and merged.Till <= C.Till
left join #PersonD as D
on merged.ClientID = D.ClientID
and D.[From] <= merged.[From]
and merged.Till <= D.Till
where not(A.ID is null
and B.ID is null
and C.ID is null
and D.ID is null
)
)
select ClientID
,[From]
,case
when Till = lead([From]
) over(order by Till)
then dateadd(d,-1,Till)
else Till
end as Till
,PersonA
,PersonB
,PersonC
,PersonD
from
prds
order by ClientID
,[From]
,Till;
Output with just the two Person tables given in the question:
+----------+------------+------------+---------+---------+
| ClientID | From | Till | PersonA | PersonB |
+----------+------------+------------+---------+---------+
| 1 | 2017-01-01 | 2017-02-28 | 10 | NULL |
| 1 | 2017-03-01 | 2017-04-29 | 10 | 6 |
| 1 | 2017-04-30 | 2017-06-30 | NULL | 6 |
| 1 | 2017-08-01 | 2018-01-12 | 12 | NULL |
+----------+------------+------------+---------+---------+
Output of script as it is above, with four Person tables:
+----------+------------+------------+---------+---------+---------+---------+
| ClientID | From | Till | PersonA | PersonB | PersonC | PersonD |
+----------+------------+------------+---------+---------+---------+---------+
| 1 | 2017-01-01 | 2017-02-28 | 10 | NULL | NULL | NULL |
| 1 | 2017-03-01 | 2017-03-31 | 10 | 6 | NULL | NULL |
| 1 | 2017-04-01 | 2017-04-29 | 10 | 6 | 12 | NULL |
| 1 | 2017-04-30 | 2017-04-30 | NULL | 6 | 12 | NULL |
| 1 | 2017-05-01 | 2017-05-09 | NULL | 6 | 12 | 14 |
| 1 | 2017-05-10 | 2017-05-24 | NULL | 6 | 12 | 14 |
| 1 | 2017-05-25 | 2017-06-24 | NULL | 6 | 12 | 14 |
| 1 | 2017-06-25 | 2017-06-29 | NULL | 6 | NULL | 14 |
| 1 | 2017-06-30 | 2017-07-31 | NULL | NULL | NULL | 14 |
| 1 | 2017-08-01 | 2017-11-21 | 12 | NULL | NULL | 14 |
| 1 | 2017-11-22 | 2018-01-12 | 12 | NULL | NULL | NULL |
+----------+------------+------------+---------+---------+---------+---------+

Select all columns from two tables

Let's say I have the following:
table_a
| id | date | order_id | sku | price |
--------------------------------------------
| 10 | 2016-08-18 | 111 | ABC | 10 |
table_b
| id | date | order_id | description | type | notes | valid |
-------------------------------------------------------------------
| 50 | 2016-08-18 | 111 | test | AA | | true |
I want to get all columns from both tables, so that the resulting table looks like this:
| id | date | order_id | sku | price | description | type | notes | valid |
---------------------------------------------------------------------------------
| 10 | 2016-08-18 | 111 | ABC | 10 | | | | |
---------------------------------------------------------------------------------
| 50 | 2016-08-18 | 111 | | | test | AA | | true |
I tried union:
(
SELECT *
from table_a
where table_a.date > Date('today')
)
UNION
(
SELECT *
from table_b
where table_b.date > Date('today')
)
But I get a:
ERROR: each UNION query must have the same number of columns
How can this be fixed / is there another way to do this?
Easily :)
(
SELECT id, date, order_id, sku, price, NULL AS description, NULL AS type, NULL AS notes, NULL AS valid
from table_a
where table_a.date > Date('today')
)
UNION
(
SELECT id, date, order_id, NULL AS sku, NULL AS price, description, type, notes, valid
from table_b
where table_b.date > Date('today')
)
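Alternatively, if you want matching rows side by side rather than stacked, you can JOIN them instead of using UNION (on order_id here, since the id values differ between the two tables):
SELECT *
FROM table_a A
JOIN table_b B USING ( order_id )
WHERE A.date > TIMESTAMP 'TODAY'
AND B.date > TIMESTAMP 'TODAY';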
See more options: https://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-JOIN

Recursive SQL PostgreSQL Empty Result Set

The categories table:
=# \d
List of relations
Schema | Name | Type | Owner
--------+-------------+-------+-------
public | categories | table | pgsql
public | products | table | pgsql
public | ticketlines | table | pgsql
(3 rows)
Contents of categories:
=# select * from categories;
id | name | parentid
----+--------+----------
1 | Rack |
2 | Women | 1
3 | Shorts | 2
4 | Wares |
5 | Toys | 4
6 | Trucks | 5
(6 rows)
Running the following query:
WITH RECURSIVE nodes_cte(name, id, parentid, depth, path) AS (
-- Base case?
SELECT c.name,
c.id,
c.parentid,
1::INT AS depth,
c.id::TEXT AS path
FROM categories c
WHERE c.parentid = ''
UNION ALL
-- nth case
SELECT c.name,
c.id,
c.parentid,
n.depth + 1 AS depth,
(n.path || '->' || c.id::TEXT)
FROM nodes_cte n
JOIN categories c on n.id = c.parentid
)
SELECT * FROM nodes_cte AS n GROUP BY n.name, n.id, n.parentid, n.depth, n.path ORDER BY n.id ASC
;
yields these results:
name | id | parentid | depth | path
--------+----+----------+-------+---------
Rack | 1 | | 1 | 1
Women | 2 | 1 | 2 | 1->2
Shorts | 3 | 2 | 3 | 1->2->3
Wares | 4 | | 1 | 4
Toys | 5 | 4 | 2 | 4->5
Trucks | 6 | 5 | 3 | 4->5->6
(6 rows)
Great!
But given a similar table (categories):
=# \d categories
Table "public.categories"
Column | Type | Modifiers
----------+-------------------+-----------
id | character varying | not null
name | character varying | not null
parentid | character varying |
image | bytea |
Indexes:
"categories_pkey" PRIMARY KEY, btree (id)
"categories_name_inx" UNIQUE, btree (name)
Referenced by:
TABLE "products" CONSTRAINT "products_fk_1" FOREIGN KEY (category) REFERENCES categories(id)
=# select * from categories;
id | name | parentid | image
--------------------------------------+-------+--------------------------------------+-------
611572c9-326d-4cf9-ae4a-af5269fc788e | Rack | |
22d15300-40b5-4f43-a8d1-902b8d4c5409 | Women | 611572c9-326d-4cf9-ae4a-af5269fc788e |
6b061073-96f4-49a1-9205-bab7c878f0cf | Wares | |
3f018dfb-e6ee-40d1-9dbc-31e6201e7625 | Toys | 6b061073-96f4-49a1-9205-bab7c878f0cf |
(4 rows)
the same query produces zero rows.
Why?
Is it something to do with primary / foreign keys?
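The root rows in the second table have a NULL parentid rather than an empty string, and NULL = '' is never true, so the base case selects nothing and the recursion has no rows to start from. Changing the base-case condition to
WHERE COALESCE(c.parentid, '') = ''
worked. Thank you.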
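For reference, here is a self-contained sketch of the same recursive query with a NULL-safe base case. The inline categories data and the short ids c1..c4 are hypothetical stand-ins for the UUID rows above, so the query can be run without the real table.

```sql
-- Sketch only: the inline "categories" CTE replaces the real table.
WITH RECURSIVE categories (id, name, parentid) AS (
    VALUES ('c1', 'Rack',  NULL::text),
           ('c2', 'Women', 'c1'),
           ('c3', 'Wares', NULL),
           ('c4', 'Toys',  'c3')
),
nodes_cte (name, id, parentid, depth, path) AS (
    -- Base case: roots have no parent, stored either as NULL or as ''.
    SELECT c.name, c.id, c.parentid, 1, c.id
    FROM categories c
    WHERE c.parentid IS NULL OR c.parentid = ''
    UNION ALL
    -- Recursive case: attach each child to the path of its parent.
    SELECT c.name, c.id, c.parentid, n.depth + 1, n.path || '->' || c.id
    FROM nodes_cte n
    JOIN categories c ON c.parentid = n.id
)
SELECT * FROM nodes_cte ORDER BY path;
```

This should return four rows: Rack and Wares at depth 1 with paths c1 and c3, and Women and Toys at depth 2 with paths c1->c2 and c3->c4. Writing the test as IS NULL OR = '' behaves the same as the COALESCE form; which one reads better is a matter of taste.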