Postgresql - Split a string by hyphen and group by the second part of the string - postgresql

I have the data stored in the below format :
resource_name | readiops | writeiops
90832-00:29:3E 3.21 4.00
90833-00:30:3E 2.12 3.45
90834-00:31:3E 2.33 2.78
90832-00:29:3E 4.21 6.00
I want to be able to do a split on resource_name column by "-" and group it by the second part of the split so that the above data looks like below :
array_serial | ldev | readiops | writeiops
90832 00:29:3E 3.21,4.21 4.00,6.00
90833 00:30:3E 2.12 3.45
90834 00:31:3E 2.33 2.78
The resource_name is split into array_serial & ldev .
i have tried using the below query just to get an error .
SELECT
SUBSTRING(resource_name, 0, STRPOS(resource_name, ':')) AS array_serial,
SUBSTRING(resource_name,1, STRPOS(resource_name, ':')) AS ldev
FROM table
GROUP BY SUBSTRING(resource_name, 0, STRPOS(resource_name, ':'))
I am new to postgres . So kindly help .

Use split_part():
with my_table(resource_name, readiops, writeiops) as (
values
('90832-00:29:3E', 3.21, 4.00),
('90833-00:30:3E', 2.12, 3.45),
('90834-00:31:3E', 2.33, 2.78),
('90832-00:29:3E', 4.21, 6.00)
)
select
split_part(resource_name::text, '-', 1) as array_serial,
split_part(resource_name::text, '-', 2) as ldev,
string_agg(readiops::text, ',') as readiops,
string_agg(writeiops::text, ',') as writeiops
from my_table
group by 1, 2;
array_serial | ldev | readiops | writeiops
--------------+----------+-----------+-----------
90832 | 00:29:3E | 3.21,4.21 | 4.00,6.00
90833 | 00:30:3E | 2.12 | 3.45
90834 | 00:31:3E | 2.33 | 2.78
(3 rows)

Related

PostgreSQL calculate with calculated value from previous rows

The problem I need to solve:
In order to calculate the number of hours per day that are used for (public) holidays or days of illness, the average working hours are used from the previous 3 months (with a starting value of 8 hours per day).
The tricky part is that the calculated value of the previous month will need to be factored in, meaning if there was a public holiday last month, which had been assigned a calculated value of 8.5 hours, these calculated hours will influence the average working hours per day for that last month, which then is being used to assigned working hours to current months' holidays.
So far I only have come up with the following, which doesn't factor in the row-by-row calculation, yet:
WITH
const (h_target, h_extra) AS (VALUES (8.0, 20)),
monthly_sums (c_month, d_work, d_off, h_work) AS (VALUES
('2018-12', 16, 5, 150.25),
('2019-01', 20, 3, 171.25),
('2019-02', 15, 5, 120.5)
),
calc AS (
SELECT
ms.*,
(ms.d_work + ms.d_off) AS d_total,
(ms.h_work + ms.d_off * const.h_target) AS h_total,
(avg((ms.h_work + ms.d_off * const.h_target) / (ms.d_work + ms.d_off))
OVER (ORDER BY ms.c_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW))::numeric(10,2)
AS h_off
FROM monthly_sums AS ms
CROSS JOIN const
)
SELECT
calc.c_month,
calc.d_work,
calc.d_off,
calc.d_total,
calc.h_work,
calc.h_off,
(d_off * lag(h_off, 1, const.h_target) OVER (ORDER BY c_month)) AS h_off_sum,
(h_work + d_off * lag(h_off, 1, const.h_target) OVER (ORDER BY c_month)) AS h_sum
FROM calc CROSS JOIN const;
...giving the following result:
c_month | d_work | d_off | d_total | h_work | h_off | h_off_sum | h_sum
---------+--------+-------+---------+--------+-------+-----------+--------
2018-12 | 16 | 5 | 21 | 150.25 | 9.06 | 40.0 | 190.25
2019-01 | 20 | 3 | 23 | 171.25 | 8.77 | 27.18 | 198.43
2019-02 | 15 | 5 | 20 | 120.5 | 8.52 | 43.85 | 164.35
(3 rows)
This calculates correctly for the first row and for the second row for columns that rely on previous row values (lag) but the average hours per day calculation is obviously wrong as I couldn't figure out how to feed the current row value (h_sum) back into the calculation for the new h_off.
The desired result should be as follows:
c_month | d_work | d_off | d_total | h_work | h_off | h_off_sum | h_sum
---------+--------+-------+---------+--------+-------+-----------+--------
2018-12 | 16 | 5 | 21 | 150.25 | 9.06 | 40.0 | 190.25
2019-01 | 20 | 3 | 23 | 171.25 | 8.84 | 27.18 | 198.43
2019-02 | 15 | 5 | 20 | 120.5 | 8.64 | 44.2 | 164.7
(3 rows)
...meaning h_off is used for next months' h_off_sum and resulting h_sum and h_sum's of available months (at most three) in turn result into the calculation of current months' h_off (essentially avg(h_sum / d_total) over up to three months).
So, actual calculation is:
c_month | calculation | h_off
---------+----------------------------------------------------+-------
| | 8.00 << initial
.---------------------- uses ---------------------^
2018-12 | ((190.25 / 21)) / 1 | 9.06
.------------ uses ---------------^
2019-01 | ((190.25 / 21) + (198.43 / 23)) / 2 | 8.84
.--- uses --------^
2019-02 | ((190.25 / 21) + (198.43 / 23) + (164.7 / 20)) / 3 | 8.64
P.S.: I am using PostgreSQL 11, so I have the latest features at hands if that makes any difference.
I wasn't able to solve that inter-column + inter-row calculation problem with the use of window functions at all and not without falling back to a special use of a recursive CTE as well as introducing special-purpose columns for the days (d_total_1) and hours (h_sum_1) of the 3rd historical month (as you cannot join in the recursive temporary table more than once).
In addition, I added a 4th row to the input data and used an additional index column which I can refer to when joining, which is usually made up with a sub-query like this:
SELECT ROW_NUMBER() OVER (ORDER BY c_month) AS row_num, * FROM monthly_sums
So, here's my take at it:
WITH RECURSIVE calc AS (
SELECT
monthly_sums.row_num,
monthly_sums.c_month,
monthly_sums.d_work,
monthly_sums.d_off,
monthly_sums.h_work,
(monthly_sums.d_off * 8)::numeric(10,2) AS h_off_sum,
monthly_sums.d_work + monthly_sums.d_off AS d_total,
0.0 AS d_total_1,
(monthly_sums.h_work + monthly_sums.d_off * 8)::numeric(10,2) AS h_sum,
0.0 AS h_sum_1,
(
(monthly_sums.h_work + monthly_sums.d_off * 8)
/
(monthly_sums.d_work + monthly_sums.d_off)
)::numeric(10,2) AS h_off
FROM
(
SELECT * FROM (VALUES
(1, '2018-12', 16, 5, 150.25),
(2, '2019-01', 20, 3, 171.25),
(3, '2019-02', 15, 5, 120.5),
(4, '2019-03', 19, 2, 131.75)
) AS tmp (row_num, c_month, d_work, d_off, h_work)
) AS monthly_sums
WHERE
monthly_sums.row_num = 1
UNION ALL
SELECT
monthly_sums.row_num,
monthly_sums.c_month,
monthly_sums.d_work,
monthly_sums.d_off,
monthly_sums.h_work,
lat_off.h_off_sum::numeric(10,2),
lat_days.d_total,
calc.d_total AS d_total_1,
lat_sum.h_sum::numeric(10,2),
calc.h_sum AS h_sum_1,
lat_calc.h_off::numeric(10,2)
FROM
(
SELECT * FROM (VALUES
(1, '2018-12', 16, 5, 150.25),
(2, '2019-01', 20, 3, 171.25),
(3, '2019-02', 15, 5, 120.5),
(4, '2019-03', 19, 2, 131.75)
) AS tmp (row_num, c_month, d_work, d_off, h_work)
) AS monthly_sums
INNER JOIN calc ON (calc.row_num = monthly_sums.row_num - 1),
LATERAL (SELECT monthly_sums.d_work + monthly_sums.d_off AS d_total) AS lat_days,
LATERAL (SELECT monthly_sums.d_off * calc.h_off AS h_off_sum) AS lat_off,
LATERAL (SELECT monthly_sums.h_work + lat_off.h_off_sum AS h_sum) AS lat_sum,
LATERAL (SELECT
(calc.h_sum_1 + calc.h_sum + lat_sum.h_sum)
/
(calc.d_total_1 + calc.d_total + lat_days.d_total)
AS h_off
) AS lat_calc
WHERE
monthly_sums.row_num > 1
)
SELECT c_month, d_work, d_off, d_total, h_work, h_off, h_off_sum, h_sum FROM calc
;
...which gives:
c_month | d_work | d_off | d_total | h_work | h_off | h_off_sum | h_sum
---------+--------+-------+---------+--------+-------+-----------+--------
2018-12 | 16 | 5 | 21 | 150.25 | 9.06 | 40.00 | 190.25
2019-01 | 20 | 3 | 23 | 171.25 | 8.83 | 27.18 | 198.43
2019-02 | 15 | 5 | 20 | 120.5 | 8.65 | 44.15 | 164.65
2019-03 | 19 | 2 | 21 | 131.75 | 8.00 | 17.30 | 149.05
(4 rows)
(PostgreSQL's default type conversion behavior is to round numeric values and so the result is slightly different than initially expected but actually correct)
Please note that PostgreSQL is generally pretty picky about data types and refuses to process queries like this whenever there is a discrepancy that could potentially lead to loss of precision (e.g. numeric vs. integer), which is why I have used explicit types for the columns in both places.
One of the final pieces of the puzzle was solved by using LATERAL subqueries, which enables me to have one calculation reference the result of a previous one and even shift around columns in the final output independent of the calculation hierarchy.
If anyone can come up with a simpler variant I'd be happy to learn about it.

CTE RECURSIVE optimization, how to?

I need to optimize the performance of a commom WITH RECURSIVE query... We can limit the depth of the tree and decompose in many updates, and can also change representation (use array)... I try some options but perhaps there are a "classic optimization solution" that I'm not realizing.
All details
There are a t_up table, to be updated, with a composit primary key (pk1,pk2), one attribute attr and an array of references to primary keys... And a unnested representation t_scan, with the references; like this:
pk1 | pk2 | attr | ref_pk1 | ref_pk2
n | 123 | 1 | |
n | 456 | 2 | |
r | 123 | 1 | w | 123
w | 123 | 5 | n | 456
r | 456 | 2 | n | 123
r | 123 | 1 | n | 111
n | 111 | 4 | |
... | ...| ... | ... | ...
There are no loops.
UPDATE t_up SET x = pairs
FROM (
WITH RECURSIVE tree as (
SELECT pk1, pk2, attr, ref_pk1, ref_pk2,
array[array[0,0]]::bigint[] as all_refs
FROM t_scan
UNION ALL
SELECT c.pk1, c.pk2, c.attr, c.ref_pk1, c.ref_pk2
,p.all_refs || array[c.attr,c.pk2]
FROM t_scan c JOIN tree p
ON c.ref_pk1=p.pk1 AND c.ref_pk2=p.pk2 AND c.pk2!=p.pk2
AND array_length(p.all_refs,1)<5 -- 5 or 6 avoiding endless loops
)
SELECT pk1, pk2, array_agg_cat(all_refs) as pairs
FROM (
SELECT distinct pk1, pk2, all_refs
FROM tree
WHERE array_length(all_refs,1)>1 -- ignores initial array[0,0].
) t
GROUP BY 1,2
ORDER BY 1,2
) rec
WHERE rec.pk1=t_up.pk1 AND rec.pk2=t_up.pk2
;
To test:
CREATE TABLE t_scan(
pk1 char,pk2 bigint, attr bigint,
ref_pk1 char, ref_pk2 bigint
);
INSERT INTO t_scan VALUES
('n',123, 1 ,NULL,NULL),
('n',456, 2 ,NULL,NULL),
('r',123, 1 ,'w' ,123),
('w',123, 5 ,'n' ,456),
('r',456, 2 ,'n' ,123),
('r',123, 1 ,'n' ,111),
('n',111, 4 ,NULL,NULL);
Running only rec you will obtain:
pk1 | pk2 | pairs
-----+-----+-----------------
r | 123 | {{0,0},{1,123}}
r | 456 | {{0,0},{2,456}}
w | 123 | {{0,0},{5,123}}
But, unfortunately, to appreciate the "Big Data performance problem", you need to see it in a real database... I am preparing a public Github that run with OpenStreetMap Big Data.

How to count rows using a variable date range provided by a table in PostgreSQL

I suspect I require some sort of windowing function to do this. I have the following item data as an example:
count | date
------+-----------
3 | 2017-09-15
9 | 2017-09-18
2 | 2017-09-19
6 | 2017-09-20
3 | 2017-09-21
So there are gaps in my data first off, and I have another query here:
select until_date, until_date - (lag(until_date) over ()) as delta_days from ranges
Which I have generated the following data:
until_date | delta_days
-----------+-----------
2017-09-08 |
2017-09-11 | 3
2017-09-13 | 2
2017-09-18 | 5
2017-09-21 | 3
2017-09-22 | 1
So I'd like my final query to produce this result:
start_date | ending_date | total_items
-----------+-------------+--------------
2017-09-08 | 2017-09-10 | 0
2017-09-11 | 2017-09-12 | 0
2017-09-13 | 2017-09-17 | 3
2017-09-18 | 2017-09-20 | 15
2017-09-21 | 2017-09-22 | 3
Which tells me the total count of items from the first table, per day, based on the custom ranges from the second table.
In this particular example, I would be summing up total_items BETWEEN start AND end (since there would be overlap on the dates, I'd subtract 1 from the end date to not count duplicates)
Anyone know how to do this?
Thanks!
Use the daterange type. Note that you do not have to calculate delta_days, just convert ranges to dataranges and use the operator <# - element is contained by.
with counts(count, date) as (
values
(3, '2017-09-15'::date),
(9, '2017-09-18'),
(2, '2017-09-19'),
(6, '2017-09-20'),
(3, '2017-09-21')
),
ranges (until_date) as (
values
('2017-09-08'::date),
('2017-09-11'),
('2017-09-13'),
('2017-09-18'),
('2017-09-21'),
('2017-09-22')
)
select daterange, coalesce(sum(count), 0) as total_items
from (
select daterange(lag(until_date) over (order by until_date), until_date)
from ranges
) s
left join counts on date <# daterange
where not lower_inf(daterange)
group by 1
order by 1;
daterange | total_items
-------------------------+-------------
[2017-09-08,2017-09-11) | 0
[2017-09-11,2017-09-13) | 0
[2017-09-13,2017-09-18) | 3
[2017-09-18,2017-09-21) | 17
[2017-09-21,2017-09-22) | 3
(5 rows)
Note, that in the dateranges above lower bounds are inclusive while upper bound are exclusive.
If you want to calculate items per day in the dateranges:
select
daterange, total_items,
round(total_items::dec/(upper(daterange)- lower(daterange)), 2) as items_per_day
from (
select daterange, coalesce(sum(count), 0) as total_items
from (
select daterange(lag(until_date) over (order by until_date), until_date)
from ranges
) s
left join counts on date <# daterange
where not lower_inf(daterange)
group by 1
) s
order by 1
daterange | total_items | items_per_day
-------------------------+-------------+---------------
[2017-09-08,2017-09-11) | 0 | 0.00
[2017-09-11,2017-09-13) | 0 | 0.00
[2017-09-13,2017-09-18) | 3 | 0.60
[2017-09-18,2017-09-21) | 17 | 5.67
[2017-09-21,2017-09-22) | 3 | 3.00
(5 rows)

How to traverse a hierarchical tree-structure structure backwards using recursive queries

I'm using PostgreSQL 9.1 to query hierarchical tree-structured data, consisting of edges (or elements) with connections to nodes. The data are actually for stream networks, but I've abstracted the problem to simple data types. Consider the example tree table. Each edge has length and area attributes, which are used to determine some useful metrics from the network.
CREATE TEMP TABLE tree (
edge text PRIMARY KEY,
from_node integer UNIQUE NOT NULL, -- can also act as PK
to_node integer REFERENCES tree (from_node),
mode character varying(5), -- redundant, but illustrative
length numeric NOT NULL,
area numeric NOT NULL,
fwd_path text[], -- optional ordered sequence, useful for debugging
fwd_search_depth integer,
fwd_length numeric,
rev_path text[], -- optional unordered set, useful for debugging
rev_search_depth integer,
rev_length numeric,
rev_area numeric
);
CREATE INDEX ON tree (to_node);
INSERT INTO tree(edge, from_node, to_node, mode, length, area) VALUES
('A', 1, 4, 'start', 1.1, 0.9),
('B', 2, 4, 'start', 1.2, 1.3),
('C', 3, 5, 'start', 1.8, 2.4),
('D', 4, 5, NULL, 1.2, 1.3),
('E', 5, NULL, 'end', 1.1, 0.9);
Which can be illustrated below, where the edges represented by A-E are connected with nodes 1-5. The NULL to_node (Ø) represents the end node. The from_node is always unique, so it can act as PK. If this network flows like a drainage basin, it would be from top to bottom, where the starting tributary edges are A, B, C and the ending outflow edge is E.
The documentation for WITH provide a nice example of how to use search graphs in recursive queries. So, to get the "forwards" information, the query starts at the end, and works backwards:
WITH RECURSIVE search_graph AS (
-- Begin at ending nodes
SELECT E.from_node, 1 AS search_depth, E.length
, ARRAY[E.edge] AS path -- optional
FROM tree E WHERE E.to_node IS NULL
UNION ALL
-- Accumulate each edge, working backwards (upstream)
SELECT o.from_node, sg.search_depth + 1, sg.length + o.length
, o.edge|| sg.path -- optional
FROM tree o, search_graph sg
WHERE o.to_node = sg.from_node
)
UPDATE tree SET
fwd_path = sg.path,
fwd_search_depth = sg.search_depth,
fwd_length = sg.length
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;
SELECT edge, from_node, to_node, fwd_path, fwd_search_depth, fwd_length
FROM tree ORDER BY edge;
edge | from_node | to_node | fwd_path | fwd_search_depth | fwd_length
------+-----------+---------+----------+------------------+------------
A | 1 | 4 | {A,D,E} | 3 | 3.4
B | 2 | 4 | {B,D,E} | 3 | 3.5
C | 3 | 5 | {C,E} | 2 | 2.9
D | 4 | 5 | {D,E} | 2 | 2.3
E | 5 | | {E} | 1 | 1.1
The above makes sense, and scales well for large networks. For example, I can see edge B is 3 edges from the end, and the forward path is {B,D,E} with a total length of 3.5 from the tip to the end.
However, I cannot figure out a good way to build a reverse query. That is, from each edge, what are the accumulated "upstream" edges, lengths and areas. Using WITH RECURSIVE, the best I have is:
WITH RECURSIVE search_graph AS (
-- Begin at starting nodes
SELECT S.from_node, S.to_node, 1 AS search_depth, S.length, S.area
, ARRAY[S.edge] AS path -- optional
FROM tree S WHERE from_node IN (
-- Starting nodes have a from_node without any to_node
SELECT from_node FROM tree EXCEPT SELECT to_node FROM tree)
UNION ALL
-- Accumulate edges, working forwards
SELECT c.from_node, c.to_node, sg.search_depth + 1, sg.length + c.length, sg.area + c.area
, c.edge || sg.path -- optional
FROM tree c, search_graph sg
WHERE c.from_node = sg.to_node
)
UPDATE tree SET
rev_path = sg.path,
rev_search_depth = sg.search_depth,
rev_length = sg.length,
rev_area = sg.area
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;
SELECT edge, from_node, to_node, rev_path, rev_search_depth, rev_length, rev_area
FROM tree ORDER BY edge;
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+----------+------------------+------------+----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {D,A} | 2 | 2.3 | 2.2
E | 5 | | {E,C} | 2 | 2.9 | 3.3
I would like to build aggregates into the second term of the recursive query, since each downstream edge connects to 1 or many upstream edges, but aggregates are not allowed with recursive queries. Also, I'm aware that the join is sloppy, since the with recursive result has multiple join conditions for edge.
The expected result for the reverse / backwards query is:
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+-------------+------------------+------------+----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {A,B,D} | 3 | 3.5 | 3.5
E | 5 | | {A,B,C,D,E} | 5 | 6.4 | 6.8
How can I write this update query?
Note that I'm ultimately more concerned about accumulating accurate length and area totals, and that the path attributes are for debugging. In my real-world case, forwards paths are up to a couple hundred, and I expect reverse paths in the tens of thousands for large and complex catchments.
UPDATE 2:
I rewrote the original recursive query so that all accumulation/aggregation is done outside the recursive part. It should perform better than the previous version of this answer.
This is very much alike the answer from #a_horse_with_no_name for a similar question.
WITH
RECURSIVE search_graph(edge, from_node, to_node, length, area, start_node) AS
(
SELECT edge, from_node, to_node, length, area, from_node AS "start_node"
FROM tree
UNION ALL
SELECT o.edge, o.from_node, o.to_node, o.length, o.area, p.start_node
FROM tree o
JOIN search_graph p ON p.from_node = o.to_node
)
SELECT array_agg(edge) AS "edges"
-- ,array_agg(from_node) AS "nodes"
,count(edge) AS "edge_count"
,sum(length) AS "length_sum"
,sum(area) AS "area_sum"
FROM search_graph
GROUP BY start_node
ORDER BY start_node
;
Results are as expected:
start_node | edges | edge_count | length_sum | area_sum
------------+-------------+------------+------------+------------
1 | {A} | 1 | 1.1 | 0.9
2 | {B} | 1 | 1.2 | 1.3
3 | {C} | 1 | 1.8 | 2.4
4 | {D,B,A} | 3 | 3.5 | 3.5
5 | {E,D,C,B,A} | 5 | 6.4 | 6.8

Equivalent to unpivot() in PostgreSQL

Is there a unpivot equivalent function in PostgreSQL?
Create an example table:
CREATE TEMP TABLE foo (id int, a text, b text, c text);
INSERT INTO foo VALUES (1, 'ant', 'cat', 'chimp'), (2, 'grape', 'mint', 'basil');
You can 'unpivot' or 'uncrosstab' using UNION ALL:
SELECT id,
'a' AS colname,
a AS thing
FROM foo
UNION ALL
SELECT id,
'b' AS colname,
b AS thing
FROM foo
UNION ALL
SELECT id,
'c' AS colname,
c AS thing
FROM foo
ORDER BY id;
This runs 3 different subqueries on foo, one for each column we want to unpivot, and returns, in one table, every record from each of the subqueries.
But that will scan the table N times, where N is the number of columns you want to unpivot. This is inefficient, and a big problem when, for example, you're working with a very large table that takes a long time to scan.
Instead, use:
SELECT id,
unnest(array['a', 'b', 'c']) AS colname,
unnest(array[a, b, c]) AS thing
FROM foo
ORDER BY id;
This is easier to write, and it will only scan the table once.
array[a, b, c] returns an array object, with the values of a, b, and c as it's elements.
unnest(array[a, b, c]) breaks the results into one row for each of the array's elements.
You could use VALUES() and JOIN LATERAL to unpivot the columns.
Sample data:
CREATE TABLE test(id int, a INT, b INT, c INT);
INSERT INTO test(id,a,b,c) VALUES (1,11,12,13),(2,21,22,23),(3,31,32,33);
Query:
SELECT t.id, s.col_name, s.col_value
FROM test t
JOIN LATERAL(VALUES('a',t.a),('b',t.b),('c',t.c)) s(col_name, col_value) ON TRUE;
DBFiddle Demo
Using this approach it is possible to unpivot multiple groups of columns at once.
EDIT
Using Zack's suggestion:
SELECT t.id, col_name, col_value
FROM test t
CROSS JOIN LATERAL (VALUES('a', t.a),('b', t.b),('c',t.c)) s(col_name, col_value);
<=>
SELECT t.id, col_name, col_value
FROM test t
,LATERAL (VALUES('a', t.a),('b', t.b),('c',t.c)) s(col_name, col_value);
db<>fiddle demo
Great article by Thomas Kellerer found here
Unpivot with Postgres
Sometimes it’s necessary to normalize de-normalized tables - the opposite of a “crosstab” or “pivot” operation. Postgres does not support an UNPIVOT operator like Oracle or SQL Server, but simulating it, is very simple.
Take the following table that stores aggregated values per quarter:
create table customer_turnover
(
customer_id integer,
q1 integer,
q2 integer,
q3 integer,
q4 integer
);
And the following sample data:
customer_id | q1 | q2 | q3 | q4
------------+-----+-----+-----+----
1 | 100 | 210 | 203 | 304
2 | 150 | 118 | 422 | 257
3 | 220 | 311 | 271 | 269
But we want the quarters to be rows (as they should be in a normalized data model).
In Oracle or SQL Server this could be achieved with the UNPIVOT operator, but that is not available in Postgres. However Postgres’ ability to use the VALUES clause like a table makes this actually quite easy:
select c.customer_id, t.*
from customer_turnover c
cross join lateral (
values
(c.q1, 'Q1'),
(c.q2, 'Q2'),
(c.q3, 'Q3'),
(c.q4, 'Q4')
) as t(turnover, quarter)
order by customer_id, quarter;
will return the following result:
customer_id | turnover | quarter
------------+----------+--------
1 | 100 | Q1
1 | 210 | Q2
1 | 203 | Q3
1 | 304 | Q4
2 | 150 | Q1
2 | 118 | Q2
2 | 422 | Q3
2 | 257 | Q4
3 | 220 | Q1
3 | 311 | Q2
3 | 271 | Q3
3 | 269 | Q4
The equivalent query with the standard UNPIVOT operator would be:
select customer_id, turnover, quarter
from customer_turnover c
UNPIVOT (turnover for quarter in (q1 as 'Q1',
q2 as 'Q2',
q3 as 'Q3',
q4 as 'Q4'))
order by customer_id, quarter;
FYI for those of us looking for how to unpivot in RedShift.
The long form solution given by Stew appears to be the only way to accomplish this.
For those who cannot see it there, here is the text pasted below:
We do not have built-in functions that will do pivot or unpivot. However,
you can always write SQL to do that.
create table sales (regionid integer, q1 integer, q2 integer, q3 integer, q4 integer);
insert into sales values (1,10,12,14,16), (2,20,22,24,26);
select * from sales order by regionid;
regionid | q1 | q2 | q3 | q4
----------+----+----+----+----
1 | 10 | 12 | 14 | 16
2 | 20 | 22 | 24 | 26
(2 rows)
pivot query
create table sales_pivoted (regionid, quarter, sales)
as
select regionid, 'Q1', q1 from sales
UNION ALL
select regionid, 'Q2', q2 from sales
UNION ALL
select regionid, 'Q3', q3 from sales
UNION ALL
select regionid, 'Q4', q4 from sales
;
select * from sales_pivoted order by regionid, quarter;
regionid | quarter | sales
----------+---------+-------
1 | Q1 | 10
1 | Q2 | 12
1 | Q3 | 14
1 | Q4 | 16
2 | Q1 | 20
2 | Q2 | 22
2 | Q3 | 24
2 | Q4 | 26
(8 rows)
unpivot query
select regionid, sum(Q1) as Q1, sum(Q2) as Q2, sum(Q3) as Q3, sum(Q4) as Q4
from
(select regionid,
case quarter when 'Q1' then sales else 0 end as Q1,
case quarter when 'Q2' then sales else 0 end as Q2,
case quarter when 'Q3' then sales else 0 end as Q3,
case quarter when 'Q4' then sales else 0 end as Q4
from sales_pivoted)
group by regionid
order by regionid;
regionid | q1 | q2 | q3 | q4
----------+----+----+----+----
1 | 10 | 12 | 14 | 16
2 | 20 | 22 | 24 | 26
(2 rows)
Hope this helps, Neil
Pulling slightly modified content from the link in the comment from #a_horse_with_no_name into an answer because it works:
Installing Hstore
If you don't have hstore installed and are running PostgreSQL 9.1+, you can use the handy
CREATE EXTENSION hstore;
For lower versions, look for the hstore.sql file in share/contrib and run in your database.
Assuming that your source (e.g., wide data) table has one 'id' column, named id_field, and any number of 'value' columns, all of the same type, the following will create an unpivoted view of that table.
CREATE VIEW vw_unpivot AS
SELECT id_field, (h).key AS column_name, (h).value AS column_value
FROM (
SELECT id_field, each(hstore(foo) - 'id_field'::text) AS h
FROM zcta5 as foo
) AS unpiv ;
This works with any number of 'value' columns. All of the resulting values will be text, unless you cast, e.g., (h).value::numeric.
Just use JSON:
with data (id, name) as (
values (1, 'a'), (2, 'b')
)
select t.*
from data, lateral jsonb_each_text(to_jsonb(data)) with ordinality as t
order by data.id, t.ordinality;
This yields
|key |value|ordinality|
|----|-----|----------|
|id |1 |1 |
|name|a |2 |
|id |2 |1 |
|name|b |2 |
dbfiddle
I wrote a horrible unpivot function for PostgreSQL. It's rather slow but it at least returns results like you'd expect an unpivot operation to.
https://cgsrv1.arrc.csiro.au/blog/2010/05/14/unpivotuncrosstab-in-postgresql/
Hopefully you can find it useful..
Depending on what you want to do... something like this can be helpful.
with wide_table as (
select 1 a, 2 b, 3 c
union all
select 4 a, 5 b, 6 c
)
select unnest(array[a,b,c]) from wide_table
You can use FROM UNNEST() array handling to UnPivot a dataset, tandem with a correlated subquery (works w/ PG 9.4).
FROM UNNEST() is more powerful & flexible than the typical method of using FROM (VALUES .... ) to unpivot datasets. This is b/c FROM UNNEST() is variadic (with n-ary arity). By using a correlated subquery the need for the lateral ORDINAL clause is eliminated, & Postgres keeps the resulting parallel columnar sets in the proper ordinal sequence.
This is, BTW, FAST -- in practical use spawning 8 million rows in < 15 seconds on a 24-core system.
WITH _students AS ( /** CTE **/
SELECT * FROM
( SELECT 'jane'::TEXT ,'doe'::TEXT , 1::INT
UNION
SELECT 'john'::TEXT ,'doe'::TEXT , 2::INT
UNION
SELECT 'jerry'::TEXT ,'roe'::TEXT , 3::INT
UNION
SELECT 'jodi'::TEXT ,'roe'::TEXT , 4::INT
) s ( fn, ln, id )
) /** end WITH **/
SELECT s.id
, ax.fanm -- field labels, now expanded to two rows
, ax.anm -- field data, now expanded to two rows
, ax.someval -- manually incl. data
, ax.rankednum -- manually assigned ranks
,ax.genser -- auto-generate ranks
FROM _students s
,UNNEST /** MULTI-UNNEST() BLOCK **/
(
( SELECT ARRAY[ fn, ln ]::text[] AS anm -- expanded into two rows by outer UNNEST()
/** CORRELATED SUBQUERY **/
FROM _students s2 WHERE s2.id = s.id -- outer relation
)
,( /** ordinal relationship preserved in variadic UNNEST() **/
SELECT ARRAY[ 'first name', 'last name' ]::text[] -- exp. into 2 rows
AS fanm
)
,( SELECT ARRAY[ 'z','x','y'] -- only 3 rows gen'd, but ordinal rela. kept
AS someval
)
,( SELECT ARRAY[ 1,2,3,4,5 ] -- 5 rows gen'd, ordinal rela. kept.
AS rankednum
)
,( SELECT ARRAY( /** you may go wild ... **/
SELECT generate_series(1, 15, 3 )
AS genser
)
)
) ax ( anm, fanm, someval, rankednum , genser )
;
RESULT SET:
+--------+----------------+-----------+----------+---------+-------
| id | fanm | anm | someval |rankednum| [ etc. ]
+--------+----------------+-----------+----------+---------+-------
| 2 | first name | john | z | 1 | .
| 2 | last name | doe | y | 2 | .
| 2 | [null] | [null] | x | 3 | .
| 2 | [null] | [null] | [null] | 4 | .
| 2 | [null] | [null] | [null] | 5 | .
| 1 | first name | jane | z | 1 | .
| 1 | last name | doe | y | 2 | .
| 1 | | | x | 3 | .
| 1 | | | | 4 | .
| 1 | | | | 5 | .
| 4 | first name | jodi | z | 1 | .
| 4 | last name | roe | y | 2 | .
| 4 | | | x | 3 | .
| 4 | | | | 4 | .
| 4 | | | | 5 | .
| 3 | first name | jerry | z | 1 | .
| 3 | last name | roe | y | 2 | .
| 3 | | | x | 3 | .
| 3 | | | | 4 | .
| 3 | | | | 5 | .
+--------+----------------+-----------+----------+---------+ ----
Here's a way that combines the hstore and CROSS JOIN approaches from other answers.
It's a modified version of my answer to a similar question, which is itself based on the method at https://blog.sql-workbench.eu/post/dynamic-unpivot/ and another answer to that question.
-- Example wide data with a column for each year...
WITH example_wide_data("id", "2001", "2002", "2003", "2004") AS (
VALUES
(1, 4, 5, 6, 7),
(2, 8, 9, 10, 11)
)
-- that is tided to have "year" and "value" columns
SELECT
id,
r.key AS year,
r.value AS value
FROM
example_wide_data w
CROSS JOIN
each(hstore(w.*)) AS r(key, value)
WHERE
-- This chooses columns that look like years
-- In other cases you might need a different condition
r.key ~ '^[0-9]{4}$';
It has a few benefits over other solutions:
By using hstore and not jsonb, it hopefully minimises issues with type conversions (although hstore does convert everything to text)
The columns don't need to be hard coded or known in advance. Here, columns are chosen by a regex on the name, but you could use any SQL logic based on the name, or even the value.
It doesn't require PL/pgSQL - it's all SQL