Crosstab using PostgreSQL

I have a table table1 which consists of the following details.
Example:
create table table1
(
slno varchar(10),
joiningdate date,
joiningtime time
);
Inserting some rows:
insert into table1 values('a1','09-08-2011','10:00:00');
insert into table1 values('a1','09-08-2011','10:00:00');
insert into table1 values('a2','19-08-2011','11:00:00');
insert into table1 values('a2','20-08-2011','12:00:00');
Now I need to display it in the following format:
slno  joiningdate  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15  16  17  18  19  20  21  22  23
--------------------------------------------------------------------------------------------------------------
a1    09-08-2011                                       2
a2    19-08-2011                                           1
a2    20-08-2011                                               1
To get this, I have tried the following script:
select *
from crosstab('
select slno,joiningdate , to_char(joiningtime, ''HH24'') as tc, count(tc)::int
from table1
group by 1,2,3,4
order by 1,2,3,4'
,$$VALUES('01'),('02'),('03'),('04'),('05'),('06'),('07'),('08'),('09'),('10'),
('11'),('12'),('13'),('14'),('15'),('16'),('17'),('18'),('19'),('20'),
('21'),('22'),('23')$$)
as ct(slno varchar,joiningdate date,"01" int,"02" int,"03" int,"04" int,"05" int,"06" int,"07" int,"08" int,"09" int,"10" int,
"11" int,"12" int,"13" int,"14" int,"15" int,"16" int,"17" int,"18" int,"19" int,"20" int,
"21" int,"22" int, "23" int);
But I got stuck on how to count tc (the joiningtime hour) and add it to the appropriate column.

First, produce a series of rows with the hourly counts.
select
slno, joiningdate,
hour,
sum(case when extract(hour from joiningtime) = hour then 1 end)
from table1
cross join generate_series(0,23) h(hour)
group by slno, joiningdate, hour;
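As an aside, on PostgreSQL 9.4 and later the same per-hour count can be written with an aggregate FILTER clause, which some find more readable. A sketch, equivalent to the CASE form above except that count() yields 0 rather than NULL for empty hours:
select
    slno, joiningdate, hour,
    -- count(*) FILTER returns 0 (not NULL) when nothing matches;
    -- wrap it in nullif(..., 0) if you want blank cells in the crosstab
    count(*) filter (where extract(hour from joiningtime) = hour) as cnt
from table1
cross join generate_series(0,23) h(hour)
group by slno, joiningdate, hour;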
Then, because crosstab can't deal with multi-column row keys, consolidate the row key using a composite type:
CREATE TYPE ctrowid as ( slno text, joiningdate date );
select
ROW(slno, joiningdate) :: ctrowid,
hour,
sum(case when extract(hour from joiningtime) = hour then 1 end)
from table1
cross join generate_series(0,23) h(hour)
group by slno, joiningdate, hour
order by 1,2;
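For the sample data, the three rows that carry a non-null count look like this (the other 69 hour slots have a NULL sum):
      row        | hour | sum
-----------------+------+-----
 (a1,2011-08-09) |   10 |   2
 (a2,2011-08-19) |   11 |   1
 (a2,2011-08-20) |   12 |   1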
So the query produces tuples of (rowid, category, value), as required by crosstab. Then wrap it in a crosstab, e.g.
SELECT
*
FROM
crosstab('
select
ROW(slno, joiningdate)::ctrowid,
hour::text,
sum(case when extract(hour from joiningtime) = hour then 1 end)::integer
from table1
cross join generate_series(0,23) h(hour)
group by slno, joiningdate, hour
order by 1, 2
')
ct(rowid ctrowid,
   h0 integer, h1 integer, h2 integer, h3 integer, h4 integer, h5 integer,
   h6 integer, h7 integer, h8 integer, h9 integer, h10 integer, h11 integer,
   h12 integer, h13 integer, h14 integer, h15 integer, h16 integer, h17 integer,
   h18 integer, h19 integer, h20 integer, h21 integer, h22 integer, h23 integer);
producing:
rowid | h0 | h1 | h2 | h3 | h4 | h5 | h6 | h7 | h8 | h9 | h10 | h11 | h12 | h13 | h14 | h15 | h16 | h17 | h18 | h19 | h20 | h21 | h22 | h23
-----------------+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
(a1,2011-08-09) | | | | | | | | | | | 2 | | | | | | | | | | | | |
(a2,2011-08-19) | | | | | | | | | | | | 1 | | | | | | | | | | | |
(a2,2011-08-20) | | | | | | | | | | | | | 1 | | | | | | | | | | |
(3 rows)
You can then unpack the rowid into separate fields in an outer query if you want.
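For example, a sketch of that outer query, reusing the crosstab call above ((rowid).slno is PostgreSQL's syntax for reading a field out of a composite value):
SELECT (rowid).slno, (rowid).joiningdate,
       h0, h1, h2, h3, h4, h5, h6, h7, h8, h9, h10, h11,
       h12, h13, h14, h15, h16, h17, h18, h19, h20, h21, h22, h23
FROM crosstab('
    select
      ROW(slno, joiningdate)::ctrowid,
      hour::text,
      sum(case when extract(hour from joiningtime) = hour then 1 end)::integer
    from table1
    cross join generate_series(0,23) h(hour)
    group by slno, joiningdate, hour
    order by 1, 2
    ') ct(rowid ctrowid,
          h0 integer, h1 integer, h2 integer, h3 integer, h4 integer, h5 integer,
          h6 integer, h7 integer, h8 integer, h9 integer, h10 integer, h11 integer,
          h12 integer, h13 integer, h14 integer, h15 integer, h16 integer, h17 integer,
          h18 integer, h19 integer, h20 integer, h21 integer, h22 integer, h23 integer);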
Yes, the need to specify all the columns is ugly, and makes crosstab much less useful than it should be.
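One more setup note: crosstab comes from the tablefunc extension, so it has to be enabled once per database before any of the above will run (assuming you have the privilege to do so):
CREATE EXTENSION IF NOT EXISTS tablefunc;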

I'm sure there is a more efficient way of doing this, but this query should give you what you are looking for. I only wrote out a portion of the hour columns, but I'm sure you can fill in the rest.
SELECT DISTINCT ON(slno,joiningdate) slno,joiningdate,
CASE WHEN joiningtime = '01:00:00' THEN count(*) OVER (PARTITION BY joiningtime) ELSE NULL END AS "01",
CASE WHEN joiningtime = '02:00:00' THEN count(*) OVER (PARTITION BY joiningtime) ELSE NULL END AS "02",
CASE WHEN joiningtime = '03:00:00' THEN count(*) OVER (PARTITION BY joiningtime) ELSE NULL END AS "03",
CASE WHEN joiningtime = '10:00:00' THEN count(*) OVER (PARTITION BY joiningtime) ELSE NULL END AS "10",
CASE WHEN joiningtime = '11:00:00' THEN count(*) OVER (PARTITION BY joiningtime) ELSE NULL END AS "11",
CASE WHEN joiningtime = '12:00:00' THEN count(*) OVER (PARTITION BY joiningtime) ELSE NULL END AS "12"
FROM table1
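One caveat on this approach: DISTINCT ON without a matching ORDER BY returns an arbitrary row from each (slno, joiningdate) group, so the output is not deterministic. A minimal fix is to end the query with:
ORDER BY slno, joiningdate;  -- DISTINCT ON should be paired with an ORDER BY on the same leading columns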

Related

Leaderboards in PostgreSQL and getting the 2 next and 2 previous rows

We use PostgreSQL 14.1.
I have sample data that contains over 50 million records.
base table:
+------+----------+--------+--------+--------+
| id | item_id | battles| wins | damage |
+------+----------+--------+--------+--------+
| 1 | 255 | 35 | 52.08 | 1245.2 |
| 2 | 255 | 35 | 52.08 | 1245.2 |
| 3 | 255 | 35 | 52.08 | 1245.3 |
| 4 | 255 | 35 | 52.08 | 1245.3 |
| 5 | 255 | 35 | 52.09 | 1245.4 |
| 6 | 255 | 35 | 52.08 | 1245.3 |
| 7 | 255 | 35 | 52.08 | 1245.3 |
| 8 | 255 | 35 | 52.08 | 1245.7 |
| 1 | 460 | 18 | 47.35 | 1010.1 |
| 2 | 460 | 27 | 49.18 | 1518.9 |
| 3 | 460 | 16 | 50.78 | 1171.2 |
+------+----------+--------+--------+--------+
We need to get the target row number and 2 next and 2 previous rows as quickly as possible.
Indexed columns:
id
item_id
Sorting:
damage (DESC)
wins (DESC)
battles (ASC)
id (ASC)
In the example, we need to find the row number and ±2 rows where id = 4 and item_id = 255. The result table should be:
+------+----------+--------+--------+--------+------+
| id | item_id | battles| wins | damage | rank |
+------+----------+--------+--------+--------+------+
| 5 | 255 | 35 | 52.09 | 1245.4 | 2 |
| 3 | 255 | 35 | 52.08 | 1245.3 | 3 |
| 4 | 255 | 35 | 52.08 | 1245.3 | 4 |
| 6 | 255 | 35 | 52.08 | 1245.3 | 5 |
| 7 | 255 | 35 | 52.08 | 1245.3 | 6 |
+------+----------+--------+--------+--------+------+
How can I do this with the ROW_NUMBER window function?
Is there any way to optimize the query to make it faster, given that the other columns have no indexes?
CREATE OR REPLACE FUNCTION find_top(in_id integer, in_item_id integer) RETURNS TABLE (
r_id int,
r_item_id int,
r_battles int,
r_wins real,
r_damage real,
r_rank bigint,
r_eff real,
r_frags int
) AS $$
DECLARE
center_place bigint;
BEGIN
SELECT place INTO center_place FROM
(SELECT
id, item_id,
ROW_NUMBER() OVER (ORDER BY damage DESC, wins DESC, battles, id) AS place
FROM
public.my_table
WHERE
item_id = in_item_id
AND battles >= 20
) AS s
WHERE s.id = in_id;
RETURN QUERY SELECT
s.place, pt.id, pt.item_id, pt.battles, pt.wins, pt.damage
FROM
(
SELECT * FROM
(SELECT
ROW_NUMBER () OVER (ORDER BY damage DESC, wins DESC, battles, id) AS place,
id, item_id
FROM
public.my_table
WHERE
item_id = in_item_id
AND battles >= 20) x
WHERE x.place BETWEEN (center_place - 2) AND (center_place + 2)
) s
JOIN
public.my_table pt
ON pt.id = s.id AND pt.item_id = s.item_id;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION find_top(in_id integer, in_item_id integer) RETURNS TABLE (
r_id int,
r_item_id int,
r_battles int,
r_wins real,
r_damage real,
r_rank bigint,
r_eff real,
r_frags int
) AS $$
BEGIN
RETURN QUERY
SELECT c.*, b.ord - 3 AS row_number
FROM
( SELECT array_agg(id) OVER w AS id
, array_agg(item_id) OVER w AS item_id
FROM public.my_table
WINDOW w AS (ORDER BY damage DESC, wins DESC, battles, id ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
) AS a
CROSS JOIN LATERAL unnest(a.id, a.item_id) WITH ORDINALITY AS b(id, item_id, ord)
INNER JOIN public.my_table AS c
ON c.id = b.id
AND c.item_id = b.item_id
WHERE a.item_id[3] = in_item_id
AND a.id[3] = in_id
ORDER BY b.ord ;
END ; $$ LANGUAGE plpgsql;
test result in dbfiddle
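For the example in the question (id = 4 and item_id = 255), both versions are invoked the same way:
SELECT * FROM find_top(4, 255);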

How to calculate a computed column's value based on its previous value in PostgreSQL

I'm trying to calculate an adjusted cost base based on a given table's data but can't figure out how to use the previous computed value in the current row.
CREATE TABLE transactions (
datetime timestamp NOT NULL,
type varchar(25) NOT NULL,
amount INT NOT NULL,
shares INT NOT NULL,
symbol VARCHAR(20) NOT NULL
);
With data:
INSERT INTO transactions(amount, type, shares, datetime, symbol) VALUES (100, 'Buy', 10, now() - INTERVAL '14 days', 'XYZ');
INSERT INTO transactions(amount, type, shares, datetime, symbol) VALUES (330, 'Buy', 30, now() - INTERVAL '11 days', 'XYZ');
INSERT INTO transactions(amount, type, shares, datetime, symbol) VALUES (222, 'Buy', 22, now() - INTERVAL '10 days', 'XYZ');
INSERT INTO transactions(amount, type, shares, datetime, symbol) VALUES (245, 'Buy', 24, now() - INTERVAL '8 days', 'XYZ');
INSERT INTO transactions(amount, type, shares, datetime, symbol) VALUES (150, 'Sell', 15, now() - INTERVAL '7 days', 'XYZ');
INSERT INTO transactions(amount, type, shares, datetime, symbol) VALUES (210, 'Buy', 20, now() - INTERVAL '6 days', 'XYZ');
INSERT INTO transactions(amount, type, shares, datetime, symbol) VALUES (235, 'Buy', 22, now() - INTERVAL '5 days', 'XYZ');
INSERT INTO transactions(amount, type, shares, datetime, symbol) VALUES (110, 'Sell', 10, now() - INTERVAL '4 days', 'XYZ');
This is as far as I got:
WITH cte AS (
WITH shares AS (
SELECT transactions.*,
sum(CASE WHEN transactions.type = 'Sell'
THEN transactions.shares * -1 --reduction of shares
ELSE transactions.shares END)
OVER (
PARTITION BY transactions.symbol
ORDER BY transactions.symbol, transactions.datetime ROWS UNBOUNDED PRECEDING ) AS total_shares
FROM transactions)
SELECT shares.*, coalesce(lag(shares.total_shares) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) as previous_shares FROM shares)
SELECT cte.*,
CASE WHEN cte.type = 'Buy' THEN
-- [Previous total_acb] + cte.amount
ELSE
-- [Previous total_acb] * ((cte.previous_shares - shares) / cte.previous_shares)
END
AS total_acb
FROM cte
Expected result (total_acb is the value I'm trying to compute):
datetime | type | amount | shares | symbol | total_shares | previous_shares | total_acb
----------------------------+------+--------+--------+--------+--------------+-----------------+-----------
2018-01-10 14:09:38.882593 | Buy | 100 | 10 | XYZ | 10 | 0 | 100.00
2018-01-13 14:09:38.887738 | Buy | 330 | 30 | XYZ | 40 | 10 | 430.00
2018-01-14 14:09:38.890691 | Buy | 222 | 22 | XYZ | 62 | 40 | 552.00
2018-01-16 14:09:38.893328 | Buy | 245 | 24 | XYZ | 86 | 62 | 797.00
2018-01-17 14:09:38.905877 | Sell | 150 | 15 | XYZ | 71 | 86 | 657.98
2018-01-18 14:09:38.910944 | Buy | 210 | 20 | XYZ | 91 | 71 | 867.98
2018-01-19 14:09:38.915023 | Buy | 235 | 22 | XYZ | 113 | 91 | 1102.98
2018-01-20 14:09:38.917985 | Sell | 110 | 10 | XYZ | 103 | 113 | 1005.37
The easiest way to do this kind of recursive computation is a plpgsql function.
create or replace function calculate_totals()
returns table (
datetime timestamp,
type text,
amount dec,
shares dec,
symbol text,
total_shares dec,
total_acb dec)
language plpgsql as $$
declare
rec record;
curr_symbol text = '';
begin
for rec in
select *
from transactions
order by symbol, datetime
loop
if rec.symbol <> curr_symbol then
curr_symbol = rec.symbol;
total_acb = 0;
total_shares = 0;
end if;
if rec.type = 'Buy' then
total_acb = round(total_acb + rec.amount, 2);
total_shares = total_shares + rec.shares;
else
total_acb = round(total_acb * (total_shares - rec.shares) / total_shares, 2);
total_shares = total_shares - rec.shares;
end if;
select rec.datetime, rec.type, rec.amount, rec.shares, rec.symbol
into datetime, type, amount, shares, symbol;
return next;
end loop;
end $$;
The result is slightly different from the one given in the question (due to the author's mistake):
select *
from calculate_totals();
datetime | type | amount | shares | symbol | total_shares | total_acb
---------------------------+------+--------+--------+--------+--------------+-----------
2018-01-10 23:28:56.66738 | Buy | 100 | 10 | XYZ | 10 | 100.00
2018-01-13 23:28:56.66738 | Buy | 330 | 30 | XYZ | 40 | 430.00
2018-01-14 23:28:56.66738 | Buy | 222 | 22 | XYZ | 62 | 652.00
2018-01-16 23:28:56.66738 | Buy | 245 | 24 | XYZ | 86 | 897.00
2018-01-17 23:28:56.66738 | Sell | 150 | 15 | XYZ | 71 | 740.55
2018-01-18 23:28:56.66738 | Buy | 210 | 20 | XYZ | 91 | 950.55
2018-01-19 23:28:56.66738 | Buy | 235 | 22 | XYZ | 113 | 1185.55
2018-01-20 23:28:56.66738 | Sell | 110 | 10 | XYZ | 103 | 1080.63
(8 rows)

DB2 unpivot convert columns to rows

I'm asking this question with reference to the study material "How to convert columns to rows and rows to columns". I have a similar query to the one explained in the UNPIVOTING section. Here is my setup.
Table definition
CREATE TABLE MYTABLE (
ID INTEGER,
CODE_1 VARCHAR,
CODE_2 VARCHAR,
CODE_3 VARCHAR,
CODE_1_DT DATE,
CODE_2_DT DATE,
CODE_3_DT DATE,
UPDATED_BY VARCHAR -- plus the other "who" (audit) columns
);
Table Data
ID | CODE_1 | CODE_2 | CODE_3 | CODE_1_DT | CODE_2_DT | CODE_3_DT | UPDATED_BY
1 | CD1 | CD2 | CD3 | 20100101 | 20160101 | 20170101 | USER1
2 | CD1 | CD2 | CD3 | 20100101 | 20160101 | 20170101 | USER2
3 | CD1 | CD2 | CD3 | 20100101 | 20160101 | 20170101 | USER3
My SQL to convert columns to row
SELECT Q.CODE, Q.CODE_DT FROM MYTABLE AS MT,
TABLE VALUES(
(MT.CODE_1, MT.CODE_1_DT),
(MT.CODE_2, MT.CODE_2_DT),
(MT.CODE_3, MT.CODE_3_DT),
) AS Q(CODE, CODE_DT)
WHERE MT.ID=1;
Expected output is
CODE | CODE_DT
CD1 | 20100101
CD2 | 20160101
CD3 | 20170101
I'm not able to get the expected result, and I'm getting an error related to cardinality or a cardinality multiplier. I don't know what's going wrong or whether the SQL is correct... any pointers?
Try this
select id1, code, date
from mytable t,
lateral (values (t.id, t.code_1, t.code_1_dt),
(t.id, t.code_2, t.code_2_dt),
(t.id, t.code_3, t.code_3_dt)
) as q (id1, code, date)
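And if you only want the rows for a single ID, as in the attempt in the question, a sketch with the filter added back:
select code, date
from mytable t,
     lateral (values (t.code_1, t.code_1_dt),
                     (t.code_2, t.code_2_dt),
                     (t.code_3, t.code_3_dt)
             ) as q (code, date)
where t.id = 1;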

Dynamic column names in a postgres crosstab query

I am trying to pivot the data in a query in postgres. The query I am currently using is as follows
SELECT
product_number,
month,
sum(quantity)
FROM forecasts
WHERE date_trunc('month', extract_date) = date_trunc('month', current_date)
GROUP BY product_number, month
ORDER BY product_number, month;
The output of the query is something like what is shown below where each product will have 13 months of data.
+--------+------------+----------+
| Number | Month | Quantity |
+--------+------------+----------+
| 1 | 2016-10-01 | 7592 |
| 1 | 2016-11-01 | 6796 |
| 1 | 2016-12-01 | 6512 |
| 1 | 2017-01-01 | 6160 |
| 1 | 2017-02-01 | 6475 |
| 1 | 2017-03-01 | 6016 |
| 1 | 2017-04-01 | 6616 |
| 1 | 2017-05-01 | 6536 |
| 1 | 2017-06-01 | 6256 |
| 1 | 2017-07-01 | 6300 |
| 1 | 2017-08-01 | 5980 |
| 1 | 2017-09-01 | 5872 |
| 1 | 2017-10-01 | 5824 |
+--------+------------+----------+
I am trying to pivot the data so that it looks something like
+--------+-----------+-----------+-----------+----------+-----+
| Number | 2016-10-1 | 2016-11-1 | 2016-12-1 | 2017-1-1 | ... |
+--------+-----------+-----------+-----------+----------+-----+
| 1 | 100 | 100 | 200 | 250 | ... |
| ... | | | | | |
+--------+-----------+-----------+-----------+----------+-----+
Where all the data for each product is shown in a row for the 13 months.
I tried using a basic crosstab query
SELECT *
FROM
crosstab('SELECT product_number, month::TEXT, sum(quantity)
FROM forecasts
WHERE date_trunc(''month'', extract_date) = date_trunc(''month'', ''2016-10-1''::DATE)
GROUP BY product_number, month
ORDER BY product_number, month')
As mthreport(product_number text, m0 DATE, m1 DATE, m2 DATE,
m3 DATE, m4 DATE, m5 DATE, m6 DATE,
m7 DATE, m8 DATE, m9 DATE, m10 DATE,
m11 DATE, m12 DATE, m13 DATE)
But I get the following error
ERROR: invalid return type
DETAIL: SQL rowid datatype does not match return rowid datatype.
If the column names were fixed in the crosstab, i.e. if I could define the names and put them into the crosstab output, this would work; but since the dates keep changing I am not sure how to define them.
I think I'm missing something very basic here. Any help would be really appreciated.
Hoping I have understood your problem correctly:
Columns m1, m2, ..., m13 are not of date type. These columns will contain the sum of quantity, so their data type will be the same as that of sum(quantity).
I think the query below will solve your problem:
SELECT *
FROM
crosstab($$SELECT product_number, month, sum(quantity)::bigint
FROM forecasts
GROUP BY product_number, month
ORDER BY product_number, month$$)
As mthreport(product_number int, m0 bigint, m1 bigint, m2 bigint,
m3 bigint, m4 bigint, m5 bigint, m6 bigint,
m7 bigint, m8 bigint, m9 bigint, m10 bigint,
m11 bigint, m12 bigint , m13 bigint)
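One hedged addition: with the one-argument form of crosstab, a product that has no row for some month gets its later values shifted left into the wrong columns. The two-argument form takes a second query that pins down the category list, so every month lands in its own column. A sketch, assuming exactly 13 distinct months (the column definition list must have one entry per category row, plus the row key):
SELECT *
FROM crosstab(
    $$SELECT product_number, month::text, sum(quantity)::bigint
      FROM forecasts
      GROUP BY product_number, month
      ORDER BY product_number, month$$,
    $$SELECT DISTINCT month::text FROM forecasts ORDER BY 1$$)
AS mthreport(product_number int, m1 bigint, m2 bigint, m3 bigint,
             m4 bigint, m5 bigint, m6 bigint, m7 bigint,
             m8 bigint, m9 bigint, m10 bigint, m11 bigint,
             m12 bigint, m13 bigint);
The column names still have to be spelled out, which is the part crosstab cannot make dynamic; only the value-to-column matching becomes robust.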

PostgreSQL - How can I replace NULL values with values from another column based on a common unique identifier in a PSQL VIEW

I have three foreign identifiers in my PSQL view. How could I replace the NULL second_id values with the third_id values based on their common first_id?
Currently:
first_id | second_id | third_id
----------+-----------+----------
1 | | 11
1 | | 11
1 | | 11
1 | 22 | 22
2 | 33 | 33
3 | 44 | 44
4 | 55 | 55
5 | 66 | 66
6 | | 77
6 | | 77
6 | | 77
6 | | 77
6 | 88 | 88
Should be:
first_id | second_id | third_id
----------+-----------+----------
1 | 22 | 11
1 | 22 | 11
1 | 22 | 11
1 | 22 | 22
2 | 33 | 33
3 | 44 | 44
4 | 55 | 55
5 | 66 | 66
6 | 88 | 77
6 | 88 | 77
6 | 88 | 77
6 | 88 | 77
6 | 88 | 88
How can I make this change?
The NULL values in the second_id column should be filled i.e. there shouldn't be blank cells.
If the second_id column shares a value with the third_id column, this value should fill the blank cells in the second_id column.
They should both be based on their common first_id.
Thanks so much. I really appreciate it.
The second_id is really a CASE WHEN modification of the third_id. This modification is made in the view.
VIEW:
View "public.my_view"
Column | Type | Modifiers | Storage | Description
-----------------------------+-----------------------------+-----------+----------+-------------
row_number | bigint | | plain |
first_id | integer | | plain |
second_id | integer | | plain |
third_id | integer | | plain |
first_type | character varying(255) | | extended |
date_1 | timestamp without time zone | | plain |
date_2 | timestamp without time zone | | plain |
date_3 | timestamp without time zone | | plain |
View definition:
SELECT row_number() OVER (PARTITION BY t.first_id) AS row_number,
t.first_id,
CASE
WHEN t.localization_key::text = 'rq.bkd'::text THEN t.third_id
ELSE NULL::integer
END AS second_id,
t.third_id,
t.first_type,
CASE
WHEN t.localization_key::text = 'rq.bkd'::text THEN t.created_at
ELSE NULL::timestamp without time zone
END AS date_1,
CASE
WHEN t.localization_key::text = 'st.appt'::text THEN t.created_at
ELSE NULL::timestamp without time zone
END AS date_2,
CASE
WHEN t.localization_key::text = 'st.eta'::text THEN t.created_at
ELSE NULL::timestamp without time zone
END AS date_3
FROM my_table t
WHERE (t.localization_key::text = 'rq.bkd'::text OR t.localization_key::text = 'st.appt'::text OR t.localization_key::text = 'st.eta'::text) AND t.first_type::text = 'thing'::text
ORDER BY t.created_at DESC;
Here is a link to the table definition that the view is using (my_table).
https://gist.github.com/dankreiger/376f6545a0acff19536d
Thanks again for your help.
You can get it with:
select a.first_id, coalesce(a.second_id,b.second_id), a.third_id
from my_table a
left outer join
(
select first_id, second_id from my_table
where second_id is not null
) b
using (first_id)
So the update should be:
update my_table a set second_id = b.second_id
from
(
select first_id, second_id from my_table
where second_id is not null
) b
where b.first_id = a.first_id and a.second_id is null
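One caveat worth adding (assuming, as this answer does, a base table that actually has a second_id column): if a first_id ever maps to more than one distinct non-null second_id, the subquery returns several rows per first_id, which multiplies rows in the join and makes the update pick one arbitrarily. Deduplicating inside the subquery guards against the first problem:
select distinct first_id, second_id
from my_table
where second_id is not null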
You cannot UPDATE the underlying table my_table because it does not have the second_id column, so you should make the view display the data the way you want it. That is fairly straightforward with a CTE:
CREATE VIEW my_view AS
WITH second (first, id) AS (
SELECT first_id, third_id
FROM my_table
WHERE localization_key = 'rq.bkd')
SELECT
row_number() OVER (PARTITION BY t.first_id) AS row_number,
t.first_id,
s.id AS second_id,
t.third_id,
t.first_type,
CASE
WHEN t.localization_key = 'rq.bkd' THEN t.created_at
END AS date_1,
CASE
WHEN t.localization_key = 'st.appt' THEN t.created_at
END AS date_2,
CASE
WHEN t.localization_key = 'st.eta' THEN t.created_at
END AS date_3
FROM my_table t
JOIN second s ON s.first = t.first_id
WHERE (t.localization_key = 'rq.bkd'
OR t.localization_key = 'st.appt'
OR t.localization_key = 'st.eta')
AND t.first_type = 'thing'
ORDER BY t.created_at DESC;
This assumes that where my_table.localization_key = 'rq.bkd' you have exactly one third_id value per first_id; if not, you should add the appropriate qualifiers, such as ORDER BY first_id ASC NULLS LAST LIMIT 1 or some other suitable filter. Also note that the CTE is JOINed, not LEFT JOINed, on the assumption that there is always a valid pair (first_id, third_id) without NULLs.
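For instance, if duplicates are possible, the CTE could pick one deterministic third_id per first_id with DISTINCT ON; a sketch of just that part:
WITH second (first, id) AS (
    SELECT DISTINCT ON (first_id) first_id, third_id
    FROM my_table
    WHERE localization_key = 'rq.bkd'
    ORDER BY first_id, created_at DESC  -- keep the most recent third_id per first_id
)
SELECT * FROM second;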