DB2 unpivot convert columns to rows - db2

I'm asking this question with reference to the study material available at How to convert columns to rows and rows to columns. I have similar query explained in section UNPIVOTING. Here is my set up.
Table definition
CREATE TABLE MYTABLE (
ID INTEGER,
CODE_1 VARCHAR,
CODE_2 VARCHAR,
CODE_3 VARCHAR,
CODE_1_DT DATE,
CODE_2_DT DATE,
CODE_3_DT DATE,
WHO COLUMNS
);
Table Data
ID | CODE_1 | CODE_2 | CODE_3 | CODE_1_DT | CODE_2_DT | CODE_3_DT | UPDATED_BY
1 | CD1 | CD2 | CD3 | 20100101 | 20160101 | 20170101 | USER1
2 | CD1 | CD2 | CD3 | 20100101 | 20160101 | 20170101 | USER2
3 | CD1 | CD2 | CD3 | 20100101 | 20160101 | 20170101 | USER3
My SQL to convert columns to row
SELECT Q.CODE, Q.CODE_DT FROM MYTABLE AS MT,
TABLE VALUES(
(MT.CODE_1, MT.CODE_1_DT),
(MT.CODE_2, MT.CODE_2_DT),
(MT.CODE_3, MT.CODE_3_DT),
) AS Q(CODE, CODE_DT)
WHERE MT.ID=1;
Expected output is
CODE | CODE_DT
CD1 | 20100101
CD2 | 20160101
CD3 | 20170101
I'm not able to get the expected result and getting error related to cardinality or cardinality multiplier. I don't know what's going wrong or sq. is correct...any pointers?

Try this
select id1, code, date
from mytable t,
lateral (values (t.id, t.code_1, t.code_1_dt),
(t.id, t.code_2, t.code_2_dt),
(t.id, t.code_3, t.code_3_dt)
) as q (id1, code, date)

Related

How can I write a function with two tables inputs and one table output in PostgreSQL?

I want to create a function that can create a table, in which part of the columns is derived from the other two tables.
input table1:
This is a static table for each loan. Each loan has only one row with information related to that loan. For example, original unpaid balance, original interest rate...
| id | loan_age | ori_upb | ori_rate | ltv |
| --- | -------- | ------- | -------- | --- |
| 1 | 360 | 1500 | 4.5 | 0.6 |
| 2 | 360 | 2000 | 3.8 | 0.5 |
input table2:
This is a dynamic table for each loan. Each loan has seraval rows show the loan performance in each month. For example, current unpaid balance, current interest rate, delinquancy status...
| id | month| cur_upb | cur_rate |status|
| ---| --- | ------- | -------- | --- |
| 1 | 01 | 1400 | 4.5 | 0 |
| 1 | 02 | 1300 | 4.5 | 0 |
| 1 | 03 | 1200 | 4.5 | 1 |
| 2 | 01 | 2000 | 3.8 | 0 |
| 2 | 02 | 1900 | 3.8 | 0 |
| 2 | 03 | 1900 | 3.8 | 1 |
| 2 | 04 | 1900 | 3.8 | 2 |
output table:
The output table contains information from table1 and table2. Payoffupb is the last record of cur_upb in table2. This table is built for model development.
| id | loan_age | ori_upb | ori_rate | ltv | payoffmonth| payoffupb | payoffrate |lastStatus | modification |
| ---| -------- | ------- | -------- | --- | ---------- | --------- | ---------- |---------- | ------------ |
| 1 | 360 | 1500 | 4.5 | 0.6 | 03 | 1200 | 4.5 | 1 | null |
| 2 | 360 | 2000 | 3.8 | 0.5 | 04 | 1900 | 3.8 | 2 | null |
Most columns in the output table can directly get or transferred from columns in the two input tables, but some columns can not get then leave blank.
My main question is how to write a function to take two tables as inputs and output another table?
I already wrote the feature transformation part for data files in 2018, but I need to do the same thing again for data files in some other years. That's why I want to create a function to make things easier.
As you want to insert the latest entry of table2 against each entry of table1 try this
insert into table3 (id, loan_age, ori_upb, ori_rate, ltv,
payoffmonth, payoffupb, payoffrate, lastStatus )
select distinct on (t1.id)
t1.id, t1.loan_age, t1.ori_upb, t1.ori_rate, t1.ltv, t2.month, t2.cur_upb,
t2.cur_rate, t2.status
from
table1 t1
inner join
table2 t2 on t1.id=t2.id
order by t1.id , t2.month desc
DEMO1
EDIT for your updated question:
Function to do the above considering table1, table2, table3 structure will be always identical.
create or replace function insert_values(table1 varchar, table2 varchar, table3 varchar)
returns int as $$
declare
count_ int;
begin
execute format('insert into %I (id, loan_age, ori_upb, ori_rate, ltv, payoffmonth, payoffupb, payoffrate, lastStatus )
select distinct on (t1.id) t1.id, t1.loan_age, t1.ori_upb,
t1.ori_rate,t1.ltv,t2.month,t2.cur_upb, t2.cur_rate, t2.status
from %I t1 inner join %I t2 on t1.id=t2.id order by t1.id , t2.month desc',table3,table1,table2);
GET DIAGNOSTICS count_ = ROW_COUNT;
return count_;
end;
$$
language plpgsql
and call above function like below which will return the number of inserted rows:
select * from insert_values('table1','table2','table3');
DEMO2

Join and combine tables to get common rows in a specific column together in Postgres

I have a couple of tables in Postgres database. I have joined and merges the tables. However, I would like to have common values in a specific column to appear together in the final table (In the end, I would like to perform groupby and maximum value calculation on the table).
The schema of the test tables looks like this:
Schema (PostgreSQL v11)
CREATE TABLE table1 (
id CHARACTER VARYING NOT NULL,
seq CHARACTER VARYING NOT NULL
);
INSERT INTO table1 (id, seq) VALUES
('UA502', 'abcdef'), ('UA503', 'ghijk'),('UA504', 'lmnop')
;
CREATE TABLE table2 (
id CHARACTER VARYING NOT NULL,
score FLOAT
);
INSERT INTO table2 (id, score) VALUES
('UA502', 2.2), ('UA503', 2.6),('UA504', 2.8)
;
CREATE TABLE table3 (
id CHARACTER VARYING NOT NULL,
seq CHARACTER VARYING NOT NULL
);
INSERT INTO table3 (id, seq) VALUES
('UA502', 'qrst'), ('UA503', 'uvwx'),('UA504', 'yzab')
;
CREATE TABLE table4 (
id CHARACTER VARYING NOT NULL,
score FLOAT
);
INSERT INTO table4 (id, score) VALUES
('UA502', 8.2), ('UA503', 8.6),('UA504', 8.8);
;
I performed join and union and oepration of the tables to get the desired columns.
Query #1
SELECT table1.id, table1.seq, table2.score
FROM table1 INNER JOIN table2 ON table1.id = table2.id
UNION
SELECT table3.id, table3.seq, table4.score
FROM table3 INNER JOIN table4 ON table3.id = table4.id
;
The output looks like this:
| id | seq | score |
| ----- | ------ | ----- |
| UA502 | qrst | 8.2 |
| UA502 | abcdef | 2.2 |
| UA504 | yzab | 8.8 |
| UA503 | uvwx | 8.6 |
| UA504 | lmnop | 2.8 |
| UA503 | ghijk | 2.6 |
However, the desired output should be:
| id | seq | score |
| ----- | ------ | ----- |
| UA502 | qrst | 8.2 |
| UA502 | abcdef | 2.2 |
| UA504 | yzab | 8.8 |
| UA504 | lmnop | 2.8 |
| UA503 | uvwx | 8.6 |
| UA503 | ghijk | 2.6 |
View on DB Fiddle
How should I modify my query to get the desired output?

How to fill Null with the previous value in PostgreSQL?

I have a table which contains Null values. I need to replace them with a previous non-Null value.
This is an example of data which I have:
date | category | start_period | period_number |
------------------------------------------------------
2018-01-01 | A | 1 | 1 |
2018-01-02 | A | 0 | Null |
2018-01-03 | A | 0 | Null |
2018-01-04 | A | 0 | Null |
2018-01-05 | B | 1 | 2 |
2018-01-06 | B | 0 | Null |
2018-01-07 | B | 0 | Null |
2018-01-08 | A | 1 | 3 |
2018-01-09 | A | 0 | Null |
2018-01-10 | A | 0 | Null |
The result should look like this:
date | category | start_period | period_number |
------------------------------------------------------
2018-01-01 | A | 1 | 1 |
2018-01-02 | A | 0 | 1 |
2018-01-03 | A | 0 | 1 |
2018-01-04 | A | 0 | 1 |
2018-01-05 | B | 1 | 2 |
2018-01-06 | B | 0 | 2 |
2018-01-07 | B | 0 | 2 |
2018-01-08 | A | 1 | 3 |
2018-01-09 | A | 0 | 3 |
2018-01-10 | A | 0 | 3 |
I tried the following query, but in this case, only the first Null value will be replaced.
select
date,
category,
start_period,
case
when period_number isnull then lag(period_number) over()
else period_number
end as period_number
from period_table;
Also, I tried to use first_value() window function, but I don't know how to set up the correct window.
Any help is highly appreciated.
You can join table with itself and get desired value. Assuming your date column is the primary key or unique.
update your_table upd set period_number = tbl.period_number
from
(
select b.date, max(b2.date) as d2 from your_table b
inner join d_batch_tab b2 on b2.date< b.date and b2.period_number is not null
group by b.date
)t
inner join your_table tbl on tbl.date = t.d2
where t.date= upd.date
If you don't need to update the table but only a select statement then
select yt.date, yt.category, yt.start_period, tbl.period_number
from your_table yt
inner join
(
select b.date, max(b2.date) as d2 from your_table b
inner join d_batch_tab b2 on b2.date< b.date and b2.period_number is not null
group by b.date
)t on yt.date = t.date
inner join your_table tbl on tbl.date = t.d2
If you replace your case statement with:
(
select
_.period_number
from
period_table as _
where
_.period_number is not null
and _.category = period_table.category
and _.date <= period_table.date
order by
_.date desc
limit 1
) as period_number
Then it should have the intended effect. It's nowhere near as elegant as a window function but I don't think window functions are quite flexible enough for your specific use case here (Or at least, if they are, I don't know how to flex them that much)
Examples of windows function and frame clause:
select
date,category,score
,FIRST_VALUE(score) OVER (
PARTITION BY category
ORDER BY date RANGE BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW
) as last_score
from testing.rec_test
order by date, category
select
date,category,score
,LAST_VALUE(score) OVER (
PARTITION BY category
ORDER BY date RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) as last_score
from testing.rec_test
order by date, category

Dynamic column names in a postgres crosstab query

I am trying to pivot the data in a query in postgres. The query I am currently using is as follows
SELECT
product_number,
month,
sum(quantity)
FROM forecasts
WHERE date_trunc('month', extract_date) = date_trunc('month', current_date)
GROUP BY product_number, month
ORDER BY product_number, month;
The output of the query is something like what is shown below where each product will have 13 months of data.
+--------+------------+----------+
| Number | Month | Quantity |
+--------+------------+----------+
| 1 | 2016-10-01 | 7592 |
| 1 | 2016-11-01 | 6796 |
| 1 | 2016-12-01 | 6512 |
| 1 | 2017-01-01 | 6160 |
| 1 | 2017-02-01 | 6475 |
| 1 | 2017-03-01 | 6016 |
| 1 | 2017-04-01 | 6616 |
| 1 | 2017-05-01 | 6536 |
| 1 | 2017-06-01 | 6256 |
| 1 | 2017-07-01 | 6300 |
| 1 | 2017-08-01 | 5980 |
| 1 | 2017-09-01 | 5872 |
| 1 | 2017-10-01 | 5824 |
+--------+------------+----------+
I am trying to pivot the data so that it looks something like
+--------+-----------+-----------+-----------+----------+-----+
| Number | 2016-10-1 | 2016-11-1 | 2016-12-1 | 2017-1-1 | ... |
+--------+-----------+-----------+-----------+----------+-----+
| 1 | 100 | 100 | 200 | 250 | ... |
| ... | | | | | |
+--------+-----------+-----------+-----------+----------+-----+
Where all the data for each product is shown in a row for the 13 months.
I tried using a basic crosstab query
SELECT *
FROM
crosstab('SELECT product_number, month::TEXT, sum(quantity)
FROM forecasts
WHERE date_trunc(''month'', extract_date) = date_trunc(''month'', ''2016-10-1''::DATE)
GROUP BY product_number, month
ORDER BY product_number, month')
As mthreport(product_number text, m0 DATE, m1 DATE, m2 DATE,
m3 DATE, m4 DATE, m5 DATE, m6 DATE,
m7 DATE, m8 DATE, m9 DATE, m10 DATE,
m11 DATE, m12 DATE, m13 DATE)
But I get the following error
ERROR: invalid return type Detail: SQL rowid datatype does not match return rowid datatype.
If the column name were set in the crosstab i.e. if I could define and put the names into the crosstab output this works, but since the dates keep changing I am not sure how to define them
I think I missing something very basic here. Any help would be really appreciated.
Hoping, i have understood your problem correctly.
Column m1, m2 .. m13 are not of date type. These columns will contain sum of quantity. So, data type will be same as sum(quantity).
I think below query will solve your problem
SELECT *
FROM
crosstab($$SELECT product_number, month, sum(quantity)::bigint
FROM forecasts
GROUP BY product_number, month
ORDER BY product_number, month$$)
As mthreport(product_number int, m0 bigint, m1 bigint, m2 bigint,
m3 bigint, m4 bigint, m5 bigint, m6 bigint,
m7 bigint, m8 bigint, m9 bigint, m10 bigint,
m11 bigint, m12 bigint , m13 bigint)

Select all columns from two tables

Lets say I have the following:
table_a
| id | date | order_id | sku | price |
--------------------------------------------
| 10 | 2016-08-18 | 111 | ABC | 10 |
table_b
| id | date | order_id | description | type | notes | valid |
-------------------------------------------------------------------
| 50 | 2016-08-18 | 111 | test | AA | | true |
I want to get get all columns from both tables, so the resulting table looks like this:
| id | date | order_id | sku | price | description | type | notes | valid |
---------------------------------------------------------------------------------
| 10 | 2016-08-18 | 111 | ABC | 10 | | | | |
---------------------------------------------------------------------------------
| 50 | 2016-08-18 | 111 | | | test | AA | | true |
I tried union:
(
SELECT *
from table_a
where table_a.date > Date('today')
)
UNION
(
SELECT *
from table_b
where table_b.date > Date('today')
)
But I get a:
ERROR: each UNION query must have the same number of columns
How can this be fixed / is there another way to do this?
Easily :)
(
SELECT id, date, order_id, sku, price, NULL AS description, NULL AS type, NULL AS notes, NULL AS valid
from table_a
where table_a.date > Date('today')
)
UNION
(
SELECT id, date, order_id, NULL AS sku, NULL AS price, description, type, notes, valid
from table_b
where table_b.date > Date('today')
)
Alternatively, instead of UNION you can just JOIN them:
SELECT *
FROM table_a A
JOIN table_b B USING ( id )
WHERE A.date > TIMESTAMP 'TODAY'
AND B.date > TIMESTAMP 'TODAY';
See more options: https://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-JOIN