Dynamic column names in a postgres crosstab query - postgresql

I am trying to pivot the data in a query in postgres. The query I am currently using is as follows
SELECT
product_number,
month,
sum(quantity)
FROM forecasts
WHERE date_trunc('month', extract_date) = date_trunc('month', current_date)
GROUP BY product_number, month
ORDER BY product_number, month;
The output of the query is something like what is shown below where each product will have 13 months of data.
+--------+------------+----------+
| Number | Month | Quantity |
+--------+------------+----------+
| 1 | 2016-10-01 | 7592 |
| 1 | 2016-11-01 | 6796 |
| 1 | 2016-12-01 | 6512 |
| 1 | 2017-01-01 | 6160 |
| 1 | 2017-02-01 | 6475 |
| 1 | 2017-03-01 | 6016 |
| 1 | 2017-04-01 | 6616 |
| 1 | 2017-05-01 | 6536 |
| 1 | 2017-06-01 | 6256 |
| 1 | 2017-07-01 | 6300 |
| 1 | 2017-08-01 | 5980 |
| 1 | 2017-09-01 | 5872 |
| 1 | 2017-10-01 | 5824 |
+--------+------------+----------+
I am trying to pivot the data so that it looks something like
+--------+-----------+-----------+-----------+----------+-----+
| Number | 2016-10-1 | 2016-11-1 | 2016-12-1 | 2017-1-1 | ... |
+--------+-----------+-----------+-----------+----------+-----+
| 1 | 100 | 100 | 200 | 250 | ... |
| ... | | | | | |
+--------+-----------+-----------+-----------+----------+-----+
Where all the data for each product is shown in a row for the 13 months.
I tried using a basic crosstab query
SELECT *
FROM
crosstab('SELECT product_number, month::TEXT, sum(quantity)
FROM forecasts
WHERE date_trunc(''month'', extract_date) = date_trunc(''month'', ''2016-10-1''::DATE)
GROUP BY product_number, month
ORDER BY product_number, month')
As mthreport(product_number text, m0 DATE, m1 DATE, m2 DATE,
m3 DATE, m4 DATE, m5 DATE, m6 DATE,
m7 DATE, m8 DATE, m9 DATE, m10 DATE,
m11 DATE, m12 DATE, m13 DATE)
But I get the following error
ERROR: invalid return type Detail: SQL rowid datatype does not match return rowid datatype.
If the column name were set in the crosstab i.e. if I could define and put the names into the crosstab output this works, but since the dates keep changing I am not sure how to define them
I think I missing something very basic here. Any help would be really appreciated.

Hoping, i have understood your problem correctly.
Column m1, m2 .. m13 are not of date type. These columns will contain sum of quantity. So, data type will be same as sum(quantity).
I think below query will solve your problem
SELECT *
FROM
crosstab($$SELECT product_number, month, sum(quantity)::bigint
FROM forecasts
GROUP BY product_number, month
ORDER BY product_number, month$$)
As mthreport(product_number int, m0 bigint, m1 bigint, m2 bigint,
m3 bigint, m4 bigint, m5 bigint, m6 bigint,
m7 bigint, m8 bigint, m9 bigint, m10 bigint,
m11 bigint, m12 bigint , m13 bigint)

Related

SUM OVER PARTITION ON Date range

Im trying to do a cumulative sum over specific periods of time for every row in Postgres, example:
|---------------------|------------------|------------------|
| Date | Value | Employee |
|---------------------|------------------|------------------|
| 25-01-1990 | 34 | Aaron |
|---------------------|------------------|------------------|
| 15-02-1990 | 4 | Aaron |
|---------------------|------------------|------------------|
| 02-03-1990 | 3 | Aaron |
|---------------------|------------------|------------------|
| 22-05-1990 | 7 | Aaron |
|---------------------|------------------|------------------|
Expected result, taking a range of 60 days:
|---------------------|------------------|------------------|
| Date | Value | Employee |
|---------------------|------------------|------------------|
| 25-01-1990 | 34 | Aaron |
|---------------------|------------------|------------------|
| 15-02-1990 | 38 | Aaron |
|---------------------|------------------|------------------|
| 02-03-1990 | 41 | Aaron |
|---------------------|------------------|------------------|
| 01-05-1990 | 10 | Aaron |
|---------------------|------------------|------------------|
I tried with the following but the results are not correct:
WITH tab AS (SELECT * FROM table_with_values)
SELECT tab.Date, SUM(tab.Value)
FILTER (WHERE tab.Date<=tab.Date AND tab.Date >=t.Date - INTERVAL '60 DAY')
OVER(PARTITION BY tab.Employee ORDER BY tab.Date ROWS BETWEEN UNBOUND PRECEDENT AND CURRENT ROW)
AS values_cumulative, tab.Employee
FROM tab
Try this:
SELECT date, employee, sum(bvalue)
FROM (
SELECT a.*, b.date as bdate, b.value as bvalue
FROM testtable a
LEFT JOIN testtable b ON
a.employee = b.employee AND
b.date <= a.date AND
b.date >= a.date - integer '60') c
GROUP BY employee, date
ORDER BY date ASC;
date | employee | sum
------------+----------+-----
1990-01-25 | Aaron | 34
1990-02-15 | Aaron | 38
1990-03-02 | Aaron | 41
1990-05-01 | Aaron | 10
(4 Zeilen)

Computing rolling sums efficiently in PostgreSQL

Supposing I have a set of transactions (purchases) with dates for a set of customers, I want to calculate a rolling x day sum of purchase amount and number of purchases by customer in that same window. I've gotten it to work using a window function, but I have to fill in for dates where the customer did not make any purchases. In so doing, I'm using a Cartesian product. Is there a more efficient approach so that it's more scalable as the number of customers – and time window – increases?
Edit: As noted in the comments, I'm on PostgreSQL v9.3.
Here's sample data (note that some customers may have 0, 1, or multiple purchases on a given date):
| id | cust_id | txn_date | amount |
|----|---------|------------|--------|
| 1 | 123 | 2017-08-17 | 10 |
| 2 | 123 | 2017-08-17 | 5 |
| 3 | 123 | 2017-08-18 | 5 |
| 4 | 123 | 2017-08-20 | 50 |
| 5 | 123 | 2017-08-21 | 100 |
| 6 | 456 | 2017-08-01 | 5 |
| 7 | 456 | 2017-08-01 | 5 |
| 8 | 456 | 2017-08-01 | 5 |
| 9 | 456 | 2017-08-30 | 5 |
| 10 | 456 | 2017-08-01 | 1000 |
| 11 | 789 | 2017-08-15 | 1000 |
| 12 | 789 | 2017-08-30 | 1000 |
And here's the desired output:
| cust_id | txn_date | sum_dly_txns | tot_txns_7d | cnt_txns_7d |
|---------|------------|--------------|-------------|-------------|
| 123 | 2017-08-17 | 15 | 15 | 2 |
| 123 | 2017-08-18 | 5 | 20 | 3 |
| 123 | 2017-08-20 | 50 | 70 | 4 |
| 123 | 2017-08-21 | 100 | 170 | 5 |
| 456 | 2017-08-01 | 1015 | 1015 | 4 |
| 456 | 2017-08-30 | 5 | 5 | 1 |
| 789 | 2017-08-15 | 1000 | 1000 | 1 |
| 789 | 2017-08-30 | 1000 | 1000 | 1 |
Here's SQL that produces the totals as desired:
SELECT *
FROM (
-- One row per day per user
WITH daily_txns AS (
SELECT
t.cust_id
,t.txn_date AS txn_date
,SUM(t.amount) AS sum_dly_txns
,COUNT(t.id) AS cnt_dly_txns
FROM transactions t
GROUP BY t.cust_id, txn_date
),
-- Every possible transaction date for every user
dummydates AS (
SELECT txn_date, uids.cust_id
FROM (
SELECT generate_series(
timestamp '2017-08-01'
,timestamp '2017-08-30'
,interval '1 day')::date
) d(txn_date)
CROSS JOIN (SELECT DISTINCT cust_id FROM daily_txns) uids
),
txns_dummied AS (
SELECT
d.cust_id
,d.txn_date
,COALESCE(sum_dly_txns,0) AS sum_dly_txns
,COALESCE(cnt_dly_txns,0) AS cnt_dly_txns
FROM dummydates d
LEFT JOIN daily_txns dx
ON d.txn_date = dx.txn_date
AND d.cust_id = dx.cust_id
ORDER BY d.txn_date, d.cust_id
)
SELECT
cust_id
,txn_date
,sum_dly_txns
,SUM(COALESCE(sum_dly_txns,0)) OVER w AS tot_txns_7d
,SUM(cnt_dly_txns) OVER w AS cnt_txns_7d
FROM txns_dummied
WINDOW w AS (
PARTITION BY cust_id
ORDER BY txn_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW -- 7d moving window
)
ORDER BY cust_id, txn_date
) xfers
WHERE sum_dly_txns > 0 -- Omit dates with no transactions
;
SQL Fiddle
Instead of ROWS BETWEEN 6 PRECEDING AND CURRENT ROW did you want to write RANGE '6 days' PRECEEDING ?
This must be what you are looking for:
SELECT DISTINCT
cust_id
,txn_date
,SUM(amount) OVER (PARTITION BY cust_id, txn_date) sum_dly_txns
,SUM(amount) OVER (PARTITION BY cust_id ORDER BY txn_date RANGE '6 days' PRECEDING)
,COUNT(*) OVER (PARTITION BY cust_id ORDER BY txn_date RANGE '6 days' PRECEDING)
from transactions
ORDER BY cust_id, txn_date
Edit: Since you are using an old version (I tested the one above on my postgresql 11), the point above will not make much sense so you will need to old-fashioned SQL (that is, witout window functions).
It is a bit less efficient but does a fair job.
WITH daily_txns AS (
SELECT
t.cust_id
,t.txn_date AS txn_date
,SUM(t.amount) AS sum_dly_txns
,COUNT(t.id) AS cnt_dly_txns
FROM transactions t
GROUP BY t.cust_id, txn_date
)
SELECT t1.cust_id, t1.txn_date, t1.sum_dly_txns, SUM(t2.sum_dly_txns), SUM(t2.cnt_dly_txns)
from daily_txns t1
join daily_txns t2 ON t1.cust_id = t2.cust_id and t2.txn_date BETWEEN t1.txn_date - 7 and t1.txn_date
group by t1.cust_id, t1.txn_date, t1.sum_dly_txns
order by t1.cust_id, t1.txn_date

How to query just the last record of every second within a period of time in postgres

I have a table with hundreds of millions of records in 'prices' table with only four columns: uid, price, unit, dt. dt is a datetime in standard format like '2017-05-01 00:00:00.585'.
I can quite easily to select a period using
SELECT uid, price, unit from prices
WHERE dt > '2017-05-01 00:00:00.000'
AND dt < '2017-05-01 02:59:59.999'
What I can't understand how to select price for every last record in each second. (I also need a very first one of each second too, but I guess it will be a similar separate query). There are some similar example (here), but they did not work for me when I try to adapt them to my needs generating errors.
Could some please help me to crack this nut?
Let say that there is a table which has been generated with a help of this command:
CREATE TABLE test AS
SELECT timestamp '2017-09-16 20:00:00' + x * interval '0.1' second As my_timestamp
from generate_series(0,100) x
This table contains an increasing series of timestamps, each timestamp differs by 100 milliseconds (0.1 second) from neighbors, so that there are 10 records within each second.
| my_timestamp |
|------------------------|
| 2017-09-16T20:00:00Z |
| 2017-09-16T20:00:00.1Z |
| 2017-09-16T20:00:00.2Z |
| 2017-09-16T20:00:00.3Z |
| 2017-09-16T20:00:00.4Z |
| 2017-09-16T20:00:00.5Z |
| 2017-09-16T20:00:00.6Z |
| 2017-09-16T20:00:00.7Z |
| 2017-09-16T20:00:00.8Z |
| 2017-09-16T20:00:00.9Z |
| 2017-09-16T20:00:01Z |
| 2017-09-16T20:00:01.1Z |
| 2017-09-16T20:00:01.2Z |
| 2017-09-16T20:00:01.3Z |
.......
The below query determines and prints the first and the last timestamp within each second:
SELECT my_timestamp,
CASE
WHEN rn1 = 1 THEN 'First'
WHEN rn2 = 1 THEN 'Last'
ELSE 'Somwhere in the middle'
END as Which_row_within_a_second
FROM (
select *,
row_number() over( partition by date_trunc('second', my_timestamp)
order by my_timestamp
) rn1,
row_number() over( partition by date_trunc('second', my_timestamp)
order by my_timestamp DESC
) rn2
from test
) xx
WHERE 1 IN (rn1, rn2 )
ORDER BY my_timestamp
;
| my_timestamp | which_row_within_a_second |
|------------------------|---------------------------|
| 2017-09-16T20:00:00Z | First |
| 2017-09-16T20:00:00.9Z | Last |
| 2017-09-16T20:00:01Z | First |
| 2017-09-16T20:00:01.9Z | Last |
| 2017-09-16T20:00:02Z | First |
| 2017-09-16T20:00:02.9Z | Last |
| 2017-09-16T20:00:03Z | First |
| 2017-09-16T20:00:03.9Z | Last |
| 2017-09-16T20:00:04Z | First |
| 2017-09-16T20:00:04.9Z | Last |
| 2017-09-16T20:00:05Z | First |
| 2017-09-16T20:00:05.9Z | Last |
A working demo you can find here

DB2 unpivot convert columns to rows

I'm asking this question with reference to the study material available at How to convert columns to rows and rows to columns. I have similar query explained in section UNPIVOTING. Here is my set up.
Table definition
CREATE TABLE MYTABLE (
ID INTEGER,
CODE_1 VARCHAR,
CODE_2 VARCHAR,
CODE_3 VARCHAR,
CODE_1_DT DATE,
CODE_2_DT DATE,
CODE_3_DT DATE,
WHO COLUMNS
);
Table Data
ID | CODE_1 | CODE_2 | CODE_3 | CODE_1_DT | CODE_2_DT | CODE_3_DT | UPDATED_BY
1 | CD1 | CD2 | CD3 | 20100101 | 20160101 | 20170101 | USER1
2 | CD1 | CD2 | CD3 | 20100101 | 20160101 | 20170101 | USER2
3 | CD1 | CD2 | CD3 | 20100101 | 20160101 | 20170101 | USER3
My SQL to convert columns to row
SELECT Q.CODE, Q.CODE_DT FROM MYTABLE AS MT,
TABLE VALUES(
(MT.CODE_1, MT.CODE_1_DT),
(MT.CODE_2, MT.CODE_2_DT),
(MT.CODE_3, MT.CODE_3_DT),
) AS Q(CODE, CODE_DT)
WHERE MT.ID=1;
Expected output is
CODE | CODE_DT
CD1 | 20100101
CD2 | 20160101
CD3 | 20170101
I'm not able to get the expected result and getting error related to cardinality or cardinality multiplier. I don't know what's going wrong or sq. is correct...any pointers?
Try this
select id1, code, date
from mytable t,
lateral (values (t.id, t.code_1, t.code_1_dt),
(t.id, t.code_2, t.code_2_dt),
(t.id, t.code_3, t.code_3_dt)
) as q (id1, code, date)

Order by created_date if less than 1 month old, else sort by updated_date

SQL Fiddle: http://sqlfiddle.com/#!15/1da00/5
I have a table that looks something like this:
products
+-----------+-------+--------------+--------------+
| name | price | created_date | updated_date |
+-----------+-------+--------------+--------------+
| chair | 50 | 10/12/2016 | 1/4/2017 |
| desk | 100 | 11/4/2016 | 12/27/2016 |
| TV | 500 | 12/1/2016 | 1/2/2017 |
| computer | 1000 | 12/28/2016 | 1/1/2017 |
| microwave | 100 | 1/3/2017 | 1/4/2017 |
| toaster | 20 | 1/9/2017 | 1/9/2017 |
+-----------+-------+--------------+--------------+
I want to order this table in a way where if the product was created less than 30 days those results should show first (and be ordered by the updated date). If the product was created 30 or more days ago I want it to show after (and have it ordered by updated date within that group)
This is what the result should look like:
products - desired results
+-----------+-------+--------------+--------------+
| name | price | created_date | updated_date |
+-----------+-------+--------------+--------------+
| toaster | 20 | 1/9/2017 | 1/9/2017 |
| microwave | 100 | 1/3/2017 | 1/4/2017 |
| computer | 1000 | 12/28/2016 | 1/1/2017 |
| chair | 50 | 10/12/2016 | 1/4/2017 |
| TV | 500 | 12/1/2016 | 1/2/2017 |
| desk | 100 | 11/4/2016 | 12/27/2016 |
+-----------+-------+--------------+--------------+
I've started writing this query:
SELECT *,
CASE
WHEN created_date > NOW() - INTERVAL '30 days' THEN 0
ELSE 1
END AS order_index
FROM products
ORDER BY order_index, created_date DESC
but that only bring the rows with created_date less thatn 30 days to the top, and then ordered by created_date. I want to also sort the rows where order_index = 1 by updated_date
Unfortunately in version 9.3 only positional column numbers or expressions involving table columns can be used in order by so order_index is not available to case at all and its position is not well defined because it comes after * in the column list.
This will work.
order by
created_date <= ( current_date - 30 ) , case
when created_date > ( current_date - 30 ) then created_date
else updated_date end desc
Alternatively a common table expression can be used to wrap the result and then that can be ordered by any column.
WITH q AS(
SELECT *,
CASE
WHEN created_date > NOW() - INTERVAL '30 days' THEN 0
ELSE 1
END AS order_index
FROM products
)
SELECT * FROM q
ORDER BY
order_index ,
CASE order_index
WHEN 0 THEN created_date
WHEN 1 THEN updated_date
END DESC;
A third approach is to exploit nulls.
order by
case
when created_date > ( current_date - 30 ) then created_date
end desc nulls last,
updated_date desc;
This approach can be useful when the ordering columns are of different types.