Linear regression with postgres - postgresql

I use Postgres and I have a large number of rows, each with a value and a date per station.
(Dates can be separated by several days.)
id | value | idstation | udate
--------+-------+-----------+-----
1 | 5 | 12 | 1984-02-11 00:00:00
2 | 7 | 12 | 1984-02-17 00:00:00
3 | 8 | 12 | 1984-02-21 00:00:00
4 | 9 | 12 | 1984-02-23 00:00:00
5 | 4 | 12 | 1984-02-24 00:00:00
6 | 8 | 12 | 1984-02-28 00:00:00
7 | 9 | 14 | 1984-02-21 00:00:00
8 | 15 | 15 | 1984-02-21 00:00:00
9 | 14 | 18 | 1984-02-21 00:00:00
10 | 200 | 19 | 1984-02-21 00:00:00
Forgive what may be a silly question, but I'm not much of a database guru.
Is it possible to write a single SQL query that calculates a linear regression per station for each date, where the regression uses only the current row, the previous row and the next row?
For example, the linear regression for id 2 must be calculated from the values 7 (current), 5 (previous) and 8 (next), for the dates 1984-02-17, 1984-02-11 and 1984-02-21.
Edit: I have to use regr_intercept(value, udate), but I really don't know how to do this when only the current, previous and next value/date may be used for each row.
Edit2: 3 rows added to idstation 12; ids and dates have changed.
Hope you can help me, thank you!

This is the combination of Joop's statistics and Denis's window functions:
WITH num AS (
SELECT id, idstation
, (udate - '1984-01-01'::date) as idate -- count in days since Jan 1984
, value AS value
FROM thedata
)
-- id + the ids of the {prev,next} records
-- within the same idstation group
, drag AS (
SELECT id AS center
, LAG(id) OVER www AS prev
, LEAD(id) OVER www AS next
FROM thedata
WINDOW www AS (partition by idstation ORDER BY id)
)
-- junction CTE between ID and its three feeders
, tri AS (
SELECT center AS this, center AS that FROM drag
UNION ALL SELECT center AS this , prev AS that FROM drag
UNION ALL SELECT center AS this , next AS that FROM drag
)
SELECT t.this, n.idstation
, regr_intercept(value,idate) AS intercept
, regr_slope(value,idate) AS slope
, regr_r2(value,idate) AS rsq
, regr_avgx(value,idate) AS avgx
, regr_avgy(value,idate) AS avgy
FROM num n
JOIN tri t ON t.that = n.id
GROUP BY t.this, n.idstation
;
Results:
INSERT 0 7
this | idstation | intercept | slope | rsq | avgx | avgy
------+-----------+-------------------+-------------------+-------------------+------------------+------------------
1 | 12 | -46 | 1 | 1 | 52 | 6
2 | 12 | -24.2105263157895 | 0.578947368421053 | 0.909774436090226 | 53.3333333333333 | 6.66666666666667
3 | 12 | -10.6666666666667 | 0.333333333333333 | 1 | 54.5 | 7.5
4 | 14 | | | | 51 | 9
5 | 15 | | | | 51 | 15
6 | 18 | | | | 51 | 14
7 | 19 | | | | 51 | 200
(7 rows)
The clustering of the group-of-three can probably be done more elegantly using a rank() or row_number() function, which would also allow larger sliding windows to be used.
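As a cross-check of the numbers above, here is a minimal Python sketch (not part of the original answer) that repeats the same three-point sliding regression outside the database. It uses the answer's test rows; the helper names `fit` and `sliding_fits` are made up for illustration, and rows are ordered by id within each station, as in the `www` window.

```python
from datetime import date

# (id, value, idstation, udate): the test rows used by the query above
rows = [
    (1, 5, 12, date(1984, 2, 21)),
    (2, 7, 12, date(1984, 2, 23)),
    (3, 8, 12, date(1984, 2, 26)),
    (4, 9, 14, date(1984, 2, 21)),
    (5, 15, 15, date(1984, 2, 21)),
    (6, 14, 18, date(1984, 2, 21)),
    (7, 200, 19, date(1984, 2, 21)),
]

def fit(points):
    """Least-squares (slope, intercept); (None, None) when x has no spread."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None, None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sliding_fits(rows, epoch=date(1984, 1, 1)):
    """Fit each row together with its previous and next row in the same station."""
    by_station = {}
    for rid, value, station, udate in sorted(rows):
        days = (udate - epoch).days  # same idea as idate in the SQL
        by_station.setdefault(station, []).append((rid, days, value))
    result = {}
    for series in by_station.values():
        for i, (rid, _, _) in enumerate(series):
            window = series[max(i - 1, 0): i + 2]  # prev, current, next
            result[rid] = fit([(x, y) for _, x, y in window])
    return result

fits = sliding_fits(rows)
print(fits[2])  # slope and intercept for id 2
```

Stations with a single row have no spread in x, which is why the SQL output shows empty slope and intercept columns for them.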

DROP SCHEMA zzz CASCADE;
CREATE SCHEMA zzz ;
SET search_path=zzz;
CREATE TABLE thedata
( id INTEGER NOT NULL PRIMARY KEY
, value INTEGER NOT NULL
, idstation INTEGER NOT NULL
, udate DATE NOT NULL
);
INSERT INTO thedata(id,value,idstation,udate) VALUES
(1 ,5 ,12 ,'1984-02-21' )
,(2 ,7 ,12 ,'1984-02-23' )
,(3 ,8 ,12 ,'1984-02-26' )
,(4 ,9 ,14 ,'1984-02-21' )
,(5 ,15 ,15 ,'1984-02-21' )
,(6 ,14 ,18 ,'1984-02-21' )
,(7 ,200 ,19 ,'1984-02-21' )
;
WITH a AS (
SELECT idstation
, (udate - '1984-01-01'::date) as idate -- count in days since Jan 1984
, value AS value
FROM thedata
)
SELECT idstation
, regr_intercept(value,idate) AS intercept
, regr_slope(value,idate) AS slope
, regr_r2(value,idate) AS rsq
, regr_avgx(value,idate) AS avgx
, regr_avgy(value,idate) AS avgy
FROM a
GROUP BY idstation
;
output:
idstation | intercept | slope | rsq | avgx | avgy
-----------+-------------------+-------------------+-------------------+------------------+------------------
15 | | | | 51 | 15
14 | | | | 51 | 9
19 | | | | 51 | 200
12 | -24.2105263157895 | 0.578947368421053 | 0.909774436090226 | 53.3333333333333 | 6.66666666666667
18 | | | | 51 | 14
(5 rows)
Note: if you want a spline-like regression you should also use the lag() and lead() window functions, like in Denis's answer.

If the average is good enough for you, you could use the built-in avg aggregate. Something like
SELECT avg("value") FROM "my_table" WHERE "idstation" = 3;
Should do. For more complicated things you will need to write a PL/pgSQL function, I'm afraid, or look for an add-on for PostgreSQL.

Look into window functions. If I get your question correctly, lead() and lag() will likely give you precisely what you want. Example usage:
select idstation as idstation,
id as curr_id,
udate as curr_date,
lag(id) over w as prev_id,
lag(udate) over w as prev_date,
lead(id) over w as next_id,
lead(udate) over w as next_date
from dates
window w as (
partition by idstation order by udate, id
)
order by idstation, udate, id
http://www.postgresql.org/docs/current/static/tutorial-window.html

Related

Month Not Printing When No Transaction In Particular Month

I have written a query that reports employee attrition details, showing the month-wise employee counts for opening, joined, left and closing.
The issue is that when a month has no value in any of those four columns, the month is not generated at all.
Please suggest a solution.
OUTPUT:
yyear | mmonth | charmonth | opening | incoming | relived | closing
-------+--------+-----------+---------+----------+---------+---------
2018 | 4 | Apr-18 | 14 | 2 | 0 | 16
2018 | 5 | May-18 | 16 | 1 | 0 | 17
2018 | 8 | Aug-18 | 17 | 3 | 0 | 20
2018 | 9 | Sep-18 | 20 | 1 | 0 | 21
2018 | 10 | Oct-18 | 21 | 23 | 4 | 40
2018 | 11 | Nov-18 | 40 | 5 | 1 | 44
2018 | 12 | Dec-18 | 44 | 2 | 0 | 46
2019 | 1 | Jan-19 | 46 | 1 | 0 | 47
2019 | 2 | Feb-19 | 47 | 1 | 0 | 48
2019 | 3 | Mar-19 | 48 | 6 | 1 | 53
2019 | 4 | Apr-19 | 53 | 1 | 0 | 54
2019 | 5 | May-19 | 54 | 3 | 1 | 56
2019 | 6 | Jun-19 | 56 | 2 | 0 | 58
(13 rows)
If you look at the sequence of months, Jun-18 and Jul-18 are missing.
Code:
WITH table_1 AS (
select
startdate as ddate,
enddate as lastday,
extract('month' from startdate) as mmonth,
extract('year' from startdate) as yyear,
to_char(to_timestamp(startdate),'Mon-YY') as months
from shr_period
where startdate >= DATE('2018-01-01')
and enddate <= DATE('2019-07-01')
and ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
)
SELECT
table_1.yyear,
table_1.mmonth,
table_1.months as charmonth,
(SELECT
COUNT(*)
FROM shr_emp_job OPENING
WHERE OPENING.dateofjoining < table_1.ddate
and OPENING.relieveddate is null
and OPENING.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
) AS OPENING,
count(*) as incoming,
(select count(*)
from shr_emp_job rel
where rel.relieveddate is not null
and rel.dateofjoining <= table_1.lastday
and rel.dateofjoining >= table_1.ddate
and rel.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
) as relived,
(SELECT COUNT(*)
FROM shr_emp_job CLOSING
WHERE CLOSING.dateofjoining <= table_1.lastday
and relieveddate is null
and CLOSING.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
) AS CLOSING
FROM
shr_emp_job
JOIN table_1 ON table_1.mmonth = extract('month' from shr_emp_job.dateofjoining)
AND table_1.yyear = extract('year' from shr_emp_job.dateofjoining)
where shr_emp_job.ad_org_id = 'C9D035B52FAF46329D9654B1ECA0289F'
GROUP BY table_1.mmonth, table_1.yyear, table_1.ddate, table_1.lastday, charmonth
ORDER BY table_1.yyear, table_1.mmonth;
At a quick look, try changing your JOIN from an inner join to an outer join. So instead of
FROM
shr_emp_job
JOIN table_1 ON
do
FROM
shr_emp_job
RIGHT OUTER JOIN table_1 ON
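This tells Postgres to keep every row of the right-hand table (table_1) even when there is no matching row in the left-hand table (shr_emp_job); the columns of the missing side are filled with NULL. Note that the WHERE condition on shr_emp_job.ad_org_id would filter those NULL rows out again, so it needs to move into the ON clause, and count(*) would report 1 rather than 0 for an empty month unless it counts a shr_emp_job column instead.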
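To illustrate what the outer join is meant to achieve, here is a small Python sketch (an illustration only, not the asker's schema). It keeps every month of the reporting range and fills in 0 where no joiners exist. The `incoming` counts are copied from the first rows of the question's output; `month_range` is a hypothetical helper.

```python
def month_range(start, end):
    """Yield (year, month) pairs from start to end, both inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

# Joiners per month, taken from the question's output (Apr-18 to Sep-18).
incoming = {(2018, 4): 2, (2018, 5): 1, (2018, 8): 3, (2018, 9): 1}

# Outer-join behaviour: every period is kept, a missing count becomes 0.
report = [(ym, incoming.get(ym, 0)) for ym in month_range((2018, 4), (2018, 9))]

for (year, month), count in report:
    print(year, month, count)
```

Jun-18 and Jul-18 now appear with a count of 0 instead of disappearing, which is the effect the period table on the preserved side of the join should have.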

postgres tablefunc, sales data grouped by product, with crosstab of months

TIL about tablefunc and crosstab. At first I wanted to "group data by columns" but that doesn't really mean anything.
My product sales look like this
product_id | units | date
-----------------------------------
10 | 1 | 1-1-2018
10 | 2 | 2-2-2018
11 | 3 | 1-1-2018
11 | 10 | 1-2-2018
12 | 1 | 2-1-2018
13 | 10 | 1-1-2018
13 | 10 | 2-2-2018
I would like to produce a table of products with months as columns
product_id | 01-01-2018 | 02-01-2018 | etc.
-----------------------------------
10 | 1 | 2
11 | 13 | 0
12 | 0 | 1
13 | 20 | 0
First I would group by month, then invert and group by product, but I cannot figure out how to do this.
After enabling the tablefunc extension,
SELECT product_id, coalesce("2018-1-1", 0) as "2018-1-1"
, coalesce("2018-2-1", 0) as "2018-2-1"
FROM crosstab(
$$SELECT product_id, date_trunc('month', date)::date as month, sum(units) as units
FROM test
GROUP BY product_id, month
ORDER BY 1$$
, $$VALUES ('2018-1-1'::date), ('2018-2-1')$$
) AS ct (product_id int, "2018-1-1" int, "2018-2-1" int);
yields
| product_id | 2018-1-1 | 2018-2-1 |
|------------+----------+----------|
| 10 | 1 | 2 |
| 11 | 13 | 0 |
| 12 | 0 | 1 |
| 13 | 10 | 10 |
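The pivot that crosstab performs can be sketched in a few lines of Python as a sanity check on the result above. This is an illustration under one assumption: the dates in the question are read as month-day-year, which is the reading under which product 11 totals 13 in January. The function name `pivot_by_month` is made up.

```python
from collections import defaultdict
from datetime import datetime

# (product_id, units, date) rows from the question
sales = [
    (10, 1, "1-1-2018"),
    (10, 2, "2-2-2018"),
    (11, 3, "1-1-2018"),
    (11, 10, "1-2-2018"),
    (12, 1, "2-1-2018"),
    (13, 10, "1-1-2018"),
    (13, 10, "2-2-2018"),
]

def pivot_by_month(sales):
    """Sum units per (product, month), then lay months out as columns."""
    months = set()
    totals = defaultdict(int)
    for product_id, units, day in sales:
        parsed = datetime.strptime(day, "%m-%d-%Y")  # assumed month-day-year
        month = (parsed.year, parsed.month)
        months.add(month)
        totals[product_id, month] += units
    columns = sorted(months)
    products = sorted({product for product, _ in totals})
    # Missing (product, month) cells become 0, like coalesce(..., 0) in the SQL.
    table = {p: [totals.get((p, m), 0) for m in columns] for p in products}
    return columns, table

columns, table = pivot_by_month(sales)
print(columns)
for product_id, cells in table.items():
    print(product_id, cells)
```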

Calculate out price in FIFO SQL

Using Postgres 11
Using FIFO, I would like to calculate the price of items taken from the inventory, to keep track of the value of the total inventory.
Dataset is as follows:
ID | prodno | amount_purchased | amount_taken | price | created_at
uuid 13976 10 NULL 130 <timestamp>
uuid 13976 10 NULL 150 <timestamp>
uuid 13976 10 NULL 110 <timestamp>
uuid 13976 10 NULL 100 <timestamp>
uuid 13976 NULL 14 ?? <timestamp>
Before inserting the row with amount_taken, I need to calculate the average price of each of the 14 items, which in this case would be 135.71. How can this be calculated relatively efficiently?
My initial idea was to split the rows into two temp tables, one where amount_taken is null and one where it is not, and then work through all the rows. But this table could become rather large rather fast (most of the time only 1 item is taken from the inventory), so I worry that this would be a decent solution in the short term but would slow down as the table grows. So, what's the better solution, internet?
Given this setup:
CREATE TABLE test (
id int
, prodno int
, quantity numeric
, price numeric
, created_at timestamp
);
INSERT INTO test VALUES
(1, 13976, 10, 130, NOW())
, (2, 13976, 10, 150, NOW()+'1 hours')
, (3, 13976, 10, 110, NOW()+'2 hours')
, (4, 13976, 10, 100, NOW()+'3 hours')
, (5, 13976, -14, NULL, NOW()+'4 hours')
, (6, 13976, -1, NULL, NOW()+'5 hours')
, (7, 13976, -10, NULL, NOW()+'6 hours')
;
then the SQL
SELECT id, prodno, created_at, qty_sold
-- 5
, round((cum_sold_cost - coalesce(lag(cum_sold_cost) over w, 0))/qty_sold, 2) as fifo_price
, qty_bought, prev_bought, total_cost
, prev_total_cost
, cum_sold_cost
, coalesce(lag(cum_sold_cost) over w, 0) as prev_cum_sold_cost
FROM (
SELECT id, tneg.prodno, created_at, qty_sold, tpos.qty_bought, prev_bought, total_cost, prev_total_cost
-- 4
, round(prev_total_cost + ((tneg.cum_sold - tpos.prev_bought)/(tpos.qty_bought - tpos.prev_bought))*(total_cost-prev_total_cost), 2) as cum_sold_cost
FROM (
SELECT id, prodno, created_at, -quantity as qty_sold
, sum(-quantity) over w as cum_sold
FROM test
WHERE quantity < 0
WINDOW w AS (PARTITION BY prodno ORDER BY created_at)
-- 1
) tneg
LEFT JOIN (
SELECT prodno
, sum(quantity) over w as qty_bought
, coalesce(sum(quantity) over prevw, 0) as prev_bought
, quantity * price as cost
, sum(quantity * price) over w as total_cost
, coalesce(sum(quantity * price) over prevw, 0) as prev_total_cost
FROM test
WHERE quantity > 0
WINDOW w AS (PARTITION BY prodno ORDER BY created_at)
, prevw AS (PARTITION BY prodno ORDER BY created_at ROWS BETWEEN unbounded preceding AND 1 preceding)
-- 2
) tpos
-- 3
ON tneg.cum_sold > tpos.prev_bought AND tneg.cum_sold <= tpos.qty_bought -- half-open range: a boundary value matches one purchase only
AND tneg.prodno = tpos.prodno
) t
WINDOW w AS (PARTITION BY prodno ORDER BY created_at)
yields
| id | prodno | created_at | qty_sold | fifo_price | qty_bought | prev_bought | total_cost | prev_total_cost | cum_sold_cost | prev_cum_sold_cost |
|----+--------+----------------------------+----------+------------+------------+-------------+------------+-----------------+---------------+--------------------|
| 5 | 13976 | 2019-03-07 21:07:13.267218 | 14 | 135.71 | 20 | 10 | 2800 | 1300 | 1900.00 | 0 |
| 6 | 13976 | 2019-03-07 22:07:13.267218 | 1 | 150.00 | 20 | 10 | 2800 | 1300 | 2050.00 | 1900.00 |
| 7 | 13976 | 2019-03-07 23:07:13.267218 | 10 | 130.00 | 30 | 20 | 3900 | 2800 | 3350.00 | 2050.00 |
tneg contains information about quantities sold
| id | prodno | created_at | qty_sold | cum_sold |
|----+--------+----------------------------+----------+----------|
| 5 | 13976 | 2019-03-07 21:07:13.267218 | 14 | 14 |
| 6 | 13976 | 2019-03-07 22:07:13.267218 | 1 | 15 |
| 7 | 13976 | 2019-03-07 23:07:13.267218 | 10 | 25 |
tpos contains information about quantities bought
| prodno | qty_bought | prev_bought | cost | total_cost | prev_total_cost |
|--------+------------+-------------+------+------------+-----------------|
| 13976 | 10 | 0 | 1300 | 1300 | 0 |
| 13976 | 20 | 10 | 1500 | 2800 | 1300 |
| 13976 | 30 | 20 | 1100 | 3900 | 2800 |
| 13976 | 40 | 30 | 1000 | 4900 | 3900 |
We match rows in tneg with rows in tpos on the condition that cum_sold is between qty_bought and prev_bought.
cum_sold is the cumulative amount sold, qty_bought is the cumulative amount bought, and prev_bought is the previous value of qty_bought.
| id | prodno | created_at | qty_sold | cum_sold | qty_bought | prev_bought | total_cost | prev_total_cost | cum_sold_cost |
|----+--------+----------------------------+----------+----------+------------+-------------+------------+-----------------+---------------|
| 5 | 13976 | 2019-03-07 21:07:13.267218 | 14 | 14 | 20 | 10 | 2800 | 1300 | 1900.00 |
| 6 | 13976 | 2019-03-07 22:07:13.267218 | 1 | 15 | 20 | 10 | 2800 | 1300 | 2050.00 |
| 7 | 13976 | 2019-03-07 23:07:13.267218 | 10 | 25 | 30 | 20 | 3900 | 2800 | 3350.00 |
The fraction
((tneg.cum_sold - tpos.prev_bought)/(tpos.qty_bought - tpos.prev_bought)) as frac
measures how far cum_sold lies between prev_bought and qty_bought. We use this fraction to compute
cum_sold_cost, the cumulative cost associated with buying cum_sold items.
cum_sold_cost lies that same fraction of the way between prev_total_cost and total_cost.
Once you obtain cum_sold_cost, you have everything you need to compute marginal FIFO unit prices.
For each line of tneg, the difference between cum_sold_cost and its previous value is the cost of the qty_sold.
FIFO price is simply the ratio of this cost and qty_sold.
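The same reasoning can be checked with a short Python sketch (an illustration, not the answer's method of execution). It computes the cumulative cost of the first N units by walking the purchase lots in order, then takes the difference between consecutive cumulative costs, as described above. The names `cumulative_cost` and `fifo_prices` are hypothetical; the lots and withdrawals are the ones from the setup.

```python
def cumulative_cost(lots, quantity):
    """Cost of the first `quantity` units when lots are consumed oldest first.

    Units beyond the total purchased are ignored in this sketch.
    """
    cost = 0
    remaining = quantity
    for lot_quantity, unit_price in lots:
        taken = min(lot_quantity, remaining)
        cost += taken * unit_price
        remaining -= taken
        if remaining == 0:
            break
    return cost

def fifo_prices(lots, withdrawals):
    """Average FIFO unit price of each withdrawal, rounded to 2 decimals."""
    prices = []
    sold_so_far = 0
    for quantity in withdrawals:
        before = cumulative_cost(lots, sold_so_far)   # prev_cum_sold_cost
        sold_so_far += quantity                       # cum_sold
        after = cumulative_cost(lots, sold_so_far)    # cum_sold_cost
        prices.append(round((after - before) / quantity, 2))
    return prices

# (quantity, price) purchases in order of created_at
lots = [(10, 130), (10, 150), (10, 110), (10, 100)]
prices = fifo_prices(lots, [14, 1, 10])
print(prices)  # [135.71, 150.0, 130.0]
```

The intermediate cumulative costs (1900, 2050, 3350) match the cum_sold_cost column of the query result.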

How to Calculate Median Price Per Unit Using PERCENTILE_CONT and GROUP BY id

I'm using postgres 9.5 and trying to calculate median and average price per unit with a GROUP BY id. Here is the query in DBFIDDLE
Here is the data
id | price | units
-----+-------+--------
1 | 100 | 15
1 | 90 | 10
1 | 50 | 8
1 | 40 | 8
1 | 30 | 7
2 | 110 | 22
2 | 60 | 8
2 | 50 | 11
Using percentile_cont this is my query:
SELECT id,
ceil(avg(price)) as avg_price,
percentile_cont(0.5) within group (order by price) as median_price,
ceil( sum (price) / sum (units) ) AS avg_pp_unit,
ceil( percentile_cont(0.5) within group (order by price) /
percentile_cont(0.5) within group (order by units) ) as median_pp_unit
FROM t
GROUP by id
This query returns:
id| avg_price | median_price | avg_pp_unit | median_pp_unit
--+-----------+--------------+--------------+---------------
1 | 62 | 50 | 6 | 7
2 | 74 | 60 | 5 | 5
I'm pretty sure average calculation is correct. Is this the correct way to calculate median price per unit?
This post suggests this is correct (although performance is poor) but I'm curious if the division in the median calculation could skew the result.
Calculating median with PERCENTILE_CONT and grouping
The median is the value separating the higher half from the lower half of a data sample (a population or a probability distribution). For a data set, it may be thought of as the "middle" value.
https://en.wikipedia.org/wiki/Median
So your median price is 55, and the median units is 9
Sort by price Sort by units
id | price | units | | id | price | units
-------|-----------|--------| |-------|---------|----------
1 | 30 | 7 | | 1 | 30 | 7
1 | 40 | 8 | | 1 | 40 | 8
1 | 50 | 8 | | 1 | 50 | 8
>>> 2 | 50 | 11 | | 2 | 60 | 8 <<<<
>>> 2 | 60 | 8 | | 1 | 90 | 10 <<<<
1 | 90 | 10 | | 2 | 50 | 11
1 | 100 | 15 | | 1 | 100 | 15
2 | 110 | 22 | | 2 | 110 | 22
| | | | | |
(50+60)/2 (8+10)/2
55 9
I'm unsure what you intend for "median price per unit":
CREATE TABLE t(
id INTEGER NOT NULL
,price INTEGER NOT NULL
,units INTEGER NOT NULL
);
INSERT INTO t(id,price,units) VALUES (1,30,7);
INSERT INTO t(id,price,units) VALUES (1,40,8);
INSERT INTO t(id,price,units) VALUES (1,50,8);
INSERT INTO t(id,price,units) VALUES (2,50,11);
INSERT INTO t(id,price,units) VALUES (2,60,8);
INSERT INTO t(id,price,units) VALUES (1,90,10);
INSERT INTO t(id,price,units) VALUES (1,100,15);
INSERT INTO t(id,price,units) VALUES (2,110,22);
SELECT
percentile_cont(0.5) WITHIN GROUP (ORDER BY price) med_price
, percentile_cont(0.5) WITHIN GROUP (ORDER BY units) med_units
FROM
t;
| med_price | med_units
----|-----------|-----------
1 | 55 | 9
If column "price" represents a "unit price" then you don't need to divide 55 by 9, but if "price" is an "order total" then you would need to divide by units: 55/9 = 6.11
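To make the difference between the readings concrete, here is a small Python sketch over the same eight rows. The "median of per-row ratios" variant is an assumption about what "median price per unit" might be intended to mean; it is not something the question or the answer states.

```python
from statistics import median

# (id, price, units) rows from the question
rows = [
    (1, 100, 15), (1, 90, 10), (1, 50, 8), (1, 40, 8), (1, 30, 7),
    (2, 110, 22), (2, 60, 8), (2, 50, 11),
]

median_price = median(price for _, price, _ in rows)
median_units = median(units for _, _, units in rows)

# Reading 1: divide the median price by the median units.
ratio_of_medians = median_price / median_units

# Reading 2 (assumed alternative): median of each row's own price / units.
median_of_ratios = median(price / units for _, price, units in rows)

print(median_price, median_units)
print(round(ratio_of_medians, 2), median_of_ratios)
```

The two readings give different numbers on this data (about 6.11 versus 5.625), so dividing two separately computed medians is not the same as taking the median of the per-row values.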

Temporal Aggregation in PostgreSQL

I am working on a Java implementation for temporal aggregation using a PostgreSQL database.
My table looks like this
Value | Start | Stop
(int) | (Date) | (Date)
-------------------------------
1 | 2004-01-01 | 2010-01-01
4 | 2000-01-01 | 2008-01-01
To visualize these periods:
------------------------------
----------------------------------------
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
[ 4 ][ 5=4+1 ][ 1 ]
My algorithm now calculates temporal aggregations of the data, e.g. SUM():
Value | Start | Stop
-------------------------------
4 | 2000-01-01 | 2004-01-01
5 | 2004-01-01 | 2008-01-01
1 | 2008-01-01 | 2010-01-01
In order to test the results, I would now like to query the data directly using PostgreSQL. I know that there is no easy solution to this problem yet. However, there surely is a way to get the same results. The aggregations Count, Max, Min, Sum and Average should be supported. I do not mind a bad or slow solution, it just has to work.
A query I found so far which should work similarly is the following:
select count(*), ts, te
from ( checkout a normalize checkout b using() ) checkoutNorm
group by ts, te;
My adaptation looks like this:
select count(*), start, stop
from ( myTable a normalize myTable b using() ) myTableNorm
group by start, stop;
However, an error was reported: ERROR: syntax error at or near "normalize" -- LINE 2: from ( ndbs_10 a normalize ndbs_10 b using() ) ndbsNorm.
Does anyone have a solution to this problem? It does not have to be based on the above query, as long as it works. Thanks a lot.
Your question was really hard to understand. But I think I figured it out.
You want a running sum over value. Values are only applicable between the start and stop of a time period, so they have to be added at the beginning of that period and deducted at the end.
In addition, you want the start and end of each resulting period that the sum is valid for.
That should do it:
-- DROP SCHEMA x CASCADE;
CREATE SCHEMA x;
CREATE TABLE x.tbl(val int, start date, stop date);
INSERT INTO x.tbl VALUES
(4 ,'2000-01-01' ,'2008-01-01')
,(7 ,'2001-01-01' ,'2009-01-01')
,(1 ,'2004-01-01' ,'2010-01-01')
,(2 ,'2005-01-01' ,'2006-01-01');
WITH a AS (
SELECT start as ts, val FROM x.tbl
UNION ALL
SELECT stop, val * (-1) FROM x.tbl
ORDER BY 1, 2)
SELECT sum(val) OVER w AS val_sum
,ts AS start
,lead(ts) OVER w AS stop
FROM a
WINDOW w AS (ORDER BY ts)
ORDER BY ts;
val_sum | start | stop
--------+------------+------------
4 | 2000-01-01 | 2001-01-01
11 | 2001-01-01 | 2004-01-01
12 | 2004-01-01 | 2005-01-01
14 | 2005-01-01 | 2006-01-01
12 | 2006-01-01 | 2008-01-01
8 | 2008-01-01 | 2009-01-01
1 | 2009-01-01 | 2010-01-01
0 | 2010-01-01 |
Edit after request
For all requested aggregate functions:
SELECT period
,val_sum
,val_count
,val_sum::float /val_count AS val_avg
,(SELECT min(val) FROM x.tbl WHERE start < y.stop AND stop > y.start) AS val_min
,(SELECT max(val) FROM x.tbl WHERE start < y.stop AND stop > y.start) AS val_max
,start
,stop
FROM (
WITH a AS (
SELECT start as ts, val, 1 AS c FROM x.tbl
UNION ALL
SELECT stop, val, -1 FROM x.tbl
ORDER BY 1, 2)
SELECT count(*) OVER w AS period
,sum(val*c) OVER w AS val_sum
,sum(c) OVER w AS val_count
,ts AS start
,lead(ts) OVER w AS stop
FROM a
WINDOW w AS (ORDER BY ts)
ORDER BY ts
) y
WHERE stop IS NOT NULL;
period | val_sum | val_count | val_avg | val_min | val_max | start | stop
--------+---------+-----------+---------+---------+---------+------------+------------
1 | 4 | 1 | 4 | 4 | 4 | 2000-01-01 | 2001-01-01
2 | 11 | 2 | 5.5 | 4 | 7 | 2001-01-01 | 2004-01-01
3 | 12 | 3 | 4 | 1 | 7 | 2004-01-01 | 2005-01-01
4 | 14 | 4 | 3.5 | 1 | 7 | 2005-01-01 | 2006-01-01
5 | 12 | 3 | 4 | 1 | 7 | 2006-01-01 | 2008-01-01
6 | 8 | 2 | 4 | 1 | 7 | 2008-01-01 | 2009-01-01
7 | 1 | 1 | 1 | 1 | 1 | 2009-01-01 | 2010-01-01
min() and max() could possibly be optimized, but that should be good enough.
CTEs (WITH clauses) and subqueries are interchangeable, as you can see.
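For testing against a Java implementation, the event-based idea behind these queries can also be sketched in Python. This is a minimal illustration using the sample rows from x.tbl, with dates kept as ISO strings so that they sort correctly; the function name `temporal_sum_count` is made up. One small difference from the SQL is labeled in the code: events sharing a timestamp are folded into a single boundary.

```python
# (val, start, stop) rows, same as x.tbl above
intervals = [
    (4, "2000-01-01", "2008-01-01"),
    (7, "2001-01-01", "2009-01-01"),
    (1, "2004-01-01", "2010-01-01"),
    (2, "2005-01-01", "2006-01-01"),
]

def temporal_sum_count(intervals):
    """Return (sum, count, start, stop) for each period between boundaries."""
    events = []
    for val, start, stop in intervals:
        events.append((start, val, 1))    # value becomes active
        events.append((stop, -val, -1))   # value stops being active
    events.sort()
    periods = []
    total = count = 0
    # Pair each event with the next one; the last event opens no period.
    for (ts, delta, step), following in zip(events, events[1:]):
        total += delta
        count += step
        # Unlike the SQL above, emit only after the last event of a timestamp.
        if following[0] != ts:
            periods.append((total, count, ts, following[0]))
    return periods

periods = temporal_sum_count(intervals)
for total, count, start, stop in periods:
    print(total, count, total / count if count else None, start, stop)
```

The sums and counts reproduce the val_sum and val_count columns of the result table; the average follows as sum divided by count.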