PostgreSQL: splitting a time period at an event

I have a table of country periods. In some cases, certain country attributes (e.g. the capital) change on a date within a time period. In those cases I would like to split the country period into two new periods, one before and one after the change.
Example:
| Country | start_date | end_date   | event_date |
|---------|------------|------------|------------|
| A       | 1960-01-01 | 1999-12-31 | 1994-07-20 |
| B       | 1926-01-01 | 1995-12-31 | NULL       |
Desired output:
| Country | start_date | end_date   | event_date |
|---------|------------|------------|------------|
| A       | 1960-01-01 | 1994-07-19 | 1994-07-20 |
| A       | 1994-07-20 | 1999-12-31 | 1994-07-20 |
| B       | 1926-01-01 | 1995-12-31 | NULL       |
I considered starting off with generate_series along these lines:
SELECT country, min(p1) as sdate1, max(p1) as edate1,
min(p2) as sdate2, max(p2) as edate2
FROM
(SELECT country,
generate_series(start_date, (event_date-interval '1 day'), interval '1 day')::date as p1,
generate_series(event_date, end_date, interval '1 day')::date as p2
FROM table1) t
GROUP BY country
But this seems far too inefficient and messy. Unfortunately, I don't have any experience writing functions. Any ideas on how I can solve this?

You can use UNION ALL instead. This way you don't generate unnecessary rows:
SELECT country, start_date,
CASE WHEN event_date BETWEEN start_date AND end_date
THEN event_date - 1
ELSE end_date
END AS end_date, event_date
FROM table1
UNION ALL
SELECT country, event_date, end_date, event_date
FROM table1
WHERE event_date BETWEEN start_date AND end_date
ORDER BY country, start_date, end_date, event_date
Output:
| country | start_date | end_date | event_date |
|---------|------------|------------|------------|
| A | 1960-01-01 | 1994-07-19 | 1994-07-20 |
| A | 1994-07-20 | 1999-12-31 | 1994-07-20 |
| B | 1926-01-01 | 1995-12-31 | (null) |
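To try the UNION ALL idea without a PostgreSQL server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module as a stand-in. Assumptions: SQLite has no date type, so dates are stored as ISO-8601 text and PostgreSQL's `event_date - 1` is written as `date(event_date, '-1 day')`; the table name and rows are the ones from the question and answer.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (dates kept as ISO-8601 text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (country TEXT, start_date TEXT, end_date TEXT, event_date TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("A", "1960-01-01", "1999-12-31", "1994-07-20"),
        ("B", "1926-01-01", "1995-12-31", None),
    ],
)

SPLIT_SQL = """
-- Every period; cut off the day before the event when the event falls inside it.
SELECT country, start_date,
       CASE WHEN event_date BETWEEN start_date AND end_date
            THEN date(event_date, '-1 day')
            ELSE end_date
       END AS end_date,
       event_date
FROM table1
UNION ALL
-- The second half, produced only for periods that actually contain an event.
SELECT country, event_date, end_date, event_date
FROM table1
WHERE event_date BETWEEN start_date AND end_date
ORDER BY country, start_date
"""

rows = conn.execute(SPLIT_SQL).fetchall()
for row in rows:
    print(row)
```

A NULL event_date makes BETWEEN evaluate to NULL, so country B falls through to the ELSE branch and is excluded from the second SELECT. If an event can fall exactly on start_date, the first half would end the day before it starts; using `event_date > start_date AND event_date <= end_date` in both places instead of BETWEEN avoids that.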

Related

Grouping with different timespans

Currently I am struggling to write an aggregation whose groups overlap.
The current structure of my table is:
|ymd |id|costs|
|--------|--|-----|
|20200101|a |10 |
|20200102|a |12 |
|20200101|b |13 |
|20200101|c |15 |
|20200102|c |1 |
However, I'd like to group it so that I get a different timespan per item. Assuming I run this query on 20200103, the result I am trying to achieve is:
| timespan | id | costs |
|------------|----|-------|
| last 2 days| a | 22 |
| last 1 day | a | 12 |
| last 2 days| b | 13 |
| last 1 day | b | 0 |
| last 2 days| c | 16 |
| last 1 day | c | 1 |
I have tried many things, but so far I haven't been able to get what I need. This is the query I tried, which does not return the correct results:
SELECT
CASE
WHEN ymd BETWEEN date_add(current_date(),-2) AND to_date(current_date()) THEN '2 days'
WHEN ymd BETWEEN date_add(current_date(),-1) AND to_date(current_date()) THEN '1 day'
END AS timespan,
id,
sum(costs) AS costs
FROM `table`
GROUP BY
CASE
WHEN ymd BETWEEN date_add(current_date(),-2) AND to_date(current_date()) THEN '2 days'
WHEN ymd BETWEEN date_add(current_date(),-1) AND to_date(current_date()) THEN '1 day'
END,
id
You can build a derived table that stores the timespans, cross join it with the list of distinct ids to generate all possible combinations, then bring in the table with a left join and aggregate:
select d.timespan, i.id, coalesce(sum(t.costs), 0) costs
from (select distinct id from mytable) i
cross join (
select 1 n, 'last 1 day' timespan
union all select 2, 'last 2 days'
) d
left join mytable t
on t.id = i.id
and t.ymd between date_add(current_date(), - d.n) and current_date()
group by d.n, d.timespan, i.id
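Here is a minimal sketch of the cross join / left join pattern, run through Python's sqlite3 as a stand-in for the original engine. Assumptions: "today" is pinned to 2020-01-03 to match the question, ymd stays in the question's yyyymmdd text form so the window bounds are built with SQLite's strftime, and the join also matches `t.id = i.id` so that each id only sums its own costs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ymd TEXT, id TEXT, costs INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("20200101", "a", 10),
        ("20200102", "a", 12),
        ("20200101", "b", 13),
        ("20200101", "c", 15),
        ("20200102", "c", 1),
    ],
)

# Assumption: a fixed "today" instead of current_date, so the result is reproducible.
TODAY = "2020-01-03"

TIMESPAN_SQL = """
SELECT d.timespan, i.id, coalesce(sum(t.costs), 0) AS costs
FROM (SELECT DISTINCT id FROM mytable) i
CROSS JOIN (
    SELECT 1 AS n, 'last 1 day' AS timespan
    UNION ALL SELECT 2, 'last 2 days'
) d
LEFT JOIN mytable t
  ON t.id = i.id   -- without this, every id would receive the costs of all ids
 AND t.ymd BETWEEN strftime('%Y%m%d', :today, '-' || d.n || ' days')
               AND strftime('%Y%m%d', :today)
GROUP BY d.n, d.timespan, i.id
ORDER BY i.id, d.n DESC
"""

rows = conn.execute(TIMESPAN_SQL, {"today": TODAY}).fetchall()
for row in rows:
    print(row)
```

The id/timespan pairs with no matching rows (b for "last 1 day") survive the LEFT JOIN with NULL costs, which coalesce turns into 0.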

How to query just the last record of every second within a period of time in postgres

I have a 'prices' table with hundreds of millions of records and only four columns: uid, price, unit, dt. dt is a datetime in standard format, like '2017-05-01 00:00:00.585'.
I can easily select a period using:
SELECT uid, price, unit from prices
WHERE dt > '2017-05-01 00:00:00.000'
AND dt < '2017-05-01 02:59:59.999'
What I can't figure out is how to select the price of the last record in each second. (I also need the very first one of each second, but I guess that will be a similar, separate query.) There are some similar examples, but when I tried to adapt them to my needs they generated errors.
Could someone please help me crack this nut?
Let's say there is a table generated with this command:
CREATE TABLE test AS
SELECT timestamp '2017-09-16 20:00:00' + x * interval '0.1' second As my_timestamp
from generate_series(0,100) x
This table contains an increasing series of timestamps, each differing by 100 milliseconds (0.1 second) from its neighbors, so there are 10 records within each second.
| my_timestamp |
|------------------------|
| 2017-09-16T20:00:00Z |
| 2017-09-16T20:00:00.1Z |
| 2017-09-16T20:00:00.2Z |
| 2017-09-16T20:00:00.3Z |
| 2017-09-16T20:00:00.4Z |
| 2017-09-16T20:00:00.5Z |
| 2017-09-16T20:00:00.6Z |
| 2017-09-16T20:00:00.7Z |
| 2017-09-16T20:00:00.8Z |
| 2017-09-16T20:00:00.9Z |
| 2017-09-16T20:00:01Z |
| 2017-09-16T20:00:01.1Z |
| 2017-09-16T20:00:01.2Z |
| 2017-09-16T20:00:01.3Z |
.......
The query below finds and prints the first and the last timestamp within each second:
SELECT my_timestamp,
CASE
WHEN rn1 = 1 THEN 'First'
WHEN rn2 = 1 THEN 'Last'
ELSE 'Somewhere in the middle'
END as Which_row_within_a_second
FROM (
select *,
row_number() over( partition by date_trunc('second', my_timestamp)
order by my_timestamp
) rn1,
row_number() over( partition by date_trunc('second', my_timestamp)
order by my_timestamp DESC
) rn2
from test
) xx
WHERE 1 IN (rn1, rn2 )
ORDER BY my_timestamp
;
| my_timestamp | which_row_within_a_second |
|------------------------|---------------------------|
| 2017-09-16T20:00:00Z | First |
| 2017-09-16T20:00:00.9Z | Last |
| 2017-09-16T20:00:01Z | First |
| 2017-09-16T20:00:01.9Z | Last |
| 2017-09-16T20:00:02Z | First |
| 2017-09-16T20:00:02.9Z | Last |
| 2017-09-16T20:00:03Z | First |
| 2017-09-16T20:00:03.9Z | Last |
| 2017-09-16T20:00:04Z | First |
| 2017-09-16T20:00:04.9Z | Last |
| 2017-09-16T20:00:05Z | First |
| 2017-09-16T20:00:05.9Z | Last |
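The same row_number() technique can be checked with Python's sqlite3 (SQLite 3.25 or later is needed for window functions). Assumptions: timestamps are stored as fixed-width text with millisecond precision, so text order equals time order, and `substr(my_timestamp, 1, 19)` plays the role of `date_trunc('second', my_timestamp)`.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (my_timestamp TEXT)")

# Same data as the generate_series example: 101 timestamps, 100 ms apart.
start = datetime(2017, 9, 16, 20, 0, 0)
conn.executemany(
    "INSERT INTO test VALUES (?)",
    [
        ((start + timedelta(milliseconds=100 * x)).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3],)
        for x in range(101)
    ],
)

FIRST_LAST_SQL = """
SELECT my_timestamp,
       CASE WHEN rn1 = 1 THEN 'First' ELSE 'Last' END AS which_row_within_a_second
FROM (
    SELECT my_timestamp,
           -- substr(..., 1, 19) truncates to whole seconds, like date_trunc('second', ...)
           row_number() OVER (PARTITION BY substr(my_timestamp, 1, 19)
                              ORDER BY my_timestamp) AS rn1,
           row_number() OVER (PARTITION BY substr(my_timestamp, 1, 19)
                              ORDER BY my_timestamp DESC) AS rn2
    FROM test
) xx
WHERE 1 IN (rn1, rn2)   -- keep only the first and the last row of each second
ORDER BY my_timestamp
"""

rows = conn.execute(FIRST_LAST_SQL).fetchall()
for row in rows[:4]:
    print(row)
```

The final second (20:00:10) holds a single row, so it is both first and last and is labeled 'First'. For the prices table you would partition by `date_trunc('second', dt)` and keep `rn2 = 1` for the last row (`rn1 = 1` for the first); in PostgreSQL, `SELECT DISTINCT ON (date_trunc('second', dt)) ... ORDER BY date_trunc('second', dt), dt DESC` is another common way to pick one row per second.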

Aggregate data per week

I'd like to aggregate values per week, based on their date.
I have a table like this:
create table test (t_val integer, t_date date);
insert into test values(1,'2017-02-09'),(2,'2017-02-10'),(4,'2017-02-16');
This is the query:
WITH date_range AS (
SELECT MIN(t_date) as start_date,
MAX(t_date) as end_date
FROM test
)
SELECT
date_part('year', f.date) as date_year,
date_part('week', f.date) as date_week,
f.val
FROM generate_series( (SELECT start_date FROM date_range), (SELECT end_date FROM date_range), '7 day')d
LEFT JOIN
(
SELECT t_val as val, t_date as date
FROM test
WHERE t_date >= (SELECT start_date FROM date_range)
AND t_date <= (SELECT end_date FROM date_range)
GROUP BY t_val, t_date
) f
ON f.date BETWEEN d.date AND (d.date + interval '7 day')
GROUP BY date_part('year', f.date),date_part('week', f.date), f.val;
I expect a result like this:
| Year | Week | Val |
|------|------|-----|
| 2017 | 6    | 3   |
| 2017 | 7    | 4   |
But the query returns:
| Year | Week | Val |
|------|------|-----|
| 2017 | 6    | 1   |
| 2017 | 6    | 2   |
| 2017 | 7    | 4   |
What is missing?
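The extra rows most likely come from grouping by f.val: each distinct value becomes its own group instead of being added up. Here is a minimal sketch of the intended aggregation, summing t_val and grouping only by year and week, run with Python's sqlite3. Assumption: SQLite has no date_part, so the hypothetical helpers iso_year and iso_week are registered with create_function to mimic `date_part('isoyear', ...)` and `date_part('week', ...)`; in PostgreSQL you would call date_part directly.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# Hypothetical helpers standing in for PostgreSQL's date_part('isoyear'/'week', ...).
conn.create_function("iso_year", 1, lambda d: date.fromisoformat(d).isocalendar()[0])
conn.create_function("iso_week", 1, lambda d: date.fromisoformat(d).isocalendar()[1])

conn.execute("CREATE TABLE test (t_val INTEGER, t_date TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "2017-02-09"), (2, "2017-02-10"), (4, "2017-02-16")],
)

WEEKLY_SQL = """
SELECT iso_year(t_date) AS date_year,
       iso_week(t_date) AS date_week,
       sum(t_val)       AS val      -- add the values up ...
FROM test
GROUP BY date_year, date_week       -- ... and group by the week only, not by t_val
ORDER BY date_year, date_week
"""

rows = conn.execute(WEEKLY_SQL).fetchall()
print(rows)  # [(2017, 6, 3), (2017, 7, 4)]
```

Pairing the ISO week with the ISO year ('isoyear') keeps days around New Year in the correct week; plain 'year' can put, for example, 2017-01-01 into "2017, week 52".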

Dynamic column names in a postgres crosstab query

I am trying to pivot the data in a query in Postgres. The query I am currently using is as follows:
SELECT
product_number,
month,
sum(quantity)
FROM forecasts
WHERE date_trunc('month', extract_date) = date_trunc('month', current_date)
GROUP BY product_number, month
ORDER BY product_number, month;
The output of the query looks like the table below, where each product has 13 months of data.
+--------+------------+----------+
| Number | Month | Quantity |
+--------+------------+----------+
| 1 | 2016-10-01 | 7592 |
| 1 | 2016-11-01 | 6796 |
| 1 | 2016-12-01 | 6512 |
| 1 | 2017-01-01 | 6160 |
| 1 | 2017-02-01 | 6475 |
| 1 | 2017-03-01 | 6016 |
| 1 | 2017-04-01 | 6616 |
| 1 | 2017-05-01 | 6536 |
| 1 | 2017-06-01 | 6256 |
| 1 | 2017-07-01 | 6300 |
| 1 | 2017-08-01 | 5980 |
| 1 | 2017-09-01 | 5872 |
| 1 | 2017-10-01 | 5824 |
+--------+------------+----------+
I am trying to pivot the data so that it looks something like this:
+--------+-----------+-----------+-----------+----------+-----+
| Number | 2016-10-1 | 2016-11-1 | 2016-12-1 | 2017-1-1 | ... |
+--------+-----------+-----------+-----------+----------+-----+
| 1 | 100 | 100 | 200 | 250 | ... |
| ... | | | | | |
+--------+-----------+-----------+-----------+----------+-----+
where all 13 months of data for each product are shown in a single row.
I tried using a basic crosstab query:
SELECT *
FROM
crosstab('SELECT product_number, month::TEXT, sum(quantity)
FROM forecasts
WHERE date_trunc(''month'', extract_date) = date_trunc(''month'', ''2016-10-1''::DATE)
GROUP BY product_number, month
ORDER BY product_number, month')
As mthreport(product_number text, m0 DATE, m1 DATE, m2 DATE,
m3 DATE, m4 DATE, m5 DATE, m6 DATE,
m7 DATE, m8 DATE, m9 DATE, m10 DATE,
m11 DATE, m12 DATE, m13 DATE)
But I get the following error:
ERROR: invalid return type
DETAIL: SQL rowid datatype does not match return rowid datatype.
If the column names were fixed in the crosstab, i.e. if I could define them in the crosstab output, this would work, but since the dates keep changing I am not sure how to define them.
I think I am missing something very basic here. Any help would be really appreciated.
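I hope I have understood your problem correctly.
Columns m0, m1, ..., m13 are not of type date: they contain the sum of quantity, so their data type must match the type of sum(quantity). The same applies to the row identifier: the error message says that the rowid (product_number) type returned by the source query does not match the type declared in the column definition list, so product_number must be declared with the type the source query actually returns. Casting sum(quantity) to bigint makes the value type explicit.
I think the query below will solve your problem: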
SELECT *
FROM
crosstab($$SELECT product_number, month, sum(quantity)::bigint
FROM forecasts
GROUP BY product_number, month
ORDER BY product_number, month$$)
As mthreport(product_number int, m0 bigint, m1 bigint, m2 bigint,
m3 bigint, m4 bigint, m5 bigint, m6 bigint,
m7 bigint, m8 bigint, m9 bigint, m10 bigint,
m11 bigint, m12 bigint , m13 bigint)
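To see why the value columns must have the type of sum(quantity), it helps to model what the one-parameter crosstab(text) from the tablefunc extension does with the source rows. The function below is an illustrative model written for this answer, not the extension's code: consecutive rows with the same row_name form one output row, the category column is ignored, values fill the output columns left to right, short groups are padded with NULL (None) and surplus rows are dropped. Product 2 in the sample data is hypothetical.

```python
from itertools import groupby
from typing import Any, Iterable, List, Tuple


def crosstab_one_param(
    rows: Iterable[Tuple[Any, Any, Any]], n_value_cols: int
) -> List[Tuple[Any, ...]]:
    """Model of one-parameter crosstab(text); rows must be sorted by row_name."""
    result = []
    # Consecutive rows sharing a row_name become one output row.
    for row_name, group in groupby(rows, key=lambda r: r[0]):
        # The category (middle column) is ignored; only row order decides placement.
        values = [value for _row, _category, value in group][:n_value_cols]
        # Fewer values than output columns: pad the rest with NULL (None).
        values += [None] * (n_value_cols - len(values))
        result.append((row_name, *values))
    return result


source = [
    (1, "2016-10-01", 7592),
    (1, "2016-11-01", 6796),
    (1, "2016-12-01", 6512),
    (2, "2016-10-01", 100),  # hypothetical product with a single month
]
pivot = crosstab_one_param(source, n_value_cols=2)
print(pivot)  # [(1, 7592, 6796), (2, 100, None)]
```

Because placement ignores the category, a product with a missing month would have its later values shifted left. The two-parameter form, crosstab(source_sql, category_sql), places each value by its category instead.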

Order by created_date if less than 1 month old, else sort by updated_date

SQL Fiddle: http://sqlfiddle.com/#!15/1da00/5
I have a table that looks something like this:
products
+-----------+-------+--------------+--------------+
| name | price | created_date | updated_date |
+-----------+-------+--------------+--------------+
| chair | 50 | 10/12/2016 | 1/4/2017 |
| desk | 100 | 11/4/2016 | 12/27/2016 |
| TV | 500 | 12/1/2016 | 1/2/2017 |
| computer | 1000 | 12/28/2016 | 1/1/2017 |
| microwave | 100 | 1/3/2017 | 1/4/2017 |
| toaster | 20 | 1/9/2017 | 1/9/2017 |
+-----------+-------+--------------+--------------+
I want to order this table so that products created less than 30 days ago come first (ordered by created date, newest first). Products created 30 or more days ago should come after them (ordered by updated date, most recent first, within that group).
This is what the result should look like:
products - desired results
+-----------+-------+--------------+--------------+
| name | price | created_date | updated_date |
+-----------+-------+--------------+--------------+
| toaster | 20 | 1/9/2017 | 1/9/2017 |
| microwave | 100 | 1/3/2017 | 1/4/2017 |
| computer | 1000 | 12/28/2016 | 1/1/2017 |
| chair | 50 | 10/12/2016 | 1/4/2017 |
| TV | 500 | 12/1/2016 | 1/2/2017 |
| desk | 100 | 11/4/2016 | 12/27/2016 |
+-----------+-------+--------------+--------------+
I've started writing this query:
SELECT *,
CASE
WHEN created_date > NOW() - INTERVAL '30 days' THEN 0
ELSE 1
END AS order_index
FROM products
ORDER BY order_index, created_date DESC
but that only brings the rows with created_date less than 30 days old to the top and orders everything by created_date. I also want to sort the rows where order_index = 1 by updated_date.
Unfortunately, in PostgreSQL (9.3 here) an output column name such as order_index can be used in ORDER BY only on its own, not inside an expression, so it is not available to a CASE in ORDER BY at all. Referring to it by position is not a good option either, because its position is not well defined: it comes after * in the column list.
This will work.
order by
created_date <= ( current_date - 30 ) , case
when created_date > ( current_date - 30 ) then created_date
else updated_date end desc
Alternatively, a common table expression can wrap the query, and the outer query can then order by any of its columns, including order_index.
WITH q AS(
SELECT *,
CASE
WHEN created_date > NOW() - INTERVAL '30 days' THEN 0
ELSE 1
END AS order_index
FROM products
)
SELECT * FROM q
ORDER BY
order_index ,
CASE order_index
WHEN 0 THEN created_date
WHEN 1 THEN updated_date
END DESC;
A third approach exploits NULLs:
order by
case
when created_date > ( current_date - 30 ) then created_date
end desc nulls last,
updated_date desc;
This approach can be useful when the ordering columns are of different types, because created_date and updated_date are never combined into a single CASE expression.
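The three ORDER BY variants can be compared side by side with Python's sqlite3. Assumptions: the question's M/D/YYYY dates are rewritten as ISO-8601 text, "today" is pinned to a hypothetical 2017-01-10 because the question does not say when it was run, SQLite's `date(:today, '-30 days')` stands in for `current_date - 30` and `NOW() - INTERVAL '30 days'`, and NULLS LAST needs SQLite 3.30 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (name TEXT, price INTEGER, created_date TEXT, updated_date TEXT)"
)
# The question's rows, with M/D/YYYY dates rewritten as ISO-8601 text.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("chair", 50, "2016-10-12", "2017-01-04"),
        ("desk", 100, "2016-11-04", "2016-12-27"),
        ("TV", 500, "2016-12-01", "2017-01-02"),
        ("computer", 1000, "2016-12-28", "2017-01-01"),
        ("microwave", 100, "2017-01-03", "2017-01-04"),
        ("toaster", 20, "2017-01-09", "2017-01-09"),
    ],
)

# Hypothetical run date; date(:today, '-30 days') plays the role of current_date - 30.
PARAMS = {"today": "2017-01-10"}

# 1) Boolean sort key first (false sorts before true), then a CASE picking the date.
BOOLEAN_THEN_CASE = """
SELECT name FROM products
ORDER BY created_date <= date(:today, '-30 days'),
         CASE WHEN created_date > date(:today, '-30 days') THEN created_date
              ELSE updated_date END DESC
"""

# 2) CTE exposing order_index so the outer ORDER BY can use it inside a CASE.
CTE_WITH_INDEX = """
WITH q AS (
    SELECT *,
           CASE WHEN created_date > date(:today, '-30 days') THEN 0 ELSE 1 END AS order_index
    FROM products
)
SELECT name FROM q
ORDER BY order_index,
         CASE order_index WHEN 0 THEN created_date WHEN 1 THEN updated_date END DESC
"""

# 3) NULL for old rows pushes them after the recent ones; updated_date orders the rest.
NULLS_LAST = """
SELECT name FROM products
ORDER BY CASE WHEN created_date > date(:today, '-30 days') THEN created_date END DESC NULLS LAST,
         updated_date DESC
"""

results = {
    label: [name for (name,) in conn.execute(sql, PARAMS)]
    for label, sql in [
        ("boolean + CASE", BOOLEAN_THEN_CASE),
        ("CTE", CTE_WITH_INDEX),
        ("NULLS LAST", NULLS_LAST),
    ]
}
for label, names in results.items():
    print(f"{label}: {names}")
```

All three variants should print toaster, microwave, computer, chair, TV, desk, which matches the desired result in the question.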