PostgreSQL date_part on multiple records

I have multiple records with the same who value and different when values, in an attendance table with this structure:
who | when | why
"when" timestamp with time zone NOT NULL,
"who" text,
I need to calculate the difference between each record for the same who.
I've tried:
DATE_PART('hour', table."when"::timestamp - table."when"::timestamp)
but it doesn't seem to work.
i.e.
A | 2017-03-01 08:30
A | 2017-03-01 12:30
B | 2017-03-01 08:30
B | 2017-03-01 12:30
I need to get the total hours for A and B separately.

You need a window function in order to access the value of the "previous" row:
select who,
"when",
"when" - lag("when") over (partition by who order by "when") as diff
from the_table
order by who, "when";
If you only ever have two rows per who, or only care about the first and last value of "when", use a simple aggregation:
select who,
max("when") - min("when")
from the_table
group by who
order by who;
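Note that DATE_PART('hour', ...) applied to the resulting interval returns only the interval's hours field, not the total elapsed hours. If you want the difference as a plain number of hours, a minimal sketch against the same table:
select who,
extract(epoch from max("when") - min("when")) / 3600 as total_hours
from the_table
group by who
order by who;
For the sample data this yields 4 for both A and B, since extract(epoch from ...) gives the total seconds in the interval.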

Related

Calculate time difference over the first and the last table row

Within PostgreSQL, I'm trying to write a query that calculates the time difference between the timestamp of the first row and the timestamp of the last row:
(select public."ImportLogs"."DateTimeStamp" as migration_start from public."ImportLogs" order by public."ImportLogs"."DateTimeStamp" asc limit 1)
union
(select public."ImportLogs"."DateTimeStamp" as migration_end from public."ImportLogs" order by public."ImportLogs"."DateTimeStamp" desc limit 1);
I tried to get the time difference between migration_start and migration_end, but I couldn't get it to work. How can I achieve this?
We can subtract min("DateTimeStamp") from max("DateTimeStamp") and cast the difference as time.
select
cast(
max("DateTimeStamp")
- min("DateTimeStamp")
as time) as TimeDifference
from public."ImportLogs"
| timedifference |
| :------------- |
| 00:00:10       |
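One caveat: casting an interval to time keeps only the time-of-day part, silently discarding whole days, so it is only safe while the two timestamps are less than 24 hours apart. For longer spans, a sketch returning total elapsed seconds instead (same table assumed):
select
extract(epoch from max("DateTimeStamp") - min("DateTimeStamp")) as seconds_elapsed
from public."ImportLogs"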

Simple temporal view?

How can I write a query to compute the end date per ID in Postgres? I am able to do this in-memory with Python, but I would rather keep it simple and just create a view.
My table appends any new combination of system_1_id and system_2_id along with the date of the file the data was from (I am reading a snapshot mapping file which is sent a few times per week). It looks like this:
system_1_id | system_2_id | start_date | is_current
123456      | 05236       | 2016-06-01 | False
123456      | 98899       | 2017-01-03 | False
123456      | 05236       | 2017-04-15 | True
To:
system_1_id | system_2_id | start_date | end_date
123456      | 05236       | 2016-06-01 | 2017-01-02
123456      | 98899       | 2017-01-03 | 2017-04-14
123456      | 05236       | 2017-04-15 |
Note that there can only be one system_2_id assigned to a system_1_id at a time, but they can be recycled and even reassigned at a later date.
The end date is simply 1 day less than the start date of the next row for the same ID.
My goal is eventually to be able to join other tables to the data and pull the accurate ids per date:
where t1.system_2_id = t2.system_2_id and t1.report_date >= t2.start_date and t1.report_date <= t2.end_date
I want a simple temporal table, without worrying about triggers or rules or using an extension.
The lead() window function will do this for you. With your example data:
select
system_1_id,
system_2_id,
start_date,
cast(lead(start_date, 1, null) over (partition by system_1_id order by start_date) - interval '1 day' as date) as end_date
from
the_table;
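Since the goal was a view, a minimal sketch wrapping that query (the view name mapping_periods is my invention, and the NULL handling in the sample join condition is an assumption about how "current" rows should match):
create view mapping_periods as
select
system_1_id,
system_2_id,
start_date,
cast(lead(start_date) over (partition by system_1_id order by start_date) - interval '1 day' as date) as end_date
from the_table;

-- when joining, remember the current row has a NULL end_date:
-- where t1.system_2_id = t2.system_2_id
--   and t1.report_date >= t2.start_date
--   and (t1.report_date <= t2.end_date or t2.end_date is null)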

PostgreSQL: Create a date sequence, use it in a date range query

I'm not great with SQL but I have been making good progress on a project up to this point. Now I am completely stuck.
I'm trying to get a count for the number of apartments with each status. I want this information for each day so that I can trend it over time. I have data that looks like this:
table: y_unit_status
unit | date_occurred | start_date | end_date | status
1 | 2017-01-01 | 2017-01-01 | 2017-01-05 | Occupied No Notice
1 | 2017-01-06 | 2017-01-06 | 2017-01-31 | Occupied Notice
1 | 2017-02-01 | 2017-02-01 | | Vacant
2 | 2017-01-01 | 2017-01-01 | | Occupied No Notice
And I want to get output that looks like this:
date | occupied_no_notice | occupied_notice | vacant
2017-01-01 | 2 | 0 | 0
...
2017-01-10 | 1 | 1 | 0
...
2017-02-01 | 1 | 0 | 1
Or, this approach would work:
date | status | count
2017-01-01 | occupied no notice | 2
2017-01-01 | occupied notice | 0
date_occurred: Date when the status of the unit changed
start_date: Same as date_occurred
end_date: Date when status stopped being x and changed to y.
I am pulling in the number of bedrooms and a property id, so the second approach of selecting counts for one status at a time would produce a relatively large number of rows compared with option 1 (if that matters).
I've found a lot of references that have gotten me close to what I'm looking for but I always end up with a sort of rolling, cumulative count.
Here's my query, which produces a column of dates and counts that accumulate over time rather than reflecting a snapshot of counts for a particular day. You can see my references to another table where I'm pulling in a property id. The table schema is Property -> Unit -> Unit Status.
WITH t AS(
SELECT i::date from generate_series('2016-06-29', '2017-08-03', '1 day'::interval) i
)
SELECT t.i as date,
u.hproperty,
count(us.hmy) as count --us.hmy is the id
FROM t
LEFT OUTER JOIN y_unit_status us ON t.i BETWEEN us.dtstart AND us.dtend
INNER JOIN y_unit u ON u.hmy = us.hunit -- to get property id
WHERE us.sstatus = 'Occupied No Notice'
AND t.i >= us.dtstart
AND t.i <= us.dtend
AND u.hproperty = '1'
GROUP BY t.i, u.hproperty
ORDER BY t.i
limit 1500
I also tried a FOR loop, iterating over the dates to determine cases where the date was between start and end but my logic wasn't working. Thanks for any insight!
You are on the right track, but you'll need to handle NULL values in end_date. If a NULL end_date means the status is assumed to continue into the future (you just don't know when it will change), the containment operators (@> and <@) for the daterange type are perfect for you (because ranges can be "unbounded"):
with params as (
select date '2017-01-01' date_from,
date '2017-02-02' date_to
)
select date_from + d, status, count(unit)
from params
cross join generate_series(0, date_to - date_from) d
left join y_unit_status on daterange(start_date, end_date, '[]') @> date_from + d
group by 1, 2
To achieve the first variant, you can use conditional aggregation:
with params as (
select date '2017-01-01' date_from,
date '2017-02-02' date_to
)
select date_from + d,
count(unit) filter (where status = 'Occupied No Notice') occupied_no_notice,
count(unit) filter (where status = 'Occupied Notice') occupied_notice,
count(unit) filter (where status = 'Vacant') vacant
from params
cross join generate_series(0, date_to - date_from) d
left join y_unit_status on daterange(start_date, end_date, '[]') @> date_from + d
group by 1
Notes:
The filter (where <predicate>) syntax requires PostgreSQL 9.4 or later. Before that, you can use CASE (and the fact that most aggregate functions do not include NULL values) to emulate it.
You can even index the expression daterange(start_date, end_date, '[]') (using GiST) for better performance; see the sketch after the link below.
http://rextester.com/HWKDE34743
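For reference, a minimal sketch of both notes against the same y_unit_status table (the index name is my invention):
-- pre-9.4 emulation of count(unit) filter (where status = 'Vacant'):
-- count() ignores NULLs, so rows failing the CASE simply don't count
select count(case when status = 'Vacant' then unit end) as vacant
from y_unit_status;

-- GiST expression index supporting the daterange containment test
create index y_unit_status_range_idx
on y_unit_status using gist (daterange(start_date, end_date, '[]'));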

How to select records in date order that total to an arbitrary amount?

I have a table of fuel deliveries as follows:
Date     | Time | Qty
---------|------|-----
20160101 | 0800 | 4500
20160203 | 0900 | 6000
20160301 | 0810 | 3400
20160328 | 1710 | 5300
20160402 | 1201 | 6000
I know that on April 1st I had 10,000 litres in the tank, so now I want to select just the deliveries that make up that total. This means I want the records for 20160328, 20160301 and 20160203. I am using Postgres and I want to know how to structure a select statement that would accomplish this task.
I understand how to use the where clause to filter records whose date is less than or equal to April 1st, but I do not know how to instruct Postgres to select the records in reverse date order until the quantity selected is greater than or equal to 10,000.
with d as (
select *, sum(qty) over (order by date desc, time desc) as total
from delivery
where date between '20160101' and '20160401'
)
select *
from d
where total < 10000
union
(
select *
from d
where total >= 10000
order by date desc, time desc
limit 1
)
order by date desc, time desc
;
date | time | qty | total
------------+----------+------+-------
2016-03-28 | 17:10:00 | 5300 | 5300
2016-03-01 | 08:10:00 | 3400 | 8700
2016-02-03 | 09:00:00 | 6000 | 14700
The data:
create table delivery (date date, time time, qty int);
insert into delivery (date, time, qty) values
('20160101','0800',4500),
('20160203','0900',6000),
('20160301','0810',3400),
('20160328','1710',5300),
('20160402','1201',6000);
You can create a running total using a window function based on descending order of date and time, like so:
SELECT
Date,
Time,
Qty
FROM
(
SELECT
Date,
Time,
Qty,
SUM(Qty) OVER (ORDER BY Date DESC, Time DESC) AS Running_Total
FROM
fuel_deliveries
WHERE
Date < '20160402'
) rt
WHERE
Running_Total <= 10000;
The inner/sub query gets you the running total, and you then filter on it where the value is less than or equal to 10000. Note that this stops short of the row that crosses the 10,000 mark (with the data above it returns only 8,700 litres' worth), which is why the answer above adds the extra union branch to pick up the crossing row.
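A small variation on the same window query that does include the crossing row is to compare the running total minus the row's own quantity (i.e. the total before that delivery) against the target; this is a sketch, not part of the original answer:
SELECT Date, Time, Qty
FROM
(
SELECT Date, Time, Qty,
SUM(Qty) OVER (ORDER BY Date DESC, Time DESC) AS Running_Total
FROM fuel_deliveries
WHERE Date < '20160402'
) rt
-- Running_Total - Qty is the running total *before* this delivery; keep
-- taking deliveries while that is still short of the 10,000 litre target
WHERE Running_Total - Qty < 10000;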

SQL query to convert date ranges to per day records

Requirements
I have a data table that stores data in date ranges.
Each record is allowed to overlap previous record(s) (each record has a CreatedOn datetime column).
A new record can define its own date range if it needs to, and hence can overlap several older records.
Each new overlapping record overrides the settings of the older records it overlaps.
Result set
What I need is per-day data for any date range that uses the record overlapping. It should return a record per day with the corresponding data for that particular day.
To convert ranges to days I was thinking of a numbers/dates table and a user defined function (UDF) to get data for each day in the range, but I wonder whether there's any other (as in better, or even faster) way of doing this, since I'm using the latest SQL Server 2008 R2.
Stored data
Imagine my stored data looks like this
ID | RangeFrom | RangeTo | Starts | Ends | CreatedOn (not providing data)
---|-----------|----------|--------|-------|-----------
1 | 20110101 | 20110331 | 07:00 | 15:00
2 | 20110401 | 20110531 | 08:00 | 16:00
3 | 20110301 | 20110430 | 06:00 | 14:00 <- overrides both partially
Results
If I wanted to get data from 1st January 2011 to 31st May 2011, the resulting table should look like the following (obvious rows omitted):
DayDate | Starts | Ends
--------|--------|------
20110101| 07:00 | 15:00 <- defined by record ID = 1
20110102| 07:00 | 15:00 <- defined by record ID = 1
... many rows omitted for obvious reasons
20110301| 06:00 | 14:00 <- defined by record ID = 3
20110302| 06:00 | 14:00 <- defined by record ID = 3
... many rows omitted for obvious reasons
20110501| 08:00 | 16:00 <- defined by record ID = 2
20110502| 08:00 | 16:00 <- defined by record ID = 2
... many rows omitted for obvious reasons
20110531| 08:00 | 16:00 <- defined by record ID = 2
Actually, since you are working with dates, a Calendar table would be more helpful.
Declare @StartDate date
Declare @EndDate date
;With Calendar As
(
Select @StartDate As [Date]
Union All
Select DateAdd(d,1,[Date])
From Calendar
Where [Date] < @EndDate
)
Select ...
From Calendar
Left Join MyTable
On Calendar.[Date] Between MyTable.Start And MyTable.End
Option ( Maxrecursion 0 );
Addition
Missed the part about the trumping rule in your original post:
Set DateFormat MDY;
Declare @StartDate date = '20110101';
Declare @EndDate date = '20110501';
-- This first CTE is obviously to represent
-- the source table
With SampleData As
(
Select 1 As Id
, Cast('20110101' As date) As RangeFrom
, Cast('20110331' As date) As RangeTo
, Cast('07:00' As time) As Starts
, Cast('15:00' As time) As Ends
, CURRENT_TIMESTAMP As CreatedOn
Union All Select 2, '20110401', '20110531', '08:00', '16:00', DateAdd(s,1,CURRENT_TIMESTAMP )
Union All Select 3, '20110301', '20110430', '06:00', '14:00', DateAdd(s,2,CURRENT_TIMESTAMP )
)
, Calendar As
(
Select @StartDate As [Date]
Union All
Select DateAdd(d,1,[Date])
From Calendar
Where [Date] < @EndDate
)
, RankedData As
(
Select C.[Date]
, S.Id
, S.RangeFrom, S.RangeTo, S.Starts, S.Ends
, Row_Number() Over( Partition By C.[Date] Order By S.CreatedOn Desc ) As Num
From Calendar As C
Join SampleData As S
On C.[Date] Between S.RangeFrom And S.RangeTo
)
Select [Date], Id, RangeFrom, RangeTo, Starts, Ends
From RankedData
Where Num = 1
Option ( Maxrecursion 0 );
In short, I rank all the sample data preferring the newer rows that overlap the same date.
Why do it all in the DB when you can do it better in memory?
This is the solution (the one I eventually used) that seemed most reasonable in terms of data transferred, speed and resources:
get the actual range definitions from the DB to the mid tier (a smaller amount of data)
generate an in-memory calendar for the requested date range (faster than in the DB)
apply those DB definitions to it (much easier and faster than in the DB)
And that's it. I realised that complicating certain things in the DB is not worth it when you have executable in-memory code that can do the same manipulation faster and more efficiently.