Obtaining a date-bound running total on postgresql - postgresql

I have a database query running on Postgresql 9.3 that looks like this in order to obtain a running balance of accounting entries:
select *,
       (sum(amount) over (partition by ae.account_id
                          order by ae.date_posted, ae.account_id)) as formula0_1_
from account_entry as ae
-- where ae.date_posted > '2014-01-01'
order by account_id desc, date_posted asc
Expected output without the where clause:
id | date       | amount | running balance
 1 | 2014-01-01 |     10 |              10
 2 | 2014-01-02 |     10 |              20
What I'm getting with the where clause:
id | date       | amount | running balance
 2 | 2014-01-02 |     10 |              10
How can I make this query return the same correct results when I filter by a date range (the bit commented out above)?

You need to select and calculate your running balances first over all the data, and then put a WHERE clause in an outer SELECT.
SELECT *
FROM (
    SELECT *,
           SUM(amount) OVER (
               PARTITION BY ae.account_id
               ORDER BY ae.date_posted, ae.account_id
           ) AS formula0_1_
    FROM account_entry AS ae
) AS total
WHERE total.date_posted > '2014-01-01'
ORDER BY account_id DESC, date_posted ASC;
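For reference, here is a self-contained, slightly simplified sketch of the same fix run against the sample rows from the question (the account_id value and the running_balance alias are assumed, since the sample only shows id, date and amount):

WITH account_entry (id, account_id, date_posted, amount) AS (
    VALUES (1, 1, DATE '2014-01-01', 10),
           (2, 1, DATE '2014-01-02', 10)
)
SELECT *
FROM (
    SELECT *,
           SUM(amount) OVER (PARTITION BY account_id ORDER BY date_posted) AS running_balance
    FROM account_entry
) AS total
WHERE total.date_posted > '2014-01-01';
-- returns only the row with id 2, but with running_balance 20,
-- because the window sum is computed before the outer filter is applied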

Related

MySQL group by timestamp difference

I need to write a MySQL query which will group results by the difference between timestamps.
Is it possible?
I have a table with locations; every row has created_at (timestamp) and I want to group results where the difference is > 1 min.
Example:
id | lat | lng | created_at
 1 | ... | ... | 2020-05-03 06:11:35
 2 | ... | ... | 2020-05-03 06:11:37
 3 | ... | ... | 2020-05-03 06:11:46
 4 | ... | ... | 2020-05-03 06:12:48
 5 | ... | ... | 2020-05-03 06:12:52
Result of this data should be 2 groups (1,2,3) and (4,5)
It depends on what you actually want. If you want to group together records that belong to the same minute, regardless of the difference from the previous record, then simple aggregation is enough:
select
    date_format(created_at, '%Y-%m-%d %H:%i:00') date_minute,
    min(id) min_id,
    max(id) max_id,
    min(created_at) min_created_at,
    max(created_at) max_created_at,
    count(*) no_records
from mytable
group by date_minute
On the other hand, if you want to build groups of consecutive records that have less than a 1 minute gap between them, this is a gaps-and-islands problem. Here is one way to solve it using window functions (available in MySQL 8.0):
select
    min(id) min_id,
    max(id) max_id,
    min(created_at) min_created_at,
    max(created_at) max_created_at,
    count(*) no_records
from (
    select
        t.*,
        sum(case when created_at < lag_created_at + interval 1 minute then 0 else 1 end)
            over (order by created_at) grp
    from (
        select
            t.*,
            lag(created_at) over (order by created_at) lag_created_at
        from mytable t
    ) t
) t
group by grp
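With the five sample rows from the question, this second query should return something like the following, matching the expected groups (1,2,3) and (4,5):

min_id | max_id | min_created_at      | max_created_at      | no_records
     1 |      3 | 2020-05-03 06:11:35 | 2020-05-03 06:11:46 |          3
     4 |      5 | 2020-05-03 06:12:48 | 2020-05-03 06:12:52 |          2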

Gaps and Islands - get a list of dates unemployed over a date range with Postgresql

I have a table called Position. In this table I have the following; dates are inclusive (yyyy-mm-dd). Below is a simplified view of the employment dates:
id, person_id, start_date, end_date , title
1 , 1 , 2001-12-01, 2002-01-31, 'admin'
2 , 1 , 2002-02-11, 2002-03-31, 'admin'
3 , 1 , 2002-02-15, 2002-05-31, 'sales'
4 , 1 , 2002-06-15, 2002-12-31, 'ops'
I'd like to be able to calculate the gaps in employment, assuming some of the dates overlap, to produce the following output for the person with id=1:
person_id, start_date, end_date , last_position_id, gap_in_days
1 , 2002-02-01, 2002-02-10, 1 , 10
1 , 2002-06-01, 2002-06-14, 3 , 14
I have looked at numerous solutions: UNIONs, materialized views, tables with generated calendar date ranges, etc. I am really not sure what the best way to do this is. Is there a single query where I can get this done?
step-by-step demo: db<>fiddle
You just need the lead() window function. With it you can pull a value from the following row (start_date in this case) into the current row.
SELECT
    person_id,
    end_date + 1 AS start_date,
    lead - 1 AS end_date,
    id AS last_position_id,
    lead - (end_date + 1) AS gap_in_days
FROM (
    SELECT
        *,
        lead(start_date) OVER (PARTITION BY person_id ORDER BY start_date)
    FROM positions
) s
WHERE lead - (end_date + 1) > 0
After getting the next start_date you are able to compare it with the current end_date. If they differ, you have a gap. These positive values can be filtered in the WHERE clause.
(If two positions overlap, the difference is negative, so it can be ignored.)
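For reference, a runnable sketch of the same query against the sample rows from the question (using positions as the table name, as in the answer above):

WITH positions (id, person_id, start_date, end_date, title) AS (
    VALUES (1, 1, DATE '2001-12-01', DATE '2002-01-31', 'admin'),
           (2, 1, DATE '2002-02-11', DATE '2002-03-31', 'admin'),
           (3, 1, DATE '2002-02-15', DATE '2002-05-31', 'sales'),
           (4, 1, DATE '2002-06-15', DATE '2002-12-31', 'ops')
)
SELECT person_id,
       end_date + 1 AS start_date,
       lead - 1 AS end_date,
       id AS last_position_id,
       lead - (end_date + 1) AS gap_in_days
FROM (
    SELECT *,
           lead(start_date) OVER (PARTITION BY person_id ORDER BY start_date)
    FROM positions
) s
WHERE lead - (end_date + 1) > 0;
-- expected: (1, 2002-02-01, 2002-02-10, 1, 10) and (1, 2002-06-01, 2002-06-14, 3, 14),
-- which matches the output asked for in the question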
First you need to find which dates overlap (see Determine Whether Two Date Ranges Overlap),
then merge those ranges into a single one and keep the last id,
and finally calculate the gap in days between one end_date and the next start_date - 1.
SQL DEMO
with find_overlap as (
    SELECT t1."id" as t1_id, t1."person_id", t1."start_date", t1."end_date",
           t2."id" as t2_id, t2."start_date" as t2_start_date, t2."end_date" as t2_end_date
    FROM Table1 t1
    LEFT JOIN Table1 t2
      ON t1."person_id" = t2."person_id"
     AND t1."start_date" <= t2."end_date"
     AND t1."end_date" >= t2."start_date"
     AND t1.id < t2.id
), merge_overlap as (
    SELECT
        person_id,
        start_date,
        COALESCE(t2_end_date, end_date) as end_date,
        COALESCE(t2_id, t1_id) as last_position_id
    FROM find_overlap
    WHERE t1_id NOT IN (SELECT t2_id FROM find_overlap WHERE t2_id IS NOT NULL)
), cte as (
    SELECT *,
           LEAD(start_date) OVER (PARTITION BY person_id ORDER BY start_date) next_start
    FROM merge_overlap
)
SELECT *,
       DATE_PART('day',
           (next_start::timestamp - INTERVAL '1 DAY') - end_date::timestamp
       ) as days
FROM cte
WHERE next_start IS NOT NULL
OUTPUT
| person_id | start_date | end_date | last_position_id | next_start | days |
|-----------|------------|------------|------------------|------------|------|
| 1 | 2001-12-01 | 2002-01-31 | 1 | 2002-02-11 | 10 |
| 1 | 2002-02-11 | 2002-05-31 | 3 | 2002-06-15 | 14 |

Difference between the max date and the penultimate max for specific employee - postgresql

Bit stuck on a problem. Trying to find the difference between two dates in PostgreSQL.
I have a table emp with many employees in it:
emp_id, date
1, 31-10-2017
1, 08-08-2017
1, 02-06-2017
I want it to look like this:
emp_id, max_date, penultimate_date, difference
1, 31-10-2017, 08-08-2017, 84 days
Obviously you can use max(date) and group by the emp_id; however, how do you retrieve the penultimate date? I have tried things like:
order by date desc limit 1 offset 1
I have also tried putting these in subqueries, but that hasn't worked, as there are many employee numbers and I need one row for each employee.
Can anyone help???
Thanks,
pp84
As kindly suggested by @Haleemur Ali, order by date desc limit 1 offset 1 would not work with several emp_id:
t=# with d(emp_id, date)as (values(1, '31-10-2017'::date),(1, '08-08-2017'),(1, '02-06-2017' ),(2,'2016-01-01'),(2,'2016-02-02'),(2,'2016-03-03'))
select distinct emp_id
, max(date) over (partition by emp_id) max_date
, nth_value(date,2) over (partition by emp_id) penultimate_date
, max(date) over (partition by emp_id) - nth_value(date,2) over (partition by emp_id) diff
from d
;
emp_id | max_date | penultimate_date | diff
--------+------------+------------------+------
2 | 2016-03-03 | 2016-02-02 | 30
1 | 2017-10-31 | 2017-08-08 | 84
(2 rows)
Time: 0.756 ms
WITH emps (emp_id, date) AS (
    VALUES (1, '2017-10-31'::DATE)
         , (1, '2017-08-08'::DATE)
         , (1, '2017-06-02'::DATE)
)
SELECT DISTINCT ON (emp_id)
       emp_id
     , "date" max_date
     , LEAD("date") OVER w penultimate_date
     , "date" - LEAD("date") OVER w difference
FROM emps
WINDOW w AS (PARTITION BY emp_id ORDER BY "date" DESC)
ORDER BY emp_id, date DESC
When ordered in descending order, LEAD("date") OVER w gives the date value from the next row.
The DISTINCT ON limits the result set to 1 row (the first row encountered) per emp_id.
With our ordering, this first row must contain the greatest date, and LEAD(...) OVER w therefore returns the penultimate date. This gives us the following result:
emp_id | max_date | penultimate_date | difference
--------+------------+------------------+------------
1 | 2017-10-31 | 2017-08-08 | 84
(1 row)
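For reference, a runnable sketch combining this approach with the two-employee sample used in the answer above (dates taken from the question and from that sample):

WITH emps (emp_id, date) AS (
    VALUES (1, DATE '2017-10-31'), (1, DATE '2017-08-08'), (1, DATE '2017-06-02'),
           (2, DATE '2016-01-01'), (2, DATE '2016-02-02'), (2, DATE '2016-03-03')
)
SELECT DISTINCT ON (emp_id)
       emp_id
     , date AS max_date
     , LEAD(date) OVER w AS penultimate_date
     , date - LEAD(date) OVER w AS difference
FROM emps
WINDOW w AS (PARTITION BY emp_id ORDER BY date DESC)
ORDER BY emp_id, date DESC;
-- expected: (1, 2017-10-31, 2017-08-08, 84) and (2, 2016-03-03, 2016-02-02, 30),
-- one row per employee, as required by the question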

How to select records in date order that total to an arbitrary amount?

I have a table of fuel deliveries as follows:
Date Time Qty
20160101 0800 4500
20160203 0900 6000
20160301 0810 3400
20160328 1710 5300
20160402 1201 6000
I know that on April 1st I had 10,000 litres in the tank, so now I want to select just the deliveries that make up the total. This means I want the records for 20160328, 20160301 and 20160203. I am using Postgres and I want to know how to structure a select statement that would accomplish this task.
I understand how to use the where clause to filter records whose date is less than or equal to April 1st, but I do not know how to instruct Postgres to select the records in reverse date order until the quantity selected is greater than or equal to 10,000.
with d as (
    select *, sum(qty) over (order by date desc, time desc) as total
    from delivery
    where date between '20160101' and '20160401'
)
select *
from d
where total < 10000
union
(
    select *
    from d
    where total >= 10000
    order by date desc, time desc
    limit 1
)
order by date desc, time desc
;
date | time | qty | total
------------+----------+------+-------
2016-03-28 | 17:10:00 | 5300 | 5300
2016-03-01 | 08:10:00 | 3400 | 8700
2016-02-03 | 09:00:00 | 6000 | 14700
The data:
create table delivery (date date, time time, qty int);
insert into delivery (date, time, qty) values
('20160101','0800',4500),
('20160203','0900',6000),
('20160301','0810',3400),
('20160328','1710',5300),
('20160402','1201',6000);
You can create a running total using a window function based on descending order of date and time, like so:
SELECT
    Date,
    Time,
    Qty
FROM (
    SELECT
        Date,
        Time,
        Qty,
        SUM(Qty) OVER (ORDER BY Date DESC, Time DESC) AS Running_Total
    FROM fuel_deliveries
    WHERE Date < '20160402'
) rt
WHERE Running_Total <= 10000;
The inner/sub query gets you the running total, but you then want to filter on it where the value is less than or equal to 10000.
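With the sample data above, this returns the 2016-03-28 and 2016-03-01 rows (running totals 5300 and 8700). If, as in the question, you also want to keep the delivery that pushes the total past 10,000 (the 2016-02-03 row), one option, assuming the same column names, is to filter on the running total excluding the current row instead:

WHERE Running_Total - Qty < 10000;

This keeps every delivery whose preceding running total is still below the target, so the row that crosses the threshold is included as well.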

which is more efficient, select array_agg over partition, or select array (subquery)?

I have data like:
group_id | day | amount
----------+-------------+-------
1 | 15 Nov 2015 | 5.0
1 | 15 Nov 2015 | 6.0
1 | 14 Nov 2015 | 3.0
2 | 17 Nov 2015 | 5.0
2 | 15 Nov 2015 | 5.0
and I want to select the top ten amounts for each (group_id, day). I tried writing things like:
Postgres 9.4
select max(x.group_id), max(x.day), max(x.amounts)
from (select group_id, day, array_agg(amount) over w as amounts,
             row_number() over w as r
      from my_table
      window w as (partition by group_id, day order by amount desc)) as x
where x.r <= 10
group by x.group_id, x.day
It also occurred to me that I could write a much more straightforward query:
select a.day, a.group_id,
       array(select amount
             from my_table
             where day = a.day and group_id = a.group_id
             order by amount desc limit 10)
from my_table as a
group by a.day, a.group_id
Which does exactly what I want. This led me to the question: assuming I can tweak the first example to get what I want, which query would be faster? Is the subquery slower than the partitions?
You probably should use an analytic function.
I don't know why you also have MAX() around every column outside the subquery; your two queries don't seem to be equivalent.
Your request for the top 10 per group should be something like:
WITH ranked AS (
    SELECT group_id,
           day,
           amount,
           row_number() OVER (PARTITION BY group_id, day ORDER BY amount DESC) rn
    FROM my_table
)
SELECT group_id,
       day,
       array_agg(amount ORDER BY rn) AS amounts
FROM ranked
WHERE rn <= 10
GROUP BY group_id, day
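With the sample data from the question, this should return one row per (group_id, day), along the lines of:

group_id | day         | amounts
       1 | 15 Nov 2015 | {6.0,5.0}
       1 | 14 Nov 2015 | {3.0}
       2 | 17 Nov 2015 | {5.0}
       2 | 15 Nov 2015 | {5.0}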