Getting the last value per calendar month in Postgres - postgresql

I have a daily loan schedule in a PostgreSQL database that looks as follows:
date | interest | closing_balance
1 Jan 21 | 100 | 30000
2 Jan 21 | 99 | 29910
....
31 Jan 21 | 98 | 28000
1 Feb 21
2 Feb 21
...
28 Feb 21 | 90 | 27000
I want to sum the interest column per month and then get the last value for each month for the closing_balance column.
The following seems to work to get the summed up value of the interest column per month:
SELECT date_trunc('month', "my_table"."date") AS my_month,
SUM("my_table"."interest") AS "interest_sum"
FROM "my_table"
GROUP BY my_month
ORDER BY my_month
I'm struggling to get the closing balance for each month. The above example should return 2 rows for Jan and Feb with 28000 and 27000 respectively. How should I update the query to calculate this?

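You need window functions partitioned by month: sum the interest over each month's partition, and take the last closing_balance once the rows have been ordered by date inside that partition. A plain GROUP BY my_month is not enough here, because closing_balance is neither grouped nor aggregated. The frame has to be extended explicitly, since the default frame ends at the current row. DISTINCT then collapses the result to one row per month. See the manual for more explanation: 3.5. Window Functions, 4.2.8. Window Function Calls, 9.22. Window Functions
Try this:
SELECT DISTINCT
date_trunc('month', "my_table"."date") AS my_month
, SUM("my_table"."interest") OVER w AS "interest_sum"
, last_value("my_table"."closing_balance") OVER (w ORDER BY "my_table"."date" ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_closing_balance
FROM "my_table"
WINDOW w AS (PARTITION BY date_trunc('month', "my_table"."date"))
ORDER BY my_month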
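To check what the result should look like, here is a rough Python sketch of the same logic (sum per month, keep the balance of the latest date). The function name monthly_summary and the tuple layout are just assumptions for illustration, and only the rows that are fully shown in the question are used:

```python
from datetime import date

def monthly_summary(rows):
    """rows: iterable of (day, interest, closing_balance) tuples."""
    summary = {}
    for day, interest, closing_balance in sorted(rows, key=lambda r: r[0]):
        month = day.replace(day=1)  # same role as date_trunc('month', ...)
        total, _ = summary.get(month, (0, None))
        # Rows are visited in date order, so the balance stored last
        # is the one from the latest day of the month.
        summary[month] = (total + interest, closing_balance)
    return [(m, s, b) for m, (s, b) in sorted(summary.items())]

schedule = [
    (date(2021, 1, 1), 100, 30000),
    (date(2021, 1, 2), 99, 29910),
    (date(2021, 1, 31), 98, 28000),
    (date(2021, 2, 28), 90, 27000),
]
result = monthly_summary(schedule)
for month, interest_sum, last_balance in result:
    print(month, interest_sum, last_balance)
```

With these four rows the closing balances come out as 28000 for January and 27000 for February, which is what the question asks for.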

Related

How to split and aggregate days into different month

db fiddle
Running select *, return_date - pickup_date as total from order_history order by id; returns the following result:
id pickup_date return_date date_ranges total
1 2020-03-01 2020-03-12 [2020-03-01,2020-04-01) 11
2 2020-03-01 2020-03-22 [2020-03-01,2020-04-01) 21
3 2020-03-11 2020-03-22 [2020-03-01,2020-04-01) 11
4 2020-02-11 2020-03-22 [2020-02-01,2020-03-01) 40
5 2020-01-01 2020-01-22 [2020-01-01,2020-02-01) 21
6 2020-01-01 2020-04-22 [2020-01-01,2020-02-01) 112
For example:
--id=6. total = 112. 112 = 22 + 31 + 29 + 30
--therefore total should be split: jan2020: 30, feb2020:29, march2020: 31, 2020apr:22.
First split, then aggregate. Aggregate over the range min(pickup_date) to max(return_date), then use to_char to cast to 'YYYY-MM'. In this case the aggregate should group by 2020-01, 2020-02, 2020-03, 2020-04.
But if pickup_date is in the same month as return_date, then compute return_date - pickup_date and aggregate/sum the result, grouped by to_char(pickup_date,'YYYY-MM').
step-by-step demo: db<>fiddle
Not quite perfect, but a sketch:
SELECT
id,
ARRAY_AGG( -- 4
LEAST(return_date, gs + interval '1 month - 1 day') -- 2
- GREATEST(pickup_date, gs) -- 3
+ interval '1 day'
)
FROM order_history,
generate_series( -- 1
date_trunc('month', pickup_date),
date_trunc('month', return_date),
interval '1 month'
) gs
GROUP BY id
1. Generate a set of months that are included in the given date range.
2. a) Calculate the last day of the month (first of a month + 1 month is first of the next month; minus 1 day is last of the current month). This is the max day for returning in this month. b) If the return happened earlier, then take the earlier day (LEAST()).
3. Same for the pickup day. Afterwards calculate the difference of the days kept in one month.
4. Aggregate the values for one month.
Open questions / Potential enhancements:
You said:
jan2020: 30, feb2020:29, march2020: 31, 2020apr:22.
Why is JAN given with 30 days? On the other hand you count APR 22 days (1st - 22nd). Following the logic, JAN should be 31, shouldn't it?
If you don't want to count the very first day, then you can change (3.) to
GREATEST(pickup_date + interval '1 day', gs)
There's a problem with daylight saving time in March (30 days, 23 hours instead of 31 days). This can be handled by some rounding, for example.
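A small Python sketch of the same split, working on plain dates so the daylight saving problem does not come up. It counts both the first and the last day, like the query with + interval '1 day'. The function name days_per_month is an assumption for illustration, not something from the fiddle:

```python
from datetime import date, timedelta

def days_per_month(pickup, ret):
    """Return {'YYYY-MM': days} for the inclusive range pickup..ret."""
    counts = {}
    month_start = pickup.replace(day=1)
    while month_start <= ret:
        # Day 28 plus 4 days always lands in the following month.
        next_month = (month_start.replace(day=28) + timedelta(days=4)).replace(day=1)
        first = max(pickup, month_start)                 # GREATEST(pickup_date, gs)
        last = min(ret, next_month - timedelta(days=1))  # LEAST(return_date, month end)
        counts[month_start.strftime("%Y-%m")] = (last - first).days + 1
        month_start = next_month
    return counts

# id = 6 from the question
split = days_per_month(date(2020, 1, 1), date(2020, 4, 22))
print(split)
```

For id = 6 this gives 31, 29, 31 and 22 days, so 113 in total, one more than return_date - pickup_date because the pickup day itself is counted.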

How to get last value with condition in postgreSQL?

I have a table in postgres with three columns, one with a group, one with a date and the last with a value.
grp | mydate | value
A | 2021-01-27 | 5
A | 2021-01-23 | 10
A | 2021-01-15 | 15
B | 2021-01-26 | 7
B | 2021-01-24 | 12
B | 2021-01-15 | 17
I would like to create a view with a sequence of dates and, for each date, the most recent value in the table for each group.
grp | mydate | value
A | 2021-01-27 | 5
A | 2021-01-26 | 10
A | 2021-01-25 | 10
A | 2021-01-24 | 10
A | 2021-01-23 | 10
A | 2021-01-22 | 15
A | 2021-01-21 | 15
A | 2021-01-20 | 15
A | 2021-01-19 | 15
A | 2021-01-18 | 15
A | 2021-01-17 | 15
A | 2021-01-16 | 15
A | 2021-01-15 | 15
B | 2021-01-27 | 7
B | 2021-01-26 | 7
B | 2021-01-25 | 12
B | 2021-01-24 | 12
B | 2021-01-23 | 17
B | 2021-01-22 | 17
B | 2021-01-21 | 17
B | 2021-01-20 | 17
B | 2021-01-19 | 17
B | 2021-01-18 | 17
B | 2021-01-17 | 17
B | 2021-01-16 | 17
B | 2021-01-15 | 17
SQL code to generate the table:
CREATE TABLE foo (
grp char(1),
mydate date,
value integer);
INSERT INTO foo VALUES
('A', '2021-01-27', 5),
('A', '2021-01-23', 10),
('A', '2021-01-15', 15),
('B', '2021-01-26', 7),
('B', '2021-01-24', 12),
('B', '2021-01-15', 17);
I have so far managed to generate a visualization with the sequence of dates joined with the distinct groups, but I am failing to get the most recent value.
SELECT DISTINCT(foo.grp), (date_trunc('day'::text, dd.dd))::date AS mydate
FROM foo, generate_series((( SELECT min(foo.mydate) AS min
FROM foo))::timestamp without time zone, (now())::timestamp without time zone, '1 day'::interval) dd(dd)
step-by-step demo:db<>fiddle
SELECT
grp,
gs::date as mydate,
value
FROM (
SELECT
*,
COALESCE( -- 2
lead(mydate) OVER (PARTITION BY grp ORDER BY mydate) - 1, -- 1
mydate
) as prev_date
FROM foo
) s,
generate_series(mydate, prev_date, interval '1 day') as gs -- 3
ORDER BY grp, mydate DESC -- 4
1. The lead() window function shifts the next value of an ordered group (= partition) into the current row. The group is already defined, the order is the date. This can be used to create the required date range. Since you don't want to have the last date twice (as end of the first range and beginning of the next one), the end date stops at - 1 (one day before the next range starts).
2. This is for the very last record of each group: it has no following record, so lead() yields NULL. To avoid this, COALESCE() sets the end date to the date of the current record.
3. Now you can create a date range from the current date to the day before the next date using generate_series().
4. Finally you can generate the required order.
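The four steps can also be followed in a short Python sketch. This is only an illustration of the logic under the assumption that each record is valid from its own date until the day before the next record of the same group; fill_forward is a made-up name:

```python
from datetime import date, timedelta
from itertools import groupby

def fill_forward(rows):
    """rows: (grp, day, value) tuples; returns one row per group and day."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for grp, items in groupby(ordered, key=lambda r: r[0]):
        items = list(items)
        for i, (_, day, value) in enumerate(items):
            # Like COALESCE(lead(mydate) - 1, mydate).
            if i + 1 < len(items):
                end = items[i + 1][1] - timedelta(days=1)
            else:
                end = day
            # Like generate_series(mydate, end, interval '1 day').
            while day <= end:
                out.append((grp, day, value))
                day += timedelta(days=1)
    # Same ordering as ORDER BY grp, mydate DESC.
    return sorted(out, key=lambda r: (r[0], -r[1].toordinal()))

foo = [
    ("A", date(2021, 1, 27), 5), ("A", date(2021, 1, 23), 10), ("A", date(2021, 1, 15), 15),
    ("B", date(2021, 1, 26), 7), ("B", date(2021, 1, 24), 12), ("B", date(2021, 1, 15), 17),
]
filled = fill_forward(foo)
```

Note that each group ends at its own last record here, so group B stops at 2021-01-26 instead of being extended to 2021-01-27 as in the desired output.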

How to group rows without using GROUP BY clause

Let's say I have simple table:
Date Price
-----------------------
2012-01-05 23
2015-04-08 145
2016-03-09 12
2015-09-09 87
2000-01-15 23
2016-01-15 89
2016-07-12 23
2012-04-08 65
I want to group these rows by year but without using a GROUP BY clause. It would be good if I could add another column that would contain the year or a character that would indicate the group, like this:
Date Price Group
-------------------------------
2012-01-05 23 1
2015-04-08 145 2
2016-03-09 12 3
2015-09-09 87 2
2000-01-15 23 4
2016-01-15 89 3
2016-07-12 23 3
2012-04-08 65 1
I tried to use the over() clause, but to be honest I don't know which function to use with over().
A combination of extracting the year from the date and dense_rank will do the trick:
select *,
dense_rank () OVER(order by extract(year from Date))
from YOURTABLE
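One thing to be aware of: dense_rank() numbers the groups in year order, so 2000 gets 1 and 2016 gets 4, whereas the question numbers the groups in order of first appearance. A quick Python sketch of what the ranking produces for the dates in the question (the helper name dense_rank_by_year is hypothetical):

```python
from datetime import date

def dense_rank_by_year(dates):
    """Return the dense rank of each date's year, in input order."""
    years = sorted({d.year for d in dates})
    rank = {year: i + 1 for i, year in enumerate(years)}
    return [rank[d.year] for d in dates]

dates = [
    date(2012, 1, 5), date(2015, 4, 8), date(2016, 3, 9), date(2015, 9, 9),
    date(2000, 1, 15), date(2016, 1, 15), date(2016, 7, 12), date(2012, 4, 8),
]
groups = dense_rank_by_year(dates)
print(groups)
```

The rows are still grouped correctly (same year, same number), only the numbers themselves differ from the sample output.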
Try to do the CASE if you only want to add another column
SELECT DATE,
PRICE,
CASE DATE_PART('YEAR', DATE) WHEN 2015 THEN 1
WHEN 2016 THEN 2 ... END
FROM MYTABLE
But if you want to get the aggregate of something then you do OVER() or GROUP BY

Convert day of year (from extract) back to a date

I am trying to group data by the day of the year that it falls on. I have been able to achieve this with the code below. The issue is that I lose the information as to which day (i.e. Jan 1st, Jan 2nd etc.) each grouping represents. I am simply left with a number (e.g. 1, 2 etc.) representing the day of the year. Is there any way to convert this number back into the more descriptive date? Thanks a lot.
CREATE TABLE tmp2 AS
SELECT extract(doy from trd_exctn_dt) as day_of_year
,sum(dollar_vol) AS dollar_vol
FROM tmp
GROUP BY extract(doy from trd_exctn_dt);
Current Output:
day_of_year | dollar_vol
------------|------------
1 10
2 15
3 7
Desired Output: N.b. The exact format of the first column doesn't matter too much. I would be happy with DD/MM, MM/DD or any other clear output.
day_of_year | dollar_vol
------------|------------
Jan 1 | 10
Jan 2 | 15
Jan 3 | 7
Using the to_char function:
SELECT to_char(trd_exctn_dt,'MM/DD') as day_of_year ,sum(dollar_vol) AS dollar_vol
FROM tmp
GROUP BY day_of_year ;
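Keep in mind that grouping by 'MM/DD' is not exactly the same as grouping by doy: from March onwards, the same calendar day has a different day-of-year number in leap years. If you do want to turn a day-of-year number back into a label, you have to pick a reference year. A rough Python sketch of that conversion, where doy_to_label and the default year 2021 are assumptions for illustration:

```python
from datetime import date, timedelta

def doy_to_label(doy, year=2021):
    """Convert a 1-based day of year to a label like 'Jan 01' for the given year."""
    d = date(year, 1, 1) + timedelta(days=doy - 1)
    return d.strftime("%b %d")

print(doy_to_label(1))         # first day of the year
print(doy_to_label(60, 2021))  # day 60 in a non-leap year
print(doy_to_label(60, 2020))  # day 60 in a leap year
```

Day 60 is Mar 01 in 2021 but Feb 29 in 2020, which is why the to_char approach is usually the safer grouping key.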

Creating sequence of dates and inserting each date into query

I need to find certain data within first day of current month to the last day of current month.
select count(*) from q_aggr_data as a
where a.filial_='fil1'
and a.operator_ like 'unit%'
and date_trunc('day',a.s_end_)='"+ date_to_search+ "'
group by a.s_name_,date_trunc('day',a.s_end_)
date_to_search here is 01.09.2014, 02.09.2014, 03.09.2014, ..., 30.09.2014
I've tried to loop through i=0...30 and make 30 queries, but that takes too long and is extremely naive. Also, for the days where there is no entry it should return 0. I've seen how to generate date sequences, but can't get my head around how to inject those days one by one into the query.
By creating not only a series, but a set of 1-day ranges, any timestamp data can be joined to the range using >= with <.
Note in particular that this approach avoids functions on the data (such as truncating to date), and because of this it permits the use of indexes to assist query performance.
If some data looked like this:
CREATE TABLE my_data
("data_dt" timestamp)
;
INSERT INTO my_data
("data_dt")
VALUES
('2014-09-01 08:24:00'),
('2014-09-01 22:48:00'),
('2014-09-02 13:12:00'),
('2014-09-03 03:36:00'),
('2014-09-03 18:00:00')
-- further rows omitted
;
Then that can be joined to a generated set of ranges (dt_start & dt_end pairs), using an outer join so unmatched ranges are still reported:
SELECT
r.dt_start
, count(d.data_dt)
FROM (
SELECT
dt_start
, dt_start + INTERVAL '1 Day' dt_end
FROM
generate_series('2014-09-01 00:00'::timestamp,
'2014-09-30 00:00', '1 Day') AS dt_start
) AS r
LEFT OUTER JOIN my_data d ON d.data_dt >= r.dt_start
AND d.data_dt < r.dt_end
GROUP BY
r.dt_start
ORDER BY
r.dt_start
;
and a result such as this is produced:
| DT_START | COUNT |
|----------------------------------|-------|
| September, 01 2014 00:00:00+0000 | 2 |
| September, 02 2014 00:00:00+0000 | 1 |
| September, 03 2014 00:00:00+0000 | 2 |
| September, 04 2014 00:00:00+0000 | 2 |
...
| September, 29 2014 00:00:00+0000 | 0 |
| September, 30 2014 00:00:00+0000 | 0 |
See this SQLFiddle demo
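The same idea in a few lines of Python, in case it helps to see the half-open ranges (start <= t < end) spelled out. It only uses the five sample rows listed above, so every day after the 3rd comes out as 0; count_per_day is a made-up name:

```python
from datetime import datetime, timedelta

def count_per_day(timestamps, first_day, last_day):
    """Count timestamps per 1-day range, keeping days without any match."""
    counts = []
    start = first_day
    while start <= last_day:
        end = start + timedelta(days=1)
        # Half-open range: includes start, excludes end.
        counts.append((start, sum(1 for t in timestamps if start <= t < end)))
        start = end
    return counts

my_data = [
    datetime(2014, 9, 1, 8, 24), datetime(2014, 9, 1, 22, 48),
    datetime(2014, 9, 2, 13, 12),
    datetime(2014, 9, 3, 3, 36), datetime(2014, 9, 3, 18, 0),
]
per_day = count_per_day(my_data, datetime(2014, 9, 1), datetime(2014, 9, 30))
for day, n in per_day[:4]:
    print(day.date(), n)
```

Because the upper bound is exclusive, a timestamp at exactly midnight is counted once, for the day it starts.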
One way to solve this problem is to group by truncated date.
select count(*)
from q_aggr_data as a
where a.filial_='fil1'
and a.operator_ like 'unit%'
group by date_trunc('day',a.s_end_), a.s_name_;
The other way is to use a window function, for getting the count over truncated date for example.
Please check if this query satisfies your requirements:
select sum(matched) -- include s_name_, s_end_ if you want to verify the results
from
(select a.filial_
, a.operator_
, a.s_name_
, generate_series s_end_
, (case when a.filial_ = 'fil1' then 1 else 0 end) as matched
from q_aggr_data as a
right join generate_series('2014-09-01', '2014-09-30', interval '1 day')
on a.s_end_ = generate_series
and a.filial_ = 'fil1'
and a.operator_ like 'unit%') aa
group by s_name_, s_end_
order by s_end_, s_name_
http://sqlfiddle.com/#!15/e8edf/3