How to group rows without using GROUP BY clause - postgresql

Let's say I have a simple table:
Date Price
-----------------------
2012-01-05 23
2015-04-08 145
2016-03-09 12
2015-09-09 87
2000-01-15 23
2016-01-15 89
2016-07-12 23
2012-04-08 65
I want to group these rows by year, but without using a GROUP BY clause. It would be good if I could add another column containing the year, or a character that indicates the group, like this:
Date Price Group
-------------------------------
2012-01-05 23 1
2015-04-08 145 2
2016-03-09 12 3
2015-09-09 87 2
2000-01-15 23 4
2016-01-15 89 3
2016-07-12 23 3
2012-04-08 65 1
I tried to use the over() clause, but to be honest I don't know which function to use with over().

A combination of extracting the year from the date and dense_rank() will do the trick. The group numbers follow the sorted years, not the order of appearance:
select *,
dense_rank() OVER (order by extract(year from Date)) as grp
from YOURTABLE

Try a CASE expression if you only want to add another column:
SELECT DATE,
PRICE,
CASE DATE_PART('YEAR', DATE) WHEN 2015 THEN 1
WHEN 2016 THEN 2 ... END
FROM MYTABLE
But if you want an aggregate of something, use OVER() or GROUP BY.
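As a sketch of the OVER() route: a window aggregate partitioned by year attaches a per-year total to every row without collapsing the rows. The CTE name `prices` and the column names `dt` and `price` are assumptions standing in for the question's table; the values are the question's sample rows.

```sql
-- Hypothetical stand-in for the question's table (names are assumptions).
WITH prices(dt, price) AS (
    VALUES (DATE '2012-01-05', 23),
           (DATE '2015-04-08', 145),
           (DATE '2016-03-09', 12),
           (DATE '2015-09-09', 87),
           (DATE '2000-01-15', 23),
           (DATE '2016-01-15', 89),
           (DATE '2016-07-12', 23),
           (DATE '2012-04-08', 65)
)
SELECT dt,
       price,
       -- group number: one value per distinct year, numbered in year order
       dense_rank() OVER (ORDER BY extract(year FROM dt)) AS grp,
       -- per-year total, repeated on every row of that year
       sum(price) OVER (PARTITION BY extract(year FROM dt)) AS year_total
FROM prices
ORDER BY dt;
```

All eight rows are returned; rows that share a year share the same grp and year_total.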

Related

Getting the last value per calendar month in Postgres

I have a daily loan schedule in a Postgresql database that looks as follows:
date | interest | closing_balance
1 Jan 21 | 100 | 30000
2 Jan 21 | 99 | 29910
....
31 Jan 21 | 98 | 28000
1 Feb 21
2 Feb 21
...
28 Feb 21 | 90 | 27000
I want to sum the interest column per month and then get the last value for each month for the closing_balance column.
The following seems to work to get the summed up value of the interest column per month:
SELECT date_trunc('month', "my_table"."date") AS my_month,
SUM("my_table"."interest") AS "interest_sum"
FROM "my_table"
GROUP BY my_month
ORDER BY my_month
I'm struggling to get the closing balance for each month. The above example should return 2 rows for Jan and Feb with 28000 and 27000 respectively. How should I update the query to calculate this?
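You need a window function that returns the last row of each month's partition once the rows have been ordered by date. A window function cannot read closing_balance after GROUP BY my_month has collapsed the rows, so the query below uses DISTINCT instead of GROUP BY. last_value() also needs an explicit frame, because the default frame ends at the current row. See the manual for more explanation: 3.5. Window Functions, 4.2.8. Window Function Calls, 9.22. Window Functions
Try this:
SELECT DISTINCT date_trunc('month', "my_table"."date") AS my_month
, SUM("my_table"."interest") OVER (PARTITION BY date_trunc('month', "my_table"."date")) AS "interest_sum"
, last_value("my_table"."closing_balance") OVER (PARTITION BY date_trunc('month', "my_table"."date") ORDER BY "my_table"."date" ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_closing_balance
FROM "my_table"
ORDER BY my_month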
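An alternative sketch that keeps GROUP BY, swapping the window function for an ordered aggregate: array_agg() sorted newest-first, with the first element picked out. The four sample rows are taken from the question's schedule; the year 2021 and the inline CTE are assumptions made only to keep the example self-contained.

```sql
-- Inline stand-in for the loan schedule (rows copied from the question).
WITH my_table("date", interest, closing_balance) AS (
    VALUES (DATE '2021-01-01', 100, 30000),
           (DATE '2021-01-02',  99, 29910),
           (DATE '2021-01-31',  98, 28000),
           (DATE '2021-02-28',  90, 27000)
)
SELECT date_trunc('month', "date") AS my_month,
       sum(interest) AS interest_sum,
       -- sort each month's balances newest-first and keep the first one
       (array_agg(closing_balance ORDER BY "date" DESC))[1] AS last_closing_balance
FROM my_table
GROUP BY my_month
ORDER BY my_month;
```

For these rows, January ends at 28000 and February at 27000, matching what the question asks for.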

How to get last value with condition in postgreSQL?

I have a table in postgres with three columns, one with a group, one with a date and the last with a value.
grp mydate value
A 2021-01-27 5
A 2021-01-23 10
A 2021-01-15 15
B 2021-01-26 7
B 2021-01-24 12
B 2021-01-15 17
I would like to create a view with a sequence of dates and, for each date, the most recent value in the table for each group.
grp mydate value
A 2021-01-27 5
A 2021-01-26 10
A 2021-01-25 10
A 2021-01-24 10
A 2021-01-23 10
A 2021-01-22 15
A 2021-01-21 15
A 2021-01-20 15
A 2021-01-19 15
A 2021-01-18 15
A 2021-01-17 15
A 2021-01-16 15
A 2021-01-15 15
B 2021-01-27 7
B 2021-01-26 7
B 2021-01-25 12
B 2021-01-24 12
B 2021-01-23 17
B 2021-01-22 17
B 2021-01-21 17
B 2021-01-20 17
B 2021-01-19 17
B 2021-01-18 17
B 2021-01-17 17
B 2021-01-16 17
B 2021-01-15 17
SQL code to generate the table:
CREATE TABLE foo (
grp char(1),
mydate date,
value integer);
INSERT INTO foo VALUES
('A', '2021-01-27', 5),
('A', '2021-01-23', 10),
('A', '2021-01-15', 15),
('B', '2021-01-26', 7),
('B', '2021-01-24', 12),
('B', '2021-01-15', 17);
I have so far managed to generate a visualization with the sequence of dates joined with the distinct groups, but I am failing to get the most recent value.
SELECT DISTINCT(foo.grp), (date_trunc('day'::text, dd.dd))::date AS mydate
FROM foo, generate_series((( SELECT min(foo.mydate) AS min
FROM foo))::timestamp without time zone, (now())::timestamp without time zone, '1 day'::interval) dd(dd)
step-by-step demo:db<>fiddle
SELECT
grp,
gs::date as mydate,
value
FROM (
SELECT
*,
COALESCE( -- 2
lead(mydate) OVER (PARTITION BY grp ORDER BY mydate) - 1, -- 1
mydate
) as prev_date
FROM foo
) s,
generate_series(mydate, prev_date, interval '1 day') as gs -- 3 (prev_date is never before mydate, so the step must be positive)
ORDER BY grp, mydate DESC -- 4
1. The lead() window function shifts the next value of an ordered group (= partition) into the current row. The group is already defined by grp, and the order is the date. This gives the end of the date range that each value covers. Since you don't want the same date twice (as the end of one range and the beginning of the next), the end date is reduced by 1 (one day before the next range starts).
2. This handles the very last record of each group: it has no following record, so lead() yields NULL. To avoid this, COALESCE() falls back to the record's own date.
3. Now you can create a date range from the current date to the computed end date using generate_series().
4. Finally, you can produce the required order.
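To see what steps 1 and 2 produce before generate_series() expands them, here is a minimal sketch on group A only. The column aliases `range_start` and `range_end` are names introduced for illustration; the rows come from the question's INSERT.

```sql
-- Group A rows from the question, inlined so the example runs on its own.
WITH foo(grp, mydate, value) AS (
    VALUES ('A', DATE '2021-01-27', 5),
           ('A', DATE '2021-01-23', 10),
           ('A', DATE '2021-01-15', 15)
)
SELECT grp,
       mydate AS range_start,
       COALESCE(
           -- day before the next record of the same group starts
           lead(mydate) OVER (PARTITION BY grp ORDER BY mydate) - 1,
           -- the newest record has no successor: its range is a single day
           mydate
       ) AS range_end,
       value
FROM foo
ORDER BY grp, mydate;
```

The value 15 covers 2021-01-15 through 2021-01-22, the value 10 covers 2021-01-23 through 2021-01-26, and the value 5 covers only 2021-01-27.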

PostgreSQL: first date cumulative score

I have this sample table:
id date score
11 1/1/2017 14:32 25.34
4 1/2/2017 12:14 34.34
25 1/2/2017 18:08 37.15
4 3/2/2017 23:42 47.24
4 4/2/2017 23:42 54.12
25 7/3/2017 22:07 65.21
11 9/3/2017 21:02 74.6
25 10/3/2017 5:15 11.3
4 10/3/2017 7:11 22.45
My aim is to calculate the first(!) date (YYYY-MM-DD) on which an id's cumulative score reaches 100 (>=). For that, I've written the following code:
SELECT date(date),id, score,
sum(score) over (partition by id order by date(date) rows unbounded preceding) as cumulative_score
FROM test_q1
GROUP BY id, date, score
Order by id, date
It returns:
date id score cumulative_score
1/1/2017 11 25.34 25.34
9/3/2017 11 74.6 99.94
1/2/2017 4 34.34 34.34
3/2/2017 4 47.24 81.58
4/2/2017 4 54.12 135.7
10/3/2017 4 22.45 158.15
1/2/2017 25 37.15 37.15
7/3/2017 25 65.21 102.36
10/3/2017 25 11.3 113.66
I tried to add either WHERE cumulative_score >= 100 or HAVING cumulative_score >= 100, but it returns:
ERROR: column "cumulative_score" does not exist
LINE 4: WHERE cumulative_score >= 100
^
SQL state: 42703
Character: 206
Anyone knows how to solve this?
Thanks
What I expect is:
date id score cumulative_score
4/2/2017 4 54.12 135.7
7/3/2017 25 65.21 102.36
And the output should contain just id and date.
Try this:
with cumulative_sum AS (
SELECT id,date,sum(score) over( partition by id order by date) as sum from test_q1
),
above_100_score_rank AS (
SELECT *, rank() over (partition by id order by sum) AS rank
FROM cumulative_sum where sum >= 100
)
SELECT * FROM above_100_score_rank WHERE rank= 1;
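The original error arises because window functions are evaluated after WHERE and HAVING, so their alias is not visible there; wrapping the window in a subquery makes it filterable. The sketch below returns only id and date, as the question asks. It assumes the sample timestamps are in day/month/year order, and it inlines the table as a CTE so it runs on its own.

```sql
-- Sample rows from the question, read as d/m/yyyy (an assumption).
WITH test_q1(id, "date", score) AS (
    VALUES (11, TIMESTAMP '2017-01-01 14:32', 25.34),
           (4,  TIMESTAMP '2017-02-01 12:14', 34.34),
           (25, TIMESTAMP '2017-02-01 18:08', 37.15),
           (4,  TIMESTAMP '2017-02-03 23:42', 47.24),
           (4,  TIMESTAMP '2017-02-04 23:42', 54.12),
           (25, TIMESTAMP '2017-03-07 22:07', 65.21),
           (11, TIMESTAMP '2017-03-09 21:02', 74.6),
           (25, TIMESTAMP '2017-03-10 05:15', 11.3),
           (4,  TIMESTAMP '2017-03-10 07:11', 22.45)
)
SELECT id,
       -- earliest timestamp at which the running total is at least 100
       min("date")::date AS first_date
FROM (
    SELECT id,
           "date",
           sum(score) OVER (PARTITION BY id ORDER BY "date") AS cumulative_score
    FROM test_q1
) s
WHERE cumulative_score >= 100   -- allowed here: the alias exists in the outer query
GROUP BY id
ORDER BY id;
```

With these rows, id 4 and id 25 are returned; id 11 is absent because its total stops at 99.94.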

PostgreSQL - filter function for dates

I am trying to use the built-in filter function in PostgreSQL to filter for a date range in order to sum only entries falling within this time-frame.
I cannot understand why the filter isn't being applied.
I am trying to filter for all product transactions that have a created_at date of the previous month (so in this case that were created in June 2017).
SELECT pt.created_at::date, pt.customer_id,
sum(pt.amount/100::double precision) filter (where (date_part('month', pt.created_at) =date_part('month', NOW() - interval '1 month') and
date_part('year', pt.created_at) = date_part('year', NOW()) ))
from
product_transactions pt
LEFT JOIN customers c
ON c.id= pt.customer_id
GROUP BY pt.created_at::date,pt.customer_id
Please find my expected results (sum of the amount for each day in the previous month - for each customer_id if an entry for that day exists) and the actual results I get from the query - below (using date_trunc).
Expected results:
created_at| customer_id | amount
2017-06-30 1 220.5
2017-06-28 15 34.8
2017-06-28 12 157
2017-06-28 48 105.6
2017-06-27 332 425.8
2017-06-25 1 58.0
2017-06-25 23 22.5
2017-06-21 14 88.9
2017-06-17 2 34.8
2017-06-12 87 250
2017-06-05 48 135.2
2017-06-05 12 95.7
2017-06-01 44 120
Results:
created_at| customer_id | amount
2017-06-30 1 220.5
2017-06-28 15 34.8
2017-06-28 12 157
2017-06-28 48 105.6
2017-06-27 332 425.8
2017-06-25 1 58.0
2017-06-25 23 22.5
2017-06-21 14 88.9
2017-06-17 2 34.8
2017-06-12 87 250
2017-06-05 48 135.2
2017-06-05 12 95.7
2017-06-01 44 120
2017-05-30 XX YYY
2017-05-25 XX YYY
2017-05-15 XX YYY
2017-04-30 XX YYY
2017-03-02 XX YYY
2016-11-02 XX YYY
The actual results give me the sum for all dates in the database, so no date time-frame is being applied in the query for a reason I cannot understand. I'm seeing dates that are both not for June 2017 and also from previous years.
Use date_trunc(..) function:
SELECT pt.created_at::date, pt.customer_id, c.name,
sum(pt.amount/100::double precision) filter (where date_trunc('month', pt.created_at) = date_trunc('month', NOW() - interval '1 month'))
from
product_transactions pt
LEFT JOIN customers c
ON c.id= pt.customer_id
GROUP BY pt.created_at::date, pt.customer_id, c.name
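Note that FILTER only restricts which rows are fed into the aggregate; it does not remove groups, so days outside the range still appear with a NULL sum. To drop those rows entirely, the condition has to go into WHERE. The following is a minimal sketch under stated assumptions: the three sample rows, the amounts in cents, and the `params` CTE with a fixed `ref_ts` (standing in for NOW() so the result is reproducible) are all hypothetical.

```sql
-- Hypothetical sample rows; amounts are assumed to be stored in cents.
WITH product_transactions(created_at, customer_id, amount) AS (
    VALUES (TIMESTAMP '2017-06-30 10:00', 1, 22050),
           (TIMESTAMP '2017-06-25 09:00', 1, 5800),
           (TIMESTAMP '2017-05-30 12:00', 2, 1000)
),
params(ref_ts) AS (
    -- fixed stand-in for NOW() (assumption, for reproducibility)
    VALUES (TIMESTAMP '2017-07-15 00:00')
)
SELECT pt.created_at::date AS created_day,
       pt.customer_id,
       sum(pt.amount / 100::double precision) AS amount
FROM product_transactions pt
CROSS JOIN params p
-- half-open range: first day of the previous month up to the first day of this month
WHERE pt.created_at >= date_trunc('month', p.ref_ts) - interval '1 month'
  AND pt.created_at <  date_trunc('month', p.ref_ts)
GROUP BY pt.created_at::date, pt.customer_id
ORDER BY created_day DESC, pt.customer_id;
```

Only the two June rows survive; the May row is filtered out before grouping. A range comparison on the raw column, unlike wrapping it in date_part() or date_trunc(), also leaves room for an index on created_at to be used.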

How do I add totals/subtotals to a set of results without grouping the row data?

I'm constructing a SQL query for a business report. I need to have both subtotals (grouped by file number) and grand totals on the report.
I'm entering unknown SQL territory, so this is a bit of a first attempt. The query I made is almost working. The only problem is that the entries are being grouped -- I need them separated in the report.
Here is my sample data:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3B Mar 28/10 1 3
3B Mar 28/10 5 10
When I run this query
SELECT
CASE
WHEN (GROUPING(FileNumber) = 1) THEN NULL
ELSE FileNumber
END AS FileNumber,
CASE
WHEN (GROUPING(Date) = 1) THEN NULL
ELSE Date
END AS Date,
SUM(Cost) AS Cost,
SUM(Charge) AS Charge
FROM SubtotalTesting
GROUP BY FileNumber, Date WITH ROLLUP
ORDER BY
(CASE WHEN FileNumber IS NULL THEN 1 ELSE 0 END), -- Put NULLs after data
FileNumber,
(CASE WHEN Date IS NULL THEN 1 ELSE 0 END), -- Put NULLs after data
Date
I get the following:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3 NULL 11 25
3B Mar 28/10 6 13 <--
3B NULL 6 13
NULL NULL 17 38
What I want is:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3 NULL 11 25
3B Mar 28/10 1 3 <--
3B Mar 28/10 5 10 <--
3B NULL 6 13
NULL NULL 17 38
I can clearly see why the entries are being grouped, but I have no idea how to separate them while still returning the subtotals and grand total.
I'm a bit green when it comes to doing advanced SQL queries like this, so if I'm taking the wrong approach to the problem by using WITH ROLLUP, please suggest some preferred alternatives -- you don't have to write the whole query for me, I just need some direction. Thanks!
WITH SubtotalTesting (FileNumber, Date, Cost, Charge) AS
(
SELECT '3', CAST('2009-22-12' AS DATETIME), 5, 10
UNION ALL
SELECT '3', '2010-13-01', 6, 15
UNION ALL
SELECT '3B', '2010-28-03', 1, 3
UNION ALL
SELECT '3B', '2010-28-03', 5, 10
),
q AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY filenumber) AS rn
FROM SubTotalTesting
)
SELECT rn,
CASE
WHEN (GROUPING(FileNumber) = 1) THEN NULL
ELSE FileNumber
END AS FileNumber,
CASE
WHEN (GROUPING(Date) = 1) THEN NULL
ELSE Date
END AS Date,
SUM(Cost) AS Cost,
SUM(Charge) AS Charge
FROM q
GROUP BY
FileNumber, Date, rn WITH ROLLUP
HAVING GROUPING(rn) <= GROUPING(Date)
ORDER BY
(CASE WHEN FileNumber IS NULL THEN 1 ELSE 0 END),
FileNumber,
(CASE WHEN Date IS NULL THEN 1 ELSE 0 END),
Date