Group by year, calculate sum and percentage per year - PostgreSQL

I have a table with the columns
datefield area
I want to calculate sum of area per year and a percentage column
year sum percentage
2022 5 12
2023 10 24
2024 6 15
[null] 20 49
(I have many more years in the table which I want to include)
WITH total as(
select extract(YEAR from "datefield") theyear, sum(area) as totalarea
from thetable
group by extract(YEAR from "datefield")
)
select total.theyear, total.totalarea,
totalarea/(SUM(totalarea) OVER (PARTITION BY theyear))*100
from total
I get the correct sum, but all the percentages are 100.
What am I doing wrong?
Some sample data:
2019 7.05
2020 4.77
2020 3.56
2021 1.64
2021 8.37
2021 3.51
2021 1.43
2021 9.94
2022 1.91
2022 5.3
I would like the result
2019 7.05 15
2020 8.33 18
2021 24.89 52
2022 7.21 15

WITH
total as
(
select extract(YEAR from "datefield") theyear, sum(area) as totalarea,
SUM(sum(area)) OVER() as SUM_totalarea
from thetable
group by extract(YEAR from "datefield")
)
SELECT theyear, totalarea, 100.0 * totalarea / SUM_totalarea AS PERCENTAGE
FROM total
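For reference, the query in the question partitions the window by theyear. Because the CTE already holds one row per year, each partition contains a single row, so every total is divided by itself and comes out as 100. Below is a minimal self-contained sketch of the fix using the sample data from the question. The concrete dates inside each year are assumptions (the sample only lists years), and the rounding to whole percentages is added here to match the desired output.

```sql
-- Sample rows from the question; the day and month in each date are made up.
WITH thetable(datefield, area) AS (
    VALUES (date '2019-06-01', 7.05),
           (date '2020-06-01', 4.77),
           (date '2020-07-01', 3.56),
           (date '2021-01-01', 1.64),
           (date '2021-02-01', 8.37),
           (date '2021-03-01', 3.51),
           (date '2021-04-01', 1.43),
           (date '2021-05-01', 9.94),
           (date '2022-01-01', 1.91),
           (date '2022-02-01', 5.3)
),
total AS (
    SELECT extract(year FROM datefield) AS theyear,
           sum(area) AS totalarea
    FROM thetable
    GROUP BY extract(year FROM datefield)
)
SELECT theyear,
       totalarea,
       -- empty OVER (): the window spans all years, giving the grand total
       round(100.0 * totalarea / sum(totalarea) OVER (), 0) AS percentage
FROM total
ORDER BY theyear;
```

With this sample the grand total is 47.48, so the percentages come out as 15, 18, 52 and 15, matching the desired result in the question.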

Related

Billing cycle, get a date every month (no such Feb 30)

I have a column called anchor which is a timestamp. I have a row with a value of Jan 30 2020. I want to compare this to Feb 29 2020, and it should give me 1 month, even though it's not 30 days, because Feb has no more days after the 29th. I am trying to bill every month.
Here is my sql fiddle - http://sqlfiddle.com/#!17/6906d/2
create table subscription (
id serial,
anchor timestamp
);
insert into subscription (anchor) values
('2020-01-30T00:00:00.0Z'),
('2019-01-30T00:00:00.0Z');
select id,
anchor,
AGE('2020-02-29T00:00:00.0Z', anchor) as "monthsToFeb29-2020",
AGE('2019-02-28T00:00:00.0Z', anchor) as "monthsToFeb28-2019"
from subscription;
Is it possible to get the age in the way I am describing?
My expected results:
For age from Jan 30 2020 to Feb 29 2020 I expect 1.0 month
For age from Jan 30 2020 to Feb 28 2019 I expect -11.0 months
For age from Jan 30 2019 to Feb 29 2020 I expect 13.0 months
For age from Jan 30 2019 to Feb 28 2019 I expect 1.0 month
(this is how momentjs library does it for those node/js guys out there):
const moment = require('moment');
moment('Jan 30 2019', 'MMM DD YYYY').diff(moment('Feb 29 2020', 'MMM DD YYYY'), 'months', true) === -13.0
moment('Jan 30 2019', 'MMM DD YYYY').diff(moment('Feb 28 2019', 'MMM DD YYYY'), 'months', true) === -1.0
How about:
select round(('2/29/2020'::date - '1/30/2020'::date) / 30.0);
round
-------
1
select round(('02/28/2019'::date - '1/30/2020'::date ) / 30.0);
round
-------
-11
select round(('2/29/2020'::date - '1/30/2019'::date) / 30.0);
round
-------
13
select round(('2/28/2019'::date - '01/30/2019'::date) / 30.0);
round
-------
1
The date subtraction gives you an integer number of days; you then divide by a 30-day month and round to the nearest integer. You could put this in a function and use that.
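A minimal sketch of such a function follows. The name approx_months_between and its parameter names are hypothetical, chosen only for illustration. Since it relies on a fixed 30-day month, it is an approximation that can drift over long spans.

```sql
-- Hypothetical helper: approximate month difference based on a 30-day month.
CREATE OR REPLACE FUNCTION approx_months_between(later date, earlier date)
RETURNS integer
LANGUAGE sql
IMMUTABLE
AS $$
    -- date - date yields an integer number of days
    SELECT round((later - earlier) / 30.0)::integer;
$$;

-- Applied to the subscription table from the question:
SELECT id,
       anchor,
       approx_months_between(date '2020-02-29', anchor::date) AS "monthsToFeb29-2020",
       approx_months_between(date '2019-02-28', anchor::date) AS "monthsToFeb28-2019"
FROM subscription;
```

For the two sample rows this should give 1 and -11 for the 2020 anchor, and 13 and 1 for the 2019 anchor, the same values as the individual SELECT statements above.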

Group query results by month and year in PostgreSQL with empty month sum

Based on this answer by Burak Arslan
SELECT date_trunc('month', txn_date) AS txn_month, sum(amount) as monthly_sum
FROM yourtable
GROUP BY txn_month
Is there a way to get months that have no results to show in the query?
So let's say I have:
id transDate Product Qty
1234 04/12/2019 ABCD 2
1245 04/05/2019 ABCD 1
1231 02/07/2019 ABCD 6
I also need the third month to be returned with a 0 value:
MonthYear totalQty
02/2019 6
03/2019 0
04/2019 3
Thanks,
---- UPDATE ---
Here is the final query that gets the last 24 months from the current date, with year and month ready for any charts.
Thanks to #a_horse_with_no_name
SELECT
--ONLY USE THE NEXT LINE IF YOU NEED TO HAVE THE ID IN YOUR RESULT
CASE WHEN t."ItemId" IS NULL THEN 10607 ELSE t."ItemId" END AS "ItemId",
TO_CHAR(y."transactionDate", 'yyyy-mm-dd') AS txn_month,
TO_CHAR(y."transactionDate", 'yyyy') AS "Year",
TO_CHAR(y."transactionDate", 'Mon') AS "Month",
-coalesce(SUM(t."transactionQty"),0) AS "TotalSold"
FROM generate_series(
TO_CHAR(CURRENT_DATE - INTERVAL '24 month', 'yyyy-mm-01')::date ,
TO_CHAR(CURRENT_DATE, 'yyyy-mm-01')::date,
INTERVAL '1 month') as y("transactionDate")
LEFT JOIN "ItemTransactions" AS t
ON date_trunc('month', t."transactionDate") = y."transactionDate"
AND t."ItemTransactionTypeId" = 1
AND t."ItemId" = 10607
GROUP BY txn_month, "Year", "Month", t."ItemId"
ORDER BY txn_month ASC;
EXAMPLE OUTPUT
ItemId txn_month Year Month TotalSold
10607 2018-03-01 2018 Mar 2
10607 2018-04-01 2018 Apr 0
10607 2018-05-01 2018 May 8
10607 2018-06-01 2018 Jun 12
10607 2018-07-01 2018 Jul 6
10607 2018-08-01 2018 Aug 4
10607 2018-09-01 2018 Sep 6
10607 2018-10-01 2018 Oct 8
10607 2018-11-01 2018 Nov 4
10607 2018-12-01 2018 Dec 0
10607 2019-01-01 2019 Jan 2
10607 2019-02-01 2019 Feb 3
10607 2019-03-01 2019 Mar 4
10607 2019-04-01 2019 Apr 1
10607 2019-05-01 2019 May 4
10607 2019-06-01 2019 Jun 3
10607 2019-07-01 2019 Jul 5
10607 2019-08-01 2019 Aug 6
10607 2019-09-01 2019 Sep 6
10607 2019-10-01 2019 Oct 6
10607 2019-11-01 2019 Nov 3
10607 2019-12-01 2019 Dec 0
10607 2020-01-01 2020 Jan 4
10607 2020-02-01 2020 Feb 2
10607 2020-03-01 2020 Mar 0
Left join to a list of months:
SELECT t.txn_month,
coalesce(sum(yt.amount),0) as monthly_sum
FROM generate_series(date '2019-02-01', date '2019-04-01', interval '1 month') as t(txn_month)
left join yourtable yt on date_trunc('month', yt.transdate) = t.txn_month
GROUP BY t.txn_month
In your actual query you need to move the conditions from the WHERE clause to the JOIN condition. Putting them into the WHERE clause turns the outer join back into an inner join:
SELECT t."ItemId",
y."transactionDate" AS txn_month,
-coalesce(SUM(t."transactionQty"),0) AS "TotalSold"
FROM generate_series(date '2018-01-01', date '2020-04-01', INTERVAL '1 month') as y("transactionDate")
LEFT JOIN "ItemTransactions" AS t
ON date_trunc('month', t."transactionDate") = y."transactionDate"
AND t."ItemTransactionTypeId" = 1
AND t."ItemId" = 10606
-- this WHERE clause isn't really needed because of the date values provided to generate_series()
WHERE y."transactionDate" >= NOW() - INTERVAL '2 year'
GROUP BY txn_month, t."ItemId"
ORDER BY txn_month DESC;
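The effect of placing a filter in ON versus WHERE can be seen in a small self-contained sketch built from the three sample rows in the question. The column names transdate, product and qty are assumed spellings of the question's columns, and the dates are read as month/day/year, which is what the expected output implies.

```sql
WITH yourtable(transdate, product, qty) AS (
    VALUES (date '2019-04-12', 'ABCD', 2),
           (date '2019-04-05', 'ABCD', 1),
           (date '2019-02-07', 'ABCD', 6)
),
months AS (
    SELECT m::date AS txn_month
    FROM generate_series(date '2019-02-01', date '2019-04-01', interval '1 month') AS g(m)
)
SELECT to_char(months.txn_month, 'MM/YYYY') AS month_year,
       coalesce(sum(yt.qty), 0) AS total_qty
FROM months
LEFT JOIN yourtable AS yt
       ON date_trunc('month', yt.transdate) = months.txn_month
      AND yt.product = 'ABCD'    -- filter in ON: March is kept and shows 0
-- WHERE yt.product = 'ABCD'     -- filter in WHERE instead: the March row disappears
GROUP BY months.txn_month
ORDER BY months.txn_month;
```

As written this returns 02/2019 with 6, 03/2019 with 0 and 04/2019 with 3. Moving the product condition to the commented-out WHERE line drops 03/2019, because yt.product is NULL for the unmatched month and the comparison filters that row out.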

PostgreSQL - How can I SUM until a certain hour of the day?

I'm trying to create a metric for a PostgreSQL integrated dashboard which would show today's "Total Payment Value" (TPV) of a certain product, as well as yesterday's TPV of the same product, up until the same moment as today, so if I'm accessing the dashboard at 5 pm, it will show what it was yesterday until 5 pm and today's TPV.
edit: My question wasn't very clear so I'm adding a few more lines and editing the query, which had a mistake.
I tried this:
select
sum(case when table.product in (13,14,15,16) then amount else 0 end) as "TPV"
,date_trunc('day', table.date) as "Day"
from table
where
date > current_date - 1
group by date_trunc('day', table.date)
order by 2,1
I only want to sum the amount when product = 13, 14, 15 or 16
An example of the product, date and amount would be like this:
product amount date
8 4750 19/03/2019 00:21
14 7840 12/04/2019 22:40
14 15000 22/03/2019 18:27
14 11715 19/03/2019 00:12
14 1054 22/03/2019 18:22
14 18491 17/03/2019 14:28
14 12253 17/03/2019 14:30
14 27600 17/03/2019 14:32
14 3936 17/03/2019 14:28
14 19007 19/03/2019 00:14
8 9400 19/03/2019 00:21
8 4750 19/03/2019 00:21
8 25000 19/03/2019 00:17
14 10346 22/03/2019 18:23
I would like to have a metric that always calculates the sum of the product value today up until the current moment - when the "product" corresponds to values 13, 14, 15 or 16 - as well as the same metric for yesterday, e.g., it's 1 PM now, I want today's TPV until 1 PM and yesterday's TPV until 1 PM as well!
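One possible way to express this is conditional aggregation over two time windows, sketched below under several assumptions. The table name payments and column name paid_at are hypothetical stand-ins for the question's table and date column. A fixed reference timestamp replaces now() so the example is reproducible. The rows dated 22/03/2019 come from the sample above, while the two rows dated 21/03/2019 are invented purely for illustration.

```sql
WITH params AS (
    -- fixed "current moment" for the example; a live dashboard would use now()
    SELECT timestamp '2019-03-22 18:25' AS ts
),
payments(product, amount, paid_at) AS (
    VALUES (14, 15000, timestamp '2019-03-22 18:27'),
           (14,  1054, timestamp '2019-03-22 18:22'),
           (14, 10346, timestamp '2019-03-22 18:23'),
           ( 8,  4750, timestamp '2019-03-19 00:21'),
           (14,  2000, timestamp '2019-03-21 17:00'),  -- hypothetical row
           (14,  3000, timestamp '2019-03-21 19:00')   -- hypothetical row
)
SELECT
    -- rows from today's midnight onward count toward today
    coalesce(sum(p.amount) FILTER (WHERE p.paid_at >= date_trunc('day', params.ts)), 0) AS tpv_today,
    -- the remaining rows that passed the WHERE clause belong to yesterday
    coalesce(sum(p.amount) FILTER (WHERE p.paid_at <  date_trunc('day', params.ts)), 0) AS tpv_yesterday
FROM payments AS p
CROSS JOIN params
WHERE p.product IN (13, 14, 15, 16)
  AND (   p.paid_at BETWEEN date_trunc('day', params.ts) AND params.ts
       OR p.paid_at BETWEEN date_trunc('day', params.ts) - interval '1 day'
                        AND params.ts - interval '1 day');
```

With the reference time 18:25, tpv_today is 11400 (the 18:27 payment is still in the future) and tpv_yesterday is 2000 (the 19:00 payment is after the cutoff). The FILTER clause requires PostgreSQL 9.4 or later; on older versions a CASE expression inside sum() does the same job.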

How to sort attendance date along with the month?

Attendance is sorted by date, which is fine, but I also want the order to take the month into account: January should come at the bottom and December at the top.
Table
Attendance Date
---------------
26 Feb 2018
19 Dec 2018
18 Dec 2018
14 Dec 2018
12 June 2018
7 Dec 2018
5 Feb 2018
Query
select distinct
(select ARRAY_TO_STRING(ARRAY_AGG(ARRAY[to_char(t1.l_time,'HH12:mi AM')]::text), ',')
from
(select (al1.create_time AT TIME ZONE 'UTC+5:30')::time as l_time
from users.access_log as al1
where al1.user_id = al.user_id
and al1.login_status = 1
and al1.create_time::date = al.create_time::date
order by al1.create_time::time ASC
) as t1
) as login_time,
(select ARRAY_TO_STRING(ARRAY_AGG(ARRAY[to_char(t2.o_time,'HH:mi AM')]::text), ',')
from
(select (al2.create_time AT TIME ZONE 'UTC+5:30')::time as o_time
from users.access_log as al2
where al2.user_id = al.user_id
and al2.login_status = 0
and al2.create_time::date = al.create_time::date
order by al2.create_time::time ASC
) as t2
) as logout_time,
al.create_time::date
from users.access_log as al
where al.user_id = ?;

PostgreSQL - GROUP subsequent rows

I have a table which contains some records ordered by date.
I want to get the start and end dates for each consecutive group (grouped by some criterion, e.g. position).
Example:
create table tbl (id int, date timestamp without time zone,
position int);
insert into tbl values
( 1 , '2013-12-01', 1),
( 2 , '2013-12-02', 2),
( 3 , '2013-12-03', 2),
( 4 , '2013-12-04', 2),
( 5 , '2013-12-05', 3),
( 6 , '2013-12-06', 3),
( 7 , '2013-12-07', 2),
( 8 , '2013-12-08', 2);
Of course, if I simply group by position I get the wrong result, because the same position can appear in different groups:
SELECT POSITION, min(date) MIN, max(date) MAX
FROM tbl GROUP BY POSITION
I will get:
POSITION MIN MAX
1 December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
3 December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2 December, 02 2013 00:00:00+0000 December, 08 2013 00:00:00+0000
But I want:
POSITION MIN MAX
1 December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
2 December, 02 2013 00:00:00+0000 December, 04 2013 00:00:00+0000
3 December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2 December, 07 2013 00:00:00+0000 December, 08 2013 00:00:00+0000
I found a solution for MySQL which uses variables, and I could port it, but I believe PostgreSQL can do this in a smarter way using its advanced features such as window functions.
I'm using PostgreSQL 9.2
There is probably a more elegant solution, but try this:
WITH tmp_tbl AS (
SELECT *,
-- 1 marks the first row of each run of equal positions, 0 continues the run
CASE WHEN lag(position,1) OVER(ORDER BY id)=position
THEN 0
ELSE 1
END AS is_new_group
FROM tbl
)
, tmp_tbl2 AS(
SELECT position,date,
-- the running total of the markers gives every run its own number
SUM(is_new_group) OVER(ORDER BY id) AS grouping_col
FROM tmp_tbl
)
SELECT POSITION, min(date) MIN, max(date) MAX
FROM tmp_tbl2 GROUP BY grouping_col,position
ORDER BY min(date)
There are some complete answers on Stack Overflow for that, so I won't repeat them in detail, but the principle is to group the records according to the difference between:
The row number when ordered by the date (via a window function)
The difference between the dates and a static date of reference.
So you have a series such as:
rownum  datediff  diff
1       1         0     ^
2       2         0     |  first group
3       3         0     v
4       5         1     ^
5       6         1     |  second group
6       7         1     v
7       9         2     ^
8       10        2     v  third group
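Applied to the table from the question, the principle could look like the sketch below. Two things here are assumptions rather than part of the answer: the row number is computed per position (PARTITION BY position) so that a change of position starts a new group, and the data is assumed to contain one row per day with no missing days. The reference date 2013-12-01 is arbitrary.

```sql
WITH tbl(id, date, position) AS (
    VALUES (1, timestamp '2013-12-01', 1),
           (2, timestamp '2013-12-02', 2),
           (3, timestamp '2013-12-03', 2),
           (4, timestamp '2013-12-04', 2),
           (5, timestamp '2013-12-05', 3),
           (6, timestamp '2013-12-06', 3),
           (7, timestamp '2013-12-07', 2),
           (8, timestamp '2013-12-08', 2)
),
numbered AS (
    SELECT t.position,
           t.date,
           -- days since the reference date minus the row number within the position:
           -- constant inside a run of consecutive days, different between runs
           (t.date::date - date '2013-12-01')
             - row_number() OVER (PARTITION BY t.position ORDER BY t.date) AS grp
    FROM tbl AS t
)
SELECT position, min(date) AS min, max(date) AS max
FROM numbered
GROUP BY position, grp
ORDER BY min(date);
```

For position 2 the day offsets are 1, 2, 3, 6, 7 and the row numbers 1 to 5, so grp is 0, 0, 0, 1, 1, which splits the rows into the two runs (Dec 02 to Dec 04 and Dec 07 to Dec 08) that the question asks for.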