The attendance is sorted by date, which is fine, but I want the sort to take the month into account as well: January should come at the bottom and December at the top.
Table
Attendance Date
---------------
26 Feb 2018
19 Dec 2018
18 Dec 2018
14 Dec 2018
12 June 2018
7 Dec 2018
5 Feb 2018
Query
select distinct
(select ARRAY_TO_STRING(ARRAY_AGG(ARRAY[to_char(t1.l_time,'HH12:mi AM')]::text), ',')
from
(select (al1.create_time AT TIME ZONE 'UTC+5:30')::time as l_time
from users.access_log as al1
where al1.user_id = al.user_id
and al1.login_status = 1
and al1.create_time::date = al.create_time::date
order by al1.create_time::time ASC
) as t1
) as login_time,
(select ARRAY_TO_STRING(ARRAY_AGG(ARRAY[to_char(t2.o_time,'HH:mi AM')]::text), ',')
from
(select (al2.create_time AT TIME ZONE 'UTC+5:30')::time as o_time
from users.access_log as al2
where al2.user_id = al.user_id
and al2.login_status = 0
and al2.create_time::date = al.create_time::date
order by al2.create_time::time ASC
) as t2
) as logout_time,
al.create_time::date
from users.access_log as al
where al.user_id = ?;
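One way to read the requirement is that the rows should be ordered by the real date value, newest first, rather than by the formatted text. A minimal sketch under that assumption follows; the user id 42 and the alias attendance_date are made up for illustration, and the two time subqueries are left out for brevity. Because the query uses SELECT DISTINCT, the ORDER BY expression has to appear in the select list, which is why the sketch sorts by the output alias.

```sql
-- Variant 1: plain chronological order, newest first
-- (19 Dec 2018 at the top, 5 Feb 2018 at the bottom).
SELECT DISTINCT al.create_time::date AS attendance_date
FROM users.access_log AS al
WHERE al.user_id = 42                  -- hypothetical user id
ORDER BY attendance_date DESC;

-- Variant 2: month first, regardless of year (December before January),
-- then newest first inside each month. The DISTINCT is moved into a
-- subquery so the outer ORDER BY may use an arbitrary expression.
SELECT attendance_date
FROM (
    SELECT DISTINCT al.create_time::date AS attendance_date
    FROM users.access_log AS al
    WHERE al.user_id = 42              -- hypothetical user id
) AS d
ORDER BY extract(month FROM attendance_date) DESC,
         attendance_date DESC;
```

If all rows fall in the same year, both variants give the same order.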
Related
I have a table with the columns
datefield area
I want to calculate sum of area per year and a percentage column
year sum percentage
2022 5 12
2023 10 24
2024 6 15
[null] 20 49
(I have many more years in the table which I want to include)
WITH total as(
select extract(YEAR from "datefield") theyear, sum(area) as totalarea
from thetable
group by extract(YEAR from "datefield")
)
select total.theyear, total.totalarea,
totalarea/(SUM(totalarea) OVER (PARTITION BY theyear))*100
from total
I get the correct sum, but all the percentages are 100.
What am I doing wrong?
Some sample data:
2019 7.05
2020 4.77
2020 3.56
2021 1.64
2021 8.37
2021 3.51
2021 1.43
2021 9.94
2022 1.91
2022 5.3
I would like the result
2019 7.05 15
2020 8.33 18
2021 24.89 52
2022 7.21 15
WITH
total as
(
select extract(YEAR from "datefield") theyear, sum(area) as totalarea,
SUM(sum(area)) OVER() as SUM_totalarea
from thetable
group by extract(YEAR from "datefield")
)
SELECT theyear, totalarea, 100.0 * totalarea / SUM_totalarea AS PERCENTAGE
FROM total
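The reason the original attempt returns 100 everywhere is most likely the PARTITION BY theyear: after the GROUP BY each year is a single row, so every partition holds exactly one row and each total is divided by itself. An empty OVER () makes the window span all years. Below is a self-contained sketch using the sample data from the question; the exact dates are invented (only the years were given), and round() is added to match the whole-number percentages in the desired result.

```sql
WITH thetable(datefield, area) AS (
    -- sample rows from the question; day and month are made up
    VALUES (date '2019-06-01', 7.05),
           (date '2020-06-01', 4.77),
           (date '2020-07-01', 3.56),
           (date '2021-06-01', 1.64),
           (date '2021-07-01', 8.37),
           (date '2021-08-01', 3.51),
           (date '2021-09-01', 1.43),
           (date '2021-10-01', 9.94),
           (date '2022-06-01', 1.91),
           (date '2022-07-01', 5.3)
),
total AS (
    SELECT extract(year FROM datefield) AS theyear,
           sum(area)                    AS totalarea
    FROM thetable
    GROUP BY 1
)
SELECT theyear,
       totalarea,
       -- OVER () = one window covering every year, i.e. the grand total
       round(100.0 * totalarea / sum(totalarea) OVER ()) AS percentage
FROM total
ORDER BY theyear;
```

With this data the grand total is 47.48, so the percentages come out as 15, 18, 52 and 15, matching the desired result.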
I have a column called anchor, which is a timestamp. One row has the value Jan 30 2020. I want to compare this to Feb 29 2020 and get 1 month, even though it's not 30 days, because February has no more days after the 29th. I am trying to bill every month.
Here is my sql fiddle - http://sqlfiddle.com/#!17/6906d/2
create table subscription (
id serial,
anchor timestamp
);
insert into subscription (anchor) values
('2020-01-30T00:00:00.0Z'),
('2019-01-30T00:00:00.0Z');
select id,
anchor,
AGE('2020-02-29T00:00:00.0Z', anchor) as "monthsToFeb29-2020",
AGE('2019-02-28T00:00:00.0Z', anchor) as "monthsToFeb28-2019"
from subscription;
Is it possible to get the age in the way I am describing?
My expected results:
For the age from Jan 30 2020 to Feb 29 2020 I expect 1.0 month
For the age from Jan 30 2020 to Feb 28 2019 I expect -11.0 months
For the age from Jan 30 2019 to Feb 29 2020 I expect 13.0 months
For the age from Jan 30 2019 to Feb 28 2019 I expect 1.0 month
(this is how momentjs library does it for those node/js guys out there):
const moment = require('moment');
moment('Jan 30 2019', 'MMM DD YYYY').diff(moment('Feb 29 2020', 'MMM DD YYYY'), 'months', true) === -13.0
moment('Jan 30 2019', 'MMM DD YYYY').diff(moment('Feb 28 2019', 'MMM DD YYYY'), 'months', true) === -1.0
How about:
select round(('2/29/2020'::date - '1/30/2020'::date) / 30.0);
round
-------
1
select round(('02/28/2019'::date - '1/30/2020'::date ) / 30.0);
round
-------
-11
select round(('2/29/2020'::date - '1/30/2019'::date) / 30.0);
round
-------
13
select round(('2/28/2019'::date - '01/30/2019'::date) / 30.0);
round
-------
1
The date subtraction gives you an integer number of days, which you then divide by a 30-day month and round to the nearest integer. You could put this in a function and use that.
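A possible way to wrap this in a function is sketched here. The function name approx_months_between and its parameter names are invented for illustration. Keep in mind that a fixed 30-day month is only an approximation: it reproduces the four expected values from the question, but it drifts over long spans (five years is 1826 days, which rounds to 61 rather than 60).

```sql
-- Hypothetical helper: approximate month difference using 30-day months.
CREATE OR REPLACE FUNCTION approx_months_between(later date, earlier date)
RETURNS integer
LANGUAGE sql
IMMUTABLE
AS $$
    -- date - date yields an integer number of days
    SELECT round((later - earlier) / 30.0)::integer;
$$;

-- Applied to the subscription table from the question
-- (anchor is a timestamp, so it is cast to date first).
SELECT id,
       anchor,
       approx_months_between(date '2020-02-29', anchor::date) AS "monthsToFeb29-2020",
       approx_months_between(date '2019-02-28', anchor::date) AS "monthsToFeb28-2019"
FROM subscription;
```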
Based on this answer by Burak Arslan
SELECT date_trunc('month', txn_date) AS txn_month, sum(amount) as monthly_sum
FROM yourtable
GROUP BY txn_month
Is there a way to get months that have no results to show in the query?
So let's say I have :
id transDate Product Qty
1234 04/12/2019 ABCD 2
1245 04/05/2019 ABCD 1
1231 02/07/2019 ABCD 6
I also need the third month (03/2019) to be returned with a value of 0:
MonthYear totalQty
02/2019 6
03/2019 0
04/2019 3
Thanks,
---- UPDATE ---
Here is the final query, which gets the last 24 months from the current date, with year and month ready for any charts.
Thanks to @a_horse_with_no_name
SELECT
--ONLY USE THE NEXT LINE IF YOU NEED TO HAVE THE ID IN YOUR RESULT
CASE WHEN t."ItemId" IS NULL THEN 10607 ELSE t."ItemId" END AS "ItemId",
TO_CHAR(y."transactionDate", 'yyyy-mm-dd') AS txn_month,
TO_CHAR(y."transactionDate", 'yyyy') AS "Year",
TO_CHAR(y."transactionDate", 'Mon') AS "Month",
-coalesce(SUM(t."transactionQty"),0) AS "TotalSold"
FROM generate_series(
TO_CHAR(CURRENT_DATE - INTERVAL '24 month', 'yyyy-mm-01')::date ,
TO_CHAR(CURRENT_DATE, 'yyyy-mm-01')::date,
INTERVAL '1 month') as y("transactionDate")
LEFT JOIN "ItemTransactions" AS t
ON date_trunc('month', t."transactionDate") = y."transactionDate"
AND t."ItemTransactionTypeId" = 1
AND t."ItemId" = 10607
GROUP BY txn_month, "Year", "Month", t."ItemId"
ORDER BY txn_month ASC;
EXAMPLE OUTPUT
ItemId txn_month Year Month TotalSold
10607 2018-03-01 2018 Mar 2
10607 2018-04-01 2018 Apr 0
10607 2018-05-01 2018 May 8
10607 2018-06-01 2018 Jun 12
10607 2018-07-01 2018 Jul 6
10607 2018-08-01 2018 Aug 4
10607 2018-09-01 2018 Sep 6
10607 2018-10-01 2018 Oct 8
10607 2018-11-01 2018 Nov 4
10607 2018-12-01 2018 Dec 0
10607 2019-01-01 2019 Jan 2
10607 2019-02-01 2019 Feb 3
10607 2019-03-01 2019 Mar 4
10607 2019-04-01 2019 Apr 1
10607 2019-05-01 2019 May 4
10607 2019-06-01 2019 Jun 3
10607 2019-07-01 2019 Jul 5
10607 2019-08-01 2019 Aug 6
10607 2019-09-01 2019 Sep 6
10607 2019-10-01 2019 Oct 6
10607 2019-11-01 2019 Nov 3
10607 2019-12-01 2019 Dec 0
10607 2020-01-01 2020 Jan 4
10607 2020-02-01 2020 Feb 2
10607 2020-03-01 2020 Mar 0
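As a side note, the two TO_CHAR(..., 'yyyy-mm-01')::date expressions that build the series bounds can probably be replaced by date_trunc('month', ...), which avoids the round trip through text. A minimal sketch of just the month series, with the alias g(m) chosen for illustration:

```sql
-- First day of each month, from 24 months ago up to the current month
-- (25 rows in total).
SELECT m::date            AS txn_month,
       to_char(m, 'yyyy') AS "Year",
       to_char(m, 'Mon')  AS "Month"
FROM generate_series(
         date_trunc('month', current_date) - interval '24 months',
         date_trunc('month', current_date),
         interval '1 month'
     ) AS g(m)
ORDER BY txn_month;
```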
Left join to a list of months:
SELECT t.txn_month,
coalesce(sum(yt.amount),0) as monthly_sum
FROM generate_series(date '2019-02-01', date '2019-04-01', interval '1 month') as t(txn_month)
left join yourtable yt on date_trunc('month', yt.transdate) = t.txn_month
GROUP BY t.txn_month
Online example
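To check the idea against the three sample rows from the question, here is a self-contained sketch. It assumes the transDate values are in month/day/year order (which is what the expected output implies) and uses the Qty column in place of amount; the inline yourtable CTE is only a stand-in for the real table.

```sql
WITH yourtable(id, transdate, product, qty) AS (
    VALUES (1234, date '2019-04-12', 'ABCD', 2),
           (1245, date '2019-04-05', 'ABCD', 1),
           (1231, date '2019-02-07', 'ABCD', 6)
)
SELECT to_char(m.txn_month, 'MM/YYYY') AS "MonthYear",
       coalesce(sum(yt.qty), 0)        AS "totalQty"
FROM generate_series(date '2019-02-01', date '2019-04-01', interval '1 month') AS m(txn_month)
LEFT JOIN yourtable AS yt
       ON date_trunc('month', yt.transdate) = m.txn_month
GROUP BY m.txn_month
ORDER BY m.txn_month;
```

This should return 02/2019 with 6, 03/2019 with 0 and 04/2019 with 3.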
In your actual query you need to move the conditions from the WHERE clause to the JOIN condition. Putting them into the WHERE clause turns the outer join back into an inner join:
SELECT t."ItemId",
y."transactionDate" AS txn_month,
-coalesce(SUM(t."transactionQty"),0) AS "TotalSold"
FROM generate_series(date '2018-01-01', date '2020-04-01', INTERVAL '1 month') as y("transactionDate")
LEFT JOIN "ItemTransactions" AS t
ON date_trunc('month', t."transactionDate") = y."transactionDate"
AND t."ItemTransactionTypeId" = 1
AND t."ItemId" = 10606
-- this WHERE clause isn't really needed because of the date values provided to generate_series()
WHERE y."transactionDate" >= NOW() - INTERVAL '2 year'
GROUP BY txn_month, t."ItemId"
ORDER BY txn_month DESC;
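The effect of filtering in ON versus WHERE can be seen on a tiny example. The CTE names months and sales and their columns are made up for this sketch.

```sql
-- Filter in the JOIN condition: both months survive, March shows 0.
WITH months(m) AS (
    VALUES (date '2019-02-01'), (date '2019-03-01')
),
sales(sale_month, item_id, qty) AS (
    VALUES (date '2019-02-01', 1, 6)
)
SELECT months.m, coalesce(sum(s.qty), 0) AS total
FROM months
LEFT JOIN sales AS s
       ON s.sale_month = months.m
      AND s.item_id = 1
GROUP BY months.m
ORDER BY months.m;

-- Same filter in WHERE: the March row has s.item_id = NULL after the
-- outer join, fails the condition, and disappears from the result.
WITH months(m) AS (
    VALUES (date '2019-02-01'), (date '2019-03-01')
),
sales(sale_month, item_id, qty) AS (
    VALUES (date '2019-02-01', 1, 6)
)
SELECT months.m, coalesce(sum(s.qty), 0) AS total
FROM months
LEFT JOIN sales AS s
       ON s.sale_month = months.m
WHERE s.item_id = 1
GROUP BY months.m
ORDER BY months.m;
```

The first query returns two rows (February with 6, March with 0); the second returns only February.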
I've been trying to calculate 7 Day Return Rate (also known as Classic Retention Rate, as described here: https://www.braze.com/blog/calculate-retention-rate/) and then taking a 30 day average to reduce noise in Postgresql.
However, I'm sure I'm doing something wrong. First of all, the numbers look waaay higher than intuitively I feel they should be (generally around 5% for the rest of the sector). Also, I believe the first 7 days should show 0, as theoretically users should take at least 7 days to count as a "return". However, I get around 40-70%, as shown below.
Would someone mind taking a look at the code below and seeing if there are any errors? 7 Day Return Rate is a really common metric for apps, and I haven't found any questions using postgresql that calculate it to this level of sophistication on Stack Exchange (or even the rest of the web), so I feel like a solid response could be very useful to a lot of people.
Sample data
Wednesday, August 1, 2018 12:00 AM 71.14
Thursday, August 2, 2018 12:00 AM 55.44
Friday, August 3, 2018 12:00 AM 50.09
Saturday, August 4, 2018 12:00 AM 45.81
Sunday, August 5, 2018 12:00 AM 43.27
Monday, August 6, 2018 12:00 AM 40.61
Tuesday, August 7, 2018 12:00 AM 39.38
Wednesday, August 8, 2018 12:00 AM 38.46
Thursday, August 9, 2018 12:00 AM 36.81
Friday, August 10, 2018 12:00 AM 35.94
with
user_first_event as (
select distinct id, min(timestamp)::date as first_event_date
from log
where
timestamp <= current_date
and timestamp >= {{start_date}} and timestamp <= {{end_date}}
group by id),
event as (
select distinct id, timestamp::date as user_event_date
from log
where timestamp <= current_date and timestamp >= {{start_date}}),
gap as (
select
user_first_event.id,
user_first_event.first_event_date,
event.user_event_date,
event.user_event_date - user_first_event.first_event_date as days_since_signup
from user_first_event
join event on user_first_event.id = event.id
where user_first_event.first_event_date <= event.user_event_date),
conversion_rate as (
select
first_event_date,
(sum(case when days_since_signup = 7 then 1 else 0 end) * 100.0 /
count(distinct id)
) as seven_day_retention_rate
from gap
group by first_event_date
)
SELECT first_event_date,
AVG(seven_day_retention_rate)
OVER(ORDER BY first_event_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_avg_retention_rate
FROM conversion_rate
The problem is a bit easier than your query makes it seem. You can actually do it with just one subquery and one outer query, as follows:
select first_event_date
, avg(seven_day_return) as seven_day_return_day_only
, avg( avg(seven_day_return) ) OVER(ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW ) AS thirty_day_rolling_retention
from (
--inner query to get value for user, 1 if they retain and 0 if they do not
select min(timestamp)::date as first_event_date
, case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
from log
group by id ) t
group by t.first_event_date;
Note that this weights each day equally rather than each user equally across days. If you want to weight the average by user across days then you can update the outer calculation using more aggregates and windows to compute the value with weightings.
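A rough sketch of the user-weighted variant: instead of averaging the daily averages, divide the windowed number of returning users by the windowed number of new users. The inline log CTE and its event_ts column are invented stand-ins for the real log table and its timestamp column, and the output column names are also just illustrative.

```sql
WITH log(id, event_ts) AS (
    VALUES (1, timestamp '2018-08-01 09:00'),
           (1, timestamp '2018-08-08 10:00'),   -- user 1 returns on day 7
           (2, timestamp '2018-08-01 11:00'),
           (3, timestamp '2018-08-02 09:00'),
           (3, timestamp '2018-08-09 09:30'),   -- user 3 returns on day 7
           (4, timestamp '2018-08-02 12:00'),
           (5, timestamp '2018-08-02 13:00'),
           (6, timestamp '2018-08-02 14:00')
)
SELECT first_event_date,
       -- every day counts the same, however many users signed up
       avg(avg(seven_day_return)) OVER w AS day_weighted,
       -- every user counts the same: returning users / new users in the window
       sum(sum(seven_day_return)) OVER w
         / sum(count(*)) OVER w          AS user_weighted
FROM (
    -- one row per user: 1 if they were seen exactly 7 days after their first event
    SELECT min(event_ts)::date AS first_event_date,
           CASE WHEN array_agg(event_ts::date) @> ARRAY[min(event_ts)::date + 7]
                THEN 1 ELSE 0 END AS seven_day_return
    FROM log
    GROUP BY id
) AS t
GROUP BY first_event_date
WINDOW w AS (ORDER BY first_event_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
ORDER BY first_event_date;
```

With this toy data, the 2018-08-02 row comes out at 0.375 for day_weighted (the mean of 0.5 and 0.25) but about 0.333 for user_weighted (2 returning users out of 6), which shows how the two weightings diverge when daily sign-up counts differ.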
Reference: http://sqlfiddle.com/#!17/ee17e/1/0
If you don't have access to array_agg (but have access to window functions) you can use:
select first_event_date
, avg(seven_day_return) as day_seven_day_return
, avg( avg(seven_day_return) ) OVER(ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW ) AS thirty_day_rolling_retention
from (
--inner query to get value for user
select min(timestamp)::date as first_event_date
, case when exists(select 1 from log l2 where l2.id = log.id and l2.timestamp::date = min(log.timestamp)::date + 7) then 1 else 0 end as seven_day_return
from log
group by id ) t
group by t.first_event_date;
I have a table which contains some records ordered by date.
And I want to get the start and end dates for each subsequent group (grouped by some criterion, e.g. position).
Example:
create table tbl (id int, date timestamp without time zone,
position int);
insert into tbl values
( 1 , '2013-12-01', 1),
( 2 , '2013-12-02', 2),
( 3 , '2013-12-03', 2),
( 4 , '2013-12-04', 2),
( 5 , '2013-12-05', 3),
( 6 , '2013-12-06', 3),
( 7 , '2013-12-07', 2),
( 8 , '2013-12-08', 2);
Of course if I simply group by position I will get wrong result as positions could be the same for different groups:
SELECT POSITION, min(date) MIN, max(date) MAX
FROM tbl GROUP BY POSITION
I will get:
POSITION MIN MAX
1 December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
3 December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2 December, 02 2013 00:00:00+0000 December, 08 2013 00:00:00+0000
But I want:
POSITION MIN MAX
1 December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
2 December, 02 2013 00:00:00+0000 December, 04 2013 00:00:00+0000
3 December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2 December, 07 2013 00:00:00+0000 December, 08 2013 00:00:00+0000
I found a solution for MySQL which uses variables, and I could port it, but I believe PostgreSQL can do it in a smarter way using its advanced features, such as window functions.
I'm using PostgreSQL 9.2
There is probably a more elegant solution, but try this:
WITH tmp_tbl AS (
SELECT *,
CASE WHEN lag(position,1) OVER(ORDER BY id)=position
THEN position
ELSE ROW_NUMBER() OVER(ORDER BY id)
END AS grouping_col
FROM tbl
)
, tmp_tbl2 AS(
SELECT position,date,
CASE WHEN lag(position,1)OVER(ORDER BY id)=position
THEN lag(grouping_col,1) OVER(ORDER BY id)
ELSE ROW_NUMBER() OVER(ORDER BY id)
END AS grouping_col
FROM tmp_tbl
)
SELECT POSITION, min(date) MIN, max(date) MAX
FROM tmp_tbl2 GROUP BY grouping_col,position
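A somewhat shorter variant of the same lag() idea is to flag each row where the position changes and then take a running sum of the flags, so every run gets its own group number. This also avoids a weak spot in the query above: lag(grouping_col) only looks one row back, so it appears that a run of three or more rows can be split unless the position happens to equal the row number where the run starts (as it does in the sample data). The sketch below inlines the sample rows as a CTE named tbl so it can run on its own; the names is_new_group and grp are made up.

```sql
WITH tbl(id, date, position) AS (
    VALUES (1, timestamp '2013-12-01', 1),
           (2, timestamp '2013-12-02', 2),
           (3, timestamp '2013-12-03', 2),
           (4, timestamp '2013-12-04', 2),
           (5, timestamp '2013-12-05', 3),
           (6, timestamp '2013-12-06', 3),
           (7, timestamp '2013-12-07', 2),
           (8, timestamp '2013-12-08', 2)
)
SELECT position, min(date) AS min, max(date) AS max
FROM (
    -- running sum of the flags: 1,2,2,2,3,3,4,4 for the sample rows
    SELECT position, date,
           sum(is_new_group) OVER (ORDER BY id) AS grp
    FROM (
        -- 1 when the position differs from the previous row (or there is none)
        SELECT id, position, date,
               CASE WHEN lag(position) OVER (ORDER BY id) = position
                    THEN 0 ELSE 1 END AS is_new_group
        FROM tbl
    ) AS flagged
) AS numbered
GROUP BY grp, position
ORDER BY min(date);
```

Both window functions used here are available in PostgreSQL 9.2.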
There are some complete answers on Stack Overflow for that, so I won't repeat them in detail, but the principle is to group the records according to the difference between two values:
The row number when ordered by the date (via a window function)
The difference between the dates and a static date of reference.
So you have a series such as:
rownum datediff diff
1 1 0 ^
2 2 0 | first group
3 3 0 v
4 5 1 ^
5 6 1 | second group
6 7 1 v
7 9 2 ^
8 10 2 v third group
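Applied to the table from the question, the principle could look roughly like the following. This is only a sketch: it numbers the rows per position, uses 2013-12-01 as an arbitrary reference date, and assumes there is exactly one row per day inside each run (a missing day would split a run in two). The sample rows are inlined as a CTE named tbl so the query is self-contained; grp is a made-up name.

```sql
WITH tbl(id, date, position) AS (
    VALUES (1, timestamp '2013-12-01', 1),
           (2, timestamp '2013-12-02', 2),
           (3, timestamp '2013-12-03', 2),
           (4, timestamp '2013-12-04', 2),
           (5, timestamp '2013-12-05', 3),
           (6, timestamp '2013-12-06', 3),
           (7, timestamp '2013-12-07', 2),
           (8, timestamp '2013-12-08', 2)
)
SELECT position, min(date) AS min, max(date) AS max
FROM (
    SELECT position, date,
           -- days since the reference date minus the row number within the
           -- position: constant inside a run of consecutive days
           (date::date - date '2013-12-01')
             - row_number() OVER (PARTITION BY position ORDER BY date) AS grp
    FROM tbl
) AS t
GROUP BY position, grp
ORDER BY min(date);
```

For position 2 the difference is 0 for 2 to 4 December and 2 for 7 to 8 December, so the two runs end up in separate groups.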