Number of entries between dates - postgresql

I have a table with the following structure: -
day, id
2016-03-13, 123
2016-03-13, 123
2016-03-13, 231
2016-03-14, 231
2016-03-14, 231
2016-03-15, 129
And I'd like to build a table that looks like: -
id, d1, d7, d14
123, 1, 1, 1
231, 1, 2, 2
129, 1, 1, 1
Essentially for a given id, list the number of days which have an entry within a time window. So if id 123 has 10 entries within the last 14 days - d14 would be 10.
So far I have: -
SELECT
day,
id
FROM
events
WHERE
datediff (DAY, day, getdate()) <= 7
GROUP BY
day,
id

This query will do:
SELECT
id,
COUNT(DISTINCT CASE WHEN current_date - day <= 1 THEN 1 END) d1,
COUNT(DISTINCT CASE WHEN current_date - day <= 7 THEN 1 END) d7,
COUNT(DISTINCT CASE WHEN current_date - day <= 14 THEN 1 END) d14
FROM
events
GROUP BY
id
ORDER BY
id
Or, since PostgreSQL 9.4, slightly more concise:
SELECT
id,
COUNT(DISTINCT day) FILTER (WHERE current_date - day <= 1) d1,
COUNT(DISTINCT day) FILTER (WHERE current_date - day <= 7) d7,
COUNT(DISTINCT day) FILTER (WHERE current_date - day <= 14) d14
FROM
events
GROUP BY
id
ORDER BY
id

try this:
SELECT id
, count(case when DAY = getdate() then 1 else null end) as d1
, count(case when DAY + 7 >= getdate() then 1 else null end) as d7
, count(case when DAY + 14 >= getdate() then 1 else null end) as d14
FROM events
WHERE DAY between DAY >= getdate() - 14
--or if you can have day > today ... and DAY between getdate() - 14 and getdate()
GROUP By id

Related

Postgresql : Create a new table based on another table and create a date column using start date and end date

I have a table shown below (sample) and I want to create a new table with an extra column 'NewDate' which will look at StartDate and show the last date of the month for start date and subsequently last date of every month till the end date for each ID and if my ID has End Date as Null the series will stop at the last date of current month which is May 2022.
ID StartDate EndDate
100 1/01/2022 26/04/2022
101 20/04/2022 Null
102 1/01/2022 27/02/2022
....
I am using Postgresql and my Expected Output:
ID StartDate EndDate NewDate
100 1/01/2022 26/04/2022 31/01/2022
100 1/01/2022 26/04/2022 28/02/2022
100 1/01/2022 26/04/2022 31/03/2022
100 1/01/2022 26/04/2022 30/04/2022
101 20/04/2022 Null 30/04/2022
101 20/04/2022 Null 31/05/2022
102 1/01/2022 27/02/2022 31/01/2022
102 1/01/2022 27/02/2022 28/02/2022
...
demo
(
SELECT
id,
start_date,
end_date,
(new_date::date + interval '1 month - 1 day')::date
FROM
test_date,
generate_series((date_trunc('month', start_date)), (date_trunc('month', end_date) + interval '1 month - 1 day'), interval '1 month') g (new_date)
ORDER BY
id)
UNION ALL ((
SELECT
id,
start_date,
end_date,
(date_trunc('month', start_date) + interval '1 month - 1 day')::date
FROM
test_date
WHERE
test_date.end_date IS NULL)
UNION ALL (
SELECT
id,
start_date,
end_date,
(date_trunc('month', (date_trunc('month', start_date) + interval '1 month - 1 day')::date) + interval '2 month - 1 day')::date
FROM
test_date
WHERE
test_date.end_date IS NULL)
ORDER BY
id);
key gotta: How to get the last day of month in postgres?
Maybe there is some simple version, but anyway this way works.

I need help in writing a subquery

I have a query like this to create date series:
Select month
From
(select to_char(created_date, 'Mon') as Month,
created_date::date as start_day,
(created_date::date + interval '1 month - 1 day ')::date as end_day
from generate_series(date '2021-01-26',
date '2022-04-26', interval '1 month') as g(created_date)) AS "thang"
And the table looks like this:
month
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Now I want to count the status from the KYC table.
So I try this:
Select
(Select month
From
(select to_char(created_date, 'Mon') as Month,
created_date::date as start_day,
(created_date::date + interval '1 month - 1 day ')::date as end_day
from generate_series(date '2021-01-26',
date '2022-04-26', interval '1 month') as g(created_date)) AS "thang"),
count(*) filter (where status = 4) as "KYC_Success"
From kyc
group by 1
I hope the result will be like this:
Month | KYC_Success
Jan | 234
Feb | 435
Mar | 546
Apr | 157
But it said
error: more than one row returned by a subquery used as an expression
What should I change in this query?
Let us assume that the table KYC has a timestamp column called created_date and the status column, and, that you want to count the success status per month - even if there was zero success items in a month.
SELECT thang.month
, count(CASE WHEN kyc.STATUS = 'success' THEN 1 END) AS successes
FROM (
SELECT to_char(created_date, 'Mon') AS Month
, created_date::DATE AS start_date
, (created_date::DATE + interval '1 month - 1 day ')::DATE AS end_date
FROM generate_series(DATE '2021-01-26', DATE '2022-04-26', interval '1 month') AS g(created_date)
) AS "thang"
LEFT JOIN kyc ON kyc.created_date>= thang.start_date
AND kyc.created_date < thang.end_date
GROUP BY thang.month;

Cohort Analysis with RedShift by Month

I am trying to build a cohort analysis for monthly retention but experiencing challenge getting the Month Number column right. The month number is supposed to return month(s) user transacted i.e 0 for registration month, 1 for the first month after registration month, 2 for the second month until the last month but currently, it returns negative month numbers in some cells.
It should be like this table:
cohort_month total_users month_number percentage
---------- ----------- -- ------------ ---------
January 100 0 40
January 341 1 90
January 115 2 90
February 103 0 73
February 100 1 40
March 90 0 90
Here is the SQL:
with cohort_items as (
select
extract(month from insert_date) as cohort_month,
msisdn as user_id
from mfscore.t_um_user_detail where extract(year from insert_date)=2020
order by 1, 2
),
user_activities as (
select
A.sender_msisdn,
extract(month from A.insert_date)-C.cohort_month as month_number
from mfscore.t_wm_transaction_logs A
left join cohort_items C ON A.sender_msisdn = C.user_id
where extract(year from A.insert_date)=2020
group by 1, 2
),
cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
B as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.sender_msisdn = C.user_id
group by 1, 2
)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
where B.cohort_month IS NOT NULL
order by 1, 3
I think the RANK window function is the right solution. So the idea is to assigne a rank to months of user activities for each user, order by year and month.
Something like:
WITH activity_per_user AS (
SELECT
user_id,
event_date,
RANK() OVER (PARTITION BY user_id ORDER BY DATE_PART('year', event_date) , DATE_PART('month', event_date) ASC) AS month_number
FROM user_activities_table
)
RANK number starts from 1, so you may want to substract 1.
Then, you can group by user_id and month_number to get the number of interactions for each user per month from the subscription (adapt to your use case accordingly).
SELECT
user_id,
month_number,
COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY 1, 2
Here is the documentation:
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_RANK.html

Getting data from postgres weekly (according to date)

user timespent(in sec) date(in timestamp)
u1 10 t1(2015-08-15)
u1 20 t2(2015-08-19)
u1 15 t3(2015-08-28)
u1 16 t4(2015-09-06)
Above is the format of my table, which represents timespent by user on a course and it is ordered by timestamp. I want to get sum of timespent by a particular user, say u1 weekly in the format :
start_date end_date sum
2015-08-15 2015-08-21 30
2015-08-22 2015-08-28 15
2015-08-29 2015-09-04 0
2015-09-05 2015-09-11 16
The difficulty lies in the fact that the seven-day periods that you want to get are not regular weeks starting with Monday.
You can not therefore use standard functions to get the week number based on the date, and have to use your own weeks generator using generate_series().
Example data:
create table sessions (user_name text, time_spent int, session_date timestamp);
insert into sessions values
('u1', 10, '2015-08-15'),
('u1', 20, '2015-08-19'),
('u1', 15, '2015-08-28'),
('u1', 16, '2015-09-06');
The query for an arbitrary chosen period from 2015-08-15 to 2015-09-06:
with weeks as (
select d::date start_date, d::date+ 6 end_date
from generate_series('2015-08-15', '2015-09-06', '7d'::interval) d
)
select w.start_date, w.end_date, coalesce(sum(time_spent), 0) total
from weeks w
left join (
select start_date, end_date, coalesce(time_spent, 0) time_spent
from weeks
join sessions
on session_date between start_date and end_date
where user_name = 'u1'
) s
on w.start_date = s.start_date and w.end_date = s.end_date
group by 1, 2
order by 1;
start_date | end_date | total
------------+------------+-------
2015-08-15 | 2015-08-21 | 30
2015-08-22 | 2015-08-28 | 15
2015-08-29 | 2015-09-04 | 0
2015-09-05 | 2015-09-11 | 16
(4 rows)
select
ui,
date_trunc('week', the_date)::date as start_date,
date_trunc('week', the_date)::date + 6 as end_date,
sum(timespent) as "sum"
from t
group by 1, 2, 3
order by 1,2
Something like this (assuming that by timestamp you mean the data type timestamp).
In order to make the 1st day of the week to be Sunday, I added and extra day to "date" in the group by.
select (start_date - date_part('dow', start_date) * interval '1 day')::date start_date,
(start_date + (6 - date_part('dow', start_date)) * interval '1 day')::date end_date,
total_time_spent
from (
select min("date") start_date, sum(timespent) total_time_spent
from mytable
where user=u1
group by date_part('year', "date"), date_part('week', "date" + interval '1 day')) "tmp"
order by start_date
This is a more generic approach, for any date interval.

postgresql daysdiff between two dates grouped by month

I have a table with the date columns (start_date, end_date) and I want to calculate the difference between these dates and grouped by the month.
I am able to get the datediff in days, but I do not know how to group this in month, any suggestions?
Table:
id Start_date End_date days
1234 2014-06-03 2014-07-05 32
12345 2014-02-02 2014-05-10 97
Expected results:
month diff_days
2 26
3 30
4 31
5 10
6 27
7 5
I think your expected output numbers are off a little. You might want to double-check.
I use a calendar table myself, but this query uses a CTE and date arithmetic. Avoiding the hard-coded date '2014-01-01' and the interval for 365 days is straightforward, but it makes the query harder to read, so I just used those values directly.
with your_data as (
select date '2014-06-03' as start_date, date '2014-07-05' as end_date union all
select '2014-02-02', '2014-05-10'
), calendar as (
select date '2014-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select extract (month from calendar_date) calendar_month, count(*) from calendar
inner join your_data on calendar.calendar_date between start_date and end_date
group by calendar_month
order by calendar_month;
calendar_month count
--
2 27
3 31
4 30
5 10
6 28
7 5
As a rule of thumb, you should never group by the month alone--doing that risks grouping data from different years. This is a safer version that includes the year, and which also restricts output to a single calendar year.
with your_data as (
select date '2014-06-03' as start_date, date '2014-07-05' as end_date union all
select '2014-02-02', '2014-05-10'
), calendar as (
select date '2014-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 700) n
)
select extract (year from calendar_date) calendar_year, extract (month from calendar_date) calendar_month, count(*) from calendar
inner join your_data on calendar.calendar_date between start_date and end_date
where calendar_date between '2014-01-01' and '2014-12-31'
group by calendar_year, calendar_month
order by calendar_year, calendar_month;
SQL Fiddle
with min_max as (
select min(start_date) as start_date, max(end_date) as end_date
from t
), g as (
select daterange(d::date, (d + interval '1 month')::date, '[)') as r
from generate_series(
(select date_trunc('month', start_date) from min_max),
(select end_date from min_max),
'1 month'
) g(d)
)
select *
from (
select
to_char(lower(r), 'YYYY Mon') as "Month",
sum(upper(r) - lower(r)) as days
from (
select t.r * g.r as r
from
(
select daterange(start_date, end_date, '[]') as r
from t
) t
inner join
g on t.r && g.r
) s
group by 1
) s
order by to_timestamp("Month", 'YYYY Mon')
;
Month | days
----------+------
2014 Feb | 27
2014 Mar | 31
2014 Apr | 30
2014 May | 10
2014 Jun | 28
2014 Jul | 5
Range data types
Range functions and operators