Count distinct dates between two timestamps - postgresql

I want to count the percentage of days on which a user was active. A query like this
select
a.id,
a.created_at,
CURRENT_DATE - a.created_at::date as days_since_registration,
NOW() as current_d
from public.accounts a where a.id = 3257
returns
id created_at days_since_registration current_d tot_active
3257 2022-04-01 22:59:00.000 1 2022-04-02 12:00:00.000 +0400 2
The person registered less than 24 hours ago (less than a day ago), but there are two distinct dates between the registration and now. Hence, if a user was active one hour before midnight and one hour after midnight, they were active on two days within less than 24 hours, which comes out as active on 200% of days.
What is the right way to count distinct dates and get 2 for a user who registered at 23:00:00 two hours ago?

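WITH cte AS (
SELECT 42 AS userID, '2022-04-01 23:00:00'::timestamp AS d
UNION
SELECT 42, '2022-04-02 01:00:00'::timestamp
)
SELECT
userID,
count(DISTINCT d::date), -- distinct calendar dates with activity
max(d)::date - min(d)::date + 1 AS NrOfDays, -- calendar dates spanned, inclusive
100 * count(DISTINCT d::date) / (max(d)::date - min(d)::date + 1) AS PercentageOnline -- multiply first to limit integer truncation
FROM cte
GROUP BY userID;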
output:
userid count nrofdays percentageonline
42 2 2 100
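For anyone who wants to check the arithmetic outside the database, here is a minimal Python sketch of the same idea: count the distinct calendar dates, count the dates spanned inclusively, and divide. The function name `active_day_stats` is made up for illustration, and the two timestamps are the ones from the CTE above.

```python
from datetime import datetime


def active_day_stats(timestamps):
    """Return (distinct active dates, calendar dates spanned, percentage active)."""
    dates = {ts.date() for ts in timestamps}       # drop the time of day
    span = (max(dates) - min(dates)).days + 1      # inclusive span, like max::date - min::date + 1
    return len(dates), span, 100 * len(dates) / span


events = [datetime(2022, 4, 1, 23, 0), datetime(2022, 4, 2, 1, 0)]
print(active_day_stats(events))
```

Two events two hours apart still give two distinct dates over a two-day span, so the percentage cannot exceed 100.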

Related

Mixing DISTINCT with GROUP_BY Postgres

I am trying to get a list of:
all months in a specified year that,
have at least 2 unique rows based on their date
and ignore specific column values
This is where I got to:
SELECT DATE_PART('month', "orderDate") AS month, count(*)
FROM public."Orders"
WHERE "companyId" = 00001 AND "orderNumber" != 1 and DATE_PART('year', ("orderDate")) = '2020' AND "orderNumber" IS NOT NULL
GROUP BY month
HAVING COUNT ("orderDate") > 2
The HAVING COUNT sort of works in place of DISTINCT, insofar as I can be reasonably sure that the condition filters out the months I don't want.
However, being able to use DISTINCT on the date within a month would return a more reliable result. Is this possible with Postgres?
Sample input (values of "orderDate"):
"2018-12-17 20:32:00+00"
"2019-02-26 14:38:00+00"
"2020-07-26 10:19:00+00"
"2020-10-13 19:15:00+00"
"2020-10-26 16:42:00+00"
"2020-10-26 19:41:00+00"
"2020-11-19 20:21:00+00"
"2020-11-19 21:22:00+00"
"2020-11-23 21:10:00+00"
"2021-01-02 12:51:00+00"
Without the HAVING COUNT this produces
month count
7 1
10 2
11 3
Month 7 can be discarded easily as it has only 1 record.
Month 10 is the issue: we have two records, but from the data above, those records are from the same day. Similarly, month 11 has only 2 distinct records by day.
The output should therefore be ideally:
month count
11 2
We have only two distinct dates from the 2020 data, and they are from month 11 (November)
I think you just want to take the distinct count of dates for each month:
SELECT
DATE_PART('month', "orderDate") AS month,
COUNT(DISTINCT "orderDate"::date) AS count
FROM public."Orders"
WHERE
"companyId" = 1 AND
"orderNumber" != 1 AND
DATE_PART('year', "orderDate") = 2020
GROUP BY
DATE_PART('month', "orderDate")
HAVING
COUNT(DISTINCT "orderDate"::date) >= 2; -- "at least 2" distinct dates, as asked
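To see what counting distinct dates per month does to the sample data, here is a rough Python sketch. It uses the 2020 rows from the question and leaves out the 2020-10-13 row, on the assumption that the orderNumber filter removed it (the question reports only two records for month 10, both on the same day). The function name `months_with_distinct_days` is hypothetical.

```python
from collections import defaultdict

# 2020 "orderDate" values from the question; the 2020-10-13 row is omitted on the
# assumption that the orderNumber filter excluded it.
order_dates = [
    "2020-07-26 10:19:00+00",
    "2020-10-26 16:42:00+00",
    "2020-10-26 19:41:00+00",
    "2020-11-19 20:21:00+00",
    "2020-11-19 21:22:00+00",
    "2020-11-23 21:10:00+00",
]


def months_with_distinct_days(values, minimum=2):
    """Map month -> number of distinct dates, keeping months with at least `minimum`."""
    days_per_month = defaultdict(set)
    for value in values:
        day = value[:10]                  # 'YYYY-MM-DD' part of the timestamp text
        month = int(day[5:7])
        days_per_month[month].add(day)    # a set keeps each date once
    return {m: len(d) for m, d in sorted(days_per_month.items()) if len(d) >= minimum}


print(months_with_distinct_days(order_dates))
```

Only month 11 survives, with 2 distinct dates, which matches the output the question asks for.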

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application which shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count Year Month
100 2021 1
50 2021 2
75 2021 3
While this is okay on its own, I want it to display the running total over time, not just the number of events in each month, so the desired output should be:
Count Year Month
100 2021 1
150 2021 2
225 2021 3
I read in various places that I should use a window function via SQLAlchemy's over function. However, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, as adding event.id to the GROUP BY would stop my rows from being grouped by month and year.
The final query I ended up using:
from sqlalchemy import extract, func

query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
# sum the per-month counts in a window ordered by year and month (running total)
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want the number of events up to and including that month, for each month. You first need to calculate the number of events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single SELECT statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, COUNT(events.id) AS events_count
FROM events
GROUP BY 1,2
) AS monthly -- a subquery in FROM needs an alias in PostgreSQL versions before 16
but I'm not sure how to do that in SQLAlchemy.
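If it helps to see the two-step version without any SQL, here is a small Python sketch: the per-month counts play the role of the inner query, and `itertools.accumulate` plays the role of SUM(...) OVER (ORDER BY year, month). The function name `running_totals` is made up, and the counts are the ones from the question.

```python
from itertools import accumulate

# (year, month, events_count), i.e. what the inner GROUP BY query would return
monthly = [(2021, 1, 100), (2021, 2, 50), (2021, 3, 75)]


def running_totals(rows):
    """Return (year, month, total_so_far) ordered by year and month."""
    rows = sorted(rows)                                  # ORDER BY year, month
    totals = accumulate(count for _, _, count in rows)   # cumulative sum of the counts
    return [(year, month, total) for (year, month, _), total in zip(rows, totals)]


print(running_totals(monthly))
```

The aggregation runs first and the window runs over its result, which is why the window function has to wrap the aggregate rather than the raw column.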

I need to find the number of users that were invoiced for an amount greater than 0 in the previous month and were not invoiced in the current month

I need to find the number of users that were invoiced for an amount greater than 0 in the previous month and were not invoiced in the current month. This calculation is to be done for 12 months in a single query. The output should look like this:
Month Count
01/07/2019 50
01/08/2019 34
01/09/2019 23
01/10/2019 98
01/11/2019 10
01/12/2019 5
01/01/2020 32
01/02/2020 65
01/03/2020 23
01/04/2020 12
01/05/2020 64
01/06/2020 54
01/07/2020 78
I am able to get the value only for one month. I want to get it for all months in a single query.
This is my current query:
SELECT COUNT(DISTINCT TWO_MONTHS_AGO.USER_ID), TWO_MONTHS_AGO.MONTH AS INVOICE_MONTH
FROM (
SELECT USER_ID, LAST_DAY(invoice_ct_dt) AS MONTH
FROM table a AS ID
WHERE invoice_amt > 0
AND LAST_DAY(invoice_ct_dt) = ADD_MONTHS(LAST_DAY(CURRENT_DATE - 1), - 2)
GROUP BY user_id, LAST_DAY(invoice_ct_dt)
) AS TWO_MONTHS_AGO
LEFT JOIN (
SELECT user_id, LAST_DAY(invoice_ct_dt) AS MONTH
FROM table a AS ID
WHERE LAST_DAY(invoice_ct_dt) = ADD_MONTHS(LAST_DAY(CURRENT_DATE - 1), - 1)
GROUP BY USER_ID, LAST_DAY(invoice_ct_dt)
) AS ONE_MONTH_AGO ON TWO_MONTHS_AGO.USER_ID = ONE_MONTH_AGO.USER_ID
WHERE ONE_MONTH_AGO.USER_ID IS NULL
GROUP BY INVOICE_MONTH;
Thank you in advance.
Lona
There are probably lots of different approaches, but the way I would do it is as follows:
Summarise data by user and month for the last 13 months (you need 12 months plus the month before the first of them)
Compare "this" month (that has data) to "next" month and select records where there is no "next" month data
Summarise this dataset by month and distinct userid
For example, assuming a table created as follows:
create table INVOICE_DATA (
USERID varchar(4),
INVOICE_DT date,
INVOICE_AMT NUMBER(10,2)
);
the following query should give you what you want - you may need to adjust it depending on whether you are including this month, or only up to the end of last month, in your calculation, etc.:
--Summarise data by user and month
WITH MONTH_SUMMARY AS
(
SELECT USERID
,TO_CHAR(INVOICE_DT,'YYYY-MM') "INVOICE_MONTH"
,TO_CHAR(ADD_MONTHS(INVOICE_DT,1),'YYYY-MM') "NEXT_MONTH"
,SUM(INVOICE_AMT) "MONTHLY_TOTAL"
FROM INVOICE_DATA
WHERE INVOICE_DT >= TRUNC(ADD_MONTHS(current_date(),-13),'MONTH') -- Last 13 months of data
GROUP BY 1,2,3
),
--Get data for users with invoices in this month but not the next month
USER_DATA AS
(
SELECT USERID, INVOICE_MONTH, MONTHLY_TOTAL
FROM MONTH_SUMMARY MS_THIS
WHERE NOT EXISTS
(
SELECT USERID
FROM MONTH_SUMMARY MS_NEXT
WHERE
MS_THIS.USERID = MS_NEXT.USERID AND
MS_THIS.NEXT_MONTH = MS_NEXT.INVOICE_MONTH
)
AND MS_THIS.INVOICE_MONTH < TO_CHAR(current_date(),'YYYY-MM') -- Don't include this month as obviously no next month to compare to
)
SELECT INVOICE_MONTH, COUNT(DISTINCT USERID) "USER_COUNT"
FROM USER_DATA
GROUP BY INVOICE_MONTH
ORDER BY INVOICE_MONTH
;
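The same three steps can be sketched in plain Python, which may make the NOT EXISTS logic easier to follow. This is only an illustration under assumptions: months are represented as (year, month) tuples, the names `next_month` and `churned_per_month` are invented, the sample invoices are made up, and, like the query above, it does not filter on the invoice amount.

```python
from collections import defaultdict


def next_month(ym):
    """Return the (year, month) tuple that follows `ym`."""
    year, month = ym
    return (year + 1, 1) if month == 12 else (year, month + 1)


def churned_per_month(invoices, current_month):
    """invoices: iterable of (user_id, (year, month), amount)."""
    # Step 1: summarise by user and month
    totals = defaultdict(float)
    for user, ym, amount in invoices:
        totals[(user, ym)] += amount
    # Step 2: keep user-months with no row for the same user in the next month
    churned = defaultdict(set)
    for user, ym in totals:
        if ym < current_month and (user, next_month(ym)) not in totals:
            churned[ym].add(user)
    # Step 3: count distinct users per month
    return {ym: len(users) for ym, users in sorted(churned.items())}


sample = [
    ("u1", (2020, 5), 10.0),
    ("u1", (2020, 6), 5.0),
    ("u2", (2020, 5), 7.0),
    ("u3", (2020, 6), 3.0),
]
print(churned_per_month(sample, current_month=(2020, 7)))
```

Here u2 is invoiced in May but not June, and u1 and u3 are invoiced in June but not July, so May counts one user and June counts two.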

PostgreSQL: Get rows which date is 5 days old than payment_date

Currently I need to send an email to all users whose payment due_date expired 5 days ago and whose status is 1 (pending payment), restricted to the current month and year, because the table may also contain future or past dates. Example:
due_date= 27/06/2018 send email after 5 days 1/05/2018
My query to grab all users with an interval of 5 days is the following:
SELECT payments_payment.id, payments_payment.due_date
FROM payments_payment
WHERE payments_payment.due_date < NOW() - '5 day'::interval
AND payments_payment.status = 1
AND EXTRACT(year FROM payments_payment.due_date) = EXTRACT(year FROM NOW())
AND EXTRACT(month FROM payments_payment.due_date) = EXTRACT(month FROM NOW())
ORDER BY payments_payment.due_date ASC;
I needed a different approach because the question is the inverse: I need to take the difference between the two dates and check whether it matches my day limit. Here is the query.
PostgreSQL Query:
SELECT due_date
FROM payments_payment
WHERE payments_payment.due_date + interval '5 day' < current_date
AND payments_payment.status = 1
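Explanation
This returns the due dates of all payments with status 1 whose due_date plus 5 days is earlier than the current date, i.e. payments that are more than 5 days overdue. It does not restrict the result to the current month and year; the two EXTRACT conditions from the first query can be added for that.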
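The comparison in the WHERE clause is easy to check by hand. Here is a tiny Python sketch of the same condition, `due_date + 5 days < current_date`; the function name `is_overdue` and the `grace_days` parameter are made up for illustration.

```python
from datetime import date, timedelta


def is_overdue(due_date, today, grace_days=5):
    """True when due_date plus the grace period is strictly before today."""
    return due_date + timedelta(days=grace_days) < today


print(is_overdue(date(2018, 6, 27), date(2018, 7, 3)))
print(is_overdue(date(2018, 6, 27), date(2018, 7, 2)))
```

With a due date of 27 June, the payment first qualifies on 3 July; on 2 July the two sides are equal and the strict comparison is false. Use `<=` instead if the email should go out exactly on the fifth day.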

Postgres count month on the basis of user_id

I want to count the number of months for a particular user from the subscription table. For example, user_id = 1 occurs 10 times in the subscription table: it appears 2 times in January, 0 times in February, once in March, and so on.
user_id type started_on ended_on
2 P 2009-10-21 2010-03-18
2 F 2010-03-18 2010-03-20
2 P 2010-03-20 2012-05-19
2 F 2012-05-19 till now
This is pretty basic SQL; I recommend you read a manual before asking here, for example about aggregate functions. But here you go:
If you want one user:
SELECT
count(distinct month)
FROM
subscription
WHERE
user_id=your_user_id_number
If you want every user's:
SELECT
count(distinct month),
user_id
FROM
subscription
GROUP BY
user_id
Edit:
Ok, so you want the month difference between two date columns, here is how you do it with age():
SELECT user_id, extract(YEAR from age(coalesce(ended_on,current_date),started_on)) * 12 + extract(MONTH FROM age(coalesce(ended_on,current_date),started_on))
FROM subscription
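As a rough cross-check of what the age() expression returns, here is a Python sketch that counts whole months between two dates as years * 12 + months, dropping a partial trailing month. It is an approximation written for illustration (the name `full_months_between` is made up), and it may differ from age() around month ends.

```python
from datetime import date


def full_months_between(start, end):
    """Whole months from start to end; a partial trailing month is not counted."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:       # the last month is not complete yet
        months -= 1
    return months


print(full_months_between(date(2009, 10, 21), date(2010, 3, 18)))
print(full_months_between(date(2010, 3, 20), date(2012, 5, 19)))
```

For the first and third sample rows this gives 4 and 25 months, one short of a naive month subtraction in each case because the end day falls before the start day.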