PostgreSQL: count how many late entries per month in a year

I have a field named late_in that contains values like 2017-05-29 08:36:44, where the limit for entry time is 08:30:00 every day.
What I want is the year, the month, and how many times the person was late in that month, even if the count for that month is zero.
I want the result to look something like this:
year month late
-------------------
2017 1 6
2017 2 0
2017 3 14
and so on until the end of the year.

You are looking for conditional aggregation:
select extract(year from late_in) as year,
extract(month from late_in ) as month,
count(*) filter (where late_in::time > time '08:30:00') as late
from the_table
group by extract(year from late_in),
extract(month from late_in );
This assumes that late_in is defined as timestamp.
The expression late_in::time returns only the time part of the value, and the filter clause makes the aggregate count only those rows for which the condition is true, i.e. where the time part is after 08:30.
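One caveat: the query above only returns months that have at least one row in the_table, so a month with no entries at all would be missing rather than shown with 0 (in PostgreSQL this is usually handled by left-joining against generate_series). The counting logic, with all twelve months pre-filled, can be sketched in Python as follows. The function name and the sample data are hypothetical and only illustrate the idea:

```python
from datetime import datetime, time


def late_counts_by_month(entries, year, cutoff=time(8, 30)):
    """Count entries later than `cutoff` for every month of `year`.

    Months without any late entry (or without any entry at all) stay at 0.
    """
    counts = {month: 0 for month in range(1, 13)}
    for ts in entries:
        # Same condition as late_in::time > time '08:30:00'
        if ts.year == year and ts.time() > cutoff:
            counts[ts.month] += 1
    return counts


# Hypothetical sample data
entries = [
    datetime(2017, 5, 29, 8, 36, 44),  # late
    datetime(2017, 5, 30, 8, 30, 0),   # exactly on the limit, not late
    datetime(2017, 7, 3, 9, 1, 12),    # late
]
counts = late_counts_by_month(entries, 2017)
for month, late in counts.items():
    print(2017, month, late)
```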

Related

Count distinct dates between two timestamps

I want to count the percentage of days on which a user was active. A query like this
select
a.id,
a.created_at,
CURRENT_DATE - a.created_at::date as days_since_registration,
NOW() as current_d
from public.accounts a where a.id = 3257
returns
id created_at days_since_registration current_d tot_active
3257 2022-04-01 22:59:00.000 1 2022-04-02 12:00:0.000 +0400 2
The person registered less than 24 hours ago (less than a day ago), but there are two distinct dates between the registration and now. Hence, if a user was active one hour before midnight and one hour after midnight, he has two active days in less than one day (active on 200% of days).
What is the right way to count distinct dates and get 2 for a user who registered at 23:00:00 two hours ago?
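WITH cte AS (
-- sample activity rows; d is a text literal and is cast to date below
SELECT 42 AS userID, '2022-04-01 23:00:00' AS d
UNION
SELECT 42, '2022-04-02 01:00:00'
)
SELECT
userID,
count(DISTINCT d::date), -- distinct active dates, not activity rows
max(d)::date - min(d)::date + 1 AS NrOfDays,
-- multiply before dividing so that integer division does not truncate to 0
100 * count(DISTINCT d::date) / (max(d)::date - min(d)::date + 1) AS PercentageOnline
FROM cte
GROUP BY userID;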
output:
userid count nrofdays percentageonline
--------------------------------------
42 2 2 100
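The calculation can also be sketched outside the database. The Python version below collects the distinct calendar dates of the activity timestamps and divides by the number of calendar days spanned, first to last inclusive. The function name and the sample events are hypothetical:

```python
from datetime import datetime


def active_day_stats(events):
    """Return (distinct active dates, calendar days spanned, percentage)."""
    dates = {ts.date() for ts in events}           # distinct calendar dates
    span = (max(dates) - min(dates)).days + 1      # inclusive day span
    return len(dates), span, 100.0 * len(dates) / span


# One event an hour before midnight, one an hour after
events = [datetime(2022, 4, 1, 23, 0), datetime(2022, 4, 2, 1, 0)]
active, span, pct = active_day_stats(events)
print(active, span, pct)
```

Because both the numerator and the denominator are counted in calendar dates, the percentage cannot exceed 100 here.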

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application that shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count Year Month
----------------
100 2021 1
50 2021 2
75 2021 3
While this is okay on its own, I want it to display the running total over time, not just the number of events in that month, so the desired output should be:
Count Year Month
----------------
100 2021 1
150 2021 2
225 2021 3
I read in various places that I should use a window function via SQLAlchemy's over function. However, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, since adding event.id to the GROUP BY would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want, for each month, the number of events up to and including that month. You first need to calculate the number of events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single select statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, COUNT(events.id) AS events_count
FROM events
GROUP BY 1,2
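) AS monthly_counts -- PostgreSQL versions before 16 require an alias on a subquery in FROM
However, I'm not sure how to express that in SQLAlchemy.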
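The reason the original attempt failed is that in a grouped query the window function runs over the grouped rows, so its argument has to be an aggregate (or a grouping column); hence SUM(COUNT(...)) OVER (...) rather than COUNT(...) OVER (...). The same two steps, count per month and then a running sum in (year, month) order, can be sketched in plain Python. The function name and the sample dates are hypothetical:

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_monthly_counts(event_dates):
    """Return [(year, month, running_total), ...] ordered by year and month."""
    # Step 1: events per (year, month), skipping missing dates (the IS NOT NULL filter)
    monthly = Counter((d.year, d.month) for d in event_dates if d is not None)
    # Step 2: running sum over the months in chronological order
    keys = sorted(monthly)
    totals = accumulate(monthly[key] for key in keys)
    return [(year, month, total) for (year, month), total in zip(keys, totals)]


# Hypothetical data matching the counts in the question: 100, 50, 75
sample = [date(2021, 1, 5)] * 100 + [date(2021, 2, 10)] * 50 + [date(2021, 3, 1)] * 75
result = cumulative_monthly_counts(sample)
for row in result:
    print(row)
```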

DB2: Bi-monthly query for a DB2 report

I am currently writing a Crystal Report that has a DB2 query as its backend. I have finished the query but am stuck on the date portion of it. I am going to be running it twice a month - once on the 16th, and once on the 1st of the next month. Here's how it should work:
If I run it on the 16th of the month, it will give me results from the 1st of that same month to the 15th of that month.
If I run it on the 1st of the next month, it will give me results from the 16th of the previous month to the last day of the previous month.
This comes down to a basic bi-monthly report. I've found plenty of hints on how to do this in T-SQL, but no efficient way to accomplish it in DB2. I'm having a hard time wrapping my head around the logic to get this to work consistently, taking into account differences in month lengths and such.
There are two expressions, one for the start date and one for the end date of the interval, depending on the report date passed. You can use them in your WHERE clause.
The logic is as follows:
1) If the report date is the 1st day of a month, then:
DATE_START is the 16th of the previous month
DATE_END is the last day of the previous month
2) Otherwise:
DATE_START is the 1st of the current month
DATE_END is the 15th of the current month
SELECT
REPORT_DATE
, CASE DAY(REPORT_DATE) WHEN 1 THEN REPORT_DATE - 1 MONTH + 15 ELSE REPORT_DATE - DAY(REPORT_DATE) + 1 END AS DATE_START
, CASE DAY(REPORT_DATE) WHEN 1 THEN REPORT_DATE - 1 ELSE REPORT_DATE - DAY(REPORT_DATE) + 15 END AS DATE_END
FROM
(
VALUES
DATE('2020-02-01')
, DATE('2020-02-05')
, DATE('2020-02-16')
) T (REPORT_DATE);
The result is:
|REPORT_DATE|DATE_START|DATE_END |
|-----------|----------|----------|
|2020-02-01 |2020-01-16|2020-01-31|
|2020-02-05 |2020-02-01|2020-02-15|
|2020-02-16 |2020-02-01|2020-02-15|
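For cross-checking the date logic outside Db2, the same two cases can be sketched in Python. The function name report_window is hypothetical; it follows the rules listed above and reproduces the rows of the result table:

```python
from datetime import date, timedelta


def report_window(report_date):
    """Return (date_start, date_end) of the half-month a report run should cover."""
    first_of_month = report_date.replace(day=1)
    if report_date.day == 1:
        # Run on the 1st: 16th of the previous month up to its last day
        last_of_prev = first_of_month - timedelta(days=1)
        return last_of_prev.replace(day=16), last_of_prev
    # Any other day: 1st to 15th of the current month
    return first_of_month, first_of_month.replace(day=15)


for d in (date(2020, 2, 1), date(2020, 2, 5), date(2020, 2, 16)):
    start, end = report_window(d)
    print(d, start, end)
```

Going back one day from the 1st yields the last day of the previous month regardless of its length, so leap years and 30/31-day months need no special handling.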
In Db2 (for Linux, UNIX and Windows) it could be a WHERE condition like
WHERE
(CASE WHEN date_part('days', CURRENT date) > 15 THEN yourdatecolumn >= this_month(CURRENT date) AND yourdatecolumn < this_month(CURRENT date) + 15 days
-- >= so that the 16th of the previous month itself is included
ELSE yourdatecolumn >= this_month(CURRENT date) - 1 month + 15 DAYS AND yourdatecolumn < this_month(CURRENT date)
END)
Check out the THIS_MONTH function; there are multiple ways to do it. DAYS_TO_END_OF_MONTH might also be helpful.

Counting by Week in Hive

I'm trying to produce a fully refreshed set of numbers each week, pulling from a table in Hive. Right now I'm using this method:
SELECT
COUNT(DISTINCT case when timestamp between TO_DATE("2016-01-28") and TO_DATE("2016-01-30") then userid end) as week_1,
COUNT(DISTINCT case when timestamp between TO_DATE("2016-01-28") and TO_DATE("2016-02-06") then userid end) as week_2
FROM Data;
I'm trying to get something more along the lines of:
SELECT
Month(timestamp), Week(timestamp), COUNT (DISTINCT userid)
FROM Data
Group By Month, Week
But my week runs Sunday to Saturday. Is there a smarter way to be doing this that works in HIVE?
Solution found:
You can simply create your own formula instead of relying on a predefined "week of the year" function. Advantage: you can use any set of 7 days as a week.
In your case, since you want the week to run Sunday to Saturday, we just need the date of the first Sunday in the year.
E.g. in 2016 the first Sunday is '2016-01-03', i.e. 3 January 2016 (assuming the timestamp column is in the format 'yyyy-mm-dd').
SELECT
count(distinct UserId), floor(datediff(timestamp,'2016-01-03') / 7) + 1 as week_of_the_year -- floor, not lower: lower() is a string function
FROM table.data
where timestamp>='2016-01-03'
group by floor(datediff(timestamp,'2016-01-03') / 7) + 1;
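The week formula is just "whole weeks elapsed since the first Sunday, plus one". A minimal Python sketch of the same arithmetic, with a hypothetical function name, makes it easy to check which dates land in which week:

```python
from datetime import date


def sunday_week_number(day, first_sunday):
    """1-based week number for weeks running Sunday to Saturday.

    `first_sunday` is the first Sunday of the year, e.g. 2016-01-03 for 2016.
    Days before it yield a number below 1; the SQL filters those rows out.
    """
    return (day - first_sunday).days // 7 + 1


first_sunday = date(2016, 1, 3)
for d in (date(2016, 1, 3), date(2016, 1, 9), date(2016, 1, 10), date(2016, 1, 28)):
    print(d, sunday_week_number(d, first_sunday))
```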
I see that you need the data grouped by week. You can just do this:
SELECT weekofyear(to_date(timestamp)), COUNT (DISTINCT userid) FROM Data Group By weekofyear(to_date(timestamp))

Count data in the current month, not 30 days back (Postgres statement)

I have this query, which returns data for the 30 days before the current date. I need to modify it to return data for the current month only, not 30 days back from the current date.
SELECT count(1) AS counter FROM users.logged WHERE createddate >=
date_trunc('month', CURRENT_DATE);
Any tips on how to tweak this query? It is based on Postgres.
regards
Something like this should work.
SELECT count(1) AS counter
FROM users.logged
WHERE date_trunc('month', createddate) = date_trunc('month', current_date);
It is already supposed to return the values for the current month. Truncation converts 10 Nov 2013 14:16 to 01 Nov 2013 00:00, so the query returns data since the beginning of this month. The problem seems to be something else.
"I have this query, which returns data for the 30 days before the current date. I need to modify it to return data for the current month only, not 30 days back from the current date."
That's incorrect. Your query:
SELECT count(1) AS counter FROM users.logged WHERE createddate >= date_trunc('month', CURRENT_DATE);
returns all rows with dates >= Nov 1st 00:00:00, in other words what you say you want already. Or else you've simplified your query and left out the more important bits, i.e. the ones that are broken. If not:
It might be that you have dates in the future and are getting incorrect counts as a result. If so, add an additional criterion to the where clause:
AND createddate < date_trunc('month', CURRENT_DATE) + interval '1 month'
It might also be that your sample data has a bizarre row with a time zone such that the timestamp looks like it is from this month but the date/time arithmetic lands it in last month.
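The lower bound from date_trunc together with the upper bound suggested above forms a half-open range: from the first day of this month (inclusive) to the first day of next month (exclusive). A small Python sketch of that range check, with hypothetical function names, shows how the December rollover and future dates are handled:

```python
from datetime import date


def month_bounds(today):
    """Return (first day of this month, first day of next month)."""
    start = today.replace(day=1)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)  # roll over to January
    else:
        end = start.replace(month=start.month + 1)
    return start, end


def in_current_month(created, today):
    """True if `created` falls in the same calendar month as `today`."""
    start, end = month_bounds(today)
    return start <= created < end  # half-open: excludes next month's rows


today = date(2013, 11, 10)
print(month_bounds(today))
print(in_current_month(date(2013, 11, 1), today))   # first day of the month
print(in_current_month(date(2013, 12, 1), today))   # future row, next month
```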
This will give you data for the current month only. I extract the month and the year from the created date and compare them against those of the current date.
SELECT count(1) AS counter
FROM users.logged
WHERE
EXTRACT(MONTH FROM createddate) = EXTRACT(MONTH FROM current_date)
AND EXTRACT(YEAR FROM createddate) = EXTRACT(YEAR FROM current_date);