Postgres: longest streak per developer regardless of Saturdays and Sundays - postgresql

I got the information I needed from my last post, "Postgres: Defining the longest streak (in days) per developer".
However, now I want to know the longest streak per developer ignoring Saturdays and Sundays. For instance, Bob worked on Thursday 18, Friday 19, Monday 22 and Tuesday 23, hence Bob's streak is 4 days.
I understand I can use EXTRACT with the DOW field, which gives me 0 for Sunday, 1 for Monday and so on, but I don't see how to apply it to the last solution proposed by Gordon Linoff.
Can some of you help me with this? Cheers,

WITH
working_limits AS (
SELECT
MIN(mr_date) AS start_date,
MAX(mr_date) AS end_date
FROM
xxx
),
working_days AS (
SELECT
ROW_NUMBER() OVER (ORDER BY s.d) AS day_number,
s.d::date AS date
FROM
GENERATE_SERIES((SELECT start_date FROM working_limits),
(SELECT end_date FROM working_limits),
'1 day') AS s(d)
WHERE
EXTRACT(dow FROM s.d) BETWEEN 1 AND 5),
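worked_days AS (
-- Number each developer's weekdays in date order
-- (assumes at most one row per developer per day).
SELECT
ROW_NUMBER() OVER (PARTITION BY developer ORDER BY mr_date) AS day_number,
developer,
mr_date AS date
FROM
xxx
WHERE
EXTRACT(dow FROM mr_date) BETWEEN 1 AND 5
)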
SELECT
y.developer,
MAX(y.days)
FROM (
SELECT
x.developer,
COUNT(*) AS days
FROM (
SELECT
wngd.date,
wd.developer,
wngd.day_number - wd.day_number AS delta
FROM
working_days wngd INNER JOIN worked_days wd
ON
wngd.date = wd.date) AS x
GROUP BY
x.developer,
x.delta) AS y
GROUP BY
y.developer;
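
As a cross-check on the logic, here is a minimal Python sketch of the same idea: two worked days belong to the same streak when the second is the next Monday-to-Friday date after the first. The function names and the year and month in the sample dates are assumptions made for illustration; the question only gives the day numbers and weekdays.

```python
from datetime import date, timedelta

def next_weekday(d):
    """Return the first Monday-to-Friday date after d."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # Python: Monday=0 ... Saturday=5, Sunday=6
        d += timedelta(days=1)
    return d

def longest_streak(worked_dates):
    """Longest run of consecutive weekdays, ignoring weekend dates."""
    days = sorted({d for d in worked_dates if d.weekday() < 5})
    best = current = 0
    previous = None
    for d in days:
        if previous is not None and next_weekday(previous) == d:
            current += 1
        else:
            current = 1
        best = max(best, current)
        previous = d
    return best

# Hypothetical dates: Thursday 18, Friday 19, Monday 22, Tuesday 23 (March 2021).
bob = [date(2021, 3, 18), date(2021, 3, 19), date(2021, 3, 22), date(2021, 3, 23)]
print(longest_streak(bob))  # 4
```

Note that Python's weekday() numbers Monday as 0, whereas PostgreSQL's DOW numbers Sunday as 0.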

Related

extract days of daterange grouped by month postgresql

I have a pickupDate and a returnDate in my OrderHistory table. I want to extract the sum of rental days of all OrderHistory entries, grouped and ordered by month. A CTE seems to be the solution, but I don't get how to implement it in my query, since the CTEs I saw were referring to themselves where it says "FROM cte".
I tried something like this:
SELECT
SUM((EXTRACT (DAY FROM("OrderHistory"."returnDate")-("OrderHistory"."pickupDate")))) as traveltime
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
M
ORDER BY
M
But the outcome doesn't split bookings between two months (e.g. pickupDate = 27th March 2022 and returnDate = 3rd April 2022); it assigns the whole 7 days to March, since the query groups by the month of pickupDate. It should show 4 days in March and 3 in April.
Sorry for the probably very stupid question, but I am a beginner. (My code is written in PostgreSQL, by the way.)
See "PostgreSQL naming conventions" and "Are PostgreSQL column names case-sensitive?": use legal, lower-case names exclusively so double-quoting is not needed.
Final result in db fiddle
Add daterange column.
alter table order_history add column date_ranges daterange;
with a(m_begin, m_end, pickup_date) as
(select date_trunc('month', pickup_date)::date,
(date_trunc('month', pickup_date) + interval '1 month - 1 day')::date,
pickup_date from order_history)
update order_history set date_ranges =
daterange(a.m_begin, a.m_end,'[]') from a
where a.pickup_date = order_history.pickup_date;
then final query:
WITH A AS(
select
pickup_date,
return_date,
return_date - pickup_date as total,
case when return_date <@ date_ranges then (return_date - pickup_date)
else ( date_trunc('month', pickup_date) + interval '1 month - 1 day')::date - pickup_date
end partial_mth
from order_history),
b as (SELECT *, a.total - partial_mth parital_not_mth FROM a)
select *,
case when to_char(pickup_date,'YYYY-MM') = to_char(return_date,'YYYY-MM')
then
sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM')) +
sum(parital_not_mth) over (partition by to_char(return_date,'YYYY-MM'))
else sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM'))
end
from b;
After trying different things I think I found the best answer to my question, that I want to share with the community:
WITH hier as (
SELECT
"OrderHistory"."pickupDate" as start_date
, "OrderHistory"."returnDate" as end_date
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
1, 2, 3
ORDER BY
3
), calendar as (
select date '2022-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select
to_char(calendar_date::date, 'YYYY-MM')
, count(*) as tage_gebucht
from calendar
inner join hier on calendar.calendar_date between start_date and end_date
where calendar_date between '2022-01-01' and '2022-12-31'
group by 1
order by 1;
I think this is the simplest solution I came up with.
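
The calendar idea can be checked outside the database with a short Python sketch. It assumes the counting convention implied by the question's example (each day after the pickup date, up to and including the return date, is attributed to its own month); the function name is hypothetical. The SQL above uses BETWEEN, which also counts the pickup day itself.

```python
from collections import Counter
from datetime import date, timedelta

def rental_days_by_month(pickup, return_date):
    """Count rental days per 'YYYY-MM' for one booking.

    Assumption: the pickup day itself is not counted, the return day is.
    """
    counts = Counter()
    day = pickup + timedelta(days=1)
    while day <= return_date:
        counts[day.strftime("%Y-%m")] += 1
        day += timedelta(days=1)
    return dict(counts)

# The booking from the question: 27 March 2022 to 3 April 2022.
print(rental_days_by_month(date(2022, 3, 27), date(2022, 4, 3)))
# {'2022-03': 4, '2022-04': 3}
```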

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application that shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count  Year  Month
------------------
100    2021  1
50     2021  2
75     2021  3
While this is okay on its own, I want it to display the running total over time, not just the number of events in that month, so the desired output should be:
Count  Year  Month
------------------
100    2021  1
150    2021  2
225    2021  3
I read in various places that I should use a window function via SQLAlchemy's over function. However, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, as adding event.id to the GROUP BY would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want the number of events up to and including that month, for each month. You first need to calculate the number of events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single select statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, COUNT(events.id) AS events_count
FROM events
GROUP BY 1,2
) AS monthly_counts
I'm not sure how to do that in SQLAlchemy, though.
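
The two-step reasoning (count per month first, then accumulate) can be sketched in plain Python, independent of SQLAlchemy. The data is the sample output from the question; the function name is hypothetical.

```python
from itertools import accumulate

def running_totals(monthly_counts):
    """Turn (year, month, count) rows into (year, month, running_total) rows."""
    rows = sorted(monthly_counts)  # same ordering as ORDER BY year, month
    totals = accumulate(count for _, _, count in rows)
    return [(year, month, total) for (year, month, _), total in zip(rows, totals)]

monthly = [(2021, 1, 100), (2021, 2, 50), (2021, 3, 75)]
print(running_totals(monthly))
# [(2021, 1, 100), (2021, 2, 150), (2021, 3, 225)]
```

This mirrors what SUM(COUNT(...)) OVER (ORDER BY year, month) does: the inner aggregate produces one count per group, and the window function then sums those counts in order.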

How to form a dynamic pivot table or return multiple values from GROUP BY subquery

I'm having some major issues with the following query formation:
I have projects with start and end dates
Name       Start       End
---------------------------------------
Project 1  2020-08-01  2020-09-10
Project 2  2020-01-01  2025-01-01
and I'm trying to count the monthly working days within each project with the following subquery
select date_trunc('month', days) as d_month, count(days) as d_count
from generate_series(greatest('2020-08-01'::date, p.start), least('2020-09-14'::date, p.end), '1 day'::interval) days
where extract(DOW from days) not IN (0, 6)
group by d_month
where p.start comes from the aliased main query and the dates are hard-coded for now. This correctly gives me the following result:
{"d_month"=>2020-08-01 00:00:00 +0000, "d_count"=>21}
{"d_month"=>2020-09-01 00:00:00 +0000, "d_count"=>10}
However subqueries can't return multiple values. The date range for the query is dynamic, so I would either need to somehow return the query as:
Name       Start       End         2020-08-01  2020-09-01  ...
-------------------------------------------------------------------------
Project 1  2020-08-01  2020-09-10  21          8
Project 2  2020-01-01  2025-01-01  21          10
Or simply return the whole subquery as JSON, but that doesn't seem to work either.
Any idea on how to achieve this or whether there are simpler solutions for this?
The most correct solution would be to create an actual calendar table that holds every possible day of interest to your business and, at a minimum for your purpose here, marks work days.
Ideally you would have columns to hold fiscal quarters, periods, and weeks to match your industry. You would also mark holidays. Joining to this table makes these kinds of calculations a snap.
create table calendar (
ddate date not null primary key,
is_work_day boolean default true
);
insert into calendar
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from generate_series(
'2000-01-01'::timestamp,
'2099-12-31'::timestamp,
interval '1 day'
) as gs(ts);
Assuming a calendar table is not within scope, you can do this:
with bounds as (
select min(start) as first_start, max("end") as last_end
from my_projects
), cal as (
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from bounds
cross join generate_series(
first_start,
last_end,
interval '1 day'
) as gs(ts)
), bymonth as (
select p.name, p.start, p.end,
date_trunc('month', c.ddate) as month_start,
count(*) as work_days
from my_projects p
join cal c on c.ddate between p.start and p.end
where c.is_work_day
group by p.name, p.start, p.end, month_start
)
select jsonb_object_agg(to_char(month_start, 'YYYY-MM-DD'), work_days)
|| jsonb_object_agg('name', name)
|| jsonb_object_agg('start', start)
|| jsonb_object_agg('end', "end") as result
from bymonth
group by name;
Doing a pivot from rows to columns in SQL is usually a bad idea, so the query produces JSON for you.
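
To sanity-check the per-month working-day counts, here is a small Python sketch that does the same counting for a single project. It is an illustration only, not a replacement for the query; the function name is hypothetical and the dates are Project 1 from the question.

```python
from collections import Counter
from datetime import date, timedelta

def work_days_by_month(start, end):
    """Count Monday-to-Friday days per month between start and end, inclusive.

    Keys are the first day of each month in ISO format, like date_trunc('month', ...).
    """
    counts = Counter()
    day = start
    while day <= end:
        if day.weekday() < 5:  # Python: Saturday=5, Sunday=6
            counts[day.replace(day=1).isoformat()] += 1
        day += timedelta(days=1)
    return dict(counts)

# Project 1 from the question: 2020-08-01 to 2020-09-10.
print(work_days_by_month(date(2020, 8, 1), date(2020, 9, 10)))
# {'2020-08-01': 21, '2020-09-01': 8}
```

The result matches the 21 and 8 shown for Project 1 in the desired output.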

Extracting data that is relevant to the financial year as set by the parameter date

I have a student_table with a column student_financial_aid_type and, next to it, a column date_; for example student_financial_aid_type = 'direct' and date_ = 1/04/2018. I use CTEs, with a parameter date at the beginning of the code, so that I get the number of students as of that day; e.g. my parameter date is 20/04/2019.
My financial year runs from April to March, e.g. 1/04/18 - 31/3/19.
My question: I want an output column that says 'Y' or 'N' to indicate whether the student received some form of financial aid in the financial year of the parameter date. In the example above, the date 1/04/2018 is not in the financial year of the parameter date (20/04/19); it is in the previous financial year (1/04/18 - 31/3/19), so I want 'N' in the output column, because the student did not receive any financial aid in the financial year of the parameter date. However, if I change the parameter date to 2/06/18, then the date the student received the financial aid (1/04/18) is in the same financial year as the parameter date, so the output column should show 'Y'. Whatever I do has to be dynamic and respond to the parameter date, as that is the value I, as the user, will be changing as and when needed.
I have tried using date_part and have managed to get the month number of the date on which the student received the payout. From this point on I was thinking of using the month number as an indicator of which financial year it falls in, but I am not sure how to go about this.
WITH
parameter_date as (
select '2019-04-26':: date p_date),
student_cohort as (select * from (
SELECT Distinct
ms.studentid,ss.student_admission_date,ms.graduation_date
FROM master_student_table ms
left join student_semeter ss on ms.student_id=ss.student_id
cross join parameter_date p
where ss.student_admission_date <= p.p_date -- i.e. began studies on or before p_date
AND (ms.graduation_date is null or ms.graduation_date > p.p_date) -- i.e. student graduates after p_date, or graduation_date IS NULL
)x ),
student_finance as (select * from (
select date_part('month', st.date_::date) date_part,
st.date_, st.studentid, st.student_financial_aid_type
from student_table st
left join student_cohort s on st.studentid = s.studentid
where st.student_financial_aid_type in ('direct' , 'indirect')
) x )
select distinct
s.student_id,
s.graduation_date,
s.admissiondate_date,
sf.date_,
-- this is what I would like it to be -- case when sf.date is in the same
--financial year as the parameter_date
--then 'Y' else 'N' end was_financial_aid_received_in_the_fy,
sf.date_part
from
student_cohort s
left join student_finance sf on s.student_id = sf.student_id and
sf.student_financial_aid_type = 'direct'
left join student_finance sf1 on s.student_id = sf1.student_id and
sf1.student_financial_aid_type = 'indirect';
I would love for the output column 'was_financial_aid_received_in_the_fy' from the case statement to have 'Y' if the sf.date_ on which the student received financial aid is in the same financial year as the parameter_date, and 'N' if it isn't.
Thank you very much for all your help
I think this question basically boils down to the following:
Given a parameter date, figure out the financial year for that date.
Figure out if other dates fall in this financial year.
This is a great place to use dateranges, one of my favorite types. We can figure out the financial year from the parameter date and use a daterange to represent it. If the parameter date is before April, the financial year should be from April 1 of the previous year (inclusive) to April 1 of this year (exclusive). If the parameter date is in April or later, the financial year should be April 1 of this year (inclusive) to April 1 of next year (exclusive).
Here's a query that should demonstrate how to do this:
WITH parameter_date as (
select '2019-04-26'::date p_date
), fiscal_year as (
select daterange(
make_date(case when date_part('month', p_date)<4
THEN date_part('year', p_date)::int-1
ELSE date_part('year', p_date)::int END,
4, 1),
make_date(case when date_part('month', p_date)<4
THEN date_part('year', p_date)::int
ELSE date_part('year', p_date)::int+1 END,
4, 1),
'[)') as f_year
FROM parameter_date
),
test_data as (
select test_date::date from (values
('2019-04-01'),
('2018-04-01'),
('2019-03-02'),
('2020-12-01'),
('2017-05-26'),
('2020-02-27'),
('2020-04-01')
) v(test_date)
)
select test_date,
CASE WHEN test_date <@ fiscal_year.f_year THEN 'Y' ELSE 'N' END as in_f_year
from test_data, fiscal_year;
test_date | in_f_year
------------+-----------
2019-04-01 | Y
2018-04-01 | N
2019-03-02 | N
2020-12-01 | N
2017-05-26 | N
2020-02-27 | Y
2020-04-01 | N
(7 rows)
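
The same rule can be written as a short Python sketch, which is handy for checking edge cases such as April 1 itself. The function names and the start_month parameter are assumptions made for illustration; the half-open interval matches the '[)' bounds of the daterange above.

```python
from datetime import date

def fiscal_year_bounds(p_date, start_month=4):
    """Return (start, end) of the fiscal year containing p_date; end is exclusive."""
    start_year = p_date.year if p_date.month >= start_month else p_date.year - 1
    return date(start_year, start_month, 1), date(start_year + 1, start_month, 1)

def in_fiscal_year(d, p_date):
    """'Y' if d falls in the fiscal year of p_date, else 'N'."""
    start, end = fiscal_year_bounds(p_date)
    return "Y" if start <= d < end else "N"

p_date = date(2019, 4, 26)
for d in [date(2019, 4, 1), date(2018, 4, 1), date(2020, 2, 27), date(2020, 4, 1)]:
    print(d, in_fiscal_year(d, p_date))
# 2019-04-01 Y
# 2018-04-01 N
# 2020-02-27 Y
# 2020-04-01 N
```

With the question's own example, an aid date of 1/04/18 gives 'Y' for a parameter date of 2/06/18 and 'N' for 20/04/19.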

Getting Dates by Selecting a week in oracle

I have a textbox holding a number from 1 to 52, which is the week number of a calendar year, and a drop-down that lists years.
For example, if I enter 2 in the textbox with year 2014, then I want the dates to be shown as 05-1-2014 - 11-1-2014. Is it possible to do this?
I have also tried one query, which doesn't match my requirement:
SELECT date_val, TO_CHAR (date_val, 'ww')
FROM (SELECT TO_DATE ('01-jan-2013', 'DD-MON-YYYY') + LEVEL AS date_val
FROM DUAL
CONNECT BY LEVEL <= 365)
Please help.
Try this. Here 2 is the week number within the year (FirstSunday+(NumberOfWeek-1)*7 as WeekStart, FirstSunday+NumberOfWeek*7-1 as WeekEnd) and 2014 is the year:
select
FirstSunday+(2-1)*7 as WeekStart,
FirstSunday+ 2*7-1 as WeekEnd
from
(
Select NEXT_DAY(TO_DATE('01/01/'||'2014','DD/MM/YYYY')-7, 'SUN') as FirstSunday
from dual
)
SQLFiddle demo
Try this too,
SELECT start_date,
start_date + 6 end_day
FROM(
SELECT TRUNC(Trunc(to_date('2014', 'YYYY'),'YYYY')+ 1 * 7,'IW')-1 start_date
FROM duaL
);
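
For comparison, the arithmetic of the first answer (find the Sunday on or before January 1, then step forward in whole weeks) can be sketched in Python. The function name is hypothetical, and the week-numbering convention (weeks start on Sunday, week 1 contains January 1) is taken from that answer rather than from any standard.

```python
from datetime import date, timedelta

def week_range(year, week):
    """Return (start, end) of a Sunday-to-Saturday week, where week 1 contains January 1."""
    jan1 = date(year, 1, 1)
    # Python: Monday=0 ... Sunday=6, so this is the number of days back to Sunday.
    days_since_sunday = (jan1.weekday() + 1) % 7
    first_sunday = jan1 - timedelta(days=days_since_sunday)
    start = first_sunday + timedelta(weeks=week - 1)
    return start, start + timedelta(days=6)

print(week_range(2014, 2))
# (datetime.date(2014, 1, 5), datetime.date(2014, 1, 11))
```

This reproduces the 05-1-2014 - 11-1-2014 range requested in the question.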