PostgreSQL Data Selection


Is it possible to write PostgreSQL code that looks at the sample data and selects only the persons who have been active for the whole first quarter (01/01/2018 to 03/31/2018), as shown in the desired output? Note that person H should not be selected because they are missing January.
Sample Data
Person  Start Date  End Date
A       1/1/2018    1/31/2018
A       2/1/2018    2/28/2018
A       3/1/2018    3/31/2018
B       1/1/2018    2/28/2018
C       1/1/2018    2/28/2018
C       3/1/2018    3/31/2018
D       2/1/2018    3/31/2018
E       2/1/2018    2/28/2018
F       1/1/2018    3/31/2018
G       1/1/2018    4/30/2018
H       2/1/2018    4/30/2018
Desired Output
Person
A
C
F
G

Assuming your columns are proper DATE columns and there are no overlaps, you could do something like this:
select person
from the_table
group by person
having sum(end_date - start_date + 1) >= date '2018-03-31' - date '2018-01-01' + 1
order by person;
Subtracting one date from another yields the number of days between those two dates; adding 1 makes each interval inclusive of its end date. The sum of these interval lengths per person is then compared to the length of the quarter, which is 90 days for 2018-01-01 through 2018-03-31.
Online example: https://rextester.com/OIN10602
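The same idea can be sketched outside the database. The Python snippet below is only an illustration under stated assumptions: the rows are the sample data from the question, and the function name `active_whole_quarter` is made up for this example. Unlike the query above, it clips each interval to the quarter before counting days, so that days after 03/31/2018 cannot make up for a missing January. It still assumes that a person's intervals do not overlap.

```python
from datetime import date

QUARTER_START = date(2018, 1, 1)
QUARTER_END = date(2018, 3, 31)

# Sample data from the question: (person, start_date, end_date).
rows = [
    ("A", date(2018, 1, 1), date(2018, 1, 31)),
    ("A", date(2018, 2, 1), date(2018, 2, 28)),
    ("A", date(2018, 3, 1), date(2018, 3, 31)),
    ("B", date(2018, 1, 1), date(2018, 2, 28)),
    ("C", date(2018, 1, 1), date(2018, 2, 28)),
    ("C", date(2018, 3, 1), date(2018, 3, 31)),
    ("D", date(2018, 2, 1), date(2018, 3, 31)),
    ("E", date(2018, 2, 1), date(2018, 2, 28)),
    ("F", date(2018, 1, 1), date(2018, 3, 31)),
    ("G", date(2018, 1, 1), date(2018, 4, 30)),
    ("H", date(2018, 2, 1), date(2018, 4, 30)),
]


def active_whole_quarter(rows, q_start=QUARTER_START, q_end=QUARTER_END):
    """Return persons whose (non-overlapping) intervals cover every day of the quarter."""
    needed = (q_end - q_start).days + 1  # inclusive length of the quarter
    covered = {}
    for person, start, end in rows:
        # Clip the interval to the quarter (assumption: this goes beyond the SQL answer).
        lo = max(start, q_start)
        hi = min(end, q_end)
        if lo <= hi:
            covered[person] = covered.get(person, 0) + (hi - lo).days + 1
    return sorted(p for p, days in covered.items() if days >= needed)


result = active_whole_quarter(rows)
print(result)
```

For the sample data this selects A, C, F and G. Without the clipping step, a hypothetical person active from 2/1/2018 to 5/1/2018 would also reach 90 days and be selected even though January is missing.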

Related

Group by date and count of categorical variable date wise

I have a data frame with two categorical variable columns and a date and time stamp.
Date        Time   Va  Vb
01-01-2023  05:55  A   B
01-01-2023  06:25  A
01-01-2023  17:42      B
01-01-2023  19:17  A   B
02-01-2023  05:55  A   B
02-01-2023  06:25  A   B
02-01-2023  17:42  A   B
02-01-2023  19:17  A
I want to group the data by date and count the Va and Vb values for each date.
Expected Result:
            Va  Vb
01-01-2023  3   3
02-01-2023  4   3

If you are using an SQL database (and those empty values are NULL):
Select Date, Count(Va) as Va, Count(Vb) as Vb
from sourceTable
group by date;
DBFiddle demo

Try:
out = df[['Date', 'Va', 'Vb']].groupby('Date').count()
print(out)
Prints:
            Va  Vb
Date
01-01-2023   3   3
02-01-2023   4   3
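Both answers rely on the fact that counting skips missing values (NULL in SQL, NaN in pandas). A minimal standard-library sketch of that behaviour follows. It assumes the empty cells in the question are represented as `None`, and that they fall where the expected counts imply (Vb empty at 06:25 on 01-01-2023 and at 19:17 on 02-01-2023, Va empty at 17:42 on 01-01-2023). The name `count_per_date` is made up for this illustration.

```python
from collections import defaultdict

# Sample rows from the question: (Date, Time, Va, Vb); None marks an empty cell.
rows = [
    ("01-01-2023", "05:55", "A", "B"),
    ("01-01-2023", "06:25", "A", None),
    ("01-01-2023", "17:42", None, "B"),
    ("01-01-2023", "19:17", "A", "B"),
    ("02-01-2023", "05:55", "A", "B"),
    ("02-01-2023", "06:25", "A", "B"),
    ("02-01-2023", "17:42", "A", "B"),
    ("02-01-2023", "19:17", "A", None),
]


def count_per_date(rows):
    """Count non-missing Va and Vb values for each date."""
    counts = defaultdict(lambda: {"Va": 0, "Vb": 0})
    for day, _time, va, vb in rows:
        if va is not None:
            counts[day]["Va"] += 1
        if vb is not None:
            counts[day]["Vb"] += 1
    return dict(counts)


counts = count_per_date(rows)
for day in sorted(counts):
    print(day, counts[day]["Va"], counts[day]["Vb"])
```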

Postgres tsrange, filter by date and time

I have an events table with a field called duration of type tsrange that captures the beginning and end time of an event, each of type timestamp. I want to filter all events across a certain date range and then filter those events by time. For instance, a user should be able to filter for all events happening between 12-15-2019 and 12-17-2019 (inclusive) that are playing at 9PM. To do this, the user submits a date range, which filters all events in that date range:
WHERE lower(duration)::date <@ '[start, finish]'::daterange
In the above, start and finish are user-submitted parameters.
Then I want to narrow those events down to the ones that are playing at a specific time, e.g. 9PM. In other words, only show events that have 9PM between their start and end time.
So if I have the following table:
id duration
--- ------------------------------------
A 2019-12-21 19:00...2019-12-22 01:00
B 2019-12-17 16:00...2019-12-17 18:00
C 2019-12-23 19:00...2019-12-23 21:00
D 2019-12-23 19:00...2019-12-24 01:00
E 2019-12-27 14:00...2019-12-27 16:00
And the user submits a date range of 2019-12-21 to 2019-12-27 then event B will be filtered out. Then the user submits a time of 9:00PM (21:00), in which case A, C, and D will be returned.
EDIT
I was able to get it to work using the following:
WHERE duration @> (lower(duration)::date || ' 21:00:00')::timestamp
Here the 21:00 above is the user data, but it seems a bit hackish.

A tsrange contains a timestamp at 9 p.m. if and only if 9 p.m. on the starting day or 9 p.m. on the following day is part of the range.
You can use that to write your condition.
An example:
lower(r)::date + TIME '21:00' <@ r OR
(lower(r)::date + 1) + TIME '21:00' <@ r
is a test if r contains some timestamp at 9 p.m.
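The two-candidate test can be illustrated in Python. This is a sketch under assumptions, not a translation of PostgreSQL's range semantics: the function name `contains_time_of_day` is invented here, the events are taken from the question's table, and the upper bound is treated as inclusive so that event C (which ends exactly at 21:00) matches, as the question expects. A default tsrange is half-open, which `upper_inclusive=False` models.

```python
from datetime import datetime, time, timedelta


def contains_time_of_day(lower, upper, t, upper_inclusive=True):
    """Return True if the range lower..upper contains some timestamp at time-of-day t.

    Only two candidates need checking: t on the starting day and t on the next day.
    """
    for offset in (0, 1):
        candidate = datetime.combine(lower.date() + timedelta(days=offset), t)
        if upper_inclusive:
            inside = lower <= candidate <= upper
        else:
            inside = lower <= candidate < upper
        if inside:
            return True
    return False


# Events from the question's table: id -> (start, end).
events = {
    "A": (datetime(2019, 12, 21, 19, 0), datetime(2019, 12, 22, 1, 0)),
    "B": (datetime(2019, 12, 17, 16, 0), datetime(2019, 12, 17, 18, 0)),
    "C": (datetime(2019, 12, 23, 19, 0), datetime(2019, 12, 23, 21, 0)),
    "D": (datetime(2019, 12, 23, 19, 0), datetime(2019, 12, 24, 1, 0)),
    "E": (datetime(2019, 12, 27, 14, 0), datetime(2019, 12, 27, 16, 0)),
}

nine_pm = time(21, 0)
playing = [eid for eid, (lo, hi) in events.items()
           if contains_time_of_day(lo, hi, nine_pm)]
print(playing)
```

With an inclusive upper bound this yields A, C and D. With `upper_inclusive=False`, event C drops out because its range ends exactly at 21:00.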

The user input of 2019-12-21 to 2019-12-27 at 21:00 means that the user is interested in the following timestamps:
select generate_series(timestamp '2019-12-21 21:00', '2019-12-27 21:00', '1 day') as t
t
---------------------
2019-12-21 21:00:00
2019-12-22 21:00:00
2019-12-23 21:00:00
2019-12-24 21:00:00
2019-12-25 21:00:00
2019-12-26 21:00:00
2019-12-27 21:00:00
(7 rows)
Hence you should check whether the duration column contains one of these timestamps:
select distinct e.*
from events e
cross join generate_series(timestamp '2019-12-21 21:00', '2019-12-27 21:00', '1 day') as t
where duration @> t
id | duration
----+-----------------------------------------------
A | ["2019-12-21 19:00:00","2019-12-22 01:10:00")
C | ["2019-12-23 19:00:00","2019-12-23 21:10:00")
D | ["2019-12-23 19:00:00","2019-12-24 01:10:00")
(3 rows)
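A rough Python equivalent of the series-based check is sketched below. The helper `daily_series` is a hypothetical stand-in for `generate_series`. The bounds for A, C and D are copied from the output above and treated as half-open, while B and E are taken from the question's table, which is an assumption since the answer does not show them.

```python
from datetime import datetime, timedelta


def daily_series(start, stop, step=timedelta(days=1)):
    """Yield timestamps from start to stop (inclusive), one step apart."""
    current = start
    while current <= stop:
        yield current
        current += step


# id -> (lower, upper); ranges are treated as half-open: lower <= t < upper.
events = {
    "A": (datetime(2019, 12, 21, 19, 0), datetime(2019, 12, 22, 1, 10)),
    "B": (datetime(2019, 12, 17, 16, 0), datetime(2019, 12, 17, 18, 0)),
    "C": (datetime(2019, 12, 23, 19, 0), datetime(2019, 12, 23, 21, 10)),
    "D": (datetime(2019, 12, 23, 19, 0), datetime(2019, 12, 24, 1, 10)),
    "E": (datetime(2019, 12, 27, 14, 0), datetime(2019, 12, 27, 16, 0)),
}

# One timestamp per day at 21:00 for the submitted date range.
series = list(daily_series(datetime(2019, 12, 21, 21, 0),
                           datetime(2019, 12, 27, 21, 0)))

# An event matches if its range contains at least one timestamp of the series.
matches = sorted(eid for eid, (lo, hi) in events.items()
                 if any(lo <= t < hi for t in series))
print(matches)
```

Because the series already encodes the date range, event B is excluded without a separate date filter.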

T-SQL Create Indicator Based on Ranges

I have a table that is set up with three columns:
EventName | StartDate | EndDate
FunRun 1/1/2018 1/10/2018
DumbRun 2/1/2018 2/5/2018
I have a separate dates table that has every date in the year with approximately 100 different attributes.
CalendarDate | DayOfWeek | WeekendInd | etc...
1/1/2018 Sunday 1
1/2/2018 Monday 0
1/15/2018 Wednesday 0
I want to join the two tables to create an indicator that shows whether the calendar date falls between the dates of any row in the events table.
CalendarDate | DayOfWeek | WeekendInd | EventInd
1/1/2018 Sunday 1 1
1/2/2018 Monday 0 1
1/15/2018 Wednesday 0 0
I cannot seem to use a recursive CTE in a subquery. This table is already joined to 5 other subqueries. Any suggestions?

As I understand the question, you don't need recursion; just join the two tables:
select
CalendarDate, DayOfWeek, WeekendInd, EventInd = isnull(EventInd, 0)
from
CalendarTable a
outer apply (
select
distinct EventInd = 1
from
EventsTable b
where
a.CalendarDate between b.StartDate and b.EndDate
) q
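The logic of the query (set the indicator to 1 if at least one event covers the calendar date, otherwise 0) can be sketched in Python. The data comes from the question; the function name `event_indicator` is invented for this illustration.

```python
from datetime import date

# Events from the question: (EventName, StartDate, EndDate).
events = [
    ("FunRun", date(2018, 1, 1), date(2018, 1, 10)),
    ("DumbRun", date(2018, 2, 1), date(2018, 2, 5)),
]

# A few calendar dates from the question's dates table.
calendar_dates = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 15)]


def event_indicator(day, events):
    """Return 1 if any event's inclusive date range covers day, else 0."""
    return int(any(start <= day <= end for _name, start, end in events))


indicators = {d.isoformat(): event_indicator(d, events) for d in calendar_dates}
print(indicators)
```

Using `any` mirrors the `distinct` in the subquery: a date covered by several events still yields a single indicator value of 1.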

Getting attendance of an employee with a date series in a particular range in Postgres

I have an attendance table with employee_id, date and punch-in time.
Emp_Id PunchTime
101 10/10/2016 07:15
101 10/10/2016 12:20
101 10/10/2016 12:50
101 10/10/2016 16:31
102 10/10/2016 07:15
The table only has rows for working days. I want to get the attendance list of an employee for every date in a given period, including the day of the week. The result should look as follows:
date | day |employee_id | Intime | outtime |
2016-10-09 | sunday | 101 | | |
2016-10-10 | monday | 101 | 2016-10-10 7:15AM |2016-10-10 4:31 PM |

You can generate a list of dates and then do an outer join on them:
The following displays all days in October:
select d.date, a.emp_id,
min(punchtime) as intime,
max(punchtime) as outtime
from generate_series(date '2016-10-01', date '2016-11-01' - 1, interval '1' day) as d (date)
left join attendance a on d.date = a.punchtime::date
group by d.date, a.emp_id
order by d.date, a.emp_id;
As you want the first and last timestamp from each day this can be done using a simple group by query.
This will however not repeat the emp_id for the non_existing days.
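One way to get the employee repeated on days without punches is to pair every date with every employee before looking up the punches. The Python sketch below illustrates that idea with the sample rows from the question. The function name `attendance` and the `DAY_NAMES` list are assumptions made for this example, and this is a variation on the query above, not a translation of it.

```python
from datetime import date, datetime, timedelta

# Lowercase day names indexed by date.weekday() (Monday is 0).
DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]

# Sample rows from the question: (Emp_Id, PunchTime).
punches = [
    (101, datetime(2016, 10, 10, 7, 15)),
    (101, datetime(2016, 10, 10, 12, 20)),
    (101, datetime(2016, 10, 10, 12, 50)),
    (101, datetime(2016, 10, 10, 16, 31)),
    (102, datetime(2016, 10, 10, 7, 15)),
]


def attendance(punches, first_day, last_day):
    """Return one row per (date, employee), with None for days without punches."""
    by_key = {}
    for emp_id, ts in punches:
        by_key.setdefault((ts.date(), emp_id), []).append(ts)
    employees = sorted({emp_id for emp_id, _ts in punches})

    rows = []
    day = first_day
    while day <= last_day:
        for emp_id in employees:
            stamps = by_key.get((day, emp_id))
            intime = min(stamps) if stamps else None
            outtime = max(stamps) if stamps else None
            rows.append((day, DAY_NAMES[day.weekday()], emp_id, intime, outtime))
        day += timedelta(days=1)
    return rows


report = attendance(punches, date(2016, 10, 9), date(2016, 10, 10))
for row in report:
    print(row)
```

In SQL the same effect could be achieved by cross joining the generated dates with the distinct employees before the left join.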

Something like the following will generate a list of the range of dates (starting and ending with whatever range is found in your punchtime table), with employees and intime, outtime for each. Check the SQL fiddle here:
http://sqlfiddle.com/#!15/d93bd/1
WITH RECURSIVE minmax AS
(
SELECT MIN(CAST(time AS DATE)) AS min, MAX(CAST(time as DATE)) AS max
FROM emp_time
),
dates AS
(
SELECT m.min as datepart
FROM minmax m
RIGHT JOIN emp_time e ON m.min = CAST(e.time as DATE)
UNION ALL
SELECT d.datepart + 1 FROM dates d, minmax mm
WHERE d.datepart + 1 <= mm.max
)
SELECT d.datepart as date, e.emp, MIN(e.time) as intime, MAX(e.time) as outtime FROM dates d
LEFT JOIN emp_time e ON d.datepart = CAST(e.time as DATE)
GROUP BY d.datepart, e.emp
ORDER BY d.datepart;

Trying to set a variable inside a case statement.

I'm trying to update a date dimension table from the accounting years table of our ERP System. If I run the following Query:
SELECT fcname FYName
,min(fdstart) YearStart
,max(fdend) YearEnd
,max(fnnumber) PeriodCount
FROM M2MData01.dbo.glrule GLR
GROUP BY fcname
I get the following data:
FYName YearStart YearEnd PeriodCount
FY 2000 1/1/2000 12:00:00 AM 12/31/2000 12:00:00 AM 12
FY 2001 1/1/2001 12:00:00 AM 12/31/2001 12:00:00 AM 12
FY 2002 1/1/2002 12:00:00 AM 12/31/2002 12:00:00 AM 12
FY 2003 1/1/2003 12:00:00 AM 12/31/2003 12:00:00 AM 12
FY 2004 1/1/2004 12:00:00 AM 12/31/2004 12:00:00 AM 12
FY 2005 1/1/2005 12:00:00 AM 12/31/2005 12:00:00 AM 12
FY 2006 1/1/2006 12:00:00 AM 12/31/2006 12:00:00 AM 12
FY 2007 1/1/2007 12:00:00 AM 12/31/2007 12:00:00 AM 12
FY 2008 1/1/2008 12:00:00 AM 12/31/2008 12:00:00 AM 12
FY 2009 1/1/2009 12:00:00 AM 12/31/2009 12:00:00 AM 12
FY 2010 1/1/2010 12:00:00 AM 12/31/2010 12:00:00 AM 12
In my case my company has 12 periods per year which roughly correspond to months. Basically, I am trying to create an update statement to set Fiscal Quarters which will follow this logic:
1. If PeriodCount is divisible by 4 then the number of periods in a quarter is PeriodCount/4.
2. If PeriodNumber is in the first quarter (in this case periods 1 through 3) then FiscalQuarter =1 and so on for quarters 2 through 4.
The problem is that I cannot be guaranteed that everyone uses 12 periods, some companies I support use a different number such as 10.
I started creating the following select statement:
DECLARE @QuarterSize INT
DECLARE @SemesterSize INT
SELECT TST.Date,
CASE WHEN glr.PeriodCount % 4 = 0 THEN
-- Can Be divided into quarters. Quarter size is PeriodCount/4
set @quartersize = (GLR.PeriodCount/4)
CASE
END
ELSE 0
End
FROM m2mdata01.dbo.AllDates TST
INNER JOIN (
SELECT fcname FYName
,min(fdstart) YearStart
,MAX(fdend) YearEnd
,MAX(fnnumber) PeriodCount
FROM M2MData01.dbo.glrule GLR
GROUP BY fcname ) GLR
ON TST.DATE >= GLR.YearStart AND TST.DATE <= GLR.YearEnd
Can I set the value of a variable inside a case statement like this? What's the best way to accomplish this? Am I forced to use a cursor statement and check each date in my dimension against the range in the table above?

Not sure what you want to do here. You can assign a variable outside the CASE expression in the SELECT clause, such as:
SELECT
@var = CASE
WHEN condition1 THEN some value
WHEN condition2 THEN other value
END
FROM
...
Note that @var will be set to the value evaluated for the last row. Also note that T-SQL does not allow a SELECT that assigns a variable to return other columns in the same statement. As said earlier, I am not sure how you intend to use your @quartersize variable. If the value is needed on every row, then you shouldn't be using a variable at all.

It may not be the most elegant solution, but here is what I ended up with.
I linked a copy of the script details to a grouped by version of the same thing.
SELECT fcname FYName, fdstart PeriodStart, fdend PeriodEnd, fnnumber PeriodNo, GLRAGG.AGGFYName,
GLRAGG.QuarterSize, GLRAGG.PeriodCount, GLRAGG.Quarterific, GLRAGG.SemesterSize, GLRAGG.Semesterific
FROM M2MData01.dbo.glrule GLR
INNER JOIN
(SELECT fcname AGGFYName, min(fdstart) YearStart,
MAX(fdend) YearEnd, MAX(fnnumber) PeriodCount,
(Max(fnnumber) / 4) QuarterSize, CASE WHEN Max(fnnumber) % 4 = 0 THEN 'Yes' ELSE 'No' END AS Quarterific,
(Max(fnnumber) / 2) SemesterSize, CASE WHEN Max(fnnumber) % 2 = 0 THEN 'Yes' ELSE 'No' END AS Semesterific
FROM M2MData01.dbo.glrule
GROUP BY fcname) GLRAGG
ON GLR.FCNAME = GLRAGG.AGGFYNAME
This isn't a big deal because that table only has 12 rows for each year, in this case only 132 total rows.
That produces every fiscal period together with the total number of periods in its fiscal year and whether that total is evenly divisible by 4 and by 2. The update statement then uses the "Quarterific" value to decide whether to assign quarters, so I can get by without using variables.
It may not be the best way, but it works and is performant given the small data set.
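The quarter rule from the question (quarters only exist when PeriodCount is divisible by 4, and each quarter spans PeriodCount/4 periods) can be written as a small function. The Python sketch below is only an illustration: the name `fiscal_quarter` and the choice to return `None` for years that cannot be split into quarters are assumptions, not part of the original script.

```python
def fiscal_quarter(period_no, period_count):
    """Return the fiscal quarter (1-4) for a 1-based period number.

    Returns None when the year cannot be divided into four equal quarters.
    """
    if period_count % 4 != 0:
        return None
    quarter_size = period_count // 4
    # Periods 1..quarter_size map to quarter 1, the next block to quarter 2, and so on.
    return (period_no - 1) // quarter_size + 1


# A 12-period year: periods 1-3 are Q1, 4-6 are Q2, 7-9 are Q3, 10-12 are Q4.
quarters = [fiscal_quarter(p, 12) for p in range(1, 13)]
print(quarters)

# A 10-period year cannot be split into four equal quarters.
print(fiscal_quarter(5, 10))
```

The same expression maps directly onto SQL integer arithmetic, for example `(fnnumber - 1) / QuarterSize + 1` guarded by the "Quarterific" check.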
