update a column with the latest hiredate of an employee - tsql

I have an employee status history table.
I need to create one more column that should copy the min(EffectiveStartDate) on each row till the employee is rehired. I need to get the length of service of the employee where the date will be passed by UI.
How can i achieve in SQL server 2014

This answer has a few assumptions.
Assumptions
The data set is only for one Employee at a time. If it is not,
and there is another column, such as EmployeeID, then you will
want to specify that in a partition by clause inside the over
clause where my comments denote that.
That the EmployeeStatusCatalog values have the below meanings:
A: Active
L: Leave (of Absence)
I: Inactive
That a "Hire" or "Rehire" transaction is considered to happen
either at initial A status, or after an I status has ended.
Sample Data Setup
Did not include the EmployeeStatusId column, as my assumption is that it is not relevant to creating the expected outcome.
declare #employee table
(
EffectiveStartDate date not null
, EffectiveEndDate date not null
, EmployeeStatusCatalog char(1) not null
)
insert into #employee
values ('2008-02-29', '2016-05-31', 'A')
, ('2016-06-01', '2016-06-30', 'A')
, ('2016-07-01', '2016-07-30', 'L')
, ('2016-07-31', '2016-09-02', 'A')
, ('2016-09-03', '2016-10-09', 'I')
, ('2016-10-10', '2016-11-01', 'A')
, ('2016-11-02', '2016-12-02', 'L')
, ('2016-12-03', '2016-12-05', 'I')
, ('2016-12-06', '2016-12-06', 'A')
, ('2016-12-07', '2017-01-01', 'L')
, ('2017-01-02', '9999-12-31', 'A')
Answer
As you may or may not know, this is a classic gaps and islands scenario. Where each segment between Hire/Rehire dates is an island (no gaps in this example).
I used a CTE to move the I status forward one row (via LAG function), and then get the running count of the number of I rows to give each island a "ID" number.
After that, used a min function, while partitioning by the island number, to determine the minimum EffectiveStartDate for each island.
; with inactive_dts as
(
--move the I status forward one row
select e.EffectiveStartDate
, e.EffectiveEndDate
, e.EmployeeStatusCatalog
, lag(e.EmployeeStatusCatalog, 1, 'A') over (/*partion by here*/ order by e.EffectiveStartDate asc) as prev_status
from #employee as e
where 1=1
)
, active_island_nbr as
(
--get the running count of the number of I rows
select a.EffectiveStartDate
, a.EffectiveEndDate
, a.EmployeeStatusCatalog
, a.prev_status
, sum(case a.prev_status when 'I' then 1 else 0 end) over (/*partition by here*/ order by a.EffectiveStartDate asc) as ActiveIslandNbr
from inactive_dts as a
)
select min(a.EffectiveStartDate) over (partition by a.ActiveIslandNbr) as HireRehireDate
, a.EffectiveStartDate
, a.EffectiveEndDate
, a.EmployeeStatusCatalog
from active_island_nbr as a
Results
HireRehireDate EffectiveStartDate EffectiveEndDate EmployeeStatusCatalog
2008-02-29 2008-02-29 2016-05-31 A
2008-02-29 2016-06-01 2016-06-30 A
2008-02-29 2016-07-01 2016-07-30 L
2008-02-29 2016-07-31 2016-09-02 A
2008-02-29 2016-09-03 2016-10-09 I
2016-10-10 2016-10-10 2016-11-01 A
2016-10-10 2016-11-02 2016-12-02 L
2016-10-10 2016-12-03 2016-12-05 I
2016-12-06 2016-12-06 2016-12-06 A
2016-12-06 2016-12-07 2017-01-01 L
2016-12-06 2017-01-02 9999-12-31 A

Related

How to Shorten Execution Time for A View

I have 3 tables, a user table, an admin table, and a cust table. Both admin and cust tables are foreign keyed to the user_account table. Basically, every user has a user record, and the type of user they are is determined by if they have a record in the admin or the cust table.
user admin cust
user_id user_id | admin_id user_id | cust_id
--------- ---------|---------- ---------|---------
1 1 | a 2 | dd
2 4 | b 3 | ff
3
4
Then I have a login_history table that records the user_id and login timestamp every time a user logs into the app
login_history
user_id | login_on
---------|---------------------
1 | 2022-01-01 13:22:43
1 | 2022-01-02 16:16:27
3 | 2022-01-05 21:17:52
2 | 2022-01-11 11:12:26
3 | 2022-01-12 03:34:47
I would like to create a view that would contain all dates for the first day of each week in the year starting from jan 1st, and a count column that contains the count of unique admin users that logged in that week and a count of unique cust users that logged in that week. So the resulting view should contain the following 53 records, one for each week.
login_counts_view
week_start_date | admin_count | cust_count
-----------------|-------------|------------
2022-01-01 | 1 | 1
2022-01-08 | 0 | 2
2022-01-15 | 0 | 0
.
.
.
2022-12-31 | 0 | 0
Note that the first week (2022-01-01) only has 1 count for admin_count even though the admin with user_id 1 logged in twice that week.
Below is the current query I have for the view. However, the tables are pretty large and it takes over 10 seconds to retrieve all records from the view, mainly because of the left joined date comparisons.
CREATE VIEW login_counts_view AS
SELECT
week_start_dates.week_start_date::text AS week_start_date,
count(distinct a.user_id) AS admin_count,
count(distinct c.user_id) AS cust_count
FROM (
SELECT
to_char(i::date, 'YYYY-MM-DD') AS week_start_date
FROM
generate_series(date_trunc('year', NOW()), to_char(NOW(), 'YYYY-12-31')::date, '1 week') i
) week_start_dates
LEFT JOIN login_history l ON l.login_on::date BETWEEN week_start_dates.week_start_date::date AND (week_start_dates.week_start_date::date + INTERVAL '6 day')::date
LEFT JOIN admin a ON a.user_id = l.user_id
LEFT JOIN cust c ON c.user_id = l.user_id
GROUP BY week_start_date;
Does anyone have any tips as to how to make this query execute more efficiently?
Idea
Compute the pseudo-week of each login date: partition the year into 7-day slices and number them consecutively. The pseudo-week of a given date would be the ordinal number of the slice it falls into.
Then operate the joins on integers representing the pseudo-weeks instead of date values and comparisons.
Implementation
A view to implement this follows:
CREATE VIEW login_counts_view_fast AS
WITH RECURSIVE Numbers(i) AS ( SELECT 0 UNION ALL SELECT i + 1 FROM Numbers WHERE i < 52 )
SELECT CAST ( date_trunc('year', NOW()) AS DATE) + 7 * n.i week_start_date
, count(distinct lw.admin_id) admin_count
, count(distinct lw.cust_id) cust_count
FROM (
SELECT i FROM Numbers
) n
LEFT JOIN (
SELECT admin_id
, cust_id
, base
, pit
, pit-base delta
, (pit-base) / (3600 * 24 * 7) week
FROM (
SELECT a.user_id admin_id
, c.user_id cust_id
, CAST ( EXTRACT ( EPOCH FROM l.login_on ) AS INTEGER ) pit
, CAST ( EXTRACT ( EPOCH FROM date_trunc('year', NOW()) ) AS INTEGER ) base
FROM login_history l
LEFT JOIN admin a ON a.user_id = l.user_id
LEFT JOIN cust c ON c.user_id = l.user_id
) le
) lw
ON lw.week = n.i
GROUP BY n.i
;
Some remarks:
The epoch values are the number of seconds elapsed since an absolute base datetime (specifically 1/1/1970 0h00).
CASTS are necessary to convert doubles to integers and timestamps to dates as mandated by the signatures of postgresql date functions and in order to enforce integer arithmetics.
The recursive subquery is a generator of consecutive integers. It could possibly be replaced by a generate_series call (untested)
Evaluation
See it in action in this db fiddle
The query plan indicates savings of 50-70% in execution time.

SQL Passing Table Date for Date

Wondering how I can pass the RECEIVEDDATETIME to the statement below. What am I missing in my SQL statement?
I added an inner join to my CALENDAR table and TESTDATA table to pass the T.RECEIVEDDATETIME as the date from the original example thanks to Mark Barinstein.
This statement gets the C.WORKDATE from my tblCalendar, but I need to have the T.RECEIVEDDATETIME to pass to get my desired "DUEDATE".
I created a "tblCalendar" because I read that it was easier to reference a Calendar for true workdays...to exclude weekends and holidays and to account for leap years. Unsure if this is the best practice, but seemed straight forward to not code for exceptions. So I created the tblCalendar that includes ALL DATEs from 2017 until 2050 and holidays. The data below only represents partially January 2019 as I haven't found a way to attach a table here:
tblCalendar (partial)
DATE NUMDAYOFWK DAYOFWK HOLIDAY
01/01/2019 3 Tuesday YES
01/02/2019 4 Wednesday
01/03/2019 5 Thursday
01/04/2019 6 Friday
01/05/2019 7 Saturday
01/06/2019 1 Sunday
01/07/2019 2 Monday
01/08/2019 3 Tuesday
01/09/2019 4 Wednesday
01/10/2019 5 Thursday
01/11/2019 6 Friday
01/12/2019 7 Saturday
01/13/2019 1 Sunday
01/14/2019 2 Monday
01/15/2019 3 Tuesday
01/16/2019 4 Wednesday
01/17/2019 5 Thursday
01/18/2019 6 Friday
01/19/2019 7 Saturday
01/20/2019 1 Sunday
01/21/2019 2 Monday YES
The tblTestData table holds the core data where I reference all fields needed for my reports.
tblTestData Columns (partial) - DeliveryDays would reference the 2nd parameter BusDayAdd that was noted in the previous SQL.
ID RECEIVEDDATE DeliveryDays Address
T-20190116-255 01/16/2019 2 1234 Address
T-20190117-255 01/17/2019 2 3657 Address
T-20190118-222 01/18/2019 2 9999 Address
T-20190119-255 01/19/2019 2
T-20190120-255 01/20/2019
T-20190121-255 01/21/2019
T-20190303-1 03/03/2019
The Desired end results would look like the following taking into account the RECEIVEDDATETIME in my tblTestData and reference the tblCalendar table to exclude weekends and holidays to give the correct due date.
ID RECEIVEDDATE DeliveryDays DueDate Address
T-20190116-255 1/16/2019 2 1/18/2019 1234 Address
T-20190117-255 1/17/2019 2 1/22/2019 3657 Address
T-20190118-222 1/18/2019 2 1/23/2019 9999 Address
T-20190119-255 1/19/2019 2 1/23/2019 10000 Address
T-20190120-255 1/20/2019 2 1/23/2019 10001 Address
T-20190121-255 1/21/2019 2 1/23/2019 10002 Address
T-20190121-256 1/22/2019 2 1/24/2019 10003 Address
T-20190303-1 3/3/2019 3 3/6/2019 10004 Address
T-20190121-257 3/15/2019 7 3/26/2019 10005 Address
I have tried various statements by rewriting the code to wrap the SQL string to pass the table "RECEIVEDDATETIME", but each time the "DUEDATE" comes back {NULL}.
SELECT T.ID, VARCHAR_format(T.RECEIVEDDATETIME, 'MM/DD/YYYY') RECDATE,
(select VARCHAR_FORMAT(WORKDATE,'MM/DD/YYYY') DUEDATE
from
(Select
WORKDATE, T.RECEIVEDDATETIME,
sum(case when C.HOLIDAY='YES' or C.NUMDAYOFWK in (7,1) then 0 else 1 end) over (order by C.WORKDATE) BUSDAYADD
from tblCALENDAR C
--ADDED INNER JOIN TO GET T.RECEIVEDDATETIME TO FEED AUTOMATICALLY FROM TESTDATA TABLE
INNER JOIN TESTDATA T
ON
VARCHAR_FORMAT(C.WORKDATE, 'MM/DD/YYYY') = VARCHAR_FORMAT(T.RECEIVEDDATETIME,'MM/DD/YYYY')
where C.WORKDATE > VARCHAR_FORMAT(T.RECEIVEDDATETIME,'MM/DD/YYYY')) -- 1-st PARAMETER TO CAPTURE RECEIVEDDATETIME
WHERE BUSDAYADD = ? -- 2-nd parameter to add the number of days needed to be added to RECEIVEDDATETIME
order by WORKDATE --3rd Parameter
fetch first 1 row only)
FROM TESTDATA T
WHERE ID = 'T-20190303-1'
When I run the SQL, I get {NULL} for my results for DUEDATE:
ID RECDATE DUEDATE
T-20190303-1 03/03/2019 {NULL}
The results should be:
ID RECDATE DUEDATE
T-20190303-1 03/03/2019 03/05/2019
Any help is appreciated.
You provided inconsistent data in both tables: your calendar ends too early for all records in the tblTestData table except the 1-st one.
Let me provide absolutely the same query, which I already sent earlier to your another question.
with tblCalendar (DATE, HOLIDAY) as (values
(date(to_date('01/16/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/17/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/18/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/19/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/20/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/21/2019', 'MM/DD/YYYY')), 'YES')
, (date(to_date('01/22/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/23/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/24/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/25/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/26/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/27/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/28/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/29/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/30/2019', 'MM/DD/YYYY')), '')
, (date(to_date('01/31/2019', 'MM/DD/YYYY')), '')
, (date(to_date('02/01/2019', 'MM/DD/YYYY')), '')
, (date(to_date('02/02/2019', 'MM/DD/YYYY')), '')
, (date(to_date('02/03/2019', 'MM/DD/YYYY')), '')
, (date(to_date('02/04/2019', 'MM/DD/YYYY')), '')
, (date(to_date('02/05/2019', 'MM/DD/YYYY')), '')
)
, tblTestData (ID, RECEIVEDDATE, DeliveryDays) as (values
('T-20190116-255', date(to_date('01/16/2019', 'MM/DD/YYYY')), 2)
, ('T-20190117-255', date(to_date('01/17/2019', 'MM/DD/YYYY')), 2)
, ('T-20190118-222', date(to_date('01/18/2019', 'MM/DD/YYYY')), 2)
, ('T-20190119-255', date(to_date('01/19/2019', 'MM/DD/YYYY')), 2)
, ('T-20190120-255', date(to_date('01/20/2019', 'MM/DD/YYYY')), 2)
, ('T-20190121-255', date(to_date('01/21/2019', 'MM/DD/YYYY')), 2)
, ('T-20190121-256', date(to_date('01/22/2019', 'MM/DD/YYYY')), 2)
, ('T-20190303-1' , date(to_date('01/23/2019', 'MM/DD/YYYY')), 3)
, ('T-20190121-257', date(to_date('01/24/2019', 'MM/DD/YYYY')), 7)
)
select m.*, t.date as DUEDATE
--, dayofweek(date) as DAYOFWK, dayname(date) as DAY
from tblTestData m
, table
(
select date
from table
(
select
date
, sum(case when HOLIDAY='YES' or dayofweek(date) in (7,1) then 0 else 1 end) over (order by date) as dn_
from tblCalendar t
where t.date > m.RECEIVEDDATE
)
where dn_ = m.DeliveryDays
fetch first 1 row only
) t;
The result is:
ID RECEIVEDDATE DAYS DUEDATE
-------------- ------------ ---- ----------
T-20190116-255 2019-01-16 2 2019-01-18
T-20190117-255 2019-01-17 2 2019-01-22
T-20190118-222 2019-01-18 2 2019-01-23
T-20190119-255 2019-01-19 2 2019-01-23
T-20190120-255 2019-01-20 2 2019-01-23
T-20190121-255 2019-01-21 2 2019-01-23
T-20190121-256 2019-01-22 2 2019-01-24
T-20190303-1 2019-01-23 3 2019-01-28
T-20190121-257 2019-01-24 7 2019-02-04
The SELECT of the DUEDATE is
INNER JOIN TESTDATA T
ON VARCHAR_FORMAT(C.WORKDATE, 'MM/DD/YYYY') = VARCHAR_FORMAT(T.RECEIVEDDATETIME,'MM/DD/YYYY')
where C.WORKDATE > VARCHAR_FORMAT(T.RECEIVEDDATETIME,'MM/DD/YYYY'))
This means you join on the Equality and then restrict to Workdate > Receiveddatetime
that might be the problem...

Calculate past 3 month average for every past 3rd month

I am using SQL Server 2014. I have a table like this
create table revenue (id varchar(2), trasdate date, revenue int);
insert into revenue(id, trasdate, revenue)
values ('aa', '2018/09/01', 1234.5),
('aa' , '2018/08/04', 450),
('aa', '2018/07/03',500),
('aa', '2018/06/04',600),
('ab', '2018/09/01', 1234.5),
('ab' , '2018/08/04', 450),
('ab', '2018/07/03',500),
('ab', '2018/06/04',600),
('ab', '2018/05/03', 200),
('ab', '2018/04/02', 150),
('ab', '2018/03/01', 350),
('ab', '2018/02/05', 700),
('aa', '2018/01/07', 400)
;
I am preparing a SQL query to create a SSRS report. I want to calculate a past 3 month average for current and every past 3rd month with result like below. As we are in month of September right now. The result should show something like this:
**id Period Revenue_3Mon**
aa March-May 233
aa June-Aug 516
ab March-May 233
ab June-Aug 516
Though I can figure out about the Period column. I was mainly focussing on getting the Revenue_3Mon. So I initially tried with the below query after some googling. But this query throws an error as incorrect syntax near 'rows' and if I remove rows from the query then it throws an error as Incorrect syntax near the keyword 'between'. And incorrect syntax near i.
select i.id,i.mon,
avg([i.mon_revenue]) over (partition by i.id, i.mon order by [i.id],
[i.mon] rows between 3 preceding and 1 preceding row) as revenue_3mon --
-- using 3 preceding and 1 preceding row you exclude the current row
from (select a.id, month(a.trasdate) as mon,
sum(a.revenue) as mon_revenue
from revenue a
group by a.id, month(a.trasdate)) i
group by i.id, i.mon
order by i.id,i.mon;
After few efforts, I gave up on this query and came up with new solution which was a bit close to my expectation (after lots of trial and errors).
Declare #count as int;
declare #max as int;
set #count = 4
declare #temp as table (id varchar(2), monthoftrasdate int, revenue int,
[3monavg] int);
SET #MAX = (SELECT distinct MAX(a.ROWNUM) FROM (SELECT id, month(trasdate)
as mon, SUM(revenue) TotalRevenue,
-- sum(revenue) as mon_revenue,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY MONTH(TRASDATE)) AS ROWNUM
FROM revenue
GROUP BY ID, MONTH(TRASDATE)
) A GROUP BY A.ID);
while (#count <= #max )
begin
WITH CTE AS (
SELECT id, month(trasdate) as mon, SUM(revenue) TotalRevenue,
-- sum(revenue) as mon_revenue,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY MONTH(TRASDATE)) AS
ROWNUM
FROM revenue
GROUP BY ID, MONTH(TRASDATE)
)
insert into #temp
SELECT A.ID,A.MON, a.TotalRevenue
,( SELECT avg(b.TotalRevenue) as avgrev
FROM CTE B
WHERE B.ROWNUM BETWEEN A.ROWNUM-3 AND A.ROWNUM-1
AND A.ID = B.ID --AND A.mon = B.mon
--and b.ROWNUM < a.ROWNUM
and (a.mon > 3 and a.ROWNUM > 3)
GROUP BY B.id
) AS REVENUE_3MON
FROM CTE A
set #count = #count + 1
end
select distinct a.* from #temp a
The reason I had to use 'distinct' is because the query was showing duplicate records for every id and every month. So far the result shows like below
id MonthofTrasdate Revenue 3MonAvg
aa 1 400 NULL
aa 2 700 NULL
aa 3 350 NULL
aa 4 150 483
aa 5 200 400
aa 6 600 233
aa 7 500 316
aa 8 450 433
aa 9 1234 516
ab 1 400 NULL
ab 2 700 NULL
ab 3 350 NULL
ab 4 150 483
ab 5 200 400
ab 6 600 233
ab 7 500 316
ab 8 450 433
ab 9 1234 516
This pulls out past 3 month average for every month. But i will just manipulate the rest on SSRS the way i want it.
As currently my table has no data for previous year. This works for me showing the appropriate result for next couple of months for now. But my concern is when I have to show my boss for next year Jan, Feb and March then it should be able to pull also for these months as well like Oct-Dec (Previous year), Nov-Jan and Dec - Feb. I am struggling to figure out the proper way to put this in my query.
Can you please help me out with this query? And also let me know what is wrong with my former query.
Problems with your first attempt:
You enclosed some of the aliases and column names in square brackets like [i.mon_revenue]. There is no need for square brackets, but if you want to use them, you have to break them up at the dot: [i].[mon_revenue].
In your window function expression, there is one row too many (in the end).
Window functions are applied at the very end (after the rest of the respective query), so you also have to include i.mon_revenue in your GROUP BY clause of the outer query.
Knowing that the inner query will produce one row per id and mon, there will never be preceding rows in an id-mon partition. Therefore, you must not partition by both, but only by id.
To simplify the query after resolving the issues: ordering by a partition column generally makes no sense, and since - as already mentioned - the inner query returns unique id-mon combinations, you don't have to group by these in the outer query. Looking at that query, we see that the outer query just directly selects and uses the values from the inner query, which makes a separation in two queries unneccessary. So, in fact, you wanted to perform the following query, which will produce the rolling 3-month average (I added the monthly TotalRevenue as well):
SELECT id, MONTH(trasdate) AS mon, SUM(revenue) AS TotalRevenue,
AVG(SUM(revenue)) OVER (PARTITION BY id ORDER BY MONTH(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS revenue_3mon
FROM revenue
GROUP BY id, MONTH(trasdate)
ORDER BY id, MONTH(trasdate);
Suggestions on your second attempt:
When calculating the #MAX value, you rely on the fact that each id has revenues for the same number of months. Are you sure?
The code inside the WHILE loop does not depend on #count, so it will add the same data into the #temp table multiple times, which is probably the reason why you thought you needed a DISTINCT. Therfore: No need for the variables, no need for a loop and a #temp, no need for DISTINCT.
The conditions A.mon > 3 and A.rownum > 3 are redundant with your current data. In general, I guess, you don't want to explicitly excluse the months from January to March, so A.mon > 3 should be removed. A.rownum > 3 could be removed, too, unless you really don't want to see a 3-month average when there are only 2 preceding months or less.
As the subquery for the average is restricted to only one id, there's no need for a GROUP BY.
Since the ROW_NUMBER function doesn't care about gaps in the months, I suggest to use a different numbering function, for example DATEDIFF(month, MAX(trasdate), GETDATE()) AS mnum. Of course, the comparison in the WHERE clause of the subquery then has to be changed to B.mnum BETWEEN A.mnum+1 AND A.mnum+3.
So, your second attempt can be reduced to this, which will produce the same result as the above, at least with your sample data, where no gaps in the months exist:
WITH CTE AS (
SELECT id, MONTH(trasdate) AS mon, SUM(revenue) AS TotalRevenue,
DATEDIFF(month, MAX(trasdate), GETDATE()) AS mnum
FROM revenue
GROUP BY id, MONTH(trasdate)
)
SELECT id, mon, TotalRevenue
, (SELECT AVG(B.TotalRevenue)
FROM CTE B
WHERE B.mnum BETWEEN A.mnum+1 AND A.mnum+3
AND A.id = B.id
) AS revenue_3mon
FROM CTE A
ORDER BY id, mnum DESC;
Now, guess what, an expression like my mnum using DATEDIFF increases by one every month as we move to the past, regardless of a change of years, so this might be useful for grouping as well, whether you want to (or can?) use Window functions or not:
With OVER()
SELECT id, MONTH(MIN(trasdate)) AS mon, YEAR(MIN(trasdate)) AS yr, SUM(revenue) AS TotalRevenue,
AVG(SUM(revenue)) OVER (PARTITION BY id ORDER BY MIN(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS revenue_3mon
FROM revenue
GROUP BY id, DATEDIFF(month, trasdate, GETDATE())
ORDER BY id, DATEDIFF(month, trasdate, GETDATE()) DESC;
Without OVER()
WITH CTE AS (
SELECT id, MIN(trasdate) AS min_dt, SUM(revenue) AS TotalRevenue,
DATEDIFF(month, trasdate, GETDATE()) AS mnum
FROM revenue
GROUP BY id, DATEDIFF(month, trasdate, GETDATE())
)
SELECT id, MONTH(min_dt) AS mon, YEAR(min_dt) AS yr, TotalRevenue
, (SELECT AVG(B.TotalRevenue)
FROM CTE B
WHERE B.mnum BETWEEN A.mnum+1 AND A.mnum+3
AND A.id = B.id
) AS revenue_3mon
FROM CTE A
ORDER BY id, mnum DESC;
Both queries allow for retrieving the minimum and maximum date for each period (including month and year).
If you instead wanted what you originally posted under The result should show something like this (just grouping by previous 3-months intervals), you just would have to group your original revenue table by id and (DATEDIFF(month, trasdate, GETDATE())-1)/3 (filtering WHERE DATEDIFF(month, trasdate, GETDATE()) > 0). If so, this kind of grouping and aggregation could, of course, be done also by the Report Server.
I think this should do what you want:
select r.*,
avg(r.mon_revenue) over (partition by r.id
order by r.mon_min
rows between 3 preceding and 1 preceding row
) as revenue_3mon
-- using 3 preceding and 1 preceding row you exclude the current row
from (select r.id, month(r.trasdate) as mon,
min(r.trasdate) as mon_min,
sum(r.revenue) as mon_revenue
from revenue r
group by r.id, year(r.trasdate), month(r.trasdate)
) 4
order by r.id, r.mon, r.mon_min;
Notes:
I fixed the code so it recognizes years as well as dates.
The expression [i.mon_revenue] is not a valid column reference (in your case). You have no column with the name "i.mon_revenue" (with the . in the name).
I changed the column alias to r to match the table.
I added a date column for each month to make it easier to express the ordering.
The outer group by is not necessary.
There are several syntax errors in your code. This should give you what you need. The inner query is the important bit but hopefully this will be enough to get you on your way.
I switch our the temp table for variable and changed the revenue column to not be INT as you have decimal values in there but other than that your original sample table is unchanged
DECLARE #revenue table (id varchar(2), trasdate date, revenue float)
insert into #revenue(id, trasdate, revenue)
values ('aa', '2018/09/01', 1234.5),
('aa' , '2018/08/04', 450),
('aa', '2018/07/03',500),
('aa', '2018/06/04',600),
('ab', '2018/09/01', 1234.5),
('ab' , '2018/08/04', 450),
('ab', '2018/07/03',500),
('ab', '2018/06/04',600),
('ab', '2018/05/03', 200),
('ab', '2018/04/02', 150),
('ab', '2018/03/01', 350),
('ab', '2018/02/05', 700),
('aa', '2018/01/07', 400)
SELECT
*
FROM
(
SELECT
*
, MONTH(trasdate) as MonthNumber
, AVG(revenue) OVER (PARTITION BY id
ORDER BY
id
, MONTH(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) as ThreeMonthAvg
FROM #revenue
) a
WHERE MONTH(GETDATE()) - MonthNumber IN (0, 3, 6, 9)
This gives the following results
aa 2018-06-04 600 6 400
aa 2018-09-01 1234.5 9 516.666666666667
ab 2018-03-01 350 3 700
ab 2018-06-04 600 6 233.333333333333
ab 2018-09-01 1234.5 9 516.666666666667

Difference between the max date and the penultimate max for specific employee - postgresql

Bit stuck on a problem. Trying to find the difference between two dates in postgreSQL.
I have a table emp with many employees in it:
emp_id, date
1, 31-10-2017
1, 08-08-2017
1, 02-06-2017
I want it to look like this:
emp_id, max_date, penultimate_date, difference
1, 31-10-2017, 08-08-2017, 84 days
Obviously you can use max(date) and group by the emp_id, however how do you retrieve the penultimate date. I have used a few functions like:
order by date desc limit 1 offset 1
I have also tried to put these in sub queries but that hasn,t worked as there are many employee numbers and I need one row for each employee.
Can anyone help???
Thanks,
pp84
as kindly suggested by #Haleemur Ali, order by date desc limit 1 offset 1 would not work with several emp_id:
t=# with d(emp_id, date)as (values(1, '31-10-2017'::date),(1, '08-08-2017'),(1, '02-06-2017' ),(2,'2016-01-01'),(2,'2016-02-02'),(2,'2016-03-03'))
select distinct emp_id
, max(date) over (partition by emp_id) max_date
, nth_value(date,2) over (partition by emp_id) penultimate_date
, max(date) over (partition by emp_id) - nth_value(date,2) over (partition by emp_id) diff
from d
;
emp_id | max_date | penultimate_date | diff
--------+------------+------------------+------
2 | 2016-03-03 | 2016-02-02 | 30
1 | 2017-10-31 | 2017-08-08 | 84
(2 rows)
Time: 0.756 ms
WITH emps (emp_id, date) AS (
VALUES (1, '2017-10-31'::DATE)
, (1, '2017-08-08'::DATE)
, (1, '2017-08-08'::DATE)
)
SELECT DISTINCT ON (emp_id)
emp_id
, "date" max_date
, LEAD("date") OVER w penultimate_date
, "date" - LEAD("date") OVER w difference
FROM emps
WINDOW w AS (PARTITION BY emp_id)
ORDER BY emp_id, date DESC
When ordered in descending order, the LEAD("date") w will give the value of the date value from the next row.
The DISTINCT ON limits the resultset to 1 row (the first row encountered) per emp_id.
With our ordering this first row must contain the greatest date, and the LEAD(...) over w therefore returns the penultimate date. This gives us the following result:
emp_id | max_date | penultimate_date | difference
--------+------------+------------------+------------
1 | 2017-10-31 | 2017-08-08 | 84
(1 row)

T-SQL - Data Islands and Gaps - How do I summarise transactional data by month?

I'm trying to query some transactional data to establish the CurrentProductionHours value for each Report at the end of each month.
Providing there has been a transaction for each report in each month, that's pretty straight-forward... I can use something along the lines of the code below to partition transactions by month and then pick out the rows where TransactionByMonth = 1 (effectively, the last transaction for each report each month).
SELECT
ReportId,
TransactionId,
CurrentProductionHours,
ROW_NUMBER() OVER (PARTITION BY [ReportId], [CalendarYear], [MonthOfYear]
ORDER BY TransactionTimestamp desc
) AS TransactionByMonth
FROM
tblSource
The problem that I have is that there will not necessarily be a transaction for every report every month... When that's the case, I need to carry forward the last known CurrentProductionHours value to the month which has no transaction as this indicates that there has been no change. Potentially, this value may need to be carried forward multiple times.
Source Data:
ReportId TransactionTimestamp CurrentProductionHours
1 2014-01-05 13:37:00 14.50
1 2014-01-20 09:15:00 15.00
1 2014-01-21 10:20:00 10.00
2 2014-01-22 09:43:00 22.00
1 2014-02-02 08:50:00 12.00
Target Results:
ReportId Month Year ProductionHours
1 1 2014 10.00
2 1 2014 22.00
1 2 2014 12.00
2 2 2014 22.00
I should also mention that I have a date table available, which can be referenced if required.
** UPDATE 05/03/2014 **
I now have query which is genertating results as shown in the example below but I'm left with islands of data (where a transaction existed in that month) and gaps in between... My question is still similar but in some ways a little more generic - What is the best way to fill gaps between data islands if you have the dataset below as a starting point?
ReportId Month Year ProductionHours
1 1 2014 10.00
1 2 2014 12.00
1 3 2014 NULL
2 1 2014 22.00
2 2 2014 NULL
2 3 2014 NULL
Any advice about how to tackle this would be greatly appreciated!
Try this:
;with a as
(
select dateadd(m, datediff(m, 0, min(TransactionTimestamp))+1,0) minTransactionTimestamp,
max(TransactionTimestamp) maxTransactionTimestamp from tblSource
), b as
(
select minTransactionTimestamp TT, maxTransactionTimestamp
from a
union all
select dateadd(m, 1, TT), maxTransactionTimestamp
from b
where tt < maxTransactionTimestamp
), c as
(
select distinct t.ReportId, b.TT from tblSource t
cross apply b
)
select c.ReportId,
month(dateadd(m, -1, c.TT)) Month,
year(dateadd(m, -1, c.TT)) Year,
x.CurrentProductionHours
from c
cross apply
(select top 1 CurrentProductionHours from tblSource
where TransactionTimestamp < c.TT
and ReportId = c.ReportId
order by TransactionTimestamp desc) x
A similar approach but using a cartesian to obtain all the combinations of report ids/months.
in the first step.
A second step adds to that cartesian the maximum timestamp from the source table where the month is less or equal to the month in the current row.
Finally it joins the source table to the temp table by report id/timestamp to obtain the latest source table row for every report id/month.
;
WITH allcombinations -- Cartesian (reportid X yearmonth)
AS ( SELECT reportid ,
yearmonth
FROM ( SELECT DISTINCT
reportid
FROM tblSource
) a
JOIN ( SELECT DISTINCT
DATEPART(yy, transactionTimestamp)
* 100 + DATEPART(MM,
transactionTimestamp) yearmonth
FROM tblSource
) b ON 1 = 1
),
maxdates --add correlated max timestamp where the month is less or equal to the month in current record
AS ( SELECT a.* ,
( SELECT MAX(transactionTimestamp)
FROM tblSource t
WHERE t.reportid = a.reportid
AND DATEPART(yy, t.transactionTimestamp)
* 100 + DATEPART(MM,
t.transactionTimestamp) <= a.yearmonth
) maxtstamp
FROM allcombinations a
)
-- join previous data to the source table by reportid and timestamp
SELECT distinct m.reportid ,
m.yearmonth ,
t.CurrentProductionHours
FROM maxdates m
JOIN tblSource t ON t.transactionTimestamp = m.maxtstamp and t.reportid=m.reportid
ORDER BY m.reportid ,
m.yearmonth