Trying to set a variable inside a case statement. - tsql

I'm trying to update a date dimension table from the accounting years table of our ERP system. If I run the following query:
SELECT fcname FYName
,min(fdstart) YearStart
,max(fdend) YearEnd
,max(fnnumber) PeriodCount
FROM M2MData01.dbo.glrule GLR
GROUP BY fcname
I get the following data:
FYName | YearStart | YearEnd | PeriodCount
FY 2000 | 1/1/2000 12:00:00 AM | 12/31/2000 12:00:00 AM | 12
FY 2001 | 1/1/2001 12:00:00 AM | 12/31/2001 12:00:00 AM | 12
FY 2002 | 1/1/2002 12:00:00 AM | 12/31/2002 12:00:00 AM | 12
FY 2003 | 1/1/2003 12:00:00 AM | 12/31/2003 12:00:00 AM | 12
FY 2004 | 1/1/2004 12:00:00 AM | 12/31/2004 12:00:00 AM | 12
FY 2005 | 1/1/2005 12:00:00 AM | 12/31/2005 12:00:00 AM | 12
FY 2006 | 1/1/2006 12:00:00 AM | 12/31/2006 12:00:00 AM | 12
FY 2007 | 1/1/2007 12:00:00 AM | 12/31/2007 12:00:00 AM | 12
FY 2008 | 1/1/2008 12:00:00 AM | 12/31/2008 12:00:00 AM | 12
FY 2009 | 1/1/2009 12:00:00 AM | 12/31/2009 12:00:00 AM | 12
FY 2010 | 1/1/2010 12:00:00 AM | 12/31/2010 12:00:00 AM | 12
In my case, my company has 12 periods per year, which roughly correspond to months. I am trying to create an UPDATE statement to set fiscal quarters using this logic:
1. If PeriodCount is divisible by 4, then the number of periods in a quarter is PeriodCount/4.
2. If PeriodNumber is in the first quarter (in this case periods 1 through 3), then FiscalQuarter = 1, and so on for quarters 2 through 4.
The problem is that I cannot guarantee that everyone uses 12 periods; some companies I support use a different number, such as 10.
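The two rules above reduce to integer arithmetic. Here is a minimal Python sketch of that arithmetic. It assumes period numbers are 1-based and that a year whose period count is not a multiple of 4 gets quarter 0, mirroring the `ELSE 0` in the attempt that follows. The function name is illustrative only.

```python
def fiscal_quarter(period_no: int, period_count: int) -> int:
    """Return the fiscal quarter (1-4) for a 1-based period number.

    Rule 1: a year can only be split into quarters when its period
    count divides evenly by 4; otherwise return 0, like the ELSE 0
    branch in the question's CASE.
    """
    if period_count % 4 != 0:
        return 0
    quarter_size = period_count // 4  # periods per quarter
    # Rule 2: periods 1..quarter_size -> Q1, the next quarter_size -> Q2, ...
    return (period_no - 1) // quarter_size + 1


print([fiscal_quarter(p, 12) for p in range(1, 13)])  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
print(fiscal_quarter(5, 10))  # 0
```

Because the quarter depends only on values in the same row, the expression `(PeriodNo - 1) / QuarterSize + 1` can be evaluated per row in a set-based statement. In T-SQL, `int / int` is integer division.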
I started creating the following select statement:
DECLARE @QuarterSize INT
DECLARE @SemesterSize INT
SELECT TST.Date,
CASE WHEN glr.PeriodCount % 4 = 0 THEN
-- Can be divided into quarters. Quarter size is PeriodCount/4
set @quartersize = (GLR.PeriodCount/4)
CASE
END
ELSE 0
End
FROM m2mdata01.dbo.AllDates TST
INNER JOIN (
SELECT fcname FYName
,min(fdstart) YearStart
,MAX(fdend) YearEnd
,MAX(fnnumber) PeriodCount
FROM M2MData01.dbo.glrule GLR
GROUP BY fcname ) GLR
ON TST.DATE >= GLR.YearStart AND TST.DATE <= GLR.YearEnd
Can I set the value of a variable inside a case statement like this? What's the best way to accomplish this? Am I forced to use a cursor statement and check each date in my dimension against the range in the table above?

I'm not sure what you want to do here, but you can assign a variable outside the CASE expression in the SELECT clause. SQL Server does not allow a SELECT that assigns variables to also return columns, so every item in the select list must be an assignment. For example:
SELECT
@var = CASE
WHEN condition1 THEN some value
WHEN condition2 THEN other value
END
FROM
...
Note that @var will be set to the value evaluated for the last row processed. As I said earlier, I am not sure how you intend to use your @quartersize variable. If the value is needed on every row, then you shouldn't be using a variable at all.
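To make the difference concrete, here is a Python model (not T-SQL) of the two behaviors, using hypothetical rows. The list order stands in for whatever order the engine happens to process rows in.

```python
# Hypothetical rows: (fiscal year name, period count).
rows = [("FY 2009", 12), ("FY 2010", 12), ("FY 2011", 10)]

# Like SELECT @var = expr FROM ...: one variable, overwritten on every row,
# so only the value from the last row processed survives.
quarter_size_var = None
for _name, period_count in rows:
    quarter_size_var = period_count // 4

# Like a column expression in the select list: one value per row.
per_row = [(name, period_count // 4) for name, period_count in rows]

print(quarter_size_var)  # 2
print(per_row)  # [('FY 2009', 3), ('FY 2010', 3), ('FY 2011', 2)]
```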

It may not be the most elegant solution, but here is what I ended up with.
I joined the detail rows of the table to a grouped-by version of the same table.
SELECT fcname FYName, fdstart PeriodStart, fdend PeriodEnd, fnnumber PeriodNo, GLRAGG.AGGFYName,
GLRAGG.QuarterSize, GLRAGG.PeriodCount, GLRAGG.Quarterific, GLRAGG.SemesterSize, GLRAGG.Semesterific
FROM M2MData01.dbo.glrule GLR
INNER JOIN
(SELECT fcname AGGFYName, min(fdstart) YearStart,
MAX(fdend) YearEnd, MAX(fnnumber) PeriodCount,
(Max(fnnumber) / 4) QuarterSize, CASE WHEN Max(fnnumber) % 4 = 0 THEN 'Yes' ELSE 'No' END AS Quarterific,
(Max(fnnumber) / 2) SemesterSize, CASE WHEN Max(fnnumber) % 2 = 0 THEN 'Yes' ELSE 'No' END AS Semesterific
FROM M2MData01.dbo.glrule
GROUP BY fcname) GLRAGG
ON GLR.FCNAME = GLRAGG.AGGFYNAME
Joining the table to itself isn't a big deal, because it only has 12 rows for each year; in this case, 132 rows in total.
That produces every fiscal period along with the total number of periods in its fiscal year and whether that count is evenly divisible by 4 and by 2. The UPDATE statement then uses the "Quarterific" value to decide whether to assign quarters, so I can get by without using variables.
It may not be the best way, but it works and is performant given the small data set.
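The sketch below shows how the joined aggregate can drive the quarter calculation. It runs a trimmed copy of the query against an in-memory SQLite database so it is self-contained. The sample rows and the `FiscalQuarter` expression are additions for illustration and are not part of the answer. Like T-SQL, SQLite performs integer division when both operands are integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE glrule (fcname TEXT, fdstart TEXT, fdend TEXT, fnnumber INTEGER)")

# Hypothetical sample: one 12-period year and one 10-period year.
rows = [("FY 2010", f"2010-{p:02d}-01", f"2010-{p:02d}-28", p) for p in range(1, 13)]
rows += [("FY 2011", f"2011-{p:02d}-01", f"2011-{p:02d}-28", p) for p in range(1, 11)]
conn.executemany("INSERT INTO glrule VALUES (?, ?, ?, ?)", rows)

query = """
SELECT GLR.fcname, GLR.fnnumber AS PeriodNo,
       -- Only years flagged Quarterific get a quarter; others stay NULL.
       CASE WHEN GLRAGG.Quarterific = 'Yes'
            THEN (GLR.fnnumber - 1) / GLRAGG.QuarterSize + 1
       END AS FiscalQuarter
FROM glrule GLR
INNER JOIN (
    SELECT fcname AS AGGFYName,
           MAX(fnnumber) / 4 AS QuarterSize,
           CASE WHEN MAX(fnnumber) % 4 = 0 THEN 'Yes' ELSE 'No' END AS Quarterific
    FROM glrule
    GROUP BY fcname) GLRAGG
  ON GLR.fcname = GLRAGG.AGGFYName
ORDER BY GLR.fcname, GLR.fnnumber
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every period of FY 2010 gets quarters 1 to 4. FY 2011's ten periods get `None` (SQL NULL), because 10 is not divisible by 4.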

Related

T-SQL Dynamic Date based on Today's Month

My fiscal year begins on April 1 and I need to include 1 full year of historical data plus current fiscal year as of today. In DAX this looks like:
DATESBETWEEN(Calendar_Date
,IF(MONTH(TODAY()) < 4
,DATE(YEAR(TODAY())-2, 4, 1)
,DATE(YEAR(TODAY())-1, 4, 1)
)
,DATE(TODAY())
)
I need to create this same range as a filter in a T-SQL query, preferably in the WHERE clause, but I am totally new to SQL and have been unsuccessful in finding a solution online. Any help from more experienced people would be much appreciated!
If you just want to find these values and use them as a WHERE filter, this is fairly straightforward date arithmetic, the logic for which you already have in your DAX code:
declare @dates table(d date);
insert into @dates values
('20190101')
,('20190601')
,('20200213')
,('20201011')
,('20190101')
,(getdate())
;
-- Shift d back 3 months so that dates from April 1 onward fall in the new fiscal year
select d
,dateadd(month,3,dateadd(year,datediff(year,0,dateadd(month,-3,d))-1,0)) as TraditionalMethod
,case when month(d) < 4
then datetime2fromparts(year(d)-2,4,1,0,0,0,0,0)
else datetime2fromparts(year(d)-1,4,1,0,0,0,0,0)
end as YourDAXTranslated
from @dates;
Which outputs:
d | TraditionalMethod | YourDAXTranslated
2019-01-01 | 2017-04-01 00:00:00.000 | 2017-04-01 00:00:00
2019-06-01 | 2018-04-01 00:00:00.000 | 2018-04-01 00:00:00
2020-02-13 | 2018-04-01 00:00:00.000 | 2018-04-01 00:00:00
2020-10-11 | 2019-04-01 00:00:00.000 | 2019-04-01 00:00:00
2019-01-01 | 2017-04-01 00:00:00.000 | 2017-04-01 00:00:00
2021-07-22 | 2020-04-01 00:00:00.000 | 2020-04-01 00:00:00
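The same window arithmetic can be checked outside the database. Here is a small Python sketch of the DAX `IF` and of the `DATESBETWEEN(start, TODAY())` filter. The function names and the `fy_start_month` parameter are illustrative additions. It is handy for checking boundary months such as April, which the sample dates above do not cover.

```python
from datetime import date


def window_start(today: date, fy_start_month: int = 4) -> date:
    """First day of the previous fiscal year, mirroring the DAX IF()."""
    if today.month < fy_start_month:
        # Still in the fiscal year that began last calendar year,
        # so the previous fiscal year began two calendar years ago.
        return date(today.year - 2, fy_start_month, 1)
    return date(today.year - 1, fy_start_month, 1)


def in_window(d: date, today: date) -> bool:
    """Equivalent of keeping rows where start <= d <= today."""
    return window_start(today) <= d <= today


print(window_start(date(2020, 2, 13)))  # 2018-04-01
print(window_start(date(2019, 4, 15)))  # 2018-04-01
```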
However, I would suggest that you may be better served by creating a Dates Table to which you apply filters and from which you join to your transactional data to return the values you require. In an appropriately configured environment this will make full use of available indexes and should provide very good performance.
A very basic tally table approach to generate such a Dates Table is as follows, which returns all dates and their fiscal year start dates for 2015-01-01 to 2042-05-18:
with t as (select t from(values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) as t(t))
,d as (select dateadd(day,row_number() over (order by (select null))-1,'20150101') as d from t,t t2,t t3,t t4)
select d as DateValue
,case when month(d) < 4
then datetime2fromparts(year(d)-1,4,1,0,0,0,0,0)
else datetime2fromparts(year(d),4,1,0,0,0,0,0)
end as FinancialYearStart
from d
order by DateValue;
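To see why the tally query covers exactly 2015-01-01 to 2042-05-18, here is a Python sketch of the same idea. Four cross-joined copies of a ten-row set give 10^4 row numbers, and each one is added as a day offset to the start date. The fiscal-year rule (April start) is copied from the SQL. The function name is illustrative.

```python
from datetime import date, timedelta


def dates_table(start: date, n_days: int):
    """Yield (DateValue, FinancialYearStart) pairs, like the tally CTE."""
    for offset in range(n_days):  # row_number() - 1 in the CTE
        d = start + timedelta(days=offset)
        fy_year = d.year - 1 if d.month < 4 else d.year
        yield d, date(fy_year, 4, 1)


# 10 x 10 x 10 x 10 cross join = 10,000 rows, as in the SQL version.
table = list(dates_table(date(2015, 1, 1), 10 ** 4))
print(table[0])   # (datetime.date(2015, 1, 1), datetime.date(2014, 4, 1))
print(table[-1])  # (datetime.date(2042, 5, 18), datetime.date(2042, 4, 1))
```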

7 Day Return/Retention Rate

I've been trying to calculate the 7 Day Return Rate (also known as Classic Retention Rate, as described here: https://www.braze.com/blog/calculate-retention-rate/) and then take a 30-day average to reduce noise, in PostgreSQL.
However, I'm sure I'm doing something wrong. First of all, the numbers look way higher than I intuitively feel they should be (generally around 5% for the rest of the sector). Also, I believe the first 7 days should show 0, as theoretically users should take at least 7 days to count as a "return". However, I get around 40-70%, as shown below.
Would someone mind taking a look at the code below and seeing if there are any errors? 7 Day Return Rate is a really common metric for apps, and I haven't found any questions using PostgreSQL that calculate it to this level of sophistication on Stack Exchange (or even the rest of the web), so I feel like a solid response could be very useful to a lot of people.
Sample output:
Wednesday, August 1, 2018 12:00 AM 71.14
Thursday, August 2, 2018 12:00 AM 55.44
Friday, August 3, 2018 12:00 AM 50.09
Saturday, August 4, 2018 12:00 AM 45.81
Sunday, August 5, 2018 12:00 AM 43.27
Monday, August 6, 2018 12:00 AM 40.61
Tuesday, August 7, 2018 12:00 AM 39.38
Wednesday, August 8, 2018 12:00 AM 38.46
Thursday, August 9, 2018 12:00 AM 36.81
Friday, August 10, 2018 12:00 AM 35.94
with
user_first_event as (
select distinct id, min(timestamp)::date as first_event_date
from log
where
timestamp <= current_date
and timestamp >= {{start_date}} and timestamp <= {{end_date}}
group by id),
event as (
select distinct id, timestamp::date as user_event_date
from log
where timestamp <= current_date and timestamp >= {{start_date}}),
gap as (
select
user_first_event.id,
user_first_event.first_event_date,
event.user_event_date,
event.user_event_date - user_first_event.first_event_date as days_since_signup
from user_first_event
join event on user_first_event.id = event.id
where user_first_event.first_event_date <= event.user_event_date),
conversion_rate as (
select
first_event_date,
(sum(case when days_since_signup = 7 then 1 else 0 end) * 100.0 /
count(distinct id)
) as seven_day_retention_rate
from gap
group by first_event_date
)
SELECT first_event_date,
AVG(seven_day_retention_rate)
OVER(ORDER BY first_event_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_avg_retention_rate
FROM conversion_rate
The problem is a bit easier than your query makes it seem. You can actually do it with just one subquery and one outer query, as follows:
select first_event_date
, avg(seven_day_return) as seven_day_return_day_only
, avg( avg(seven_day_return) ) OVER(ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW ) AS thirty_day_rolling_retention
from (
--inner query to get value for user, 1 if they retain and 0 if they do not
select min(timestamp)::date as first_event_date
, case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
from log
group by id ) t
group by t.first_event_date;
Note that this weights each day equally rather than each user equally across days. If you want to weight the average by user across days, you can extend the outer calculation with more aggregates and window functions to compute the weighted value.
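To see what the queries compute, and how the day-weighted and user-weighted averages differ, here is a small Python model run on a hypothetical event log. It follows the queries' definition: a user counts as returned only with an event exactly 7 days after their first event. Like `ROWS BETWEEN 29 PRECEDING AND CURRENT ROW`, the window counts rows, meaning days that had at least one new user, not calendar days.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical event log: (user id, event date).
log = [
    (1, date(2018, 8, 1)), (1, date(2018, 8, 8)),  # returns on day 7
    (2, date(2018, 8, 1)), (2, date(2018, 8, 3)),  # returns, but not on day 7
    (3, date(2018, 8, 2)), (3, date(2018, 8, 9)),  # returns on day 7
]

# Inner query: one flag per user (grouped by id).
events_by_user = defaultdict(set)
for uid, d in log:
    events_by_user[uid].add(d)
per_user = []  # (first_event_date, seven_day_return)
for days in events_by_user.values():
    first = min(days)
    per_user.append((first, 1 if first + timedelta(days=7) in days else 0))

# Outer query: avg(seven_day_return) per first_event_date.
by_day = defaultdict(list)
for first, flag in per_user:
    by_day[first].append(flag)
daily = sorted((d, sum(flags) / len(flags)) for d, flags in by_day.items())

# Window over the current row and up to 29 preceding rows:
# every day counts once, however many users it has.
rolling = []
for i, (d, _) in enumerate(daily):
    window = [rate for _, rate in daily[max(0, i - 29): i + 1]]
    rolling.append((d, sum(window) / len(window)))

# User-weighted alternative: pool the users of the same rows.
weighted = []
for i, (d, _) in enumerate(daily):
    window_days = {wd for wd, _ in daily[max(0, i - 29): i + 1]}
    flags = [flag for first, flag in per_user if first in window_days]
    weighted.append((d, sum(flags) / len(flags)))

for row in zip(rolling, weighted):
    print(row)
```

On 2018-08-02, the day-weighted average is (0.5 + 1.0) / 2 = 0.75. The user-weighted one is 2 of 3 users, about 0.67.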
Reference: http://sqlfiddle.com/#!17/ee17e/1/0
If you don't have access to array_agg (but have access to window functions) you can use:
select first_event_date
, avg(seven_day_return) as day_seven_day_return
, avg( avg(seven_day_return) ) OVER(ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW ) AS thirty_day_rolling_retention
from (
--inner query to get value for user
select min(timestamp)::date as first_event_date
, case when exists(select 1 from log l2 where l2.id = log.id and l2.timestamp::date = min(log.timestamp)::date + 7) then 1 else 0 end as seven_day_return
from log
group by id ) t
group by t.first_event_date;

subtract two rows values using t-sql

I want to subtract the month 7 salary from the month 8 salary using T-SQL. Can anyone help? I'm new to T-SQL.
ID Year Month Salary
1088 2017 8 -29766.250 0.000
1088 2015 7 -58.500 0.000
The simplest approach: use a subquery for each value (i.e. replace a and b in select a - b with subqueries):
select
(select salary from mytable where id = 1088 and year = 2017 and month = 8) -
(select salary from mytable where id = 1088 and year = 2015 and month = 7) as diff;
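Here is the same subquery approach run against an in-memory SQLite database with the two sample rows, so the result can be checked. `mytable` is the answer's placeholder name. The unlabeled trailing `0.000` column from the question is left out. Filtering on `id` as well keeps each subquery to a single row if the table holds more than one employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, year INTEGER, month INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1088, 2017, 8, -29766.250), (1088, 2015, 7, -58.500)],
)

# One scalar subquery per value, subtracted in the outer SELECT.
(diff,) = conn.execute("""
    SELECT (SELECT salary FROM mytable WHERE id = 1088 AND year = 2017 AND month = 8)
         - (SELECT salary FROM mytable WHERE id = 1088 AND year = 2015 AND month = 7) AS diff
""").fetchone()
print(diff)  # -29707.75
```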

Convert day of year (from extract) back to a date

I am trying to group data by the day of the year that it falls on. I have been able to achieve this with the code below. The issue is that I lose the information about which day (i.e. Jan 1st, Jan 2nd, etc.) each grouping represents. I am simply left with a number (e.g. 1, 2, etc.) representing the day of the year. Is there any way to convert this number back into the more descriptive date? Thanks a lot.
CREATE TABLE tmp2 AS
SELECT extract(doy from trd_exctn_dt) as day_of_year
,sum(dollar_vol) AS dollar_vol
FROM tmp
GROUP BY extract(doy from trd_exctn_dt);
Current Output:
day_of_year | dollar_vol
------------|------------
1 10
2 15
3 7
Desired Output: N.b. The exact format of the first column doesn't matter too much. I would be happy with DD/MM, MM/DD or any other clear output.
day_of_year | dollar_vol
------------|------------
Jan 1 | 10
Jan 2 | 15
Jan 3 | 7
Using the to_char function:
SELECT to_char(trd_exctn_dt,'MM/DD') as day_of_year ,sum(dollar_vol) AS dollar_vol
FROM tmp
GROUP BY day_of_year ;
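If you keep the day-of-year number instead, it only maps back to a calendar date once a year is fixed, because a leap year shifts every day after February 28. The Python sketch below makes the reference year an explicit parameter. The function name is illustrative. In PostgreSQL the same idea would be roughly `make_date(2021, 1, 1) + (day_of_year::int - 1)`.

```python
from datetime import date, timedelta


def doy_to_date(day_of_year: int, year: int) -> date:
    """Turn a 1-based day-of-year number back into a calendar date."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


print(doy_to_date(1, 2021).strftime("%b %d"))  # Jan 01
print(doy_to_date(60, 2021))  # 2021-03-01
print(doy_to_date(60, 2020))  # 2020-02-29
```

This is also why grouping by `extract(doy ...)` and grouping by `to_char(..., 'MM/DD')` can differ. Day 60 is March 1 in 2021 but February 29 in 2020.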

Postgresql. Find first entry for each unique ID | Date that matches a set of criteria

Sorry if this has been answered, but I'm finding it difficult to phrase the question without an example. I have a set of IDs for which I keep information (if it's available) for every hour of every day. I'm trying to conduct a daily study where specific hours are more relevant than others. I would like to take the daily data for each ID and filter it by a specified list of times: if there is no data at the first time in the list, check whether the next time in the list has data, and so on, until data is found at one of the times or no data is available for any of the specified times. For example:
ID | Data_Date | Data_Time | Some_data
1 1/4/2015 10:00:00 Z
1 1/4/2015 12:00:00 Z
1 1/5/2015 12:00:00 A
1 1/5/2015 13:00:00 B
2 1/5/2015 13:00:00 C
2 1/5/2015 11:00:00 D
I'd like to take the data from 12:00:00 if available, otherwise use 11:00:00, and if neither exists, use 13:00:00. 10:00:00 is not in the list of times I care about, so it is ignored. The query should return:
ID | Data_Date | Data_Time | Some_data
1 1/4/2015 12:00:00 Z
1 1/5/2015 12:00:00 A
2 1/5/2015 11:00:00 D
The preferred time is not always the earliest one. If this is unclear, I'll do my best to elaborate. Any assistance is appreciated.
I don't know whether this helps, since you may have tried it already, but it is a starting point, and I'm hoping we can discuss ideas and work together to solve this (two heads are better than one):
with raw as (select '12:00:00'::time as starting_hour)
select id, data_date, data_time, some_data
from test, raw
where data_time =
case when data_time = starting_hour
then data_time
else
case when data_time = starting_hour - interval '1 hour'
then data_time
else
case when data_time = starting_hour + interval '1 hour'
then data_time
else null
end
end
end;
This will give you 12:00, 11:00 and 13:00. We just need to stop searching when it finds rows that match one of the hours. I'll keep on trying :)
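The missing "stop searching" step amounts to ranking each row by its position in the preference list and keeping the best-ranked row per (ID, Data_Date). Here is a Python sketch of that idea using the question's sample rows. In PostgreSQL this kind of selection is commonly written with `DISTINCT ON` or `row_number()` ordered by such a rank. The names `preferred`, `rank`, and `best` are illustrative.

```python
from datetime import time

# Sample rows from the question: (ID, Data_Date, Data_Time, Some_data).
rows = [
    (1, "1/4/2015", time(10), "Z"),
    (1, "1/4/2015", time(12), "Z"),
    (1, "1/5/2015", time(12), "A"),
    (1, "1/5/2015", time(13), "B"),
    (2, "1/5/2015", time(13), "C"),
    (2, "1/5/2015", time(11), "D"),
]

# Preference order: first choice, then fallbacks. Unlisted times are ignored.
preferred = [time(12), time(11), time(13)]
rank = {t: i for i, t in enumerate(preferred)}

best = {}  # (ID, Data_Date) -> best-ranked row seen so far
for row in rows:
    id_, data_date, data_time, _ = row
    if data_time not in rank:
        continue  # e.g. 10:00 is not in the list
    key = (id_, data_date)
    if key not in best or rank[data_time] < rank[best[key][2]]:
        best[key] = row

for row in sorted(best.values(), key=lambda r: (r[0], r[1])):
    print(row)
```

This reproduces the expected result: 12:00 Z and 12:00 A for ID 1, and 11:00 D for ID 2.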