Print all dates of current year on SQL Server 2008R2 - tsql

The following code prints all the dates in the current year on SQL Server 2008 R2:
with x (dy, yr)
as (
    select dy, year(dy) yr
    from (
        select getdate() - datepart(dy, getdate()) + 1 dy
        -- the first date of the current year
    ) tmp1
    union all
    select dateadd(dd, 1, dy), yr
    from x
    where year(dateadd(dd, 1, dy)) = yr
)
select x.dy
from x
option (maxrecursion 400)
But there are some points that I cannot understand.
As far as I can see, the first date should have been printed 400 times. Are all the repetitions filtered out?
When I change 400 to anything less than 364, the following error is returned:
[Err] 42000 - [SQL Server]The statement terminated. The maximum recursion 363 has been exhausted before statement completion.
But how does the processor know when the statement is going to complete?

What you are dealing with here is a recursive CTE. You should probably just read more about how it works.
Basically,
It obtains the first row set from the anchor part (the first SELECT, the left part of UNION ALL).
That row set becomes aliased as x in the second SELECT (the right part of UNION ALL), called the recursive part.
The recursive part produces another row set based on x, and that result becomes the new x for the next iteration. That is, the new x is not the combined row set of the initial x and the last result set, but the last result set only.
The previous step is repeated against the new x, and the cycle goes on until either of these is true:
another iteration produces no result set;
the MAXRECURSION limit is reached.
The final result set consists of all the partial result sets obtained from both parts of the recursive CTE.
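To make the mechanics concrete, here is a minimal recursive CTE (an illustration, not from the original post) that counts from 1 to 5:
with n (i) as (
    select 1      -- anchor part: the initial row set, a single row
    union all
    select i + 1  -- recursive part: built only from the previous iteration's rows
    from n
    where i < 5   -- once this filters everything out, recursion stops
)
select i from n;  -- returns 1, 2, 3, 4, 5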
Applying the above to your particular query:
The first SELECT produces one row containing this year's 1st of January (the date), and that becomes the first x table.
For every row of x the second SELECT produces a row containing the corresponding next date if it belongs to the same year. So, the recursive part's first iteration effectively gives us the 2nd of January. According to the above description, the result set becomes a new x.
The following iteration results in the 3rd of January, the next one produces the 4th and so on.
If the MAXRECURSION option value has safely allowed us to arrive at the moment when x contains the 31st of December, the next iteration will reveal that the following day belongs to a different year. That produces an empty row set, which in turn terminates the recursive CTE's execution. This also explains the error you saw: the anchor produces only the 1st of January, so reaching the 31st of December of a 365-day year takes 364 recursive iterations, and any MAXRECURSION value below 364 is exhausted before the year is complete (a leap year needs 365).

This is not an answer; it is just another way of writing your SQL. Andriy M has given you a thorough answer, and he deserves the credit for it.
;with x (dy)
as (
    select dateadd(year, datediff(year, 0, getdate()), 0) dy
    -- midnight on the 1st of January of the current year
    union all
    select dy + 1
    from x
    where year(dy) = year(dy + 1)
)
select x.dy
from x
option (maxrecursion 400)

Postgres Function: how to return the first full set of data that occurs after specified date/time

I have a requirement to extract rows of data, but only if all said rows make a full set. We have a sequence table that is updated every minute, with data for 80 bins. We need to know the status of bins 1 thru 80 every minute as part of our production process.
I am generating a new report (a Postgres function) that needs to take a snapshot at roughly 00:01:00 AM (i.e. 1 minute past midnight). Initially I thought this to be an easy task: just grab the first 80 rows of data that occur at/after this time. However, depending on network activity and industrial computer priorities, the table is not religiously updated at exactly 00:01:00 AM, or at any exact minute for that matter. Updates can occur milliseconds or even seconds later, and take 500 ms to 800 ms to write to the database. Sometimes a given minute can be missing altogether (production processes take precedence over data capture, but the sequence data is not super critical anyway).
My thinking is it would be more reliable to look for the first complete set of data anytime from 00:01:00AM onwards. So effectively, I have a table that looks a bit like this:
[Image of a sample table omitted: one row per bin per minute, with a timestamp and a bin number for each of the 80 bins.]
Basically, the above table is typical, but 1st minute is not guaranteed, and for that matter, I would not be 100% confident that all 80 bins are logged for a given minute. Hence my question: how to return the first complete set of data, where all 80 bins (rows) have been captured for a particular minute?
Thinking about it, I could do some sort of row count in the function, ensuring there are 80 rows for a given minute, but this seems less intuitive. I would like to know for sure that for each row of a given minute, bin 1 is represented, bin 2, bin 3, and so on.
Ultimately a call to this function will supply a min/max date/time and that period of time will be checked for the first available minute with a full set of bins data.
I am reasonably sure this will involve a window function, as all rows have to be assessed prior to data extraction. I've used windows functions a few times now, but still a green newbie compared to others here, so help is appreciated.
My final code, thanks to help from @klin:
StartTime = DATE_TRUNC('minute', tme1);
EndTime = DATE_TRUNC('day', tme1) + '23 hours'::interval;

SELECT "BinSequence".*
FROM "BinSequence"
JOIN (
    SELECT "binMinute" AS binminute, count("binMinute")
    FROM "BinSequence"
    WHERE ("binTime" >= StartTime) AND ("binTime" < EndTime)
    GROUP BY 1
    HAVING COUNT(DISTINCT "binBinNo") = 80 -- verifies that each and every bin is represented in the returned data
) theseTuplesOnly
ON theseTuplesOnly.binminute = "BinSequence"."binMinute"
WHERE ("binTime" >= StartTime) AND ("binTime" < EndTime)
ORDER BY 1
LIMIT 80
Use the aggregate function count(*) grouping data by minutes (date_trunc('minute', datestamp) gives full minutes from datestamp), e.g.:
create table bins(datestamp time, bin int, param text);
insert into bins values
('00:01:10', 1, 'a'),
('00:01:20', 2, 'b'),
('00:01:30', 3, 'c'),
('00:01:40', 4, 'd'),
('00:02:10', 3, 'e'),
('00:03:10', 2, 'f'),
('00:03:10', 3, 'g'),
('00:03:10', 4, 'h');
select date_trunc('minute', datestamp) as minute, count(bin)
from bins
group by 1
order by 1
minute | count
----------+-------
00:01:00 | 4
00:02:00 | 1
00:03:00 | 3
(3 rows)
If you are not sure that all bins are unique in consecutive minutes, use distinct (this will make the query slower):
select date_trunc('minute', datestamp) as minute, count(distinct bin)
...
You cannot select counts in aggregated minutes and all columns of the table in a single simple select. If you want to do that, you should join a derived table, use the IN operator, or use a window function. A join seems to be the simplest:
select b.*, count
from bins b
join (
select date_trunc('minute', datestamp) as minute, count(bin)
from bins
group by 1
having count(bin) = 4
) s
on date_trunc('minute', datestamp) = minute
order by 1;
datestamp | bin | param | count
-----------+-----+-------+-------
00:01:10 | 1 | a | 4
00:01:20 | 2 | b | 4
00:01:30 | 3 | c | 4
00:01:40 | 4 | d | 4
(4 rows)
Note also how the HAVING clause is used to filter results in the above query.
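For completeness, here is a sketch of the window-function variant mentioned above (an illustration against the same sample table; like the count without distinct, it assumes bins are unique within a minute):
select datestamp, bin, param
from (
    select b.*,
           count(*) over (partition by date_trunc('minute', datestamp)) as cnt
    from bins b
) t
where cnt = 4  -- 80 in the real data: keep only complete minutes
order by datestamp;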

Tableau - Calculating average where date is less than value from another data source

I am trying to calculate the average of a column in Tableau. The problem is that I am trying to use a single date value (based on a filter) from another data source, so that the average only includes rows where the exam date is <= the filtered date value from the other source.
Note: Parameters will not work for me here, since new date values are being added constantly to the set.
I have tried many different approaches, but the simplest was trying to use a calculated field that pulls in the filtered exam date from the other data source.
It can successfully pull the filtered date, but the formula does not work as expected. Two versions of the calculation are below:
IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN AVG([Raw Score]) END
IF DATEDIFF('day', DATE(ATTR([Exam Date])), DATE(ATTR([Averages (Tableau Test Scores)].[Updated]))) > 1 THEN AVG([Raw Score]) END
Basically, I am looking for the equivalent of this in SQL Server:
SELECT AVG([Raw Score]) WHERE ExamDate <= (Filtered Exam Date)
Below is a workbook that shows an example of what I am trying to accomplish. Currently it returns all blanks, likely due to the many-to-one comparison I am trying to use in my calculation.
Any feedback is greatly appreciated!
Tableau Test Exam Workbook
I was able to solve this by using Custom SQL to join the tables together and calculate the average based on my conditions, to get the column results I wanted.
Would still be great to have this ability directly in Tableau, but whatever gets the job done.
Edit:
SELECT
[AcademicYear]
,[Discipline]
--Get the number of student takers
,COUNT([Id]) AS [Students (N)]
--Get the average of the Raw Score
,CAST(AVG(RawScore) AS DECIMAL(10,2)) AS [School Mean]
--Get the number of failures based on an "adjusted score" column
,COUNT(CASE WHEN [AdjustedScore] < 70 THEN 1 END) AS [School Failures]
--This is the column used as the cutoff point for including scores
,[Average_Update].[Updated]
FROM [dbo].[Average] [Average]
FULL OUTER JOIN [dbo].[Average_Update] [Average_Update] ON ([Average_Update].[Id] = [Average].UpdateDateId)
--The meat of joining data for accurate calculations
FULL OUTER JOIN (
SELECT DISTINCT S.[Id], S.[LastName], S.[FirstName], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[Subject], P.[Id] AS PeriodId
FROM [StudentScore] S
FULL OUTER JOIN
(
--Get only the 1st attempt
SELECT DISTINCT [NBOMEId], S2.[Subject], MIN([ExamDate]) AS ExamDate
FROM [StudentScore] S2
GROUP BY [NBOMEId],S2.[Subject]
) B
ON S.[NBOMEId] = B.[NBOMEId] AND S.[Subject] = B.[Subject] AND S.[ExamDate] = B.[ExamDate]
--Group in "Exam Periods" based on the list of periods w/ start & end dates in another table.
FULL OUTER JOIN [ExamPeriod] P
ON S.[ExamDate] >= P.PeriodStart AND S.[ExamDate] <= P.PeriodEnd
WHERE S.[Subject] = B.[Subject]
GROUP BY P.[Id], S.[Subject], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[NBOMEId], S.[NBOMELastName], S.[NBOMEFirstName], S.[SecondYrTake]) [StudentScore]
ON
([StudentScore].PeriodId = [Average_Update].ExamPeriodId
AND [StudentScore].Subject = [Average].Subject
AND [StudentScore].[ExamDate] <= [Average_Update].[Updated])
--End meat
--Joins to pull in relevant data for normalized tables
FULL OUTER JOIN [dbo].[Student] [Student] ON ([StudentScore].[NBOMEId] = [Student].[NBOMEId])
INNER JOIN [dbo].[ExamPeriod] [ExamPeriod] ON ([Average_Update].ExamPeriodId = [ExamPeriod].[Id])
INNER JOIN [dbo].[AcademicYear] [AcademicYear] ON ([ExamPeriod].[AcademicYearId] = [AcademicYear].[Id])
--This will pull only the latest update entry for every academic year.
WHERE [Updated] IN (
SELECT DISTINCT MAX([Updated]) AS MaxDate
FROM [Average_Update]
GROUP BY[ExamPeriodId])
GROUP BY [AcademicYear].[AcademicYearText], [Average].[Subject], [Average_Update].[Updated]
ORDER BY [AcademicYear].[AcademicYearText], [Average_Update].[Updated], [Average].[Subject]
I couldn't download your file to test with your data, but try reversing the order of operations so the average is taken last, i.e.
AVG(IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN [Raw Score] END)
As written, I believe you'll be averaging the data before returning it from the IF statement, whereas you want to return the data and then average it.

TSQL Cursor Alternative to Speed up my query

Row Status Time
1 Status1 1383264075
2 Status1 1383264195
3 Status1 1383264315
4 Status2 1383264435
5 Status2 1383264555
6 Status2 1383264675
7 Status2 1383264795
8 Status1 1383264915
9 Status3 1383265035
10 Status3 1383265155
11 Status2 1383265275
12 Status3 1383265395
13 Status1 1383265515
14 Status1 1383265535
15 Status2 1383265615
The [Time] column holds POSIX time
I want to be able to calculate the number of seconds a given [Status] is active for within a given time period, without using cursors. If a cursor is the only way, then that is fine, as I've already done that.
Using the above sample data extract, how do I calculate how long "Status1" has been active for?
That is, subtract Row1.[Time] from Row4.[Time], subtract Row8.[Time] from Row9.[Time], and subtract Row13.[Time] from Row15.[Time].
Thank you in advance.
Assuming that each row represents that the specific Status is active from the specified Time until the next row, one would have to somehow calculate the difference between row N and N+1. One way would be to use a nested query (try it here: SQL Fiddle).
SELECT SUM(Duration) as Duration
FROM (
SELECT f.Status, s.Time-f.Time as Duration
FROM Table1 f
JOIN Table1 s on s.Row = f.Row+1
WHERE f.Status = 'Status1') a
The solution by @erikxiv will work if the Row values have no gaps. If they do have gaps, you could try the following method:
SELECT
TotalDuration = SUM(next.Time - curr.Time)
FROM
dbo.atable AS curr
CROSS APPLY
(
SELECT TOP (1) Time
FROM dbo.atable
WHERE Row > curr.Row
ORDER BY Row ASC
) AS next
WHERE
curr.Status = 'Status1'
;
For every row matching the specified status, the correlated subquery in the CROSS APPLY clause will fetch the next Time value based on the ascending order of Row. The current row's time is then subtracted from the next row's time and all the differences are added up using SUM().
Please note that in both solutions it is implied that the order of Row values follows the order of Time values. In other words, ORDER BY Row is assumed to be equivalent to ORDER BY Time or, if Time can have duplicates, to ORDER BY Time, Row.
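As a side note beyond the original answers: on SQL Server 2012 or later, the LEAD() window function expresses the same next-row lookup directly. A sketch against the same table:
SELECT SUM(NextTime - Time) AS TotalDuration
FROM (
    SELECT Status, Time,
           -- Time of the next row in (Time, Row) order, regardless of its status;
           -- NULL for the last row, which SUM() then ignores
           LEAD(Time) OVER (ORDER BY Time, Row) AS NextTime
    FROM dbo.atable
) AS t
WHERE Status = 'Status1';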

TSQL Syntax, Replace Existing "Wrong Value" with previous "Correct Value"

I have an application that makes an entry every hour in a MS SQL database.
The last entry on the 12th FEB is a zero value and is showing in my weekly report.
What I want to do is take the value from the previous count and enter it into the field in place of the zero value.
Can someone offer some advice on how to do this, because it is beyond my TSQL skills?
SELECT * FROM [dbo].[CountDetails]
WHERE [updateTime] < '2013-02-13'
AND [updateTime] > '2013-02-12'
AND ( DATEPART(hh,[updateTime])= '22' OR DATEPART(hh,[updateTime])= '23' )
Note: The application is supposed to zero the count at midnight, but on the 12th FEB it happened early and I know why.
EDIT: There are 5 IP addresses in total and 6 counters in total because 192.168.168.11 has 2 counters. So 2111 to 2116 is an entire entry for all available counters at 22:58 and 2117 to 2122 is an entire entry for all available counters at 23:58. I need to replace the 23:58 values with the corresponding value from 22:58.
Guessing here, but an update that joins on the ipAddress, counterNumber, and the datetime excluding fractional seconds, separated by an hour (do the SELECT part first for safety):
UPDATE b
SET count = a.count
-- SELECT *
FROM dbo.CountDetails a
JOIN dbo.CountDetails b ON a.ipAddress = b.ipAddress AND a.counterNumber = b.counterNumber
AND CONVERT(VARCHAR(20),b.updateTime,120) = CONVERT(VARCHAR(20),DATEADD(HOUR,1,a.updateTime),120)

Unexpected SQL results: string vs. direct SQL

Working SQL
The following code works as expected, returning two columns of data (a row number and a valid value):
sql_amounts := '
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( '|| id || ', 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken';
FOR r, amount IN EXECUTE sql_amounts LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Non-Working SQL
The following code does not work as expected; the first column is a row number, the second column is NULL.
FOR r, amount IN
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( id, 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken
LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Question
Why does the non-working code return a NULL value for the second column when the query itself returns two valid columns? (This question is mostly academic; if there is a way to express the query without resorting to wrapping it in a text string, that would be great to know.)
Full Code
http://pastebin.com/hgV8f8gL
Software
PostgreSQL 8.4
Thank you.
The two statements aren't strictly equivalent.
Assuming id = 4, the first one gets planned/prepared on each pass, and behaves like:
prepare dyn_stmt as '... x_function( 4, 25 ) ...'; execute dyn_stmt;
The other gets planned/prepared on the first pass only, and behaves more like:
prepare stc_stmt as '... x_function( $1, 25 ) ...'; execute stc_stmt(4);
(The loop will actually make it prepare a cursor for the above, but that's beside the point here.)
A number of factors can make the two yield different results.
Changes to the search path made before calling the procedure will be ignored by the second call; in particular, this matters if it makes x_table point to something different.
Constants of all kinds and calls to immutable functions are "hard-wired" in the second call's plan.
Consider this as an illustration of these side-effects:
deallocate all;
begin;
prepare good as select now();
prepare bad as select current_timestamp;
execute good; -- yields the current timestamp
execute bad; -- yields the current timestamp
commit;
execute good; -- yields the current timestamp
execute bad; -- yields the timestamp at which it was prepared
Why the two aren't returning the same results in your case would depend on the context (you only posted part of your pl/pgsql function, so it's hard to tell), but my guess is you're running into a variation of the above kind of problem.
From Tom Lane:
I think the problem is that you're assuming "amount" will refer to a table column of the query, when actually it's a local variable of the plpgsql function. The second interpretation will take precedence unless you qualify the column reference with the table's name/alias.
Note: PG 9.0 will throw an error by default when there is an ambiguity of this type.
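Based on Tom Lane's explanation, the likely fix for the non-working version (a sketch keeping the names from the question) is simply to qualify the column so that it no longer collides with the loop variable:
FOR r, amount IN
    SELECT
        row_number() OVER (ORDER BY m.taken)::integer,
        avg( m.amount )::double precision  -- qualified: the table column, not the plpgsql variable
    FROM
        x_function( id, 25 ) ca,
        x_table m
    WHERE
        m.category_id = 1 AND
        m.location_id = ca.id AND
        extract( month from m.taken ) = 1 AND
        extract( day from m.taken ) = 1
    GROUP BY m.taken
    ORDER BY m.taken
LOOP
    SELECT array_append( v_row, r::integer ) INTO v_row;
    SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;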