counting all occurrences in the last year - tsql

I have a question, although I can't really go into specifics.
Will the following query:
SELECT DISTINCT tableOuter.Property, (SELECT COUNT(ID) FROM table AS tableInner WHERE tableInner.Property = tableOuter.Property)
FROM table AS tableOuter
WHERE tableOuter.DateTime > DATEADD(year, -1, GETDATE())
AND tableOuter.Property IN (
...
)
Select one instance of each property in the IN clause, together with how often a row with that property occurred in the last year?
I just read up on Correlated Subqueries on MSDN, but am not sure if I got it right.

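If I understand you correctly, you want to count all occurrences of each Property in the last year, am I right? (Your correlated subquery has no date condition, so it counts rows from all time, not just the last year.)
Then use GROUP BY, and filter the rows with WHERE before grouping. HAVING can only refer to grouping columns or aggregates, so it cannot filter on DateTime:
SELECT tableOuter.Property, COUNT(*) AS Count
FROM table AS tableOuter
WHERE tableOuter.DateTime > DATEADD(year, -1, GETDATE())
AND tableOuter.Property IN ( .... )
GROUP BY tableOuter.Property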
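A minimal runnable sketch of the same filter-then-group pattern, using Python's built-in sqlite3 module rather than SQL Server. SQLite's date(?, '-1 year') stands in for DATEADD(year, -1, GETDATE()), and the table name events, the sample rows and the fixed "today" are all hypothetical:

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical table "events" standing in for the question's placeholder "table".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ID INTEGER PRIMARY KEY, Property TEXT, DateTime TEXT)")

today = date(2024, 6, 1)  # fixed "now" so the example is reproducible
samples = [
    ("a", today - timedelta(days=10)),
    ("a", today - timedelta(days=100)),
    ("a", today - timedelta(days=400)),  # older than one year: filtered out by WHERE
    ("b", today - timedelta(days=5)),
    ("c", today - timedelta(days=30)),   # not in the IN list: filtered out by WHERE
]
conn.executemany(
    "INSERT INTO events (Property, DateTime) VALUES (?, ?)",
    [(prop, day.isoformat()) for prop, day in samples],
)

# WHERE removes individual rows before grouping; GROUP BY then counts what is left.
query = """
    SELECT Property, COUNT(*) AS Count
    FROM events
    WHERE DateTime > date(?, '-1 year')
      AND Property IN ('a', 'b')
    GROUP BY Property
    ORDER BY Property
"""
result = conn.execute(query, (today.isoformat(),)).fetchall()
print(result)  # [('a', 2), ('b', 1)]
```

The 400-day-old "a" row is dropped before counting, which is exactly what a HAVING clause on DateTime could not express.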

Related

T-SQL Union returns duplicated row (date)

I was trying to return a sum for each date in a specific date range, whether or not data is present for that day.
I found that the best way would be to use a table pre-populated with all dates, select the date range and UNION it with my data.
For some reason I couldn't get a LEFT JOIN to work, but UNION seems to work almost perfectly. The only issue is that it returns duplicates for dates where data is present in my data table.
SELECT NULL AS Visitors
,D.DATE AS Day
FROM Database.support.dates D
WHERE D.DATE BETWEEN @Start_date
AND @End_date
UNION
SELECT count(V.id) AS Visitors
,DATEADD(day, 0, DATEDIFF(day, 0, V.CreateTime)) AS Day
FROM Database.Clients.Clients V
WHERE V.CreateTime BETWEEN @Start_date
AND @End_date
AND V.WADID = @WADID
AND (
V.WAPID = @WAPID
OR @WAPID IS NULL
)
GROUP BY DATEADD(day, 0, DATEDIFF(day, 0, V.CreateTime))
ORDER BY Day DESC
I spent the whole of yesterday researching this and still can't get it working :/
I think left joining your calendar table to your current query was actually the right thing to do. That said, you can remedy your current situation by aggregating on the day, e.g. with MAX. MAX ignores NULL values, so for each day the non-NULL visitor count is returned whenever one is present, and days without data keep NULL:
WITH cte AS (
SELECT NULL AS Visitors, D.DATE AS Day
FROM Database.support.dates D
WHERE D.DATE BETWEEN @Start_date AND @End_date
UNION
SELECT COUNT(V.id), DATEADD(day, 0, DATEDIFF(day, 0, V.CreateTime))
FROM Database.Clients.Clients V
WHERE V.CreateTime BETWEEN @Start_date AND @End_date AND
V.WADID = @WADID AND (V.WAPID = @WAPID OR @WAPID IS NULL)
GROUP BY DATEADD(day, 0, DATEDIFF(day, 0, V.CreateTime))
)
SELECT Day, MAX(Visitors) AS Visitors -- filter off unwanted NULL values
FROM cte
GROUP BY Day
ORDER BY Day DESC;
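Since the answer says the LEFT JOIN was the right idea, here is a sketch of that approach using Python's sqlite3 module (SQLite dialect, so date() replaces the DATEADD/DATEDIFF truncation). The table names dates and clients and the sample rows are hypothetical, and the WADID/WAPID filters are left out. The key step is to aggregate the visits first and then LEFT JOIN the calendar to that result, so each day appears exactly once and COALESCE turns missing days into 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical calendar table and visits table (WADID/WAPID filters omitted).
conn.execute("CREATE TABLE dates (DATE TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, CreateTime TEXT)")
conn.executemany("INSERT INTO dates VALUES (?)",
                 [("2024-01-01",), ("2024-01-02",), ("2024-01-03",)])
conn.executemany("INSERT INTO clients (CreateTime) VALUES (?)",
                 [("2024-01-01 09:15:00",), ("2024-01-01 17:40:00",),
                  ("2024-01-03 12:00:00",)])

# Aggregate the visits per day first, then LEFT JOIN the calendar to that result:
# every calendar day appears exactly once, and days without visits get 0.
query = """
    SELECT d.DATE AS Day, COALESCE(v.Visitors, 0) AS Visitors
    FROM dates AS d
    LEFT JOIN (
        SELECT date(CreateTime) AS VisitDay, COUNT(id) AS Visitors
        FROM clients
        GROUP BY date(CreateTime)
    ) AS v ON v.VisitDay = d.DATE
    WHERE d.DATE BETWEEN ? AND ?
    ORDER BY Day DESC
"""
result = conn.execute(query, ("2024-01-01", "2024-01-03")).fetchall()
print(result)  # [('2024-01-03', 1), ('2024-01-02', 0), ('2024-01-01', 2)]
```

Grouping inside the derived table before the join is what avoids duplicate days: the join never sees more than one row per date from the visits side.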

SQL Time Series Completion Script

Version: SQL Server 2014
Objective: Create a complete time series with existing date range records.
Initial Data Setup:
IF OBJECT_ID('tempdb..#DataSet') IS NOT NULL
DROP TABLE #DataSet;
CREATE TABLE #DataSet (
RowID INT
,StartDt DATETIME
,EndDt DATETIME
,Col1 FLOAT);
INSERT INTO #DataSet (
RowID
,StartDt
,EndDt
,Col1)
VALUES
(1234,'1/1/2016','12/31/2999',100)
,(1234,'7/23/2016','7/27/2016',90)
,(1234,'7/26/2016','7/31/2016',80)
,(1234,'10/1/2016','12/31/2999',75);
Desired Results:
RowID, StartDt, EndDt, Col1
1234, '01/01/2016', '07/22/2016', 100
1234, '07/23/2016', '07/26/2016', 90
1234, '07/26/2016', '07/31/2016', 80
1234, '08/01/2016', '09/30/2016', 100
1234, '10/01/2016', '12/31/2999', 75
Not an easy task, I admit. If anyone has a suggestion on how to tackle this using SQL alone (SQL Server 2014 T-SQL), I would greatly appreciate it. Please keep in mind that we want to avoid cursors at all costs because of their performance on large data sets.
Thanks in advance.
Also, FYI: I was able to achieve half of this by using the LEAD window function to set the end date of each record to the next record's start date minus one day. The other half, filling the gaps back in from previous records, still eludes me.
Updated: corrected the invalid 9/31 date to 9/30.
The following query does essentially what you are asking; you can tweak it to fit your requirements. Note that, when I checked the results of my query, your desired results contained 09/31/2016, which is not a valid date.
WITH
RankedData AS
(
SELECT RowID, StartDt, EndDt, Col1,
DATEADD(day, -1, StartDt) AS PrevEndDt,
RANK() OVER(ORDER BY StartDt, EndDt, RowID) AS rank_no
FROM #DataSet
),
HasGapsData AS
(
SELECT a.RowID, a.StartDt,
CASE WHEN b.PrevEndDt <= a.EndDt THEN b.PrevEndDt ELSE a.EndDt END AS EndDt,
a.Col1, a.rank_no
FROM RankedData a
LEFT JOIN RankedData b ON a.rank_no = b.rank_no - 1
)
SELECT RowID, StartDt, EndDt, Col1
FROM HasGapsData
UNION ALL
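SELECT a.RowID,
DATEADD(day, 1, a.EndDt) AS StartDt,
DATEADD(day, -1, b.StartDt) AS EndDt,
-- fill the gap with the latest-starting original range that covers it,
-- not with the previous row's value (a.Col1 would give 80 here instead of 100)
(SELECT TOP (1) d.Col1
FROM #DataSet d
WHERE d.RowID = a.RowID
AND d.StartDt <= DATEADD(day, 1, a.EndDt)
AND d.EndDt >= DATEADD(day, 1, a.EndDt)
ORDER BY d.StartDt DESC) AS Col1
FROM HasGapsData a
INNER JOIN HasGapsData b ON a.rank_no = b.rank_no - 1
WHERE DATEDIFF(day, a.EndDt, b.StartDt) > 1
ORDER BY StartDt, EndDt;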
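As a cross-check of the expected output (not a replacement for the set-based SQL the question asks for), here is a small Python sketch of the rule the desired results appear to follow: on any day, the covering range with the latest StartDt wins. That rule is an inference from the desired results, not something the question states. The function name complete_series is hypothetical. For the sample #DataSet rows it yields the desired rows, except that the 90 segment ends on 07/25 rather than 07/26, because 07/26 is already covered by the later-starting 80 range:

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

# The question's #DataSet rows for RowID 1234: (StartDt, EndDt, Col1).
ranges = [
    (date(2016, 1, 1), date(2999, 12, 31), 100),
    (date(2016, 7, 23), date(2016, 7, 27), 90),
    (date(2016, 7, 26), date(2016, 7, 31), 80),
    (date(2016, 10, 1), date(2999, 12, 31), 75),
]

def complete_series(ranges):
    """Split overlapping ranges into contiguous, non-overlapping segments.

    Assumed rule: on any day, the covering range with the latest StartDt wins.
    """
    # The winner can only change on a start date or on the day after an end date.
    boundaries = sorted({s for s, _, _ in ranges} | {e + ONE_DAY for _, e, _ in ranges})
    segments = []
    for seg_start, next_boundary in zip(boundaries, boundaries[1:]):
        covering = [r for r in ranges if r[0] <= seg_start <= r[1]]
        if not covering:
            continue  # a true hole: no range covers these days
        value = max(covering, key=lambda r: r[0])[2]
        seg_end = next_boundary - ONE_DAY
        prev = segments[-1] if segments else None
        if prev and prev[2] == value and prev[1] + ONE_DAY == seg_start:
            segments[-1] = (prev[0], seg_end, value)  # same value continues: extend
        else:
            segments.append((seg_start, seg_end, value))
    return segments

for start, end, value in complete_series(ranges):
    print(start, end, value)
```

Because coverage can only change at a boundary, checking the first day of each boundary-to-boundary interval is enough to decide the value for the whole interval.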

multiple extract() with WHERE clause possible?

So far I have come up with the below:
WHERE (extract(month FROM orderdate)) =
(SELECT min(extract(month from orderdate))
FROM orders)
However, that returns zero or more rows, and in my case many, because many orders fall within that same earliest (minimum) month, e.g. 4th February, 9th February, 15th February, ...
I know that a WHERE clause can compare multiple columns, so why doesn't the following work?
WHERE (extract(day FROM orderdate)), (extract(month FROM orderdate)) =
(SELECT min(extract(day from orderdate)), min(extract(month FROM orderdate))
FROM orders)
I simply get: SQL Error: ORA-00920: invalid relational operator
Any help would be great, thank you!
Sample data:
02-Feb-2012
14-Feb-2012
22-Dec-2012
09-Feb-2013
18-Jul-2013
01-Jan-2014
Output:
02-Feb-2012
14-Feb-2012
Desired output:
02-Feb-2012
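I recreated your table and found that you just had the parentheses slightly wrong: the two expressions on the left-hand side must be wrapped together in a single parenthesized list. The following works for me: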
where
(extract(day from OrderDate),extract(month from OrderDate))
=
(select
min(extract(day from OrderDate)),
min(extract(month from OrderDate))
from orders
)
Use something like this:
with cte1 as (
select
extract(month from OrderDate) date_month,
extract(day from OrderDate) date_day,
OrderNo
from tablename
), cte2 as (
select min(date_month) min_date_month, min(date_day) min_date_day
from cte1
)
select cte1.*
from cte1
where (date_month, date_day) = (select min_date_month, min_date_day from cte2)
A common table expression lets you restructure your data and then run your SELECT against the result. The first CTE block (cte1) selects the month and the day for each row of your table. cte2 then selects min(month) and min(day). The final SELECT combines both CTEs to return all rows from cte1 that have the desired month and day.
There is probably a shorter solution, but I like common table expressions because they are almost always easier to understand than the "optimal, shortest" query.
If that is really what you want, as bizarre as it seems, then as a different approach you could forget the extracts and the subquery against the table to get the minimums, and use an analytic approach instead:
select orderdate
from (
select o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
from orders o
)
where rn = 1;
ORDERDATE
---------
01-JAN-14
The row_number() effectively adds a pseudo-column to every row in your original table, based on the month and day in the order date. The rn values are unique, so there will be one row marked as 1, which will be from the earliest day in the earliest month. If you have multiple orders with the same day/month, say 01-Jan-2013 and 01-Jan-2014, then you'll still only get exactly one with rn = 1, but which is picked is indeterminate. You'd need to add further order by conditions to make it deterministic, but I have no idea what you might want.
That is done in the inner query; the outer query then filters so that only the record marked with rn = 1 is returned, so you get exactly one row back from the overall query.
This also avoids the situation where the earliest day number is not in the earliest month number: say you only had 02-Jan-2014 and 01-Feb-2014; comparing the day and month separately would look for 01-Jan-2014, which doesn't exist.
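To see the difference concretely, here is a sketch using Python's sqlite3 module, with strftime('%m%d', ...) standing in for Oracle's to_char(orderdate, 'MMDD'). orderno is added to the ORDER BY as an arbitrary tiebreaker to make the pick deterministic (an assumption; use whatever tiebreak suits you), the helper name earliest_by_month_day is hypothetical, and ROW_NUMBER() needs SQLite 3.25 or later. With only 02-Jan-2014 and 01-Feb-2014, the separate-minimums query returns nothing, while the ROW_NUMBER() query returns 02-Jan-2014:

```python
import sqlite3

def earliest_by_month_day(dates):
    """Return (separate-minimums rows, ROW_NUMBER rows) for a list of ISO dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (orderno INTEGER PRIMARY KEY, orderdate TEXT)")
    conn.executemany("INSERT INTO orders (orderdate) VALUES (?)", [(d,) for d in dates])

    # Pitfall: the minimum month and the minimum day are computed independently.
    separate = conn.execute("""
        SELECT orderdate FROM orders
        WHERE strftime('%m', orderdate) = (SELECT MIN(strftime('%m', orderdate)) FROM orders)
          AND strftime('%d', orderdate) = (SELECT MIN(strftime('%d', orderdate)) FROM orders)
    """).fetchall()

    # Analytic approach: rank rows by month and day together ('MMDD'), keep rank 1.
    ranked = conn.execute("""
        SELECT orderdate FROM (
            SELECT orderdate,
                   ROW_NUMBER() OVER (ORDER BY strftime('%m%d', orderdate), orderno) AS rn
            FROM orders
        ) WHERE rn = 1
    """).fetchall()
    conn.close()
    return separate, ranked

print(earliest_by_month_day(["2014-01-02", "2014-02-01"]))  # ([], [('2014-01-02',)])
```

With the question's own sample data both queries happen to agree on 01-Jan-2014; they only diverge when the smallest day number falls in a later month.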
SQL Fiddle (with Thomas Tschernich's answer thrown in too, giving the same result for this data).
To join the result against your invoice table, you don't need to join to the orders table again - especially not with a cross join, which is skewing your results. You can do the join (at least) two ways:
SELECT
o.orderno,
to_char(o.orderdate, 'DD-MM-YYYY'),
i.invno
FROM
(
SELECT o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
FROM orders o
) o, invoices i
WHERE i.invno = o.invno
AND rn = 1;
Or:
SELECT
o.orderno,
to_char(o.orderdate, 'DD-MM-YYYY'),
i.invno
FROM
(
SELECT orderno, orderdate, invno
FROM
(
SELECT o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
FROM orders o
)
WHERE rn = 1
) o, invoices i
WHERE i.invno = o.invno;
The first looks like it does more work but the execution plans are the same.
SQL Fiddle with your pastebin-supplied query that gets two rows back, and these two that get one.

SQL alias gives "invalid column name" for Group By

I have a problem creating an alias for a new column and using it in the GROUP BY clause:
SELECT TOP 100 Percent
count(id) AS [items_by_day],
(SELECT DATEADD(dd, 0, DATEDIFF(dd, 0, [date]))) AS [date_part]
FROM [MyDB].[dbo].[MyTable]
GROUP BY DAY([date]), MONTH([date]), YEAR([date]), date_part
I get the following error:
Msg 207, Level 16, State 1, Line 5
Invalid column name 'date_part'.
How is it possible to solve the problem?
How about a subquery?
See my demo at sqlfiddle
Select Count(*) as nrOfRecords, sq.[items_by_day], sq.[date_part]
From (
SELECT TOP 100 Percent count(id) AS [items_by_day]
,(Select Dateadd(dd, 0, Datediff(dd, 0, [date]))) AS [date_part]
From [MyTable]
Group By id, date
) as sq
Group by sq.[items_by_day], sq.[date_part]
The part (SELECT DateAdd(... DateDiff(...)) seems to return the plain date. Can you explain what I am missing?
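You cannot use a column alias in a GROUP BY, because aliases are for display only (in SQL Server, GROUP BY is logically evaluated before the SELECT list that defines them). The exception is an alias defined inside a subquery: there it becomes a column name of the derived table, so the outer query can group by it.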
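A sketch of the derived-table approach, run through Python's sqlite3 module. date([date]) stands in for DATEADD(dd, 0, DATEDIFF(dd, 0, [date])), and the sample rows are hypothetical. SQLite is more permissive than SQL Server and would also accept GROUP BY date_part directly, so the derived table here only mirrors the form SQL Server requires:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, [date] TEXT)")
conn.executemany("INSERT INTO MyTable ([date]) VALUES (?)",
                 [("2024-03-01 08:00:00",), ("2024-03-01 19:30:00",),
                  ("2024-03-02 10:00:00",)])

# date_part is defined inside the derived table, so the outer query sees it
# as an ordinary column and can GROUP BY it.
query = """
    SELECT sq.date_part, COUNT(sq.id) AS items_by_day
    FROM (
        SELECT id, date([date]) AS date_part  -- strip the time of day
        FROM MyTable
    ) AS sq
    GROUP BY sq.date_part
    ORDER BY sq.date_part
"""
result = conn.execute(query).fetchall()
print(result)  # [('2024-03-01', 2), ('2024-03-02', 1)]
```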

Best way to get rid of unwanted sql subselects?

I have a table called Registrations with the following fields:
Id
DateStarted (not null)
DateCompleted (nullable)
I have a bar chart which shows the number of registrations started and completed by date.
My query looks like:
;
WITH Initial(DateStarted, StartCount)
as (
select Datestarted, COUNT(*)
FROM Registrations
GROUP BY DateStarted
)
select I.DateStarted, I.StartCount, COUNT(DISTINCT R.RegistrationId) as CompleteCount
from Initial I
inner join Registrations R
ON (I.DateStarted = R.DateCompleted)
GROUP BY I.DateStarted, I.StartCount
which returns a table that looks like:
DateStarted StartCount CompleteCount
2009-08-01 1033 903
2009-08-02 540 498
The query has a bit of a code smell. What is a better way of doing this?
EDIT: So why won't the following work? You could wrap the counts in the last SELECT statement in COALESCE() if you want zeros instead of NULLs. It will also include dates on which registrations were completed (ended, in the example below) even if no registrations started on that date.
I am assuming the following table structure (roughly).
create table temp
(
id int,
start_date datetime,
end_date datetime
)
insert into temp values (1, '8/1/2009', '8/1/2009')
insert into temp values (2, '8/1/2009', '8/2/2009')
insert into temp values (3, '8/1/2009', null)
insert into temp values (4, '8/2/2009', '8/2/2009')
insert into temp values (5, '8/2/2009', '8/3/2009')
insert into temp values (6, '8/2/2009', '8/4/2009')
insert into temp values (7, '8/4/2009', null)
Then you could do the following to get what you want.
with start_helper as
(
select start_date, count(*) as count from temp group by start_date
),
end_helper as
(
select end_date, count(*) as count from temp group by end_date
)
select coalesce(a.start_date, b.end_date) as date, a.count as start_count, b.count as end_count
from start_helper a full outer join end_helper b on a.start_date = b.end_date
where coalesce(a.start_date, b.end_date) is not null
I would think the full outer join is necessary, since a registration that started yesterday can be completed today even though no new registration started today; an inner join would drop that day from your results.
Off-hand, I think this does it:
SELECT
DateStarted
, COUNT(*) as StartCount
, SUM(CASE
WHEN DateCompleted = DateStarted THEN 1
ELSE 0 END
) as CompleteCount
FROM Registrations
GROUP BY DateStarted
OK, apparently I had the requirements wrong before. Given that the completion counts are independent of the start dates, this is how I would do it:
;WITH StartDays AS
(
SELECT DateStarted
, COUNT(*) AS StartCount
FROM Registrations
GROUP BY DateStarted
)
, CompleteDays AS
(
SELECT DateCompleted
, COUNT(*) AS CompleteCount
FROM Registrations
WHERE DateCompleted IS NOT NULL -- skip registrations that are still open
GROUP BY DateCompleted
)
SELECT
COALESCE(DateStarted, DateCompleted) AS [Date] -- a day may appear on only one side
, COALESCE(StartCount, 0) AS StartCount
, COALESCE(CompleteCount, 0) AS CompleteCount
FROM StartDays
FULL OUTER JOIN CompleteDays ON DateStarted = DateCompleted
Which actually is pretty close to what you had.
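For comparison, a sketch that produces the same per-day counts (with 0 instead of NULL) without a FULL OUTER JOIN: it turns every start and every completion into its own row with UNION ALL and then groups once by day. This is a swapped-in technique rather than one of the answers here; it runs on Python's sqlite3 (SQLite only gained FULL OUTER JOIN in 3.39.0), and the sample rows reuse the temp table data from the earlier answer, mapped onto the question's Registrations columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Registrations "
             "(Id INTEGER PRIMARY KEY, DateStarted TEXT NOT NULL, DateCompleted TEXT)")
conn.executemany("INSERT INTO Registrations VALUES (?, ?, ?)", [
    (1, "2009-08-01", "2009-08-01"),
    (2, "2009-08-01", "2009-08-02"),
    (3, "2009-08-01", None),
    (4, "2009-08-02", "2009-08-02"),
    (5, "2009-08-02", "2009-08-03"),
    (6, "2009-08-02", "2009-08-04"),
    (7, "2009-08-04", None),
])

# One row per event (a start or a completion), then a single GROUP BY per day.
query = """
    SELECT Day, SUM(IsStart) AS StartCount, SUM(IsComplete) AS CompleteCount
    FROM (
        SELECT DateStarted AS Day, 1 AS IsStart, 0 AS IsComplete FROM Registrations
        UNION ALL
        SELECT DateCompleted, 0, 1 FROM Registrations WHERE DateCompleted IS NOT NULL
    )
    GROUP BY Day
    ORDER BY Day
"""
result = conn.execute(query).fetchall()
print(result)
```

Because both kinds of event land in the same column before grouping, a day with only completions (2009-08-03 here) still shows up, which is what the full outer join is there to guarantee.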
I don't see a problem. I see a common table expression being used.
You didn't provide DDL for the tables, so I'm not going to try to reproduce this. However, I think you may be able to directly substitute the SELECT for the use of Initial.
I believe the following is identical in function to what you have:
select DS.DateStarted
, count(distinct DS.RegistrationId) as StartCount
, count(distinct DC.RegistrationId) as CompleteCount
from Registrations DS
inner join Registrations DC on DS.DateStarted = DC.DateCompleted
group by Ds.DateStarted
I'm a bit confused by the name of the column DateStarted in the results. It looks like just a date on which some registrations started and some ended, and the counts are the number of registrations started and completed that day.
The inner join is throwing away any date where there is either 0 starts or 0 completes. To get all:
select coalesce(DS.DateStarted, DC.DateCompleted) as "Date"
, count(distinct DS.RegistrationId) as StartCount
, count(distinct DC.RegistrationId) as CompleteCount
from Registrations DS
full outer join Registrations DC on DS.DateStarted = DC.DateCompleted
group by Ds.DateStarted, DC.DateCompleted
If you want to include dates that are neither a DateStarted nor a DateCompleted, with counts of 0 and 0, then you will need a source of dates, and I think it would be clearer to use two correlated subqueries in the SELECT clause instead of joins and COUNT(DISTINCT):
select DateSource."Date"
, (select count(*)
from Registrations
where DateStarted = DateSource."Date") as StartCount
, (select count(*)
from Registrations
where DateCompleted = DateSource."Date") as CompleteCount
from DateSource -- implementation of date source left as exercise
where DateSource."Date" between @LowDate and @HighDate
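One possible date source, since the answer leaves it as an exercise, is a recursive CTE that generates one row per day between the two bounds. This sketch uses Python's sqlite3; in SQL Server you would drop the RECURSIVE keyword and use DATEADD(day, 1, ...) instead of date(..., '+1 day'). The sample rows are hypothetical, and the two ? placeholders play the role of the low and high date parameters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Registrations "
             "(Id INTEGER PRIMARY KEY, DateStarted TEXT NOT NULL, DateCompleted TEXT)")
conn.executemany("INSERT INTO Registrations VALUES (?, ?, ?)", [
    (1, "2009-08-01", "2009-08-01"),
    (2, "2009-08-01", "2009-08-03"),
    (3, "2009-08-03", None),
])

query = """
    WITH RECURSIVE DateSource(Day) AS (
        SELECT date(?)                          -- low date
        UNION ALL
        SELECT date(Day, '+1 day') FROM DateSource
        WHERE Day < date(?)                     -- high date
    )
    SELECT ds.Day,
           (SELECT COUNT(*) FROM Registrations r WHERE r.DateStarted = ds.Day) AS StartCount,
           (SELECT COUNT(*) FROM Registrations r WHERE r.DateCompleted = ds.Day) AS CompleteCount
    FROM DateSource ds
    ORDER BY ds.Day
"""
result = conn.execute(query, ("2009-08-01", "2009-08-04")).fetchall()
print(result)  # [('2009-08-01', 2, 1), ('2009-08-02', 0, 0), ('2009-08-03', 1, 1), ('2009-08-04', 0, 0)]
```

In SQL Server a recursive CTE stops after 100 levels by default, so a range longer than about 100 days needs OPTION (MAXRECURSION 0) or a permanent calendar table instead.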