I have the following sample data:
create table #table
(
Tbl char(2)
, TxDate date
, AutoIdx int
, Debit float
, Credit float
, Order_No varchar(50)
, ExtOrderNum varchar(50)
, Reference varchar(50)
, Reference2 varchar(50)
, Description varchar(100)
, AuditNumber varchar(50)
)
insert into #table
(
Tbl
, TxDate
, AutoIdx
, Debit
, Credit
, Order_No
, ExtOrderNum
, Reference
, Reference2
, Description
, AuditNumber
)
values
('GL','2020-03-18',877224,0,2306.9,'PO07673','blue/Prov','60','GRV8399','Purchase Order','26382.0001')
, ('AP','2020-03-18',265586,0,2306.9,'PO07673','blue/Prov','60','GRV8399','Purchase Order','26382.0001')
, ('AP','2020-03-18',265607,0,2306.9,'PO07673','blue/Prov','60','GRV8455','Purchase Order','26391.0001')
, ('GL','2020-03-18',877518,0,2306.9,'PO07673','blue/Prov','60','GRV8455','Purchase Order','26391.0001')
, ('GL','2020-03-18',877530,2.57,0,'PO07673','60',60,'GRV8481','Accounts Payable','26391.0002')
, ('GL','2020-03-18',877525,0,23008.37,'PO07673','60',60,'GRV8481','Purchase Order','26391.0002')
, ('AP','2020-03-18',265608,0,23008.37,'PO07673','60',60,'GRV8481','Purchase Order','26391.0002')
When I run this query:
select
rank() over (order by TxDate,AuditNumber) rnk
, *
from #table
I get the following Results:
I'm trying to generate a new number for each distinct AuditNumber; however, the result doesn't count 1, 2, 3 but 1, 3, 5.
I've tried PARTITION BY, ROW_NUMBER, etc., but then every row gets the number 1.
What am I missing?
My Expected Results:
Use DENSE_RANK, which always advances the rank counter by 1 for each group of identically-valued records:
SELECT
*, DENSE_RANK() OVER (ORDER BY TxDate, AuditNumber) rnk
FROM #table;
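To see how the ranking functions differ on ties, here is a small self-contained comparison. It is only a sketch: the inline VALUES list is a cut-down stand-in for the AuditNumber column above, and the column aliases are made up for illustration.

```sql
-- Seven rows with the same tie pattern as the sample data (2, 2 and 3 rows per AuditNumber).
SELECT AuditNumber
     , ROW_NUMBER() OVER (ORDER BY AuditNumber) AS row_num
     , RANK()       OVER (ORDER BY AuditNumber) AS rnk
     , DENSE_RANK() OVER (ORDER BY AuditNumber) AS dense_rnk
FROM (VALUES ('26382.0001'), ('26382.0001')
           , ('26391.0001'), ('26391.0001')
           , ('26391.0002'), ('26391.0002'), ('26391.0002')) AS v (AuditNumber);
```

RANK leaves gaps after ties (1, 1, 3, 3, 5, 5, 5), DENSE_RANK does not (1, 1, 2, 2, 3, 3, 3), and ROW_NUMBER ignores ties entirely (1 through 7).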
I need to split date ranges that overlap. I have a primary table (I've called it Employment for this example), and I need to return all Begin-End date ranges for a person from this table. I also have multiple sub tables (represented by Car and Food), and I want to return the value that was active in the sub tables during the times given in the main tables. This will involve splitting the main table date ranges when a sub table item changes.
I don't want to return sub table information for dates not in the main tables.
CREATE TABLE #Employment
( Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE )
CREATE TABLE #Car
( Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE )
CREATE TABLE #Food
( Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE )
INSERT INTO #Employment ( [Person_ID], [Employment], [Begin_Date], [End_Date] )
VALUES ( 123 , 'ACME' , '1986-01-01' , '1990-12-31' )
, ( 123 , 'Office Corp' , '1995-05-15' , '1998-10-03' )
, ( 123 , 'Job 3' , '1998-10-04' , '2999-12-31' )
INSERT INTO #Car ( [Person_ID] , [Car] , [Begin_Date] , [End_Date] )
VALUES ( 123, 'Red Car', '1986-05-01', '1997-06-23' )
, ( 123, 'Blue Car', '1997-07-03', '2999-12-31' )
INSERT INTO #Food ( [Person_ID], [Food], [Begin_Date], [End_Date] )
VALUES ( 123, 'Eggs', '1997-01-01', '1997-03-09' )
, ( 123, 'Donuts', '2001-02-23', '2001-02-25' )
For the above data, the results should be:
Person_ID  Employment   Food    Car       Begin_Date  End_Date
123        ACME                           1986-01-01  1986-04-30
123        ACME                 Red Car   1986-05-01  1990-12-31
123        Office Corp          Red Car   1995-05-15  1996-12-31
123        Office Corp  Eggs    Red Car   1997-01-01  1997-03-09
123        Office Corp          Red Car   1997-03-10  1997-06-23
123        Office Corp                    1997-06-24  1997-07-02
123        Office Corp          Blue Car  1997-07-03  1998-10-03
123        Job 3                Blue Car  1998-10-04  2001-02-22
123        Job 3        Donuts  Blue Car  2001-02-23  2001-02-25
123        Job 3                Blue Car  2001-02-26  2999-12-31
The first row is his time working for ACME, where he didn't have a car or a weird food obsession. In the second row, he purchased a car, and still worked at ACME. In the third row, he changed jobs to Office Corp, but still has the Red Car. Note how we're not returning data during his unemployment gap, even though he had the Red Car. We only want to know what was in the Car and Food tables during the times there are values in the Employment table.
I found a solution for SQL Server 2012 that uses the LEAD/LAG functions to accomplish this, but I'm stuck with 2008 R2.
To change the 2012 solution from that blog to work with 2008, you need to replace the LEAD in the following
with
ValidDates as …
,
ValidDateRanges1 as
(
select EmployeeNo, Date as ValidFrom, lead(Date,1) over (partition by EmployeeNo order by Date) ValidTo
from ValidDates
)
There are a number of ways to do this, but one example is a self join to the same table + 1 row (which is effectively what a lead does). One way to do this is to put a rownumber on the previous table (so it is easy to find the next row) by adding another intermediate CTE (eg ValidDatesWithRowno). Then do a left outer join to that table where EmployeeNo is the same and rowno = rowno + 1, and use that value to replace the lead. If you wanted a lead 2, you would join to rowno + 2, etc. So the 2008 version would look something like
with
ValidDates as …
,
ValidDatesWithRowno as --This is the ValidDates + a RowNo for easy self joining below
(
select EmployeeNo, Date, ROW_NUMBER() OVER (ORDER BY EmployeeNo, Date) as RowNo from ValidDates
)
,
ValidDateRanges1 as
(
select VD.EmployeeNo, VD.Date as ValidFrom, VDLead1.Date as ValidTo
from ValidDatesWithRowno VD
left outer join ValidDatesWithRowno VDLead1 on VDLead1.EmployeeNo = VD.EmployeeNo
and VDLead1.RowNo = VD.RowNo + 1
)
The rest of the solution described looks like it will work like you want on 2008.
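For anyone who wants to try the row-number self join in isolation, here is a runnable sketch. The four sample rows are hypothetical and only stand in for the elided ValidDates CTE, and numbering the rows per employee with PARTITION BY is my own variation; the global numbering shown above works as well, because the join also matches on EmployeeNo.

```sql
WITH ValidDates AS
(   -- Hypothetical data replacing the elided CTE from the blog solution.
    SELECT EmployeeNo, [Date]
    FROM (VALUES (1, CAST('2020-01-01' AS date))
               , (1, CAST('2020-03-01' AS date))
               , (1, CAST('2020-06-01' AS date))
               , (2, CAST('2020-02-01' AS date))) AS v (EmployeeNo, [Date])
),
ValidDatesWithRowno AS
(   -- Number the dates within each employee so the next row is RowNo + 1.
    SELECT EmployeeNo, [Date]
         , ROW_NUMBER() OVER (PARTITION BY EmployeeNo ORDER BY [Date]) AS RowNo
    FROM ValidDates
)
SELECT VD.EmployeeNo
     , VD.[Date]  AS ValidFrom
     , Nxt.[Date] AS ValidTo   -- NULL on each employee's last row, as LEAD would return
FROM ValidDatesWithRowno AS VD
LEFT OUTER JOIN ValidDatesWithRowno AS Nxt
       ON Nxt.EmployeeNo = VD.EmployeeNo
      AND Nxt.RowNo      = VD.RowNo + 1
ORDER BY VD.EmployeeNo, VD.[Date];
```

Everything used here (ROW_NUMBER, the date type, row-constructor VALUES) is available in SQL Server 2008.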
Here is the answer I came up with. It works, but it's not very pretty.
It goes in two waves: first splitting any overlapping Employment/Car dates, then running the same SQL a second time to add the Food dates and split any overlaps again.
CREATE TABLE #Employment
( Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE )
CREATE TABLE #Car
( Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE )
CREATE TABLE #Food
( Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE )
INSERT INTO #Employment ( [Person_ID], [Employment], [Begin_Date], [End_Date] )
VALUES ( 123 , 'ACME' , '1986-01-01' , '1990-12-31' )
, ( 123 , 'Office Corp' , '1995-05-15' , '1998-10-03' )
, ( 123 , 'Job 3' , '1998-10-04' , '2999-12-31' )
INSERT INTO #Car ( [Person_ID] , [Car] , [Begin_Date] , [End_Date] )
VALUES ( 123, 'Red Car', '1986-05-01', '1997-06-23' )
, ( 123, 'Blue Car', '1997-07-03', '2999-12-31' )
INSERT INTO #Food ( [Person_ID], [Food], [Begin_Date], [End_Date] )
VALUES ( 123, 'Eggs', '1997-01-01', '1997-03-09' )
, ( 123, 'Donuts', '2001-02-23', '2001-02-25' )
DECLARE @Person_ID INT = 123;
--A table to hold date ranges that need to be merged together
CREATE TABLE #DatesToMerge
(
ID INT,
Person_ID INT,
Date_Type VARCHAR(10),
Begin_Date DATETIME,
End_Date DATETIME
)
INSERT INTO #DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Car])
, Person_ID
, 'Car'
, Begin_Date
, End_Date
FROM #Car
WHERE Person_ID = @Person_ID
INSERT INTO #DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Employment])
, Person_ID
, 'Employment'
, Begin_Date
, End_Date
FROM #Employment
WHERE Person_ID = @Person_ID;
--A table to hold the merged Employment and Car records
CREATE TABLE #EmploymentAndCar
(
RowNumber INT,
Person_ID INT,
Begin_Date DATETIME,
End_Date DATETIME
)
;
WITH CarCTE AS
(--This CTE grabs just the Car rows so we can compare and split dates from them
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
WHERE Date_Type = 'Car'
),
NewRowsCTE AS
( --This CTE creates just new rows starting after the Car dates for each #Employment date range
SELECT a.ID,
a.Person_ID,
a.Date_Type,
DATEADD(DAY,1,b.End_Date) AS Begin_Date,
a.End_Date
FROM #DatesToMerge a
INNER JOIN CarCTE b
ON a.Begin_Date <= b.Begin_Date
AND a.End_Date > b.Begin_Date
AND a.End_Date > b.End_Date -- This is needed because if both the Car and Employment end on the same date, there is no split row after
),
UnionCTE AS
( -- This CTE merges the new rows with the existing ones
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
UNION ALL
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM NewRowsCTE
),
FixEndDateCTE AS
(
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date) AS FixID,
MIN(d.Begin_Date) AS Begin_Date
FROM UnionCTE c
LEFT OUTER JOIN CarCTE d
ON c.Begin_Date < d.Begin_Date
AND c.End_Date >= d.Begin_Date
WHERE c.Date_Type <> 'Car'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date)
),
Finalize AS
(
SELECT ROW_NUMBER() OVER (ORDER BY e.Begin_Date) AS RowNumber,
e.Person_ID,
e.Begin_Date,
CASE WHEN f.Begin_Date IS NULL THEN e.End_Date
ELSE DATEADD (DAY,-1,f.Begin_Date)
END AS EndDate
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.Begin_Date)) = f.FixID
)
INSERT INTO #EmploymentAndCar ( RowNumber, Person_ID, Begin_Date, End_Date )
SELECT F.RowNumber
, F.Person_ID
, F.Begin_Date
, F.EndDate
FROM Finalize F
INNER JOIN #Employment Employment
ON F.Begin_Date BETWEEN Employment.Begin_Date AND Employment.End_Date AND Employment.Person_ID = @Person_ID
ORDER BY F.Begin_Date
--------------------------------------------------------------------------------------------------
--Now that the Employment and Car dates have been merged, empty the DatesToMerge table
DELETE FROM #DatesToMerge;
--Reload the DatesToMerge table with the newly-merged Employment and Car records,
--and the Food records that still need to be merged
INSERT INTO #DatesToMerge
SELECT RowNumber
, Person_ID
, 'PtBCar'
, Begin_Date
, End_Date
FROM #EmploymentAndCar
WHERE Person_ID = @Person_ID
INSERT INTO #DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Food])
, Person_ID
, 'Food'
, Begin_Date
, End_Date
FROM #Food
WHERE Person_ID = @Person_ID
;
WITH CarCTE AS
(--This CTE grabs just the Food rows so we can compare and split dates from them
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
WHERE Date_Type = 'Food'
),
NewRowsCTE AS
( --This CTE creates just new rows starting after the Food dates for each Employment date range
SELECT a.ID,
a.Person_ID,
a.Date_Type,
DATEADD(DAY,1,b.End_Date) AS Begin_Date,
a.End_Date
FROM #DatesToMerge a
INNER JOIN CarCTE b
ON a.Begin_Date <= b.Begin_Date
AND a.End_Date > b.Begin_Date
AND a.End_Date > b.End_Date -- This is needed because if both the Food and Car/Employment end on the same date, there is no split row after
),
UnionCTE AS
( -- This CTE merges the new rows with the existing ones
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
UNION ALL
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM NewRowsCTE
),
FixEndDateCTE AS
(
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date) AS FixID,
MIN(d.Begin_Date) AS Begin_Date
FROM UnionCTE c
LEFT OUTER JOIN CarCTE d
ON c.Begin_Date < d.Begin_Date
AND c.End_Date >= d.Begin_Date
WHERE c.Date_Type <> 'Food'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date)
),
Finalize AS
(
SELECT ROW_NUMBER() OVER (ORDER BY e.Begin_Date) AS RowNumber,
e.Person_ID,
e.Begin_Date,
CASE WHEN f.Begin_Date IS NULL THEN e.End_Date
ELSE DATEADD (DAY,-1,f.Begin_Date)
END AS EndDate
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.Begin_Date)) = f.FixID
)
SELECT DISTINCT
F.Person_ID
, Employment
, Car
, Food
, F.Begin_Date
, F.EndDate
FROM Finalize F
INNER JOIN #Employment Employment
ON F.Begin_Date BETWEEN Employment.Begin_Date AND Employment.End_Date AND Employment.Person_ID = @Person_ID
LEFT JOIN #Car Car
ON Car.[Begin_Date] <= F.Begin_Date
AND Car.[End_Date] >= F.[EndDate]
AND Car.Person_ID = @Person_ID
LEFT JOIN #Food Food
ON Food.[Begin_Date] <= F.[Begin_Date]
AND Food.[End_Date] >= F.[EndDate]
AND Food.Person_ID = @Person_ID
ORDER BY F.Begin_Date
If anyone has a more elegant solution, I will be happy to accept their answer.
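One possible shorter shape, offered as a sketch rather than a drop-in replacement: collect every date on which anything can change (each Begin_Date and the day after each End_Date), pair consecutive boundaries into segments with a ROW_NUMBER self join (which works on 2008 R2), and keep only the segments that fall inside an Employment row. The CTE names are my own. It assumes ranges within each table do not overlap one another and that no End_Date is the maximum date value, since one day is added to it.

```sql
DECLARE @Employment TABLE (Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE);
DECLARE @Car        TABLE (Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE);
DECLARE @Food       TABLE (Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE);

INSERT INTO @Employment VALUES (123, 'ACME',        '1986-01-01', '1990-12-31')
                             , (123, 'Office Corp', '1995-05-15', '1998-10-03')
                             , (123, 'Job 3',       '1998-10-04', '2999-12-31');
INSERT INTO @Car  VALUES (123, 'Red Car',  '1986-05-01', '1997-06-23')
                       , (123, 'Blue Car', '1997-07-03', '2999-12-31');
INSERT INTO @Food VALUES (123, 'Eggs',   '1997-01-01', '1997-03-09')
                       , (123, 'Donuts', '2001-02-23', '2001-02-25');

WITH Boundaries AS
(   -- Every date on which a value may start or stop applying.
    -- UNION (rather than UNION ALL) removes duplicate dates.
    SELECT Person_ID, Begin_Date AS Boundary FROM @Employment
    UNION SELECT Person_ID, DATEADD(DAY, 1, End_Date) FROM @Employment
    UNION SELECT Person_ID, Begin_Date FROM @Car
    UNION SELECT Person_ID, DATEADD(DAY, 1, End_Date) FROM @Car
    UNION SELECT Person_ID, Begin_Date FROM @Food
    UNION SELECT Person_ID, DATEADD(DAY, 1, End_Date) FROM @Food
),
Numbered AS
(
    SELECT Person_ID, Boundary
         , ROW_NUMBER() OVER (PARTITION BY Person_ID ORDER BY Boundary) AS rn
    FROM Boundaries
),
Segments AS
(   -- Each boundary paired with the day before the next boundary (a LEAD substitute).
    SELECT a.Person_ID
         , a.Boundary                    AS Begin_Date
         , DATEADD(DAY, -1, b.Boundary)  AS End_Date
    FROM Numbered a
    INNER JOIN Numbered b
            ON b.Person_ID = a.Person_ID
           AND b.rn        = a.rn + 1
)
SELECT s.Person_ID, e.Employment, f.Food, c.Car, s.Begin_Date, s.End_Date
FROM Segments s
INNER JOIN @Employment e          -- drops segments that fall in an employment gap
        ON e.Person_ID = s.Person_ID
       AND s.Begin_Date >= e.Begin_Date
       AND s.End_Date   <= e.End_Date
LEFT JOIN @Car c
       ON c.Person_ID = s.Person_ID
      AND c.Begin_Date <= s.Begin_Date
      AND c.End_Date   >= s.End_Date
LEFT JOIN @Food f
       ON f.Person_ID = s.Person_ID
      AND f.Begin_Date <= s.Begin_Date
      AND f.End_Date   >= s.End_Date
ORDER BY s.Person_ID, s.Begin_Date;
```

Because every Begin_Date and every End_Date + 1 is a boundary, each segment lies either wholly inside or wholly outside any given range, which is why simple containment joins are enough. Traced by hand against the sample data, this gives the ten rows listed in the question, with NULL where the question shows a blank.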
I want to insert a dynamic number of rows into a table, based on information in that table.
I can do it using the code below, but I'm wondering if there's a way to avoid the loop.
The commented out section was my best attempt at what I was trying to do, but it gave me an error of:
"The reference to column "iCount" is not allowed in an argument to a TOP, OFFSET, or FETCH clause. Only references to columns at an outer scope or standalone expressions and subqueries are allowed here."
CREATE TABLE #TableX (
TDate DATE
, TType INT
, Fruit NVARCHAR(20)
, Vegetable NVARCHAR(20)
, Meat NVARCHAR(20)
, Bread NVARCHAR(20)
)
INSERT INTO #TableX VALUES
('2016-11-10',1,'Apple','Artichoke',NULL,NULL)
, ('2016-11-10',1,'Banana','Beet',NULL,NULL)
, ('2016-11-10',1,'Canteloupe','Cauliflower',NULL,NULL)
, ('2016-11-10',1,'Durian','Daikon',NULL,NULL)
, ('2016-11-10',2,NULL,NULL,'Rabbit','Rye')
, ('2016-11-10',2,NULL,NULL,'Sausage','Sourdough')
, ('2016-11-11',1,'Elderberry','Eggplant',NULL,NULL)
, ('2016-11-11',2,NULL,NULL,'Turkey','Tortilla')
, ('2016-11-11',2,NULL,NULL,'Venison','Vienna')
SELECT * FROM #TableX
CREATE TABLE #BlankRow (
ID INT IDENTITY
, TDate DATE
, TType INT
, iCount INT
)
DECLARE @Counter1 INT = 0
, @RowCount INT
; WITH BR1
AS (
SELECT TDate, TType, COUNT(*) AS iCount
FROM #TableX
WHERE TType = 1
GROUP BY TDate, TType
)
, BR2
AS (
SELECT TDate, TType, COUNT(*) AS iCount
FROM #TableX
WHERE TType = 2
GROUP BY TDate, TType
)
INSERT INTO #BlankRow
SELECT ISNULL(BR1.TDate, BR2.TDate) AS TDate,
CASE WHEN ISNULL(BR1.iCount,0) < ISNULL(BR2.iCount,0) THEN 1 ELSE 2 END AS TType,
ABS(ISNULL(BR1.iCount,0) - ISNULL(BR2.iCount,0)) AS iCount
FROM BR1
FULL JOIN BR2
ON BR1.TDate = BR2.TDate
WHILE @Counter1 < (SELECT MAX(ID) FROM #BlankRow)
BEGIN
SET @Counter1 += 1
SET @RowCount = (SELECT iCount FROM #BlankRow WHERE ID = @Counter1)
INSERT INTO #TableX
SELECT TOP (@RowCount) tx.TDate, br.TType, NULL, NULL, NULL, NULL
FROM #TableX tx
LEFT JOIN #BlankRow br
ON tx.TDate = br.TDate
WHERE br.ID = @Counter1
END
/*INSERT INTO #TableX
SELECT TOP (tx.iCount) tx.TDate, br.TType, NULL, NULL, NULL, NULL
FROM #TableX tx
JOIN #BlankRow br
ON tx.TDate = br.TDate*/
SELECT *
FROM #TableX
ORDER BY TDate, TType,
ISNULL(Fruit,REPLICATE(CHAR(255),20)),
ISNULL(Vegetable,REPLICATE(CHAR(255),20)),
ISNULL(Meat,REPLICATE(CHAR(255),20)),
ISNULL(Bread,REPLICATE(CHAR(255),20))
The data is silly, I know, but my end goal is to have two different Tablix's in ReportBuilder that end up with the same number of rows so the headers of my groups show up at the same place on the page.
Something like this:
create table #TableX(TDate date
,TType int
,Fruit nvarchar(20)
,Vegetable nvarchar(20)
,Meat nvarchar(20)
,Bread nvarchar(20)
);
insert into #TableX values
('2016-11-10',1,'Apple','Artichoke',NULL,NULL)
,('2016-11-10',1,'Banana','Beet',NULL,NULL)
,('2016-11-10',1,'Canteloupe','Cauliflower',NULL,NULL)
,('2016-11-10',1,'Durian','Daikon',NULL,NULL)
,('2016-11-10',2,NULL,NULL,'Rabbit','Rye')
,('2016-11-10',2,NULL,NULL,'Sausage','Sourdough')
,('2016-11-11',1,'Elderberry','Eggplant',NULL,NULL)
,('2016-11-11',2,NULL,NULL,'Turkey','Tortilla')
,('2016-11-11',2,NULL,NULL,'Venison','Vienna');
with DataRN as
(
select *
,row_number() over (partition by TDate, TType order by TDate) rn
from #TableX
)
,RowsRN as
(
select tt.TDate
,tt.TType
,td.rn
from (select distinct TDate, TType
from #TableX
) tt
full join (select distinct t1.TDate
,row_number() over (partition by t1.TDate, t1.TType order by t1.TDate) rn
from #TableX t1
) td
on(tt.TDate = td.TDate)
)
select r.TDate
,r.TType
,d.Fruit
,d.Vegetable
,d.Meat
,d.Bread
from DataRN d
full join RowsRN r
on(d.TDate = r.TDate
and d.TType = r.TType
and d.rn = r.rn
)
order by r.TDate
,r.TType
,isnull(d.Fruit,REPLICATE(CHAR(255),20))
,isnull(d.Vegetable,REPLICATE(CHAR(255),20))
,isnull(d.Meat,REPLICATE(CHAR(255),20))
,isnull(d.Bread,REPLICATE(CHAR(255),20))
In response to your comment, here is how you would use another CTE to generate the full list of dates that you need, if you haven't already got a Dates reference table (these are tremendously useful):
declare @MinDate date = (select min(TDate) from #TableX);
declare @MaxDate date = (select max(TDate) from #TableX);
with Dates as
(
select @MinDate as DateValue
union all
select dateadd(d,1,DateValue)
from Dates
where DateValue < @MaxDate
)
select DateValue
from Dates
option (maxrecursion 0);
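If the goal is only to avoid the WHILE loop from the question, another option is to join the per-date shortfall to a small list of numbers, so that a row with iCount = 2 matches two numbers and therefore produces two padding rows. This is a sketch under assumptions: the Numbers CTE and its cap of 10 are made up for illustration, and the two @BlankRow rows are what the question's sample data works out to (two Type 2 rows missing on 2016-11-10, one Type 1 row missing on 2016-11-11).

```sql
DECLARE @BlankRow TABLE (TDate DATE, TType INT, iCount INT);
INSERT INTO @BlankRow VALUES ('2016-11-10', 2, 2)
                           , ('2016-11-11', 1, 1);

WITH Numbers AS
(   -- 1..10 is an arbitrary cap for this sketch; use a real numbers table for larger counts.
    SELECT n
    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS v (n)
)
SELECT br.TDate
     , br.TType
     , NULL AS Fruit
     , NULL AS Vegetable
     , NULL AS Meat
     , NULL AS Bread
FROM @BlankRow br
INNER JOIN Numbers n
        ON n.n <= br.iCount      -- one output row per missing row
ORDER BY br.TDate, br.TType;
```

This returns three padding rows, which could be fed to an INSERT in place of the loop. It sidesteps the TOP restriction quoted in the question because the row count is controlled by the join condition instead.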
I'm trying to update a date in a temporary table using a parameter and looking at the last row number's date.
DECLARE @multiDayCourseDaysBetween INT = 3;
CREATE TABLE #Courses(TempId INT IDENTITY(1,1)
, [Date] DATE
, CourseTypeId INT
, OrganisationId INT
, Reference VARCHAR(100)
, CreatedByUserId INT
, CourseTypeCategoryId INT
, TrainersRequired INT);
CREATE TABLE #TempDates(TempId INT
, [Date] DATE
, LagDate DATE);
INSERT INTO #Courses([Date])
Values('2016-06-01')
INSERT INTO #Courses([Date])
Values('2016-06-02')
INSERT INTO #Courses([Date])
Values('2016-06-03')
INSERT INTO #TempDates(tempId, [date], LagDate)
SELECT TempId, [Date]
, LAG(c.[Date],1) OVER (ORDER BY [Date]) as LagDate
FROM #Courses c
UPDATE #TempDates
SET [Date] = DATEADD(dd, @multiDayCourseDaysBetween, LAG([Date],1) OVER (ORDER BY [Date]))
WHERE LagDate IS NOT NULL
But I receive an error - 'Windowed functions can only appear in the SELECT or ORDER BY clauses.'
For example the original dates would be
2016-06-01
2016-06-02
2016-06-03
but I would need them to become
2016-06-01
2016-06-04
2016-06-07
based off of 3 as a parameter.
Thanks for any help
Try changing the last statement to something like this:
WITH b AS (
SELECT
TempId
, [Date]
, FIRST_VALUE([Date]) OVER (ORDER BY [Date]) as FirstDate
, ROW_NUMBER() OVER (ORDER BY [Date]) AS rowRank
FROM
#TempDates
)
UPDATE b
SET [Date] = DATEADD(day, (rowRank-1)* @multiDayCourseDaysBetween, FirstDate)
WHERE
rowRank > 1;
I am running the following query which is terribly inefficient and can take hours. I am having SQL brain farts today and I do not know how to improve this query. There are several nullable varchar fields, and I need to identify the duplicate rows (all columns containing identical values as another row)
select * from transactions x where exists (
select Coalesce(ColA, ''),
Coalesce(ColB, ''),
Coalesce(ColC, '')
from transactions y
where Coalesce(x.ColA, '') = Coalesce(x.ColA, '') and
Coalesce(x.ColB, '') = Coalesce(x.ColB, '') and
Coalesce(x.ColC, '') = Coalesce(x.ColC, '')
group by Coalesce(ColA, ''),
Coalesce(ColB, ''),
Coalesce(ColC, '')
having count(*) > 1
)
Why does this take so long to run? There has to be a better way.
You could improve it by:
- removing unnecessary checks
- putting a composite index on ColA, ColB and ColC
What is unnecessary? It seems to be unnecessary to join the table with itself. Why don't you use a simple GROUP BY? You also don't need the WHERE:
SELECT COALESCE(ColA, '') AS ColA,
COALESCE(ColB, '') AS ColB,
COALESCE(ColC, '') AS ColC,
Count(*) As Cnt
FROM transactions t
GROUP BY COALESCE(ColA, ''), COALESCE(ColB, ''), COALESCE(ColC, '')
HAVING Count(*) > 1
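The composite index mentioned above could be created along these lines. The index name is hypothetical, and the statement assumes the table lives in the dbo schema and that the three columns together fit within the index key size limit.

```sql
-- Hypothetical index name; schema and column sizes are assumptions.
CREATE NONCLUSTERED INDEX IX_transactions_ColA_ColB_ColC
    ON dbo.transactions (ColA, ColB, ColC);
```

Because the query groups on COALESCE(...) expressions rather than on the bare columns, it is worth checking the execution plan to confirm the index is actually used and helps.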
Does this work?
CREATE TABLE #transactions (
ColA INT
, ColB INT
, ColC INT
, ColD INT
, ColE INT
, ColF INT
)
DECLARE @Counter1 INT = 0
WHILE @Counter1 < 10000
BEGIN
SET @Counter1 += 1
INSERT INTO #transactions
SELECT ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
END
;WITH Dupe
AS (
SELECT *, ROW_NUMBER() OVER
(PARTITION BY ColA, ColB, ColC, ColD, ColE, ColF
ORDER BY ColA, ColB, ColC, ColD, ColE, ColF) AS rn
FROM #transactions
)
SELECT * FROM Dupe WHERE rn > 1
You can use an ISNULL on anything where you need to compare a value that might be null. Note that most of this I've written is just to generate a useful data set. With 6 columns and 10,000 rows I got 42 identical rows in less than a second. No triples. Bumped it up to 100,000 rows and I got 3,489 duplicate rows, including some triples. Took 3 seconds.
Here's an example using text. This whole thing took 25 seconds on 100,000 records, although my timer shows that less than 4 of that was finding the duplicates, with the remainder being the table population.
CREATE TABLE #transactions2 (
ColA NVARCHAR(30)
, ColB NVARCHAR(30)
, ColC NVARCHAR(30)
, ColD NVARCHAR(30)
, ColE NVARCHAR(30)
, ColF NVARCHAR(30)
)
CREATE TABLE #names (
ID INT IDENTITY
, Name NVARCHAR(30)
)
DECLARE @Counter2 INT = 0
, @ColA NVARCHAR(30)
, @ColB NVARCHAR(30)
, @ColC NVARCHAR(30)
, @ColD NVARCHAR(30)
, @ColE NVARCHAR(30)
, @ColF NVARCHAR(30)
INSERT INTO #names VALUES
('Anderson, Arthur')
, ('Broberg, Bruce')
, ('Chan, Charles')
, ('Davidson, Darwin')
, ('Eggert, Emily')
, ('Fox, Francesca')
, ('Garbo, Greta')
, ('Hollande, Hortense')
, ('Iguadolla, Ignacio')
, ('Jackson, Jurimbo')
, ('Katana, Ken')
, ('Lawrence, Larry')
, ('McDonald, Michael')
, ('Nyugen, Nathan')
, ('O''Dell, Oliver')
, ('Peterson, Phillip')
, ('Quigley, Quentin')
, ('Ramallah, Rodolfo')
, ('Smith, Samuel')
, ('Turner, Theodore')
, ('Uno, Umberto')
, ('Victor, Victoria')
, ('Wallace, William')
, ('Xing, Xiopan')
, ('Young, Yvette')
, ('Zapata, Zorro')
, (NULL)
WHILE @Counter2 < 100000
BEGIN
SET @Counter2 += 1
SET @ColA = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
SET @ColB = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
SET @ColC = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
SET @ColD = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
SET @ColE = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
SET @ColF = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
INSERT INTO #transactions2
SELECT @ColA, @ColB, @ColC, @ColD, @ColE, @ColF
END
PRINT CAST(GETDATE() AS DateTime2 (3))
;WITH Dupe
AS (
SELECT *, ROW_NUMBER() OVER
(PARTITION BY ISNULL(ColA,''), ISNULL(ColB,''), ISNULL(ColC,''), ISNULL(ColD,''), ISNULL(ColE,''), ISNULL(ColF,'')
ORDER BY ISNULL(ColA,''), ISNULL(ColB,''), ISNULL(ColC,''), ISNULL(ColD,''), ISNULL(ColE,''), ISNULL(ColF,'')) AS rn
FROM #transactions2
)
SELECT * FROM Dupe WHERE rn > 1 ORDER BY rn
PRINT CAST(GETDATE() AS DateTime2 (3))
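Here is a much faster way using a subquery join. It ran in under 10 seconds.
select * from transactions x
join (
-- the derived table's columns need aliases so the join below can reference them
select Coalesce(ColA, '') as ColA,
Coalesce(ColB, '') as ColB,
Coalesce(ColC, '') as ColC
from transactions
group by Coalesce(ColA, ''),
Coalesce(ColB, ''),
Coalesce(ColC, '')
having count(*) > 1
) dups on
-- coalesce the outer side too, otherwise rows containing NULLs never match
dups.ColA = Coalesce(x.ColA, '') and
dups.ColB = Coalesce(x.ColB, '') and
dups.ColC = Coalesce(x.ColC, '')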
The important thing about this query is that it returns both/all rows of each duplicate group, not just the extra copies.
If this is a one-time job involving a huge number of rows, and it does not need to be a view, you could instead INSERT ... SELECT the rows into a table that has a UNIQUE index with the IGNORE_DUP_KEY option.
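A minimal sketch of that IGNORE_DUP_KEY idea follows. The staging table, its index name, and the varchar(50) column types are assumptions, and the inline VALUES list stands in for the real transactions table.

```sql
-- Hypothetical staging table; column types are assumed.
CREATE TABLE #DistinctTransactions
(
    ColA varchar(50) NULL
  , ColB varchar(50) NULL
  , ColC varchar(50) NULL
);

-- Rows that would violate the unique index are discarded with a warning
-- instead of failing the whole INSERT.
CREATE UNIQUE INDEX UX_DistinctTransactions
    ON #DistinctTransactions (ColA, ColB, ColC)
    WITH (IGNORE_DUP_KEY = ON);

INSERT INTO #DistinctTransactions (ColA, ColB, ColC)
SELECT ColA, ColB, ColC
FROM (VALUES ('a', 'b', NULL)
           , ('a', 'b', NULL)   -- duplicate of the first row, NULL included
           , ('a', 'c', 'd')) AS t (ColA, ColB, ColC);

SELECT ColA, ColB, ColC
FROM #DistinctTransactions;     -- two rows remain
```

A unique index in SQL Server treats NULLs as equal to each other, so no COALESCE is needed here. Note that this keeps one copy of every row; it removes duplicates rather than listing them.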
I am trying to add a column that calculates the percentage of total revenue, and I am stuck with the following error:
Error: Msg 207, Level 16, State 1, Line 14 Invalid column name 'Customerkey'.
In that line I'm trying to join Table 1 and Table 3, but MS SQL Server won't recognize T.Customerkey even though customerkey exists in the dbo.FactInternetSales table.
Also, when I add T.Grand_Tot_Rev to my GROUP BY clause, it returns 0.04 for every row. I know that's wrong: I do not want T.Grand_Tot_Rev to be part of the aggregate, because it should remain constant for every record. How can I achieve what I am looking for? Thank you in advance. By the way, I am using the AdventureWorksDW2012 database.
SELECT fs.CustomerKey ,
M.Total_sales ,
M.Total_cost ,
M.Total_sales - M.Total_cost AS Total_Margin ,
T.Grand_Tot_Rev ,
( M.Total_sales / T.Grand_Tot_Rev ) * 100 AS Prct_Total_Revenue
FROM dbo.FactInternetSales fs , -- Table 1 --
(
SELECT customerkey ,
SUM( SalesAmount )AS Total_Sales ,
SUM( TotalProductCost )Total_cost
FROM dbo.FactInternetSales
GROUP BY customerkey
) M , --Table 2 --
(
SELECT SUM( SalesAmount )AS Grand_Tot_Rev
FROM dbo.FactInternetSales
) T --Table 3 --
WHERE fs.CustomerKey = M.CustomerKey -- Join 1 --
AND M.CustomerKey = T.Customerkey -- Join 2 --
GROUP BY fs.CustomerKey ,
M.Total_sales ,
M.Total_cost ,
T.Grand_Tot_Rev
ORDER BY 2 DESC;
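If you want T.Grand_Tot_Rev as a constant over all rows, try removing the second join condition, AND M.CustomerKey = T.Customerkey -- Join 2 --. The derived table T returns a single row containing only Grand_Tot_Rev, so it has no Customerkey column to join on (hence the error) and needs no join condition. The query then looks like this: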
SELECT fs.CustomerKey ,
M.Total_sales ,
M.Total_cost ,
M.Total_sales - M.Total_cost AS Total_Margin ,
T.Grand_Tot_Rev,
( M.Total_sales / T.Grand_Tot_Rev ) * 100 AS Prct_Total_Revenue
FROM dbo.FactInternetSales fs , -- Table 1 --
(
SELECT customerkey ,
SUM( SalesAmount )AS Total_Sales ,
SUM( TotalProductCost )Total_cost
FROM dbo.FactInternetSales
GROUP BY customerkey
) M , --Table 2 --
(
SELECT SUM( SalesAmount )AS Grand_Tot_Rev
FROM dbo.FactInternetSales
) T --Table 3 --
WHERE fs.CustomerKey = M.CustomerKey -- Join 1 --
--AND M.CustomerKey = T.Customerkey -- Join 2 --
GROUP BY fs.CustomerKey ,
M.Total_sales ,
M.Total_cost ,
T.Grand_Tot_Rev
ORDER BY 2 DESC;
Another way to write the same query that is a bit more compact and might have slightly better performance:
;WITH
T AS (
SELECT SUM(SalesAmount) AS Grand_Tot_Rev
FROM dbo.FactInternetSales
),
M AS (
SELECT customerkey ,
SUM(SalesAmount) AS Total_Sales ,
SUM(TotalProductCost) AS Total_cost
FROM dbo.FactInternetSales
GROUP BY CustomerKey
)
SELECT
customerkey ,
Total_Sales ,
Total_cost,
Total_Sales - Total_cost AS Total_Margin ,
Grand_Tot_Rev,
Total_Sales / Grand_Tot_Rev * 100 AS Prct_Total_Revenue
FROM M, T
ORDER BY 2 DESC;
To see the really small values you can force a conversion to a wider data type:
;WITH
T AS (
SELECT CAST(SUM(SalesAmount) AS decimal) AS Grand_Tot_Rev
FROM dbo.FactInternetSales
),
M AS (
SELECT customerkey ,
CAST(SUM(SalesAmount) AS decimal(15,10)) AS Total_Sales ,
CAST(SUM(TotalProductCost) AS decimal(15,10)) AS Total_cost
FROM dbo.FactInternetSales
GROUP BY CustomerKey
)
SELECT
customerkey ,
Total_Sales ,
Total_cost,
Total_Sales - Total_cost AS Total_Margin ,
Grand_Tot_Rev,
Total_Sales / Grand_Tot_Rev * 100 AS Prct_Total_Revenue
FROM M, T
ORDER BY 2 DESC;
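A further variant, not taken from the answers above, is to compute the grand total with a window aggregate over the grouped rows, which avoids the second derived table altogether. The sketch below uses a three-row inline VALUES list as a hypothetical stand-in for dbo.FactInternetSales so that it runs on its own.

```sql
SELECT CustomerKey
     , SUM(SalesAmount)                                   AS Total_Sales
     , SUM(SUM(SalesAmount)) OVER ()                      AS Grand_Tot_Rev
     , SUM(SalesAmount) / SUM(SUM(SalesAmount)) OVER () * 100
                                                          AS Prct_Total_Revenue
FROM (VALUES (1, 100.00)
           , (1,  50.00)
           , (2, 350.00)) AS s (CustomerKey, SalesAmount)  -- made-up sample rows
GROUP BY CustomerKey
ORDER BY Total_Sales DESC;
```

The inner SUM is the per-customer total from the GROUP BY; the outer SUM ... OVER () adds those totals up across all groups, so Grand_Tot_Rev is the same on every row. With the sample rows, customer 2 comes out at 70 percent and customer 1 at 30 percent.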