Query to find duplicate rows in a table

Query to find duplicate rows in a table - tsql

I am running the following query which is terribly inefficient and can take hours. I am having SQL brain farts today and I do not know how to improve this query. There are several nullable varchar fields, and I need to identify the duplicate rows (all columns containing identical values as another row)
select * from transactions x where exists (
select Coalesce(ColA, ''),
Coalesce(ColB, ''),
Coalesce(ColC, '')
from transactions y
where Coalesce(x.ColA, '') = Coalesce(x.ColA, '') and
Coalesce(x.ColB, '') = Coalesce(x.ColB, '') and
Coalesce(x.ColC, '') = Coalesce(x.ColC, '')
group by Coalesce(ColA, ''),
Coalesce(ColB, ''),
Coalesce(ColC, '')
having count(*) > 1
)
Why does this take so long to run? There has to be a better way.

You could improve it by
removing unnecesssary checks
putting a composite index on ColA, ColB and ColC
What is unnecessary? It seems to be unnecessary to join the table with itself. Why don't you use a simple GROUP BY? You also don't need the WHERE:
SELECT COALESCE(ColA, '') AS ColA,
COALESCE(ColB, '') AS ColB,
COALESCE(ColC, '') AS ColC,
Count(*) As Cnt
FROM transactions t
GROUP BY COALESCE(ColA, ''), COALESCE(ColB, ''), COALESCE(ColC, '')
HAVING Count(*) > 1

Does this work?
DECLARE #transactions TABLE (
ColA INT
, ColB INT
, ColC INT
, ColD INT
, ColE INT
, ColF INT
)
DECLARE #Counter1 INT = 0
WHILE #Counter1 < 10000
BEGIN
SET #Counter1 += 1
INSERT INTO #transactions
SELECT ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
END
;WITH Dupe
AS (
SELECT *, ROW_NUMBER() OVER
(PARTITION BY ColA, ColB, ColC, ColD, ColE, ColF
ORDER BY ColA, ColB, ColC, ColD, ColE, ColF) AS rn
FROM #transactions
)
SELECT * FROM Dupe WHERE rn > 1
You can use an ISNULL on anything where you need to compare a value that might be null. Note that most of this I've written is just to generate a useful data set. With 6 columns and 10,000 rows I got 42 identical rows in less than a second. No triples. Bumped it up to 100,000 rows and I got 3,489 duplicate rows, including some triples. Took 3 seconds.
Here's an example using text. This whole thing took 25 seconds on 100,000 records, although my timer shows that less than 4 of that was finding the duplicates, with the remainder being the table population.
DECLARE #transactions2 TABLE (
ColA NVARCHAR(30)
, ColB NVARCHAR(30)
, ColC NVARCHAR(30)
, ColD NVARCHAR(30)
, ColE NVARCHAR(30)
, ColF NVARCHAR(30)
)
DECLARE #names TABLE (
ID INT IDENTITY
, Name NVARCHAR(30)
)
DECLARE #Counter2 INT = 0
, #ColA NVARCHAR(30)
, #ColB NVARCHAR(30)
, #ColC NVARCHAR(30)
, #ColD NVARCHAR(30)
, #ColE NVARCHAR(30)
, #ColF NVARCHAR(30)
INSERT INTO #names VALUES
('Anderson, Arthur')
, ('Broberg, Bruce')
, ('Chan, Charles')
, ('Davidson, Darwin')
, ('Eggert, Emily')
, ('Fox, Francesca')
, ('Garbo, Greta')
, ('Hollande, Hortense')
, ('Iguadolla, Ignacio')
, ('Jackson, Jurimbo')
, ('Katana, Ken')
, ('Lawrence, Larry')
, ('McDonald, Michael')
, ('Nyugen, Nathan')
, ('O''Dell, Oliver')
, ('Peterson, Phillip')
, ('Quigley, Quentin')
, ('Ramallah, Rodolfo')
, ('Smith, Samuel')
, ('Turner, Theodore')
, ('Uno, Umberto')
, ('Victor, Victoria')
, ('Wallace, William')
, ('Xing, Xiopan')
, ('Young, Yvette')
, ('Zapata, Zorro')
, (NULL)
WHILE #Counter2 < 100000
BEGIN
SET #Counter2 += 1
SET #ColA = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
SET #ColB = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
SET #ColC = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
SET #ColD = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
SET #ColE = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
SET #ColF = (SELECT Name FROM #names WHERE ID = ROUND(RAND()*27 +.5,0))
INSERT INTO #transactions2
SELECT #ColA, #ColB, #ColC, #ColD, #ColE, #ColD
END
PRINT CAST(GETDATE() AS DateTime2 (3))
;WITH Dupe
AS (
SELECT *, ROW_NUMBER() OVER
(PARTITION BY ISNULL(ColA,''), ISNULL(ColB,''), ISNULL(ColC,''), ISNULL(ColD,''), ISNULL(ColE,''), ISNULL(ColF,'')
ORDER BY ISNULL(ColA,''), ISNULL(ColB,''), ISNULL(ColC,''), ISNULL(ColD,''), ISNULL(ColE,''), ISNULL(ColF,'')) AS rn
FROM #transactions2
)
SELECT * FROM Dupe WHERE rn > 1 ORDER BY rn
PRINT CAST(GETDATE() AS DateTime2 (3))

Here is a much faster way using a subquery join. It ran in under 10 seconds
select * from transactions x
join (
select Coalesce(ColA, ''),
Coalesce(ColB, ''),
Coalesce(ColC, '')
from transactions
group by Coalesce(ColA, ''),
Coalesce(ColB, ''),
Coalesce(ColC, '')
having count(*) > 1
) dups on
dups.ColA = x.ColA and
dups.ColB = x.ColB and
dups.ColC = x.ColC
The important thing about this query is that it returns both/all rows, not just the duplicate(s)

If this is a one time job, and involves a huge number of rows, and not to be made as a View, then perhaps you'd opt to INSERT SELECT it into a table with UNIQUE index with IGNORE_DUP_KEY option.

Related

SQL Server split overlapping date ranges

I need to split date ranges that overlap. I have a primary table (I've called it Employment for this example), and I need to return all Begin-End date ranges for a person from this table. I also have multiple sub tables (represented by Car and Food), and I want to return the value that was active in the sub tables during the times given in the main tables. This will involve splitting the main table date ranges when a sub table item changes.
I don't want to return sub table information for dates not in the main tables.
DECLARE #Employment TABLE
( Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE #Car TABLE
( Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE #Food TABLE
( Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE )
INSERT INTO #Employment ( [Person_ID], [Employment], [Begin_Date], [End_Date] )
VALUES ( 123 , 'ACME' , '1986-01-01' , '1990-12-31' )
, ( 123 , 'Office Corp' , '1995-05-15' , '1998-10-03' )
, ( 123 , 'Job 3' , '1998-10-04' , '2999-12-31' )
INSERT INTO #Car ( [Person_ID] , [Car] , [Begin_Date] , [End_Date] )
VALUES ( 123, 'Red Car', '1986-05-01', '1997-06-23' )
, ( 123, 'Blue Car', '1997-07-03', '2999-12-31' )
INSERT INTO #Food ( [Person_ID], [Food], [Begin_Date], [End_Date] )
VALUES ( 123, 'Eggs', '1997-01-01', '1997-03-09' )
, ( 123, 'Donuts', '2001-02-23', '2001-02-25' )
For the above data, the results should be:
Person_ID Employment Food Car Begin_Date End_Date
123 ACME 1986-01-01 1986-04-30
123 ACME Red Car 1986-05-01 1990-12-31
123 Office Corp Red Car 1995-05-15 1996-12-31
123 Office Corp Eggs Red Car 1997-01-01 1997-03-09
123 Office Corp Red Car 1997-03-10 1997-06-23
123 Office Corp 1997-06-24 1997-07-02
123 Office Corp Blue Car 1997-07-03 1998-10-03
123 Job 3 Blue Car 1998-10-04 2001-02-22
123 Job 3 Donuts Blue Car 2001-02-23 2001-02-25
123 Job 3 Blue Car 2001-02-26 2999-12-31
The first row is his time working for ACME, where he didn't have a car or a weird food obsession. In the second row, he purchased a car, and still worked at ACME. In the third row, he changed jobs to Office Corp, but still has the Red Car. Note how we're not returning data during his unemployment gap, even though he had the Red Car. We only want to know what was in the Car and Food tables during the times there are values in the Employment table.
I found a solution for SQL Server 2012 that uses the LEAD/LAG functions to accomplish this, but I'm stuck with 2008 R2.

To change the 2012 solution from that blog to work with 2008, you need to replace the LEAD in the following
with
ValidDates as …
,
ValidDateRanges1 as
(
select EmployeeNo, Date as ValidFrom, lead(Date,1) over (partition by EmployeeNo order by Date) ValidTo
from ValidDates
)
There are a number of ways to do this, but one example is a self join to the same table + 1 row (which is effectively what a lead does). One way to do this is to put a rownumber on the previous table (so it is easy to find the next row) by adding another intermediate CTE (eg ValidDatesWithRowno). Then do a left outer join to that table where EmployeeNo is the same and rowno = rowno + 1, and use that value to replace the lead. If you wanted a lead 2, you would join to rowno + 2, etc. So the 2008 version would look something like
with
ValidDates as …
,
ValidDatesWithRowno as --This is the ValidDates + a RowNo for easy self joining below
(
select EmployeeNo, Date, ROW_NUMBER() OVER (ORDER BY EmployeeNo, Date) as RowNo from ValidDates
)
,
ValidDateRanges1 as
(
select VD.EmployeeNo, VD.Date as ValidFrom, VDLead1.Date as ValidTo
from ValidDatesWithRowno VD
left outer join ValidDatesWithRowno VDLead1 on VDLead1.EmployeeNo = VD.EmployeeNo
and VDLead1.RowNo = VD.RowNo + 1
)
The rest of the solution described looks like it will work like you want on 2008.

Here is the answer I came up with. It works, but it's not very pretty.
It goes it two waves, first splitting any overlapping Employment/Car dates, then running the same SQL a second time add the Food dates and split any overlaps again.
DECLARE #Employment TABLE
( Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE #Car TABLE
( Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE #Food TABLE
( Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE )
INSERT INTO #Employment ( [Person_ID], [Employment], [Begin_Date], [End_Date] )
VALUES ( 123 , 'ACME' , '1986-01-01' , '1990-12-31' )
, ( 123 , 'Office Corp' , '1995-05-15' , '1998-10-03' )
, ( 123 , 'Job 3' , '1998-10-04' , '2999-12-31' )
INSERT INTO #Car ( [Person_ID] , [Car] , [Begin_Date] , [End_Date] )
VALUES ( 123, 'Red Car', '1986-05-01', '1997-06-23' )
, ( 123, 'Blue Car', '1997-07-03', '2999-12-31' )
INSERT INTO #Food ( [Person_ID], [Food], [Begin_Date], [End_Date] )
VALUES ( 123, 'Eggs', '1997-01-01', '1997-03-09' )
, ( 123, 'Donuts', '2001-02-23', '2001-02-25' )
DECLARE #Person_ID INT = 123;
--A table to hold date ranges that need to be merged together
DECLARE #DatesToMerge TABLE
(
ID INT,
Person_ID INT,
Date_Type VARCHAR(10),
Begin_Date DATETIME,
End_Date DATETIME
)
INSERT INTO #DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Car])
, Person_ID
, 'Car'
, Begin_Date
, End_Date
FROM #Car
WHERE Person_ID = #Person_ID
INSERT INTO #DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Employment])
, Person_ID
, 'Employment'
, Begin_Date
, End_Date
FROM #Employment
WHERE Person_ID = #Person_ID;
--A table to hold the merged #Employment and Car records
DECLARE #EmploymentAndCar TABLE
(
RowNumber INT,
Person_ID INT,
Begin_Date DATETIME,
End_Date DATETIME
)
;
WITH CarCTE AS
(--This CTE grabs just the Car rows so we can compare and split dates from them
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
WHERE Date_Type = 'Car'
),
NewRowsCTE AS
( --This CTE creates just new rows starting after the Car dates for each #Employment date range
SELECT a.ID,
a.Person_ID,
a.Date_Type,
DATEADD(DAY,1,b.End_Date) AS Begin_Date,
a.End_Date
FROM #DatesToMerge a
INNER JOIN CarCTE b
ON a.Begin_Date <= b.Begin_Date
AND a.End_Date > b.Begin_Date
AND a.End_Date > b.End_Date -- This is needed because if both the Car and #Employment end on the same date, there is split row after
),
UnionCTE AS
( -- This CTE merges the new rows with the existing ones
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
UNION ALL
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM NewRowsCTE
),
FixEndDateCTE AS
(
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date) AS FixID,
MIN(d.Begin_Date) AS Begin_Date
FROM UnionCTE c
LEFT OUTER JOIN CarCTE d
ON c.Begin_Date < d.Begin_Date
AND c.End_Date >= d.Begin_Date
WHERE c.Date_Type <> 'Car'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date)
),
Finalize AS
(
SELECT ROW_NUMBER() OVER (ORDER BY e.Begin_Date) AS RowNumber,
e.Person_ID,
e.Begin_Date,
CASE WHEN f.Begin_Date IS NULL THEN e.End_Date
ELSE DATEADD (DAY,-1,f.Begin_Date)
END AS EndDate
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.Begin_Date)) = f.FixID
)
INSERT INTO #EmploymentAndCar ( RowNumber, Person_ID, Begin_Date, End_Date )
SELECT F.RowNumber
, F.Person_ID
, F.Begin_Date
, F.EndDate
FROM Finalize F
INNER JOIN #Employment Employment
ON F.Begin_Date BETWEEN Employment.Begin_Date AND Employment.End_Date AND Employment.Person_ID = #Person_ID
ORDER BY F.Begin_Date
--------------------------------------------------------------------------------------------------
--Now that the Employment and Car dates have been merged, empty the DatesToMerge table
DELETE FROM #DatesToMerge;
--Reload the DatesToMerge table with the newly-merged Employment and Car records,
--and the Food records that still need to be merged
INSERT INTO #DatesToMerge
SELECT RowNumber
, Person_ID
, 'PtBCar'
, Begin_Date
, End_Date
FROM #EmploymentAndCar
WHERE Person_ID = #Person_ID
INSERT INTO #DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Food])
, Person_ID
, 'Food'
, Begin_Date
, End_Date
FROM #Food
WHERE Person_ID = #Person_ID
;
WITH CarCTE AS
(--This CTE grabs just the Food rows so we can compare and split dates from them
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
WHERE Date_Type = 'Food'
),
NewRowsCTE AS
( --This CTE creates just new rows starting after the Food dates for each Employment date range
SELECT a.ID,
a.Person_ID,
a.Date_Type,
DATEADD(DAY,1,b.End_Date) AS Begin_Date,
a.End_Date
FROM #DatesToMerge a
INNER JOIN CarCTE b
ON a.Begin_Date <= b.Begin_Date
AND a.End_Date > b.Begin_Date
AND a.End_Date > b.End_Date -- This is needed because if both the Food and Car/Employment end on the same date, there is split row after
),
UnionCTE AS
( -- This CTE merges the new rows with the existing ones
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
UNION ALL
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM NewRowsCTE
),
FixEndDateCTE AS
(
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date) AS FixID,
MIN(d.Begin_Date) AS Begin_Date
FROM UnionCTE c
LEFT OUTER JOIN CarCTE d
ON c.Begin_Date < d.Begin_Date
AND c.End_Date >= d.Begin_Date
WHERE c.Date_Type <> 'Food'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date)
),
Finalize AS
(
SELECT ROW_NUMBER() OVER (ORDER BY e.Begin_Date) AS RowNumber,
e.Person_ID,
e.Begin_Date,
CASE WHEN f.Begin_Date IS NULL THEN e.End_Date
ELSE DATEADD (DAY,-1,f.Begin_Date)
END AS EndDate
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.Begin_Date)) = f.FixID
)
SELECT DISTINCT
F.Person_ID
, Employment
, Car
, Food
, F.Begin_Date
, F.EndDate
FROM Finalize F
INNER JOIN #Employment Employment
ON F.Begin_Date BETWEEN Employment.Begin_Date AND Employment.End_Date AND Employment.Person_ID = #Person_ID
LEFT JOIN #Car Car
ON Car.[Begin_Date] <= F.Begin_Date
AND Car.[End_Date] >= F.[EndDate]
AND Car.Person_ID = #Person_ID
LEFT JOIN #Food Food
ON Food.[Begin_Date] <= F.[Begin_Date]
AND Food.[End_Date] >= F.[EndDate]
AND Food.Person_ID = #Person_ID
ORDER BY F.Begin_Date
If anyone has a more elegant solution, I will be happy to accept their answer.

sql compare columns to get result

I have the following issue. I have products with 3 different states. Parent, Child and products which are orphans. I am setting Parents as 1, Children as 2 and Orphans as 0. I am struggling to get the Orphan to set to 0. I realise that counting the amount of Parent PLU's is where I am going wrong but I do not know how to resolve this issue. Any help would be appreciated. (As you maybe able to tell, I am a noob and constructive criticism would be appreciated)
Kind Regards,
Jason.
Picture of results from query
declare #OrderID int = 1635
declare #Store char(3) = '001'
declare #SortedBy smallint = 2
DECLARE #tbl TABLE (DetailID int, OrderID int, PLU nvarchar(35), ParentPLU nvarchar(35))
INSERT INTO #tbl (DetailID, OrderID, PLU, ParentPLU)
SELECT DetailID, OrderDetails.OrderID, OrderDetails.PLU, OrderDetails.ParentPLU
FROM OrderDetails
INNER JOIN PLU
ON PLU.PLU = OrderDetails.PLU
WHERE OrderDetails.OrderID = #OrderID
AND OrderDetails.OrderStore = #Store
SELECT DetailID, OrderID, PLU, ParentPLU,
CASE WHEN ( SELECT COUNT(DISTINCT ParentPLU)
FROM #tbl
WHERE ParentPLU IN (SELECT PLU FROM #tbl)
) > 0 AND ParentPLU = '' THEN 1
WHEN ( SELECT COUNT(DISTINCT ParentPLU)
FROM #tbl
WHERE ParentPLU IN (SELECT PLU FROM #tbl)
) > 0 THEN 2
ELSE
0
END AS ParentChild,
ROW_NUMBER() OVER (ORDER BY
CASE WHEN #SortedBy = 1 THEN OrderID END ASC,
CASE WHEN #SortedBy = 2 THEN DetailID END ASC
) AS ID
FROM #tbl

You can use coalesce to get your desired result. First subquery checks for parent state, second for children. If both are null, then it is orphan
select
DetailID, OrderID, PLU, ParentPLU
, coalesce((
select
distinct 1
from
#tbl b
where
a.PLU = b.ParentPlu
)
, (
select
distinct 2
from
#tbl b
where
b.PLU = a.ParentPlu
), 0)
from
#tbl a

Can I insert a dynamic number of rows into a table using values from the table?

I want to insert a dynamic number of rows into a table, based on information in that table.
I can do it using the code below, but I'm wondering if there's a way to avoid the loop.
The commented out section was my best attempt at what I was trying to do, but it gave me an error of:
"The reference to column "iCount" is not allowed in an argument to a TOP, OFFSET, or FETCH clause. Only references to columns at an outer scope or standalone expressions and subqueries are allowed here."
DECLARE #TableX TABLE (
TDate DATE
, TType INT
, Fruit NVARCHAR(20)
, Vegetable NVARCHAR(20)
, Meat NVARCHAR(20)
, Bread NVARCHAR(20)
)
INSERT INTO #TableX VALUES
('2016-11-10',1,'Apple','Artichoke',NULL,NULL)
, ('2016-11-10',1,'Banana','Beet',NULL,NULL)
, ('2016-11-10',1,'Canteloupe','Cauliflower',NULL,NULL)
, ('2016-11-10',1,'Durian','Daikon',NULL,NULL)
, ('2016-11-10',2,NULL,NULL,'Rabbit','Rye')
, ('2016-11-10',2,NULL,NULL,'Sausage','Sourdough')
, ('2016-11-11',1,'Elderberry','Eggplant',NULL,NULL)
, ('2016-11-11',2,NULL,NULL,'Turkey','Tortilla')
, ('2016-11-11',2,NULL,NULL,'Venison','Vienna')
SELECT * FROM #TableX
DECLARE #BlankRow TABLE (
ID INT IDENTITY
, TDate DATE
, TType INT
, iCount INT
)
DECLARE #Counter1 INT = 0
, #RowCount INT
; WITH BR1
AS (
SELECT TDate, TType, COUNT(*) AS iCount
FROM #TableX
WHERE TType = 1
GROUP BY TDate, TType
)
, BR2
AS (
SELECT TDate, TType, COUNT(*) AS iCount
FROM #TableX
WHERE TType = 2
GROUP BY TDate, TType
)
INSERT INTO #BlankRow
SELECT ISNULL(BR1.TDate, BR2.TDate) AS TDate,
CASE WHEN ISNULL(BR1.iCount,0) < ISNULL(BR2.iCount,0) THEN 1 ELSE 2 END AS TType,
ABS(ISNULL(BR1.iCount,0) - ISNULL(BR2.iCount,0)) AS iCount
FROM BR1
FULL JOIN BR2
ON BR1.TDate = BR2.TDate
WHILE #Counter1 < (SELECT MAX(ID) FROM #BlankRow)
BEGIN
SET #Counter1 += 1
SET #RowCount = (SELECT iCount FROM #BlankRow WHERE ID = #Counter1)
INSERT INTO #TableX
SELECT TOP (#RowCount) tx.TDate, br.TType, NULL, NULL, NULL, NULL
FROM #TableX tx
LEFT JOIN #BlankRow br
ON tx.TDate = br.TDate
WHERE br.ID = #Counter1
END
/*INSERT INTO #TableX
SELECT TOP (tx.iCount) tx.TDate, br.TType, NULL, NULL, NULL, NULL
FROM #TableX tx
JOIN #BlankRow br
ON tx.TDate = br.TDate*/
SELECT *
FROM #TableX
ORDER BY TDate, TType,
ISNULL(Fruit,REPLICATE(CHAR(255),20)),
ISNULL(Vegetable,REPLICATE(CHAR(255),20)),
ISNULL(Meat,REPLICATE(CHAR(255),20)),
ISNULL(Bread,REPLICATE(CHAR(255),20))
The data is silly, I know, but my end goal is to have two different Tablix's in ReportBuilder that end up with the same number of rows so the headers of my groups show up at the same place on the page.

Something like this:
declare #TableX table(TDate date
,TType int
,Fruit nvarchar(20)
,Vegetable nvarchar(20)
,Meat nvarchar(20)
,Bread nvarchar(20)
);
insert into #TableX values
('2016-11-10',1,'Apple','Artichoke',NULL,NULL)
,('2016-11-10',1,'Banana','Beet',NULL,NULL)
,('2016-11-10',1,'Canteloupe','Cauliflower',NULL,NULL)
,('2016-11-10',1,'Durian','Daikon',NULL,NULL)
,('2016-11-10',2,NULL,NULL,'Rabbit','Rye')
,('2016-11-10',2,NULL,NULL,'Sausage','Sourdough')
,('2016-11-11',1,'Elderberry','Eggplant',NULL,NULL)
,('2016-11-11',2,NULL,NULL,'Turkey','Tortilla')
,('2016-11-11',2,NULL,NULL,'Venison','Vienna');
with DataRN as
(
select *
,row_number() over (partition by TDate, TType order by TDate) rn
from #TableX
)
,RowsRN as
(
select tt.TDate
,tt.TType
,td.rn
from (select distinct TDate, TType
from #TableX
) tt
full join (select distinct t1.TDate
,row_number() over (partition by t1.TDate, t1.TType order by t1.TDate) rn
from #TableX t1
) td
on(tt.TDate = td.TDate)
)
select r.TDate
,r.TType
,d.Fruit
,d.Vegetable
,d.Meat
,d.Bread
from DataRN d
full join RowsRN r
on(d.TDate = r.TDate
and d.TType = r.TType
and d.rn = r.rn
)
order by r.TDate
,r.TType
,isnull(d.Fruit,REPLICATE(CHAR(255),20))
,isnull(d.Vegetable,REPLICATE(CHAR(255),20))
,isnull(d.Meat,REPLICATE(CHAR(255),20))
,isnull(d.Bread,REPLICATE(CHAR(255),20))
In response to your comment, here is how you would use another cte to generate the full list of dates that you would need, if you havn't got a Dates reference table already (These are tremendously useful):
declare #MinDate date = (select min(TDate) from #TableX);
declare #MaxDate date = (select max(TDate) from #TableX);
with Dates as
(
select #MinDate as DateValue
union all
select dateadd(d,1,DateValue)
from Dates
where DateValue < #MaxDate
)
select DateValue
from Dates
option (maxrecursion 0);

reuse table data in round robin manner

Let us say I have some data I would like to repeat N times. A naive approach would be this:
IF OBJECT_ID('dbo.Data', 'U') IS NOT NULL
DROP TABLE dbo.Data
CREATE TABLE Data
(
DataId INT NOT NULL PRIMARY KEY,
DataValue NVARCHAR(MAX) NOT NULL
)
INSERT INTO Data (DataId, DataValue)
SELECT 1, 'Value1' UNION ALL
SELECT 2, 'Value2' UNION ALL
SELECT 3, 'Value3' UNION ALL
SELECT 4, 'Value4' UNION ALL
SELECT 5, 'Value5'
DECLARE #RowsRequired INT
DECLARE #Counter INT
DECLARE #NumberOfRows INT
SET #RowsRequired = 22
IF OBJECT_ID('tempdb..#TempData') IS NOT NULL DROP TABLE #TempData
CREATE TABLE #TempData
(
Id INT IDENTITY(1,1),
DataValue NVARCHAR(MAX)
)
SELECT #NumberOfRows = COUNT(*) FROM Data
SET #Counter = 1
WHILE #RowsRequired > 0
BEGIN
INSERT INTO #TempData
SELECT DataValue FROM Data WHERE DataId = #Counter
SET #Counter = #Counter + 1
SET #RowsRequired = #RowsRequired - 1
IF(#Counter > #NumberOfRows)
BEGIN
SET #Counter = 1
END
END
SELECT * FROM #TempData
Here #RowsRequired determines how many rows are required. Could this be rephrased in a set based form? Thanks.
Here is a SQLFiddle with the code.

Try this instead:
DECLARE #RowsRequired INT = 22
;WITH CTE AS
(
SELECT DataId, DataValue, ROW_NUMBER() over (PARTITION BY DataId ORDER BY DataId) sort
FROM DATA
CROSS JOIN
(
SELECT TOP (#RowsRequired) 0 d
FROM master..spt_values
) d
)
SELECT TOP (#RowsRequired) ROW_NUMBER() over (order by sort), DataValue
FROM CTE
ORDER BY sort, 1

I tried this and worked for me.
declare #requiredrows int
set #requiredrows = 22;
declare #foreachrow int
select #foreachrow = #requiredrows / Count(*) from Data;
select top (#requiredrows) * from
(
select *, ROW_NUMBER() over(partition by dataId order by number) rno
from Data
Cross Join master..spt_values
) A
where rno <= #foreachrow + 1
Hope it will help.

Percentage of Values for Top 3 from a Character Field

I have an unusual situation. Please consider the following code:
IF OBJECT_ID('tempdb..#CharacterTest') IS NOT NULL
DROP TABLE #CharacterTest
CREATE TABLE #CharacterTest
(
[ID] int IDENTITY(1, 1) NOT NULL,
[CharField] varchar(50) NULL
)
INSERT INTO #CharacterTest (CharField)
VALUES ('A')
, ('A')
, ('A')
, ('A')
, ('B')
, ('B')
, ('B')
, ('C')
, ('C')
, ('D')
, ('D')
, ('F')
, ('G')
, ('H')
, ('I')
, ('J')
, ('K')
, ('L')
, ('M')
, ('N')
, (' ')
, (' ')
, (' ')
, (NULL)
, ('');
I would like a query which gives me a character string like this:
A (16%), B (12%), C(8%)
Please notice the following:
I don't want to have empty strings, strings with all blanks, or nulls listed in the top 3, but I do want the percentage of values calculated using the entire record count for the table.
Ties can be ignored, so if there were 22 values in the list with 8% frequency, it's alright to simply return whichever one is first.
Percentages can be rounded to whole numbers.
I'd like to find the easiest way to write this query while still retaining T-SQL compatibility back to SQL Server 2005. What is the best way to do this? Window Functions?

I'd go for.
WITH T1
AS (SELECT [CharField],
100.0 * COUNT(*) OVER (PARTITION BY [CharField]) /
COUNT(*) OVER () AS Pct
FROM #CharacterTest),
T2
AS (SELECT DISTINCT TOP 3 *
FROM T1
WHERE [CharField] <> '' --Excludes all blank or NULL as well
ORDER BY Pct DESC)
SELECT STUFF((SELECT ',' + [CharField] + ' (' + CAST(CAST(ROUND(Pct,1) AS INT) AS VARCHAR(3)) + ')'
FROM T2
ORDER BY Pct DESC
FOR XML PATH('')), 1, 1, '') AS Result

My first attempt would probably be this. Not saying that it's the best way to handle it, but that it would work.
DECLARE #TotalCount INT
SELECT #TotalCount = COUNT(*) FROM #CharacterTest AS ct
SELECT TOP(3) CharField, COUNT(*) * 1.0 / #TotalCount AS OverallPercentage
FROM #CharacterTest AS ct
WHERE CharField IS NOT NULL AND REPLACE(CharField, ' ', '') <> ''
GROUP BY CharField
ORDER BY COUNT(*) desc
DROP TABLE #CharacterTest

This should get the character string you need:
declare #output varchar(200);
with cte as (
select CharField
, (count(*) * 100) / (select count(*) from #CharacterTest) as CharPct
, row_number() over (order by count(*) desc, CharField) as RowNum
from #CharacterTest
where replace(CharField, ' ', '') not like ''
group by CharField
)
select #output = coalesce(#output + ', ', '') + CharField + ' (' + cast(CharPct as varchar(11)) + '%)'
from cte
where RowNum <= 3
order by RowNum;
select #output;
-- Returns:
-- A (16%), B (12%), C (8%)
I would draw attention to storing a single character in a varchar(50) column, however.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Query to find duplicate rows in a table - tsql

If this is a one time job, and involves a huge number of rows, and not to be made as a View, then perhaps you'd opt to INSERT SELECT it into a table with UNIQUE index with IGNORE_DUP_KEY option.

Related

SQL Server split overlapping date ranges

sql compare columns to get result

Can I insert a dynamic number of rows into a table using values from the table?

reuse table data in round robin manner

Percentage of Values for Top 3 from a Character Field

Categories

Resources