I need to split date ranges that overlap. I have a primary table (I've called it Employment for this example), and I need to return all Begin-End date ranges for a person from this table. I also have multiple sub tables (represented by Car and Food), and I want to return the value that was active in the sub tables during the times given in the main tables. This will involve splitting the main table date ranges when a sub table item changes.
I don't want to return sub table information for dates not in the main tables.
DECLARE @Employment TABLE
( Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE @Car TABLE
( Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE @Food TABLE
( Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE )
INSERT INTO @Employment ( [Person_ID], [Employment], [Begin_Date], [End_Date] )
VALUES ( 123 , 'ACME' , '1986-01-01' , '1990-12-31' )
, ( 123 , 'Office Corp' , '1995-05-15' , '1998-10-03' )
, ( 123 , 'Job 3' , '1998-10-04' , '2999-12-31' )
INSERT INTO @Car ( [Person_ID] , [Car] , [Begin_Date] , [End_Date] )
VALUES ( 123, 'Red Car', '1986-05-01', '1997-06-23' )
, ( 123, 'Blue Car', '1997-07-03', '2999-12-31' )
INSERT INTO @Food ( [Person_ID], [Food], [Begin_Date], [End_Date] )
VALUES ( 123, 'Eggs', '1997-01-01', '1997-03-09' )
, ( 123, 'Donuts', '2001-02-23', '2001-02-25' )
For the above data, the results should be:
Person_ID  Employment   Food    Car       Begin_Date  End_Date
---------  -----------  ------  --------  ----------  ----------
123        ACME         NULL    NULL      1986-01-01  1986-04-30
123        ACME         NULL    Red Car   1986-05-01  1990-12-31
123        Office Corp  NULL    Red Car   1995-05-15  1996-12-31
123        Office Corp  Eggs    Red Car   1997-01-01  1997-03-09
123        Office Corp  NULL    Red Car   1997-03-10  1997-06-23
123        Office Corp  NULL    NULL      1997-06-24  1997-07-02
123        Office Corp  NULL    Blue Car  1997-07-03  1998-10-03
123        Job 3        NULL    Blue Car  1998-10-04  2001-02-22
123        Job 3        Donuts  Blue Car  2001-02-23  2001-02-25
123        Job 3        NULL    Blue Car  2001-02-26  2999-12-31
The first row is his time working for ACME, where he didn't have a car or a weird food obsession. In the second row, he purchased a car, and still worked at ACME. In the third row, he changed jobs to Office Corp, but still has the Red Car. Note how we're not returning data during his unemployment gap, even though he had the Red Car. We only want to know what was in the Car and Food tables during the times there are values in the Employment table.
I found a solution for SQL Server 2012 that uses the LEAD/LAG functions to accomplish this, but I'm stuck with 2008 R2.
To change the 2012 solution from that blog to work with 2008, you need to replace the LEAD in the following:
with
ValidDates as …
,
ValidDateRanges1 as
(
select EmployeeNo, Date as ValidFrom, lead(Date,1) over (partition by EmployeeNo order by Date) ValidTo
from ValidDates
)
There are a number of ways to do this, but one example is a self join to the same table + 1 row (which is effectively what a LEAD does). One way to do this is to put a row number on the previous table (so it is easy to find the next row) by adding another intermediate CTE (e.g. ValidDatesWithRowno). Then do a left outer join to that table where EmployeeNo is the same and RowNo = RowNo + 1, and use that value to replace the LEAD. If you wanted a LEAD 2, you would join to RowNo + 2, etc. So the 2008 version would look something like:
with
ValidDates as …
,
ValidDatesWithRowno as --This is the ValidDates + a RowNo for easy self joining below
(
select EmployeeNo, Date, ROW_NUMBER() OVER (ORDER BY EmployeeNo, Date) as RowNo from ValidDates
)
,
ValidDateRanges1 as
(
select VD.EmployeeNo, VD.Date as ValidFrom, VDLead1.Date as ValidTo
from ValidDatesWithRowno VD
left outer join ValidDatesWithRowno VDLead1 on VDLead1.EmployeeNo = VD.EmployeeNo
and VDLead1.RowNo = VD.RowNo + 1
)
The rest of the solution described looks like it will work like you want on 2008.
Here is the answer I came up with. It works, but it's not very pretty.
It goes in two waves: first splitting any overlapping Employment/Car dates, then running the same SQL a second time to add the Food dates and split any overlaps again.
DECLARE @Employment TABLE
( Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE @Car TABLE
( Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE @Food TABLE
( Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE )
INSERT INTO @Employment ( [Person_ID], [Employment], [Begin_Date], [End_Date] )
VALUES ( 123 , 'ACME' , '1986-01-01' , '1990-12-31' )
, ( 123 , 'Office Corp' , '1995-05-15' , '1998-10-03' )
, ( 123 , 'Job 3' , '1998-10-04' , '2999-12-31' )
INSERT INTO @Car ( [Person_ID] , [Car] , [Begin_Date] , [End_Date] )
VALUES ( 123, 'Red Car', '1986-05-01', '1997-06-23' )
, ( 123, 'Blue Car', '1997-07-03', '2999-12-31' )
INSERT INTO @Food ( [Person_ID], [Food], [Begin_Date], [End_Date] )
VALUES ( 123, 'Eggs', '1997-01-01', '1997-03-09' )
, ( 123, 'Donuts', '2001-02-23', '2001-02-25' )
DECLARE @Person_ID INT = 123;
--A table to hold date ranges that need to be merged together
DECLARE @DatesToMerge TABLE
(
ID INT,
Person_ID INT,
Date_Type VARCHAR(10),
Begin_Date DATETIME,
End_Date DATETIME
)
INSERT INTO @DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Car])
, Person_ID
, 'Car'
, Begin_Date
, End_Date
FROM @Car
WHERE Person_ID = @Person_ID
INSERT INTO @DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Employment])
, Person_ID
, 'Employment'
, Begin_Date
, End_Date
FROM @Employment
WHERE Person_ID = @Person_ID;
--A table to hold the merged @Employment and Car records
DECLARE @EmploymentAndCar TABLE
(
RowNumber INT,
Person_ID INT,
Begin_Date DATETIME,
End_Date DATETIME
)
;
WITH CarCTE AS
(--This CTE grabs just the Car rows so we can compare and split dates from them
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM @DatesToMerge
WHERE Date_Type = 'Car'
),
NewRowsCTE AS
( --This CTE creates just new rows starting after the Car dates for each @Employment date range
SELECT a.ID,
a.Person_ID,
a.Date_Type,
DATEADD(DAY,1,b.End_Date) AS Begin_Date,
a.End_Date
FROM @DatesToMerge a
INNER JOIN CarCTE b
ON a.Begin_Date <= b.Begin_Date
AND a.End_Date > b.Begin_Date
AND a.End_Date > b.End_Date -- This is needed because if both the Car and @Employment ranges end on the same date, there would be a split row after
),
UnionCTE AS
( -- This CTE merges the new rows with the existing ones
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM @DatesToMerge
UNION ALL
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM NewRowsCTE
),
FixEndDateCTE AS
(
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date) AS FixID,
MIN(d.Begin_Date) AS Begin_Date
FROM UnionCTE c
LEFT OUTER JOIN CarCTE d
ON c.Begin_Date < d.Begin_Date
AND c.End_Date >= d.Begin_Date
WHERE c.Date_Type <> 'Car'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date)
),
Finalize AS
(
SELECT ROW_NUMBER() OVER (ORDER BY e.Begin_Date) AS RowNumber,
e.Person_ID,
e.Begin_Date,
CASE WHEN f.Begin_Date IS NULL THEN e.End_Date
ELSE DATEADD (DAY,-1,f.Begin_Date)
END AS EndDate
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.Begin_Date)) = f.FixID
)
INSERT INTO @EmploymentAndCar ( RowNumber, Person_ID, Begin_Date, End_Date )
SELECT F.RowNumber
, F.Person_ID
, F.Begin_Date
, F.EndDate
FROM Finalize F
INNER JOIN @Employment Employment
ON F.Begin_Date BETWEEN Employment.Begin_Date AND Employment.End_Date AND Employment.Person_ID = @Person_ID
ORDER BY F.Begin_Date
--------------------------------------------------------------------------------------------------
--Now that the Employment and Car dates have been merged, empty the DatesToMerge table
DELETE FROM @DatesToMerge;
--Reload the DatesToMerge table with the newly-merged Employment and Car records,
--and the Food records that still need to be merged
INSERT INTO @DatesToMerge
SELECT RowNumber
, Person_ID
, 'PtBCar'
, Begin_Date
, End_Date
FROM @EmploymentAndCar
WHERE Person_ID = @Person_ID
INSERT INTO @DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Food])
, Person_ID
, 'Food'
, Begin_Date
, End_Date
FROM @Food
WHERE Person_ID = @Person_ID
;
WITH CarCTE AS
(--This CTE grabs just the Food rows so we can compare and split dates from them
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM @DatesToMerge
WHERE Date_Type = 'Food'
),
NewRowsCTE AS
( --This CTE creates just new rows starting after the Food dates for each Employment date range
SELECT a.ID,
a.Person_ID,
a.Date_Type,
DATEADD(DAY,1,b.End_Date) AS Begin_Date,
a.End_Date
FROM @DatesToMerge a
INNER JOIN CarCTE b
ON a.Begin_Date <= b.Begin_Date
AND a.End_Date > b.Begin_Date
AND a.End_Date > b.End_Date -- This is needed because if both the Food and Car/Employment ranges end on the same date, there would be a split row after
),
UnionCTE AS
( -- This CTE merges the new rows with the existing ones
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM @DatesToMerge
UNION ALL
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM NewRowsCTE
),
FixEndDateCTE AS
(
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date) AS FixID,
MIN(d.Begin_Date) AS Begin_Date
FROM UnionCTE c
LEFT OUTER JOIN CarCTE d
ON c.Begin_Date < d.Begin_Date
AND c.End_Date >= d.Begin_Date
WHERE c.Date_Type <> 'Food'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date)
),
Finalize AS
(
SELECT ROW_NUMBER() OVER (ORDER BY e.Begin_Date) AS RowNumber,
e.Person_ID,
e.Begin_Date,
CASE WHEN f.Begin_Date IS NULL THEN e.End_Date
ELSE DATEADD (DAY,-1,f.Begin_Date)
END AS EndDate
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.Begin_Date)) = f.FixID
)
SELECT DISTINCT
F.Person_ID
, Employment
, Car
, Food
, F.Begin_Date
, F.EndDate
FROM Finalize F
INNER JOIN @Employment Employment
ON F.Begin_Date BETWEEN Employment.Begin_Date AND Employment.End_Date AND Employment.Person_ID = @Person_ID
LEFT JOIN @Car Car
ON Car.[Begin_Date] <= F.Begin_Date
AND Car.[End_Date] >= F.[EndDate]
AND Car.Person_ID = @Person_ID
LEFT JOIN @Food Food
ON Food.[Begin_Date] <= F.[Begin_Date]
AND Food.[End_Date] >= F.[EndDate]
AND Food.Person_ID = @Person_ID
ORDER BY F.Begin_Date
If anyone has a more elegant solution, I will be happy to accept their answer.
I have a problem regarding missing rows in a table, and it is giving me a headache.
As base data, I have the following table:
declare @table table
(
id1 int,
id2 int,
ch char(1) not null,
val int
)
insert into @table values (1112, 121, 'A', 12)
insert into @table values (1351, 121, 'A', 13)
insert into @table values (1411, 121, 'B', 81)
insert into @table values (1312, 7, 'C', 107)
insert into @table values (1401, 2, 'A', 107)
insert into @table values (1454, 2, 'D', 107)
insert into @table values (1257, 6, 'A', 1)
insert into @table values (1269, 6, 'B', 12)
insert into @table values (1335, 6, 'C', 12)
insert into @table values (1341, 6, 'D', 5)
insert into @table values (1380, 6, 'A', 3)
The output should be ordered by id2 and follow a fixed sequence of ch, which should repeat until the next id2 begins.
Sequence:
'A'
'B'
'C'
'D'
If the sequence or pattern is interrupted, it should fill the missing rows with NULL, so that I get this result table:
id1 id2 ch val
----------------------------
1112 121 'A' 12
NULL 121 'B' NULL
NULL 121 'C' NULL
NULL 121 'D' NULL
1351 121 'A' 13
1411 121 'B' 81
NULL 121 'C' NULL
NULL 121 'D' NULL
NULL 7 'A' NULL
NULL 7 'B' NULL
1312 7 'C' 107
NULL 7 'D' NULL
1401 2 'A' 107
NULL 2 'B' NULL
NULL 2 'C' NULL
1454 2 'D' 107
and so on...
What I'm looking for is a way to do this without iterations.
I hope someone can help!
Thanks in advance!
A solution might be this:
declare @table table ( id1 int, id2 int, ch char(1) not null, val int )
insert into @table values (1112, 121, 'A', 12)
,(1351, 121, 'A', 13),(1411, 121, 'B', 81),(1312, 7, 'C', 107),(1401, 2, 'A', 107)
,(1454, 2, 'D', 107),(1257, 6, 'A', 1),(1269, 6, 'B', 12),(1335, 6, 'C', 12)
,(1341, 6, 'D', 5),(1380, 6, 'A', 3)
;with foo as
(select
*
,row_number() over (partition by id2 order by id1) rwn
,ascii(isnull(lag(ch,1) over (partition by id2 order by id1),'A'))-ascii('A') prev
,count(*) over (partition by id2,ch) nr
,ascii(ch)-ascii('A') cur
from @table
)
,bar as
(
select
*,case when cur<=prev and rwn>1 then 4 else 0 end + cur-prev step
from foo
)
,foobar as
(
select *,sum(step) over (partition by id2 order by id1 rows unbounded preceding) rownum
from bar
)
,iterations as
(
select id2,max(nr) nr from foo
group by id2
)
,blanks as
(
select
id2,ch chnr,char(ch+ascii('A') )ch,ROW_NUMBER() over (partition by id2 order by c.nr,ch)-1 rownum,c.nr
from iterations a
inner join (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) c(nr)
on c.nr<=a.nr
cross join (values (0),(1),(2),(3)) b(ch)
)
select
b.id1,a.id2,a.ch,b.val
from blanks a
left join foobar b
on a.id2=b.id2 and a.rownum=b.rownum
order by a.id2,a.rownum
I first build the query "foo", which adds a row number and gets the previous value of ch for each id2.
"bar" then works out how many positions each row is ahead of the previous one. For instance, if the previous was an 'A' and the current is a 'C', the step is 2. If the previous was an 'A' and the current is another 'A', the step is 4.
"foobar" then adds up the steps, thus numbering the original rows according to where they should sit in the final output.
"iterations" counts the number of times the "ABCD" block should appear for each id2.
"blanks" then builds all the final rows: for each id2 it outputs every "ABCD" row that should be in the final output and numbers them in rownum.
Finally I left join "blanks" with "foobar" on id2 and rownum. Thus we get the correct number of rows, and the original values appear in the places where they exist.
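To make the step arithmetic concrete, here is a hand-traced illustration for the id2 = 121 rows of the sample data (my own tracing, not part of the original answer); prev and cur are the letters' offsets from 'A':
id1   ch  prev  cur  step  rownum
----  --  ----  ---  ----  ------
1112  A   0     0    0     0
1351  A   0     0    4     4
1411  B   0     1    1     5
"blanks" generates eight slots for id2 = 121 (rownum 0 to 7, two ABCD blocks because 'A' appears twice), so slots 1-3 and 6-7 find no match in "foobar" and come back as the NULL B, C, D rows in the expected output.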
If you can manage to add an extra column to your table that defines which [id2] rows are part of the same sequence, you can try this:
declare @table table
(
id1 int,
id2 int,
ch char(1) not null,
val int,
category int -- extra column
)
insert into @table values (1112, 121, 'A', 12, 1)
insert into @table values (1351, 121, 'A', 13, 2)
insert into @table values (1411, 121, 'B', 81, 2)
insert into @table values (1312, 7, 'C', 107, 3)
insert into @table values (1401, 2, 'A', 107, 4)
insert into @table values (1454, 2, 'D', 107, 4)
insert into @table values (1257, 6, 'A', 1, 5)
insert into @table values (1269, 6, 'B', 12, 5)
insert into @table values (1335, 6, 'C', 12, 5)
insert into @table values (1341, 6, 'D', 5, 5)
insert into @table values (1380, 6, 'A', 3, 5)
DECLARE @sequence table (seq varchar(1))
INSERT INTO @sequence values ('A'), ('B'), ('C'), ('D')
SELECT b.id1, a.id2, a.seq, b.val, a.category
INTO #T1
FROM (
SELECT *
FROM @table
CROSS JOIN @sequence
) A
LEFT JOIN (
SELECT * FROM @table
) B
ON 1=1
AND a.id1 = b.id1
AND a.id2 = b.id2
AND a.seq = b.ch
AND a.val = b.val
;WITH rem_duplicates AS (
SELECT *, dup = ROW_NUMBER() OVER (PARTITION by id2, seq, category ORDER BY id1 DESC)
FROM #T1
) DELETE FROM rem_duplicates WHERE dup > 1
SELECT * FROM #T1 ORDER BY id2 DESC, category ASC, seq ASC
DROP TABLE #T1
I'm a little confused by your output, but try this:
Update
DECLARE @table TABLE
(
row INT IDENTITY(1, 1) ,
id1 INT ,
id2 INT ,
ch CHAR(1) NOT NULL ,
val INT
);
DECLARE @Sequence TABLE ( ch3 CHAR(1) NOT NULL );
INSERT INTO @Sequence
VALUES ( 'A' );
INSERT INTO @Sequence
VALUES ( 'B' );
INSERT INTO @Sequence
VALUES ( 'C' );
INSERT INTO @Sequence
VALUES ( 'D' );
INSERT INTO @table
VALUES ( 1112, 121, 'A', 12 );
INSERT INTO @table
VALUES ( 1351, 121, 'A', 13 );
INSERT INTO @table
VALUES ( 1411, 121, 'B', 81 );
INSERT INTO @table
VALUES ( 1312, 7, 'C', 107 );
INSERT INTO @table
VALUES ( 1401, 2, 'A', 107 );
INSERT INTO @table
VALUES ( 1454, 2, 'D', 107 );
INSERT INTO @table
VALUES ( 1257, 6, 'A', 1 );
INSERT INTO @table
VALUES ( 1269, 6, 'B', 12 );
INSERT INTO @table
VALUES ( 1335, 6, 'C', 12 );
INSERT INTO @table
VALUES ( 1341, 6, 'D', 5 );
INSERT INTO @table
VALUES ( 1380, 6, 'A', 3 );
SELECT r.id1 ,
fin.id2 ,
ch3 ,
r.val
FROM ( SELECT *
FROM ( SELECT CASE WHEN r.chd - l.chd = 1 THEN 0
ELSE 1
END [gap in sq] ,
l.*
FROM ( SELECT id2 ,
ASCII(ch) chd ,
ch ,
val ,
id1 ,
row
FROM @table
) AS l
LEFT JOIN ( SELECT id2 ,
ASCII(ch) chd ,
row
FROM @table
) AS r ON l.row = r.row - 1
) AS temp ,
@Sequence s
WHERE temp.[gap in sq] = 1
OR ( temp.[gap in sq] = 0
AND s.ch3 = temp.ch
)
) AS fin
LEFT JOIN @table r ON r.id2 = fin.id2
AND r.id1 = fin.id1
AND r.ch = fin.ch3
I am running the following query, which is terribly inefficient and can take hours. I am having SQL brain farts today and I do not know how to improve this query. There are several nullable varchar fields, and I need to identify the duplicate rows (all columns containing values identical to another row's).
select * from transactions x where exists (
select Coalesce(ColA, ''),
Coalesce(ColB, ''),
Coalesce(ColC, '')
from transactions y
where Coalesce(x.ColA, '') = Coalesce(y.ColA, '') and
Coalesce(x.ColB, '') = Coalesce(y.ColB, '') and
Coalesce(x.ColC, '') = Coalesce(y.ColC, '')
group by Coalesce(ColA, ''),
Coalesce(ColB, ''),
Coalesce(ColC, '')
having count(*) > 1
)
Why does this take so long to run? There has to be a better way.
You could improve it by
removing unnecessary checks
putting a composite index on ColA, ColB and ColC (a sketch follows after the query below)
What is unnecessary? It seems to be unnecessary to join the table with itself. Why don't you use a simple GROUP BY? You also don't need the WHERE:
SELECT COALESCE(ColA, '') AS ColA,
COALESCE(ColB, '') AS ColB,
COALESCE(ColC, '') AS ColC,
Count(*) As Cnt
FROM transactions t
GROUP BY COALESCE(ColA, ''), COALESCE(ColB, ''), COALESCE(ColC, '')
HAVING Count(*) > 1
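As for the composite index mentioned above, a minimal sketch, assuming the table is dbo.transactions with the three varchar columns from the question (the index name is illustrative):
-- Covers the three columns used for grouping/duplicate detection
CREATE NONCLUSTERED INDEX IX_transactions_ColA_ColB_ColC
ON dbo.transactions (ColA, ColB, ColC);
A narrow index like this at least lets the engine read only the three columns instead of the whole row; whether it can be used directly for the grouping depends on the COALESCE expressions.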
Does this work?
DECLARE @transactions TABLE (
ColA INT
, ColB INT
, ColC INT
, ColD INT
, ColE INT
, ColF INT
)
DECLARE @Counter1 INT = 0
WHILE @Counter1 < 10000
BEGIN
SET @Counter1 += 1
INSERT INTO @transactions
SELECT ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
, ROUND(RAND()*10,0)
END
;WITH Dupe
AS (
SELECT *, ROW_NUMBER() OVER
(PARTITION BY ColA, ColB, ColC, ColD, ColE, ColF
ORDER BY ColA, ColB, ColC, ColD, ColE, ColF) AS rn
FROM @transactions
)
SELECT * FROM Dupe WHERE rn > 1
You can use an ISNULL on anything where you need to compare a value that might be null. Note that most of this I've written is just to generate a useful data set. With 6 columns and 10,000 rows I got 42 identical rows in less than a second. No triples. Bumped it up to 100,000 rows and I got 3,489 duplicate rows, including some triples. Took 3 seconds.
Here's an example using text. This whole thing took 25 seconds on 100,000 records, although my timer shows that less than 4 seconds of that was finding the duplicates, with the remainder being the table population.
DECLARE @transactions2 TABLE (
ColA NVARCHAR(30)
, ColB NVARCHAR(30)
, ColC NVARCHAR(30)
, ColD NVARCHAR(30)
, ColE NVARCHAR(30)
, ColF NVARCHAR(30)
)
DECLARE @names TABLE (
ID INT IDENTITY
, Name NVARCHAR(30)
)
DECLARE @Counter2 INT = 0
, @ColA NVARCHAR(30)
, @ColB NVARCHAR(30)
, @ColC NVARCHAR(30)
, @ColD NVARCHAR(30)
, @ColE NVARCHAR(30)
, @ColF NVARCHAR(30)
INSERT INTO @names VALUES
('Anderson, Arthur')
, ('Broberg, Bruce')
, ('Chan, Charles')
, ('Davidson, Darwin')
, ('Eggert, Emily')
, ('Fox, Francesca')
, ('Garbo, Greta')
, ('Hollande, Hortense')
, ('Iguadolla, Ignacio')
, ('Jackson, Jurimbo')
, ('Katana, Ken')
, ('Lawrence, Larry')
, ('McDonald, Michael')
, ('Nyugen, Nathan')
, ('O''Dell, Oliver')
, ('Peterson, Phillip')
, ('Quigley, Quentin')
, ('Ramallah, Rodolfo')
, ('Smith, Samuel')
, ('Turner, Theodore')
, ('Uno, Umberto')
, ('Victor, Victoria')
, ('Wallace, William')
, ('Xing, Xiopan')
, ('Young, Yvette')
, ('Zapata, Zorro')
, (NULL)
WHILE @Counter2 < 100000
BEGIN
SET @Counter2 += 1
SET @ColA = (SELECT Name FROM @names WHERE ID = ROUND(RAND()*27 +.5,0))
SET @ColB = (SELECT Name FROM @names WHERE ID = ROUND(RAND()*27 +.5,0))
SET @ColC = (SELECT Name FROM @names WHERE ID = ROUND(RAND()*27 +.5,0))
SET @ColD = (SELECT Name FROM @names WHERE ID = ROUND(RAND()*27 +.5,0))
SET @ColE = (SELECT Name FROM @names WHERE ID = ROUND(RAND()*27 +.5,0))
SET @ColF = (SELECT Name FROM @names WHERE ID = ROUND(RAND()*27 +.5,0))
INSERT INTO @transactions2
SELECT @ColA, @ColB, @ColC, @ColD, @ColE, @ColF
END
PRINT CAST(GETDATE() AS DateTime2 (3))
;WITH Dupe
AS (
SELECT *, ROW_NUMBER() OVER
(PARTITION BY ISNULL(ColA,''), ISNULL(ColB,''), ISNULL(ColC,''), ISNULL(ColD,''), ISNULL(ColE,''), ISNULL(ColF,'')
ORDER BY ISNULL(ColA,''), ISNULL(ColB,''), ISNULL(ColC,''), ISNULL(ColD,''), ISNULL(ColE,''), ISNULL(ColF,'')) AS rn
FROM @transactions2
)
SELECT * FROM Dupe WHERE rn > 1 ORDER BY rn
PRINT CAST(GETDATE() AS DateTime2 (3))
Here is a much faster way using a subquery join. It ran in under 10 seconds
select * from transactions x
join (
select Coalesce(ColA, '') as ColA,
Coalesce(ColB, '') as ColB,
Coalesce(ColC, '') as ColC
from transactions
group by Coalesce(ColA, ''),
Coalesce(ColB, ''),
Coalesce(ColC, '')
having count(*) > 1
) dups on
dups.ColA = Coalesce(x.ColA, '') and
dups.ColB = Coalesce(x.ColB, '') and
dups.ColC = Coalesce(x.ColC, '')
The important thing about this query is that it returns both/all rows, not just the duplicate(s)
If this is a one-time job that involves a huge number of rows and is not to be made into a view, then perhaps you'd opt to INSERT ... SELECT it into a table that has a UNIQUE index with the IGNORE_DUP_KEY option.
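A rough sketch of that idea, assuming the dbo.transactions table from the question; the target table name, column types, and lengths are placeholders:
-- Target table holding one row per distinct key combination
CREATE TABLE dbo.transactions_dedup (
ColA VARCHAR(100) NOT NULL
, ColB VARCHAR(100) NOT NULL
, ColC VARCHAR(100) NOT NULL
);
-- IGNORE_DUP_KEY silently skips inserts whose key already exists
CREATE UNIQUE INDEX UX_transactions_dedup
ON dbo.transactions_dedup (ColA, ColB, ColC)
WITH (IGNORE_DUP_KEY = ON);
INSERT INTO dbo.transactions_dedup (ColA, ColB, ColC)
SELECT Coalesce(ColA, ''), Coalesce(ColB, ''), Coalesce(ColC, '')
FROM dbo.transactions;
This leaves one copy of each distinct (ColA, ColB, ColC) combination in the new table; the duplicate insert attempts are reported as a "Duplicate key was ignored" warning and discarded.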
Can anyone help me with the query?
I've tried the following, but it comes up with an error:
SELECT Column1, Column2, Column3 FROM Table WHERE [Column1] NOT IN
(SELECT [Column1] FROM Table GROUP BY [Column1] HAVING COUNT([Column]) > 1)
Invalid MEMO, OLE, or Hyperlink Object in subquery [Column1].
Use Group By with Having clause:
SELECT Column1, MIN(Column2) AS Column2, MIN(Column3) AS Column3
FROM dbo.Table
GROUP BY Column1
HAVING ( COUNT(Column1) = 1 )
Should work since there's only one row per "group".
Your original query should work; you just had [Column] instead of [Column1].
SELECT Column1, Column2, Column3 FROM TableName WHERE [Column1] NOT IN
(SELECT [Column1] FROM TableName GROUP BY [Column1] HAVING COUNT(Column1) > 1)
see: http://sqlfiddle.com/#!3/d99a8/5/0
As far as I get it, you need all data where [Column1] is unique (appears just one time):
DECLARE @x TABLE (col1 INT, col2 INT, col3 INT)
INSERT INTO @x
( [col1], [col2], [col3] )
VALUES ( 1, 2, 3 )
,( 1, 4, 5 )
,( 2, 6, 7 )
SELECT * FROM @x
SELECT col1, col2, col3 FROM @x
WHERE col1 NOT IN
( SELECT [col1] FROM @x GROUP BY [col1] HAVING COUNT(*) > 1 )
I am using SSMS 2008 and trying to concatenate the values of one column together based on another field's grouping. I have two columns, people_id and address_desc. They look like this:
address_desc                          people_id
------------------------------------  ------------------------------------
Murfreesboro, TN 37130                F15D1135-9947-4F66-B778-00E43EC44B9E
11 Mohawk Rd., Burlington, MA 01803   C561918F-C2E9-4507-BD7C-00FB688D2D6E
Unknown, UN 00000                     C561918F-C2E9-4507-BD7C-00FB688D2D6E
Jacksonville, NC 28546                FC7C78CD-8AEA-4C8E-B93D-010BF8E4176D
Memphis, TN 38133                     8ED8C601-5D35-4EB7-9217-012905D6E9F1
44 Maverick St., Fitchburg, MA        8ED8C601-5D35-4EB7-9217-012905D6E9F1
Now I want to concatenate the address_desc values per people_id. So the first one here should just display "Murfreesboro, TN 37130" for address_desc. But the second person should have just one line instead of two, which says "11 Mohawk Rd., Burlington, MA 01803;Unknown, UN 00000" for address_desc.
How do I do this? I tried using a CTE, but it was giving me an ambiguity error:
WITH CTE ( people_id, address_list, address_desc, length )
AS ( SELECT people_id, CAST( '' AS VARCHAR(8000) ), CAST( '' AS VARCHAR(8000) ), 0
FROM dbo.address_view
GROUP BY people_id
UNION ALL
SELECT p.people_id, CAST( address_list +
CASE WHEN length = 0 THEN '' ELSE ', ' END + c.address_desc AS VARCHAR(8000) ),
CAST( c.address_desc AS VARCHAR(8000)), length + 1
FROM CTE c
INNER JOIN dbo.address_view p
ON c.people_id = p.people_id
WHERE p.address_desc > c.address_desc )
SELECT people_id, address_list
FROM ( SELECT people_id, address_list,
RANK() OVER ( PARTITION BY people_id ORDER BY length DESC )
FROM CTE ) D ( people_id, address_list, rank )
WHERE rank = 1 ;
Here was my initial SQL query:
SELECT a.address_desc, a.people_id
FROM dbo.address_view a
INNER JOIN (SELECT people_id
FROM dbo.address_view
GROUP BY people_id
HAVING COUNT(*) > 1) t
ON a.people_id = t.people_id
order by a.people_id
You can use FOR XML PATH('') like this:
DECLARE @TestData TABLE
(
address_desc NVARCHAR(100) NOT NULL
,people_id UNIQUEIDENTIFIER NOT NULL
);
INSERT @TestData
SELECT 'Murfreesboro, TN 37130', 'F15D1135-9947-4F66-B778-00E43EC44B9E'
UNION ALL
SELECT '11 Mohawk Rd., Burlington, MA 01803', 'C561918F-C2E9-4507-BD7C-00FB688D2D6E'
UNION ALL
SELECT 'Unknown, UN 00000', 'C561918F-C2E9-4507-BD7C-00FB688D2D6E'
UNION ALL
SELECT 'Memphis, TN 38133', '8ED8C601-5D35-4EB7-9217-012905D6E9F1'
UNION ALL
SELECT '44 Maverick St., Fitchburg, MA', '8ED8C601-5D35-4EB7-9217-012905D6E9F1';
SELECT a.people_id,
(SELECT SUBSTRING(
(SELECT ';'+b.address_desc
FROM @TestData b
WHERE a.people_id = b.people_id
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
,2
,4000)
) GROUP_CONCATENATE
FROM @TestData a
GROUP BY a.people_id
Results:
people_id GROUP_CONCATENATE
------------------------------------ ------------------------------------------------------
F15D1135-9947-4F66-B778-00E43EC44B9E Murfreesboro, TN 37130
C561918F-C2E9-4507-BD7C-00FB688D2D6E 11 Mohawk Rd., Burlington, MA 01803;Unknown, UN 00000
8ED8C601-5D35-4EB7-9217-012905D6E9F1 Memphis, TN 38133;44 Maverick St., Fitchburg, MA