DB-Fiddle
create table items (
name varchar(15) not null,
id1 integer,
id2 integer,
UNIQUE(id1),
UNIQUE(id2)
);
insert into items (name, id1, id2) values
('a', 1, null),
('b', 2, null),
('c', null, 2);
select * from items where id1=2
union
select * from items where id2=2
and id2 not in (select id2 from items where id1=2);
I have a table where there are multiple fields containing the unique id belonging to a given item. In my example, either id1 or id2 contains this value. My goal would be to only rely on id2 if the item cannot be found via id1. So, I would expect to always get back b in my example.
I have managed to get this working via a union, but it seems like a very hacky solution with bad performance. A better solution seems to me is to filter on the client side. What do you think?
We can use ROW_NUMBER here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY COALESCE(id1, id2) ORDER BY id1) rn
FROM items
)
SELECT name, id1, id2
FROM cte
WHERE rn = 1;
Each logical pair of records is described by COALESCE(id1, id2) being the same between them (if a pair exists). Within each pair, we choose the record having the lowest id1 value, which would mean the non null record.
I would use
SELECT *
FROM (SELECT * FROM items WHERE id1 = 2
UNION ALL
SELECT * FROM items WHERE id1 = 2) AS q
ORDER BY id2 IS NULL
FETCH FIRST 1 ROWS ONLY;
That relies on the fact that FALSE < TRUE.
I didn't find my case on the Internet. Tell me how i can delete duplicates if the values are in different columns.
I have a table with a lot of values, for example:
|Id1|Id2|
|89417980|89417978|
|89417980|89417979|
|89417978|89417980|
|89417979|89417980|
I need to exclude duplicates and leave in the answer only:
|Id1|Id2|
|89417980|89417978|
|89417980|89417979|
min/max does not work here, as the values may be different.
I tried to union/join tables on a table/exclude results with temporary tables, but in the end I come to the beginning.
Assuming id1 and id2 are primary keys columns you could try this
DECLARE #tbl table (id1 int, id2 int )
INSERT INTO #tbl
SELECT 89417980, 89417978
UNION SELECT 89417980, 89417979
UNION SELECT 89417978, 89417980
UNION SELECT 89417979, 89417980
SELECT * FROM #tbl
;WITH CTE AS (--Get comparable value as "cs"
SELECT
IIF(id1 > id2, CHECKSUM(id1, id2), CHECKSUM(id2,id1)) as cs
, id1
, id2
, ROW_NUMBER() OVER (order by id1, id2) as rn
FROM #tbl
)
, CTE2 AS ( --Get rows to keep
SELECT MAX (rn) as rn
FROM CTE
GROUP BY cs
HAVING COUNT(*) > 1
)
DELETE tbl -- Delete all except the rows to keep
FROM #tbl tbl
WHERE NOT EXISTS(SELECT 1
FROM CTE2
JOIN CTE ON CTE.rn = CTE2.rn
WHERE CTE.id1 = tbl.id1
AND CTE.id2 = tbl.id2
)
SELECT * FROM #tbl
I am going to use MERGE to insert or update a table depending upon ehether it's exist or not. This is my query,
declare #t table
(
id int,
name varchar(10)
)
insert into #t values(1,'a')
MERGE INTO #t t1
USING (SELECT id FROM #t WHERE ID = 2) t2 ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET name = 'd', id = 3
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (2, 'b');
select * from #t;
The result is,
id name
1 a
I think it should be,
id name
1 a
2 b
You have your USING part slightly messed up, that's where to put what you want to match against (although in this case you're only using id)
declare #t table
(
id int,
name varchar(10)
)
insert into #t values(1,'a')
MERGE INTO #t t1
USING (SELECT 2, 'b') AS t2 (id, name) ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET name = 'd', id = 3
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (2, 'b');
select * from #t;
As Mikhail pointed out, your query in the USING clause doesn't contain any rows.
If you want to do an upsert, put the new data into the USING clause:
MERGE INTO #t t1
USING (SELECT 2 as id, 'b' as name) t2 ON (t1.id = t2.id) --This no longer has an artificial dependency on #t
WHEN MATCHED THEN
UPDATE SET name = t2.name
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (t2.id, t2.name);
This query won't return anything:
SELECT id FROM #t WHERE ID = 2
Because where is no rows in table with ID = 2, so there is nothing to merge into table.
Besides, in MATCHED clause you are updating a field ID on which you are joining table, i think, it's forbidden.
For each DML operations you have to commit (Marks the end of a successful the transaction)Then only you will be able to see the latest data
For example :
GO
BEGIN TRANSACTION;
GO
DELETE FROM HumanResources.JobCandidate
WHERE JobCandidateID = 13;
GO
COMMIT TRANSACTION;
GO
I have a table that will have 500,000+ records.
Each record has a LineNumber field which is not unique and not part of the primary key.
Each record has a CreatedOn field.
I need to update all 500,000+ records to identify repeat records.
A repeat records is defined by a record that has the same LineNumber within the last seven days of its CreatedOn field.
In the diagram above row 4 is a repeat because it occurred only five days since row 1.
Row 6 is not a repeat even though it occurs only four days since row 4, but row 4 itself is already a repeat, so Row 6 can only be compared to Row 1 which is nine days prior to Row 6, therefore Row 6 is not a repeat.
I do not know how to update the IsRepeat field with stepping through each record one-by-one via a cursor or something.
I do not believe cursors is the way to go, but I'm stuck with any other possible solution.
I have considered maybe Common Table Expressions may be of help but I have no experience with them and have no idea where to start.
Basically this same process needs to be done on the table every day as the table is truncated and re-populated every single day. Once the table is re-populated, I have to go through and re-mark each record if it is a repeat or not.
Some assistance would be most appreciated.
UPDATE
Here is a script to create a table and insert test data
USE [Test]
GO
/****** Object: Table [dbo].[Job] Script Date: 08/18/2009 07:55:25 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Job]') AND type in (N'U'))
DROP TABLE [dbo].[Job]
GO
USE [Test]
GO
/****** Object: Table [dbo].[Job] Script Date: 08/18/2009 07:55:25 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Job]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Job](
[JobID] [int] IDENTITY(1,1) NOT NULL,
[LineNumber] [nvarchar](20) NULL,
[IsRepeat] [bit] NULL,
[CreatedOn] [smalldatetime] NOT NULL,
CONSTRAINT [PK_Job] PRIMARY KEY CLUSTERED
(
[JobID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
END
GO
SET NOCOUNT ON
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-01 07:52:08')
INSERT INTO dbo.Job VALUES ('1019',NULL,'2009-07-01 08:30:01')
INSERT INTO dbo.Job VALUES ('1028',NULL,'2009-07-01 09:30:35')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-01 10:51:10')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-02 09:22:30')
INSERT INTO dbo.Job VALUES ('1027',NULL,'2009-07-02 10:27:28')
INSERT INTO dbo.Job VALUES (NULL,NULL,'2009-07-02 11:15:33')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-02 13:01:13')
INSERT INTO dbo.Job VALUES ('1014',NULL,'2009-07-03 12:05:56')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-03 13:57:34')
INSERT INTO dbo.Job VALUES ('1025',NULL,'2009-07-03 15:38:54')
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-04 16:32:20')
INSERT INTO dbo.Job VALUES ('1025',NULL,'2009-07-05 13:46:46')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-05 15:08:35')
INSERT INTO dbo.Job VALUES ('1000',NULL,'2009-07-05 15:19:50')
INSERT INTO dbo.Job VALUES ('1011',NULL,'2009-07-05 16:37:19')
INSERT INTO dbo.Job VALUES ('1019',NULL,'2009-07-05 17:14:09')
INSERT INTO dbo.Job VALUES ('1009',NULL,'2009-07-05 20:55:08')
INSERT INTO dbo.Job VALUES (NULL,NULL,'2009-07-06 08:29:29')
INSERT INTO dbo.Job VALUES ('1002',NULL,'2009-07-07 11:22:38')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-07 12:25:23')
INSERT INTO dbo.Job VALUES ('1023',NULL,'2009-07-08 09:32:07')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-08 09:46:33')
INSERT INTO dbo.Job VALUES ('1016',NULL,'2009-07-08 10:09:08')
INSERT INTO dbo.Job VALUES ('1023',NULL,'2009-07-09 10:45:04')
INSERT INTO dbo.Job VALUES ('1027',NULL,'2009-07-09 11:31:23')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-09 13:10:06')
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-09 15:04:06')
INSERT INTO dbo.Job VALUES ('1010',NULL,'2009-07-09 17:32:16')
INSERT INTO dbo.Job VALUES ('1012',NULL,'2009-07-09 19:51:28')
INSERT INTO dbo.Job VALUES ('1000',NULL,'2009-07-10 15:09:42')
INSERT INTO dbo.Job VALUES ('1025',NULL,'2009-07-10 16:15:31')
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-10 21:55:43')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-11 08:49:03')
INSERT INTO dbo.Job VALUES ('1022',NULL,'2009-07-11 16:47:21')
INSERT INTO dbo.Job VALUES ('1026',NULL,'2009-07-11 18:23:16')
INSERT INTO dbo.Job VALUES ('1010',NULL,'2009-07-11 19:49:31')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-12 11:57:26')
INSERT INTO dbo.Job VALUES ('1003',NULL,'2009-07-13 08:32:20')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-13 09:31:32')
INSERT INTO dbo.Job VALUES ('1021',NULL,'2009-07-14 09:52:54')
INSERT INTO dbo.Job VALUES ('1021',NULL,'2009-07-14 11:22:31')
INSERT INTO dbo.Job VALUES ('1023',NULL,'2009-07-14 11:54:14')
INSERT INTO dbo.Job VALUES (NULL,NULL,'2009-07-14 15:17:08')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-15 13:27:08')
INSERT INTO dbo.Job VALUES ('1010',NULL,'2009-07-15 14:10:56')
INSERT INTO dbo.Job VALUES ('1011',NULL,'2009-07-15 15:20:50')
INSERT INTO dbo.Job VALUES ('1028',NULL,'2009-07-15 15:39:18')
INSERT INTO dbo.Job VALUES ('1012',NULL,'2009-07-15 16:06:17')
INSERT INTO dbo.Job VALUES ('1017',NULL,'2009-07-16 11:52:08')
SET NOCOUNT OFF
GO
Ignores LineNumber is null. How should IsRepeat be handled in that case?
It works for test data. Whether it will be efficient enough for production volumes?
In the case of duplicate (LineNumber, CreatedOn) on pairs, arbitrarily choose one. (The one with minimum JobId)
Basic idea:
Get all JobId pairs that
are at least seven days apart, by
line number.
Count the number of
rows that are more than seven days
from the left side, upto and
including the right side. (CNT)
Then we know if JobId x is not a repeat, the next not a repeat is the pair with X on
the left side, and CNT = 1
Use recursive CTE to start with the first row for each LineNumber
Recursive element uses the pair with counts to get the next row.
Finally update, setting all IsRepeat to 0 for non-repeats and 1 for everything else.
; with AllPairsByLineNumberAtLeast7DaysApart (LineNumber
, LeftJobId
, RightJobId
, BeginCreatedOn
, EndCreatedOn) as
(select l.LineNumber
, l.JobId
, r.JobId
, dateadd(day, 7, l.CreatedOn)
, r.CreatedOn
from Job l
inner join Job r
on l.LineNumber = r.LineNumber
and dateadd(day, 7, l.CreatedOn) < r.CreatedOn
and l.JobId <> r.JobId)
-- Count the number of rows within from BeginCreatedOn
-- up to and including EndCreatedOn
-- In the case of CreatedOn = EndCreatedOn,
-- include only jobId <= jobid, to handle ties in CreatedOn
, AllPairsCount(LineNumber, LeftJobId, RightJobId, Cnt) as
(select ap.LineNumber, ap.LeftJobId, ap.RightJobId, count(*)
from AllPairsByLineNumberAtLeast7DaysApart ap
inner join Job j
on j.LineNumber = ap.LineNumber
and ap.BeginCreatedOn <= j.createdOn
and (j.CreatedOn < ap.EndCreatedOn
or (j.CreatedOn = ap.EndCreatedOn
and j.JobId <= ap.RightJobId))
group by ap.LineNumber, ap.LeftJobId, ap.RightJobId)
, Step1 (LineNumber, JobId, CreatedOn, RN) as
(select LineNumber, JobId, CreatedOn
, row_number() over
(partition by LineNumber order by CreatedOn, JobId)
from Job)
, Results (JobId, LineNumber, CreatedOn) as
-- Start with the first rows.
(select JobId, LineNumber, CreatedOn
from Step1
where RN = 1
and LineNumber is not null
-- get the next row
union all
select j.JobId, j.LineNumber, j.CreatedOn
from Results r
inner join AllPairsCount apc on apc.LeftJobId = r.JobId
inner join Job j
on j.JobId = apc.RightJobId
and apc.CNT = 1)
update j
set IsRepeat = case when R.JobId is not null then 0 else 1 end
from Job j
left outer join Results r
on j.JobId = R.JobId
where j.LineNumber is not null
EDIT:
After I turned off the computer last night I realized I had made things more complicated than they needed to be. A more straightforward (and on the test data, slightly more effecient) query:
Basic Idea:
Generated PotentialStep (FromJobId, ToJobId) These are the pairs where if FromJobId
is not a repeat, than ToJobId is also not a repeat. (First row by LineNumber more
than seven days from FromJobId)
Use a recursive CTE to start from the first JobId for each LineNumber and then step,
using PontentialSteps, to each Non Repeating JobId
; with PotentialSteps (FromJobId, ToJobId) as
(select FromJobId, ToJobId
from (select f.JobId as FromJobId
, t.JobId as ToJobId
, row_number() over
(partition by f.LineNumber order by t.CreatedOn, t.JobId) as RN
from Job f
inner join Job t
on f.LineNumber = t.LineNumber
and dateadd(day, 7, f.CreatedOn) < t.CreatedOn) t
where RN = 1)
, NonRepeats (JobId) as
(select JobId
from (select JobId
, row_number() over
(partition by LineNumber order by CreatedOn, JobId) as RN
from Job) Start
where RN = 1
union all
select J.JobId
from NonRepeats NR
inner join PotentialSteps PS
on NR.JobId = PS.FromJobId
inner join Job J
on PS.ToJobId = J.JobId)
update J
set IsRepeat = case when NR.JobId is not null then 0 else 1 end
from Job J
left outer join NonRepeats NR
on J.JobId = NR.JobId
where J.LineNumber is not null
UPDATE Jobs
SET Jobs.IsRepeat = 0 -- mark all of them IsRepeat = false
UPDATE Jobs
SET Jobs.IsRepeat = 1
WHERE EXISTS
(SELECT TOP 1 i.LineNumber FROM Jobs i WHERE i.LineNumber = Jobs.LineNumber
AND i.CreatedOn <> Jobs.CreatedOn and i.CreatedOn BETWEEN Jobs.CreatedOn - 7
AND Jobs.CreatedOn)
NOTE: I hope this helps you somewhat. Let me know, if you find any discrepancy that you will come across on a larger data set.
I'm not proud of this, it makes many assumptions (e.g. that CreatedOn is date only, and (LineNUmber,CreatedOn) is a key. Some tuning may be required, only works with test data.
In other words, I created this more for intellectual curiosity rather than because I think it's a genuine solution. Final select could be an update to set IsRepeat in the base table, based on existence on rows in V4. Final note before letting people see evil - could people please post test data in comments for data sets that it doesn't work for. It might be possible to turn this into a real solution:
with V1 as (
select t1.LineNumber,t1.CreatedOn,t2.CreatedOn as PrevDate from
T1 t1 inner join T1 t2 on t1.LineNumber = t2.LineNumber and t1.CreatedOn > t2.CreatedOn and DATEDIFF(DAY,t2.CreatedOn,t1.CreatedOn) < 7
), V2 as (
select v1.LineNumber,v1.CreatedOn,V1.PrevDate from V1
union all
select v1.LineNumber,v1.CreatedOn,v2.PrevDate from v1 inner join v2 on V1.LineNumber = v2.LineNumber and v1.PrevDate = v2.CreatedOn
), V3 as (
select LineNumber,CreatedOn,MIN(PrevDate) as PrevDate from V2 group by LineNumber,CreatedOn
), V4 as (
select LineNumber,CreatedOn from V3 where DATEDIFF(DAY,PrevDate,CreatedOn) < 7
)
select
T1.LineNumber,
T1.CreatedOn,
CASE WHEN V4.LineNumber is Null then 0 else 1 end as IsRepeat
from
T1
left join
V4
on
T1.LineNumber = V4.LineNumber and
T1.CreatedOn = V4.CreatedOn
order by T1.CreatedOn,T1.LineNumber
option (maxrecursion 7)