Say, for example, one has a table with certain rows. How does one prevent updates to the rows where a condition is true, but allow the update to commit on all rows where the condition is false?
Take this example, where I "lock" all rows prior to 1/4/2007 by aborting the whole transaction:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[tbl_TriggerTest] (
[ID] [int] IDENTITY(1,1) NOT NULL,
[Value] [varchar](25) NULL,
[Date] [datetime] NULL,
CONSTRAINT [PK_tbl_TriggerTest] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
SET IDENTITY_INSERT [dbo].[tbl_TriggerTest] ON
GO
INSERT [dbo].[tbl_TriggerTest] ([ID], [Value], [Date]) VALUES (1, N'12', CAST(0x0000979A00000000 AS DateTime))
GO
INSERT [dbo].[tbl_TriggerTest] ([ID], [Value], [Date]) VALUES (2, N'13', CAST(0x00009A7500000000 AS DateTime))
GO
INSERT [dbo].[tbl_TriggerTest] ([ID], [Value], [Date]) VALUES (3, N'14', CAST(0x00009BE200000000 AS DateTime))
GO
INSERT [dbo].[tbl_TriggerTest] ([ID], [Value], [Date]) VALUES (4, N'4', CAST(0x00009D4F00000000 AS DateTime))
GO
SET IDENTITY_INSERT [dbo].[tbl_TriggerTest] OFF
GO
CREATE TRIGGER [dbo].[LockOldWelshRows] ON [dbo].[tbl_TriggerTest]
FOR UPDATE
AS
BEGIN
DECLARE @Count INT
SELECT @Count = COUNT([ID])
FROM INSERTED
WHERE [Date] < CONVERT(DATETIME, '01/04/2007 00:00:00', 103)
IF @Count > 0
BEGIN
RAISERROR('Rows prior to 01/04/2007 are locked',16,1)
ROLLBACK TRANSACTION
RETURN ;
END
END
GO
If one was to run the following:
UPDATE [tbl_TriggerTest] SET [Value] = [Value] + 'M'
The transaction would fail with error:
Msg 50000, Level 16, State 1, Procedure LockOldWelshRows, Line 12
Rows prior to 01/04/2007 are locked
Msg 3609, Level 16, State 1, Line 1
The transaction ended in the trigger. The batch has been aborted.
Is there a way of modifying this trigger so that the transaction commits, but only for rows where the date is on or after 1/4/2007?
This is a VERY brief example (the tables I am working with are far more complex), and to be honest, I think it's cleaner if the whole transaction fails; I was just curious as to how it could be done.
You'd need an INSTEAD OF trigger (SQL Server's closest equivalent to a BEFORE trigger) that applies the update to the allowed rows only.
Untested:
CREATE TRIGGER [dbo].[LockOldWelshRows] ON [dbo].[tbl_TriggerTest]
INSTEAD OF UPDATE
AS
BEGIN
UPDATE
T
SET
T.SomeCol = I.SomeCol...
FROM
[dbo].[tbl_TriggerTest] T
JOIN
INSERTED I ON T.keycol = I.keycol
WHERE
T.[Date] >= '20070401';
END
GO
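For the tbl_TriggerTest table from the question, the placeholders above might be filled in roughly as follows. This is an untested sketch: it assumes ID is the key (an identity column cannot be updated, so it is safe to join on) and that Value and Date are the only columns an UPDATE can change.

```sql
-- Remove the original AFTER trigger first, since the name is reused.
IF OBJECT_ID(N'[dbo].[LockOldWelshRows]', N'TR') IS NOT NULL
    DROP TRIGGER [dbo].[LockOldWelshRows];
GO
CREATE TRIGGER [dbo].[LockOldWelshRows] ON [dbo].[tbl_TriggerTest]
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Apply the requested changes only to rows whose stored date is not locked.
    UPDATE T
    SET T.[Value] = I.[Value],
        T.[Date]  = I.[Date]
    FROM [dbo].[tbl_TriggerTest] AS T
    JOIN INSERTED AS I ON T.[ID] = I.[ID]
    WHERE T.[Date] >= '20070401';
END
GO
```

With this in place, the UPDATE from the question completes without an error and silently skips the locked rows. Rows with a NULL Date are also skipped, because the comparison in the WHERE clause does not evaluate to true for them.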
I found a big difference in query execution time on MS SQL Server 2019 Standard.
T-SQL
DECLARE @atTime datetime2 = '2022-05-04 13:23:20';
DECLARE @startTime datetime2;
DECLARE @shiftTime datetime2;
SET @startTime = @atTime;
SET @shiftTime = DATEADD(SECOND, -5, @atTime)
-- SELECT @shiftTime, @startTime
-- 2022-05-04 13:23:15.0000000 2022-05-04 13:23:20.0000000
-- #1 It takes 7 seconds to complete
SELECT TOP(1) * FROM [TrackerPositions] WITH(NOLOCK) WHERE AtTime BETWEEN @shiftTime AND @startTime
-- #2 It takes 0 seconds to complete
SELECT TOP(1) * FROM [TrackerPositions] WITH(NOLOCK) WHERE AtTime BETWEEN '2022-05-04 13:23:15.0000000' AND '2022-05-04 13:23:20.0000000'
Note: the AtTime column is datetime2.
Please help me make SELECT #1 run fast.
Thank you!
UPDATE #1
CREATE TABLE [dbo].[TrackerPositions](
[ID] [uniqueidentifier] NOT NULL,
[GPSTrackerID] [int] NOT NULL,
[AtTime] [datetime2](7) NOT NULL,
[Lat] [decimal](9, 6) NOT NULL,
[Lng] [decimal](9, 6) NOT NULL,
[GeoLocation] AS ([geography]::STGeomFromText(((('POINT('+CONVERT([varchar](20),[Lng],0))+' ')+CONVERT([varchar](20),[Lat],0))+')',(4326))),
[SignalLevel] [int] NULL,
[IPAddress] [nvarchar](40) NULL,
[Port] [int] NULL,
[Height] [int] NULL,
[IsMoving] [bit] NULL,
[Speed] [decimal](18, 4) NULL,
CONSTRAINT [PK_TrackerPositions] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 50, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[TrackerPositions] ADD CONSTRAINT [DF_TrackerPositions_ID] DEFAULT (newid()) FOR [ID]
GO
ALTER TABLE [dbo].[TrackerPositions] ADD CONSTRAINT [DF_TrackerPositions_IsMoving] DEFAULT ((0)) FOR [IsMoving]
GO
ALTER TABLE [dbo].[TrackerPositions] WITH CHECK ADD CONSTRAINT [FK_TrackerPositions_GPSTrackers] FOREIGN KEY([GPSTrackerID])
REFERENCES [dbo].[GPSTrackers] ([ID])
GO
ALTER TABLE [dbo].[TrackerPositions] CHECK CONSTRAINT [FK_TrackerPositions_GPSTrackers]
GO
The right answer is to use OPTION(RECOMPILE), so that the plan is compiled with the actual values of the variables:
SELECT TOP(1) * FROM [TrackerPositions] WITH(NOLOCK) WHERE AtTime BETWEEN @shiftTime AND @startTime OPTION(RECOMPILE)
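The difference comes from how the statement is compiled: with local variables the optimizer does not know the values and falls back on a generic selectivity guess for the BETWEEN range, whereas with literals it can estimate the range directly. If recompiling on every execution is undesirable, one possible alternative is to pass the values as parameters through sp_executesql, so they can be sniffed when the plan is first compiled. This is only a sketch; the parameter names @from and @to are made up for the illustration, and the cached plan will be reused for later values.

```sql
DECLARE @atTime datetime2 = '2022-05-04 13:23:20';
DECLARE @shiftTime datetime2 = DATEADD(SECOND, -5, @atTime);

-- The query text is parameterized; @from and @to are hypothetical names.
EXEC sys.sp_executesql
    N'SELECT TOP(1) * FROM [TrackerPositions] WITH(NOLOCK) WHERE AtTime BETWEEN @from AND @to',
    N'@from datetime2, @to datetime2',
    @from = @shiftTime,
    @to = @atTime;
```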
Assume we have the following row in a table already:
INSERT INTO some_table (id, amount) VALUES (1, 0);
Suppose the following two queries run at the same time under READ COMMITTED:
INSERT INTO some_table (id, amount) VALUES (1, 0)
ON CONFLICT DO
UPDATE some_table SET amount=amount+100 WHERE id=1;
INSERT INTO some_table (id, amount) VALUES (1, 0)
ON CONFLICT DO
UPDATE some_table SET amount=amount-50 WHERE id=1;
Can they run into a race condition, ending with amount = 100 or amount = -50, if they both read the initial (committed) amount = 0 and one transaction overwrites the result of the other?
If yes, can it be fixed by:
switching to REPEATABLE READ?
using "FOR UPDATE" like this:
WITH updating as (
SELECT id, amount FROM some_table FOR UPDATE
)
INSERT INTO some_table (id, amount) VALUES (1, 0)
ON CONFLICT DO
UPDATE some_table t SET t.amount=updating.amount+100 WHERE t.id=updating.id;
WITH updating as (
SELECT id, amount FROM some_table FOR UPDATE
)
INSERT INTO some_table (id, amount) VALUES (1, 0)
ON CONFLICT DO
UPDATE some_table t SET t.amount=updating.amount-50 WHERE t.id=updating.id;
INSERT ... ON CONFLICT does not suffer from race conditions. One of the UPDATEs will lock the row first, and the other has to wait, but no matter what the order is, in the end the amount will be 50 more than in the beginning.
Note that WHERE id = 1 is unnecessary, since the UPDATE will only affect the one row that conflicts with the INSERT anyway.
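The statements in the question are not quite valid PostgreSQL syntax. Assuming id carries a primary key or unique constraint (the question does not show the table definition), the two upserts could be written along these lines:

```sql
-- Assumes a primary key or unique constraint on some_table.id,
-- which ON CONFLICT (id) needs as its conflict target.
INSERT INTO some_table (id, amount) VALUES (1, 0)
ON CONFLICT (id) DO UPDATE
SET amount = some_table.amount + 100;

INSERT INTO some_table (id, amount) VALUES (1, 0)
ON CONFLICT (id) DO UPDATE
SET amount = some_table.amount - 50;
```

On the right-hand side, some_table.amount refers to the existing row, while the row proposed for insertion would be available as EXCLUDED. The column reference is qualified because an unqualified amount could refer to either of them.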
I was asked to reset the Auto_Increment (identity) column in this way:
Auto_Increment (1,3,8,10) to New_Auto_Increment (1,2,3,4). I don't want to drop the column and rebuild it because that can cause serious problems. Thanks
One method to assign new IDENTITY values is by loading a staging table with the new values and then using SWITCH to move the new data back into the source table. If foreign keys reference the table, those will need to be dropped and recreated (and the referencing key values updated). Sample script below.
--example setup
CREATE TABLE dbo.Example(
ID int NOT NULL IDENTITY CONSTRAINT PK_Example PRIMARY KEY
, SomeData int
);
GO
SET IDENTITY_INSERT dbo.Example ON;
GO
INSERT INTO dbo.Example(ID, SomeData) VALUES (1,1),(3,1),(8,1),(10,1);
GO
BEGIN TRY
BEGIN TRAN;
--create staging table with same schema, constraints and indexes
CREATE TABLE dbo.ExampleStaging(
ID int NOT NULL CONSTRAINT PK_Example_ExampleStaging PRIMARY KEY
, SomeData int
);
--load staging table with new values
INSERT INTO dbo.ExampleStaging
SELECT ROW_NUMBER() OVER(ORDER BY ID), SomeData
FROM dbo.Example;
--clear source table
TRUNCATE TABLE dbo.Example;
--switch new data back into original table
ALTER TABLE dbo.ExampleStaging
SWITCH TO dbo.Example;
DROP TABLE dbo.ExampleStaging;
DBCC CHECKIDENT('dbo.Example');
COMMIT;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK;
THROW;
END CATCH;
GO
One could also drop and recreate the column to reassign identity values. The implications with that method are:
1) Indexes (e.g. the primary key constraint) on the column would need to be dropped first and recreated afterwards. In the case of a clustered index, all non-clustered indexes on the table would be implicitly rebuilt twice, once when the clustered index is dropped and again when it is recreated. However, one could explicitly drop and recreate the non-clustered indexes so that this only happens once.
2) The new identity values would not be in the same incremental sequence as the original values. This might be a non-issue unless one expects the values to reflect order of insertion.
3) Each row in the table would need to be updated twice, once when the original identity column is dropped and again when the new one is created.
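For the dbo.Example table from the script above, the drop-and-recreate approach might look roughly like this. It is a minimal sketch that assumes no foreign keys or other indexes reference the ID column:

```sql
-- Drop the primary key so the column can be removed.
ALTER TABLE dbo.Example DROP CONSTRAINT PK_Example;
GO
ALTER TABLE dbo.Example DROP COLUMN ID;
GO
-- Adding an IDENTITY column to a populated table assigns new values to
-- the existing rows, in an order chosen by SQL Server (see point 2).
ALTER TABLE dbo.Example ADD ID int NOT NULL IDENTITY(1,1)
    CONSTRAINT PK_Example PRIMARY KEY;
GO
```

The recreated ID column ends up as the last column of the table rather than the first.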
You can do it with ROW_NUMBER() and a cursor.
The loop could even clean up any referencing FKs.
drop table #T;
set nocount on;
create table #T (pk int identity primary key, val int);
insert into #T(val) values (2), (5), (11), (3), (7), (2), (5), (11), (3), (7);
delete t from #T t where t.val in (5, 7);
select *, ROW_NUMBER() over (order by t.pk) as rn
from #T t
order by t.pk;
declare @val int, @pk int, @rn int;
set identity_insert #T on;
DECLARE csr CURSOR FOR
select t.pk, t.val, ROW_NUMBER() over (order by t.pk) as rn
from #T t
order by t.pk;
OPEN csr
FETCH NEXT FROM csr
INTO @pk, @val, @rn;
WHILE @@FETCH_STATUS = 0
BEGIN
--select @pk as pk, @val as val, @rn as rn
if(@pk <> @rn)
begin
insert into #T(pk, val) values (@rn, @val);
delete #T where pk = @pk;
end
FETCH NEXT FROM csr
INTO @pk, @val, @rn;
END
CLOSE csr;
DEALLOCATE csr;
set identity_insert #T off;
select *
from #T t
order by t.pk;
DBCC CHECKIDENT ('#T', RESEED, @rn);
insert into #T(val) values (77);
select *
from #T t
order by t.pk;
drop table #t;
I have the following schema and data:
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[EntryExitLogs]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[EntryExitLogs](
[DeviceLogId] [int] NOT NULL,
[EmployeeCode] [nvarchar](50) NOT NULL,
[LogDate] [datetime] NOT NULL,
[Direction] [nvarchar](255) NOT NULL,
CONSTRAINT [PK_EntryExitLogs] PRIMARY KEY CLUSTERED
(
[EmployeeCode] ASC,
[LogDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
END
GO
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('435859','30032','2014-01-21 07:04:41','in');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('438019','30032','2014-01-21 08:59:09','out');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('441564','30032','2014-01-21 16:57:35','in');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('441263','30032','2014-01-21 19:09:19','out');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('441264','30032','2014-01-21 19:10:20','in');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('439928','34035','2014-01-21 08:29:59','in');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('437962','34035','2014-01-21 08:30:12','in');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('437992','34035','2014-01-21 08:47:33','out');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('440203','34035','2014-01-21 13:38:56','out');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('442858','34035','2014-01-21 16:34:08','in');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('442860','34035','2014-01-21 16:35:11','out');
INSERT INTO [dbo].[EntryExitLogs]([DeviceLogId],[EmployeeCode],[LogDate],[Direction]) VALUES('441283','34035','2014-01-21 19:16:58','out');
I've written SQL to calculate In and out times like this:
;WITH cte AS (
SELECT ROW_NUMBER() OVER(
PARTITION BY lt.EmployeeCode ORDER BY lt.EmployeeCode,
lt.LogDate
) AS RowNo,
lt.DeviceLogId,
lt.EmployeeCode,
lt.LogDate,
lt.Direction
FROM EntryExitLogs lt
)
SELECT i.EmployeeCode,
i.LogDate AS InTime,
(
SELECT MIN(o.LogDate)
FROM cte AS o
WHERE o.EmployeeCode = i.EmployeeCode
AND o.RowNo = (i.RowNo + 1)
AND o.Direction = 'out'
) AS OutTime
FROM cte AS i
WHERE i.Direction = 'in'
ORDER BY i.EmployeeCode, i.LogDate
I'm getting output, but not what I want. I expect the output to look like this (the comments give more detail for each row):
EmployeeCode  InTime                OutTime               Comments
30032         21-Jan-2014 07:04:41  21-Jan-2014 08:59:09
30032         21-Jan-2014 16:57:35  21-Jan-2014 19:09:19
30032         21-Jan-2014 19:10:20  NULL                  If no OUT is specified for the last IN then it should be NULL
34035         21-Jan-2014 08:29:59  21-Jan-2014 13:38:56  Earliest IN and Latest OUT to be taken in case of multiple IN & OUT
34035         21-Jan-2014 16:34:08  21-Jan-2014 19:16:58  Earliest IN and Latest OUT to be taken in case of multiple IN & OUT
Please find the schema for this here
Kindly help me to achieve this.
This should work:
;WITH TrueOut AS --select only the latest "out" in between two "ins"
(
SELECT *
FROM EntryExitLogs a
WHERE direction='out'
AND
ISNULL((SELECT MIN(LogDate) FROM EntryExitLogs b
WHERE a.EmployeeCode=b.Employeecode
AND b.direction='out'
AND b.LogDate>a.LogDate),'9999-12-31')
>=
ISNULL((SELECT MIN(LogDate) FROM EntryExitLogs b
WHERE a.EmployeeCode=b.Employeecode
AND b.direction='in'
AND b.LogDate>a.LogDate),'9999-12-31')
),
TrueIn AS --select only the earliest "in" in between two "outs"
(
SELECT *
FROM EntryExitLogs a
WHERE direction='in'
AND
ISNULL((SELECT MAX(LogDate) FROM EntryExitLogs b
WHERE a.EmployeeCode=b.Employeecode
AND b.direction='out'
AND b.LogDate<a.LogDate),'1900-01-01')
>=
ISNULL((SELECT MAX(LogDate) FROM EntryExitLogs b
WHERE a.EmployeeCode=b.Employeecode
AND b.direction='in'
AND b.LogDate<a.LogDate),'1900-01-01')
)
-- For every in select the next out
SELECT a.EmployeeCode, a.LogDate InTime,
(SELECT MIN(LogDate)
FROM TrueOut b
WHERE a.EmployeeCode=b.EmployeeCode
AND a.LogDate<b.LogDate) OutTime
FROM TrueIn a
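The same pairing can also be sketched with window functions, assuming SQL Server 2012 or later, where LAG and LEAD are available. The CTE names Marked, Kept and Paired are made up for this illustration. The idea is to keep only the first "in" of each run of consecutive "in" rows and the last "out" of each run of consecutive "out" rows, and then pair each remaining "in" with the row that follows it:

```sql
WITH Marked AS (
    -- Look at the neighbouring rows of each log entry, per employee.
    SELECT EmployeeCode, LogDate, Direction,
           LAG(Direction)  OVER (PARTITION BY EmployeeCode ORDER BY LogDate) AS PrevDir,
           LEAD(Direction) OVER (PARTITION BY EmployeeCode ORDER BY LogDate) AS NextDir
    FROM dbo.EntryExitLogs
),
Kept AS (
    -- Earliest "in" of a run of ins, latest "out" of a run of outs.
    SELECT EmployeeCode, LogDate, Direction
    FROM Marked
    WHERE (Direction = 'in'  AND (PrevDir IS NULL OR PrevDir <> 'in'))
       OR (Direction = 'out' AND (NextDir IS NULL OR NextDir <> 'out'))
),
Paired AS (
    -- After filtering, directions alternate, so the next row is the matching out.
    SELECT EmployeeCode, LogDate, Direction,
           LEAD(LogDate)   OVER (PARTITION BY EmployeeCode ORDER BY LogDate) AS NextLogDate,
           LEAD(Direction) OVER (PARTITION BY EmployeeCode ORDER BY LogDate) AS NextDirection
    FROM Kept
)
SELECT EmployeeCode,
       LogDate AS InTime,
       CASE WHEN NextDirection = 'out' THEN NextLogDate END AS OutTime
FROM Paired
WHERE Direction = 'in'
ORDER BY EmployeeCode, InTime;
```

An "in" with no later "out" gets a NULL OutTime, and an "out" that has no preceding "in" is simply not reported.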
I have a table that will have 500,000+ records.
Each record has a LineNumber field which is not unique and not part of the primary key.
Each record has a CreatedOn field.
I need to update all 500,000+ records to identify repeat records.
A repeat record is a record whose LineNumber also appears on another record created within the seven days before its own CreatedOn.
In the diagram above row 4 is a repeat because it occurred only five days after row 1.
Row 6 is not a repeat even though it occurs only four days after row 4: row 4 is itself a repeat, so row 6 can only be compared to row 1, which is nine days prior to row 6. Therefore row 6 is not a repeat.
I do not know how to update the IsRepeat field without stepping through each record one-by-one via a cursor or something.
I do not believe cursors are the way to go, but I'm stuck for any other possible solution.
I have considered that Common Table Expressions may be of help, but I have no experience with them and have no idea where to start.
Basically this same process needs to be done on the table every day as the table is truncated and re-populated every single day. Once the table is re-populated, I have to go through and re-mark each record if it is a repeat or not.
Some assistance would be most appreciated.
UPDATE
Here is a script to create a table and insert test data
USE [Test]
GO
/****** Object: Table [dbo].[Job] Script Date: 08/18/2009 07:55:25 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Job]') AND type in (N'U'))
DROP TABLE [dbo].[Job]
GO
USE [Test]
GO
/****** Object: Table [dbo].[Job] Script Date: 08/18/2009 07:55:25 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Job]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Job](
[JobID] [int] IDENTITY(1,1) NOT NULL,
[LineNumber] [nvarchar](20) NULL,
[IsRepeat] [bit] NULL,
[CreatedOn] [smalldatetime] NOT NULL,
CONSTRAINT [PK_Job] PRIMARY KEY CLUSTERED
(
[JobID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
END
GO
SET NOCOUNT ON
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-01 07:52:08')
INSERT INTO dbo.Job VALUES ('1019',NULL,'2009-07-01 08:30:01')
INSERT INTO dbo.Job VALUES ('1028',NULL,'2009-07-01 09:30:35')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-01 10:51:10')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-02 09:22:30')
INSERT INTO dbo.Job VALUES ('1027',NULL,'2009-07-02 10:27:28')
INSERT INTO dbo.Job VALUES (NULL,NULL,'2009-07-02 11:15:33')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-02 13:01:13')
INSERT INTO dbo.Job VALUES ('1014',NULL,'2009-07-03 12:05:56')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-03 13:57:34')
INSERT INTO dbo.Job VALUES ('1025',NULL,'2009-07-03 15:38:54')
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-04 16:32:20')
INSERT INTO dbo.Job VALUES ('1025',NULL,'2009-07-05 13:46:46')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-05 15:08:35')
INSERT INTO dbo.Job VALUES ('1000',NULL,'2009-07-05 15:19:50')
INSERT INTO dbo.Job VALUES ('1011',NULL,'2009-07-05 16:37:19')
INSERT INTO dbo.Job VALUES ('1019',NULL,'2009-07-05 17:14:09')
INSERT INTO dbo.Job VALUES ('1009',NULL,'2009-07-05 20:55:08')
INSERT INTO dbo.Job VALUES (NULL,NULL,'2009-07-06 08:29:29')
INSERT INTO dbo.Job VALUES ('1002',NULL,'2009-07-07 11:22:38')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-07 12:25:23')
INSERT INTO dbo.Job VALUES ('1023',NULL,'2009-07-08 09:32:07')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-08 09:46:33')
INSERT INTO dbo.Job VALUES ('1016',NULL,'2009-07-08 10:09:08')
INSERT INTO dbo.Job VALUES ('1023',NULL,'2009-07-09 10:45:04')
INSERT INTO dbo.Job VALUES ('1027',NULL,'2009-07-09 11:31:23')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-09 13:10:06')
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-09 15:04:06')
INSERT INTO dbo.Job VALUES ('1010',NULL,'2009-07-09 17:32:16')
INSERT INTO dbo.Job VALUES ('1012',NULL,'2009-07-09 19:51:28')
INSERT INTO dbo.Job VALUES ('1000',NULL,'2009-07-10 15:09:42')
INSERT INTO dbo.Job VALUES ('1025',NULL,'2009-07-10 16:15:31')
INSERT INTO dbo.Job VALUES ('1006',NULL,'2009-07-10 21:55:43')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-11 08:49:03')
INSERT INTO dbo.Job VALUES ('1022',NULL,'2009-07-11 16:47:21')
INSERT INTO dbo.Job VALUES ('1026',NULL,'2009-07-11 18:23:16')
INSERT INTO dbo.Job VALUES ('1010',NULL,'2009-07-11 19:49:31')
INSERT INTO dbo.Job VALUES ('1029',NULL,'2009-07-12 11:57:26')
INSERT INTO dbo.Job VALUES ('1003',NULL,'2009-07-13 08:32:20')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-13 09:31:32')
INSERT INTO dbo.Job VALUES ('1021',NULL,'2009-07-14 09:52:54')
INSERT INTO dbo.Job VALUES ('1021',NULL,'2009-07-14 11:22:31')
INSERT INTO dbo.Job VALUES ('1023',NULL,'2009-07-14 11:54:14')
INSERT INTO dbo.Job VALUES (NULL,NULL,'2009-07-14 15:17:08')
INSERT INTO dbo.Job VALUES ('1005',NULL,'2009-07-15 13:27:08')
INSERT INTO dbo.Job VALUES ('1010',NULL,'2009-07-15 14:10:56')
INSERT INTO dbo.Job VALUES ('1011',NULL,'2009-07-15 15:20:50')
INSERT INTO dbo.Job VALUES ('1028',NULL,'2009-07-15 15:39:18')
INSERT INTO dbo.Job VALUES ('1012',NULL,'2009-07-15 16:06:17')
INSERT INTO dbo.Job VALUES ('1017',NULL,'2009-07-16 11:52:08')
SET NOCOUNT OFF
GO
This ignores rows where LineNumber is null. How should IsRepeat be handled in that case?
It works for the test data. Whether it will be efficient enough for production volumes is an open question.
In the case of duplicate (LineNumber, CreatedOn) pairs, it arbitrarily chooses one (the one with the minimum JobId).
Basic idea:
Get all JobId pairs that are at least seven days apart, by line number.
Count the number of rows that are more than seven days from the left side, up to and including the right side. (CNT)
Then we know that if JobId X is not a repeat, the next non-repeat is the pair with X on the left side and CNT = 1.
Use a recursive CTE to start with the first row for each LineNumber.
The recursive element uses the pairs with counts to get the next row.
Finally update, setting IsRepeat to 0 for non-repeats and 1 for everything else.
; with AllPairsByLineNumberAtLeast7DaysApart (LineNumber
, LeftJobId
, RightJobId
, BeginCreatedOn
, EndCreatedOn) as
(select l.LineNumber
, l.JobId
, r.JobId
, dateadd(day, 7, l.CreatedOn)
, r.CreatedOn
from Job l
inner join Job r
on l.LineNumber = r.LineNumber
and dateadd(day, 7, l.CreatedOn) < r.CreatedOn
and l.JobId <> r.JobId)
-- Count the number of rows within from BeginCreatedOn
-- up to and including EndCreatedOn
-- In the case of CreatedOn = EndCreatedOn,
-- include only jobId <= jobid, to handle ties in CreatedOn
, AllPairsCount(LineNumber, LeftJobId, RightJobId, Cnt) as
(select ap.LineNumber, ap.LeftJobId, ap.RightJobId, count(*)
from AllPairsByLineNumberAtLeast7DaysApart ap
inner join Job j
on j.LineNumber = ap.LineNumber
and ap.BeginCreatedOn <= j.createdOn
and (j.CreatedOn < ap.EndCreatedOn
or (j.CreatedOn = ap.EndCreatedOn
and j.JobId <= ap.RightJobId))
group by ap.LineNumber, ap.LeftJobId, ap.RightJobId)
, Step1 (LineNumber, JobId, CreatedOn, RN) as
(select LineNumber, JobId, CreatedOn
, row_number() over
(partition by LineNumber order by CreatedOn, JobId)
from Job)
, Results (JobId, LineNumber, CreatedOn) as
-- Start with the first rows.
(select JobId, LineNumber, CreatedOn
from Step1
where RN = 1
and LineNumber is not null
-- get the next row
union all
select j.JobId, j.LineNumber, j.CreatedOn
from Results r
inner join AllPairsCount apc on apc.LeftJobId = r.JobId
inner join Job j
on j.JobId = apc.RightJobId
and apc.CNT = 1)
update j
set IsRepeat = case when R.JobId is not null then 0 else 1 end
from Job j
left outer join Results r
on j.JobId = R.JobId
where j.LineNumber is not null
EDIT:
After I turned off the computer last night I realized I had made things more complicated than they needed to be. A more straightforward (and, on the test data, slightly more efficient) query:
Basic idea:
Generate PotentialSteps (FromJobId, ToJobId). These are the pairs where, if FromJobId is not a repeat, then ToJobId is also not a repeat (the first row by LineNumber more than seven days after FromJobId).
Use a recursive CTE to start from the first JobId for each LineNumber and then step, using PotentialSteps, to each non-repeating JobId.
; with PotentialSteps (FromJobId, ToJobId) as
(select FromJobId, ToJobId
from (select f.JobId as FromJobId
, t.JobId as ToJobId
-- one candidate step per FromJobId: its first row more than seven days later
, row_number() over
(partition by f.JobId order by t.CreatedOn, t.JobId) as RN
from Job f
inner join Job t
on f.LineNumber = t.LineNumber
and dateadd(day, 7, f.CreatedOn) < t.CreatedOn) t
where RN = 1)
, NonRepeats (JobId) as
(select JobId
from (select JobId
, row_number() over
(partition by LineNumber order by CreatedOn, JobId) as RN
from Job) Start
where RN = 1
union all
select J.JobId
from NonRepeats NR
inner join PotentialSteps PS
on NR.JobId = PS.FromJobId
inner join Job J
on PS.ToJobId = J.JobId)
update J
set IsRepeat = case when NR.JobId is not null then 0 else 1 end
from Job J
left outer join NonRepeats NR
on J.JobId = NR.JobId
where J.LineNumber is not null
UPDATE Job
SET Job.IsRepeat = 0 -- mark all of them IsRepeat = false
UPDATE Job
SET Job.IsRepeat = 1
WHERE EXISTS
(SELECT TOP 1 i.LineNumber FROM Job i WHERE i.LineNumber = Job.LineNumber
AND i.CreatedOn <> Job.CreatedOn and i.CreatedOn BETWEEN Job.CreatedOn - 7
AND Job.CreatedOn)
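NOTE: I hope this helps somewhat. This marks a record as a repeat whenever another record with the same LineNumber falls within the preceding seven days, even if that other record is itself a repeat. Let me know if you find any discrepancies on a larger data set.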
I'm not proud of this. It makes many assumptions (e.g. that CreatedOn is date only, and that (LineNumber, CreatedOn) is a key). Some tuning may be required, and it has only been run against the test data.
In other words, I created this more out of intellectual curiosity than because I think it's a genuine solution. The final select could be turned into an update that sets IsRepeat in the base table, based on the existence of rows in V4. A final note before letting people see evil: could people please post test data in the comments for data sets that it doesn't work for? It might be possible to turn this into a real solution:
with V1 as (
select t1.LineNumber,t1.CreatedOn,t2.CreatedOn as PrevDate from
T1 t1 inner join T1 t2 on t1.LineNumber = t2.LineNumber and t1.CreatedOn > t2.CreatedOn and DATEDIFF(DAY,t2.CreatedOn,t1.CreatedOn) < 7
), V2 as (
select v1.LineNumber,v1.CreatedOn,V1.PrevDate from V1
union all
select v1.LineNumber,v1.CreatedOn,v2.PrevDate from v1 inner join v2 on V1.LineNumber = v2.LineNumber and v1.PrevDate = v2.CreatedOn
), V3 as (
select LineNumber,CreatedOn,MIN(PrevDate) as PrevDate from V2 group by LineNumber,CreatedOn
), V4 as (
select LineNumber,CreatedOn from V3 where DATEDIFF(DAY,PrevDate,CreatedOn) < 7
)
select
T1.LineNumber,
T1.CreatedOn,
CASE WHEN V4.LineNumber is Null then 0 else 1 end as IsRepeat
from
T1
left join
V4
on
T1.LineNumber = V4.LineNumber and
T1.CreatedOn = V4.CreatedOn
order by T1.CreatedOn,T1.LineNumber
option (maxrecursion 7)