I have a database that gets populated daily with incremental data, and then at the end of each month a full download of the month's data is put into the system. The business wants each day loaded into the system and then, at the end of the month, the daily data removed so that only the full-month data remains. I have written the query below; if you could help I'd appreciate it.
DECLARE @looper INT
DECLARE @totalindex int;

select name, (substring(name,17,8)) as Attempt, substring(name,17,4) as [year], substring(name,21,2) as [month], create_date
into #work_to_do_for
from sys.databases d
where name like 'Snapshot%' and
      d.database_id > 4 and
      (substring(name,21,2) = DATEPART(m, DATEADD(m, -1, getdate()))) AND (substring(name,17,4) = DATEPART(yyyy, DATEADD(m, -1, getdate())))
order by d.create_date asc

SELECT @totalindex = COUNT(*) from #work_to_do_for

SET @looper = 1 -- reset and reuse counter
WHILE (@looper < @totalindex)
BEGIN
    -- purge logic for the current snapshot goes here
    set @looper = @looper + 1
END;

DROP TABLE #work_to_do_for;
I'd need to perform the purge on several tables.
Thanks in advance.
When I delete large numbers of records, I always do it in batches and off-hours so as not to use up resources during production processes. To accomplish this, incorporate a loop and some testing to find the optimal number of rows to delete at a time.
begin transaction del -- I always use transactions as a safeguard

declare @count int = 1

while @count > 0
begin
    delete top (100000) t
    from dbo.MyTable t -- JOIN if necessary
    -- WHERE if necessary
    set @count = @@ROWCOUNT
end

commit transaction del -- commit once you've verified the result (or roll back if something looks wrong)
Run this manually (without the WHILE loop) once with 100000 records in the parentheses and note the execution time. Write it down. Run it again with 200000 records; check the time and write it down. Run it with 500000 records. What you're looking for is a trend in the execution time: as long as the time per 100000 rows deleted keeps decreasing as you increase the batch size, keep increasing it. You might end at 500k, but this method will help you find the optimal number of rows to delete per batch. Then, run it as a loop.
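As a rough sketch of that timing test (dbo.MyTable is a stand-in name, as in the loop above; SET STATISTICS TIME reports the elapsed time for each run):

-- one-off timing run: repeat with 200000 and 500000 in the TOP clause and compare elapsed times
SET STATISTICS TIME ON;

DELETE TOP (100000) t
FROM dbo.MyTable t   -- JOIN if necessary
-- WHERE if necessary

SET STATISTICS TIME OFF;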
That being said, if you are literally deleting MILLIONS of records, it might make more sense to drop and recreate the table, as long as you aren't going to interfere with other processes. If you need to save some of the data, you could insert what you need into a new table (e.g. MyTable_New), drop the original table (MyTable), and rename MyTable_New to MyTable.
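A minimal sketch of that copy/drop/rename approach (the KeepThisRow filter is just a placeholder for whatever condition identifies the rows worth keeping):

-- copy only the rows worth keeping
SELECT *
INTO dbo.MyTable_New
FROM dbo.MyTable
WHERE KeepThisRow = 1;   -- placeholder filter

-- swap the tables (indexes, constraints and permissions must be recreated on the new table)
DROP TABLE dbo.MyTable;
EXEC sp_rename 'dbo.MyTable_New', 'MyTable';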
The script you've posted, iterating through with a while loop to delete the rows, should be changed to a set-based operation if at all possible. Relational database engines excel at set-based operations like
DELETE FROM dbo.YourTable WHERE yourcolumn = 5
as opposed to iterating through one row at a time, especially if it will be for "several million" rows as you indicated in the comments above.
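For instance, a set-based purge of one month's worth of daily rows might look like this (dbo.DailyData and LoadDate are placeholder names, not from the original post):

DELETE FROM dbo.DailyData
WHERE LoadDate >= '20240101'   -- first day of the month being replaced
  AND LoadDate <  '20240201';  -- first day of the next month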
@rwking, where are you putting the COMMIT for the transaction? I mean, are you keeping all of the eligible deletes in a single transaction and doing one final COMMIT?
I have a similar type of requirement where I have to delete in batches and also track the total number of rows affected at the end.
My sample code is as follows:
Declare @count int
Declare @deletecount int
Declare @batchcount int
set @count = 0

While (1=1)
BEGIN
    BEGIN TRY
        BEGIN TRAN
        DELETE TOP (1000) FROM dbo.MyTable -- WHERE <condition>
        SET @batchcount = @@ROWCOUNT   -- capture before the next statement resets @@ROWCOUNT
        SET @count = @count + @batchcount
        COMMIT
        IF @batchcount = 0
            Break;
    END TRY
    BEGIN CATCH
        ROLLBACK;
    END CATCH
END
set @deletecount = @count
The above code works fine, but how do I keep track of @deletecount if a rollback happens in one of the batches?
I have a database purge process that uses a stored procedure to delete records from a huge table based on expire date; it runs every 3 weeks and deletes about 3 million records.
Currently it is taking about 5 hours to purge the data, which is causing a lot of problems. I know there are more efficient ways to write this code, but I'm out of ideas; please point me in the right direction.
--Stored Procedure
CREATE PROCEDURE [dbo].[pa_Expire_StoredValue_By_Date]
    @ExpireDate DateTime,
    @NumExpired int OUTPUT,
    @RunAgain int OUTPUT
AS
-- This procedure expires all the StoredValue records that have an ExpireDate less than or equal to the DeleteDate provided
-- and have QtyUsed < QtyEarned
-- invoked by DBPurgeAgent
declare @NumRows int
set nocount on
BEGIN TRY
    BEGIN TRAN T1
    set @RunAgain = 1;

    select @NumRows = count(*) from StoredValue where ExpireACK = 1;
    if @NumRows = 0
    begin
        set rowcount 1800; -- only process 1800 records at a time
        update StoredValue with (RowLock)
        set ExpireACK = 1
        where ExpireACK = 0
          and ExpireDate < @ExpireDate
          and QtyEarned > QtyUsed;
        set @NumExpired = @@ROWCOUNT;
        set rowcount 0
    end
    else begin
        set @NumExpired = @NumRows;
    end

    if @NumExpired = 0
    begin -- stop processing when there are no rows left
        set @RunAgain = 0;
    end
    else begin
        Insert into SVHistory (LocalID, ServerSerial, SVProgramID, CustomerPK, QtyUsed, Value, ExternalID,
                               StatusFlag, LastUpdate, LastLocationID, ExpireDate, TotalValueEarned, RedeemedValue,
                               BreakageValue, PresentedCustomerID, PresentedCardTypeID, ResolvedCustomerID, HHID)
        select
            SV.LocalID, SV.ServerSerial, SV.SVProgramID, SV.CustomerPK,
            (SV.QtyEarned - SV.QtyUsed) as QtyUsed, SV.Value, SV.ExternalID,
            3 as StatusFlag, getdate() as LastUpdate,
            -9 as LocationID, SV.ExpireDate, SV.TotalValueEarned,
            0 as RedeemedValue,
            ((SV.QtyEarned - SV.QtyUsed) * SV.Value * isnull(SVUOM.UnitOfMeasureLimit, 1)),
            PresentedCustomerID, PresentedCardTypeID,
            ResolvedCustomerID, HHID
        from
            StoredValue as SV with (NoLock)
        Left Join
            SVUnitOfMeasureLimits as SVUOM on SV.SVProgramID = SVUOM.SVProgramID
        where
            SV.ExpireACK = 1

        Delete from StoredValue with (RowLock) where ExpireACK = 1;
    end
    COMMIT TRAN T1;
END TRY
BEGIN CATCH
    set @RunAgain = 0;
    IF @@TRANCOUNT > 0 BEGIN
        ROLLBACK TRAN T1;
    END

    DECLARE @ErrorMessage NVARCHAR(4000);
    DECLARE @ErrorSeverity INT;
    DECLARE @ErrorState INT;

    SELECT
        @ErrorMessage = ERROR_MESSAGE(),
        @ErrorSeverity = ERROR_SEVERITY(),
        @ErrorState = ERROR_STATE();

    RAISERROR (@ErrorMessage, @ErrorSeverity, @ErrorState);
END CATCH
The logic here makes no sense to me. It looks like you are batching by rerunning the stored proc over and over again. You really should just do it in a WHILE loop and use smaller batches within a single run of the stored proc. You should also run in smaller transactions; this will speed things up considerably. Arguably, the way this is written, you don't need a transaction at all: you can simply resume where you left off, since you are flagging every record.
It's also not clear why you are touching the table three times. You really shouldn't need to update a flag AND select the rows into a new table AND then delete them. You could just use an OUTPUT clause to do this in one step if desired, but you would need to clarify your logic to get help on that.
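As a rough sketch of the OUTPUT idea (the column list here is heavily trimmed; only a few of the columns from the proc above are shown, and the computed values would need to be added back):

-- note: OUTPUT ... INTO requires the target table to have no enabled triggers or foreign keys
DELETE SV
OUTPUT deleted.LocalID, deleted.ServerSerial, deleted.SVProgramID,
       3, getdate(), deleted.ExpireDate          -- StatusFlag, LastUpdate, ExpireDate
INTO SVHistory (LocalID, ServerSerial, SVProgramID, StatusFlag, LastUpdate, ExpireDate)
FROM StoredValue AS SV
WHERE SV.ExpireACK = 1;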
Also, why are you using ROWLOCK? Lock escalation is fine and makes things run faster (less memory spent holding locks). Are you running this while the system is live? If it's after hours, use TABLOCK instead.
This is some suggested pseudo-code you can flesh out. I recommend taking @BatchSize as a parameter. Error handling is also obviously missing; this is just the core of the delete logic.
WHILE 1=1
BEGIN
    BEGIN TRAN

    UPDATE TOP (@BatchSize) StoredValue
    SET <whatever>

    INSERT INTO SVHistory <insert statement>

    DELETE FROM StoredValue WHERE ExpireACK = 1

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRAN
        BREAK;
    END

    COMMIT TRAN
END
First, look at what is causing it to be slow by examining the execution plan. Is it the insert statement or the delete?
Because of the calculations in the insert, I would suspect it is the slower part (unless you have triggers or cascade deletes on the table). You could change the history table to carry the raw columns you use in the calculation and allow NULLs in the calculated fields. Then you can insert into that table more quickly and do the delete, and in a separate step in your job update the calculations afterwards. This would at least tie things up for a shorter period of time, but depending on how the history table is accessed it may or may not be possible.
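For illustration, if SVHistory carried hypothetical raw columns QtyEarnedRaw and QtyUsedRaw (they are not in the original schema) and BreakageValue were left NULL at insert time, the deferred step might look like:

UPDATE H
SET BreakageValue = (H.QtyEarnedRaw - H.QtyUsedRaw) * H.Value * isnull(U.UnitOfMeasureLimit, 1)
FROM SVHistory AS H
LEFT JOIN SVUnitOfMeasureLimits AS U ON H.SVProgramID = U.SVProgramID
WHERE H.BreakageValue IS NULL;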
Another out-of-the-box possibility is to rename your table StoredValue to StoredValueRaw and create a view called StoredValue that only displays the active records. Then the job to delete the records could run every fifteen minutes or so and only delete a few records at a time. This might be far less disruptive to the users even if the actual deletes take longer. You might still need to put the records into the history table at the time they are identified as expired.
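A minimal sketch of that rename-plus-view idea (the ExpireACK = 0 filter is an assumption about what "active" means here):

EXEC sp_rename 'dbo.StoredValue', 'StoredValueRaw';
GO
CREATE VIEW dbo.StoredValue
AS
SELECT *
FROM dbo.StoredValueRaw
WHERE ExpireACK = 0;   -- assumed definition of "active"
GO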
You should possibly rethink running this process every three weeks as well; the fewer records you have to deal with at once, the faster it will go.
I have a table DB.DATA_FEED that I update using a T-SQL procedure. Every minute, the procedure below is executed 100 times for different data.
ALTER PROCEDURE [DB].[UPDATE_DATA_FEED]
    @P_MARKET_DATE varchar(max),
    @P_CURR1 int,
    @P_CURR2 int,
    @P_PERIOD float(53),
    @P_MID float(53)
AS
BEGIN
    BEGIN TRY
        UPDATE DB.DATA_FEED
        SET
            MID = @P_MID,
            MARKET_DATE = convert(datetime, @P_MARKET_DATE, 103)
        WHERE
            cast(MARKET_DATE as date) =
                cast(convert(datetime, @P_MARKET_DATE, 103) as date) AND
            CURR1 = @P_CURR1 AND
            CURR2 = @P_CURR2 AND
            PERIOD = @P_PERIOD

        IF @@TRANCOUNT > 0
            COMMIT WORK
    END TRY
    BEGIN CATCH
        --error code
    END CATCH
END
When users use the application, they also read from this table, as per the SQL below. Potentially this select can run thousands of times in one minute. (The question marks are replaced by the parser with the appropriate dates/numbers.)
DECLARE @MYDATE AS DATE;
SET @MYDATE = '?'

SELECT *
FROM DB.DATA_FEED
WHERE MARKET_DATE >= @MYDATE AND MARKET_DATE < DATEADD(D, 1, @MYDATE)
  AND CURR1 = ?
  AND CURR2 = ?
  AND PERIOD = ?
ORDER BY PERIOD
I have sometimes, albeit rarely, run into a database lock.
Using the script from http://sqlserverplanet.com/troubleshooting/blocking-processes-lead-blocker I saw the blocker was SPID = 58. I then ran DECLARE @SPID INT; SET @SPID = 58; DBCC INPUTBUFFER(@SPID) to find the SQL script, which turned out to be my select statement.
Is there something wrong with my SQL code? What can I do to prevent such locks happening in the future?
Thanks
Readers have to wait for writers, so when someone is writing, the readers have to wait for the write to finish. There are two table hints you can try: NOLOCK, which reads uncommitted rows (dirty reads), and READPAST, which skips rows that are currently locked and returns only committed data. In both cases the readers never block the table and therefore do not block or deadlock a writer.
Writers can block other writers, but if I understood correctly you only perform one write per execution, so reads will be interleaved between the writes, reducing the chance of blocking and deadlocks.
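For illustration, the hint goes on the table reference in the select above (whether dirty reads or skipped rows are acceptable is a business decision; this is just a sketch):

DECLARE @MYDATE AS DATE;
SET @MYDATE = '?'

SELECT *
FROM DB.DATA_FEED WITH (READPAST)   -- or WITH (NOLOCK) if dirty reads are acceptable
WHERE MARKET_DATE >= @MYDATE AND MARKET_DATE < DATEADD(D, 1, @MYDATE)
  AND CURR1 = ?
  AND CURR2 = ?
  AND PERIOD = ?
ORDER BY PERIOD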
Hope it helps.
What would the processing-load concern be if I had an AFTER INSERT trigger created on a table and, in that trigger, performed a WHILE loop to iterate through potentially multiple rows?
The end result is that 99.999% of the time I will have only 1 row, but as the future is unpredictable I also want to be able to handle multiple rows being inserted.
Trigger Model:
1) Insert information into the table
2) Create views specific to the client, via stored procedures (if possible)
What Say You? :)
I haven't fully developed this, but it is the design I am looking for; it may not be structurally sound, but it should get the point across.
CREATE TRIGGER dbo.New_Client_Setup
ON dbo.client
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    --Fill temp table with the inserted rows
    select * into #clients
    from inserted

    --Iterate through the temp table
    declare @id int, @clnt nvarchar(10)

    While (select count(*) from #clients) <> 0
    BEGIN
        select top (1)
            @id = id
            , @clnt = short
        from #clients
        order by id desc

        Execute dbo.sp_Create_View_Client @id, @clnt

        -- Drop used ID
        delete from #clients
        where id = @id
    END

    Drop table #clients
END
GO
Again, look at the design of the trigger, not necessarily the syntactic sugar.
Design-wise, reading the comments, I think you do not necessarily need to do this in a trigger. I would say you should do it as part of your insert statement, in a transaction: do the insert, and then do the loop that you want to do (whatever that does; execute dbo.sp_Create_View_Client)...
The second thing I would ask is what exactly dbo.sp_Create_View_Client is doing: is it strictly dependent on the insert? In other words, what happens if the insert works fine and the trigger fails? I would do the whole insert and the execution of the SP in one transaction, so as to preserve data integrity.
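A minimal sketch of that suggestion (the column list is a placeholder, and it assumes client.id is an identity column; adjust to the real schema):

DECLARE @clnt nvarchar(10) = N'ACME';   -- placeholder client short name

BEGIN TRAN;

    INSERT INTO dbo.client (short /* , other columns */)
    VALUES (@clnt /* , other values */);

    DECLARE @new_id int = SCOPE_IDENTITY();  -- assumes client.id is an identity column

    EXEC dbo.sp_Create_View_Client @new_id, @clnt;

COMMIT TRAN;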
My ERP Vendor has the following trigger on a table:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TRIGGER [dbo].[SOItem_DeleteCheck]
ON [dbo].[soitem]
FOR DELETE
AS
BEGIN
DECLARE @RecCnt int, @LogInfo varchar(256)
SET @RecCnt = (SELECT COUNT(*) FROM deleted)
IF @RecCnt > 150
BEGIN
    RAISERROR (54010, 18, 1, 'SOItem') WITH LOG
    ROLLBACK TRANSACTION
END
SET @LogInfo = 'Deleting ' + LTRIM(STR(@RecCnt)) + ' Rows From SOItem'
EXEC LogDeletes @LogInfo
END
GO
This seems very inefficient to me. Doesn't select count(*) take longer than Count(specific field)?
Honestly, even if it is slower, I can run a select statement like that in less than a millisecond on my largest table, which has millions of rows, and this trigger is unlikely to see anywhere near that many. There is no real performance gain from changing it. I'm curious as to why you would want to roll back any transaction with more than 150 records, though.
I think in the past there was a benefit to count(1) vs count(*), and we were all taught to use that approach, but at this point it's more about style than performance.
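One caveat worth illustrating: COUNT(*) and COUNT(column) are not interchangeable when the column is nullable, so the trigger's COUNT(*) is also the safer choice (SomeNullableCol is a made-up column for the sake of the example):

-- inside the trigger, suppose deleted contains 3 rows and SomeNullableCol is NULL in one of them
SELECT COUNT(*) FROM deleted                -- returns 3: counts every row
SELECT COUNT(SomeNullableCol) FROM deleted  -- returns 2: NULLs are not counted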
Imagine the scene: you're updating some legacy Sybase code and come across a cursor. The stored procedure builds up a result set in a #temporary table which is all ready to be returned, except that one of the columns isn't terribly human-readable; it's an alphanumeric code.
What we need to do is figure out the possible distinct values of this code, call another stored procedure to cross-reference these discrete values, and then update the result set with the newly deciphered values:
declare @lookup_code char(8)   -- datatypes assumed for the example
declare @xref_code char(8)

declare c_lookup_codes cursor for
    select distinct lookup_code
    from #workinprogress

open c_lookup_codes
while (1=1)
begin
    fetch c_lookup_codes into @lookup_code
    if @@sqlstatus <> 0
    begin
        break
    end
    exec proc_code_xref @lookup_code, @xref_code OUTPUT
    update #workinprogress
    set xref = @xref_code
    where lookup_code = @lookup_code
end
close c_lookup_codes
deallocate cursor c_lookup_codes
Now then, whilst this may give some folks palpitations, it does work. My question is: how best would one avoid this kind of thing?
_NB: for the purposes of this example you can also imagine that the result set is in the region of 500k rows, that there are 100 distinct values of lookup_code, and finally, that it is not possible to have a table with the xref values in it, as the logic in proc_code_xref is too arcane._
You have to have an XRef table if you want to take out the cursor. Assuming you know the 100 distinct lookup values (and that they're static), it's simple to generate one by calling proc_code_xref 100 times and inserting the results into a table, as sketched below.
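A rough sketch of that idea (code_xref is an assumed name for the new table, and the datatypes are guesses):

create table code_xref (
    lookup_code char(8) not null,
    xref_value  char(8) not null
)

-- one-off population: run proc_code_xref once per distinct lookup_code and insert each result
-- insert code_xref values ('ABC12345', 'DECODED1') ...

-- from then on, the cursor becomes a single set-based update
update #workinprogress
set xref = x.xref_value
from #workinprogress, code_xref x
where #workinprogress.lookup_code = x.lookup_code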
Unless you are willing to duplicate the code in the xref proc, there is no way to avoid using a cursor.
They say that if you must use a cursor, then you must have done something wrong ;-) Here's a solution without a cursor:
declare @lookup_code char(8)
declare @xref_code char(8)

select distinct lookup_code
into #lookup_codes
from #workinprogress

while 1=1
begin
    select @lookup_code = lookup_code from #lookup_codes
    if @@rowcount = 0 break

    exec proc_code_xref @lookup_code, @xref_code OUTPUT

    -- apply the decoded value to the result set, as in the cursor version
    update #workinprogress
    set xref = @xref_code
    where lookup_code = @lookup_code

    delete #lookup_codes
    where lookup_code = @lookup_code
end