https://geek-goddess-bonnie.blogspot.com/2015/07/sql-server-chunking-large-deletes.html
I wasn't totally happy with how I ended up doing the deletes in that previous blog post. I also never took into account the effect of changing the AutoGrowth and MaxSize properties. Plus, there are various ways that these deletes can be done. I'll use the date as the condition to delete by:
- Method #1: Do the DELETE by the deletion criteria directly in the loop:
  - DELETE TOP (X_number) MyTable WHERE saveddatetime < @time
- Method #2: Select the keys matching the deletion criteria into a table variable first. Then, in the loop, DELETE TOP():
  - DECLARE @keysToDelete TABLE (keys bigint)
  - INSERT INTO @keysToDelete SELECT TheKey FROM MyTable WHERE saveddatetime < @time
  - -- in the loop:
  - DELETE TOP (X_number) MyTable WHERE TheKey IN (SELECT keys FROM @keysToDelete)
- Method #3: Do the SELECT into the table variable and the DELETE both inside the loop:
  - INSERT INTO @keysToDelete SELECT TOP (X_number) TheKey FROM MyTable WHERE saveddatetime < @time
  - DELETE MyTable WHERE TheKey IN (SELECT keys FROM @keysToDelete)
DECLARE @StartTime datetime
DECLARE @LoopStart datetime
DECLARE @Elapsed int
DECLARE @DaysToKeep int = 7
-- @DaysToKeep is passed into the SP as a parameter
IF @DaysToKeep > 0
SET @DaysToKeep = -@DaysToKeep -- need to make it negative
SET NOCOUNT ON
DECLARE @time datetime
DECLARE @rc INT
DECLARE @LatestTime datetime
SELECT @LatestTime = MAX(saveddatetime) FROM message
SET @time = DATEADD(day, @DaysToKeep, @LatestTime) -- @DaysToKeep is negative, so this goes back in time
-- method #1
SET @rc = 1
SET @StartTime = GETDATE()
WHILE @rc > 0
BEGIN
BEGIN TRANSACTION
SET @LoopStart = GETDATE()
--DELETE TOP (10000) message WHERE saveddatetime < @time
DELETE TOP (100000) message WHERE saveddatetime < @time
SET @rc = @@ROWCOUNT
COMMIT TRANSACTION
CHECKPOINT
SET @Elapsed = DATEDIFF(MILLISECOND, @LoopStart, GETDATE())
RAISERROR ('Elapsed %d milliseconds to delete %d rows', 0, 0, @Elapsed, @rc) WITH NOWAIT
END
SET @Elapsed = DATEDIFF(MILLISECOND, @StartTime, GETDATE())
RAISERROR ('TOTAL Elapsed %d milliseconds', 0, 0, @Elapsed) WITH NOWAIT
-- method #2
DECLARE @keysToDelete TABLE (keys bigint) -- this is also needed in method #3
SET @StartTime = GETDATE()
INSERT INTO @keysToDelete SELECT messagekey FROM message WHERE saveddatetime < @time
SET @rc = 1
WHILE @rc > 0
BEGIN
BEGIN TRANSACTION
SET @LoopStart = GETDATE()
--DELETE TOP (10000) message WHERE messagekey IN (SELECT keys FROM @keysToDelete)
DELETE TOP (100000) message WHERE messagekey IN (SELECT keys FROM @keysToDelete)
SET @rc = @@ROWCOUNT
COMMIT TRANSACTION
CHECKPOINT
SET @Elapsed = DATEDIFF(MILLISECOND, @LoopStart, GETDATE())
RAISERROR ('Elapsed %d milliseconds to delete %d rows', 0, 0, @Elapsed, @rc) WITH NOWAIT
END
SET @Elapsed = DATEDIFF(MILLISECOND, @StartTime, GETDATE())
RAISERROR ('TOTAL Elapsed %d milliseconds', 0, 0, @Elapsed) WITH NOWAIT
-- method #3
SET @rc = 1
SET @StartTime = GETDATE()
WHILE @rc > 0
BEGIN
BEGIN TRANSACTION
SET @LoopStart = GETDATE()
--INSERT INTO @keysToDelete SELECT TOP (10000) messagekey FROM message WHERE saveddatetime < @time
INSERT INTO @keysToDelete SELECT TOP (100000) messagekey FROM message WHERE saveddatetime < @time
DELETE message WHERE messagekey IN (SELECT keys FROM @keysToDelete)
SET @rc = @@ROWCOUNT
DELETE @keysToDelete
COMMIT TRANSACTION
CHECKPOINT
SET @Elapsed = DATEDIFF(MILLISECOND, @LoopStart, GETDATE())
RAISERROR ('Elapsed %d milliseconds to delete %d rows', 0, 0, @Elapsed, @rc) WITH NOWAIT
END
SET @Elapsed = DATEDIFF(MILLISECOND, @StartTime, GETDATE())
RAISERROR ('TOTAL Elapsed %d milliseconds', 0, 0, @Elapsed) WITH NOWAIT
If you routinely have a lot of rows to delete, like 500,000 or a million or more, then you should think about deleting in larger chunks, but be aware that the more rows that you delete at one time, the larger the database log file will grow. Be sure to account for that by not limiting your log file to too small a size.
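If you do decide to allow the log file more room, the AutoGrowth increment and MaxSize can be changed with ALTER DATABASE. This is just a sketch; MyDb and MyDb_log are placeholder names, so substitute your own database and log file names:

```sql
-- Find your log file's logical name first:
--   SELECT name FROM MyDb.sys.database_files WHERE type_desc = 'LOG';
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_log, FILEGROWTH = 32MB, MAXSIZE = 2000MB);
```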
For my testing, I have a database table with 2 million rows and will be deleting approximately 1.8 million rows (which will leave 200,000 rows of the most recent data). This is from a test system that we had been hitting for months without a cleanup in place yet. I finally got the cleanup stored procedures to my liking and figured it was time to do it. There was a lot of tweaking going on as I tested this off and on (in between other work that had more priority), but what I learned is the subject of this blog.
The test database started with an 18 MB log file and a MaxSize of 1000 MB.
Log File Growth of 1 MB:

| Method | Deleting 100,000 at a time | Deleting 10,000 at a time |
| --- | --- | --- |
| #1 | log grew 796 times to 814 MB; TOTAL Elapsed 226777 milliseconds (3:48) | log grew 165 times to 183 MB; TOTAL Elapsed 303567 milliseconds (5:05) |
| #2 | log grew 796 times to 814 MB; TOTAL Elapsed 603477 milliseconds (10:05) | log grew 110 times to 128 MB; TOTAL Elapsed 1147310 milliseconds (19:09) |
| #3 | log grew 940 times to 958 MB; TOTAL Elapsed 610474 milliseconds (10:12) | log grew 157 times to 175 MB; TOTAL Elapsed 580877 milliseconds (9:43) |

Log File Growth of 32 MB:

| Method | Deleting 100,000 at a time | Deleting 10,000 at a time |
| --- | --- | --- |
| #1 | log grew 28 times to 914 MB; TOTAL Elapsed 206517 milliseconds (3:28) | log grew 9 times to 306 MB; TOTAL Elapsed 286067 milliseconds (4:48) |
| #2 | log grew 25 times to 818 MB; TOTAL Elapsed 605470 milliseconds (10:07) | log grew 9 times to 306 MB; TOTAL Elapsed 1196263 milliseconds (19:58) |
| #3 | log grew 29 times to 946 MB; TOTAL Elapsed 607330 milliseconds (10:09) | log grew 8 times to 274 MB; TOTAL Elapsed 599856 milliseconds (10:01) |
Notice the huge difference in time with Method #2 when deleting 100,000 vs. 10,000 rows at a time. In both AutoGrowth scenarios, the total time nearly doubled (about 10 minutes vs. 19 to 20 minutes). That was unexpected, and I have no idea why. Very strange.
The times for all 3 methods were very similar between the 1 MB and 32 MB AutoGrowth settings. However, I would avoid the smaller AutoGrowth setting because of how many times the log file has to grow (as I said earlier, that can be bad for performance). I used the 1 MB setting in this test because the database I tested with defaulted to that, which I think is too small. From the Googling I’ve done, 32 MB seemed reasonable.
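If you want to see your log file's current size and growth settings before changing anything, a query along these lines works (note that sys.database_files stores size and non-percent growth in 8 KB pages, hence the conversions):

```sql
SELECT name,
       size * 8 / 1024 AS size_mb,   -- size is stored in 8 KB pages
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + ' percent'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting,
       max_size                      -- also in 8 KB pages; -1 means unlimited
FROM sys.database_files
WHERE type_desc = 'LOG';
```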
The takeaway from all of this is that your individual preferences for your own databases will influence how you handle this:
If you prefer that this run as quickly as possible and you don’t mind a large log file, then you’ll want to do something along the lines of Method #1 (and, if you delete 10,000 rows at a time with that method, you even end up with a decent-sized log file).
If you’d rather keep your log file as small as possible, and you don’t mind if it takes longer, then Method #3 with 10,000 deletes seems to be the way to go.
Now, all that said, these 3 Methods are fairly simplistic in their approach. Your own delete conditions may involve much more complicated SELECTs with JOINs and who knows what else. Some of my real-life conditions were simple, but some were very, very complex. YMMV
One last note: if you are running this cleanup process from a .NET application, be sure to run it in its own thread, so that it can keep on chugging until it’s done without blocking anything else. And if you set SqlCommand.CommandTimeout = 0 (meaning no time limit), you won’t have to worry about your app crashing with a SqlException due to a command timeout.