Sunday, July 2, 2017

Archiving Data

Many times as a DBA I have taken over a system that has been gathering data for years.  The worst I took over had about 400 million records in the main table, with an average of 24 records in the secondary related table per primary record, and data going back 12+ years.  In 2017, this might not be that bad, but back in 2002....

Seeing as the data had to be retrievable for any date, I could not simply delete the very old data.  These tables also had constant inserts and updates against them, so keeping the tables available became important, i.e. the tables could only be locked for an acceptable amount of time, with time for waiting transactions to finish between batches.

The solution I came up with does this with variable-size batches.  Now, with modern versions of SQL Server, there are other ways to do this, but the good thing about this method is that it works regardless of the version or edition of SQL Server.  Azure SQL DB would need some modification to make it work to archive to a separate database.
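For example, one of those other ways is a DELETE with an OUTPUT clause (around since SQL Server 2005), which copies the deleted rows to the archive in the same statement.  This is only a minimal sketch for a single table, using the same Sale and SalesArchive.dbo.Sale tables as the procedure below, and it assumes the archive table has no triggers or foreign keys (a restriction of OUTPUT ... INTO); the cutoff date and batch size are made up for illustration.

-- Sketch only: single-table archive using DELETE ... OUTPUT INTO
DECLARE @BatchSize INT = 10000, @CutOffDate date = '2005-01-01' -- illustrative values

DELETE TOP (@BatchSize) S
OUTPUT DELETED.SalesID, DELETED.CustomerID, DELETED.SalesPersonID, DELETED.TotalSales, DELETED.SaleDate
INTO SalesArchive.dbo.Sale (SalesID, CustomerID, SalesPersonID, TotalSales, SaleDate)
FROM Sale S
WHERE S.SaleDate < @CutOffDate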

The following sample illustrates archiving data from two tables, a typical Sales header and detail pair, where data consistency must be maintained.


CREATE OR ALTER PROCEDURE prc_ArchiveSalesData (@StartDate date, @EndDate date, @BatchSize INT)
AS
BEGIN
    SET NOCOUNT ON
    SET XACT_ABORT ON

    DECLARE @rowcount INT

    -- set end date to be one after the date sent in to account for time data.
    -- Better performance than converting data in the where clause
    SET @EndDate = DATEADD(DAY, 1, @EndDate)

    -- check for existing temp tables, drop and create
    IF OBJECT_ID('tempdb..#AllItems') IS NOT NULL
        DROP TABLE #AllItems

    CREATE TABLE #AllItems (
        SalesID INT
    )

    CREATE INDEX #idxAllItems ON #AllItems(SalesID) -- lots of joins to happen on this

    IF OBJECT_ID('tempdb..#CurrentBatch') IS NOT NULL
        DROP TABLE #CurrentBatch

    CREATE TABLE #CurrentBatch (
        SalesID INT
    )

    CREATE INDEX #idx_Current ON #CurrentBatch(SalesID)

    -- gather all items for the given date range
    INSERT INTO #AllItems
    SELECT S.SalesID FROM Sale S WHERE S.SaleDate >= @StartDate AND S.SaleDate < @EndDate

    -- loop outside the transaction, that way if there is a failure or a need to cancel, the successful batches stay committed
    INSERT INTO #CurrentBatch
    SELECT TOP (@BatchSize) SalesID FROM #AllItems ORDER BY SalesID ASC -- technically no need to do this order by, but makes debugging easier

    SELECT @rowcount = @@ROWCOUNT

    WHILE @rowcount > 0
    BEGIN
        BEGIN TRANSACTION
            -- copy to archive tables
            INSERT INTO SalesArchive.dbo.Sale (SalesID, CustomerID, SalesPersonID, TotalSales, SaleDate)
            SELECT S.*  -- this would make the archive fail if we change the table, which we would want
            FROM Sale S
            INNER JOIN #CurrentBatch CB
                ON S.SalesID = CB.SalesID

            INSERT INTO SalesArchive.dbo.SalesDetail (SalesDetailID, SalesID, ItemID, Quantity)
            SELECT SD.*
            FROM SalesDetail SD
            INNER JOIN #CurrentBatch CB
                ON SD.SalesID = CB.SalesID

            -- delete from production tables, detail table first
            DELETE FROM SalesDetail
            FROM SalesDetail SD
            INNER JOIN #CurrentBatch CB
                ON SD.SalesID = CB.SalesID

            DELETE FROM Sale
            FROM Sale S
            INNER JOIN #CurrentBatch CB
                ON S.SalesID = CB.SalesID

            -- delete the current batch from all items
            DELETE FROM #AllItems
            FROM #AllItems AI
            INNER JOIN #CurrentBatch CB
                ON AI.SalesID = CB.SalesID

            -- delete from current batch completed
            DELETE FROM #CurrentBatch
        COMMIT TRANSACTION -- committing the transaction here makes it possible to stop the archiving and not lose everything

        WAITFOR DELAY '00:00:10' -- 10 second pause before continuing, adjust as needed

        INSERT INTO #CurrentBatch
        SELECT TOP (@BatchSize) SalesID FROM #AllItems ORDER BY SalesID ASC

        SELECT @rowcount = @@ROWCOUNT
    END -- rowcount loop
END -- proc
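
To give an idea of how it gets called, a run archiving everything from 2002 through 2004 in batches of 5,000 sales would look something like this (the dates and batch size are just illustrative):

EXEC prc_ArchiveSalesData
    @StartDate = '2002-01-01',
    @EndDate   = '2004-12-31', -- the proc adds a day internally, so this includes all of Dec 31
    @BatchSize = 5000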
  

Hope you find this useful if you ever need to archive large amounts of data.


Thursday, March 2, 2017

SentryOne cleanup, PerformanceAnalysisDataTableAndIndexCounter table

SentryOne is a great tool for monitoring servers, but with a lot of "chatty" servers, the amount of data collected can become unwieldy.  With about 65 servers monitored, the delete functionality in SentryOne could not keep up with the deletes in the PerformanceAnalysisDataTableAndIndexCounter table (hereafter just referred to as "the table").  At the point I was brought in to clean it up there were close to a billion records in this table, and we were about to add another 110+ servers to be monitored.  The table itself was about 320 GB, the cleanup function was not working, and the manual method provided by SentryOne support could not handle the deletes without causing a lot of blocking and timeout issues.

The delete function provided had multiple manual steps, and the where clause was not well aligned with the indexes on the table.  To be able to handle cleanup of this size, and not cause undue blocking and timeouts in the collection of data, I decided to find a better way.

Looking at the code provided by SentryOne, and digging a bit in the database itself, I found how I could get the deletes into smaller, adjustable batches.

The first problem was to find the boundaries for the delete, and this is the single most time-consuming step in the process:

   SELECT @EndID = MAX(id) FROM PerformanceAnalysisDataTableAndIndexCounter WITH (NOLOCK) WHERE TIMESTAMP <= @EndTimeStamp -- this step can take time
Since data in this table is just written to and never updated, using nolock here is ok.

After this it becomes a simple loop, deleting batches until you hit the iteration limit.  The delete is done with a simple TOP statement in a CTE.

After I was done catching up to the 15-day retention we wanted, the procedure was set up to simply be called once a day.  The retention is hardcoded into the procedure to avoid accidentally deleting more than we wanted, and can be changed if your retention is different.  We decided to set the retention in SentryOne to be 1 day longer than the procedure's, so that SentryOne's own delete routine would kick in if this one ever failed.

Hopefully this is useful for you if you run into this problem with SentryOne.  The same setup can be used for other tables as well, but PLEASE check with their support that it is OK before you start to delete data, since some tables are detail "collector" tables and some are summaries that you need for your timelines.  The same format was used to delete from a couple of other tables, including the one that collects deadlock information.

The code in its entirety is on GitHub, or below.  If you are going to use it, I would highly recommend getting it from GitHub rather than copying it from here.

The parameters need to be adjusted for your use and server performance; the defaults are just a safe, small run of deletes.

@BatchSize was increased as the amount of data in the table went down, and depending on how busy the system was.  The trick is to get each batch big enough that the cleanup finishes in a reasonable time, but not so large that it causes timeouts.  There is only one way to figure that out: run the procedure and keep a close eye on your system while it is running.

@MaxIterations is simply how many times to loop through running deletes.


USE SQLSentry
-- USE SentryOne -- depends if new install, or upgrade
GO

CREATE PROCEDURE dbo.BatchDeletePerformanceAnalysisDataTableAndIndexCounterData
    @EndDateTime DATETIME,
    @MaxIterations INT = 1,
    @BatchSize INT = 1000
AS
/******************
Anders Pedersen
2017-01-18

Made to batch delete data out of the PerformanceAnalysisDataTableAndIndexCounter table.
The delete statement provided by SentryOne could not handle a lot of data without
a lot of manual intervention (manual batches).

This uses the column with the primary key on it for deletes instead of the timestamp column,
which substantially speeds up deletes.
*/
BEGIN
    SET NOCOUNT ON

    --SET DEADLOCK_PRIORITY LOW -- if you see a lot of other processes being chosen as victims, set this
    --(unless it is more important to get it cleaned up than losing a few records)

    -- get first time stamp
    DECLARE @BeginTimeStamp INT, @EndTimeStamp INT
    DECLARE @BeginID bigint, @EndID bigint

    -- sanity check, do not let it delete to a date less than 15 days back.  Making it 16 to cover the entire day.  Close enough
    IF DATEDIFF(DAY, @EndDateTime, GETDATE()) <= 16 OR @EndDateTime IS NULL
        SET @EndDateTime = DATEADD(DAY, -16, GETDATE())

    SELECT @EndTimeStamp = [dbo].[fnConvertDateTimeToTimestamp](@EndDateTime)

    -- get the ID for the newest row to delete
    SELECT @EndID = MAX(id) FROM PerformanceAnalysisDataTableAndIndexCounter WITH (NOLOCK) WHERE TIMESTAMP <= @EndTimeStamp -- this step can take time
    SELECT @BeginID = MIN(id) FROM PerformanceAnalysisDataTableAndIndexCounter WITH (NOLOCK) WHERE ID <= @EndID

    DECLARE @cnt bigint = 0
    DECLARE @x INT = 1

    WHILE @x <= @MaxIterations AND @BeginID < @EndID
    BEGIN
        IF @x <> 1 -- skip the wait on the first iteration.  Saves time if you are only running small batches
            WAITFOR DELAY '00:00:03' -- 3 second delay to let transactions through.  This might need adjusting

        ;WITH CTE AS
        (
            SELECT TOP (@BatchSize) * FROM PerformanceAnalysisDataTableAndIndexCounter WHERE ID BETWEEN @BeginID AND @EndID ORDER BY ID ASC
        )
        DELETE FROM CTE

        SELECT @cnt += @@ROWCOUNT
        SET @x += 1
    END

    SELECT @cnt AS RecordsDeleted
END
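
For the daily run mentioned above, the job step is essentially just an EXEC; passing NULL for the end date lets the hardcoded 16-day cutoff decide how far back to keep.  The iteration count and batch size below are only examples, tune them for your own system:

EXEC dbo.BatchDeletePerformanceAnalysisDataTableAndIndexCounterData
    @EndDateTime   = NULL,   -- falls back to 16 days ago inside the proc
    @MaxIterations = 100,    -- example value
    @BatchSize     = 50000   -- example value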