Since the data had to be retrievable for any date, I could not simply delete the very old data. These tables also received constant inserts and updates, so keeping them available was important: the time each table spent locked had to stay acceptable, with room for waiting transactions to finish.
The solution I came up with does this with variable-size batches. Modern versions of SQL Server offer other ways to do this, but the good thing about this method is that it works regardless of SQL Server version or edition. Azure SQL Database would need some modification to archive to a separate database.
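For comparison, one of those newer approaches is DELETE with an OUTPUT clause, which removes rows and captures them in a single statement. Here is a minimal sketch, assuming the same Sale table used in the sample below; the variable values and column types are illustrative, and staging into a temp table before copying to the archive database just keeps the sketch simple:

-- Sketch of the DELETE ... OUTPUT alternative; values and types are illustrative
DECLARE @StartDate date = '2015-01-01', @EndDate date = '2016-01-01'
CREATE TABLE #DeletedSales (SalesID INT, CustomerID INT, SalesPersonID INT, TotalSales MONEY, SaleDate DATETIME)
DELETE TOP (1000) S
OUTPUT DELETED.SalesID, DELETED.CustomerID, DELETED.SalesPersonID, DELETED.TotalSales, DELETED.SaleDate
INTO #DeletedSales -- stage the removed rows locally, then copy them to the archive
FROM Sale S
WHERE S.SaleDate >= @StartDate AND S.SaleDate < @EndDate
-- in practice you would loop this until @@ROWCOUNT returns 0
INSERT INTO SalesArchive.dbo.Sale (SalesID, CustomerID, SalesPersonID, TotalSales, SaleDate)
SELECT SalesID, CustomerID, SalesPersonID, TotalSales, SaleDate FROM #DeletedSales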
The following sample illustrates archiving data from two tables, a typical Sales header and detail pair, where data consistency between the two must be maintained.
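For reference, here is a minimal sketch of the schema the procedure assumes; the column types and the foreign key are guesses inferred from the column lists used below:

-- Assumed production tables; adjust types to match your actual schema
CREATE TABLE Sale (
    SalesID INT PRIMARY KEY,
    CustomerID INT,
    SalesPersonID INT,
    TotalSales MONEY,
    SaleDate DATETIME -- carries a time component, hence the @EndDate adjustment in the procedure
)
CREATE TABLE SalesDetail (
    SalesDetailID INT PRIMARY KEY,
    SalesID INT REFERENCES Sale (SalesID), -- if present, this is why details are deleted first
    ItemID INT,
    Quantity INT
)
-- SalesArchive is assumed to be a separate database on the same instance,
-- containing structurally identical Sale and SalesDetail tables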
CREATE OR ALTER PROCEDURE prc_ArchiveSalesData (@StartDate date, @EndDate date, @BatchSize INT)
AS
BEGIN
SET NOCOUNT ON
SET XACT_ABORT ON
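-- XACT_ABORT ensures any runtime error rolls back the entire open transaction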
DECLARE @rowcount INT
-- set end date to one day after the date passed in, to account for the time component
-- a half-open range (>= start, < end) performs better than converting SaleDate in the WHERE clause
SET @EndDate = DATEADD(DAY,1,@EndDate)
-- check for existing temp tables; drop and recreate
IF OBJECT_ID('tempdb..#AllItems') IS NOT NULL
DROP TABLE #AllItems
CREATE TABLE #AllItems (
SalesID INT
)
CREATE INDEX idx_AllItems ON #AllItems (SalesID) -- lots of joins will happen on this
IF OBJECT_ID('tempdb..#CurrentBatch') IS NOT NULL
DROP TABLE #CurrentBatch
CREATE TABLE #CurrentBatch (
SalesID INT
)
CREATE INDEX idx_CurrentBatch ON #CurrentBatch (SalesID)
-- gather all items for the given date range
INSERT INTO #AllItems
SELECT S.SalesID FROM Sale S WHERE S.SaleDate >= @StartDate AND S.SaleDate < @EndDate
-- loop outside the transaction; that way, on a failure or a manual cancel, the batches that already succeeded stay committed
INSERT INTO #CurrentBatch
SELECT TOP (@BatchSize) SalesID FROM #AllItems ORDER BY SalesID ASC -- technically no ORDER BY is needed, but it makes debugging easier
SELECT @rowcount = @@ROWCOUNT
WHILE @rowcount > 0
BEGIN
BEGIN TRANSACTION
-- copy to archive tables
INSERT INTO SalesArchive.dbo.Sale (SalesID, CustomerID, SalesPersonID, TotalSales, SaleDate)
SELECT S.* -- SELECT * makes the archive fail if the table definition changes, which is what we want here
FROM Sale S
INNER JOIN #CurrentBatch CB
ON S.SalesID = CB.SalesID
INSERT INTO SalesArchive.dbo.SalesDetail ( SalesDetailID, SalesID, ItemID, Quantity)
SELECT SD.*
FROM SalesDetail SD
INNER JOIN #CurrentBatch CB
ON SD.SalesID = CB.SalesID
-- delete from production tables
DELETE FROM SalesDetail
FROM SalesDetail SD
INNER JOIN #CurrentBatch CB
ON SD.SalesID = CB.SalesID
DELETE FROM Sale
FROM Sale S
INNER JOIN #CurrentBatch CB
ON S.SalesID = CB.SalesID
-- delete the current batch from all items
DELETE FROM #AllItems
FROM #AllItems AI
INNER JOIN #CurrentBatch CB
ON AI.SalesID = CB.SalesID
-- delete from current batch completed
DELETE FROM #CurrentBatch
COMMIT TRANSACTION -- committing here makes it possible to stop the archiving without losing completed batches
WAITFOR DELAY '00:00:10' -- 10 second pause before continuing, adjust as needed
INSERT INTO #CurrentBatch
SELECT TOP (@BatchSize) SalesID FROM #AllItems ORDER BY SalesID ASC -- technically no ORDER BY is needed, but it makes debugging easier
SELECT @rowcount = @@ROWCOUNT
END -- rowcount loop
END -- proc
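A call might look like the following; the dates and batch size are only illustrative, and the right batch size depends on row width and how long you can afford to hold locks:

-- Archive all of 2015, 5000 header rows at a time (values are illustrative)
EXEC prc_ArchiveSalesData @StartDate = '2015-01-01', @EndDate = '2015-12-31', @BatchSize = 5000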
I hope you find this useful if you ever need to archive large amounts of data.