Tuesday, March 8, 2016

T-SQL Tuesday #76, text searching the hard way


T-SQL Tuesday #76 invite


Text searches.  

Possibly one of the least fun things to do in SQL.

Back in the days before Full-Text Search in SQL, I was tasked with replacing the search engine at ABC Radio News, this search engine helped them research any broadcast they had ever done.  The Fox Pro application they had in place was so slow, and unsupported, they needed something better.

Some of the requirements was either exact match, wild card searches, anywhere searches (i.e. the words typed into the search bar, should find any records that had the words anywhere in them).  Because they had gotten so used to the search being slow, they also had an ability to narrow down the search results with further key words.

Starting work on this, I first convinced them not to implement the search of the search function, and let me implement this the way I want, convinced I could make this fast enough without the use of such trickery.

Looking into what was needed, I quickly realized there was a LOT of data, guess 50+ years of news broadcasts will do this.  Consider this was in the early 2000s, some innovation was needed from anything I had coded before.  Obviously LIKE searches was out of the question, full text search was not available.  So what to do?

Basically I decided to break down each broadcast to words into a separate table, the entire application fit in 2 tables: Story and Words.

The Story table had everything in it, broadcast date, tape location, tape number, tape type, and the text of the broadcast itself.  The Words table had 3 columns:  StoryID, Word and WordCount.

The parsing of the Story table took a while, with old school TEXT data type, no easy way to do this.  I would then insert the results into the Words table, with the count showing how many times that word appeared in that one story (filtered out articles like the, a and an).

On the search itself, if "Anywhere search" was selected, it simply found any article that had all those words in the Words table, then returned the records from the Story table.  If "exact match" was selected it would use the Words table to find candidate records, then do a like search on the text column in the Story table that had the entire story.  Pretty simple.  And it was quick!  We could search all of their history in sub seconds.

When I first demonstrated this to a few users, it was immediately accepted and one of our front end developers made a quick application to sit as a front end for it.  The search capability was now so good some of them where upset since in the past they had used the time while waiting to go get coffee etc.  Win in my book.




Monday, February 29, 2016

Backup Restore Chain revisited.

(This is an update to Backup Restore Chain article I posted back in September.)

Again, much thanks to SQLSoldier for the initial code.

Recently I got promoted to a new team that has been without a BDA for a while, and as such started poking around, first of all making sure backups are running, and that we have a reasonable expectation of being able to recover in case of a failure or problem with our databases.

I started some of these by grabbing some random database to test the restore chain on, running the script from the previous article.  On one of the databases this query just went into a tail spin, never returning, on a hunch I changed the MaxRecursion option from 0 to 100, knowing that it should never hit more than about 30 at the time due to the data.  Just selecting out of the Logs CTE, it became very apparent that I got into a cycle where it kept repeating over and over until it hit the max recursion allowed.

Looking at the backupset table, I saw this (condensed down for clarity and size since this table is huge):

SELECT  backup_set_idbackup_Start_dateB.First_LsnB.Last_Lsn,  B.TYPE
FROM 
msdb.dbo.backupset B
WHERE database_name 'AndersTest'
  




Backup_set_id 1713355 and 1713378 has First_Lsn = Last_Lsn, and they are the same between the two records.  There was more records where this was true, just happens to be early in the result set so was easy to spot.

Question then arose in my mind if this caused a problem with the CTE, which it is obvious that it did, but to fix it, I would have to remove those records from being considered in the restore chain (or switch to a loop from a recursive CTE).

I have to admit, I had always been of the impression that ALL backups in a chain had to be restored, but then started to think about the error message when you try to skip one, it says to the effect "LSN to large, try an earlier backup."  This makes sense, but would the reverse be true, that if the LSN did not change, you would not need to restore it?

Testing this I restored backup_set_ids 1713302, 1713332 and 1713401 (i.e. the full backup, and the 5:15 am and 6:00 am transaction log backups, skipping the 5:30 and 5:45 backups).  After the last transaction log backup was restored, I recovered the database and it came up just fine.

This tested, it was time to figure out exactly how I could make the Backup Restore Chain script work, after a few, uhm, misfires, it turned out to be very simple, simply ignore the records where the two LSNs where identical, added this to both queries inside the Logs CTE:

AND B.First_lsn <> B.Last_Lsn

This simple change did the trick, now when I run the query, I got this as my result:




I have tested the following script in as many scenarios as I could think of, but if you find something wrong, please let me know, either in comments here, or to my twitter account.


In rewriting this, I also decided to make it into a stored procedure that could take the database name as a required parameter, and a date as an optional.  With the date sent in, it will use the newest FULL backup prior on or before that date, this could very easily be changed if you need to do it by date/time if you run multiple full backups per day.  I decided to add this in case I ever need to restore prior to the latest full backup.


Code for procedure below, you can also find it saved on Github

USE MASTERGO
CREATE PROCEDURE sp_RestoreChainByDB
   
@DBName SYSNAME,
   
@MaxFullBackupDate date NULL
AS
BEGIN
/****************
Created by @SQLSoldier
http://www.sqlsoldier.com/wp/sqlserver/day3of31daysofdisasterrecoverydeterminingfilestorestoredatabase

2015-09-24
Modified by Anders Pedersen
Twitter: @arrowdrive
Blog: Blog.AndersOnSQL.com
Made to work with SQL 2008R2.


2016-02-26
Modified by Anders Pedersen
Fix issue where First_LSN and Last_LSN values are the same on a record for the trasaction log backups.
These transaction log backups can be ignored in the restore chain.
Also made into a procedure, with the database name for a required parameter, and max date for the full backup to start with as a parameter
*****************/
   
DECLARE @DBBackupLSN numeric(250)

   
DECLARE @Baks TABLE (
       
BakID INT IDENTITY(11) NOT NULL PRIMARY KEY,
       
backup_set_id INT NOT NULL,
       
media_set_id INT NOT NULL,
       
first_family_number tinyint NOT NULL,
       
last_family_number tinyint NOT NULL,
       
first_lsn numeric(250) NULL,
       
last_lsn numeric(250) NULL,
       
database_backup_lsn numeric(250) NULL,
       
backup_finish_date DATETIME NULL,
       
TYPE CHAR(1) NULL,
       
family_sequence_number tinyint NOT NULL,
       
physical_device_name NVARCHAR(260) NOT NULL,
       
device_type tinyint NULL,
       
checkpoint_LSN numeric (25,0)-- Anders 2015-09-24
       
)
   
SET NOCOUNT ON;



   
-- Get the most recent full backup with all backup files
   
INSERT INTO @Baks (backup_set_id,
       
media_set_id,
       
first_family_number,
       
last_family_number,
       
first_lsn,
       
last_lsn,
       
database_backup_lsn,
       
backup_finish_date,
       
TYPE,
       
family_sequence_number,
       
physical_device_name,
       
device_type
       
,checkpoint_LSN)-- Anders
   
SELECT TOP(1WITH Ties B.backup_set_id,
       
B.media_set_id,
       
B.first_family_number,
       
B.last_family_number,
       
B.first_lsn,
       
B.last_lsn,
       
B.database_backup_lsn,
       
B.backup_finish_date,
       
B.TYPE,
       
BF.family_sequence_number,
       
BF.physical_device_name,
       
BF.device_type
       
B.checkpoint_lsn -- Anders 2015-09-24
   
FROM msdb.dbo.backupset AS B
   
INNER JOIN msdb.dbo.backupmediafamily AS BF
       
ON BF.media_set_id B.media_set_id
           
AND BF.family_sequence_number BETWEEN B.first_family_number AND B.last_family_number
   
WHERE B.database_name @DBName
   
AND B.is_copy_only 0
   
AND B.TYPE = 'D'
   
AND BF.physical_device_name NOT IN ('Nul''Nul:')
   AND 
CONVERT(DATE,B.backup_start_date) <=(CASE
                                           
WHEN @MaxFullBackupDate IS NOT NULL THEN   @MaxFullBackupDate
                                           
ELSE CONVERT(DATE,B.backup_start_dateEND-- Anders 2016-02-26
      
   
ORDER BY backup_finish_date DESCbackup_set_id;


   
-- Get the lsn that the differential backups, if any, will be based on
   
SELECT @DBBackupLSN checkpoint_LSN -- Anders 2015-09-24
   
FROM @Baks;

   
-- Get the most recent differential backup based on that full backup
   
INSERT INTO @Baks (backup_set_id,
       
media_set_id,
       
first_family_number,
       
last_family_number,
       
first_lsn,
       
last_lsn,
       
database_backup_lsn,
       
backup_finish_date,
       
TYPE,
       
family_sequence_number,
       
physical_device_name,
       
device_type)
   
SELECT TOP(1WITH Ties B.backup_set_id,
       
B.media_set_id,
       
B.first_family_number,
       
B.last_family_number,
       
B.first_lsn,
       
B.last_lsn,
       
B.database_backup_lsn,
       
B.backup_finish_date,
       
B.TYPE,
       
BF.family_sequence_number,
       
BF.physical_device_name,
       
BF.device_type
   
FROM msdb.dbo.backupset AS B
   
INNER JOIN msdb.dbo.backupmediafamily AS BF
       
ON BF.media_set_id B.media_set_id
           
AND BF.family_sequence_number BETWEEN B.first_family_number AND B.last_family_number
   
WHERE B.database_name @DBName
   
AND B.is_copy_only 0
   
AND B.TYPE = 'I'
   
AND BF.physical_device_name NOT IN ('Nul''Nul:')
   AND 
B.database_backup_lsn @DBBackupLSN

   
ORDER BY backup_finish_date DESCbackup_set_id;


   
--select * from @Baks

   -- Get the last LSN included in the differential backup,
   -- if one was found, or of the full backup
   
SELECT TOP @DBBackkupLSN last_lsn
   
FROM @Baks
   
WHERE TYPE IN ('D''I')
   
ORDER BY BakID DESC;

   
-- Get first log backup, if any, for restore, where
   -- last_lsn of previous backup is >= first_lsn of the
   -- log backup and <= the last_lsn of the log backup
   
INSERT INTO @Baks (backup_set_id,
       
media_set_id,
       
first_family_number,
       
last_family_number,
       
first_lsn,
       
last_lsn,
       
database_backup_lsn,
       
backup_finish_date,
       
TYPE,
       
family_sequence_number,
       
physical_device_name,
       
device_type)
   
SELECT TOP(1WITH Ties B.backup_set_id,
       
B.media_set_id,
       
B.first_family_number,
       
B.last_family_number,
       
B.first_lsn,
       
B.last_lsn,
       
B.database_backup_lsn,
       
B.backup_finish_date,
       
B.TYPE,
       
BF.family_sequence_number,
       
BF.physical_device_name,
       
BF.device_type
   
FROM msdb.dbo.backupset B
   
INNER JOIN msdb.dbo.backupmediafamily AS BF
       
ON BF.media_set_id B.media_set_id
           
AND BF.family_sequence_number BETWEEN B.first_family_number AND B.last_family_number
   
WHERE B.database_name @DBName
   
AND B.is_copy_only 0
   
AND B.TYPE = 'L'
   
AND BF.physical_device_name NOT IN ('Nul''Nul:')
   AND 
@DBBackupLSN BETWEEN B.first_lsn AND B.last_lsn
   
ORDER BY backup_finish_datebackup_set_id;

   
-- Get last_lsn of the first log backup that will be restored
   
SET @DBBackupLSN NULL;
   
SELECT @DBBackupLSN MAX(last_lsn)
   
FROM @Baks
   
WHERE TYPE = 'L';


   
--select * from @Baks;

   -- Recursively get all log backups, in order, to be restored
   -- first_lsn of the log backup = last_lsn of the previous log backup
   
WITH Logs
   
AS (SELECT B.backup_set_id,
           
B.media_set_id,
           
B.first_family_number,
           
B.last_family_number,
           
B.first_lsn,
           
B.last_lsn,
           
B.database_backup_lsn,
           
B.backup_finish_date,
           
B.TYPE,
           
BF.family_sequence_number,
           
BF.physical_device_name,
           
BF.device_type,
           
AS LogLevel
       
FROM msdb.dbo.backupset B
       
INNER JOIN msdb.dbo.backupmediafamily AS BF
           
ON BF.media_set_id B.media_set_id
               
AND BF.family_sequence_number BETWEEN B.first_family_number AND B.last_family_number
       
WHERE B.database_name @DBName
       
AND B.is_copy_only 0
       
AND B.TYPE = 'L'
       
AND BF.physical_device_name NOT IN ('Nul''Nul:')
       AND 
B.first_lsn @DBBackupLSN
       
AND B.First_lsn <> B.Last_Lsn -- Anders 2016-02-26
       
UNION ALL
       
SELECT B.backup_set_id,
           
B.media_set_id,
           
B.first_family_number,
           
B.last_family_number,
           
B.first_lsn,
           
B.last_lsn,
           
B.database_backup_lsn,
           
B.backup_finish_date,
           
B.TYPE,
           
BF.family_sequence_number,
           
BF.physical_device_name,
           
BF.device_type,
           
L.LogLevel 1
       
FROM msdb.dbo.backupset B
       
INNER JOIN msdb.dbo.backupmediafamily AS BF
           
ON BF.media_set_id B.media_set_id
               
AND BF.family_sequence_number BETWEEN B.first_family_number AND B.last_family_number
       
INNER JOIN Logs L ON L.database_backup_lsn B.database_backup_lsn
       
WHERE B.database_name @DBName
       
AND B.is_copy_only 0
       
AND B.TYPE = 'L'
       
AND BF.physical_device_name NOT IN ('Nul''Nul:')
       AND 
B.first_lsn L.last_lsn
       
AND B.First_lsn <> B.Last_Lsn-- Anders 2016-02-26
   
INSERT INTO @Baks (backup_set_id,
       
media_set_id,
       
first_family_number,
       
last_family_number,
       
first_lsn,
       
last_lsn,
       
database_backup_lsn,
       
backup_finish_date,
       
TYPE,
       
family_sequence_number,
       
physical_device_name,
       
device_type)
   
SELECT backup_set_id,
       
media_set_id,
       
first_family_number,
       
last_family_number,
       
first_lsn,
       
last_lsn,
       
database_backup_lsn,
       
backup_finish_date,
       
TYPE,
       
family_sequence_number,
       
physical_device_name,
       
device_type
   
FROM Logs
   
OPTION(MaxRecursion 0);

   
-- Select out just the columns needed to script restore

   
SELECT RestoreOrder Row_Number() OVER(Partition BY family_sequence_number ORDER BY BakID),
       
RestoreType CASE WHEN TYPE IN ('D''I'THEN 'Database'
               
WHEN TYPE = 'L' THEN 'Log'
           
END,
       
DeviceType CASE WHEN device_type IN (2102THEN 'Disk'
               
WHEN device_type IN (5105THEN 'Tape'
           
END,
       
PhysicalFileName physical_device_name
   
FROM @Baks
   
ORDER BY BakID;


   
SET NOCOUNT OFF;
END -- proc