Tuesday, November 10, 2015

T-SQL Tuesday #72: Data modelling gone extremely wrong


T-SQL Tuesday #72 invite


Once upon a time....

Way back in 1997 or 1998, I was called late one Friday afternoon by my manager at the consulting company I worked for.  One of the large local newspapers was in serious trouble.  It was Friday afternoon, and they had just turned on their new whiz-bang VB and SQL based solution to manage their newspaper business: everything from selling and managing subscriptions, to determining how many papers to print every day, to which delivery vehicles the papers needed to go out on.  Basically your typical manage-the-newspaper kind of application, nothing fancy, no internet self-service (oh, those blissful days before e-commerce!).

I quickly got dressed up for a client meeting and got back in my car; luckily I had gone straight home from the other client that day instead of out to eat, or to more enjoyable activities.  I drove the hour to the newspaper's IT office and was met by a frazzled DBA and CIO, both of whom seemed in need of a long nap.

While getting to know each other, we walked to the project office, and along the way I got an idea of their problems and their installation.  The SQL Server was, as prescribed by the developer of the software, SQL 6.5 running on DEC Alpha servers.

As I stepped into the office, I saw what looked like a giant flow chart covering one wall.  "The database diagram?" I asked.  Yupp.  "Well, there is your problem."  Silence.

And more silence.

"What do you mean?"

I started poking around, and finally found the main subscriber table.  It looked something like this:

CREATE TABLE subscriber (
   SubscriberID INT IDENTITY(1,1),
   FirstNameID INT NOT NULL,
   MiddleNameID INT NOT NULL,
   LastNameID INT NOT NULL,
   StreetNumberID INT NOT NULL,
   StreetNameID INT NOT NULL,
   StreetNameTypeID INT NOT NULL,
   AppartmentID INT NULL,
   CityNameID INT NOT NULL,
   StateCodeID INT NOT NULL,
   ZipCodeID INT NOT NULL,
   PhoneAreaCodeID INT,
   PhoneExchangeID INT,
   PhoneNumberID INT,
   SubscriptionTypeID INT
)


Everything was FK'ed out.  Now, some of this makes sense.  Street names being FK'ed out?  Sure, although for a newspaper that needs to know what order to write pick lists in, I would have done it a bit differently.  But the street number being a FK?  Really?  So the value 2007 pointed to PK value 2007, which may or may not have been the street number 2007.  Was this in proper 5th normal form?  I honestly do not know; I never took the time to check all the boxes, but I suspect it was, if not some new 5.5th normal form.

Now some of you with SQL 6 and 6.5 experience might have caught on here: the design was too normalized for SQL 6.5 to deal with, and we ran out of joins.  What?  Yes, 6.5 had a limit of 16 tables per query.  I ended up spending all night re-designing the tables and procedures, and ETL'ing data into the new tables.
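To give an idea of the direction the redesign took (a from-memory sketch, not the actual DDL; the column types and sizes here are my guesses), the fix was simply to store the values themselves and keep FKs only where a lookup genuinely earns its keep:

CREATE TABLE subscriber (
   SubscriberID INT IDENTITY(1,1) PRIMARY KEY,
   FirstName VARCHAR(50) NOT NULL,
   MiddleName VARCHAR(50) NULL,
   LastName VARCHAR(50) NOT NULL,
   StreetNumber VARCHAR(10) NOT NULL,   -- the actual number, not a pointer to it
   StreetNameID INT NOT NULL,           -- a lookup that earns its keep (pick lists, routing)
   Appartment VARCHAR(10) NULL,
   CityName VARCHAR(50) NOT NULL,
   StateCode CHAR(2) NOT NULL,
   ZipCode VARCHAR(10) NOT NULL,
   Phone VARCHAR(14) NULL,
   SubscriptionTypeID INT NOT NULL      -- a genuine lookup table
)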

After many long hours of designing and coding, we placed the new version in production....  At least it ran now.  However, we very soon started to see corruption in the database, which we eventually traced back to a bug in the ODBC code for DEC Alpha.  YAY!  OK, now what?  The CIO, the DBA, one of the NT guys and I scrambled to build an Intel-based server overnight from scrap parts, BCP'ed the data over (backup/restore wouldn't work right between Intel and DEC Alpha based SQL), and had it up in production for the Sunday paper.

The next 36 hours were spent further de-normalizing the database to make it perform better.  After about 80 hours on site, I had the system stable enough for their DBA to keep it running until the developers could sit down and take a more long-term look at the application.

The database architect for this system turned out to be someone who had written several books on relational database design, but upon further investigation it was determined that he had never held a job outside of academia, and had no idea how to design a database model that would work in a real-world setting.

Thursday, October 29, 2015

First Timer Summit impressions.

Oh man, this partying is wearing me out.  No wait, I mean all these sessions and all this learning are wearing me out.  5 parties and 1 night of bar hopping, in 3 days.

For those of you in the SQL line of work: get involved with PASS and your local chapters, and get on Twitter (#sqlfamily, #sqlhelp).  Never having met a single person here before (except some of the presenters), I had instant contact with people.  Hanging out with people from all over the world, all with the same interests.  With just a simple Twitter message on Monday night, I went from being on my way to my room after the excellent Monday night event put on by Andy Warren and Steve Jones, to hanging out for another 3 hours with a group of people I had never met in person until then.

This community is, by far, the best community of people I have ever had the pleasure of being a member of.  For being a group of people traditionally known for their egos, there are none.  People chat and share stories, from the people known in the community to all of us, to the unknowns; from people with 20+ years of experience, to people just starting out.  And we are there for each other.  We laugh at each other's crazy experiences with SQL, crappy clients, and managers that do not understand what we do for a living, and groan together at the sometimes crazy work hours.

Yet we pretty much all absolutely love what we do!  We sit in a session about windowing functions, and during a break there is a group of 5 people discussing concrete use cases for what we just heard Klaus Aschenbrenner (@Aschenbrenner) talk about.  None of us knew each other before that break, yet we jumped right into chatting with each other as if we had been co-workers for years.  And so far, this is pretty much what has happened at every single break during both pre-cons, and between sessions on the first day of regular sessions.

Breakfast and lunch?  Sit down at a table, and I've had to scramble just to make it to the next session because of interesting discussions with strangers.  WTH?  I'm an introvert.  This stuff is not supposed to happen; I'm supposed to quietly sit and eat, go to my sessions, then head to the hotel room for the night.  Ha!  Even just remotely entertaining the possibility of chatting with people will keep you busy.

Walking to Idera's party tonight (after sitting on airplanes and in sessions for 4 days, I needed a walk), I ended up walking next to a gentleman from China, talking SQL and experiences for the mile-plus walk to find the place we were going to.  We were so deep in discussion we passed right by the place the first time and walked a few blocks past it!

Having grown up in Norway, for the first time in my 20 years of working with SQL, I talked SQL in Norwegian, with multiple Norwegian DBAs and a Swede, with an immediate switch over to English when non-Norwegian speakers showed up.  (Special thanks to the Norwegian team for letting me have some, uhm, Freia chocolate in the Community Zone.)

So, Anders, what about the Summit itself?  Well, I will get to that.  The reason for the above is frankly to show off this amazing community, built around PASS (and of course SQL...), and to a large extent around Twitter, for me especially #SQLFamily, but as a new blogger also around #SQLNewBlogger.  After 20 years of doing this DBA thing, I have to admit that in the last few years I had gotten somewhat jaded and tired of doing this for a living; the desire to get into the next version of SQL has not been there to the extent it was in the past.  Seeing the positive feedback from my own blog posts, and cheering on the other new people starting to blog, has been a transformative experience for me in itself.  Add on top of that coming out to Seattle for Summit 15 and getting to meet these people in person?  Pure awesomeness.  For the first time in years I'm actually excited to work with SQL again.  I'm excited for the possibilities SQL 2016 will open up, both for what I can do for my company with it, and for the new areas it can take my career.

While I am a first-timer at Summit, I have been to several other large conventions in the past.  NONE of the above happened at any of those events.  At lunch people would sit and just eat, maybe check their email, make some calls.  After hours there might have been some vendor events, but they were never about engaging with people, just the sales people.

The sessions themselves are top notch.  From Paul Randal's always engaging performance discussions in his pre-con, to the McCowns' pre-con on scripting, to the keynote, to the individual short sessions.

Special shout out to all the employees, volunteers, speakers and sponsors for making this possible.

Thursday, September 24, 2015

Backup Restore Chain (tested in SQL 2008R2)

This is based on the script created by SQLSoldier (Day 3 of 31 Days of Disaster Recovery: Determining Files to Restore Database), with slight modifications, as the original did not correctly display the restore order on SQL 2008 R2.

Comments welcome!  Good or bad.

If you test this on different versions, it would be great if you would comment which versions this works on.

Lines with my changes are noted with --Anders


/****************
Created by @SQLSoldier
http://www.sqlsoldier.com/wp/sqlserver/day3of31daysofdisasterrecoverydeterminingfilestorestoredatabase

Modified by Anders Pedersen @arrowdrive
Made to work with SQL 2008R2.

2015-09-24
*****************/
DECLARE @DBName SYSNAME,
    @DBBackupLSN numeric(25, 0);

DECLARE @Baks TABLE (
    BakID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    backup_set_id INT NOT NULL,
    media_set_id INT NOT NULL,
    first_family_number tinyint NOT NULL,
    last_family_number tinyint NOT NULL,
    first_lsn numeric(25, 0) NULL,
    last_lsn numeric(25, 0) NULL,
    database_backup_lsn numeric(25, 0) NULL,
    backup_finish_date DATETIME NULL,
    TYPE CHAR(1) NULL,
    family_sequence_number tinyint NOT NULL,
    physical_device_name NVARCHAR(260) NOT NULL,
    device_type tinyint NULL,
    checkpoint_LSN numeric(25, 0) -- Anders
)

SET NOCOUNT ON;

-- Set the name of the database you want to restore
SET @DBName = N'AndersTest';

-- Get the most recent full backup with all backup files
INSERT INTO @Baks (backup_set_id,
    media_set_id,
    first_family_number,
    last_family_number,
    first_lsn,
    last_lsn,
    database_backup_lsn,
    backup_finish_date,
    TYPE,
    family_sequence_number,
    physical_device_name,
    device_type,
    checkpoint_LSN) -- Anders
SELECT TOP (1) WITH TIES B.backup_set_id,
    B.media_set_id,
    B.first_family_number,
    B.last_family_number,
    B.first_lsn,
    B.last_lsn,
    B.database_backup_lsn,
    B.backup_finish_date,
    B.TYPE,
    BF.family_sequence_number,
    BF.physical_device_name,
    BF.device_type,
    B.checkpoint_lsn -- Anders
FROM msdb.dbo.backupset AS B
INNER JOIN msdb.dbo.backupmediafamily AS BF
    ON BF.media_set_id = B.media_set_id
        AND BF.family_sequence_number BETWEEN B.first_family_number AND B.last_family_number
WHERE B.database_name = @DBName
AND B.is_copy_only = 0
AND B.TYPE = 'D'
AND BF.physical_device_name NOT IN ('Nul', 'Nul:')
ORDER BY backup_finish_date DESC, backup_set_id;

-- Get the lsn that the differential backups, if any, will be based on
SELECT @DBBackupLSN = checkpoint_LSN -- Anders
FROM @Baks;

-- Get the most recent differential backup based on that full backup
INSERT INTO @Baks (backup_set_id,
    media_set_id,
    first_family_number,
    last_family_number,
    first_lsn,
    last_lsn,
    database_backup_lsn,
    backup_finish_date,
    TYPE,
    family_sequence_number,
    physical_device_name,
    device_type)
SELECT TOP (1) WITH TIES B.backup_set_id,
    B.media_set_id,
    B.first_family_number,
    B.last_family_number,
    B.first_lsn,
    B.last_lsn,
    B.database_backup_lsn,
    B.backup_finish_date,
    B.TYPE,
    BF.family_sequence_number,
    BF.physical_device_name,
    BF.device_type
FROM msdb.dbo.backupset AS B
INNER JOIN msdb.dbo.backupmediafamily AS BF
    ON BF.media_set_id = B.media_set_id
        AND BF.family_sequence_number BETWEEN B.first_family_number AND B.last_family_number
WHERE B.database_name = @DBName
AND B.is_copy_only = 0
AND B.TYPE = 'I'
AND BF.physical_device_name NOT IN ('Nul', 'Nul:')
AND B.database_backup_lsn = @DBBackupLSN
ORDER BY backup_finish_date DESC, backup_set_id;

--select * from @Baks

-- Get the last LSN included in the differential backup,
-- if one was found, or of the full backup
SELECT TOP 1 @DBBackupLSN = last_lsn
FROM @Baks
WHERE TYPE IN ('D', 'I')
ORDER BY BakID DESC;

-- Get first log backup, if any, for restore, where
-- last_lsn of previous backup is >= first_lsn of the
-- log backup and <= the last_lsn of the log backup
INSERT INTO @Baks (backup_set_id,
    media_set_id,
    first_family_number,
    last_family_number,
    first_lsn,
    last_lsn,
    database_backup_lsn,
    backup_finish_date,
    TYPE,
    family_sequence_number,
    physical_device_name,
    device_type)
SELECT TOP (1) WITH TIES B.backup_set_id,
    B.media_set_id,
    B.first_family_number,
    B.last_family_number,
    B.first_lsn,
    B.last_lsn,
    B.database_backup_lsn,
    B.backup_finish_date,
    B.TYPE,
    BF.family_sequence_number,
    BF.physical_device_name,
    BF.device_type
FROM msdb.dbo.backupset AS B
INNER JOIN msdb.dbo.backupmediafamily AS BF
    ON BF.media_set_id = B.media_set_id
        AND BF.family_sequence_number BETWEEN B.first_family_number AND B.last_family_number
WHERE B.database_name = @DBName
AND B.is_copy_only = 0
AND B.TYPE = 'L'
AND BF.physical_device_name NOT IN ('Nul', 'Nul:')
AND @DBBackupLSN BETWEEN B.first_lsn AND B.last_lsn
ORDER BY backup_finish_date, backup_set_id;

-- Get last_lsn of the first log backup that will be restored
SET @DBBackupLSN = NULL;
SELECT @DBBackupLSN = MAX(last_lsn)
FROM @Baks
WHERE TYPE = 'L';

-- Recursively get all log backups, in order, to be restored
-- first_lsn of the log backup = last_lsn of the previous log backup
WITH Logs
AS (SELECT B.backup_set_id,
        B.media_set_id,
        B.first_family_number,
        B.last_family_number,
        B.first_lsn,
        B.last_lsn,
        B.database_backup_lsn,
        B.backup_finish_date,
        B.TYPE,
        BF.family_sequence_number,
        BF.physical_device_name,
        BF.device_type,
        1 AS LogLevel
    FROM msdb.dbo.backupset B
    INNER JOIN msdb.dbo.backupmediafamily AS BF
        ON BF.media_set_id = B.media_set_id
            AND BF.family_sequence_number BETWEEN B.first_family_number AND B.last_family_number
    WHERE B.database_name = @DBName
    AND B.is_copy_only = 0
    AND B.TYPE = 'L'
    AND BF.physical_device_name NOT IN ('Nul', 'Nul:')
    AND B.first_lsn = @DBBackupLSN
    UNION ALL
    SELECT B.backup_set_id,
        B.media_set_id,
        B.first_family_number,
        B.last_family_number,
        B.first_lsn,
        B.last_lsn,
        B.database_backup_lsn,
        B.backup_finish_date,
        B.TYPE,
        BF.family_sequence_number,
        BF.physical_device_name,
        BF.device_type,
        L.LogLevel + 1
    FROM msdb.dbo.backupset B
    INNER JOIN msdb.dbo.backupmediafamily AS BF
        ON BF.media_set_id = B.media_set_id
            AND BF.family_sequence_number BETWEEN B.first_family_number AND B.last_family_number
    INNER JOIN Logs L ON L.database_backup_lsn = B.database_backup_lsn
    WHERE B.database_name = @DBName
    AND B.is_copy_only = 0
    AND B.TYPE = 'L'
    AND BF.physical_device_name NOT IN ('Nul', 'Nul:')
    AND B.first_lsn = L.last_lsn)
INSERT INTO @Baks (backup_set_id,
    media_set_id,
    first_family_number,
    last_family_number,
    first_lsn,
    last_lsn,
    database_backup_lsn,
    backup_finish_date,
    TYPE,
    family_sequence_number,
    physical_device_name,
    device_type)
SELECT backup_set_id,
    media_set_id,
    first_family_number,
    last_family_number,
    first_lsn,
    last_lsn,
    database_backup_lsn,
    backup_finish_date,
    TYPE,
    family_sequence_number,
    physical_device_name,
    device_type
FROM Logs
OPTION(MaxRecursion 0);

-- Select out just the columns needed to script restore
SELECT RestoreOrder = Row_Number() OVER(PARTITION BY family_sequence_number ORDER BY BakID),
    RestoreType = CASE WHEN TYPE IN ('D', 'I') THEN 'Database'
            WHEN TYPE = 'L' THEN 'Log'
        END,
    DeviceType = CASE WHEN device_type IN (2, 102) THEN 'Disk'
            WHEN device_type IN (5, 105) THEN 'Tape'
        END,
    PhysicalFileName = physical_device_name
FROM @Baks
ORDER BY BakID;

SET NOCOUNT OFF;



The result will look something like this: one row per backup file, with RestoreOrder, RestoreType (Database or Log), DeviceType, and PhysicalFileName, in the order the files need to be restored.
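From there, scripting the actual restore by hand is straightforward.  A sketch with made-up file names (the real ones come from the PhysicalFileName column):

-- Hypothetical restore sequence built from the output above:
RESTORE DATABASE AndersTest FROM DISK = N'X:\Backups\AndersTest_full.bak' WITH NORECOVERY;
RESTORE DATABASE AndersTest FROM DISK = N'X:\Backups\AndersTest_diff.bak' WITH NORECOVERY;  -- RestoreType = Database
RESTORE LOG AndersTest FROM DISK = N'X:\Backups\AndersTest_log1.trn' WITH NORECOVERY;       -- RestoreType = Log
RESTORE LOG AndersTest FROM DISK = N'X:\Backups\AndersTest_log2.trn' WITH RECOVERY;         -- last one brings the database online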


Friday, July 24, 2015

Querying IO Statistics, the quick and dirty way

(Apologies for formatting, still trying to figure this thing out.....)

Reducing IO is one of the "easiest" ways to speed up SQL queries.  There are of course many other ways, but in many situations, reducing IO gives the most bang for the time spent optimizing.

Recently I came across a query that really needed help.  I'm not going to go into WHY it needed help, as that is a full-on article by itself.  One of the tools I use for optimizing is SET STATISTICS IO ON.  This gives a nice line of reads per table used in a query (there are plenty of articles online about how to use this information).  This particular query had some, uhm, scary numbers.  And many of the tables were re-used multiple times.  With 2 (yes 2, not 2 thousand, or 2 million) records in the base table that needed processing, I saw values of over 1 million table scans.  At one point in the query plan, with 1002 records in the base table, there was a node with 11 billion records output....

I started looking for a way to compile these numbers so I could compare them and get a better overall picture of what I was looking at.  Below is a typical line from IO stats.  This is not the complete line, just the first few pieces of information in it:

Table 'XYZ'. Scan count 5, logical reads 18575......

I decided what I needed was a tool where I could relatively easily load these lines into a table, and then query and see the differences between the different modifications I make to the query.


First we need a table to store and modify the IO lines:

CREATE TABLE [dbo].[StatsImport](
    [StatsLine] [varchar](MAX) NULL,
    [StatsXML] [xml] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]



To get the stats we need, add these two lines before the query you run:

SET STATISTICS io ON
SET NOCOUNT ON

The NOCOUNT makes them all come out as one nice big list if there are multiple queries, which makes the next step easier.
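As a minimal illustration (the query and table name here are made up), a profiling run looks like this; each table the query touches emits one stats line on the Messages tab:

SET STATISTICS IO ON
SET NOCOUNT ON

-- The query being tuned; every table it touches produces a
-- "Table '...'. Scan count x, logical reads y, ..." line.
SELECT COUNT(*)
FROM dbo.SomeLargeTable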

Now take the text from the Messages tab and load it into the StatsImport table:


SET QUOTED_IDENTIFIER OFF
DELETE FROM StatsImport
INSERT INTO StatsImport (StatsLine)
VALUES ("Table 'ABC'. Scan count 5, logical reads 18575, physical reads 68, read-ahead reads 16895, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0."),
("Table 'DEF'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0."),
("Table 'GHI'. Scan count 5, logical reads 8929, physical reads 63, read-ahead reads 8822, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.")
 

Now the tricky part is getting this into a format that can easily be put into a table; I decided to go via XML (GASP!!!!!).

Part of the solution came from an excellent article, mostly the part about how to get the lines to be valid XML.  But since I wanted to make all the lines query-able, I had to make some slight modifications.


SET QUOTED_IDENTIFIER OFF

UPDATE StatsImport
SET StatsLine = REPLACE(StatsLine, "'", "'")
UPDATE StatsImport
SET StatsLine = REPLACE(StatsLine, ' reads', '_reads')
UPDATE StatsImport
SET StatsLine = REPLACE(StatsLine, 'Table ''', '<Detail Table="')
UPDATE StatsImport
SET StatsLine = REPLACE(StatsLine, 'Reads ', 'reads="')
UPDATE StatsImport
SET StatsLine = REPLACE(StatsLine, '. Scan count', '" Scancount="')
UPDATE StatsImport
SET StatsLine = REPLACE(StatsLine, '.', '" />')
UPDATE StatsImport
SET StatsLine = REPLACE(StatsLine, ',', '"')
UPDATE StatsImport
SET StatsLine = REPLACE(StatsLine, 'lob ', 'lob_')
UPDATE StatsImport
SET StatsLine = REPLACE(StatsLine, '-', '_')
UPDATE StatsImport
SET StatsLine = '<Output> ' + StatsLine + ' </Output>'
UPDATE StatsImport
SET StatsXML = StatsLine
  


At this point we have all the data in a valid XML format in the StatsXML column.
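For the 'ABC' sample line above, walking through the REPLACE chain gives a value that should look roughly like this:

<Output> <Detail Table="ABC" Scancount="5" logical_reads="18575" physical_reads="68" read_ahead_reads="16895" lob_logical_reads="0" lob_physical_reads="0" lob_read_ahead_reads="0" /> </Output>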

Now, if we wanted to just read it and display it, all we would have to do is query that column.  I, however, wanted to make it so I could do all kinds of crazy things with it.

Table to store the shredded XML:


CREATE TABLE [dbo].[IOStats](
    [TableName] [varchar](128) NOT NULL,
    [ScanCount] [int] NOT NULL,
    [LogicalReads] [int] NOT NULL,
    [PhysicalReads] [int] NOT NULL,
    [ReadAheadReads] [int] NOT NULL,
    [RunIdentifier] [varchar](50) NULL
) ON [PRIMARY]

Now to parse it into this table:

SET QUOTED_IDENTIFIER ON
INSERT INTO IOStats
SELECT
    c.value('@Table[1]', 'varchar(128)') AS TableName,
    c.value('@Scancount[1]', 'integer') AS ScanCount,
    c.value('@logical_reads[1]', 'integer') AS logical_reads,
    c.value('@physical_reads[1]', 'integer') AS physical_reads,
    c.value('@read_ahead_reads[1]', 'integer') AS read_ahead_reads,
    --c.value('@lob_logical_reads[1]', 'integer') AS lob_logical_reads,
    'Original'
FROM StatsImport
    CROSS APPLY StatsXML.nodes('Output/Detail') AS T(C)


The line that reads 'Original' is just a text identifier for this load of stats.  As you can see, I did not include all the data from the stats, as I did not need it for this particular problem; I think it would be easy for most of you to modify this if you need the lob etc. values.  One caveat: you might need to go to a larger data type than integer if you have particularly crazy tables/queries.

With a little careful naming of your RunIdentifier values, it becomes easy to get output in the order you want.
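For example, after capturing stats for a changed version of the query into StatsImport and re-running the transforms, the same shredding query can be re-run with a different literal; only the tag changes:

INSERT INTO IOStats
SELECT
    c.value('@Table[1]', 'varchar(128)'),
    c.value('@Scancount[1]', 'integer'),
    c.value('@logical_reads[1]', 'integer'),
    c.value('@physical_reads[1]', 'integer'),
    c.value('@read_ahead_reads[1]', 'integer'),
    'New'  -- RunIdentifier for this set of measurements
FROM StatsImport
    CROSS APPLY StatsXML.nodes('Output/Detail') AS T(C)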

So how did I use this?  Two ways.  The first was to see changes between different versions of the query.  With a baseline loaded as 'Original' and my first version loaded as 'New':


WITH TableNames AS (SELECT DISTINCT TableName FROM IOStats WHERE TableName NOT LIKE '#%'),
IOStats2 AS (
    SELECT
        i.RunIdentifier,
        i.TableName,
        SUM(i.ScanCount) AS ScanCount,
        SUM(i.LogicalReads) AS LogicalReads,
        SUM(i.PhysicalReads) AS PhysicalReads,
        SUM(i.ReadAheadReads) AS ReadAheadReads
    FROM IOStats i
    WHERE TableName NOT LIKE '#%'
    GROUP BY i.RunIdentifier, i.TableName
)
SELECT TN.TableName,
    ISNULL(New.ScanCount, 0) - ISNULL(Orig.ScanCount, 0) AS ScanCountChange,
    ISNULL(New.LogicalReads, 0) - ISNULL(Orig.LogicalReads, 0) AS LogicalReadsChange,
    ISNULL(New.PhysicalReads, 0) - ISNULL(Orig.PhysicalReads, 0) AS PhysicalReadsChange,
    ISNULL(New.ReadAheadReads, 0) - ISNULL(Orig.ReadAheadReads, 0) AS ReadAheadReadsChange
    --INTO #Tables
FROM TableNames TN
LEFT OUTER JOIN IOStats2 Orig
    ON TN.TableName = Orig.TableName
        AND Orig.RunIdentifier = 'Original'
LEFT OUTER JOIN IOStats2 New
    ON TN.TableName = New.TableName
        AND New.RunIdentifier = 'New'
ORDER BY TN.TableName ASC

The output is one row per table, showing the change in each counter between 'Original' and 'New' (table names have been changed to protect the guilty).  Basically this shows the reduction in reads on the different tables.



Another way I used it was for a more overall view of changes:

SELECT RunIdentifier, SUM(ScanCount) SumScanCount, SUM(LogicalReads) SumLogicalReads
FROM IOStats
GROUP BY RunIdentifier


The 1k, 6k and 20k runs refer to how many values I started with that would eventually need an update, although the particular query I was optimizing just inserts them into a temporary table that then has further manipulation done to it.  I didn't even attempt 20k records with the original query, since I know what happens when we did (which is what caused me to start this project): we had to kill the job after 3 hours.  There are some abnormalities in this result that I need to go back and re-test, but at least I can easily see that the new query scales much, much better as the number of records goes up.

So what is left to do?  I think writing a PowerShell script to save the IO statistics to a file, load them into the table, and run the parsing, with the file name becoming the name of the run.  That way re-loading into the tables would be much more straightforward, and it would make it easy for my developers to send me their stats so I can load them.