Tuesday, November 10, 2015

T-SQL Tuesday #72: Data modelling gone extremely wrong


T-SQL Tuesday #72 invite

T-SQL Tuesday #72 invite


Once upon a time....

Way back in 1997 or 1998, I was called late one Friday afternoon by my manager at the consulting company I worked.  One of the large local newspapers where in serious trouble.  Friday afternoon and they had just turned on their new whiz-bang VB and SQL based solution to manage their newspaper business, everything from selling and managing subscriptions, to determining how many papers to print every day, and to which delivery vehicles they needed to be delivered on.  Basically your typical manage the newspaper kind of applications, nothing fancy, no internet self service (oh those blissful days before E-Commerce!).

I quickly got dressed up for a client meeting and back in my car, luckily I had gone straight home from the other client that day instead of out to eat, or more enjoyable activities.  Drove the hour to the news papers IT office, was met by a frazzled DBA and CIO, both seemed in need of a long nap.

While getting to know each other we walked to the Project office, while getting an idea about their problems and installation.  The SQL Server was, as prescribed by the developer of the software: SQL 6.5 running on DEC Alpha servers.

As I stepped into the office, I saw what looked like a giant flow chart covering one wall.  "The database diagram?" I asked.  Yupp.  "Well, there is your problem."  Silence.

And more silence.

"What do you mean?"

I started poking around, and finally found the main subscriber table, it looked something like this:

CREATE TABLE subscriber (
   
SubscriberID INT IDENTITY(1,1),
   
FirstNameID INT NOT NULL,
   
MiddleNameID INT NOT NULL,
   
LastNameID INT NOT NULL,
   
StreetNumberID INT NOT NULL,
   
StreetNameID INT NOT NULL,
   
StreetNameTypeID INT NOT NULL,
   
AppartmentID INT NULL,
   
CityNameID INT NOT NULL,
   
StateCodeID INT NOT NULL,
   
ZipCodeID INT NOT NULL,
   
PhoneAreaCodeID INT,
   
PhoneExchangeID INT,
   
PhoneNumberID INT,
   
SubscriptionTypeID INT
   
)


Everything was a FK out.  Now some of this makes sense, such as Street names being FK'ed out?  Sure, although being a newspaper with need to know which order to write pick lists, I would have done it a bit different.  Street number being a FK?  Really?  So it pointed from 2007 to PK value 2007, which may or may not be the street number 2007.  Was this in proper 5th normal form?  I really do not know, never took the time to check all the boxes, but I suspect that it was, if not some new 5.5th normal form.

Now some of you that have 6 and 6.5 experience might have caught on here:  Too normalized for SQL 6.5 to deal with, we ran out of joins.  What?  Yes, 6.5 had a limit of 16 joins per query.  I ended up spending all night re-designing the tables and procedures.  ETL'ing data into the new tables.

After many long hours designing and coding, we placed the new version in production....  At least it ran now.  However we very soon was starting to see corruption in the database, which we eventually traced back to a bug in the ODBC code for DEC Alpha.  YAY!  Ok now what?  The CIO, DBA, one of the NT guys and I scrambled to build an Intel based server overnight from scrap parts, BCP'ed the data over (backup/restores wouldn't work right between Intel and DEC Alpha based SQL), and had it up in production for the Sunday paper.

The next 36 hours was spent further de-normalizing the database to perform better.  After about 80 hours on site I had the system stable enough for their DBA to keep it running until the developers could sit down and take a more long term look at the application.

The database architect for this system turned out to be someone who had written several books on relational database design, but upon further investigation it was determined he had never had a job outside of academia, and had no idea how to design a database model that would work in a real setting.

2 comments:

  1. I've had many times that database architects designed something that'd would never work. Good article and funny to read.

    ReplyDelete
  2. It's the rediculious overuse of surrogates that really are the major design flaw here. Separating out names is really silly, even if you require conformance with USA-style names as separate atomic fields.

    Normalization usually *improves* performance, when it's done correctly, and not just for the sake of throwing SKs on everything.

    Example: An address table is typically sufficient, and if necessary one can create a set of proper normalized tables for some of the components. But you use real keys, not SKs. A great example of a real model used in production systems: http://www.softwaregems.com.au/Documents/Documentary%20Examples/Order%20DM%20Advanced.pdf

    The mythology of denormalization for performance is demolished in this SO answer as well:
    http://stackoverflow.com/questions/4394183/should-olap-databases-be-denormalized-for-read-performance/4731664#4731664

    ReplyDelete