Editorials

Signs Your Database Needs Normalization – Index Sprawl

To continue with the topic from yesterday on symptoms of a badly normalized database let’s talk about how index sprawl can be an example of a poorly designed schema.

When your database has repeating data types in the same row it leads to the need for additional indexes. Let me give you an extreme example I would never use, but will help demonstrate the problem. If you had a table for names, and it had only one column for any name, an index created on that single column would serve to search for any name. So, using my name, Benjamin Scot Taylor, you would have three records in the name table. Benjamin, Scot, and Taylor. You could then search for me by any one of my names and the single index serves for all searches.

Of course the name example is not practical. It adds complexity for things like how to associate each of my names to the single entity of Me. The value of the design is not worth the cost of implementation or the complexity of use, In fact, the only way I could ever see using something like that name table would be for a completely de-normalized table for searching for terms or something like that.

But we do see repeating columns in our every data database designs. How about those tables we have with multiple columns for different phone numbers. These can easily be normalized into a phone number table. I’m not going to the extreme normalization where a phone number only exists once in the database, and it is assigned to entities by association. I’m talking about having a phone number table as a one to many relationship for something like a person or business. A person may have a cell phone and work phone. A business may have any number of phones. But, if phone numbers are found in a single table for a business, you can search for that business by any phone number in its phone table.

In health care databases I continue to see Dependent 1, Dependent2, etc. This is a relationship, not a repeating data type in the same record. Often dependents are found by searching on their birthday. If your table allows for six dependents, then you have 7 birthday columns, the employee and each potential dependent. Now you have 7 indexes to search by birthday.

Is it wrong to ever have repeating columns? I would say no from a practical sense. However, I would lean toward normalization. You’ll find out if you made the correct decision when you find you have to search on these repeating columns.

One exception to the repeating columns would be data marts. However, data marts are special purpose tables with slowly moving facts. The data is appended to on a regular basis, often daily, but it is not maintained real time in most cases. Data marts are often used for decision support systems, and do not have to have up to the moment information to be useful. So, having index sprawl is a design imperative, rather than a performance inhibitor.

So, if your database is sluggish, or you are finding your indexes are sprawling out of control, a good place to start is to review your normalization.

Cheers,

Ben