Editorials

Signs Your Database Needs Normalization – SQL Syntax

Today we complete the series of symptoms one experiences when their database is not properly normalized. We’re going to look at the very syntax of queries needed that are optimized when data is normalized accordingly. Again, let me emphasize that these indications are to be used has hints that a different database schema may be more appropriate, not that the design MUST be wrong. There are always times when you do not follow best practices…but those need to be reasoned and necessary.

The biggest syntactical indication of sub-optimal normalization is the use of the OR operation in a query, when the OR is used for different columns of the same table. Again we are pointing to the same underlying issue of having repeating columns in a table. The OR operation, just like sprawling indexes, indicates that a different design may be a better option.

Let’s use the medical claims database with multiple dependent fields in the same table as we had yesterday. In the example we used, an employee and up to six dependents were included in the same table, with repeating columns for the dependents. Dependent1-6 were used, with different columns for each attribute associated with a dependent. Using the search query to find a dependent, a system often searches based on the individual’s birthdate and last name. A query could look as follows…

DECLARE @SearchDate DATETIME = '10/16/2003'
DECLARE @SearchName = 'Smith'

SELECT ... 
FROM   Participants
WHERE
 	(ParticipantBirthdate = @SearchDate AND ParticipantLastName = @SearchName)
OR	(Dependant1Birthdate = @SearchDate AND Dependant1LastName = @SearchName)
OR	(Dependant2Birthdate = @SearchDate AND Dependant2LastName = @SearchName)
OR	(Dependant3Birthdate = @SearchDate AND Dependant3LastName = @SearchName)
OR	(Dependant4Birthdate = @SearchDate AND Dependant4LastName = @SearchName)
OR	(Dependant5Birthdate = @SearchDate AND Dependant5LastName = @SearchName)
OR	(Dependant6Birthdate = @SearchDate AND Dependant6LastName = @SearchName)

In this specific case, a better schema would be to normalize the eligibility of the individuals, including the participant, into a separate table called eligibility. Each eligibility record would be related to the Participant table as a foreign key, thus allowing for the much more optimized query that follows.

DECLARE @SearchDate DATETIME = '10/16/2003'
DECLARE @SearchName = 'Smith'

SELECT ...
FROM Eligibililty
WHERE Birthdate = @SearchDate AND LastName = @SearchName

So, when you start to see repeating statements in your query, not because you are looking for multiple possible values, but because the value you are seeking may be in different database fields, it is a good time to ask yourself if your database could benefit from some normalization.

Cheers,

Ben