Editorials

Migrations for the Masses

I have talked a lot about database migration scripts allowing you to promote database schema changes from one environment to another with reliability. Because this task is daunting, there have been a number of tools built around the process to simplify it for you. Depending on the tool, they may not migrate data as seamlessly as they do schema modifications. Today I want to demonstrate one way to roll your own data migrations so that they can be run more than once without failure, because they do not expect the database where the migration is being applied to be in any pre-existing condition.

I’m going to start out with a simple database structure to demonstrate this technique. I’m going to have two tables. A Bands table containing different music groups, and a BandMembers table containing members of a band. There is a parent (band) to child (BandMember) relationship of one to zero or many. Here is the schema.

CREATE TABLE Bands (
    BandId      INT             NOT NULL    IDENTITY(1,1)
   ,BandName    VARCHAR(128)    NOT NULL
   ,CONSTRAINT  PK_Band 
    PRIMARY KEY CLUSTERED (BandId)
)
ON [Default]
GO

CREATE TABLE BandMembers (
    BandMemberId    INT             NOT NULL    IDENTITY(1,1)
   ,BandId          INT             NOT NULL
   ,FirstName       NVARCHAR(32)    NOT NULL
   ,LastName        NVARCHAR(32)    NOT NULL
   ,CONSTRAINT      PK_BandMember 
    PRIMARY KEY CLUSTERED (BandMemberId)
   ,CONSTRAINT      FK_BandMember_Band 
    FOREIGN KEY (BandId) 
    REFERENCES Bands (BandId)
)
ON [Default]
GO

Now it’s time to demonstrate a technique for populating these tables using a database migration script that may be executed more than once without failure. Here’s some sample code for populating the Bands table.

SELECT  BandName
INTO    #NewBands
FROM    Bands
WHERE   0 = 1

SELECT  FirstName
       ,LastName
       ,CONVERT(VARCHAR(128), '') AS BandName
INTO    #NewBandMembers
FROM    BandMembers
WHERE   0 = 1

INSERT INTO #NewBands (BandName) VALUES
 ('PTX')
,('Home Free')

;WITH newBands
AS
(
    SELECT      n.BandName
    FROM        #NewBands   n
    LEFT JOIN   Bands       o   ON  n.BandName = o.BandName
    WHERE       o.BandId  IS NULL
)
INSERT INTO Bands 
        (BandName)
SELECT  BandName
FROM    newBands

First we’ll populate the bands table with two Acapella groups I listen to on occasion, PTX and Home Free. For this example, I start by creatinga temporary staging table with the same schema as the Bands table, excluding the identity column. Using Select … INTO WHERE 0 = 1 creates a new table with the same data types and sizes as the original table, so I know the data will fit. I then populate the temporary table with the new data I wish to insert into the bands table. Once the temporary table is populated, I then create a CTE (Common Table Expression) using a LEFT JOIN to the Bands table, thus identifying the records that have not already been inserted into the Bands table. Finally, I insert the new records into the Bands table from the results of the CTE.

Now that the Bands table is up to date we can update the BandMembers table. This process is slightly different from the Bands table, because we need to get the system assigned IDENTITY column from Bands to use in the BandMembers foreign key column, BandId. Here is what that code looks like.

INSERT INTO #NewBandMembers 
    (FirstName, LastName, BandName)
    VALUES
 ('Mitch', 'Grassi', 'PTX')
,('Kevin', 'Olusola', 'PTX')
,('Scott', 'Hoying', 'PTX')
,('Kirsti', 'Maldonado', 'PTX')
,('Avi', 'Kaplan', 'PTX')
,('Tim', 'Foust', 'Home Free')
,('Austim', 'Brown', 'Home Free')
,('Dan', 'Lemke', 'Home Free')
,('Eliott', 'Robinson', 'Home Free')
,('Chris', 'Rupp', 'Home Free')

;WITH newMembers
AS
(
    SELECT  n.FirstName, n.LastName, b.BandId
    FROM    #NewBandMembers n
    JOIN    Bands           b   
            ON  n.BandName = b.BandName
    LEFT JOIN   BandMembers o   
                ON  b.BandId = o.BandId
                AND o.firstName = n.FirstName
                AND o.LastName = n.LastName
    WHERE   o.BandId IS NULL
)
INSERT INTO BandMembers
    (BandId, FirstName, LastName)
SELECT  n.BandId, n.FirstName, n.LastName
FROM    newMembers  n

Since I don’t know what the BandId is when I am writing the script, I create a temporary table that has the band member data along with the name of the band.

At this stage I have a temporary table #newBandMebers containing a list I want to be in my BandMembers table. Again, I create a CTE with my temp table LEFT JOINed to my BandMembers table, identifying records that have not yet been inserted. This time, I also join the temp table to the Bands table using the natural key of the BandName, so I can capture the BandId in the CTE for use when I insert the records. Once the CTE is populated, I simply insert the new records into the BandMembers table from the CTE results.

If you were to run the migration portion of this code over and over, it would only insert records into Bands and BandMembers that do not already exist. If they do exist, the redundant data is simply ignored, and your data remains pure.

If you have a tool to build your database migrations for you this probably won’t be as much help. However, this example works regardless of tools, and may be adapted to work with any database engine, using the syntax unique to each. You may have to use views, multiple select statemenst with UNION, instead of using temp tables and CTEs. But the concepts and the order of exection remain the same.You can enhance this technique to update existing data, or remove deprecated data as well.

So, keep your migrations solid without fear of corrupting your data.

Cheers,

Ben