I have talked a lot about database migration scripts allowing you to promote database schema changes from one environment to another with reliability. Because this task is daunting, there have been a number of tools built around the process to simplify it for you. Depending on the tool, they may not migrate data as seamlessly as they do schema modifications. Today I want to demonstrate one way to roll your own data migrations so that they can be run more than once without failure, because they do not expect the database where the migration is being applied to be in any pre-existing condition.
I’m going to start out with a simple database structure to demonstrate this technique. I’m going to have two tables. A Bands table containing different music groups, and a BandMembers table containing members of a band. There is a parent (band) to child (BandMember) relationship of one to zero or many. Here is the schema.
CREATE TABLE Bands ( BandId INT NOT NULL IDENTITY(1,1) ,BandName VARCHAR(128) NOT NULL ,CONSTRAINT PK_Band PRIMARY KEY CLUSTERED (BandId) ) ON [Default] GO CREATE TABLE BandMembers ( BandMemberId INT NOT NULL IDENTITY(1,1) ,BandId INT NOT NULL ,FirstName NVARCHAR(32) NOT NULL ,LastName NVARCHAR(32) NOT NULL ,CONSTRAINT PK_BandMember PRIMARY KEY CLUSTERED (BandMemberId) ,CONSTRAINT FK_BandMember_Band FOREIGN KEY (BandId) REFERENCES Bands (BandId) ) ON [Default] GO
Now it’s time to demonstrate a technique for populating these tables using a database migration script that may be executed more than once without failure. Here’s some sample code for populating the Bands table.
SELECT BandName INTO #NewBands FROM Bands WHERE 0 = 1 SELECT FirstName ,LastName ,CONVERT(VARCHAR(128), '') AS BandName INTO #NewBandMembers FROM BandMembers WHERE 0 = 1 INSERT INTO #NewBands (BandName) VALUES ('PTX') ,('Home Free') ;WITH newBands AS ( SELECT n.BandName FROM #NewBands n LEFT JOIN Bands o ON n.BandName = o.BandName WHERE o.BandId IS NULL ) INSERT INTO Bands (BandName) SELECT BandName FROM newBands
First we’ll populate the bands table with two Acapella groups I listen to on occasion, PTX and Home Free. For this example, I start by creatinga temporary staging table with the same schema as the Bands table, excluding the identity column. Using Select … INTO WHERE 0 = 1 creates a new table with the same data types and sizes as the original table, so I know the data will fit. I then populate the temporary table with the new data I wish to insert into the bands table. Once the temporary table is populated, I then create a CTE (Common Table Expression) using a LEFT JOIN to the Bands table, thus identifying the records that have not already been inserted into the Bands table. Finally, I insert the new records into the Bands table from the results of the CTE.
Now that the Bands table is up to date we can update the BandMembers table. This process is slightly different from the Bands table, because we need to get the system assigned IDENTITY column from Bands to use in the BandMembers foreign key column, BandId. Here is what that code looks like.
INSERT INTO #NewBandMembers (FirstName, LastName, BandName) VALUES ('Mitch', 'Grassi', 'PTX') ,('Kevin', 'Olusola', 'PTX') ,('Scott', 'Hoying', 'PTX') ,('Kirsti', 'Maldonado', 'PTX') ,('Avi', 'Kaplan', 'PTX') ,('Tim', 'Foust', 'Home Free') ,('Austim', 'Brown', 'Home Free') ,('Dan', 'Lemke', 'Home Free') ,('Eliott', 'Robinson', 'Home Free') ,('Chris', 'Rupp', 'Home Free') ;WITH newMembers AS ( SELECT n.FirstName, n.LastName, b.BandId FROM #NewBandMembers n JOIN Bands b ON n.BandName = b.BandName LEFT JOIN BandMembers o ON b.BandId = o.BandId AND o.firstName = n.FirstName AND o.LastName = n.LastName WHERE o.BandId IS NULL ) INSERT INTO BandMembers (BandId, FirstName, LastName) SELECT n.BandId, n.FirstName, n.LastName FROM newMembers n
Since I don’t know what the BandId is when I am writing the script, I create a temporary table that has the band member data along with the name of the band.
At this stage I have a temporary table #newBandMebers containing a list I want to be in my BandMembers table. Again, I create a CTE with my temp table LEFT JOINed to my BandMembers table, identifying records that have not yet been inserted. This time, I also join the temp table to the Bands table using the natural key of the BandName, so I can capture the BandId in the CTE for use when I insert the records. Once the CTE is populated, I simply insert the new records into the BandMembers table from the CTE results.
If you were to run the migration portion of this code over and over, it would only insert records into Bands and BandMembers that do not already exist. If they do exist, the redundant data is simply ignored, and your data remains pure.
If you have a tool to build your database migrations for you this probably won’t be as much help. However, this example works regardless of tools, and may be adapted to work with any database engine, using the syntax unique to each. You may have to use views, multiple select statemenst with UNION, instead of using temp tables and CTEs. But the concepts and the order of exection remain the same.You can enhance this technique to update existing data, or remove deprecated data as well.
So, keep your migrations solid without fear of corrupting your data.
Cheers,
Ben