Editorials

Rolling Window Data Storage

A common database problem is the management of large amounts of data occurring and deprecated over time. For example, we often have data we maintain for a period of days, weeks, months, years, etc. When the data becomes obsolete we want to remove it quickly. When we start a new time segment for data, we want to generate that quickly, and segregate it from all other periods. I have heard this concept called a rolling window; a continuously moving view into time with a fixed period length.

A good example of a rolling window would be daily activity. Having enough daily activity, say multi-millions of detail records, you may wish to separate the data into separate daily tables. If you wanted to maintain a 30 day rolling window of daily activity, you would use 30 tables, with one table for each day. So you might have a table DailyAcitivity1, DailyActivity2, … DailyActivity30. As each day begins, you simple purge the data from the appropriate table, and start over. There are a couple problems with this scenario. Not every month has 30 days. When you wish to report against all of the DailyActivity you have to search all 30 tables for the data you wish to locate.

In the old days you could resolve this by using a view which unions all of the 30 tables. With a before trigger on the view, you could have and data modifications executed against the appropriate table. This is a lot of work to configure and manage. You have to roll this view and triggers for every partitioned set you create. The nice thing about this design is that you can do it in many of the different SQL engines.

With SQL Server they came up with a more elegant solution. They came up with a partitioned table. In this case, you create a partition function the table uses to determine in what table your data resides. When you need a new partition, you create a new table independently. Then, you add the new table to the partitioned table, and all the union and data management you used to write is already in place. When you find a partition is no longer relevant, you remove it from the partition, at which point you can even drop the table. No time waiting for the delete to remove all the records, and include it all in the transaction log, blowing it way out of proportion.

If you want to see this concept on steroids, take a look at federated views in SQL Azure. It allows you to partition across databases. This allows you to extend the size of data you wish to store, and still recognize it as a single table definition. I am over simplifying how it works. It’s not as easy as it sounds if you are running a query that may be against multiple databases…but you can look into that yourself it interests you.

Please note that I am not really looking at sharded solutions where data may be stored on different servers (other than SQL Azure), as that found in the Data Center SQL Appliance, or that being implemented with clusters of MySql databases, or the like.

A rolling window is a real world scenario most DBAs will experience. Some people see it more often than not. It is more common in warehouses with data marts.

So, if you are in need of storing rolling windows, look into Partitioned Tables in SQL Server, or federated views if your SQL Engine of choice does not support Partitioned Tables.

Cheers,

Ben