Editorials

Optimal File Group Utilization

David Eaton brings us back to the fundamentals of file utilization for an SQL Server database. He writes:

The fundamentals of SQL Server are still the same: spread the data out on as many disks as you can. Make sure the database design is correct and scalable. Set up filegroups to move indexes intelligently, meaning use two or more and spread them out such that the highest-traffic indexes are spread across the files equally. Do the same with the data tables. And make sure all the DDL is optimal and you will have a very good performing SQL Server.

I am in complete agreement with this position. One of the things I find difficult is deciding how many file groups to establish and what to put on each. If I am working with large tables, I find it improves performance to put all of the non-clustered indexes on a separate file group from the data for the same table, which is contained in the clustered index. This helps improve performance when a non-clustered index is used as part of the query plan.
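As a minimal sketch of that layout, assuming a database that already has a separate file group named INDEXES (the table, index, and file group names here are only placeholders):

-- Table data (the clustered index) lives on the PRIMARY file group
CREATE TABLE dbo.Orders
(
    OrderID    INT IDENTITY(1,1) NOT NULL,
    CustomerID INT NOT NULL,
    OrderDate  DATETIME2 NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
) ON [PRIMARY];

-- The non-clustered index is placed on the separate INDEXES file group
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate)
    ON [INDEXES];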

A couple of things come into play when you start adding file groups. One is that your database size can increase if you pre-allocate the size of the file group so that new data can be added without the file having to grow automatically. This improves database performance because you don’t have to wait for disk allocation to complete, and it helps keep your database files from being fragmented on the host disk, not to mention reducing fragmentation of the data in the database objects themselves.
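A sketch of what that pre-allocation might look like, with the database name, file group name, path, and sizes all made up for illustration:

ALTER DATABASE SalesDB ADD FILEGROUP DATA2;

ALTER DATABASE SalesDB
ADD FILE
(
    NAME = N'SalesDB_Data2',
    FILENAME = N'E:\SQLData\SalesDB_Data2.ndf',
    SIZE = 20GB,        -- pre-allocate so inserts do not wait on autogrow
    FILEGROWTH = 1GB    -- fixed growth increment, kept only as a safety net
)
TO FILEGROUP DATA2;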

So how much extra disk space does your file group need? A key factor in answering this question is how you defragment your database data. To defragment data you have to rebuild the index on which the data is built. When you rebuild an index you can direct the sort work to TempDB; otherwise, the sorting is done in the same file group as the data itself. Either way, the rebuilt index is written to its own file group, so if the file group doesn’t have at least as much empty space as the largest object being rebuilt, the work can’t be done. This is the crux of the issue. When you have large tables with billions or trillions of rows, you must have enough empty disk space in the file group where that table is hosted in order to rebuild the index.
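For example, a rebuild that pushes its sort work to TempDB might look like the following sketch (the table and index names are placeholders); the new copy of the index is still built in the index’s own file group, which is where the empty space is needed:

-- Rebuild the clustered index; sort runs go to TempDB,
-- but the rebuilt pages are written into the table's own file group
ALTER INDEX PK_Orders ON dbo.Orders
    REBUILD WITH (SORT_IN_TEMPDB = ON);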

To be fair, using a modern version of SQL Server, you are not likely to keep billions of rows in a single unpartitioned table. You would probably have a partitioned table, with its partitions spread across individual file groups to keep things working fast. That’s a topic for tomorrow. The point is that if you have more than one table on a single file group, that same unallocated space in the file group may be used to rebuild any index contained on that file group. The more file groups you have, the more total unallocated space is needed for defragmentation.
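One way to see how much headroom each file group actually has is a query along these lines, a sketch using the standard catalog views (the arithmetic converts 8 KB pages to megabytes):

-- Unallocated space per data file, grouped by file group
SELECT
    fg.name AS file_group,
    f.name  AS logical_file,
    f.size / 128 AS size_mb,                                   -- size is stored in 8 KB pages
    FILEPROPERTY(f.name, 'SpaceUsed') / 128 AS used_mb,
    (f.size - FILEPROPERTY(f.name, 'SpaceUsed')) / 128 AS free_mb
FROM sys.database_files AS f
JOIN sys.filegroups AS fg
    ON f.data_space_id = fg.data_space_id;                     -- join excludes the log file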

I pretty much start out with three files for any database: a data file, an indexes file, and a transaction log file. The names describe how I use them. As the size and performance requirements for one or more tables increase, I may add additional file groups. If I use a partitioned table, I may create multiple file groups for the partitions for disaster recovery reasons.
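That starting layout might look something like this sketch, with the database name, drive paths, and sizes all stand-ins to adjust for your own environment:

CREATE DATABASE SalesDB
ON PRIMARY
(
    NAME = N'SalesDB_Data',
    FILENAME = N'D:\SQLData\SalesDB_Data.mdf',
    SIZE = 10GB, FILEGROWTH = 1GB
),
FILEGROUP INDEXES
(
    NAME = N'SalesDB_Indexes',
    FILENAME = N'E:\SQLData\SalesDB_Indexes.ndf',
    SIZE = 5GB, FILEGROWTH = 1GB
)
LOG ON
(
    NAME = N'SalesDB_Log',
    FILENAME = N'F:\SQLLogs\SalesDB_Log.ldf',
    SIZE = 2GB, FILEGROWTH = 512MB
);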

Everything is a trade-off. With SQL Server, one of the best trade-offs you can make is to sacrifice disk space for performance.

Cheers,

Ben