SQL Server recovery procedures for a catastrophic failure – (Part-1)

Author: Basit A. Farooq

A database server is usually a mission-critical component. Having a recovery plan in place is essential to ensuring the business can continue to operate in the event of hardware failure, natural disaster, or other catastrophe.

A backup or standby server is an option, but not always practical or possible because of the hardware and software costs involved. Without some kind of a standby, the key to recovering from a failure is to get the database server back up and running as quickly as possible with as little data loss as possible.

In this first of the two part article series, you’ll first learn how to view and manage database states. You’ll also learn how to view file states. Finally, you’ll learn about recovery procedures for a catastrophic failure.

Recovery overview

When you plan for recovery, you need to consider the different scenarios that might make recovery necessary. Your database design should include disaster and recovery planning, where you consider what could happen and what you must do to recover.

Sources of failure

You need to consider key points when looking at possible failures:

How likely is it to happen?
How will you detect or recognize the error?
How much data would be involved?
What recovery resources do you have available?

Common sources of failures include:

Hardware failures
Loss of server (theft, fire, or natural disaster)
Application bugs or application failures
Inadvertent data errors (such as accidentally deleting critical data)
Malicious attacks (such as SQL Injection)

Depending on the specific error, recovery might require that you restore:

A page
A single file
A single database (user or system)
The entire server

Recovery considerations

As you consider failure scenarios, you must also consider your response to each. Identify your recovery requirements, and from there, you can determine your disaster preparation needs. When planning for recovery, keep in mind:

Length of time it will take to recover
Amount of data the company can afford to lose

Some recovery scenarios will require little time to implement. Others might require many hours, or even days. The most important resource, in nearly any scenario, is your database backups. Even then, complete recovery might not be a possibility, depending on the database recovery model. A recovery model determines how the transactions are logged, whether the backup of database transaction log file is allowed, and the type of restore options available to recover the database. The SQL Server databases can be configured to one of three recovery models, which are explained in the following sections.

The simple recovery model

When you use the simple recovery model, SQL Server logs a minimal amount of transactions in the transaction log file, and the transaction log is truncated as soon as transactions are committed. Simple recovery does not allow backups of transaction log files; therefore, databases with a simple recovery model are vulnerable to data loss because you are also restricted by the fact that you can’t recover to a specific point in time.

Bulk-logged recovery model

With the bulk-logged recovery model, bulk operations are minimally logged in the transaction log file, reducing the overall size of the transaction log file. All other operations are fully logged in the transaction log file. The backups of transaction log files are allowed in a bulk-logged recovery model. Therefore, in most situations, you can restore the databases to a specific point in time using the bulk-logged recovery model.

Full recovery

With a full recovery model, SQL Server logs all database changes in the transaction log, and the transaction log continues to grow until the backup is performed. The full recovery model supports the greatest number of backup and restore options. Therefore, your chances of recovering all (or at least most) of your data are best when you configure the database for the full recovery model.

Recovery preparations

Assuming that a mirrored or standby server isn’t an option, there are still things you can do to prepare for a disaster and minimize its adverse effects:

Use fault tolerant storage.
Back up data regularly.
Test recovery procedures before you need them.
Back up before any potentially hazardous operations.
Keep an off-site backup copy.
Keep data and transaction logs on separate (physical) hard disks.
Physically secure computers.
Use computer-safe fire control systems.
Protect against malicious activity.
Train users in recognizing (and reporting) suspected problems.

Always keep in mind that failures aren’t a matter of if, but of when. You can’t prevent them, but you can minimize their impact on your servers and the cost to your business.

Database and file states

A database is always in one of the following states:

ONLINE – Users can access the database. The primary filegroup is online.
OFFLINE – Users can’t access the database. A database enters the OFFLINE state only when a user takes it offline.
RESTORING – Users can’t access the database. The database is offline and in the process of restoring one or more files.
RECOVERY PENDING – Users can’t access the database. An error occurred during recovery and the database can’t be started.
SUSPECT – Users can’t access the database. The database couldn’t be recovered during startup. The primary filegroup might be damaged.
EMERGENCY – Only a single member of sysadmin can access the database to perform a repair or restoration. You must manually set a database to EMERGENCY state. The database is READ_ONLY. Operations against the database aren’t logged.

You set the database to ONLINE, OFFLINE, or EMERGENCY using the ALTER DATABASE statement: For example, to set the Outlander database to the EMERGENCY state, you execute:

ALTER DATABASE Outlander SET EMERGENCY

You can view the state of each database by querying sys.databases or by using the DATABASEPROPERTYEX function.

Each file in a filegroup also has a status. You can view the status of the files and filegroups using the following catalog views:

sys.database_files – View the status of files in a database that isn’t OFFLINE.
sys.master_files – View the status of all files for all databases, including those that are offline.
sys.filegroups – View the status of each filegroup.

A file is always in one of the following states:

ONLINE – Users can access the file.
OFFLINE – Users can’t access the file. You explicitly mark a file as OFFLINE, but can only bring it online by restoring it from backup.
RESTORING – The entire file is being restored.
RECOVERY PENDING – A piecemeal restore requires additional action before the file can be recovered and brought online.
SUSPECT – The file couldn’t be recovered during an online restore. A SUSPECT file in the primary filegroup makes the entire filegroup SUSPECT. A SUSPECT file in another filegroup doesn’t affect other files in the filegroup.
DEFUNCT – The file was offline when a user dropped it.

You can recover a SUSPECT file by:

Restoring from backup.
Executing DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS.

When a file in a filegroup is offline, the entire filegroup is offline.

Recovering from catastrophic failure

When the entire database server fails and you need to recover using new equipment, recovery requires the following steps:

Obtain new hardware.
Install the operating system and connect to the domain.
Install preferred SQL Server version and any necessary service packs/hotfixes.
Restore the backup of the master database and the msdb database.
Restart the computer.
Enable and disable services and network libraries to reestablish the security configuration.
If the new server has a different name:
- Execute sp_dropserver
- Execute sp_addserver
Recreate any server-level objects created or modified since the last backup of the master database.
Restore and recover user databases.
Verify system functionality.

Rebuilding the master database

SQL Server requires the master database. The master database, as its name implies, is the most important database in a SQL Server 2014 instance. In fact, it is the heart of a SQL Server 2014 instance because, without it, SQL Server will not start. The master database contains the following system-level configuration information:

Information on how a SQL Server 2014 instance is initialized.
The names, locations, and other information about the databases hosted within the instance of SQL Server 2014.
All settings for logins, and the roles the logins are members of
Information about fixed and user-defined server roles
Other SQL Server instance-level security settings (such as certificates, keys, and so on)
AlwaysOn and database mirroring configuration information
Resource Governor configuration information
Information about how linked servers are configured
Configuration information of all SQL Server 2014 instance endpoints
Other system-level configuration settings (such as system errors and warnings, assemblies, available system languages, and so on)

It’s important to keep a current backup of the master database. Always back up the master database after major changes, such as adding a user database or other critical objects. You should always use the full recovery model for master. Sometimes it’s necessary to rebuild the master database. You need to rebuild master database:

If the backup is corrupt.
If the backup is damaged by fire or other catastrophe.
If SQL Server can’t start.

Continue to Part-2…

SQL Server recovery procedures for a catastrophic failure – (Part 1)

SQL Server recovery procedures for a catastrophic failure – (Part-1)

Recovery overview

Sources of failure

Recovery considerations

The simple recovery model

Bulk-logged recovery model

Full recovery

Recovery preparations

Database and file states

Recovering from catastrophic failure

Rebuilding the master database

Recent Posts

Debugging Multi-Cloud Performance

Mixing Flavors of SQL Server

July Spotlight – Db2 LUW: Types of I/O

Part II: Overview of B- Tree and B+ Tree

Is it undermining or rude to email the boss to ask him to get his act together?

Debugging Multi-Cloud Performance

Mixing Flavors of SQL Server

July Spotlight – Db2 LUW: Types of I/O

Part II: Overview of B- Tree and B+ Tree

Is it undermining or rude to email the boss to ask him to get his act together?

Getting Started With Deep Learning in Your Browser Using TensorFlow.js

SQL Server Collation Overview and Examples

MySQL Escaping on the Client-Side With Go

VS Code Gets New Python Language Server, Named After Monty Python Character

How Hello World! changed – top level statements and functions (C# 9)

SSWUG.ORG