How DNS Can Help in a Disaster Recovery Solution
Murphy walks among us. You know Murphy, the famous "optimist" who helps make every bad situation even worse? Well, after a disaster has occurred is not the time to figure out how and where you need to recover your data. Sure, we practice (ok, hopefully we practice!) recovering our databases from tape or disk. And during our testing we restore them into a test database or, if we’re lucky enough, onto a test server. Great, but how does the application connect to the recovered database if we’ve run into a massive hardware (server) failure?
Before I describe how the Domain Name System (DNS) can help, take a look at a typical database environment like the one I've shown in Figure 1. This diagram shows each of our database servers, named dbServer1, dbServer2, and dbServer3, with three databases on each one. This particular design gives the Database Administrator (DBA) a lot of flexibility to move databases around for system maintenance, upgrades and recovery. It can also provide a level of security by isolating "sensitive" data from less sensitive data under normal circumstances (think of keeping account data apart from the regular user databases that all employees can access, in order to comply with Sarbanes-Oxley requirements). Of course, if a disaster were to occur, many processes and procedures change until the situation is rectified (that is, the disaster is over and the original database servers are brought back online).
Figure 1: Database Servers
So now, I'll describe how DNS can be used as a tool to help point applications to the correct database server.
As a refresher, DNS responds to requests for an IP address based upon a server name; the record that maps a name to an address is the A record. Most people also realize that the reverse is possible: we can send an IP address to the DNS server and get back the name of the device assigned to that address. That reverse mapping is stored in a PTR record, and by convention there is one, and only one, such name for an IP address.
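The two lookup directions described above can be tried from Python's standard library. This is only a minimal sketch: the calls go through the operating system's resolver (which may answer from a hosts file or a local cache rather than from the DNS server directly), and the name "localhost" is used purely as an example that exists on most machines.

```python
import socket


def forward_lookup(name):
    """Name -> IP address (the forward direction). None if it fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # covers socket.gaierror and socket.herror
        return None


def reverse_lookup(ip_address):
    """IP address -> primary host name (the reverse direction)."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except OSError:
        return None


if __name__ == "__main__":
    # "localhost" is just a convenient example name, not from the article.
    address = forward_lookup("localhost")
    print("forward:", address)
    if address is not None:
        print("reverse:", reverse_lookup(address))
```

Both helpers return None instead of raising, so a failed lookup can be reported without stopping a larger check script.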
That’s nice, but it doesn’t really help us. Wouldn't it be nice if there were a way to have more than one name lead to a single IP address? There is, and it’s the Canonical Name record, usually called the CNAME record by DNS administrators. A CNAME is an alias: it points one name at another (canonical) name, and that name in turn has the A record holding the address. DNS sets no fixed limit on how many CNAME aliases can point at the same server. (NOTE: Remember that DNS records live in a special kind of data file, and while the number of aliases can be “unlimited,” having too many records or entries could cause performance problems on your DNS server or network.)
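To make the alias idea concrete, here is a toy model in Python of how a lookup follows a CNAME to the name that owns the address. It is not a DNS server and does not talk to one; every host name and address in it is hypothetical.

```python
# Toy in-memory zone: a model of how an alias (CNAME) is followed to a
# canonical name that owns an A record. All names and addresses below
# are hypothetical.
A_RECORDS = {
    "dbserver1.example.com": "10.0.0.11",
    "dbserver2.example.com": "10.0.0.12",
}
CNAME_RECORDS = {
    "payroll-db.example.com": "dbserver2.example.com",
    "orders-db.example.com": "dbserver2.example.com",
    "wiki-db.example.com": "dbserver1.example.com",
}


def resolve(name, max_hops=8):
    """Follow CNAME aliases until a name with an A record is reached."""
    for _ in range(max_hops):
        if name in A_RECORDS:
            return A_RECORDS[name]
        if name not in CNAME_RECORDS:
            raise KeyError(f"no A or CNAME record for {name}")
        name = CNAME_RECORDS[name]
    raise RuntimeError("too many CNAME hops (possible alias loop)")


if __name__ == "__main__":
    for alias in sorted(CNAME_RECORDS):
        print(alias, "->", resolve(alias))
```

Several aliases land on the same address here, which is the property the rest of the article relies on: changing one entry in CNAME_RECORDS moves one alias without touching the others.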
How does that help with the recovery of a database? One part of the answer involves application configuration when the application is installed, whether it's a fat client on a desktop or a web service. Usually the installer asks for the name of the server where the database is stored. Many people doing the install enter the physical name of the database server (the name associated with the A record in DNS). However, when that database server fails, the application will also fail. And even though the data can be recovered to another server from the backups and log files, the connection information has to be re-entered to point to the new server, whether that requires a reinstall or just a change in the application's configuration. Even more worrisome, for a fat client this could require touching hundreds or thousands of desktop systems to update their configuration. Even with tools to push updates, this could be a lengthy and error-prone process. Worse, this downtime could lead to significant lost revenue or revenue opportunities.
Now, you’re probably thinking, "I know where this is going. I can just drop my A record for dbServer2 and recreate it as a CNAME record that points to dbServer1." And that will work, provided all the databases on dbServer2 are restored to dbServer1. However, it is very rare that dbServer1 has sufficient free resources to take on all the databases from dbServer2. Instead, you end up recovering some databases on dbServer1 and some on dbServer3 (see Figure 2).
Figure 2: Failure of DB Server 2
At this point, you may wonder, "How do you recover some databases to dbServer1 and others to dbServer3?" One solution is to create a CNAME alias for each database stored on each server, and to configure each application with the alias for its database rather than the server's physical name. While this is more work to manage and track, it provides the most flexibility when you need to move databases from server to server.
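The per-database alias approach can be sketched as a small planning helper in Python. Everything here is an assumption made for illustration: the alias names, the example.com domain, and the choice of which surviving server receives each database. The sketch only computes and prints the intended CNAME changes; it does not update a DNS server.

```python
# Hypothetical inventory: one alias per database, each pointing at the
# server that currently hosts it. Names are illustrative only.
aliases = {
    "db-accounts": "dbServer1",
    "db-hr": "dbServer2",
    "db-sales": "dbServer2",
    "db-wiki": "dbServer2",
    "db-reports": "dbServer3",
}


def plan_failover(aliases, failed_server, new_homes):
    """Return a new alias map with every alias on failed_server repointed.

    new_homes maps alias -> surviving server chosen by the DBA. An alias
    on the failed server with no new home raises an error, so nothing is
    silently left pointing at dead hardware.
    """
    updated = dict(aliases)
    for alias, server in aliases.items():
        if server != failed_server:
            continue
        if alias not in new_homes:
            raise ValueError(f"no recovery target chosen for {alias}")
        updated[alias] = new_homes[alias]
    return updated


def to_zone_lines(aliases, domain="example.com"):
    """Render the map as CNAME lines in zone-file style, for review."""
    return [
        f"{alias}.{domain}. IN CNAME {server.lower()}.{domain}."
        for alias, server in sorted(aliases.items())
    ]


if __name__ == "__main__":
    recovered = plan_failover(
        aliases,
        "dbServer2",
        {"db-hr": "dbServer1", "db-sales": "dbServer3", "db-wiki": "dbServer3"},
    )
    print("\n".join(to_zone_lines(recovered)))
```

Keeping the map in one place also gives you the tracking record that the extra aliases require: the same structure documents where each database lives today and where it would go in a failure.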
Beyond recovery, this also helps in environments where replication is involved, since you already have the database copied to a secondary (or stand-by) server. Updating the CNAME alias points all the clients to the secondary (now primary) database server.
While the focus so far has been on recovery, a side benefit of this technique is that it makes it easier to migrate databases from one server to another. If you're running into performance issues or need to do a database upgrade or maintenance, you can move the database with minimal impact on client systems.
The final step is to force all the client systems that connect to the database to clear their local DNS cache; otherwise a client may keep using the old cached address until that entry's time-to-live expires. Clearing the cache makes the client look up the alias again and get the IP address of the new server. Under Microsoft Windows the command "ipconfig /flushdns" clears the local DNS cache, so the next connection does a fresh look-up against the DNS server. Under Linux, restarting the "nscd" daemon (Name Service Cache Daemon), where it is in use, clears the cache ("/etc/init.d/nscd restart"). Both should take no more than a couple of seconds to complete.
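If the flush has to be pushed to many clients, the two commands above can be wrapped in a small script. The Python sketch below simply picks the command named in the text for the current platform; it assumes that on Linux nscd is installed and managed by an init script at /etc/init.d/nscd, and it defaults to a dry run so that nothing is executed by accident.

```python
import platform
import subprocess


def flush_command(system=None):
    """Return the cache-flush command from the text for the given OS.

    system is the value of platform.system() ("Windows", "Linux", ...).
    The Linux entry assumes nscd is installed and managed by an init
    script at /etc/init.d/nscd.
    """
    system = system or platform.system()
    commands = {
        "Windows": ["ipconfig", "/flushdns"],
        "Linux": ["/etc/init.d/nscd", "restart"],
    }
    if system not in commands:
        raise NotImplementedError(f"no flush command known for {system}")
    return commands[system]


def flush_dns_cache(dry_run=True):
    """Print the command, and run it only when dry_run is False."""
    cmd = flush_command()
    print("would run:" if dry_run else "running:", " ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Calling flush_dns_cache(dry_run=False) would actually run the command and normally needs administrator or root rights. Other platforms raise NotImplementedError rather than guessing at a command.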
In short, using a CNAME alias makes managing recovery easier should a critical failure occur. Besides making it easy to recover a database to a different server, it has the secondary benefit of abstracting away the physical server name, which helps with database engine upgrades or with moving databases between servers for performance reasons. And it also reduces the number of other things Murphy can cause to go wrong while you're trying to recover your databases.