Best practices for implementing disk-to-disk backup: Part 2

In this article, we continue our series on the best disk-to-disk backup strategies and wrap up our discussion on the challenges associated with software-based disk-to-disk backup.

As discussed in the previous article, software-based disk-to-disk backup creates some potential implementation and operation issues. We also noted that over time, backup software providers will resolve most of these issues.

The one area that backup software providers won't be able to address, however, is the file system itself, specifically file system size, fragmentation and sharing. Unlike tape, a disk must have a file system installed on it before backup software can use it as a backup destination. Ideally, this file system is large enough to hold the entire disk backup: if you have 10TB of backups, you'd like to create a single 10TB file system.

The trouble with file systems

Many file systems, both practically and theoretically, cannot support anywhere close to this size; in practice, a 2TB file system is considered large. Consequently, if we have 10TB of backup data and can create only 2TB file systems, we will have to create five of them. Each of these file systems must be independently managed and monitored, and more must be created as the backup data set grows.
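
To make the arithmetic concrete, here is a minimal Python sketch (illustrative only, not taken from any backup product) that computes how many separate file systems a backup set would require under a given per-file-system cap. The 2TB default simply stands in for whatever practical limit your operating system imposes.

    import math

    def file_systems_needed(total_backup_tb: float, max_fs_tb: float = 2.0) -> int:
        """Return how many separate file systems are needed for the backup set.

        Illustrative assumption: max_fs_tb models the practical size limit of
        whatever file system the backup server's operating system provides.
        """
        return math.ceil(total_backup_tb / max_fs_tb)

    # 10TB of backup data with a 2TB file system limit -> 5 file systems,
    # each of which must be created, monitored and managed separately.
    print(file_systems_needed(10.0))  # prints 5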

With disk-to-disk backup, fragmentation results from saving the backup jobs to the backup disk area. These jobs vary in size; they grow, shrink and are eventually deleted when you migrate them to tape. That constant change and variation over time causes fragmentation. Because the backup-to-disk process is nothing but a steady stream of file changes and deletions, fragmentation sets in faster and more severely than in most other applications. All operating systems suffer from this problem, and the only remedy is a disk defragmenter, which is very processor-intensive and unpopular with system administrators. On a multiple-terabyte file system, a defragmentation job can run for days.
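
To illustrate how that churn chops up free space, here is a hypothetical Python simulation (an assumption-laden sketch, not a model of any real file system): variable-size backup jobs are written into a pool of blocks and randomly expired, as if migrated to tape, and the number of free extents left behind is counted as a rough measure of fragmentation.

    import random

    def simulate_fragmentation(disk_blocks=10_000, rounds=200, seed=1):
        """Crude model of a backup-to-disk area.

        Job sizes, pool size and expiry probability are arbitrary assumptions
        chosen only to show how write/delete churn fragments free space.
        """
        random.seed(seed)
        disk = [None] * disk_blocks      # None = free block, int = job id
        live_jobs = {}                   # job id -> list of block indexes
        next_id = 0

        for _ in range(rounds):
            # Write a new backup job into the first free blocks found.
            size = random.randint(50, 500)
            free = [i for i, b in enumerate(disk) if b is None][:size]
            if len(free) == size:
                for i in free:
                    disk[i] = next_id
                live_jobs[next_id] = free
                next_id += 1
            # Expire a random older job, as if it had been migrated to tape.
            if live_jobs and random.random() < 0.5:
                victim = random.choice(list(live_jobs))
                for i in live_jobs.pop(victim):
                    disk[i] = None

        # Count contiguous runs of free blocks: more runs = more fragmentation.
        return sum(1 for i, b in enumerate(disk)
                   if b is None and (i == 0 or disk[i - 1] is not None))

    print("free extents after churn:", simulate_fragmentation())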

Another issue with a standard file system is its inability to be shared. In a tape storage-area network (SAN) environment, multiple servers (even with different operating systems) can access the same tape library at the same time, because each backup server running the same backup software writes its data stream to its own dedicated tape drive. That isn't true of a disk backup target on a standard file system, which cannot safely be shared by multiple servers at the same time. With a standard disk file system, each server performing backups needs its own file system on the SAN, so sharing those file systems would be very difficult and risky, especially among backup servers with dissimilar operating systems.

Hardware suppliers have validated these concerns about software-based disk-to-disk backup. Most major hardware suppliers now offer more than "just disk." In fact, many now offer some sort of disk-to-disk appliance or service that attempts to overcome most of the limitations listed above, as well as those mentioned in the last article.

EMC Corp., Network Appliance Inc., Storage Technology Corp. and Advanced Digital Information Corp. all deliver solutions that resolve these limitations of software-based disk-to-disk backup. Product categories addressing these weaknesses include hardware-based disk libraries, virtual tape libraries and the newly emerging commonality factoring devices.

Hardware-based Disk Libraries

The first attempt at resolving these issues came from suppliers of what are essentially hardware-based disk libraries. HBDL is the term used for stand-alone products that emulate a tape library. It's important to note that emulation is exactly what an HBDL does, and nothing else: it's just like connecting a second tape library to your backup server, except that it happens to be disk-based instead of tape-based. Unlike virtual tape libraries (VTLs), which we'll detail in the next article, HBDLs typically don't support direct attachment to a tape library. HBDL units do, however, solve some of the problems that traditional software-based solutions suffer from, including the file system issues discussed above.

HBDL systems also handle fragmentation, either by defragmenting in the background on the fly or by using a file system that never fragments in the first place. They typically have no practical or theoretical file system size limits, so creating very large (tens of terabytes) file systems is realistic. SAN sharing works just as it does for a tape library on a SAN and, as such, is much more reliable.

Beyond SAN sharing, another key shortcoming of software-based disk-to-disk backup is raw performance. Assuming the rest of the environment can deliver data quickly enough to the master or media server, performance on these devices is very good because they are designed specifically for high-performance writes and large block sizes.
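
The effect of block size can be seen with a rough benchmark like the Python sketch below; it is an illustrative assumption rather than anything vendor-specific, and the results depend heavily on the operating system's write cache and the underlying disk.

    import os
    import tempfile
    import time

    def time_sequential_write(block_size, total_bytes=256 * 1024 * 1024):
        """Write total_bytes sequentially in block_size chunks; return seconds."""
        buf = b"\0" * block_size
        with tempfile.NamedTemporaryFile() as f:
            start = time.perf_counter()
            written = 0
            while written < total_bytes:
                f.write(buf)
                written += block_size
            f.flush()
            os.fsync(f.fileno())         # force the data out of the OS cache
            return time.perf_counter() - start

    # Larger blocks generally mean fewer system calls and better streaming throughput.
    for bs in (4 * 1024, 64 * 1024, 1024 * 1024):
        print(f"{bs // 1024:>5} KB blocks: {time_sequential_write(bs):.2f} s")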

In addition, HBDLs minimize concerns about how the backup software interacts with disk. Since tape library emulation makes the HBDL "look, act and smell" like a tape library, it should be a plug-and-play solution in most environments, though that applies to implementation, not necessarily operation.

Another scenario where an HBDL may be a more logical choice than a VTL is in a Tivoli Storage Manager environment. Unlike other backup software, Tivoli Storage Manager already does so much with disk that it's more difficult to integrate with a VTL-like product. The simplest solution may be to integrate a "fast" (i.e., HBDL) tape library and let Tivoli Storage Manager do the rest.

Currently, HBDLs are getting a lot of support from backup software manufacturers. That's because with HBDLs, the backup software application has complete control over the media management process. With VTLs, the software does not.

In the final article, we will look at VTLs and commonality factoring devices.

George Crump is vice president of technology solutions at SANZ Inc., an Englewood, Colo.-based data storage consulting and system integration company focused on the design, deployment and support of intelligent data management.

Copyright © 2005 IDG Communications, Inc.

  