In the past, I have worked on moving from physical (tape) to virtual (disk) backups and ran into several challenges that were new to me. I needed to understand how data throughput differs across media and how that affects storing reliably restorable data. Another thing to keep in mind is what data makes sense to keep. This could range from the critical files and configuration data needed to rebuild a complete system (Bare-Metal Restore) to just enough to restore files that have been incorrectly altered or removed. I also needed to keep in mind how much space would be needed to store the physical or virtual data.
Which medium to use depends mostly on how much data you intend to back up and how long you have to do it. When choosing how to transfer your data to a backup storage device, you need to understand your limitations. These include spindle speed, fiber and network bandwidth, the number of backups that can run simultaneously, and tape write latency. My biggest challenge has been very large data transfers that could not complete within the time allotted. When trying to figure out where my issues were, I would use a graph similar to the one below to determine whether it was even theoretically possible to transfer all the data within the time allotted. My other big challenge was working out a schedule that gave every backup enough time to finish before the next queued backup job started.
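The "is it theoretically possible" check can also be done with a few lines of arithmetic. The sketch below is only an illustration: the function names, the 10 TB data set, the 8-hour window, and the 125 MB/s figure (the theoretical maximum of a 1 Gb/s link) are assumptions chosen for the example, not measurements from my environment. It also assumes every stream gets its own independent path at the full rate, which real hardware rarely delivers.

```python
def transfer_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to move data_tb terabytes at a sustained rate (decimal units)."""
    data_mb = data_tb * 1_000_000  # 1 TB = 1,000,000 MB in decimal units
    return data_mb / throughput_mb_s / 3600  # seconds -> hours


def fits_window(data_tb: float, throughput_mb_s: float,
                window_hours: float, streams: int = 1) -> bool:
    """True if the transfer can finish inside the backup window.

    Assumption: throughput scales linearly with the number of streams.
    """
    return transfer_hours(data_tb, throughput_mb_s * streams) <= window_hours


if __name__ == "__main__":
    # Hypothetical job: 10 TB over a 125 MB/s path with an 8-hour window.
    print(round(transfer_hours(10, 125), 1))     # about 22.2 hours on one stream
    print(fits_window(10, 125, 8))               # False
    print(fits_window(10, 125, 8, streams=3))    # True
```

If the single-stream result is already longer than the window, no amount of scheduling will help; the job needs a faster path or more streams.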
One of the methods I looked at was streaming data to multiple physical or virtual tape drives at the same time. This allowed directories with a large amount of data to be backed up within my backup window. Some of my backup policies became so complex that they required different regular expression patterns to split the files into sets small enough to transfer within the window. Each set could then be streamed to its own tape device, so several devices ran simultaneously and I could use more of the available bandwidth to transfer the data more quickly.
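As a rough illustration of splitting one large directory into several streams with regular expressions, here is a minimal sketch. It is not how any particular backup product defines its policies; the function name, the stream names, the patterns, and the file paths are all hypothetical.

```python
import re
from collections import defaultdict


def partition_by_pattern(paths, patterns):
    """Assign each path to the first stream whose regex matches it.

    patterns is a list of (stream_name, regex) pairs; paths that match
    no pattern are collected under "unmatched" so nothing is silently skipped.
    """
    compiled = [(name, re.compile(rx)) for name, rx in patterns]
    streams = defaultdict(list)
    for path in paths:
        for name, rx in compiled:
            if rx.search(path):
                streams[name].append(path)
                break
        else:
            streams["unmatched"].append(path)
    return dict(streams)


if __name__ == "__main__":
    # Hypothetical files, split by the first letter of the file name.
    paths = ["/data/a01.dbf", "/data/m07.dbf", "/data/z99.dbf", "/data/2024.log"]
    patterns = [
        ("stream1", r"/[a-h][^/]*$"),
        ("stream2", r"/[i-q][^/]*$"),
        ("stream3", r"/[r-z][^/]*$"),
    ]
    for name, files in partition_by_pattern(paths, patterns).items():
        print(name, files)
```

Keeping an "unmatched" bucket is a deliberate choice: with complex patterns it is easy to leave a gap, and a file that matches nothing is a file that never gets backed up.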
What data is necessary to keep depends on how much storage space you have, how much data you feel is needed to restore a system after a catastrophic failure, and what your legal department requires. In some cases you may want to store just enough data to restore the application only. The system may be part of a cluster, so the only data you need to restore are the application files; rebuilding the server itself is a fairly easy task. In other cases, it may make more sense to back up the whole system for Bare-Metal Restore, for example a very complex system that takes many hours to build. Each system will need to be addressed individually.
There are also many challenges when storing on a virtual library. One of them is understanding your growth and retention needs. Depending on your retention time, you may need to wait through a complete retention cycle to get the big picture of how much data you will be storing on your library and how much space is needed. This is one advantage of physical tapes: you just buy more as you run out of space. With a virtual library, you can't always just buy new storage space. This is where it becomes really important to watch and track your data growth. When trending this data you will need to understand how much space the backups take (daily, weekly, monthly, annually) and how this data will grow over time. For example, say you have a 40TB database that is backed up once a month and each backup is kept for 1 year. This means you will need enough space to store 40TB x 13, or 520TB, to hold the year's worth of data for just that one backup. The 13th copy is added in case the oldest backup is not released until after the 13th capture has been written. This calculation will need to be performed for each retention plan and each server to give you a good estimate of how much raw space you will need. The real fun with numbers shows up if you are looking at de-duplication and/or compression, which adds a large unknown to your equation. The process is similar, but you will need to calculate the average size after the de-duplication or compression process has finished and work out the total data space per system from that.
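The retention arithmetic above can be sketched as a small function. The function and parameter names are my own for illustration, and the 2:1 reduction ratio in the second call is a made-up value; your real de-duplication or compression ratio has to be measured after the process has run.

```python
def required_space_tb(backup_size_tb: float, copies_in_retention: int,
                      extra_copies: int = 1, reduction_ratio: float = 1.0) -> float:
    """Space needed for one backup policy, in TB.

    extra_copies covers the overlap where a new backup is written before
    the oldest one is released. reduction_ratio is the assumed average
    de-duplication/compression ratio (1.0 means no reduction).
    """
    total_copies = copies_in_retention + extra_copies
    return backup_size_tb * total_copies / reduction_ratio


if __name__ == "__main__":
    # The example from the text: 40TB monthly backup kept for 1 year.
    print(required_space_tb(40, 12))                       # 520.0
    # Same policy with a hypothetical 2:1 average reduction.
    print(required_space_tb(40, 12, reduction_ratio=2.0))  # 260.0
```

Summing this result over every retention plan and every server gives the estimate for the whole library.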
Graph based on max throughput on some common media