DB2 Database Maintenance and Recovery: Getting Past the Common Misconceptions

By: Brenda Honeycutt , Craig S. Mullins
To read more DBA articles, visit http://dba.fyicenter.com/article/

Every DBA strives to maintain a healthy DB2 environment every day. However, there are numerous lingering misconceptions about database health to which many organizations fall prey. These misconceptions occur because DBAs believe in, or apply, concepts that are objectively false.

Some misconceptions exist simply because they’ve been passed down from older generations. In terms of DB2, this means that best practices for older DB2 versions may no longer be best. This article examines several common misconceptions about DB2 maintenance and recovery.

“Lies, Damned Lies, and Statistics“

This is part of a phrase usually attributed to Benjamin Disraeli and popularized in the U.S. by Mark Twain. Although the origin of this statement may be questioned, its meaning is clear. It refers to the persuasive power of numbers and neatly describes how even accurate statistics can be used to bolster inaccurate arguments.

When it comes to DB2, statistics are the lifeblood of database health. But it’s important to distinguish between the two types of statistics available to us: catalog statistics and real-time statistics. Although originally intended as food for the DB2 Optimizer, historically we’ve also needed analytical RUNSTATS that updated catalog statistics to drive our maintenance processes. With the advent of real-time statistics, there’s no need to waste time and resources by running analytical RUNSTATS. It’s time to use RUNSTATS strictly for the purpose IBM intended: driving access paths for optimal performance.

Let’s dispel the delusion that using real-time statistics increases CPU costs. DB2 collects the statistics in real-time anyway, even if you aren’t using them. Experience and customer polls indicate that externalizing RTS has practically no impact on CPU. Because these statistics are collected in real-time, they’re timely and accurate.

So you have your statistics straight. They accurately reflect your database activity and you can rely on them as a basis for backups. The real-time statistics that pertain to backups include:

• COPYLASTTIME: The timestamp of the last full or incremental image copy on the table space or partition
• COPYUPDATEDPAGES: The number of distinct pages that have been updated since the last COPY
• COPYCHANGES: The number of insert, delete, and update operations since the last COPY
• COPYUPDATELRSN: The LRSN (Log Record Sequence Number) or RBA (Relative Byte Address) of the first update after the last COPY
• COPYUPDATETIME: The timestamp of the first update after the last COPY.

Database Backup Fallacies
Now let’s consider some common fallacies regarding database backups outside the realm of determining when to do them:

Migrating too early to tape. Some organizations immediately migrate current backups to tape to free up disk space. This will free up some DASD, but the time needed to recall that tape must be added to the recovery time. So which is more important: disk space or recovery time?

You should consider keeping the most current backups on DASD and migrate them to tape only when a new backup is taken. Of course, your tape vs. disk strategy for backups will heavily depend on the nature of the data being backed up and Recovery Time Objectives (RTOs) for that data.

Some shops use dual backups. Rather than trying to save money on disk space, consider limiting dual backups to objects that require a critical copy. A critical copy is one taken after a REORG or a LOAD LOG NO, as well as a REORG with a rebuild of the compression dictionary for compressed table spaces. In other cases, DB2 always has the opportunity to fall back to a prior copy in the unlikely event of an unusable copy.

Virtual tape storage is as good as DASD. Not. There’s a hard-and-fast rule concerning Virtual Tape Systems (VTS) that everyone should follow: Although choosing a less expensive storage media, such as VTS, for backups is OK, don’t overestimate the speed of VTS. When deciding on the right backup device, always keep mount and recall times in mind. For example, mount and recall times for small objects are often bad in relation to recovery time.

Oops, the archive logs are on tape! If your archive logs are on tape or VTS, a parallel recovery isn’t possible. Keeping archive logs on DASD isn’t an option for some shops. When backing up archive logs to tape, don’t follow the rule of thumb pertaining to the 28,672 block size normally optimal for tape. Instead, use a block size of 24,576. To enable parallel processing for recovery, the archive logs on tape must be copied to DASD, and 24,576 is the preferred size; it’s been the ZPARM default since DB2 V8.

Think about the trade-off in tape savings vs. recovery speed and your decision should be obvious. Before backing up any logs to tape, it’s a good idea to always have at least 24 hours covered by the active log and at least 48 hours covered by archive logs on DASD.

You can use mirroring instead of dual logging. Some organizations mistakenly believe that single active logging is fine if mirroring is used. But what happens if a technical error occurs with the channel, the device, or the firmware of the device? The data will be mirrored as inconsistently as the source. You should always do dual logging, without exception. The DB2 log is the most important resource to ensure data consistency.

Fast, faster, flash it. IBM introduced BACKUP SYSTEM and RESTORE SYSTEM in V8 to take full advantage of the immense speed gains in FlashCopy I and II technology. With FlashCopy, the I/O subsystem does the copy, freeing the CPU to do more important work. FlashCopies enable ultra-fast image copies and now support incremental copies at the volume level and over multi-volumes when using “consistency groups.”

Some people have hailed FlashCopy as “the end of the image copy,” but that isn’t the case for all installations. It’s the best for select, large, enterprise-critical objects, but you should choose those objects wisely. FlashCopy can be expensive in terms of the hardware cost and system topology. You must have twice the number of disks to flash a given “pool,” which can make keeping generations of backups extremely expensive. Even if you then offload the flash pool to tape, you fill up tapes with gigabytes of unchanged data. The introduction of incremental FlashCopies has alleviated some, but not all, of this space management nightmare. Incremental flashes are at the volume level. A data set that’s multi-volume must be handled with great care—unless persistent, change recording, background nocopy, and inhibit target write options are all in effect.

FlashCopies aren’t registered anywhere, unless from the BACKUP SYSTEM DB2 command. You must remember that you did a flash, and remember what was flashed. Flashes don’t reset the COPY PENDING status. Don’t, however, let these facts discourage you from using FlashCopy outside of DB2. Some third-party tools keep track of FlashCopies and reset the copy pending status. Maybe yours is one of these.

Copying indexes. Some folks blindly continue on, administering their DB2 systems with little or no changes even as new versions roll in with new functionality. We’ve used COPY to make image copy backups of DB2 indexes for some time now, but many shops completely ignore this capability. If the index was created (or altered) with the COPY YES parameter, then it can be recovered using either the REBUILD or RECOVER utilities. An index defined as COPY NO can only be recovered using the REBUILD utility.

It can make sense to backup indexes for recovery because the REBUILD utility can take a long time to complete for large amounts of data or when a large number of indexes exist. For example, in one simple test, recovering a 50-million row index substantially outperformed rebuilding that same index. Of course, your mileage may vary.

Another reason for backing up indexes is in the case of a recovery of both the data object and the index. COPY YES indexes can be restored at the same time as the data is being restored, reducing the overall outage time.

Remember, however, that all index copies must be full copies, as incremental image copies aren’t permitted for indexes.

Full article...