Monitor your Database Backups Using Operations Manager

By: Thomas LaRock
To read more DBA articles, visit http://dba.fyicenter.com/article/

Introduction
While presenting my work on Operations Manager at PASS this year, I showed a new custom monitor for Operations Manager that I had recently written and deployed into our Data Server environment. Previously, I had created what I had thought to be an exciting Operations Manager monitor to notify me if any SQL Agent jobs had been created or modified within the past day. Not many people shared my excitement, and I started to realize that people really want Operations Manager monitors that are simple and work reliably.

I could not agree more. Even I want it to work, and I was becoming frustrated that, every now and then, Operations Manager would seem to not work as expected. For example, it is possible to have a job not succeed, but not report a failure, at which point Operations Manager will not notify me of the issue. This is mostly due to poor error handling than anything else, but what if the job happens to be your database backup job inside of SQL Agent? You could be without a recent backup of your database and never know it!

After some internal conversations I gave myself two options; either dig into, and through all, our jobs, ensure we have thorough error handling for every conceivable error, and make certain we have configured Operations Manager to adequately scrub the error logs and event logs to make certain we raise alerts when necessary. Or, preferably I could build a monitor in Operations Manager that looks into the msdb database to determine when the last time a full backup has been completed.

Which one sounds better to you? I was attracted to the second option because it let me build yet another custom monitor that I feel should be part of the standard Management Pack offered by Microsoft (or perhaps a third party vendor). And when I presented this idea to the folks at PASS it really seemed to hit home. I know many people have strung together lots of reports on their database dumps in order to verify that they have happened. My preference is to have real time monitoring rather than daily reports, and I think this custom monitor really made people sit up and take notice of Operations Manager, as it provided one of the ‘Hello World’ examples that people could build upon for themselves.

Create the Monitor
This monitor was created in much the same way as I’ve already described in SQL Agent Jobs. There are two important differences in the new monitor
* the target is the database,
* We will accommodate both SQL 2000 and SQL 2005 Databases

Why two different targets? Because of the second difference, that we will need two different VB scripts, one for each version we are monitoring currently (SQL2000 and SQL2005). And why do we need this? Because Microsoft reserves the right to make changes to their system tables between versions, that’s why.

In Figure 1 you can see that I have named the monitor ‘Verify Backups’, and I have placed it under the ‘Availability’ aggregate rollup monitor for each target. This means that, when we examine the database state view later on, we will be able to drill into the health explorer for a database to see the current health state for a particular database. I chose the Availability rollup simply because it made the most logical sense. If you do not have a current backup of your database available, then the health status for the availability of your database should show that. In other words ‘availability’ does not have to simply mean real-time availability; it can be extended to include the availability of your backups.

Script Overview
Using the same idea as in the previous monitor I created, I knew that I needed to create a PropertyBag, which meant that I needed to return a name-value pair. So, I went about trying to piece together some T-SQL that would return three columns: database name, number of says since the last full backup, and number of days since the last differential backup. I could change the code to include transaction log backups, if desired, but my goal for this example was simply to check that nightly differential backups, and a weekly full backup were being done.

The VB script for the SQL 2000 DB target is nearly identical to the SQL 2005 DB target VB script. The only difference is in the T-SQL being sent. The SQL 2000 version is as follows:

SET NOCOUNT ON
SELECT d.name as [DBname]
, [daysSinceFull] = max(isnull(datediff(dd,b.backup_start_date,getdate()),0))
, [daysSinceDiff] = max(isnull(datediff(dd,bi.backup_start_date,getdate()),0))
FROM [master]..[sysdatabases] d WITH (NOLOCK)
LEFT JOIN [msdb]..[backupset] b WITH (NOLOCK) on d.name = b.database_name
AND b.backup_start_date = (select max(backup_start_date)
FROM [msdb]..[backupset] b2
WHERE b.database_name = b2.database_name AND b2.type = 'D')
LEFT JOIN [msdb]..[backupset] bi WITH (NOLOCK) on d.name = bi.database_name
AND bi.backup_start_date = (select max(backup_start_date)
FROM [msdb]..[backupset] b3
WHERE bi.database_name = b3.database_name AND b3.type = 'I')
WHERE d.name NOT IN ('Pubs','tempdb','Northwind', 'Adventureworks')
AND d.status < 30
GROUP BY d.name

The idea was to return a list of current databases from the master database and join out to the msdb database. That way I would only focus on current databases, as opposed to reporting on databases that no longer exist but still have records inside of the msdb database. We also filter for sample databases as well as tempdb. The SQL 2005 version is as follows:

SET NOCOUNT ON
SELECT d.name as [DBname]
, [daysSinceFull] = max(isnull(datediff(dd,b.backup_start_date,getdate()),0))
, [daysSinceDiff] = max(isnull(datediff(dd,bi.backup_start_date,getdate()),0))
FROM [master].[sys].[databases] d WITH (NOLOCK)
LEFT JOIN [msdb]..[backupset] b WITH (NOLOCK) on d.name = b.database_name
AND b.backup_start_date = (select max(backup_start_date)
FROM [msdb]..[backupset] b2
WHERE b.database_name = b2.database_name AND b2.type = 'D')
LEFT JOIN [msdb]..[backupset] bi WITH (NOLOCK) on d.name = bi.database_name
AND bi.backup_start_date = (select max(backup_start_date)
FROM [msdb]..[backupset] b3
WHERE bi.database_name = b3.database_name AND b3.type = 'I')
WHERE d.name NOT IN ('Pubs','tempdb','Northwind', 'Adventureworks')
AND d.state = 0
AND d.source_database_id IS NULL
GROUP BY d.name

The difference here is that in SQL2005 we want to avoid looking at database dump information for database snapshots. Although a database snapshot cannot have a backup taken, it does appear in the list of current databases for the instance, so we needed to filter for that possibility in our query.

Also note that we focus on the start time for the backup, as opposed to the finish time. This is because the start time for the backup is more important than the finish time since that is when the LSN checkpoint is marked in the backup file. In other words, you restore to the point in time at which the backup started, not when it finished. So, we want to focus our monitor on the start time.

For the monitor schedule I decided to run every three hours, but you can adjust according to your level of OCD. Some people will choose to run once a day, some might choose once an hour.

The last items of interest would be defining ‘health state’ for the monitor. I decided to define the state of this monitor as ‘Unhealthy’ should I find a database that has not been dumped in more than seven days as shown in Figure2.

Full article...