The Replication infrastructure in SQL Server is implemented using SQL Server Agent to execute the various components involved in the form of a job (e.g. LogReader agent job, Distribution agent job, Merge agent job)
SQL Server jobs execute a binary executable file which is basically C++ code.
You can download all the scripts for this article here
SQL Server Job Schedules
By default each of job has only one schedule that is set to Start automatically when SQL Server Agent starts. This schedule ensures that when ever the SQL Server Agent service is started all the replication components are also put into action. This is OK and makes sense but there is one problem with this default configuration that needs improvement - if for any reason one of the components fails it remains down in a stopped state.
Unless you monitor the status of each component you will typically get to know about such a failure from a customer complaint as a result of missing data or data that is not up to date at the subscriber level. Furthermore, having any of these components in a stopped state can lead to more severe problems if not corrected within a short time.
The action required to improve on this default settings is in fact very simple. Adding a second schedule that is set as a Daily Reoccurring schedule which runs every 1 minute does the trick. SQL Server Agentís scheduler module knows how to handle overlapping schedules so if the job is already being executed by another schedule it will not get executed again at the same time. So, in the event of a failure the failed job remains down for at most 60 seconds.
Many DBAs are not aware of this capability and so search for more complex solutions such as having an additional dedicated job running an external code in VBS or another scripting language that detects replication jobs in a stopped state and starts them but there is no need to seek such external solutions when what is needed can be accomplished by T-SQL code.
SQL Server Jobs Status
In addition to the 1 minute schedule we also want to ensure that key components in the replication are enabled so I can search for those components by their Category, and set their status to enabled in case they are disabled, by executing the stored procedure MonitorEnableReplicationAgents.
The jobs that I typically have handled are listed below but you may want to extend this, so below is the query to return all jobs along with their category.
SELECT category_id, name FROM msdb.dbo.syscategories ORDER BY category_id;
* Distribution Cleanup
* LogReader Agent
* Distribution Agent
Snapshot Agent Jobs
By default when a publication is created, a snapshot agent job also gets created with a daily schedule. I see more organizations where the snapshot agent job does not need to be executed automatically by the SQL Server Agent scheduler than organizations who need a new snapshot generated automatically. To assure this setting is in place I created the stored procedure MonitorSnapshotAgentsSchedules which disables snapshot agent jobs and also deletes the job schedule.
It is worth mentioning that when the publication property immediate_sync is turned off then the snapshot files are not created when the Snapshot agent is executed by the job. You control this property when the publication is created with a parameter called @immediate_sync passed to sp_addpublication and for an existing publication you can use sp_changepublication.
The scripts assume the existence of a database named PerfDB.
1. Run the scripts to create the stored procedures in the PerfDB database.
2. Create a job that executes the stored procedures every hour.