Manage and Monitor Identity Ranges in SQL Server Transactional Replication

By: Yaniv Etrogi
To read more DBA articles, visit http://dba.fyicenter.com/article/

Problem
When using transactional replication to replicate data in a one way topology from a publisher to a read-only subscriber(s) there is no need to manage identity ranges. However, when using transactional replication to replicate data in a two way replication topology - between two or more servers there is a need to manage identity ranges in order to prevent a situation where an INSERT commands fails on a PRIMARY KEY violation error due to the replicated row being inserted having a value for the identity column which already exists at the destination database.

Solution
There are two ways to address this situation:
1. Assign a range of identity values per each server.
2. Work with parallel identity values.

The first method requires some maintenance while the second method does not and so the scripts provided with this article are very useful for anyone using the first method. I will explore this in more detail later in the article.

In the first solution set server1 to work in the range of 1 to 1,000,000,000 and server2 to work in the range of 1,000,000,001 to 2,000,000,000. The ranges are set and defined using the DBCC CHECKIDENT command and when the ranges in this example are well maintained you meet the goal of preventing the INSERT commands to fall due to a PRIMARY KEY violation. The first insert at server1 will get the identity value of 1, the second insert will get the value of 2 and so on while on server2 the first insert will get the identity value of 1000000001, the second insert 1000000002 and so on thus avoiding a conflict.

Be aware that when a row is inserted the identity value (seed) is generated as part of the insert command at each server and the inserted row is replicated. The replicated row includes the identity column’s value so the data remains consistent across all servers but you will be able to tell on what server the original insert took place due the range that the identity value belongs to.

In the second solution you do not manage ranges but enforce a situation in which identity values can never get overlapped by setting the first identity value (seed) and the increment property one time only during the CREATE TABLE command of each table.

So a table on server1 looks like this:
CREATE TABLE T1
(
c1 int NOT NULL IDENTITY(1, 5) PRIMARY KEY CLUSTERED
,c2 int NOT NULL
);

And a table on server2 looks like this:
CREATE TABLE T1
(
c1 int NOT NULL IDENTITY(2, 5) PRIMARY KEY CLUSTERED
,c2 int NOT NULL
);

When these two tables are inserted the results of the identity values look like this:
Server1: 1, 6, 11, 16, 21, 26…
Server2: 2, 7, 12, 17, 22, 27…

This assures no identity values conflicts while leaving a room for 3 additional servers to participate in this same environment. You can go up to 9 servers using this method by setting an increment value of 9 instead of 5 as I used in this example.

Points to consider and comments
First Method
* Allows for more than 9 servers to be managed at the same replicated environment.
* The identity ranges have to be selected carefully to assure the max value of the range is never reached.
* It is suggested to monitor the identity values so you know a head if any table is about to reach the max range value
* Defining the ranges requires knowing the activity on site and taking into account future growth and you may need to modify int data type columns to bigint to allow for ranges to exceed the 2^32 figure supported by the int data type.
* Every new table that has an identity column added to the replication has to be seeded (a one time event) to match the range for the given server using the DBCC CHECKIDENT command (see the accompanied scripts). A failure to do so results in a PRIMARY KEY Violation error until the situation is corrected.
* Requires more maintenance / management and is prone to more errors.
Second Method
* Limited for up to 9 servers to be managed at the same replicated environment.
* Every new table that has an identity column added to the replication has to be created using a slightly different command because of the identity seed and increment values and this can effect the way you load versions to your servers because of the differences in the schema scripts.
* This method is safer – once the table is correctly created there is no additional maintenance / management and points of failure.
* Consider if you need to modify int data type columns to bigint to assure the int data type 2^32 figure is never reached.

Full article...