Understanding Scans And Seeks

By: Preethiviraj Kulasingham
To read more DBA articles, visit http://dba.fyicenter.com/article/

There is a popular theory on scans and seeks: "Scan reads through the object (table/index) from top to bottom looking for the records that it needs while seek goes directly to the part of the object that it needs and reads to where the data that it needs ends. This is obviously a must more efficient operation than a scan, as database server already knows where the data is that it is looking for". This article analyzes the validity of the theory.

The problem
Some time back I had a situation where I had to find the last key of a table. The table was designed to have an integer column as the primary key with the key always incrementing. However the table did not have an identity column. It forced me to find the last key before the insert occurs. As the table grew over time, each operation against the table became increasingly costly.

I used a query similar to the one below to find the last value of the key. (This example uses AdventureWorks database.)
SELECT TOP 1 TransactionID
FROM Production.TransactionHistory
ORDER BY TransactionID DESC

I was puzzled. It was using a scan (In my case a clustered index scan). A clustered index is organized slightly differently than a non-clustered index. While all other indexes have separate index pages for the entire index, clustered indexes use the data pages for the leaf level entries. Does that mean SQL server is scanning the entire table to get just one row? It can easily find the last entry in the index and get the value. So in my opinion it should be a seek operation. So, why I observe this peculiar behavior?

My Solution
I remembered I read somewhere that SQL Server uses backward search as well. (That is, searching from last entry towards the first entry). May be it is not true then, I thought, and removed that DESC keyword from the query to check the execution plan. But it didn't make any difference in the plan. It was a confirmation that SQL Server treats backward and forward searches the same way. However, it did not solve my problem.

I wrote the query in different ways, such as below:
DECLARE @Rows bigint
SET @Rows = 1
SELECT TOP (@Rows) TransactionID
FROM Production.TransactionHistory
ORDER BY TransactionID DESC

To add to my frustration, it gave the same execution plan with scan operation taking almost 100% of the total cost.
SET ROWCOUNT 1
SELECT TransactionID
FROM Production.TransactionHistory
ORDER BY TransactionID DESC
SET ROWCOUNT 0

The execution plan changed only slightly. When the Top operator is removed, I saw a slight improvement in the total estimated cost, but still the major operation stayed as a clustered index scan. The scan changed into seek when I wrote the query differently:

SET ROWCOUNT 1
SELECT TransactionID
FROM Production.TransactionHistory
WHERE TransactionID <= 2147483647
ORDER BY TransactionID DESC
SET ROWCOUNT 0

2147483647 is the largest number "int" data type can hold. So, it is clear the results will not change because of the additional condition. But to my amusement, the execution plan changed.

Full article...