DBA > Articles

Estimating Disk Space Requirements for Databases

By: Alex Kozak
To read more DBA articles, visit http://dba.fyicenter.com/article/

Database Maintenance Tips and Tricks: Part 1.

In the first of a series of Database Maintenance Tips and Tricks, Alex Kozak discusses some general issues surrounding disk space requirements for SQL Server databases and shows how to estimate Disk space usage and requirements. He gives tips on using DBCC CHECKDB consistency check and for estimating the space required for replication.

I do not know many people who actually enjoy routine database maintenance tasks. They can be quite monotonous and require more in the way of pedantry than creativity from the DBA. However, there can be no doubt that database maintenance is important. No matter how brilliant the application is, or how perfectly-designed the database, the system will very soon cease to function correctly without proper maintenance

While database maintenance may never be the most motivating activity that a DBA can undertake, I've found that, if you are prepared to make a real effort to learn SQL Server internals and to understand what is going behind the scenes, the task quickly becomes much more interesting, even intriguing.

In this series of the articles, I would like to discuss some SQL Server maintenance internals and, along the way, present a few tips, tricks and traps that DBAs should be aware of when performing database maintenance. In this particular instalment, I will cover:
* Some general issues surrounding disk space requirements
* Disk space and the DBCC CHECKDB consistency check
* Snapshot folder space and transactional replication


How much disk space do I need?
This question, or its close relation, "How fast will the database grow?" is one that customers ask very often. At first glance, it seems to be a quite straightforward, innocuous-looking question, but in reality one must consider many different factors in order to be able to answer it with any degree of accuracy.

Disk-space planning is just one element of the broader discipline of sizing and capacity planning. Sizing (also called pre-configuration capacity planning) takes place before you design the system and attempts to estimate the resources the system will need, over a period of time, in order to meet the conditions of the Service Level Agreement (SLA), or similar specification. At this stage, no statistics exist and so you need to create them, by modelling the system that needs to be designed, which can be quite expensive, or else find other ways to estimate the required resources.

Capacity planning deals with working applications. It uses existing statistics regarding resource consumption to predict future requirements, in order that the system will continue to meet the requirements of the SLA.

Note: A full discussion of sizing and capacity planning is beyond the scope of this article. For further information, the Microsoft Press book, "Microsoft SQL Server 2000 Performance Tuning: Technical Reference", contains several useful chapters on these topics.

When planning disk space requirements, many people start and end with a basic estimation of database size, and its growth over time. However, there are many other factors to consider that will affect your overall disk space requirements. We'll discuss each in turn in the following sections.

Database Size and Growth
Many people adopt a fairly simplistic approach to database sizing. For example, for a new database application, they may try to estimate the future size of the database based on:
1. The row length of each table.
2. The number of business transactions that the database will need to store over a given period of time

This approach may work fairly well for databases with relatively small number of the tables. However, the more variable-length columns in the tables, the less precise will be the result of these calculations.

Of course, when estimating the number of business transactions that may need to be stored in a given period, and the disk space required to accommodate this, you need to give due consideration to the basic nature of the business. For example, a sales application that sells grocery products is likely to need more disk space than one that sells furniture, because a single grocery transaction contains more items than a furniture transaction. Furthermore, there may be seasonal trends to consider. For example, when database sizing for a superstore that sells office supplies, you will need to account, in your calculations, for an abnormally high number of transactions at the end of August, just before school year starts.

You will also need to plan for the amount of space required by indexes, as the indexes can occupy the same, or even more, space than data.

Keep in mind that indexes and data do not fully utilize the index (data) pages.
The amount of free space in data pages depends on the number of columns in a table, and the data type of each of these columns.

For the index pages, the amount of free space also depends on the fill factor and pad parameters. Therefore, how full the page is depends on structure of the table and any change in the structure of the table will affect the size of your database, sometimes significantly.

Some modelling tools are available that can facilitate these sizing calculations, based on rows and indexes length. See, for example: The Basics of Sizing a SQL Server Database By Brian Knight, 2008/08/01 .

Disk Speed
While a disk system may be of sufficient size to cope with predicted requirements, it may not be able to handle the anticipated number of I/Os per second, generated by the workload.

In this case, you may need to add some extra disks in order to separate different database files physically, so that each will reside on its own disk or logical drive.

Alternatively, you may want to change the RAID configuration to one that is more suitable for your particular application.

Disk Utilization
According to the utilization rule of queuing theory, when utilization of resources reaches a certain point (the knee of the curve on the utilization graph), the queue of processes waiting for the resources, and so the response time of the system, starts to grow exponentially.

For disk space, and for the number of I/O operations per second, the knee of the curve on the utilization graph occurs at 80-85% capacity. As such, your database should not occupy more than 85% of total disk space; and the number of I/Os per sec. should not regularly exceed 85% of its maximum. These factors must be considered in your planning calculations.

RAID Configuration
Storage planning also depends on the type of RAID configuration in use. Most RAID configurations require some disk redundancy in order to make the disk system fault-tolerant.

Sizing TempDB and the Transaction Log
You also need to reserve some space for TempDB and the transaction log – and both can vary in size over quite a wide range. Allowing inadequate space for TempDB is a trap into which many unwary DBAs fall, as we will discuss shortly.

Database Maintenance
Finally, one more area that should not be overlooked is database maintenance. Again, as we will see shortly, a number of database maintenance activities themselves require extra disk space.

TempDB and Consistency Checks
When estimating how much disk space you need for the database, it is important to consider the sizing of TempDB, and the space it will need to handle temp tables, dynamic SQL usage, merge joins and so on. It is also important to remember that performing consistency checks, by running the DBCC CHECKDB command, will also require space in TempDB. Failure to plan for this can lead to difficulties.

For example, one day I received a call from a customer who had a problem with a one Terabyte SQL Server database and needed help. The first thing I suggested was that they run a DBCC CHECKDB health check.

The DBCC CHECKDB command had been running for seven hours when it detected a corruption in the database. However, the error message was not about the corruption: the consistency check had failed due to a lack of space for TempDB!

Unfortunately, this situation is quite typical. On the one hand, most DBAs understand the importance of TempDB. They know that the database engine and the programmers use TempDB very intensively and they follow the recommendations of SQL Server Books Online and provide enough room for TempDB growth. In many cases, they also remember to place TempDB on a fast disk drive (RAID), separately from data files.

On the other hand, it is in a human nature to treat "temporary" things as less important, and this is may be why DBAs or operational staff often put applications, backups, temporary files and so on, on the same drive as TempDB. This reduces the space available for TempDB growth and increases the total number of I/O operations on the drive where TempDB resides.

Full article...


Other Related Articles

... to read more DBA articles, visit http://dba.fyicenter.com/article/