In this article I introduce a simple process that saves tables size information at points in time which latter allows for tracking down table growth over time.
The process is useful in monitoring data growth over time and lets you see what tables are growing rapidly in what databases and can also help in estimating future growth and disk space requirements. In addition it can help prevent unexpectedly running out of disk space, or if you are already in that situation then quickly find the table(s) that for some reason have grown more than excepted and are the source to the problem.
In addition, I also show a stored procedure called sp_DataFiles that returns a report of disk space usage information at the data file level.
Run these three scripts:
The process is built on top of a stored procedure called sp_TableSize which returns table level information such as row count and size.
Periodically, typically by a weekly SQL Server Agent job, the procedure will be executed against all databases using an INSERT...EXEC statement that saves the results into a database table.
Having done that we actually freeze and document size related information at the table level which can now be analyzed and so the next and final step would be to retrieve the data.
There are two main scenarios in which you can benefit from the process;
One scenario is simply viewing the report knowing what tables grow in what size and that rate at which they grow.
The second scenario would be to execute the DataGrowth stored procedure while passing the @Delta_MB input parameter a value that is not expected to be reached so that it always returns an empty result set. Now, when the day comes and this procedure returns data you know there was a growth in data exceeding your predefined threshold and can trigger a notification email.
Summary of scripts used:
TableSizeHistory.sql – Creates the PerfDB database and the TableSizeHistory table
DataGrowth.sql – Analyzes and retrieves the captured data
ExecuteDatabasesCommand.sql – A wrapper over a database cursor
sp_DataFiles.sql – Returns size information at the data file level for all databases
sp_TableSize.sql – Returns table level information for the current database
Download all the scripts
sp_TableSize is a stored procedure created in the master database and marked as a system stored procedure. The procedure works at the table level in the context of the current database, that is the database where it is executed from and returns one row per table with the information of the row count and disk space usage in units of MB sorted by disk space usage in a descending order.
The procedure is handy and easy to use with no input parameters required and the information returned is based on system meta data, meaning you do not actually access any user object so there is no effect (no shared locks and no IO) on user objects as opposed to issuing a COUNT(*) query.
Bare in mind that due to the above fact there may be a slight (minor) inaccuracy with the row count figure but that is not common and results from the way SQL Server's Storage Engine keeps track of data that was deleted and typically gets aligned with the accurate figure following a clustered index rebuild.
Part of the code in sp_TableSize is combined in my sp_helpindex2 stored procedure.
This stored procedure analyzes and retrieves the data that was saved to the TableSizeHistory table using the sp_TableSize stored procedure and returns the disk space growth in units of MB per each table.
The code identifies the first and last occurrences of each table in the requested time period as defined by the @StartTime and @EndTime input parameters and calculates the differences between the two occurrences so that the returned result set is the delta between these two values.
You can control the tables returned by the @Delta_MB input parameter so that you get to see only tables of interest, i.e. tables that grew more than a certain size.
The Delta_reserved_MB column is the sum of the Delta_data_MB + Delta_index_size_MB and this is the actual disk size the table uses. Don’t let the name confuse you, I only followed Microsoft’s terminology used in their sp_spaceused system stored procedure.
The @Database parameter has a default value of 1 and returns a second result set with information at the database level.
This procedure is actually a wrapper over a cursor that uses the database name from sys.databases DMV.
I created this procedure in order to reuse my code so that every time I need a cursor over databases I use it instead of recoding that cursor part. i.e. when I run an index rebuild process that uses my sp_Reindex procedure, DBCC commands for integrity checks, attach, detach etc.
This stored procedure is not related to the data growth process but is a very useful disk space related tool.
sp_DataFiles works at the database level and returns one row per each data file on the SQL Server instance.
The procedure is actually built on top of the disk usage report query issued by SSMS (SQL Server Management Studio) which I captured in Profiler a couple of years a go and adjusted a little for my needs. Being able to get that report by executing a procedure is far more convenience and efficient than going through the entire process required via the UI until finally receiving that desired report and also, the SSMS report works on a specific database and when you need to get a more wider picture of how your disk space is consumed by your databases it is not enough.
The procedure makes use of the undocumented DBCC command SHOWFILESTATS which returns the total extents and the used extents information per data file and joins that information with the sys.data_spaces DMV (Dynamic Management View).
All the disk space figures are calculated and derived from the extent information returned by the DBCC command while the DMV only contributes the file group name to the returned results.
DBCC SHOWFILESTATS returns one row per each data file in the database and sys.data_spaces returns one row per each data space (which can actually be a filegroup, partition scheme, or FILESTREAM data filegroup).
The procedure also returns the space reserved and space used information which is very useful. When these two figures are far from each other (i.e. a data file that stores 10GB of data but reserves 40GB) there is space that can be reclaimed to the operating system if needed.
On the other hand when these two figures are close to each other it points out that the file Auto Grow option is kicking in and that of course is something we usually want to avoid.