Data warehousing is again becoming a well-discussed hot topic in the mainframe world. Warehousing has always been popular on System z, though it seems few want to talk about it. With all the “green” talk about System z and the rising cost of power to run large server farms (assuming you can even get additional power where your data center is located), warehousing on the mainframe is becoming even more popular.
It’s not just hardware that’s intriguing; it’s everything that runs on that hardware. Having an Integrated Facility for Linux (IFL) lets you run a copy of Linux for System z stand-alone or multiple copies of Linux under z/VM. Linux for z opens the doors to numerous product offerings. You can easily have your warehouse solution, Business Intelligence (BI) solution, and all your System z solutions sharing a single, common platform.
At the heart of any data warehouse, before you can even consider running a BI product, you need a Database Management System (DBMS). Fortunately, System z provides DB2 for z/OS. In the beginning of data warehousing, DB2 was at the center of any decision support implementation. Throughout DB2’s long history, it has always managed product enhancements that championed decision support. Although many customers continue to use DB2 for z/OS today for their warehouses, some did drift away. In an answer to the changing database landscape in today’s challenging warehousing world, IBM has delivered significant DB2 for z/OS enhancements that provide direct support for data warehousing and BI. The last few DB2 releases have been rich with capabilities to make your warehouse and BI experience better.
This article focuses on one specific, significant set of DB2 improvements that address performance. Everyone likes to see their performance improve, and with DB2 9, you won’t be disappointed. Even though this article will focus on how these performance changes could help your warehousing and BI environments, you’ll find these features will significantly help anyone using DB2 for z/OS.
What will DB2 9 for z/OS do for you? Consider the features that will help yield a positive performance impact on your data warehouse project once you upgrade to DB2 9:
Randomized index key: You want your keys spread throughout the index with no hotspots, a task sometimes easier to explain than accomplish. Sometimes stuff just all ends up in an incorrect place, no matter how hard you try to pick a key that will spread the data around. Enter DB2 9 for z/OS with a new option for the CREATE and ALTER INDEX SQL statements. In the past, you could specify ascending for forward index scans or descending for backward index scans on each column in an index key. In DB2 9, you have the additional choice of specifying RANDOM for an index column. This will cause the index entries to be put in random order by that column. Randomly inserting keys could reduce the chances of contention caused by ascending index inserts or index hot spots. Indexes created with the RANDOM option don’t support range scans.
There are instances when you can’t use RANDOM:
• If the key column is VARCHAR and the index uses the NOT PADDED option
• The index was created with the GENERATE KEY USING clause
• The index is part of the partitioning key.
RANDOM clause on an index is available only after you’ve upgraded your DB2 9 subsystem to New Function Mode (NFM).
Index on expression: “That’s a stage 2 predicate” is a cry that can strike fear into the heart and mind of an application programmer. Sometimes a stage 2 predicate isn’t used by choice but rather necessity. What can you do if a column contains both uppercase and lowercase characters and there’s no way of predicting which you might find? What if you really wanted only the stuff from the year 2005, yet all the data is stored with full dates? What if you have to perform a calculation on two columns, for example, as part of the predicate? Figure 1 shows examples of how an expression could be used to solve the query.
However, as we all know from the predicate types and processing table (Table 34) in the DB2 9 Performance Monitoring and Tuning Guide (SC18- 9851), or the same table (Table 124) in the DB2 V8 Administration Guide (SC18-7413), expressions used in predicates are often nothing more than a performance problem waiting to happen. The following is a list of expressions that might, but not always, result in a stage 2 predicate:
• “COL BETWEEN expression1 AND expression2”
• “T1.COL = T2 col expr”
• “T1.COL <> T2 col expr”
• “expression = value”
• “expression <> value”
• “expression op value”
• “expression op (subq)”.
DB2 9 has changed the rules. A note appeared after each of the aforementioned predicates, stating that it may now be indexable by an index on expression. Just like sargable and non-sargable vanished from the DB2 documentation years ago, so, too, does one more stage 1, and in this case, indexable, restriction. This is great news for programmers. The two previous examples could be considered indexable upon creation of the indexes in Figure 2.
A word of warning applies if you decide to use the functions UPPER or LOWER. When using an index on either function, it won’t work without locale. For more information on how to use locale, see the z/OS XL C/C++ Programming Guide (SC09-4765). There’s an open APAR (PK68295) that will remove the locale requirement when specifying the built-in scalar functions UPPER or LOWER.
Index on expression could be quite beneficial when working with a data warehouse. There are many situations where warehouse queries need to use expressions as the only way to solve a business problem when running a query.
You must follow a few rules to take advantage of this DB2 9 NFM feature. First, the expression must contain a referenced column from the table on which the index is being created. However, referenced columns and the result columns can’t be Large Object (LOB), XML, or Decimal Floating-Point (DECFLOAT) data types. There’s also a list of rules to follow if you want to use this feature. The expression can’t contain:
• A subquery
• An aggregate function
• A not deterministic function
• A function that has an external action
• A user-defined function
• A sequence reference
• A host variable
• A parameter marker
• A special register
• A CASE expression
• An OLAP specification.
Always check the DB2 SQL Reference (SC18-9854) for the correct syntax and additional coding restrictions. If this is a feature you think you can benefit from, keep an eye open for maintenance. Recently, APAR PK74378 extended index on expression to casting data types TIMESTAMP, DATE, and TIME.
Nothing is ever free and index on expression is no exception. If you take advantage of this feature, you could incur extra cost for SQL INSERTs and UPDATEs that affect the index. You also could see the cost increase for running the LOAD, REBUILD INDEX, CHECK INDEX, and REORG TABLESPACE. However, you shouldn’t see additional cost for running a REORG INDEX because expressions are evaluated at INSERT or REBUILD INDEX time.
Index on expression is available only after you’ve upgraded your DB2 9 subsystem to NFM.
Index usage tracking: For years, customers asked for an easy way to determine if an index is no longer being used. If they knew there was no access against a particular index, it would make it much easier to delete the index, therefore cleaning up the DB2 catalog, and maybe avoiding some unnecessary updates and deletes to that index. There’s now a way to accomplish this. This can be especially important for a data warehouse that frequently refreshes its tables. By removing unused indexes, DB2 avoids updating those indexes during INSERT or LOAD processing.
DB2 9 for z/OS added the column LASTUSED to the Real-Time Statistics (RTS) table SYSIBM. SYSINDEXSPACESTATS. LASTUSED contains the date when the index, described by this row in the catalog table, was last used for a SELECT, FETCH, searched UPDATE, searched DELETE, or used to enforce a referential integrity constraint. This column isn’t updated when the index is changed because of an INSERT or UPDATE. LASTUSED defaults to a value of 1/1/0001. If you find an index with the default date, that index has never been used since moving to DB2 9.
You should wait a while after upgrading to DB2 9 before taking advantage of this feature to make sure the column has the opportunity to be updated.
Also, there’s an APAR, a somewhat older fix (hiper), APAR PK44579 (maintain SYSIBM.SYSINDEXSPACESTATS. LASTUSED for all cases) that you need to apply if you plan to take advantage of this column.
The tables for RTS are now part of the DB2 catalog.
Index usage tracking is available only after you’ve upgraded your DB2 9 subsystem to NFM.
Index compression: One of the more popular solutions to a query performance dilemma is an index. Adding an index can fix many SQL issues. However, adding an index has a cost; the additional index will consume additional disk space. You can try to decide which is the lesser of two evils—the disk space consumption or the poorly running SQL statement. Moreover, DB2 9 now supports some powerful index enhancements. You may suddenly find yourself using even more disk storage for indexes in DB2 9 than you ever thought possible. Of course, DB2 9 does come with a near-perfect solution: index compression.
Even though some query types used in a data warehouse environment can significantly benefit from the addition of an index, it’s possible for a warehouse to reach a point where the indexes have storage requirements that are equal to, if not sometimes greater than, your table data storage requirements.
Index compression in DB2 9 is a possible solution to an index disk storage issue. Early testing and information obtained from first implementers indicate that significant disk savings can be achieved by using index compression. With index compression being measured as high as 75 percent, you can expect to achieve, on average, about a 50 percent index compression rate. Of course, compression carries some cost. In certain test cases, there was a slight decrease in class 1 CPU time accessing a compressed index, but you can expect your total CPU time, class 1 and class 2 SRB CPU time combined, to increase. However, it’s important to remember that a CPU cost is realized only during an index page I/O. Once the index has been decompressed into a buffer pool or compressed during the write to disk, compression adds zero cost.
Unlike data compression, with index compression there’s no performance advantage directly from the use of compression. Index compression is strictly for reducing your index disk storage. If any performance gain occurs, it will show up when the optimizer can take advantage of one of the additional indexes that now exist—an index that may never have been created because of disk space constraints prior to the introduction of index compression.
Index compression is available only after you’ve upgraded your DB2 9 subsystem to NFM.
APPEND clause: A data warehouse can have a tremendous amount of SQL INSERT and online LOAD utility processing. Normally, DB2 makes every attempt to place a row in the correct place in a table. To accomplish this, it must perform a search. The more it has to search, the more expensive the INSERT or LOAD process becomes. Some warehouses may refresh one or more tables using INSERT or LOAD on a nightly basis—all expensive, time-consuming operations.
DB2 9 provides a solution. DB2 has added a new APPEND clause to the CREATE and ALTER TABLE SQL statements. If APPEND is set to YES, DB2 places a new row at the end of the table, ignoring any forms of clustering defined to that table. The end of a table can be defined several ways, depending on the type of object being accessed. For example:
• For a segmented table space, it’s placed at the end of the table.
• For partitioned table spaces and partition by range universal table spaces, the row is placed at the end of the appropriate partition.
• For partition by range universal table spaces, the row is placed in any partition with space at the end.
The APPEND clause can’t be used with tables created in LOB, XML, and workfile table spaces; it also is ignored by the REORG utility so the REORG can be used to maintain cluster order if that order is necessary.
The APPEND clause is available only after you’ve upgraded your DB2 9 subsystem to NFM.
Data-partitioned secondary index: A data warehouse usually means large amounts of data. That translates to partitioned table spaces, and partitioned table spaces often call for Data- Partitioned Secondary Indexes (DPSIs). DB2 9 has made a few improvements to DPSIs that should make them more usable and popular. For example, a DPSI can now get index-only access and a DPSI index can be defined as unique when columns in the DPSI key are a superset of partitioning columns. A few more subtle DPSI improvements are enhanced page range screening to help avoid hitting every partition, more parallelism, and more index look-aside.
Remember the lessons learned in DB2 Version 8 if you use DPSIs. Always include a local predicate on the leading column(s) of the DPSI index to avoid scanning the entire index structure. Also, remember that when using a DPSI, you will see an increase in the number of VSAM data sets that DB2 will need to use. Make sure you appropriately adjust your DSMAX value.
All these DPSI index improvements are available in Conversion Mode (CM).
Prefetch improvements: In a data warehousing environment, some queries will use prefetch; it’s just normal. As mentioned earlier, there are situations where a stage 2 predicate is your only option; stage 2 means prefetch. The good news for all of those stage 2 warehouse queries is that prefetch has improved in DB2 9. Prefetch quantity has increased. When a pool size becomes larger than 40,000 pages, a larger prefetch quantity is now used. If the buffer pool is greater than 40,000 and less than or equal to 80,000 pages, the prefetch quantity is 64 pages for SQL processing and 64 pages for utilities. The utility page quantity jumps to 128 pages when the VPSIZE * VPSEQT is greater than 80,000 pages. Of course, this description of prefetch quantities is based on 4K page size. These numbers must be adjusted for other page sizes.
When DB2 chooses to use dynamic prefetch is also a significant change made in DB2 9. Dynamic prefetch is used for index scans and table access via table scans. When dynamic prefetch is picked at bind time, it can switch between sequential prefetch and index access based on the data pages accessed using something called sequential detection. In addition, dynamic prefetch doesn’t use a triggering page and can scan forward or backward, something that is handy for a backward index scan. Finally, dynamic prefetch engines also can run in parallel. A few utilities available via APAR PK44026 also are recent converts to dynamic prefetch. For example, dynamic prefetch is enabled for:
• UNLOAD phase of REORG INDEX
• The UNLOAD and BUILD phases of REORG TABLESPACE PART
• The BUILD phase of LOAD REPLACE PART
• The RUNSTATS phase of RUNSTATS INDEX.
The changes to dynamic prefetch in DB2 9 apply to single table access, multitable join, outer join, subquery, and union. Sequential prefetch, as of DB2 9, is used only for table space scans.
Dynamic prefetch could have all kinds of performance advantages over sequential prefetch because it isn’t dependent on the optimizer making the correct decision the first time and allows the access path to be changed on the fly as data patterns change.
The prefetch changes are available in DB2 9 CM.
Clone tables: Clone table support in DB2 9 could be an excellent way to minimize an outage caused by performing a total refresh on a data warehouse table. Let’s say warehouse table X is completely replaced every evening for use the following day. Let’s also assume that access to other tables in the warehouse depends on being able to get to the data in table X. In the time it takes each evening to replace (refresh) table X, none of the tables dependent on it are available. Clone table support could be the solution.
With clone table support, a matching table, or a clone, is generated. The clone table structure is identical in every way to the original base table; it’s even created in the same table space. This table clone can be populated with new data while the original base table is still being accessed. Both SQL INSERTs and the LOAD utility can be used to get data into the table clone. Once the clone has been populated with the appropriate data, the SQL EXCHANGE statement can be executed to switch which table DB2 accesses. Prior to issuing the EXCHANGE, DB2 uses the original base table. After issuing the EXCHANGE, DB2 is referencing what was the clone table. The entire switch is extremely fast and completely transparent to applications and users. DB2 tracks which is the clone and which is the base table by manipulating the special naming convention in the underlying VSAM data sets.
As DB2 is using what was the clone as the new base table, what used to be the base table before the EXCHANGE is now the clone and is eligible to be loaded with data for the next table space flip. This is similar to the concept of an online LOAD REPLACE.
Although clone tables aren’t really a performance option, they can improve your availability. You can’t work on the performance of anything if you can’t get to the object.
Clone tables are available only after you’ve upgraded your DB2 9 subsystem to NFM. In addition, you can clone only base tables created in universal table spaces.
Reordered Row Format (RRF) is one of those significant changes that most wouldn’t even know about if not for a couple of user group presentations. This change affects how a table’s variable length columns are ordered when written to disk. A data warehouse, by its nature, can have lots of variable length columns.
A challenge that has existed for DB2 professionals since DB2’s beginnings is placement of the fixed length (CHAR) columns vs. the placement of variable length (VARCHAR) columns in a newly created table definition. There have been conflicting views on what’s correct, efficient, and the best coding practice. Most have agreed that the VARCHAR columns should go at the end of the row or at least infrequently updated variable length columns should go last. However, no matter what order you use to define a table’s columns, once DB2 hits a VARCHAR column when retrieving a row, DB2 must calculate the starting position of subsequent columns in that row. Until DB2 reads the length of the VARCHAR column (the two-byte prefix on every VARCAHR column), it has no idea where to find the beginning of the next column that needs to be retrieved.
DB2 9 provides a potential solution to this quandary. Once you get to DB2 9 for z/OS NFM, all VARCHAR columns on newly created table spaces are placed at the end of the row. DB2 9 can make this type of decision because the row format in DB2 9 has changed. DB2 9 introduces Reordered Row Format (RRF), a straightforward concept in which a DB2 9 NFM row will have all the fixed length columns first, followed by pointers to the beginning of each VARCHAR row. The pointers are followed by the actual variable length data. Rather than scan what could be some lengthy columns just to find the beginning of the column you’re looking for, DB2 only needs to scan the list of pointers to find the location for the beginning of the column you want.
You’ll get RRF for any table space created in DB2 9 NFM; additionally, any table spaces, or table space partitions, will be converted to RRF when a REORG or LOAD REPLACE is run against that table space or table space partition. Be cautious when working with partitioned table spaces. If you only REORG selected partitions of a table space, then you’ll end up with some partitions, the ones that have been reorganized, in the new RRF. The remaining partitions that haven’t yet been reorganized will stay in basic row format. Basic row format is the phrase used to describes the row format prior to DB2 9 for z/OS NFM.