Over the past few years, XML has emerged as the new standard for data transmission, and its use is becoming more prevalent as companies continue to adopt XML-based solutions. As more organizations begin to enforce XML standards for all data transmission, increasingly complex XML formats are emerging. These complex formats can include multiple namespaces, thousands of elements, and recursive definitions. As the XML documents produced from these formats grow in size and complexity, managing this content has become increasingly challenging, with limited information available on how to address this challenge.
In this article, you will learn how to use the XML DB feature in Oracle Database 11g to manage complex XML content as well as its advantages over commercial ETL products. You will see an example of a complex XML schema that demonstrates the following:
* Registration of a complex XML schema
* Insertion of XML files into the database
* Retrieval of XML data via relational query
* In-Place Evolution for XML schema modifications
Furthermore, you will get an overview of strategies for maximizing the performance and throughput of Oracle XML DB solutions and practical applications of complex XML formats.
Oracle XML DB Background
Oracle XML DB is a feature of Oracle Database that provides a powerful tool for managing XML content, including storage, manipulation, and retrieval. It offers different storage options to meet the unique requirements of different XML formats. These options include unstructured, binary, and structured storage:
* Unstructured (character large object, or CLOB). By treating the document as one large object and storing it in the database, this method allows for the best insertion times. However, this storage method also consumes the most space and has the worst performance for relational access to the data. This is an impractical solution for managing large, complex XML documents if relational access is required. Unstructured storage can be a practical solution if disk space is not an issue and the objective is to archive the documents in their original format.
* Binary storage. This option, new in Oracle Database 11g, stores data in a postparse binary format designed specifically for XML data. This option has several advantages over unstructured storage, in that it is XML-schema-aware, allowing better disk space efficiency and query performance. Although this option offers incredible performance compared to that of unstructured storage, it does not have the same query performance as structured storage. Binary storage is a good option whenever its performance for relational access is acceptable. Because this storage option is easy to use, it is worthwhile to evaluate it prior to opting for structured storage.
* Structured storage. Also known as schema-based storage, this option uses an object-relational model to store XML documents in the database. This storage option is the most efficient in terms of disk space and relational access. It also has the highest overhead during file insertion and requires additional preparation for schema registration. Structured storage is the best option when optimal relational access is a requirement. For handling very complex and large files with a requirement for efficient relational access, this storage option is usually the best choice.
The perception of the size and complexity of an XML document can differ greatly, depending on the organization. On one hand, for online transaction processing (OLTP) databases using XML for their electronic data interchange (EDI) or other transactional data exchange, a file with several thousand lines might be regarded as a very large file. On the other hand, a multiterabyte data warehouse might regularly process XML documents measured in gigabytes and not consider a file to be large unless it contains millions of lines. The same concept holds for the perceived complexity of an XML document.
For purposes of this article, a document is considered "complex" if it has the following properties:
* It is single-rooted, with multiple namespaces.
* It has flexible XML definitions, allowing for great variations while maintaining validity.
* It has recursive or circular/cyclical references.
* It has nonstatic XML schemas.
In this article, XML documents are considered "large" if they are single-rooted and are greater than 20MB. These properties introduce certain scalability and management considerations that must be addressed for a robust enterprise solution.
There are no golden rules for choosing the best storage option. Based on the file structure, performance objectives, available resources, and expected volume of data, the best option will vary. If you cannot decide which storage option is the best for your particular requirement, it is worthwhile to try the different formats and determine which is optimal for your specific needs. Generally speaking, if you are dealing with large documents and require relational access, unstructured storage is not acceptable from a performance or resource perspective. Binary XML may be the optimal solution if the query performance is acceptable for the business use or if the business demands the option with the least maintenance time. However, if relational access is the primary objective and users need fast access to any data contained in the XML document, structured storage is most likely the best option.