SQLXML Bulk Loader Basics
By: Adam Aspin
SQLXML isn't exactly new technology, but like the even more venerable BCP, it remains the quickest and most reliable way of heaving large quantities of data into SQL Server databases. SQLXML is very versatile, and once set up is wonderfully reliable ETL system, but isn't trivial to learn. Adam Aspin comes to the rescue with a simple guide.
At some point in their careers, most SQL Server developers will have faced the challenge of getting the data from an XML source file into SQL Server. Despite the ubiquity of the CSV format, and the rising star of JSON, a vast quantity of data is still exchanged as XML files.
For bulk loading XML data, the standard tools in the Extract-Transform-Load (ETL) toolkit tend to be the venerable OPENXML, or slightly more modern OPENROWSET/XQUERY, for the T-SQL purist, or for SSIS devotees the XML task.
Any of these solutions work well enough for small-to-medium sized files and even quite complex XML structures, but they often don't scale to larger loads. Anything over 2GB will crash OPENXML or SSIS in my experience. In some cases, even if they can load a large file, the time taken could prove a threat to your SLAs.
In these circumstances, SQLXML Bulk Loader is your friend. It can import extremely large XML source data files into SQL Server at amazing speed, while preserving referential integrity, if required. Unfortunately, it's not installed with SQL Server, by default, and it's poorly explained in the SQL Server documentation. As a result, most developers either ignore it completely, or implement it poorly and it is unfairly perceived as "hard to use".
This article attempts to put the record straight. It will describe briefly how to find and install SQLXML Bulk Loader and then how to use it to load three "classic" XML file structures:
A simple XML format
Multiple tables stored as XML
A nested, moderately complex XML structure that maps to a relational table structure
To work through the examples, you'll need to download the sample XML files from the "article rating" box at the top of this article. I've provided the kind of "semi-structured" source files that SQLXML Bulk Loader can digest with ease. You'll also need to create a database in which to practice. In the examples, I've assumed you've created a directory called C:\SQLXMLArticle for the files, and created a database called CarSales_Staging. All techniques described will work with all versions of the database from 2005 through 2014.
In a subsequent article, I will explain some of the more advanced features of SQLXML Bulk Loader, from handling attribute-centric XML to optimizing the process for bulk loads via validating the XML and integrating SQLXML Bulk Loader with SSIS, and other tricks and techniques to help you get the most out of this tool.
SQLXML Bulk Loader Use Cases
SQLXML Bulk Loader is best used in the following situations:
When the source XML file is large. What is large? Well, if you were planning on using OPENXML, then that means any file over 2 gigabytes. The same upper limit is true of XML loaded into a variable and shredded using XQuery. With SSIS, it depends on the memory available, but I have experienced practical difficulties way below the 2 GB limit. In contrast, I have loaded XML files of tens of gigabytes using SQLXML Bulk Load.
When the XML source structure is relatively simple. In reality, this means a file consisting of tables and fields where the nested XML structure is not too complex. The source data cannot be too intricate, in XML terms, or it will not load. It is not for nothing that this is called "semi-structured" XML data.
When you wish to load multiple tables from the same source file.
When speed loading the data is important. In my tests, SQLXML Bulk Load loaded data at about 90 percent of native BCP speeds, for separate tables without relational links.
Be aware from the start that SQLXML Bulk Loader can only handle a certain level of complexity in the source file. Should you be faced with a file that SQLXML Bulk Loader refuses to load, your only solution will be one of the following:
Negotiate a simpler structure from the data supplier Use one of the other available technologies mentioned above Use XSLT to simplify or 'flatten' the file structure, to produce a file amenable to processing by SQLXML. XML transformation is a vast and completely separate topic and one that I will not be explaining it in this article
Preparing SQLXML Bulk Loader
SQLXML 4.0 was installed by default as far as SQL Server 2005 but from SQL Server 2008 onwards, it has to be installed as part of the Feature Pack, for SQL Server 2008, or downloaded separately. You can obtain version 4.0 SP1 at the following URL:
SQLXML Bulk Loader exists in both 32- and 64-bit versions (there is even a 64-bit IA version, but I expect that few people will be needing this). The installation is simple and I will not waste your time explaining it here. Once complete, you will be ready to start using SQLXML Bulk Loader to load XML files into SQL Server.
Importing an XML File into SQL Server
SQLXML Bulk Loader is a COM (Component Object Model) object that allows you to load semi-structured XML data into SQL Server tables. To make it work, you will need the following:
An XML source file.
A destination table in a SQL Server database.
An XSD schema file that maps the XML to the destination table – this is the interesting part of the process
A Visual Basic Script file to execute the XML load process.
Listing 1 shows the XML source, which in this simple example will be used to "bulk import" 2 rows in a SQL Server table. Throughout the article, the code listing caption provides the name of the relevant file in the code download bundle.
Other Related Articles
... to read more DBA articles, visit http://dba.fyicenter.com/article/