DBA > Articles

Importing Poorly Formatted Text Files using SSIS

By: Dinesh Asanka
To read more DBA articles, visit http://dba.fyicenter.com/article/

Handling text files from SQL Server Integration Service (SSIS) is not new and I am not going to discuss about importing well formatted text files. There are many occasions when database developers must import text files which are not properly formatted. I will discuss three methods of doing this. For these situations we are going to use Script Component data flow task.

Knowledge-wise you need to have basic understanding of SSIS packages. If you have created SSIS package to import traditional text files that will be more than enough. We are going to do some .NET scripting, you need to have some understanding of .NET coding.

Resource-wise you need to have SQL Server Business Intelligence Development Studio installed in your computer.

Case 1: Importing Text files with different row delimiters in different rows

I am sure that you have imported text files. The text files normally have several columns which are separated by either comma (,), semi colon (;) or colon (:) etc. However, what would happen when there are different column delimiters. For example what if you have data in a text file like the following.

P0001,Product 1
P0002,Product 2
P0003,Product 3
P0004;Product 4
P0005,Product 5
P0006,Product 6
P0007;Product 7
P0008,Product 8
P0009;Product 9
P00010,Product 10

You can see that above two columns are separated either by comma or semi-colon. Let us see how can we create a package to import this data. I will explain how to write the first package in detail.

After creating a package project and new package in the project, create a connection manager named Text File. In the General section of the connection manager, you need to give the path for the text file.

Next you need to select columns options. Select row delimiter as {CR}{LF}.

After selecting the row delimiter, you need to set the column delimiter to something other than {CR}{LF}. Otherwise it won't allow you to confirm this dialogue. However, this does not really matter as this option will be disabled later. After doing this you can see that entire record of the text file as shown in one column.

Next select advanced section. In that you have to give enough length to OutputColumnWidth. In this case it is 250.Then drag and drop a data flow task to control flow. Inside the added data flow, drag and drop Flat File Source. As you have only one text file connection manager by default - Text File will be attached to this Flat File Source. If not you have to assign correct text file connection from the connection option of the flat file source.

Next is adding most import component the script component. Just drag and drop a script task to the data flow area. Just after dragging the script component you will get the following screen.It is obvious that we are going to do a Transformation which is the last option. I will discuss about other two options in a separate article.

Next task is to configure the Script component. We have three options to configure - Input Columns, Input and Outputs and Script. If you select Input option you see the following screen. In this screen you can leave the Output Alias as it is Column 0 by default. But for the completeness I have changed this to Line.

Next option is , Inputs and Outputs. Here we have to define the input and output column properties.

As you can see on the input columns tree node, there is only one element which is Line. Line was defined in the Input Columns.

Full article...

Other Related Articles

... to read more DBA articles, visit http://dba.fyicenter.com/article/