Export (0) Print
Expand All

Parsing Non-Standard Text File Formats with the Script Component

New: 5 December 2005

When your source data is arranged in a non-standard format, you may find it more convenient to consolidate all your parsing logic in a single script than to chain together multiple Integration Services transformations to achieve the same result.

Example 1: Parsing Row-Delimited Records

Example 2: Splitting Parent and Child Records

ms345160.note(en-US,SQL.90).gifNote:
If you want to create a component that you can more easily reuse across multiple Data Flow tasks and multiple packages, consider using the code in this Script component sample as the starting point for a custom data flow component. For more information, see Developing a Custom Data Flow Component.

This example shows how to take a text file in which each column of data appears on a separate line and parse it into a destination table by using the Script component.

For more information about how to configure the Script component for use as a transformation in the data flow, see Creating a Synchronous Transformation with the Script Component and Creating an Asynchronous Transformation with the Script Component.

  1. Create and save a text file named rowdelimiteddata.txt that contains the following source data:

    FirstName: Nancy
    LastName: Davolio
    Title: Sales Representative
    City: Seattle
    StateProvince: WA
    
    FirstName: Andrew
    LastName: Fuller
    Title: Vice President, Sales
    City: Tacoma
    StateProvince: WA
    
    FirstName: Steven
    LastName: Buchanan
    Title: Sales Manager
    City: London
    StateProvince:
    
    
  2. Open Management Studio and connect to an instance of SQL Server 2005.

  3. Select a destination database, and open a new query window. In the query window, execute the following script to create the destination table:

    create table RowDelimitedData
    (
    FirstName varchar(32),
    LastName varchar(32),
    Title varchar(32),
    City varchar(32),
    StateProvince varchar(32)
    )
    
    
  4. Open BI Development Studio and create a new Integration Services package named ParseRowDelim.dtsx.

  5. Add a Flat File connection manager to the package, name it RowDelimitedData, and configure it to connect to the rowdelimiteddata.txt file that you created in a previous step.

  6. Add an OLE DB connection manager to the package and configure it to connect to the instance of SQL Server and the database in which you created the destination table.

  7. Add a Data Flow task to the package and click the Data Flow tab of SSIS Designer.

  8. Add a Flat File Source to the data flow and configure it to use the RowDelimitedData connection manager. On the Columns page of the Flat File Source Editor, select the single available external column.

  9. Add a Script Component to the data flow and configure it as a transformation. Connect the output of the Flat File Source to the Script Component.

  10. Double-click the Script component to display the Script Transformation Editor.

  11. On the Input Columns page of the Script Transformation Editor, select the single available input column.

  12. On the Inputs and Outputs page of the Script Transformation Editor, select Output 0 and set its SynchronousInputID to 0. Create 5 output columns, all of type string [DT_STR] with a length of 32:

    • FirstName
    • LastName
    • Title
    • City
    • StateProvince
  13. On the Script page of the Script Transformation Editor, click Design Script and enter the code shown in the ScriptMain class of the example. Close the script development environment and the Script Transformation Editor.

  14. Add a SQL Server Destination to the data flow. Configure it to use the OLE DB connection manager and the RowDelimitedData table. Connect the output of the Script Component to this destination.

  15. Run the package. After the package has finished, examine the records in the SQL Server destination table.

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        Dim columnName As String
        Dim columnValue As String

        ' Check for an empty row.
        If Row.Column0.Trim.Length > 0 Then
            columnName = Row.Column0.Substring(0, Row.Column0.IndexOf(":"))
            ' Check for an empty value after the colon.
            If Row.Column0.Substring(Row.Column0.IndexOf(":")).TrimEnd.Length > 1 Then
                ' Extract the column value from after the colon and space.
                columnValue = Row.Column0.Substring(Row.Column0.IndexOf(":") + 2)
                Select Case columnName
                    Case "FirstName"
                        ' The FirstName value indicates a new record.
                        Me.Output0Buffer.AddRow()
                        Me.Output0Buffer.FirstName = columnValue
                    Case "LastName"
                        Me.Output0Buffer.LastName = columnValue
                    Case "Title"
                        Me.Output0Buffer.Title = columnValue
                    Case "City"
                        Me.Output0Buffer.City = columnValue
                    Case "StateProvince"
                        Me.Output0Buffer.StateProvince = columnValue
                End Select
            End If
        End If

    End Sub

This example shows how to take a text file, in which a separator row precedes a parent record row that is followed by an indefinite number of child record rows, and parse it into properly normalized parent and child destination tables by using the Script component. This simple example could easily be adapted for source files that use more than one row or column for each parent and child record, as long as there is some way to identify the beginning and end of each record.

ms345160.Caution(en-US,SQL.90).gifCaution:
This sample is intended for demonstration purposes only. If you run the sample more than once, it inserts duplicate key values into the destination table.

For more information about how to configure the Script component for use as a transformation in the data flow, see Creating a Synchronous Transformation with the Script Component and Creating an Asynchronous Transformation with the Script Component.

  1. Create and save a text file named parentchilddata.txt that contains the following source data:

    **********
    PARENT 1 DATA
    child 1 data
    child 2 data
    child 3 data
    child 4 data
    **********
    PARENT 2 DATA
    child 5 data
    child 6 data
    child 7 data
    child 8 data
    **********
    
    
  2. Open SQL Server Management Studio and connect to an instance of SQL Server 2005.

  3. Select a destination database, and open a new query window. In the query window, execute the following script to create the destination tables:

    CREATE TABLE [dbo].[Parents](
    [ParentID] [int] NOT NULL,
    [ParentRecord] [varchar](32) NOT NULL,
     CONSTRAINT [PK_Parents] PRIMARY KEY CLUSTERED 
    ([ParentID] ASC)
    )
    GO
    CREATE TABLE [dbo].[Children](
    [ChildID] [int] NOT NULL,
    [ParentID] [int] NOT NULL,
    [ChildRecord] [varchar](32) NOT NULL,
     CONSTRAINT [PK_Children] PRIMARY KEY CLUSTERED 
    ([ChildID] ASC)
    )
    GO
    ALTER TABLE [dbo].[Children] ADD CONSTRAINT [FK_Children_Parents] FOREIGN KEY([ParentID])
    REFERENCES [dbo].[Parents] ([ParentID])
    
    
  4. Open Business Intelligence Development Studio and create a new Integration Services package named SplitParentChild.dtsx.

  5. Add a Flat File connection manager to the package, name it ParentChildData, and configure it to connect to the parentchilddata.txt file that you created in a previous step.

  6. Add an OLE DB connection manager to the package and configure it to connect to the instance of SQL Server and the database in which you created the destination tables.

  7. Add a Data Flow task to the package and click the Data Flow tab of SSIS Designer.

  8. Add a Flat File Source to the data flow and configure it to use the ParentChildData connection manager. On the Columns page of the Flat File Source Editor, select the single available external column.

  9. Add a Script Component to the data flow and configure it as a transformation. Connect the output of the Flat File Source to the Script Component.

  10. Double-click the Script component to display the Script Transformation Editor.

  11. On the Input Columns page of the Script Transformation Editor, select the single available input column.

  12. On the Inputs and Outputs page of the Script Transformation Editor, select Output 0, rename it to ParentRecords, and set its SynchronousInputID to 0. Create 2 output columns:

    • ParentID (the primary key), of type four-byte signed integer [DT_I4]
    • ParentRecord, of type string [DT_STR] with a length of 32.
  13. Create a second output and name it ChildRecords. The SynchronousInputID of the new output is already set to 0. Create 3 output columns:

    • ChildID (the primary key), of type four-byte signed integer [DT_I4]
    • ParentID (the foreign key), also of type four-byte signed integer [DT_I4]
    • ChildRecord, of type string [DT_STR] with a length of 50
  14. On the Script page of the Script Transformation Editor, click Design Script. In the ScriptMain class, enter the code shown in the example. Close the script development environment and the Script Transformation Editor.

  15. Add a SQL Server Destination to the data flow. Connect the ParentRecords output of the Script Component to this destination.Configure it to use the OLE DB connection manager and the Parents table.

  16. Add another SQL Server Destination to the data flow. Connect the ChildRecords output of the Script Component to this destination. Configure it to use the OLE DB connection manager and the Children table.

  17. Run the package. After the package has finished, examine the parent and child records in the two SQL Server destination tables.

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        Static nextRowIsParent As Boolean = False
        Static parentCounter As Integer = 0
        Static childCounter As Integer = 0

        ' If current row starts with separator characters,
        '  then following row contains new parent record.
        If Row.Column0.StartsWith("***") Then
            nextRowIsParent = True
        Else
            If nextRowIsParent Then
                ' Current row contains parent record.
                parentCounter += 1
                Me.ParentRecordsBuffer.AddRow()
                Me.ParentRecordsBuffer.ParentID = parentCounter
                Me.ParentRecordsBuffer.ParentRecord = Row.Column0
                nextRowIsParent = False
            Else
                ' Current row contains child record.
                childCounter += 1
                Me.ChildRecordsBuffer.AddRow()
                Me.ChildRecordsBuffer.ChildID = childCounter
                Me.ChildRecordsBuffer.ParentID = parentCounter
                Me.ChildRecordsBuffer.ChildRecord = Row.Column0
            End If
        End If

    End Sub

Release History

17 July 2006

Changed content:
  • Noted that duplicate keys are created if the sample script is run more than once.

14 April 2006

Changed content:
  • Improved inline comments in the code samples.

Community Additions

ADD
Show:
© 2015 Microsoft