14 March 2013

Creation of Basic SSIS package

Step-1 Creating a New Integration Services Project
-------  ----------------------------------------------------

Integration Services includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, and sending e-mail messages; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, the Integration Services service for administering package execution and storage; and application programming interfaces (APIs) for programming the Integration Services object model.

In this tutorial, you will learn how to use SSIS Designer to create a simple Microsoft SQL Server Integration Services package. The package that you create takes data from a flat file, reformats the data, and then inserts the reformatted data into a fact table. In following lessons, the package will be expanded to demonstrate looping, package configurations, logging and error flow.

When you install the sample data that the tutorial uses, you also install the completed versions of the packages that you will create in each lesson of the tutorial. By using the completed packages, you can skip ahead and begin the tutorial at a later lesson if you like. If this is your first time working with packages or the new development environment, we recommend that you begin with Lesson1.
 
Understanding the Requirements:

Before creating a package, you need a good understanding of the formatting used in both the source data and the destination. Once you understand both of these data formats, you will be ready to define the transformations necessary to map the source data to the destination.

Looking at the Source

For this tutorial, the source data is a set of historical currency data contained in the flat file, SampleCurrencyData.txt. The source data has the following four columns: the average rate of the currency, a currency key, a date key, and the end-of-day rate.

Here is an example of the source data contained in the SampleCurrencyData.txt file:

1.00010001   ARS   9/3/2001 0:00    0.99960016

1.00010001   ARS   9/4/2001 0:00    1.001001001

1.00020004   ARS   9/5/2001 0:00    0.99990001

1.00020004   ARS   9/6/2001 0:00    1.00040016

1.00050025   ARS   9/7/2001 0:00    0.99990001

1.00050025   ARS   9/8/2001 0:00    1.001001001

1.00050025   ARS   9/9/2001 0:00    1

1.00010001   ARS   9/10/2001 0:00   1.00040016

1.00020004   ARS   9/11/2001 0:00   0.99990001

1.00020004   ARS   9/12/2001 0:00   1.001101211

When working with flat file source data, it is important to understand how the Flat File connection manager interprets the flat file data. If the flat file source is Unicode, the Flat File connection manager defines all columns as [DT_WSTR] with a default column width of 50. If the flat file source is ANSI-encoded, the columns are defined as [DT_STR] with a column width of 50. You will probably have to change these defaults to make the string column types more appropriate for your data. To do this, you will need to look at the data type of the destination where the data will be written to and then choose the correct type within the Flat File connection manager.

Looking at the Destination

The ultimate destination for the source data is the FactCurrencyRate fact table in AdventureWorksDW. The FactCurrencyRate fact table has four columns, and has relationships to two dimension tables, as shown in the following table.

Column Name
Data Type
Lookup Table
Lookup Column
AverageRate
float
None
None
CurrencyKey
int (FK)
DimCurrency
CurrencyKey (PK)
TimeKey
Int (FK)
DimTime
TimeKey (PK)
EndOfDayRate
float
None
None

Mapping Source Data to be Compatible with the Destination

Analysis of the source and destination data formats indicates that lookups will be necessary for the CurrencyKey and TimeKey values. The transformations that will perform these lookups will obtain the CurrencyKey and TimeKey values by using the alternate keys from DimCurrency and DimTime dimension tables.

Flat File Column
Table Name
Column Name
Data Type
0
FactCurrencyRate
AverageRate
Float
1
DimCurrency
CurrencyAlternateKey
nchar (3)
2
DimTime
FullDateAlternateKey
Datetime
3
FactCurrencyRate
EndOfDayRate
Float

To create a new Integration Services project

1.       On the Start menu, point to All Programs, point to Microsoft SQL Server, and click SQL Server Business Intelligence Development Studio.

2.       On the File menu, point to New, and click Project to create a new Integration Services project.

3.       In the New Project dialog box, select Integration Services Project in the Templates pane.

4.       In the Name box, change the default name to SSIS Tutorial. Optionally, clear the Create directory for solution check box.

5.       Accept the default location, or click Browse to browse to locate the folder you want to use.

6.       In the Project Location dialog box, click the folder and click Open.

7.       Click OK.

By default, an empty package, titled Package.dtsx, will be created and added to your project.

8.       In the Solution Explorer toolbar, right-click Package.dtsx, click Rename, and rename the default package to Lesson 1.dtsx.

9.       When prompted to rename the package object, click Yes.

To set locale sensitive properties


1.       On the View menu, click Properties Window.

2.       In the Properties window, set the LocaleID property to English (United States).


Step-2 Adding and Configuring a Flat file Connection manager
----------------------------------------------------------------------------
 you add a Flat File connection manager to the package that you just created. A Flat File connection manager enables a package to extract data from a flat file. Using the Flat File connection manager, you can specify the file name and location, the locale and code page, and the file format, including column delimiters, to apply when the package extracts data from the flat file. In addition, you can manually specify the data type for the individual columns, or use the Suggest Column Types dialog box to automatically map the columns of extracted data to Integration Services data types.

You must create a new Flat File connection manager for each file format that you work with. Because this tutorial extracts data from multiple flat files that have exactly the same data format, you will need to add and configure only one Flat File connection manager for your package.

For this tutorial, you will configure the following properties in your Flat File connection manager:

  • Column names   Because the flat file does not have column names, the Flat File connection manager creates default column names. These default names are not useful for identifying what each column represents. To make these default names more useful, you need to change the default names to names that match the fact table into which the flat file data is to be loaded.
  • Data mappings   The data type mappings that you specify for the Flat File connection manager will be used by all flat file data source components that reference the connection manager. You can either manually map the data types by using the Flat File connection manager, or you can use the Suggest Column Types dialog box. In this tutorial, you will view the mappings suggested in the Suggest Column Types dialog box and then manually make the necessary mappings in the Flat File Connection Manager Editor dialog box.

The Flat File connection manager provides locale information about the data file. If your computer is not configured to use the regional option English (United States), you must set additional properties in the Flat File Connection Manager Editor dialog box.

To add a Flat File connection manager

1.       Right-click anywhere in the Connection Managers area, and then click New Flat File Connection.

2.       In the Flat File Connection Manager Editor dialog box, for Connection manager name, type Sample Flat File Source Data.

3.       Click Browse.

4.       In the Open dialog box, locate the sample data folder and open the SampleCurrencyData.txt file. By default, the tutorial sample data is installed to the c:\Program Files\Microsoft SQL Server\100\Samples\Integration Services\Tutorial\Creating a Simple ETL Package\Sample Data folder.

To set locale sensitive properties

1.       In the Flat File Connection Manager Editor dialog box, click General.

2.       Set Locale to English (United States) and Code page to 1252.

To rename columns in the Flat File connection manager

1.       In the Flat File Connection Manager Editor dialog box, click Advanced.

2.       In the property pane, makes the following changes:

o    Change the Column 0 name property to AverageRate.

o    Change the Column 1 name property to CurrencyID.

o    Change the Column 2 name property to CurrencyDate.

o    Change the Column 3 name property to EndOfDayRate.

Note
By default, all four of the columns are initially set to a string data type [DT_STR] with an OutputColumnWidth of 50.

To remap column data types

1.       In the Flat File Connection Manager Editor dialog box, click Suggest Types.

Integration Services automatically suggests the most appropriate data types based on the first 100 rows of data. You can also change these suggestion options to sample more or less data, to specify the default data type for integer or Boolean data, or to add spaces as padding to string columns.

For now, make no changes to the options in the Suggest Column Types dialog box, and click OK to have Integration Services suggest data types for columns. This returns you to the Advanced pane of the Flat File connection Manager Editor dialog box, where you can view the column data types suggested by Integration Services. (If you click Cancel, no suggestions are made to column metadata and the default string (DT_STR) data type is used.)

In this tutorial, Integration Services suggests the data types shown in the second column of the following table for the data from the SampleCurrencyData.txt file. However, the data types that are required for the columns in the destination, which will be defined in a later step, are shown in the last column of the following table.

Flat File Column
Suggested Type
Destination Column
Destination Type
AverageRate
Float [DT_R4]
FactCurrencyRate.AverageRate
Float
CurrencyID
String [DT_STR]
DimCurrency.CurrencyAlternateKey
nchar(3)
CurrencyDate
Date [DT_DATE]
DimTime.FullDateAlternateKey
datetime
EndOfDayRate
Float [DT_R4]
FactCurrencyRate.EndOfDayRate
Float

The data types suggested for the CurrencyID and CurrencyDate columns are incompatible with the data types of the fields in the destination table. Because the data type of DimCurrency.CurrencyAlternateKey is nchar (3), CurrencyID must be changed from string [DT_STR] to string [DT_WSTR]. Additionally, the fieldDimTime.FullDateAlternateKey is defined as a DateTime data type; therefore, CurrencyDate needs to be changed from date [DT_Date] to database timestamp [DT_DBTIMESTAMP].

2.       In the property pane, change the data type of column CurrencyID from string [DT_STR] to Unicode string [DT_WSTR].

3.       In the property pane, change the data type of column CurrencyDate from date [DT_DATE] to database timestamp [DT_DBTIMESTAMP].

4.       Click OK.


Step-3: Adding and configuring an OLE DB connection manager
----------------------------------------------------------------------------------

After you have added a Flat File connection manager to connect to the data source, the next task is to add an OLE DB connection manager to connect to the destination. An OLE DB connection manager enables a package to extract data from or load data into any OLE DB–compliant data source. Using the OLE DB Connection manager, you can specify the server, the authentication method, and the default database for the connection.

In this lesson, you will create an OLE DB connection manager that uses Windows Authentication to connect to the local instance of AdventureWorksDB. The OLE DB connection manager that you create will also be referenced by other components that you will create later in this tutorial, such as the Lookup transformation and the OLE DB destination.

To add and configure an OLE DB Connection Manager


1.       Right-click anywhere in the Connection Managers area, and then click New OLE DB Connection.

2.       In the Configure OLE DB Connection Manager dialog box, click New.

3.       For Server name, enter localhost.

When you specify localhost as the server name, the connection manager connects to the default instance of SQL Server on the local computer. To use a remote instance of SQL Server, replace localhost with the name of the server to which you want to connect.

4.       In the Log on to the server group, verify that Use Windows Authentication is selected.

5.       In the Connect to a database group, in the Select or enter a database name box, type or select AdventureWorksDW.

6.       Click Test Connection to verify that the connection settings you have specified are valid.

7.       Click OK.

8.       Click OK.

9.       In the Data Connections pane of the Configure OLE DB Connection Manager dialog box, verify that localhost.AdventureWorksDW is selected.

10.    Click OK.

 

Step-4: Adding a  Dataflow to the package

After you have created the connection managers for the source and destination data, the next task is to add a Data Flow task to your package. The Data Flow task encapsulates the data flow engine that moves data between sources and destinations, and provides the functionality for transforming, cleaning, and modifying data as it is moved. The Data Flow task is where most of the work of an extract, transform, and load (ETL) process occurs.

Note
SQL Server Integration Services separates data flow from control flow. This separation of data flow from control flow is one of the dramatic differences between Integration Services and Microsoft SQL Server 2000 Data Transformation Services.

To add a Data Flow task


1.       Click the Control Flow tab.

2.       In the Toolbox, expand Control Flow Items, and drag a Data Flow Task onto the design surfaceof the Control Flow tab.

3.       On the Control Flow design surface, right-click the newly added Data Flow Task, click Rename, and change the name to Extract Sample Currency Data.

It is good practice to provide unique names to all components that you add to a design surface. For ease of use and maintainability, the names should describe the function that each component performs. Following these naming guidelines allows your Integration Services packages to be self-documenting. Another way to document your packages is by using annotations. For more information about annotations, see Using Annotations in Packages.

4.       Right-click the Data Flow task, click Properties, and in the Properties window, verify that the LocaleID property is set to English (United States).

 

Step-5: Adding and configuring the flat file source

In this task, you will add and configure a Flat File source to your package. A Flat File source is a data flow component that uses metadata defined by a Flat File connection manager to specify the format and structure of the data to be extracted from the flat file by a transform process. The Flat File source can be configured to extract data from a single flat file by using the file format definition provided by the Flat File connection manager.

For this tutorial, you will configure the Flat File source to use the Sample Flat File Source Data connection manager that you previously created.

To add a Flat File Source component


1.       Open the Data Flow designer, either by double-clicking the Extract Sample Currency Data data flow task or by clicking the Data Flow tab.

2.       In the Toolbox, expand Data Flow Sources, and then drag a Flat File Source onto the design surface of the Data Flow tab.

3.       On the Data Flow design surface, right-click the newly added Flat File Source, click Rename, and change the name to Extract Sample Currency Data.

4.       Double-click the Flat File source to open the Flat File Source Editor dialog box.

5.       In the Flat file connection manager box, type or select Sample Flat File Source Data.

6.       Click Columns and verify that the names of the columns are correct.

7.       Click OK.

8.       Right-click the Flat File source and click Properties.

9.       In the Properties window, verify that the LocaleID property is set to English (United States).

 

Step-6: Adding and configuring the lookup Transformations

After you have configured the Flat File source to extract data from the source file, the next task is to define the Lookup transformations needed to obtain the values for theCurrencyKey and TimeKey. A Lookup transformation performs a lookup by joining data in the specified input column to a column in a reference dataset. The reference dataset can be an existing table or view, a new table, or the result of an SQL statement. In this tutorial, the Lookup transformation uses an OLE DB connection manager to connect to the database that contains the data that is the source of the reference dataset.

Note
You can also configure the Lookup transformation to connect to a cache that contains the reference dataset. For more information, see Lookup Transformation.

For this tutorial, you will add and configure the following two Lookup transformation components to the package:

·         One transformation to perform a lookup of values from the CurrencyKey column of the DimCurrency dimension table based on matching CurrencyID column values from the flat file.

·         One transformation to perform a lookup of values from the TimeKey column of the DimTime dimension table based on matching CurrencyDate column values from the flat file.

In both cases, the Lookup transformations will utilize the OLE DB connection manager that you previously created.

To add and configure the Lookup Currency Key transformation


1.       In the Toolbox, expand Data Flow Transformations, and then drag Lookup onto the design surfaceof the Data Flow tab. Place Lookup directly below the Extract Sample Currency Data source.

2.       Click the Extract Sample Currency Data flat file source and drag the green arrow onto the newly added Lookup transformation to connect the two components.

3.       On the Data Flow design surface, click Lookup in the Lookup transformation, and change the name to Lookup Currency Key.

4.       Double-click the Lookup CurrencyKey transformation.

5.       On the General page, make the following selections:

1.       Select Full cache.

2.       In the Connection type area, select OLE DB connection manager.

6.       On the Connection page, make the following selections:

1.       In the OLE DB connection manager dialog box, ensure that localhost.AdventureWorksDW is displayed.

2.       Select Use results of an SQL query, and then type or copy the following SQL statement:

3.  select * from (select * from [dbo].[DimCurrency]) as refTable
4.  where [refTable].[CurrencyAlternateKey] = 'ARS'
5.  OR
6.  [refTable].[CurrencyAlternateKey] = 'AUD'
7.  OR
8.  [refTable].[CurrencyAlternateKey] = 'BRL'
9.  OR
10. [refTable].[CurrencyAlternateKey] = 'CAD'
11. OR
12. [refTable].[CurrencyAlternateKey] = 'CNY'
13. OR
14. [refTable].[CurrencyAlternateKey] = 'DEM'
15. OR
16. [refTable].[CurrencyAlternateKey] = 'EUR'
17. OR
18. [refTable].[CurrencyAlternateKey] = 'FRF'
19. OR
20. [refTable].[CurrencyAlternateKey] = 'GBP'
21. OR
22. [refTable].[CurrencyAlternateKey] = 'JPY'
23. OR
24. [refTable].[CurrencyAlternateKey] = 'MXN'
25. OR
26. [refTable].[CurrencyAlternateKey] = 'SAR'
27. OR
28. [refTable].[CurrencyAlternateKey] = 'USD'
29. OR
30. [refTable].[CurrencyAlternateKey] = 'VEB'

7.       On the Columns page, make the following selections:

1.       In the Available Input Columns panel, drag CurrencyID to the Available Lookup Columns panel and drop it on CurrencyAlternateKey.

2.       In the Available Lookup Columns list, select the check box to the right of CurrencyKey.

8.       Click OK to return to the Data Flow design surface.

9.       Right-click the Lookup Currency Key transformation, click Properties.

10.    In the Properties window, verify that the LocaleID property is set to English (United States) and the DefaultCodePage property is set to 1252.

To add and configure the Lookup DateKey transformation


1.       In the Toolbox, drag Lookup onto the Data Flow design surface. Place Lookup directly below the Lookup CurrencyKey transformation.

2.       Click the Lookup Currency Key transformation and drag the green arrow onto the newly added Lookup transformation to connect the two components.

3.       In the Input Output Selection dialog box, click Lookup Match Output in the Output list box, and then click OK.

4.       On the Data Flow design surface, click Lookup in the newly added Lookup transformation, and change the name to Lookup DateKey.

5.       Double-click the Lookup DateKey transformation.

6.       On the General page, select Partial cache.

7.       On the Connection page, make the following selections:

1.       In the OLE DB connection manager dialog box, ensure that localhost.AdventureWorksDW is displayed.

2.       In the Use a table or view box, type or select [dbo].[DimTime].

8.       On the Columns page, make the following selections:

1.       In the Available Input Columns panel, drag CurrencyDate to the Available Lookup Columns panel and drop it on FullDateAlternateKey.

2.       In the Available Lookup Columns list, select the check box to the right of TimeKey.

9.       On the Advanced page, review the caching options.

10.    Click OK to return to the Data Flow design surface.

11.    Right-click the Lookup Date Key transformation and click Properties.

12.    In the Properties window, verify that the LocaleID property is set to English (United States) and the DefaultCodePage property is set to 1252.

 

Step-7: Adding and configuring the OLE-DB Destination

To add and configure the sample OLE DB destination


1.       In the Toolbox, expand Data Flow Destinations, and drag OLE DB Destination onto the design surface of the Data Flow tab. Place the OLE DB destination directly below the Lookup DateKey transformation.

2.       Click the Lookup Date Key transformation and drag the green arrow over to the newly added OLE DB Destination to connect the two components together.

3.       In the Input Output Selection dialog box, in the Output list box, click Lookup Match Output, and then click OK.

4.       On the Data Flow design surface, click OLE DB Destination in the newly added OLE DB Destination component, and change the name to Sample OLE DB Destination.

5.       Double-click Sample OLE DB Destination.

6.       In the OLE DB Destination Editor dialog box, ensure that localhost.AdventureWorksDW is selected in the OLE DB Connection manager box.

7.       In the Name of the table or the view box, type or select [dbo].[FactCurrencyRate].

8.       Click Mappings.

9.       Verify that the AverageRate, CurrencyKey, EndOfDayRate, and TimeKey input columns are mapped correctly to the destination columns. If same-named columns are mapped, the mapping is correct.

10.    Click OK.

11.    Right-click the Sample OLE DB Destination destination and click Properties.

12.    In the Properties window, verify that the LocaleID property is set to English (United States) and theDefaultCodePage property is set to 1252.

 

Step-8: Making lesson 1 package easier to understand


Now that you have completed the configuration of the Lesson 1 package, it is a good idea to tidy up the package layout. If the shapes in the control and data flow layouts are random sizes, or if the shapes are not aligned or grouped, the functionality of package can be more difficult to understand.

Business Intelligence Development Studio provides tools that make it easy and quick to format the package layout. The formatting features include the ability to make shapes the same size, align shapes, and manipulate the horizontal and vertical spacing between shapes.

Another way to improve the understanding of package functionality is to add annotations that describe package functionality.

In this task, you will use the formatting features in Business Intelligence Development Studio to improve the layout of the data flow and also add an annotation to the data flow.

To format the layout of the data flow


1.       If the Lesson 1 package is not already open, double-click Lesson 1.dtsx in Solution Explorer.

2.       Click the Data Flow tab.

3.       Place the cursor to the top and to the right of the Extract Sample Currency transformation, click, and then drag the cursor across all the data flow components.

All data flow components are now selected. The shape selected first—the shape whose selected indicators are white in color—dictates the size and location that is used in formatting the layout.

4.       On the Format menu, point to Make Same Size, and then click Both.

5.       With the data flow objects selected, on the Format menu, point to Align, and then click Lefts.

To add an annotation to the data flow


1.       Right-click anywhere in the background of the data flow design surface and then click Add Annotation.

2.       Type or paste the following text in the annotation box.

The data flow extracts data from a file, looks up values in the CurrencyKey column in the DimCurrency table and the TimeKey column in the DimTime table, and writes the data to the FactCurrencyRate table.

To wrap the text in the annotation box, place the cursor where you want to start a new line and press the Ctrl and Enter keys.

If you do not add text to the annotation box, it disappears when you click outside the box.

 

Step-9: Testing the lesson 1 tutorial package


In this lesson, you have done the following tasks:

  • Created a new SSIS project.
  • Configured the connection managers that the package needs to connect to the source and destination data.
  • Added a data flow that takes the data from a flat file source, performs the necessary Lookup transformations on the data, and configures the data for the destination.

Your package is now complete! It is time to test your package.







Before you test the package you should verify that the control and data flows in the Lesson 1 package contain the objects shown in the following diagrams.

Control Flow







Data Flow


 




















To run the Lesson 1 tutorial package

1.       On the Debug menu, click Start Debugging.

The package will run, resulting in 1097 rows successfully added into the FactCurrencyRate fact table in AdventureWorksDW.

2.       After the package has completed running, on the Debug menu, click Stop Debugging.

 

No comments:

Post a Comment