Step-1 Creating a New Integration Services Project
------- ----------------------------------------------------
To
set locale sensitive properties
Step-2 Adding and Configuring a Flat file Connection manager
----------------------------------------------------------------------------
you add a Flat File connection manager to the package that you just created. A Flat File connection manager enables a package to extract data from a flat file. Using the Flat File connection manager, you can specify the file name and location, the locale and code page, and the file format, including column delimiters, to apply when the package extracts data from the flat file. In addition, you can manually specify the data type for the individual columns, or use the Suggest Column Types dialog box to automatically map the columns of extracted data to Integration Services data types.
To
add and configure an OLE DB Connection Manager
To
add a Data Flow task
To
add a Flat File Source component
To
add and configure the Lookup Currency Key transformation
To
add and configure the Lookup DateKey transformation
To add and configure the
sample OLE DB destination
Now that you have completed the configuration of the Lesson 1 package, it is a good idea to tidy up the package layout. If the shapes in the control and data flow layouts are random sizes, or if the shapes are not aligned or grouped, the functionality of package can be more difficult to understand.
To
format the layout of the data flow
To
add an annotation to the data flow
In this lesson, you have done the following tasks:
To run the Lesson 1 tutorial package
------- ----------------------------------------------------
Integration Services
includes graphical tools and wizards for building and debugging packages; tasks
for performing workflow functions such as FTP operations, executing SQL
statements, and sending e-mail messages; data sources and destinations for
extracting and loading data; transformations for cleaning, aggregating,
merging, and copying data; a management service, the Integration Services
service for administering package execution and storage; and application
programming interfaces (APIs) for programming the Integration Services object
model.
In this tutorial, you will
learn how to use SSIS Designer to create a simple Microsoft SQL
Server Integration Services package. The package that you create takes
data from a flat file, reformats the data, and then inserts the reformatted
data into a fact table. In following lessons, the package will be expanded to
demonstrate looping, package configurations, logging and error flow.
When you install the sample
data that the tutorial uses, you also install the completed versions of the
packages that you will create in each lesson of the tutorial. By using the
completed packages, you can skip ahead and begin the tutorial at a later lesson
if you like. If this is your first time working with packages or the new
development environment, we recommend that you begin with Lesson1.
Understanding the Requirements:
Before creating a package,
you need a good understanding of the formatting used in both the source data
and the destination. Once you understand both of these data formats, you will
be ready to define the transformations necessary to map the source data to the
destination.
Looking at the Source
For this tutorial, the
source data is a set of historical currency data contained in the flat file,
SampleCurrencyData.txt. The source data has the following four columns: the
average rate of the currency, a currency key, a date key, and the end-of-day
rate.
Here is an example of the
source data contained in the SampleCurrencyData.txt file:
1.00010001 ARS 9/3/2001
0:00 0.99960016
1.00010001 ARS 9/4/2001
0:00 1.001001001
1.00020004 ARS 9/5/2001
0:00 0.99990001
1.00020004 ARS 9/6/2001
0:00 1.00040016
1.00050025 ARS 9/7/2001
0:00 0.99990001
1.00050025 ARS 9/8/2001
0:00 1.001001001
1.00050025 ARS 9/9/2001
0:00 1
1.00010001 ARS 9/10/2001
0:00 1.00040016
1.00020004 ARS 9/11/2001
0:00 0.99990001
1.00020004 ARS 9/12/2001
0:00 1.001101211
When working with flat file
source data, it is important to understand how the Flat File connection manager
interprets the flat file data. If the flat file source is Unicode, the Flat
File connection manager defines all columns as [DT_WSTR] with a default column
width of 50. If the flat file source is ANSI-encoded, the columns are defined as
[DT_STR] with a column width of 50. You will probably have to change these
defaults to make the string column types more appropriate for your data. To do
this, you will need to look at the data type of the destination where the data
will be written to and then choose the correct type within the Flat File
connection manager.
Looking at the Destination
The ultimate destination for
the source data is the FactCurrencyRate fact table in AdventureWorksDW.
The FactCurrencyRate fact table has four columns, and has
relationships to two dimension tables, as shown in the following table.
Column
Name
|
Data
Type
|
Lookup
Table
|
Lookup
Column
|
AverageRate
|
float
|
None
|
None
|
CurrencyKey
|
int
(FK)
|
DimCurrency
|
CurrencyKey
(PK)
|
TimeKey
|
Int
(FK)
|
DimTime
|
TimeKey
(PK)
|
EndOfDayRate
|
float
|
None
|
None
|
Mapping Source Data to be Compatible with the Destination
Analysis of the source and
destination data formats indicates that lookups will be necessary for the CurrencyKey and TimeKey values.
The transformations that will perform these lookups will obtain the CurrencyKey and TimeKey values
by using the alternate keys from DimCurrency and DimTime dimension
tables.
Flat
File Column
|
Table
Name
|
Column
Name
|
Data
Type
|
0
|
FactCurrencyRate
|
AverageRate
|
Float
|
1
|
DimCurrency
|
CurrencyAlternateKey
|
nchar
(3)
|
2
|
DimTime
|
FullDateAlternateKey
|
Datetime
|
3
|
FactCurrencyRate
|
EndOfDayRate
|
Float
|
To create a new Integration Services project
1.
On the Start menu,
point to All
Programs, point to Microsoft SQL Server, and click SQL Server Business
Intelligence Development Studio.
2.
On the File menu,
point to New,
and click Project to create a new Integration Services
project.
3.
In the New Project dialog
box, select Integration
Services Project in
the Templates pane.
4.
In the Name box,
change the default name to SSIS Tutorial. Optionally, clear the Create directory
for solution check box.
5.
Accept the default location, or click Browse to browse to locate the folder you
want to use.
6.
In the Project Location dialog
box, click the folder and click Open.
7.
Click OK.
By default, an empty
package, titled Package.dtsx, will
be created and added to your project.
8.
In the Solution Explorer toolbar,
right-click Package.dtsx,
click Rename,
and rename the default package to Lesson 1.dtsx.
9.
When prompted to rename the package object, click Yes.
To
set locale sensitive properties
1.
On the View menu, click Properties Window.
2.
In the Properties window, set the LocaleID property to English (United
States).
Step-2 Adding and Configuring a Flat file Connection manager
----------------------------------------------------------------------------
you add a Flat File connection manager to the package that you just created. A Flat File connection manager enables a package to extract data from a flat file. Using the Flat File connection manager, you can specify the file name and location, the locale and code page, and the file format, including column delimiters, to apply when the package extracts data from the flat file. In addition, you can manually specify the data type for the individual columns, or use the Suggest Column Types dialog box to automatically map the columns of extracted data to Integration Services data types.
You must create a new Flat
File connection manager for each file format that you work with. Because this
tutorial extracts data from multiple flat files that have exactly the same data
format, you will need to add and configure only one Flat File connection
manager for your package.
For this tutorial, you will
configure the following properties in your Flat File connection manager:
- Column names Because the flat file does not have
column names, the Flat File connection manager creates default column
names. These default names are not useful for identifying what each column
represents. To make these default names more useful, you need to change
the default names to names that match the fact table into which the flat
file data is to be loaded.
- Data mappings The data type mappings that you
specify for the Flat File connection manager will be used by all flat file
data source components that reference the connection manager. You can either
manually map the data types by using the Flat File connection manager, or
you can use the Suggest Column Types dialog box. In this
tutorial, you will view the mappings suggested in the Suggest
Column Types dialog box and then manually make the necessary
mappings in the Flat File Connection Manager Editor dialog
box.
The Flat File connection
manager provides locale information about the data file. If your computer is
not configured to use the regional option English (United States), you must set
additional properties in the Flat File Connection Manager Editor dialog
box.
To add a Flat File connection manager
1. Right-click
anywhere in the Connection Managers area, and then click New
Flat File Connection.
2. In
the Flat File Connection Manager Editor dialog box, for Connection
manager name, type Sample Flat File Source Data.
3. Click Browse.
4. In
the Open dialog box, locate the sample data folder and open
the SampleCurrencyData.txt file. By default, the tutorial sample data is
installed to the c:\Program Files\Microsoft SQL Server\100\Samples\Integration
Services\Tutorial\Creating a Simple ETL Package\Sample Data folder.
To set locale sensitive properties
1. In
the Flat File Connection Manager Editor dialog box,
click General.
2. Set Locale to
English (United States) and Code page to 1252.
To rename columns in the Flat File connection manager
1. In
the Flat File Connection Manager Editor dialog box,
click Advanced.
2. In
the property pane, makes the following changes:
o Change
the Column 0 name property to AverageRate.
o Change
the Column 1 name property to CurrencyID.
o Change
the Column 2 name property to CurrencyDate.
o Change
the Column 3 name property to EndOfDayRate.
By
default, all four of the columns are initially set to a string data type
[DT_STR] with an OutputColumnWidth of 50.
|
To remap column data types
1. In
the Flat File Connection Manager Editor dialog box,
click Suggest Types.
Integration
Services automatically suggests the most appropriate data types based on the
first 100 rows of data. You can also change these suggestion options to sample
more or less data, to specify the default data type for integer or Boolean
data, or to add spaces as padding to string columns.
For
now, make no changes to the options in the Suggest Column Types dialog
box, and click OK to have Integration Services suggest data
types for columns. This returns you to the Advanced pane of
the Flat File connection Manager Editor dialog box, where you
can view the column data types suggested by Integration Services. (If you
click Cancel, no suggestions are made to column metadata and the
default string (DT_STR) data type is used.)
In
this tutorial, Integration Services suggests the data types shown in the second
column of the following table for the data from the SampleCurrencyData.txt
file. However, the data types that are required for the columns in the
destination, which will be defined in a later step, are shown in the last
column of the following table.
Flat
File Column
|
Suggested
Type
|
Destination
Column
|
Destination
Type
|
AverageRate
|
Float
[DT_R4]
|
FactCurrencyRate.AverageRate
|
Float
|
CurrencyID
|
String
[DT_STR]
|
DimCurrency.CurrencyAlternateKey
|
nchar(3)
|
CurrencyDate
|
Date
[DT_DATE]
|
DimTime.FullDateAlternateKey
|
datetime
|
EndOfDayRate
|
Float
[DT_R4]
|
FactCurrencyRate.EndOfDayRate
|
Float
|
The
data types suggested for the CurrencyID and CurrencyDate columns
are incompatible with the data types of the fields in the destination table.
Because the data type of DimCurrency.CurrencyAlternateKey is
nchar (3), CurrencyID must be changed from string [DT_STR] to
string [DT_WSTR]. Additionally, the fieldDimTime.FullDateAlternateKey is
defined as a DateTime data type; therefore, CurrencyDate needs
to be changed from date [DT_Date] to database timestamp [DT_DBTIMESTAMP].
2. In
the property pane, change the data type of column CurrencyID from
string [DT_STR] to Unicode string [DT_WSTR].
3. In
the property pane, change the data type of column CurrencyDate from
date [DT_DATE] to database timestamp [DT_DBTIMESTAMP].
4. Click OK.
Step-3: Adding and configuring an OLE DB connection
manager
----------------------------------------------------------------------------------
After
you have added a Flat File connection manager to connect to the data source,
the next task is to add an OLE DB connection manager to connect to the
destination. An OLE DB connection manager enables a package to extract data
from or load data into any OLE DB–compliant data source. Using the OLE DB
Connection manager, you can specify the server, the authentication method, and
the default database for the connection.
In
this lesson, you will create an OLE DB connection manager that uses Windows
Authentication to connect to the local instance of AdventureWorksDB.
The OLE DB connection manager that you create will also be referenced by other
components that you will create later in this tutorial, such as the Lookup
transformation and the OLE DB destination.
To
add and configure an OLE DB Connection Manager
1.
Right-click anywhere in the Connection Managers area,
and then click New
OLE DB Connection.
2.
In the Configure OLE DB Connection Manager dialog box, click New.
3.
For Server name, enter localhost.
When you specify localhost
as the server name, the connection manager connects to the default instance of
SQL Server on the local computer. To use a remote instance of SQL Server,
replace localhost with the name of the server to which you want to connect.
4.
In the Log on to the server group, verify that Use Windows
Authentication is
selected.
5.
In the Connect to a database group, in the Select or enter a
database name box,
type or select AdventureWorksDW.
6.
Click Test Connection to
verify that the connection settings you have specified are valid.
7.
Click OK.
8.
Click OK.
9.
In the Data Connections pane
of the Configure
OLE DB Connection Manager dialog
box, verify that localhost.AdventureWorksDW is selected.
10.
Click OK.
Step-4: Adding a Dataflow to the package
After
you have created the connection managers for the source and destination data,
the next task is to add a Data Flow task to your package. The Data Flow task
encapsulates the data flow engine that moves data between sources and
destinations, and provides the functionality for transforming, cleaning, and
modifying data as it is moved. The Data Flow task is where most of the work of
an extract, transform, and load (ETL) process occurs.
SQL
Server Integration Services separates data flow from control flow. This
separation of data flow from control flow is one of the dramatic differences
between Integration Services and Microsoft SQL Server 2000 Data
Transformation Services.
|
To
add a Data Flow task
1.
Click the Control Flow tab.
2.
In the Toolbox, expand Control Flow Items, and drag a Data Flow Task onto the design surfaceof the Control
Flow tab.
3.
On the Control Flow design
surface, right-click the newly added Data Flow Task, click Rename,
and change the name to Extract Sample Currency Data.
It is good practice to
provide unique names to all components that you add to a design surface. For
ease of use and maintainability, the names should describe the function that
each component performs. Following these naming guidelines allows your
Integration Services packages to be self-documenting. Another way to document
your packages is by using annotations. For more information about annotations,
see Using Annotations in Packages.
4.
Right-click the Data Flow task, click Properties,
and in the Properties window, verify that the LocaleID property
is set to English
(United States).
Step-5: Adding and configuring the flat file source
In
this task, you will add and configure a Flat File source to your package. A
Flat File source is a data flow component that uses metadata defined by a Flat
File connection manager to specify the format and structure of the data to be
extracted from the flat file by a transform process. The Flat File source can
be configured to extract data from a single flat file by using the file format
definition provided by the Flat File connection manager.
For
this tutorial, you will configure the Flat File source to use the Sample Flat File
Source Data connection
manager that you previously created.
To
add a Flat File Source component
1.
Open the Data Flow designer,
either by double-clicking the Extract Sample Currency Data data flow task or by clicking the Data Flow tab.
2.
In the Toolbox, expand Data Flow Sources, and then drag a Flat File Source onto the design surface of the Data Flow tab.
3.
On the Data Flow design
surface, right-click the newly added Flat File Source, click Rename,
and change the name to Extract Sample Currency Data.
4.
Double-click the Flat File source to open the Flat File Source
Editor dialog box.
5.
In the Flat file connection manager box, type or select Sample Flat File
Source Data.
6.
Click Columns and
verify that the names of the columns are correct.
7.
Click OK.
8.
Right-click the Flat File source and click Properties.
9.
In the Properties window, verify that the LocaleID property is set to English (United
States).
Step-6: Adding and configuring the lookup Transformations
After
you have configured the Flat File source to extract data from the source file,
the next task is to define the Lookup transformations needed to obtain the
values for theCurrencyKey and TimeKey. A Lookup
transformation performs a lookup by joining data in the specified input column
to a column in a reference dataset. The reference dataset can be an existing
table or view, a new table, or the result of an SQL statement. In this
tutorial, the Lookup transformation uses an OLE DB connection manager to
connect to the database that contains the data that is the source of the
reference dataset.
You
can also configure the Lookup transformation to connect to a cache that
contains the reference dataset. For more information, see Lookup Transformation.
|
For
this tutorial, you will add and configure the following two Lookup
transformation components to the package:
·
One transformation to perform a lookup of values from the CurrencyKey column of the DimCurrency dimension table based on matching CurrencyID column values from the flat file.
·
One transformation to perform a lookup of values from the TimeKey column of the DimTime dimension table based on matching CurrencyDate column values from the flat file.
In
both cases, the Lookup transformations will utilize the OLE DB connection
manager that you previously created.
To
add and configure the Lookup Currency Key transformation
1.
In the Toolbox, expand Data Flow Transformations, and then drag Lookup onto the design surfaceof the Data Flow tab. Place Lookup directly below the Extract Sample
Currency Data source.
2.
Click the Extract Sample Currency Data flat file source and drag the
green arrow onto the newly added Lookup transformation
to connect the two components.
3.
On the Data Flow design
surface, click Lookup in the Lookup transformation, and change the name to Lookup Currency
Key.
4.
Double-click the Lookup CurrencyKey transformation.
5.
On the General page,
make the following selections:
1.
Select Full cache.
2.
In the Connection type area,
select OLE
DB connection manager.
6.
On the Connection page,
make the following selections:
1.
In the OLE DB connection manager dialog box, ensure that localhost.AdventureWorksDW is displayed.
2.
Select Use results of an SQL query, and then type or copy
the following SQL statement:
3. select * from (select * from [dbo].[DimCurrency]) as refTable
4. where [refTable].[CurrencyAlternateKey] = 'ARS'
5. OR
6. [refTable].[CurrencyAlternateKey] = 'AUD'
7. OR
8. [refTable].[CurrencyAlternateKey] = 'BRL'
9. OR
10. [refTable].[CurrencyAlternateKey] = 'CAD'
11. OR
12. [refTable].[CurrencyAlternateKey] = 'CNY'
13. OR
14. [refTable].[CurrencyAlternateKey] = 'DEM'
15. OR
16. [refTable].[CurrencyAlternateKey] = 'EUR'
17. OR
18. [refTable].[CurrencyAlternateKey] = 'FRF'
19. OR
20. [refTable].[CurrencyAlternateKey] = 'GBP'
21. OR
22. [refTable].[CurrencyAlternateKey] = 'JPY'
23. OR
24. [refTable].[CurrencyAlternateKey] = 'MXN'
25. OR
26. [refTable].[CurrencyAlternateKey] = 'SAR'
27. OR
28. [refTable].[CurrencyAlternateKey] = 'USD'
29. OR
30. [refTable].[CurrencyAlternateKey] = 'VEB'
7.
On the Columns page,
make the following selections:
1.
In the Available Input Columns panel, drag CurrencyID to the Available Lookup
Columns panel and drop
it on CurrencyAlternateKey.
2.
In the Available Lookup Columns list, select the check box to the
right of CurrencyKey.
8.
Click OK to
return to the Data
Flow design surface.
9.
Right-click the Lookup Currency Key transformation, click Properties.
10.
In the Properties window, verify that the LocaleID property is set to English (United
States) and the DefaultCodePage property is set to 1252.
To
add and configure the Lookup DateKey transformation
1.
In the Toolbox, drag Lookup onto
the Data
Flow design surface.
Place Lookup directly below the Lookup CurrencyKey transformation.
2.
Click the Lookup Currency Key transformation
and drag the green arrow onto the newly added Lookup transformation
to connect the two components.
3.
In the Input Output Selection dialog box, click Lookup Match
Output in the Output list box, and then click OK.
4.
On the Data Flow design
surface, click Lookup in the newly added Lookup transformation, and change the name to Lookup DateKey.
5.
Double-click the Lookup DateKey transformation.
6.
On the General page,
select Partial
cache.
7.
On the Connection page,
make the following selections:
1.
In the OLE DB connection manager dialog box, ensure that localhost.AdventureWorksDW is displayed.
2.
In the Use a table or view box,
type or select [dbo].[DimTime].
8.
On the Columns page,
make the following selections:
1.
In the Available Input Columns panel, drag CurrencyDate to the Available Lookup
Columns panel and drop
it on FullDateAlternateKey.
2.
In the Available Lookup Columns list, select the check box to the
right of TimeKey.
9.
On the Advanced page,
review the caching options.
10.
Click OK to
return to the Data
Flow design surface.
11.
Right-click the Lookup Date Key transformation and click Properties.
12.
In the Properties window, verify that the LocaleID property is set to English (United
States) and the DefaultCodePage property is set to 1252.
Step-7: Adding and
configuring the OLE-DB Destination
To add and configure the
sample OLE DB destination
1.
In the Toolbox, expand Data Flow Destinations, and drag OLE DB
Destination onto the
design surface of the Data Flow tab.
Place the OLE DB destination directly below the Lookup DateKey transformation.
2.
Click the Lookup Date Key transformation
and drag the green arrow over to the newly added OLE DB
Destination to connect
the two components together.
3.
In the Input Output Selection dialog box, in the Output list box, click Lookup Match
Output, and then click OK.
4.
On the Data Flow design
surface, click OLE
DB Destination in the
newly added OLE
DB Destination component,
and change the name to Sample OLE DB Destination.
5.
Double-click Sample OLE DB Destination.
6.
In the OLE DB Destination Editor dialog box, ensure that localhost.AdventureWorksDW is selected in the OLE DB Connection
manager box.
7.
In the Name of the table or the view box, type or select [dbo].[FactCurrencyRate].
8.
Click Mappings.
9.
Verify that the AverageRate, CurrencyKey, EndOfDayRate, and TimeKey input
columns are mapped correctly to the destination columns. If same-named columns
are mapped, the mapping is correct.
10.
Click OK.
11.
Right-click the Sample OLE DB Destination destination and click Properties.
12.
In the Properties window, verify that the LocaleID property is set to English (United
States) and theDefaultCodePage property
is set to 1252.
Step-8: Making lesson 1
package easier to understand
Now that you have completed the configuration of the Lesson 1 package, it is a good idea to tidy up the package layout. If the shapes in the control and data flow layouts are random sizes, or if the shapes are not aligned or grouped, the functionality of package can be more difficult to understand.
Business
Intelligence Development Studio provides tools that make it easy and quick to
format the package layout. The formatting features include the ability to make
shapes the same size, align shapes, and manipulate the horizontal and vertical
spacing between shapes.
Another
way to improve the understanding of package functionality is to add annotations
that describe package functionality.
In
this task, you will use the formatting features in Business Intelligence
Development Studio to improve the layout of the data flow and also add an
annotation to the data flow.
To
format the layout of the data flow
1.
If the Lesson 1 package is not already open, double-click Lesson
1.dtsx in Solution Explorer.
2.
Click the Data Flow tab.
3.
Place the cursor to the top and to the right of the Extract
Sample Currency transformation, click, and then drag the cursor across all the
data flow components.
All
data flow components are now selected. The shape selected first—the shape whose
selected indicators are white in color—dictates the size and location that is
used in formatting the layout.
4.
On the Format menu,
point to Make
Same Size, and then click Both.
5.
With the data flow objects selected, on the Format menu, point to Align,
and then click Lefts.
To
add an annotation to the data flow
1.
Right-click anywhere in the background of the data flow design
surface and then click Add Annotation.
2.
Type or paste the following text in the annotation box.
The data flow extracts data from a file, looks up values in the
CurrencyKey column in the DimCurrency table and the TimeKey column in the
DimTime table, and writes the data to the FactCurrencyRate table.
To
wrap the text in the annotation box, place the cursor where you want to start a
new line and press the Ctrl and Enter keys.
If
you do not add text to the annotation box, it disappears when you click outside
the box.
Step-9: Testing the lesson 1
tutorial package
In this lesson, you have done the following tasks:
- Created a new SSIS project.
- Configured the connection
managers that the package needs to connect to the source and destination
data.
- Added a data flow that takes
the data from a flat file source, performs the necessary Lookup
transformations on the data, and configures the data for the destination.
Your
package is now complete! It is time to test your package.
Before
you test the package you should verify that the control and data flows in the
Lesson 1 package contain the objects shown in the following diagrams.
Control
Flow
Data
Flow
To run the Lesson 1 tutorial package
1. On
the Debug menu, click Start Debugging.
The
package will run, resulting in 1097 rows successfully added into the FactCurrencyRate fact
table in AdventureWorksDW.
2. After
the package has completed running, on the Debug menu,
click Stop Debugging.
No comments:
Post a Comment