7 March 2015

Contained Databases in SQL Server 2012

Problem
While looking through the new features and improvements in SQL Server 2012, we found a potentially interesting feature called Contained Databases. A contained database basically includes all database settings and the metadata within itself thereby resulting in no configuration dependencies on the instance of the SQL Server Database Engine where the database is actually installed. Users will be able to connect to a contained database without authenticating a login at the Database Engine level. This feature really helps to isolate the database from the Database Engine thereby making it possible to easily move the database from one instance of SQL Server to another. In this tip we will take a look at how to configure and use this feature of SQL Server 2012.
Solution
Let us go through an example to configure a Contained Database in SQL Server 2012. Below are the steps to configure and use this feature.
1. Enable Contained Database Feature in SQL Server 2012 Instance Level
2. Create a Contained Database in SQL Server 2012
3. Create an SQL Server User to Access the Contained Database
4. Login and Verify the User Permission on the Contained Database

Enable Contained Database Feature in SQL Server 2012 Instance Level
Execute the T-SQL code below to enable the Contained Database feature at the SQL Server instance level.
Use master
GO
sp_configure 'show advanced options', 1
GO
RECONFIGURE WITH OVERRIDE
GO 
sp_configure 'contained database authentication', 1
GO
RECONFIGURE WITH OVERRIDE
GO 
sp_configure 'show advanced options', 0 
GO
RECONFIGURE WITH OVERRIDE 
GO


Create a Contained Database in SQL Server 2012 Using SQL Server Management Studio
Once the Contained Database Feature is enabled at the SQL Server instance level, then create a Contained Database.
1. In the Object Explorer, right click the Databases and select New Database... option from the drop down list.
2. In the General Page you can provide the name of the database as DemoContainedDB and then click on Optionspage on the left side pane.
3. In the Options Page, to create a Contained Database you need to choose the value as Partial in the Other Options drop down list as highlighted in the screen shot below.
Create a Contained Database in SQL Server Denali Using SQL Server Management Studio
4. Finally, click OK to create the Contained Database.

Create a Contained Database in SQL Server 2012 Using T-SQL code
You can create a Contained Database by executing the T-SQL code below.
Use master
GO
CREATE DATABASE [DemoContainedDB]
CONTAINMENT = PARTIAL
ON PRIMARY 
( 
NAME = N'DemoContainedDB', 
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\DemoContainedDB.mdf' , 
SIZE = 3072KB , 
FILEGROWTH = 1024KB 
)
LOG ON 
( 
NAME = N'DemoContainedDB_log', 
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\DemoContainedDB_log.ldf' , 
SIZE = 1024KB , 
FILEGROWTH = 10%
)
GO


Create a SQL Server User to Access the Contained Database
Once the contained database is created successfully, next create an SQL Server or Windows User, which will have access to the newly created contained database. Follow these steps to create a SQL Server User within the contained database.
1. In the Object Explorer, expand Databases, expand DemoContainedDB Database which was created earlier, expand Security and then right click Users and choose New User... option from the drop down list as shown in the screen shot below.
create a sql server user to access the database
2. In Database User - New window, you need to choose the User Type value as SQL User with Password and provide the SQL Server User Name and Password. You also have the option to select the Default language andDefault schema. Finally, to create the user click the OK button.
used type value as sql user with password
3. You can also create the user by executing the T-SQL code below.
USE [DemoContainedDB]
GO
CREATE USER [ContainedDBUser] WITH PASSWORD=N'Contained$DBUser@5', 
 DEFAULT_LANGUAGE=[English], 
 DEFAULT_SCHEMA=[dbo]
GO


Login and Verify the User Permissions on a Contained Database
Let's try connecting to the newly created Contained Database using the SQL Server Login.
1. In the Login tab you need to provide the Server name, choose the Authentication type as "SQL Server Authentication" and specify the Login and Password.
sql server authentication
2. In Connection Properties tab, you need to specify the database name as highlighted in the screen shot below in the Connect to database option and then click the Connect button to get connected to the Contained Database.
specify the database name
3. In the screen shot below, you can see that you were able to successfully get connected to the Contained Databaseusing the newly created SQL Server Login which doesn't have access to the Database Engine.
using the newly created sql server login
4. You can also see in the screen shot above that the SQL Server Login (Contained User) only has access to the newly Contained Database.
5. Now let us try connecting to the SQL Server Instance using System Administrator (SA) privileges. Here you will be able to see all the databases. However, you will not be able to see the Contained Database User under Server Logins as shown in the screen shot below.
connecting the sql server instance using sa privilages


































Please refer MSSQL Tips blog







Full Text Search in SQL server

ProblemI have heard about Full Text Search and I know it has been around for the last few versions of SQL Server.  I have never really used it and I have seen some of the other tips related to the technology on your web site, so I assume someone must be using it.  Can you provide any use cases for Full Text Search?  I think I understand the general premise, but not sure if it is any quicker than just using a LIKE statement or how I would need to change my queries.
SolutionYes - You are correct, Full Text Search has been an available feature in the last few versions of SQL Server and does provide another means of data access.  In general, you need to setup a new catalog for table and column you would like to use and then change some of the syntax in your queries to use the Full Text Catalog.  Here is some information to get you started:
Use Case for Full Text Search
On a recent project, one of the needs was to search by a few different sets of criteria which were in a number of different tables with a number of one to many relationships.  Most of the tables had millions of rows of data with the largest table having over 12 million rows.  The data was primarily read-only and updated on a monthly basis.  Unfortunately, all of the queries had to access the 12 million row table with 1 to 5 joins to the child tables and 1 to 5 WHERE clauses in the queries.  The data needed to be returned as quickly as possible with the highest level of concurrency based on the existing hardware resources.
In this scenario, a number of test cases were built with a few different T-SQL coding techniques to include:
Based on the testing conducted (queries, data, concurrency, etc.), the EXCEPT and INTERSECT syntax was the best T-SQL option for a single JOIN and/or a single WHERE clause statement, which was considered a rare condition.  The best T-SQL option for a numerous JOIN and numerous WHERE clause statements, which was considered the norm, was the Full Text Catalog with the CONTAINS command.  This was a little counter intuitive initially, but the Full Text Catalog was the most efficient as we tested each scenario and reviewed the cost of the query plans.  However, this may not be the case for your data and queries, so test thoroughly.
*** NOTE *** - As a point of reference, the testing was conducted with the DBCC DROPCLEANBUFFERS command issued in between statement executions to ensure cached results would not unfairly benefit subsequent queries.
Setting up a Full Text Catalog
Since, Full Text Catalogs may be new to some DBAs and Developers, let's walk through an example of setting up a Full Text Catalog for the AdventureWorks database (SQL Server 2005 sample database):
Creation - To create a new Full Text catalog navigate to root | Databases | AdventureWorks (Database Name) | Storage | Full Text Catalogs.  To create the catalog, right click on the Full Text Catalogs folder and select the 'New Full-Text Catalog...' option.
On the New Full-Text Catalog interface the following settings should be configured:
  • Full-text catalog name
  • Catalog location
  • Filegroup
  • Owner
  • Accent Sensitivity
Corresponding Script
USE [AdventureWorks]
GO
CREATE FULLTEXT CATALOG [zProduction.Product]IN PATH N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData'WITH ACCENT_SENSITIVITY = ON
AUTHORIZATION
 [dbo]
GO
Configuration - To configure the Full Text Catalog (tables, columns, schedule, etc) navigate to root | Databases | AdventureWorks (Database Name) | Storage | Full Text Catalogs | Catalog Name (i.e. zProduction.Product in our example) and select the 'Properties' option. 
On the Full Text Catalog Properties interface the following settings should be configured:
  • General configurations
  • Tables\views with the corresponding columns
  • Population Schedule
General Page - This interface corresponds primarily to the catalog that was setup in the previous steps, with the ability to indicate if the catalog is the default, the catalog owner and if the catalog is accent sensitive.
Tables/Views Page - This page provides the opportunity to determine the tables, columns to include in the Full Text Catalog.  Although multiple tables and columns could be included.  In the lower portion of the interface, it is important to note the unique index for each table, the language and the track changes setting.  For more information about the track changes setting, reference the Rebuilding the Full Text Catalog section below.
Population Schedule -

Corresponding Script
USE [AdventureWorks]
GO
CREATE FULLTEXT INDEX ON [Production].[Product] KEY INDEX [PK_Product_ProductID] ON[zProduction.Product] WITH CHANGE_TRACKING AUTOGO
USE [AdventureWorks]
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ADD ([Name])GO
USE [AdventureWorks]
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ENABLE
GO

USE [msdb]
GO
DECLARE @jobId BINARY(16)EXEC msdb.dbo.sp_add_job @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product',
@enabled
=1,
@start_step_id
=1,
@description
=N'Scheduled full-text optimize catalog population for full-text catalog zProduction.Product in database AdventureWorks. This job was created by the Full-Text Catalog Scheduling dialog or Full-Text Indexing Wizard.',
@category_name
=N'Full-Text', @job_id = @jobId OUTPUT
select
 @jobId
GO

EXEC msdb.dbo.sp_add_jobserver @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product', @server_name = N'JTKLAPTOP'GO

USE [msdb]
GO
DECLARE @schedule_id int
EXEC
 msdb.dbo.sp_add_jobschedule @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product', @name=N'zProduction.Product Full Text Catalog ',
@enabled
=1,
@freq_type
=4,
@freq_interval
=1,
@freq_subday_type
=1,
@freq_subday_interval
=0,
@freq_relative_interval
=0,
@freq_recurrence_factor
=1,
@active_start_date
=20070924,
@active_end_date
=99991231,
@active_start_time
=20000,
@active_end_time
=235959 @schedule_id = @schedule_id 
OUTPUT

select @schedule_id
GO

USE [msdb]
GO
EXEC msdb.dbo.sp_add_jobstep @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product', @step_name=N'Full-Text Indexing',
@step_id
=1,
@cmdexec_success_code
=0,
@on_success_action
=1,
@on_success_step_id
=-1,
@on_fail_action
=2,
@on_fail_step_id
=-1,
@retry_attempts
=0,
@retry_interval
=0,
@os_run_priority
=0, @subsystem=N'TSQL',
@command
=N'USE [AdventureWorks]
ALTER FULLTEXT CATALOG [zProduction.Product] REORGANIZE'
,
@database_name
=N'master'
GO
Querying with the Full Text Catalog
Here are a few Full Text Catalogs query examples with the CONTAINS command for the AdventureWorks database as a point of reference:
Product ID and Product Name Selection
USE AdventureWorks;GOSELECT ProductID, [Name]FROM Production.ProductWHERE CONTAINS([Name], '"*washer*" OR "*ball*"');
GO
Description Selection
USE AdventureWorks;GOSELECT ProductDescriptionID, DescriptionFROM Production.ProductDescriptionWHERE CONTAINS(Description, '"*technology*" OR "*performance*"');GO
Rebuilding the Full Text Catalog
One item to keep in mind is that the Full Text Catalogs need to be rebuilt automatically, on a schedule or on an as needed basis in order to include all of the updated data since the initial population.  Please reference the options below to rebuild the Full Text Catalogs based on the Tables/Views Page interface shown above:
  • Automatic - The data in the full-text index is automatically updated as the data in the corresponding table is inserted, updated or deleted. 
  • Manual - In this scenario, when the indexed data is inserted, updated or deleted, SQL Server will track the change, but not update the index as is the case with the automatic option. A SQL Server Agent job needs to run in order to update the index.
  • Do not track changes - When the indexed data is inserted, updated or deleted, SQL Server will not track the change and the index must be rebuilt to reflect all of the underlying data changes.
Each of these options need to be understood in terms of data availability versus performance.  If the data in the Full Text catalog always needs to be up to date at any cost, then use the automatic setting.  If the data changes need to be recorded, but not update the Full Text indexes then the manual setting should be used. This setting should balance some of the performance needs by rebuilding the Full Text indexes during a low usage period.  If the data is updated in bulk on a regular basis, then not tracking changes is probably appropriate with the practice that the Full Text index will need to be updated following the bulk loading of the data.

Please refer MSSQL Tips blog.

Columnstore index feature in SQL Server 2012

Problem
A new feature in SQL Server 2012 is the Columnstore Index which can be used to significantly improve query performance. In this tip we will take a look of how it works and how we can use it.
Solution
There are two types of storage available in the database; RowStore and ColumnStore.
In RowStore, data rows are placed sequentially on a page while in ColumnStore values from a single column, but from multiple rows are stored contiguously. So a ColumnStore Index works using ColumnStore storage.
column store versus row store in SQL Server
Now let's show how we can create a ColumnStore Index and how performance can be improved.

Creating a Column Store Index

Creating a ColumnStore Index is the same as creating a NonClustered Index except we need to add the ColumnStore keyword as shown below.
The syntax of a ColumnStore Index is:
CREATE NONCLUSTERED COLUMNSTORE INDEX ON Table_Name (Column1,Column2,... Column N)

Performance Test

I used the AdventureWorks sample database for performing tests.
--Create the Test Table
USE [AdventureWorks2008R2]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Test_Person](
 [BusinessEntityID] [int] NOT NULL,
 [PersonType] [nchar](2) NOT NULL,
 [NameStyle] [dbo].[NameStyle] NOT NULL,
 [Title] [nvarchar](8) NULL,
 [FirstName] [dbo].[Name] NOT NULL,
 [MiddleName] [dbo].[Name] NULL,
 [LastName] [dbo].[Name] NOT NULL,
 [Suffix] [nvarchar](10) NULL,
 [EmailPromotion] [int] NOT NULL,
 [AdditionalContactInfo] [xml](CONTENT [Person].[AdditionalContactInfoSchemaCollection]) NULL,
 [Demographics] [xml](CONTENT [Person].[IndividualSurveySchemaCollection]) NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
 [ModifiedDate] [datetime] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
-- We Populated this table with the Data Stored in Table Person.Person.
-- As we need Plenty of data so we ran the loop 100 times.
INSERT INTO [dbo].[Test_Person] 
SELECT P1.*
FROM Person.Person P1
GO 100
-- At this point we have 1,997,200 rows in the table.
-- Create Clustered Index  on Coloun [BusinessEntityID] 
CREATE CLUSTERED INDEX [CL_Test_Person] ON [dbo].[Test_Person]
( [BusinessEntityID])
GO
-- Creating Non - CLustered Index on 3 Columns
CREATE NONCLUSTERED INDEX [ColumnStore__Test_Person]
ON [dbo].[Test_Person]
([FirstName] , [MiddleName],[LastName])

-- Creating Non - CLustered  ColumnStore Index on 3 Columns
CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person]
ON [dbo].[Test_Person]
([FirstName] , [MiddleName],[LastName])
At this point we have created the ColumnStore Index on our test table. Now we will run the SELECT query with and without the ColumnStore Index and analyze performance.

Query Without ColumnStore Index

select [LastName],Count([FirstName]),Count([MiddleName])
from dbo.Test_Person 
group by [LastName]
Order by [LastName]
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
We have used the OPTION(IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) query hint to not to use the ColumnStore Index this time.

Query With ColumnStore Index

select [LastName],Count([FirstName]),Count([MiddleName])
from dbo.Test_Person 
group by [LastName]
Order by [LastName]
Here are the Actual Execution Plans for both queries:
sql server query plan for non columnstore query

sql server query plan for columnstore query
We can see the cost when using the NonClustered Index is 59 % while using the ColumnStore index is 13%.
Now if we hover the mouse over the Index Scans we can see details for these operations.  The below is a comparison:
execution plan output for columnstore query
It is clear from the results that the query performs extremely fast after creating the ColumnStore Index as the column needed for the query is stored in the same page and the query does not have to go through every single page to read these columns.

Performing INSERT, DELETE or UPDATE Operations

We cannot perform DML ( Insert\ Update \ Delete ) operations on a table having a ColumnStore Index, because this puts the data in a Read Only mode. So one big advantage of using this feature is a Data Warehouse where most operations are read only.
For example, if you perform a DELETE operation on a table with a ColumnStore Index you will get this error:
 Msg 35330, Level 15, State 1, Line 1
DELETE statement failed because data cannot be updated 
in a table with a columnstore index. Consider disabling the 
columnstore index before issuing the DELETE statement, 
then rebuilding the columnstore index after DELETE is complete.
However, to perform the operation we would need to disable the ColumnStore Index before issuing the command as shown below:
ALTER INDEX 'Index Name' on 'table name' DISABLE

Creating a ColumnStore Index using Management Studio

Right click and select New Index and select Non-Clustered Columnstore Index...
create columnstore index using SSMS
Click add to add the columns for the index.
create columnstore index using SSMS select columns
After selecting the columns click OK to create the index.
save columnstore index using SSMS

Limitations of a ColumnStore Index

  1. It cannot have more than 1024 columns.
  2. It cannot be clustered, only NonClustered ColumnStore indexes are available.
  3. It cannot be a unique index.
  4. It cannot be created on a view or indexed view.
  5. It cannot include a sparse column.
  6. It cannot act as a primary key or a foreign key.
  7. It cannot be changed using the ALTER INDEX statement. You have to drop and re-create the ColumnStore index instead. (Note: you can use ALTER INDEX to disable and rebuild a ColumnStore index.)
  8. It cannot be created with the INCLUDE keyword.
  9. It cannot include the ASC or DESC keywords for sorting the index.
Please refer MSSQL Tips blog

Understanding SQL Server ColumnStore Indexes


Problem
The amount of data in data warehouses is growing rapidly. At the same time, the query performance against these same data warehouses is degrading. I heard SQL Server 2012 introduces a new way of creating/managing/storing indexes which improves the performance of these common data warehousing queries several fold, in some cases 10 to 100 times faster. So what is this new feature? How can you build it? How does it get stored by SQL Server and how does it improve the performance of common data warehousing queries?  Read this tip to learn more.


Solution
ColumnStore Indexes are the functionality which was introduced with SQL Server 2012 which intends to improve the performance of common data warehousing queries. This new functionality includes ColumnStore Indexes and Batch mode (vector based) query processing/execution capabilities. The later one is used by the query optimizer internally to process the request in batches when the query execution plan uses at least one ColumnStore Index. The processing of a batch at a time speeds up joins, filtering and aggregations (in comparison with row at a time processing) and better utilizes today's multicore/multi-processors hardware. The choice to use either batch mode query processing or traditional row mode query processing is determined by the query optimizer.

Understanding SQL Server ColumnStore Indexes

The syntax for creating a ColumnStore Index is not much different from creating traditional Row Store/B-Tree indexes, but the difference lies in the way data gets stored/accessed in the pages on disk. The traditional Row Store Index gets stored in a B-Tree structure and data from all the columns of the index (of the row) are stored continuously on the same page (applies to heap as well).  This type of index is good in cases (OLTP) where you use predicates to filter the data or scan data from all the columns of the indexes, but for OLAP systems it poses some problems such as:
  • In the case of a Row Store Index, data from all the columns of the rows are stored together on the same page and hence it has a small ratio of redundancy on each page (data from different columns normally would not be similar), meaning compression would not be beneficial.
  • In the case of a Row Store Index, data from all the columns of the rows are stored together on the same page and no matter if you are selecting all the columns of the index or only few, it pulls out pages containing data of all the columns of the index into memory and yielding a significantly higher IO ratio, especially in case of a fact table which has dozens of columns and only few of them are normally referenced in the query.
ColumnStore Indexes also store each column data in separate pages (column wise fashion) rather than the traditional Row Store Index, which stores data from all the columns of a row together contiguously (row wise fashion), this way when you query. If your query only selects a few columns of the index, it reads less pages as it needs to read data of selected columns only and improves the performance by minimizing the IO cost.
Compression can be more effective for a ColumnStore index as data from each column is stored separately and redundancy (repetitive values) of the data in each column in each page is high. This means highly compressed pages takes less IO to bring the data into memory and effectively reduce query response time, especially for subsequent query execution.
As data gets stored in column wise fashion, your query selecting only few columns will reference only pages that contain data of those columns, which would normally be 10%-20% in a typical data warehouse scenario. This means you are reducing the IO by 80%-90%.
Please note, ColumnStore Indexes might be good when scanning and aggregating data in large fact tables involving star schema joins, whereas Row Store Indexes may offer better query performance for very selective queries, such as queries that lookup a single row or a small range of rows. In addition, ColumnStore Indexes are created to accelerate common data warehouse queries and would not be suitable for OLTP workloads.
Row store versus columnstore indexes
Row Store Index values from all the columns of the rows are stored together on the same page as shown in the top portion of the image whereas with ColumnStore Indexes values from a single column (multiple rows) are stored together as shown on the bottom portion of the image. Image source.

Benefits of using SQL Server ColumnStore Indexes

There are several benefits of using ColumnStore indexes over Row Store Indexes as outlined below:
  • Faster query performance for common data warehouse queries as only required columns/pages in the query are fetched from disk
  • Data is stored in a highly compressed form (Vertipaq technology) to reduce the storage space
  • Frequently accessed columns (pages that contains data for these columns) remain in memory because a high ratio of compression is used in the pages and less pages are involved
  • Enhanced query processing/optimization and execution feature (new Batch Operator or batch mode processing) improves common data warehouse queries' performance

Limitations of SQL Server ColumnStore Indexes

There are several limitations of using SQL Server ColumnStore indexes over Row Store indexes including:
  • A table with a ColumnStore Index cannot be updated
  • ColumnStore index creation takes more time (1.5 times almost) than creating a B-tree index (on same set of columns) because the data is compressed
  • A table can have only one ColumnStore Index and hence you should consider including all columns or at least all those frequently used columns of the table in the index
  • A ColumnStore Index can only be non cluster and non unique index; you cannot specify ASC/DESC or INCLUDE clauses
  • Not all data types (binary, varbinary, image, text, ntext, varchar(max), nvarchar(max), etc.) are supported
  • The definition of a ColumnStore Index cannot be changed with the ALTER INDEX command, you need to drop and create the index or disable it then rebuild it
  • You can create a ColumnStore index on a table which has compression enabled, but you cannot specify the compression setting for the column store index
  • A ColumnStore Index cannot be created on view
  • A ColumnStore Index cannot be created on table which uses features like ReplicationChange TrackingChange Data Capture and Filestream

Loading data into a SQL Server table with a ColumnStore Index

As I said before, a table with ColumnStore Index is not updatable, so you will need to work out a way to update it. Luckily we have couple of options, for example, dropping/disabling index, loading data and rebuilding index again or load the data in a table, create ColumnStore Index on the table and then switch in that table in the main table, if the main table is partitioned.

Creating a SQL Server ColumnStore Index

First of all, you can have only one column store index on each table. This index should be a non clustered index that should ideally include all the columns of the table.  This is generally the best practice because all of the columns can be accessed independently from one another. The general syntax for creating ColumnStore Indexes has been provided below, but keep in mind you can also create ColumnStore Indexes using the Object Explorer in SSMS:
CREATE NONCLUSTERED COLUMNSTORE INDEX <IndexName> 
ON <TableName>
(
 Col1,
 Col2,
 ....
 ....
 Coln
)
GO
I am going to use the FactInternetSales fact table from the AdventureWorksDW2008R2 database for this demonstration. Please note, this table is not that large to demonstrate the performance gains possible with ColumnStore Indexes, but the idea is to start working with ColumnStore Indexes to understand how the query optimizer uses it for better performance and how to identify batch mode processing in the query plan. In the script below, I am creating a ColumnStore Index on the FactInternetSales table and including all of the frequently used columns from that table in the index. As I said before, you can choose to include all columns from the table and as the data gets stored in highly compressed form. 
USE AdventureWorksDW2008R2
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX CSI_FactInternetSales
ON dbo.FactInternetSales 
(
 ProductKey,
 OrderDateKey,
 DueDateKey,
 ShipDateKey,
 CustomerKey,
 PromotionKey,
 CurrencyKey,
 SalesTerritoryKey,
 SalesOrderNumber,
 SalesOrderLineNumber,
 TotalProductCost,
 SalesAmount
)
GO
Now its time to run some queries against the fact table on which we created the ColumnStore Index to see how the query optimizer uses it and how it improves the query performance. I have provided two sample queries below.  For the first one, the query optimizer will by default use the column store index since all of the required columns' data is available. The second query is same as the first one, but this time I am instructing query optimizer to ignore the ColumnStore Index with the OPTION clause as highlighted below and use the row store index instead (in this case cluster index):
SELECT F.SalesOrderNumber, F.OrderDateKey, F.CustomerKey, F.ProductKey, F.SalesAmount
FROM dbo.FactInternetSales AS F
INNER JOIN dbo.DimProduct AS D1 ON F.ProductKey = D1.ProductKey
INNER JOIN dbo.DimCustomer AS D2 ON F.CustomerKey = D2.CustomerKey


SELECT F.SalesOrderNumber, F.OrderDateKey, F.CustomerKey, F.ProductKey, F.SalesAmount
FROM dbo.FactInternetSales AS F
INNER JOIN dbo.DimProduct AS D1 ON F.ProductKey = D1.ProductKey
INNER JOIN dbo.DimCustomer AS D2 ON F.CustomerKey = D2.CustomerKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
GO
Below are the execution plans for the two queries. As you can see for the first query, the query optimizer uses the ColumnStore Index Scan operator (a brand new operator introduced in this release) whereas the second query uses Clustered Index Scan operator (as we asked explicitly to ignore column store index). Now look at the total cost of each query relative to the other one.  Even though both the queries are the same (and returns the same set of data to the client) the first one, which uses column store index, has 14% of the cost relative to the batch whereas the second one, which does not use column store index and uses cluster index instead, has 86% of the cost relative to the batch:
Here are the execution plans of the above two queries showing the performance of the ColumnStore index surpasses the Row Store index

As I said before, SQL Server has one more operation specifically for ColumnStore Indexes and this is evident when you hover your mouse over the ColumnStore Index Scan icon in the above execution plan. It will display details about this operator as shown below, notice the estimated I/O Cost for this operator and compare it with the one for Clustered Index Scan of the second query:
SQL Server ColumnStore index scan from the execution plan
We all know that the slowest part of a query execution is cost of IO (Input/Output), so the idea behind column store index, to reduce the IO cost by storing the data in column form, retrieving only those pages which contain data from the selected columns in the query and compressing the data on the page so that the total number of required pages can be reduced to a minimum. If you look at the image below, this is the IO statistics of the above two queries, the first query which uses ColumnStore Index has significantly less IO in comparison to the second query, which does not use column store index (instead it uses row store/clustered index):
IO Statistics for the two queries indicating the ColumnStore Index is more effecient than the Row Store Index
Now let's turn our attention to using aggregation. The two queries below are re-writes of the above queries, but this time I am using SUM function for summing up total sales for each day. The first query will use ColumnStore Index whereas the second query uses row store index (clustered) as we have specifically indicated to not use column store index by the query optimizer for the second query as highlighted:
SELECT F.OrderDateKey, SUM(F.SalesAmount) AS TotalSales
FROM dbo.FactInternetSales AS F
INNER JOIN dbo.DimProduct AS D1 ON F.ProductKey = D1.ProductKey
INNER JOIN dbo.DimCustomer AS D2 ON F.CustomerKey = D2.CustomerKey
GROUP BY F.OrderDateKey


SELECT F.OrderDateKey,SUM(F.SalesAmount) AS TotalSales
FROM dbo.FactInternetSales AS F
INNER JOIN dbo.DimProduct AS D1 ON F.ProductKey = D1.ProductKey
INNER JOIN dbo.DimCustomer AS D2 ON F.CustomerKey = D2.CustomerKey
GROUP BY F.OrderDateKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
GO
This time, the execution plan of the first query indicates that it uses the ColumnStore Index rather than the Row Store Index of the second query and hence the cost of the first query is 26% relative to the batch and the cost of the second query is 74% relative to the batch.  Once again the ColumnStore Index performance exceeds the traditional Row Store Indexes.
The execution plan of the first query uses the CcolumnStore Index and is more effecient than the Row Store Index in the second query

Please note the following items about ColumnStore Indexes

  • Example shown above is based on a table which has far less records as compared to production data warehouses. The expectation is that the performance improvement would be much higher than the example above for common data warehouse queries where a large amount of data scanning, data filtering and aggregation is involved. One study of an aggregation query on a 1 TB fact table with 1.44 billion records resulted in a 16 times improvement in CPU cycle and 455 times improvement in elapsed time if the query uses the ColumnStore Index.  For more information click here.
Please refer MSSQL Tips blog