6 March 2015

Covering Index Performance

Query
I was recently involved in troubleshooting data warehouse queries for one of my customers who had an ETL package running longer than usual. The package was quite simple – it was inserting a large number of rows from a staging table into a large table containing hundreds of millions of rows. The initial investigation revealed that the package's OLEDB destination adapter was responsible for most of the execution time. The destination table had a number of indexes, some of them larger than 200 GB and the execution plan for the bulk inserts has showed that index updates were significantly contributing to the slowness of the inserts. 
The total execution costs for a few large non clustered indexes had more than a 40% share in the entire execution plan. We've considered disabling the largest indexes, however it turned out they were heavily used by some daily reports and disabling them could lead to significant performance impact on those reports.
Disabling large indexes prior to ETL and enabling them afterwards also was not an option, because the index rebuild required significant time and reports on those indexes needed to be run immediately after the ETL package. So, I started looking into ways to reduce index sizes, assuming that smaller indexes would require less index updates and therefore better performance for data insert and update transactions.

Solution 

Fully covering and partially covering SQL Server indexes

A closer examination has showed that even though the largest indexes had only a few key fields, they contained a high number of columns in their INCLUDED columns logic, apparently some developers created them in order to satisfy queries with large SELECT lists. For those of you who are not familiar with covering indexes, I'd recommend readingthis article by Joe Webb.
The biggest benefit of covering index is that when it contains all the fields required by query, it may greatly improve query performance and best results are often achieved by placing fields required in filter or join criteria into the index key and placing fields required in the SELECT list into the INCLUDE part of the covering index. As you can see from the sample execution plan below, the index IDX_Employees_Covering provides all the fields included in the SELECT list and therefore completely satisfies the query:
Query:
SELECT FirstName, LastName, Birthdate, DepartmentId, PositionId, 
ManagerId, Salary, Address, City, State, HiredDate
from Employees 
where DepartmentId between 10 and 70 
AND PositionId between 10 and 80
Index:
CREATE NONCLUSTERED INDEX IDX_Employees_Covering ON Employees (DepartmentId,PositionId) 
INCLUDE (FirstName,LastName,Birthdate,ManagerId,Salary,Address,City,State,HiredDate)
Figure1. Execution plan for fully covering index:





















However, in some cases a fully covering index may become partially covering, due to table/query enhancements. As you can see from the example below for the same query, index IDX_Employees_PartiallyCovering doesn't provide all the required fields and SQL Server is forced to use an additional operator, a Key Lookup in order to bring missing columns (City, HiredDate, State in this example) from the clustered index:
Index:
CREATE NONCLUSTERED INDEX IDX_Employees_PartiallyCovering 
ON Employees DepartmentId PositionId 
INCLUDE FirstName LastName Birthdate ManagerId 
Figure2. Execution plan for partially covering index:








Index optimization

In the above mentioned problem my goal was to optimize the existing indexes without compromising the performance of read queries. So, I've found all the queries from the query cache which had reference to large indexes and analyzed their execution plans, using this query described by Jonathan Kehayias in this article.  This analysis has revealed that none of the large covering indexes were fully satisfying queries, in other words they had Lookup operators in their execution plans. So I thought of removing some of the columns from the INCLUDE part of the index would not change the query since a Lookup operator had to bring more fields in anyway as shown below.
Execution plans for partially covering indexes:
















I've removed most of the fields from the INCLUDE part of the covering indexes without touching the main index and index sizes have dropped up to 40%.  As a result, query performance improvements have exceeded all my expectations- the execution costs of related SELECT queries have dropped and they ran faster. Moreover, we've gained more than 45% improvement on data warehouse bulk insert transactions. Below I've tried to reproduce the problem and provide a comparison between various indexing options.

Please refer MSSQL Tips blog

No comments:

Post a Comment