Tips and Guidelines
• As a common practice,
every table should have a clustered index. Generally, but not always, the
clustered index should be on a column that monotonically increases, such as an
identity column or some other column where the value is unique. In many cases, the
primary key is the ideal column for a clustered index.
• Indexes should be measured
on all columns that are frequently used in
WHERE
, ORDER BY
, GROUP BY
, TOP
andDISTINCT
clauses.
• Do not automatically add
indexes on a table because it seems like the right thing to do. Only add
indexes if you know that they will be used by the queries run against the
table.
• For historical (static)
tables, create the indexes with a
FILLFACTOR
and a PAD_INDEX
of 100 to ensure there is no wasted
space. This reduces disk I/O, helping to boost overall performance.
• Queries that return a
single row are just as fast using a non-clustered index as a clustered index.
• Queries that return a
range of rows are just as fast using a clustered index as a non-clustered
index.
• Do not add more indexes on
your OLTP tables to minimize the overhead that occurs with indexes during data
modifications.
• Do not add the same index
more than once on a table with different names.
• Drop all those indexes
that are not used by the Query Optimizer, generally. You probably won't want to
add an index to a table under the following conditions:
·
If the index is not used by the query optimizer. Use the Query
Analyzer's "Show Execution Plan" option to see if your queries
against a particular table use an index or not.
·
If the table is small, most likely indexes will not be used.
·
If the column(s) to be indexed are very wide.
·
If the column(s) are defined as
TEXT
, NTEXT
or IMAGE
data types.
·
If the table is rarely queried but insertion, updating is
frequent.
• To provide up-to-date
statistics, the query optimizer needs to make smart query optimization
decisions. You will generally want to leave the "Auto Update
Statistics" database option on. This helps to ensure that
the optimizer statistics are valid, ensuring that queries are properly
optimized when they are run.
• Keep the "width"
of your indexes as narrow as possible. This reduces the size of the index and
reduces the number of disk I/O reads required to read the index.
• If possible, try to create
indexes on columns that have integer values instead of characters. Integer
values use less overhead than character values.
• If you have two or more
tables that are frequently joined together, then the columns used for the joins
should have an appropriate index. If the columns used for the joins are not
naturally compact, then consider adding surrogate keys to the tables that are
compact in order to reduce the size of the keys. This will decrease I/O during
the join process, which increases overall performance.
• When creating indexes, try
to make them unique indexes if at all possible. SQL Server can often search through a unique
index faster than a non-unique index. This is because, in a unique index, each
row is unique and once the needed record is found, SQL Server doesn't have to look any further.
• If a particular query
against a table is run infrequently and the addition of an index greatly speeds
the performanceof the query, but the performance of
INSERTS
, UPDATES
and DELETES
is negatively affected by the addition
of the index, consider creating the index for the table for the duration of
when the query is run and then dropping the index. An example of this is when
monthly reports are run at the end of the month on an OLTP application.
• Avoid using
FLOAT
or REAL
data types as primary keys, as they add
unnecessary overhead that can hurtperformance.
• If you want to boost the performance of
a query that includes an
AND
operator in the WHERE
clause, consider the following:
·
Of the search criteria in the
WHERE
clause, at least one of them should be
based on a highly selective column that has an index.
·
If at least one of the search criteria in the
WHERE
clause is not highly selective, consider
adding indexes to all of the columns referenced in the WHERE
clause.
·
If none of the columns in the
WHERE
clause are selective enough to use an
index on their own, consider creating a covering index for this query.
• The Query Optimizer will
always perform a table scan or a clustered index scan on a table if the
WHERE
clause in the query contains an OR
operator and if any of the referenced
columns in the OR
clause are not indexed (or do not have a
useful index). Because of this, if you use many queries with OR
clauses, you will want to ensure that
each referenced column in the WHERE
clause has an index.
• If you have a query that
uses
OR
s
and it is not making the best use of indexes, consider rewriting it as a UNION
and then testing performance. Only through testing can you be
sure that one version of your query will be faster than another.
• If you use the
SOUNDEX
function against a table column in a WHERE
clause, the Query Optimizer will ignore
any available indexes and perform a table scan.
• Queries that include
either the
DISTINCT
or the GROUP BY
clauses can be optimized by including
appropriate indexes. Any of the following indexing strategies can be used:
·
Include a covering, non-clustered index (covering the
appropriate columns) of the
DISTINCT
or the GROUP BY
clauses.
·
Include a clustered index on the columns in the
GROUP BY
clause.
·
Include a clustered index on the columns found in the
SELECT
clause.
·
Adding appropriate indexes to queries that include
DISTINCT
or GROUP BY
is most important for those queries that
run often.
• Avoid clustered indexes on
columns that are already "covered" by non-clustered indexes. A
clustered index on a column that is already "covered" is redundant.
Use the clustered index for columns that can better make use of it.
• Ideally a clustered index
should be based on a single column (not multiple columns) that are as narrow as
possible. This not only reduces the clustered index's physical size, it also
reduces the physical size of non-clustered indexes and boosts SQL Server's overall performance.
• When you create a
clustered index, try to create it as a unique clustered index, not a non-unique
clustered index.
•
SET NOCOUNT ON
at the beginning of each stored
procedure you write. This statement should be included in every stored
procedure, trigger, etc. that you write.
• Keep Transact-SQL transactions as short as possible
within a stored procedure. This helps to reduce the number of locks, helping to
speed up the overall performance of your SQL Server application.
• If you are creating a
stored procedure to run in a database other than the Master database, don't use the prefix
sp_
in its name. This special prefix is
reserved for system stored procedures. Although using this prefix will not
prevent a user defined stored procedure from working, what it can do is to slow
down its execution ever so slightly.
• Before you are done with
your stored procedure code, review it for any unused code, parameters or
variables that you may have forgotten to remove while you were making changes
and remove them. Unused code just adds unnecessary bloat to your stored
procedures, although it will not necessarily negatively affect performance of
the stored procedure.
• For best performance, all objects that are called within
the same stored procedure should be owned by the same object owner or schema,
preferably
dbo
,
and should also be referred to in the format ofobject_owner.object_name
or schema_owner.object_name
.
• When you need to execute a
string of Transact-SQL, you
should use the
sp_execute
sql stored
procedure instead of the EXECUTE
statement.
• If you use input
parameters in your stored procedures, you should validate all of them at the
beginning of your stored procedure. This way, if there is a validation problem
and the client application needs to be notified of the problem, it happens
before any stored procedure processing takes place, preventing wasted effort
and boostingperformance.
• When calling a stored
procedure from your application, it is important that you call it using its
qualified name, for example:
exec dbo.myProc
...instead of:
exec myProc
• If you think a stored
procedure will return only a single value and not a record set, consider
returning the single value as an output parameter.
• Use stored procedures
instead of views. They offer better performance.
• Don't include code,
variable or parameters that don't do anything.
• Don't be afraid to make
broad-minded use of in-line and block comments in your Transact-SQL code. They will not affect the performance of
your application and they will enhance your productivity when you have to come
back to the code and try to modify it.
• If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce theperformance and scalability of your applications.
• If you have the choice of
using a join or a sub-query to perform the same task within a query, generally
the join is faster. This is not always the case, however, and you may want to
test the query using both methods to determine which is faster for your
particular application.
• If your application
requires you to create temporary tables for use on a global or per connection
use, consider the possibility of creating indexes for these temporary tables.
While most temporary tables probably won't need -- or even use -- an index,
some larger temporary tables can benefit from them. A properly designed index
on a temporary table can be as great a benefit as a properly designed index on
a standard database table.
• Instead of using temporary
tables, consider using a derived table instead. A derived table is the result
of using a
SELECT
statement
in the FROM
clause of an existing SELECT
statement. By using derived tables
instead of temporary tables, you can reduce I/O and often boost your
application's performance.
• For better performance, if you need a temporary table in
your Transact-SQL code, consider using a table variable
instead of creating a conventional temporary table.
• Don't repeatedly reuse the
same function to calculate the same result over and over within your Transact-SQL code.
• If you use
BULK INSERT
to import data into SQL Server, then use the TABLOCK
hint along with it. This will preventSQL Server from
running out of locks during very large imports and will also boost performance due
to the reduction of lock contention.
• Always specify the
narrowest columns you can. The narrower the column, the less amount of data SQL Server has to store and the faster SQL Server is able to read and write data. In
addition, if any sorts need to be performed on the column, the narrower the
column, the faster the sort will be.
• If you need to store large
strings of data and they are less than 8000 characters, use a
VARCHAR
data type instead of a TEXT
data type. TEXT
data types have extra overhead that drag
down performance.
• Don't use the
NVARCHAR
or NCHAR
data types unless you need to store
16-bit character (Unicode) data. They take up twice as much space as VARCHAR
or CHAR
data types, increasing server I/O
and wasting unnecessary space in your buffer cache.
• If the text data in a
column varies greatly in length, use a
VARCHAR
data type instead of a CHAR
data type. The amount of space saved by
using VARCHAR
over CHAR
on variable length columns can greatly
reduce the I/O reads that the cache memory uses to hold data, improving overall SQL Server performance.
• If a column's data does
not vary widely in length, consider using a fixed-length
CHAR
field instead of a VARCHAR
.
While it may take up a little more space to store the data, processing
fixed-length columns is faster in SQL Serverthan processing variable-length columns.
• If you have a column that
is designed to hold only numbers, use a numeric data type such as
INTEGER
instead of aVARCHAR
or CHAR
data type. Numeric data types generally
require less space to hold the same numeric value than does a character data
type. This helps to reduce the size of the columns and can boost performance when
the columns are searched (WHERE
clause), joined to another column or
sorted.
• If you use the
CONVERT
function to convert a value to a
variable length data type such as VARCHAR
,
always specify the length of the variable data type. If you do not, SQL Server assumes a default length of 30.
Ideally, you should specify the shortest length to accomplish the required
task. This helps to reduce memory use and SQL Serverresources.
• Avoid using the new
BIGINT
data type unless you really need its
additional storage capacity. The BIGINT
data type uses 8 bytes of memory, versus
4 bytes for the INT
data type.
• Don't use the
DATETIME
data type as a primary key. From a performance perspective,
it is more efficient to use a data type that uses less space. For example, the DATETIME
data type uses 8 bytes of space, while
the INT
data type only takes up 4 bytes. The
less space used, the smaller the table and index, and the less I/O overhead
that is required to access the primary key.
• If you are creating a
column that you know will be subject to many sorts, consider making the column
integer-based and not character-based. This is because SQL Server can sort integer data much faster than
character data.
• Carefully evaluate whether
your
SELECT
query needs the DISTINCT
clause or not. Some developers
automatically add this clause to every one of their SELECT
statements, even when it is not
necessary. This is a bad habit that should be stopped.
• When you need to use
SELECT INTO
option, keep in mind that it can lock
system tables, preventing other users from accessing the data they need while
the data is being inserted. In order to prevent or minimize the problems caused
by locked tables, try to schedule the use of SELECT INTO
when your SQL Server is less busy. In addition, try to keep
the amount of data inserted to a minimum. In some cases, it may be better to
perform several smallerSELECT INTO
s instead of performing one large SELECT INTO
.
• If you need to verify the
existence of a record in a table, don't use
SELECT COUNT (*)
in your Transact-SQL code to identify it. This is very
inefficient and wastes server resources. Instead, use the Transact-SQL IF EXISTS
to determine if the record in question
exists, which is much more efficient.
• By default, some
developers -- especially those who have not worked with SQL Server before -- routinely include code
similar to this in their
WHERE
clauses when they make string
comparisons:SELECT column_name FROM table_name
WHERE LOWER (column_name) = 'name'
In other words, these
developers are making the assumption that the data in SQL Server is case-sensitive, which it generally
is not. If your SQL Server database is not configured to be case
sensitive, you don't need to use
LOWER
orUPPER
to force the case of text to be equal
for a comparison to be performed. Just leave these functions out of your code.
This will speed up the performance of your query, as any use of text
functions in a WHERE
clause hurtsperformance.
However, what if your database has
been configured to be case-sensitive? Should you then use the
LOWER
andUPPER
functions to ensure that comparisons are
properly compared? No. The above example is still poor coding. If you have to
deal with ensuring case is consistent for proper comparisons, use the technique
described below, along with appropriate indexes on the column in question:SELECT column_name FROM table_name
WHERE column_name = 'NAME' or column_name = 'name'
This code will run much
faster than the first example.
• If you currently have a
query that uses
NOT IN
,
which offers poor performance because the SQL Server optimizer has to use a nested table
scan to perform this activity, instead try to use one of the following options,
all of which offer better performance:
·
Use
EXISTS
or NOT EXISTS
·
Use
IN
·
Perform a
LEFT OUTER JOIN
and check for a NULL
condition
• When you have a choice of
using the
IN
or the EXISTS
clause in your Transact-SQL, you will generally want
to use the EXISTS
clause, as it is usually more efficient
and performs faster.
• If you find that SQL Server uses a
TABLE SCAN
instead of an INDEX SEEK
when you use an IN
/OR
clause as part of your WHERE
clause, even when those columns are
covered by an index, consider using an index hint to force the Query Optimizer
to use the index.
• If you use
LIKE
in your WHERE
clause, try to use one or more leading
characters in the clause, if possible. For example, use:LIKE 'm%' instead of LIKE ‘%m’
• If your application needs
to retrieve summary data often, but you don't want to have the overhead of
calculating it on the fly every time it is needed, consider using a trigger
that updates summary values after each transaction into a summary table.
• When you have a choice of
using the IN or the
BETWEEN
clauses in your Transact-SQL, you will generally want
to use the BETWEEN
clause, as it is much more efficient.
For example...SELECT task_id, task_name
FROM tasks
WHERE task_id in (1000, 1001, 1002, 1003, 1004)
...is much less efficient
than this:
SELECT task_id, task_name
FROM tasks
WHERE task_id BETWEEN 1000 and 1004
• If possible, try to avoid
using the
SUBSTRING
function in your WHERE
clauses. Depending on how it is
constructed, using the SUBSTRING
function can force a table scan instead
of allowing the optimizer to use an index (assuming there is one). If the
substring you are searching for does not include the first character of the
column you are searching for, then a table scan is performed.
• If possible, you should
avoid using the
SUBSTRING
function and use the LIKE
condition instead for betterperformance. Instead of
doing this:WHERE SUBSTRING(task_name,1,1) = 'b'
Try using this instead:
WHERE task_name LIKE 'b%'
• Avoid using optimizer
hints in your
WHERE
clauses. This is because it is generally
very hard to out-guess the Query Optimizer. Optimizer hints are special
keywords that you include with your query to force how the Query Optimizer
runs. If you decide to include a hint in a query, this forces the Query
Optimizer to become static, preventing the Query Optimizer from dynamically
adapting to the current environment for the given query. More often than not,
this hurts -- not helps -- performance.
• If you have a
WHERE
clause that includes expressions
connected by two or more AND
operators, SQL Server will evaluate them from left to right
in the order they are written. This assumes that no parentheses have been used
to change the order of execution. Because of this, you may want to consider one
of the following when using AND
:
·
Locate the least likely true
AND
expression first.
·
If both parts of an
AND
expression are equally likely of being
false, put the least complex AND
expression first.
·
You may want to consider using Query Analyzer or Management
Studio to look at the execution plans of your queries to see which is best for
your situation
• Don't use
ORDER BY
in your SELECT
statements unless you really need to, as
it adds a lot of extra overhead. For example, perhaps it may be more efficient
to sort the data at the client than at the server.
• Whenever SQL Server has to perform a sorting operation,
additional resources have to be used to perform this task. Sorting often occurs
when any of the following Transact-SQL statements
are executed:
·
ORDER BY
·
GROUP BY
·
SELECT DISTINCT
·
UNION
• If you have to sort by a
particular column often, consider making that column a clustered index. This is
because the data is already presorted for you and SQL Server is smart enough not to resort the
data.
• If your
WHERE
clause includes an IN
operator along with a list of values to
be tested in the query, order the list of values so that the most frequently
found ones are placed at the start of the list and the less frequently found
ones are placed at the end of the list. This can speed up performance because
the IN option returns true as soon as any of the values in the list produce a
match. The sooner the match is made, the faster the query completes.
• If your application
performs many wildcard (
LIKE %
) text searches on CHAR
or VARCHAR
columns, consider usingSQL Server's full-text search option. The Search
Service can significantly speed up wildcard searches of text stored in adatabase.
• The
GROUP BY
clause can be used with or without an
aggregate function. However, if you want optimumperformance, don't use the GROUP BY
clause without an aggregate function.
This is because you can accomplish the same end result by using the DISTINCT
option instead, and it is faster. For
example, you could write your query two different ways:SELECT task_id
FROM tasks
WHERE task_id BETWEEN 10 AND 20
GROUP BY OrderID
...or:
SELECT DISTINCT task_id
FROM tasks
WHERE task_id BETWEEN 10 AND 20
• It is important to design
applications that keep transactions as short as possible. This reduces locking
and increases application concurrently, which helps to boost performance.
• In order to reduce network
traffic between the client or middle-tier and SQL Server --
and also to boost your SQLServer-based
application's performance -- only the data needed by the client
or middle-tier should be returned bySQL Server. In other words,
don't return more data (both rows and columns) from SQL Server than you need to the client or
middle-tier and then further reduce the data to the data you really need at the
client or middle-tier. This wastes SQL Server resources
and network bandwidth.
• To make complex queries
easier to analyze, consider breaking them down into their smaller constituent
parts. One way to do this is to simply create lists of the key components of
the query, such as:
·
List all of the columns that are to be returned
·
List all of the columns that are used in the
WHERE
clause
·
List all of the columns used in the
JOIN
s
(if applicable)
·
List all the tables used in
JOIN
s
(if applicable)
·
Once you have the above information organized into this
easy-to-comprehend form, it is much easier to identify those columns that could
potentially make use of indexes when executed.
No comments:
Post a Comment