Quick and short: Database Indexes

By Ashish Khandelwal, December 17, 2009

This article is written under ‘Quick and Short’ edition. In this article I will explain about database indexes, type of indexes, best practices to use index, how to use indexes to improve performance of T-SQL (database query).

Read more Quick and Short articles.

What is index

Indexes in databases are very similar to indexes in libraries. Indexes allow locating information within a database fast, much like they do in libraries. If all books in a library are indexed alphabetically then you don’t need to browse the whole library to find particular book. Instead you’ll simply get the first letter from the book title and you’ll find this letter’s section in the library starting your search from there, which will narrow down your search significantly.

An Index can be created on a single column or a combination of columns in a database table. A table index is a database structure that arranges the values of one or more columns in a database table in specific order. The table index has pointers to the values stored in specified column or combination of columns of the table. These pointers are ordered depending on the sort order specified in the index.

Best Practices to use indexes

There are some general rules which describe when to use indexes.

  • When dealing with relatively small tables, indexes do not improve performance. In general indexes improve performance when they are created on fields used in table joins.
  • Use indexes when most of your database queries retrieve relatively small datasets, because if your queries retrieve most of the data most of the time, the indexes will actually slow the data retrieval.
  • Use indexes for columns that have many different values (there are not many repeated values within the column).
  • Although indexes improve search performance, they slow the updates, and this might be something worth considering.

 

Get best performance form indexes

Normally to get a maximum benefit (performance) from the indexes:

  • Ordering of columns in the indexes and in the where clause of select query should be same. E.g. An index on columns A, B, C can be searched efficiently on A, on A, B, and A, B, C. but will not be much efficient if column order in where clause will be C, B, A.
  • The column filter which filters maximum records should be placed first so that later condition will not be required to run on large record set. E.g. if filter conditions are “price <= 0.9999, sumvalue > 0, and item = ’123456′” then the order of conditions should be “item = ’123456′ AND sumvalue > 0 AND price <= 0.9999” because item = ’123456′ condition will filter maximum records and then the next condition will only be required to run on small set of records.
  • The filter condition which has specific value should be placed first and then the column which performs join condition should be placed.

Please remember the ordering really matters lot. Also, sometimes you may create more than one indexes on table (usually due to one is created by Primary key, and another you create based on your need). In this case, you may require to force the Query Engine to use the index which best suits for the query optimization. Because, if you will not force, the engine will automatically pick the index which it feels best (normally clustered index), this might not return the best performance. To force the query engine to use specific, use with(index([index Name])) – see below example

SELECT price, sumtotal, item
FROM  itemmaster with(index(IX_tbl_itemmaster)) ,
invoice with(index(IX_tbl_Invoice))
WHERE item = 123456
 AND price <= 0.999
 AND sumvalue > 0

Whenever you create an index, you should test the query performance. Use the Query Analyzer and run the query. Concentrate on number of “reads” query used to find the matching records, because “reads” says how many pages were scanned by engine to retrieve the records for the given conditions. Each logical “reads” represents 8KB page. For example, if the number of logical reads is 85 then SQL Server logically read 5,570,560 bytes of data (85 pages times 8192 bytes per page). Try to minimize the number of reads, the performance will automatically increase.

 

Types of Indexes

clustered index

A clustered index determines the physical order of data in a table. A clustered index is analogous to a telephone directory, which arranges data by last name. Because the clustered index dictates the physical storage order of the data in the table, a table can contain only one clustered index. However, the index can comprise multiple columns (a composite index), like the way a telephone directory is organized by last name and first name.

A clustered index is particularly efficient on columns that are often searched for ranges of values. After the row with the first value is found using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent. For example, if an application frequently executes a query to retrieve records between a range of dates, a clustered index can quickly locate the row containing the beginning date, and then retrieve all adjacent rows in the table until the last date is reached. This can help increase the performance of this type of query. Also, if there is a column(s) that is used frequently to sort the data retrieved from a table, it can be advantageous to cluster (physically sort) the table on that column(s) to save the cost of a sort each time the column(s) is queried.

Note  PRIMARY KEY constraints create clustered indexes automatically if no clustered index already exists on the table and a nonclustered index is not specified when you create the PRIMARY KEY constraint.

Alternatively, a clustered index could be created on lname, fname (last name, first name), because employee records are often grouped and queried in this way rather than by employee ID.

 

Clustered index Considerations

It is important to define the clustered index key with as few columns as possible. If a large clustered index key is defined, any nonclustered indexes that are defined on the same table will be significantly larger because the nonclustered index entries contain the clustering key. The Index Tuning Wizard does not return an error when saving an SQL script to a disk with insufficient available space.

The Index Tuning Wizard can consume significant CPU and memory resources during analysis. It is recommended that tuning should be performed against a test version of the production server rather than the production server. Additionally, the wizard should be run on a separate computer from the computer running SQL Server.

Before creating clustered indexes, understand how your data will be accessed. Consider using a clustered index for:

  • Columns that contain a large number of distinct values.
  • Queries that return a range of values using operators such as BETWEEN, >, >=, <, and <=.
  • Columns that are accessed sequentially.
  • Queries that return large result sets.
  • Columns that are frequently accessed by queries involving join or GROUP BY clauses; typically these are foreign key columns. An index on the column(s) specified in the ORDER BY or GROUP BY clause eliminates the need for SQL Server to sort the data because the rows are already sorted. This improves query performance.
  • OLTP-type applications where very fast single row lookup is required, typically by means of the primary key. Create a clustered index on the primary key.

Clustered indexes are not a good choice for:

  • Columns that undergo frequent changes: This results in the entire row moving (because SQL Server must keep the data values of a row in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile.
  • Wide keys: The key values from the clustered index are used by all nonclustered indexes as lookup keys and therefore are stored in each nonclustered index leaf entry.

 

Nonclustered Indexes

A nonclustered index is analogous to an index in a textbook. The data is stored in one place, the index in another, with pointers to the storage location of the data. The items in the index are stored in the order of the index key values, but the information in the table is stored in a different order (which can be dictated by a clustered index). If no clustered index is created on the table, the rows are not guaranteed to be in any particular order.

Similar to the way you use an index in a book, Query Engine searches for a data value by searching the nonclustered index to find the location of the data value in the table and then retrieves the data directly from that location. This makes nonclustered indexes the optimal choice for exact match queries because the index contains entries describing the exact location in the table of the data values being searched for in the queries. If the underlying table is sorted using a clustered index, the location is the clustering key value; otherwise, the location is the row ID (RID) comprised of the file number, page number, and slot number of the row. For example, to search for an employee ID (emp_id) in a table that has a nonclustered index on the emp_id column, SQL Server looks through the index to find an entry that lists the exact page and row in the table where the matching emp_id can be found, and then goes directly to that page and row.

Nonclustered Indexes Considerations

Before you create nonclustered indexes, understand how your data will be accessed. Consider using nonclustered indexes for:

  • Columns that contain a large number of distinct values, such as a combination of last name and first name (if a clustered index is used for other columns). If there are very few distinct values, such as only 1 and 0, most queries will not use the index because a table scan is usually more efficient.
  • Queries that do not return large result sets.
  • Columns frequently involved in search conditions of a query (WHERE clause) that return exact matches.
  • Decision-support-system applications for which joins and grouping are frequently required. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
  • Covering all columns from one table in a given query. This eliminates accessing the table or clustered index altogether.

 —————————————————

Readers are requested to add comment, and add more information in case they want.

Read more Quick and Short articles.

VN:F [1.7.2_963]
Rating: 5.0/5 (1 vote cast)

Leave a Reply