Category: Indexes

The tipping point and covering indexes

So you’re tuning a slow query and you create a new index to eliminate a table or clustered index scan.  In your testing you find that it works well for some search arguments, but is ignored by the Optimizer for others.  You notice that the “ignored” queries tend to return larger record sets.  What’s going on here and how do you persuade the Optimizer to use your index (spoiler alert:  you don’t!).

Odds are good that you’ve created a non-covering index.  This is simply an index that doesn’t have all of the columns necessary to execute a query.    Quick quiz…

Which (if any) of the queries below are covered by this index?

CREATE INDEX ix_LastName_FirstName__MiddleName
 ON Person.Person (LastName, FirstName) INCLUDE (MiddleName);
  1. SELECT Title, FirstName, LastName FROM Person.Person WHERE LastName = N’Martinez’;
  2. SELECT * FROM Person.Person WHERE LastName LIKE N’M%’;
  3. SELECT FirstName, LastName FROM Person.Person WHERE LastName = N’Martinez’;
  4. SELECT FirstName, LastName FROM Person.Person WHERE FirstName = N’Daniel’;

The answer is both C and D.  In each case, the index contains all the data necessary to execute the query.  You can see that in the respective query plans:

Note:  Even though we need to scan the nonclustered index to find the Daniels, it’s still a net win over the cost of having to scan the clustered index (107 vs 3821 logical I/Os).

In the absence of a covering index (queries A and B), the Optimizer must choose between index seeks plus key lookups or a clustered index scan as illustrated below:

And its decision is driven primarily by I/O cost which in turn depends on row counts and index structure.

B-tree indexes (for both clustered and nonclustered indexes) are built of 8K pages arranged in levels.  Each level is an index for the layer immediately below it, and there’s always a single page at the top (or root).  You can find out how deep and wide your indexes are using the sys.dm_db_index_physical_stats() function:

SELECT i.name, s.index_type_desc, s.alloc_unit_type_desc, s.index_depth,
   s.index_level, s.page_count
FROM sys.indexes i
    CROSS APPLY sys.dm_db_index_physical_stats ( DB_ID (), i.object_id, i.index_id, NULL, 'DETAILED' ) s
WHERE i.object_id = OBJECT_ID ( 'Person.Person' ) AND
 i.index_id IN ( 1, 2 ) AND
s.alloc_unit_type_desc = N'IN_ROW_DATA';

Be careful using DETAILED mode – it requires that the full index be read.  That’s a lot of expensive, slow I/O for big tables/indexes.  LIMITED mode returns less information, but is a safer option for production systems.

The query returned data on two indexes on the Person table – its clustered index and a nonclustered index.  The clustered index (blue in the diagram below) is 3 levels deep and 3811 pages wide at the leaf level (level = 0).  The nonclustered index (green) is both shallower (2 levels) and narrower (105 pages at the leaf level):

An Index Seek starts at the topmost root page then traverses any intermediate layers to get to the leaf level pages where the column data you’re after resides.  The cost of each seek is the depth of the index.  [Note that deeper indexes (larger tables and/or wider index keys) will have a higher seek costs.]  Once we get to the leaf level, we might scan additional pages laterally to return a range of data (e.g., all the persons whose last name is Martinez). 

An Index Scan just reads through all the leaf level pages searching for data that matches your filter criteria.  The cost is the number of pages at the leaf level.

To execute the this query using our nonclustered (non-covering) index…

SELECT Title, FirstName, LastName FROM Person.Person WHERE LastName = N'Martinez'; 

…the Optimizer first performs a seek + range scan on the nonclustered index to gather the first and last names plus the clustered index keys for all the folks named Martinez.  Then, one at a time, it performs seeks into the clustered index (here referred to as key lookups) to retrieve their titles.  The kicker is that each individual key lookup costs 3 logical reads, so to return 173 rows the cost is in the ballpark of 173 * 3 = 519 logical I/Os.  And as the row counts increase the I/O costs increase 3X faster.  As we approach 1300 rows we get to a point where it would be cheaper to simply scan the clustered index once (at a cost of ~3800 logical reads) and be done with it.

And this is why the Optimizer stops using your non-covering index – it’s simply too costly to use it.  The point where this transition happens is called the tipping point

You can’t predict where exactly this switch will happen.  There’s lots of secret, internal magic around determining tipping points, but per Kimberly Tripp a ballpark figure is when you’re reading around 30% of the pages in the table.

If you’re feeling adventurous you can go hunting for tipping points using an approach like this:

SET STATISTICS IO ON;

-- Query an ever widening range of indexed values
SELECT Title, FirstName, LastName FROM Person.Person WHERE LastName LIKE N'D%';            
/*  556 rows; 1717 logical reads */

SELECT Title, FirstName, LastName FROM Person.Person WHERE LastName LIKE N'[DE]%';            
/*  743 rows; 2291 logical reads */

/* The tipping point is here.  The queries above use key lookups, those below a clustered index scan */

SELECT Title, FirstName, LastName FROM Person.Person WHERE LastName LIKE N'[D-F]%';  
/* 1111 rows; 3821 logical reads */

SELECT Title, FirstName, LastName FROM Person.Person WHERE LastName LIKE N'[D-G]%';  
/* 2465 rows; 3821 logical reads */

Somewhere north of 743 rows the key lookup I/O cost becomes high enough that the Optimizer switches to using a clustered index scan. 

The tipping point determination isn’t always spot on, but it will be close enough most of the time, so generally speaking, you shouldn’t be overriding the Optimizer’s judgement, although you can.

So what if you really want the Optimizer to use your lovely index?  You can literally “force” the issue with a Table Hint, but it’s probably not a good idea as it’s likely to increase your I/O costs – and more I/O means slower queries.  Let’s use the FORCESEEK hint on the last couple of queries where the Optimizer chose to use a clustered index scan:

SELECT Title, FirstName, LastName FROM Person.Person WITH (FORCESEEK) WHERE LastName LIKE N'[D-F]%';   
/* 1111 rows; 3420 logical reads */
SELECT Title, FirstName, LastName FROM Person.Person WITH (FORCESEEK) WHERE LastName LIKE N'[D-G]%';   
/* 2465 rows; 7575 logical reads */

In the first instance, the I/O costs are actually a bit lower, but, most of the time this won’t be the case.  Note that the I/O cost for the 2465 row query is nearly double that when the Optimizer was allowed to do its thing.

A better solution here is to create a covering index which will eliminate both key lookups and clustered index seeks!  Since indexes are costly to create and maintain, focus on creating indexes to support your most important queries well.  Let’s create an index that covers that last set of test queries then rerun them:

CREATE INDEX ix_LastName_FirstName__Title 
ON Person.Person (LastName, FirstName) INCLUDE (Title);

SELECT Title, FirstName, LastName FROM Person.Person WHERE LastName LIKE N'D%';            
/*  556 rows; 6 logical reads */

SELECT Title, FirstName, LastName FROM Person.Person WHERE LastName LIKE N'[DE]%';            
/*  743 rows; 7 logical reads */

SELECT Title, FirstName, LastName FROM Person.Person WHERE LastName LIKE N'[D-F]%';  
/* 1111 rows; 9 logical reads */

SELECT Title, FirstName, LastName FROM Person.Person WHERE LastName LIKE N'[D-G]%';  
/* 2465 rows; 16 logical reads */

Now all 4 can be executed with index seeks and the I/O cost plummets greatly improving performance!

Happy Exploring!

I’d like to grow my readership. If you enjoyed this blog post, please share it with your friends!

How to identify redundant/duplicate indexes in SQL Server

What are redundant indexes in SQL Server?

Identifying redundant indexes is a very important task for a DBA. I personally do not find any reason of having redundant indexes on databases. Otherwise, it is an over head to the system like maintaining the index/storage space etc.

The challenge is to identify a redundant/duplicate index in SQL Server? If we do not understand the meaning of redundant/duplicate, then we might end up with removing important indexes and it can lead to performance degradation.

I would like to list out few considerations to categorize indexes as redundant; if:

1. An index has Same key columns in the same order with another index

2. An index has Key columns those are left based subset of another index

3. Meeting all of the above, ONLY for similar index types.(Clustered and no clustered are not to be considered as duplicate indexes)

4. Meeting all of the above, And the Key Columns specified with same ordering (ASC/DESC)

The below query does not identify the duplicate index, but it gives you enough information with which we can easily identify the duplicate indexes in your system.

Note:(String_AGG is new function in SQL Server 2017)

Select Object_Schema_name(ix.object_id) Schema_Name,
		Object_name(ix.object_id) Object_Name,ix.name,ix.type_desc ,
		string_agg(Cast(c.name as nvarchar(MAX)) + ' (' + case when is_descending_key = 0  then 'ASC' Else 'DESC' END + ') ',',') 
		within group(Order by ixc.key_ordinal asc) As KeyCols
From
	sys.indexes ix
	inner join sys.index_columns ixc on ix.index_id = ixc.index_id and ix.object_id = ixc.object_id
	inner join sys.columns c on ixc.object_id = c.object_id and c.column_id = ixc.column_id 
Where objectpropertyex(ix.object_id,'IsMSShipped') =0 
		And ixc.is_included_column=0 and ix.index_id <> 1 
		Group by ix.object_id,ix.name,ix.type_desc
Order by 1 asc,2 asc

I’d like to grow my readership. If you enjoyed this blog post, please share it with your friends!

How to identify non-indexed foreign keys in SQL Server

Here is a script to identify foreign keys without any supporting index in SQL Server.

The script is developed on an assumption to verify ONLY the leading column of the index and the foreign key. This is to avoid any duplicate indexes that could create without a detailed analysis.


;WITH CTE AS 
(
	SELECT A.NAME,SCHEMA_NAME(A.SCHEMA_ID) SCHEMA_NAME,OBJECT_NAME(B.REFERENCED_OBJECT_ID) PARENT_TABLE_NAME,OBJECT_NAME(B.PARENT_OBJECT_ID) CHILD_TABLE_NAME,
	B.PARENT_OBJECT_ID,B.PARENT_COLUMN_ID,C.COLUMN_NAME COLUMNNAME
	,ROW_NUMBER()OVER(PARTITION BY A.NAME,A.SCHEMA_ID,B.REFERENCED_OBJECT_ID,B.PARENT_OBJECT_ID ORDER BY C.ORDINAL_POSITION ASC) RN
	FROM SYS.FOREIGN_KEYS A
	INNER JOIN SYS.FOREIGN_KEY_COLUMNS B ON A.OBJECT_ID = B.CONSTRAINT_OBJECT_ID
	INNER JOIN INFORMATION_SCHEMA.COLUMNS C ON C.COLUMN_NAME=COL_NAME(B.PARENT_OBJECT_ID,B.PARENT_COLUMN_ID) AND OBJECT_NAME(B.PARENT_OBJECT_ID)=C.TABLE_NAME 
)
SELECT A.NAME FOREIGNKEY_NAME,A.SCHEMA_NAME,A.PARENT_TABLE_NAME,A.CHILD_TABLE_NAME,A.COLUMNNAME FORIEGNKEY_LEADCOLUMN
FROM CTE A
LEFT JOIN SYS.INDEX_COLUMNS D ON D.OBJECT_ID = A.PARENT_OBJECT_ID AND D.COLUMN_ID = A.PARENT_COLUMN_ID AND KEY_ORDINAL=1 
LEFT JOIN SYS.INDEXES D1 ON D.OBJECT_ID = D1.OBJECT_ID AND D.INDEX_ID=D1.INDEX_ID
LEFT JOIN SYS.COLUMNS E ON E.COLUMN_ID=D.COLUMN_ID AND E.OBJECT_ID=D.OBJECT_ID  
WHERE D.COLUMN_ID IS NULL AND A.RN=1
ORDER BY CHILD_TABLE_NAME ASC

Extending sp_helpindex to get more useful information about index in SQL Server

This post is to help people who had tough time to get information about index and its included columns in SQL Server. Whenever I need, I always depend on object explorer and get the definition of the index.So I thought of compiling a snippet as below that can provide these information handy.

I created a procedure called – sp_helpindexExtended as below and the Tablename as parameter(this is an optional parameter).

Procedure Definition


Create Proc sp_helpindexExtended (@TableName sysname = NULL) 
as 
Begin 
	  ;With cte as 
	  (
		 Select Object_name(A.object_id) objectname,A.name index_name,
			Lower(type_desc) + Case When IS_Unique =1 then ', unique' Else '' end +
			Case when is_primary_key =1 Then ', primary key' Else '' End+
			Case when IS_Unique_Constraint = 1 Then ',unique key' Else '' End  index_description,
			Case when is_included_column=0 then c.name  + '('+
            Case When System_type_id = User_Type_id Then Type_Name(System_type_id) Else Type_Name(User_Type_id) End +')'  Else NULL end ColumnName,
            Case when is_included_column=1 then c.name  + '('+
            Case When System_type_id = User_Type_id Then Type_Name(System_type_id) Else Type_Name(User_Type_id) End +')'  Else NULL end IncludedColumnnames
            ,Is_Unique,Is_Primary_Key,IS_Unique_Constraint,Fill_factor,key_ordinal
    From sys.indexes A  
    Inner Join sys.index_columns B On A.object_id = B.object_id And A.index_id = B.index_id 
    Inner Join sys.columns C On c.object_id = B.object_id  And C.column_id  = B.column_id 
	Where A.Object_ID = Case when @TableName is not null Then OBJECT_ID(@TableName) Else A.object_id End
	)	SELECT objectname,index_name,index_description,
       stuff(( SELECT ','+ColumnName AS [text()] FROM cte p2
          WHERE p2.objectname = p1.objectname and p2.index_name = p1.index_name
          ORDER BY key_ordinal asc FOR XML PATH('') ), 1, 1,'') AS index_keys,
		stuff( ( SELECT IsNull(','+IncludedColumnnames ,'')  FROM cte p2
          WHERE p2.objectname = p1.objectname and p2.index_name = p1.index_name
          ORDER BY key_ordinal asc FOR XML PATH('') ), 1, 1,'')  AS included_keys
		  ,fill_factor 
      FROM cte p1 GROUP BY objectname,index_name ,index_description,fill_factor 
	 order by objectname asc
End

Procedure Usage

-- To get for all tables
exec sp_helpindexextended
-- To get only for a table
exec sp_helpindexextended 'tablename'