Filtered indexes are probably my favorite feature in SQL Server 2008. That's saying a lot, since there are so many great new features to choose from. In this post, I want to explore a little about how filtered indexes work, how they can be applied, and some of the "gotchas" to be aware of.
First, for those of you who may not yet know about filtered indexes, allow me to enlighten you. In short, a filtered index lets you index a subset of a table's rows by attaching a filtering predicate (a WHERE clause) to the index definition. Filters can only be applied to non-clustered indexes. The general syntax of a filtered index is:
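A minimal sketch of that syntax follows. Every name in it (index, schema, table, and columns) is a placeholder rather than a real object, and the INCLUDE clause is optional:

```sql
-- Placeholder names throughout; substitute your own objects.
CREATE NONCLUSTERED INDEX index_name
    ON schema_name.table_name (key_column)
    INCLUDE (included_column)       -- optional: extra columns stored at the leaf level
    WHERE filter_column <> 1;       -- the filter predicate: only matching rows are indexed
```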
For our purposes, we're going to be working with the Sales.SalesOrderDetail table in the AdventureWorks database. Let's look at a specific example. Suppose we have a query that regularly searches on the [SpecialOfferID] column.
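A query along these lines fits the description. The predicate comes from the discussion later in this post; the column list is my assumption for illustration:

```sql
USE AdventureWorks;
GO

-- Assumed column list; the filter on SpecialOfferID is what matters here.
SELECT SalesOrderID, SalesOrderDetailID
FROM Sales.SalesOrderDetail
WHERE SpecialOfferID <> 1;
```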
We notice that there's no covering index for this query by looking at the actual execution plan:
If this is a commonly executed query, then we'd probably want to toss an index on it. Before we get started, let's take a look at how the values in that column are distributed:
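One way to get that distribution is a simple GROUP BY. This is a sketch; the column aliases are my own:

```sql
-- Row count and share of the table for each SpecialOfferID value.
SELECT SpecialOfferID,
       COUNT(*) AS order_lines,
       CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5, 2)) AS pct_of_rows
FROM Sales.SalesOrderDetail
GROUP BY SpecialOfferID
ORDER BY order_lines DESC;
```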
Our distribution of values is:
As you can see, [SpecialOfferID] = 1 accounts for 96% of our values. In SQL Server 2005, we'd create an index that may look something like this:
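A plain, non-filtered version might look like the sketch below. The index name is the one used later in this post; keying it on [SpecialOfferID] alone, with no included columns, is an assumption:

```sql
-- Non-filtered NC index: stores an entry for every row in the table.
CREATE NONCLUSTERED INDEX IX_Sales_SalesOrderDetail_SpecialOfferID
    ON Sales.SalesOrderDetail (SpecialOfferID);
```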
Now if we re-run our original query, this is what we see:
So we're now performing a non-clustered index seek instead of a clustered index scan. Already this results in some pretty significant performance improvements. To see this, we're going to use the INDEX table hint to force the query back onto the clustered index, so we can compare the scan against the seek. We're also going to use the DBCC command DROPCLEANBUFFERS, which clears the buffer cache so that each run starts cold and we can better examine what's happening with our IO.
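A comparison like that can be sketched as follows. SET STATISTICS IO and the CHECKPOINT calls are my additions, and the column list is assumed; INDEX(1) refers to the clustered index. Please don't run DROPCLEANBUFFERS on a production server:

```sql
SET STATISTICS IO ON;   -- report logical and physical reads per statement

-- Cold cache, forced onto the clustered index (index id 1).
CHECKPOINT;             -- write dirty pages so they can be dropped as clean buffers
DBCC DROPCLEANBUFFERS;
SELECT SalesOrderID, SalesOrderDetailID
FROM Sales.SalesOrderDetail WITH (INDEX(1))
WHERE SpecialOfferID <> 1;

-- Cold cache again, this time using the non-clustered index.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT SalesOrderID, SalesOrderDetailID
FROM Sales.SalesOrderDetail WITH (INDEX(IX_Sales_SalesOrderDetail_SpecialOfferID))
WHERE SpecialOfferID <> 1;
```

The Messages tab then shows the reads for each statement side by side.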
As you can see, the non-clustered (NC) index seek performs quite a bit better. Now let's create a filtered index and explore what happens:
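Here is a sketch of such a filtered index. Both the index name and the choice of [SalesOrderID] as the key column are hypothetical; the filter predicate is the one our query uses:

```sql
-- Hypothetical name and key column; only rows where SpecialOfferID <> 1 are stored.
CREATE NONCLUSTERED INDEX FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered
    ON Sales.SalesOrderDetail (SalesOrderID)
    WHERE SpecialOfferID <> 1;
```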
First, let's look at the pages consumed by each index:
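One way to pull those numbers is from sys.dm_db_partition_stats, as sketched here. This may not be the exact query behind the results discussed in this post:

```sql
-- Rows and pages per index on Sales.SalesOrderDetail.
SELECT i.name AS index_name,
       i.type_desc,
       ps.row_count,
       ps.used_page_count,
       ps.reserved_page_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'Sales.SalesOrderDetail')
ORDER BY ps.used_page_count DESC;
```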
If you scroll over, you'll see that the clustered index consumes the most pages, naturally. The non-filtered NC index consumes fewer pages than the clustered index because it's narrower; however, it still consumes more pages than the filtered index because it stores an entry for every row in the table. The filtered index, with only 5433 rows stored, is by far our smallest index, consuming 96% less space than our non-filtered NC index.
Because we're using less space to store this index, there are fewer pages to read, so we should also see a corresponding performance boost. Let's verify that this is the case:
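The check can be sketched like this, assuming the filtered index exists under the hypothetical name FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered. The hint is only valid because the query's predicate matches the index filter:

```sql
SET STATISTICS IO ON;

CHECKPOINT;
DBCC DROPCLEANBUFFERS;   -- cold cache; not for production servers

-- Hypothetical index name; assumed column list.
SELECT SalesOrderID, SalesOrderDetailID
FROM Sales.SalesOrderDetail
     WITH (INDEX(FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered))
WHERE SpecialOfferID <> 1;
```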
As expected, we get the best results with our filtered index scan.
You'll notice that I did *not* create the index on the [SpecialOfferID] column like I did in [IX_Sales_SalesOrderDetail_SpecialOfferID]. This is because my query doesn't care what my [SpecialOfferID] value is, just as long as it's not equal to 1. My non-filtered NC index was created on [SpecialOfferID] because it needed to navigate the B-tree to find the records where [SpecialOfferID] <> 1. With my filtered index, the query optimizer knows that every record in the index already meets the criteria, so it doesn't need to navigate through the index to find the matching results.
We could choose to include the [SpecialOfferID] data in our filtered index, but we'd most likely want to make it an included column rather than part of the index key.

In fact, it's important to note that if I don't add [SpecialOfferID] as an included column and I want to return it in the results (for example, by adding it to the SELECT list), my filtered index will not be used and I will instead scan the clustered index once more (assuming [IX_Sales_SalesOrderDetail_SpecialOfferID] does not exist). This is because the column used in the filter predicate is not stored anywhere on the index pages unless you put it there. This is actually good news, in my opinion, since it allows you to create even leaner indexes. And like I already mentioned, if you do need the data returned, you can always add the filtered column as an included column.
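To make that concrete, here is a sketch of a query that returns the filtered column, followed by a filtered index that covers it. The index name, key column, and column list are all hypothetical:

```sql
-- Returning SpecialOfferID means the index must actually store that column.
SELECT SalesOrderID, SalesOrderDetailID, SpecialOfferID
FROM Sales.SalesOrderDetail
WHERE SpecialOfferID <> 1;

-- Hypothetical covering variant: the filtered column rides along as an
-- included column instead of being part of the index key.
CREATE NONCLUSTERED INDEX FIX_Sales_SalesOrderDetail_SpecialOfferID_Covering
    ON Sales.SalesOrderDetail (SalesOrderID)
    INCLUDE (SpecialOfferID)
    WHERE SpecialOfferID <> 1;
```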
What if you're trying to find out whether or not an index is filtered, and what it's filtered on? The sys.indexes catalog view has been updated in SQL Server 2008 to include this information:
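The two relevant columns are has_filter and filter_definition. A quick sketch against our example table:

```sql
-- has_filter = 1 marks a filtered index; filter_definition holds its predicate text.
SELECT name,
       type_desc,
       has_filter,
       filter_definition
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'Sales.SalesOrderDetail');
```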
I personally recommend Kimberly Tripp's system stored proc, sp_helpindex2. It returns a lot of good information about your indexes, such as included columns and filtering criteria.
That's all I have for today. Hopefully, you now understand how powerful filtered indexes can be. When used properly, filtered indexes can use less space, consume less IO, and improve overall query performance.