Using Filtered Index to Maintain Unique Key Combinations
When building a datawarehouse project, the usage of many to many tables is done to support multiple natural keys which are mapped to the same surrogate key. This is happening often with products and EAN codes. Each version of the same product will get a different EAN code, cause that make sense for the inventory guys who work in the warehouse, but for analysis from the marketing department’s point of view, the product is the same, and sometimes we want to just add the sales of all versions of the same products to measure how many liters of “Orange Juice” were sold in a given month, regardless of the package variant (normal size bottle vs promotion 10% more bottle).
Another issue with EAN code is that EAN codes are suppose to be unique world wide, but it’s OK and common practice to reuse EAN codes which have been out of the market for 3 months. Because EAN codes cost money, companies reuse old EAN codes.
What I want to show you is how you can maintain the uniqueness of your mapping table using filtered indexes.
So let’s describe the problem first:
We have a table Product and a table ProductEAN, which represents the many to many relations that may occur between a products and EAN codes. The ProductEAN table includes some columns which specify the date span ,for which the EAN mapping is relevant, this is done to support reload of historic data in the right context, this is also marked using a boolean columns IsActive. There is also a boolean indicator for which EAN is the default EAN, that is the EAN we wish to use in the a report.
Now the thing is that we wish to make sure that invalid data is not entered to the ProductEAN table. Invalid data can be if the same EAN is mapped to 2 different ProductKey, and both of them are an active mapping. Another case of invalid data is if more than one EAN is marked to be the default for the same ProductKey.
We can implement this data consistency checks using filtered indexes. Now it’s not why Microsoft chose to implement them, never the less it gets the job done!
create unique index UIX_ProductEAN_Only_One_Active_Product_Per_EAN on ProductEAN (EAN) where IsActive = 1; create unique index UIX_ProductEAN_Only_One_Default_EAN_Per_Product on ProductEAN (Productkey) where IsDefault = 1;
So what is it we are doing here?
A unique index will not allow duplicates to occur. So when we build an unique index on the EAN column, we force that the same EAN code will not be entered twice to the table. Note that this alone is not enough in order to solve the problem, because we do allow the same EAN to exist multiple times, but we allow only one active relation for each EAN, so by adding the where clause to the index definition, we are telling the database engine to build the unique index only on the columns which have the IsActive value = 1, hence we are forcing uniqueness only on a sub set of the table, which apply to the condition in the where clause .
Lets look into some data:
select * from ProductEAN where EAN = '5019487040114'
Any attempt to invalidate the business rule will result in a invalidation of the index, and the query will fail.
update ProductEAN set IsActive = 1 where ProductEANKey = 8
This will result in the following error message:
Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object ‘dbo.ProductEAN’ with unique index ‘UIX_ProductEAN_Only_One_Active_Product_Per_EAN’.
The duplicate key value is (5019487040114).
The statement has been terminated.
To summarize, we are able to enforce uniqueness on a sub set of a table, using a unique filtered index. This allows us to maintain the business rule of single active relation, and/or single default relation for each member in the many to many relation.
Remember to name the indexes wisely, so that the error message becomes self describing.