Using Filtered Index to Maintain Unique Key Combinations

Home > Uncategorized > Using Filtered Index to Maintain Unique Key Combinations

Using Filtered Index to Maintain Unique Key Combinations

November 17, 2011 sirpadk Leave a comment Go to comments

When building a datawarehouse project, the usage of many to many tables is done to support multiple natural keys which are mapped to the same surrogate key. This is happening often with products and EAN codes. Each version of the same product will get a different EAN code, cause that make sense for the inventory guys who work in the warehouse, but for analysis from the marketing department’s point of view, the product is the same, and sometimes we want to just add the sales of all versions of the same products to measure how many liters of “Orange Juice” were sold in a given month, regardless of the package variant (normal size bottle vs promotion 10% more bottle).

Another issue with EAN code is that EAN codes are suppose to be unique world wide, but it’s OK and common practice to reuse EAN codes which have been out of the market for 3 months. Because EAN codes cost money, companies reuse old EAN codes.

You can read about how to solve these issues using map tables at Thomas Kejser’s Database Blog
(part 1 – part 2)

What I want to show you is how you can maintain the uniqueness of your mapping table using filtered indexes.

So let’s describe the problem first:

We have a table Product and a table ProductEAN, which represents the many to many relations that may occur between a products and EAN codes. The ProductEAN table includes some columns which specify the date span ,for which the EAN mapping is relevant, this is done to support reload of historic data in the right context, this is also marked using a boolean columns IsActive. There is also a boolean indicator for which EAN is the default EAN, that is the EAN we wish to use in the a report.

Now the thing is that we wish to make sure that invalid data is not entered to the ProductEAN table. Invalid data can be if the same EAN is mapped to 2 different ProductKey, and both of them are an active mapping. Another case of invalid data is if more than one EAN is marked to be the default for the same ProductKey.

We can implement this data consistency checks using filtered indexes. Now it’s not why Microsoft chose to implement them, never the less it gets the job done!

create unique index UIX_ProductEAN_Only_One_Active_Product_Per_EAN
on ProductEAN (EAN) where IsActive = 1;

create unique index UIX_ProductEAN_Only_One_Default_EAN_Per_Product
on ProductEAN (Productkey) where IsDefault = 1;

So what is it we are doing here?

A unique index will not allow duplicates to occur. So when we build an unique index on the EAN column, we force that the same EAN code will not be entered twice to the table. Note that this alone is not enough in order to solve the problem, because we do allow the same EAN to exist multiple times, but we allow only one active relation for each EAN, so by adding the where clause to the index definition, we are telling the database engine to build the unique index only on the columns which have the IsActive value = 1, hence we are forcing uniqueness only on a sub set of the table, which apply to the condition in the where clause .

Lets look into some data:

select *
from ProductEAN
where EAN = '5019487040114'

Any attempt to invalidate the business rule will result in a invalidation of the index, and the query will fail.

update ProductEAN set IsActive = 1 where ProductEANKey = 8

This will result in the following error message:

Msg 2601, Level 14, State 1, Line 1

Cannot insert duplicate key row in object ‘dbo.ProductEAN’ with unique index ‘UIX_ProductEAN_Only_One_Active_Product_Per_EAN’.

The duplicate key value is (5019487040114).

The statement has been terminated.

To summarize, we are able to enforce uniqueness on a sub set of a table, using a unique filtered index. This allows us to maintain the business rule of single active relation, and/or single default relation for each member in the many to many relation.

Remember to name the indexes wisely, so that the error message becomes self describing.

Categories: Uncategorized Tags: Data Integrity, Datawarehouse, Filtered Index, Index, Many to many

Comments (3) Trackbacks (0) Leave a comment Trackback

Martijn Evers (@DM_Unseen)

November 18, 2011 at 2:30 pm

Reply

Not sure on

create unique index UIX_ProductEAN_Only_One_Default_EAN_Per_Product
on ProductEAN (Productkey,EAN) where IsDefault = 1;

Either it allows multiple EANS to be default for the same productkey (even for the same/overlapping period)

or

create unique index UIX_ProductEAN_Only_One_Default_EAN_Per_Product
on ProductEAN (Productkey) where IsDefault = 1;

Will be more restrictive but will not allow to track the default over time (there can be only 1 EAN per Product as default for all time).

PS

Is there a diff between the HIGH_END_DATE (9999-12-31) and IsActive=1 (ie does IsACtiveToDate=9999-12-31 AND IsActive=0 make sense?)?

Wouldn’t a key like

Productkey,EAN,IsACtiveFromDate or Productkey,EAN,IsACtiveToDate make sense also?(logically)
sirpadk

November 18, 2011 at 3:00 pm

Reply

Hi Martijn,

Thanks for correcting me on that mistake, I’ve fixed the second index to do what was intended now.

The IsActive is dependent on the current date and the IsActiveFromDate and IsActiveToDate. Either you choose to make is computed, or you maintain it in the ETL logic is up to what fits the current needs. It’s redundant info, but it’s easier and more economic to filter by a true/false column than to check for dates spans and/or date types columns.

Notice that an historic EAN, which is not active can still be the default EAN for the historic product, the notion of an active EAN and a default EAN are non related.

Regarding the overlapping dates, I’ll try to think about a way to enforce that they may not occur on the same EAN. I’m thinking that spatial data may be useful here some how, maybee a topic for another blog post.

\Rafi Asraf
Thomas Kejser

April 27, 2012 at 10:12 am

Reply

Hi Rafi

An interesting use of indexes. One might have been tempted to normalize your way out of the problem, but like this, you are giving the optimizer much more information to work with and get away with fewer tables. Neat

No trackbacks yet.

Rafi Asraf SQL Server Blog

Using Filtered Index to Maintain Unique Key Combinations

Leave a comment Cancel reply

Recent Posts

Archives

Categories

Meta

Rafi Asraf SQL Server Blog

Using Filtered Index to Maintain Unique Key Combinations

Share this:

Related

Leave a comment Cancel reply

Recent Posts

Archives

Categories

Meta