Loading SAS files with SSIS

February 8, 2012 21 comments

Hi

I’ve been given an assignment to load data from a SAS Institute data file to SQL Server using SSIS. How hard can that be I thought to my self. All I needed to do was to find the oledb provider for SAS, install it on my machine, and use it within SSIS, a no brainer!

Well first it took some time for me to actually find the oledb provider. You see SAS Institute provide it for free, which is great, but they kinda keep it in a secret place. If you google “sas oledb provider” the first result is this, which is a general documation spot for the SAS Oledb Provider, which is a great source for content regarding how to use the provider, but where to download it? Well you need to click on some links, only to get to a page that tells you where you can download it, but guess what, the part that tells you where to download it from is NOT a link, it’s URL indeed, but it’s not marked as a link, and very hard to find in all the other text on the page. Unfortunately I’m not a reader, I’m a scanner (we’ll get back to that later also). When I need to read something fast, which is almost always, I scan the text looking for known patterns instead of actually reading the entire content. So when I look for a link for downloading the file, I tend to not read any irrelevant text, which in this case was the link, disguising as normal text. But I’m a good person, so I’ll help you guys with that, here is the direct link to the download page, if you need to download it. You’ll need to register and confirm by an email first.

Now that I got the provider, I opened SSIS and did the following:

  1. Created anew connection manager to my SAS file using the “Native OLE DB\SAS Local Data Provider 9.3″
  2. In the Server or FileName I wrote the full path of the file name (which we’ll later go back to)
  3. Pressing Test Connection to make sure it works and pressed OK
  4. Added a new OLE DB Source to my data flow task
  5. In the OLE DB Source Editor I tried to pick the table in the SAS file, only to get an error message with the following content: “fn =ydedopn; context =sasej9lib.dll.”

So I thought I’ll google that, and guess what, one result only, to a linked in post of a guy asking for help with the same problem, no one with a solution.

I thought well let’s try and isolate the issue, so I started windows console project, to test to connectivity logic using an OleDb provider to the SAS file. I’ve spent 2 hours on this, trying with all kind of different ways to make it work, but either I got an empty data table, or the same error message as in SSIS. Frustrated I kept on going back and forward between the SAS website documentation, looking for a hint, and there, just before I was to give up, I saw the problem!

In the documentation it states that the datasource property is “A physical directory that contains the SAS data set that you want to access with the connection.” How did I missed that? (damn you scanners!) I shouldn’t have used the file name, but the file directory! The provider treats the folder as the “database” and each file is a “table” in the “database”!

Changing that simple fact, I was able to select a table from the Ole Db Source Editor, and beside some page code warnings which I’ll look into later, everything works as is should.

So the conclusions are:

  • don’t scan the documentation, read it! (this conclusion is pointed to me only)
  • when pointing to a SAS Institute files use only the directory in the data source!

I hope this will help others

/Rafi

Categories: Uncategorized Tags: ,

Catch Custom Events with SSIS Event Handler

January 5, 2012 6 comments

Background

When I started working for PensionDanmark summer 2011, we started looking into using SSIS as the main ETL tool for the Datawarehouse team. One of the main requirements were to be able to bring SSIS into the enterprise scale, and to be able to design and implement a pattern that will be able to cover a wide range of ETL related tasks though out the enterprise.

I knew that SSIS is lacking some of these enterprise needs, and when I saw Andy Leonard approach using a framework, I decided to try and implement one similar in PensionDanmark. What really got my attention was the ability to have a meta data driven approach to the execution of SSIS, and the ability to centralize the settings of logging, and by that to allow multiple child package to be simple and yet to inherit a complex set of events logging, that can be maintained easily, without effecting the child packages.

After the implementation of the first version of the framework, we had a request from the users to be able to report on non standard events. I thought that it should be quite simple, as SSIS has a custom event, and SSIS Script Task Component has the ability to fire custom events. But I was wrong to think it was that simple, because the SSIS event handler has no implementation for the custom event.

The problem description is like this: the frameworks central logging feature is build on the fact that the event handlers of the parent package (the framework package) is able to catch events that are fired in the child package too. But there is no event handler for the custom event, so what can we do?

A Solution

What we need is a workaround, that will allow us to catch a custom event, fired in the child package, using an event handler of the parent package. I started to look into the types of events which have an event handler, and to see weather I could hijack one, and make it act as a custom event handler, and than I saw it, the perfect victim for the task,  the VariableChanged event handler!

Why is it perfect?

  • I’ve never used it in production in my years developing SSIS packages.
  • The event is fully controlled by the developer. That is the SSIS developer chooses to enable this event on a variable. This is something we cannot do with any other event!
  • I believe that it’s by far a event that one is willing to live without for the price of having custom event in a centralize manner. I know I would :-).

Let’s take a look on how we actually doing it:

Here is the parent package, it’s made simple, in order to better illustrate the event catch scenario. An SSIS Framework looks of course more complex than this simple parent package.

The parent package

The parent package has a script component in the OnVariableValueChanged. With that script we will need to catch the variables which are available for this type of event, and use them to log the real event that is fired by the child package.

The Parent Package Event Handler

This is how the script looks like:

'The follwing variables are accesable from this type of event handler:
'System::TaskName
'System::SourceName
'System::VariableDescription
'System::VariableID
'System::VariableName
'System::VariableValue

Public Sub Main()
    Dim S As String

    'Building a string that proof that the values of the variables
    ' is originated from the event fires in the child package
    S = "VariableName: " & Dts.Variables("System::VariableName") _
      .Value.ToString() & vbCrLf _
      & "VariableID: " & Dts.Variables("System::VariableID") _
      .Value.ToString() & vbCrLf _
      & "VariableDescription: " & Dts.Variables("System::VariableDescription") _
      .Value.ToString() & vbCrLf _
      & "VariableValue: " & Dts.Variables("System::VariableValue") _
      .Value.ToString() & vbCrLf _
      & "TaskName: " & Dts.Variables("System::TaskName") _
      .Value.ToString() & vbCrLf _
      & "SourceName: " & Dts.Variables("System::SourceName") _
      .Value.ToString()

    'Showing the string value as a message box
    MsgBox(S)

    Dts.TaskResult = ScriptResults.Success
End Sub

Note I’m not telling you here how to log the event, but just showing which variables are available for usage, and how to manipulate this variables from the event fire method

Here is a screen shot of the child package:

The child package

The first script task is there just to proof the fact that the event is not fired, unless we set the event on a variable.

The second script task is where the magic is done, let’s look into the code:

Public Sub Main()

        Dts.Events().FireCustomEvent("OnVariableValueChanged", _
                                     "", _
                                     New Object(3) _
                                        { _
                                        "First Value", _
                                        "Second Value", _
                                        "Third Value", _
                                        "Forth Value"}, _
                                     "", _
                                     False)

        Dts.TaskResult = ScriptResults.Success
    End Sub

What we are doing here is this, we fire a custom event, of type OnVariableValueChanged. The FireCustomEvent can take an object, as the third parameter, and this object is later used by the SSIS Event Handler. I’ve doing some testing, and found out that at least 4 items can be included in this object, and that this 4 objects correspond to 4 SSIS system variables, which are accessible in the SSIS Event Handler. All 4 are strings, so we must use strings, to prevent errors.

To illustrate that, I’m sending 4 text values, let’s look into what happens when we ran the parent package:

Running the parent package

Let’s zoom in:

The message box

In real life, we will use this 4 strings to send information about the custom event, it’s source, level, severity what ever is relevant for the specific system requirements.

The entire project is available for download.

Categories: Uncategorized Tags: ,

New Localizations for Azure DateStream

December 8, 2011 Leave a comment

When I saw that Boyan Penev (blog) together with the SQL Azure Team had put a date dimension feed for free on the DataMarket, I thought wouldn’t it be great if we all contribute to the content of this valuable free data feeds?

I come from Israel, and I live in Denmark. So I thought that I’ll be able to contribute with a translation of the calendar to danish and Hebrew. I contacted Boyan, and sent him the translated versions.

I was happy to read this morning that the translated feeds are now available, for free, on the DataMarket. 4 new languages were added (Hebrew, Danish, German and Bulgarian), and other languages are on the way.

I hope you’ll find this information useful.

 

 

 

 

 

Using Filtered Index to Maintain Unique Key Combinations

November 17, 2011 3 comments

When building a datawarehouse project, the usage of many to many tables is done to support multiple natural keys which are mapped to the same surrogate key. This is happening often with products and EAN codes. Each version of the same product will get a different EAN code, cause that make sense for the inventory guys who work in the warehouse, but  for analysis from the marketing department’s point of view, the product is the same, and sometimes we want to just add the sales of all versions of the same products to measure how many liters of “Orange Juice” were sold in a given month, regardless of the package variant (normal size bottle vs promotion 10% more bottle).

Another issue with EAN code is that EAN codes are suppose to be unique world wide, but it’s OK and common practice to reuse EAN codes which have been out of the market for 3 months. Because EAN codes cost money, companies reuse old EAN codes.

You can read about how to solve these issues using map tables at Thomas Kejser’s Database Blog
(part 1part 2)

What I want to show you is how you can maintain the uniqueness of your mapping table using filtered indexes.

So let’s describe the problem first:

We have a table Product and a table ProductEAN, which represents the many to many relations that may occur between a products and EAN codes. The ProductEAN table includes some columns which specify the date span ,for which the EAN mapping is relevant, this is done to support reload of historic data in the right context, this is also marked using a boolean columns IsActive. There is also a boolean indicator for which EAN is the default EAN, that is the EAN we wish to use in the a report.

Product to ProductEAN

Now the thing is that we wish to make sure that invalid data is not entered to the ProductEAN table. Invalid data can be if the same EAN is mapped to 2 different ProductKey, and both of them are an active mapping. Another case of invalid data is if more than one EAN is marked to be the default for the same ProductKey.

We can implement this data consistency checks using filtered indexes. Now it’s not why Microsoft chose to implement them, never the less it gets the job done!

create unique index UIX_ProductEAN_Only_One_Active_Product_Per_EAN
on ProductEAN (EAN) where IsActive = 1;

create unique index UIX_ProductEAN_Only_One_Default_EAN_Per_Product
on ProductEAN (Productkey) where IsDefault = 1;

So what is it we are doing here?

A unique index will not allow duplicates to occur. So when we build an unique index on the EAN column, we force that the same EAN code will not be entered twice to the table. Note that this alone is not enough in order to solve the problem, because we do allow the same EAN to exist multiple times, but we allow only one active relation for each EAN, so by adding the where clause to the index definition, we are telling the database engine to build the unique index only on the columns which have the IsActive value = 1, hence we are forcing uniqueness only on a sub set of the table, which apply to the condition in the where clause .

Lets look into some data:

select *
from ProductEAN
where EAN = '5019487040114'

image

Any attempt to invalidate the business rule will result in a invalidation of the index, and the query will fail.

update ProductEAN set IsActive = 1 where ProductEANKey = 8

This will result in the following error message:

Msg 2601, Level 14, State 1, Line 1

Cannot insert duplicate key row in object ‘dbo.ProductEAN’ with unique index ‘UIX_ProductEAN_Only_One_Active_Product_Per_EAN’.

The duplicate key value is (5019487040114).

The statement has been terminated.

To summarize, we are able to enforce uniqueness on a sub set of a table, using a unique filtered index. This allows us to maintain the business rule of single active relation, and/or single default relation for each member in the many to many relation.

Remember to name the indexes wisely, so that the error message becomes self describing.

Follow

Get every new post delivered to your Inbox.