#DAX – Joining to a Slowly Changing Dimension

The following is one of the scenarios that I showed during my “Drop your DAX” talk at SQL Saturday #296 in Melbourne.

Currently SSAS Tabular and PowerPivot models can only have a relationship based on a single column. So what do you do when you need to join based on multiple columns?

Ideally you would solve this during your ETL. With a type 2 slowly changing dimension you typically want to insert the surrogate key for the dimension into the fact table. As you may know, “type 2” dimensions can have one or more records for a given business key. These records will have a different effective start and end dates.  To find the correct surrogate key you have to match on the business key and a date, where the date is between the effective start and end dates.

This is pretty easy to do in SQL, but what do you do if your two tables are not only on separate servers, but also on different types of servers. I had this situation recently. We were building a quick proof of concept and we had an existing type 2 dimension in Teradata, but the fact table was coming from an Oracle server. One option would have been to stage both sets of data locally and do the join there, but that was going to take time and resources and this was just meant to be a quick proof-of-concept.

So is there a way of fixing this in the model using DAX?

To demonstrate this I’m going to work with the following simplified dimension:

image

And the following simple fact table:

image

What I wanted to do was to try to create a calculated column in the fact table to look up the surrogate from the dimension table. Creating a filter statement to find the matching records was not too hard.

FILTER(BU,Fact[BU_Code] = BU[BU_Code] && Fact[Date] >= BU[StartDate] && Fact[Date] < BU[EndDate])

But FILTER returns a table, not a scalar value so you can’t use it in a calculated column. My first attempt involved using the powerfull CALCULATE function to return a single value from the BU_Key column.

= CALCULATE (
    VALUES ( BU[BU_Key] ),
    FILTER (
        BU,
        'Fact'[BU_Code] = BU[BU_Code]
            && 'Fact'[Date] >= BU[StartDate]
            && 'Fact'[Date] < BU[EndDate]
    )
)

This produced the results I wanted, but when I tried to create a relationship against this column I got the following error:

image

Sounds a bit strange doesn’t it. The issue here is that CALCULATE() is trying to transform the filter context into a row context in order to evaluate the column expression. The problem with this is that relationships influence the  filter context. Because CALCULATE() needs to be able to follow all the relationships in order to return a result - you can't create a relationship over a column that uses CALCULATE(). That’s where the circular dependency comes from.

The solution to this is to switch the expression to use one of the iterative functions like MINX or MAXX. I’ve used MINX here, but it does not matter which one you use as the FILTER should only return a single row.

=MINX (
    FILTER (
        BU,
        'Fact'[BU_Code] = BU[BU_Code]
            && 'Fact'[Date] >= BU[StartDate]
            && 'Fact'[Date] < BU[EndDate]
    ),
    BU[BU_Key]
)

Now we can create a relationship over our calculated column.

However it’s not all chocolates and roses…

While this is a handy technique to have in your tool chest for quick prototypes, there are some downsides that you should be aware of.

Tabular processing happens in 3 phases:

1. Data acquisition

2. Data compression

3. Evaluation of calculations

You’ll notice 2 issues here. Firstly adding calculated columns will increase your processing time. Secondly calculated columns are not compressed which will result in slower scan speeds. So this technique will have a negative impact on both processing and query performance. For a long term solution it probably better to look at an ETL based solution.

Print | posted on Wednesday, April 9, 2014 7:18 AM

Comments on this post

# re: #DAX – Joining to a Slowly Changing Dimension

Requesting Gravatar...
Have you considered using Power Query to load the data? Then you can add the surrogate key before handing the data to Power Pivot. Should improve the scan speeds (though not the load speeds).
Left by David on Apr 10, 2014 5:54 AM

# re: #DAX – Joining to a Slowly Changing Dimension

Requesting Gravatar...
@David - For PowerPivot models Power Query is a great option. But the person I was helping that had this issue had a SSAS Tabular model so it was not an option.
Left by Darren Gosbell on Apr 10, 2014 6:35 AM

# re: #DAX – Joining to a Slowly Changing Dimension

Requesting Gravatar...
In SSAS we need to load the fact with the SCD surrogate key. worst case you can do that with the view that the cube looks at. Best case you would have a datamart that is a rekeyed star schema of the data warehouse.
Left by Kevin on Feb 28, 2015 2:42 AM

# re: #DAX – Joining to a Slowly Changing Dimension

Requesting Gravatar...
@Kevin - I agree that ideally you would have the surrogate key inserted into the fact during the ETL. This technique is a short term tactical approach, not a good long term strategy. We actually had two separate data warehouses and wanted to pull a fact from one over into a cube based off the other. The long term plan is to merge these two warehouses together, but in the short term we can provide the reporting the business needs (although at a slight query performance penalty)
Left by Darren Gosbell on Feb 28, 2015 10:41 AM
comments powered by Disqus