SSAS: Can I make my Distinct Count totals subtotal correctly?

I've seen this question come up a couple of times. What is usually being asked is "can we make the subtotals additive for a distinct count measure?" – and the short answer is a resounding No!

But the short answer does not really make for a compelling blog post, so let's dig a little deeper into this issue. The more complete answer is that while there are some techniques that can make a distinct count "appear" to be additive in some circumstances, they are fundamentally flawed and will produce inconsistent results as distinct count measures are inherently non-additive.

I believe that the real issue here is not a technology issue, it is a problem with user expectations. So the correct action here is not to try to change the results, but to educate your users as to what is happening.

Let's explore this by taking a look at a simple example. The following matrix shows the products bought by two different customers over a 3 month period:

   Month   Customer 1   Customer 2 
   =====   ==========   ==========

   Jan     Product A        -
   Feb     Product B    Product C
   Mar     Product A    Product D

 

So, obviously over this 3 month period, there were only 2 distinct customers. However, if we group the records by month and look at the distinct count we get the following output.

   Month  Count
   =====  =====

   Jan    1
   Feb    2
   Mar    2

   Total  5

 

Which, if we try to add up the distinct count measure, makes it look like, we had 5 customers, instead of 2 over the past 3 months.

Then if we look at the distinct count of customers by product over the 3 month period we get the following:

   Product    Count
   =========  =====
   Product A  1
   Product B  1
   Product C  1
   Product D  1

   Total      4

 

Which someone could interpret to meaning that we actually had 4 customers over this 3 month period. Instead, what really happened is that our 2 distinct customers bought 2 different products each. 

This issue is quite easy to see with a small dataset, but with a larger dataset the results can be more subtle, but the effects will still be there. The end result of this will be that this inconsistency will cause your users to loose confidence in your solution, which is the worst thing that can happen to a BI project.

So if you ever get asked to make a distinct count measure additive, tell them "no" and feel free to point the person making the request at this blog post.

Update: 22 Feb 2010

There still appears to be some confusion here so I thought it may help to post the "correct" output for both the queries above. (Which is what you would get if you created a distinct count measure over the CustomerID.)

   Month  Count
   =====  =====

   Jan    1
   Feb    2
   Mar    2

   Total  2

 

   Product    Count
   =========  =====
   Product A  1
   Product B  1
   Product C  1
   Product D  1

   Total      2

In both cases, irrespective of the attribute on the row axis the correct total is a distinct count of 2 customers. Trying to make this total any other figure apart from 2 would be incorrect.

Print | posted on Wednesday, January 13, 2010 5:29 AM

Comments on this post

# re: SSAS: Can I make my Distinct Count totals subtotal correctly?

Requesting Gravatar...
Actually I believe this can be done, and at that in two ways.

First off you can write a query that will give you the total as a distinct count over the whole period for which you are querrying.

Second solution using SSRS would be to do a CountDistinct of Customer or Product on the whole DataSet and display that as a total.
Left by Antek S. Baranski on Feb 22, 2010 4:08 AM

# re: SSAS: Can I make my Distinct Count totals subtotal correctly?

Requesting Gravatar...
Well, isn't it all a question of defining your report correctly?

For example, in your first example the 5 means total product purchased over the last three months.
Left by Arkady Gurevich on Feb 22, 2010 9:49 AM

# re: SSAS: Can I make my Distinct Count totals subtotal correctly?

Requesting Gravatar...
@Arkady - No, it's not that simple. There are only actually 4 products (A,B,C,D). You could possibly say that there were 5 "Sales" made, but as soon as you added more data that would fail to be true. And the report is meant to be a count of Distinct Customers, so a figure of 5 never makes any sense.

The point is that for both of the examples that I have posted, there is only ever 2 distinct customers. For a distinct count, any other total would be incorrect.
Left by Darren Gosbell on Feb 22, 2010 10:37 AM

# re: SSAS: Can I make my Distinct Count totals subtotal correctly?

Requesting Gravatar...
@Antek - have a look at the update that I just made to this post. SSAS distinct count measures will automatically calculate the correct (non-additive) totals. The point of this post was that any attempt to override the distinct count totals to produce additive totals is misguided.
Left by Darren Gosbell on Feb 22, 2010 4:23 PM

# re: SSAS: Can I make my Distinct Count totals subtotal correctly?

Requesting Gravatar...
Darren,

Interesting post, and I had a similar requirement recently which was to produce average counts for the preceding periods while using a distinct count. Perhaps our testing has not been as thorough as possible but I ssemed to satisfy the requirements using a scope statement to override to automatic aggregate (below). Do you see other problems with this?

SCOPE([Date].[Year - Quarter - Month], [MEASURES].[Headcount]);

SCOPE([Date].[Year - Quarter - Month].[Year]);
this = AVG([Date].[Year - Quarter - Month].CurrentMember.Children, [MEASURES].[Headcount]);
END SCOPE;

SCOPE([Date].[Year - Quarter - Month].[Quarter]);
this = AVG([Date].[Year - Quarter - Month].CurrentMember.Children, [MEASURES].[Headcount]);
END SCOPE;

// to prevent the All member being displayed at attribute level (not hierarchy)
SCOPE([Date].[Month]);
this = NULL;
END SCOPE;

END SCOPE;
Left by Dan Kennedy on Feb 22, 2010 8:48 PM

# re: SSAS: Can I make my Distinct Count totals subtotal correctly?

Requesting Gravatar...
@Dan - don't think using averages would suffer from the same problem. I've been trying to think through a few different scenarios and I can't see any major issues.

When I've done headcount in cubes in the past I have usually used a snapshot style fact table and the year and quarter figures were taken from the last day in the respective periods. Although I can see that an average would be just as valid.

The only thing I would have done differently is the year calculation. Currently it is an average of the quarter averages and that could result in one abnormal month having more impact on the year than it should. Generally if I am calculating averages I like to make sure that they are all calculated off the same granularity. (which would me calculating the year as the average of the months in that year)

eg.

SCOPE([Date].[Year - Quarter - Month].[Year]);
this = AVG(EXISTING [Date].[Year - Quarter - Month].[Month].Members, [MEASURES].[Headcount]);
END SCOPE;


Left by Darren Gosbell on Feb 23, 2010 6:44 AM

# re: SSAS: Can I make my Distinct Count totals subtotal correctly?

Requesting Gravatar...
Appreciate your feedback Darren, cheers.
Left by Dan Kennedy on Feb 23, 2010 9:55 PM

# re: SSAS: Can I make my Distinct Count totals subtotal correctly?

Requesting Gravatar...
Here are this and some other articles on SSAS Distinct Count:

http://ssas-wiki.com/w/Articles#Distinct_Count

Left by Sam Kane on Mar 19, 2011 1:44 AM

Your comment:

 (will show your gravatar)