When is an Empty cell Non Empty and a Non Empty Cell Empty?

I came across an interesting isssue on the microsoft.public.sqlserver.olap newsgroup the other day. A calculated member, that calculates with an amount was being excluded from the results when the NON EMPTY clause was added to the axis.

At first I thought this could have been related to the solve order issue that Chris Webb discussed at http://spaces.msn.com/cwebbbi/Blog/cns!7B84B0F2C239489A!649.entry, but the only calculated member involved here was the one created in the query scope, so that could not be the issue. Out of curiousity, I did a quick test by setting up this member as a calculated member in the cube itself. I think that will be a more common scenario than setting up a query scoped calculation on a dimension other than the measures dimension. So although it was not unexpected

After a bit of investigation, my guess is that the calculated member is possibly being optimised out. When I ran the original query and watched it in Profiler I only saw one "Query Subcube" event for the cache data. When I ran the query below, (which is functionally equivalent, but includes a calculated measure) I see 4 "Query Subcube" events and it returns the expected results.

WITH
  MEMBER
[Customer].[Customer Geography].[All Customers].[Sample] AS
    
 '[Customer].[Customer Geography].[United States] * 2'
  MEMBER Measures.CalcMeasure as [Measures].[Internet Sales Amount]
SELECT
  {[Measures].[CalcMeasure]} ON COLUMNS,
  NON EMPTY CROSSJOIN([Customer].[Customer Geography].[Country].AllMembers,
    {[Date].[Calendar Year].[Calendar Year].[CY 2004]} ) ON ROWS
FROM
[Adventure Works]

In this second version, 2 of the “Query Subcube“ events occurred during the Non Empty evaluation stage. Whereas in the first version, there were no additional “Query Subcube“ events. This leads me to guess that some optimization of the NON EMPTY Crossjoin is kicking in here.

I suspect that the query optimizer detects that there are only non-calculated measures in the query and is determining the members of the row axis based on the data in the measue group. This results in the calculated member on the row axis being excluded.

I found couple of other ways to work around this issue to get the expected results. One is to remove the crossjoin and put the date dimension in the WHERE clause.

WITH
  MEMBER
[Customer].[Customer Geography].[All Customers].[Sample] AS
     '[Customer].[Customer Geography].[United States] * 2'
SELECT
  {[Measures].[Internet Sales Amount]} ON COLUMNS,
  NON EMPTY [Customer].[Customer Geography].[Country].AllMembers ON ROWS
FROM
[Adventure Works]
WHERE ([Date].[Calendar Year].[Calendar Year].[CY 2004])

But this is not going to help when you need a crossjoin on your row axis.

The other thing that works is to use the new NonEmpty() function, instead of the NON EMPTY keywords. This involves the least change to your original query.

WITH
MEMBER [Customer].[Customer Geography].[All Customers].[Sample] AS
'[Customer].[Customer Geography].[United States] * 2'

SELECT
   {[Measures].[Internet Sales Amount]} ON COLUMNS,
   NONEMPTY( CROSSJOIN
(
     [Customer].[Customer Geography].[Country].AllMembers
,
     {[Date].[Calendar Year].[Calendar Year].[CY 2004]} )) ON ROWS

FROM [Adventure Works]

I submitted this as a bug at http://lab.msdn.microsoft.com/ProductFeedback/viewFeedback.aspx?FeedbackID=FDBK48726.

If any further developments come to light I will post updates here.

Update 18 Apr 2006: Deepak (SQL Server MVP) came up with another query that returns the correct result set.

WITH
MEMBER [Customer].[Customer Geography].[All Customers].[Sample] AS
'[Customer].[Customer Geography].[United States] * 2'

SELECT
   {[Measures].[Internet Sales Amount]} ON COLUMNS,
   NON EMPTY GENERATE([Customer].[Customer Geography].[Country].AllMembers
   
 CROSSJOIN
(
     {[Customer].[Customer Geography].CurrentMember}
,
     {[Date].[Calendar Year].[Calendar Year].[CY 2004]} )) ON ROWS

FROM [Adventure Works]

On the Adventure Works sample database there is no noticable difference in performance between all these queries. But on repeated executions this last query only hits cached data once. While the query using the NonEmpty() function always appears to initially bypass the cache and hit non-cached data and then does a second data read, this time hitting the cached data. So I suspect that on a larger dataset the Generate() function would probably out perform the NonEmpty() function.

Print | posted on Monday, April 17, 2006 8:50 PM

Comments on this post

# re: When is an Empty cell Non Empty and a Non Empty Cell Empty?

Requesting Gravatar...
Concerning your last comment, I think you are wrong. Generate will never be faster than a CrossJoin with NonEmpty function.
Left by Tigran on Jul 24, 2006 6:42 PM

# re: When is an Empty cell Non Empty and a Non Empty Cell Empty?

Requesting Gravatar...
You could be right, as I said, it is just a suspicion. Before I traced the query I would have agreed with you 100%. But after tracing the above query I am not so sure, in the query above it appears to potentially have more efficient cache useage. I did not even think to try using Generate until Deepak suggested it, and it proved to be a viable alternative.
Left by Darren Gosbell on Jul 24, 2006 8:24 PM

Your comment:

 (will show your gravatar)