The perils of calculating an Average of Averages

I've seen questions around issues calculating averages come up a few times in various forums and it came up again last week and I feel that there is some benefit in walking through the details of this issue. For many of you the following will be nothing new, but I'm hoping that this may serve as a reference that you can point to when you get requests for this sort of calculation.

The core issue here is really a fundamental mathematical one. Personally I see it surfacing most often in DAX and MDX as those are languages that I spend a lot of time with, but also because of their multi-dimensional natures you need to be able to write generic calculations that will work regardless of how the end users slice and dice the data.

The discussions invariably start with a statement like the following:

"I have a calculated measure that an average, but my totals are calculating incorrectly"

There are 2 different issues I see relating to this.

The first one is trying to use the AVG() function in MDX. Basically if you want an average calculation that works with all your different dimensions then avoid this function. The AVG function in MDX calculates the average over a fixed set. You may be able to use it in a static query, but to calculate an average in your MDX script simply create the two base measures - a sum and a count, then divide the sum by the count. This is not as much of an issue in DAX as the built-in AVERAGE, AVERAGEA and AVERAGEX generally work as expected.

The other sort of question that I see is related to how the totals are calculated and the question is usually something like the following:

"I have an average measure calculated by doing sum / count - which produces the correct average for each row, but the total is calculated as "sum of sums" / "sum of counts" and my user wants to see it as the average of all the averages."

And to put it bluntly this requirement is invalid. You should never "total" a series of averages by averaging them. The easiest way to explain why this is the case is to illustrate with some data. So let's have a look at a few scenarios.

The first problem you will see with the "average of averages" approach is that it gives too much weight to outlying amounts.

Category

Amount

Count

Average

Bikes

1,000

1

1,000

Helmets

10,000

1,000

10

TOTAL

11,000

1,001

???

Given the data above how should we calculate the total average? if we do the "average of averages" approach we have:

(1000 + 10) / 2 = 505

If we take the SUM(Amount) / SUM(Count) approach we get the following:

11000 / 1001 = 10.99

This is an extreme example to prove a point, but which do you think is correct? Should the 1 bike we sold for $1000 skew the average to $505 or should the fact that it was just one product out of 1001 mean that the average should only be $10.99?

Your business user might be happy seeing a higher average amount, but what if the situation was reversed and we had sold 1000 bikes and just one helmet? This would make the "average of averages" still equal 505 while recalculating the average at the total level would give us $999.01 - I know which calculation I think is giving a better indication of the total average sales.

It's possible that you may be thinking at this point that this is not so much of a big deal for you because you don't have that sort of variability in your data. However that is only the start of the issues. If you are still unsure about the evils of averaging averages then read on because it only gets worse.

To show the next nasty side effect we need to look at just a little bit more data. Take the following 4 records for example where we have data split between 2 cities and 2 product categories

City Category

Amount

Count

Melbourne Bikes

18

3

Melbourne Helmets

25

5

Seattle Bikes

21

3

Seattle Helmets

16

4

When we group the data by City we get the following results. The "Total" line is where the average is recalculated at the total level. Where as the "Avg of Averages" line is where I've take the average of the 2 City averages.

City

Amount

Count

Average

Melbourne

43

8

5.38

Seattle

37

7

5.29

Avg of Averages    

5.34

Total

80

15

5.33

Now lets have a look at what happens to the figures when we group the data by the product category. Notice that the Total line has remained unchanged, but the "Avg of Averages" is now different!

Category

Amount

Count

Average

Bikes

39

6

6.5

Helmets

41

9

4.56

Avg of Averages    

5.53

Total

80

15

5.33

This sort of behaviour - where the figures reported for total and sub-totals will vary depending on how the data is sliced and diced - will be the death of your BI project.

Trust - the most important "feature" of any BI project

I would argue that possibly the most important "feature" of any BI solution is trust. You can always add features and missing functionality, but it can be very difficult to win back the trust of your users once it's been lost. And nothing will erode the trust of your users than seeing inconsistent results.

It's not just straight Averages that are the issue

Anytime you are mixing calculations that do sums and divisions you need to be careful of the order of operations. Ratios, Percentages and moving averages are just a few of the examples of other calculation types for which you need to take care of the order in which add and divide things.

Print | posted on Monday, July 28, 2014 7:18 AM

Comments on this post

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
Regular BI projects would not want an average of averages. However- I'm currently working on a control chart project where there proper methodology is to average a set of subgroup averages.
Left by Phil on Nov 20, 2014 3:44 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
I love this post. I work at an environmental engineering firm and one of our spreadsheets calculates the average flow rate of different remediation pumps. It calculates the average flowrate monthly, then averages those averages in order to get a final average over the reporting period. I've begun automizing the spreadsheet using PowerPivot for excel (using DAX formulas) and my "Total Averages" for the reporting period were showing discrepancies from the old spreadsheet. At first, I though I was doing something wrong, but after reading this post, I'm confident my result is correct. Your examples really hit this point home (giving too much weight to outliers and the occurrence of varying subtotals are great examples).
Left by David Coe on May 02, 2015 6:28 PM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
I am analyzing 35 survey questions which are regrouped according to 9 competencies . The rating is on the 4 point likert scale. After calculating the weighted averages, is there a method to get a final figures when the survey questions are regrouped under the 9 competencies?
Left by Arti on Mar 04, 2016 1:53 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
@Arti - An average is basically a sum divided by a count. To get a grand total you should get the sum of all the "sums" and divide that by the sum of all the counts.
Left by Darren Gosbell on Mar 04, 2016 7:27 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
What is the solution in SSAS. How can we effectively achieve SUM(Amount) / SUM(Count) KPI?
Left by brian on Apr 28, 2016 7:42 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
@brian - you would create a calculated measure using that exact expression.
Left by Darren Gosbell on Apr 28, 2016 4:52 PM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
I have a table consisting of date and averagesales.

like
Date AverageSales
01/06/2016 125
03/06/2016 120
05/06/2016 115

The overall average is 120 = (125+120+115)/ 3.
I can plot this a line in a combo chart with averageSales as a barchart and AverageofAverages as a line.

i used dax
CALCULATE(Averagex(DayWiseSales,[AverageSales]),All(DayWiseSales)
)

The problem I am facing is if the user selects only two dates in the slicer then also it shows the averageofaverages 120 instead of showing the average of averages for those two dates alone.

Any help please.
Left by Srinivasan on Jun 29, 2016 2:19 PM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
@Srinivasan - you should not be storing averages in your table as they are non-additive (that is the whole point of this blog post). You should store a sales amount and a count of sales. Then you can easily calculate an average and it will work correctly at all levels.
Left by Darren Gosbell on Jun 29, 2016 3:12 PM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
If we have different type and different value its batter to not sum up average but get an average from adding the value together (grand total )
in case of the first example
Type Amount Count
Bike 1000 1
Helmets 10000 1000
so the Average 9991.0 908.18
Left by Ashenafi Nuguse on Mar 30, 2017 12:34 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
@Ashenafi - I can't figure out how you arrived at those figures for the averages. But the correct total average is (1000 + 10000) / (1 + 1000) = 10.989
Left by Darren Gosbell on Mar 30, 2017 5:34 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
Project 1: 100 bugs in size 10
Project 2: 200 bugs in size 10
Project 3: 100 bugs in size 20

Avg Project1: 10
Avg Project2: 20
Avg Project3: 5
Avg of Avg : (10+20+5)/3 =11.66
Overall avg: (100+200+100)/(10+10+20)=10

Which one is correct to provide insight of overall average?
Left by Raheel on Oct 12, 2017 2:01 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
@Raheel, both your options are wrong. You have to sum all the size records and then divide by the count. The correct overall average would be:

((100*10) + (200*10) + (100*20) / (100+200+100) =12.5
Left by Darren Gosbell on Oct 12, 2017 5:29 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
What i was trying to achieve was average bugs per unit size.
Kindly name the average you calculated so that I can get an understanding of what you did.
Left by Raheel on Oct 13, 2017 9:54 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
So I've calculated the average size per bug. I can;t think how you would do an average per size unit. I'm not sure if it makes sense. You could do an average of bugs per project - then you could compare the average per size to the total average to see if large projects average more bugs than smaller ones.
Left by Darren Gosbell on Oct 13, 2017 6:52 PM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
What I am trying to calculate is the following:

What is the average bugs per size

What you are calculating is average size per bug.

Kindly assist me in calculating average bugs per size and whether to take average of average or not?

Left by Raheel on Oct 16, 2017 1:49 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
What do you have to say about this article that seems to say that both cases have legit business uses cases: http://mathforum.org/library/drmath/view/74812.html.
Also, what about Tableau's two-pass totals, that allow doing this?
Thanks
Left by Duarte Cunha Leão on Nov 30, 2017 11:34 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
If you have some specific business need for a multi-level measure like "average sales per employee per company" then there is a case to do the calculations in the second example in that article. However if you are doing this your metric should be named appropriately to indicate this extra complexity. But generally most of what I see are just simple averages and most of the time you should not be averaging those to get a grand total.
Left by Darren Gosbell on Nov 30, 2017 2:00 PM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
Hello,

Thanks for the great post.
I am struggling at the moment to put in place an average like the one stated in your comment of the Oct 12, 2017 5:29 AM.
For example I have 3 amounts and 3 counts values :
100 3
50 5
200 2

In MDX if I do Amount/Average the value is correct per row but the grand total is 100 + 50 + 200 / 3 + 5 + 8 = 43.75 while I need to have
100 * 3 + 50 * 5 + 200 * 2 / 3 + 5 + 2= 118.75.

How can I achieve this using MDX ?

Any tips will be greatly appreciated
Left by Alex on Jan 04, 2018 3:12 AM

# re: The perils of calculating an Average of Averages

Requesting Gravatar...
@Alex - if you want to do the multiplication at the row level, the easiest way to do this is to create a calculated column either in your source view or in the DSV, then create a new measure over that column using a SUM aggregation. Then you just divide this new measure by the a row count measure.
Left by Darren Gosbell on Jan 04, 2018 10:47 PM

Your comment:

 (will show your gravatar)