The problem of power users, DAX and difficult calculations
Reposted from Chris Webb's blog with the author's permission.
Ever since PowerPivot got released, one of the questions I’ve heard debated over and over is whether it’s easy for non-IT users to learn and use DAX or not. The stock answer from Microsoft, and I agree with them here, is that anyone with basic Excel knowledge can do simple calculations in DAX, but the more complex calculations (for example, those which need to use the CALCULATE() function) are probably only ever going to be written and understood by BI professionals. Even then there’s plenty of evidence that even BI and Excel pros struggle to get to drips with DAX, as blog posts like this and this suggest. This is certainly good news for me professionally – my feeling is that for more complex calculations, DAX is just as difficult as MDX although conceptually very different, and I’ve made a good living out of MDX over the years – but on the other hand I can’t help but feel this represents a capitulation on the part of Microsoft. After all, isn’t the target PowerPivot user someone like an accountant, actuary or statistician, someone who is mathematically literate and capable of creating amazing complicated spreadsheets? How can self-service BI be truly self-service if certain calculations are still too difficult for anyone but IT professionals?
The problem isn’t the design of DAX as such, I think, but the fact that DAX exists as a language. I’m sure most BI professionals have seen worked examples of calculations created in Excel that we have been expected to translate into SQL/MDX/DAX/whatever tool we’re using. It’s the act of translating the calculation into an expression that’s the difficult thing: end-users understand how the calculations work, but they can’t speak the languages that BI tools use whereas IT people can. So why can’t our BI tools allow users to express calculations in a way that users can understand – as a series of spreadsheet formulas?
One example of a tool that tries to do this already is the Intelligencia OLAP control. However, what prompted these thoughts was Project Dirigible, a new web-based spreadsheet from the people who produce Resolver One. The key feature that sets it apart from regular spreadsheets-in-a-browser is that it’s designed for scenarios where users want to scale out complex calculations over a large number of CPUs in the cloud. What makes this possible is the run_worksheet() function, documented here, that allows you to write a calculation that overwrites the values in cells on another sheet, recalculates that sheet, and then retrieves values from cells in that sheet – as this blog post explains, it allows you to use a spreadsheet as a function from within another spreadsheet. I think this is a deeply cool idea.
Here’s an extremely simple example of how it works. The following sheet:
http://www.projectdirigible.com/user/cwebb/sheet/1254/
Contains an example of a tax calculation, with three cells: the value before tax, the tax rate, and the value after tax.
This second sheet then contains a number of sales records and calls the first sheet to calculate the value for each of the sales after tax:
http://www.projectdirigible.com/user/cwebb/sheet/1264/
Here’s the function call I’ve used to calculate tax:
=run_worksheet("http://www.projectdirigible.com/user/cwebb/sheet/1254/", {(2,3) -> c3}).d3.value
Apart from the parallelism, there are a number of obviously good things that follow on from breaking calculations out into separate sheets like this: the ability to reuse this calculation across multiple spreadsheets; the ability to completely change how this calculation works at some later date so long as the parameters remain the same; and the fact that how it works is easily comprehensible to non-IT users because it’s expressed in a very visual way. I think this general approach could be applied to DAX: in DAX, most advanced calculations involve manipulating tables of data, filtering them and aggregating values in different columns. We have tables in Excel, these tables can be filtered, we can create calculated fields inside these tables and we can aggregate and perform other calculations on the data in these tables elsewhere in the sheet, so everything we can do in a DAX calculation can already be represented in the form of a spreadsheet. So if it was possible for the Vertipaq engine to understand calculations expressed in spreadsheet form instead of as DAX expressions then non-IT users would have far fewer limits on the kind of calculations they could write. Of course, this is a very big ‘if’ and there’s only a slim chance that something like this might happen – it would require a lot of development effort and probably that PowerPivot became a native feature of Excel.
Sadly, PowerPivot apart, I’m not sure I see much evidence of exciting new ideas in the world of Excel; indeed it seems like Excel has barely got to grips with the idea of the internet, let alone a truly cloud-based model. Why can’t all the interesting parallelisation stuff that’s possible with Windows HPC Server 2008 R2 be translated to the cloud and be made available to all Excel users? Why doesn’t the Excel Web App have any kind of API yet? I’m sure all the usual responses about lack of time and resources can be reused here, but I’d hate to see Excel become yet another IE6, neglected by MS while the competition innovates – and after all, isn’t Office one of Microsoft’s cash-cows, and so shouldn’t it be getting the kind of time and money lavished on stuff like Windows Phone and Bing…?
Chris has been working with Microsoft BI tools since he started using beta 3 of OLAP Services back in the late 90s. Since then he has worked with Analysis Services in a number of roles (including three years spent with Microsoft Consulting Services) and he is now an independent consultant specialising in complex MDX, Analysis Services cube design and Analysis Services query performance problems. His company website can be found at http://www.crossjoin.co.uk and his blog can be found at http://cwebbbi.wordpress.com/ . |