MDX Calculations Part 3 – Subcube Vectors, Solve Order, Block vs Cell By Cell Computation

Understanding Block vs Cell By Cell Computation

The key idea behind subspace computation is best introduced by contrasting it with a cell-by-cell evaluation of a calculation. (This is also known as a naïve calculation.) Consider a trivial calculation RollingSum that sums the sales for the previous year and the current year, and a query that requests the RollingSum for 2005 for all Products.
RollingSum = (Year.PrevMember, Sales) + Sales

SELECT 2005 on columns, Product.Members on rows WHERE RollingSum

Cell-by-cell evaluation

The 10 cells for [2005, All Products] are each evaluated in turn. For each, the previous year is located, and then the sales value is obtained and then added to the sales for the current year. There are two significant performance issues with this approach.

Firstly, if the data is sparse (that is, thinly populated), cells are calculated even though they are bound to return a null value. In the previous example, calculating the cells for anything but Product 3 and Product 6 is a waste of effort. The impact of this can be extreme—in a sparsely populated cube, the difference can be several orders of magnitude in the numbers of cells evaluated.

Secondly, even if the data is totally dense, meaning that every cell has a value and there is no wasted effort visiting empty cells, there is much repeated effort. The same work (for example, getting the previous Year member, setting up the new context for the previous Year cell, checking for recursion) is redone for each Product. It would be much more efficient to move this work out of the inner loop of evaluating each cell.

Block (Subspace) evaluation

Now consider the same example performed using subspace computation. In subspace computation, the engine works its way down an execution tree determining what spaces need to be filled. Given the query, the following space needs to be computed, where * means every member of the attribute hierarchy.
[Product.*, 2005, RollingSum]

Given the calculation, this means that the following space needs to be computed first.
[Product.*, 2004, Sales]
Next, the following space must be computed.
[Product.*, 2005, Sales]

Finally, the + operator needs to be added to those two spaces.
If Sales were itself covered by calculations, the spaces necessary to calculate Sales would be determined and the tree would be expanded. In this case Sales is a base measure, so the storage engine data is used to fill the two spaces at the leaves, and then, working up the tree, the operator is applied to fill the space at the root. Hence the one row (Product3, 2004, 3) and the two rows { (Product3, 2005, 20), (Product6, 2005, 5)} are retrieved, and the + operator applied to them to yields the following result.

The + operator operates on spaces, not simply scalar values. It is responsible for combining the two given spaces to produce a space that contains each product that appears in either space with the summed value. This is the query execution plan. Note that it operates only on data that could contribute to the result. There is no notion of the theoretical space over which the calculation must be performed.
A query execution plan is not one or the other but can contain both subspace and cell-by-cell nodes. Some functions are not supported in subspace mode, causing the engine to fall back to cell-by-cell mode. But even when evaluating an expression in cell-by-cell mode, the engine can return to subspace mode.

Understanding Subcube Vectors

Understanding MDX Calculations Solve Order