The performance goal of dimension processing is to refresh dimension data in an efficient manner that does not negatively impact the query performance of dependent partitions. The following techniques for accomplishing this goal are discussed in this section:
- Reducing attribute overhead.
- Optimizing SQL source queries.
Improving Dimensions Processing Performance:
1) Design attribute relationships and user hierarchies: These are the most important thing to consider in dimension designing. By default all attributes are related to the key attribute, so for Date dimension all attributes such as Year, Month, Quarter are related to DateKey and whenever we query any non key attribute(say quarter) they are fetched using key attribute which does not generate optimally performing query plan since SSAS engine has no knowledge about relationships between non key attributes so we should define attribute relationships wherever applicable. Now let’s say attribute relationship Date -> Month -> Quarter -> Year is designed correctly. Now when month or quarter hierarchy is used in some query it gets stored in cache for use in subsequent queries and when we hit new query which involves year it will aggregate 4 quarters or 12 months of data to arrive at yearly figures instead of having to aggregate 365 days improving query execution performance. Another benefit comes when we perform crossjoin as now it don’t need to go through key to join different attribute members of same dimension.
Also by default an attribute relationship is considered Flexible, but wherever applicable make it Rigid for better stability (both gives same performance but for flexible aggregations may be dropped during dimension processing). If you make it rigid, SSAS doesn’t bother updating members of a dimension on subsequent processing and hence improves the performance. Please make sure you are changing relationships to rigid only in cases where it does not change or else you may get exceptions during processing.
Attribute hierarchies also Improves dimension processing performance since attribute processing order is also determined by relationships, so make sure you don’t create redundant hierarchy, for example “days roll up into month” and “months roll up into quarter” and also “days roll up into quarter” because this would add extra overhead in processing, also see Understanding Dimension Processing in SSAS Multidimensional
We also need to appropriately set KeyColumn and NameColumn property to ensure that the KeyColumn property is set to identify unique values, for e.g. a month value of 1 is insufficient if the dimension contains more than a single year, so in this case combine Year and Month columns together to make them unique or key columns.
2) Turn off AttributeHierarchyEnabled property: Set AttributeHierarchyEnabled to False for all those attributes for which you don’t need aggregation to be calculated and want them to access it as member properties i.e for informational purpose. Attribute hierarchies can be used in the query axis of MDX queries, but member properties cannot. They can be used only by querying the attribute hierarchy that contains the member property. For example, a employees Email or Phone number may be needed for informational purposes in reports, but not to slice or dice a measure on a query axis. Hence date of birth and email address can be classified as member properties rather than attribute hierarchies. Setting the AttributeHierarchyEnabled property improves the processing performance and also reduces the overall cube size as those attributes will not be considered in aggregation and for index creation. This makes sense for all those attributes which are not used for slicing and dicing. However this turn into a drawback as well since filtering measures using member properties will be much slower than filtering by attribute hierarchies.
3) Turn off AttributeHierarchyOptimized property: During processing of the primary key attribute, bitmap indexes are created for every related attribute. Building the bitmap indexes for the primary key can take time if it has one or more related attributes with high cardinality (for example SSN ). At query time, the bitmap indexes for these attributes are not useful in speeding up retrieval, since the storage engine still must sift through a large number of distinct values to reach the desired values. Unwanted bitmap indexes increase processing time, increase the cube size as well as they may have a negative impact on query response time. To avoid spending time building unnecessary bitmap indexes during processing set the AttributeHierarchyOptimizedState property to Not Optimized. See this post for Optimizing Bitmap Indexes during Partition Processing in SSAS Multidimensional
4) Turn off AttributeHierarchyOrdered property: The AttributeHierarchyOrderded property controls whether members of an attribute hierarchy are sorted in a particular order or not. The OrderBy and OrderByAttribute property can be further used to control the exact way in which sorting is desired. In case there are certain attributes where the order of members is not important, then setting AttributeHierarchyOrdered to false can save processing time for the dimension.
5) Delete extra and unnecessary attributes: We might be tempted to include all the columns in the cube dimension from the database dimension, although there is no harm to include them all if all the columns are required for analysis, but if not then don’t include them. Including unnecessary columns puts extra overhead on SSAS for managing/storage of these columns and takes longer for processing and querying. for e.g. we can merge ProductKey and ProductName in one attribute with keycolumn as ProductKey and namecolumn as ProductName and we can use ProductKey in OrderBy property also for ordering purposes.
6) Check if ByTable processing improves performance: Change the ProcessingGroup property of the dimension to be ByTable (instead of the default value of ByAttribute). As the ByTable option takes the entire table data for that dimension into memory, do this only if you have lots of memory or when dimension table is small. Also this option will only works only when you have only a single table to draw the data from. Multiple tables in star schema or snow-flake schema will issue multiple distinct queries against the data source, so for dimension derived from multiple tables, we must use a view. Also we must set error config KeyDuplicate=IgnoreError. This option can also be used when there are constant updates in the source database which might cause attribute key errors while processing dimension since this option gets all data at once not attribute by attribute. For constant updates in source database we can try using Snapshot Isolation.
7) Optimize SQL source queries
- Use a view and NOLOCK hint
- Remove JOINs and Build indexes
8) Use efficient data types
- For surrogate keys use integers (tinyint, smallint, int, bigint)
- For integer measures use integers (tinyint, smallint, int, bigint)
- Hold dates as integers in format yyyyMMdd
- Numeric measures use smallmoney, money, real, float (Note that decimal and vardecimal require more CPU power to process than money and float types)