The performance goal of partition processing is to refresh Cube/OLAP data in an efficient manner that does not negatively impact the query performance of dependent partitions. The following techniques for accomplishing this goal are discussed in this section:
- Optimizing Aggregations to make queries run faster.
- Using Partitioning to avoid processing entire data every time.
- Optimizing Bitmap Indexes, see Optimizing Bitmap Indexes during Partition Processing in SSAS Multidimensional
- Optimizing Data Compression for fact data, see here.
Define the aggregation optimally for the measure groups as aggregations reduce the number of values that SSAS has to scan from the disk to generate the response. While having more (all required) aggregations improves the query performance it will be too slow during cube processing whereas if you have too few aggregations it slows down the query performance, but increases the processing performance. Individual aggregations are organized into collections of aggregations called AggregationDesigns. Once created, an AggregationDesign can be applied to many partitions. A single measure group can also have multiple AggregationDesigns, so that you can choose different sets of aggregations for different partitions.
To help Analysis Services successfully apply the AggregationDesign algorithm, you can perform the following optimization techniques to influence and enhance the aggregation design. In this section we will discuss the following:
Tips for designing optimal aggregations strategy:
- Use correct Attribute Relationships, see this post for understanding Optimizing Dimension Processing in SSAS Multidimensional
- Don’t over aggregate and consolidate aggregations:
- For e.g. don’t include key column with another key column in aggregations e.g. ProductKey (n distinct values) and DateKey(m distinct values)) since combinations will result in cross join(m*n) and aggregate data file may be larger than fact data file which is red flag. These will result in rigid aggr since both are keys. Also observe in profiler it will use MergeAgg not just Agg since cardinality is large (m*n)
- Let say you have two different aggregation, first is Month with Product Category and another one is Year with Product Category, let’s say as per business requirement both of these aggregations are required to speed up different types of queries. Solution is to consolidate these two aggregations into one Month with Product Category aggregation, this aggregation will be used to speed up both year and month queries if attribute relationship is defined correctly.
- Usage-based optimization: You can define the fact table source record count in the EstimatedRows property of each measure group, and you can define attribute member counts in the EstimatedCount property of each attribute. This way you can ensure your metadata is up-to-date which will improve the effectiveness of your aggregation design and creation. Generally we create aggregations to gain 20%-30% performance in the beginning and the later use the Usage Based Optimization wizard to create more aggregations for all the queries being run against the cube. The idea is you enable logging for queries being run against your cube and then you use the collected information as an input to the Usage Based Optimization wizard for creating aggregations for all or long running queries. To learn more about this click here. If you have created partitions on measure groups, you might consider having a higher percentage of aggregation for all those older partitions which are less likely to change whereas lower percentage of aggregations for those recent partitions which are more likely to change. You should not create aggregations that are larger than one-third of the size of the fact data.
- Avoid spilling to disk, Ensure max parallelism, CPU should be saturated!
- See this optimization here for understanding order of aggregation and which is picked if more than one aggregation is there to satisfy a single query.
- Finally to see which details for aggregations inside a partition run below query
--Aggregrations SELECT * FROM SYSTEMRESTRICTSCHEMA($SYSTEM.DISCOVER_PARTITION_STAT , CUBE_NAME = 'sales' , DATABASE_NAME = 'mdx-fundamentals' , MEASURE_GROUP_NAME = 'Reseller Sales' , PARTITION_NAME = 'Reseller Sales' )
Always apply a partitioning strategy for all the measure groups (especially those which are quite large in size) and partition them by one or more dimensions(mostly DATE) as per usage. This will greatly improve the cube processing as well as query performance of the cube. The processing and query performance improves because of the fact that multiple threads can work together on multiple partitions of a measure group in parallel for processing or for serving query response. You can even define a different aggregation strategy for each partition. For example, you might have a higher percentage aggregation for all those older partitions which are less likely to change whereas a lower percentage of aggregations for those recent partitions which are more likely to change.
Tips for designing optimal partitioning strategy:
- At least one partition per core.
- Try and make even size partitions.
- Ensure fact.data smaller than index.data/agg.data (agg size should not be more than 30% of data size).