Understanding Dimension Processing in SSAS Multidimensional

In Analysis Services, when we first DEPLOY a database it contains only objects but not data. When we PROCESS a database, it directs the server to recursively process those objects that store data in the model, such as dimensions and partitions. Dimension processing refreshes or populates dimension data in the cube and store data into series of dimension stores and to create these dimension stores, the storage engine uses the series of jobs to extract, index, and persist data in these data stores. So we will first try to understand these jobs, below diagram shows these jobs in the order they are executed. 

Dimension Processing Phases:

1) Build Attribute store:
This is first step in the dimension processing and for each attribute in a dimension, a job is instantiated to extract and persist the attribute members into an attribute store. The attribute store consists of the key store, name store, and relationship store. The data structures build during this phase are saved to disk with the following extensions:

  • Key store: *.kstore, *.khstore and *.ksstore
  • Name store: *.asstore, *.ahstore and *.hstore
  • Relationship store: *.data and *.data.hdr

2) Build Hierarchy store:
Hierarchy store is a persistent representation of the tree structure. For each natural hierarchy in the dimension, a job is instantiated to create the hierarchy stores. The data structures build during this phase are saved to disk with the following extensions:

  • Hierarchy store: *.ostore, *.sstore and *.lstore

3) Build Decoding Store and Bitmap Indexes:
Decoding stores are used extensively by the storage engine. During querying, they are used to retrieve data from the dimension. During processing, they are used to build the dimension’s bitmap indexes. The data structures build during this phase are saved to disk with the following extensions:

  • Decoding store: *.dstore

Bitmap Indexes are used to efficiently locate related attribute data in the relationship store at querying time, the storage engine creates bitmap indexes at processing time. For attributes with a very large number of members, the bitmap indexes can take some time to process. In most scenarios, the bitmap indexes provide significant querying benefits; however, when you have high-cardinality attributes, the querying benefit that the bitmap index provides may not outweigh the processing cost of creating the bitmap index. The data structures build during this phase are saved to disk with the following extensions:

  • Bitmap indexes: *.map and *.map.hdr

Attribute Relationships and Processing Order:

Let us consider processing of DimDate dimension with two user hierarchies Fiscal and Calendar, the All attribute proceeds first, given that it has no dependencies to another attribute, followed by the Fiscal Year and Calendar Year attributes, which can be processed in parallel. The other attributes proceed according to the dependencies in the execution tree, with the key attribute always being processed last, because it always has at least one attribute relationship, except when it is the only attribute in the dimension.
The time taken to process an attribute is generally dependent on 1) the number of members and 2) the number of attribute relationships. While you cannot control the number of members for a given attribute, you can improve processing performance by using cascading attribute relationships. This is especially critical for the key attribute, because it has the most members and all other jobs (hierarchy, decoding, bitmap indexes) are waiting for it to complete. Attribute relationships lower the memory requirement during processing. When an attribute is processed, all dependent attributes must be kept in memory. If you have no attribute relationships, all attributes must kept in memory while the key attribute is processed. This may cause out-of-memory conditions.

Dimension Processing Options:

Process Full

Process Update (Incremental Processing)

Process Add (Incremental Processing)

Processing Method

Process Full sends SQL query to database to read entire data of dimension and completely rebuilds the object. It applies to dimension, cube, measure group and partition

Process Update sends SQL query to database to read entire data of dimension and make addition, update and deletion changes smartly without clearing dependent partition data. It applies only to Dimension

 Process Add sends SQL query to database to return only new members of dimension and add them to the dimension object. Have to build XMLA or use AMO and need to set error configuration KeyDuplicate = IgnoreError. It applies to dimension, cube, measure group and partition

Dependent Objects (Partition Data)

Process Full sends explicit Process Clear command which clears dependent partition data and Process Full(or Process Data) is needed for dependent partition data.

Process Update makes addition, deletion and updates of members and preserves the data in dependent partitions. Process Full(or Process Data) is not needed for dependent partition data.

Process Add only makes addition and thus preserves the data in dependent partitions. Process Full(or Process Data) is not needed for dependent partition data.

Dependent Objects (Index and Aggregations)

Process Full clears data in dependent partitions and rebuilds them so it rebuilds indexes and aggregation automatically and explicit Process Index is required.

During Process Update if only new members were added, then the partitions are not affected. But if members were deleted or if member relationships changed (e.g a Customer moved from new jersey to new york), then some of the aggregation data and bitmap indexes on the partitions are dropped so explicit Process Index command is needed on affected partitions. The cube is still available for queries, albeit with lower performance.

 Process Add also preserves dependent partition data and also preserves bitmap indexes and aggregation on dependent partition, so explicit Process Index command is not needed.

Performance

Process Full is faster than Process Update.

Process Update is comparatively slower than Process Full.

Process Add is the fastest method among the three.

Process Full = Process Data + Process Index
Process Data – Applies to dimension, cube, measure group and partition. It discards the storage contents and rebuilds only data.
Dimension it builds Hierarchy and Attributes
Partitions it builds the fact data.
Process Index – Applies to dimension, cube, measure group, and partition. It requires that the object must already have its “data” built; otherwise, it raises an error. ProcessIndexes preserves the data and rebuilds the “indexes”. For dimensions, it builds the bitmap indexes. For partitions, it builds the aggregation data and bitmap indexes. ProcessIndexes is recursively applied to all descendants of the object as well.