In this set of interview questions we will learn about the basics of SSIS, such as data flow, control flow, the various types of data flow transformations, execution trees, and buffers.
Q1) What is a Control flow and Data Flow elements in SSIS?
A control flow element is one that performs a function, provides structure, or controls the flow of other elements. There must be at least one control flow element in an SSIS package. A control flow links together our modular data flows as a series of operations in order to achieve a desired result. A control flow consists of one or more tasks and containers that execute when the package runs. Precedence constraints connect the tasks and containers, and are used to control the order or define the conditions for running the next task or container in the package control flow.
All ETL tasks related to data are done by data flow elements. It is not necessary to have a data flow element in the SSIS package. A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. Before you can add a data flow to a package, the package control flow must include a Data Flow task. The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package.
Q2) What are Synchronous and Asynchronous transformations in SSIS?
1) Synchronous (non-blocking) transformations
Processes data row by row: the output is synchronous with the input; it occurs at the same time. Therefore, to process a given row, the transformation does not need information about other rows in the data set. The physical layout of the result set is not changed.
Buffers are Reused: The output of a synchronous component uses the same buffer as the input and does not require data to be copied to a new buffer to complete the transformation.
An example of a synchronous transformation is the Derived Column transformation. For each incoming row, it adds or replaces the specified columns and sends the row on its way down the pipeline. Each derived column value for a row is independent of all the other rows in the data set.
Examples: Conditional Split, Multicast (SSIS uses pointers in this case), Derived Column, Row Count, Lookup with no cache or partial cache, OLE DB Command, Script Component
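Conceptually, the row-by-row, buffer-reusing behavior above can be sketched as follows. This is a hypothetical Python model, not SSIS code; the function name and columns are invented for illustration.

```python
# Hypothetical sketch (not the SSIS API): a synchronous, Derived Column-style
# transformation mutates rows in the buffer it was handed, one row at a time,
# so the same buffer flows downstream with no copy.
def derived_column(buffer):
    """Add a 'full_name' column in place and return the same buffer."""
    for row in buffer:                      # process one row at a time
        row["full_name"] = f"{row['first']} {row['last']}"
    return buffer                           # same object: buffer is reused

rows = [{"first": "Ada", "last": "Lovelace"},
        {"first": "Alan", "last": "Turing"}]
out = derived_column(rows)
print(out is rows)  # True: the input buffer is reused, not copied
```

Because each output row depends only on its own input row, no other rows need to be seen before a row can be emitted.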
2) Partially Blocking Asynchronous Transformations
Works with groups of rows: this type of transformation has to acquire multiple buffers (but not all) of input data before it can perform its processing. An example is the Merge transformation, where the component has to examine multiple rows from each input and then merge them in sorted order.
Data is copied to new buffers: the output buffer or output rows are not in sync with the input buffer; output rows use a new buffer. In these situations it is not possible to reuse the input buffer, because an asynchronous component can have more, the same, or fewer output records than input records.
Examples: Merge, Merge Join, Union All
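A partially blocking component like Merge can be sketched as below. Again this is a hypothetical Python model, not SSIS code: it needs rows from both sorted inputs before it can emit anything, and it writes results into a new output buffer rather than reusing an input buffer.

```python
# Hypothetical sketch (not the SSIS API): a Merge-style, partially blocking
# transformation reads from two pre-sorted inputs and emits rows in sorted
# order into a *new* output buffer.
import heapq

def merge_sorted(input_a, input_b, key):
    output = []                                   # new buffer, input not reused
    for row in heapq.merge(input_a, input_b, key=key):
        output.append(row)                        # emit in merged sorted order
    return output

a = [{"id": 1}, {"id": 3}]
b = [{"id": 2}, {"id": 4}]
merged = merge_sorted(a, b, key=lambda r: r["id"])
print([r["id"] for r in merged])  # [1, 2, 3, 4]
```

Note that the component did not have to see *all* input before emitting: once the leading rows of both inputs are available, output can begin, which is why it is only partially blocking.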
3) Fully Blocking Asynchronous Transformations
Needs all input rows before producing any output: There is no one-to-one correspondence between input rows and output rows.
Data is copied to new buffers: this type of transformation has to acquire all buffers of input data before it can perform its processing. For example, a Sort transformation must see all rows before sorting, and it blocks any data buffers from being passed down the pipeline until the output is generated.
Examples: Sort, Aggregate, Pivot, Unpivot, Fuzzy Grouping, Fuzzy Lookup
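The fully blocking behavior can be sketched as follows (a hypothetical Python model, not SSIS code): every input buffer must be drained before a single output row appears.

```python
# Hypothetical sketch (not the SSIS API): a Sort-style, fully blocking
# transformation must consume every input buffer before it can emit any
# output, then writes the result into new buffers.
def blocking_sort(input_buffers, key):
    all_rows = []
    for buffer in input_buffers:      # drain the entire upstream pipeline first
        all_rows.extend(buffer)
    all_rows.sort(key=key)            # only now can any output be produced
    return [all_rows]                 # new buffer for the downstream tree

buffers = [[{"id": 3}, {"id": 1}], [{"id": 2}]]
out = blocking_sort(buffers, key=lambda r: r["id"])
print([r["id"] for r in out[0]])  # [1, 2, 3]
```

Holding every row in memory at once is exactly why fully blocking components are the most expensive in the data flow.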
Note: Synchronous components reuse buffers and are therefore generally faster than asynchronous components. Sort is a fully blocking transformation, so it is better to sort your data with the SQL command in the OLE DB Source (an ORDER BY clause) instead of using the Sort transformation. Merge requires sorted input, but Union All does not, so use Union All wherever possible.
Q3) What is an execution tree in the SSIS data flow? When is it created, when does it end, and what is the scope of a buffer?
At run time, the data flow engine breaks down the Data Flow task's operations (transformations) into multiple execution units; each individual execution unit is called an execution tree. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new memory structure called a data buffer and can be executed on a separate thread, in parallel with other execution trees; the buffer created is scoped to that individual execution tree.
An execution tree normally starts at either the source or an asynchronous transformation and ends at the next asynchronous transformation or a destination. During execution of the execution tree, the source reads the data, stores it in a buffer, and executes the transformations in that buffer; the buffer is then handed to the next execution tree in the path by passing a pointer to it.
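The boundary rule above can be illustrated with a toy splitter (hypothetical Python, not the actual engine): walking a linear pipeline, a new execution tree, and hence a new buffer, begins at the source and at every asynchronous transformation.

```python
# Hypothetical sketch: split a linear data flow into execution trees.
# A new tree (and a new buffer) starts at the source and at each
# asynchronous transformation; synchronous components stay in the
# current tree and keep reusing its buffer.
ASYNC = {"Sort", "Aggregate", "Merge", "Union All"}

def execution_trees(pipeline):
    trees, current = [], []
    for component in pipeline:
        if component in ASYNC and current:
            trees.append(current)     # the async component ends the old tree...
            current = []              # ...and a new tree/buffer begins with it
        current.append(component)
    if current:
        trees.append(current)
    return trees

flow = ["OLE DB Source", "Derived Column", "Sort",
        "Row Count", "OLE DB Destination"]
print(execution_trees(flow))
# [['OLE DB Source', 'Derived Column'],
#  ['Sort', 'Row Count', 'OLE DB Destination']]
```

In this sketch the Derived Column shares the source's tree and buffer, while the fully blocking Sort forces a new tree with its own buffer, mirroring the boundary rule described above.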
Execution trees are enormously valuable in understanding buffer usage. They can be displayed by enabling package logging for the Data Flow task and selecting the PipelineExecutionTrees custom log event.