Informatica Interview Questions
Static cache
Dynamic cache
Persistent cache
Shared cache
Recache
Active Transformation: An active transformation can do any of the following:
Change the number of rows that pass through the transformation: For instance, the Filter
transformation is active because it removes rows that do not meet the filter condition.
Change the transaction boundary: For example, the Transaction Control transformation is active
because it defines a commit or rollback transaction based on an expression evaluated for each
row.
Change the row type: For example, the Update Strategy transformation is active because it flags rows
for insert, delete, update, or reject.
Passive Transformation: A passive transformation is one which will satisfy all these conditions:
Does not change the number of rows that pass through the transformation
Maintains the transaction boundary
Maintains the row type
On issuing the STOP command on the Session task, the Integration Service stops reading data from the
source, although it continues processing the data already read to the targets. If the Integration Service
cannot finish processing and committing data, we can issue the ABORT command.
The ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish
processing data within the timeout period, it kills the DTM process and terminates the session.
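Both commands can be issued from the Workflow Monitor or through the pmcmd utility. As a sketch, assuming a configured domain and Integration Service (all values below are placeholders), the workflow-level equivalents are:

pmcmd stopworkflow -sv <integration_service> -d <domain> -u <user> -p <password> -f <folder> <workflow_name>
pmcmd abortworkflow -sv <integration_service> -d <domain> -u <user> -p <password> -f <folder> <workflow_name>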
What are the similarities and differences between ROUTER and FILTER?
The differences are best illustrated with an example:
Imagine we have 3 departments in the source and want to send these records into 3 tables. To achieve this,
we require only one Router transformation. If we want to get the same result with Filter transformations,
we require at least 3 of them.
Similarity:
A Router and a Filter transformation are almost the same because both transformations allow you to use a
condition to test data.
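As a small sketch (the DEPTNO port name is an assumption), the single Router transformation carries three group filter conditions:

DEPT_10 group: DEPTNO = 10
DEPT_20 group: DEPTNO = 20
DEPT_30 group: DEPTNO = 30

With Filters, the same result needs three separate Filter transformations, each holding one of these conditions.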
We can make use of the Sorter transformation and select the Distinct option to delete the duplicate rows.
What are the different ways to filter rows using Informatica transformations?
Source Qualifier
Joiner
Filter
Router
What are the different transformations where you can use a SQL override?
Source Qualifier
Lookup
Target
The Informatica PowerCenter Partitioning Option increases PowerCenter performance through parallel
data processing. The Partitioning Option lets you split a large data set into smaller subsets that can be
processed in parallel for better session performance.
Database partitioning: The Integration Service queries the database system for table partition
information. It reads partitioned data from the corresponding nodes in the database.
Round-Robin Partitioning: Using this partitioning algorithm, the Integration service distributes data
evenly among all partitions. It makes sense to use round-robin partitioning when you need to distribute
rows evenly and do not need to group data among partitions.
Hash Auto-Keys Partitioning: The PowerCenter Server uses a hash function to group rows of data
among partitions. When the hash auto-key partition is used, the Integration Service uses all grouped or
sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank,
Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they
enter these transformations.
Hash User-Keys Partitioning: Here, the Integration Service uses a hash function to group rows of data
among partitions based on a user-defined partition key. You can individually choose the ports that define
the partition key.
Key Range Partitioning: With this type of partitioning, you can specify one or more ports to form a
compound partition key for a source or target. The Integration Service then passes data to each partition
depending on the ranges you specify for each port.
Pass-through Partitioning: In this type of partitioning, the Integration Service passes all rows from one
partition point to the next partition point without redistributing them.
Mention a few design and development best practices for Informatica.
Mapping design tips: Standards – sticking to consistent standards is beneficial in the long run. This
includes naming conventions, descriptions, environment settings, parameter files, documentation, among
others.
Reusability – in order to react quickly to potential changes, use Informatica components like
mapplets, worklets, and reusable transformations.
Scalability – when designing and developing mappings, it is good practice to keep data volumes in
mind. This includes caching, queries, partitioning, and initial vs. incremental loads.
Simplicity – it is recommended to create multiple simple mappings instead of a few complex ones. Use a
Staging Area and try to keep the processing logic as clear and simple as possible.
Modularity – use the modular design technique (common error handling, reprocessing).
Source Qualifier – use shortcuts, extract only the necessary data, and limit the columns and rows read
from the source. Try to use the default query options (User Defined Join, Filter) instead of a SQL
Query override, which may strain database resources and prevent the use of partitioning and
push-down.
Expressions – use local variables to limit redundant calculations (see the sketch after this list), avoid
datatype conversions, reduce calls to external scripts (coding outside of Informatica), provide comments,
and use operators (||, +, /) instead of functions where possible. Keep in mind that numeric operations are
generally faster than string operations.
Filter – use the Filter transformation as close to the source as possible. If multiple filters need to
be applied, it is usually more efficient to replace them with a Router.
Aggregator – use sorted input, also use as early (close to the source) as possible and filter the
data before aggregating.
Joiner – try to join the data in the Source Qualifier wherever possible, and avoid outer joins. It is
good practice to designate the source with fewer rows as the Master source.
Lookup – a relational lookup should return only the ports that are needed. Call an Unconnected
Lookup from an expression (IIF) where appropriate. Replace large lookup tables with joins whenever
possible. Review the database objects and add indexes to database columns when possible. Use the Cache
Calculator in the session to eliminate paging in the lookup cache.
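A small sketch of the local-variable tip from the Expressions bullet above (port names and the tax rate are hypothetical):

v_NET_PRICE (variable port) = PRICE * QTY * (1 - DISCOUNT)
o_NET_PRICE (output port) = v_NET_PRICE
o_TAX (output port) = v_NET_PRICE * 0.2

The calculation is evaluated once per row in the variable port and reused by both output ports instead of being repeated.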
After a while, data in a table becomes old or redundant. In a scenario where new data enters the table,
recache ensures that the data is refreshed and updated in the existing and new cache.
Differentiate between Source Qualifier and Filter Transformation?
Source Qualifier transformation | Filter transformation
Can filter rows only from relational sources. | Can filter rows from any type of source system.
It limits the row set extracted from a source. | It limits the row set sent to a target.
It enhances performance by minimizing the number of rows used in mapping. | It is added close to the source to filter out the unwanted data early and maximize performance.
i. If the source is a DBMS, you can use the Select Distinct property in the Source Qualifier to select the
distinct records.
ii. You can use an Aggregator and select ports as group-by keys to get the distinct values. After you pass
all the required ports to the Aggregator, select as group-by keys the ports on which you want to de-
duplicate. If you want to find the duplicates based on all the columns, select all the ports as
group-by keys.
iii. You can use Sorter and use the Sort Distinct Property to get the distinct values.
iv. You can use Sorter, Expression, and Filter transformations to identify and remove duplicates if
your data is sorted (see the sketch after this list).
v. When you change the Lookup transformation property to use a Dynamic Cache, a new port,
NewLookupRow, is added to the transformation. The Dynamic Cache can update the cache as it is
reading the data. If the source has duplicate records, you can also use a Dynamic Lookup cache and
then a Router to select only the distinct ones.
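A minimal sketch of approach iv, assuming the data is sorted on a key port named CUST_ID (all port names are hypothetical). In an Expression transformation, a variable port keeps its value from the previous row until it is reassigned, so we can compare each row with the previous one:

Expression ports (in this order):
CUST_ID (input)
v_IS_DUP (variable) = IIF(CUST_ID = v_PREV_CUST_ID, 1, 0)
v_PREV_CUST_ID (variable) = CUST_ID
o_IS_DUP (output) = v_IS_DUP

A downstream Filter with the condition o_IS_DUP = 0 then keeps only the first occurrence of each key.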
If we have a requirement to join data mid-stream, or the sources are heterogeneous, then we will have to
use the Joiner transformation to join the data.
Uncached lookup – Here, the Lookup transformation does not create a cache. For each record,
it goes to the lookup source, performs the lookup, and returns a value. So for 10K rows, it will go
to the lookup source 10K times to get the related values.
Cached Lookup – In order to reduce the to-and-fro communication between the Informatica Server and
the lookup source, we can configure the Lookup transformation to create a cache. In this way,
the entire data from the lookup source is cached and all lookups are performed against that
cache.
Based on the types of the Caches configured, we can have two types of caches, Static and Dynamic.
The Integration Service performs differently based on the type of lookup cache that is configured. The
following table compares Lookup transformations with an uncached lookup, a static cache, and a dynamic
cache:
Persistent Cache
By default, the lookup caches are deleted after successful completion of the respective sessions, but we
can configure the cache to be preserved so it can be reused the next time.
Shared Cache
We can share the lookup cache between multiple transformations. We can share an unnamed cache
between transformations in the same mapping. We can share a named cache between transformations in
the same or different mappings.
During session configuration, you can select a single database operation for all rows using the Treat
Source Rows As setting from the ‘Properties’ tab of the session.
Insert: Treat all rows as inserts.
Delete: Treat all rows as deletes.
Update: Treat all rows as updates.
Data Driven: The Integration Service follows the instructions coded in the Update Strategy
transformation to flag rows for insert, delete, update, or reject.
Once we have determined how to treat all rows in the session, we can also set options for individual rows,
which gives additional control over how each row behaves. We define these options in the
Transformations view on the Mapping tab of the session properties.
Steps:
1. Design the mapping just like an 'INSERT'-only mapping, without a Lookup or Update Strategy
transformation.
2. First, set the Treat Source Rows As property to UPDATE.
3. Next, set the properties for the target table: select both Insert and Update else Insert.
These options make the session update and insert records without using an Update Strategy transformation
in the mapping.
When we need to update a huge table with only a few changed records and few inserts, we can use this
solution to improve the session performance.
The solution for such situations is to avoid using a Lookup transformation and Update Strategy to insert
and update records.
The Lookup transformation may not perform well as the lookup table grows, and it degrades the
performance.
Why are the Update Strategy and Union transformations active? Explain with
examples.
1. The Update Strategy changes the row types. It can assign the row types based on the
expression created to evaluate the rows, like IIF (ISNULL (CUST_DIM_KEY), DD_INSERT,
DD_UPDATE). This expression changes the row type to Insert for rows where the
CUST_DIM_KEY is NULL and to Update for rows where the CUST_DIM_KEY is not null.
2. The Update Strategy can reject rows. With proper configuration, we can thereby also filter
out some rows. Hence, sometimes the number of input rows may not be equal to the number of
output rows.
Here we check whether CUST_DIM_KEY is not null and, if so, whether SRC_CUST_ID equals
TGT_CUST_ID. If they are equal, we take no action on those rows; they are flagged for rejection (see the sketch below).
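A hedged sketch of such an Update Strategy expression, combining the two checks described above with the same port names:

IIF(ISNULL(CUST_DIM_KEY), DD_INSERT,
    IIF(SRC_CUST_ID = TGT_CUST_ID, DD_REJECT, DD_UPDATE))

Rows with no matching dimension key are flagged for insert, unchanged rows are rejected, and the remaining rows are flagged for update.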
Union Transformation
In union transformation, though the total number of rows passing into the Union is the same as the total
number of rows passing out of it, the positions of the rows are not preserved, i.e. row number 1 from
input stream 1 might not be row number 1 in the output stream. Union does not even guarantee that the
output is repeatable. Hence it is an Active Transformation.
How do you load first and last records into target table? How many ways
are there to do it? Explain through mapping flows.
The idea behind this is to add a sequence number to the records and then take the Top 1 rank and
Bottom 1 Rank from the records.
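One possible flow, sketched as an assumption since several designs are valid: Source Qualifier -> Sequence Generator or Expression (assign a running sequence number) -> Rank transformations on the sequence-number port, where a Rank set to Bottom with Number of Ranks = 1 returns the first record and a Rank set to Top with Number of Ranks = 1 returns the last record -> targets. Alternatively, an Aggregator returning MIN and MAX of the sequence number, joined back to the rows, achieves the same result.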
I have 100 records in source table, but I want to load 1, 5,10,15,20…..100 into
target table. How can I do this? Explain in detailed mapping flow.
This is applicable for any n= 2, 3,4,5,6… For our example, n = 5. We can apply the same logic for any n.
The idea behind this is to add a sequence number to the records and divide the sequence number by n (for
this case, it is 5). If completely divisible, i.e. no remainder, then send them to one target else, send them to
the other one.
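A minimal sketch of the condition, assuming the generated sequence port is named SEQ_NO and n = 5:

Router/Filter condition: MOD(SEQ_NO, 5) = 0

Rows 5, 10, 15, ... satisfy the condition and go to one target; the remaining rows go to the other target (or are dropped if a Filter is used).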
How do you load unique records into one target table and duplicate records
into a different target table?
Source Qualifier -> Aggregator (count) -> Router (count = 1 and count >1) -> targets
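Sketching the conditions with an assumed key port CUST_ID: the Aggregator groups by CUST_ID and adds an output port o_COUNT = COUNT(CUST_ID); the Router then has two groups with the filter conditions o_COUNT = 1 (unique records target) and o_COUNT > 1 (duplicate records target).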
You can use this kind of query, built on an inline view (SELECT * FROM ( ... )), to fetch more than one
max salary for each department.
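A complete version of such a query might look like the following (the employees table and its column names are assumptions):

SELECT *
FROM (
      SELECT e.*,
             DENSE_RANK() OVER (PARTITION BY e.dept_id ORDER BY e.salary DESC) AS sal_rank
      FROM employees e
     ) ranked
WHERE sal_rank <= 3;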
Informatica Approach:
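One common mapping flow for this, sketched as an assumption since the original flow is not shown: Source Qualifier -> Rank transformation (Top, Number of Ranks = 3, group by the department port, rank on the salary port) -> target.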
This will give us the top 3 employees earning maximum salary in their respective departments.
Target load order (or) Target load plan is used to specify the order in which the integration service loads
the targets. You can specify a target load order based on the source qualifier transformations in a
mapping. If you have multiple source qualifier transformations connected to multiple targets, you can
specify the order in which the integration service loads the data into the targets.
A target load order group is the collection of source qualifiers, transformations, and targets linked together
in a mapping. The Integration Service reads the sources within a target load order group concurrently, and
it processes the target load order groups sequentially. A single mapping can contain, for example, two
target load order groups.
Write the Unconnected lookup syntax and how to return more than one
column.
:LKP.lookup_transformation_name(input_port1, input_port2, ...)
We can return only one port from an Unconnected Lookup transformation. As the Unconnected
Lookup is called from another transformation, we cannot return multiple columns from it
directly.
However, there is a trick. We can use the SQL override to concatenate the multiple columns that
we need to return. When we do the lookup from another transformation, we separate the columns
again using SUBSTR (see the sketch below).
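A hedged sketch of the trick, with table, port, and transformation names assumed. In the Lookup transformation, override the SQL to return one concatenated port:

SELECT CUST_ID,
       FIRST_NAME || '~' || LAST_NAME AS NAME_COMBO
FROM CUSTOMER_DIM

In the calling Expression transformation, split the single returned value again:

v_COMBO = :LKP.LKP_CUSTOMER_DIM(CUST_ID)
o_FIRST_NAME = SUBSTR(v_COMBO, 1, INSTR(v_COMBO, '~') - 1)
o_LAST_NAME = SUBSTR(v_COMBO, INSTR(v_COMBO, '~') + 1)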
Why does the Aggregator output the last row when you do not select any group-by port?
The integration service performs aggregate calculations and produces one row for each group. If you do
not specify any group by ports, the integration service returns one row for all input rows. By default, the
Integration Service returns the last row received for each group along with the result of the aggregation.
By using the FIRST function, you can instruct the Integration Service to return the first row of the group.
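As a small illustration (the SALARY port is hypothetical), an Aggregator output port defined as FIRST(SALARY) returns the salary from the first row of each group, whereas an output port defined simply as SALARY returns the value from the last row received.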
It creates 2 caches – data cache (for input ports) and Index cache (for group by ports)
Look at this question from a different point of view: imagine you are an Aggregator. During the session,
you collect all input records one by one. Finally, you're asked to produce one output record. But no one
has told you how to create any aggregate value (e.g. a sum, mean value, median, or whatever). So what
will you output? There are only two logical choices: the first or the last record. And Informatica R&D has
opted for the latter option (last record). To me this sounds quite simple.
What is a domain?
A domain is the primary organizational unit that takes in all the interlinked and interconnected nodes and
their relationships. These links are administered from one single point of organization.
Define workflow?
A workflow includes a set of instructions that tells the server how to carry out the implementation of
tasks.
1. Task Designer
2. Task Developer
3. Workflow Designer
4. Worklet Designer
1. Source Definition
3. Workflow
4. Target Definition
5. Mapping
6. ODBC Connection
1. Global Repositories
2. Local Repositories
Extraction, Transformation, and Loading (ETL) of the above-mentioned metadata are mainly performed
through the PowerCenter Repository.
How to use the PMCMD Utility Command?
It is a command-line client program that communicates with the Integration Service to perform some of
the tasks that can also be performed using the Workflow Manager client, such as:
1. Starting a workflow.
2. Scheduling a workflow.
pmcmd can be used in two modes:
1. Interactive mode.
2. Command-line mode.
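A sketch of a typical call in command-line mode (service, domain, and credential values are placeholders):

pmcmd startworkflow -sv <integration_service> -d <domain> -u <user> -p <password> -f <folder_name> <workflow_name>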
Mapping variables are used for incremental extraction.
1. Reusable worklet
2. Non-Reusable worklet
2. Assign users to access the folders with read, write and execute permissions.
Informatica administrator.
A mapping is nothing but an ETL application.
A session is a set of instructions that tells the ETL server to move the data from source to destination.
Workflow:
A workflow is a set of instructions that tells how and when to run the session tasks.
It is a GUI-based ETL product from Informatica Corporation, which was founded in 1993 in Redwood
City, California.
Informatica Corporation has many products, for example:
1. Informatica Analyzer.
2. Life Cycle Management.
3. Master Data Management.
2. A Data Modeler (or Database Architect) designs the warehouse database using a GUI-based data
modeling tool called ERwin.
1. Star Schema.
2. Snowflake Schema.
3. Galaxy Schema.
Operational Database (OLTP) | Data Warehouse
It stores/records current and up-to-date data, which is used in daily operations. | It stores/analyzes historical data, which is used for information support on a long-term basis.
It consists of detailed and primitive data; its view is flat relational. | It consists of summarized and consolidated data; its view is multidimensional.
Low performance is observed for analytical queries. | Analytical queries perform with high performance here.