Informatica Interview Questions

What are the differences between connected lookup and unconnected lookup?

Connected vs Unconnected Lookups

Connected Lookup:
1. It receives input from the pipeline and participates in the data flow.
2. It can use both dynamic and static cache.
3. It can return more than one column value, i.e. output ports.
4. It caches all lookup columns.
5. It supports user-defined default values.

Unconnected Lookup:
1. It receives input from the result of an :LKP expression in another transformation.
2. Its cache cannot be dynamic.
3. It can return only one column value.
4. It caches only the ports used in the lookup conditions and the return port.
5. It does not support user-defined default values.

Name the different lookup cache(s)?


Informatica lookups can be cached or un-cached (no cache). A cached lookup can be either static or
dynamic. A lookup cache can also be classified as persistent or non-persistent, based on whether
Informatica retains the cache after the session run completes or deletes it.

 Static cache
 Dynamic cache
 Persistent cache
 Shared cache
 Recache

What is the difference between active and passive transformation?


Active Transformation - An active transformation can perform any of the following actions:

 Change the number of rows that pass through the transformation: For instance, the Filter
transformation is active because it removes rows that do not meet the filter condition.
 Change the transaction boundary: For e.g., the Transaction Control transformation is active
because it defines a commit or roll back transaction based on an expression evaluated for each
row.
 Change the row type: For e.g., the Update Strategy transformation is active because it flags rows
for insert, delete, update, or reject.

Passive Transformation: A passive transformation is one which will satisfy all these conditions:

 Does not change the number of rows that pass through the transformation
 Maintains the transaction boundary
 Maintains the row type

Example: Expression Transformation, lookup transformation

Is ‘sorter’ an active or passive transformation?


When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of
the sort key. The Integration Service discards duplicate rows compared during the sort operation.
The number of output rows can therefore differ from the number of input rows, hence it is an active
transformation.

How do you differentiate dynamic cache from static cache?


The key differences: a static cache is built once when the first lookup row is processed and does not change during the session, whereas a dynamic cache can be updated (rows inserted into or updated in the cache) while the Integration Service processes rows, and it exposes the NewLookupRow port to indicate the action taken. A dynamic cache is typically used when the lookup is performed on the target table itself, for example in insert-else-update loads.

What is the difference between STOP and ABORT options in Workflow Monitor?

On issuing the STOP command on the Session task, the Integration Service stops reading data from the
source, although it continues processing the data already read and writing it to the targets. If the
Integration Service cannot finish processing and committing data, we can issue the ABORT command.

The ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing
and committing data within the timeout period, it kills the DTM process and terminates the session.

What are the similarities and differences between ROUTER and FILTER?
The differences are:

Advantages of Router transformation over Filter transformation:

 Better performance: in a mapping, the Integration Service processes the input data only once for a
Router transformation, instead of once for every condition when multiple Filter transformations are used.
 Less complexity: we use only one Router transformation instead of multiple Filter transformations.
 As a result, a Router transformation is more efficient than the equivalent set of Filter transformations.

For example:

Imagine we have 3 departments in the source and want to send these records into 3 tables. To achieve this,
we require only one Router transformation. If we want to get the same result with Filter transformations,
we require at least 3 of them; the Router group conditions are sketched below.
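
A minimal sketch of the Router groups, assuming a DEPTNO column with values 10, 20 and 30 (the column name and values are illustrative, not from the original question):

Group_Dept10: DEPTNO = 10
Group_Dept20: DEPTNO = 20
Group_Dept30: DEPTNO = 30

Each group is connected to its own target table; rows matching none of the conditions fall into the default group.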

Similarity:
A Router and a Filter transformation are almost the same because both transformations allow you to use a
condition to test data.

What are mapplets?

 A Mapplet is a reusable object that we create in the Mapplet Designer.


 It contains a set of transformations and lets us reuse that transformation logic in multiple
mappings.

What is the difference between Mapping and Mapplet?

A mapping is the complete data flow from source definitions to target definitions, linked by transformations, and is executed through a session. A mapplet is a reusable object created in the Mapplet Designer that contains only a set of transformations; it cannot be run on its own and is used inside one or more mappings.

How can we delete duplicate rows from flat files?

We can make use of the Sorter transformation and select the Distinct option to delete the duplicate rows.

What are the different ways to filter rows using Informatica transformations?
 Source Qualifier
 Joiner
 Filter
 Router

What are the different transformations where you can use a SQL override?
 Source Qualifier
 Lookup
 Target

State the differences between SQL Override and Lookup Override?


 The role of a SQL Override is to limit the number of incoming rows entering the mapping pipeline,
whereas a Lookup Override is used to limit the number of lookup rows, avoiding a full table scan
and saving lookup time and cache space.
 Lookup Override adds the “ORDER BY” clause by default. SQL Override does not; it must be
entered manually in the query if required.
 SQL Override can implement any kind of join by writing the query;
Lookup Override provides only non-equi joins.
 Lookup Override returns only one record even if it finds multiple records for a single condition;
SQL Override does not have this restriction.

What is parallel processing in Informatica?


After optimizing the session to its fullest, we can further improve performance by exploiting
under-utilized hardware power. This is referred to as parallel processing, and we can achieve it in
Informatica PowerCenter using session partitioning.

The Informatica PowerCenter Partitioning option increases PowerCenter performance through
parallel data processing. The Partitioning option lets you split a large data set into smaller subsets
which can be processed in parallel for better session performance.

What are the different ways to implement parallel processing in Informatica?


We can implement parallel processing using various types of partition algorithms:

Database partitioning: The Integration Service queries the database system for table partition
information. It reads partitioned data from the corresponding nodes in the database.

Round-Robin Partitioning: Using this partitioning algorithm, the Integration service distributes data
evenly among all partitions. It makes sense to use round-robin partitioning when you need to distribute
rows evenly and do not need to group data among partitions.

Hash Auto-Keys Partitioning: The Powercenter Server uses a hash function to group rows of data
among partitions. When the hash auto-key partition is used, the Integration Service uses all grouped or
sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank,
Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they
enter these transformations.

Hash User-Keys Partitioning: Here, the Integration Service uses a hash function to group rows of data
among partitions based on a user-defined partition key. You can individually choose the ports that define
the partition key.

Key Range Partitioning: With this type of partitioning, you can specify one or more ports to form a
compound partition key for a source or target. The Integration Service then passes data to each partition
depending on the ranges you specify for each port.

Pass-through Partitioning: In this type of partitioning, the Integration Service passes all rows from one
partition point to the next partition point without redistributing them.
Mention a few design and development best practices for Informatica.
Mapping design tips:

 Standards – sticking to consistent standards is beneficial in the long run. This includes naming
conventions, descriptions, environment settings, parameter files, and documentation, among others.
 Reusability – in order to react quickly to potential changes, use Informatica components like
mapplets, worklets, and reusable transformations.
 Scalability – when designing and developing mappings, it is good practice to keep data volumes in
mind. This includes caching, queries, partitioning, and initial vs. incremental loads.
 Simplicity – it is recommended to create multiple simple mappings instead of a few complex ones. Use
a staging area and try to keep the processing logic as clear and simple as possible.
 Modularity – use modular design techniques (common error handling, reprocessing).
Mapping development best practices

 Source Qualifier – use shortcuts, extract only the necessary data, and limit the columns and rows
read from the source. Try to use the default query options (User Defined Join, Filter) instead of a SQL
Query override, which may impact database resources and prevent the use of partitioning and
pushdown optimization.
 Expressions – use local variables to limit redundant calculations, avoid datatype
conversions, reduce calls to external scripts (code outside Informatica), provide comments,
and use operators (||, +, /) instead of functions where possible. Keep in mind that numeric operations are
generally faster than string operations.
 Filter – use the Filter transformation as close to the source as possible. If multiple filters need to
be applied, it is usually more efficient to replace them with a Router.
 Aggregator – use sorted input, place it as early (close to the source) as possible, and filter the
data before aggregating.
 Joiner – try to join the data in the Source Qualifier wherever possible, and avoid outer joins. It is
good practice to designate the source with fewer rows as the master source.
 Lookup – a relational lookup should return only the ports that are needed. Call an unconnected
Lookup from an expression (IIF). Replace large lookup tables with joins whenever possible. Review
the database objects and add indexes to database columns where possible. Use the Cache Calculator in
the session to eliminate paging in the lookup cache.

Explain shared cache and re cache.


To answer this question, it is essential to understand the persistent cache. If we perform a lookup on a
table, the lookup reads all the data and brings it into the data cache. However, at the end of each session, the
Informatica server normally deletes all the cache files. If you configure the lookup as a persistent cache, the
server saves the cache files (under an anonymous name if the cache is unnamed). A shared cache allows you
to use this cache in other mappings by pointing them to an existing cache.

After a while, data in a table becomes old or redundant. In a scenario where new data enters the table,
recache (rebuilding the cache) ensures that the data is refreshed and updated in the existing and new caches.
Differentiate between Source Qualifier and Filter Transformation?

Source Qualifier vs Filter Transformation

Source Qualifier Transformation:
1. It filters rows while reading the data from a source.
2. It can filter rows only from relational sources.
3. It limits the row set extracted from a source.
4. It enhances performance by minimizing the number of rows used in the mapping.
5. The filter condition uses standard SQL and is executed in the database.

Filter Transformation:
1. It filters rows from within a mapping.
2. It can filter rows from any type of source system.
3. It limits the row set sent to a target.
4. It is added close to the source to filter out unwanted data early and maximize performance.
5. It defines a condition using any statement or transformation function that returns either TRUE or FALSE.

How do you remove duplicate records in Informatica? And how many ways are there to do it?

There are several ways to remove duplicates.

i. If the source is a DBMS, you can use the Source Qualifier property to select only distinct
records.
ii. You can use an Aggregator and select ports as group by keys to get distinct values. After you pass
all the required ports to the Aggregator, select as group by keys those ports on which you need to
de-duplicate. If you want to find duplicates based on all the columns, select all the ports as
group by keys.
iii. You can use a Sorter with the Distinct property to get the distinct values.
iv. You can use Sorter, Expression and Filter transformations to identify and remove duplicates if
your data is sorted (a sketch follows this list).
v. When you change the Lookup transformation to use a dynamic cache, a new port, NewLookupRow,
is added to the transformation. The dynamic cache can update the cache as the data is read. If the
source has duplicate records, you can use a dynamic lookup cache and then a Router to select only
the distinct ones.
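
A minimal sketch of option (iv), assuming the data is sorted on CUST_ID (the column and port names are illustrative). The Expression transformation compares each row with the previous one using variable ports, and the Filter keeps only the first occurrence:

V_IS_DUP  = IIF(CUST_ID = V_PREV_ID, 1, 0)
V_PREV_ID = CUST_ID
O_IS_DUP  = V_IS_DUP

Filter condition: O_IS_DUP = 0

Variable ports are evaluated in the order they are defined, so V_IS_DUP must be defined before V_PREV_ID is overwritten with the current row's value.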

What are the differences between Source Qualifier and Joiner Transformation?

The Source Qualifier can join data originating from the same source database. We can join two or more
tables with primary key–foreign key relationships by linking the sources to one Source Qualifier
transformation.

If we have a requirement to join data mid-stream, or the sources are heterogeneous, then we will have to
use the Joiner transformation to join the data.

Differentiate between joiner and Lookup Transformation.


Below are the differences between lookup and joiner transformation:

 In a lookup we can override the query, but in a joiner we cannot.
 In a lookup we can use different operators such as >, <, >=, <=, and !=, but in a joiner only the
= (equal to) operator is available.
 In a lookup we can restrict the number of rows while reading the relational table using a lookup
override, but in a joiner we cannot restrict the number of rows while reading.
 In a joiner we can join the tables using a Normal Join, Master Outer, Detail Outer or Full Outer
Join, but in a lookup this facility is not available. A lookup behaves like the left outer join of a
database.

How can you increase the performance in joiner transformation?


Below are the ways in which you can improve the performance of Joiner Transformation.

 Perform joins in a database when possible.

In some cases this is not possible, such as joining tables from two different databases or flat file
systems. To perform a join in a database, we can use one of the following options:
- Create and use a pre-session stored procedure to join the tables in the database.
- Use the Source Qualifier transformation to perform the join.

 Join sorted data when possible


 For an unsorted Joiner transformation, designate the source with fewer rows as the master source.
 For a sorted Joiner transformation, designate the source with fewer duplicate key values as the
master source.

What are the types of Caches in lookup? Explain them.


Based on the configurations done at lookup transformation/Session Property level, we can have following
types of Lookup Caches.

 Un-cached lookup – Here, the lookup transformation does not create a cache. For each record,
it goes to the lookup source, performs the lookup and returns the value. So for 10K rows, it will go
to the lookup source 10K times to get the related values.
 Cached lookup – In order to reduce the to-and-fro communication between the lookup source and the
Informatica server, we can configure the lookup transformation to create a cache. In this way,
the entire data from the lookup source is cached and all lookups are performed against the
cache.

Based on how the cache is configured, we can have two types of caches: static and dynamic. The
Integration Service behaves differently depending on whether the lookup is uncached, uses a static cache,
or uses a dynamic cache.

Persistent Cache

By default, the lookup caches are deleted after successful completion of the respective session, but we
can configure the lookup to preserve the cache so it can be reused the next time.

Shared Cache

We can share the lookup cache between multiple transformations. We can share an unnamed cache
between transformations in the same mapping. We can share a named cache between transformations in
the same or different mappings.

How do you update the records with or without using Update Strategy?


We can use the session configuration to update the records. We have several options for handling
database operations such as insert, update, and delete.

During session configuration, you can select a single database operation for all rows using the Treat
Source Rows As setting on the ‘Properties’ tab of the session.
 Insert: treat all rows as inserts.
 Delete: treat all rows as deletes.
 Update: treat all rows as updates.
 Data Driven: the Integration Service follows the instructions coded in Update Strategy
transformations to flag rows for insert, delete, update, or reject.

Once we have determined how to treat all rows in the session, we can also set options for individual target
rows, which gives additional control over how each row behaves. We define these options in the
Transformations view on the Mapping tab of the session properties.

 Insert: select this option to insert a row into a target table.
 Delete: select this option to delete a row from a table.
 Update: you have the following options in this situation:
o Update as Update: update each row flagged for update if it exists in the target table.
o Update as Insert: insert each row flagged for update.
o Update else Insert: update the row if it exists; otherwise, insert it.
 Truncate Table: select this option to truncate the target table before loading data.

Steps:

1. Design the mapping just like an ‘INSERT only’ mapping, without Lookup or Update Strategy
transformations.
2. Set the Treat Source Rows As property to UPDATE.
3. Set the target table properties to Insert and Update else Insert.

These options make the session update and insert records without using an Update Strategy transformation
in the mapping.

When we need to update a huge table with few updated records and fewer inserts, we can use this solution
to improve session performance.

The solution in such situations is to avoid using a Lookup transformation and Update Strategy to insert and
update records, because the Lookup transformation does not perform well as the lookup table size
increases and it degrades the session performance.

Why update strategy and union transformations are Active? Explain with
examples.
1. The Update Strategy changes the row types. It can assign the row types based on an
expression created to evaluate the rows, like IIF (ISNULL (CUST_DIM_KEY), DD_INSERT,
DD_UPDATE). This expression changes the row type to Insert when CUST_DIM_KEY is NULL
and to Update when CUST_DIM_KEY is not null.
2. The Update Strategy can reject rows. Thereby, with proper configuration, we can also filter
out some rows. Hence, the number of input rows may not be equal to the number of output rows,
for example:

IIF (ISNULL (CUST_DIM_KEY), DD_INSERT,

IIF (SRC_CUST_ID != TGT_CUST_ID, DD_UPDATE, DD_REJECT))

Here we first check whether CUST_DIM_KEY is null; if it is not, we check whether SRC_CUST_ID equals
TGT_CUST_ID. If they are equal, we do not need to take any action on those rows, so they get rejected.

Union Transformation

In union transformation, though the total number of rows passing into the Union is the same as the total
number of rows passing out of it, the positions of the rows are not preserved, i.e. row number 1 from
input stream 1 might not be row number 1 in the output stream. Union does not even guarantee that the
output is repeatable. Hence it is an Active Transformation.

How do you load first and last records into target table? How many ways
are there to do it? Explain through mapping flows.
The idea behind this is to add a sequence number to the records and then take the Top 1 rank and
Bottom 1 rank from the records; one possible mapping flow is sketched below.
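
A minimal sketch of the mapping flow (transformation and port names are illustrative; the same Source Qualifier feeds two pipelines):

Source Qualifier -> Expression/Sequence Generator (add SEQ_NO) -> Rank (Top 1 on SEQ_NO) -> Target
Source Qualifier -> Expression/Sequence Generator (add SEQ_NO) -> Rank (Bottom 1 on SEQ_NO) -> Target

Alternatively, an Aggregator returning MAX(SEQ_NO), joined back to the data, together with a filter on SEQ_NO = 1, achieves the same result.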

I have 100 records in source table, but I want to load 1, 5,10,15,20…..100 into
target table. How can I do this? Explain in detailed mapping flow.
This is applicable for any n = 2, 3, 4, 5, 6… For our example, n = 5; we can apply the same logic for any n.

The idea behind this is to add a sequence number to the records and divide it by n (in this case 5). If it is
completely divisible, i.e. there is no remainder, send the row to one target; otherwise, send it to the other
one. A sketch of the condition follows.
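
A minimal sketch of the routing condition, assuming a Sequence Generator or Expression port SEQ_NO numbered from 1 (the port name is illustrative):

Router group 'EVERY_NTH': MOD(SEQ_NO, 5) = 0 OR SEQ_NO = 1
Default group: all remaining rows

The extra SEQ_NO = 1 term picks up the first record, as in the example list 1, 5, 10, 15, 20 … 100.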

How do you load unique records into one target table and duplicate records
into a different target table?
Source Qualifier -> Aggregator (count) -> Router (count = 1 and count >1) -> targets
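
In a little more detail (port names are illustrative): the Aggregator groups by all the key columns and returns O_COUNT = COUNT(*) for each group along with the row, and the Router then splits the flow:

UNIQUE group:    O_COUNT = 1  -> unique-records target
DUPLICATE group: O_COUNT > 1  -> duplicate-records target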

How do you load more than 1 Max Sal in each Department through Informatica, or write the SQL query in Oracle?

SQL query:

You can use this kind of query to fetch more than one maximum salary for each department:

SELECT * FROM (
  SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME, DEPARTMENT_ID, SALARY,
         RANK() OVER (PARTITION BY DEPARTMENT_ID ORDER BY SALARY DESC) SAL_RANK
  FROM EMPLOYEES)
WHERE SAL_RANK <= 2

Informatica Approach:

We can use the Rank transformation to achieve this.

Use DEPARTMENT_ID as the group-by port.

In the Properties tab, select Top and set the number of ranks to 3.

This will give us the top 3 employees earning the maximum salary in their respective departments.

What is meant by Target load plan?


Target Load Order:

Target load order (or) Target load plan is used to specify the order in which the integration service loads
the targets. You can specify a target load order based on the source qualifier transformations in a
mapping. If you have multiple source qualifier transformations connected to multiple targets, you can
specify the order in which the integration service loads the data into the targets.

Target Load Order Group:

A target load order group is the collection of source qualifiers, transformations and targets linked together in a
mapping. The Integration Service reads the sources within a target load order group concurrently, and it
processes the target load order groups sequentially, in the order you specify.

Write the Unconnected lookup syntax and how to return more than one
column.
:LKP.lookup_transformation_name(input_port1, input_port2, …)
We can return only one port from an unconnected Lookup transformation. As the unconnected
lookup is called from another transformation, we cannot return multiple columns using it directly.

However, there is a trick. We can use the SQL override to concatenate the multiple columns we need to
return into a single return port. When we call the lookup from another transformation, we
separate the columns again using SUBSTR, as sketched below.
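
A minimal sketch, with illustrative table, column and port names. In the lookup SQL override, concatenate the columns with a delimiter and return the concatenated port:

SELECT CUST_ID, FIRST_NAME || '~' || LAST_NAME AS NAME_CONCAT FROM CUSTOMERS

In the calling Expression transformation, split the returned value again:

V_NAME       = :LKP.LKP_CUSTOMER(CUST_ID)
O_FIRST_NAME = SUBSTR(V_NAME, 1, INSTR(V_NAME, '~') - 1)
O_LAST_NAME  = SUBSTR(V_NAME, INSTR(V_NAME, '~') + 1)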

Splitting source data into multiple files dynamically


You can achieve this by using the dynamic target file concept:

SRC --> SQ --> TCT (Transaction Control Transformation) --> TGT

For the target, in the Ports tab you have the option to add a FileName port.

In the TCT transformation, apply a condition such as IIF (Value = 1, TC_COMMIT_AFTER,
TC_CONTINUE_TRANSACTION); a fuller sketch follows.
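
A minimal sketch, assuming the output files should be split by department (port and column names are illustrative). Sort the data on DEPT_ID, then in an Expression transformation detect the change of key and build the file name:

V_NEW_FILE  = IIF(DEPT_ID != V_PREV_DEPT, 1, 0)
V_PREV_DEPT = DEPT_ID
O_NEW_FILE  = V_NEW_FILE
O_FILE_NAME = 'dept_' || TO_CHAR(DEPT_ID) || '.csv'

Transaction Control condition: IIF(O_NEW_FILE = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)

Connect O_FILE_NAME to the target's FileName port; a new file is started at each commit point.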

Why does the Aggregator output the last row when you do not select any group by port?

The Integration Service performs aggregate calculations and produces one row for each group. If you do
not specify any group by ports, the Integration Service returns one row for all input rows. By default, the
Integration Service returns the last row received for each group, along with the result of the aggregation. By
using the FIRST function, you can make the Integration Service return the first row of the group instead.
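
For example, with rows grouped by DEPT_ID (port names are illustrative), an output port defined as FIRST(EMPLOYEE_NAME) returns the value from the first row received for each group instead of the last.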

It creates 2 caches – data cache (for input ports) and Index cache (for group by ports)

Look at this question from a different point of view: imagine you are an Aggregator. During the session,
you collect all input records one by one. Finally, you are asked to produce one output record, but no one
has told you how to create any aggregate value (e.g. a sum, mean, median, or whatever). So what will you
output? There are only two logical choices: the first or the last record. Informatica R&D has opted for the
latter (the last record), which is quite a simple rule.

What is the meaning of Enterprise Data Warehousing?


Enterprise Data Warehousing means the data of the organization is created or developed at a single point
of access, so that it can be accessed and viewed globally through a single source, with the server linked to
that single source. It also includes periodic analysis of the source.
How many input parameters can be present in an unconnected lookup?
Any number of input parameters can be passed to an unconnected lookup. However, no matter how many
parameters are passed, the return value is only one. For example, column 1, column 2, column 3, and
column 4 can be passed to an unconnected lookup, but there is only one return value.

What is the difference between a data warehouse, a data mart and a database?

A data warehouse consists of many different kinds of data from across the organization. A database also
consists of data, but the data in a database is smaller in size than in a data warehouse and is oriented to
day-to-day operations. A data mart contains the data needed for a particular domain or department of an
organization, for example sales, marketing, or finance.

What is a domain?
A domain is the main organizational point that undertakes all the interlinked and interconnected nodes and
relationships; these links are covered mainly by one single point of organization.

 What are the different mapping design tips for Informatica?


The different mapping design tips are as follows:
Standards - the design should follow a good standard. Following a standard consistently is proven to be
beneficial in long-running projects. Standards include naming conventions, descriptions, environment
settings, documentation and parameter files, etc.
Reusability - using reusable transformations is the best way to react to potential changes as quickly as
possible; Informatica components such as mapplets and worklets are best suited for this.
Scalability - it is important to keep scale in mind while designing; in the development of mappings, the
expected data volumes must be taken into account.
Simplicity - it is always better to create several simple mappings instead of one complex mapping. It
is all about creating a simple and logical design process.
Modularity - this includes reprocessing and using modular techniques for designing.
What is the meaning of the word ‘session’? Give an explanation of how to
combine execution with the assistance of batches?
Converting data from a source to a target is implemented by the Integration Service, and this set of
instructions is known as a session. Usually the session manager executes the session. In order to combine
session executions, batches are used in two ways: serially or in parallel.

How many numbers of sessions is grouped in one batch? 


Any number of sessions can be grouped in one batch; however, for an easier migration process, it is
better if the number of sessions in one batch is smaller.

Differentiate between mapping parameter and mapping variable?


A mapping variable refers to a value that can change during the session's execution, whereas a value that
does not change during the session is called a mapping parameter. The mapping procedure explains the
use of these mapping parameters; values are best assigned to mapping parameters before the session
begins.

What are the features of complex mapping?  


1. Difficult requirements

2. Numerous transformations

3. Complex business logic

These are the three most important features of a complex mapping.

Which option helps in finding whether the mapping is correct or not?  


The debugging option helps in judging whether the mapping is correct or not without needing to run a full
session.

What is a session task?


When the PowerCenter Server transfers data from the source to the target, it is guided by a set of
instructions, and this is known as the session task.

What is the meaning of command task?
A Command task allows one or more shell commands (in UNIX, or DOS commands in Windows) to run
during the workflow.
What is the meaning of standalone command task?
The type of Command task that allows shell commands to run anywhere during the workflow is
known as a standalone Command task.

Define workflow?
A workflow is a set of instructions that tells the server how and when to execute the tasks.

How many tools are there in workflow manager?  


There are three tools:

1. Task Developer

2. Workflow Designer

3. Worklet Designer

Define Power Centre repository of Informatica?


The Informatica PowerCenter repository consists of metadata such as:

1. Source definitions

2. Sessions and session logs

3. Workflows

4. Target definitions

5. Mappings
6. ODBC connections

The two types of repositories are:

1. Global repositories

2. Local repositories

Extraction, Transformation and Loading (ETL) of the above-mentioned metadata is mainly performed
through the PowerCenter repository.
How to use the PMCMD utility command?
1. pmcmd is a command-line client program that communicates with the Integration Service to perform
some of the tasks that can also be performed using the Workflow Manager client.

2. Using pmcmd we can perform tasks such as:

1. Starting a workflow.

2. Scheduling a workflow.

3. pmcmd can be operated in two different modes:

1. Interactive mode.

2. Command-line mode.
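
For example, in command-line mode a workflow can be started with a call of the following form (the service, domain, folder and workflow names are illustrative):

pmcmd startworkflow -sv IS_NAME -d DOMAIN_NAME -u user -p password -f FOLDER_NAME wf_WORKFLOW_NAME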


Scheduling a Workflow?
1. A schedule is an automation of running the workflow at a given date and time.
2. There are 2 types of schedulers:
 
(i) Reusable scheduler
(ii) Non Reusable scheduler
 
(i) Reusable scheduler:-

A reusable scheduler can be assigned to multiple workflows.


(ii) Non-reusable scheduler:

- A non-reusable scheduler is created specific to the workflow.
- A non-reusable scheduler can be converted into a reusable scheduler.

The following are 3rd-party schedulers:

1. Cron (Unix-based scheduling process)
2. Tivoli
3. Control-M
4. Autosys
5. Tidal
6. WLM (Workload Manager)

- In production, almost all workflows (around 99%) are run through a scheduler.
- Otherwise we run the workflow manually; running a workflow through a schedule is called auto-running.

What are the comment specifiers in the Informatica transformation language?

The transformation language provides two comment specifiers to let you insert comments in expressions:
- Two dashes ( -- )
- Two slashes ( // )

The PowerCenter Integration Service ignores all text on a line following these comment specifiers.
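
For example, in an Expression transformation (port names and the tax rate are illustrative):

V_TOTAL = PRICE * QUANTITY   -- total line amount
O_TAX   = V_TOTAL * 0.18     // apply an illustrative 18% tax rate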

Differences between variable port and Mapping variable?


 

Variable Port vs Mapping Variable

Variable Port:
1. Local to the transformation.
2. Values are non-persistent.
3. Cannot be used with a SQL override.

Mapping Variable:
1. Local to the mapping.
2. Values are persistent.
3. Can be used with a SQL override.

 Mapping variables are used for incremental extraction.

 With mapping variables there is no need to change the value manually; it changes automatically.

 With a mapping parameter you have to change the value (e.g. the date and time) manually.

Which transformation builds only a single cache memory?

Rank builds two types of cache memory (index and data), but Sorter always builds only one cache.
A cache is also called a buffer.

Standalone Email task?

1. It can be used anywhere in the workflow, with link conditions defined to notify the success or failure
of prior tasks.

2. It is visible in the flow diagram.

3. Email variables can be defined with standalone Email tasks.

What is Mapping Debugger?


 The Debugger is a tool. Using it, we can identify whether records are loaded or not, and whether correct
data is loaded, from one transformation to the next.
 If a session succeeded but records are not loaded, this is the situation in which we use the Debugger tool.

What is the functionality of F10 in informatica?
F10 --> Next Instance

What is Worklet and types of worklets?


1. A worklet is defined as a group of related tasks.

2. There are 2 types of worklet:

1. Reusable worklet

2. Non-reusable worklet

3. A worklet expands and executes the tasks inside the workflow.

4. A workflow which contains a worklet is known as the parent workflow.

(a) Reusable worklet:

Created using the Worklet Designer tool.

Can be assigned to multiple workflows.

(b) Non-reusable worklet:

Created using the Workflow Designer tool.

Created specific to a workflow.

What is a Repository Manager?


It is a GUI-based administrative client which allows you to perform the following administrative tasks:

1. Create, edit and delete folders.

2. Assign users to access the folders with read, write and execute permissions.

3. Back up and restore repository objects.

What is meant by Informatica PowerCenter Architecture?


The following components get installed:

 PowerCenter Clients

 PowerCenter Repository

 PowerCenter Domain

 PowerCenter Repository Service (PCRS)

 PowerCenter Integration Service (PCIS)

 Informatica Administrator

A mapping is nothing but the ETL application.


What is Workflow Monitor?


1. It is a GUI-based client application which allows users to monitor ETL objects running on the ETL server.

2. It collects runtime statistics such as:

a. Number of records extracted.
b. Number of records loaded.
c. Number of records rejected.
d. The session log.
e. Throughput.

 Complete run information can be accessed from the Workflow Monitor.

 For every session, one log file is created.

If Informatica have own scheduler why using third party scheduler?


Clients use various applications (for example, mainframes and Oracle Apps environments that use the Tivoli
scheduling tool); integrating those different applications and scheduling them together is very easy using a
third-party scheduler.

What is Workflow Manager?


It is a GUI-based client which allows you to create the following ETL objects:
1. Session
2. Workflow
3. Scheduler

Session:

 A session is a task that executes a mapping.

 A session is created for each mapping.

 A session is created to provide runtime properties.

 A session is a set of instructions that tells the ETL server to move the data from source to destination.
Workflow:

A workflow is a set of instructions that tells how and when to run the session tasks.

What is Informatica PowerCenter?


It is a data integration tool which combines data from multiple OLTP source systems, transforms the data
into a homogeneous format and delivers the data throughout the enterprise at any speed.

It is a GUI-based ETL product from Informatica Corporation, which was founded in 1993 in Redwood City,
California.
There are many products from Informatica Corporation, for example:
1. Informatica Analyzer
2. Lifecycle Management
3. Master Data Management

Informatica PowerCenter is one of these products.

Using Informatica PowerCenter we perform extraction, transformation and loading.


What is a Dimensional Model?


1. Data modeling is the process of designing the database to fulfil the business requirement
specifications.

2. A data modeler (or database architect) designs the warehouse database using a GUI-based data
modeling tool called ERwin.

3. ERwin is a data modeling tool from Computer Associates (CA).

4. A dimensional model consists of the following types of schemas designed for a data warehouse:

1. Star schema

2. Snowflake schema

3. Galaxy schema

5. A schema is a data model which consists of one or more tables.

How does Rank transformation handle string values?


The Rank transformation can return strings at the top or the bottom of the session sort order. When the
Integration Service runs in Unicode mode, it sorts character data in the session using the selected sort
order associated with the code page of the Integration Service, which may be French, German, etc. When
the Integration Service runs in ASCII mode, it ignores this setting and uses a binary sort order to sort
character data.

What is the format of INFORMATICA objects in a repository? What are the databases that INFORMATICA can connect to from Windows?

INFORMATICA objects can be written in (exported to) XML format.
Following is the list of databases that INFORMATICA can connect to:
 SQL Server
 Oracle
 MS Access
 MS Excel
 DB2
 Sybase
 Teradata
Which are the different editions of INFORMATICA PowerCenter that are available?
Different editions of INFORMATICA PowerCenter are:
 Standard Edition
 Advance Edition
 Premium Edition
The current version of PowerCenter available is v10 with a high-performance increase.

What is Incremental Aggregation?


Incremental Aggregation is enabled at the session level. When it is enabled, the Integration Service applies
only the incremental changes captured from the source data to the existing aggregate calculations, instead
of recalculating all aggregates from scratch; it is useful when changes in the source do not significantly
change the target data.

What is a Surrogate Key?


A surrogate key is a sequentially generated integer value which is used as a substitute or
replacement for the primary key, providing a unique identification of each row in a table.
The natural primary key can change over time as requirements change, which makes updates more
difficult; a surrogate key solves this problem.

What is Session task and Command task?


A Session task is a set of instructions that are applied while transferring data from source to target using
the session. Session commands can be either pre-session or post-session commands.
A Command task is a task that allows one or multiple UNIX shell commands (or DOS commands on
Windows) to run during the workflow.

What is Standalone command task?


Standalone command task can be used to run Shell Command anywhere and anytime in the workflow.

What is Event and what are the tasks related to it?


An event can be any action or function that occurs in the workflow.
There are two tasks related to it:
 Event Wait task: this task waits until an event occurs; once the event is triggered, the task
completes and the next task starts.
 Event Raise task: this task triggers a specific user-defined event in the workflow.

What is a pre-defined event and User-defined event?


Predefined events are system-defined events that wait for the arrival of a specific file at a specific
location. They are also called file-watcher events.
User-defined events are created by the user and can be raised anywhere in the workflow once created.

What is Target Designer and Target Load Order?


The Target Designer is used for defining the targets of the data.
When there are multiple sources, or a single source with multiple partitions, linked to different targets
through the INFORMATICA server, the server uses the Target Load Order to define the order in which the
data is loaded into the targets.

What is Staging Area?


The Staging Area is a database where temporary tables connected to the work area, or fact tables, are stored
to provide inputs for data processing.

How to update Source Definition?


There are two ways to update source definition in INFORMATICA.
They are:
 You can edit the existing source definition.
 You can import new source from the database.

How to implement Security Measures using Repository manager?


There are 3 ways to implement security measures.
They are:
 Folder permissions for owner, groups, and users.
 Locking (read, write, retrieve, save and execute).
 Repository privileges, viz.:
 Browse Repository.
 Use Workflow Manager (to create sessions and batches and set their properties).
 Workflow Operator (to execute sessions and batches).
 Use Designer, Admin Repository (allows a user to create and manage the repository).
 Admin User (allows the user to create a repository server and set its properties).
 Super User (all the privileges are granted to the user).

Enlist the advantages of INFORMATICA.


Being considered the most favored data integration tool, it has multiple advantages worth listing.
They are:
 It can effectively and efficiently communicate and transform data between different data
sources like mainframes, RDBMSs, etc.
 It is usually faster, more robust and easier to learn than other available platforms.
 With the help of the INFORMATICA Workflow Monitor, jobs can be easily monitored, failed jobs
can be recovered, and slow-running jobs can be identified.
 It has features like easy processing of database information, data validation, migration of projects
from one database to another, project development, iteration, etc.

Enlist few areas or real-time situations where INFORMATICA is required.


Data warehousing, data integration, data migration and application migration from one platform to another
are a few examples of real-time usage areas.

Explain the ETL program with few examples.


ETL stands for Extract, Transform and Load. An ETL tool basically solves the purpose of extracting data
and sending it somewhere, as defined, after altering it.
To be precise:
 Extraction is the task of collecting data from sources like databases, files, etc.
 Transformation is altering the data that has been received from the source.
 Loading is the process of feeding the altered data to the defined target.
In technical terms, an ETL tool collects data from heterogeneous sources and alters it to make it
homogeneous, so that it can be used further for analysis of the defined task.

Some basic program examples include:


 Mappings define the ETL process of reading data from its original sources; the mapping
process is done in the Designer.
 Workflows consist of multiple tasks which are decided and designed in the Workflow
Manager window.
 A task consists of a set of steps that determine the sequence of actions to be performed
at run time.

Enlist the differences between Database and Data Warehouse.


Refer to the comparison below to understand the differences between the two:

Database:
 It stores/records current and up-to-date data which is used in daily operations.
 Its orientation is Online Transactional Processing (OLTP), which involves simple and short transactions.
 It consists of detailed and primitive data, and its view is flat relational.
 Low performance is observed for analytical queries.
 Efficiency is determined by measuring transaction throughput.

Data Warehouse:
 It stores/analyzes historical data which is used for information support on a long-term basis.
 Its orientation is Online Analytical Processing (OLAP), which involves complex queries.
 It consists of summarized and consolidated data, and its view is multidimensional.
 Analytical queries perform well here.
 Efficiency is determined by measuring query throughput and response time.

During the running session, output files are created by the INFORMATICA server. Enlist a few of them.

Mentioned below are a few of the output files:
 Cache files: these files are created at the time of memory cache creation. For transformations like the
Lookup transformation, Aggregator transformation, etc., index and data cache files are created by
the INFORMATICA server.
 Session detail file: As the name defines, this file contains load statistics like table name, rows
rejected or written for each target in mapping and can be viewed in the monitor window.
 Performance detail file: This file is a part of session property sheet and contains session
performance information in order to determine improvement areas.
 INFORMATICA server log: Server creates a log for all status and error messages and can be
seen in INFORMATICA home directory.
 Session log file: For each session, the server creates a session log file depending on the set
tracing level. The information that can be seen in log files about sessions can be:
 Session initialization process,
 SQL commands creation for reader and writer threads,
 List of errors encountered and
 Load summary
 Post session email: This helps in communicating the information about the session (session
completed/session failed) to the desired recipients automatically.
 Reject file: This file contains information about the data that has not been used/written to targets.
 Control file: In case, when the session uses the external loader, control file consists of loading
instructions and data format about the target file.
 Indicator file: This file basically contains a number which highlights the rows marked for
INSERT/UPDATE/DELETE or REJECT.
 Output file: Output file is created based on the file properties.

Enlist some properties of sessions.


A session is available in the Workflow Manager and is configured by creating a session task. Within a
mapping program there can be multiple sessions, and a session can be either reusable or non-reusable.
Some of the properties of the session are as follows:
 As per the requirement, session tasks can be run either concurrently or sequentially.
 A session can be configured to analyze the performance.
 To create or run a session task, it is required to have general information about Session name,
schedule and integration service.
 Other important property of session includes Session log file, the test load, error handling,
commit interval, target properties, etc.

Enlist the tasks for which Source qualifier transformation is used.


The Source Qualifier is considered an active transformation which reads the rows fetched by the
Integration Service during the running session. It determines the way in which the data is fetched from the
source and is automatically added when a source is added to a mapping.
The list of different tasks where source qualifier is used is as follows:
 Rows filtering
 Data sorting
 Custom query creation
 Joining tables from the same source
 Selecting distinct values
