ETL Interview Questions
© Copyright by Interviewbit
Almost every business today relies heavily on data, and for good reason: with
objective and accurate data, we can grasp far more than our brains alone can
comprehend. But data processing, like any system, is prone to errors. What is the
value of data when there is a possibility that some of it could be lost, incomplete,
or irrelevant?
This is where ETL testing comes into play. In business processes today, ETL is
considered an important component of data warehousing architecture. Data is
extracted from source systems, transformed into a consistent data type, and loaded
into a single repository through ETL (Extract, Transform, and Load). Validating,
evaluating, and qualifying data are important parts of ETL testing. We conduct ETL
testing after extracting, transforming, and loading the data to verify that the final
data was loaded into the system correctly and in the proper format. It ensures that
data reaches its destination safely and is of high quality before it enters your BI
(Business Intelligence) reports.
As technology has evolved over time, so have the solutions. Nowadays, various approaches can be used
for ETL testing depending on the source data and the environment. There are several
ETL vendors that focus on ETL exclusively, such as Informatica. Software vendors like
IBM, Oracle, and Microsoft provide other tools as well. Open-source ETL tools that are
free to use have also recently emerged. The following are some ETL software tools
to consider:
Enterprise Software ETL
Informatica PowerCenter
IBM InfoSphere DataStage
Oracle Data Integrator (ODI)
Microsoft SQL Server Integration Services (SSIS)
SAP Data Services
SAS Data Manager, etc.
Open Source ETL
Talend Open Studio
Pentaho Data Integration (PDI)
Hadoop, etc.
Typically, ETL tool-based data warehouses use staging areas, data integration layers,
and access layers to accomplish their work. In general, the architecture has three
layers as shown below:
Staging Layer: The staging layer, or source layer, stores the data extracted
from multiple data sources.
Data Integration Layer: The integration layer plays the role of transforming
data from the staging layer to the database layer.
Access Layer: Also called a dimension layer, it allows users to retrieve data for
analytical reporting and information retrieval.
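As a rough illustration of these three layers, the following Python sketch stages raw source rows unchanged, transforms them into a typed integration table, and exposes an access-layer view for reporting. All table and column names here are hypothetical, and SQLite merely stands in for a real warehouse:

    import sqlite3

    con = sqlite3.connect(":memory:")

    # Staging layer: land the extracted source data as-is.
    con.execute("CREATE TABLE stg_orders (order_id TEXT, amount TEXT)")
    con.execute("INSERT INTO stg_orders VALUES ('A1', ' 19.99 ')")

    # Data integration layer: transform staged data into consistent types.
    con.execute("CREATE TABLE int_orders (order_id TEXT, amount REAL)")
    con.execute(
        "INSERT INTO int_orders "
        "SELECT order_id, CAST(TRIM(amount) AS REAL) FROM stg_orders"
    )

    # Access layer: a view that reporting users query directly.
    con.execute("CREATE VIEW dim_orders AS SELECT order_id, amount FROM int_orders")
    print(con.execute("SELECT * FROM dim_orders").fetchall())  # [('A1', 19.99)]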
In contrast to a data warehouse, each data mart has a unique set of end users, and
building a data mart takes less time and costs less, making it more suitable for small
businesses. A data mart contains no duplicate (or unused) data, and its data is
updated on a regular basis.
Data Warehousing:
Data warehouses simplify every type of business data.
A data warehouse is capable of converting local repositories into global ones.
Data Mining:
Comparatively, data mining techniques are inexpensive.
There are no specifications for turning a local repository into a global repository.
Several Analysis Services databases rely on relational schemas, and the data source
view (DSV) is responsible for defining such a schema (the logical model of the schema).
Additionally, it can be easily used to create cubes and dimensions, thus enabling
users to set their dimensions in an intuitive way. A multidimensional model is
incomplete without a DSV. In this way, you are given complete control over the data
structures in your project and are able to work independently from the underlying
data sources (e.g., changing column names or concatenating columns without
directly changing the original data source). Every model must have a DSV, no matter
when or how it's created.
Using the Data Source View Wizard to create a DSV
You must run the Data Source View Wizard from Solution Explorer within SQL Server
Data Tools to create the DSV.
In Solution Explorer, right-click the Data Source Views folder and click New Data
Source View.
Choose one of the available data source objects, or add a new one.
Click Advanced on the same page to specifically select schemas, apply a filter, or
exclude information about table relationships.
Filter the available objects (using a string as a selection criterion makes it
possible to prune the list of available objects).
If no table relationships are defined for the relational data source, a Name
Matching page appears, where you can choose the appropriate method for
matching names.
ETL Testing: This technique is applied to OLAP systems.
Database Testing: This technique is applied to OLTP systems.
ETL pipelines reduce errors, bottlenecks, and latency, ensuring a smooth flow of
information between systems.
ETL pipelines help businesses gain a competitive advantage.
The ETL pipeline can centralize and standardize data, allowing analysts and
decision-makers to easily access and use it.
It facilitates data migrations from legacy systems to new repositories.
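To make the idea concrete, here is a minimal sketch of such a pipeline in Python. Everything in it is hypothetical (the sales.csv source, the column names, the analytics.db target), and error handling is omitted for brevity:

    import csv
    import sqlite3

    def extract(path):
        # Extract: stream raw records from a CSV source system.
        with open(path, newline="") as f:
            yield from csv.DictReader(f)

    def transform(rows):
        # Transform: cast types and normalize values to a consistent shape.
        for row in rows:
            yield (row["order_id"], row["region"].strip().upper(), float(row["amount"]))

    def load(rows, db_path):
        # Load: write the cleaned rows into the single target repository.
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS sales (order_id TEXT, region TEXT, amount REAL)")
        con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        con.commit()
        con.close()

    load(transform(extract("sales.csv")), "analytics.db")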
ETL Testing:
The test is an automated process, which means that no special technical knowledge
is needed aside from understanding the software.
It is extremely fast and systematic, and it delivers excellent results.
Databases and their counts are central to ETL testing.
Metadata is included and can easily be altered.
It is very good at handling historical data.
Manual Testing:
It requires technical expertise in SQL and shell scripting since it is a manual process.
In addition to being time-consuming, it is highly prone to errors.
Manual testing focuses on the program's functionality.
It lacks metadata, and changes require more effort.
As data increases, processing time increases.
User Interface Bug: GUI bugs include issues with color selection, font style,
navigation, spelling check, etc.
Input/Output Bug: This type of bug causes the application to accept invalid values
and reject valid ones.
Boundary Value Analysis Bug: These bugs appear at the minimum and maximum
boundaries of valid input ranges; see the sketch after this list.
Calculation bugs: These bugs are usually mathematical errors causing incorrect
results.
Load Condition Bugs: This type of bug prevents the system from handling multiple
concurrent users or accepting user data under load.
Race Condition Bugs: This type of bug interferes with your system’s ability to
function properly and causes it to crash or hang.
ECP (Equivalence Class Partitioning) Bug: A bug of this type results in invalid input
types being accepted.
Version Control Bugs: These bugs normally surface during regression testing, when
no version details are provided.
Hardware Bugs: This type of bug prevents the device from responding to an
application as expected.
Help Source Bugs: The help documentation will be incorrect due to this bug.
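As a quick illustration of how boundary value analysis catches the bugs mentioned above, the sketch below probes each edge of a hypothetical valid range of 1 to 100:

    def accepts(quantity):
        # Hypothetical validation rule: quantities from 1 to 100 are valid.
        return 1 <= quantity <= 100

    # Boundary value analysis tests each edge of the valid range, where
    # off-by-one mistakes in the validation logic typically hide.
    for value, expected in [(0, False), (1, True), (100, True), (101, False)]:
        assert accepts(value) == expected, f"boundary bug at {value}"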
Additive: Facts that are fully additive are the most flexible and useful. We can
sum up additive facts across any dimension associated with the fact table.
Semi-additive: We can sum up semi-additive facts across some dimensions
associated with the fact table, but not all.
Non-Additive: Non-additive facts cannot be summed across any dimension of the
fact table. A ratio is an example of a non-additive fact; the sketch below shows why.
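To make the distinction concrete, here is a small sketch over hypothetical fact rows: an additive fact such as a sales amount sums correctly across the region dimension, while a non-additive ratio has to be recomputed from additive components:

    # Hypothetical fact rows: (region, product, sales_amount, margin_ratio).
    facts = [
        ("EAST", "A", 100.0, 0.20),
        ("EAST", "B", 300.0, 0.10),
        ("WEST", "A", 200.0, 0.30),
    ]

    # Additive: sales_amount can simply be summed across any dimension.
    east_sales = sum(amount for region, _, amount, _ in facts if region == "EAST")
    print(east_sales)  # 400.0

    # Non-additive: summing the ratios (0.20 + 0.10) is meaningless; the
    # aggregate margin must be recomputed from additive components.
    east_profit = sum(amount * ratio for region, _, amount, ratio in facts if region == "EAST")
    print(east_profit / east_sales)  # 0.125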
Advantages
Structured data reduces the risk of data integrity problems.
The data is highly normalized, so it requires little disk space.
Snowflaked dimension tables are easy to update and maintain.
Disadvantages
Snowflake reduces the space consumed by dimension tables, but the space
saved is usually insignificant compared with the entire data warehouse.
Due to the number of tables added, queries may require complex joins, which
reduces query performance (see the sketch below).
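As a rough sketch of that drawback (all table and column names hypothetical), a query against a snowflaked product dimension must hop through each normalized level, adding one join per level:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    -- Snowflaked dimension: product -> category -> department.
    CREATE TABLE dim_department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE dim_category (cat_id INTEGER PRIMARY KEY, cat_name TEXT, dept_id INTEGER);
    CREATE TABLE dim_product (prod_id INTEGER PRIMARY KEY, prod_name TEXT, cat_id INTEGER);
    CREATE TABLE fact_sales (prod_id INTEGER, amount REAL);
    INSERT INTO dim_department VALUES (1, 'Grocery');
    INSERT INTO dim_category VALUES (10, 'Snacks', 1);
    INSERT INTO dim_product VALUES (100, 'Chips', 10);
    INSERT INTO fact_sales VALUES (100, 2.50);
    """)

    # Each level of normalization adds one more join to the query.
    rows = con.execute("""
        SELECT d.dept_name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON f.prod_id = p.prod_id
        JOIN dim_category c ON p.cat_id = c.cat_id
        JOIN dim_department d ON c.dept_id = d.dept_id
        GROUP BY d.dept_name
    """).fetchall()
    print(rows)  # [('Grocery', 2.5)]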
An important part of ETL is dimension identification, and this is largely done by the
Bus Schema. A bus schema comprises a suite of conformed dimensions and
standardized definitions, and it can be used for handling dimension identification across
all businesses. To put it another way, the bus schema identifies the common
dimensions and facts across all the data marts of an organization just like identifying
conforming dimensions (dimensions with the same information/meaning when
being referred to different fact tables). Using the Bus schema, information is given in
a standard format with precise dimensions in ETL.
SCDs (Slowly Changing Dimensions) keep and manage both current and historical
data in a data warehouse. Rather than changing on a regular, time-based schedule,
these dimensions change slowly and unpredictably over time. SCD handling is considered one of the
most critical aspects of ETL.
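For illustration, here is a minimal sketch of the two most common approaches, applied to a hypothetical customer record: Type 1 overwrites history, while Type 2 preserves it by versioning rows:

    # Hypothetical customer dimension row, with a current-row flag for Type 2.
    customer = {"id": 7, "city": "Pune", "current": True}

    # SCD Type 1: overwrite the changed attribute; history is lost.
    customer_type1 = dict(customer, city="Mumbai")

    # SCD Type 2: expire the old row and insert a new current version,
    # so both the old and the new city survive in the dimension.
    history = [dict(customer, current=False),
               {"id": 7, "city": "Mumbai", "current": True}]

    print(customer_type1)  # {'id': 7, 'city': 'Mumbai', 'current': True}
    print(history)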
Data migration projects commonly use ETL tools. As an example, if the organization
managed the data in Oracle 10g earlier and now they want to move to SQL Server
cloud database, the data will need to be migrated from source to target. ETL tools
are very helpful for carrying out this type of migration; without them, the user would
have to spend a lot of time hand-writing migration code. ETL tools make such coding
simpler than doing it in PL/SQL or T-SQL. Hence, ETL is a very useful process for
data migration projects.
37. What are the conditions under which you use dynamic cache
and static cache in connected and unconnected
transformations?
In order to update the master table and slowly changing dimensions (SCD) type
1, it is necessary to use the dynamic cache.
In the case of flat files, a static cache is used.
Conclusion
With abundant job opportunities and lucrative salary options, ETL testing has
become a popular trend. ETL Testing has an extensive market share and is one of the
cornerstones of data warehousing and business analytics. To make this process more
organized and simpler, many software vendors have introduced ETL testing tools.
Most employers who seek ETL testers look for candidates with specific technical skills
and experience that meet their needs. No worries: this platform is a great resource
for both beginners and professionals. In this article, we have covered 35+ ETL testing
interview questions, ranging from fresher to experienced level, of the kind typically
asked during interviews. Preparation is key before you go for your job interview.
Recommended Resources:
SQL
Python
Java
Informatica