Talend Project Audit: User Guide
User Guide
6.4.1
Talend Project Audit
Notices
Talend is a trademark of Talend, Inc.
All brands, product names, company names, trademarks and service marks are the properties of their respective
owners.
1. General information
1.1. Purpose
This User Guide helps you understand the reports that result from auditing a project with Talend
Project Audit in a normal operational context.
1.2. Audience
This guide is for business users in charge of checking the quality of the processes used to create data
integration Jobs designed in Talend Studio.
1.3. Content
Chapter 1: presents the main concepts of Talend Project Audit.
• text in bold: window and dialog box buttons and fields, keyboard keys, menus, and menu options,
• The icon indicates an item that provides additional information about an important point. It is
also used to add comments related to a table or a figure,
• The icon indicates a message that gives information about the execution requirements or
recommendation type. It is also used to refer to situations or information the end-user needs to be
aware of or pay special attention to.
https://community.talend.com/
A project audit is the process of collecting and evaluating information about job operations designed in a Studio.
Talend Administration Center is required to invoke an audit. The evaluation of obtained information determines if
the technical processes and data flows you are using in your audited project are operating effectively and efficiently
to achieve the project goals or objectives.
Audit results are presented in printable reports generated to provide extensive data regarding different areas in
the audited project.
For more information about the areas audited in a project, see Key areas audited in a project.
You can later use the results provided by Talend Project Audit to get a clear picture regarding the status and
performance of different elements in the audited project. Using this significant amount of information, you can
analyze and monitor the performance of these elements in order to improve efficiency.
For detailed information about the data presented in Talend Project Audit reports, see Talend Project Audit reports.
Before auditing, ensure that you have the required rights to access a project or the project items you want to audit. For
further information about the rights required to access a project and its items, see Talend Administration Center User Guide.
Talend Project Audit provides several functions for auditing a project through investigating different elements in
Jobs designed in a Studio. Talend Project Audit reviews:
• Job analysis.
These aspects depend on each other and are correlated accordingly during project investigation in Talend Project
Audit.
For more information about the audit results of the above key areas, see Talend Project Audit reports.
The evaluation of the obtained information determines whether the projects designed in a Studio operate
effectively and efficiently to achieve the organization's objectives.
This opportunity will lead to quick adoption of best practices that help your organization carry out
subsequent data integration projects effectively.
To summarize what has been discussed earlier, possible benefits of auditing a project are:
• evaluating the performance of the techniques used in the Jobs included in the investigated project,
Periodic audits of the processes used in different projects ensure that issues with the standards and techniques
used to build Jobs are identified, investigated, and fixed, and that the process is improved. Every aspect of the
project is therefore equally important and must be reviewed thoroughly.
This chapter helps you interpret a Talend Project Audit report. It discusses the report content related
to each audited aspect in a data integration project.
If you need further information about how to carry out an audit and then generate a report accordingly, see Talend
Administration Center User Guide.
A Talend Project Audit report is divided into one introductory section listing the properties of the audited
project, followed by six other sections, each presenting quantitative and qualitative data about one investigated
key area in the audited project.
The sections below discuss in detail the audit results for every key area in the investigated project.
Item             Description
Project file     Path that specifies the location of the audited project
Label            Technical name (used by the system) of the actual project file
Description      User-defined description of the audited project
Author           Login of the user who initially created the project in Talend Studio
Product version  Version of the Studio used to build the project to be audited
Item                    Description
Job count               Total number of Jobs used in the project
Subjob count            Total number of subjobs used in the project
Component count         Total number of components used in the project
Note count              Total number of notes used in the project
Context variable count  Total number of context variables used in the project
Schema count            Total number of schemas used in the project
Schema column count     Total number of schema columns used in the project
Complexity rating       Complexity rating for the project, equal to the sum of all individual job ratings
Click the highlighted text (clickable text) in your Audit report to display the detailed referenced content without having to
scroll down the pdf pages to find it.
The chart represents quantitative and qualitative structures that show the number of Jobs used in the project along
with their complexity rating, ranging from very simple to very complex.
Job complexity rating is calculated using numeric values specific to the complexity of the elements in the
investigated Job. Examples of those elements are the number of components used in the Job, the number of context
variables used in the Job, the number of tMap components used in the Job, and so on.
The list below gives the basis for calculating job complexity:
where "n" is a coefficient that helps balance the actual "weight" of each criterion in the audit results.
The resulting figures are then mapped as follows to give the different job complexity ratings used:
• 51 - 100: simple,
The job rating chart is accompanied (in the Job rating - details section in Talend Project Audit) by job rating
details providing numeric values for each complexity element shown.
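As a rough illustration of how such a rating could be computed, the sketch below assumes that the rating is a weighted sum of the element counts, with one coefficient "n" per criterion. The criteria names, the coefficient values, and the job names are hypothetical and are not taken from Talend Project Audit. Only the idea of weighting each criterion, and of summing job ratings into a project rating, comes from this guide.

```python
# Hypothetical coefficients ("n" per criterion); the real values are not given in this guide.
WEIGHTS = {"components": 1.0, "context_variables": 0.5, "tmap_components": 3.0}


def job_rating(counts, weights=WEIGHTS):
    """Assumed formula: weighted sum of the element counts of one Job."""
    return sum(weights[name] * counts.get(name, 0) for name in weights)


def project_rating(jobs, weights=WEIGHTS):
    """Project complexity rating: the sum of all individual job ratings."""
    return sum(job_rating(counts, weights) for counts in jobs.values())


# Illustrative data only.
jobs = {
    "LoadCustomers": {"components": 12, "context_variables": 4, "tmap_components": 1},
    "ExportOrders": {"components": 30, "context_variables": 10, "tmap_components": 3},
}

for name, counts in jobs.items():
    print(name, job_rating(counts))
print("Project", project_rating(jobs))
```

Changing a coefficient shifts how much one criterion contributes to every job rating, which is the balancing role the guide attributes to "n".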
The table below lists the numeric values for the different complexity elements of the investigated Job.
Item        Description
Identifier  Name of the Job
Auth.       Email of the author of the Job
Creation    Creation date of the Job
Update      Date of the last modification made to the Job
Status      Status of the Job
Version     Version of the Job
Components  Number of components used in the Job
Context     Number of context variables used in the Job
Notes       Number of notes used in the Job
Rating      Complexity rating for the Job based on the defined criteria
Click the highlighted text (clickable text) to display the detailed referenced content without having to scroll down the pdf
pages to find it.
The chart is split into segments that illustrate the percentages of components by type. It uses those percentages,
or fractions, to compare the component types used in the project. The whole is equal to 100%.
All component types that account for less than 2.5% of the total are grouped under Other.
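The grouping rule can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function name and the sample data are assumptions, and only the 2.5% threshold and the Other group come from this guide.

```python
from collections import Counter


def component_distribution(component_types, threshold=2.5):
    """Return {component type: percentage}, merging small shares into "Other"."""
    counts = Counter(component_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    shares = {}
    other = 0.0
    for ctype, count in counts.items():
        pct = 100.0 * count / total
        if pct < threshold:
            other += pct  # below the threshold: folded into "Other"
        else:
            shares[ctype] = pct
    if other > 0:
        shares["Other"] = other
    return shares


# Illustrative project with 50 components.
components = (
    ["tMap"] * 20 + ["tFileInputDelimited"] * 15 + ["tLogRow"] * 14 + ["tWarn"] * 1
)
for ctype, pct in component_distribution(components).items():
    print(f"{ctype}: {pct:.1f}%")
```

In this sample, tWarn accounts for 2% of the components, so it does not get its own segment and is reported under Other.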
Item            Description
Component type  Name of components used in the audited project
Count           Number of occurrences of the same component per project
Percentage      The fraction of 100 representing the frequency of each of the components used in the
                audited project
Click the highlighted text (clickable text) to display the detailed referenced content without having to scroll down the pdf
pages to find it.
Item   Description
Job    Name of the Job that uses a specific component
Count  Number of occurrences of the same component per Job
Click the highlighted text (clickable text) to display the detailed referenced content without having to scroll down the pdf
pages to find it.
The chart represents quantitative and qualitative structures to show in one instance the number of columns used
in each schema along with complexity rating, ranging from very low to very high.
Schema column complexity rating is calculated using numeric values for the number of columns used in each
schema in the audited project.
• 11 - 30: low,
• 31 - 60: moderate,
• 61 - 100: high,
The schema column rating chart is accompanied (in the Schema columns-details section in Talend Project Audit
report) by schema columns details providing numeric values for each element shown in the chart.
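The mapping from a column count to a rating can be sketched as a simple range lookup. The sketch below covers only the ranges listed in this guide (11 to 100). The bounds for the very low and very high ratings are not given here, so the hypothetical function returns None outside the listed ranges rather than guessing them.

```python
# Ranges as listed in this guide; bounds for "very low" and "very high" are not stated.
RATING_RANGES = [
    (11, 30, "low"),
    (31, 60, "moderate"),
    (61, 100, "high"),
]


def schema_column_rating(column_count):
    """Return the rating for a schema's column count, or None if the range is not listed."""
    for lower, upper, label in RATING_RANGES:
        if lower <= column_count <= upper:
            return label
    return None


for count in (12, 45, 100, 250):
    print(count, schema_column_rating(count))
```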
The table below lists the numeric values for column count per schema and schema count per project.
Item                 Description
Schema column count  Number of columns per schema
Schema count         Number of schemas per project, grouped by the number of columns used in the schema
Percentage           The fraction of 100 representing the frequency of each group of schemas (grouped by
                     column count) used in the audited project
Consequently, this section in Talend Project Audit report will provide 10 tables presented in ascending order. Each
table groups all Jobs that hold the same figure among the 10 highest schema column figures.
Each of the 10 tables is preceded by the schema column figure used as the grouping factor for all Jobs listed in the
table. The number of schemas used in each of the listed Jobs is given in a separate column.
Item          Description
Job           Name of the Job in the audited project
Schema count  Number of schemas per Job
This report is parameterized. It is possible to define the number of figures used as grouping factors.
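The grouping described above can be sketched as follows. This is an interpretation under stated assumptions: the input is taken to be one (Job name, column count) pair per schema, and the schema count of a Job in a table is taken to be the number of its schemas that have that column figure. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def group_jobs_by_extreme_figures(schemas, n=10, highest=True):
    """Group Jobs by schema column figure, keeping the n highest (or lowest) figures.

    schemas: iterable of (job_name, column_count) pairs, one pair per schema.
    Returns a list of (figure, {job_name: schema_count}) in ascending figure order.
    """
    by_figure = defaultdict(lambda: defaultdict(int))
    for job, column_count in schemas:
        by_figure[column_count][job] += 1
    kept = sorted(by_figure, reverse=highest)[:n]
    return [(figure, dict(by_figure[figure])) for figure in sorted(kept)]


# Illustrative data only.
schemas = [("JobA", 120), ("JobA", 120), ("JobB", 120), ("JobB", 45), ("JobC", 8)]
for figure, jobs in group_jobs_by_extreme_figures(schemas, n=2):
    print(figure, jobs)
```

The parameter `n` plays the role of the configurable number of grouping figures, and `highest=False` gives the corresponding view for the lowest figures.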
This type of investigation helps diagnose system performance problems, for example by identifying
the columns that are present in the job design but not really used during job execution.
An example of this is when you use the Lookup flow in your job design. Usually, a lookup schema has numerous
columns and you use only a limited number of them in your Job.
Consequently, this section in Talend Project Audit report will provide 5 tables presented in ascending order. Each
table groups all Jobs that hold the same figure among the 5 lowest schema column figures.
This type of investigation helps identify Jobs where no schema is defined, for example.
Each of the 5 tables is preceded by the schema column figure used as a grouping factor for all Jobs listed in the
table. The number of schemas used in each of the listed Jobs is given in a separate column.
Item          Description
Job           Name of the Job used in the audited project
Schema count  Number of schemas per Job
This report is parameterized. It is possible to define the number of figures used as grouping factors.
Item          Description
Trigger type  Triggers can be any of the following types:
              - On Subjob Ok
              - Run if
              - On Component Ok
Item          Description
Trigger type  Triggers can be any of the following types:
              - On Subjob Ok
              - Run if
              - On Component Ok
For example, if On Subjob Ok is used in 25 Jobs, the global job count in the project is 154,
and the number of Jobs using at least one trigger type is 48, the percentage is given by
25*100 / 154, which works out to about 16.23.
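The arithmetic in this example can be checked with a few lines of code. The function name is hypothetical; the figures are the ones quoted above.

```python
def trigger_usage_percentage(jobs_using_trigger, job_count):
    """Percentage of Jobs that use a given trigger type, relative to job_count."""
    return jobs_using_trigger * 100 / job_count


# On Subjob Ok is used in 25 Jobs out of a global job count of 154.
pct = trigger_usage_percentage(25, 154)
print(f"{pct:.2f}")  # 16.23
```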
The audit results show whether you take advantage of the documentation and versioning functions in the Studio
to correctly classify different types and versions of documents and Jobs in the investigated project.
Item                Description
Document item type  Type of the document item
Item count          Number of items of the same type per project
Percentage          The fraction of 100 representing the frequency of each item type in the audited project
Item                   Description
Document folder level  Numeric value that represents folder level:
                       ...
Folder count           Number of folders (of the same level) per project
Percentage             The fraction of 100 representing the frequency of each folder level in the audited project
Item                  Description
Document item status  Status of the item used: checked, unchecked, validated
Item count            Number of items of the same status per project
Percentage            The fraction of 100 representing the frequency of each item status in the audited project
If you create many versions of the same item, only the last item version is taken into account.
Item                      Description
Document item versioning  Last version of the added item
Item count                Number of items of the same version per project
Percentage                The fraction of 100 representing the frequency of each item version in the audited project
Item          Description
Folder level  Numeric value that represents job folder level:
              ...
Job count     Number of Jobs of the same level per project
Percentage    The fraction of 100 representing the frequency of each job level in the audited project
If you create many versions of the same Job, only the last job version is taken into account.
Item         Description
Job version  Last version of each of the Jobs in the audited project
Job count    Number of Jobs of the same version in the audited project
Percentage   The fraction of 100 representing the frequency of each job version in the audited project
2.5. Metadata
Talend Project Audit calculates the percentage of the usage of repository, property, and schema metadata in the
investigated project.
The audit results will show if you use the Metadata repository integrated in Talend Studio to store your predefined
metadata and thus be able to reuse it in different jobs.
Item           Description
Metadata type  Metadata can be of one of the following types:
               - DB connections
               - File delimited
               - SAP connections
               - File Excel
               - File regex
               - File positional
               - File LDIF
               - File xml
               - LDAP schema
               - WSDL schema
               - Generic schema
               - Salesforce schema
Count          Number of each of the above metadata types used in the audited project
Percentage     The fraction of 100 representing the frequency of each metadata type in the audited project
Item           Description
Metadata type  Property metadata can be of one of the following two types:
               - Built in
               - Repository
Count          Number of each of the above property metadata types used in the audited project
Percentage     The fraction of 100 representing the frequency of each property metadata type in the
               audited project
Item           Description
Metadata type  Schema metadata can be of one of the following two types:
               - Built in
               - Repository
Count          Number of each of the above schema metadata types used in the audited project
Percentage     The fraction of 100 representing the frequency of each schema metadata type in the
               audited project
2.6. Layout
The placement of the different types of components in a job design is very important because it makes the
process in the Job easier to understand.
In a job design, all types of components are placed in relation to the main flow component. Input components
should be placed to the left, output components should be placed to the right, and a subflow component with a
Lookup link should be placed above.
To assess the quality of the placement of different components in the Jobs that are part of the audited project,
Talend Project Audit analyzes misplaced input and output components and components with a Lookup link in subjobs.
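The placement rules can be sketched as a simple position check. The data model below is hypothetical: it assumes each component has x and y coordinates on the design canvas, with y growing downward so that "above" means a smaller y. How Talend Project Audit actually reads positions is not described in this guide.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    role: str  # "input", "output", or "lookup" (hypothetical labels)
    x: int
    y: int     # assumed to grow downward, so "above" means a smaller y


def misplaced_counts(main, others):
    """Count components of one subjob that break the placement rules relative to main."""
    counts = {"input": 0, "lookup": 0, "output": 0}
    for comp in others:
        if comp.role == "input" and not comp.x < main.x:
            counts["input"] += 1   # input not placed to the left
        elif comp.role == "output" and not comp.x > main.x:
            counts["output"] += 1  # output not placed to the right
        elif comp.role == "lookup" and not comp.y < main.y:
            counts["lookup"] += 1  # lookup not placed above
    return counts


# Illustrative subjob: one lookup and one output are misplaced.
main = Component("tMap_1", "main", 400, 200)
others = [
    Component("tFileInputDelimited_1", "input", 100, 200),
    Component("tFileInputDelimited_2", "lookup", 400, 350),
    Component("tLogRow_1", "output", 700, 200),
    Component("tLogRow_2", "output", 300, 200),
]
print(misplaced_counts(main, others))
```

The three counters correspond to the Input component, Lookup component, and Output component figures reported per subjob.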
The below figure illustrates an example of the placement of different components in a specific Job.
Item              Description
Job               Name of Job
Subjob            Name of subjob used in the specified Job
Input component   Number of input components in the specified subjob that are not placed to the left
Lookup component  Number of subflow components with a lookup link that are not placed above the main flow
                  component in the specified subjob
Output component  Number of output components in the specified subjob that are not placed to the right
Click the highlighted text (clickable text) in your Audit report to display the detailed referenced content without having to
scroll down the pdf pages to find it.
A full description of the findings is then presented in separate tables, one table per Job. These tables are listed
in descending order, from most important to least important.
Item       Description
Indicator  Investigated synthetic elements can be one of the following types:
           - Job rating
           - Component count
           - Trigger usage
Value      Number of each of the above investigated elements in the investigated Job of the audited
           project
Click the highlighted text (clickable text) in your Audit report to display the detailed referenced content without having to
scroll down the pdf pages to find it.