DataStage - Parameters - Schema Files
Job Properties
• View/Modify from Designer/Director
• General Properties
• Job Descriptions
• Enable RCP – To be discussed shortly
• Multiple Instances – Allows several instances of the job to run in parallel. The job design must ensure that the instances do not conflict, e.g. by writing into the same file.
• Before/After Job Subroutine
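One common way to keep parallel instances from writing into the same file is to build the file name from an instance identifier. The sketch below only illustrates that idea in Python; the function name, the directory, and the identifiers "EU" and "US" are hypothetical and not taken from the job shown in these slides.

```python
from pathlib import Path

def instance_output_path(directory, base_name, invocation_id, suffix=".txt"):
    """Build an output path that is unique per job instance."""
    if not invocation_id:
        # Without an identifier, two instances would share one file name.
        raise ValueError("an invocation id is required for a multi-instance job")
    return Path(directory) / f"{base_name}_{invocation_id}{suffix}"

# Two hypothetical instances of the same job get different target files.
print(instance_output_path("/data/out", "sales", "EU"))
print(instance_output_path("/data/out", "sales", "US"))
```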
August 7, 2021 2
Recap Of Job Parameters
• Referenced as #XXX#, where XXX is the parameter name
• Used within a stage property; the parameter value is substituted into the string at run time
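The string substitution can be pictured with a small Python sketch. This is not how DataStage implements it; it only mimics the visible behaviour, and the parameter names SrcPath and RunDate are hypothetical.

```python
import re

def substitute_parameters(text, params):
    """Replace every #NAME# token in text with the value of parameter NAME."""
    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"job parameter not defined: {name}")
        return str(params[name])
    return re.sub(r"#(\w+)#", lookup, text)

# Hypothetical parameter names and values, for illustration only.
params = {"SrcPath": "/data/in", "RunDate": "2021-08-07"}
print(substitute_parameters("#SrcPath#/sales_#RunDate#.txt", params))
# prints /data/in/sales_2021-08-07.txt
```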
Recap Of Job Parameters
• Run Date
• Business Date
• Path
• Filename suffix/prefix (input/output)
• User Name/password
• Database connection parameters, DSNs, etc.
• Base currency, etc.
** - To be discussed later
Recap Of Job Parameters
• Job parameters can also set or override environment variable values; the override is valid only within the job
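The "valid only within the job" scope can be sketched in Python as a copy of the environment with the overrides applied on top, leaving the original untouched. The function name and the configuration file paths are hypothetical.

```python
import os

def job_environment(overrides, base=None):
    """Return a copy of the environment with job-level overrides applied."""
    base = os.environ if base is None else base
    env = dict(base)       # copy, so the caller's environment is not modified
    env.update(overrides)  # job-level values win over project-level values
    return env

# Hypothetical project-level setting and job-level override.
project_env = {"APT_CONFIG_FILE": "/opt/config/default.apt"}
job_env = job_environment({"APT_CONFIG_FILE": "/opt/config/4node.apt"}, project_env)
print(job_env["APT_CONFIG_FILE"])      # value seen by this job
print(project_env["APT_CONFIG_FILE"])  # project-level value is unchanged
```

A dictionary built this way could be handed to a child process, for example through the `env` argument of `subprocess.run`.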
OSH – Orchestrate Shell Script
• Some DS Features
• Schema Files
• Schema Files & RCP
Schema Files
• Alternative way to specify column definitions for data used in EE jobs
• Written in a plain text file
• Can be imported into the DataStage Repository
• Creating a Schema
• Using a text editor
• Follow correct syntax for definitions
• Import from an existing data set or file set
• Manager import > Table Definitions > Orchestrate Schema Definitions
• Select checkbox for a file with .fs or .ds
• Import from a database table
• Create from a Table Definition
• Click Parallel on Layout tab
Schema Files
• A schema file can describe data accessed through stages that have the “Schema File” property, e.g. Sequential File
• Sample Use
• if source file format may change without functional impact to the DS code
• say columns inserted, reordered, deleted, etc.
• The job accesses the file only through the definition in the schema file
• Schema file may be changed without affecting the job(s)
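Why this isolates the job from format changes can be shown with a small Python sketch: the processing logic refers to columns by name only, and the layout (column order and delimiter) is supplied separately, in the role the schema file plays. The function names and sample rows are hypothetical.

```python
def read_records(lines, column_names, delimiter="|"):
    """Yield one dict per line, keyed by the column names of the layout."""
    for line in lines:
        values = line.rstrip("\n").split(delimiter)
        if len(values) != len(column_names):
            raise ValueError(f"expected {len(column_names)} fields: {line!r}")
        yield dict(zip(column_names, values))

def total_by_region(records):
    """Processing logic: uses column names only, never positions."""
    totals = {}
    for rec in records:
        region = rec["REGION_ID"]
        totals[region] = totals.get(region, 0) + int(rec["SALES_TOTAL"])
    return totals

# Original layout: pipe delimited, two columns.
old = read_records(["1|100", "2|50", "1|25"], ["REGION_ID", "SALES_TOTAL"])
# Changed layout: comma delimited, reordered, one extra column.
new = read_records(["100,Pune,1", "50,Delhi,2", "25,Pune,1"],
                   ["SALES_TOTAL", "SALES_CITY", "REGION_ID"], delimiter=",")

print(total_by_region(old))
print(total_by_region(new))
```

Only the layout description passed to `read_records` differs between the two calls; `total_by_region` is untouched.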
RCP - Runtime Column Propagation
RCP & Schema Files Demonstrated
• Refinement Case 1
• The input file may in the future include extra columns that are not relevant to the requirement; these must be dropped/ignored by the job
• The record format may change, e.g. become comma delimited, or the order in which the fields appear may change
• The job must be capable of accepting this input file without impact
• To Do
• Define a schema file to define the input file & point to it within the sequential file stage
record
{final_delim=end, record_delim='\n', delim='|', quote=double, charset="ISO8859-1"}
(
  REGION_ID:int32 {quote=none};
  SALES_TOTAL:int32 {quote=none};
)
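To make the parts of the schema concrete, the Python sketch below pulls the field delimiter and the column list out of the schema text above and uses them to read one record. It handles only the small subset of the syntax that appears in this example, not the full schema grammar; the function names and the sample line are hypothetical.

```python
import re

SCHEMA_TEXT = r"""record
{final_delim=end, record_delim='\n', delim='|', quote=double, charset="ISO8859-1"}
(
  REGION_ID:int32 {quote=none};
  SALES_TOTAL:int32 {quote=none};
)"""

def parse_schema(text):
    """Return (field delimiter, [(column name, type), ...]) for a simple schema."""
    # \b keeps this from matching final_delim or record_delim.
    delim_match = re.search(r"\bdelim='(.)'", text)
    body_match = re.search(r"\((.*)\)", text, re.S)
    if not delim_match or not body_match:
        raise ValueError("unsupported schema text")
    fields = re.findall(r"(\w+)\s*:\s*(\w+)", body_match.group(1))
    return delim_match.group(1), fields

def parse_line(line, delimiter, fields):
    """Split one record and convert integer columns."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields: {line!r}")
    return {name: int(value) if typ.startswith("int") else value
            for (name, typ), value in zip(fields, values)}

delimiter, fields = parse_schema(SCHEMA_TEXT)
print(delimiter, fields)
print(parse_line("7|1200", delimiter, fields))
```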
RCP & Schema Files Demonstrated
• To Do
• The column definitions in the stage define all columns that must be carried through to the next stage
• Column names in the column definitions must match those defined in the schema file
• Ensure RCP is disabled for the output links
record
{final_delim=end, record_delim='\n', delim='|', quote=double, charset="ISO8859-1"}
(
  REGION_ID:int32 {quote=none};
  SALES_CITY:ustring[max=255];
  SALES_ZONE:ustring[max=255];
  SALES_TOTAL:int32 {quote=none};
)
• When the input format changes
• ONLY the schema file must be modified!
• The Data Set will always contain the columns for which a definition is included within the stages, as well as the computed field
RCP & Schema Files Demonstrated
• Refinement Case 2
• The input file may in the future include extra columns, BUT THESE MUST BE CARRIED ON into the target Data Set as they are
• The job must be capable of accepting this input file without impact
• To Do
• Define & use schema file
• Ensure RCP is enabled at the project level as well as for all output links along which data is to be propagated at run time
• Define all columns that require processing
• Other columns may or may not be defined
• In this case, Region_ID need not be defined in the stage
• But if a column is defined and found missing from the schema and/or data file at run time, the job will abort!
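The difference between the two refinement cases comes down to which columns reach the output link. The Python sketch below models only that rule: with RCP disabled, just the defined columns pass; with RCP enabled, undefined input columns are carried along; and a defined column missing from the input stops the run. The class and function names are hypothetical, and the order of the propagated columns is an assumption of this sketch.

```python
class JobAbort(Exception):
    """Stands in for the job aborting at run time."""

def output_columns(input_columns, defined_columns, rcp_enabled):
    """Return the columns that reach the output link."""
    missing = [c for c in defined_columns if c not in input_columns]
    if missing:
        raise JobAbort(f"defined columns missing from input: {missing}")
    if not rcp_enabled:
        # Case 1: only the columns defined in the stage are carried on.
        return list(defined_columns)
    # Case 2: undefined input columns are propagated at run time.
    extras = [c for c in input_columns if c not in defined_columns]
    return list(defined_columns) + extras

schema_cols = ["REGION_ID", "SALES_CITY", "SALES_ZONE", "SALES_TOTAL"]
print(output_columns(schema_cols, ["REGION_ID", "SALES_TOTAL"], rcp_enabled=False))
print(output_columns(schema_cols, ["SALES_TOTAL"], rcp_enabled=True))
```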
• Shared Containers
• Shared Containers & RCP
Shared Container
• Refined Solution
• Create a Shared container – “Validate Geography”
• Select the stages that are to be shared
• Select menu item Edit > Construct Container > Shared
• The selected stages are replaced by a single icon representing the shared container
Shared Container
• To make it truly reusable
• Within the Shared Container Definition
• Rename the links & column names to generic names
• Ensure that the stage defines only the fields used within the processing, in this case, Zone & Region
• Ensure RCP is enabled on the output links. This ensures that all fields in the input are passed on to the output
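The same idea, expressed as a Python sketch: a reusable routine that names only the two fields it processes and passes every other field through unchanged, so it works for any input that carries those two fields. The function name, the field names ZONE, REGION and GEO_VALID, and the sample records are hypothetical.

```python
def validate_geography(record, valid_pairs):
    """Check ZONE and REGION; pass all other fields through untouched."""
    out = dict(record)  # every input field is carried to the output
    out["GEO_VALID"] = (record["ZONE"], record["REGION"]) in valid_pairs
    return out

valid_pairs = {("WEST", "MH"), ("NORTH", "DL")}

# Reused for two differently shaped inputs (hypothetical records).
sales = {"ZONE": "WEST", "REGION": "MH", "SALES_TOTAL": 1200}
employee = {"EMP_ID": 42, "EMP_NAME": "A. Rao", "ZONE": "NORTH", "REGION": "MH"}

print(validate_geography(sales, valid_pairs))
print(validate_geography(employee, valid_pairs))
```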
Shared Container
The container can be reused, as shown, to validate the geography information of the employee-master file
Shared Container
• If in the future