Nothing Special   »   [go: up one dir, main page]

DM 1205datastageopsdb1 PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 22

Get started with the IBM InfoSphere DataStage and

QualityStage Operations Console Database, Part 1: An


introduction
A deep dive into the key relationships of the schema that defines the
Operations Database
Len Greenwood (len.greenwood@uk.ibm.com)
DataStage Core Architect
IBM

17 May 2012

Arron Harden (arron.harden@uk.ibm.com)


Senior Software Engineer
IBM
Geoff McClean (geoff.mcclean@uk.ibm.com)
Senior Software Developer
IBM
Sumit Kumar (sumit.kumar6@in.ibm.com)
Senior Software Developer
IBM
This article is a deep dive into the schema of the IBM InfoSphere DataStage and
QualityStage Operations Database, and the tables and columns that make up its key
relationships. Specimen SQL queries are included to demonstrate how data can be read
from these tables to answer specific operational questions. You can adapt these to build, for
example, custom reports based on the operational data collected at your particular DataStage
and QualityStage installation.
View more content in this series

Introduction
The DataStage and QualityStage Operations Database (DSODB) is designed as a relational
schema whose tables contain information about job runs and system resources used on a system
Copyright IBM Corporation 2012
Get started with the IBM InfoSphere DataStage and QualityStage
Operations Console Database, Part 1: An introduction

Trademarks
Page 1 of 22

developerWorks

ibm.com/developerWorks/

that has a DataStage engine installed. This information is used to drive the Operations Console,
but can also be queried directly given some knowledge of the key columns of the tables and the
relationships among them.
This article describes the main tables and views of the DSODB, including all columns used
as primary or natural keys, and all foreign key relationships defined. It gives examples of SQL
SELECT statements, which can be used to answer specific questions based on the information
held by DSODB.
Online reference material for DSODB 8.7, including descriptions of all current tables and columns,
can be found in the schema document reference in the Resources section.

Table diagrams
Some pictures are included to illustrate the relationships between various tables. Each box
represents a table or view. Relevant column names are shown, with icons against each column to
show their use as follows (Figure 1).

Figure 1. Key to types of column

Note that only the columns of each table that are mentioned in the text of this article are shown.
For a complete list of table columns, consult the schema document reference in the Resources
section.
The arrows between tables show the direction of relationships established by foreign keys and
SQL views, as shown next (Figure 2).

Figure 2. Key to relationships

The HOST table


The HOST table serves to partition the whole schema so information collected from multiple
engines can be stored in a single database schema if required (Figure 3).
Get started with the IBM InfoSphere DataStage and QualityStage
Operations Console Database, Part 1: An introduction

Page 2 of 22

ibm.com/developerWorks/

developerWorks

Figure 3. HOST table keys

Each row in the HOST table represents a system on which all or part of an InfoSphere Information
Server engine is installed. The HOSTID column is a surrogate primary key; The HostName column
is the system name as derived from the hostname operating system command or the HA_ALIAS
environment variable, if that is set in the environment of the EngMonApp process when it starts.
For systems acting as a conductor node and on which an instance of the EngMonApp process
is running, the InstallationDir column is set to a path name. This is the home directory for the
Information Server engine installation, such as /opt/IBM/InformationServer/Server/DSEngine. The
key of the HOSTS table is the combination of HostName and InstallationDir, which allows for more
than one engine to be installed on the same host system.
The CreatedTimestamp column is the UTC time at which EngMonApp inserted the row. Each
time a ResMonApp instance starts, it looks to see if there is a row in the table with the values of
HostName and InstallationDir it requires, and if so, it updates the MonStartTimestamp column;
otherwise, it creates a new row. ResMonApp also creates new HOST entries, as required.
Remote nodes have InstallationDir set to a single hyphen character, '-'. These are systems whose
HostNames have been defined in the DSODBConfig.cfg file via the ResourceNode property. These
rows have a CreatedTimestamp, but no MonStartTimestamp entry.
Cascading delete constraints are set so that deleting a row in HOST removes all rows that point
back to it. Therefore, records of any jobs that ran on that system and all resource usage records
are deleted at the same time.

Tables related to job runs


This section describes the tables used to record job run information in the default configuration.
Note that all such rows ultimately relate to an entry in the HOSTS table (Figure 4). To completely
identify a run via a SQL SELECT, so it is necessary to join on the HOSTID column if there are runs
from more than one engine recorded in the database. (The SQL examples below assume that is
the case.)

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 3 of 22

developerWorks

ibm.com/developerWorks/

Figure 4. Key columns and relationships between tables related to JOBRUN

The JOBEXEC table


Every time a new instance of a DataStage and QualityStage job run starts, it creates an initial
event describing which version of a job executable has started, when, and what its invocation ID
string is in the case of a multi-instanceable job. The JOBEXEC table holds details about each
unique job executable for which a run has been seen. Note that this might not be the same list as
would be found by querying the Metadata Repository; a job therein might never have been run, or
it might no longer exist if it has been deleted from the repository since it last ran.
Rows have a surrogate key JOBID used to relate all runs and other details of this particular version
of the job. Job names are unique within a DataStage project, and project names are unique for
a given host. Whenever a job is compiled, the job's executable has potentially been changed,
so one can distinguish between versions of a job's executable by including the compilation time.
Hence, the unique primary key for the table is a combination of the ProjectName, JobName, and
CompilationTimestamp columns, combined with the HOSTID as a foreign key to the HOST table to
identify the system where the project is located.
These rows are only created once. Each run of a job either finds an existing row put there by a
previous run of the same version of the job or creates a new one.

Listing 1. SQL Example: Querying the JOBEXEC table


-- List names and locations of all job sequences that have
-ever been run on host H.
SELECT
X.ProjectName, X.FolderPath, X.JobName, X.CompilationTimestamp
FROM
DSODB.JOBEXEC AS X
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND X.JobType = "SEQ"
ORDER BY X.ProjectName, X.FolderPath, X.JobName, X.CompilationTimestamp

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 4 of 22

ibm.com/developerWorks/

developerWorks

Cascading delete constraints are set so that deleting an entry in JOBEXEC removes all records to
do with any runs of that version of the job.

Listing 2. SQL Example: Deleting from the JOBEXEC table


-- Delete all monitored information for all jobs that have
-been run from project P on host H.
DELETE
FROM
DSODB.JOBEXEC
WHERE
JOBID IN (
SELECT
X.JOBID
FROM
DSODB.JOBEXEC AS X
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND X.ProjectName = "P"
)

The JOBRUN table


One row is inserted in this table for each monitored run of a particular version of a job. It is updated
as the job run progresses. The RUNID column is a surrogate primary key, used to relate run
details from other tables (such as job parameters and log messages). The JOBID column is a
foreign key to the row in the JOBEXEC table that contains the description of the exact version of
the job executable that is running which host is started on, in which project, the job's name,
and its compilation time. The other columns that form the unique primary key are InvocationId
and CreationTimestamp. The former is only relevant for jobs with the multi-instanceable property
set; this allows more than one instance of a job to be running at once, as long as each instance
has a different invocation ID, which is an arbitrary string assigned by the user at runtime. Note
that if a run does not have an invocation ID string, the value is set to a single hyphen. The
CreationTimestamp is the UTC time when the monitoring system created the first run event. Note
that this might be different from the RunStartTimestamp, in the case of a run that gets queued. The
RunStartTimestamp is the UTC time that the engine really launched the run, as opposed to when it
was first submitted for running.
If a run is started from a job sequence, or via job control from another job, the
CONTROLLING_RUNID column contains a foreign key to another entry in JOBRUN, which is the
run that started this one. Therefore, all runs started by a particular instance of a sequence can be
related by their common CONTROLLING_RUNID, and these potentially form a tree-structured set
of relationships if job sequences are nested.
As a run progresses, it generates other events when its status changes or when it produces new
values for some of its monitored properties. These cause its row in JOBRUN to be updated, and
the LastUpdateTimestamp column records the UTC time that last happened.
A note about run statuses: there are two columns in the JobRun table that hold status values:
RunMajor Status and RunMinorStatus. The major status can be used to quickly filter those runs
that have finished from those that are starting up or still running. The minor status gives the
Get started with the IBM InfoSphere DataStage and QualityStage
Operations Console Database, Part 1: An introduction

Page 5 of 22

developerWorks

ibm.com/developerWorks/

specific status for the run (see the schema document reference in the Resources section for all
possible values).

Listing 3. SQL Example: Querying the JOBRUN table - 1


-- List start and finish times, names, and statuses of all jobs that have
-been run on host H, started after time YYYY-MM-DD HH:MM:SS,
-and have finished.
SELECT
X.ProjectName, X.JobName,
R.RunStartTimestamp, R.RunEndTimestamp, R.RunMinorStatus
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND R.RunMajorStatus = "FIN"
AND R.RunStartTimestamp >= "YYYY-MM-DD HH:MM:SS"
ORDER BY R.RunStartTimestamp

Listing 4. SQL Example: Querying the JOBRUN table - 2


-- List names of all jobs on host H that were running at time T and sort them
-in descending order of total CPU usage for the whole run.
SELECT
X.ProjectName, X.FolderPath, X.JobName,
R.RunStartTimestamp, R.RunEndTimestamp, R.TotalCPU
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND R.RunStartTimestamp <= "YYYY-MM-DD HH:MM:SS"
AND ( R.RunEndTimestamp >= "YYYY-MM-DD HH:MM:SS"
OR R.RunEndTimestamp IS NULL )
ORDER BY R.TotalCPU DESC

Cascading delete constraints are set so that deleting an entry in JOBRUN deletes all related
details of the run (parameters, logs, metrics, etc.). If the run started other runs (if it was a job
sequence, for example), all those runs are also deleted, along with their details.

Listing 5. SQL Example: Deleting from the JOBRUN table


-- Delete all job runs from project P on host H
-that finished normally.
DELETE
FROM
DSODB.JOBRUN
WHERE
RUNID IN (
SELECT
R.RUNID
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND X.ProjectName = "P"
AND R.RunMinorStatus = '"FOK"
)

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 6 of 22

ibm.com/developerWorks/

developerWorks

The JOBRUNPARAMS table and JOBRUNPARAMSVIEW view


There is one row in the JOBRUNPARAMS table to describe the values of the job parameters
that have been set for a particular run. These are the values you would see in a message at the
beginning of the run's log in the Director client. The RUNID column of this table is a foreign key to
the JOBRUN table for the run to which these parameters relate.
The ParamList column contains the information as an XML string; you can easily extract the
names and values for the parameters by querying through the JOBRUNPARAMSVIEW view.
This expands the XML into separate rows for each individual ParamName and ParamValue, with
RUNID as the same foreign key.

Listing 6. SQL Example: Querying the JOBRUNPARAMSVIEW view


-- List names and start times of all job runs on host H
-that contained a parameter named P which had the value V at run time.
SELECT
X.ProjectName, X.JobName, R.RunStartTimestamp
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBRUNPARAMSVIEW AS P ON P.RUNID = R.RUNID
JOIN DSODB.JOBEXEC AS X
ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND P.ParamName = "P" AND P.ParamValue = "V"
ORDER BY X.ProjectName, X.JobName, R.RunStartTimestamp

Constraints are set so that deleting an entry in JOBRUN deletes any corresponding entry in
JOBRUNPARAMS.

The JOBRUNLOG table


As a job runs, certain of its log messages are captured and stored in the JOBRUNLOG table,
where each row's RUNID column is a foreign key back to a RUNID in the JOBRUN table. Exactly
which messages are captured depends on how the DSODBConfig.cfg file is configured, but in
general this is a small subset of the overall jog logs, and contains only the job's initial and final
messages, fatal messages, and the first N warnings.
The EventId column is an integer that can be used to get messages in the order emitted by the
run, since the LogTimestamp column is the UTC time to the nearest second and might not be
unique.

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 7 of 22

developerWorks

ibm.com/developerWorks/

Listing 7. SQL Example: Querying the JOBRUNLOG table


-- List job details and message info for all runs on host H that have
-emitted fatal messages since time YYYY-MM-DD HH:MM:SS.
SELECT
X.ProjectName, X.JobName, R.RunStartTimestamp,
L.LogTimestamp, L.MessageId, L.MessageText
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBRUNLOG AS L ON R.RUNID = L.RUNID
JOIN DSODB.JOBEXEC AS X
ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND L.LogType = "FAT"
AND R.RunStartTimestamp > "YYYY-MM-DD HH:MM:SS"
ORDER BY R.RunStartTimestamp, R.RUNID, L.EventID

Constraints are set so that deleting an entry in JOBRUN deletes all corresponding entries in
JOBRUNLOG.

The PARALLELCONFIG table and PARALLELCONFIGNODES view


Parallel job runs always specify a path to a file containing configuration information for the
parallel engine (via the APT_CONFIG_FILE environment variable). The path is recorded in the
ConfigFileName column of the JOBRUN table. However, that path might point to a temporary file,
or the contents of the file my have changed since the job ran. So the monitoring system also reads
and parses the file and stores information about how many nodes the run was configured for in the
PARALLELCONFIG table.
Since many runs can use the same configuration file, or at least files whose parallelization
instructions are equivalent, the schema saves space by making all runs with equivalent
configurations point to the same row. So the CONFIGID column in this table is a surrogate key and
the CONFIGID column on the JOBRUN table is a foreign key to it. The PARALLELCONFIG table
also has HOSTID as a foreign key, so configurations are partitioned by host that the run started on.
The list of physical and logical nodes in the configuration is held as an XML column. Use the
PARALLELCONFIGNODES view to query the physical name of each node in the configuration and
the number of logical nodes assigned to each. (Note that the NodeListHash column is used simply
to find whether an existing row with the same configuration exists, since the XML column cannot
be used directly in a lookup.)

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 8 of 22

ibm.com/developerWorks/

developerWorks

Listing 8. SQL Example: Querying the PARALLELCONFIGNODES view


-- Get the names of all job runs started on host H that used a node named N
-and ran between times YYYY-MM-D1 HH:MM:SS and YYYY-MM-D2 HH:MM:SS.
SELECT
R.RunStartTimestamp, X.ProjectName, X.JobName
FROM
DSODB.JOBRUN AS R
JOIN DSODB.PARALLELCONFIGNODES AS N ON R.CONFIGID = N.CONFIGID
JOIN DSODB.JOBEXEC AS X
ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
WHERE
H.HostName = 'H'
AND N.PhysicalName = "N"
AND R.RunStartTimestamp >= "YYYY-MM-D1 HH:MM:SS"
AND R.RunEndTimestamp <= "YYYY-MM-D2 HH:MM:SS"
ORDER BY R.RunStartTimestamp

Constraints are set so that deleting a row in HOST deletes all PARALLELCONFIG entries
generated by runs on that host (and all the runs are deleted at the same time). However, deleting
entries in JOBRUN does not delete entries in PARALLELCONFIG; they might be linked to by a
later run of another job on that host.

Listing 9. SQL Example: Deleting from the PARALLELCONFIG table


-- Delete all rows in PARALLELCONFIG that are not referenced
-by any row in JOBRUN.
DELETE
FROM
DSODB.PARALLELCONFIG
WHERE
NOT EXISTS (
SELECT *
FROM
DSODB.JOBRUN AS R
JOIN DSODB.PARALLELCONFIG AS C ON R.CONFIGID = C.CONFIGID
)

Tables related to job run usage


The JOBRUNUSAGE table is only populated if the DSODBConfig.cfg file has the property
JobRunUsage=1, which is the default setting (Figure 5). This table is used by the Operations
Console UI to generate a graph showing the progress of a job run over time.

Figure 5. Key columns and relationships between tables related to


JOBRUNUSAGE

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 9 of 22

developerWorks

ibm.com/developerWorks/

The JOBRUNUSAGE table and JOBRUNTOTALROWSUSAGE view


During the course of a server or parallel job run, the number of rows read and written by its source
and target stages (if any) is collected on a timed basis. So a snapshot of the volume of data
processed by the run is built up at intervals, and can be used to graph its progress over time.
Each row in JOBRUNUSAGE contains an XML structure containing a set of snapshots taken at
increasing elapsed times since the start of the job. The set is identified by the StartTimestamp
column, which gives the UTC time at which the set of snapshots starts, and the RUNID column,
which is a foreign key to the JOBRUN table to say which run this relates to. These rows should
be queried through the JOBRUNTOTALROWSUSAGE view. This expands the snapshot sets
into individual snapshots that give the elapsed time since the start of the run, and the values of
TotalRowsConsumed and TotalRowsProduced at that point in the run.

Listing 10. SQL Example: Querying the JOBRUNTOTALROWSUSAGE view


-- List the total rows consumed and produced, for each recorded interval
-by ascending time since the start of the run, for the run of
-job J in project P of host H that started at time YYYY-MM-DD HH:MM:SS.
SELECT
X.ProjectName, X.JobName, U.RunElapsedSecs,
U.TotalRowsConsumed, U.TotalRowsProduced
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBRUNTOTALROWSUSAGE AS U ON R.RUNID = U.RUNID
JOIN DSODB.JOBEXEC AS X
ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND X.JobName = "J"
AND X.ProjectName = '"P"
AND R.RunStartTimeStamp = "YYYY-MM-DD HH:MM:SS"
ORDER BY U.RunElapsedSecs

Tables related to stage and link monitoring


These tables are only populated if the DSODBConfig.cfg file has the property MonitorLinks=1
(by default, this is set to 0). At the end of each run of a parallel or server job, the system collects
information about the overall CPU used by each stage, and the total number of rows processed
on each link that is a source or target of the job (Figure 6). This information is aggregated into the
JOBRUN table, and the row count aggregates are also added periodically to the JOBRUNUSAGE
table, for which see the previous section.

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 10 of 22

ibm.com/developerWorks/

developerWorks

Figure 6. Key columns and relationships between tables related to


JOBRUNSTAGE and JOBRUNLINK

The JOBSTAGE table


A row appears in this table for each stage to be reported on for a job run. It is created if it does
not exist and relates back to the JOBEXEC entry for the current run via its JOBID foreign key. The
entry contains static information about the stage, such as name, description and stage type. Note
that since stage names are only unique within the top level of the job, or a container if they are
at a lower level, there is a ContainerPath column that might need to be used to fully identify the
stage as well as the StageName column. Each row is given a surrogate key so it can be joined to
its links.

The JOBLINK table


This contains rows for each link determined to be a source or target link for an overall job that
has been run. It uses foreign key FROMSTAGEID and TOSTAGEID to point back to the row in
JOBSTAGE table that describes the stage it connects to in its role as a source or target. This
depends on the setting of the IsSource and IsTarget columns; in general, only one is set and only
one of the foreign keys is correspondingly related.
The LinkName is unique, in combination with FROMSTAGEID/TOSTAGEID, since a stage cannot
have two links with the same name.

The JOBRUNSTAGE table


Every run that is monitored in this mode generates one row per stage, to record, among other
things, the number of CPU milliseconds that can be ascribed to the stage. A parallel job might
be running each stage in a partitioned mode, which involves more than one process. Therefore,
the NumInstances column might be greater than 1, and the InstanceCPUList might be a commaGet started with the IBM InfoSphere DataStage and QualityStage
Operations Console Database, Part 1: An introduction

Page 11 of 22

developerWorks

ibm.com/developerWorks/

separated string of numbers rather than a single value. But it is easier to select the TotalCPU
column, which gives the overall value by summing entries in the list, if necessary.

Listing 11. SQL Example: Querying the JOBRUNSTAGE table


-- List the stages of jobs run on host H in descending order of CPU used.
SELECT
X.ProjectName, X.JobName, R.RunStartTimestamp,
JS.ContainerPath, JS.StageName, RS.TotalCPU
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X
ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
JOIN DSODB.JOBSTAGE AS JS
ON X.JOBID = JS.JOBID
JOIN DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID
WHERE
H.HostName = "H"
ORDER BY RS.TotalCPU DESC

The JOBRUNLINK table


Every run of a job that has source or target links also generates one row per such link. This
records the number of rows that passed down the link (these rows contribute to the "consumed" or
"produced" total row counts if the link is marked as a source or target, respectively). As for stages,
parallelization is recorded via a comma-separated list of row counts per partition, where relevant,
and the TotalRows column is always the sum over all partitions.

Listing 12. SQL Example: Querying the JOBRUNLINK table


-- List the links, and the stages to which they are attached, of jobs
-run on host H in descending order of number of rows produced.
SELECT
X.ProjectName, X.JobName, R.RunStartTimestamp,
JS.ContainerPath, JS.StageName, JL.LinkName, RL.TotalRows
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X
ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
JOIN DSODB.JOBSTAGE AS JS
ON X.JOBID = JS.JOBID
JOIN DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID
JOIN DSODB.JOBLINK AS JL
ON
(JS.STAGEID = JL.FROMSTAGEID OR JS.STAGEID = JL.TOSTAGEID)
JOIN DSODB.JOBRUNLINK AS RL ON R.RUNID = RL.RUNID
WHERE
H.HostName = "H"
AND JL.IsTarget = 1
ORDER BY RL.TotalRows DESC

The DATALOCATOR table


A link can also record "locator" information. This involves providing eight strings that identify the
location of an external data resource. So for a sequential file stage, these strings should identify a
file-system path name; for a database table, this should be a database type, name, schema, and
table name. Exact details depend on the stage type. For each unique set of identifiers, there is a
separate row in the DATALOCATOR table, with a surrogate key in the LOCATORID column. Each
link that has such information attempts to see if a row with the same identifiers has been created
and creates it if not. The LOCATORID column in the JOBRUNLINK table is then a foreign key to
that row.
Get started with the IBM InfoSphere DataStage and QualityStage
Operations Console Database, Part 1: An introduction

Page 12 of 22

ibm.com/developerWorks/

developerWorks

Table 1. The DATALOCATOR columns are used as follows


Column name

Usage

ComputerName

Name of system that hosts the resource

SoftwareProductName

Name of product that manages the resource

DataStoreSubClass

Overall type of the data store

DataStoreName

Overall name of database or path to top level of resource

DataSchemaSubClass

Type of next level of resource if applicable

DataSchemaName

Name of next level of resource if applicable

DataCollectionSubClass

Type of lowest level of resource

DataCollectionName

Name of lowest level of resource, such as table or file

Listing 13. SQL Example: Querying the DATALOCATOR table


-- List the links of all runs that have written to a database named D,
-what the table names were and how many rows were written.
SELECT
X.ProjectName, X.JobName, R.RunStartTimestamp,
JS.ContainerPath, JS.StageName, JL.LinkName,
DL.DataCollectionName AS TableName, RL.TotalRows
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X
ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
JOIN DSODB.JOBSTAGE AS JS
ON X.JOBID = JS.JOBID
JOIN DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID
JOIN DSODB.JOBLINK AS JL
ON
(JS.STAGEID = JL.FROMSTAGEID OR JS.STAGEID = JL.TOSTAGEID)
JOIN DSODB.JOBRUNLINK AS RL ON R.RUNID = RL.RUNID
JOIN DSODB.DATALOCATOR AS DL ON RL.LOCATORID = DL.LOCATORID
WHERE
H.HostName = "H"
AND JL.IsTarget = 1
AND DL.DataStoreName = "D"
ORDER BY TableName, RL.TotalRows DESC, R.RunStartTimestamp

Table for host system descriptions


Note that rows are only recorded in the HOSTDETAIL table (Figure 7) when the engine
is configured to run resource monitoring; that is, ResourceMonitor is not set to 0 in
DSODBConfig.cfg.

Figure 7. Key columns and relationships between the HOST and HOSTDETAIL
tables

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 13 of 22

developerWorks

ibm.com/developerWorks/

The HOSTDETAIL table


Every row in the HOST table should have at least one row from the HOSTDETAIL table
related to it. These rows contain information about the operating system properties that
were in force at some time when an instance of the ResMonApp process started. The
CreatedTimestamp column is the UTC time when the particular set of properties was first inserted.
The LastCheckedTimestamp is the UTC time when an instance of ResMonApp last determined
that the properties of that particular system had not changed since the previous check (i.e., none
of the values in any of the other columns needed to be changed). If any column would require a
different value, a new row is inserted, rather than the exiting row being updated. The new row has
its CreatedTimestamp and LastCheckedTimestamp set to "now." The rows form a timeline that can
be used to track changes to the operating system properties of a particular node.
The entries in HOSTDETAIL are related to the HOST table by two foreign keys. HOSTID is
the primary key of the HOST row that identifies which host name this set of detail records
describes. HEAD_HOSTID is the primary key of the HOST row that identifies the host on which
the instance of ResMonApp that inserted the detail record is running. In the case of a conductor
node, HOSTID and HEAD_HOSTID are the same; for a remote node, they are different. In the
latter case, the detail record is recording the properties of a remote node as seen from one
particular conductor node. Note that the unique primary key for this table is the combination of
HOSTID, HEAD_HOSTID, and CreatedTimestamp, since any change to columns other than
LastCheckedTimestamp results in a new row being inserted, rather than this row being updated.

Listing 14. SQL Example: Querying the HOSTDETAIL table 1


-- List operating system name and version for host H as they were at time
-YYYY-MM-DD HH:MM:SS (this means finding the row whose CreatedTimestamp
-is nearest to that time).
SELECT
Max(HD.CreatedTimestamp) AS "Last Changed",
HD.PlatformName, HD.PlatformVersion
FROM
DSODB.HOSTDETAIL AS HD
JOIN DSODB.HOST AS H ON HD.HEAD_HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND HD.CreatedTimestamp <= "YYYY-MM-DD HH:MM:SS"
GROUP BY HD.PlatformName, HD.PlatformVersion

Listing 15. SQL Example: Querying the HOSTDETAIL table 2


-- List most recent operating system details for each node of an engine
-monitored from host H.
SELECT
H2.HostName AS "Node", Max(HD.CreatedTimestamp) AS "Last Changed",
HD.PlatformName, HD.PlatformVersion
FROM
DSODB.HOSTDETAIL AS HD
JOIN DSODB.HOST AS H ON HD.HEAD_HOSTID = H.HOSTID
JOIN DSODB.HOST AS H2 ON HD.HOSTID = H2.HOSTID
WHERE
H.HostName = "H"
GROUP BY H2.HostName, HD.PlatformName, HD.PlatformVersion

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 14 of 22

ibm.com/developerWorks/

developerWorks

Tables related to system resource usage


These tables hold information about how much CPU, memory, or other system resource is in
use at particular times (Figure 8). Note that rows are only recorded in the RESOURCESNAP
and RESOURCEUSAGE tables when the engine is configured to run resource monitoring (i.e.,
ResourceMonitor is not set to 0 in DSODBConfig.cfg).

Figure 8. RESOURCESNAP and RESOURCEUSAGE tables; key columns and


related views

The RESOURCESNAP table


This table only contains one row per host that is being monitored from a particular engine. Its
HOSTID column is a foreign key to an entry in the HOST table to identify which system's resources
are described. The HEAD_HOSTID column in RESOURCESNAP is also defined as a foreign key
that identifies which engine system inserted the row. In the case of a conductor node, HOSTID,
and HEAD_HOSTID are the same; for a remote node, they are different.
On each engine (corresponding to a HEAD_HOSTID value), for each system it is monitoring
including itself (the HOSTID values), a row is updated at intervals containing all the system
resource usage information being collected for that system. The LastUpdateTimestamp column
tells you when the last update took place.
The content of RESOURCESNAP should always be queried through either of the views
RESOURCESNAPSYSTEM or RESOURCESNAPDISKS.
For the full set of columns available via these views, please see the schema document reference
in the Resources section.

The RESOURCESNAPSYSTEM view


This view presents several columns that hold various values for CPU, free memory, process
counts, and paging. The names of the columns are used to group them counters. So CPUPctxxxx
Get started with the IBM InfoSphere DataStage and QualityStage
Operations Console Database, Part 1: An introduction

Page 15 of 22

developerWorks

ibm.com/developerWorks/

refers to percentage of CPU used; MemFreeKBxxxx refers to free memory in kilobytes;


ProcNumxxxx refers to number of processes; PageNumxxxx refers to number of paging events.
For example, CPUPctUser is the percentage of the CPU assigned to user processes in the interval
leading up to LastUpdateTimestamp; MemFreeKBPhysical is the number of kilobytes of physical
memory that were free in that period.

The RESOURCESNAPDISKS view


This view expands a repeating field in RESOURCESNAP, which is controlled by the number of
disk paths, if any, that have been entered in the DSODBConfig.cfg configuration file (properties
ResourceLocalFS and ResourceRemoteFS, which can be repeated). The DiskPathMonitored
column identifies each file system path, and the DiskTotalKB and DiskFreeKB columns give the
total number of kilobytes and number currently unused for the disk mounted on that path.

The RESOURCEUSAGE table


This contains counters derived from those in the RESOURCESNAP table, aggregated over
intervals and arranged as a timeline. As an entry in RESOURCESNAP is updated, its values are
used to calculate the latest maximum, minimum, and average numbers over the most recent time
period, then at intervals a row is inserted to describe the behavior over that period. Using the
default configuration settings the RESOURCESNAP table is updated every 10 seconds, and every
60 seconds, a new row is inserted in RESOURCEUSAGE containing the maximum, minimum,
and averages of the last six updates for that combination of HOSTID and HEAD_HOSTID. The
StartTimestamp column of this table can be used to sort the entries into time order.

The RESOURCEUSAGESYSTEM view


For each column of RESOURCESNAP that holds system counters, there are three columns,
with the suffixes Avg, Max, and Min. So CPUPctUserAvg is the average value derived
form the CPUPctUser column, and MemFreeKBPhysicalMin is the minimum value of the
MemFreeKBPhysical column over the period as recorded between StartTimestamp and
EndTimestamp.

Listing 16. SQL Example: Querying the RESOURCEUSAGESYSTEM view


-- List average user CPU percentage usage for engine system H
-between times YYYY-MM-D1 HH:MM:SS and YYYY-MM-D2 HH:MM:SS.
SELECT
RS.StartTimestamp, RS.CPUPctUserAvg
FROM
DSODB.RESOURCEUSAGESYSTEM AS RS
JOIN DSODB.HOST AS H ON RS.HEAD_HOSTID = H.HOSTID
WHERE
H.HostName= "H"
AND RS.StartTimeStamp >= "YYYY-MM-D1 HH:MM:SS"
AND RS.EndTimeStamp <= "YYYY-MM-D2 HH:MM:SS"

The RESOURCEUSAGEDISKS view


This allows you to query the average, minimum, and maximum number of free kilobytes for each
monitored disk path over each monitored period.
Get started with the IBM InfoSphere DataStage and QualityStage
Operations Console Database, Part 1: An introduction

Page 16 of 22

ibm.com/developerWorks/

developerWorks

Listing 17. SQL Example: Querying the RESOURCEUSAGEDISKS view


-- List any periods where the disk free KB for any paths
-for host N as monitored from host H fell below 1 MB.
SELECT
RD.DiskPathMonitored, RD.StartTimestamp, RD.EndTimestamp,
RD.DiskFreeKBMin
FROM
DSODB.RESOURCEUSAGEDISKS AS RD
JOIN DSODB.HOST AS H ON RD.HEAD_HOSTID = H.HOSTID
JOIN DSODB.HOST AS H2 ON RD.HOSTID = H2.HOSTID
WHERE
H.HostName = "H"
AND H2.HostName = "N"
AND RD.DiskFreeKBMin < 1024

Using the lookup tables to expand codes


There are a number of tables with columns that hold enumerated code values, designed as threecharacter strings that are all uppercase ASCII and vaguely mnemonic (Figure 9). These can be
expanded into a more readable form if required by using the MASTERREF table and the views
built from it.

Figure 9. MASTERREF table; key columns and related views

The MASTERREF table


This contains a row for each distinct value of each enumerated type. The combination of the
Enumeration and Code columns forms the primary key that can be used by the following views to
look up a specific code value. The Name and Description columns then give you correspondingly
longer strings that can be used in reports instead of the code value itself.

The xxxxREF Views


Each view corresponds to a table and column that contains a code, as shown in Table 2.

Table 2. Reference views and the columns they describe


View name

Table

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Column

Page 17 of 22

developerWorks

ibm.com/developerWorks/

RUNMAJORSTATUSREF

JOBRUN

RunMajorStatus

RUNMINORSTATUSREF

JOBRUN

RunMinorStatus

RUNTYPEREF

JOBRUN

RunType

JOBTYPEREF

JOBEXEC

JobType

LOGTYPEREF

JOBRUNLOG

LogType

STAGESTATUSREF

JOBRUNSTAGE

StageStatus

LINKTYPEREF

JOBLINK

LinkType

For example, if a row of the JOBRUN table contains "FIN" in the RunMajorStatus column, this can
be looked up via the RUNMAJORSTATUSREF view as Name = "Finished" and Description = "Run
has finished". A RunMinorStatus of "FWF" can be looked up via RUNMINORSTATUSREF as Name
= "Finished aborted" and Description = "Run has finished and logged at least one fatal
message".

Listing 18. SQL Example: Querying the JOBRUN table using lookups against
reference views
-- List the job names , job types, run types, and minor status names
-of all runs on host H that have finished.
SELECT
X.ProjectName, X.JobName,
JT.JobTypeName, RT.RunTypeName,
R.RunStartTimestamp, R.RunEndTimestamp,
SR.MinorStatusName
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X
ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H
ON X.HOSTID = H.HOSTID
JOIN DSODB.JOBTYPEREF AS JT
ON X.JobType = JT.JobTypeCode
JOIN DSODB.RUNTYPEREF AS RT
ON R.RunType = RT.RunTypeCode
JOIN DSODB.RUNMINORSTATUSREF AS SR ON R.RunMinorStatus = SR.MinorStatusCode
WHERE
H.HostName = "H"
AND R.RunMajorStatus = "FIN"
ORDER BY R.RunStartTimestamp

Conclusion
We have examined how the tables that make up the Operations Database schema contain
rows that describe job runs, host systems, and system resource usage. We have described the
identifying key fields for the main tables, some of the other important fields, and how foreign
key fields are used to join other tables to them. We have given several examples of real-world
questions that can be answered by appropriate use of these fields.

Acknowledgements
Thanks go to the members of the DataStage Operations Console development team who ensured
that the schema got ported to the various flavors of database supported: DB2 LUW 9.5 and 9.7,
Oracle 10g and 11gR2, Microsoft SQL Server 2005 and 2008, on the various Windows and
UNIX platforms.
Get started with the IBM InfoSphere DataStage and QualityStage
Operations Console Database, Part 1: An introduction

Page 18 of 22

ibm.com/developerWorks/

developerWorks

Thanks also to the QA team who patiently checked that data could indeed be inserted and
retrieved correctly from all those variants.

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 19 of 22

developerWorks

ibm.com/developerWorks/

Resources
Learn
The operations database schema is documented on the IBM InfoCenter.
IBM InfoSphere Information Server 8.7 What's New whitepaper, containing details of the
Operations Console.
IBM DB2 9.7 for Linux, UNIX, and Windows Information Center and SQL reference.
Learn more about Information Management at the developerWorks Information Management
zone. Find technical documentation, how-to articles, education, downloads, product
information, and more.
Stay current with developerWorks technical events and webcasts.
Follow developerWorks on Twitter.
Get products and technologies
Build your next development project with IBM trial software, available for download directly
from developerWorks.
Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2
Express Edition for the community that offers the same core data features as DB2 Express
Edition and provides a solid base to build and deploy applications.
Discuss
Check out the developerWorks blogs and get involved in the developerWorks community.

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 20 of 22

ibm.com/developerWorks/

developerWorks

About the authors


Len Greenwood
Len Greenwood was a member of the small development team that produced the first
version of DataStage in 1996, prior to it being acquired from Ascential Software by
IBM in 2005. It now forms a mainstay of the IBM InfoSphere Information Server suite.
He has worked in the related areas of data and metadata integration for the past
15 years and is currently the main product architect for the core components of the
DataStage and QualityStage development and production tools. He recently designed
the database schema that underlies the Information Server Operations Console, used
to monitor activity at the DataStage engine level.

Arron Harden
Arron Harden is a senior software engineer for IBM InfoSphere DataStage and
QualityStage. Staying with the DataStage product after several mergers and
acquisitions, he has worked on DataStage for more than 12 years, joining IBM
through the acquisition of Ascential Software Inc in 2005. Having spent a year
working in Boston, he is currently based in the United Kingdom, working at the IBM
Milton Keynes office. In his most recent role, he was the lead developer for the web
application component of the DataStage and QualityStage Operations Console,
written using the Dojo toolkit.

Geoff McClean
Geoff McClean was on the original DataStage development team at its inception and
is currently a senior software developer for core components of the IBM InfoSphere
DataStage and QualityStage development and production tools, part of the IBM
InfoSphere Information Server suite. He oversaw the implementation of the database
management, event handling, and resource tracking services of the IBM InfoSphere
DataStage and QualityStage Operations Console.

Sumit Kumar
Sumit Kumar has 13 years of industry experience, including the financial, banking,
telecom, supply-chain management, insurance, and healthcare domains. He has
worked with IBM InfoSphere Information Server from early 2010, including the
Operations Console feature of Information Server, from the beginning of its design

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 21 of 22

developerWorks

ibm.com/developerWorks/

and development phase as a key contributor. He was involved in the implementation


of most of the public APIs and service layers of the Operations Console.
Copyright IBM Corporation 2012
(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)

Get started with the IBM InfoSphere DataStage and QualityStage


Operations Console Database, Part 1: An introduction

Page 22 of 22

You might also like