Ab Initio - Unix - DB - Concepts & Questions
Using which component can we specify the rate of data movement from input to output?
A. Throttle
What do you call the file that lets several serial files having the
same record format be treated as a single graph component?
Ad hoc multifile
Rollup has more control over data than Aggregate because data is summarized based on
a key (like GROUP BY).
Dynamic DML -
Normally, DML needs to be configured before the graph starts.
Suppose data arriving at different times for processing has different DMLs;
then we can put a flag in the data, read that flag first from the input file
received, and choose the DML accordingly.
Metaprogramming utilities can be used to generate DML, e.g.
dml_field_info_vec.
To parse the DML: read_type, record_info, length_of, etc.
This can also be handled using conditional DML.
The following components break pipeline parallelism: Fuse,
Scan, Interleave, Sort, Sort Within Groups, Rollup, Join (any in-memory
components).
Also remember that whenever there is a phase change in the graph,
pipeline parallelism is broken.
The short answer is that Replicate copies a flow while Broadcast
multiplies it: Broadcast is a partitioner, whereas Replicate is a simple
flow-copy mechanism.
Departition components
Concatenate - maintains sequence.
Interleave - combines blocks of data records from multiple flow partitions
in round-robin fashion.
Merge - combines data records from multiple flow partitions that have
been sorted on the same key, and maintains the sort order.
Gather - combines data records from multiple flow partitions
(MFS) or multiple flows arbitrarily, making the flow serial.
Different levels of parameters (Project, Sandbox, and Graph) and their evaluation
order:
1. The host setup script is run.
2. Common (that is, included) sandbox parameters are evaluated.
3. Sandbox parameters are evaluated.
4. The project-start.ksh script is run.
5. Formal/input parameters are evaluated.
6. Graph parameters are evaluated.
7. The graph Start Script is run.
8. DML evaluation occurs.
You can see the stdenv variables in the stdenv sandbox's .sandbox.pset and
.project.pset. It usually holds all the environment variables, such as the serial
directory, MFS directory, MFS depth, etc.
2. Check in the graphs and other necessary objects (DMLs, XFRs, ...). This puts
the updated objects in the EME.
3. Create a tag (say xyz) for the graphs / DMLs / XFRs that you want to move to the other
EME. Use the command
air tag create xyz /Projects/DEV/ENV/mp/load_dw.mp
5. Load the .save file into the PROD / QA EME using the following command:
air load <name of .save file created in last step> -relocate <Dev Project Path>
<QA Project Path>
6. Now check out the graphs and other files (DMLs, XFRs, ...) from the PROD
environment to your sandbox (if required).
"Continuous flow"
components in Ab Initio use "message queuing" technology.
PUBLISH, MULTIPUBLISH, and SUBSCRIBE are some of the components that come
under this category. With the help of these components, the extract process
is made to cycle over the data, with CDC technology in place, at definite
time intervals. The data is pushed to a queue through the PUBLISH or
MULTIPUBLISH components; one file per run of the extract
accumulates in the queue. The load process, which uses SUBSCRIBE, connects
to the queue to process all the files that exist in the queue. You may
cycle the load process with the same time interval as the
extract. This technique reflects the business in NEAR real time, not in
real time; it is left to you to choose. One last point: continuous
components give good results in the case of near-real-time integration.
Types of queue:
1. MQ
2. Ab Initio queues
3. JMS queues
FTP - Download files in continuous mode: build a dedicated continuous graph that
simply reads from the FTP server using UNIVERSAL SUBSCRIBE and then publishes
into an Ab Initio queue. (Easy to test, maintain, etc.)
Scripting Related
Scenario Based questions can be expected.
How to sort the files in a directory based on name, date, owner, etc.
Based on name: ls sorts by name by default
ls -1 | sort
To sort them in reverse order: ls -1 | sort -r
Based on date: ls -lt (newest first) or ls -lrt (oldest first)
Based on owner: ls -l | sort -k3,3
(sort -k starts a key at POS1, origin 1, and ends it at POS2, default end of line)
Related options:
ls -s, --size: print the allocated size of each file, in blocks
sort -n, --numeric-sort: compare according to string numerical value
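The listing commands above can be checked with a throwaway directory; this is a sketch, and the directory path and file names are made up:

```shell
# Create a small sample directory to demonstrate the sorting variants.
mkdir -p /tmp/sortdemo
touch /tmp/sortdemo/b.txt /tmp/sortdemo/a.txt /tmp/sortdemo/c.txt
ls -1 /tmp/sortdemo | sort        # by name, ascending
ls -1 /tmp/sortdemo | sort -r     # by name, descending
ls -lt /tmp/sortdemo              # by modification time, newest first
ls -lrt /tmp/sortdemo             # by modification time, oldest first
ls -l /tmp/sortdemo | sort -k3,3  # by owner (3rd field of the long listing)
```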
ls -ltr | grep ^d
The caret ^ and the dollar sign $ are metacharacters that respectively match the
empty string at the beginning and end of a line. The first character of each
long-listing line is d for a directory or - for a regular file, so this lists only directories.
cut with a delimiter (-d) displays the whole line even when it doesn't find the
delimiter (e.g. |) in that line; with the -s option it doesn't display any
output for lines in which the specified delimiter is not found.
Select all fields except the specified fields with the --complement option; for
example, display all the fields from the /etc/passwd file except field 7.
To change the output delimiter, use the --output-delimiter option. For example,
the input delimiter can be : (colon) while the output delimiter is # (hash).
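The original commands for these notes were omitted; a likely reconstruction (GNU cut assumed; the sample data is made up rather than a real /etc/passwd):

```shell
# Sample pipe-delimited file: second line has no delimiter at all.
printf 'one|two|three\nno delimiter here\n' > /tmp/cut_demo.txt
cut -d'|' -f1 /tmp/cut_demo.txt       # prints the whole line when no '|' found
cut -d'|' -s -f1 /tmp/cut_demo.txt    # -s suppresses lines without the delimiter
# A single passwd-style record to show --complement and --output-delimiter.
printf 'root:x:0:0:root:/root:/bin/bash\n' > /tmp/passwd_demo.txt
cut -d: --complement -f7 /tmp/passwd_demo.txt                 # all fields except 7
cut -d: -f1,6,7 --output-delimiter='#' /tmp/passwd_demo.txt   # '#' between fields
```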
2>/dev/null
/dev/null is the null device: it takes any input you give it and throws it away.
Redirecting file descriptor 2 (stderr) to it can be used to suppress error output.
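A minimal sketch of the redirection (the directory name is made up so the command fails):

```shell
# Error messages go to stderr (fd 2); redirecting fd 2 to /dev/null hides them.
ls /no/such/dir_xyz 2>/dev/null || true        # error message is discarded
ls /no/such/dir_xyz 2>/tmp/err_demo.log || true  # or capture it in a file instead
cat /tmp/err_demo.log                          # the error text went here
```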
Here the actions in the begin block are performed before processing the file
and the actions in the end block are performed after processing the file. The
rest of the actions are performed while processing the file.
Examples:
Create a file input_file with the following data. This file can be easily created using
the output of ls -l.
From the data, you can observe that this file has rows and columns. The rows
are separated by a newline character and the columns are separated by
space characters. We will use this file as the input for the examples discussed
here.
This prints the sum of the values in the 5th column. In the BEGIN block the
variable sum is initialized to 0. In the main block the value of the 5th
column is added to sum; this addition repeats for every row
processed. When all the rows have been processed, sum holds the
total of the values in the 5th column, and this value is printed in the END block.
#!/usr/bin/awk -f
BEGIN {sum=0}
{sum=sum+$5}
END {print sum}
4. awk '{ if ($9 == "t4") print $0; }' input_file
This awk command checks for the string "t4" in the 9th column and, if it finds a
match, prints the entire line.
This prints the squares of the numbers from 1 to 5. The output of the command
is
square of 1 is 1
square of 2 is 4
square of 3 is 9
square of 4 is 16
square of 5 is 25
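The command that produces the output above was omitted from the notes; a likely form is the classic awk BEGIN-block loop:

```shell
# Loop from 1 to 5 in a BEGIN block; no input file is read at all.
awk 'BEGIN { for (i = 1; i <= 5; i++) print "square of", i, "is", i*i }'
```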
If the fields in the file are separated by any other character, we can use the FS
variable to specify the delimiter.
By default, whenever we print fields using the print statement, the fields are
displayed with the space character as the delimiter. For example:
center 0
center 17
center 26
center 25
center 43
center 48
Setting the OFS (output field separator) variable to ":" makes the same print
statement produce:
center:0
center:17
center:26
center:25
center:43
center:48
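A sketch of the commands behind the two output blocks above (the sample lines are taken from the output; OFS is the standard awk output field separator):

```shell
# Default OFS is a space, so print $1, $2 joins the fields with a space.
printf 'center 0\ncenter 17\n' | awk '{ print $1, $2 }'
# Setting OFS in BEGIN changes the join character to ':'.
printf 'center 0\ncenter 17\n' | awk 'BEGIN { OFS = ":" } { print $1, $2 }'
```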
8. awk '{print NF}' input_file
This will display the number of columns in each row.
>cat file.txt
unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.
A simple sed substitution replaces the word "unix" with "linux" in the file.
Here the "s" specifies the substitution operation, the "/" characters are delimiters,
"unix" is the search pattern, and "linux" is the replacement string.
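The substitution described above can be sketched like this, using the sample file contents shown earlier:

```shell
# Recreate the sample file, then substitute unix -> linux.
printf 'unix is great os. unix is opensource. unix is free os.\n' > /tmp/sed_demo.txt
sed 's/unix/linux/' /tmp/sed_demo.txt
```

Note that without a flag, only the first occurrence on each line is replaced.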
vi editor
0   Move to the beginning of the line
$   Move to the end of the line
1G  Move to the first line of the file
G   Move to the last line of the file
nG  Move to the nth line of the file
A numeric flag after the substitution selects which occurrence to replace; a flag of
2 replaces only the second occurrence of the word "unix" with "linux" in a line.
The substitute flag /g (global replacement) tells sed to replace all
occurrences of the string in the line.
If the text being matched (e.g. a URL) contains the delimiter character itself,
you have to escape the slash with a backslash, otherwise the
substitution won't work.
There might be cases where you want to search for a pattern and replace it
by adding some extra characters to it. In such cases & comes in handy: the &
represents the matched string.
You can restrict the sed command to replace the string on a specific line number
only.
You can delete lines from a file by specifying a line number or a range of numbers.
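The techniques above can be sketched as one-liners (sample inputs are made up):

```shell
echo 'unix unix unix' | sed 's/unix/linux/2'   # 2nd occurrence only
echo 'unix unix unix' | sed 's/unix/linux/g'   # all occurrences
echo 'see /usr/bin here' | sed 's/\/usr\/bin/\/opt\/bin/'  # escaped '/' in pattern
echo 'unix' | sed 's/unix/{&}/'                # & inserts the matched string
seq 1 5 | sed '3 s/3/three/'                   # substitute only on line 3
seq 1 5 | sed '2,4d'                           # delete lines 2 through 4
```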
XARGS
The xargs command (by default) reads its input from stdin and executes the
/bin/echo command over that input.
When you type xargs without any argument, it will prompt you to enter
input through stdin:
$ xargs
Hi,
Welcome to TGS.
After you type something, press Ctrl+D; the input is echoed back to you on
stdout as a single line:
Hi, Welcome to TGS.
One of the most important uses of xargs is finding files of a
certain type and performing actions on them (the most popular being
deletion). xargs is very effective when combined with other commands.
$ ls
one.c one.h two.c two.h
$ ls
one.h two.h
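The before/after listings above were likely produced by a find | xargs pipeline; a sketch, with the file names taken from the listing and a made-up demo directory:

```shell
# Recreate the four files, then delete the *.c ones via xargs.
mkdir -p /tmp/xargs_demo
touch /tmp/xargs_demo/one.c /tmp/xargs_demo/one.h \
      /tmp/xargs_demo/two.c /tmp/xargs_demo/two.h
find /tmp/xargs_demo -name '*.c' | xargs rm -f
ls /tmp/xargs_demo    # only one.h and two.h remain
```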
air -version <version number> project export <EME Path Name> -basedir
/tmp/<existing or new sandbox> -files mp/<graph name> -cofiles -find-required-files -gde
air project export <eme path> -basedir <existing or new sandbox> -common <EME
path of common sandbox> <path to where it needs to be checked out>
Database Related
How to delete the least-value key after finding the duplicates? Create a temp table
with the duplicate rows and then keep the minimum-id row.
Duplicated rows have a count greater than one. If you only want to see rows that
are duplicated, you need to use a HAVING clause (not a WHERE clause), like this:
select day, count(*) from test group by day HAVING count(*) > 1;
+------------+----------+
| day        | count(*) |
+------------+----------+
| 2006-10-08 |        2 |
+------------+----------+
create temporary table to_delete (day date not null, min_id int not null);
What are the different partition and de-partition components available? Explain
about them?
If file containing a data how do you find out the data types of each column to create
the DML?
Any knowledge on validate data (or validate records) component?
Have you done performance tuning? And how?
Lookup concept? How did you use in your graph? What precautions have to be
taken when lookup file is multi-file? Usage of lookup and lookup_local functions.
How do you check-in from command prompt? What is the command to check-in the
different sandboxes under one project at a time?
What is the command to see the disk usage of a particular partition in a multi-file?
du - disk usage
du -h -> human-readable format
du -s -> summary (grand total disk usage of a directory)
df - disk free (reports filesystem disk space usage)
df -a -> include all filesystems
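A quick sketch of the du/df answers above (the demo directory is made up):

```shell
# Make a small directory so du has something concrete to measure.
mkdir -p /tmp/du_demo && echo hello > /tmp/du_demo/f.txt
du -sh /tmp/du_demo    # summarized, human-readable usage of the directory
du -h /tmp/du_demo     # per-directory usage
df -h /tmp/du_demo     # free space on the filesystem holding it
```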
For serial and MFS files there are many ways the components can be used.
We can also use multiple LEADING RECORDS components to meet the
requirement.
What is the difference between the rollup and scan components?
What is the m_dump command how do you use it?
If input data is containing the duplicate data which partition component is better to
use? - PBK
Do you know Autosys scheduler and did you work through the GDE or from
command prompt?
What is the difference between “ON-ICE” and “ON-HOLD” status of jobs in Autosys?
What is the command to print duplicate records in a file? sort file | uniq -d
(uniq needs sorted input; -d prints only the duplicated lines)
What is the command to print the 3rd column in a file? How do you replace the 3rd
column value with some other column's value? awk '{print $3}'
How do you perform find & replace in a file? sed
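The three answers above can be sketched as one-liners (sample inputs are made up):

```shell
echo 'a b c d' | awk '{ print $3 }'        # print the 3rd column
echo 'a b c d' | awk '{ $3 = $4; print }'  # replace column 3 with column 4's value
printf 'x\ny\nx\n' | sort | uniq -d        # duplicates only (input must be sorted)
echo 'a b c d' | sed 's/c/X/'              # find & replace with sed
```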
Finding the 2nd highest salary in SQL:
SELECT MAX(Salary) FROM Employee
WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee);
For the Nth highest salary, the MySQL form (explained below) is:
SELECT Salary FROM Employee
ORDER BY Salary DESC
LIMIT N-1, 1;
(On SQL Server the same idea is written with TOP and a derived table aliased
AS Emp, ordered by Salary.)
select * FROM (
select EmployeeID, Salary
,rank() over (order by Salary DESC) ranking
from Employee
)
WHERE ranking = N;
Note that the DESC used in the query above simply arranges the salaries in
descending order – so from highest salary to lowest. Then, the key part of the query
to pay attention to is the "LIMIT N-1, 1". The LIMIT clause takes two arguments in
that query – the first argument specifies the offset of the first row to return, and the
second specifies the maximum number of rows to return. So, it’s saying that the
offset of the first row to return should be N-1, and the max number of rows to return
is 1. What exactly is the offset? Well, the offset is just a numerical value that
represents the number of rows from the very first row, and since the rows are
arranged in descending order we know that the row at an offset of N-1 will contain
the (N-1)th highest salary.
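The offset idea can be illustrated outside SQL with a sort | sed pipeline (a shell analogue, not the actual query; the salary values are made up):

```shell
# Sort numerically descending, then print the Nth line (here N=2):
# the same "skip N-1 rows, take 1" logic as LIMIT N-1, 1.
printf '300\n100\n500\n200\n' | sort -nr | sed -n '2p'
```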
Indexing:
Why is it needed?
When data is stored on disk based storage devices, it is stored as blocks of data. These blocks are
accessed in their entirety, making them the atomic disk access operation. Disk blocks are structured
in much the same way as linked lists; both contain a section for data, a pointer to the location of the
next node (or block), and both need not be stored contiguously.
Because a set of records can only be sorted on one field, searching
on a field that isn't sorted requires a linear search, which takes N/2 block accesses on average,
where N is the number of blocks that the table spans. If that field is a non-key field (i.e. doesn't
contain unique entries) then the entire table space must be searched, at N block accesses.
With a sorted field, a binary search may be used instead, which takes log2 N block accesses. Also, since
the data is sorted, given a non-key field the rest of the table doesn't need to be searched for duplicate
values once a higher value is found. The performance increase is therefore substantial.
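A worked example of those access counts, for an assumed table of one million blocks:

```shell
# Linear search averages N/2 reads; binary search needs about log2(N).
awk 'BEGIN { n = 1000000; printf "linear=%d binary=%d\n", n/2, int(log(n)/log(2)) + 1 }'
```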
What is indexing?
Indexing is a way of sorting a number of records on multiple fields. Creating an index on a field in a
table creates another data structure which holds the field value, and pointer to the record it relates
to. This index structure is then sorted, allowing Binary Searches to be performed on it.
The downside to indexing is that these indexes require additional space on the disk, since the indexes
are stored together in a table using the MyISAM engine, this file can quickly reach the size limits of
the underlying file system if many fields within the same table are indexed.
Data Definition Language (DDL) - Data definition language (DDL) commands enable you to
perform the following tasks:
Create, alter, and drop schema objects
Data Manipulation Language (DML) - These SQL commands are used for storing, retrieving,
modifying, and deleting data. These commands are SELECT, INSERT, UPDATE, and DELETE
Transaction Control Language (TCL) - Transaction control commands manage changes made by
DML commands. These SQL commands are used for managing changes affecting the data. These
commands are COMMIT, ROLLBACK, and SAVEPOINT.
Data Control Language (DCL) - It is used to create roles, permissions, and referential integrity as
well it is used to control access to database by securing it. These SQL commands are used for
providing security to database objects. These commands are GRANT and REVOKE.
Reason: when you issue DELETE, all the data is first copied into the rollback
tablespace, and then the delete operation is performed. That is why, when you type
ROLLBACK after deleting from a table, you can get the data back (the system restores
it from the rollback tablespace); all of this takes time. TRUNCATE, however,
removes the data directly without copying it into the rollback tablespace,
which is why TRUNCATE is faster. Once you truncate, you cannot get the data back.
Scripts:
$ vi ginfo
#
#
# Script to print information about currently logged-in users, current date & time
#
clear
echo "Hello $USER"
echo "Today is \c ";date
echo "Number of user login : \c" ; who | wc -l
echo "Calendar"
cal
exit 0
You can see system variables by giving command like $ set, some of the important
System variables are:
echo [options] [string, variables...]
Displays text or variables value on screen.
Options
-n Do not output the trailing new line.
-e Enable interpretation of the following backslash escaped characters in the strings:
\a alert (bell)
\b backspace
\c suppress trailing new line
\n new line
\r carriage return
\t horizontal tab
\\ backslash
`...` (back quotes) - execute a command and substitute its output
$ vi sayH
#
#Script to read your name from key-board
#
echo "Your first name please:"
read fname
echo "Hello $fname, Lets be friend!"
Now if you want to print the 1st line to the next 5 lines (i.e. lines 1 to 5), give the command
:1,5 p
:8 s/lerarns/learn/
Breakdown of this substitute command:
:8          Go to line 8 (the address of the line)
s           Substitute
/lerarns/   Target pattern
learn/      If the target pattern is found, substitute the replacement expression (i.e. learn)
:1,$ s/Linux/Unix/
:1,$        Apply the substitution to all lines
g           All occurrences in the line
[^ ]        ^ inside brackets means "not"
/^$/        An empty line: a combination of ^ (start) and $ (end)
To view the entire file without blank lines, print only the non-empty lines, for example:
:v/^$/p
a=5.66
b=8.67
c=`echo $a + $b | bc`
echo "$a + $b = $c"
sed
Syntax: $ sed -n -e Xp -e Yp FILENAME
sed : sed command, which will print all the lines by default.
-n : Suppresses output.
-e CMD : Command to be executed
Xp: Print line number X
Yp: Print line number Y
FILENAME : name of the file to be processed.
The example below displays line numbers 101 - 110 of the /var/log/anaconda.log
file, using an equivalent tail/head pipeline:
$ cat /var/log/anaconda.log | tail -n +101 | head -n 10
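The range can also be expressed directly in sed; a sketch against a generated file (since /var/log/anaconda.log may not exist):

```shell
# Generate 200 numbered lines, then print lines 101-110 two ways.
seq 1 200 > /tmp/lines_demo.txt
sed -n '101,110p' /tmp/lines_demo.txt           # range address form
tail -n +101 /tmp/lines_demo.txt | head -n 10   # equivalent pipeline
```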
http://www.freeos.com/guides/lsst/ch08.html
Q.11. Write a script to determine whether a given file exists or not; the file name
is supplied as a command line argument. Also check for a sufficient number of
command line arguments.
if [ $# -ne 1 ]
then
  echo "Usage - $0 file-name"
  exit 1
fi
if [ -f "$1" ]
then
  echo "$1 file exists"
else
  echo "Sorry, $1 file does not exist"
fi
Q.1. How to write a shell script that adds two numbers supplied as command
line arguments; if the two numbers are not given, show an error and its usage.
Answer: See Q1 shell Script.
Q.2. Write a script to find the biggest of three given numbers. The numbers are supplied
as command line arguments. Print an error if sufficient arguments are not supplied.
Answer: See Q2 shell Script.
Q.4. Write Script, using case statement to perform basic math operation as
follows
+ addition
- subtraction
x multiplication
/ division
The name of script must be 'q4' which works as follows
$ ./q4 20 / 3, Also check for sufficient command line arguments
Answer: See Q4 shell Script.
Q.5.Write Script to see current date, time, username, and current directory
Answer: See Q5 shell Script.
Q.6. Write a script to print a given number in reverse order; e.g. if the number is 123 it must
print 321.
Answer: See Q6 shell Script.
Q.7. Write a script to print the sum of all digits of a given number; e.g. if the number is 123 the sum
of its digits is 1+2+3 = 6.
Answer: See Q7 shell Script.
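The referenced answer scripts are not included here; hedged sketches for Q.6 and Q.7 using only POSIX shell arithmetic, with 123 as the worked example:

```shell
# Q.6: reverse the digits of a number by repeatedly peeling off n % 10.
n=123; rev=0
while [ "$n" -gt 0 ]; do
  rev=$(( rev * 10 + n % 10 ))
  n=$(( n / 10 ))
done
echo "reverse: $rev"      # reverse: 321

# Q.7: sum the digits the same way, accumulating instead of shifting.
m=123; sum=0
while [ "$m" -gt 0 ]; do
  sum=$(( sum + m % 10 ))
  m=$(( m / 10 ))
done
echo "digit sum: $sum"    # digit sum: 6
```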
Q.8.How to perform real number (number with decimal point) calculation in Linux
Answer: Use Linux's bc command
Q.10. How to perform real number calculation in a shell script and store the result in a
third variable, let's say a=5.66, b=8.67, c=a+b?
Answer: See Q10 shell Script.
Q.11.Write script to determine whether given file exist or not, file name is supplied
as command line argument, also check for sufficient number of command line
argument
Answer: See Q11 shell Script.
Q.12. Write a script to determine whether the given command line argument ($1) contains
the "*" symbol or not; if $1 does not contain "*", add it to $1, otherwise show the
message "Symbol is not required". For e.g. if we call this script Q12 then after
giving,
$ Q12 /bin
Here $1 is /bin, it should check whether "*" symbol is present or not if not it should
print Required i.e. /bin/*, and if symbol present then Symbol is not required must
be printed. Test your script as
$ Q12 /bin
$ Q12 /bin/*
Answer: See Q12 shell Script
Q.13. Write a script to print the contents of a file from a given line number through the next
given number of lines. For e.g. if we call this script Q13 and run
$ Q13 5 5 myf
it prints the contents of the 'myf' file from line number 5 through the next 5 lines.
Answer: See Q13 shell Script
Q.14. Write script to implement getopts statement, your script should understand
following command line argument called this script Q14,
Q14 -c -d -m -e
Where options work as
-c clear the screen
-d show list of files in current working directory
-m start mc (midnight commander shell) , if installed
-e { editor } start this { editor } if installed
Answer: See Q14 shell Script
Q.15. Write a script called sayHello and put it into your startup file called
.bash_profile. The script should run as soon as you log on to the system, and it
should print any one of the following messages in an infobox using the dialog utility, if
installed on your system. If the dialog utility is not installed, use an echo statement to print the message:
Good Morning
Good Afternoon
Good Evening , according to system time.
Answer: See Q15 shell Script
Q.16. How to write a script that prints the message "Hello World" in bold and blink
effect, and in different colors like red, brown, etc., using the echo command.
Answer: See Q16 shell Script
Q.17. Write script to implement background process that will continually print
current time in upper right corner of the screen , while user can do his/her normal
job at $ prompt.
Answer: See Q17 shell Script.
Q.18. Write shell script to implement menus using dialog utility. Menu-items and
action according to select menu-item is as follows:
Note: Create function for all action for e.g. To show date/time on screen create
function show_datetime().
Answer: See Q18 shell Script.
Q.20.Write shell script using for loop to print the following patterns on screen
Q.21.Write shell script to convert file names from UPPERCASE to lowercase file
names or vice versa.
Answer: See the rename.awk - awk script and up2sh shell script.
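The referenced rename.awk/up2sh scripts are not included here; the core of Q.21 can be sketched with tr (the file name is a made-up example):

```shell
# Map a file name to lowercase; a rename loop would mv "$f" "$lower".
f="README.TXT"
lower=$(echo "$f" | tr '[:upper:]' '[:lower:]')
echo "$lower"    # readme.txt
```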
TERADATA
Normalisation of DB:
It is a technique of organizing data in a database. It ensures:
- Eliminating redundant (useless) data
- Ensuring data dependencies make sense, i.e. data is logically stored
Without this it will be tough to handle and update the database without facing data loss.
1NF – A database is in first normal form if it satisfies the following conditions:
Student | Age
Adam    | 22
Eve     | 21

Student | Subject
Adam    | Physics
Adam    | Bio
Eve     | Chemistry
In the table above, [Book ID] determines [Genre ID], and [Genre ID] determines
[Genre Type]. Therefore [Book ID] determines [Genre Type] via [Genre ID],
so we have a transitive functional dependency, and this structure does not
satisfy third normal form.
To bring this table to third normal form, we split the table into two as
follows:
- 3NF does not deal satisfactorily with the case of a relation with
overlapping candidate keys
BCNF
BCNF is based on the concept of a determinant.
A determinant is any attribute (simple or composite) on which some other
attribute is fully functionally dependent.
A relation is in BCNF if, and only if, every determinant is a candidate key.
Facts, dimensions
- These tables contain the basic data used to conduct detailed analyses and
derive business value.
- Fact tables contain measurable attributes (how much, how many); the information
within a fact table is typically numeric data.
e.g.: Eid | Salary
Eg: sales happening in retail:
ProductID | CustomerID | Units Sold
cubes,
An OLAP cube is a method of storing data in a multidimensional form, generally for
reporting purposes.
- The cubes divide the data into subsets that are defined by dimensions.
- A cube provides an easy-to-use mechanism for querying data with quick
and uniform response times.
- Cubes are the main objects in online analytic processing (OLAP), a
technology that provides fast access to data in a data warehouse. A cube
is a set of data that is usually constructed from a subset of a data
warehouse and is organized and summarized into a multidimensional
structure defined by a set of dimensions and measures.
- A cube can be stored on a single analysis server and then defined as a linked
cube on other Analysis servers. End users connected to any of these analysis
servers can then access the cube. This arrangement avoids the more costly
alternative of storing and maintaining copies of a cube on multiple analysis
servers. linked cubes can be connected using TCP/IP or HTTP.
Virtual-cubes
These are combinations of one or more real cubes and require no disk space to store them.
They store only the definitions and not the data of the referenced source cubes. They are
similar to views in relational databases.
materialised views
- a materialized view is a database object that contains the results of a
query.
- A materialized view is a pre-computed table comprising aggregated
and/or joined data from fact and possibly dimension tables.
- Builders of data warehouses will know a materialized view as a summary
or aggregation.
- Unlike an ordinary view which is only a stored select statement that runs
if we use the view, a materialized view stores the result set of the select
statement as a container table.
First of all, some definitions are in order. In a star schema, dimensions that reflect a
hierarchy are flattened into a single table. For example, a star schema Geography
dimension would have columns like country, state/province, city, and postal
code. In the source system, this hierarchy would probably be normalized across
multiple tables with one-to-many relationships.
A snowflake schema does not flatten a hierarchy dimension into a single table. It
would, instead, have two or more tables with a one-to-many relationship. This is a
more normalized structure. For example, one table may have state/province and
country columns and a second table would have city and postal code. The table with
city and postal code would have a many-to-one relationship to the table with the
state/province columns.
There are some good reasons for snowflake dimension tables. One example is a
company that has many types of products: some products have a few attributes,
others have many, and the products are very different from each other. The
thing to do here is to create a core Product dimension that has common attributes
for all the products such as product type, manufacturer, brand, product group, etc.
Create a separate sub-dimension table for each distinct group of products where
each group shares common attributes. The sub-product tables must contain a
foreign key of the core Product dimension table.
One of the criticisms of using snowflake dimensions is that it is difficult for some of
the multidimensional front-end presentation tools to generate a query on a
snowflake dimension. However, you can create a view for each combination of the
core product/sub-product dimension tables and give each view a suitably descriptive
name (Frozen Food Product, Hardware Product, etc.), and then these tools will have
no problem.
Top-down: If you use a top-down approach, you will have to analyze global
business needs, plan how to develop a data warehouse, design it, and implement
it as a whole.
In the top-down approach the complete data warehouse is created first and then the datamarts are
derived from the data warehouse.
Disadvantages: high cost estimates;
analyzing and bringing together all relevant sources is a very difficult task;
and since no prototype is delivered in the short term, users cannot check that
the project is useful.
Bottom-up: In the bottom-up approach the datamarts are created first. The datamarts are
then integrated and a comprehensive data warehouse is created. As the datamarts
are created first, business questions can be answered quickly.
SCD type 2
lookup_count() and lookup_next() are the functions used to set the index to the first record in a
group of records (if your lookup has duplicates for a key) and then walk through those
records, using lookup_next() to pick the right one.
Use lookup_count for finding the duplicates and lookup_next for retrieving them:
if (lookup_count(string file_label, [ expression [, expression ...] ]) > 0)
    lookup_next(lookup_identifier_type lookup_id, string lookup_template)
Related functions: lookup_count, lookup_count_local, lookup_local, lookup_next.
You can use lookup_match to quickly determine whether or not a particular key value is
contained in a lookup file. It is faster than lookup and lookup_count, and you should use
it when:
if (is_defined(lookup(FILE, key)))
statement1
else
statement2
if (lookup_match(FILE, key))
statement1
else
statement2
Each record in the Lookup File represents an interval, that is, a range of values.
The lower and upper bounds of the range are usually given in two separate fields
of the same type.
A key field marked interval_bottom holds the lower endpoint of the interval.
A key field marked interval_top holds the upper endpoint.
If a field in the Lookup File's key specifier is marked as interval, it must be the
only key field for that Lookup File. You cannot specify a multipart key as an
interval lookup.
The file must contain well-formed intervals:
o For each record, the value of the lower endpoint of the interval must be
less than or equal to the value of the upper endpoint.
o The intervals must not overlap.
o The intervals must be sorted into ascending order.
To use a lookup file for interval lookups, the modifier for the key field in the
lookup file must be interval, interval_bottom, or interval_top.
For example, in a Lookup File that is an interval lookup, the following DML
function returns the record, if any, for which arg is between the lower and upper
endpoints.
lookup("Lookup_File_name",arg)
By default, the interval endpoints are inclusive, but you can add the modifier
exclusive to specify otherwise. For example, suppose Lookup File
insurance_coverage has the following key specifier:
"{coverage_start interval_bottom exclusive; coverage_end interval_top}"
This identifies the fields coverage_start and coverage_end as the endpoints of
the interval.
The following DML function returns record R if R.coverage_start is less than
arg and less than or equal to R.coverage_end.
lookup("insurance_coverage",arg)
The most reliable way to FTP a file without advance knowledge of its DML is
to give it a DML of:
void(1)
It then becomes a byte stream that Ab Initio does not try to interpret.
XML Split has more control over the DML being generated.
Go for XML SPLIT if you have XSDs of type parent - child - sub-child, etc.;
the component will do everything for you, and all you need to do
is specify the base elements.
To find the size of an MFS file: m_ls will display the number of partitions in a multifile
system. If you want to know how much disk space is used by a multifile, you can use the
following two commands:
m_du: gives you the disk usage of a multifile, clearly showing how
much space is used for each partition on disk.
m_df: reports the free space in each partition of the multifile system.
The Compute Checksum component will give you the total size and the number
of records, but it is a time-consuming process and you need to build a graph for
it.
The file with extension .mfctl contains the URLs of all the data partitions; the file with the
extension .mdir contains the URL of the control file used by MFS.