Optimizing Large Database Imports: Logical
IBM Informix data movement utilities provide an easy way to move databases from one
machine to another. Large databases, especially those with a large number of associated
stored procedures and triggers, can pose special challenges. This article provides a sample
script that you can modify to make the job easier and faster.
Introduction
Dbexport and dbimport are IBM Informix® data movement utilities that move databases from one
machine to another. The whole process is simple and straightforward. Dbexport creates a special
directory into which it writes a schema file that records the current database structure, and then it
unloads the data into ASCII files, one for each database object. You can keep those schema
and data files on disk or on tape and then easily transfer them to another machine using UNIX® tar
or other utilities. At the same time, dbexport creates a message file, named dbexport.out, in the
user's current directory to record any errors and warnings that occur during the dbexport operation.
The dbimport utility uses the schema and data files created by dbexport and creates exactly
the same database on other machines. Like dbexport, dbimport also creates a message file,
dbimport.out, in the user's current directory to log errors and warnings during the dbimport
operation.
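For reference, a basic export/import round trip looks roughly like the following. This is a minimal
sketch: the database name stores_demo, the export path, and the dbspace name datadbs1 are
illustrative only.
# Export the database to disk; this creates /usr/gsm/db/stores_demo.exp
# containing the schema file and one ASCII data file per table:
dbexport stores_demo -o /usr/gsm/db
# Pack the export directory and move it to the target machine:
tar cvf stores_demo.tar /usr/gsm/db/stores_demo.exp
# On the target machine, after extracting the archive, re-create the
# database in the dbspace datadbs1 with logging enabled (-l):
dbimport stores_demo -i /usr/gsm/db -d datadbs1 -l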
You use these utilities for logical backups. By logical backup, I mean the backup of the database
structure or schema, and logical database objects such as tables, indexes, views, triggers and
stored procedures. A logical backup is quite different from physical backups produced by such
utilities as ontape, on-archive, or on-bar, which back up the entire Informix instance and raw disk
space.
One big advantage of a logical backup is its flexibility. The schema file created by dbexport is an
ASCII file and can be edited at any time before importing the database. There are quite a few areas
you can modify (a sample edit follows this list):
• Extent size for tables. You can enlarge a table's first and next extent sizes so that the
table ends up with fewer but larger extents. This can improve your overall performance.
• Physical location for a table or an index. Based on your studies of your operating system's
disk I/O, you may want to move some tables and indexes to another location so as to reduce
I/O load.
• Names for indexes and constraints. You may want to replace automatically generated index
and constraint names with more meaningful names.
• Structures of database objects. You may apply a different fragmentation strategy for tables
and indexes by editing the schema file, so that when the database gets imported, tables and
indexes will be fragmented across different disks to achieve better performance.
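For example, a few lines of shell can apply such edits to the exported schema before dbimport
runs. The following is only a hypothetical sketch: the schema path, the extent sizes, and the
dbspace names being rewritten are assumptions, not values taken from any real schema.
SCHEMA=/usr/gsm/db/big_db.exp/big_db.sql
cp $SCHEMA $SCHEMA.orig
# Enlarge one table's extents and move it to another dbspace by rewriting
# the corresponding clauses of its CREATE TABLE statement:
sed -e 's/extent size 16 next size 16/extent size 102400 next size 51200/' \
    -e 's/in datadbs1;/in datadbs2;/' \
    $SCHEMA.orig > $SCHEMA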
In addition, you can change the name of the exported database so that you can keep more than
one copy of the same database on the same Informix instance. This is especially useful for
developers and testers who like to keep an original database for comparison.
Another advantage of logical backup is its finer granularity. When you restore, you don't need to
shut down the whole Informix instance; instead, you can restore at the database level. In this way,
you avoid interfering with other people who are working with other databases.
Because of those advantages, we use the dbexport and dbimport utilities frequently in
development, unit testing, box testing, and integration testing.
Because IBM Informix Stored Procedure Language (SPL) is a handy mid-tier API, many
developers and programmers like to use it to implement business rules and as a link between
front-end applications and the database server, so we expect the number of stored procedures
and triggers to grow from release to release. We've had no problem exporting and importing
small databases - databases of a few hundred MB - but when we import large databases -
databases with large amounts of data and large numbers of triggers and stored procedures -
we run into problems with long transactions and long compilation times.
Long transactions
IBM Informix treats the whole process of importing a database as a single transaction. When
there is a lot of data and many triggers and stored procedures, it is easy to get into a "long
transaction" state. A long transaction is defined by Informix as a transaction that starts in one
logical log file and is not committed when the database server needs to reuse that same logical
log file. When a long transaction hits the long transaction high-water mark, it starts to roll back;
when it hits the exclusive access high-water mark, the Informix server is blocked while the long
transaction is rolled back. In other words, other database operations on the server such as
SELECT, INSERT, DELETE or UPDATE are held until the long transaction is rolled back. The
common way to avoid this is either to turn off database logging or to add logical-log space so
that larger transactions can go through.
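Both remedies can be applied from the command line before a big import. The following is a
minimal sketch run as user informix; the dbspace name logdbs, the log size, and the database
name big_db are illustrative.
# Option 1: add logical-log space so a larger transaction fits
onparams -a -d logdbs -s 100000   # add a 100 MB (100000 KB) logical log in dbspace logdbs
onmode -l                         # switch to the next logical log
ontape -a                         # back up the logical logs so they can be reused
# Option 2: turn off transaction logging for the target database
ondblog nolog big_db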
On the secondary system, there are only three small databases: small_db1, small_db2 and
small_db3.
As you can see from the flowchart, the heart of this script is two functions: export_db and
import_db. The basic logic is divide and conquer. In export_db, we take advantage of the fact
that the dbexport/dbimport utilities let us edit the exported schema. After we export big_db, we
split its exported schema into three parts, as follows:
• For tables, we use the dbimport utility to create the database and then import tables with data
only. This part goes very fast, because there are no indexes and constraints for the tables.
• The second part, the stored procedures, used to be the bottleneck, because the dbimport
utility compiles stored procedures line by line. Because the database has already been created
by dbimport, we can now pipe this part of the schema straight into dbaccess. This improves the
speed significantly, because the pipe lets dbaccess compile the stored procedures in batch.
• We use the same method for the third part to create all views, indexes, constraints and
triggers.
I'll explore this in more detail later.
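Before walking through the script, here is a rough sketch of what the split amounts to. The grep
patterns used to find where the stored procedures and the remaining objects begin are illustrative
only; the real script locates its own separator markers in the exported schema.
SQL=/usr/gsm/db/big_db.exp/big_db.sql
# Find the first line of the stored procedures and of the remaining objects
proc_line=$(grep -n 'create procedure' $SQL | head -1 | cut -d: -f1)
rest_line=$(grep -n 'create view' $SQL | head -1 | cut -d: -f1)
# Cut the schema into the three pieces described above
head -$(( proc_line - 1 )) $SQL                  > table.sql
sed -n "${proc_line},$(( rest_line - 1 ))p" $SQL > proc.sql
sed -n "${rest_line},\$p" $SQL                   > rest.sql
The actual script starts with its environment setup: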
. /usr/informix/.profile
# Informix utilities and locations used throughout the script
DBACCESS=${INFORMIXDIR}/bin/dbaccess
DBEXPORT=${INFORMIXDIR}/bin/dbexport
DBIMPORT=${INFORMIXDIR}/bin/dbimport
EXPORT_PATH="/usr/gsm/db"
STAT_DB=stat_db
WEN_DB=wen_db
CONFIG_BIN=/SYS_SETUP/SYSCONFIG
# Per-run log file ($$ is the process ID of the running script)
LOGFILE="/usr/gsm/logs/cutover/db_exim.$$"
Listing 1. stop_start_informix and rename_db
#######################
# stop_start_informix #
#######################
stop_start_informix() {
    echo ""
    echo "Stopping/Starting Informix."
    echo "This may take up to 200 seconds. Please Wait..."
    /usr/bin/su - informix -c "onmode -ky" >/dev/null 2>&1
    # Wait for 50 seconds, then bring the Informix engine back up
    sleep 50
    /usr/bin/su - informix -c "oninit; exit 0" >/dev/null 2>&1
    rc=1
    while [ $rc -ne 5 ]
    do
        echo "Informix oninit return code is $rc"
        echo "Waiting for 50 seconds..."
        sleep 50
        echo "Checking if the Informix Engine is Online..."
        /usr/bin/su - informix -c "onstat -" >/dev/null 2>&1
        rc=$?
        #
        # onstat return codes:
        # 0 = Initialization
        # 1 = Quiescent
        # 2 = Fast Recovery
        # 3 = Archive Backup
        # 4 = Shutting Down
        # 5 = Online
        # 6 = System Aborting
        #
    done
    echo "The Informix Engine is Online."
    echo ""
}
#############
# rename_db #
#############
rename_db() {
    rc=1
    while [ $rc -ne 0 ]
    do
        echo "Trying to rename the database $1 to $2"
        echo "rename database $1 to $2" | $DBACCESS
        rc=$?
        if [ $rc -ne 0 ]
        then
            echo "WARNING: Failed to rename the database $1 to $2."
            echo "Retrying. Please wait..."
            #
            # bounce the Informix engine to kill any remaining connections
            #
            stop_start_informix
        fi
    done
    #
    # bounce the Informix engine
    #
    stop_start_informix
}
The purpose of the two functions in Listing 1 is to ensure that all database connections are killed
before exporting the database. When you export databases, you need exclusive access to the
database; otherwise you will get an error from Informix and won't be able to export. The logic here
is to bounce Informix to get rid of all database connections and then when the Informix engine
comes back, immediately rename the database so that other application programs cannot access
the database we are going to export. Then we can export the database without any interference.
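In outline, the export side of the script boils down to a few lines like these. The locked database
name big_db_locked is illustrative, and the real export_db function (shown in part in Listing 3)
wraps logging and error handling around the same steps.
rename_db big_db big_db_locked               # bounce Informix, then rename so nothing else can connect
$DBEXPORT big_db_locked -o $EXPORT_PATH -q   # export quietly; errors and warnings go to dbexport.out
rename_db big_db_locked big_db               # give the database back to other programs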
Listing 2 shows the most important function of the script, the split_schema function.
Listing 2. split_schema
TABLE_SQL=table.sql
PROC_SQL=proc.sql
REST_SQL=rest.sql
# Keep a copy of the original exported schema before splitting it
cp ${EXPORTED_SQL} ${EXPORTED_SQL}.orig
if [ "$?" -ne 0 ]
then
    echo "Error creating files. Please check the directory permissions."
    exit 1
fi
# position0, position1 and position2 are the locations of the three
# section separators found in the exported schema file
if [ $position0 -lt 0 ]
then
    echo "First separator $file_seperator0 is not found in file."
    exit 1
elif [ $position1 -lt 0 ]
then
    echo "Second separator $file_seperator1 is not found in file."
    exit 1
elif [ $position2 -lt 0 ]
then
    echo "Third separator $file_seperator2 is not found in file."
    exit 1
fi
mv ${EXPORTED_DIR}/${TABLE_SQL} ${EXPORTED_SQL}
This is the heart of the script, and the logic is explained in detail in The script's logic. This function
is called right after big_db gets exported in export_db function, as shown in Listing 3.
Listing 3. export_db
date
DB=$1
lockedDB=$2
EXPORTED_SQL="$EXPORT_PATH/$lockedDB.exp/$lockedDB.sql"
EXPORTED_DIR="$EXPORT_PATH/$lockedDB.exp"
if [ -d "$EXPORT_PATH/$lockedDB.exp" ]; then
    if [ "$DB" = "small_db2" ]
    then
        su - informix -c "echo 'grant dba to root' | $DBACCESS $DB; exit 0"
    fi
    if [ "$DB" = "big_db" ]
    then
        split_schema
    fi
As mentioned before, export_db is one of the major functions of the script, so let's explore it a
little more.
First, the function calls rename_db. The purpose is to obtain exclusive access to the database
being exported, so that we can export it without any interference from other programs. Then the
function exports the database. Note that it uses quiet mode when exporting. Because quiet mode
does not display errors and warnings on the screen during the export, it is very important to check
dbexport.out after the database has been exported; otherwise, you may miss errors and warnings.
Right after the export, the function calls rename_db again to rename the database back so that
other programs can access it. The function then checks the database-name argument to see
whether it is big_db. If it is, the function splits the exported schema into three parts; for the small
databases, we don't split the exported schemas.
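Because quiet mode hides problems, export_db could be extended with a check like the following
after each export. This is only a suggestion sketched here, not part of the downloadable script; it
assumes dbexport.out is written to the current directory.
if grep -iE 'error|warning' dbexport.out >/dev/null 2>&1
then
    echo "dbexport reported problems - see dbexport.out" | tee -a $LOGFILE
    exit 1
fi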
Listing 4. Turning logging off for the import
infxcon="/usr/informix/etc/onconfig"
# Save the original onconfig file before changing it for the import
cp ${infxcon} ${infxcon}.ori
# Restore the saved onconfig file
cp ${infxcon}.ori ${infxcon}
The purpose of the function in Listing 4 is to turn off logging while importing the large database,
so that the dbimport utility uses a minimum number of logical logs, preventing the import
operation from getting into a "long transaction" state and rolling back. We then turn logging back
on after the import.
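Turning logging back on afterward can also be done from the command line. The following is an
assumption about what that might look like, not the script's own turnonlogging function: ondblog
switches big_db back to unbuffered logging, and Informix requires a level-0 archive before the
new logging mode takes effect.
ondblog unbuf big_db   # restore unbuffered logging for big_db
ontape -s -L 0         # take the level-0 archive that activates the change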
Listing 5. import_db
#############
# import_db #
#############
import_db() {
    date
    DB=$1
    lockedDB=$2
    DBSPACE=$3
    EXPORTED_SQL="$EXPORT_PATH/$lockedDB.exp/$lockedDB.sql"
    EXPORTED_DIR="$EXPORT_PATH/$lockedDB.exp"
    if [ "$DB" = "big_db" ]
    then
        # Create the database and load the table data only (no -l, so no logging)
        $DBIMPORT -q $lockedDB -d $DBSPACE -i $EXPORT_PATH >$TEMP3 2>&1
        # Batch-compile the stored procedures, then the views, indexes,
        # constraints and triggers, by piping the split schema files into dbaccess
        /usr/bin/su - informix -c "cat ${EXPORTED_DIR}/proc.sql | $DBACCESS $DB; exit 0"
        /usr/bin/su - informix -c "cat ${EXPORTED_DIR}/rest.sql | $DBACCESS $DB; exit 0"
        turnonlogging $DB
    else
        # Small databases are imported in one step with logging enabled (-l)
        $DBIMPORT -q $lockedDB -d $DBSPACE -l -i $EXPORT_PATH >$TEMP3 2>&1
    fi
}
Import_db (Listing 5) is another major function of the script. It first checks the database-name
argument passed in. If it is big_db, it uses the dbimport utility to process the tables part of the
exported schema and then pipes the stored procedures, views, indexes, constraints and triggers
through dbaccess. The purpose is to speed up the whole process; as I mentioned before, doing it
this way reduced the time needed to import big_db by at least 50-60%. For big_db, we also turn
off logging before the import and turn it back on afterward to avoid a "long transaction." We only
do this for big_db; for the other databases, we believe we have plenty of logical logs and will
never run into a "long transaction."
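Putting the pieces together, a call sequence driven by the script's menu might look like this. The
locked name big_db_locked and the dbspace datadbs1 are hypothetical; the real menu supplies its
own values.
export_db big_db big_db_locked            # rename, export, split the schema, rename back
import_db big_db big_db_locked datadbs1   # import tables, then batch-compile procedures and the rest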
The rest of the script creates a user-friendly, easy-to-use menu, which I don't describe in this
article, but you can see it for yourself in the downloadable script.
Conclusion
As you have seen, the heart of the script is the logic that splits an exported schema into three
parts and processes each part differently, using the Informix dbimport utility for the tables and
UNIX pipes into dbaccess for the rest. For us, this method is more efficient than simply using the
Informix dbimport utility to import large databases: it greatly speeds up the whole database import
process and makes better use of Informix and system resources such as logical logs and memory.
Because the script is composed of many functions, it is very flexible and can easily be modified to
meet your specific requirements. It should give you some practical, useful ideas about how to
import large databases more efficiently.
Downloadable resources
Name: new.sh
Size: 14KB