
Database Question


Q. 1. What do you mean by database?

   Ans. A database is a collection of occurrences of multiple record types, together with the relationships between
records, data aggregates and data items. A database may also be defined as a collection of interrelated data stored
together, without harmful or unnecessary redundancy (duplicate data), to serve multiple applications.
      The data are stored so that they are independent of the programs that use them. A common, controlled
approach is used for adding new data and for modifying, retrieving or deleting existing data within the
database. Databases are used in corporations, factories, government departments and other organizations, and
they serve to answer queries on the stored data. A database may be designed for batch processing, real-time
processing or online processing.
DATABASE SYSTEM
      A database system is an integrated collection of related files along with the details about their definition,
interpretation, manipulation and maintenance. It is a system that satisfies the data needs of various
applications in an organization without unnecessary redundancy. A database system is based on the data, and it
is run by using software called a DBMS (Database Management System). A database system also protects the
data from unauthorized access.
Foundation Data Concept
A hierarchy of several levels of data has been devised that differentiates between different groupings, or
elements, of data. Data are logically organized into:
 Character
It is the most basic logical data element. It consists of a single alphabetic, numeric, or other symbol.
 Field
      It consists of a grouping of characters. A data field represents an attribute (a characteristic or quality) of
some entity (object, person, place, or event).
 Record
      The related fields of data are grouped to form a record. Thus, a record represents a collection of attributes
that describe an entity. Fixed-length records contain a fixed number of fixed-length data fields. Variable-
length records contain a variable number of fields and field lengths.
 File
      A group of related records is known as a data file, or table. Files are frequently classified by the application
for which they are primarily used, such as a payroll file or an inventory file, or by the type of data they contain,
such as a document file or a graphical image file. Files are also classified by their permanence, for example, a
master file versus a transaction file. A transaction file would contain records of
all transactions occurring during a period, whereas a master file contains all the permanent records. A history
file is an obsolete transaction or master file retained for backup purposes or for long-term historical storage
called archival storage.
 Database
  It is an integrated collection of logically related records or objects. A database consolidates records
previously stored in separate files into a common pool of data records that provides data for many applications.
The data stored in a database is independent of the application programs using it and of the type of secondary
storage devices on which it is stored.

      Q. 2. What are the various characteristics of DBMS?


      Ans. The major characteristics of database approach are:
      • Self-describing Nature of a Database System
      • Insulation between Programs and Data, and Data Abstraction

      • Support of Multiple Views of the Data
      • Sharing of Data and Multi user Transaction Processing
      Q. 3. What are the various characteristics of DBMS approach?
Ans.
1. Self-contained nature
      A DBMS contains the data plus a full description of the data, called "metadata". Metadata is data about
data: data formats, record structures, locations, how to access the data, indexes, and so on. The metadata is
stored in a catalog and is used by the DBMS software to know how to access the data. Contrast this with the
file processing approach, where application programs need to know the structure and format of records and data.
2. Program-data independence
      Data independence is the immunity of application programs to changes in storage structures and access
techniques, e.g. adding a new field, changing an index structure or changing a data format. In a DBMS
environment these changes are reflected in the catalog, and applications are not affected. Traditional
file-processing programs would all have to change, possibly substantially.
3. Data abstraction
      A DBMS provides users with a conceptual representation of data (for example, as objects with properties
and inter-relationships). Storage details are hidden. Conceptual representation is provided in terms of a data
model.
4. Support for multiple views
      A DBMS may allow different users to see different "views" of the database, according to the perspective each one
requires. One case is a subset of the data: for example, the people using the payroll system need not, and should
not, see data about students and class schedules. Another case is data presented in a different form from the way
it is stored: for example, someone interested in student transcripts might get a view formed by combining
information from separate files or tables. (A small SQL sketch of such a view is given below.)
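      As a rough sketch of that second case (all table, column and view names here are assumed for illustration and do not come from any particular system), a transcript view can be built by combining two tables:

      -- Assumed base tables
      CREATE TABLE student (rollno INT PRIMARY KEY, sname VARCHAR(30));
      CREATE TABLE result  (rollno INT, course_no INT, grade CHAR(2));

      -- A view that presents data in a different form from the way it is stored,
      -- combining information from the two tables
      CREATE VIEW transcript AS
          SELECT s.rollno, s.sname, r.course_no, r.grade
          FROM student s JOIN result r ON s.rollno = r.rollno;

      -- Users of the view query it like an ordinary table
      SELECT * FROM transcript WHERE rollno = 101;

      In the same way, payroll users could be given a view containing only the columns they are allowed to see.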

5. Centralized control of the data resource


      The DBMS provides centralized control of data in an organization.
      This brings a number of advantages:

      (a) reduces redundancy
      (b) avoids inconsistencies
      (c) data can be shared
      (d) standards can be enforced
      (e) security restrictions can be applied
      (f) integrity can be maintained
a, b. Redundancy and Inconsistencies
      Redundancy is unnecessary duplication of data, for example if the accounts department and the registration
department both keep a student's name, number and address.
      Redundancy wastes space and duplicates effort in maintaining the data.
      Redundancy also leads to inconsistency.
      Inconsistent data is data which contradicts itself - e.g. two different addresses for a given student number.
Inconsistency cannot occur if data is represented by a single entry (i.e. if there is no redundancy).
      Controlled redundancy: Some redundancy may be desirable (for efficiency). A DBMS should be aware of
it, and take care of propagating updates to all copies of a data item.
      This is an objective, not yet currently supported.
c. Sharing
      • Need concurrency control
      • Multiple user views
d. Standards
      E.g. data formats, record structures, naming, documentation
      International,   organizational, departmental ... standards
e. Security
      - restricting unauthorized access
      DBMS should perform security checks on all accesses.
f. Integrity
      Maintaining validity of data;
      e.g. employee numbers must be in some range
      e.g. every course must have an instructor
      e.g. student numbers must be unique
      e.g. hours worked cannot be more than 150
      These requirements are expressed as constraints.
      The DBMS should perform integrity checks on all updates. Currently, DBMSs provide only limited integrity checks. (A brief sketch of such constraints in SQL is given below.)
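      As an independent, minimal sketch of how such constraints can be declared in SQL (the table and column names are assumed for illustration only):

      CREATE TABLE course (
          course_no  INT PRIMARY KEY,
          instructor VARCHAR(30) NOT NULL              -- every course must have an instructor
      );

      CREATE TABLE employee (
          emp_no       INT CHECK (emp_no BETWEEN 1000 AND 9999),   -- employee numbers in some range
          hours_worked INT CHECK (hours_worked <= 150)             -- cannot be more than 150
      );

      CREATE TABLE student (
          student_no INT UNIQUE                        -- student number must be unique
      );

      -- An update that violates a constraint is rejected by the DBMS, e.g.
      -- INSERT INTO employee (emp_no, hours_worked) VALUES (1001, 200);  -- fails the CHECK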

   Q. 3. What are the various types of databases?

      Ans. Types of Databases


      Continuing developments in information technology and its business applications have resulted in the
evolution of several major types of databases. Several major conceptual categories of databases that may be
found in computer-using organizations include:
Operational Databases
      These databases store detailed data needed to support the operations of the entire organization. They are also
called subject area databases (SADB), transaction databases, and production databases. Examples are customer
databases, personnel databases, inventory databases, and other databases containing data generated by business
operations.
Distributed Databases

      Many organizations replicate and distribute copies or parts of databases to network servers at a variety of
sites. These distributed databases can reside on network servers on the World Wide Web, on corporate intranets
or extranets, or on other company networks. Distributed databases may be copies of operational or analytical
databases, hypermedia or discussion databases, or any other type of database. Replication and distribution of
databases is done to improve database performance and security.
External Databases
      Access to external, privately owned online databases or data banks is available for a fee to end users and
organizations from commercial online services, and with or without charge from many sources on the Internet,
especially the Web.
Hypermedia Databases
      A hypermedia database consists of hyperlinked pages of multimedia (text, graphics, photographic images, video
clips, audio segments, etc.). From a database management point of view, the set of interconnected multimedia
pages at a website is a database of interrelated hypermedia page elements, rather than interrelated data records.

 Q. 4. What do you mean by DBMS?

      Ans. A DBMS is best described as a collection of programs that manage the database structure and control
shared access to the data in the database. Current DBMSs also store the relationships between the database
components, and they take care of defining the required access paths to those components.
      A database management system (DBMS) is the combination of data, hardware, software and users that helps
an enterprise manage its operational data.
      The main function of a DBMS is to provide efficient and reliable methods of data retrieval to many users.
Efficient data retrieval is an essential function of database systems. A DBMS must be able to deal with several
users who try to access several data items simultaneously, and quite often the same data item. A DBMS is a set
of programs used to store and manipulate data, including the following operations:
      • Adding new data, for example adding details of new student.
      • Deleting unwanted data, for example deleting the details of students who have completed the course.
      • Changing existing data, for example modifying the fee paid by the student.
      A database is the information to be stored, whereas the database management system is the software used to
manage the database. The database structure may be regarded in terms of its hardware implementation, called the
physical structure, or independently of its hardware implementation, called the logical structure. In either case
the data structure is regarded as static, because a database by itself cannot process anything. The DBMS is
regarded as dynamic because it is through the DBMS that all database processing takes place. How the DBMS
presents data to the user is called the view structure.
      There are two general modes of data use: queries and transactions. Both forms use the DBMS for processing.
A query is processed for presentation in views, and none of this processing is written back to the database. A
transaction is processed to update values of database variables, and these updates are written to the database
(a brief SQL illustration follows). A DBMS provides various functions such as data security, data integrity, data
sharing, data concurrency, data independence, data recovery, etc. However, the database management systems now
available in the market, such as Sybase, Oracle and MS-Access, do not all provide the same set of functions, though
all are meant for data management.
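      A brief illustration of the two modes, assuming a student table with rollno, sname and fee_paid columns (the names are illustrative only):

      -- A query: data is read and presented in a view; nothing is written to the database
      SELECT sname, fee_paid FROM student WHERE rollno = 101;

      -- A transaction: a stored value is updated and the change is written to the database
      UPDATE student SET fee_paid = fee_paid + 500 WHERE rollno = 101;
      COMMIT;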
Q. 5. What are the various components of DBMS?
      Ans. Basic Components: A database system has four components. These four
components are important for understanding and designing the database system. These are:
      1. Data
      2. Hardware
      3. Software
      4. Users
1. Data
      As discussed above, data is raw information collected by us. Data is made up of data items or data
aggregates. A data item is the smallest unit of named data; it may consist of bits or bytes. A data item is
often referred to as a field or data element. A data aggregate is a collection of data items within a record
which is given a name and referred to as a whole. Data can be collected orally or in written form. A database
should be integrated and shared. The data stored in a system may be partitioned into more than one database, so
that if data is lost or damaged at one place it can be accessed from another place by using the sharing facility
of the database system; shared data can also be reused according to the users' requirements. Data must also be
in integrated form. Integration means the data should be in a unique form, i.e. collected in a well-defined
manner with no redundancy. For example, roll numbers in a class are non-redundant and identify students
uniquely, but names in a class may be redundant and can create a lot of problems later on in using and
accessing the data.
2. Hardware
      Hardware is also a major and primary part of the database system. Without hardware nothing can be done.
Hardware is that "which we can touch and see", i.e. it has physical existence. All physical items fall into this
category, for example the input/output and storage devices commonly used with a computer system: keyboard,
mouse, scanner, monitor, and storage devices (hard disk, floppy disk, magnetic disk and magnetic drum).
3. Software
      Software is another major part of the database system. It is the other side of the coin from hardware;
hardware and software go side by side. Software is further subdivided into two categories: system software
(operating systems, languages, system packages, etc.) and application software (payroll, electricity billing,
hospital management, hostel administration, etc.). We can define software as that which we cannot touch and
see; it can only be executed. By using software, data can be manipulated, organized and stored.
4. Users
      Without users, all of the above components (data, hardware and software) are meaningless. Users collect the
data and operate and handle the hardware; the operator feeds the data and arranges it in order by executing the
software.
Other components
      1. People - database administrator, system developers, end users.
      2. CASE tools - Computer-Aided Software Engineering (CASE) tools.
      3. User interface - e.g. Microsoft Access, PowerBuilder.
      4. Application programs - PowerBuilder script language, Visual Basic, C++, COBOL.
      5. Repository - stores definitions of data (metadata), screen and report formats, menu definitions, etc.
      6. Database - stores the actual data occurrences.
      7. DBMS - provides the tools to manage all of this: create and maintain data, control security of access to
the data and to the repository, etc.
Q. 6. What are the various functions of DBMS?
 Ans. These functions will include support for at least all of the following:

      • Data definition: The DBMS must be able to accept data definitions (external schemas, the conceptual
schema, the internal schema, and all associated mappings) in source form and convert them to the appropriate
object form.
      • Data manipulation: The DBMS must be able to handle requests from users to retrieve, update, or delete
existing data in the database, or to add new data to the database. In other words, the DBMS must include a
data manipulation language (DML) processor component.
      • Data security and integrity: The DBMS must monitor user requests and reject
any attempt to violate the security and integrity rules defined by the DBA.
      • Data recovery and concurrency: The DBMS - or else some other related software component, usually
called the transaction manager - must enforce certain recovery and concurrency controls.
      • Data dictionary: The DBMS must provide a data dictionary function. The data dictionary can be regarded
as a database in its own right (but a system database, rather than a user database). The dictionary contains
"data about the data" (sometimes called metadata) - that is, definitions of other objects in the system - rather
than just raw data. In particular, all the various schemas and mappings (external, conceptual, etc.) will
physically be stored, in both source and object form, in the dictionary. A comprehensive dictionary will also
include cross-reference information showing, for instance, which programs use which pieces of the database, which
users require which reports, which terminals are connected to the system, and so on. The dictionary might even -
in fact, probably should - be integrated into the database it defines, and thus include its own definition. It
should certainly be possible to query the dictionary just like any other database, so that, for example, it is
possible to tell which programs and/or users are likely to be affected by some proposed change to the system.
(A small catalog query is sketched at the end of this answer.)
      • Performance: It goes without saying that the DBMS should perform all of the functions identified above as
efficiently as possible.
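      As a sketch of what querying the dictionary can look like: the catalog tables differ between products, but many systems expose the SQL-standard INFORMATION_SCHEMA views. The table name 'emp' below is assumed for illustration.

      -- List the columns of a table as recorded in the system catalog
      SELECT column_name, data_type
      FROM information_schema.columns
      WHERE table_name = 'emp';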
      Q7. What are the advantages and disadvantages of a database approach?
 
      Ans. ADVANTAGES OF DBMS
      One of the major advantages of using a database system is that the organization's data can be handled easily,
with centralized management and control over the data by the DBA. The main advantages of a database management
system are given below:
1. Controlling Redundancy
      In a DBMS there is little or no redundancy (duplicate data). If duplicate data arises, the DBA can control it
and arrange the data in a non-redundant way. Data is stored on the basis of a primary key, which is always a
unique key and carries non-redundant information. For example, the roll number is the primary key used to store
student data.
In traditional file processing, every user group maintains its own files. Each group independently keeps files on
the entities it deals with, e.g. students. Therefore, much of the data is stored twice or more. Redundancy leads
to several problems:
      • Duplication of effort
      • Storage space wasted when the same data is stored repeatedly
      • Files that represent the same data may become inconsistent (since the updates are applied independently by
each user group).
Instead, we can use controlled redundancy. (A small primary-key sketch is given below.)
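      A minimal sketch of such key-based storage (the table and column names are assumed for illustration):

      -- Roll number is the primary key, so each student is stored exactly once
      CREATE TABLE student (
          rollno  INT PRIMARY KEY,
          sname   VARCHAR(30),
          address VARCHAR(60)
      );

      INSERT INTO student (rollno, sname, address) VALUES (101, 'Japneet', 'Hostel A');

      -- A second row with the same roll number is rejected, so duplicates cannot creep in:
      -- INSERT INTO student (rollno, sname, address) VALUES (101, 'Japneet', 'Hostel B');  -- fails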
2. Restricting Unauthorized Access
      A DBMS should provide a security and authorization subsystem; a sketch of typical SQL privilege commands follows this list.
      • Some db users will not be authorized to access all information in the db (e.g., financial data).
      • Some users are allowed only to retrieve data.

      • Some users are allowed both to retrieve and to update the database.
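      In SQL-based systems such restrictions are commonly expressed with GRANT and REVOKE statements; the user names and the student table below are assumptions for illustration:

      -- A clerk may only read the data
      GRANT SELECT ON student TO clerk1;

      -- The registrar may both read and change it
      GRANT SELECT, INSERT, UPDATE ON student TO registrar;

      -- A previously granted privilege can be withdrawn
      REVOKE UPDATE ON student FROM registrar;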
3. Providing Persistent Storage for Program Objects and Data Structures
      The data structures provided by the DBMS must be compatible with the programming language's data structures.
For example, object-oriented DBMSs are compatible with programming languages such as C++ and Smalltalk, and
the DBMS software automatically performs the conversions between programming data structures and file formats.
4. Permitting Inferencing and Actions Using Deduction Rules
      Deductive database systems provide capabilities for defining deduction rules for inferencing new
information from the stored database facts.
5. Inconsistency can be reduced
      In a database system, data may to some extent be stored in an inconsistent way; inconsistency is another
consequence of duplication. Suppose that an employee "Japneet", who works in the "Computer" department, is
represented by two distinct entries in a database. In this way inconsistent data gets stored, and the DBA can
remove this inconsistent data by using the DBMS.
6. Data can be shared
      In a database system data can be easily shared by different users. For example, student data can be shared by
the teaching departments, administrative block, accounts branch, laboratories, etc.
7. Standard can be enforced or maintained
      By using a database system, standards can be maintained in an organization. The DBA is the overall controller
of the database system. When the DBA uses a DBMS and the data are entered into the computer, standards can be
enforced and maintained through the computerized system.
8. Security can be maintained
      Passwords can be applied in a database system, or files can be secured by the DBA. A database system also
provides different coding techniques to encode the data, i.e. to keep it safe from unauthorized access, and it
provides a login facility for securing the data against both accidental and intentional threats. Recovery
procedures can also be maintained to restore the data by using the DBMS facilities.
9. Integrity can be maintained
      In a database system, data can be written or stored in an integrated way. Integration means unification and
sequencing of data; in other words, "the data contained in the database is both accurate and consistent". Data
can be accessed reliably if it is compiled in a unique form. We can take a primary key and some secondary keys
for integration of data. Centralized control can also ensure that adequate checks are incorporated in the DBMS
to provide data integrity.
10. Confliction can be removed
      In a database system, data can be written and arranged in a well-defined manner by the DBA, so there are no
conflicts within the database. The DBA selects the best file structure and access strategy to get better
performance for the representation and use of the data.
11. Providing Multiple User Interfaces
      For example query languages, programming language interfaces, forms, menu-driven interfaces, etc.
12. Representing Complex Relationships Among Data
      A DBMS is able to represent and maintain complex relationships among data items.
13. Providing Backup and Recovery
      The DBMS also provides back up and recovery features.
DISADVANTAGES OF DBMS
      A database management system has many advantages, but some major problems arise in using a DBMS, so it also
has disadvantages. These are explained below:
1. Cost
      A significant disadvantage of a DBMS is cost. In addition to the cost of purchasing or developing the
software, the organization will also have to purchase or upgrade hardware, so it becomes a costly system.
Additional cost also arises from migrating data from one DBMS environment to another.
2. Problems associated with centralization
      Centralization also means that the data is accessible from a single source. Since the centralized data can be
accessed by every user, weak controls can expose the data to unauthorized access, and the data can be damaged or
lost at a single point.
3. Complexity of backup and recovery
      Backup and recovery are fairly complex in a DBMS environment. Taking a backup of the data may affect the
multi-user database system that is in operation. A damaged database can be recovered from backup media, but
reloading it into a concurrently running multi-user database system is complicated.
4. Confidentiality, Privacy and Security
      When information is centralized and is made available to users from remote locations, the possibilities of
abuse are often more than in a conventional system. To reduce the chances of unauthorized users accessing
sensitive information, it is necessary to take technical, administrative and, possibly, legal measures. Most
databases store valuable information that must be protected against deliberate trespass and destruction.
5. Data Quality
      Since the database is accessible to users remotely, adequate controls are needed to control users updating
data and to control data quality. With an increased number of users accessing data directly, there are enormous
opportunities for users to damage the data. Unless there are suitable controls, the data quality may be
compromised.
6. Data Integrity
      Since a large number of users could be using a database concurrently, technical safeguards are necessary
to ensure that the data remain correct during operation. The main threat to data integrity comes from several
different users attempting to update the same data at the same time. The database therefore needs to be
protected against inadvertent changes by the users.
7. Enterprise Vulnerability
      Centralizing all data of an enterprise in one database may mean that the database becomes an indispensable
resource. The survival of the enterprise may depend on reliable information being available from its database.
The enterprise therefore becomes vulnerable to the destruction of the database or to unauthorized modification
of the database.
8. The Cost of using a DBMS
      Conventional data processing systems are typically designed to run a number of well-defined, preplanned
processes. Such systems are often “tuned” to run efficiently for the processes that they were designed for.
Although the conventional systems are usually fairly inflexible in that new applications may be difficult to
implement and/or expensive to run, they are usually very efficient for the applications they are designed for.
      The database approach on the other hand provides a flexible alternative where new applications can be
developed relatively inexpensively. The flexible approach is not without its costs and one of these costs is the
additional cost of running applications that the conventional system was designed for. Using standardized
software is almost always less machine efficient than specialized software.

  Q. 8. List five significant differences between a file-processing system and a DBMS.
 

      Ans. Before differentiating between file systems and database systems, we need to understand the DBMS and its
components. Consider an organization that has a huge amount (collection) of data on its different departments,
its employees, its products, sale and purchase orders, etc. Such data is accessed simultaneously by several
employees, and some users pose a number of queries and want answers quickly. If the data is stored in plain
files, slow processing becomes a problem. Trying to deal with this kind of data management problem by storing the
data in a collection of operating-system files creates a number of problems or drawbacks, which are discussed
below:
1. We do not have 1000 GB of main memory (primary memory) to store the data, so we store the data on permanent
storage devices (secondary memory) such as magnetic disk or magnetic tape. A file-oriented system copes poorly
with this, and we apply a database management system to store the data files permanently.
2. Even if we had such a large amount of primary memory on a 16-bit or 32-bit computer system, a file-based
system would have problems using the data by direct or random addressing. Also, we cannot bring more than 2 GB
or 4 GB of data directly into primary memory at a time, so a database program is needed to identify and locate
the data.
3. Some programs are too lengthy and complex and cannot store large amounts of data in operating-system files,
but a database system makes this simple and fast.
4. We cannot change and access file-oriented data simultaneously, so we require a system that can be used to
access a large amount of data concurrently.
5. We cannot easily recover file-oriented data, but a centralized database management system solves this type of
problem.
6. A file-oriented operating system provides only a password mechanism for security, and this is not sufficient
when a number of users access the same data using the same login.
      In the end we can say that a DBMS is a piece of software designed to make these processing tasks faster and
easier.

  Q. 9. Describe the major advantages of a database system over a file system. Or: Discuss the DBMS and the file
processing system; also give the limitations of the file processing system.
 
      Ans. TRADITIONAL FILE PROCESSING
      Data are organized, stored, and processed in independent files of data records. In the traditional file
processing approach, each business application was designed to use one or more specialized data files containing
only specific types of data records.
      TRADITIONAL FILE SYSTEM OR FILE ORIENTED APPROACH
      The business computers of the 1980s were used to process business records and produce information using the
file-oriented approach, or file processing environment. At that time this system was reliable and faster than the
manual system of record keeping and processing. In this system the data is organized in the form of different
files. Since the system was a collection of files, we can say it was a file-oriented system. The following terms
were commonly used in this approach and describe the features of a file-oriented system.
1. Master file
      The file that is created only once, i.e. at the start of computerization, or a file which rarely changes. For
example, in a bank master file the account number, name and balance are entered only once and change
infrequently.
2. File activity ratio

      The number of records processed in one run divided by the total number of records. For example, if we change
100 records in a bank file containing 200 records, then the file activity ratio is 100/200 = 0.5. It should be
noted that this ratio is low for a master file.
3. Transaction file
      A file that is created or updated repeatedly at regular intervals of time. For example, the payroll file of
employees is updated at the end of every month.
4. File volatility ratio
      It is the number of records updated in a transaction file divided by the total number of records. The file
volatility ratio of a transaction file is very high.
5. Work file
      A temporary file that helps in sorting and merging records from one file to another.
6. File organization
      It means the arrangement of records in a particular order. There were three types of file organizations
1. Sequential
2. Direct
3. Indexed sequential
7. Data island
      In this system each department has its own files designed for local applications. Each department has its own
data-processing staff, set of policies, working rules and report formats. Programs therefore depend on the file
structure or format: if the structure of a file changes, the program also has to be changed. The file-oriented
approach is still used these days, but it has the following limitations:
LIMITATIONS OF FILE ORIENTED APPROACH
• Duplicate data
         Since all the files are independent of each other, some of the fields or files are stored more than once.
Hence duplication is high in the file approach, whereas a DBMS has controlled duplication.
• Separated and isolated data
        To make a decision, a user might need data from two separate files. First, analysts and programmers had to
evaluate the files to determine the specific data required from each file and the relationships between the
data. Then applications could be written in a third-generation language to process and extract the needed data.
Imagine the work involved if data from several files was needed!
• Inconsistency
        In this system, data is not consistent. If a data item is changed, then all the files containing that item
need to be changed and updated properly; if they are not, there is a high risk of inconsistency. A DBMS
maintains data consistency.
 • Poor data integrity
       A collection of data should have integrity. A file is said to have data integrity if a data item is not
stored in a duplicate or contradictory manner. File-oriented systems have been found to have poor data integrity
control; data integrity is achieved in a DBMS.
• Every operation is programmable
        Processing tasks like searching, editing and deletion each required a separate program; there were no
built-in functions for these operations. A DBMS has ready-made commands for such operations.
• Data inflexibility
        Program-data interdependency and data isolation limited the flexibility of file processing systems in
providing users with ad hoc information requests. Because designing applications was so programming-intensive,
MIS department staff usually restricted information requests. Therefore, users often resorted to manual methods
to obtain needed information.

• Concurrency problem
        It means using the same record at the same time. This problem was common in the file approach but can be
controlled in a DBMS.
• Application programs are dependent on the file format:
       In a file processing system the physical formats of the files are embedded in the programs. A change in a
file means a change in the program and vice versa. There is no such problem in a DBMS.
• Poor data security
       All the files are stored as flat or text files. These files can be easily located and accessed, because the
file approach has little or no data security.
• Difficult to represent the complex objects:
      Some objects may require variable-length records, which are difficult to computerize using this approach. A
DBMS has the capability to handle fixed-length records as well as variable-length records.
• Cannot support heavy databases:
     Small or simple databases can be handled by the file system, but a DBMS like Oracle is used for heavy
database applications.
• Difficulty in representing data from the user’s view
      To create useful applications for the user, data from various files must often be combined. In file
processing it was difficult to determine the relationships between isolated data in order to meet user requirements.
PROBLEMS OF FILE PROCESSING
      The file processing approach finally became too cumbersome, costly, and inflexible to supply the information
needed to manage modern businesses. It was replaced by the database management approach. File processing systems
had the following major problems:
      • Data Redundancy
        Independent data files included a lot of duplicated data; the same data was recorded              
       and stored in several files. This data redundancy caused problems when data had to  
       be updated, since separate file maintenance programs had to be developed and
       coordinated to ensure that each file was properly updated. Unfortunately, a lot of  
       inconsistencies occurred among data stored in separate files.
     • Lack of Data Integration
       Having independent files made it difficult to provide end users with information for  
      ad hoc requests that required accessing data stored in several different files. Special
      computer programs had to be written to retrieve data from each independent file. This  
      was so difficult, time-consuming, and costly for some organizations that it was  
      impossible to provide end users or management with such information.
    • Data Dependence
      In file processing systems, major components of the system - the organization of files,  
      their physical locations of storage hardware, and the application software used to
      access those files — depended on one another in significant ways. Changes in the  
      format and structure of data and records in a file required that changes be made to all  
      of the programs that used that file. This program maintenance effort was a major  
      burden of file processing systems.
    • Other Problems
      It was easy for data elements to be defined differently by different end users and  
     applications. Integrity of the data was suspect because there was no control over their  
     use and maintenance by authorized end users.
     Q. 10. What are the various types of database users?

     Ans. Without users, all of the above components (data, hardware and software) are meaningless. Users collect
the data and operate and handle the hardware; the operator feeds the data and arranges it in order by executing
the software. Users are mainly of four types. These are:
(a) Naïve user
      A naïve user has no knowledge of the database system or of any of its supporting software. These are end
users, like laymen, who have only a little knowledge of the computer system. They are mainly involved in
collecting data in notebooks or on pre-designed forms. Users of automated teller machines (ATMs) fall into this
category. A naïve user can work with any simple GUI-based, menu-driven system; non-computer-trained people
using the Internet are also in this category.
(b) End User or Data Entry Operators
      Data entry operators are preliminary computer-based users. Their function is only to operate the computer
(start and stop it) and to feed or type the collected information (data) into a menu-driven application program
and execute it according to the analyst's requirements. These users are also called online users. They
communicate with the database directly via an online terminal or indirectly via a user interface. These users
require a certain amount of expertise in the computer programming language, but complete knowledge of computer
operations.
(c) Application programmer
      He or she is also called simply a programmer. The job of the application programmer is to develop new programs
for particular applications or to modify existing programs. The application programmer works according to
instructions given by the database administrator (DBA) and can work with programming languages like FORTRAN,
COBOL, dBase, etc.
(d) DBA (Data Base Administrator)
      The DBA is a major user, either a single person or a group of persons. The DBA is only the custodian of the
data of the business firm or organization, not its owner - just as a bank manager takes care of the bank's money
without owning it. The DBA handles the information collected by end users and gives instructions to application
programmers for developing new programs or modifying existing ones. The DBA is also called the overall controller
of the organization's data. In the computer department of a firm, either a systems analyst or an EDP (Electronic
Data Processing) manager works as the DBA. In other words, the DBA is the overall controller of the complete
hardware and software.
RESPONSIBILITIES OF DBA 
      As the DBA is the overall commander of the computer system, he or she has a number of duties; some of the
major responsibilities are as follows:
1. DBA controls the data, hardware and software, and gives instructions to the application programmers, end
users and naive users.
2. DBA decides the information content of the database and the suitable database file structure for the
arrangement of data, using the proper DDL techniques.
3. DBA compiles the whole data in a particular order and sequence.
4. DBA decides where data is to be stored, i.e. takes decisions about the storage structure.
5. DBA decides which access strategy and technique should be used for accessing the data.
6. DBA communicates with the users through appropriate meetings and co-operates with them.
7. DBA also defines and applies authorization checks and validation procedures.
8. DBA takes backups of the data on a backup storage device, so that lost data can be recovered and compiled
again; the DBA also recovers damaged data.
9. DBA changes the environment according to user or industry requirements and monitors performance.
10. DBA should be a good decision-maker; the decisions taken should be correct, accurate and efficient.
11. DBA should have leadership qualities.
12. DBA liaises with the users in the business to give customers confidence about the availability of data.

     Q. 11. Discuss the architecture of a database management system.


 
      Ans. DBMS ARCHITECTURE
      Many different frameworks have been suggested for the DBMS over the last several years. The generalized
architecture of a database system is called the ANSI/SPARC (American National Standards Institute / Standards
Planning and Requirements Committee) model.
      In 1972, a final report about databases was submitted by ANSI (American National Standards Institute) and
SPARC (Standards Planning And Requirements Committee). According to this approach, three levels of a database
system were suggested:
      • External view (Individual user view)
      • Conceptual View (Global or community user view)
      • Internal level (physical or storage view).
      For the system to be usable, it must retrieve data efficiently. This concern has led to the design of complex
data structures for the representation of data in the database. Since many database system users are not
computer trained, developers hide the complexity from users through several levels of abstraction, to simplify
users’ interactions with the system.
       These three views or levels of the architecture are shown in the diagram below.

       [Diagram: the three levels of the ANSI/SPARC architecture - external, conceptual and internal views]
OBJECTIVES OF THREE LEVEL ARCHITECTURE
      The database views were suggested for the following reasons or objectives:
      1. To make changes to the database easy when changes are needed by the environment.
      2. The external or user views should not depend upon any change made in another view. For example, changes
in hardware, the operating system or the internal view should not change the external view.
      3. The users of the database should not have to worry about the physical implementation and internal working
of the database system.
      4. The data should reside at one place and all the users should be able to access it as per their requirements.
      5. The DBA should be able to change the internal structure without affecting the users' views.
      6. The database should be simple and changes should be easy to make.
      7. It should be independent of all hardware and software.

All the three levels are shown below

External/View level
      The highest level of abstraction where only those parts of the entire database are included which are of
concern to a user. Despite the use of simpler structures at the logical level, some complexity remains, because
of the large size of the database. Many users of the database system will not be concerned with all this
information. Instead, such users need to access only a part of the database. So that their interaction with the
system is simplified, the view level of abstraction is defined. The system may provide many views for the same
database.
      Databases change over time as information is inserted and deleted. The collection of information stored in
the database at a particular moment is called an instance of the database. The overall design of the database is
called the database schema. Schemas are changed infrequently, if at all.
      Database systems have several schemas, partitioned according to the levels of abstraction that we
discussed. At the lowest level is the physical schema; at the intermediate level is the logical schema and at the
highest level is a subschema.
      The features of this view are
      • The external or user view is at the highest level of the database architecture.
      • Only a portion of the database is given to a user.
      • One portion may have many views.
      • Many users and programs can use the part of the database they are interested in.
      • By creating separate views of the database, we can maintain security.
      • Only limited access (read only, write only, etc.) may be provided in this view.
      For example, the head of the accounts department is interested only in accounts information, while the
library department is interested only in books, staff and students. But all such data (students, books, accounts,
staff, etc.) is present in one place, and every department can use it as per its need. (A rough sketch using SQL
views is given below.)
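      A rough sketch of such department-specific external views, assuming a single shared base table whose name and columns are purely illustrative:

      -- One shared base table holding data for the whole college
      CREATE TABLE college_record (
          id       INT PRIMARY KEY,
          name     VARCHAR(30),
          fee_paid DECIMAL(9,2),
          book_no  INT
      );

      -- The accounts department sees only the fee information
      CREATE VIEW accounts_view AS
          SELECT id, name, fee_paid FROM college_record;

      -- The library sees only the book information
      CREATE VIEW library_view AS
          SELECT id, name, book_no FROM college_record;

      Each department queries only its own view, while the data itself resides in one place.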
Conceptual/Logical level
     Database administrators, who must decide what information is to be kept in the database, use this level of
abstraction. The conceptual view represents the entire database; there is only one conceptual view per database.
      The description of data at this level is in a format independent of its physical representation. It also
includes features that specify the checks needed to retain data consistency and integrity.
The features are:
      • The conceptual or logical view describes the structure of the whole database as seen by many users.
      • Only the DBA can define it.
      • It is the global view seen by many users.
      • It is the middle level in the three-level architecture.
      • It is defined by giving the name, type and length of each data item. The CREATE TABLE command of Oracle
creates this view.
      • It is independent of all hardware and software.

Internal/Physical level
      The lowest level of abstraction describes how the data are stored in the database, and what relationships
exist among those data. The entire database is thus described in terms of a small number of relatively simple
structures. Although implementation of the simple structures at the logical level may involve complex
physical-level structures, the user of the logical level does not need to be aware of this complexity.
      The features are :
     • It describes the actual or physical storage of data.
     • It stores the data on hardware so that it can be stored and accessed in optimal time.
     • It is the third level in three level architecture.
     • It stores the concepts like:
           • B-tree and Hashing techniques for storage of data.
           • Primary keys, secondary keys, pointers, sequences for data search.
           • Data compression techniques.
            • It is represented, for example, as
                        FILE EMP [
                                    INDEX ON EMPNO
                                    FIELDS = {
                                        EMPNO: BYTE(4),
                                        ENAME: BYTE(25) } ]
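           At the SQL level, the DBA influences this internal level mainly through physical-design statements such as index creation. A minimal sketch (the table and index names are assumed for illustration):

           -- A base table similar to the EMP record described above
           CREATE TABLE emp (empno INT, ename VARCHAR(25));

           -- A B-tree (or hashed) index on EMPNO; this changes only the physical level,
           -- so the external and conceptual views are unaffected
           CREATE INDEX emp_empno_idx ON emp (empno);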
           Mapping between views
       • The conceptual/internal mapping:
          o defines the correspondence between the conceptual view and the internal view;
          o specifies the mapping from conceptual records to their stored counterparts.
       • An external/conceptual mapping:
          o defines the correspondence between a particular external view and the conceptual view.
       • A change to the storage structure definition means that the conceptual/internal mapping must be changed
accordingly, so that the conceptual schema may remain invariant, achieving physical data independence.
       • A change to the conceptual definition means that the conceptual/external mapping must be changed
accordingly, so that the external schema may remain invariant, achieving logical data independence.
Q. 12. Write a note on Database Languages and Interfaces.
 
      Ans. The main types of languages and facilities provided by a DBMS are:
      1. Programming Language
      2. Data Manipulation Language
      3. Data Definition Language
      4. Schema Description Language
      5. Sub-Schema Description Language
      6. SQL (Structured Query Language)
1. Programming Language
      Programming languages like COBOL, FORTRAN, C, C++ and Pascal have their own syntax and semantics. They are
structured and logical, so they are commonly used to solve general and scientific problems. Business-oriented
problems can be solved with third-generation languages (3GLs) and fourth-generation languages (4GLs).
2. DML

      A language that is used, alongside the host programming language, to manipulate the data in the database is
called a data manipulation language (DML). This language creates an interface (linkage) between the user and the
application program. DML involves retrieval of data from the database, insertion of new data into the database,
and deletion or modification of existing data. Some data manipulation operations are also called queries or query
operations. A query is a statement in the DML that requests the retrieval of data from the database, i.e. it
searches the data according to the user's requirement. The subset of the DML used to pose queries is known as the
query language. The DML provides commands to select and retrieve data from the database; other commands insert,
update and delete records. The commands have different syntax in different programming languages; for example
FORTRAN, COBOL and C provide this type of facility with the help of the database management system. The data
manipulation functions provided by the DBMS can be invoked in an application program directly by procedural calls
or by preprocessor statements that are handled by the compiler. The DML can be procedural or non-procedural
according to the user's requirement; if the DML is non-procedural, the user indicates only what is to be
retrieved. In both cases the DBMS works out how to obtain the exact answer. (Typical DML statements are sketched
below.)
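      A minimal sketch of the four basic DML operations, assuming a student table with rollno and sname columns (illustrative names only):

      INSERT INTO student (rollno, sname) VALUES (102, 'Amar');   -- add new data
      SELECT sname FROM student WHERE rollno = 102;               -- retrieve (query) data
      UPDATE student SET sname = 'Amarjit' WHERE rollno = 102;    -- modify existing data
      DELETE FROM student WHERE rollno = 102;                     -- delete unwanted data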
3. DDL
      A database management system provides a facility known as a data definition language or data description
language (DDL). DDL is used to define the conceptual (global) schema and also to give some details about how to
implement this schema on the physical devices used to store the data. The definitions include all the entity sets
and their associated attributes, as well as the relationships among the entity sets and the constraints that are
used by the DML. DDL also maintains the metadata (data about the data in the database); the metadata comprises
the data dictionary, directory and system catalog, which describe data about the data. The dictionary contains
information about the data stored in the database and is consulted by the DBMS before any data manipulation
operation. The DBMS maintains information on the file structures and uses appropriate access methods to reach the
data efficiently. In this way the DDL supports the DML. (A few typical DDL statements are sketched below.)
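      A minimal sketch of typical DDL statements (the table definition is assumed for illustration):

      CREATE TABLE student (                 -- define a new entity set and its attributes
          rollno INT PRIMARY KEY,
          sname  VARCHAR(30),
          fee    DECIMAL(9,2)
      );

      ALTER TABLE student ADD address VARCHAR(60);   -- change the schema definition
      DROP TABLE student;                            -- remove the definition (and its data)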
      We can say that there is another language - Data Sub Language (DSL) which is the
combination of both DML and DDL.
                                            DSL = DML + DDL
4. Schema Description Language (SDL) or Schema
      It is necessary to describe the organization of the data in a formal manner. The logical and physical database
descriptions are used by the DBMS software. The complete and overall description of the data is referred to as
the schema. The words schema and subschema were brought into DBMS usage by CODASYL (the Conference on Data Systems
Languages) and by CODASYL's Database Task Group. The schema is also referred to as the conceptual model or global
view (community view) of the data. For example, a complete description of the collected data of a college - all
class and student data, all employee (teaching and non-teaching) data and other related data - is called the
schema of the college. We can say that the whole college data, related logically, is the schema.
5. Sub-Schema Description Language
       The term schema means an overall chart of the data items, types and record types stored in a database. The
term subschema refers to an application programmer's view of the data he or she uses; a subschema is a part of
the schema. Many different subschemas can be derived from one schema, since an application programmer does not
use the whole schema. For example, in an organization the purchase order for the maintenance department is a
subschema of the whole schema description of the purchase department in the whole industry.
Two or more application programmers use different subschemas. One programmer, A, uses the purchase-order
subschema, whereas programmer B uses the supplier subschema. Their operations and views differ according to their
own subschemas, but the two subschemas can be combined on the basis of a common key.
6. Structured Query Language (SQL):
      SQL originated with System R, IBM's experimental relational system; SQL stands for Structured Query Language.
The language was developed in 1974 at IBM's San Jose Research Center. Its purpose is to provide non-procedural
commands that are used for validating the data and for searching the data; by using this language we can pose any
query about the data. SQL grew out of the earlier SQUARE language, and it served both DDL and DML purposes for
System R. SQL-based systems are also called relational languages and are used in commercial RDBMSs; commonly used
SQL-based products are Oracle, Ingres, Sybase, etc. SQL resembles relational algebra and relational calculus in
the relational approach.
DBMS INTERFACES
      Types of interfaces provided by the DBMS include:
Menu-Based interfaces for Web Clients or Browsing
      • Present users with list of options (menus)
      • Lead user through formulation of request
      • Query is composed of selection options from menu displayed by system.
Forms-Based Interfaces
      • Displays a form to each user
      • User can fill out form to insert new data or fill out only certain entries.
      • Designed and programmed for naïve users as interfaces to canned transactions.
Graphical User Interfaces
      • Displays a schema to the user in diagram form. The user can specify a query by manipulating the
diagram. GUIs use both forms and menus.
Natural Language Interfaces
      • Accept requests in written English or other languages and attempt to understand them.
      • Interface has its own schema, and a dictionary of important words. Uses the schema and dictionary to
interpret a natural language request.
Interfaces for Parametric Users
      • Parametric users have small set of operations they perform.
      • Analysts and programmers design and implement a special interface for each class of naïve users.
      • Often a small set of commands is included to minimize the number of keystrokes required (i.e. function
keys).
Interfaces for the DBA
      • Systems contain privileged commands only for DBA staff.
      • Include commands for creating accounts, setting parameters, authorizing accounts,    
        changing the schema, reorganizing the storage structures etc.

  Q.13. Describe the Classification of Database Management Systems.


 
      Ans. Categories of DBMS
DBMS (Database Management System)
      It is the software used to manage databases. A DBMS is a software component or logical tool that handles the
databases; all user queries about the data stored in the database are handled by the DBMS. There are many DBMSs
available in the market, like dBase, FoxBASE, FoxPro, Oracle, Unify, Access, etc.
RDBMS (Relational Data Base Management System)

      Each database system uses an approach to store and maintain the data. For this purpose three data models were
developed: the hierarchical model, the network model and the relational model. In the hierarchical model the data
is arranged in the form of trees, in the network model the data is arranged using pointers and networks, and in
the relational model the data is arranged in the form of tables. Data stored in the form of tables is easy to
store, maintain and understand. Many DBMSs have been developed using the hierarchical and network models. Any
DBMS that uses the relational data model for data storage and modeling is called an RDBMS. In an RDBMS we can
create relations among tables and access information across tables, while the tables themselves are stored
separately and may or may not have identical structures. The RDBMS is based upon the rules given by Dr. Codd,
known as Codd's rules.
HDBMS (Heterogeneous DBMS)
      In an RDBMS we store information related to the same kind of data, like student data, teacher data, employee
data, etc. In an HDBMS, entirely different kinds of data are stored in the same database.
DDBMS (Distributed DBMS)
      During the 1950s and 1960s the trend was to use independent or decentralized systems, which led to duplication
of hardware and facilities. In a centralized database system, the DBMS and data reside at a single place and all
control is limited to a single location, even though the PCs are distributed geographically. A distributed system
is parallel computing using multiple independent computers communicating over a network to accomplish a common
objective or task. The type of hardware, programming languages, operating systems and other resources may vary
drastically. It is similar to computer clustering, with the main difference being a wide geographic dispersion of
the resources.
      For example, an organization may have an office in a main building and many sub-buildings that are connected
using a LAN. The current trend is towards distributed systems: a centralized system connected to intelligent
remote sites, where each remote site has its own storage and processing capabilities, whereas in a purely
centralized network there is a single store.
OODBMS (Object Oriented DBMS)
      Object-Oriented Database Management Systems (OODBMSs) have been developed to support new kinds
of applications whose semantics and content are represented more naturally with the object model.
The OODBMSs, however, present two main problems:
      • Impedance mismatch: It arises for two reasons. First, the operating system offers no suitable abstractions,
so when a client object has to invoke a method offered by a server object and the two objects are not in the
same address space, it is necessary to use the mechanisms offered by the operating system, and these
mechanisms do not fit the object-oriented paradigm since they are oriented towards communicating processes.
To solve this problem, intermediate software such as COM or CORBA is included. Second, an impedance
mismatch also arises every time the object-oriented applications need to use operating system services.
      • Interoperability problem between object models: Although different system elements may all use the object-
oriented paradigm, an interoperability problem can still exist between them. An application implemented using
the C++ language, with the C++ object model, can easily interact with its own objects, but when it wants to use
objects that have been created with another programming language or another object-oriented database, an
interoperability problem appears.
      Programming languages like C, FORTRAN and Pascal use the POP (Procedure Oriented Programming)
approach to develop applications, but the current trend is towards OOP (Object Oriented Programming).
Languages like C++, Java, C# (C Sharp) and Visual Basic 6 use this approach, and many databases, such as
Oracle, have been developed to follow the OO approach. A DBMS that follows the OOP approach is called an
OODBMS.

 
      Q. 14. Explain the difference between physical and logical data independence.
 
      Ans. One of the biggest advantages of a database is data independence. It means we can change the schema
at one level without affecting the schema or the data at another level; that is, we can change the structure
of the database without affecting the data required by users and programs. This feature was not available in the
file-oriented approach. There are two types of data independence and they are:
      1. Physical data independence
      2. Logical data independence
      Data Independence: The ability to modify the schema definition at one level without affecting the schema definition
at the next higher level is called data independence. There are two levels of data independence:
      1. Physical data independence is the ability to modify the physical schema without
causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to
improve performance. It means we change the physical storage/level without affecting the conceptual or
external view of the data. The new changes are absorbed by mapping techniques.
      2. Logical data independence is the ability to modify the logical schema without
causing application programs to be rewritten. Modifications at the logical level are necessary whenever the
logical structure of the database is altered (for example, when money-market accounts are added to a banking
system).
      Logical data independence means that if we add some new columns to a table, or remove some columns from
it, the user views and programs that do not use those columns should not have to change. For example, consider
two users A and B, both of whom select empno and ename. If user B adds a new column salary to his view/table,
it does not affect the external view of user A, although the conceptual view of the database has changed for both
users A and B. User A can now also be given access to the salary column if required.
      User A’s External View

      
      (View before adding a new column)
   
      User B’s external view

     
      (View after adding a new column salary)
      It means that if we change the view, the programs which use this view need not be changed.
      Logical data independence is more difficult to achieve than physical data independence, since
application programs are heavily dependent on the logical structure of the data that they access.
      Physical data independence, by contrast, means we can change the physical storage level without affecting the
conceptual or external view of the data; the mapping techniques absorb the new changes.
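      Returning to the users A and B above, the idea can be shown as an illustrative SQL sketch only (the table,
view and column names are assumed for this example, and the exact ALTER TABLE syntax varies slightly
between DBMSs): user A works through a view, so adding a salary column for user B does not force A's view
or programs to change.

      -- Base table shared by users A and B (hypothetical names)
      CREATE TABLE emp (
          empno  INT PRIMARY KEY,
          ename  VARCHAR(30)
      );

      -- User A's external view exposes only empno and ename
      CREATE VIEW emp_view_a AS
          SELECT empno, ename FROM emp;

      -- Logical change: a salary column is added to the base table for user B
      ALTER TABLE emp ADD COLUMN salary DECIMAL(8,2);

      -- User A's programs still query the unchanged view
      SELECT empno, ename FROM emp_view_a;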
  
      Q. 15. What is physical data independence?
 
      Ans. Physical data independence is the ability to modify the physical schema without causing application
programs to be rewritten. Modifications at the physical level are occasionally necessary to improve
performance. It means we change the physical storage level without affecting the conceptual or external view
of the data. The new changes are absorbed by mapping techniques.

 Q. 16. What do you mean by data redundancy?


 
      Ans. Redundancy is unnecessary duplication of data. For example, redundancy arises if the accounts department
and the registration department both keep the student's name, number and address.
Redundancy wastes space and duplicates effort in maintaining the data.
      Redundancy also leads to inconsistency.
      Inconsistent data is data which contradicts itself - e.g. two different addresses for a given student number.
Inconsistency cannot occur if data is represented by a single entry (i.e. if there is no redundancy).
Controlled redundancy
      Some redundancy may be desirable (for efficiency). A DBMS should be aware of it, and take care of
propagating updates to all copies of a data item.
      This is a design objective that is not yet fully supported by current systems.

   Q. 17. What do you mean by database schema?


 
      Ans. It is necessary to describe the organization of the data in a formal manner. The logical and physical
database descriptions are used by the DBMS software. The complete and overall description of the data is referred to
as the schema. The terms schema and subschema were brought into DBMS usage by CODASYL (the Conference on
Data Systems Languages committee) and by CODASYL's Database Task Group. The schema is also referred to as
the conceptual model or global view (community view) of the data. For example, a complete description of the
collected data of a college, covering all classes and student data, all employees (teaching and non-teaching) and
the other data related to the college, is called the schema of the college. We can say that when we relate the whole
college data logically, the result is called the schema.

  Q. 18. Explain the distinctions among the terms primary key, candidate key and superkey.
                                                                        Or
      What is the significance of foreign key? Or What are the various keys?
 
      Ans. Keys: A number of keys can be defined, but the commonly and mainly used keys are explained below:
1. Primary Key
      A key is a single attribute, or a combination of two or more attributes, of an entity that is used to identify one
or more instances of the entity set. The attribute Roll No uniquely identifies an instance of the entity set STUDENT:
on the basis of Roll No 15 it tells us about the student Amrita having address 101, Kashmir Avenue and phone
no. 112746, who has paid fees of 1500. The value 15 is unique and gives a unique identification of the student,
so Roll No is a unique attribute, and such a unique entity identifier is called the primary key. Primary key values
cannot be duplicated.
      From the definition of a candidate key, it should be clear that each relation must have at least one candidate
key, even if it is the combination of all the attributes in the relation, since all tuples in a relation are distinct.
Some relations may have more than one candidate key.
      As discussed earlier, the primary key of a relation is an arbitrarily but permanently selected candidate key.
The primary key is important since it is the sole identifier for the tuples in a relation. Any tuple in a database
may be identified by specifying relation name, primary key and its value. Also, for a tuple to exist in a relation,
it must be identifiable and therefore it must have a primary key. The relational data model therefore imposes
the following two integrity constraints:
      (a) No component of a primary key value can be null;
      (b) Attempts to change the value of a primary key must be carefully controlled.
      The first constraint is necessary because if we want to store information about some entity, then we must
be able to identify it, otherwise difficulties are likely to arise. For example, if a relation
      CLASS (STUNO, LECTURER, CNO)
      has (STUNO, LECTURER) as the primary key then allowing tuples like
      3123                        NULL                             CP302
      NULL                     SMITH                            CP302
      is going to lead to ambiguity since the two tuples above may or may not be identical and the integrity of the
database may be compromised. Unfortunately most commercial database systems do not support the concept of
primary key and it would be possible to have a database state when integrity of the database is violated.
      The second constraint above deals with changing of primary key values. Since the primary key is the tuple
identifier, changing it needs very careful controls. Codd has suggested three possible approaches:
      Method 1
      Only a select group of users be authorised to change primary key values.
      Method 2
      Updates on primary key values be banned. If it was necessary to change a primary key, the tuple would
first be deleted and then a new tuple with new primary key value but same other values would be inserted. Of
course, this does require that the old values of attributes be remembered and be reinserted in the database.
      Method 3
      A different command for updating primary keys be made available. Making a distinction in altering the
primary key and another attribute of a relation would remind users that care needs to be taken in updating
primary keys.
2. Secondary Key
      A key which does not give a unique identification and may hold duplicate information is called a secondary key.
For example, in a STUDENT entity, if Roll Number is the primary key, then the name of the student, the address
of the student, the phone number of the student and the fees paid by the student are all secondary keys. A
secondary key is an attribute, or combination of attributes, that is not the primary key and may contain duplicate
data. In other words, a secondary key is used after identification by the primary key, and we can also identify
data from a combination of secondary keys.
3. Super Key
      If we add additional attributes to a primary key, the resulting combination still uniquely identifies an
instance of the entity set. Such keys are called super keys; a primary key is therefore a minimal super key. For
example, if DOB (the date-of-birth attribute) were the key, then adding further information, such as the day of
the month, to it would still identify the instance, and the enlarged combination is a super key. Super keys are
less used in small database files and have less importance nowadays, but such a key gives a complete description
of the record it identifies.
4. Candidate Key
      There may be two or more attributes, or combinations of attributes, that uniquely identify an instance of an
entity set. These attributes or combinations of attributes are called candidate keys; each candidate key gives a
unique identification, and the primary key is chosen from among them. A candidate key may also be a
combination of attributes: for example, if Roll No and student name are two different attributes, the combination
(Roll No, Name) can serve as a candidate key, since it is unique and identifies a particular roll number and a
particular name.

5. Alternative Key
      A candidate key which is not the primary key is called an alternative (or alternate) key. For example, if Roll No
and Name form the set of candidate keys and Roll No is chosen as the primary key, then the other candidate key,
Name, works as the alternative key.
6. Foreign Key
      Suppose there are relations SP (S#, P#, QTY), S (S#, SName, Status, City) and P (P#, PName, Color, Weight,
City). The entity SP is defined as the relationship between relation S and relation P. The relations S and P have
S# and P# as their primary keys respectively, but in the relation SP we can take either S# or P# (or their
combination) as the primary key. Suppose we take P# as the primary key of SP; then S#, which is a primary key
in its own relation but does not work as the primary key in SP, is called a foreign key. Likewise, if S# is taken
as the primary key of SP, then P# is the foreign key. Similarly, in a relation ASSIGNMENT with attributes
Emp #, Prod # and Job #, the attributes that are primary keys of other relations behave as foreign keys in
ASSIGNMENT.
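      As a hedged SQL sketch of these key concepts (data types are assumed, and the '#' is dropped from the names
because it is not legal in SQL identifiers): S_no and P_no are the primary keys, the UNIQUE constraint marks an
alternate (candidate) key, and the two foreign keys in SP refer back to S and P.

      CREATE TABLE S (
          S_no    CHAR(5) PRIMARY KEY,            -- primary key of S
          S_name  VARCHAR(30),
          status  INT,
          city    VARCHAR(30)
      );

      CREATE TABLE P (
          P_no    CHAR(5) PRIMARY KEY,            -- primary key of P
          P_name  VARCHAR(30) UNIQUE,             -- treated here as an alternate (candidate) key
          color   VARCHAR(10),
          weight  DECIMAL(6,2),
          city    VARCHAR(30)
      );

      CREATE TABLE SP (
          S_no    CHAR(5),
          P_no    CHAR(5),
          qty     INT,
          PRIMARY KEY (S_no, P_no),               -- composite primary key of the relationship
          FOREIGN KEY (S_no) REFERENCES S (S_no), -- foreign key into S
          FOREIGN KEY (P_no) REFERENCES P (P_no)  -- foreign key into P
      );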

 
      Q. 19. What are the major functions of a database administrator?
 
      Ans. RESPONSIBILITIES OF DBA
      As we know, the DBA is the overall controller of the database system, so he or she has a number of duties; some of
the major responsibilities are as follows:
      1. The DBA controls the data, hardware and software and gives instructions to the application
programmers, end users and naive users.
      2. The DBA decides the information contents of the database. He or she decides the suitable database file structure for
the arrangement of data and uses the proper DDL techniques.
      3. DBA compiles the whole data in a particular order and sequence.
      4. DBA decides where data can be stored i.e. take decision about the storage structure.
      5. DBA decides which access strategy and technique should be used for accessing the data.
      6. The DBA communicates with the users through appropriate meetings and co-operates with them.
      7. The DBA also defines and applies authorization checks and validation procedures (a small sketch of such
commands follows this list).
      8. The DBA also takes backups of the data on a backup storage device, so that if data is lost it can be
recovered; the DBA also recovers damaged data.
      9. DBA also changes the environment according to user or industry requirement and monitor the
performance.
      10. DBA should be good decision-maker. The decision taken by DBA should be correct, accurate &
efficient.
      11. DBA should have leadership quality.
      12. The DBA liaises with the users in the business to give the customer confidence about the availability of
data.
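      Responsibilities 7 and 8 are normally carried out with privileged commands. The following is only a rough
sketch; the exact statements and privilege names differ between products such as Oracle, MySQL and
PostgreSQL, and the user and table names are invented for illustration.

      -- Create an account for a new end user (the password shown is a placeholder)
      CREATE USER clerk1 IDENTIFIED BY 'secret';

      -- Authorize only the operations this class of user needs on the student table
      GRANT SELECT, INSERT ON student TO clerk1;

      -- Withdraw an authorization that is no longer required
      REVOKE INSERT ON student FROM clerk1;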

  
      Q. 20. What do you mean by relationships? Explain different types of relationships.
 
      Ans. Relationships: One table (relation) may be linked with another in what is known as
a relationship. Relationships may be built into the database structure to facilitate the operation of relational
joins at runtime.

1. A relationship is between two tables in what is known as a one-to-many or parent-child or master-
detail relationship where an occurrence on the ‘one’ or ‘parent’ or ‘master’ table may have any number
of associated occurrences on the ‘many’ or ‘child’ or ‘detail’ table. To achieve this, the child table must
contain fields which link back to the primary key on the parent table. These fields on the child table are
known as a foreign key, and the parent table is referred to as the foreign table (from the viewpoint of the
child).
2. It is possible for a record on the parent table to exist without corresponding records on the child table,
but it should not be possible for an entry on the child table to exist without a corresponding entry on
the parent table.
3. A child record without a corresponding parent record is known as an orphan.
4. It is possible for a table to be related to itself. For this to be possible it needs a foreign key which points
back to the primary key of the same table (see the SQL sketch after this list). Note that these two keys cannot
be comprised of exactly the same fields, otherwise the record could only ever point to itself.
5. A table may be the subject of any number of relationships, and it may be the
      parent in some and the child in others.
6. Some database engines allow a parent table to be linked via a candidate key, but if this were changed it
could result in the link to the child table being broken.
7. Some database engines allow relationships to be managed by rules known as referential
integrity or foreign key constraints. These will prevent entries on child tables from being created if
the foreign key does not exist on the parent table, or will deal with entries on child tables when the
entry on the parent table is updated or deleted.
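      Point 6's idea of a table related to itself can be sketched in SQL as follows (the table and column names are
assumed purely for illustration): the foreign key manager_id points back to the primary key of the same table,
and the top of the hierarchy simply holds NULL.

      CREATE TABLE employee (
          emp_id     INT PRIMARY KEY,
          emp_name   VARCHAR(30),
          manager_id INT,                        -- foreign key into the same table
          FOREIGN KEY (manager_id) REFERENCES employee (emp_id)
      );

      -- The top-level manager has no parent row
      INSERT INTO employee VALUES (1, 'Asha', NULL);
      -- Every other (child) row must reference an existing parent row
      INSERT INTO employee VALUES (2, 'Ravi', 1);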
 
 
Relational Joins
      The join operator is used to combine data from two or more relations (tables) in order to satisfy a particular
query. Two relations may be joined when they share at least one common attribute. The join is implemented by
considering each row in an instance of each relation. A row in relation R1 is joined to a row in relation R2
when the value of the common attribute(s) is equal in the two relations. The join of two relations is often called
a binary join.
      The join of two relations creates a new relation. The notation ‘R1 x R2’ indicates the join of relations R1
and R2. For example, consider the following:

                  
      Note that the instances of relations R1 and R2 contain the same data values for attribute B. Data
normalisation is concerned with decomposing a relation (e.g. R(A,B,C,D,E)) into smaller relations (e.g. R1 and
R2). The data values for attribute B in this context will be identical in R1 and R2. The instances of R1 and R2
are projections of the instances of R(A,B,C,D,E) onto the attributes (A,B,C) and (B,D,E) respectively. A
projection will not eliminate data values: duplicate rows are removed, but this does not remove a data value
from any attribute.
      The join of relations R1 and R2 is possible because B is a common attribute. The result of the join is:
 
 
 
 
 
 

                     
      The row (2 4 5 7 4) was formed by joining the row (2 4 5) from relation R1 to the row (4 7 4) from relation
R2. The two rows were joined since each contained the same value for the common attribute B. The row (2 4
5) was not joined to the row (6 2 3) since the values of the common attribute (4 and 6) are not the same.
      The relations joined in the preceding example shared exactly one common attribute. However, relations
may share multiple common attributes. All of these common attributes must be used in creating a join. For
example, the instances of relations R1 and R2 in the following example are joined using the common attributes
B and C:
     Before the join:
 

                                  
After the join:

                                      
      The row (6 1 4 9) was formed by joining the row (6 1 4) from relation R1 to the row (1 4 9) from relation R2.
The join was created since the common set of attributes (B and C) contained identical values (1 and 4). The row
(6 1 4) from R1 was not joined to the row (1 2 1) from R2 since the common attributes did not share identical
values: (1 4) in R1 and (1 2) in R2.
      The join operation provides a method for reconstructing a relation that was decomposed into two relations
during the normalisation process. The join of two rows, however, can create a new row that was not a member
of the original relation. Thus invalid information can be created during the join process.
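      The binary join described above can be written directly in SQL. This is only a hedged sketch: the tables
R1(A, B, C) and R2(B, D, E) are declared here purely to mirror the example, and the query pairs rows whose
values of the common attribute B are equal.

      CREATE TABLE R1 (A INT, B INT, C INT);
      CREATE TABLE R2 (B INT, D INT, E INT);

      -- Reconstruct R(A, B, C, D, E) by joining on the shared attribute B
      SELECT R1.A, R1.B, R1.C, R2.D, R2.E
      FROM   R1
      JOIN   R2 ON R1.B = R2.B;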

 
 

      
      Now suppose that a list of courses with their corresponding room numbers is required. Relations R1 and R4
contain the necessary information and can be joined using the attribute HOUR. The result of this join is:

      This join creates the following invalid information (denoted by the coloured rows):
      • Smith, Jones, and Brown take the same class at the same time from two different instructors in two
different rooms.
     • Jenkins (the Maths teacher) teaches English.

     • Goldman (the English teacher) teaches Maths.
     • Both instructors teach different courses at the same time.
      Another possibility for a join is R3 and R4 (joined on INSTRUCTOR). The result would be:
 

      This join creates the following invalid information:


      • Jenkins teaches Math I and Algebra simultaneously at both 8:00 and 9:00.
      A correct sequence is to join R1 and R3 (using COURSE) and then join the resulting relation with R4
(using both INSTRUCTOR and HOUR). The result would be:

      Extracting the COURSE and ROOM attributes (and eliminating the duplicate row produced for the English
course) would yield the desired result:

                      
      The correct result is obtained since the sequence (R1 x R3) x R4 satisfies the lossless (gainless?) join
property.
      A relational database is in 4th normal form when the lossless join property can be used to answer
unanticipated queries. However, the choice of joins must be evaluated carefully. Many different sequences of
joins will recreate an instance of a relation. Some sequences are more desirable since they result in the creation
of less invalid data during the join operation.
      Suppose that a relation is decomposed using functional dependencies and multi- valued dependencies.
Then at least one sequence of joins on the resulting relations exists that recreates the original instance with no
invalid data created during any of the join operations.
      For example, suppose that a list of grades by room number is desired. This question, which was probably
not anticipated during database design, can be answered without creating invalid data by either of the following
two join sequences:
      The required information is contained within relations R2 and R4, but these relations
cannot be joined directly. In this case the solution requires joining all 4 relations.
      The database may require a ‘lossless join’ relation, which is constructed to assure that any ad hoc inquiry
can be answered with relational operators. This relation may contain attributes that are not logically related to
each other. This occurs because the relation must serve as a bridge between the other relations in the database.
For example, the lossless join relation will contain all attributes that appear only on the left side of a functional
dependency. Other attributes may also be required, however, in developing the lossless join relation.
      Consider the relational schema R(A, B, C, D) with the functional dependencies A → B and C → D. The
relations R1(A, B) and R2(C, D) are in 4th normal form. A third relation, R3(B, D), however, is required to
satisfy the lossless join property. This relation can be used to join attributes B and D. This is accomplished by
joining relations R1 and R3 and then joining the result to relation R2. No invalid data is created during these
joins. The relation R3(B, D) is the lossless join relation for this database design.
      A relation is usually developed by combining attributes about a particular subject or entity. The lossless
join relation, however, is developed to represent a relationship among various relations. The lossless join
relation may be difficult to populate initially and difficult to maintain - a result of including attributes that are
not logically associated with each other.
      The attributes within a lossless join relation often contain multi-valued dependencies. Consideration of 4th
normal form is important in this situation. The lossless join relation can sometimes be decomposed into smaller
relations by eliminating the multi-valued dependencies. These smaller relations are easier to populate and
maintain.

    Q. 21. What is an ER-diagram? Construct an ER diagram for a hospital with a set of patients and a
set of doctors. Associate with each patient a log of the various tests and examinations conducted.
                                                                    Or
      Discuss in detail the ER diagram.
                                                                   Or
      What is one to many relationship? Give examples.
                                                                  Or
      Draw an ER diagram for a library management system; make suitable assumptions. Describe the
various symbols used in an E-R diagram.
                                                                  Or
      Construct an ER diagram for a university registrar’s office. The office maintains data about each
class, including the instructor, the enrollment and the time and place of the class meetings. For each
student-class pair, a grade is recorded. Also design a relational database for the said E-R diagram.
 

      Ans. The E-R model grew out of the exercise of using commercially available DBMSs to model application
databases. Earlier DBMSs were based on the hierarchical and network approaches; the E-R model is a
generalization of these models. Although it has some means of describing the physical database model, it is
basically useful in the design of the logical database model. This analysis is then used to organize data as
relations, normalize the relations and finally obtain a relational database model.
      The entity-relationship model for data uses three features to describe data. These are:
      1. Entities, which specify distinct real-world items in an application.
      2. Relationships, which connect entities and represent meaningful dependencies  
          between them.
      3. Attributes, which specify properties of entities and relationships.
      We illustrate these terms with an example. A vendor supplying items to a company, for example, is an
entity. The item he supplies is another entity. A vendor and the items supplied are related in the sense that the
vendor supplies an item; the act of supplying defines a relationship between a vendor and an item. An entity set
is a collection of similar entities. We can thus define a vendor set and an item set. Each member of an entity set
is described by some attributes. For example, a vendor may be described by the attributes:
      (vendor code, vendor name, address)
      An item may be described by the attributes:
      (item code, item name)
      A relationship can also be characterized by a number of attributes. We can think of the relationship supply
between vendor and item entities; the relationship supply can be described by the attributes (order no., date of
supply).
 

Relationship between Entity Sets

      The relationship between entity sets may be many-to-many (M: N), one-to-many (1: M), many-to-one (M:
1) or one-to-one (1:1). The 1:1 relationship between entity sets E1 and E2 indicates that for each entity in
either set there is at most one entity in the second set that is associated with it. The 1: M relationship from
entity set E1 to E2 indicates that for an occurrence of the entity from the set E1, there could be zero, one or
more entities from the entity set E2 associated with it. Each entity in E2 is associated with at most one entity in
the entity set E1. In the M: N relationship between entity sets E1 and E2, there is no restriction to the number
of entities in one set associated with an entity in the other set. The database structure, employing the E-R
model is usually shown pictorially using entity-relationship (E-R) diagram.
      To illustrate these different types of relationships consider the following entity sets: DEPARTMENT,
MANAGER, EMPLOYEE, and PROJECT
      The relationship between a DEPARTMENT and a MANAGER is usually one-to- one; there is only one
manager per department and a manager manages only one department. This relationship between entities is
shown in Figure. Each entity is represented by a rectangle and the relationship between them is indicated by a
direct line. The relationships from MANAGER to DEPARTMENT and from DEPARTMENT to MANAGER are
both 1:1. Note that a one-to-one relationship between two entity sets does not imply that for an occurrence of
an entity from one set at any time there must be an occurrence of an entity in the other set. In the case of an
organization, there could be times when a department is without a manager or when an employee who is
classified as a manager may be without a department to manage. Figure shows some instances of one-to-one
relationships between the entities DEPARTMENT and MANAGER.

      A one-to-many relationship exists from the entity MANAGER to the entity EMPLOYEE because there are
several employees reporting to the manager. As we just pointed out, there could be an occurrence of the entity
type MANAGER having zero occurrences of the entity type EMPLOYEE reporting to him or her. A reverse
relationship, from EMPLOYEE to MANAGER, would be many to one, since many employees may be
supervised by a single manager. However, given an instance of the entity set EMPLOYEE, there could be only
one instance of the entity set MANAGER to whom that employee reports (assuming that no employee reports
to more than one manager). The relationship between these entities is illustrated in the figures, which show some
instances of this relationship.
      Figure: 1:M Relationship

      Figure: Instances of 1: M Relationship

      The relationship between the entity EMPLOYEE and the entity PROJECT can be derived as follows: Each
employee could be involved in a number of different projects, and a number of employees could be working on
a given project. This relationship between EMPLOYEE and PROJECT is many-to-many. It is illustrated in the
figures, which show some instances of such a relationship.
      Figure: M : N Relationship

      Figure: Instances of M:N Relationship


      In the entity-relationship (E-R) diagram, entities are represented by rectangles, relationships by a diamond-
shaped box and attributes by ellipses or ovals. The following
E-R diagram for vendor, item and their relationship is illustrated in Figure (a).

                Figure (a): E-R diagram for vendor; item and their Relationship
 
Representation of Entity Sets in the form of Relations
 
      Entity-relationship diagrams are useful in representing the relationships among entities; they show the
logical model of the database. E-R diagrams allow us to have an overview of the important entities and
relationships for developing an information system. Having obtained the E-R diagrams, the next step is to
replace each entity set and relationship set by a table, or relation. Each table has a name; the name used is the
entity name. Each table has a number of rows and columns. Each row corresponds to a member of the entity set.
Each column corresponds to an attribute. Thus, in the E-R diagram, the vendor entity is replaced by the table below.
Table: Table For the Entity Vendor

      The above table is also known as a relation. Vendor is the relation name. Each row of a relation is called a
tuple. The titles used for the columns of a relation are known as relation attributes. Each tuple in the above
example describes one vendor. Each element of a tuple gives a specific property of that vendor. Each property is
identified by the title used for an Attribute column. In a relation the rows may be in any order. The columns
may also be depicted in any order. No two rows can be identical.
      Since it is inconvenient to show the whole table corresponding to a relation, a more concise notation is
used to depict a relation. It consists of the relation name and its attributes. The identifier of the relation is
shown in bold face.
      A specified value of a relation identifier uniquely identifies the row of a relation.
      If a relationship is M: N, then the identifier of the relationship entity is a composite identifier, which
includes the identifiers of the entity sets, which are related. On the other hand, if the relationship is 1:N, then
the identifier of the relationship entity is the identifier of one of the entity sets in the relationship. For example,
the relations and identifiers corresponding to the E-R diagram of Figure are as shown:

 
      Figure: E-R Diagram for Teacher, Student and their relationship
Teacher (Teacher-id, name, department, address)
Teaches (Teacher-id, Student-id)
Student (Student-id, name, department, address)
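      As a hedged sketch (data types are assumed, and Teaches is treated as an M:N relationship, as its composite
identifier suggests), the three relations above can be declared directly in SQL; the composite primary key of
Teaches is built from the identifiers of the two entity sets it relates.

      CREATE TABLE Teacher (
          teacher_id INT PRIMARY KEY,
          name       VARCHAR(30),
          department VARCHAR(30),
          address    VARCHAR(60)
      );

      CREATE TABLE Student (
          student_id INT PRIMARY KEY,
          name       VARCHAR(30),
          department VARCHAR(30),
          address    VARCHAR(60)
      );

      CREATE TABLE Teaches (
          teacher_id INT REFERENCES Teacher (teacher_id),
          student_id INT REFERENCES Student (student_id),
          PRIMARY KEY (teacher_id, student_id)   -- composite identifier of the relationship
      );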
      One may ask why an entity set is being represented as a relation. The main reasons
are the ease of storing relations as flat files in a computer and, more importantly, the existence of a sound theory
of relations, which ensures good database design. The raw relations obtained as a first step in the above
examples are transformed into normal relations. The rules for transformation, called normalization, are based
on sound theoretical principles and ensure that the final normalized relations reduce duplication of data, ensure
that no mistakes occur when data are added or deleted, and simplify retrieval of the required data.

      Q. 22. Discuss the relational approach of database management systems. Explain, with the help of
suitable relational operations, how to demonstrate the insert, delete and update functions.
                                                                      Or
      What is the relational model? Compare and contrast it with the network and hierarchical models.
 
      Ans. Database models are collections of conceptual tools for describing data
semantics and data constraints.
      A DBMS has a number of ways to represent data, but the important and commonly
used models are of four types, among which three are mainly used. These are:
      I. Relational Model or Relational Approach
      II. Hierarchical Model or Hierarchical Approach
      III. Network Model or Network Approach
I. Relational Data Model

      The relational data model has been developed through deep research, testing and refinement over many stages.
This model has the advantages that it is simple to implement and easy to understand, and we can express queries
in it using a query language. In this model a relation is constructed by setting up associations among the
attributes of an entity as well as relationships among different entities. One of the main reasons for introducing
this model was to increase the productivity of application programmers by eliminating the need to change
application programs when a change is made to the database. In this model the user need not know the exact
physical structure; the data structure used in the model represents both entities and the relationships between
them. We can explain the relational view of data, i.e. the relational approach, on the basis of the following
example.
      Suppose there are three tables in which data is organized. These tables are the Supplier table (S table or S
relation), the Part table (P table or P relation) and the Shipment table (SP table or SP relation). The S table has
the fields, or attributes, supplier number (S#), supplier name, status of the supplier and the city in which the
supplier resides. Similarly, the P table has the fields part number (P#), part name, part colour, weight of the part
and the location where the part is stored. The SP table contains the fields supplier number (S#), part number (P#)
and the quantity which the supplier can ship. Each supplier has a unique supplier number S# and, similarly, each
part has a unique part number P#. These three tables are called relational tables. The S table is also called the
S relation because it gives the relationship between its different attributes. These attributes are the field names
and appear as columns; the rows of such a table are called tuples. The pool of values from which the actual
values appearing in a given column are drawn is called the domain of that column. For example, in the S table
S#, SName and Status are attributes, and the set of values each may take is its domain. A relational table, or
relation, can be defined as:
      Definition: A relation represented by a table having n columns, defined on the domains D1, D2, ..., Dn, is a
subset of the cartesian product D1 x D2 x ... x Dn.
      Another definition is: if D1, D2, D3, ..., Dn are n sets, then R is a relation on these n sets if it is a set of
ordered n-tuples such that the ith value of each tuple belongs to Di. These three relations are represented by the
diagrams below:
      S table (Entity) or S Relation:

     
      P table (Entity) or P Relation:

    
       As in the S table, insertion, deletion and modification can be done easily.
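      As a hedged SQL sketch of these operations on the S relation described above (a table like
S(S_no, S_name, status, city) is assumed to exist, the '#' is dropped from the column names, and the sample
values are invented):

      -- Insert a new supplier tuple into S
      INSERT INTO S (S_no, S_name, status, city)
      VALUES ('S6', 'Mehta', 20, 'Delhi');

      -- Update: change the status of supplier S6
      UPDATE S SET status = 30 WHERE S_no = 'S6';

      -- Delete: remove supplier S6 again
      DELETE FROM S WHERE S_no = 'S6';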
II. Hierarchical Model
       It is a tree structure. It has one root and many branches; we call it a parent-child relationship. In this model a
single file has relations with many files, and we can say that it is the arrangement of individual data with group
data. In an organization chart the manager is the parent (root) and the employees working under the manager
are its children. The representation of this model is expressed by linking different tables. Such a representation
is better when one record is linked with many related records, but it can sometimes create ambiguity in designing
and defining the associations and relationships between entities.
      SP table (Entity) or SP Relation:

                   
      In the hierarchical approach, insertion can be done only if the new child has a parent, and insertion on the child
side is easy. Deletion and insertion are generally easy, but you cannot delete a parent that still has one or more
children. In the parent-child relationship, updation of both parent and child is difficult.
III. Network Approach
      It is a complex approach of DBMS. In this we link all the records by using a chain or pointer. It has many
to many relationships. Network approach is created when there are more than one relations in the database
system. Network approach starts from on point and after connecting similar type of data it returns back to the
same record.
      Network approach is more symmetric than the hierarchical structure. In network model insertion at any
point is very complex. We can insert only by creating a new record having linkage with other record. Similarly
deletion is also complex if we delete any record than chain disconnect and whole structure vanish. Updation is
also complex because we cannot change name or any data record because it connected with each other.
Difference between Relational, Hierarchical and Network Approaches:
      (A) Relational Approach: The relational approach (RA) has relationships between different entities and
between the attributes within a particular entity. The RA is in tabular form and has one-to-one relationships; the
R table is in asymmetric form. Insertion, deletion and updation in an R table are very easy. Languages and
systems used with the RA include SQL, Ingres, Oracle and Sybase. The RA is simple in nature; it creates
relationships between different entities and between different attributes in the same entity, and it is the best of
the three approaches for representing data.
       (B) Hierarchical Approach: The hierarchical approach (HA) creates a linkage between two or more entities.
The HA has a parent-child relationship, i.e. one-to-many relationships, and the relationship is put into symmetric
form by defining parents and their children. Insertion, deletion and updation are a little more difficult than in
the RA. The HA has the IMS language, which is largely theoretical, and it is complex in nature.
      (C) Network Approach: The network approach (NA) has chains among many entities, using a chaining or
pointer technique, and it has many-to-many relationships. The NA relationship is in fully symmetric form
because it has one chain symmetry. Insertion, deletion and updation are very difficult. The NA has the DBTG
(Database Task Group) set, having different classes and members. It is more complex than both the RA and the HA.

  Q. 26. Write a short note on Mapping.


 
      Ans. Mappings
      • The conceptual/internal mapping:
         o defines the correspondence between the conceptual view and the internal view;
         o specifies the mapping from conceptual records to their stored counterparts.
      • An external/conceptual mapping:
         o defines the correspondence between a particular external view and the conceptual view.
      • A change to the storage structure definition means that the conceptual/internal mapping must be changed
accordingly, so that the conceptual schema may remain invariant, achieving physical data independence.
      • A change to the conceptual definition means that the conceptual/external mapping must be changed
accordingly, so that the external schema may remain invariant, achieving logical data independence.

      Q. 27. Distinguish between RDBMS and DBMS.


 
      Ans.

 
Q. 1. What is relational algebra?
 
      Ans. Relational Algebra and Relational Calculus are two approaches to specifying manipulations on
relational databases. The distinction between them is somewhat analogous to that between procedural and
declarative programming.
      Relational Algebra is equivalent to Relational Calculus, in that every expression in one has an equivalent expression
in the other. Thus relational completeness of a database language can also be established by showing that it can
define any relation expressible in Relational Algebra.
      Relational Algebra comprises a set of basic operations. An operation is the application of an operator to
one or more source (or input) relations to produce a new relation as a result. More abstractly, we can think of
such an operation as a function that maps arguments from specified domains to a result in a specified range. In
this case, the domain and range happen to be the same, i.e. relations.

      Q. 2. Define relational algebra. Explain the various traditional set operations and relational
operations of it. Or: Discuss the basic operations that can be performed using relational algebra and SQL.
 
      Ans. Relational Algebra comprises a set of basic operations. An operation is the application of an operator
to one or more source (or input) relations to produce a new relation as a result. This is illustrated in Figure 8.1
below. More abstractly, we can think of such an operation as a function that maps arguments from specified
domains to a result in a specified range. In this case, the domain and range happen to be the same, i.e. relations.
      Relational Algebra is a procedural language. It specifies the operations to be performed on existing relations
to derive result relations, and it therefore defines the complete schema for each of the result relations. The
relational algebraic operations can be divided into basic set-oriented operations and relational-oriented
operations. The former are the traditional set operations; the latter are those for performing joins, selection,
projection, and division.
      Relational Algebra is a collection of operations to manipulate relations. Each operation takes one or more
relations as its operands and produces another relation as its result. Some mainly used operations are join,
selection and projection. Relational algebra is a procedural language. It specifies the operations to be
performed on existing relations to derive relations. The relational algebra operations can be divided into basic
set oriented operations and relational oriented operations. The former are the traditional set operations, the
latter are joins, selections, projection and division.
Basic Operations
      Basic operations are the traditional set operations: union, difference, intersection and cartesian product.
Three of these four basic operations - union, intersection, and difference - require that the operand relations be
union compatible. Two relations are union compatible if they have the same degree and a one-to-one
correspondence of the attributes, with the corresponding attributes defined over the same domain. The cartesian
product can be defined on any two relations. Two relations P and Q are said to be union compatible if both P
and Q are of the same degree n and the domains of the corresponding
n attributes are identical, i.e. if P = P[P1, ..., Pn] and Q = Q[Q1, ..., Qn], then
Dom(Pi) = Dom(Qi) for i = 1, 2, ..., n
      where Dom(Pi) represents the domain of the attribute Pi.
      Some basic operations used in Relational Algebra are:
      Traditional Set Operations: Further traditional set operations are subdivided as:
      (a) UNION
      (b) INTERSECTION
      (c) DIFFERENCE
      (d) CARTESIAN PRODUCT
      Relational Set Operators: Similarly further Relational Set Operations subdivided as:
      (a) PROJECTION
      (b) SELECTION
      (c) JOIN
      (d) DIVISION
Traditional Set Operations
       (i) UNION (U): The union of two relations A and B is obtained by the UNION command as:
                                                         A UNION B
      It is the set of all tuples belonging to either A or B or both. Let us consider the sets A and B as:
      Table A:

      Table B:

      If P and Q are two sets, then R is the resultant set of the union operation. R is represented by:
                                                R = P U Q
      For example, let us consider A to be the set of supplier tuples for suppliers in London
and B to be the set of suppliers who supply part P1. Then A UNION B is the set of supplier
tuples for suppliers who are either located in London or supply part P1 (or both).
The union is denoted by the symbol U, so we can write it as:
                                   A U B (in mathematical form)

              
 
   (ii) Intersection (∩): The intersection operation selects the common tuples from the two relations. It is
denoted by the symbol ∩. The intersection of two relations A and B is defined as:
                                                A ∩ B
      or                                        A INTERSECT B
      For example, if A and B are two sets, then the intersection between these two is the set R1.

      (iii) Difference (-): The difference operation removes common tuples from the first relation. The difference
between two relations A and B is defined as:
                                                                  A MINUS B
      It is the set of all tuples belonging to set A but not belonging to B. It is denoted by (-), and we can represent
it as A - B. For example, for the two sets A and B given above, the difference between A and B is represented
as:

      (iv) Cartesian Product: It is denoted by ‘X’ or ‘x’ (cross). The cartesian product of two relations A and B is
defined as:
                                                           A TIMES B
      or                                                   A x B
      The extended cartesian product, or simply the cartesian product, of two relations is the concatenation of
tuples belonging to the two relations. A new resultant relation schema is created consisting of all possible
combinations of the tuples and is represented as:
                                                   R = P x Q
      For example, let us consider A to be the set of all supplier numbers and B the set of all part numbers. Then A
TIMES B is the set of all possible supplier number / part number pairs, as:
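      In SQL these traditional set operations correspond to UNION, INTERSECT and EXCEPT (called MINUS in
Oracle), while the cartesian product is a CROSS JOIN. A hedged sketch, assuming two union-compatible tables
A and B with identical columns (support for INTERSECT and EXCEPT varies between products):

      SELECT * FROM A UNION     SELECT * FROM B;  -- A UNION B
      SELECT * FROM A INTERSECT SELECT * FROM B;  -- A INTERSECT B
      SELECT * FROM A EXCEPT    SELECT * FROM B;  -- A MINUS B (Oracle writes MINUS)
      SELECT * FROM A CROSS JOIN B;               -- A TIMES B (cartesian product)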

Relational Set Operations:
      The basic set operations, which provide a very limited data manipulation facility, have been supplemented
by the definition of the following operations:
      (i) Projection (π): The projection of a relation is defined as the projection of all its tuples over some set of
attributes, i.e. it yields a vertical subset of the relation. The projection operation is used either to reduce the
number of attributes in the result or to reorder the attributes. For example, if P is the table below, then we can
project on the Name field and get the resultant projected table:

                          
      (ii) Selection (σ): Selection is the selection of some tuples based on some condition. It yields a horizontal
subset of a relation and is denoted by σ. It reduces the number of tuples in a relation. For example, if P is the
relation, then R is the resultant table after a selection on P with the condition that all tuples having roll no. < 105
are selected:
R = σ (RN < 105) (P)
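      In SQL, projection corresponds to listing columns in the SELECT clause and selection corresponds to the
WHERE clause. A hedged sketch on the student table P of this example (the column names Name and RN are
assumed from the text):

      -- Projection: only the Name attribute of P (DISTINCT removes duplicate rows)
      SELECT DISTINCT Name FROM P;

      -- Selection: only the tuples of P with roll number below 105
      SELECT * FROM P WHERE RN < 105;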

                  
      (iii) Join (⋈): The join operator allows the combining of two relations to form a single new relation.
Joins are of three types:
      (i) Theta Join
      (ii) Natural Join
      (iii) Equi Join
      A theta join is the joining of two tables on the basis of a general comparison condition. An equi join is the
joining of two tables on an equality condition over a common key field, and a natural join is an equi join over
the common attributes in which the duplicated common columns are removed. For example, the tables S and P
can be joined on the CITY field with the condition S.CITY = P.CITY.
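      A hedged SQL sketch of the equi join just mentioned, assuming the S and P tables each carry a city column:

      -- Equi join of S and P on their common CITY attribute
      SELECT S.S_no, S.S_name, P.P_no, P.P_name, S.city
      FROM   S
      JOIN   P ON S.city = P.city;

      -- A theta join uses a more general comparison in the ON clause,
      -- e.g. ... JOIN P ON S.status > P.weight;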
      (iv) Division (÷): The division operator divides a dividend relation A of degree m + n by a divisor relation
B of degree n. It produces a result relation of degree m. Suppose A is a relational table of suppliers and the parts
they supply, and B is a relational table of different types of parts; then A DIVIDED BY B gives the resultant
table R of the suppliers that supply all the parts in B.

 
      Q. 3. What are single-valued and multivalued attributes?
 
      Ans. Multi-Valued Dependencies and Fourth Normal Form
      A single-valued attribute holds only one value for each entity instance, whereas a multi-valued attribute may
hold several values for the same instance (for example, the several phone numbers of one person); multi-valued
dependencies between such attributes lead to fourth normal form, which builds on BCNF as discussed below.
      BCNF was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF: every relation in
BCNF is also in 3NF, whereas a relation in 3NF is not necessarily in BCNF.
      Definition: A normalized relation scheme R<S, F> is in BCNF if for every nontrivial FD in F of the form
X → A, where X ⊆ S and A ∈ S, X is a super key of R.
      BCNF is a special case within 3NF.
Features:
      1. Key attributes (candidate keys) are composite (there is no single key which identifies a record).
      2. More than one candidate key is present.
      3. In each candidate key at least one attribute is overlapping.
      If a table follows the above-mentioned three features, then we check whether the table is in BCNF.

      Course ID is such an attribute which is overlapping; hence the relation shown is in 3NF and also in BCNF.
      A relation schema R(S, F) is in BCNF (where S is the set of attributes and F the set of functional
dependencies) if, for every set of attributes X which is a subset of S and every attribute Y which belongs to S,
one of the following two conditions holds:
      (i) either Y belongs to X (that is, X → Y is a trivial dependency), or
      (ii) X is a super key.
      Whereas:
      Trivial dependency: if the right-hand side is a subset of the left-hand side, the dependency is known as a
trivial dependency.
      Super key: the primary key together with any additional attribute is known as a super key.
A relation is in BCNF if every determinant is a candidate key.

    — It is in 1NF by definition.
    — It is in 2NF since any non-key attributes are dependent on the entire key.
    — It is in 3NF because it has no transitive dependencies.
    — It is not in BCNF because it has a determinant, FName, that is not a candidate
      key.

       
       STU-ADV: key (SID, FName); ADV-SUBJ: key (FName). These decomposed relations are in BCNF.
      Now we can say that a relation is in BCNF if and only if every nontrivial, left-irreducible FD has a
candidate key as its determinant; or, less formally, a relation is in BCNF if and only if the only determinants are
candidate keys.

  
      Q. 4. Define the term data manipulation language.
 
      Ans. DML (Data Manipulation Language): The language that provides the statements used to retrieve, insert,
update and delete the data stored in the database is called the data manipulation language (DML); it may be used
interactively or embedded in a host programming language. DDL (Data Definition Language): Database
management systems provide a facility known as the Data Definition Language or data description language
(DDL). The DDL can be used to define the conceptual (global) schema and also to give some details about how
to implement this schema on the physical devices used to store the data.
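      A hedged sketch of the distinction (the dept table and its columns are assumed purely for illustration): the
first two statements are DDL because they define or change the schema, while the last two are DML because
they manipulate the data held under that schema.

      -- DDL: define and later alter the schema
      CREATE TABLE dept (dept_no INT PRIMARY KEY, dept_name VARCHAR(30));
      ALTER TABLE dept ADD COLUMN location VARCHAR(30);

      -- DML: manipulate the data stored under that schema
      INSERT INTO dept VALUES (10, 'Accounts', 'Amritsar');
      UPDATE dept SET location = 'Delhi' WHERE dept_no = 10;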

  
      Q. 5. What is RDBMS?
 
      Ans. Relational Data Base Management Systems (RDBMS) are database management systems that
maintain data records and indices in tables. Relationships may be created and maintained across and among the
data and tables. In recent years, database management systems (DBMS) have established themselves as the
primary means of data storage for information systems ranging from large commercial transaction processing
applications to PC-based desktop applications. At the heart of most of today’s information systems is a
relational database management system (RDBMS). RDBMSs have been the workhorse for data management
operations for over a decade and continue to evolve and mature, providing sophisticated storage, retrieval, and
distribution functions to enterprise-wide data processing and information management systems. Compared to
the file systems, relational database management systems provide organizations with the capability to easily
integrate and leverage the massive amounts of operational data into meaningful information systems. The
evolution of high-powered database engines such as Oracle7 has fostered the development of advanced
“enabling” technologies including client/server, data warehousing, and online analytical processing, all of
which comprise the core of today’s state-of-the-art information management systems.
      Relational Database Management System is a software package which manages a relational database,
optimized for rapid and flexible retrieval of data; also called a database engine.
      In other words, a Relational Database Management System is a computer program that lets you store, index,
and retrieve tables of data. The simplest way to look at an RDBMS is as a spreadsheet that multiple users can
update. One of the most important things that an RDBMS does is provide transactions.

      Relational Database Management System is used to store, process and manage data arranged in relational
tables. Often used for transaction processing and data warehouses. RDBMS has ability to access data organized
in tabular files that can be related to each other by a common field (item). An RDBMS has the capability to
recombine the data items from different files, providing powerful tools for data usage. Relational databases are
powerful because they require few assumptions about how data is related or how it will be extracted from the
database. As a result, the same database can be viewed in many different ways. Almost all full-scale database
systems are RDBMS’s.
      A database management system (like Oracle) in which the database is organized and accessed according to
the relationships between data items. In a relational database, relationships between data items are expressed
by means of tables. Interdependencies among these tables are expressed by data values rather than by pointers.
This allows a high degree of data independence. Some of the best-known RDBMS’s include Oracle,
Informix,  Sybase, PostgreSQL and Microsoft Access.
Characteristics of A Relational Database
      • Relational databases consist of one or more tables; these can be ‘joined’ by the
         database software in queries.
      • Each table consists of rows and fields.
      • Each table is about one aspect (or subject) of the database. Thus contexts and
         finds are different subjects and are in different tables.
      • Each row corresponds to one instance of the subject of the table. Thus each row
         is about one context.
      • Each row must be unique. This is a logical result of the row being about one
         instance. If you have duplicate rows, the results of searching are unpredictable.
      • Each field corresponds to a variable and is named to indicate its role. For example
         finds have a name and a size/weight.
      • Each cell (where the fields and rows intersect) contains only one value. This is
       important because otherwise it is not possible properly to search. Using a  relational    
        database, if you find a need for two values per cell - the design has to be altered.
      • If fields in different tables have the same range of values and are thus about the
         same object, there is an association between the fields and thus the tables - they
         are called ‘keys’. The rows corresponding to matching values can be retrieved
         from different tables.

  
      Q. 6. What do you mean by Relational Constraints?
 
      Ans. The integrity of the data in a relational database must be maintained as multiple users access and
change the data. Whenever data is shared, there is a need to ensure the accuracy of the values within database
tables. The term data integrity has the following meanings:
      1. The condition in which data is identically maintained during any operation, such as
            transfer, storage, and retrieval.
      2. The preservation of data for their intended use.
      3. Relative to specified operations, the a priori expectation of data quality.
      Another aspect of data integrity is the assurance that data can only be accessed and altered by those
authorized to do so.
      Data integrity means, in part, that you can correctly and consistently navigate and manipulate the tables in
the database. There are two basic rules to ensure data integrity: entity integrity and referential integrity. The
entity integrity rule states that the value of the primary key can never be a null value (a null value is one that
entity integrity rule states that the value of the primary key can never be a null value (a null value is one that
has no value and is not the same as a blank). Because a primary key is used to identify a unique row in a
relational table, its value must always be specified and should never be unknown. The integrity rule requires
that insert, update, and delete operations maintain the uniqueness and existence of all primary keys. The
referential integrity rule states that if a relational table has a foreign key, then every value of the foreign key
must either be null or match the values in the relational table in which that foreign key is a primary key.
Types of Data Integrity
      1. Null Rule : A null rule is a rule defined on a single column that allows or disallows inserts or updates of
rows containing a null (the absence of a value) in that column.
     2. Unique Column Values: A unique value rule defined on a column (or set of columns) allows the insert or
update of a row only if it contains a unique value in that column (or set of columns).
      3. Primary Key Values: A primary key value rule defined on a key (a column or set of columns) specifies
that each row in the table can be uniquely identified by the values in the key.
      4. Referential Integrity Rules: A referential integrity rule is a rule defined on a key (a column or set of
columns) in one table that guarantees that the values in that key match the values in a key in a related table (the
referenced value).
      Referential integrity also includes the rules that dictate what types of data manipulation are allowed on
referenced values and how these actions affect dependent values.
Rules for Referential Integrity
      The rules associated with referential integrity are:
      • Restrict: Disallows the update or deletion of referenced data.
      • Set to Null: When referenced data is updated or deleted, all associated
         dependent data is set to NULL.
      • Set to Default: When referenced data is updated or deleted, all associated
         dependent data is set to a default value.
      • Cascade: When referenced data is updated, all associated dependent data is
          correspondingly updated. When a referenced row is deleted, all associated  
          dependent rows are deleted.
      • No Action: Disallows the update or deletion of referenced data. This differs from RESTRICT in that it is
checked at the end of the statement, or at the end of the transaction if the constraint is deferred. (Oracle uses No
Action as its default action.)
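      A brief sketch of how one of these rules might be declared (the customers and orders tables are assumed for
illustration; ON DELETE SET NULL would replace ON DELETE CASCADE to obtain the Set to Null behaviour):
      CREATE TABLE customers (
          cust_id  NUMBER PRIMARY KEY,
          name     VARCHAR2(40)
      );
      CREATE TABLE orders (
          order_id  NUMBER PRIMARY KEY,
          cust_id   NUMBER,
          -- deleting a referenced customer also deletes its dependent orders
          CONSTRAINT fk_orders_cust FOREIGN KEY (cust_id)
              REFERENCES customers (cust_id) ON DELETE CASCADE
      );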
      Complex Integrity Checking: Complex integrity checking is a user-defined rule
for a column (or set of columns) that allows or disallows inserts, updates, or deletes of
a row based on the value it contains for the column (or set of columns).
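      For illustration only (the employees table and the salary range below are assumed), such a user-defined rule is
typically expressed as a CHECK constraint:
      ALTER TABLE employees
          ADD CONSTRAINT chk_salary_range
          CHECK (salary > 0 AND salary <= 100000);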
Integrity Constraints Description
      An integrity constraint is a declarative method of defining a rule for a column of a table. Oracle supports
the following integrity constraints:
      • NOT NULL constraints for the rules associated with nulls in a column
      • UNIQUE key constraints for the rule associated with unique column values
      • PRIMARY KEY constraints for the rule associated with primary identification values
      • FOREIGN KEY constraints for the rules associated with referential integrity.
        Oracle supports the use of FOREIGN KEY integrity constraints to define the
        referential integrity actions, including:
        o Update and delete No Action
        o Delete CASCADE
        o Delete SET NULL

      • CHECK constraints for complex integrity rules
      You cannot enforce referential integrity using declarative integrity constraints if child and parent tables are
on different nodes of a distributed database. However, you can enforce referential integrity in a distributed
database using database triggers (see next section).
Advantages of Integrity Constraints
      This section describes some of the advantages that integrity constraints have over other alternatives, which
include:
     • Enforcing business rules in the code of a database application
     • Using stored procedures to completely control access to data
     • Enforcing business rules with triggered stored database procedures
Types of Integrity Constraints
      You can use the following integrity constraints to impose restrictions on the input of column values:
      • NOT NULL Integrity Constraints
      • UNIQUE Key Integrity Constraints
      • PRIMARY KEY Integrity Constraints
      • Referential Integrity Constraints
      • CHECK Integrity Constraints
 
 
1. NOT NULL Integrity Constraints.
      By default, all columns in a table allow nulls. Null means the absence of a value. A NOT NULL constraint
requires that a column of a table contain no null values. For example, you can define a NOT NULL constraint to
require that a value be input in the last name column for every row of the employees table.
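      A minimal sketch of this constraint (the employees table and its columns are assumed for the example):
      CREATE TABLE employees (
          employee_id  NUMBER PRIMARY KEY,
          last_name    VARCHAR2(30) NOT NULL    -- a value must be supplied in every row
      );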
2. UNIQUE Key Integrity Constraints
      A UNIQUE key integrity constraint requires that every value in a column or set of columns (key) be unique
— that is, no two rows of a table have duplicate values in a specified column or set of columns.
      Unique Keys : The columns included in the definition of the UNIQUE key constraint
are called the unique key. Unique key is often incorrectly used as a synonym for the terms UNIQUE key
constraint or UNIQUE index. However, note that key refers only to the column or set of columns used in the
definition of the integrity constraint. If the UNIQUE key consists of more than one column, then that group of
columns is said to be a composite unique key.
      This UNIQUE key constraint lets you enter an area code and telephone number any number of times, but
the combination of a given area code and given telephone number cannot be duplicated in the table. This
eliminates unintentional duplication of a telephone number.
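      A sketch of such a composite unique key (the subscribers table and its columns are assumed):
      CREATE TABLE subscribers (
          subscriber_id  NUMBER PRIMARY KEY,
          area_code      NUMBER,
          phone_number   NUMBER,
          -- the combination of area code and number cannot be duplicated
          CONSTRAINT uq_phone UNIQUE (area_code, phone_number)
      );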
      UNIQUE Key Constraints and Indexes : Oracle enforces unique integrity constraints
with indexes. For example Oracle enforces the UNIQUE key constraint by implicitly
creating a unique index on the composite unique key. Therefore, composite UNIQUE
key constraints have the same limitations imposed on composite indexes: up to 32
columns can constitute a composite unique key.
      Combining UNIQUE Key and NOT NULL Integrity Constraints: UNIQUE key constraints allow the
input of nulls unless you also define NOT NULL constraints for the same columns. In fact, any number of
rows can include nulls for columns without NOT NULL constraints because nulls are not considered equal to
anything. A null in a column (or in all columns of a composite UNIQUE key) always satisfies a UNIQUE key
constraint.

      Columns with both unique keys and NOT NULL integrity constraints are common. This combination
forces the user to enter values in the unique key and also eliminates the possibility that any new row’s data will
ever conflict with an existing row’s data.
3. PRIMARY KEY Integrity Constraints
      Each table in the database can have at most one PRIMARY KEY constraint. The
values in the group of one or more columns subject to this constraint constitute the
unique identifier of the row. In effect, each row is named by its primary key values. The
Oracle implementation of the PRIMARY KEY integrity constraint guarantees that both
of the following are true:
      • No two rows of a table have duplicate values in the specified column or set of columns.
      • The primary key columns do not allow nulls. That is, a value must exist for the
primary key columns in each row.
      Primary Keys : The columns included in the definition of a table’s PRIMARY KEY
integrity constraint are called the primary key. Although it is not required, every table
should have a primary key so that:
      • Each row in the table can be uniquely identified
      • No duplicate rows exist in the table
      PRIMARY KEY Constraints and Indexes: Oracle enforces all PRIMARY KEY
constraints using indexes. For example, a primary key constraint created for the deptno column
is enforced by the implicit creation of:
      • A unique index on that column
      • A NOT NULL constraint for that column
      Composite primary key constraints are limited to 32 columns, which is the same limitation imposed on
composite indexes. The name of the index is the same as the
name of the constraint. Also, you can specify the storage options for the index by
including the ENABLE clause in the CREATE TABLE or ALTER TABLE statement
used to create the constraint. If a usable index exists when a primary key constraint is
created, then the primary key constraint uses that index rather than implicitly creating a new one.
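      For illustration (a dept table is assumed, following the deptno example above):
      CREATE TABLE dept (
          deptno  NUMBER,
          dname   VARCHAR2(30),
          CONSTRAINT pk_dept PRIMARY KEY (deptno)   -- implicitly gets a unique index and NOT NULL
      );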
 
4. Referential Integrity Constraints
      Different tables in a relational database can be related by common columns, and
the rules that govern the relationship of the columns must be maintained. Referential
integrity rules guarantee that these relationships are preserved.
      The terms foreign key, referenced (parent) key, dependent (child) table and referenced (parent) table are
associated with referential integrity constraints.

      A referential integrity constraint requires that for each row of a table, the value in the foreign key matches a
value in a parent key.
      Self-Referential Integrity Constraints: Another type of referential integrity constraint is called a self-
referential integrity constraint. This type of foreign key references a parent key in the same table.
      For example, such a referential integrity constraint ensures that every value in the mgr column of the emp table
corresponds to a value that currently exists in the empno column of the same table, but not necessarily in the
same row, because every manager must also be an employee. This integrity constraint eliminates the possibility
of erroneous employee numbers in the mgr column.
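      A sketch of this self-referential constraint on the emp table described above (column names and types are
assumed):
      CREATE TABLE emp (
          empno  NUMBER PRIMARY KEY,
          ename  VARCHAR2(30),
          mgr    NUMBER,
          -- every manager number must be an existing employee number
          CONSTRAINT fk_emp_mgr FOREIGN KEY (mgr) REFERENCES emp (empno)
      );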
      Nulls and Foreign Keys: The relational model permits the value of foreign keys either to match the
referenced primary or unique key value, or be null. If any column of a composite foreign key is null, then the
non-null portions of the key do not have to match any corresponding portion of a parent key.

 
      Q.7. What is the difference between the Relational algebra and the Relational Calculus?
 
      Ans.
      1. Relational algebra operations manipulate relations and provide expressions in the form of
queries, whereas relational calculus queries are formed on the basis of expressions (formulas).
      2. RA has operators like join, union, intersection, division, difference, projection, selection etc., whereas
RC has tuple-oriented and domain-oriented expressions.
      3. RA is a procedural language whereas RC is a non-procedural query system.
      4. The expressive power of RA and RC is equivalent. This means any query that can be expressed in RA
can be expressed by a formula in RC.
      5. Any RC formula can be translated into an algebraic query.
      6. Modification of queries is easier in RA than in RC.
      7. RA has a mathematical form and no specific query language; RC also has a mathematical form but
has a query language, QUEL.
      8. Relational algebra is easier to manipulate and understand than RC.
      9. RA queries are more powerful than RC queries.

      10. RC queries are formed as well-formed formulas (WFFs), whereas RA does not form any formula.

  
      Q. 8. Write a note on SQL basic queries.
 
      Ans. Structured Query Language (SQL) is the language used to manipulate relational databases. SQL is
tied very closely with the relational model.
      In the relational model, data is stored in structures called relations or tables. Each table has one or more
attributes or columns that describe the table. In relational databases, the table is the fundamental building block
of a database application. Tables are used to store data on Employees, Equipment, Materials, Warehouses,
Purchase Orders, Customer Orders, etc. Columns in the Employee table, for example, might be Last Name,
First Name, Salary, Hire Date, Social Security Number, etc.
      SQL statements are issued for the purpose of:
      • Data definition - Defining tables and structures in the database (DB).
      • Data manipulation - Inserting new data, Updating existing data, Deleting existing data, and Querying the
Database ( Retrieving existing data from the database).
      Another way to say this is the SQL language is actually made up of 1) the Data Definition Language
(DDL) used to create, alter and drop schema objects such as tables and indexes, and 2) the Data Manipulation
Language (DML) used to manipulate the
data within those schema objects.
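      For illustration (a hypothetical customer table), one DDL statement followed by the basic DML operations:
      CREATE TABLE customer (                      -- DDL: defines a schema object
          cust_id  NUMBER PRIMARY KEY,
          name     VARCHAR2(40),
          city     VARCHAR2(30)
      );
      INSERT INTO customer (cust_id, name, city) VALUES (1, 'Asha', 'Delhi');   -- DML
      UPDATE customer SET city = 'Mumbai' WHERE cust_id = 1;                    -- DML
      SELECT name, city FROM customer WHERE cust_id = 1;                        -- DML (query)
      DELETE FROM customer WHERE cust_id = 1;                                   -- DML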
      SQL*Plus commands allow a user to manipulate and submit SQL statements.
Specifically, they enable a user to:
      • Enter, edit, store, retrieve, and run SQL statements
      • List the column definitions for any table
      • Format, perform calculations on, store, and print query results in the form of reports
      • Access and copy data between SQL databases
      The following is a list of SQL*Plus commands and their functions. The most commonly used commands
are emphasized in italics:
      • / (slash) - Execute the current SQL statement in the buffer - same as RUN
      • ACCEPT - Accept a value from the user and place it into a variable
      • APPEND - Add text to the end of the current line of the SQL statement in the buffer.
      • AUTOTRACE - Trace the execution plan of the SQL statement and gather statistics
      • BREAK - Set the formatting behavior for the output of SQL statements
      • BTITLE - Place a title on the bottom of each page in the printout from a SQL statement
      • CHANGE - Replace text on the current line of the SQL statement with new text
      • CLEAR - Clear the buffer
      • COLUMN - Change the appearance of an output column from a query
      • COMPUTE - Does calculations on rows returned from a SQL statement
      • CONNECT - Connect to another Oracle database or to the same Oracle database under a different user
name
      • COPY - Copy data from one table to another in the same or different databases
      • DEL - Delete the current line in the buffer
      • DESCRIBE - List the columns with data types of a table (Can be abbreviated as DESC)
      • EDIT - Edit the current SQL statement in the buffer using an external editor such as vi or emacs
      • EXIT - Exit the SQL*Plus program
      • GET - Load a SQL statement into the buffer but do not execute it

      • HELP - Obtain help for a SQL*Plus command (In some installations)
      • HOST - Drop to the operating system shell
      • INPUT - Add one or more lines to the SQL statement in the buffer
      • LIST - List the current SQL statement in the buffer
      • QUIT - Exit the SQL*Plus program
      • REMARK - Place a comment following the REMARK keyword
      • RUN - Execute the current SQL statement in the buffer
      • SAVE - Save the current SQL statement to a script file
      • SET - Set an environment variable to a new value
      • SHOW - Show the current value of an environment variable
      • SPOOL - Send the output from a SQL statement to a file
      • START - Load a SQL statement located in a script file and then run that SQL statement
      • TIMING - Used to time the execution of SQL statements for performance analysis
      • TITLE -Place a title on the top of each page in the printout from a SQL statement
      • UNDEFINE - Delete a user defined variable
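      As a brief, hypothetical illustration of a few of these commands in a session (the emp table and the file names
are assumed):
      SQL> DESCRIBE emp
      SQL> SELECT ename, sal FROM emp WHERE deptno = 10;
      SQL> SAVE dept10_query.sql
      SQL> SPOOL report.txt
      SQL> /
      SQL> SPOOL OFF
      Here / re-runs the statement still held in the buffer, and its output is also written to report.txt until SPOOL OFF.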

  
      Q 9. What are the various features of SQL?
 
      Ans. SQL Features -
      1. It is meant to be an English-like language using set English phrases to manipulate the database. How
well it achieves this is questionable.
      2. It is non-procedural. You specify the information required, not the navigation and operations required to
access the data. Each RDBMS has an inbuilt query optimiser which parses your SQL statements and works out
the optimum path to the required data.
      3. When you query data, all the rows affected by your statement are dealt with in one go as a set, they are
not dealt with separately. The work area that holds the set is known as a CURSOR.
      4. SQL encompasses a range of uses and users. DBA’s, application programmers, management and end
users can use SQL.
      5. It provides commands for the following tasks :-
      • querying data
      • inserting, updating and deleting data
      • creating, modifying and deleting database objects
      • controlling access to the database and database objects
      • guaranteeing database consistency
      • monitoring database performance and configuration

  
      Q. 10. What is a trigger?
 
      Ans. Triggers are special stored procedures that are executed when a table undergoes an INSERT, a
DELETE, or an UPDATE operation. Triggers often enforce referential integrity and can also call other stored
procedures. In other words, triggers are parameter-less procedures that are triggered (fired) either before or after inserting,
updating or deleting rows from a table. Because they are fired by the event and not by choice, they cannot have
parameters.
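      A minimal sketch of such a trigger in Oracle PL/SQL syntax (the emp and audit_log tables are assumed for
illustration):
      CREATE OR REPLACE TRIGGER trg_emp_audit
      AFTER INSERT OR UPDATE OR DELETE ON emp
      BEGIN
          -- record that something changed in EMP; fired once per triggering statement
          INSERT INTO audit_log (event_time, table_name) VALUES (SYSDATE, 'EMP');
      END;
      /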

  
      Q. 11. What is the difference between a procedural and a non-procedural language?
 
      Ans.
Non procedural
      • Can be used on its own to specify complex database operations.
      • DBMSs allow DML statements to be entered interactively from a terminal, or to be embedded in a
programming language. If the commands are embedded in a general purpose programming language, the
statements must be identified so they can be extracted by a pre-compiler and processed by the DBMS.
Procedural
      • Must be embedded in a general purpose programming language.
      • Typically retrieves individual records or objects from the database and processes each separately.
      • Therefore it needs to use programming language constructs such as loops.
      • Low-level DMLs are also called record-at-a-time DMLs because of this.
      • High-level DMLs, such as SQL can specify and retrieve many records in a single DML statement, and are
called set at a time or set oriented DMLs.
      • High-level languages are often called declarative, because the DML often specifies what to retrieve,
rather than how to retrieve it.
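      To illustrate the contrast (a sketch only; the emp table and the 10% raise are assumed), compare a set-oriented
SQL statement with a record-at-a-time PL/SQL loop that produces the same effect:
      -- set-oriented (declarative): one statement affects all matching rows
      UPDATE emp SET sal = sal * 1.10 WHERE deptno = 10;

      -- record-at-a-time (procedural): each row is fetched and processed in a loop
      BEGIN
          FOR r IN (SELECT empno FROM emp WHERE deptno = 10) LOOP
              UPDATE emp SET sal = sal * 1.10 WHERE empno = r.empno;
          END LOOP;
      END;
      /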

 
       Q. 12. Consider the following employee database, where the primary keys are underlined.
      Employee (person-name, street, city)
      Works (person-name, company-name, salary)
      Company (company-name, city)
      Managers (person-name, manager-name)
      Give an expression in SQL for each of the following queries.
      (i) Find the names of all employees who work for First Bank Corporation and live in Las Vegas.
      (ii) Find the names, street address and cities of residences of all employees who work for First Bank
Corporation and earn more than $10000.
      (iii) Find all employees who do not work for First Bank Corporation. (iv) Find the company that has
the smallest payroll.
      (v) Find all employees in the database who do not live in the same cities and on the same streets as do
their managers.
 
      Ans. (i) Select employee.person-name from employee, works where employee.person-name =
works.person-name and company-name = 'First Bank Corporation' and employee.city = 'Las Vegas'.
      (ii) Select employee.person-name, street, city from employee, works where employee.person-name =
works.person-name and company-name = 'First Bank Corporation' and salary > 10000.
      (iii) Select person-name from works where company-name <> 'First Bank Corporation'.
      (iv) Select company-name from works group by company-name having sum(salary) <= all
(select sum(salary) from works group by company-name).
      (v) Select e.person-name from employee e, managers m, employee r where e.person-name = m.person-name
and m.manager-name = r.person-name and (e.street <> r.street or e.city <> r.city).

  
      Q. 13. Consider the following relational database and give an expression in relational algebra to
express each of the following queries:
      employee (person-name, street, city)

      works (person-name, company-name, salary)
      company (company-name, city)
managers (person-name, manager-name)
      (a) Find the names of all employees who work for First Bank Corporation.
      (b) Find the names and cities of residences of all employees who work for First
Bank Corporation.
      (c) Find the names of all employees who do not work for First Bank Corporation.
      (d) Find names of all employees who earn more than $10000 per annum.
      (e) Find names of all employees who earn more than every employee of Small Bank Corporation.
 
      Ans. (a) Π person-name (σ company-name = "First Bank Corporation" (works))
      (b) Π person-name, city (employee ⋈ σ company-name = "First Bank Corporation" (works))
      (c) Π person-name (works) − Π person-name (σ company-name = "First Bank Corporation" (works))
      (d) Π person-name (σ salary > 10000 (works))
      (e) Π person-name (works) − Π works.person-name (works ⋈ works.salary ≤ w2.salary
(ρ w2 (σ company-name = "Small Bank Corporation" (works))))

 
      Q. 14. List any two procedural programming languages.
 
      Ans. 1. PL/pgSQL (PostgreSQL)
      2. SQL PL (DB2)
      3. PL/SQL.

 
      Q. 15. What are row triggers?
 
      Ans. A row level trigger is fired each time the table is affected by the triggering statement e.g., if an update
statement updates multiple rows of a table, a row trigger is fired once for each row affected by the update
statement. If a triggering statement affects no rows, a row trigger is not executed at all.
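      A sketch of a row-level trigger in Oracle PL/SQL syntax (the emp and salary_history tables are assumed); the
FOR EACH ROW clause is what makes it a row trigger:
      CREATE OR REPLACE TRIGGER trg_salary_history
      AFTER UPDATE OF sal ON emp
      FOR EACH ROW
      BEGIN
          INSERT INTO salary_history (empno, old_sal, new_sal, changed_on)
          VALUES (:OLD.empno, :OLD.sal, :NEW.sal, SYSDATE);
      END;
      /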

  
      Q. 16. Define the term DDL.
 
     Ans. DDL is data definition languages.

  
      Q. 17. Consider the following relational database Employee (person-name, street, city)
      Works (person-name, company-name, salary)
      Company (company-name, city)
      Manager (person-name, manager-name)
 
      Ans. Similar Question Chapter-2, Question No. 22.

  

      Q. 18. Give relational algebra expression for each of the following queries:
      (a) Find the names of all employees who work for first Bank corporation.
      (b) Find names, cities of residence of all employees who work for first Bank corporation and earn
more than $10,000.
 
      Ans. Similar Question Chapter-2, Question No. 13.

  
      Q. 19. Define ‘View’.
 
      Ans. A view is a method of organising table data to meet a specific need. Views
are based on select statements which derive their data from real tables.
CREATE VIEW
      Create a new view based on tables in the database. The table names must already exist. The new view name
must not exist. CREATE VIEW has the following syntax:

                              CREATE VIEW view_name [(column_list)] AS select_statement;
      Additional information on the SELECT statement and SQL queries can be found in
the next section.
      Note that an ORDER BY clause may not be added to the sql select statement when
defining a view.
      In general, views are read-only. That is, one may query a view but it is normally
the case that views cannot be operated on with INSERT, UPDATE or DELETE. This is
especially true in cases where a view joins two or more tables together or when a view
contains an aggregate function.
DROP VIEW
      Drop a view from the database. The view name must already exist in the database. The syntax for the
DROP VIEW command is:
                              DROP VIEW view_name;
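      A short, hypothetical example of creating, querying and dropping a view (the emp table is assumed):
      CREATE VIEW dept10_emp AS
          SELECT empno, ename, sal FROM emp WHERE deptno = 10;

      SELECT ename FROM dept10_emp WHERE sal > 2000;

      DROP VIEW dept10_emp;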

  
      Q. 20. Define entity and attribute. Or What do you mean by Entities and Attributes?
 
      Ans. Entities and their Attributes:
      Entities are the basic units in modeling classes of concrete (real) or abstract objects. Entities may have a concrete
existence or may represent ideas or concepts; e.g., a building, a room, a chair, an employee etc. are all different entities.
An entity type or entity set is a group of similar objects of an organization, which is used for maintaining the
data. Examples of entity sets are transactions, job positions, employees, inventories of raw and finished
products, students, academic staff, non-academic staff, manager etc.

      An object can belong to different entity sets simultaneously. A person can be a student as well as a part-
time employee. Consider the modeling of a flight crew. It consists of a group of individuals employed by an
organization who belong to the entity sets EMPLOYEE and PERSON.
      The individual members of the flight crew have different skills and functions. So the entity set EMPLOYEE
adds the attribute skill with its possible values; thus the entity set EMPLOYEE has a relationship with the attribute skill.
      To store data on an entity set, we have to create a model for it. For example, employees of an organization
are modeled by the entity set EMPLOYEE. We must have some properties as characteristics of employee that
may be useful to the organization. Some of these properties are employee name, employee no., employee
address, employee skill and employee pay. The properties that characterize an entity set are called its attributes.
An attribute is also referred to by the terms data item, data element, data field, item, elementary item or object
property.

 
 


Q. 1. What is normalization? Discuss various Normal forms with the help of examples.

      Ans. Normalization is a design technique that is widely used as a guide in designing relational databases.
Normalization is essentially a two-step process that puts data in tabular form by removing repeating groups and
then removes duplicated data from the relational tables.
      Normalization theory is based on the concept of normal forms. A relational table is said to be in a particular
normal form if it satisfies a certain set of constraints. There are currently five normal forms that have been
defined. In this section, we will cover the first three normal forms that were defined by E. F. Codd.
Significance of Normalization
      • Improves update efficiency,
      • Removes many causes of anomalous data dependencies
      • Allows better checks for consistency.
      • Is (usually) better for query handling.
      • But there are computational penalties in some SQL operations.
      Normalization is also significant due to the following reasons:
      1. To make it feasible to represent any relation in the database
      2. To obtain powerful relational retrieval using relational operators
      3. To free relations from undesirable insertion, update and deletion anomalies
      4. To reduce the need for restructuring the relations as new data types are introduced
Normalization Avoids.
      • Duplication of Data — The same data is listed in multiple lines of the database
 
      • Insert Anomaly - A record about an entity cannot be inserted into the table without first inserting
information about another entity - Cannot enter a customer without a sales order
      • Delete Anomaly - A record cannot be deleted without deleting a record about a related entity. Cannot
delete a sales order without deleting all of the customer’s information.
      • Update Anomaly - Cannot update information without changing information in many places. To update
customer information, it must be updated for each sales order the customer has placed
Before Normalization

      1. Begin with a list of all of the fields that must appear in the database. Think of this as one big table.
      2. Do not include computed fields
      3. One place to begin getting this information is from a printed document used by the system.
      4. Additional attributes besides those for the entities described on the document can be added to the
database.
Normal Forms
      The normalization process, as first proposed by Codd (1972), takes a relation schema through a series of
tests to “certify” whether it satisfies a certain normal form. The process, which proceeds in a top-down fashion
by evaluating each relation against the criteria for the normal forms and decomposing relations as necessary, can thus be
considered as relational design by analysis. Initially, Codd proposed three normal forms, which he called first,
second and third normal form. A stronger definition of 3NF—called Boyce-Codd normal form (BCNF)—was
proposed later by Boyce and Codd. All these normal forms are based on the functional dependencies among the
attributes of a relation. Later, 4NF and 5NF were proposed, based on the concepts of multivalued dependencies
and join dependencies, respectively.
 
 
 
Need of Normalization
      Normalization of data can hence be looked upon as a process of analyzing the given relation schemas based
on their FDs and primary keys to achieve the desirable properties of:
      1. Minimizing redundancy.
      2. Minimizing the insertion, deletion, and update anomalies.
      Normal forms are based on primary key.
      Normalization: It is the process of structuring an unstructured relation into a structured one with the purpose
of removing redundancy and anomalies.
First Normal Form (1NF)
      Definition : A relation schema is said to be in 1NF if the values in the domain of each attribute of the relation
are atomic. In other words, only one value is associated with each attribute and the value is not a set of values
or a list of values. A database schema is in 1NF if every relation schema included in the database schema is in 1NF.
      A relation is in 1NF if and only if all underlying domains contain scalar values only. Here scalar means
atomicity, meaning there should be a single value at the intersection of each row and column, as shown in the
FIRST relation obtained from the original relations.

   
 
       The functional dependencies in relation FIRST {S#, STATUS, CITY, P#, QTY} are as follows:
            {S#, P#} → QTY,   S# → CITY,   S# → STATUS,   CITY → STATUS
      But problems occur with each of the three operations.
      INSERT: We cannot insert the fact that a particular supplier is located in a particular city until that supplier
supplies at least one part. The FIRST relation does not show that supplier S5 is located in Athens. The reason is
that, until S5 supplies some part, we have no appropriate primary key values.
      DELETE: If we delete the only FIRST tuple for a particular supplier, we destroy not only the shipment
connecting that supplier to some part but also the information that the supplier is located in a particular city.
      For example, if we delete the FIRST tuple with S# value S3 and P# value P2, we lose the information that S3 is
located in Paris.
      UPDATE: The city value for a given supplier appears in FIRST many times, in general. This redundancy
causes update problems. For example, if supplier S1 moves from London to Amsterdam, we are faced with
either the problem of searching FIRST to find every tuple connecting S1 and London (and changing it) or the
possibility of producing an inconsistent result (the city for S1 might be given as Amsterdam in one tuple,
London in another).
      Therefore, to overcome these problems we convert the relation to 2NF.
      Before proceeding to the next form, let us denote:
                                        R = Relation Scheme
                                        S = Set of attributes
                                        F = All of functional dependencies

Second Normal Form (2NF)
      Definition: A relation schema R<S, F> is in second normal form (2NF) if it is in 1NF and if all
nonprime attributes are fully functionally dependent on the relation key(s). A database schema is in 2NF if
every relation schema included in the database schema is in 2NF.
Features:
      1. A relation is in 2NF if it is in 1NF and every nonkey attribute is fully dependent on the key.
      2. If the key is a single attribute then the relation is automatically in 2NF.

                  

                  
      Second Normal Form (definition assuming only one candidate key, which is thus the primary key): A
relation is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary
key. So we decompose the FIRST relation into two tables, SECOND {S#, STATUS, CITY} and SP {S#, P#, QTY}.
      It should be clear that the revised structure overcomes all the problems with update operations sketched earlier.
      INSERT: We can insert the information that S5 is located in Athens, even though S5 does not currently
supply any parts, by simply inserting the appropriate tuple into
SECOND.
      DELETE: We can delete the shipment connecting S3 and P2 by deleting the appropriate tuple from SP; we
do not lose the information that S3 is located in Paris.
      UPDATE: The S# - CITY redundancy has been eliminated. Thus we can change the city for S1 from
London to Amsterdam by changing it once and for all in the relevant SECOND tuple.
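      A sketch of this decomposition in SQL (data types are assumed; the table and column names follow the text):
      CREATE TABLE second (
          s#      CHAR(5) PRIMARY KEY,
          status  NUMBER,
          city    VARCHAR2(20)
      );
      CREATE TABLE sp (
          s#   CHAR(5) REFERENCES second (s#),
          p#   CHAR(5),
          qty  NUMBER,
          PRIMARY KEY (s#, p#)
      );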
      Still, we have problems with these operations in the following ways:
      INSERT: We cannot insert the fact that a particular city has a particular status (e.g., we cannot state that
any supplier in Rome must have a status of 50) until we have some supplier actually located in that city.
      DELETE: If we delete the only SECOND tuple for a particular city, we destroy not only the information
for the supplier concerned but also the information that the city has that particular status. For example, if we delete
the SECOND tuple for S5, we lose the information that the status for Athens is 30.
      UPDATE: The status for a given city appears in SECOND many times, in general (the relation still contains
some redundancy). Thus, if we need to change the status for London from 20 to 30, we are faced with either
the problem of searching SECOND to find every tuple for London (and changing it) or the possibility of
producing an inconsistent result (the status for London might be given as 20 in one tuple and 30 in another).
      Again to overcome such problems we replace the original relation (SECOND, in
this case) by two projections making 3NF.
Third Normal Form (3NF)
      Definition: A relation schema R<S, F> is in 3NF if, for all nontrivial functional dependencies in F of the
form X → A, either X contains a key (i.e., X is a superkey) or A is a prime attribute. A database schema is in
3NF if every relation schema included in the database schema is in 3NF.
      Feature: A relation R is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively
dependent on the primary key.
      3NF (definition assuming only one candidate key, which is thus the primary
key): A relation is in 3NF if and only if it is in 2NF and every nonkey attribute is
nontransitively dependent on the primary key. (“No transitive dependencies” implies
no mutual dependencies.)
      Relations SC {S#, CITY} and CS {CITY, STATUS} are both in 3NF.

   
Thus, by this decomposition we have removed the transitive dependency from relation SECOND.
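      A corresponding SQL sketch of this 3NF decomposition of SECOND (types are assumed, as before):
      CREATE TABLE cs (
          city    VARCHAR2(20) PRIMARY KEY,
          status  NUMBER
      );
      CREATE TABLE sc (
          s#    CHAR(5) PRIMARY KEY,
          city  VARCHAR2(20) REFERENCES cs (city)
      );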
Boyce-Codd Normal Form (BCNF)
      A relation is in BCNF if and only if every nontrivial, left-irreducible FD has a
candidate key as its determinant. Or, less formally,
      BCNF (informal definition): A relation is in BCNF if and only if the only
determinants are candidate keys.
      Relations FIRST and SECOND, which are not in 3NF, are not in BCNF either; also, relations SP, SC and
CS, which were in 3NF, are also in BCNF. Relation FIRST contains three determinants, namely S#, CITY, and
{S#, P#}; of these, only {S#, P#} is a candidate key, so FIRST is not in BCNF. Similarly, SECOND is not in
BCNF either because the determinant CITY is not a candidate key. Relations SP, SC and CS, on the other hand,
are each in BCNF, because in each case the (single) candidate key is the only determinant in the relation.
      BCNF was proposed as a simpler form of 3NF but it was found to be stricter than 3NF, because every relation in
BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF.

      Definition: A normalized relation scheme R<S, F> is in BCNF if for every nontrivial FD in F of the form
X → A, where X ⊆ S and A ∈ S, X is a superkey of R.
      BCNF is a special case of 3NF.
      Features:
      1. Key attributes (candidate keys) are composite (there is no single key which identifies a record).
      2. More than one candidate key is present.
      3. In each candidate key at least one attribute is overlapping.

      If a table exhibits the three features mentioned above, BCNF must be checked explicitly: the table is in
BCNF only if every determinant is a candidate key.
         Explanation

                    
      Composite Key working as a candidate key.

          
      A relation is in BCNF if every determinant is a candidate key.
  
 
 
 
 
  Advisor

      — It is in 1NF by definition.
      — It is in 2NF since all nonkey attributes are dependent on the entire key.
      — It is in 3NF because it has no transitive dependencies.

      — It is not in BCNF because it has a determinant, FName, that is not a candidate key.

      
      STU-ADV, Key: (SID, FName); ADV-SUBJ, Key: (FName). These relations are in BCNF.
Now we can say that a relation is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate
key as its determinant. Or, less formally, a relation is in BCNF if and only if the only determinants are candidate
keys.
Fourth Normal Form (4NF)
      Definition: Given a relation schema R and the set D of FDs and MVDs that it satisfies, consider sets of
attributes X and Y where X ⊆ R, Y ⊆ R. The relation schema
R is in fourth normal form (4NF) if, for all multivalued dependencies of the form X →→ Y in D+, either X →→ Y
is a trivial MVD or X is a superkey of R. A database schema is in 4NF if all relation schemas included in the
database schema are in 4NF.
Join Dependencies and Fifth Normal Form
      So far in this chapter we have assumed that the sole operation necessary or available in the further
normalization process is the replacement of a relation in a non-loss way by exactly two of its projections. This
assumption has successfully carried us as far as 4NF. It comes perhaps as a surprise, therefore, to discover that
there exist relations that cannot be non-loss-decomposed into two projections but can be non-loss-decomposed
into three (or more). To coin an ugly but convenient term, we will describe such a relation as “n-
decomposable” (for some n > 2)—meaning that the relation in question can be non-loss-decomposed into n
projections but not into m for any m < n. A relation that can be non-loss-decomposed into two projections we
will call “2-decomposable.”
In short
      1NF
      • A relation is in INF if it contains no repeating groups
      • To convert an unnormalised relation to I NF either:
      • Flatten the table and change the primary key, or
      • Decompose the relation into smaller relations, one for the repeating groups and
one for the non-repeating groups.

      • Remember to put the primary key from the original relation into both new
relations.
      • This option is liable to give the best results.
2NF
      • A relation is in 2NF if it contains no repeating groups and no partial key functional
dependencies
      • Rule: A relation in 1NF with a single key field must be in 2NF
      • To convert a relation with partial functional dependencies to 2NF create a set of new relations
      • One relation for the attributes that are fully dependent upon the key
      • One relation for each part of the key that has partially dependent attributes
3NF
      • A relation is in 3NF if it contains no repeating groups, no partial functional dependencies, and no transitive
functional dependencies
      • To convert a relation with transitive functional dependencies to 3NF, remove the attributes involved in
the transitive dependency and put them in a new relation
      • Rule: A relation in 2NF with only one non-key attribute must be in 3NF
      • In a normalized relation a non-key field must provide a fact about the key, the whole key and nothing but
the key.
      Relations in 3NF are sufficient for most practical database design problems.
However, 3NF does not guarantee that all anomalies have been removed.

  
      Q. 2. What are multivalued dependencies?
                                               Or
      Define the term functional dependency.
 
      Ans. FUNCTIONAL DEPENDENCIES
Introduction
      Basically, a functional dependence (usually abbreviated FD) is a many-to-one relationship from one set of
attributes to another within a given relation. In the shipments relation SP, for example, there is a functional
dependence from the set of attributes {S#, P#} to the set of attributes {QTY}; what this means is that for each
value of the attribute pair (S#, P#), there is one corresponding value of the attribute QTY. FDs provide a
basis for a scientific attack on a number of practical problems. This is because FDs possess a rich set of
interesting formal properties, which make it possible to treat the problems in question in a formal and rigorous
manner. The terms functional dependence and functional dependency are used interchangeably in the technical
literature. Customary English usage would suggest that the term “dependence” be used for the FD concept per se
and would reserve the term “dependency” for “the object that depends.” But we very frequently need to refer to
FDs in the plural, and “dependencies” seems to trip off the tongue more readily than “dependences”; hence our
use of both terms.
  
Basic definitions
      In order to illustrate the ideas of the present section, we make use of a slightly revised version of the
shipments relation, one that includes, in addition to the usual attributes S#, P# and QTY, an attribute CITY,
representing the city for the relevant supplier. We will refer to this revised relation as SCP
to avoid confusion. A possible tabulation of relation SCP is given in the figure below.

      Now, it is very important in this area — as in so many others — to distinguish clearly between (a) the value
of a given relation (i.e., relation variable) at a given point in time and (b) the set of all possible values that the
given relation (variable) might assume at different times. In what follows, we will first define the concept of
functional dependency as it applies to Case (a) and then extend it to apply to Case (b). Here then is the definition
for Case (a).
      Let R be a relation, and let X and Y be arbitrary subsets of the set of attributes of
R. Then we say that Y is functionally dependent on X — in symbols,
X → Y
(read “X functionally determines Y,” or simply “X arrow Y”) — if and only if each
X-value in R has associated with it precisely one Y-value in R.
In other words, whenever two tuples of R agree on their X-value, they also agree
on their Y-value. For example, the tabulation of relation SCP shown in the figure satisfies the FD
{S#} → {CITY}
because every SCP tuple with a given S# value also has the same CITY value. Indeed,
it also satisfies several more FDs, the following among them:

                            Fig: The relation SCP (Sample tabulation)

              
      (Exercise: Check these.)
      The left-hand side and right-hand side of an FD are sometimes called the determinant and the dependent,
respectively. As the definition states, the determinant and dependent are both sets of attributes. When the set
contains just one attribute, however — i.e. when it is a singleton set — we will often drop the set brackets and
write it without them, e.g.,
                                              S# → CITY
      As already explained, the foregoing definitions apply to “Case (a)” — i.e. to individual relation values.
However, when we consider relation variables — in particular,
when we consider base relations — we are usually interested not so much in the FDs that happen to hold in the
particular value that the variable happens to have at some
particular time, but rather in those FDs that hold for all possible values of that variable. In the case of SCP, for
example, the FD
                                          S# → CITY
      holds for all possible values of SCP, because, at any given time, a given supplier has exactly one
corresponding city, and so any two tuples appearing in SCP at the same time with the same supplier number
must necessarily have the same city as well. In fact, the statement that this FD holds “for all time” (i.e., for all
possible values of SCP) is an integrity constraint for SCP — it places limits on the values that SCP can legitimately
assume.
      Here then is the “Case” definition of functional dependency (the extensions over the Case definition are
shown in boldface.
      • Let R be a relation variable, and let X and Y be arbitrary subsets of the set of attributes of R. Then we say
that Y is functionally dependent on X — in symbols,
                                                 X → Y
      (read “X functionally determines Y,” or simply “X arrow Y”) — if and only if, in every possible legal value
of R, each X-value has associated with it precisely one Y-value.
      In other words, in every possible legal value of R, whenever two tuples agree on their X-values, they also
agree on their Y-values.
      Henceforth, we will usually take the term “functional dependency” to have this latter, more demanding,
time-independent meaning (barring explicit statements to the contrary). Here are some time-independent FDs
that apply to the relation variable SCP:

                      
Notice in particular that the following FDs, which do hold in the sample tabulation
of Fig. 3.8, do not hold “for all time”.

                                        
      In other words, the statement that (e.g.) “every shipment for a given supplier has the same shipment
quantity” happens to be true for the sample values in the figure but it is not true for all possible legal values of
SCP.
      It is worth pointing out that if X is a candidate key of relation R — in particular, if it is the primary key —
then all attributes Y of relation R must necessarily be functionally dependent on X (this fact follows from the
definition of candidate key). In the usual parts relation P, for example, we must necessarily have:
      
      In fact, if relation R satisfies the FD A → B and A is not a candidate key, then R
will involve some redundancy. In the case of relation SCP, for example, the fact that a
given supplier is located in a given city appears many times, in general.
      Now, even if we restrict our attention to FDs that hold “for all time,” the set of
FDs satisfied by all legal values of a given relation can still be very large, as the SCP
example suggests; it is therefore desirable to find some way of reducing that set to a manageable size.
      Why is this objective desirable? One reason is that (as already stated) FDs represent
integrity constraints, and hence the DBMS needs to check them when updates are
performed. Given a particular set S of FDs, therefore, it is desirable to find some other
set T that is (ideally) much smaller than S and has the property that every FD in S is
implied by the FDs in T. If such a set T can be found, it is sufficient that the DBMS
enforce the FDs in T, and the FDs in S will then be enforced automatically. The problem
of finding such a set T is thus of considerable practical interest.
Trivial and Nontrivial Dependencies
      Note: In the remainder of this section, we will occasionally abbreviate “functional dependency” to just
“dependency”; similarly for “functionally dependent on,” “functionally determines,” etc.
      One obvious way to reduce the size of the set of FDs we have to deal with is to eliminate the trivial
dependencies. An FD is trivial if it cannot possibly be violated; an FD is trivial if and only if the
right-hand side is a subset (not necessarily a proper subset) of the left-hand side.
      As the name implies, trivial dependencies are not very interesting in practice; we are usually more
interested in nontrivial dependencies (which are, of course, precisely the ones that are not trivial),
because these are the ones that correspond to “genuine” integrity constraints. When we are dealing with formal
dependency theory, however, we cannot necessarily assume that all dependencies are nontrivial.
 
Closure of a Set of Dependencies
      As already suggested, certain FDs imply others. As a simple example, the FD
           {S#, P#} → {CITY, QTY}
      implies both the following FDs:
           {S#, P#} → {CITY}
           {S#, P#} → {QTY}
      As a more complex example, suppose we have a relation R with three attributes A, B, and C, such that the
FDs A → B and B → C both hold in R. Then it is easy to see that the FD A → C also holds in R. The FD A → C
here is an example of a transitive FD — C is said to depend on A transitively, via B.
      The set of all FDs that are implied by a given set S of FDs is called the closure of S, and is denoted S+.
Clearly we need a way of computing S+ from S. The first attack on this problem appeared in a paper by
Armstrong which gave a set of rules of inference (more usually called Armstrong’s axioms) by which new FDs
can be inferred from given ones. Those rules can be stated in a variety of equivalent ways, one of the simplest
of which is as follows:
      Armstrong’s inference rules: Let A, B, and C be arbitrary subsets of the set of attributes of the given
relation R, and let us agree to write (e.g.) AB to mean the union of A and B. Then:
      1. Reflexivity: If B is a subset of A, then A → B.
      2. Augmentation: If A → B, then AC → BC.
      3. Transitivity: If A → B and B → C, then A → C.
          Each of these three rules can be directly proved from the definition of functional dependence (the first is
just the definition of a trivial dependence, of course). Moreover, the rules are complete, in the sense that, given
a set S of FDs, all FDs implied by S can be derived from S using the rules. They are also sound, in the sense that no
additional FDs (i.e., FDs not implied by S) can be so derived. In other words, the rules can be used to derive
precisely the closure S+. Several further rules can be derived from the three given above, the following among
them. These additional rules can be used to simplify the practical task of computing S+ from S. (D is another
arbitrary subset of the set of attributes of
R.)
      4. Self-determination: A → A.
      5. Decomposition: If A → BC, then A → B and A → C.

      6. Union: If A → B and A → C, then A → BC.
      7. Composition: If A → B and C → D, then AC → BD.
          And Darwen proves the following rule, which he calls the General Unification
          Theorem:
      8. If A → B and C → D, then A ∪ (C − B) → BD (where “∪” is union and “−” is set difference).
       The name “General Unification Theorem” refers to the fact that several of the earlier rules can be seen as
special cases.
      Example: Suppose we are given relation R with attributes A, B, C, D, E, F, and the FDs
     A → BC
     B → E
     CD → EF
      Observe that we are extending our notation slightly (though not incompatibly) by writing, e.g., BC for the
set consisting of attributes B and C — previously BC would have meant the union of B and C, where B and C
were sets of attributes.
      Note: If you would prefer a more concrete example, take A as employee number, B as department number,
C as manager's employee number, D as project number for a project directed by
that manager (unique within manager), E as department name, and F as percentage of time allocated by the
specified manager to the specified project.
      We now show that the FD AD → F holds in R, and so is a member of the closure of the given set:
      1. A → BC          (given)
      2. A → C           (1, decomposition)
      3. AD → CD         (2, augmentation by D)
      4. CD → EF         (given)
      5. AD → EF         (3 and 4, transitivity)
      6. AD → F          (5, decomposition)
Closure of a Set of Attributes
      We have not yet given an effective algorithm for computing the closure S+ of a given set S of FDs.
However, in this section we give an effective way of determining whether a given (specified) FD is in that
closure. We begin our discussion with the notion of a superkey.
     A superkey for a relation R is a set of attributes of R that includes at least one candidate key of R as a subset
— not necessarily a proper subset, of course. (The definition of “superkey” can thus be derived from that of
“candidate key” by simply deleting the irreducibility requirement.) It follows immediately that the superkeys
for a given relation R are precisely those subsets K of the set of attributes of R such that the functional
dependency
                                                          K → A
      holds true for every attribute A of R.
      Now suppose we know the FDs that hold for some given relation, and we need to determine the candidate
keys for that relation. The candidate keys are, by definition, those superkeys that are irreducible. So
determining whether or not a given set of attributes K is a superkey is a big step toward determining whether K
is in fact a candidate key.
      To determine whether K is a superkey, we need to determine whether the set of all attributes functionally
dependent on K is in fact the set of all attributes of R and so, given a set S of FDs that hold in R, we need a
way of determining the set of all attributes of R that are functionally dependent on K — the so-called closure K+
of K under S. A simple algorithm for computing this closure is as follows: initialize CLOSURE[K,S] to K; then,
repeatedly, for each FD X → Y in S whose left-hand side X is a subset of CLOSURE[K,S] as computed so far,
add the attributes of Y to CLOSURE[K,S]; stop when a complete pass over S adds nothing new.
      Example: Suppose we are given relation R with attributes A, B, C, D, E, F, and FDs
           A → BC
           E → CF
           B → E
           CD → EF
      We now compute the closure {A,B}+ of the set of attributes {A,B} under this set of FDs.
      1. We initialize the result CLOSURE[K,S] to {A,B}.
      2. We now go round the inner loop four times, once for each of the given FDs. On the
          first iteration (for the FD A → BC), we find that the left-hand side is indeed a subset
          of CLOSURE[K,S] as computed so far, so we add attributes (B and) C to the
          result. CLOSURE[K,S] is now the set {A,B,C}.
      3. On the second iteration (for the FD E → CF), we find that the left-hand side is not a
          subset of the result as computed so far, which thus remains unchanged.
      4. On the third iteration (for the FD B → E), we add E to CLOSURE[K,S], which
          now has value {A,B,C,E}.
      5. On the fourth iteration (for the FD CD → EF), CLOSURE[K,S] remains
          unchanged.
      6. Now we go round the inner loop four times again. On the first iteration, the result
          does not change; on the second, it expands to {A,B,C,E,F}; on the third and fourth,
          it does not change.
      7. Now we go round the inner loop four times again. CLOSURE[K,S] does not
          change, and so the whole process terminates, with {A,B}+ = {A,B,C,E,F}. Note,
          therefore, that {A,B} is not a superkey (and hence not a candidate key a fortiori).
      An important corollary of the foregoing is as follows: Given a set S of FDs, we can easily tell whether a
specific FD X → Y follows from S, because that FD will follow if and only if Y is a subset of the closure X+ of
X under S. In other words, we now have a simple way of determining whether a given FD X → Y is in the
closure S+ of S.
Irreducible Sets of Dependencies
      Let S1 and S2 be two sets of FDs. If every FD implied by S1 is implied by the FDs in S2, i.e., if S1+ is a
subset of S2+, we say that S2 is a cover for S1. What this means is that if the DBMS enforces the constraints
represented by the FDs in S2, then it will automatically be enforcing the FDs in S1. Next, if S2 is a cover for S1
and S1 is a cover for S2, i.e., if S1+ = S2+, we say that S1 and S2 are equivalent. Clearly, if S1 and S2 are
equivalent, then if the DBMS enforces the constraints represented by the FDs in S2, it will automatically be
enforcing the FDs in S1, and vice versa.

      Now we define a set S of FDs to be irreducible if and only if it satisfies the following
three properties:
      1. The right-hand side (the dependent) of every FD in S involves just one attribute (i.e. is a singleton set).
      2. The left-hand side (the determinant) of every FD in S is irreducible in turn, meaning that no attribute can
be discarded from the determinant without changing the closure S+ (i.e. without converting S into some set not
equivalent to S). We will say that such an FD is left-irreducible.
      3. No FD in S can be discarded from S without changing the closure S+ (i.e. without converting S into
some set not equivalent to S).
      For example, consider the familiar parts relation P. The following FDs (among
others) hold in that relation:
           P# → PNAME
           P# → COLOR
           P# → WEIGHT
           P# → CITY
      This set of FDs is easily seen to be irreducible. The right-hand side is a single attribute in each case, the left-
hand side is obviously irreducible in turn, and none of the FDs can be discarded without changing the closure
(i.e., without losing some information). By contrast, a set of FDs in which any of these three properties fails to
hold is not irreducible.
      We now claim that for every set of FDs, there exists at least one equivalent set that is irreducible. In fact,
this is easy to see. Let the original set of FDs be S. Thanks to the decomposition rule, we can assume without
loss of generality that every FD in S has a singleton right-hand side. Next, for each FD f in S, we examine each
attribute A in the left-hand side of f; if S and the set of FDs obtained by eliminating A from the left-hand side
of f are equivalent, we delete A from the left-hand side of f. Then, for each FD f remaining in S, if S and S − {f} are
equivalent, we delete f from S. The final set S is irreducible and is equivalent to the original set S.
      Example: Suppose we are given relation R with attributes A, B, C, D, and FDs
                                                  A → BC
                                                  B → C
                                                  A → B
                                                  AB → C
                                                  AC → D

      We now compute an irreducible set of FDs that is equivalent to this given set
            1.The first step is to rewrite the FDs such that each one has. a singleton right-hand side.

                                        
      We observe immediately that the FD A   B occurs twice, so one occurrence can
be eliminated.
            2. Next, attribute C can be eliminated from the left-hand side of the FD AC → D, because we have A → C, so A → AC by augmentation, and we are given AC → D, so A → D by transitivity; thus the C on the left-hand side of AC → D is redundant.
      3. Next, we observe that the FD AB → C can be eliminated, because again we have A → C, so AB → CB by augmentation, and hence AB → C by decomposition.
      4. Finally, the FD A → C is implied by the FDs A → B and B → C, so it can also be eliminated. We are left with:

                                          A → B, B → C, A → D
       This set is irreducible.
      A set I of FDs that is irreducible and equivalent to some other set S of FDs is said to be an irreducible cover for S. Thus, given some particular set S of FDs that need to be enforced, it is sufficient for the system to find and enforce an irreducible cover I instead. We should make it clear, however, that a given set of FDs does not necessarily have a unique irreducible cover.
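      The procedure just described can be sketched directly in code. The following Python fragment (the helper names are illustrative; closure() is the attribute-closure function from the earlier sketch, repeated here so the fragment is self-contained) performs the three steps: split right-hand sides into singletons, drop redundant left-hand-side attributes, then drop FDs implied by the rest.

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result

def equivalent(s1, s2):
    """Two FD sets are equivalent iff each FD of one follows from the other."""
    return (all(r <= closure(l, s2) for l, r in s1) and
            all(r <= closure(l, s1) for l, r in s2))

def irreducible_cover(fds):
    # Step 1: singleton right-hand sides.
    cover = []
    for left, right in fds:
        for a in right:
            if (set(left), {a}) not in cover:       # drop duplicate FDs
                cover.append((set(left), {a}))
    # Step 2: make every determinant left-irreducible.
    for i, (left, right) in enumerate(cover):
        for a in sorted(left):
            trial = cover[:i] + [(left - {a}, right)] + cover[i + 1:]
            if equivalent(trial, cover):
                left = left - {a}
                cover[i] = (left, right)
    # Duplicates may reappear after reducing determinants.
    deduped = []
    for fd in cover:
        if fd not in deduped:
            deduped.append(fd)
    # Step 3: drop any FD implied by the remaining ones.
    result = deduped
    for fd in deduped:
        trial = [f for f in result if f != fd]
        if trial and equivalent(trial, result):
            result = trial
    return result

# The worked example: A -> BC, B -> C, A -> B, AB -> C, AC -> D
fds = [({'A'}, {'B', 'C'}), ({'B'}, {'C'}), ({'A'}, {'B'}),
       ({'A', 'B'}, {'C'}), ({'A', 'C'}, {'D'})]
print(irreducible_cover(fds))    # [({'A'}, {'B'}), ({'B'}, {'C'}), ({'A'}, {'D'})]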

 
      Q. 3. What is query optimization?
 
      Ans. Query optimization techniques apply heuristic rules to modify the internal representation of a query, which is usually in the form of a query tree or a query graph data structure, in order to improve its expected performance. The parser of a high-level query first generates an initial internal representation, which is then optimized according to heuristic rules. Following that, a query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.
      One of the main heuristic rules is to apply SELECT and PROJECT operations before applying the JOIN or other binary operations. This is because the size of the file resulting from a binary operation, such as JOIN, is usually a multiplicative function of the sizes of the input files. The SELECT and PROJECT operations reduce the size of a file and hence should be applied before a join or other binary operation.
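      The effect of this heuristic can be seen on a tiny in-memory example. The Python sketch below (relation contents and attribute names are invented for illustration) evaluates the same query two ways: joining first and selecting afterwards, versus selecting first and then joining. The results are identical, but the second plan builds a much smaller intermediate file.

# Relations as lists of dicts; a selection applied before the join shrinks
# the intermediate result without changing the answer.
SUPPLIER = [{'S#': f'S{i}', 'CITY': 'Paris' if i % 10 == 0 else 'London'}
            for i in range(100)]
SHIPMENT = [{'S#': f'S{i}', 'P#': f'P{j}'} for i in range(100) for j in range(5)]

def select(rel, pred):
    return [t for t in rel if pred(t)]

def join(r1, r2, attr):
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if t1[attr] == t2[attr]]

# Plan 1: JOIN first, then SELECT (large intermediate: 500 joined tuples).
plan1 = select(join(SUPPLIER, SHIPMENT, 'S#'), lambda t: t['CITY'] == 'Paris')

# Plan 2: SELECT first, then JOIN (small intermediate: 10 selected tuples).
plan2 = join(select(SUPPLIER, lambda t: t['CITY'] == 'Paris'), SHIPMENT, 'S#')

assert sorted(map(str, plan1)) == sorted(map(str, plan2))
print(len(plan1), "result tuples either way")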

 
      Q. 4. What are the various guidelines for database design?
 
      Ans. The process of database design can be stated as follows:
      Design the logical and physical structure of one or more databases to accommodate the information needs
of the users in an organisation for a defined set of applications.

      The goals of database design are multiple:
      1. Satisfy the information content requirements of the specified users and applications.
      2. Provide a natural and easy-to-understand structuring of the information.
      3. Support processing requirements and any performance objectives such as response time, processing time, and storage space.
      These goals are very hard to accomplish and to measure, and they involve an inherent tradeoff: if one attempts to achieve more "naturalness" and "understandability" in the model, it may come at the cost of performance. The problem is aggravated because the database design process often begins with informal and poorly defined requirements. In contrast, the result of the design activity is a rigidly defined database schema that cannot be modified easily once the database is implemented. We can identify six main phases of the database design process:
      (i) Requirements collection and analysis
      (ii) Conceptual database design
      (iii) Choice of a DBMS
      (iv) Data Model mapping (also called logical database design)
      (v) Physical database design
      (vi) Database system implementation and tuning.

 
      Q. 5 Discuss the concepts of normalization in detail.
 
      Ans. NORMAL FORMS BASED ON PRIMARY KEYS
      Normalization
      In very simple words, normalization is a technique which helps to determine the most appropriate grouping of data items into records, segments or tuples. This is necessary because the data items are arranged in tables which indicate the structure, relationships and integrity constraints of the relational database.
      Normal Forms
      The normalization process, as first proposed by Codd (1972), takes a relation schema through a series of tests to "certify" whether it satisfies a certain normal form. The process, which proceeds in a top-down fashion by evaluating each relation against the criteria for the normal forms and decomposing relations as necessary, can thus be considered as relational design by analysis. Initially, Codd proposed three normal forms, which he called first, second and third normal form. A stronger definition of 3NF, called Boyce-Codd normal form (BCNF), was proposed later by Boyce and Codd. All these normal forms are based on the functional dependencies among the attributes of a relation. Later, 4NF and 5NF were proposed, based on the concepts of multivalued dependencies and join dependencies, respectively.
      Need of Normalization
      Normalization of data can hence be looked upon as a process of analyzing the given relation schemas, based on their FDs and primary keys, to achieve the desirable properties of:
      1. Minimizing redundancy.
      2. Minimizing the insertion, deletion, and update anomalies.
      Normal forms are based on the primary key.
      Normalization: It is the process of structuring an unstructured relation into a structured one with the purpose of removing redundancy and anomalies.
      First Normal Form (1NF)
      Definition: A relation schema is said to be in 1NF if the values in the domain of each attribute of the relation are atomic. In other words, only one value is associated with each attribute, and the value is not a set of values or a list of values. A database schema is in 1NF if every relation schema included in the database schema is in 1NF.
      A relation is in 1NF if and only if all underlying domains contain scalar values only. Here scalar means atomic, i.e., there is a single value at the intersection of each row and column, as shown in the FIRST relation obtained from the original relation.
 

The functional dependencies in relation FIRST are as follows:

      {S#, P#} → QTY, S# → CITY, S# → STATUS, CITY → STATUS
      But problems occur with each of the three update operations.
      INSERT: We cannot insert the fact that a particular supplier is located in a particular city until that supplier supplies at least one part. The FIRST relation does not show that supplier S5 is located in Athens. The reason is that, until S5 supplies some part, we have no appropriate primary key value.
      DELETE: If we delete the only FIRST tuple for a particular supplier, we destroy not only the shipment connecting that supplier to some part but also the information that the supplier is located in a particular city. For example, if we delete the FIRST tuple with S# value S3 and P# value P2, we lose the information that S3 is located in Paris.
      UPDATE: The city value for a given supplier appears in FIRST many times, in general. This redundancy causes update problems. For example, if supplier S1 moves from London to Amsterdam, we are faced with either the problem of searching FIRST to find every tuple connecting S1 and London (and changing each of them) or the possibility of producing an inconsistent result (the city for S1 might be given as Amsterdam in one tuple, London in another).
      Therefore, to overcome these problems we decompose FIRST into 2NF.

      Before proceeding to the next form, let us denote:
      R = relation scheme
      S = set of attributes
      F = set of functional dependencies
      Second Normal Form (2NF)
      Definition: A relation schema R<S, F> is in second normal form (2NF) if it is in 1NF and all nonprime attributes are fully functionally dependent on the relation key(s). A database schema is in 2NF if every relation schema included in the database schema is in 2NF.
      Features:
      1. A relation is in 2NF if it is in 1NF and every nonkey attribute is fully dependent on the primary key.
      2. If the key is a single attribute, then the relation is automatically in 2NF.

                          

    
                                          SECOND
      Second Normal Form (definition assuming only one candidate key, which is thus the primary key): A relation is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key. So we decompose the FIRST relation into two relations: SECOND {S#, STATUS, CITY} and SP {S#, P#, QTY}.
      It should be clear that the revised structure overcomes all the problems with update operations sketched earlier.

      INSERT: We can insert the information that S5 is located in Athens, even though S5 does not currently supply any parts, by simply inserting the appropriate tuple into SECOND.
      DELETE: We can delete the shipment connecting S3 and P2 by deleting the appropriate tuple from SP; we do not lose the information that S3 is located in Paris.
      UPDATE: The S# - CITY redundancy has been eliminated. Thus we can change the city for S1 from London to Amsterdam by changing it once and for all in the relevant SECOND tuple.
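      That the decomposition of FIRST into SECOND and SP is non-loss can be checked mechanically. The Python sketch below (the sample tuples are invented in the spirit of the supplier example) projects a FIRST value onto the two smaller relations and verifies that their natural join over S# reconstructs FIRST exactly.

# Non-loss (lossless-join) check for the 2NF decomposition of FIRST.
FIRST = [
    ('S1', 20, 'London', 'P1', 300),
    ('S1', 20, 'London', 'P2', 200),
    ('S2', 10, 'Paris',  'P1', 300),
    ('S3', 10, 'Paris',  'P2', 400),
]

# Projections: SECOND{S#, STATUS, CITY} and SP{S#, P#, QTY}.
SECOND = {(s, st, c) for (s, st, c, p, q) in FIRST}
SP     = {(s, p, q)  for (s, st, c, p, q) in FIRST}

# Natural join of SECOND and SP over S#.
joined = {(s, st, c, p, q)
          for (s, st, c) in SECOND
          for (s2, p, q) in SP
          if s == s2}

print(joined == set(FIRST))   # True: the decomposition is non-loss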
      Still, we have problems with these operations in the following ways:
      INSERT: We cannot insert the fact that a particular city has a particular status (e.g., we cannot state that any supplier in Rome must have a status of 50) until we have some supplier actually located in that city.
      DELETE: If we delete the only SECOND tuple for a particular city, we destroy not only the information for the supplier concerned but also the information that the city has that particular status. For example, if we delete the SECOND tuple for S5, we lose the information that the status for Athens is 30.
      UPDATE: The status for a given city appears in SECOND many times, in general (the relation still contains some redundancy). Thus, if we need to change the status for London from 20 to 30, we are faced with either the problem of searching SECOND to find every tuple for London (and changing each of them) or the possibility of producing an inconsistent result (the status for London might be given as 20 in one tuple and 30 in another).
      Again, to overcome such problems we replace the original relation (SECOND, in this case) by two projections, giving 3NF.
      The following points anticipate Boyce-Codd normal form (BCNF), which is treated in detail later in this answer:
      A relation schema R<S, F> is in BCNF (S = set of attributes, F = set of functional dependencies) if, for every subset X of S and every attribute Y belonging to S such that the FD X → Y holds, one of the following two conditions holds:
      (i) either X → Y is a trivial dependency,
      (ii) or X is a superkey.
      Trivial dependency: an FD in which the right-hand side is a subset of the left-hand side, e.g. {S#, P#} → S#.
      Superkey: a candidate key together with any additional attributes is a superkey.
      Equivalently, a relation is in BCNF if every determinant is a candidate key.
      In the student-advisor example given under BCNF below, Course ID is an attribute that overlaps between candidate keys; hence the relation shown there is in 3NF and also in BCNF. The Advisor relation of that example:
— is in 1NF by definition;
— is in 2NF, since every nonkey attribute is dependent on the entire key;
— is in 3NF, because it has no transitive dependencies.
      Third Normal Form (3NF)
      Definition: A relation schema R<S, F> is in 3NF if, for all nontrivial functional dependencies in F of the form X → A, either X contains a key (i.e., X is a superkey) or A is a prime attribute. A database schema is in 3NF if every relation schema included in the database schema is in 3NF.
      Feature: A relation R is in 3NF if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key.
      3NF (definition assuming only one candidate key, which is thus the primary key): A relation is in 3NF if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key. ("No transitive dependencies" implies no mutual dependencies.)
      Relations SC and CS are both in 3NF:

                                
      The functional dependencies in the relations SC and CS are S# → CITY (in SC) and CITY → STATUS (in CS).
      Thus, by this decomposition we have removed the transitive dependency S# → CITY → STATUS from relation SECOND.
      Boyce Codd Normal Form (BCNF)
      A relation is in BCNF if and only if every nontrivial, left-irreducible FD has a
candidate key as its determinant. Or, less formally,
      BCNF (informal definition): A relation is in BCNF if and only if the only
determinants are candidate keys.
      Relations FIRST and SECOND, which are not in 3NF, are not in BCNF either; also note that relations SP, SC and CS, which were in 3NF, are also in BCNF. Relation FIRST contains three determinants, namely S#, CITY, and {S#, P#}; of these, only {S#, P#} is a candidate key, so FIRST is not in BCNF. Similarly, SECOND is not in BCNF either, because the determinant CITY is not a candidate key. Relations SP, SC and CS, on the other hand, are each in BCNF, because in each case the (single) candidate key is the only determinant in the relation.
      BCNF was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF: every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF.
      Definition: A normalized relation scheme R<S, F> is in BCNF if, for every nontrivial FD in F of the form X → A, where X ⊆ S and A ∈ S, X is a superkey of R.
      BCNF is thus a stricter special case of 3NF.

      Features: The situations in which a relation in 3NF may still need to be checked against BCNF are those where:
      1. The candidate keys are composite (there is no single attribute that identifies a record).
      2. There is more than one candidate key.
      3. The candidate keys overlap, i.e., they have at least one attribute in common.

      If a table exhibits these three features, it must be examined further to decide whether it is in BCNF.
      Explanation:

                                    

1. It is not in BCNF because it has a determinant, FName, that is not a candidate key.

                        
 
 
STU-ADV Key: (SID, FName); ADV-SUBJ Key: (FName). Relations in BCNF.

Now we can say that a relation is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant. Or, less formally, a relation is in BCNF if and only if the only determinants are candidate keys.
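      The informal test "every determinant is a candidate key" can be mechanized with the attribute-closure function shown earlier. The Python sketch below (the relation and FD values are illustrative) lists the nontrivial FDs whose determinant is not a superkey; any such FD is a BCNF violation, and SECOND's CITY → STATUS is reported as one.

# BCNF check: every nontrivial FD must have a superkey as its determinant.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    return [(l, r) for l, r in fds
            if not r <= l                              # nontrivial (RHS not inside LHS)
            and closure(l, fds) != set(attributes)]    # determinant is not a superkey

# SECOND{S#, STATUS, CITY} with S# -> CITY, S# -> STATUS, CITY -> STATUS
fds = [({'S#'}, {'CITY'}), ({'S#'}, {'STATUS'}), ({'CITY'}, {'STATUS'})]
print(bcnf_violations({'S#', 'STATUS', 'CITY'}, fds))
# [({'CITY'}, {'STATUS'})]  -- CITY is a determinant but not a candidate key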
      Fourth Normal Form (4NF)
      Definition: Given a relation schema R such that a set D of FDs and MVDs is satisfied, consider sets of attributes X and Y where X ⊆ R and Y ⊆ R. The relation schema R is in fourth normal form (4NF) if, for all multivalued dependencies of the form X →→ Y in D+, either X →→ Y is a trivial MVD or X is a superkey of R. A database schema is in 4NF if all relation schemas included in the database schema are in 4NF.
      Multi-Valued Dependencies and Fourth Normal Form
      The 3NF and BCNF normal forms serve the purpose well most of the time. However, there are occasions where higher normal forms must be considered. The next higher form of normalization is the fourth normal form. It makes use of a new kind of dependency, called a multi-valued dependency (MVD); MVDs are a generalization of FDs. Likewise, the definition of fifth normal form makes use of another new kind of dependency, called a join dependency (JD); JDs in turn are a generalization of MVDs.
      Multi-valued dependence: Let R be a relation, and let A, B, and C be subsets of the attributes of R. Then we say that B is multi-dependent on A, in symbols:
                                                                     A →→ B
      (read "A multi-determines B," or simply "A double arrow B"), if and only if, for every possible legal value of R, the set of B values matching a given (A value, C value) pair depends only on the A value and is independent of the C value.
      To understand it, we will take an example.
      Suppose we are given a relation HCTX (H for “hierarchy”) containing information about course, teachers,
and texts, in which the attributes corresponding to teachers and texts are relation-valued (see Fig.). As you can
see, each HCTX tuple consists of a course name, plus a relation containing teacher names, plus a relation
containing text names (two such tuples are shown in the figure). The intended meaning of such a tuple is that
the specified course can be taught by any of the specified teachers and uses all of the specified texts as
references. We assume that, for a given course, there can exist any number of corresponding teachers and any
number of corresponding texts.
      Moreover, we also assume (perhaps not very realistically!) that teachers and texts are quite independent of one another; that is, no matter who actually teaches any particular offering of a given course, the same texts are used. Finally, we also assume that a given teacher or a given text can be associated with any number of courses.

      Now suppose that we want to eliminate the relation-valued attributes. One way to do this is simply to replace relation HCTX by a relation CTX with three scalar attributes COURSE, TEACHER, and TEXT, as indicated in Fig. 3.10. As you can see from the figure, each tuple of HCTX gives rise to m * n tuples in CTX, where m and n are the cardinalities of the TEACHERS and TEXTS relations in that HCTX tuple. Note that the resulting relation CTX is "all key" (the sole candidate key for HCTX, by contrast, was just {COURSE}).

Figure: Value for relation CTX corresponding to the HCTX value in Fig. 
      The meaning of relation CTX is basically as follows: a tuple (COURSE: c, TEACHER: t, TEXT: x) appears in CTX if and only if course c can be taught by teacher t and uses text x as a reference. Observe that, for a given course, all possible combinations of teacher and text appear; that is, CTX satisfies the (relation) constraint:
      if tuples (c, t1, x1), (c, t2, x2) both appear
      then tuples (c, t1, x2), (c, t2, x1) both appear also.
      Now, it should be apparent that relation CTX involves a good deal of redundancy, leading as usual to
certain update anomalies. For example, to add the information that the physics course can be taught by a new
teacher, it is necessary to insert two new tuples, one for each of the two texts. Can we avoid such problems?
Well it is easy to see that:
      1. The problems in question are caused by the fact that teachers and texts are completely independent of
one another;
      2. Matters would be much improved if CTX were decomposed into its two projections, call them CT and CX, on {COURSE, TEACHER} and {COURSE, TEXT} respectively (see Fig.).
      To add the information that the physics course can be taught by a new teacher, all we have to do now is
insert a single tuple into relation CT. (Note that relation CTX can be recovered by joining CT and CX back
together again, so the decomposition is non-loss.) Thus, it does seem reasonable to suggest that there should be
a way of “further normalizing” a relation like CTX.
       Note: At this point, you might object that the redundancy in CTX was unnecessary in the first place, and
hence that the corresponding update anomalies were unnecessary too. More specifically, you might suggest
that CTX need not include all possible TEACHER/TEXT combinations for a given course; for example, two
tuples are obviously sufficient to show that the physics course has two teachers and two texts. The problem is,
which two tuples? Any particular choice leads to a relation having a very unobvious interpretation and very
strange update behavior (try stating the predicate for such a relation! — i.e., try stating the criteria for deciding
whether or not some given update is an acceptable operation on that relation).

            
      Figure: Values for relation CT and CX corresponding to the CTX value in Fig.
 
      Informally, therefore, it is obvious that the design of CTX is bad and the decomposition into CT and CX is
better. The trouble is, however, these facts are not formally obvious. Note in particular that CTX satisfies no
functional dependencies at all (apart from trivial ones such as COURSE → COURSE); in fact, CTX is in BCNF, since as already noted it is all key, and any "all key" relation must necessarily be in BCNF. (Note that the two projections CT and CX are also all key and hence in BCNF.) The ideas discussed so far are therefore of no help with the problem at hand.
The existence of "problem" BCNF relations like CTX was recognized very early on, and the way to deal
with them was also understood, at least intuitively. However, it was not until 1977 that these intuitive ideas
were put on a sound theoretical footing by Fagin’s introduction of the notion of multi-valued dependencies,
MVDs. Multi-valued dependencies are a generalization of functional dependencies, in the sense that every FD
is an MVD, but the converse is not true (i.e., there exist MVDs that are not FDs). In the case of relation CTX
there are two MVDs that hold:
COURSE →→ TEACHER
COURSE →→ TEXT
Note the double arrows; the MVD A →→ B is read as "B is multi-dependent on A," or, equivalently, "A multi-determines B." Let us concentrate on the first MVD, COURSE →→ TEACHER. Intuitively, what this MVD means is that, although a course does not have a single corresponding teacher, i.e., the functional dependence COURSE → TEACHER does not hold, nevertheless each course does have a well-defined set of corresponding teachers. By "well-defined" here we mean, more precisely, that for a given course c and a given text x, the set of teachers t matching the pair (c, x) in CTX depends on the value c alone; it makes no difference which particular value of x we choose. The second MVD, COURSE →→ TEXT, is interpreted analogously.
      Here then is the formal definition of Multi-valued dependence:
      Let R be a relation, and let A, B, and C be subsets of the attributes of R. Then we say that B is multi-dependent on A, in symbols:
                                                 A →→ B
      (read "A multi-determines B," or simply "A double arrow B"), if and only if, in every possible legal value of R, the set of B values matching a given (A value, C value) pair depends only on the A value and is independent of the C value.
      It is easy to show that, given the relation R {A, B, C}, the MVD A →→ B holds if and only if the MVD A →→ C also holds. MVDs always go together in pairs in this way. For this reason it is common to represent them both in one statement, thus: A →→ B / C.
      For example:
COURSE →→ TEACHER / TEXT
      We stated above that multi-valued dependencies are a generalization of functional dependencies, in the
sense that every FD is an MVD. More precisely, an FD is an MVD
in which the set of dependent (right-hand side) values matching a given determinant
(left-hand side) value is always a singleton set. Thus, if A → B, then certainly A →→ B.
      Returning to our original CTX problem, we can now see that the trouble with relations such as CTX is that
they involve MVDs that are not also FDs. (In case it is not obvious, we point out that it is precisely the existence of those MVDs that leads to the necessity of, for example, inserting two tuples to add another physics teacher. Those two tuples are needed in order to maintain the integrity constraint that is represented by the MVD.) The two projections CT and CX do not involve any such MVDs, which is why they represent an improvement over the original design. We would therefore like to replace CTX by those two projections, and an important theorem proved by Fagin allows us to make exactly that replacement:
      • Theorem (Fagin): Let R {A, B, C} be a relation, where A, B and C are sets of attributes. Then R is equal to the join of its projections on {A, B} and {A, C} if and only if R satisfies the MVD A →→ B / C.
      • Fourth normal form: Relation R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial MVD A →→ B is satisfied (an MVD A →→ B is trivial if either A is a superset of B or the union of A and B is the entire heading), then all attributes of R are functionally dependent on A.
      In other words, the only nontrivial dependencies (FDs or MVDs) are of the form K → X (i.e., a functional dependency from a superkey K to some other attribute X). Equivalently, R is in 4NF if it is in BCNF and all MVDs in R are in fact "FDs out of keys." Note in particular, therefore, that 4NF implies BCNF.
      Relation CTX is not in 4NF, since it involves an MVD that is not an FD at all, let alone an FD "out of a key." The two projections CT and CX are both in 4NF, however. Thus 4NF is an improvement over BCNF in that it eliminates another form of undesirable dependency. What is more, 4NF is always achievable; that is, any relation can be non-loss-decomposed into an equivalent collection of 4NF relations.
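      Fagin's theorem can be illustrated concretely: because CTX satisfies COURSE →→ TEACHER / TEXT, joining the two projections CT and CX reconstructs CTX exactly. The Python sketch below uses sample course, teacher and text values in the spirit of the example (the particular values are illustrative).

# CTX satisfies COURSE ->-> TEACHER / TEXT: all teacher/text combinations appear.
CTX = {(c, t, x)
       for c, teachers, texts in [
           ('Physics', ('Prof. Green', 'Prof. Brown'), ('Basic Mechanics', 'Principles of Optics')),
           ('Math',    ('Prof. White',),               ('Modern Algebra', 'Projective Geometry')),
       ]
       for t in teachers for x in texts}

CT = {(c, t) for (c, t, x) in CTX}     # projection on {COURSE, TEACHER}
CX = {(c, x) for (c, t, x) in CTX}     # projection on {COURSE, TEXT}

rejoined = {(c, t, x) for (c, t) in CT for (c2, x) in CX if c == c2}
print(rejoined == CTX)                 # True: the decomposition is non-loss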
      Join Dependencies and Fifth Normal Form
      So far we have assumed that the sole operation necessary or available in the further normalization process is the replacement of a relation in a non-loss way by exactly two of its projections. This assumption has successfully carried us as far as 4NF. It comes perhaps as a surprise, therefore, to discover that there exist relations that cannot be non-loss-decomposed into two projections but can be non-loss-decomposed into three (or more). To coin an ugly but convenient term, we will describe such a relation as "n-decomposable" (for some n > 2), meaning that the relation in question can be non-loss-decomposed into n projections but not into m for any m < n. A relation that can be non-loss-decomposed into two projections we will call "2-decomposable."

Consider relation SPJ from the suppliers-parts-projects database (but ignore QTY for simplicity); a sample value is shown at the top of the figure. Note that relation SPJ is all key and involves no nontrivial FDs or MVDs at all, and is therefore in 4NF. Note too that the figure also shows:
      (a) The three binary projections SP, PJ, and JS corresponding to the SPJ relation value shown at the top of
the figure;
      (b) The effect of joining the SP and PJ projections (over P#);
      (c) The effect of joining that result and the JS projection (over J# and S#).

       Figure: Relation SPJ is the join of all three of its binary projections
                                                 but not of any two
 
      Observe that the result of the first join is to produce a copy of the original plus one additional (spurious)
tuple, and the effect of the second join is then to eliminate that spurious tuple, thereby bringing us back to the
original SPJ relation. In other words, the original SPJ relation is 3-decomposable.
      Note: The net result is the same whatever pair of projections we choose for the first join, though the
intermediate result is different in each case.
      Exercise: Check this claim.
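      The claim in the exercise can be verified mechanically for a sample SPJ value. The Python sketch below (the tuples are chosen to exhibit the cyclic constraint and are illustrative) joins SP and PJ, shows the spurious tuple that appears, shows that the further join with JS removes it again, and confirms that no join of only two of the projections reconstructs SPJ.

# Sample SPJ value (3-decomposable but not 2-decomposable).
SPJ = {('S1', 'P1', 'J2'), ('S1', 'P2', 'J1'),
       ('S2', 'P1', 'J1'), ('S1', 'P1', 'J1')}

SP = {(s, p) for (s, p, j) in SPJ}
PJ = {(p, j) for (s, p, j) in SPJ}
JS = {(j, s) for (s, p, j) in SPJ}

# Join SP and PJ over P#: one spurious tuple appears.
sp_pj = {(s, p, j) for (s, p) in SP for (p2, j) in PJ if p == p2}
print(sp_pj - SPJ)            # {('S2', 'P1', 'J2')}  -- the spurious tuple

# Joining the result with JS over (J#, S#) eliminates it again.
full = {(s, p, j) for (s, p, j) in sp_pj if (j, s) in JS}
print(full == SPJ)            # True: SPJ is the join of all three projections

# But no join of just two projections reconstructs SPJ.
pj_js = {(s, p, j) for (p, j) in PJ for (j2, s) in JS if j == j2}
js_sp = {(s, p, j) for (j, s) in JS for (s2, p) in SP if s == s2}
print(sp_pj == SPJ, pj_js == SPJ, js_sp == SPJ)   # False False False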
      Now, the example of Fig. 3.12 is of course expressed in terms of relation values. However, the 3-decomposability of SPJ could be a more fundamental, time-independent property, i.e., a property satisfied by all legal values of the relation, if the relation satisfies a certain time-independent integrity constraint. To understand what that constraint must be, observe first that the statement "SPJ is equal to the join of its three projections SP, PJ, and JS" is precisely equivalent to the following statement:
      if           the pair           (S1, P1)           appears in SP
     and         the pair           (P1, J1)            appears in PJ
     and         the pair            (J1, S1)           appears in JS
     then        the triple          (S1, P1, J1)      appears in SPJ
 

      because the triple (S1, P1, J1) obviously appears in the join of SP, PJ, and JS. (The converse of this statement, that if (S1, P1, J1) appears in SPJ then (S1, P1) appears in projection SP, etc., is clearly true for any degree-3 relation SPJ.) Since (S1, P1) appears in SP if and only if (S1, P1, J2) appears in SPJ for some J2, and similarly for (P1, J1) and (J1, S1), we can rewrite the statement above as a constraint on SPJ:
      If (S1, P1, J2), (S2, P1, J1), (S1, P2, J1) appear in SPJ
      then (S1, P1, J1) also appears in SPJ
      And if this statement is true for all time, i.e., for all possible legal values of relation SPJ, then we do have a time-independent constraint on the relation (albeit a rather bizarre one). Notice the cyclic nature of that constraint ("if S1 is linked to P1, P1 is linked to J1, and J1 is linked back to S1, then S1, P1, and J1 must all coexist in the same tuple"). A relation will be n-decomposable for some n > 2 if and only if it satisfies some such (n-way) cyclic constraint.
      Suppose then that relation SPJ does in fact satisfy that time-independent constraint (the sample values in Fig. 3.11 are consistent with this hypothesis). For brevity, let us agree to refer to that constraint as Constraint 3D (3D for 3-decomposable). What does Constraint 3D mean in real-world terms? Let us try to make it a little more concrete by giving an example. The constraint says that, in the portion of the real world that relation SPJ is supposed to represent, it is a fact that if (for example)
      (a) Smith supplies monkey wrenches, and
      (b) monkey wrenches are used in the Manhattan project, and
      (c) Smith supplies the Manhattan project,
then
      (d) Smith supplies monkey wrenches to the Manhattan project.
      Note that (a), (b), and (c) together normally do not imply (d). We are saying that there is no trap here, because there is an additional real-world constraint in effect, namely Constraint 3D, that makes the inference of (d) from (a), (b), and (c) valid in this particular case.
      To return to the main topic of discussion: Because Constraint 3D is satisfied if and
only if the relation concerned is equal to the join of certain of its projections, we refer to
that constraint as a join dependency (JD). A JD is a constraint on the relation concerned,
just as a MVD or an FD is a constraint on the relation concerned.
      • Join dependency: Let R be a relation, and let A, B, ..., Z be subsets of the attributes of R. Then we say that R satisfies the JD
* {A, B, ..., Z}
(read "star A, B, ..., Z") if and only if every possible legal value of R is equal to the join of its projections on A, B, ..., Z.
      For example, if we agree to let SP mean the subset {S#, P#} of the set of attributes of SPJ, and similarly for PJ and JS, then relation SPJ satisfies the JD * {SP, PJ, JS}.
      We have seen, then, that relation SPJ, with its JD * {SP, PJ, JS}, can be 3-decomposed. The question is, should it be? And the answer is "probably yes." Relation SPJ (with its JD) suffers from a number of problems over update operations, problems that are removed when it is 3-decomposed.

                                  Fig. Sample update problems in SPJ
      Fagin's theorem, to the effect that R {A, B, C} can be non-loss-decomposed into its projections on {A, B} and {A, C} if and only if the MVDs A →→ B and A →→ C hold in R, can now be restated as follows:
1. R {A, B, C} satisfies the JD * {AB, AC} if and only if it satisfies the MVD A →→ B / C. Since this theorem can be taken as a definition of multi-valued dependency, it follows that an MVD is just a special case of a JD, or (equivalently) that JDs are a generalization of MVDs.
      Formally, we have
      A →→ B / C ≡ * {AB, AC}
      Note: It follows from the definition that join dependencies are the most general form of dependency
possible (using, of course, the term “dependency” in a very special sense). That is, there does not exist a still
higher form of dependency such that JDs are merely a special case of that higher form — so long as we restrict
our attention to dependencies that deal with a relation being decomposed via projection and recomposed via
join. (However, if we permit other decomposition and recomposition operators, then other types of
dependencies might come into play).
      Returning now to our example, we can see that the problem with relation SPJ is that it involves a JD that
is not an MVD, and hence not an FD either. We have also seen that it is possible, and probably desirable, to
decompose such a relation into smaller components-namely, into the projections specified by the join
dependency. That decomposition process can be repeated until all resulting relations are in fifth normal form,
which we now define:
      Fifth normal form: A relation R is in 5NF, also called projection-join normal form (PJNF) if and only if
every nontrivial join dependency that holds for R is implied by the
candidate keys of R.
      Note: We explain below what it means for a JD to be “implied by candidate keys.”
Relation SPJ is not in 5NF. It satisfies a certain join dependency, namely Constraint 3D, that is certainly not implied by its sole candidate key (that key being the combination of all of its attributes). To state this differently, relation SPJ is not in 5NF, because (a) it can be 3-decomposed and (b) that 3-decomposability is not implied by the fact that the combination {S#, P#, J#} is a candidate key. By contrast, after 3-decomposition, the three projections SP, PJ, and JS are each in 5NF, since they do not involve any (nontrivial) JDs at all.
      Although it might not yet be obvious (because we have not yet explained what it means for a JD to be implied by candidate keys), it is a fact that any relation in 5NF is automatically in 4NF also, because (as we have seen) an MVD is a special case of a JD. In fact, any MVD that is implied by a candidate key must be an FD in which that candidate key is the determinant. It can also be shown that any given relation can be non-loss-decomposed into an equivalent collection of 5NF relations; that is, 5NF is always achievable.
      We now explain what it means for a JD to be implied by candidate keys. First we consider a simple example. Suppose once again that the familiar SUPPLIERS relation S has two candidate keys, {S#} and {SNAME}. Then that relation satisfies several join dependencies; for example, it satisfies the JD
                     * {{S#, SNAME, STATUS}, {S#, CITY}}
      (Incidentally, a JD * {A, B, ..., Z} is trivial if and only if one of the projections A, B, ..., Z is the identity projection of R, i.e., the projection over all of the attributes of R.)
      That is, relation S is equal to the join of its projections on {S#, SNAME, STATUS} and {S#, CITY}, and hence can be non-loss-decomposed into those projections. (This fact does not mean that it should be so decomposed, of course, only that it could be.) This JD is implied by the fact that {S#} is a candidate key. Likewise, relation S also satisfies the JD
                * {{S#, SNAME}, {S#, STATUS}, {SNAME, CITY}}
      This JD is implied by the fact that {S#} and {SNAME} are both candidate keys.
      As the foregoing example suggests, a given JD * {A, B, ..., Z} is implied by candidate keys if and only if each of A, B, ..., Z is in fact a superkey for the relation in question. Thus, given a relation R, we can decompose it into 5NF so long as we know all candidate keys and all JDs in R. However, discovering all the JDs might itself be a nontrivial operation. That is, whereas it is relatively easy to identify FDs and MVDs (because they have a fairly straightforward real-world interpretation), the same cannot be said for JDs that are not MVDs and not FDs, because the intuitive meaning of such JDs might not be obvious. Hence the process of determining when a given relation is in 4NF but not in 5NF, and so could probably be decomposed to advantage, is still unclear. Experience suggests that such relations are pathological cases and likely to be rare in practice.
      In conclusion, we note that it follows from the definition that 5NF is the ultimate normal form with respect
to projection and join (which accounts for its alternative name, projection-join normal form). That is, a relation
in 5NF is guaranteed to be free of anomalies that can be eliminated by taking projections. For if a relation is in
5NF, the only join dependencies are those that are implied by candidate keys, and so the only valid
decompositions are ones that are based on those candidate keys. (Each projection in such a decomposition will
consist of one or more of those candidate keys, plus zero or more additional attributes.) For example, the
SUPPLIERS relation S is in 5NF. It can be further decomposed in several nonloss ways, as we saw earlier, but
every projection in any such decomposition will still include one of the original candidate keys, and hence
there does not seem to be any particular advantage in further reduction.
      The Normalization Procedure Summarized
      Up to this point in this chapter, we have been concerned with the technique of nonloss decomposition as an
aid to database design. The basic idea is as follows: Given some 1NF relation R and some set of FDs, MVDs,
and JDs that apply to R, we systematically reduce R to a collection of "smaller" (i.e., lower-degree) relations
that are equivalent to R in a certain well-defined sense but are also in some way more desirable. (The original
relation R might have been obtained by first eliminating certain relation-valued attributes) Each step of the
reduction process consists of taking projections of the relations resulting from the preceding step. The given
constraints are used at each step to guide the choice of which projections to take next. The overall process can
be stated informally as a set of rules, thus:
      1. Take projections of the original 1NF relation to eliminate any FDs that are not irreducible. This step will produce a collection of 2NF relations.
      2. Take projections of those 2NF relations to eliminate any transitive FDs. This step will produce a collection of 3NF relations.
      3. Take projections of those 3NF relations to eliminate any remaining FDs in which the determinant is not
a candidate key. This step will produce a collection of BCNF relations.

Q. 1. What is database security? Explain the mechanisms for maintaining database security.
 
      Ans. Security in a database involves both policies and mechanisms to protect the data and ensure that it is not accessed, altered or deleted without proper authorization. As information becomes increasingly important to an organization, more and more databases are created day to day, so all databases should be secured from unauthorized access or manipulation by unknown persons. Data has to be protected in the database. There are two dimensions to the protection of data in the database. First, a certain class of data is available only to those persons who are authorized to access it; this makes the data confidential, e.g., the medical records of patients in a hospital are accessible only to health care officers. Second, the data must be protected from accidental or intentional corruption or destruction, e.g., data on national defense is vital to the security of a state, as is the safety of data processing in a chemical plant. In addition to the economic or strategic reasons for protecting data from unauthorized access, corruption or destruction, there is a privacy dimension to data security and integrity.
 
Security and Integrity Threats:
      Threats to the data in a database are either accidental or intentional, and some types of threats can only be addressed using social, behavioral and control mechanisms rather than purely technical ones. Accordingly, two broad categories of security and integrity threats can be distinguished.
 
Accidental Security and Integrity Threats

      Some accidental security and integrity threats are:
      1. A user may get access to a portion of the database which other users cannot access. If that user damages a part of the data accidentally, then the whole database may be corrupted; e.g., if an application programmer accidentally deletes some function or subroutine, then the whole of the program in the database will be affected.
      2. Sometimes the failure of one portion affects the whole of the data. For example, if the power supply fails during transaction processing, then the computed data will not be transferred to the storage device and so data will be lost. Proper recovery procedures are normally used to recover from failures occurring during transaction processing.
      3. Sometimes concurrent processing or concurrent usage of data causes problems, and data may be lost or damaged.
      4. Sometimes system errors occur. A dial-in user may be assigned the identity of another dial-in user who was disconnected accidentally or who hung up without going through a log-off procedure.
      5. Sometimes improper authorization causes problems, which could lead to database security and/or integrity violations.
      6. Hardware failure also causes the problem of data destruction, so security and integrity mechanisms are needed to guard against the effects of hardware failure.
 
      Malicious or Intentional Security and Integrity Threats:
      Some intentional security and integrity threat factors are:
      1. A computer system operator or system programmer can intentionally bypass the normal security and integrity mechanisms, alter or destroy the data in the database, or make unauthorized copies of sensitive data.
      2. An unauthorized user can get access to a secure terminal or to the password of an authorized user and compromise the database. Such a user could also destroy the database files.
      3. Authorized users could pass on sensitive information under pressure or for personal gain.
      4. System and application programmers could bypass normal security in their programs by directly accessing database files and making changes and copies for illegal use.
      5. An unauthorized person can get access to the computer system, physically or by using a communication channel, and compromise the database.
 
Protection:
 
      Four levels of defense (protection) are generally recognized for database security.
      These are:
      (a) Human Factors: These encompass the ethical, legal and social environments. An organization depends on these to provide a certain degree of protection.

      (b) Physical Security: These mechanisms include appropriate locks and keys and entry logs to the computing facility and terminals. Security of the physical storage devices (magnetic tapes, disk packs, etc.) within the organization, and while they are being transported from one location to another, must be maintained. User identification and passwords have to be kept confidential; otherwise an unauthorized user may compromise the database.

      (c) Administrative Controls: These are the security and access control policies that determine what information will be accessible to what class of users and the type of access that will be allowed to that class.

      (d) OS and DBMS Mechanisms: The operating system gives protection to data and programs in both primary and secondary memory, and user identities are established by the operating system. The DBMS provides transaction management, auditing, and recovery of data through its logging process. The DBMS also has integrity constraints and validation procedures for checking users and operations.
 
Protection and Various Methods of Protection:

Protection
      Protection is the branch of security concerned with keeping the data safe from unauthorized access by using different mechanisms and techniques. Some protection methods are:

      (a) Identification and Authentication: The authorization mechanism prepares a user profile for each user and indicates the portion of the database accessible to that user and the mode of access allowed. The enforcement of the security policies in the database system requires that the system knows the identity of the user making the request. So before making any request the user has to identify herself or himself to the system and authenticate the identification, to confirm that the user is in fact the correct person. The simplest and most common authentication scheme is a password used to authenticate the user. The user enters the user name or number and then authenticates herself or himself by the password. These are used once, for the initial sign-on to the system; but for sensitive and important data, authentication and identification procedures can be applied at every step. Sometimes a badge, card or key is used for access.
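      As a minimal illustration of password-based authentication, the Python sketch below stores only salted password hashes and compares them in constant time (the user table and helper names are invented for the sketch and are not part of any particular DBMS).

# Minimal salted-hash password check (illustrative only; a real system would
# use a dedicated password-hashing scheme and secure storage).
import hashlib, hmac, os

users = {}   # user name -> (salt, hash)

def register(name, password):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100_000)
    users[name] = (salt, digest)

def authenticate(name, password):
    if name not in users:
        return False
    salt, stored = users[name]
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100_000)
    return hmac.compare_digest(digest, stored)   # constant-time comparison

register('alice', 's3cret')
print(authenticate('alice', 's3cret'))   # True
print(authenticate('alice', 'guess'))    # False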
      (b) Distributed System Protection: For the protection of data, security enforcement in a distributed system can be enhanced by distribution. Sensitive information can be fragmented and stored at dispersed sites. The leakage of some portion of the fragmented data may not be as disastrous as the leakage of the unfragmented data. Also, with distribution, different sites can have different levels of security and protection of data.

      (c) Cryptography and Encryption: Suppose the defence department wants to send or transmit a message in a protected way. The message is:

"Tanks are coming towards AMRITSAR"

One method of transmitting this message is to substitute a different character of the alphabet for each character in the message. If we ignore the spaces between words and the punctuation, and the substitution is made by shifting each character by some amount, then the above message can be transformed into, for example:
                     "Sbolrbsfdpnnjohupxbsebn&julbs"

The above process is cryptography; it is also called encryption of data. Before transmission, data should be encrypted. This is one of the best ways to achieve protection.
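      A toy version of the substitution idea can be written in a few lines. The Python sketch below uses a fixed shift for every character purely for illustration (the text above describes shifting by varying amounts, and real systems use proper cryptographic algorithms rather than simple substitution ciphers).

# Toy substitution cipher: shift every letter by a fixed amount (illustration only).
def shift_cipher(message, key):
    out = []
    for ch in message.lower():
        if ch.isalpha():                      # ignore spaces and punctuation
            out.append(chr((ord(ch) - ord('a') + key) % 26 + ord('a')))
    return ''.join(out)

plain = "Tanks are coming towards AMRITSAR"
cipher = shift_cipher(plain, 1)
print(cipher)                      # uboltbsfdpnjohupxbsetbnsjutbs
print(shift_cipher(cipher, -1))    # tanksarecomingtowardsamritsar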
 
      System Integrity.
 
      Integrity implies that any properly authorized access, alteration or deletion of data in the database does not change the validity of the data. Security and integrity concepts are distinct but related to each other; in fact, integrity builds on security. The mechanism that is applied to ensure that the data in the database is correct and consistent is called data integrity. Integrity also covers the maintenance and restoration of data that has been damaged, for example by an unauthorized person.
 
Data Integrity
 
      Data integrity requires guarding against invalid database operations. An operation here means any action performed on behalf of a user or application program that modifies the state of the database. Such operations are the result of actions such as update, insert or delete. Database integrity involves the correctness of data, and this correctness has to be preserved in the presence of concurrent operations, errors in the users' operations and application programs, and failures in hardware and software. Integrity also involves recovery of lost and damaged data and checks on the data stored in memory.
      To maintain integrity, there are certain types of constraints that the database has to enforce so as to preserve the consistency and validity of the data. Integrity constraints are hard to enforce consistently when the constraint rules are coded separately in each application program.
      Centralizing the integrity checking directly under the DBMS reduces duplication and ensures the consistency and validity of the database. The centralized integrity constraints can be maintained in a system catalog (data dictionary) and can be accessed by the database users via the query language.

 
      Q. 2. What is database security?
 
      Ans. Security in a database involves both policies and mechanisms to protect the data and ensure that it is not accessed, altered or deleted without proper authorization. As information becomes increasingly important to an organization, more and more databases are created day to day, so all databases should be secured against unauthorized access or manipulation by unknown persons. Data has to be protected in the database. There are two dimensions to the protection of data in the database. First, a certain class of data is available only to those persons who are authorized to access it; this makes the data confidential, e.g., the medical records of patients in a hospital are accessible only to health care officers. Second, the data must be protected from accidental or intentional corruption or destruction, e.g., data on national defense is vital to the security of a state, as is the safety of data processing in a chemical plant. In addition to the economic or strategic reasons for protecting data from unauthorized access, corruption or destruction, there is a privacy dimension to data security and integrity.

 
      Q. 3. Discuss the concept of transaction in detail.
                                     Or
      What are the desirable properties of transaction?
 
      Ans. A transaction is a logical unit of work that must be either entirely completed or aborted; no
intermediate states are acceptable. Most real-world database transactions are formed by two or more database
requests. A database request is the equivalent of a single SQL statement in an application program or
transaction.
      A transaction that changes the contents of the database must alter the database from one consistent database
state to another. To ensure consistency of the database, every transaction must begin with the database in a
known consistent state
           

          
 
      A transaction is a logical unit of work that contains one or more SQL statements. A transaction is an atomic
unit. The effects of all the SQL statements in a transaction can be either all committed (applied to the database)
or all rolled back (undone from the database).
      A transaction begins with the first executable SQL statement. A transaction ends when it is committed or
rolled back, either explicitly with a COMMIT or ROLLBACK statement or implicitly when a DDL statement
is issued.
      Consider a banking database, when a bank customer transfers money from a savings account to a checking
account, the transaction can consist of three separate operations:
      • Decrement the savings account
      • Increment the checking account
      • Record the transaction in the transaction journal
      If all three SQL statements can be performed to maintain the accounts in proper balance, the effects of the transaction can be applied to the database. However, if a problem such as insufficient funds, an invalid account number, or a hardware failure prevents one or two of the statements in the transaction from completing, the entire transaction must be rolled back so that the balance of all accounts is correct. Figure 2 illustrates the banking transaction example.
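      The banking example can be sketched with Python's standard sqlite3 module (account numbers, balances and the schema are invented for the illustration): either all three operations are committed together, or an error causes the whole transaction to be rolled back.

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE account (acc_no TEXT PRIMARY KEY,
                          balance INTEGER NOT NULL CHECK (balance >= 0));
    CREATE TABLE journal (acc_from TEXT, acc_to TEXT, amount INTEGER);
    INSERT INTO account VALUES ('SAV-1', 1000), ('CHK-1', 200);
""")

def transfer(savings, checking, amount):
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                     (amount, savings))            # decrement the savings account
        conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                     (amount, checking))           # increment the checking account
        conn.execute("INSERT INTO journal VALUES (?, ?, ?)",
                     (savings, checking, amount))  # record it in the journal
        conn.commit()                              # all three changes made permanent
    except sqlite3.Error:
        conn.rollback()                            # undo everything: no partial update

transfer('SAV-1', 'CHK-1', 300)    # succeeds and commits
transfer('SAV-1', 'CHK-1', 5000)   # insufficient funds: CHECK fails, rolled back
print(list(conn.execute("SELECT * FROM account")))
# [('SAV-1', 700), ('CHK-1', 500)]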

                                              
Statement Execution and Transaction Control:
 
      A SQL statement that runs successfully is different from a committed transaction.
Executing successfully means that a single statement was:
      • Parsed
      • Found to be a valid SQL construction
      • Run without error as an atomic unit. For example, all rows of a multirow update are changed.
      However, until the transaction that contains the statement is committed, the transaction can be rolled back, and all of the changes of the statement can be undone. A statement, rather than a transaction, runs successfully.
 
      Committing means that a user has explicitly or implicitly requested that the changes in the transaction be
made permanent. An explicit request means that the user issued a COMMIT statement. An implicit request
can be made through normal termination of an application or in data definition language, for example. The
changes made by the SQL statements of your transaction become permanent and visible to other users only
after your transaction has been committed. Only other users’ transactions that started after yours will see the
committed changes.
 
      You can name a transaction using the SET TRANSACTION ... NAME statement before you start the
transaction. This makes it easier to monitor long-running transactions and to resolve in-doubt distributed
transactions.
 
      If at any time during execution a SQL statement causes an error, all effects of the statement are rolled back.
The effect of the rollback is as if that statement had never been run. This operation is a statement-level
rollback.

      Errors discovered during SQL statement execution cause statement-level rollbacks. An example of such an
error is attempting to insert a duplicate value in a primary key. Single SQL statements involved in a deadlock
(competition for the same data) can also cause a statement-level rollback. Errors discovered during SQL
statement parsing, such as syntax errors, mean that the statement has not yet been run, so they do not cause a statement-level rollback.
 
      A SQL statement that fails causes the loss only of any work it would have performed itself. It does not
cause the loss of any work that preceded it in the current transaction. If the statement is a DDL statement, then
the implicit commit that immediately preceded it is not undone. The user can also request a statement-level
rollback by issuing a ROLLBACK statement. Note that users cannot directly refer to implicit save points in
rollback statements.
 
Resumable Space Allocation.
 
      Some DBMSs provide a means for suspending, and later resuming, the execution of large database
operations in the event of space allocation failures. This enables an administrator to take corrective action,
instead of the database server returning an error to the user. After the error condition is corrected, the
suspended operation automatically resumes. This feature is called resumable space allocation and the
statements that are affected are called resumable statements.
 
      A statement runs in a resumable mode only when the client explicitly enables resumable semantics for the
session using the ALTER SESSION statement. Resumable space  allocation is suspended when one of the
following conditions occurs:
      • Out of space condition
      • Maximum extents reached condition
      • Space quota exceeded condition
      For nonresumable space allocation, these conditions result in errors and the statement is rolled back.
Suspending a statement automatically results in suspending the transaction. Thus all transactional resources are
held through a statement suspend and resume.
 
      When the error condition disappears (for example, as a result of user intervention or perhaps sort space
released by other queries), the suspended statement automatically resumes execution.
 
      Process of Transaction.
      A DBMS must provide a transaction processing system (TP system) to guarantee that if the transaction
executes some updates and then a failure occurs due to some reason before transaction reaches its termination,
then those updates will be undone. Therefore the transaction either executes in its entirety or is totally
canceled.
 
      Transaction processing systems provide tools to help software development for applications that involve
querying and updating databases. The term “TP system” is generally taken to mean a complete system,
including application generators, one or more database systems, utilities and networking software. Within a TP
system, there is a core collection of services, called the TP monitor, that coordinates the flow of transactions
through the system. (Fig.)

                 
      In order to work properly transaction-processing system needs following system requirements
      • High Availability: System must be on-line and operational while enterprise is functioning.
      • High Reliability: Correctly tracks state, does not lose data, controlled concurrency.
      • High Throughput: Many users => many transactions/sec.
      • Low Response Time: On-line => users are waiting.
      • Long Lifetime: Complex systems are not easily replaced.
      • Must be designed so they can be easily extended as the needs of the enterprise change.
      • Security: Sensitive information must be carefully protected since the system is accessible to many users.
Roles in Design, Implementation, and Maintenance of a TPS
      • System Analyst specifies system-using input from customer and also provides complete description of
functionality from customers and users point of view.
      • Database Designer specifies structure of data that will be stored in database
      • Application Programmer implements application programs (transactions) that access data and support
enterprise rules
      • Database Administrator maintains database once system is operational: space   allocation, performance
optimization, database security
      • System Administrator maintains transaction-processing system, monitors inter connection of hardware
and software modules, deals with failures and congestion.
 
      A transaction begins when the first executable SQL statement is encountered. An executable SQL
statement is a SQL statement that generates calls to an instance, including DML and DDL statements.
 
      When a transaction begins, DBMS assigns the transaction to an available undo tablespace or rollback
segment to record the rollback entries for the new transaction.
      A transaction ends when any of the following occurs:
      • A user issues a COMMIT or ROLLBACK statement without a SAVEPOINT clause.
      • A user runs a DDL statement such as CREATE, DROP, RENAME, or ALTER. If the current transaction
contains any DML statements, DBMS first commits the transaction, and then runs and commits the DDL
statement as a new, single statement transaction.
      • A user disconnects from DBMS. The current transaction is committed.
      • A user process terminates abnormally. The current transaction is rolled back.
      After one transaction ends, the next executable SQL statement automatically starts the following
transaction. Note that applications should always explicitly commit or roll back transactions before program
termination.
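 
      As an illustration, here is a minimal Oracle-style SQL sketch (the emp table, its columns, and the index name are hypothetical) showing how a transaction begins with the first executable statement, ends with an explicit COMMIT, and is also ended implicitly by a DDL statement:
 
      UPDATE emp SET salary = salary * 1.10 WHERE dept_no = 10;  -- first DML: a transaction begins
      UPDATE emp SET salary = salary * 1.05 WHERE dept_no = 20;
      COMMIT;                                                    -- explicit end of the transaction
 
      UPDATE emp SET salary = salary * 1.10 WHERE dept_no = 30;  -- a new transaction begins
      CREATE INDEX emp_dept_idx ON emp(dept_no);                 -- DDL: the pending update is committed
                                                                 -- first, then the DDL runs and commits
                                                                 -- as its own single-statement transaction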
Commit Transactions
      Committing a transaction means making permanent the changes performed by the SQL statements within
the transaction.
      Before a transaction that modifies data is committed, the following has occurred:
      • DBMS has generated rollback segment records in buffers in the SGA that store rollback segment data.
The rollback information contains the old data values changed by the SQL statements of the transaction.
      • DBMS has generated redo log entries in the redo log buffer of the SGA. The redo log record contains the
change to the data block and the change to the rollback block. These changes may go to disk before a
transaction is committed.
      • The changes have been made to the database buffers of the SGA. These changes may go to disk before a
transaction is committed.
      Note that the data changes for a committed transaction, stored in the database buffers of the SGA, are not
necessarily written immediately to the data files by the database writer (DBWn) background process. This
writing takes place when it is most efficient for the database to do so. It can happen before the transaction
commits or alternatively, it can happen some time after the transaction commits.
 
      When a transaction is committed, the following occurs:
      • The internal transaction table for the associated rollback segment records that the transaction has
committed, and the corresponding unique system change number (SCN) of the transaction is assigned and
recorded in the table.
      • The log writer process (LGWR) writes redo log entries in the SGA’s redo log buffers to the online redo
log file. It also writes the transaction's SCN to the online redo log file. This atomic event constitutes the commit
of the transaction.
      • DBMS releases locks held on rows and tables.
      • DBMS marks the transaction complete.
 
Rollback of Transactions:
      Rolling back means undoing any changes to data that have been performed by SQL statements within an
uncommitted transaction. DBMS uses undo tablespaces or rollback segments to store old values. The redo log
contains a record of changes.
      DBMS lets you roll back an entire uncommitted transaction. Alternatively, you can roll back the trailing
portion of an uncommitted transaction to a marker called a savepoint.
      All types of rollbacks use the same procedures:
      • Statement-level rollback (due to a statement execution error or a deadlock)
      • Rollback to a savepoint
      • Rollback of a transaction due to user request
      • Rollback of a transaction due to abnormal process termination
      • Rollback of all outstanding transactions when an instance terminates abnormally
      • Rollback of incomplete transactions during recovery
      In rolling back an entire transaction, without referencing any savepoints, the following occurs:
      1. DBMS undoes all changes made by all the SQL statements in the transaction by using the corresponding
undo tablespace or rollback segment.
      2. DBMS releases all the transaction's data locks.
      3. The transaction ends.
 
Savepoints in Transactions
      You can declare intermediate markers called savepoints within the context of a transaction. Savepoints
divide a long transaction into smaller parts.
 
      Using savepoints, you can arbitrarily mark your work at any point within a long transaction. You then have
the option later of rolling back work performed before the current point in the transaction but after a declared
savepoint within the transaction. For example, you can use savepoints throughout a long complex series of
updates, so if you make an error, you do not need to resubmit every statement.
 
      Savepoints are similarly useful in application programs. If a procedure contains several functions, then you
can create a savepoint before each function begins. Then, if a function fails, it is easy to return the data to its
state before the function began and re-run the function with revised parameters or perform a recovery action.
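 
      A minimal sketch of this usage in SQL (the accounts table, account numbers, and savepoint name are hypothetical):
 
      UPDATE accounts SET balance = balance - 500 WHERE acct_no = 101;
      SAVEPOINT after_debit;
      UPDATE accounts SET balance = balance + 500 WHERE acct_no = 202;   -- suppose this credit is wrong
      ROLLBACK TO SAVEPOINT after_debit;      -- undo only the credit; the debit is still pending
      UPDATE accounts SET balance = balance + 500 WHERE acct_no = 203;   -- corrected credit
      COMMIT;                                 -- make the whole corrected transaction permanent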
 
      After a rollback to a savepoint, DBMS releases the data locks obtained by rolled back statements. Other
transactions that were waiting for the previously locked resources can proceed. Other transactions that want to
update previously locked rows can do so.
 
      When a transaction is rolled back to a savepoint, the following occurs:
      • DBMS rolls back only the statements run after the savepoint.
      • DBMS preserves the specified savepoint, but all savepoints that were established after the specified one
are lost.
      • DBMS releases all table and row locks acquired since that savepoint but retains all data locks acquired
previous to the savepoint.
 
      The transaction remains active and can be continued. Note that whenever a session is waiting on a
transaction, a rollback to a savepoint does not free row locks. To make sure a transaction does not hang if it cannot
obtain a lock, use FOR UPDATE ... NOWAIT before issuing UPDATE or DELETE statements.
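 
      For example, the following hedged sketch (accounts is a hypothetical table) locks the target row up front and fails immediately, instead of waiting, if another transaction already holds a lock on it:
 
      SELECT balance
        FROM accounts
       WHERE acct_no = 101
         FOR UPDATE NOWAIT;    -- returns an error at once if the row is already locked
 
      UPDATE accounts SET balance = balance - 500 WHERE acct_no = 101;
      COMMIT;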
 
      Transaction Naming
      You can name a transaction, using a simple and memorable text string. This name is a reminder of what the
transaction is about. Transaction names replace commit comments for distributed transactions, with the
following advantages:
      •  It is easier to monitor long-running transactions and to resolve in-doubt distributed transactions.
     • You can view transaction names along with transaction IDs in applications. For example, a database
administrator can view transaction names in Enterprise Manager when monitoring system activity.
     • Transaction names are written to the transaction auditing redo record.
     • LogMiner can use transaction names to search for a specific transaction from transaction auditing records
in the redo log.
     • You can use transaction names to find a specific transaction in dynamic performance views, such as
V$TRANSACTION.
 
      Name a transaction using the SET TRANSACTION ... NAME statement before you start the transaction.
When you name a transaction, you associate the transaction’s name with its ID. Transaction names do not have
to be unique; different transactions can have the same transaction name at the same time by the same owner.
You can use any name that enables you to distinguish the transaction.
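 
      A short sketch (the transaction name and table are illustrative):
 
      SET TRANSACTION NAME 'raise_dept_10';
      UPDATE emp SET salary = salary * 1.05 WHERE dept_no = 10;
      COMMIT;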
 
The Two-phase Commit Mechanism
      In a distributed database, DBMS must coordinate transaction control over a network and maintain data
consistency, even if a network or system failure occurs.
      A distributed transaction is a transaction that includes one or more statements that update data on two or
more distinct nodes of a distributed database.
 
      A two-phase commit mechanism guarantees that all database servers participating in a distributed
transaction either all commit or all roll back the statements in the transaction. A two-phase commit mechanism
also protects implicit DML operations performed by integrity constraints, remote procedure calls, and triggers.
 
      The two-phase commit mechanism is completely transparent to users who issue distributed transactions. In
fact, users need not even know the transaction is distributed. A COMMIT statement denoting the end of a
transaction automatically triggers the two-phase commit mechanism to commit the transaction. No coding or
complex statement syntax is required to include distributed transactions within the body of a database
application.
 
      The recoverer (RECO) background process automatically resolves the outcome of in-doubt distributed
transactions, that is, distributed transactions in which the commit was interrupted by any type of system or network
failure. After the failure is repaired and communication is reestablished, the RECO process of each local
DBMS server automatically commits or rolls back any in-doubt distributed transactions consistently on all
involved nodes.
 
      In the event of a long-term failure, DBMS allows each local administrator to manually commit or roll back
any distributed transactions that are in doubt as a result of the failure. This option enables the local database
administrator to free any locked resources that are held indefinitely as a result of the long-term failure.
      If a database must be recovered to a point in the past, DBMS’s recovery facilities enable database
administrators at other sites to return their databases to the earlier point in time also. This operation ensures
that the global database remains consistent.
 
Read/Write Operation
      Since a transaction is a general program, there are an enormous number of potential operations that a
transaction can perform. However, there are only two really important operations:
      1. read(A,t) (or read(A) when t is not important) — This operation is used to read database element A into
local variable t.
      2. write(A,t) (or write(A) when t is not important) — This operation is used to write the value of local
variable t to the database element A.
      We will assume that the buffer manager ensures that the database element is in memory.
      We could make the memory management more explicit by using following operations:
      3. input(A) — This operation is used to read database element A into local memory buffer.
      4. output(A) - This operation is used to copy the block containing A to disk.
      Let us consider an example to understand the use of read and write operation. Suppose that we want to
transfer $50 from account A to account B, then the set of operations performed are:
                                                 1. read(A,t)
                                                 2. t = t — 50
                                                 3. write(A,t)
                                                 4. read(B,t)
                                                 5. t = t + 50
                                                 6. write(B,t)
      The first step is used to read the amount in account A, with the help of the read operation, into the local variable t. In
step 2, we reduce the value of t by 50. Step 3 is used to write back the updated value to account A with the help
of write operation. In step 4, value of account B is read into local variable t, which is incremented by 50 in step
5. Value of t is written to account B in step 6 with the help of write operation.
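 
      In SQL terms, the same transfer could be written roughly as follows (accounts is a hypothetical table with one row per account); internally, the DBMS performs the corresponding read and write operations on the stored values:
 
      UPDATE accounts SET balance = balance - 50 WHERE acct_no = 'A';   -- read(A,t); t = t - 50; write(A,t)
      UPDATE accounts SET balance = balance + 50 WHERE acct_no = 'B';   -- read(B,t); t = t + 50; write(B,t)
      COMMIT;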
 
Transaction Properties (Acid Properties)
      Any change to system state within a transaction boundary, therefore, has to ensure that the change leaves
the system in a stable and consistent state. A transactional unit of work is one in which the following four
fundamental transactional properties are satisfied: atomicity, consistency, isolation, and durability (ACID). We
will examine each property in detail.
 
      ATOMICITY
      Results of a transaction’s execution are either all committed or all rolled back. All changes take effect, or
none do. It is common to refer to a transaction as a “unit of work.” In describing a transaction as a unit of
work, we are describing one fundamental property of a transaction: that the activities within it must be
considered indivisible that is, atomic. A Flute Bank customer may interact with Flute’s ATM and transfer
money from a checking account to a savings account.
 
      Within the Flute Bank software system, a transfer transaction involves two actions: debit of the checking
account and credit to the savings account. For the transfer transaction to be successful, both actions must
complete successfully. If either one fails, the transaction fails. The atomic property of transactions dictates that
all individual actions that constitute a transaction must succeed for the transaction to succeed, and, conversely,
that if any individual action fails, the transaction as a whole must fail.
      As an example consider two transactions:
            TI:     BEGIN  A  =  A  +  100,  B  =  B  -  100  END
            T2:     BEGIN A  = 1.06*A,  B  =  1.06*B END
      Intuitively, the first transaction is transferring $100 from B's account to A's account. The second is
crediting both accounts with a 6% interest payment. There is no guarantee that T1 will execute before T2 or
vice-versa, if both are submitted together. However, the net effect must be equivalent to these two transactions
running serially in some order.
            T1:     A = A + 100,                           B = B - 100
            T2:                        A = 1.06*A,                          B = 1.06*B
            This is OK. But what about this:
            T1:     A = A + 100,                                            B = B - 100
            T2:                        A = 1.06*A,        B = 1.06*B
 
      The DBMS's view of the second schedule is
            T1:     read(A), write(A),                                                       read(B), write(B)
            T2:                          read(A), write(A), read(B), write(B)
                                   Fig. A possible interleaving (schedule)
                  
Consistency
      The database is transformed from one valid state to another valid state. This defines a transaction as legal
only if it obeys user-defined integrity constraints. Illegal transactions aren’t allowed and, if an integrity
constraint can’t be satisfied then the transaction is rolled back. For example, suppose that you define a rule
that, after a transfer of more than $10,000 out of the country, a row is added to an audit table so that you can
prepare a legally required report for the IRS. Perhaps for performance reasons that audit table is stored on a
separate disk from the rest of the database. If the audit table’s disk is off-line and can’t be written, the
transaction is aborted.
 
      A database or other persistent store usually defines referential and entity integrity rules to ensure that data
in the store is consistent. A transaction that changes the data must ensure that the data remains in a consistent
state— that data integrity rules are not violated, regardless of whether the transaction succeeded or failed. The
data in the store may not be consistent during the duration of the transaction, but the inconsistency is invisible
to other transactions, and consistency must be restored when the transaction completes.
Isolation
      When multiple transactions are in progress, one transaction may want to read the same data another
transaction has changed but not committed. Until the transaction commits, the changes it has made should be
treated as transient state, because the transaction could roll back the change. If other transactions read
intermediate or transient states caused by a transaction in progress, additional application logic must be
executed to handle the effects of some transactions having read potentially erroneous data. The isolation
property of transactions dictates how concurrent transactions that act on the same subset of data behave. That
is, the isolation property determines the degree to which effects of multiple transactions, acting on the same
subset of application state, are isolated from each other.
 
      At the lowest level of isolation, a transaction may read data that is in the process of being changed by
another transaction but that has not yet been committed. If the first transaction is rolled back, the transaction
that read the data would have read a value that was not committed. This level of isolation, called read uncommitted or
"dirty read", can cause erroneous results but ensures the highest concurrency. An isolation of read committed
ensures that a transaction can read only data that has been committed. This level of isolation is more restrictive
(and consequently provides less concurrency) than a read uncommitted isolation level and helps avoid the
problem associated with the latter level of isolation.
 
      An isolation level of repeatable read signifies that a transaction that read a piece of data is guaranteed that
the data will not be changed by another transaction until the transaction completes. The name “repeatable read”
for this level of isolation comes from the fact that a transaction with this isolation level can read the same data
repeatedly and be guaranteed to see the same value. The most restrictive form of isolation is serializable. This
level of isolation combines the properties of the repeatable-read and read-committed isolation levels, effectively
ensuring that transactions that act on the same piece of data are serialized and will not execute concurrently.
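 
      Most SQL databases let an application choose an isolation level per transaction; a hedged sketch (exact level names, defaults, and behavior vary by product, and the accounts table is hypothetical):
 
      SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
      SELECT balance FROM accounts WHERE acct_no = 101;   -- sees only committed data
      COMMIT;
 
      SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
      SELECT SUM(balance) FROM accounts;                  -- runs as if no other transaction interleaves
      COMMIT;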
 
      The isolation portion of the ACID properties is needed when there are concurrent transactions. Concurrent
transactions are transactions that occur at the same time, such as shared multiple users accessing shared
objects. This situation is illustrated at the top of the figure as activities occurring over time. The safeguards
used by a DBMS to prevent conflicts between concurrent transactions are referred to as isolation.
                          Fig. Concurrently executing transaction
      As an example, if two people are updating the same catalog item, it’s not acceptable for one person’s
changes to be “clobbered” when the second person saves a different set of changes. Both users should be able
to work in isolation, working as though he or she is the only user. Each set of changes must be isolated from
those of the other users.
      An important concept for understanding isolation through transactions is serializability. Transactions are
serializable when the effect on the database is the same whether the transactions are executed in serial order or
in an interleaved fashion. As you can see at the top of the figure, Transaction 1 through Transaction 3 are
executing concurrently over time. The effect on the DBMS is that the transactions may execute in serial order
based on consistency and isolation requirements. If you look at the bottom of the figure, you can see
several ways in which these transactions may execute. It is important to note that a serialized execution does
not imply that the first transactions will automatically be the ones that terminate before other transactions in
the serial order.
Degrees of Isolation
      • Degree 0 A transaction does not overwrite data updated by another user or process ("dirty data") of other
transactions.
      • Degree 1 Degree 0 plus a transaction does not commit any writes until it completes all its writes (until the
end of transaction).
      • Degree 2 Degree 1 plus a transaction does not read dirty data from other transactions.
      • Degree 3 Degree 2 plus other transactions do not dirty data read by a transaction before the transaction
commits.
      These were originally described as degrees of consistency by Jim Gray.
      For example, let us consider two transactions:
      The first transaction transfers $100 from B's account to A's account. The second transaction credits
both accounts with 6% interest. Let us assume that at first A and B each have $1000. Then what are the legal
outcomes of running T1 and T2?
      There is no guarantee that T1 will execute before T2 or vice-versa, if both are
submitted together.
      Consider a possible interleaved schedule
      [Fig. A possible interleaved schedule of T1 and T2 (figure omitted)]
Durability
      Once committed (completed), the results of a transaction are permanent and survive future system and
media failures. If the airline reservation system computer gives you seat 22A and crashes a millisecond later, it
won’t have forgotten that you are sitting in 22A and also give it to someone else. Furthermore, if a programmer
spills coffee into a disk drive, it will be possible to install a new disk and recover the transactions up to the
coffee spill, showing that you had seat 22A.
 
      The durability property of transactions refers to the fact that the effect of a transaction must endure beyond
the life of a transaction and application. That is, state changes made within a transactional boundary must be
persisted onto permanent storage media, such as disks, databases, or file systems. If the application fails after
the transaction has committed, the system should guarantee that the effects of the transaction will be visible
when the application restarts. Transactional resources are also recoverable: should the persisted data be
destroyed, recovery procedures can be executed to recover the data to a point in time (provided the necessary
administrative tasks were properly executed). Any change committed by one transaction must be durable until
another valid transaction changes the data.
 
States of Transaction
      During its execution, a transaction can be in many states. These states indicate the status of a transaction.
Various states in which a transaction can be are:
      1. Active: This state is the initial state of a transaction. The transaction stays in this state while it is
executing. A transaction enters the active state when the first query or update is encountered. Data is processed in
the buffer or on disk.
      2. Partially committed: A transaction is partially committed after its final statement has been executed. A
transaction may change its state from active to partially committed. A transaction enters this state
immediately before the "commit work". All operations are completed (in the memory buffer or on disk) and
wait to be finalized.
      3. Failed: A transaction enters the failed state after the discovery that normal execution can no longer
proceed. A transaction may change its state from active to failed. A transaction enters this state when the
transaction is interrupted by an event such as a program exception or a system failure.
      4. Aborted: A transaction is aborted after it has been rolled back and the database restored to its state prior
to the transaction. A transaction enters this state after a "rollback work" or at system recovery. All
updates made by the transaction are rolled back and the database is restored to the state prior to the start of the
transaction.
      There are two options after abort:
      • Restart the transaction: This option is selected only if there is no internal logical error.
      • Kill the transaction: This option is selected if there is a problem with the transaction itself.
      5. Committed: A transaction enters the committed state after successful completion, i.e. after the "commit work".
Updates are guaranteed to be permanent. Terminated may also be considered a transaction state.
                                                     Fig. Transaction State Diagram
 
Advantages of Concurrent Execution of Transaction
      Concurrent execution of transactions means executing more than one transaction at the same time. The
schedule shown in the figure represents an interleaved execution of two transactions. Ensuring transaction isolation
while permitting such concurrent execution is difficult.
      Thus, multiple transactions are allowed to run concurrently in the system.
Advantages of using concurrent execution of transaction are:
 
•  Increased processor utilization - If the system is executing only one transaction, the processor might not be
always busy; for example, if the only transaction in the system is waiting for the completion of some I/O
operation, the processor is also waiting and thus doing no work. On the other hand, if the system is executing more than
one transaction at the same time, the processor might always be busy executing one transaction or the other.
•  Increased Disk utilization
• Better transaction throughput - One transaction can be using the CPU while another is reading from or writing
to the disk
• Reduced average response time for transaction - As short transactions need not wait behind long ones.
• Reduced average turnaround time for transactions - Turnaround time is the time interval between transaction
submission and transaction completion. As more than one transaction is executing at same time there is
reduction in average turnaround time.
• Reduced average wait time for transactions - As more transactions are completed in less time.
 
 
      Q. 4. What are the various Locking Techniques for Concurrency Control?
 
      Ans. A lock is a variable associated with a data item in the database and describes the status of that item
with respect to possible access operations to the item. Locks enable a multi-user DBMS to maintain the
integrity of transactions by isolating a transaction from others executing concurrently. Locks are particularly
critical in write-intensive and mixed workload (read/write) environments, because they can prevent the
inadvertent loss of data or consistency problems with reads.
 
      Figure 10.1 depicts the lost update situation that could occur if a DBMS did not
lock data. Two transactions read the same bank account balance, each intending to
add money to it. However, because the second transaction bases its update on the
original balance, the money deposited by the first is lost.
 
      We could have avoided this scenario if the DBMS had appropriately locked the balance on behalf of the
first transaction in preparation for its update. The second transaction would have waited, thereby using the
updated balance as a basis for its work.
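 
      A minimal SQL sketch of how each deposit transaction could take that lock explicitly (the accounts table and account number are hypothetical); the second transaction's SELECT blocks until the first commits, so its update is based on the new balance:
 
      SELECT balance FROM accounts WHERE acct_no = 101 FOR UPDATE;    -- exclusive row lock
      UPDATE accounts SET balance = balance + 100 WHERE acct_no = 101;
      COMMIT;                                                         -- lock released here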
 
As with locking data in preparation for writes, locking data for reads can be important in certain situations,
preventing inconsistent analysis of the database. While DBMSs use exclusive locks for writes, share locks are
commonly used for reads. Share locks enable other concurrently executing transactions to read the same data
but prohibit any transaction from writing the chosen data.
      Consider a situation that might occur without share locks, as shown in Fig. 10.2. The first transaction reads
the balances of multiple accounts (perhaps through multiple queries) with the intention of calculating an
accurate sum. Another transaction transfers money from one account to another during this process. The timing
of this work is such that it causes the first transaction to read only part of the effect of this transfer, thus making
its sum total inconsistent with what it should be. If share locks were held by the first transaction until
transaction commit, this inconsistent analysis would not occur.
      Serializability is an important concept associated with locking. It guarantees that the work of concurrently
executing transactions will leave the database in a consistent state, as it would have been if these transactions had
executed serially. This requirement is the ultimate criterion for database consistency and is the motivation for
the two-phase locking protocol, which dictates that no new locks can be acquired on behalf of a transaction
after the DBMS releases a lock held by that transaction. In practice, this protocol generally means that locks
are held until commit time.
 
      Aside from their integrity implications, locks can have a significant impact on performance. While it may
benefit a given application to lock a large amount of data (perhaps one or more tables) and hold these locks for
a long period of time, doing so inhibits concurrency and increases the likelihood that other applications will
have to wait for locked resources. Yet locking only small amounts of data and releasing these locks quickly
may be inappropriate for some applications, increasing the overhead associated with transaction processing. In
addition, certain integrity problems can arise if a single transaction acquires locks after some have already been
released.
Need for Locks
      Locks are required for the following reasons:
      • To provide isolation (the "I" of ACID):
        - Give each transaction the illusion that there are no concurrent updates.
        - Hide concurrency anomalies.
        - Do it automatically (the system does not know transaction semantics).
      • Goal of locking:
        - To provide concurrency, with system execution equivalent to some serial execution of the system.
        - Not a deterministic outcome, just a consistent transformation.
      Locks are a popular approach to concurrency control. Transactions request and acquire locks on data items
which they wish to access and which they do not want other transactions to update, i.e. a lock locks other
transactions out. Transactions cannot access data items unless they have the appropriate lock.
 
      Most locking protocols are based on two types of locks:
WRITE (or exclusive) locks: if a transaction holds a write lock on an item no other transaction may acquire a
read or write lock on that item.
READ (or shared) locks: if a transaction holds a read lock on an item no other transaction may acquire a write
lock on that item.
      Transactions go into a WAIT state till the required lock is available. Acquisition of locks is the responsibility
of the transaction management subsystem. For strict schedules - i.e. simple recovery, transactions should hold
all exclusive locks until COMMIT or ROLLBACK time. Thus no transaction can read or update an item until
the last transaction that updated it has committed and released the exclusive lock.
 
LOCK MANAGEMENT
The part of the DBMS that keeps track of the locks issued to transactions is called the lock manager. The lock
manager maintains a lock table, which is a hash table with the data object identifier as the key. The DBMS also
maintains a descriptive entry for each transaction in a transaction table, and among other things, the entry
contains a pointer to a list of locks held by the transaction. A lock table entry for an object, which can be a
page, a record, and so on, depending on the DBMS, contains the following information: the number of
transactions currently holding a lock on the object (this can be more than one if the object is locked in shared
mode), the nature of the lock (shared or exclusive), and a pointer to a queue of lock requests.
 
Implementing Lock and Unlock Requests
      According to the Strict 2PL protocol, before a transaction T reads or writes a database object O, it must
obtain a shared or exclusive lock on O and must hold on to the lock until it commits or aborts. When a
transaction needs a lock on an object, it issues a lock request to the lock manager:
      1. If a shared lock is requested, the queue of requests is empty, and the object is not currently locked in
exclusive mode, the lock manager grants the lock and updates the lock table entry for the object (indicating that
the object is locked in shared mode, and incrementing the number of transactions holding a lock by one).
      2. If an exclusive lock is requested, and no transaction currently holds a lock on the object (which also
implies the queue of requests is empty), the lock manager grants the lock and updates the lock table entry.
      3. Otherwise, the requested lock cannot be immediately granted, and the lock request is added to the queue
of lock requests for this object. The transaction requesting the lock is suspended. When a transaction aborts or
commits, it releases all its locks. When a lock on an object is released, the lock manager updates the lock table
entry for the object and examines the lock request at the head of the queue for this object. If this request can
now be granted, the transaction that made the request is woken up and given the lock. Indeed, if there are
several requests for a shared lock on the object at the front of the queue, all of these requests can now be
granted together.
 
      Note that if T1 has a shared lock on O, and T2 requests an exclusive lock, T2's request is queued. Now, if
T3 requests a shared lock, its request enters the queue behind that of T2, even though the requested lock is
compatible with the lock held by T1. This rule ensures that T2 does not starve, that is, wait indefinitely while a
stream of other transactions acquire shared locks and thereby prevent T2 from getting the exclusive lock that it
is waiting for.
 
Atomicity of Locking and Unlocking
       The implementation of lock and unlock commands must ensure that these are atomic operations. To ensure
atomicity of these operations when several instances of the lock manager code can execute concurrently,
access to the lock table has to be guarded by an operating system synchronization mechanism such as a
semaphore. To understand why, suppose that a transaction requests an exclusive lock. The lock manager
checks and finds that no other transaction holds a lock on the object and therefore decides to grant the request.
But in the meantime, another transaction might have requested and received a conflicting lock! To prevent this,
the entire sequence of actions in a lock request call (checking to see if the request can be granted, updating the
lock table, etc.) must be implemented as an atomic operation.
 
      The DBMS maintains a transaction table, which contains (among other things) a list of the locks currently
held by a transaction. This list can be checked before requesting a lock, to ensure that the same transaction
does not request the same lock twice. However, a transaction may need to acquire an exclusive lock on an
object for which it already holds a shared lock. Such a lock upgrade request is handled specially by granting
the write lock immediately if no other transaction holds a shared lock on the object and inserting the request at
the front of the queue otherwise. The rationale for favoring the transaction thus is that it already holds a shared
lock on the object and queuing it behind another transaction that wants an exclusive lock on the same object
causes both transactions to wait for each other and therefore be blocked forever.
 
      We have concentrated thus far on how the DBMS schedules transactions, based on their requests for locks.
This interleaving interacts with the operating system’s scheduling of processes’ access to the CPU and can lead
to a situation called a convoy, where most of the CPU cycles are spent on process switching. The problem is
that a transaction T holding a heavily used lock may be suspended by the operating system. Until T is resumed,
every other transaction that needs this lock is queued. Such queues, called convoys, can quickly become very
long; a convoy, once formed, tends to be stable. Convoys are one of the drawbacks of building a DBMS on top
of a general purpose operating system with preemptive scheduling. In addition to locks, which are held over a
long duration, a DBMS also supports short duration latches. Setting a latch before reading or writing a page
ensures that the physical read or write operation is atomic; otherwise, two read/write operations might conflict
if the objects being locked do not correspond to disk pages (the units of I/O). Latches are unset immediately
after the physical read or write operation is completed.
 
TYPES OF LOCKS
      Any data that are retrieved by a user for updating must be locked, or denied to other users, until the update
is completed or aborted. Locking can be done at different levels. These levels include the database, table, record, and
field.
      Data items can be locked in two modes:
      1. Exclusive (X) lock: The data item can be both read and written. An X-lock is requested using the lock-X
instruction. It prevents another transaction from reading a record until it is unlocked.
      2. Shared (S) lock: The data item can only be read. An S-lock is requested using the lock-S instruction. It allows
other transactions to read a record or other resource.
 
Binary Locks
      A binary lock has only two states: locked (1) or unlocked (0).
      If an object is locked by a transaction, no other transaction can use that object. If an object is unlocked, any
transaction can lock the object for its use. A transaction must unlock the object after its termination. Every
transaction requires a lock and unlock operation for each data item that is accessed.
      Binary locks are simple but restrictive. Checking is done before entry is made, waiting is done when the
object is found locked, and unlocking is done after use.
Share/Exclusive Locks
An exclusive lock exists when access is specifically reserved for the transaction that locked the object. The
exclusive lock must be used when the potential for conflict exists.
 
      An exclusive lock is issued when a transaction wants to write (update) data item and no locks are currently
held on that data item.
      There are two basic requirements of locking:
• READ operations (such as SELECT and FETCH) must acquire a SHARE lock before rows can be retrieved.
• WRITE operations (such as UPDATE, INSERT, and DELETE) must acquire an EXCLUSIVE lock before rows can
be modified.
      A SHARE(S) lock permits reading by other users. No other transaction may modify the data that is locked
with an S lock.
 
• When an S lock is obtained at the table level the transaction can read all rows in the table. No row or page
level lock are acquired when the transaction reads a row (the S lock at the table level covers all of the rows in
the table, so additional locks are not necessary).
• When an S lock is obtained at the page level, the transaction can read all rows on the page. No row level locks
are acquired when the transaction reads a row (the S lock at the page level covers all of the rows on the page).
• When an S lock is obtained at the row level, the transaction can read the row.
      An EXCLUSIVE (X) lock prevents access by any other user. An X lock is the strongest type of lock. No other
transaction may read or modify the data that is locked with an X lock. An X lock must be obtained (either at the table,
page, or row level) when user data is updated, inserted, or deleted.
• When an X lock is obtained at the table level, the transaction can read and modify all rows in the table. No
row or page level locks are acquired when the transaction reads or modifies a row.
• When an X lock is obtained at the page level, the transaction can read and modify all rows on the page. No
row level locks are acquired when the transaction reads or modifies a row.
• When an X lock is obtained at the row level, the transaction can read and modify the row.
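 
      Table-level share and exclusive locks can also be requested explicitly in SQL; a hedged Oracle-style sketch (parts is a hypothetical table, and exact lock semantics vary by product):
 
      LOCK TABLE parts IN SHARE MODE;        -- other transactions may read parts but not modify it
      SELECT SUM(qty_on_hand) FROM parts;
      COMMIT;                                -- the table lock is released at commit
 
      LOCK TABLE parts IN EXCLUSIVE MODE;    -- other transactions cannot lock or modify parts
      UPDATE parts SET qty_on_hand = 0;
      COMMIT;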
 
Disadvantages of Locking
      Pessimistic concurrency control has a number of key disadvantages, particularly in distributed systems:
      • Overhead: Locks cost, and you pay even if no conflict occurs. Even read-only actions must acquire locks.
High overhead forces careful choices about lock granularity.
      • Low concurrency: If locks are too coarse, they reduce concurrency unnecessarily.  Need for strict 2PL to
avoid cascading aborts makes it even worse.
 
      • Low availability: A client cannot make progress if the server or lock holder is temporarily unreachable.
      • Deadlock.
TWO PHASE LOCKING PROTOCOL
 
      A locking protocol is a set of rules followed by all transactions while requesting and releasing locks
      Rules for Two-Phase Locking Protocol are
• Two transactions cannot have conflicting locks.
• No unlock operation can precede a lock operation in the same transaction.
• No data are affected until all locks are obtained, that is, until the transaction reaches its lock point.
      Two phase locking protocol is a protocol which ensures conflict-serializable schedules.
Phase 1: Growing Phase
• Transaction may obtain locks
• Transaction may not release locks
Phase 2: Shrinking Phase
• Transaction may release locks
• Transaction may not obtain locks
 
      The protocol assures serializability. It can be proved that the transactions can be serialized in the order of
their lock points (i.e. the point where a transaction acquired its final lock).
      Two-phase locking does not ensure freedom from deadlocks. Cascading roll-back is possible under two-
phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction
must hold all its exclusive locks till it commits/aborts.
      Rigorous two-phase locking is even stricter: here all locks are held till commit/ abort. In this protocol
transactions can be serialized in the order in which they commit.
      There can be conflict serializable schedules that cannot be obtained if two-phase locking is used. However,
in the absence of extra information (e.g., ordering of access to data), two- phase locking is needed for conflict
serializability in the following sense:
• Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-
phase locking, and a schedule for Ti and Tj that is not conflict serializable.
Example of a transaction performing locking:
Begin(T1)
      Lock(T1,A,S)
      Read (T1,A)
      Unlock(T1,A)
Begin(T2)
      Lock(T2,B,S)
      Read (T2,B)
      Unlock(T2,B)
      Display(A+B)
      Commit(T1)
      Commit(T2)
      Locking as above is not sufficient to guarantee serializability — if A and B get updated in-between the read
of A and B, the displayed sum would be wrong.
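 
      A hedged SQL sketch of one fix (accounts is a hypothetical table): lock both rows up front and hold the locks until commit, as strict two-phase locking would, so no other transaction can update A or B between the two reads:
 
      SELECT balance FROM accounts WHERE acct_no IN ('A', 'B') FOR UPDATE;  -- both rows locked
      -- compute and display A + B from the rows returned
      COMMIT;                                                               -- locks released only now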
Two-phase locking with lock conversions:
First Phase:
• can acquire a lock-S on item
• can acquire a lock-X on item
• can convert a lock-S to a lock-X (upgrade)
Second Phase:
• can release a lock-S
• can release a lock-X
• can convert a lock-X to a lock-S (downgrade)
      This protocol assures serializability, but it still relies on the programmer to insert the various locking
instructions.
IMPLEMENTATION OF LOCKING
      A lock manager can be implemented as a separate process to which transactions send lock and unlock
requests. The lock manager replies to a lock request by sending a lock grant message (or a message asking the
transaction to roll back, in case of a deadlock). The requesting transaction waits until its request is answered.
The lock manager maintains a data structure called a lock table to record granted locks and pending requests.
The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being
locked.
      In the lock table, black rectangles indicate granted locks, and white ones indicate waiting requests. The lock table
also records the type of lock granted or requested. A new request is added to the end of the queue of requests for the
data item, and granted if it is compatible with all earlier locks. Unlock requests result in the request being
deleted, and later requests are checked to see if they can now be granted. If a transaction aborts, all waiting or
granted requests of the transaction are deleted. The lock manager may keep a list of locks held by each transaction,
to implement this efficiently.
Problems With Two Phase Locking Protocol
Consider the partial schedule
                 T3                          T4
                 lock-X(B)
                 read(B)
                 write(B)
                                             lock-S(A)
                                             read(A)
                                             lock-S(B)    (waits for T3)
                 lock-X(A)   (waits for T4)
                                               Fig. An example schedule
      Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on
B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A. Such a situation is called a
deadlock. To handle a deadlock one of T3 or T4 must be rolled back and its locks released.
      The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil. Starvation is
also possible if concurrency control manager is badly designed. For example: A transaction may be waiting for
an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same
item. The same transaction is repeatedly rolled back due to deadlocks. Concurrency control manager can be
designed to prevent starvation.
PRECEDENCE GRAPH
      A precedence graph is used for testing the serializability of a schedule. There is one node for each transaction in
the schedule, and an edge from Ti to Tj whenever an operation of Ti conflicts with, and precedes, an operation of Tj.
      If the precedence graph has a cycle the schedule is not serializable. If it has no cycle, any ordering of the
transactions which obeys the arrows is an equivalent serial schedule, so the schedule is serializable.
Solution Of Inconsistency Problem
      The problem of inconsistent analysis can be solved with the help of locks. Let us understand this with the help
of the following example:
      Initial values: A = $400, B = $500, and C = $300
      [Fig. Locking solution to the inconsistent analysis problem (figure omitted)]
      Note that the above example leads to a deadlock. Still, it is an acceptable solution because the ACID
properties are preserved.
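 
      A hedged SQL sketch of how such a deadlock arises (accounts is a hypothetical table): two sessions lock rows in opposite orders while holding their locks until commit, so the DBMS detects the cycle and rolls one of them back:
 
      -- time 1, session 1:
      SELECT balance FROM accounts WHERE acct_no = 'A' FOR UPDATE;
      -- time 2, session 2:
      SELECT balance FROM accounts WHERE acct_no = 'B' FOR UPDATE;
      -- time 3, session 1 (blocks, waiting for session 2's lock on B):
      SELECT balance FROM accounts WHERE acct_no = 'B' FOR UPDATE;
      -- time 4, session 2 (would wait for A): the DBMS detects the cycle and returns a
      -- deadlock error to one session, rolling its statement back so the other can finish.
      SELECT balance FROM accounts WHERE acct_no = 'A' FOR UPDATE;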
 
      Q. 5. What are Concurrency Control Based on Timestamp Ordering?
 
      Ans. Concurrency control: Concurrency control is a method used to ensure that database transactions are
executed in a safe manner. It is the process of managing operations against a database so that data operations do not
interfere with each other in a multi-user environment.
      One example of a shared database used in an on-line manner is the database for an airline reservation system that is
used by many agents accessing the database from their terminals. A database could also be accessed in batch
mode while it is concurrently used in the on-line mode. Sharing the database for read-only access does not
cause any problem, but if one of the transactions running concurrently tries to modify the same data item, it could
lead to inconsistencies.
 
      Further, if more than one transaction is allowed to simultaneously modify a data item in a database, it could
lead to an incorrect value for the data item and an inconsistent database will be created. For example, suppose that
two ticket agents access the on-line reservation system simultaneously to see if a seat is available on a given
flight, and both agents make a reservation against the last available seat on that flight; the flight ends up
overbooked and the data is left in an inconsistent state. We can say that concurrent processing of programs,
processes or jobs is similar to multiprogramming, i.e. a number of jobs or programs are processed
simultaneously to achieve their independent and different goals according to their own requirements.
 
      Some concurrency problems arise because, even when each transaction is correct on its own, the
database can become inconsistent after the transactions complete when they are processed concurrently. In the case of
concurrent operation, where a number of transactions are running and using the database, we cannot make any
assumption about the order in which the statements belonging to different transactions will be executed. The order in
which these statements are executed is called a schedule. The processing of these statements according to a schedule
that interleaves concurrent operations, and that we cannot change, is called a concurrent schedule.
      Some problems that occur during scheduling and concurrent processing are:
      (a) Lost update problem
      Consider the two transactions given below; both transactions access the same data item A. Each
of these transactions modifies the data item and writes it back. Under concurrent processing, one of the
modifications to the value of A can be lost.
(b) Inconsistent Read Problem
The lost update problem was caused by concurrent modification of the same data item. However, concurrency can
also cause problems when only one transaction modifies a given set of data while that set of data is being used by
another transaction. For example, suppose two transactions T5 and T6 occur in a schedule, and suppose A and B
represent some data items having integer values. If both transactions are processed concurrently, one
will be reading the data while the other is modifying it. This creates an inconsistency for subsequent reads,
since it is unclear which data is correct to read and which is incorrect.
(c) The phantom phenomenon
In the phantom phenomenon, let us consider an organization where parts are purchased and kept in stock. The parts
are withdrawn from the stock and used by a number of projects. To check the extent of loss, we want to see whether
the current quantity of some part purchased and received is equal to the current sum of the quantity
of that part in stock plus the current quantity used by the various projects. The phantom problem means that if
additional items are added, this additional information affects the transaction and the query during concurrent
processing.
      This problem could be prevented by using the concept of locking, i.e. locking such records also
prevents the addition of such phantom records.
(d) Semantics of Concurrent Transactions

As we take two different transactions in different orderings, it is not necessary that the two transactions are
commutative. Suppose we have two transactions.
      For example, the syntax of the operations may be the same, but the semantics of these transactions are different
during concurrent processing. An important part of concurrency is serial execution, or serializability. When we order
some independent transactions in a schedule such that their execution becomes serial, this type of execution is called
serial execution or serializability.
 
      Some problems of concurrent processing are removed by serial execution, by setting or ordering the
operations in a particular sequence.
 
Serializability
      A non-serial schedule that is equivalent to some serial execution of transactions is called a serializable
schedule. For example, of the three schedules written below, schedule-3 is a serializable schedule and is equivalent
to schedule-1 and schedule-2.
      The purpose of serializable scheduling is to find the non-serial schedules that allow the transactions to
execute concurrently without interfering with one another and therefore produce a database state that could be
produced by a serial execution.
 
      Note that serializability also removes the problem of inconsistency.
      Definition: The given interleaved execution of some transactions is said to be serializable if it produces the
same results as some serial execution of the transactions.
 
In serializability, the ordering of read and write operations is important before any operation, to avoid any type of
confusion or inconsistency.
 
      The serializability test can be explained and carried out with the help of a precedence graph.
Solution to these problems
      If all schedules in a concurrent environment are restricted to serializable schedules, the results obtained will be
consistent with some serial execution of the transactions and will be considered correct. Also, testing the
serializability of a schedule is not only expensive but sometimes impractical. Thus, one of the following
concurrency control schemes is applied in a concurrent database environment to ensure that the schedules produced
by concurrent transactions are serializable.
      Some concurrency control schemes used to solve the problems that occur during concurrent scheduling
are discussed below:
      (i) Locking scheme
      (ii) Time stamp based ordering
      (iii) Optimistic scheduling
      (iv) Multi-version technique
 
       (i) Locking
       From the point of view of locking, a database can be considered as being made up of a set of data items. A
lock is a variable associated with each such data item. Manipulating the value of the lock is called locking.
The value of the lock variable is used in the locking scheme to control the concurrent access and manipulation of the
associated data item. The locking is done by a subsystem of the DBMS, and such a subsystem is called the lock manager.
      There are two types of lock:
      (a) Exclusive Lock: An exclusive lock is also called an update or write lock. The intention of this mode of
locking is to provide exclusive use of a data item to one transaction. If a transaction T locks a data item Q in
exclusive mode, no other transaction can access Q, or even read Q, until the lock is released by the transaction.
 
      (b) Shared Lock: A shared lock is also called a read lock. Any number of transactions can concurrently lock
and access a data item in the shared mode, but none of these transactions can modify the data item. A data item
locked in the shared mode cannot be locked in the exclusive mode until the shared lock is released by all transactions
holding the lock. A data item locked in the exclusive mode cannot be locked in the shared mode until the exclusive
lock on the data item is released.
 
      Two-Phase Locking: Another method of locking is called two-phase locking. In this, once a lock is
released, no additional locks are requested. In other words, the release of locks is delayed until all the locks on
all data items required by the transaction have been acquired. It has two phases: a growing phase, in which the
number of locks increases from zero to the maximum for the transaction, and a contracting phase, in which the number
of locks held decreases from the maximum to zero. Both of these phases are monotonic, i.e. the number of locks is
only increasing in the first phase and only decreasing in the second phase. Once a transaction starts releasing locks, it is
not allowed to request any further locks. In this way a transaction is obliged to request all locks it may need during
its life before it releases any. This controls and lowers the degree of concurrency.
 
(ii) Time stamp based order
      In time stamp based method, a serial order is created among the concurrent transaction by assigning to each
transaction a unique non-decreasing number. The usual value assigned to each transaction is the system clock
value at the start of the transaction. It is called time stamp ordering. There are two types of time stamp:
      (a) Write time stamp
      (b) Read time stamp.
      A variation of this scheme, used in a distributed environment, includes the site of a transaction appended
to the system-wide clock value. This value can then be used in deciding the order in which the conflict between two
transactions is resolved. A transaction with a smaller time stamp value is considered to be an 'older' transaction than
another transaction with a larger time stamp value. A data item X is thus represented by a triple {X, Wx, Rx}, where:
      X : The value of the data item.
      Wx : The write time stamp value, the largest time stamp value of any transaction that was allowed to write a
value of X.
      Rx : The read time stamp value, the largest time stamp value of any transaction that was allowed to read the
current value of X.
(iii) Optimistic Scheduling
      In the optimistic scheduling scheme, the philosophy is to assume that all data items can be successfully
updated at the end of a transaction, and to read in the values of data items without any locking. Reading is done
when required, and if any data item is found to be inconsistent at the end of a transaction, then the transaction is
rolled back (using the recovery procedure of the DBMS).
      In optimistic scheduling each transaction has three phases:
      (a) The read phase: This phase starts with the activation of a transaction; all data items are read
into local variables, and any modifications that are made are only to those local copies. This phase ends when the
transaction reaches its commit point.
      (b) Validation phase: This phase checks whether the modified data items can be applied without violating
consistency; if the check fails, the transaction is rolled back.
      (c) Write phase: When the transaction passes the validation phase, its updates are written to the database on
secondary storage.
      An optimistic scheme does not use locks and so it is deadlock free, even though
starvation can still occur.
(iv) Multi version technique
      It is also called the time domain addressing scheme, which follows the accounting principle of never
overwriting data. Any change is achieved by entering a compensating entry, e.g. a change to X is
achieved by making a new copy, or version, of data item X; hence the name multi-version. In this way a history of the
evolution of the value of a data item is recorded in the database. With the multi-version technique, write operations
can occur concurrently, since they do not overwrite each other. Also, a read operation can read a suitable version.
  
      Q. 6. What do you mean by Database Recovery Techniques?
 
      Ans. Recovery : A computer system is an electromechanical device subject to failures of various types.
Recovery is the procedure through which data that has been lost during processing or due to a failure of any type
can be collected, recalled or accessed again by using different mechanisms. The types of failures that
the computer system is likely to be subjected to include failures of components or sub systems, software
failure, power failure, accidents, and natural or man-made disasters. Database recovery techniques are methods of
restoring the database to a valid, original form after it has been damaged by a failure of any type. The aim of
the recovery scheme is to allow database operations to be resumed after a failure with a minimum loss of
information at an economically justifiable cost. Recovery schemes can be classified as forward or backward
recovery. Database systems generally use the latter scheme to recover from errors.
      (a) Forward Error Recovery: In this scheme when a particular error in the system is detected, the recovery
system makes an accurate assessment of the state of the system and then makes appropriate adjustment based
on the anticipated result which make the system error free. The aim of the adjustment is to restore the system
so that the effects of the error are cancelled and system can continue to operate. This scheme is not applicable
to unanticipated errors.

      (b) Backward Error Recovery: In this scheme no attempt is made to extrapolate to an error-free state; instead, the system is reset to some previous correct state that is known to be free of any errors. For example, a backup taken on floppy disks or other media can be used for backward recovery.
 
Recovery in a centralized DBMS
      In a centralized DBMS if there be any failure, then there are some methods used to recover the data. Some
methods used to recover the data are:
(i) Set a transaction marker for transaction identification, through which we can access the data.
(ii) Apply operations on records; these operations are insert, delete and modify.
(iii) Keep a log of the transactions on the system. In an online database system, for example an airline reservation system, there could be hundreds of transactions handled per minute.
The log for this type of database contains a very large volume of information. A scheme called checkpointing is used to limit the volume of log information that has to be handled and processed in the event of a system failure involving the loss of volatile information. The checkpoint scheme is an additional component of the logging scheme described above. A checkpoint operation performed periodically copies log information onto stable storage.

  
      Q. 7. What are the various Database Security Issues?
 
      Ans.
      1. A computer system operator or system programmer can intentionally bypass the normal security and integrity mechanisms, alter or destroy the data in the database, or make unauthorized copies of sensitive data.
      2. An unauthorized user can get access to a secure terminal or to the password of an authorized user and compromise the database. Such a user could also destroy the database files.
      3. Authorized users could pass on sensitive information under pressure or for personal gain.
      4. System and application programmers could bypass normal security in their programs by directly accessing database files and making changes and copies for illegal use.
      5. An unauthorized person can get access to the computer system, physically or by using a communication channel, and compromise the database.

  
      Q. 8. What do you mean by the term deadlock?
 
      Ans. Consider the following example: transaction T1 gets an exclusive lock on object A, T2 gets an exclusive lock on B, T1 requests an exclusive lock on B and is queued, and T2 requests an exclusive lock on A and is queued. Now, T1 is waiting for T2 to release its lock and T2 is waiting for T1 to release its lock! Such a cycle of transactions waiting for locks to be released is called a deadlock. Clearly, these two transactions will make no further progress. Worse, they hold locks that may be required by other transactions. The DBMS must either prevent or detect (and resolve) such deadlock situations.
Deadlock Prevention
      We can prevent deadlocks by giving each transaction a priority and ensuring that lower priority transactions are not allowed to wait for higher priority transactions (or vice versa). One way to assign priorities is to give each transaction a timestamp when it starts up. The lower the timestamp, the higher the transaction's priority; that is, the oldest transaction has the highest priority.
 

      If a transaction Ti requests a lock and transaction Tj holds a conflicting lock, the lock manager can use one
of the following two policies:
 
Wait-die: If Ti has higher priority, it is allowed to wait; otherwise it is aborted.
Wound-wait: If Ti has higher priority, abort Tj; otherwise Ti waits.
 
      In the wait-die scheme, lower priority transactions can never wait for higher priority
transactions. In the wound-wait scheme, higher priority transactions never wait for lower priority transactions.
In either case no deadlock cycle can develop.
      A subtle point is that we must also ensure that no transaction is perennially aborted
because it never has a sufficiently high priority. (Note that in both schemes, the higher priority transaction is
never aborted.) When a transaction is aborted and restarted, it should be given the same timestamp that it had
originally. Reissuing timestamps in this way ensures that each transaction will eventually become the oldest
transaction, and thus the one with the highest priority, and will get all the locks that it requires.
 
      The wait-die scheme is nonpreemptive; only a transaction requesting a lock can be aborted. As a transaction grows older (and its priority increases), it tends to wait for more and more younger transactions. A younger transaction that conflicts with an older transaction may be repeatedly aborted (a disadvantage with respect to wound-wait); but, on the other hand, a transaction that has all the locks it needs will never be aborted for deadlock reasons (an advantage with respect to wound-wait, which is preemptive).
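      As a sketch (assuming the priority of a transaction is given by its start timestamp, lower meaning older and therefore higher priority), the two policies can be written as a single decision function; all names here are illustrative:

def resolve_conflict(ts_requester, ts_holder, policy):
    """Ti (ts_requester) asks for a lock that Tj (ts_holder) holds.
    Lower timestamp = older transaction = higher priority.
    Returns which transaction waits or aborts."""
    ti_is_older = ts_requester < ts_holder
    if policy == "wait-die":
        # Older requesters wait; younger requesters "die" (abort).
        return "Ti waits" if ti_is_older else "Ti aborts"
    if policy == "wound-wait":
        # Older requesters "wound" (abort) the holder; younger requesters wait.
        return "Tj aborts" if ti_is_older else "Ti waits"
    raise ValueError("unknown policy")

# Example: T1 (started first, ts=1) requests a lock held by T2 (ts=2).
print(resolve_conflict(1, 2, "wait-die"))    # Ti waits
print(resolve_conflict(2, 1, "wound-wait"))  # Ti waits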
 
Deadlock Detection
      Deadlocks tend to be rare and typically involve very few transactions. This observation suggests that rather
than taking measures to prevent deadlocks, it may be better to detect and resolve deadlocks as they arise. In the
detection approach, the DBMS must periodically check for deadlocks. When a transaction Ti is suspended
because a lock that it requests cannot be granted, it must wait until all transactions Tj that currently hold
conflicting locks release them.
 
      The lock manager maintains a structure called a waits-for graph to detect deadlock cycles. The nodes
correspond to active transactions, and there is an arc from Ti to Tj if (and only if) Ti is waiting for Tj to release
a lock. The lock manager adds edges to this graph when it queues lock requests and removes edges when it
grants lock requests.
      Observe that the waits-for graph describes all active transactions, some of which will eventually abort. If
there is an edge from Ti to Tj in the waits-for graph, and both Ti and Tj eventually commit, there will be an
edge in the opposite direction (from Tj to Ti) in the precedence graph (which involves only committed
transactions). The waits-for graph is periodically checked for cycles, which indicate deadlock. A deadlock is resolved by aborting a transaction that is on a cycle and releasing its locks; this action allows some of the waiting transactions to proceed.

           

                             Fig. Waits-for Graph before and after Deadlock
      As an alternative to maintaining a waits-for graph, a simplistic way to identify deadlocks is to use a timeout
mechanism: if a transaction has been waiting too long for a lock, we can assume (pessimistically) that it is in a
deadlock cycle and abort it.
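      A minimal sketch of deadlock detection on such a graph is shown below, assuming the waits-for graph is kept as a dictionary from each transaction to the set of transactions it waits for; it simply looks for a cycle with a depth-first search:

def has_deadlock(waits_for):
    """waits_for: dict mapping a transaction id to the set of transaction ids
    it is waiting for. Returns True if the waits-for graph contains a cycle
    (i.e. a deadlock)."""
    visiting, done = set(), set()

    def visit(t):
        if t in done:
            return False
        if t in visiting:            # found a back edge -> cycle
            return True
        visiting.add(t)
        for u in waits_for.get(t, ()):
            if visit(u):
                return True
        visiting.remove(t)
        done.add(t)
        return False

    return any(visit(t) for t in waits_for)

# T1 waits for T2 and T2 waits for T1 -> deadlock
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))   # True

      The DBMS would then pick a victim on the cycle and abort it, as described above.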

  
      Q. 9. What is serializability of schedules?
 
      Ans. Serializability: A non-serial schedule that is equivalent to some serial execution of transactions is called a serializable schedule. For example, given three schedules in which the non-serial schedule-3 produces the same result as the serial schedules schedule-1 and schedule-2, schedule-3 is a serializable schedule.
      The purpose of serializable scheduling is to find the non-serial schedules that allow the transactions to execute concurrently without interfering with one another, and therefore produce a database state that could be produced by a serial execution.
      Note that serializability also removes the problem of inconsistency.

      Definition: A given interleaved execution of some transactions is said to be serializable if it produces the same results as some serial execution of the transactions.
      In serializability the ordering of read and write operations is important and must be examined before any operation, to avoid any confusion or inconsistency.
      The serializability test can be explained and solved with the help of a precedence graph, as sketched below.
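      As an illustrative sketch (assuming a schedule is given as a list of (transaction, operation, item) triples with 'R' and 'W' operations), the precedence graph can be built from conflicting operations, and the schedule is conflict-serializable exactly when the graph has no cycle:

def precedence_graph(schedule):
    """schedule: list of (txn, op, item) with op in {'R', 'W'}, in execution
    order. Two operations conflict if they come from different transactions,
    touch the same item, and at least one of them is a write."""
    edges = {}
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and 'W' in (op_i, op_j):
                edges.setdefault(ti, set()).add(tj)
    return edges

def is_conflict_serializable(schedule):
    edges = precedence_graph(schedule)
    visiting, done = set(), set()

    def cyclic(t):
        if t in done:
            return False
        if t in visiting:
            return True
        visiting.add(t)
        if any(cyclic(u) for u in edges.get(t, ())):
            return True
        visiting.remove(t)
        done.add(t)
        return False

    return not any(cyclic(t) for t in edges)

# T1 and T2 read and write the same item A in an interleaved way -> not serializable
s = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(is_conflict_serializable(s))   # False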
 

  
      Q. 10. Define the concept of aggregation. Give two examples of where this concept is useful.
 
      Ans. Selecting data on a group of records is called aggregation. Data aggregation is a process in which information is gathered and expressed in summary form, for purposes such as statistical analysis. One common use is to get more information about particular groups based on specific variables such as age, profession or income; another is to compute summary figures such as the total or average of a numeric field over each group.

  
      Q. 11. Compare the shadow-paging recovery scheme with log-based recovery schemes.
 
      Ans. Modifying the database without ensuring that the transaction will commit may leave the database in
an inconsistent state. Consider transaction Ti that transfers $50 from account A to account B; goal is either to
perform all database modifications made by Ti or none at all.

      Several output operations may be required for Ti (to output A and B). A failure may occur after one of
these modifications has been made but before all of them are made. To ensure atomicity despite failures:
      First output information describing the modifications to stable storage without modifying the database
itself; only then start modifying the database.
We study two approaches:
 1. Log-based recovery
 2. Shadow paging
We assume (initially) that transactions run serially, that is, one after the other.
A log:
• Is a sequence of log records
• Maintains a record of update activities on the database
• Is kept on stable storage.
• When transaction Ti starts, it writes the log record <Ti start>
• Before Ti executes write(X), it writes the log record <Ti, X, V1, V2>, where:
• V1 is the value of X before the write (for undo)
• V2 is the value to be written to X
• When Ti commits, it writes the log record <Ti commit>
• When Ti aborts, it writes the log record <Ti abort>
We assume for now that log records are written directly to stable storage (that is, they are not buffered).
The two approaches using logs are
1. Deferred database modification
2. Immediate database modification
Deferred Database Modification
• Records all modifications to the log
• Defers all the writes to after partial commit
• Transaction starts by writing <T1 start> record to log.
• A write (X) operation results in writing a log record <T1, X, V>
• V is the new value for X
Note: Old value is not needed for this scheme.
      The write is not performed on X at this time, but is deferred. When T1 partially commits, <T1 commit> is written to the log. Finally, the log records are read and used to actually execute the previously deferred writes.
      During recovery after a crash, a transaction needs to be redone if and only if both <Ti start> and <Ti commit> are in the log. Redoing a transaction Ti (redo(Ti)) sets the value of all data items updated by the transaction to the new values.
      Crashes can occur while the transaction is executing the original updates, or while recovery action is being
taken.
Example transactions T0 and T1 (T0 executes before T1):
T0: read(A); A := A - 50; write(A); read(B); B := B + 50; write(B)
T1: read(C); C := C - 100; write(C)
      Let the original value of A be 1000, that of B be 2000 and that of C be 700.

      Let us handle the cases when crashes occur at three different instants, as shown in (a), (b), (c).
Log at three instances of time (a), (b), (c):
Crash at
(a): No redo actions need to be taken
(b): redo (T0) must be performed since <T0 commit> is present
(c): redo (T0) must be performed followed by redo (T1) since
<T0 commit> and <T1 commit> are present
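      The following toy sketch (an assumption made for illustration, with an in-memory list standing in for the log and a dict for the database) shows the key rule of the deferred scheme: the database is only touched when redo is applied for transactions whose <start> and <commit> records are both in the log:

db = {"A": 1000, "B": 2000, "C": 700}
log = []     # each record: ("start", T) / ("write", T, item, new_value) / ("commit", T)

def write_deferred(txn, item, new_value):
    log.append(("write", txn, item, new_value))     # the database is NOT touched yet

def recover():
    committed = {t for kind, t, *rest in log if kind == "commit"}
    for kind, t, *rest in log:                      # redo in log order
        if kind == "write" and t in committed:      # redo Ti iff <Ti start>/<Ti commit> both logged
            item, value = rest
            db[item] = value

# T0 transfers 50 from A to B; T1 withdraws 100 from C (as in the example above);
# the crash happens before <T1 commit> is written, so this models case (b).
log += [("start", "T0"), ("write", "T0", "A", 950), ("write", "T0", "B", 2050), ("commit", "T0"),
        ("start", "T1"), ("write", "T1", "C", 600)]
recover()
print(db)   # {'A': 950, 'B': 2050, 'C': 700}  -- T0 is redone, T1 is ignored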
Immediate Database Modification
• Allows database updates of an uncommitted transaction.
• Undoing may be needed.
• Update logs must have both old value and new value.
• Update log record must be written before database item is written
We assume that the log record is output directly to stable storage. Output of updated blocks can take place at any time before or after transaction commit, and the order in which blocks are output can be different from the order in which they are written.
Immediate Database Modification Example

 
      Recovery procedure has two operations instead of one:
      undo (Ti) restores the value of all data items updated by Ti to their old values,
going backwards from the last log record for Ti
      redo (Ti) sets the value of all data items updated by Ti to the new values, going
forward from the first log record for Ti

      Both operations must be idempotent, that is, even if the operation is executed multiple times the effect is the same as if it were executed once. This is needed since operations may get re-executed during recovery.
      When recovering after a failure, transaction Ti needs to be undone if the log contains the record <Ti start> but does not contain the record <Ti commit>. Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record <Ti commit>. Undo operations are performed first, then redo operations.
      Immediate Database Modification Recovery

           
Log at three instances of time (a), (b), (c)
Recovery actions in each case above are:
(a) undo(T0): B is restored to 2000 and A to 1000.
(b) undo(T1) and redo(T0): C is restored to 700, and then A and B are set to 950 and 2050 respectively.
(c) redo(T0) and redo(T1): A and B are set to 950 and 2050 respectively. Then C is set to 600.
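      A matching sketch for the immediate scheme is given below (purely illustrative, with in-memory stand-ins); here each write record keeps both the old and the new value, undo runs backwards over uncommitted transactions, and redo runs forwards over committed ones:

# State on disk at the time of the crash: the immediate scheme may already
# have applied uncommitted writes (here T1's write of C).
db = {"A": 950, "B": 2050, "C": 600}

# A write record keeps BOTH the old and the new value.
log = [("start", "T0"), ("write", "T0", "A", 1000, 950), ("write", "T0", "B", 2000, 2050),
       ("commit", "T0"),
       ("start", "T1"), ("write", "T1", "C", 700, 600)]      # crash before <T1 commit>

def recover():
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Undo first: walk the log backwards and restore old values of uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, new = rec
            db[item] = old
    # Then redo: walk forwards and re-apply new values of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, old, new = rec
            db[item] = new

recover()
print(db)   # {'A': 950, 'B': 2050, 'C': 700}  -- as in case (b): undo(T1), redo(T0)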
 
Checkpoints
Problems in the recovery procedure:
• Searching the entire log is time consuming.
• We might unnecessarily redo transactions that have already output their updates to the database.
We streamline the recovery procedure by periodically performing a checkpointing procedure, which:
• Outputs all log records currently residing in main memory onto stable storage.
• Outputs all modified buffer blocks to the disk.
• Writes a log record <checkpoint> onto stable storage.
Checkpoint and system failure
During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti.
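      A very simplified sketch of the idea, reusing list-based log records as in the earlier sketches: recovery locates the most recent <checkpoint> record and scans the log from there (a full implementation would also track the transactions still active at the checkpoint):

def recovery_scan_start(log):
    """Return the index of the most recent ("checkpoint",) record, i.e. the
    point from which the recovery procedure needs to scan the log (0 if none)."""
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == "checkpoint":
            return i
    return 0

log = [("start", "T0"), ("commit", "T0"),
       ("checkpoint",),
       ("start", "T1"), ("write", "T1", "C", 700, 600)]
print(recovery_scan_start(log))   # 2 -- the scan starts at the checkpoint record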
 
Log based recovery
The log, sometimes called the trail or journal, is a history of actions executed by the DBMS. Physically, the log
is a file of records stored in stable storage, which is assumed to survive crashes; this durability can be achieved
by maintaining two or more copies of the log on different disks (perhaps in different locations), so that the
chance of all copies of the log being simultaneously lost is negligibly small.
 

The most recent portion of the log, called the log tail, is kept in main memory and is periodically forced to
stable storage. This way, log records and data records are written to disk at the same granularity (pages or sets
of pages).
 
Every log record is given a unique id called the log sequence number (LSN). As with any record id, we can
fetch a log record with one disk access given the LSN. Further, LSNs should be assigned in monotonically
increasing order; this property is required for the ARIES recovery algorithm. If the log is a sequential file, in
principle growing indefinitely, the LSN can simply be the address of the first byte of the log record. For
recovery purposes, every page in the database contains the LSN of the most recent log record that describes a
change to this page. This LSN is called the page LSN.
 
A log record is written for each of the following actions:
1. Updating a page: After modifying the page, an update type record (described later in this section) is
appended to the log tail. The page LSN of the page is then set to the LSN of the update log record. (The page
must be pinned in the buffer pool while these actions are carried out.)
2. Commit: When a transaction decides to commit, it force-writes a commit type log record containing the transaction id. That is, the log record is appended to the log, and the log tail is written to stable storage, up to and including the commit record. The transaction is considered to have committed at the instant that its commit log record is written to stable storage. (Some additional steps must be taken, e.g., removing the transaction’s entry in the transaction table; these follow the writing of the commit log record.)
3. Abort: When a transaction is aborted, an abort type log record containing the transaction id is appended to the log, and Undo is initiated for this transaction.
4. End: As noted above, when a transaction is aborted or committed, some additional actions must be taken beyond writing the abort or commit log record. After all these additional steps are completed, an end type log record containing the transaction id is appended to the log.
5. Undoing an update: When a transaction is rolled back (because the transaction is aborted, or during recovery from a crash), its updates are undone. When the action described by an update log record is undone, a compensation log record, or CLR, is written.
 
Every log record has certain fields: prevLSN, transID, and type. The set of all log records for a given transaction is maintained as a linked list going back in time, using the prevLSN field; this list must be updated whenever a log record is added. The transID field is the id of the transaction generating the log record, and the type field obviously indicates the type of the log record.
 
Additional fields depend on the type of the log record. We have already mentioned the additional contents of the various log record types, with the exception of the update and compensation log record types, which we describe next.
Update Log Records
 
The pageID field is the page id of the modified page; the length in bytes and the offset of the change are also
included. The before-image is the value of the changed bytes before the change; the after-image is the value
after the change. An update log record that contains both before- and after-images can be used to redo the
change and to undo it. In certain contexts, which we will not discuss further, we can recognize that the change
will never be undone (or, perhaps, redone). A redo-only update log record will contain just the after-image;
similarly an undo-only update record will contain just the before-image.
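      Purely as an illustration (this is not the ARIES record layout itself, and all names are assumptions), the fields described above might be represented as follows:

from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                      # log sequence number, assigned in monotonically increasing order
    prev_lsn: Optional[int]       # previous log record of the same transaction (linked list back in time)
    trans_id: int
    type: str                     # 'update', 'commit', 'abort', 'end', 'CLR', ...

@dataclass
class UpdateLogRecord(LogRecord):
    page_id: int
    offset: int
    length: int
    before_image: bytes           # value of the changed bytes before the change (for undo)
    after_image: bytes            # value after the change (for redo)

# Illustrative record for the earlier example of changing A from 1000 to 950:
rec = UpdateLogRecord(lsn=42, prev_lsn=40, trans_id=7, type="update",
                      page_id=3, offset=128, length=4,
                      before_image=(1000).to_bytes(4, "big"),
                      after_image=(950).to_bytes(4, "big"))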
Log File
Contains information about all updates to database:

• Transaction records.
• Checkpoint records.
Transaction records contain:
• Transaction identifier
• Type of log record (transaction start, insert, update, delete, abort, commit).
• Identifier of data item affected by database action (insert, delete, and update operations).
• Before-image of data item.
• After-image of data item.
• Log management information.
A technique often used to perform recovery is the transaction log or journal:
1. Records information about the progress of transactions in a log since the last
consistent state.
2. The database therefore knows the state of the database before and after each transaction.
3. Every so often the database is returned to a consistent state and the log may be truncated to remove committed
transactions.
4. When the database is returned to a consistent state the process is often referred to as checkpointing.

  
      Q. 12. What do you understand by a distributed database?
 
      Ans. Distributed database technology is a recent development within the overall database field. A distributed database can be defined as “a system consisting of data whose different parts are under the control of separate DBMSs running on interconnected sites. Each site has autonomous processing capability and serves its local applications.” Each site also participates in wider, global applications. A distributed database is capable of handling both local and global transactions. Distributed databases are handled or controlled by a DDBMS (Distributed DBMS).
 
A distributed database system can also be defined as one that is not stored at a single physical location but is spread across a network of computers that are geographically dispersed and connected by communication links. A distributed database supports sharing of data, is highly available and reliable, and also allows incremental growth (addition of data and sites) to the system.
 
A query in a distributed database is divided into sub-queries, and all the sub-queries are evaluated in parallel. For example, consider a banking system in which the customer account database is distributed across the bank branch offices, such that each individual customer can process his data or records at the local branch. In other words, data is stored at all the locations and any customer can access his data from any location via the communication network. It means customer data is distributed to all the locations, and so we call it a distributed database. One more advantage of a distributed database system is that it looks like a centralized system to the user. For example, the Indian Railway reservation system has a distributed database, which can be accessed at any location by any station.
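      Purely as an illustration of splitting a query into sub-queries evaluated in parallel (the branch data and the thread pool below are stand-ins for remote sites, not a real DDBMS interface):

from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the customer-account fragments stored at each branch site.
branches = {
    "delhi":   {"C1": 5000, "C2": 1200},
    "mumbai":  {"C3": 800},
    "chennai": {"C4": 2300, "C5": 150},
}

def sub_query(site):
    """Sub-query evaluated locally at one site: total balance at that branch."""
    return sum(branches[site].values())

with ThreadPoolExecutor() as pool:
    partial_totals = list(pool.map(sub_query, branches))

print(sum(partial_totals))   # 9450 -- the global result combined from all sites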
Challenges to Distributed System
> Monotonicity: Once something is published in an open distributed system, it
cannot be taken back.
> Pluralism: Different subsystems of an open distributed system include
heterogeneous, overlapping and possibly conflicting information. There is no
central arbiter of truth in open distributed systems.

> Unbounded nondeterminism: Asynchronously, different subsystems can come
up and go down and communication links can come in and go out between
subsystems of an open distributed system. Therefore the time that it will take
to complete an operation cannot be bounded in advance.
      A scalable system is one that can easily be altered to accommodate changes in the number of users,
resources and computing entities affected to it. Scalability can be measured in three different dimensions:
• Load scalability: A distributed system should make it easy for us to expand and contract its resource pool to
accommodate heavier or lighter loads.
• Geographic scalability: A geographically scalable system is one that maintains its usefulness and usability,
regardless of how far apart its users or resources are.
• Administrative scalability: No matter how many different organizations need to share a single distributed
system, it should still be easy to use and manage.
      Some loss of performance may occur in a system that allows itself to scale in one or more of these
dimensions.
 
      A multiprocessor system is simply a computer that has more than one CPU on its motherboard. If the
operating system is built to take advantage of this, it can run different processes on different CPUs, or different
threads belonging to the same process. Over the years, many different multiprocessing options have been
explored for use in distributed computing. Intel CPUs employ a technology called Hyperthreading that allows
more than one thread (usually two) to run on the same CPU. The most recent Sun UltraSPARC T1, Athlon 64
X2 and Intel Pentium D processors feature multiple processor cores to also increase the number of concurrent
threads they can run.
 
      A multicomputer system is a system made up of several independent computers interconnected by a
telecommunications network. Multicomputer systems can be homogeneous or heterogeneous: A homogeneous
distributed system is one where all CPUs are similar and are connected by a single type of network. They are
often used for parallel computing which is a kind of distributed computing where every computer is working
on different parts of a single problem.
 
      In contrast, a heterogeneous distributed system is one that can be made up of all sorts of different
computers, eventually with vastly differing memory sizes, processing power and even basic underlying
architecture. They are in widespread use today, with many companies adopting this architecture due to the
speed with which hardware goes obsolete and the cost of upgrading a whole system simultaneously.
 
      Various hardware and software architectures exist that are usually used for distributed computing. At a
lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of that network
being printed onto a circuit board or made up of several loosely-coupled devices and cables. At a higher level,
it is necessary to interconnect processes running on those CPUs with some sort of
communication system.
      • Client-server : Smart client code contacts the server for data, then formats and displays it to the user.
Input at the client is committed back to the server when it represents a permanent change.
      • 3-tier architecture : Three tier systems move the client intelligence to a middle tier so that stateless clients
can be used. This simplifies application deployment. Most web applications are 3-Tier.
      • N-tier architecture: N-tier refers typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.

      • Tightly coupled (clustered) : refers typically to a set of highly integrated machines that run the same
process in parallel, subdividing the task in parts that are made individually by each one, and then put back
together to make the final result.
      • Peer-to-peer : an architecture where there is no special machine or machines that provide a service or
manage the network resources. Instead all responsibilities are uniformly divided among all machines, known as
peers.
      • Service oriented: Where the system is organized as a set of highly reusable services that can be offered through standardized interfaces.
      • Mobile code: Based on the architectural principle of moving processing closest to the source of data.
      • Replicated repository: Where the repository is replicated among the distributed systems to support online/offline processing, provided the resulting lag in data updates is acceptable.
      Distributed computing implements a kind of concurrency. The types of distributed computers are based on Flynn's taxonomy of systems: single instruction, single data (SISD); multiple instruction, single data (MISD); single instruction, multiple data (SIMD); and multiple instruction, multiple data (MIMD).

  
      Q. 13. Write short notes on the following:
      (a) Multiple Granularity
      (b) Transaction Processing Systems
 
      Ans. (a) Multiple Granularity: Another specialized locking strategy is called multiple-granularity locking, and it allows us to efficiently set locks on objects that contain other objects. For instance, a database contains several files, a file is a collection of pages, and a page is a collection of records. A transaction that expects to access most of the pages in a file should probably set a lock on the entire file, rather than locking individual pages (or records) as and when it needs them.
 
      Doing so reduces the locking overhead considerably. On the other hand, other transactions that require access to parts of the file, even parts that are not needed by this transaction, are blocked. If a transaction accesses relatively few pages of the file, it is better to lock only those pages. Similarly, if a transaction accesses several records on a page, it should lock the entire page, and if it accesses just a few records, it should lock just those records.
 
      The question to be addressed is how a lock manager can efficiently ensure that a page, for example, is not
locked by a transaction while another transaction holds a conflicting lock on the file containing the page (and
therefore, implicitly, on the page). The idea is to exploit the hierarchical nature of the ‘contains’ relationship.
 
      A database contains a set of files, each file contains a set of pages, and each page contains a set of records. This containment hierarchy can be thought of as a tree of objects, where each node contains all its children. (The approach can easily be extended to cover hierarchies that are not trees, but we will not discuss this extension.) A lock on a node locks that node and, implicitly, all its descendants. (Note that this interpretation of a lock is very different from B+ tree locking, where locking a node does not lock any descendants implicitly!)
      In addition to shared (S) and exclusive (X) locks, multiple-granularity locking protocols also use two new kinds of locks, called intention shared (IS) and intention exclusive (IX) locks. IS locks conflict only with X locks; IX locks conflict with S and X locks. To lock a node in S (respectively X) mode, a transaction must first lock all its ancestors in IS (respectively IX) mode. Thus, if a transaction locks a node in S mode, no other transaction can have locked any ancestor in X mode; similarly, if a transaction locks a node in X mode, no other transaction can have locked any ancestor in S or X mode. This ensures that no other transaction holds a lock on an ancestor that conflicts with the requested S or X lock on the node.
 
      A common situation is that a transaction needs to read an entire file and modify a few of the records in it;
that is, it needs an S lock on the file and an IX lock so that it can subsequently lock some of the contained
objects in X mode. It is useful to define a new kind of lock called an SIX lock that is logically equivalent to
holding an S lock and an IX lock. A transaction can obtain a single SIX lock (which conflicts with any lock
that conflicts with either S or IX) instead of an S lock and an IX lock.
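      The compatibility rules described above can be collected into a small table; the sketch below shows the standard matrix for these modes (it is illustrative and not taken from any particular DBMS's code):

# COMPATIBLE[a][b] is True when a lock in mode a can coexist with an
# already-held lock in mode b on the same node.
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested, held_modes):
    """A lock can be granted on a node only if the requested mode is
    compatible with every mode already held on that node."""
    return all(COMPATIBLE[requested][h] for h in held_modes)

print(can_grant("S", ["IS"]))   # True  -- a reader coexists with an intention-shared holder
print(can_grant("S", ["IX"]))   # False -- another transaction intends to write below this node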
 
      A subtle point is that locks must be released in leaf-to-root order for this protocol to work correctly. To see
this, consider what happens when a transaction Ti locks all nodes on a path from the root (corresponding to the
entire database) to the node corresponding to some page p in IS mode, locks p in S mode, and then releases the
lock on the root node. Another transaction Tj could now obtain an X lock on the root. This lock implicitly
gives Tj an X lock on page p, which conflicts with the S lock currently held by Ti.
 
      Multiple-granularity locking must be used with 2PL in order to ensure serializability. 2PL dictates when
locks can be released. At that time, locks obtained using multiple granularity locking can be released and must
be released in leaf-to-root order.
 
Finally, there is the question of how to decide what granularity of locking is
appropriate for a given transaction. One approach is to begin by obtaining fine
granularity locks (e.g., at the record level) and after the transaction requests a certain
number of locks at that granularity, to start obtaining locks at the next higher granularity
(e.g., at the page level). This procedure is called lock escalation.
(b) Transaction Processing Systems: Same as the answer to Question No. 3.

  
      Q. 14. What are the desirable properties of transactions in a database?
 
     Ans. Same as the answer to Question No. 3.

  
      Q. 15. What are locking techniques for concurrency control? Explain.
 
      Ans. Concurrency control is a method used to ensure that database transactions are executed in a safe manner; it is the process of managing simultaneous operations against a database so that the operations do not interfere with each other in a multi-user environment.
 
      One example of a shared database used in an on-line manner is an airline reservation database that is used by many agents accessing it from their terminals. A database could also be accessed in batch mode concurrently with the on-line mode. Sharing the database for read-only access does not cause any problem, but if one of the transactions running concurrently tries to modify the same data item, it could lead to inconsistencies.
 
      Further, if more than one transaction is allowed to simultaneously modify a data item in a database, it could lead to incorrect values for the data item, and an inconsistent database will be created. For example, suppose that two ticket agents access the online reservation system simultaneously to see if a seat is available on a given flight; if both agents make a reservation against the last available seat on that flight, the flight becomes overbooked. This leaves the data in an inconsistent state. We can say that concurrent processing of programs, processes or jobs is similar to multiprogramming, i.e. a number of jobs or programs are processed simultaneously to achieve their independent and different goals according to their own requirements.
      Some concurrency problems arise when individually correct transactions are processed concurrently and the database becomes inconsistent after their completion. In the case of concurrent operation, where a number of transactions are running and using the database, we cannot make any assumption about the order in which the statements belonging to different transactions will be executed. The order in which these statements are executed is called a schedule, and a schedule produced by interleaved concurrent execution is called a concurrent schedule.
       Some problems that occur during scheduling and concurrent processing are:
      (a) Lost update problem: Consider two transactions, both accessing the same data item A. Each of these transactions modifies the data item and writes it back. The concurrent processing of the modifications of the value of A creates a problem: one update overwrites, and thereby loses, the other (see the sketch below).
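      The following toy trace (illustrative values only) shows how interleaving two read-modify-write sequences loses one of the updates:

A = 100                      # shared data item

# Two transactions both read A, compute a new value locally, then write it back.
t1_local = A                 # T1 reads A      (100)
t2_local = A                 # T2 reads A      (100)
t1_local = t1_local + 50     # T1 adds 50
t2_local = t2_local - 30     # T2 subtracts 30
A = t1_local                 # T1 writes 150
A = t2_local                 # T2 writes 70 -- T1's update is lost

print(A)                     # 70, although applying both updates should give 120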

                              
     (b) Inconsistent Read Problem: The lost update problem was caused by concurrent modification of the same data item. However, concurrency can also cause problems when only one transaction modifies a given set of data while that set of data is being used by another transaction. For example, suppose two transactions T5 and T6 occur in a schedule, and A and B represent data items having integer values. If both transactions are processed concurrently, one will be reading the data while the other is modifying it, so the reading transaction may see a mixture of old and new values and cannot tell which values are consistent.

               
 
      (c) The phantom phenomenon: Consider an organization where parts are purchased and kept in stock. The parts are withdrawn from stock and used by a number of projects. To check for losses, we want to verify whether the current quantity of some part purchased and received is equal to the current quantity of that part in stock plus the current quantity used by the various projects. The phantom problem arises when additional records are inserted while such a query is running concurrently, so the query sees records that did not exist when it started. This problem can be prevented by locking: locking such sets of records also prevents the insertion of phantom records.

      (d) Semantics of Concurrent Transactions: When two transactions are executed in different orders, it is not necessary that the results are the same, i.e. the transactions need not be commutative. For example, the two orderings
                                            A = (A + 10) + 20
                                            A = (A + 20) + 10
      give the same result, but the two orderings
                                     Salary = (Salary + 1000) * 1.1
                                     Salary = (Salary * 1.1) + 1000
      do not. The syntax of the operations is similar, but the semantics of these transactions differ, so the outcome of concurrent processing depends on the order of execution. An important concept here is serial execution, or serializability. When independent transactions in a schedule are ordered so that they execute one after another, this type of execution is called serial execution.
      Some problems of concurrent processing are removed by serial execution, i.e. by ordering the operations in a particular sequence.
 
Serializability:
      A non-serial schedule that is equivalent to some serial execution of transactions is called a serializable schedule. For example, given three schedules in which the non-serial schedule-3 produces the same result as the serial schedules schedule-1 and schedule-2, schedule-3 is a serializable schedule. The purpose of serializable scheduling is to find the non-serial schedules that allow the transactions to execute concurrently without interfering with one another and therefore produce a database state that could be produced by a serial execution.
      Note that serializability also removes the problem of inconsistency.

      Definition: A given interleaved execution of some transactions is said to be serializable if it produces the same results as some serial execution of the transactions.
      In serializability the ordering of read and write operations is important and must be examined before any operation, to avoid any confusion or inconsistency.
      The serializability test can be explained and solved with the help of a precedence graph.
Solution to these problems
      If all schedules in a concurrent environment are restricted to serializable schedules, the result obtained will be consistent with some serial execution of the transactions and will be considered correct. However, testing the serializability of a schedule is not only expensive but often impractical. Thus one of the following concurrency control schemes is applied in a concurrent database environment to ensure that the schedules produced by concurrent transactions are serializable.
      Some concurrency control schemes used to solve the problems that occur during concurrent scheduling are discussed below:
(i) Locking Scheme
(ii) Time stamp based order
(iii) Optimistic scheduling
(iv) Multi version technique
(i) Locking
      From the point of view of locking, a database can be considered as being made up of a set of data items. A lock is a variable associated with each such data item, and manipulating the value of the lock is called locking. The value of the lock variable is used in the locking scheme to control the concurrent access and manipulation of the associated data item. The locking is done by a subsystem of the DBMS called the Lock Manager.
      There are two types of lock:
      (a) Exclusive Lock: An exclusive lock is also called an update or write lock. The intention of this mode of locking is to provide exclusive use of a data item to one transaction. If a transaction T locks a data item Q in exclusive mode, no other transaction can access Q, not even to read it, until the lock is released by T.
      (b) Shared Lock: A shared lock is also called a read lock. Any number of transactions can concurrently lock and access a data item in shared mode, but none of these transactions can modify the data item. A data item locked in shared mode cannot be locked in exclusive mode until the shared lock is released by all transactions holding it, and a data item locked in exclusive mode cannot be locked in shared mode until the exclusive lock on the data item is released.
 
      Two-Phase Locking: Another method of locking is called two-phase locking. In this method, once a lock is released no additional locks are requested; in other words, the release of locks is delayed until all the locks on all data items required by the transaction have been acquired. It has two phases: a growing phase, in which the number of locks held increases from zero to the maximum for the transaction, and a contracting phase, in which the number of locks held decreases from the maximum to zero. Both phases are monotonic, i.e. the number of locks is only increasing in the first phase and only decreasing in the second phase. Once a transaction starts releasing locks, it is not allowed to request any further locks. In this way a transaction is obliged to request all locks it may need during its life before it releases any, which controls and lowers the degree of concurrency (see the sketch below).
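      The sketch below (a toy illustration with only exclusive locks and a deliberately simple lock manager; all class and method names are assumptions) shows the two-phase rule: once the transaction releases its first lock it enters the shrinking phase, and any further lock request is rejected:

class SimpleLockManager:
    """A toy lock manager: one exclusive holder per item, no queueing."""
    def __init__(self):
        self.owner = {}

    def acquire(self, item, txn):
        if self.owner.get(item, txn) is not txn:
            raise RuntimeError("conflict: %r is locked by another transaction" % item)
        self.owner[item] = txn

    def release(self, item, txn):
        if self.owner.get(item) is txn:
            del self.owner[item]

class TwoPhaseTransaction:
    """All lock acquisitions (growing phase) must precede the first release
    (shrinking phase); violating this raises an error."""
    def __init__(self, lm):
        self.lm = lm
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: no new locks after the first release")
        self.lm.acquire(item, self)
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True
        self.lm.release(item, self)
        self.held.discard(item)

    def commit(self):
        for item in list(self.held):
            self.unlock(item)

lm = SimpleLockManager()
t = TwoPhaseTransaction(lm)
t.lock("A"); t.lock("B")    # growing phase
t.unlock("A")               # shrinking phase begins
# t.lock("C") would now raise an error: the two-phase rule forbids it
t.commit()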
 
(ii) Time stamp based order
      In the time stamp based method, a serial order is created among the concurrent transactions by assigning to each transaction a unique non-decreasing number. The usual value assigned to each transaction is the system clock value at the start of the transaction; hence this is called time stamp ordering. There are two types of time stamp:
(a) Write time stamp
(b) Read time stamp.
A variation of this scheme used in a distributed environment appends the site of a transaction to the system-wide clock value. This value can then be used in deciding the order in which a conflict between two transactions is resolved. A transaction with a smaller time stamp value is considered to be an ‘older’ transaction than another transaction with a larger time stamp value. A data item X is thus represented by a triple {X, Wx, Rx}, where:
X : The value of the data item.
Wx : The write time stamp value, the largest time stamp value of any transaction that was allowed to write a value of X.
Rx : The read time stamp value, the largest time stamp value of any transaction that was allowed to read the current value of X.
(iii) Optimistic Scheduling
     In the optimistic scheduling scheme, the philosophy is to assume that all data items can be successfully updated at the end of a transaction, and to read in the values of data items without any locking. Reading is done when required, and if any data item is found to be inconsistent at the end of a transaction, then the transaction is rolled back (using the recovery procedure of the DBMS).
In optimistic scheduling each transaction has three phases:
      (a) The read phase: This phase starts with the activation of a transaction; all data items are read into local variables and any modifications are made only to those local copies. This phase ends when the transaction is ready to commit.
      (b) Validation phase: In this phase the transaction's reads and updates are checked for conflicts with other concurrent transactions; if the check fails, the transaction is rolled back.
      (c) Write phase: When a transaction passes the validation phase, its updates are written from the local copies into the database on secondary storage.
      An optimistic scheme does not use locks and so it is deadlock free, even though
starvation can still occur.
(iv) Multi version technique
      It is also called the time domain addressing scheme, which follows the accounting principle of never overwriting data. Any change is achieved by entering a compensating entry; e.g. a change to X is achieved by making a new copy or version of data item X, which is why it is called the multi version technique. In this way a history of the evolution of the value of a data item is recorded in the database. With the multi version technique, write operations can occur concurrently, since they do not overwrite each other; also, a read operation can read any version.
    Q. 16. Describe the usefulness of granting privileges to the users.
 
      Ans. You use GRANT to assign roles or system privileges to roles or users. The same command works
whether you are assigning these roles or system privileges to an individual user or to a role that in turn can be
assigned to many users.
The Syntax for the GRANT Command
The GRANT command takes the following syntax:
SYNTAX:
GRANT role or system privilege [, role or system privilege]
      TO user or role or PUBLIC [, user or role]
      [WITH ADMIN OPTION]
      The GRANT command can take any number of system privileges and roles and assign them to any number
of users or roles. By specifying that you want to grant a role or system privilege to PUBLIC, you are
specifying that you want that role or privilege to be granted to all users in the system.
      The REVOKE command is just the opposite of the GRANT command; it will take a role or system
privilege away from a user or role:
      REVOKE role or system privilege [, role or system privilege]
      FROM user or role or PUBLIC [, user or role]
 
      Q. 18. Define the term Generalization.
 
      Ans. Generalization. Generalization is a form of abstraction in which several lower-level entity sets that share common features are combined into a higher-level entity set.
   • A bottom-up design process - combine a number of entity sets that share the same features into a higher-level entity set.
   • Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
   • The terms specialization and generalization are used interchangeably.

 
      Q. 19. What is cascading roll back
 
      Ans. A cascading rollback occurs in database systems when a transaction (T1) causes a failure and a rollback must be performed. Other transactions dependent on T1's actions must also be rolled back due to T1's failure, thus causing a cascading effect. That is, one transaction's failure causes many to fail.
  
      Q. 20. Explain various recovery techniques based on deferred update.
 
      Ans. Refer to Q.No. 11.

     Q. 21. What is a log record ? What fields it contains ? How it is used for database recovery?
 
      Ans. Refer to Q.No. 11.

      Q. 22. What does role name signify? Explain with example.


 
      Ans. Oracle provides for easy and controlled privilege management through roles. Roles are named groups of related privileges that you grant to users or other roles. Roles are designed to ease the administration of end-user system and object privileges. However, roles are not meant to be used by application developers, because the privileges to access objects within stored programmatic constructs need to be granted directly.
      These properties of roles allow for easier privilege management within a database:
      • Reduced privilege administration: Rather than explicitly granting the same set of privileges to several users, you can grant the privileges for a group of related users to a role, and then only the role needs to be granted to each member of the group.
 
      • Dynamic privilege management: If the privileges of a group must change, only the privileges of the role need to be modified. The security domains of all users granted the group's role automatically reflect the changes made to the role.
 
      • Selective availability of privileges: You can selectively enable or disable the roles granted to a user. This allows specific control of a user's privileges in any given situation.
      • Application awareness: Because the data dictionary records which roles exist, you can design database applications to query the dictionary and automatically enable (and disable) selective roles when a user attempts to execute the application via a given username.
      • Application-specific security: You can protect role use with a password. Applications can be created specifically to enable a role when supplied the correct password. Users cannot enable the role if they do not know the password.
