Database Management System
What is a Database?
A database is a model of a real-world system. It can be considered an organized collection of data
which is meant to represent some universe of discourse. The contents (sometimes called the
extension) of a database represent the state of what is being modeled. Changes in the database
represent events occurring in the environment that change the state of what is being modeled. It is
therefore appropriate to structure a database to mirror what it is intended to model.
Most modern-day organizations need to store data relevant to their day-to-day activities. Many of these
organizations choose an electronic database to organize and store some of this data. Take for instance a
university. Most universities need to record data to help in the activities of teaching and learning. Most
universities need to record, among other things:
What students and lecturers they have
What courses and modules they are running
Which lecturers are teaching which modules
Which students are taking which modules
Which lecturer is assessing against which module
Which students have been assessed in which modules
Various members of staff at a university will be entering data such as this into a database system. For instance,
administrators in academic departments may enter data relevant to courses and modules, course
coordinators may enter data pertaining to lecturers, and data relevant to students, particularly their enrolments
on courses and modules, may be entered by staff at a central registry.
Once the data is entered into the database it may be utilized in a variety of ways. For example, a complete and
accurate list of enrolled students may be used to generate membership records for the learning resources
center; it may be used as a claim to educational authorities for student income or as an input into a timetabling
system which might attempt to optimize room utilization across a university campus. In this section our
objective is to learn the fundamental concepts of a database system, using this example of an academic
database to illustrate them.
Properties of a Database
The term database also usually implies a series of related properties: data sharing, data integration, data
integrity, data security, data abstraction and data independence.
Data sharing
Data stored in a database is not usually held solely for the use of one person. A database is normally expected
to be accessible by more than one person, perhaps at the same time. Hence a students' database might be
accessible by members of not only academic but also administrative staff.
Data integration
Shared data brings numerous advantages to the organization. Such advantages, however, only result if the
database is treated responsibly. One major responsibility of database usage is to ensure that the data is
integrated. This implies that a database should be a collection of data, which, at least ideally, has no redundant
data. Redundant data is unnecessarily duplicated data. In the past, for instance separate files of student
information might have been maintained by different academic and administrative departments of a university
with many fields in common. The aim of a database system would be to store one logical item of data in one
place only. Hence, one student record would be accessible to a range of information systems.
Data integrity
Another responsibility arising as a consequence of shared data is that a database should display integrity. In
other words, the database should accurately reflect the universe of discourse that it is attempting to model.
Data security
One of the major ways of ensuring the integrity of a database is by restricting access. In other words, securing
the database. The main way that this is done in contemporary database systems is by defining in some detail a
set of authorized users of the whole, or more usually parts of the database. For instance, a secure system would
be one where the finance department has access to information used for the collection of student fees but is
prohibited from changing the fee levels of given students.
Data abstraction
A database can be viewed as a model of reality. The information stored in a database is usually an attempt to
represent the properties of some objects in the real world.
Data independence
One immediate consequence of abstraction is the idea of buffering data from the processes that use such data.
The ideal is to achieve a situation where data organization is transparent to the users or application programs
which feed off data. If for instance, a change is made to some part of the underlying database no application
programs using affected data should need to be changed. Also, if a change is made to some part of an
application system then this should not affect the structure of the underlying data used by the application.
These properties amount to desirable features of the ideal database. As we shall see, properties such as data
independence are only partly achieved in current implementations of database technology.
What is a DBMS?
A DBMS (Database Management System) is software that controls the organization, storage, retrieval,
security and integrity of data in a database. It accepts requests from the application and instructs the
operating system to transfer the appropriate data. The major DBMS vendors are Oracle, IBM, Microsoft
and Sybase; MySQL is a very popular open-source product. DBMSs may work with traditional
programming languages (COBOL, C, etc.) or they may include their own programming language for
application development.
DBMSs let information systems be changed more easily as the organization's requirements change. New
categories of data can be added to the database without disruption to the existing system. Adding a field to a
record does not require changing any of the programs that do not use the data in that new field.
Major Features of a DBMS
Data Security
The DBMS can prevent unauthorized users from viewing or updating the database. Using passwords, users
are allowed access to the entire database or a subset of it known as a "subschema." For example, in an
employee database, some users may be able to view salaries while others may view only work history and
medical data.
Data Integrity
The DBMS can ensure that no more than one user can update the same record at the same time. It can keep
duplicate records out of the database; for example, no two customers with the same customer number can be
entered.
Interactive Query
Most DBMSs provide query languages and report writers that let users interactively interrogate the database
and analyze its data. This important feature gives users access to all management information as needed.
Interactive Data Entry and Updating
Many DBMSs provide a way to interactively enter and edit data, allowing you to manage your own files and
databases. However, interactive operation does not leave an audit trail and does not provide the controls
necessary in a large organization. These controls must be programmed into the data entry and update
programs of the application.
This is a common misconception about desktop computer DBMSs. Complex business systems can be
developed, but not without programming. This is not the same as creating lists of data for your own record
keeping.
Data Independence
With DBMSs, the details of the data structure are not stated in each application program. The program asks
the DBMS for data by field name; for example, a coded equivalent of "give me customer name and balance
due" would be sent to the DBMS. Without a DBMS, the programmer must reserve space for the full structure
of the record in the program. Any change in data structure requires changing all application programs.
DBMS Components
Data: Data stored in a database include numerical data which may be integers (whole numbers only) or
floating point numbers (decimal), and non-numerical data such as characters (alphabetic and numeric
characters), date or logical (true or false). More advanced systems may include more complicated data entities
such as pictures and images as data types.
Standard operations: Standard operations are provided by most DBMS. These operations provide the user
basic capabilities for data manipulation. Examples of these standard operations are sorting, deleting and
selecting records.
Data definition language (DDL): DDL is the language used to describe the contents of the database. It is used
to describe, for example, attribute names (field names), data types, location in the database, etc.
Data manipulation and query language: Normally a query language is supported by a DBMS to form
commands for input, edit, analysis, output, reformatting, etc. Some degree of standardization has been
achieved with SQL (Structured Query Language).
Programming tools: Besides commands and queries, the database should be accessible directly from
application programs through function calls (subroutine calls) in conventional programming languages.
File structures: Every DBMS has its own internal structures used to organize the data although some common
data models are used by most DBMS.
DBMS Advantages
There are three main features of a database management system that make it attractive to use a DBMS in
preference to other systems. These features are:
Centralized data management,
Data independence
Systems integration.
In a database system, the data is managed by the DBMS and all access to the data is through the DBMS
providing a key to effective data processing. This contrasts with conventional data processing systems where
each application program has direct access to the data it reads or manipulates.
In the conventional data processing application programs, the programs usually are based on a considerable
knowledge of data structure and format. In such environment any change of data structure or format would
require appropriate changes to the application programs. If major changes were to be made to the data, the
application programs may need to be rewritten.
In a database system, the database management system provides the interface between the application
programs and the data. When changes are made to the data representation, the metadata maintained by the
DBMS is changed but the DBMS continues to provide data to application programs in the previously used
way. The DBMS handles the task of transformation of data wherever necessary.
This independence between the programs and the data is called data independence. Data independence is
important because every time some change needs to be made to the data structure, the programs that were
being used before the change would continue to work. To provide a high degree of data independence, a
DBMS must include a sophisticated metadata management system.
In DBMS, all files are integrated into one system thus reducing redundancies and making data management
more efficient. In addition, DBMS provides centralized control of the operational data. Some of the advantages
of data independence, integration and centralized control are:
Redundancies and inconsistencies can be reduced
Better service to the Users
Flexibility of the system is improved
Cost of developing and maintaining systems is lower
Standards can be enforced
Security can be improved
Integrity can be improved
Enterprise requirements can be identified
Data model must be developed
DBMS Disadvantages
A database system generally provides on-line access to the database for many users. In contrast, a conventional
system is often designed to meet a specific need and therefore generally provides access to only a small
number of users. Because of the larger number of users accessing the data when a database is used, the
enterprise may involve additional risks as compared to a conventional data processing system in the following
areas.
Confidentiality, Privacy and Security
When information is centralized and is made available to users from
remote locations, the possibilities of abuse are often more than in a conventional system. To reduce the
chances of unauthorized users accessing sensitive information, it is necessary to take technical, administrative
and, possibly, legal measures. Most databases store valuable information that must be protected against
deliberate trespass and destruction.
Data Quality
Since the database is accessible to users remotely, adequate controls are needed to control users
updating data and to control data quality. With increased number of users accessing data directly, there are
enormous opportunities for users to damage the data. Unless there are suitable controls, the data quality may be
compromised.
Data Integrity
Since a large number of users could be using a database concurrently, technical safeguards are
necessary to ensure that the data remain correct during operation. The main threat to data integrity comes from
several different users attempting to update the same data at the same time. The database therefore needs to be
protected against inadvertent changes by the users.
Enterprise Vulnerability
Centralizing all data of an enterprise in one database may mean that the database
becomes an indispensable resource. The survival of the enterprise may depend on reliable information being
available from its database. The enterprise therefore becomes vulnerable to the destruction of the database or to
unauthorized modification of the database.
The Cost of using a DBMS
Conventional data processing systems are typically designed to run a number of
well-defined, preplanned processes. Such systems are often "tuned" to run efficiently for the processes that
they were designed for. Although the conventional systems are usually fairly inflexible in that new
applications may be difficult to implement and/or expensive to run, they are usually very efficient for the
applications they are designed for.
The database approach on the other hand provides a flexible alternative where new applications can be
developed relatively inexpensively. The flexible approach is not without its costs and one of these costs is the
additional cost of running applications that the conventional system was designed for. Using standardized
software is almost always less machine efficient than specialized software.
Database Design
A business information system is made up of subjects (customers, employees, vendors, etc.) and activities
(orders, payments, purchases, etc.). Database design is the process of organizing this data into related record
types. The DBMS that is chosen is the one that can support the organization's data structure while efficiently
processing the transaction volume.
Organizations may use one kind of DBMS for daily transaction processing and then move the detail to
another DBMS better suited for random inquiries and analysis.
Overall systems design decisions are performed by data administrators and systems analysts. Detailed
database design is performed by database administrators.
DEFINITION: A data model is a plan for building a database. The model represents data conceptually, the
way the user sees it, rather than how computers store it. Data models focus on required data elements and
associations; most often they are expressed graphically using entity-relationship diagrams. On a more abstract
level, the term is also used in describing a database's overall structure -- e.g., a relational or hierarchical data
model.
Data Modeling With Entity-Relationship Diagrams
In the following sample diagram, rectangles represent entities, diamonds are relationships, and ellipses are
attributes of entities. Note that there are many different conventions used in entity-relationship diagrams; this
same model could be represented in several other ways using different notation.
Hierarchical Model
The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child
data segments. This structure implies that a record can have repeating information, generally in the
child data segments. Data is stored in a series of records, each of which has a set of field values
attached to it. The model collects all the instances of a specific record together as a record type. These
record types are the equivalent of tables in the relational model, with the individual records being the
equivalent of rows. To create links between these record types, the hierarchical model uses Parent
Child Relationships.
Hierarchical databases link records like an organization chart. A record type can be owned by only
one owner. In the following example, orders are owned by only one customer. Hierarchical structures
were widely used with early mainframe systems; however, they are often restrictive in linking
real-world structures.
Network Model
In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network
model. The basic data modeling construct in the network model is the set construct. A set consists of
an owner record type, a set name, and a member record type. A member record type can have that
role in more than one set, hence the multiparent concept is supported. An owner record type can also
be a member or owner in another set.
In network databases, a record type can have multiple owners. In the example below, orders are
owned by both customers and products, reflecting their natural relationship in business.
Relational Model
(RDBMS - relational database management system) A database based on the relational model
developed by E.F. Codd. A relational database allows the definition of data structures, storage and
retrieval operations and integrity constraints. In such a database the data and relations between them
are organised in tables. A table is a collection of records and each record in a table contains the same
fields.
Properties of relational tables:
Values Are Atomic
Each Row is Unique
Column Values Are of the Same Kind
The Sequence of Columns is Insignificant
The Sequence of Rows is Insignificant
Each Column Has a Unique Name
Certain fields may be designated as keys, which means that searches for specific values of that field
will use indexing to speed them up. Where fields in two different tables take values from the same
set, a join operation can be performed to select related records in the two tables by matching values in
those fields. Often, but not always, the fields will have the same name in both tables.
Relational databases do not link records together physically, but the design of the records must
provide a common field, such as account number, to allow for matching. Often, the fields used for
matching are indexed in order to speed up the process.
In the following example, customers, orders and products are linked by comparing data fields and/or
indexes when information from more than one record type is needed. This method is more flexible for
ad hoc inquiries. Many hierarchical and network DBMSs also provide this capability.
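As a concrete sketch of joining on a common field (using Python's built-in sqlite3 module; the table and column names here are invented for illustration, not taken from a specific example above):

```python
import sqlite3

# In-memory relational database; Customers and Orders share the field CustNo.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers (CustNo INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, CustNo INTEGER, Product TEXT);
    INSERT INTO Customers VALUES (1, 'Hansen'), (2, 'Svendson');
    INSERT INTO Orders VALUES (10, 1, 'Sega'), (11, 2, 'Trio');
""")

# The join matches rows on the common field CustNo -- no physical links are stored.
rows = con.execute("""
    SELECT c.Name, o.Product
    FROM Customers c JOIN Orders o ON c.CustNo = o.CustNo
    ORDER BY o.OrderNo
""").fetchall()
print(rows)  # [('Hansen', 'Sega'), ('Svendson', 'Trio')]
```

An index on Orders.CustNo (CREATE INDEX) would speed up exactly this kind of matching, as the paragraph above notes.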
Object-Oriented Model
Object DBMSs add database functionality to object programming languages. They bring much more
than persistent storage of programming language objects. Object DBMSs extend the semantics of the
C++, Smalltalk and Java object programming languages to provide full-featured database
programming capability, while retaining native language compatibility. A major benefit of this
approach is the unification of the application and database development into a seamless data model
and language environment. As a result, applications require less code, use more natural data
modeling, and code bases are easier to maintain. Object developers can write complete database
applications with a modest amount of additional effort.
The world of information is made up of data, text, pictures and voice. The relational DBMS was not
designed to store multimedia data, because there are so many different types of sound and video
formats. Although a relational DBMS may provide a BLOB (binary large object) field that holds
anything, extensive use of this field can strain the processing.
An object database is often better suited for multimedia. Using the object model, an object-oriented
DBMS can store anything or refer to anything. For example, a video object can reference a video
file stored elsewhere on some other hard disk and launch the video player software necessary to play
it.
Intelligent Databases
All DBMSs provide some data validation; for example, they can reject invalid dates or alphabetic data entered
into money fields. But most validation is left up to the application programs.
Intelligent databases provide more validation; for example, table lookups can reject bad spelling or coding of
items. Common algorithms can also be used such as one that computes sales tax for an order based on zip
code.
When validation is left up to each application program, one program could allow an item to be entered while
another program rejects it. Data integrity is better served when data validation is done in only one place.
Mainframe DBMSs were the first to become intelligent, and all the others followed suit.
Module II
Relational Model
Relational model and relational databases
o Standardisation means that distributed data can be combined more easily.
o Improvements to the facilities and the implementation can be shared easily.
All of the information stored in a relational database is held in relations; there are no other data
structures. A relation may be thought of as a table:
STUDENT
name matric exam1 exam2
Mounia 891023 12 58
Jane 892361 66 90
Thomas 880123 50 65
The relational model consists of three components:
A Structural component - a set of TABLES (also called RELATIONS).
A Manipulative component consists of a set of high-level operations which act upon and
produce whole tables.
A Set of Rules for maintaining the INTEGRITY of the database.
A relation has:
A name
An unchanging set of columns, each named and typed
A time-varying set of rows
An attribute is a column of a relation.
Domain: the set of values the attribute may take; a domain is a set of atomic (indivisible) values.
A tuple is a row of a relation:
a set of values which are instances of the attributes of a relation.
Relational schema: a set of attributes, written R(A1, A2, ..., An), e.g., STUDENT (name,
matric, exam1, exam2).
Relation:
a set of tuples which accords with some relational schema.
The degree of a relation:
the number of attributes.
The cardinality:
the number of tuples.
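A small sketch, using Python's built-in sqlite3 and the STUDENT relation above, that reads off the degree and cardinality (using the cursor's column description is a Python DB-API convenience, not part of the relational model itself):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE STUDENT (name TEXT, matric INTEGER, exam1 INTEGER, exam2 INTEGER)")
con.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
                [("Mounia", 891023, 12, 58),
                 ("Jane",   892361, 66, 90),
                 ("Thomas", 880123, 50, 65)])

cur = con.execute("SELECT * FROM STUDENT")
degree = len(cur.description)       # number of attributes (columns)
cardinality = len(cur.fetchall())   # number of tuples (rows)
print(degree, cardinality)          # 4 3
```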
Keys, or candidate keys:
any set of attributes whose values are unique for each row.
Primary key:
one candidate key is chosen to identify the tuples; by convention it is written underlined,
e.g., STUDENT (name, matric, exam1, exam2).
Relational database schema: a set of relation schemas together with a set of integrity constraints.
Relational database instance: a set of relations realising a relational database schema
Base relation: relation which actually exists as a stored file (vs. temporary or view relations)
Foreign Key: an attribute or set of attributes which match the primary key of another relation and
thus link them
Constraints: a set of rules which must hold for all relations in a DB and are enforced by the DBMS.
Key Constraints
Every candidate key is unique for every tuple.
Entity Integrity
No primary key of a base relation may be null.
The primary key acts as an identifier for the objects in the relation.
A null key would imply the existence of unidentifiable objects.
Referential Integrity
Any attributes of a relation which are foreign keys to another relation must take values which
either exist in the second relation or are null.
The DBMS must continually check that constraints are not violated every time an update (insertion,
deletion, modification) occurs.
Two strategies:
– refuse to perform violating update
– compensate in some way
Compensation is performed by:
– Cascading - make a compensating change to the current tuple, then check everything that
refers to it.
– Restricting - only change those tuples which do not violate constraints.
– Nullifying - set foreign keys to null if referential integrity is violated.
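A sketch of the cascading strategy using Python's built-in sqlite3 (SQLite enforces foreign keys only when the foreign_keys pragma is enabled; the Course/Enrolment tables are invented for illustration, and ON DELETE SET NULL would give the nullifying strategy instead):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FK constraints otherwise
con.executescript("""
    CREATE TABLE Course (code TEXT PRIMARY KEY);
    -- ON DELETE CASCADE: deleting a course compensates by deleting its enrolments.
    CREATE TABLE Enrolment (
        student TEXT,
        code TEXT REFERENCES Course(code) ON DELETE CASCADE
    );
    INSERT INTO Course VALUES ('DB101');
    INSERT INTO Enrolment VALUES ('Jane', 'DB101');
""")

con.execute("DELETE FROM Course WHERE code = 'DB101'")
remaining = con.execute("SELECT COUNT(*) FROM Enrolment").fetchone()[0]
print(remaining)  # 0 -- the dependent enrolment was cascaded away
```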
The terminology associated with relational database theory originates from the branch of mathematics
called set theory although there are widely used synonyms for these precise, mathematical terms.
In a relational database, everything is stored in tables. Tables contain columns and rows.
A column is drawn from a DOMAIN, that is, a set of values from which the actual values are
taken. More than one column in a table may draw its values from the same domain.
A column entry in any row is SINGLE-VALUED, i.e. it contains exactly one item only (e.g. a
surname). Repeating groups, i.e. columns which contain sets of values rather than a single
value, are not allowed.
Each row of a table is uniquely identified by a PRIMARY KEY composed of one or more
columns. This implies that a table may not contain duplicate rows.
The ENTITY INTEGRITY RULE of the model states that no component of the primary key
may contain a NULL value.
A column, or combination of columns, that matches the primary key of another table is called
a FOREIGN KEY.
The REFERENTIAL INTEGRITY RULE of the model states that, for every foreign key value in a
table there must be a corresponding primary key value in another table in the database.
Relational Operations
The relational model defines a number of relational operators that can be used to manipulate the
database.
These relational operators are:
UNION
INTERSECT
DIFFERENCE
DIVIDE
PRODUCT
SELECT
PROJECT
JOIN
The set operations union, intersection and difference require that the relations have the same degree
(number of columns) and that corresponding attributes are drawn from the same domain.
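These set operations can be tried directly in SQL. A sketch with Python's built-in sqlite3 (SQLite spells the difference operator EXCEPT; the two tables are invented and deliberately share the same single-column schema, as the rule above requires):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE CS_Students   (name TEXT);
    CREATE TABLE Math_Students (name TEXT);
    INSERT INTO CS_Students   VALUES ('Jane'), ('Thomas');
    INSERT INTO Math_Students VALUES ('Jane'), ('Mounia');
""")

union = con.execute(
    "SELECT name FROM CS_Students UNION SELECT name FROM Math_Students ORDER BY name").fetchall()
intersect = con.execute(
    "SELECT name FROM CS_Students INTERSECT SELECT name FROM Math_Students").fetchall()
diff = con.execute(
    "SELECT name FROM CS_Students EXCEPT SELECT name FROM Math_Students").fetchall()

print(union)      # [('Jane',), ('Mounia',), ('Thomas',)]
print(intersect)  # [('Jane',)]
print(diff)       # [('Thomas',)]
```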
SQL Tutorial
Introduction
SQL (Structured Query Language) allows users to access data in relational database management
systems, such as Oracle, Sybase, Informix, Microsoft SQL Server, Access, and others, by allowing
users to describe the data the user wishes to see. SQL also allows users to define the data in a
database, and manipulate that data. This tutorial will describe how to use SQL, and give examples.
Introduction to SQL
What is SQL?
SQL is an ANSI (American National Standards Institute) standard computer language for accessing
and manipulating database systems. SQL statements are used to retrieve and update data in a
database. SQL works with database programs like MS Access, DB2, Informix, MS SQL Server,
Oracle, Sybase, etc.
Unfortunately, there are many different versions of the SQL language, but to be in compliance with
the ANSI standard, they must support the same major keywords in a similar manner (such as
SELECT, UPDATE, DELETE, INSERT, WHERE, and others).
Note: Most of the SQL database programs also have their own proprietary extensions in addition to
the SQL standard!
A database most often contains one or more tables. Each table is identified by a name (e.g.
"Customers" or "Orders"). Tables contain records (rows) with data.
Below is an example of a table called "Persons":
LastName FirstName Address City
Hansen Ola Timoteivn 10 Sandnes
Svendson Tove Borgvn 23 Sandnes
Pettersen Kari Storgt 20 Stavanger
The table above contains three records (one for each person) and four columns (LastName,
FirstName, Address, and City).
SQL Queries
With SQL, we can query a database and have a result set returned.
A query like this:
SELECT LastName FROM Persons
Gives a result set like this:
LastName
Hansen
Svendson
Pettersen
Note: Some database systems require a semicolon at the end of the SQL statement. We don't use the
semicolon in our tutorials.
SQL (Structured Query Language) is a syntax for executing queries. But the SQL language also
includes a syntax to update, insert, and delete records.
These query and update commands together form the Data Manipulation Language (DML) part of
SQL:
SELECT - extracts data from a database table
UPDATE - updates data in a database table
DELETE - deletes data from a database table
INSERT INTO - inserts new data into a database table
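The four DML commands can be exercised in sequence. A minimal sketch with Python's built-in sqlite3, reusing the tutorial's Persons table (reduced to two columns for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Persons (LastName TEXT, City TEXT)")

con.execute("INSERT INTO Persons VALUES ('Hansen', 'Sandnes')")                 # INSERT INTO
con.execute("UPDATE Persons SET City = 'Stavanger' WHERE LastName = 'Hansen'")  # UPDATE
rows = con.execute("SELECT * FROM Persons").fetchall()                          # SELECT
con.execute("DELETE FROM Persons WHERE LastName = 'Hansen'")                    # DELETE
count = con.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]

print(rows, count)  # [('Hansen', 'Stavanger')] 0
```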
The Data Definition Language (DDL) part of SQL permits database tables to be created or deleted.
We can also define indexes (keys), specify links between tables, and impose constraints between
database tables.
The most important DDL statements in SQL are:
CREATE TABLE - creates a new database table
ALTER TABLE - alters (changes) a database table
DROP TABLE - deletes a database table
CREATE INDEX - creates an index (search key)
DROP INDEX - deletes an index
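A sketch exercising these DDL statements with Python's built-in sqlite3 (DDL support varies between systems; SQLite, for instance, supports only a limited form of ALTER TABLE such as adding a column):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT)")  # CREATE TABLE
con.execute("ALTER TABLE Persons ADD COLUMN City TEXT")              # ALTER TABLE
con.execute("CREATE INDEX idx_lastname ON Persons (LastName)")       # CREATE INDEX

# PRAGMA table_info is SQLite's way to inspect the resulting schema.
cols = [c[1] for c in con.execute("PRAGMA table_info(Persons)")]
print(cols)  # ['LastName', 'FirstName', 'City']

con.execute("DROP INDEX idx_lastname")                               # DROP INDEX
con.execute("DROP TABLE Persons")                                    # DROP TABLE
```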
The SELECT statement is used to select data from a table. The tabular result is stored in a result table
(called the result-set).
Syntax
SELECT column_name(s)
FROM table_name
To select the columns named "LastName" and "FirstName", use a SELECT statement like this:
SELECT LastName,FirstName FROM Persons
"Persons" table
LastName FirstName Address City
Hansen Ola Timoteivn 10 Sandnes
Svendson Tove Borgvn 23 Sandnes
Pettersen Kari Storgt 20 Stavanger
Result
LastName FirstName
Hansen Ola
Svendson Tove
Pettersen Kari
To select all columns from the "Persons" table, use a * symbol instead of column names, like this:
SELECT * FROM Persons
Result
LastName FirstName Address City
Hansen Ola Timoteivn 10 Sandnes
Svendson Tove Borgvn 23 Sandnes
Pettersen Kari Storgt 20 Stavanger
The result from a SQL query is stored in a result-set. Most database software systems allow
navigation of the result set with programming functions, like: Move-To-First-Record,
Get-Record-Content, Move-To-Next-Record, etc.
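In Python's sqlite3, for instance, these navigation functions correspond to cursor methods. A minimal sketch (ORDER BY rowid is an SQLite-specific way to fix the row order for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Persons (LastName TEXT);
    INSERT INTO Persons VALUES ('Hansen'), ('Svendson'), ('Pettersen');
""")

cur = con.execute("SELECT LastName FROM Persons ORDER BY rowid")
first = cur.fetchone()  # move to the first record and get its content
rest = cur.fetchall()   # move through the remaining records
print(first, rest)      # ('Hansen',) [('Svendson',), ('Pettersen',)]
```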
Programming functions like these are not a part of this tutorial. To learn about accessing data with
function calls, please visit our ADO tutorial.
Semicolon is the standard way to separate each SQL statement in database systems that allow more
than one SQL statement to be executed in the same call to the server.
Some SQL tutorials end each SQL statement with a semicolon. Is this necessary? We are using MS
Access and SQL Server 2000 and we do not have to put a semicolon after each SQL statement, but
some database programs force you to use it.
The SELECT DISTINCT Statement
The DISTINCT keyword is used to return only distinct (different) values.
Syntax
SELECT DISTINCT column_name(s)
FROM table_name
To select ALL values from the column named "Company" we use a SELECT statement like this:
SELECT Company FROM Orders
"Orders" table
Company OrderNumber
Sega 3412
W3Schools 2312
Trio 4678
W3Schools 6798
Result
Company
Sega
W3Schools
Trio
W3Schools
Notice that "W3Schools" is listed twice in the result above. To select only different values from
the column "Company", we use a SELECT DISTINCT statement like this:
SELECT DISTINCT Company FROM Orders
Result:
Company
Sega
W3Schools
Trio
The WHERE Clause
The WHERE clause is used to specify a selection criterion.
To conditionally select data from a table, a WHERE clause can be added to the SELECT statement.
Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name operator value
With the WHERE clause, the following operators can be used:
Operator Description
= Equal
<> Not equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
BETWEEN Between an inclusive range
LIKE
Search for a pattern
To select only the persons living in the city "Sandnes", we add a WHERE clause to the SELECT
statement:
SELECT * FROM Persons
WHERE City='Sandnes'
Result
LastName FirstName Address City
Hansen Ola Timoteivn 10 Sandnes
Svendson Tove Borgvn 23 Sandnes
Using Quotes
Note that we have used single quotes around the conditional values in the examples.
SQL uses single quotes around text values (most database systems will also accept double quotes).
Numeric values should not be enclosed in quotes.
For text values:
This is correct:
SELECT * FROM Persons WHERE FirstName='Tove'
This is wrong:
SELECT * FROM Persons WHERE FirstName=Tove
This is correct:
SELECT * FROM Persons WHERE Year>1965
This is wrong:
SELECT * FROM Persons WHERE Year>'1965'
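Most client libraries also let you avoid hand-building quoted literals altogether, by using placeholders; the driver then applies the correct quoting for each type (text quoted, numbers not). A sketch with Python's built-in sqlite3 (the Year column is taken from the example above, not from the earlier Persons table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Persons (FirstName TEXT, Year INTEGER);
    INSERT INTO Persons VALUES ('Tove', 1960), ('Ola', 1970);
""")

# The ? placeholder handles quoting per type, and also prevents SQL injection.
by_name = con.execute("SELECT * FROM Persons WHERE FirstName = ?", ("Tove",)).fetchall()
by_year = con.execute("SELECT * FROM Persons WHERE Year > ?", (1965,)).fetchall()

print(by_name)  # [('Tove', 1960)]
print(by_year)  # [('Ola', 1970)]
```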
The LIKE Condition
The LIKE condition is used to specify a search for a pattern in a column. A "%" sign can be used
to define wildcards (missing letters in the pattern) both before and after the pattern.
Syntax
SELECT column_name FROM table_name
WHERE column_name LIKE pattern
Using LIKE
The following SQL statement will return persons with first names that start with an 'O':
SELECT * FROM Persons WHERE FirstName LIKE 'O%'
The following SQL statement will return persons with first names that end with an 'a':
SELECT * FROM Persons WHERE FirstName LIKE '%a'
The following SQL statement will return persons with first names that contain the pattern 'la':
SELECT * FROM Persons WHERE FirstName LIKE '%la%'
SQL ORDER BY
The ORDER BY clause is used to sort the rows.
Orders:
Company OrderNumber
Sega 3412
ABC Shop 5678
W3Schools 2312
W3Schools 6798
Example
To display the companies in alphabetical order:
SELECT Company, OrderNumber FROM Orders ORDER BY Company
Result:
Company OrderNumber
ABC Shop 5678
Sega 3412
W3Schools 6798
W3Schools 2312
Example
To display the companies in alphabetical order AND the ordernumbers in numerical order:
SELECT Company, OrderNumber FROM Orders ORDER BY Company, OrderNumber
Result:
Company OrderNumber
ABC Shop 5678
Sega 3412
W3Schools 2312
W3Schools 6798
Example
To display the companies in reverse alphabetical order:
SELECT Company, OrderNumber FROM Orders ORDER BY Company DESC
Result:
Company OrderNumber
W3Schools 6798
W3Schools 2312
Sega 3412
ABC Shop 5678
Example
To display the companies in reverse alphabetical order AND the ordernumbers in numerical order:
SELECT Company, OrderNumber FROM Orders ORDER BY Company DESC, OrderNumber ASC
Result:
Company OrderNumber
W3Schools 2312
W3Schools 6798
Sega 3412
ABC Shop 5678
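The ORDER BY behavior shown in the result tables above can be reproduced with sqlite3; the Orders table is the same sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Orders (Company TEXT, OrderNumber INTEGER);
INSERT INTO Orders VALUES ('Sega', 3412);
INSERT INTO Orders VALUES ('ABC Shop', 5678);
INSERT INTO Orders VALUES ('W3Schools', 2312);
INSERT INTO Orders VALUES ('W3Schools', 6798);
""")

# Sort by company, then order number; DESC reverses a column's direction.
asc_rows = con.execute(
    "SELECT Company, OrderNumber FROM Orders ORDER BY Company, OrderNumber"
).fetchall()
desc_rows = con.execute(
    "SELECT Company, OrderNumber FROM Orders ORDER BY Company DESC, OrderNumber ASC"
).fetchall()
print(asc_rows)
print(desc_rows)
```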
AND & OR
AND and OR join two or more conditions in a WHERE clause.
Example
Use AND to display each person with the first name equal to "Tove", and the last name equal to "Svendson":
SELECT * FROM Persons WHERE FirstName='Tove' AND LastName='Svendson'
Result:
LastName FirstName Address City
Svendson Tove Borgvn 23 Sandnes
Example
Use OR to display each person with the first name equal to "Tove", or the last name equal to "Svendson":
SELECT * FROM Persons WHERE FirstName='Tove' OR LastName='Svendson'
Result:
LastName FirstName Address City
Svendson Tove Borgvn 23 Sandnes
Svendson Stephen Kaivn 18 Sandnes
Example
You can also combine AND and OR (use parentheses to form complex expressions):
SELECT * FROM Persons WHERE (FirstName='Tove' OR FirstName='Stephen') AND LastName='Svendson'
Result:
LastName FirstName Address City
Svendson Tove Borgvn 23 Sandnes
Svendson Stephen Kaivn 18 Sandnes
SQL IN
The IN operator may be used if you know the exact value you want to return for at least one of the columns.
Syntax
SELECT column_name FROM table_name
WHERE column_name IN (value1, value2, ...)
Example 1
To display the persons with LastName equal to "Hansen" or "Pettersen", use the following SQL:
SELECT * FROM Persons WHERE LastName IN ('Hansen','Pettersen')
Result:
LastName FirstName Address City
Hansen Ola Timoteivn 10 Sandnes
Pettersen Kari Storgt 20 Stavanger
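A quick sqlite3 check of the IN example, using the same illustrative Persons rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT);
INSERT INTO Persons VALUES ('Hansen', 'Ola', 'Timoteivn 10', 'Sandnes');
INSERT INTO Persons VALUES ('Nordmann', 'Anna', 'Neset 18', 'Sandnes');
INSERT INTO Persons VALUES ('Pettersen', 'Kari', 'Storgt 20', 'Stavanger');
INSERT INTO Persons VALUES ('Svendson', 'Tove', 'Borgvn 23', 'Sandnes');
""")

# IN matches a column against a list of literal values.
rows = con.execute(
    "SELECT LastName, FirstName FROM Persons "
    "WHERE LastName IN ('Hansen', 'Pettersen') ORDER BY LastName"
).fetchall()
print(rows)  # [('Hansen', 'Ola'), ('Pettersen', 'Kari')]
```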
SQL BETWEEN
The BETWEEN ... AND operator selects a range of data between two values. These values can be numbers, text, or dates.
SELECT column_name FROM table_name
WHERE column_name
BETWEEN value1 AND value2
Example 1
To display the persons alphabetically between "Hansen" (inclusive) and "Pettersen" (exclusive), use the following SQL:
SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'
Result:
LastName FirstName Address City
Hansen Ola Timoteivn 10 Sandnes
Nordmann Anna Neset 18 Sandnes
IMPORTANT! The BETWEEN...AND operator is treated differently in different databases. With
some databases a person with the LastName of "Hansen" or "Pettersen" will not be listed
(BETWEEN..AND only selects fields that are between and excluding the test values). With some
databases a person with the last name of "Hansen" or "Pettersen" will be listed (BETWEEN..AND
selects fields that are between and including the test values). With other databases a person with the
last name of "Hansen" will be listed, but "Pettersen" will not be listed (BETWEEN..AND selects
fields between the test values, including the first test value and excluding the last test value).
Therefore: Check how your database treats the BETWEEN....AND operator!
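That check is easy to run. In SQLite, for instance, BETWEEN is inclusive at both ends, as this small sketch (with illustrative data) shows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Persons (LastName TEXT);
INSERT INTO Persons VALUES ('Hansen');
INSERT INTO Persons VALUES ('Nordmann');
INSERT INTO Persons VALUES ('Pettersen');
INSERT INTO Persons VALUES ('Svendson');
""")

# SQLite treats BETWEEN as inclusive at both ends:
# both 'Hansen' and 'Pettersen' come back in the result.
rows = con.execute(
    "SELECT LastName FROM Persons "
    "WHERE LastName BETWEEN 'Hansen' AND 'Pettersen' ORDER BY LastName"
).fetchall()
print(rows)
```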
Example 2
To display the persons outside the range used in the previous example, use the NOT operator:
SELECT * FROM Persons WHERE LastName NOT BETWEEN 'Hansen' AND 'Pettersen'
Result:
LastName FirstName Address City
Pettersen Kari Storgt 20 Stavanger
Svendson Tove Borgvn 23 Sandnes
SQL Aliases
With SQL, aliases can be used for column names and table names.
Column alias syntax:
SELECT column_name AS column_alias FROM table_name
Table alias syntax:
SELECT column_name FROM table_name AS table_alias
Table Employees:
LastName FirstName
Hansen Ola
Svendson Tove
Pettersen Kari
SQL Join
Employees:
Employee_ID Name
01 Hansen, Ola
02 Svendson, Tove
03 Svendson, Stephen
04 Pettersen, Kari
Orders:
Prod_ID Product Employee_ID
234 Printer 01
657 Table 03
865 Chair 03
We can select data from two tables by referring to two tables, like this:
Example
SELECT Employees.Name, Orders.Product
FROM Employees, Orders
WHERE Employees.Employee_ID = Orders.Employee_ID
Result
Name Product
Hansen, Ola Printer
Svendson, Stephen Table
Svendson, Stephen Chair
Example
SELECT Employees.Name
FROM Employees, Orders
WHERE Employees.Employee_ID=Orders.Employee_ID
AND Orders.Product='Printer'
Result
Name
Hansen, Ola
Using Joins
Or, we can select data from two tables with the JOIN keyword, like this:
Syntax
SELECT field1, field2, field3
FROM first_table
INNER JOIN second_table
ON first_table.keyfield = second_table.foreign_keyfield
The INNER JOIN returns all rows from both tables where there is a match. If there are rows in
Employees that do not have matches in Orders, those rows will not be listed.
Result
Name Product
Hansen, Ola Printer
Svendson, Stephen Table
Svendson, Stephen Chair
Syntax
SELECT field1, field2, field3
FROM first_table
LEFT JOIN second_table
ON first_table.keyfield = second_table.foreign_keyfield
The LEFT JOIN returns all the rows from the first table (Employees), even if there are no matches in
the second table (Orders). If there are rows in Employees that do not have matches in Orders, those
rows also will be listed.
Result
Name Product
Hansen, Ola Printer
Svendson, Tove
Svendson, Stephen Table
Svendson, Stephen Chair
Pettersen, Kari
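The difference between INNER JOIN and LEFT JOIN can be demonstrated with sqlite3, using the Employees and Orders sample tables from above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employees (Employee_ID TEXT, Name TEXT);
INSERT INTO Employees VALUES ('01', 'Hansen, Ola');
INSERT INTO Employees VALUES ('02', 'Svendson, Tove');
INSERT INTO Employees VALUES ('03', 'Svendson, Stephen');
INSERT INTO Employees VALUES ('04', 'Pettersen, Kari');
CREATE TABLE Orders (Prod_ID INTEGER, Product TEXT, Employee_ID TEXT);
INSERT INTO Orders VALUES (234, 'Printer', '01');
INSERT INTO Orders VALUES (657, 'Table', '03');
INSERT INTO Orders VALUES (865, 'Chair', '03');
""")

# INNER JOIN keeps only matching rows; LEFT JOIN also keeps
# employees without orders (Product comes back as NULL/None).
inner = con.execute("""
    SELECT Employees.Name, Orders.Product
    FROM Employees INNER JOIN Orders
    ON Employees.Employee_ID = Orders.Employee_ID
""").fetchall()
left = con.execute("""
    SELECT Employees.Name, Orders.Product
    FROM Employees LEFT JOIN Orders
    ON Employees.Employee_ID = Orders.Employee_ID
""").fetchall()
print(len(inner), len(left))  # 3 5
```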
Syntax
SELECT field1, field2, field3
FROM first_table
RIGHT JOIN second_table
ON first_table.keyfield = second_table.foreign_keyfield
The RIGHT JOIN returns all the rows from the second table (Orders), even if there are no matches in
the first table (Employees). If there had been any rows in Orders that did not have matches in
Employees, those rows also would have been listed.
Result
Name Product
Hansen, Ola Printer
Svendson, Stephen Table
Svendson, Stephen Chair
Example
SELECT Employees.Name
FROM Employees
INNER JOIN Orders
ON Employees.Employee_ID=Orders.Employee_ID
WHERE Orders.Product = 'Printer'
Result
Name
Hansen, Ola
UNION
The UNION command is used to select related information from two tables, much like the JOIN
command. However, when using the UNION command all selected columns need to be of the same
data type.
Note: With UNION, only distinct values are selected.
SQL Statement 1
UNION
SQL Statement 2
Employees_Norway:
Employee_ID E_Name
01 Hansen, Ola
02 Svendson, Tove
03 Svendson, Stephen
04 Pettersen, Kari
Employees_USA:
Employee_ID E_Name
01 Turner, Sally
02 Kent, Clark
03 Svendson, Stephen
04 Scott, Stephen
Example
List all different employee names in Norway and USA:
SELECT E_Name FROM Employees_Norway
UNION
SELECT E_Name FROM Employees_USA
Result
Name
Hansen, Ola
Svendson, Tove
Svendson, Stephen
Pettersen, Kari
Turner, Sally
Kent, Clark
Scott, Stephen
Note: This command cannot be used to list all employees in Norway and USA. In the example above
we have two employees with equal names, and only one of them is listed. The UNION command
only selects distinct values.
UNION ALL
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.
SQL Statement 1
UNION ALL
SQL Statement 2
Example
List all employee names in Norway and USA:
SELECT E_Name FROM Employees_Norway
UNION ALL
SELECT E_Name FROM Employees_USA
Result
Name
Hansen, Ola
Svendson, Tove
Svendson, Stephen
Pettersen, Kari
Turner, Sally
Kent, Clark
Svendson, Stephen
Scott, Stephen
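The duplicate-handling difference is easy to verify in sqlite3 with the same two sample tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employees_Norway (Employee_ID TEXT, E_Name TEXT);
INSERT INTO Employees_Norway VALUES ('01', 'Hansen, Ola');
INSERT INTO Employees_Norway VALUES ('02', 'Svendson, Tove');
INSERT INTO Employees_Norway VALUES ('03', 'Svendson, Stephen');
INSERT INTO Employees_Norway VALUES ('04', 'Pettersen, Kari');
CREATE TABLE Employees_USA (Employee_ID TEXT, E_Name TEXT);
INSERT INTO Employees_USA VALUES ('01', 'Turner, Sally');
INSERT INTO Employees_USA VALUES ('02', 'Kent, Clark');
INSERT INTO Employees_USA VALUES ('03', 'Svendson, Stephen');
INSERT INTO Employees_USA VALUES ('04', 'Scott, Stephen');
""")

# UNION removes the duplicate "Svendson, Stephen"; UNION ALL keeps it.
union_rows = con.execute(
    "SELECT E_Name FROM Employees_Norway UNION SELECT E_Name FROM Employees_USA"
).fetchall()
union_all_rows = con.execute(
    "SELECT E_Name FROM Employees_Norway UNION ALL SELECT E_Name FROM Employees_USA"
).fetchall()
print(len(union_rows), len(union_all_rows))  # 7 8
```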
SQL Functions
SQL has a number of built-in functions for counting and calculations.
Function Syntax
The syntax for built-in SQL functions is:
SELECT function(column_name) FROM table_name
Types of Functions
There are several basic types and categories of functions in SQL. The basic types of functions are:
Aggregate Functions
Scalar functions
Aggregate functions
Aggregate functions operate against a collection of values, but return a single value.
Note: If used among many other expressions in the item list of a SELECT statement, the SELECT
must have a GROUP BY clause!!
Name Age
Hansen, Ola 34
Svendson, Tove 45
Pettersen, Kari 19
Function Description
AVG(column) Returns the average value of a column
COUNT(column) Returns the number of rows (without a NULL value) of a column
COUNT(*) Returns the number of selected rows
FIRST(column) Returns the value of the first record in a specified field
LAST(column) Returns the value of the last record in a specified field
MAX(column) Returns the highest value of a column
MIN(column) Returns the lowest value of a column
STDEV(column) Returns the standard deviation of the values in a column
STDEVP(column) Returns the population standard deviation of the values in a column
SUM(column) Returns the total sum of a column
VAR(column) Returns the variance of the values in a column
VARP(column) Returns the population variance of the values in a column
GROUP BY...
GROUP BY... was added to SQL because aggregate functions (like SUM) return the aggregate of all column values every time they are called, and without the GROUP BY clause it was impossible to find the sum for each individual group of column values.
GROUP BY Example
This SQL:
SELECT Company, SUM(Amount) FROM Sales
is invalid because the Company column returned is not part of an aggregate. A GROUP BY clause will solve this problem:
SELECT Company, SUM(Amount) FROM Sales
GROUP BY Company
HAVING...
HAVING... was added to SQL because the WHERE keyword could not be used against aggregate
functions (like SUM), and without HAVING... it would be impossible to test for result conditions.
This SQL:
SELECT Company,SUM(Amount) FROM Sales
GROUP BY Company
HAVING SUM(Amount)>10000
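A runnable sketch of GROUP BY and HAVING in sqlite3; the Sales figures below are made-up sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Sales (Company TEXT, Amount INTEGER);
INSERT INTO Sales VALUES ('W3Schools', 5500);
INSERT INTO Sales VALUES ('IBM', 4500);
INSERT INTO Sales VALUES ('W3Schools', 7100);
""")

# GROUP BY computes one SUM per company; HAVING then filters the groups.
totals = con.execute(
    "SELECT Company, SUM(Amount) FROM Sales GROUP BY Company ORDER BY Company"
).fetchall()
big = con.execute(
    "SELECT Company, SUM(Amount) FROM Sales GROUP BY Company HAVING SUM(Amount)>10000"
).fetchall()
print(totals)  # [('IBM', 4500), ('W3Schools', 12600)]
print(big)     # [('W3Schools', 12600)]
```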
The SELECT INTO statement selects data from one table and inserts it into a new table. It is most often used to create backup copies of tables or for archiving records.
Syntax
SELECT column_name(s) INTO new_table_name
FROM source_table_name
If you only want to copy a few fields, you can do so by listing them after the SELECT statement:
SELECT LastName,FirstName INTO Persons_backup
FROM Persons
You can also add a WHERE clause. The following example creates a "Persons_backup" table with two columns (FirstName and LastName) by extracting the persons who live in "Sandnes" from the "Persons" table:
SELECT LastName,Firstname INTO Persons_backup
FROM Persons
WHERE City='Sandnes'
Selecting data from more than one table is also possible. The following example creates a new table "Empl_Ord_backup"
that contains data from the two tables Employees and Orders:
SELECT Employees.Name,Orders.Product
INTO Empl_Ord_backup
FROM Employees
INNER JOIN Orders
ON Employees.Employee_ID=Orders.Employee_ID
What is a View?
In SQL, a VIEW is a virtual table based on the result-set of a SELECT statement. A view contains
rows and columns, just like a real table. The fields in a view are fields from one or more real tables in
the database. You can add SQL functions, WHERE, and JOIN statements to a view and present the
data as if the data were coming from a single table.
Note: The database design and structure are NOT affected by the functions, WHERE clauses, or JOINs in a view.
Syntax
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition
Note: The database does not store the view data! The database engine recreates the data, using the
view's SELECT statement, every time a user queries a view.
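That recomputation can be observed directly. A sqlite3 sketch modeled loosely on the Northwind "Current Product List" view; the table contents are illustrative, and SQLite uses 0/1 for the Discontinued flag rather than Access-style Yes/No:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Products (ProductID INTEGER, ProductName TEXT, Discontinued INTEGER);
INSERT INTO Products VALUES (1, 'Chai', 0);
INSERT INTO Products VALUES (2, 'Chang', 0);
INSERT INTO Products VALUES (3, 'Aniseed Syrup', 1);
CREATE VIEW CurrentProductList AS
SELECT ProductID, ProductName FROM Products WHERE Discontinued = 0;
""")

# The view stores no data; its SELECT runs each time it is queried.
rows = con.execute(
    "SELECT ProductName FROM CurrentProductList ORDER BY ProductID"
).fetchall()
print(rows)  # [('Chai',), ('Chang',)]

# Adding a product makes it appear in the view automatically.
con.execute("INSERT INTO Products VALUES (4, 'Tofu', 0)")
count = con.execute("SELECT COUNT(*) FROM CurrentProductList").fetchone()
print(count)  # (3,)
```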
Using Views
A view could be used from inside a query, a stored procedure, or from inside another view. By adding
functions, joins, etc., to a view, it allows you to present exactly the data you want to the user.
The sample database Northwind has some views installed by default. The view "Current Product
List" lists all active products (products that are not discontinued) from the Products table. The view is
created with the following SQL:
CREATE VIEW [Current Product List] AS
SELECT ProductID, ProductName
FROM Products
WHERE Discontinued=No
Another view from the Northwind sample database selects every product in the Products table that
has a unit price that is higher than the average unit price:
Another example view from the Northwind database calculates the total sale for each category in
1997. Note that this view select its data from another view called "Product Sales for 1997":
We can also add a condition to the query. Now we want to see the total sale only for the category
"Beverages":
The INSERT INTO statement is used to insert new rows into a table.
Syntax
INSERT INTO table_name
VALUES (value1, value2,....)
You can also specify the columns for which you want to insert data:
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...)
Result:
LastName FirstName Address City
Hetland Camilla Hagabakka 24 Sandnes
Rasmussen Storgt 67
Syntax
UPDATE table_name
SET column_name = new_value
WHERE column_name = some_value
Person:
LastName FirstName Address City
Nilsen Fred Kirkegt 56 Stavanger
Rasmussen Storgt 67
We want to add a first name to the person with a last name of "Rasmussen":
UPDATE Person
SET FirstName = 'Nina'
WHERE LastName = 'Rasmussen'
Result:
LastName FirstName Address City
Nilsen Fred Kirkegt 56 Stavanger
Rasmussen Nina Storgt 67
We want to change the address and add the name of the city:
UPDATE Person
SET Address = 'Stien 12', City = 'Stavanger'
WHERE LastName = 'Rasmussen'
Result:
LastName FirstName Address City
Nilsen Fred Kirkegt 56 Stavanger
Rasmussen Nina Stien 12 Stavanger
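The two UPDATE steps above can be run end to end in sqlite3 (same illustrative Person rows):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Person (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT);
INSERT INTO Person VALUES ('Nilsen', 'Fred', 'Kirkegt 56', 'Stavanger');
INSERT INTO Person VALUES ('Rasmussen', NULL, 'Storgt 67', NULL);
""")

# Fill in the missing first name, then change address and add the city.
con.execute("UPDATE Person SET FirstName = 'Nina' WHERE LastName = 'Rasmussen'")
con.execute(
    "UPDATE Person SET Address = 'Stien 12', City = 'Stavanger' "
    "WHERE LastName = 'Rasmussen'"
)
row = con.execute("SELECT * FROM Person WHERE LastName = 'Rasmussen'").fetchone()
print(row)  # ('Rasmussen', 'Nina', 'Stien 12', 'Stavanger')
```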
The DELETE statement is used to delete rows in a table.
Syntax
DELETE FROM table_name
WHERE column_name = some_value
Person:
LastName FirstName Address City
Nilsen Fred Kirkegt 56 Stavanger
Rasmussen Nina Stien 12 Stavanger
Delete a Row
"Nina Rasmussen" is going to be deleted:
DELETE FROM Person WHERE LastName = 'Rasmussen'
Result
LastName FirstName Address City
Nilsen Fred Kirkegt 56 Stavanger
Delete All Rows
It is possible to delete all rows in a table without deleting the table. This means that the table structure, attributes, and indexes will be intact:
DELETE FROM table_name
DDL
Create a Database
To create a database:
CREATE DATABASE database_name
Create a Table
Example
This example demonstrates how you can create a table named "Person", with four columns. The column names will be "LastName", "FirstName", "Address", and "Age":
CREATE TABLE Person
(
LastName varchar,
FirstName varchar,
Address varchar,
Age int
)
This example demonstrates how you can specify a maximum length for some columns:
CREATE TABLE Person
(
LastName varchar(30),
FirstName varchar,
Address varchar,
Age int(3)
)
The data type specifies what type of data the column can hold. The table below contains the most
common data types in SQL:
Create Index
Indices are created in an existing table to locate rows more quickly and efficiently. It is possible to
create an index on one or more columns of a table, and each index is given a name. The users cannot
see the indexes, they are just used to speed up queries.
Note: Updating a table containing indexes takes more time than updating a table without, this is
because the indexes also need an update. So, it is a good idea to create indexes only on columns that
are often used for a search.
A Unique Index
Creates a unique index on a table. A unique index means that two rows cannot have the same index value.
Syntax
CREATE UNIQUE INDEX index_name
ON table_name (column_name)
A Simple Index
Creates a simple index on a table. When the UNIQUE keyword is omitted, duplicate values are allowed.
Syntax
CREATE INDEX index_name
ON table_name (column_name)
This example creates a simple index, named "PersonIndex", on the LastName field of the
Person table:
CREATE INDEX PersonIndex
ON Person (LastName)
If you want to index the values in a column in descending order, you can add the reserved word
DESC after the column name:
CREATE INDEX PersonIndex
ON Person (LastName DESC)
If you want to index more than one column you can list the column names within the parentheses, separated by commas:
CREATE INDEX PersonIndex
ON Person (LastName, FirstName)
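Since indexes are invisible to queries, the easiest way to confirm one exists is to look in the system catalog. A sqlite3 sketch (the DESC column option is accepted by SQLite as well):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Person (LastName TEXT, FirstName TEXT)")
con.execute("CREATE INDEX PersonIndex ON Person (LastName DESC, FirstName)")

# The index does not change query results, but it shows up in the catalog.
names = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
)]
print(names)  # ['PersonIndex']
```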
Drop Index
You can delete an existing index in a table with the DROP statement (the exact syntax varies between database systems):
DROP INDEX index_name
To delete a table (the table structure, attributes, and indexes will also be deleted):
DROP TABLE table_name
To delete a database:
DROP DATABASE database_name
Truncate a Table
What if we only want to get rid of the data inside a table, and not the table itself? Use the
TRUNCATE TABLE command (deletes only the data inside the table):
TRUNCATE TABLE table_name
ALTER TABLE
The ALTER TABLE statement is used to add or drop columns in an existing table.
Note: Some database systems don't allow the dropping of a column in a database table (DROP
COLUMN column_name).
Person:
LastName FirstName Address
Pettersen Kari Storgt 20
Example
To add a "City" column to the "Person" table:
ALTER TABLE Person ADD City varchar(30)
Result:
LastName FirstName Address City
Pettersen Kari Storgt 20
Example
To drop the "Address" column from the "Person" table:
ALTER TABLE Person DROP COLUMN Address
Result:
LastName FirstName City
Pettersen Kari
PL/SQL
Oracle's Procedural SQL
Usually PL/SQL programs are written within other applications (such as forms or reports).
1. Variable Declaration
variable_name data_type;
2. Reference Variable declarations inherit their data type from a referenced database field.
variable_name table_name.field_name%TYPE;
3. Scalar Data Types
o VARCHAR2
o CHAR
o DATE
o LONG
o NUMBER
o Integers
o Decimal Numbers
o Boolean
4. PL/SQL Program Structure
DECLARE
Variable Declarations
BEGIN
Program Statements
EXCEPTION
Error-handling Statements
END;
5. Comments
Single-line comments begin with two dashes (--).
Multiline comments are delimited with /* ... */
Basic loop with a conditional exit:
LOOP
statements
IF condition THEN
EXIT;
END IF;
statements
END LOOP;
Loop with EXIT WHEN:
LOOP
statements
EXIT WHEN condition;
END LOOP;
WHILE loop:
WHILE condition LOOP
statements
END LOOP;
Numeric FOR loop:
FOR counter IN start_val .. end_val LOOP
statements
END LOOP;
14. Cursors
A cursor is a pointer that references the context area which contains command information
such as the number of rows processed and the active set (the set of data rows returned by the
query).
Cursor Attributes
o %NOTFOUND - True when no rows left to fetch
o %FOUND - True when rows left to fetch
o %ROWCOUNT - Number of rows a cursor has fetched so far
o %ISOPEN - True if cursor is open
Implicit Cursors are created automatically for each INSERT, UPDATE, DELETE and
SELECT command.
To use an implicit cursor, the command may return only one record:
SELECT field_names
INTO declared_variable_names
FROM table_names
WHERE join_condition
AND record_selection_condition_for_exactly_one_record;
Explicit cursors must be declared and can be used with queries that return any number of
records.
DECLARE
variable declarations
CURSOR cursor_name IS
select_statement;
BEGIN
OPEN cursor_name;
LOOP
FETCH cursor_name INTO variable_names;
EXIT WHEN cursor_name%NOTFOUND;
statements
END LOOP;
CLOSE cursor_name;
END;
/
A cursor FOR loop opens the cursor, fetches each row into a %ROWTYPE record, and closes the cursor automatically:
DECLARE
variable declarations
CURSOR cursor_name IS
select_statement;
row_variable cursor_name%ROWTYPE;
BEGIN
FOR row_variable IN cursor_name LOOP
statements
END LOOP;
END;
/
DECLARE
variable declarations
CURSOR cursor_name IS
SELECT field_names
FROM table_name
WHERE record_selection_condition
FOR UPDATE;
row_variable cursor_name%ROWTYPE;
BEGIN
FOR row_variable IN cursor_name LOOP
UPDATE table_name
SET field_name = value
WHERE CURRENT OF cursor_name;
END LOOP;
END;
/
NOTE: WHERE CURRENT OF is a reference to the current row.
A PL/SQL table type is declared with:
TYPE table_data_type_name
IS TABLE OF database_table_name%ROWTYPE
INDEX BY BINARY_INTEGER;
Example (the type name state_table_type is illustrative):
DECLARE
TYPE state_table_type IS TABLE OF VARCHAR2(2)
INDEX BY BINARY_INTEGER;
state_codes state_table_type;
BEGIN
state_codes(1) := 'AK';
END;
To assign a value to a field of a record in a table of records use the following syntax.
table_name(key_value).database_field_name := field_value;
Code to handle run-time errors in a PL/SQL program is located in the EXCEPTION section of
a PL/SQL program.
EXCEPTION
WHEN exception_name THEN
Error-handling Statements
END;
Date Functions
Name Description
LAST_DAY Returns the last day in the month of the specified date.
NEW_TIME Returns the date/time value, with the time shifted as requested by the
specified time zones.
NEXT_DAY Returns the date of the first weekday specified that is later than the date.
SYSDATE Returns the current date and time in the Oracle Server.
TRUNC Truncates the specified date of its time portion according to the format unit
provided.
NOTE: In the examples in this chapter, a date contained in single quotation marks is a character
string. PL/SQL converts the string to a true date datatype when it applies the function. (This is an
implicit conversion.) Date values that are displayed in the format DD-MON-YYYY and are not
contained in single quotation marks represent actual date values in the database.
12-DEC-1997
'12-DEC-1997'
Remember, a date has its own internal storage format and cannot be viewed or entered directly. These
examples also assume that the default format mask for dates is DD-MON-YYYY.
The ADD_MONTHS function returns a new date with the specified number of months added to the input date. The specification for ADD_MONTHS is as follows:
FUNCTION ADD_MONTHS (date_in IN DATE, month_shift IN NUMBER) RETURN DATE
FUNCTION ADD_MONTHS (month_shift IN NUMBER, date_in IN DATE) RETURN DATE
ADD_MONTHS is an overloaded function. You can specify the date and the number of months by which you want to shift that date, or you can list the month_shift parameter first and then the date. Both arguments are required.
Date Arithmetic
PL/SQL allows you to perform arithmetic operations directly on date variables. You may add
numbers to a date or subtract numbers from a date. To move a date one day in the future, simply add
1 to the date as shown below:
hire_date + 1
You can even add a fractional value to a date. For example, adding 1/24 to a date adds an hour to the
time component of that value. Adding 1/(24*60) adds a single minute to the time component, and so
on.
If the month_shift parameter is positive, ADD_MONTHS returns a date for that number of months
into the future. If the number is negative, ADD_MONTHS returns a date for that number of months
in the past. Here are some examples that use ADD_MONTHS:
ADD_MONTHS always shifts the date by whole months. You can provide a fractional value for the
month_shift parameter, but ADD_MONTHS will always round down to the whole number nearest
zero, as shown in these examples:
ADD_MONTHS ('28-FEB-1989', 0) ==> 28-FEB-1989
If you want to shift a date by a fraction of a month, simply add to or subtract from the date the
required number of days. PL/SQL supports direct arithmetic operations between date values.
If the input date to ADD_MONTHS does not fall on the last day of the month, the date returned by
ADD_MONTHS falls on the same day in the new month as in the original month. If the day number
of the input date is greater than the last day of the month returned by ADD_MONTHS, the function
sets the day number to the last day in the new month. For example, there is no 31st day in February,
so ADD_MONTHS returns the last day in the month:
This is perfectly reasonable. However, what if the input date falls on the last day of the month and the
new month has more days in it than the original month? If I shift two months forward from 28-FEB-
1994, do I get back 30-APR-1994 (the last day of the month) or 28-APR-1994 (the same day in the
new month as in the old month)? The answer is:
If you pass to ADD_MONTHS a day representing the last day in the month, PL/SQL always returns
the last day in the resulting month, regardless of the number of actual days in each of the months.
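That end-of-month rule can be reproduced outside Oracle. Here is a Python sketch that emulates the documented ADD_MONTHS behavior; this is my own approximation for illustration, not Oracle code:

```python
import calendar
from datetime import date

def add_months(date_in, month_shift):
    # Approximation of Oracle's documented ADD_MONTHS behavior.
    month_shift = int(month_shift)  # fractional shifts truncate toward zero
    months = date_in.year * 12 + (date_in.month - 1) + month_shift
    year, month = divmod(months, 12)
    month += 1
    last_of_new = calendar.monthrange(year, month)[1]
    last_of_old = calendar.monthrange(date_in.year, date_in.month)[1]
    if date_in.day == last_of_old:
        day = last_of_new              # last day always maps to last day
    else:
        day = min(date_in.day, last_of_new)  # clamp when the day is too big
    return date(year, month, day)

print(add_months(date(1994, 2, 28), 2))  # 1994-04-30 (last day -> last day)
print(add_months(date(1994, 1, 31), 1))  # 1994-02-28 (day clamped)
```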
This quirk can cause problems. I offer a solution in the section entitled Section 12.2.1, "Customizing the Behavior of ADD_MONTHS," later in this chapter.
The LAST_DAY function returns the date of the last day of the month for a given date. The specification is:
FUNCTION LAST_DAY (date_in IN DATE) RETURN DATE
This function is useful because the number of days in a month varies throughout the year. With
LAST_DAY, for example, you do not have to try to figure out if February of this or that year has 28
or 29 days. Just let LAST_DAY figure it out for you.
Get the last day of the month three months after being hired:
LAST_DAY (SYSDATE) - SYSDATE
The MONTHS_BETWEEN function calculates the number of months between two dates and returns that difference as a number. The specification is:
FUNCTION MONTHS_BETWEEN (date1 IN DATE, date2 IN DATE) RETURN NUMBER
If date1 and date2 are in the same month, then MONTHS_BETWEEN returns a fraction (a value
between -1 and +1).
If date1 and date2 both fall on the last day of their respective months, then
MONTHS_BETWEEN returns a whole number (no fractional component).
If date1 and date2 are in different months and at least one of the dates is not a last day in the
month, MONTHS_BETWEEN returns a fractional number. The fractional component is
calculated on a 31-day month basis and also takes into account any differences in the time
component of date1 and date2.
Calculate two ends of month, the first earlier than the second:
Calculate two ends of month, the first later than the second:
If you detect a pattern here you are right. As I said, MONTHS_BETWEEN calculates the fractional
component of the number of months by assuming that each month has 31 days. Therefore, each
additional day over a complete month counts for 1/31 of a month, and:
According to this rule, the number of months between January 31, 1994 and February 28, 1994 is one -- a nice, clean integer. But to calculate the number of months between January 31, 1994 and March 1, 1994, I have to add an additional .032258065 to the difference (and make that additional number negative because in this case MONTHS_BETWEEN counts from the first date back to the second date).
I don't know about you, but I am simply unable to remember the time in Anchorage when it is 3:00
P.M. in Chicago (and I really doubt that a lot of people in Anchorage can convert to Midwest U.S.
time). Fortunately for me, PL/SQL provides the NEW_TIME function. This function converts dates
(along with their time components) from one time zone to another. The specification for NEW_TIME
is:
FUNCTION NEW_TIME (date_in IN DATE, zone1 IN VARCHAR2, zone2 IN VARCHAR2)
RETURN DATE
where date_in is the original date, zone1 is the starting point for the zone switch (usually, but not
restricted to, your own local time zone), and zone2 is the time zone in which the date returned by
NEW_TIME should be placed.
The specification of time zones to NEW_TIME is not case-sensitive, as the following example
shows:
The NEXT_DAY function returns the date of the first day after the specified date which falls on the specified day of the week. Here is the specification for NEXT_DAY:
FUNCTION NEXT_DAY (date_in IN DATE, day_name IN VARCHAR2) RETURN DATE
The day_name must be a day of the week in your session's date language (specified by the
NLS_DATE_LANGUAGE database initialization parameter). The time component of the returned
date is the same as that of the input date, date_in. If the day of the week of the input date matches the
specified day_name, then NEXT_DAY will return the date seven days (one full week) after date_in.
NEXT_DAY does not return the input date if the day names match.
Here are some examples of the use of NEXT_DAY. Let's figure out the date of the first Monday and
Wednesday in 1997 in all of these examples.
The ROUND function rounds a date value to the nearest date as specified by a format mask. It is just like the standard numeric ROUND function, which rounds a number to the nearest number of specified precision, except that it works with dates. The specification for ROUND is as follows:
FUNCTION ROUND (date_in IN DATE [, format_mask IN VARCHAR2]) RETURN DATE
The ROUND function always rounds the time component of a date to midnight (12:00 A.M.). The
format mask is optional. If you do not include a format mask, ROUND rounds the date to the nearest
day. In other words, it checks the time component of the date. If the time is past noon, then ROUND
returns the next day with a time component of midnight.
The set of format masks for ROUND is a bit different from those masks used by TO_CHAR and
TO_DATE. (See Chapter 14, Conversion Functions, for more information on these functions.) The
masks are listed in Table 12.3. These same formats are used by the TRUNC function, described later
in this chapter, to perform truncation on dates.
Table 12.3: Format Masks for ROUND and TRUNC
Format Mask Rounds/Truncates To
CC or SSC Century
MONTH, MON, MM, or RM Month (rounds up on the sixteenth day, which is not necessarily the same as the middle of the month)
IW Same day of the week as the first day of the ISO year
MI Minute
TO_CHAR (TO_DATE ('01-APR-1994'), 'DAY') ==> FRIDAY
In the rest of the examples I use TO_DATE in order to pass a time component to the ROUND
function, and TO_CHAR to display the new time.
The SYSDATE function returns the current system date and time as recorded in the database. The time component of SYSDATE provides the current time to the nearest second. It takes no arguments. The specification for SYSDATE is:
FUNCTION SYSDATE RETURN DATE
SYSDATE is a function without parameters; as a result, it looks like a system-level variable and
programmers tend to use it as if it is a variable. For example, to assign the current date and time to a
local PL/SQL variable, you would enter the following:
my_date := SYSDATE;
However, SYSDATE is not a variable. When you use SYSDATE, you are calling a function, which
executes underlying code.
NOTE: In Oracle Version 6 and the earliest releases of the Oracle Server, when you called
SYSDATE, PL/SQL issued an implicit cursor to the database to get the current date and time, as
follows:
The TRUNC function truncates date values according to the specified format mask. The specification for TRUNC is:
FUNCTION TRUNC (date_in IN DATE [, format_mask IN VARCHAR2]) RETURN DATE
The TRUNC date function is similar to the numeric FLOOR function discussed in Chapter 13,
Numeric, LOB, and Miscellaneous Functions. Generally speaking, it rounds down to the beginning of
the minute, hour, day, month, quarter, year, or century, as specified by the format mask.
TRUNC offers the easiest way to retrieve the first day of the month or first day of the year. It is also
useful when you want to ignore the time component of dates. This is often the case when you perform
comparisons with dates, such as the following:
The date component of date_entered and start_date might be the same, but if your application does
not specify a time component for each of its dates, the comparison might fail. If, for example, the user
enters a request_date and the screen does not include a time component, the time for request_date will
be midnight or 12:00 A.M. of that day. If start_date was set from SYSDATE, however, its time
component will reflect the time at which the assignment was made. Because 12:00 A.M. comes
before any other time of the day, a comparison that looks to the naked eye like a match might well
fail.
If you are not sure about the time components of your date fields and variables and want to make sure
that your operations on dates disregard the time component, TRUNCate them:
TRUNC levels the playing field with regard to the time component: all dates now have the same time
of midnight (12:00 A.M.). The time will never be a reason for a comparison to fail.
Here are some examples of TRUNC for dates (all assuming a default date format mask of DD-MON-
YYYY):
Without a format mask, TRUNC sets the time to 12:00 A.M. of the same day:
TO_CHAR (TRUNC (TO_DATE ('11-SEP-1994 9:36 AM', 'DD-MON-YYYY HH:MI AM')),
'DD-MON-YYYY HH:MI AM')
==> 11-SEP-1994 12:00 AM
In the rest of the examples I use TO_DATE to pass a time component to the TRUNC function, and
TO_CHAR to display the new time:
Trunc back to the beginning of the current day (time is always midnight):
TO_CHAR (TRUNC (TO_DATE ('11-SEP-1994 10:00 AM',
'DD-MON-YYYY HH:MI AM'), 'DD'),
'DD-MON-YYYY HH:MI AM')
==> 11-SEP-1994 12:00 AM
TO_CHAR (TRUNC (TO_DATE ('11-SEP-1994 4:00 PM',
'DD-MON-YYYY HH:MI AM'), 'DD'),
'DD-MON-YYYY HH:MI AM')
==> 11-SEP-1994 12:00 AM
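The same truncation idea can be sketched in Python with a small helper for a few masks; this emulates the behavior for illustration and is not Oracle code:

```python
from datetime import datetime

def trunc_date(dt, fmt='DD'):
    # Approximation of TRUNC for a few common format masks.
    if fmt == 'DD':        # beginning of the day (midnight)
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if fmt == 'MM':        # beginning of the month
        return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if fmt == 'YYYY':      # beginning of the year
        return dt.replace(month=1, day=1, hour=0, minute=0,
                          second=0, microsecond=0)
    raise ValueError('unsupported format mask: ' + fmt)

d = datetime(1994, 9, 11, 9, 36)
print(trunc_date(d))          # 1994-09-11 00:00:00
print(trunc_date(d, 'MM'))    # 1994-09-01 00:00:00
```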
Example Queries
1. For example, to find the branch-name, loan number, customer name and amount for loans over $1200:
{ t | t ∈ borrow ∧ t[amount] > 1200 }
This gives us all attributes, but suppose we only want the customer names. (We would use project in the algebra.)
{ t | ∃ s ∈ borrow (t[cname] = s[cname] ∧ s[amount] > 1200) }
In English, we may read this equation as ``the set of all tuples t such that there exists a tuple s in the relation borrow for which the values of t and s for the cname attribute are equal, and the value of s for the amount attribute is greater than 1200.''
The notation ∃ s ∈ r (Q(s)) means ``there exists a tuple s in relation r such that predicate Q(s) is true''.
How did we get the above expression? We needed tuples on scheme cname such that there were tuples in borrow pertaining to that customer name with amount attribute greater than 1200.
The tuples t get the scheme cname implicitly as that is the only attribute t is mentioned with.
Find all customers having a loan from the SFU branch, and the cities in which they live:
{ t | ∃ s ∈ borrow (t[cname] = s[cname] ∧ s[bname] = ``SFU'' ∧ ∃ u ∈ customer (u[cname] = s[cname] ∧ t[ccity] = u[ccity])) }
In English, we might read this as ``the set of all (cname, ccity) tuples for which cname is a borrower at the SFU branch, and ccity is the city of cname''.
Tuple variable s ensures that the customer is a borrower at the SFU branch.
Tuple variable u is restricted to pertain to the same customer as s, and also ensures that ccity is the city of the customer.
The logical connectives ∧ (AND) and ∨ (OR) are allowed, as well as ¬ (negation).
1. 1. Find all customers having a loan, an account, or both at the SFU branch:
2. Find all customers who have both a loan and an account at the SFU branch.
3. Find customers who have an account, but not a loan at the SFU branch.
4. Find all customers who have an account at all branches located in Brooklyn. (We used division in relational
algebra.)
For this example we will use implication, denoted by a pointing finger in the text, but by ⇒ here.
Formal Definitions
1. A tuple relational calculus expression is of the form
{ t | P(t) }
where P is a formula.
Safety of Expressions
1. A tuple relational calculus expression may generate an infinite relation, e.g.
{ t | ¬(t ∈ borrow) }
2. There are an infinite number of tuples that are not in borrow! Most of these tuples contain values that do not appear in the database.
3. Safe Tuple Expressions
o The domain of a formula P, denoted dom(P), is the set of all values referenced in P.
o These include values mentioned in P itself, as well as values that appear in a tuple of a relation mentioned in P.
o So, the domain of P is the set of all values explicitly appearing in P or that appear in relations mentioned in P.
o dom(t ∈ borrow ∧ t[amount] > 1200) is the set of all values appearing in borrow, plus the value 1200.
o dom(¬(t ∈ borrow)) is the set of all values appearing in borrow.
We may say an expression { t | P(t) } is safe if all values that appear in the result are values from dom(P).
4. A safe expression yields a finite number of tuples as its result. Otherwise, it is called unsafe.
Deletion
Deletion is expressed in much the same way as a query. Instead of displaying, the selected tuples
are removed from the database. We can only delete whole tuples.
Some examples:
2. Delete all loans with loan numbers between 1300 and 1500.
Insertions
To insert data into a relation, we either specify a tuple, or write a query whose result is the set of
tuples to be inserted. Attribute values for inserted tuples must be members of the attribute's
domain.
An insertion is expressed by
r ← r ∪ E
where r is a relation and E is a relational algebra expression.
Some examples:
1. To insert a tuple for Smith who has $1200 in account 9372 at the SFU branch.
2. To provide all loan customers in the SFU branch with a $200 savings account.
Updating
Updating allows us to change some values in a tuple without necessarily changing all.
r ← ∏ A=E (r)
where A is an attribute of r, and the expression E is any arithmetic expression involving constants and attributes in relation r.
Some examples:
1. Make a 5 percent interest payment on all balances:
∏ balance=balance*1.05 (deposit)
2. Make a 5 percent interest payment, first on balances over $10,000, then on the remaining balances:
∏ balance=balance*1.05 (σ balance>10000 (deposit))
∏ balance=balance*1.05 (σ balance<=10000 (deposit))
Note: in this example the order of the two operations is important. (Why?)
Views
1. We have assumed up to now that the relations we are given are the actual relations stored in
the database.
2. For security and convenience reasons, we may wish to create a personalized collection of
relations for a user.
3. We use the term view to refer to any relation, not part of the conceptual model, that is made
visible to the user as a ``virtual relation''.
4. As relations may be modified by deletions, insertions and updates, it is generally not possible
to store views. (Why?) Views must then be recomputed for each query referring to them.
View Definition
3. Having defined a view, we can now use it to refer to the virtual relation it creates. View names
can appear anywhere a relation name can.
4. We can now find all customers of the SFU branch by writing
3. Since SQL allows a view name to appear anywhere a relation name may appear, the clerk can
write:
This insertion is represented by an insertion into the actual relation borrow, from which the view is constructed.
The symbol null represents a null or place-holder value. It says the value is unknown or does not exist.
4. Another problem with modification through views: consider the view
This view lists the cities in which the borrowers of each branch live.
Using nulls is the only possible way to do this (see Figure 3.22 in the textbook).
If we do this insertion with nulls, now consider the expression the view actually corresponds to:
As comparisons involving nulls are always false, this query misses the inserted tuple.
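The effect of null comparisons described above can be reproduced directly. A minimal sketch using SQLite from Python, with an invented borrow table (the column names are illustrative, not the textbook's exact schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrow (bname TEXT, loan_no INTEGER, cname TEXT, amount REAL)")
# A tuple inserted "through the view" carries null for the column the view did not supply.
conn.execute("INSERT INTO borrow VALUES ('SFU', 3, 'Ruth', NULL)")

# A predicate that compares the null column misses the inserted tuple,
# because NULL = NULL evaluates to unknown, not true.
rows = conn.execute("SELECT * FROM borrow WHERE amount = amount").fetchall()
print(len(rows))  # 0
```

This is exactly why the view's defining query fails to return the tuple inserted with a placeholder value.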
Module III
Transaction Concept
A transaction is a unit of program execution that accesses and possibly updates various data
items.
A transaction must see a consistent database.
During transaction execution the database may be inconsistent.
When the transaction is committed, the database must be consistent.
Two main issues to deal with:
Failures of various kinds, such as hardware failures and system crashes
Concurrent execution of multiple transactions
Data processing folks like to talk about the "ACID test" when deciding whether or not a
database management system is adequate for handling transactions. An adequate system has
the following properties:
Atomicity
Results of a transaction's execution are either all committed or all rolled back. All changes
take effect, or none do. That means, for Joe User's money transfer, that both his savings
and checking balances are adjusted or neither are.
Consistency
The database is transformed from one valid state to another valid state. This defines a
transaction as legal only if it obeys user-defined integrity constraints. Illegal transactions
aren't allowed and, if an integrity constraint can't be satisfied, the transaction is rolled
back. For example, suppose that you define a rule that, after a transfer of more than
$10,000 out of the country, a row is added to an audit table so that you can prepare a
legally required report for the IRS. Perhaps for performance reasons that audit table is
stored on a separate disk from the rest of the database. If the audit table's disk is off-line
and can't be written, the transaction is aborted.
Isolation
The results of a transaction are invisible to other transactions until the transaction is
complete. For example, if you are running an accounting report at the same time that Joe
is transferring money, the accounting report program will either see the balances before
Joe transferred the money or after, but never the intermediate state where checking has
been credited but savings not yet debited.
Durability
Once committed (completed), the results of a transaction are permanent and survive future
system and media failures. If the airline reservation system computer gives you seat 22A
and crashes a millisecond later, it won't have forgotten that you are sitting in 22A and also
give it to someone else. Furthermore, if a programmer spills coffee into a disk drive, it
will be possible to install a new disk and recover the transactions up to the coffee spill,
showing that you had seat 22A.
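The atomicity and consistency properties above can be exercised from application code. A minimal sketch of Joe User's transfer using SQLite from Python (the table name, accounts, and amounts are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('savings', 500), ('checking', 100)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        # Consistency check: no negative balances allowed.
        (neg,) = conn.execute("SELECT COUNT(*) FROM account WHERE balance < 0").fetchone()
        if neg:
            raise ValueError("insufficient funds")
        conn.commit()
    except Exception:
        conn.rollback()  # undo both updates together
        raise

transfer(conn, "savings", "checking", 200)       # succeeds and commits
try:
    transfer(conn, "savings", "checking", 1000)  # violates the constraint, rolled back
except ValueError:
    pass
print(dict(conn.execute("SELECT name, balance FROM account")))
```

After the failed transfer, both balances are exactly as the first (committed) transfer left them: neither update of the aborted transaction survives.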
Example of Fund Transfer
Transaction States
Concurrent Executions
Multiple transactions are allowed to run concurrently in the system. Advantages are:
increased processor and disk utilization, leading to better transaction throughput: one
transaction can be using the CPU while another is reading from or writing to the disk
reduced average response time for transactions: short transactions need not wait
behind long ones.
Concurrency control – provides mechanisms to achieve isolation, i.e., to control the
interaction among the concurrent transactions in order to prevent them from destroying
the consistency of the database
Serializability
Basic Assumption – Each transaction preserves database consistency.
Thus serial execution of a set of transactions preserves database consistency.
A schedule defines the sequence in which the operations of the transactions are executed.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.
Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
We ignore operations other than read and write instructions; our simplified schedules
consist of only read and write instructions.
Conflict Serializability
Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there
exists some item Q accessed by both li and lj, and at least one of these instructions wrote
Q.
1. li = read(Q), lj = read(Q). li and lj don't conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If
li and lj are consecutive in a schedule and they do not conflict, their results would remain
the same even if they had been interchanged in the schedule.
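This intuition leads directly to the standard test for conflict serializability: build a precedence graph with an edge Ti → Tj for each conflicting pair where Ti's instruction comes first, and check for cycles. A sketch, assuming a schedule is given as (transaction, action, item) triples with actions "r" and "w" (this encoding is invented):

```python
def conflicts(a, b):
    """Two instructions conflict if they access the same item and at least one writes it."""
    return a[2] == b[2] and "w" in (a[1], b[1])

def conflict_serializable(schedule):
    """Build the precedence graph and report True if it is acyclic (DFS cycle check)."""
    edges = set()
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            if a[0] != b[0] and conflicts(a, b):
                edges.add((a[0], b[0]))       # edge Ti -> Tj: Ti's instruction came first
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)

    visited, stack = set(), set()
    def cyclic(u):
        visited.add(u); stack.add(u)
        for v in graph.get(u, ()):
            if v in stack or (v not in visited and cyclic(v)):
                return True
        stack.discard(u)
        return False
    return not any(cyclic(t) for t in graph if t not in visited)

# T1 and T2 write Q in interleaved order: the graph has a T1->T2 and a T2->T1 edge.
bad = [("T1", "r", "Q"), ("T2", "w", "Q"), ("T1", "w", "Q")]
ok  = [("T1", "r", "Q"), ("T1", "w", "Q"), ("T2", "r", "Q")]
print(conflict_serializable(bad), conflict_serializable(ok))  # False True
```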
View Serializability
Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
Ti must, in schedule S´, also read the initial value of Q.
2. For each data item Q if Ti executes read(Q) in schedule S, and that value was produced
by transaction Tj (if any), then Ti must in schedule S´ also read the value of Q that was
produced by transaction Tj .
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation
in schedule S must perform the final write(Q) operation in schedule S´.
As can be seen, view equivalence is based purely on reads and writes.
Recoverability
Need to address the effect of transaction failures on concurrently
running transactions.
Recoverable schedule — if Tj reads a data item previously written by Ti, the commit
operation of Ti appears before the commit operation of Tj.
The following schedule is not recoverable if T9 commits immediately after the read.
If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent
database state. Hence the database must ensure that schedules are recoverable.
Cascading rollback – a single transaction failure leads to a series of transaction rollbacks.
Consider the following schedule where none of the transactions has yet committed (so the
schedule is recoverable)
If T10 fails, T11 and T12 must also be rolled back.
Can lead to the undoing of a significant amount of work.
Cascadeless schedules — where cascading rollbacks cannot occur; for each pair of
transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit
operation of Ti appears before the read operation of Tj.
Every cascadeless schedule is also recoverable
It is desirable to restrict the schedules to those that are cascadeless.
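The recoverability condition can be checked mechanically. A sketch, assuming a schedule is encoded as ("r"/"w"/"c", transaction, item) events, where "c" marks a commit (this encoding is invented); the first example is the T8/T9 schedule described above:

```python
def recoverable(schedule):
    """True if every committed reader commits after the transaction it read from."""
    last_writer = {}   # item -> transaction that last wrote it
    reads_from = set() # (writer, reader) pairs
    commit_pos = {}    # transaction -> position of its commit
    for pos, (op, txn, item) in enumerate(schedule):
        if op == "w":
            last_writer[item] = txn
        elif op == "r":
            w = last_writer.get(item)
            if w is not None and w != txn:
                reads_from.add((w, txn))
        elif op == "c":
            commit_pos[txn] = pos
    return all(
        writer in commit_pos and commit_pos[writer] < commit_pos[reader]
        for writer, reader in reads_from
        if reader in commit_pos
    )

# T9 reads T8's write, then commits before T8: not recoverable.
s1 = [("w", "T8", "Q"), ("r", "T9", "Q"), ("c", "T9", None), ("c", "T8", None)]
# Same reads and writes, but T8 commits first: recoverable.
s2 = [("w", "T8", "Q"), ("r", "T9", "Q"), ("c", "T8", None), ("c", "T9", None)]
print(recoverable(s1), recoverable(s2))  # False True
```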
Transaction Definition in SQL
Data manipulation language must include a construct for specifying the set of actions that
comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
Commit work commits current transaction and begins a new one.
Rollback work causes current transaction to abort.
Levels of consistency specified by SQL-92:
Serializable — default
Repeatable read
Read committed
Read uncommitted
Levels of Consistency in SQL-92
Serializable — default
Repeatable read — only committed records to be read, repeated reads of same record must
return same value. However, a transaction may not be serializable – it may find some records
inserted by a transaction but not find others.
Lock-Based Protocols
A lock is a mechanism to control concurrent access to a data item
Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested
using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction.
Lock requests are made to concurrency-control manager. Transaction can proceed only after
request is granted.
Lock-compatibility matrix
        S               X
S       compatible      not compatible
X       not compatible  not compatible
A transaction may be granted a lock on an item if the requested lock is compatible with locks
already held on the item by other transactions.
Any number of transactions can hold shared locks on an item, but if any transaction holds an
exclusive lock on the item, no other transaction may hold any lock on the item.
If a lock cannot be granted, the requesting transaction is made to wait till all incompatible
locks held by other transactions have been released. The lock is then granted.
The Two-Phase Locking Protocol
This is a protocol which ensures conflict-serializable schedules.
Phase 1: Growing Phase
transaction may obtain locks
transaction may not release locks
Phase 2: Shrinking Phase
transaction may release locks
transaction may not obtain locks
The protocol assures serializability. It can be proved that the transactions can be serialized
in the order of their lock points (i.e. the point where a transaction acquired its final lock).
Two-phase locking does not ensure freedom from deadlocks.
Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified
protocol called strict two-phase locking. Here a transaction must hold all its exclusive
locks till it commits/aborts.
Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In
this protocol transactions can be serialized in the order in which they commit.
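The growing/shrinking rule itself is easy to state in code. A minimal sketch that tracks only the 2PL phase of a single transaction, ignoring lock modes and the lock manager (the class and method names are invented):

```python
class TwoPhaseTransaction:
    """Enforces only the 2PL rule: once any lock is released, no new lock
    may be acquired. The first unlock is the start of the shrinking phase."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True  # entering the shrinking phase
        self.held.discard(item)

t = TwoPhaseTransaction("T1")
t.lock("A"); t.lock("B")   # growing phase
t.unlock("A")              # shrinking phase begins: this is the lock point passed
try:
    t.lock("C")            # illegal under 2PL
except RuntimeError as e:
    print(e)
```

Under strict 2PL the unlock of exclusive locks would additionally be deferred until commit/abort; rigorous 2PL defers all unlocks.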
Pitfalls of Lock-Based Protocols
The potential for deadlock exists in most locking protocols. Deadlocks are a necessary
evil.
Starvation is also possible if the concurrency-control manager is badly designed. For example:
A transaction may be waiting for an X-lock on an item, while a sequence of other
transactions request and are granted an S-lock on the same item.
The same transaction is repeatedly rolled back due to deadlocks.
The concurrency-control manager can be designed to prevent starvation.
Deadlock Handling
System is deadlocked if there is a set of transactions such that every transaction in the set
is waiting for another transaction in the set.
Deadlock prevention protocols ensure that the system will never enter a deadlock
state. Some prevention strategies:
Require that each transaction locks all its data items before it begins execution
(predeclaration).
Impose partial ordering of all data items and require that a transaction can lock
data items only in the order specified by the partial order (graph-based protocol).
More Deadlock Prevention Strategies
Following schemes use transaction timestamps for the sake of deadlock prevention alone.
wait-die scheme — non-preemptive
An older transaction may wait for a younger one to release a data item. Younger
transactions never wait for older ones; they are rolled back instead.
A transaction may die several times before acquiring a needed data item.
wound-wait scheme — preemptive
An older transaction wounds (forces the rollback of) a younger transaction instead of
waiting for it. Younger transactions may wait for older ones.
There may be fewer rollbacks than in wait-die.
Both in the wait-die and in the wound-wait scheme, a rolled-back transaction is restarted
with its original timestamp. Older transactions thus have precedence over newer
ones, and starvation is hence avoided.
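With the convention that a smaller timestamp means an older transaction, both schemes reduce to a single comparison. A sketch (the function names are invented):

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits; a younger one dies (is rolled back)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder; a younger one waits."""
    return "wound holder" if requester_ts < holder_ts else "wait"

# Timestamp 1 is older than timestamp 5.
print(wait_die(1, 5), wait_die(5, 1))      # wait die
print(wound_wait(1, 5), wound_wait(5, 1))  # wound holder wait
```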
Timeout-Based Schemes:
a transaction waits for a lock only for a specified amount of time. After
that, the wait times out and the transaction is rolled back.
thus deadlocks are not possible
simple to implement; but starvation is possible. It is also difficult to determine a
good value for the timeout interval.
Wait-for graph with a cycle
Deadlock Recovery
When deadlock is detected :
Some transaction will have to rolled back (made a victim) to break deadlock. Select that
transaction as victim that will incur minimum
cost.
Rollback -- determine how far to roll back transaction
Total rollback: Abort the transaction and then restart it.
More effective to roll back transaction only as far as necessary to
break deadlock.
Starvation happens if same transaction is always chosen as victim. Include
the number of rollbacks in the cost factor to avoid starvation
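Victim selection with a starvation guard can be sketched as follows; the cost model and the rollback weight of 10 are invented for illustration:

```python
def choose_victim(candidates):
    """candidates: {txn: (cost, rollbacks)}. Pick the cheapest transaction,
    inflating cost by past rollbacks so the same victim is not chosen forever."""
    return min(candidates, key=lambda t: candidates[t][0] + 10 * candidates[t][1])

# T3 is cheap to roll back, but has already been victimized twice,
# so its weighted cost exceeds T2's.
victims = {"T1": (5, 0), "T2": (3, 0), "T3": (4, 2)}
print(choose_victim(victims))  # T2
```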
Recovery mechanism
Failure Classification
Transaction failure :
Logical errors: transaction cannot complete due to some internal error
condition
System errors: the database system must terminate an active transaction
due to an error condition (e.g., deadlock)
System crash: a power failure or other hardware or software failure causes the
system to crash.
Fail-stop assumption: non-volatile storage contents are assumed not to be
corrupted by system crash
Database systems have numerous integrity checks to prevent
corruption of disk data
Disk failure: a head crash or similar disk failure destroys all or part of disk storage
Destruction is assumed to be detectable: disk drives use checksums to
detect failures
TRIGGERS
In a DBMS, a trigger is a SQL procedure that initiates (fires) an action when an event (INSERT,
DELETE, or UPDATE) occurs. Since triggers are event-driven specialized procedures, the DBMS
stores and manages them. A trigger cannot be called or executed; the DBMS automatically fires the
trigger as a result of a data modification to the associated table. Triggers maintain the referential
integrity of data by changing the data in a systematic fashion.
* Constraint: The integrity constraint or business rule enforced by the trigger -- in other words,
the "purpose" of the trigger. Practically speaking, this should appear in the header of the
trigger and be reflected in its name. For example, the "Positive Opening Balance"
constraint requires that all new accounts must have non-negative balances.
* Event: A specific situation occurring in the database, which indicates when the constraint should be
enforced. The event is said to be "raised" when the situation occurs. The event is often specified in
two ways: as a state change on the database (for example, an insert into the Accounts table) and as an
optional predicate condition used to filter out some of the state changes (for example, only those
inserts into the Accounts table with a negative value for the Balance column). Specific moments are
also very interesting events. Note that "event" in this generic definition should not be confused with
the database event alerters supported by some DBMSs such as CA-OpenIngres.
* Action: A procedure or a sequence of procedural operations that implement the processing logic
required to enforce the constraint. For example, the action must enforce the business rule that
accounts may not have negative opening balances. This can be done by refusing the insert operation if
the opening balance is negative, or by replacing the negative balance with zero and inserting an entry
in a journaling table. The implied "if" condition highlights another point: conventional database
manipulation operations (that is, SQL select, insert, update, and delete statements) are usually too
limited to implement the required actions. They must be extended with procedural constructs such as
iteration (while, for, and repeat) and conditional (if and case) statements. A trigger is therefore an
event that "fires" an action to enforce a constraint.
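The "Positive Opening Balance" constraint can be sketched as an actual trigger. This example uses SQLite (driven from Python) rather than a commercial DBMS, so the syntax differs from the Oracle form discussed later; the table layout is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (AccountNo INTEGER PRIMARY KEY, Balance REAL)")

# Event: insert on Accounts; condition: negative balance; action: refuse the insert.
conn.execute("""
CREATE TRIGGER PositiveOpeningBalance
BEFORE INSERT ON Accounts
WHEN NEW.Balance < 0
BEGIN
    SELECT RAISE(ABORT, 'opening balance must be non-negative');
END
""")

conn.execute("INSERT INTO Accounts VALUES (1, 100.0)")      # allowed
try:
    conn.execute("INSERT INTO Accounts VALUES (2, -50.0)")  # trigger fires, insert refused
except sqlite3.IntegrityError as e:
    print(e)
```

The alternative action mentioned above (replace the negative balance with zero and journal the event) would be a BEFORE trigger that rewrites NEW.Balance instead of raising an error.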
Advantages of Triggers
The most attractive feature of triggers is that they are stored and executed in the database. This has
the following advantages:
* The triggers always fire when the associated events occur. Application developers do not have to
remember to include the functionality in each application, and users cannot bypass the triggers
through interactive tools. (But every skilled DBA knows how to bypass the triggers on his or her
system.) Most DBMSs have some mechanism to bypass the trigger, either by temporarily
deactivating the triggers or by using a "trace point" or some similar mechanism.
* Triggers are administered centrally. They are coded once, tested once, and then centrally enforced
for all the applications accessing the database. The triggers are usually controlled, or at least audited,
by a skilled DBA. The result is that the triggers are implemented efficiently.
* Triggers execute in the database server, close to the data and to any
subsequent operations performed on the database. The data and operations are not "dragged" across
the network between the client and the server.
Because triggers are so powerful, they must be managed well and they must be used correctly.
Inefficient triggers can bring the database server to its knees, due to the amount of work fired off in
the database. Incorrect triggers can corrupt the integrity of the data.
Triggers are extremely powerful constructs and can be used for various purposes, for example:
* Integrity control: You can use triggers to implement domain integrity, column integrity, referential
integrity, and unconventional integrity constraints. I will address the declarative vs. the do-it-yourself
(that is, using triggers) approaches to integrity control in my next column.
* Business rules: You can use triggers to centrally enforce business rules. Business rules are
constraints that are enforced over the relationships between tables or between different rows in the
same table. For example, the sum of the amounts of the InvoiceItems rows must add up to the total on
the row in the Invoices table for the corresponding invoice -- that is, if your organization's standards
and/or DBAs let you have denormalized derived attributes in your physical data model.
* Application logic: You can use triggers to enforce business logic centrally, for example, to insert
rows automatically in the Orders and OrderItems tables when the QuantityOnHand value in the
Stocks table drops below a given threshold. Business rules could be formalized and could actually be
defined declaratively, if only the declarative syntax allowed it; but application logic requires more
complex functionality than can be specified declaratively.
* Security: You can use triggers to check value-based security constraints. When an operation is
performed on a sensitive table, the trigger fires to check that the operation is allowed for the user. For
example, you may only insert a row in a table if the department column contains the value of your
own department. In most systems, however, you cannot use triggers to restrict the data that is visible
to users. The only exception I came across is the Progress Database Server (release 7.3 and later), in
which you can define triggers to fire on select operations. This makes it possible to inspect or filter
the data a user is about to see. For example, you can restrict a user to only retrieve the details of
orders placed for his department.
* Audit trails: Triggers can insert records into an audit trail table to log all the operations on
sensitive tables. The problem with this approach is that most trigger actions are under transactional
control. When an operation is rolled back, all its triggered operations are also rolled back. The
triggers will therefore only record the effects of successful operations. When an unsuccessful
operation is rolled back, the audit trail entry of that operation will also be rolled back. The audit trail
will therefore not contain attempted violations of data integrity constraints or security
restrictions.
* Replication: Many DBMS vendors and consultants (including my corporation) have implemented
replicators using triggers as the recording mechanism. In essence, when the replicated tables change,
the triggers fire and record the changes in buffer tables. A replication server then propagates the
operations from the buffer tables to the various target databases. (It is, however, not quite as simple as
it sounds.) In this situation, the transactional control on the triggers is extremely useful, as you only
want to replicate successfully completed transactions.
* Updatable views: In Borland's InterBase, you can define triggers on views and tables. You can
then use the view triggers to propagate the actions performed on the view to the underlying base
table(s). You can use this extremely powerful feature to update theoretically non-updatable views.
For example, the triggers on a view that does not contain the keys of the underlying tables can query
the key values of the underlying tables and can then perform the necessary operations.
The general form of an Oracle trigger definition is:
CREATE [OR REPLACE] TRIGGER <trigger_name>
BEFORE | AFTER <triggering_event(s)> ON <table_name>
[FOR EACH ROW]
<trigger_body>
Some important points to note:
You can create only BEFORE and AFTER triggers for tables. (INSTEAD OF triggers are only
available for views; typically they are used to implement view updates.)
You may specify up to three triggering events using the keyword OR. Furthermore, UPDATE
can be optionally followed by the keyword OF and a list of attribute(s) in <table_name>. If
present, the OF clause defines the event to be only an update of the attribute(s) listed after OF.
Here are some examples:
... INSERT ON R ...
... INSERT OR DELETE OR UPDATE ON R ...
... UPDATE OF A, B OR INSERT ON R ...
If FOR EACH ROW option is specified, the trigger is row-level; otherwise, the trigger is
statement-level.
<trigger_body> is a PL/SQL block, rather than a sequence of SQL statements. Oracle has
placed certain restrictions on what you can do in <trigger_body>, in order to avoid situations
where one trigger performs an action that triggers a second trigger, which then triggers a third,
and so on, which could potentially create an infinite loop. The restrictions on
<trigger_body> include:
o You cannot modify the same relation whose modification is the event triggering the
trigger.
o You cannot modify a relation connected to the triggering relation by another constraint
such as a foreign-key constraint.
Viewing Defined Triggers
Dropping Triggers
To drop a trigger:
drop trigger <trigger_name>;
Disabling Triggers
Module IV
Database normalization
Database normalization (also spelt database normalisation) relates to the level of redundancy in a database's structure. Well-
normalized databases have a schema that reflects the true dependencies between tracked quantities.
This means that updates can be quickly performed with little risk of data becoming inconsistent.
In the relational model, formal methods exist for quantifying "how normalized" a database is. These
classifications are called normal forms (or NFs), and there are algorithms for converting a given
database between them.
Any increase in normalization generally involves splitting existing tables into multiple ones, which
must be re-joined each time a query is issued. This can sometimes lead to poor performance, so
intentional denormalization is used for applications like on-line analytical processing.
Normal Forms
One can only ascribe a normal form to a database if relationships between quantities have been
rigorously defined. It is possible to use set theory to express this knowledge once a problem domain
has been fully understood. Such formality is usually sidestepped by database designers, who instead
model the relationships in terms of an "idealized schema". (The mathematical underpinnings come
back into play in proofs regarding the process of transforming from one form to another.)
Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF. There are now others
that are generally accepted, but 3NF is sufficient for many practical applications, as most relations
when reaching 3NF are also in BCNF, 4NF and 5NF, and 6NF only applies to temporal databases.
Normalizing beyond that point can be tricky in SQL technology; nevertheless, a non-fully normalised
database may be vulnerable to update anomalies — in other words, data corruption. Thus, full
normalisation, even when not fully implemented in the target technology, is a good exercise to help
discover all potential internal database consistency problems.
First normal form (or 1NF) requires that all column values in a table are atomic (e.g., a number is an
atomic value, while a list or a set is not). This basic format eliminates repeating groups of attributes
by putting each group into a separate table, then connecting them with a primary key-foreign key
relationship.
Consider a relation for capturing an order as follows:
ORDER_NUMBER (PRIMARY_KEY)
CUSTOMER_NAME
CUSTOMER_ADDRESS
ITEM_1_NAME
ITEM_1_PRICE
ITEM_1_QUANTITY
ITEM_2_NAME
ITEM_2_PRICE
ITEM_2_QUANTITY
…
ITEM_N_NAME
ITEM_N_PRICE
ITEM_N_QUANTITY
The attributes for holding information about each Item on the Order are repeated for the number of
different Items ordered. These attributes should instead be placed in a separate relation called
ORDER_ITEM containing the following attributes:
ITEM_NAME (PRIMARY KEY)
ORDER_NUMBER (FOREIGN KEY)
ITEM_PRICE
ITEM_QUANTITY
An ORDER relation can then reference many ORDER_ITEMs.
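The decomposition can be sketched as SQL executed from Python via SQLite; the ORDER relation is named ORDERS here because ORDER is an SQL reserved word, and the column types are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ORDERS (
    ORDER_NUMBER INTEGER PRIMARY KEY,
    CUSTOMER_NAME TEXT,
    CUSTOMER_ADDRESS TEXT
);
CREATE TABLE ORDER_ITEM (
    ITEM_NAME TEXT,
    ORDER_NUMBER INTEGER REFERENCES ORDERS(ORDER_NUMBER),
    ITEM_PRICE REAL,
    ITEM_QUANTITY INTEGER,
    PRIMARY KEY (ITEM_NAME, ORDER_NUMBER)
);
""")

conn.execute("INSERT INTO ORDERS VALUES (1, 'Smith', '12 High St')")
conn.executemany("INSERT INTO ORDER_ITEM VALUES (?, 1, ?, ?)",
                 [("bolt", 0.10, 100), ("nut", 0.05, 200)])

# Any number of items per order: adding an item is a row insert, not a new column.
n = conn.execute("SELECT COUNT(*) FROM ORDER_ITEM WHERE ORDER_NUMBER = 1").fetchone()[0]
print(n)  # 2
```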
Third normal form (or 3NF) requires that there are no non-trivial functional dependencies of non-key
attributes on something other than a superset of a candidate key. A relation is in 3NF if none of the
non-Primary Key attributes are a fact about any other non-Primary Key attribute. Another way of
saying this is that all non-key attributes are mutually independent (i.e. there should not be transitive
dependencies).
Consider a relation that defines a Part as having the following attributes.
PART_NUMBER (PK)
MANUFACTURER_NAME
MANUFACTURER_ADDRESS
In this case, the manufacturer address does not belong on this relation, because it is a fact about the
Manufacturer of the Part, rather than the Part itself. Manufacturer should therefore be made into a
separate relation with the attributes
MANUFACTURER_NAME (PRIMARY_KEY)
MANUFACTURER_ADDRESS
and the Part relation should be redefined as
PART_NUMBER (PRIMARY_KEY)
MANUFACTURER_NAME
The problem with a table not being in 3NF is that for every MANUFACTURER_NAME we have to
maintain a MANUFACTURER_ADDRESS, which is redundant.
Boyce-Codd normal form (or BCNF) requires that there are no non-trivial functional dependencies of
attributes on something else than a superset of a candidate key. At this stage, all attributes are
dependent on a key, a whole key and nothing but a key (excluding trivial dependencies, like A->A).
A relation is said to be in the BCNF if and only if it is in the 3NF and every non-trivial, left-irreducible
functional dependency has a candidate key as its determinant. In more informal terms, a
relation is in BCNF if it is in 3NF and the only determinants are the candidate keys.
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on something else than a superset of a candidate key. A relation is said to be in 4NF if
and only if it is in the BCNF and multi-valued dependencies are functional dependencies. The 4NF
removes unwanted data structures: multi-valued dependencies.
Consider a case where an Employee relation has multiple job categories and multiple locations where
they work. It might be tempting to create a relation as follows to capture this information:
EMPLOYEE_ID
JOB_CATEGORY_ID
LOCATION_ID
The problem with this relation is that we might have to enter Employee's Job Category more than
once if they fulfill the same Job Category at more than one location. Therefore this relation is not in
4NF.
There are actually 3 distinct many-to-many relationships here, one between Employee and Job
Category, one between Employee and Location, and one between Job Category and Location. So 3
relations should be created to capture these.
EMPLOYEE_JOB_CATEGORY relation:
EMPLOYEE_ID
JOB_CATEGORY_ID
EMPLOYEE_LOCATION relation:
EMPLOYEE_ID
LOCATION_ID
JOB_CATEGORY_LOCATION relation:
JOB_CATEGORY_ID
LOCATION_ID
Ronald Fagin demonstrated that it is always possible to achieve 4NF (but not always desirable).
Rissanen's theorem is also applicable to multi-valued dependencies.
Fifth normal form (or 5NF or PJ/NF) requires that there are no non-trivial join dependencies that do
not follow from the key constraints. A relation R is said to be in the 5NF if and only if it is in 4NF
and every join dependency in R is implied by the candidate keys.
Domain/key normal form (or DKNF) requires that each key uniquely identifies each row in a table. A
domain is the set of permissible values for an attribute. By enforcing key and domain restrictions, the
database is assured of being freed from modification anomalies.
While sometimes called the 6NF, the DKNF should not be considered together with the seven other
normal forms (1–6 and Boyce-Codd), because contrary to them it is not always achievable;
furthermore, relations in the real 6NF are not always in the DKNF.
Sixth normal form
The most recent one, the sixth normal form (actually seventh, by counting the Boyce-Codd one) was
only defined when extending the relational model to take into account the temporal dimension;
unfortunately, current SQL technology doesn't take into account this work, and the temporal
extensions to SQL are not relational.
NORMALISATION
A repeating group is a set of logically related fields or values that occur multiple times in one
record.
A Practical Approach
The sample tables below do not comply with first normal form. Look for fields that contain too
much data and repeating group of fields.
EMPLOYEES_PROJECTS_TIME
The example above is also related to another design issue, namely, that each field should hold
the smallest meaningful value and that there should not be multiple values in a single field.
There would be no way to sort by last names nor to know which allocation of time belonged to
which project.
EMPLOYEES_PROJECTS_TIME
A table with repeating groups of fields.
If an employee was assigned to a fourth project, you would have to add two new fields to the
table. Also, it would be very difficult to total the amount of time devoted to a particular project.
The design problems addressed are very common, particularly among new designers who are
accustomed to tracking data in a spreadsheet. Often, when building a spreadsheet, we arrange
the data horizontally, laying it out across the spreadsheet. When designing tables, we have to
think more vertically. Similar data belongs in the same column or field with a single value in
each row.
Now we will take the table you saw above and redesign it so it will comply with first normal
form.
EMPLOYEES_PROJECTS_TIME
Look at the repeating groups of data. Identify tables and fields that will hold this data
without the repeating groups. Think vertically and remember that similar data belongs in the
same field.
Enter the sample data from the table to make sure you don't have repeating groups. If
necessary, include foreign key field(s) to connect the tables.
EMPLOYEES
PROJECTS_EMPLOYEES_TIME
Mark the primary key field(s) and foreign keys in each table. Shown below with *
indicating the Primary Key.
EMPLOYEES
PROJECTS_EMPLOYEES_TIME
If an employee was assigned to an additional project, it would involve merely adding a new
record. Also, it would be much easier to search for a particular project number as they are all
held in a single column.
Before we go any further, there's a new concept you need to be aware of and that's functional
dependency. A functional dependency is a relationship between fields so that the value in Field A
determines the value in Field B, and there can be only one value in Field B. In that case, Field B
is functionally dependent on Field A. Consider the following sample table:
Each airport name is unique and each airport can be in only one city. Therefore, City is
functionally dependent on Airport. The value in the Airport field determines what the value will
be in the City field (making Airport the determinant field) and there can be only one value in the
City field. This does not need to work in reverse. As shown in the table, a city can have more
than one airport, so Airport is not functionally dependent on City; the value in City does not
necessarily determine what the value in Airport will be.
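The rule can be stated as a small check over sample rows: B is functionally dependent on A if no value of A ever maps to two different values of B. This is an illustrative sketch with made-up airport data, not a substitute for reasoning about the real-world rules behind the data.

```python
# B is functionally dependent on A if each A value maps to exactly one B value.
def functionally_dependent(rows, a, b):
    seen = {}
    for row in rows:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False  # found one A value paired with two different B values
        seen[row[a]] = row[b]
    return True

# Sample data (hypothetical): each airport is in one city, but a city
# (London) can have more than one airport.
airports = [
    {"Airport": "Heathrow", "City": "London"},
    {"Airport": "Gatwick",  "City": "London"},
    {"Airport": "Schiphol", "City": "Amsterdam"},
]

print(functionally_dependent(airports, "Airport", "City"))  # True
print(functionally_dependent(airports, "City", "Airport"))  # False
```

A check like this can only refute a dependency; confirming one requires knowing the rules of the domain, not just the rows you happen to have.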
o α → β is a trivial functional dependency (i.e. β ⊆ α).
o α is a superkey for schema R.
2. A database design is in BCNF if each member of the set of relation schemas is in BCNF.
3. Let's assess our example banking design:
   Customer-schema = (cname, street, ccity)
      cname → street ccity
   Branch-schema = (bname, assets, bcity)
      bname → assets bcity
   Loan-info-schema = (bname, cname, loan#, amount)
      loan# → amount bname
To see whether these schemas are in BCNF, we need to know what functional dependencies
apply to them.
We also no longer have the repetition of information problem. Branch name and loan amount
information are not repeated for each customer in this design.
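The BCNF test on these schemas can be mechanized with the standard attribute-closure algorithm: a non-trivial FD X → Y violates BCNF unless the closure of X covers the whole schema (i.e. X is a superkey). The sketch below encodes the banking FDs from the text; the function names are made up for illustration.

```python
# Closure of a set of attributes under a list of FDs (lhs_set, rhs_set).
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs  # lhs is satisfied, so pull in rhs
                changed = True
    return result

# A schema is in BCNF if every non-trivial FD has a superkey on the left.
def in_bcnf(schema, fds):
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, allowed
        if closure(lhs, fds) != schema:
            return False  # lhs is not a superkey: BCNF violation
    return True

customer = {"cname", "street", "ccity"}
customer_fds = [({"cname"}, {"street", "ccity"})]

loan_info = {"bname", "cname", "loan#", "amount"}
loan_fds = [({"loan#"}, {"amount", "bname"})]

print(in_bcnf(customer, customer_fds))  # True: cname is a key
print(in_bcnf(loan_info, loan_fds))     # False: loan# does not determine cname
```

The failing case is exactly why Loan-info-schema gets decomposed: loan# → amount bname holds, but loan# is not a superkey of the four-attribute schema.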
MODULE 5
Distributed Database
A logically interrelated collection of shared data (and a description of this data), physically
distributed over a computer network.
Distributed DBMS
Software system that permits the management of the distributed database and makes the
distribution transparent to users.
Concepts
Distributed Processing
Advantages of DDBMSs
1) Reflects organisational structure
2) Improved shareability and local autonomy
3) Improved availability
4) Improved reliability
5) Improved performance
6) Economics
7) Modular growth
Disadvantages of DDBMSs
1) Complexity
2) Cost
3) Security
4) Integrity control more difficult
5) Lack of standards
6) Lack of experience
Types of DDBMS
1)Homogeneous DDBMS
2)Heterogeneous DDBMS
Homogeneous DDBMS
All sites use the same DBMS product, which makes the system much easier to design and manage.
Heterogeneous DDBMS
1) Sites may run different DBMS products, with possibly different underlying data models.
2) Occurs when sites have implemented their own databases and integration is considered later.
– Different hardware.
– Different DBMS products.
Functions of a DDBMS
– Extended Data Dictionary.
– Fragmentation.
– Allocation
– Replication
Fragmentation
Relation may be divided into a number of sub-relations, which are then distributed.
Types of Fragmentation:
o Horizontal – break up a relation into subsets of the tuples in that relation, based on a
restriction on one or more attributes. E.g. we could break up a table with student info
into one subset for undergrads and one subset for postgrads.
o Vertical – break up a relation into subsets of its attributes (columns), with each fragment
carrying the primary key so the original tuples can be rebuilt. E.g. we could split the
student table into a grades fragment and a course-details fragment.
o Mixed – fragments the data multiple times, in different ways. We could do our
postgrad/undergrad split and then apply our grades/course split to each of the fragments.
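The fragmentation types above can be sketched in a few lines. The student rows and the `level` attribute are assumptions made for illustration; the key property shown is that the fragments lose no information, since the original relation can be reconstructed from them.

```python
# Hypothetical student relation.
students = [
    {"id": 1, "name": "Ann",  "level": "UG"},
    {"id": 2, "name": "Ben",  "level": "PG"},
    {"id": 3, "name": "Cara", "level": "UG"},
]

# Horizontal fragmentation: subsets of tuples, via a restriction on `level`.
undergrads = [r for r in students if r["level"] == "UG"]
postgrads  = [r for r in students if r["level"] == "PG"]

# Reconstruction: the union of the horizontal fragments is the original relation.
assert sorted(undergrads + postgrads, key=lambda r: r["id"]) == students

# Vertical fragmentation: subsets of attributes, each fragment keeping the
# primary key (`id`) so the fragments can be rejoined.
ids_names  = [{"id": r["id"], "name": r["name"]}   for r in students]
ids_levels = [{"id": r["id"], "level": r["level"]} for r in students]

print(len(undergrads), len(postgrads))  # 2 1
```

A mixed fragmentation would simply apply the vertical split to each of the two horizontal fragments in turn.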
Allocation
Each fragment is stored at the site where it gives the best performance; deciding this distribution is the allocation problem.
Replication
A copy of a fragment may be maintained at several sites, for reliability and performance.
Transparencies in a DDBMS
Transparency
• Data independence
– logical data independence: user applications are not affected by changes in the logical structure of the DB (schema definition)
– physical data independence: hiding the details of the storage structure
– the only form that is important in centralized DBMSs as well
• Network (distribution) transparency
– location transparency: an operation on data is independent of both the location and the system where it is executed
– naming transparency: a unique name is provided for each object in the DB
• Replication transparency
– Data is replicated for reliability and performance considerations.
– The user should not be aware of the existence of copies.
• Fragmentation transparency
– DB relations are divided into smaller fragments for performance and availability reasons
– The global queries should be translated to fragment queries
– A question of query processing.
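The translation of a global query into fragment queries can be sketched as follows. The site names and rows are hypothetical; the point is that the user states one selection against the global relation, while the system runs it at each site holding a fragment and unions the partial results.

```python
# Hypothetical horizontal fragments of a STUDENT relation, placed at two sites.
fragments = {
    "site_a": [{"id": 1, "name": "Ann", "level": "UG"},
               {"id": 3, "name": "Cara", "level": "UG"}],
    "site_b": [{"id": 2, "name": "Ben", "level": "PG"}],
}

def global_select(predicate):
    """Run one 'global' selection: apply it at every site, union the results.

    The caller never names a fragment or a site -- that is the
    fragmentation (and location) transparency the text describes.
    """
    results = []
    for site, rows in fragments.items():
        results.extend(r for r in rows if predicate(r))
    return results

names = [r["name"] for r in global_select(lambda r: r["level"] == "PG")]
print(names)  # ['Ben']
```

A real distributed query processor would also push the predicate to each site and move only the qualifying tuples over the network, rather than shipping whole fragments.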