Nothing Special   »   [go: up one dir, main page]

Introduction To RDBMS Day - 1

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 54

Introduction to RDBMS

Day - 1

What is Data?
Data (plural of the word Datum) - Data is a factual
information used as a basis for reasoning, discussion, or
calculation
Data may be numerical data which may be integers or
floating point numbers, and non-numerical data such as
characters, date and etc. Data by itself normally doesnt
have a meaning associated with it.
e.g:Krishnan
01-Jan-71
15-Jun-05
50000

Information
Related data is often called as information. Information will
always have a meaning and context attached to the data
element. Let us consider the same example that we gave
for data. When we add meaning and context to the data it
becomes information:

Employee Name: Krishnan


Date of Birth: 01-Jan-71
Data of Joining: 15-Jun-05
Salary: 50000
Department Number: 10

Database
A logically coherent collection of related data
(information) with inherent meaning, built for a certain
application, and representing a subset of the "real-world".
For e.g. a customer database in your bank, details of the
insurance policies that we hold etc.

Evolution of Databases

File Systems (1950s)


Basic Constructs
Sequential records
A record contains sequential fields
Relies on indexes for random access
ISAM (Index Sequential Access Method)
VSAM (Virtual Storage Access Method)
Basic operation
Open, close, or reset a file
Read, write or delete a record

Disadvantages of File processing


system
Data redundancy and inconsistency
Difficulty in accessing data
Data isolation
Integrity problems
Atomicity problem
Concurrent access anomalies
Security problems

Hierarchical Model (1960s)


Data are represented by collection of records and the
relationship among data are represented by links, which
can be viewed as pointers.
A record is a collection of fields (attributes), each of which
contains only one data value.
A link is an association between precisely two records.
Records are organized as collection of rooted trees.

Hierarchical Model
Again consider the database representing a customeraccount relationship in a banking system. There are two
record types, customer and account.
type customer = record
customer name: string;
customer street: string;
customer city: string;

end

The account record type can be defined as


type account = record
account number: string;
balance: integer;

end

Hierarchical Model

Network Model (1960s)


Data are represented by collection of records and the
relationship among data are represented by links, which
can be viewed as pointers.
Record is a collection of fields (attributes), each of which
contains only one data
Link is an association between precisely two records.
Records are organized as arbitrary graphs.

Network Model
consider a database representing a customer-account
relationship in a banking system. There are two record
types, customer and account.
type customer = record
customer name: string;
customer street: string;
customer city: string;

end

The account record type can be defined as


type account = record
account number: string;
balance: integer;

end

Network Model

Relational Model (1980s)


Data is presented as a collection of relations
Each relation is depicted as a table
Columns are attributes
Rows ("Tuples") represent entities
Every table has one or more set of attributes that taken
together as a "key" uniquely identifies each entity
Customer

Account

emp_no

name

street

city

emp_no

ac_no

balance

E1

Hayes

Main

harrison

E1

A-102

400

E2

Jhonson

Alma

Palo alto

E2

A-101

500

E3

Turner

Putnam

stamford

E2

A-201

900

E3

A-304

300

Relational Model
Relationship is established with the help of keys
emp_no

name

street

city

E1

Hayes

Main

harrison

E2

Jhonson

Alma

Palo alto

E3

Turner

Putnam

stamford

emp_no

ac_no

balance

E1

A-102

400

E2

A-101

500

E2

A-201

900

E3

A-304

300

Database Management System


(DBMS)
A database management system (DBMS) is software that
allows databases to be defined, constructed, and
manipulated.
A collection of programs that enables you to store, modify,
and extract information from a database.
The general purpose of a DBMS is to provide for the
definition, storage, and management of data in a
centralized area that can be shared by many users.

Purpose of a DBMS
DBMS developed to handle the following difficulties:

Data redundancy and inconsistency


Difficulty in accessing data
Data isolation - multiple files and formats
Integrity problems
Atomicity of updates
Concurrent access by multiple users
Security problems

Functionality of a DBMS
Specifying the database structure
data definition language

Manipulation of the database


query processing and query optimization

Integrity enforcement
integrity constraints

Concurrent control
multiple user environment

Crash recovery
Security and authorization

DBMS Approach

Data independence
capability of changing a database scheme without having
to change the scheme at the next higher level
two level of data independence

Data Abstraction
Physical Level describes how data are actually stored
Logical Level describes what data are stored
View Level
describes only a part of the database
useful for a particular user

View 1

View 2

View n

Logical level

Physical level

Database Languages
data definition language (DDL) :
specify the conceptual database scheme
data manipulation language (DML): query language
used to retrieve and update information in a database
host language:
a conventional high level language used to write
application programs

DBMS Structure

Relations

Relations

Relations

Codds Rule for RDBMS


Rule 1: The Information Rule
All data should be presented to the user in table form

Rule 2: Guaranteed Access Rule


All data should be accessible without ambiguity.
This can be accomplished through a combination
of the table name, primary key, and column name.

Rule 3: Systematic Treatment of Null Values


A field should be allowed to remain empty. This
involves the support of a null value, which is distinct
from an empty string or a number with a value of
zero. Of course, this can't apply to primary keys. In
addition, most database implementations support the
concept of a not null field constraint that prevents null
values in a specific table column.
Rule 4: Dynamic On-Line Catalog Based on the Relational Model
A relational database must provide access to its structure
through the same tools that are used to access the data.
This is usually accomplished by storing the structure
definition within special system tables.

Rule 5: Comprehensive Data Sub-language Rule


The database must support at least one clearly defined
language that includes functionality for data definition, data
manipulation, data integrity, and database transaction
control. All commercial relational databases use forms of
the standard SQL (Structured Query Language) as their
supported comprehensive language.
Rule 6: View Updating Rule
Data can be presented to the user in different logical
combinations, called views. Each view should support the
same full range of data manipulation that direct-access to a
table has available. In practice, providing update and delete
access to logical views is difficult and is not fully supported
by any current database.

Rule 7: High-level Insert, Update, and Delete


Data can be retrieved from a relational database in sets
constructed of data from multiple rows and/or multiple
tables. This rule states that insert, update, and delete
operations should be supported for any retrievable set
rather than just for a single row in a single table.
Rule 8: Physical Data Independence
The user is isolated from the physical method of storing and
retrieving information from the database. Changes can be
made to the underlying architecture ( hardware, disk
storage methods ) without affecting how the user accesses
it.

Rule 9: Logical Data Independence


How a user views data should not change when the logical
structure (tables structure) of the database changes. This
rule is particularly difficult to satisfy. Most databases rely on
strong ties between the user view of the data and the actual
structure of the underlying tables.
Rule 10: Integrity Independence
The database language (like SQL) should support
constraints on user input that maintain database integrity.
This rule is not fully implemented by most major vendors.
At a minimum, all databases do preserve two constraints
through SQL.
-- No component of a primary key can have a null value
-- If a foreign key is defined in one table, any value in it must exist as
a primary key in another table

Rule 11: Distribution Independence


A user should be totally unaware of whether or not the
database is distributed (whether parts of the database exist
in multiple locations). A variety of reasons make this rule
difficult to implement; I will spend time addressing these
reasons when we discuss distributed databases.
Rule 12: Non-subversion Rule
There should be no way to modify the database structure
other than through the multiple row database language (like
SQL). Most databases today support administrative tools
that allow some direct manipulation of the data Structure.

Functional dependency

Given a relation R, a set of attributes X in R is said to


functionally determine another attribute Y, also in R, (written
X Y) if and only if each X value is associated with at most
one Y value. Customarily we call X the determinant set and Y
the dependent attribute.

Properties of functional dependencies

Given that X, Y, and Z are sets of attributes in a relation R,


one can derive several properties of functional dependencies.
Among the most important are Armstrong's axioms
Reflexivity: If Y is a subset of X, then X Y
Augmentation: If X Y, then XZ YZ
Transitivity: If X Y and Y Z, then X Z

Closure of a set of Functional


Dependencies
Given a set F of functional dependencies, there are certain
other functional dependencies that are logically implied by
F.
The set of all functional dependencies logically implied by
F is the closure of F denoted by F+.
F+ can be found by applying Armstrongs Axioms:

Closure of a set of Functional


Dependencies

We can further simplify the computation of F+ by using


additional
rules:

Closure of a set of Functional


Dependencies(example)
R=(A, B, C, G, H, I)
F= { A ---- > B
A ---- > C
CG ---- > H
CG ---- > I
B ---- > H }
Some members of F+:
A ---- > H (transitivity A ---- > B, B ---- > H)
AG ---- > I (pseudotransitivity A ---- > C, CG ---- > I)
CG ---- > HI (union CG ---- > H, CG ---- > I)

NORMALIZATION

Normalization
A company obtains parts from a number of suppliers. Each supplier is
located in one city. A city can have more than one supplier located there
and each city has a status code associated with it. Each supplier may
provide many parts. The company creates a simple relational table to
store this information that can be expressed in relational notation as:
FIRST (s#, status, city, p#, qty)

S#

Status

City

P#

qty

S1

20

London

p1,p2,p3

100,200,50

S2

10

Paris

p1,p2

200,100

S3

10

Paris

p1

200

S4

20

London

p4,p5

100,50

First Normal Form (1 NF)


A relational table, by definition, is in first normal form if all values of
the columns are atomic. That is, they contain no repeating values

1 NF Anomalies
contains redundant data. For example, information about the
supplier's location and the location's status have to be
repeated for every part supplied.
Redundancy causes what are called update anomalies.

INSERT. The fact that a certain supplier (s5) is located in a


particular city (Athens) cannot be added until they supplied a part.
DELETE. If a row is deleted, then not only is the information
about quantity and part lost but also information about the
supplier.
UPDATE. If supplier s1 moved from London to New York, then six
rows would have to be updated with this new information.

Second Normal Form (2 NF)

A relational table is in second normal form 2NF if it is in 1NF and


every non-key column is fully dependent upon the primary key.

2 NF

functional dependencies in the First table:


s# > city, status
city > status
(s#,p#) >qty

2 NF
The process for transforming a 1NF table to 2NF is:
1. Identify any determinants other than the composite key, and the
columns they determine.
2. Create and name a new table for each determinant and the
unique columns it determines.
3. Move the determined columns from the original table to the new
table. The determinate becomes the primary key of the new
table.
4. Delete the columns you just moved from the original table except
for the determinate which will serve as a foreign key.
5. The original table may be renamed to maintain semantic
meaning.

2 NF
To transform FIRST into 2NF we move the columns s#,
status, and city to a new table called SECOND. The column
s# becomes the primary key of this new table

2 NF Anomalies
Tables in 2NF but not in 3NF still contain modification
anomalies. In the example of SECOND, they are:

INSERT. The fact that a particular city has a certain status


(Rome has a status of 50) cannot be inserted until there is
a supplier in the city.

DELETE. Deleting any row in SUPPLIER destroys the


status information about the city as well as the association
between supplier and city.

Third Normal Form (3 NF)

A relational table is in third normal form (3NF) if it is already in 2NF


and every non-key column is non transitively dependent upon its
primary key. In other words, all nonkey attributes are functionally
dependent only upon the primary key.

3 NF

Table PARTS is already in 3NF. The non-key column, qty,


is fully dependent upon the primary key (s#, p#).

SUPPLIER is in 2NF but not in 3NF because it contains


the following transitive dependency.
s# > status
s# > city
city > status

3 NF
The process of transforming a table into 3NF is:
1. Identify any determinants, other the primary key, and the columns
they determine.
2. Create and name a new table for each determinant and the
unique columns it determines.
3. Move the determined columns from the original table to the new
table. The determinate becomes the primary key of the new table.
4. Delete the columns you just moved from the original table except
for the determinate which will serve as a foreign key.
5. The original table may be renamed to maintain semantic
meaning.

3 NF
create a new table called CITY_STATUS and move the columns city
and status into it. Status is deleted from the original table, city is left
behind to serve as a foreign key to CITY_STATUS, and the original
table is renamed to SUPPLIER_CITY to reflect its semantic meaning

3 NF
The results of putting the original table into 3NF has
created three tables. These can be represented in
"psuedo-SQL" as:
PARTS (#s, p#, qty)
Primary Key (s#,#p)
Foreign Key (s#) references SUPPLIER_CITY.s#
SUPPLIER_CITY(s#, city)
Primary Key (s#)
Foreign Key (city) references CITY_STATUS.city
CITY_STATUS (city, status)
Primary Key (city)

Advantages of Third Normal Form


The advantage of having relational tables in 3NF is that it
eliminates redundant data which in turn saves space and
reduces manipulation anomalies. For example, the
improvements to our sample database are:

INSERT. Facts about the status of a city, Rome has a status of 50,
can be added even though there is not supplier in that city.
Likewise, facts about new suppliers can be added even though
they have not yet supplied parts.

DELETE. Information about parts supplied can be deleted without


destroying information about a supplier or a city. UPDATE.
Changing the location of a supplier or the status of a city requires
modifying only one row.

Boyce/Codd Normal Form

Determinant: an attribute or a group of attributes on which


some other attribute is fully functionally dependent.

Boyce/Codd Normal Form: a relation is in BCNF if and


only if every determinant is a candidate key.

www.igate.com

You might also like