Semantics of The Relation Attributes: Each Tuple in A Relation Should Represent One Entity or Relationship Instance
Semantics of The Relation Attributes: Each Tuple in A Relation Should Represent One Entity or Relationship Instance
Semantics of The Relation Attributes: Each Tuple in A Relation Should Represent One Entity or Relationship Instance
Attributes
Each tuple in a relation should represent one entity
or relationship instance
Redundant Information in
Tuples and Update Anomalies
Mixing attributes of multiple entities may cause
problems
EXAMPLE OF AN UPDATE
ANOMALY
Consider the relation:
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)
Update Anomaly
Insert Anomaly
Cannot insert a project unless an employee is assigned to .
Inversely- Cannot insert an employee unless he/she is assigned to
a project.
EXAMPLE OF AN UPDATE
ANOMALY (2)
Delete Anomaly
Spurious Tuples
Bad designs for a relational database may result in
erroneous results for certain JOIN operations
The "lossless join" property is used to guarantee
meaningful results for join operations
The relations should be designed to satisfy the
lossless join condition. No spurious tuples should
be generated by doing a natural-join of any
relations
Functional Dependencies
Functional dependencies (FDs) are used to
specify formal measures of the "goodness"
of relational designs
FDs and keys are used to define normal
forms for relations
FDs are constraints that are derived from
the meaning and interrelationships of the
data attributes
Examples of FD constraints
Social Security Number determines employee name
SSN ENAME
Union
If X Y and X Z, then X YZ
Psuedotransitivity
If X Y and WY Z, then WX Z
Introduction to
Normalization
Normalization: Process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
Normal form: Condition using keys and FDs of a
relation to certify whether a relation schema is in a
particular normal form
Insertion anomaly means that that some data can not be inserted in the
database. For example we can not add a new course to the database of example1,unless we insert a student who has taken that course.
Update anomaly means we have data redundancy in the database and to make
any modification we have to change all copies of the redundant data or else the
database will contain incorrect data. For example in our database we have the
Course description "Database Concepts" for IS380 appears in both St-100Course-taken and St-200-Course-taken tables. To change its description to
"New Database Concepts" we have to change it in all places. Indeed one of the
purposes of normalization is to eliminate data redundancy in the database.
Deletion anomaly means deleting some data cause other information to be lost.
For example if student Russell is deleted from St-100-Course-taken table we
also lose the information that we had a course call IS417 with description
System Analysis.
Thus Student-courses table suffers from all the three anomalies.
Examples
Second Normal Form
{SSN, PNUMBER} HOURS is a full FD since neither
SSN HOURS nor PNUMBER HOURS hold
{SSN, PNUMBER} ENAME is not a full FD (it is
called a partial dependency ) since SSN ENAME also
holds
A relation schema R is in second normal form (2NF) if
every non-prime attribute A in R is fully functionally
dependent on the primary key
R can be decomposed into 2NF relations via the process
of 2NF normalization
BCNF
{Student,course} Instructor
Instructor Course
Decomposing into 2 schemas
{Student,Instructor} {Student,Course}
{Course,Instructor} {Student,Course}
{Course,Instructor} {Instructor,Student}
Example
Given the relation
Book(Book_title, Authorname, Book_type,
Listprice, Author_affil, Publisher)
The FDs are
Book_title Publisher, Book_type
Book_type Listprice
Authorname Author_affil