Nothing Special   »   [go: up one dir, main page]

Normalization

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 32

Normalization

Database Management Systems, 3rd ed.,


Ramakrishnan and Gehrke, Chapter 19

Motivation
Update anomalies unexpected side effects as result of insertion, deletion, or modification of data
Insertion anomaly not possible to store certain information unless some other, unrelated, information is stored Deletion anomaly - not possible to delete certain information w/o losing some other, unrelated, information Modification anomaly inconsistency due to not all copies of data item being updated
2

Example
Advising (FacSSN, FacName, Office, FacDept, StuSSN, StuName, Address, GPA)

What insertion anomalies may occur? What deletion anomalies may occur?

What modification anomalies may occur?

Normalization
Process that decomposes (poorly designed) relations into simpler (well designed) relations in a way that:
minimizes redundancy ensures that a relations attributes are properly dependent on (or determined by) the PK

Functional Dependence
For given relation, attribute X functionally determines attribute Y (or attribute Y is functionally dependent on attribute X) if each value of X has associated with it exactly one value of Y at any time. XY I.e., for every pair of tuples t1 and t2 in the relation, if t1.X = t2.X then t1.Y = t2.Y.

Candidate Key
Let K represent an attribute or set of attributes of relation R; let T represent a tuple in R. K is a candidate key of R if R.K R.T, and
no attribute in K can be removed without violating the first condition. A superkey is an attribute or set of attributes in R that satisfies only the first condition.
6

Closure of Set of FDs


Armstrongs Axioms:
Reflexivity: If X Y, then X Y Augmentation: If X Y, then XZ YZ for any Z Transitivity: If X Y and Y Z, then X Z These axioms are both sound and complete.

Closure of Set of FDs (continued)


Additional rules that are convenient: Union: If X Y and X Z, then X YZ Decomposition: If X YZ, then X Y and X Z

Example
Given: Contracts (C, S, J, D, P, Q, V) F = {C (C, S, J, D, P, Q, V), (J, P) C, (S, D) P} Show: (S, D, J) (C, S, J, D, P, Q, V) What does this mean must be true of (S, D, J)?
9

Example (continued)
1. C (C, S, J, D, P, Q,V) 2. (J, P) C 3. (S, D) P 4. (J, P) (C, S, J, D, P, Q, V) 5. (S, D, J) (J, P) 6. (S, D, J) (C, S, J, D, P, Q, V) given given given transitivity: 1 & 2 augmentation on 3 transitivity: 5 & 4

10

Trivial Functional Dependencies


A trivial FD is an FD in which the right side contains only attributes that also appear on the left side. X Y where Y X Such FDs always hold due to reflexivity. Examples from the Contracts relation: SS (S, J, D) (S, D) (P, Q, V) P (P, Q, V) (Q, V)

11

Attribute Closure
To determine if a given FD such as X Y is in the closure set F+, determine attribute closure X+ with respect to (wrt) F. X+ is set of attributes A such that X A can be inferred using Armstrongs Axioms.

12

Attribute Closure Algorithm


To compute X+ wrt F for some attribute set X:

closure = X;

repeat until there is no change: { if there is an FD U V in F such that U closure, then set closure = closure V }

Figure 19.4
13

First Normal Form


A relation is in 1NF if all of its attribute values are atomic. 1NF disallows nested relations. Example of relation not in 1NF: EmpProj {SSN, EName, {PNumber, Hours}}

14

Second Normal Form


Let X and Y be sets of attributes of relation R where X consists of more than one attribute. Y is fully functionally dependent on X if (a) Y is functionally dependent on X, and (b) Y is not functionally dependent on some proper subset of X.
A relation is in 2NF if it is in 1NF and all of its nonkey attributes are fully functionally dependent on every candidate key.
15

Third Normal Form


Let X, Y, and Z be sets of attributes of relation R. Z is transitively dependent on X if
X Y, Y Z, and Y -/-> X A relation is in 3NF if it is in 2NF and there are no transitive dependencies.

16

Motivation for BCNF


Relation representing land for sale in various counties: Lots (PropertyID, CountyName, LotNum, Area, Price, TaxRate) CKs: PropertyID and (CountyName, LotNum)
Suppose: tax rate is fixed for given county; price of lot (for tax purposes) is determined by area regardless of county. What are the FDs?
Fundamentals of Database Systems by Elmasri and Navathe
17

Motivation (continued)
Lots is not in 2NF because TaxRate is partially dependent on (CountyName, LotNum).
Decomposition of Lots: Lots1 (PropertyID, CountyName, LotNum, Area, Price) Lots2 (CountyName, TaxRate) Lots1 and Lots2 are in 2NF. But Lots1 is not in 3NF. (Why not?)

18

Motivation (continued)
Decomposition of Lots1: Lots1A (PropertyID, CountyName, LotNum, Area) Lots1B (Area, Price)
Lots1A, Lots1B, and Lots2 are in 3NF.

19

Motivation (continued)
Now suppose we will store information about only 2 counties, and that the lot sizes will only be as follows: Marion: 0.5, 0.6, , 1.0 acres Liberty: 1.1, 1.2, , 2.0 acres
What additional FD is now added to F? Although Lots1A is in 3NF, there is a lot of unncessary redundancy. A stronger normal form is needed to suggest to us the need to decompose Lots1A.
20

Boyce-Codd Normal Form


Let R be a relation schema, F be the set of FDs given to hold over R, X be a subset of the attributes of R, and A be an attribute of R. R is in BCNF if, for every FD X A in F, one of the following statements is true:
A X, or X is a superkey.

21

Decomposition into BCNF


1. Suppose R is not in BCNF. Let X R, A be a single attribute in R, and X A be an FD that causes a violation of BCNF. Decompose R into R A and XA.
2. If either R A or XA is not in BCNF, decompose them further by a recursive application of this algorithm.

Database Management Systems, 3rd ed.,


Ramakrishnan and Gehrke, p. 623

22

Example of Multivalued Dependencies


Relation in BCNF: CourseTeacherBook (Course, Teacher, Book)
Sample tuples:
<Physics101, <Physics101, <Physics101, <Physics101, Green, Mechanics> Green, Optics> Brown, Mechanics> Brown, Optics> <Math301, Green, Mechanics> <Math301, Green, Vectors> <Math301, Green, Geometry>

Decomposition needed to eliminate redundancy:


CourseTeacher (Course, Teacher) CourseBook (Course, Book)

Adapted from Figure 19.13

23

Multivalued Dependencies
Let R be a relation schema and let X, Y, and Z be subsets of attributes of R. X multidetermines Y (X Y) iff the set of Y values that correspond to a given (X, Z)-pair value depends only on the value of X.
Examples from CourseTeacherBook: Course Teacher Course Book

24

Fourth Normal Form


Let R be a relation schema, X and Y be nonempty subsets of the attributes of R, and F be a set of dependencies that includes both FDs and MVDs. R is in 4NF if, for every MVD X Y that holds over R, one of the following statements is true:
Y X or XY = R, or X is a superkey. I.e., for each MVD X Y, all attributes of R are functionally dependent on X.
25

Quick Introduction to Relational Algebra Operations


A projection extracts only those columns of interest. A natural join of two relations R1 and R2 constructs a new relation such that each tuple in the new relation is the concatenation of two tuples, one from R1 and one from R2, where the two tuples have the same value for two attributes defined over a common domain.

26

Lossless Decomposition
The decomposition process used in normalization is a lossless (or nonloss) decomposition.

I.e., the join of the projects is equal to the original relation.


O/w spurious data is introduced.

27

Fifth Normal Form


Let X, Y, , Z represent subsets of the set of attributes for some relation R. R satisfies the join dependency iff it equals the join of its projections on X, Y, , Z. A relation is in 5NF iff each of its join dependencies is a consequence of its candidate keys.

28

Example
Relation in 4NF: Employee (ID, Skill, Job)

Sample tuples: <100, Sk-1, Job-4> <100, Sk-14, Job-4> <100, Sk-7, Job-2>

<100, Sk-1, Job-2> <200, Sk-14, Job-4> <200, Sk-14, Job-5>

29

Example (continued)
Projections of Employee: R1 (ID, Skill) R2 (Skill, Job) R3 (ID, Job) Consider the tuples belonging to each projection.

Is the join of the projects equal to Employee?

30

Example (continued)
Join R1 and R2 over Skill. Spurious data introduced: <100, Sk-14, Job-5> Join resulting relation and R3 over (ID, Job).

Result? original Employee tuples

31

Example (continued)
Odd restriction on updating Employee. E.g., suppose Employee contains: <100, Sk-14, Job-4> <100, Sk-7, Job-2> If we wish to add: we must also add: <200, Sk-14, Job-2> <100, Sk-14, Job-2>

32

You might also like