Normalization
Normalization
Normalization
Motivation
Update anomalies unexpected side effects as result of insertion, deletion, or modification of data
Insertion anomaly not possible to store certain information unless some other, unrelated, information is stored Deletion anomaly - not possible to delete certain information w/o losing some other, unrelated, information Modification anomaly inconsistency due to not all copies of data item being updated
2
Example
Advising (FacSSN, FacName, Office, FacDept, StuSSN, StuName, Address, GPA)
What insertion anomalies may occur? What deletion anomalies may occur?
Normalization
Process that decomposes (poorly designed) relations into simpler (well designed) relations in a way that:
minimizes redundancy ensures that a relations attributes are properly dependent on (or determined by) the PK
Functional Dependence
For given relation, attribute X functionally determines attribute Y (or attribute Y is functionally dependent on attribute X) if each value of X has associated with it exactly one value of Y at any time. XY I.e., for every pair of tuples t1 and t2 in the relation, if t1.X = t2.X then t1.Y = t2.Y.
Candidate Key
Let K represent an attribute or set of attributes of relation R; let T represent a tuple in R. K is a candidate key of R if R.K R.T, and
no attribute in K can be removed without violating the first condition. A superkey is an attribute or set of attributes in R that satisfies only the first condition.
6
Example
Given: Contracts (C, S, J, D, P, Q, V) F = {C (C, S, J, D, P, Q, V), (J, P) C, (S, D) P} Show: (S, D, J) (C, S, J, D, P, Q, V) What does this mean must be true of (S, D, J)?
9
Example (continued)
1. C (C, S, J, D, P, Q,V) 2. (J, P) C 3. (S, D) P 4. (J, P) (C, S, J, D, P, Q, V) 5. (S, D, J) (J, P) 6. (S, D, J) (C, S, J, D, P, Q, V) given given given transitivity: 1 & 2 augmentation on 3 transitivity: 5 & 4
10
11
Attribute Closure
To determine if a given FD such as X Y is in the closure set F+, determine attribute closure X+ with respect to (wrt) F. X+ is set of attributes A such that X A can be inferred using Armstrongs Axioms.
12
closure = X;
repeat until there is no change: { if there is an FD U V in F such that U closure, then set closure = closure V }
Figure 19.4
13
14
16
Motivation (continued)
Lots is not in 2NF because TaxRate is partially dependent on (CountyName, LotNum).
Decomposition of Lots: Lots1 (PropertyID, CountyName, LotNum, Area, Price) Lots2 (CountyName, TaxRate) Lots1 and Lots2 are in 2NF. But Lots1 is not in 3NF. (Why not?)
18
Motivation (continued)
Decomposition of Lots1: Lots1A (PropertyID, CountyName, LotNum, Area) Lots1B (Area, Price)
Lots1A, Lots1B, and Lots2 are in 3NF.
19
Motivation (continued)
Now suppose we will store information about only 2 counties, and that the lot sizes will only be as follows: Marion: 0.5, 0.6, , 1.0 acres Liberty: 1.1, 1.2, , 2.0 acres
What additional FD is now added to F? Although Lots1A is in 3NF, there is a lot of unncessary redundancy. A stronger normal form is needed to suggest to us the need to decompose Lots1A.
20
21
22
23
Multivalued Dependencies
Let R be a relation schema and let X, Y, and Z be subsets of attributes of R. X multidetermines Y (X Y) iff the set of Y values that correspond to a given (X, Z)-pair value depends only on the value of X.
Examples from CourseTeacherBook: Course Teacher Course Book
24
26
Lossless Decomposition
The decomposition process used in normalization is a lossless (or nonloss) decomposition.
27
28
Example
Relation in 4NF: Employee (ID, Skill, Job)
Sample tuples: <100, Sk-1, Job-4> <100, Sk-14, Job-4> <100, Sk-7, Job-2>
29
Example (continued)
Projections of Employee: R1 (ID, Skill) R2 (Skill, Job) R3 (ID, Job) Consider the tuples belonging to each projection.
30
Example (continued)
Join R1 and R2 over Skill. Spurious data introduced: <100, Sk-14, Job-5> Join resulting relation and R3 over (ID, Job).
31
Example (continued)
Odd restriction on updating Employee. E.g., suppose Employee contains: <100, Sk-14, Job-4> <100, Sk-7, Job-2> If we wish to add: we must also add: <200, Sk-14, Job-2> <100, Sk-14, Job-2>
32