Study Material: Vivekananda College Thakurpukur
Study Material: Vivekananda College Thakurpukur
Study Material: Vivekananda College Thakurpukur
VIVEKANANDA COLLEGE
THAKURPUKUR
© Vivekananda College,
Thakurpukur
Subject:
Database Management System
(DBMS)
Topic:
Introduction to Normalization
Name of the Teacher:
Prof. Amitav Biswas
Dept. of Computer Science
© Vivekananda College,
Thakurpukur
Normalization:
It is a process for evaluating and correcting table structures to minimize data
redundancies, there by reducing the likelihood of data anomalies.
Normalization works through a series of stages called normal forms. The first
three stages are described as first normal form (1NF), second normal form (2NF)
and third normal form (3NF).
From a structural point of view, 2NF is better than 1NF and 3NF is better than
2NF.
Denormalization:
Produces a lower normal form, which is a 3NF will be converted to a 2NF
through denormalization. A successful design must also consider end-user
demand for fast performance. Therefore, you will occasionally be expected to
denormalize some portions of database design in order to meet performance
requirements.
The need for normalization
In following example:
1
We see in that example, the structure of data set does not conform to the
requirements of table nor does it handle data very well.
Consider the following deficiencies:
1. The project number (PROJ_NUM) is apparently intended to be primary key or
at least a part of a PK, but it contains nulls.
2. The table entries invite data inconsistencies. For example the JOB_CLASS
value "Elect. Engineer" might be entered as "Elect. Eng."
3. The table displays data redundancies. Those data redundancies yield the
following anomalies:
a. Update anomalies. Modifying the JOB_CLASS for employee number 105
requires (potentially) many alterations, one for each EMP_NUM=105.
b. Insertion anomalies. Just to complete a row definition, a new employee
must be assigned to a project. If the employee is not assigned, a
phantom project must be created to complete the employee data
entry.
c. Deletion anomalies. Suppose that only one employee is associated with
a given project, if that employee leaves the company and the employee
data are deleted , the project information will also be deleted .to
prevent the loss of the project information ,a fictitious employee must
be created just to save the project information.
2
Each table represents a single subject. For example, a course Table will
contain only data that directly pertains to courses. Similarly, a student table
will contain only student data.
No data item will be unnecessarily stored in more than one table (in short,
tables have minimum controlled redundancy). The reason for this
requirement is to ensure that the data are update in only one place.
All nonprime attributes in a table are dependent on the primary key. The
reason for this requirement is to ensure that the data are uniquely
identifiable by a primary key value.
Each table is void of insertion, update or deletion anomalies. This is to
ensure the integrity and consistency of the data.
Conversion to First Normal Form (1NF)
Step 1: Eliminate the Repeating Groups
Start by presenting the data in tabular format, where each cell has a single
value and there are no repeating groups. A repeating group derives its name from
the fact that a group of multiple entries of the same type can exist for any single
key attributes occurrence. To eliminate the repeating groups, eliminate nulls by
making sure each repeating group attribute contains an appropriate data value.
3
Step 2: Identify the primary key:
Even causal observers will not that PROJ-NUM is not an adequate primary key
because the project number does not uniquely identify all of the remaining entity
(row) attributes. To maintain a proper primary key that will uniquely identify any
attribute value, the new key must be compost of a combination of a PROJ_NUM
and EMP_NUM
Step 3: Identify All Dependencies:
The identification of the PK in Step 2 means that you have already identified the
following dependency:
Converting to 2NF is done only when the 1NF has a composite primary key. if the
1NF has a single attribute primary key, then the table is automatically in 2NF. The
1NF-to-2NF conversion is simple starting with:
Step 1: Write Each Key Component on a Separate Line
Write each key component on a separate line; then write the original (composite)
key on the last line.
PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM
Each component will become the key in a new table. In other words, the original
table is now divided in to three tables:
(PROJECT, EMPLOYEE, and ASSIGNMENT).
5
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
6
Identify the attributes that are dependent on each determinant identified in Step
1 and identify the dependency.
JOB_CLASS CHG_HOUR
Name the table to reflect its contents and function. In this case, JOB seems
appropriate.
Step 3: Remove the Dependent Attributes from Transitive Dependencies
Eliminate all dependent attributes in the transitive relationship(s) from each of
the tables that have such a transitive relationship.
EMP_NUM EMP_NAME, JOB_CLASS
Note that the JOB_CLASS remains in the EMPLOYEE table to save as FK.
After the 3NF conversion has been completed, your database contains four
tables:
7
References:
1. Database Systems Concepts by Abraham Silberschatz and Henry F. Korth
2. Introduction to Database Management Systems by Atul Kahate
3. Web Resources