Nothing Special   »   [go: up one dir, main page]

To Databases: Data Bases

Download as pdf or txt
Download as pdf or txt
You are on page 1of 22

Data Bases

CHAPTER

1 INTRODUCTION
TO DATABASES

LEARNING OBJECTIVES

 To know about history and evolution of Database


 To understand Database Systems and Database Management System (DBMS)
 To know about Data Modelling
 To learn Structured Query Language (SQL)
 To introduce basic concept of MS Access

1.1 HISTORY OF DATABASE


The human race from its early evolution has been striving to record information it acquired
through its experience, in some form or another. Take for example, the stone carvings, palm
leaves etc., of the ancient humanity. They try to convey some sort or information about the
humanity, the living styles and also the knowledge they acquired in the process. The problem
for mankind was first to identify ways of describing the information about specific properties of
natural or man-made systems, and then to actually measure parameters associated with these
properties.
Record keeping systems had been employed for thousands and thousands of years in the form
of notations on clay tablets, palm leaves, rock, timber, bone, ceremony, dance, music, poetry
and a variety of other media. As time passed on, humanity invented the technology of writing,
to store all new information. Thus, data was stored in voluminous storage in the form of books
and the collections of books in libraries were perhaps the first real "databases."
When mankind moved into the infancy years of computers, electromagnetic media was used to
information storage. The organisations recorded information and data of essentially three different
kinds: scientific, administrative and business. A select section of people were involved in the
processing, storage and retrieval of information.
But industrialisation produced a major shift in the life style of people. They required and got
more information needed for their sustenance. And the industrialisation also paved way for the
advent of Information Technology. The growth of industrialisation and Information Technology
gave rise to several large-scale organisations with large administrative structures which in turn
had to handle large, voluminous data. A lot of problems arose with the gathering, organising,
retrieval, storage and security of data. Hence, there was need for organisation of data and its
management in an effective and efficient manner.

2 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases
1.2 THE EVOLUTION OF THE DATABASE
The conversion from a manual file system to a matching computer file system could be technically
was done initially by Data Processing Specialists. They created the necessary computer file
structures (a file structure is the description of file layouts and location of a file or group of files),
wrote the software that managed the data within those structures, and designed the application
programs that produced reports based on the file data. Thus, numerous "organisation evolved"
computerized file systems were born.
The first computer applications focused on clerical tasks: order/entry processing, payroll, work
scheduling, and so on. Such applications accessed data stored in computer files. Requests for
more complex information quickly followed (how many items were sold, by whom, and to
whom?); reports were generated to transform stored data into information useful for management
decisions.
Data was stored on the disk as files were programmers defined both logical data structure and
physical structure, such as storage structure, access methods and I/O modes by specific program
codes. Data are stored in files with interface between programs and files. The mapping between
logical files and physical file were done based on a programme code where, one file corresponds
to one or several programs.
As time went on, additional programs were written to produce new reports. As the number of
files increased, a small file system evolved. Each of the files in the system used its own application
programs to store, retrieve, and modify data. And each file was owned by the individual or the
department that commissioned its creation.
Extensive programming in a third-generation language (3GL) was required, even for the simplest
data-retrieval task. The programmer had to specify both what needed to be done and how it
was to be done. Examples of 3GLs include Common Business-Oriented Language (COBOL),
Beginner's All-purpose Symbolic Instruction Code (BASIC), and FORmula TRANslation
(FORTRAN). Programming in a 3GL was a time-consuming, high-skill activity.
In the late 1980s and the 1990s, advances have been made in many areas of database systems.
Considerable research has been carried out into more powerful query languages and richer
data models, and there has been a big emphasis on supporting complex analysis of data from all
parts of an enterprise. Several vendors (e.g., IBM's DB2, Oracle 8, Informix UDS) have extended
their systems with the ability to store new data types such as images and text, and with the
ability to ask more complex queries. Specialized systems have been developed by numerous
vendors for creating data warehouses, consolidating data from several databases, and for carrying
out specialized analysis.
Most significantly, DBMSs have entered the Internet Age. While the earlier generation of Web
sites stored their data exclusively in operating systems files, the use of a DBMS to store data that
is accessed through a Web browser is the order of the day. Queries are generated through Web-
accessible forms and answers are formatted using a markup language such as HTML, in order
to be easily displayed in a browser.
1.3 SIMPLE FILE BASED SYSTEMS
In a simple file based system, the data is stored as a collection of operating system files. The
programmer is expected to be familiar with the physical file structure, that is, how and where
the files were stored in the computer. Therefore, every file reference in a program required the
programmer to define access paths to the data. Such access paths used complex coding to
establish the precise location of various file and system components and their data characteristics.

INFORMATION TECHNOLOGY TRAINING 3

© The Institute of Chartered Accountants of India


Data Bases
As file systems became more complex, the access paths became difficult to manage and tend to
produce system malfunctions.
Even a simple file system of only 20 files in five departments, requires 5 * .20 = 100 file management
programs. If each of the files was accessed by 10 different reporting programs, an additional 20
* 10 = 200 programs had to be written. Because ad hoc queries were not possible, the file reporting
programs multiplied quickly. And, because each department in the organisation owned its data
by creating its own files, the number of files multiplied rapidly.
The addition or deletion of a field required the modification of all programs using that file. Such
modifications were required because the file system exhibited structural dependence; that is,
access to a file was dependent on its structure.
The File System also had the problem of "Field Definition and Naming Convention" as different
departments used different names for a single field of the file structure. For example, the Customer
field in the Sales Department's Sales File was named "SALES_CUST" and the customer field in
the Accounts Department's Accounts File was named "ACCT_CUST". Besides the above, "Data
Redundancy", that is, the same data being stored in many different locations, was also a problem
in File System of data storage.
The drawbacks of a file system include the following:
 The data is to be stored in a storage device such as a disk or tape and bring relevant parts
into main memory for processing as needed.
 Special programming methods or instructions are required to identify data items and to
answer each question that users may want to ask about the data.
 These programs are likely to be complex because of the large volume of data to be searched.
 The need to protect the data from inconsistent changes made by different users accessing
the data concurrently.
The significant differences between a file processing system and a DBMS are:
 Data Independence - Data independence is the ability to change the representation of
data at one level of the system without having to change the representation at the next
level. In file processing systems the data and applications are generally interdependent, but
DBMS provides the feature of data independence.
 Data Redundancy - Data redundancy means unnecessary duplication of data. In file
processing systems there is redundancy of data, but in DBMS we can reduce data
redundancy by means of normalization (section 1.9) process without affecting the original
data. If we do so in file processing system, it becomes too complex.
1.4 WHY DATABASE?
In today's market existence, the success of an organization depends on its ability to acquire
accurate and timely data about its operations, to manage this data effectively, and to use it to
analyze and guide its activities.
Phrases such as the information superhighway have become ubiquitous, and information
processing is a rapidly growing multibillion dollar industry. The amount of information available
to us is literally exploding, and the value of data as an organizational asset is widely recognized.
Yet without the ability to manage this vast amount of data, and to quickly find the information
that is relevant to a given question, as the amount of information increases, it tends to become a
distraction and a liability, rather than an asset. This paradox drives the need for increasingly

4 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases
powerful and flexible data management systems. To get the most out of their large and complex
datasets, users must have tools that simplify the tasks of managing the data and extracting
useful information in a timely fashion. Otherwise, data can become a liability, with the cost of
acquiring it and managing it far exceeding the value that is derived from it.
A Database is a collection of data, typically describing the activities of one or more related
organizations. For example, a university database might contain information about the following:
 Entities such as students, faculty, courses, and classrooms.
 Relationships between entities, such as students' enrolment in courses, faculty teaching
courses, and the use of rooms for courses.
A Database Management System, or DBMS, is software designed to assist in maintaining and
utilizing large collections of data, and the need for such systems, as well as their use, is growing
rapidly. By storing data in a DBMS, rather than as a collection of operating system files, we can
use the DBMS's features to manage the data in a robust and efficient manner. As the volume of
data and the number of users grow with hundreds of gigabytes of data and thousands of users
are common in current corporate databases DBMS support becomes indispensable.
1.5 DATABASE MANAGEMENT SYSTEM [DBMS]
A DBMS is a computer-based system to manage a database, or a collection of databases or files.
It is a general-purpose software system that facilitates the process of defining, constructing,
manipulating and sharing databases among various users and applications.
 Defining is a process to specify the data types, structures and the constraints for the data
stored.
 Constructing is the process of physical storing the data on a storage medium and the
respective mapping to the DBMS software.
 Manipulating a DBMS includes functions as querying, updating and generating reports
from the data.
 Sharing the data in a DBMS allows multiple users and programs to access the data stored
into it concurrently.
"When an employee joins an organisation, relevant details about the employee have to be recorded in
files. When an employee leaves the organisation, the relevant record has to be deleted. Modifications
need to be made to the employee records as and when necessary. For example, a change in address may
have to be incorporated. The file needs to be accessed for information about employees. If the organisation
is planning to start a new Branch, a new file has to be created. All these tasks are collectively known as
Database Management"
Database Management System performs the above tasks of maintaining databases so that the
information is readily available. DBMS is used to build and manage the database like, adding
new records (appending), editing (changing/modifying the existing data), deleting (removing
unwanted data), sorting (arranging in an order [ascending/descending]), retrieving the data
based on queries, updating and manipulating data items. Viewing as well as generating Reports
using the stored data and performing calculations on it are made possible through the use of
DBMS. In a sense, a database resembles a very well-organised electronic filing cabinet in which
powerful software, known as the database management system (DBMS), helps manage the
cabinet's contents.
The DBMS also contains a query language which makes it possible to produce answers to queries.

INFORMATION TECHNOLOGY TRAINING 5

© The Institute of Chartered Accountants of India


Data Bases
For example, end users, when dealing with large purchase data, may require answers to questions
like:
 What was the rupee volume of purchases product-wise during the past six months?
 How many of our suppliers have credit balances of Rs.1,00,000 or more?
The DBMS serves as the intermediary between the user and the repository (stored data) by
translating user requests into the complex code required to fulfil those requests. The DBMS
hides the database's internal complexity from the application programs that use the data base.
The DBMS which has become the data management standard in the present situation, helps to
create an environment in which end users have better access to more and better-managed data
than they did before. Such access makes it possible for end users to respond quickly to changes
in their environment. The availability of data, combined with the tools that transform data into
usable information, empowers the end users to make quick and informed decisions that can
make the difference between success and failure in the global economy.
Wider access to well-managed data promotes an integrated view of the organization's operations.
Thus it becomes much easier to see how actions in one segment of the company affect other
segments. The DBMS environment in Fig 1.5.1 illustrates the interaction between the End User,
DBMS software and the Database content.

Fig.1.5.1: The DBMS Environment.

6 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases
1.5.1 Advantages of a DBMS
Using a DBMS to manage data has many advantages:
 Data independence: Application programs should be as independent as possible from
details of data representation and storage. The DBMS can provide an abstract view of the
data to insulate application code from such details.
 Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and
retrieve data efficiently. This feature is especially important if the data is stored on external
storage devices.
 Data integrity and security: If data is always accessed through the DBMS, the DBMS can
enforce integrity constraints on the data. For example, before inserting salary information
for an employee, the DBMS can check that the department budget is not exceeded. Also,
the DBMS can enforce access controls that govern what data is visible to different classes of
users.
 Data administration: When several users share the data, centralizing the administration
of data can offer significant improvements. Experienced professionals, who understand
the nature of the data being managed, and how different groups of users use it, can be
responsible for organizing the data representation to minimize redundancy and for fine-
tuning the storage of the data to make retrieval efficient.
 Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the
data in such a manner that users can think of the data as being accessed by only one user
at a time. Further, the DBMS protects users from the effects of system failures.
 Reduced application development time: Clearly, the DBMS supports many important
functions that are common to many applications accessing data stored in the DBMS. This,
in conjunction with the high-level interface to the data, facilitates quick development of
applications. Such applications are also likely to be more robust than applications developed
from scratch because many important tasks are handled by the DBMS instead of being
implemented by the application.
A DBMS is a complex piece of software, optimized for certain kinds of workloads (e.g., answering
complex queries or handling many concurrent requests), and its performance may not be
adequate for certain specialized applications. Examples include applications with tight real-
time constraints or applications with just a few well-defined critical operations for which efficient
custom code must be written. Another reason for not using a DBMS is that an application may
need to manipulate the data in ways not supported by the query language. In such a situation,
the abstract view of the data presented by the DBMS does not match the application's needs,
and actually gets in the way. As an example, relational databases do not support flexible analysis
of text data (although vendors are now extending their products in this direction). If specialized
performance or data manipulation requirements are central to an application, the application
may choose not to use a DBMS, especially if the added benefits of a DBMS (e.g., flexible querying,
security, concurrent access, and crash recovery) are not required. In most situations calling for
large-scale data management, however, DBMSs have become an indispensable tool.
1.5.2 Parts of a DBMS System
A database system comprises of five major parts, namely,
1) Data, 2) Hardware, 3) Software, 4) People, and 5) Procedures.
1) Data: Data are raw facts (raw means, the facts that have not yet been processed to reveal
their meaning). The word "data" covers the collection of facts stored in the database. Because

INFORMATION TECHNOLOGY TRAINING 7

© The Institute of Chartered Accountants of India


Data Bases
data are the raw material from which information is generated, the determination of which
data are to be entered into the database and how such data are to be organized is a vital part of
the database designer's job.
Processed Data is called Information
The American National Standards Institute (ANSI) offers a dual definition for data:
(i) A representation of facts, concepts, or instructions in a formalized manner suitable for
communication, interpretation, or processing by humans or by automatic means.
(ii) Any representation such as characters or analog quantities to which meaning is or might be
assigned. Generally, we perform operations on data or data items to supply some information
about an entity.
Data consists of symbols written or stored on some recording medium. The symbols represent
certain things, ideas, or values, which convey information in particular context. In terms of
structure, data consists of value of attributes of entities.
A database is a collection of related and ordered information, organized in such a way that
information can be accessed quickly and easily. Hence, an organized, logical group of related
files would constitute a database. In a database, data are integrated and related so that software
programs provide access to all the data.
A database can be of any size of varying complexity. It may be generated and maintained
manually or by machine. A database can consist of one or more tables of information that are
related in some way, for
instance, you could track all the information about the students in a college in a student's table.
Example:-
A University has hundreds of students enrolled in it. It stores relevant information about the
students and courses in corresponding files. For example, there could be a file for each semester
within a department. If there are 10 departments and each of them has courses lasting 3 semesters,
the University will have 30 files to maintain. All these files together constitute the University
Database.
Example of a simple Database:

Name Reg. No Dept Course Year_of_Study


Kala K. 2002005 Phy. Sci B.Sc I
Ram S 2002007 Bio. Sci M.Sc II
Peter L 2002009 Maths B.Sc III
Priya A 2002010 Tamil B.A II
Databases play an important role in business, engineering, medicine, law, education, library
science etc. Database constitutes the most important component of any computer based
information systems.
The main difference between modern databases and their old paper-based ancestors lies in the
ability to retrieve data. In a modern system, for example, finding all the books published in the
year 2000 are as simple as typing a question and asking the computer to reply. Amazingly, you
will get your answer back in sub second. In an old paper-based system, the task might take
several minutes, hours, or even days
2) Hardware: The physical components of computer systems are referred to as the Hardware.
The hardware comprises of:

8 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases
a) Computer - a microcomputer (personal computer), a minicomputer, or a mainframe
computer.
b) Computer peripherals - physical devices that control computer input and output such as
keyboards, mice, modems, printers, any electronic devices that are used to connect two or
more computers to produce a computer network. Such networks are very important for
databases that are likely to be remotely accessed, for example, Automatic Teller Machine.
3) Software: Software means the collection of programs; instructions that make the hardware
work. The following types of software make the database system fully functional. They are:
Operating system software manages all hardware components and makes it possible for all
other software to run on the computers. The operating system software mainly used by:
Microcomputers - DOS, OS/2, and Windows 2000 etc.
Minicomputers - UNIX and VMS
Mainframe computers - MVS used by IBM.
DBMS software manages the database within the database system. Some examples of DBMS
are Microsoft's Access and SQL Server, Oracle Corporation's Oracle, and IBM's DB2.
Application programs are most commonly used to access and manipulate the data found within
the database, to generate reports, tabulations, and other information to facilitate decision-making.
Example: Visual Basic, Power Builder - they are used as a front-end application software to
access the DBMS.
Utility software is the software tool used to help manage the database system's components.
Examples of utility software in relation to DBMS would be a storage backup program, disk and
file recovery program etc.
4) People: People would include all users of the database system. They can be classified into five
types on the basis of their job functions, namely systems administrators, database administrators,
database designers, systems analysts/programmers, and end users.
Systems administrators are responsible to oversee the general operations of the database system.
Database Administrators, also known as DBAs, manage the use of the Database. DBAs
determine the content, internal structure and access strategy for a database define security and
integrity and ensure that the database is functioning properly.
Database designers, the architects of database, design the database structure. Only with a
good design of the database, the application programmers and the DBAs can produce a useful
database environment. For example, without good blueprint, a good building cannot be
constructed even if we have very good bricks and masons.
Systems analysts and programmers are the persons who design and implement the application
programs. They design and create the data entry screens, reports, and procedures through
which end users' access and manipulate the database's data.
End users are the persons who run the organization's daily operations using application programs.
Examples of end users are: clerks, supervisors, managers, and top executives. The information
obtained from the database is used to make strategic business decisions, which would contribute
the survival as well as growth of an organisation.
5) Procedure: Procedure is instructions and rules that govern the design and use of the database
system. Procedures are critical components of the system. They play an important role in a
company, because they enforce the standards by which business is conducted within the
organization and with customers. Procedures also are used to ensure that there is an organized
way to monitor and audit both the data that enters the database and the information that is
generated through the use of such data.

INFORMATION TECHNOLOGY TRAINING 9

© The Institute of Chartered Accountants of India


Data Bases
1.6 LEVELS OF ABSTRACTION IN A DBMS
The storage of data and their structures play an important role in determining the effectiveness
of DBMS. The database designs (Data Models) are simplified abstractions of real-world conditions.
For example, the user of a DBMS is ultimately concerned with the real-world enterprise, say a
university database would consists of data that describes various aspects of its students, faculty,
and courses in the university. The data in a university database is to describe these entities and
their relationships such that the abstractions will enable us to explore the characteristics of
entities and the relationships that can be created among such entities.
The data in a DBMS is described at three levels of abstraction, as illustrated in Fig. 1.6.1. The
database description consists of a schema (description of data at a view level) at each of these
three levels of abstraction: the conceptual, physical, and external schemas.

Fig.1.6.1: Three Levels of Abstraction


1.6.1 External Data View
The External Data View is also known as External schema, which describes the various user
views; say data as viewed by the Payroll Department, Accounting Department, and Inventory
Department etc. This schema specifies a view of the data in terms of the conceptual level. It is
customized to the needs of a particular category of users and is guided by end user requirements.
Portions of stored data should not be seen by some users and implements a level of security and
simplifies the view for these users. For example:
 Students should not see faculty salaries.
 Faculty should not see billing or payment data.
 Information that can be derived from stored data might be viewed as if it were stored. GPA
not stored, calculated when needed.

10 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases
Applications are written in terms of an external schema. The external view is computed when
accessed. It is not stored. Different external schemas can be provided to different categories of
users. Translation from external level to conceptual level is done automatically by DBMS at run
time. A user can treat a view just like a relation and ask questions about the records in the view.
Any given database has exactly one conceptual schema and one physical schema because it has
just one set of stored relations. Each external schema consists of a collection of one or more
views and relations from the conceptual schema. A view is conceptually a relation, but the
records in a view are not stored in the DBMS. Rather, they are computed using a definition for
the view, in terms of relations stored in the DBMS
1.6.2 Conceptual Data View
Conceptual Data view is also known as Conceptual Schema, which describes the structure
and constraints for the whole database. It uses a conceptual or an implementation data model
and hides details of the physical level.
This schema presents data as a set of tables and data access between the conceptual to physical
schemas is performed automatically in a DBMS.
1.6.3 Physical Data View
Physical Data View is also known as Internal/Physical Schema describes data storage structures
and access paths. It typically uses a physical data model and the details are hidden from database users.
This schema describes details of how data is stored: files, indices, etc. on the random access disk system.
It also typically describes the record layout of files and type of files (hash, b-tree, flat).Early applications
worked at this level - explicitly dealt with details. E.g., minimizing physical distances between related
data and organizing the data structures within the file (blocked records, linked lists of blocks, etc.) Here
data access routines are hardcoded to deal with physical representation.
1.7 DATA INDEPENDENCE
Data independence is the capacity to change the schema at one level of a database system
without having to change the schema at the next level. The three-schema architecture allows
the feature of data independence. Data independence occurs because when the schema is changed
at some level, the schema at the next level remains unchanged; only the mapping between the
two levels is changed. Types of data independence are:
 Physical Data Independence - Hiding the changes to the structure of storage of the physical
files or reorganizing to improve the performance of retrieval or update from the conceptual
data view. This data independence happens at the logical interface level and the conceptual
schema remains the same. For example the data is compressed before they are stored on
disks and construction of an index on data.
 Logical Data Independence - Hiding the changes that occur at the conceptual schema
without having to change external schemas (user view) of the data. Here the changes at
the conceptual schema may be to expand the database (by adding a record type or data
item), to change constraints, or to reduce the database (by removing a record type or data
item). The view definition and the mappings occur at the user interface level without
affecting the external schemas or application programs.
The user interface level and the logical interface level are the mappings among schema levels.
Programs refer to an external schema, and are mapped by the DBMS to the internal schema for
execution. When a schema at a lower level is changed, only the mappings between this schema
and higher level schemas need to be changed in a DBMS that fully supports data independence.

INFORMATION TECHNOLOGY TRAINING 11

© The Institute of Chartered Accountants of India


Data Bases
The higher level schemas themselves are unchanged. Hence, the application programs need not
be changed since they refer only to the external schemas.
1.8 DATA MODEL
A Data Model is a collection of high-level data description constructs that hide many low-level
storage details. A DBMS allows a user to define the data to be stored in terms of a data model.
While the data model of the DBMS hides many details, it is nonetheless closer to how the DBMS
stores data than to how a user thinks about the underlying enterprise.
1.8.1 Entity-Relationship Model
The semantic data model or the Entity-Relationship model is a conceptual level model is a
more abstract, high-level data model that makes it easier for a user to come up with a good
initial description of the data in an enterprise. A database design in terms of a semantic model
serves as a useful starting point and is subsequently translated into a database design in terms of
the data model the DBMS actually supports.
This model captures meanings of:
 Entities: are real-world objects about which we collect data,
 Attributes: describe the entities,
 Relationships: are associations among entities,
 Entity set: set of entities of the same type, and
 Relationship set: set of relationships of same type.
The entity-relationship (ER) model pictorially denotes entities and the relationships among them
by the Entity-Relationship Model technique for analysis and logical modelling of a system's data
requirement. The ER Diagram helps to identify the database's main entities and their relationships.
Because the ER Diagram components are graphically represented, their role is more easily
understood. Using the ER Diagram, it's easy to map it to the relational database model's tables
and attributes. Table 1.8.1 : illustrates graphical notation of ER Diagram.
This mapping process uses a series of well-defined steps to generate all the required database
structures.
It uses three basic concepts, namely, entities, their attributes and the relationships that exist
between the entities. It uses graphical notations for representing these.
1.8.2 Entity
An entity is any object, place, person, and physical object such as a house or a car, an event
such as a house sale or a car service, or a concept such as a customer transaction or order about
which an enterprise records data. It is an object, which can have instances or occurrences. Each
instance should be capable of being uniquely identified. Each entity has certain properties or
attributes associated with it and operations applicable to it. Entities can be thought of as nouns.
Examples: a computer, an employee, a student, a customer, a branch, an account etc. Entities
are represented as rectangles. Categories of objects under which an entity is identified:

12 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases

NAME SYMBOL FUNCTION

Entity Entity Name Data object in the system

Attribute Attribute Describes an entity

Relationship Relation Relates two entities

Cardinality Entity 1 1 Entity


Relation
Name Name
one-to-one, Describe associations
(1:1) Entity 1 M Entity among data.
Relation
one-to-many, Name Name
(1:M)
many-to-many,
Entity M N Entity
(M:N) Relation
Name Name

Table 1.8.1: Graphical notation for ER Diagram

Category Examples
Physical object Employee, Machine, Book, Client, Student, Item
Abstract object Account, Department
Event Application, Reservation, Invoice, Contract
Location Building, City, State
Table 1.8.2 : Entity

INFORMATION TECHNOLOGY TRAINING 13

© The Institute of Chartered Accountants of India


Data Bases
1.8.3 Attributes
Attributes are data elements that describe an entity. If the attributes of an entity have more
attributes that describe it, then it is not an attribute of that entity, but another entity. Attributes
can either be listed next to the entities, or placed in circles and attached to the entities. The
attributes reflect the level of detail that needs to be represented as information about the entities.
For example, the Employees entity could use Employee Name, Employee Number, Bank Account
Number and Gross Pay as attributes as shown in Fig. 1.8.1.
Entity Attributes
Customer Name, Address, Customer-City
Branch Branch-Name, Branch-City, Branch-Region
Employee Employee-Name, Employee Number, Gross Pay
Order Order Number, Order Date, Order Placed by
Book ISBN, Title, Author, Price
Table 1.8.3: Attributes

Name
Number Gross Pay

Employee

Fig. 1.8.1: Entity Employee-Attributes


1.8.4 Relationship
A Relationship is an association among two or more entities. For example, we may have the
relationship that Mr.Amit works in the Accountants Department. A relationship is an association
between several entities. For an entity we collect a set of similar relationships into a relationship
set. This is an association between entities. It is represented by a diamond in the ER Diagram.
For example, there is a relationship between student and course. This relationship represents
the fact that a student enrols in a course.
For example, consider the two entity sets Customer and Account; here the relationship CustAcct
denotes the association between customers and their accounts. This is a binary relationship set
as shown in the Fig. 1.8.2. The role of an entity is the function it plays in a relationship. A
relationship may also have descriptive attributes. For example, date (last date of account access)
could be an attribute of the CustAcct relationship set.

1 1
Customer CustAcct Account

Fig. 1.8.2: CussAcct-Relationship

14 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases
1.8.5 Degree of Relationship (Cardinality)
The Degree of Relationship indicates the link between the two entities for a specified occurrence
of each. The degree of relationship is also called “Cardinality”. Conceptual models use three
types of relationships to describe associations among data: one-to-one, one-to-many, many-to-
many. Cardinality specifies how many instances of an entity relate to one instance of another
entity.
 ONE TO ONE RELATIONSHIP (1:1): One order requisition transaction results in a
purchase order. A purchase order is raised by an order requisition as shown in Fig. 1.8.3.
For one occurrence of the first entity, there can be at most one related occurrence of the
second entity and vice-versa.

Order 1 1 Purchase
Transaction
Requisition Order
Raises

Fig. 1.8.3: Order Requisition–Purchase Oder (1 : 1).


 ONE TO MANY RELATIONSHIP (1: N): An employee works in a department. One
department can have many employees as shown in Fig. 1.8.4. For one occurrence of the
first entity, there can exist many related occurrences of the second entity and for every
occurrences of the second entity there exists only one associated occurrence of the first.

Employee M 1
Works IN Department

Fig. 1.8.4: Employee–Department (M : 1)


 MANY TO MANY RELATIONSHIP (M:N): One order may contain many items. One
item can be contained in many orders as shown in Fig. 1.8.5. For one occurrence of the first
entity, there exists many related occurrences of the second entity and for every occurrence
of the second entity, there may exist many associated occurrences of the first.

Order M N
Contains Item

Fig. 1.8.5: Order-Item ( M : N).


Example 1.8.1
The Fig. 1.8.6 shows the relationship between Employee, Department, Project, Parts and
Warehouse entities. An organisation has employees assigned to specific departments. The
employees may work on several projects at the same time. The project uses parts, which are
supplied by the different suppliers, and stored, in various warehouses.

INFORMATION TECHNOLOGY TRAINING 15

© The Institute of Chartered Accountants of India


Data Bases
N
PERSON PARTS SUPPLIED
BY
N M M N M
SUPPLIER

BELONGS WORKS USE STORED


TO ON IN

1 N M
N WARE
DEPT PROJECT
HOUSE

Fig. 1.8.6: E-R Diagram-Example 1.8.1


Example 1.8.2
The Fig. 1.8.7 shows the relationship between Faculty, Student, Department, Graduation Course,
and Person entities. A university has faculty assigned to specific departments. The faculty may
teach more than one subject on several courses at the same time.

Student
Faculty
N
M 1
Registered

Belongs Teaches
To Subject

1
1
1 M N

Department Handle Graduation Course

Fig. 1.8.7: E-R Diagram-Example 1.8.2


1.8.6 Data Normalization
Normalization is the process of refining the data model built by the Entity- Relationship Diagram.
The normalization technique, logically groups the data over a number of tables, which are
independent and contain no duplicate data. The entities or tables resulting from normalization
contain simple data item, with relationships being represented by replication of key data item(s).

16 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases
The normalization process when properly executed eliminates uncontrolled data redundancies,
thus eliminating the data anomalies and the data integrity problems that are produced by such
redundancies. Normalization produces a carefully controlled redundancy that lets us properly
link database tables. The process of structuring data is to minimise duplication and inconsistencies.
The process usually involves breaking down a single table into two or more tables and defining
relationships between those tables. Normalization is usually done in stages, with each stage
applying more rigorous rules to the types of information, which can be stored in a table. Full
adherence to normalization principles increases the efficiency of a particular database.
The first three levels in normalizing a database are as follows:
(1) First Normal Form (1NF): First Normal Forms are groups of records such as Students
lists in which each field (column) contains unique and non-repeating information. 1NFs are
the least structured. For example, we have a student's table(Table 1.8.4) with the following
structure:

Student ID
Name
Date Of Birth
Course Advisor
Course Advisor's Telephone
Student
Course ID 1
Course Description 1
Course Instructor 1
Course ID 2
Course Description 2
Course Instructor 2
Table.1.8.4: Student Table(1NF)
The repeating course fields are in conflict with first normal form. To fix the problems created by
such repeating fields, we should place the course information in a separate course table, and then
provide a linking field (most likely Student ID) between the Students Table and the Course Table.
(2) Second Normal Form (2NF): Second Normal Forms break down First Normal Forms,
separating them into different tables by defining successively finer interrelationships between
fields. 2NFs do not include fields that are subsets of fields other than primary (key) field. For
example, say we create a course table(Table.1.8.5) with the structure:
Student ID
Course ID
Course Fee
Course Description
Course Instructor
Table.1.8.5: Course Table

INFORMATION TECHNOLOGY TRAINING 17

© The Institute of Chartered Accountants of India


Data Bases
We can create a unique primary key by combining StudentID + CourseID (StudentID is not
unique in itself, as one student may take multiple courses; similarly, CourseID is not unique in
itself as many students may take the same course; however, each student will only be taking a
particular course once at any one time, so the combination of StudentID + CourseID gives us a
unique primary key).
Now, in 2NF, no non-key fields (course description, course instructor) may depend on a portion
of the primary key. That, however, is exactly what we have here: the course instructor and
course description are the same for any course, regardless of the student taking the course. To
fix this and put the database in second normal form, we create a third table (Table.1.8.6), so our
database structure now looks like this (with key fields in italics):

Student Student Courses Courses


student ID student ID course ID
name course ID course Description
date of birth course Instructor
course advisor
course advisor's telephone
Table 1.8.6: Student Course Table (2NF).
(3) Third Normal Form (3NF): Like the Second Normal Forms, the Third Normal Forms also
break down the First Normal Forms, separating them into different tables by defining successively
finer interrelationships between fields. Third Normal Forms do not include fields that provide
information about fields other than the key field.
In our students table, for example, each field should provide information about the particular
student referred to by the key field, studentID. That certainly applies to the student's name and
date of birth. But the advisor's name and telephone doesn't change depending on the student.
So, to put this database in third nomal form (Table 1.8.7), we need to place the advisor's
information in a separate table:
Students Student Courses Courses Course Advisor
Student ID Student ID Course ID Advisor ID
Name Course ID Course Description Advisor Name
Date Of Birth Course Instructor Advisor's Telephone
Course Advisor ID
Table 1.8.7: Student-Course-Course Advisor (3NF).
(4) Boyce-Codd Normal Form (BCNF): A table is in Boyce-Codd Normal Form (BCNF), when
it is in 3NF and every determinant in the table is a candidate key. A determinant is any
attribute whose value determines other values within a row. For example, if the table is in
3NF and it contains a nonprime attribute that determines a prime attribute, the BCNF
requirements are not met.
This description clearly yields the following conclusions:
 If a table is in 3NF and it contains only one candidate key, 3NF and BCNF are equivalent.
 BCNF can be violated only if the table contains more than one candidate key. Putting it
another way, there is no way that the BCNF requirement can be violated if there is only one
candidate key.

18 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases
Codd's 12 rules are a set of thirteen rules (numbered zero to twelve) proposed by Edgar F.
Codd, a pioneer of the Relational Model for Databases, designed to define what is required from
a Database Management System in order for it to be considered relational, i.e., an Relational
Database Management System (RDBMS).The Codd's Rules are as follows:
Rule 0: The system must qualify as relational, as a database, and as a management system.
For a system to qualify as a relational database management system (RDBMS), that system must
use its relational facilities (exclusively) to manage the database.
Rule 1: The Information Rule
All information in a relational database including table names, column names are represented
by data values in tables.
Rule 2: The Guaranteed Access Rule
Every piece of data in a relational database can be accessed by using combination of a table
name, a primary key value that identifies the row and column name which identified a cell.
In other words, all data must be accessible. It says that every individual scalar value in the
database must be logically addressable by specifying the name of the containing table, the name
of the containing column and the primary key value of the containing row.
Rule 3: Systematic Treatment of NULL values
The RDBMS handles records that have unknown or inapplicable values in a pre-defined fashion.
The RDBMS distinguishes between zeros, blanks and nulls in the records and handles such
values in a consistent manner that produces correct answers, comparisons and calculations.
Even though the rule doesn't specify what should be done in the case of nulls it specifies that
there should be a consistent policy in the treatment of nulls.
Rule 4: Dynamic online catalog based on the Relational model
The description of a database and in its contents are database tables and therefore can be queried
on-line via the data manipulation language. The database administrator's productivity is
improved since the changes and additions to the catalog can be done with the same commands
that are used to access any other table. All queries and reports can also be done as any other
table.
Rule 5: The Comprehensive Data Sublanguage Rule
A relational system may support several languages and various modes of terminal use (for
example, the fill-in-the-blanks mode). However, there must be at least one language whose
statements are expressible, per some well-defined syntax, as character strings and that is
comprehensive in supporting all the following items
 Data Definition
 View Definition
 Data Manipulation (Interactive and by program).
 Integrity Constraints
 Authorization.
Every RDBMS should provide a language to allow the user to query the contents of the RDBMS
and also manipulate the contents of the RDBMS.
Rule 6: The View Updating Rule
Any view that is theoretically updateable can be updated using the RDBMS. Data consistency is
ensured since the changes made in the view are transmitted to the base table and vice-versa.

INFORMATION TECHNOLOGY TRAINING 19

© The Institute of Chartered Accountants of India


Data Bases
Rule 7: High - level Insert, Update, and Delete
The RDBMS must be capable of supporting insertion, updation and deletion at a table level. The
performance is improved since the commands act on a set of records rather than one record at
a time. The user should be able to modify several tables by modifying the view to which they act
as base tables.
Rule 8: Physical Data Independence
Application programs and terminal activities remain logically unimpaired whenever any changes
are made in either storage representations or access methods. Changes to the physical level
(how the data is stored, whether in arrays or linked lists etc.) must not require a change to an
application based on the structure.
The user should not be aware of where or upon which media data-files are stored. Database
administrators can make changes to the physical access and storage method which improve
performance but do not require changes in the application programs or requests.
Rule 9: Logical Data Independence
Changes to the logical level (tables, columns, rows, and so on) must not require a change to an
application based on the structure. Logical data independence is more difficult to achieve than
physical data independence.
User programs and the user should not be aware of any changes to the structure of the tables
(such as the addition of extra columns). Logical changes in tables and views such as adding/
deleting columns or changing fields lengths need not necessitate modifications in the programs.
For example- adding attribute or column to the base table should not disrupt the programs or
the interactive command that have no use for the new attribute.
Rule 10: Integrity Independence
Integrity constraints specific to a particular relational data base must be definable in the relational
data sub-language and storable in the catalog, not in the application programs. It must be
possible to change such constraints as and when appropriate without unnecessarily affecting
existing applications.
If a column only accepts certain values, then it is the RDBMS which enforces these constraints
and not the user program, this means that an invalid value can never be entered into this
column, whilst if the constraints were enforced via programs there is always a chance that a
buggy program might allow incorrect values into the system.
Rule 11: Distribution Independence
A relational DBMS has distribution independence which means that the distribution of portions
of the database to various locations should be invisible to users of the database.
The RDBMS may spread across more than one system and across several networks, however to
the end-user the tables should appear no different to those that are local.
Rule 12: The Non - Subversion Rule
If the system provides a low-level (single-record-at-a-time) interface, then that interface cannot
be used to subvert or bypass the Integrity Rules and constraints expressed in the higher level
relational language (multiple-records-at-a-time).
1.9 RELATIONAL MODEL
The Relational Model represents data in the two-dimensional model called a Relation or a Table.
This model consists of a relation schema and a relation instance. The relation instance is a table,
and the relation schema describes the column heads for the table. In the Relational Database, the
data is organized in the form of a matrix as shown in the Fig. 1.9.1, with rows of the matrix

20 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases
forming what are known as Records in the Database and the columns of the matrix forming the
individual Fields across all Records. This model is called the logical-level model proposed by E.F.
Codd in 1970, and the first commercial system appeared in 1981-82. Now, RDBMS is widely used
in several commercial products (ORACLE, SYBASE, INFORMIX, CA-INGRES).
 Fields describe a single location in a record in which a particular type of data is stored. For
example, an employee record might contain fields to store Name, Address, City, State, PIN
code. All records will have exactly the same structure, so they contain the same fields. But
the values in each field vary from record to record. The fields are the columns or the attributes
of a relation.
 Record is a collection of fields. Each record is made up of a number of fields. A record
contains all the information about a single 'member'(instance) of a table (relation). In the
employees table, each employee's details (name, date of birth, pay rate, address, and so on)
will be contained in a record (tuple). A records or a tuple can be accessed as a collective
unit of fields (attributes), or the fields attributes) can be accessed individually.
 Table is often implemented as an array of records. In relational databases, a data structure
is characterised by rows and columns, with data occupying each cell formed by a row-
column intersection. For example, a phone book table contains records for each telephone
subscriber separately, and each subscriber's details are contained in three fields - name,
address and telephone.
Example: Employee (Entity) Fields (Attributes)

   
Fields
Names  Emp-ID
5000
Emp-Name
Amit
City
Chennai
State
TN
5001 Jaykishan Nagpur MH
Records 5002 Kamini Lucknow UP
(Tuples)
5003 Manish Hyderabad AP
5004 Utpal Delhi HY
Fig. 1.9.1: Relation-Table-Employee

 Characteristics of a Relational Table


A table is perceived as a two-dimensional structure composed of rows and columns.
 Each table row (tuple) represents a single entity occurrence within the entity set.
 Each table column represents an attribute, and each column has a distinct name.
 Each row/column intersection represents a single data value.
 Each table must have an attribute or a combination of attributes that uniquely identifies
each row.
 All values in a column must conform to the same data format. For example, if the attribute
is assigned an integer data format, all values in the column representing that attribute must
be integers.
 Each column has a specific range of values known as the attribute domain.
 The order of the rows and columns is immaterial to the DBMS.

INFORMATION TECHNOLOGY TRAINING 21

© The Institute of Chartered Accountants of India


Data Bases
1.10 DATA ACCESS METHODS
The DBMS manages the interaction between the End User and the Database as shown in
Fig. 1.10.1.

Fig. 1.10.1: DBMS Interaction


The number of users determines whether the DBMS is classified as Single-user or Multiuser.
 A single-user DBMS supports only one user at a time. In other words, if user A is using the
database, users B and C must wait until user A has completed his/her database work. If a
single-user database runs on a personal computer, it is also called a Desktop Database.

Fig. 1.10.2: DBMS Access with Multiple Clients Fig. 1.10.3: DBMS Access with Multiple
Clients of different type.

22 INFORMATION TECHNOLOGY TRAINING

© The Institute of Chartered Accountants of India


Introduction to Databases
 A multi-user DBMS supports multiple users at the same time. The database site location
might also be used to classify the DBMS. For example, a DBMS that supports a database
located at a single site is called a Centralized DBMS. A DBMS that supports a database
distributed across several different sites is called a distributed DBMS. Fig. 1.10.2 depicts
DBMS Access with multiple clients of same type, where as Fig. 1.10.3 depicts DBMS access
with multiple clients of different type.
Distributed Database Systems have now come to be known as client/server based database
systems because they do not support a totally distributed environment, but rather a set of database
servers supporting a set of clients. Fig. 1.10.4 depicts Distributed Database.
In client/server computing, processing is shared between multiple small computers known as
clients that are connected via a network to a host computer known as a server. Clients on a
network are typically PC's, a type of high powered small computer built for specialized
applications called a workstation, or a network computer. Servers are typically dedicated to a
specific type of processing, like providing files with a file server, responding to database queries
with a database server, or handling high-speed processing with an application server.
A DBMS performs several important functions that guarantee the integrity and consistency of
the data in the database. Most of these functions are transparent to end users, and most can be
Multiple databases/distributed database. A DBMS performs several important functions that
guarantee the integrity and consistency of the data in the database. Most of these functions are
transparent to end users, and most can be achieved only through the use of a DBMS.

Fig.1.10.4: Distributed Database

INFORMATION TECHNOLOGY TRAINING 23

© The Institute of Chartered Accountants of India

You might also like