Module 1 Final
Module 1 Final
Module 1 Final
MODULE 1
MODULE 1
1.1 INTRODUCTION
“Good decisions require good information that is derived from raw facts”
These raw facts are known as data. Data are likely to be managed most
efficiently when they are stored in a database
What is Data?
Data means known raw facts that can be recorded and that have implicit meaning.
For example:
consider the names, telephone numbers, and addresses of the people you know.
Note: The word raw indicates that the facts have not yet been processed to
reveal their meaning.
What is Information?
Information is the result of processing raw data to reveal its meaning. Data processing
can be as simple as organizing data to reveal patterns or as complex as making forecasts or
drawing inferences using statistical modeling.
Why Databases:
Imagine trying to operate a business without knowing who your customers are,
what products you are selling, who is working for you, who owes you money, and whom
you owe money. All businesses have to keep this type of data and much more; and just as
importantly, they must have those data available to decision makers when they need them.
It can be argued that the ultimate purpose of all business information systems is to help
businesses use information as an organizational resource. At the heart of all of these
systems are the collection, storage, aggregation, manipulation, dissemination, and
management of data.
financial instruments such as stocks and bonds; also for storing real-time
market data to enable online trading by customers and automated trading
by the firm
Universities: For student information, course registrations, and grades (in
addition to standard enterprise information such as human resources and
accounting).
Airlines: For reservations and schedule information. Airlines were among the
first to use databases in a geographically distributed manner.
Telecommunication: For keeping records of calls made, generating monthly
bills, maintaining balances on prepaid calling cards, and storing information
about the communication networks
A traditional file processing approach supports a single view of the data. However a
database approach supports multiple view of the data. Database approach supports
many users each of whom would require a certain view of the database. Hence
DBMS approach provides facilities for defining multiple views.
Traditional file processing approach did not support sharing of data. However, the
modern database approach supports sharing of data as well as multi-user
transactions. For this, the DBMS includes features such as concurrency control to
ensure that several users trying to update the same data do so in a controlled
manner. It also enforces isolation property , atomicity property etc,. to promote
this.
In any organization where many persons use the same resources, there is a need for
a chief administrator to oversee and manage these resources. In a database
environment, the primary resource is the database itself and the secondary resource
is the DBMS and related software. Administering these resources is the
responsibility of the database administrator (DBA). The DBA is responsible for
authorizing access to the database, for coordinating and monitoring its use, and for
acquiring software and hardware resources as needed
2) Database Designers
iii. Database designers typically interact with each potential group of users and
develop “views” of the database as per the requirements of these groups.
iv. It is his responsibility to create such a database design which favors the
requirements of all user groups.
3) End Users
What is the different type of database end-users? Discuss the main activities of
each.
End users are the people whose jobs require access to the database for querying,
updating, and generating reports; the database primarily exists for their use. There
are several categories of end users:
i. Casual end users: are such users who may need different information each
time they query the database. These users include managers, occasional
browser.
ii. Naive or parametric end users: make up the major portion of database
users they constantly query the database using standard types of queries and
updates called as “canned transactions”
Bank tellers check account balances and post withdrawals and deposits
System analysts determine the requirements of end users, especially naive and
parametric end users, and develop specifications for canned transactions that meet
these requirements. Application programmers implement these specifications as
programs; then they test, debug, document, and maintain these canned transactions.
Such analysts and programmers (nowadays called software engineers) should be
familiar with the full range of capabilities provided by the DBMS to accomplish their
tasks.
Those who work to maintain the database system environment, but who are not
actively interested in the database contents as part of their daily job
i. DBMS system designers: who design modules such as the module for
implementing catalog, modules for controlling concurrency , handling data
recovery and security etc.
ii. Tool developers: who design tools(software packages). Tools are optional
packages that are often purchased separately.
iii. Operators and maintenance personnel : are responsible for actual running
and maintenance of hardware and software system.
What are the advantages of using a DBMS approach? (or) Discuss the capabilities
that must be provided by a DBMS.
Files that represent the same data may become inconsistent. This may happen
because an update is applied to some of the files but not to others.
ii. Restricting unauthorized access to data. When multiple users share a large
database, it is likely that most users will not be authorized to access all
information in the database. for example only authorized persons are allowed to
access the data. In addition, some users may only be permitted to retrieve data,
whereas others are allowed to retrieve and update. A DBMS should provide a
security and authorization subsystem.
iii. Providing Persistent Storage for Program Objects Databases can be used to
provide persistent storage for program objects and data structures. The values
of program variables or objects are discarded once a program terminates, unless
the programmer explicitly stores them in permanent files, which often involves
converting these complex structures into a format suitable for file storage.
iv. The persistent storage of program objects and data structures is an important
function of database systems. Traditional database systems often suffered from the
so called impedance mismatch problem
vi. Providing Backup and Recovery A DBMS must provide facilities for recovering
from hardware or software failures. The backup and recovery subsystem of
the DBMS is responsible for recovery.
vii. For example, if the computer system fails in the middle of a complex update
transaction, the recovery subsystem is responsible for making sure that the
database is restored to the state it was in before the transaction started executing.
viii. Providing Multiple User Interfaces Because many types of users with varying
levels of technical knowledge use a database, a DBMS should provide a variety of
user interfaces. forms-style interfaces and menu-driven interfaces are used and
commonly known as graphical user interfaces (GUIs). Many specialized
languages and environments exist for specifying GUIs.
Disadvantages
When a user needs to store a record in a child table that is currently unrelated to any
record in a parent table, it gets difficulty in recording and user must record an additional
entry in the parent table.
This type of database cannot support complex relationships, and there is also a problem
of redundancy, which can result in producing inaccurate information due to the
inconsistent recording of data at various sites.
Disadvantages
Extended relational systems add further capabilities (e.g. for multimedia data, XML,
and other data types)
Relational DBMS Products emerged in the 1980s
Data on the Web and E-commerce Applications:
Web contains data in HTML (Hypertext markup language) with links among pages.
This has given rise to a new set of applications and E-commerce is using new
standards like XML (eXtended Markup Language).
Script programming languages such as PHP and JavaScript allow generation of
dynamic Web pages that are partially generated from a database
New functionality is being added to DBMSs in the following areas:
Scientific Applications
XML (eXtensible Markup Language)
Image Storage and Management
Audio and Video data management
Data Warehousing and Data Mining
Spatial data management
Time Series and Historical Data Management
The above gives rise to new research and development in incorporating new data
types, complex data structures, new operations and storage and indexing schemes
in database systems.
Also allow database updates through Web pages
High-level or conceptual data models provide concepts that are close to the way
many users perceive data,
Low-level or physical data models provide concepts that describe the details of
how data is stored on the computer storage.
These two extremes is a class of representational (or implementation) data
models, which provide concepts that may be easily understood by end users.
Schema Construct: A component of the schema or an object within the schema, e.g.,
STUDENT, COURSE.
Database State: The actual data present in the database at any particular point of time is
called as a database state (or snapshot or occurrences or instances). The database
state(actual data) may change from time to time frequently.
Initial Database State: Refers to the database state when it is initially populated with data
into the system.
Valid State: A state that satisfies the structure and constraints of the database.
The database schema is sometimes called as the “intension” and a database state is called
an “extension” of the schema.
1. The external or view level includes a number of external schemas or user views.
Each external schema describes the part of the database that a particular user group
is interested in and hides the rest of the database from that user group.
2. The conceptual level has a conceptual schema, which describes the structure of
the whole database for a community of users. The conceptual schema hides the
details of physical storage structures and concentrates on describing entities, data
types, relationships, user operations, and constraints.
3. The internal level has an internal schema, which describes the physical storage
structure of the database. The internal schema uses a physical data model and
describes the complete details of data storage and access paths for the database.
What is the difference between logical data independence and physical data
independence which one is harder to achieve? Why?
Three-schema architecture can be used to achieve both logical data independence and
physical data independence.
1. Logical data independence
2. Physical data independence
1. Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the conceptual
schema to expand the database (by adding a record type or data item), to change
constraints, or to reduce the database (by removing a record type or data item).
2. Physical data independence is the capacity to change the internal schema without
having to change the conceptual schema. Hence, the external schemas need not be changed
as well. Changes to the internal schema may be needed because some physical files were
reorganized -for example, by creating additional access structures—to improve the
performance of retrieval or update. If the same data as before remains in the database, we
should not have to change the conceptual schema.
Data definition language (DDL): Used by the DBA and database designers to specify the
conceptual schema of a database. The DBMS will have a DDL compiler whose function is to
process DDL statements in order to identify descriptions of the schema constructs and to
store the schema description in the DBMS catalog.
Storage definition language (SDL), is used to specify the internal schema. The mappings
between the two schemas may be specified in either one of these languages.
View definition language (VDL),to specify user views and their mappings to the
conceptual schema, but in most DBMSs the DDL is used to define both conceptual and
external schemas.
Data Manipulation Language (DML), Used to specify database retrievals and
updates.DML commands (data sublanguage) can be embedded in a general-purpose
programming language (host language), such as COBOL, C, C++, or Java.
DBMS Interfaces:
Discuss the different types of user friendly interfaces and the types of users who
typically use each.
Many user friendly interfaces are provided by the DBMS to enable the user to interact with
the data in the database such as
Menu-Based Interfaces: These interfaces present the user with lists of options (called
menus) that help the user to make a request. The advantage of this is that the user need
not memorize the specific commands and syntax.
Forms-Based Interfaces: A forms-based interface displays a form to each user. Users can
fill out all of the form entries to insert new data onto the database. Forms are usually
designed and programmed for naive users.
Graphical User Interface: Present a pictorial form of the schema. The user can then use a
pointing device(such as a mouse) to make a choice out of the many options provided by the
GUI.
Natural Language Interfaces: These interfaces accept requests written in English or some
other language and attempt to understand them. A natural language interface would have a
dictionary of important words. If the interpretation is successful, it generate a high level
query. Otherwise, a dialogue is started with the user to clarify the request.
Speech Input and Output: Limited use of speech as an input query and speech as an
answer to a question or result of a request is becoming commonplace. Applications with
limited vocabularies such as inquiries for telephone directory, flight arrival/departure, and
credit card account information are allowing speech for input and output to enable
customers to access this information.
Interfaces for Parametric Users: Parametric users, such as bank tellers, often have a
small set of operations that they must perform repeatedly.
Interfaces for the DBA: Most database systems contain privileged commands that can be
used only by the DBA staff. These include commands for creating accounts, setting system
parameters, granting account authorization, changing a schema, and reorganizing the
storage structures of a database.
A DBMS is a complex software system. The types of software components that constitute a
DBMS and the types of computer system software with which the DBMS interacts.
The figure is divided into two parts. The top part of the figure refers to the various users of
the database environment and their interfaces. The lower part shows the internals of the
DBMS responsible for storage of data and processing of transactions. The database and the
DBMS catalog are usually stored on disk. Access to the disk is controlled primarily by the
operating system (OS).
Many DBMSs have their own buffer management module to schedule disk
Read/write, because this has a considerable effect on performance. top part of Figure
shows interfaces for the DBA staff, casual users who work with interactive interfaces to
formulate queries, application programmers who create programs using some host
programming languages, and parametric users who do data entry work by supplying
parameters to predefined transactions.
The DBA staff works on defining the database and tuning it by making changes to its
definition using the DDL and other privileged commands. The queries are parsed and
validated for correctness of the query syntax, the names of files and data elements, and so
on by a query compiler that compiles them into an internal form. the query optimizer is
concerned with the rearrangement and possible reordering of operations, elimination of
redundancies, and use of correct algorithms and indexes during execution. The pre
compiler extracts DML commands from an application program written in a host
programming language. We have shown concurrency control and backup and recovery
systems separately as a module in this figure.
The DBMS interacts with the operating system when disk accesses—to the database or to
the catalog—are needed. If the computer system is shared by many users, the OS will
schedule DBMS disk access requests and DBMS processing along with other processes. On
the other hand, if the computer system is mainly dedicated to running the database server,
the DBMS will control main memory buffering of disk pages.
Database utilities refer to additional facilities that help the DBA to manage the database
system. Some of the common utilities are-
i. Loading: . A loading utility is used to load existing data files—such as text files or
sequential files—into the database. Usually, the current (source) format of the data
file and the desired (target) database file structure are specified to the utility, which
then automatically reformats the data and stores it in the database.
Sudarsanan D Assistant Professor, CITECH-ISE. Page 18
Introduction to DBMS
ii. Backup: A backup utility creates a backup copy of the database, usually by dumping
the entire database onto tape or other mass storage medium. The backup copy can
be used to restore the database in case of catastrophic disk failure.
iii. Database storage re-organization: This utility can be used to reorganize a set of
database files into different file organizations and create new access paths to
improve performance.
iv. Performance monitoring: database usage and provides statistics to the DBA. The
DBA uses the statistics in making decisions such as whether or not to reorganize
files or whether to add or drop indexes to improve performance
Application such as PowerBuilder (Sybase) or JBuilder (Borland), xamp have been quite
popular. These systems provide an environment for developing database applications and
include facilities that help in many facets of database systems, including database design,
GUI development, querying and updating, and application program development.
were sent from the computer to the display terminals, which were connected to the
central computer via various types of communications networks.
As prices of hardware declined, most users replaced their terminals with PCs and
workstations, and more recently with mobile devices.
Gradually, DBMS systems started to exploit the available processing power at the
user side, which led to client/server DBMS architectures.
Two main types of basic DBMS architectures were created on this underlying
client/server framework: two-tie and three-tier.
Explain the operation of a 2-tier client/server architecture. How does it differ from
a 3-tier client/server architecture.
Many web-application use a 3-ier architecture which adds an additional layer called
the APPLICATION SERVER or WEB SERVER between the client and database
server. This additional layer stores business rules(ie procedures or constraints) that
are used to access database from the database server.
There are several criteria’s based upon which the DBMS can be classified
i. Based on the Data Model: The main data model used in many current commercial
DBMSs is the relational data model, and the systems based on this model are
known as SQL systems. The object data model has been implemented in some
commercial systems but has not had widespread use. Recently, so-called big data
systems, also known as key-value storage systems and NOSQL systems
Sudarsanan D Assistant Professor, CITECH-ISE. Page 22
Introduction to DBMS
Many legacy applications still run on database systems based on the hierarchical
and network data models.
Some experimental DBMSs are based on the XML (eXtended Markup Language)
model, which is a tree-structured data model. These have been called native XML
DBMSs.
ii. Based on the number of users: to classify DBMSs is the number of users supported
by the system. Single-user systems support only one user at a time and are mostly
used with PCs. Multiuser systems, which include the majority of DBMSs, support
concurrent multiple users.
iii. Based on the number of sites: they can be classified as centralized and distributed. A
centralized DBMS can support multiple users, but the DBMS and the database reside
totally at a single computer site. A distributed DBMS (DDBMS) can have the actual
database and DBMS software distributed over many sites connected by a computer
network. Big data systems are often massively distributed, with hundreds of sites.
The data is often replicated on multiple sites so that failure of a site will not make
some data unavailable
iv. Based on the types of software: at various sites, they can be classified as
Homogeneous or Hetrogeneous.
v. Based on the purpose: they can be classified as specific purpose and general
purpose.
Other than the above mentioned criterias, classification can be made based on the cost,
based on the access path etc.
The first step is called as the requirements collection and analysis phase. During this
phase, the database designers interview the database users to understand their
expectations. At the end of the first phase, the database designers generate the data
requirements and functional requirements of the application.
In parallel with specifying the data requirements, it is useful to specify the known
functional requirements of the application. These consist of the user defined operations
(or transactions) that will be applied to the database, including both retrievals and
updates.
Once the first step has been completed, the second step involves the functional analysis of
the functional requirements and the preparation of the CONCEPTUAL DESIGN using the
data requirements. This phase includes the creation of ENTITY TYPES, RELATIONSHIPS
and CONSTRAINTS.
Sudarsanan D Assistant Professor, CITECH-ISE. Page 24
Introduction to DBMS
The third step in the database design is the actual implementation of the database using a
COMMERCIAL DBMS such as ORACLE or relational (SQL) model. This phase generate the
conceptual schema is transformed from the high-level data model into the implementation
data model. This step is called logical design or data model mapping.
The last step is the physical design phase, during which the internal storage structures,
file organizations, indexes, access paths, and physical design parameters for the database
files are specified. In parallel with these activities, application programs are designed and
implemented as database transactions corresponding to the high level transaction
specifications.
ENTITIES and ATTRIBUTES : are the basic objects of an ER-MODEL. Entity represents a
“THINGS’ in the real world which has an independent existence.
Each entity has attributes. Attributes are properties that more fully describe an entity.
Eg: the EMPLOYEE entity would be described by the name, age, address, salary, sex
etc. which become the attributes of the enity.
Types of Attributes:
What are the different type of attributes. Explain.
1. Composite Attributes
2. Simple (Atomic) Attributes
3. Single-Valued Attributes
4. Multi valued Attributes
5. Stored Attributes
6. Derived Attributes
7. Complex Attributes
1. Composite attributes are such attributes which can be divided into smaller sub-parts.
These sub-parts would represent more basic attributes.
For example, the address attribute can be further divided into street no. city, state, zipcode
etc.
3. Single valued attributes: Most attributes have a single value for a particular entity;
such attributes are called single-valued.
Ex: Age is a single-valued attribute of a person
4. Multi valued attributes: Most attributes have a multi-value for the same property; such
attributes are called Multivalued. Ex: color : {red, blue} ,phone_no
5. Derived attribute: In some cases, the value of one attribute can be obtained using the
value of another attribute.
Ex: AGE attribute can be derived by subtracting the date of DOB from the current
DATE
6. Stored attribute: the attribute that cannot be obtained using the value of another
attribute is called as the stored attribute. or entered directly to relative attribute entities.
Ex: date of birth attribute is the stored attribute.
7. Complex Attributes: this attribute in general, composite and multivalued attributes can
be nested arbitrarily. We can represent arbitrary nesting by grouping components of a
composite attribute between parentheses ( ) and separating the components with commas,
and by displaying multivalued attributes between braces { }. Such attributes are called
complex attributes.
ENTITY TYPES:
Define the terms entity types and entity set.
Key Attributes of an Entity Type. An important constraint on the entities of an entity type
is the key or uniqueness constraint on attributes. An entity type usually has one or more
attributes whose values are distinct for each individual entity in the entity set. Such an
attribute is called a key attribute, and its values can be used to identify each entity
uniquely. Each key attribute has its name underlined inside the oval
ENTITY SET: A entity set is a set of entities of the same type that share the same
properties or attributes
ENTITY TYPE
OR
Collection of entity is called entity set.
Ex:
ENTIT
Y
SET
Ex: if the range of ages allowed for employees is between 16 and 70, we can specify the
value set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16
and 70.
We can identify four entity types—one Corresponding to each of the four items in the
specification
1. An entity type DEPARTMENT with attributes Name, Number, Locations,
Manager, and Manager_start_date. Locations is the only multivalued
attribute.
ROLES
What is a participation role? When it necessary to use role names in the description of
relationship types?
The role name signifies the role that the participating entities play in each relationship. For
example consider the EMPLOYEE and DEPARTMENT entities as given below-
In the above example, the role name is works for and it signifies that the employee works
for the particular department. However role names are not compulsorily required when
the participating entities are distinct. However, in some cases where the participating
entities are same , role name becomes essential for distinguishing the meaning of each
participating the meaning of each participation. Such relationship are called as “
RECURSIVE RELATIONSHIP”
EMPLOYEE SUPERVISION
Int the above example, the EMPLOYEE entity participates twice in SUPERVISION. i.e once
in the role of a supervisior and next in the role of a supervisee. Such relationships are called
as Recursive relationships in which the role names becomes very essential.
CONSTRAINTS
CARDINALITY RATIO:
The cardinality ratio of a binary relationship specifies the maximum number of
relationship instances that an entity can participate in the possible cardinality ratios
for binary relationship types are
1. One to One (1:1)
2. One to Many (1:N)
3. Many to One(N:1)
4. Many to Many (M:N)
Many to many (M:N):An example of M:N binary relationship is WORKS-ON which relates
the EMPLOYEE entity to PROJECT entity. This represents the constraint that at any point in
time an employee may work on more than one PROJECT and that a PROJECT also can have
more than one EMPLOYEE.
PARTICIPATING CONSTRAINTS
TOTAL PARTICIPATION: if the company policy states that every employee must work for
a department, then an employee entity can exist if it participates in the WORKS-FOR
relationship. Thus , the participation of EMPLOYEE in WORKS-FOR is called TOTAL
PARTICIPATION(which is also called as existence dependency). Total participation is
represented by double lines in an ER-DIAGRAM
Ex: PROFESSOR Teaches CLASS
Partial participation
NOTE:
Write a note on the structural constraints of relationship types.
If this question is asked in the exams, then discuss about both i) cardinality ratio ii)
participation constraints.
Entities that do not have key attributes of their own are called as “weak entities”. On the
other hand, strong entities are such entities which have a key of their own.
A weak entity is identified through another strong entity in combination with one of its
attribute such a strong entity is called as the “identifying or owner entity type”. The
relationship type that relates a weak entity type to its owner is called as the “identifying
relationship”.
A weak entity normally has a “ partial key” which is an attribute(or set of attributes) that
can uniquely identify weak entities that are related to some owner entity.
In ER- Diagrams, both a weak entity type and its identifying relationship are distinguished
by surrounding their boxes and diamonds with double lines. Further the partial key
attribute is underlined with a dashed line.
e_ph
e_sex
date
e_name
e_dob month
ssn
year
EMPLOYEE
has
DEPENDENT
sex
vehicle
car truck
No_of_axles
Age
Name gender
PERSON
IS A
student Teacher
roll no
emp_id
1.21 ER DIAGRAMS
MOVIE DATABASE
BANKING DATABASE
MUSIC DATABASE