Nothing Special   »   [go: up one dir, main page]

Database Basics

Download as pdf or txt
Download as pdf or txt
You are on page 1of 14

01.

Database Basics
[PM Jat, DAIICT, Gandhinagar]

Contents
• What is Database?
• How do we represent databases?
• DBMS based computing architecture
• Database and Database Schema
• Three Schema Architecture

What is Database?
In simple terms, Database is a collection of “Relevant” “Data”.
We elaborate terms “Relevant” and “Data” next.
A database does not have a random collection of data; it typically
• Records all facts about some “application” context.
• Records necessary information of all events that occur as “business process”, for example an
order is received; an items is added in store, and so forth

Data
Data is an atomic value; represents a fact about some entity.
For instance, “Amit Kumar” is value for attribute name of a student; “28-June-1986” value for
attribute date-of-birth of the student, and so forth. A set of values for various attributes describe an
entity.
Data being atomic mean a “single value” for an attribute.

Relevant Data
Databases are built for some purpose. A data being relevant mean, data is required for meeting the
objective of building the database.
For example, do we need to record names of dependents of an employee in a company database?
Answer would depend if the company provides certain benefits to dependents of employees then it
may be required otherwise not required.
Hereby we say that data is “relevant” if it is required in database, otherwise not.
Basically, intuition here is, we are able to draw some boundary to separate out data of database and
data of universe.
Book elmasri/navathe uses a term called “mini data world”, while the book Korth uses a term
“enterprise” for capturing the notion of “boundary” for what is relevant and what is not?

1 | Database Basics [pm jat @ daiict]


Representation of databases
We have certain data modeling techniques that are used for representing databases. Following are
two most popular techniques.
Entity-Relationship Models:
• Database is seen as set of entities of different types and interactions between them.
• Figure [XIT-ER] below shows a set of student, program, and department entities; an entity
in its set is described by values [data] for its attributes.
• ER model is “Conceptual Model”, and primarily used for documentation purpose.
Relational Model (Implementation Model)
• In this technique, Database is represented as a set of relations, also called as tables.
• Each relation is set of tuple, or rows.
• Relational model is de-facto standard for implementing enterprise databases.

Example #1 (Entity-Relationship Models)

Database can be expressed as set of sets of some entity sets and interaction sets.

Figure 1: XIT Database in terms of Entities and their interactions

XIT Database = {S, P, D, SP, PD}

Where

2 | Database Basics [pm jat @ daiict]


S = {s1, s2, s3, …} - set of students
P = {p1, p2, p3, …} - set of program
D = {d1, d2, d3, …} - set of departments
SP = set of events/instances of student enrolling in a program
PD = set of events/instances of program getting offered by a department

Example #2 (Entity-Relationship Models)


Figure below partially depicts a database scenario (da-acad, we will be introducing shortly)

Figure 2: DA-Acad (partial) Database in terms of Entities and their interactions

DA-Acad DB={S, C, T, F, CT, CTF, CTS}


Where
S = {s1, s2, s3, …}; set of students

3 | Database Basics [pm jat @ daiict]


C = {c1, c2, c3, …}; set of courses
T = {<Autumn,2016>, <Winter,2017>, … }; set of terms
F = set of faculty
CT = Set of instances of Course offerings?
CTF = set of instances of “course offering” getting a faculty associated?
CTS = set of instances of students registering in offerings?

Example (relational representation):


XIT Database – set of relations/tables
DB = {Student, Program, Department}
Where Student, Program, Department are three relations as shown below.

Figure 3: XIT database in terms of Relations

Operations on Databases
Following are main operations on database (also referred as database manipulation operations).

Are of two types:

• Update operations: that changes data of databases; typically add, modify, and delete entities.
o Add more facts [entities and their interaction]
o Modify existing data
o Delete existing facts

For example - operations on XIT database

o Add a student (id=234, Name=‘Aman’, ..) in program bcs


o Update CPI of student having id = 102 to 7.8

• Query: get answer of a query from the database


o List (ID, Name) of BCS students having CPI > 7.0

4 | Database Basics [pm jat @ daiict]


o Give me program-wise student count for all programs
o How many students are studying in CS department?

It is the business events that trigger the database update. For example following events will be
update the corresponding sets in “da-acad” database –

• Add an course - adds an element in set C


• Add new Term - adds an element in set T
• Offer a course in a term - adds an element in set CT
• A faculty is assigned to a course - adds an element in set CTF
• A student register in a course – adds an element in set CTS
• Faculty uploads course grades – modifies attribute values of elements in set CTS (for all
students registered in given course in given term)
• Result processing – computes and updates SPI and CPI of each student for current semester!

When data is data?


• For human mind, table below contains some data!
• But is it data for a computer too?
• If stored in Excel, it is more data than doc/html/pdf.
• If stored in any DBMS like PostgreSQL as relaion, then it is more data than stored in Excel.
• Why is it so?

Machine must be able to “refer” a data item by


name; for example “cpi of a student identified by
id=1234”.
This is difficult (for machine) when you have
data in text/html/doc/pdf?
When represented in excel, a data item is
locatable but not by name but by cell location.
This is why we say that it is less data for
machine than we represent in relation form in a
dbms like postgresql or mysql.

Point here is, for machines, data is data only


when we can identify an data item by name and
value.
Recap: Figure 4: when data is data?

• What is database? Database is a


collection of “relevant” “data”
• Database Representation.
• (1) Set of entities and their interaction instances!
• (2) Set of relations
• Few examples
• When data is data?

5 | Database Basics [pm jat @ daiict]


Database Characteristics
A modern database typically has following characteristics-
• Persistent
• Integrated and Shared
• Self describing through meta-data
• Authorized Access
• Simple interface to update (add/modify/delete) and query the database in logical manner –
SQL like query language.
• Accessed through a Database Management System that efficiently enables said
characteristics.
Persistent: database is stored on secondary storage like disks.
Integrated and shared: data of different users are stored in a single database
Self describing nature of Databases: database definition is not defined in applications (programs);
database itself contains its “structural” definition as a part of database and more importantly in the
form of data.

Authorized Access: Database is owned by some user; other users may be granted permission for
viewing or updating. It is also possible to grant partial access to a database, a user is allowed to
view a subset of a database.

Simple interface to update (add/modify/delete) and query the database: Though databases are
actually used on disk files, and often have complex file organization. User should not be required to
deal with such files; rather use should be able to view and manipulate databases as some logical
units like “entities” or “relations”. For working with databases, we have query language like SQL.

Database Constraints
Database constrains are basically “data existential rules”.
“Rules” that must hold true on data (values) in a database.
For example in DA-ACAD database
• StudentID is key attribute of student entity, i.e. no two student entities can have same value
for ID.
• A course offering necessarily need to have an instructor associated with; and exactly one
instructor.
• An elective can be taken only from respective domain, and so forth

Any data that violate such rules cannot exist in database; if do, then database is said to be in invalid
state; such a state is called as “inconsistent” state of database.

The interpretation of “Inconsistent”, here is database is not consistent with its constraints (rules)
Constraints are part of “database description”, referred as database schema, and any valid state of
database should satisfy these rules.

6 | Database Basics [pm jat @ daiict]


Database Instance, state, and Schema
The term “database” that we have learning so far, refers “database instance”. All sketches seen
so far depict database instance.

A snapshot of a database instance (values) is referred as database state. In any real system,
databases keep updating; and continuously changing its state.

We have another concept “database schema”.

Database Schema-
While database instance actually holds data; database schema describes “structures of
database”. Database structure, implicitly, also includes description of database constraints.

In other words, database schema contains description of database structure and its constraints.
• Structure describes
– What entities? Or what relations if represented in relational model.
– What “attribute values” for each entity (or relations)

• Constraints
– Defines what is “domain” for each attribute of an entity (or relation)?
– What is key attribute?
– What interaction it has with other entities?
is that interaction is mandatory?;
– What is cardinality of interaction?
That is, to how many entities “an entity of a type” might be interacting with; how
many other entities, an entity can associate with; and so forth.
Below are depiction of XIT database schema in “entity relation-ship representation” and relational
representation-

entity relation-ship representation relational representation

Figure 6: Schema of XIT database in Relational


Model
Figure 5: Schema of XIT database in ER Model

7 | Database Basics [pm jat @ daiict]


Also below is, XIT schema in SQL-DDL

Figure 7: XIT schema in SQL-DDL

Here we have three different description of database schema – ER, Relational, and SQL-DDL. Each
description serves some specific purpose.

• ER description is often used for documentation purpose.

• Relation description is used for documenting “implementation structure of database”.

• Schema in SQL DDL script, as a program, becomes input to a database management system
so that empty database instance is created on a computer system.

DBMS based Computing Architecture


There are certain issues while we perform database operations. First we take note of issues, and then
see how DBMS based computing architecture addresses these issues.

Issues/Concerns in performing database operations:

• Data are stored in disk file. Maintaining efficiently searchable data file is complex task.
Suppose, B+ tree is an efficient searching mechanism, implementing this on secondary storage
is complex. We would want this complex organization to be transparent from applications.
• “data manipulation” functionality, for example search based on key or some other attribute,
insert an entity, modify, delete, or so, is almost repeatable in every database application. We
would not want this to be part of application; instead want external and simply reused in every
application.
• We would want “data manipulation” functionality to be independent of any application that
accesses databases. So that it can be independently developed, improved, and maintained.
• Issues related to concurrent access by multiple users, authorized access, dealing with system
failures, or so

8 | Database Basics [pm jat @ daiict]


A DBMS based Computing Architecture was suggested to deal with said issues; figure below (from
elmasri/navathe) depicts the architecture. In this approach DBMS takes charge of performing all
manipulation operations while providing simple interface to the users.
When user submits database operation to DBMS
using interface language like SQL; it in turn
translates the request into file manipulation system
calls or instructions, after checking syntax etc.

DBMS stores data on disk files; also keeps schema


information of every database in its “dictionary” as
meta-data.

DBMS maintains sophisticated file organization and


indexes for accessing data in data file. Also DBMS
implements complex set of algorithms to perform
various user operations.

Book Elmasri/Navathe gives following definition of


DBMS-

“DBMS is general purpose software system that


facilitates the processes of defining, constructing, Figure 8: DBMS based Computing Architecture
manipulating, and sharing the databases among
various users and applications”

DBMS is a complex system; a figure below (Fig 11: DBMS-COMPONENTS) from book
Elmasri/Navathe depicts its various components and their roles.

DBMS (Database Management System) provides following functionality

• Persistent storage with “Data Abstraction”: Database is represented (and manipulated)


through some logical representation
• Data manipulation interface – through a query language.
• Ensures “database integrity”. DBMS would reject a manipulation operation on a database
that violates database constraint.
• Transaction Management: safe shared concurrent access by multiple users and recovers
from system failures.
• Database Security: Authorization based accesses

Downside of using DBMS?

• Requires more resources.


• Due to a translation and computing layer in between, performance gets down
• Deployment of application becomes complex.
• Due to these reasons, many popular proprietary applications do not (did not) use DBMSes;
reasons could be: performance, cost, proprietary, compact, etc.
• About a decade back mobile or other devices could not afford to use DBMS due to limited
resources. Today you have SQLite and similar product, though

9 | Database Basics [pm jat @ daiict]


Meta-Data: Database Schema as Data
One of the characteristics of database is self-describing. DBMS stored schema information of every
database in its “catalog” or data dictionary.

Schema Information as data is what we refer as Meta-Data, that is, Information about database data
in the form of data? Given below is snapshot of queried schema information from postgresql
catalogue. Query used to fetch this information is given in footnote 1

Figure 9: Database Metadata

Data Abstraction
What is Data Abstraction in general? Consider example below, hopefully illustrates the
notion? Observe how floating points are internally represented, and relevant operations are
actually performed. User is transparent to underlying binary representation, and procedure
of performing floating point operations.

There are two kind of transparency we have in “data abstraction”

• Representation Transparency (how data re represented)


• Operational Transparency (how operations are performed - procedure)

Programming languages do provide such typed abstraction over binary representation and
manipulation of data.

1
SELECT table_name, column_name, data_type FROM information_schema.columns
WHERE table_name in (select tablename from pg_tables where schemaname='xit')
and table_schema = 'xit';

10 | Database Basics [pm jat @ daiict]


Figure 10: Programing Languages provide typed abstraction over binary representation of floating point numbers

Figure 11: DBMS Components

Data Abstraction by DBMS


Same notion of data abstraction is applied on relations. DBMS provides -

• Representational Transparency: allows representing database in some logical view while


hiding how data are stored in disks on file system.

11 | Database Basics [pm jat @ daiict]


• Operational Transparency: enables performing operations on database without letting us
delved in underlying complex algorithms.

For example, Relational is a popular database representation and manipulation model. RDBMS
is DBMS based on Relational model, and provides representational and operational
transparencies.

• Defining and manipulating database as relations or table while data are actually stored on
disks.
• Perform various manipulation operations on relations in logical manner (rather than how
actually they are performed by RDBMS). Obviously DBMS uses sophisticated algorithms
for performing database manipulation operations (add/modify/delete, and query/search)

Figure 12: Relational abstraction

RDBMS provide SQL support for manipulating databases. Database is manipulated as tables, rather
than data structure on disk files.

This way RDBMS hides complexity of performing operations. Below are some SQL statements for
certain database operations-

INSERT INTO STUDENT VALUE(‘201001123’, ‘Ankit’, …);

UPDATE REGISTERS SET GRADE = ‘AB’ WHERE STUDENT_ID = ‘201001123’


AND COURSE_NO = ‘IT321’ AND ACAD_YEAR = 2012;

SELECT * FROM STUDENTS WHERE CPI >=8.0 AND BATCH = 2009;

Actual algorithms that perform these operations are quiet complex. They work on disk files
corresponding to concerned relations.

12 | Database Basics [pm jat @ daiict]


Three Schema Architecture
American National Standards Institute (ANSI) and Standards Planning and Requirements
Committee (SPARC) identify three levels of defining database schema (structures).
DBMS are expected to be support three schema level implementation of databases.
(1) External Schema or User schema:
• Different users might see different subset of database, and see a schema defined in
independent technique.
• Few examples:
o One such example is external user might see relational view of some subset of
legacy system
o A user might see object view of database while having relational implementation
• Other use of external schema could be hiding actual database schema.
(2) Implementation Level/Logical Level Schema:
• At this level we represent and manipulate database in terms of some logical entities.
• This provides much simpler (abstract) view of databases to work with.

• At this level schema defines structural information of database elements (relation,


object, or so)
• Database constraints are also defined
• The model used (for instance relational) defines sets of primitives for describing schema
at this level.
(3) Physical Level (internal):
• Lowest Level. Physical, that is Disk/Storage level. Recall that we have some logical
representation of data and work with database in that model. However data are actually
needed to be stored on disk.
• Data are stored on Complex File Structures and often searches are enhanced by
providing access path like B+-Tree, hashing, or so.
• Internal Schema describes physical storage structures and access paths.
• Note that actual data resides at physical level only.
Diagram below from book elmasri/navathe depicts the architecture

13 | Database Basics [pm jat @ daiict]


Why three schema architecture?
To have data independence at three identified levels. We need independence so that schema
can be “independently” modified/improved at lower level, without affecting schema at
higher level.
Schema at lower level may be altered to (1) provide efficient representation of data, and (2)
improve upon algorithms that manipulate data. Note that actual data lives only at physical
level and that is where operations are executed.
How independence is accomplished?
Often database operations are initiated by user at external schema level; whereas actual
execution happens at internal schema level (data are actually stored at physical level)
A data reference at external schema should map to data at physical level. Therefore DBMS
maintains “schema mappings” from higher level schema to lower level schema.
This mapping actually helps in accomplishing “data independence”.
If any change in schema at lower level happens, appropriate changes are made in mapping!

Downside of three schema architecture?


To reach a data item referred by user at external level to corresponding data at physical level
requires two stage resolutions through mappings.
This may become too much of computational overhead and takes in executing a database
operation (query).
Also, there would be some data representational transformations are happening for
appropriate schema model at respective level.
Due to these reasons, most DBMS, RDBMS at least, do not implement full three-schema
architecture, and typically do not separate schema at external level.
You can emulate though by creating wrappers on top it. For instance there are lots of Object
Relational Mapping tools are used in the industry, and act as bridge between relations and
objects.

14 | Database Basics [pm jat @ daiict]

You might also like