Assignment 1:
1. Explain database architecture.
Database architecture refers to the structure or layout of a database system. It typically consists of three levels:
- Internal Level (Physical Level): The lowest level, which describes how data is physically stored on storage devices. It involves data structures such as indexes, pointers, and storage methods.
- Conceptual Level (Logical Level): Represents the entire database as a whole. It provides a unified view of all data without considering how the data is stored, and defines entities, attributes, relationships, and constraints.
- External Level (View Level): The highest level of abstraction, which provides individual users or applications with customized views of the data. Each view contains only the data of interest to a specific user group.
2. Explain various cardinality constraints through an ER diagram.
Cardinality constraints specify the number of instances of one entity that can be associated with instances of another entity. Types include:
- One-to-One (1:1): Each entity in A is associated with at most one entity in B. Example: a person holds a unique passport.
- One-to-Many (1:N): An entity in A can be associated with multiple entities in B. Example: a customer can place multiple orders.
- Many-to-One (N:1): Multiple entities in A can be associated with a single entity in B. Example: many students belong to one department.
- Many-to-Many (M:N): Entities in A can be associated with multiple entities in B and vice versa. Example: students can enroll in multiple courses, and a course can have multiple students.
3. Draw an ER diagram for an online sales system in which a customer can order online and pay through a credit card.
The ER diagram should include the following entities and relationships:
Entities:
- Customer (Attributes: CustomerID, Name, Email, Address)
- Order (Attributes: OrderID, Date, Status)
- Product (Attributes: ProductID, Name, Price, Stock)
- Payment (Attributes: PaymentID, Amount, PaymentDate)
Relationships:
- Places: Customer to Order (One-to-Many)
- Contains: Order to Product (Many-to-Many)
- Processes: Payment to Order (One-to-One)
4. Explain various database design issues.
Common database design issues include:
- Data Redundancy and Inconsistency: Storing duplicate data in multiple locations can lead to inconsistency. Normalization is used to minimize redundancy.
- Data Integrity: Ensures that data is accurate and consistent. Constraints such as primary key, foreign key, and unique constraints help maintain integrity.
- Scalability: The design should accommodate future growth in data volume and user base without significant redesign.
- Security: Proper access controls and encryption mechanisms are needed to protect sensitive data from unauthorized access.
- Data Independence: Changes to the schema at one level should not affect the applications using the database.
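The foreign-key constraint mentioned above can be sketched in a few lines of Python. This is a toy in-memory illustration (hypothetical Customer/Order tables, not any particular DBMS), showing what a DBMS checks before accepting an insert:

```python
# Toy in-memory "tables": customers keyed by primary key CustomerID.
customers = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
orders = []

def place_order(order_id, customer_id):
    """Insert an order only if its foreign key references an existing customer."""
    if customer_id not in customers:
        raise ValueError(f"referential integrity violated: no customer {customer_id}")
    orders.append({"OrderID": order_id, "CustomerID": customer_id})

place_order(101, 1)          # accepted: customer 1 exists
try:
    place_order(102, 99)     # rejected: customer 99 does not exist
except ValueError as e:
    print(e)
```

A real DBMS enforces the same rule declaratively with a FOREIGN KEY constraint rather than in application code.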
Assignment 2:
1. Explain Codd’s Rules.
Codd's 12 rules, formulated by Edgar F. Codd, define the characteristics of a relational database management system (RDBMS):
- Rule 0: The system must qualify as a relational database, managing data entirely through its relational capabilities.
- Rule 1 (Information Rule): All data should be represented as values in tables.
- Rule 2 (Guaranteed Access Rule): Every data item is accessible through a combination of table name, primary key value, and column name.
- Rule 3 (Systematic Treatment of NULLs): NULL values are supported for representing missing or inapplicable information.
- Rule 4 (Active Online Catalog): Metadata should be stored as tables and queried through the same language as ordinary data.
- Rule 5 (Comprehensive Data Sublanguage): The system should support a comprehensive language for data definition, manipulation, and transaction control.
- Rule 6 (View Updating Rule): Views that are theoretically updatable should be updatable by the system.
- Rule 7 (High-level Insert, Update, and Delete): The system should support set-based operations, not just row-at-a-time processing.
- Rule 8 (Physical Data Independence): Changes to physical storage do not affect applications.
- Rule 9 (Logical Data Independence): Changes to the logical structure do not affect applications.
- Rule 10 (Integrity Independence): Integrity constraints should be stored in the catalog, not in application programs.
- Rule 11 (Distribution Independence): Applications should work regardless of how the database is distributed.
- Rule 12 (Non-subversion Rule): Low-level access routes cannot be used to bypass the integrity rules.
2. What is normalization of a database? Explain BCNF.
Normalization is the process of organizing database tables to reduce redundancy and improve data integrity. It involves decomposing tables into smaller tables without losing data.
Boyce-Codd Normal Form (BCNF): A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a superkey (an attribute or combination of attributes that uniquely identifies a row).
3. Explain referential integrity constraints.
Referential integrity ensures that relationships between tables remain consistent. It requires that if a foreign key in one table points to a primary key in another, then the value of the foreign key must either match an existing primary key value or be NULL.
4. Explain various types of functional dependencies.
Functional dependencies express relationships between attributes:
- Full Functional Dependency: Y depends on the whole of X; no proper subset of X determines Y.
- Partial Dependency: Y depends on only a part of a composite key.
- Transitive Dependency: X determines Y and Y determines Z, so X indirectly determines Z.
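The BCNF test above can be mechanized: compute the attribute closure of each dependency's left side and check whether it covers the whole relation (i.e., is a superkey). A minimal sketch, using a hypothetical Student/Course/Instructor relation chosen only for illustration:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under functional dependencies `fds`,
    where each FD is a pair (lhs, rhs) of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs      # lhs is known, so its rhs follows
                changed = True
    return result

def is_bcnf(relation, fds):
    """A relation is in BCNF if the left side of every FD is a superkey."""
    return all(closure(lhs, fds) == set(relation) for lhs, rhs in fds)

# Hypothetical R(Student, Course, Instructor) with
# (Student, Course) -> Instructor and Instructor -> Course.
R = {"Student", "Course", "Instructor"}
fds = [({"Student", "Course"}, {"Instructor"}),
       ({"Instructor"}, {"Course"})]
print(is_bcnf(R, fds))  # False: Instructor -> Course, but Instructor is not a superkey
```

Here {Student, Course} is a key, but the dependency Instructor → Course violates BCNF, which is exactly what the check reports.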
Assignment 3:
1. Describe ACID properties of a transaction.
The ACID properties are a set of principles that ensure reliable processing of database transactions:
- Atomicity: All operations within a transaction are completed; if any operation fails, the entire transaction fails and the database state remains unchanged. The transaction is indivisible: it either succeeds completely or has no effect.
- Consistency: The database must be in a consistent state before and after the transaction. Any changes made by the transaction must take the database from one valid state to another while maintaining all defined rules, such as integrity constraints (foreign keys, unique constraints, etc.).
- Isolation: Transactions execute independently without interference; the intermediate state of a transaction is not visible to other transactions. Isolation levels range from serializable, where transactions are fully isolated, to read uncommitted, where there is effectively no isolation.
- Durability: Once a transaction is committed, its effects are permanent, even in the case of a system crash or power failure. The changes are saved and can be recovered if needed.
2. What is serializability? Explain types of serializability.
Serializability is the property of a concurrent schedule of transactions that guarantees the same effect as executing the transactions in some serial (one-after-another) order. It ensures consistency by avoiding conflicts. Types of serializability:
- Conflict Serializability: A schedule is conflict-serializable if it can be converted to a serial schedule by swapping non-conflicting operations, i.e., the order of operations can be rearranged without changing the final outcome.
- View Serializability: A schedule is view-serializable if it produces the same results as some serial schedule, considering which writes each read observes, so that transactions view consistent data. Every conflict-serializable schedule is also view-serializable.
3.
Explain different concurrency control protocols in DBMS.
Concurrency control protocols preserve the ACID properties during concurrent transaction execution:
- Lock-Based Protocols: Use locks on data items to control access. A Shared Lock (S-lock) allows multiple transactions to read a data item; an Exclusive Lock (X-lock) prevents other transactions from accessing a data item while one transaction modifies it. Techniques include two-phase locking (2PL), which has two phases: a growing phase in which locks are acquired and a shrinking phase in which locks are released.
- Timestamp-Based Protocols: Each transaction is assigned a timestamp that determines the serialization order; conflicts are resolved in timestamp order, with older transactions taking priority. Under basic timestamp ordering, a transaction whose read or write would violate this order is aborted and restarted. Thomas's write rule is a refinement that simply ignores a write that has already been made obsolete by a newer write, instead of aborting the transaction.
- Validation-Based Protocols: Transactions are validated before committing to ensure there are no conflicts. Three phases: Read Phase (the transaction reads data and applies changes only to local copies), Validation Phase (checks for conflicts with other transactions), Write Phase (writes the changes to the database if validation succeeds).
4. Explain the concept of log-based recovery and shadow paging.
Recovery techniques ensure data consistency and durability in the case of a system failure.
- Log-Based Recovery: Uses a log to track changes made during a transaction. The log contains a record for every operation, including the transaction ID, the data item accessed, and the old and new values. Redo logging re-applies changes so that committed transactions are reflected in the database; undo logging reverts changes made by uncommitted transactions to maintain consistency.
- Shadow Paging: Maintains two page tables: the current page table and the shadow page table. The shadow page table points to the stable copy of the database, while the current page table is modified during the transaction.
When a transaction commits, the current page table atomically replaces the shadow page table and the changes become permanent. This technique avoids writing a log but can be less efficient for large databases.
Assignment 4:
1. Explain the types of Distributed Database Systems.
Distributed databases store data across multiple physical locations and can be categorized into:
- Homogeneous Distributed Database Systems: All sites use the same database management system (DBMS), schema, and data model. This allows easier data sharing but lacks flexibility when different systems are required.
- Heterogeneous Distributed Database Systems: Different sites may use different DBMSs, schemas, or data models. They provide flexibility to integrate diverse systems but require complex data integration and conversion methods.
- Federated Database Systems: A type of heterogeneous system in which multiple independent databases are integrated while each maintains its autonomy. Federated systems can be loosely or tightly coupled, depending on how much control and data integration exists.
2. What is the CAP theorem? Explain.
The CAP theorem states that a distributed database system can guarantee at most two of the following three properties:
- Consistency (C): All nodes in the system see the same data at the same time.
- Availability (A): Every request made to the system gets a response, regardless of the state of individual nodes.
- Partition Tolerance (P): The system continues to function despite network partitioning (communication breakdown between nodes).
In practice, systems make trade-offs based on which properties they prioritize. For example, traditional relational databases often favor consistency and availability over partition tolerance, while many NoSQL systems favor availability and partition tolerance.
3. What is NoSQL? Explain advantages and disadvantages of NoSQL.
NoSQL databases are designed to handle unstructured, semi-structured, or large-scale data that traditional relational databases struggle with.
Advantages:
- Scalability: Supports horizontal scaling (adding more servers to distribute the load).
- Flexibility: Handles various data types (structured, semi-structured, unstructured).
- Performance: Faster read/write operations for large-scale data.
- Schema-less: Allows dynamic changes to the data structure.
Disadvantages:
- Limited ACID compliance: NoSQL databases often sacrifice consistency for availability and partition tolerance.
- Lack of standardization: No unified query language (like SQL).
- Learning curve: Developers must learn new data models and query languages.
4. Write basic commands of MongoDB and explain its data types.
MongoDB is a NoSQL document-oriented database that stores data in a JSON-like (BSON) format.
Basic commands (mongo shell):
- Insert data: db.collection.insertOne({ name: "Alice", age: 25 })
- Query data: db.collection.find({ age: { $gt: 20 } })
- Update data: db.collection.updateOne({ name: "Alice" }, { $set: { age: 26 } })
- Delete data: db.collection.deleteOne({ name: "Alice" })
Data types:
- String: "Alice"
- Number (int, double, etc.): 25
- Boolean: true or false
- Array: [ "red", "blue", "green" ]
- Object (embedded document): { address: { city: "New York", zip: "10001" } }
- Date: stores date-time values.
- Binary data: for storing images, audio, etc.
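The filter and update semantics of the shell commands above can be mimicked in plain Python. This is a toy in-memory sketch, not the real MongoDB API; it supports only the $gt and $set operators used in the examples:

```python
# Toy "collection": a list of dicts standing in for MongoDB documents.
collection = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 19}]

def find(coll, flt):
    """Return documents matching the filter; supports exact-value
    matches and the {"$gt": n} comparison operator."""
    def matches(doc):
        for key, cond in flt.items():
            if isinstance(cond, dict) and "$gt" in cond:
                if not doc.get(key, float("-inf")) > cond["$gt"]:
                    return False
            elif doc.get(key) != cond:
                return False
        return True
    return [d for d in coll if matches(d)]

def update_one(coll, flt, update):
    """Apply a {"$set": {...}} update to the first matching document."""
    for doc in find(coll, flt):
        doc.update(update["$set"])
        return doc

print(find(collection, {"age": {"$gt": 20}}))            # only Alice is older than 20
update_one(collection, {"name": "Alice"}, {"$set": {"age": 26}})
```

The point of the sketch is that a MongoDB filter is a declarative predicate over documents and $set is a partial update, in contrast to SQL's row-and-column model.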
Assignment 5:
1. Explain the types of Distributed Database Systems.
Similar to the explanation in Assignment 4: distributed database systems can be homogeneous, heterogeneous, or federated, based on the level of uniformity and integration.
2. What is JSON? Explain with syntax the various data types of JSON.
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate.
JSON syntax example:
{
  "name": "John Doe",
  "age": 30,
  "isStudent": false,
  "courses": ["Math", "Science"],
  "address": { "city": "New York", "zipcode": "10001" }
}
Data types in JSON:
- String: "Hello, World!"
- Number: 123 (integer or floating-point)
- Object: { "key": "value" }
- Array: [1, 2, 3, 4]
- Boolean: true or false
- Null: represents an empty value, null
3. What is an XML database and what are its types?
An XML database stores data in XML format, providing a flexible way to structure and query data. Types of XML databases:
- Native XML Database: Designed specifically to store XML documents and uses XML structures as the storage mechanism. It supports querying with XQuery.
- XML-Enabled Database: Uses a relational or object-oriented database to store XML data. The XML data is mapped to relational tables and queried using SQL.
4. Explain the object-oriented data model.
The object-oriented data model integrates object-oriented programming concepts into database design. Basic concepts:
- Objects: Instances of classes that represent entities with attributes (data) and methods (functions).
- Classes: Templates or blueprints for creating objects, defining attributes and behaviors.
- Inheritance: Enables sharing and extending attributes and methods from parent to child classes.
- Encapsulation: Bundles data with the methods that manipulate it, restricting direct access.
- Polymorphism: Allows methods to behave differently depending on the object's class.
Advantages: Supports complex data types and improves code reusability.
Disadvantages: Can be less intuitive for simple data models and may require additional tools for implementation.
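The object-oriented concepts above can be sketched in a few lines of Python. The Person/Student classes here are hypothetical, chosen only to illustrate encapsulation, inheritance, and polymorphism:

```python
class Person:
    """Encapsulation: data (name) and behavior (describe) bundled together."""
    def __init__(self, name):
        self._name = name  # leading underscore signals restricted access

    def describe(self):
        return f"Person {self._name}"

class Student(Person):
    """Inheritance: Student reuses and extends Person."""
    def __init__(self, name, course):
        super().__init__(name)
        self.course = course

    def describe(self):
        # Polymorphism: the same method name, specialized behavior.
        return f"Student {self._name} enrolled in {self.course}"

# The same call behaves differently depending on the object's class.
for obj in (Person("John"), Student("Alice", "Math")):
    print(obj.describe())
```

An object-oriented database applies the same ideas to persistent data: stored objects keep their class, methods, and inheritance relationships rather than being flattened into rows.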