Unit 4 - File Systems in Operating System

A file system is a method an operating system uses to store, organize, and
manage files and directories on a storage device.
Some common types of file systems include:
• FAT (File Allocation Table): An older file system used by older versions of
Windows and other operating systems.
• NTFS (New Technology File System): A modern file system used by
Windows. It supports features such as file and folder permissions,
compression, and encryption.
• ext (Extended File System): A family of file systems (ext2, ext3, ext4)
commonly used on Linux.
• HFS (Hierarchical File System): A file system used by older versions of
Mac OS; its successor HFS+ was the macOS default until APFS replaced it.
• APFS (Apple File System): A newer file system introduced by Apple for
Macs and iOS devices.

The advantages of using a file system


• Organization: A file system allows files to be organized into directories
and subdirectories, making it easier to manage and locate files.
• Data protection: File systems often include features such as file and
folder permissions, backup and restore, and error detection and
correction, to protect data from loss or corruption.
• Improved performance: A well-designed file system can improve the
performance of reading and writing data by organizing it efficiently on
disk.

Disadvantages of using a file system


• Compatibility issues: Different file systems may not be compatible with
each other, making it difficult to transfer data between different
operating systems.
• Disk space overhead: File systems may use some disk space to store
metadata and other overhead information, reducing the amount of
space available for user data.
• Vulnerability: File systems can be vulnerable to data corruption,
malware, and other security threats, which can compromise the stability
and security of the system.
A file is a collection of related information that is recorded on secondary
storage; equivalently, a file is a collection of logically related entities.
From the user’s perspective, a file is the smallest allotment of logical
secondary storage.

The name of a file is divided into two parts, separated by a period:
• Name
• Extension
For example, in report.txt the name is report and the extension is txt.
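
The name/extension split can be tried out directly. The following is a minimal Python sketch using the standard os.path.splitext function; the helper name split_file_name and the sample file names are made up for illustration.

```python
import os.path


def split_file_name(file_name):
    """Return (name, extension), with the extension given without its period."""
    name, extension = os.path.splitext(file_name)
    return name, extension.lstrip(".")


# Only the last period separates the extension from the name.
print(split_file_name("report.txt"))
print(split_file_name("archive.tar.gz"))
print(split_file_name("README"))
```

Note that a name without a period has an empty extension, and that only the part after the last period counts as the extension.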

File Attributes and Their Operations

Attributes       Types    Operations
Name             Doc      Create
Type             Exe      Open
Size             Jpg      Read
Creation Date    Xls      Write
Author           C        Append
Last Modified    Java     Truncate
Protection       Class    Delete
                          Close
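
The operations in the table map closely onto what most programming languages expose. As a rough sketch (the file name notes.txt and the function name demo_file_operations are assumptions for illustration), the following Python code performs each operation once on a temporary file:

```python
import os
import tempfile


def demo_file_operations():
    """Create, write, append, read, truncate, close and delete one file."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "notes.txt")

    # Create + open + write; leaving the "with" block closes the file.
    with open(path, "w") as f:
        f.write("first line\n")

    # Append: new data goes to the end of the existing contents.
    with open(path, "a") as f:
        f.write("second line\n")

    # Read the whole file back.
    with open(path, "r") as f:
        contents = f.read()

    # Truncate: keep only the first 5 bytes.
    with open(path, "r+") as f:
        f.truncate(5)
    size_after_truncate = os.path.getsize(path)

    # Delete the file, then the temporary directory that held it.
    os.remove(path)
    os.rmdir(directory)
    return contents, size_after_truncate, os.path.exists(path)
```

The function returns the text read before truncation, the size after truncation, and whether the file still exists after deletion.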

File type        Usual extension        Function

Executable       exe, com, bin          Ready-to-run machine-language program
Object           obj, o                 Compiled, machine language, not linked
Source code      c, java, pas, asm, a   Source code in various languages
Batch            bat, sh                Commands to the command interpreter
Text             txt, doc               Textual data, documents
Word processor   wp, tex, rtf, doc      Various word-processor formats
Archive          arc, zip, tar          Related files grouped into one
                                        compressed file
Multimedia       mpeg, mov, rm          Audio/video information
Markup           xml, html, tex         Textual data and documents
Library          lib, a, so, dll        Libraries of routines for programmers
Print or view    gif, pdf, jpg          A format for printing or viewing an
                                        ASCII or binary file

FILE ALLOCATION METHODS

1. Contiguous Allocation

In this scheme, each file occupies a contiguous set of blocks on the disk. For
example, if a file requires n blocks and is given block b as the starting
location, then the blocks assigned to the file are b, b+1, b+2, ..., b+n-1.
This means that given the starting block address and the length of the file
(in terms of blocks required), we can determine the blocks occupied by the
file.
The directory entry for a file with contiguous allocation contains
• Address of starting block
• Length of the allocated portion.
The file ‘mail’ in the following figure starts at block 19 with length = 6
blocks. Therefore, it occupies blocks 19, 20, 21, 22, 23, and 24.
Advantages:
• Both sequential and direct access are supported. For direct access, the
address of the kth block of a file that starts at block b (counting from
k = 0) is simply b+k.
• Access is extremely fast, since the number of seeks is minimal because of
the contiguous allocation of file blocks.
Disadvantages:
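• This method suffers from both internal fragmentation (unused space in a
file's last block) and external fragmentation (free space split into holes
too small to hold a new file). This makes it inefficient in terms of disk
space utilization.
• Increasing the file size is difficult because it depends on contiguous
free blocks being available next to the file at that moment.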
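
The address arithmetic for contiguous allocation is simple enough to sketch in a few lines of Python. The function names are made up for illustration, and k is counted from 0, so the first block of the file is block b+0.

```python
def contiguous_blocks(start, length):
    """Blocks occupied by a file that starts at `start` and spans `length` blocks."""
    return list(range(start, start + length))


def kth_block(start, length, k):
    """Disk address of the kth block of the file (k counted from 0)."""
    if not 0 <= k < length:
        raise IndexError("block index outside the file")
    return start + k
```

For the ‘mail’ example in the text (start 19, length 6), contiguous_blocks(19, 6) yields blocks 19 through 24, and direct access to any block needs only one addition.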

2. Linked Allocation (Non-contiguous allocation)


In this scheme, each file is a linked list of disk blocks which need not
be contiguous. The disk blocks can be scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file
block. Each block contains a pointer to the next block occupied by the file.
The file ‘jeep’ in the following image shows how the blocks may be scattered
across the disk. The last block (25) contains -1, indicating a null pointer:
it does not point to any other block.

Advantages:
• This is very flexible in terms of file size. File size can be increased easily
since the system does not have to look for a contiguous chunk of
free disk space.
• This method does not suffer from external fragmentation. This makes it
relatively better in terms of disk space utilization.

Disadvantages:
• Because the file blocks are distributed randomly on the disk, a large
number of seeks are needed to access every block individually. This
makes linked allocation slower.
• It does not support random or direct access. We cannot directly access
the blocks of a file. A block k of a file can be accessed by traversing k
blocks sequentially (sequential access) from the starting block of the file
via block pointers.
• Pointers required in the linked allocation incur some extra overhead.
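
A small Python sketch can make the pointer chasing concrete. The layout below is hypothetical: only the final block (25) and the -1 end marker come from the text, and the other block numbers are made up for illustration. The dictionary next_block stands in for the pointer stored inside each disk block.

```python
END_OF_FILE = -1

# next_block[b] is the pointer stored in disk block b (hypothetical layout).
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: END_OF_FILE}


def file_blocks(start):
    """Follow the chain from the starting block and list every block of the file."""
    blocks = []
    block = start
    while block != END_OF_FILE:
        blocks.append(block)
        block = next_block[block]
    return blocks


def kth_block(start, k):
    """Reach block k (counted from 0) by following k pointers; no shortcut exists."""
    block = start
    for _ in range(k):
        block = next_block[block]
        if block == END_OF_FILE:
            raise IndexError("file has fewer than k + 1 blocks")
    return block
```

Reaching block k costs k pointer reads, which is why linked allocation cannot offer direct access.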

3. Indexed Allocation
In this scheme, a special block known as the Index block contains the
pointers to all the blocks occupied by a file. Each file has its own index block.
The ith entry in the index block contains the disk address of the ith file block.
The directory entry contains the address of the index block as shown in the
image:
Advantages:
• This supports direct access to the blocks occupied by the file and
therefore provides fast access to the file blocks.
• It overcomes the problem of external fragmentation.

Disadvantages:
• The pointer overhead for indexed allocation is greater than for linked
allocation.
• For very small files, say files that span only 2-3 blocks, indexed
allocation still keeps one entire block (the index block) for the pointers,
which is inefficient in terms of disk space utilization. In linked
allocation, by contrast, we lose the space of only one pointer per block.
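
The lookup path for indexed allocation (directory entry, then index block, then data block) can be sketched as follows. The disk contents, block numbers, and the file name are hypothetical values chosen for illustration.

```python
# Hypothetical disk: block number -> contents of that block.
disk = {
    19: [9, 16, 1, 10, 25],  # index block: entry i holds the address of file block i
    9: b"AAAA",
    16: b"BBBB",
    1: b"CCCC",
    10: b"DDDD",
    25: b"EE",
}

# The directory entry stores only the address of the file's index block.
directory = {"jeep": 19}


def read_file_block(file_name, i):
    """Read the ith block of a file (i counted from 0) with a single index lookup."""
    index_block = disk[directory[file_name]]
    if not 0 <= i < len(index_block):
        raise IndexError("block index outside the file")
    return disk[index_block[i]]
```

Unlike linked allocation, any block is reached in a fixed number of steps, at the cost of one whole index block per file.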

FILE DIRECTORIES

The collection of files is a file directory. The directory contains information
about the files, including attributes, location, and ownership. Much of this
information, especially which is concerned with storage, is managed by the
operating system. The directory is itself a file, accessible by various file
management routines.

Information contained in a device directory is:


• Name
• Type
• Address
• Current length
• Maximum length
• Date last accessed
• Date last updated
• Owner id
• Protection information

The operations performed on a directory are:

• Search for a file
• Create a file
• Delete a file
• List a directory
• Rename a file
• Traverse the file system

The advantages of maintaining directories are:

• Efficiency: A file can be located more quickly.
• Naming: It is convenient for users, as two users can use the same name
for different files or different names for the same file.
• Grouping: Logical grouping of files can be done by properties e.g. all java
programs, all games etc.

A directory is a container that is used to contain folders and files. It
organizes files and folders in a hierarchical manner.

The following are the logical structures of a directory, each providing a
solution to a problem faced in the previous type of directory structure.
1) Single-level directory:

The single-level directory is the simplest directory structure. In it, all files
are contained in the same directory which makes it easy to support and
understand.
A single-level directory has a significant limitation, however, when the
number of files increases or when the system has more than one user. Since
all the files are in the same directory, they must have unique names. If two
users both call their dataset test, then the unique-name rule is violated.

Advantages:
• Since it is a single directory, its implementation is very easy.
• If the number of files is small, searching is fast.
• Operations such as file creation, searching, deletion, and updating are
very easy in such a directory structure.
• Logical Organization: Directory structures help to logically organize files
and directories in a hierarchical structure. This provides an easy way to
navigate and manage files, making it easier for users to access the data
they need.
• Increased Efficiency: Directory structures can increase the efficiency of
the file system by reducing the time required to search for files. This is
because directory structures are optimized for fast file access, allowing
users to quickly locate the file they need.
• Improved Security: Directory structures can provide better security for
files by allowing access to be restricted at the directory level. This helps
to prevent unauthorized access to sensitive data and ensures that
important files are protected.
• Facilitates Backup and Recovery: Directory structures make it easier to
backup and recover files in the event of a system failure or data loss. By
storing related files in the same directory, it is easier to locate and backup
all the files that need to be protected.
• Scalability: Directory structures are scalable, making it easy to add new
directories and files as needed. This helps to accommodate growth in the
system and makes it easier to manage large amounts of data.
Disadvantages:
• Name collisions are likely, because no two files may have the same
name.
• Searching becomes time-consuming if the directory is large.
• Files of the same type cannot be grouped together.

2) Two-level directory:
As we have seen, a single-level directory often leads to confusion of file
names among different users. The solution to this problem is to create
a separate directory for each user.
In the two-level directory structure, each user has their own user file
directory (UFD). The UFDs have similar structures, but each lists only the
files of a single user. The system’s master file directory (MFD) is searched
whenever a user logs in or a user job starts; it holds one entry per user,
and each entry points to that user's UFD.

Two-Levels Directory Structure

Advantages:
• The main advantage is that different users can have files with the same
name, which is very helpful when there are multiple users.
• It provides security, since a user is prevented from accessing other
users’ files.
• Searching for files becomes very easy in this directory structure.
Disadvantages:
• The same isolation that provides security also means that a user cannot
share files with other users.
• Although users can create their own files, they cannot create
subdirectories.
• It does not scale well, because a user cannot group files of the same type
together.
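
As a rough illustration of the two-level scheme, the MFD can be modelled as a dictionary keyed by user, where each value is that user's UFD. The user names, file names, and function names below are hypothetical.

```python
# Master file directory: one entry per user, each pointing to that user's UFD.
master_file_directory = {
    "alice": {"test": "alice's data"},
    "bob": {"test": "bob's data"},
}


def lookup(user, file_name):
    """Find the user's UFD in the MFD, then the file in that UFD."""
    user_file_directory = master_file_directory[user]
    return user_file_directory[file_name]


def create_file(user, file_name, contents):
    """Names must be unique only within a single user's UFD."""
    user_file_directory = master_file_directory[user]
    if file_name in user_file_directory:
        raise FileExistsError(file_name)
    user_file_directory[file_name] = contents
```

Both users own a file called test without any collision, but there is nowhere to put a subdirectory, and lookup never sees another user's files.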

3) Tree Structure/ Hierarchical Structure:

The tree directory structure is the one most commonly used in
our personal computers. Users can create both files and subdirectories,
which was not possible in the previous directory structures.
This directory structure resembles a real tree turned upside down, with the
root directory at the top. The root contains the directories of all users.
The users can create subdirectories and store files in their own directory.
A user does not have access to the root directory data and cannot modify it.
Likewise, a user does not have access to other users’ directories. The
structure of the tree directory is given below, showing the files and
subdirectories in each user’s directory.

Tree/Hierarchical Directory Structure

Advantages:
• This directory structure allows subdirectories inside a directory.
• Searching is easier.
• Sorting files into important and unimportant ones becomes easier.
• This directory structure is more scalable than the other two directory
structures explained.
Disadvantages:
• As a user isn’t allowed to access other users’ directories, file sharing
among users is prevented.
• As users can create subdirectories, searching may become complicated if
the number of subdirectories increases.
• Users cannot modify the root directory data.
• A file that does not fit into one directory may have to be placed in another.
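
A tree-structured directory can be sketched with nested dictionaries, where a dictionary is a directory and anything else is a file. The directory contents and function names below are made up for illustration, and access control is left out of the sketch.

```python
# Root directory: each user's directory may hold files and subdirectories.
root = {
    "alice": {"docs": {"notes.txt": "hello"}, "todo.txt": "buy milk"},
    "bob": {"games": {}},
}


def resolve(path):
    """Walk from the root, one path component at a time."""
    node = root
    for part in path.strip("/").split("/"):
        if not part:
            continue  # an empty path refers to the root itself
        if not isinstance(node, dict):
            raise NotADirectoryError(path)
        node = node[part]
    return node


def make_directory(path):
    """Create an empty subdirectory under an existing parent directory."""
    parent_path, _, name = path.strip("/").rpartition("/")
    parent = resolve(parent_path)
    if not isinstance(parent, dict):
        raise NotADirectoryError(parent_path)
    if name in parent:
        raise FileExistsError(path)
    parent[name] = {}
```

Each file has exactly one path from the root, which is what distinguishes a tree from the acyclic graph structure described next.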

4) Acyclic Graph Structure:


None of the three directory structures above allows one file to be accessed
from multiple directories. A file or subdirectory can be accessed only
through the directory it is in, not from any other directory.
This problem is solved in the acyclic graph directory structure, where a file
in one directory can be accessed from multiple directories. In this way, files
can be shared between users. It is designed so that multiple
directories point to a particular directory or file with the help of links.
The figure below illustrates this with a file that is
shared between multiple users. If any user makes a change, it is
visible to all the users sharing the file.

Acyclic Graph Structure

Advantages:
• Sharing of files and directories is allowed between multiple users.
• Searching becomes easier, since a file can be reached by more than one path.
• Flexibility is increased, as multiple users can share and edit the same
file.
Disadvantages:
• Because of its complex structure, this directory structure is difficult
to implement.
• Users must be very cautious when editing or deleting a file, as the file is
accessed by multiple users.
• To delete a file permanently, all references to the file must be
deleted.
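
One common way to handle the deletion problem is to keep a count of links to each shared file and reclaim it only when the count reaches zero, as UNIX hard links do. The sketch below assumes that approach; the class and function names are hypothetical.

```python
class SharedFile:
    """A file that may be linked from several directories."""

    def __init__(self, contents):
        self.contents = contents
        self.link_count = 0


def link(directory, name, shared_file):
    """Add a directory entry pointing at the file and count the new reference."""
    directory[name] = shared_file
    shared_file.link_count += 1


def unlink(directory, name):
    """Remove one reference; return True once no references remain."""
    shared_file = directory.pop(name)
    shared_file.link_count -= 1
    return shared_file.link_count == 0
```

If two directories link to the same SharedFile object, a change made through one is seen through the other, and the file's storage can be reclaimed only after both entries have been unlinked.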

File Access Methods


When a file is used, its information must be accessed and read into computer
memory, and there are several ways to access this information. Some
systems provide only one access method for files. Other systems, such as
those of IBM, support many access methods, and choosing the right one for
a particular application is a major design problem.
There are three main ways to access a file in a computer system: sequential
access, direct access, and the index sequential method. Two further methods,
relative record access and content-addressable access, are also described.
1. Sequential Access –
It is the simplest access method. Information in the file is processed in
order, one record after the other. This mode of access is by far the most
common; for example, editors and compilers usually access files in this
fashion.
Reads and writes make up the bulk of the operations on a file. A read
operation (read next) reads the next portion of the file and automatically
advances a file pointer, which keeps track of the I/O location. Similarly,
a write operation (write next) appends to the end of the file and advances
the pointer to the end of the newly written material.
Advantages of Sequential Access Method :
• Records are processed in their stored (for example, lexicographic) order,
so the next entry can be accessed quickly.
• It is suitable for applications that require access to all records in a file, in a
specific order.
• It is less prone to data corruption as the data is written sequentially and
not randomly.
• It is a more efficient method for reading large files, as it only reads the
required data and does not waste time reading unnecessary data.
• It is a reliable method for backup and restore operations, as the data is
stored sequentially and can be easily restored if required.
Disadvantages of Sequential Access Method :
• If the file record that needs to be accessed next is not present next to the
current record, this type of file access method is slow.
• Moving a sizable chunk of the file may be necessary to insert a new
record.
• It does not allow for quick access to specific records in the file. The entire
file must be searched sequentially to find a specific record, which can be
time-consuming.
• It is not well-suited for applications that require frequent updates or
modifications to the file. Updating or inserting a record in the middle of a
large file can be a slow and cumbersome process.
• Sequential access can also result in wasted storage space if records are
of varying lengths. The space between records cannot be used by other
records, which can result in inefficient use of storage.
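
The read next and write next operations can be modelled with a list of records and a single file pointer. This is only a sketch: the class name SequentialFile and the reset method (a rewind to the start) are assumptions for illustration.

```python
class SequentialFile:
    """Records are reached only in order, through one file pointer."""

    def __init__(self, records=None):
        self.records = list(records or [])
        self.position = 0  # index of the next record to read

    def read_next(self):
        """Return the record at the pointer and advance; None at end of file."""
        if self.position >= len(self.records):
            return None
        record = self.records[self.position]
        self.position += 1
        return record

    def write_next(self, record):
        """Append to the end of the file and move the pointer past the new record."""
        self.records.append(record)
        self.position = len(self.records)

    def reset(self):
        """Rewind the pointer to the beginning of the file."""
        self.position = 0
```

To reach the last record after a reset, every earlier record has to be read first, which is the main cost listed in the disadvantages above.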

2. Direct Access –
Another method is the direct access method, also known as the relative access
method. The file is made up of fixed-length logical records that allow
programs to read and write records rapidly, in no particular order. Direct
access is based on the disk model of a file, since a disk allows random
access to any file block. For direct access, the file is viewed as a numbered
sequence of blocks or records. Thus, we may read block 14, then block
59, and then write block 17. There is no restriction on the order
of reading and writing for a direct access file.
A block number provided by the user to the operating system is
normally a relative block number: the first relative block of the file is 0,
the next is 1, and so on.

Advantages of Direct Access Method :


• The files can be immediately accessed decreasing the average access
time.
• In the direct access method, in order to access a block, there is no need
of traversing all the blocks present before it.
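
With fixed-length blocks, the byte offset of relative block n is just n times the block size, so a seek reaches it in one step. The Python sketch below uses an in-memory file from the standard io module; the block size of 16 bytes and the helper names are assumptions for illustration.

```python
import io

BLOCK_SIZE = 16  # bytes per block; an arbitrary value for this sketch


def read_block(f, n):
    """Read relative block n (the first block is 0) without touching earlier blocks."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)


def write_block(f, n, data):
    """Overwrite relative block n in place."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    f.seek(n * BLOCK_SIZE)
    f.write(data)


def make_demo_file(block_count=60):
    """Build an in-memory file whose block n is filled with the byte value n."""
    return io.BytesIO(b"".join(bytes([n]) * BLOCK_SIZE for n in range(block_count)))
```

With a file from make_demo_file(), the sequence from the text (read block 14, read block 59, write block 17) works in any order, because each call computes its own offset.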

3. Index sequential method –


This is another method of accessing a file, built on top of the
sequential access method. It constructs an index for the
file. The index, like an index in the back of a book, contains pointers
to the various blocks. To find a record in the file, we first search the
index and then use the pointer to access the file directly.
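
A minimal sketch of the idea, assuming the records are kept sorted by key and the index holds the first key of each block (one common arrangement; the sample keys and values are made up). The index is searched with Python's standard bisect module, and only the chosen block is then scanned sequentially.

```python
import bisect

# File blocks, each holding (key, value) records sorted by key.
blocks = [
    [(3, "c"), (7, "g"), (9, "i")],
    [(12, "l"), (15, "o"), (18, "r")],
    [(21, "u"), (24, "x"), (26, "z")],
]

# The index stores the first key of every block, in block order.
index = [block[0][0] for block in blocks]


def find(key):
    """Search the index first, then scan only the one block that can hold the key."""
    block_number = bisect.bisect_right(index, key) - 1
    if block_number < 0:
        return None  # key is smaller than every key in the file
    for record_key, value in blocks[block_number]:
        if record_key == key:
            return value
    return None
```

The index stays small because it has one entry per block rather than one per record.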

4. Relative Record Access –


Relative record access is a file access method used in operating systems
where records are accessed relative to the current position of the file pointer.
In this method, records are located based on their position relative to the
current record, rather than by a specific address or key value.
Advantages of Relative Record Access:
• Random Access: Relative record access allows random access to records in
a file. The system can access any record at a specific offset from the current
position of the file pointer.
• Efficient Retrieval: Since the system only needs to read the current record
and any records that need to be skipped, relative record access is more
efficient than sequential access for accessing individual records.
• Useful for Sequential Processing: Relative record access is useful for
processing records in a specific order. For example, if the records are sorted
in a specific order, the system can access the next or previous record
relative to the current position of the file pointer.
Disadvantages of Relative Record Access:
• Fixed Record Length: Relative record access requires fixed-length records. If
the records are of varying length, it may be necessary to use padding to
ensure that each record is the same length.
• Limited Flexibility: Relative record access is not very flexible. It is
difficult to insert or delete records in the middle of a file without
disrupting the relative positions of other records.
• Limited Application: Relative record access is best suited for files that are
accessed sequentially or with some regularity, but it may not be appropriate
for files that are frequently updated or require random access to specific
records.
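
Seeking relative to the current file pointer can be sketched with the whence argument of Python's seek method (io.SEEK_CUR). The record size of 8 bytes, the record contents, and the helper names are assumptions for illustration.

```python
import io

RECORD_SIZE = 8  # fixed record length in bytes; an arbitrary value for this sketch


def read_relative(f, offset_in_records):
    """Move the pointer by a number of records from where it is now, then read one."""
    f.seek(offset_in_records * RECORD_SIZE, io.SEEK_CUR)
    return f.read(RECORD_SIZE)


def make_demo_file(record_count=5):
    """Build an in-memory file of records b'rec00000', b'rec00001', and so on."""
    return io.BytesIO(
        b"".join(f"rec{n:05d}".encode() for n in range(record_count))
    )
```

Each read leaves the pointer just after the record it returned, so an offset of 0 reads the next record, a positive offset skips forward, and a negative offset steps back. The arithmetic only works because every record has the same length.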
5. Content Addressable Access-
Content-addressable access (CAA) is a file access method used in operating
systems that allows records or blocks to be accessed based on their content
rather than their address. In this method, a hash function is used to calculate
a key for each record or block from its content, and the system can access any
record or block by specifying its key.
Advantages of Content-Addressable Access:
• Efficient Search: CAA is ideal for searching large databases or file systems
because it allows for efficient searching based on the content of the records
or blocks.
• Flexibility: CAA is more flexible than other access methods because it allows
for easy insertion and deletion of records or blocks.
• Data Integrity: CAA helps to verify data integrity, because each record or
block has a key generated from its content, so a change to the content
changes the key.
Disadvantages of Content-Addressable Access:
• Overhead: CAA requires additional overhead because the hash function
must be calculated for each record or block.
• Collision: There is a possibility of collision, where two different records
or blocks have the same key. This can be minimized by using a good hash
function, but it cannot be completely eliminated.
• Limited Key Space: The key space is limited by the size of the hash output,
which can lead to collisions and other issues.
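
A content-addressable store can be sketched as a dictionary keyed by a hash of each block. The sketch below assumes SHA-256 from Python's standard hashlib module as the hash function; the class name ContentStore and its methods are hypothetical.

```python
import hashlib


class ContentStore:
    """Blocks are stored and retrieved by a key computed from their content."""

    def __init__(self):
        self._blocks = {}

    def put(self, data):
        """Store a block and return its key (the hex SHA-256 digest of the data)."""
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key):
        """Fetch a block by key, re-hashing it to check that it is unchanged."""
        data = self._blocks[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored block does not match its key")
        return data
```

Storing the same content twice returns the same key, so identical blocks are kept only once, and the re-hash in get is what provides the integrity check described above.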

Disk Structure
The actual physical details of a modern hard disk may be quite complicated.
Put simply, there are one or more surfaces, each of which contains several
tracks, each of which is divided into sectors.

There is one read/write head for every surface of the disk. The same track
on all surfaces is known as a cylinder. When talking about movement of the
read/write heads, the cylinder is a useful concept, because all the heads (one
for each surface) move in and out of the disk together.

We say that the “read/write head is at cylinder #2", when we mean that the
top read/write head is at track #2 of the top surface, the next head is at track
#2 of the next surface, the third head is at track #2 of the third surface, etc.

The unit of information transfer is the sector (though often whole tracks may
be read and written, depending on the hardware). As far as most file systems
are concerned, though, the sectors are what matter. In fact, we usually talk
about a 'block device'. A block often corresponds to a sector, though it need
not: several sectors may be aggregated to form a single logical block.
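
To see how cylinders, heads, and sectors relate to a flat sequence of blocks, the sketch below uses the conventional cylinder-head-sector numbering, in which cylinders and heads count from 0 and sectors count from 1. The geometry values and function names are hypothetical; real disks hide their physical geometry behind logical block numbers.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a (cylinder, head, sector) address to a linear sector number."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def sector_to_block(linear_sector, sectors_per_block):
    """Logical block that contains a sector when several sectors form one block."""
    return linear_sector // sectors_per_block
```

For a hypothetical disk with 4 heads and 8 sectors per track, the first sector of cylinder 2 is linear sector 64, and with 8 sectors per block it falls in logical block 8.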
