Unit 3 - Files
In this chapter we learn how to create, open, read, write, and close files.
Programs can use disk files, serial ports, printers and other devices in exactly the same way
as they would use a file.
Most files on a UNIX system are regular files or directories, but there are additional types of
files:
1. Regular files: The most common type of file, which contains data of some form. There
is no distinction to the UNIX kernel whether this data is text or binary.
2. Directory file: A file that contains the names of other files and pointers to information on
these files. Any process that has read permission for a directory file can read the contents
of the directory, but only the kernel can write to a directory file.
3. Character special file: A type of file used for certain types of devices on a system.
4. Block special file: A type of file typically used for disk devices. All devices on a
system are either character special files or block special files.
5. FIFO: A type of file used for communication between processes. It’s
sometimes called a named pipe.
6. Socket: A type of file used for network communication between processes. A socket
can also be used for non-network communication between processes on a single host.
7. Symbolic link: A type of file that points to another file.
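The type of a file is encoded in the st_mode member of the stat structure and can be tested with
macros such as S_ISSOCK(), which is true for a socket.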
A user, neil, usually has his files stored in a 'home' directory, perhaps /home/neil.
3.4.1 Inodes
File attributes
Attribute value meaning
File type type of the file
Access permission file access permission for owner, group and others
1. Every process has an entry in the process table. Within each process table entry is a table
of open file descriptors, which is taken as a vector, with one entry per descriptor.
Associated with each file descriptor are
a. The file descriptor flags.
b. A pointer to a file table entry.
2. The kernel maintains a file table for all open files. Each file table entry contains
a. The file status flags for the file (read, write, append, sync, nonblocking, etc.),
b. The current file offset,
c. A pointer to the v-node table entry for the file.
3. Each open file (or device) has a v-node structure. The v-node contains information about
the type of file and pointers to functions that operate on the file. For most files the v-
node also contains the i-node for the file. This information is read from disk when the
file is opened, so that all the pertinent information about the file is readily available.
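Consider the arrangement of these three tables for a single process that has two different files
open: one file is open on standard input (file descriptor 0) and the other is open on standard
output (file descriptor 1). Each descriptor points to its own file table entry, and each file table
entry points to its own v-node.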
Now suppose two independent processes have the same file open, the first process on descriptor 3
and the second process on descriptor 4. Each process that opens the file gets its own file table
entry, but there is only a single v-node table entry for the file. One reason each process gets its
own file table entry is so that each process has its own current offset for the file.
After each ‘write’ is complete, the current file offset in the file table entry is incremented
by the number of bytes written. If this causes the current file offset to exceed the current
file size, the current file size in the i-node table entry is set to the current file offset
(that is, the file is extended).
If a file is opened with O_APPEND flag, a corresponding flag is set in the file status flags
of the file table entry. Each time a ‘write’ is performed for a file with this append flag
set, the current file offset in the file table entry is first set to the current file size from the
i-node table entry. This forces every ‘write’ to be appended to the current end of file.
The ‘lseek’ function only modifies the current offset in the file table entry. No I/O takes
place.
If a file is positioned to its current end of file using lseek, all that happens is the current
file offset in the file table entry is set to the current file size from the i-node table entry.
It is possible for more than one file descriptor entry to point to the same file table entry. The
file descriptor flags apply only to a single descriptor in a single process, while the file status
flags apply to all descriptors in any process that point to the given file table entry.
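The offset and O_APPEND rules above can be checked with a short experiment. The following is a minimal sketch, not a listing from the original notes: the function names offset_after_other_write and append_ignores_seek are hypothetical, and the file named by path is assumed to exist and be writable.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens path twice and writes 5 bytes through the first descriptor.
 * Returns the offset seen through the second descriptor. Each open()
 * creates its own file table entry, so the second offset stays at 0. */
off_t offset_after_other_write(const char *path)
{
    int fd1 = open(path, O_WRONLY);
    int fd2 = open(path, O_WRONLY);
    off_t pos = -1;

    if (fd1 != -1 && fd2 != -1 && write(fd1, "hello", 5) == 5)
        pos = lseek(fd2, 0, SEEK_CUR);   /* query fd2's own offset */

    if (fd1 != -1)
        close(fd1);
    if (fd2 != -1)
        close(fd2);
    return pos;
}

/* Seeks to the start of an O_APPEND descriptor, writes 3 bytes and
 * returns the resulting offset. The write still lands at end of file,
 * so the result is the old file size plus 3. Returns -1 on error. */
off_t append_ignores_seek(const char *path)
{
    int fd = open(path, O_WRONLY | O_APPEND);
    off_t pos = -1;

    if (fd == -1)
        return -1;
    if (lseek(fd, 0, SEEK_SET) == 0 && write(fd, "abc", 3) == 3)
        pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

Note that lseek with an offset of 0 and SEEK_CUR is used here only to read the current offset back; it does not move it.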
System calls are provided by UNIX to access and control files and devices.
Library Functions
To provide a higher-level interface to devices and disk files, UNIX provides a number of
standard libraries.
Low-level File Access
Each running program, called a process, has associated with it a number of file descriptors.
When a program starts, it usually has three of these descriptors already opened: 0 (standard
input), 1 (standard output) and 2 (standard error).
The write system call arranges for the first nbytes bytes from buf to be written to the file
associated with the file descriptor fildes.
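The listing for simple_write is not reproduced in these notes. As a minimal sketch of the call, assuming a hypothetical helper named write_greeting, the message can be written to any descriptor and the return value compared with the number of bytes requested:

```c
#include <unistd.h>

/* Writes a fixed message to fd with a single write() call.
 * write() returns the number of bytes actually written, which may be
 * fewer than requested, so anything other than the full length is
 * treated as a failure. Returns 0 on success, -1 otherwise. */
int write_greeting(int fd)
{
    static const char msg[] = "Here is some data\n";
    size_t len = sizeof msg - 1;          /* exclude the trailing '\0' */

    if (write(fd, msg, len) != (ssize_t)len)
        return -1;
    return 0;
}
```

Calling write_greeting(1) from main sends the message to standard output.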
$ simple_write
Here is some data
$
read
The read system call reads up to nbytes of data from the file associated with the file
descriptor fildes and places them in the data area buf.
This program, simple_read.c, copies the first 128 bytes of the standard input to the standard
output.
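The listing for simple_read.c is missing here. A minimal sketch of the same idea follows, written as a hypothetical function copy_first_block that takes the two descriptors as parameters so that it is not tied to descriptors 0 and 1:

```c
#include <unistd.h>

/* Reads up to 128 bytes from in_fd with one read() call and writes
 * whatever was read to out_fd. read() returns 0 at end of file and
 * -1 on error. Returns the number of bytes copied, or -1 on error. */
ssize_t copy_first_block(int in_fd, int out_fd)
{
    char buffer[128];
    ssize_t nread = read(in_fd, buffer, sizeof buffer);

    if (nread == -1)
        return -1;
    if (write(out_fd, buffer, (size_t)nread) != nread)
        return -1;
    return nread;
}
```

Calling copy_first_block(0, 1) copies from standard input to standard output. Because read may return fewer bytes than requested, the return value, not the buffer size, is passed to write.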
open
To create a new file descriptor we need to use the open system call.
The name of the file or device to be opened is passed as a parameter, path, and
the oflags parameter is used to specify actions to be taken on opening the file.
The oflags are specified as a bitwise OR of a mandatory file access mode and other optional
modes. The open call must specify one of the following file access modes:
The call may also include a combination (bitwise OR) of the following optional modes in
the oflags parameter:
Initial Permissions
When we create a file using the O_CREAT flag with open, we must use the three-parameter
form. mode, the third parameter, is made from a bitwise OR of the flags defined in the header
file sys/stat.h. These are:
For example, the call open("myfile", O_CREAT, S_IRUSR|S_IXOTH) has the effect of creating a
file called myfile, with read permission for the owner and execute permission for others, and
only those permissions.
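A sketch of this kind of call is shown below. The function name create_with_mode is hypothetical, and O_WRONLY and O_EXCL are additions for the sake of the example (O_EXCL makes the call fail if the file already exists). The permissions actually granted also depend on the process umask, described next, so the function reads them back with fstat.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates path requesting read permission for the owner and execute
 * permission for others. Returns the permission bits the new file
 * actually has, or (mode_t)-1 if the file exists or cannot be created. */
mode_t create_with_mode(const char *path)
{
    struct stat sb;
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IXOTH);

    if (fd == -1)
        return (mode_t)-1;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return (mode_t)-1;
    }
    close(fd);
    return sb.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO);
}
```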
umask
The umask is a system variable that encodes a mask for file permissions to be used when a file is
created.
You can change the variable by executing the umask command to supply a new value.
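The value is a three-digit octal value. Each digit is the result of ORing values from 1, 2, or 4:
4 blocks read, 2 blocks write and 1 blocks execute permission. The first digit applies to the
user, the second to the group and the third to others.
For example, to block 'group' write and execute, and 'other' write, the umask would be:
Digit 1 (user): 0
Digit 2 (group): 2 (write), 1 (execute)
Digit 3 (other): 2 (write)
Values for each digit are ORed together; so digit 2 will have 2 | 1, giving 3. The
resulting umask is 032.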
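The same mask can be set from a program with the umask function declared in sys/stat.h. The sketch below assumes a hypothetical helper, mode_under_umask_032, which requests mode 0777 for a new file under a umask of 032. The bits set in the mask are cleared from the requested mode, so the file ends up with 0777 & ~032 = 0745.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sets the process umask to 032, creates path requesting mode 0777 and
 * returns the permission bits the new file ends up with. The previous
 * umask is restored before returning. Returns (mode_t)-1 on error. */
mode_t mode_under_umask_032(const char *path)
{
    struct stat sb;
    mode_t old_mask = umask(032);     /* umask() returns the previous mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0777);

    umask(old_mask);
    if (fd == -1)
        return (mode_t)-1;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return (mode_t)-1;
    }
    close(fd);
    return sb.st_mode & 0777;
}
```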
close
We use close to terminate the association between a file descriptor, fildes, and its file.
ioctl
ioctl is a bit of a rag-bag of things. It provides an interface for controlling the behavior of
devices and their descriptors, and for configuring underlying services.
ioctl performs the function indicated by cmd on the object referenced by the descriptor fildes.
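Because the available cmd values depend on the device and on the system, any example is necessarily system-specific. The sketch below assumes the FIONREAD request, which is not part of POSIX but is available on Linux and the BSDs, and uses a hypothetical helper named bytes_pending.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Asks the driver behind fd how many bytes are waiting to be read.
 * FIONREAD stores the count in an int supplied by the caller.
 * Returns the count, or -1 if the request is not supported for fd. */
int bytes_pending(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}
```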
We now know enough about the open, read and write system calls to write a low-level
program, copy_system.c, to copy one file to another, character by character.
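The copy_system.c listing is not reproduced in these notes. A minimal sketch of a character-by-character copy is given below as a hypothetical function, copy_by_char, that takes the two file names as parameters; the output file mode (read and write for the owner) is an assumption.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copies src to dst using one read() and one write() system call per
 * byte. Returns the number of bytes copied, or -1 on error. */
long copy_by_char(const char *src, const char *dst)
{
    char c;
    long total = 0;
    ssize_t n;
    int in, out;

    in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (out == -1) {
        close(in);
        return -1;
    }
    while ((n = read(in, &c, 1)) == 1) {
        if (write(out, &c, 1) != 1) {
            n = -1;                   /* remember the write failure */
            break;
        }
        total++;
    }
    close(in);
    close(out);
    return n == -1 ? -1 : total;
}
```

Every byte costs two system calls, which is why this version is slow on large files.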
We can improve on this by copying in larger blocks. Here is the improved copy_block.c program.
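The copy_block.c listing is also missing. The sketch below shows one way to do a block copy; the function name copy_by_block and the 1024-byte block size are assumptions, not taken from the original program.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copies src to dst in blocks of up to 1024 bytes, so each pair of
 * system calls moves many bytes instead of one.
 * Returns the number of bytes copied, or -1 on error. */
long copy_by_block(const char *src, const char *dst)
{
    char block[1024];
    long total = 0;
    ssize_t n;
    int in, out;

    in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (out == -1) {
        close(in);
        return -1;
    }
    while ((n = read(in, block, sizeof block)) > 0) {
        if (write(out, block, (size_t)n) != n) {
            n = -1;                   /* short or failed write */
            break;
        }
        total += n;
    }
    close(in);
    close(out);
    return n == -1 ? -1 : total;
}
```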
Now try the program, first removing the old output file:
The revised program took under two seconds to do the copy.
Here are some system calls that operate on these low-level file descriptors.
lseek
The lseek system call sets the read/write pointer of a file descriptor, fildes. You use it to set
where in the file the next read or write will occur.
The offset parameter is used to specify the position and the whence parameter specifies how the
offset is used.
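The whence parameter takes one of SEEK_SET (offset is an absolute position), SEEK_CUR (offset is relative to the current position) or SEEK_END (offset is relative to the end of the file). As a minimal sketch, the hypothetical helper size_by_seeking below uses all three to find the size of an open file without disturbing its current position.

```c
#include <sys/types.h>
#include <unistd.h>

/* Returns the size of the file open on fd by seeking to its end, then
 * puts the offset back where it was. Returns -1 on error, for example
 * when fd refers to a pipe, which cannot seek. */
off_t size_by_seeking(int fd)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);   /* read the current offset */
    off_t end;

    if (saved == (off_t)-1)
        return -1;
    end = lseek(fd, 0, SEEK_END);           /* 0 bytes from end of file */
    if (end == (off_t)-1)
        return -1;
    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return -1;
    return end;
}
```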
The dup and dup2 system calls provide a way of duplicating a file descriptor, giving two or more
different descriptors that access the same file.
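A duplicated descriptor shares its file table entry, and therefore its file offset, with the original. The sketch below illustrates this with a hypothetical helper, offset_after_dup_write; fd is assumed to be open for writing on a regular file.

```c
#include <sys/types.h>
#include <unistd.h>

/* Duplicates fd, writes 3 bytes through the copy and returns the offset
 * seen through the original descriptor. Because both descriptors share
 * one file table entry, the original's offset has advanced by 3 as well.
 * Returns -1 on error. */
off_t offset_after_dup_write(int fd)
{
    off_t pos = -1;
    int copy = dup(fd);       /* lowest-numbered unused descriptor */

    if (copy == -1)
        return -1;
    if (write(copy, "abc", 3) == 3)
        pos = lseek(fd, 0, SEEK_CUR);
    close(copy);
    return pos;
}
```

dup2(fd, newfd) behaves in the same way but lets the caller choose the new descriptor number, closing newfd first if it is already open.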
The members of the structure, stat, may vary between UNIX systems, but will include:
The permissions flags are the same as for the open system call above. File-type flags include:
Other mode flags include:
There are some macros defined to help with determining file types. These include:
To test that a file doesn't represent a directory and has execute permission set for the owner and
no other permissions, we can use the test:
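The test itself is missing from these notes. One way to write it, as a sketch following the wording above, is to mask st_mode with all nine permission bits and compare the result with S_IXUSR. The function name is_owner_exec_only is hypothetical.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if path is not a directory and its only permission bit is
 * execute for the owner, 0 if the test fails, and -1 if stat() fails. */
int is_owner_exec_only(const char *path)
{
    struct stat sb;
    mode_t perms;

    if (stat(path, &sb) == -1)
        return -1;
    perms = sb.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO);
    return !S_ISDIR(sb.st_mode) && perms == S_IXUSR;
}
```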
3.7 File and record locking-fcntl function
• A write lock is also called an exclusive lock and a read lock is also called a shared lock.
• fcntl API can be used to impose read or write locks on either a segment or an entire file.
• Function prototype:
#include <fcntl.h>
int fcntl(int fd, int cmd, ...);
• All file locks set by a process will be unlocked when the process terminates.
You can change the permissions on a file or directory using the chmod system call. This forms the
basis of the chmod shell program.
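As a minimal sketch, the hypothetical helper make_world_readable below sets a file to rw-r--r-- using the same permission flags as open and reads the result back with stat. Unlike the mode passed to open, the mode passed to chmod is not modified by the umask.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Sets path to rw-r--r-- (octal 0644) and returns the permission bits
 * read back with stat(), or (mode_t)-1 on error. */
mode_t make_world_readable(const char *path)
{
    struct stat sb;

    if (chmod(path, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) == -1)
        return (mode_t)-1;
    if (stat(path, &sb) == -1)
        return (mode_t)-1;
    return sb.st_mode & 07777;
}
```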
3.9 chown
A superuser can change the owner of a file using the chown system call.
Soft links (symbolic links): Refer to a symbolic path indicating the abstract location of another
file.
Used to provide alternative means of referencing files.
Users may create symbolic links for files using the ln command with the -s option.
Hard links: Refer to the specific location of physical data.
A hard link is a UNIX path name for a file.
Most files have only one hard link. However, users may create additional hard links for
files using the ln command.
Limitations:
Users cannot create hard links for directories unless they have superuser privileges.
Users cannot create hard links that reference files on a different file system.
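The ln command is built on the link and symlink system calls. The sketch below, with the hypothetical helper names hard_link_count and make_symlink, shows the difference as the kernel sees it: a hard link raises the link count of the existing file, while a symbolic link is a separate file of its own type, which lstat reports without following it.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Creates linkname as an additional hard link to target and returns the
 * link count of target afterwards, or -1 on error. */
long hard_link_count(const char *target, const char *linkname)
{
    struct stat sb;

    if (link(target, linkname) == -1 || stat(target, &sb) == -1)
        return -1;
    return (long)sb.st_nlink;
}

/* Creates linkname as a symbolic link to target. Returns 1 if lstat()
 * reports the new file as a symbolic link, 0 if not, -1 on error. */
int make_symlink(const char *target, const char *linkname)
{
    struct stat sb;

    if (symlink(target, linkname) == -1 || lstat(linkname, &sb) == -1)
        return -1;
    return S_ISLNK(sb.st_mode) ? 1 : 0;
}
```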
3.11 Directories
As well as its contents, a file has a name and 'administrative information', i.e. the file's
creation/modification date and its permissions.
The permissions are stored in the inode, which also contains the length of the file and where on
the disc it's stored.
A directory is a file that holds the inode numbers and names of other files.
We can create and remove directories using the mkdir and rmdir system calls.
The mkdir system call makes a new directory with path as its name.
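A minimal sketch of the two calls follows. The helper name make_and_remove_dir and the mode 0755 are assumptions; as with open, the requested mode is further restricted by the umask.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a directory, confirms with stat() that it is a directory,
 * then removes it again. Returns 0 if every step succeeded, -1 if not. */
int make_and_remove_dir(const char *path)
{
    struct stat sb;

    if (mkdir(path, 0755) == -1)
        return -1;            /* fails if path already exists */
    if (stat(path, &sb) == -1 || !S_ISDIR(sb.st_mode)) {
        rmdir(path);
        return -1;
    }
    return rmdir(path);       /* rmdir succeeds only on an empty directory */
}
```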
3.11.2 chdir
A program can determine its current working directory by calling the getcwd library function.
The getcwd function writes the name of the current directory into the given buffer, buf.
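The chdir system call changes the current working directory in the same way that the cd command does in the shell. The two calls can be combined as in the sketch below; the helper name cwd_after_chdir is hypothetical.

```c
#include <stddef.h>
#include <unistd.h>

/* Changes the working directory to dir and writes the resulting
 * absolute path into buf, which holds size bytes. getcwd() returns
 * NULL if the name, including its terminating '\0', does not fit.
 * Returns 0 on success, -1 on error. */
int cwd_after_chdir(const char *dir, char *buf, size_t size)
{
    if (chdir(dir) == -1)
        return -1;
    return getcwd(buf, size) == NULL ? -1 : 0;
}
```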
3.13 Scanning Directories
The directory functions are declared in a header file, dirent.h. They use a structure, DIR, as a
basis for directory manipulation.
3.13.1 opendir
3.13.2 readdir
The readdir function returns a pointer to a structure detailing the next directory entry in the
directory stream dirp.
The dirent structure containing directory entry details includes the following entries: d_ino,
the inode number of the file, and d_name, the name of the file.
telldir
The telldir function returns a value that records the current position in a directory stream.
seekdir
The seekdir function sets the directory entry pointer in the directory stream given by dirp.
3.13.3 closedir
The closedir function closes a directory stream and frees up the resources associated with it.
1. The printdir function prints out the current directory. It will recurse for subdirectories.
2. Now we move on to the main function:
The program produces output like this (edited for brevity):
How It Works
After some initial error checking, using opendir, to see that the directory exists, printdir makes
a call to chdir to the directory specified. While the entries returned by readdir aren't null, the
program checks to see whether the entry is a directory. If it isn't, it prints the file entry with
indentation depth.