Kedar Sovani
(Work in Progress)
Date Started: 30 Aug, 2005
Applies to linux kernel 2.6.11.4
A filesystem manages the persistent storage of data on a storage medium. Every operating system has a native filesystem (ext3 for Linux), but it should also support other filesystems. These may include the ISO9660 filesystem for CD-ROMs, the NT filesystem for mounting Windows partitions, or remote-access filesystems such as NFS and CIFS.
To support a multitude of such filesystems, the operating system provides an abstraction called the VFS, or Virtual Filesystem. Yes, that is what the Linux VFS is: an abstraction, and not a filesystem in itself. The VFS layer is a very good example of the use of object-oriented principles in the Linux kernel. Though an object-oriented language like C++ is not used to implement it, the concepts are taken from the Object Oriented Programming (OOP) paradigm.
Most visibly, the "inheritance" feature of OOP is used in the Linux VFS. The Linux VFS acts as the base class for all the filesystems supported by Linux. It defines a set of objects and methods to operate on these objects. Filesystems (like ext3, reiserfs) inherit this base class and extend its objects to suit their own needs. These filesystems also refine the methods that operate on these objects as per their requirements. Thus, a generalisation-specialisation relationship is maintained between the Linux VFS and Linux filesystems. With such an architecture, an application that uses system calls like open/read/write/close can access a filesystem without being concerned about which particular filesystem it is talking to. The application is simply using the APIs provided by the Linux VFS (the base class).
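The base-class idea above can be sketched in plain user-space C. This is only an illustration, not kernel code: every name prefixed with toy_ is hypothetical, and the single read_block method stands in for the much larger operation tables the kernel actually uses.

```c
#include <stdio.h>

/* Hypothetical "base class": a table of methods that each filesystem fills in. */
struct toy_fs_ops {
        const char *name;
        /* "virtual method": every filesystem supplies its own version */
        int (*read_block)(int blocknr, char *buf, size_t len);
};

/* One "derived class" per filesystem (names are illustrative only). */
static int toy_ext3_read_block(int blocknr, char *buf, size_t len)
{
        return snprintf(buf, len, "ext3 block %d", blocknr);
}

static int toy_iso_read_block(int blocknr, char *buf, size_t len)
{
        return snprintf(buf, len, "iso9660 sector %d", blocknr);
}

static const struct toy_fs_ops toy_ext3_ops = { "ext3", toy_ext3_read_block };
static const struct toy_fs_ops toy_iso_ops = { "iso9660", toy_iso_read_block };

/* The "VFS" entry point: the caller never learns which filesystem answered. */
static int toy_vfs_read(const struct toy_fs_ops *ops, int blocknr,
                        char *buf, size_t len)
{
        return ops->read_block(blocknr, buf, len);
}
```

Calling toy_vfs_read(&toy_ext3_ops, 7, buf, sizeof(buf)) and toy_vfs_read(&toy_iso_ops, 7, buf, sizeof(buf)) goes through the same entry point but dispatches to different code, which is the generalisation-specialisation relationship in miniature.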
We'll take a look at it one-piece-at-a-time. Although the most in-depth knowledge can be had only by experience and spending hours tracing through different calls, we'll try to cover most of the major components and their interactions with each other.
We mentioned above that the Linux VFS defines a set of objects and corresponding methods to operate on these objects. Let's look at what these objects are, and why they exist.
(Defined : include/linux/fs.h : 758)
A superblock, as the name suggests, is the "super" block which defines how to make sense of all other blocks in the filesystem. This block maintains all the details about the filesystem, like the block size, the total number of blocks available on the filesystem, the number of free blocks, the number of inodes (we'll come to them soon) on the filesystem, and importantly, the root inode of this filesystem. The superblock is an on-disk data structure.
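Some of the superblock-level details listed above are visible from user space through the POSIX statvfs() call. The sketch below assumes a POSIX system; the helper name print_fs_summary is hypothetical, and the values reported are what the kernel chooses to expose, not a raw dump of the on-disk superblock.

```c
#include <stdio.h>
#include <sys/statvfs.h>

/*
 * Print a few superblock-style fields for the filesystem that holds `path`.
 * Returns 0 on success, -1 if statvfs() fails.
 */
static int print_fs_summary(const char *path)
{
        struct statvfs sv;

        if (statvfs(path, &sv) != 0) {
                perror("statvfs");
                return -1;
        }
        printf("block size    : %lu\n", (unsigned long)sv.f_bsize);
        /* f_blocks and f_bfree are counted in units of f_frsize */
        printf("fragment size : %lu\n", (unsigned long)sv.f_frsize);
        printf("total blocks  : %llu\n", (unsigned long long)sv.f_blocks);
        printf("free blocks   : %llu\n", (unsigned long long)sv.f_bfree);
        printf("total inodes  : %llu\n", (unsigned long long)sv.f_files);
        return 0;
}
```

For example, print_fs_summary("/") reports these figures for the root filesystem.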
The superblock that we'll discuss here, is the in-core (in memory) version of this super block. The Linux VFS provides us the object "struct super_block" and its associated methods, "struct super_operations" for the same.
For now, we'll talk about a few members in this "struct super_block" object. The others we'll discuss later as we go through the rest of the VFS objects.
The definition of the super_block object shows the default members that the VFS defines for us. Filesystems may extend this object further to include more information. The member "void *s_fs_info" lets a filesystem extend the object as it needs: the filesystem defines its own structure to associate more information with the super_block and hooks that structure off this pointer. For example, the ext3 filesystem extends this object with the structure "struct ext3_sb_info".
E.g.
sbi = kmalloc(sizeof(*sbi), GFP_KERNEL);
if (!sbi)
        return -ENOMEM;
sb->s_fs_info = sbi;
The VFS layer maintains a global list of all the superblocks in the system. The variable "super_blocks" is the head of this list of superblocks. The member "struct list_head s_list" of a super_block, hooks a superblock into this global list of superblocks. The spinlock sb_lock protects this list of super blocks.
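The way s_list hooks a superblock into the global list can be imitated in user space. The sketch below is a simplified model and not the kernel's list implementation: all toy_ names are hypothetical, the list type is a minimal reimplementation of the circular doubly linked list idea, and no locking is done (the kernel would hold sb_lock).

```c
#include <stddef.h>
#include <string.h>

/* Minimal imitation of an intrusive, circular doubly linked list. */
struct toy_list_head {
        struct toy_list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define toy_container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_super_block {
        struct toy_list_head s_list;    /* hooks into the global list */
        const char *s_id;               /* illustrative identifier */
        void *s_fs_info;                /* filesystem-private extension */
};

/* Head of the global list; an empty list points back at itself. */
static struct toy_list_head toy_super_blocks = {
        &toy_super_blocks, &toy_super_blocks
};

static void toy_list_add_tail(struct toy_list_head *entry,
                              struct toy_list_head *head)
{
        entry->prev = head->prev;
        entry->next = head;
        head->prev->next = entry;
        head->prev = entry;
}

/* Walk the global list looking for a superblock with the given id. */
static struct toy_super_block *toy_find_super(const char *id)
{
        struct toy_list_head *p;

        /* The kernel would take sb_lock around this walk. */
        for (p = toy_super_blocks.next; p != &toy_super_blocks; p = p->next) {
                struct toy_super_block *sb =
                        toy_container_of(p, struct toy_super_block, s_list);
                if (strcmp(sb->s_id, id) == 0)
                        return sb;
        }
        return NULL;
}
```

Note that the list node lives inside the superblock itself, so adding a superblock to the list needs no extra allocation.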
Talk about how a superblock is usually created on a mount.
Talk about global superblock linked list.
Talk about *_operations. (super_operations).
Talk about different lists maintained by the super_block.
Then, talk about some important operations that are performed on the super_block.
(Defined : include/linux/fs.h : 430)
An inode, or "index node", is a unique index to an object in the filesystem. The object could be a regular file, a directory, a symbolic link, a device or a socket. An inode stores all the information required to retrieve the metadata (attributes) and data of one single file. The inode is an on-disk data structure.
As with the super_block, the inode that we'll discuss here, is the in-core version of the inode. The Linux VFS provides us the object "struct inode" and its associated methods "struct inode_operations".
An inode is a frequently used entity and, unlike the superblock, of which there is one per filesystem, there could be hundreds or thousands of inodes in a given filesystem. Appropriate management of the inodes is critical to achieving good performance. To this end, the Linux VFS layer maintains an inode cache that holds all the inodes that are currently being used. The inode cache also maintains a list of unused inodes that were recently used, in the expectation that they may be accessed again soon. These recently used, but now unused, inodes may be purged from the cache if VM pressure rises.
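The behaviour described above (in-use inodes stay cached, unused ones linger until memory pressure evicts them) can be modelled with a very small user-space sketch. Everything here is an assumption made for illustration: the toy_ names, the fixed four-slot array, and the linear search bear no resemblance to the kernel's hashed cache, and only the reference-counting idea is carried over.

```c
#include <stddef.h>

#define TOY_CACHE_SLOTS 4

struct toy_cached_inode {
        unsigned long i_ino;
        int i_count;    /* references held; 0 means cached but unused */
        int in_cache;   /* slot currently holds an inode */
};

static struct toy_cached_inode toy_icache[TOY_CACHE_SLOTS];

/* Find the inode in the cache, or "load" it into an empty slot. */
static struct toy_cached_inode *toy_iget(unsigned long ino)
{
        struct toy_cached_inode *empty = NULL;
        int i;

        for (i = 0; i < TOY_CACHE_SLOTS; i++) {
                struct toy_cached_inode *in = &toy_icache[i];

                if (in->in_cache && in->i_ino == ino) {
                        in->i_count++;  /* cache hit, even if it was unused */
                        return in;
                }
                if (!in->in_cache && !empty)
                        empty = in;
        }
        if (!empty)
                return NULL;    /* a real cache would prune and retry */
        empty->i_ino = ino;
        empty->i_count = 1;
        empty->in_cache = 1;
        return empty;
}

/* Drop one reference; the inode stays cached for possible reuse. */
static void toy_iput(struct toy_cached_inode *in)
{
        if (in->i_count > 0)
                in->i_count--;
}

/* Simulate memory pressure: evict every unused inode, return how many. */
static int toy_prune_unused(void)
{
        int i, freed = 0;

        for (i = 0; i < TOY_CACHE_SLOTS; i++) {
                if (toy_icache[i].in_cache && toy_icache[i].i_count == 0) {
                        toy_icache[i].in_cache = 0;
                        freed++;
                }
        }
        return freed;
}
```

An inode released with toy_iput() is still found by a later toy_iget() without being reloaded, until toy_prune_unused() evicts it.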
Let's look at some of the members of the inode structure.
The member "struct inode_operations *i_op" defines the methods for the inode. A filesystem may define different inode operations based on the type of the inode (regular/directory/link/device). We'll discuss these in the later sections.
As with other objects, the inode object can also be extended by filesystems. The VFS provides a mechanism similar to that of the super_block object to extend this object. This is by providing a generic (void *) pointer, "void *generic_ip". Filesystems may define their own structures to associate more information with an inode and hook them off this pointer.
For efficient use of the slab cache, however, the VFS layer recommends that filesystems embed the "struct inode" in their own inode-specific data structure.
E.g.
struct myfs_inode_info {        /* hypothetical name, for illustration */
        /* my fs specific inode info */
        struct inode vfs_inode;
};
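When the VFS inode is embedded this way, the filesystem can get from a "struct inode" pointer back to its own structure with pointer arithmetic on the member offset. The user-space sketch below shows the idea; struct toy_inode, struct toy_myfs_inode_info and TOY_MYFS_I() are hypothetical stand-ins, not kernel definitions.

```c
#include <stddef.h>

/* Stand-in for the VFS inode; the real structure has many more members. */
struct toy_inode {
        unsigned long i_ino;
};

/* Filesystem-specific inode with the VFS inode embedded inside it. */
struct toy_myfs_inode_info {
        unsigned int flags;             /* illustrative private field */
        struct toy_inode vfs_inode;
};

/*
 * Given the embedded VFS inode, step back by the member's offset to
 * reach the enclosing filesystem-specific structure.
 */
static struct toy_myfs_inode_info *TOY_MYFS_I(struct toy_inode *inode)
{
        return (struct toy_myfs_inode_info *)
                ((char *)inode - offsetof(struct toy_myfs_inode_info, vfs_inode));
}
```

One allocation then covers both the generic and the private part of the inode, and no extra pointer has to be followed to reach the private data.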
An inode belongs to a number of lists. Let's look at these:
A super_block for a given filesystem stores a list of all the inodes for that particular filesystem. The head of this list is the member "struct list_head s_inodes" in the super_block structure. The member "struct list_head i_sb_list" in the inode structure hooks onto this linked list.
Note that, the "struct list_head" data type may act as
The member "struct list_head i_list" allows the inode to be hooked into three different lists. This is the inode cache implementation that we talked about earlier.
sb->s_io --> intermediate list, while I/O is being performed on the inode.
inode_operations
(Defined : include/linux/dcache.h : 83)
A dentry, or "directory entry", is what its name suggests. You may have noticed in the Inodes section that there was no mention of the name of the file that an inode refers to. That is not because we forgot to mention it, but because there is a separate structure, called the "dentry", to keep track of that.
A dentry is responsible for storing the names of filesystem objects. Well, that means there ought to be the following members in the dentry structure,
Let's talk about some of the members of this relatively small structure.
There are quite a few other "list_head" members that need some explanation.
Note that, the "struct list_head" data type may act as
The Unix filesystem namespace hierarchy is not a tree but a directed acyclic graph (DAG). A file may therefore have more than one name associated with it, and the same file may be seen at multiple places in the namespace hierarchy. For this reason the name and the position in the namespace were separated from the inode object and placed in a structure of their own, the dentry. Multiple dentries may thus point to a single inode. An inode maintains a list of the dentries that point to it. The member "struct list_head i_dentry" in the inode structure is the head of this list. Dentries hook onto this list by means of the member "struct list_head d_alias" in the dentry structure.
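The "several names, one inode" relationship can be observed from user space with hard links. The sketch below assumes a POSIX system and a filesystem that supports hard links; the helper name and the paths passed to it are hypothetical, and the helper removes both paths before and after it runs.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create `path`, give it a second name `alias` with link(2), and check
 * that both names resolve to the same inode.
 * Returns the inode's link count on success, -1 on any failure.
 */
static long same_inode_link_count(const char *path, const char *alias)
{
        struct stat a, b;
        long result = -1;
        int fd;

        unlink(path);           /* start from a clean state */
        unlink(alias);

        fd = open(path, O_CREAT | O_WRONLY | O_EXCL, 0600);
        if (fd < 0)
                return -1;
        close(fd);

        if (link(path, alias) == 0 &&
            stat(path, &a) == 0 && stat(alias, &b) == 0 &&
            a.st_dev == b.st_dev && a.st_ino == b.st_ino)
                result = (long)a.st_nlink;

        unlink(alias);
        unlink(path);
        return result;
}
```

With two names in place the helper should report a link count of 2: two directory entries, one inode.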
Negative Dentry
A positive dentry holds a reference on the corresponding inode.
(Defined : include/linux/fs.h : 580)
The file structure is used to keep track of open files in a system. The file handle that we get in response to the open(2) system call internally points to this structure. All the attributes related to the opening of the file, and other state information (like the offset) that accumulates while the file descriptor is in use, are maintained in this file structure.
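That the offset lives in the per-open state, and not in the inode, can be seen from user space. In the sketch below (POSIX assumed; the helper name and path are hypothetical) a second open() of the same file gets its own offset, while a descriptor made with dup() shares the offset of the original open.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Write 5 bytes through one descriptor, then report the current offset
 * seen through (a) a separate open() of the same file and (b) a dup()
 * of the writing descriptor. Returns 0 on success, -1 on failure.
 * The file at `path` is truncated and removed afterwards.
 */
static int probe_offsets(const char *path, long *independent, long *shared)
{
        int fd1, fd2, fd3;
        int rc = -1;

        fd1 = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
        if (fd1 < 0)
                return -1;
        fd2 = open(path, O_RDONLY);     /* second, separate open of the file */
        fd3 = dup(fd1);                 /* second descriptor for the same open */

        if (fd2 >= 0 && fd3 >= 0 && write(fd1, "hello", 5) == 5) {
                *independent = (long)lseek(fd2, 0, SEEK_CUR);
                *shared = (long)lseek(fd3, 0, SEEK_CUR);
                rc = 0;
        }

        if (fd2 >= 0)
                close(fd2);
        if (fd3 >= 0)
                close(fd3);
        close(fd1);
        unlink(path);
        return rc;
}
```

After the write, the separately opened descriptor is still at offset 0, while the dup()ed descriptor reports offset 5.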
The translation was initiated by Kedar Sovani on 2007-03-20