COS 318

COS 318 : Operating system
Fall 2004, Princeton University

Project 6: File System

Important Dates:

Design review: Tuesday, December 7, 2004

Project due: 11:59 PM, Monday, January 10, 2005

Simple File System

In this project you will implement a simple UNIX-like file system that relies on i-nodes, blocks and a superblock to represent a hierarchical directory structure. Files and directories can grow and shrink, and you need to manage free space. You are able to browse the directory structure, create new files and directories, delete them, etc. The functionality is similar to UNIX filesystems, but it doesn't include permission and user management. On the other hand, a naive implementation has a low complexity if efficiency is not a concern.

You will be required to implement the following system calls : fs_mkfs, fs_open, fs_close, fs_read, fs_write, fs_unlink, fs_mkdir, fs_chdir, fs_rmdir, fs_lseek, fs_stat and fs_link. Most of these are standard UNIX system calls and you can look up man pages for descriptions. You will be implementing simpler versions of these functions (no notion of permissions/users in this file system, open takes only read only, write only, read/write flags). The code given to you provides an implementation for the fs_fsck system call; this will come handy when debugging your own filesystem.

The shell has been modified so that you can access these functions from the prompt. We have also added a few commands like "ls", "cat", ... to the shell. Look at shell.c to find a full list of supported commands. (“create filename size_of_file” is handy to test creating a file named “filename” with “size_of_file” bytes.)

You should debug your file system in the Linux environment. We have provided enough functions so that you can test your entire implementation on Linux without having to reboot each time you make a change. Once it is working, you can move it to our OS to use the USB drive. This way you can use the debugging tools on Linux for this project.

Function name	Function type	Description
fs_init	Internal routine	Initialize the file system, e.g., the file descriptor table
fs_fsck	System call	Check the integrity of the file system, this is done for you.
fs_mkfs	System call	Make a new file system, i.e., format a disk.
fs_open	System call	Open a file with given flags.
fs_close	System call	Close a file.
fs_read	System call	Read some bytes from a file into a buffer
fs_write	System call	Write some bytes to a file from a buffer
fs_lseek	System call	Move the file pointer to a new location
fs_mkdir	System call	Create a sub-directory under the current directory
fs_rmdir	System call	Remove a sub-directory
fs_chdir	System call	Change the current directory
fs_link	System call	Create a link to a file.
fs_unlink	System call	Remove a link to a file
fs_stat	System call	Get the status of a file

Files for this assignment

You should use the code available in /u/cos318/6_pre/ as a starting point for your project. Don't be daunted by the large number of files since you are already familar with many of them and also most files do not require to be touched at all. Only a few files need to be changed and they are highlighted in the following table.

You should use the functions provided in block.h to read and write blocks in the file system. These functions will use a file (on linux) or the USB disk (on our OS) as the store. Two different implementations of block.c are available for the linux/USB systems.

File name	Description	Need to modify?	For Extra Credit
Makefile	A makefile for you to compile with the right options	No
block.c	Read/write a block of the file system	No
block.h	header file for block.c and blockFake.c	No
blockFake.c	Simulates the block operation on Linux	No
fs.c	FS system calls	Yes	Yes
fs.h	header file for fs.c	No
bootblock.s	Code for the bootblock that handles arbitrary size image files and also switches the machines into protected-mode and creates a flat address space for you. The details will be explained in the precept.	No
createimage.c	Your familar Linux tool to create a bootable operating system image on a floppy diskette	No
kernel.c	kernel entry code	No
kernel.h	Include some useful prototypes.	No
scheduler.c	scheduler code	No
scheduler.h	Include some useful prototypes.	No
interrupt.c	Interrupt handlers	No
interrupt.h	header file for interrupt.c	No
keyboard.c	Keyboard handling code	No
keyboard.h	Header file for interrupt.c	No
usb.c	A simple usb driver.	No
usb.h	Header file for usb.c	No
mbox.c	A template you will use to write mailbox code	No
mbox.h	Header file for msg.c	No
memory.c	Virtual memory.	No
memory.h	Header file for memory.c	No
common.h	common definitions between the kernel and user processes	No
thread.h	Include the prototypes of lock and synchronization primitives.	No
thread.c	synchronization primitive implementations.	No
th.h	header file of th1.c and th2.c	No
th1.c	Code of the first kernel thread	No
th2.c	Code of two kernel threads to test synchronization primitives	No
process1.c	The first user process	No
process2.c	The second user process	No
process3.c	One of the processes to test the mailboxes	No
process4.c	The other process used to test the mailboxes.	No
shell.c	A primitive shell that uses the keyboard input.	No
shellutil.c	I/O routines used by shell.	No
shellutil.h	Header file for shellutil.c and shellutilFake.c	No
shellutilFake.c	Simulates shellutil.c on Linux	No
util.h	Contains the prototypes of util.c	No
util.c	Contains useful routines like print_int() and gettimer()	No
syslib.h	Contains the prototypes of syslib.c	No
syslib.c	system calls.	No

Testing on Linux

In addition, we provide a few files which can be used to test your code on linux

· shellutilFake.c : Fake some helper functions for the shell

· blockFake.c : Uses a UNIX file instead of a disk

You can test your code on linux by typing "make lnxsh" and executing "lnxsh" on linux. This program will read/write to a file called "disk".

Detailed Requirements and Hints

First you need to design your disk layout - boot block, super block, i-nodes, and data blocks. You may choose the number of i-nodes and data blocks your system will have, or you can rely on the provided constants (recommended). The file system needs to keep track what data blocks are free; a bitmap of all blocks is an OK solution for a small number of blocks. If you're not trying to achieve efficiency, you shouldn't be concerned about locality of data access. This means fragmentation/seek time may be ignored.

Your i-node implementation can use a greatly simplified i-node structure instead of using 13 pointers like in Unix file systems. Your i-node can simply have 6 pointers (all direct pointers). You don't have to implement indirect pointers for the required features. You get an extra credit for implementing the indirect pointers for i-node structure.

Directories are implemented similar to files, i.e. a list of data blocks pointed to by the directory i-node. Each i-node has a type that specifies whether it is a file or directory. A directory needs to store a list of entries that are identified by a string (i.e. file/directory name) and point to an i-node. Since a directory is just a chunk of data , one can store these entries sequentially. A more efficient scheme allows faster search/insert times - you could get up to two extra credits for implementing B-tree directory representation. For the required features you may rely on fixed sized strings, making indexing in the directory structure straightforward.

Multiple names from multiple directories might point to the same i-node. This can be achieved by hard linking other names to existing files/dirs ("ln" under UNIX). When one of these links is deleted by deleting the corresponding directory entry ("unlink"), the i-node might still be referenced by another name. This means you need to count the number of links to an i-node, and update it appropriately.

All file system functions might return errors if the operation fails, the arguments are invalid, or for other reasons. On failure these functions return -1 and print an error message with ERROR_MSG().

When a file is opened, a file descriptor for that file is created. The file descriptor is an integer later used by read/write/seek operations. For each file descriptor the file system maintains an entry storing the access mode to the file, the i-node number, the read pointer within the file, etc. These file descriptor entries are allocated in a table. When the table is full no more files can be opened until an entry is freed by a file close operation.

Management of free space may be implemented in the naive way, by always allocating the first free available data block when a new block is required. A correct solution only needs to meet the specifications. Do not worry about permission checking, ownership, and other features provided by a UNIX file system that have not been covered here.

Extra Credit

You will receive an extra credit if you implement the single, double and triple indirect pointers in the i-node structure of your file system. This upgrade allows larger files to be stored. Some hooks are already provided; you will have to change the read/write functions.

Directories can be implemented simply as a list of fixed size structures, each storing the name and i-node pointer of an entry. You will receive two extra credits if you implement a B-tree structure that allows arbitrary length entry names (You can still assume that the entry name will not be bigger than block size, but you can not rely on fixed sized strings.).

Design Review

For the design review, please send me a plain text file with brief description of each file system function you need to implement. Make sure you understand how functions read / modify i-nodes and the bitmap allocation table (Try not to miss these critical parts). Study how directories are represented as a file, and make sure you understand how to perform operations on directories.

Submitting your program

Submit your final solution electronically using the following command:

/u/cos318/bin/318submit 6 README changed-files ...

The submit command copies your files to the directory /u/cos318/submit/UserLogin/number and lists all the files that you have submitted for assignment number. UserLogin is your user account name. If you execute submit after the project deadline, your files are placed in directory number-late. You can run submit more than once, and you can submit partial lists of files.

CS318 Staff