Can you make the followup e-mails shorter or more concise?
On this e-mail, just the questions with a blank line in between take up 76 lines, so my "bloat factor" is about 5. That isn't too bad, since these are supposed to be the questions that I don't have time to answer in class.

What does it mean that SCAN/C-SCAN doesn't go to the end in real implementations?
On the slides, the disk head went all the way to the edges of the disk. In reality, you only have to go as far as the last request, which may not actually be on the last track.

How exactly do filesystems deal with mechanical disk failures?
Some failures you can't handle. In other cases, you try to do things like replicating the superblock so that all of the copies can't be wiped out easily.

What's the subtle unfairness when using SCAN?
Consider a block in the middle of the disk - assuming that you scan to the edges, it gets serviced at a very regular frequency. In other words, if you sit on the middle track, the time the head spends to your "left" equals the time it spends to your "right". If you're located toward an edge, however, one of these times shrinks and the other grows. If the shorter time is smaller than the application's think time, that location on disk will only get serviced when the head is moving in one direction.

I heard IBM is developing storage that uses holograms to store info. If holographic storage works, would that render current disk techniques obsolete?
Any technology that changes the access characteristics of disks will make some of the current techniques useless, while it may open the doors for new ones. If holographic or crystal storage gets cheap enough and reliable enough, it'll displace disks for a large class of applications.

How can a process make more than one simultaneous request to the disk?
Go to a Solaris machine and type "man aioread" for the details, or see http://www.netsys.com/cgi-bin/man2html?aioread(3AIO) If you're not sure which OS you're on, see "man uname".

Where on disk are the inodes stored?
Early systems stored all inodes at the beginning of the disk in one big array. Now, when inodes are preallocated, they are usually split into several smaller arrays and placed around the disk on different cylinders.

How does the file system keep track of different file types?
The file system only knows about a small number of file types, and these are recorded in the inode. For example, the OS knows the difference between a "regular" file, a directory, and a device file. However, it doesn't know the difference between GIFs and ".c" files; that logic is left up to the applications.

Why is it easier to tell "what happened" with a filesystem model?
(I'm assuming this means in the presence of failures.) Let's say you're trying to create a new file and the power fails during the process. It's possible that an inode was allocated, or maybe a data block was allocated, and even that the free space bitmap was updated. However, until the entry is reflected in the directory file, the operation isn't logically complete. It's easier to just say "failed" or "completed" rather than having to determine at exactly which step of the process the failure occurred.

Define bandwidth and explain how block size affects how much is wasted.
Bandwidth is simply (data transferred) / time. The table showing different block sizes and the effective transfer rate is basically a measure of how much of the theoretical transfer rate is lost once you factor in seek time and rotational delay. You can think of "block size" as the length of a single continuous read from disk. The effective transfer rate is therefore the number of bytes transferred divided by the sum of the seek time, rotational delay, and transfer time.

How long is a process cycle in milliseconds?
Assuming we're talking about the time that each process can run before the CPU switches to something else, it's generally on the order of 10-100 milliseconds, barring interrupts pre-empting it.

How does the OS decide what to cache?
Caches generally follow the principle of "if it was used recently, it's likely to be used again in the near future". In practice, this means that when they need space, they tend to throw away whichever object hasn't been used in the longest time. This policy is known as "least recently used", or LRU.

How much more does the IBM UltraStar cost than the Seagate Barracuda?
I went to an online store and found that the cheapest ATA drive I could get is a 20GB Maxtor Fireball for $65. It's 5400 RPM, has 2MB of cache, and is ATA/100. The cheapest SCSI drive is a Maxtor Atlas 18GB, with 8MB of cache and 10000 RPM, for $140. The cheapest 36GB SCSI drive was $230, while a 40GB IDE is only $76.

Are there any good reasons for using files that aren't streams of bytes?
These days, not really. It used to be simpler for text editors to assume that text lines mapped to punch cards, so you could find the start of line N by going to position N*80, or just to record N.

Why are there so many elves and magic numbers in an engineering class?
We need so many new words that we just run out. The physicists use words like "quark", while we use words like "cookie".

What file information is kept in memory at any arbitrary time?
The OS will keep some inodes, some directory blocks, some portion of the free space bitmap, and some data blocks for recently-used files.

How did I get the memory transfer rate for modern systems?
See http://www.cs.virginia.edu/stream/

What's the relationship between Flemish and Dutch?
See http://www.calvino.demon.co.uk/Dutch_Translation/flemish_hist.html

How are long file names handled, especially in DOS vs Windows?
Unix used to have short file names as well.
What this meant was that the directory blocks could be thought of as arrays of file names. However, Unix systems made the switch some time ago and dealt with the consequences. What DOS seems to have done is create a parallel directory structure that contains the long names and maps them to the short names.

Are we going to cover RAID and striping?
Assuming I can get back on track, yes.

Aren't Windows shortcuts pretty much symbolic links?
Yes, but it appears that the filesystem wasn't natively designed to support them in a clean way, so there was probably a fair bit of ugliness.

If the disk is lying about the layout, how effectively can the OS carry out the access ordering optimizations we discussed?
It's obviously not perfect, and the hope is that the disk isn't lying too badly. In general, though, it doesn't matter what the exact layout of the disk is, as long as the mapping isn't completely arbitrary. In other words, higher track numbers in the "fake" info should correspond to higher track numbers among the "real" tracks.

For FIFO, would the disk diagrams show the access going off one side and appearing on the other because the disk is circular?
The disk head has only one degree of freedom - it moves toward the edge or toward the center. Each track really forms a ring. The rectangular diagrams in the talk have head position on the X axis and time on the Y axis.

What happens if you write to the cache and something wants to read before those bytes are written?
All caches have to handle this case; they have all reads check what's waiting in the write buffer to ensure that reads see the latest data.

Is there any way to rebuild a superblock by examining the disk for patterns?
Probably - my guess is that professional disk repair services use various techniques to detect common layouts.

What happens if a file contains the magic cookie but isn't a unix executable?
Nothing too horrible - the magic cookie is just used as a convenience.
If you try to run something that isn't a real executable, you'll just get a seg fault, an illegal instruction, or something similar.

I've heard of prediction algorithms that analyze disk access patterns and try to load the cache. Does this provide much of an improvement?
Done right, it can. Many OSs will detect when programs seem to be reading a file sequentially, and if so, will automatically schedule a large read when the program performs a small read. The extra data is then cached by the OS and delivered to the application only when requested.

Is the superblock the main difference between different filesystems (FAT32, NTFS, HFS, NFS)? Is it a matter of how data is stored or how it's managed? More generally, what's the difference between these common filesystem types?
All of these filesystems use different metadata structures and on-disk layouts. This results in different performance characteristics, different capabilities, and different failure behaviors. The details are a little too ugly for a short e-mail. We will cover NFS toward the end of the semester.

What does NTFS use instead of inodes?
Offhand, I don't know. Whatever it uses has to have properties similar to inodes, since most filesystems have some similar structure. However, the inode-like structure isn't always stored separately from the file or from the directory.

Can you explain directories better than the book does?
Probably, but not in a short e-mail, and not without a more specific question. Send me an e-mail.
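To make the SCAN answer above concrete, here is a minimal Python sketch (the function and track numbers are invented for illustration - a real scheduler lives in the device driver and works on a queue of pending requests). Note that the sweep only goes as far as the highest pending request, not to the physical edge of the disk:

```python
def scan_order(head, requests):
    # Sweep toward higher track numbers first, stopping at the
    # highest pending request rather than at the disk edge, then
    # reverse and service the remaining requests on the way back.
    ahead = sorted(r for r in requests if r >= head)
    behind = sorted((r for r in requests if r < head), reverse=True)
    return ahead + behind
```

For example, with the head at track 50 and requests at tracks 10, 95, 60, and 40, the service order is 60, 95, 40, 10 - the head reverses at 95 rather than at the last track.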
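The "bandwidth and block size" answer above is easy to work through numerically. Here's a Python sketch of the calculation; the 8 ms seek, 4 ms rotational delay, and 40 MB/s raw rate are invented but plausible numbers, not figures from the slides:

```python
def effective_rate_mb_s(block_bytes, seek_ms, rot_ms, raw_mb_s):
    # Effective rate = bytes transferred / (seek + rotational delay + transfer)
    transfer_ms = block_bytes / (raw_mb_s * 1e6) * 1000.0
    total_s = (seek_ms + rot_ms + transfer_ms) / 1000.0
    return block_bytes / total_s / 1e6

# Small reads waste almost all the bandwidth on positioning;
# large reads recover most of the raw rate.
small = effective_rate_mb_s(4 * 1024, 8, 4, 40)      # ~0.34 MB/s
large = effective_rate_mb_s(1024 * 1024, 8, 4, 40)   # ~27 MB/s
```

The 12 ms of positioning overhead is fixed per read, so the only way to amortize it is to transfer more bytes per read.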
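The LRU policy from the caching answer above fits in a few lines of Python. This is a toy illustration (the class is invented here; a real buffer cache is far more involved), but the eviction rule is the same: when space is needed, drop whatever has gone unused the longest.

```python
from collections import OrderedDict

class LRUCache:
    """Toy buffer cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # oldest entries first

    def get(self, key):
        if key not in self.blocks:
            return None               # miss
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key, value):
        if key in self.blocks:
            self.blocks.move_to_end(key)
        self.blocks[key] = value
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
```

With capacity 2, inserting blocks 1 and 2, touching 1, then inserting 3 evicts block 2, since 2 is now the one that has gone unused the longest.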
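The punch-card answer above (finding line N at position N*80) is trivial to express in code. A sketch in Python, assuming a file of fixed 80-byte records (the record size and helper name are just for illustration):

```python
RECORD_SIZE = 80  # one punch-card-sized record

def read_record(f, n):
    # With fixed-size records, record N starts at byte N * RECORD_SIZE,
    # so there's no need to scan the file for newlines.
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE)
```

A byte-stream file with variable-length lines gives up this constant-time lookup: to find line N you have to read and count newlines from the start (or maintain an index).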