Can you make the followup e-mails shorter or more concise?
On this e-mail, just the questions with a blank line in between take up 76 lines, so my "bloat factor" is about 5. That isn't too bad, since these are supposed to be the questions that I don't have time to answer in class.

What does it mean that SCAN/C-SCAN doesn't go to the end in real implementations?
On the slides, the disk head went all the way to the edges of the disk. In reality, you only have to go as far as the last request, which may not actually be on the last track.

How exactly do filesystems deal with mechanical disk failures?
Some failures you can't handle. In other cases, you try to do things like replicating the superblock so that all of the copies can't be wiped out easily.

What's the subtle unfairness when using SCAN?
Consider a block in the middle of the disk - assuming that you scan to the edges, it gets serviced at a very regular frequency. In other words, if you sit on the middle track, the time the head spends to your "left" equals the time it spends to your "right". If you're located toward an edge, however, one of these times shrinks and the other grows. If the shorter time is smaller than the application's think time, that location on disk will only get serviced when the head is moving in one direction.

I heard IBM is developing storage that uses holograms to store info. If holographic storage works, would that render current disk techniques obsolete?
Any technology that changes the access characteristics of disks will make some of the current techniques useless, while it may open the doors for new ones. If holographic or crystal storage gets cheap enough and reliable enough, it'll displace disks for a large class of applications.

How can a process make more than one simultaneous request to the disk?
Go to a Solaris machine and type "man aioread" for the details, or see http://www.netsys.com/cgi-bin/man2html?aioread(3AIO) If you're not sure which OS you're on, see "man uname".

Where on disk are the inodes stored?
Early systems stored all inodes at the beginning of the disk in one big array. Now, when inodes are preallocated, they are usually split into several smaller arrays and placed around the disk on different cylinders.

How does the file system keep track of different file types?
The file system only knows about a small number of file types, and these are recorded in the inode. For example, the OS knows the difference between a "regular" file, a directory, and a device file. However, it doesn't know the difference between GIFs and ".c" files; that logic is left up to the applications.

Why is it easier to tell "what happened" with a filesystem model?
(I'm assuming this means in the presence of failures.) Let's say you're trying to create a new file and the power fails during the process. It's possible that an inode was allocated, or maybe a data block was allocated, and even that the free space bitmap was updated. However, until the entry is reflected in the directory file, the operation isn't logically complete. It's easier to just say "failed" or "completed" rather than having to determine at exactly which step of the process the failure occurred.

Define bandwidth and explain how block size affects how much is wasted.
Bandwidth is simply (data transferred) / time. The table showing different block sizes and the effective transfer rate is basically a measure of how much of the theoretical transfer rate is lost once you factor in seek time and rotational delay. You can think of "block size" as the length of a single continuous read from disk. The effective transfer rate is therefore the number of bytes transferred divided by the sum of the seek time, rotational delay, and transfer time.

How long is a process cycle in milliseconds?
Assuming we're talking about the time that each process can run before the CPU switches to something else, it's generally on the order of 10-100 milliseconds, barring interrupts pre-empting it.

How does the OS decide what to cache?
Caches generally follow the principle of "if it was used recently, it's likely to be used again in the near future". In practice, this means that when they need space, they tend to throw away whichever object hasn't been used in the longest time. This policy is known as "least recently used", or LRU.

How much more does the IBM UltraStar cost than the Seagate Barracuda?
I went to an online store and found that the cheapest ATA drive I could get is a 20GB Maxtor Fireball for $65. It's 5400 RPM, has 2MB of cache, and is ATA/100. The cheapest SCSI drive is a Maxtor Atlas 18GB, with 8MB of cache and 10000 RPM, for $140. The cheapest 36GB SCSI drive was $230, while a 40GB IDE is only $76.

Are there any good reasons for using files that aren't streams of bytes?
These days, not really. It used to be simpler for text editors to assume that text lines mapped to punch cards, so you could find the start of line N by going to position N*80, or just to record N.

Why are there so many elves and magic numbers in an engineering class?
We need so many new words that we just run out. The physicists use words like "quark", while we use words like "cookie".

What file information is kept in memory at any arbitrary time?
The OS will keep some inodes, some directory blocks, some portion of the free space bitmap, and some data blocks for recently-used files.

How did I get the memory transfer rate for modern systems?
See http://www.cs.virginia.edu/stream/

What's the relationship between Flemish and Dutch?
See http://www.calvino.demon.co.uk/Dutch_Translation/flemish_hist.html

How are long file names handled, especially in DOS vs Windows?
Unix used to have short file names as well.
What this meant was that the directory blocks could be thought of as arrays of file names. However, Unix systems made the switch some time ago and dealt with the consequences. What DOS seems to have done is create a parallel directory structure that contains the long names and maps them to the short names.

Are we going to cover RAID and striping?
Assuming I can get back on track, yes.

Aren't Windows shortcuts pretty much symbolic links?
Yes, but it appears that the filesystem wasn't natively designed to support them in a clean way, so there was probably a fair bit of ugliness.

If the disk is lying about the layout, how effectively can the OS carry out the access ordering optimizations we discussed?
It's obviously not perfect, and the hope is that the disk isn't lying too badly. In general, though, it doesn't matter what the exact layout of the disk is, as long as the mapping isn't completely arbitrary. In other words, higher track numbers in the "fake" info should correspond to higher track numbers among the "real" tracks.

For FIFO, would the disk diagrams show the access going off one side and appearing on the other because the disk is circular?
The disk head has only one degree of freedom - it moves toward the edge or toward the center. Each track really forms a ring. The rectangular diagrams in the talk have head position on the X axis and time on the Y axis.

What happens if you write to the cache and something wants to read before those bytes are written?
All caches have to handle this case; they have all reads check what's waiting in the write buffer to ensure that reads see the latest data.

Is there any way to rebuild a superblock by examining the disk for patterns?
Probably - my guess is that professional disk repair services use various techniques to detect common layouts.

What happens if a file contains the magic cookie but isn't a unix executable?
Nothing too horrible - the magic cookie is just used as a convenience.
If you try to run something that isn't a real executable, you'll just get a seg fault, an illegal instruction, or something similar.

I've heard of prediction algorithms that analyze disk access patterns and try to load the cache. Does this provide much of an improvement?
Done right, it can. Many OSs will detect when programs seem to be reading a file sequentially, and if so, will automatically schedule a large read when the program performs a small read. The extra data is then cached by the OS and delivered to the application only when requested.

Is the superblock the main difference between different filesystems (FAT32, NTFS, HFS, NFS)? Is it a matter of how data is stored or how it's managed? More generally, what's the difference between these common filesystem types?
All of these filesystems use different metadata structures and on-disk layouts. This results in different performance characteristics, different capabilities, and different failure behaviors. The details are a little too ugly for a short e-mail. We will cover NFS toward the end of the semester.

What does NTFS use instead of inodes?
Offhand, I don't know. Whatever it uses has to have properties similar to inodes, since most filesystems have some similar structure. However, the inode-like structure isn't always stored separately from the file or from the directory.

Can you explain directories better than the book does?
Probably, but not in a short e-mail, and not without a more specific question. Send me an e-mail.
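To make the SCAN answer above concrete, here is a minimal Python sketch (the function and track numbers are invented for illustration - a real scheduler lives in the device driver and works on a queue of pending requests). Note that the sweep only goes as far as the highest pending request, not to the physical edge of the disk:

```python
def scan_order(head, requests):
    # Sweep toward higher track numbers first, stopping at the
    # highest pending request rather than at the disk edge, then
    # reverse and service the remaining requests on the way back.
    ahead = sorted(r for r in requests if r >= head)
    behind = sorted((r for r in requests if r < head), reverse=True)
    return ahead + behind
```

For example, with the head at track 50 and requests at tracks 10, 95, 60, and 40, the service order is 60, 95, 40, 10 - the head reverses at 95 rather than at the last track.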
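The "bandwidth and block size" answer above is easy to work through numerically. Here's a Python sketch of the calculation; the 8 ms seek, 4 ms rotational delay, and 40 MB/s raw rate are invented but plausible numbers, not figures from the slides:

```python
def effective_rate_mb_s(block_bytes, seek_ms, rot_ms, raw_mb_s):
    # Effective rate = bytes transferred / (seek + rotational delay + transfer)
    transfer_ms = block_bytes / (raw_mb_s * 1e6) * 1000.0
    total_s = (seek_ms + rot_ms + transfer_ms) / 1000.0
    return block_bytes / total_s / 1e6

# Small reads waste almost all the bandwidth on positioning;
# large reads recover most of the raw rate.
small = effective_rate_mb_s(4 * 1024, 8, 4, 40)      # ~0.34 MB/s
large = effective_rate_mb_s(1024 * 1024, 8, 4, 40)   # ~27 MB/s
```

The 12 ms of positioning overhead is fixed per read, so the only way to amortize it is to transfer more bytes per read.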
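The LRU policy from the caching answer above fits in a few lines of Python. This is a toy illustration (the class is invented here; a real buffer cache is far more involved), but the eviction rule is the same: when space is needed, drop whatever has gone unused the longest.

```python
from collections import OrderedDict

class LRUCache:
    """Toy buffer cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # oldest entries first

    def get(self, key):
        if key not in self.blocks:
            return None               # miss
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key, value):
        if key in self.blocks:
            self.blocks.move_to_end(key)
        self.blocks[key] = value
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
```

With capacity 2, inserting blocks 1 and 2, touching 1, then inserting 3 evicts block 2, since 2 is now the one that has gone unused the longest.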
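The punch-card answer above (finding line N at position N*80) is trivial to express in code. A sketch in Python, assuming a file of fixed 80-byte records (the record size and helper name are just for illustration):

```python
RECORD_SIZE = 80  # one punch-card-sized record

def read_record(f, n):
    # With fixed-size records, record N starts at byte N * RECORD_SIZE,
    # so there's no need to scan the file for newlines.
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE)
```

A byte-stream file with variable-length lines gives up this constant-time lookup: to find line N you have to read and count newlines from the start (or maintain an index).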