I've been told (source kept secret) that the Glitter soundtrack was released on September 11, and that it went platinum on October 16. So, I guess there are still enough Mariah fans to carry that album despite the movie. Still baffling to me is that the talk-show promotion for the movie was done by "Padma", the fashion model who is 11th-billed in the movie, and who happens to be the girlfriend of Salman Rushdie. Also interesting is the cast list for Padma's latest movie - http://us.imdb.com/Title?0330082

Lots of people had questions about interleaving.
Assume you have 4 disks. Byte interleaving means that byte N is on disk (N mod 4). Since disks deal with sectors, this also means that the new "sector size" is 4x the original sector size. All reads and writes involve all disks, which is great for large transfers, but gives you no benefit from multiple disk heads on small transfers. Block interleaving means that block N is on disk (N mod 4). The sector size can remain unchanged. If your workload consists of lots of reads of 1-block files, then all the disks can seek independently, giving you good performance. Likewise, large transfers get the benefit of all disks reading/writing data at the same time. (There's a short code sketch of striping and parity after the RAID questions below.)

What is parity?
Whether the number of 1 bits is even or odd - see the last line of http://www.cs.princeton.edu/courses/archive/fall02/cs318/lec6/slide30.html

Explain the RAID levels again.
See gory details at http://www.cs.cmu.edu/~garth/RAIDpaper/Patterson88.pdf
RAID 0 - No redundancy; data may or may not be interleaved (striped) across the disks. If interleaved, you get better transfer rates for large transfers.
RAID 1 - Each disk has a "mirror", so all data exists on two disks. Writes have to go to both disks, but reads can come from either disk. So, read rates can be twice as high as write rates.
RAID 2 - The ECC approach is used to detect and correct some failures. If you're interested in the details, read up on Hamming codes.
RAID 3 - One extra disk stores the XOR result, and all data is interleaved at the byte level. All reads and writes involve all disks, so only large transfers get performance benefits.
RAID 4 - Interleaving is now at the block level, so multiple small transfers can gain since the disks can seek independently. All writes must update the parity disk, which becomes the bottleneck.
RAID 5 - The contents of that parity disk are now spread across all disks to reduce the write bottleneck. Small writes still involve two disks, but this is better than all writes waiting on the same disk.

Explain the difference between RAID 3 & 4, and 2 & 3.
3 and 4 differ in the level of interleaving, while 2 and 3 differ in what approach is used to do the checking. If the explanation above doesn't clarify, write me.

In RAID, what happens if you lose the parity disk?
You basically rebuild it the same way that you'd rebuild any failed disk.

If RAID is using information theory as an underlying basis, does that mean we can throw in more space to handle more failures?
In general, yes. However, that tends to be ugly. Instead, when people care about that, they might just have mirrored RAID 5 or something like that. Or, if you're really worried, see the Byzantine fault-tolerance work at http://www.pmg.lcs.mit.edu/~castro/pubs.html

What kinds of companies are interested in RAID?
Anybody that doesn't want to suffer downtime due to disk failure. Most places seem to use it to improve reliability, with raw performance being less of a concern. Most of the disks the CS department (and presumably OIT) are using involve RAID storage.
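To make the interleaving and parity answers above concrete, here is a minimal C sketch. It is not from the lecture or the Patterson paper; the disk count, block size, and function names are my own. It maps a block number to a data disk the way block interleaving does (across the three data disks, since the fourth holds parity in this RAID-4-style example), computes the parity block by XORing the data blocks, and then rebuilds a "failed" disk from the survivors plus the parity.

    #include <stdio.h>
    #include <string.h>

    #define NDISKS     4                /* 3 data disks + 1 parity disk (RAID-4 style) */
    #define NDATA      (NDISKS - 1)
    #define BLOCK_SIZE 16               /* tiny blocks so the example stays small */

    /* Block interleaving: logical block N lives on data disk (N mod NDATA). */
    static int block_to_disk(int n) { return n % NDATA; }

    /* Parity block = XOR of the corresponding block on every data disk. */
    static void compute_parity(unsigned char data[NDATA][BLOCK_SIZE],
                               unsigned char parity[BLOCK_SIZE])
    {
        memset(parity, 0, BLOCK_SIZE);
        for (int d = 0; d < NDATA; d++)
            for (int i = 0; i < BLOCK_SIZE; i++)
                parity[i] ^= data[d][i];
    }

    /* Rebuild one failed data disk by XORing the surviving disks with the parity. */
    static void rebuild(unsigned char data[NDATA][BLOCK_SIZE],
                        unsigned char parity[BLOCK_SIZE], int failed)
    {
        memcpy(data[failed], parity, BLOCK_SIZE);
        for (int d = 0; d < NDATA; d++)
            if (d != failed)
                for (int i = 0; i < BLOCK_SIZE; i++)
                    data[failed][i] ^= data[d][i];
    }

    int main(void)
    {
        unsigned char data[NDATA][BLOCK_SIZE], parity[BLOCK_SIZE], saved[BLOCK_SIZE];

        /* Fill the data disks with arbitrary bytes and compute their parity. */
        for (int d = 0; d < NDATA; d++)
            for (int i = 0; i < BLOCK_SIZE; i++)
                data[d][i] = (unsigned char)(d * 10 + i);
        compute_parity(data, parity);

        printf("logical block 7 lives on data disk %d\n", block_to_disk(7));

        /* "Lose" disk 1, then recover its contents and check the result. */
        memcpy(saved, data[1], BLOCK_SIZE);
        memset(data[1], 0, BLOCK_SIZE);
        rebuild(data, parity, 1);
        printf("rebuild %s\n",
               memcmp(saved, data[1], BLOCK_SIZE) == 0 ? "succeeded" : "failed");
        return 0;
    }

RAID 5 recovery works the same way; the only difference is that the parity blocks are rotated across all the disks instead of living on one dedicated disk.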
Slide 33 (on the web) - what does "general error correcting codes too powerful" mean?
The ECC scheme is great for silent errors, and is often used for checking RAM, where individual bits can flip. However, disks tend to fail visibly, so more complex schemes like ECC can be avoided, and simpler schemes like XOR can be used instead.

How is log corruption checked?
Generally, you don't assume random bits flipping on disks, so I would guess that most log corruption of that form isn't checked. What logs may do, however, is put down some kind of before and after marker in the log for every write, so that if a write is only partially complete when the power fails, you'll be able to tell.

In logging, don't you lose the log when you lose power?
The log is on disk, so you assume that anything that's already written is stable across power failures.

In logging, do you clear the log file every time you update the disk?
Not exactly sure what's being asked. However, the log is basically intended to be a holding area, and the changes made to the log have to be reflected to the "real" metadata parts of the disk. When those updates have taken place, you can mark those entries in the log as being "clean".

Does logging slow down performance - even though they're sequential writes, they're still writes? Is it worth it if you rarely have to perform recovery?
If you were trying to sustain it over long periods of time, logging would probably be a net loss unless the log were on a separate disk. However, disk traffic tends to be "bursty" - periods of idle time punctuated by small regions with lots of activity. One goal of logging is to allow that activity to occur as quickly as possible, so that the user can move on to other things while the OS cleans out the log in the background. Another goal of logging is to avoid having to do the fsck cleanup after a power loss.

What does the new Linux ext3fs "journalized" filesystem do for reliability?
Logging, basically as I've described it. It's nothing special as far as I can tell.

What does the swap partition do?
That's coming up next in the virtual memory part of the course. Basically, the OS uses space on disk to augment the physical memory of the machine, giving you the illusion that you have more memory than you really have.

Are old midterms up?
Anything on-line in previous years is fair game, but please don't get old copies from other sources.

What were the advantages of the Unix filesystem?
You could grow a file to arbitrarily large sizes in a relatively elegant way, and you could use space on disk regardless of whether it was contiguous or not.

Do the first 10 entries in the Unix inode point to only one block of data each?
Yes.

Why does Unix have 13 entries in the inode?
If I had to guess, the number of direct block entries was chosen to fill up space to get the inode to 128 bytes. The single, double, and triple indirect entries are needed for expansion. So, if the rest of the inode had needed more space, there would have been fewer than 10 direct block entries.

Are triple-indirect entries implemented on modern systems?
Given that some systems support really large single files, I would guess that they are.

What happens in Unix when those 13 entries aren't enough?
You could have a quadruple-indirect entry by getting rid of one of the direct entries.

In Unix, is the smallest file 4KB?
See page 5 of the "Fast Filesystem" paper.
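To make the direct/indirect distinction concrete, here is a small C sketch. The numbers are assumptions for illustration (10 direct entries, 4KB blocks, 4-byte block pointers), not the exact parameters of any particular Unix. It just classifies which inode entry would be used to reach a given logical block of a file.

    #include <stdio.h>

    #define NDIRECT      10                     /* assumed number of direct entries in the inode */
    #define BLOCK_SIZE   4096                   /* assumed filesystem block size                 */
    #define PTRS_PER_BLK (BLOCK_SIZE / 4)       /* 4-byte block pointers per indirect block      */

    /* Decide which inode entry reaches logical block n of a file. */
    static const char *lookup_level(long long n)
    {
        if (n < NDIRECT)
            return "direct";
        n -= NDIRECT;
        if (n < PTRS_PER_BLK)
            return "single indirect";
        n -= PTRS_PER_BLK;
        if (n < (long long)PTRS_PER_BLK * PTRS_PER_BLK)
            return "double indirect";
        return "triple indirect";
    }

    int main(void)
    {
        long long blocks[] = { 0, 9, 10, 500, 2000, 2000000 };
        for (int i = 0; i < (int)(sizeof blocks / sizeof blocks[0]); i++)
            printf("logical block %lld is reached via a %s entry\n",
                   blocks[i], lookup_level(blocks[i]));
        return 0;
    }

Under those assumed numbers, everything up through the double-indirect entry covers roughly 4GB of file, and the triple-indirect entry is what pushes the maximum file size into the terabyte range.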
With today's large files, isn't it better to have a block size greater than 4KB?
Small files will waste space, but it's faster. Special-purpose filesystems, like video-on-demand, will often use much larger block sizes, like 128KB. I don't know if general-purpose filesystems have really increased their block sizes beyond 4/8 KB.

Explain the FAT slide - http://www.cs.princeton.edu/courses/archive/fall02/cs318/lec6/slide24.html
File "foo" has a first block of 217. The array marked FAT doesn't contain the data, but only the linked list of blocks. So, entry 217 in this array tells us what the next block of the file is, and block 217 on disk contains the actual data of the first block of the file. The location of the second block of the file is indicated by the value in entry 217, so the second block is block #619, the third is block #399, etc. (There's a short code sketch of this chain-walking at the end of these notes.)

In old DOS, drives were limited to 4GB - does that relate to the structure of FAT?
I believe so, and if I recall, "FAT32" was the solution to use larger disks.

Doesn't FAT have the same problem as Unix in that the data blocks may not be contiguous?
Yes - the extent-based filesystems had "extents" that potentially spanned many blocks and were contiguous. However, they tended to have a fixed number of extents. FAT and the standard Unix filesystem don't have a built-in mechanism to get these benefits, although more modern implementations of Unix do try pretty hard. Likewise, disk defragmentation programs (particularly on DOS) try to fix this problem as well.

How does NTFS differ from FAT?
Offhand, I don't know. I believe NTFS is more Unix-like, but I may be wrong.

Slide 23 - what does "up front declaration a real pain" mean?
When you create the file, you have to say what its maximum size will be, etc. You have to declare its usage at the time of creation.

If the last lecture is on the midterm, can you post the notes early?
I'll try. Realize, though, that the midterm is on Tuesday, so that gives you five days after the last lecture.

How do you do "special order" to get fast recovery times?
I assume this is asking how programs like ScanDisk and fsck optimize their time. If that's the question, the answer is that disks can be read via the filesystem, or in a "raw" manner, by treating the disk as an array of blocks/sectors. If you have the appropriate permissions, then you can read it as raw data, which means that you can figure out where things are on disk in a brute-force manner and do the optimization yourself.

Something about metafile - couldn't read the question.
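Here is the chain-walking from the FAT slide as a small C sketch. Only the 217 -> 619 -> 399 chain comes from the slide; the table size and the -1 end-of-chain marker are placeholders rather than the real FAT on-disk format.

    #include <stdio.h>

    #define FAT_ENTRIES 1024
    #define FAT_EOF     (-1)   /* stand-in for the real end-of-chain marker */

    /* fat[i] holds the block number that follows block i in its file. */
    static int fat[FAT_ENTRIES];

    /* Print every block of a file, given the first block number from its
       directory entry.  The FAT holds only the chain; the file's data
       lives in the correspondingly numbered blocks on disk. */
    static void list_blocks(const char *name, int first_block)
    {
        printf("%s:", name);
        for (int b = first_block; b != FAT_EOF; b = fat[b])
            printf(" %d", b);
        printf("\n");
    }

    int main(void)
    {
        for (int i = 0; i < FAT_ENTRIES; i++)
            fat[i] = FAT_EOF;

        /* The chain from the slide: "foo" starts at block 217,
           then 619, then 399, and 399 is the last block. */
        fat[217] = 619;
        fat[619] = 399;
        fat[399] = FAT_EOF;

        list_blocks("foo", 217);
        return 0;
    }

The point is that the FAT array holds only the "next block" pointers; reading the file means following that chain and fetching each numbered block from the data area of the disk.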