Akamai should put live feeds on smart bombs for any upcoming war. Imagine the possibilities.

Streaming video works best when not a lot of what's on screen changes between frames. That way, you send "diffs" (differences) between what was on the last frame and what's on this frame. There are lots of optimizations involved, such as detecting which portions of the old frame have moved to new locations, so that you can track moving objects in a frame or handle the camera panning. A video feed from a warhead would involve most of the screen expanding, and would probably compress really horribly. That, plus the fact that we tend to bomb at night, would make this bad video.

So why do we use read() at all? Why not just use mmap() for all disk access?

It does look like mmap is faster, so why do we still use read/write? One reason is convenience - read has the benefit that you specify the buffer's location, and there are no restrictions on alignment. It's not too hard to change some applications to use mmap instead, but it's not always trivial. Another is that for small reads, the performance of mmap may be worse than read, since it has to set up the VM mapping first. Finally, remember that if you use mmap with a shared mapping, the data you have will change if anything else modifies the file. You can get around this problem by using a private mapping, but even if you only want a small region, the copy happens at page granularity.

Please explain copy-on-write again and its relation to fork().

See http://www.cs.princeton.edu/courses/archive/fall02/cs318/lec13/slide9.html

Basically, when two processes want a copy of the same page, you can actually copy the page or you can do copy-on-write. If you actually copy the page, then you've got two copies of the page immediately, and if the only reason you're doing the copy is fork(), the copy may never get used at all. Instead, what you can do is have both processes point to the same page, but mark the page read-only. They can share it as much as they want, and only one copy exists for as long as the page is only being read. As soon as one process tries writing to the page, the OS gets the page fault and makes a copy of the page. The process trying to do the write gets the new copy. By delaying the actual copy until it's really needed, the OS saves work and saves space. (A small fork() sketch illustrating this from the program's point of view appears after these answers.)

What was the word you decided for interviews?

For those of you interested in the iconography of the Mohenjo-daro civilization, see http://www.sxu.edu/~bathgate/gallery/IVC/ivc.html - note that this has nothing at all to do with CS and won't be on the final.

When you say that mmap just maps the file but doesn't really load it, what's going on? What's the point?

The mmap call makes that portion of the process's virtual address space valid, but doesn't do the load right then. When the process tries to read or write that area, a page fault is generated and the appropriate page is loaded from disk.

Can you explain double buffering and why mmap might be good even if you read all the data?

Double buffering means that you have two copies of the same data taking up space in main memory. In the case of read(), one copy is sitting in the filesystem cache, and the other copy is sitting in the user-level buffer. With mmap, the page in the filesystem cache is the page the process sees, so there's only one copy even if you end up touching all of the data.

When you modify a mapped file page, does the system automatically update the file on disk, or do you have to do it explicitly before terminating your process?

You don't need to explicitly write it back to disk - that happens periodically while the process is running, and also when the process terminates. At any point, if you want some portion of it written back sooner, you can generally use the msync() call. On FreeBSD, it has options that say whether the writes should just be started (MS_ASYNC) or whether msync should wait until all writes are actually on disk (MS_SYNC). A short mmap()/msync() sketch follows these answers.
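Here is a minimal sketch of the write-back discussion above: it maps a hypothetical, pre-existing file data.bin, touches a byte so the page gets faulted in, dirties it, and then uses msync() to force the write-back. The file name and the assumption that the file is non-empty are mine, and error handling is abbreviated.

    /* Minimal sketch: map a file, dirty a page, force write-back. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDWR);        /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* The mapping is set up here, but nothing is read from disk yet. */
        char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* First touch causes a page fault; the page is brought in on demand. */
        printf("first byte: %c\n", p[0]);

        /* Modify the page; it's now dirty and will be written back
         * eventually even if we never call msync(). */
        p[0] = 'X';

        /* Ask for the write-back now. MS_SYNC waits until the data is on
         * disk; MS_ASYNC would just start the write. */
        if (msync(p, st.st_size, MS_SYNC) < 0) perror("msync");

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }

And a small fork() sketch for the copy-on-write question. The copy itself happens inside the kernel and isn't directly visible here; what the program observes is that the parent and child stop sharing a page's contents as soon as one of them writes to it, which is exactly the behavior the lazy copy has to preserve.

    /* Sketch: after fork(), a write by the child doesn't affect the parent. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static char buffer[4096] = "original";   /* roughly one page of data */

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }

        if (pid == 0) {
            /* Child: the first write to the shared, read-only page triggers
             * a page fault, and the kernel copies the page for the child. */
            strcpy(buffer, "modified by child");
            printf("child sees:  %s\n", buffer);
            return 0;
        }

        /* Parent: waits, then still sees its own unmodified copy. */
        waitpid(pid, NULL, 0);
        printf("parent sees: %s\n", buffer);
        return 0;
    }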
If you get a hash table from a file that was mapped and you modify the hash table, is everything saved, or do you have to mmap the hash table back to the file?

See above.

Isn't it possible in the case of an mmapped file for the process to think that the data was written to disk, but then the system crashes, and the data isn't there? Is corruption avoided simply because the filesystem doesn't use mmap?

Even in a regular process, write() doesn't guarantee that the data is on disk by the time the call returns. So, it's always possible that the writes don't make it to disk before a crash. If a process really cares about this, it can use the fsync() call to make sure the data is on disk; for a mapped region, msync() does the same job.

In a unified VM system, couldn't we provide a way for a process to say that it's disk bound, or that it needs a few pages?

The madvise() system call provides a way for the process to tell the system how it plans on using certain regions of memory. The process can say that it expects to use a region, that it no longer needs it, that it won't benefit from staying in cache, etc., etc. Not all operating systems really support this. (The sketch after these answers uses madvise() to hint at a random access pattern.)

On a system with all memory in use, wouldn't both read and mmap cause page faults? Read needs to kick out a page for the newly allocated memory, while mmap will fault on the access?

If a read() occurs, the page is brought in from disk, and the page(s) containing the user-space buffer are dirtied. The mmap will cause the page to be mapped, and then faulted in on the first access. If there's memory pressure and a page is needed, a page in the filesystem cache can be kicked out pretty easily if it hasn't been modified. In the case of read(), the buffer that was dirtied would have to be written to swap before that page could be reclaimed.

20 years ago, when you upgraded to 64K, how much RAM existed in server systems?

Local area networks as we know them didn't exist. Instead, you generally had a big machine connected to lots of "terminals", which were fancy display devices that did minimal computation. The machines in question were relatively modest - I believe that 1MB would have been considered really big. I believe that the PDP machines on which the original Unix was developed had about 64KB of memory.

What did the turbo button do on some really old PCs?

The original PC had a processor that ran at about 4.77 MHz. Some programs and hardware were designed to work with this timing, and wouldn't work properly if various parts of the PC ran at a higher speed. So, as processors got faster, the turbo button was how you switched from the high-speed (turbo) mode back to the original speed.

Is it more efficient to have random access to an mmap'd file in general, versus using read() with lseek()?

In general, yes - it's simpler to just do the mapping once and then bring in pages via faults rather than keep track of what you need and do the work manually. (A sketch contrasting the two approaches follows.)
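A sketch contrasting the two random-access styles from the last question, with madvise() from the earlier question used to hint at the access pattern. The file name records.dat and the fixed record size are invented for illustration, and pread() stands in for the lseek()+read() pair.

    /* Sketch: random access via pread() versus via an mmap'd region. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define RECORD_SIZE 128   /* hypothetical fixed-size records */

    int main(void)
    {
        int fd = open("records.dat", O_RDONLY);   /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
        size_t nrecords = st.st_size / RECORD_SIZE;

        /* Style 1: explicit I/O into a buffer you manage yourself. */
        char buf[RECORD_SIZE];
        off_t offset = (off_t)(nrecords / 2) * RECORD_SIZE;
        if (pread(fd, buf, RECORD_SIZE, offset) != RECORD_SIZE)
            perror("pread");

        /* Style 2: map the whole file once and index into it; pages are
         * faulted in only for the records actually touched. */
        char *base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hint that accesses will be random, so aggressive read-ahead
         * isn't worthwhile. Not all systems act on this. */
        if (madvise(base, st.st_size, MADV_RANDOM) < 0)
            perror("madvise");

        char *record = base + offset;
        printf("middle record, first byte: %c (pread saw %c)\n",
               record[0], buf[0]);

        munmap(base, st.st_size);
        close(fd);
        return 0;
    }

Note that MADV_RANDOM mostly just tells the kernel to back off on read-ahead; as mentioned above, not every OS acts on the hint.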
How do the number grades reflect the letter grades? Can you give a general range?

In general, what I've done in past years is to have the raw mean (not counting extra credit) be a straight B, with the half standard deviation on either side of it being some form of a B. If the standard deviation is too narrow, then I use a little latitude in deciding what grade is assigned to the mean and how many points count as a standard deviation.

Where can I find a copy of last night's Victoria's Secret show?

On their web site, naturally.

When the file is deleted, how exactly does it stick around?

There's the in-memory inode that still references it, so even if there's no directory entry for it, as long as the in-memory inode holds a reference, the filesystem won't free the disk inode and the corresponding file blocks.

Can you re-explain the reloading state example?

Assume you want to build a giant hash table. You could do it by allocating it on the heap, but then you'd have to figure out some way of saving it and recovering it. Instead, you could allocate a giant file, mmap it, and then write your own malloc/free that allocates space within the file instead of on the heap. This way, as long as you map the file at the same location every time your program starts up, all of the pointers, etc., are still valid, and you don't have to rebuild the hash table. Even better, since it's mmapped (and demand-paged), the whole thing doesn't have to be loaded into memory before you can start using it. (A rough sketch of this idea appears at the end of these notes.)

What's the difference between the VM and the FS?

The virtual memory system normally deals with the memory used by user processes - their code, data, heap, and stack. If the data/heap/stack are modified and space is needed, pages from main memory get written to the swap area of the disk. The code pages shouldn't be getting modified, so they can easily be kicked out of memory when needed and reloaded from disk later. In the old scheme, the filesystem had its own cache that wasn't directly accessible by user programs. It knew which pages were clean and which were dirty, and used the in-memory inode to read/write the appropriate locations on disk. In this model, there wouldn't be any major page faults in the filesystem cache, since those pages weren't part of the VM system. Once you unify the filesystem cache and the VM, then you generally use the same mechanisms to decide which pages should be replaced, etc., etc.
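Finally, a rough sketch of the reloading-state idea: back a region with a file, try to map it at the same address every run, and hand out space from it with a trivial bump allocator in place of malloc(). The address, file name, header layout, and allocator are all invented for illustration; a real version would need a proper allocator and more care with the address, since the kernel is free to ignore the hint.

    /* Sketch: a file-backed region mapped at the same address each run. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REGION_ADDR ((void *)0x600000000000UL)  /* arbitrary hint */
    #define REGION_SIZE (16UL * 1024 * 1024)

    struct region_header {
        size_t used;              /* bytes handed out so far */
    };

    static char *region;

    /* Allocate from the mapped file instead of the heap. */
    static void *file_alloc(size_t n)
    {
        struct region_header *hdr = (struct region_header *)region;
        if (sizeof(*hdr) + hdr->used + n > REGION_SIZE)
            return NULL;
        void *p = region + sizeof(*hdr) + hdr->used;
        hdr->used += n;
        return p;
    }

    int main(void)
    {
        int fd = open("state.img", O_RDWR | O_CREAT, 0644);  /* hypothetical */
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, REGION_SIZE) < 0) { perror("ftruncate"); return 1; }

        /* Ask for the same address every run so that any pointers stored
         * inside the region remain valid across restarts. */
        region = mmap(REGION_ADDR, REGION_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
        if (region == MAP_FAILED) { perror("mmap"); return 1; }
        if ((void *)region != REGION_ADDR)
            fprintf(stderr, "warning: mapped at a different address\n");

        char *msg = file_alloc(64);
        if (msg)
            snprintf(msg, 64, "this string persists across runs");

        /* No explicit save step: dirty pages are written back to state.img
         * by the VM system, and only the pages actually touched were ever
         * faulted in. */
        munmap(region, REGION_SIZE);
        close(fd);
        return 0;
    }

The point of pinning the address is that raw pointers stored inside the region (hash table buckets, list links, and so on) stay valid across runs without any translation step; the usual alternative is to store offsets instead of pointers, which removes the fixed-address requirement.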