Akamai should put live feeds on smart bombs for any upcoming war. Imagine the possibilities.

Streaming video works best when not a lot of what's on screen changes between frames. That way, you send "diffs" (differences) between what was on the last frame and what's on this frame. There are lots of optimizations involved, such as detecting which portions of the old frame have moved to new locations, so that you can track moving objects in a frame or handle the camera panning. A video feed from a warhead would involve most of the screen expanding, and would probably compress really horribly. That, plus the fact that we tend to bomb at night, would make this bad video.

So why do we use read() at all? Why not just use mmap() for all disk access?

It does look like mmap is faster, so why do we still use read/write? One reason is convenience - read has the benefit that you specify the buffer's location, and there are no restrictions on alignment. It's not too hard to change some applications to use mmap instead, but it's not always trivial. Another is that for small reads, the performance of mmap may be worse than read, since it has to set up the VM mapping first. Finally, remember that if you use mmap with a shared mapping, the data you have will change if anything else modifies the file. You can get around this problem by using a private mapping, but even if you only want a small region, the copy happens at page granularity.

Please explain copy-on-write again and its relation to fork().

See http://www.cs.princeton.edu/courses/archive/fall02/cs318/lec13/slide9.html

Basically, when two processes want a copy of the same page, you can actually copy the page or you can do copy-on-write. If you actually copy the page, then you've got two copies of the page immediately, and if the only reason you're doing the copy is fork(), the copy may never get used at all. Instead, what you can do is have both processes point to the same page, but mark the page read-only. They can share it as much as they want, and only one copy exists for as long as the page is only being read. As soon as one process tries writing to the page, the OS gets the page fault and makes a copy of the page. The process trying to do the write gets the new copy. By delaying the actual copy until it's really needed, the OS saves work and saves space. (A small fork() sketch illustrating this from the program's point of view appears after these answers.)

What was the word you decided for interviews?

For those of you interested in the iconography of the Mohenjo-daro civilization, see http://www.sxu.edu/~bathgate/gallery/IVC/ivc.html - note that this has nothing at all to do with CS and won't be on the final.

When you say that mmap just maps the file but doesn't really load it, what's going on? What's the point?

The mmap call makes that portion of the process's virtual address space valid, but doesn't do the load right then. When the process tries to read or write that area, a page fault is generated and the appropriate page is loaded from disk.

Can you explain double buffering and why mmap might be good even if you read all the data?

Double buffering means that you have two copies of the same data taking up space in main memory. In the case of read(), one copy is sitting in the filesystem cache, and the other copy is sitting in the user-level buffer. With mmap, the page in the filesystem cache is the page the process sees, so there's only one copy even if you end up touching all of the data.

When you modify a mapped file page, does the system automatically update the file on disk, or do you have to do it explicitly before terminating your process?

You don't need to explicitly write it back to disk - that happens periodically while the process is running, and also when the process terminates. At any point, if you want some portion of it written back sooner, you can generally use the msync() call. On FreeBSD, it has options that say whether the writes should just be started (MS_ASYNC) or whether msync should wait until all writes are actually on disk (MS_SYNC). A short mmap()/msync() sketch follows these answers.
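Here is a minimal sketch of the write-back discussion above: it maps a hypothetical, pre-existing file data.bin, touches a byte so the page gets faulted in, dirties it, and then uses msync() to force the write-back. The file name and the assumption that the file is non-empty are mine, and error handling is abbreviated.

    /* Minimal sketch: map a file, dirty a page, force write-back. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDWR);        /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* The mapping is set up here, but nothing is read from disk yet. */
        char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* First touch causes a page fault; the page is brought in on demand. */
        printf("first byte: %c\n", p[0]);

        /* Modify the page; it's now dirty and will be written back
         * eventually even if we never call msync(). */
        p[0] = 'X';

        /* Ask for the write-back now. MS_SYNC waits until the data is on
         * disk; MS_ASYNC would just start the write. */
        if (msync(p, st.st_size, MS_SYNC) < 0) perror("msync");

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }

And a small fork() sketch for the copy-on-write question. The copy itself happens inside the kernel and isn't directly visible here; what the program observes is that the parent and child stop sharing a page's contents as soon as one of them writes to it, which is exactly the behavior the lazy copy has to preserve.

    /* Sketch: after fork(), a write by the child doesn't affect the parent. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static char buffer[4096] = "original";   /* roughly one page of data */

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }

        if (pid == 0) {
            /* Child: the first write to the shared, read-only page triggers
             * a page fault, and the kernel copies the page for the child. */
            strcpy(buffer, "modified by child");
            printf("child sees:  %s\n", buffer);
            return 0;
        }

        /* Parent: waits, then still sees its own unmodified copy. */
        waitpid(pid, NULL, 0);
        printf("parent sees: %s\n", buffer);
        return 0;
    }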
If you get a hash table from a file that was mapped and you modify the hash table, is everything saved, or do you have to mmap the hash table back to the file?

See above.

Isn't it possible in the case of an mmapped file for the process to think that the data was written to disk, but then the system crashes, and the data isn't there? Is corruption avoided simply because the filesystem doesn't use mmap?

Even in a regular process, write() doesn't guarantee that the data is on disk by the time the call returns. So, it's always possible that the writes don't make it to disk before a crash. If a process really cares about this, it can use the fsync() call to make sure the data is on disk; for a mapped region, msync() does the same job.

In a unified VM system, couldn't we provide a way for a process to say that it's disk bound, or that it needs a few pages?

The madvise() system call provides a way for the process to tell the system how it plans on using certain regions of memory. The process can say that it expects to use a region, that it no longer needs it, that it won't benefit from staying in cache, etc., etc. Not all operating systems really support this. (The sketch after these answers uses madvise() to hint at a random access pattern.)

On a system with all memory in use, wouldn't both read and mmap cause page faults? Read needs to kick out a page for the newly allocated memory, while mmap will fault on the access?

If a read() occurs, the page is brought in from disk, and the page(s) containing the user-space buffer are dirtied. The mmap will cause the page to be mapped, and then faulted in on the first access. If there's memory pressure and a page is needed, a page in the filesystem cache can be kicked out pretty easily if it hasn't been modified. In the case of read(), the buffer that was dirtied would have to be written to swap before that page could be reclaimed.

20 years ago, when you upgraded to 64K, how much RAM existed in server systems?

Local area networks as we know them didn't exist. Instead, you generally had a big machine connected to lots of "terminals", which were fancy display devices that did minimal computation. The machines in question were relatively modest - I believe that 1MB would have been considered really big. I believe that the PDP machines on which the original Unix was developed had about 64KB of memory.

What did the turbo button do on some really old PCs?

The original PC had a processor that ran at about 4.77 MHz. Some programs and hardware were designed to work with this timing, and wouldn't work properly if various parts of the PC ran at a higher speed. So, as processors got faster, the turbo button was how you switched from the high-speed (turbo) mode back to the original speed.

Is it more efficient to have random access to an mmap'd file in general, versus using read() with lseek()?

In general, yes - it's simpler to just do the mapping once and then bring in pages via faults rather than keep track of what you need and do the work manually. (A sketch contrasting the two approaches follows.)
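A sketch contrasting the two random-access styles from the last question, with madvise() from the earlier question used to hint at the access pattern. The file name records.dat and the fixed record size are invented for illustration, and pread() stands in for the lseek()+read() pair.

    /* Sketch: random access via pread() versus via an mmap'd region. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define RECORD_SIZE 128   /* hypothetical fixed-size records */

    int main(void)
    {
        int fd = open("records.dat", O_RDONLY);   /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
        size_t nrecords = st.st_size / RECORD_SIZE;

        /* Style 1: explicit I/O into a buffer you manage yourself. */
        char buf[RECORD_SIZE];
        off_t offset = (off_t)(nrecords / 2) * RECORD_SIZE;
        if (pread(fd, buf, RECORD_SIZE, offset) != RECORD_SIZE)
            perror("pread");

        /* Style 2: map the whole file once and index into it; pages are
         * faulted in only for the records actually touched. */
        char *base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hint that accesses will be random, so aggressive read-ahead
         * isn't worthwhile. Not all systems act on this. */
        if (madvise(base, st.st_size, MADV_RANDOM) < 0)
            perror("madvise");

        char *record = base + offset;
        printf("middle record, first byte: %c (pread saw %c)\n",
               record[0], buf[0]);

        munmap(base, st.st_size);
        close(fd);
        return 0;
    }

Note that MADV_RANDOM mostly just tells the kernel to back off on read-ahead; as mentioned above, not every OS acts on the hint.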
How do the number grades reflect the letter grades? Can you give a general range?

In general, what I've done in past years is to have the raw mean (not counting extra credit) be a straight B, with the half standard deviation on either side of it being some form of a B. If the standard deviation is too narrow, then I use a little latitude in deciding what grade is assigned to the mean and how many points count as a standard deviation.

Where can I find a copy of last night's Victoria's Secret show?

On their web site, naturally.

When the file is deleted, how exactly does it stick around?

There's the in-memory inode that still references it, so even if there's no directory entry for it, as long as the in-memory inode holds a reference, the filesystem won't free the disk inode and the corresponding file blocks.

Can you re-explain the reloading state example?

Assume you want to build a giant hash table. You could do it by allocating it on the heap, but then you'd have to figure out some way of saving it and recovering it. Instead, you could allocate a giant file, mmap it, and then write your own malloc/free that allocates space within the file instead of on the heap. This way, as long as you map the file at the same location every time your program starts up, all of the pointers, etc., are still valid, and you don't have to rebuild the hash table. Even better, since it's mmapped (and demand-paged), the whole thing doesn't have to be loaded into memory before you can start using it. (A rough sketch of this idea appears at the end of these notes.)

What's the difference between the VM and the FS?

The virtual memory system normally deals with the memory used by user processes - their code, data, heap, and stack. If the data/heap/stack are modified and space is needed, pages from main memory get written to the swap area of the disk. The code pages shouldn't be getting modified, so they can easily be kicked out of memory when needed and reloaded from disk later. In the old scheme, the filesystem had its own cache that wasn't directly accessible by user programs. It knew which pages were clean and which were dirty, and used the in-memory inode to read/write the appropriate locations on disk. In this model, there wouldn't be any major page faults in the filesystem cache, since those pages weren't part of the VM system. Once you unify the filesystem cache and the VM, then you generally use the same mechanisms to decide which pages should be replaced, etc., etc.
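Finally, a rough sketch of the reloading-state idea: back a region with a file, try to map it at the same address every run, and hand out space from it with a trivial bump allocator in place of malloc(). The address, file name, header layout, and allocator are all invented for illustration; a real version would need a proper allocator and more care with the address, since the kernel is free to ignore the hint.

    /* Sketch: a file-backed region mapped at the same address each run. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REGION_ADDR ((void *)0x600000000000UL)  /* arbitrary hint */
    #define REGION_SIZE (16UL * 1024 * 1024)

    struct region_header {
        size_t used;              /* bytes handed out so far */
    };

    static char *region;

    /* Allocate from the mapped file instead of the heap. */
    static void *file_alloc(size_t n)
    {
        struct region_header *hdr = (struct region_header *)region;
        if (sizeof(*hdr) + hdr->used + n > REGION_SIZE)
            return NULL;
        void *p = region + sizeof(*hdr) + hdr->used;
        hdr->used += n;
        return p;
    }

    int main(void)
    {
        int fd = open("state.img", O_RDWR | O_CREAT, 0644);  /* hypothetical */
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, REGION_SIZE) < 0) { perror("ftruncate"); return 1; }

        /* Ask for the same address every run so that any pointers stored
         * inside the region remain valid across restarts. */
        region = mmap(REGION_ADDR, REGION_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
        if (region == MAP_FAILED) { perror("mmap"); return 1; }
        if ((void *)region != REGION_ADDR)
            fprintf(stderr, "warning: mapped at a different address\n");

        char *msg = file_alloc(64);
        if (msg)
            snprintf(msg, 64, "this string persists across runs");

        /* No explicit save step: dirty pages are written back to state.img
         * by the VM system, and only the pages actually touched were ever
         * faulted in. */
        munmap(region, REGION_SIZE);
        close(fd);
        return 0;
    }

The point of pinning the address is that raw pointers stored inside the region (hash table buckets, list links, and so on) stay valid across runs without any translation step; the usual alternative is to store offsets instead of pointers, which removes the fixed-address requirement.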