How does the local file table (the per-process array) help protect overall system integrity?
Let's assume that this array didn't exist and that all processes referenced a global array instead. Each read/write call could then refer to any file currently open anywhere in the system, so the OS would have to check, on every read/write call, whether that process is allowed to access that file. Instead, by introducing the local file table and numbering it from 0 on a per-process basis, the checking becomes much simpler: "did the process ever open a file with a number this high?" and "is this file still open?"

What happens if a process creates (forges) a pointer to a region that it doesn't own?
On modern, general-purpose operating systems, it will cause a segmentation fault or its equivalent. The process (via libraries, etc.) basically has to indicate what portions of memory it wants to access before it uses them. It doesn't do this on every access; things like malloc() handle it for you behind the scenes.

In the slides, where it said "actual file info," did it really mean the global file table?
Yes and no. That actual file info could have been one element of a large array, or we could have had another array of pointers which would then point to the actual file info. Both designs would have similar properties in this regard.

When I said that the OS keeps track of how many things are using a file so that it knows when to get rid of it, is the directory tree one of those things?
I should have been a little clearer here: what was being "gotten rid of" wasn't the file itself, but rather the bookkeeping information about files that are being actively used. When nobody is currently accessing a file, the kernel may opt to no longer store information about it in main memory (since we have a limited amount of main memory). There are similar forms of bookkeeping on disk to determine when a file is "really" deleted, but the two forms of bookkeeping are mostly unrelated.

How does a buffer overflow really work?
Well, the most common form of exploit seems to be when the buffer is allocated on the stack. Recall that the stack frame also contains the return address where this function is supposed to go when it's finished. So, if you overwrite that return address, you can control where the program resumes execution when the function completes. My guess is that the attacker writes a really long buffer that contains the code itself, and the return address points to that code on the stack. That's just a guess. (A sketch of a vulnerable function appears below.)

Any reason for the mixed braces "[ )" on slide 19?
Yes, I was trying to be a math weenie. The interval is [buffer, buffer+length): the start location, buffer, is actually being read/written, but the location buffer+length is not. The last location read/written is really buffer+length-1, hence the mixed braces.

How does the OS give programs the illusion that they're writing to one address when they're really writing to another one?
This is one aspect of what's known as "virtual memory", and we'll cover it later in the semester. At a high level, the OS/hardware provide a translation table between the addresses the process uses (virtual addresses) and the actual memory of the system (physical addresses). To make this translation process efficient, it's not the case that any arbitrary mapping is allowed. The translation takes place at the granularity of "pages", which are understood by the OS and hardware, and generally ignored by the processes. Pages tend to be 4KB or 8KB.
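To make the page idea concrete, here's a minimal sketch of the arithmetic behind page-granularity translation, assuming 4KB pages and a made-up mapping; the real translation is performed by the hardware and OS, not by user code, and the addresses here are invented for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096   /* assuming 4KB pages */

    int main(void) {
        uintptr_t vaddr  = 0x403a17;            /* a made-up virtual address */
        uintptr_t vpage  = vaddr / PAGE_SIZE;   /* virtual page number */
        uintptr_t offset = vaddr % PAGE_SIZE;   /* offset within the page */

        /* Pretend the OS mapped this virtual page to physical page 0x2ba;
         * in reality, this mapping lives in the OS's translation tables. */
        uintptr_t ppage = 0x2ba;
        uintptr_t paddr = ppage * PAGE_SIZE + offset;

        printf("virtual 0x%jx = page 0x%jx + offset 0x%jx -> physical 0x%jx\n",
               (uintmax_t)vaddr, (uintmax_t)vpage, (uintmax_t)offset,
               (uintmax_t)paddr);
        return 0;
    }

Only the page number goes through the translation table; the offset within the page passes through unchanged, which is why the hardware table can stay small.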
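And here is the vulnerable-function sketch promised in the buffer overflow answer above. The function name and buffer size are made up for illustration; this shows the mistake itself, not an actual exploit.

    #include <string.h>

    /* Deliberately unsafe, for illustration only. 'buf' lives in this
     * function's stack frame, near the saved return address. If 'input'
     * is longer than 15 characters, strcpy() writes past the end of
     * 'buf'; a long enough input clobbers the saved return address, so
     * the function "returns" to wherever the attacker's bytes point. */
    void vulnerable(const char *input) {
        char buf[16];
        strcpy(buf, input);   /* no length check: the classic mistake */
    }

    /* The fix is to bound the copy, e.g.
     * snprintf(buf, sizeof buf, "%s", input); */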
Does the OS protect files from being written, and if so, how?
Good question. The way this is done will be covered later in the semester, but basically, it's again more bookkeeping data stored on disk. Along with each file is some information on which individual user owns it, what group of people owns it, and what permissions everyone on the system has to it (e.g., the individual user who owns it can write to it, the group that owns it can read from it, and the rest of the people on the system can do neither).

The stripes on the PowerPoint make it hard to read.
OK, I'll avoid that background.

What's the difference between read and fread, and when is each one appropriate?
The read() system call gets you the bytes you asked for, no more and no less. The fread() library call generally tries to optimize things behind your back. Most of the time, you don't mind the optimizations that it does. For example, it'll generally call read() and ask for large chunks of data so that it has to call read() fewer times than you would call it yourself. Since a library call is cheaper than a system call, if your program does lots of small reads, it's often a performance boost to use fread(). It's also the case that since fread() uses a FILE * and read() uses an int, sloppiness in parameter handling may be easier for the compiler to catch with fread(). (A short sketch contrasting the two appears below.) So, when is fread() bad? Remember our example about multiple processes sharing the same file that contains a list of tasks, where each process reads the next task once it completes its current task? That would be problematic if fread() was reading more than you really wanted.

What happens if a user process tries accessing kernel space?
The translation layer that maps virtual addresses to physical addresses also tends to have support for determining whether a particular translation is allowed. In this particular case, when the user process tries to use a kernel address, that hardware layer would say that the user process doesn't have access to that range of memory. The OS would then normally kill the process. If it's feeling really vengeful, it will then drag around the carcass of the process as a warning to other processes not to try the same thing.

Do read() and write() maintain positions within the file like fread() and fwrite() do?
Yes.

If the file descriptor's entry in the local file table points into the global file table, can't a malicious program forge pointers into the global file table?
The local file table is part of the process state, which is in kernel memory. So, under normal circumstances, the process won't even get to see where the file descriptor's entry in the local file table points. Even in the event that an evil programmer gets this somehow, the other hardware mechanisms mentioned above should prevent him/her from modifying kernel memory.

Is the kernel that takes up space in every program a separate instance? Does launching a new program take up more space?
Launching a new program only takes up a small amount of extra kernel memory, for a suitable definition of small. The kernel that's mapped into the memory of every process is a single instance of the same kernel. The extra space required in the kernel is the bookkeeping overhead for each process.

Who actually does the malloc()?
The malloc() library call generally invokes a system call by the name of brk() or sbrk(). The book alludes to it at the top of page 50. The system call allows the process to move the end of the heap to a certain address, and the library is responsible for intelligently managing that memory. So, the bookkeeping necessary to keep track of the fine-grained memory objects takes place in the library via malloc(), and the operating system only deals with much coarser-grained information, like the end of the heap.
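As a rough illustration of that division of labor, here's a toy "allocator" whose only dealings with the OS are coarse-grained requests to grow the heap. toy_malloc() is a made-up name, and a real malloc() adds block headers, free lists, and reuse; this sketch just bumps a pointer and never frees.

    #include <unistd.h>   /* sbrk(); largely historical, but it shows the idea */

    static char *heap_top = NULL;   /* library-side bookkeeping */

    void *toy_malloc(size_t size) {
        if (heap_top == NULL)
            heap_top = sbrk(0);           /* find the current end of the heap */
        if (sbrk(size) == (void *)-1)     /* ask the OS for 'size' more bytes */
            return NULL;
        void *block = heap_top;           /* hand out the old end of the heap */
        heap_top += size;
        return block;
    }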
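Returning to the read()/fread() question above, here is the promised sketch contrasting the two. The file name tasks.txt is hypothetical; the point is that both calls hand you four bytes, but they get them in different ways.

    #include <fcntl.h>    /* open() */
    #include <stdio.h>    /* fopen(), fread() */
    #include <unistd.h>   /* read(), close() */

    int main(void) {
        char buf[4];

        /* read(): one system call per call, exactly the bytes you ask
         * for (or fewer at end-of-file). */
        int fd = open("tasks.txt", O_RDONLY);
        if (fd >= 0) {
            read(fd, buf, sizeof buf);
            close(fd);
        }

        /* fread(): a library call. Behind your back, stdio typically
         * read()s a large chunk into its own buffer and copies 4 bytes
         * out of it, so a loop of small fread()s costs far fewer system
         * calls than the equivalent loop of read()s. */
        FILE *fp = fopen("tasks.txt", "r");
        if (fp != NULL) {
            fread(buf, 1, sizeof buf, fp);
            fclose(fp);
        }
        return 0;
    }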
When the kernel issues a read for a file from disk, the extra latency caused by the indirection pales in comparison to the time needed to get the file from disk.

On a multi-process system, can't the kernel do something useful instead of waiting for the file?
This is getting a little ahead of the lecture, but yes, the kernel can (and will) switch to another task that's ready to run when one task is blocked waiting for a file to be read from disk.

Is the kernel a process? Are different parts of the OS processes?
This was actually in the last couple of slides of the lecture, but we didn't get to it. I'll try to pick it up on Tuesday. In "standard" kernels, the kernel isn't a process. In some of the "experimental" systems, there's only a relatively small kernel, and most of the things that we think of as a kernel (file system, memory system, etc.) are actually handled by special processes running in user space.

The book mentioned interrupts occurring due to I/O and used the term interrupt vector. What is it?
In general, interrupts are a way of getting the attention of the hardware and OS. For example, rather than the OS continually checking the keyboard to see if the user has pressed a key, there's a mechanism that allows the keyboard to send a signal to the CPU that something has happened and needs attention. This mechanism is called an interrupt. Most I/O devices will use a scheme like it, and even software can generate interrupts. So, if lots of devices are attached to the system, how does the hardware know which one generated an interrupt? One way is to have one shared interrupt signal, and then the hardware/OS has to query everything attached to it. This would be inefficient, so a common way around it is for the interrupt to have a number associated with it. These are sometimes called vectored interrupts, because you can then look into an array (a vector) and determine what needs to be handled. Hence the term interrupt vector. (A toy dispatch table illustrating the idea appears below.)

If the OS doesn't get involved on every translation, how does memory protection work?
Again, we're getting far ahead of ourselves, but in general, the hardware provides a small table that the OS can use to store frequently-accessed translations. If the process generates a memory request that isn't in this table, the hardware asks the OS what to do. The OS can either update the table or tell the process that it tried doing something illegal. The entries in this table map pages, so most translations will take place in the hardware table.

Can a process find out which physical memory it's using?
In general, the answer is no, especially if you're looking for a portable solution. In specific contexts, the answer may be yes, but usually it's not on general-purpose systems. In standard systems, there's never really a reason that the process would need this information.

How does the kernel map itself into every process?
That's a little too far ahead of the game; see chapter 4 if you're really curious, or wait a couple of weeks.

Is there any way of making changes to a file viewable by everyone?
Not really through the standard file system interfaces. However, you can use the mmap() system call (see "man mmap") with the "shared" flag set, and all processes accessing that file can see changes immediately. Getting notified that a change has occurred, however, is a different matter and isn't part of mmap.
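Here's a minimal sketch of that mmap() usage, assuming a pre-existing file shared.dat (a made-up name) that is at least 4096 bytes long; every process that maps the same file with MAP_SHARED sees the store to p[0].

    #include <fcntl.h>     /* open() */
    #include <sys/mman.h>  /* mmap(), msync(), munmap() */
    #include <unistd.h>    /* close() */

    int main(void) {
        int fd = open("shared.dat", O_RDWR);   /* hypothetical file */
        if (fd < 0)
            return 1;

        /* MAP_SHARED: stores through 'p' are visible to every other
         * process that maps the same file; no explicit write() needed. */
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        p[0] = 'X';               /* immediately visible to other mappers */
        msync(p, 4096, MS_SYNC);  /* optionally force the change to disk */
        munmap(p, 4096);
        close(fd);
        return 0;
    }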
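And here's the toy dispatch table promised in the interrupt vector answer above. The interrupt numbers and handlers are made up, and a real vector lives in the hardware/kernel rather than a user program, but the "index an array of handlers" idea is the same.

    #include <stdio.h>

    typedef void (*handler_t)(void);

    static void keyboard_handler(void) { printf("key pressed\n"); }
    static void disk_handler(void)     { printf("disk I/O finished\n"); }

    /* The interrupt vector: an array indexed by interrupt number, each
     * entry pointing at the handler for that device (numbers made up). */
    static handler_t interrupt_vector[256] = {
        [1]  = keyboard_handler,
        [14] = disk_handler,
    };

    /* On an interrupt, the hardware supplies the number; dispatch is
     * just an array lookup instead of querying every attached device. */
    static void dispatch(int irq) {
        if (interrupt_vector[irq] != NULL)
            interrupt_vector[irq]();
    }

    int main(void) {
        dispatch(1);     /* simulate a keyboard interrupt */
        dispatch(14);    /* simulate a disk interrupt */
        return 0;
    }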
What did I mean when I said something about writing past the end of an array?
Let's say you have int a[10]; int b; and these are laid out in memory in the same way they're declared. Then assigning any number to a[10] will store the value in b, since the last valid entry in the array "a" is a[9]. This can be a real problem if you use the gets() library call to get a line of input. Since gets() doesn't take the maximum line size as a parameter, a malicious person could type a really long line and try to crash your program. (A sketch of the problem and the standard fix appears at the end of this section.)

What does the predictability of having things in the same location really get you?
Well, one concrete example is the core dump: this is basically writing the contents of memory to a file. Then, when you start gdb and specify that you want this core file loaded, it knows exactly what variables, etc., correspond to what locations in memory (and in the file).

Doesn't the predictability/repeatability of memory layout make all programming easier? In other words, is it fair to say that it's bad in the sense that it makes worm/virus spreading easier?
Well, this is more of a philosophical question. Yes, the predictability makes all programming easier, and hence it makes it easier for the worm writer as well. What's good for them is bad for average users in this case. Biological monocultures tend to fare poorly in certain plague scenarios, and we're starting to see similar behavior with computers. I don't know that I'd trade off certain aspects of layout simply to protect against viruses, but that is one approach people are investigating to mitigate the spread of worms, etc.
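Finally, the sketch promised in the writing-past-the-end-of-an-array answer: the gets() problem and the usual fix, fgets(), which takes the buffer size and truncates long lines instead of overrunning the array (and whatever happens to sit next to it in memory).

    #include <stdio.h>

    int main(void) {
        char line[64];

        /* gets(line);  -- unsafe: no bound, so a long line writes past
         * the end of 'line', just like a[10] writing into b above. */

        if (fgets(line, sizeof line, stdin) != NULL)
            printf("read: %s", line);
        return 0;
    }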