How does the local file table (the per-process array) help protect overall system integrity?
Let's assume that this array didn't exist and that all processes referenced a global array instead. Each read/write call could then refer to any file currently open anywhere in the system, so the OS would have to check, on every read/write call, whether that process is allowed to access that file. Instead, by introducing the local file table and numbering it from 0 on a per-process basis, the checking becomes much simpler: "did the process ever open a file with a number this high?" and "is this file still open?"

What happens if a process creates (forges) a pointer to a region that it doesn't own?
On modern, general-purpose operating systems, it will cause a segmentation fault or its equivalent. The process (via libraries, etc.) basically has to indicate what portions of memory it wants to access before it uses them. It doesn't do this on every access; things like malloc() handle it for you behind the scenes.

In the slides, where it said "actual file info," did it really mean the global file table?
Yes and no. That actual file info could have been one element of a large array, or we could have had another array of pointers which would then point to the actual file info. Both designs would have similar properties in this regard.

When I said that the OS keeps track of how many things are using a file so that it knows when to get rid of it, is the directory tree one of those things?
I should have been a little clearer here: what was being "gotten rid of" wasn't the file itself, but rather the bookkeeping information about files that are being actively used. When nobody is currently accessing a file, the kernel may opt to no longer store information about it in main memory (since we have a limited amount of main memory). There are similar forms of bookkeeping on disk to determine when a file is "really" deleted, but the two forms of bookkeeping are mostly unrelated.

How does a buffer overflow really work?
Well, the most common form of exploit seems to be when the buffer is allocated on the stack. Recall that the stack frame also contains the return address where this function is supposed to go when it's finished. So, if you overwrite that return address, you can control where the program resumes execution when the function completes. My guess is that the attacker writes a really long buffer that contains the code itself, and the return address points to that code on the stack. That's just a guess. (A sketch of a vulnerable function appears below.)

Any reason for the mixed braces "[ )" on slide 19?
Yes, I was trying to be a math weenie. The interval is [buffer, buffer+length): the start location, buffer, is actually being read/written, but the location buffer+length is not. The last location read/written is really buffer+length-1, hence the mixed braces.

How does the OS give programs the illusion that they're writing to one address when they're really writing to another one?
This is one aspect of what's known as "virtual memory", and we'll cover it later in the semester. At a high level, the OS/hardware provide a translation table between the addresses the process uses (virtual addresses) and the actual memory of the system (physical addresses). To make this translation process efficient, it's not the case that any arbitrary mapping is allowed. The translation takes place at the granularity of "pages", which are understood by the OS and hardware, and generally ignored by the processes. Pages tend to be 4KB or 8KB.
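To make the page idea concrete, here's a minimal sketch of the arithmetic behind page-granularity translation, assuming 4KB pages and a made-up mapping; the real translation is performed by the hardware and OS, not by user code, and the addresses here are invented for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096   /* assuming 4KB pages */

    int main(void) {
        uintptr_t vaddr  = 0x403a17;            /* a made-up virtual address */
        uintptr_t vpage  = vaddr / PAGE_SIZE;   /* virtual page number */
        uintptr_t offset = vaddr % PAGE_SIZE;   /* offset within the page */

        /* Pretend the OS mapped this virtual page to physical page 0x2ba;
         * in reality, this mapping lives in the OS's translation tables. */
        uintptr_t ppage = 0x2ba;
        uintptr_t paddr = ppage * PAGE_SIZE + offset;

        printf("virtual 0x%jx = page 0x%jx + offset 0x%jx -> physical 0x%jx\n",
               (uintmax_t)vaddr, (uintmax_t)vpage, (uintmax_t)offset,
               (uintmax_t)paddr);
        return 0;
    }

Only the page number goes through the translation table; the offset within the page passes through unchanged, which is why the hardware table can stay small.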
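And here is the vulnerable-function sketch promised in the buffer overflow answer above. The function name and buffer size are made up for illustration; this shows the mistake itself, not an actual exploit.

    #include <string.h>

    /* Deliberately unsafe, for illustration only. 'buf' lives in this
     * function's stack frame, near the saved return address. If 'input'
     * is longer than 15 characters, strcpy() writes past the end of
     * 'buf'; a long enough input clobbers the saved return address, so
     * the function "returns" to wherever the attacker's bytes point. */
    void vulnerable(const char *input) {
        char buf[16];
        strcpy(buf, input);   /* no length check: the classic mistake */
    }

    /* The fix is to bound the copy, e.g.
     * snprintf(buf, sizeof buf, "%s", input); */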
Does the OS protect files from being written, and if so, how?
Good question. The way this is done will be covered later in the semester, but basically, it's again more bookkeeping data stored on disk. Along with each file is some information on which individual user owns it, what group of people owns it, and what permissions everyone on the system has to it (e.g., the individual user who owns it can write to it, the group that owns it can read from it, and the rest of the people on the system can do neither).

The stripes on the PowerPoint make it hard to read.
OK, I'll avoid that background.

What's the difference between read and fread, and when is each one appropriate?
The read() system call gets you the bytes you asked for, no more and no less. The fread() library call generally tries to optimize things behind your back. Most of the time, you don't mind the optimizations that it does. For example, it'll generally call read() and ask for large chunks of data so that it has to call read() fewer times than you would call it yourself. Since a library call is cheaper than a system call, if your program does lots of small reads, it's often a performance boost to use fread(). It's also the case that since fread() uses a FILE * and read() uses an int, sloppiness in parameter handling may be easier for the compiler to catch with fread(). (A short sketch contrasting the two appears below.) So, when is fread() bad? Remember our example about multiple processes sharing the same file that contains a list of tasks, where each process reads the next task once it completes its current task? That would be problematic if fread() was reading more than you really wanted.

What happens if a user process tries accessing kernel space?
The translation layer that maps virtual addresses to physical addresses also tends to have support for determining whether a particular translation is allowed. In this particular case, when the user process tries to use a kernel address, that hardware layer would say that the user process doesn't have access to that range of memory. The OS would then normally kill the process. If it's feeling really vengeful, it will then drag around the carcass of the process as a warning to other processes not to try the same thing.

Do read() and write() maintain positions within the file like fread() and fwrite() do?
Yes.

If the file descriptor's entry in the local file table points into the global file table, can't a malicious program forge pointers into the global file table?
The local file table is part of the process state, which is in kernel memory. So, under normal circumstances, the process won't even get to see where the file descriptor's entry in the local file table points. Even in the event that an evil programmer gets this somehow, the other hardware mechanisms mentioned above should prevent him/her from modifying kernel memory.

Is the kernel that takes up space in every program a separate instance? Does launching a new program take up more space?
Launching a new program only takes up a small amount of extra kernel memory, for a suitable definition of small. The kernel that's mapped into the memory of every process is a single instance of the same kernel. The extra space required in the kernel is the bookkeeping overhead for each process.

Who actually does the malloc()?
The malloc() library call generally invokes a system call by the name of brk() or sbrk(). The book alludes to it at the top of page 50. The system call allows the process to move the end of the heap to a certain address, and the library is responsible for intelligently managing that memory. So, the bookkeeping necessary to keep track of the fine-grained memory objects takes place in the library via malloc(), and the operating system only deals with much coarser-grained information, like the end of the heap.
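As a rough illustration of that division of labor, here's a toy "allocator" whose only dealings with the OS are coarse-grained requests to grow the heap. toy_malloc() is a made-up name, and a real malloc() adds block headers, free lists, and reuse; this sketch just bumps a pointer and never frees.

    #include <unistd.h>   /* sbrk(); largely historical, but it shows the idea */

    static char *heap_top = NULL;   /* library-side bookkeeping */

    void *toy_malloc(size_t size) {
        if (heap_top == NULL)
            heap_top = sbrk(0);           /* find the current end of the heap */
        if (sbrk(size) == (void *)-1)     /* ask the OS for 'size' more bytes */
            return NULL;
        void *block = heap_top;           /* hand out the old end of the heap */
        heap_top += size;
        return block;
    }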
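Returning to the read()/fread() question above, here is the promised sketch contrasting the two. The file name tasks.txt is hypothetical; the point is that both calls hand you four bytes, but they get them in different ways.

    #include <fcntl.h>    /* open() */
    #include <stdio.h>    /* fopen(), fread() */
    #include <unistd.h>   /* read(), close() */

    int main(void) {
        char buf[4];

        /* read(): one system call per call, exactly the bytes you ask
         * for (or fewer at end-of-file). */
        int fd = open("tasks.txt", O_RDONLY);
        if (fd >= 0) {
            read(fd, buf, sizeof buf);
            close(fd);
        }

        /* fread(): a library call. Behind your back, stdio typically
         * read()s a large chunk into its own buffer and copies 4 bytes
         * out of it, so a loop of small fread()s costs far fewer system
         * calls than the equivalent loop of read()s. */
        FILE *fp = fopen("tasks.txt", "r");
        if (fp != NULL) {
            fread(buf, 1, sizeof buf, fp);
            fclose(fp);
        }
        return 0;
    }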
When the kernel issues a read for a file from disk, the extra latency caused by the indirection pales in comparison to the time needed to get the file from disk.

On a multi-process system, can't the kernel do something useful instead of waiting for the file?
This is getting a little ahead of the lecture, but yes, the kernel can (and will) switch to another task that's ready to run when one task is blocked waiting for a file to be read from disk.

Is the kernel a process? Are different parts of the OS processes?
This was actually in the last couple of slides of the lecture, but we didn't get to it. I'll try to pick it up on Tuesday. In "standard" kernels, the kernel isn't a process. In some of the "experimental" systems, there's only a relatively small kernel, and most of the things that we think of as a kernel (file system, memory system, etc.) are actually handled by special processes running in user space.

The book mentioned interrupts occurring due to I/O and used the term interrupt vector. What is it?
In general, interrupts are a way of getting the attention of the hardware and OS. For example, rather than the OS continually checking the keyboard to see if the user has pressed a key, there's a mechanism that allows the keyboard to send a signal to the CPU that something has happened and needs attention. This mechanism is called an interrupt. Most I/O devices will use a scheme like it, and even software can generate interrupts. So, if lots of devices are attached to the system, how does the hardware know which one generated an interrupt? One way is to have one shared interrupt signal, and then the hardware/OS has to query everything attached to it. This would be inefficient, so a common way around it is for the interrupt to have a number associated with it. These are sometimes called vectored interrupts, because you can then look into an array (a vector) and determine what needs to be handled. Hence the term interrupt vector. (A toy dispatch table illustrating the idea appears below.)

If the OS doesn't get involved on every translation, how does memory protection work?
Again, we're getting far ahead of ourselves, but in general, the hardware provides a small table that the OS can use to store frequently-accessed translations. If the process generates a memory request that isn't in this table, the hardware asks the OS what to do. The OS can either update the table or tell the process that it tried doing something illegal. The entries in this table map pages, so most translations will take place in the hardware table.

Can a process find out which physical memory it's using?
In general, the answer is no, especially if you're looking for a portable solution. In specific contexts, the answer may be yes, but usually it's not on general-purpose systems. In standard systems, there's never really a reason that the process would need this information.

How does the kernel map itself into every process?
That's a little too far ahead of the game; see chapter 4 if you're really curious, or wait a couple of weeks.

Is there any way of making changes to a file viewable by everyone?
Not really through the standard file system interfaces. However, you can use the mmap() system call (see "man mmap") with the "shared" flag set, and all processes accessing that file can see changes immediately. Getting notified that a change has occurred, however, is a different matter and isn't part of mmap.
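Here's a minimal sketch of that mmap() usage, assuming a pre-existing file shared.dat (a made-up name) that is at least 4096 bytes long; every process that maps the same file with MAP_SHARED sees the store to p[0].

    #include <fcntl.h>     /* open() */
    #include <sys/mman.h>  /* mmap(), msync(), munmap() */
    #include <unistd.h>    /* close() */

    int main(void) {
        int fd = open("shared.dat", O_RDWR);   /* hypothetical file */
        if (fd < 0)
            return 1;

        /* MAP_SHARED: stores through 'p' are visible to every other
         * process that maps the same file; no explicit write() needed. */
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        p[0] = 'X';               /* immediately visible to other mappers */
        msync(p, 4096, MS_SYNC);  /* optionally force the change to disk */
        munmap(p, 4096);
        close(fd);
        return 0;
    }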
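And here's the toy dispatch table promised in the interrupt vector answer above. The interrupt numbers and handlers are made up, and a real vector lives in the hardware/kernel rather than a user program, but the "index an array of handlers" idea is the same.

    #include <stdio.h>

    typedef void (*handler_t)(void);

    static void keyboard_handler(void) { printf("key pressed\n"); }
    static void disk_handler(void)     { printf("disk I/O finished\n"); }

    /* The interrupt vector: an array indexed by interrupt number, each
     * entry pointing at the handler for that device (numbers made up). */
    static handler_t interrupt_vector[256] = {
        [1]  = keyboard_handler,
        [14] = disk_handler,
    };

    /* On an interrupt, the hardware supplies the number; dispatch is
     * just an array lookup instead of querying every attached device. */
    static void dispatch(int irq) {
        if (interrupt_vector[irq] != NULL)
            interrupt_vector[irq]();
    }

    int main(void) {
        dispatch(1);     /* simulate a keyboard interrupt */
        dispatch(14);    /* simulate a disk interrupt */
        return 0;
    }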
What did I mean when I said something about writing past the end of an array?
Let's say you have int a[10]; int b; and these are laid out in memory in the same way they're declared. Then assigning any number to a[10] will store the value in b, since the last valid entry in the array "a" is a[9]. This can be a real problem if you use the gets() library call to get a line of input. Since gets() doesn't take the maximum line size as a parameter, a malicious person could type a really long line and try to crash your program. (A sketch of the problem and the standard fix appears at the end of this section.)

What does the predictability of having things in the same location really get you?
Well, one concrete example is the core dump: this is basically writing the contents of memory to a file. Then, when you start gdb and specify that you want this core file loaded, it knows exactly what variables, etc., correspond to what locations in memory (and in the file).

Doesn't the predictability/repeatability of memory layout make all programming easier? In other words, is it fair to say that it's bad in the sense that it makes worm/virus spreading easier?
Well, this is more of a philosophical question. Yes, the predictability makes all programming easier, and hence it makes it easier for the worm writer as well. What's good for them is bad for average users in this case. Biological monocultures tend to fare poorly in certain plague scenarios, and we're starting to see similar behavior with computers. I don't know that I'd trade off certain aspects of layout simply to protect against viruses, but that is one approach people are investigating to mitigate the spread of worms, etc.
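Finally, the sketch promised in the writing-past-the-end-of-an-array answer: the gets() problem and the usual fix, fgets(), which takes the buffer size and truncates long lines instead of overrunning the array (and whatever happens to sit next to it in memory).

    #include <stdio.h>

    int main(void) {
        char line[64];

        /* gets(line);  -- unsafe: no bound, so a long line writes past
         * the end of 'line', just like a[10] writing into b above. */

        if (fgets(line, sizeof line, stdin) != NULL)
            printf("read: %s", line);
        return 0;
    }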