Q&A Session - 7:00pm Wednesday, room 105 (our current room)

What does it mean that each process has the kernel mapped? Is this taking up extra memory?

Logically, this means that the virtual memory structures needed to map the kernel are always in place. If you do this "right", it doesn't have to consume much extra memory. With hierarchical page tables, the relevant entries in each process's top-level directory all point to the same set of lower-level entries that handle the kernel mapping. So it's not too ugly.

Could you go over the differences between software-controlled and hardware-controlled TLBs again? Are TLBs always implemented in hardware, and it's just the management of the TLBs that may optionally be in software?

The TLB hit/miss check is always implemented in hardware, since that is absolutely performance-critical. The management of the TLB (which determines its behavior on TLB misses) is what may be implemented in hardware or software. Realize that a TLB is just a cache of the page information, so not all of the page table entries will fit into it. When the TLB misses, the most common reason is simply that there wasn't enough space in the TLB. At that point, something has to decide which TLB entry to evict, and how to load the appropriate entry from the main-memory page table into the TLB. This used to be done by software; after the approaches stabilized, it sometimes moved to hardware. Hardware management means that the OS and hardware have to agree on a page table structure. Note, however, that only "minor faults" (valid PTEs exist in the main page table) are handled by the hardware - if the page isn't in physical memory, or if the attempted access was invalid, the software path is still invoked.

How does having a TLB in hardware save time if there's a miss?

If the TLB is software-managed, some instructions must be executed on every TLB miss.
However, if those instructions aren't in the L1 cache, they must be fetched from main memory, so every TLB miss may also incur that extra penalty. In contrast, if the minor-fault handler is implemented in hardware, there are no instructions to fetch in order to run it - it's essentially built into the chip.

Can you explain when software TLB management isn't too bad and when it's really bad?

If you don't have many TLB misses, then increasing their cost by a small factor may not hurt at all. Likewise, if you have lots of TLB misses, the instructions to resolve them may stay in the L1 cache all the time. The bad scenario is when you have enough TLB misses to slow down performance, and the program accesses enough code/data in between TLB misses to kick the miss handler out of the cache.

What determines the size of the TLB?

There's a certain amount of space on the chip that can be devoted to caches - L1, L2, and TLB. My guess is that the TLB is made as large as possible without being too slow, and without consuming more chip space than it can effectively use for an interesting set of benchmark applications. That's sort of a weasel answer, but I didn't get any good results on the TLB details of current chips when I tried a few web searches.

Why is there a valid bit in the TLB? Why would certain mappings be invalid?

Remember that all entries of the TLB are checked in parallel. Assume the TLB doesn't have any process ID info, and a context switch has just occurred - the TLB has to be flushed. Since all of the comparisons are going to take place anyway, you need some way of saying "even if this entry matches, it's not a real entry", and that's one of the uses of the valid bit.

When you say combining a TLB with a cache, does that mean that if a page is in the TLB, it should also be in the cache? How are they being used together?

Note that combining here doesn't mean a literal combining of the two features.
What motivates this "combining of behavior" is the observation that if the cache is indexed by physical address, then every cache hit first requires a virtual-to-physical translation. If the cache is instead indexed by virtual address, then a cache hit doesn't require a TLB lookup as well - the translation has effectively been merged in. The cache still operates on cache lines (not pages), and it has its own replacement policy, separate from the TLB's.

What is meant by "consistency in memory"?

A cache is just replicating part of memory, but is faster and smaller. If something gets written to the cache, it should eventually get written back to main memory. Likewise, if something changes in main memory, that change should get reflected in the cache (or that portion of the cache should be marked as invalid, causing future accesses to fetch the real value from main memory). So, if the value being cached changes in main memory and this change doesn't get reflected in the cached copy, the cache is said to be in an inconsistent state.

Why is it good to have a sparsely-populated array if we're not using a hash table?

It's not that we have much of a choice in the matter - the logical array is sparsely populated simply because most processes use only a small fraction of their virtual address space. All of the unused virtual memory regions are what make the array sparse.

In the inverted page table, if the table has one entry per physical page, why do you need to hash at all? Shouldn't the pid/vpage number map directly to the physical page?

It would be very difficult to come up with a direct mapping that's general enough and that allows all resources to be used easily. For the sake of argument, assume you have a system with 100 pages of memory and 100 processes that use 1 page each. Can the same mapping handle this case as well as a system that has 10 processes using 10 pages each?
I can't think of one, so that's why hashing is used - if you design the hash function well, it's unlikely to have really bad worst cases.

Where is the hash chain in the inverted page table?

Not shown. Also realize that there may not be a chain - there are other ways of handling conflicts in hash tables, such as just walking down until you find a free entry, or re-hashing.

I don't understand direct-mapped caches - what does it mean to be N-way set-associative?

Direct-mapped caches have exactly one entry per location in the cache. That means that if items X, Y, and Z all map to the same location in the cache, only one can be cached at a time. You can think of an N-way set-associative cache as having N entries per location. So, if we have a 2-way set-associative cache, then out of X, Y, and Z, two can be cached at any time. In a fully associative cache, the only restriction on what can be cached is the size of the cache itself.

How many levels do page tables usually have?

I would guess that on 32-bit systems with a 4KB page size, you'd only need two levels. Each level would take care of 10 bits of the virtual address, and the remaining 12 bits are the page offset.

Is a TLB miss always a minor fault? What exactly is a major page fault?

No - a TLB miss can also occur if the page isn't in physical memory at all. At that point, the OS has to get involved and load the page from disk. This is a major fault.

What's the difference between segmentation with paging and a multiple-level page table?

The hardware requirements are quite different - realize that the segmentation schemes we've discussed have the segment table built into the hardware. In a multi-level page table, there's no direct counterpart. Both can use TLBs to speed up the system.

TLBs don't work well on matrices

Several people made this observation, so I should clarify. Certain matrix operations, such as matrix multiply, have straightforward implementations that interact poorly with TLBs.
So, the people who care about these things implement performance-aware algorithms. Do a search for "blocked matrix multiply" if you'd like to know more.

How exactly is the entry of the TLB chosen based on VPage#?

The virtual page # is presented to the TLB, which compares all of its entries to see if one of them has a matching virtual page #. The low-level details basically involve lots of comparators, one per TLB entry (I think).

Doesn't making the TLB fully associative make it harder to look things up than making it N-way set-associative?

Yes, it requires more hardware, but my guess is that there must be certain classes of applications (think high-performance Fortran code) that would suffer if TLBs weren't fully associative. So, if it's a small cost to make sure you don't get killed on a benchmark, do it.

How does the OS know what level of hardware support is provided by a particular processor?

Processors provide certain instructions that allow the OS to get information about what the processor provides. Some of these instructions may operate in very fine detail, while others may just say that this is processor level X, and there's some published info from the manufacturer that says how much of everything level X has.

Can you explain virtually-addressed caches again?

This is the start of the next lecture (which had slides handed out in this one).

At which point does the OS get involved in page faults on a system like Linux?

I think all modern x86 chips have a hardware TLB miss handler, so the OS only needs to get involved if the page isn't in physical memory at all, or if the attempted access was invalid.

How does the TLB know what entries to keep? If it can only load about 100 entries, which 100 does it have?

This was the slide about replacement policy. Some TLBs just randomly evict an entry when a new entry needs to be loaded.
Others will have some sort of "LRU-like" approach, where they try to evict the entry that hasn't been used for the longest time.

When are you getting rid of Windows?

Well, right now I've got a working Windows laptop, and I will always need a laptop for presentations and such, so there's no need to get rid of Windows. When this laptop needs to be replaced, then it might be possible to get rid of Windows. The reason for getting rid of Linux was basically that it was time to replace my desktop machine (a 300 MHz Pentium II).

Does the UltraSparc use an inverted page table?

According to http://www.memorymanagement.org/glossary/i.html the Alpha, UltraSparc, and PowerPC all include inverted page tables, but if this were really important, I'd check something besides the web.

When is the last day that we can ask questions?

The safest assumption is that whatever's written on the feedback on Thursday will get answered. However, that's feedback, so it's somewhat restricted to the scope of lecture, and it assumes that the book's been read, etc. The last safe time to ask any general question, then, is Wednesday night. I'm unlikely to have e-mail access over the weekend.

On slide 29, what does "need to write back" mean?

Assuming I'm thinking of the right slide, the TLB may keep track of which pages have been referenced, modified, etc., so if the PTE has changed, it has to be written back to the main page table before it's evicted from the TLB.

What is the value of "size" on slide 21?

I've got different slide #s - could you give me the URL of the appropriate jpg?

Are midterm questions generally like the quiz questions, or are they more like problems that we have to solve?

My format is probably going to be something like these:
http://www.cs.princeton.edu/~vivek/f2000_318_midterm.pdf
http://www.cs.princeton.edu/~vivek/f2001_318_midterm.pdf
http://www.cs.princeton.edu/~vivek/f2001_318_final.pdf