Project 3: Preemptive Scheduling

Due: 11:59pm 5 November 2012

TA in charge of this project: Yida Wang (yidawang@cs.princeton.edu)
Lab TAs in charge of this project: Michael Franklin (mfrankli@cs.princeton.edu)
Ilias Giechaskiel (igiechas@princeton.edu)

Overview

You will extend the provided kernel (which are available on both
Blackboard and the fishbowl machines at the directory /u/318/codes/project3/start_code_3.tar.gz) with preemptive multitasking and implement several synchronization primitives. Unlike the project 2, the tasks in this project can be preempted by another task via a time interrupt. You need to deal with the time interrupt and handle the critical section issues so that a task can be preempted safely. Besides, you will implement three kinds of synchronization primitives: condition variable, semaphore and barrier by yourself.

Getting Started

Refer to previous projects for information on building and testing your project. Some versions of gcc may not support the -fno-stack-protector flag; it is safe to remove this flag from Makefile if you are using one of those versions. If you are using fishbowl machines or the provided image on VirtualBox, it should be fine to directly use the provided Makefile.

We have provided the program: tasks which prepares tasks (test cases) for your OS to run. Its usage is ./tasks taskset. We have also provided five test sets:

For example, to test the barrier primitive on your OS, run ./tasks test_barrier and then make clean. This will update the symbolic links in your main project directory so that, upon recompiling the project, your OS will run those tasks.

If you wanted to create your own test set, or to create a test set with no processes, you could copy one of these test sets to a new directory, and then modify it to suite your needs.

Use this to test your kernel. We will use similiar test cases when grading.


Preemptive Scheduling

The x86 interrupt mechanism is quite complicated, so we will present only the details relevant to this project. Interrupts can be triggered in several different ways: hardware, software, exceptions. The timer interrupt, IRQ 0, is a hardware interrupt, as is IRQ 7. The system call mechanism, which uses the int instruction, is a software interrupt. Exceptions are generated for a variety of invalid actions: dividing by zero, for example, will cause an exception.

The support code installs handlers for IRQ 0, IRQ 7, the system call interrupt, and the various exceptions. Your job for this part is to complete the handler for IRQ 0, entry.S:irq0_entry; the others are given to you as examples. The main file you will be working with is entry.S, but you may need to consult other files, especially interrupt.[ch] and scheduler.[ch]. In particular, the definition of the PCB structure has moved to scheduler.h.

The processor takes the following actions on an interrupt:

  1. Disable interrupts
  2. Push the flags register, CS, and return IP (in that order)
  3. Jump to the interrupt handler
Once the interrupt handler has run, it returns control to the interrupted task with the iret instruction, which pops the instruction pointer and the flags register, then reenables interrupts.

As with entry.S:syscall_entry, it may be some time before entry.S:irq0_entry actually performs the iret, since other tasks may execute in the middle. IRQ 0 is a hardware interrupt, so you should acknowledge it quickly using the macro entry.S:SEND_EOI -- otherwise the hardware won't deliver any more timer interrupts. The other tasks entry.S:irq0_entry must perform are

  1. Increment the 64-bit counter, num_ticks
  2. Increment entry.S:disable_count to reflect the fact that interrupts are off.
  3. If nested_count is zero, enter the kernel the way a system call would yield, else do nothing.
  4. Decrement entry.S:disable_count in anticipation of iret
nested_count is a field in the PCB. For processes, it is 1 during a system call and 0 otherwise. For threads, it is always 1. You should study how interrupt.c changes this field during a system call.

This may not look very hard, but the tricky part is managing when interrupts are on, and when they are off. Your solution should satisfy the following two properties:


Blocking Sleep

In the provided code, function sleep and do_sleep work by spinning on num_ticks. Unfortunately, this loop runs within the kernel, and so it cannot be preempted. As a result, if one task sleeps, all tasks sleep.

You must modify the provided implementation so that calls to sleep will block the current process (i.e. take it out of the ready queue until some appropriate time in the future).


Synchronization Primitives

The main file you will edit here is sync.[ch], but you will need to consult queue.[ch] and scheduler.[ch]. You have to implement condition variables, semaphores, and barriers. We've given you an implementation of locks as a reference. These primitives must operate correctly in the face of preemption by turning interrupts on and off with the functions enter_critical and leave_critical.

In the descriptions below, if an operation is performed on an uninitialized structure, the results are up to you. All unblocks should place the unblocked thread at the end of the ready queue.

Condition Variables

condition_init(cond) Initializes the specified memory to hold a condition variable.
condition_wait(lock, cond) Releases lock, blocks until signalled, then reacquires lock. The calling thread is responsible for acquiring lock before calling this function. Please see the hints section.
condition_signal(cond) Unblocks the first thread to have blocked on condition_wait(lock,cond). If no thread is blocked on cond, then do nothing.
condition_broadcast(cond) Unblocks all threads that have blocked on condition_wait(lock,cond). The order in which threads are unblocked is not important.

Semaphores

semaphore_init(sem, value) Initializes the specified memory to hold a semaphore whose starting value is value; a non-negative integer.
semaphore_down(sem) If the semaphore is not positive, then block. Decrement the semaphore once it is positive. Please see the hints section.
semaphore_up(sem) Increments the semaphore. Please see the hints section.

Fairness requirement: in a trace with exactly n ups, the downs that complete successfully are the first n invocations. Calling semaphore_init(sem, value) counts as value ups.

Barriers

barrier_init(bar, n) Initializes a barrier for use by n threads.
barrier_wait(bar) Blocks until a total of n threads have called barrier_wait(bar). The order in which threads are unblocked is unimportant. Immediately after the last call to barrier_wait(bar) returns, and possibly before the other calls have returned, the barrier is available for another round of waits. Please see the hints section.

Design Review

First, take a look at
what we expect from you in general.

The design review is worth a total of 5 points.

For example, this is an acceptable explanation for the project 2's system call mechanism:

When a process makes a system call, it looks up the kernel_entry() and calls it with the ID number of the system call. kernel_entry() saves the user context in the PCB, restores the kernel stack, then calls kernel_entry_helper(). kernel_entry_helper() multiplexes (switch) its argument and invokes the "do_" version of the system call. When kernel_entry_helper() returns, kernel_entry() saves the kernel stack, restores the user context, then returns to the process.

This is an example of an acceptable explanation of the project 2's implementation of lock:

The lock structure contains a bit indicating whether the lock is held or not and a queue of threads waiting for the lock. lock_init() sets the lock to not held and sets up an empty queue. lock_acquire() enters a critical section and tests if the lock is held. If it is, then it adds itself to the lock queue and blocks. Otherwise, it sets the held bit. Finally, it leaves the critical section. lock_release() also enters a critical section and tests if there are any threads waiting on the lock. If there are, it removes the next thread from the front of the queue and unblocks it; otherwise, it clears the held bit. Finally, it leaves the critical section.
You answer may not be so complete, but at least you need to convince the TA that you understand the workflow of system calls and locks. For the synchronization primitives, it would be nice if you write the pseudo-codes down to show to the TA.

Besides, as part of the previous year's design review requirement, we strongly suggest you to study the new system call mechanism of this project and explain how it works. You should look at at least entry.S, interrupt.c, and syslib.c.

To siun up for a design review, use our signup page.


Hints


Checklist

Note: this is not intended to be a thorough list; just a quick reference.

Submission

To ensure compatability with our tests, do the following:

  1. ./tasks tsk_test
  2. make clean; make
  3. Fix any compilation errors without modifying files within tsk_test. Upon running, the screen should display which extra credits you have attempted. If the display is incorrect, then fix your #defines in common.h.
  4. We encourage you to print out diagnostics as a way to inspect your code. However, please do so in a way that leaves the output of the supplied tasks intact.

When you are ready to submit, You will submit only these files along with a readme which should be no more than 500 words:

  1. common.h
  2. entry.S
  3. helper.S
  4. interrupt.h
  5. interrupt.c
  6. kernel.h
  7. kernel.c
  8. queue.h
  9. queue.c
  10. scheduler.h
  11. scheduler.c
  12. sync.h
  13. sync.c
  14. syslib.h
  15. syslib.c
  16. util.h
  17. util.c

Please restrict your changes to these files. Submit via dropbox; only one person per group should submit.


Extra Credit

Prioritized Task Scheduling

For an additional 1 point of extra credit, change the scheduler algorithm so that it allocates precessor time proportional to each task's priority. If you choose to do this, you should come up with your own priority scheduling algorithm and describe it in your readme.

One way to test your implementation of priority scheduling would be to modify test_preempt, so that each task has a different priority. In the syslib.[ch] files we have provided a setpriority function for you to use. If you observe that two counters are growing in different rates and the priority numbers and entry counts of processes/threads are changing as you extected, probably your algorithm works. You do not need to submit these test cases.

If implemented, add #define EC_PRIORITIES to common.h and report it in your readme.

Automatic Deadlock Detection

For another additional 1 point of extra credit, write an algorithm which automatically tests for a particular class of deadlock. This algorithm is conceptually simple, but may be difficult to implement:
Imagine a directed graph in which every process and every lock is a vertex. Draw an edge pi → lj if process pi blocks while trying to acquire lock lj. Draw an edge lm → pn if lock lm is held by process pn. Whenever this graph contains a cycle, a deadlock has occurred.

You don't need to implement this as a graph algorithm per se, but for extra credit you do need to implement something which is at least as sensitive as this this algorithm, and which has no false positives.

If a deadlock is detected, lock_acquire fails with return value 1. In all other situations, lock_acquire returns 0. Only one call to lock_acquire should fail for a single deadlock situation. A failed lock_acquire should not affect the state of the lock (it should remain locked). This enables one process to recover from failure, and all other processes to continue as normal. For instance, that process may choose to release all locks it holds and then retry.

This style of deadlock detection is only relevant to locks, and is not relevant to semaphores, condition variables or barriers. In fact, it's only relevant to a restricted use of locks, in which locks are only unlocked by the same process which locks them.

To test your implementation of deadlock detection, create a new test set containing processes which will necessarily deadlock. These processes should report if lock_acquire ever returns failure. You do not need to submit these new test cases.

If you implement this extra credit, you should add #define EC_DEADLOCK to common.h, and report it in your readme.