Resource Scheduling in SILK

Overview

Object Framework

Scheduling Disciplines

Linux CPU Scheduling Run-Time System

Limitations

Papers


Overview

The goal of SILK's scheduling framework is to generalize scheduling along two dimensions: those of the resource scheduled (e.g., CPU cycles or network bits) and the discipline used to schedule it. The framework is designed to be "dropped in" to any context where such scheduling is needed (e.g., application, OS kernel, device driver). Multiple schedulers are provided, including round-robin, priority, and several proportional sharing disciplines; new disciplines can be created by writing a self-contained object and plugging it into the framework. This framework is currently used to provide CPU isolation between slices on PlanetLab.

An actual resource scheduler built using the scheduling framework consists of two pieces:

As a concrete example, the above picture shows how our framework is used to schedule Linux tasks on PlanetLab. On the right is a hierarchy of objects. The HWResource object represents the CPU and it is scheduled using the Rate-Controlled Proportional Share (RCPS) Scheduler. For each slice on PlanetLab, a Container is created, given some number of shares, and bound to the CPU HWResource. Each Linux task running in a slice has its own Activity that is bound to the slice's Container. In the center, the CPU scheduling Run-Time System gets the next Activity from the HWResource, which makes the decision based on the shares and CPU consumption of all the Containers. Finally, the RTS sets the priority of the Linux task bound to that Activity so that the Linux scheduler will choose to run that task next. Below, we discuss each of these components in more detail.


Object Framework

The scheduling object framework builds on the Resource Containers work, elaborating the relationship between Resource Containers, the resources that they manage, and the scheduling disciplines used to manage them. The idea was to make the object interfaces agnostic as to the resources scheduled, the entities that consume the resources, and the scheduling policies used. In other words, it should be possible to schedule any timed resource (though the only ones we know of are CPUs and network links) and support arbitrary schedulers using this collection of objects. The framework is implemented in C in a loosely object-oriented manner, and comprises four kinds of objects:

Scheduler
Each Scheduler object encapsulates a particular scheduling policy. It contains no scheduling state, only functions that operate on state-ful objects such as Containers and HWResources. Mulitiple HWResources can bind to a Scheduler object, representing multiple resources (e.g., CPUs, network links) scheduled by the same scheduling policy.

HWResource
A HWResource represents a resource to be scheduled. A HWResource binds to exactly one Scheduler object, and multiple Container objects bind to a HWResource. Each Scheduler implementation typically defines its own subclass of the HWResource class, containing any per-resource state required by that Scheduler.

Container
A Container object corresponds to a principal in the system. For example, in PlanetLab a Container is mapped to a slice in the CPU scheduling run-time system (RTS). A Container binds to exactly one HWResource, and multiple Activity objects bind to a Container. Each Scheduler implementation typically defines its own subclass of the Container class, containing any per-principal state required by the Scheduler. Scheduling parameters (e.g., shares or priorities) are associated with Containers.

Activity
An Activity object represents a schedulable entity. For example, in PlanetLab an Activity is mapped onto a Linux task in the CPU scheduling RTS. An Activity binds to exactly one Container at a time. The scheduling parameter of the enclosing Container is used to schedule the Activity on the associated resource.

More information on the object interfaces can be found in the Programmer's Manual (in progress).


Scheduling Disciplines

Implementing a new scheduling discipline in the framework means writing a Scheduler object that encapsulates the particular scheduling policy. Scheduler objects have been implemented in the framework for the following scheduling disciplines:

Linux CPU Scheduling Run-Time System

The CPU scheduling run-time system (RTS) sits between the object framework and the PlanetLab Linux kernel. It performs the following functions:

The core of the RTS is the scheduling loop. At initialization, the RTS forks a kernel thread and sets this thread to the highest possible system priority (in Linux, this is Real-Time priority 99). This thread then enters a loop in which it calls the object framework to select the next Activity; puts the task associated with this Activity on the run queue and sets its priority to the next-highest system priority; and yields. In Linux, a yielding task is not eligible to run for one scheduling decision, and therefore the Linux scheduler chooses the task with the next-highest priority to run. After the Linux task has run for its timeslice, the RTS kernel thread runs again since it is now eligible and has the highest priority. The thread now scans all tasks bound to Activities to see if any have changed status, and looks for any new tasks that should be bound to an existing slice. All tasks managed by the RTS are pulled from the run queue, so the queue only contains unmanaged tasks (see below) and the task that the RTS wants to run. Finally, the RTS updates the accounting information for the Container of the Activity that just ran; and performs the next iteration of the loop.

There is always at least one Container in the system, the default system Container. All tasks not bound to a slice's Container are assigned to this default Container, and are managed differently by the RTS. Tasks bound to a standard Container are pulled from the run queue and only reinserted (with high priorities) when the RTS wants to run them. In contrast, tasks assigned to the system Container are left on the run queue whenever they are runnable. When the Scheduler decides to allow a task from the system Container to run, the RTS simply yields to Linux without setting any task to high priority. This allows the Linux CPU scheduler to choose the next task to run from the set of tasks on the queue based on its own policy. Note that, since the decision to run a task from the system Container is based on this Container's share, isolation is still preserved between these tasks and those bound to other Containers.

The RTS introduces a number of sources of overhead into the scheduling decision. First, the number of context-switches in the system doubles, since Linux must run the RTS kernel thread in order to make the "real" scheduling decision, and then context-switch to the selected task. Second, since the RTS is not directly informed when a task changes status (e.g., when wake_up() is called), it must scan the tasks to see if any have changed status. Currently, all tasks are inspected, but maybe it is only necessary to inspect blocked tasks to see if they have awakened. Third, the scheduling decision performed by the object framework takes some time. On the other hand, the RTS pulls Linux tasks that it is managing off of Linux's run queue, and only reinserts a task when it wants Linux to run it. This means that less time is spent in the Linux scheduler, which partially offsets the additional overhead introduced by the RTS. Quantifying the RTS overhead is future work.


Limitations

Following are the limitations of the scheduling framework as currently implemented.

General

PlanetLab-specific


To Add


Copyright © 2003 Andy Bavier