PathKit

The PathKit abstracts Scout paths from the Scout OS. The goal of the PathKit is to introduce the resource isolation and configurability provided by paths into a number of contexts, for instance within the Linux kernel or inside a user-space application. To this end, the PathKit provides a conceptual object-oriented architecture based on Scout paths. PathKit objects are implemented in many cases by writing wrappers around pre-existing system components such as queues, threads, or protocols.

PathKit Objects

Object Implementations

SILK Module in PlanetLab

Limitations

Papers


PathKit Objects

The PathKit decomposes Scout paths into generic path components called Stages. Each Stage has a simple message-oriented interface that supports Push and Pull operations, resembling Click or the Scout partner interface:

  int Push(Stage s, Msg m);
  int Pull(Stage s, Msg m);
  int Destroy(Stage s);

The Push operation is used to push a message into the Stage, while Pull tries to fetch a message from the Stage. The Destroy operation is used to free state associated with the Stage.
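One way to picture this interface in C is a per-class table of function pointers carried by every Stage. The following is only a minimal sketch: the struct layouts, the `stage_ops` table, the `sink_push` example subclass, and the convention that 0 means success and -1 means failure are all assumptions made for illustration, not PathKit's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: the real PathKit definitions of Stage and Msg
 * are not given in this document. */
typedef struct msg { int len; } *Msg;
typedef struct stage *Stage;

/* Per-class operation table; a NULL entry means "not supported". */
struct stage_ops {
    int (*push)(Stage s, Msg m);
    int (*pull)(Stage s, Msg m);
    int (*destroy)(Stage s);
};

struct stage {
    const struct stage_ops *ops;  /* behaviour of this Stage subclass */
    void *state;                  /* subclass-private state */
};

/* Generic entry points: dispatch to the subclass (0 = success, -1 = failure). */
int Push(Stage s, Msg m)
{
    assert(s != NULL && s->ops != NULL);
    return s->ops->push ? s->ops->push(s, m) : -1;
}

int Pull(Stage s, Msg m)
{
    assert(s != NULL && s->ops != NULL);
    return s->ops->pull ? s->ops->pull(s, m) : -1;
}

int Destroy(Stage s)
{
    assert(s != NULL && s->ops != NULL);
    return s->ops->destroy ? s->ops->destroy(s) : 0;  /* nothing to free */
}

/* Example subclass: a sink that only counts the bytes pushed to it. */
static int sink_push(Stage s, Msg m)
{
    *(int *)s->state += m->len;
    return 0;
}

static const struct stage_ops sink_ops = { sink_push, NULL, NULL };
```

With this layout, any code holding a Stage can call Push or Pull without knowing which subclass it is talking to, which is the property the subclasses below rely on.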

The PathKit framework contains several different subclasses of the Stage class. Since all of these are also Stages, they all support the Push/Pull interface described above. First, the atomic Stages:

Demux
A Demux inspects a message and pushes it to one or more Stages based on the message contents. Typically a Demux will extract flow information from a packet's headers and push the packet to the Path responsible for this flow. The Demux interface provides methods to add and delete mappings between keys and Stages, and to resolve a key into a Stage.

Queue
A Queue stores messages. Messages are pushed to a Queue, which either stores them or returns a failure code if the Queue is full. Pulling from a Queue returns a message if the Queue is not empty; otherwise the operation may block until a message is available. The Queue interface provides a method to check the queue length.

Thread
A Thread represents a process in the system, and can store scheduling state for that process (e.g., how many shares have been assigned to the process in a proportional share CPU scheduler). Note that there is not necessarily a one-to-one correspondence between Thread objects and system processes. Multiple Threads may be multiplexed onto one process if it is not necessary to save per-Thread state (i.e., continuations), or multiple processes may share a single Thread that provides an entry-point into a path. The Thread interface provides methods to start and stop the Thread, a pointer to the Thread's start function (the function that executes when the Thread runs), and an entry-point method that a process can call to run the Thread's start function in the context of that process. A Thread can be marked non-blocking, so that pulling from an empty Queue will return failure.

Processing
A Processing Stage transforms a message that is pushed to or pulled from it, for example adding or removing protocol headers or processing the message payload.
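The Queue and Demux behaviour described above can be sketched together in a few lines of C. This is an illustration under stated assumptions, not PathKit code: all names (`queue_push`, `demux_add`, and so on), the fixed capacities, and the 0 / -1 return convention are hypothetical. The pull shown is the non-blocking variant, and the demux key is passed in explicitly rather than extracted from the message contents.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 2   /* assumed capacity, kept tiny for illustration */
#define DEMUX_MAX 4   /* assumed size of the key table */

/* Queue: a bounded FIFO of opaque message pointers. */
struct queue { void *slots[QUEUE_CAP]; size_t head, len; };

int queue_push(struct queue *q, void *m)
{
    assert(q != NULL);
    if (q->len == QUEUE_CAP) return -1;            /* full: report failure */
    q->slots[(q->head + q->len) % QUEUE_CAP] = m;
    q->len++;
    return 0;
}

int queue_pull(struct queue *q, void **out)
{
    assert(q != NULL && out != NULL);
    if (q->len == 0) return -1;   /* empty: a blocking Thread would wait here */
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->len--;
    return 0;
}

size_t queue_length(const struct queue *q) { return q->len; }

/* Demux: maps integer keys to target Queues. */
struct demux { struct { unsigned key; struct queue *target; } map[DEMUX_MAX]; };

struct queue *demux_resolve(const struct demux *d, unsigned key)
{
    for (size_t i = 0; i < DEMUX_MAX; i++)
        if (d->map[i].target != NULL && d->map[i].key == key)
            return d->map[i].target;
    return NULL;
}

int demux_add(struct demux *d, unsigned key, struct queue *target)
{
    assert(d != NULL && target != NULL);
    if (demux_resolve(d, key) != NULL) return -1;  /* key already mapped */
    for (size_t i = 0; i < DEMUX_MAX; i++) {
        if (d->map[i].target == NULL) {
            d->map[i].key = key;
            d->map[i].target = target;
            return 0;
        }
    }
    return -1;                                     /* table full */
}

void demux_delete(struct demux *d, unsigned key)
{
    for (size_t i = 0; i < DEMUX_MAX; i++)
        if (d->map[i].target != NULL && d->map[i].key == key)
            d->map[i].target = NULL;
}

/* Pushing to the Demux resolves the key, then pushes to the chosen Stage. */
int demux_push(struct demux *d, unsigned key, void *m)
{
    struct queue *q = demux_resolve(d, key);
    return q != NULL ? queue_push(q, m) : -1;
}
```

Note that a failed `demux_push` in this sketch does not distinguish "no mapping" from "target Queue full"; a real implementation would likely need to report those cases separately.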

Two Stage objects are compound objects built out of the atomic Stages listed above. These are:

Path
A Path consists of one or more Stages chained together. A typical Path may start with a Queue, followed by a Thread, one or more Processing Stages, and then another Queue.

PktSched
A PktSched schedules messages contained in one or more Queues. Pulling from the PktSched causes it to choose the next Queue to be serviced according to its internal scheduling policy. A message is then pulled from the chosen Queue and returned. The Queues managed by a PktSched may also be part of a Path.
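As a rough sketch of the PktSched pull operation, the code below uses plain round-robin as the internal scheduling policy. Round-robin is chosen here only because it is short; it is not the policy of any PathKit implementation. The queue type, the fixed number of queues, and all function names are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NQUEUES 3   /* assumed number of managed Queues */
#define QCAP    4   /* assumed per-Queue capacity */

/* Minimal integer FIFO standing in for a PathKit Queue. */
struct queue { int items[QCAP]; size_t head, len; };

int q_push(struct queue *q, int v)
{
    if (q->len == QCAP) return -1;
    q->items[(q->head + q->len) % QCAP] = v;
    q->len++;
    return 0;
}

int q_pull(struct queue *q, int *out)
{
    if (q->len == 0) return -1;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    return 0;
}

struct pktsched {
    struct queue *queues[NQUEUES];
    size_t next;                 /* Queue to try first on the next pull */
};

/* Pull: choose the next non-empty Queue in round-robin order and
 * return one message from it; -1 if every Queue is empty. */
int pktsched_pull(struct pktsched *ps, int *out)
{
    assert(ps != NULL && out != NULL);
    for (size_t tried = 0; tried < NQUEUES; tried++) {
        struct queue *q = ps->queues[ps->next];
        ps->next = (ps->next + 1) % NQUEUES;
        if (q_pull(q, out) == 0)
            return 0;
    }
    return -1;
}
```

A weighted policy would replace only the choice of which Queue to try next; the pull-then-return structure stays the same.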


Object Implementations

The objects described above are generic classes that simply define interfaces. In order to build functioning paths using the PathKit it is necessary to provide implementations of these objects, either as wrappers of existing functionality or by writing new code. The following objects have been implemented in the PathKit:

DemuxSkbuff
Extracts address and port information from the IP, TCP, UDP, and ICMP headers of incoming packets contained in Linux sk_buffs, and demultiplexes them to Paths. It is possible to register two kinds of demux keys: passive keys contain a local IP address, protocol, and local port; in addition, active keys contain a remote IP address and remote port.

QueueSock
Queues sk_buffs within a Linux struct sock. Uses enqueue and dequeue operations provided by Linux. When a Thread object pulls from an empty QueueSock, the call will block if the Thread allows blocking. A blocked Thread will awaken when a packet is pushed to the QueueSock.

ProcLinuxReinject
Reinjects a sk_buff back into the Linux protocol stack at the point at which the netfilter shim intercepted it. Conceptually, protocol processing above IP in the Linux networking stack occurs in the context of this object.

ThreadKernel
Provides a wrapper around a Linux kernel thread. Creating the ThreadKernel object causes the kernel thread to be created, and it is activated by calling the ThreadKernel's start function. A ThreadKernel is allowed to block.

ThreadContinuation
A single kernel thread is multiplexed among all ThreadContinuation objects. This means that this type of Thread object is not allowed to block while pulling from a Queue, because it saves no state on the stack between runs.

ThreadEntryPoint
A ThreadEntryPoint provides a point at which an existing process (i.e., one originating outside of the PathKit) can enter a path.

PathLinuxNetStack
This path implements a form of Lazy Receiver Processing (LRP) for TCP and UDP sockets, with the goal of performing protocol processing in the scheduling context of the process that consumes the data. The path consists of an input QueueSock, followed by a ThreadEntryPoint, a ProcLinuxReinject, and then an output QueueSock that is a wrapper around the normal TCP or UDP socket queue. An incoming packet is pushed onto the input QueueSock. When a process invokes a recv operation on the path, the process first enters the ThreadEntryPoint and drains the first QueueSock by repeatedly pulling messages from it (without blocking) and pushing them through the path. Messages pushed to the ProcLinuxReinject object are injected back into the Linux stack where they undergo TCP/UDP processing; the resulting data is placed in the socket buffer, corresponding to the path's final Queue. Note that in the current SILK module deployed in PlanetLab, the input queue of the PathLinuxNetStack is bypassed and therefore LRP is disabled. In other words, packets are pushed to this path, are reinjected back into Linux, and undergo protocol processing in a soft interrupt context; the only function of this path type is to perform packet accounting. The remainder of this document describes a system with LRP enabled for this path type, but more study is needed before introducing this feature into PlanetLab.

PathRawSocket
The minimal path, consisting of a single QueueSock and used as part of the Safe Raw Sockets implementation for SILK on PlanetLab. Since raw sockets don't require any protocol processing after IP, incoming packets are simply pushed to the path and into the QueueSock.

PktSchedWFQ
Employs a WFQ packet scheduler to choose the next Queue for service. This object is tied to SILK's scheduling framework but is currently not used in SILK.


SILK Module in PlanetLab

In order to use the PathKit in an existing system such as the Linux kernel, we connect PathKit objects to the rest of the system using thin layers of code called shims. In SILK, three shims tie Path objects into the Linux kernel: socket, raw socket, and netfilter. The picture below illustrates how paths and shims relate in SILK.

The picture on the left depicts the Linux network stack. As an example of how data traverses it, consider an arriving TCP packet. The device layer copies the packet contents to an sk_buff message, inspects and strips the link-level header, and passes the message up to the IP layer. This layer inspects and strips the IP header, notes that the packet is a TCP packet addressed to the local host, and passes the message to the TCP layer. The TCP protocol code demultiplexes the message to the appropriate socket and places the data it contains in the socket's receive buffer. All of this occurs in the context of a soft interrupt. Subsequently, when a process calls read() on the file descriptor corresponding to this TCP socket, the system call traverses layers implementing generic operations on file descriptors and socket-specific operations, enters the TCP layer, and copies the data in the receive buffer to user space.

The picture on the right shows the Linux stack with the SILK module loaded, in particular highlighting the Path objects and shims. Again, we consider an arriving TCP packet. The packet is copied to an sk_buff message and processed by the IP layer as before. After IP reassembly has taken place, the message is intercepted by the netfilter shim and pushed to the root demux, an instance of a DemuxSkbuff object. The root demux matches the message to either a PathLinuxNetStack or a PathRawSocket and pushes it to the path; if the push succeeds, the netfilter shim reports to the Linux network stack that it has stolen the sk_buff and that Linux should stop processing it. Assume as before that the message belongs to a TCP socket and therefore is pushed to a PathLinuxNetStack and deposited in its input QueueSock.

When the process calls read() on the TCP socket, the resulting system call traverses the file operations layer and is intercepted by the socket shim. This shim finds the PathLinuxNetStack that corresponds to the socket and calls into its ThreadEntryPoint, which causes the process to pull the message from the input queue and push it into the TCP layer via the ProcLinuxReinject object. Now TCP processes the message in the context of the process that called read() rather than in a soft interrupt. After the message is processed by TCP, its data is placed in the receive buffer as before; conceptually the message has been pushed to the output QueueSock that wraps this receive buffer in a Queue object. Finally, execution returns from the ThreadEntryPoint back to the socket shim, and continues down into the TCP layer where the data is copied from the receive buffer to user space.
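The essential control flow of this receive path (enqueue on arrival, protocol processing deferred until the reader enters the path) can be modelled in user space as follows. This is a toy simulation under stated assumptions, not kernel code: packets are represented only by their payload size, `reinject` stands in for ProcLinuxReinject plus TCP processing, and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define INPUT_CAP 8   /* assumed capacity of the input queue */

struct lrp_path {
    int    input[INPUT_CAP];  /* payload sizes waiting in the input queue */
    size_t head, len;
    int    sock_bytes;        /* bytes sitting in the socket receive buffer */
    int    processed;         /* packets processed in the reader's context */
};

/* Arrival side (netfilter shim): enqueue only, no protocol work yet. */
int path_push(struct lrp_path *p, int payload)
{
    assert(p != NULL);
    if (p->len == INPUT_CAP) return -1;   /* input queue full */
    p->input[(p->head + p->len) % INPUT_CAP] = payload;
    p->len++;
    return 0;
}

/* Stand-in for ProcLinuxReinject followed by TCP/UDP processing:
 * the payload ends up in the socket receive buffer. */
static void reinject(struct lrp_path *p, int payload)
{
    p->sock_bytes += payload;
    p->processed++;
}

/* Reader side (socket shim, via the entry point): drain the input
 * queue without blocking, then hand back the buffered data. */
int path_recv(struct lrp_path *p)
{
    assert(p != NULL);
    while (p->len > 0) {
        int payload = p->input[p->head];
        p->head = (p->head + 1) % INPUT_CAP;
        p->len--;
        reinject(p, payload);
    }
    int got = p->sock_bytes;
    p->sock_bytes = 0;
    return got;
}
```

In the model, `sock_bytes` stays at zero until `path_recv` runs, which mirrors the point of the design: the cost of protocol processing is charged to the process that consumes the data.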

Socket Shim

The socket shim connects socket system calls to operations on Path objects. It does this by wrapping all socket calls within the kernel, allowing the shim to intercept these calls, take some PathKit-related action, and then continue the call within Linux. For instance, the bind call is used to bind a socket to a local port. An application that calls bind first invokes the socket shim's bind wrapper; the shim checks its own port mapping to make sure the local port is available, enters a demultiplexing key for the socket into a Demux object, and then calls the bind function for that socket type in the kernel.

As implied by the bind example above, SILK maintains its own port usage map. One function of this map is to allow ports to be reserved for a particular user, meaning only that user can bind to that port. For this reason SILK takes over the assignment of free ports from Linux. Normally, the Linux connect and send* operations assign a free local port if the socket is not already bound. To ensure that Linux does not choose a reserved port, the socket shim chooses the port itself and binds it to the socket within its connect or sendmsg wrappers.
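A port usage map with reservations might look like the sketch below. Everything in it is an assumption made for illustration: the structure layout, the function names, the integer user ids, and the range scanned for automatically chosen ports are not taken from SILK. Automatic assignment here skips every reserved port, regardless of who holds the reservation.

```c
#include <assert.h>
#include <stddef.h>

#define NPORTS     65536
#define AUTO_FIRST 32768   /* assumed range for automatically chosen ports */
#define AUTO_LAST  61000

struct port_entry {
    unsigned char in_use;    /* currently bound by some socket */
    unsigned char reserved;  /* set if only `owner` may bind this port */
    int owner;               /* user id holding the reservation */
};

struct portmap { struct port_entry ports[NPORTS]; };

void port_reserve(struct portmap *pm, unsigned port, int user)
{
    assert(pm != NULL && port < NPORTS);
    pm->ports[port].reserved = 1;
    pm->ports[port].owner = user;
}

/* Explicit bind: 0 on success, -1 if the port is already in use or
 * is reserved for a different user. */
int port_bind(struct portmap *pm, unsigned port, int user)
{
    assert(pm != NULL && port < NPORTS);
    struct port_entry *e = &pm->ports[port];
    if (e->in_use) return -1;
    if (e->reserved && e->owner != user) return -1;
    e->in_use = 1;
    return 0;
}

/* Implicit bind, as needed by connect or send* on an unbound socket:
 * returns the chosen port, or -1 if none is free. */
int port_autobind(struct portmap *pm)
{
    assert(pm != NULL);
    for (unsigned port = AUTO_FIRST; port <= AUTO_LAST; port++) {
        struct port_entry *e = &pm->ports[port];
        if (!e->in_use && !e->reserved) {
            e->in_use = 1;
            return (int)port;
        }
    }
    return -1;
}

/* Called when a socket closes; the reservation itself is kept. */
void port_release(struct portmap *pm, unsigned port)
{
    assert(pm != NULL && port < NPORTS);
    pm->ports[port].in_use = 0;
}
```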

The PathKit-specific operations performed in the socket wrappers are described below.

Socket call   PathKit action
socket        Creates a PathLinuxNetStack corresponding to the socket, and changes all of the operations on that socket to point to wrapper functions in the shim.
bind          Checks to see if the local port is available; if so, adds a key to the root Demux object to demultiplex incoming packets on that local port to the appropriate PathLinuxNetStack object.
connect       Binds socket to an available local port if necessary. Adds a key containing the local and remote ports to the Demux object.
send*         If necessary, binds socket to an available local port and adds a key to the Demux object.
recv*         Calls into the ThreadEntryPoint of the PathLinuxNetStack object corresponding to the socket.
close         Destroys the PathLinuxNetStack for that socket, releasing port references and demux keys.
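The keys added by bind and connect presumably correspond to the passive and active demux keys of DemuxSkbuff: a passive key names only the local side of a flow, while an active key also names the remote side. The sketch below shows one plausible lookup over such keys. The struct layouts and names are hypothetical, and the rule that an active key takes precedence over a passive one is an assumption, not something this document states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addressing information extracted from a packet's headers. */
struct flow {
    uint32_t local_ip, remote_ip;
    uint16_t local_port, remote_port;
    uint8_t  proto;
};

/* A registered demux key; passive keys ignore the remote fields. */
struct demux_key {
    struct flow f;
    int   active;   /* nonzero: remote address and port must match too */
    void *path;     /* Path that receives matching packets */
};

static int key_matches(const struct demux_key *k, const struct flow *pkt)
{
    if (k->f.local_ip != pkt->local_ip ||
        k->f.local_port != pkt->local_port ||
        k->f.proto != pkt->proto)
        return 0;
    if (!k->active)
        return 1;
    return k->f.remote_ip == pkt->remote_ip &&
           k->f.remote_port == pkt->remote_port;
}

/* Resolve a packet to a Path. Assumption: the more specific active
 * key wins over a passive key for the same local port. */
void *demux_lookup(const struct demux_key *keys, size_t n,
                   const struct flow *pkt)
{
    assert(keys != NULL && pkt != NULL);
    void *passive = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!key_matches(&keys[i], pkt))
            continue;
        if (keys[i].active)
            return keys[i].path;
        if (passive == NULL)
            passive = keys[i].path;
    }
    return passive;   /* NULL if nothing matched */
}
```

Under this rule a packet on an established connection reaches that connection's Path, while a packet from a new remote peer falls through to the Path registered by bind.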

Raw Socket Shim

The raw socket shim intercepts operations on safe raw sockets, which allow non-privileged users to send and receive IP datagrams. Before a safe raw socket can be used, it must be bound to an unused local TCP or UDP port, like a standard TCP/UDP socket. The raw socket shim implements this addition to the Linux raw socket interface, and performs header checks on outgoing packets to ensure that the bound local port is the source port in the transport header.

The PathKit-specific operations performed in the raw socket wrappers are described below.

Socket call   PathKit action
socket        Creates a PathRawSocket corresponding to the socket, and changes all of the operations on that socket to point to wrapper functions in the shim.
bind          Checks to see if the local port is available; if so, adds a key to the root Demux object to demultiplex incoming packets on that local port to the appropriate PathRawSocket.
connect       Not supported for raw sockets.
send*         Checks the transport header to make sure that the local port is the same one to which the socket was bound. Also verifies the protocol field in the IP header and sets the header's source address to the local IP address (to prevent spoofing).
recv*         No-op.
close         Destroys the PathRawSocket for that socket, releasing port references and demux keys.
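The send* checks can be illustrated on a raw byte buffer. The sketch below relies only on the standard IPv4 layout (header length in the low nibble of byte 0, protocol in byte 9, source address in bytes 12 to 15) and on the fact that TCP and UDP both begin with a 16-bit source port. The function name and its parameters are hypothetical, and the sketch does not recompute the IP header checksum after rewriting the source address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Validate an outgoing datagram from a safe raw socket.
 * Returns 0 and rewrites the source address on success, -1 otherwise. */
int raw_send_check(uint8_t *pkt, size_t len, uint8_t proto,
                   uint16_t bound_port, const uint8_t local_ip[4])
{
    assert(pkt != NULL && local_ip != NULL);

    if (len < 20 || (pkt[0] >> 4) != 4)
        return -1;                        /* not a complete IPv4 header */

    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;
    if (ihl < 20 || len < ihl + 2)
        return -1;                        /* no room for a source port */

    if (pkt[9] != proto)
        return -1;                        /* protocol field mismatch */

    /* The source port is in network byte order at the start of the
     * transport header. */
    uint16_t sport = (uint16_t)((pkt[ihl] << 8) | pkt[ihl + 1]);
    if (sport != bound_port)
        return -1;                        /* not the port that was bound */

    memcpy(pkt + 12, local_ip, 4);        /* force the real source address */
    return 0;
}
```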

Netfilter Shim

The netfilter shim intercepts both incoming and outgoing packets by registering hooks with Linux's netfilter interface. Incoming packets are intercepted at the NF_IP_LOCAL_IN hook, after IP reassembly has occurred. The incoming packet hook pushes packets to the root Demux object; if the push succeeds, the hook returns NF_STOLEN, SILK assumes responsibility for the packet, and Linux ceases processing it. If the push fails because of a full Path input queue, then the input hook returns NF_DROP and Linux drops the packet. Otherwise, the input hook returns NF_ACCEPT, indicating that this packet does not belong to a socket under control of SILK and that Linux can continue processing it.
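The input hook's decision reduces to a three-way mapping from the outcome of the demux push to a verdict. The sketch below models that mapping in user space. The enums are local stand-ins (the `VERDICT_*` names mirror NF_STOLEN, NF_DROP, and NF_ACCEPT rather than using the kernel constants), and the `push_result` type and the statistics counters are hypothetical additions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of pushing a packet to the root Demux. */
enum push_result { PUSH_OK, PUSH_QUEUE_FULL, PUSH_NO_MATCH };

/* Local stand-ins for the netfilter verdicts named in the text. */
enum verdict { VERDICT_STOLEN, VERDICT_DROP, VERDICT_ACCEPT };

struct hook_stats { unsigned stolen, dropped, accepted; };

/* Map the push outcome to the verdict the input hook returns. */
enum verdict input_hook_verdict(enum push_result r, struct hook_stats *st)
{
    assert(st != NULL);
    switch (r) {
    case PUSH_OK:            /* SILK now owns the packet */
        st->stolen++;
        return VERDICT_STOLEN;
    case PUSH_QUEUE_FULL:    /* Path input queue full: drop */
        st->dropped++;
        return VERDICT_DROP;
    default:                 /* not a SILK socket: let Linux continue */
        st->accepted++;
        return VERDICT_ACCEPT;
    }
}
```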

All outgoing packets, including those sent on sockets not under SILK's control, are intercepted at the NF_IP_POST_ROUTING hook. This hook accounts an outgoing packet to the vserver that sent it using the vserver ID field in the struct sock associated with the packet, and pushes copies of the packet to any raw sniffer sockets that are listening on the packet's local port.


Limitations


To Add


Copyright © 2003 Andy Bavier