SBL Programming Model

This chapter describes the programming model supported by the SHRIMP Base Library (SBL).

A SHRIMP machine is a set of UNIX nodes, each with a SHRIMP network interface connected to a high-bandwidth, low-latency network. In addition, each node has a standard Ethernet connection and is identified by a unique Internet address. The SHRIMP Base Library uses these addresses for node identification. The SBL_Hosts call can be used to obtain the node ids of a SHRIMP machine. Processes are identified by standard UNIX process ids. Note that since the same process id can be in use on two different nodes, full process identification requires both the node id and the process id.

User address space is divided into SHRIMP pages, whose size is given by SBL_PageSize() (currently 4096 bytes). The size of a SHRIMP page is a multiple of the virtual memory page size. Each SHRIMP page contains an integer number of SHRIMP words. The size of a SHRIMP word is given by SBL_WordSize() (currently 4 bytes).
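The page and word arithmetic above can be sketched in C as follows. This is only an illustration: the constants are the current values quoted in the text, real code would query SBL_PageSize() and SBL_WordSize() instead, and the helper names words_per_page and bytes_to_words are hypothetical and not part of SBL.

```c
#include <assert.h>
#include <stddef.h>

/* Current values quoted in the text. Real code would query
   SBL_PageSize() and SBL_WordSize() rather than hard-code them. */
enum { PAGE_BYTES = 4096, WORD_BYTES = 4 };

/* Number of whole SHRIMP words in one SHRIMP page (hypothetical helper). */
static size_t words_per_page(size_t page_bytes, size_t word_bytes)
{
    assert(word_bytes > 0 && page_bytes % word_bytes == 0);
    return page_bytes / word_bytes;
}

/* Smallest number of words that can hold nbytes bytes (hypothetical helper). */
static size_t bytes_to_words(size_t nbytes, size_t word_bytes)
{
    assert(word_bytes > 0);
    return (nbytes + word_bytes - 1) / word_bytes;
}
```

With the current sizes, one SHRIMP page holds 4096 / 4 = 1024 SHRIMP words.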

Receive Buffers

Communication supported by SBL is based on receive buffers. A receive buffer is a contiguous region of process memory used for receiving data from other processes running on SHRIMP. Each receive buffer is identified by a user-selected buffer id (an unsigned integer). A receiver process makes a receive buffer available to senders with the SBL_ExportRecvBuf call, which takes as arguments the buffer id, the buffer starting address and the buffer length. The buffer id must be unique among all ids of receive buffers exported by a given process. Receive buffers cannot overlap.
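The two export rules (unique buffer id per process, no overlapping buffers) can be modeled with a small table. The sketch below is a stand-alone model and does not call SBL; export_table, model_export and the fixed table size are assumptions made for illustration, and the way the real SBL_ExportRecvBuf reports a violation is not specified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXPORTS 16 /* arbitrary limit for this model */

typedef struct { unsigned id; uintptr_t start; size_t len; } export_entry;
typedef struct { export_entry e[MAX_EXPORTS]; size_t n; } export_table;

/* True if half-open ranges [a, a+alen) and [b, b+blen) share a byte. */
static bool ranges_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Model of the export rules: returns 0 if accepted, -1 if rejected. */
static int model_export(export_table *t, unsigned id, uintptr_t start, size_t len)
{
    assert(t != NULL);
    if (len == 0 || t->n == MAX_EXPORTS)
        return -1;
    for (size_t i = 0; i < t->n; i++) {
        if (t->e[i].id == id)
            return -1; /* buffer id already exported by this process */
        if (ranges_overlap(t->e[i].start, t->e[i].len, start, len))
            return -1; /* receive buffers cannot overlap */
    }
    t->e[t->n++] = (export_entry){ id, start, len };
    return 0;
}
```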

A sender process has to import a given receive buffer before it can send any data to it. The import operation is implemented by the SBL_ImportRecvBuf call. Import takes as arguments the receive buffer id and the full identification (nodeId, pid) of the process which exported this receive buffer. Import succeeds only after the export call has completed for this receive buffer. There is also an asynchronous version of the import call, SBL_ImportRecvBufReq, which only issues an import request and returns immediately.

For a given process, the ids of exported and imported buffers belong to two disjoint name spaces. As a result, the same buffer id can be used in both export and import calls. However, for a given process, the ids of exported buffers have to be unique. The full identification of an imported buffer is not its buffer id alone, but the triple (buffid, nodeId, pid). As a result, it is possible to import two buffers with the same buffer id if they were exported by two different processes.
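The role of the triple (buffid, nodeId, pid) can be illustrated with a lookup key. This is a sketch only: the field types are assumptions (the chapter does not give SBL's actual node id or process id types), and import_key and find_import are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key for an imported buffer; field types are assumed. */
typedef struct {
    unsigned buf_id;  /* id chosen by the exporter */
    uint32_t node_id; /* stand-in for the exporter's node id */
    int32_t  pid;     /* exporter's UNIX process id */
} import_key;

static bool import_key_equal(import_key a, import_key b)
{
    return a.buf_id == b.buf_id && a.node_id == b.node_id && a.pid == b.pid;
}

/* Returns the index of key k in tab, or -1 if it is not present. */
static int find_import(const import_key *tab, size_t n, import_key k)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (import_key_equal(tab[i], k))
            return (int)i;
    return -1;
}
```

Two entries that share buf_id but differ in node_id or pid are distinct keys, which is why the same buffer id can be imported from two different exporters.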

With one export call, a process can export exactly one receive buffer to a potentially unlimited number of processes. With one import call, a process can import only one receive buffer.

[missing figure]

SBL allocates part of the user virtual address space for the destination space (DestSpace for short). Imported receive buffers are mapped into this space. There is no actual memory backing DestSpace. The import operation returns an address in the local DestSpace which corresponds to the imported buffer. If we import a buffer of size nwords (each word is SBL_WordSize() bytes, currently 4) and the address assigned by the import operation is raddr, then the address range from raddr through raddr + nwords*4 - 1 in DestSpace corresponds to the imported receive buffer. This address range represents a proxy buffer in the sender's address space for this receive buffer. A given address in DestSpace can belong to no more than one proxy. The terms proxy and imported receive buffer denote the same thing: a local representation of a receive buffer which has been imported from a remote node and mapped into the local DestSpace.
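The proxy address range can be checked with a few lines of C. The sketch below assumes the current 4-byte word size; proxy_last_byte and proxy_contains are hypothetical helpers, not SBL calls, and the addresses are treated as plain integers because DestSpace has no backing memory to dereference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { WORD_BYTES = 4 }; /* current SBL_WordSize() value quoted in the text */

/* Address of the last byte of a proxy of nwords words starting at raddr. */
static uintptr_t proxy_last_byte(uintptr_t raddr, size_t nwords)
{
    assert(nwords > 0);
    return raddr + nwords * WORD_BYTES - 1;
}

/* True if addr lies inside the proxy that starts at raddr. */
static bool proxy_contains(uintptr_t raddr, size_t nwords, uintptr_t addr)
{
    return addr >= raddr && addr - raddr < nwords * WORD_BYTES;
}
```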

We say that a successful import establishes an import-export link between a receive buffer and its local proxy.

Protection Guarantees

Receive buffers need not begin or end on a page boundary. Although the SHRIMP library respects the true boundaries of a receive buffer, the SHRIMP hardware enforces protection on a page granularity.

Thus, if a process exports a receive buffer and a malicious process imports it, the importing process will be able, by bypassing the SHRIMP library, to send data to locations that are on the same page as the buffer but are not actually part of the buffer.
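To make the page-granularity exposure concrete, the sketch below counts how many bytes outside a buffer share pages with it. The function exposed_bytes is a hypothetical helper written for this illustration; it only does address arithmetic and does not call SBL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes that lie on the pages covered by [addr, addr+len) but are not
   part of the buffer itself (hypothetical helper). */
static size_t exposed_bytes(uintptr_t addr, size_t len, size_t page)
{
    assert(len > 0 && page > 0);
    uintptr_t first = addr / page * page;                  /* start of first page */
    uintptr_t end = (addr + len + page - 1) / page * page; /* end of last page */
    return (size_t)(end - first) - len;
}
```

For example, a 100-byte buffer that starts 100 bytes into a 4096-byte page leaves 3996 bytes of that page reachable by a malicious importer, while a page-aligned buffer whose length is a multiple of the page size leaves none.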

If you must communicate with a process you don't trust, you can ensure safety by aligning your buffer at the beginning of a page and making its size a multiple of the page size. The size of a SHRIMP page is returned by SBL_PageSize().
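One way to obtain such a buffer is sketched below using the C11 aligned_alloc function. The helper names are hypothetical, and the page size is passed in as a parameter; in a real program it would come from SBL_PageSize(). The sketch only allocates the memory and leaves the export call itself out, since its exact signature is not given in this chapter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round n up to the next multiple of page. */
static size_t round_up_to_page(size_t n, size_t page)
{
    assert(page > 0);
    return (n + page - 1) / page * page;
}

/* Allocate a buffer that starts on a page boundary and whose length is a
   whole number of pages. page_bytes must be a power of two. The rounded
   length is stored in *out_len. Returns NULL on failure; free() releases it. */
static void *alloc_page_aligned(size_t nbytes, size_t page_bytes, size_t *out_len)
{
    assert(out_len != NULL);
    size_t len = round_up_to_page(nbytes, page_bytes);
    if (len == 0)
        len = page_bytes; /* always allocate at least one page */
    void *p = aligned_alloc(page_bytes, len);
    if (p != NULL)
        *out_len = len;
    return p;
}
```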

Communication

There are two modes of communication supported by SHRIMP: deliberate update and automatic update. The two modes differ in how communication is initiated. On the receiving side, there is no difference. In both cases, the data is transferred directly into the receive buffer, without any action by the receiver.

Deliberate update requires an explicit operation, SBL_SendMsg, to initiate communication on the sender's side. SBL_SendMsg can transfer data from any memory in the sender's address space (excluding DestSpace) to a previously imported receive buffer on a remote node. SBL_SendMsg takes as arguments an address in DestSpace, which identifies the receive buffer to be used, a local "standard" (i.e. not in DestSpace) address, which identifies the data to be sent, and nwords, which gives the size of the message in words.

For automatic update, a region of the sender's memory must be mapped (with SBL_Map) to a previously imported receive buffer. After this mapping has been performed, any write to the mapped region of the sender's memory is automatically propagated to the receive buffer on the remote node.

Deliberate update has higher overhead, but provides better bandwidth and is more flexible than automatic update. With deliberate update, one can send from any local memory to any imported receive buffer. With automatic update, one can send data from a given previously mapped address only to one destination (which was determined at the time of mapping).

Notifications

In each transfer mode, we can transfer data only (messages) or data and control (messages with notifications).

A notification is similar to the UNIX signal mechanism. When a message with a notification attached arrives at its destination receive buffer, a user-level notification handler is invoked after the data has been transferred into memory.

Handlers can be associated with receive buffers during the export operation. Each receive buffer can have zero or one handler. If a message with a notification arrives at a receive buffer with no handler attached, the notification has no effect.

SBL_SendMsgWithNotify sends a message with a notification in deliberate update mode. This call takes the same arguments as SBL_SendMsg.

In automatic update mode, there is an argument to SBL_Map which determines whether automatic update messages sent along a given mapping should generate notifications. Please note that this means that for a given mapping there are only two choices: either all updates generate notifications, or no update does.

Each handler has the same function signature (i.e. number and type of arguments). The first argument is the address of the last word of data transferred by the message which generated this notification; the second argument is the value of this word. Since SHRIMP continues to receive incoming messages between the arrival of a message with a notification and the call to the associated user handler, the data of the notifying message can be overwritten by subsequent messages even before the handler is called. However, SBL makes sure that the handler is called with the value of the last word as delivered by the message with the notification.
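The difference between the two handler arguments can be shown with a small stand-alone model. Everything here is an assumption made for illustration: the typedef notify_handler is a guess at a plausible C shape for the two-argument signature described above (the real SBL declaration is not given in this chapter), and model_late_handler merely simulates a later message overwriting the word before the handler runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of a handler: address of the last word, then its value. */
typedef void (*notify_handler)(volatile uint32_t *last_word_addr,
                               uint32_t last_word_value);

static uint32_t seen_value;  /* value passed to the handler */
static uint32_t seen_memory; /* what the handler found in memory */

static void record_handler(volatile uint32_t *addr, uint32_t value)
{
    seen_value = value;
    seen_memory = *addr; /* may already hold data from a later message */
}

/* Model: the notifying message lands, a later message overwrites the same
   word, and only then is the handler called with the captured value. */
static void model_late_handler(volatile uint32_t *buf, size_t nwords,
                               uint32_t notifying_word, uint32_t later_word,
                               notify_handler h)
{
    assert(buf != NULL && nwords > 0 && h != NULL);
    volatile uint32_t *last = &buf[nwords - 1];
    *last = notifying_word;    /* data of the message with notification */
    uint32_t captured = *last; /* value remembered at delivery time */
    *last = later_word;        /* subsequent message overwrites the word */
    h(last, captured);
}
```

A handler that needs the notifying message's last word should therefore rely on the value argument rather than re-reading the address.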

SBL provides two calls, SBL_BlockNotifications() and SBL_UnblockNotifications(), to control the delivery of notifications. Blocking notifications is useful for ensuring the consistency of data structures modified by both user-level handlers and the main thread of execution.

Blocking notifications is global, i.e. it affects all receive buffers of a given process. While notifications are blocked, they are queued by the system. After they are unblocked, they are delivered in order to the proper user-level handlers. Since there is limited space to store queued notifications, they should not be blocked for too long.
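The queueing behavior can be pictured as a bounded FIFO. The model below is not SBL code: the capacity, the names notify_queue, queue_push and queue_pop, and the choice to reject a notification when the queue is full are all assumptions, since the text does not say what the system does when its queue space runs out.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 4 /* arbitrary capacity for this model */

typedef struct {
    unsigned buf_id[QUEUE_CAP]; /* receive buffer each notification targets */
    size_t head;                /* index of the oldest queued entry */
    size_t count;               /* number of queued entries */
} notify_queue;

/* Queue a notification while blocked. Returns 0, or -1 if the queue is
   full (overflow behavior is an assumption of this model). */
static int queue_push(notify_queue *q, unsigned buf_id)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAP)
        return -1;
    q->buf_id[(q->head + q->count) % QUEUE_CAP] = buf_id;
    q->count++;
    return 0;
}

/* Deliver the oldest queued notification. Returns 0, or -1 if empty. */
static int queue_pop(notify_queue *q, unsigned *buf_id)
{
    assert(q != NULL && buf_id != NULL);
    if (q->count == 0)
        return -1;
    *buf_id = q->buf_id[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}
```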

For each call to SBL_BlockNotifications() there should be a matching call to SBL_UnblockNotifications(). Pairs of these calls can be nested. SBL_UnblockNotifications() unblocks notifications only if it is called at the first level of nesting. In that case, the call returns a positive integer; otherwise it returns zero. To make sure that notifications are unblocked unconditionally, one can call SBL_UnblockNotifications() in a loop until it returns a positive integer. If notifications are already unblocked, further calls to SBL_UnblockNotifications() have no effect.
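The nesting rule behaves like a depth counter. The following model uses hypothetical names (notify_state, model_block, model_unblock, model_unblock_all) and does not call SBL. One detail is an assumption: the text does not say what SBL_UnblockNotifications() returns when notifications are already unblocked, so the model returns a positive value in that case so that the unconditional-unblock loop always terminates.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int depth; } notify_state; /* 0 means unblocked */

static void model_block(notify_state *s)
{
    assert(s != NULL);
    s->depth++;
}

/* Returns a positive value if notifications are unblocked on return,
   and 0 if an outer block is still in effect. */
static int model_unblock(notify_state *s)
{
    assert(s != NULL);
    if (s->depth == 0)
        return 1; /* already unblocked: no effect (return value assumed) */
    s->depth--;
    return s->depth == 0 ? 1 : 0;
}

/* Unblock unconditionally by calling until a positive value is returned. */
static void model_unblock_all(notify_state *s)
{
    while (model_unblock(s) == 0) {
        /* keep unwinding nested blocks */
    }
}
```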

While a user-level notification handler is executing, notifications remain blocked. A notification handler should not block or spin-wait. Not all SBL calls can be used from within a notification handler. Both SBL_BlockNotifications() and SBL_UnblockNotifications() can be called from within the handler, provided they are paired. However, any attempt to unblock notifications unconditionally by repeated calls to SBL_UnblockNotifications() will eventually return an error (as notifications must remain blocked within a handler).

Removal of Import-Export Link

There are two calls provided to undo export and import operations.

SBL_UnimportRecvBuf is executed by the importer of a given buffer. This call undoes a previous call to SBL_ImportRecvBuf. The call breaks the connection to the remote receive buffer and deallocates the local proxy.

SBL_UnexportRecvBuf undoes a previous call to SBL_ExportRecvBuf. All existing connections to the buffer are forcibly broken, and the buffer is made unavailable for further connections. In particular, no importer of this buffer can send messages to it after this call completes.

Errors

Most SBL calls return an error status. A negative integer indicates an error, while zero means that no error occurred. Errors can be reported with SBL_Error.

The following errors are possible:

Costs

This section discusses the costs associated with interface calls in terms of execution overhead and memory consumption. These costs are specific to particular implementations of SBL and may change in later implementations.

Exporting a receive buffer pins its pages in physical memory in both SHRIMP-I and SHRIMP-II systems.

Both export and import require a change of protection domain to reach a trusted arbiter (kernel or server) that establishes the connection. A network transaction is required to complete an import-export link. Currently this transaction is initiated by import, but this may change in the future.

Sending a message is a user-level operation in SHRIMP-II, but requires a system call in SHRIMP-I. For short transfers (within one page on both sender and receiver), deliberate update in SHRIMP-II takes a few machine instructions to initiate a DMA. For longer transfers, the number of DMA initiations in the worst case equals the number of pages touched by the transfer on the sender plus the number of pages touched on the receiver, minus one. These initiations are performed at user level in SHRIMP-II and in the kernel in SHRIMP-I.
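The worst-case bound can be checked with a short calculation. The sketch below assumes that a transfer is split wherever either the source or the destination crosses a page boundary; this splitting rule is an inference from the bound stated above, not a description of the actual SHRIMP implementation, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of pages touched by the byte range [addr, addr+len). */
static size_t pages_touched(uintptr_t addr, size_t len, size_t page)
{
    assert(len > 0 && page > 0);
    return (addr + len - 1) / page - addr / page + 1;
}

/* Worst-case bound: sender pages + receiver pages - 1. */
static size_t dma_worst_case(uintptr_t src, uintptr_t dst, size_t len, size_t page)
{
    return pages_touched(src, len, page) + pages_touched(dst, len, page) - 1;
}

/* Count the pieces when the transfer is cut at every page boundary of
   either the source or the destination (assumed splitting rule). */
static size_t dma_chunks(uintptr_t src, uintptr_t dst, size_t len, size_t page)
{
    assert(page > 0);
    size_t chunks = 0;
    while (len > 0) {
        size_t src_room = page - src % page; /* bytes left on source page */
        size_t dst_room = page - dst % page; /* bytes left on destination page */
        size_t n = src_room < dst_room ? src_room : dst_room;
        if (n > len)
            n = len;
        src += n;
        dst += n;
        len -= n;
        chunks++;
    }
    return chunks;
}
```

For an 8192-byte transfer with 4096-byte pages, a source at page offset 100 and a destination at page offset 200 each touch three pages, and the transfer splits into 3 + 3 - 1 = 5 pieces. If both addresses are page-aligned, the same transfer needs only two.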

Automatic update has, in the best case, the latency of a write buffer in SHRIMP-II, and is not available in SHRIMP-I.

Receiving a message without a notification does not require a system call on the receiver. For a message with a notification, an interrupt occurs, which system software later translates into an upcall to the user handler.

Both unimport and unexport are expensive operations which usually require network transactions and a switch of protection domain. In particular, unexport must communicate with all importers of a given buffer and wait for their acknowledgements before it can complete.


Copyright (c) 1995, Princeton University