The Standard ML Basis Library


Introduction

This document is a proposal for a Standard ML Basis Library. This library provides a rich initial basis for Standard ML, which complements the language described by the Definition of Standard ML. The goals of the Basis Library are to:

In this chapter, we discuss the principles used in the design of the Library, and present a high-level view of the library structure.

Design principles

By design, the Basis Library is meant to provide a fairly rich collection of general-purpose modules that can serve as the basis for applications programming or for more domain-specific libraries. One criterion for inclusion in the Basis Library was that a type or value requires compiler or run-time system support. In addition, the Library defines a standard minimal environment that anyone using SML interactively can expect to find. The Library also attempts to provide similar functions in similar contexts. Thus, the traditional app function for lists, which applies a function to each member of a list, has also been provided for arrays and vectors.
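
As an illustration of "similar functions in similar contexts", the following minimal sketch applies the same side-effecting function through List.app, Array.app and Vector.app. The names total, add and result are introduced here for illustration only and are not part of the Library.

```sml
(* A counter updated purely by side effect. *)
val total = ref 0
fun add x = total := !total + x

(* The same traversal, written against three container structures. *)
val () = List.app   add [1, 2, 3]
val () = Array.app  add (Array.fromList [10, 20])
val () = Vector.app add (Vector.fromList [100])

(* total now holds the sum of every element visited. *)
val result = !total
```

Only the structure name changes between the three calls; the argument order and the type shape stay the same.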

An opposite design force has been the desire to keep the basis small. In general, a function has been included only if it has clear or proven utility, with additional emphasis on those that are complicated to implement, require compiler support, or are more concise or efficient than an equivalent combination of other functions. Some exceptions were made for historical reasons.

The Basis Library is contained in a set of structures. Almost every type, exception constructor and value belongs to some structure. Although some identifiers are also bound in the initial top-level environment, we have attempted to keep the number of top-level identifiers small. Infix declarations and overloading are specified for the top-level environment.

We have divided the modules into required and optional modules. Any conforming implementation of the SML Standard Library will provide implementations of all of the required modules. In addition, if an implementation provides any of the services covered by the optional modules, then those services shall conform to the given interfaces.

Many of the structures are variations on some generic module (e.g., single and double-precision floating-point numbers). The following table gives a list of the required generic signatures.


Required generic signatures

Signature       Description
CHAR            Generic character interface
INTEGER         Generic integer interface
MATH            Generic math library interface
IMPERATIVE_IO   Imperative I/O interface
MONO_ARRAY      Mutable monomorphic arrays
MONO_VECTOR     Immutable monomorphic vectors
PRIM_IO         System-call operations for IO
REAL            Generic real number interface
STREAM_IO       Stream I/O interface
STRING          Generic string interface
SUBSTRING       Generic substring interface
TEXT_IO         Text I/O interface
TEXT_STREAM_IO  Text stream I/O interface
WORD            Generic word (i.e., unsigned modular integer) interface
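
Because several structures match the same generic signature, code can be written once against the signature and reused. The following is a minimal sketch, assuming a hypothetical functor SumFn (not part of the Library) that is parameterized over INTEGER and then applied to both Int and LargeInt.

```sml
(* Hypothetical functor: works with any structure matching INTEGER. *)
functor SumFn (I : INTEGER) =
struct
  (* Adds up a list using only operations named in the INTEGER signature. *)
  fun sum (xs : I.int list) : I.int =
    List.foldl (fn (x, acc) => I.+ (x, acc)) (I.fromInt 0) xs
end

structure IntSum   = SumFn (Int)
structure LargeSum = SumFn (LargeInt)

val small = IntSum.sum [1, 2, 3]
val large = LargeSum.sum (List.map LargeInt.fromInt [1, 2, 3])
```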

Non-generic signatures typically define the interface of a unique structure. The following table gives a list of the required non-generic signatures.

Required non-generic signatures
BIN_IO
BOOL
BYTE
COMMAND_LINE
DATE
GENERAL
IEEE_REAL
IO
LIST
LIST_PAIR
OPTION
OS
OS_FILE_SYS
OS_IO
OS_PATH
OS_PROCESS
STRING_CVT
TIME
TIMER

The required structures (and their signatures) are listed in the following table.

Required structures

Module       Signature     Status  Description
Array        ARRAY         OM      Mutable polymorphic arrays
BinIO        BIN_IO                Binary input/output types and operations
BinPrimIO    PRIM_IO       M       Low-level binary IO
Bool         BOOL          O       Boolean type and values
Byte         BYTE          M       Conversions between Word8 and Char
Char         CHAR          OM      Ordinary characters
CharArray    MONO_ARRAY    M       Mutable arrays of characters
CharVector   MONO_VECTOR   M       Immutable arrays of characters
CommandLine  COMMAND_LINE  M       Program name and arguments
Date         DATE          M       Calendar operations
General      GENERAL       OM      General-purpose types, exceptions and values
IEEEReal     IEEE_REAL     M       Floating-point classes and hardware control
Int          INTEGER       OM      Default integer structure
IO           IO                    Basic I/O types and exceptions
LargeInt     INTEGER       M       Structure providing largest integer
LargeReal    REAL          M       Largest floating-point representation
LargeWord    WORD          M       Structure providing largest word
List         LIST          O       List type and utility functions
ListPair     LIST_PAIR             List of pairs and utility functions
Math         MATH                  Default math structure
Option       OPTION        O       Optional values and partial functions
OS           OS            M       Basic operating system services
OS.FileSys   OS_FILE_SYS   M       File status and directory operations
OS.IO        OS_IO         M       Support for polling I/O devices
OS.Path      OS_PATH               Pathname operations
OS.Process   OS_PROCESS    M       Simple process operations
Position     INTEGER       M       File system positions
Real         REAL          OM      Default real structure
String       STRING        OM      Ordinary strings
StringCvt    STRING_CVT            Conversions between strings and various types
Substring    SUBSTRING     O       Substrings
TextIO       TEXT_IO       O       Text input/output types and operations
TextPrimIO   PRIM_IO       M       Low-level text IO
Time         TIME          M       Representation of time values
Timer        TIMER         M       Timing operations
Vector       VECTOR        OM      Immutable polymorphic vectors
Word         WORD          OM      Default word structure
Word8        WORD          M       8-bit words
Word8Array   MONO_ARRAY    M       Arrays of 8-bit words
Word8Vector  MONO_VECTOR   M       Vectors of 8-bit words

The key to the status column is:

O    Are any of the structure's members available in the initial top-level environment?
M    Does the structure require special compiler or run-time system support?

The following table gives the set of optional structures.

Optional structures

Module           Signature       Description
BoolArray        MONO_ARRAY      Mutable arrays of booleans
BoolVector       MONO_VECTOR     Immutable arrays of booleans
FixedInt         INTEGER         Largest fixed precision integers
ImperativeIO     IMPERATIVE_IO   Functor to convert stream I/O into imperative IO
IntInf           INT_INF         Arbitrary-precision integers
IntN             INTEGER         N-bit, fixed precision integers
IntArray         MONO_ARRAY      Mutable arrays of default integer
IntNArray        MONO_ARRAY      Mutable arrays of N-bit integers
IntVector        MONO_VECTOR     Immutable vectors of default integers
IntNVector       MONO_VECTOR     Immutable vectors of N-bit integers
Locale           LOCALE          Support for locale-dependent applications
MultiByte        MULTIBYTE       Support for multibyte characters
PackRealNBig     PACK_REAL       Big-endian packing for N-bit floats
PackRealNLittle  PACK_REAL       Little-endian packing for N-bit floats
PackRealBig      PACK_REAL       Big-endian packing for default floats
PackRealLittle   PACK_REAL       Little-endian packing for default floats
PackNBig         PACK_WORD       Big-endian packing for N-byte words
PackNLittle      PACK_WORD       Little-endian packing for N-byte words
Posix            POSIX           Root POSIX structure
Posix.Error      POSIX_ERROR     POSIX error values
Posix.FileSys    POSIX_FILE_SYS  POSIX file system operations
-                POSIX_FLAGS     Generic POSIX flag interface
Posix.IO         POSIX_IO        POSIX I/O operations
Posix.ProcEnv    POSIX_PROC_ENV  POSIX process environment operations
Posix.Process    POSIX_PROCESS   POSIX process operations
Posix.Signal     POSIX_SIGNAL    POSIX signal types and values
Posix.SysDB      POSIX_SYS_DB    POSIX system database types and values
Posix.TTY        POSIX_TTY       Control of POSIX TTY drivers
PrimIO           PRIM_IO         Functor to build PRIM_IO structure
RealArray        MONO_ARRAY      Mutable arrays for default reals
RealVector       MONO_VECTOR     Immutable vectors for default reals
RealN            REAL            N-bit floating-point numbers
RealNArray       MONO_ARRAY      Mutable arrays of N-bit floating-point numbers
RealNVector      MONO_VECTOR     Immutable vectors of N-bit floating-point numbers
StreamIO         STREAM_IO       Functor to convert primitive I/O into stream I/O
SysWord          WORD            Words sufficient for OS operations
WideChar         CHAR            Support for wide characters
WideString       STRING          Support for wide strings
WideSubstring    SUBSTRING       Support for wide substrings
WideTextPrimIO   PRIM_IO         Low-level wide char IO
WideTextIO       TEXT_IO         Text I/O on wide characters
WordN            WORD            N-bit words

For completeness, we list the optional signatures in the following table.

Optional signatures
INT_INF
LOCALE
MULTIBYTE
PACK_REAL
PACK_WORD
POSIX
POSIX_ERROR
POSIX_FILE_SYS
POSIX_FLAGS
POSIX_IO
POSIX_PROC_ENV
POSIX_PROCESS
POSIX_SIGNAL
POSIX_SYS_DB
POSIX_TTY

We specify certain relationships among the modules.

Miscellaneous structures

To permit users to compile programs written under the old basis, we require that each implementation provide the structure SML90. This structure contains the top-level bindings specified in the Definition, along with one or more substructures that define the top-level bindings of various implementations. For example, a user might write:

local
  open SML90 SML90.NJ
in
  (* user's program *)
end
to compile a user's program under the old SML/NJ basis.

We expect that at some future point, the SML90 module will be deemed obsolete, and will be dropped from the standard basis.

Conforming implementations must provide modules that exactly match the signatures defined in the SML Standard Library. For example, the Int structure provided by an implementation should not match a superset of the INTEGER signature. Additional structures should be provided for extensions to the basis, other libraries, or access to implementation-specific information.

Orthographic conventions

We use a new set of spelling and capitalization conventions. Some of these conventions, e.g., the capitalization of value constructors, seem to be widely accepted in the user community. Other decisions were based less on dominant style or compelling reason than on compromise and the need for consistency and some sense of good taste. We hope users will accept the conventions and concentrate on the issues of semantics.

The conventions we use are:

The above conventions concerning variable and constructor names, if followed consistently, can be used by a compiler to aid in detecting the subtle error in which a constructor is misspelled in a pattern-match and is thus treated as a variable binding. Some implementations may provide the option of enforcing these conventions by generating warning messages.

Naming

Similar values should have similar names, with similar type shapes, following the conventions outlined above. For example, the function Array.app has the type:

    val app : ('a -> unit) -> 'a array -> unit
which has the same shape as List.app. Names should be meaningful, but concise. However, we have broken this rule in certain instances where previous usage seemed compelling. For example, we have kept the name app rather than adopt apply. More dramatically, we have purposely kept most of the traditional Unix names in the optional Posix modules, to capitalize on the familiarity of these names and the available documentation.

Comparisons

Many structures define a type ty along with a comparison function

    val compare : ty * ty -> order
plus the expected relational operators >, >=, < and <=. In all cases, the standard relationships hold between these functions. For example, we have x > y = true if and only if compare(x, y) = GREATER. If, in addition, ty is an equality type, we assume that the operators = and <> satisfy the usual relationships with compare and the relational operators. For example, if x = y, then compare(x,y) = EQUAL. Note that these assumptions are not quite true for real values; see the REAL signature for more details.

Types that have a standard or obvious linear order come with the full set of relational operators plus a compare function. Certain abstract types, e.g., OS.FileSys.file_id, provide a compare function for use with, for example, ordered binary trees.
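
The relationship between compare and the relational operators can be sketched as follows. The helpers agrees, insert and sort are hypothetical names introduced for this example; only Int.compare, String.compare, Char.compare and List.all come from the Library.

```sml
(* compare returns a value of the top-level datatype order:
   LESS | EQUAL | GREATER. *)
val c1 = Int.compare (3, 5)
val c2 = String.compare ("pear", "apple")
val c3 = Char.compare (#"a", #"a")

(* Hypothetical helper: checks that > agrees with compare on a pair of ints. *)
fun agrees (x, y) = (x > y) = (Int.compare (x, y) = GREATER)

val consistent = List.all agrees [(1, 2), (2, 1), (7, 7)]

(* Because every compare has the same shape, it plugs into generic code,
   here a small insertion sort parameterized by the comparison. *)
fun insert cmp (x, []) = [x]
  | insert cmp (x, y :: ys) =
      if cmp (x, y) = GREATER then y :: insert cmp (x, ys)
      else x :: y :: ys

fun sort cmp xs = List.foldl (insert cmp) [] xs

val sorted = sort String.compare ["pear", "apple", "fig"]
```

The same sort could be applied to any type that supplies a compare function, including abstract types that offer no relational operators.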

Conversions

Most structures defining a type provide conversion functions to and from other types. When unambiguous, we use the naming convention toT and fromT, where T is some version of the name of the other type. For example, in WORD, we have

    val fromInt : Int.int -> word
    val toInt : word -> Int.int
If this naming is ambiguous (e.g., a structure defines multiple types that have conversions from integers), we use the convention TFromTT and TToTT. For example, in POSIX_PROC_ENV, we have

    val uidToWord : uid -> SysWord.word
    val gidToWord : gid -> SysWord.word
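
A short sketch of the toT/fromT convention in use, restricted to required structures. The value names are chosen for illustration; Byte.byteToChar is an instance of the longer form used when a plain toT name would not identify the types involved.

```sml
(* fromInt / toInt round trip on the default word type. *)
val w  = Word.fromInt 255        (* Int.int -> word *)
val i  = Word.toInt w            (* word -> Int.int *)

(* The same names appear in every WORD structure. *)
val w8 = Word8.fromInt 65        (* Int.int -> Word8.word *)

(* Byte converts between Word8.word and char. *)
val c  = Byte.byteToChar w8
val n  = Char.ord c
```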

There should be conversions to and from strings for most types. Following the convention above, these functions are typically called toString and fromString. Usually, modules provide additional string conversion functions that allow more control over format and operate on an abstract character stream. These functions are called fmt and scan. The input accepted by fromString and scan consists of printable ASCII characters. The output generated by toString and fmt consists of printable ASCII characters.

We adopt the convention that conversions from strings should be forgiving, allowing initial white space and multiple formats, and ignoring additional terminating characters. On the other hand, we have tried to specify conversions to strings precisely. In addition, for basic types, scanning functions should accept legal SML literals, and formatting functions should, whenever possible, produce the value part of a valid SML literal but, for flexibility, may omit certain annotations. For example, String.toString produces a valid SML string constant, but without the enclosing quotes, and Word.toString produces a word constant without the "0wx" prefix.
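
The following sketch exercises the string conversion conventions described above on a few basic types. The value names a through g are illustrative only.

```sml
(* fromString is forgiving: leading white space is skipped and
   trailing characters after the number are ignored. *)
val a = Int.fromString "  42abc"
val b = Int.fromString "abc"

(* toString is precise; fmt gives control over the radix. *)
val c = Int.toString ~7
val d = Int.fmt StringCvt.HEX 255

(* Output omits annotations: no "0wx" prefix, no enclosing quotes.
   String.toString renders the tab as the escape sequence \t. *)
val e = Word.toString 0wxFF
val f = String.toString "a\tb"

(* scan reads from an abstract character stream;
   StringCvt.scanString adapts a scanner to read from a string. *)
val g = StringCvt.scanString (Int.scan StringCvt.HEX) "ff"
```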

Characters and strings

The old basis did not provide a character type, only a string type. To manipulate characters, programmers used integers corresponding to the character's code. This was unsatisfactory for several reasons:

Alternatively, programmers used strings of length one to represent characters, which is less efficient and cannot be enforced by the type system. We have replaced the single string type provided by the Definition with the types string and char, where the string type is a vector of characters. In addition, we define the optional types WideString.string and WideChar.char, in which the former is again a vector of the latter, for handling character sets more extensive than Latin-1.
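
A brief sketch of the separate char and string types in use. The value names are illustrative; the last line relies on string being the same type as CharVector.vector, as described above.

```sml
(* Indexing a string yields a char, not a one-element string. *)
val ch    = String.sub ("basis", 0)
val code  = Char.ord ch

(* Character-level functions lift to strings. *)
val upper = String.map Char.toUpper "basis"
val chars = String.explode "sml"
val back  = String.implode (List.rev chars)

(* A string is a vector of characters, so CharVector operations apply. *)
val len   = CharVector.length "basis"
```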

Miscellany

Functional arguments that are evaluated solely for their side-effects should be required to have a return type of unit. For example, the list application function should have the type:

   val app : ('a -> unit) -> 'a list -> unit
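
One consequence of this rule, sketched below with hypothetical names seen and record: a function that returns a value other than unit cannot be passed to app directly, and the caller must discard the result explicitly, for example with the top-level function ignore.

```sml
val seen = ref ([] : int list)

(* record returns the updated list, not unit. *)
fun record x = (seen := x :: !seen; !seen)

(* List.app record [1, 2, 3] would be rejected by the type checker,
   since record has type int -> int list rather than int -> unit. *)
val () = List.app (ignore o record) [1, 2, 3]

val result = !seen
```

Requiring unit makes an accidentally discarded result visible at the call site.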

Last Modified January 21, 1997
Copyright © 1996 AT&T