The Standard ML Basis Library


The CHAR signature

The CHAR signature defines a type char of characters and provides basic operations and predicates on values of that type. Characters are linearly ordered, and an encoding maps them onto a contiguous range of non-negative integers in a way that preserves that ordering.

There are two structures matching the CHAR signature. The Char structure defines a superset of the usual ASCII characters and locale-independent operations on them. For this structure, Char.maxOrd = 255.

The optional WideChar structure defines wide characters, which are represented by a fixed number of 8-bit words (bytes). If the WideChar structure is provided, it is distinct from the Char structure.


Synopsis

signature CHAR
structure Char : CHAR
structure WideChar : CHAR

Interface

eqtype char
eqtype string
val minChar : char
val maxChar : char
val maxOrd : int
val ord : char -> int
val chr : int -> char
val succ : char -> char
val pred : char -> char
val < : (char * char) -> bool
val <= : (char * char) -> bool
val > : (char * char) -> bool
val >= : (char * char) -> bool
val compare : (char * char) -> order
val contains : string -> char -> bool
val notContains : string -> char -> bool
val toLower : char -> char
val toUpper : char -> char
val isAlpha : char -> bool
val isAlphaNum : char -> bool
val isAscii : char -> bool
val isCntrl : char -> bool
val isDigit : char -> bool
val isGraph : char -> bool
val isHexDigit : char -> bool
val isLower : char -> bool
val isPrint : char -> bool
val isSpace : char -> bool
val isPunct : char -> bool
val isUpper : char -> bool
val fromString : String.string -> char option
val scan : (Char.char, 'a) StringCvt.reader -> 'a -> (char * 'a) option
val toString : char -> String.string
val fromCString : String.string -> char option
val toCString : char -> String.string

Description

eqtype char

eqtype string

minChar
is the least character in the ordering. It always equals chr 0.

maxChar
is the greatest character in the ordering.

maxOrd
is the greatest character code; equals ord maxChar.

ord c
chr i
return the integer code of the character c and the character whose code is i, respectively. The function chr raises Chr if i < 0 or i > maxOrd. When chr is restricted to the interval [0,maxOrd], these two functions denote the character encoding function and its inverse.
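
As a minimal sketch of the encoding and its inverse, assuming the Char structure on an ASCII-based implementation (the helper chrOpt is hypothetical, not part of the signature):

```sml
(* Round trip between a character and its code. *)
val code = Char.ord #"A"                        (* 65 on ASCII-based systems *)
val same = Char.chr code = #"A"                 (* true *)

(* maxOrd is the code of the greatest character; minChar has code 0. *)
val top    = Char.ord Char.maxChar = Char.maxOrd    (* true *)
val bottom = Char.minChar = Char.chr 0              (* true *)

(* Hypothetical helper: turn the Chr exception into an option. *)
fun chrOpt i = SOME (Char.chr i) handle Chr => NONE

val outOfRange = chrOpt (Char.maxOrd + 1)       (* NONE *)
val negative   = chrOpt ~1                      (* NONE *)
```

Wrapping chr this way is only one possible design; code that already knows its argument lies in [0,maxOrd] can call chr directly.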

succ c
returns the character immediately following c in the ordering, or raises Chr if c = maxChar. When defined, succ c is equivalent to chr(ord c + 1).

pred c
returns the character immediately preceding c, or raises Chr if c = minChar. When defined, pred c is equivalent to chr(ord c - 1).
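
The following sketch illustrates succ and pred, including the boundary cases described above. It assumes the Char structure; the names are illustrative only.

```sml
val b = Char.succ #"a"                          (* #"b" *)
val a = Char.pred #"b"                          (* #"a" *)

(* When defined, succ agrees with chr (ord c + 1). *)
val agrees = Char.succ #"a" = Char.chr (Char.ord #"a" + 1)   (* true *)

(* At the ends of the ordering both functions raise Chr. *)
val succFails = (ignore (Char.succ Char.maxChar); false) handle Chr => true
val predFails = (ignore (Char.pred Char.minChar); false) handle Chr => true
```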

c < d
c <= d
c > d
c >= d
compare characters in the character ordering. Note that the functions ord and chr preserve orderings.

compare (c, d)
returns LESS, EQUAL, or GREATER, according as c precedes, equals, or follows d in the character ordering.
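
A short sketch of the comparison functions, assuming the Char structure. The function sameOrder is a hypothetical check that comparing characters gives the same answer as comparing their codes, which is what order preservation by ord amounts to.

```sml
val r1 = Char.compare (#"a", #"b")              (* LESS *)
val r2 = Char.compare (#"b", #"b")              (* EQUAL *)
val r3 = Char.compare (#"c", #"b")              (* GREATER *)

val lt = Char.< (#"a", #"b")                    (* true *)
val ge = Char.>= (#"a", #"b")                   (* false *)

(* Hypothetical check: ord preserves the character ordering. *)
fun sameOrder (c, d) =
  Char.compare (c, d) = Int.compare (Char.ord c, Char.ord d)

val preserved = sameOrder (#"x", #"A")          (* true *)
```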

contains s c
returns true if character c occurs in the string s; otherwise false.
Implementation note:

In some implementations, the partial application of contains to s may build a table, which is used by the resulting function to decide whether a given character is in the string or not. Hence it may be expensive to compute val p = contains s, but fast to compute p c for any given character c.



notContains s c
returns true if character c does not occur in the string s; false otherwise. Equivalent to not(contains s c).
Implementation note:

As with contains, notContains may be implemented via table lookup.
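
The implementation notes suggest applying contains (or notContains) to the string once and reusing the resulting predicate. A minimal sketch of that usage pattern, with illustrative names:

```sml
(* Partially apply once, so that any lookup table is built a single time. *)
val isVowel = Char.contains "aeiouAEIOU"

(* Reuse the predicate for every character tested. *)
val vowels = List.filter isVowel (String.explode "Basis Library")
(* vowels = [#"a", #"i", #"i", #"a"] *)

val notVowel = Char.notContains "aeiouAEIOU"
val z = notVowel #"z"                           (* true *)
```

Whether a table is actually built depends on the implementation; the pattern is harmless either way.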



toLower c
toUpper c
return the lowercase (respectively, uppercase) letter corresponding to c if c is a letter; otherwise they return c unchanged.
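
For illustration, a small sketch of case conversion in the Char structure, including a non-letter that is passed through unchanged:

```sml
val u = Char.toUpper #"q"                       (* #"Q" *)
val l = Char.toLower #"Q"                       (* #"q" *)
val d = Char.toUpper #"7"                       (* #"7": not a letter, so unchanged *)

(* Lifting the conversion to whole strings with String.map. *)
val shout = String.map Char.toUpper "char sig"  (* "CHAR SIG" *)
```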

isAlpha c
returns true if c is a letter (lowercase or uppercase).

isAlphaNum c
returns true if c is alphanumeric (a letter or a decimal digit).

isAscii c
returns true if c is a (seven-bit) ASCII character, i.e., 0 <= ord c <= 127. Note that this function is independent of locale.

isCntrl c
returns true if c is a control character. Equivalent to not (isPrint c).

isDigit c
returns true if c is a decimal digit (0-9).

isGraph c
returns true if c is a graphical character, that is, it is printable and not a whitespace character.

isHexDigit c
returns true if c is a hexadecimal digit (0-9, a-f, A-F).

isLower c
returns true if c is a lowercase letter.

isPrint c
returns true if c is a printable character (whitespace or visible), i.e., not a control character.

isSpace c
returns true if c is a whitespace character (space, newline, tab, carriage return, vertical tab, formfeed).

isPunct c
returns true if c is a punctuation character: graphical but not alphanumeric.

isUpper c
returns true if c is an uppercase letter.
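
The classification predicates can be combined as in the sketch below. The function classify and the two consistency checks are hypothetical and restricted to the ASCII range, where the Char predicates are locale-independent.

```sml
(* Hypothetical helper: name the first class a character falls into. *)
fun classify c =
  if Char.isDigit c then "digit"
  else if Char.isUpper c then "upper"
  else if Char.isLower c then "lower"
  else if Char.isSpace c then "space"
  else if Char.isPunct c then "punct"
  else if Char.isCntrl c then "control"
  else "other"

val kinds = map classify [#"7", #"Q", #"q", #" ", #"!", #"\000"]
(* ["digit", "upper", "lower", "space", "punct", "control"] *)

(* The definitions above, checked over the 128 ASCII characters. *)
val ascii = List.tabulate (128, Char.chr)
val graphOk =
  List.all (fn c => Char.isGraph c = (Char.isPrint c andalso not (Char.isSpace c))) ascii
val punctOk =
  List.all (fn c => Char.isPunct c = (Char.isGraph c andalso not (Char.isAlphaNum c))) ascii
```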

fromString s
scan getc strm
scan a character (including space) or an SML escape sequence representing a character from the prefix of a string or a character stream. After a successful conversion, fromString ignores any additional characters in s. If no conversion is possible, e.g., if the first character is non-printable (i.e., not in the ASCII range 0x20-0x7E), NONE is returned.

The allowable escape sequences are:

          \a       Alert (ASCII 0x07)
          \b       Backspace (ASCII 0x08)
          \t       Horizontal tab (ASCII 0x09)
          \n       Linefeed or newline (ASCII 0x0A)
          \v       Vertical tab (ASCII 0x0B)
          \f       Form feed (ASCII 0x0C)
          \r       Carriage return (ASCII 0x0D)
          \\       Backslash
          \"       Double quote
          \^c      A control character whose encoding is C - 64, where C
                   is the encoding of the character c, with C in the range
                   [64,95].
          \ddd     The character whose encoding is the number ddd, three decimal
                   digits denoting an integer in the range [0,255].
          \f...f\  This sequence is ignored, where f...f stands for a sequence
                   of one or more formatting characters.
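
The sketch below shows fromString and scan on a few of the escape sequences listed above. Note that inside an SML string literal a backslash must itself be written as \\, so "\\n" denotes the two characters \ and n. The reader listGetc is a hypothetical character source over a list; expected values follow the description above and assume the Char structure.

```sml
val plain = Char.fromString "abc"      (* SOME #"a": the rest of s is ignored *)
val nl    = Char.fromString "\\n"      (* SOME #"\n" *)
val ctl   = Char.fromString "\\^A"     (* SOME #"\001" *)
val dec   = Char.fromString "\\065"    (* SOME #"A" *)
val bad   = Char.fromString "\n"       (* NONE: a raw newline is not printable *)

(* Hypothetical reader that treats a char list as a character stream. *)
fun listGetc [] = NONE
  | listGetc (c :: cs) = SOME (c, cs)

(* scan returns the character together with the remaining stream. *)
val scanned = Char.scan listGetc (String.explode "\\tx")
(* SOME (#"\t", [#"x"]) *)
```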
          


toString c
returns a printable string representation of the character, using, if necessary, SML escape sequences. Printable characters, except for #"\\" and #"\"", are left unchanged. Backslash #"\\" becomes "\\\\"; double quote #"\"" becomes "\\\"". The common control characters are converted to two-character escape sequences:
          Alert (ASCII 0x07)                    "\\a"
          Backspace (ASCII 0x08)                "\\b"
          Horizontal tab (ASCII 0x09)           "\\t"
          Linefeed or newline (ASCII 0x0A)      "\\n"
          Vertical tab (ASCII 0x0B)             "\\v"
          Form feed (ASCII 0x0C)                "\\f"
          Carriage return (ASCII 0x0D)          "\\r"
          
The remaining characters whose codes are less than 32 are represented by three-character strings in "control character" notation, e.g., #"\000" maps to "\\^@", #"\001" maps to "\\^A", etc. All other characters (i.e., those whose codes are 127 or greater) are mapped to four-character strings of the form "\\ddd", where ddd are the three decimal digits corresponding to a character's code.
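
A brief sketch of toString on one character from each of the cases described above, assuming the Char structure. The comments show the results as SML string literals, so each \\ stands for a single backslash in the result.

```sml
val plain = Char.toString #"a"         (* "a" *)
val back  = Char.toString #"\\"        (* "\\\\" *)
val quote = Char.toString #"\""        (* "\\\"" *)
val nl    = Char.toString #"\n"        (* "\\n" *)
val ctl   = Char.toString #"\001"      (* "\\^A" *)
val del   = Char.toString #"\127"      (* "\\127" *)
val high  = size (Char.toString #"\200")   (* 4: a backslash and three digits *)
```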

fromCString s
scans the string s as a C source program string, converting escape sequences into the appropriate characters (cf. Section 6.1.3.4 of the ISO C standard ISO/IEC 9899:1990).

toCString c
returns a printable string corresponding to c, with non-printable characters replaced by C escape sequences. Specifically, printable characters, except for #"\\", #"\"", #"?" and #"'" are left unchanged. Backslash #"\\" becomes "\\\\"; double quote #"\"" becomes "\\\"", question mark #"?" becomes "\\?", single quote #"'" becomes "\\'". The common control characters are converted to two-character escape sequences:
          Alert (ASCII 0x07)                    "\\a"
          Backspace (ASCII 0x08)                "\\b"
          Horizontal tab (ASCII 0x09)           "\\t"
          Linefeed or newline (ASCII 0x0A)      "\\n"
          Vertical tab (ASCII 0x0B)             "\\v"
          Form feed (ASCII 0x0C)                "\\f"
          Carriage return (ASCII 0x0D)          "\\r"
          
All other characters are represented by one to three octal digits, corresponding to a character's code, preceded by a backslash.
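
The C-style conversions can be sketched in the same way. The example assumes the Char structure and deliberately avoids asserting the exact octal form produced for non-printable characters, checking a round trip instead.

```sml
val q    = Char.toCString #"?"         (* "\\?" *)
val sq   = Char.toCString #"'"         (* "\\'" *)
val nl   = Char.toCString #"\n"        (* "\\n" *)

val back = Char.fromCString "\\n"      (* SOME #"\n" *)
val oct  = Char.fromCString "\\101"    (* SOME #"A": octal 101 is 65 *)

(* A non-printable character survives conversion to C form and back. *)
val roundTrip =
  Char.fromCString (Char.toCString #"\001") = SOME #"\001"
```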


Discussion

In WideChar, the functions toLower, toUpper, isAlpha, ..., isUpper are locale-dependent. In Char, these functions are locale-independent, with the following semantics:

Add table for ISO Latin-1 characters and predicates.

See Also

Locale, MultiByte, STRING


Last Modified May 15, 1996
Copyright © 1996 AT&T