7.1 Formal Languages
This section under major construction.
The theory that we consider in this chapter is based on the following simple abstract notions. An alphabet is a set of symbols. A string over an alphabet is a sequence of symbols taken from that alphabet. A formal language over an alphabet is a set of strings over that alphabet. When addressing theoretical questions, we often use the binary alphabet that consists of the two symbols 0 and 1. In practice, we use whatever alphabet is suitable to the task at hand: the Roman alphabet for processing text; decimal digits for processing numbers, and the characters acgt for processing genetic data.
Alphabet Symbols Symbol
NameString
Namebinary 01 bit bitstring Roman abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZletter word decimal 0123456789 digit integer special ~!@#$%^&*()_-+={[}]|\:;"'<,>.?/ keyboard Roman + decimal + special keystroke typescript DNA actg gene genome monomer ACDEFGHIKLMNPQRSTVWY amino acid protein ASCII see Appendix byte String UNICODE see Appendix char String
The simplest way to specify a formal language is to enumerate its strings.
For example, we can enumerate all bitstrings of length 3:
000 001 010 011 100 101 110 111
One complication is that languages can be large or infinite. For the moment, we will use informal English descriptions to describe (possibly) infinite sets of strings.
Language true false Tail(2)
next-to-last bit is 000
11101
111111010101
0
000010
11111111111
Equal
equal number of 0s and 1s10
110010
000011111100
0
11100
01010101010101010
Why not just work with precise, complete definitions?
This is the crux of the matter! Our goal is to
work with precise, complete definitions. This is
known as the specification problem for formal
languages. How is formal language related to computation?
Each formal language leads to the precise
statement of several specific computational problems.
Perhaps the most basic is the following: given a language
L and a bitstring x, determine whether x is in L or not.
This task is called the recognition problem
for formal languages.
With suitable interpretations of the meaning of the bitstrings,
we can define formal languages that relate to nontrivial
computational problems. For example, we might define
Prime to be the set of all bitstrings that are the binary
representation of a prime number. Solving the recognition
problem for this language involves understanding some mathematics,
and probably using a computer.
Formal languages are very important tools that are widely used
in practical applications. Natural languages like English,
programming languages like Java, and genetic codes are all examples
of formal languages. Specification, recognition, and related problems
for these sorts of languages are of critical importance in
modern computing.
