COS441 Lecture 1

Princeton University
Computer Science Dept.

Computer Science 441
Programming Languages
Fall 1998

Lecture 1

Do programming languages matter?

Absolutely!!!

Otherwise, why would the Java folks have rejected C++ for their programming language?

Explore in this course all aspects of programming languages, including their features, type systems, programming styles supported, and implementations.

Programming languages are the main interface between computers and programmers, allowing us to express or understand algorithms to be executed by the computer.

Abstractions

By providing abstractions (or mechanisms to create abstractions) they influence the way we think about problems.

Data abstractions:

basic data types: integers, reals, booleans, characters, pointers
structured: arrays, records
unit: E.g., support for ADT's, modules, packages, classes

Control abstractions:

basic: assignment, goto, sequencing
structured: if..then..else.., loops, procedures & functions
unit: separately compiled units, modules, packages, concurrent tasks

Syntax and Semantics

All constructs need clearly specified syntax and semantics:

syntax: what is a legal expression?
semantics: what is the result when it is executed?

Syntax always given formally (as well as informally).

Semantics usually given informally (English), but more and more formally.

Both necessary in order to ensure programs give predictable results.

How can programming languages support the software development process?

Phases in the development process are:

Requirements
Specification
Implementation
Certification or Validation (includes testing and verification)
Maintenance

Need to evaluate languages with respect to overall picture. Not good if just supports one aspect. Important to evaluate language based on what its goals are.

BASIC - quick development of interactive programs
Pascal - instruction
C - low-level systems programming

Languages which are good for quick hacking together of programs may not be suitable for large-scale software development.

Better choices for large-scale software include:

Ada, Modula-2, Clu, object-oriented languages like C++, Eiffel, etc.

Most languages today designed to support specific software development methodology or philosophy. E.g., top-down design, object-based design, encapsulation, information-hiding.

Languages influence the way people think about programming process.

Faculty always complain about BASIC hackers.

Minimum requirements for programming languages:

Universal - if can solve on computer then can program it in the language
Natural (expressive) - easy to express ideas.
Implementable
Efficient (for writing, compilation, or execution?)
Reliable - writeable (high-level), readable, able to deal with exceptional behavior
Maintainable - Easy to make changes, decisions compartmentalized, clean interfaces.

Alternative Programming Language Paradigms

Important to be aware of different programming language paradigms; they allow one to think about problems in different ways.

Partially driven by new architectures (or at least not constrained by old). Imperative is closest to machine architecture.

Other paradigms:

functional - Popular in AI (LISP & Scheme) and theoretical prog. lang. community.
Closest to theoretical models and mathematics.
logic - Originally of interest in AI (though fading).
Also important as database query languages.
Now shifting to "constraint" languages.
Declarative, control implicit in search.
object-oriented - Latest fad. All major systems must be "object-oriented" to be up-to-date. Important, but current languages have some flaws.
- Based on discrete simulation, objects responsible for knowing what to do.
- Package state with appropriate operations.

History of Programming Languages

Machine language -> Assembly language -> High-level languages

Programmers: Single highly trained programmer -> large teams which must cooperate

Early

FORTRAN - Backus @ IBM, 1957, 18 person-years to write first compiler
- Goals & contributions: numerical problems, very efficient compiler, separate compilation
- Many revisions FORTRAN II, IV, 66, 77, 90
ALGOL 60 - Committee, 1960
- Goals & contributions: numeric, block structure, recursion, elegant, very influential
- Ancestor of ALGOL W, ALGOL 68, Pascal, Modula, Ada, etc.
COBOL - Committee, 1960
- Goals & contributions: business data processing, records
- Several revisions

Early Schisms:

LISP - McCarthy @ MIT, 1958-60 (LISP 1.5 manual 1962), core is functional
- Goals & contributions: List processing and symbolic manipulation, AI
- Scheme & Common LISP are modern descendants
APL - Iverson @ Harvard, IBM, 1960 (for notation)
SNOBOL 4 - Griswold @ Bell Labs, 1966 - string processing via pattern matching
- modern successor is ICON

Consolidation

PL/I - IBM committee, 1967
- combine FORTRAN, COBOL, ALGOL 60 - but not integrated, now considered a failure.
- Multipurpose - include ptrs, records, exceptions, etc.

Next Leap Forward

ALGOL 68 - Committee, "orthogonal" elements, elegant but very hard to understand
Simula 67 - precursor of object-oriented languages - designed for simulation, coroutines
Pascal - Wirth, ETH, 1971 - designed only as a teaching language
- Support structured programming, spare & elegant
- Successful beyond expectations

Abstract Data Types

Clu, Mesa, Modula-2, Ada
- Supports modules for encapsulation and information hiding

Other paradigms

Object-oriented: Smalltalk (1972), Eiffel, C++, Object Pascal, Java
Functional: Scheme (1975), ML (1978), Miranda (1986) & Haskell (1991)
Logic: PROLOG (1972), newer constraint programming languages.

4th generation languages

Important in business applications
Specialized packages of powerful commands w/simplified "user-friendly" syntax.

Topics to be covered in course:

Programming language features and organization, including modules, classes, exception handlers, generic types
Functional, object-oriented, and (perhaps) logic, as well as imperative paradigms
Programming language support for reliable programming:

abstraction, encapsulation, information hiding, polymorphism, higher-order operators.
Language support for concurrency(?)
Formal definitions of programming languages
Compilers and interpreters
Run-time behavior of programming languages, including impact of binding time.

Three main concerns:

Programming language features for reliable programming
Run-time behavior of programming languages
Formal semantics and interpreters for implementing languages.

Start out by learning ML so can explore some new ideas & rapidly program interesting applications.

Write our own interpreters for simple languages so can see impact of various design decisions.

Functional Languages

Problems with imperative languages

In his 1978 Turing award lecture (granted in recognition of his role in the development of FORTRAN, ALGOL 60, and BNF-grammars), John Backus attacked the pernicious influence of imperative programming languages and their dependence on the von Neumann architecture.

What is problem with imperative languages?

Designed around architectures available in 1950's.

Components:

CPU with accumulator and registers.
Memory
Tube connecting CPU with Memory which transmits one word at a time.

To execute an instruction, go through fetch, decode, execute cycle.

Ex. To execute statement stored in location 97 (ADD 162):

Fetch instruction from memory location 97 (to CPU)
Decode into operation (ADD) and address (162)
Fetch contents of address.
Add contents to accumulator and leave result in accumulator.

Simple statement like A:=B+C results in several accesses to memory through "Von Neumann bottleneck."
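
The cycle above can be sketched as a toy accumulator machine that counts the words moved between CPU and memory. This is only an illustration under stated assumptions: the opcodes (LOAD, ADD, STORE), the addresses chosen for A, B, and C, and the function name run are all hypothetical, not a real instruction set.

```python
# Toy single-accumulator machine; the opcodes and memory layout are
# illustrative assumptions, not a real instruction set.

def run(memory, start, count):
    """Execute `count` instructions starting at address `start`.

    Returns the number of words moved between CPU and memory.
    """
    acc = 0          # accumulator register inside the CPU
    transfers = 0    # words sent through the CPU-memory connection
    pc = start       # program counter
    for _ in range(count):
        op, addr = memory[pc]      # fetch the instruction word
        transfers += 1
        pc += 1
        if op == "LOAD":           # decode, then execute
            acc = memory[addr]
            transfers += 1
        elif op == "ADD":
            acc += memory[addr]
            transfers += 1
        elif op == "STORE":
            memory[addr] = acc
            transfers += 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return transfers

# A := B + C, with A, B, C assumed to live at addresses 160, 161, 162.
memory = {
    97: ("LOAD", 161),
    98: ("ADD", 162),
    99: ("STORE", 160),
    160: 0,   # A
    161: 5,   # B
    162: 7,   # C
}
word_transfers = run(memory, start=97, count=3)
print(memory[160], word_transfers)
```

Under these assumptions the one-line assignment costs six word transfers: three instruction fetches and three data accesses.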

Imperative program can be seen as control statements guiding execution of a series of assignment statements (accesses and stores to memory).

Variable in programming language refers to location whose contents may vary with time.

Hard to reason about variables whose values are always changing, even within same procedure or function.

Math notation not like that. Static. If want to add time, add new parameter. Gives static reasoning about dynamic processes.

Important notion called referential transparency. Can replace an expression anywhere that it occurs by its value.

Very important for parallelism, since compute once and then reuse.

Not true of imperative languages. Can't compute x+1 once and replace all occurrences by its value.

Order of execution in imperative programs very important - inhibits parallel execution.

We will see several advantages of functional programming.

Referentially transparent - easier to reason about, easier to make parallel. Once expression evaluated, it can be reused.
Order of execution need not be specified. Expressions can automatically be executed when needed, even in parallel.
Higher-level, resulting in shorter, more understandable programs.
Can build new higher-order functions which allow you to put together old programs in more flexible ways.
"Lazy evaluation" can allow one to compute with infinite objects.

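The last point in the list above can be approximated in Python, with the caveat that Python is an eager language and generators only imitate the lazy evaluation found in Miranda or Haskell. The names squares and take are made up for this sketch.

```python
from itertools import count, islice

def squares():
    """Conceptually infinite stream 0, 1, 4, 9, ... produced on demand."""
    for n in count():
        yield n * n

def take(n, stream):
    """Force only the first n elements of a (possibly infinite) stream."""
    return list(islice(stream, n))

first_five = take(5, squares())
print(first_five)
```

Only the five requested elements are ever computed; the rest of the "infinite" stream is never built.
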
Other important reasons to study functional languages:

Useful in AI research
Useful in developing executable specifications and prototype implementations.
Closely related to CS theory (e.g., recursive functions, denotational semantics).

Commands vs. Expressions

Characteristics of commands and imperative languages in general:

Support for variables - represent memory locations for storing updatable values.
Assignment operation - progress in computation depends on changes in values stored in variables.
Repetition - flow of control guided by conditional and looping statement controlling order in which assignment statements are executed.

Imperative languages are organized around notion of statements.

Meaning of a statement is operation which, based on current contents of memory, and explicit values supplied to it, modifies the current contents of memory.

How are results of one command communicated to the next? Via changes to values in memory.

Problems

Too low level and architecture dependent.

Characteristics of expressions

Expressions return a value, depending on the state of the computation.

Examples:

Literals: 3, true, "hello", 42.56
Aggregates: arrays, records, sets, lists, etc. E.g. {1,3,5}
Function calls: F(a,b), a + b * (c - d), (if x > 0 then sin else cos)(_)
Conditional expressions: if x <> 0 then a/x else 1, case (only in functional languages)
Named constants and variables: pi, x
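
Two of the examples above, the conditional expression and the function chosen by a conditional, might look as follows in Python. The values of x and a are arbitrary choices for illustration.

```python
import math

x = -2.0
a = 6.0

# Conditional expression: yields a value rather than performing an action.
ratio = a / x if x != 0 else 1

# The conditional selects a function, which is then applied to an argument.
chosen = (math.sin if x > 0 else math.cos)(0.0)

print(ratio, chosen)
```

Here x is negative, so the second expression applies cos to 0.0.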

Expressions (at least in math) better behaved than commands.

Meaning of a (pure) expression is operation which, based on current contents of memory, and explicit values supplied to it, returns a value.

Referential transparency

System is referentially transparent if, in a fixed context, the meaning of the whole can be determined solely by the meaning of its parts.

Independent of the surrounding expression.

Therefore once have evaluated an expression in a particular context, never have to evaluate it again in that context since value won't change.

Math. expressions are referentially transparent.

Ex. To evaluate "(2ax + b) (2ax + c)" in a context in which a = 3, b = 4, c = 7, and x = 2, sufficient to evaluate "2ax" only once.
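
Worked out in Python, with the shared subexpression given the (made-up) name common:

```python
a, b, c, x = 3, 4, 7, 2

common = 2 * a * x                     # evaluated once: 12
result = (common + b) * (common + c)   # (12 + 4) * (12 + 7)

# Same value as evaluating the subexpression twice.
direct = (2 * a * x + b) * (2 * a * x + c)
print(common, result, direct)
```

Both routes give 16 * 19 = 304, which is what makes the substitution safe.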

Can determine meaning of f(g(x)) by only knowing the value of f, g, and x (independently).

Moreover if meaning of g' is same as g, then f(g(x)) = f(g'(x)).

(Note importance of replacing construct by equivalent one in compiler optimizations)

Lose referential transparency if allow functions with side effects.

I.e. suppose call to f(x) results in incrementing x by 1.

Then f(x) + f(x) != 2 * f(x).
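
A small Python sketch of this situation. Python integers are immutable, so the variable x is modeled here by a hypothetical Cell class standing in for a reference parameter; the choice that f returns the new value of x is also an assumption.

```python
class Cell:
    """A mutable variable, standing in for a reference parameter."""
    def __init__(self, value):
        self.value = value

def f(x):
    """Hypothetical f: increments x as a side effect, returns the new value."""
    x.value += 1
    return x.value

x = Cell(1)
sum_of_calls = f(x) + f(x)   # 2 + 3

x = Cell(1)
doubled = 2 * f(x)           # 2 * 2
print(sum_of_calls, doubled)
```

Starting from the same state, the two expressions give 5 and 4, so f(x) cannot be replaced by its value.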

Program supporting referential transparency much easier to prove correct since only need be concerned about meaning of components and then put them together.

With imperative languages, lose referential transparency.

x := x + y; y := 2 * x; versus y := 2 * x; x := x + y;

Since each command changes underlying state of computation and evaluation depends on state, ordering is critical.
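
The two orderings can be checked directly. A sketch in Python, with order_one and order_two as made-up names and x = 1, y = 2 as an arbitrary starting state:

```python
def order_one(x, y):
    """x := x + y; y := 2 * x"""
    x = x + y
    y = 2 * x
    return x, y

def order_two(x, y):
    """y := 2 * x; x := x + y"""
    y = 2 * x
    x = x + y
    return x, y

print(order_one(1, 2), order_two(1, 2))
```

From the same starting state the first ordering ends with (3, 6) and the second with (3, 2).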

Also correctness of program depends on contents of all memory cells.

Even when try to isolate portions of computations into procedures, can have non-local effects because of use of non-local variables and reference parameters.
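
A sketch of such non-local effects in Python. The names counters, add_item, and cart are hypothetical; the list argument plays the role of a reference parameter and the dictionary plays the role of a non-local variable.

```python
counters = {"total": 0}   # non-local state shared by every caller

def add_item(items, value):
    """Looks self-contained, but changes state visible outside the procedure."""
    items.append(value)            # mutates the caller's list
    counters["total"] += value     # mutates the non-local variable
    return len(items)

cart = [10]
size = add_item(cart, 5)
print(size, cart, counters["total"])
```

The return value alone (2) says nothing about the two other changes the call made.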

Issues with expressions

Order of evaluation
e.g. short-cut evaluations of boolean expressions.
If i > 0 and A[i] <> 99 then ....
What happens if A : ARRAY [1..100] OF INTEGER and i = 0 ?
Pascal vs. Modula-2 conventions.
Side-effects - destroy referential transparency.
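
The array example above can be imitated in Python. Python's and is a short-circuit operator (as is AND in Modula-2), while the hypothetical helper strict_and mimics full evaluation of both operands, which standard Pascal allows. The helper elem is also made up; it indexes a list as if it were ARRAY [1..100].

```python
def elem(a, i):
    """Index a list as if it were ARRAY [1..100]; reject out-of-range i."""
    if not 1 <= i <= len(a):
        raise IndexError(f"index {i} outside 1..{len(a)}")
    return a[i - 1]

def strict_and(p, q):
    """Both operands are already evaluated by the time this is called."""
    return p and q

A = [0] * 100
i = 0

# Short-circuit: the right operand is never evaluated when i > 0 is false.
short_circuit = i > 0 and elem(A, i) != 99

# Full evaluation: both operands are computed, so the bad index is reached.
try:
    strict_and(i > 0, elem(A, i) != 99)
    full_evaluation_failed = False
except IndexError:
    full_evaluation_failed = True

print(short_circuit, full_evaluation_failed)
```

With i = 0 the short-circuit form quietly yields false, while the fully evaluated form fails on the out-of-range index.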

Some languages conflate (identify) expressions and commands (ALGOL 68 and C).

Often artificial and results in loss of advantages of expressions (e.g., referential transparency).

Ex: x = (y = x+1) + y + (x++)

Compare 2*(x++) and (x++) + (x++)

We will restrict our attention (for the most part) to functional languages with pure expressions.

Try to eliminate problems of commands and take advantage of referential transparency.

Promote reasoning about programs & implementation on parallel computers.

Idea - Program is simply application of a function to data.

No notion of memory or assignment - like a mathematical function - No side effects.

Very rich expressions - virtually all expressions first-class (unlike most imperative languages); in particular, functions are first-class objects.
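
What "first class" buys can be sketched in Python: functions are passed as arguments, returned as results, and combined into new functions. The names compose, twice, increment, and double are invented for this example.

```python
def compose(f, g):
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))

def twice(f):
    """Return the function that applies f two times."""
    return compose(f, f)

increment = lambda n: n + 1
double = lambda n: 2 * n

pipeline = compose(double, increment)          # double(increment(n))
table = list(map(twice(double), [1, 2, 3]))    # apply double twice to each

print(pipeline(4), table)
```

No assignment to shared state is involved; new behavior comes only from applying and combining functions.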

History of functional languages: LISP, Scheme, FP, ML, Haskell, Miranda, Id

Gödel's general recursive functions (developed further by Kleene) (§10.6) and Church and Kleene's lambda calculus (§10.7) used as foundations for computable functions (before Turing machines). All found to be equivalent, leading to Church's thesis.

John McCarthy (then at MIT) in 1958-60 introduced a functional language (LISP), originally in study of symbolic differentiation with linked lists. Key article published in 1960 showed that important programs could be expressed as pure functions operating on lists. (LISP has since been revised into competing dialects - Common LISP and Scheme.)

Functional languages or notation used in describing denotational semantics of programming languages starting in 1960's.

Most stunning event was Backus' Turing award lecture in 1978.
Proposed language FP (since replaced by FL) supporting "functional" style of programming.

First ML compiler was put out in 1977 (originally in support of an interactive theorem proving system; the text Edinburgh LCF by Gordon, Milner, and Wadsworth was published in 1979). (Milner just won Turing award.) Standardized in about 1986.

Other important languages include SASL, KRC, and Miranda (all by David Turner). Haskell is successor. All support lazy evaluation.

Currently 3 main schools of functional languages:

LISP/Scheme
Strict functional (eager evaluation) (ML, Hope)
Lazy languages (Miranda, Haskell)

First two classes of languages support imperative features (though much more controlled in ML).

First uses dynamic typing, other two support static typing w/ polymorphic functions and type inference.

We choose ML for somewhat arbitrary reasons. Heavily used to develop real software, supports modern programming constructs.

The point of this part of the course is NOT to teach you ML; it is to build familiarity with thinking in the functional paradigm, with ML as the example language (though we will talk about others as well). I expect you to mainly learn ML on your own in the lab while I lecture on related material.

ML

Overview of ML

Developed in Edinburgh in late 1970's as Meta-Language for automated theorem proving system.

Success led to adoption and strengthening as programming language.

Important attributes:

Primarily applicative
Functions are first class values
Statically scoped
Static typing via type inference
Polymorphic types
Rich type system including support for ADT's.
Support for imperative features.
Support for exception handling
Automatic storage management via garbage collection
Incremental compiler supporting interactive program development.

How to use the run-time system.

Before launching sml, you must add its directory to your path. Add /usr/local/sml/bin to your path.
For most of you, this will mean adding the following to your .cshrc file:

   setenv PATH ${PATH}:/usr/local/sml/bin

If you use the CS Dept's version of .cshrc, you will see the obvious place to uncomment a similar line and make minor changes.

To launch ML type:

sml

System responds with a message saying you are in ML, and then a "-" prompt.

Can load definitions from UNIX file by typing:

   use "myfile.sml";

where myfile.sml is the name of your file. It should be in the same directory you were in when you typed sml.

Terminate session by typing control-D.
