Princeton University
Computer Science Dept.

Computer Science 441
Programming Languages
Fall 1998

Lecture 7

Variables

What does "N := N + 1" mean?

Variable has 6 components

Name
Type
Location or reference (l-value)
Value (r-value)
Scope - where variable accessible and manipulable - static vs dynamic
Lifetime - interval of time in which location bound to variable

Scope and Lifetime same in some languages - clearly different in some (FORTRAN)

   N := N + 1

First N refers to location (l-value), second to value (r-value).

Obtaining value of variable called dereferencing. (going from reference or location to value)

Explicit in some languages. In ML write N := !N + 1
p^ - explicit dereferencing an extra level in Pascal
A[i] - reference valued expression - most expressions only give r-values
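
The l-value / r-value distinction can be sketched in Python with an explicit reference cell. The Ref class and its get/set methods are hypothetical names, meant only to mimic ML's ref, ! and := ; this is an illustration, not how any particular language implements variables.

```python
# Hypothetical Ref class modelling an ML-style reference cell.
class Ref:
    def __init__(self, value):
        self.value = value        # the r-value stored at this location

    def get(self):                # explicit dereference, like !N in ML
        return self.value

    def set(self, new_value):     # store into the location, like N := ...
        self.value = new_value


n = Ref(5)              # n names a location (l-value) that holds 5
n.set(n.get() + 1)      # N := !N + 1
print(n.get())          # 6
```

On the left of the assignment only the location is needed; on the right the location must first be dereferenced to obtain a value.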

Most commonly think of value of variable as changing at run-time, but others can as well.

E.g., name can change (via call by reference parameter)

Aliasing:

Similarly assignment of variables (e.g., x := y) can be by copying or by sharing.

In copy semantics, target variable retains its location, but copies new value from source variable.

In sharing semantics, target variable gets location of source variable, so both share the same location (like objects in Java).
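
A rough Python sketch of the two assignment semantics, using lists as the stored values. Python's plain assignment is by sharing; slice assignment is used here as a stand-in for copy semantics. The variable names are illustrative only.

```python
source = [1, 2, 3]

# Copy semantics: target keeps its own location and receives a copy of the value.
target = [0, 0, 0]
location_before = id(target)
target[:] = source
kept_location = id(target) == location_before
source.append(4)
after_copy = list(target)      # unaffected by the later change to source

# Sharing semantics: target now denotes the same location as source.
target = source
source.append(5)
after_share = target           # sees every change made through source

print(kept_location)   # True
print(after_copy)      # [1, 2, 3]
print(after_share)     # [1, 2, 3, 4, 5]
```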

Constants have values, but no location.

One way of classifying a language is according to sorts of entities that can be bound to an identifier. Called denotable values.

In Pascal, objects which can be bound to id's and corresponding declarations are:

primitive values and strings (in constant def's)
ref's to variables and associated types (in variable declarations)
procedure and function abstractions (in procedure and function def's)
types (in type def's)

Note restrictions in constant def's - irregularity in language

Scope

Scope of a variable is the range of program instructions over which the variable is known.

Static vs Dynamic

Static

Most languages use static scoping (e.g., Pascal, Modula-2, C, ..)

Scope is associated with the static text of program.

Can determine scope by looking at structure of program rather than execution path telling how got there.

May have holes in scope of variable

program ...
    var M : integer;
    ....
    procedure A ...
        var M : array [1..10] of real;
        begin
            ...
        end;
begin
    ...
end.

Variable M declared in main program is not visible in procedure A, since new declaration of M "shades" old declaration.

Symbol table keeps track of which declarations are currently visible.

Think of symbol table as stack. When enter a new scope, new declarations are pushed on and can shade old ones. When exit scope, declarations arising from the scope are popped off.
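
The stack view of the symbol table can be sketched as follows, replaying the M example above. SymbolTable and its method names are assumptions made for this sketch; a real compiler's table carries much more information per entry.

```python
# Hypothetical SymbolTable: a stack of scopes, each mapping a name to its declaration.
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]              # outermost (main program) scope

    def enter_scope(self):
        self.scopes.append({})          # push a fresh scope

    def exit_scope(self):
        self.scopes.pop()               # pop declarations made in that scope

    def declare(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        # Search from the innermost scope outward; inner declarations shade outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(name)


table = SymbolTable()
table.declare("M", "integer")                  # var M in main program
table.enter_scope()                            # enter procedure A
table.declare("M", "array [1..10] of real")    # shades the outer M
inside_a = table.lookup("M")
table.exit_scope()                             # leave procedure A
after_a = table.lookup("M")

print(inside_a)   # array [1..10] of real
print(after_a)    # integer
```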

Dynamic scoping:

Scope is associated with the execution path of program.

In particular an occurrence of an identifier in a procedure may be associated with different variables at different times in the execution of the program.

Example:

program ...
    var A : integer;

    procedure Y(B: integer);
        begin
            ...; 
            B := A + B; 
            ...
        end; {Y}

    procedure Z(...);
        var A: integer;
        begin
            ...; 
            Y(...); 
            ...
        end; {Z}

    begin {main}
        ...; 
        Z(...);
        ...
    end.

Question: Which variable with name A is used when Y is called from Z?

In static, clearly globally defined A.
In dynamic, local A in Z (since declaration of A in Z is most recent).

With dynamic scoping, symbol table built and maintained at run-time.

Push and pop entries when enter and exit scopes at run-time.
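
The Y/Z example can be simulated in Python to compare the two rules. The concrete values (global A = 1, Z's local A = 100, argument 10) and the helper names are assumptions for illustration; the elided parts of the original program are not modelled.

```python
# Run-time stack of bindings; the bottom frame holds main's variables.
stack = [{"A": 1}]            # global A (value assumed for the sketch)
MAIN_SCOPE = stack[0]

def lookup_dynamic(name):
    # Dynamic scoping: most recent binding still on the run-time stack.
    for frame in reversed(stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def lookup_static_in_y(name):
    # Static scoping: Y is declared directly in main, so its free
    # identifiers resolve in main's scope regardless of the caller.
    return MAIN_SCOPE[name]

def proc_y(b, lookup):
    stack.append({"B": b})            # push Y's bindings
    try:
        return lookup("A") + b        # B := A + B
    finally:
        stack.pop()                   # pop on exit

def proc_z(lookup):
    stack.append({"A": 100})          # Z's local A
    try:
        return proc_y(10, lookup)
    finally:
        stack.pop()

print(proc_z(lookup_static_in_y))   # 11  (uses global A)
print(proc_z(lookup_dynamic))       # 110 (uses Z's local A)
```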

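Since the declaration an identifier refers to is not known until run-time, its type generally cannot be checked at compile time. Hence dynamic scoping usually implies dynamic typing!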

LISP and APL use dynamic scoping (though SCHEME has default of static)

Lifetime

In FORTRAN, variables are allocated statically. All variables are allocated storage before execution of program begins. As a consequence, when return to a procedure local variables still have last value left at end of previous invocation. Not so in Pascal or C.

In Pascal or C, when enter procedure any local variables are allocated and are then deallocated when exit. (Called dynamic allocation of variables).

In block-structured language (e.g., Pascal, C, Modula-2, etc.):

Use run-time stack to allocate space for local variables and parameters when enter a new unit (procedure, functions, etc.). Space is called an activation record.

Pop it off run-time stack when exit unit.

Note that a procedure may have several activation records on stack if called recursively.

Even without recursion may have several distinct variables on stack with same name!

When pointers are used, utilize another kind of memory, called "heap".

When do "new(p)" operation where p is of type pointer to T, sufficient memory is allocated from the heap to hold a value of type T and p is assigned the location of that memory. The value is accessed by writing p^ (in Pascal).

This memory does not follow the stack discipline. The lifetime of the heap-allocated memory is determined manually by "new" and "dispose" commands. Entering or exiting a scope has no impact on the allocation or deallocation of this memory.

Therefore in Pascal and C (for example) there are three kinds of memory: static (occupied by global variables), stack-based or automatic (occupied by parameters and local variables of procedures and functions), and heap-based or manually allocated.

In ML, everything comes off of heap. But automatically allocated when needed and deallocated (by garbage collector) when no way of accessing it. Java is similar.

More implementation details later.

Value

Usually think of value as bound at execution time, but can vary. If bound at language definition time (e.g., maxint, true, false), then called language defined constant.

If freeze at compilation, then program constant

    const size = 100;
           doubleSize = 2 * size {called manifest constant}

In other languages allow:

    procedure ... (n : integer) is
        var x: constant integer := 3 * n - 2;  
                             {value bound & frozen on procedure entry}
             A:  array[1..n] of real;

Some languages allow variables to be initialized at declaration.

    var x : integer := 5;

But when is binding done - only first time procedure is entered or every time? FORTRAN only first time, Java every time.
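
Python happens to show both behaviours, which makes for a loose analogy. A default argument value is evaluated once (at function definition, not at first entry, so the match with FORTRAN is only approximate), while an initializer in the body runs on every entry. The function names below are made up for the sketch.

```python
def init_once(item, log=[]):
    # The default list is created a single time and survives between calls.
    log.append(item)
    return log

def init_every_time(item):
    log = []                 # this initializer runs on every entry
    log.append(item)
    return log

init_once("a")
first = init_once("b")
init_every_time("a")
second = init_every_time("b")

print(first)    # ['a', 'b']
print(second)   # ['b']
```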

Postpone discussion of binding time for types until later.

Confusion over names and locations

Two expressions are said to be aliases if they denote the same location:

This can occur especially easily when using var parameters.

Ex: If p(x,y) is a procedure where x and y are both var parameters, then the call p(z,z) makes x and y aliases in the body of p.

If the body of p(x,y) first increases x by one and then y by one, what is the result of the call to p(z,z)? Get z increased by 2! Aliasing often produces surprising (undesirable) behavior in functions.
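
The p(z,z) case can be reproduced in Python by passing a mutable box to stand in for a var parameter. Cell is a hypothetical helper, since Python has no var parameters of its own.

```python
class Cell:
    """One-slot box standing in for a variable passed as a var parameter."""
    def __init__(self, value):
        self.value = value

def p(x, y):
    x.value += 1      # increase x by one
    y.value += 1      # then increase y by one

z = Cell(0)
p(z, z)               # x and y are aliases for z inside p
print(z.value)        # 2

a, b = Cell(0), Cell(0)
p(a, b)               # no aliasing: each goes up by one
print(a.value, b.value)   # 1 1
```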

Also arises when global variable, x, is used inside procedure where x was used as an actual parameter for a var parameter.

Also easy to get aliasing with pointers:

    var x, y: ^integer;
    ...
    x := y;

Then x^ and y^ are aliases - any change to one, changes the other!

In languages with assignment by sharing (e.g., Java), get aliasing automatically with all assignments.

The ultimate in bad manners: Pointers

Recognized as major cause of run-time errors.

"Pointers have been lumped with the goto statement as a marvelous way to create impossible to understand programs."

Kernighan & Ritchie, The C Programming Language

(In fairness, they then go on to defend the use of pointers.)

Problems:

If not specify what type they point to (PL/I), then can break type system.
Dangling pointers
1. If pointers can point to object on run-time stack (named variable - PL/I, C), then object may go away before pointer.
2. User may explicitly deallocate the object a pointer refers to even if other variables still point to the same object. Possible solutions involve reference counting or garbage collection.
Dereferencing uninitialized or nil pointers may cause crashes.
Garbage: Unreachable items may clog heap memory & can't recycle. Garbage collection or reference counting may solve.
Holes in typing system may allow arbitrary integers to be used as pointers (through variant records in Pascal)

Pointer arithmetic norm in C

though p+1 for pointer is not same as p + 1 for integer
for pointer, address incremented by size of object pointed to (e.g., array indexing).
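
The scaling rule can be checked from Python with the standard ctypes module, which exposes raw addresses. This is a sketch of the address computation only; c_int32 is chosen here for a fixed element size and is an assumption, not something the notes specify.

```python
import ctypes

values = (ctypes.c_int32 * 4)(10, 20, 30, 40)   # like a C array of four 32-bit ints
base = ctypes.addressof(values)                 # address of values[0]
size = ctypes.sizeof(ctypes.c_int32)            # bytes per element

def element_at(index):
    # C's *(p + index): the address advances by index * sizeof(element) bytes.
    return ctypes.c_int32.from_address(base + index * size).value

print(size)            # 4
print(element_at(2))   # 30
```

Adding 1 to the raw integer address instead of one element size would land in the middle of values[0].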

TYPES

Support abstractions of set of elements and operations on them.

Built-in types:

Hide representation
Allow type-checking at compile and/or run-time
Help disambiguate operators
Allow expression of constraints on accuracy of representation.
- (COBOL, PL/I, Ada) LongInt, DoublePrecision, etc.
- Save space and check on legal values.

Aggregates

Also come with built-in operations.

Cartesian products:

S x T = {<s,t> | s in S , t in T}.

Can also write as PROD_{i in I} S_i = S₁ x S₂ x ... x S_n. If all are the same, write Sⁿ.

Tuples of ML: type point = int * int

How many elts in product?

What if have S⁰? Called unit in ML.
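
For finite sets both questions can be answered by enumeration. A small Python sketch using itertools.product; the sets S and T are arbitrary examples.

```python
from itertools import product

S = {"a", "b", "c"}
T = {0, 1}

pairs = set(product(S, T))                  # S x T
triples = set(product(T, repeat=3))         # T x T x T
zero_fold = list(product(S, repeat=0))      # the 0-fold product of S

print(len(pairs))     # 6  = |S| * |T|
print(len(triples))   # 8  = |T| ** 3
print(zero_fold)      # [()]
```

The 0-fold product has exactly one element, the empty tuple, which matches the single value () of type unit in ML.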

Records (COBOL, Pascal, Ada) or Structures (PL/I, C, and ALGOL 68).

Heterogeneous collections of data.

Differ from Cartesian product since fields associated with labels

E.g.

    record                   record
       x : integer;    /=       a : integer;
       y : real                 b : real
    end;                     end

Operations and relations: selection ".", :=, =.

Can use generalized product notation: PROD_{l in Lab} T(l)

Ex. in first example above, Lab = {x,y}, T(x) = integer, T(y) = real.

Disjoint Union:

Variant record - type1 union type2 w/discriminant

Support alternatives w/in type:

Ex.

        RECORD
           name : string;
           CASE status : (student, faculty) OF
              student: gpa : real;
                       class : INTEGER;
           |  faculty: rank : (Assis, Assoc, Prof);
           END;
        END;

Save space yet (hopefully) provide type security. Saves space because the amount of space reserved for a variable of this type is only that needed by the largest variant.

Fails in Pascal / MODULA-2 since variants not protected.

How is this supported in ML?

datatype IntReal = INTEGER of int | REAL of real;

Can think of enumerated types as variant w/ only tags!

NOTICE: Type safe. Clu and Ada also support type-safe case for variants:

Ada: Variants - declared as parameterized records:

type geometric (Kind: (Triangle, Square) := Square) is
    record
       color : ColorType := Red ;
       case Kind of
          when Triangle =>
                 pt1,pt2,pt3:Point;
          when Square =>
                 upperleft : Point;
                 length : INTEGER range 1..100;
       end case;
    end record;

ob1 : geometric;           -- default is Square
ob2 : geometric(Triangle); -- frozen, can't be changed

Avoids Pascal's problems w/holes in typing.

Illegal to change "discriminant" alone.

ob1 := ob2   -- OK
ob2 := ob1   -- generate run-time check to ensure Triangle

If want to change discriminant, must assign values to all components of record:

ob1 := (Color=>Red,Kind=>Triangle,pt1=>a,pt2=>b,pt3=>c);

If write code

    ... ob1.length...

then converted to run-time check:

    if ob1.Kind = Square then ... ob1.length ....
                         else raise constraint_error
    end if;

Fixes type insecurity of Pascal
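
The run-time check on the discriminant can be imitated in Python. Everything below (the Geometric class, the select method, ConstraintError) is a hypothetical model of the behaviour described above, not Ada's actual mechanism.

```python
class ConstraintError(Exception):
    """Stand-in for Ada's constraint_error."""

class Geometric:
    # Fields available under each value of the discriminant.
    VARIANT_FIELDS = {
        "Triangle": ("pt1", "pt2", "pt3"),
        "Square": ("upperleft", "length"),
    }

    def __init__(self, kind="Square", **fields):
        self.kind = kind              # the discriminant
        self.fields = dict(fields)

    def select(self, name):
        # The field must belong to the variant currently in force.
        if name not in self.VARIANT_FIELDS[self.kind]:
            raise ConstraintError(f"{name} is not a field of a {self.kind}")
        return self.fields[name]


ob1 = Geometric("Square", upperleft=(0, 0), length=5)
ob2 = Geometric("Triangle", pt1=(0, 0), pt2=(1, 0), pt3=(0, 1))

print(ob1.select("length"))    # 5
try:
    ob2.select("length")
except ConstraintError as err:
    print("constraint error:", err)
```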

Note disjoint union is not same as set-theoretic union, since have tags.

    IntReal = {INTEGER} x int + {REAL} x real
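
The role of the tags can be seen with small finite sets. A Python sketch, where tagged values are modelled as (tag, value) pairs; the describe function is an illustrative name.

```python
ints = {1, 2}

set_union = ints | ints                           # plain union collapses the copies
disjoint_union = ({("Left", n) for n in ints} |
                  {("Right", n) for n in ints})   # tags keep the copies apart

print(len(set_union))        # 2
print(len(disjoint_union))   # 4

# IntReal values as tagged pairs, in the style {INTEGER} x int + {REAL} x real.
def describe(tagged):
    tag, value = tagged
    if tag == "INTEGER":
        return f"integer {value}"
    if tag == "REAL":
        return f"real {value}"
    raise ValueError(f"unknown tag {tag}")

print(describe(("INTEGER", 3)))   # integer 3
print(describe(("REAL", 3.0)))    # real 3.0
```

The tag always says which summand a value came from, so a case analysis on it can be checked for safety.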

C supports undiscriminated unions:

    typedef union {int i; float r;} utype;

As usual with C, it is presumed that the programmer knows what he/she is doing and no static or run-time checking is performed.
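
The lack of checking can be observed from Python through ctypes.Union, which lays out fields the way a C union does. The class name UType and the fixed-width c_int32 field are choices made for this sketch.

```python
import ctypes

class UType(ctypes.Union):
    # Both fields occupy the same four bytes of storage.
    _fields_ = [("i", ctypes.c_int32), ("r", ctypes.c_float)]

u = UType()
u.r = 1.0            # store a float ...
bits = u.i           # ... and read the same bytes back as an integer

print(ctypes.sizeof(UType))   # 4
print(hex(bits))              # 0x3f800000
```

Nothing records which field was written last, so the read through i silently reinterprets the IEEE single-precision bit pattern of 1.0.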

Mappings:

Encompasses functions w/ both infinite and finite domains.

Arrays:

homogeneous collection of data.

Mapping from index type to range type
E.g. Array [1..10] of Real corresponds to {1,...,10} -> Real

Operations and relations: selection "[ ]" (indexing, as in A[i]), :=, =, and occasionally slices.

E.g. A[2..6] represents an array composed of A[2] to A[6]

Index range and location where array stored can be bound at compile time, unit activation, or any time.

static: FORTRAN
semi-static: Pascal
(semi-)dynamic: ALGOL 60, Ada
flexible: Algol 68 & Clu

In both static and semi-static languages the index set of an array is bound at compile time. The difference is that with static arrays, the location of the array in memory is bound at compile time (as in FORTRAN), while with semi-static, the size of the array is bound at compile time, but its location is determined at run-time.

For instance, in Pascal, an array stored in a local variable is allocated on the run-time stack, and its location on the stack may vary in different invocations of the procedure.

With semi-dynamic (or dynamic) arrays, the index set (and hence size) of the array may vary at run-time. For instance in ALGOL 60 or Ada, an array held in a local variable may have index bounds determined by a parameter to the routine. It is called semi-dynamic because the size is fixed once the routine has been activated.

A flexible array is one whose size can change at any time during the execution of a program. Thus, while a particular size array may be allocated when a procedure is invoked, the array may be expanded in the middle of a loop if more space is needed.

The key to these differences is binding time, as usual!

Function abstractions:

S->T ... function f(s:S):T (where S could be n-tuple)

What if S were a record instead of an n-tuple?

Operations: abstraction and application, sometimes composition.

What is difference from an array? Efficiency, esp. w/update.

	fun update f arg result x = if x = arg then result else f x

or equivalently

	fun update f arg result = fn x => if x = arg then result else f x
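
The same update written in Python, set beside an array update for comparison. The names square, patched and table are illustrative.

```python
def update(f, arg, result):
    # Returns a new function that agrees with f everywhere except at arg.
    return lambda x: result if x == arg else f(x)

square = lambda x: x * x
patched = update(square, 3, 0)

print(patched(3))   # 0
print(patched(4))   # 16
print(square(3))    # 9, the original function is unchanged

# Array update: one store, and later lookups cost the same as before.
table = [x * x for x in range(5)]
table[3] = 0
print(table[3], table[4])   # 0 16
```

Each functional update wraps another test around the old function, so lookup cost grows with the number of updates. That is the efficiency difference the notes point to.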

Procedure can be treated as having type S -> unit for uniformity.

Powerset:

	set of elt_type;

Typically implemented as bitset or linked list of elts

Operations and relations: All typical set ops, :=, =, subset, .. in ..
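
A bitset representation can be sketched in Python with integers as bit vectors. The four-colour base type and the helper names are assumptions for the example; one bit is reserved per element of the base type.

```python
UNIVERSE = ["red", "green", "blue", "yellow"]        # small base type (assumed)
INDEX = {name: i for i, name in enumerate(UNIVERSE)}

def make_set(*elts):
    bits = 0
    for e in elts:
        bits |= 1 << INDEX[e]        # turn on the bit for each element
    return bits

def member(e, s):
    return bool(s & (1 << INDEX[e]))

def union(a, b):
    return a | b

def intersection(a, b):
    return a & b

def subset(a, b):
    return (a & ~b) == 0             # no element of a is missing from b

warm = make_set("red", "yellow")
primary = make_set("red", "green", "blue")

print(member("red", warm))                               # True
print(member("blue", warm))                              # False
print(intersection(warm, primary) == make_set("red"))    # True
print(subset(make_set("red"), warm))                     # True
```

Every set operation is a single machine-level bit operation, which is only possible when the base type is small and its elements can be numbered.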

Why need base set to be primitive type? What if base set records?

Recursive types:

Examples:

  	tree = Empty | Mktree of int * tree * tree
	list = Nil | Cons of int * list

In most lang's built by programmer from pointer types.

Sometimes supported by language (e.g. Miranda, Haskell, ML).

Why can't we have direct recursive types in ordinary imperative languages?

OK if use ref's:

	list = POINTER TO RECORD
			first:integer;
			rest: list
		END;

Recursive types may have many sol'ns

E.g. list = {Nil} union (int x list) has following sol'ns:

finite sequences of integers followed by Nil: e.g., (2,(5,Nil))
finite or infinite sequences, where if finite then end with Nil

Similarly with trees, etc.

Theoretical result: Recursive equations always have a least solution - though infinite set if real recursion.

Can get via finite approximation. I.e.,

   list₀ = {Nil}
   list₁ = {Nil} union (int x list₀) 
         = {Nil} union {(n, Nil) | n in int}

   list₂ = {Nil} union (int x list₁) 
         = {Nil} union {(n, Nil) | n in  int}
                 union {(m,(n, Nil)) | m, n in int}

      ...

   list = Union_n list_n
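
The approximations can be enumerated when int is replaced by a tiny finite set. A Python sketch under that assumption (INTS = {0, 1} stands in for int, and the string "Nil" for Nil):

```python
NIL = "Nil"
INTS = (0, 1)          # tiny stand-in for int, so every level is finite

def approximations(n):
    level = {NIL}                  # list_0
    levels = [level]
    for _ in range(n):
        # list_(k+1) = {Nil} union (int x list_k)
        level = {NIL} | {(i, rest) for i in INTS for rest in level}
        levels.append(level)
    return levels

levels = approximations(2)
print([len(level) for level in levels])        # [1, 3, 7]
print((0, (1, NIL)) in levels[2])              # True
print(levels[0] <= levels[1] <= levels[2])     # True
```

Each level contains the previous one, so the union over all n is well defined and contains exactly the finite lists.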

Very much like unwinding definition of recursive function

	fact = fun n => if n = 0 then 1 else n * fact (n-1)
	
	fact₀ = fun n => if n = 0 then 1 else undef
	
	fact₁ = fun n => if n = 0 then 1 else n * fact₀(n-1)
	      = fun n => if n = 0, 1 then 1 else undef
	      
	fact₂ = fun n => if n = 0 then 1 else n * fact₁(n-1)
	      = fun n => if n = 0, 1 then 1 else 
	                 if n = 2 then 2 else undef
	...


	fact = Union_n fact_n
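
The unwinding can be run directly. In this Python sketch None plays the role of undef, and next_approx and approx are made-up helper names.

```python
UNDEF = None     # stands for "undef" in the notes

def next_approx(prev):
    # Builds fact_(k+1) from fact_k by unwinding the definition once.
    def fact(n):
        if n == 0:
            return 1
        inner = prev(n - 1)
        return UNDEF if inner is UNDEF else n * inner
    return fact

def approx(k):
    f = lambda n: 1 if n == 0 else UNDEF     # fact_0
    for _ in range(k):
        f = next_approx(f)
    return f

fact2 = approx(2)
print([fact2(n) for n in range(4)])   # [1, 1, 2, None]
print(approx(5)(5))                   # 120
```

fact_k is defined exactly on 0..k, and each approximation agrees with the previous one wherever that one was defined.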

Notice solution to T = A + (T->T) is inconsistent with classical mathematics!
In spite of that, however, it can be used in Computer Science:

	datatype univ = Base of int | Func of (univ -> univ);

Sequence:

Lists

sequential files

strings:

Composite (arrays) in Pascal, Modula-2, ...
Primitive in ML
Lists in Miranda and Prolog: provides more flexibility (no length bound)

User-Defined Types

User gets to name new types. Why?

more readable
Easy to modify if localized
Factorization - why copy same complex def. over and over (possibly making mistakes)
Added consistency checking in many cases.
