Implementing an ADT: List_T

The interface of a list

Let us try to deduce a good interface for a list.

First, we need functions to create a new list, List_new(), and to delete an existing list, List_delete().

Second, we need functions to insert an element in a list, List_insert(), and to erase an element from the list, List_erase().

Third, we need a function that returns the number of elements in a list, List_size(), and it is nice to have a function that simply checks whether a list is empty or not, List_is_empty().

Fourth, we need functions to traverse a list. As we will see, it is not so easy to come up with a good set of functions to do that without exposing the list's internal structure.

Suppose that in the header file that defines the list interface (let us call it list.h) we have something like this:

struct List_node_T
{
        void *element;
        struct List_node *previous;
        struct List_node *next;
};

typedef struct List_T
{
        int count;
        struct List_node *begin;
        struct List_node *end;
} *List_T;
Let us imagine that we have a client program that uses the list ADT. Suppose that the code for this program is in user.c, which includes list.h. Somewhere in user.c the variables list and p are defined like this:
List_T list;
struct List_node_T *p;
Let us further assume that list->end is manipulated by the implementation of the list ADT in such a way that we know that if we reach list->end, it means that we have gone past the last node of the list. In this case, the code to traverse the list could be written like this:
for (p = list->begin; p != list->end; p = p->next)
{
        do_something(*p);
}
It seems OK, but it is not. This kind of implementation unnecessarily exposes too many details of the implementation to the users of the ADT. The worst thing about this is that if we change the structure List_T or the structure List_node_T, programs that use this ADT may stop working or compiling because they could have made assumptions about the implementation. We have to avoid this.

The second kind of implementation could hide completely the details of those structures by making List_T an opaque pointer in list.h, like this:

typedef struct List_T *List_T;
In this case, the list ADT would need to have its own internal pointer to the node currently being visited, and the interface of the list ADT would need to have functions to manipulate this pointer. Then we would not need to have the pointer p in the preceding code, which could be rewritten like this:
List_move_first(list);
while(!List_reached_end(list))
{
        do_something(List_current(list));
        List_move_next(list);
}
This is quite superior to the first option, but still not good enough. What if instead of one pointer scanning the list we wanted two pointers to two possibly different nodes? Well, we could provide the second internal pointer and expand the interface, but what if we wanted three pointers? And what about ten? We cannot keep adding internal pointers and changing the interface because it would become cumbersome.

The third (and final) option uses opaque pointers to a list and opaque pointers to list nodes. So in the list.h file we have something like this:

typedef struct List_T *List_T;
typedef struct List_node_T *List_iterator_T;
This introduces the concept of an iterator which is an opaque pointer to a list node. We can have as many iterators as we want. However, we cannot operate on them as if they were non opaque pointers. In particular, we cannot dereference an iterator. As a consequence, we can neither directly access the element pointed to by an iterator, nor directly move the iterator back and forth. So the interface must provide functions to do these operations. Let us declare the variable p as an iterator like this:
List_iterator_T p;
In this case, the code to traverse the list would look like this:
for (p = List_begin(list); p != List_end(list); p = List_next(p))
{
        do_something(List_element(list, p));
}
In summary, to traverse a list we need: Lastly, we would like to provide a function that moves a sequence of nodes from one list to a specified position in another list, an operation known as "splicing". In the process, the sequence of nodes is removed from the original list. The function List_splice() takes the list to insert nodes into, an iterator for this list that indicates where to insert, a second list from which nodes will be removed, and two iterators that specify the beginning and end of the sequence of nodes to remove from the second list.

What you have to do

Given the list.h file in /u/cs217/Assignment4, you have to write the corresponding list.c file, without changing list.h.

Be sure that your implementation handles the boundary cases well. For example, insertions can happen in any place >= List_begin() and <= List_end(). However, accesses and erasures can only happen in places >= List_begin() and < List_end(). It is not valid to dereference the iterator returned by List_end(),  that is, you can neither access the element of the node pointed to by List_end() nor delete this node.

Also, check any possible run-time errors like an attempt to pass a NULL pointer when a valid list, iterator, or element is expected; getting a NULL pointer from malloc(); or an attempt to erase a node from an empty list.

You are supposed to allocate the memory dynamically in your program. Thus, your program has the possibility of suffering from a memory leak. Please make sure you eventually free all  the memory that you allocate, and do not free any memory that you did not allocate.

In /u/cs217/Assignment4 you will also find a working list.o file and some client programs to test your implementation.

Submission

Submit the RCS version of your list.c file, your own makefile and a clear readme file electronically with the command
/u/cs217/bin/submit 4 list.c,v makefile readme
You may use other computing facilities to develop your program, but the submitted version must be compilable with lcc and execute correctly on the arizona machines.

Due: submitted by 11:59 PM, Monday, March 12, 2001.