COS 217 Spring 1996. Using an ADT: Word Frequencies

COS 217. Introduction to Programming Systems. Spring 1996.

Using an ADT: Word Frequencies

Write wf [ options ...] [files ...], a program that prints, on the standard output, a list of the words that appear in its file arguments and the number of times each word appears. A tab separates each frequency from its word. If there are no file arguments, wf prints the frequencies of the words in the standard input.

A ``word'' is a sequence of one or more letters, underscores, and digits that begins with an underscore or letter. Case matters, so ``Hello'' and ``hello'' are different words. Input lines may be of arbitrary length, but words longer than 64 characters are truncated.

The option -n causes wf to print only those words that occur n or fewer times in increasing order of frequency of occurrence. The option +n causes wf to print only those words that occur n or more times in decreasing order of frequency of occurrence. In the absence of an explicit option, wf behaves as if +0 were specified; that is, it prints all the words, most frequently occurring word first.

You must break your program into the following modules.

main.c
the main program; it parses the input, and prints the output.
getword.c
the implementation of the interface /u/cs217/3/getword.h, which reads words.
makefile
make specification for building wf.
Your implementation of getword.c must export only the symbol getword. If you need auxiliary functions, make them static.

You must use the table abstract datatype described in the interface /u/cs217/4/table.h. The object code for an implementation of table.c is available in /u/cs217/4/table.o.

The interfaces include statements about checked runtime errors, e.g., ``It is a checked runtime error to pass a NULL key or table to any function in this interface.'' You are a client of the table ADT in this assignment because you use /u/cs217/4/table.o, but you do not implement it. Make sure your main.c obeys the contract specified in the interface.

You are both a client and the implementor of getword.c. Use assertions to implement the checked runtime errors specified in /u/cs217/3/getword.h. You may also use assertions to check for NULL return values from the allocation functions, like malloc.

In your makefile, use the options -I/u/cs217/3 and -I/u/cs217/4 in CFLAGS to direct lcc to look for include files in those directories, so a directive like

#include "table.h"
will include /u/cs217/4/table.h. Do not put absolute paths in your source code. The default target in your makefile should build wf so that the command ``make'' builds wf. You may use the path /u/cs217/4/table.o in the command that builds wf.

Use RCS to track the history of your code. Create a subdirectory named RCS. Check in versions of your files with ci as you develop and debug them. You do not have to check them in after every tiny change; check in ``major'' milestones, or after you make substantial changes. Use co -l in to check out the most recent revisions, if necessary.

Object files for implementations of main.c and getword.c are available in /u/cs217/3. When linked together with /u/cs217/4/table.o, these files form a working wf, which is also available in /u/cs217/3/wf. You may use these versions of main.o and getword.o to test your program during development. Your versions must be functionally interchangeable with them. Hanson's main.c is 91 lines and his getword.c is 34 lines.

Use the ANSI standard library function qsort to sort the output; qsort is described on page 253 of K&R and in its man page.

Your program must not use any global or static variables. All variables must be locals or parameters. In addition, only casts between pointer types are permitted; you must not, for example, use casts to hide integers in pointers.

Submission

Submit your program electronically with a command like
/u/cs217/bin/submit 3 makefile RCS/main.c,v RCS/getword.c,v
where the ,v files are the RCS files for your main program and your implementation of getword. Do not submit a readme file; it is unnecessary. The man page describes the program, and the RCS files give the author and modification history of the corresponding source files. For example, the command ``rlog *.c'' displays the check-in history of each .c file. Make sure your RCS files hold the latest revision when you submit them.

Due: submitted by 11:59pm, Mon. 2/26.

Copyright (c) 1994,1996 by David R. Hanson
Last modified:Tue Feb 20 11:26:29 EST 1996