COS 217 Spring 1999. Using an ADT: Word Frequencies

COS 217. Introduction to Programming Systems. Spring 1999.

Using an ADT: Word Frequencies

Write freq [ options ...] [files ...], a program that prints, on the standard output, a list of the characters, words and lines that appear in its file arguments and the number of times each appears. A tab separates each frequency from its character, word, or line. It first prints all characters, then all words, and then all lines, with a blank line between each set. If there are no file arguments, freq reads from the standard input.

A "character" is a single letter, underscore or digit. A ``word'' is a sequence of one or more letters, underscores, and digits that begins with an underscore or letter. Case matters, so ``Hello'' and ``hello'' are different words. Words longer than 64 characters and lines longer than 512 characters are truncated.

The option -n causes freq to print only those characters, words and lines that occur n or fewer times in increasing order of frequency of occurrence. The option +n causes freq to print only those characters, words and lines that occur n or more times in decreasing order of frequency of occurrence. In the absence of an explicit option, freq behaves as if +0 were specified; that is, it prints all the characters, words and lines, most frequently occurring word first. However, if -n is not specified and -a is specified, then the words, characters and lines should be printed in alphabetical order.

There are two modes in which output should be printed. If -f is specified, then freq should print output for each file separately, each followed by the filename. If not, it should simply print output once, representing all files put together.

You must break your program into the following modules.

main.c
the main program; it parses the input, and prints the output.
getline.c
the implementation of the interface /u/cs217/3/getline.h, which reads lines.
getword.c
the implementation of the interface /u/cs217/3/getword.h, which obtains words.
OR
getword2.c
the implementation of the interface /u/cs217/3/getword2.h, which obtains words.
makefile
make specification for building freq.
Your implementation of getline.c must export only the symbol getline. Similarly for getword. If you need auxiliary functions, make them static.

You must use the table abstract datatype described in the interface /u/cs217/3/table.h wherever and as often as this might be useful. The object code for an implementation of table.c is available in /u/cs217/3/table.o. The interfaces include statements about checked runtime errors, e.g., ``It is a checked runtime error to pass a NULL key or table to any function in this interface.'' You are a client of the table ADT in this assignment because you use /u/cs217/3/table.o, but you do not implement it. Make sure your main.c obeys the contract specified in the interface.

You are both a client and the implementor of getline.c and getword.c. Use assertions to implement the checked runtime errors specified in their interfaces. You may also use assertions to check for NULL return values from the allocation functions, like malloc.

In your makefile, use the options -I/u/cs217/3 and -I/u/cs217/3 in CFLAGS to direct lcc to look for include files in those directories, so a directive like

#include "table.h"
will include /u/cs217/3/table.h. Do not put absolute paths in your source code. The default target in your makefile should build freq so that the command ``make'' builds freq. You may use the path /u/cs217/3/table.o in the command that builds freq. Use RCS to track the history of your code. Create a subdirectory named RCS. Check in versions of your files with ci as you develop and debug them. You do not have to check them in after every tiny change; check in ``major'' milestones, or after you make substantial changes. Use co -l in to check out the most recent revisions, if necessary.

Object files for implementations of a version of this program that only counts words and does not implement the -f or -a options is are available in /u/cs217/3. When linked together with /u/cs217/3/table.o, these files form a working word-freq, which is also available in /u/cs217/3/word-freq. You may use these versions of main.o and getword.o to test your program during development.

You may use the ANSI standard library function qsort to sort the output; qsort is described in Harbison and Steele and on its man page. Your program must not use any global or static variables. All variables must be locals or parameters. In addition, only casts between pointer types are permitted; you must not, for example, use casts to hide integers in pointers.

Submission

Submit your program electronically with a command like
/u/cs217/bin/submit 3 makefile main.c,v getword.c,v getline.c,v
where the ,v files are the RCS files for your main program and your implementation of getword. Do not submit a readme file; it is unnecessary. The man page describes the program, and the RCS files give the author and modification history of the corresponding source files. For example, the command ``rlog *.c'' displays the check-in history of each .c file. Make sure your RCS files hold the latest revision when you submit them.

Due: submitted by 11:59pm, Monday, March 1, 1999

Copyright (c) 1994,1996,1998 by David R. Hanson, J.P. Singh