Princeton University
COS 217: Introduction to Programming Systems

The SymTable ADT and the Spam Filter

The Problem

If appropriate, you should use your SymTable ADT in your spam filter program.  Specifically, you will find it convenient to define your FeatureFinder module so it creates and uses a SymTable object that relates each feature (that is, each word, represented as a string) to a unique index (of type int).

Thus the SymTable object may contain bindings whose values are integers. However, a SymTable object stores its values as void pointers. So the problem is this: how can you store integers as values within a SymTable object when it expects them to be void pointers? 

There are two solutions...

Solution 1

On the hats cluster, it happens that the amount of memory required to store an integer (4 bytes) is the same as the amount of memory required to store a void pointer, or a pointer of any type. Thus you can use the cast operator to "trick" the compiler into generating code that will store integers as void pointers, using statements similar to these:

SymTable_put(oFeatures, "dollars", (void*)0);
...
SymTable_put(oFeatures, "million", (void*)1);
...
iIndex = (int)SymTable_get(oFeatures, "dollars");
...
iIndex = (int)SymTable_get(oFeatures, "million");

Note that Solution 1 works only on systems for which sizeof(int) <= sizeof(void*).
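The size requirement can be checked at compile time, and the casts can be wrapped in small helper functions so they appear in only one place. The following is a minimal sketch, not part of the SymTable interface: the names indexToValue, valueToIndex, and demonstrateRoundTrip are hypothetical, and it assumes a compiler that provides the C99 type intptr_t in <stdint.h>. Casting through intptr_t avoids the warnings that many compilers issue when an int is cast directly to a pointer of a different size.

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time check: the array size is -1, and so compilation
   fails, on any system where an int does not fit in a void pointer. */
typedef char IntFitsInPointer[(sizeof(int) <= sizeof(void*)) ? 1 : -1];

/* Hypothetical helper: encode iIndex as a void pointer so that it
   can be passed to SymTable_put as a value. */
void *indexToValue(int iIndex)
{
   return (void*)(intptr_t)iIndex;
}

/* Hypothetical helper: decode a value returned by SymTable_get
   back into the int index that was stored. */
int valueToIndex(const void *pvValue)
{
   return (int)(intptr_t)pvValue;
}

/* Show the round trip for one index without using a SymTable. */
void demonstrateRoundTrip(void)
{
   void *pvValue = indexToValue(1);     /* what SymTable_put would store */
   assert(valueToIndex(pvValue) == 1);  /* what SymTable_get would yield */
}
```

With these helpers, a call such as SymTable_put(oFeatures, "million", indexToValue(1)) replaces the explicit cast. The result of converting an integer to a pointer is implementation-defined, so the round trip is something that works on mainstream systems, not something the C standard guarantees.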

Solution 2

An alternative is to place each integer in memory, and then store memory addresses in the SymTable object. Since you do not know a priori how many integers you will need, you must allocate that memory dynamically. The code will be similar to this:

piIndex = (int*)malloc(sizeof(int));
*piIndex = 0;
SymTable_put(oFeatures, "dollars", piIndex);
...
piIndex = (int*)malloc(sizeof(int));
*piIndex = 1;
SymTable_put(oFeatures, "million", piIndex);
...
iIndex = *(int*)SymTable_get(oFeatures, "dollars");
...
iIndex = *(int*)SymTable_get(oFeatures, "million");

Of course you must free the memory in which each integer resides when that memory is no longer needed.  You can do that by mapping a "freeInt" function (which you must define) across all values in the SymTable object:

SymTable_map(oFeatures, freeInt, NULL);
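One possible definition of the freeInt function is sketched here, together with a hypothetical newInt helper that combines the malloc call and the assignment and reports allocation failure. The callback signature (key, value, extra) is an assumption. Check it against the SymTable_map declaration in your own symtable.h, which may, for example, declare the value parameter as const void*.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an int in the heap, initialize it
   to iIndex, and return its address. Return NULL if insufficient
   memory is available. */
int *newInt(int iIndex)
{
   int *piIndex = (int*)malloc(sizeof(int));
   if (piIndex == NULL)
      return NULL;
   *piIndex = iIndex;
   return piIndex;
}

/* Free the int to which pvValue points. pcKey is the key of the
   binding and pvExtra is the extra argument given to SymTable_map;
   neither is needed here. The parameter list is an assumption
   about what SymTable_map expects. */
void freeInt(const char *pcKey, void *pvValue, void *pvExtra)
{
   assert(pcKey != NULL);
   (void)pvExtra;   /* deliberately unused */
   free(pvValue);
}
```

With these definitions, a binding could be created with SymTable_put(oFeatures, "dollars", newInt(0)) after checking that newInt did not return NULL. The values must be freed before the SymTable object itself is freed, because after that they can no longer be reached.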

Note that Solution 2 works on any system.

Which Solution is Better?

Solution 1 is simpler and more efficient, but less portable. Solution 2 is more complex and less efficient, but completely portable. Either approach is acceptable for the assignment.