COS 333 Assignment 5: The unRegistrar (Spring 2008)
Due midnight, Friday, March 28.
Note that this is after spring break. There are concurrent project
obligations, however, so use the time wisely.
No extensions on this deadline.
Mon Mar 3 18:54:57 EST 2008
The Registrar's web
site leaves something to be desired when all you want is a quick
look at a few courses. Fortunately, the registrar's data is freely
available, so it's possible to make a private version that might be more
satisfactory at least along this dimension. This assignment is a
somewhat open-ended exercise in using Ajax technology to make a highly
responsive alternative.
My own quick and dirty version is
the unRegistrar,
which you can use as a starting point. It includes minimal Ajax
functionality as described in class, and simple tooltip code adapted
from
lixlpixel.org.
As you play with it, you will see that although it is responsive and
easy to use, it's also dumb and the code is sleazy. Your task is to
make it somewhat smarter and add some new features while preserving its
speed, simplicity and convenience for ad hoc queries like "what
QR classes start at 1:30 on Monday and Wednesday?".
1. Something Old
Your version must include these two features, presented in
any way you like:
- It should enable searching for the standard
3-letter department codes for
departments and programs, but only in context; that is, "cos" if clearly
by itself should yield CS courses, but not items whose descriptions
merely include strings like "cost" and "Costa Rica". There are some
3-letter codes like "art" and "his" that are also 3-letter words; it
would be nice if you could do something sensible with them.
- It should enable easy and unambiguous searches for the
distribution codes like QR
and HA. Identifying "qr" is easy, since it appears nowhere in English
text, but many of the other codes are parts of English text. This is a
variant of the problem in the previous paragraph. Neither feature has to work
perfectly, but both should mostly get things right.
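One possible approach, sketched here in JavaScript, is to classify each query word before searching and turn recognized codes into anchored patterns. It assumes (hypothetically) that course listings appear in the data as an uppercase code, a space, and a three-digit number, as in "COS 333", and that distribution codes appear as standalone uppercase words. The code lists and the name patternFor are made up for illustration; a real version would load the codes from deptcode.txt and distcode.txt, and you would need to check the actual format of reg.txt.

```javascript
// Illustrative code lists; a real version would read deptcode.txt and distcode.txt.
const DEPT_CODES = new Set(["ART", "COS", "ECO", "HIS", "MAT", "QCB"]);
const DIST_CODES = new Set(["EC", "EM", "HA", "LA", "QR", "SA"]);
// Department codes that are also English words (a hypothetical, partial list).
const AMBIGUOUS = new Set(["ART", "HIS"]);

// Escape characters that are special in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn one query word into a RegExp to test against a course line.
function patternFor(word) {
  const up = word.toUpperCase();
  const typedInUpper = word === up;

  // Department code, but only in context: the code followed by a course
  // number, so "cos" matches "COS 333" but not "Cost" or "Costa Rica".
  // Ambiguous codes such as "art" count only when typed in uppercase.
  if (DEPT_CODES.has(up) && (typedInUpper || !AMBIGUOUS.has(up))) {
    return new RegExp("\\b" + up + " \\d{3}");
  }

  // Distribution code: a whole word, matched case-sensitively in uppercase,
  // so the "ha" in "that" or the "em" in "them" never matches.
  if (DIST_CODES.has(up)) {
    return new RegExp("\\b" + up + "\\b");
  }

  // Anything else: treat the word as a case-insensitive RE, as grep would
  // (so "1[678]th" still works); if it is not a valid RE, search literally.
  try {
    return new RegExp(word, "i");
  } catch (e) {
    return new RegExp(escapeRegExp(word), "i");
  }
}
```

With these rules, "cos" matches "COS 333" but not "Cost" or "Costa Rica", "art" is searched as an ordinary word while "ART" means the department, and anything unrecognized behaves roughly as grep would, except that a malformed RE such as "(intro" is searched for literally instead of failing.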
2. Something Borrowed
You must include one or two of these:
- Add something easy to use and not too space intensive that easily
converts cryptic department and program codes like "QCB" into the
official name "Program in Quantitative and Computational Biology".
- Add something similar for distribution codes, to map
for example "EM" to "Ethical Thought and Moral Values".
- Add something that will convert the cryptic
5-letter building codes
into the full building name.
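For the code-to-name options, a small table plus the browser's built-in title tooltip may be all you need. This JavaScript sketch assumes, hypothetically, that each line of deptcode.txt, distcode.txt, or bldg.txt holds a code followed by whitespace and the full name, and that the codes are 2 to 5 uppercase letters; check the real files before relying on that. parseCodeTable and annotateCodes are illustrative names, not part of the distributed code.

```javascript
// Build a Map from code to full name, assuming lines like
// "QCB Program in Quantitative and Computational Biology".
function parseCodeTable(text) {
  const table = new Map();
  for (const line of text.split(/\r?\n/)) {
    const m = line.match(/^\s*(\S+)\s+(.+?)\s*$/);
    if (m) table.set(m[1].toUpperCase(), m[2]);
  }
  return table;
}

// Escape text for safe inclusion in HTML content or attribute values.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Wrap every known code in a display line with a <span title="...">,
// so hovering over "QCB" shows the official program name.
function annotateCodes(line, table) {
  return escapeHtml(line).replace(/\b[A-Z]{2,5}\b/g, (code) =>
    table.has(code)
      ? `<span title="${escapeHtml(table.get(code))}">${code}</span>`
      : code);
}
```

In reg.html one might then set element.innerHTML = annotateCodes(line, deptTable) for each result line. Escaping the line first keeps stray "<" or "&" characters in course titles from breaking the page.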
3. Something New
You must add at least one new feature of your own if you do four of
the above, or two new ones if you do three of the above. If nothing new
comes to mind, consider some of these:
- My version does not display precept information since that is
usually unhelpful; include precepts if you have a sensible way to
display them beyond just adding another line for each one.
- The data includes prerequisites and cross-list info but my code
makes no use of those; you might include them.
- I have made a stab at handling non-ASCII characters, but the job
is not complete; you could fix more of them.
- My reg.cgi uses grep and quietly processes regular expressions like
"1[678]th". Is there some way to use REs more effectively without
complicating things for users who don't realize that they exist?
- Allow the output to be sorted in various ways, e.g., by time of day
or by instructor.
- Provide some way to save the information about some course(s) on
the page, perhaps by catching the onMouseDown event.
- Is there some way to make better use of the schedule information?
- The scripts present only the data for the current semester, but
previous semesters are there too. Add a way to make (some or all of?)
that information readily accessible.
- Is there any summary information that might be worth displaying?
- You can integrate other databases if you like; for example,
one very useful registrar site gives current enrollments, though
access to it is easier through
this link.
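Sorting by time of day needs a numeric key, because times such as "9:00 am" and "1:30 pm" sort in the wrong order as strings. Here is a minimal JavaScript sketch, assuming (hypothetically) that each course is an object with listing and time fields and that start times look like "1:30 pm"; the actual format in the foo_* files may differ, and minutesOf and sortByTime are illustrative names.

```javascript
// Convert a time such as "1:30 pm" or "11:00 AM" to minutes after midnight.
// Returns Infinity for missing or unparseable times so they sort last.
function minutesOf(time) {
  const m = /^\s*(\d{1,2}):(\d{2})\s*([ap])\.?m\.?/i.exec(time || "");
  if (!m) return Infinity;
  let hours = Number(m[1]) % 12;                // "12:xx" becomes hour 0
  if (m[3].toLowerCase() === "p") hours += 12;  // pm adds 12 hours
  return hours * 60 + Number(m[2]);
}

// Return a new array of courses ordered by start time, then by listing.
// Two unparseable times give NaN, which is falsy, so the listing decides.
function sortByTime(courses) {
  return [...courses].sort((a, b) =>
    (minutesOf(a.time) - minutesOf(b.time)) ||
    a.listing.localeCompare(b.listing));
}
```

Sorting by instructor would follow the same pattern with a different key function.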
The primary goal is functionality, with esthetics a secondary but still
relevant consideration. We will expect to see three or four of the
numbered features above and one or two additional features of your own,
for a total of five. More is ok if you're on a roll.
The directory a5data includes the raw materials
that you need to get started. My unRegistrar code includes several Awk
scripts that convert the registrar's information into nicer form, the
CGI script reg.cgi that searches it, and the HTML file
reg.html that includes the basic tooltip and Ajax code. The
files foo_* contain processed data from the Registrar's web
site for Spring 2008. The script
get_all.awk explains briefly what each
one contains. These are all in a tar file
a5data.tar
that you can download.
Advice
To get started,
- Register to use the
campuscgi facility
if you have not already done so. (You can use your own web server
if you are running one on a machine of your own.)
- Download the tar file a5data.tar
and extract it into a subdirectory, for example
/usr/campuscgi/your_netid/a5 if you use campuscgi. You
should keep all the files in this one directory so you can create a
submission from them when you're ready.
- Important: Make sure it works for you in its current form
before you start modifying it. This has been tested on campuscgi but
slipups are always possible.
You're welcome to ignore my code entirely, and you can use
any tools and languages that get the job done. The assignment is
meant to give you some hands-on experience with Ajax and Javascript, but
not to take a huge amount of time, so don't kill yourself.
Firefox's Javascript Console (on the Tools menu) is very helpful for
debugging Javascript code; the Firebug add-in is much better. I have
been unable to make my code work properly with Internet Explorer 6
(partly but not entirely because of CRLF issues), and I see no reason
why you should waste your time on IE either, so focus on Firefox.
Safari would be nice but not if it takes extra work.
Here are some other hints:
- You have to run get_all to make a reg.txt; the latter is not part
of the distribution.
- Make sure that permissions in your campuscgi directory permit an
ordinary user to at least run your code and access your files; the
server is probably running your scripts as user "none", not as you.
- The campuscgi machine runs SunOS, not Linux; that's what's running when
you run the script(s) via reg.html. This means that what you managed to
get running when logged in to hats or arizona is not necessarily what's
running via the browser. I got bitten by this myself by using a
search path in reg.cgi that was fine for Linux but not for Solaris.
- You do not have to use awk! It's good for some things but it is not
the most expressive language in the world and it has some surprising
behaviors. I wrote my partial solution in Perl, based on example code
presented in class. You could even write in Java, which appears to work
fine, and will be even more familiar.
- Print statements are your friend! Those residual echo statements
in reg.cgi and the commented-out prints in various scripts are examples.
When you're working in an unfamiliar environment with an unfamiliar
language and unfamiliar tools, verifying each step by printing input and
output is much more efficient than beating your head against the wall.
Work your way through one line at a time if necessary: print what came
in and what went out, to see if they are correct. (That's how I
ultimately figured out my search path problems, though it took longer
than it should have because I forgot this cardinal principle.)
- This is not meant to be a time-consuming exercise, nor is a lot of
code necessary if you think clearly and cut the right kinds of corners.
My Perl script is under 20 lines long aside from some static data
structures, and it's very mundane. In hindsight I could have done it
nearly as easily in Awk, though the latter doesn't have an explicit
case-insensitive RE match. An example of corner-cutting: the static
data structures were created by a text editor; there's no need to write
code to create them since they don't change. I made a few simple
changes to get_all.awk but it stayed about the same size. (And I didn't do a
fifth feature; that might add another few lines.)
- Here's a useful awk feature, using FILENAME to
select what actions to perform on different input files:
awk '
FILENAME == "distcode.txt" {
    # action done only on lines in distcode.txt
}
FILENAME == "reg.txt" {
    # action done only on lines in reg.txt
}
' distcode.txt reg.txt
This lets you use the implicit input loop rather than explicit getlines.
- Another useful feature that might help pass in a query string to an
awk program:
awk -v qs="$q1" -f whatever.awk reg.txt
The -v argument (of which there may be more than one) sets an awk variable
to a value before the awk program begins execution.
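Before the query reaches awk or grep, it has to survive the trip from the browser to reg.cgi: the Ajax code should URL-encode it, and the CGI script should decode it before passing it along (for instance with -v). The JavaScript sketch below shows both halves for illustration only; the parameter name q and the function names are assumptions, and in a Perl or shell reg.cgi the decoding would be written in that language.

```javascript
// Client side: build the URL that the Ajax request fetches.
// The CGI name comes from the assignment; the parameter name "q" is assumed.
function queryUrl(query) {
  return "reg.cgi?q=" + encodeURIComponent(query);
}

// Server side: recover the query from a CGI QUERY_STRING such as "q=1%3A30+mw".
// Forms encode spaces as "+", so convert those before percent-decoding.
function decodeQuery(queryString, name = "q") {
  for (const pair of queryString.split("&")) {
    const eq = pair.indexOf("=");
    const key = eq < 0 ? pair : pair.slice(0, eq);
    if (key === name) {
      const raw = eq < 0 ? "" : pair.slice(eq + 1);
      return decodeURIComponent(raw.replace(/\+/g, " "));
    }
  }
  return "";
}
```

Handling "+" as well as "%20" keeps the CGI side tolerant of requests from both HTML forms and encodeURIComponent, and a literal "+" typed by the user still arrives intact because encodeURIComponent sends it as "%2B".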
Submission
You must use the names reg.html, reg.cgi, and
get_all for the web page, the cgi script and the code that
creates your data file(s). Your get_all must create the output
file(s) that reg.cgi will read, as mine does.
Create a README file with one paragraph for each feature that you
added so we can see what you had in mind. A few sentences each should
be enough, so it probably won't be over a page long. For example, it
might say
Displays expanded form "Computer Science" when the mouse passes over
'COS' in the one-line display, and similarly for other departments.
Collect all your files (but not the registrar data) into a single tar file:
tar cf a5.tar reg.html reg.cgi get_all README other_files...
Submit with the command
~cos333/bin/submit 5 a5.tar
You should use only relative filenames in these files: no absolute
filenames, since otherwise it will be too hard for us to install
your code and experiment with it. We will expand your a5.tar
into a subdirectory of our own public_html or campuscgi; the directory
will already contain the foo_* files, distcode.txt, bldg.txt, and
deptcode.txt. We will then run your get_all (without any
command-line arguments), load your reg.html into a browser, and
experiment with that. Your code must work in this context. Please
perform the experiment of exporting your package yourself to be sure it
works this way; we don't have the resources to beat each submission into
submission.
We will assess your version primarily on whether it correctly
implements the features requested and the new features you added,
how well it handles interesting queries, and how easy and natural it
seems; esthetics are a secondary concern but not irrelevant.
Please follow the rules on what to submit.
It's a big help if your submission arrives in the right
form, and your programs do exactly what is asked for. For example, if a
file is required, be sure to submit it. If it should be executable,
make sure it is executable. Thanks.
Acknowledgement
Many thanks to Eirik Bakke '08, who provided the initial scripts to
extract raw information from the Registrar's web pages, and helped me to
make sense of the data.