COS-518 Assignments: HTTP Proxy


Assignment 2: HTTP Proxy with Fork and Thread Support

In the previous assignment, you were asked to implement a caching web proxy that was able to handle a single request at a time. In this assignment, you should extend this proxy to be able to handle concurrent clients through the use of multiple threads of execution, one per client request.

The Basics

As before, your task is to build a basic web proxy capable of accepting HTTP requests, making requests to remote servers, caching results, and returning data to a client. Unlike before, your proxy should be able to accept multiple client requests concurrently. You must implement two different versions of the proxy: one that achieves concurrency by forking a new process for each new client request, and one that uses the pthread library to spawn a new thread for each new client request.
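As a rough illustration of the two concurrency styles described above, the following sketch shows one way to hand an accepted connection either to a forked child or to a detached pthread. It is not the required design. The names handle_client, dispatch_fork, dispatch_thread, and serve are hypothetical, and handle_client only sends a fixed reply where the real request parsing, fetching, and caching would go.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real proxy logic: sends a fixed reply. */
void handle_client(int client_fd)
{
    const char *reply = "HTTP/1.0 501 Not Implemented\r\n\r\n";
    (void)send(client_fd, reply, strlen(reply), 0);
}

/* Multi-process mode: fork one child per connection.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t dispatch_fork(int listen_fd, int client_fd)
{
    assert(client_fd >= 0);
    pid_t pid = fork();
    if (pid == 0) {
        if (listen_fd >= 0)
            close(listen_fd);   /* the child never calls accept() */
        handle_client(client_fd);
        close(client_fd);
        _exit(0);
    }
    close(client_fd);           /* the parent no longer needs its copy */
    return pid;
}

/* Thread entry point: takes ownership of the heap-allocated descriptor. */
static void *thread_main(void *arg)
{
    int client_fd = *(int *)arg;
    free(arg);
    handle_client(client_fd);
    close(client_fd);
    return NULL;
}

/* Multi-threaded mode: spawn one detached thread per connection.
 * Returns 0 on success, -1 on failure (the descriptor is closed). */
int dispatch_thread(int client_fd)
{
    assert(client_fd >= 0);
    int *arg = (int *)malloc(sizeof *arg);
    if (arg == NULL) {
        close(client_fd);
        return -1;
    }
    *arg = client_fd;
    pthread_t tid;
    if (pthread_create(&tid, NULL, thread_main, arg) != 0) {
        free(arg);
        close(client_fd);
        return -1;
    }
    pthread_detach(tid);        /* nobody joins; resources free on exit */
    return 0;
}

/* Accept loop shared by both modes. */
void serve(int listen_fd, int use_threads)
{
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;           /* a real proxy would inspect errno here */
        if (use_threads)
            dispatch_thread(client_fd);
        else
            dispatch_fork(listen_fd, client_fd);
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                   /* reap finished children (no zombies) */
    }
}
```

One design point to keep in mind: each forked child works on its own copy of the parent's memory, so cache entries added by a child are not visible to other processes, whereas threads share one address space and need locking around any shared cache.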

If you want, you can implement other optimizations, such as handling persistent connections from a client (see HTTP's Keep-Alive instructions) or creating a process or thread pool for faster processing. A process/thread pool creates a fixed number of processes/threads at startup (say, 20). When a new request arrives, the pool hands it off to one of the idle processes/threads, removing that worker from the pool. (If none is available, which indicates a higher degree of concurrency than the pool was sized for, the pool can create a new one.) Once it finishes executing a request, the process/thread returns to the pool for future requests. Apache and most servers that adopt a multi-process or multi-threaded style use such pools to lower latency and system load. Again, these optimizations are optional.
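For the optional thread pool, a minimal sketch might look like the following. It uses a common variant of the idea above: a fixed set of worker threads blocks on a shared queue of client descriptors, and the pool does not grow when all workers are busy. POOL_SIZE, QUEUE_CAP, struct pool, pool_start, pool_submit, and send_stub_reply are all assumptions made for illustration, and the handler is only a placeholder for the real proxy logic.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define POOL_SIZE 4     /* assumed; the text above suggests something like 20 */
#define QUEUE_CAP 64    /* assumed bound on waiting connections */

typedef void (*conn_handler)(int);

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
    int             queue[QUEUE_CAP];   /* circular buffer of client fds */
    int             head, count;
    conn_handler    handler;
};

/* Hypothetical placeholder handler: sends a fixed reply. */
void send_stub_reply(int client_fd)
{
    const char *reply = "HTTP/1.0 501 Not Implemented\r\n\r\n";
    (void)send(client_fd, reply, strlen(reply), 0);
}

/* Each worker loops forever: wait for a descriptor, serve it, repeat. */
static void *worker(void *arg)
{
    struct pool *p = (struct pool *)arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0)
            pthread_cond_wait(&p->not_empty, &p->lock);
        int fd = p->queue[p->head];
        p->head = (p->head + 1) % QUEUE_CAP;
        p->count--;
        pthread_cond_signal(&p->not_full);
        pthread_mutex_unlock(&p->lock);

        p->handler(fd);         /* serve outside the lock */
        close(fd);
    }
    return NULL;
}

/* Create the workers up front. Returns 0 on success, -1 on failure. */
int pool_start(struct pool *p, conn_handler handler)
{
    assert(p != NULL && handler != NULL);
    memset(p, 0, sizeof *p);
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->not_empty, NULL);
    pthread_cond_init(&p->not_full, NULL);
    p->handler = handler;
    for (int i = 0; i < POOL_SIZE; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, worker, p) != 0)
            return -1;
        pthread_detach(tid);
    }
    return 0;
}

/* Called from the accept loop; blocks while the queue is full. */
void pool_submit(struct pool *p, int client_fd)
{
    pthread_mutex_lock(&p->lock);
    while (p->count == QUEUE_CAP)
        pthread_cond_wait(&p->not_full, &p->lock);
    p->queue[(p->head + p->count) % QUEUE_CAP] = client_fd;
    p->count++;
    pthread_cond_signal(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
}
```

With this shape, the accept loop calls pool_start once at startup and then pool_submit for every descriptor returned by accept.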

This assignment can be completed in either C or C++. It should compile and run (using g++) without errors or warnings on the penguin servers, producing a binary called proxy that takes the -p or -t switch as its first argument and the port to listen on as its second. Don't use a hard-coded port number (e.g., port 80). As before, you shouldn't assume that your server will be running on a particular IP address, or that clients will be coming from a pre-determined IP.
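To make the last two requirements concrete, here is a small sketch of a listening socket that takes its port as a parameter and binds to INADDR_ANY, so it is not tied to one local address. The function name open_listener is hypothetical, and setting SO_REUSEADDR is an optional convenience for quick restarts rather than something the assignment asks for.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listening socket on the given port, on all local addresses.
 * Returns the socket descriptor, or -1 on failure. */
int open_listener(int port)
{
    assert(port >= 0 && port <= 65535);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Optional: allow rebinding the port right after a restart. */
    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* no particular local IP */
    addr.sin_port = htons((unsigned short)port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```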

Testing Your Proxy

Run your proxy with the following command:

./proxy [-p|-t] <port>, where port is the port number that the proxy should listen on. The argument -p specifies that the proxy should run in multi-process mode, while -t specifies that the proxy should run in multi-threaded mode. You must implement both. As a basic test of functionality, try requesting a page using telnet concurrently from two different shells.
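One possible way to validate this command line is sketched below. The names parse_args, proxy_mode, MODE_PROCESS, and MODE_THREAD are hypothetical, and rejecting ports outside 1 to 65535 is an assumption about what counts as a usable port.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum proxy_mode { MODE_PROCESS, MODE_THREAD };

/* Parse "./proxy [-p|-t] <port>".
 * Returns 0 and fills *mode and *port on success, -1 on any error. */
int parse_args(int argc, char **argv, enum proxy_mode *mode, int *port)
{
    assert(mode != NULL && port != NULL);

    if (argc != 3)
        return -1;

    if (strcmp(argv[1], "-p") == 0)
        *mode = MODE_PROCESS;
    else if (strcmp(argv[1], "-t") == 0)
        *mode = MODE_THREAD;
    else
        return -1;

    /* strtol lets us reject trailing junk such as "80x". */
    char *end;
    errno = 0;
    long value = strtol(argv[2], &end, 10);
    if (errno != 0 || end == argv[2] || *end != '\0' ||
        value < 1 || value > 65535)
        return -1;

    *port = (int)value;
    return 0;
}
```

A main function would call parse_args(argc, argv, &mode, &port) first and exit with a non-zero status if it returns -1.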

Instructions for setting up your browser to access your proxy can be found in the instructions for the previous assignment.

Download the testing script. (Note: You should use 'Save Target As' in the browser or 'wget' to download this script. Copy and paste may not work, since Python scripts differentiate tabs from spaces.)

Multi-Process/Thread Programming

In addition to the Berkeley sockets library, there are some functions you will need to use for this assignment

You can find the details of these functions in the Unix man pages:



Grading

You should submit your completed proxy by the date posted on the course website. You will need to submit a tarball file containing the following:

Your tarball should be named cos518_proxy_USERNAME.tgz where USERNAME is your username. The sample Makefile in the skeleton zip file we provide will make this tarball for you with the make tar command.

Your proxy will be graded with the following criteria:

  1. When running make on your assignment, it should compile without errors or warnings on the penguin cycle machines and produce a binary named proxy. The first command line argument should be the -p or -t switch; the second should be the port that the proxy will listen on.
  2. Your proxy should run silently: any status messages or diagnostic output should be off by default.
  3. You can complete the assignment in either C or C++.
  4. Your proxy should work with both Firefox and Internet Explorer.
  5. We'll check that your proxy works correctly with a small number of major web pages, using the same script that we've given you to test your proxy.
  6. Well-written (good abstraction, error checking, readability) and well-commented code will get additional points.

A Note on Network Programming

Writing code that will interact with other programs on the Internet is a little different from writing something for your own use. The general guideline for network programs is: be lenient about what you accept, but strict about what you send. This is often referred to as Postel's Law. That is, even if a client doesn't do exactly the right thing, you should make a best effort to process its request if you can easily figure out its intent. On the other hand, you should ensure that anything you send out conforms to the published protocols as closely as possible. If an incoming request has a single field out of whack (such as a request that uses HTTP 0.9 or 1.1), uses non-standard line terminators (some clients only send \r instead of the standard \r\n), or does something you don't quite expect with HTTP headers, you should still handle the request rather than drop it. Pay attention to the parts of the RFC that identify areas where not all clients may conform exactly to what you expect. We'll be looking for this kind of interoperability in both the second round of tests that we run and in the style portion of your grade.
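As one small example of being lenient on input, the sketch below splits a buffer into lines while accepting \r\n, a bare \n, or a bare \r as the terminator. The function name next_line and its return convention are assumptions for illustration. Note one limitation: a \r that is the very last byte received is treated as incomplete, because the function cannot yet tell whether a \n will follow.

```c
#include <assert.h>
#include <stddef.h>

/* Find the first line in buf[0..len).
 * On success, stores the line length (terminator excluded) in *line_len
 * and returns the number of bytes consumed (terminator included).
 * Returns 0 if no complete line is available yet. */
size_t next_line(const char *buf, size_t len, size_t *line_len)
{
    assert(buf != NULL && line_len != NULL);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {           /* bare LF */
            *line_len = i;
            return i + 1;
        }
        if (buf[i] == '\r') {
            if (i + 1 == len)
                return 0;               /* "\r" or "\r\n"? need more data */
            *line_len = i;
            return (buf[i + 1] == '\n') ? i + 2   /* standard CRLF */
                                        : i + 1;  /* bare CR */
        }
    }
    return 0;
}
```

An empty line (a returned line length of 0) marks the end of the request headers, whichever terminator the client used.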

When in doubt, try to follow the behavior specified in RFC 1945. Also, check the FAQ for more specific guidelines.


Last updated: Wed Oct 07 11:09:28 -0400 2009