As in the last assignment, your task is to build a basic web proxy capable of accepting HTTP requests, making requests to remote servers, and returning data to a client. Unlike before, your proxy should be able to accept multiple client requests concurrently. Your proxy should achieve concurrency by using the pthread library to spawn a new thread for each new client request. There should be a reasonable cap on the number of threads your proxy can create (e.g., 100).
Your proxy should also be able to cache web pages on disk, and all subsequent requests for the same page should be served from the cache. When storing web pages on disk, you should ensure that a) the file name does not contain "/", b) the file name is less than 255 characters long, and c) the cache files are created within the working directory of your proxy (e.g., in some subfolder). You are allowed to use external libraries (e.g., openssl for hashing) in this assignment, but you should ensure that you only link libraries available on the Friend 010 machines, and you should include your makefiles with your submission.
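One way to enforce the thread cap is a counting semaphore that starts at the cap and is decremented before every pthread_create. The sketch below is only an illustration under assumptions: the names limiter_init, spawn_client_thread, handle_client and clients_served are hypothetical, and handle_client is a placeholder that just closes the socket and counts the call.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_THREADS 100            /* the example cap mentioned above */

static sem_t thread_slots;         /* counts free thread slots */
static pthread_mutex_t served_lock = PTHREAD_MUTEX_INITIALIZER;
static int clients_served;         /* only here so the sketch has a visible effect */

/* Hypothetical placeholder: a real proxy would read the request and reply. */
void handle_client(int client_fd)
{
    close(client_fd);
    pthread_mutex_lock(&served_lock);
    clients_served++;
    pthread_mutex_unlock(&served_lock);
}

static void *client_thread(void *arg)
{
    int client_fd = *(int *)arg;
    free(arg);
    handle_client(client_fd);
    sem_post(&thread_slots);       /* give the slot back */
    return NULL;
}

/* Call once before accepting connections. Returns 0 on success. */
int limiter_init(void)
{
    return sem_init(&thread_slots, 0, MAX_THREADS);
}

/* Blocks while MAX_THREADS clients are in flight, then starts a detached
 * thread for client_fd. Returns 0 on success, -1 on failure. */
int spawn_client_thread(int client_fd)
{
    int *arg = malloc(sizeof *arg);
    if (arg == NULL)
        return -1;
    *arg = client_fd;

    while (sem_wait(&thread_slots) == -1) {
        if (errno != EINTR) {      /* retry only if interrupted by a signal */
            free(arg);
            return -1;
        }
    }

    pthread_t tid;
    if (pthread_create(&tid, NULL, client_thread, arg) != 0) {
        sem_post(&thread_slots);
        free(arg);
        return -1;
    }
    pthread_detach(tid);           /* nobody joins per-client threads */
    return 0;
}
```

In an accept loop you would call spawn_client_thread with each descriptor returned by accept. Blocking in sem_wait means new connections wait in the listen backlog while the cap is reached; rejecting them instead would be an equally reasonable choice.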
If you want, you can implement other optimizations, such as handling persistent connections from a client (see HTTP's Keep-Alive instructions) or creating a thread pool for faster processing. A thread pool starts up by creating some fixed number of threads on bootup (say, 20). Then, when receiving a new request, it hands off the request to one of the existing threads, removing it from the pool. (If none is available, which indicates a higher degree of concurrency than the pool was sized for, it can create a new one.) After it finishes executing a request, the thread is returned to the pool for future requests. Apache and most servers that adopt a multi-threaded style use such pools for lower latency and system load. But again, these optimizations are optional.
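As a rough illustration of the optional thread pool, here is a minimal sketch. It is a common variant rather than exactly the scheme described above: idle workers wait on a shared queue of client descriptors, and when all workers are busy the request waits in the queue instead of triggering a new thread. All names (pool_start, pool_submit, pool_stop, count_job) and both sizes are assumptions made for the example.

```c
#include <pthread.h>

#define POOL_THREADS 4             /* assumed size; the text suggests e.g. 20 */
#define QUEUE_CAP    64            /* assumed bound on waiting requests */

typedef void (*job_fn)(int client_fd);

static int queue[QUEUE_CAP];       /* ring buffer of client descriptors */
static int q_head, q_len, stopping;
static job_fn handler;
static pthread_t workers[POOL_THREADS];
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t q_nonfull = PTHREAD_COND_INITIALIZER;

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static int jobs_done;

/* Hypothetical demo handler: only counts jobs; a real one serves the client. */
void count_job(int client_fd)
{
    (void)client_fd;
    pthread_mutex_lock(&done_lock);
    jobs_done++;
    pthread_mutex_unlock(&done_lock);
}

static void *worker_main(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_len == 0 && !stopping)
            pthread_cond_wait(&q_nonempty, &q_lock);
        if (q_len == 0) {          /* stopping and the queue is drained */
            pthread_mutex_unlock(&q_lock);
            return NULL;
        }
        int fd = queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_len--;
        pthread_cond_signal(&q_nonfull);
        pthread_mutex_unlock(&q_lock);
        handler(fd);               /* run the job outside the lock */
    }
}

/* Creates the workers once at bootup. Returns 0 on success, -1 on failure. */
int pool_start(job_fn fn)
{
    handler = fn;
    for (int i = 0; i < POOL_THREADS; i++)
        if (pthread_create(&workers[i], NULL, worker_main, NULL) != 0)
            return -1;
    return 0;
}

/* Hands a request to the pool; blocks while the queue is full. */
void pool_submit(int client_fd)
{
    pthread_mutex_lock(&q_lock);
    while (q_len == QUEUE_CAP)
        pthread_cond_wait(&q_nonfull, &q_lock);
    queue[(q_head + q_len) % QUEUE_CAP] = client_fd;
    q_len++;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}

/* Lets the workers finish queued jobs, then joins them. */
void pool_stop(void)
{
    pthread_mutex_lock(&q_lock);
    stopping = 1;
    pthread_cond_broadcast(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(workers[i], NULL);
}
```

A proxy built this way would call pool_start once with its real request handler and pool_submit from the accept loop. The sketch assumes pool_start succeeded before pool_stop is called.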
This assignment can be completed in either C or C++. It should compile and run (using g++) without errors or warnings on the Friend 010 machines, producing a binary called proxy that takes as an argument a port to listen on. Don't use a hard-coded port number (e.g., port 80). As before, you shouldn't assume that your server will be running on a particular IP address, or that clients will be coming from a pre-determined IP.
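The port and address requirements can be sketched as follows, assuming IPv4 only. The helper names parse_port and open_listener are hypothetical. Binding to INADDR_ANY is what avoids tying the proxy to a particular local IP address.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Converts a command-line argument to a port number in 1..65535.
 * Returns -1 if the string is not a valid decimal port. */
int parse_port(const char *s)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 1 || v > 65535)
        return -1;
    return (int)v;
}

/* Opens a TCP socket listening on the given port on all local addresses.
 * Returns the listening descriptor, or -1 on failure. */
int open_listener(int port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;                   /* allow quick restarts on the same port */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* no fixed local IP */
    addr.sin_port = htons((uint16_t)port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A main function would typically print a usage message and exit when parse_port returns -1, and otherwise pass the result to open_listener.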
After determining which web object is being requested (as named by the object's full URL), you should check to see if this object is already cached on the server. If so, you should return the content from the cache. For simplicity, you do not need to implement proper HTTP expiry: you can simply clear your cache on bootup but cache objects indefinitely while the server is alive. You similarly do not need to support conditional GETs (e.g., "If-Modified-Since") to the remote origin server. If desired, however, you can support real cache expiry.
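The cache lookup needs a mapping from a full URL to a file name that is short and contains no "/". A minimal sketch of one such mapping follows, under stated assumptions: it uses the non-cryptographic FNV-1a hash in place of an openssl digest so that it has no library dependency, the directory name "cache" is hypothetical, and the helpers fnv1a64, cache_path_for_url and cache_open are made up for the example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define CACHE_DIR "cache"   /* hypothetical subfolder of the working directory */

/* 64-bit FNV-1a hash of a NUL-terminated string. */
uint64_t fnv1a64(const char *s)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Writes CACHE_DIR "/" followed by 16 hex digits into out.
 * Returns 0 on success, -1 if out is too small. */
int cache_path_for_url(const char *url, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, CACHE_DIR "/%016" PRIx64, fnv1a64(url));
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Returns an open stream on a cache hit, or NULL on a miss. */
FILE *cache_open(const char *url)
{
    char path[64];
    if (cache_path_for_url(url, path, sizeof path) != 0)
        return NULL;
    return fopen(path, "rb");
}
```

Because the file name is always 16 hexadecimal digits, it satisfies the length and "/" constraints for any URL. Two different URLs can collide under a 64-bit hash, so a more careful design might store the URL inside the file and compare it on lookup, or use a cryptographic digest.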
After downloading a web object successfully, you should cache the object to disk so that subsequent fetches can use the local copy as opposed to fetching it again remotely. You should not cache the item if it is marked as "no-cache" or "private"; see the RFC. For this assignment, you only need to cache objects for requests that return status 200 (OK); you do not need to worry about other cacheable status codes such as 410 (Gone). Reading from cache need not be thread-safe, but writing to cache should be. If multiple threads simultaneously detect a cache miss and fetch the same content from the Internet, then it's OK if only one thread writes to cache and others serve the content from the Internet.
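The two rules in this paragraph can be sketched as below. This is one possible approach, not a required design. The function is_cacheable assumes you have already extracted the value of the Cache-Control response header (NULL if absent) and does a deliberately conservative substring match. The function cache_store makes writes safe without a lock by writing to a private temporary file and then using rename, which replaces the target atomically on POSIX systems. Both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L    /* for mkstemp under strict -std modes */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/types.h>
#include <unistd.h>

/* Case-insensitive substring search. */
static int contains_ci(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay != '\0'; hay++)
        if (strncasecmp(hay, needle, n) == 0)
            return 1;
    return 0;
}

/* Returns 1 if a response with this status and Cache-Control value
 * (NULL when the header is absent) may be stored, 0 otherwise. */
int is_cacheable(int status, const char *cache_control)
{
    if (status != 200)
        return 0;
    if (cache_control == NULL)
        return 1;
    return !contains_ci(cache_control, "no-cache") &&
           !contains_ci(cache_control, "private");
}

/* Writes len bytes to a temporary file next to final_path, then renames it
 * into place. Returns 0 on success, -1 on failure. */
int cache_store(const char *final_path, const void *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", final_path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = mkstemp(tmp);         /* unique name per writer */
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    if (close(fd) != 0 || rename(tmp, final_path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

With this scheme, threads that miss on the same URL each write their own temporary file and the last rename wins, so a reader never sees a partially written object. Guarding the write with a pthread mutex would also satisfy the requirement.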
Run your proxy with the following command: ./proxy -t <port>, where port is the port number that the proxy should listen on. The argument -t specifies that the proxy should run in multi-threaded mode. As a basic test of functionality, try requesting a page using telnet concurrently from two different shells.
Instructions for setting up your browser to access your proxy can be found in the instructions of the previous assignment.
In addition to the Berkeley sockets library, there are some functions you will need to use for this assignment: pthread_create, pthread_exit, etc. You can find the details of these functions in the Unix man pages: man pthread
You should submit your completed proxy by the date posted on the course website to Blackboard. Remember to submit after uploading. You will need to submit a tarball file containing the following:
You can complete the assignment in either C or C++. Your tarball should be named cos461_ass3_USERNAME.tgz, where USERNAME is your username. The sample Makefile in the skeleton zip file we provide will make this tarball for you with the make tar command.
Your proxy will be graded out of ten points, with the following criteria:
When we run make on your assignment, it should compile without errors or warnings on the FC 010 cluster machines and produce a binary named proxy.
Writing code that will interact with other programs on the Internet is a little different than just writing something for your own use. The general guideline often given for network programs is: be lenient about what you accept, but strict about what you send. This is often referred to as Postel's Law. That is, even if a client doesn't do exactly the right thing, you should make a best effort to process their request if it is possible to easily figure out their intent. On the other hand, you should ensure that anything that you send out conforms to the published protocols as closely as possible. If an incoming request has a single field out of whack (such as sending you a request using HTTP 0.9 or 1.1), uses non-standard line terminators (some clients only send \r instead of the standard \r\n), or does something you don't quite expect with HTTP headers, you should still handle the request rather than dropping the request. Pay attention to parts of the RFC that specify areas where not all clients may conform exactly to what you expect. We'll be looking for this kind of interoperability in both the second round of tests that we run and in the style portion of your grade.
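As one small example of being lenient about what you accept, the sketch below splits a request buffer into lines while tolerating "\r\n", a bare "\n", or a bare "\r" as the terminator. The function name split_line is hypothetical, and the sketch assumes the whole header block is already in the buffer, so a "\r" in the last position is treated as a complete terminator.

```c
#include <stddef.h>

/* Finds the first line in buf[0..len).
 * Returns the line length without its terminator and stores the offset of
 * the following line in *next, or returns -1 if no terminator is present. */
long split_line(const char *buf, size_t len, size_t *next)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {                  /* bare LF */
            *next = i + 1;
            return (long)i;
        }
        if (buf[i] == '\r') {                  /* CRLF or bare CR */
            if (i + 1 < len && buf[i + 1] == '\n')
                *next = i + 2;
            else
                *next = i + 1;
            return (long)i;
        }
    }
    return -1;
}
```

Calling split_line repeatedly, starting each time at the previous *next, walks the request line and headers. A returned length of 0 marks the empty line that ends the header block.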
When in doubt, try to follow the behavior specified in RFC 1945. Also, check the FAQ for more specific guidelines.
Last updated: Mon Apr 26 13:04:41 -0400 2010