Princeton University
Computer Science Department

Computer Science 461
Computer Networks

Jennifer Rexford

Spring 2006





Programming Assignment #1 - FTP Directory Copy

Due: March 6, 2006, 9:00pm

submit by email to mhw@cs.princeton.edu


The Problem

The company you work for, Monolith Megasystems, has decided to write a more intelligent install script for the next version of their application. Among other things, the script will be able to determine the install configuration and copy the needed files directly from Monolith Megasystems' FTP server on the Internet. The files are organized in flat directories partitioned by the desired architecture and operating system.

You have been assigned to write a utility that will copy files of a certain type (denoted by their extension) from an entire directory and its subdirectories (up to a certain level, more on this later) on Monolith's server to the client machine. (Yes, we know there are many better ways to do this than rewriting the FTP client, but your boss is new to Unix and doesn't know any better.) You have been given the following specifications by the implementor of the install script:

  1. Your program should take up to five command-line arguments (the last two are optional) in the following order:
    1. The extension of the files to retrieve (e.g. exe or gz).
    2. The number of directory levels to search (0 means look only in the root level; 2 means look in the root and the first 2 levels below it).
    3. The symbolic name or dotted-quad IP address of the FTP server to contact. If the argument is of the form number@host, then number should be used as the port number of the FTP server; otherwise, the default FTP service port should be used.
    4. The directory at the local site to transfer into [optional; defaults to the current directory].
    5. The directory at the FTP site to begin the transfer from [optional; defaults to the login directory]. If only four arguments are given, assume the last one is the local directory, not the remote one.
  2. Your program should transfer all files as type image, using file structure and stream transmission mode. (These modes and how to change them are defined in the FTP specification).
  3. Your program should NOT ignore subdirectories. In other words, you need to recursively copy subdirectories of the directory you are transferring, but only down to the level of the hierarchy specified on the command line, if it exists. This is best explained by an example. Suppose the directory structure you are copying is as shown below. (All names starting with a capital D are directories; "D1" is an empty subdirectory of the "srcdir" directory.)
    srcdir
    |
    |-f1.gz
    |
    |-g1.exe
    |
    |-D1
    |
    |-D2
    | |
    | |-D4
    |   |
    |   |-f10.txt
    |   |
    |   |-D7
    |     |
    |     |-f11.gz
    |
    |-D5
    | |
    | |-D8
    |   |
    |   |-f8.Z
    |   
    |-D6
      |
      |-f2.gz
      |
      |-D3
      | |
      | |-h1.Z
      | 
      |
      |-f3.gz
    
    
    The following must be the directory structure after copying the gz files and subdirectories down to the second level (command line: ftpcopy gz 2 ftp.monolith.com dstdir srcdir):
    dstdir
    |
    |-f1.gz
    |
    |-D6
      |
      |-f2.gz
      |
      |-f3.gz
    
    
    In other words, the specifications with regards to files and subdirectories are:
    1. If a filename does not terminate in the specified ending, do not copy it. For practical purposes it is as if that file did not exist. Also do not create a null-size file with the same name.
    2. If an empty subdirectory exists at or below the level specified on the command line (e.g. "D1" or "D8" above), do not copy it (i.e. you must not change the state of the file system at all). An "empty" directory is one where neither it nor any of its subdirectories (up to the level given by the second command-line argument) contains a file matching the extension given by the first command-line argument. Because the level counts downward from the root, "below the level" here means at a smaller depth, i.e. closer to the root of the tree. Also do not create a null-size file with the same name. For this reason, do not create "D2", "D4" or "D5". Note that in the above example the contents of "srcdir" are at depth 0, the contents of "D2" at depth 1, the contents of "D4" at depth 2, and the contents of "D7" at depth 3. Hence the contents of "D7" and below are not considered by the ftpcopy program when run to depth 2.
    3. If a non-empty subdirectory exists below or at the level specified at the command line, copy it and all the matching files in it (if any) to the destination machine.
    4. At the last level, if there are subdirectories (e.g. "D7"), these subdirectories are considered "empty": they should not be copied, and no null-size files with the same names should be created.
  4. Your utility should ignore any symbolic links it encounters.
  5. True to the name, the software from Monolith Megasystems is huge. For customers installing the full application, your utility may be expected to transfer up to half a gigabyte of data. Performance should be a key factor in your design decisions.
  6. All files should be created with the permissions S_IREAD and S_IWRITE. You may assume that the local directory already exists; exit with an error if it does not. You should overwrite any files which already exist without prompting. (WARNING: Make sure that you are running the client in a directory where there are no useful files that may be overwritten!)
  7. If you encounter a file which you cannot access due to permissions, you should skip over it without creating it locally or producing any error message. Do not create a null-size file with the same name.
  8. Your program should terminate on any unexpected response across the FTP control connection, any network error across either connection, or a file write error. In this case, you should issue an error message at termination in the format specified below. On any premature termination, you can assume that the install script knows how to restore the state of the file system. Your program is not expected to remove files from partially completed transfers.
  9. Your program should print nothing but a one-line status report at termination. This report should be written to stderr, and should contain:
    1. The message: "OK: xxxxxxxxx bytes copied" on success (where xxxxxxxxx is the total of all bytes copied, without leading zeros or spaces)
    2. The message: "ERROR: " followed by some meaningful error message on any error. If you received an error response across the FTP control connection, using that message is acceptable. If you are terminating for any other reason, you should write a meaningful error message. (Hint: look at the perror function, the errno variable, and the sys_errlist variable).
    For example:
    OK: 45614 bytes copied
    ERROR: 530 Login incorrect.
    ERROR: write: Disk full.
  10. If your program terminates successfully, it should return a result of 0. Otherwise, it should return a result of 1. (See the exit() system call).
  11. Your program should follow the conventions of anonymous ftp. This means logging in as the user "anonymous" and sending an e-mail address as the password. For the e-mail address, use the current login name, the '@' sign, and the fully qualified name of the host on which you are running. For example, "bigboss@monolith.megasys.com".
  12. Your deliverables will be the source code for the utility (including a makefile) and a concise write-up describing your program's design and any caveats (such as functionality you weren't able to complete or that doesn't work quite right).
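
As an illustration of the server argument in item 1 above, the following is a minimal sketch of how the number@host form might be split. The helper name parse_server_arg and the hard-coded default of 21 are assumptions made for this example, not part of the supplied skeleton; a real solution could instead look up the default port with getservbyname().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_FTP_PORT 21	/* assumed default; the well-known FTP control port */

/* Hypothetical helper: splits "number@host" or plain "host".
 * On success returns 0, stores the port in *port and copies the host
 * name into host. Returns -1 if the argument is malformed or too long. */
int parse_server_arg(const char *arg, char *host, size_t hostSize, int *port)
{
	const char *at, *hostPart;
	char *end;
	long value;

	assert(arg != NULL && host != NULL && port != NULL);

	at = strchr(arg, '@');
	if (at == NULL) {
		hostPart = arg;			/* no port given */
		*port = DEFAULT_FTP_PORT;
	} else {
		value = strtol(arg, &end, 10);
		if (end == arg || end != at || value < 1 || value > 65535)
			return -1;		/* empty, non-numeric or out-of-range port */
		hostPart = at + 1;
		*port = (int)value;
	}
	if (*hostPart == '\0' || strlen(hostPart) >= hostSize)
		return -1;			/* missing host or buffer too small */
	strcpy(host, hostPart);
	return 0;
}
```

The host string returned here can be either a symbolic name or a dotted quad; resolving it to an address is a separate step.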

Doing the Assignment

To get you started, we've created a tarfile containing a skeletal .c file and a Makefile; download it here. You should work within this directory, as we provide a make target (make dist) to create the final tarfile for submission.

Before you start writing the assignment, you will have to familiarize yourself with both the Berkeley socket API and the FTP protocol.

The FTP protocol is specified completely in RFC 959. It is summarized in Chapter 27 of Stevens, TCP/IP Illustrated, Volume 1: The Protocols. A short summary is also provided below.

WARNING: As mentioned before, you absolutely should *NOT* run tests in the same directory as your program. You are likely to accidentally destroy your source files. It is recommended that you back up your files frequently in another directory.

You may also, of course, test your program by connecting to anonymous FTP servers. Some possible sites to test against are ftp.cs.princeton.edu, ftp.fedworld.gov, ftp.microsoft.com, ftp.cs.stanford.edu, gatekeeper.dec.com, ftp.kernel.org. Of course, you can choose any other anonymous FTP site too. We strongly recommend that you do not all use only one of these servers, since together you might overload it and slow the response time for everybody. One approach is to start with a server chosen at random and move to the next one each time you test your program.

You will need to understand the basic socket calls to do this assignment. You can also read about the following calls, either with man or in the Unix Network Programming book by Stevens:

To make things simple, assume the shell always sets the environment variables USER and HOST. You can use the call getenv to get their values. (As with any library/system call, you should still remember to check the return value from this call, though, to make sure this assumption holds!) Use these values to build the password string as described above.
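
A minimal sketch of building the password string from these two variables might look as follows. The helper name build_anonymous_password is an assumption made for this example; it simply reports failure when either variable is missing rather than guessing a value.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes "user@host" into buf using the USER and
 * HOST environment variables. Returns 0 on success, or -1 if either
 * variable is unset or empty, or if buf is too small. */
int build_anonymous_password(char *buf, size_t bufSize)
{
	const char *user = getenv("USER");
	const char *host = getenv("HOST");

	assert(buf != NULL);
	if (user == NULL || host == NULL || *user == '\0' || *host == '\0')
		return -1;	/* the assumption about the shell did not hold */
	if (strlen(user) + 1 + strlen(host) + 1 > bufSize)
		return -1;	/* user + '@' + host + '\0' would not fit */
	snprintf(buf, bufSize, "%s@%s", user, host);
	return 0;
}
```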

Be careful where you run the supplied ftpd. Since it allows anonymous transfers with basically no security, anyone can use it to get at whatever files are in its current directory. Your fellow students could even use this to get at your solution to this assignment! 


Summary of the FTP protocol

An FTP session consists of a control connection and one or more data connections. All actions are initiated by the client over the control connection. This model is defined in section 2.3 of RFC 959.

The control connection is a normal TCP connection (created by the client with the socket() and connect() calls). The client sends commands over this connection in a format called NVT ASCII. NVT ASCII is defined in the TELNET RFC (RFC 854). For the purposes of this assignment, it is enough to know that NVT ASCII is normal ASCII in which each line is terminated with the CR and LF characters (the string "\r\n" in C). In this assignment you can assume that no NVT string will be longer than 1024 bytes.
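
For illustration, a command line in this format could be assembled as in the sketch below. The helper name format_command is hypothetical, and the sketch assumes that the 1024-byte limit includes the terminating CR and LF. The resulting bytes would then be written to the control socket.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1024	/* the assignment's bound on an NVT ASCII line */

/* Hypothetical helper: builds "VERB arg\r\n" (or "VERB\r\n" when arg is
 * NULL) in buf, which must hold at least MAX_LINE + 1 bytes.
 * Returns the length of the line, or -1 if it would not fit or if arg
 * itself contains a CR or LF (which would split the command in two). */
int format_command(char *buf, const char *verb, const char *arg)
{
	int n;

	assert(buf != NULL && verb != NULL);
	if (arg != NULL && strpbrk(arg, "\r\n") != NULL)
		return -1;
	if (arg != NULL)
		n = snprintf(buf, MAX_LINE + 1, "%s %s\r\n", verb, arg);
	else
		n = snprintf(buf, MAX_LINE + 1, "%s\r\n", verb);
	if (n < 0 || n > MAX_LINE)
		return -1;	/* output error or line too long */
	return n;
}
```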

Each command is a 3 or 4 character string, followed by optional parameters separated by spaces. All of the commands are specified in RFC 959, section 4.1. For this assignment, a good starting point is to read about the following commands:

In response to each command, the server will respond with one or more replies. Each reply is prefaced by a three-digit reply code. Section 4.2 of RFC 959 specifies the reply format and how multi-line replies are handled. Your program will have to parse these replies.

In addition to the reply codes, the LIST and RETR commands send back data over a data connection. This connection is created in the following manner:

  1. The client creates another socket with socket()
  2. The client binds an address to the socket with bind(). A port address of 0 is specified to obtain an ephemeral port.
  3. The client calls listen() to indicate willingness to accept a connection to the new socket.
  4. The client calls getsockname() to determine the port number of the socket.
  5. The client sends a PORT command to the server over the control connection to inform the server of the port number.
  6. The client sends the appropriate RETR or LIST command over the control connection.
  7. The server connects to the new socket.
  8. The client calls accept() to accept the server's connection.
At this point, the data connection has been established and the server will send the requested data over the connection. In the case of LIST, the returned data will be the list of filenames in NVT ASCII. In the case of RETR, the returned data will be the file contents. The server will automatically close the connection from its end once all of the data has been sent. This will be indicated by a return value of 0 from a read on the data connection's socket descriptor.
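
The client-side steps above (creating the socket, binding to an ephemeral port, listening, and reading the port back) and the PORT command itself can be sketched roughly as follows. Both function names are hypothetical. The sketch also assumes that the IP address placed in the PORT command is obtained separately, for example by calling getsockname() on the control connection, because a socket bound to INADDR_ANY reports the address 0.0.0.0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening socket bound to an ephemeral
 * port and fills *addr (including the chosen port), or -1 on failure. */
int open_data_listener(struct sockaddr_in *addr)
{
	socklen_t len = sizeof *addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(addr, 0, sizeof *addr);
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_ANY);
	addr->sin_port = htons(0);	/* 0 = let the kernel pick a port */
	if (bind(fd, (struct sockaddr *)addr, sizeof *addr) < 0 ||
	    listen(fd, 1) < 0 ||
	    getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: formats "PORT h1,h2,h3,h4,p1,p2\r\n".
 * ip and port must both be in network byte order.
 * Returns the length of the command, or -1 if buf is too small. */
int format_port_command(char *buf, size_t bufSize, struct in_addr ip, in_port_t port)
{
	const unsigned char *h = (const unsigned char *)&ip.s_addr;
	const unsigned char *p = (const unsigned char *)&port;
	int n;

	assert(buf != NULL);
	n = snprintf(buf, bufSize, "PORT %d,%d,%d,%d,%d,%d\r\n",
	             h[0], h[1], h[2], h[3], p[0], p[1]);
	return (n < 0 || (size_t)n >= bufSize) ? -1 : n;
}
```

Because both values are already in network byte order, the bytes can be emitted in memory order without any further conversion.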

A word on LIST. Unfortunately the output of the LIST command is not standardized. The other option is to use NLST, but some server implementations do not show the subdirectories in response to an NLST command, which makes it difficult to use. Hence, we are requiring the use of LIST for this assignment. Due to the large variety of responses an FTP server may generate when issued the LIST command, you may face difficulty when attempting to parse the output in search of the file/directory name. You are certainly welcome to write such parsing code yourself, though we have provided a slightly modified version of the publicly available ftpparse library here (the minor modifications present in our version allow for clean compilation under gcc). You may read about how to use the ftpparse library at its home page, http://cr.yp.to/ftpparse.html. Below is some source code you can use to easily manipulate the result of a LIST command:

#include <string.h>
#include "ftpparse.h"

#define kTelnetEOF 	"\r\n"

/* The function name is only illustrative. 'listResponse' must contain the
 * server's complete LIST response, terminated with a '\0'. The buffer is
 * modified in place: the '\r' ending each line is overwritten with a '\0'. */
void parseListResponse(char *listResponse)
{
	char *curPtr, *endPtr;
	struct ftpparse ftpInfo;

	curPtr = listResponse;
	for (endPtr = strstr(curPtr, kTelnetEOF); endPtr != NULL; 
	     endPtr = strstr(curPtr, kTelnetEOF))
	{
		*endPtr = '\0'; /* 'curPtr' now points to a C-string representing one line of the response */
		if (ftpparse(&ftpInfo, curPtr, strlen(curPtr)) == 1)
		{
			/* A wealth of information may be extracted regarding the directory entry here by
			 * examining the fields of ftpInfo (see ftpparse.h).
			 * 'size' = entry size in bytes (relevant if 'sizetype' == FTPPARSE_SIZE_BINARY)
			 * For ['flagtrycwd', 'flagtryretr'] =
			 *      - [0, 1]: most likely a file
			 *      - [1, 0]: most likely a directory
			 *      - [1, 1]: most likely a link (could be either a file or a directory)
			 *      - [0, 0]: unknown
			 * 'name' points to a character array of length 'namelen' within the string pointed to
			 *      by 'curPtr'; it is not necessarily terminated at 'namelen' characters.
			 */
		}
		curPtr = endPtr + strlen(kTelnetEOF);	/* Advance to start of next line */
	}
}
We expect that your program will work with any FTP server software supported by this library (even if you choose to write your own parser). You will not be held responsible for other abnormal sorts of LIST responses (in particular, you do not have to support FTP server software that is not supported by the ftpparse library).
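
Because the 'name' field comes with an explicit length rather than as a separate C-string, the extension check is easiest to write in a length-aware way. The following is a minimal sketch; the helper name has_extension is hypothetical, and it assumes that a match requires a literal '.' before the extension and that the comparison is case-sensitive.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first namelen bytes of name end
 * in "." followed by ext, and 0 otherwise. name need not be
 * NUL-terminated at namelen bytes. */
int has_extension(const char *name, size_t namelen, const char *ext)
{
	size_t extlen;

	assert(name != NULL && ext != NULL);
	extlen = strlen(ext);
	if (extlen == 0 || namelen < extlen + 1)
		return 0;	/* no room for the dot plus the extension */
	return name[namelen - extlen - 1] == '.' &&
	       memcmp(name + namelen - extlen, ext, extlen) == 0;
}
```

With the ftpparse fields described above, a call might look like has_extension(ftpInfo.name, (size_t)ftpInfo.namelen, "gz").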

Miscellaneous Notes and Hints

Assignment FAQ and Grading Guidelines

A FAQ and the grading guidelines are also available. Please look over them before asking the TAs a question.

Deliverables

Your submission is expected to be a tarfile of your source directory (including the directory!). It should include your README, your source files and a Makefile which creates your program. Be sure your submission conforms to the following: Submit your assignment by email to mhw@cs.princeton.edu, by 9:00pm on March 1st, 2006.
