Programming Assignment #1 - FTP Directory Copy
Due : March 6, 2006, 9:00pm
submit by email to mhw@cs.princeton.edu
The Problem
The company you work for, Monolith Megasystems, has decided to write a more
intelligent install script for the next version of their application. Among
other things, the script will be able to determine the install configuration
and copy the needed files directly from Monolith Megasystems' FTP server on the
internet. The files are organized in flat directories partitioned by the
desired architecture and operating system. You have been assigned to write
a utility that will copy files of certain types (denoted by their extension) in
an entire directory and its subdirectories (up to a certain level, more on this
later) from Monolith's server to the client machine. (Yes, we know there are
many better ways to do this than rewriting the FTP client, but your boss is new
to Unix and doesn't know any better.) You have been given the following
specifications by the implementor of the install script:
- Your program should take five command-line arguments in the following order:
  - The extension of the files to retrieve (e.g. exe or gz).
  - The number of directory levels that it should look at (0 means it
    should look only in the root level; 2 means it should look in the
    root and the first 2 levels below it).
  - The name (symbolic) or IP address (dotted quad) of the FTP server to contact.
    If the name is of the form number@host, then number should be used as the
    port number of the FTP server; otherwise, the default FTP service port
    should be used.
  - The directory at the local site to transfer into [optional; defaults to
    the current directory].
  - The directory at the FTP site to begin the transfer from [optional; defaults
    to the login directory]. If only 4 arguments are given, assume the last one is
    the local directory, not the remote one.
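The number@host form of the server argument can be split with the standard string routines. The following is only a sketch under our own assumptions: the function name parse_server_arg is hypothetical, and returning a port of 0 to mean "use the default FTP port" is a convention chosen for this example, not part of the specification.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "number@host" (or plain "host") into parts.
 * On success returns 0, copies the host part into 'host', and sets *port
 * to the given number, or to 0 when no "number@" prefix is present
 * (the caller should then use the default FTP service port).
 * Returns -1 if the port is not a valid number or the host does not fit. */
int parse_server_arg(const char *arg, char *host, size_t hostlen, int *port)
{
    const char *at = strchr(arg, '@');

    *port = 0;
    if (at != NULL) {
        char *end;
        long n = strtol(arg, &end, 10);

        /* The digits must run exactly up to the '@' and form a legal port. */
        if (end != at || end == arg || n < 1 || n > 65535)
            return -1;
        *port = (int)n;
        arg = at + 1;                 /* the host name follows the '@' */
    }
    if (*arg == '\0' || strlen(arg) >= hostlen)
        return -1;
    strcpy(host, arg);
    return 0;
}
```

With this sketch, "2121@ftp.monolith.com" yields port 2121 and host "ftp.monolith.com", while a bare "ftp.monolith.com" yields port 0.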
- Your program should transfer all files as type image, using file structure
  and stream transmission mode. (These modes and how to change them are defined
  in the FTP specification.)
- Your program should NOT ignore subdirectories. In other words, you need
  to recursively copy subdirectories of the directory you are transferring,
  but only down to the level of the hierarchy that has been specified on the
  command line, if it exists. This is best explained by an example.
Suppose the directory structure you are copying is as below.
(All names starting with a capital D are directories; "D1" is an empty
subdirectory of the "srcdir" directory.)
srcdir
 |
 |-f1.gz
 |
 |-g1.exe
 |
 |-D1
 |
 |-D2
 |  |
 |  |-D4
 |     |
 |     |-f10.txt
 |     |
 |     |-D7
 |        |
 |        |-f11.gz
 |
 |-D5
 |  |
 |  |-D8
 |     |
 |     |-f8.Z
 |
 |-D6
    |
    |-f2.gz
    |
    |-D3
    |  |
    |  |-h1.Z
    |
    |-f3.gz
The following must be the directory structure after copying the gz files
and subdirectories down to depth 2 (command line: ftpcopy gz 2
ftp.monolith.com dstdir srcdir):
dstdir
 |
 |-f1.gz
 |
 |-D6
    |
    |-f2.gz
    |
    |-f3.gz
In other words, the specifications with regard to files and subdirectories are:
- If a filename does not end in the specified extension, do not
  copy it. For practical purposes it is as if that file did not exist.
  Also do not create a null-size file with the same name.
- If an empty subdirectory exists below or at the level specified on the
  command line (e.g. "D1" or "D8" above), do not copy it (i.e. you must not change
  the state of the file system at all). An "empty" directory is one in which it and all
  its subdirectories (up to the level specified on the command line) do not contain
  any file that matches the extension given on the command line. Levels are numbered
  downward from the root, so "below the level" means a smaller level number, i.e.
  closer to the root of the directory tree.
  Also do not create a null-size file with the same name. For this reason, do not
  create "D2", "D4" or "D5". Note that in the above example the contents of "srcdir"
  are considered to be at depth 0, the contents of "D2" at depth 1, and the
  contents of "D7" at depth 3. Hence the contents of D7 and below are not
  considered by the ftpcopy program when run to depth 2.
- If a non-empty subdirectory exists below or at the level specified
  on the command line, copy it and
  all the matching files in it (if any) to the destination machine.
- At the last level, any subdirectories (e.g. "D7") are
  considered "empty". They should
  not be copied, nor should null-size files be created with the same names as
  these subdirectories.
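The two tests these rules rely on, "does this name end in the extension?" and "is this directory's content still within the requested depth?", can be sketched as below. The function names are hypothetical, and the sketch assumes that the extension argument is given without its leading dot (as in the gz example) and must follow a '.' in the filename; the handout does not spell out that detail.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if 'name' ends in "." followed by 'ext'
 * (for example "f1.gz" with ext "gz"), and 0 otherwise. */
int has_extension(const char *name, const char *ext)
{
    size_t nlen = strlen(name);
    size_t elen = strlen(ext);

    if (nlen <= elen)               /* no room for the '.' before ext */
        return 0;
    return name[nlen - elen - 1] == '.' &&
           strcmp(name + nlen - elen, ext) == 0;
}

/* Hypothetical helper: 'depth' is the depth of a directory's contents,
 * with the contents of the starting directory at depth 0. Returns 1 if
 * those contents should be examined when running to 'max_depth'. */
int within_depth(int depth, int max_depth)
{
    return depth <= max_depth;
}
```

For the example tree, has_extension("f1.gz", "gz") is true and has_extension("f8.Z", "gz") is false; the contents of "D7" sit at depth 3, so within_depth(3, 2) is false and they are never listed.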
- Your utility should ignore any symbolic links it encounters.
- True to the name, the software from Monolith Megasystems is huge. For customers
  installing the full application, your utility may be expected to transfer
  up to half a gigabyte of data. Performance should be a key factor in your
  design decisions.
- All files should be created with the permissions S_IREAD and S_IWRITE.
  You may assume that the local directory already exists; exit with an error
  if it does not. You should overwrite any files which already exist without
  prompting. (WARNING: Make sure that you are running the client in a directory
  where there are no useful files that may be overwritten!)
- If you encounter a file which you cannot access due to permissions, you
  should skip over it without creating it locally or producing any error
  message. Do not create a null-size file with the same name.
- Your program should terminate on any unexpected response across the FTP
  control connection, any network error across either connection, or a file
  write error. In this case, you should issue an error message at termination
  in the format specified below. On any premature termination, you can assume
  that the install script knows how to restore the state of the file system.
  Your program is not expected to remove files from partially completed transfers.
- Your program should print nothing but a one-line status report at termination.
  This report should be written to stderr, and should contain:
  - The message: "OK: xxxxxxxxx bytes copied" on success (where xxxxxxxxx is
    the total of all bytes copied, without leading zeros or spaces)
  - The message: "ERROR: " followed by some meaningful error message on any
    error. If you received an error response across the FTP control connection,
    using that message is acceptable. If you are terminating for any other
    reason, you should write a meaningful error message. (Hint: look at the
    perror and strerror library functions and the errno variable; the older
    sys_errlist variable is obsolete.)
  For example:
    OK: 45614 bytes copied
    ERROR: 530 Login incorrect.
    ERROR: write: Disk full.
- If your program terminates successfully, it should return a result of 0.
  Otherwise, it should return a result of 1. (See the exit() function.)
- Your program should follow the conventions of anonymous ftp. This means
  logging in as the user "anonymous" and sending an e-mail address as the
  password. For the e-mail address, use the current login name, the '@' sign,
  and the fully qualified name of the host on which you are running. For
  example, "bigboss@monolith.megasys.com".
- Your deliverables will be the source code for the utility (including a
  makefile) and a concise write-up describing your program's design
  and any caveats (such as functionality you weren't able to complete
  or that doesn't work quite right).
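The one-line status report and the exit status required by the specification above could be produced along the following lines. This is a sketch only: the function names format_status and report_and_status are our own, and using unsigned long long for the byte count is an assumption made so that half a gigabyte (or more) fits comfortably.

```c
#include <stdio.h>

/* Hypothetical helper: format the status line into 'buf'.
 * 'ok' selects the success form; 'msg' is used only for errors. */
int format_status(char *buf, size_t len, int ok,
                  unsigned long long bytes, const char *msg)
{
    if (ok)
        return snprintf(buf, len, "OK: %llu bytes copied\n", bytes);
    return snprintf(buf, len, "ERROR: %s\n", msg);
}

/* Hypothetical helper: write the single report line to stderr and
 * return the value the program should pass to exit(): 0 or 1. */
int report_and_status(int ok, unsigned long long bytes, const char *msg)
{
    char line[1100];

    format_status(line, sizeof line, ok, bytes, msg);
    fputs(line, stderr);
    return ok ? 0 : 1;
}
```

A typical use would be exit(report_and_status(1, total, NULL)) on success, or exit(report_and_status(0, 0, strerror(errno))) after a failed system call.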
Doing the Assignment
To get you started, we've created a tarfile containing a skeletal .c file and
a Makefile; download it here. You should work
within this directory, as we provide a make target (make dist) to create the
final tarfile for submission.
Before you start writing the assignment, you will have to familiarize yourself
with both the Berkeley socket API and the FTP protocol.
The FTP protocol is specified completely in
RFC 959.
It is summarized in Chapter 27 of Stevens, TCP/IP Illustrated, Volume 1: The
Protocols. A short summary is also provided below.
WARNING: As mentioned before, you absolutely should *NOT* run tests in the same
directory as your program. You are likely to accidentally destroy your source
files. It is recommended that you back up your files frequently in another
directory.
You may also, of course, test your program by connecting to anonymous
ftp servers. Some possible sites to test against are ftp.cs.princeton.edu,
ftp.fedworld.gov, ftp.microsoft.com, ftp.cs.stanford.edu,
gatekeeper.dec.com, ftp.kernel.org.
Of course, you can choose any other anonymous ftp site too.
We strongly recommend that you do not test against only one of these
servers, since all of you together might overload it, and the response time for
everybody will be much slower. One way to spread the load is to start with a
server chosen at random and move to the next one each time you test your program.
You will need to understand the basic socket calls to do this assignment.
You can also read about the following calls, either with man or in the
Unix Network Programming book by Stevens:
- Parsing addresses
  - inet_addr: Convert a dotted quad IP address (such as 36.56.0.150) into a 32-bit address.
  - gethostbyname: Convert a hostname (such as ftp.cs.princeton.edu) into a 32-bit address.
  - getservbyname: Find the port number associated with a particular service, such as FTP.
- Setting up a connection
  - socket: Get a descriptor to a socket of the given type.
  - connect: Connect to a peer on a given socket.
  - getsockname: Get the local address of a socket.
- Communicating over the connection
  - read/write: Read and write data to a socket descriptor.
  - htons, htonl / ntohs, ntohl: Convert between host and network byte orders for 16 and 32-bit values.
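As an illustration of how the calls listed above fit together, here is a minimal sketch of opening the control connection. The function names make_server_addr and connect_to_server are hypothetical, the sketch handles IPv4 only, and the fallback to port 21 when getservbyname finds no "ftp" entry is our own assumption.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill 'addr' for 'host' (dotted quad or name).
 * 'port' is in host byte order; 0 means "look up the ftp service".
 * Returns 0 on success, -1 if the host cannot be resolved. */
int make_server_addr(const char *host, int port, struct sockaddr_in *addr)
{
    in_addr_t ip;

    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;

    ip = inet_addr(host);                 /* try the dotted-quad form first */
    if (ip != INADDR_NONE) {
        addr->sin_addr.s_addr = ip;
    } else {
        struct hostent *he = gethostbyname(host);
        if (he == NULL || he->h_addrtype != AF_INET)
            return -1;
        memcpy(&addr->sin_addr, he->h_addr_list[0], sizeof addr->sin_addr);
    }

    if (port != 0) {
        addr->sin_port = htons((unsigned short)port);
    } else {
        struct servent *se = getservbyname("ftp", "tcp");
        /* s_port is already in network byte order. */
        addr->sin_port = se ? (in_port_t)se->s_port : htons(21);
    }
    return 0;
}

/* Hypothetical helper: create a TCP socket and connect it to 'addr'.
 * Returns the connected descriptor, or -1 with errno set. */
int connect_to_server(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that inet_addr cannot distinguish the address 255.255.255.255 from an error, which is harmless here since that is not a usable server address.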
To make things simple, assume the shell always sets the environment variables
USER and HOST. You can use the call getenv to get their values.
(As with any library/system call, you should still remember to check the
return value from this call, though, to make sure this assumption holds!)
Use these values to build the password string as described above.
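A minimal sketch of building the password from USER and HOST follows. The function name build_anonymous_password is hypothetical; the sketch simply reports failure when either variable is unset, leaving it to the caller to decide how to terminate.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write "user@host" into 'buf' using the USER and
 * HOST environment variables. Returns 0 on success, or -1 if either
 * variable is unset or the result does not fit in 'len' bytes. */
int build_anonymous_password(char *buf, size_t len)
{
    const char *user = getenv("USER");
    const char *host = getenv("HOST");
    int n;

    if (user == NULL || host == NULL)   /* the assumption did not hold */
        return -1;
    n = snprintf(buf, len, "%s@%s", user, host);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```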
Be careful where you run the supplied ftpd. Since it allows anonymous
transfers with basically no security, anyone can use it to get at whatever
files are in its current directory. Your fellow students could even use
this to get at your solution to this assignment!
Summary of the FTP protocol
An FTP session consists of a control connection and one or more
data
connections. All actions are initiated by the client over the control connection.
This model is defined in section 2.3 of RFC 959.
The control connection is a normal TCP connection (created by the client
with the socket() and connect() calls). The client sends commands over
this connection in a format called NVT ASCII. NVT ASCII is defined in the
TELNET RFC (RFC
854). For the purposes of this assignment, it is enough to know that
NVT ASCII is normal ASCII in which each line is terminated with the CR
and LF characters (the string "\r\n" in C). In this assignment you can
assume that no NVT string will be longer than 1024 bytes.
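To make the line format concrete, here is a small sketch that builds one command line terminated by "\r\n" and writes it out in full. The names MAX_LINE, format_command and write_all are our own; the 1024-byte bound is the one stated above.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_LINE 1024   /* upper bound on an NVT string in this assignment */

/* Hypothetical helper: build "VERB arg\r\n" (or "VERB\r\n" when 'arg'
 * is NULL) in 'buf'. Returns the line length, or -1 if it does not fit. */
int format_command(char *buf, size_t len, const char *verb, const char *arg)
{
    int n = arg ? snprintf(buf, len, "%s %s\r\n", verb, arg)
                : snprintf(buf, len, "%s\r\n", verb);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: write() may accept fewer bytes than requested,
 * so loop until all 'len' bytes are written. Returns 0, or -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the write */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

For example, format_command(buf, sizeof buf, "TYPE", "I") produces the 8-byte line "TYPE I\r\n", which can then be passed to write_all on the control connection.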
Each command is a 3 or 4 character string, followed by optional parameters
separated by spaces. All of the commands are specified in RFC 959, section
4.1. For this assignment, a good starting point is to read about the following
commands:
- USER - Specify the user name to log in as
- PASS - Specify the user's password
- CWD - Change to the given directory
- TYPE - Set the transfer type to ASCII (A) or binary image (I)
- PORT - Specify the client address and port number for the upcoming data connection
- LIST - List the files for the given file specification
- RETR - Retrieve the given file
- QUIT - Close the FTP connection
In response to each command, the server will respond with one or more replies.
Each reply is prefaced by a reply code. The reply codes have the following
format:
- 1xx - Positive preliminary reply. The action is being started but expect
  another reply before sending the next command.
- 2xx - Positive completion reply. The action succeeded and a new command
  can be sent.
- 3xx - Positive intermediate reply. The command was accepted but another
  command is now required.
- 4xx - Transient negative completion reply. The command failed and should
  be retried later.
- 5xx - Permanent negative completion reply. The command failed and should
  not be retried.
Section 4.2 of RFC 959 specifies this reply format and how multi-line replies
are handled. Your program will have to parse these replies.
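The reply handling can be sketched as follows. In RFC 959 a multi-line reply opens with the three-digit code followed immediately by a hyphen, and ends with a line that starts with the same code followed by a space. The function names here are hypothetical, and each function expects one reply line at a time as a C string.

```c
#include <ctype.h>

/* Hypothetical helper: return the three-digit code that starts 'line',
 * or -1 if the line does not begin with three digits. */
int reply_code(const char *line)
{
    if (!isdigit((unsigned char)line[0]) ||
        !isdigit((unsigned char)line[1]) ||
        !isdigit((unsigned char)line[2]))
        return -1;
    return (line[0] - '0') * 100 + (line[1] - '0') * 10 + (line[2] - '0');
}

/* Hypothetical helper: 1 if 'line' opens a multi-line reply ("ddd-..."). */
int starts_multiline(const char *line)
{
    return reply_code(line) >= 0 && line[3] == '-';
}

/* Hypothetical helper: 1 if 'line' is the final line of the reply that
 * was opened with 'code', i.e. the same code followed by a space. */
int ends_reply(const char *line, int code)
{
    return reply_code(line) == code && line[3] == ' ';
}

/* Hypothetical helper: the reply class (1 to 5) is the first digit. */
int reply_class(int code)
{
    return code / 100;
}
```

A reader loop would fetch one line, check starts_multiline on it, and if true keep reading lines until ends_reply is true for the same code; only then is the reply complete.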
In addition to the reply codes, the LIST and RETR commands send back
data over a data connection. This connection is created in the following
manner:
- The client creates another socket with socket().
- The client binds an address to the socket with bind(). A port address of
  0 is specified to obtain an ephemeral port.
- The client calls listen() to indicate willingness to accept a connection
  to the new socket.
- The client calls getsockname() to determine the port number of the socket.
- The client sends a PORT command to the server over the control connection
  to inform the server of the port number.
- The client sends the appropriate RETR or LIST command over the control
  connection.
- The server connects to the new socket.
- The client calls accept() to accept the server's connection.
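The client-side steps above can be sketched as follows. The function names are hypothetical. The PORT argument has the RFC 959 form h1,h2,h3,h4,p1,p2 (the four address bytes followed by the high and low bytes of the port); this sketch assumes the caller obtains the local IPv4 address separately, for instance by calling getsockname() on the control connection.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on an ephemeral
 * port and store that port (host byte order) in *port.
 * Returns the listening descriptor, or -1 on error. */
int open_data_listener(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);             /* 0 = let the kernel pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Hypothetical helper: format "PORT h1,h2,h3,h4,p1,p2\r\n" from an IPv4
 * address (network byte order) and a port (host byte order).
 * Returns the line length, or -1 if it does not fit in 'len' bytes. */
int format_port_command(char *buf, size_t len,
                        struct in_addr ip, unsigned short port)
{
    unsigned long a = ntohl(ip.s_addr);   /* h1 is the most significant byte */
    int n = snprintf(buf, len, "PORT %lu,%lu,%lu,%lu,%u,%u\r\n",
                     (a >> 24) & 0xff, (a >> 16) & 0xff,
                     (a >> 8) & 0xff, a & 0xff,
                     (unsigned)(port >> 8), (unsigned)(port & 0xff));

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

After sending the PORT line and the RETR or LIST command, the client calls accept() on the descriptor returned by open_data_listener to obtain the data connection.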
At this point, the data connection has been established and the server
will send the requested data over the connection. In the case of LIST,
the returned data will be the list of filenames in NVT ASCII. In the case
of RETR, the returned data will be the file contents. The server will automatically
close the connection from its end once all of the data has been sent. This
is indicated by a return value of 0 from read() on the data connection's
socket descriptor.
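A minimal sketch of draining the data connection until read() returns 0 is shown below. The function name copy_until_eof and the 64 KB buffer size are our own choices; a large buffer is used here only because fewer system calls generally helps with large transfers.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from descriptor 'in' to
 * descriptor 'out' until read() returns 0, which means the peer has
 * closed the connection. Returns the number of bytes copied, or -1 on
 * a read or write error (errno is left set by the failing call). */
long long copy_until_eof(int in, int out)
{
    static char buf[64 * 1024];
    long long total = 0;

    for (;;) {
        ssize_t off;
        ssize_t n = read(in, buf, sizeof buf);

        if (n == 0)
            return total;               /* end of data */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the read */
            return -1;
        }
        for (off = 0; off < n; ) {      /* write() may be partial */
            ssize_t w = write(out, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted: retry the write */
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

The returned count can be added to the running total that is reported in the final "OK: ... bytes copied" line.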
A word on LIST. Unfortunately the output of the LIST command is not
standardized. The other option is to use NLST, but some server
implementations do not show the subdirectories in response to an NLST command,
which makes it difficult to use. Hence, we are requiring the use of LIST for this
assignment. Due to the large variety of responses an FTP server may
generate when issued the LIST command, you may face difficulty when
attempting to parse the output in search of the file/directory name.
You are certainly welcome to write such parsing code yourself, though we have
provided a slightly modified version of the publicly available ftpparse library here
(the minor modifications present in our version allow for clean compilation under gcc).
You may read about how to use the ftpparse library at its home page,
http://cr.yp.to/ftpparse.html. Below is some source code you can use to easily
manipulate the result of a LIST command:
#include <string.h>
#include "ftpparse.h"

#define kTelnetEOF "\r\n"   /* NVT ASCII end-of-line sequence */

/* 'listResponse' must contain the server's complete LIST response, terminated
 * with a '\0'. The buffer is modified in place (each "\r\n" is cut at the '\r'). */
void parseListResponse(char *listResponse)
{
    char *curPtr, *endPtr;
    struct ftpparse ftpInfo;

    curPtr = listResponse;
    for (endPtr = strstr(curPtr, kTelnetEOF); endPtr != NULL;
         endPtr = strstr(curPtr, kTelnetEOF))
    {
        *endPtr = '\0'; /* 'curPtr' now points to a C-string representing one line of the response */
        if (ftpparse(&ftpInfo, curPtr, (int)strlen(curPtr)) == 1)
        {
            /* A wealth of information may be extracted regarding the directory entry here by
             * examining the fields of ftpInfo (see ftpparse.h).
             * 'size' = entry size in bytes (relevant if 'sizetype' == FTPPARSE_SIZE_BINARY)
             * For ['flagtrycwd', 'flagtryretr'] =
             *   - [0, 1]: most likely a file
             *   - [1, 0]: most likely a directory
             *   - [1, 1]: most likely a link (could be either a file or a directory)
             *   - [0, 0]: unknown
             * 'name' points to a character array of length 'namelen' within the string pointed to
             * by 'curPtr'. It is not '\0'-terminated at 'namelen'.
             */
        }
        curPtr = endPtr + strlen(kTelnetEOF); /* Advance to start of next line */
    }
}
We expect that your program will work with any FTP server software
supported by this library (even if you choose to write your own parser).
You will not be held responsible for other abnormal sorts of LIST
responses (in particular, you do not have to support FTP server software
that is not supported by the ftpparse library).
Miscellaneous Notes and Hints
- You are not allowed to use the STATUS (STAT) command in your client, as the
implementation of this command differs across different ftp servers.
- A well-documented, clear solution to the problem should be
feasible in as few as 500 lines of code, and easily doable in 1000
lines or less--probably no more than two or three source files,
with functions no longer than 50-70 lines. If your total code
length or individual functions are much longer than this, you might
want to rethink your design.
- While Stevens is an excellent reference for socket programming,
some of the code fragments do not necessarily represent good coding
style as required in this course--think long and hard before using
code as-is from the book! Would you have written it that way?
Assignment FAQ and Grading Guidelines
A FAQ and the grading guidelines are also available. Please look them over
before asking the TAs a question.
Deliverables
Your submission is expected to be a tarfile of your source directory (including
the directory!). It should include your README, your source files and a
Makefile which creates your program. Be sure your submission conforms
to the following:
- Your Makefile uses -Wall to compile your code
- Your Makefile does not issue warnings when compiling your code
- Your submission was created by using "make dist" from the Makefile we provided
Submit your assignment by email to mhw@cs.princeton.edu, by 9:00pm on March 1st, 2006.
References