CS 320: Compiling Techniques

COS 320 - Assignment 4

Compiling Techniques, Spring 2012, Princeton University. Due date: Monday 5 March.

Lexical analyzer

Setup: Make a new directory as4 for this assignment by unpacking as4.zip.

Run sml and compile the code using CM.make "sources.cm";

Build a lexer for the Fun language. In order to do this, you will need to figure out all of the tokens that you need to lex, by reading The definition of Fun.

You will be building your lexer using ML-LEX. There is online documentation on ML-LEX here. There is also some help in your textbook (Appel, chapter 2).

Read and understand the following files: sources.cm, errormsg.sml, tokens.sig, tokens.sml, fun.lex, runlex.sml.

(Actually, you can get by without understanding the stuff about "('svalue,'pos) token" in tokens.sig. However, do the experiment of writing a little program that just calls Tokens.ARROW(0,0) just to see what it's doing. You can even make this call from the SML/NJ interactive top-level prompt.)
(Also, I wouldn't think any less of you if you didn't bother to understand the first four lines of fun.lex; that's just boilerplate for attaching the lexer to the parser.)

Edit the code in fun.lex. You will have to remove some of the sample code and add a lot of your own.

The tokens are declared in tokens.sig. Here is a list of symbols that will appear in the source along with the tokens they should be associated with:

`->`	ARROW		`!`	BANG
`:=`	ASSIGN		)	RPAREN
`(`	LPAREN		\|\|	OR
`&`	AND		`=`	EQ
`>`	GT		`<`	LT
`*`	TIMES		`-`	MINUS
`+`	PLUS		`;`	SEMICOLON
`,`	COMMA		:	COLON
`#i`	PROJ	(where i is a nonnegative integer without leading 0's: 0,1,...)

Fun keywords should be represented using tokens with the same name. The end-of-file token should be represented using the token EOF.

Each token takes two integers: the line number and column number of the beginning of the token. These are used for error reporting. In the example below, x is on the second line in column 7. Notice, the first row is row 1 and the first column is column 1 (as opposed to 0).

if true then let x = ...

Use ErrorMsg.error : ErrorMsg.pos2 -> string -> unit to report errors. It takes two arguments: a pair of file-positions (the beginning and end of the erroneous text, measured in characters from the beginning of the file), and the error message to print out. You should keep the ErrorMsg module informed of where the newlines occur (by calling newLine) so that it can translate these file-positions to line numbers.

The make_pos function is a convenient way to convert the things ML-Lex knows (yypos and yytext) into the file positions of the beginning and end of the token.

Be careful with your syntax in lex files. Remember that each lex definition must end with ";" and each lexing rule must also end with ";". If you forget ";" then ML-Lex will complain. For example, the following is correct:
```
 .... fun eof () = ... %% alpha = [A-Za-z]; .... %% fun => (Tokens.FUN(make_pos(yypos,yytext)));


 ....

 
```
type is a reserved keyword that we may use for later language extensions. For now, the easiest way to handle it is to simply enforce that programs don't contain it. Since there's no token type for it in the current files, this can't be done in the parser, so you may want to do it in the lexer. But it's also fine if you don't handle it at all - simply assume that programs don't contain type. But strings like typexx should still be lexed as identifiers.
To test your code, run RunLex.runlex "test.fun"; It will output a sequence of tokens along with the line and column number of each token. As always, a single test is far from complete. You will want to write your own test cases and thoroughly test your lexer.

Nested comments. This assignment would be a lot easier if the Fun language didn't have nested comments. I recommend that you first get everything working except nested comments. Then add that feature as your time permits; it is worth 10% of this assignment's grade.

What to turn in

Submit ~~myfun.sml,~~ fun.lex, as well as a README. In the README, please write,

Who you got help from, if any, or who you helped with this assignment, or with whom you just discussed and exchanged ideas. Briefly explain the nature of the help or the discussion.
Explain any design decisions you had to make.
Give any other feedback or comments that you wish to about this assignment.

Upload your submission to dropbox here.

Caveat: Some of the example code in the ML-Lex documentation is written for an earlier version of SML; for example, the library function "revfold" used in the documentation is now called "foldr".

Back to COS 320 front page