COS401/TRA301 - Introduction to Machine Translation

Spring 2009

by Srinivas Bangalore


 

Announcements

18/May/2009: Slides from student presentations are available.
10/May/2009: Take a look at the reading list for the final exam.
05/May/2009: Lecture slides about WordNet are now posted.

 

Older Announcements

23/Apr/2009: Lecture slides about Interlingua-based MT are now posted.
22/Apr/2009: Grades for Homework #4 are now posted on Blackboard.
22/Apr/2009: Lecture slides about Transfer-based MT are now posted.

16/Apr/2009: Grades for Homework #3 are now posted on Blackboard.

11/Apr/2009: Homework #5 is now posted. Due date: 23/Apr in class. Please submit a hard copy.
11/Apr/2009: Lecture slides about direct MT are now posted.

28/Mar/2009: Grades for Homework #2 and the Midterm exam are now posted on Blackboard.
28/Mar/2009: Homework #4 is now posted. Due date: 9/Apr in class. Please submit a hard copy.
28/Mar/2009: Lecture slides about semantics are now posted.

15/Mar/2009: Homework #3 is now posted.

07/Mar/2009: Lecture slides about parsing and syntax are now available.

26/Feb/2009: Grades for Homework 1 are posted on Blackboard.

20/Feb/2009: Reading assignment for next class: J&M Chapters 12 and 13. Also check out Chapter 4 about language models.
19/Feb/2009: A discussion forum has been set up in Blackboard. Click on Discussion Forum on the left hand side menu.
19/Feb/2009: Lecture 3 slides are available here.
19/Feb/2009: Homework #2 is now posted here. Due date: 5/Mar in class. Please submit a hard copy.

12/Feb/2009: Next class reading assignment: J&M Book: Chapters 5 and 6.
12/Feb/2009: Lecture 2 slides are now available here and here.
12/Feb/2009: Homework #1 is now posted here. You will need to use the AT&T FSM library. Due date: 19/Feb.

06/Feb/2009: Next class reading assignment: J&M Book: Chapter 3 on Morphology and Chapter 5 on Part-of-speech tagging.
06/Feb/2009: Lecture 1 slides are now available here and here.
06/Feb/2009: More reading materials have been posted here.

05/Feb/2009: Update: a CS class account is no longer needed. (Non-CS major students were previously asked to apply for one.)

28/Jan/2009: First class will be Thursday February 5th 1:30pm-4:20pm, Robertson Hall 023.



Instructor: Dr. Srinivas Bangalore

Office: 330 Aaron Burr Hall

Phone: (973)-360-7041

Email: srini [at] research [dot] att [dot] com

Office hours: after class (Thursday 4:20pm)

 

 

 

TA: Juan Carlos Niebles

Office: 215 Computer Science Bldg.

Phone: (609) 258-8241

Email:  jniebles [at] princeton

Office hours: Tuesdays 2:00pm-3:00pm or by appointment.

 

Course Description

The course will introduce machine translation from a historical and commercial perspective, provide details of the three paradigms of machine translation (direct, transfer, and interlingua), and discuss the strengths and limitations of each. The course will cover extensively the techniques used to process human languages (morphological analysis, parsing, and language generation) and the linguistic resources required to transform them into a machine-processable form. Linguistic examples will be used extensively to motivate the need for these techniques. Finally, the course will cover topics of specialized interest, such as domain-limited machine translation and human-aided machine translation, and will highlight some recent research topics in statistical and example-based machine translation and speech-to-speech translation.

 

Prerequisites

The course will involve some programming exercises. Students are required to have programming experience or to have completed COS126 (Introduction to Computer Science).

 

Course website:

http://www.cs.princeton.edu/courses/archive/spring09/cos401/

 

Course location and time:

Thursday, 1:30pm – 4:20pm, Robertson Hall 023

 

Suggested Reading List:

  1. (HS) An introduction to machine translation, W. John Hutchins and Harold L. Somers, London: Academic Press, 1992.
  2. (HS-EBMT) Review Article: Example-based Machine Translation. Harold Somers. Machine Translation. 1999.
  3. (Dorr 1994) Dorr, Bonnie J., "Machine Translation Divergences: A Formal Description and Proposed Solution". Computational Linguistics, 20:4, pp. 597-633.
  4. (Brown et al., 1993) "The Mathematics of Statistical Machine Translation: Parameter Estimation". P. Brown, S. Della Pietra, V. Della Pietra, R. Mercer. Computational Linguistics, 19(2).
  5. (NSW) Readings in Machine Translation, S. Nirenburg, H. Somers and Y. Wilks, MIT Press, 2002
  6. (AT) Translation Engines: Techniques for Machine Translation, Arturo Trujillo, Springer 1999
  7. (JM) Speech and Language Processing, Jurafsky and Martin, Prentice Hall
  8. (CW) Recent Advances in Example-Based Machine Translation (Text, Speech and Language Technology), Carl and Way, Kluwer Academic Publishers, 2003
  9. Research papers on Machine Translation from recent conferences.
  10. Lexicon for KBMT.

 

Resources:

 

Homeworks:

Homework 1 [pdf]. Data in [zip] and [tar.gz] formats. Due date: 19/Feb/2009. Solutions [ppt]

Homework 2 [pdf]. Data in [zip] and [tar.gz] formats. Due date: 5/Mar/2009 in class.

Homework 3 [pdf]. Data in [zip] and [tar.gz] formats. Due date: 26/Mar/2009 in class.

Homework 4 [pdf]. Data in [zip] and [tar.gz] formats. Due date: 9/Apr/2009 in class.

Homework 5 [pdf]. Data in [zip] format. Due date: 23/Apr/2009 in class.

 

Assessment:

Class participation and attendance 15%

Homework assignments 20%

Midterm exam 30%

Final exam 35%