MDST 311-1 (Schedule # 40410)
University of Virginia
Spring 2006
MWF 11:00-11:50am :: Ruffner Hall G004B
Mr. David Golumbia
Office: 304B Bryan
Spring 2006 Office Hours: MW 9:30-10:30am, W 1:30-2:30pm
Computers and Languages
This course provides a synoptic overview of topics in the study of computers and languages. We will primarily study the major research areas in computational linguistics, including the standard approaches to natural language processing (NLP), corpus linguistics, and the computation of the lexicon, of morphology, and of grammar. Following recent trends, we will also read about statistical and probabilistic approaches to language understanding and analysis. Then we will turn to the study of larger spans of linguistic data such as texts and databases. We will also read about the wider impact of computers on the world of languages, particularly the ways that computers frame our linguistic interactions with speakers of minority and endangered languages.
Satisfies second writing requirement, linguistics requirement for Cognitive Science, and the theory requirement for Linguistics majors. The student's course of study will vary slightly depending on which requirement they are satisfying. Students will choose one research topic relevant to their area of study and present their research at the end of the term, in addition to submitting this research in the form of a paper or project. Students will also write two briefer assignments during the term and, one time, summarize the day's reading for the class.
This course presumes basic familiarity with computer concepts and terminology, and/or a previous course in linguistics, but has no other formal prerequisites.
Required textbooks (available at UVa Bookstore)
- David Crystal, The Language Revolution, Polity Press, 2004 (indicated as Crystal in syllabus)
- Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999 (indicated as Manning-Schütze in syllabus)
Secondary texts (not required for purchase; many toolkit selections are derived from these books)
- Douglas Biber, Susan Conrad and Randi Reppen, Corpus Linguistics: Investigating Language Structure and Use, Cambridge University Press, 1998
- Robert Dale, Hermann Moisl, and Harold Somers, eds., Handbook of Natural Language Processing, Marcel Dekker, 2000
- Roger Schank and Kenneth Colby, eds., Computer Models of Thought and Language, WH Freeman, 1973
- George Smith, Computers and Human Language, Oxford University Press, 1991
Texts on toolkit and online
- James Allen, George Ferguson, Eric K. Ringger, Teresa Sikorski Zollo, and Bradford W. Miller, "Dialogue Systems: From Theory to Practice in TRAINS-96" (from Dale, Moisl, and Somers, eds., Handbook of NLP)
- Douglas Biber, Susan Conrad and Randi Reppen, "Register Variation and ESP" (from Corpus Linguistics)
- Douglas Biber, Susan Conrad and Randi Reppen, "Lexico-Grammar" (from Corpus Linguistics)
- Eric Brill, "Part-of-Speech Tagging" (from Dale, Moisl, and Somers, eds., Handbook of NLP)
- Laura Buszard-Welcher, "Can the Web Help Save My Language?" (from Leanne Hinton and Ken Hale, eds., The Green Book of Language Revitalization, Academic Press, 2001). Online at http://www.potawatomilang.org/Reference/endlgsweb_update.htm
- Roland Hausser, "Computational Language Analysis" (from his Foundations of Computational Linguistics, 1998)
- Dan Jurafsky and James H. Martin, "Dialogue and Conversational Agents" (from their Speech and Language Processing, Prentice-Hall, 2000)
- David D. McDonald, "Natural Language Generation" (from Dale, Moisl, and Somers, eds., Handbook of NLP)
- Hermann Moisl, "NLP Based on Artificial Neural Networks: Introduction" (from Dale, Moisl, and Somers, eds., Handbook of NLP)
- Philip Resnik and Noah H. Smith, "The Web as a Parallel Corpus" (Computational Linguistics 29:3, 2003)
- George Smith, "The Challenge of Spoken Language" (from Computers and Human Language)
- George Smith, "Components of Words" (from Computers and Human Language)
- Harold Somers, "Example-Based Machine Translation" (from Dale, Moisl, and Somers, eds., Handbook of NLP)
- Harold Somers, "Machine Translation" (from Dale, Moisl, and Somers, eds., Handbook of NLP)
- Richard K. Sproat, "Lexical Analysis" (from Dale, Moisl, and Somers, eds., Handbook of NLP)
- Dan Sullivan, "Understanding the Structure of Text" (from his Document Warehousing and Text Mining, John Wiley & Sons, 2000)
- Mark Warschauer, "Language, Identity, and the Internet" (from Beth E Kolko, Lisa Nakamura, and Gilbert B. Rodman, eds., Race in Cyberspace, Routledge, 2000)
- Yorick Wilks, "An Artificial Intelligence Approach to Machine Translation" (from Schank and Colby, Computer Models of Thought and Language)
- Terry Winograd, "A Procedural Model of Language Understanding" (from Schank and Colby, Computer Models of Thought and Language)
Course Requirements
- Participation
- You are expected to come to class each day having done the day's reading and ready to discuss the issues raised with the instructor and with other members of the class. This class will be taught primarily through discussion.
- Presentation 1
- Each day one or two students will present a 5-minute summary of and response to the major ideas in that day's reading for the class. A list of discussion leaders for each day is available online.
- Presentation 2
- Assignment 1 (due Fri, Mar 3)
- For CogSci, CompSci, Ling Theory, & Graduate Students: Problem set in Manning-Schütze Chapter 3. Do all fifteen problems.
- For other students: write a 4-5 page summary and review of one of the readings from later in the term (after the date on which this assignment is due).
- Assignment 2 (due Fri, Mar 31)
- For CogSci, CompSci, Ling Theory, & Linguistics Graduate Students: Problem set in Manning-Schütze Chapter 7. Do problems 7.1, 7.8, 7.9, 7.10, 7.11, 7.12, 7.14.
- For other students: write a 4-5 page essay describing computer or web use from the perspective of the speaker of a particular minority language. To find a language, use the list of Language Families from the web version of the SIL Ethnologue (the listings include the number of speakers for each language). We will define a minority language as any language with fewer than 1,000,000 speakers. This rules out only 347 or 6.3% of the Ethnologue's 6,900 languages. As this table from Ethnologue shows, there are around 2500 languages with between 100,000 and 999,999 speakers, and it is among this group that you are likely to find the strongest web presence, although you are welcome to go further down the list of candidate languages. However, please feel free to propose any language to me directly as a candidate, including those with around 1,000,000 speakers, if you feel there is something interesting to say about its use on computers or the web and that there is a compelling case for it as a minority language.
- You may also choose to write a 4-5 page paper on a topic related to the class, with prior approval from the instructor.
- Assignment 3 (Final assignment, due on the last day of class)
- For CogSci, CompSci, Ling Theory, & Linguistics Graduate Students: Do one of the extra units for the class. The final result of this unit can be either a paper on a topic in the unit, developed with the instructor, or a project in the area. Students may also develop their own projects in a relevant area of research after consultation with the instructor.
- For other students: write a 10 page essay. Students can develop their own topic in consultation with the instructor. All students are free to do the extra unit topics for their final projects. Students can also develop a topic raised in a previous assignment for their final paper.
Grading
- Grading for the course will be calculated on a simple point basis with the emphasis on written work, although 20% of the course grade will be based on student participation (including the two presentations):
- Assignment 1: 20%
- Assignment 2: 20%
- Assignment 3: 40%
- Presentation 1: 5%
- Presentation 2: 5%
- Participation: 10%
Policies
- No late work will be accepted without penalty in this class. Work handed in after the due date will be penalized 1/3 of a letter grade for each day late (i.e., an A- paper handed in one day late becomes a B+).
- You are expected to have done the primary reading and any other primary course assignments before the beginning of class each day.
- You are allowed to miss 3 classes without explanation during the semester, but absences beyond 3 without prior arrangement will count against your participation grade.
- All work in this course is subject to the University's Honor Code. You may work in teams for some assignments, but all written work must be solely your own, and any reliance on published work must be properly cited.
Week-by-Week Syllabus
Note: reading on some Fridays is marked with an asterisk (*). These readings are required only for Cognitive Science, Computer Science, Linguistic Theory and graduate students, and recommended for other students. All readings are either in toolkit or in one of the two textbooks for the course as indicated above.
- Course Introduction
- Weds Jan 18
- Reading: Manning-Schütze, beginning through Ch 1, pts 1.1, 1.2
- Fri Jan 20
- Reading: Manning-Schütze, Ch 1, pts 1.3, 1.4, 1.5
- Linguistic Issues
- Mon Jan 23
- Reading: Manning-Schütze, Ch 3, pt 3.1
- Weds Jan 25
- Reading: Manning-Schütze, Ch 3, pt 3.2
- Fri Jan 27
- Reading: Smith, "Components of Words"
- Computers in Language History
- Mon Jan 30
- Weds Feb 1
- Fri Feb 3
- Overview of NLP
- Mon Feb 6
- Reading: Winograd, "Procedural Model"
- Weds Feb 8
- Reading: Hausser, "Computational Language Analysis"
- Fri Feb 10
- Reading: Brill, "Part-of-Speech Tagging"
- Corpus Linguistics
- Mon Feb 13
- Reading: Manning-Schütze Ch 4
- Weds Feb 15
- Reading: Biber, Conrad, & Reppen, "Lexico-Grammar"
- Fri Feb 17
- Reading: Biber, Conrad, & Reppen, "Register Variation and ESP" (*)
- No classes week of Feb 20-24 (instructor away)
- Computing Spoken Language
- Mon Feb 27
- Reading: Smith, "The Challenge of Spoken Language"
- Weds Mar 1
- Reading: Jurafsky and Martin, "Dialogue and Conversational Agents"
- Fri Mar 3
- Reading: Allen et al, "Dialogue Systems: From Theory to Practice in TRAINS-96"
- Assignment 1 due
- No classes week of Mar 6-10 (UVa Spring Break)
- Computing Words
- Mon Mar 13
- Reading: Manning-Schütze Ch 5
- Weds Mar 15
- Reading: Manning-Schütze Ch 7
- Fri Mar 17
- Reading: Sproat, "Lexical Analysis"
- Computing Grammars
- Mon Mar 20
- Reading: Manning-Schütze Ch 9
- Weds Mar 22
- Reading: Manning-Schütze Ch 10
- Fri Mar 24
- Reading: McDonald, "Natural Language Generation" (*)
- Statistical NLP
- Mon Mar 27
- Reading: Manning-Schütze Ch 11
- Weds Mar 29
- Reading: Manning-Schütze Ch 12
- Fri Mar 31
- Reading: Moisl, "NLP Based on Artificial Neural Networks"
- Assignment 2 due
- Computing Text
- Mon Apr 3
- Reading: Sullivan, "Understanding the Structure of Text"
- Weds Apr 5
- Reading: Manning-Schütze Ch 15
- Fri Apr 7
- Reading: Manning-Schütze Ch 16 (*), Manning-Schütze Ch 14 (*)
- Language Politics of Computing
- Mon Apr 10
- Reading: Warschauer, "Language, Identity, and the Internet"
- Weds Apr 12
- Reading: Buszard-Welcher, "Can the Web Help Save My Language?"
- Fri Apr 14
- Reading: Crystal, Chapters 4-5
- Translation and Pragmatics
- Mon Apr 17
- Reading: Somers, "Machine Translation" and "Example-Based Machine Translation"
- Weds Apr 19
- Reading: Resnik and Smith, "The Web as a Parallel Corpus"
- Fri Apr 21
- Reading: Wilks, "Artificial Intelligence" (*); Manning-Schütze Ch 13 (*)
- Final Presentations and Summary
- Mon Apr 24
- Weds Apr 26
- Fri Apr 28
- Mon May 1
- Final presentations
- Assignment 3 due
Last updated April 7, 2006.