LING 83600 Language Technology / CS 84010 Natural Language Processing, CUNY Graduate Center, Fall 2014

Time and Location T 11:45am-1:45pm, Room 6496
Personnel Prof. Liang Huang (huang at cs.qc), Instructor
James Cross (jcross at gc.cuny), TA
Office Hours Tuesday afternoons in the CS Lab. Additional office hours available before HW due dates and exams.
Prerequisites CS: algorithms and data structures (especially recursion and dynamic programming);
solid programming skills (in Python); basic understanding of formal language and automata theory.
LING: minimal understanding of morphology, phonology, and syntax (we'll review these).
MATH: good understanding of basic probability theory.
Textbooks / MOOCs This course is self-contained (with slides and handouts) but you may find the following textbooks helpful:
  • Jurafsky and Martin, 2009 (2nd ed.), Speech and Language Processing. (default reference)
  • Manning and Schütze, 1999, Foundations of Statistical Natural Language Processing.

You might also find these Coursera courses helpful:

  • Jurafsky and Manning (Stanford)
  • Collins (Columbia) -- more mathematical
Grading:
  • Homework: 48%.
    • programming exercises in Python + pen-and-paper exercises
    • late penalty: you may submit at most one (1) HW late (by up to 48 hours).
  • Quiz: 7%
  • Final Project: 5 (proposal) + 5 (talk) + 15 (report) = 25%. done individually or in teams of up to 3 students.
  • Exercises: 5+5=10%. graded by completeness, not correctness.
  • Class Participation: 10%
    • asking/answering questions in class; helping peers on HWs (5%)
    • catching/fixing bugs in slides/exams/hw & other suggestions (2%)
    • reward for submitting no HW late (3%)

Tentative Schedule:
Week 1 (Sep 2): Intro to NLP and rudiments of linguistic theory.
Intro to Python for text processing.
Unit 1: Sequence Models and Noisy-Channel: Morphology, Phonology
Week 2 (Sep 9): Basic automata theory: FSAs (DFA/NFA) and FSTs.
Week 3 (Sep 16): FSAs/FSTs cont'd.
The noisy-channel model.
HW1 out: FSAs/FSTs, Carmel; recovering vowels.
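As a preview of the automata material, a DFA is just a transition table, and acceptance is a single left-to-right scan. The example below is an illustrative sketch (not course code): a made-up DFA over {a, b} that accepts strings with an even number of b's.

```python
def dfa_accepts(transitions, start, accept, s):
    """Simulate a DFA: transitions maps (state, symbol) -> state."""
    state = start
    for ch in s:
        if (state, ch) not in transitions:
            return False  # no transition defined: reject
        state = transitions[(state, ch)]
    return state in accept

# toy DFA: two states tracking the parity of b's seen so far
trans = {("even", "a"): "even", ("even", "b"): "odd",
         ("odd", "a"): "odd", ("odd", "b"): "even"}
```

An NFA can be simulated the same way by tracking a *set* of current states instead of a single one.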
Week 5 (Sep 30): HW1 discussions.
SVO/SOV vs. infix/postfix; advantage of SVO: less case-marking; advantage of SOV: no attachment ambiguity.
Simple pluralizer.
Language models; basic smoothing: Laplace, Witten-Bell, Good-Turing.
Quiz 0.
Ex1 out.
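Of the smoothing methods listed for this week, Laplace (add-one) is the simplest to sketch: add one to every bigram count and add the vocabulary size to the denominator. The toy corpus and function below are illustrative, not from the course materials.

```python
from collections import Counter

def laplace_bigram_prob(bigram_counts, unigram_counts, vocab_size, w1, w2):
    """Add-one (Laplace) smoothed bigram probability P(w2 | w1)."""
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + vocab_size)

# toy corpus
tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)  # vocabulary size
```

Note that even an unseen bigram like ("cat", "the") now gets nonzero probability 1 / (count("cat") + V), which is what makes the noisy-channel pipeline robust to gaps in training data.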
Week 6 (Oct 7): Language models (cont'd): information theory, entropy and perplexity, the Shannon game.
Viterbi decoding for HMMs; transliteration.
HW2 out: English pronunciation, Japanese transliteration.
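Entropy and perplexity can be made concrete in a few lines. This sketch (my own, not the course's) computes a model's perplexity from the per-token probabilities it assigns to a test sequence: 2 raised to the cross-entropy. A model that assigns uniform probability 1/8 to every token has perplexity exactly 8.

```python
import math

def perplexity(probs):
    """Perplexity = 2 ** H, where H is the average negative log2 probability
    the model assigns per token of the test sequence."""
    H = -sum(math.log2(p) for p in probs) / len(probs)
    return 2 ** H
```

Intuitively, perplexity is the model's average "effective branching factor": how many equally likely choices it is hedging between at each token.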
Week 7 (Oct 14): Pluralizer demo; discussion of HW2.
More on HMMs/Viterbi; sample code.
Intro to HW3 (semi-Markov).
HW3 out: decoding for Japanese transliteration.
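The Viterbi algorithm for HMMs is a compact dynamic program: each chart cell stores the best probability of reaching a state at time t, plus a backpointer. This is an illustrative sketch, not the course's sample code; the weather/activity example is a common toy, not from the homework.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden state sequence for an observation sequence."""
    # chart: V[t][s] = (best prob of any path ending in s at time t, backpointer)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            best_prev = max(states, key=lambda r: V[t-1][r][0] * trans_p[r][s])
            prob = V[t-1][best_prev][0] * trans_p[best_prev][s] * emit_p[s][obs[t]]
            V[t][s] = (prob, best_prev)
    # follow backpointers from the best final state
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
best = viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p)
```

The semi-Markov variant needed for HW3 extends the inner loop to let one state emit a *span* of observations rather than a single symbol.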
Unit 2: Unsupervised Learning for Sequences: Transliteration and Translation
Week 8 (Oct 21): Korean vs. Japanese writing systems.
More on semi-Markov Viterbi.
EM for transliteration.
Week 9 (Oct 28): More on EM: forward-backward and theory.
HW4 out: EM for transliteration.
Week 10 (Nov 4): Machine translation: IBM Models 1-2.
Week 11 (Nov 11): EM for IBM Model 1.
Week 12 (Nov 18): EM/HMM demo from Jason Eisner.
Pointwise mutual information vs. IBM Model 1 and IBM Model 4.
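The EM procedure for IBM Model 1 fits in a few lines. This is a sketch under simplifying assumptions (no NULL word, uniform initialization, a single translation direction); the German-English toy pairs are illustrative, not course data.

```python
from collections import defaultdict

def ibm1_em(bitext, iterations=10):
    """Learn IBM Model 1 translation probabilities t(f|e) by EM.
    bitext is a list of (foreign_words, english_words) sentence pairs."""
    f_vocab = {f for fs, es in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in bitext:       # E-step: fractional alignment counts
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalize over alignments
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        for f, e in count:          # M-step: renormalize
            t[(f, e)] = count[(f, e)] / total[e]
    return t

bitext = [("das Haus".split(), "the house".split()),
          ("das Buch".split(), "the book".split()),
          ("ein Buch".split(), "a book".split())]
t = ibm1_em(bitext)
```

After a handful of iterations, mass concentrates on the intuitive pairs (das/the, Buch/book), even though every alignment started out equally likely.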

Unit 3: Tree Models: Syntax, Parsing, and Semantics
Week 13 (Nov 25): CFGs and CKY. HW5 out: IBM Model 1.
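CKY recognition over a grammar in Chomsky normal form is a short chart loop over spans. The grammar below is a made-up toy, not the course's homework grammar; the sketch only recognizes (returns True/False) rather than building parse trees.

```python
from collections import defaultdict

def cky_recognize(words, grammar, start="S"):
    """CKY recognition for a CNF grammar, given as a pair:
    binary rules {(B, C): {A, ...}} and lexical rules {word: {A, ...}}."""
    binary, lexical = grammar
    n = len(words)
    chart = defaultdict(set)  # chart[(i, k)] = nonterminals spanning words[i:k]
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= lexical.get(w, set())
    for span in range(2, n + 1):          # widths, smallest first
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):     # split points
                for B in chart[(i, j)]:
                    for C in chart[(j, k)]:
                        chart[(i, k)] |= binary.get((B, C), set())
    return start in chart[(0, n)]

# toy CNF grammar: S -> NP VP, VP -> V NP
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
```

Storing probabilities and backpointers in the chart instead of bare sets turns this recognizer into a probabilistic CKY parser.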
Week 14 (Dec 2): Semantics intro; entailment; upward and downward monotonicity.
Week 15 (Dec 9, last class): Compositional semantics: quantifiers, type raising. HW6 out: parsing.


Liang Huang
Last modified: Fri Mar 15 18:03:42 EDT 2013