LING 83600 Language Technology / CS 84010 Natural Language Processing, CUNY Graduate Center, Fall 2014

Time and Location T 11:45am-1:45pm, Room 6496
Personnel Prof. Liang Huang (huang at cs.qc), Instructor
James Cross (jcross at gc.cuny), TA
Office Hours
Additional office hours available before HW due dates and exams.
Prerequisites CS: algorithms and data structures (especially recursion and dynamic programming);
solid programming skills (in Python); basic understanding of formal language and automata theory.
Ling: minimal understanding of morphology, phonology, and syntax (we'll review these).
Math: good understanding of basic probability theory.
Textbooks This course is self-contained (with slides and handouts) but you may find the following textbooks helpful:
  • Jurafsky and Martin, 2009 (2nd ed.), Speech and Language Processing. (default reference)
  • Manning and Schütze, 1999, Foundations of Statistical Natural Language Processing.
MOOCs You might also find these Coursera courses helpful:
  • Jurafsky and Manning (Stanford)
  • Collins (Columbia) -- more mathematical
Grading
  • Homework: 48%.
    • programming exercises in Python + pen-n-paper exercises
    • late penalty: you may submit only one (1) HW late (by up to 48 hours).
  • Quiz: 7%
  • Final Project: 5% (proposal) + 5% (talk) + 15% (report) = 25%. Individually or in teams of up to 3 students.
  • Exercises: 5+5=10%. graded by completeness, not correctness.
  • Class Participation: 10%
    • asking/answering questions in class; helping peers on HWs (5%)
    • catching/fixing bugs in slides/exams/hw & other suggestions (2%)
    • reward for submitting no HW late (3%)

Tentative Schedule:
Week 1 (Sep 2): Intro to NLP and rudiments of linguistic theory.
  Intro to Python for text processing.
Unit 1: Sequences and Noisy-Channel
Week 2 (Sep 9): Basic automata theory: FSAs (DFA/NFA) and FSTs.
Week 3 (Sep 16): FSAs/FSTs cont'd.
  The noisy-channel model.
  HW1 out: FSAs/FSTs, Carmel; recovering vowels.
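Since the first unit centers on finite-state machines, a tiny deterministic FSA can be simulated with a transition table. This is only an illustrative sketch (the toy language of "binary strings with an even number of 1s" is my own example, not from the course materials):

```python
# Minimal DFA simulation via a transition table (toy example, not course code).

def dfa_accepts(transitions, start, accepting, s):
    """Run a deterministic FSA over string s; return True if it ends in an accepting state."""
    state = start
    for ch in s:
        if (state, ch) not in transitions:
            return False  # undefined transition: reject
        state = transitions[(state, ch)]
    return state in accepting

# States track the parity of 1s seen so far.
trans = {
    ('even', '0'): 'even', ('even', '1'): 'odd',
    ('odd',  '0'): 'odd',  ('odd',  '1'): 'even',
}

print(dfa_accepts(trans, 'even', {'even'}, '1011'))  # → False (three 1s)
print(dfa_accepts(trans, 'even', {'even'}, '1001'))  # → True (two 1s)
```

An NFA simulation is the natural next step: track a set of current states instead of a single one.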
Week 5 (Sep 30): HW1 discussion.
  SVO/SOV vs. infix/postfix; advantage of SVO: less case marking; advantage of SOV: no attachment ambiguity.
  Simple pluralizer.
  Language models: basic smoothing (Laplace, Witten-Bell, Good-Turing).
  Quiz 0.
  Ex1 out.
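Of the smoothing methods listed, add-one (Laplace) is the simplest: add one to every count and renormalize by the vocabulary size. A minimal sketch for bigrams, using a toy corpus of my own (not course data):

```python
# Add-one (Laplace) smoothed bigram probability (toy sketch).
from collections import Counter

def laplace_bigram_prob(corpus, w1, w2):
    """P(w2 | w1) = (count(w1,w2) + 1) / (count(w1) + V), V = vocabulary size."""
    tokens = corpus.split()
    vocab = set(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + len(vocab))

corpus = "the cat sat on the mat"
# "the" occurs twice, ("the", "cat") once, V = 5, so P = (1+1)/(2+5) = 2/7
print(laplace_bigram_prob(corpus, "the", "cat"))  # → 0.2857...
```

Witten-Bell and Good-Turing refine this by estimating how much probability mass to reserve for unseen events, rather than adding a flat one to every count.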
Week 6 (Oct 7): Language models (cont'd): information theory, entropy and perplexity, the Shannon game.
  Viterbi decoding for HMMs; transliteration.
  HW2 out: English pronunciation, Japanese transliteration.
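The core of Viterbi decoding for an HMM is a dynamic program over (time, state) cells with backpointers. Here is a minimal sketch with toy weather/activity tables of my own (not the course's transliteration data):

```python
# Viterbi decoding for an HMM (toy sketch, illustrative probabilities).

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # V[t][s] = (best prob of any path ending in state s at time t, backpointer)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            best_prev = max(states, key=lambda p: V[t-1][p][0] * trans_p[p][s])
            V[t][s] = (V[t-1][best_prev][0] * trans_p[best_prev][s] * emit_p[s][obs[t]],
                       best_prev)
    # Backtrace from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ('Rainy', 'Sunny')
start_p = {'Rainy': 0.6, 'Sunny': 0.4}
trans_p = {'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
           'Sunny': {'Rainy': 0.4, 'Sunny': 0.6}}
emit_p = {'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
          'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1}}
print(viterbi(('walk', 'shop', 'clean'), states, start_p, trans_p, emit_p))
# → ['Sunny', 'Rainy', 'Rainy']
```

Runtime is O(T·|S|²): each of the T time steps considers every (previous state, current state) pair once.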
Week 7 (Oct 14): Pluralizer demo; discussion of HW2.
  More on HMMs/Viterbi; sample code.
  Intro to HW3 (semi-Markov).
  HW3 out: decoding for Japanese transliteration.
Week 8 (Oct 21): Korean vs. Japanese writing systems.
  More on semi-Markov Viterbi.
  EM for transliteration.
Week 9 (Oct 28): More on EM: forward-backward and theory.
  HW4 out: EM for transliteration.
Week 10 (Nov 4): Machine Translation: IBM Models 1-2.
Week 11 (Nov 11): EM for IBM Model 1.
  HW5 out: IBM Model 1.
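The EM loop for IBM Model 1 alternates between collecting fractional alignment counts (E-step) and renormalizing them into translation probabilities (M-step). A minimal sketch on a two-sentence toy parallel corpus of my own; it omits the NULL word and any efficiency tricks the actual assignment may require:

```python
# EM for IBM Model 1 translation probabilities t(f|e) (toy sketch).
from collections import defaultdict

def ibm_model1(pairs, iterations=20):
    """Estimate t(f|e) by EM from (foreign, english) sentence pairs."""
    f_vocab = {f for (fs, es) in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalizer over this sentence
                for e in es:
                    frac = t[(f, e)] / z        # E-step: fractional count
                    count[(f, e)] += frac
                    total[e] += frac
        for (f, e) in count:                    # M-step: renormalize
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# Toy corpus: "das Haus" / "the house", "das Buch" / "the book".
pairs = [(["das", "Haus"], ["the", "house"]),
         (["das", "Buch"], ["the", "book"])]
t = ibm_model1(pairs)
print(t[("das", "the")])  # converges toward 1.0 as iterations grow
```

Because "das" co-occurs with "the" in both sentence pairs, EM pivots on it and drives t(das|the) toward 1, disambiguating "Haus"/"house" and "Buch"/"book" along the way.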

Other NLP/CL courses:
Reading List

Liang Huang
Last modified: Fri Mar 15 18:03:42 EDT 2013