Liang Huang

Assistant Professor, Department of Computer Science, Queens College
Doctoral Faculty, Computer Science Program, The Graduate Center
The City University of New York

Research Scientist (part-time), IBM T. J. Watson Research Center

ACL 2014 Tutorial on Structured Prediction: updated slides [pdf, animated, 19M] [pdf, 4ups (6M)] [keynote.zip (8M)]

My research interests are mainly in the algorithmic and formal aspects of computational linguistics (esp. parsing and machine translation) and artificial intelligence in general. The key questions that motivate my research are:

Why are computers so bad at understanding and processing natural language? Can we teach computers to process natural language the way we humans do, that is, both quickly and accurately? Or can computers process natural language the way they process programming languages, in spite of the inherent ambiguity of the former?

Recently, therefore, I have been focusing on linear-time algorithms for parsing and translation, inspired by both human processing (psycholinguistics) and compiler theory.

On the other hand, I also work on theoretical and practical problems in structured learning with inexact search, which arise in NLP but also apply to other structured domains such as computational biology. I have also worked on structural biology (esp. protein folding) using dynamic programming inspired by computational linguistics (see below).

Lastly, I remain interested in theory and algorithms, and some of my NLP papers draw unexpected connections to theoretical computer science: e.g., the linear-time synchronous binarization algorithm was inspired by the Graham scan for convex hulls, and the k-best parsing algorithms are often featured in the exams when I teach Algorithms.

A listing of my papers is available on Google Scholar, with a subset in the ACL Anthology Network.

Recent and Past Talks

  • Structured Learning with Inexact Search. Talks given at UMass Amherst, IBM, Columbia, Rochester, etc. slides.

  • Dynamic Programming for Incremental Parsing. Talks given at UCSD, Google, JHU, MIT, IBM, etc.
    slides and video from the talk at JHU CLSP Seminar (Sep 2010).

  • Tree-based and Forest-based Translation. Talks given at BBN, CUHK, Berkeley, Pomona, etc. slides.

  • Forest-based Algorithms in NLP (thesis work). Talks given at MIT, Google, Stanford, CMU, etc.

  • Tutorial: Advanced Dynamic Programming in Semiring and Hypergraph Frameworks.
    NAACL 2009 and COLING 2008. [slides] [paper]
    (based on my candidacy exam and Chapter 2 of my thesis)
  • Binarization of Synchronous Context-Free Grammars (and its connection to the Graham Scan for Convex Hulls).
    talks given at CAS/ICT and HKUST (2007).
    [slides]
  • Fast Decoding with Synchronous Grammars and n-gram Models ("forest rescoring" paper).
    video and slides from the Microsoft Research talk (Dec 2006).

Latest and Current Work

Representative Publications (organized by techniques)

For representative publications please see the "Research" Tab.

For a more complete list, you can visit my Google Scholar page.

Back in China, I also co-authored a popular textbook for algorithmic programming contests, The Art of Algorithms and Programming Contests, with the legendary Rujia Liu (Tsinghua University Press, 2003). It was a national best-seller in computer science in 2005--2006 and has been widely adopted as a standard textbook for the NOI, IOI, and ACM/ICPC contests.

I love teaching. Currently I teach PhD-level courses in both the Computer Science and Linguistics Programs at the CUNY Graduate Center, as well as undergraduate and Master's courses at CUNY Queens College.

Current Teaching at CUNY:

Past Teaching at USC:

Past Teaching at Penn:

  • CSE 399: Python Programming (new course, Spring 2006)
    I was chosen by the Department as the first graduate student ever to teach a course.
  • Recitation Instructor, CSE 320, Algorithms (Spring 2005, Prof. Sanjeev Khanna)
  • Recitation Instructor, CSE 262, Automata and Formal Language Theory (Fall 2004, Prof. Jean Gallier).
  • Awarded University Graduate Teaching Prize, 2005. [more details]
I'm looking for PhD students, postdocs, and visiting PhD students to join my group. Drop me a note if you're interested. The CUNY PhD application information is here.
I prefer students with a solid background in algorithms, math, and programming (e.g., experience with ACM/ICPC or similar contests).

My research group is called the Algorithms for Computational Linguistics (ACL) Group.
(Note that we are a group, not a lab, although we do have a physical lab space.
Our lab space is unfortunately free of windows, but is also proudly free of Windows.)

Former Ph.D. and M.S. Students at USC

  • Ashish Vaswani, Ph.D., June 2014 (co-advised by David Chiang). first job: Computer Scientist, ISI.
  • Theerawat "Dome" Songyot, M.S. student, Fulbright scholar.

We are also part of a larger family of NLP faculty and students at CUNY: NLP @ CUNY.
In particular, we have a wonderful NLP seminar series. You are more than welcome to attend our talks if you happen to be in the city.

I am from Shanghai, China and speak Wu as my native language.

Education

Ph.D., Computer Science, University of Pennsylvania, 2008. (old old homepage) [thesis] [slides]
B.S., Computer Science, Shanghai Jiao Tong University, 2003. summa cum laude. (minor studies in French and English)

Professional Experience

Research Scientist (part-time), IBM T. J. Watson Research Center, 2014/6--present. (Bowen Zhou's group)

Doctoral Faculty, The Graduate Center, City University of New York (CUNY), 2012/8--present.
Assistant Professor, Queens College, City University of New York (CUNY), 2012/8--present.

Research Assistant Professor, University of Southern California (USC), 2010/7--2012/8.
Computer Scientist, USC Information Sciences Institute, 2009/7--2012/8.

Research Scientist, Google Research (Mountain View), 2009/1--7. (Fernando Pereira's group)

Visiting Scholar, Hong Kong Univ. of Science and Technology, 2008/10--2008/11.
Visiting Scholar, Institute of Computing Technology, Chinese Academy of Sciences, 2007/10--2008/1.
Summer Intern, USC/ISI, 2005/5--10 and 2006/5--10.

Awards

Best Paper Award, ACL 2008.
Best Paper Nominations, ACL 2007, EMNLP 2008, and ACL 2010.
Google Faculty Research Awards, 2010 and 2013.
Regional Champions (as Faculty Advisor for USC), ACM Int'l Collegiate Programming Contest (ICPC), 2011.
University Prize for Graduate Student Teaching, University of Pennsylvania, 2005. [more details]

Selected Service

NSF Panelist, 2014
Area Chair (machine translation), ACL 2014
Area Chair (syntax and parsing), ACL 2012
Program Co-Chair, IWPT 2013

CV

Brief bio for talk announcements

Liang Huang is currently an Assistant Professor at the City University of New York (CUNY) and a part-time Research Scientist at IBM T. J. Watson Research Center. He received his Ph.D. from Penn in 2008 and has worked as a Research Scientist at Google and a Research Assistant Professor at USC/ISI. Most of his work develops fast algorithms and provable theory to speed up large-scale natural language processing and structured machine learning. He has received a Best Paper Award at ACL 2008, several best paper nominations (ACL 2007, EMNLP 2008, and ACL 2010), two Google Faculty Research Awards (2010 and 2013), and a University Teaching Prize at Penn (2005).

Note: since 2006, almost all my code has been written in Python 2.7, with some Python extension libraries in C.
I also love functional and declarative programming in general (OCaml, Haskell, and Prolog), but hate C++ and Perl, which are too ugly. Compared to Python/Haskell/OCaml, languages like C/C++ and Java are stone-age artifacts; don't use them unless absolutely necessary (such as for Python libraries).

Linear-Time Dynamic Programming Parser (with Max-Violation Perceptron Trainer)

This parser is described in the following two papers:
  1. Liang Huang and Kenji Sagae (2010). Dynamic Programming for Linear-Time Incremental Parsing.
    Proceedings of ACL 2010. Best Paper Nominee.
  2. Liang Huang, Suphan Fayong, and Yang Guo (2012). Structured Perceptron with Inexact Search.
    Proceedings of NAACL 2012.
Parser Features:
  • Extremely fast linear-time parsing (about 0.04 seconds per sentence).
  • Dynamic programming and forest output (encodes exponentially many trees).
  • k-best output (Huang and Chiang, 2005).

Trainer Features:

  • Flexibility in defining new feature templates (my Python code compiles your feature definition language into a dynamically generated Python snippet).
  • Max-Violation Perceptron Training (Huang et al 2012).
  • Parallelized Perceptron Training (McDonald et al 2010).
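The feature-template compilation mentioned above can be sketched roughly as follows. The parser's actual feature definition language is not documented here, so the template syntax (`s0.tag+s1.tag`, naming attributes of stack items) and the `compile_templates` helper are illustrative assumptions; the point is only that templates are compiled once into a dynamically generated Python function instead of being interpreted per extraction:

```python
from types import SimpleNamespace

def compile_templates(templates):
    """Compile templates like "s0.tag+s1.tag" into ONE generated Python
    function that extracts all features at once (no interpreted loop
    over templates at extraction time)."""
    lines = ["def extract(s0, s1):", "    return ["]
    for t in templates:
        parts = t.split("+")                      # e.g. ["s0.tag", "s1.tag"]
        expr = " + '|' + ".join(f"str({p})" for p in parts)
        lines.append(f"        {t!r} + '=' + {expr},")
    lines.append("    ]")
    namespace = {}
    exec("\n".join(lines), namespace)             # the generated snippet
    return namespace["extract"]

# toy usage: stack items with .word and .tag attributes
extract = compile_templates(["s0.tag+s1.tag", "s0.word"])
s0 = SimpleNamespace(word="saw", tag="VBD")
s1 = SimpleNamespace(word="I", tag="PRP")
print(extract(s0, s1))   # ['s0.tag+s1.tag=VBD|PRP', 's0.word=saw']
```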

Download: version 2.0.3 (Jan 2013).
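The max-violation update of paper 2 above can be illustrated with a minimal sketch: given, for each prefix length, the feature vector of the gold prefix and of the best prefix in the beam, update at the step where the model's violation (score of the wrong prefix minus score of the gold prefix) is largest. The plain-dict data layout and function names here are assumptions for illustration, not the parser's actual interface:

```python
from collections import defaultdict

def dot(weights, feats):
    return sum(weights[f] * v for f, v in feats.items())

def max_violation_update(weights, gold_prefix_feats, best_prefix_feats, lr=1.0):
    """One max-violation perceptron update.

    gold_prefix_feats[t] / best_prefix_feats[t] are feature dicts of the
    gold prefix and the best in-beam prefix after step t."""
    violations = [dot(weights, b) - dot(weights, g)
                  for g, b in zip(gold_prefix_feats, best_prefix_feats)]
    t = max(range(len(violations)), key=violations.__getitem__)
    if violations[t] <= 0:
        return weights                            # no violation, no update
    for f, v in gold_prefix_feats[t].items():     # reward the gold prefix
        weights[f] += lr * v
    for f, v in best_prefix_feats[t].items():     # penalize the wrong prefix
        weights[f] -= lr * v
    return weights

# toy usage: the wrong prefix currently outscores the gold one
w = defaultdict(float, {"b": 1.0})
gold = [{"a": 1.0}, {"a": 2.0}]
best = [{"b": 1.0}, {"b": 2.0}]
max_violation_update(w, gold, best)
print(w["a"], w["b"])   # 2.0 -1.0
```

The key design point, versus updating only at the end of the sentence, is that the update is taken where search error is provably worst, which keeps learning valid under inexact (beam) search.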


Python Sparse Vector

Written in C as a Python extension module based on collections.defaultdict.
Much faster and slimmer (about a quarter of the memory usage) than David Chiang's svector.
Built-in support for averaged parameters in online learning (e.g., perceptron, MIRA, etc.).

Note: for decoding (e.g., parsing), defaultdict is fast enough (mine is even faster, doing the dot product in C, which is also possible via Cython), but for learning (e.g., perceptron), defaultdict becomes terrible on big data because Python floats/ints are immutable, which causes many unnecessary hash operations. Using my hvector can make your learner up to 5 times faster.
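For reference, the defaultdict baseline that the note compares against might look like the following sketch of averaged-perceptron bookkeeping (the standard lazy-averaging trick; the class and method names are illustrative, not hvector's API). Every `+=` below is a hash lookup plus a rebind of an immutable float, which is exactly the per-update overhead that hvector moves into C:

```python
from collections import defaultdict

class AveragedWeights:
    """Averaged-perceptron weights via lazy averaging: keep the current
    weights plus a step-weighted accumulator, so the average is computed
    once at the end instead of summing weights after every update."""
    def __init__(self):
        self.w = defaultdict(float)      # current weights
        self.acc = defaultdict(float)    # step-weighted accumulator
        self.step = 0                    # number of updates so far

    def update(self, feats, delta):
        # each += is a hash lookup + rebind of a new float object
        for f, v in feats.items():
            self.w[f] += delta * v
            self.acc[f] += self.step * delta * v
        self.step += 1

    def averaged(self):
        # average of w after each update = w - acc / step
        if self.step == 0:
            return dict(self.w)
        return {f: self.w[f] - self.acc[f] / self.step for f in self.w}

aw = AveragedWeights()
aw.update({"x": 1.0}, 1.0)               # w["x"] = 1.0
aw.update({"x": 1.0}, 1.0)               # w["x"] = 2.0
print(aw.averaged()["x"])                # 1.5 (mean of 1.0 and 2.0)
```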

Download: hvector version 1.0 (Jan 2013).


Forest-Reranking Parser

This parser/reranker is described in the following paper:
  • Liang Huang (2008). Discriminative Parsing with Non-Local Features.
    Proceedings of ACL 2008. (Best Paper Award)
    erratum: following Charniak, the dev set was section 24, not section 22.
This software has three components:
  1. The forest-dumping version of Charniak parser.
  2. The forest reranker.
  3. The perceptron trainer.
Currently part 1 is downloadable as a standalone package. Parts 2 and 3 are being packaged for release.

Important: if you're using 64-bit Ubuntu, it is recommended that you install Python from source (see Python.org). The default Python 2.7 on those Ubuntu releases (at least 12.04) has an obscure floating-point problem that gives inconsistent results.


We gratefully acknowledge the support from funding agencies.

  • PI, NSF EAGER, $135k for one year, 2014--2015.

  • PI, Google Faculty Research Award, unrestricted gift, $88k for one year, 2013--2014.

  • PI, PSC-CUNY Enhanced Research Award, $12k for one year, 2013--2014.

  • co-PI, DARPA DEFT Program, $2M for 4.5 years, 2012--2016. PI: Andrew Rosenberg.

  • PI, Google Faculty Research Award, unrestricted gift, $75k for one year, 2010--2011.
Liang Huang
Computer Science Department, CUNY/QC
Science Building A-202
65-30 Kissena Blvd., Queens, NY 11367.
718-997-5006 (lab)
718-997-3487 (office)

huang at cs dot qc dot cuny dot edu.

I am also at

Computer Science Department, CUNY/GC
365 Fifth Avenue, New York, NY 10016.
212-817-8208 (office)

I don't check voice messages.

Disclaimer: I am known to be highly opinionated, and some points below might sound offensive to some readers.

Classical Music

I am a big fan of classical music. The composers I admire most are Johann Sebastian Bach (whose music is so mathematical), Pyotr Ilyich Tchaikovsky (whose melodic talent almost rivals Mozart's), and Antonin Dvorak (whose music blends Bohemia with America). I also love, among others (in chronological order), Luigi Boccherini, Wolfgang Amadeus Mozart, Ludwig van Beethoven, Felix Mendelssohn, and Sergei Rachmaninoff. Yes, I do have a preference for Baroque, Slavic, and melodic beauty.

On the other hand, I have neither a taste nor much respect for Richard Wagner (whom I find disgusting), nor do I like Franz Liszt. Compared to Frederic Chopin or Niccolo Paganini, Liszt has almost nothing original of his own (like comparing Clementi to Mozart).

A Personal History of Languages

I grew up speaking Wu, but in a multilingual environment. Back in the old days, Shanghai was just as multicultural as New York City today, with speakers and communities of all kinds of languages. When I grew up, my parents spoke Shanghainese, and my grandparents Ningbonese, which I understood perfectly but could not speak well; the majority of our neighbors, however, spoke another distinctive language called Lower Yangtze Mandarin, which is an "interpolation" of Wu and Northern Mandarin, and because of that I am still fluent in it today. I started to learn Standard Mandarin rigorously, as a de facto first foreign language, in elementary school, but ended up speaking it with a heavy Wu accent. During college I took up French seriously, but forgot all of it after moving to the US. On the other hand, living in the US helped me get rid of my heavy Wu accent in Mandarin, since finally the "training data" around me had more native samples than non-native ones. The US also exposed me to other Chinese languages and dialects that I had never heard back in Shanghai, such as Upper Yangtze Mandarin (aka "Sichuanese") and Cantonese, and, most importantly, to various English dialects and Spanish. I still enjoy learning new languages and dialects today.

More on Wu Chinese.

History of Science

My old collection on the aesthetics of mathematics (essays by C. N. Yang, M. Atiyah, T. Gowers, A. Connes, and T. Tao) and a special collection on Paul Erdos.

Michael Atiyah on the Art of Mathematics and interview video.

Talk Shows