My research interests are mainly in the algorithmic and formal aspects of computational linguistics (esp. parsing and machine translation) and artificial intelligence in general.
The key questions that motivate my research are:
Why are computers so bad at understanding and processing natural language?
Can we teach computers to process natural language the way we humans do,
that is, both quickly and accurately?
Or, can computers process natural language the way they process programming languages in spite of the inherent ambiguity of the former?
So recently I have been focusing on linear-time algorithms for parsing and translation inspired by both human processing (psycholinguistics)
and compiler theory.
On the other hand, I also work on theoretical and practical problems in structured learning with inexact search, which arise from NLP but
also apply to other structured domains such as computational biology.
I have also worked on structural biology (esp. protein folding) using
dynamic programming inspired by computational linguistics (see below).
Liang Huang, Hao Zhang, Daniel Gildea, and Kevin Knight (2009).
Binarization of Synchronous Context-Free Grammars. Computational Linguistics, 35 (4). Conference version appeared at NAACL 2006.
(The core linear-time synchronous binarization algorithm
was inspired by the Graham Scan for Convex Hull.
It was a rather unexpected connection.)
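As a rough illustration of the Graham-Scan flavor of this idea, here is a minimal sketch of a stack-based, left-to-right check for whether a permutation (the target-side order of a rule's nonterminals) can be binarized. This is not the code from the paper: the function name `is_binarizable` and the representation of spans as `(lo, hi)` pairs are assumptions made for this example. Like the Graham Scan, each element is pushed once and each reduction removes one stack entry, so the loop does a linear amount of work.

```python
def is_binarizable(perm):
    """Return True if `perm` (a permutation of 1..n) can be built up by
    repeatedly merging two neighboring spans that are also contiguous
    on the target side. Hypothetical helper, for illustration only."""
    stack = []  # each entry is a (lo, hi) range of target positions
    for p in perm:
        stack.append((p, p))  # shift the next element as a unit span
        # Reduce while the two topmost spans are contiguous on the target side.
        while len(stack) >= 2:
            (lo1, hi1), (lo2, hi2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # Binarizable exactly when everything collapsed into a single span.
    return len(stack) == 1
```

For instance, `is_binarizable([3, 1, 2])` succeeds because 1 and 2 merge first and the result then merges with 3, whereas `is_binarizable([2, 4, 1, 3])` fails because no two neighbors ever form a contiguous span.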
grammars and dynamic programming for computational biology
Back in China, I also co-authored a popular textbook for
Algorithmic Programming Contests:
The Art of Algorithms and Programming Contests,
with the legendary Rujia Liu (Tsinghua University Press, 2003).
It was a national best-seller in computer science, 2005--2006,
and has been widely adopted as the standard textbook for NOI, IOI, and ACM/ICPC contests.
I love teaching so much. Currently I teach PhD-level courses
in both Computer Science and
at the CUNY Graduate Center,
as well as undergraduate and Master's courses at CUNY Queens College.
Current Teaching at CUNY:
Fall 2013: CS 71010, Programming Languages (Functional Programming in Haskell, Operational Semantics, Lambda Calculus, and Type Theory), Graduate Center.
Modeled after: CIS 500, Fall 2003 at Penn (the best class I've ever taken, taught by the best instructor I've ever had).
I'm looking for PhD students, postdocs, and visiting PhD students to join my group. Drop me a note if you're interested. The CUNY PhD application information is
I prefer students with a solid background in algorithms, math, and programming
(e.g., experience with ACM/ICPC or similar contests).
We are also part of a larger family of NLP faculty and students at
CUNY (Queens and Hunter Colleges):
NLP @ CUNY.
I am from Shanghai, China and speak Wu as my native language.
Ph.D., Computer Science, University of Pennsylvania, 2008. (old old homepage)
B.S., Computer Science, Shanghai Jiao Tong University, 2003. summa cum laude. (minor studies in French and English)
Assistant Professor, The City University of New York (CUNY), 2012/8--present.
Research Assistant Professor, University of Southern California (USC), 2009/7--2012/8.
Research Scientist, Google Research (Mountain View), 2009/1--7.
Visiting Scholar, Hong Kong Univ. of Science and Technology, 2008/10--2008/11.
Visiting Scholar, Institute of Computing Technologies, Chinese Academy of Sciences, 2007/10--2008/1.
Summer Intern, USC/ISI, 2005/5--10 and 2006/5--10.
Note: since 2006, almost all my code has been written in Python 2.7, with some Python extension libraries in C.
I also love functional programming and declarative programming in general
(OCaml, Haskell, and Prolog), but hate C++ and Perl, which are too ugly.
Compared to Python/Haskell/OCaml, languages like C/C++ and Java are
stone-age artifacts; don't use them unless absolutely necessary (such as for Python libraries).
Written in C as a Python extension module based on collections.defaultdict.
Much faster and slimmer (4 times less memory usage) than David Chiang's svector.
Builtin support for averaged parameters in online learning (e.g. perceptron, MIRA, etc.).
Note: for decoding (e.g. parsing), defaultdict is fast enough (mine is even faster because it does the dot product in C, which is also possible via Cython), but for learning (e.g. perceptron), defaultdict
becomes terrible on big data because Python floats and ints are immutable, which causes too many unnecessary hash operations. Using my hvector can make your learner up to 5 times faster.
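To make the idea of averaged parameters concrete, here is a minimal pure-Python sketch of a sparse dot product and an averaged perceptron built on `collections.defaultdict`. It does not use or reproduce the hvector API: the names `dot` and `AveragedPerceptron` are hypothetical, and the averaging uses the common trick of keeping a second, time-weighted vector so that the average never has to be summed explicitly. Being pure Python, it only illustrates the bookkeeping, not the speed of a C extension.

```python
from collections import defaultdict

def dot(weights, feats):
    """Sparse dot product; iterates over the (usually smaller) feature vector.
    Uses .get so that missing keys are not inserted into a defaultdict."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

class AveragedPerceptron:
    """Binary perceptron with lazily averaged weights (illustrative only)."""

    def __init__(self):
        self.w = defaultdict(float)  # current weights
        self.u = defaultdict(float)  # updates weighted by the time they happened
        self.c = 1                   # example counter, starts at 1

    def predict(self, feats):
        return 1 if dot(self.w, feats) > 0 else -1

    def update(self, feats, label):
        """Process one example; label is +1 or -1."""
        if self.predict(feats) != label:
            for f, v in feats.items():
                self.w[f] += label * v
                self.u[f] += self.c * label * v
        self.c += 1

    def averaged(self):
        """Average of all weight vectors seen so far (including the initial zero)."""
        return {f: self.w[f] - self.u[f] / self.c for f in self.w}
```

A typical use would call `update(feats, label)` once per training example, where `feats` is a dict from feature names to values, and then score new examples with `dot(model.averaged(), feats)`.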
This parser/reranker is described in the following paper:
Liang Huang (2008). Discriminative Parsing with Non-Local Features.
Proceedings of ACL 2008. (Best Paper Award)
Erratum: Following Charniak, the dev set was section 24, not section 22.
This software has three components:
The forest-dumping version of Charniak parser.
The forest reranker.
The perceptron trainer.
Currently part 1 is downloadable as a standalone package.
Parts 2 and 3 are being packaged for release.
Important: If you're using 64-bit Ubuntu, it is recommended that you install Python from source code (see Python.org).
The default Python 2.7 on those Ubuntu releases (at least 12.04) has an obscure floating-point problem
which gives inconsistent results.
We gratefully acknowledge the support from funding agencies.
co-PI, DARPA DEFT Program, $2M for 4.5 years, 2012--2016. PI: Andrew Rosenberg.
PI, Google Faculty Research Award, unrestricted gift, $75k for one year, 2010--2011.
PI, PSC-CUNY Enhanced Research Award, $12k for one year, 2013--2014.
PI, Google Faculty Research Award, unrestricted gift, $88k for one year, 2013--2014.
Computer Science Department, CUNY/QC
Science Building A-202
65-30 Kissena Blvd., Queens, NY 11367.
718-997-3487 (I can't check voice messages here).
huang at cs dot qc dot cuny dot edu.
I am also at
Computer Science Department, CUNY/GC
365 Fifth Avenue, New York, NY 10016.
212-817-8208 (I occasionally check voice messages here).
Disclaimer: I am known to be highly opinionated, and some points below might sound offensive to some readers.
I am a big fan of Classical Music.
The composers I admire most are
Johann Sebastian Bach (whose music is so mathematical),
Peter Ilych Tchaikovsky (whose melodic talent almost rivals that of Mozart),
and Antonin Dvorak (whose music blends Bohemia with America).
I also love, among others, (in chronological order)
Wolfgang Amadeus Mozart,
Ludwig van Beethoven,
Felix Mendelssohn, and Sergei Rachmaninoff.
Yes, I do have a preference for Baroque, Slavic, and melodic beauty.
On the other hand, I don't have a taste or much respect for Richard Wagner
(whom I find disgusting),
nor do I like Franz Liszt.
Compared to Frédéric Chopin or Niccolò Paganini,
Liszt has nothing original of his own
(like comparing Clementi to Mozart).
A Personal History of Languages
I grew up speaking Wu,
but in a multilingual environment.
Back in the old days, Shanghai was just as multicultural as New York City is today,
with speakers and communities of all kinds of languages.
When I grew up, my parents spoke Shanghainese,
and my grandparents Ningbonese,
which I understood perfectly but could not speak well;
the majority of our neighbors, however,
spoke another distinctive language called
Lower Yangtze Mandarin,
which is an "interpolation" of Wu and Northern Mandarin,
and because of that I am still fluent in it today.
I started to learn Standard Mandarin rigorously as a de facto first foreign language
in elementary school,
but ended up speaking it with a heavy Wu accent.
During college I took up French seriously
but forgot all of it after moving to the US.
On the other hand, living in the US helped me
get rid of my heavy Wu accent in Mandarin,
since the "training data" around me finally
had more native samples than non-native ones.
The US also exposed me to other Chinese languages and dialects
which I had never heard back in Shanghai,
such as Upper Yangtze Mandarin (aka "Sichuan") and Cantonese,
but most importantly, various English dialects and Spanish.
I still enjoy learning new languages and dialects today.