Update: The rest of this page is no longer up to date but the following additional links have been brought to my attention:

I thank Steven Bird for this information and hope to update the entire page eventually. — M.C., 2007 Dec. 3.

One more note: Also try this overview of Python resources.

But bear in mind, THIS PAGE IS OUT OF DATE. I know there are broken links on it. I am not actively updating it, having moved on to other things.



Python Resources for Linguists New to Programming

Michael A. Covington

Institute for Artificial Intelligence

The University of Georgia

Last revised 2004 March 23


For some of the following information I am grateful to Kow Kuroda, Mary Dalrymple, Carlos Rodriguez, Karin Verspoor, Ryan Gabbard, Tom Emerson, Maria Gavriel, Mike Maxwell, Liu Haitao, Adam Zachary Wyner, and Frédérique Passot, who responded to a query that I posted on the LINGUIST List or wrote to me afterward.
Note: This page is not intended to be a comprehensive collection of Python links. The Python community is doing that very well for itself. There is no need to send me further links unless you are certain that they are essential updates to what is already here.

The Python programming language is very popular with linguists doing text processing, corpus statistics, and the like. Python is an "instant gratification" language. You don't have to learn much of it to start getting useful results. For example,

print "Hello, world!"
is a complete, ready-to-run Python program. Also, Python is:

(However, Python contains no inference engine and does not directly support unification or backtracking. That is why I continue to use Prolog for my serious NLP research.)
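
To give a rough idea of the corpus statistics mentioned above, here is a minimal sketch of a word-frequency count. It assumes Python 3, where print is a function. The sample sentence, the function name word_frequencies, and the very crude tokenizer are all made up for illustration; real corpus work would need a more careful notion of "word."

```python
# A minimal word-frequency sketch (assumes Python 3).
# The sample text and the name word_frequencies are hypothetical.
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Crude tokenizer: a "word" is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


sample = "The cat sat on the mat. The dog sat too."
freqs = word_frequencies(sample)

# Print the three most frequent words with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

Replacing the sample string with the contents of a text file would give the same kind of count for a small corpus.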

I recommend that you download the ActiveState Python package, which is a neater installation (of the same compiler) than you would get from python.org. It installs correctly for multi-user Windows and includes several tutorials on its help menu.

Update, March 23: No, I don't. ActiveState Python's IDE is Pythonwin, which has a couple of annoying habits. First, raw_input() pops up an input window rather than merely waiting for input; this is very unlike what happens when running in console mode. Second, severe syntax errors are barely reported to the user at all (only a small message appears at the bottom of the window).

Pythonwin is good for advanced Pythoneers, but for beginners, I prefer IDLE. My introductory notes on it are here.

Those with some programming experience can easily learn Python from the Python Tutorial or from three O'Reilly books: Learning Python, Programming Python, and Python Cookbook.

But these may be unsuitable for linguists who have no programming background. For linguist non-programmers, we'd like to have something like Michael Hammond's books on Java and Perl, but he hasn't written a Python book. Neither Java nor Perl is quite as useful as Python for the things we want to do.

What we need is a tutorial that:

Here's what I recommend, based on suggestions of others:

These are just a selection. Some very good tutorial material comes with Python itself.

A more advanced book, of considerable interest, is Text Processing in Python, by David Mertz, available both in print and on line. It is for people who already know quite a bit of Python.

The Natural Language Toolkit (NLTK) makes rather sophisticated use of the object-oriented features of Python, so it's a good idea to get a good grounding in Python before using it.

Finally, see Adam Zachary Wyner's Computational Linguistics Book Review for brief information about many useful books.




The content and opinions expressed on this Web page do not necessarily reflect the views of,
nor are they endorsed by, the University of Georgia or the University System of Georgia.