Update (M.C., 2007 Dec. 3): The rest of this page is no longer up to date. I thank Steven Bird for bringing the following additional links to my attention, and I hope to update the entire page eventually:
- NLTK now supports unification in Python and can do things traditionally done only in Prolog;
- Here is a good new tutorial on Python for linguists.
Python Resources for Linguists New to Programming
Michael A. Covington
Institute for Artificial Intelligence
The University of Georgia
Last revised 2004 March 23
For some of the following information I am grateful to Kow Kuroda, Mary Dalrymple, Carlos Rodriguez, Karin Verspoor, Ryan Gabbard, Tom Emerson, Maria Gavriel, Mike Maxwell, Liu Haitao, Adam Zachary Wyner, and Frédérique Passot, who responded to a query that I posted on the LINGUIST List or wrote to me afterward.
Note: This page is not intended to be a comprehensive collection of Python links. The Python community is doing that very well for itself. There is no need to send me further links unless you are certain that they are essential updates to what is already here.
The Python programming language is very popular with linguists doing text processing, corpus statistics, and the like. Python is an "instant gratification" language. You don't have to learn much of it to start getting useful results. For example,
    print "Hello, world!"
is a complete, ready-to-run Python program. Also, Python is:
- Available free of charge from www.python.org;
- Especially good at string and text processing, including regular-expression pattern matching;
- The language of the excellent, free Natural Language Toolkit (NLTK), comprising everything from parsers to statistics packages for linguistic analysis.
(However, Python contains no inference engine and does not directly support unification or backtracking. That is why I continue to use Prolog for my serious NLP research.)
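To give a feel for the string processing and corpus statistics mentioned above, here is a minimal sketch of both. It assumes Python 3, where print is a function (the Hello-world program above would be written print("Hello, world!")). The function names word_frequencies and words_ending_in are hypothetical, chosen for illustration, and the \w+ pattern is only a rough approximation of what a linguist would call a word.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word in text to its count."""
    # \w+ matches runs of letters, digits, and underscores: a rough
    # stand-in for "words" that is adequate for a first experiment.
    words = re.findall(r"\w+", text.lower())
    return Counter(words)


def words_ending_in(text, suffix):
    """Return the distinct lowercased words in text that end with suffix, sorted."""
    # re.escape keeps characters in suffix from being read as regex syntax;
    # \b marks a word boundary, so the suffix must end the word.
    pattern = r"\b\w+" + re.escape(suffix) + r"\b"
    return sorted(set(re.findall(pattern, text.lower())))


sample = "The cat was sleeping on the mat. The dog kept barking and barking."
freqs = word_frequencies(sample)
print(freqs.most_common(3))            # the three most frequent (word, count) pairs
print(words_ending_in(sample, "ing"))  # words ending in -ing
```

Lowercasing first means "The" and "the" are counted together, which is usually, but not always, what a linguist wants.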
I recommend that you download the ActiveState Python package, which is a neater installation (of the same Python interpreter) than you would get from python.org. It installs correctly on multi-user Windows systems and includes several tutorials on its help menu.
Update, March 23: No, I don't. ActiveState Python's IDE is Pythonwin, which has a couple of annoying habits. First, raw_input() pops up an input window rather than merely waiting for input; this is very unlike what happens when running in console mode. Second, even severe syntax errors are scarcely reported to the user (there is only a small message at the bottom of the window). Pythonwin is good for advanced Pythoneers, but for beginners, I prefer IDLE. My introductory notes on it are here.
Those with some programming experience can easily learn Python from the Python Tutorial or from three O'Reilly books: Learning Python, Programming Python, and Python Cookbook.
But these may be unsuitable for linguists who have no programming background. For linguist non-programmers, we'd like to have something like Michael Hammond's books on Java and Perl, but he hasn't written a Python book. Neither Java nor Perl is quite as useful as Python for the things we want to do.
What we need is a tutorial that:
- Presumes no prior knowledge of computers;
- Helps the reader build a mental model of how the computer works;
- Gets to string and text processing early (rather than getting bogged down in arithmetic or some other computer application);
- Describes the software that the reader is actually using (not some other version, so that the reader is not burdened with keeping track of differences).
Here's what I recommend, based on suggestions of others. These are just a selection; some very good tutorial material also comes with Python itself.
- Josh Cogliati's Non-Programmers' Tutorial for Python, which is in the ActiveState Python help system under Helpful Resources, An Easy Tutorial.... It is very clear, systematic, and well constructed. It is slightly dated and uses IDLE rather than Pythonwin (which I now think is a good thing).
- Allen Downey and colleagues' How to Think Like a Computer Scientist. This is available both on line and in print. The first 2 chapters are essential reading for anybody new to computers, in order to build a mental model of what a computer is and what programming is. The rest of the book is basically a Python reference manual, but it's designed for people who have never read a reference manual before.
- Ron Zacharski's Python for Linguists. This may be the book whose purpose best fits mine, but it is presently in rough form: incomplete, with typing errors and inconsistencies. It is available on line in draft form (it is the first of the "Available Tutorials" on his web page), and it starts with exact instructions on how to download and install Python. Thus you do not have to deal with a book that doesn't match your own exact software. I suggest using this and the previous item in conjunction.
- The best printed introduction to Python for absolute beginners that I've come across is Learn to Program Using Python, by Alan Gauld. I've used this in the classroom with linguistics students. It works, but it isn't particularly geared to the needs of linguists.
A more advanced book, of considerable interest, is Text Processing in Python, by David Mertz, available both in print and on line. It is for people who already know quite a bit of Python.
The Natural Language Toolkit (NLTK) makes rather sophisticated use of the object-oriented features of Python, so it's wise to get a solid grounding in Python before using it.
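For readers who have not met classes before, here is a minimal sketch of what Python's object-oriented features look like: a class bundles data (attributes) with the functions (methods) that work on it. The classes TaggedWord and TaggedSentence are hypothetical illustrations, not NLTK's own classes, and the part-of-speech tags (DT, NNS, VBP, in the style of the Penn Treebank) are just sample data. The sketch assumes Python 3.6 or later for the f-string.

```python
class TaggedWord:
    """A word paired with a part-of-speech tag (hypothetical example class)."""

    def __init__(self, word, tag):
        # Attributes: the data each TaggedWord object carries.
        self.word = word
        self.tag = tag

    def __repr__(self):
        # Controls how the object is displayed, e.g. "dogs/NNS".
        return f"{self.word}/{self.tag}"


class TaggedSentence:
    """A sequence of TaggedWord objects (hypothetical example class)."""

    def __init__(self, tagged_words):
        self.tagged_words = list(tagged_words)

    def __len__(self):
        # Lets the built-in len() work on a TaggedSentence.
        return len(self.tagged_words)

    def __iter__(self):
        # Lets a TaggedSentence be used directly in a for loop.
        return iter(self.tagged_words)

    def words_with_tag(self, tag):
        """Return the words whose tag equals the given tag, in order."""
        return [tw.word for tw in self.tagged_words if tw.tag == tag]


sentence = TaggedSentence([
    TaggedWord("the", "DT"),
    TaggedWord("dogs", "NNS"),
    TaggedWord("bark", "VBP"),
])
print(len(sentence))
print(sentence.words_with_tag("NNS"))
for tagged_word in sentence:
    print(tagged_word)
```

The specially named methods (__init__, __repr__, __len__, __iter__) are how a class hooks into Python's built-in operations; toolkits such as NLTK rely heavily on conventions of this kind.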
Finally, see Adam Zachary Wyner's Computational Linguistics Book Review for brief information about many useful books.
nor are they endorsed by, the University of Georgia or the University System of Georgia.