ProNTo = Prolog Natural Language Tools
Most of these reusable software packages were developed by students in CSCI/LING 8570 at The University of Georgia. A few were developed by the instructor, or by people associated with the course in other ways.
Many of them are related in some way to WordNet, a lexical database distributed by Princeton University.
Most of them run under SWI-Prolog, a leading free compiler for the Prolog language that sticks close to the ISO standard. It should be easy to port them to other versions of Prolog.
- Brooks, Philip (2003)
SCP: A Simple Chunk Parser.
Implementation of a shallow parser that analyzes English phrases but not complete sentences.
- Covington, Michael A. (2003)
ET the Efficient Tokenizer.
Package for chopping up a text file into words.
A Unicode version was contributed by Donald Rogers.
- Covington, Michael A. (2003)
A Free-Word-Order Dependency Parser in Prolog.
Simple example of parsing with dependency grammar. This parser handles languages whose word order is totally free (variable).
See also the following important documents (PDF):
Important Additional Notes about Dependency Parsing
A Fundamental Algorithm for Dependency Parsing
- Covington, Michael A. (2007)
CGI Scripting in SWI-Prolog Under Windows 2003 Server.
- Hu, Cheng (2003)
Text Statistics Tool Box for Natural Language Processing.
A text statistics package in Prolog.
- Kwon, So Young (2003)
Parsing Korean based on Dependency Grammar and GULP.
A partly-free-word-order dependency parser for Korean.
Dissertation (KorPar: A Rule-Based Dependency Parser for Korean Implemented in Prolog)
(This package uses Covington's GULP system, which is available separately.)
- Lyle, Arlo (2006)
LPA-Speech: An Interface Between LPA-Prolog and Microsoft SAPI.
(Similar to the next item, but runs in LPA-Prolog instead of SWI-Prolog.)
- McClain, Jonathan (2003)
SWI-Speech: An Interface Between SWI-Prolog and Microsoft SAPI.
Interface between SWI-Prolog and the Microsoft Speech API (the built-in speech synthesizer and recognizer in Windows XP).
- Perez Barrenechea, Dennis D. (2006)
A Spanish Stemming Algorithm: Implementation in Prolog and C#. (Like the Porter Stemming Algorithm, but for Spanish.)
Files (ZIP) [revised June 26, 2007]
- Schlachter, Jason G. (2003)
ProNTo_Morph: Morphological Analysis Tool.
Morphological analyzer to break written English words into their components. Includes large databases of irregular forms derived from WordNet, as well as extensive rule sets.
Files (ZIP)
This package has been used by Macquarie University (Australia).
- Voss, Matthew (2004)
Improving Upon Earley's Parsing Algorithm in Prolog.
Implementation of a modified form of Earley's Algorithm.
- Witzig, Sarah (2003)
Accessing WordNet from Prolog.
Package of routines for using the WordNet Prolog database files in an efficient and sophisticated way.
Copyright: Unless otherwise indicated, these software packages are the intellectual property of their respective authors, and are published here for others to learn from, and to reuse non-commercially. Commercial exploitation requires the author's permission.
How to cite: Any project that uses this code must give proper credit both in the code and in the documentation. A suggested bibliography format for citing these items is:
Covington, Michael A. (2003) ET the Efficient Tokenizer. University of Georgia. http://www.ai.uga.edu/mc/ProNTo.
User support: None. These are completed projects. Many of them are still in use and will be extended in future years; some won't. Please e-mail firstname.lastname@example.org (Michael Covington) if you find serious problems, but please understand that we receive no continuing funds for these projects and cannot do any substantial amount of work to help people elsewhere.
In many cases, our students or recent graduates would be glad to work for you (for a reasonable hourly fee) to deploy and extend this software. To see if the author of a package is still at The University of Georgia, use the University's online directory, and if unsuccessful, contact the Artificial Intelligence Center.